Friday, April 29, 2011

Will the .NET Framework 4.0 create a significant change in parallel programming?

Since the birth of multi-core computing there has been a need for a parallel-programming architecture. Now multi-core has become the prevailing paradigm in computer architecture, driven by the steady shift towards multi-core processors.

Very recently, Microsoft released the beta versions of the .NET Framework 4.0 and Visual Studio 2010. All eyes fell on .NET 4, as the release boasted the advent of parallel programming.

The question yet to be answered is whether migrating from existing APIs brings real advantages, specifically in helping the ordinary, traditional programmer and in multi-threading performance.

My thoughts about .NET 4.0:

1) Multi-core processing ability:
Visual Studio together with .NET 4.0 now ships significantly improved Parallel Extensions, which may help migrate ordinary programmers onto multi-core computing. Microsoft has organised the framework support into four broad areas: the parallel library, parallel LINQ, concurrent data structures and diagnostic tools. .NET 4's peers and predecessors lacked this multi-core ability; communication and synchronization of sub-tasks were considered the biggest obstacles to good parallel-program performance. In .NET 4.0, however, the promising parallel library technology enables developers to define simultaneous, asynchronous tasks without having to work with threads, locks, or the thread pool, and without all the other burdens. The programmer can therefore concentrate on the logic of the application rather than spending most of his or her time on the tasks listed above.

2) Complete support for multiple programming languages and compilers:
Apart from VB and C#, .NET 4 establishes full support for languages such as IronPython, IronRuby, F# and other .NET-targeting compilers. Unlike 3.5, it encompasses both functional programming and imperative object-oriented programming.

3) Parallel diagnostics:
Unlike Visual Studio 2008, the new Visual Studio 2010 has extensive support for debugging and profiling parallel code; this was a much-expected improvement for Visual Studio 2008 users. The new profiling tools provide various data views, with graphical, tabular and numerical displays of how a parallel or multi-threaded application interacts with itself and with other programs, so the programmer can see how the application actually behaves. The results let developers quickly identify areas of concern and navigate from points on the displays to call stacks and source code.

4) Dynamic Language Runtime:
The addition of the Dynamic Language Runtime (DLR) is a boon to .NET beginners. Using this new runtime environment, developers can add a set of services for dynamic languages to the CLR. In addition, the DLR makes it simpler and easier to develop dynamic languages and to add dynamic features to statically typed languages. A new System.Dynamic namespace has been added to the .NET Framework to support the DLR, and several new classes that support the .NET Framework infrastructure have been added to the System.Runtime.CompilerServices namespace.

The new DLR provides the following advantages to developers:
• A rapid feedback loop that lets developers enter statements and execute them to see the results almost immediately.
• Support for both top-down and more traditional bottom-up development. For instance, with a top-down approach a developer can call functions that are not yet implemented and then add them when needed.
• Easier refactoring and code modification, since developers do not have to change static type declarations throughout the code.

If you think that the parallel-programming abilities alone make .NET 4.0 a promising next-generation programming tool, think again! That's not all: there are also a number of enhancements to the base class libraries for things like collections, reflection, data structures, exception handling and threading, plus lots of new features for the web. Apart from Microsoft, other open-source communities are working continuously towards a make-over of the current traditional (non-parallel) programming trend.

Tuesday, March 15, 2011

General structure of OpenMP statements and clauses

So from now on, let's stop the theory and get into some practical stuff!

OpenMP provides various directives, run-time routines and environment variables to help us achieve parallelism in our applications. So let's begin our journey into OpenMP.

First, let's look at the general structure of any OpenMP statement:

       
    #pragma omp directive-name [clause, ...] newline

  • #pragma omp : required for all OpenMP C/C++ directives.
  • directive-name : a valid OpenMP directive; must appear after the #pragma omp and before any clauses.
  • [clause, ...] : optional; clauses can be in any order, and repeated as necessary unless otherwise restricted.
  • newline : required; precedes the structured block which is enclosed by this directive.
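
To make that structure concrete, here is a minimal sketch (the choice of the "parallel" directive and the "num_threads" clause is just for illustration, and it assumes a compiler with OpenMP support, e.g. gcc with the -fopenmp flag):

    #include <stdio.h>
    #include <omp.h>   /* OpenMP run-time routines */

    int main(void)
    {
        /* directive-name = "parallel", clause = "num_threads(4)" */
        #pragma omp parallel num_threads(4)
        {
            /* this structured block is executed by every thread in the team */
            printf("Hello from thread %d\n", omp_get_thread_num());
        }
        return 0;
    }

Save it as, say, hello.c and compile with gcc -fopenmp hello.c.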

Clauses :

These clauses are often referred to as data-sharing or data-scope clauses. They are very important when working with OpenMP, because they specify how data is shared among the threads: which data is private to a specific thread, and which is not.

(The following assumes familiarity with threads. If you are not familiar with "threads", please get an overview here.)


Here is the list of clauses supported by most OpenMP implementations:

                1. "private" Clause
                2. "shared" Clause
                3. "default" Clause
                4. "firstprivate" Clause
                5. "lastprivate" Clause
                6. "reduction" Clause




Clause 1 : "private" 

                   Format of this clause =>        private (list)


                  The "private"clause declares variables in its list to be private to each thread. You can specify a list of variables which you need to make it private variable for each thread. Once we specify private clause, actual work going on behind the scene is explained as follows,

  • A new object (variable) of the same data type is declared once for each thread in the team.
  • All references to the original object are replaced with references to the corresponding thread's new variable (created in the previous step).
  • Variables specified in a "private" clause should be assumed to be uninitialized for each thread.
  • E.g.
          int tid;
          #pragma omp parallel private(tid)
          {
                     /* each thread works on its own (uninitialized) copy of tid */
                     tid = omp_get_thread_num();
                     printf("Thread ID : %d\n", tid);
          } /* end of parallel section */










Clause 2 : "shared"

                   Format of this clause =>        shared (list)

  • The "shared" clause declares variables in its list to be shared among all threads in the team.
  • This is one way in which OpenMP lets us share data among multiple threads, and hence achieve inter-thread communication by modifying the shared data.
  • E.g,
          int tid;
          #pragma omp parallel shared(tid)
          {
                     /* note: every thread writes to the same shared tid, so this
                        is a data race; the printed IDs may repeat or interleave */
                     tid = omp_get_thread_num();
                     printf("Thread ID : %d\n", tid);
          } /* end of parallel section */











Clause 3 : "default"

                   Format of this clause =>        default(shared | none)

  • The "default"clause allows the user to specify a default scope for all variables in any of the parallel region, up-till the next "default" clause is encountered.
  • Specific variables can be exempted from the default using the "private", "shared", "firstprivate", "lastprivate" and "reduction" clauses.
  • The C/C++ OpenMP specification allows only shared or none as a possible default; it does not allow "private" or "firstprivate" as a default.
  • Using none as a default requires that the programmer explicitly scope all variables.
  • E.g,
          int tid;
          #pragma omp parallel default(shared)
          {
                     tid = omp_get_thread_num(); /* tid falls under the default scope, which is "shared" */
                     printf("Thread ID : %d\n", tid);
          } /* end of parallel section */
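
For contrast, here is a small sketch of the stricter default(none) (the variable names are purely illustrative), which forces the programmer to scope every variable used inside the region explicitly:

          int tid;
          int limit = 100;
          #pragma omp parallel default(none) private(tid) shared(limit)
          {
                     tid = omp_get_thread_num();
                     if (tid < limit)   /* both tid and limit had to be scoped above */
                            printf("Thread ID : %d\n", tid);
          } /* omitting private(tid) or shared(limit) would be a compile-time error */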













Clause 4 : "firstprivate"

                   Format of this clause =>        firstprivate(list)


  • The "firstprivate" clause combines the behavior of the "private" clause with automatic initialization of the variables in its list.
  • The variables in the field list are initialized to their respective values prior(before) to entry into the parallel or work-sharing construct
  • E.g,
          int tid = 10;
          #pragma omp parallel firstprivate(tid)
          {
                     printf("Thread ID : %d\n", tid); /* prints 10 in every thread */
                     tid = omp_get_thread_num();
                     printf("Thread ID : %d\n", tid); /* now each private copy differs */
          } /* end of parallel section */












Clause 5 : "lastprivate"

                   Format of this clause =>        lastprivate(list)

  • The "lastprivate" clause combines the behavior of the "private" clause with a copy from the last loop iteration or section to the original variable object.
  • The value copied back into the original variable object is obtained from the last (sequentially) iteration or section of the enclosing construct.
  • For example, the team member which executes the final iteration for a DO section, or the team member which does the last SECTION of a SECTIONS context performs the copy with its own values.
  • This clause is not used most of the time, so if you don't get understand this clause don't worry
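
Here is that sketch, assuming a parallel for loop (the loop body is purely illustrative):

          int i, x;
          #pragma omp parallel for lastprivate(x)
          for (i = 0; i < 10; i++)
                 x = 2 * i;
          /* after the loop, x holds the value from the sequentially last
             iteration (i == 9), so x == 18 here */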

Clause 6 : "reduction"


                   Format of this clause =>        reduction(operator : list)


  • The "reduction" clause performs a reduction on the variables that appear in its list.
  • A private copy of each variable in the list is created for each thread. At the end of the parallel section, the reduction operation is applied to all private copies of the variable, and the final result is written back to the original variable shared by the team members.
  • E.g,
          int x = 0, i;
          #pragma omp parallel for default(shared) private(i) reduction(+ : x)
          for (i = 1; i <= 10; i++)
                 x = x + 1;
          /* end of parallel loop: the private partial sums have been added back
             into the shared x, so x == 10 here */
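
Behind the scenes, each thread's private copy of x starts at the identity value of the reduction operator (0 for +), every thread accumulates its own partial sum independently, and at the end of the construct the partial sums are combined with + into the shared x; that is why the parallel result (x == 10) matches the serial loop.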










    Saturday, January 15, 2011

    Other sites on multi-core programming

    You can get some more ideas by visiting the following sites:

    Home

    This is a dedicated blog for multi-core programming and OpenMP. Here the discussions will be about multi-core technology and how to improve the efficiency of a system using multi-core programming.

    This is my first blog; I hope it will be nice, cool and useful for everyone who is interested in learning multi-core programming, OpenMP, etc.

    Here, as I said, I will be posting information on multi-core programming and a complete tutorial on OpenMP. I will begin with the basics of OpenMP and proceed towards advanced topics. To put it in one line, in my view "OpenMP is very useful and easy to learn"; to get an overview, it is enough to go through all of my current posts and some more to come.

    My sincere request to one and all: please feel free to leave a comment on how you felt about my first blog. Any and all advice is always welcome; please do reply via the comments.

    At any point you can refer to the index at the top of the page to get an overview of all the topics covered so far.

    Threads and Parallel programming model



    What is a thread?
    • Informally, a thread consists of a series of instructions with its own program counter (PC) and state
    • A multi-threaded program executes multiple threads in parallel
    • This is the core concept used to increase the utilization of the CPU and thereby the performance of the system (if you are not convinced by this statement, have a glance at this page: click here)
    Multiple Threads
     
    Parallel programming models : 

    There are various programming models; some of the well-known ones are listed below. They can be categorized into two types:
    1. Distributed Memory
    2. Shared Memory
    Some models which come under distributed memory are: sockets, PVM (Parallel Virtual Machine), MPI (Message Passing Interface), etc.

    Some models which come under shared memory are: OpenMP, POSIX threads (Pthreads), automatic parallelization, etc.

    I will say a sentence or two about one model from each of the two categories above.

    MPI (Message Passing Interface) : 
    • Is an example of the distributed memory model
    • All data is private
    • Each process has access only to its own private memory
    • Data is shared explicitly by exchanging buffers
    • Data transfer has to be programmed explicitly
    • An overview is given in the figure below,
    Distributed Memory Model - MPI(Message passing Interface)

    OpenMP : (Open Multi-Processing)
    • Is an example of the shared memory model
    • Data can be shared or private
    • All threads have access to the same, globally shared memory
    • Private data can be accessed only by the thread that owns it
    • Easy to learn and implement
    • It is as shown in the figure below; a small code sketch follows it

    Shared Memory Model - Open MP
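
    As a tiny sketch of this model (the array and its size are purely illustrative): all threads write into one globally shared array, while the loop index stays private to each thread.

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            int a[8];   /* lives in the globally shared memory */
            int i;      /* made private to each thread by the clause below */
            #pragma omp parallel for private(i) shared(a)
            for (i = 0; i < 8; i++)
                a[i] = omp_get_thread_num();   /* each thread fills its own part */
            for (i = 0; i < 8; i++)
                printf("a[%d] was written by thread %d\n", i, a[i]);
            return 0;
        }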

    Note : In this blog I will only be discussing OpenMP and other shared-memory models; sorry to say no to MPI. Once you are through with this post, you can move on to the next post, which discusses OpenMP in more detail.

    Introduction to Multi-Core Technology

    Before we get into it:
    • In computing, a processor is the unit that reads and executes program instructions and produces the expected output
    • Clock speed plays a major role in determining the speed of a processor
    • Clock speeds started around 20 MHz and later went up to about 3.8 GHz
    • Eventually clock speeds could not be increased any further, so researchers looked into other ways of improving processing speed
    • And so they came up with MULTI-CORE TECHNOLOGY.

    Cores and Multi-Cores  
    • Processors were originally developed with only one core
    • The core is the part of the processor that actually performs the reading and executing of instructions
    Time efficiency with usage of multi-core processors

    • Single-core processors can process only one instruction at a time
    • A multi-core processor is a processing system composed of two or more independent cores: an integrated circuit (IC) onto which two or more individual processors (called cores in this sense) have been attached
    • So the more cores a system has, the more strength it has
    • One good example you could think of: imagine you had 4 hands instead of two; you could work faster.. :)
    • But there is a certain limit to this, which I will discuss in future posts.
    • Also, a larger number of cores does not mean that the performance of the system will always be better than before.
    • That is because, if you have 4 hands but your mind is not able to utilize the 4 hands efficiently, then what's the use of having four hands?
    • It is the same in today's computer industry: hardware improvement alone isn't enough; there has to be a change in the software trend to make efficient use of multi-core technology.
    • Always remember the basic thing about a computer:
    Hardware + Software = Computer
    • Of course that's not exactly accurate these days; there are many more additions, like firmware, etc. But the above equation is the most basic one.

    Other Issues :  
    • The current computer market has hardware that will only run at full speed with parallel programs, but unfortunately the number of programmers who write such efficient, well-supported programs is very small…
    • Parallel computing usually requires the problem to be decomposed into sub-problems that can safely be solved at the same time (concurrently)
    • The programmer structures the code and data to solve these sub-problems concurrently.
    • There are various tools that help the programmer achieve this, which I will discuss in the next post; click here to go to the next post.


    What is OpenMP? Why OpenMP?



    What is OpenMP?
    • OpenMP is an API (Application Programming Interface)
    • OpenMP supports the shared memory model for parallel programming
    • OpenMP can be used along with languages like C, C++, Fortran, etc.
    • OpenMP works on the Microsoft Windows platform and also on all Unix flavors
    • Therefore, OpenMP is quite flexible
    OpenMP consists of a vast set of compiler directives, library routines, and environment variables that influence run-time behavior; a sketch of all three appears below. To conclude, OpenMP is an API used to make your application run in parallel, so that CPU utilization increases, improving time efficiency (if you aren't convinced, please have a look at this).
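
    Here is a minimal sketch of those three ingredients working together (assuming gcc with the -fopenmp flag): the directive creates a team of threads, a library routine reports the team size, and the OMP_NUM_THREADS environment variable controls that size from outside the program.

        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            #pragma omp parallel                 /* compiler directive */
            {
                if (omp_get_thread_num() == 0)   /* library routines */
                    printf("Running with %d threads\n", omp_get_num_threads());
            }
            return 0;
        }

    Running it as OMP_NUM_THREADS=4 ./a.out should print "Running with 4 threads".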



    Why do we need OpenMP?

    Why do we need OpenMP? Why not any of the other alternatives (for other alternatives, click here)? This seems to be a BIG question, a million-dollar question, but the answer varies from person to person based on one's point of view on OpenMP.

    A few common reasons for using OpenMP rather than similar alternatives are listed below:
    • First, OpenMP is easy to learn and to apply to fresh projects
    • Second, minimal code change is required to adapt existing applications (see the sketch below)
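
    As a sketch of that second point (the arrays and the loop are purely illustrative), an ordinary serial loop becomes parallel by adding a single line:

        #include <stdio.h>

        #define N 1000

        int main(void)
        {
            double a[N], b[N], c[N];
            int i;
            for (i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

            /* the pragma below is the only change from the serial version */
            #pragma omp parallel for
            for (i = 0; i < N; i++)
                c[i] = a[i] + b[i];

            printf("c[%d] = %g\n", N - 1, c[N - 1]);
            return 0;
        }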


    Compilers :  

    Below is the list of compilers which can be used to compile programs that use OpenMP directives and libraries:

    ... and many more. I recommend downloading and using Intel Parallel Studio along with Visual Studio 2008 or a higher version, because it has a nice user interface, IntelliSense and easy debugging.