
Concurrent Programming!

 

Hmm, we can't really get by without it! If so, then what do we gain from it?

With multi-core and multi-processor machines, we can no longer rely on single-processor sequential programming to achieve better performance; we need concurrency. The idea is to have each processor do a part of the work so the total work finishes sooner. Two heads are better than one! Four are better than two! And so on...

Not necessarily! Some work is inherently sequential, some is purely parallel, and some falls between the two extremes. Identifying which is which, and estimating how much parallel processing can be done, is the essential first step toward good concurrency. For example, merge-sort falls in between, linked-list traversal is essentially sequential, and matrix multiplication is essentially parallel.

Though the classification above is somewhat fuzzy, here the discussion is about kernel-mode drivers. How much parallelism can we get? And how exactly can we estimate the speedup? In other words, I have a driver and my concern is: will it scale as the number of processors scales? And what should be the approach for estimating the speedup?

 

Well, you will see books and articles about parallel and concurrent programming that mostly take scientific programs and analyze how best they can be parallelized. Sometimes they also calculate the estimated speedup. If I'm posed with this question, I would take the following steps first -

 

  • Take Amdahl's law as the model for maximum speedup.
  • Divide the driver code by frequency of execution to define the workload in measurable units.
  • Look at the places where we need locks: coarse or fine grained.
  • Then apply the formula to get an estimate.

Why go through such trouble? Because the estimate points us to the areas worth improving. Let's look at Amdahl's formula.

Assume that the work takes 1 unit of time when run sequentially. Now the work has some sequential part. If the sequential part is zero, then it is purely parallel work, and it scales with the number of processors. Yes, a somewhat idealized condition. Assume the parallel part takes p units of time, with 0 < p < 1, and the sequential part takes 1 - p units. Then if we have n processors working on this, the time it would take is -

T(n) = time spent by n parallel processors = (1 - p) + p/n. So the speedup is 1 / ( (1 - p) + p/n ).

Example: Suppose 1/2 of the work is purely sequential, and the other 1/2 is parallelizable. Then the speedup with 2 processors is 1 / ( 1 - 1/2 + (1/2)/2 ) = 1 / ( 1/2 + 1/4 ) = 4/3. So the result shows that, mathematically speaking, we have added only 1/3 more processing capacity!
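If you want to play with the numbers, here is a minimal C sketch of the formula; the function name amdahl_speedup and the extra 64-processor case are just illustrative, not from the example above.

#include <stdio.h>

/* Amdahl's law: p is the parallelizable fraction of the work,
   n is the number of processors. Returns the estimated speedup. */
static double amdahl_speedup(double p, unsigned n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    /* The example from the text: half the work is parallelizable, 2 CPUs. */
    printf("speedup on 2 CPUs:  %.4f\n", amdahl_speedup(0.5, 2));   /* 1.3333 */

    /* Even with many CPUs, the sequential half caps the speedup below 2. */
    printf("speedup on 64 CPUs: %.4f\n", amdahl_speedup(0.5, 64));  /* ~1.9692 */
    return 0;
}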

For drivers, you cannot just assume all parts run to completion. Some parts run on demand, some run intermittently, and others run very frequently. So before we even look at which parts are inherently sequential and which are parallelizable, we have to inject the execution frequency to get a weighted method of estimating the speedup; one possible weighting is sketched below.
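The post does not spell out the exact weighting, but one plausible reading (my assumption) is to treat each driver code path as its own Amdahl workload and weight it by its share of total execution time. The struct, the path names, and the profile numbers below are all hypothetical.

#include <stdio.h>

/* One code path in the driver: its fraction of total execution time
   (weights summing to 1.0) and the parallelizable fraction of that path. */
struct path {
    const char *name;
    double weight;   /* fraction of total execution time */
    double p;        /* parallelizable fraction of this path */
};

/* Weighted estimate: the new total time is the sum over paths of
   weight * ((1 - p) + p/n); the speedup is old time (1.0) / new time. */
static double weighted_speedup(const struct path *paths, size_t count, unsigned n)
{
    double new_time = 0.0;
    for (size_t i = 0; i < count; i++)
        new_time += paths[i].weight * ((1.0 - paths[i].p) + paths[i].p / n);
    return 1.0 / new_time;
}

int main(void)
{
    /* Hypothetical profile: the read/write path dominates and parallelizes
       well; the control path is rare and mostly serialized under a lock. */
    struct path profile[] = {
        { "read/write path", 0.80, 0.90 },
        { "ioctl path",      0.15, 0.30 },
        { "init/teardown",   0.05, 0.00 },
    };
    printf("estimated speedup on 4 CPUs: %.2f\n",
           weighted_speedup(profile, 3, 4));
    return 0;
}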

Once we have an estimate, our push would be to take the most heavily weighted sequential part and try to parallelize it. As we all know, any place we guard using locks is essentially sequential. So critical sections are essentially sequential execution paths, and our objective is to push code out of the locks as much as possible; a small sketch of the idea follows.
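Here is a minimal sketch of pushing work out of a critical section. It uses a user-mode pthread mutex purely for illustration; in a driver the lock would be a spinlock or similar, but the shape of the change is the same. The names update_slow/update_fast and the buffer sizes are made up for the example.

#include <pthread.h>
#include <string.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static char g_shared[64];

/* Before: formatting work is done while holding the lock, so it is
   serialized even though it touches no shared state. */
void update_slow(const char *src)
{
    pthread_mutex_lock(&g_lock);
    char local[64];
    strncpy(local, src, sizeof(local) - 1);   /* private work under the lock */
    local[sizeof(local) - 1] = '\0';
    memcpy(g_shared, local, sizeof(g_shared));
    pthread_mutex_unlock(&g_lock);
}

/* After: the private work is done outside the lock; only the update of
   shared state stays inside, shrinking the sequential (1 - p) part. */
void update_fast(const char *src)
{
    char local[64];
    strncpy(local, src, sizeof(local) - 1);   /* private work, no lock held */
    local[sizeof(local) - 1] = '\0';

    pthread_mutex_lock(&g_lock);
    memcpy(g_shared, local, sizeof(g_shared));
    pthread_mutex_unlock(&g_lock);
}

int main(void)
{
    update_slow("first message");
    update_fast("second message");
    return 0;
}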

In summary, the next time you are asked how much speedup you achieved by modifying your critical sections, you don't have to run specific tests to answer the question. Besides, your tests may not be representative of the full spectrum of conditions your driver runs under. You should be able to plug the numbers into the formula and answer with the estimated speedup! Wonderful, at least I think so!!!

Posted on Sunday, September 19, 2010 at 09:18AM by Prokash Sinha
