Let’s split men!

By now a multicore environment is a common part of our lives (unless you are working with smartphones, tablets or something similar). But how much have we actually profited from this environment?

Sure, when we run our applications on a server, the application server tries to take advantage of the cores; it will split the work as much as it can. But a monolithic program is still a monolithic program (maybe we can run two of them at the same time, but still).

I believe that we could extract much more from this kind of resource. I don’t have any statistical data to prove my idea, but let’s use this example:

WebService1 is responsible for storing a WorkUnit called WUA. It receives this unit of work in a call and, before storing it, it should: check that the unit doesn’t contain any script injection, check that the unit doesn’t contain any virus, convert the unit to an object, process the object and finally store it.

So this would be a top-level view of its flow.

BaseFlow

As we can see, everything is in sequence. But the question is: do we need it in sequence? Are there actions that can be executed in parallel?

Why are those questions important? Simply because we now have the computational means to really execute actions in parallel. So if we can split the actions to run in parallel, we can also take advantage of the multiple cores.

But before going forward with our diagram, let’s learn about fork-join queues. I rely on Wikipedia for a definition:

“In queueing theory, a discipline within the mathematical theory of probability, a fork-join queue is a queue where incoming jobs are split on arrival for service by numerous servers and joined before departure.”

Now is the time when you can say: but what does a mathematical theory have to do with multicore? Well, in this case everything: the same concept can be applied to a computational queue. You split your task into jobs that can be queued and processed in parallel.

Each processor can look at the queue and pick up a job to do, so in the end we can have two jobs executed in parallel. Once they have finished, the results are merged and the flow continues.
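To make the queue idea concrete, here is a minimal sketch using a plain thread pool from java.util.concurrent. The class name, the method name and the sum-of-squares workload are made up for illustration; the point is only the shape: split the task into jobs, let the workers pick them up, then merge the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class, not taken from any framework.
final class ForkJoinQueueSketch {

    // Fork: split the range [1, n] into about `jobs` chunks and queue them.
    // Join: wait for every chunk and add the partial sums together.
    static long sumOfSquares(int n, int jobs) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> queue = new ArrayList<>();
            int chunk = (n + jobs - 1) / jobs; // ceiling division
            for (int start = 1; start <= n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk - 1);
                queue.add(() -> {
                    long partial = 0;
                    for (long i = from; i <= to; i++) {
                        partial += i * i;
                    }
                    return partial;
                });
            }
            long total = 0;
            // invokeAll blocks until every queued job has finished.
            for (Future<Long> result : pool.invokeAll(queue)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000, 8));
    }
}
```

The number of jobs is deliberately independent of the number of workers: with more jobs than cores, a worker that finishes early simply takes the next job from the queue.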

I will not go into the details of how it can be implemented; in fact there are quite a few frameworks around that do that. Even the Java concurrency library offers mechanisms for it. But if you are interested in how to implement it, and in the possible difficulties, take a look at [1][2][3]; they are quite good references.

Now back to our diagram: what can we execute in parallel? I would say that the virus check, the script injection check and the input deserialization can be executed in parallel.

BaseFlow (1)
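As a rough sketch of this parallel step, the three actions can be started as independent tasks and joined before the flow continues. Everything here is an assumption made for illustration: the class and method names are hypothetical, the two checks are toy placeholders rather than real scanners, and the "object" is just the trimmed payload.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example class; the checks are placeholders, not real scanners.
final class ParallelChecks {

    static boolean hasNoScriptInjection(String raw) {
        return !raw.toLowerCase().contains("<script");
    }

    static boolean hasNoVirus(String raw) {
        return !raw.contains("EICAR"); // toy rule standing in for a virus scan
    }

    // Stand-in for the real conversion of the unit to an object.
    static String deserialize(String raw) {
        return raw.trim();
    }

    // Fork the three independent steps, join them, and hand back the object
    // only when both checks passed.
    static String validateAndConvert(String raw) {
        CompletableFuture<Boolean> scriptOk =
                CompletableFuture.supplyAsync(() -> hasNoScriptInjection(raw));
        CompletableFuture<Boolean> virusOk =
                CompletableFuture.supplyAsync(() -> hasNoVirus(raw));
        CompletableFuture<String> object =
                CompletableFuture.supplyAsync(() -> deserialize(raw));

        // join() waits for the corresponding task to finish.
        if (!scriptOk.join() || !virusOk.join()) {
            // The deserialization still ran; its result is simply discarded.
            throw new IllegalArgumentException("work unit rejected");
        }
        return object.join();
    }

    public static void main(String[] args) {
        System.out.println(validateAndConvert("  some payload  "));
    }
}
```

Without an explicit executor, supplyAsync normally runs its tasks on the JVM's common fork-join pool, so no thread management is needed in the calling code.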

Let’s look at this problem in three situations, assuming that all the parallel processes can run on different processors, they don’t need to wait to be executed, they consume the same amount of time AND the deserialization always works:

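  1. No virus and no script detected: the deserialization is not wasted, and the three steps were executed in 1/3 of the time needed by the sequential function, because the deserialization didn’t need to wait for the other two processes;
  2. A virus is detected: I spend 3 times the resources that a sequential function would need, because the sequential function would have stopped right after the virus check;
  3. A script injection is detected: I spend 1.5 times the resources that a sequential function would need (3 tasks instead of 2), assuming the sequential function runs the virus check first and stops after the script injection check.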

In most cases situation 1 happens, so in the end the average amount of time spent is lower; as a rough calculation, we would say that we reduced the time of these three steps to about 1/3.
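The rough calculation can be written down as a tiny model. This is only a sketch under the assumptions stated above plus two of my own: each task costs one time unit, and the sequential flow runs the virus check, then the script injection check, then the deserialization, stopping at the first failure. The failure rates in main are made-up numbers, not measurements.

```java
// Hypothetical back-of-the-envelope model, not a benchmark.
final class ElapsedTimeModel {

    // Assumed sequential order: virus check, script injection check,
    // deserialization, one time unit each; the flow stops at the first
    // failing check.
    static double expectedSequentialTime(double pVirus, double pScript) {
        double pClean = 1.0 - pVirus - pScript;
        return pVirus * 1 + pScript * 2 + pClean * 3;
    }

    // All three tasks start together on separate cores, so the step always
    // takes one time unit, whatever the outcome.
    static double parallelTime() {
        return 1.0;
    }

    public static void main(String[] args) {
        double sequential = expectedSequentialTime(0.01, 0.01); // made-up rates
        System.out.printf("sequential %.2f, parallel %.2f%n", sequential, parallelTime());
    }
}
```

With 1% of the inputs failing each check, the sequential estimate is about 2.97 time units against 1 for the parallel version, which is where the "about 1/3" comes from.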

Sure, in a real-life example this reduction is not that big (there are other factors involved). But at the same time we can find other situations in those processes that can be executed in parallel.

Finally, we need to remember that this approach is also valid for fine-grained tasks, simple calculations or recursive algorithms. In fact, recursive algorithms are the best fit for it.
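For the recursive case, the Java concurrency library provides RecursiveTask and ForkJoinPool. The sketch below finds the maximum of an array by splitting it in halves; the class name and the threshold value are assumptions chosen for illustration, and the threshold would need tuning on a real workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: divide-and-conquer maximum of an int array.
final class RecursiveMax extends RecursiveTask<Integer> {

    private static final int THRESHOLD = 1_000; // assumed cut-off, tune as needed

    private final int[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    RecursiveMax(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Integer compute() {
        // Small enough: solve directly instead of splitting further.
        if (to - from <= THRESHOLD) {
            int max = data[from];
            for (int i = from + 1; i < to; i++) {
                max = Math.max(max, data[i]);
            }
            return max;
        }
        int mid = (from + to) >>> 1;
        RecursiveMax left = new RecursiveMax(data, from, mid);
        RecursiveMax right = new RecursiveMax(data, mid, to);
        left.fork();                      // queue the left half for another worker
        int rightMax = right.compute();   // work on the right half in this thread
        return Math.max(left.join(), rightMax); // join: merge the two results
    }

    static int max(int[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("empty array");
        }
        return ForkJoinPool.commonPool().invoke(new RecursiveMax(data, 0, data.length));
    }

    public static void main(String[] args) {
        System.out.println(max(new int[] {3, -1, 7}));
    }
}
```

Forking one half and computing the other in the current thread keeps the calling worker busy instead of leaving it idle while it waits.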

But my point here is that you can use it both in the usual recursive algorithms and in big-box computations like the one in our example.

So next time you have the task of reducing the time of critical features of your software, give it a try and check what can be executed in parallel. That way you can take advantage of the number of cores using a simple framework like Fork-join (much simpler than dealing directly with Threads).

I finish this post with the song Breaking the Silence, from Loreena McKennitt’s album Parallel Dreams. Enjoy it!

[1] GPars framework – Fork-Join (http://gpars.org/guide/guide/single.html#3.6. Fork-Join)

[2] Lea, D. A Java Fork/Join Framework (http://gee.cs.oswego.edu/dl/papers/fj.pdf)

[3] Harned, E. Fork-Join development in Java SE (http://www.coopsoft.com/ar/ForkJoinArticle.html)

[4] Beust, C. Parallel framework shootout (http://java.dzone.com/articles/parallel-framework-shootout)

[5] Goetz, B. Java theory and practice: Stick a fork in it, Part 1 (http://www.ibm.com/developerworks/java/library/j-jtp11137.html)
