A snippet from Wikipedia: “In computer science, false sharing is a performance-degrading usage pattern that can arise in systems with distributed, coherent caches at the size of the smallest resource block managed by the caching mechanism. When a system participant attempts to periodically access data that will never be altered by another party, but that data shares a cache block with data that is altered, the caching protocol may force the first participant to reload the whole unit despite a lack of logical necessity. The caching system is unaware of activity within this block and forces the first participant to bear the caching system overhead required by true shared access of a resource.”
Too abstract to understand easily, isn’t it?
To understand this better, let’s consider a hypothetical situation. It goes like this:
There are three painters, each with his own wooden board to paint on. Each board has three divisions: division 1, division 2 and division 3. A painter paints only one of these three divisions, but whenever a painter paints a division of his board, the other two boards must also be changed to reflect what he has done.
Here the wooden boards are analogous to cache blocks, the painters to parallel threads, and painting to write activity.
Remember that this update is logically unnecessary, since the divisions used by the painters do not overlap. The boards could simply be reconciled once, after all the painters have finished. But that is not how our computer architecture works. The component that manages the caching mechanism does not know which division of the cache block was actually updated, so it marks the whole block dirty. This forces a memory update to maintain cache coherency, and that update is very expensive compared to a write that stays in the cache block.
This problem arises only when there are writes and two parallel threads touch the same cache block. The way to avoid it is to make sure that data written by different threads lives in different cache blocks.
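A minimal sketch of the usual fix (this is illustrative, not the article’s code; the struct name and array size are made up): pad each thread’s data out to a full cache line, so that no two threads ever write into the same block. 64 bytes is a common x86-64 line size; on Linux you can check yours with `getconf LEVEL1_DCACHE_LINESIZE`.

```c
#include <stdint.h>

/* Assumed cache-line size; verify with
 * `getconf LEVEL1_DCACHE_LINESIZE` on your machine. */
#define CACHE_LINE 64

/* Each counter owns a whole cache line, so two threads updating
 * different elements never invalidate each other's line. */
struct padded_counter {
    uint64_t value;
    char pad[CACHE_LINE - sizeof(uint64_t)];
};

/* One slot per thread; aligned so slot 0 starts on a line boundary. */
static struct padded_counter counters[4]
    __attribute__((aligned(CACHE_LINE)));
```

The padding wastes a little memory per counter, which is the standard trade-off: a few bytes in exchange for not bouncing a cache line between cores on every write.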
Performance hit due to false sharing:
This article would be incomplete without numbers comparing run times with and without false sharing.
Refer to the code here:
Steps to run the code on Linux:
The above code has three parts:
1. Serial computation.
2. Parallel computation with false sharing: false sharing in this part of the code is ensured by repeatedly updating two adjacent memory locations of an array. As the memory locations are adjacent, they fall in the same cache block.
3. Parallel computation without false sharing: two elements with a considerable memory gap between them are chosen, to make sure they don’t lie in the same cache block.
The input size in the above graph is the number of times the loop in the expensive function is executed.
As you can see, parallel computation with false sharing is even slower than serial computation; that is how expensive it is to maintain cache coherency between two threads writing to the same cache block. So it’s very important to be aware of what’s going on in your parallel program, right down to the cache level.
With this I end my article. If you have any doubts, please feel free to comment; I will get back to them as early as possible.