Wednesday, October 24, 2012

Effective Concurrency by Herb Sutter

Have never ever written feedback on events or courses, but here I decided to write one. It is about "Effective concurrency" course by Herb Sutter. Hopefully that post will help to someone to support an approval for that course :)

So, as I have already said, a few weeks back I was lucky enough to attend "Effective Concurrency" course by Herb Sutter. That guy is software architect at Microsoft where he has been the lead designer of C++/CLI, C++/CX, C++ AMP, and other technologies. He also has served for a decade as chair of the ISO C++ standards committee. Many people also know him for his books.


So, about the course. It is very complete and covers concurrency subject starting from software design patters (actors, pipelines, atomics, etc) and finishing with hardware stuff (caches, pipelines, NUMA, etc). In overall course was great and I really enjoyed it, if you ever have a chance to attend it - do it without hesitation, you will learn a lot. Regarding to myself, I can't say that I learned many new things there, more re-iterating of what I have already learned from years of programming. Still there was some new stuff, which I didn't know, especially about the approach for solving some concurrency problems. BUT!, The most important thing I got out of that course is that it helped me to arrange and structure all that knowledge in my head. Also, there was a lot of good reasoning why some things things should be done/work this way, not the other and many interesting code examples with explanations (e.g. parallel multi-threaded graph traversing algorithm was just amazing). As far as he is C++ guy, the most examples were in C++11, but all the same concepts can be applied to Java or to any other language, so the course is language-agnostic.

Herb is good lecturer - very focused, he doesn't support long discussions and doesn't spend time on talking not exactly about the topic. He was talking non-stop for a three days and everything he said was valuable information. Not sure that I can ever do that :)

Below is brief overview of topics covered on that course. Unfortunately, I do not think that I can share sliders, but hope that description should give some indication of what it is all about. Specifically, I want to make a note that it is my understanding of the course may have nothing with reality and may be completely different from the other's opinion. Just keep that in mind.

Day 1

An overview of concurrency and it's evolution, why it's important to know how to use it, primitives, thread-pooling and actors.
  • Free lunch is over and Welcome to the Jungle: That's all about the fact that computers are not getting simply faster, how it was before. They are getting bigger, more complicated and more specialized. 
  • Types of possible concurrency levels: Single-threaded, K-threaded (K-constant), N-threaded (N-numbers of physical threads). 
  • Types of multi-threading: cooperative, preemptive, multi-core preemptive and why the last one is the most complicated. 
  • Overview of concurrency primitives. Locks, thread pools, futures, atomics, etc. The main idea is thread as it is is too low-level concept, hard to operate. Need something higher level - Futures, Actors, etc. 
  • Code has to be structured and should not look like spaghetti. And that's where libraries with threading primitives come in. 
  • Thread pools and work stealing. Mostly that was about work stealing and how cool is that. Example using Quick sort. Actors and pool with work stealing as implementation. 
  • Futures in C++, Java, C#.
Conclusion: concurrency is here and we have to use it effectively. To archive that and not over complicate your code make use concurrency primitives/libraries like actors, futures on the top of thread pools. Prefer not to use 'raw' threads.

Day 2

Memory architecture, caching, hardware details.
  • Pipeline and concurrency. What it pipe line and how to build it properly with some examples. 
  • What is bandwidth and latency. The main idea is that you can always get better bandwidth, but latency is something much more complicated and much harder (if possible) to fix. 
  • Memory latency is the biggest problem and the most of CPU manufacturers efforts are applied to hide that latency. If you have a look at CPU - "1% die to change data, 99% to move, store data", i.e. to hide latency (http://www.ll.mit.edu/HPEC/agendas/proc04/invited/patterson_keynote.pdf). 
  • CPU pipe lines - how it works and why it's required. Superscalar CPUs. 
  • Reordering and hardware memory model. Main conclusion is that the main aim of CPU is to execute program is a such way that it will be valid, as if it is executed just on one thread. The code you wrote and the way how code is executed by CPU are two different things. If you want it to execute the program exactly as you wrote it, you need to tell CPU about it. 
  • Hardware loves arrays. Pre-fetching, caches, etc. The love comes from the simplicity of optimization for memory latency. 
  • False sharing, cache locality. 
Conclusion: one of the main problem of concurrency performance is speed of the memory access. Love arrays the same way as hardware does and try not to use random-access for the same reason. Do not modify the data in the same location (same cache line) from different threads. Use cache-friendly data structures. Keep in mind that what you wrote is not exactly what is executed.

Day 3

Doing concurrency - writing concurrent code
  • Example of concurrent concurrency (hm...). Basically when data is not just processed by several threads, but also by several clients. E.g. traversing tree and updating it at the same time in several threads by several clients. 
  • Some data structures are well for concurrency, some are not. E.g. arrays are brilliant, linked lists are ok, but balanced binary tree is not concurrency friendly at all. Choose correct data structures. 
  • Different types of scalability in detail - K-scalable, linear scalability and super-linear scalability. Super-linear scalability is interesting, it can be archived using algorithms which are naturally faster when run in parallel. Also, several threads can benefit of several caches, etc. 
  • How to stop thread. Kill - very bad, almost kill (not available in java) - still bad, interrupt - well..., just a flag... maybe. The recommended approach is just a flag. In Java I would also consider interrupt, but I have to say that as far as not many people know how to use it, it can be slightly dangerous. 
  • Locks are complicated and always are cause of problems. It is possible that deadlock can happen even when locks are not related at all. E.g. one lock on your code, the other one is in library. You won't even know about it. To make YOUR locks work safer, use lock levels and lock groups. Levels mean that thread won't ever grab the lock with lower level. Groups help with ordering to ensure that locks are always grabbed in the same order. Keep in mind that it will make YOUR locks better, but if you are using third party code and it is using locks, nothing can help, really.
Conclusion: you have to know concurrency algorithms and data structures. Locks are complicated, never trust to code which you didn't write.

And the overall conclusion. As I have already mentioned course is very good. I would recommend it to everybody. It has extremely good coverage of the topic and the only thing, which I found missed is MSI, it would be good to have an overview with some examples how it affects latency. Unfortunately, I do not think that Herb can do it more often than once in a year, so if you want to attend you have to wait and check for it to re-appear here (http://developerfocus.com/) or just ping Herb, he definitely has to know about his plans :)

2 comments:

Blogger said...

Did you know that that you can earn money by locking premium pages of your blog or website?
To start just join AdscendMedia and implement their Content Locking tool.

Scarlett H said...

Hi nice reading yoour blog