To my delight, I still get compliments on and criticisms of my post from three years ago (can it possibly be that long?) on parallelism and concurrency. In that post I offered a “top down” argument to the effect that these are different abstractions with different goals: parallelism is about exploiting computational resources to maximize efficiency, concurrency is about non-deterministic composition of components in a system. Parallelism never introduces bugs (the semantics is identical to the sequential execution), but concurrency could be said to be the mother lode of all bugs (the semantics of a component changes drastically, without careful provision, when composed concurrently with other components). The two concepts just aren’t comparable, yet somehow the confusion between them persists. (Not everyone agrees with me on this distinction, but neither have I seen a rigorous analysis that shows them to be the same concept. Most complaints seem to be about my use of the words “parallelism” and “concurrency” , which is an unavoidable problem, or about my temerity in trying to define two somewhat ill-defined concepts, a criticism that I’ll just have to accept.)
I’ve recently gotten an inkling of why it might be that many people equate the two concepts (or see no point in distinguishing them). This post is an attempt to clear up what I perceive to be a common misunderstanding that seems to explain it. It’s hard for me to say whether it really is all that common of a misunderstanding, but it’s the impression I’ve gotten, so forgive me if I’m over-stressing an obvious point. In any case I’m going to try for a “bottom up” explanation that might make more sense to some people.
The issue is scheduling.
The naive view of parallelism is that it’s just talk for concurrency, because all you do when you’re programming in parallel is fork off some threads, and then do something with their results when they’re done. I’ve previously argued that this is the wrong way to think about parallelism (it’s really about cost), but let’s just let that pass. It’s unarguably true that a parallel computation does consist of a bunch of, well, parallel computations. So, the argument goes, it’s nothing but concurrency. I’ve previously argued that that’s not a good way to think about concurrency either, but we’ll let that pass too. So, the story goes, concurrency and parallelism are synonymous, and bullshitters like me are just trying to confuse people and make trouble.
Being the troublemaker that I am, my response is, predictably, no, just no. Sure, it’s kinda sorta right, as I’ve already acknowledged, but not really, and here’s why: scheduling as you learned about it in OS class (for example) is an altogether different thing than scheduling for parallelism. And this is the heart of the matter, from a “bottom-up” perspective.
There are two aspects of OS-like scheduling that I think are relevant here. First, it is non-deterministic, and second, it is competitive. Non-deterministic, because you have little or no control over what runs when or for how long. A beast like the Linux scheduler is controlled by a zillion “voodoo parameters” (a turn of phrase borrowed from my queueing theory colleague, Mor Harchol-Balter), and who the hell knows what is going to happen to your poor threads once they’re in its clutches. Second, and more importantly, an OS-like scheduler is allocating resources competitively. You’ve got your threads, I’ve got my threads, and we both want ours to get run as soon as possible. We’ll even pay for the privilege (priorities) if necessary. The scheduler, and the queueing theory behind it (he says optimistically) is designed to optimize resource usage on a competitive basis, taking account of quality of service guarantees purchased by the participants. It does not matter whether there is one processor or one thousand processors, the schedule is unpredictable. That’s what makes concurrent programming hard: you have to program against all possible schedules. And that’s why you can’t prove much about the time or space complexity of your program when it’s implemented concurrently.
Parallel scheduling is a whole ‘nother ball of wax. It is (usually, but not necessarily) deterministic, so that you can prove bounds on its efficiency (Brent-type theorems, as I discussed in my previous post and in PFPL). And, more importantly, it is cooperative in the sense that all threads are working together for the same computation towards the same ends. The threads are scheduled so as to get the job (there’s only one) done as quickly and as efficiently as possible. Deterministic schedulers for parallelism are the most common, because they are the easiest to analyze with respect to their time and space bounds. Greedy schedulers, which guarantee to maximize use of available processors, never leaving any idle when there is work to be done, form an important class for which the simple form of Brent’s Theorem is obvious.
Many deterministic greedy scheduling algorithms are known, of which I will mention p-DFS and p-BFS, which do p-at-a-time depth- and breadth-first search of the dependency graph, and various forms of work-stealing schedulers, pioneered by Charles Leiserson at MIT. (Incidentally, if you don’t already know what p-DFS or p-BFS are, I’ll warn you that they are a little trickier than they sound. In particular p-DFS uses a data structure that is sort of like a stack but is not a stack.) These differ significantly in their time bounds (for example, work stealing usually involves expectation over a random variable, whereas the depth- and breadth-first traversals do not), and differ dramatically in their space complexity. For example, p-BFS is absolutely dreadful in its space complexity. For a full discussion of these issues in parallel scheduling, I recommend Dan Spoonhower’s PhD Dissertation. (His semantic profiling diagrams are amazingly beautiful and informative!)
So here’s the thing: when you’re programming in parallel, you don’t just throw some threads at some non-deterministic competitive scheduler. Rather, you generate an implicit dependency graph that a cooperative scheduler uses to maximize efficiency, end-to-end. At the high level you do an asymptotic cost analysis without considering platform parameters such as the number of processors or the nature of the interconnect. At the low level the implementation has to validate that cost analysis by using clever techniques to ensure that, once the platform parameters are known, maximum use is made of the computational resources to get your job done for you as fast as possible. Not only are there no bugs introduced by the mere fact of being scheduled in parallel, but even better, you can prove a theorem that tells you how fast your program is going to run on a real platform. Now how cool is that?