Parallelism Is Not Concurrency

In an earlier post I mentioned that one goal of the new introductory curriculum at Carnegie Mellon is to teach parallelism as the general case of computing, rather than an esoteric, specialized subject for advanced students.  Many people are incredulous when I tell them this, because it immediately conjures in their mind the myriad complexities of concurrency: locks, monitors, deadlock, dining philosophers, ….  And if you look at the recent research literature, you will see countless papers of the form (a) parallelism is important, so (b) let’s talk about concurrency.  The situation has gotten so bad that I think that most computer scientists cannot distinguish the two concepts.  (Or maybe they willfully confuse the two so as to justify funding for work on concurrency by appealing to the importance of parallelism.)

Given this state of affairs, I cannot explain what we are doing at Carnegie Mellon (to anyone who doesn’t already know) without first rebooting you.  Trust me, I’m just going to hit Ctrl-Alt-Delete, and then click Restart.  OK, we’re back.

The first thing to understand is parallelism has nothing to do with concurrency.  Concurrency is concerned with nondeterministic composition of programs (or their components).  Parallelism is concerned with asymptotic efficiency of programs with deterministic behavior.  Concurrency is all about managing the unmanageable: events arrive for reasons beyond our control, and we must respond to them.  A user clicks a mouse, the window manager must respond, even though the display is demanding attention.  Such situations are inherently nondeterministic, but we also employ pro forma nondeterminism in a deterministic setting by pretending that components signal events in an arbitrary order, and that we must respond to them as they arise.  Nondeterministic composition is a powerful program structuring idea.  Parallelism, on the other hand, is all about dependencies among the subcomputations of a deterministic computation.  The result is not in doubt, but there are many means of achieving it, some more efficient than others.  We wish to exploit those opportunities to our advantage.

Now I can hear you object, but isn’t concurrency required to implement parallelism?  Well, yes, it is, but concurrency is also required to implement sequentiality too!  The timing signal on your processor chip is essentially a synchronization mechanism with which to coordinate the otherwise independent activity of the components of the processor.  (Yes, I know that there are asynchronous processors, but this only makes my point stronger, you still need to control the concurrency.)  The point is that concurrency is not relevant to parallelism, even if the engineers who build our parallel computing platforms must deal with concurrency.  Another way to say the same thing is that parallelism is a useful abstraction, and abstractions should never be confused with their implementations.  (Numbers are not certain sets, but they can be implemented in terms of sets.  And so on.)

All well and good, parallelism is not concurrency, but what is it?  What is an appropriate model for parallel computation?  As in the sequential case, there are two answers, a machine-based approach and a language-based approach.  I am going to criticize the machine-based formulation, and argue for the language-based formulation on two grounds: (a) it is inherently more elegant and natural, and (b) it can be taught directly to students without any fuss or bother.  Much of what I have to say is derived from the seminal and beautiful work of Guy Blelloch, which I have absorbed over years of collaboration.  To the extent I’ve gotten it right, give Guy the credit; to the extent that I’ve gotten it wrong, assign me the blame.

The canonical machine-based model is the PRAM, or parallel RAM, which generalizes the classical RAM model to allow p>0 processors that share a common memory.  There are many variations on the PRAM, according to what sorts of interactions are allowed and how conflicts are resolved among the processors.  This is not important for present purposes, so I will not attempt to categorize the variations here.  The key idea is that the number, p, of processors is fixed in advance—results are stated in terms of how efficiently an algorithm executes on a p-processor PRAM as a function of the size, n, of the input.  This is well and good as a cost model for parallel execution, but it totally sucks as a programming model!  To write code for a PRAM, you fix the number of processors, and write the code for that number of processors (often the various cases are “analogous”, but in reality you must fix p and write code for that fixed p).  Moreover, you must (officially, anyway) write low-level imperative RAM code, which introduces spurious dependencies among sub-computations by explicitly scheduling the steps of the computation.  This is horrendous!  Of course typically one writes pidgin C, and sketches how it is compiled, but the fact is you’re still writing sequential, imperative code to implement a parallel algorithm.

As I’ve argued previously, what is needed is a language-based model of computation in which we assign costs to the steps of the program we actually write, not the one it (allegedly) compiles into.  Moreover, in the parallel setting we wish to think in terms of dependencies among computations, rather than the exact order in which they are to be executed.  This allows us to factor out the properties of the target platform, such as the number, p, of processing units available, and instead write the program in an intrinsically parallel manner, and let the compiler and run-time system (that is, the semantics of the language) sort out how to schedule it onto a parallel fabric.  Here’s a useful analogy.  Just as abstract languages allow us to think in terms of data structures such as trees or lists as forms of value and not bother about how to “schedule” the data structure into a sequence of words in memory, so a parallel language should allow us to think in terms of the dependencies among the phases of a large computation and not bother about how to schedule the workload onto processors.  Storage management is to abstract values as scheduling is to deterministic parallelism.

The paradigmatic example is provided by Quicksort, a naturally parallel algorithm.  Many people think that the “big idea” of Quicksort is the in-place swapping technique that Hoare invented in the original formulation.  I argue that this is a detail; the “big idea” is that it is parallelizable.  To see this, we must work at a reasonable level of abstraction: a sort function permutes a non-empty sequence of (comparable) elements to obtain another sequence in sorted order.

fun qs xs = if |xs|=1 then xs else let val (xsl, xsg) = split xs in concat (qs xsl, qs xsg)

That is, we split the sequence into two parts (by some means) such that everything in the left part is less than everything in the right part, then sort the parts independently (in parallel!), and merge the results together.  The exact complexity of Quicksort depends on the representation of sequences, and the complexity of the operations on them.  For now, let us just gloss over this to focus on the overall structure of the computation.  If we can ensure that the sequence is split roughly in half (by any of a number of means, including randomization), then the recursion is of logarithmic depth.  At each stage we do linear work, but the recursive calls are independent of one another.  The algorithm has a natural fork-join structure in which we (a) split the sequence into two parts, (b) in parallel sort the two parts, and (c) join the results together.  (The splitting itself can be done in parallel as well, but we’ll not bother about that here.)

Analysis of Quicksort splits into two parts, the work and the depth (or span).  The work is the overall number of steps required to perform the algorithm.  On a sequence of length n, we perform O(n log n) steps to sort the sequence, as might have been expected.  This is the sequential complexity, the amount of time it would take to run on a single processor.  To account for parallelism, there is a second measure, called the depth, that measures the critical path length of the computation, the length of the longest chain of dependencies among subcomputation.  No matter how much parallelism we may have available, we can never run faster than the depth; it represents the idealized parallel complexity of the algorithm, the amount of time it would take given any number of processors.  Depending on the choice of data structures, the depth of Quicksort is O(log^2 n), or thereabouts, which makes it quite parallelizable.  (Note that, as in our earlier discussion, a step of computation is a transition step in the operational semantics of the language we are using to express the algorithm.)

This is very nice and elegant, but now comes the really beautiful part, the concept of a provable implementation of the language.  (The word “provable” is only needed here because of the shabby state of the field; having a proof that your implementation works, and how well it works, remains an oddity to be emphasized, rather than an expectation to be fulfilled by any self-respecting computer scientist.)  The idea is that the language-based model can be implemented on a PRAM (with stated interconnect properties that I will suppress here) with a provable bound on the cost that takes account of the overhead of scheduling onto a p-processor fabric.  Thus, p need not be specified in advance, but is rather determined by the run-time system and compiler, which are provided by the proof of the theorem.

Theorem A computation with work w and depth d can be implemented in a p-processor PRAM in time O(max(w/p, d)).

(Sometimes the second factor is d log p, depending on the assumptions we are making about the target PRAM.)

This is called Brent’s Principle, and it states a fairly obvious fact.  First, we can never execute a depth d program in fewer than d steps.  The depth is the critical path length; this is the amount of time we must wait to execute the algorithm, regardless of any available parallelism.  Second, to the extent that parallelism arises from the dependency (rather, indepedency) structure of the computation, we can at best do p units of work at time, keeping all processors warm, until we are done.  And that’s it!

The effectiveness of the language-based model of parallelism lies entirely in its ability to expose the dependency structure of the computation by not introducing any dependencies that are not forced on us by the nature of the computation itself.  And the key to this is functional programming, which manifests itself here in the transformational approach to computation: sorting is conceived of as a mathematical function that transforms a given sequence into another sequence.  It does not destroy the given sequence any more than adding two numbers destroys those numbers!  Since Quicksort is a mathematical function, we need not worry that execution of qs xsl interferes with (depends on) qs xsg; we can readily run them in parallel without fear of untoward consequences.  The payoff is that there are many fewer dependencies among the subcomputations, and hence many more opportunities for parallelism that can be exploited, in accord with Brent’s Principle, when scheduling the work onto a parallel fabric.

The upshot of all this is that functional programming is of paramount importance for parallelism.  That’s (one reason) why we are teaching functional programming to freshmen at Carnegie Mellon.  And it’s the main reason why we are able to.  There’s nothing to it, just program like you do math, and the parallelism will take care of itself!

Update: Simon Marlowe made similar points to mine some time ago.


56 Responses to Parallelism Is Not Concurrency

  1. […] last thing I want to illustrate is a difference between parallelism and concurrency. This topic is well covered, and there is a great talk by Rob Pike on the subject. One of the #mustwatch videos, […]

  2. […] 最後に見せたいのは、並列化と並行性の違いです。これについては頻繁に議論の対象となっています。Rob Pikeによる素晴らしいプレゼンがあるので、ぜひ見てください。#mustwatch(必見)ビデオの1つです。 […]

  3. […] объясняющая отличия между concurrency и параллелизмом раскрыта многократно, в том числе и Робом Пайком в […]

  4. […] functional composition with first-class functions can express the depth in the parallelism by separating out the independent […]

  5. […] functional composition with first-class functions can express the depth in the parallelism by separating out the independent […]

  6. […] functional composition with first-class functions can express the depth in the parallelism by separating out the independent […]

  7. […] functional composition with first-class functions can express the depth in the parallelism by separating out the independent […]

  8. […] プログラミング言語の研究者たちが関数型言語を好むのには、十分な理由があります。例えばC言語のような命令型言語の場合、プロシージャのコード1行がプログラム全体に影響を与えてしまうこともありますが、関数型言語の場合はそういった副作用が大幅に制限されるため、正確性や安全性を導くのが容易になります。どういうことかと言うと、AのプロシージャとBのプロシージャの構成の正確性を分析する際には、AとBを分けて個別に分析できる利点があるということです。また研究者たちは、関数型プログラミングによる設計が並列処理の活用を容易にするということも度々論じています。 […]

  9. […] A and B can be broken down into analyzing A and analyzing B. Researchers have also frequently argued that functional program design makes it fundamentally easier to exploit […]

  10. Thanks for this post. It is powerful to be able to identify and express deterministic algorithms that are side effect free and can be automatically consumed by the compiler or processor and scaled almost as needed. The stumbling block I have is in how concurrent and parallel are defined here.

    I was happy to reboot myself and try something new, but when I applied the new ideas the pieces didn’t seem to fit. Though, I am happy to be enlightened (even if you just list a few links).

    If we take a step down the stack from the programming to processing and ultimately to the system we generally understand concurrent and parallel based on time. At the system level, when a person sees a number of bees buzzing around a hive, then tend to think of how they all appear to be doing things at the same time, rather than whether their activity is deterministic or not.

    A standard definition (I am sure you come across) for concurrent processing is where computations are executed in overlapping time periods. The definition for parallel is where operations are performed simultaneously. The key differences being ‘overlapping time periods’ vs ‘simultaneously’. Where concurrent processing includes all parallel, but parallel does not include all concurrent.

    The benefit of this definition is that it is very much how we think of the processing. Two CPU cores can process tasks in parallel (simultaneously). It is also useful to think of examples where concurrent processing is not parallel. An example being a single core that can swap between tasks giving the impression of multiple operations in overlapping time periods (which is how ‘things happening at the same time’ was originally achieved).

    We are in the business of doing things at the same time and in this post it seems that in the program (or algorithm) the time dimension has been swapped with determinism (I know there is more to it than that, but that seems to be the fundamental part). While I see value in differentiating deterministic programs (see last paragraph), I have a hard time adopting the new definitions applied to concurrent and parallel because: –

    1. It seems less intuitive to consume as a non-expert. It is easy to understand the concept of activities running at the same time and how that relates to concurrency and parallelism, less easy to understand how determinism relates (as an introduction).
    2. It seems more difficult to understand the natural evolution from sequential to single-processor multi-tasking to multiple processors purely in terms of determinism.
    3. It seems to be a disconnect with the natural time metaphor used for processing and systems to the point that concurrent (non-parallel) programs can run largely in parallel.

    So, my questions, if I may be so bold, are: –
    1. How do you continue this determinism/non-determinism metaphor from the algorithm / program level to the processor and system without the confusion that a concurrent program (by the determinism definition) can have multiple threads of execution active and running in parallel (on separate processors) much of the time.
    2. How would you describe the evolution of concurrent and parallel systems using the metaphor used in 1.
    3. How would you describe a program that provides the illusion of running multiple operation at the same time, but all that it is doing is time-slicing (e.g. swapping between timers).

    Please note, I do think that it is valuable to differentiate deterministic side-effect free programs from their badly behaved (but still very useful) non-deterministic lock-ridden counterparts. However, I see that as a refinement of the definition of parallel. This would stop confusion at the grey areas of concurrent vs parallel and elevate this deterministic class of program from mere parallel to ‘deterministic parallel’ (or something like that). This would also keep the time metaphor so that non-deterministic programs that are processed largely in parallel could still be considered parallel. By this definition, a concurrent, but non-parallel program would be one that time sliced/swapped to give the impression of multiple tasks happening at the same time (as many Javascript programs do).

    • The short response is to suggest that you read my PFPL. We are largely in agreement. In my view concurrency is about non-deterministic composition of programs, and has zero to do with parallelism. Because of non-determinism, reasoning about such programs requires thinking about all possible choices that can be made among the composed computations. This is very difficult, as we all know. It makes no difference whether concurrency is implemented on one processor or a thousand, the crucial point is non-deterministic composition, period. Parallelism, on the other hand, is all about making full use of computational resources to make your programs run faster. It’s all about efficiency, and not at all about correctness or composition. It just says “do these two things at once”, period. The cost semantics supports asymptotic analysis of how well you are doing, given unbounded parallel resources. The provable implementation maps that to concrete platforms with specific limitations, such as the number of processors.

      An instance of the problem of reasoning about concurrent programs (those built from non-deterministic composition) is that it is very hard or impossible to make precise statements about their computational complexity, in either time or space. This is the antithesis of parallelism! The restriction to determinism is to ensure that there is no question of correctness introduced by using parallelism (debug sequentially, run in parallel), and that it is possible to make precise statements about their efficiency. Combining parallelism with concurrency is deadly for both enterprises, and is not to be recommended. If only more people understood this fundamental point, lots of wasted effort would not be wasted.

      Incidentally, another consequence of non-determinism making it hard to get provable efficiency bounds on programs is that Haskell sucks. It is nearly impossible to make accurate time or, especially, space predictions about Haskell programs. That’s why there’s so much emphasis on space profiling, as opposed to space analysis. Why is Haskell non-deterministic? Because it is lazy, and, as Gilles Kahn, the father of laziness, said many decades ago: laziness is all about concurrency.

    • As you say, we are largely in agreement, its just the use of the terms that I have difficulty with.

      From being in the trenches of multi-threaded programming I quickly learnt that giving up some attempts at parallelism in favour of increased determinism and stability was a wise trade-off given the commonly used programming languages and techniques at the time. Especially when developing a business-critical single-point-of-failure server (as we used to enjoy doing). I generally kept my threads locked in a cage, fed them queues and never let them meet face to face. With the deterministic side-effect free approach I can relax this.

      A lot of the new developments (which are probably quite old by now) that have entered the mainstream such as being aware of the relationship between side-effects and concurrency, STM and even async seem to be steps in the right direction (even if they are just band aids). Hopefully, Haskell can be salvaged or there is another language in the pipeline that gets it all right.

      Food for thought, thanks. I will look out for your PFPL.

    • dmbarbour says:

      It bothers me that so many people conflate concurrency with non-determinism. These are orthogonal properties. We can have a non-deterministic, fully sequential computation simply by asking for a random choice at some steps. Conversely, we can have deterministic, concurrent models – e.g. Conway’s Game of Life is Turing complete and formally models all cells as updating concurrently (at the same logical time). Temporal logic and event calculus are two more examples of deterministic concurrency.

      Concurrency is a property of the model or problem domain. Parallelism is a property of the implementation – e.g. whether we use more than one processor. This is the best distinction I’ve found. Not every concurrent model offers effective opportunity for parallelism, and not every opportunity for parallel implementation involves concurrency. All three concepts featured here – non-determinism, concurrency, and parallelism – are entirely distinct and orthogonal.

      Unfortunately, multi-threaded imperative programming makes it difficult to consider these concepts separately. I suppose this is why so many people conflate them.

    • Well, please see PFPL and agree to disagree.

    • dmbarbour says:

      PFPL describes only one model for concurrency, plus an adaptation of this model to an Algol. Are you attempting to generalize from a single example of concurrency? That seems rather premature.

      “All cars are red; look at my book, it has an example of a red car!” s/car/concurrency s/red/non-deterministic

    • I am using the pi calculus, a general model of concurrency, as a basis for what concurrency means.

    • I don’t agree with you, as I I’ve explained here and in PFPL.

    • dmbarbour says:

      Where do you explain your disagreement in PFPL? PFPL describes one general model of concurrency (Milner’s process calculus) but, as far as I can tell, PFPL does not present an argument that every model of concurrency must exhibit non-determinism.

      Could you point me to a place where you clearly argue your opinion?

    • it’s implicit in the decomposition of the pi calculus into non-deterministic composition, plus a bunch of stuff, such as channels, that have not much to do with concurrency (they are assignables, essentially).

    • dmbarbour says:

      Decomposing pi calculus to find the operator of concurrency seems an odd sort of reductionism, rather like trying to tear apart a bird to identify a flight bone.

      I do believe the elements you’re dismissing – processes and channels – are the relevant ones. It isn’t that we need processes and channels specifically, but rather that they play the necessary roles for a concurrent system: independent definition of subsystems, and simultaneous interactions between subsystems.

    • In my experience, most concurrency models allow you to implement one in another in practice. A proof of their equivalence is much harder, but you can often get the same effect from one in another model.

      The Pi calculus is, to me, general in same sense that Lambda-calc is (The Turing Machine being an important but largely uninteresting detour). You would never program in Pi, but it encapsulates all of “what is a concurrent action” that it is excellent as a vehicle of exposition.

      The key is that I can describe a random process in the Pi calc as one hooked up to a radioactive decay source pushing bits out on a channel.

      And I can probably describe Game of Life by modeling each cell as a process. For simplicity, I could use the polyadic Pi calc which is equivalent to the monadic Pi calc to describe this model.

      In that view, the two models are not that different from each other. It all depends on what level of abstraction I work on.

    • dmbarbour says:

      “The Pi calculus is, to me, general in same sense that Lambda-calc is”

      It’s certainly ‘general’, but that seems logically irrelevant to the present argument.

      Consider: There are many ‘general’ models of computation – Lambda calculus, Turing machines, non-deterministic Turing machines, Actors model, cellular automata, etc.. – and it happens that some of these happen to exhibit concurrency and/or non-determinism. But that doesn’t imply non-determinism and concurrency are essential for computation.

      To say: “Pi calculus is a general model of concurrency and exhibits non-determinism, therefore all models of concurrency should exhibit non-determinism” is directly analogous to the false argument: “Lambda calculus is a general model of computation and exhibits substitution semantics, therefore all models of concurrency should exhibit substitution semantics”. I don’t know the name for this fallacy, but the formal structure seems invalid.

      So, either I’m badly misreading Professor Harper’s position, or he isn’t explaining it well… or his argument for his position isn’t logically valid.

      Of course, definitions aren’t based in logical validity. They’re based in history, and operational utility.

      Given that we have models exhibiting non-determinism without concurrency (such as NDTMs, or non-deterministic oracle machines) AND we have models exhibiting what most people would consider interactive concurrency without non-determinism (such as cellular automata, event calculus, temporal logics, synchronous reactive programming, and functional reactive programming), it seems ridiculous to me that we insist on conflating these. It’s simply more useful to address concurrency and non-determinism as orthogonal properties.

    • define orthogonal. it’s a pretense word that has no meaning.

    • dmbarbour says:

      Here, I use the word ‘orthogonal’ to describe variables that may vary independently, even if they don’t in practice.

      For example: the color of a car is orthogonal to the shape of the car, even though some models of car have a high correlation with certain colors.

      When I say that concurrency is orthogonal to non-determinism, that perhaps doesn’t make much sense to you – because you are attempting to *define* concurrency as meaning non-deterministic composition. (Even though we already have the word “non-determinism” for that purpose.) Unfortunately, your definition isn’t compatible even with the English definition of concurrency (which is a rather low bar to meet!). A glance at offers: “cooperation, as of agents or causes; combined action or effort” or “simultaneous occurrence; coincidence: the concurrence of several unusual events”

      I define concurrency as describing any model of independent but interacting subsystems, of which multi-agent systems are the prototypical example. The most important formal parts are the ‘independent’ (we can define these subsystems separately) and ‘interacting’ (we cannot explain the local behavior of a subsystem independently of the greater system). This fits the English definition, and includes common examples like multi-agent systems or pi calculus (where processes may be independently specified and interact).

  11. “let val (xsl, xsg) = split xs”

    That has linear running time, so your parallel algorithm takes N+N/2 +N/4+…=2N. Which is a log N improvement over the sequential time. That’s kinda bad. I think I’ll stick with my imperative program which reduced the running time *to* log N, not *by*.

    • This can also be done in lg time in functional languages; just don’t use a list. Nothing whatever to do with imperative.

    • So why don’t you give that code? I suspect it looks far less elegant than what you’re displaying here.

    • You can represent a sequence as a tree for which surgery takes logarithmic time. And, as a bonus, it does it with a persistent, rather than a more restrictive ephemeral, data structure, and hence is readily used in a parallel setting. Can’t beat it.

    • vicdiesel says:

      Ever heard of caches? TLB? One of the reasons that functional languages will never be relevant in practice is that their practitioners have no concept of efficiency other than asymptotic. In the real world proportionality constants actually matter.

    • Funnily enough, Guy Blelloch and I wrote a paper on just this topic published at POPL 2013, called “Cache and I/O Efficient Functional Algorithms”. I suggest you have a look at that, and at the growing body of work on analyzing the efficiency of functional programs. My PFPL has a chapter on writing efficient parallel programs using a functional model; it’s based on Guy’s work on using FL’s for efficient parallelism. Manuel Chakravarty and Gabriele Keller, among others, have done a lot of work in this area over the last decade or two.

    • vicdiesel says:

      I’ll look for your paper. Just to motivate me: what percentage of peak performance do you get on matrix-matrix multiplication? Do you do better than the Fortran reference?

    • Undoubtedly not on dense matrix multiplication, but I don’t have figures to hand. But that’s hardly the point. The algorithms that people wish to consider these days are far more sophisticated than anyone could stand to program in Fortran. Surely you appreciate this, since you seem to program in Fortran rather than machine code, or dedicated hardware built from transistors specifically for dense matrix multiplication. And it is far easier to take advantage of parallelism in a functional language than in any imperative language, including Fortran. Vectorization is a joke, and HPF was a total failure, I wonder why?

    • vicdiesel says:

      HPF was dead on arrival because compiler writers don’t understand caches and the cost of memory copies. Its performance was a fraction of what the Fortran programmers were used to, so they never considered it. That had nothing to do with being imperative. It was in fact closer to your abstract ideal than plain Fortran. (See where I’m coming from? Most attempts at abstraction suck at performance.)

      I don’t understand your “vectorization is a joke” comment. On modern processors, if you don’t vectorize you don’t get performance. Does your functional code generate AVX instructions? I suspect not. Your remarks about representing a list as a tree will also work beautifully in the face of the reality of cachelines. Not.

      Look, I like abstractions. I just think that the functional abstraction (for as far as I’ve seen it) ignores the reality of modern architectures.

      Oh, and about taking advantage of parallelism, how about I give you a cluster with a couple thousand nodes, how well will you fare?

    • Chakravarty is the best person to ask about current performance figures. By “vectorization” I mean trying to turn DO loops into vector operations; it’s an artifact of one-by-one processing. Compilers for parallel platforms make full use of available parallel instructions sets, including coprocessors.

    • vicdiesel says:

      “Compilers for parallel platforms make full use of available parallel instructions sets, including coprocessors.” You wish. So why are we still unrolling loops so that the compiler will finally do its damn job?

    • I meant compilers for functional languages do so. There are no loops to unroll, that’s the whole point.

    • vicdiesel says:

      But a few comments back you were proposing to implement a list as a tree to make splitting linear. How are you going to apply vector instructions to that? Never mind that you’re probably losing 7/8ths of your bandwidth because of your tree organization.

      Really, do you know what the stream benchmark is? If you imlemented that with one of your tree-sequences, can you get more than, oh, say, 5 percent of the stream performance? The code is simple, shouldn’t take more than 10 minutes to write.

      Report back to me…..

  12. […] adding lamda expressions to the language, Robert Harper from Carnegie Mellon University argues that functional programming can hide concurrency complexities from the programmer and allow the runtime environment to take care of them. This seems like the […]

  13. johneganz says:

    Now I can hear you object, but isn’t concurrency required to implement parallelism?

    Nope. My first objections is “but isn’t parallelism required to implement concurrency?”

    Parallelism, on the other hand, is all about dependencies among the subcomputations of a deterministic computation. The result is not in doubt, but there are many means of achieving it, some more efficient than others. We wish to exploit those opportunities to our advantage.

    Concurrency, on the other hand (at least as you have defined it in this post), is really nothing more than the dynamic resolution at run time of scheduling which of the all the possibly non-determinsitic computations that could possibly be executing at that moment in time in such a way that they can execute in parallel so that the result is deterministic.

    In fact, I don’t even agree with your definitions of the terms “concurrency” and “parallel”. It is clear that a “dictionary” definition of the terms, at least in the context in which they are being used, both mean “at the same time.”

    You have attempted to add additional qualifiers to their usage to further specialize their meaning. IMHO, and this is certainly a point that reasonable people can disagree on, the additional qualifiers you have applied to the terms are not in common usage in the community that would use these terms for the purposes being discussed. Personally, I have never see the words used in a CS context in the way that you have described, nor do I see how your specialization of the terms adds to their descriptive power.

    Assuming that the two terms mean something different, fundamentally both “concurrent” and “parallel” computations must produce results that are deterministic. Often when dealing with concurrent/parallel programming on typical hardware, the amount of time it takes to perform the computation can be non-deterministic, but it is always a deterministic result.

    In practice, on the vast majority of commercially available computers, concurrent or parallel programming must eventually be reduced to some atomic primitives provided by the underlying architecture. For a LL/SC based architecture, the amount of time it takes to perform an atomic compare and swap is non-deterministic. This means that in practice, on most LL/SC based architectures, all parallel operations will most likely reduce to a primitive that takes a non-deterministic amount of time to complete.

    It is my opinion that “concurrent” and “parallel” mean exactly the same thing, and that the distinction you assert exists is not one that it shared by the larger community.

    I also find your definition somewhat vague and confusing- it uses the terms “deterministic” and “non-deterministic” to differentiate concurrency and parallelism such that

    Parallelism, on the other hand, is all about dependencies among the subcomputations of a deterministic computation.

    What does this mean? It would seem to imply that concurrent computations are “non-deterministic”, but non-determinstic how? The result they compute? Or the amount of time? Clearly we are only interested in computations that produce deterministic results, which leaves only the amount of time it takes to produce those deterministic results. The only conclusion is that “Concurrency takes a non-deterministic amount of time, and parallelism takes a deterministic amount of time.” But on most commercially available computers, truly deterministic, bound time parallel computations are simply not possible because the architecture does not provide atomic primitives that are guaranteed to complete in a bounded number of steps. Clearly this reasoning about your definition of the terms is in error because it implies that from a practical point of view, parallelism doesn’t exist, only concurrency.

    • johnzabroski says:


      Concurrency is about externally visible behavior. Parallelism adds no expressive power to a language. The time a parallel operation takes to execute is whatever time it starts until it produces output. In mechanized solutions, model of parallelism is really a property derived from the language semantics itself – for example, programs defined in terms of absolutely non-circular attribute grammars can then have a model of parallelism that shreds the program graph by leveraging the non-circularity property. The results can also be memoized and in turn used for incremental rather than strictly parallel computation. A given model tells you how much performance you can squeeze out from it.

      Also, there exists a form of concurrent programming, deterministic concurrency, that has no race conditions, is as easy to program as sequential programs, and can implicitly exploit parallel processors: dataflow programming. It is the only solution to shared memory concurrency that scales, and is arguably the most appropriate default solution for taking advantage of multi-core processors.

      But there is more to concurrency research than multi-core processors. The Internet’s infrastructure has grown massively, and it is approaching its ultimate bottleneck – the speed of light! For most applications, the speed of light is embarassingly adequate, but roundtrip communication around the globe faces real communication issues, such as propagation delay, and therefore communication protocol design becomes the dominating factor in performance. Translation: The better we do this, the cooler the Cloud apps we can build that end-users get really excited about.

      Finally, there is no reason why nondeterministic composition means composition in the sense of the Kahn Principle. True, real world concurrency (as in, what is implementd in programs today) might have meager or ambiguous agents, resulting in the input-output relation not matching the history relation of the program. We might be nondetereministically composing problems, and doing it in dumb ways. The point behind teaching concurrency is to teach how to do it effectively for real world problems, and avoid traps proven to exist, thanks to good theory – like the Brock-Ackerman Anomaly. You really, really don’t want to build systems that pervasively exhibit the Brock-Ackerman Anomaly, if at all. That’s because determinism is one of the most important properties you can shoot for in designing systems; you only give it up when you have to, not simply “cuz”.

    • bowaggoner says:

      As I understand it, the author views parallelism as an algorithmic or complexity-oriented concept. You can take any algorithm and analyze in terms of its efficiency in terms of parallelism. To do so requires no knowledge of the machine on which the algorithm runs. This process is “deterministic” in the sense that the orders of steps are clearly defined. “First the list is split in half, then the two halves are sorted in parallel.” “Given k processors working in parallel, the running time of this algorithm is O(k log k / n).”

      On the other hand, concurrency is taken to mean the methods and steps required to actually implement parallel algorithms. These include mutexes and semaphores and all of the above. These are obviously machine-dependent and “non-deterministic” in the sense that the order and timing of steps are not clearly defined or known ahead of time. “Thread 2 waits until the lock is released by some other thread, then acquires the lock and waits on a signal….”

  14. […] for first-year students at Carnegie Mellon.  Apart from parallelism, wbich I’ve already discussed, we are placing a strong emphasis on verification and proof as tools for the practicing programmer, […]

  15. schmerg says:

    I wrote a brief (some would say cryptic) note of some experimentation I did in this area – particularly a functional language based approach to achieve parallel execution of large computations without the need to address concurrency in any way in the expression of the problem.

    My particular bugbear concerned the composability of such systems – assembling computations from libraries of code and the like to run across large computation grids of multicore machines.

    In particular the language assembles the program so that it can executed in the minimum depth (as you describe) when given sufficient resources, but it can also balance any arbitrary problem across fewer resources and determine an optimal minimum depth for execution, and all this is done without changing the source code that expresses the problem.

  16. marklillibridge says:

    Do you count a memorization as a functional programming primitive? Without it, some algorithms will be exponentially slower.

    I don’t think you can implement memorization without concurrency.

    • abstract type says:

      If state is the same as concurrency, you’re right, but ordinarily people don’t think of it that way. You can have memoization or not according to your selection of types. There is no reason to force it on you, or deprive you of it.

  17. […] Parallelism is not concurrency In an earlier post I mentioned that one goal of the new introductory curriculum at Carnegie Mellon is to teach […] […]

  18. […] Parallelism is not concurrency « Existential Type Useful read RT: @tunixman Parallelism is not concurrency « Existential Type (tags: […]

  19. dmbarbour says:

    There are many models for deterministic concurrency: temporal logic programming, functional reactive programming, future passing with single assignment variables (promoted heavily by Peter Van Roy), Kahn process networks, concurrent constraint programming. I do not agree that concurrency is inherently coupled to non-determinism.

    I do distinguish concurrency and parallelism: concurrency is a semantic property that says many behaviors are happening at the same time, and parallelism is an implementation property that says many calculations are happening at the same time. Concurrency can be implemented without parallelism, and parallelism without concurrency.

    Concurrency is sometimes an good opportunity for parallelism. And high-performance parallelism is often easier to manage if you control it semantically (i.e. with concurrency) rather than as a mere implementation/optimization detail.

    • rafaeldff says:

      I believe you are conflating nondeterminism with ‘observable non-determinism’. Since you mentioned PvR, check out CTM section 4.1 (page 238).

      Bearing this distinction in mind, dataflow concurrency and concurrent constraint programming both involve nondeterminism.

    • dmbarbour says:

      The relevant issue is whether nondeterminism exists in the semantics. PvR was working with Oz semantics, so he was stuck in the rather precarious position of explaining ‘yes, Oz threads can be indeterministic, but this here model is deterministic’. If you extracted just the model in question into its own language, though, there would be no question that the semantics are deterministic.

      Concurrent constraint programming is often extended with nondeterminism of a very observable form: choice, backtracking.

  20. yminsky says:

    I think I understand the argument you’re making, but I would argue it’s not as cut-and-dried as you suggest.

    It’s true that concurrency is primarily about non-determinism, and there exist useful deterministic approaches to parallelism for which concurrency is not terribly relevant. But efficient parallel systems are almost always non-deterministic, and I think this is no accident. Building efficient deterministic parallelism is hard, and I believe it’s not actually possible for the general case, because it requires too much of an understanding in advance of the costs of the different parts of a computation than is reasonable to expect.

    So, while I think distinguishing concurrency and parallelism is important, and that a deterministic approach to parallelism is a great pedagogical place to start, I remain wholly unconvinced that concurrency is not relevant to parallelism.

    • jlouisa says:

      Let me counter your argument with the following: You surely know Google’s Sawzall language for map-reduce queries. Where in map-reduce do you deploy concurrency to solve your problem? Where is the non-deterministic part? I think the answer to those questions are “nowhere”.

      Another, more machine-level observation is the distinction between a GPU (mostly SIMD executing) and a SMP CPU (mostly MIMD executing). The SIMD approach is massively parallel, yet it stands no chance of ever doing concurrent work since all cores are step-locked in choreography to execute the same instruction.

      Note that MIMD is much more applicable to a heavily concurrent setting. If we have 30000 individual concurrent processes, we can as well map them to as many physical cores we have and gamble that spinning the cores will yield at least some performance benefit. This doesn’t work in a SIMD setting since the c. processes might be executing wildly different tasks.

    • abstract type says:

      Sawzall, and its derivatives, well illustrate my point.

  21. marchamann says:

    I am a “language-based model of computation” fellow traveler, and I agree that functional programming is a great basis for parallelism.

    However, I have to disagree with your characterization that parallelism and concurrency having nothing to do with each other.

    While your rationale is much stronger than some of the other merely pedantic presentations of the same idea I’ve seen elsewhere, I think that parallelism and concurrency are at the very least dual to each other, perhaps even the same phenomenon looked at from two different ends.

    Parallelism happens when you start out thinking you have a single “atomic” computation, but then realize that you really want to break it up into independent sub-computations. You solve this problem, as you point out, by understanding where your real dependencies are, and where the perceived dependencies are merely artifacts of fuzzy thinking and sloppy organization.

    As you also point out, functional programming is a great tool for helping to clarify these issues.

    Concurrency, on the other hand, happens when you start out thinking you have two completely independent computations, but it turns out that they share dependencies, intentional and necessary or not. You solve this problem by understanding where your real dependencies are, and where they are merely artifacts of fuzzy thinking and sloppy organization, such as the gratuitous use of shared state.

    Here too, functional programming is a great tool to clarify which is which.

    I guess ultimately my objection is that, by separating the two phenomena, you are being insufficiently bold in your claims for FP! ;-)

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: