Bellman on “Dynamic Programming”

April 21, 2014

Everyone who has studied algorithms has wondered “why the hell is Bellman’s memorization technique called dynamic programming?”.  I recently learned the answer from my colleague, Guy Blelloch, who dug up the explanation from Richard Bellman himself:

“I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes.

“An interesting question is, ‘Where did the name, dynamic programming, come from?’ The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various rea- sons. I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities” (p. 159).

As with algorithms, so too with dynamic languages?

Update: why is it called “memoization” and not “memorization”?

Update: rewrite of the commentary.

Parallelism and Concurrency, Revisited

April 9, 2014

I still get compliments on and criticisms of my post from three years ago (can it possibly be that long?) on parallelism and concurrency.  In that post I offered a “top down” argument to the effect that these are different abstractions with different goals: parallelism is about exploiting computational resources to maximize efficiency, concurrency is about non-deterministic composition of components in a system.  Parallelism never introduces bugs (the semantics is identical to the sequential execution), but concurrency could be said to be the mother lode of all bugs (the semantics of a component changes drastically, without careful provision, when composed concurrently with other components).  From this point of view the two concepts aren’t comparable, yet relatively few people seem to accept the distinction, or, even if they do, do not accept the terminology.

Here I’m going to try a possible explanation of why the two concepts, which seem separable to me, may seem inseparable to others.

I think that it is to do with scheduling.

One view of parallelism is that it’s just talk for concurrency, because all you do when you’re programming in parallel is fork off some threads, and then do something with their results when they’re done.  I’ve previously argued that parallelism is about cost, but let’s leave that aside.  It’s unarguable that a parallel computation does consist of a bunch of, well, parallel computations, and so it is about concurrency.  I’ve previously argued that that’s not a good way to think about concurrency either, but let’s leave that aside as well.  So, the story goes, concurrency and parallelism are synonymous, and people like me are just creating confusion.

Perhaps that is true, but here’s why it may not be a good idea to think of parallelism this way.  Scheduling as you learned about it in OS class (for example) is a altogether different than scheduling for parallelism.  There are two aspects of OS-like scheduling that I think are relevant here.  First, it is non-deterministic, and second, it is competitive.  Non-deterministic, because you have little or no control over what runs when or for how long.  A beast like the Linux scheduler is controlled by a zillion “voodoo parameters” (a turn of phrase borrowed from my queueing theory colleague, Mor Harchol-Balter), and who the hell knows what is going to happen to your poor threads once they’re in its clutches.  Second, and more importantly, an OS-like scheduler is allocating resources competitively.  You’ve got your threads, I’ve got my threads, and we both want ours to get run as soon as possible.  We’ll even pay for the privilege (priorities) if necessary.  The scheduler, and the queueing theory behind it is designed to optimize resource usage on a competitive basis, taking account of quality of service guarantees purchased by the participants.  It does not matter whether there is one processor or one thousand processors, the schedule is unpredictable.  That’s what makes concurrent programming hard: you have to program against all possible schedules.  And that’s why it’s hard to prove much about the time or space complexity of your program when it’s implemented concurrently.

Parallel scheduling is a whole ‘nother ball of wax.  It is (usually, but not necessarily) deterministic, so that you can prove bounds on its efficiency (Brent-type theorems, as discussed in a previous post and in PFPL).  And, more importantly, it is cooperative in the sense that all threads are working together for the same computation towards the same ends.  The threads are scheduled so as to get the job (there’s only one) done as quickly and as efficiently as possible.  Deterministic schedulers for parallelism are the most common, because they are the easiest to analyze with respect to their time and space bounds.  Greedy schedulers, which guarantee to maximize use of available processors, never leaving any idle when there is work to be done, form an important class for which the simple form of Brent’s Theorem is obvious.

Many deterministic greedy scheduling algorithms are known, of which I will mention p-DFS and p-BFS, which do p-at-a-time depth- and breadth-first search of the dependency graph, and various forms of work-stealing schedulers, pioneered by Charles Leiserson at MIT.  (Incidentally, if you don’t already know what p-DFS or p-BFS are, I’ll warn you that they are a little trickier than they sound.  In particular p-DFS uses a data structure that is sort of like a stack but is not a stack.)  These differ significantly in their time bounds (for example, work stealing usually involves expectation over a random variable, whereas the depth- and breadth-first traversals do not), and differ dramatically in their space complexity.  For example, p-BFS is absolutely dreadful in its space complexity.  (For a full discussion of these issues in parallel scheduling, I recommend Dan Spoonhower’s PhD Dissertation.  His semantic profiling diagrams are amazingly beautiful and informative!)

So here’s the thing: when you’re programming in parallel, you don’t just throw some threads at some non-deterministic competitive scheduler.  Rather, you generate an implicit dependency graph that a cooperative scheduler uses to maximize efficiency, end-to-end.  At the high level you do an asymptotic cost analysis without considering platform parameters such as the number of processors or the nature of the interconnect.  At the low level the implementation has to validate that cost analysis by using clever techniques to ensure that, once the platform parameters are known, maximum use is made of the computational resources to get your job done for you as fast as possible.  Not only are there no bugs introduced by the mere fact of being scheduled in parallel, but even better, you can prove a theorem that tells you how fast your program is going to run on a real platform.  Now how cool is that?

[Update: word-smithing.]

[Update: more word-smithing for clarity and concision.]