Web site sml-family.org Up and Running!

After far too long, and far too many obstacles to be overcome, Dave MacQueen, Lars Bergstrom, and I have finally prepared an open-source site for the entire family of languages derived from Standard ML.  The defining characteristic of Standard ML has always been that it has a rigorous definition, so that it is always clear what is a valid program and how it should behave.  And indeed we have seven different working compilers, all of which are compatible with each other, with the exception of some corner cases arising from known mistakes in the definition.  Moreover, there are several active projects developing new variations on the language, and it would be good to maintain the principle that such extensions be precisely defined.

To this end the sources of the 1990 and 1997 versions of the definition are on the web site, with the permission of MIT Press, as is the type-theoretic definition formulated by Stone and H., which was subsequently used as the basis for a complete machine-checked proof of type safety for the entire language done by Crary, Lee, and H.  It is be hoped that the errors in the definition (many are known, we provide links to the extensive lists provided by Kahrs and Rossberg in separate investigations) may now be corrected.  Anyone is free to propose an alteration to be merged into the main branch, which is called “SML, The Living Language” and also known as “Successor ML”.  One may think of this as a kind of “third edition” of the definition, but one that is in continual revision by the community.  Computer languages, like natural languages, belong to us all collectively, and we all contribute to their evolution.

Everyone is encouraged to create forks for experimental designs or new languages that enrich, extend, or significantly alter the semantics of the language.  The main branch will be for generally accepted corrections, modifications, and extensions, but it is to be expected that completely separate lines of development will also emerge.

The web site, sml-family.org is up and running, and will be announced in various likely places very soon.

Update: We have heard that some people get a “parked page” error from GoDaddy when accessing sml-family.org.  It appears to be a DNS propagation problem.

Update: The DNS problems have been resolved, and I believe that the web site is stably available now as linked above.

Update: Word smithing for clarity.


20 Responses to Web site sml-family.org Up and Running!

  1. Brian Adkins says:

    The SMLserver link (at the bottom of the home page in the “Other relevant links” section) consistently times out for me. I’ve tried on several different days.

  2. Where is the proof of type safety by Crary and Lee, and what definition of type safety do they use?

    • The soundness proof was carried out by Karl Crary, Daniel Lee, and me, based on foundational work by Mark Lillibridge, Chris Stone, Derek Dreyer, Karl Crary and me. A previous effort to verify the safety of Standard ML based on the Definition was not successful because there were too many mistakes and imprecisions in the formulation for that to be possible. Among many technical issues, the biggest problem is that the original definition attempted to defined the dynamics on the same external language as the statics, but this is not feasible for several reasons, the principal one being modules (signature matching has coercive effect). To achieve safety required a re-formulation of the definition as an elaboration from the External Language (the abstract syntax of Standard ML as we know it) into an Internal Language (a well-behaved typed lambda calculus derived by Chris Stone and me, and based on yet-earlier work by Mark Lillibridge and me). The idea is that elaboration produces the “true program” that you wrote in short-hand (e.g., using type inference and implicit signature coercion), which is then written in a type-safe language to which the dynamics is given. The Type-Theoretic Definition of Standard ML by Chris Stone and me was published in the Milner volume; this was the critical step in making the verification of safety possible.

      Taking advantage of work by Derek Dreyer, Karl Crary, and I on the type-theoretic semantics of modules, Daniel Lee, Karl Crary, and I developed a new Internal Language, a variant of Chris Stone’s and mine, and then proved this language to be type-safe using Twelf; this was reported at POPL 2007. Subsequently, Karl Crary and I, working one day per week for a couple of hours, pair-programmed the elaborator and the proof of well-typedness of well-elaborated code, again using Twelf. This was a matter of a semester’s work, one day per week, to finish off. Combining the Crary-Lee-Harper work on the type safety of the internal language with Karl’s and my proof of well-typedness for well-elaborated code, we obtained the first full mechanized proof of safety of a full-scale programming language. Given what we know now, I don’t think anyone would use any strategy but ours. Because of fashion, probably a lot of people would try to use Coq to do it, but I would not recommend it: for this sort of thing Twelf is the superior tool. (This was demonstrated in head-to-head competition in the POPLMark Challenge, which was supposed to determine which was the superior tool for mechanized metatheory of PL’s. We won handily, in two weeks, whereas the first partial Coq solution came in six months later, and no other full solutions were ever given, afaik.)

    • Oh, I forgot to mention that our definition of type safety is the standard one, called progress and preservation. The IL is equipped with a transition system with two properties: (1) if a state is well-typed and you make a transition, then the result state is well-typed, and (2) a well-typed state is either complete (represents an answer, or indicates a checked error), or can make another transition. The safety of the EL is defined as the well-typedness of a well-elaborated program. There is nothing to guarantee that every program you expect to elaborate will in fact do so; this is an empirical matter about how one expects type inference to behave, and so forth. The more the elaborator does for you automatically, the harder it is (usually) to say precisely when it is going to succeed. ML is particularly predictable in this respect, because of the elegant type inference mechanism, and relatively few cute tricks that make certain examples work out well but never scale in practice (in contrast to other languages I can think of). Although we never proved it, I believe it to be true that anything that would elaborate according to the definition (modulo some clear mistakes) would also elaborate in our formulation. But, truthfully, I never saw the point, because who says that the elaboration rules of the original definition are adequate in this sense? And relative to what? At some point you just say that the elaborator defines what the EL is, and rely on the readability of the elaboration specification to make clear what are the well-formed programs. I should add that in no case are we ever worse off than in the usual situation in which a language is “defined” by whatever it is some compiler is doing this week (and good luck understanding what that might be if you hit a weird case).

    • Daniel R. Grayson says:

      Thank you for your answers. Suppose we have the following C program:

      main () { int i; printf(“%d”,i); }

      Is that an example of a well-typed well-elaborated program? If so, is the unknown initial value of i part of the elaboration?

    • I wouldn’t know, because I have no idea what is the current “semantics” for C. According to me a C program is a mapping from 2^64 64-bit words to the same, so your program is basically a projection function. Garbage-in, garbage-out.

    • btw, it may help to say that in my view the only unsafe feature of c is casting as a function pointer, because doing so commits to a representation of programs as data. you really cannot make sense of this without bizarre contortions that no one would ever want to use, and which would never be portable.

    • Daniel R. Grayson says:

      Re: “According to me a C program is a mapping from 2^64 64-bit words to the same, so your program is basically a projection function.”

      Hmm — you assert that even though no definition of the language specifies the address of the variable i in the sample program I posted, nor whether a value is stored there before the main routine is entered? I would have thought the strongest statement that could be true in this direction would be that some compilers produce object code that amounts to projection functions, but that different compilers may validly produce different projection functions.

      If type safety doesn’t include predictability of object code behavior, then that seems to make the concept of type safety less useful for its traditional application of asserting that the code may safely be run on a server. After all, the value stored in the location addressed by “i” might be my social security number or my password.

    • none of this has any bearing on our achievement of proving safety of ml. but anyway, because c let’s you form &i, then every identifier has to have an address. i didn’t specify the assignment, but presumably some such info has to be attached to the memory on which the c code acts.

    • Daniel R. Grayson says:

      Thank you for your responses.

      Are there any consequences of your proof of safety of ML for predictability of behavior of object code? For example, does it follow that any two correct compilers will produce identical behavior for programs with the same elaboration? … for programs with the same ML source code?

    • yes, up to how one would model i/o channeks.

    • Daniel R. Grayson says:

      Thanks, that’s a good thing, then. If it’s generally true that proofs of type safety of programming languages imply that object code behavior is independent of the (correct) implementation of the compiler, then C cannot be proved to be type safe, simply because the language definition leaves some behaviors undefined (as illustrated by the sample program above).

      I wonder whether the following approximation to the definition of type safety, more understandable by lay people, is a useful one in your opinion: a programming language is a “safe” one if the behavior of a program is completely determined by the published semantics of the language, the source code of the program, and the input.

    • Yes, that’s a good criterion that underlies the technical definition of safety. Your suggestion might then be used as saying when a compiler is faithful to that semantics. So safety would be a semantics issue (self-consistent), and faithfulness connecting semantics to implementation.

  3. Harley says:

    I am trying to access the new site, but I am failing to get there. I get a Go Daddy page saying the website is parked for free.

    • Hmm, we’ll look into this. We had thought that we had ironed out all these things over the last couple of months.

    • Harley says:

      Sounds good! I do think this is a great service, and I am looking forward to see the site.

    • From Lars:

      The only thing that I can imagine is that either their DNS provider or their local DNS client did not update the entry when you changed the mapping at godaddy.

      They can try using one of the google DNS providers (e.g., to see if that helps. Otherwise, I suspect we’d have to know their specific operating system.

    • Harley says:

      Looks like it was my machine. It finally updated and I can see the site now. Sorry, I am on campus, so it must have been the network.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: