Existential Crisis

In a long discussion at LtU, user Jules Jacobs advances a use-case that for him would have been difficult to solve in a statically-typed language. I focus on his first two points:

  1. A data structure that is a container of T plus a function on T's. I would have needed existential types to hide the T, which I don't get in many languages.
  2. A couple of functions that take heterogeneous list of items that the function will convert to the interface it needs the elements to be with an extensible conversion function. […]

It's true, not many languages have existential types but OCaml happens to do, or at least allows existentials to be encoded. I'll take a big hint from MLton and I'll try to solve both problems (I won't pretend to solve his actual problems; just my interpretation of the problems he posed). Another reason why I write this post is because I always forget what the essence of existential types is. Pierce writes:

Similarly, there are two different ways of looking at an existential type, written {∃X,T}. The logical intuition is that an element of {∃X,T} is a value of type [XS]T, for some type S. The operational intuition, on the other hand, is that an element of {∃X,T} is a pair, written {*S,t}, of a type S and a term t of type [XS]T.

[…] To understand existential types, we need to know two things: how to build (or introduce, in the jargon of §9.4) elements that inhabit them, and how to use (or eliminate) these values in computations.

An existentially typed value is introduced by pairing a type with a term, written {*S,t}. A useful concrete intuition is to think of a value {*S,t} of type {∃X,T} as a simple form of package or module with one (hidden) type component and one term component. The type S is often called the hidden representation type, or sometimes (to emphasize a connection with logic, cf. §9.4) the witness type of the package. For example, the package p = {*Nat, {a=5, f=λx:Nat. succ(x)}} has the existential type {∃X, {a:X, f:XX}}. The type component of p is Nat, and the value component is a record containing a field a of type X and a field f of type XX, for some X (namely Nat).

(Types and Programming Languages, p. 363–364) I quote at some length because Pierce's presentation is so condensed that for me it really bears repeating; also, because every time I want a refresher I can come to this post instead of cracking open the physical book. The gist of it is pithily and more memorably stated in Mitchell and Plotkin's slogan, abstract types have existential type.

The complication is that in ML types are not values, so in order to pack an existential we must reify the type component as a term, for instance as a pair of functions, an injection and a projection:

module type UNIV = sig
  type t
  val embed : unit -> ('a -> t) * (t -> 'a)

Why UNIV when we're talking about existential types? Go ask Oleg. Actually, there is a type isomorphism analogous to the logical equivalence between (∃x. P x) ⇒ Q and x.(P x ⇒ Q); as Jeremy Gibbons writes …the justification being that a datatype declaration such as type e = ∃t. E of t foo introduces a constructor E : (∃t. t foo) → e, and this type is isomorphic to t.(t foo → e) because e is independent of t (Haskellisms paraphrased).

In any case, here the existential type is t, and embed produces a (inj, prj) pair that can be applied to values of some type α. Only the prj of the pair can recover the injected value; the use of any other prj will fail. There are at least two possible implementations, one using references and another one using exceptions (which are values of a single open, extensible, generative type). The latter is very clever:

module Univ : UNIV = struct
  type t = exn
  let embed (type u) () =
    let module E = struct
      exception E of u
      let inj x = E x
      let prj = function E x -> x | _ -> raise Not_found
    end in E.(inj, prj)

The use of the named type u and a local module are 3.12 features that allow declaring a polymorphic exception (compare the SML solution in MLton's website). Since the exception constructor E is different for every invocation of embed (this is the "generative" bit referred to above), only the prj of the pair can recover the value:

let () =
  let inj_int  , prj_int   = Univ.embed ()
  and inj_float, prj_float = Univ.embed () in
  let r = ref (inj_int 13) in
  let s1 = try string_of_int   (prj_int   !r) with Not_found -> "None" in
  r := inj_float 13.0;
  let s2 = try string_of_int   (prj_int   !r) with Not_found -> "None" in
  let s3 = try string_of_float (prj_float !r) with Not_found -> "None" in
  Printf.printf "%s %s %s\n" s1 s2 s3

Note that the reference r holds values "of different types" via the corresponding inj. This code typechecks and when run outputs:

13 None 13.

On top of this "universal" existential type we can build heterogeneous property lists à la Lisp (see again MLton's site):

module PList : sig
  type t
  val make     : unit -> t
  val property : unit -> (t -> 'a -> unit) * (t -> 'a)
end = struct
  type t = Univ.t list ref

  let make () = ref []

  let property (type u) () =
    let inj, prj = Univ.embed () in
    let put r v = r := inj v :: !r
    and get r   =
      let rec go = function
      | []      -> raise Not_found
      | x :: xs -> try prj x with Not_found -> go xs
      in go !r
    in (put, get)

Each property must be created explicitly but independently of any list using it. They encapsulate an existentially-typed value; look-up just proceeds by attempting to project out the corresponding value. These lists really are magical:

let () =
  let put_name  , get_name   = PList.property ()
  and put_age   , get_age    = PList.property ()
  and put_weight, get_weight = PList.property () in
  let show p = Printf.printf "%s: %d years, %.1f kg\n"
    (get_name   p)
    (get_age    p)
    (get_weight p)
  let boy, girl = PList.make (), PList.make () in
  put_name   boy  "Tim";
  put_age    boy  13;
  put_weight boy  44.0;
  put_name   girl "Una";
  put_age    girl 12;
  put_weight girl 39.0;
  List.iter show [boy; girl]

Here boy and girl are two different, independent property lists that act as extensible records with labels get_name, get_age and get_weight. The list iteration prints the properties uniformly via show, without having to cast or do any strange contortions (at least not any outside the definitions). The output is:

Tim: 13 years, 44.0 kg
Una: 12 years, 39.0 kg

Of course nothing says that the property lists must be homogeneous in the properties they contain; looking for an inexistent property will simply fail. On the other hand, probably the preferred way to handle extensible records in OCaml is via objects, using structural subtyping in the same way dynamically-typed languages would use duck typing. This would make solving the original problem a little more familiar to Python or Ruby programmers; but then, recovering the original objects from the lists would be impossible without downcasts.


phil tomson said...

Wow. It's going to take me a while to understand this fully, but from what I can understand this is pretty cool. Please post more on this!

Brian Hurt said...

I don't think you need existential types or any significant type hackery to solve either of these problems. There are two cases here. One, you know what type (or types) can be in the list. Or two, you don't.

In case 1, where you know the type, then the type just becomes:
type 'a ex = (foo * (foo -> 'a)) list
If there is a known set of types that foo can be, then foo is just a variant type. Note that it's very easy to write a function of type (foo -> 'b) -> 'a ex -> 'b list in this case.

In case 2, where no constraints can be placed on the type, then the program can't touch it any ways. At which point you are just doing some form of lazy evaluation. At which point, you can do either:
type 'a ex = 'a lazy list
type 'a ex = (unit -> 'a) list
depending upon whether you want to to the caching of lazy evaluation or not. Note that the convenience function:
let capture f x () = f x
makes it easy to capture a function and a value to pass to it in a closure.