Thoughts on "Maybe Not"

A friend sent me Rich Hickey's talk, "Maybe Not" the other day. If you haven't seen it, you should at least skim through it before reading this post. Like pretty much all of Hickey's talks, it's worth watching. The talk was given at a Clojure conference in 2018, and the context for the talk was the upcoming clojure.spec type validation system. clojure.spec provides a flexible type annotation and runtime type validation system for Clojure.

I brought a variety of perspectives to the talk. I am currently programming in Rust professionally. I previously worked in Python (with and without type annotations), TypeScript, JavaScript, and Perl. I am a fan of functional programming, with hobbyist experience in Haskell and Clojure. I think Lisps are great: I use emacs as my editor and frequently tinker with things in Emacs Lisp.

My Rust experience in particular made me feel like, somewhat unusually, Hickey was leaving out some important context in his talk, and that is largely what inspired me to write up these thoughts.

Major Points of "Maybe Not"

The focus of the talk is the handling of optionality in type systems.

Need for Optionality

Hickey covers a few places where optionality is generally used: function arguments that may be absent, return values that may be absent, and record or map fields that may be absent.

He briefly points out that the third use may be an antipattern, but we'll get into that more later.

Review of Systems for Handling Optionality

The talk starts with a discussion of null and null-pointer errors, i.e. having your type system completely unable to protect you from trying to access absent data. This is obviously a non-solution, so we move on to the classical solution in functional programming, which is the Maybe or the Option type.

The problem that Hickey points out with Maybe/Option is essentially that it's not a "true union" — instead it's just another type. This means that either relaxing restrictions on input types or increasing restrictions on output types requires changes to downstream callers of an interface. (Being flexible in what you accept and specific in what you return is generally a good principle for robust system design.)

Consider the following interface in Rust:

fn add_item(item: String);

Imagine I decide that actually it's alright if I don't get a String, I can just do nothing in that case. The function signature then changes to:

fn add_item(item: Option<String>);

Of course, this requires updates at every single call-site of the function, e.g. changing this:

let val = "foo".to_owned();
add_item(val);

To this:

let val = "foo".to_owned();
add_item(Some(val));

Similarly, if I have an existing function where I return an optional value:

fn get_item(idx: usize) -> Option<String>;

And I decide I can tighten up the output by, for example, always returning a default value:

fn get_item(idx: usize) -> String;

This now requires updates again at every downstream call-site, e.g. changing this:

let val = get_item(2).unwrap();

to this:

let val = get_item(2);

Of course, the Rust compiler is fantastic and will guide us through the process of fixing all the call-sites step-by-step, but this is a real concern for open-source packages, where you might want to improve your API without it being a breaking change.

Hickey contrasts this implementation of optionality with the ? nullability indicator in Kotlin, which we also see in TypeScript. In either of those languages, if you have a function that currently takes a String, you can update it to take String? without breaking any call-sites, since the argument will now accept a "true union" of types, i.e. either a String or a null.

Records vs Maps

Anyone familiar with Clojure or Hickey will be aware of the strong preference for generic data structures like maps to custom records/structs. Hickey spends some time pointing out that maps are essentially simple functions that convert keys to values and that records engage in "place-oriented programming," since they are both representations of data but also a (potentially mutable) location in memory.

Hickey points out that with maps, unlike with records, there is no need for something like a Maybe type, since if a field is absent it can simply be omitted. Which is to say that a record necessarily asserts the existence of a field, so if its value may be absent, the value's type must account for this, while a map does not assert the existence of any particular field.

The other problem that he points out with records as types is that they hamper reuse. If I want to reuse my User type in many places, I either have to ensure I have a full and complete User (even if I just want the username), or I have to make many/most/all of its properties optional.

The clojure.spec Solution

There are a few properties that clojure.spec is trying to optimize for: flexible, reusable type definitions; optionality kept separate from the type definitions themselves, so that each context can specify what it requires; and support for arbitrary combination and aggregation of types.

Essentially all of these are getting at the same idea: we want a canonical type definition that eschews any optionality and just tells us what the values' types are if they are present. We then want some solution to specify which values must be present in a given context. And indeed this is the solution that clojure.spec settled on. The current implementation looks a little different than the one in the talk, but let's assume you have a canonical type like the following representing a user account (examples lightly modified from the clojure.spec docs):

(s/def :acct/id int?)
(s/def :acct/first-name string?)
(s/def :acct/last-name string?)
(s/def :acct/email string?)

Note that here the ? on types is not an indicator of nullability. The specs are actually composed of predicates, so int? is essentially a function that returns true if a value is an integer or false otherwise. This is a really nice feature, because you can use your own arbitrary predicates to validate custom types!

You can compose the above types into arbitrary aggregates, let's say the following, which represents some user that has successfully registered:

(s/def :acct/registered 
  (s/keys :req [:acct/id :acct/email] 
          :opt [:acct/first-name :acct/last-name]))

And this one that represents the person associated with the account:

(s/def :acct/person
  (s/keys :req [:acct/first-name :acct/last-name]))

These types can be used to validate objects at runtime, e.g. checking if a map represents a fully registered account:

(s/valid? :acct/registered 
  {:acct/id 27 
   :acct/first-name "Bob" 
   :acct/last-name "Jones"})
;; => false

You can also put validation into the :pre and :post conditions of a function to validate its inputs and return type, respectively. For example, a function that returns a user's full name:

(defn full-name
  [person]
  ;; assert our input conforms to :acct/person
  {:pre [(s/valid? :acct/person person)],
  ;; and our output is a string
   :post [(s/valid? string? %)]}
  (str (:acct/first-name person) " " (:acct/last-name person)))

In addition, function signatures can be specified using the (s/fdef) function, which puts the types into the function docstring automatically and enables some neat options for development and testing. For example, a function that sends a user an email really only cares that the email property is present; here it is along with its spec:

(defn send-email 
  "Send the specified content to the specified account"
  [content account]
  (smtp-send content (:acct/email account))
  "Sent email")
  
(s/fdef send-email
  ;; s/cat defines a sequence of items. Here we want content to be a string
  ;; and the account to be a map with the required key :acct/email
  :args (s/cat :content string? :account (s/keys :req [:acct/email]))
  :ret string?) 

These function definition specs can then be used to instrument functions in testing and pre-production environments, performing runtime type checking of arguments automatically, by calling:

(stest/instrument `send-email)

And they can be used for property-based testing to automatically validate the function:

(stest/check `send-email)

So, all told, a pretty nifty type annotation system! I think it effectively solves for its use-cases, and the Clojure team did a great job optimizing for their desired properties: the types are super flexible, allowing for substantial reuse; the optionality of types is separate from their definitions, allowing functions to require only what they care about; and they support arbitrary combination and aggregation.

So how does this compare to other contexts?

Relation to Actual Types

The first thing that stood out to me is that this is clearly a type annotation system and not a type system, which is to say that it only describes the content of existing data structures, rather than the data structures themselves. The former can generally only be validated at runtime, while the latter can be validated at compile time. Compile-time validation of types is important in low-level and systems programming, because it enables the compiler to optimize code paths in a way that's not possible when everything is a pointer.

This is also why a "true union" type, e.g. String? seems very unlikely in Rust. Rust requires that the in-memory size of function inputs and outputs be known at compile time, and the compiler relies on the types being accurate in order to produce correct and optimized code.

Consider again a function that takes a string:

fn add_item(item: String);

Rust doesn't have a null value, but if we imagine that it did, and then imagine making the function nullable:

fn add_item(item: String?);

It's now impossible for the compiler to know the in-memory size of item! If it's a String, it's (approximately) a pointer to an array of chars somewhere on the heap. That pointer has a known size. If we imagine our theoretical null instead, we no longer know how much memory we're taking up or how to access it safely. In order for this to work, the compiler would need to implicitly wrap the incoming value in some kind of enum with a known size and a discriminant, and then we would have to perform some kind of runtime check to determine whether the value is a String or a null. Given Rust's focus on performance, it tries to stick to zero-cost abstractions and making costs explicit. Implicitly constructing an enumeration and doing a runtime check to determine the type is therefore pretty much immediately out of the running.

So, what's the solution? Do it explicitly! An Option is essentially:

enum Option<T> {
    Some(T),
    None,
}

Rust's enums have a known size (determined by their composite types) and defined semantics for discriminating which variant is present at runtime. In order to use the value in the Option, you have to write code that handles the None case. This makes the implicit type-level nature of something like String? explicit, and makes it clear that, no matter how performant it is, it is less performant than a function that just takes a String.

Of course, the argument about changing a function's type requiring downstream changes is still valid! If I go from a String to an Option<String>, my callers are going to have to make updates. However, I would argue that this is appropriate, since these are fundamentally different types. A String and a "maybe a String" have to be handled differently in memory, by the compiler, and by the user.

If you want the kind of optionality that you get from Clojure's maps in Rust, i.e. having properties that may or may not be present, Rust is going to make you be explicit about the requisite cost: you can make your own heterogeneous HashMap with fallible runtime downcasting, and you can even build a specification language for your maps that validates the content of the map at runtime, but these abstractions are high-cost relative to regular function arguments and compile-time types. As is I think typical of Rust, if you need the functionality, it's not out of reach, but the language always wants you to think about the cost of what you're doing.

What about Generics

Okay, but also, there are other ways to solve some of these problems without passing maps around all over the place. One that I'm surprised wasn't mentioned is generics. Rust's trait system allows you to specify the behavior of incoming types without specifying their types. This allows some degree of "true union" behavior. For example, let's say my function that takes a string only needs it to print some value to the console:

fn print_value(value: String) {
    println!("{:?}", value);
}

Here, we actually can loosen the restrictions quite a bit! println! is a macro for string formatting, and the "{:?}" formatting operator only requires that the incoming value implement the Debug trait. So, we can rewrite our function like so:

fn print_value<T: Debug>(value: T) {
    println!("{:?}", value);
}

Since String implements Debug, this does not break any existing callers! However, it does allow a much more flexible range of inputs, like:

let optional_str: Option<String> = None;

print_value(optional_str);

Behavior-based generics offer a lot of potential for relaxing function inputs after the fact: as long as the behavior of your inputs can be encapsulated in a trait or several, you can avoid requiring downstream callers to change their invocations. In the worst case scenario, you would need to define and implement a custom trait for your desired input types, and downstream callers would just need to ensure that trait was in scope (i.e. by importing it) before calling the function.

What about Macros

Another interesting way of achieving some degree of type flexibility is with macros. We have written a Rust macro at SpecTrust, for example, that generates a copy of a struct with certain fields omitted or included. Use of it looks something like this:

#[derive_struct(name="AccountPerson", only=["first_name", "last_name"])]
#[derive_struct(name="RegisteredAccount", only=["id", "email"])]
struct Account {
    id: usize,
    first_name: String,
    last_name: String,
    email: String,
}

This gives us three real, compile-time structs: Account, AccountPerson, and RegisteredAccount, each with only the desired fields. Updates to Account are automatically reflected in the derived structs.

Where this approach is objectively more difficult is of course that these are still explicit types, so functions cannot "relax" from Account to AccountPerson without updates at call-sites, and callers must explicitly construct the type that the given function expects (although this is somewhat eased with From implementations).

What about Type Operators / Higher Kinded Types

Another option here is type operators, i.e. types that operate on existing types to create new ones, and higher-kinded types (HKTs), i.e. composable, first-class type operators. As an example, TypeScript provides a selection of type operators called utility types. Despite not supporting higher-kinded types, these utility types give us a lot of the flexibility we see in clojure.spec.

Essentially, I can define a canonical type in TypeScript like:

type Account = {
    id: number,
    firstName: string,
    lastName: string,
    email: string,
};

I can then use type operators to create derivations of this type. For example, one where all fields are optional:

type OptionalAccount = Partial<Account>;

Or one where only certain fields are required:

type RegisteredAccount = Pick<Account, "id" | "email">;

As you can see, we can replicate a lot of the same reuse that we see in clojure.spec through use of type operators. Of course, TypeScript has maybe an unnatural advantage in this space, since TypeScript types are fundamentally based around the object (which is a map structure) and are structural, rather than nominal.

I'm not sure that this approach is as useful for struct/record optionality in languages with nominal types, although I am sure I would be surprised at what you can do.

Summary

All that being said, I think clojure.spec is a pretty darn cool type annotation system. It blows TypeScript's type operators out of the water from a flexibility standpoint. The ability to automatically generate tests and to instrument code to emit runtime type errors in development is a great way to improve confidence in the types. However, it is unfortunate that it's only an annotation system: it provides no compile-time guarantees, and its runtime checks typically run only in development and testing, however useful they might be in those phases.

The difference between type systems and type annotation systems is, I think, the main reason why the criticism of Maybe misses the mark. Expressing the presence or absence of a key in a map is a fundamentally different thing than expressing the type of an optional value, and I haven't seen a better solution than Maybe/Option for the latter. Also, presumably if you're only doing your type-checking in development/testing, you would still want some extra runtime checking to verify that your expected properties exist, rather than just blowing up when they're not there. For that case, functor/monad-like composition with Maybe/Option is a nice, elegant solution, but I'm sure there are others.

Essentially, the clojure.spec system provides a similar value to something like JSON Schema for dealing with JSON data, except clojure.spec's syntax is way better, it is significantly more flexible, and it applies to Clojure's native, heterogeneous map type.

One of the main points Hickey constantly makes about Clojure is that programs are about data. And it's clear from the language design that the preferred way of dealing with data is in existing data structures like maps, arrays, and so on, rather than in record types or structs or whatever. Whether you agree with this or not is beside the point, but given that constraint, and given Clojure's heavy focus on usage of these data structures, a system like clojure.spec makes a lot of sense. You may not get top-tier performance relative to passing around statically typed data on the stack or homogeneous data on the heap, but the vast majority of programs don't need that kind of performance.

I would be super interested to see a macro-based Rust crate that uses a syntax like clojure.spec's to implement runtime-validated, heterogeneously typed hash maps. I think it could potentially be really useful for cases where some performance could be sacrificed for increased flexibility.

Thanks for reading, and I hope you enjoyed this post! If you have any corrections or thoughts, feel free to email me. I'm no expert in type theory or Clojure, so it would not surprise me at all if I've made some mistakes.

Created: 2021-11-26

Tags: clojure, Rich Hickey, rust, types