Algebraic Data Types

We have just seen different data types in Haskell, and introduced the concept of polymorphic types as demonstrated by examples in Python. Yet, we have not discussed how we can create our own (polymorphic) data types in Haskell!

Haskell is a purely functional language, so do not expect classes here. In OOP, objects have both data (attributes) and behaviour (methods), whereas this is not necessarily a principle in FP (although, you can have data types with functions as fields since functions are first-class). We already know how to create functions, so now we must investigate how we can create data types in a purely functional language.

If we think about it carefully, we might realize that data types are a mix of the following:

A type and another type and...and yet another type
A type or another type or...or yet another type

We can express the following types using and and or over other types:

A Fraction consists of a numerator (Int) and a denominator (Int)
A Student consists of a name (String) and an ID (Int)
A Bool is either True or False
A String is either an empty string or (a head character (Char) and a tail list (String))
A polymorphic Tree is either (a leaf with a value of type a) or (a node with a value (a) and a left subtree (Tree a) and a right subtree (Tree a))

This formulation of data types as products (and) and/or sums (sum) is what is known as Algebraic Data Types (ADTs) (not to be confused with Abstract Data Types). In Haskell, types are sums of zero or more constructors; constructors are products of zero or more types.

To create a new data type in Haskell, we can use the data keyword. Let us create a fraction type based on our algebraic specification above:

data Fraction = Fraction Int Int

half :: Fraction
half = Fraction 1 2

On the left hand side we have the declaration of the type, and on the right hand side, a list of constructors separated by | that help us create the type. Note that the Fraction on the right hand side is the name of the constructor of the type; it in fact can be distinct from the name of the type itself (which is very helpful when you have more than one constructor). As you can see, to construct a Fraction (the type), the Fraction constructor receives two Ints, one numerator, and one denominator.

Then, defining the student type from our algebraic formulation above should also be straightforward:

data Student = S String Int

bob :: Student
bob = S "Bob" 123

Let us define the Bool type, which should have two constructors, each constructor not having any fields:

data Bool = True | False

true, false :: Bool
true = True
false = False

To construct a Bool we can use either the True constructor or the False constructor. Neither of these constructors receive any other fields.

We can also have multiple constructors that are products of more than zero types, as we shall see in the algebraic formulation of a String:

data String = EmptyString | Node Char String

hello, empty :: String
hello = Node 'h' (Node 'e' (Node 'l' (Node 'l' (Node 'o' EmptyString))))
empty = EmptyString

Polymorphic Algebraic Data Types

Now we show examples of creating our own polymorphic data types. The way we would do so is similar to how we defined generic/polymorphic classes in Python.

Let us start from the bottom again by creating specialized versions of a box type, this time in Haskell. We start by assuming that a box contains an Int:

data IntBox = IB Int
b :: IntBox
b = IB 1

Then define a box that contains a String:

data StrBox = SB String
b :: StrBox
b = SB "123"

Again, they look more or less the same, except for the type of the field. As such, we should allow Box to be polymorphic by introducing a type parameter:

data Box a = B a
x :: Box Int
x = B 1
y :: Box String
y = B "123"

Perfect! Let us try more complex polymorphic algebraic data types like linked lists and trees:

data LinkedList a = EmptyList | Node a (LinkedList a)
cat :: LinkedList Char
cat = Node 'c' (Node 'a' (Node 't' EmptyList))

data Tree a = Leaf a | TreeNode (Tree a) a (Tree a)
tree :: Tree Int
tree = TreeNode (Leaf 1) 2 (Leaf 3)

Constructors are actually functions!

ghci> data Fraction = F Int Int
ghci> :t F
F :: Int -> Int -> Fraction
ghci> :t F 1
F 1 :: Int -> Fraction
ghci> :t F 1 2
F 1 2 :: Fraction

We now have the facilities to define and construct data types and their terms, but so far we are not able to access the fields of a data type in Haskell. Unlike Python, we are not able to do something like x.numerator to obtain the numerator of a fraction x, for example. There are ways to define functions that do so and we will show them to you in later sections, but for now, Haskell has record syntax that automatically defines these accessor functions for us.

Let us re-create the Student type, this time using record syntax to automatically derive functions that obtain their names and IDs:

data Student = S { name :: String, id :: Int }

With this, we no longer need to define our own functions that access these fields for us. Record syntax is great for giving names to fields! Importantly, record syntax is nothing special, and we can continue to create terms of those types by way of usual constructor application.

x, y :: Student
x = S { name = "Alice", id = 123 }
y = S "Bob" 456

Let's try loading this into GHCI and see the accessor functions in action:

ghci> name x
"Alice"
ghci> id y
456

You can also make use of record syntax to express record updates. For example, we can update Alice to have the ID of 456 like so:

ghci> id x
123
ghci> z = x { id = 456 }
ghci> name z
"Alice"
ghci> id z
456

Of course, the original term was not actually updated since everything is immutable in Haskell—x { id = 456 } simply constructs a new term that contains the same values for all its fields, except where the id field now takes the value 456.

We can even mix and match these different forms of constructor definitions, or create large data structures!

data Department = D {name' :: String, courses :: [Course]}
data Course = C { code :: String, 
                  credits :: Int,
                  students :: [Student] }
data Student = UG { homeFac :: String,
                    name :: String,
                    id :: Int }
             | PG [String] String Int

alice   = UG "SoC" "Alice" 123
bob     = PG ["SoC", "YLLSoM"] "Bob" 456
it5100a = C "IT5100A" 2 [alice]
it5100b = C "IT5100B" 2 [alice, bob]
cs      = D "Computer Science" [it5100a, it5100b]

More on Polymorphism

Now that we have shown how to create our own algebraic data types in Haskell (and polymorphic ones), we step aside and give a mental model for understanding polymorphism. Recall that we have described polymorphic functions and types as functions/types that quantifies/parameterizes types; in other words, they receive a type as a parameter.

Recall in the lambda calculus that $\lambda$ creates a function over a parameter. Assuming the parameter has type $S$ and the returned value has type $T$, we get:

$$\lambda x.e: S \to T$$

and when we call or apply this function, we are substituting the parameter for the argument of the function application: $$(\lambda x.e_1)e_2 \equiv_\beta e_1[x:=e_2]$$

$$\begin{aligned} (\lambda x: \mathtt{Int}.x + 4)3 &\equiv_\beta (x + 4)[x := 3]\\ &\equiv_\beta (3 + 4)\\ & \equiv_\beta 7 \end{aligned}$$ In Haskell (the expression in parentheses is a lambda expression):

ghci> (\x -> x + 4) 3
7

A typed variant of the lambda calculus known as System $F$ has polymorphic functions, which are functions that also receive a type parameter. We can then apply this function onto a type argument to get a specialized version of that function. Such type parameters are bound by $\Lambda$. As an example, if we have a term $e$ of type $T$, we get:

$$\Lambda \alpha.e: \forall\alpha.T$$ Calling or applying this function with a type argument, once again, substitutes the type parameter with the type argument:

$$(\Lambda\alpha.e)\ \tau\equiv_\beta e[\alpha := \tau]$$ $$(\Lambda\alpha.e)\ \tau : T[\alpha := \tau]$$

$$\begin{aligned} (\Lambda \alpha.\lambda x:\alpha.[x]) \mathtt{Int} &\equiv_\beta (\lambda x:\alpha.[x])[\alpha := \mathtt{Int}]\\ & \equiv_\beta \lambda x:\mathtt{Int}.[x] \end{aligned}$$

We can show this with an example in Haskell. Explicit type arguments must be enabled with a language extension and the type arguments must be prefixed by @:

ghci> :set -XTypeApplications -fprint-explicit-foralls
ghci> :{
ghci| f :: forall a. a -> [a]
ghci| f x = [x]
ghci| :}

ghci> :t f
f :: forall a. a -> [a]

ghci> :t f @Int
f @Int :: Int -> [Int]

ghci> f @Int 1
[1]

On the other hand, polymorphic types can be seen as functions at the type-level. These are "functions" that receive types and return types! For example, we can define a Pair type that is polymorphic in its component types. Thus, the Pair type itself (not its constructor!) receives two types, and returns the resulting Pair type specialized to those component types. This makes Pair what is known as a type constructor.

To observe this fact, know that types are to terms as kinds are to types: they describe what kind of type a type is. The usual types that we encounter Int, [[Char]] etc. have kind *, and type constructors or "type-level functions" have kind * -> * for example. Below, we show that Pair is a type constructor of kind * -> * -> *, which makes sense since it receives two types and returns the specialized type of the Pair:

ghci> data Pair a b = P a b
ghci> :k Pair
Pair :: * -> * -> *
ghci> :k Pair Int
Pair Int :: * -> *
ghci> :k Pair Int String
Pair Int String :: *

We know that we can have higher-order functions, for example, the type of map might be something like (a -> b) -> [a] -> [b]. Can we have higher-order type constructors? Yes! These are known as higher kinds or higher-kinded types. These types receive type constructors as type arguments. Let us construct a higher-kinded type that receives a type constructor and applies it onto a type:

ghci> data Crazy f a = C (f a)

Upon visual inspection we can see that f must be a type constructor, because the constructor C receives a term of type f a! What's crazier is, when inspecting the kind of Crazy, we see that it exhibits kind polymorphism:

ghci> :set -fprint-explicit-foralls
ghci> :k Crazy
Crazy :: forall {k}. (k -> *) -> k -> *

To give you an example of how this might work, because we know we can construct lists of any type, [] (the type, not the empty list) must be a type constructor. We can thus pass the [] type constructor into Crazy:

ghci> :k Crazy []
Crazy [] :: * -> *
ghci> :k Crazy [] Int
Crazy [] Int :: *

How might this work? We see that Crazy [] Int has kind *, so we should be able to construct a term of this type. We can do so by using the C constructor defined above! To be clear, let's see the specialized version of the constructor with the type arguments entered:

ghci> :t C @[] @Int
C @[] @Int :: [Int] -> Crazy [] Int

As we can see, to construct a term of this type, we just need to pass in a list of integers to C:

ghci> x :: Crazy [] Int = C [1]

We can in fact instantiate other crazy types with different type constructors:

ghci> data Box a = B a
ghci> y :: Crazy Box Int = C (B 2)

The utility of higher-kinded types may not be apparent to you now; later on we might see some of them in action!

Although this might confuse you so far, what we have demonstrated merely serves to demonstrate the idea that parametric polymorphism can be thought of the phenomenon where something (type or term) can receive a type and give you a type or term, just as we have stated at the beginning of Chapter 2.2 (Polymorphism).

Other Polymorphisms

At the start of Chapter 2.2 (Polymorphism) we introduced three questions, two of which have been answered. Let us restate the final question and pose one more:

Can types depend on terms?
Are there other kinds of polymorphism?

The answers to both questions is yes. Types that depend on terms are known as dependent types, which we shall not cover in this course. There are also other kinds of polymorphisms, some of which you have already dealt with. Subtype polymorphism is used frequently in OOP, since subclasses are types that are subtypes of their superclasses. An umbrella term ad-hoc polymorphism generally refers to overloading, which we shall discuss in the future. There are also more kinds of polymorphisms, but we shall not discuss them in this course.

Python (and several other mainstream languages) is quite special, being a multi-paradigm language means that several forms of polymorphism are applicable to it. In particular, we have seen that Python supports parametric polymorphism, and since Python supports OOP, it also has subtype polymorphism. Despite Python not having algebraic data types (yet), we may also formulate our types to behave similarly to Algebraic Data Types. Two formulations we may attempt are: 1) with types as unions and constructors as classes, 2) with types as classes and constructors as their subclasses. Below we present both formulations for the linked list type:

# (1)
type List[a] = Node[a] | Empty

@dataclass
class Empty:
    pass

@dataclass
class Node[a]:
    head: a
    tail: List[a]

x: List[int] = Node(1, Node(2, Empty()))

# (2)
from typing import Any
@dataclass
class List[a]:
    pass

@dataclass
class Empty(List[Any]):
    pass

@dataclass
class Node[a](List[a]):
    head: a
    tail: List[a]

x: List[int] = Node(1, Node(2, Empty()))

There are some differences between the two formulations, and between these with Haskell's Algebraic Data Types. Most importantly, in Haskell, data types are types, but constructors are not. This is unlike Python, where all classes are types. That means a variable of type Node[int] is valid in Python, but a variable of type Node Int is not in Haskell.

Generalized Algebraic Data Types

However, something interesting is going on here. In the second formulation, a Node[a] is a List[a], which makes sense. On the other hand, an Empty can be typed as List[Any], because an empty list fits all kinds of lists. An interesting observation you might see is that the supertype of our "constructors" need not strictly be List[a], it could be any kind of list!

Consider the following example of defining simple expressions in a programming language, which is defined polymorphically using OOP:

class Expr[a]:
    def eval(self) -> a:
        raise Exception

The Expr class is parameterized by the type of its evaluation. From this class we may now create subclasses of Expr. For example, some simple numeric expressions.

@dataclass
class LitNumExpr(Expr[int]):
    n: int
    def eval(self) -> int:
        return self.n

@dataclass
class AddExpr(Expr[int]):
    lhs: Expr[int]
    rhs: Expr[int]
    def eval(self) -> int:
        return self.lhs.eval() + self.rhs.eval()

We can then create other kinds of expressions. For example, an equality expression that returns booleans:

@dataclass
class EqExpr[a](Expr[bool]):
    lhs: Expr[a]
    rhs: Expr[a]
    def eval(self) -> bool:
        return self.lhs.eval() == self.rhs.eval()

Or even a conditional expression whose evaluated type is parameterized:

@dataclass
class CondExpr[a](Expr[a]):
    cond: Expr[bool]
    true: Expr[a]
    false: Expr[a]
    def eval(self) -> a:
        return self.true.eval() if self.cond.eval() else self.false.eval()

Let's try this out! Suppose we would like to evaluate the following expression:

if 1 == 2 then 1 + 1 else 0

Let's write this in the program using our classes and evaluate it!

zero: Expr[int] = LitNumExpr(0)
one: Expr[int] = LitNumExpr(1)
two: Expr[int] = LitNumExpr(2)
one_plus_one: Expr[int] = AddExpr(one, one)
one_eq_two: Expr[bool] = EqExpr(one, two)
cond: Expr[int] = CondExpr(one_eq_two, one_plus_one, zero)
print(cond.eval()) # 0

How do we create such an algebraic data type in Haskell? For this, we have to use Generalized Algebraic Data Types (GADTs). Loosely, these are algebraic data types like before, except that each constructor can decide what type it returns!

First, let us formulate our original algebraic data types using GADT syntax.

data LinkedList a where
    EmptyList :: LinkedList a -- this is a different a!
    Node :: b -> LinkedList b -> LinkedList b

Now let us take it a step further, and truly customize the constructors of an Expr GADT:

data Expr a where
    LitNumExpr :: Int -> Expr Int
    AddExpr    :: Expr Int -> Expr Int -> Expr Int
    EqExpr     :: Expr a -> Expr a -> Expr Bool
    CondExpr   :: Expr Bool -> Expr a -> Expr a -> Expr a

Pretty neat huh! There are many uses of GADTs, and we might see them in the future. In the next section, we will show you how we can write functions against algebraic data types and GADTs, including how we can implement the eval function.

Typed Functional Programming

Algebraic Data Types

Polymorphic Algebraic Data Types

More on Polymorphism

Other Polymorphisms

Generalized Algebraic Data Types