Type Systems
As the course title suggests, Haskell is a typed functional programming language—in particular, it uses a statically-typed type system. This begs the question, "what is a type system?"
An online search for definitions might give you the following:
Definition (Type System). A type system is a tractable syntactic method for proving the absence of certain program behaviours by classifying phrases according to the kinds of values they compute.
Let us unpack the highlighted phrases in the definition above.
Tractable syntactic method
Tractable more or less means easy, or polynomial time. Method refers to a formal method, which means it is a kind of mathematically formal process. The fact that it is a syntactic method means that this formal analysis can be done syntactically, without the need to appeal to a semantic analysis (although, static type checking is done against the static semantics of the type system). More or less, it can be performed without executing any of the code it is analyzing.
Proving the absence of certain program behaviours
In the case of type systems, this usually means that the type system is used to prove the absence of type errors. The realm of program analysis is broken down into roughly two kinds: over-approximation analyses, and under-approximation analyses. Notice that both perform approximations of program behaviour—this is because obtaining a precise specification of any program is undecidable. Generally, static analyses, like type checking, perform an over-approximation of program behaviour. An analogy of how this works is as follows: assume true program behaviour is \(x\) and buggy behaviour is at \(y\) (these are all positive numbers, let's say). We then over-approximate the true program behaviour, giving us \(x + \epsilon\). If we can show that \(x + \epsilon < y\), then we can guarantee that \(x < y\), so the program is not buggy.
A more concrete example is as follows. Let's suppose we have the following code snippet in Python:
y: int = 0 if f() else 'abc'
print(y + 1)
Notice that if we can determine that f
always returns
True
, then we know for sure that there will be no type
errors. However, it is not possible to make this determination in
general. Thus, we over-approximate program behaviour by assuming that it
is possible that f
may return either True
or
False
leading us to show that we cannot prove the absence of
type errors in this program. Instead, if we had written the following:
y: int = 0 if f() else 1
print(y + 1)
Then even by assuming that both branches of the conditional expression
may be the result, we can conclusively show that y
will
always be an int
. Our over-approximation of program behaviour
doesn't have type errors, meaning, that our actual program really does
not have type errors.
Kinds of values they compute
This is a simple description of what types are. Types, as we will informally define later, are classifications of data/things in the program that all behave similarly or have similar characteristics. In some other sense, types can be seen as abstractions over terms.
Simply put, a type system is a formal system that lets us show that there won't be type errors. As we have seen, the nature of [statically-typed] type systems forces us to program in a different way (at least compared to dynamically typed languages like Python), and this is what we will explore in this chapter.
Types
Type systems are systems of types; but what is a type? In essence, a type is like a kind of thing, or a high-level description of what something is. Types (1) give meaning to some data, and (2) describe what its members are like.
Since you have already programmed in Python, you should have some inkling of what types are. In Python, everything is an object. Thus, in Python, the type of an object is the class from which it was instantiated.
The following is some sample output showing the types of various objects. The output of all these function calls are classes.
>>> x = 1
>>> type(x)
<class 'int'>
>>> type('abc')
<class 'str'>
>>> class A: pass
>>> type(A())
<class '__main__.A'>
This is very apt—classes are blueprints for creating objects, and (for the most part), all instances of a class will abide by the specification as laid out in the class. Therefore, Python's type system based on classes is very appropriate for our purposes. In fact, this is not unique to Python. Many other languages with OO features also have classes as types.
In Python, we mainly think of types as being bound to objects, that is, objects have reified types that can be accessed at runtime. We have never thought of assigning types to variables or function parameters, since when we are investigating the type of a variable, what we are really doing is investigating the type of the object that is referred to by the variable. However, Python actually does allow us to annotate variables, function parameters etc with types to document "suggestions" as to what the types of the objects assigned to them should be.
Observe the following program fragment.
def f(x: int) -> str:
y: int = x * 2
return f'{x} * 2 = {y}'
z: int
z = 3
s: str = f(z)
print(s) # 3 * 2 = 6
This program fragment contains several type annotations. In the
function header, we have a specification for f
to receive an
int
and return a str
. That is, if the type
annotations make sense, then passing an int
into f
will always result in a str
. In the function body, we also
have an annotation for the variable y
stating that it is also
an int
. This makes sense—if x
is an int
,
then so will x * 2
. Actually, the type of y
can be
inferred (a type checker can determine the type of y
automatically), so our type annotation for it is not necessary. Outside
the function body we have other type annotations, documenting what the
types of the other variables are. On visual inspection, we can see that
all the type annotations make sense and we have adhered to them fully;
we are thus guaranteed that we have no type errors.
While Haskell also provides the capability for type annotations, a notable distinction lies in Haskell's enforcement of adherence to these annotations. Consequently, it might be more fitting to refer to them as type declarations. Nevertheless, the core concept remains unchanged: specifying the types of variables, functions, or terms ensures that, when adhered to correctly, our program will be well-typed.
The following code snippet shows some Haskell code with type declarations.
f :: Int -> String
f x = show x ++ " * 2 = " ++ show y
where y = x * 2
z :: Int
z = 3
s :: String
s = f(z) -- 3 * 2 = 6
A natural question would be to ask, what types can we declare variables
to be of? We have looked at some basic types earlier, Int
,
String
(which is an alias for [Char]
),
Char
, [Int]
, Bool
,
Double
etc. There are many other types in Haskell's Prelude,
and later on we will see how we can create our own types.
Declaring types for functions is slightly different. In Python, when
writing type annotations for functions, we are really annotating the
types of its parameters, and its return type. In Haskell, we are
declaring the type of the function itself. The difference is actually
not as large as one might imagine. If the function receives a type \(S\)
and returns a type \(T\), then the function has the type \(S\to T\). We
similarly use arrows to declare the type of functions in Haskell. Thus,
as above, since f
receives an Int
and returns a
String
, then f
itself is of the type
Int -> String
.
Haskell has roots in formal systems, in particular, System \(F_C\), which is a dialect of System \(F\omega\) (without type lambdas). Thus, the types of terms can be described formally. Knowing the formal typing rules of Haskell is not required, but may give you some insight as to how it works. Below we show the typing rules for function declarations, more accurately, lambda abstractions.
\[\frac{\Gamma,x:S\vdash e: T}{\Gamma\vdash\lambda x.e : S \to T}\text{T-Abs}\]
The T-Abs rule is an inference rule stating that if the premise above the line is true, then the conclusion below the line will also be true. Let's first parse the premise. The part to the left of \(\vdash\) is the typing environment, more or less describing the type declarations we have at the point of analysis of the program. Specifically, \(\Gamma\) is the actual type environment, while \(x: S\) is an additional assumption that a variable \(x\) has type \(S\). The part to the right of \(\vdash\) describes the judgement of the type of \(e\) being \(T\). Overall, the premise states "given what we have so far, if in assuming \(x\) is of type \(S\) we get that \(e\) is of type \(T\), ...". The conclusion can be understood similarly: it states that the typing environment \(\Gamma\) will show that the function \(\lambda x.e\) has type \(S \to T\). Putting these together, the rule states that "given typing environment \(\Gamma\), if by assuming that variable \(x\) has type \(S\) we get that the expression \(e\) is of type \(T\), then \(\Gamma\) will also show that the type of the function \(\lambda x.e\) is of type \(S \to T\)".
A simple demonstration in Python is as follows: suppose we have \(x\) as
x
and \(e\) as x * 2
. If we assume that x
is of type int
, then we know that x * 2
will also
be an int
. Therefore, the type of \(\lambda x.e\) which is
lambda x: x * 2
is int -> int
1.
What about multi-parameter functions? Remember that in Haskell, all
functions are curried, thus, all functions in Haskell are single
parameter functions. Curried functions receive one parameter, and return
a function closure that receives the remaining variables and
eventually will return the final result. Therefore the (+)
function actually looks more like:
# Python
def add(x):
return lambda y: x + y
The type of add
is more like int -> (int -> int)
.
This is (more or less) the type of (+)
in Haskell, which
(more or less) has type Int -> Int -> Int
. Note that
->
is right-associative, so Int -> Int -> Int
is
the same as Int -> (Int -> Int)
.
In Haskell, the types of everything are fixed. This should be unsurprising since everything in Haskell is immutable, but it is a restriction that can also be found in other less restrictive languages like Java and C++. In this environment, we have to, perhaps ahead of time, decide what the type of a variable, function, function parameter is, then write the implementation of your function around those restrictions.
The following code
snippet first declares the type of f
before showing its
implementation. It is not only good practice to declare types above
their implementation, but it can be a nice way to frame your mind around
the implementation of your function—start by providing a high-level
specification of your function, then work on the implementation to
describe what the function is actually trying to achieve.
f :: Int -> String -- explicit type declaration
f x = show x ++ "!"
g x = x + 1 -- type of g is inferred
However, observe that the type of g
is not defined. This
does not mean that the type of g
is dynamic or is not being
checked; rather, Haskell can infer the principal (most liberal) type
of g
via a process known as type inference. That still
means that the implementation of g
itself must be well-typed
(its implementation does not break any of the typing rules), and that
any users of g
must abide by its static type signature.
Generally speaking, it is good practice to declare the types of
top-level bindings—that is, nested bindings of functions, variables (for
example, in let
expressions) do not need type declarations
and can often be inferred. The example above of the declaration of
f
is a perfectly idiomatic way of defining and declaring a
function, unlike g
which lacks a type declaration.
Programming with Types
When learning Python, you might not have had to think very much about
types; this is because Python does not care about type annotations. For
example, you can happily annotate a variable to be an int
but
then assign a string into it. This is very much unlike Haskell, where
adherence to type declarations and well-typedness is enforced by the
compiler—the compiler will reject any program that is not well-typed.
Observe the following program fragment:
f :: Int -> String -- explicit type declaration
f x = show x ++ "!"
g = f "1" -- compiler throws type error as f receives Int, not String
The definition of f
is well-typed since it abides by all the
typing rules, and all the types make sense. However, since f
only receives Int
, passing a String
into it is a
clear violation of the rules. Thus, the entire program is ill-typed and
will not be compiled. Try this for yourself!
Programming in such a strict and formal language can feel restrictive, but these restrictions actually feel more like "guard rails" or "seatbelts"; if your program passes the checks done by the compiler, you can be quite assured that it works. As the saying goes, in Haskell, "if it compiles, it works". Although this is not necessarily true, Haskell's robust and expressive type system allows you to rule out a large class of bugs, and often, results in correct programs. However, one question to ask is: how do we go about programming with static types?
The first step of being able to program with types is understanding the typing rules. We shall elide explanation on how typing works with inference, typeclasses, polymorphism etc. and focus solely on the simplest typing rules:
-
In a binding
x = e
, the type ofx
must be the same as the type ofe
-
In a conditional expression
if x then y else z
, the type ofx
must beBool
and the types ofy
andz
must both be equal to some typea
; the type of the entire expression isa
-
In a function application expression
f x
the type off
must bea -> b
for somea
andb
,x
must be of typea
, and the type of the expression isb
-
(Without loss of generality of number of parameters) For a function binding
f x = e
the type off
must bea -> b
for somea
andb
and in assumingx
to be of typea
,e
must be of typeb
.
Try calculating the types of every expression in the following code snippet. Can you get it all right?
f :: Int -> Int -> [Int]
f x n =
if n == 0 then
[]
else
let r = f x (n - 1)
in x : r
Let's work through this example.
- We are declaring
f
to be of typeInt -> Int -> [Int]
, so it stands to reason that in the definition off
we are assuming thatx
andn
are both of typeInt
. - For this to be well-typed, we must ensure that the conditional expression evaluates to
[Int]
, that means both branches must themselves evaluate to[Int]
. - First we observe the condition
n == 0
; the(==)
function receives two numbers and returns aBool
, so this is well-typed. - Looking at the
True
branch, we see that we are returning the empty list, which matches the type of[Int]
. - In the
False
branch, we have alet
expression, so we must ensure thatx : r
evaluates to[Int]
too. - The
let
binding contains a bindingr = f x (n - 1)
; knowing that (by our own declaration)f
has typeInt -> Int -> [Int]
, knowing thatx
andn - 1
are of typeInt
means we can safely conclude thatr
has type[Int]
(of course, the(-)
function receives two integers and returns an integer). - The
(:)
function receives anInt
and a[Int]
and returns a[Int]
, so all the types match.
Overall, we have seen that we successfully determined the types of every expression in the program fragment, and concluded that it is well-typed.
Now that you are familiar with the basic typing rules and (roughly) how types are inferred, the next step is to get comfortable writing programs with static types. Generally this comes with practice, but one great way to get you started with typeful programming is to try letting the types guide your programming.
Suppose we are trying to define a function f
that receives
an integer x
and returns a string showing the result of
multiplying x
by 2:
ghci> f 3
"3 * 2 = 6"
ghci> f 5
"5 * 2 = 10"
Let us try implementing this function. The first thing we have to
consider is the type of f
itself, which by definition,
should receive an Int
and return a String
. As
such, we may start with the type declaration
f :: Int -> String
.
Next, we know we are eventually going to have to convert x
into a String
. We know that there is a show
function that does that. Its type signature (modified) is
Int -> String
, so we know that show x
is a
String
.
We also know that we need to multiply x
by 2. For this, we
can use the (*)
function, which has a (modified) type
signature of Int -> Int -> Int
. Thus, we can write
x * 2
and that gives us an Int
. Knowing that we
eventually need to display it as a String
, once again, we
can rely on the show
function.
Now we have all the numbers we need in String
form, we need
to concatenate them together. For this, we can rely on our trusty
(++)
function that receives two String
s and
returns a String
. Using this allows us to concatenate all
our desired strings together. Since our original function f
was meant to return a String
, we can return it as our final
result.
f :: Int -> String
f x =
let sx :: String = show x
y :: Int = x * 2
sy :: String = show y
in sx ++ " * 2 = " ++ sy
This is a simple example of using types to guide your programming. While seemingly trivial, this skill can be incredibly useful for defining recursive functions!
Suppose we are trying to define a function that sums the integers in a
list. As always, we must decide what the type of this function is. As per our
definition, it receives a list of integers and returns the final sum,
which should be an integer as well. This gives us the type declaration
sum' :: [Int] -> Int
.
First, let us define the base case. We should be quite clear on what the
condition for the base case is: it should be when the input list is empty.
What should we return in the base case? By our type declaration, we must
return an Int
, so we must express our base result in that
type. The result is 0
, which matches our type declaration.
Next we must define the recursive case. This one might be tricky
initially. We know that we can make our recursive call, passing in the
tail of the input list. This might look something like
sum' (tail ls)
. We must be very clear about the type of this
expression; as per the type declaration, the result is an
Int
, and not anything else.
We also know that we want to add the head of the input list to the
result of the recursive call. In doing so we get an Int
.
Finally, we can add the results together, giving us an Int
,
which matches our return type.
sum' :: [Int] -> Int
sum' ls =
if null ls
then 0
else let r :: Int = sum' (tail ls)
hd :: Int = head ls
in hd + r
By getting used to types, having a statically-typed system no longer feels like a chore or a hurdle to cross, and instead feels like a support system that makes everything you are doing clear! Many developers (including myself) love statically-typed programming languages for this very reason, so much so that people have gone to great lengths to add static typing to otherwise dynamically typed languages like JavaScript (the typed variant of JavaScript is TypeScript).
Python is no different. Several static type checkers are out there to
help us analyze the well-typedness of our program. One of the most
popular analyzers is mypy
, which was heavily developed by Dropbox.
However, I recommend pyright
because at the time of writing, it has
implemented bleeding edge features that we need for further discussion
of types which we shall see very shortly.
Let's see pyright
in action. We shall write an ill-typed
program and see if it catches the potential bug:
# main.py
def f(x: int, y: int) -> int:
z = x / y
return z
Running pyright
on this program will reveal an error message:
pyright main.py
pyright main.py
/home/main.py
/home/main.py:4:12 - error:
Expression of type "float" is incompatible with return
type "int"
"float" is incompatible with "int" (reportReturnType)
1 error, 0 warnings, 0 informations
Great! This makes sense because assuming x
and y
are of type int
, the type of z
should actually be
float
! Let's correct the program and try running pyright
against the new program:
# main.py
def f(x: int, y: int) -> int:
z = x // y
return z
pyright main.py
0 errors, 0 warnings, 0 informations
Very well! We have now learnt how to program with types in Haskell and
in Python, and since Python does not come with a type-checker, we are
able to use tools like pyright
to do the type checking for us!
One additional great feature about pyright
is that it is actually also a
language server. As such, you can include pyright
in your favourite text
editors so that it can catch bugs while writing programs!
Python doesn't have arrow types. The actual type of the function
is Callable[[int], int]
.