Updated

Pattern Matching

We have seen how we can write constructors for algebraic data types, and even use record syntax to create functions for accessing fields. However, one natural question would then be to ask, how do we write functions that access these fields, if we do not use record syntax? For example, if we defined a fraction type normally, how do we obtain a fraction's numerator and denominator?

The answer to this question is to use pattern matching. It is a control structure just like if-then-else expressions, except that we would execute different branches based on the value/structure of the data, instead of a general condition.

Let us define the factorial function using pattern matching instead of conditional expressions or guards. We use case expressions to do so:

fac :: Int -> Int
fac n = case n of -- match n against these patterns:
    0 -> 1
    x -> x * fac (x - 1) -- any other Int

The nice thing about pattern matching is that we can also match against the structure of data, i.e. to match against constructors. Let us redefine the fst and snd functions which project a pair into its components:

fst' :: (a, b) -> a
fst' p = case p of 
    (x, _) -> x

snd' :: (a, b) -> b
snd' p = case p of
    (_, y) -> y

Let us also write accessor functions to access the numerator and denominator of a fraction.

data Fraction = F Int Int
numerator, denominator :: Fraction -> Int
numerator f = case f of
    F x _ -> x
denominator f = case f of 
    F _ x -> x

One nice thing about Haskell is that because we perform pattern matching over the arguments of functions so frequently, we can actually bring the patterns up to the definitions of the functions themselves. Let us define all the functions we've just written using case expressions into more idiomatic uses of pattern matching.

fac :: Int -> Int
fac 0 = 1
fac n = n * fac (n - 1)
 
fst' :: (a, b) -> a
fst' (x, _) = x

snd' :: (a, b) -> b
snd' (_, y) = y
 
data Fraction = F Int Int
numerator, denominator :: Fraction -> Int

numerator (F x _) = x
denominator (F _ y) = y

We also know that the list type is a singly linked list, which is roughly defined as such:

data [a] = [] | a : [a]

We can use this fact to pattern match against lists! For instance, the sum of a list of integers is 0 if the list is empty, otherwise its the head of the list plus the sum of the tail of the list.

sum' :: [Int] -> Int
sum' [] = 0
sum' (x : xs) = x + sum' xs

Similarly, the length of a list is 0 if the list is empty, otherwise it is 1 more than the length of its tail.

len :: [a] -> Int
len [] = 0
len (_ : xs) = 1 + len xs

Really neat! Defining functions operating on algebraic data types (including recursive data types) are very convenient thanks to pattern matching! What's more, patterns can actually be used virtually anywhere on the left side of any binding:

Let us use pattern matching in a let binding:

len :: [a] -> Int
len [] = 0
len ls = 
    let (_ : xs) = ls
    in  1 + len xs

Perhaps the most powerful feature of pattern matching is that the compiler will warn you if your pattern matches are non-exhaustive, i.e. if you do not match against all possible constructors of the type! Let us define a function that only matches against the empty list constructor.

-- Main.hs
emp :: [a] -> [a]
emp [] = []

Compile it to see the warning!

ghc Main.hs
Main.hs:3:1: warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In an equation for 'emp': Patterns of type '[a]' not matched: (_:_)
  |
3 | emp [] = []
  | ^^^^^^^^^^^

This is one reason why pattern matching is so powerful: compilers can check if you have covered all possible patterns of a given type. This is unlike the usual if-else statements in other languages where it is much less straightforward to check if you have covered all possible branches, especially if you omit else statements.

One important point to highlight here is that pattern matching is done top-down. Pattern-matching is kind of similar to if-else statements in that regard: your most specific condition should be defined first, then followed by more general or catch-all patterns.

The following factorial function is poorly defined, because the first pattern match will match all possible integers, thereby causing the function to never terminate:

fac :: Int -> Int
fac n = n * fac (n - 1)
fac 0 = 1 -- redundant as pattern above matches all possible integers

With pattern matching, let us know fulfil our earlier promise of defining the eval function for the Expr GADT in Chapter 2.3 (Algebraic Data Types). In our Python formulation, we know that eval should have the type signature Expr a -> a. Let us then define how each expression should be evaluated with pattern matching.

-- Main.hs
eval :: Expr a -> a
eval (LitNumExpr n)   = n
eval (AddExpr a b)    = eval a + eval b
eval (EqExpr a b)     = eval a == eval b
eval (CondExpr a b c) = if eval a then eval b else eval c

This seems straightforward. However, you might find that when this program is compiled, the compiler throws an error on the use of the (==) function:

ghc Main.hs
Main.hs:13:28: error:
    • Could not deduce (Eq a1) arising from a use of ‘==’
      from the context: a ~ Bool
        bound by a pattern with constructor:
                   EqExpr :: forall a. Expr a -> Expr a -> Expr Bool,
                 in an equation for ‘eval’
        at app/Main.hs:13:7-16
      Possible fix:
        add (Eq a1) to the context of the data constructor ‘EqExpr’
    • In the expression: eval a == eval b
      In an equation for ‘eval’: eval (EqExpr a b) = eval a == eval b
   |
13 | eval (EqExpr a b) = eval a == eval b
   |       

The reason for this is Haskell is unable to determine that the type parameter a is amenable to equality comparisons. Solving this requires an understanding of typeclasses, which we will explore in the next chapter. For now, just include an Eq a => constraint in our GADT declaration.

You might also get a warning about pattern matching on GADTs being fragile; that is because GADTs are actually a Haskell language extension. As such, enable this extension when compiling this program, or add a LANGUAGE pragma at the top of the file.

{-# LANGUAGE GADTs #-}
data Expr a where
  LitNumExpr ::         Int -> Expr Int
  AddExpr    ::         Expr Int -> Expr Int -> Expr Int
  EqExpr     :: Eq a => Expr a -> Expr a -> Expr Bool
  CondExpr   ::         Expr Bool -> Expr a -> Expr a -> Expr a

Our program should compile now!

Pattern Matching in Python

Python also has pattern matching with match statements with case clauses! It looks very similar to how we would write case expressions in Haskell.

def factorial(n: int) -> int:
  match n:
    case 0: return 1
    case n: return n * factorial(n - 1)

We can also match on the structure of types by unpacking. For example, defining a function that sums over a list of integers:

def sum(ls: list[int]) -> int:
  match ls:
    case []: return 0
    case (x, *xs): return x + sum(xs)
    case _: raise TypeError()

Alternatively, performing structural pattern matching over a so called algebraic data type:

@dataclass
class Tree[a]: pass
 
@dataclass
class Node[a](Tree[a]):
    val: a
    left: Tree[a]
    right: Tree[a]
 
@dataclass
class Leaf[a](Tree[a]):
    val: a

def preorder[a](tree: Tree[a]) -> list[a]:
    match tree:
        case Node(v, l, r): return [v] + preorder(l) + preorder(r)
        case Leaf(v): return [v]
        case _: raise TypeError

However, notice that in the sum and preorder function definitions, the last clause catches all patterns and raises an error. This is needed to side-step the exhaustiveness checker. This is because we are using classes to model algebraic data types, and Python does not always know all the possible structures of a given class. In the case of sum, Python's type system does not contain information about the length of a list, so it has no way of determining exhaustiveness. In the case of preorder, the reason omitting the last case gives a non-exhaustiveness error is because we did not match against other possible subclasses of Tree.

If we had formulated our Tree type using unions, pyright can determine the exhaustiveness of our patterns:

type Tree[a] = Node[a] | Leaf[a]

@dataclass
class Node[a]:
    val: a
    left: Tree[a]
    right: Tree[a]

@dataclass
class Leaf[a]:
    val: a

def preorder[a](tree: Tree[a]) -> list[a]:
    match tree:
        case Node(v, l, r): return [v] + preorder(l) + preorder(r)
        case Leaf(v): return [v]
        # no need for further cases

However, this may not always be ideal, especially if we are to define GADTs in Python. Until Algebraic Data Types or ways to annotate the exhaustivity of subclasses (such as defining a sealed class) are formally introduced, exhaustive pattern matching checks are going to be difficult to do. When doing pattern matching in Python, ensure that all possible cases are handled before doing a catch-all clause in your match statement.

All-in-all, we have just introduced a new control structure known as pattern matching. When should we use this control structure? The general rule of thumb is as follows:

  • If you are doing different things based on the value and/or structure of data, use pattern matching. You can tell this is the case if you are doing equality and isinstance checks in your conditional statements in Python.

  • Otherwise, you are likely going with the more general case of doing different things based on the satisfiability of a condition, in which case, rely on if-else statements, or in Haskell, conditional expressions and/or guards.