4 Lazy pattern matching
4.1 Lazy pattern matching in theory
In previous sections we only considered strict ML.
In a strict language we can define ML pattern matching as a predicate
operating on terms, which are also the values of program expressions.
In other words, pattern matching applies to completely evaluated
expressions, or normal forms.
Studying pattern matching in a lazy language such as Haskell requires a
more sophisticated semantical setting.
Essentially, lazy languages manipulate values that are known only partially
and, more significantly for our study, pattern matching operates on
such incomplete values.
Partial values:
  v ::= Ω                      undefined value
      | c(v1, v2, …, va)       (constructor) head-normal form
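For illustration, partial values admit a direct Haskell representation. The following sketch is our own, not the paper's; constructors are named by strings.

-- Partial values: either the undefined value Ω, or a head-normal
-- form whose arguments may themselves be partial.
data Value = Omega            -- Ω, not yet evaluated to a constructor
           | C String [Value] -- c(v1, ..., va)
  deriving (Eq, Show)

-- Example: the fully known value True, and a partially known pair.
vTrue, vPair :: Value
vTrue = C "True" []
vPair = C "Pair" [C "True" [], Omega]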
Definition 1 of the instance relation for patterns and
values applies unchanged to partial values (we have _ ≼ Ω,
as value Ω possesses all types). Thus we keep the same notation
≼, and maintain Definition 3 of the
instance relation for matrices.
However, we cannot keep Definition 2 of ML pattern matching.
Example 3
Let us consider a simple example:
case e of True -> 1 | _ -> 2
If the root symbol of expression e is not a constructor, its partial
value is Ω.
Then, since True ⋠ Ω and _ ≼ Ω,
the value of the whole expression is 2.
But, if we compute e further, its value may become True. Then,
the value of the whole expression becomes 1.
Something is wrong, since the value of the whole expression changed
from 2 to 1.
More generally, partial values and computation interact.
Let us consider some expression e.
If the root symbol of expression e is a constructor c, then
expression e is a head-normal form and we express the “current value”
of e as c(v1,v2, … ,va)
— see Huet and Lévy [1991] for a more precise treatment in the context of term
rewriting systems.
Otherwise, the “current value” of expression e is Ω.
Then, we consider various “current values” along the
evaluation of e. As constructors cannot be reduced, those values are
increasing according to the following precision ordering.
Definition 7 (Precision ordering)
Relation ≼Ω is defined on pairs of values (v, w) as follows.

  Ω ≼Ω w
  c(v1, …, va) ≼Ω c(w1, …, wa)    iff (v1⋯va) ≼Ω (w1⋯wa)
  (v1⋯vn) ≼Ω (w1⋯wn)              iff vi ≼Ω wi, for all i ∈ [1…n]
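Using the Value type sketched earlier, the precision ordering transcribes directly (again our code; arities of equal constructors are assumed to agree).

-- lessDef v w tests v ≼Ω w: Ω is below every value, and equal
-- constructors compare argument-wise.
lessDef :: Value -> Value -> Bool
lessDef Omega    _         = True
lessDef (C c vs) (C c' ws) = c == c' && and (zipWith lessDef vs ws)
lessDef (C _ _)  Omega     = False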
To be of practical use, a predicate P that defines pattern matching
must be monotonic. That is, when P(v) holds,
P(w) also holds for all w such that v ≼Ω w.
With monotonic predicates, matching
decisions do not change during computations. One should notice that,
given any pattern p, the predicate p ≼ v is monotonic
in v. Example 3 shows that the predicate P ⋠ v→ is not monotonic in general.
We thus need a new definition of pattern matching. For the moment, we
leave most of lazy pattern matching unspecified.
Definition 8 (General (lazy) pattern matching)
Let P(P, v→) be a predicate defined over
pattern matrices P and value vectors v→, where the size n
of v→ is equal to the width of P.
Row number i in P filters v→,
if and only if the following condition holds:

P(P[1…i), v→) ∧ p→i ≼ v→.
We call P the disambiguating predicate and
now look for sufficient conditions on P that account for our
intuition of pattern matching in a lazy language.
- Pattern matching is deterministic, in the sense that at most one
  clause is matched. Hence, for all P and v→, we assume:
  P ≼ v→ =⇒ ¬P(P, v→).
- Matching the first row of a matrix reduces to the instance
  relation. Hence, for all v→, we assume:
  P(∅, v→) = True.
- We require predicate P to be monotonic in its value
  component. That is, given any matrix P, for all value vectors
  v→ and w→, we assume:
  P(P, v→) ∧ v→ ≼Ω w→ =⇒ P(P, w→).
The three conditions above are our basic restrictions on P.
We further define UP and MP as U and M
(Definition 6) parameterized by P.
Now, given a definition of lazy pattern matching, we face
the temptation to assume that the computation of U described in
Section 3.1 still works for UP.
More precisely, by finding additional sufficient conditions on
predicate P we aim at proving UP = Urec.
Thus, we re-examine the proof of Proposition 2 in the context of
lazy pattern matching.
Base cases follow from basic restrictions.
-
By our first basic restriction and since
( ) ≼ (), we have
P(( ), ()) = False. Thus we have UP(( ), ()) =
False.
- By our second basic restriction, we directly get
UP(∅, q→ ) = True.
To prove the inductive cases, it suffices to reformulate the key
properties of Lemma 1, replacing P ⋠ v→ by
P(P, v→). However, the key properties are now assumed rather than
established.
Definition 9 (Key properties)
We say that predicate P meets the key properties when the following four
properties hold.
For any matrix P, constructor c, and value vector v→
such that v1 = c(w1, …, wa), we assume:

P(P, v→) ⇐⇒ P(S(c, P), S(c, v→)).   (1)

Additionally, for any value vector v→, we assume:

P(P, (v1 v2⋯vn)) =⇒ P(D(P), (v2⋯vn)).   (2)

Furthermore, given any matrix P, let Σ be the set of the root
constructors of P’s first column.
If Σ is not empty, then for any constructor c not in Σ and
any value vector (w1⋯wa v2⋯vn), we assume:

P(D(P), (v2⋯vn)) =⇒ P(P, (c(w1, …, wa) v2⋯vn)).   (3)

If Σ is empty, then, for any value vector v→, we instead assume:

P(D(P), (v2⋯vn)) =⇒ P(P, (v1 v2⋯vn)).   (4)
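The key properties are phrased in terms of the specialization S(c, ·) and default D(·) transformations of Section 3. As a reminder, here is a hedged Haskell sketch of both, using our own pattern representation (wildcards, constructor patterns, or-patterns); the arity of c is passed explicitly, where a compiler would look it up.

-- Patterns: wildcard, constructor pattern, or-pattern.
data Pat = Wild | Con String [Pat] | Or Pat Pat
  deriving Show

type Row = [Pat]   -- a pattern vector (p1 ... pn)

-- S(c, P): rows whose first pattern is compatible with constructor c
-- (of arity a), with the a sub-patterns exposed.
specialize :: String -> Int -> [Row] -> [Row]
specialize c a = concatMap spec
  where
    spec (Con c' qs : ps)
      | c == c'          = [qs ++ ps]
      | otherwise        = []
    spec (Wild : ps)     = [replicate a Wild ++ ps]
    spec (Or q1 q2 : ps) = spec (q1 : ps) ++ spec (q2 : ps)
    spec []              = []

-- D(P): rows whose first pattern does not inspect its argument.
defaultMatrix :: [Row] -> [Row]
defaultMatrix = concatMap def
  where
    def (Con _ _ : _)   = []
    def (Wild : ps)     = [ps]
    def (Or q1 q2 : ps) = def (q1 : ps) ++ def (q2 : ps)
    def []              = []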
It is not obvious that assuming the key properties
suffices to prove that UP can be computed as U is,
since Ω does not appear in the proof of Proposition 2.
Indeed, monotonicity plays some part here.
Proposition 3
We have UP = Urec.
Proof:
Base cases follow from the basic restrictions, while the proofs of all
inductive cases in Proposition 2, except 2-(a), apply unchanged.
Hence, we assume q1 to be a wildcard and the set Σ to be
a complete signature.
We need to prove:

UP(P, ((_ : t) q2⋯qn)) = ⋁_{ck ∈ Σ} UP(S(ck, P), S(ck, q→)).

By (1), for any constructor ck in Σ and
any vector v→ such that v1 = ck(w1, …, wak), we have:

v→ ∈ MP(P, q→) ⇐⇒ S(ck, v→) ∈ MP(S(ck, P), S(ck, q→)).

Hence, a potential difficulty arises for vectors v→
in MP(P, q→) whose first component v1 is Ω.
Then, by the monotonicity of P (and of ≼) and the non-empty type axiom,
for any constructor c in the signature of
type t, there exists (w1⋯wa) such that
(c(w1, …, wa) v2⋯vn) ∈ MP(P, q→).
That is (by (1) in the forward direction), UP(S(ck, P),
S(ck, q→)) holds for all ck in Σ.
□
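To make this statement concrete, here is a hedged sketch of algorithm Urec itself, built on the specialize and defaultMatrix functions sketched above. The signatureOf function is a stand-in for the type information a real compiler would supply; we hard-wire Booleans only for the running example, and the or-pattern case below is the plain utility rule, not the refined treatment the paper develops for or-pattern warnings.

import Data.List (nub)

-- Stand-in for the type environment: all constructors (with arities)
-- of the type that constructor c belongs to.
signatureOf :: String -> [(String, Int)]
signatureOf c
  | c `elem` ["True", "False"] = [("True", 0), ("False", 0)]
  | otherwise                  = error "signatureOf: unknown constructor"

-- Root constructors (with arities) of P's first column.
rootCons :: [Row] -> [(String, Int)]
rootCons rows = nub (concatMap (roots . head) rows)
  where
    roots (Con c qs) = [(c, length qs)]
    roots Wild       = []
    roots (Or p1 p2) = roots p1 ++ roots p2

-- urec P q: is pattern vector q useful with respect to matrix P?
urec :: [Row] -> Row -> Bool
urec rows []       = null rows        -- base cases
urec rows (q : qs) = case q of
  Con c args ->                       -- first pattern is a constructor
    urec (specialize c (length args) rows) (args ++ qs)
  Or q1 q2 ->                         -- or-pattern: either branch useful
    urec rows (q1 : qs) || urec rows (q2 : qs)
  Wild ->                             -- first pattern is a wildcard
    let sigma    = rootCons rows
        complete = not (null sigma) &&
                   all (\(c, _) -> c `elem` map fst sigma)
                       (signatureOf (fst (head sigma)))
    in if complete                    -- complete signature: case 2-(a)
       then or [ urec (specialize c a rows) (replicate a Wild ++ qs)
               | (c, a) <- sigma ]
       else urec (defaultMatrix rows) qs   -- otherwise: default matrix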
As an immediate consequence of the proposition above, the
useless clause problem is now solved in the lazy case
(see second item in Proposition 1).
However, the exact formulation of exhaustiveness needs a slight change.
Reconsider Example 3.
case e of True -> 1 | _ -> 2
By definition of ≼, True ≼ Ω does not
hold, hence Ω cannot match the first clause.
However, P((True), Ω)
does not hold either, by monotonicity.
Hence, there is no row that filters value Ω and
Definition 4 would flag this matching as non-exhaustive,
a clear contradiction with our intuition of exhaustiveness.
Thus, we now directly define an exhaustive matrix P from the condition
UP(P, (_⋯_)) = False. That is,
P is exhaustive, if and only if for all vectors v→,
P(P, v→) does not hold.
By this new definition, the example is exhaustive:
for any value v in {Ω, False, True},
we have _ ≼ v, and thus P ≼ v.
Hence, by our first basic restriction, for all v, P(P, v) does not hold.
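With the urec sketch above, this revised exhaustiveness check is a single utility query on a row of wildcards; the encoding of the Example 3 matrix is ours.

-- P is exhaustive iff a row of n wildcards is useless w.r.t. P.
exhaustive :: [Row] -> Int -> Bool
exhaustive rows n = not (urec rows (replicate n Wild))

-- Example 3: rows (True) and (_); the match is exhaustive.
example3Exhaustive :: Bool
example3Exhaustive = exhaustive [[Con "True" []], [Wild]] 1  -- True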
4.2 Lazy pattern matching, à la Laville
Laville’s definition of lazy pattern matching Laville [1991] stems
directly from the need for a monotonic P:
if we decide that some term is evaluated enough not to match a
pattern, then we want this to remain true when the term is evaluated
further.
By definition, matrix P and value v→ are incompatible,
written P # v→, when
making v→ more precise cannot produce an instance of P. That is,
P # v→ means:

for all w→ such that v→ ≼Ω w→, we have P ⋠ w→.
Definition 10 (Lazy pattern matching (Laville))
Define P(P, v→) = P # v→ in the generic Definition 8.
The first basic restriction follows by letting
w→ be v→ in the definition of P #v→,
the second restriction follows from ∅ ⋠w→ for
all w→, and monotonicity is a
consequence of the transitivity of ≼Ω.
Incompatibility is the most general P in the following sense:
for any predicate P, any matrix P and
any value vector v→, we have:

P(P, v→) =⇒ P # v→.

For, if P and v→ are compatible (i.e. not incompatible), then there exists w→, with
v→ ≼Ω w→ and P ≼ w→.
Thus, by the first basic restriction, P(P, w→) does not hold,
and, by the monotonicity of P, P(P, v→) does not hold either.
Incompatibility is easily computed by the following rules.
c(p1, …, pa) # c′(v1, …, va′)    (where c ≠ c′)
c(p1, …, pa) # c(v1, …, va)      iff (p1⋯pa) # (v1⋯va)
(p1⋯pn) # (v1⋯vn)                iff there exists i ∈ [1…n], pi # vi
(p1∣p2) # v                      iff p1 # v and p2 # v
P # v→                           iff for all i ∈ [1…m], p→i # v→
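These rules transcribe directly; the sketch below uses our Pat and Value types (the zipWith assumes matching arities).

-- incompat p v: no refinement of v can be an instance of p.
incompat :: Pat -> Value -> Bool
incompat (Con c ps) (C c' vs)
  | c /= c'   = True                  -- clashing constructors
  | otherwise = incompatRow ps vs     -- same head: look inside
incompat (Or p1 p2) v = incompat p1 v && incompat p2 v
incompat _ _ = False                  -- wildcards and Ω never clash

-- A vector is incompatible as soon as one component is.
incompatRow :: [Pat] -> [Value] -> Bool
incompatRow ps vs = or (zipWith incompat ps vs)

-- P # v: every row of the matrix is incompatible with v.
incompatMat :: [Row] -> [Value] -> Bool
incompatMat rows vs = all (`incompatRow` vs) rows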
It is routine to show that incompatibility meets key properties.
Hence, by Proposition 3,
algorithm Urec is correct with respect to Laville’s semantics.
Laville’s definition is quite appealing as a good, implementation-independent
definition of lazy pattern matching. However, there is a
slight difficulty: for some matrices P, predicate P # v→ is not sequential
in v→, in the sense of Kahn and Plotkin [1978].
This means that its compilation on an
ordinary, sequential computer is problematic Maranget [1992], Sekar et al. [1992].
As a consequence, the Haskell committee adopted
another semantics for pattern matching.
Their definition is aware of the
presence of Ω and solves the difficulty by specifying
left-to-right testing order.
4.3 Pattern matching in Haskell
By interpreting the Haskell report Hudak et al. [1998] we
can formulate a pattern matching predicate for this
language. Matching can yield three different results: it may either
succeed, fail or diverge. Furthermore, matching of arguments is
performed left-to-right. We encode “success”, “failure” and
“divergence” by the three values T, F and ⊥, and
define the following H function.
H(_, v) = T
H(c(p1, …, pa), Ω) = ⊥
H(c(p1, …, pa), c′(v1, …, va′)) = F    (where c ≠ c′)
H(c(p1, …, pa), c(v1, …, va)) = H((p1⋯pa), (v1⋯va))
H((p1 p2⋯pn), (v1 v2⋯vn)) = H(p1, v1) ∧⊥ H((p2⋯pn), (v2⋯vn))
H((), ()) = T
Where the extended (left-to-right) boolean connectives
are defined as follows.

⊥ ∧⊥ x = ⊥    F ∧⊥ x = F    T ∧⊥ x = x
⊥ ∨⊥ x = ⊥    T ∨⊥ x = T    F ∨⊥ x = x
We ignore some Haskell patterns, such as irrefutable patterns.
We also ignore or-patterns at the moment.
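The H function and its lazy connectives are easy to transcribe; a sketch in our notation, writing Bot for ⊥ and omitting or-patterns as announced.

-- Three possible outcomes of Haskell matching.
data B3 = T | F | Bot deriving (Eq, Show)

-- Left-to-right connectives: the left argument is examined first,
-- so Bot on the left contaminates the whole result.
andB, orB :: B3 -> B3 -> B3
andB Bot _ = Bot
andB F   _ = F
andB T   x = x
orB  Bot _ = Bot
orB  T   _ = T
orB  F   x = x

-- H on one pattern and one value (or-patterns deferred to 4.4).
hPat :: Pat -> Value -> B3
hPat Wild       _     = T
hPat (Con _ _)  Omega = Bot
hPat (Con c ps) (C c' vs)
  | c /= c'   = F
  | otherwise = hRow ps vs
hPat (Or _ _)   _     = error "or-patterns: see Section 4.4"

-- H on vectors: left-to-right conjunction of the columns.
hRow :: [Pat] -> [Value] -> B3
hRow []       []       = T
hRow (p : ps) (v : vs) = hPat p v `andB` hRow ps vs
hRow _        _        = error "hRow: arity mismatch"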
From this definition one easily shows the following two properties
on vectors:
H(p→, v→) = T ⇐⇒ p→ ≼ v→
H(p→, v→) = F =⇒ p→ # v→ =⇒ p→ ⋠ v→.
We then interpret
the many program equivalences of section 3.17.3 in the Haskell report as
expressing a downward search for a pattern of which v→ is an
instance:

H(∅, v→) = F
H(P, v→) = H(p→1, v→) ∨⊥ H(P[2…m], v→).
Informally, H(P, v→ ) = T means “v→ is found to
match some row in P in the Haskell way”,
H(P, v→ ) = F means “no row of P is found to
be matched”, and H(P, v→ ) = ⊥ means “v→ is not
precise enough to make a clear decision”.
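In code, this downward search is a fold of the rows under ∨⊥, continuing the sketch above.

-- H on a matrix: scan the rows top-down under the lazy 'orB';
-- an empty matrix reports failure.
hMat :: [Row] -> [Value] -> B3
hMat []           _  = F
hMat (row : rest) vs = hRow row vs `orB` hMat rest vs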
We can now formulate the Haskell way of pattern matching
in our setting.
Definition 11 (Haskell pattern matching)
Define P(P, v→) to be H(P, v→) = F in the generic Definition 8.
One easily checks that predicate H(P, v→ ) = F meets all
basic restrictions and key properties
(decomposing along first columns is instrumental).
Hence, save for or-patterns, algorithm Urec also computes the utility
of pattern matching in Haskell.
4.4 Or-patterns in Haskell
As this work is partly dedicated to specific warnings
for or-patterns, we wish to enrich Haskell matching with or-patterns.
The H function is extended to consider or-patterns, sticking
to left-to-right bias:
H((p1∣p2), v) = H(p1, v) ∨⊥ H(p2, v).
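In the sketch of Section 4.3, this amounts to replacing the error clause of hPat by the following one.

-- Or-patterns: try the left alternative first, then the right.
hPat (Or p1 p2) v = hPat p1 v `orB` hPat p2 v
-- e.g. hPat (Or (Con "True" []) Wild) Omega evaluates to Bot.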
Semantical consequences are non-negligible, since the
equivalence H(p→, v→ ) = T ⇐⇒ p→ ≼ v→
does not hold any more, as can be seen by considering
H((True∣_), Ω) = ⊥.
However, the left-to-right implication still holds, and the
following definition of Haskell pattern matching makes sense.
Definition 12 (Haskell matching with or-patterns)
Let P be a pattern matrix and v→ be a value vector.
Vector v→ matches row i in P, if and only if the following
proposition holds:

H(P[1…i), v→) = F ∧ H(p→i, v→) = T.
From this definition of matching, we define the utility predicate UH
and the set of matching values MH as we did in Definition 6.
The definition above is not the application of the generic
Definition 8 to P(P, v→) = (H(P, v→) = F),
because we have written H(p→i, v→) = T
in place of the instance relation p→i ≼ v→. However,
as illustrated by the following lemma,
H(q→, v→) = T and q→ ≼ v→ are closely related.
Lemma 2
Let p be a pattern and v be a value such that H(p, v) = ⊥.
There exists a value w such that v ≼Ω w and H(p, w) ≠ ⊥.
Furthermore, if p ≼ v, then H(p, w) = T.
Proof:
We first prove the existence of w by induction on p.
-
If p = c(p1, … , pa), then, by hypothesis H(p, v) = ⊥,
we have two sub-cases.
- Value v is c(v1, …, va), with
  H(pi, vi) = ⊥ for i in some (non-empty) index set I.
  Applying the induction hypothesis
  to all such i yields values v′i such that
  vi ≼Ω v′i and H(pi, v′i) ≠ ⊥.
  Then, we define w = c(w1, …, wa) where wi = v′i
  for i ∈ I, and wi = vi otherwise.
- Otherwise, value v is Ω.
Let v′ be c(Ω, … , Ω).
If H(p, v′) is not ⊥, then we define w = v′.
Otherwise, we reason as in the previous case.
- If p = (q1∣q2), we have two sub-cases.
- If H(q1, v) = F and H(q2, v) = ⊥,
  then (by induction) there exists a value w, with v ≼Ω w,
  such that H(q2, w) ≠ ⊥.
  Since the result H(q1, v) = F is stable when v is made more precise,
  we still have H(q1, w) = F; hence, by definition of H,
  H(p, w) = H(q2, w) ≠ ⊥.
- If H(q1, v) = ⊥, then (by induction) there exists
  a value w′, such that v ≼Ω w′ and H(q1, w′) ≠ ⊥.
  If H(q1, w′) = T, we define w to be w′ and we conclude.
  Otherwise, H(q1, w′) = F, and thus
  H((q1∣q2), w′) = H(q2, w′).
  Then we conclude, either directly, or by induction in the case where
  H(q2, w′) = ⊥.
Additionally, H(p, w) = T holds under the extra hypothesis
p ≼ v, by H(p, w) = F =⇒ p ⋠ w and by the
monotonicity of ≼.
□
The lemma above suffices to relate
Haskell matching to generic lazy matching, and thus to compute the
utility of Haskell matching.
Proposition 4
We have UH = Urec.
Proof:
We note UH≼ the utility predicate that results from the generic
definition, taking P(P, v→) to be H(P, v→) = F.
From the generic Proposition 3, we have UH≼ = Urec.
(Formally, we check that predicate H(P, v→) = F
meets the key properties even when some of the patterns in P are
or-patterns.)
Then we show UH = UH≼.
From the implication
H(q→, v→ ) = T =⇒ q→ ≼ v→ ,
we have UH(P, q→ ) =⇒
UH≼(P,q→ );
the converse implication follows from Lemma 2.
□
It is time to clearly stress an important consequence of the
propositions U = Urec, UP = Urec and UH = Urec:
all our utility predicates are in fact equal.
This suggests a quite powerful and elegant “semantical”
proof technique, which we immediately demonstrate.
Lemma 3 (Irrelevance of column order)
Let P be a pattern matrix and q→ be a pattern vector.
By permuting the same columns in both P and q→ we get matrix P′
and vector q→′.
Then we have UH(P, q→) = UH(P′, q→′).
Proof:
Consider strict matching.
Since predicates
P ⋠v→ and q→ ≼ v→ do not depend on
column order, we have U(P, q→ ) = U(P′, q→ ′).
From UH = U, we conclude.
□
First observe that proving the irrelevance of column order for Haskell
matching by induction on matrix and pattern structure would be quite
cumbersome.
Also notice that the lemma above is not obvious,
since Haskell matching depends upon column order
in a strong sense.
For instance, let P, q→ , P′ and q→ ′ be as follows.
P = (True True)     q→ = (False _)
P′ = (True True)    q→′ = (_ False).
Matrix P′ (resp. vector q→ ′) is P (resp. q→ )
with columns swapped.
The sets of matching values are as follows.
MH(P, q→) = { (False Ω), (False True), (False False) }
MH(P′, q→′) = { (True False), (False False) }
Swapping the components of the elements of MH(P, q→ ) does
not yield MH(P′, q→ ′), since (Ω False)
does not belong to MH(P′, q→ ′).
However, some of the values of the MH sets above are related by the
permutation. Moreover, the equality UH=U can be seen as
telling us that there is at least one such value.
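These sets can be checked mechanically with the H sketch of Section 4.3, by enumerating the nine pairs of partial Booleans (this assumes our reconstruction of P and q→ above).

-- All partial Boolean values.
bools3 :: [Value]
bools3 = [Omega, C "True" [], C "False" []]

-- M_H(P, q): values v with H(P, v) = F and H(q, v) = T.
matchingValues :: [Row] -> Row -> [[Value]]
matchingValues rows q =
  [ [v1, v2] | v1 <- bools3, v2 <- bools3
             , hMat rows [v1, v2] == F
             , hRow q    [v1, v2] == T ]

-- matchingValues [[tt, tt]] [ff, Wild] and
-- matchingValues [[tt, tt]] [Wild, ff], with tt = Con "True" [] and
-- ff = Con "False" [], return exactly the two sets displayed above.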
From now on, we simply write U for any utility predicate, regardless
of semantics. We also write “algorithm U” for Urec.