You can download this blog post’s source (implemented in Coq using the HoTT library). Learn more about HoTTSQL by visiting our website.

## HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics

## Combinatorial Species and Finite Sets in HoTT

(Post by Brent Yorgey)

My dissertation was on the topic of combinatorial species, and specifically on the idea of using species as a foundation for thinking about generalized notions of algebraic data types. (Species are sort of dual to containers; I think both have interesting and complementary things to offer in this space.) I didn’t really end up getting very far into practicalities, instead getting sucked into a bunch of more foundational issues.

To use species as a basis for computational things, I wanted to first “port” the definition from traditional, set-theory-based, classical mathematics into a constructive type theory. HoTT came along at just the right time, and seems to provide exactly the right framework for thinking about a constructive encoding of combinatorial species.

For those who are familiar with HoTT, this post will contain nothing all that new. But I hope it can serve as a nice example of an “application” of HoTT. (At least, it’s more applied than research in HoTT itself.)

# Combinatorial Species

Traditionally, a species is defined as a functor $F : \mathbb{B} \to \mathbf{FinSet}$, where $\mathbb{B}$ is the groupoid of finite sets and bijections, and $\mathbf{FinSet}$ is the category of finite sets and (total) functions. Intuitively, we can think of a species as mapping finite sets of “labels” to finite sets of “structures” built from those labels. For example, the species of linear orderings (*i.e.* lists) maps the finite set of labels $\{1, 2, \dots, n\}$ to the size-$n!$ set of all possible linear orderings of those labels. Functoriality ensures that the specific identity of the labels does not matter: we can always coherently relabel things.
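As a small worked instance of the description above (the three-element label sets and the bijection $\sigma$ are chosen purely for illustration, and $L$ is used here as a name for the species of linear orderings):

```latex
% The species L of linear orderings applied to a 3-element label set.
L(\{a, b, c\}) = \{\, abc,\ acb,\ bac,\ bca,\ cab,\ cba \,\},
\qquad |L(\{a, b, c\})| = 3! = 6.

% Relabelling along the bijection \sigma with \sigma(a) = 1, \sigma(b) = 2, \sigma(c) = 3:
L(\sigma) : L(\{a, b, c\}) \to L(\{1, 2, 3\}),
\qquad L(\sigma)(bca) = 231.
```

Relabelling only renames the labels inside a structure; it never changes the shape of the structure itself.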

# Constructive Finiteness

So what happens when we try to define species inside a constructive type theory? The crucial piece is the groupoid $\mathbb{B}$ of finite sets and bijections: the thing that makes species interesting is that they have built into them a notion of bijective relabelling, and this is encoded by $\mathbb{B}$. The first problem we run into is how to encode the notion of a *finite* set, since the notion of finiteness is nontrivial in a constructive setting.

One might well ask why we even care about finiteness in the first place. Why not just use the groupoid of *all* sets and bijections? To be honest, I have asked myself this question many times, and I still don’t feel as though I have an entirely satisfactory answer. But what it seems to come down to is the fact that species can be seen as a categorification of generating functions. Generating functions over the semiring $R$ can be represented by functions $\mathbb{N} \to R$, that is, each natural number maps to some coefficient in $R$; each natural number, categorified, corresponds to (an equivalence class of) *finite* sets. Finite label sets are also important insofar as our goal is to actually use species as a basis for *computation*. In a computational setting, one often wants to be able to do things like enumerate all labels (*e.g.* in order to iterate through them, to do something like a map or fold). It will therefore be important that our encoding of finiteness actually has some computational content that we can use to enumerate labels.

Our first attempt might be to say that a finite set will be encoded as a type $A$ together with a bijection between $A$ and a canonical finite set of a particular natural number size. That is, assuming standard inductively defined types $\mathbb{N}$ and $\mathsf{Fin}$,

$$\Sigma (A : U).\ \Sigma (n : \mathbb{N}).\ A \simeq \mathsf{Fin}(n).$$

However, this is unsatisfactory, since defining a suitable notion of bijections/isomorphisms between such finite sets is tricky. Since $\mathbb{B}$ is supposed to be a groupoid, we are naturally led to try using equalities (*i.e.* paths) as morphisms, but this does not work with the above definition of finite sets. In $\mathbb{B}$, there are supposed to be $n!$ different morphisms between any two sets of size $n$. However, given any two same-size inhabitants of the above type, there is only *one* path between them; intuitively, this is because paths between $\Sigma$-types correspond to tuples of paths relating the components pointwise, and such paths must therefore preserve the *particular* relation to $\mathsf{Fin}(n)$. The only bijection which is allowed is the one which sends each element related to $i$ to the other element related to $i$, for each $i : \mathsf{Fin}(n)$.

So elements of the above type are not just finite sets, they are finite sets *with a total order*, and paths between them must be order-preserving; this is too restrictive. (However, this type is not without interest, and can be used to build a counterpart to L-species. In fact, I think this is exactly the right setting in which to understand the relationship between species and L-species, and more generally the difference between isomorphism and *equipotence* of species; there is more on this in my dissertation.)
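One way to see why there is at most one path, sketched under the assumption of univalence and the standard fact that singleton types $\Sigma (A : U).\ A = X$ are contractible (the chain below is an illustration, not taken from the dissertation):

```latex
\begin{aligned}
\Sigma (A : U).\ \Sigma (n : \mathbb{N}).\ A \simeq \mathsf{Fin}(n)
  &\simeq \Sigma (n : \mathbb{N}).\ \Sigma (A : U).\ A = \mathsf{Fin}(n)
     && \text{(reorder the components; univalence)} \\
  &\simeq \Sigma (n : \mathbb{N}).\ \mathbf{1}
     && \text{(singleton types are contractible)} \\
  &\simeq \mathbb{N}.
\end{aligned}
```

Since $\mathbb{N}$ is a set, two inhabitants of the untruncated type are connected by a path exactly when their sizes agree, and then by only one path, rather than by the $n!$ bijections we wanted.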

# Truncation to the Rescue

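We can fix things using propositional truncation. In particular, we define

$$U_F := \Sigma (A : U).\ \| \Sigma (n : \mathbb{N}).\ A \simeq \mathsf{Fin}(n) \|.$$

That is, a “finite set” is a type $A$ together with some *hidden* evidence that $A$ is equivalent to $\mathsf{Fin}(n)$ for some $n$. (I will sometimes abuse notation and write $A : U_F$ instead of $(A, p) : U_F$.) A few observations: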

- First, we can pull the size out of the propositional truncation, that is, $U_F \simeq \Sigma (A : U).\ \Sigma (n : \mathbb{N}).\ \| A \simeq \mathsf{Fin}(n) \|$. Intuitively, this is because if a set is finite, there is only one possible size it can have, so the evidence that it has that size is actually a mere proposition.
- More generally, I mentioned previously that we sometimes want to use the computational evidence for the finiteness of a set of labels, *e.g.* enumerating the labels in order to do things like maps and folds. It may seem at first glance that we cannot do this, since the computational evidence is now hidden inside a propositional truncation. But actually, things are exactly the way they should be: the point is that we can use the bijection hidden in the propositional truncation *as long as the result does not depend on the particular bijection we find there*. For example, we cannot write a function which returns the value of type $A$ corresponding to $0 : \mathsf{Fin}(n)$, since this reveals something about the underlying bijection; but we can write a function which finds the smallest value of $A$ (with respect to some linear ordering), by iterating through all the values of $A$ and taking the minimum.
- It is not hard to show that if $A : U_F$, then $A$ is a set (*i.e.* a 0-type) with decidable equality, since $A$ is equivalent to the 0-type $\mathsf{Fin}(n)$. Likewise, $U_F$ itself is a 1-type.
- Finally, note that paths between inhabitants of $U_F$ now do exactly what we want: a path $(A, p) = (B, q)$ is really just a path $A = B$ between 0-types, that is, a bijection, since $p = q$ trivially.
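The “as long as the result does not depend on the particular bijection” condition can be made precise by a standard fact about propositional truncation, stated here as a sketch (the names $X$, $B$, $h$, $g$ are introduced only for this illustration): a function into a set factors through the truncation as soon as it is weakly constant.

```latex
% Elimination from a propositional truncation into a set:
\mathsf{isSet}(B) \;\to\; (h : X \to B) \;\to\;
  \Big( \textstyle\prod_{x, y : X} h(x) = h(y) \Big) \;\to\;
  \Sigma (g : \| X \| \to B).\ \textstyle\prod_{x : X} g(|x|) = h(x)

% Instance matching the minimum example in the text:
%   X := \Sigma (n : \mathbb{N}).\ A \simeq \mathsf{Fin}(n), \qquad B := A,
%   h(n, e) := \text{the least element of } A \text{ found by enumerating } A \text{ along } e.
```

The minimum does not depend on the enumeration $e$, so $h$ is weakly constant and the hidden bijection may be used; “the element corresponding to $0$” does depend on $e$, so it may not.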

# Constructive Species

We can now define species in HoTT as functions of type $U_F \to U$, where $U_F$ is the type of finite sets. The main reason I think this is the Right Definition™ of species in HoTT is that functoriality comes for free! When defining species in set theory, one must say “a species is a functor, *i.e.* a pair of mappings satisfying such-and-such properties”. When constructing a particular species one must explicitly demonstrate the functoriality properties; since the mappings are just functions on sets, it is quite possible to write down mappings which are not functorial. But in HoTT, all functions are functorial with respect to paths, and we are using paths to represent the morphisms in $\mathbb{B}$, so any function of type $U_F \to U$ automatically has the right functoriality properties: it is literally impossible to write down an invalid species. Actually, in my dissertation I define species as functors between certain categories built from $U_F$ and $U$, but the point is that any function $U_F \to U$ can be automatically lifted to such a functor.
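Concretely, “functoriality for free” is the usual behaviour of transport in HoTT. The sketch below writes $U_F$ for the type of finite sets and $F : U_F \to U$ for a species (notation assumed for this illustration) and lists the standard transport laws:

```latex
% The action of a species F on a relabelling p is transport along p:
\mathsf{transport}^{F}(p) : F(A) \to F(B) \qquad \text{for } p : A =_{U_F} B

% Identities and composites are preserved:
\mathsf{transport}^{F}(\mathsf{refl}_A) = \mathsf{id}_{F(A)}
\qquad
\mathsf{transport}^{F}(p \cdot q) = \mathsf{transport}^{F}(q) \circ \mathsf{transport}^{F}(p)
```

These are exactly the functor laws, and they hold for every $F$ without any proof obligation on the person defining the species.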

Here’s another nice thing about the theory of species in HoTT. In HoTT, coends whose index categories are groupoids are just plain $\Sigma$-types. That is, if $\mathbb{C}$ is a groupoid, $\mathbb{D}$ a category, and $T : \mathbb{C}^{\mathrm{op}} \times \mathbb{C} \to \mathbb{D}$, then $\int^{C} T(C, C) = \Sigma (C : \mathbb{C}).\ T(C, C)$. In set theory, this coend would be a *quotient* of the corresponding $\Sigma$-type, but in HoTT the isomorphisms of $\mathbb{C}$ are required to correspond to paths, which automatically induce paths over the $\Sigma$-type which correspond to the necessary quotient. Put another way, we can define coends in HoTT as a certain HIT, but in the case that $\mathbb{C}$ is a groupoid we already get all the paths given by the higher path constructor anyway, so it is redundant. So, what does this have to do with species, I hear you ask? Well, several species constructions involve coends (most notably partitional product); since species are functors from a groupoid, the definitions of these constructions in HoTT are particularly simple. We again get the right thing essentially “for free”.

There’s lots more in my dissertation, of course, but these are a few of the key ideas specifically relating species and HoTT. I am far from being an expert on either, but am happy to entertain comments, questions, etc. I can also point you to the right section of my dissertation if you’re interested in more detail about anything I mentioned above.

## Parametricity and excluded middle

Exercise 6.9 of the HoTT book tells us that, assuming LEM, we can exhibit a function $f : \Pi_{X : \mathcal{U}} (X \to X)$ such that $f_{\mathbf{2}}$ is a non-identity function $\mathbf{2} \to \mathbf{2}$. I have proved the converse of this. Like in exercise 6.9, we assume univalence.

## Parametricity

In a typical functional programming career, at some point one encounters the notions of parametricity and free theorems.

Parametricity can be used to answer questions such as: is every function

f : forall x. x -> x

equal to the identity function? Parametricity tells us that this is true for System F.

However, this is a metatheoretical statement. Parametricity gives properties about the *terms* of a language, rather than proving *internally* that certain elements satisfy some properties.

So what can we prove internally about a polymorphic function $f : \Pi_{X : \mathcal{U}} (X \to X)$?

In particular, we can see that internal proofs (claiming that $f$ must be the identity function for every type $X$) *cannot* exist: exercise 6.9 of the HoTT book tells us that, assuming LEM, we can exhibit a function $f : \Pi_{X : \mathcal{U}} (X \to X)$ such that $f_{\mathbf{2}}$ is $\mathsf{flip}$, the non-identity automorphism of $\mathbf{2}$. (Notice that the proof of this is not quite as trivial as it may seem: LEM only gives us $P + \neg P$ if $P$ is a (mere) proposition (a.k.a. subsingleton). Hence, simple case analysis on $X \simeq \mathbf{2}$ does not work, because this is not necessarily a proposition.)

And given the fact that LEM is consistent with univalent foundations, this means that a proof that $f$ is the identity function cannot exist.

I have proved that LEM is exactly what is needed to get a polymorphic function that is not the identity on the booleans.

**Theorem.** If there is a function $f : \Pi_{X : \mathcal{U}} (X \to X)$ with $f_{\mathbf{2}} \neq \mathsf{id}_{\mathbf{2}}$, then LEM holds.

## Proof idea

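If $f_{\mathbf{2}} \neq \mathsf{id}_{\mathbf{2}}$, then by simply trying both elements $0_{\mathbf{2}}, 1_{\mathbf{2}} : \mathbf{2}$ we can find an explicit boolean $b : \mathbf{2}$ such that $f_{\mathbf{2}}(b) \neq b$. Without loss of generality, we can assume $f_{\mathbf{2}}(0_{\mathbf{2}}) \neq 0_{\mathbf{2}}$.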
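The “try both elements” step can be sketched in plain Coq. This is only an illustration under stated assumptions: it uses Coq's standard `bool` and `Prop`-valued equality rather than the HoTT setting of the post, the name `find_moved` is hypothetical, and the hypothesis is stated pointwise (in the HoTT proof, function extensionality, which follows from univalence, turns "not equal to the identity" into this pointwise form).

```coq
Require Import Bool.

(* If g : bool -> bool is not pointwise the identity, then testing
   both inputs yields an explicit boolean that g moves. *)
Lemma find_moved (g : bool -> bool) :
  ~ (forall b : bool, g b = b) -> { b : bool | g b <> b }.
Proof.
  intro H.
  (* Decide whether g fixes true. *)
  destruct (bool_dec (g true) true) as [Ht | Ht].
  - (* g fixes true: decide whether it also fixes false. *)
    destruct (bool_dec (g false) false) as [Hf | Hf].
    + (* g fixes both points, contradicting the hypothesis. *)
      exfalso. apply H. intros [|]; assumption.
    + exists false. exact Hf.
  - exists true. exact Ht.
Qed.
```

The case analysis is possible because equality on booleans is decidable; no instance of excluded middle is used.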

For the remainder of this analysis, let $P$ be an arbitrary proposition. Then we want to achieve $P + \neg P$, to prove LEM.

We will consider a type with three points, where we identify two points depending on whether $P$ holds. In other words, we consider the quotient of a three-element type, where the relation between two of those points is the proposition $P$.

I will call this space $\mathbf{3}_P$, and it can be defined as $\Sigma P + \mathbf{1}$, where $\Sigma P$ is the *suspension* of $P$. This particular way of defining the quotient, which is equivalent to a quotient of a three-point set, will make case analysis simpler to set up. (Note that suspensions are not generally quotients: we use the fact that $P$ is a proposition here.)

Notice that if $P$ holds, then $\mathbf{3}_P \simeq \mathbf{2}$, and also $(\mathbf{3}_P \simeq \mathbf{3}_P) \simeq \mathbf{2}$.

We will consider $f$ at the type $(\mathbf{3}_P \simeq \mathbf{3}_P)$ (*not* $\mathbf{3}_P$ itself!). Now the proof continues by defining

$$g := f_{\mathbf{3}_P \simeq \mathbf{3}_P}(\mathsf{ide}_{\mathbf{3}_P})$$

(where $\mathsf{ide}_{\mathbf{3}_P}$ is the equivalence given by the identity function on $\mathbf{3}_P$) and doing case analysis on $g(\mathsf{inr}(\star))$, and if necessary also on $g(\mathsf{inl}(x))$ for some elements $x : \Sigma P$. I do not believe it is very instructive to spell out all cases explicitly here. I wrote a more detailed note containing an explicit proof.

Notice that doing case analysis here is simply an instance of the induction principle for $+$. In particular, we do not require decidable equality of $\mathbf{3}_P$ (which would already give us $P + \neg P$, which is exactly what we are trying to prove).

For the sake of illustration, here is one case:

- $g(\mathsf{inr}(\star)) = \mathsf{inr}(\star)$: Assume $P$ holds. Then since $(\mathbf{3}_P \simeq \mathbf{3}_P) \simeq \mathbf{2}$, by transporting along an appropriate equivalence (namely the one that identifies $0_{\mathbf{2}}$ with $\mathsf{ide}_{\mathbf{3}_P}$) we get $g \neq \mathsf{ide}_{\mathbf{3}_P}$. But since $g$ is an equivalence for which $\mathsf{inr}(\star)$ is a fixed point, $g$ must be the identity everywhere, that is, $g = \mathsf{ide}_{\mathbf{3}_P}$, which is a contradiction. Hence $\neg P$ holds in this case.

I formalized this proof in Agda using the HoTT-Agda library.

## Acknowledgements

Thanks to Martín Escardó, my supervisor, for his support. Thanks to Uday Reddy for giving the talk on parametricity that inspired me to think about this.

## Colimits in HoTT

In this post, I would like to present two things:

- the small library about colimits that I formalized in Coq,
- a construction of the image of a function as a colimit, which is essentially a sliced version of the result that Floris van Doorn talked about on this blog recently, together with further improvements.

I present my hott-colimits library in the first part. This part is quite easy but I hope that the library could be useful to some people. The second part is more original. Let’s sketch it.

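Given a function $f : A \to B$ we can construct a diagram

$$A \longrightarrow \mathsf{KP}(f_0) \longrightarrow \mathsf{KP}(f_1) \longrightarrow \mathsf{KP}(f_2) \longrightarrow \cdots$$

where the HIT $\mathsf{KP}$ is defined by:

    HIT KP f :=
    | kp : A -> KP f
    | kp_eq : forall x x', f(x) = f(x') -> kp(x) = kp(x').

and where $f_0 := f$ and $f_{n+1} : \mathsf{KP}(f_n) \to B$ is defined recursively from $f_n$. We call this diagram the iterated kernel pair of $f$. The result is that the colimit of this diagram is $\Sigma_{y : B}\, \| \mathsf{fib}_f(y) \|$, the image of $f$ ($\mathsf{fib}_f(y)$ is the homotopy fiber of $f$ in $y$).

It generalizes Floris’ result in the following sense: if we consider the unique arrow $f : A \to \mathbf{1}$ (where $\mathbf{1}$ is Unit) then $\mathsf{KP}(f)$ is the one-step truncation of $A$ and the colimit is equivalent to the truncation $\| A \|$ of $A$.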

We then go further. Indeed, this HIT doesn’t respect the homotopy levels at all: even $\mathsf{KP}$ of the unique map $\mathbf{1} \to \mathbf{1}$ is the circle. We try to address this issue by considering a HIT that takes care of already existing paths:

    HIT KP' f :=
    | kp : A -> KP' f
    | kp_eq : forall x x', f(x) = f(x') -> kp(x) = kp(x')
    | kp_eq_1 : forall x, kp_eq (refl (f x)) = refl (kp x).

This HIT avoids adding new paths when some elements are already equal, and turns out to better respect homotopy levels: it at least respects hProps. See below for the details.

Besides, there is another interesting thing about this HIT: we can sketch a link between the iterated kernel pair using $\mathsf{KP'}$ and the Čech nerve of a function. We outline this in the last paragraph.

All the following is joint work with Kevin Quirin and Nicolas Tabareau (from the CoqHoTT project), but also with Egbert Rijke, who visited us.

All our results are formalized in Coq. The library is available here:

https://github.com/SimonBoulier/hott-colimits

# Colimits in HoTT

In homotopy type theory, Type, the type of all types, can be seen as an ∞-category. We seek to calculate some homotopy limits and colimits in this category. The article of Jeremy Avigad, Krzysztof Kapulkin and Peter LeFanu Lumsdaine explains how to calculate limits over graphs using sigma types. For instance, an equalizer of two functions $f, g : A \to B$ is $\Sigma_{x : A}\, f(x) = g(x)$.
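The equalizer-as-sigma-type recipe can be sketched in plain Coq. This is a minimal illustration, assuming Coq's standard `sigT` and `Prop`-valued equality in place of HoTT paths; the names `equalizer`, `equalizer_map`, `equalizer_eq` and `equalizer_factor` are hypothetical and not taken from any library.

```coq
(* The equalizer of f and g: points of A on which f and g agree. *)
Definition equalizer {A B : Type} (f g : A -> B) : Type :=
  { x : A & f x = g x }.

(* The canonical map from the equalizer into A ... *)
Definition equalizer_map {A B : Type} {f g : A -> B}
  (e : equalizer f g) : A := projT1 e.

(* ... which equalizes f and g. *)
Definition equalizer_eq {A B : Type} {f g : A -> B}
  (e : equalizer f g) : f (equalizer_map e) = g (equalizer_map e) :=
  projT2 e.

(* Existence half of the universal property: any map h that
   equalizes f and g factors through the equalizer. *)
Definition equalizer_factor {A B X : Type} {f g : A -> B}
  (h : X -> A) (H : forall x, f (h x) = g (h x)) : X -> equalizer f g :=
  fun x => existT (fun a => f a = g a) (h x) (H x).
```

The second component of the pair is the proof that the two functions agree, which is why no quotienting or higher constructor is needed on the limit side.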

The colimits over graphs are computed in the same way with Higher Inductive Types instead of sigma types. For instance, the coequalizer of two functions is

    HIT Coeq (f g: A -> B) : Type :=
    | coeq : B -> Coeq f g
    | cp : forall x, coeq (f x) = coeq (g x).

In both cases there is a severe restriction: we don’t know how to compute limits and colimits over diagrams which are much more complicated than those generated by graphs (below we use an extension to “graphs with compositions”, which is proposed in exercise 7.16 of the HoTT book, but those diagrams remain quite poor).

We first define the type of graphs and diagrams, as in the HoTT book (exercise 7.2) or in the hott-limits library of Lumsdaine *et al.*:

    Record graph := {
      G_0 :> Type ;
      G_1 :> G_0 -> G_0 -> Type
    }.

    Record diagram (G : graph) := {
      D_0 :> G -> Type ;
      D_1 : forall {i j : G}, G i j -> (D_0 i -> D_0 j)
    }.

And then, a cocone over a diagram $D$ into a type $Q$:

    Record cocone {G: graph} (D: diagram G) (Q: Type) := {
      q : forall (i: G), D i -> Q ;
      qq : forall (i j: G) (g: G i j) (x: D i), q j (D_1 g x) = q i x
    }.

Let $C$ be a cocone into $Q$ and $f$ be a function $Q \to Q'$. Then we can extend $C$ to a cocone into $Q'$ by postcomposition with $f$. It gives us a function

$$\mathsf{postcompose\_cocone}\ C\ Q' : (Q \to Q') \to \mathsf{cocone}\ D\ Q'.$$

A cocone $C$ is said to be universal if, for every other cocone $C'$ over the same diagram, $C'$ can be obtained uniquely by extension of $C$, which we translate as:

    Definition is_universal (C: cocone D Q) :=
      forall (Q': Type), IsEquiv (postcompose_cocone C Q').

Last, a type $Q$ is said to be a colimit of the diagram $D$ if there exists a universal cocone over $D$ into $Q$.
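Extension of a cocone by postcomposition can be sketched in plain Coq as follows. This is an illustration under stated assumptions, not the library's code: it restates the records without coercions or implicit field binders, uses Coq's `Prop`-valued equality instead of HoTT paths, and sets implicit arguments with `Arguments`. The name `postcompose_cocone` follows the post; everything else about its definition here is a guess at the obvious construction.

```coq
Record graph := { G_0 : Type ; G_1 : G_0 -> G_0 -> Type }.

Record diagram (G : graph) := {
  D_0 : G_0 G -> Type ;
  D_1 : forall i j : G_0 G, G_1 G i j -> D_0 i -> D_0 j
}.
Arguments D_0 {G} _ _.
Arguments D_1 {G} _ {i j} _ _.

Record cocone {G : graph} (D : diagram G) (Q : Type) := {
  q  : forall i : G_0 G, D_0 D i -> Q ;
  qq : forall (i j : G_0 G) (g : G_1 G i j) (x : D_0 D i),
         q j (D_1 D g x) = q i x
}.
Arguments q {G D Q} _ _ _.
Arguments qq {G D Q} _ _ _ _ _.

(* Extend a cocone into Q to a cocone into Q' along f : Q -> Q'.
   The legs are composed with f, and the commutation proofs are
   pushed through f with f_equal. *)
Definition postcompose_cocone {G : graph} {D : diagram G} {Q : Type}
  (C : cocone D Q) (Q' : Type) (f : Q -> Q') : cocone D Q' :=
  {| q  := fun i x => f (q C i x) ;
     qq := fun i j g x => f_equal f (qq C i j g x) |}.
```

Universality of `C` then says that `postcompose_cocone C Q'` is an equivalence for every `Q'`, which is what `is_universal` expresses in the library with `IsEquiv`.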

## Existence

The existence of the colimit over a diagram is given by the HIT:

    HIT colimit (D: diagram G) : Type :=
    | colim : forall (i: G), D i -> colimit D
    | eq : forall (i j: G) (g: G i j) (x: D i), colim j (D_1 g x) = colim i x.

Of course, $\mathsf{colimit}\ D$ is a colimit of $D$.

## Functoriality and Uniqueness

### Diagram morphisms

Let $D_1$ and $D_2$ be two diagrams over the same graph $G$. A morphism of diagrams is defined by:

    Record diagram_map (D1 D2 : diagram G) := {
      map_0 : forall i, D1 i -> D2 i ;
      map_1 : forall i j (g: G i j) x, D_1 D2 g (map_0 i x) = map_0 j (D_1 D1 g x)
    }.

We can compose diagram morphisms and there is an identity morphism. We say that a morphism $m$ is an equivalence of diagrams if all functions $m_i$ are equivalences. In that case, we can define the inverse of $m$ (reversing the proofs of commutation), and check that it is indeed an inverse for the composition of diagram morphisms.

### Precomposition

We have already defined forward extension of a cocone by postcomposition; we now define backward extension. Given a diagram morphism $m : D_1 \Rightarrow D_2$, we can make every cocone over $D_2$ into a cocone over $D_1$ by precomposition with $m$. It gives us a function $\mathsf{cocone}\ D_2\ Q \to \mathsf{cocone}\ D_1\ Q$.

We check that precomposition and postcomposition respect the identity and the composition of morphisms. And then, we can show that the notions of universality and colimits are stable under equivalence.
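The identity and composition of diagram morphisms, and precomposition of a cocone by a morphism, can be sketched in plain Coq. As before this is a hedged illustration: the records are restated without coercions, equality is Coq's `Prop`-valued `=`, and the names `diagram_idmap`, `diagram_comp` and `precompose_cocone` are hypothetical.

```coq
Record graph := { G_0 : Type ; G_1 : G_0 -> G_0 -> Type }.

Record diagram (G : graph) := {
  D_0 : G_0 G -> Type ;
  D_1 : forall i j : G_0 G, G_1 G i j -> D_0 i -> D_0 j
}.
Arguments D_0 {G} _ _.
Arguments D_1 {G} _ {i j} _ _.

Record cocone {G : graph} (D : diagram G) (Q : Type) := {
  q  : forall i : G_0 G, D_0 D i -> Q ;
  qq : forall (i j : G_0 G) (g : G_1 G i j) (x : D_0 D i),
         q j (D_1 D g x) = q i x
}.
Arguments q {G D Q} _ _ _.
Arguments qq {G D Q} _ _ _ _ _.

Record diagram_map {G : graph} (D1 D2 : diagram G) := {
  map_0 : forall i : G_0 G, D_0 D1 i -> D_0 D2 i ;
  map_1 : forall (i j : G_0 G) (g : G_1 G i j) (x : D_0 D1 i),
            D_1 D2 g (map_0 i x) = map_0 j (D_1 D1 g x)
}.
Arguments map_0 {G D1 D2} _ _ _.
Arguments map_1 {G D1 D2} _ _ _ _ _.

(* Identity morphism: every square commutes by reflexivity. *)
Definition diagram_idmap {G : graph} (D : diagram G) : diagram_map D D :=
  {| map_0 := fun i x => x ;
     map_1 := fun i j g x => eq_refl |}.

(* Composition: paste the two commuting squares. *)
Definition diagram_comp {G : graph} {D1 D2 D3 : diagram G}
  (n : diagram_map D2 D3) (m : diagram_map D1 D2) : diagram_map D1 D3 :=
  {| map_0 := fun i x => map_0 n i (map_0 m i x) ;
     map_1 := fun i j g x =>
       eq_trans (map_1 n i j g (map_0 m i x))
                (f_equal (map_0 n j) (map_1 m i j g x)) |}.

(* Backward extension: a cocone over D2 becomes a cocone over D1. *)
Definition precompose_cocone {G : graph} {D1 D2 : diagram G}
  (m : diagram_map D1 D2) {Q : Type} (C : cocone D2 Q) : cocone D1 Q :=
  {| q  := fun i x => q C i (map_0 m i x) ;
     qq := fun i j g x =>
       eq_trans (f_equal (q C j) (eq_sym (map_1 m i j g x)))
                (qq C i j g (map_0 m i x)) |}.
```

In the HoTT setting the commutation proofs are paths that carry data, so the checks that these operations respect identity and composition are genuine lemmas rather than automatic consequences of proof irrelevance.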

### Functoriality of colimits

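Let $m : D_1 \Rightarrow D_2$ be a diagram morphism and $Q_1$ and $Q_2$ two colimits of $D_1$ and $D_2$. Let’s denote by $C_1$ and $C_2$ the universal cocones into $Q_1$ and $Q_2$. Then, we can get a function $Q_1 \to Q_2$ given by:

$$(\mathsf{postcompose\_cocone}\ C_1\ Q_2)^{-1}\,(C_2 \circ m),$$

where $C_2 \circ m$ denotes the cocone over $D_1$ into $Q_2$ obtained by precomposing $C_2$ with $m$.

We check that if $m$ is an equivalence of diagrams then the function $Q_2 \to Q_1$ given by $m^{-1}$ is indeed an inverse of this function.

As a consequence, we get:

The colimits of two equivalent diagrams are equivalent.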

### Uniqueness

In particular, if we consider the identity morphism we get:

Let $Q_1$ and $Q_2$ be two colimits of the same diagram, then: $Q_1 \simeq Q_2$.

So, if we assume univalence, the colimit of a diagram is truly unique!

## Commutation with sigmas

Let $Y$ be a type and, for all $y : Y$, $D^y$ a diagram over a graph $G$. We can then build a new diagram over $G$ whose objects are the $\Sigma_{y : Y}\, D^y_i$ and whose functions are induced by the identity on the first component and by $D^y_g$ on the second one. Let’s denote this diagram by $\Sigma D$.

Similarly, from a family of cocones $C^y$ over $D^y$ into $Q^y$, we can make a cocone over $\Sigma D$ into $\Sigma_{y : Y}\, Q^y$.

We proved the following result, which we believe to be quite nice:

If, for all $y : Y$, $Q^y$ is a colimit of $D^y$, then $\Sigma_{y : Y}\, Q^y$ is a colimit of $\Sigma D$.
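The sigma diagram and the induced cocone can be sketched in plain Coq. This is an illustration only: the records are restated without coercions, equality is Coq's `Prop`-valued `=`, and the names `sigma_diagram` and `sigma_cocone` are hypothetical. The universality statement itself is not proved here.

```coq
Record graph := { G_0 : Type ; G_1 : G_0 -> G_0 -> Type }.

Record diagram (G : graph) := {
  D_0 : G_0 G -> Type ;
  D_1 : forall i j : G_0 G, G_1 G i j -> D_0 i -> D_0 j
}.
Arguments D_0 {G} _ _.
Arguments D_1 {G} _ {i j} _ _.

Record cocone {G : graph} (D : diagram G) (Q : Type) := {
  q  : forall i : G_0 G, D_0 D i -> Q ;
  qq : forall (i j : G_0 G) (g : G_1 G i j) (x : D_0 D i),
         q j (D_1 D g x) = q i x
}.
Arguments q {G D Q} _ _ _.
Arguments qq {G D Q} _ _ _ _ _.

(* Objects are sigma types over Y; arrows act as the identity on the
   first component and as the arrow of (D y) on the second. *)
Definition sigma_diagram {G : graph} {Y : Type} (D : Y -> diagram G)
  : diagram G :=
  {| D_0 := fun i => { y : Y & D_0 (D y) i } ;
     D_1 := fun (i j : G_0 G) (g : G_1 G i j)
                (u : { y : Y & D_0 (D y) i }) =>
       existT (fun y => D_0 (D y) j)
              (projT1 u)
              (D_1 (D (projT1 u)) g (projT2 u)) |}.

(* A family of cocones gives a cocone over the sigma diagram. *)
Definition sigma_cocone {G : graph} {Y : Type}
  {D : Y -> diagram G} {Q : Y -> Type}
  (C : forall y : Y, cocone (D y) (Q y))
  : cocone (sigma_diagram D) (sigT Q) :=
  {| q  := fun (i : G_0 G) (u : { y : Y & D_0 (D y) i }) =>
       existT Q (projT1 u) (q (C (projT1 u)) i (projT2 u)) ;
     qq := fun (i j : G_0 G) (g : G_1 G i j)
               (u : { y : Y & D_0 (D y) i }) =>
       f_equal (existT Q (projT1 u))
               (qq (C (projT1 u)) i j g (projT2 u)) |}.
```

Because the first component is never changed by the arrows, the commutation proof of each fibre cocone can be reused directly, which is what makes the commutation result plausible.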

# Iterated Kernel Pair

## First construction

Let’s first recall the result of Floris. An attempt to define the propositional truncation is the following:

    HIT {_} (A: Type) :=
    | α : A -> {A}
    | e : forall (x x': A), α x = α x'.

Unfortunately, in general $\{A\}$ is not a proposition; the path constructor $e$ is not strong enough. But we have the following result:

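Let $A$ be a type. Let’s consider the following diagram:

$$A \longrightarrow \{A\} \longrightarrow \{\{A\}\} \longrightarrow \cdots$$

Then, $\| A \|$ is a colimit of this diagram.

Let’s generalize this result to a function $f : A \to B$ (we will recover the theorem by considering the unique function $A \to \mathbf{1}$).

Let $f : A \to B$. We denote by $\mathsf{KP}(f)$ the colimit of the kernel pair of $f$:

$$A \times_B A \underset{\pi_2}{\overset{\pi_1}{\rightrightarrows}} A$$

where the pullback $A \times_B A$ is given by $\Sigma_{x, x' : A}\, f(x) = f(x')$.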

Hence, $\mathsf{KP}(f)$ is the following HIT:

    Inductive KP f :=
    | kp : A -> KP f
    | kp_eq : forall x x', f(x) = f(x') -> kp(x) = kp(x').

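Let’s consider the following cocone:

$$A \times_B A \underset{\pi_2}{\overset{\pi_1}{\rightrightarrows}} A \xrightarrow{\ f\ } B$$

we get a function $\mathsf{KP}(f) \to B$ by universality (another point of view is to say that this function is defined by the recursion principle of $\mathsf{KP}(f)$, sending $\mathsf{kp}(x)$ to $f(x)$ and $\mathsf{kp\_eq}(x, x', p)$ to $p$).

Then, iteratively, we can construct the following diagram:

$$A \xrightarrow{\ \mathsf{kp}\ } \mathsf{KP}(f_0) \xrightarrow{\ \mathsf{kp}\ } \mathsf{KP}(f_1) \xrightarrow{\ \mathsf{kp}\ } \mathsf{KP}(f_2) \xrightarrow{\ \mathsf{kp}\ } \cdots$$

in which every object maps to $B$ (via $f_0, f_1, f_2, \dots$ respectively), where $f_0 := f$ and $f_{n+1} : \mathsf{KP}(f_n) \to B$ is the function obtained from $f_n$ as above.

The iterated kernel pair of $f$ is the subdiagram

$$A \longrightarrow \mathsf{KP}(f_0) \longrightarrow \mathsf{KP}(f_1) \longrightarrow \mathsf{KP}(f_2) \longrightarrow \cdots$$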

We proved the following result:

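The colimit of this diagram is $\Sigma_{y : B}\, \| \mathsf{fib}_f(y) \|$, the image of $f$.

The proof is a slicing argument to come down to Floris’ result. It uses all the properties of colimits that we talked about above. The idea is to show that those three diagrams are equivalent.

$$A \longrightarrow \mathsf{KP}(f_0) \longrightarrow \mathsf{KP}(f_1) \longrightarrow \cdots$$

$$\Sigma_{y : B}\, \mathsf{fib}_{f_0}(y) \longrightarrow \Sigma_{y : B}\, \mathsf{fib}_{f_1}(y) \longrightarrow \Sigma_{y : B}\, \mathsf{fib}_{f_2}(y) \longrightarrow \cdots$$

$$\Sigma_{y : B}\, \mathsf{fib}_{f}(y) \longrightarrow \Sigma_{y : B}\, \{\mathsf{fib}_{f}(y)\} \longrightarrow \Sigma_{y : B}\, \{\{\mathsf{fib}_{f}(y)\}\} \longrightarrow \cdots$$

Going from the first line to the second just applies the equivalence $X \simeq \Sigma_{y : B}\, \mathsf{fib}_h(y)$ (for $h : X \to B$) at each type. Going from the second to the third is more involved; we don’t detail it here. And $\Sigma_{y : B}\, \| \mathsf{fib}_f(y) \|$ is indeed the colimit of the last line: by commutation with sigmas it is sufficient to show that for all $y$, $\| \mathsf{fib}_f(y) \|$ is the colimit of the diagram

$$\mathsf{fib}_f(y) \longrightarrow \{\mathsf{fib}_f(y)\} \longrightarrow \{\{\mathsf{fib}_f(y)\}\} \longrightarrow \cdots$$

which is exactly Floris’ result!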

The details are available here.

## Second construction

The previous construction has a small defect: it does not respect the homotopy level at all. For instance, $\mathsf{KP}$ of the unique map $\mathbf{1} \to \mathbf{1}$ is the circle $\mathbb{S}^1$. Hence, to compute $\| \mathbf{1} \|$ (which is $\mathbf{1}$ of course), we go through very complex types.

We found a way to improve this: adding identities!

Indeed, the proof keeps working if we replace $\mathsf{KP}$ by $\mathsf{KP'}$, which is defined by:

    Inductive KP' f :=
    | kp : A -> KP' f
    | kp_eq : forall x x', f(x) = f(x') -> kp(x) = kp(x')
    | kp_eq_1 : forall x, kp_eq (refl (f x)) = refl (kp x).

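$\mathsf{KP'}(f)$ can be seen as a “colimit with identities” of the following diagram:

$$A \times_B A \underset{\pi_2}{\overset{\pi_1}{\rightrightarrows}} A, \qquad \delta : A \to A \times_B A$$

(♣)

with $\pi_1 \circ \delta = \pi_2 \circ \delta = \mathsf{id}_A$.

In his article, Floris explains that, when $p : a = b$, then $\mathsf{ap}_{\alpha}(p)$ and $e(a, b)$ are not equal. But now they become equal: by path induction we bring back to $\mathsf{kp\_eq\_1}$. That is, if two elements are already equal, we don’t add any path between them.

And indeed, this new HIT respects the homotopy level better, at least in the following sense:

- $\mathsf{KP'}$ of the unique map $\mathbf{1} \to \mathbf{1}$ is $\mathbf{1}$ (meaning that the one-step truncation of a contractible type is now $\mathbf{1}$),
- If $f$ is an embedding (in the sense that $\mathsf{ap}_f : x = y \to f(x) = f(y)$ is an equivalence for all $x, y$) then so is the induced function $\mathsf{KP'}(f) \to B$. In particular, if $A$ is an hProp then so is $\mathsf{KP'}(A \to \mathbf{1})$ (meaning that the one-step truncation of an hProp is now itself).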

## Toward a link with the Čech nerve

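Although we have not succeeded in making it precise, there are several hints which suggest a link between the iterated kernel pair and the Čech nerve of a function.

The Čech nerve of a function $f : A \to B$ is a generalization of its kernel pair: it is the simplicial object

$$\cdots \;\; A \times_B A \times_B A \;\; \overset{\to}{\underset{\to}{\to}} \;\; A \times_B A \;\; \rightrightarrows \;\; A$$

(the degeneracies are not drawn but they are present).

We will call n-truncated Čech nerve the diagram restricted to the first n+1 objects:

$$\underbrace{A \times_B \cdots \times_B A}_{n+1} \;\; \cdots \;\; A \times_B A \;\; \rightrightarrows \;\; A$$

(degeneracies still here).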

The kernel pair (♣) is then the 1-truncated Čech nerve.

We wonder to what extent $\mathsf{KP'}(f_n)$ could be the colimit of the (n+1)-truncated Čech nerve. We are far from having such a proof but we succeeded in proving:

- that $\mathsf{KP'}(f_0)$ is the colimit of the kernel pair (♣),
- and that there is a cocone over the 2-truncated Čech nerve into $\mathsf{KP'}(f_1)$

(both in the sense of “graphs with compositions”, see exercise 7.16 of the HoTT book).

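The second point is quite interesting because it makes the path concatenation appear. We don’t detail exactly how, but to build a cocone over the 2-truncated Čech nerve into a type $C$, $C$ must have a certain compatibility with the path concatenation. $\mathsf{KP'}(f)$ doesn’t have such a compatibility: if $p : f(a) = f(b)$ and $q : f(b) = f(c)$, in general we do **not** have

$$\mathsf{kp\_eq}_{f}(p \cdot q) = \mathsf{kp\_eq}_{f}(p) \cdot \mathsf{kp\_eq}_{f}(q)$$

in $\mathsf{kp}(a) = \mathsf{kp}(c)$.

On the contrary, $\mathsf{KP'}(f_1)$ **has** the required compatibility: we can prove that

$$\mathsf{kp\_eq}_{f_1}(p \cdot q) = \mathsf{kp\_eq}_{f_1}(p) \cdot \mathsf{kp\_eq}_{f_1}(q)$$

in $\mathsf{kp}(\mathsf{kp}(a)) = \mathsf{kp}(\mathsf{kp}(c))$.

($\mathsf{kp\_eq}_{f_1}(p)$ has indeed the type $\mathsf{kp}(\mathsf{kp}(a)) = \mathsf{kp}(\mathsf{kp}(b))$ because $f_1(\mathsf{kp}(a))$ is $f(a)$ and then $p : f_1(\mathsf{kp}(a)) = f_1(\mathsf{kp}(b))$.)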

This fact is quite surprising. The proof is basically getting an equation with a transport with apD and then making the transport into a path concatenation (see the file *link_KPv2_CechNerve.v* of the library for more details).

## Questions

Many questions are left open. To what extent is $\mathsf{KP'}(f_n)$ linked with the (n+1)-truncated diagram? Could we use the idea of the iterated kernel pair to define a groupoid object internally? Indeed, in an ∞-topos every groupoid object is effective (by Giraud’s axioms) and is then the Čech nerve of its colimit…

## The Lean Theorem Prover

Lean is a new player in the field of proof assistants for Homotopy Type Theory. It is being developed by Leonardo de Moura working at Microsoft Research, and it is still under active development for the foreseeable future. The code is open source, and available on Github.

You can install it on Windows, OS X or Linux. It will come with a useful mode for Emacs, with syntax highlighting, on-the-fly syntax checking, autocompletion and many other features. There is also an online version of Lean which you can try in your browser. The on-line version is quite a bit slower than the native version and it takes a little while to load, but it is still useful to try out small code snippets. You are invited to test the code snippets in this post in the on-line version. You can run code by pressing shift+enter.

In this post I’ll first say more about the Lean proof assistant, and then talk about the considerations for the HoTT library of Lean (Lean has two libraries, the standard library and the HoTT library). I will also cover our approach to higher inductive types. Since Lean is not mature yet, things mentioned below can change in the future.

Update January 2017: the newest version of Lean currently doesn’t support HoTT, but there is a frozen version which does support HoTT. The newest version is available here, and the frozen version is available here. To use the frozen version, you will have to compile it from the source code yourself.

## Real-cohesive homotopy type theory

Two new papers have recently appeared online:

- Brouwer’s fixed-point theorem in real-cohesive homotopy type theory by me, and
- Adjoint logic with a 2-category of modes, by Dan Licata with a bit of help from me.

Both of them have fairly chatty introductions, so I’ll try to restrain myself from pontificating at length here about their contents. Just go read the introductions. Instead I’ll say a few words about how these papers came about and how they are related to each other.

## A new class of models for the univalence axiom

First of all, in case anyone missed it, Chris Kapulkin recently wrote a guest post at the n-category cafe summarizing the current state of the art regarding “homotopy type theory as the internal language of higher categories”.

I’ve just posted a preprint which improves that state a bit, providing a version of “Lang(*C*)” containing univalent strict universes for a wider class of (∞,1)-toposes *C*: