Two months back, we went over the process of building a DSL for arithmetic operations and we introduced the concept of Catamorphism as a way to decouple the traversal of an AST
from the operations we want to perform on it.
We saw how it could help us compose operations
before traversing the AST of our DSL, leading to more efficient composition and better testability.
We did this exercise in Haskell, introducing the concept of Catamorphism through its wonderful type system. Then we tried it in Clojure only to see it was a simple post-walk tree traversal. We ended up by exploring the limits of its applicability in C++.
The full series of post is available below for reference:
- Catamorph your DSL: Introduction
(Haskell) - Catamorph your DSL: Deep Dive
(Haskell) - Catamorph your DSL: Trade-offs
(Haskell) - Catamorph your DSL: Clojure
- Catamorph your DSL: C++ Port
This post will build on top of the Catamorph your DSL: C++ Port
post. We will introduce, through a motivated use case, another very close and useful recursion scheme, Paramorphisms, and try to implement them in C++.
Reminder: Our Arithmetic DSL
Our Arithmetic DSL allows to build expression composed of operations like addition and multiplication of integer constants and integers variables. An arithmetic expression is one of the following:
- A constant value (and integer for simplicity)
- A variable (identified by a string for simplicity)
- An addition of a vector of sub-expressions
- A multiplication of a vector of sub-expressions
Because an AST alone is not very useful, we added several interpreters on our DSL to play with it. We implemented each of them in in the Catamorph your DSL: C++ Port
post:
- prn
to pretty print our Arithmetic expressions - eval
to compute the value of an expression, given the value of all variables - optimize
to simplify and optimize our expression - partial
to partially evaluate our expression, given the value of some variables - dependencies
to retrieve not yet resolved variables from our expression
Reminder: Recursion Schemes
Looking at our interpreters, we noticed in our previous post that they all had in common the need to traverse the full AST. We used Catamorphisms to factorize this concern out of our interpreters.
This section is a short overview of what Catamorphism are for and how we implemented them in C++. You can skip it if you feel comfortable with the notion already.
Open recursion on types
To use Catamorphisms, we first need to generalize our Arithmetic expression
and make it an open recursive data type: expression_r
templated on a parameter type R
, which represents the type of the sub-expressions.
An expression
of our DSL is then defined such as the template parameter R
is itself an expression. In more formal terms, expression
is the fixed point of the expression_r
type. We can compute it using CRTP
and boost::recursive_wrapper
.
Functor instance
We made our expression_r
type an instance of a Functor by implementing a fmap
function for our type. Note that by Functor here, we mean the mathematical construct
, and not function objects
.
The transformation map
given to fmap
applies on the template parameter of the expression_r
and may modify its type. In other words, fmap
allows to transform the sub-expressions.
Note: because the function provided to fmap can have different types as input and output, the type of the template parameter of expression_r can change under fmap, and with it, the nature of the recursion.
Generic tree traversal
We then defined the cata
function, which implements the Catamorphism recursion scheme. It takes what we will call an algebra
as parameter, and returns a transformation on a whole AST.
The algebra is a function that takes an expression_r
templated on Out
, and outputs an Out
value.
The result of cata
on the algebra (curried) is a function that applies on a full expression
and returns an Out
value:
For instance, the following print_alg
algebra transforms an expression_r
templated on string into a string. Its goal is to implement a “pretty print” of the expression at one stage of the expression alone.
Given to cata
, it will become an function that transforms a complete expression
into a string, basically being promoted to operate on all stages of the expression.
The use case that breaks the pattern
Every single technique we use in Software Engineering has its limits. Outside of these limits, the technique will either not apply or might not be the best tool for the job. Catamorphisms are no exception to this rule.
Because Catamorphisms are a way to factorize a particular recursion scheme, it cannot apply for recursions that do not follow the given pattern. We will now explore one use case that demonstrates these limits.
Infix notation
Let us imagine that the client of our arithmetic DSL does not like the prefix notation of our pretty print function. Instead, he would like the arithmetic expressions to be pretty printed in infix notation. For example, 1 + y + x * 3
instead of (+ 1 y (* x 3))
.
Now, and contrary to the prefix notation which follows a pretty simple parenthesizing scheme, pretty printing in infix notation requires us to be careful. A wrong use of parentheses could change the meaning of the expression.
A first attempt
As a first approximation, we could use the following strategy for the placement of the parentheses in our infix notation:
- Systematically surround the arguments of a multiplication with parentheses
- Never surround the arguments of an addition with parentheses
Implementing the pretty printer that follows this strategy can be done using Catamorphisms. We use Boost
to make the code simpler:
We can try this first implementation on our test arithmetic expression:
What we get is is a correct result: the pretty printed expression has indeed the right meaning. But there are so much unneeded parentheses that the expression is really hard to read. We have to do much better.
Improving parenthesizing
To get better results, we need our infix notation to avoid adding useless pairs of parentheses. We know how to fix this: a multiplication only needs to add parentheses around sub-expressions that correspond to additions.
Unfortunately, Catamorphisms are not a powerful enough recursion scheme to implement this logic. The algebra given to the Catamorphism has no access to the sub-expressions, only their resulting string.
As a consequence, there is no way to know whether the expression was previously an addition (except by parsing it, which would truly be awful). The Catamorphism has failed us here: we need to turn to a different recursion scheme.
From Catamorphism to Paramorphism
The previous use case demonstrated us that we need the information on whether a sub-expression is an addition when dealing with the multiplication.
We will turn to the recursion scheme known as Paramorphism
, which is like a Catamorphism but with added contextual information.
Paramorphisms
Paramorphisms are pretty close to Catamorphisms. The goal is identical: it turns a local transformation on an open recursive data structure into a global transformation on the fixed point of this data structure.
The difference is the prototype of the algebra given as parameter to the recursion scheme. Instead of taking the open recursive type parameterized on the output type of the transformation, the algebra takes a an open recursive type parameterized on a pair:
- The first element of the pair is the output of the sub-expression transformation
- The second element of the pair is the sub-expression before the transformation
In more concrete terms, and applied to our arithmetic expression, here are the prototypes for the algebra of catamorphism and paramorphism:
The second element of the pair is the contextual information. In our specific case, it will provide us with the information about the sub-expression needed to make the right parenthesizing choice.
Paramorphism implementation
We will now implement the para
function, whose goal is to transform a local algebra into a global transformation on an arithmetic expression.
The implementation is very similar to the one of the Catamorphism. The only modification concerns the lambda provided to the fmap
function, which needs to return a pair instead of a single value:
Implementing the infix notation
We now have enough contextual information to implement our infix notation properly. For each sub-expression of addition and multiplication, our algebra has access to:
- The pretty printed sub-expression (first argument of the pair)
- The sub-expression itself (second argument of the pair)
We can therefore implement the pretty printing of the multiplication such that it adds parentheses around addition sub-expressions only.
We can try this second implementation on our test arithmetic expression. This result is clearly much better. The resulting infix notation has no unnecessary parentheses left:
Conclusion
This was the second post dedicated to the implementation of the recursion schemes often used in Haskell into C++.
Through a motivating use case, we discovered the limits of the Catamorphism recursion scheme, and learned about a new strictly more powerful one: Paramorphism.
Although typically unused in C++, we managed to provide a concise implementation of it. The result is clearly not as short and beautiful than in Haskell, but has the merit to show that it is not outside of the expressiveness range of C++.
You can access the full code of the DSL, along with all its associated interpreters, in this Gist
or alternatively in Ideone
.