Paramorph you DSL: C++

综合编程 2017-03-31

Two months back, we went over the process of building a DSL for arithmetic operations and we introduced the concept of Catamorphism as a way to decouple the traversal of an AST
from the operations we want to perform on it.

We saw how it could help us compose operations
before traversing the AST of our DSL, leading to more efficient composition and better testability.

We did this exercise in Haskell, introducing the concept of Catamorphism through its wonderful type system. Then we tried it in Clojure only to see it was a simple post-walk tree traversal. We ended up by exploring the limits of its applicability in C++.

The full series of post is available below for reference:

This post will build on top of the Catamorph your DSL: C++ Port
post. We will introduce, through a motivated use case, another very close and useful recursion scheme, Paramorphisms, and try to implement them in C++.

Reminder: Our Arithmetic DSL

Our Arithmetic DSL allows to build expression composed of operations like addition and multiplication of integer constants and integers variables. An arithmetic expression is one of the following:

  • A constant value (and integer for simplicity)
  • A variable (identified by a string for simplicity)
  • An addition of a vector of sub-expressions
  • A multiplication of a vector of sub-expressions

Because an AST alone is not very useful, we added several interpreters on our DSL to play with it. We implemented each of them in in the Catamorph your DSL: C++ Port
post:

  • prn
    to pretty print our Arithmetic expressions
  • eval
    to compute the value of an expression, given the value of all variables
  • optimize
    to simplify and optimize our expression
  • partial
    to partially evaluate our expression, given the value of some variables
  • dependencies
    to retrieve not yet resolved variables from our expression

Reminder: Recursion Schemes

Looking at our interpreters, we noticed in our previous post that they all had in common the need to traverse the full AST. We used Catamorphisms to factorize this concern out of our interpreters.

This section is a short overview of what Catamorphism are for and how we implemented them in C++. You can skip it if you feel comfortable with the notion already.

Open recursion on types

To use Catamorphisms, we first need to generalize our Arithmetic expression
and make it an open recursive data type: expression_r
templated on a parameter type R
, which represents the type of the sub-expressions.

An expression
of our DSL is then defined such as the template parameter R
is itself an expression. In more formal terms, expression
is the fixed point of the expression_r
type. We can compute it using CRTP
and boost::recursive_wrapper
.

Functor instance

We made our expression_r
type an instance of a Functor by implementing a fmap
function for our type. Note that by Functor here, we mean the mathematical construct
, and not function objects
.

The transformation map
given to fmap
applies on the template parameter of the expression_r
and may modify its type. In other words, fmap
allows to transform the sub-expressions.

Note: because the function provided to fmap can have different types as input and output, the type of the template parameter of expression_r can change under fmap, and with it, the nature of the recursion.

Generic tree traversal

We then defined the cata
function, which implements the Catamorphism recursion scheme. It takes what we will call an algebra
as parameter, and returns a transformation on a whole AST.

The algebra is a function that takes an expression_r
templated on Out
, and outputs an Out
value.

The result of cata
on the algebra (curried) is a function that applies on a full expression
and returns an Out
value:

For instance, the following print_alg
algebra transforms an expression_r
templated on string into a string. Its goal is to implement a “pretty print” of the expression at one stage of the expression alone.

Given to cata
, it will become an function that transforms a complete expression
into a string, basically being promoted to operate on all stages of the expression.

The use case that breaks the pattern

Every single technique we use in Software Engineering has its limits. Outside of these limits, the technique will either not apply or might not be the best tool for the job. Catamorphisms are no exception to this rule.

Because Catamorphisms are a way to factorize a particular recursion scheme, it cannot apply for recursions that do not follow the given pattern. We will now explore one use case that demonstrates these limits.

Infix notation

Let us imagine that the client of our arithmetic DSL does not like the prefix notation of our pretty print function. Instead, he would like the arithmetic expressions to be pretty printed in infix notation. For example, 1 + y + x * 3
instead of (+ 1 y (* x 3))
.

Now, and contrary to the prefix notation which follows a pretty simple parenthesizing scheme, pretty printing in infix notation requires us to be careful. A wrong use of parentheses could change the meaning of the expression.

A first attempt

As a first approximation, we could use the following strategy for the placement of the parentheses in our infix notation:

  • Systematically surround the arguments of a multiplication with parentheses
  • Never surround the arguments of an addition with parentheses

Implementing the pretty printer that follows this strategy can be done using Catamorphisms. We use Boost
to make the code simpler:

We can try this first implementation on our test arithmetic expression:

What we get is is a correct result: the pretty printed expression has indeed the right meaning. But there are so much unneeded parentheses that the expression is really hard to read. We have to do much better.

Improving parenthesizing

To get better results, we need our infix notation to avoid adding useless pairs of parentheses. We know how to fix this: a multiplication only needs to add parentheses around sub-expressions that correspond to additions.

Unfortunately, Catamorphisms are not a powerful enough recursion scheme to implement this logic. The algebra given to the Catamorphism has no access to the sub-expressions, only their resulting string.

As a consequence, there is no way to know whether the expression was previously an addition (except by parsing it, which would truly be awful). The Catamorphism has failed us here: we need to turn to a different recursion scheme.

From Catamorphism to Paramorphism

The previous use case demonstrated us that we need the information on whether a sub-expression is an addition when dealing with the multiplication.

We will turn to the recursion scheme known as Paramorphism
, which is like a Catamorphism but with added contextual information.

Paramorphisms

Paramorphisms are pretty close to Catamorphisms. The goal is identical: it turns a local transformation on an open recursive data structure into a global transformation on the fixed point of this data structure.

The difference is the prototype of the algebra given as parameter to the recursion scheme. Instead of taking the open recursive type parameterized on the output type of the transformation, the algebra takes a an open recursive type parameterized on a pair:

  • The first element of the pair is the output of the sub-expression transformation
  • The second element of the pair is the sub-expression before the transformation

In more concrete terms, and applied to our arithmetic expression, here are the prototypes for the algebra of catamorphism and paramorphism:

The second element of the pair is the contextual information. In our specific case, it will provide us with the information about the sub-expression needed to make the right parenthesizing choice.

Paramorphism implementation

We will now implement the para
function, whose goal is to transform a local algebra into a global transformation on an arithmetic expression.

The implementation is very similar to the one of the Catamorphism. The only modification concerns the lambda provided to the fmap
function, which needs to return a pair instead of a single value:

Implementing the infix notation

We now have enough contextual information to implement our infix notation properly. For each sub-expression of addition and multiplication, our algebra has access to:

  • The pretty printed sub-expression (first argument of the pair)
  • The sub-expression itself (second argument of the pair)

We can therefore implement the pretty printing of the multiplication such that it adds parentheses around addition sub-expressions only.

We can try this second implementation on our test arithmetic expression. This result is clearly much better. The resulting infix notation has no unnecessary parentheses left:

Conclusion

This was the second post dedicated to the implementation of the recursion schemes often used in Haskell into C++.

Through a motivating use case, we discovered the limits of the Catamorphism recursion scheme, and learned about a new strictly more powerful one: Paramorphism.

Although typically unused in C++, we managed to provide a concise implementation of it. The result is clearly not as short and beautiful than in Haskell, but has the merit to show that it is not outside of the expressiveness range of C++.

You can access the full code of the DSL, along with all its associated interpreters, in this Gist
or alternatively in Ideone
.

您可能感兴趣的

Google开源了Abseil,为C++和Python开发提供支持... Google公开了其项目内部使用的一系列C++库,随后还会公开其Python库。 Abseil已在Google历经十多年的开发,它的目的是为Google编程人员在各种项目上的工作需求提供支持,这些项目包括Protocol Buffers、gRPC和TensorFlow等。Google评价Abse...
C++并发高级接口:std::async和std::future std::async和std::future std::async 创建一个后台线程执行传递的任务,这个任务只要是callable object均可,然后返回一个 std::future 。future储存一个多线程共享的状态,当调用future.get时会阻塞直到绑定的task...
Solve the “Find the squares in C#” puzzle This post shows how to solve the puzzle Puzzle: Find the squares in C# . The program howto_square_puzzle_solution uses the following code to ...
Java Vs C++ | Difference between C++ and Java 1. Java vs C++ In this Java tutorial, we are going to study Comparision between Java vs C++, which is better. Moreover, we will discuss the major...
再论接口 如果说,类的设计思路,是以数据为基础的纵向组织结构,只有唯一的分类方式,有相同基类的,就意味着其相似性,共同点都体现在基类上;那么,接口就是以功能以性质从横向上,来看待类的相似性,并且存在无数的横向视角(否则就失去意义)。 静态面向对象语言,这里不考虑 template , c+...
0
Deque

责编内容来自:Deque (本文源链)
阅读提示:酷辣虫无法对本内容的真实性提供任何保证,请自行验证并承担相关的风险与后果!
本站遵循[CC BY-NC-SA 4.0]。如您有版权、意见投诉等问题,请通过eMail联系我们处理。