The Expressive C++17 Coding Challenge in D

You might have seen that I have been coding a lot in D lately. A few weeks ago there was the Expressive C++17 Coding Challenge, and with its solutions in C++ and Rust now being public, I thought this was an excellent opportunity to show why I like D so much.

The requirements

Let me first recap the requirements of this challenge:

This command line tool should accept the following arguments:

  • the filename of a CSV file,
  • the name of the column to overwrite in that file,
  • the string that will be used as a replacement for that column,
  • the filename where the output will be written.

./program <input.csv> <column> <replacement> <output.csv>

Example input

Given this simple CSV as input

name,surname,city,country
Adam,Jones,Manchester,UK
Joe,Doe,Cracow,Poland
Michael,Smith,Paris,France
Alex,McNeil,Gdynia,Poland

the program called with:

./program input.csv city London output.csv

should write:

name,surname,city,country
Adam,Jones,London,UK
Joe,Doe,London,Poland
Michael,Smith,London,France
Alex,McNeil,London,Poland

Sounds fairly trivial, right? Please have a short look first at the “best” C++ and Rust solutions before you look at the D solution.

The solution

Okay, so here’s one way to solve this in D. If you are scared at the moment – don’t worry. I will explain it line by line below.

#!/usr/bin/env rdmd
import std.algorithm, std.exception, std.format, std.range, std.stdio;
void main(string[] args) {
    enforce(args.length == 5, "Invalid args\n" ~
    "./tool <input.csv> <column> <replacement> <output.csv>");
    auto inFile = args[1], columnName = args[2], replacement = args[3], outFile = args[4];
    auto lines = File(inFile).byLine.map!(a => a.splitter(","));
    auto colIndex = lines.front.countUntil(columnName);
    enforce(colIndex >= 0, "Invalid column. Valid columns: %(%s, %)".format(lines.front));
    auto os = File(outFile, "w");
    // %-( ... %) prints the fields unquoted, so the header is written back as plain CSV
    os.writefln("%-(%s,%)", lines.front);
    foreach (line; lines.dropOne) {
        auto r = line
            .enumerate
            .map!(a => a.index == colIndex ? replacement : a.value)
            .joiner(",");
        os.writeln(r);
    }
}

1) What is #!/usr/bin/env rdmd?

One of my favorite aspects of D is that it has a blazingly fast compiler. Period. I can compile the entire D front-end of the compiler (~200 kLoC) in less than two seconds, or the entire standard library, with lots and lots of compile-time function evaluation and templates and > 300 kLoC, in 5 seconds from scratch without any cache or incremental compilation.

This means that the compiler is almost as fast as an interpreter, and rdmd is the tool that allows handy usage of D as a “pseudo-interpreted” language. You can invoke rdmd with any file and it will automatically figure out all required files based on your dependencies and pass them to the compiler.

It’s very popular in the D community because for small scripts one doesn’t even notice that the program is compiled to real machine code under the hood. Also, if the shebang header is added and the file is executable, D programs can be used as if they were script files:

./main.d input.csv city London output.csv

2) So you import a bunch of libraries. What do they do?

import std.algorithm, std.exception, std.format, std.range, std.stdio;

In short, std.stdio is for input and output, std.range is about D’s magic streams called “ranges”, and std.algorithm abstracts on top of them and provides generic interfaces for a lot of sophisticated algorithms.

Moreover, std.exception offers methods for working with exceptions like enforce, and finally std.format bundles methods for string formatting.

Don’t worry – the functionality imported from these modules will be explained soon.

3) Your program has a main function. What’s so special about it compared to C or C++?

void main(string[] args) {
    …

For starters, arrays in D have a length. Try:

args[5].writeln;

Unlike C/C++ with its null-terminated strings and raw arrays, this won’t segfault. It just throws a nice Error:

core.exception.RangeError@./main.d(10): Range violation
----------------
??:? _d_arrayboundsp [0x43b622]
prog.d:9 void main.foo(immutable(char)[][]) [0x43ac93]
prog.d:4 _Dmain [0x43ac67]

Oh, so D performs automatic bounds-checking before accessing the memory. Isn’t that expensive?

It’s almost negligible compared to the safety it buys, but D is a language for everyone, so the people who want to squeeze out the last cycles of their processor can do so by simply compiling with -boundscheck=off (for obvious reasons this isn’t recommended).

In D, strings are arrays too, and there’s another nice property of D’s arrays: they are only a view on the actual memory. Assigning or passing an array doesn’t copy its contents, only the view on them (in D this is called a slice).

Consider this example:

int[] arr = [1, 2, 3];
auto bArr = arr[1 .. $];
bArr[] += 2; // this is a vectorized operation
arr.writeln; // [1, 4, 5]

There are many other things D has learned from C and C++. Walter has recently written a great article on how D helps to vanquish forever these bugs that blasted your kingdom, which I highly recommend if you have a C/C++ background.

4) What’s up with this enforce?

enforce(args.length == 5, "Invalid args.\n" ~
"./tool <input.csv> <column> <replacement> <output.csv>");

I have never seen the ~ operator before!

It’s the string concatenation (or, more generally, array concatenation) operator. How often have you encountered code like a + b and needed to know the types of a and b to know whether it’s an addition or a concatenation?
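As a small illustration of the ~ operator on strings and on other arrays, here is a minimal sketch. The function names fullName and withExtra are made up for this example and are not part of the solution above.

```d
// Hypothetical helpers, only meant to illustrate the ~ operator.

// Joins two name parts with a space; ~ always means concatenation.
string fullName(string first, string last)
{
    return first ~ " " ~ last;
}

// Returns a new array with one more element; ~ works on any array type.
int[] withExtra(int[] values, int extra)
{
    return values ~ extra;
}
```

Calling fullName("Adam", "Jones") yields "Adam Jones" and withExtra([1, 2], 3) yields [1, 2, 3], while 1 + 2 stays plain arithmetic.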

Why don’t you use an if statement and terminate the program explicitly?

if (args.length != 5) {
    writeln("Invalid args.");
    writeln("./tool <input.csv> <column> <replacement> <output.csv>");
    return 1;
}

That’s valid D too. D allows a lot of different programming styles, but this article is intended to highlight a few specific D styles like enforce.


enforce is a function defined in std.exception and throws an exception if its first argument has a falsy value.

Hmm, I looked at the documentation and saw this monster. I thought it simply throws an exception?

auto enforce(E : Throwable = Exception, T)(T value, lazy string msg = null, string file = __FILE__, size_t line = __LINE__)

I don’t have the time to fully dive into D’s syntax, but auto instructs the compiler to infer the return type for you. This leads to the interesting Voldemort return types, so called because they can’t be named by the user, but that’s a good topic for another article.

The next part, (E : Throwable = Exception, T), looks a bit complicated, but don’t worry yet. It means that E is a template parameter which needs to inherit from Throwable (the root of all exceptions) and is Exception by default. T is the template type of value.

Wait. I just instantiated a template without specifying its template parameters?

Yes, the D compiler does all the hard work for you. The technical term is Implicit Function-Template Instantiation (IFTI). Of course, we could have instructed enforce to throw a custom exception, but more on template instantiation later.
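To make the “custom exception” remark concrete, here is a minimal sketch. ColumnException and checkedIndex are hypothetical names invented for this illustration; only enforce and assertThrown come from std.exception.

```d
import std.exception : assertThrown, enforce;

// Hypothetical exception type used only for this illustration.
class ColumnException : Exception
{
    this(string msg, string file = __FILE__, size_t line = __LINE__)
    {
        super(msg, file, line);
    }
}

// Returns the index unchanged, or throws ColumnException if it is negative.
ptrdiff_t checkedIndex(ptrdiff_t colIndex)
{
    // E is given explicitly here; T (bool) is still inferred via IFTI.
    enforce!ColumnException(colIndex >= 0, "Invalid column");
    return colIndex;
}
```

With this, checkedIndex(2) simply returns 2, while checkedIndex(-1) throws a ColumnException instead of a plain Exception.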

Alright. So this function takes a generic value and a msg, but a lazy string msg?

lazy is a special keyword in D and tells the compiler to defer the evaluation of an argument expression until it is actually needed.

I don’t understand. msg seems to be a string concatenation of two strings. Isn’t this done before the enforce is called?

"Invalid args.\n" ~ "./tool <input.csv> <column> <replacement> <output.csv>"

No, lazy is lazy: the string concatenation doesn’t happen at the call site, but only when the callee explicitly requests it.

It gets a bit clearer if we look at the second enforce:

enforce(colIndex >= 0, "Invalid column name. Valid are: %(%s, %)".format(lines.front));

format and all the expensive work of formatting the error message are never executed on the default path, but only if an exception actually gets thrown. Ignore the %(%s, %) formatting string for a bit, it will be explained soon.

Ok, but how does that work?

In short: the compiler does a few smart lowerings for you and creates an anonymous lambda. For more details, see this advanced article about D’s lazy.
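The effect of lazy can be observed with a small counter. This is only a sketch: evaluations, expensiveMessage and describe are hypothetical names, and describe is a much simplified stand-in for what enforce does with its message.

```d
// Counts how often the message expression is actually evaluated.
int evaluations;

// Stands in for an expensive computation such as format(...).
string expensiveMessage()
{
    evaluations++;
    return "something went wrong";
}

// Hypothetical mini version of enforce's message handling:
// msg is only evaluated if the failure branch reads it.
string describe(bool ok, lazy string msg)
{
    return ok ? "ok" : msg;
}
```

After describe(true, expensiveMessage()) the counter is still 0. Only a call with false as the first argument runs expensiveMessage and increments it.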

But there’s more magic here. What’s __FILE__ and __LINE__?

string file = __FILE__, size_t line = __LINE__

Remember that D is a compiled language and accessing the stack isn’t as easy as asking the interpreter nicely. These two default arguments are automatically set by the compiler with the file and line number of the caller. This is important for logging or throwing exceptions like we have done here.

So an API author can simply say “Hey, I would like to know the line number of my caller.” and doesn’t depend on the user hacking the replacements as it’s done in C/C++ with preprocessor macros:

#ifdef SPDLOG_DEBUG_ON
#define SPDLOG_DEBUG(logger, ...) logger->debug(__VA_ARGS__)  << " (" << __FILE__ << " #" << __LINE__ <<")";
#else
#define SPDLOG_DEBUG(logger, ...)
#endif

In fact, D doesn’t even have a preprocessor.
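For comparison, a D function that wants to know its caller only needs default arguments. The following is a minimal sketch; callerLocation is a hypothetical helper, not something from the standard library.

```d
import std.conv : to;

// Hypothetical helper: the defaults are filled in at the call site,
// so file and line describe the caller, not this function.
string callerLocation(string file = __FILE__, size_t line = __LINE__)
{
    return file ~ ":" ~ line.to!string;
}
```

A plain callerLocation() returns the file name and line number of the line it was called from, and both values can still be passed explicitly, e.g. callerLocation("a.d", 7).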

5) auto and a statically typed language

auto inFile = args[1], columnName = args[2], replacement = args[3], outFile = args[4];

Hmm, but what’s auto? I thought D has a static type system?

Yes, D is statically typed, but the compiler is pretty smart, so we can let it do all the hard work for us. auto is a filler word for the compiler that means “whatever the type of the assignment, use this as the type of this variable”.
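That auto variables still have fixed static types can be checked at compile time. The sketch below uses a hypothetical function name, inferredTypesAreStatic, and relies only on static assert, typeof and __traits(compiles).

```d
// Hypothetical demo function; all checks below happen at compile time.
bool inferredTypesAreStatic()
{
    auto count = 42;
    auto name = "city";
    auto ratio = 0.5;
    auto columns = ["name", "surname"];

    // The inferred types are ordinary static types.
    static assert(is(typeof(count) == int));
    static assert(is(typeof(name) == string));
    static assert(is(typeof(ratio) == double));
    static assert(is(typeof(columns) == string[]));

    // Assigning a string to an int variable does not compile.
    static assert(!__traits(compiles, count = "oops"));
    return true;
}
```

If any of the inferred types were different, the program would not compile at all.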

6) What the hell is UFCS?

auto lines = File(inFile).byLine.map!(a => a.splitter(","));

One of the major features of D is the Unified Function Call Syntax (UFCS). In short, the compiler will look up a function in the current namespace if it’s not found as a member function of a type, but let’s go through this step by step.

I looked at the documentation of File and it has a method byLine. So where’s the magic?

Have another look at map, it’s located in std.algorithm.

Okay, wait. How does this work?

The compiler internally rewrites the expression File.byLine.map to the following:

map(File.byLine());

Missing parentheses are allowed too – after all the compiler knows that the symbol is a function.

Okay, but what’s up with this !(a => a.splitter(","))?

! is similar to C++/Java’s <> and allows instantiating a template. In this case the template argument is the lambda function a => a.splitter(","). Notice that UFCS is used again for splitter, and your brain might be more used to reading splitter(a, ",") for now.
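To see that UFCS is purely a matter of notation, here is a minimal sketch that writes the same computation twice. The names twice, doubledUfcs and doubledClassic are hypothetical and only serve this illustration.

```d
import std.algorithm : map;
import std.array : array;

// A plain free function; it is not a member of int in any way.
int twice(int x)
{
    return x * 2;
}

// UFCS style: reads left to right like a pipeline.
int[] doubledUfcs(int[] xs)
{
    return xs.map!(a => a.twice).array;
}

// Classic style: the same calls, nested inside out.
int[] doubledClassic(int[] xs)
{
    return array(map!(a => twice(a))(xs));
}
```

Both functions return [2, 4, 6] for [1, 2, 3]; the compiler treats xs.map!(...) and map!(...)(xs) as the same call.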

7) Ranges

Okay, to recap: we have read the input file line by line and split every line at the commas (,).

Wouldn’t this result in a lot of unnecessary allocation?

The short answer is: D uses “iterators on steroids” which are lazy and work is only done when explicitly requested. Usually range algorithms don’t even require any heap allocation as everything is done on the stack.
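The laziness can be made visible by counting how often the mapped function runs. This is a sketch under stated assumptions: calls, expensiveDouble, callsAfterBuilding and firstTwoDoubled are hypothetical names, and the final .array is only there to force evaluation.

```d
import std.algorithm : map;
import std.array : array;
import std.range : take;

// Counts how often the mapping function actually runs.
int calls;

int expensiveDouble(int x)
{
    calls++;
    return x * 2;
}

// Building the range does no work: the counter stays at zero.
int callsAfterBuilding(int[] xs)
{
    calls = 0;
    auto lazyRange = xs.map!expensiveDouble;
    return calls;
}

// Work happens only for the elements that are actually pulled.
int[] firstTwoDoubled(int[] xs)
{
    return xs.map!expensiveDouble.take(2).array;
}
```

For a five-element input, callsAfterBuilding reports 0, and firstTwoDoubled never touches the elements beyond the ones it takes.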

For example, in the next line .front returns the first line, through which countUntil explicitly iterates:

auto colIndex = lines.front.countUntil(columnName);

So lines.front looks something like:

["name", "surname", "city", "country"]

countUntil will return the index of the first match or -1 otherwise. It’s a bit similar to the indexOf function known from e.g. JavaScript, but it accepts a template. So we could have supplied a custom predicate function:

lines.front.countUntil!(a => a.endsWith("ty"));
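Both forms of countUntil can be tried on an in-memory header instead of a file. In this sketch, columnIndex and firstEndingWith are hypothetical wrappers; countUntil and endsWith are the std.algorithm functions used above.

```d
import std.algorithm : countUntil, endsWith;

// Index of the column with exactly this name, or -1 if there is none.
ptrdiff_t columnIndex(string[] header, string name)
{
    return header.countUntil(name);
}

// Index of the first column whose name ends with the given suffix, or -1.
ptrdiff_t firstEndingWith(string[] header, string suffix)
{
    return header.countUntil!(a => a.endsWith(suffix));
}
```

For the header ["name", "surname", "city", "country"], columnIndex(header, "city") is 2, columnIndex(header, "zip") is -1, and firstEndingWith(header, "ty") is 2.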

8) std.format and compile-time checking of parameters

The next lines are:

enforce(colIndex >= 0, "Invalid column name. Valid are: %(%s, %)".format(lines.front));
auto os = File(outFile, "w");
os.writefln("%-(%s,%)", lines.front);

I have never seen writefln("%-(%s,%)"). What happens here?


writefln is just a handy wrapper around D’s format function. format itself provides a lot of options for serialization, but it’s very similar to printf, although it does provide a few goodies like the special syntax for arrays, %(%s, %).

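This syntax opens an array formatting “scope” with %( and closes it with %). Within this array “scope” the elements are formatted with %s (their string serialization), and whatever follows it, here the comma, is used as a delimiter between the elements. String elements are quoted by default, as in the error message; the - flag in %-( turns the quoting off, which is what a CSV header needs.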
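The difference between the quoted and the unquoted form is easiest to see side by side. The wrapper names quoted, plainCsv and numbers below are hypothetical; the format strings themselves are standard std.format syntax.

```d
import std.format : format;

// String elements are quoted by default inside %( ... %).
string quoted(string[] cols)
{
    return format("%(%s, %)", cols);
}

// The - flag turns the quoting off, which gives plain CSV.
string plainCsv(string[] cols)
{
    return format("%-(%s,%)", cols);
}

// Any element specifier and delimiter can be used.
string numbers(int[] xs)
{
    return format("%(%d; %)", xs);
}
```

For ["name", "city"], quoted returns `"name", "city"` (with the quotes) and plainCsv returns name,city; numbers([1, 2, 3]) returns 1; 2; 3.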

It’s a shorthand syntax that often comes in handy, but if you don’t like it there are many other ways to achieve the same result. For example, joiner:

lines.front.joiner(",").writeln;

What would such an error message look like?

object.Exception@./main.d(9): Invalid column name. Valid are: "name", "surname", "city", "country"
----------------
??:? pure @safe void std.exception.bailOut!(Exception).bailOut(immutable(char)[], ulong, const(char[])) [0x7a34b57e]
??:? pure @safe bool std.exception.enforce!(Exception, bool).enforce(bool, lazy const(char)[], immutable(char)[], ulong) [0x7a34b4f8]
??:? _Dmain [0x7a34b17f]

Okay, but isn’t printf bad and unsafe? I heard that languages like Python are moving away from C-like formatting.

A Python library can only realize that the arguments and the format string don’t fit when it’s called. In D the compiler knows the types of the arguments, and if you pass the format string at compile-time, guess what, the format can be checked at compile-time. Try to compile a format string that tries to format strings as numbers:

writefln!"%d"("foo");

The compiler will complain:

/dlang/dmd/linux/bin64/../../src/phobos/std/stdio.d(3876): Error: static assert  "Incorrect format specifier for range: %d"
onlineapp.d(4):        instantiated from here: writefln!("%d", string)

Wow, that’s really cool. How does this work?

D has another unique feature: compile-time function evaluation (CTFE), which allows executing almost any function at compile-time. All that happens is that writefln is instantiated at compile-time with the string as template argument, and then it calls the same format function that would normally be called at run-time, only now at compile-time with the known string. The coolest part about this is that there’s no special casing in the compiler and everything is just a few lines of library code.
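Here is a minimal sketch of both ideas: an ordinary function evaluated at compile-time, and a format string checked at compile-time. columnCount and headerColumns are hypothetical names; the __traits(compiles, ...) checks assume that std.format rejects the mismatching specifier with a compile error, as in the message shown above.

```d
import std.format : format;

// An ordinary function; nothing marks it as compile-time only.
int columnCount(string header)
{
    int n = 1;
    foreach (c; header)
        if (c == ',')
            n++;
    return n;
}

// Using the result in a compile-time context (enum) forces CTFE.
enum headerColumns = columnCount("name,surname,city,country");
static assert(headerColumns == 4);

// The format string is a template argument, so it is checked against
// the argument types before the program ever runs.
static assert( __traits(compiles, format!"%s"("foo")));
static assert(!__traits(compiles, format!"%d"("foo")));
```

The same columnCount can still be called at run-time with a string that is only known then.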

9) Let’s parse the file

Now that we have found the index of the replacement column, have opened the output CSV file and have already written the header to it, all that’s left is to go over the input CSV file line by line and replace the specific CSV column with the replacement:

foreach (line; lines.dropOne) // remove the header
{
    auto r = line
        .enumerate // iterate with a (index, value) tuple
        // and lazily map a different value for the specific CSV column
        .map!(a => a.index == colIndex ? replacement : a.value)
        .joiner(","); // join the fields back to a CSV line
    os.writeln(r);
}
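The body of this loop can also be tried on a single in-memory line. The sketch below uses a hypothetical function name, replaceColumn, and swaps the lazy joiner for its eager sibling std.array.join so that the result is a plain string that is easy to inspect.

```d
import std.algorithm : map, splitter;
import std.array : join;
import std.range : enumerate;

// Hypothetical helper: replaces the field at colIndex in one CSV line.
// Like the solution above, it does not handle quoted fields.
string replaceColumn(string line, size_t colIndex, string replacement)
{
    return line
        .splitter(",")   // lazily split into fields
        .enumerate       // pair every field with its index
        .map!(a => a.index == colIndex ? replacement : a.value)
        .join(",");      // eagerly glue the fields back together
}
```

For example, replaceColumn("Adam,Jones,Manchester,UK", 2, "London") returns "Adam,Jones,London,UK".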

One of the cool parts of D ranges is that they are so flexible. You want to do everything in a functional way? D has you covered:

// needs std.functional for pipe, partial and reverseArgs
alias csvPipe = pipe!(enumerate,
                      map!(a => a.index == colIndex ? replacement : a.value),
                      partial!(reverseArgs!joiner, ","),
                     );
lines.dropOne.map!csvPipe.each!(a => os.writeln(a));

There’s another cool thing about D – std.parallelism. Have you ever been annoyed that a loop takes too long, but didn’t know a quick way to parallelize your code? Again, D has you covered with .parallel:

foreach (line; lines.parallel)
    // expensive operation in parallel

No way. I don’t believe this can be so simple.

Just try it yourself.
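A small self-contained way to try it is a parallel foreach over an array. In this sketch, squaresInParallel is a hypothetical function; parallel is the std.parallelism helper mentioned above.

```d
import std.parallelism : parallel;

// Hypothetical demo: fills an array with squares using a parallel foreach.
long[] squaresInParallel(size_t n)
{
    auto result = new long[](n);
    // Every index is written by exactly one worker, so no locking is needed.
    foreach (i, ref slot; result.parallel)
        slot = cast(long) (i * i);
    return result;
}
```

squaresInParallel(5) returns [0, 1, 4, 9, 16]. Whether the parallel version is actually faster depends on how expensive the loop body is, so it is worth measuring.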

The Garbage Collector (GC)

On the internet, and especially on reddit and HackerNews, there’s a huge criticism of D’s decision to use a GC. Go, Java, Ruby, JavaScript etc. all use a GC, but I can’t phrase it better than Adam D. Ruppe:

D is a pragmatic language aimed toward writing fast code, fast. Garbage collection has proved to be a smashing success in the industry, providing productivity and memory-safety to programmers of all skill levels. D’s GC implementation follows in the footsteps of industry giants without compromising expert’s ability to tweak even further.

So ask your question:

Okay, “ability to tweak even further” sounds a bit vague, what does this mean? I can tweak the memory usage?

Well, of course you can do that, but that’s something most languages with a GC allow you to do. D allows you to get the benefit of both worlds: profit from the convenience of the GC and use manual allocation methods for the hot paths in your program. This is great, because you can use the same language for prototyping and shipping your application.

A short and simplified summary of allocation patterns in D:

  • malloc and friends are available in D (everything from C is)
  • RAII is supported (e.g. the File you saw earlier is reference-counted and automatically deallocates its buffer and closes the file once all references are dead)
  • there’s std.experimental.allocator for everyone with custom allocation needs
  • std.typecons provides a lot of library goodies like Unique, Scoped, RefCounted for @nogc allocation.
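As an illustration of the first point, the following sketch allocates a temporary buffer with malloc inside a function marked @nogc. sumOfSquares is a hypothetical example function; malloc and free come from core.stdc.stdlib.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical example: uses a manually managed scratch buffer.
// @nogc makes the compiler reject any GC allocation in this function.
@nogc nothrow int sumOfSquares(size_t n)
{
    if (n == 0)
        return 0;
    auto p = cast(int*) malloc(n * int.sizeof);
    if (p is null)
        return -1; // allocation failed
    scope(exit) free(p); // released on every path out of the function

    auto buf = p[0 .. n]; // a bounds-checked slice over the raw memory
    foreach (i, ref x; buf)
        x = cast(int) (i * i);

    int total;
    foreach (x; buf)
        total += x;
    return total;
}
```

sumOfSquares(4) returns 0 + 1 + 4 + 9 = 14 without involving the garbage collector.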

Mike Parker has recently started an extensive GC Series on the DBlog, which I recommend to everyone who prefers performance over convenience.

Other goodies

std.csv

Hey, I saw that there’s std.csv in D, why didn’t you use it?

Simple – it felt like cheating:

import std.algorithm, std.csv, std.functional, std.file, std.range;

void main(string[] args)
{
    auto inputFile = args[1], columnName = args[2], replacement = args[3], outputFile = args[4];
    auto records = inputFile.readText.csvReader!(string[string])(null);
    outputFile.write(records.map!((r) {
        r[columnName] = replacement;
        return r;
    }).pipe!(rows => records.header.join(",") ~ "\n" ~
             rows.map!(r => records.header.map!(h => r[h]).join(",")).join("\n")
    ));
}

std.getopt

One of the reasons why this challenge used positional arguments and no flags is that argument parsing is pretty hard in C++. It’s not in D. std.getopt provides convenience for everything out of the box:

import std.getopt;

int main(string[] args)
{
    string input, output, selectedColumn, fill = "FOO";
    auto opts = getopt(args,
        "i|input", &input,
        "o|output", &output,
        "s|select", "Select a column to overwrite", &selectedColumn,
        "f|fill", "Overwrite (default: FOO)", &fill,
    );
    if (opts.helpWanted || input.length == 0) {
        defaultGetoptPrinter("./program", opts.options);
        return 1;
    }
    return 0;
}

DMD, LDC and GDC

One of the things that often confuses newcomers is that D has three compilers. The short summary is:

  • DMD (DigitalMars D compiler) – latest greatest features + fast compilation (= ideal for development)
  • LDC (uses the LLVM backend) – battle-tested LLVM backend + sophisticated optimizers + cross-compilation (= ideal for production)
  • GDC (uses the GCC backend) – similar points as LDC

Benchmark and performance

Benchmarking a language compiler is a bit tricky, as very often you end up benchmarking library functions. In general, D code can be as fast as C++ and often is even faster – after all, the LDC and GDC compilers have the same backends as clang++ and g++, with all their optimization logic. If you are interested in how D programs perform against similar programs written in other languages, check out Kostya’s benchmarks.

There’s also an excellent performance benchmark from Jon Degenhardt (member of eBay’s data science team) on how eBay’s tsv-utils compare against existing CSV/TSV processing tools written in C, Go, and Rust.

@safe

Even though D is a systems programming language that allows you to mess with pointers, raw memory and even inline assembly, it provides a sane way to deal with the dirty details. D has a @safe subset of the language in which the compiler will enforce that you don’t do anything stupid and shoot yourself in the foot by e.g. accessing undefined memory.
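What the compiler accepts and rejects in @safe code can be checked directly. This is a minimal sketch; firstElement is a hypothetical function and the two lambdas exist only to be tested with __traits(compiles, ...).

```d
// Indexing a slice is bounds-checked, so it is allowed in @safe code.
int firstElement(int[] values) @safe
{
    return values[0];
}

// Pointer arithmetic could reach arbitrary memory, so @safe code rejects it.
static assert(!__traits(compiles, (int* p) @safe => *(p + 1)));

// The same lambda is accepted when it is explicitly marked @system.
static assert( __traits(compiles, (int* p) @system => *(p + 1)));
```

firstElement([7, 8]) returns 7, and calling it with an empty array fails with a range violation rather than reading arbitrary memory, unless bounds checks are disabled.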

Unittest

One strategic advantage of D is that unit testing is so easy, as it’s built into the language and compiler. This is a valid D program:

unittest {
  assert(1 == 2);
}

And with -unittest the compiler can be instructed to emit the unittest blocks to the object files or binary. Here, rdmd is again a friendly tool and you can directly go ahead and test your file with this:

rdmd -main -unittest test.d

No advanced tooling setup required. Of course, this also means that it’s particularly easy to automatically verify all examples that are listed in the documentation, because they’re part of the test suite. I even went one step further and made it possible to directly edit and run the examples on dlang.org.
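A slightly more realistic sketch puts a function and its test next to each other in the same file. countColumns is a hypothetical helper written for this illustration.

```d
// Hypothetical helper used only for this illustration.
size_t countColumns(string header)
{
    import std.algorithm : splitter;
    import std.range : walkLength;

    return header.splitter(",").walkLength;
}

// Compiled and run only when the -unittest flag is passed.
unittest
{
    assert(countColumns("name,surname,city,country") == 4);
    assert(countColumns("single") == 1);
}
```

Saved as test.d, this can be checked with the rdmd -main -unittest test.d command shown above; without -unittest the test block is simply left out of the binary.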

Other cool D features

There are many other cool features that D offers that didn’t make it in this article, but as a teaser for future articles:

  • Code generation within the language (cut down your boilerplate)
  • Strong and easy Compile-Time introspection (Meta-programming)
  • alias this for subtyping
  • -betterC (using D without a runtime)
  • mixin for easily generating code
  • A module system that doesn’t suck
  • Inline documentation
  • Contracts and invariants
  • debug attribute to break out of pure code
  • scope(exit) and scope(failure) for structuring creation with its destruction
  • Native interfacing with C
  • with for loading symbols into the current scope
For a full list, see the Overview of D.

Downsides

Okay, so you say D is so great, but why hasn’t it taken off?

There’s a lot more to a programming language than just the language and compiler. D has to fight with the problems all young languages have to deal with, e.g. a small ecosystem, few tutorials / sparse documentation and occasional rough edges. Languages like Kotlin, Rust or Go have it a lot easier, because they have a big corporate sponsor which gives these languages a big boost.

Without such a boost, it’s a chicken/egg problem: if nobody is learning D, it also means that no one can write tutorials or better documentation. Also many people have learnt a few languages and use them in production. There’s little incentive for them to redesign their entire stack.

However, things improved greatly over the last years and nowadays even companies like Netflix, eBay, or Remedy Games use D. A few examples:

  • the fastest parallel file system for High Performance Computing is written in D
  • if you travel by train in Europe, chances are good that you were guided by D (Funkwerk – the company that manages the transport passenger information system – develops their software in D)
  • if you don’t use an Adblocker, chances are good that algorithms written in D bid in real-time for showing you advertisements (two of the leading companies in digital advertising, Sociomantic and Adroll, use D)

The organizations using D page lists more of these success stories.

Of course, D – like every other language – has its “ugly” parts, but there’s always work in progress to fix these and compared to all other languages I have worked with, the ugly parts are relatively tiny.

Where to go from here?

Okay that sounds great, but how do I install D on my system?

Use the install script:

curl https://dlang.org/install.sh | bash -s

And start hacking!

Acknowledgements

Thanks a lot to Timothee Cour, Juan Miguel Cejuela, Jon Degenhardt, Lio Lunesu, and ketmar for their helpful feedback.
