I needed to build some binaries for a specific version of Linux. Because manually building in a VM is for chumps, I automated it in the cloud. It’s effectively two clicks (plus some confirmation steps).
This post will touch upon a couple of different services, then tie things together toward the end. But first a warning: this is likely going to be of interest only to developers. If you don’t care about cloud development, then this post probably is not for you.
Until recently, something like Elastic Compute Cloud (EC2) was what you got when you signed up for “cloud.” When they say “the cloud is other people’s computers,” this is what they mean. You rent someone else’s ephemeral machine by the hour. When it starts up, it’s running a base operating system (AWS Linux by default). When you turn it off, any changes you made to it are wiped away so that the next person to receive it has a fresh machine. While it is running, you are responsible for security updates, locking down the machine and its services, OS-level login maintenance, and everything else that goes with running your own server. It’s highly flexible, but also sort of a drag to be responsible for all the setup and maintenance.
As a service, AWS Lambda is kind of neat. It’s like EC2 with far less maintenance, though there’s a tradeoff: it is much less flexible. It is a service that you upload scripts into, typically Python or Node.js, though Java and C# are also supported. Based on triggers like timers, API calls, or message queues, it executes your code. No big deal, right?
The big deal comes in from the fact that this is a managed service. I don’t have to care about the OS that’s running this code. I don’t have to care about library or runtime differences, like whether Python 2 or 3 is installed and how to switch between them. I don’t have to care about security patches, open ports, or weak login passwords. Someone else takes care of all of that for me. But that’s the departure from EC2’s flexibility: your job has to fit into one script (plus its dependencies).
Where this breaks down is when I need native binaries. I can package a binary alongside the scripts I push up to Lambda. Those scripts can shell out and execute it. But that binary must match the OS that Lambda runs on. I can’t just compile a tool on my Mac and expect it to work in the cloud. I can’t even compile it on Ubuntu and push it to Lambda. It needs to be compiled on Amazon’s Linux version, so that it links against the proper versions of glibc, libstdc++, and such. It’s easy enough to do this: just spin up an EC2 instance, grab the code, and compile. But it’s also a pain to go through manually.
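To make the “shell out” part concrete, here’s a sketch of how a Lambda script might invoke a bundled native tool. The helper name and the example paths are mine, not anything from a real deployment; in Lambda, the binary would sit alongside your function code (e.g. `./bin/ffprobe`).

```python
import subprocess


def run_tool(binary, args):
    """Run a bundled native binary and return its stdout.

    In a real Lambda deployment, `binary` would point at a tool
    packaged alongside the function code (e.g. "./bin/ffprobe").
    """
    result = subprocess.run(
        [binary] + list(args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits non-zero
    )
    return result.stdout
```

For instance, `run_tool("/bin/echo", ["hello"])` returns `"hello\n"` — the same mechanism works for any binary, which is exactly why that binary has to link against the libraries present on Lambda’s OS.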
Experienced developers know that single-click builds are the only kind of builds worth doing. Otherwise, there’s a learning curve for newbies. Otherwise, you’ll make a mistake under pressure when trying to rush out a build in the wee hours of the morning. One-click builds are not just a luxury; they’re imperative for sustainable operations.
CloudFormation is the cloud way of scripting the build-up, configuration, and tear-down of cloud resources. CloudFormation uses “templates” to build “stacks.” A template is a recipe for how to requisition and configure a set of resources. This is often used for spinning up a fleet of worker nodes, but can be used for databases, networking configuration, and any number of other things. “Running” the template gives you a “stack”: a set of computers and config that are currently set up and can be torn down with one click. People often use it to say “my webapp needs five web servers, three application servers, some load balancers, and some network magic to securely isolate the layers.” But the neat thing is that you can use it to define almost anything in AWS.
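If you’ve never seen one, a template can be tiny. Here’s a minimal sketch (the resource names are illustrative, not from any real project) that stands up a single S3 bucket and reports its name:

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal example stack - a single S3 bucket
Resources:
  BinaryBucket:
    Type: AWS::S3::Bucket
Outputs:
  BucketName:
    Value: !Ref BinaryBucket
```

Run it and you get a stack; delete the stack and the bucket goes away with it.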
I have a project that I’ve been mulling over in my head. I’d like to put together some online computing to artistically glitch-out videos. I don’t yet know the form this will take. Maybe a Twitter bot, maybe a website, maybe something else. But I do know that I want it to run on a managed service like Lambda or Docker. I want to be a developer, not a sysadmin. Most video tools are native code, so they need to be compiled on the OS they’ll be running on. That means spinning up an EC2 instance, logging in, grabbing code, compiling it, and copying out the binaries. And the EC2 instance has to be running a version of Linux compatible with the Lambda host.
Attentive readers might ask: “If you’re automating, why not just do all these steps in Lambda? That would guarantee you’re running the same version of Linux, and has far fewer moving parts.” The short answer is that some of these tools take 10+ minutes to build. Lambda has a hard timeout at 5 minutes. If your computing job takes more than 5 minutes, then Lambda is not the correct tool for you at present.
Some of these video tools have IP or patent encumbrances that prevent you from shipping binaries. You can distribute the source, but you may not be able to distribute the binaries in some jurisdictions. Many of these tools, such as x264 and ffmpeg, default to building shared libraries. While it is possible to set LD_LIBRARY_PATH on Lambda to account for binaries with shared libraries, it is far easier for both local development and production to just work with static binaries in your sandbox. The giant caveat here is: you can use those binaries in a service, but you cannot ship them to someone else.
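A quick way to verify that a build really came out static, before you bother packaging it, is to ask `ldd`. This little check is my own convenience wrapper and assumes a Linux host with `ldd` available:

```python
import subprocess


def is_static_binary(path):
    """Return True if ldd reports the file is not a dynamic executable.

    Assumes a Linux host with ldd available. On a fully static binary,
    ldd prints "not a dynamic executable" and exits non-zero, so we
    inspect the output rather than the return code.
    """
    proc = subprocess.run(
        ["ldd", path],
        capture_output=True,
        text=True,
    )
    return "not a dynamic executable" in (proc.stdout + proc.stderr)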
The Cloud Build System
This is where my StaticBinaries project comes into play. It is a pair of CloudFormation templates to build the following:
- ffmpeg, ffprobe, and ffserver (compiled against x264)
Why two CloudFormation templates? Why not one? The two recipes have different scopes and different lifetimes. Namely:
- Create an S3 bucket to store the resulting binaries and create the minimal required permissions that allow EC2 to write to the bucket. This is where you will pick up the binaries after the build completes.
- Create an EC2 instance that updates itself, installs developer tools, grabs source code, compiles everything, and deploys the binaries into your S3 bucket.
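The heart of the second template is the EC2 instance’s UserData script, which runs on first boot. Here’s a heavily abbreviated sketch of the shape it takes — the AMI ID, instance type, resource names, and build commands below are placeholders, not the project’s actual values, and the instance would also need an instance profile granting write access to the bucket (omitted here):

```yaml
Parameters:
  BinaryBucket:
    Type: String
    Description: Name of the S3 bucket created by the first stack
Resources:
  BuildInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-xxxxxxxx        # an Amazon Linux AMI for your region
      InstanceType: c4.xlarge
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          yum update -y
          yum groupinstall -y "Development Tools"
          # ...fetch x264 and ffmpeg sources, configure for static
          # builds, make, make install...
          aws s3 cp /tmp/build/bin/ s3://${BinaryBucket}/ --recursive
```

When the script reaches the end, the binaries are sitting in the bucket and the instance has nothing left to do.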
The second CloudFormation template (the build) takes about 30-40 minutes to run. When it’s done, you can delete that stack (so you’re not continuing to pay for time on the EC2 instance). The S3 bucket is something you might want to keep around longer. You’re free to keep that up (and pay the minimal hosting charges), or copy out the binaries and spin it down. It’s up to you.
I now have an S3 bucket that holds useful video-related binaries. I can download them and package them up with a Lambda script. Alternatively, with the correct permissions, I can grab them directly from a Lambda function. Either way, they’re ready for use in a Lambda sandbox.
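One wrinkle worth noting if you take the “grab them directly” route (this detail is my own addition, not from the project docs): S3 doesn’t store POSIX permissions, so a binary downloaded into Lambda’s /tmp comes back without its execute bit, and you have to restore it before shelling out:

```python
import os
import stat


def make_executable(path):
    """Add execute permissions to a downloaded binary.

    In Lambda you might first fetch the tool with something like
    boto3.client("s3").download_file(bucket, key, path), then call
    this before invoking it with subprocess.
    """
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Forgetting this step produces a confusing "Permission denied" at runtime even though the file is right there.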
The StaticBinaries project also includes the individual scripts for building the binaries manually. If you’re more comfortable doing that than relying on CloudFormation’s magic, a set of shell scripts isolates the build and deploy steps. You can spin up EC2, log in, and run the scripts by hand.