How The World’s Largest Design Marketplace Builds and Ships Code

2016-06-20

By John Barton, Director of Engineering at 99designs. You can find him on StackShare and Twitter.


99designs is the world's leading graphic design marketplace. We're best known for our design contests, but we connect designers and clients in a variety of ways including contests, fixed-price tasks, 1:1 projects, and even off-the-shelf logos from our stock store. We were founded in Melbourne, Australia, but after raising VC our headquarters moved to San Francisco. The bulk of our product development group is still based here in Melbourne - just in a much bigger office!

As of April 2015, we've facilitated 390,000+ crowdsourced design contests for small businesses, startups, agencies, non-profits and other organisations, and have paid out over $110M to our community of 1M+ graphic designers around the world. We also serve localised versions of the site in English, Spanish, German, French, Italian, Dutch, Portuguese and Japanese.

I'm the Director of Engineering here at 99, which puts me one rung under the CTO: I take care of the day-to-day running of our tech team and the short to mid-term architecture vision. My background is as a Ruby on Rails developer turned dev manager, and if there's a trend to my career, it's working in small to mid-sized teams at fast-growing two-sided marketplaces.

Architecture & Engineering Team

We're big believers in Conway's Law at 99:

organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations

Like so many other tech startups, 99 started out with the classic monolithic LAMP architecture and team, and as the company grew rapidly that approach added friction to our development processes. Early attempts at splitting the monolith and moving onto newer technologies weren't entirely successful: some services suffered bitrot, making new changes more expensive than going back to the monolith. Maintaining a wide spread of services and languages created a high operational burden, and our ratio of sysadmins to developers was uneconomical.

Around two years ago Lachlan, our CTO, went back to the drawing board taking Conway's law to heart. We now almost exclusively design the staffing around a particular product or "platform" challenge and allow the architecture to be an almost emergent property of our team structure.

We're now 33 engineers across both San Francisco and Melbourne, arranged into around 8 cross-functional teams, each of which is predominantly responsible for one major system. We've got a couple of developer positions in our Melbourne office opening very soon. Keep an eye on our jobs page for ads over the next few days. We generally look for anyone with PHP or Ruby experience, with Go knowledge being icing on top.

The architecture that has fallen out from this structure is roughly:

  • Varnish providing our outermost ring tying everything together - caching, carving up our route map to various sub-products, etc.
  • Some core services written in Go, most notably our Identity & Single Sign-On service and the system handling all of our marketing emails
  • 4 main product teams with their own "mini-monoliths" in either PHP or Ruby on Rails, in a fairly standard LAMP shape of load balancer plus application tier plus Amazon RDS database cluster
  • A cross-platform payments service in Ruby on Rails
  • A data science team tying everything together in an Amazon Redshift database powering our business intelligence
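As a rough illustration of that outermost Varnish ring, a config along these lines (backend names, hosts, and URL prefixes are invented here, not our actual VCL) carves a single public hostname up between sub-products:

```vcl
vcl 4.0;

# Hypothetical sketch: route requests to per-product backends
# based on URL prefix. Names and addresses are illustrative only.
backend contests { .host = "contests.internal"; .port = "8080"; }
backend payments { .host = "payments.internal"; .port = "8080"; }

sub vcl_recv {
    if (req.url ~ "^/contests") {
        set req.backend_hint = contests;
    } else if (req.url ~ "^/payments") {
        set req.backend_hint = payments;
    }
}
```

Because the edge owns the route map, each product team can deploy its own application tier independently behind its prefix.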

Most of the systems listed are deployed using Docker, which we manage directly on EC2. Our technology choices per product have been driven largely by the sizes of our pager rotations. Every engineer takes part in an on-call rotation, and we've divided the engineers into four roughly equal groups along technology lines: PHP on bare VMs, PHP in Docker, Rails/MySQL in Docker, and Rails/Postgres.

We're running more than 130 EC2 instances; if I had to guess, around 70-80 of those hosts run Docker.

In development our container orchestration is 80% bash and env vars, and we've added Docker Compose to manage our always-on services. A lot of the complexity in our dev orchestration is about selectively starting containers (and their dependent containers) based on what you'll actually be working on that day - we don't have infinite resources in our little MacBooks 🙂 In production, for now, we just have one Docker container per EC2 host and pretty much just manage those with CloudFormation.
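A minimal sketch of the selective-startup idea, with invented service and image names: declaring dependencies in Compose means `docker-compose up contests` starts only that product plus the containers it needs, leaving everything else stopped.

```yaml
# Hypothetical docker-compose.yml - service and image names are
# invented for illustration, not our real configuration.
version: "2"
services:
  identity:
    image: 99designs/identity    # shared SSO service
  db:
    image: mysql:5.6
  contests:
    image: 99designs/contests
    depends_on:                  # pulled up automatically when
      - identity                 # you start just this service
      - db
```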

Our main application images are built up from our own base Docker image in dev and prod, but we use a lot of the stock images on the Docker registry for things like databases, Elasticsearch, etc. in development as rough equivalents for off-the-shelf Amazon products we use (like ElastiCache or RDS).
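The layering might look something like this hypothetical Dockerfile (the base image name, paths, and commands are invented for illustration):

```dockerfile
# Hypothetical sketch: each product image builds on a shared
# company base image that carries the common runtime and tooling.
FROM 99designs/base-php:latest

WORKDIR /app
COPY . /app
RUN composer install --no-dev

CMD ["php-fpm"]
```

Keeping the shared layers in one base image means security patches and runtime upgrades roll out to every product by rebuilding on a new base.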

Workflow & Deployment

Our development environments are 100% Docker-powered. Every codebase & system we develop day to day has a Docker container set up, and we wire those together with some bespoke bash and a standardised Vagrant image. We deliberately cobbled our current solution together in a language we didn't like, so that once community consensus emerged on container linking we wouldn't feel the least bit bad about deleting what we've got - a deliberate defence against "Not Invented Here" syndrome.

We practice full continuous delivery across all of our systems (with the exception of our credit card processor): every commit to master is built and tested in CI and automatically deployed to production.

We use Buildkite to manage our build/test/deploy pipeline. It's a hosted management service with agents installed on your own build servers, and it works just as well managing Docker containers as it does working with the legacy "handcrafted" CI boxes for some of our older bits and pieces. Having a unified build management system without necessarily a unified test environment is really useful.
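A hypothetical `.buildkite/pipeline.yml` for that build-test-deploy flow could look like the following (the step labels and scripts are invented, not our actual pipeline):

```yaml
# Hypothetical sketch: run tests on every commit, then deploy
# automatically, but only on the master branch.
steps:
  - label: "test"
    command: "script/test"
  - wait
  - label: "deploy"
    command: "script/deploy"
    branches: "master"
```

The `wait` step gates deployment on the tests passing, which is what makes commit-to-production automation safe.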

Tools & Services

For monitoring we use New Relic, Papertrail for unified logging, Bugsnag for error reporting, PagerDuty for on-call rotations, Cloudability for AWS cost analysis, lots of CloudWatch alerts, and Wormly for our external HTTP healthchecks.

For issue tracking, we stopped using GitHub Issues internally quite a while back - it was a real barrier to getting team members outside of engineering involved in the bug reporting and triaging process - so now we handle it all with Trello cards.

We use Segment for tracking our business events for analytics on both the client and server side. It makes product development a lot easier when each team has one API to work with and only has to worry about what kind of events they emit, without getting bogged down in how they'll be analysed.

We use a bunch of different payment processors depending on which market the customer is in, which method the customer wants to use, and which services are up. The main ones in production now are Stripe, Braintree, PayPal, and Adyen. We use Sift as one among several fraud prevention measures.

As I mentioned, 99designs is available in eight different languages. Our localisation efforts proceeded in two waves, both of which rely heavily on Smartling. For the initial rollout we used their proxy service in front of the site, which would swap out content on the fly with translated sentences managed inside their CMS - we had just too many English-only pages to convert by hand. For the second phase we've been using Smartling to export XLIFF files so that we can serve the right content directly from our own servers. The second phase has been a much more organic process: as we redesign pages or launch new products we roll them out as internationalised from our hosts, but we haven't treated that as a project in and of itself.


We serve nearly all of our assets of all kinds through CloudFront and store them in S3, though it's worth distinguishing between user-submitted assets and the dev-generated ones like CSS and JavaScript. We handle a pretty high volume of image uploads, as you'd expect of a graphic design marketplace. Last I checked, we get a new design submitted every two seconds.
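At that upload rate, key naming matters: content-addressed keys avoid collisions and are safe to cache forever behind a CDN. Here's a minimal Go sketch of one way to build such a key - the bucket layout is invented for illustration, not our real scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// UploadKey builds an S3 object key for a user-submitted design.
// Embedding a content hash makes the key deterministic for identical
// uploads and immutable, so CloudFront can cache it indefinitely.
func UploadKey(contestID int, filename string, content []byte) string {
	sum := sha256.Sum256(content)
	return path.Join(
		fmt.Sprintf("contests/%d", contestID),
		hex.EncodeToString(sum[:8]), // short content fingerprint
		filename,
	)
}

func main() {
	fmt.Println(UploadKey(1234, "logo.png", []byte("image bytes here")))
}
```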

We've standardised on SASS across the business for CSS preprocessing, and use a framework one of our devs created called Asimov to manage the way we share SASS across projects in a component-driven design process.

Each product has its own makefile and gulp/grunt-based asset pipeline. We eschewed the Rails asset pipeline because we felt it was better to have company-wide consistency in front-end workflow (the asset pipeline isn't available to our PHP teams) than to keep consistency with the Rails community.

Right now we consistently build our assets in CI (rather than on our production boxes), but what we do with them varies by team. Most of the products ship the assets along with the server-side code and serve them directly alongside dynamic content. Some of the newer projects have started shipping all static assets to S3 as part of the build, requiring us only to ship an asset manifest alongside our server-side code. We'll be converging on this solution over time, as it gives us a few operational wins. Firstly, it makes for smaller git repos and Docker images - our compiled assets are proportionately quite heavyweight. Secondly, it puts much less traffic through our load balancers, giving us a lot more headroom for future growth in pageviews without adding latency - something that can happen if you overburden ELBs.
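The manifest approach can be sketched in a few lines of Go: the build uploads fingerprinted files to S3 and ships only a small JSON manifest with the server-side code, which the app uses to resolve logical asset names. The file names and CDN host below are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Manifest maps logical asset names to their fingerprinted
// file names as produced by the CI build.
type Manifest map[string]string

// LoadManifest parses the manifest JSON shipped alongside the app.
func LoadManifest(data []byte) (Manifest, error) {
	var m Manifest
	err := json.Unmarshal(data, &m)
	return m, err
}

// URL resolves a logical asset name to its CDN location, or ""
// if the asset is unknown. The host here is a placeholder.
func (m Manifest) URL(name string) string {
	if hashed, ok := m[name]; ok {
		return "https://cdn.example.com/assets/" + hashed
	}
	return ""
}

func main() {
	m, _ := LoadManifest([]byte(`{"app.css": "app-3f9a1c.css"}`))
	fmt.Println(m.URL("app.css"))
}
```

Because only the tiny manifest travels with each deploy, the repo, the Docker image, and the load balancers all stop carrying asset weight.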

What's coming up next

The biggest change for us right now is the introduction of React on the front end. It's been a huge benefit for some of the complex real-time parts of the site. Our designer-to-customer messaging system in contests is all React-based now, and it made a very challenging problem much more tractable.

Before React we were predominantly using Flight for communicating events between our rich client code and our AJAX handlers. This worked great for us for quite a while - it's easy to add incrementally within non-frameworked jQuery code, and it handles components up to a certain complexity level, but it becomes a bit "spooky action at a distance" as the complexity ramps up. React (and Flux) came along at the right time to help us address that. The discipline of the directionality of events in Flux helps keep the comprehensibility under control. We're also excited that pushing more templating into JavaScript gives us a path to sharing common brand components across our PHP and Rails apps, something we've "solved" to date with liberal copy and paste.

Go is also becoming a bigger and bigger piece of our stack. It's such a pleasure to use as a JSON API layer that, paired with our use of React, it should let us see some great results.

As we get closer and closer to all of our production environment being Docker-based we've been keeping a close eye on how Amazon's Elastic Container Service has been evolving. Getting rid of our custom code managing our Docker containers on EC2 will be a big maintenance cost cutter for us, freeing up more engineering time to do product work.


Originally published on StackShare.
