Disclosure: this post mentions my current hosting provider (DigitalOcean). I’m in no way affiliated with them, I just like them a lot.
First, some data:
- At the time of writing, there are
- The total size of these docsets is
- Bandwidth requirements are in the range of 5-7 TB / month
- It would cost about $600 / month to host them in a “regular” CDN. In contrast, my hosting only costs $20 / month (thanks to 4 VPSs from DigitalOcean)
Hosting the docsets
Some docsets are somewhat large, so download speeds need to be decent. This is achieved by hosting the files in different data centers:
- 2 mirrors in New York (for North America)
- 1 mirror in San Francisco (for North America and Asia)
- 1 mirror in Amsterdam (for Europe – or at least Western Europe)
- Extra mirrors can be added in less than 2 minutes to account for spikes
South America, Eastern Europe, Africa and Australia are not directly covered, but download speeds there seem to be acceptable, as no one has complained yet. More mirrors will be added whenever DigitalOcean opens more data centers.
Dash performs latency tests on all available mirrors and prioritises them based on latency. Whenever a mirror goes down, Dash notices and avoids it.
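The mirror-selection logic can be sketched as follows. This is a minimal illustration, not Dash’s actual code: the `/ping` path and the `probe`/`prioritise` names are assumptions for the example.

```python
import time
import urllib.request

def probe(mirror, timeout=3.0):
    """Return the round-trip time to a mirror in seconds, or None if it is down.

    The "/ping" path is a hypothetical lightweight endpoint; the actual
    resource Dash requests is not specified here.
    """
    start = time.monotonic()
    try:
        urllib.request.urlopen(mirror + "/ping", timeout=timeout).read()
    except OSError:
        return None  # unreachable mirrors are excluded from the ranking
    return time.monotonic() - start

def prioritise(mirrors, probe=probe):
    """Order reachable mirrors by measured latency, dropping dead ones."""
    latencies = {m: probe(m) for m in mirrors}
    alive = [m for m in mirrors if latencies[m] is not None]
    return sorted(alive, key=lambda m: latencies[m])
```

Because `probe` is injectable, the ranking logic can be tested without any network access, and a mirror that times out simply falls out of the list instead of slowing downloads.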
This setup results in 100% uptime and really cheap bandwidth costs. I highly recommend you consider a similar setup if you need to host large files.
Hosting the docset feeds
The docset feeds are just small XML files which Dash polls to check for updates. These files are requested a lot: on each Dash launch and every 24 hours afterwards. As each docset has its own feed and most users have more than one docset installed, about 320k HTTP requests are made each day.
These requests are easily handled by an nginx web server on a 512 MB VPS in New York, and the feeds are also mirrored on GitHub. I tried using Apache, but it would sometimes use over 1 GB of RAM while hosting these files and end up failing completely, while nginx serves requests faster and uses less than 40 MB of RAM. I’ll talk about my experiences with nginx in a future post.
Whenever Dash needs to load a feed, it launches 2 threads which race to grab it (one from kapeli.com, one from GitHub); whichever thread finishes first wins, and its results are used. Most of the time, the kapeli.com thread wins.
The chances of both kapeli.com and GitHub being unavailable at the same time are very small, so this approach has resulted in 100% uptime so far.