A Poor Man’s CDN

2013-09-29


Disclosure: this post mentions my current hosting provider (DigitalOcean). I’m in no way affiliated with them; I just like them a lot.

Hosting large and often-downloaded files can be tricky, especially when you want users to have decent download speeds and 100% availability. This is the story of how Dash’s docsets are hosted.

First, some data:

  • At the time of writing, there are 102 docsets hosted
  • The total size of these docsets is 1.5 GB (while archived)
  • Bandwidth requirements are in the range of 5-7 TB / month
  • It would cost about $600 / month to host them in a “regular” CDN. In contrast, my hosting only costs $20 / month (thanks to 4 VPSs from DigitalOcean)
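As a sanity check on those numbers: assuming a typical CDN rate of around $0.10 per GB transferred (an assumed ballpark figure, not a quoted price from any provider), the arithmetic works out roughly as claimed:

```python
# Back-of-the-envelope cost check. The $0.10/GB CDN rate is a
# hypothetical ballpark, not a quote from any provider.
monthly_gb = 6 * 1000      # midpoint of the 5-7 TB/month range
cdn_rate_per_gb = 0.10     # assumed "regular" CDN price
cdn_cost = monthly_gb * cdn_rate_per_gb   # roughly $600/month

vps_cost = 4 * 5           # four $5/month DigitalOcean VPSs

print(cdn_cost, vps_cost)
```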

Hosting the docsets

Some docsets are somewhat large, so download speeds need to be decent. This is achieved by hosting the files in different data centers:

  • 2 mirrors in New York (for North America)
  • 1 mirror in San Francisco (for North America and Asia)
  • 1 mirror in Amsterdam (for Europe – or at least Western Europe)
  • Extra mirrors can be added in less than 2 minutes to account for spikes

South America, Eastern Europe, Africa and Australia are not directly covered, but should still see reasonable download speeds, as no one has complained yet. More mirrors will be added whenever DigitalOcean opens more data centers.

Dash performs latency tests on all available mirrors by loading a small file. The mirrors are then prioritised based on latency. Whenever a mirror goes down, Dash notices it and avoids it.

This setup results in 100% uptime and really cheap bandwidth costs. I highly recommend you consider a similar setup if you need to host large files.

Hosting the docset feeds

The docset feeds are just small XML files which Dash polls to check for updates. These files are requested a lot: once on each Dash launch and again every 24 hours afterwards. As each docset has its own feed and most users have more than one docset installed, about 320k HTTP requests are made each day.

These requests are easily handled by an nginx web server on a 512 MB VPS in New York and are also mirrored on GitHub. I tried using Apache, but it would sometimes use over 1 GB of RAM while hosting these files and would end up failing completely, whereas nginx serves requests faster and uses less than 40 MB of RAM. I’ll talk about my experiences with nginx in a future post.

Whenever Dash needs to load a feed, it launches 2 threads which race to grab the feed (one from kapeli.com, one from GitHub); whichever thread finishes first wins and its results are used. Most of the time, the kapeli.com thread wins.
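This race can be sketched in Python with a thread pool. The feed URLs below are made up for illustration, and Dash itself is a native app, not Python:

```python
import concurrent.futures
import urllib.request

def fetch(url, timeout=10):
    """Download the feed from one source."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

def race(urls, fetcher=fetch):
    """Request the feed from every URL at once and return whichever
    response finishes first; the slower requests are discarded.
    (Sketch only: assumes the first request to finish succeeded.)"""
    with concurrent.futures.ThreadPoolExecutor(len(urls)) as pool:
        futures = [pool.submit(fetcher, url) for url in urls]
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

# Hypothetical feed sources -- the real paths may differ.
FEEDS = [
    "https://kapeli.com/feeds/NodeJS.xml",
    "https://raw.githubusercontent.com/Kapeli/feeds/master/NodeJS.xml",
]
```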

The chances of both kapeli.com and GitHub being unavailable at the same time are vanishingly small, so this approach has resulted in 100% uptime so far.
