Amazon’s Elastic Compute Cloud (EC2) lets businesses rent scalable servers and host applications and services remotely, rather than buying and managing that infrastructure themselves. The service, which first entered beta a little more than ten years ago, has historically focused on CPUs, but that’s changing now, courtesy of a newly unveiled partnership with Nvidia.
According to joint blog posts from both companies, Amazon will now offer P2 instances that include Nvidia’s K80 accelerators, which are based on the older Kepler architecture. Those of you who follow the graphics market may be surprised, given that Maxwell has been available since 2014, but Maxwell was explicitly designed as a consumer and workstation product, not a big-iron HPC part. The K80 is based on GK210, not the top-end GK110 parts that formed the basis for the early Titan GPUs and the GTX 780 and GTX 780 Ti. GK210 offers a larger register file and much more shared memory per multiprocessor block, as shown below.
The new P2 instances unveiled by Amazon will offer up to 8 K80 GPUs with 12GB of RAM and 2,496 CUDA cores per GPU (each physical K80 card pairs two GK210 GPUs). All K80s support ECC memory protection and offer up to 240GB/s of memory bandwidth per GPU. One reason Amazon gave for its decision to offer GPU compute, as opposed to simply scaling out with additional CPU cores, is the so-called von Neumann bottleneck. Amazon states: “The well-known von Neumann Bottleneck imposes limits on the value of additional CPU power.”
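Taken at face value, the per-GPU figures quoted above imply the following aggregate totals for an 8-GPU instance. This is a back-of-the-envelope sketch derived purely from the numbers in this article; the actual instance names and hardware topology are not given here:

```python
# Aggregate totals for an instance with 8 of the GPUs described above
# (12GB, 2,496 CUDA cores, 240GB/s each). All inputs come straight
# from the figures quoted in the article.
gpus = 8
ram_gb_per_gpu = 12
cuda_cores_per_gpu = 2496
bandwidth_gbps_per_gpu = 240

total_ram_gb = gpus * ram_gb_per_gpu                  # 96 GB of GPU memory
total_cuda_cores = gpus * cuda_cores_per_gpu          # 19,968 CUDA cores
total_bandwidth_gbps = gpus * bandwidth_gbps_per_gpu  # 1,920 GB/s aggregate

print(total_ram_gb, total_cuda_cores, total_bandwidth_gbps)  # -> 96 19968 1920
```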
This is a significant oversimplification of the problem. When John von Neumann wrote “First Draft of a Report on the EDVAC” in 1945, he described a computer in which program instructions and data were stored in the same pool of memory and accessed by the same bus, as shown below.
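The stored-program idea von Neumann described can be sketched in a few lines of code: a toy machine (opcodes and memory layout invented here purely for illustration) in which instructions and data occupy the same memory array, so every instruction fetch travels over the same notional bus as every data access:

```python
# Toy von Neumann machine: program and data share one memory array,
# and every access (instruction fetch or operand read/write) counts
# as a trip over the same shared "bus". Opcodes are invented here.
LOAD, ADD, STORE, HALT = 0, 1, 2, 3

memory = [
    LOAD, 7,   # acc = memory[7]
    ADD, 8,    # acc += memory[8]
    STORE, 9,  # memory[9] = acc
    HALT,
    20, 22, 0, # data lives in the same memory as the program
]

pc, acc, bus_accesses = 0, 0, 0

while True:
    op = memory[pc]; bus_accesses += 1        # instruction fetch uses the bus
    if op == HALT:
        break
    addr = memory[pc + 1]; bus_accesses += 1  # operand-address fetch, too
    if op == LOAD:
        acc = memory[addr]; bus_accesses += 1
    elif op == ADD:
        acc += memory[addr]; bus_accesses += 1
    elif op == STORE:
        memory[addr] = acc; bus_accesses += 1
    pc += 2

print(memory[9], bus_accesses)  # -> 42 10
```

Note that computing a single sum cost ten bus trips, most of them fetches rather than useful data movement; that contention between instruction traffic and data traffic on one shared path is the bottleneck Amazon is alluding to.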