[Other] High-performance Computing with Amazon’s X1 Instance

伴我╮別絆我 posted on 2016-10-5 14:53:48

We’re excited to announce support for Amazon’s X1 instances. Now in Domino, you can do data science on machines with 128 cores and 2TB of RAM, with one click:
   
[Figure 1]

  The X1 hardware tier is available in our cloud-hosted environment, and it can also be enabled for customers running Domino in their own VPCs.
  Needless to say, with access to this unprecedented level of compute power, we had some fun. Read on for some of our reflections about doing data science with X1 instances.
  Processing Power: Working with 128 Cores Under the Hood

  Access to 128 cores on a single machine was nearly unheard of even just a few years ago, much less on a platform where such a machine could trivially be rented by the minute. Core counts at this scale were previously the domain of distributed and HPC systems.
  Distributing a machine learning workload across 128 cores is a non-trivial problem. Two common techniques for doing so are (1) parallelizing the machine learning algorithm itself and (2) fitting the algorithm in parallel across many candidate configurations (i.e., grid search).
  Parallelizing grid search is fairly straightforward, and packages like scikit-learn and caret offer good solutions for it. Parallelizing a machine learning algorithm itself, however, is a challenging problem. There are a number of inherent limitations to this approach, not least of which are the large-scale matrix operations at the core of many machine learning algorithms; these place natural bounds on how much parallelism can be beneficial.
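  To make approach (2) concrete, here is a minimal sketch of a parallel grid search with scikit-learn. The synthetic data, model, and parameter grid are placeholder assumptions; the point is that n_jobs=-1 fans the independent candidate fits out across every available core.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    # Synthetic stand-in for a real training set.
    X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

    # Each (configuration, CV fold) pair is an independent fit, so the
    # search is embarrassingly parallel even when a single fit is not.
    param_grid = {
        "n_estimators": [100, 500, 1000],
        "max_depth": [3, 5, 9],
        "learning_rate": [0.01, 0.1],
    }

    # n_jobs=-1 asks scikit-learn to use every available core.
    search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=3, n_jobs=-1)
    search.fit(X, y)
    print(search.best_params_)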
  To explore these limits, I undertook a short and admittedly incomplete analysis of two modern machine learning toolkits, H2O and XGBoost, for the task of fitting a GBM with 1,000 trees on the canonical airline dataset. I did not validate the goodness of fit of the resulting models; in this case, I was simply interested in how much parallelism these two packages could exploit when given a large number of cores.
  Using version 3.10.0.6 of H2O’s R package and training on 100k rows of the airline dataset, the system trained a single 1,000-tree model in 813 seconds. Full theoretical processor utilization would be 12,800%, that is, 100% utilization for each of the 128 cores. During training, processor utilization peaked at roughly 5,600%, implying that about 56 cores were in use.

[Figure 2]

  Given the nature of the GBM algorithm, this limitation is understandable: the shape of the input to the algorithm places an explicit ceiling on how much of the training can run in parallel. It is also interesting to note that while the peak memory usage of 46GB is high for a GBM, it was still a very small fraction of the total RAM available on the X1. So although H2O’s GBM algorithm provides excellent performance, it was not able to harness most of the processing power and memory available on an X1 instance.
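  For reference, the single-model run above corresponds to roughly the following. This is a hedged sketch using H2O’s Python API rather than the R package used for the benchmark, and the file path is a placeholder assumption; IsDepDelayed is the response column in the canonical airline dataset.

    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    # nthreads=-1 lets H2O's multithreaded backend claim every core.
    h2o.init(nthreads=-1)

    # Placeholder path standing in for a 100k-row sample of the airline data.
    airlines = h2o.import_file("airlines_100k.csv")

    predictors = [c for c in airlines.columns if c != "IsDepDelayed"]
    gbm = H2OGradientBoostingEstimator(ntrees=1000)
    gbm.train(x=predictors, y="IsDepDelayed", training_frame=airlines)

  With the input data fixed, the parallelism H2O can extract per tree is bounded, which is consistent with the utilization plateau described above.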
  When fitting multiple models to search a large hyperparameter space, however, the combination of the X1 instance type and H2O’s grid tools shows its value. Using H2O’s Grid Search example, H2O’s package was able to utilize roughly 35 cores.
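  As a sketch of what such a grid search looks like, again through H2O’s Python API (the hyperparameter values here are illustrative, and airlines and predictors are the placeholders from the previous sketch):

    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    # Each cell of the grid is an independent model fit, so the grid as a
    # whole can keep more cores busy than any single GBM fit can.
    hyper_params = {
        "max_depth": [3, 5, 9],
        "learn_rate": [0.01, 0.1],
        "ntrees": [100, 500],
    }

    grid = H2OGridSearch(model=H2OGradientBoostingEstimator,
                         hyper_params=hyper_params)
    grid.train(x=predictors, y="IsDepDelayed", training_frame=airlines)

    # Rank the fitted models by a metric such as logloss.
    print(grid.get_grid(sort_by="logloss", decreasing=False))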





欧巴 posted on 2016-10-5 23:12:22
There is no wall the wind can’t get through, and no beam one can’t hang from.

贪恋你的笑 posted on 2016-10-6 21:32:20
If there were a pair of eyes to weep alongside me, it would be worth suffering through life.

山怀 posted on 2016-10-7 13:24:25
The poster above sure is passionate!

兰赐 posted on 2016-10-9 01:39:40
Don’t talk to me about ideals. I’ve given them up.
