Scale up your parallel R workloads with containers and doAzureParallel

General Technology 2017-11-22

by JS Tan (Program Manager, Microsoft)

The R language is by far the most popular statistical language, and has seen massive adoption in both academia and industry. In our new data-centric economy, the models and algorithms that data scientists build in R are not just being used for research and experimentation. They are now also being deployed into production environments, and directly into products themselves.

However, taking your workload in R and deploying it at production capacity, and at scale, is no trivial matter. Because of R's rich and robust package ecosystem, and the many versions of R, reproducing the environment of your local machine in a production setting can be challenging, let alone ensuring your model's reproducibility!

This is why using containers is extremely important when it comes to operationalizing your R workloads. I'm happy to announce that the doAzureParallel package, powered by Azure Batch, now supports fully containerized deployments. With this migration, doAzureParallel will not only help you scale out your workloads, but will also do it in a completely containerized fashion, letting you bypass the complexities of dealing with inconsistent environments. Now that doAzureParallel runs on containers, we can ensure a consistent, immutable runtime while handling custom R versions, environments, and packages.

By default, the container used in doAzureParallel is the 'rocker/tidyverse:latest' container that is developed and maintained as part of the rocker project. For most cases, and especially for beginners, this image will contain most of what is needed. However, as users become more experienced or have more complex deployment requirements, they may want to change the Docker image that is used, or even build their own. doAzureParallel supports both of those options, giving you flexibility without any compromise on reliability. Configuring the Docker image is easy. Once you know which Docker image you want to use, you simply specify its location in the cluster configuration, and doAzureParallel will use it when provisioning subsequent clusters. More details on configuring your Docker container settings with doAzureParallel are included in the documentation.
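As a rough illustration of specifying the image in the cluster configuration, here is a minimal sketch in R. It assumes the cluster configuration is a JSON file with a `containerImage` field, laid out like the template the package generates; the pool name, VM size, node counts, and file names are hypothetical placeholders, so check the field names against the doAzureParallel documentation before relying on them. The Azure-dependent part only runs if the package is installed and a `credentials.json` file (an assumed name) is present.

```r
# Sketch only: field names mirror the cluster.json template as I understand it;
# verify them against the doAzureParallel documentation.
library(jsonlite)

cluster_config <- list(
  name = "my-r-pool",                          # hypothetical pool name
  vmSize = "Standard_D2_v2",                   # hypothetical VM size
  maxTasksPerNode = 1,
  poolSize = list(
    dedicatedNodes = list(min = 2, max = 2),   # hypothetical node counts
    lowPriorityNodes = list(min = 0, max = 0),
    autoscaleFormula = "QUEUE"
  ),
  # The Docker image to run on each node; this is the default named in the post.
  # Swap in another image, or one you built yourself, to customize the runtime.
  containerImage = "rocker/tidyverse:latest",
  rPackages = list(cran = character(0), github = character(0)),
  commandLine = character(0)
)

# Write the configuration to disk as JSON (scalars unboxed, pretty-printed).
write_json(cluster_config, "cluster.json", auto_unbox = TRUE, pretty = TRUE)

# Provision a cluster from the file only when the package and credentials exist.
if (requireNamespace("doAzureParallel", quietly = TRUE) &&
    file.exists("credentials.json")) {
  library(doAzureParallel)
  library(foreach)

  setCredentials("credentials.json")
  cluster <- makeCluster("cluster.json")   # nodes start from containerImage
  registerDoAzureParallel(cluster)

  # Each iteration runs inside the container on a cluster node.
  r_versions <- foreach(i = 1:4) %dopar% {
    R.version.string
  }
  print(unique(unlist(r_versions)))

  stopCluster(cluster)
}
```

Reporting `R.version.string` from the workers is a quick way to confirm that the tasks ran in the image's R installation and not in your local one.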

With this release, we hope to unblock many users who are looking to take their R models and scale them up in the cloud. To get started with doAzureParallel, visit our GitHub page. Please give it a try, and let us know there or via email if you have questions, feedback, or suggestions.

Github (Azure): doAzureParallel


Content by: Revolutions (source link). Thank you for your support!

