Scale up your parallel R workloads with containers and doAzureParallel

2017-11-22

by JS Tan (Program Manager, Microsoft)

The R language is by far the most popular statistical language, and has seen massive adoption in both academia and industry. In our new data-centric economy, the models and algorithms that data scientists build in R are not just being used for research and experimentation. They are now also being deployed into production environments, and directly into products themselves.

However, taking your workload in R and deploying it at production capacity, and at scale, is no trivial matter. Because of R's rich and robust package ecosystem, and the many versions of R, reproducing the environment of your local machine in a production setting can be challenging, let alone ensuring your model's reproducibility!

This is why using containers is extremely important when it comes to operationalizing your R workloads. I'm happy to announce that the doAzureParallel package, powered by Azure Batch, now supports fully containerized deployments. With this migration, doAzureParallel will not only help you scale out your workloads, but will also do it in a completely containerized fashion, letting you bypass the complexities of dealing with inconsistent environments. Now that doAzureParallel runs on containers, we can ensure a consistent, immutable runtime while handling custom R versions, environments, and packages.

By default, the container used in doAzureParallel is the 'rocker/tidyverse:latest' container that is developed and maintained as part of the rocker project. For most cases, and especially for beginners, this image will contain most of what is needed. However, as users become more experienced or have more complex deployment requirements, they may want to change the Docker image that is used, or even build their own. doAzureParallel supports both of those options, giving you flexibility without any compromise on reliability. Configuring the Docker image is easy. Once you know which Docker image you want to use, you simply specify its location in the cluster configuration, and doAzureParallel will use it when provisioning subsequent clusters. More details on configuring your Docker container settings with doAzureParallel are included in the documentation.
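As a rough illustration of the configuration step described above, here is a minimal sketch in R. It assumes the cluster configuration is a JSON file with a `containerImage` field, as in the package's cluster template at the time of writing; the other field names and values (pool name, VM size, node counts) are illustrative assumptions too, so check them against the documentation for your installed version. `write_cluster_config` and `run_on_cluster` are hypothetical helpers, not part of doAzureParallel.

```r
# Hypothetical helper (not part of doAzureParallel): writes a cluster
# configuration file whose "containerImage" field names the Docker image.
# All field names and values are assumptions based on the package's
# cluster.json template; verify them against the official documentation.
write_cluster_config <- function(path, image = "rocker/tidyverse:latest") {
  config <- sprintf('{
  "name": "containerized-pool",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": { "min": 2, "max": 2 },
    "lowPriorityNodes": { "min": 0, "max": 0 },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "%s",
  "rPackages": { "cran": [], "github": [], "bioconductor": [] },
  "commandLine": []
}', image)
  writeLines(config, path)
  invisible(path)
}

# Write a configuration that uses the default image named in the article.
cluster_file <- write_cluster_config(file.path(tempdir(), "cluster.json"))

# Hypothetical wrapper, defined but not called here: it needs the
# doAzureParallel and foreach packages plus valid Azure credentials.
run_on_cluster <- function(cluster_file, credentials_file) {
  `%dopar%` <- foreach::`%dopar%`
  doAzureParallel::setCredentials(credentials_file)
  cluster <- doAzureParallel::makeCluster(cluster_file)
  on.exit(doAzureParallel::stopCluster(cluster))  # release the pool when done
  doAzureParallel::registerDoAzureParallel(cluster)
  # Each iteration runs inside the configured container on a cluster node
  # and reports the R version it finds there.
  foreach::foreach(i = 1:4) %dopar% {
    as.character(getRversion())
  }
}
```

To try a different image, pass its location as the `image` argument (for example, an image you have built and published yourself) and provision a new cluster from the regenerated file.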

With this release, we hope to unblock many users who are looking to take their R models and scale them up in the cloud. To get started with doAzureParallel, visit our GitHub page. Please give it a try and send us your questions, feedback, or suggestions, including by email.

GitHub (Azure): doAzureParallel



