Scale up your parallel R workloads with containers and doAzureParallel

General technology, 2017-11-22

by JS Tan (Program Manager, Microsoft)

The R language is by far the most popular statistical language, and has seen massive adoption in both academia and industry. In our new data-centric economy, the models and algorithms that data scientists build in R are no longer used only for research and experimentation. They are now also being deployed into production environments, and directly into products themselves.

However, taking your R workload and deploying it at production capacity, and at scale, is no trivial matter. Because of R's rich package ecosystem and the many versions of R, reproducing the environment of your local machine in a production setting can be challenging, and ensuring that your model's results are reproducible is harder still.

This is why containers are so important when it comes to operationalizing your R workloads. I'm happy to announce that the doAzureParallel package, powered by Azure Batch, now supports fully containerized deployments. With this migration, doAzureParallel will not only help you scale out your workloads, but will also do it in a completely containerized fashion, letting you bypass the complexities of dealing with inconsistent environments. Now that doAzureParallel runs on containers, we can ensure a consistent, immutable runtime while handling custom R versions, environments, and packages.

By default, the container used in doAzureParallel is the 'rocker/tidyverse:latest' image, which is developed and maintained as part of the rocker project. In most cases, and especially for beginners, this image will contain most of what is needed. However, as users become more experienced or have more complex deployment requirements, they may want to change the Docker image that is used, or even build their own. doAzureParallel supports both options, giving you flexibility without any compromise on reliability.

Configuring the Docker image is easy. Once you know which Docker image you want to use, you specify its location in the cluster configuration, and doAzureParallel will use it when provisioning subsequent clusters. More details on configuring your Docker container settings with doAzureParallel are included in the documentation.
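As a rough illustration of what "specifying the image in the cluster configuration" can look like, here is a minimal sketch in R. It assumes the cluster configuration is a JSON file with a `containerImage` field, as in the template that doAzureParallel's `generateClusterConfig()` produces; the exact set of fields may differ between package versions, so check the documentation for yours. The pool name, VM size, file names, and the helper `provision_and_run()` are hypothetical and chosen only for this example.

```r
library(jsonlite)

# Cluster settings as an R list. The field names are an assumption based on
# the generated cluster.json template; verify them against your version.
cluster_config <- list(
  name = "my-r-pool",                  # hypothetical pool name
  vmSize = "Standard_D2_v2",           # hypothetical VM size
  maxTasksPerNode = 1,
  poolSize = list(
    dedicatedNodes = list(min = 2, max = 2),
    lowPriorityNodes = list(min = 0, max = 0),
    autoscaleFormula = "QUEUE"
  ),
  # The Docker image every node in the cluster will run.
  # Swap in another image, or one you built yourself, here.
  containerImage = "rocker/tidyverse:latest",
  rPackages = list(cran = list(), github = list(), bioconductor = list()),
  commandLine = list()
)

# Write the configuration to disk as JSON.
write_json(cluster_config, "cluster.json", auto_unbox = TRUE, pretty = TRUE)

# Hypothetical helper: provisions a cluster from the config and runs a
# trivial parallel loop. It needs valid Azure credentials, so it is only
# defined here, not called.
provision_and_run <- function(config_path = "cluster.json",
                              credentials_path = "credentials.json") {
  library(foreach)
  library(doAzureParallel)

  setCredentials(credentials_path)
  cluster <- makeCluster(config_path)
  on.exit(stopCluster(cluster))        # release the cluster when done

  registerDoAzureParallel(cluster)

  # Each iteration runs inside the container image named in the config.
  foreach(i = 1:4, .combine = c) %dopar% sqrt(i)
}
```

Keeping the image name in the configuration file rather than in the analysis code means the same `foreach` loop can be pointed at a different R version or package set by editing a single field.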

With this release, we hope to unblock many users who are looking to take their R models and scale them up in the cloud. To get started with doAzureParallel, visit our GitHub page. Please give it a try and let us know if you have questions, feedback, or suggestions; you can also reach us via email.

GitHub (Azure): doAzureParallel



