Dataiku 4.1.0: More support for R users!

综合技术 2017-11-20 阅读原文


Recently, Dataiku 4.1.0 was released, it now offers much more support for R users. But wait a minute, Data-what? I guess some of you do not know Dataiku, so what is Dataiku in the first place? It is a collaborative data science platform created to design and run data products at scale. The main themes of the product are:

Collaboration & Orchestration: A data science project often involves a team of people with different skills and interests. To name a few, we have data engineers, data scientists, business analysts, business stakeholders, hardcore coders, R users and Python users. Dataiku provides a platform to accommodate the needs of these different roles to work together on data science projects.

Productivity:Whether you like hard core coding or are more GUI oriented, the platform offers an environment for both. A flow interface can handle most of the steps needed in a data science project, and this can be enriched by Python or R recipes. Moreover, a managed notebook environment is integrated in Dataiku to do whatever you want with code.

Deployment of data science products:As a data scientist you can produce many interesting stuff, i.e. graphs, data transformations, analysis, predictive models. The Dataiku platform facilitates the deployment of these deliverables, so that others (in your organization) can consume them. There are dashboards, web-apps, model API’s, productionized model API’s and data pipelines.

There is a free version which contains already a lot of features to be very useful, and there is an paid version, with “ enterprise features “. See for the Dataiku website for more info.

Improved R Support in 4.1.0

Among many new features, and the one that interests me the most as an R user, is the improved support for R . In previous versions of Dataiku there was already some support for R, this version has the following improvements. There is now support for:

R Code environments

In Dataiku you can now create so-called code environments for R (and Python). A code environment is a standalone and self-contained environment to run R code. Each environment can have its own set of packages (and specific versions of packages). Dataiku provides a handy GUI to manage different code environments. The figure below shows an example code environment with specific packages.

In Dataiku whenever you make use of R –> in R recipes, Shiny, R Markdown or creating R API’s you can select a specific R code environment to use.

R Markdown reports & Shiny applications

If you are working in RStudio you most likely already know R Markdown documents and Shiny applications. In this version, you can also create them in Dataiku. Now, why would you do that and not just create them in RStudio? Well, the reports and shiny apps become part of the Dataiku environment and so:

  • They are managed in the environment. You will have a good overview of all reports and apps and see who has created/edited them.
  • You can make use of all data that is already available in the Dataiku environment.
  • Moreover, the resulting reports and Shiny apps can be embedded inside Dataiku dashboards.

The figure above shows a R markdown report in Dataiku, the interface provides a nice way to edit the report, alter settings and publish the report. Below is an example dashboard in Dataiku with a markdown and Shiny report.

Creating R API’s

Once you created an analytical model in R, you want to deploy it and make use of its predictions. With Dataiku you can now easily expose R prediction models as an API. In fact, you can expose any R function as an API. The Dataiku GUI provides an environment where you can easily set up and test an R API’s. Moreover, The Dataiku API Node, which can be installed on a (separate) production server imports the R models that you have created in the GUI and can take care of load balancing, high availability and scaling of real-time scoring.

The following three figures give you an overview of how easy it is to work with the R API functionality.

First, define an API endpoint and R (prediction) function.

Then, define the R function, it can make use of data in Dataiku, R objects created earlier or any R library you need.

Then, test and deploy the R function. Dataiku provides a handy interface to test your function/API.

Finally, once you are satisfied with the R API you can make a package of the API, that package can then be imported on a production server with Dataiku API node installed. From which you can then serve API requests.


The new Dataiku 4.1.0 version has a lot to offer for anyone involved in a data science project. The system already has a wide range support for Python, but now with the improved support for R, the system is even more useful to a very large group of data scientists.

Cheers, Longhow.

责编内容by:Longhow Lam's Blog 【阅读原文】。感谢您的支持!


Unmatched ROI – Investing into the next generation... Last weekend, Hortonworks hosted a group of 12 teenagers from Gooddler Social Impact Youth Incubator who came to our Santa Clara headquarters to ...
2018年,20大Python数据科学库都做了哪些更新?... 2018年,Python仍然是数据科学领域解决重大任务和挑战的佼佼者。去年,我们发了一篇博文,列举了一些被证明是最有用的Python库。今年,我们扩充了原来的清单,并重新审视之前讨论过的库,重点关注在过去一年内出现的更新。我们对它们进行了分组,排序不分先后,因为真的说不清它们哪个更好。 核心库...
Dataiku Unveils New Headquarters in Downtown New Y... Global enterprise data science software maker officially relocates to New York and looks to become an integral contributor to the big data ecosystem, ...
解读欧盟 GDPR,这将是企业级数据科学不容忽视的合规风险... 雷锋网 (公众号:雷锋网) AI 科技评论按:欧盟于 2018 年 5 于 25 日出台数据保护条例 GDPR,随之在数据科学领域引起了广泛的讨论,这是因为严格的数据条例,将对数据科学项目,尤其是机器学习领域产生巨大的影响。 目前,随着技术的进步,机器学习也在飞速发展,全球对...