Heron Turns Open-Source as Twitter Prioritizes Faster Stream Analytics

Storage Architecture 2016-06-06

You have probably heard of Heron, Twitter’s real-time stream-processing system, built in-house as a replacement for Apache Storm. Twitter has now released it as open source, about a year after first announcing it.

Figure 1: Putting Heron at the forefront of Twitter’s functional hierarchy

Looking back, Twitter’s motives for building Heron are clear. The first requirement was speed, which is what a real-time stream-processing system exists to provide. The second was the ability to scale up. Apache Storm met these needs some of the time, but the new platform offers easier deployment, better management capabilities, easier debugging, and more efficient use of a multitenant cluster environment.

That said, Apache Storm served its purpose well at first. It was created by BackType, a marketing intelligence company that Twitter acquired in 2011. After the acquisition, Storm was open-sourced and later handed over to the Apache Software Foundation. Storm still has real advantages, chiefly the entire ecosystem built around it. Getting data into Storm is easy, but the system itself is much harder to understand.

Figure 2: Heron reduces spout latency

Admittedly, Apache Storm has always been regarded as an intricate system that delivers results only after sustained effort. No wonder Storm has been challenged by alternatives, namely Apache Spark and its revised framework for real-time streaming, despite Storm’s recent 1.0 release.

These factors pushed Twitter to look elsewhere, and instead of overhauling the existing project, the company chose to start from scratch. Heron began with a container- and cluster-oriented design. Jobs, called topologies, are submitted to a scheduler, which then launches each topology as a series of containers.
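The submit-then-launch flow described above can be pictured with a small toy model. This is only a sketch under stated assumptions: `Topology`, `ToyScheduler`, and the fixed-size packing rule are hypothetical names and choices made for illustration, not Heron’s, Aurora’s, or Mesos’s actual API or placement logic.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Hypothetical topology description: component name -> parallelism."""
    name: str
    components: dict


@dataclass
class ToyScheduler:
    """Stand-in for a cluster scheduler (not the Aurora or Mesos API)."""
    instances_per_container: int = 2
    running: dict = field(default_factory=dict)

    def submit(self, topology):
        # Expand each component into its individual instances.
        instances = [
            f"{component}-{index}"
            for component, parallelism in topology.components.items()
            for index in range(parallelism)
        ]
        # Pack the instances into fixed-size containers, in order.
        size = self.instances_per_container
        containers = [
            instances[start:start + size]
            for start in range(0, len(instances), size)
        ]
        self.running[topology.name] = containers
        return containers


scheduler = ToyScheduler(instances_per_container=2)
plan = scheduler.submit(
    Topology("word-count", {"spout": 1, "split": 2, "count": 2})
)
for number, container in enumerate(plan):
    print(f"container {number}: {container}")
```

The point of the sketch is the division of labor: the user only describes the topology, while the scheduler decides how many containers to launch and what runs in each.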

Twitter lets users choose the scheduler: Apache Aurora, Apache Mesos, or something else. Storm loses out here, since its clusters must be provisioned manually, which matters most when scaling out. Twitter’s best decision was to make Heron backwards compatible with the Storm API. This was a practical move: many systems still run on Apache Storm, and their existing spouts and bolts can now be moved over to the new platform.

Existing Storm topologies can run on Heron with only minor modifications. People who are still invested in Storm can therefore migrate to Heron with little effort, without having to start a separate project.
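Why a stable API makes migration cheap can be shown with a toy sketch. Everything here is an assumption made for illustration: the `emit`/`process` interface and the two runner functions are hypothetical stand-ins, not the real Storm or Heron API. The same spout and bolt objects are run, unchanged, by two different runtimes and produce the same result.

```python
from collections import Counter


class SentenceSpout:
    """Toy spout: the source of tuples entering the topology."""

    def __init__(self, sentences):
        self.sentences = sentences

    def emit(self):
        yield from self.sentences


class SplitBolt:
    """Toy bolt: splits each sentence into lowercase words."""

    def process(self, sentence):
        yield from sentence.lower().split()


class CountBolt:
    """Toy bolt: keeps a running count per word."""

    def __init__(self):
        self.counts = Counter()

    def process(self, word):
        self.counts[word] += 1
        yield word, self.counts[word]


def apply_bolt(bolt, stream):
    # Generator function, so `bolt` is bound when the stage is created.
    for item in stream:
        yield from bolt.process(item)


def run_pipelined(spout, bolts):
    """Hypothetical runtime A: each tuple flows through all bolts lazily."""
    stream = spout.emit()
    for bolt in bolts:
        stream = apply_bolt(bolt, stream)
    return list(stream)


def run_staged(spout, bolts):
    """Hypothetical runtime B: each bolt finishes before the next starts."""
    items = list(spout.emit())
    for bolt in bolts:
        items = [out for item in items for out in bolt.process(item)]
    return items


sentences = ["heron runs storm topologies", "storm topologies run on heron"]
pipelined = run_pipelined(SentenceSpout(sentences), [SplitBolt(), CountBolt()])
staged = run_staged(SentenceSpout(sentences), [SplitBolt(), CountBolt()])
print(pipelined == staged)
```

Because the spout and bolt classes depend only on the shared interface, swapping the runtime underneath them requires no changes to the user’s code, which is the property the Storm-compatible API is meant to give Heron.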

Figure 3: This is how ‘Heron’ works

Backwards compatibility also gives Twitter a way to win over Storm users by offering them an incentive to switch. With Heron, expect efficiency gains of roughly two to five times over current rates. Both capital and operating expenses should fall as a result.

If you are looking for a faster system for real-time stream processing, check out Twitter’s newest offering.

DZone
