Hazelcast Jet 0.5: “Pipeline API has been intentionally designed as a fluent Java API, inst…

Vladimir Schreiner

Hazelcast Jet 0.5 was released earlier this month. It includes the first release of Jet’s high-level API, which makes it easy for a developer to set processing parameters and embed Jet directly into an application. We talked with Vladimir Schreiner, Hazelcast Product Manager, about the new release, what’s next for Hazelcast Jet and more.

JAXenter: Hazelcast Jet was one of the nominees for this year’s JAX Innovation Awards. Why do you think people like it? Why does it deserve to be one of the most innovative contributions to the Java ecosystem?

Vladimir Schreiner: Simplicity. Jet makes big data processing an application-level concern.

Processing of big data is no longer the sole domain of large infrastructure. Adding one 10MB JAR file is enough to establish an entire computing cluster, capable of processing gigabytes of data every second.

Through a simple and intuitive API, Jet makes it easier for developers to design and build applications that consume and process big data in real time, which is a large boon to their productivity. Jet is a single library with no dependencies, so it is easily embedded and deployed, removing the need for multiple systems.

JAXenter: We’ve heard this time and time again but we need the confirmation from you. Is Hazelcast Jet the direct rival of Apache Spark? What does it bring to the table?

Vladimir Schreiner: They overlap, but they are also different.

Spark, with its broad community and modules for machine learning, graph algorithms and SQL, is a significant addition to the large-scale data processing ecosystem. However, bear in mind that Spark has been built from the ground up for batch processing.

Hazelcast Jet is a true stream processing technology with much lower processing latency (a factor of 10x compared to batch processing), and it can process batches in real time as they are delivered. For near-real-time use cases, such as online trades, sensor updates in IoT architectures and fraud detection, the ability to process data in milliseconds is crucial. Jet is a library which adheres to a zero-dependency policy, which means it can be embedded into applications to build self-contained data processing microservices. In contrast, Spark relies on Hadoop infrastructure, which makes embedding difficult.

So, both Jet and Spark can be applicable to general batch and stream processing. Spark covers more specific use-cases via its modules. Jet provides a novel, streaming-first approach which makes it a better fit for streaming and microservices.

JAXenter: What’s new about Hazelcast Jet 0.5? 0.4 included some dramatic changes, but how about the latest release? Is it as meaningful as the previous one?

Vladimir Schreiner: 0.5 is even more important as it includes the first release of Jet’s high-level API. This is the API which makes it easy for a developer to set processing parameters and embed Jet directly into an application. Hazelcast API design is known in the industry for conforming to existing Java programming conventions that make it easy for developers to pick up and quickly become productive. Jet 0.5 presents the first release of this API for big data processing.

Additionally, 0.5 adds fault tolerance using distributed in-memory snapshots. In Hazelcast Jet 0.5, snapshots are distributed across the cluster and held in multiple replicas to provide redundancy. Jet is now able to tolerate multiple faults such as node failure, network partition or job execution failure. Snapshots are periodically created and backed up. If there is a node failure, Jet uses the latest state snapshot and automatically restarts all jobs that contain the failed node as a job participant. No additional infrastructure, such as a distributed file system or external snapshot storage, is necessary: Hazelcast Jet is fault tolerant out of the box.
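
Editor's note: the recovery idea described here (take a periodic snapshot, then resume from the latest one after a failure) can be illustrated with a minimal single-process sketch in plain Java. This is not Jet's implementation, which is distributed and replicated; the class `SnapshotSketch`, its `Snapshot` type and the `failAtIndex` fault injection are hypothetical names made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a running sum that snapshots its state
// every few items and restores the latest snapshot after a simulated crash.
class SnapshotSketch {

    // The saved state: where to resume reading and the sum accumulated so far.
    static final class Snapshot {
        final int nextIndex;
        final long sum;

        Snapshot(int nextIndex, long sum) {
            this.nextIndex = nextIndex;
            this.sum = sum;
        }
    }

    // Sums the items, injecting one failure just before index failAtIndex
    // (pass -1 for no failure). snapshotEvery must be greater than zero.
    static long sumWithRecovery(List<Integer> items, int snapshotEvery, int failAtIndex) {
        Snapshot latest = new Snapshot(0, 0L);
        boolean failed = false;
        int i = 0;
        long sum = 0L;
        while (i < items.size()) {
            if (!failed && i == failAtIndex) {
                // Simulated crash: in-flight state is lost, so roll back
                // to the latest snapshot and replay from there.
                failed = true;
                i = latest.nextIndex;
                sum = latest.sum;
                continue;
            }
            sum += items.get(i);
            i++;
            if (i % snapshotEvery == 0) {
                latest = new Snapshot(i, sum);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(sumWithRecovery(items, 3, 7));
    }
}
```

Because the snapshot stores both the read position and the accumulated state, replaying from it yields the same result as a run without any failure. The sketch assumes the input can be re-read from the saved position.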

Another important update is streaming access to the distributed data storage of Jet. So, instead of iterating over the items actively, a change event is fired and sent to the Jet job every time the value changes. This feature can be used for stream ingestion into Jet or to interconnect multiple Jet Jobs.
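
Editor's note: the difference between iterating over stored items and receiving a change event on every update can be sketched in plain Java. The sketch below is an assumption-laden illustration, not the Hazelcast or Jet API: `ObservableStore`, `Change` and `subscribe` are hypothetical names, and everything runs in one JVM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical illustration only: a key-value store that pushes a change
// event to subscribers whenever a value actually changes.
class ObservableStore<K, V> {

    // One change event: the key plus the value before and after the update.
    static final class Change<K, V> {
        final K key;
        final V oldValue;
        final V newValue;

        Change(K key, V oldValue, V newValue) {
            this.key = key;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    private final Map<K, V> data = new HashMap<>();
    private final List<Consumer<Change<K, V>>> listeners = new ArrayList<>();

    // Registers a consumer that plays the role of the downstream job.
    void subscribe(Consumer<Change<K, V>> listener) {
        listeners.add(listener);
    }

    // Stores the value and fires an event only if it differs from the old one.
    void put(K key, V value) {
        V old = data.put(key, value);
        if (!Objects.equals(old, value)) {
            Change<K, V> change = new Change<>(key, old, value);
            for (Consumer<Change<K, V>> listener : listeners) {
                listener.accept(change);
            }
        }
    }

    public static void main(String[] args) {
        ObservableStore<String, Integer> store = new ObservableStore<>();
        store.subscribe(c -> System.out.println(c.key + ": " + c.oldValue + " -> " + c.newValue));
        store.put("sensor-1", 20);
        store.put("sensor-1", 20); // unchanged value, no event
        store.put("sensor-1", 25);
    }
}
```

The subscriber never scans the store; it only reacts to the events pushed to it, which is what makes the storage usable as a streaming source.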

JAXenter: What’s interesting about the new Pipeline API? What is its purpose?

Vladimir Schreiner: The Pipeline API is the primary programming interface of Hazelcast Jet for batch and stream processing, making it more appealing to a wider Java audience.

For the first two major Jet releases, the main options for building Jet applications were the DAG API or a distributed implementation of java.util.stream. The DAG API, while powerful, is more imperative than declarative. It requires a very good understanding of the execution model and architecture of Jet and could be considered low-level. java.util.stream, on the other hand, is declarative, but it was designed mostly for local, same-JVM processing rather than as a distributed computation API, and it lacks many of the constructs of distributed data processing, such as joins and forks. Despite the name, it’s also designed as a batch processing API.
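
Editor's note: for readers unfamiliar with the declarative style of java.util.stream, here is a minimal word-count sketch using only the JDK. It is not Jet code, and the class name `LocalWordCount` is made up for illustration. Note that it runs in a single JVM over a bounded list, which is the limitation described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustration only: a local, bounded (batch) word count with java.util.stream.
class LocalWordCount {

    // Splits each line into lowercase words and counts the occurrences.
    // A TreeMap keeps the result sorted by word.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "that is the question");
        System.out.println(count(lines));
    }
}
```

The pipeline states what to compute (split, filter, group, count) rather than how to loop, but nothing in it expresses joins, forks or unbounded input.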

To overcome this, we’ve designed a powerful, general-purpose high-level API for processing both bounded and unbounded data. Hazelcast API design is known in the industry for conforming to existing Java programming conventions that make it easy for developers to pick up and quickly become productive. Jet 0.5 presents the first release of this API for big data processing.

JAXenter: How are you making the platform more appealing to Java developers? What are your next plans in this direction?

Vladimir Schreiner: To begin with, Jet is targeted at a Java developer audience.

We want Java developers to feel more at home using concepts from Java 8 such as lambdas or the java.util.stream API. Also, the Pipeline API has been intentionally designed as a fluent Java API, instead of something like SQL.

Moreover, Jet is built in Java, allowing users to benefit from Java’s convenient tooling and infrastructure.

And let’s use that word again: simplicity. Jet is one JAR and can be used by adding one Maven dependency, as opposed to downloading, installing and managing multiple layers of infrastructure.

In the future, we will consider doing more integration with Spring, as it is a ubiquitous and very popular framework among Java developers.

JAXenter: Who uses (or should use) the Jet platform and for what?

Vladimir Schreiner: Jet is used in a variety of stream and batch data pipelines. Examples include payment processing, log analysis and in-store e-commerce, to name a few. However, the majority of Jet use cases share the following traits:

  • Low latency (latency sensitive) data processing: not everybody has really “big” data. However, the ability to operate with latencies that are predictable and consistently low, even during unexpected peaks, is critical due to SLA requirements.
  • Scalability and resiliency: there are a lot of tools that can be used for batch or stream processing in a single-node deployment, for example traditional Extract-Transform-Load (ETL) or Complex Event Processing (CEP) frameworks. They do the job until you have to scale to keep up with input data growth or you have to guarantee high availability to deal with failures. People use Jet, as the latter seems to be a must-have requirement in modern software architectures.
  • Distributed in-memory storage: whether used for data ingestion, messaging, caching or to store results, the need for operational storage is ubiquitous when processing data. Jet offers scalable storage out of the box, built on top of Hazelcast IMDG.

JAXenter: What’s next for Hazelcast Jet?

Vladimir Schreiner: We’re working on further extensions of the Pipeline API, mostly to fully support streaming use cases. In addition, we plan to make the Jet cluster fully elastic, allowing users to automatically scale running computations to nodes being dynamically added to the cluster. Tooling for monitoring and diagnostics is also being discussed.

In 2018, we intend to reach 1.0 maturity.

Thank you!

Source: JAXenter
