Hadoop Weekly Issue #188

25 September 2016

Lots of releases this week: CouchDB, Accumulo, Kylin, Osso (a new OSS project from Rocana), and, most notably, Apache Kudu, which hit version 1.0. There’s a bit less technical content and general news than usual, but that’s to be expected. With Strata + Hadoop World taking place this week in NYC, get ready for tons of news in the next issue.

Technical

The Cloudera blog has a post on the recently released Apache Hadoop 3.0.0-alpha1. It describes several of the features of the release, including HDFS erasure coding, v.2 of the YARN Timeline Service, and the shell script rewrite.

http://blog.cloudera.com/blog/2016/09/getting-to-know-the-apache-hadoop-3-alpha/
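
Erasure coding replaces full replication with parity data: instead of storing three complete copies of each block, HDFS stripes data into cells and stores computed parity cells from which lost cells can be rebuilt. HDFS uses Reed-Solomon codes for this. The sketch below swaps in the much simpler single-parity XOR scheme purely to illustrate the idea; the function names (`xor_bytes`, `encode`, `recover`) are hypothetical and not part of any Hadoop API.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_cells: List[bytes]) -> bytes:
    """Compute a single parity cell as the XOR of equal-length data cells."""
    return reduce(xor_bytes, data_cells)


def recover(cells: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost data cell (marked None) from survivors plus parity."""
    missing = [i for i, c in enumerate(cells) if c is None]
    if len(missing) > 1:
        # Single XOR parity tolerates one loss; a Reed-Solomon code with
        # m parity cells tolerates up to m losses.
        raise ValueError("single XOR parity can repair only one lost cell")
    repaired = list(cells)
    if missing:
        survivors = [c for c in cells if c is not None]
        # parity XOR (all surviving cells) leaves exactly the missing cell
        repaired[missing[0]] = reduce(xor_bytes, survivors, parity)
    return repaired


cells = [b"abcd", b"efgh", b"ijkl"]
parity = encode(cells)
print(recover([b"abcd", None, b"ijkl"], parity))  # [b'abcd', b'efgh', b'ijkl']
```

The trade-off is the same one the real scheme makes: one parity cell per three data cells costs far less space than two extra replicas, at the price of CPU work to reconstruct data after a loss.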

MapR has posted a whiteboard walkthrough on how Apache Flink handles event time for stream processing. In addition to the video, there’s a transcript of the presentation.

https://www.mapr.com/blog/event-time-apache-flink-stream-processing-whiteboard-walkthrough
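
Event-time processing groups records by the timestamp they carry rather than by when they arrive, and uses watermarks to decide when a window is complete. The following is a plain-Python simulation of that idea, not Flink's API, under these assumptions: tumbling windows of a fixed `size`, a watermark that trails the largest timestamp seen by a fixed `max_delay` (bounded out-of-orderness), and late records simply dropped. The function name `event_time_windows` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def event_time_windows(
    timestamps: Iterable[int], size: int, max_delay: int
) -> List[Tuple[int, int, int]]:
    """Count records per tumbling event-time window [start, start + size).

    `timestamps` are event times in *arrival* order (possibly out of order).
    Returns (window_start, window_end, count) tuples in the order windows fire.
    """
    pending: Dict[int, int] = defaultdict(int)  # window_start -> count so far
    fired: List[Tuple[int, int, int]] = []
    watermark = float("-inf")

    for ts in timestamps:
        start = ts - ts % size
        if start + size <= watermark:
            continue  # late record: its window's end is already behind the watermark
        pending[start] += 1
        # Bounded out-of-orderness: assume no record is more than max_delay late.
        watermark = max(watermark, ts - max_delay)
        # Emit every window whose end the watermark has now reached.
        for s in sorted(w for w in pending if w + size <= watermark):
            fired.append((s, s + size, pending.pop(s)))

    # End of input acts like a final, infinite watermark: flush what is left.
    for s in sorted(pending):
        fired.append((s, s + size, pending.pop(s)))
    return fired


print(event_time_windows([1, 12, 4, 17, 3, 25], size=10, max_delay=5))
# [(0, 10, 2), (10, 20, 2), (20, 30, 1)]
```

Here the record stamped 4 arrives out of order but still lands in window [0, 10), while the record stamped 3 arrives after the watermark has passed 10 and is discarded. Flink can instead keep windows open for some late data (allowed lateness) rather than always dropping it.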

This post is a great walkthrough of Apache Drill. It covers a bunch of topics, including quoting reserved keywords, interpreting and fixing JSON parse errors, using subqueries, conveniences for querying CSV files, a basic overview of Drill’s web interface, plugin configuration, querying an RDBMS, and analyzing a query plan.

https://www.mapr.com/blog/how-guide-getting-started-apache-drill

Cloudera has published a post comparing Apache Impala and Amazon Redshift. There’s an overview of key differences, but the main focus is a performance and cost comparison. As always, these results shouldn’t be viewed as necessarily representative (each dataset is different). With that said, using a TPC-DS derived workload, they show that Impala can often beat Redshift in cost and performance.

http://blog.cloudera.com/blog/2016/09/apache-impala-incubating-vs-amazon-redshift-s3-integration-elasticity-agility-and-cost-performance-benefits-on-aws/

The StreamSets blog has a post arguing that Apache Kudu’s support for efficient real-time access and atomic updates provides an alternative to the lambda architecture.

https://streamsets.com/blog/post-lambda-world-apache-kudu/

This post describes some of the challenges of moving a data science research project into a production data pipeline. The author argues that it’s important for developers and data scientists to work together to integrate quickly.

https://www.oreilly.com/ideas/what-is-hardcore-data-science-in-practice

News

IBM Power systems are getting support for Apache Hadoop through an IBM partnership with Hortonworks.

http://www.prnewswire.com/news-releases/hortonworks-ibm-collaborate-to-offer-open-source-distribution-on-power-systems-300330299.html

data Artisans has announced the dA Platform, a distribution of Apache Flink with enterprise support.

http://data-artisans.com/announcing-the-da-platform-our-distribution-of-apache-flink/

Oracle and Qubole announced a partnership to bring the Qubole big data as a service offering to the Oracle Cloud Platform.

https://www.qubole.com/blog/qubole-and-oracle/

Omid, a transaction manager for Apache HBase, was recently accepted into the Apache Incubator following a proposal from Yahoo. It provides snapshot isolation guarantees and is suited to high-performance environments (supporting over 100k transactions/second).

http://yahoohadoop.tumblr.com/post/150821732246/omids-first-step-in-the-apache-community
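
Snapshot isolation means each transaction reads from a consistent snapshot as of its start timestamp, and a commit is rejected if another transaction committed a write to the same key after that snapshot was taken (first committer wins). The toy in-memory model below illustrates those two rules, with a simple counter standing in for a timestamp oracle. `ToySnapshotStore` and `Txn` are hypothetical names; this is not Omid's API or its HBase-backed implementation.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ToySnapshotStore:
    """Toy multi-version key-value store with snapshot isolation (illustration only)."""

    def __init__(self) -> None:
        self.clock = itertools.count(1)  # stand-in timestamp oracle
        self.versions: Dict[str, List[Tuple[int, str]]] = {}  # key -> [(commit_ts, value)]
        self.last_commit: Dict[str, int] = {}  # key -> latest commit timestamp

    def begin(self) -> "Txn":
        return Txn(self, next(self.clock))


class Txn:
    def __init__(self, store: ToySnapshotStore, start_ts: int) -> None:
        self.store = store
        self.start_ts = start_ts
        self.writes: Dict[str, str] = {}  # buffered until commit

    def read(self, key: str) -> Optional[str]:
        if key in self.writes:
            return self.writes[key]  # read your own writes
        # Only versions committed at or before our snapshot are visible.
        visible = [v for ts, v in self.store.versions.get(key, []) if ts <= self.start_ts]
        return visible[-1] if visible else None

    def write(self, key: str, value: str) -> None:
        self.writes[key] = value

    def commit(self) -> bool:
        s = self.store
        # Write-write conflict: someone committed one of our keys after our snapshot.
        if any(s.last_commit.get(k, 0) > self.start_ts for k in self.writes):
            return False  # abort
        commit_ts = next(s.clock)
        for k, v in self.writes.items():
            s.versions.setdefault(k, []).append((commit_ts, v))
            s.last_commit[k] = commit_ts
        return True


store = ToySnapshotStore()
setup = store.begin()
setup.write("x", "v0")
setup.commit()

t1, t2 = store.begin(), store.begin()  # both see x == "v0"
t1.write("x", "v1")
t2.write("x", "v2")
print(t1.commit(), t2.commit())  # True False: t2 loses the write-write conflict
print(store.begin().read("x"))   # v1
```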

Releases

Rocana has open sourced Osso, which is a new semi-structured event format. Built on Avro, the standard is meant to be easy, intuitive, efficient, and complementary to existing solutions.

http://www.osso-project.org/

The Google Cloud Platform blog has highlighted three integrations related to Kafka. The Google Cloud Pub/Sub connectors move data between Pub/Sub and Kafka, the KafkaIO connector for Apache Beam lets Beam pipelines consume from Kafka, and the Kafka to BigQuery connector can be used to mirror data to BigQuery.

https://cloud.google.com/blog/big-data/2016/09/apache-kafka-for-gcp-users-connectors-for-pubsub-dataflow-and-bigquery

Version 2.0 of Apache CouchDB was released this week. Highlights of the release include new clustering, a new querying language, and a rewritten admin interface.

https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces99

Apache Kudu announced version 1.0 this week. The release includes support for HA Kudu Master, a rewritten Apache Spark integration, an official client library for Python, and more. To mark the occasion, the Cloudera blog has an overview of the history of the project and a look at its future.

https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces100
http://vision.cloudera.com/apache-kudu-1-0-is-released/

Apache Accumulo 1.6.6 includes a data loss fix, a fix for DataNode decommission, dependency upgrades, and more.

https://accumulo.apache.org/release_notes/1.6.6

Amazon EMR now supports security configurations to enable encryption for data at rest and in transit. The post has an example of configuring the encryption providers.

http://blogs.aws.amazon.com/bigdata/post/Tx31P2UUJKR4ONF/Encrypt-Data-At-Rest-and-In-Flight-on-Amazon-EMR-with-Security-Configurations

Version 1.5.4 of Apache Kylin, the OLAP engine for Hadoop, was released.

http://mail-archives.us.apache.org/mod_mbox/www-announce/201609.mbox/%3CCANfpUctGUDjsNVoe_Pd1CJF4Ebh8ne2NSzZBaaYsj2d7M4rq6Q@mail.gmail.com%3E

Amazon Web Services has open-sourced the Amazon EMR-DynamoDB connector.

http://blogs.aws.amazon.com/bigdata/post/Tx1LFQWRADHKT44/Amazon-EMR-DynamoDB-Connector-Repository-on-AWSLabs-GitHub

Events

Curated by Datadog (http://www.datadog.com)

UNITED STATES

California

Apache Spark Meetup (San Francisco) – Tuesday, September 27

http://www.meetup.com/spark-users/events/233723499/

Azure 101: Hadoop on Cloud (Mountain View) – Wednesday, September 28

http://www.meetup.com/Microsoft-Azure-Open-Group/events/234105376/

Washington

Scaling Recommenders + Content Embeddings at Facebook (Seattle) – Wednesday, September 28

http://www.meetup.com/Seattle-Scalability-Meetup/events/231640322/

Colorado

Apache Nifi (Lafayette) – Monday, September 26

http://www.meetup.com/Lafayette-CO-Tech/events/234001885/

Nebraska

Hadoop Security and Governance with Apache Ranger and Apache Atlas (Manhattan) – Wednesday, September 28

http://www.meetup.com/futureofdata-newyork/events/234153727/

Texas

Big Data & Data Science Workshop Using Apache Spark (Houston) – Monday, September 26

http://www.meetup.com/Houston-Spark-Meetup/events/234198876/

Georgia

Diving Into Big Data Technologies: Hadoop, Hive, and Apache NiFi (Atlanta) – Thursday, September 29

http://www.meetup.com/Technologists/events/231068842/

District of Columbia

“Data Analytics with Hadoop” Book Release Celebration (Washington) – Monday, September 26

http://www.meetup.com/Data-Community-DC/events/234075049/

New York

HBaseCon East 2016 (New York) – Monday, September 26

http://www.meetup.com/HBase-NYC/events/233024937/

Intro to Apache Kudu: Fast Analytics on Fast Data (New York) – Tuesday, September 27

http://www.meetup.com/mysqlnyc/events/233599664/

The Stream Processor as a Database (New York) – Wednesday, September 28

http://www.meetup.com/NYCRealTimeStreamingAnalytics/events/234329394/

NORWAY

Let’s Get Started with Hadoop #9 (Oslo) – Thursday, September 29

http://www.meetup.com/Oslo-Hadoop-Big-Data-Meetup/events/231886409/

FRANCE

Criteo Labs Tech Talks Session 3 (Paris) – Wednesday, September 28

http://www.meetup.com/Criteo-Labs-Tech-Talks/events/234001806/

NETHERLANDS

Introduction to Apache Flink (Amsterdam) – Thursday, September 29

http://www.meetup.com/Apache-Flink-Meetup-Amsterdam/events/233817119/

GERMANY

Data Engineering on AWS by Thorsten Greiner (Dusseldorf) – Thursday, September 29

http://www.meetup.com/Dusseldorf-Data-Science-Meetup/events/234016369/

POLAND

Hands-On Introduction to Apache Spark & Apache Zeppelin (Gdansk) – Wednesday, September 28

http://www.meetup.com/futureofdata-gdansk/events/233501975/

ISRAEL

Practical Distributed Stream Processing with Akka Streams (Tel Aviv-Yafo) – Tuesday, September 27

http://www.meetup.com/underscore/events/234017005/

INDIA

Discuss Key Emerging Big Data Technologies (Bangalore) – Thursday, September 29

http://www.meetup.com/Emerging-Big-Data-Technologies-Meetup/events/231614230/

Introduction to Hadoop, YARN, HDFS: Students Only (Pune) – Friday, September 30

http://www.meetup.com/Apache-Apex-Pune/events/234087397/
