R.I.P. HDFS | The Cloud Wins!

HDFS is an evolutionary dead end in the tree of big data. Data lakes built on S3 object storage deliver on the promise of separating storage from compute, and they make it possible to scale processing, downstream analytics/AI, and data marts on top of the lake in an agile and elastic fashion. The HDFS architecture has bugged me ever since it was first released (besides the fact that it is written in Java). Moving the code to the Hadoop data nodes that hold the data (usually only three replicas are available, by the way) seemed inherently limiting to me. It was not really better than using big Unix SMP servers, except that you got to use cheaper commodity hardware and grow incrementally. Good stuff, but not good enough: one step forward and half a step back.

While the idea of moving code to the data sounded cool at the time, it is fundamentally a poor design for a truly scalable data lake, one that lets you spin up an arbitrary number of ephemeral compute clusters on top of your storage. There is still a place for HDFS and traditional Hadoop clusters if you have a big, fixed, slowly evolving, and predictable compute/storage environment. For the rest of us, a cloud-based data lake architecture will win in the end and allow agile development to meet the fast-paced needs of today's downstream BI, analytics, and AI/ML applications that need to sit on top of the mythical data lake.
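To make the "ephemeral compute on top of shared storage" idea concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions: `ObjectStore` is a hypothetical in-memory stand-in for an S3-style bucket, `run_ephemeral_job` is an invented helper, and a thread pool plays the role of a short-lived compute cluster. None of these names come from a real library.

```python
from concurrent.futures import ThreadPoolExecutor


class ObjectStore:
    """In-memory stand-in for an S3-style bucket (hypothetical, illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, values):
        self._objects[key] = list(values)

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # Object stores are addressed by key prefix rather than by data node.
        return sorted(k for k in self._objects if k.startswith(prefix))


def run_ephemeral_job(store, prefix, workers):
    """Start `workers` threads, sum every object under `prefix`, then tear down."""
    keys = store.list_keys(prefix)
    # The pool lives only for the duration of the job, like a transient cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(lambda key: sum(store.get(key)), keys)
        return sum(partial_sums)


store = ObjectStore()
store.put("sales/2023/part-0", [10, 20, 30])
store.put("sales/2023/part-1", [5, 15])
store.put("sales/2024/part-0", [100])

# Two independently sized "clusters" read the same data; neither one owns it.
small_total = run_ephemeral_job(store, "sales/2023/", workers=1)
large_total = run_ephemeral_job(store, "sales/", workers=4)
print(small_total, large_total)
```

The point of the sketch is that compute capacity (`workers`) is chosen per job and thrown away afterwards, while the data stays put in the store and is never tied to the lifetime of any one cluster.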
