Wrapping up DataWorks Summit 2017


By Karri Linnoinen

Every year Hortonworks, together with Yahoo, puts on the DataWorks / Hadoop Summit, a two-to-three-day conference dedicated to Big Data and its technologies. This year it was my turn to visit the summit, so I’ve compiled a quick summary.

#DWS17

DWS17 kicked off with an epic (to say the least) laser show.


From the welcome keynote on Day 1, the emphasis was on the data itself. It’s not about Hadoop or the platform anymore, but about how to create value from the data your organisation has collected. That data is also the reason the summit has been renamed from the “Hadoop Summit” to the “DataWorks Summit”. With the ability to process and use data in entirely new ways, new businesses will emerge from data.

“In today’s world, data is actually our product.”

Scott Gnau, Chief Technology Officer at Hortonworks, talked about how the future of the enterprise is centered around four paradigms: cloud computing, artificial intelligence, the Internet of Things, and streaming data. Many of these are already in use in organisations, cloud computing especially. Artificial intelligence, itself a broad area, is gaining a lot of traction as machine learning becomes more accessible through services like Microsoft Azure Machine Learning Studio.

As for the other keynotes on both mornings, the sponsor keynotes were a little hit-and-miss.

Delightfully, the last morning keynote on Day 1, by Dr. Barry Devlin, shook things up by outlining the fall of capitalism and how A.I. will inevitably replace the factory worker, if we continue on our present course. It was a very interesting take on the future of Big Data and life beyond it, considering the speed at which current and new technologies are developing. As technological progress accelerates exponentially, a crash is almost inevitable. A somewhat morbid start to the summit, you could say, but thankfully the presentation had a silver lining at the end: we are now at the turning point, where we can influence how the future turns out and how steep the downward curve becomes. Hopefully we are able to level it out and avoid Dr. Devlin’s Skynet-esque future 🙂

On Day 2, the last keynote, by Dr. Rand Hindi, was a quick look into privacy issues in cloud computing. With the introduction of personal voice assistants like Amazon Alexa and Google Home, technology companies should be giving more and more thought to where consumers’ data is processed. Voice patterns are, after all, just as unique as fingerprints.

Breakout Sessions

This year, as the focus was on the data itself, many of the Breakout Sessions were showcases of implementations by different companies. BMW, Société Générale, Lloyds Bank, and Klarna all showed how they’d leveraged Hadoop in their Big Data journeys. Data Science also played a big role at DWS17, as many of the customer showcases and Breakout Sessions had a Data Science theme.

Live Long And Process

Looking at the agenda for the two days at DWS17, one thing jumped out: Hive, specifically Hive with LLAP. This was evident in the number of Hive (and LLAP) specific Breakout Sessions. Apache Hive has been part of the HDP stack for as long as anyone can remember, and has been a staple of many of our POC architectures at Bilot. Back in 2016, the launch of the Hive 2.0 LLAP Tech Preview made a lot of people happy, as the query speeds of Hive 1.x lacked the required punch and full ACID support was missing. Now, with the newest version of the platform, LLAP is generally available, and the many sessions at DWS17 indicated it’s a big deal. Query times are reduced by an order of magnitude, which is definitely something to be excited about.
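To give a feel for what working with LLAP looks like from client code, here’s a minimal Python sketch using the PyHive library. The host name and the `web_logs` table are illustrative assumptions, not anything shown at the summit; the only LLAP-specific detail is that HiveServer2 Interactive conventionally listens on port 10500 rather than the classic 10000.

```python
# Hedged sketch: the kind of GROUP BY aggregate that LLAP's in-memory
# daemons accelerate dramatically compared to Hive 1.x.

def build_top_n_query(table: str, metric: str, n: int = 10) -> str:
    """Return a HiveQL query counting the most frequent values of a column."""
    return (
        f"SELECT {metric}, COUNT(*) AS cnt "
        f"FROM {table} "
        f"GROUP BY {metric} "
        f"ORDER BY cnt DESC "
        f"LIMIT {n}"
    )

# Running it against a live cluster would look roughly like this
# (requires `pip install "pyhive[hive]"` and a reachable
# HiveServer2 Interactive instance; host/table are assumptions):
#
#   from pyhive import hive
#   conn = hive.connect(host="hdp-edge.example.com", port=10500)
#   cur = conn.cursor()
#   cur.execute(build_top_n_query("web_logs", "status_code"))
#   print(cur.fetchall())
```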


LLAP also adds value to other newer technologies coming into the HDP stack. Druid, a new time-series optimised data store, can leverage LLAP’s parallel processing capabilities to speed up query times. I’m especially excited to test out Druid, as it will come bundled with HDP 2.6 and thus be deployable via Ambari blueprints. It’s currently in beta, but will hopefully mature quickly.

HDF

Hortonworks DataFlow (HDF), powered by Apache NiFi, looked to be Hortonworks’ next big thing. Teradata, for example, has open-sourced its new “data lake management software platform”, Kylo, which leverages NiFi for pipeline orchestration. HDF still requires a fair amount of infrastructure to run, but as its little brother MiNiFi (a lightweight, JVM-based version of NiFi) matures, I think the whole edge-node processing paradigm will take off in a completely different way, especially once you can run NiFi on very resource-scarce systems.

But we’ll have to stay tuned.

HDP 2.6 and beyond

Funnily enough, the launch of the new major release of HDP and Ambari wasn’t hyped at DWS17 as much as I would have expected. Granted, there was a fair amount of buzz around its new features, but the focus definitely was elsewhere. That being said, it didn’t mean that the announcement wasn’t important. Many of the new, cool features are only available with HDP 2.6 and Ambari 2.5, so users will need to upgrade their existing systems to leverage LLAP and Druid, for example. I for one will definitely be doing some upgrading 🙂

Beyond the newest version of HDP is Hadoop 3.0. It could be released as early as Q4/2017, and will bring improvements to resource management as well as container support (yay!). This will make Hadoop itself more resource-aware and mean better performance. The usage of Docker has exploded since its initial release four years ago, and some of the newer Hortonworks apps, such as Cloudbreak, already take advantage of the technology. With the addition of container support to Hadoop, YARN could potentially control non-Hadoop services and applications deployed in containers.
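As a rough illustration of what that container support looks like, the upstream Hadoop 3 YARN documentation describes enabling the Docker runtime through NodeManager properties along these lines. Treat this as a sketch based on the upstream docs rather than an HDP-specific recipe; exact property names and values should be verified against the release you run.

```xml
<!-- yarn-site.xml sketch: enabling the Docker container runtime in
     Hadoop 3 YARN (property names per upstream docs; verify per release) -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default,docker</value>
</property>
<property>
  <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
  <value>host,bridge</value>
</property>
```

With a runtime like this enabled, an application can request that its containers be launched from a Docker image instead of as bare processes, which is what would let YARN schedule non-Hadoop workloads.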

In Summary

The Dataworks Summit is definitely something you need in your life, if Big Data is on your roadmap or you’re already knee-deep in it. I’m glad I went, since getting to talk to the developers and community members directly is invaluable.

Stay tuned for blog posts on specific technologies related to what was showcased and discussed at DWS17. There are several key parts of the new HDP release that can be discussed at greater length.

If you’re interested in hearing about Bilot’s Big Data offering and how the Hortonworks Data Platform can help your organisation, get in touch and let’s talk!