Apache Spark Ignites Big Data Fuel

The Silicon Review
15 February, 2016

Apache Spark is among the Hadoop ecosystem technologies acting as catalysts for broader adoption of big data infrastructure. Now Looker, a vendor of business intelligence software, has announced support for Spark and other Hadoop technologies. The goal? To speed up access to the data that fuels business decision making. Hadoop’s arrival on the scene 10 years ago may have started the big data revolution, but only recently has adoption of the technology begun spreading to a wider audience. Apache Spark is one of the catalysts for that growth.

Spark can be used as a replacement for MapReduce, a component of Hadoop implementations, and can run big data processing and analytics up to 100 times faster than MapReduce when working in memory, according to the Apache Software Foundation. Spark has gained momentum in today’s business environment, where real-time analytics is the goal and organizations don’t want to wait for data warehouses and analysts to deliver batch intelligence back to business users.

Here’s one case in point: Looker, a business intelligence platform used by Avant, Acorns, and Etsy, this week announced support for Presto and Spark SQL. The company also updated its support for Impala and Hive, other Hadoop ecosystem technologies that speed up analysis on Hadoop. Looker’s support for these additional technologies lets organizations “leave data in Hadoop and process it at speed and at scale,” said James Haight, principal analyst at Blue Hill Research.

“What we are talking about is getting some real business value out of this big data infrastructure that’s been talked about for so long,” Looker CEO Frank Bien told InformationWeek in an interview. “Hadoop is finally moving beyond the science experiments. Organizations can query large amounts of data.” That’s what Facebook does with Presto, Bien noted. And that’s what many other companies are doing with Spark.

Looker’s announcement this week may be the first in a series of vendor announcements ahead of Spark Summit East in New York City, February 16-18. Speakers include technologists and executives from Databricks, IBM, Capital One, Hortonworks, SAP, and eBay.

Haight, of Blue Hill Research, said many of his clients have been storing much more data in recent years, largely generated by social media and the Internet of Things (IoT). “These have outpaced our ability to store data, let alone analyze it,” he said. And now Hadoop has enabled businesses to store even more of it. “We are pouring data in, but we have no ability to process it. How do we process all this data? I’m dealing with a lot of companies at that juncture,” Haight said.