Databricks’ mission is to accelerate innovation for its customers by unifying Data Science, Engineering, and Business

The Silicon Review

Databricks’ founders started the Spark research project at UC Berkeley, which later became Apache Spark™. The firm has been working for the past ten years on cutting-edge systems to extract value from Big Data.

Databricks provides a Unified Analytics Platform, powered by Apache Spark, that lets data science teams collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that run from ETL and interactive exploration all the way to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA, and Battery Ventures, among others, has a global customer base that includes Salesforce, Viacom, Shell, and HP.

Deep Learning with Databricks

Deep learning is an ideal way to deliver predictive analytics on big data as data volume and complexity continue to grow, a trend that creates demand for more processing power and more advanced graphics processors (GPUs).

With deep learning, organizations are able to harness the power of unstructured data such as images, text, and voice to deliver transformative use cases that leverage techniques like AI, image interpretation, automatic translation, natural language processing, and more.

Common Use Cases

Image Classification: Recognize and categorize images for easy sorting and more accurate search.
Object Detection: Fast object detection to make autonomous cars and face recognition a reality.
Natural Language Processing: Accurately understand spoken words to power new tools such as speech-to-text and home automation.

Challenges of Deep Learning

While Big Data and AI offer enormous potential, extracting actionable insights from Big Data is far from simple. The large and rapidly growing body of information hidden in unstructured data (images, sound, text, etc.) requires both advanced technologies and interdisciplinary teams (data engineering, data science, and business) working in close collaboration.

Disjointed Technology: Reliance on separate frameworks and tools (TensorFlow, Keras, PyTorch, MXNet, Caffe, CNTK, Theano) that offer low-level APIs with steep learning curves.

Costly Infrastructure: The infrastructure needed to support deep learning can demand significant computational resources, and the cost grows as workloads scale.

Data Science Complexity: Training an accurate deep learning model can demand intensive manual work from data scientists, often including data labeling and parameter tuning.

Democratizing Deep Learning

The Databricks Unified Analytics Platform powered by Apache Spark™ allows you to build reliable, performant, and scalable deep learning pipelines that enable data scientists to build, train, and deploy deep learning applications with ease.

Unified Infrastructure:

Fully managed, serverless cloud infrastructure for isolation, cost control, and elasticity. It provides an interactive environment that makes it easy to work with major frameworks such as TensorFlow, Keras, PyTorch, MXNet, Caffe, CNTK, and Theano.

End-To-End Workflows:

A single platform to handle data preparation, exploration, model training, and large-scale prediction. High-level APIs and example applications let you easily leverage state-of-the-art models.

Performance Optimized:

A highly performant Databricks Runtime powered by Apache Spark and built to run on powerful GPU hardware at scale.

Interactive Data Science:

Collaborate with your team across multiple programming languages to explore data and train deep learning models against real-time data sets.

Making Machine Learning Simple with Databricks

As businesses contend with quickly growing volumes of data and an expanding variety of data types and formats, gaining deeper and more accurate insights becomes nearly impossible at this scale without machine assistance.

Powered by Apache Spark™, Databricks provides a unified analytics platform that accelerates innovation by unifying data science, engineering, and business. It offers an extensive library of machine learning algorithms, interactive notebooks to build and train models, and cluster management capabilities that provision highly tuned Spark clusters on demand.

Build

Accelerate feature extraction at scale.
Easily support a variety of data sources and formats.
Simplify ETL and implement machine learning in a single framework.

Tune

Speed up iterative model tuning with interactive notebooks.
Interactively query large-scale data sets in R, Python, Scala, or SQL.
Visualize results with rich dashboards.

Deploy

Provision distributed clusters on-demand.
Scale storage and compute resources independently.
Ensure uninterrupted operations with seamless updates.

High praise for Databricks

“Agility and flexibility were critical for us to successfully support our data science and engineering goals. Moving to Databricks Unified Analytics Platform to run 100% of our workflows has been a huge boost for our business and our customers.” 

- Matt Fryer, VP, Chief Data Science Officer, Hotels.com

“Databricks takes the pain of cluster management away so we can focus on data science and not DevOps.” 

- Brent Schneeman, Principal Data Scientist, HomeAway

“Working in Databricks is like getting a seat in first class. It's just the way flying (or more data science-ing) should be.” 

- Mary Clair Thompson, Data Scientist, Overstock.com

The leading man

Ali Ghodsi, Co-founder and CEO: Ali is the CEO and co-founder of Databricks, responsible for the growth and international expansion of the company. He previously served as VP of Engineering and Product Management before taking the role of CEO in January 2016. In addition to his work at Databricks, Ali serves as an adjunct professor at UC Berkeley and is on the board of UC Berkeley’s RISELab. Ali was one of the creators of the open source project Apache Spark, and ideas from his academic research in resource management, scheduling, and data caching have been applied to Apache Mesos and Apache Hadoop. Ali received his MBA from Mid-Sweden University in 2003 and his PhD in Distributed Computing from KTH Royal Institute of Technology in Sweden in 2006.

“We believe that Big Data is a huge opportunity that is still largely untapped, and we’re working to revolutionize what you can do with it.”