Taking the Data Lake to an Enterprise Level: Cirro Inc.

The Silicon Review

The term data lake has become the ‘newest’ market metaphor in the world of big data, and vendors are defining it in ways that support their specific go-to-market strategies. In general, a data lake is a large, massively parallel repository that can store unstructured, semi-structured and structured data and scale to the demands of big data processing. Hadoop is the dominant storage and processing platform for data lakes today. Fundamental to the success of a data lake is the idea that all of the data required for analytics resides in the lake itself. However, many organizations struggle to put all of their data into a single data lake, in much the same way they struggled to put all of their data into a single enterprise data warehouse. In many cases the data lake becomes yet another data silo.

Cirro Inc. has taken a different approach to the data lake challenge, recognizing that forward-thinking firms need their Hadoop data lakes to co-exist seamlessly with existing data silos and, potentially, with other data lakes. An Enterprise Data Lake consists of an ecosystem of data stored in disparate distributed systems that house a variety of data types and engines and span business units and purposes, yet is managed as a single entity and accessed through a common access method and point of entry. Cirro was founded in 2010 by Mark Theissen, Dave Salch, Cliff Currie and Stuart Frost to bridge the gap between accessing traditional database sources in data centers and accessing new, emerging sources such as NoSQL, Hadoop, SaaS and the cloud for analytic purposes. Glen Schrank serves as CEO of Cirro; a technology visionary and leader, he has spent more than 25 years in executive roles building and growing next-generation technology businesses. Mark Theissen serves as COO; a serial entrepreneur and an expert in big data, analytics and data warehousing, he has spent his career in the industry in a variety of key roles.

Cirro is headquartered in Aliso Viejo, CA, and is the industry’s first provider of enterprise data fabric technology, aimed at delivering on the promises of Big Data, Data Lakes and the Internet of Things. Cirro’s technology unifies the data ecosystem by providing access to, intelligent integration of, and management of all enterprise data regardless of type, engine or location. This makes contextual, real-time analytics attainable across disparate data sources while leveraging existing tools, infrastructure and skills. Cirro enables enterprises to optimize their current infrastructure investments while taking advantage of new technologies and trends.

Cirro Offers a Revolutionary Approach to the Challenge of Data Silos
Cirro focuses on the Financial Services and High Tech Manufacturing markets. Its target customers are large enterprises with a growing number of data silos that have implemented, or are implementing, a data lake on Hadoop. Cirro’s major clients include GE, Fortune 500 banks and Wild Tangent Games.

To support Enterprise Data Lakes, Cirro has developed a Universal Data Network (UDN) platform that lets users serve their own analytic, data exploration and reporting needs with the tools and skills they already have. Cirro’s UDN platform eliminates the complexity of integrating data from Hadoop, SaaS, NoSQL and traditional database sources. With Cirro, users see all of the data they are entitled to access as if it were in a single system or database. Performing analytics no longer requires pulling extract files, manually moving data around, creating spreadsheets, manually integrating data or relying on IT. Cirro delivers agility and faster time to value, so users can focus on the questions they want to ask of their data rather than spending most of their time bringing the data together.

Core components of the UDN are as follows:

Cirro Data Hub – Similar to a network router, the data hub provides a single point of entry for accessing all data sources within a defined data ecosystem. Through the data hub users can access, join and mash-up data of multiple types from multiple sources including Hadoop, SaaS sources and relational databases with common BI and data visualization tools.

Cirro Data Agents – Similar to a network switch, the data agents enable the seamless movement of data between processing engines and automatically reconcile data types as the data is moved. Data agents are strategically placed amongst data silos to form an interconnected data fabric.

Cirro Data Analyst – An Excel plug-in that provides an analyst workbench for data exploration, sampling and transformation.

The UDN solution is not a “rip-and-replace” strategy. Rather, it is an evolutionary approach that integrates Hadoop deployments or data lakes with an organization’s existing technologies and infrastructure. Unlike some other approaches, the UDN itself does not process queries; instead, it orchestrates query processing across the various processing engines within a defined data ecosystem.

For Financial Services organizations, Cirro provides an agile and flexible solution that fits into, and leverages, both existing and new infrastructure investments. Using Cirro’s UDN platform, financial services organizations can meet their risk aggregation, compliance and audit requirements. In addition, Cirro provides the capabilities needed to bring together disparate data sources for Single Customer View and Customer Monetization initiatives. Cirro helps any organization reach a whole new level of ad-hoc query, reporting and data exploration capabilities that were previously not possible.

Cirro provides a complete Enterprise Data Lake solution that enables the creation of virtual objects or semantics as required, but does not require them to be in place before data can be queried. This means users have immediate access to data once the UDN is installed. “Indeed, raw data can be combined or joined with virtual objects,” adds Glen. In addition to querying data, the UDN provides ETL Express capabilities to move or migrate data between processing engines or platforms. “For example, with Cirro you can create a table or file on Hadoop from multiple sources like Teradata, Oracle and SaaS applications with a single SQL statement. This can eliminate significant ETL development work and deliver faster results and ROI for projects,” states Mark.

Cirro has many differentiating factors, including patent-pending technology that eliminates the challenges of data silos and distributed processing. Most importantly, Cirro leverages the existing processing power of the data ecosystem.

“The biggest challenge we face is in getting our word out and ensuring customers and partners hear about Cirro’s ability to improve their results from Big Data and Risk & Compliance initiatives with our technology and approach,” explains Glen.

Ideas down the Road

Cirro’s products are enterprise-ready today and integrate with an enterprise’s existing data security infrastructure. Core to Cirro’s agility is an architecture that is easy to build on, meeting customers’ needs today as well as in the future. Going forward, Cirro will continue to build out the UDN according to a well-defined roadmap in the areas of semantics, security, federation, compatibility and administration.