
How a Data Factory Approach Can Increase the Value of Your Data

Val Tsitlik

VP, Technology Solutions, EPAM US

Many organizations are grappling with how to extract value from vast, ever-increasing amounts of data. “Gartner data shows 87% of organizations have low business intelligence and analytics maturity.”* Organizations face constraints when attempting to maximize value from data. At the most basic level, deriving insights and value from data requires the ability to easily ingest and experiment with information in a governed and secure environment. Turning unused data into data products using a data factory approach can enable companies to leverage data assets effectively and efficiently to create business value.

DATA PRODUCTS

When a company transforms the data it owns or collects into information that provides a specific business value, the outcome can be referred to as a data product. Data products are published through multiple channels, including data marts, dashboards, analytics models or APIs.

Unlike data sets, data products are highly curated, governed, discoverable and shareable. They are created for a specific business purpose and are managed like software, meaning they can be versioned, tested, enhanced and deprecated.
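
To make "managed like software" concrete, here is a minimal sketch of a data product descriptor that can be versioned, enhanced and deprecated. The class, its field names and the sample product are hypothetical and chosen only for illustration; a real data factory would likely keep this metadata in its catalog or in source control.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product; field names are illustrative."""
    name: str
    owner: str
    purpose: str                  # the specific business purpose it serves
    version: tuple = (1, 0, 0)    # (major, minor, patch)
    status: Status = Status.ACTIVE

    def enhance(self) -> "DataProduct":
        """Return a new release with the minor version bumped."""
        if self.status is Status.DEPRECATED:
            raise ValueError(f"{self.name} is deprecated and cannot be enhanced")
        major, minor, _ = self.version
        return replace(self, version=(major, minor + 1, 0))

    def deprecate(self) -> "DataProduct":
        """Return a copy marked as deprecated."""
        return replace(self, status=Status.DEPRECATED)


# Example lifecycle: first release, one enhancement, then retirement.
churn = DataProduct(
    name="customer_churn_scores",
    owner="retention-analytics",
    purpose="Rank customers by churn risk for the retention team",
)
churn_v11 = churn.enhance()
retired = churn_v11.deprecate()
```

Because the descriptor is immutable, each release is a separate object, so earlier versions stay available for comparison or rollback.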

Data products are an efficient way for organizations to focus on creating business value instead of just considering tools and technology. With the overwhelming amount of data available, many companies struggle to determine what data should be loaded, considered and analyzed. A data product approach addresses this challenge: businesses load only the raw data required to produce the product, which avoids a data swamp, and analyze only the information essential to making informed decisions and tracking the associated business value. Data products also make it easier for companies to respond quickly to business opportunities and to become more agile in leveraging new technologies and approaches, such as machine learning, artificial intelligence and cognitive computing.

DATA FACTORY

An enterprise data factory is a technology framework used to create data products efficiently, consistently and at scale. Think of a data factory as a logical program that defines the physical data platforms and shared enterprise services needed to govern the data products. Four essential components make up an effective data factory:

  • Technology Platforms – the set of technologies most suitable for the organization’s use cases
  • Data Product Methodology – methodology to efficiently and effectively build and support data products
  • Data Management and Orchestration – approaches and tools to create data assets and publish them to consumers
  • Adoption and Promotion – a defined approach to promote broad internal adoption of data products to extract maximum value

Let’s explore each of these components.

TECHNOLOGY PLATFORMS

Technology platforms should be carefully crafted to support the organization’s specific needs and goals, based on real use cases and requirements. You don’t want to over-engineer a platform implementation. Data management platforms have evolved from single-server data warehouses, to massively parallel processing (MPP) platforms, to Hadoop-based data lakes, and finally to cloud-based enterprise data factories. Currently, these cloud-based data factories are virtually unlimited when it comes to computation, storage and geographical distribution. The public cloud has also introduced even more capabilities, such as elastic computation and low-maintenance storage.

DATA PRODUCT METHODOLOGY

A major difference between a data product and a data asset is that data products are managed in a similar way to other digital artifacts, like code and content. The lifecycle of a data product includes five major phases that businesses should consider when implementing an effective data factory:

  1. Research and Design: The desired business goal is defined and the data product is designed to deliver business value. A working prototype is created, and the data requirements are defined.
  2. Initial Data Storage & Collection: Data sources are identified and data ingestion into the raw storage area is automated to facilitate experimentation and profiling. The initial ingestion pipelines are built, and data is added to the catalog.
  3. Data Delivery: The data delivery model for the data product is defined. This includes all the technology usage patterns needed to publish the data or make it available for consumption by downstream consumers.
  4. Production Pipeline: The steps are defined and automated to ingest, transform and publish the data products, including scheduling, operational monitoring and flow management.
  5. Continuous Value Assessment: All digital artifacts eventually lose value and need to be deprecated. To maintain an accurate picture of data value, the usage of, and need for, data products must be continuously monitored and evaluated. The data product can evolve and be enhanced until it no longer provides business value.
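
The production pipeline phase above can be sketched as three small steps run in order, with row counts returned for operational monitoring. This is a minimal illustration under stated assumptions: the function names, the sample rows and the in-memory sink are all hypothetical, and a real pipeline would read from raw storage and be run by a scheduler or orchestration tool.

```python
from typing import Dict, List


def ingest() -> List[Dict[str, str]]:
    """Stand-in for reading from the raw storage area (hard-coded sample rows)."""
    return [
        {"customer": "a", "amount": "10.5"},
        {"customer": "b", "amount": "n/a"},   # malformed row
        {"customer": "a", "amount": "4.5"},
    ]


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Drop rows whose amount is not numeric, then total the amounts per customer."""
    totals: Dict[str, float] = {}
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that fail this basic quality check
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + amount
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


def publish(rows: List[Dict[str, object]], sink: List[Dict[str, object]]) -> None:
    """Stand-in for publishing to a data mart, dashboard or API."""
    sink.extend(rows)


def run_pipeline(sink: List[Dict[str, object]]) -> Dict[str, int]:
    """Run ingest, transform and publish in order; return counts for monitoring."""
    raw = ingest()
    curated = transform(raw)
    publish(curated, sink)
    return {"ingested": len(raw), "published": len(curated)}


# Example run against an in-memory sink.
sink: List[Dict[str, object]] = []
metrics = run_pipeline(sink)
```

Comparing the ingested and published counts over time is one simple way to notice when a source starts delivering malformed data.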

DATA MANAGEMENT & ORCHESTRATION

Data management involves data governance, data quality and metadata management, which together improve the quality, usability and maintenance of data products. Data management should be viewed as facilitating access to quality data rather than as an obstacle to gaining that access. One of the key differentiators of the enterprise data factory is that ALL data on the platform is registered in a catalog, which ensures that data products are searchable.
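
As a rough illustration of why registering everything in a catalog makes data products searchable, here is a minimal in-memory sketch. The `DataCatalog` class, its methods and the sample entries are hypothetical; a real catalog would add ownership, lineage and access-control metadata.

```python
from typing import Dict, Iterable, List


class DataCatalog:
    """Hypothetical in-memory catalog; every data set is registered before use."""

    def __init__(self) -> None:
        self._entries: Dict[str, dict] = {}

    def register(self, name: str, description: str, tags: Iterable[str]) -> None:
        """Add an entry, refusing duplicates so that names stay unambiguous."""
        if name in self._entries:
            raise ValueError(f"{name} is already registered")
        self._entries[name] = {
            "description": description,
            "tags": {tag.lower() for tag in tags},
        }

    def search(self, term: str) -> List[str]:
        """Return names whose name or description contains the term, or whose tags include it."""
        term = term.lower()
        return sorted(
            name
            for name, entry in self._entries.items()
            if term in name.lower()
            or term in entry["description"].lower()
            or term in entry["tags"]
        )


catalog = DataCatalog()
catalog.register("customer_churn_scores", "Churn risk per customer", ["retention", "ml"])
catalog.register("raw_orders", "Raw order events from the web shop", ["sales", "raw"])

churn_hits = catalog.search("churn")
sales_hits = catalog.search("sales")
```

Here `churn_hits` contains only `customer_churn_scores` and `sales_hits` contains only `raw_orders`, because the search matches on names, descriptions and tags.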

ADOPTION & PROMOTION

The success of a complex data platform program largely depends on the quality of the user experience and how widely the platform is used. An adoption and promotion program is needed to ensure data products are used effectively and that data platform capabilities can evolve based on usage monitoring. An enterprise data factory should be positioned to internal customers as a platform, similar to Microsoft Office applications, by providing education about the platform’s capabilities and how the platform can be used to incorporate data-driven insights into daily work. The platform does more than inform individual teams; it can also be used to discuss opportunities, troubleshoot problems and increase knowledge across the enterprise.

IN SUMMARY

There is enormous potential payback for companies that invest in the right procedures and platforms to create value from data, rather than leaving these valuable assets on the table to depreciate. The value and capabilities of an enterprise data factory will grow exponentially as more data products are built, more individuals use these products, and more data becomes available to build sophisticated data products.

You can find the original article here.

*Gartner Press Release, Gartner Data Shows 87 Percent of Organizations Have Low BI and Analytics Maturity, December 6, 2018.

 
