Why Your Financial Services Company Should Consider Implementing a Data Factory

Your financial analysis is only as good as the data used to create it. That’s why the need to explore and derive business insight from data analysis is greater than ever.

Data exploration is a multi-step process that consists of data acquisition, data management and data analysis. During acquisition and management, data is collected, aggregated, governed, cleansed and packaged so that it is properly normalized for analysis. Data analysis is where financial companies create intellectual property and derive most of the business value from data. Interestingly, the steps leading up to data analysis are considered fairly generic and commoditized, yet their share of a company’s data technology budget usually greatly exceeds the amount spent on data analysis. Why? Because most financial companies have evolved their data acquisition and data management capabilities organically and in a decentralized manner, relying on what have now become legacy technologies that are inefficient and difficult to operate and optimize.

Today, the advancement of data-related cloud technologies offers an opportunity to build a centralized facility that fulfills and manages all company-wide needs for source data. Given the non-proprietary nature of source data, the build-out and operation of this facility can be cost-effective when many of its functions are outsourced to an experienced product development services company. We call this facility a “data factory” to underscore that, in addition to being a central data repository, it is also a platform for managing data throughout its lifecycle.

Current State

As previously mentioned, for most financial companies, data has historically been siloed by function and resides within a vendor or proprietary application. This set-up presents several challenges: the resources required to properly manage data are duplicated, the duplication leads to data quality inconsistencies, and data procurement costs may be higher than necessary.

Another challenge that many financial companies face is the speed with which new data sources are acquired and integrated. According to Greenwich Associates, 32% of asset managers and hedge funds say they lack the time needed to evaluate data, 28% have difficulty understanding or working with data sets that are not customized for their specific use, and 32% say internal procurement processes are too slow or cumbersome.

The availability of new data sources that can be used for alpha-generating insight is growing rapidly. Examples include credit card transaction data, consumer sentiment data, detailed company operating data, internet search and traffic statistics, and information originating from unforeseen global events. This is a positive development; however, source data containing alpha quickly becomes commoditized, leading to alpha destruction. That’s why the timeliness of new data source acquisition is paramount.

Target State

The purpose of a data factory is to produce a data product from a raw data input through a repeatable, reliable and scalable business process. Unlike a raw dataset, a data product is normalized, cleansed, governed, mastered and published to fit any number of specific use cases. To accomplish this effectively, a data factory must cover the following functional and non-functional areas:


Functional areas:

  • Data Normalization – process of parsing raw data and mapping it to a common, uniform data model
  • Data Management – process and methodology of providing data security, entitlement, provenance and lineage tracing
  • Data Quality – process and methodology of data cleansing and curation
  • Data Bi-temporality – process and technology of maintaining data and metadata versioning due to revisions and corrections
  • Reference Data – component to identify data from a source, as well as reference the same data from multiple sources
  • Data Mastering – process of constructing a data product by encapsulating specific data from specific data sources
  • Data Product – data encapsulation, discovery and process of multi-channel publishing via downloadable files, visualizations and APIs
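To make two of the functional areas above more concrete, here is a minimal Python sketch of normalization and bi-temporality. It is an illustration under stated assumptions, not a description of any particular data factory: the vendor field names, the `normalize` helper and the `BiTemporalStore` class are all hypothetical. The idea is that a correction never overwrites an earlier value; each version is stored with the time it was recorded, so the factory can answer both “what is the value?” and “what did we believe the value was on a given date?”

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

# Hypothetical mapping from one vendor's field names to a common data model.
VENDOR_A_FIELD_MAP = {"tkr": "ticker", "rev_usd": "revenue", "per_end": "period_end"}


def normalize(raw: dict, field_map: dict) -> dict:
    """Rename vendor-specific fields to the common model, dropping unmapped ones."""
    return {common: raw[vendor] for vendor, common in field_map.items() if vendor in raw}


@dataclass(frozen=True)
class Version:
    key: tuple             # (ticker, period_end): the fact and the period it is valid for
    value: float           # the reported figure
    recorded_at: datetime  # when this value became known to the factory


class BiTemporalStore:
    """Append-only store: revisions add new versions instead of overwriting."""

    def __init__(self) -> None:
        self._versions: list[Version] = []

    def record(self, row: dict, recorded_at: datetime) -> None:
        key = (row["ticker"], row["period_end"])
        self._versions.append(Version(key, row["revenue"], recorded_at))

    def as_of(self, ticker: str, period_end: date,
              knowledge_time: datetime) -> Optional[float]:
        """Return the value as it was known at knowledge_time, or None if unknown."""
        candidates = [
            v for v in self._versions
            if v.key == (ticker, period_end) and v.recorded_at <= knowledge_time
        ]
        if not candidates:
            return None
        # The latest version recorded on or before knowledge_time wins.
        return max(candidates, key=lambda v: v.recorded_at).value


store = BiTemporalStore()
q1 = date(2023, 3, 31)

# Original release, followed by a later correction for the same period.
store.record(normalize({"tkr": "ACME", "rev_usd": 100.0, "per_end": q1},
                       VENDOR_A_FIELD_MAP), datetime(2023, 4, 20))
store.record(normalize({"tkr": "ACME", "rev_usd": 97.5, "per_end": q1},
                       VENDOR_A_FIELD_MAP), datetime(2023, 6, 1))

before_fix = store.as_of("ACME", q1, datetime(2023, 5, 1))
after_fix = store.as_of("ACME", q1, datetime(2023, 7, 1))
print(before_fix, after_fix)
```

Keeping both versions matters for quantitative work: a backtest that reads the corrected figure for a date on which only the original was available would be using information the firm did not have at the time.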


Non-functional areas:

  • Data Quality – SLA based on data quality KPIs covering the different types of anomalies detected and corrected
  • New Dataset Acquisition and Availability – SLA based on the time taken to integrate a new dataset and make it available via the data product
  • New Dataset Release – SLA based on the time taken to process a new release of an existing dataset
  • Performance – SLA based on the performance of platform functions, such as search queries, page rendering, etc.
  • Support – SLA based on operational KPIs, such as software upgrades, helpdesk response, etc.
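The service levels above only become enforceable once they are expressed as measurable KPIs. The following Python sketch shows one way this might look for the data quality and new dataset release SLAs. Everything in it is an assumption made for illustration: the anomaly kinds, the 90% correction threshold, the two-hour release window and the function names are hypothetical, not figures from any real agreement.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Anomaly:
    kind: str        # illustrative categories, e.g. "missing_value", "outlier"
    corrected: bool  # whether the anomaly was fixed before publication


def correction_rate_by_kind(anomalies: list) -> dict:
    """Share of detected anomalies that were corrected, per anomaly kind."""
    detected = Counter(a.kind for a in anomalies)
    corrected = Counter(a.kind for a in anomalies if a.corrected)
    return {kind: corrected[kind] / count for kind, count in detected.items()}


def breaches(rates: dict, min_rate: float) -> list:
    """Anomaly kinds whose correction rate falls below the agreed minimum."""
    return sorted(kind for kind, rate in rates.items() if rate < min_rate)


def within_release_sla(received_at: datetime, published_at: datetime,
                       max_delay: timedelta) -> bool:
    """True if a new data release was published within the agreed window."""
    return published_at - received_at <= max_delay


anomalies = [
    Anomaly("missing_value", True),
    Anomaly("missing_value", True),
    Anomaly("outlier", True),
    Anomaly("outlier", False),
    Anomaly("duplicate", True),
]

rates = correction_rate_by_kind(anomalies)
failed = breaches(rates, min_rate=0.9)          # assumed 90% threshold
on_time = within_release_sla(
    datetime(2024, 1, 2, 6, 0),
    datetime(2024, 1, 2, 7, 30),
    max_delay=timedelta(hours=2),               # assumed two-hour window
)
print(rates, failed, on_time)
```

Reporting the rate per anomaly kind, rather than as a single aggregate, keeps a frequent and easily corrected problem from masking a rarer one that is going unfixed.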

From working with our clients, we’ve found that a “data-as-a-service” strategy blueprint offers the best set of software architecture, project governance and solution-operating principles required to successfully deliver a data factory. An internal team likely does not have this level of expertise. Working with a product development partner that has robust experience in financial services, big data, DevOps, cloud, open source technologies and full stack development can help companies achieve faster time-to-market with relatively little oversight.

In Summary

We are starting to see an increasing number of large- to medium-sized asset management firms that rely on quantitative analysis transition their data function to the target state model outlined above. Given the potential return on investment and cost efficiencies that a data factory can realize, this is not surprising. With the value and capabilities that a data factory provides, we can expect to see more users create additional data products that are re-used across the enterprise.

