
An Open Source Approach to Time-Series Database Software

Ilya Gorelik

VP, Real-Time Computing Lab, EPAM

In the financial services industry, analytics almost always depends on time-series data. Time-series data (typically market data, orders, executions and transactions of all kinds) is foundational to key business functions within financial services organizations. These functions fall into two categories: those that use historical data and those that use real-time data.

Uses of historical data include:

  • Analysis of order execution quality (TCA)
  • Back-testing of alpha or execution algorithms
  • Risk measurement via historical simulation
  • Trade surveillance

Uses of real-time data include:

  • Front-office risk management
  • Running algorithms (execution, alpha)
  • Trade surveillance
  • Complex event processing (CEP)

Of course, real-time data becomes historical almost immediately, so it should be no surprise that several use cases require the simultaneous use of historical and real-time data. Commonly used statistical functions such as correlations, moving averages and VWAP (volume-weighted average price) are examples.

The need for both historical and real-time data implies a “moving window” of data points: a dynamic set of data delineated by time (anything from microseconds upwards) or events (new trades, price ticks). This means organizations must incorporate the newest data item and omit the oldest. Maintaining this moving window of data is central to many financial activities, which is why we see commonality in both categories of time-series data. Rates of data ingestion are often measured in hundreds of thousands of messages per second. Use cases that replay long spans of history much faster than it was recorded (e.g. back-testing, machine learning applications) therefore require data to be consumed at rates of millions of messages per second.
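To make the moving window concrete, here is a minimal Python sketch of a time-delineated window that maintains a VWAP. It is an illustration only, not TimeBase code: the `Trade` fields and the `RollingVwap` class are hypothetical names, and the sketch assumes trades arrive in timestamp order with positive sizes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    ts: float     # event time in seconds (hypothetical field name)
    price: float
    size: float


class RollingVwap:
    """VWAP over the trades whose timestamps fall in the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._trades = deque()
        self._notional = 0.0  # running sum of price * size inside the window
        self._volume = 0.0    # running sum of size inside the window

    def update(self, trade: Trade) -> float:
        # Incorporate the newest data item.
        self._trades.append(trade)
        self._notional += trade.price * trade.size
        self._volume += trade.size

        # Omit the items that have aged out of the window.
        cutoff = trade.ts - self.window
        while self._trades and self._trades[0].ts <= cutoff:
            old = self._trades.popleft()
            self._notional -= old.price * old.size
            self._volume -= old.size

        return self._notional / self._volume


vwap = RollingVwap(window=10.0)
for t in (Trade(0.0, 100.0, 10.0), Trade(5.0, 102.0, 30.0), Trade(12.0, 104.0, 10.0)):
    print(t.ts, vwap.update(t))
```

Keeping running sums means each update costs only the work of adding one trade and evicting the expired ones, rather than rescanning the whole window. An event-delineated window (for example, the last N trades) would evict on the length of the deque instead of on the timestamp.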

Power to the People & to the Cloud

It feels like a long time ago when consumers and users of financial data at financial services firms had to take what they were given from systems designed and built by engineers with often nominal input from consumers. Hard copy reports and screens with at-best limited flexibility were the order of the day. Back then, data processing was centralized in the software in which the data resided–often the classic relational databases (like Sybase and Oracle), which were the foundation of many banking operational systems.

Contrast this with the situation today, when many end users in financial firms use Python to perform analytics and non-technical users expect highly flexible user interfaces to perform analytics on real-time data. This means that across the user base, there will be wide and varying demands for time-series data sets that differ in the amount of data, content, granularity, update frequency and analytics.

Cloud deployments create another interesting system architecture consideration–whether to perform analytics on the server hosting the data or to deliver data to the user or consumer “as is.” The second option delegates processing to the consumer and reduces the role of the time-series database software to data ingestion and streaming only, maybe with some simple data filtering and grouping. This bare-bones approach to time-series data processing is central to cloud-based architectures that support maximum flexibility for the analytical applications needed by today’s users.

However, it also creates challenges. Streaming data (even small-sized market data messages) measured in millions of messages per second is demanding of bandwidth. Fast data compression and decompression algorithms are essential when data is distributed over the internet and may even be useful for intra-cloud deployments. Consuming applications must also implement protocols that handle data delivered at very high frequencies, so they do not “choke” on data or compromise analytical integrity.
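As a rough illustration of the bandwidth point, the Python sketch below packs a batch of ticks into a fixed-width binary layout and compresses the batch as a single unit with the standard library's `zlib` at its fastest level. The wire format, the field order and the function names are assumptions made for this example; they are not the TimeBase protocol, and a production system would likely use a codec tuned for market data.

```python
import struct
import zlib

# Hypothetical fixed-width wire format (little-endian, no padding):
# timestamp in ns (int64), instrument id (uint32), price (float64), size (uint32).
TICK = struct.Struct("<qIdI")


def pack_batch(ticks):
    """Serialize a batch of ticks and compress it as one unit."""
    raw = b"".join(TICK.pack(*tick) for tick in ticks)
    return zlib.compress(raw, 1)  # level 1 favors speed over compression ratio


def unpack_batch(blob):
    """Decompress a batch and decode it back into a list of tick tuples."""
    raw = zlib.decompress(blob)
    return [TICK.unpack_from(raw, offset) for offset in range(0, len(raw), TICK.size)]


# Synthetic ticks for one instrument, one microsecond apart.
ticks = [
    (1_600_000_000_000_000_000 + i * 1_000, 42, 100.0 + (i % 5) * 0.01, 100)
    for i in range(1_000)
]
blob = pack_batch(ticks)
print(len(ticks) * TICK.size, "bytes raw ->", len(blob), "bytes compressed")
```

Compressing whole batches rather than single messages gives the compressor repeated structure to exploit, at the cost of a little added latency while a batch fills. The ratio achieved depends entirely on the data, so the synthetic ticks here say nothing about real feeds.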

The Intersection of Cloud, Analytics and Open Source

The combination of these architecture challenges, the increase in cloud use and the variety of open source tools available for analytics prompted us to contribute our proprietary time-series database and messaging middleware software to the open source community via FINOS as TimeBase Community Edition (TimeBase CE). Under a commercial license, TimeBase is used by the quantitative and algorithmic trading communities as it provides both real-time and streaming data through a single high-performance streaming API (Java, .NET, C++, Python).

Deployed in the cloud, TimeBase CE implements the decentralized analytical architecture described above. With its rich APIs and other open source components, financial services firms can build advanced, highly customized time-series analytical applications rapidly and with full control over the technology stack.

Learn more about TimeBase CE at one of our upcoming speaker sessions that will break down this tool. Our team will be at the FINOS Open Source Strategy Forum in London on October 4-5 and New York on November 9-10, 2021 discussing TimeBase CE.
