
Helping Marathon Oil Create a Next-Generation Cloud Native Data Platform

Based in Houston, Texas, Marathon Oil Corporation (NYSE: MRO) is an independent exploration and production (E&P) company focused on four of the most competitive resource plays in the U.S., complemented by a world-class integrated gas business in Equatorial Guinea.

In 2020, Marathon Oil set out to build a next-generation data platform with the overall goal of making data more accessible and enabling more data-driven business decisions.

Key Challenges

To advance its data and analytics vision with a next-generation data platform, Marathon Oil first needed a solid foundation for data ingestion, transformation and consumption. The platform had to ingest data varying in volume, variety and velocity from many sources, provide advanced data transformation and refinement capabilities, and integrate with various business intelligence and analytics tools. To support rapid, iterative development with low operational overhead, Marathon Oil wanted the platform to use serverless offerings in the cloud, infrastructure as code and automated deployments across environments. With the new data foundation in place, more than 20 data pipelines and data sets needed to be migrated from an on-premises Hadoop platform to the new cloud data platform.

The Results

EPAM and Amazon Web Services (AWS) helped Marathon Oil develop a new data platform that delivers the required performance, scalability and reliability. With a repeatable process for adding new data ingestion pipelines in place, the company continues to bring in more data sources. The cloud data platform is transforming Marathon Oil’s business by:

  • Centralizing data in a single platform and making it more accessible to business users
  • Providing the flexibility and scalability to support existing and future use cases
  • Removing recurring software licenses and reducing operational overhead from legacy on-premises platforms
  • Providing a repeatable process for building and deploying new data pipelines

Technologies Used

  • Amazon CloudWatch
  • Amazon EMR
  • GitHub
  • AWS Glue
  • AWS Lambda
  • Python
  • Amazon Simple Storage Service (Amazon S3)
  • Apache Spark
  • AWS Step Functions
  • Snowflake
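
The case study does not describe how these services were wired together. As a purely illustrative sketch of what a repeatable, templated pipeline definition could look like, the Python below builds an AWS Step Functions state machine definition (Amazon States Language) that invokes an AWS Lambda function for ingestion and then runs an AWS Glue job for transformation. The function name build_pipeline_definition, the ingest-/transform- naming scheme, the --source_name job argument and the example source name are all assumptions made for this example, not details of Marathon Oil's platform.

```python
import json


def build_pipeline_definition(source_name: str) -> dict:
    """Build an Amazon States Language definition for one ingestion pipeline.

    Every resource name here is hypothetical: the Lambda function
    "ingest-<source>" and the Glue job "transform-<source>" are assumed
    to exist and are not taken from the case study.
    """
    return {
        "Comment": f"Ingest and transform data from {source_name}",
        "StartAt": "Ingest",
        "States": {
            # Step 1: a Lambda function pulls data from the source system.
            "Ingest": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {
                    "FunctionName": f"ingest-{source_name}",
                    "Payload": {"source": source_name},
                },
                # Retry failed ingestion attempts with exponential backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 30,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "Transform",
            },
            # Step 2: a Glue job refines the data; the ".sync" suffix makes
            # Step Functions wait for the job to finish.
            "Transform": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": f"transform-{source_name}",
                    "Arguments": {"--source_name": source_name},
                },
                "End": True,
            },
        },
    }


# Example: render the definition for a hypothetical data source.
definition = build_pipeline_definition("example_source")
print(json.dumps(definition, indent=2))
```

Under these assumptions, adding a new data source means calling the same function with a different name and deploying the resulting JSON through whatever infrastructure-as-code tooling is in use, which is one way a pipeline process can stay repeatable.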

With EPAM’s help, we were able to build a data platform that is truly next gen. EPAM provided the necessary expertise and personnel to get the job done, with the balance of cost, time and quality we were looking for. The platform is enabling us to make data more accessible and extract additional value from it. It is an important piece of the puzzle to becoming a more data-driven organization.

