
How Machine Learning & Differential Privacy Can Be Used to Anonymize Production Data

Boris Khazin

Global Head of Digital Risk Management, Governance and Compliance, EPAM

Pavel Henrykhsen

Software Engineering Team Leader, EPAM

Ira Livshits

Senior Data Scientist, EPAM

Pavel Bobyrev

Data Scientist, EPAM
White Paper

Governments, policymakers and international organizations are under considerable pressure to protect personally identifiable information (PII). As regulation increases, it’s even more important to anonymize internal data used by development teams to protect your organization from PII leaks. 

To date, development teams have often tested against production data because it is considered the most realistic and can catch issues before market release. In this white paper, our experts discuss a safer alternative for development teams to consider: a machine learning model trained to generate an anonymized data set from client production data. This approach retains the utility of the data without the inherent risk.

This paper first breaks down anonymization and the challenges in generating a data set that’s as useful as the original production data. From there, we use publicly available data sets as an example and compare the resulting anonymized data set against the original data to highlight our methodology. Download the paper to learn more about EPAM’s effective approach. 

