How Machine Learning & Differential Privacy Can Be Used to Anonymize Production Data
Governments, policymakers and international organizations are under considerable pressure to protect personally identifiable information (PII). As regulation increases, anonymizing the internal data your development teams use becomes even more important to protect your organization from PII leaks.
To date, development teams have often tested against production data, believing it to be the most realistic way to catch issues before market release. In this white paper, our experts discuss a safer alternative for development teams to consider: a machine learning model trained on client production data that generates an anonymized data set. This approach retains the utility of the data without the inherent risk.
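To give a flavor of the differential privacy techniques the paper draws on, the sketch below shows the classic Laplace mechanism applied to a bounded mean: values are clamped to known bounds, and noise calibrated to the query's sensitivity and a privacy budget ε is added before the statistic is released. This is a minimal illustration of the general principle, not EPAM's actual pipeline; the function names (`laplace_noise`, `dp_mean`) and parameters are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale):
    # Sample from Laplace(0, scale) via inverse-CDF transform.
    # NOTE: illustrative only; production DP code should use a
    # vetted library with secure noise generation.
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def dp_mean(values, lower, upper, epsilon):
    """Release the mean of `values` with epsilon-differential privacy.

    Clamping each value to [lower, upper] bounds the sensitivity of
    the mean: changing one record moves it by at most
    (upper - lower) / n, so Laplace noise with scale
    sensitivity / epsilon masks any single individual's contribution.
    """
    clamped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clamped) / len(clamped)
    sensitivity = (upper - lower) / len(clamped)
    return true_mean + laplace_noise(sensitivity / epsilon)
```

With a reasonable ε and enough records, the noisy statistic stays close to the true value while still protecting each individual record, which is the same utility-versus-privacy trade-off the paper's anonymized data sets must balance.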
This paper first breaks down anonymization and the challenges in generating a data set that's as useful as the original production data. We then apply the approach to publicly available data sets and compare the resulting anonymized data against the original to illustrate our methodology. Download the paper to learn more about EPAM's effective approach.