This validation requires both data integrity and traceability, two of the major drivers for digital transformation in the lab. “Many pharma companies are plagued by dark data: scientists within the organization or even within the same department may be unaware of similar or preexisting studies,” says EPAM’s Chief Scientist Chris Waller.
One cause of dark data is the inability to trace the development of initiatives through disjointed emails spread across several inboxes. Julie Gorenstein, Director of Data & Analytics at Takeda, elaborates on these challenges. “Generally speaking, everybody has exactly the same problem: There's a lot of data; how do you organize it to make it queryable and visualizable?”
Gorenstein says, “There's a lot of complexity on the process of alignment within the scientific community.” A simple example, she adds, can be found in naming procedures. “There are ontologies that exist to simplify complexity, but how do we trace the data from a raw value through an aggregated computed value? It might be one-to-one for a sample, or it could be one-to-two, or it could be to several samples, or it could be to several samples across many populations. So, how do you capture all this information? How do you trace through? That is a huge challenge.”
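To make the tracing problem Gorenstein describes concrete, here is a minimal Python sketch of one way to keep lineage from raw values through aggregated ones. It is an illustration only, not the approach used by Takeda or EPAM: the `RawValue`, `ComputedValue`, `aggregate`, and `trace` names, the use of a mean as the aggregation, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Tuple, Union


@dataclass(frozen=True)
class RawValue:
    """A single measurement (hypothetical schema)."""
    sample_id: str
    population: str
    value: float


@dataclass(frozen=True)
class ComputedValue:
    """An aggregated value that remembers the inputs it was computed from."""
    name: str
    value: float
    sources: Tuple[Union["RawValue", "ComputedValue"], ...]


def aggregate(name, inputs):
    """Average the inputs and record them as the sources of the result."""
    inputs = tuple(inputs)
    return ComputedValue(name, mean([x.value for x in inputs]), inputs)


def trace(node):
    """Walk back through the lineage and return every contributing raw value."""
    if isinstance(node, RawValue):
        return [node]
    raws = []
    for src in node.sources:
        raws.extend(trace(src))
    return raws


# Made-up data: two samples in population A, one sample in population B.
raws = [
    RawValue("S1", "A", 1.0),
    RawValue("S2", "A", 3.0),
    RawValue("S3", "B", 5.0),
]
pop_a = aggregate("mean_A", raws[:2])               # one-to-two
pop_b = aggregate("mean_B", raws[2:])               # one-to-one
overall = aggregate("mean_of_means", [pop_a, pop_b])  # across populations

traced_ids = [r.sample_id for r in trace(overall)]
print(overall.value, traced_ids)
```

Because every computed value carries references to its inputs, the same `trace` call works whether the relationship is one-to-one, one-to-two, or spans several samples across populations. A production system would also need persistent identifiers and ontology terms, which this sketch leaves out.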
To address some of these traceability challenges, EPAM and the vaccine developer team implemented AWS cloud-based tools that apply the FAIR (findable, accessible, interoperable, reusable) guiding principles for scientific data management.