AI-Ready Data: FAIR Matters for Life Sciences Companies

If you don’t understand your data, your artificial intelligence (AI) is not going to understand it either. This shouldn’t surprise anyone, but it’s worth repeating. When you train AI, you want the models to pick out the things in the data that matter – things that will be useful for you in your work. But much like telling a child to do something they’ve never done before (or seen you or someone else do), you need to be explicit in your instructions. For your data to carry the semantics needed to train your models appropriately, it needs to be FAIR.

For those unfamiliar with the FAIR principles – the idea that data be findable, accessible, interoperable and reusable – we encourage you to learn a little more. Those looking for deeper insight might enjoy the following thoughts from EPAM’s resident FAIR expert, Ted Slater, from the webinar “AI-Ready Data and Why FAIR Data Matters in Life Science Companies” from the Pistoia Alliance.

There’s often a perception in conversations with leadership that FAIR data concepts have been around for a long time – that the concept is no longer novel and the conversation should focus on something new. How would you respond to this?

TS: Someone recently said to me that “FAIR has failed,” and it was surprising to hear. I think the reason people say this is that the FAIR guiding principles – all 15 of them – are just guidelines, right? They don’t tell you exactly what to do, and they don’t take the actions for you.

It’s sort of like if your mom made a list of things for you to do to have a good life. If you don’t do those things – and suffer the consequences – did your mom fail? No. You failed. You failed to act on the list.

So I think that’s the issue here with FAIR. People have historically failed to do the things that these guidelines recommend. The FAIR principles haven’t failed; we have. I think it’s important to remind leadership that, even though the concept was introduced over a decade ago, the FAIR data concepts should remain a topic of conversation until we’ve acted on them.

What does the term “Fake FAIR” mean?

TS: Fake FAIR is just the idea that people think they understand what FAIR is, and they take certain actions along those lines that won’t get them to where they think they’re going. A great example – and my favorite one to use when I’m explaining this – is something along the lines of, “Our platform supports free-text search and can index strings,” or “All our data is in the same data lake, so obviously it’s findable.”

In that context, there’s no mention of identifiers, global uniqueness of identifiers or metadata that’s going to help you (or your machine) find the right data – that is, if you can find any data at all.
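To make the contrast concrete, here is a minimal Python sketch of what “identifiers plus metadata” can look like next to plain keyword search. Everything in it is an illustrative assumption rather than anything described in the webinar: the `DatasetRecord` fields, the `Catalog` class, the `mint_identifier` helper and the `https://example.org/datasets` namespace are all hypothetical. Note also that a UUID-based URN is unique and stable but not resolvable on the web by itself; a real deployment would more likely use resolvable identifiers such as DOIs or HTTP IRIs.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str       # globally unique identifier for the dataset
    title: str
    keywords: tuple = ()  # descriptive metadata used for discovery


def mint_identifier(namespace_url: str, local_name: str) -> str:
    """Derive a stable identifier: the same inputs always yield the same URN."""
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, namespace_url)
    return uuid.uuid5(namespace, local_name).urn


class Catalog:
    """Toy in-memory catalog keyed by identifier (an assumption, not a product)."""

    def __init__(self) -> None:
        self._by_id = {}

    def register(self, record: DatasetRecord) -> None:
        # Refuse duplicates so that one identifier always means one dataset.
        if record.identifier in self._by_id:
            raise ValueError(f"identifier already registered: {record.identifier}")
        self._by_id[record.identifier] = record

    def resolve(self, identifier: str) -> DatasetRecord:
        # Exact lookup: returns one record or raises KeyError, never a fuzzy match.
        return self._by_id[identifier]

    def find_by_keyword(self, keyword: str) -> list:
        # Case-insensitive match against the declared keywords, not free text.
        wanted = keyword.lower()
        return [r for r in self._by_id.values()
                if wanted in (k.lower() for k in r.keywords)]
```

The design point is that `resolve` either returns exactly the dataset you asked for or fails loudly, which is something a free-text index over a data lake cannot promise.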

Of the 15 FAIR data principles, which do you absolutely need to have before you can even begin to think about using your data for large language model (LLM) or GenAI projects?

TS: Oh wow, that’s a loaded question. I’ll start by saying it is super situationally dependent. There’s a jokey term that those in this space will use, and that is “fair enough.” Because checking all the boxes is more or less impossible. And there’s no need to have 100% accuracy, for the most part. I think having appropriate identifiers (e.g., Globally Unique, Persistent and Resolvable Identifiers (GUPRIs)), addressing ontological concerns and making sure that your interoperability and usability are where they need to be would be a great place to start.

How can AI help in generating FAIR data?

TS: It’s a simple example, but knowledge graphs are an excellent way to make your data more FAIR and usable in a variety of different ways – and AI can generate those with speed. I’ve seen some excellent applications of LLMs to things like unstructured text in different modes to find entities, find relationships between entities and make rapid progress toward creating knowledge graphs.
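As a rough illustration of the last step in that workflow, the Python sketch below turns extracted entity–relationship triples into a small queryable graph. It makes several assumptions: the extraction step itself (LLM-based or otherwise) is not shown, the `TRIPLES` list is a hand-written stand-in for its output, and `KnowledgeGraph` and `build_graph` are hypothetical names, not part of any tool mentioned in the webinar.

```python
from collections import defaultdict

# Hand-written stand-in for the output of an entity/relationship extraction
# step; in practice these triples would come from processing unstructured text.
TRIPLES = [
    ("imatinib", "inhibits", "BCR-ABL"),
    ("BCR-ABL", "associated_with", "chronic myeloid leukemia"),
    ("imatinib", "treats", "chronic myeloid leukemia"),
]


class KnowledgeGraph:
    """Toy directed, labeled graph stored as subject -> {(predicate, object)}."""

    def __init__(self) -> None:
        self._out = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        # Using a set means a repeated triple is stored only once.
        self._out[subject].add((predicate, obj))

    def objects(self, subject: str, predicate: str) -> list:
        # All objects linked from `subject` by `predicate`, in sorted order.
        return sorted(o for (p, o) in self._out.get(subject, ()) if p == predicate)

    def entities(self) -> set:
        # Every node that appears as a subject or as an object.
        nodes = set(self._out)
        for edges in self._out.values():
            nodes.update(o for (_, o) in edges)
        return nodes


def build_graph(triples) -> KnowledgeGraph:
    """Load an iterable of (subject, predicate, object) triples into a graph."""
    graph = KnowledgeGraph()
    for subject, predicate, obj in triples:
        graph.add(subject, predicate, obj)
    return graph
```

In a FAIR setting, the plain strings used here for entities and predicates would ideally be replaced by persistent identifiers drawn from shared ontologies, so that graphs built by different teams can be merged without guessing which labels mean the same thing.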

Any final takeaways regarding FAIR and preparing data for AI’s full impact?

TS: I really do believe that if we don’t get better at FAIR data, we’re going to limit the effectiveness of our AI. It’s that simple.

We have to take extra care to invest not only our money but our intellectual capital into understanding what the FAIR principles are and how we implement them successfully. If we fail to make our data AI-ready, the AI systems that our organizations are working so hard to implement will never realize their full potential.

Learn more about Life Sciences & Healthcare at EPAM. The full webinar, “AI-Ready Data and Why FAIR Data Matters in Life Science Companies,” is available from the Pistoia Alliance.