The Right Approach to Extracting Actionable Insights from Your Data
Data Lakes: A Solution to Siloed Legacy Corporate Databases?
Everybody else has one – I want one, too! Of course, I’m talking about data lakes. Their still-growing popularity has created massive repositories of all sorts of data – structured, unstructured, numerical, textual – within many life sciences and healthcare organizations whose digital research labs are consuming and analyzing data at an unprecedented pace.
We often hear of the five Vs when we talk about data lakes – volume, variety, velocity, veracity and – wait for it – value! The argument that most companies make to justify the investment is that, by throwing all of a corporation’s data together and applying analytics through Machine Learning (ML) or Artificial Intelligence (AI), the fifth V will most certainly appear. But what is the right approach to extracting this value from your data and making it an actionable asset?
Whether we’re dealing with enterprise data lakes or our siloed legacy corporate databases, there’s still an inherent flaw in the approach: we’re often only looking for answers in the places we expect to find them, throwing new technologies at old problems expecting something magical to happen. In many cases, the answers exist in the interstitial spaces between the galaxies of data in our databases, and it takes a more thoughtful, tempered approach to find them.
Following the Scientific Method from Corporate Data Silos to Data Lakes
The life sciences industry is loaded with vast amounts of chemical and biological data collected by scientists to advance our knowledge of various diseases and empower us with the insights we need to create life-altering medicines, as well as discover the nature of our chemical and biological universe.
We are always searching for new technologies to make this process faster. ML algorithms and AI technology have been used in the life sciences and healthcare arena for quite some time now, from multi-parameter regression that uncovers relationships between chemical structures and biological activities to more complex natural language processing that better defines clinical trial cohorts, and everything in between. Because these methods have typically been applied to data collected for one specific question, the data they rely on tends to be siloed by nature.
Enterprise data lakes pose a significant challenge, but they also have significant potential to transform our industry by providing a technology that can, in conjunction with some ontology or other linking mechanism, break down the legacy data silos. Once data is combined, is it possible to use AI technology to trawl through all the data?
Extracting Actionable Insights from Data Lakes: The Role of the Data Scientist
The analytics continuum in life sciences involves a wide variety of technologies that allow us to extract actionable insights from the information contained in our data collection. How we apply these insights is the responsibility of data scientists in a life sciences organization.
The industry is at a tipping point where there is a strong desire to become more ‘digital.’ This represents a paradigm shift away from the legacy ‘experiment first’ culture, where data are collected to disprove (or more often prove) a hypothesis and then generally discarded (or at best stored and never used again). In its place is a ‘data first’ culture, where data scientists play a key role in the transformation and are tasked with extracting actionable insights from data that, in some cases, had been long forgotten.
Companies are hiring data scientists as quickly as they are minted from data science programs at universities across the world. These hires are intended to drive the transformation of life sciences to become more data-driven through the application of analytics, including ML techniques. Together, they represent the Natural Intelligence (NI) investments that life sciences companies are making.
A Final Word on AI
The life sciences industry must first become comfortable with this new approach being catalyzed through investments in NI. AI will naturally start to replace NI resources as the industry learns the value of the data and trusts the intelligence (natural or artificial) that is driving change. Whether you’re using a data lake or siloed data approach, there’s still a need for having the right people in place to take advantage of these technologies.
In closing, while I strongly believe that AI will play a major role in life sciences and healthcare, we must temper our enthusiasm and resist the urge to apply AI wholesale to our existing databases and emerging data lakes with the expectation that answers to our most challenging problems will be delivered. We must place an even greater emphasis on data quality, metadata, interoperability and domain applicability if we truly expect to extract value in the form of actionable insights from the investments being made.
In my final installment of this blog series, we’ll explore cloud collaboration as an enabler of industry transformation with specific use cases related to the digital research labs of the future and the associated security-based challenges and opportunities.