Improving life science R&D outcomes with AI

Maximizing the potential of the AI/data-driven model of drug discovery requires fully integrating the technology into R&D. Biorelate’s Dr Ben Sidders explains.


Pharma R&D’s latest disruption is a data-driven, AI-enabled mode of drug discovery, evidenced by the rise of ‘AI-first’ companies such as Recursion and Insilico Medicine. Traditional pharmaceutical companies are also keenly incorporating AI across their businesses. The perceived value of AI in drug discovery and development has struggled to match the hype. Even so, several successful point solutions have now emerged that are having a tangible impact on particular steps in the R&D pipeline. These offer useful takeaways about how to embed AI successfully into the core R&D cycle.

Positive progress


In target discovery, knowledge graphs are adept at integrating a vast number of data sources into a queryable structure that supports informed, relatively unbiased target prioritization. A graph containing 84 million relationships derived from 37 separate sources has been successfully applied to target triage. It identified the most promising targets from hundreds of hits arising from genome-wide functional genomic CRISPR screens[1]. The time taken to identify hits for validation was reduced from months to days.
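
As a rough illustration of how a knowledge graph turns many sources into something queryable for triage, the sketch below stores evidence as (subject, relation, object, source) edges. It ranks screen hits by how many independent sources link each one to a disease. The gene, disease, relation and source names, and the "count distinct sources" scoring rule, are hypothetical and chosen only for illustration. They are not the scoring method of the graph cited above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One piece of evidence: subject --relation--> obj, as reported by a source."""
    subject: str
    relation: str
    obj: str
    source: str


class KnowledgeGraph:
    """Minimal in-memory knowledge graph, indexed by subject for fast lookups."""

    def __init__(self) -> None:
        self._by_subject: dict[str, list[Edge]] = defaultdict(list)

    def add(self, edge: Edge) -> None:
        self._by_subject[edge.subject].append(edge)

    def sources_linking(self, gene: str, disease: str) -> set[str]:
        """Distinct sources reporting any relation from `gene` to `disease`."""
        return {e.source for e in self._by_subject.get(gene, []) if e.obj == disease}


def triage(kg: KnowledgeGraph, hits: list[str], disease: str) -> list[tuple[str, int]]:
    """Rank screen hits by how many independent sources link them to `disease`.

    Ties are broken alphabetically so the output is deterministic.
    """
    scored = [(gene, len(kg.sources_linking(gene, disease))) for gene in hits]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    kg = KnowledgeGraph()
    # Hypothetical evidence from three hypothetical sources.
    kg.add(Edge("GENE_A", "associated_with", "DISEASE_X", "literature"))
    kg.add(Edge("GENE_A", "upregulated_in", "DISEASE_X", "omics"))
    kg.add(Edge("GENE_B", "associated_with", "DISEASE_X", "literature"))
    kg.add(Edge("GENE_C", "associated_with", "DISEASE_Y", "gwas"))
    print(triage(kg, ["GENE_C", "GENE_B", "GENE_A"], "DISEASE_X"))
    # prints [('GENE_A', 2), ('GENE_B', 1), ('GENE_C', 0)]
```

Counting distinct sources rather than raw edges stops one heavily represented source from dominating the ranking. A real system would likely weight evidence types differently.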

Challenges remain, however. Predicting synergistic drug combinations has been the topic of extensive research. Every flavor of AI model has been tried, with only limited success and almost no translational relevance. Nor are we any nearer to predicting the effect of a drug on a given patient without first running a clinical trial. Progress will require a structured, integrated approach to AI-enabled R&D transformation, spanning Data, Model, Validation and Culture considerations.

Data


Up to now, AI has found most success where the dataset is large, complete and, in many cases, generated specifically to solve the problem at hand. The UNI foundation model for computational pathology, for instance, was trained on more than 100 million images from 20 tissue types.

In contrast, one of the largest datasets available to train models for drug combination synergy prediction has 910 combinations of 118 drugs – many orders of magnitude smaller.

This problem is further exacerbated when we look at data from clinical trial cohorts, which are often sparse and inconsistent in what is measured. For example, one trial might collect demographics and a specific blood-based biomarker; another might also collect genomic data.

Then there are differences in the analysis pipelines applied to all these data. Re-processing and harmonizing all of these data types is highly labor-intensive, and often only the start of the process. The features used to train models may not be derived directly from the harmonized data and may need significant further manipulation.
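
The harmonization step can be sketched as mapping each trial's columns onto one shared schema. Units are converted along the way, and anything a trial never measured is recorded explicitly as missing. The trial names, column names, schema and units below are hypothetical. A real pipeline would also have to reconcile assay versions, analysis pipelines and controlled vocabularies, which this sketch ignores.

```python
from typing import Any, Callable, Optional

# Hypothetical shared schema that every harmonized record follows.
SCHEMA = ("age", "crp_mg_per_l", "tp53_mutated")

# Per-trial mapping: shared field -> (source column, conversion function).
# Column names and units are invented for illustration.
MAPPINGS: dict[str, dict[str, tuple[str, Callable[[Any], Any]]]] = {
    "TRIAL_1": {
        "age": ("AGE_YRS", int),
        "crp_mg_per_l": ("CRP_MG_L", float),
    },
    "TRIAL_2": {
        "age": ("age", int),
        # Trial 2 reports CRP in mg/dL; 1 mg/dL = 10 mg/L.
        "crp_mg_per_l": ("crp_mg_dl", lambda v: float(v) * 10.0),
        "tp53_mutated": ("TP53", lambda v: v == "MUT"),
    },
}


def harmonize(trial: str, row: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Map one raw record onto SCHEMA; fields the trial did not measure become None."""
    mapping = MAPPINGS[trial]
    out: dict[str, Optional[Any]] = {}
    for field in SCHEMA:
        if field in mapping and row.get(mapping[field][0]) is not None:
            column, convert = mapping[field]
            out[field] = convert(row[column])
        else:
            out[field] = None
    return out


def coverage(records: list[dict[str, Optional[Any]]]) -> dict[str, float]:
    """Fraction of harmonized records in which each schema field is present."""
    n = len(records)
    return {f: sum(r[f] is not None for r in records) / n for f in SCHEMA}


if __name__ == "__main__":
    records = [
        harmonize("TRIAL_1", {"AGE_YRS": "54", "CRP_MG_L": "3.1"}),
        harmonize("TRIAL_2", {"age": 61, "crp_mg_dl": "0.5", "TP53": "MUT"}),
    ]
    print(coverage(records))
    # prints {'age': 1.0, 'crp_mg_per_l': 1.0, 'tp53_mutated': 0.5}
```

Even in this two-trial toy, the genomic field is present for only half the records. That sparsity compounds quickly as more cohorts are pooled.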

The underlying issue is that Pharma’s data, particularly from clinical trials, was not generated for AI. To exploit data meaningfully using AI, companies must develop a data strategy – and be willing to fund and generate data on clinical cohorts where possible – to build datasets of the required scale.

Model


While AI models excel at classification and prediction problems, AI must incorporate causality if it is to revolutionize drug discovery. Predicting that a drug might work in a new indication is valuable, but it is not the same as explaining why the drug will work in that indication. Internal and regulatory decision-making both need explainable biology that supports a mechanistic understanding of the drug or disease biology in question.

The integration of prior knowledge and data-driven insights offers a promising solution. AI combined with highly accurate causal relationships can distill both a broader array of promising targets and a mechanistic understanding of their biological role in disease.

Cause-and-effect relationships can be mined from the literature and derived from experimental data. These relationships define the regulatory interactions between two biological entities. They can be combined into structural causal models, a framework for representing and analyzing the causal relationships between variables. Such models provide a systematic way to describe how a change in one variable leads to changes in others. They could be used when training more expansive foundation models. They could also be used to build specific mechanistic models that explain the output of an upstream finding.
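
To make the idea concrete, the sketch below encodes a toy structural causal model. Each variable is computed from its causal parents plus noise. An intervention (Pearl's do-operator) is modeled by overriding a variable's equation with a fixed value. The three-node chain (dose → target activity → biomarker), its linear equations and its coefficients are hypothetical. In practice, the edges would come from literature-mined or experimentally derived relationships.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional

# A structural equation maps the values of a variable's parents, plus an
# exogenous noise term, to the value of that variable.
Equation = Callable[[dict[str, float], float], float]


@dataclass
class StructuralCausalModel:
    """Minimal SCM. `order` must list variables so parents come before children."""
    order: list[str]
    parents: dict[str, list[str]]
    equations: dict[str, Equation]
    noise_sd: float = 1.0

    def sample(self, rng: random.Random,
               do: Optional[dict[str, float]] = None) -> dict[str, float]:
        """Draw one joint sample. Variables named in `do` are fixed by intervention,
        which replaces their structural equation (the do-operator)."""
        do = do or {}
        values: dict[str, float] = {}
        for var in self.order:
            if var in do:
                values[var] = do[var]
            else:
                parent_values = {p: values[p] for p in self.parents[var]}
                noise = rng.gauss(0.0, self.noise_sd)
                values[var] = self.equations[var](parent_values, noise)
        return values

    def expected(self, var: str, n: int = 10_000, seed: int = 0,
                 do: Optional[dict[str, float]] = None) -> float:
        """Monte Carlo estimate of E[var], under an intervention if one is given."""
        rng = random.Random(seed)
        return sum(self.sample(rng, do)[var] for _ in range(n)) / n


# Hypothetical mechanism: an inhibitor lowers target activity, which drives a
# biomarker. All coefficients are invented for illustration.
scm = StructuralCausalModel(
    order=["dose", "target_activity", "biomarker"],
    parents={"dose": [], "target_activity": ["dose"], "biomarker": ["target_activity"]},
    equations={
        "dose": lambda pv, u: u,
        "target_activity": lambda pv, u: 1.0 - 0.8 * pv["dose"] + u,
        "biomarker": lambda pv, u: 2.0 * pv["target_activity"] + u,
    },
    noise_sd=0.1,
)

if __name__ == "__main__":
    # Intervening on dose propagates through the mechanism (expected value ~0.4).
    print(scm.expected("biomarker", do={"dose": 1.0}))
    # Fixing target activity directly blocks any effect of dose on the biomarker.
    print(scm.expected("biomarker", do={"dose": 1.0, "target_activity": 0.0}))
```

Unlike a purely predictive model, an SCM can answer "what happens if we change this?" questions. The explanation for any prediction is the path through the graph itself.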

Validation


The output from all AI solutions should be validated, experimentally where appropriate, with two provisos. First, the R&D function should be set up so that all data feeds back to the AI model. This helps mitigate some of the data challenges described above, while ensuring that the model is continually improved. For example, every result from the CRISPR screening group should find its way back into the knowledge graph so that future queries can benefit from it.
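
One way to picture this feedback loop is a store in which every screening readout updates the evidence attached to a target, so the next ranking already reflects it. The sketch below seeds each target with a Beta-Bernoulli belief derived from a model score and updates it with each experimental result. That update rule, the `prior_strength` parameter and all target names are illustrative assumptions rather than a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class TargetBelief:
    """Beta(alpha, beta) belief that a target will validate experimentally."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


class FeedbackStore:
    """Hypothetical store that folds every screening readout back into the
    beliefs used to rank targets in later queries."""

    def __init__(self, prior_strength: float = 4.0) -> None:
        # How many pseudo-observations the initial model score is worth.
        self.prior_strength = prior_strength
        self.beliefs: dict[str, TargetBelief] = {}

    def register(self, target: str, model_score: float) -> None:
        """Seed a target's belief from an AI model score strictly between 0 and 1."""
        if not 0.0 < model_score < 1.0:
            raise ValueError("model_score must lie strictly between 0 and 1")
        alpha = model_score * self.prior_strength
        self.beliefs[target] = TargetBelief(alpha, self.prior_strength - alpha)

    def record_result(self, target: str, validated: bool) -> None:
        """Conjugate Beta-Bernoulli update from one experimental readout."""
        belief = self.beliefs[target]
        if validated:
            belief.alpha += 1
        else:
            belief.beta += 1

    def ranked(self) -> list[tuple[str, float]]:
        """Targets ordered by posterior mean probability of validating."""
        return sorted(((t, b.mean) for t, b in self.beliefs.items()),
                      key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    store = FeedbackStore()
    store.register("GENE_A", 0.75)  # initially favored by the model
    store.register("GENE_B", 0.5)
    # Hypothetical CRISPR readouts fed back into the store.
    store.record_result("GENE_A", validated=False)
    store.record_result("GENE_A", validated=False)
    store.record_result("GENE_B", validated=True)
    store.record_result("GENE_B", validated=True)
    print(store.ranked())
    # prints [('GENE_B', 0.6666666666666666), ('GENE_A', 0.5)]
```

The prior strength controls how quickly experimental evidence can overturn the model's initial view. Here, two readouts are enough to reverse the ranking.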

Second, there needs to be a triage-based validation model. An AI system can identify hundreds of targets. The challenge is to narrow them efficiently while staying open to the ‘left-field’ opportunities that AI might highlight. Orthogonal in silico approaches might be used to go from 1,000 to 100 targets. To go from 100 to 10, the team should adopt the quickest, highest-throughput experiment that yields the next rung of supporting evidence.
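
The in silico stage of such a funnel might look like the sketch below. Candidates are ranked by their mean rank across several orthogonal scoring methods. A few extra slots are reserved for ‘left-field’ candidates that one method rates highly but the consensus would discard. The mean-rank consensus, the `left_field` parameter and the method and candidate names are illustrative assumptions. The experimental rung that follows is not modeled.

```python
from statistics import mean


def rank_positions(scores: dict[str, float]) -> dict[str, int]:
    """Map each candidate to its rank (0 = best) under one scoring method."""
    ordered = sorted(scores, key=lambda c: (-scores[c], c))
    return {c: i for i, c in enumerate(ordered)}


def consensus_shortlist(methods: dict[str, dict[str, float]], keep: int,
                        left_field: int = 0) -> list[str]:
    """Shortlist candidates by mean rank across orthogonal in silico methods.

    Assumes every method scores the same candidate set. `left_field` extra
    slots go to dropped candidates with the best rank under any single method.
    """
    ranks = {name: rank_positions(s) for name, s in methods.items()}
    candidates = sorted(next(iter(methods.values())))
    consensus = sorted(candidates, key=lambda c: (mean(r[c] for r in ranks.values()), c))
    shortlist, dropped = consensus[:keep], consensus[keep:]
    # Rescue candidates that one method ranks near the top despite a weak consensus.
    outliers = sorted(dropped, key=lambda c: (min(r[c] for r in ranks.values()), c))
    return shortlist + outliers[:left_field]


if __name__ == "__main__":
    # Hypothetical scores from three orthogonal in silico methods.
    methods = {
        "network_proximity": {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1, "E": 0.2, "F": 0.3},
        "expression_signature": {"A": 0.8, "B": 0.9, "C": 0.6, "D": 0.2, "E": 0.1, "F": 0.3},
        "druggability": {"A": 0.7, "B": 0.6, "C": 0.8, "D": 0.95, "E": 0.1, "F": 0.2},
    }
    print(consensus_shortlist(methods, keep=2, left_field=1))
    # prints ['A', 'B', 'D']
```

Candidate D ranks poorly on consensus but first under one method, so it takes the reserved slot ahead of C. This is the kind of left-field option a pure consensus filter would silently drop.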

Culture


Underlying many of the data, model and validation issues to date has been the culture of the organization and its failure to fully adapt to an AI-driven way of thinking and working.

While there are increasing efforts to bridge this gap, upskilling or recruiting talent with AI expertise is essential. At the same time, data scientists must be educated in the R&D decision-making process, and must understand and develop methods that directly support it. More could also be done to show that AI will raise the productivity of all R&D researchers, and is therefore an opportunity rather than a threat.

Facilitating change


Within 5-10 years, every major decision taken along the drug R&D pipeline will be accelerated by unparalleled access to knowledge. But to get there, data scientists will need to develop actionable models with causality at their heart; biologists will need to determine how to effectively integrate data science into their workflows; and heads of R&D will need to orchestrate more seamless integration and symbiosis between the two sciences.

The post Improving life science R&D outcomes with AI appeared first on LifeSci Voice.
 