Digital: AI Hallucinations

AI hallucinations in pharmaceuticals: what are they and how to overcome them?

Large language models can be useful in the pharmaceutical industry to streamline digital capabilities. However, steps must be taken to mitigate the phenomenon of AI hallucinations – convincing but false outputs that can jeopardise a company’s data set

Karthik Narayan at Reltio

It has been four years since the COVID-19 pandemic raised the alarm about the need for the pharmaceutical industry to up its game regarding digital capabilities. Traditionally, the industry has been plagued by siloed and fragmented operations, which cause poor data quality. This, in turn, hinders how the industry can use technology in clinical trials, drug discovery, development and design, and supply chain optimisation, to name a few areas.

As in all industries, artificial intelligence (AI), including generative AI tools that use large language models (LLMs), is revolutionising processes. Benefits for the pharmaceutical industry include: offering personalised treatments and tailoring medication better to the individual patient; ensuring tighter regulatory compliance with automated documentation and reporting; and using historical data to predict how effective new medication will be. Generative AI could offer the industry a new opportunity and create between $60bn and $110bn per year in economic value for the pharmaceuticals and medical product industries.1

Even though the industry is likely to see explosive growth in economic value, this new technology comes with several risks, including the potential for hallucinations. Hallucinations occur when LLMs, which are trained to predict the next word, produce results that appear true or very convincing but are, in fact, false. These are not ‘intelligent’ models in themselves and cannot fact-check their own output. In addition, if an LLM has been trained on inaccurate or low-quality data, the end user will be unable to confirm whether the model’s output is trustworthy.

In pharmaceuticals in particular, unlike in other industries, AI hallucinations could create life-or-death situations for patients. An AI could even go as far as fabricating an incorrect clinical trial result, for example. Even with the technology available today, it is unlikely that AI hallucinations will be completely eliminated, but they can be significantly reduced if organisations focus on the data being fed into the LLM. One key strategy is to apply a strong data unification framework, which improves data quality and leads to more trustworthy data, creating a stronger foundation for more responsible AI. The pharmaceutical industry must therefore strike the right balance between using these technologies to improve its operations and its patients’ lives, and ensuring that it does not fall victim to AI hallucinations.

Creating tailored LLMs for the pharmaceuticals industry

Before the industry even considers using LLMs, it should review the data being fed into these models. The data should be sourced transparently and ethically, and it must be accurate. Critically, only data consented for use in model development should be leveraged by AI; unconsented data is a significant risk that pharmaceutical companies need to mitigate before they can properly leverage AI capabilities. Additionally, the data being used must have the capacity to produce tailored and applicable responses for the pharmaceutical industry. This requires a two-pronged approach – retrieval augmented generation (RAG) and graph augmentation.

RAG involves creating a model that retrieves information from a defined source, including company-specific data, for the LLM to use, so the LLM produces more personalised and informed responses. However, it is essential for the RAG model to be built on high-quality, complete, relevant and accurate data. For this to be successful, the pharmaceutical industry should invest in a robust data management and unification system that can make real-time updates.
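
As a rough sketch of the pattern (not any particular vendor’s API), the example below pairs a naive keyword retriever with a stubbed model call; in practice the retriever would use vector embeddings over the unified company data set.

# Minimal retrieval augmented generation (RAG) sketch. The retriever and
# call_llm below are illustrative stand-ins, not a specific vendor's API.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Rank curated company documents by naive keyword overlap; a real
    # system would use vector embeddings over unified data instead.
    query_terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def call_llm(prompt: str) -> str:
    # Placeholder for the actual model call (e.g. a hosted LLM API).
    return f"[model response grounded in a prompt of {len(prompt)} chars]"

def answer(query: str, documents: list[str]) -> str:
    # Grounding the model in retrieved, company-specific context pushes it
    # to answer from curated data rather than parametric memory alone.
    context = "\n".join(retrieve(query, documents))
    prompt = (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)

docs = [
    "Product X: approved indication is hypertension in adults.",
    "Trial ABC-101 completed Phase II with 240 participants.",
]
print(answer("What is the approved indication for Product X?", docs))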

Yet, RAG alone is not enough to reduce AI hallucinations. The pharmaceutical industry should also be leveraging graph augmentation in its operations. This draws on a highly structured knowledge graph of organisation-wide entities and relationships within each business, allowing pharmaceutical-specific terminology and facts to be included in outputs that are highly tailored to each business.
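
A minimal illustration of the idea follows, with an invented three-triple knowledge graph; the entities and facts are placeholders, not real product data.

# Illustrative graph augmentation: inject verified facts about entities
# mentioned in a query into the prompt. The triples below are invented.

KNOWLEDGE_GRAPH = {
    ("Product X", "has_active_ingredient", "Compound Y"),
    ("Product X", "studied_in", "Trial ABC-101"),
    ("Trial ABC-101", "phase", "Phase II"),
}

def facts_for(entity: str) -> list[str]:
    # Return readable facts where the entity appears as subject or object.
    return [
        f"{s} {p.replace('_', ' ')} {o}"
        for (s, p, o) in KNOWLEDGE_GRAPH
        if entity in (s, o)
    ]

def augment_prompt(query: str, entities: list[str]) -> str:
    # Grounding the prompt in organisation-approved terminology and
    # relationships constrains what the model can plausibly assert.
    facts = [fact for e in entities for fact in facts_for(e)]
    return "Known facts:\n" + "\n".join(facts) + f"\n\nQuestion: {query}"

print(augment_prompt("Which trial studied Product X?", ["Product X"]))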

In a similar vein to RAG, graph augmentation is only as effective as the quality of the data that was input into the model in the first place. Graph augmentation also includes a layer of quality control of the AI-produced responses, which furthers the quality assurance of the output.

AI is also driving significant improvements in data quality. Powered by pre-trained, LLM-based machine learning models, rule-free matching is revolutionising data unification. This innovation ensures high match accuracy with minimal effort, automatically suggesting matches out of the box.

By leveraging zero-shot learning, the need for extensive model training is eliminated, significantly boosting data team productivity and accelerating implementation, which is crucial in the fast-paced world of pharmaceuticals.
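
Conceptually, rule-free matching amounts to asking a pre-trained model to judge record pairs directly, with no hand-written rules or training runs. The ask_model stub below stands in for such a model and is purely illustrative.

# Sketch of rule-free record matching with a zero-shot model as the judge.
# ask_model is a stand-in for a pre-trained LLM; it answers YES here only
# so that the example runs without an external service.

def ask_model(prompt: str) -> str:
    return "YES"

def records_match(a: dict, b: dict) -> bool:
    # No hand-written matching rules and no training run: the pre-trained
    # model judges the pair directly from a zero-shot prompt.
    prompt = (
        "Do these two records describe the same healthcare professional? "
        "Answer YES or NO.\n"
        f"Record A: {a}\nRecord B: {b}"
    )
    return ask_model(prompt).strip().upper().startswith("YES")

a = {"name": "Dr J. Smith", "city": "Boston", "npi": "1234567890"}
b = {"name": "John Smith MD", "city": "Boston, MA", "npi": "1234567890"}
print(records_match(a, b))  # a suggested match, typically queued for review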

LLMs should be based on trusted and unified data

When using LLMs in the industry, it is essential to leverage modern cloud-native data unification and management systems to address the challenges of training these models. These systems unify data from various sources, significantly improving the accuracy and timeliness of information available to downstream consumers. Key approaches to creating truly unified and trusted data include master data management (MDM) and 360 core data products, such as Customer 360. These methods are crucial for developing evergreen, reusable and scalable core data sets that are vital to the organisation. Data unification and management systems that offer out-of-the-box integrations with data governance frameworks simplify this process, making it easier and more efficient to synchronise metadata. By using these approaches, data is not only unified from several sources, but also fed into the LLM in a real-time, consistent and highly accurate manner. Consequently, these systems are critical in enhancing the reliability and accuracy of outputs from LLMs.
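
At its simplest, unification consolidates overlapping source records into a single golden record. The sketch below uses recency as the survivorship rule; the field names and values are invented, and real MDM systems apply much richer, attribute-level rules.

# Toy unification step: consolidate source records sharing a key into one
# golden record, with "most recently updated wins" as the survivorship
# rule. Field names and source systems are invented for the example.

from datetime import date

source_records = [
    {"id": "hcp-1", "email": "j.smith@old.example", "updated": date(2023, 1, 5)},
    {"id": "hcp-1", "email": "j.smith@new.example", "updated": date(2024, 6, 1)},
]

def golden_record(records: list[dict]) -> dict:
    # Apply records oldest-first so the newest value for each attribute
    # survives; production MDM uses richer, attribute-level rules.
    merged: dict = {}
    for record in sorted(records, key=lambda r: r["updated"]):
        merged.update(record)
    return merged

print(golden_record(source_records))  # keeps the 2024 email address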

In the pharmaceutical industry, where the risk of AI hallucinations can have serious implications, one effective strategy for ensuring unified and robust data is the use of canonical data models. A canonical data model presents the simplest possible structure for data and its relationships within the organisation. This simplicity enables seamless data unification across all sources, creating consistency within the models and improving accuracy.
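
In code, a canonical model is simply one agreed schema that every source system maps into through small adapters; the entity and field names below are illustrative only.

# A canonical data model as one shared schema with per-source adapters.
# The entity and field names are illustrative, not a standard.

from dataclasses import dataclass

@dataclass(frozen=True)
class HealthcareProfessional:
    # The single agreed-upon shape every source system maps into.
    hcp_id: str
    full_name: str
    specialty: str
    country: str

def from_crm(row: dict) -> HealthcareProfessional:
    # One small adapter per source keeps downstream consumers, including
    # LLM pipelines, reading a single consistent format.
    return HealthcareProfessional(
        hcp_id=row["crm_id"],
        full_name=f"{row['first']} {row['last']}",
        specialty=row["spec"],
        country=row["country_code"],
    )

print(from_crm({"crm_id": "42", "first": "Jane", "last": "Doe",
                "spec": "Cardiology", "country_code": "GB"}))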

Beyond unification, real-time data availability and automation are key to success. This can be achieved through application programming interface (API)-driven integration. Security-compliant APIs facilitate seamless integration between the data platform and the LLM, leading to faster data access and processing. The continuous flow of consistent data reduces the likelihood of the model producing incorrect or inconsistent outputs, enhancing the model’s reliability and increasing user trust.
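
A brief sketch of that flow follows, assuming a hypothetical authenticated endpoint on the data platform; the URL, bearer token and record fields are placeholders.

# Sketch of an API-driven feed of fresh, unified data into an LLM prompt.
# The endpoint, bearer token and record fields are hypothetical.

import json
import urllib.request

API_URL = "https://mdm.example.com/api/v1/hcp/42"  # placeholder endpoint

def fetch_latest(url: str, token: str) -> dict:
    # Pull the current golden record over an authenticated channel so the
    # model always sees the latest unified view, not a stale copy.
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)

def build_context(record: dict) -> str:
    # Pass only what the task needs, keeping the prompt current and small.
    return json.dumps(record, indent=2)

# Usage (requires a reachable endpoint):
# context = build_context(fetch_latest(API_URL, token="..."))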

As the pharmaceutical industry continues to grow and collect data from multiple sources, MDM and canonical data models must scale at the same pace. This scalability ensures that the most recent data is always available to the LLM, thereby reducing the risk of AI hallucinations and maintaining the accuracy of the model’s outputs.

The future of LLMs in the pharmaceuticals industry

As AI continues to become increasingly embedded in everyday life, there is a great need for these models to be based on transparent, trusted and reliable data. In Europe especially, this has been pushed to the forefront with the introduction of the EU AI Act on 1 August 2024.2 The regulation sets out a robust framework for how AI systems should be safeguarded across a number of categories, from prohibited risk to minimal risk. The date marked the start of the transitional period in which businesses must ensure that their operations meet the criteria specified in the Act for responsible and ethical uses of AI.

As the pharmaceutical industry holds a lot of personal and private information, it falls into a higher risk category, meaning that the AI systems it uses must meet much more stringent regulatory requirements to comply with the EU AI Act. It is also essential for the industry to keep detailed documentation of its AI systems, including the data used, and to be able to explain the AI’s decision-making process. As the Act emphasises accountability when using AI systems, it is crucial for those within the industry to meet these new safety standards, which reduces the risk of AI hallucinations and of harm to patients.

AI tools are used to drive innovation and efficiency in several industries, including pharmaceuticals, so there is no shying away from the fact that they are needed. But this technology comes with the risk of AI hallucinations, and it is especially important in the pharmaceutical industry for this risk to be minimised. Working within regulations and ensuring that LLMs are built on a robust foundation of trusted, responsible and ethical data goes a long way towards ensuring that the industry can reap the benefits of AI without the significant risks it brings.


Karthik Narayan is the product management director of solutions at Reltio. He is a seasoned professional leading, executing and delivering on critical strategic initiatives in the product space. Karthik holds a Master of Science in Bioinformatics from Indiana University Bloomington, Indiana, US, and a Master of Business Administration from Harvard Business School, Massachusetts, US.