Cheminformatics 101: The science behind smarter drug design

From accelerating drug discovery to optimising materials science, cheminformatics is shaping the future of pharmaceutical innovation. Here, experts from European IP firm Mewburn Ellis break down what this means for drug developers.

Cheminformatics can be defined as the use of computers to organise, manipulate, and transform chemical data into meaningful information. It is a discipline positioned at the interface of chemistry, computer science, and data science. Examples of techniques that commonly fall under the cheminformatics umbrella include the storage and organisation of chemical information, structure-property prediction, virtual screening, structural similarity analysis, and the design of chemical compounds and libraries.
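
To make one of these techniques concrete, the sketch below illustrates structural similarity analysis using the Tanimoto coefficient, a widely used similarity measure for molecular fingerprints. It is a minimal illustration only: the fingerprints are hypothetical sets of "on" bit positions invented for the example, and the compound names are placeholders. In practice, fingerprints would be generated from real structures by a cheminformatics toolkit.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by total distinct 'on' bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: each set holds the indices of the bits that are set.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 8},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12},
}

# Rank the library from most to least similar to the query compound.
ranked = sorted(
    library,
    key=lambda name: tanimoto(query, library[name]),
    reverse=True,
)

for name in ranked:
    print(name, round(tanimoto(query, library[name]), 2))
```

Ranking a library against a known active in this way is one simple form of similarity-based screening; real workflows differ mainly in how the fingerprints are computed and in the scale of the library.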

The term was first used in the literature in 1998 by Frank Brown. However, the use of chemical informatics techniques has been traced back to publications from the late 1950s, before the field was officially named. Work in the 1960s gave rise to chemical information storage and handling methods, followed by the introduction of quantitative structure-activity/property relationships (QSAR/QSPR) and pattern recognition.

Where traditional chemistry consists of in vitro experiments, cheminformatics research is conducted primarily in silico. Both are based on the same chemistry concepts, but cheminformatics applies software and algorithms to those concepts to obtain results and predictions that might not otherwise be possible.

Key applications

Cheminformatics has been applied to many branches of chemistry, including drug discovery, materials chemistry, and environmental sciences. Developments within these branches mean that they may encompass aspects beyond chemical principles, leading to the introduction of terms such as “materials informatics”. Modern materials informatics goes beyond chemical structures and properties alone, considering aspects of materials structures, functions, production, and lifecycle.

Cheminformatics in drug discovery

An increasing number of drug discovery projects now include in-silico steps for design, screening and/or optimisation, enabling larger chemical spaces to be explored in a faster and more cost-effective manner.

Generally, computer-aided drug design falls under two categories: structure-based drug design (SBDD) where the target structure is known, and ligand-based drug design (LBDD) where there is existing knowledge of compounds with biological activity against the target. In SBDD, commercial virtual compound libraries (such as Enamine’s REAL Space and WuXi AppTec’s GalaXi) and personal compound libraries can be virtually screened against a target for predicted binding properties. Compounds can be designed and optimised through techniques such as fragment growth and scaffold hopping.

LBDD uses approaches such as QSAR and pharmacophore modelling to correlate chemical structure with biological activity. Candidates can also be screened in silico for predicted properties of interest, such as pharmacodynamic, pharmacokinetic, or toxicity properties.
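
As a rough illustration of the idea behind QSAR, the sketch below fits a straight line relating a single molecular descriptor to measured activity, then uses it to estimate the activity of an untested compound. Everything here is assumed for the example: the descriptor (logP), the activity values (pIC50), and the numbers themselves are invented, and real QSAR models typically use many descriptors and more sophisticated statistical or machine learning methods.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: one descriptor value and one measured activity
# per compound (illustrative numbers, not real measurements).
logp = [1.0, 2.0, 3.0, 4.0]
pic50 = [5.1, 5.9, 7.1, 7.9]

slope, intercept = fit_line(logp, pic50)

# Estimate the activity of a hypothetical new compound from its descriptor.
new_logp = 2.5
predicted_pic50 = slope * new_logp + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted_pic50:.2f}")
```

The same pattern, learning a mapping from structure-derived descriptors to a measured property and applying it to new compounds, underlies far more elaborate models; the quality of the prediction depends heavily on the quality and relevance of the training data.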

More recently, deep learning-based technologies such as AlphaFold 3 have made it possible to predict the structures of molecular complexes, including protein drug targets for which no structure is known, opening the door to previously undruggable targets. The interactions between compounds and their targets can be studied in more depth, and visualised, using techniques such as molecular dynamics simulations.

These tools are valuable to the pharmaceutical industry as they can provide insights into the behaviour and mode of action of a compound within a simulated biological environment.

Early cheminformatic contributions to drug discovery came from larger companies such as GSK and J&J Pharmaceutical, due to a need for new methods to efficiently digest and analyse information from high-throughput screens and chemical libraries. As mentioned in the Journal of Cheminformatics, much of the initial groundbreaking cheminformatics work occurred within industry, meaning early developments in cheminformatics techniques are less well documented in the literature than those in fields like bioinformatics, which was mainly introduced and developed in academic settings.


As computational resources have become more accessible, there has been a rise in contributions from universities, start-ups, and smaller biotechnology companies. In the last 20 years, downloadable software packages (for example SeeSAR, Flare and PyMOL) have been developed and implemented in industry and academia. Alongside software advancements, there is a growing list of compound libraries for virtual screens, and of databases to train prediction models.

Initiatives such as the creation of extensive open-access databases of large and small biological molecules (like those hosted by the European Bioinformatics Institute and the RCSB Protein Data Bank) have played a vital role in the development of the field. These resources ultimately enabled the current era of data-hungry artificial intelligence (AI) and machine learning (ML), including AlphaFold. The quality of datasets hugely influences results and outputs, as does the quality of the question being asked.

Framing a research question in the correct manner allows more information of value to be extracted from prediction models, highlighting that the production of high-quality data still depends on knowledgeable researchers.

The inclusion of AI/ML promises to significantly derisk and accelerate drug discovery, ultimately leading to a cheaper and less failure-prone process. As models evolve further with larger and higher quality training datasets, and new methodological developments are made, it is hoped that we will see a reduction in the famously high failure rate of clinical drug development.

Cheminformatics and intellectual property (IP)

Cheminformatics, like other in silico and data-driven fields, can create assets that are associated with different types of IP rights. For example, in the UK and the EU, database rights allow for the protection of databases, regardless of originality, to compensate rights owners for the effort and investment that have gone into creating the database.

Software code and some aspects of databases can be protected by copyright; however, this does not protect the ideas behind the work. Protecting technical concepts is typically the role of patents. Trade secrets can be a viable option for protecting technical innovations when keeping the innovation secret is practical and attractive, for example where either protection is not possible or disclosure may be commercially unattractive.

However, growth in the field of cheminformatics has also meant an increase in the number of patents being filed by both industry and academia. Several factors are driving this: the competitive nature of the pharmaceutical industry; the very fast-paced growth of the cheminformatics field, which involves academic and industry players of all sizes, including a very active start-up sector that is typically more prone to disclosure than traditional pharma players; and a fast-moving job market. As a result, more innovators are filing patent applications to protect their inventions in this field, where in the past patent filing activity would perhaps have focussed on compositions of matter rather than platforms and methods.

Future of cheminformatics

It is expected that with improving AI and ML systems, smaller areas of cheminformatics will begin to grow at a faster pace. We may observe the introduction of other new branches of informatics, and the creation of more specialised software to cater to the needs of individual fields.

With increasing numbers of publications, patents and datasets becoming accessible to train ML models, cheminformatics can grow and improve across all applications. With the number of resources increasing rapidly, close attention must be paid to the quality of the literature and data used to train models if AI is to continue bringing useful outputs to the field.

A call for transparency within the area of data and AI may also lead to a more open-access field of research, allowing work to reach more people than before and further accelerating the pace at which contributions are made.

About the authors

Matthew Smith

Matthew Smith, partner, patent attorney at European IP firm Mewburn Ellis

Matthew Smith is a leading UK and European Patent Attorney working in the fast paced and dynamic field of high-performance materials, nanotechnology, and energy storage solutions. Smith’s diverse patent practice spans the realms of chemical and material sciences, and his clients range from multinational corporations to research institutions and university technology transfer departments.

Camille Terfve

Camille Terfve, partner, patent attorney, Mewburn Ellis

Camille Terfve is a unique computational biology and AI life science specialist, with technical and patent experience in this specific field. She has deep expertise in bioinformatics/computational biology, digital health, AI in the life sciences, and advanced bioprocessing. She has a passion for solutions that use data and tech to understand and leverage living systems to ultimately improve patient outcomes.

Jeremy Webster

Jeremy Webster – partner, patent attorney, head of chemistry, Mewburn Ellis

Jeremy Webster is a results-driven UK & European patent attorney with a proven track record in securing, protecting, and defending intellectual property rights for businesses operating in the chemistry, materials, and devices fields. His passion for innovation and dedication to excellence drive his thriving patent practice making him the go-to person for businesses of all sizes in need of an effective, pragmatic IP strategy.

Jonathan Wills

Jonathan Wills, partner, patent attorney, Mewburn Ellis

Jonathan Wills’s work is focussed on advanced materials and biologically active agents for medical and agricultural uses. He has extensive experience of drafting and prosecution, global portfolio management and invention capture. Wills is also actively involved in European opposition and appeal proceedings, with major projects defending chemical patents. His areas of expertise include polymers and advanced materials, pharmaceuticals, agrochemicals, and new age chemistry.

This collaborative article was led by Mewburn Ellis Chemistry Intern, Amelia Stennett.
