
Search engine for metabolites

Bioinformaticians develop a new method to analyse metabolites
Dr Kai Dührkop presents the visualisation of a dataset with the CANOPUS software.
Image: Jens Meyer (University of Jena)

The metabolism of every organism – from unicellular microbes to the complex human system – produces thousands of chemical compounds. As these molecules are the starting, intermediate and end products of chemical processes, they can provide information about the physiological state of living beings and their organs, tissues and cells. For this to work, however, these molecules (metabolites) actually have to be detectable. Until now, such analyses have been extremely complex, because it was only possible to clearly identify metabolites whose structures were already known. However, bioinformaticians at the University of Jena are now using artificial intelligence methods to detect all metabolites in a sample – even those which are unknown.


By Sebastian Hollstein

Everything that lives has metabolites, produces metabolites and consumes metabolites. They can be used as ›chemical markers‹ to detect diseases or investigate drinking water samples, to name just a few of their applications. However, the diversity of these chemical compounds causes difficulties in scientific research. Scientists have only managed to identify and define the structure of a relatively small number of molecules. Therefore, whenever a sample is analysed in the laboratory, only a relatively small part of it can be identified, while the majority of molecules remain unknown.

A team of bioinformaticians at the University of Jena has been working with colleagues from Finland and the USA to develop a unique method that takes all metabolites in a sample into account, thus considerably increasing the knowledge gained from examining such molecules. The team reports on its successful research in the renowned scientific journal ›Nature Biotechnology‹.

»Mass spectrometry, one of the most widely used experimental methods for analysing metabolites, identifies only those molecules that can be uniquely assigned by matching them against a database. All other, previously unknown, molecules contained in the sample do not provide much information,« explains Prof. Sebastian Böcker from the University of Jena. »With our newly developed method, called CANOPUS, however, we also obtain valuable insight from the unidentified metabolites in a sample, as we can assign them to existing compound classes.«

CANOPUS works in two phases: first, the method generates a ›molecular fingerprint‹ from the fragmentation spectrum measured by means of mass spectrometry. This contains information about the structural properties of the measured molecule. In the second phase, the method uses the molecular fingerprint to assign the metabolite to a specific compound class without having to identify it.
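The two phases can be illustrated with a minimal Python sketch. It is not the CANOPUS implementation: the three structural properties, the marker peaks in `predict_fingerprint` and the class rules in `assign_class` are hypothetical stand-ins for models that CANOPUS learns from data. The sketch only shows the data flow from spectrum to fingerprint to compound class, with no structure lookup in between.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, relative intensity between 0 and 1)

# Hypothetical structural properties; a real fingerprint has far more entries.
PROPERTIES = ["has_phosphate", "has_aromatic_ring", "has_long_alkyl_chain"]

# Hypothetical marker fragments (m/z) taken as a hint for each property.
MARKER_MZ = {
    "has_phosphate": 96.97,
    "has_aromatic_ring": 77.04,
    "has_long_alkyl_chain": 57.07,
}


def predict_fingerprint(spectrum: List[Peak], tol: float = 0.02) -> Dict[str, float]:
    """Phase 1 (toy): estimate a probability for each structural property."""
    fingerprint = {}
    for prop in PROPERTIES:
        # Use the strongest matching peak intensity as a crude probability.
        hits = [inten for mz, inten in spectrum if abs(mz - MARKER_MZ[prop]) <= tol]
        fingerprint[prop] = max(hits, default=0.0)
    return fingerprint


def assign_class(fingerprint: Dict[str, float], threshold: float = 0.5) -> str:
    """Phase 2 (toy): map the fingerprint to a compound class."""
    present = {p for p, prob in fingerprint.items() if prob >= threshold}
    if {"has_phosphate", "has_long_alkyl_chain"} <= present:
        return "phospholipid-like"  # hypothetical class label
    if "has_aromatic_ring" in present:
        return "aromatic compound"  # hypothetical class label
    return "unclassified"


# Made-up fragmentation spectrum of an unidentified metabolite.
spectrum = [(57.07, 0.8), (96.97, 0.9), (184.07, 1.0)]
fingerprint = predict_fingerprint(spectrum)
print(fingerprint)
print(assign_class(fingerprint))
```

The metabolite in this example is never identified. It only receives a class label, which is the kind of information the method aims to provide for unknown compounds.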

Analysis simplified by two-stage learning process

»Machine learning methods usually require large amounts of data in order to be trained. In contrast, our two-stage process makes it possible in the first step to train on a comparatively small dataset of tens of thousands of fragmentation mass spectra. Then, in the second step, the characteristic structural properties that are significant for a compound class can be determined from millions of structures,« explains Dr Kai Dührkop from the University of Jena.
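One way to read this division of labour: only the first stage needs measured spectra, while the second stage can learn from fingerprints computed directly from known structures, which are far more plentiful. The Python sketch below illustrates the second stage with a hand-rolled perceptron on a toy dataset. The property names, the four example structures and the single class "is_lipid" are hypothetical, and CANOPUS itself uses a deep neural network rather than a perceptron.

```python
from typing import List, Set, Tuple

# Hypothetical structural properties making up the fingerprint.
PROPERTIES = ["phosphate", "aromatic_ring", "long_alkyl_chain", "amine"]


def fingerprint_from_structure(structure: Set[str]) -> List[int]:
    """Compute a binary fingerprint from a known structure (no spectrum needed)."""
    return [1 if prop in structure else 0 for prop in PROPERTIES]


def train_perceptron(
    data: List[Tuple[List[int], int]], epochs: int = 20
) -> Tuple[List[float], float]:
    """Learn weights that separate class members (1) from non-members (0)."""
    weights = [0.0] * len(PROPERTIES)
    bias = 0.0
    for _ in range(epochs):
        for x, label in data:
            activation = bias + sum(w * xi for w, xi in zip(weights, x))
            error = label - (1 if activation > 0 else 0)
            # Standard perceptron update: shift weights towards the correct label.
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias


def predict(weights: List[float], bias: float, x: List[int]) -> int:
    """Return 1 if the fingerprint is assigned to the class, otherwise 0."""
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


# Toy structure database with a known label for the hypothetical class "is_lipid".
structures = [
    ({"phosphate", "long_alkyl_chain"}, 1),
    ({"long_alkyl_chain"}, 1),
    ({"aromatic_ring", "amine"}, 0),
    ({"phosphate", "amine"}, 0),
]
training_data = [(fingerprint_from_structure(s), y) for s, y in structures]
weights, bias = train_perceptron(training_data)

# In the full pipeline, this fingerprint would come from the first stage,
# i.e. it would be predicted from the spectrum of an unknown compound.
unknown_fingerprint = [1, 0, 1, 1]
print(predict(weights, bias, unknown_fingerprint))
```

Because the training fingerprints are derived from structures alone, the size of the structure database, not the number of measured spectra, limits how much data the second stage can learn from.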

The system therefore identifies these structural properties in an unknown molecule within a sample and then assigns it to a specific compound class. »This information alone is sufficient to answer many important questions,« Böcker emphasises. »The precise identification of a metabolite would be far more complex and is often not necessary at all.« The CANOPUS method uses a deep neural network that predicts around 2,500 compound classes.

The bioinformaticians at the University of Jena have already used their method to compare the intestinal flora of mice in a study where a test group had been treated with antibiotics. Their experiments provide information as to which classes of substances are produced by the mouse itself and which by its intestinal flora. Their research results may provide important insights into the human digestive and metabolic systems. The study also presents two further applications of the new method, which underline its functionality and informative value.

Jena’s molecular search engine used millions of times

With the new method, the bioinformaticians from Jena are expanding the possibilities of the search engine for molecular structures – ›CSI: FingerID‹ – which they have been making available to the international research community for around five years. This service is now used thousands of times a day by researchers looking to compare a mass spectrum from a sample with various online databases in order to determine a metabolite with greater precision. Over a hundred million requests have been submitted.

The new process strengthens the field of metabolomics – that is, research on these omnipresent small molecules – and increases its potential in many research areas, such as pharmaceuticals. Many active pharmaceutical substances in use for decades are metabolites; others could be developed with their help.

Information

Original Publication:

Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra, Nature Biotechnology (2020), DOI: 10.1038/s41587-020-0740-8

Contact:

Sebastian Böcker, Univ.-Prof. Dr
Room 3405
Ernst-Abbe-Platz 1-2
07743 Jena