AI-Based Precision Pathology: Learning with Small Amounts of Data

Improving Indicators for Tumor Diagnosis

Cellular building blocks as the basis of histopathological diagnostics.
© Fraunhofer MEVIS
Our vision is to develop data-driven methods to capture the fundamental cellular features considered in modern histomorphological diagnostics. In this way, we aim to accelerate the development of better algorithms for clinical and research purposes.

Biomarkers are often indispensable for accurate diagnoses. The Fraunhofer Institute for Digital Medicine MEVIS is working on adaptive algorithms that make the search for new biomarkers much easier. This can provide physicians with valuable support in choosing the best possible therapy.


Biomarkers are an important building block of diagnostics. One common example is the cholesterol level in blood, which can suggest an increased risk of cardiovascular disease. Biomarkers also play a significant role in pathology, for example in the microscopic examination of tissue samples from suspected tumors. If particular cell types with certain combinations of properties are present, this can be a meaningful indicator that ideally reveals which tumor subtype is present. This lets physicians select a targeted treatment that is effective for each individual patient.

In clinical practice, however, this precise approach does not always work. A 2018 study, for example, found that 44 percent of cancer patients in the United States were candidates for immunotherapy, a special type of tumor treatment. In reality, however, this treatment was effective in only 12 percent of patients, meaning that many were treated in vain. “To provide more targeted therapy in the future, we need to be able to subdivide tumor types much more precisely than at present,” says MEVIS researcher Johannes Lotz. “We need to discover the biomarkers that deliver this differentiation.”

Detecting such biomarkers demands extensive clinical studies. The adaptive AI systems of the future will assist in this search. “The computer analyzes digitized tissue sections and looks through them for patterns,” says Lotz’s colleague Henning Höfener when describing the strategy. “This enables it to find new biomarkers.” The software must be trained with as many high-quality data sets as possible; otherwise, the search for patterns will be unsuccessful. This creates a problem: the more precisely tumor subtypes are differentiated, the fewer patients exhibit any one subtype, and the fewer data sets are available for training and analysis.


The Best of Two Worlds

Another difficulty: “The appearance of digital tissue sections can vary significantly from laboratory to laboratory,” explains Lotz. “This makes it difficult for the computer to detect existing patterns in the images reliably.” These problems can hardly be overcome using previous AI methods, in which the algorithms sift through vast quantities of pixels. That’s why the MEVIS team is attempting a new strategy and taking a look at some of the tried-and-true ways in which humans work. “Experienced pathologists have seen thousands and thousands of tissue images and derive the essential laws from them,” explains Höfener. “Unlike AI, they typically need only a handful of images for an accurate diagnosis.”

The MEVIS team wants to bring the best of both worlds together. Their plan is to train the artificial intelligence as if it were a pathologist, using many tissue images unrelated to a specific inquiry. Thanks to this “basic training,” the AI acquires knowledge about general characteristics and relations, so-called tissue descriptors. With their help, the machine can describe and classify the images. “If the algorithm then encounters a specific problem, it can use tissue descriptors to find correlations. Even with relatively few data, this could predict, for example, the success of a certain therapy,” says Höfener.

The project is still in its beginning phase, but Fraunhofer MEVIS is well prepared to master it successfully. “We have a lot of experience training adaptive algorithms and programming computer-aided diagnostic tools,” emphasizes Henning Höfener. “In cooperation with our clinical project partners, we can select the data we need for initial algorithm training.”


AI Accelerates Automatic Detection

These biomarker algorithms will benefit the research departments of pharmaceutical companies as well as university research groups. The approach also promises additional applications. For example, tissue descriptors could be used for segmentation, the automatic recognition and measurement of tissue structures in an image. “The descriptors also reduce the amount of training data required in this case,” says Höfener. “This could significantly accelerate adaptation of the segmentation algorithm.”

Content-based image retrieval could also benefit from this method. “Many are familiar with a similar feature on Google, where an image can be uploaded and similar ones are displayed,” explains Johannes Lotz. “For medical images, however, this is not yet as reliable.” The MEVIS experts hope that the descriptor concept could lead to substantial advancements. They envision a system that allows physicians who encounter unusual findings to search databases for similar images to check their diagnoses and discover which therapies have or have not worked in the past.
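
The retrieval idea described above can be illustrated with a minimal sketch. It assumes that every archived image has already been reduced to a fixed-length descriptor vector and that similarity is measured with cosine similarity; the function name `retrieve_similar`, the four-dimensional vectors, and the choice of cosine similarity are illustrative assumptions, not details of the MEVIS system.

```python
import numpy as np

def retrieve_similar(query, database, k=3):
    """Return indices and scores of the k descriptors most similar to the query."""
    db = np.asarray(database, dtype=float)
    q = np.asarray(query, dtype=float)
    # Normalize to unit length so the dot product equals cosine similarity.
    db_unit = db / np.linalg.norm(db, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)
    scores = db_unit @ q_unit
    # Sort by descending similarity and keep the top k entries.
    order = np.argsort(-scores)[:k]
    return order.tolist(), scores[order].tolist()

# Hypothetical descriptors for five archived cases (values are made up).
archive = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
]
query = [1.0, 0.05, 0.0, 0.0]

indices, scores = retrieve_similar(query, archive, k=2)
print("most similar archived cases:", indices)
```

In such a setup, the quality of the results depends almost entirely on the descriptors: the search itself is a simple nearest-neighbor lookup.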

Learning Universal Tissue Concepts for Patient Stratification

Two-step scheme to learn universal Tissue Concepts for patient stratification.
Tissue Concepts in two steps: 1. basic training based on many different datasets to learn general features, 2. adaptation to a specific diagnostic task. (The neural network diagram is derived from "The neural network zoo" by S. Leijnen and F. van Veen, CC BY 4.0)

Pathologists see many images during training and practice and learn concepts and patterns, such as heterogeneity or vessel density, that are independent of a particular disease or even of individual organs or tissues. Similarly, we will let the computer learn these or similar concepts from a broad collection of data covering different diagnostic questions. Combining images from different organs and diagnostic questions creates the large data sets needed to train a robust AI system. This step can be described as a type of basic training, built on many different data sets, in which the computer acquires general features and regularities, so-called tissue concepts.

In a second step, these features are adapted to a specific diagnostic task, such as the separation of a patient collective into responders and non-responders. The diagnosis is then supported based on these features. The tissue concepts are significantly less complex than the original image. In addition, they already contain contextual knowledge from the training step. In this way, significantly less data is required to develop, for example, a biomarker that can distinguish between different tumor types.
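
The two-step scheme can be sketched on synthetic data. This is a toy illustration under stated assumptions, not the project's method: principal component analysis stands in for the neural-network basic training, a nearest-centroid rule stands in for the task-specific adaptation, and all names (`simulate_patches`, `describe`, `simulate_task`) and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_CONCEPTS = 50, 2

# Assumption: raw features are a noisy linear mix of a few hidden "concepts".
mixing = rng.normal(size=(N_CONCEPTS, N_FEATURES))

def simulate_patches(latent):
    """Turn hidden concept values into noisy high-dimensional raw features."""
    return latent @ mixing + 0.1 * rng.normal(size=(len(latent), N_FEATURES))

# Step 1: basic training on a large pool without task-specific labels.
# PCA (via SVD) is a simple stand-in for learning general features.
pool = simulate_patches(rng.normal(scale=3.0, size=(2000, N_CONCEPTS)))
pool_mean = pool.mean(axis=0)
_, _, vt = np.linalg.svd(pool - pool_mean, full_matrices=False)
components = vt[:N_CONCEPTS]

def describe(x):
    """Map raw features to low-dimensional descriptors."""
    return (x - pool_mean) @ components.T

# Step 2: adaptation to a specific task with very few labeled samples.
def simulate_task(n_per_class):
    labels = np.repeat([0, 1], n_per_class)
    latent = rng.normal(size=(2 * n_per_class, N_CONCEPTS))
    # The two classes (e.g. responders vs. non-responders) differ in concept 0.
    latent[:, 0] += np.where(labels == 1, 3.0, -3.0)
    return simulate_patches(latent), labels

train_x, train_y = simulate_task(5)     # only 10 labeled samples
test_x, test_y = simulate_task(100)

# Nearest-centroid classifier in descriptor space.
centroids = np.stack([describe(train_x[train_y == c]).mean(axis=0) for c in (0, 1)])
distances = np.linalg.norm(describe(test_x)[:, None, :] - centroids[None, :, :], axis=2)
predictions = distances.argmin(axis=1)
accuracy = float((predictions == test_y).mean())
print(f"accuracy with 10 labeled samples: {accuracy:.2f}")
```

The point of the sketch is the division of labor: the expensive step uses plentiful data without task labels, while the task-specific step only has to fit two centroids in a two-dimensional space.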

Image Registration for Faster AI Development

Image registration is used to transfer molecular markers between histological stains.

Image registration provides a way to automatically generate annotated training data from differently stained tissue sections. For example, in the image on the right, epithelial cell nuclei are chemically stained (brown). Using image registration, these stained nuclei are transferred to the standard H&E stain shown on the left. In this way, the nuclei in the left image that correspond to positively stained nuclei are annotated automatically, and new samples become available for training. This technique is well tested and has been applied in multiple scientific publications.
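
The annotation transfer can be sketched as follows. The sketch makes strong simplifying assumptions: the two sections are related by a single affine transform estimated from a few matched landmarks, whereas registering real histological sections is typically image-based and may require nonlinear deformations. The helper names `fit_affine` and `transfer_points` and all coordinates are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points onto dst points."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: [x, y, 1] @ params = [x', y'].
    homogeneous = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(homogeneous, dst, rcond=None)
    return params  # shape (3, 2)

def transfer_points(points, params):
    """Apply the fitted transform to a list of (x, y) positions."""
    points = np.asarray(points, dtype=float)
    return np.hstack([points, np.ones((len(points), 1))]) @ params

# Hypothetical matched landmarks in the stained section and the H&E section.
# Here the H&E section is simply shifted by (12, -7) pixels.
ihc_landmarks = [[0, 0], [100, 0], [0, 100], [100, 100]]
he_landmarks = [[12, -7], [112, -7], [12, 93], [112, 93]]
params = fit_affine(ihc_landmarks, he_landmarks)

# Centers of positively stained nuclei detected in the stained section.
ihc_positive_nuclei = [[20, 30], [55, 80]]

# The same nuclei expressed in H&E coordinates, usable as training annotations.
he_annotations = transfer_points(ihc_positive_nuclei, params)
print(he_annotations)
```

Once the transform is known, every positive nucleus in the stained section yields an annotation in the H&E image without any manual labeling.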

Small Populations in Clinical Trials Challenge AI Algorithms in Coping with the Large Variability in Histological Data

A histological section contains thousands of cells that can be evaluated using AI-based methods, but still only reflect the characteristics of one patient.

The key challenge for the use of AI in pathology is collecting sufficiently large amounts of data. Histological images are highly complex and variable. A single image is typically several GB in size and contains many thousands of different cells. Their appearance depends on many factors, such as the specific disease expression and the preparation of the slide.


Small populations in clinical trials face the large variability in histological data.
© Fraunhofer MEVIS
Small patient populations lead to insufficient coverage of the large variability of histological data.

AI methods must extract diagnostically relevant features from this variability. A single tissue section reflects only a tiny fraction of this variability. Therefore, it usually takes large numbers of sample images from different patients and from different laboratories to robustly train AI methods. Collecting sufficiently large amounts of data is very costly, as extensive patient populations must be compiled and the data must often be manually annotated by experts.