Piraud Team // Health
HELMHOLTZ AI CONSULTANTS @ HELMHOLTZ MUNICH
HEALTH-FOCUSED AI CONSULTANTS
The Helmholtz AI central unit is also the local unit for Health and includes a team of health-focused consultants. They are key actors in achieving the Helmholtz AI goal of empowering scientists to use AI in their research. To this end, they advise and support research teams in using machine learning and deep learning. The consultants are proficient in a broad range of methods and tools and offer help at all stages of the data analysis pipeline, from project conceptualisation to implementation. They provide reusable code and technical reports, and strive to enable their scientific collaborators to apply the methods themselves, for example through pair programming and code review sessions. They also play a key role in disseminating knowledge, by contributing open-source software to the community and offering training courses tailored to the needs of the Health research community.
The team
SELECTED ONGOING VOUCHER PROJECTS
Our most prominent current challenges.
-
- Challenge: Vitiligo is a skin disease that can be diagnosed and monitored through patient history and visual inspection. However, visual inspection is of limited use for monitoring treatment response and quantifying disease progression. Clinical images are routinely taken but can be difficult to compare because they are acquired under different technical conditions. A relatively new tool in dermatology is the 3D full-body scanner, which can image almost the entire skin surface in a highly standardized way and at high quality. However, full-body scans pose challenges of their own, including data protection, since anonymization is not feasible.
- Approach: We are building an automated approach for real-world hospital settings in which data cannot be shared between hospitals due to patient confidentiality and privacy. Using full-body scans from the Department of Dermatology and Allergy at the Klinikum rechts der Isar in Munich, we will first develop a deep learning method that automatically detects and segments vitiligo. We will then use the Swarm Learning framework developed by our collaborators at DZNE to train the final model in a decentralized way, enabling real-world collaboration among hospitals. As a first use case, we will train the algorithm with full-body scans from the Department of Dermatology at the University Hospital Erlangen.
- Consultants: Gerome Vivar, Florian Kofler
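Decentralized training works because sites exchange model parameters rather than scans. The sketch below illustrates that idea with a sample-weighted average of parameter vectors, one common merge rule; it assumes each hospital shares only a flat list of weights and its number of training samples. `merge_parameters` is a hypothetical helper and not part of the Swarm Learning API.

```python
from typing import Sequence


def merge_parameters(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> list[float]:
    """Merge per-site parameter vectors, weighting each site by its sample count.

    Only parameters and sample counts leave each site; raw scans never do.
    (Hypothetical helper for illustration, not the Swarm Learning API.)
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need at least one site and one sample count per site")
    n_params = len(site_weights[0])
    if any(len(w) != n_params for w in site_weights):
        raise ValueError("all sites must share the same model architecture")
    total = sum(site_sizes)
    merged = [0.0] * n_params
    for weights, size in zip(site_weights, site_sizes):
        share = size / total  # larger sites contribute proportionally more
        for i, value in enumerate(weights):
            merged[i] += share * value
    return merged


# Two hypothetical hospitals after one round of local training.
munich = [0.2, -1.0, 0.5]
erlangen = [0.6, -0.2, 0.1]
print(merge_parameters([munich, erlangen], [300, 100]))
```

In a real swarm, each node would load the merged parameters and continue local training, repeating the merge periodically.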
-
- Challenge: Neuromuscular organoids capture key morphological and functional features of the in vivo tissue, including contraction, at an unprecedented level. Analysing the contractile activity of neuromuscular organoids will help us unravel how these tissues interact during development and disease. Establishing such an assay will also allow us to use neuromuscular organoids as tools for screening and developing effective drugs and treatments for neuromuscular disorders, such as Spinal Muscular Atrophy (SMA). To this end, we aim to develop a tool for analysing organoid contraction captured in video recordings. To gain valuable insights from these recordings, the tool is designed to address challenges such as removing the effect of organoid drift from the extracted signal. Once the signals are extracted, the analysis concentrates on feature extraction and on univariate and bivariate assessments of these features, to understand the behaviour of organoids under varying conditions.
- Approach: We developed a stable, two-part pipeline for extracting and analysing organoid contraction signals from video data. The first part extracts the signal from the video; the second processes the signal and extracts features that characterise the type of contraction. We also offer an interactive tool that lets experts focus on the insights gained from the data, while time series extraction and analysis, together with the technical details, run automatically. The result is a functional profile of the neuromuscular organoids, facilitating a better understanding of organoid properties under different biological scenarios.
- Consultants: Donatella Cea, Christina Bukas
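The two stages above can be illustrated on a signal that has already been reduced to one value per video frame. This minimal sketch assumes that reduction has happened. It suppresses slow baseline drift by subtracting a centred moving average, then counts contractions as local maxima above a threshold. The window size, threshold and function names are illustrative assumptions. The actual pipeline may handle drift differently, for example at the video level.

```python
from typing import Sequence


def remove_drift(signal: Sequence[float], window: int = 15) -> list[float]:
    """Subtract a centred moving average, which absorbs slow baseline drift."""
    half = window // 2
    detrended = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        baseline = sum(signal[lo:hi]) / (hi - lo)
        detrended.append(signal[i] - baseline)
    return detrended


def contraction_peaks(signal: Sequence[float], threshold: float) -> list[int]:
    """Return frame indices of local maxima that exceed the threshold."""
    return [
        i
        for i in range(1, len(signal) - 1)
        if signal[i] > threshold
        and signal[i] >= signal[i - 1]
        and signal[i] > signal[i + 1]
    ]


def contraction_features(signal: Sequence[float], fps: float, threshold: float) -> dict:
    """Per-recording features: number of contractions and mean frequency in Hz."""
    peaks = contraction_peaks(remove_drift(signal), threshold)
    duration_s = len(signal) / fps
    return {"n_contractions": len(peaks), "frequency_hz": len(peaks) / duration_s}


# Synthetic recording: linear drift plus a contraction every 20 frames.
trace = [0.01 * i + (1.0 if i % 20 == 10 else 0.0) for i in range(100)]
print(contraction_features(trace, fps=10.0, threshold=0.5))
```

Without the detrending step, the rising baseline would push later frames toward the threshold and distort features such as amplitude.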
-
- Challenge: Recent advances in scRNA-seq models make it possible to investigate the transcriptomic profile of individual patients at single-cell resolution. They are therefore promising leads for new approaches to, for example, biomarker generation. However, the scRNA-seq data used for such analyses is prone to biases, and insights derived from it, whether for biomarker development or other applications, will be biased too.
- Approach: We apply the embedded ethics approach to a scRNA-seq model project run by our partners at Helmholtz Munich. By accompanying their research, we gain an understanding of the potential ethical issues of scRNA-seq in this nascent field. Together with our partner, we are creating metrics to control for and mitigate the issues we identify.
- Consultants: Theresa Willem
-
- Challenge: Manual quantification of lung organoids is a major bottleneck for higher-throughput analysis. Automating lung organoid quantification can significantly reduce the time expert annotators spend counting organoids by hand. The resulting approach should be able to count the organoids on a plate and extract properties describing their size and shape. Such systems can then be used for perturbation studies as well as drug screening and testing.
- Approach: We developed a robust deep learning pipeline that detects lung organoids using the Faster R-CNN object detection model. We also developed a plugin for Napari, an image visualisation and analysis tool, which allows users to easily run the algorithm, validate the results and extract useful features. Our model was trained on a dataset of more than 40,000 organoids and outperforms our original approach based on traditional image processing techniques. The plugin was delivered to the collaborators together with a user manual, making it easy for them to start using the new software.
- Consultants: Harshavardhan Subramanian, Christina Bukas, Florian Kofler
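Once a detector such as Faster R-CNN has produced scored bounding boxes, counting organoids and computing simple size and shape descriptors is straightforward. This minimal sketch assumes boxes in `(x_min, y_min, x_max, y_max)` pixel coordinates, a hypothetical score threshold of 0.5 and a known pixel size. The `Detection` type and `summarise_organoids` are illustrative names. The plugin's actual features and defaults may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Detection:
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    score: float  # detector confidence in [0, 1]


def summarise_organoids(
    detections: Sequence[Detection],
    score_threshold: float = 0.5,
    pixel_size_um: float = 1.0,
) -> dict:
    """Count confident detections and report per-organoid size and shape features.

    Assumes every box has positive width and height.
    """
    kept = [d for d in detections if d.score >= score_threshold]
    features = []
    for d in kept:
        x0, y0, x1, y1 = d.box
        width_um = (x1 - x0) * pixel_size_um
        height_um = (y1 - y0) * pixel_size_um
        features.append(
            {
                "area_um2": width_um * height_um,  # bounding-box area, an upper bound
                "aspect_ratio": max(width_um, height_um) / min(width_um, height_um),
            }
        )
    return {"count": len(kept), "organoids": features}


plate = [
    Detection((0, 0, 10, 20), 0.9),
    Detection((5, 5, 15, 15), 0.3),  # low confidence, discarded
    Detection((0, 0, 4, 4), 0.8),
]
print(summarise_organoids(plate, pixel_size_um=2.0))
```

The score threshold trades missed organoids against false positives, which is why a validation step in the viewer remains useful.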
-
- Challenge: Gene coexpression networks and the gene modules derived from them are a popular way to represent gene expression data. They allow data sets from different platforms and labs to be compared and integrated, and they enable downstream functional analyses at various levels of cellular processes to understand molecular disease characteristics. However, correlation-based coexpression analysis can fail, e.g. due to covariate effects or dropouts in single-cell measurements.
- Approach: Applying machine learning, we obtain robust models of gene expression that explicitly address these problems. We then exploit these models to automate the coexpression analysis as much as possible, replacing manual cut-off selection with data-driven decisions.
- Consultants: Elisabeth Georgii
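A permutation null is one generic way to replace a hand-picked correlation cut-off with a data-driven one. Each gene's values are shuffled to break any true co-expression, and the cut-off is set at a high quantile of the resulting null correlations. This sketch assumes plain Pearson correlation and hypothetical function names. It is not the team's model-based method, which additionally accounts for covariates and dropouts.

```python
import random
from itertools import combinations
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def permutation_cutoff(
    expr: dict[str, list[float]], n_perm: int = 200, quantile: float = 0.99, seed: int = 0
) -> float:
    """Cut-off on |r| exceeded by only about (1 - quantile) of shuffled gene pairs."""
    rng = random.Random(seed)
    genes = list(expr)
    null = []
    for _ in range(n_perm):
        a, b = rng.sample(genes, 2)
        shuffled = expr[b][:]
        rng.shuffle(shuffled)  # destroys any real co-expression between a and b
        null.append(abs(pearson(expr[a], shuffled)))
    null.sort()
    return null[min(int(quantile * n_perm), n_perm - 1)]


def coexpression_edges(expr: dict[str, list[float]], cutoff: float) -> list[tuple[str, str]]:
    """Gene pairs whose absolute correlation exceeds the data-driven cut-off."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) > cutoff
    ]
```

Because the cut-off is derived from the data at hand, it adapts automatically to sample size, which a fixed manual threshold does not.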
SELECTED COMPLETED VOUCHER PROJECTS
Project highlights of our previous work.
-
- Challenge: CRISPR interference (CRISPRi), the targeting of a catalytically dead Cas protein to block transcription, is the leading technique for silencing gene expression in bacteria. Genome-scale CRISPRi essentiality screens provide one data source from which rules for guide design can be extracted. However, in these screens, depletion confounds guide efficiency with effects of the targeted gene.
- Approach: Together with our collaborators from the Helmholtz Centre for Infection Research, we showed that depletion can be predicted using machine learning models and a combination of guide and gene features, with expression of the target gene having an outsized influence. Further, integrating data across independent CRISPRi screens improves performance. We developed a mixed-effect random forest regression model that learns from multiple datasets and isolates effects that can be manipulated in guide design, and applied methods from explainable AI to infer interpretable design rules. Our method outperforms the state of the art in predicting depletion in an independent saturating screen targeting purine biosynthesis genes in Escherichia coli, and provides a blueprint for developing predictive models for CRISPR technologies in bacteria.
- This project was published in: Yu, Y., Gawlitt, S., Barros de Andrade e Sousa, L., Merdivan, E., Piraud, M., Beisel, C. L., & Barquist, L. (2024). Improved prediction of bacterial CRISPRi guide efficiency from depletion screens through mixed-effect machine learning and data integration. Genome Biology, 25(1), 13.
- Consultants: Lisa Barros de Andrade e Sousa, Erinc Merdivan
-
- Challenge: Cell migration is central to many physiological and pathological processes such as embryonic development, wound repair and tumor metastasis. The Boyden chamber assay is the most widely accepted technique for characterizing cell motility, which is quantified by counting the cells in microscopic images. Such images typically contain many cells, so manual counting is time-consuming, laborious and error-prone.
- Approach: We provide an automatic cell counting algorithm that counts crystal violet-stained cells in 2D microscopic images, together with a graphical user interface for manually correcting the automatic results. Our solution speeds up the analysis of cell migration experiments by a factor of 10!
- This project was published in: Delbrouck, C., Kiweler, N., Chen, O., Pozdeev, V. I., Haase, L., Neises, L., ..., Shen, R., …, Piraud, M., … & Meiser, J. (2023). Formate promotes invasion and metastasis in reliance on lipid metabolism. Cell Reports, 42(9).
- The tool is available on GitHub here.
- Consultants: Ruolin Shen
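The core of an automatic counter can be as simple as thresholding the stain intensity and counting connected components. This minimal sketch assumes a grayscale image given as nested lists in which stained cells are darker than the background. The threshold and minimum object size are hypothetical parameters. The published tool may use a different method.

```python
from collections import deque
from typing import Sequence


def count_cells(
    image: Sequence[Sequence[float]], threshold: float, min_size: int = 1
) -> int:
    """Count 4-connected dark blobs (intensity < threshold) with at least min_size pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] >= threshold:
                continue
            # Breadth-first flood fill of one candidate cell.
            size, queue = 0, deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and not seen[ny][nx]
                        and image[ny][nx] < threshold
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_size:  # ignore specks of debris
                count += 1
    return count
```

Touching cells merge into a single component in this scheme. This is one reason a graphical interface for manual correction remains valuable.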
-
- Challenge: DNA methylation analysis by sequencing is becoming increasingly popular, yielding methylomes at single-base-pair and single-molecule resolution. Its intrinsic read-level information gives it tremendous potential for analysing cell-type heterogeneity. Although diverse deconvolution methods have been developed to infer cell-type composition from bulk sequencing-based methylomes, no systematic evaluation had been performed yet.
- Approach: Together with our collaboration partners from the German Cancer Research Center, we thoroughly benchmarked six previously published methods (Bayesian epiallele detection, DXM, PRISM, csmFinder+coMethy, ClubCpG and MethylPurify), together with two array-based methods, MeDeCom and Houseman, as a comparison group. We found that array-based methods, both reference-based and reference-free, generally outperformed sequencing-based methods, despite the absence of read-level information. This implies that current sequencing-based methods still struggle to correctly identify cell-type-specific signals and to eliminate confounding methylation patterns, which future studies need to address.
- This project was published in: Jeong, Y., Barros de Andrade e Sousa, L., Thalmeier, D., Toth, R., Ganslmeier, M., Breuer, K., ... & Lutsik, P. (2022). Systematic evaluation of cell-type deconvolution pipelines for sequencing-based bulk DNA methylomes. Briefings in Bioinformatics, 23(4), bbac248.
- Consultants: Lisa Barros de Andrade e Sousa, Dominik Thalmeier
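Reference-based methods such as Houseman's model a bulk methylation profile as a mixture of cell-type reference profiles and solve for the mixing proportions. This minimal sketch illustrates that constrained-regression idea. It uses projected gradient descent for non-negative least squares and then normalizes the weights to proportions. The solver, step size and iteration count are illustrative choices, not those of any benchmarked tool.

```python
from typing import Sequence


def deconvolve(
    bulk: Sequence[float],
    references: Sequence[Sequence[float]],
    steps: int = 5000,
    lr: float = 0.1,
) -> list[float]:
    """Estimate proportions w >= 0, sum(w) = 1, such that bulk ~ sum_k w_k * ref_k.

    bulk[j] is the methylation level (0..1) at CpG j; references[k][j] is the
    level of cell type k at CpG j.
    """
    k, m = len(references), len(bulk)
    w = [1.0 / k] * k
    for _ in range(steps):
        pred = [sum(w[t] * references[t][j] for t in range(k)) for j in range(m)]
        resid = [pred[j] - bulk[j] for j in range(m)]
        # Gradient of the mean squared error with respect to each proportion.
        grad = [2.0 * sum(resid[j] * references[t][j] for j in range(m)) / m for t in range(k)]
        # Gradient step, then projection onto the non-negative orthant.
        w = [max(0.0, w[t] - lr * grad[t]) for t in range(k)]
    total = sum(w)
    return [x / total for x in w] if total > 0 else [1.0 / k] * k


refs = [[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.3, 0.7]]  # two hypothetical cell types
bulk = [0.7 * a + 0.3 * b for a, b in zip(*refs)]
print(deconvolve(bulk, refs))
```

Sequencing-based methods differ mainly in also exploiting read-level co-methylation. This array-style formulation uses only average levels per CpG.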
-
- Challenge: While genetically encoded reporters are common in fluorescence microscopy, equivalent multiplexable gene reporters for electron microscopy (EM) are still scarce. By installing a variable number of fixation-stable metal-interacting moieties in the lumen of encapsulin nanocompartments of different sizes, our collaborators developed a suite of spherically symmetric, concentric barcodes (EMcapsulins) that can be read with standard EM techniques. After imaging six different classes of EMcapsulins in Drosophila and mouse brains using 2D EM, they contacted us to segment and quantify them, because their existing two-stage segmentation and classification pipeline failed to correctly segment and classify many EMcapsulins.
- Approach: We implemented an end-to-end hierarchical multi-class segmentation U-Net that segments and classifies EMcapsulins in a single step. The U-Net significantly outperformed the existing two-step segmentation and classification pipeline, possibly because it can also encode contextual information. We also implemented instance-wise segmentation quality metrics. This enabled segmentation and quantification of the EMcapsulins, so that the collaborators could demonstrate their usefulness both qualitatively and quantitatively.
- This project was published in: Sigmund, F., Berezin, O., Beliakova, S., Magerl, B., Drawitsch, M., Piovesan, A., …, Kofler, F., Piraud, M., ... & Westmeyer, G. G. (2023). Genetically encoded barcodes for correlative volume electron microscopy. Nature Biotechnology, 1-12.
- The segmentation model is available on GitHub here.
- Consultants: Florian Kofler
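Instance-wise metrics evaluate each EMcapsulin separately rather than pooling all foreground pixels. The following minimal sketch shows a common recipe. Instances are given as sets of pixel coordinates, matched one-to-one greedily by intersection over union (IoU), and a pair counts as a hit when IoU reaches a hypothetical threshold of 0.5. The metrics actually implemented in the project may differ.

```python
from typing import Sequence

Pixel = tuple[int, int]


def iou(a: set[Pixel], b: set[Pixel]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def instance_f1(
    predicted: Sequence[set[Pixel]],
    reference: Sequence[set[Pixel]],
    iou_threshold: float = 0.5,
) -> dict:
    """Greedy one-to-one matching of instances by IoU; returns detection-level scores."""
    pairs = sorted(
        ((iou(p, r), i, j) for i, p in enumerate(predicted) for j, r in enumerate(reference)),
        reverse=True,
    )
    used_pred, used_ref, tp = set(), set(), 0
    for score, i, j in pairs:
        if score < iou_threshold:
            break  # all remaining pairs overlap too little
        if i in used_pred or j in used_ref:
            continue
        used_pred.add(i)
        used_ref.add(j)
        tp += 1
    fp, fn = len(predicted) - tp, len(reference) - tp
    denom = 2 * tp + fp + fn
    return {"tp": tp, "fp": fp, "fn": fn, "f1": 2 * tp / denom if denom else 1.0}
```

Unlike a pixel-wise Dice score, this penalizes a missed small capsule just as much as a missed large one.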
-
- Challenge: Databases of protein-ligand complex structures play an essential role in machine learning research supporting drug discovery. However, the available ligand structures are not quantum chemically refined, resulting in ligands with inaccurate 3D structures, protonation states and charges. A database of ligands with accurate 3D structures increases the data quality for 3D deep learning models in drug discovery.
- Approach: We integrated three quantum chemically refined databases of protein ligands and their physicochemical properties, generated by our partners at Helmholtz Munich. To promote the enhanced database in the AI community, we created graph neural network benchmarks for graph-level prediction of ligand properties and node-level prediction of protein adaptability (how much a protein changes its shape when the ligand binds).
- This project was published in: Siebenmorgen, T., Menezes, F., Benassou, S., Merdivan, E., Kesselheim, S., Piraud, M., ... & Popowicz, G. M. (2023). MISATO-Machine learning dataset of protein-ligand complexes for structure-based drug discovery. bioRxiv, 2023-05.
- Consultants: Erinc Merdivan
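Node-level prediction with graph neural networks rests on message passing, in which each node updates its features from those of its neighbours. This minimal, dependency-free sketch shows one layer with mean aggregation. It assumes a toy graph given as an adjacency list and hand-picked weights. Real models learn these weights from data and typically rely on dedicated libraries.

```python
import math
from typing import Sequence


def message_passing_layer(
    features: Sequence[Sequence[float]],
    neighbours: Sequence[Sequence[int]],
    w_self: Sequence[Sequence[float]],
    w_neigh: Sequence[Sequence[float]],
) -> list[list[float]]:
    """h_i' = tanh(W_self h_i + W_neigh * mean_{j in N(i)} h_j) for every node i."""

    def matvec(w: Sequence[Sequence[float]], v: Sequence[float]) -> list[float]:
        return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

    dim = len(features[0])
    updated = []
    for i, h in enumerate(features):
        nbrs = neighbours[i]
        # Mean of neighbour features (zero vector for isolated nodes).
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) if nbrs else 0.0 for d in range(dim)]
        combined = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        updated.append([math.tanh(x) for x in combined])
    return updated


# Toy three-node chain 0-1-2 with one feature per node.
print(message_passing_layer([[1.0], [0.0], [-1.0]], [[1], [0, 2], [1]], [[1.0]], [[1.0]]))
```

Stacking several such layers followed by a per-node linear head yields a node-level regressor. Pooling the node features instead yields a graph-level one.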
-
- Challenge: Spinocerebellar ataxias (SCAs) are rare, autosomal dominantly inherited neurological diseases with onset in adulthood. The most common SCAs, SCA1, 2, 3 and 6, together account for more than half of all affected families worldwide. Clinical hallmarks are progressive loss of balance and coordination, accompanied by slurred speech. Patients with SCA suffer substantial restrictions in mobility and communication skills. Predicting disease progression from genetic features, demographic information and the current status of neurological symptoms paves the way toward potential stratification markers and is important for identifying optimal time windows for starting preventive treatments.
- Approach: Using a multi-cohort data set with clinical time courses of different established neurological scales, comprising a total of 39 individual items, we trained predictive models using regularized Cox regression and survival forests. For each of the most common SCAs, we extracted relevant features and characterized its progression with respect to the multitude of neurological symptoms. The loss of the ability to walk freely is a transition of high clinical impact and was analyzed in detail to support future monitoring and decision making.
- This project was published in: Georgii, E., Klockgether, T., Jacobi, H., Schmitz-Hübsch, T., ..., Piraud, M., & Faber, J. (2024). Modeling disease progression in spinocerebellar ataxias. medRxiv, 2024-05.
- Consultants: Elisabeth Georgii
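Regularized Cox regression fits coefficients by maximizing the Cox partial likelihood minus a penalty on the coefficients. This minimal sketch computes that objective with an L2 (ridge) penalty. It assumes no tied event times, so tie corrections are omitted, and the penalty weight `lam` is hypothetical. In practice, an established survival analysis library would be used rather than this function.

```python
import math
from typing import Sequence


def penalized_cox_loglik(
    beta: Sequence[float],
    X: Sequence[Sequence[float]],
    time: Sequence[float],
    event: Sequence[bool],
    lam: float = 0.1,
) -> float:
    """Cox log partial likelihood minus (lam / 2) * ||beta||^2, assuming no tied times.

    event[i] is True if the milestone (e.g. loss of free walking) was observed,
    False if patient i was censored at time[i].
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    loglik = 0.0
    for i in range(len(X)):
        if not event[i]:
            continue  # censored patients only contribute through risk sets
        risk_set = [j for j in range(len(X)) if time[j] >= time[i]]
        loglik += eta[i] - math.log(sum(math.exp(eta[j]) for j in risk_set))
    return loglik - 0.5 * lam * sum(b * b for b in beta)
```

Maximizing this objective over `beta` (for example, by gradient ascent) gives the fitted model. The penalty shrinks coefficients, which helps when many symptom items are correlated.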
Software and resources
Created for the community and beyond.
-
Oligonucleotides (abbrev. oligos) are short, synthetic strands of DNA or RNA with many applications, ranging from research to disease diagnosis and therapeutics, and they need to be designed individually for the intended application and experimental design. We developed the Oligo Designer Toolsuite, a collection of modules that provide all the basic functionality for custom oligo design pipelines within a flexible Python framework. All modules have a standardized I/O format and can be combined as needed for the required processing steps, such as generating custom-length oligo sequences, filtering oligo sequences based on thermodynamic properties and selecting an optimal set of oligos.
-
This tool is available on GitHub here.
-
The oligo design pipelines implemented for SCRINSHOT, MERFISH and seqFISH probes were published alongside the following paper: Kuemmerle, L. B., Luecken, M. D., Firsova, A. B., Barros de Andrade e Sousa, L., Straßer, L., Heumos, L., Mekki, I., …, Piraud, M., ... & Theis, F. J. (2022). Probe set selection for targeted spatial transcriptomics. bioRxiv, 2022-08.
-
Consultants: Lisa Barros de Andrade e Sousa, Isra Mekki, Francesco Campi
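The generate-then-filter pattern described above can be illustrated in a few lines. This minimal sketch slides a window over a target sequence and keeps candidates within GC-content bounds and a melting-temperature range. It uses the Wallace rule (Tm = 2·(A+T) + 4·(G+C) °C) as a rough thermodynamic proxy that is only meaningful for short oligos. The function names and thresholds are hypothetical and are not part of the Oligo Designer Toolsuite API.

```python
from typing import Iterator


def sliding_oligos(sequence: str, length: int) -> Iterator[tuple[int, str]]:
    """Yield (start, oligo) for every window of the given length."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start : start + length]


def gc_content(oligo: str) -> float:
    return sum(base in "GC" for base in oligo) / len(oligo)


def wallace_tm(oligo: str) -> float:
    """Wallace rule: Tm = 2*(A+T) + 4*(G+C) degrees Celsius (rough, short oligos only)."""
    at = sum(base in "AT" for base in oligo)
    gc = sum(base in "GC" for base in oligo)
    return 2.0 * at + 4.0 * gc


def design_oligos(
    sequence: str,
    length: int,
    gc_range: tuple[float, float] = (0.4, 0.6),
    tm_range: tuple[float, float] = (30.0, 40.0),
) -> list[tuple[int, str]]:
    """Generate all candidate oligos, then keep those passing both filters."""
    seq = sequence.upper()
    return [
        (start, oligo)
        for start, oligo in sliding_oligos(seq, length)
        if gc_range[0] <= gc_content(oligo) <= gc_range[1]
        and tm_range[0] <= wallace_tm(oligo) <= tm_range[1]
    ]


print(design_oligos("aaaaaagcgcgcat", 6, tm_range=(16.0, 20.0)))
```

Keeping generation and each filter as separate functions mirrors the modular idea described above: steps can be swapped or reordered without touching the others.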
-
-
High-quality labelled datasets are critical for advances in biology and medicine. However, labelling is often labour-intensive and time-consuming, so our collaborators often have very little annotated data. By developing a data-centric platform for microscopy images, we aim to minimise the time experts need to fully annotate their data and to take ourselves out of the train-correct-train loop. We do this by first applying baseline models (such as Cellpose), which can already give meaningful output. Users then check, confirm or fix the model-generated annotations, which we use to fine-tune the models. The interface for this is kept as simple as possible, is tailored to each collaborator and requires no prior knowledge of machine learning or deep learning.
-
This tool is available on GitHub here.
-
Consultants: Christina Bukas, Helena Pelin
-
-
Standard explainability methods for Random Forest (RF) models, such as permutation feature importance, are commonly used to pinpoint the individual contribution of features to model performance, but they often miss the role of correlated features or feature interactions in the model’s decision-making process. The Forest-Guided Clustering algorithm computes feature importance based on subgroups of instances that follow similar decision paths within the RF model, thus focusing on pattern-driven rather than performance-driven importance. In this way, our method avoids misleading interpretations of correlated features, allows feature interactions to be detected and gives a sense of the generalizability of the identified patterns.
-
This tool is available on GitHub here.
-
Consultants: Lisa Barros de Andrade e Sousa, Dominik Thalmeier, Helena Pelin
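The random forest proximity measure is one standard way to make "instances that follow similar decision paths" concrete. Two instances are close if they end up in the same leaf in many trees. This minimal sketch assumes that the leaf index of each instance in each tree has already been extracted, for example with scikit-learn's `apply`. It computes proximities and distances and assigns instances to given medoids. Choosing the medoids and scoring features per cluster are further steps of the method that are not reproduced here.

```python
from typing import Sequence


def proximity_matrix(leaves: Sequence[Sequence[int]]) -> list[list[float]]:
    """leaves[i][t] is the leaf reached by instance i in tree t.

    Proximity(i, j) is the fraction of trees in which i and j share a leaf.
    """
    n, n_trees = len(leaves), len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(leaves[i][t] == leaves[j][t] for t in range(n_trees))
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def distance_matrix(prox: Sequence[Sequence[float]]) -> list[list[float]]:
    return [[1.0 - p for p in row] for row in prox]


def assign_to_medoids(dist: Sequence[Sequence[float]], medoids: Sequence[int]) -> list[int]:
    """Assign each instance to its closest medoid (as in one k-medoids step)."""
    return [min(medoids, key=lambda m: dist[i][m]) for i in range(len(dist))]


# Three instances passed through a hypothetical forest of three trees.
leaves = [[1, 4, 7], [1, 4, 8], [2, 5, 8]]
print(assign_to_medoids(distance_matrix(proximity_matrix(leaves)), [0, 2]))
```

Because proximity depends on shared decision paths rather than raw feature values, correlated features that the forest uses interchangeably do not distort the grouping.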
-
-
In machine learning (ML) projects, researchers often focus mainly on the ML code. As a project grows, many challenges arise, such as difficulties in communication, collaboration and experiment tracking. This can lead to a lack of reproducibility and to difficulties in deploying the pipelines. With Quicksetup-ai, we propose a flexible template for quickly setting up deep learning projects in research. The template combines established and widely used tools and libraries to provide a clean, simple and reusable baseline with a wide range of features.
-
This template is available on GitHub here.
-
Consultants: Isra Mekki, Gerome Vivar
-
-
PySDDR combines the interpretability of a statistical model with the predictive power of deep neural networks in an easy-to-use Python package. It is the Python implementation of the Semi-Structured Deep Distributional Regression (SDDR) framework, which enhances Generalized Additive Models (GAMs) with neural networks. This allows GAMs to model high-dimensional nonlinear patterns in the data and to handle multimodal data (e.g. a combination of tabular and image data). The framework is written in PyTorch and accepts any number of neural networks of any type (FC, CNN, LSTM, ...).
-
This tool is available on GitHub here.
-
This project was published in: Rügamer, D., Kolb, C., Fritz, C., Pfisterer, F., Kopper, P., Bischl, B., Shen, R., Bukas, C., Barros de Andrade e Sousa, L., Thalmeier, D., ..., Klein, N., & Müller, C. L. (2023). deepregression: A Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression. Journal of Statistical Software, 105(2), 1–31.
-
Consultants: Christina Bukas, Lisa Barros de Andrade e Sousa, Dominik Thalmeier, Ruolin Shen
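The central idea of SDDR is that each parameter of the response distribution gets its own additive predictor, in which interpretable structured terms and neural network outputs are summed. This minimal sketch uses a Gaussian response, a linear structured term and a log link for the standard deviation. A hypothetical stand-in function takes the place of a trained network. It only illustrates the model structure and is not PySDDR's API. The framework itself also addresses identifiability between the structured and network parts.

```python
import math
from typing import Callable, Sequence


def gaussian_sddr_nll(
    x_struct: Sequence[float],
    x_unstruct: Sequence[float],
    y: float,
    beta_mu: Sequence[float],
    beta_sigma: Sequence[float],
    network: Callable[[Sequence[float]], float],
) -> float:
    """Negative log-likelihood of y under N(mu, sigma^2) with additive predictors.

    mu    = x_struct . beta_mu + network(x_unstruct)   (structured + deep part)
    sigma = exp(x_struct . beta_sigma)                 (log link keeps sigma > 0)
    """
    mu = sum(b * x for b, x in zip(beta_mu, x_struct)) + network(x_unstruct)
    sigma = math.exp(sum(b * x for b, x in zip(beta_sigma, x_struct)))
    return math.log(sigma) + 0.5 * math.log(2 * math.pi) + 0.5 * ((y - mu) / sigma) ** 2


def toy_network(u: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained network on unstructured inputs (e.g. images)."""
    return 0.5 * sum(u)


print(gaussian_sddr_nll([1.0, 2.0], [0.4, 0.6], 1.5, [0.5, 0.25], [0.0, 0.0], toy_network))
```

Training would minimize this loss jointly over the structured coefficients and the network weights. The coefficients in `beta_mu` remain directly interpretable, as in a GAM.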
-
Teaching offers
-
In this course, we discuss why explainability is important and introduce several model-agnostic and model-specific methods for tabular data, images and 1D data such as text or signals. This course was held at the following events: Summer Academy 2022, ml4earth 2022, MALTAomics Summer School 2023, Summer Academy 2023, Forschungszentrum Jülich Training 2023.
-
Consultants: Lisa Barros de Andrade e Sousa, Christina Bukas, Francesco Campi, Donatella Cea, Elisabeth Georgii, Isra Mekki, Helena Pelin, Theresa Willem
-
In this workshop, we share tips on prompt engineering to obtain better outputs from ChatGPT and raise awareness of potential pitfalls and biases. The workshop is split into a writing part, where we focus on using the tool to write emails and papers and to brainstorm ideas, and a coding part, where we use it to produce new code and to review, document and test existing code (in Python). This course was held multiple times at Helmholtz Munich, at the Helmholtz Centre for Infection Research and at the Max Delbrück Center.
-
Consultants: Donatella Cea, Erinc Merdivan, Isra Mekki
-
-
In this challenge, we encourage participants to experiment with algorithmic solutions for synthesizing healthy 3D brain tissue in the area affected by glioma. We cast the problem of synthesizing healthy tissue as inpainting within the tumor area. This challenge was held at the following event: MICCAI 2023.
-
Consultants: Florian Kofler
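Only the tumor region is inpainted, so a natural way to score a submission is to compare it with the reference only inside that mask. This minimal sketch computes a masked mean squared error and PSNR on flattened voxel lists, assuming intensities normalized to [0, 1]. The challenge's official metrics and implementation may differ.

```python
import math
from typing import Sequence


def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean squared error over voxels inside the inpainting mask only."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not errors:
        raise ValueError("mask selects no voxels")
    return sum(errors) / len(errors)


def masked_psnr(
    pred: Sequence[float],
    target: Sequence[float],
    mask: Sequence[bool],
    data_range: float = 1.0,
) -> float:
    """Peak signal-to-noise ratio (dB) restricted to the mask."""
    mse = masked_mse(pred, target, mask)
    return math.inf if mse == 0 else 10.0 * math.log10(data_range**2 / mse)


# The third voxel lies outside the mask, so its large error is ignored.
print(masked_psnr([0.5, 0.0, 1.0], [0.5, 0.5, 0.0], [True, True, False]))
```

Restricting the metric to the mask prevents the untouched healthy tissue, which is identical in prediction and reference, from inflating the score.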
FEATURED PUBLICATIONS
-
- Bukas, C., Jian, B., Rodríguez Venegas, L. F., De Benetti, F., Rühling, S., Sekuboyina, A., ..., Piraud, M., … & Wendler, T. (2021). Patient-specific virtual spine straightening and vertebra inpainting: an automatic framework for osteoplasty planning. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part IV 24 (pp. 529-539). Springer International Publishing.
- Bhatia, H. S., Brunner, A. D., Öztürk, F., Kapoor, S., Rong, Z., Mai, H., ..., Kofler, F., ... & Ertürk, A. (2022). Spatial proteomics in three-dimensional intact specimens. Cell, 185(26), 5040-5058.
- Thalmeier, D., Miller, G., Schneltzer, E., Hurt, A., Hrabě de Angelis, M., Becker, L., ... & Maier, H. (2022). Objective hearing threshold identification from auditory brainstem response measurements using supervised and self-supervised approaches. BMC Neuroscience, 23(1), 1-24.
- Kuemmerle, L. B., Luecken, M. D., Firsova, A. B., Barros de Andrade e Sousa, L., Straßer, L., Heumos, L., Mekki, I., …, Piraud, M., ... & Theis, F. J. (2022). Probe set selection for targeted spatial transcriptomics. bioRxiv, 2022-08.
- Kofler, F., Ezhov, I., Fidon, L., Horvath, I., de la Rosa, E., LaMaster, J., …, Piraud, M., ... & Menze, B. (2022). Deep Quality Estimation: Creating Surrogate Models for Human Quality Ratings. arXiv preprint arXiv:2205.10355.
- Buchner, J. A., Kofler, F., Etzel, L., Mayinger, M., Christ, S. M., Brunner, T. B., ... & Peeken, J. C. (2023). Development and external validation of an MRI-based neural network for brain metastasis segmentation in the AURORA multicenter study. Radiotherapy and Oncology, 178, 109425.
- Buchner, J. A., Peeken, J. C., Etzel, L., Ezhov, I., Mayinger, M., Christ, S. M., …, Piraud, M., ... & Kofler, F. (2023). Identifying core MRI sequences for reliable automatic brain metastasis segmentation. Radiotherapy and Oncology, 188, 109901.
- Mai, H., Luo, J., Hoeher, L., Al-Maskari, R., Horvath, I., Chen, Y., …, Kofler, F., Piraud, M., ... & Ertürk, A. (2023). Whole-body cellular mapping in mouse using standard IgG antibodies. Nature Biotechnology, 1-11.
- Sigmund, F., Berezin, O., Beliakova, S., Magerl, B., Drawitsch, M., Piovesan, A., ... , Kofler, F., Piraud, M., … & Westmeyer, G. (2023). Genetically encoded barcodes for correlative volume electron microscopy. Nature Biotechnology, 1-12.
- Bukas, C., Galter, I., da Silva-Buttkus, P., Fuchs, H., Maier, H., Gailus-Durner, V., ... Piraud, M. & Spielmann, N. (2023). Echo2Pheno: a deep-learning application to uncover echocardiographic phenotypes in conscious mice. Mammalian Genome, 1-16.
- Rügamer, D., Kolb, C., Fritz, C., Pfisterer, F., Kopper, P., Bischl, B., Shen, R., Bukas, C., Barros de Andrade e Sousa, L., Thalmeier, D., ... & Müller, C. L. (2023). Deepregression: a flexible neural network framework for semi-structured deep distributional regression. Journal of Statistical Software, 105(2), 1–31.
- Kofler, F., Wahle, J., Ezhov, I., Wagner, S. J., Al-Maskari, R., Gryska, E., ... & Piraud, M. (2023). Approaching peak ground truth. In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI) (pp. 1-6). IEEE.
- Kofler, F., Shit, S., Ezhov, I., Fidon, L., Horvath, I., Al-Maskari, R., ..., Piraud, M., … & Menze, B. (2023). blob loss: instance imbalance aware loss functions for semantic segmentation. International Conference on Information Processing in Medical Imaging.
- Delbrouck, C., Kiweler, N., Chen, O., Pozdeev, V. I., Haase, L., Neises, L., ..., Shen, R., …, Piraud, M., … & Meiser, J. (2023). Formate promotes invasion and metastasis in reliance on lipid metabolism. Cell Reports, 42(9).
- Jeong, Y., Barros de Andrade e Sousa, L., Thalmeier, D., Toth, R., Ganslmeier, M., Breuer, K., ... & Lutsik, P. (2022). Systematic evaluation of cell-type deconvolution pipelines for sequencing-based bulk DNA methylomes. Briefings in Bioinformatics, 23(4), bbac248.
- Yu, Y., Gawlitt, S., Barros de Andrade e Sousa, L., Merdivan, E., Piraud, M., Beisel, C. L., & Barquist, L. (2024). Improved prediction of bacterial CRISPRi guide efficiency from depletion screens through mixed-effect machine learning and data integration. Genome Biology, 25(1), 13.
- Cerda-Jara, C. A., Kim, S. J., Thomas, G., Farsi, Z., Zolotarov, G., Dube, G., Deter, A., Bahry, E., Georgii, E., ... & Rajewsky, N. (2024). miR-7 controls glutamatergic transmission and neuronal connectivity in a Cdr1as-dependent manner. EMBO reports, 1-32.
- Bukas, C., Albrecht, F., Ur-Rehman, M. S., Popek, D., Patalan, M., Pawłowski, J., Wecker, B., Landsch, K., Golan, T., Kowalczyk, T., Piraud, M., & Ende, S. S. W. (2024). Robust deep learning based shrimp counting in an industrial farm setting. Journal of Cleaner Production, 468, 143024.
- Oestreich, M., Merdivan, E., Lee, M., Schultze, J. L., Piraud, M., & Becker, M. (2024). DrugDiff-small molecule diffusion model with flexible guidance towards molecular properties. bioRxiv, 2024-07.
- van der Weg, K., Merdivan, E., Piraud, M., & Gohlke, H. (2024). TopEC: Improved classification of enzyme function by a localized 3D protein descriptor and 3D Graph Neural Networks. bioRxiv, 2024-01.
- Siebenmorgen, T., Menezes, F., Benassou, S., Merdivan, E., Didi, K., Mourão, A. S. D., ..., Piraud, M., ... & Popowicz, G. M. (2024). MISATO: machine learning dataset of protein–ligand complexes for structure-based drug discovery. Nature Computational Science, 4, 367–378.