Darko Štern

Graz, AT · darko.stern@avl.at

Experience in image and signal analysis with a strong focus on machine learning. After a career shift, my research and professional interests now focus on machine learning methods for testing and validation of ADAS/AD systems, as well as batteries and fuel cells. My previous research centered on the design and development of algorithms for processing and analysis of three-dimensional (3D) computed tomography (CT) and magnetic resonance (MR) images.

At AVL, we are constantly searching for motivated students interested in doing their master's theses on the topic of ADAS/AD, battery, and fuel cell testing. Please visit our web page for more information or check the Projects section below.

Experience

Technology Scout for Cognitive Testing

September 2021 - Present
January 2020 - September 2021
May 2012 - May 2013

Projects

I am constantly looking for students with a research interest in machine learning and image and signal processing in the domains of ADAS/AD and battery and fuel cell testing. This page lists specific open student projects at the master's or bachelor's level. Please also check the AVL web page.

Evaluation of AI/ML methods for cognitive testing of AD stack

Testing an AD stack in a virtual environment requires a cognitive testing methodology that goes beyond the full factorial variation of the parameters of all possible scenarios. The task of the student is to compare the performance and identify the limitations of the Cognitive Testing methods being developed at AVL against testing methods available in the literature, using a publicly available AD stack (e.g. Autoware or Apollo).


    List of all Projects

    Publications

    A list of my publications can also be found on Google Scholar and ResearchGate. If you have any problems accessing our publications, feel free to contact me.

    A Framework for the generation of digital twins of cardiac electrophysiology from clinical 12-leads ECGs

    Karli Gillette, Matthias A.F. Gsell, Anton J. Prassl, Elias Karabelas, Ursula Reiter, Gert Reiter, Thomas Grandits, Christian Payer, Darko Štern, Martin Urschler, Jason D. Bayer, Christoph M. Augustin, Aurel Neic, Thomas Pock, Edward J. Vigmond, Gernot Plank
    Medical Image Analysis (2021)

    Cardiac digital twins (CDTs) of human electrophysiology (EP) are digital replicas of patient hearts derived from clinical data that match like-for-like all available clinical observations. Due to their inherent predictive potential, CDTs show high promise as a complementary modality aiding in clinical decision making and also in the cost-effective, safe and ethical testing of novel EP device therapies. However, current workflows for both the anatomical and functional twinning phases within CDT generation, referring to the inference of model anatomy and parameters from clinical data, are not sufficiently efficient, robust and accurate for advanced clinical and industrial applications. Our study addresses three primary limitations impeding the routine generation of high-fidelity CDTs by introducing: a comprehensive parameter vector encapsulating all factors relating to the ventricular EP; an abstract reference frame within the model allowing the unattended manipulation of model parameter fields; a novel fast-forward electrocardiogram (ECG) model for efficient and biophysically-detailed simulation required for parameter inference. A novel workflow for the generation of CDTs is then introduced as an initial proof of concept. Anatomical twinning was performed within a reasonable time compatible with clinical workflows (< 4 h) for 12 subjects from clinically-attained magnetic resonance images. After assessment of the underlying fast forward ECG model against a gold standard bidomain ECG model, functional twinning of optimal parameters according to a clinically-attained 12 lead ECG was then performed using a forward Saltelli sampling approach for a single subject. The achieved results in terms of efficiency and fidelity demonstrate that our workflow is well-suited and viable for generating biophysically-detailed CDTs at scale.

    Inferring the 3D standing spine posture from 2D radiographs

    Amirhossein Bayat, Anjany Sekuboyina, Johannes C Paetzold, Christian Payer, Darko Štern, Martin Urschler, Jan S Kirschke, Bjoern H Menze
    International Conference on Medical Image Computing and Computer-Assisted Intervention - MICCAI (2020)

    The treatment of degenerative spinal disorders requires an understanding of the individual spinal anatomy and curvature in 3D. An upright spinal pose (i.e. standing) under natural weight bearing is crucial for such bio-mechanical analysis. 3D volumetric imaging modalities (e.g. CT and MRI) are performed in patients lying down. On the other hand, radiographs are captured in an upright pose, but result in 2D projections. This work aims to integrate the two realms, i.e. it combines the upright spinal curvature from radiographs with the 3D vertebral shape from CT imaging for synthesizing an upright 3D model of spine, loaded naturally. Specifically, we propose a novel neural network architecture working vertebra-wise, termed TransVert, which takes orthogonal 2D radiographs and infers the spine’s 3D posture. We validate our architecture on digitally reconstructed radiographs, achieving a 3D reconstruction Dice of 95.52%, indicating an almost perfect 2D-to-3D domain translation. Deploying our model on clinical radiographs, we successfully synthesise full-3D, upright, patient-specific spine models for the first time.
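
For readers unfamiliar with the Dice score reported above, the following is a minimal Python sketch of how such an overlap measure is commonly computed for two binary masks. The function name `dice_coefficient` and the toy masks are illustrative assumptions and are not taken from the paper's code.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Sum of the foreground sizes of both masks.
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


# Toy example (hypothetical data): 2 shared voxels, 3 foreground voxels each.
pred_mask = [1, 1, 1, 0, 0, 0]
target_mask = [0, 1, 1, 1, 0, 0]
dice_value = dice_coefficient(pred_mask, target_mask)
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they do not overlap at all.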

    Uncertainty Estimation in Landmark Localization Based on Gaussian Heatmaps

    Christian Payer, Darko Štern, Horst Bischof, Martin Urschler
    Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and Graphs in Biomedical Image Analysis - UNSURE (2020)

    In landmark localization, due to ambiguities in defining their exact position, landmark annotations may suffer from both large inter- and intra-observer variabilities, which result in uncertain annotations. Therefore, predicting a single coordinate for a landmark is not sufficient for modeling the distribution of possible landmark locations. We propose to learn the Gaussian covariances of target heatmaps, such that covariances for pointed heatmaps correspond to more certain landmarks and covariances for flat heatmaps to more uncertain or ambiguous landmarks. By fitting Gaussian functions to the predicted heatmaps, our method is able to obtain landmark location distributions, which model location uncertainties. We show on a dataset of left hand radiographs and on a dataset of lateral cephalograms that the predicted uncertainties correlate with the landmark error, as well as inter-observer variabilities.
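
To give a rough feel for how a Gaussian can be fitted to a heatmap, the sketch below estimates the mean and covariance from intensity-weighted moments. Moment matching is an assumption chosen for brevity and may differ from the fitting procedure used in the paper. The helper names `fit_gaussian_moments` and `gaussian_heatmap` are hypothetical.

```python
import math


def gaussian_heatmap(size, center, sigma):
    """Synthetic isotropic Gaussian heatmap of shape size x size (rows are y)."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def fit_gaussian_moments(heatmap):
    """Return ((mean_x, mean_y), 2x2 covariance) from intensity-weighted moments."""
    total = sum(sum(row) for row in heatmap)
    if total <= 0:
        raise ValueError("heatmap must contain positive mass")
    mean_x = sum(v * x for row in heatmap for x, v in enumerate(row)) / total
    mean_y = sum(v * y for y, row in enumerate(heatmap) for v in row) / total
    cxx = cyy = cxy = 0.0
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            dx, dy = x - mean_x, y - mean_y
            cxx += v * dx * dx
            cyy += v * dy * dy
            cxy += v * dx * dy
    cov = ((cxx / total, cxy / total), (cxy / total, cyy / total))
    return (mean_x, mean_y), cov


# A pointed heatmap (small sigma) yields a smaller covariance than a flat one.
pointed_mean, pointed_cov = fit_gaussian_moments(gaussian_heatmap(41, (20, 20), 2.0))
flat_mean, flat_cov = fit_gaussian_moments(gaussian_heatmap(41, (20, 20), 5.0))
```

In this toy setting the larger fitted covariance of the flat heatmap plays the role of the higher location uncertainty described in the abstract.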

    Coarse to Fine Vertebrae Localization and Segmentation with SpatialConfiguration-Net and U-Net

    Christian Payer, Darko Štern, Horst Bischof, Martin Urschler
    International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - VISIGRAPP (2020)

    Localization and segmentation of vertebral bodies from spine CT volumes are crucial for pathological diagnosis, surgical planning, and postoperative assessment. However, fully automatic analysis of spine CT volumes is difficult due to the anatomical variation of pathologies, noise caused by screws and implants, and the large range of different field-of-views. We propose a fully automatic coarse to fine approach for vertebrae localization and segmentation based on fully convolutional CNNs. In a three-step approach, at first, a U-Net localizes the rough position of the spine. Then, the SpatialConfiguration-Net performs vertebrae localization and identification using heatmap regression. Finally, a U-Net performs binary segmentation of each identified vertebra at high resolution, before merging the individual predictions into the resulting multi-label vertebrae segmentation. The evaluation shows top performance of our approach, ranking first place and winning the MICCAI 2019 Large Scale Vertebrae Segmentation Challenge (VerSe 2019).

    Variational Inference and Bayesian CNNs for Uncertainty Estimation in Multi-Factorial Bone Age Prediction

    Stefan Eggenreich, Christian Payer, Martin Urschler, Darko Štern
    Medical Imaging Meets NeurIPS - Med-NeurIPS (2019)

    In addition to its extensive use in clinical medicine, biological age (BA) is used in legal medicine to assess unknown chronological age (CA) in applications where identification documents are not available. Automatic methods for age estimation proposed in the literature predict point estimates, which can be misleading without the quantification of predictive uncertainty. In our multi-factorial age estimation method from MRI data, we used the Variational Inference approach to estimate the uncertainty of a Bayesian CNN model. Distinguishing model uncertainty from data uncertainty, we interpreted data uncertainty as biological variation, i.e. the range of possible CA of subjects having the same BA.
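
The distinction between model and data uncertainty can be illustrated with a small Python sketch. It assumes a commonly used decomposition in which several stochastic forward passes each return a predicted age and a predicted variance. Whether the paper uses exactly this decomposition is not stated in the abstract, and the function name and sample values are hypothetical.

```python
from statistics import mean, pvariance


def decompose_uncertainty(samples):
    """samples: list of (predicted_age, predicted_variance), one per stochastic pass.

    Returns (model_uncertainty, data_uncertainty) as variances in years squared.
    """
    ages = [age for age, _ in samples]
    variances = [var for _, var in samples]
    # Spread of the predicted ages across passes: uncertainty of the model itself.
    model_uncertainty = pvariance(ages)
    # Average predicted variance: noise inherent in the data.
    data_uncertainty = mean(variances)
    return model_uncertainty, data_uncertainty


# Three hypothetical forward passes for one subject.
passes = [(17.0, 1.0), (18.0, 1.5), (19.0, 0.5)]
model_unc, data_unc = decompose_uncertainty(passes)
```

Under the interpretation given in the abstract, the second term would correspond to biological variation, while the first would be expected to shrink as more training data becomes available.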

    Evaluating Spatial Configuration Constrained CNNs for Localizing Facial and Body Pose Landmarks

    Christian Payer, Darko Štern, Martin Urschler
    International Conference on Image and Vision Computing New Zealand - IVCNZ (2019)

    Landmark localization is a widely used task required in medical image analysis and computer vision applications. Formulated in a heatmap regression framework, we have recently proposed a CNN architecture that learns on its own to split the localization task into two simpler sub-problems, dedicating one component to locally accurate but ambiguous predictions, while the other component improves robustness by incorporating the spatial configuration of landmarks to remove ambiguities. We learn this simplification in our SpatialConfiguration-Net (SCN) by multiplying the heatmap predictions of its two components and by training the network in an end-to-end manner, thus achieving regularization similar to e.g. a hand-crafted Markov Random Field model. While we have previously shown localization results solely on data from 2D and 3D medical imaging modalities, in this work our aim is to study the generalization capabilities of our SpatialConfiguration-Net to computer vision problems. Therefore, we evaluate our performance both in terms of accuracy and robustness on a facial alignment task, where we improve upon the state-of-the-art methods, as well as on a human body pose estimation task, where we demonstrate results in line with the recent state-of-the-art.

    Evaluation of algorithms for Multi-Modality Whole Heart Segmentation: An open-access grand challenge

    Xiahai Zhuang, Lei Li, Christian Payer, Darko Štern, Martin Urschler, Mattias P. Heinrich et al.
    Medical Image Analysis (2019)

    Knowledge of whole heart anatomy is a prerequisite for many clinical applications. Whole heart segmentation (WHS), which delineates substructures of the heart, can be very valuable for modeling and analysis of the anatomy and functions of the heart. However, automating this segmentation can be challenging due to the large variation of the heart shape, and different image qualities of the clinical data. To achieve this goal, an initial set of training data is generally needed for constructing priors or for training. Furthermore, it is difficult to perform comparisons between different methods, largely due to differences in the datasets and evaluation metrics used. This manuscript presents the methodologies and evaluation results for the WHS algorithms selected from the submissions to the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge, in conjunction with MICCAI 2017. The challenge provided 120 three-dimensional cardiac images covering the whole heart, including 60 CT and 60 MRI volumes, all acquired in clinical environments with manual delineation. Ten algorithms for CT data and eleven algorithms for MRI data, submitted from twelve groups, have been evaluated. The results showed that the performance of CT WHS was generally better than that of MRI WHS. The segmentation of the substructures for different categories of patients could present different levels of challenge due to the difference in imaging and variations of heart shapes. The deep learning (DL)-based methods demonstrated great potential, though several of them reported poor results in the blinded evaluation. Their performance could vary greatly across different network structures and training strategies. The conventional algorithms, mainly based on multi-atlas segmentation, demonstrated good performance, though the accuracy and computational efficiency could be limited. 
    The challenge, including provision of the annotated training data and the blinded evaluation for submitted algorithms on the test data, continues as an ongoing benchmarking resource via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/mmwhs/).

    Matwo-CapsNet: A Multi-Label Semantic Segmentation Capsules Network

    Savinien Bonheur, Darko Štern, Christian Payer, Michael Pienn, Horst Olschewski, Martin Urschler
    International Conference on Medical Image Computing and Computer-Assisted Intervention - MICCAI (2019)

    Despite some design limitations, CNNs have been largely adopted by the computer vision community due to their efficacy and versatility. Introduced by Sabour et al. to circumvent some limitations of CNNs, capsules replace scalars with vectors to encode appearance feature representation, allowing better preservation of spatial relationships between whole objects and its parts. They also introduced the dynamic routing mechanism, which allows to weight the contributions of parts to a whole object differently at each inference step. Recently, Hinton et al. have proposed to solely encode pose information to model such part-whole relationships. Additionally, they used a matrix instead of a vector encoding in the capsules framework. In this work, we introduce several improvements to the capsules framework, allowing it to be applied for multi-label semantic segmentation. More specifically, we combine pose and appearance information encoded as matrices into a new type of capsule, i.e. Matwo-Caps. Additionally, we propose a novel routing mechanism, i.e. Dual Routing, which effectively combines these two kinds of information. We evaluate our resulting Matwo-CapsNet on the JSRT chest X-ray dataset by comparing it to SegCaps, a capsule based network for binary segmentation, as well as to other CNN based state-of-the-art segmentation methods, where we show that our Matwo-CapsNet achieves competitive results, while requiring only a fraction of the parameters of other previously proposed methods.

    Automated age estimation from MRI volumes of the hand

    Darko Štern, Christian Payer, Martin Urschler
    Medical Image Analysis (2019)

    Highly relevant for both clinical and legal medicine applications, the established radiological methods for estimating unknown age in children and adolescents are based on visual examination of bone ossification in X-ray images of the hand. Our group has initiated the development of fully automatic age estimation methods from 3D MRI scans of the hand, in order to simultaneously overcome the problems of the radiological methods including (1) exposure to ionizing radiation, (2) necessity to define new, MRI specific staging systems, and (3) subjective influence of the examiner. The present work provides a theoretical background for understanding the nonlinear regression problem of biological age estimation and chronological age approximation. Based on this theoretical background, we comprehensively evaluate machine learning methods (random forests, deep convolutional neural networks) with different simplifications of the image information used as an input for learning. Trained on a large dataset of 328 MR images, we compare the performance of the different input strategies and demonstrate unprecedented results. For estimating biological age, we obtain a mean absolute error of 0.37 ± 0.51 years for the age range of the subjects ≤ 18 years, i.e. where bone ossification has not yet saturated. Finally, we validate our findings by adapting our best performing method to 2D images and applying it to a publicly available dataset of X-ray images, showing that we are in line with the state-of-the-art automatic methods for this task.

    Segmenting and tracking cell instances with cosine embeddings and recurrent hourglass networks

    Christian Payer, Darko Štern, Marlies Feiner, Horst Bischof, Martin Urschler
    Medical Image Analysis (2019)

    In contrast to semantic segmentation, instance segmentation assigns unique labels to each individual instance of the same object class. In this work, we propose a novel recurrent fully convolutional network architecture for tracking such instance segmentations over time, which is highly relevant, e.g., in biomedical applications involving cell growth and migration. Our network architecture incorporates convolutional gated recurrent units (ConvGRU) into a stacked hourglass network to utilize temporal information, e.g., from microscopy videos. Moreover, we train our network with a novel embedding loss based on cosine similarities, such that the network predicts unique embeddings for every instance throughout videos, even in the presence of dynamic structural changes due to mitosis of cells. To create the final tracked instance segmentations, the pixel-wise embeddings are clustered among subsequent video frames by using the mean shift algorithm. After showing the performance of the instance segmentation on a static in-house dataset of muscle fibers from H&E-stained microscopy images, we also evaluate our proposed recurrent stacked hourglass network regarding instance segmentation and tracking performance on six datasets from the ISBI Cell Tracking Challenge, where it delivers state-of-the-art results.
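
The idea of an embedding loss based on cosine similarities can be sketched in a few lines of Python. The sketch assumes a loss that pulls the pixel embeddings of one instance toward their mean direction (cosine similarity near 1) and pushes the embeddings of neighboring instances toward orthogonality (cosine similarity near 0). The exact formulation in the paper may differ, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings):
    """Component-wise mean of a list of equally sized vectors."""
    count = len(embeddings)
    return [sum(e[i] for e in embeddings) / count for i in range(len(embeddings[0]))]


def cosine_embedding_loss(instance_pixels, neighbor_pixels):
    """Loss for one instance (assumed form, see lead-in)."""
    center = mean_embedding(instance_pixels)
    # Pull term: zero when every pixel of the instance points along the center.
    pull = 1.0 - sum(cosine_similarity(center, e) for e in instance_pixels) / len(instance_pixels)
    # Push term: zero when neighboring pixels are orthogonal to the center.
    push = sum(cosine_similarity(center, e) ** 2 for e in neighbor_pixels) / len(neighbor_pixels)
    return pull + push


# Hypothetical 2D embeddings: orthogonal neighbors give a low loss.
loss_separated = cosine_embedding_loss([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0]])
loss_confused = cosine_embedding_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0]])
```

Requiring only neighboring instances to differ, rather than all instances, keeps the number of distinct embedding directions small, which is one plausible reason for such a design.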

    Integrating Spatial Configuration into Heatmap Regression Based CNNs for Landmark Localization

    Christian Payer, Darko Štern, Horst Bischof, Martin Urschler
    Medical Image Analysis (2019)

    In many medical image analysis applications, only a limited amount of training data is available due to the costs of image acquisition and the large manual annotation effort required from experts. Training recent state-of-the-art machine learning methods like convolutional neural networks (CNNs) from small datasets is a challenging task. In this work on anatomical landmark localization, we propose a CNN architecture that learns to split the localization task into two simpler sub-problems, reducing the overall need for large training datasets. Our fully convolutional SpatialConfiguration-Net (SCN) learns this simplification due to multiplying the heatmap predictions of its two components and by training the network in an end-to-end manner. Thus, the SCN dedicates one component to locally accurate but ambiguous candidate predictions, while the other component improves robustness to ambiguities by incorporating the spatial configuration of landmarks. In our extensive experimental evaluation, we show that the proposed SCN outperforms related methods in terms of landmark localization error on a variety of size-limited 2D and 3D landmark localization datasets, i.e., hand radiographs, lateral cephalograms, hand MRIs, and spine CTs.
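
The effect of multiplying the two heatmap predictions can be illustrated with a toy Python example. The heatmap values below are invented for illustration: the local heatmap contains two sharp but ambiguous candidates, while the spatial configuration heatmap is coarse and only indicates the plausible region. The function names are hypothetical and do not come from the SCN implementation.

```python
def combine_heatmaps(local, spatial):
    """Element-wise product of two equally sized 2D heatmaps."""
    return [[a * b for a, b in zip(row_l, row_s)]
            for row_l, row_s in zip(local, spatial)]


def argmax2d(heatmap):
    """Return the (x, y) position of the largest heatmap value."""
    _, position = max((v, (x, y))
                      for y, row in enumerate(heatmap)
                      for x, v in enumerate(row))
    return position


size = 5
# Local component: two sharp candidates, the wrong one slightly stronger.
local = [[0.0] * size for _ in range(size)]
local[1][1] = 0.9
local[3][3] = 1.0
# Spatial configuration component: blurry, but high only near the true landmark.
spatial = [[0.8 if x <= 2 and y <= 2 else 0.1 for x in range(size)]
           for y in range(size)]

combined = combine_heatmaps(local, spatial)
best_local = argmax2d(local)
best_combined = argmax2d(combined)
```

In this toy case the local heatmap alone selects the wrong candidate at (3, 3), while the product suppresses it and keeps the sharp peak at (1, 1).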

    Automatic Age Estimation and Majority Age Classification from Multi-Factorial MRI Data

    Darko Štern, Christian Payer, Nicola Giuliani, Martin Urschler
    IEEE Journal of Biomedical and Health Informatics (2018)

    Age estimation from radiologic data is an important topic both in clinical medicine as well as in forensic applications, where it is used to assess unknown chronological age or to discriminate minors from adults. In this work, we propose an automatic multi-factorial age estimation method based on MRI data of hand, clavicle and teeth to extend the maximal age range from up to 19 years, as commonly used for age assessment based on hand bones, to up to 25 years, when combined with clavicle bones and wisdom teeth. Fusing age-relevant information from all three anatomical sites, our method utilizes a deep convolutional neural network that is trained on a dataset of 322 subjects in the age range between 13 and 25 years, to achieve a mean absolute prediction error in regressing chronological age of 1.01 ± 0.74 years. Furthermore, when used for majority age classification, we show that a classifier derived from thresholding our regression based predictor is better suited than a classifier directly trained with a classification loss, especially when taking into account that cases of minors being wrongly classified as adults need to be minimized. In conclusion, we overcome the limitations of the multi-factorial methods currently used in forensic practice, i.e., dependency on ionizing radiation, subjectivity in quantifying age-relevant information, and lack of an established approach to fuse this information from individual anatomical sites.
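
Deriving a majority age classifier by thresholding a regression output can be sketched as follows. The ages, predictions, and the shifted threshold of 18.5 years are invented for illustration and are not results from the paper. The function names are hypothetical.

```python
MAJORITY_AGE = 18.0


def classify_adult(predicted_age, threshold=MAJORITY_AGE):
    """Label a subject as adult when the regressed age reaches the threshold."""
    return predicted_age >= threshold


def count_errors(true_ages, predicted_ages, threshold):
    """Return (minors classified as adults, adults classified as minors)."""
    minors_as_adults = 0
    adults_as_minors = 0
    for true_age, predicted_age in zip(true_ages, predicted_ages):
        is_adult = true_age >= MAJORITY_AGE
        predicted_adult = classify_adult(predicted_age, threshold)
        if not is_adult and predicted_adult:
            minors_as_adults += 1
        elif is_adult and not predicted_adult:
            adults_as_minors += 1
    return minors_as_adults, adults_as_minors


# Hypothetical chronological ages and regression outputs, in years.
true_ages = [16.5, 17.2, 17.8, 18.4, 19.1, 21.0]
predicted_ages = [16.9, 18.3, 18.1, 18.0, 19.5, 20.2]

errors_at_18 = count_errors(true_ages, predicted_ages, 18.0)
errors_at_18_5 = count_errors(true_ages, predicted_ages, 18.5)
```

Raising the threshold trades fewer minors wrongly classified as adults for more adults classified as minors, and a regression-based predictor makes this trade-off adjustable after training.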

    Sparse-View CT Reconstruction Using Wasserstein GANs

    Franz Thaler, Kerstin Hammernik, Christian Payer, Martin Urschler, Darko Štern
    IEEE Journal of Biomedical and Health Informatics (2018)

    We propose a 2D computed tomography (CT) slice image reconstruction method from a limited number of projection images using Wasserstein generative adversarial networks (wGAN). Our wGAN optimizes the 2D CT image reconstruction by utilizing an adversarial loss to improve the perceived image quality as well as an 𝐿1 content loss to enforce structural similarity to the target image. We evaluate our wGANs using different weight factors between the two loss functions and compare to a convolutional neural network (CNN) optimized on 𝐿1 and the Filtered Backprojection (FBP) method. The evaluation shows that the results generated by the machine learning based approaches are substantially better than those from the FBP method. In contrast to the blurrier looking images generated by the CNNs trained on 𝐿1, the wGAN results appear sharper and seem to contain more structural information. We show that a certain amount of projection data is needed to get a correct representation of the anatomical correspondences.
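
The combination of an adversarial loss with an L1 content loss can be written down in a short Python sketch. It assumes the standard Wasserstein generator objective, i.e. the negative mean critic score of the generated images, and a single hypothetical weight factor `adv_weight`. The weighting scheme actually used in the paper is not given in the abstract.

```python
def l1_content_loss(reconstruction, target):
    """Mean absolute difference between reconstructed and target pixel values."""
    return sum(abs(r - t) for r, t in zip(reconstruction, target)) / len(target)


def wasserstein_generator_loss(critic_scores_fake):
    """Standard Wasserstein generator term: negative mean critic score."""
    return -sum(critic_scores_fake) / len(critic_scores_fake)


def total_generator_loss(reconstruction, target, critic_scores_fake, adv_weight):
    """Content loss plus weighted adversarial loss (weighting is an assumption)."""
    content = l1_content_loss(reconstruction, target)
    adversarial = wasserstein_generator_loss(critic_scores_fake)
    return content + adv_weight * adversarial


# Hypothetical flattened pixel values and critic scores.
example_loss = total_generator_loss(
    reconstruction=[0.0, 1.0],
    target=[0.5, 1.0],
    critic_scores_fake=[2.0, 4.0],
    adv_weight=0.1,
)
```

Setting `adv_weight` to zero reduces the objective to the pure L1 training used for the CNN baseline mentioned in the abstract.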

    Instance segmentation and tracking with cosine embeddings and recurrent hourglass networks

    Christian Payer, Darko Štern, Thomas Neff, Horst Bischof, Martin Urschler
    International Conference on Medical Image Computing and Computer-Assisted Intervention - MICCAI (2018)

    In contrast to semantic segmentation, instance segmentation assigns unique labels to each individual instance of the same class. In this work, we propose a novel recurrent fully convolutional network architecture for tracking such instance segmentations over time. The network architecture incorporates convolutional gated recurrent units (ConvGRU) into a stacked hourglass network to utilize temporal video information. Furthermore, we train the network with a novel embedding loss based on cosine similarities, such that the network predicts unique embeddings for every instance throughout videos. Afterwards, these embeddings are clustered among subsequent video frames to create the final tracked instance segmentations. We evaluate the recurrent hourglass network by segmenting left ventricles in MR videos of the heart, where it outperforms a network that does not incorporate video information. Furthermore, we show applicability of the cosine embedding loss for segmenting leaf instances on still images of plants. Finally, we evaluate the framework for instance segmentation and tracking on six datasets of the ISBI Cell Tracking Challenge, where it shows state-of-the-art performance.

    Integrating geometric configuration and appearance information into a unified framework for anatomical landmark localization

    Martin Urschler, Thomas Ebner, Darko Štern
    Medical Image Analysis (2018)

    In approaches for automatic localization of multiple anatomical landmarks, disambiguation of locally similar structures as obtained by locally accurate candidate generation is often performed by solely including high level knowledge about geometric landmark configuration. In our novel localization approach, we propose to combine both image appearance information and geometric landmark configuration into a unified random forest framework integrated into an optimization procedure that iteratively refines joint landmark predictions by using the coordinate descent algorithm. Depending on how strongly multiple landmarks are correlated in a specific localization task, this integration has the benefit that it remains flexible in deciding whether appearance information or the geometric configuration of multiple landmarks is the stronger cue for solving a localization problem both accurately and robustly. Furthermore, no preliminary choice on how to encode a graphical model describing landmark configuration has to be made. In an extensive evaluation on five challenging datasets involving different 2D and 3D imaging modalities, we show that our proposed method is widely applicable and delivers state-of-the-art results when compared to various other related methods.

    Simultaneous multi-person detection and single-person pose estimation with a single heatmap regression network

    Christian Payer, Thomas Neff, Horst Bischof, Martin Urschler, Darko Štern
    ICCV PoseTrack Workshop (2017)

    We propose a two-component fully convolutional network for heatmap regression to perform multi-person pose estimation from images. The first component of the network predicts all body joints of all persons visible in an image, while the second component groups these body joints based on the position of the head of the person of interest. By applying the second component for all detected heads, the poses of all persons visible in an image are estimated. A subsequent geometric frame-by-frame tracker using distances of body joints tracks the poses of all detected persons throughout video sequences. Results on the PoseTrack challenge test set show good performance of our proposed method with a mean average precision (mAP) of 50.4 and a multiple object tracking accuracy (MOTA) of 29.9.