Darko Štern

Graz, AT · stern@icg.tugraz.at

Experience in medical image processing with a strong focus on machine learning. My research interests center on the design and development of algorithms for processing and analyzing three-dimensional (3D) computed tomography (CT) and magnetic resonance (MR) images. I am also interested in computer vision topics such as segmentation, recognition, and reconstruction.


We are constantly looking for students with an interest in medical image analysis and in the use of machine learning and computer vision in novel and established clinical and forensic applications. This page lists specific open student projects at the master's and bachelor's level. Students with their own research ideas are also welcome to get in contact! Please be aware that work on student projects is usually not funded by us.

Detection of Infected Teeth in 3D CBCT Images

Tooth-associated infections caused by bacteria are very common. These pathologies are usually located around the roots of the teeth. They range from a simple widening of the periodontal space to lesions several millimeters or more in diameter, and they may be completely surrounded by bone or perforate the adjacent anatomical borders. Furthermore, they can affect each of the roughly 30 roots per jaw. Locating them manually often requires a large amount of work, depending on the number of teeth investigated, the quality of the data set, and the training and experience of the examining clinician. The aim of the project is to train deep convolutional neural networks (DCNN) to automatically recognize all infected teeth in a 3D Cone Beam Computed Tomography (CBCT) image.


    Instance Segmentation in Medical Image Applications

    To start answering fundamental questions about how the brain works, we need to look at brain structure at the cellular level. Reconstructing cell morphology and building a connectivity diagram requires that every neuron instance is segmented. Unlike semantic segmentation, instance segmentation not only assigns a class label to each pixel of an image but also distinguishes between instances within each class, e.g., each individual cell in an electron microscopy image is assigned a unique ID. This work will investigate a promising direction for segmenting all instances simultaneously by automatically encoding the individual instances as pixel-wise embeddings.


      Rotation Invariant Deep Neural Networks

      Deep convolutional neural networks (DCNN) have recently shown outstanding performance on image classification and object detection tasks due to their powerful multiscale filters. The dominant filters used in building DCNN architectures are only translationally invariant, which is not optimal when the problem is rotation invariant, as is the case in, e.g., cell detection and tracking tasks. By explicitly encoding the expected rotational invariance of the object in the image, the complexity of the problem is decreased, which reduces the size of the required model.
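One simple way to picture built-in rotational invariance is to pool a filter's response over rotated copies of that filter. The following is a minimal sketch of that idea only, not the method pursued in the project: it assumes just the four 90° rotations and a single 3×3 patch, and the helper names `rot90`, `response`, and `rotation_pooled_response` are hypothetical.

```python
def rot90(m):
    """Rotate a square matrix by 90 degrees counter-clockwise."""
    n = len(m)
    return [[m[c][n - 1 - r] for c in range(n)] for r in range(n)]


def response(patch, kernel):
    """Plain filter response: element-wise product summed over the patch."""
    return sum(p * k
               for patch_row, kernel_row in zip(patch, kernel)
               for p, k in zip(patch_row, kernel_row))


def rotation_pooled_response(patch, kernel):
    """Maximum response over the four 90-degree rotations of the kernel.

    Rotating the patch by 90 degrees only permutes the four responses,
    so their maximum does not change.
    """
    best = response(patch, kernel)
    rotated = kernel
    for _ in range(3):
        rotated = rot90(rotated)
        best = max(best, response(patch, rotated))
    return best


if __name__ == "__main__":
    # Toy values chosen only for illustration.
    patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]  # Sobel-like edge filter
    print(response(patch, kernel), response(rot90(patch), kernel))
    print(rotation_pooled_response(patch, kernel),
          rotation_pooled_response(rot90(patch), kernel))
```

The plain response changes when the patch is rotated, while the pooled response stays the same, so a model using the pooled response does not need separate filters for each orientation.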


        Low-Dose CT Reconstruction Using Deep Learning

        Computed tomography (CT) is a widely used medical imaging modality that generates a volumetric image representing the interior structure of a subject. To reconstruct a three-dimensional (3D) CT image, a series of two-dimensional (2D) X-ray projections is acquired from different views of the subject. While the Filtered Backprojection (FBP) method yields an analytical solution for reconstructing a 3D CT image from these 2D X-ray projections, it relies on a large number of projections, and that number correlates with the amount of ionizing radiation to which the subject is exposed. To decrease the amount of ionizing radiation, and consequently the subject’s risk of developing cancer, new CT reconstruction approaches that yield decent image quality even at a low radiation dose need to be investigated. Recent research in low-dose CT reconstruction has employed deep convolutional neural networks (CNNs) for this purpose. The goal of this project is to explore and evaluate deep learning based low-dose CT reconstruction approaches and to investigate new solutions to this demanding problem.


          Bayesian Deep Learning in Medical Imaging

          The application of Bayesian theory to the deep learning framework has recently attracted the attention of both the computer vision and the medical imaging communities and is a growing field of research. By extending the mathematically grounded theory of neural networks with Bayesian theory, the ability to capture the uncertainty present in the data and in the model’s weights is gained. With this, not only can performance comparable to current state-of-the-art results be reached in applications such as classification, segmentation, and regression, but the quality of the predictions can also be assessed through their predictive uncertainty. The ability to reason about data and model uncertainty [1,2] is of crucial importance for many applications related to decision making.
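To make "predictive uncertainty" concrete, the sketch below averages class probabilities over several stochastic forward passes and reports the entropy of the averaged prediction. It assumes the samples come from some sampling scheme such as Monte Carlo dropout, which the description above does not specify; the function names and the toy probabilities are hypothetical.

```python
import math


def mean_prediction(samples):
    """Average the class probabilities over all stochastic forward passes."""
    n = len(samples)
    num_classes = len(samples[0])
    return [sum(s[c] for s in samples) / n for c in range(num_classes)]


def predictive_entropy(samples):
    """Entropy (in nats) of the averaged prediction; higher means less certain."""
    mean = mean_prediction(samples)
    return -sum(p * math.log(p) for p in mean if p > 0.0)


if __name__ == "__main__":
    # Made-up probabilities for a two-class problem, three passes each.
    agreeing = [[0.90, 0.10], [0.92, 0.08], [0.88, 0.12]]
    disagreeing = [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]]
    print(predictive_entropy(agreeing))
    print(predictive_entropy(disagreeing))
```

When the passes agree on a confident prediction the entropy is low, and when they disagree it is high, which is the signal a downstream decision process could use to flag unreliable predictions.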




            A list of my publications can also be found on Google Scholar and ResearchGate. If you have any problems accessing our publications, feel free to contact me.

            Matwo-CapsNet: A Multi-Label Semantic Segmentation Capsules Network

            Savinien Bonheur, Darko Štern, Christian Payer, Michael Pienn, Horst Olschewski, Martin Urschler
            International Conference on Medical Image Computing and Computer-Assisted Intervention - MICCAI (2019)

            Despite some design limitations, CNNs have been largely adopted by the computer vision community due to their efficacy and versatility. Introduced by Sabour et al. to circumvent some limitations of CNNs, capsules replace scalars with vectors to encode appearance feature representation, allowing better preservation of spatial relationships between whole objects and their parts. They also introduced the dynamic routing mechanism, which allows the contributions of parts to a whole object to be weighted differently at each inference step. Recently, Hinton et al. have proposed to solely encode pose information to model such part-whole relationships. Additionally, they used a matrix instead of a vector encoding in the capsules framework. In this work, we introduce several improvements to the capsules framework, allowing it to be applied to multi-label semantic segmentation. More specifically, we combine pose and appearance information encoded as matrices into a new type of capsule, i.e. Matwo-Caps. Additionally, we propose a novel routing mechanism, i.e. Dual Routing, which effectively combines these two kinds of information. We evaluate our resulting Matwo-CapsNet on the JSRT chest X-ray dataset by comparing it to SegCaps, a capsule based network for binary segmentation, as well as to other CNN based state-of-the-art segmentation methods, where we show that our Matwo-CapsNet achieves competitive results, while requiring only a fraction of the parameters of other previously proposed methods.

            Automated age estimation from MRI volumes of the hand

            Darko Štern, Christian Payer, Martin Urschler
            Medical Image Analysis (2019)

            Highly relevant for both clinical and legal medicine applications, the established radiological methods for estimating unknown age in children and adolescents are based on visual examination of bone ossification in X-ray images of the hand. Our group has initiated the development of fully automatic age estimation methods from 3D MRI scans of the hand, in order to simultaneously overcome the problems of the radiological methods including (1) exposure to ionizing radiation, (2) necessity to define new, MRI-specific staging systems, and (3) subjective influence of the examiner. The present work provides a theoretical background for understanding the nonlinear regression problem of biological age estimation and chronological age approximation. Based on this theoretical background, we comprehensively evaluate machine learning methods (random forests, deep convolutional neural networks) with different simplifications of the image information used as an input for learning. Trained on a large dataset of 328 MR images, we compare the performance of the different input strategies and demonstrate unprecedented results. For estimating biological age, we obtain a mean absolute error of 0.37 ± 0.51 years for the age range of the subjects ≤ 18 years, i.e. where bone ossification has not yet saturated. Finally, we validate our findings by adapting our best performing method to 2D images and applying it to a publicly available dataset of X-ray images, showing that we are in line with the state-of-the-art automatic methods for this task.

            Segmenting and tracking cell instances with cosine embeddings and recurrent hourglass networks

            Christian Payer, Darko Štern, Marlies Feiner, Horst Bischof, Martin Urschler
            Medical Image Analysis (2019)

            In contrast to semantic segmentation, instance segmentation assigns unique labels to each individual instance of the same object class. In this work, we propose a novel recurrent fully convolutional network architecture for tracking such instance segmentations over time, which is highly relevant, e.g., in biomedical applications involving cell growth and migration. Our network architecture incorporates convolutional gated recurrent units (ConvGRU) into a stacked hourglass network to utilize temporal information, e.g., from microscopy videos. Moreover, we train our network with a novel embedding loss based on cosine similarities, such that the network predicts unique embeddings for every instance throughout videos, even in the presence of dynamic structural changes due to mitosis of cells. To create the final tracked instance segmentations, the pixel-wise embeddings are clustered among subsequent video frames by using the mean shift algorithm. After showing the performance of the instance segmentation on a static in-house dataset of muscle fibers from H&E-stained microscopy images, we also evaluate our proposed recurrent stacked hourglass network regarding instance segmentation and tracking performance on six datasets from the ISBI cell tracking challenge, where it delivers state-of-the-art results.
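The idea of an embedding loss based on cosine similarities can be sketched as follows. This is a simplified illustration and not the exact loss from the paper: it assumes non-zero embeddings, pulls each pixel embedding towards the mean embedding of its own instance, and pushes it towards orthogonality with the means of all other instances. All function names and toy values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings):
    """Component-wise mean of a list of embedding vectors."""
    n = len(embeddings)
    return [sum(e[d] for e in embeddings) / n for d in range(len(embeddings[0]))]


def cosine_embedding_loss(instances):
    """Simplified loss over a list of instances, each a list of pixel embeddings."""
    means = [mean_embedding(pixels) for pixels in instances]
    total, count = 0.0, 0
    for i, pixels in enumerate(instances):
        for embedding in pixels:
            # Same instance: similarity to the instance mean should be 1.
            total += 1.0 - cosine_similarity(embedding, means[i])
            count += 1
            # Different instances: similarity should be 0 (orthogonal).
            for j, other_mean in enumerate(means):
                if j != i:
                    total += cosine_similarity(embedding, other_mean) ** 2
                    count += 1
    return total / count


if __name__ == "__main__":
    separated = [[[1.0, 0.0], [2.0, 0.0]], [[0.0, 1.0], [0.0, 3.0]]]
    collapsed = [[[1.0, 0.0]], [[1.0, 0.0]]]
    print(cosine_embedding_loss(separated), cosine_embedding_loss(collapsed))
```

Because only the direction of an embedding matters for cosine similarity, instances with orthogonal embedding directions incur no penalty regardless of vector length, while instances that share a direction are penalized.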

            Integrating Spatial Configuration into Heatmap Regression Based CNNs for Landmark Localization

            Christian Payer, Darko Štern, Horst Bischof, Martin Urschler
            Medical Image Analysis (2019)

            In many medical image analysis applications, only a limited amount of training data is available due to the costs of image acquisition and the large manual annotation effort required from experts. Training recent state-of-the-art machine learning methods like convolutional neural networks (CNNs) from small datasets is a challenging task. In this work on anatomical landmark localization, we propose a CNN architecture that learns to split the localization task into two simpler sub-problems, reducing the overall need for large training datasets. Our fully convolutional SpatialConfiguration-Net (SCN) learns this simplification by multiplying the heatmap predictions of its two components and by training the network in an end-to-end manner. Thus, the SCN dedicates one component to locally accurate but ambiguous candidate predictions, while the other component improves robustness to ambiguities by incorporating the spatial configuration of landmarks. In our extensive experimental evaluation, we show that the proposed SCN outperforms related methods in terms of landmark localization error on a variety of size-limited 2D and 3D landmark localization datasets, i.e., hand radiographs, lateral cephalograms, hand MRIs, and spine CTs.
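The effect of multiplying the two heatmaps can be illustrated with a toy example. The sketch below only shows the element-wise product and the resulting peak; it does not model the network or its training, and the heatmap values and function names are made up for illustration.

```python
def multiply_heatmaps(local, spatial):
    """Element-wise product of two heatmaps of equal size."""
    return [[l * s for l, s in zip(local_row, spatial_row)]
            for local_row, spatial_row in zip(local, spatial)]


def argmax_2d(heatmap):
    """Return the (row, col) position of the largest value."""
    positions = [(r, c)
                 for r in range(len(heatmap))
                 for c in range(len(heatmap[0]))]
    return max(positions, key=lambda rc: heatmap[rc[0]][rc[1]])


if __name__ == "__main__":
    # Locally accurate but ambiguous: two sharp candidate peaks.
    local = [[0.0, 0.9, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]
    # Coarse but unambiguous: broad response around the plausible region.
    spatial = [[0.6, 0.8, 0.5, 0.1],
               [0.4, 0.5, 0.3, 0.1],
               [0.1, 0.1, 0.1, 0.1]]
    print(argmax_2d(local), argmax_2d(multiply_heatmaps(local, spatial)))
```

In this toy case the local heatmap alone would select the wrong candidate, whereas the product keeps the sharp peak that is also plausible given the spatial configuration.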

            Automatic Age Estimation and Majority Age Classification from Multi-Factorial MRI Data

            Darko Štern, Christian Payer, Nicola Giuliani, Martin Urschler
            IEEE Journal of Biomedical and Health Informatics (2018)

            Age estimation from radiologic data is an important topic both in clinical medicine and in forensic applications, where it is used to assess unknown chronological age or to discriminate minors from adults. In this work, we propose an automatic multi-factorial age estimation method based on MRI data of hand, clavicle and teeth to extend the maximal age range from up to 19 years, as commonly used for age assessment based on hand bones, to up to 25 years, when combined with clavicle bones and wisdom teeth. Fusing age-relevant information from all three anatomical sites, our method utilizes a deep convolutional neural network that is trained on a dataset of 322 subjects in the age range between 13 and 25 years, to achieve a mean absolute prediction error in regressing chronological age of 1.01 ± 0.74 years. Furthermore, when used for majority age classification, we show that a classifier derived from thresholding our regression based predictor is better suited than a classifier directly trained with a classification loss, especially when taking into account that cases of minors being wrongly classified as adults need to be minimized. In conclusion, we overcome the limitations of the multi-factorial methods currently used in forensic practice, i.e., dependency on ionizing radiation, subjectivity in quantifying age-relevant information, and lack of an established approach to fuse this information from individual anatomical sites.
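Deriving a classifier by thresholding a regression output can be sketched in a few lines. The example assumes a majority age of 18 years and uses made-up predicted and true ages that are not taken from the paper; the function names are hypothetical.

```python
def classify_adult(predicted_age, threshold):
    """Label a subject as adult if the regressed age reaches the threshold."""
    return predicted_age >= threshold


def minors_classified_as_adults(predicted_ages, true_ages, threshold,
                                majority_age=18.0):
    """Count true minors whose predicted age reaches the threshold."""
    return sum(1 for predicted, true in zip(predicted_ages, true_ages)
               if true < majority_age and classify_adult(predicted, threshold))


if __name__ == "__main__":
    # Toy values for illustration only.
    true_ages = [16.5, 17.2, 17.8, 18.4, 19.6, 21.0]
    predicted_ages = [16.9, 18.3, 17.5, 18.1, 20.2, 20.4]
    for threshold in (18.0, 19.0):
        errors = minors_classified_as_adults(predicted_ages, true_ages, threshold)
        print(threshold, errors)
```

Because the threshold is applied after regression, it can be shifted without retraining to trade fewer minors wrongly classified as adults against more adults classified as minors.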

            Sparse-View CT Reconstruction Using Wasserstein GANs

            Franz Thaler, Kerstin Hammernik, Christian Payer, Martin Urschler, Darko Štern
            IEEE Journal of Biomedical and Health Informatics (2018)

            We propose a 2D computed tomography (CT) slice image reconstruction method from a limited number of projection images using Wasserstein generative adversarial networks (wGAN). Our wGAN optimizes the 2D CT image reconstruction by utilizing an adversarial loss to improve the perceived image quality as well as an 𝐿1 content loss to enforce structural similarity to the target image. We evaluate our wGANs using different weight factors between the two loss functions and compare to a convolutional neural network (CNN) optimized on 𝐿1 and the Filtered Backprojection (FBP) method. The evaluation shows that the results generated by the machine learning based approaches are substantially better than those from the FBP method. In contrast to the blurrier looking images generated by the CNNs trained on 𝐿1, the wGAN results appear sharper and seem to contain more structural information. We show that a certain amount of projection data is needed to get a correct representation of the anatomical correspondences.

            Instance segmentation and tracking with cosine embeddings and recurrent hourglass networks

            Christian Payer, Darko Štern, Thomas Neff, Horst Bischof, Martin Urschler
            International Conference on Medical Image Computing and Computer-Assisted Intervention - MICCAI (2018)

            In contrast to semantic segmentation, instance segmentation assigns unique labels to each individual instance of the same class. In this work, we propose a novel recurrent fully convolutional network architecture for tracking such instance segmentations over time. The network architecture incorporates convolutional gated recurrent units (ConvGRU) into a stacked hourglass network to utilize temporal video information. Furthermore, we train the network with a novel embedding loss based on cosine similarities, such that the network predicts unique embeddings for every instance throughout videos. Afterwards, these embeddings are clustered among subsequent video frames to create the final tracked instance segmentations. We evaluate the recurrent hourglass network by segmenting left ventricles in MR videos of the heart, where it outperforms a network that does not incorporate video information. Furthermore, we show applicability of the cosine embedding loss for segmenting leaf instances on still images of plants. Finally, we evaluate the framework for instance segmentation and tracking on six datasets of the ISBI cell tracking challenge, where it shows state-of-the-art performance.

            Integrating geometric configuration and appearance information into a unified framework for anatomical landmark localization

            Martin Urschler, Thomas Ebner, Darko Štern
            Medical Image Analysis (2018)

            In approaches for automatic localization of multiple anatomical landmarks, disambiguation of locally similar structures as obtained by locally accurate candidate generation is often performed by solely including high level knowledge about geometric landmark configuration. In our novel localization approach, we propose to combine both image appearance information and geometric landmark configuration into a unified random forest framework integrated into an optimization procedure that iteratively refines joint landmark predictions by using the coordinate descent algorithm. Depending on how strongly multiple landmarks are correlated in a specific localization task, this integration has the benefit that it remains flexible in deciding whether appearance information or the geometric configuration of multiple landmarks is the stronger cue for solving a localization problem both accurately and robustly. Furthermore, no preliminary choice on how to encode a graphical model describing landmark configuration has to be made. In an extensive evaluation on five challenging datasets involving different 2D and 3D imaging modalities, we show that our proposed method is widely applicable and delivers state-of-the-art results when compared to various other related methods.

            Simultaneous multi-person detection and single-person pose estimation with a single heatmap regression network

            Christian Payer, Thomas Neff, Horst Bischof, Martin Urschler, Darko Štern
            ICCV PoseTrack Workshop (2017)

            We propose a two-component fully convolutional network for heatmap regression to perform multi-person pose estimation from images. The first component of the network predicts all body joints of all persons visible in an image, while the second component groups these body joints based on the position of the head of the person of interest. By applying the second component for all detected heads, the poses of all persons visible in an image are estimated. A subsequent geometric frame-by-frame tracker using distances of body joints tracks the poses of all detected persons throughout video sequences. Results on the PoseTrack challenge test set show good performance of our proposed method with a mean average precision (mAP) of 50.4 and a multiple object tracking accuracy (MOTA) of 29.9.