Darko Štern

Graz, AT · stern@icg.tugraz.at

Experience in medical image processing with a strong focus on machine learning. My research interests center on the design and development of algorithms for processing and analyzing three-dimensional (3D) computed tomography (CT) and magnetic resonance (MR) images. I am also interested in computer vision topics such as segmentation, recognition and reconstruction.


We are constantly looking for students with an interest in medical image analysis, as well as in the use of machine learning and computer vision in novel and established clinical and forensic applications. This page lists specific open student projects at the master's and bachelor's level. Students with their own research ideas are also welcome to get in contact! Please be aware that work on student projects is usually not funded by us.

Detection of Infected Teeth in 3D CBCT Images

Tooth-associated infections of bacterial origin are very common. These pathologies are usually located around the roots of the teeth. They can vary in diameter from a simple widening of the periodontal space up to several millimetres or more, and may be completely surrounded by bone or perforate the adjacent anatomical borders. Furthermore, they can potentially affect each of the roughly 30 roots per jaw. Locating them manually frequently requires a large amount of work, depending on the number of investigated teeth and the quality of the data set, as well as on the education and experience of the examining doctor. The aim of the project is to train deep convolutional neural networks (DCNN) to automatically recognize all infected teeth in a 3D Cone Beam Computed Tomography (CBCT) image.

Read more..

    Instance Segmentation in Medical Image Applications

    To start answering fundamental questions about how the brain works, we need to look at the brain structure at the cellular level. Reconstructing cell morphology and building a connectivity diagram requires that all instances of neuron cells are segmented. Unlike semantic segmentation, instance segmentation does not only assign a class label to each pixel of an image but also distinguishes between instances within each class, e.g., each individual cell in an electron microscopy image is assigned a unique ID. This work will investigate a promising direction for simultaneous segmentation of all instances: automatically encoding the individual instances as pixel-wise embeddings.

    Read more..

      Deep Reinforcement Learning in Medical Image Applications

      By learning a sequence of actions that maximizes the expected reward, deep reinforcement learning (DRL) has brought significant performance improvements in many areas, including games, robotics, natural language processing, and computer vision. It was DeepMind, a small and little-known company in 2013, that achieved a breakthrough in reinforcement learning by implementing a system that could learn to play many classic Atari games with human or even superhuman performance. Still, it was not until recently that DRL started to appear in medical image applications as well, for landmark detection, automatic view planning from 3D MR images, or active breast lesion detection.

      Read more..

        Rotation Invariant Deep Neural Networks

        Deep convolutional neural networks (DCNN) have recently shown outstanding performance on image classification and object detection tasks due to their powerful multiscale filters. The dominant filters used in building DCNN architectures are only translation invariant, which is not optimal when the problem is also rotation invariant, as is the case in, e.g., cell detection and tracking tasks. Thus, by explicitly encoding the expected rotational invariance of the object in the image, the complexity of the problem is decreased, leading to a reduction in the size of the required model.

        Read more..

          List of all projects


          A list of my publications can also be found on Google Scholar and ResearchGate. If you have any problems accessing our publications, feel free to contact me.

          Automated age estimation from MRI volumes of the hand

          Darko Štern, Christian Payer, Martin Urschler
          Medical Image Analysis (2019)

          Highly relevant for both clinical and legal medicine applications, the established radiological methods for estimating unknown age in children and adolescents are based on visual examination of bone ossification in X-ray images of the hand. Our group has initiated the development of fully automatic age estimation methods from 3D MRI scans of the hand, in order to simultaneously overcome the problems of the radiological methods including (1) exposure to ionizing radiation, (2) necessity to define new, MRI specific staging systems, and (3) subjective influence of the examiner. The present work provides a theoretical background for understanding the nonlinear regression problem of biological age estimation and chronological age approximation. Based on this theoretical background, we comprehensively evaluate machine learning methods (random forests, deep convolutional neural networks) with different simplifications of the image information used as an input for learning. Trained on a large dataset of 328 MR images, we compare the performance of the different input strategies and demonstrate unprecedented results. For estimating biological age, we obtain a mean absolute error of 0.37 ± 0.51 years for the age range of the subjects  ≤  18 years, i.e. where bone ossification has not yet saturated. Finally, we validate our findings by adapting our best performing method to 2D images and applying it to a publicly available dataset of X-ray images, showing that we are in line with the state-of-the-art automatic methods for this task.

          Segmenting and tracking cell instances with cosine embeddings and recurrent hourglass networks

          Christian Payer, Darko Štern, Marlies Feiner, Horst Bischof, Martin Urschler
          Medical Image Analysis (2019)

          Differently to semantic segmentation, instance segmentation assigns unique labels to each individual instance of the same object class. In this work, we propose a novel recurrent fully convolutional network architecture for tracking such instance segmentations over time, which is highly relevant, e.g., in biomedical applications involving cell growth and migration. Our network architecture incorporates convolutional gated recurrent units (ConvGRU) into a stacked hourglass network to utilize temporal information, e.g., from microscopy videos. Moreover, we train our network with a novel embedding loss based on cosine similarities, such that the network predicts unique embeddings for every instance throughout videos, even in the presence of dynamic structural changes due to mitosis of cells. To create the final tracked instance segmentations, the pixel-wise embeddings are clustered among subsequent video frames by using the mean shift algorithm. After showing the performance of the instance segmentation on a static in-house dataset of muscle fibers from H&E-stained microscopy images, we also evaluate our proposed recurrent stacked hourglass network regarding instance segmentation and tracking performance on six datasets from the ISBI celltracking challenge, where it delivers state-of-the-art results.
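To make the idea of an embedding loss based on cosine similarities more concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes that each pixel carries a small embedding vector, that pixels should be similar to the mean embedding of their own instance, and that they should be close to orthogonal to the mean embeddings of other instances. The function names and the choice to treat every other instance as a neighbor are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine_embedding_loss(embeddings, labels):
    """Toy loss over per-pixel embeddings and their instance labels.

    Pull term: pixels should align with the mean embedding of their instance.
    Push term: pixels of other instances should be orthogonal to that mean.
    Assumption: every other instance counts as a neighbor; a practical
    implementation would likely restrict this to spatially nearby instances.
    """
    instance_ids = sorted(set(labels))
    total = 0.0
    for i in instance_ids:
        own = [e for e, l in zip(embeddings, labels) if l == i]
        others = [e for e, l in zip(embeddings, labels) if l != i]
        mean = mean_vector(own)
        pull = 1.0 - sum(cosine_similarity(mean, e) for e in own) / len(own)
        push = 0.0
        if others:
            push = sum(cosine_similarity(mean, e) ** 2 for e in others) / len(others)
        total += pull + push
    return total / len(instance_ids)


# Two instances with orthogonal embeddings versus two instances that collapsed
# onto the same embedding direction.
well_separated = cosine_embedding_loss([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [1, 1, 2])
collapsed = cosine_embedding_loss([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]], [1, 1, 2])
```

In this toy setting the loss is lowest when different instances point in orthogonal directions, which is what later allows a clustering step to separate them.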

          Integrating Spatial Configuration into Heatmap Regression Based CNNs for Landmark Localization

          Christian Payer, Darko Štern, Horst Bischof, Martin Urschler
          Medical Image Analysis (2019)

          In many medical image analysis applications, only a limited amount of training data is available due to the costs of image acquisition and the large manual annotation effort required from experts. Training recent state-of-the-art machine learning methods like convolutional neural networks (CNNs) from small datasets is a challenging task. In this work on anatomical landmark localization, we propose a CNN architecture that learns to split the localization task into two simpler sub-problems, reducing the overall need for large training datasets. Our fully convolutional SpatialConfiguration-Net (SCN) learns this simplification due to multiplying the heatmap predictions of its two components and by training the network in an end-to-end manner. Thus, the SCN dedicates one component to locally accurate but ambiguous candidate predictions, while the other component improves robustness to ambiguities by incorporating the spatial configuration of landmarks. In our extensive experimental evaluation, we show that the proposed SCN outperforms related methods in terms of landmark localization error on a variety of size-limited 2D and 3D landmark localization datasets, i.e., hand radiographs, lateral cephalograms, hand MRIs, and spine CTs.
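The effect of multiplying the heatmap predictions of the two components can be illustrated with a small hand-made example. This is only a sketch under stated assumptions: the two heatmaps below are invented values rather than network outputs, and the helper names `multiply_heatmaps` and `argmax_2d` are hypothetical.

```python
def multiply_heatmaps(local_heatmap, spatial_heatmap):
    """Element-wise product of two equally sized 2D heatmaps."""
    return [
        [a * b for a, b in zip(row_local, row_spatial)]
        for row_local, row_spatial in zip(local_heatmap, spatial_heatmap)
    ]


def argmax_2d(heatmap):
    """(row, column) of the largest value in a 2D heatmap."""
    best_value, best_position = max(
        (value, (r, c))
        for r, row in enumerate(heatmap)
        for c, value in enumerate(row)
    )
    return best_position


# Locally accurate but ambiguous: two sharp peaks, the wrong one slightly higher.
local_heatmap = [
    [0.0, 0.9, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]

# Robust but coarse: a blurry response around the plausible region.
spatial_heatmap = [
    [0.8, 0.7, 0.6, 0.2, 0.1],
    [0.4, 0.6, 0.4, 0.2, 0.1],
    [0.2, 0.2, 0.2, 0.1, 0.1],
]

combined = multiply_heatmaps(local_heatmap, spatial_heatmap)
local_only = argmax_2d(local_heatmap)      # picks the ambiguous wrong peak
spatial_only = argmax_2d(spatial_heatmap)  # right region, imprecise position
final = argmax_2d(combined)                # sharp peak inside the right region
```

In this example neither heatmap alone yields the intended position, while their product suppresses the distant false candidate and keeps the sharp local peak.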

          Automatic Age Estimation and Majority Age Classification from Multi-Factorial MRI Data

          Darko Štern, Christian Payer, Nicola Giuliani, Martin Urschler
          IEEE Journal of Biomedical and Health Informatics (2018)

          Age estimation from radiologic data is an important topic both in clinical medicine as well as in forensic applications, where it is used to assess unknown chronological age or to discriminate minors from adults. In this work, we propose an automatic multi-factorial age estimation method based on MRI data of hand, clavicle and teeth to extend the maximal age range from up to 19 years, as commonly used for age assessment based on hand bones, to up to 25 years, when combined with clavicle bones and wisdom teeth. Fusing age-relevant information from all three anatomical sites, our method utilizes a deep convolutional neural network that is trained on a dataset of 322 subjects in the age range between 13 and 25 years, to achieve a mean absolute prediction error in regressing chronological age of 1.01 ± 0.74 years. Furthermore, when used for majority age classification, we show that a classifier derived from thresholding our regression based predictor is better suited than a classifier directly trained with a classification loss, especially when taking into account that cases of minors being wrongly classified as adults need to be minimized. In conclusion, we overcome the limitations of the multi-factorial methods currently used in forensic practice, i.e., dependency on ionizing radiation, subjectivity in quantifying age-relevant information, and lack of an established approach to fuse this information from individual anatomical sites.
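Deriving a majority age classifier by thresholding a regression output can be sketched in a few lines. The ages below are invented illustrative values, not data from the study, and the function names are hypothetical. The sketch only shows how raising the decision threshold trades fewer minors wrongly classified as adults against more adults classified as minors.

```python
MAJORITY_AGE = 18.0


def classify_adult(predicted_ages, threshold):
    """Label a subject as adult if the regressed age reaches the threshold."""
    return [age >= threshold for age in predicted_ages]


def count_errors(predicted_ages, true_ages, threshold):
    """Return (minors classified as adults, adults classified as minors)."""
    minors_as_adults = 0
    adults_as_minors = 0
    for is_adult, true_age in zip(classify_adult(predicted_ages, threshold), true_ages):
        if is_adult and true_age < MAJORITY_AGE:
            minors_as_adults += 1
        elif not is_adult and true_age >= MAJORITY_AGE:
            adults_as_minors += 1
    return minors_as_adults, adults_as_minors


# Invented chronological ages and regression outputs, for illustration only.
true_ages = [16.5, 17.8, 17.2, 18.4, 19.5, 21.0]
predicted_ages = [16.9, 18.6, 18.1, 18.3, 20.1, 20.4]

errors_at_18 = count_errors(predicted_ages, true_ages, threshold=18.0)
errors_at_19 = count_errors(predicted_ages, true_ages, threshold=19.0)
```

Because the threshold is a single scalar applied after regression, it can be moved to favor the more critical error type without retraining the predictor.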

          Sparse-View CT Reconstruction Using Wasserstein GANs

          Franz Thaler, Kerstin Hammernik, Christian Payer, Martin Urschler, Darko Štern
          IEEE Journal of Biomedical and Health Informatics (2018)

          We propose a 2D computed tomography (CT) slice image reconstruction method from a limited number of projection images using Wasserstein generative adversarial networks (wGAN). Our wGAN optimizes the 2D CT image reconstruction by utilizing an adversarial loss to improve the perceived image quality as well as an 𝐿1 content loss to enforce structural similarity to the target image. We evaluate our wGANs using different weight factors between the two loss functions and compare to a convolutional neural network (CNN) optimized on 𝐿1 and the Filtered Backprojection (FBP) method. The evaluation shows that the results generated by the machine learning based approaches are substantially better than those from the FBP method. In contrast to the blurrier looking images generated by the CNNs trained on 𝐿1, the wGANs results appear sharper and seem to contain more structural information. We show that a certain amount of projection data is needed to get a correct representation of the anatomical correspondences.
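The interplay of the two loss terms can be sketched as a weighted sum. This is a simplified illustration and not the training code of the paper: images are flattened to plain lists, the critic scores are given as numbers instead of being produced by a network, and the convention of scaling the adversarial term by `adversarial_weight` is an assumption.

```python
def l1_content_loss(reconstruction, target):
    """Mean absolute difference between reconstruction and target pixels."""
    differences = [abs(r - t) for r, t in zip(reconstruction, target)]
    return sum(differences) / len(differences)


def wasserstein_generator_loss(critic_scores):
    """Adversarial term for the generator in a Wasserstein GAN:
    the negated mean critic score of the generated images."""
    return -sum(critic_scores) / len(critic_scores)


def combined_generator_loss(reconstruction, target, critic_scores, adversarial_weight):
    """Content loss plus weighted adversarial loss (weighting scheme assumed)."""
    content = l1_content_loss(reconstruction, target)
    adversarial = wasserstein_generator_loss(critic_scores)
    return content + adversarial_weight * adversarial


# Invented pixel values and critic scores, for illustration only.
reconstruction = [0.2, 0.5, 0.9]
target = [0.0, 0.5, 1.0]
critic_scores = [0.5, 1.5]

content_only = combined_generator_loss(reconstruction, target, critic_scores, 0.0)
with_adversarial = combined_generator_loss(reconstruction, target, critic_scores, 0.1)
```

Setting the weight to zero reduces the objective to the pure content loss, which corresponds to the CNN baseline optimized on the content term alone.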

          Instance segmentation and tracking with cosine embeddings and recurrent hourglass networks

          Christian Payer, Darko Štern, Thomas Neff, Horst Bischof, Martin Urschler
          International Conference on Medical Image Computing and Computer-Assisted Intervention (2018)

          Different to semantic segmentation, instance segmentation assigns unique labels to each individual instance of the same class. In this work, we propose a novel recurrent fully convolutional network architecture for tracking such instance segmentations over time. The network architecture incorporates convolutional gated recurrent units (ConvGRU) into a stacked hourglass network to utilize temporal video information. Furthermore, we train the network with a novel embedding loss based on cosine similarities, such that the network predicts unique embeddings for every instance throughout videos. Afterwards, these embeddings are clustered among subsequent video frames to create the final tracked instance segmentations. We evaluate the recurrent hourglass network by segmenting left ventricles in MR videos of the heart, where it outperforms a network that does not incorporate video information. Furthermore, we show applicability of the cosine embedding loss for segmenting leaf instances on still images of plants. Finally, we evaluate the framework for instance segmentation and tracking on six datasets of the ISBI celltracking challenge, where it shows state-of-the-art performance.

          Integrating geometric configuration and appearance information into a unified framework for anatomical landmark localization

          Martin Urschler, Thomas Ebner, Darko Štern
          Medical Image Analysis (2018)

          In approaches for automatic localization of multiple anatomical landmarks, disambiguation of locally similar structures as obtained by locally accurate candidate generation is often performed by solely including high level knowledge about geometric landmark configuration. In our novel localization approach, we propose to combine both image appearance information and geometric landmark configuration into a unified random forest framework integrated into an optimization procedure that iteratively refines joint landmark predictions by using the coordinate descent algorithm. Depending on how strong multiple landmarks are correlated in a specific localization task, this integration has the benefit that it remains flexible in deciding whether appearance information or the geometric configuration of multiple landmarks is the stronger cue for solving a localization problem both accurately and robustly. Furthermore, no preliminary choice on how to encode a graphical model describing landmark configuration has to be made. In an extensive evaluation on five challenging datasets involving different 2D and 3D imaging modalities, we show that our proposed method is widely applicable and delivers state-of-the-art results when compared to various other related methods.

          Simultaneous multi-person detection and single-person pose estimation with a single heatmap regression network

          Christian Payer, Thomas Neff, Horst Bischof, Martin Urschler, Darko Štern
          ICCV PoseTrack Workshop (2017)

          We propose a two component fully-convolutional network for heatmap regression to perform multi-person pose estimation from images. The first component of the network predicts all body joints of all persons visible on an image, while the second component groups these body joints based on the position of the head of the person of interest. By applying the second component for all detected heads, the poses of all persons visible on an image are estimated. A subsequent geometric frame-by-frame tracker using distances of body joints tracks the poses of all detected persons throughout video sequences. Results on the PoseTrack challenge test set show good performance of our proposed method with a mean average precision (mAP) of 50.4 and a multiple object tracking accuracy (MOTA) of 29.9.