Poster Presentation at the European Congress of Radiology, Vienna, 2019
To stress-test a deep learning chest X-ray normal vs. abnormal classifier on a dataset with spectrum bias against normalcy, that is, one containing more abnormal than normal studies.
Methods and Materials
A deep learning algorithm consisting of an ensemble of 14 convolutional neural networks and a weighting fully connected network was trained with more than 100,000 chest X-ray (CXR) studies. The output of the algorithm was the probability that an input image is pathological. The system was validated on a partition of 1,000 studies that were not used during training, obtaining an accuracy of 70%. A real-world, retrospectively acquired, independent test set of 301 CXRs (197 abnormal, 104 normal) was then analysed by the algorithm, which classified each X-ray as normal or abnormal. Ground truth for the independent test set was established by a sub-specialist chest radiologist (8 years' experience) who reviewed each image along with its corresponding report. Algorithm output was compared against ground truth, and summary statistics were calculated.
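The abstract does not describe how the weighting network combines the 14 CNN outputs. As a rough illustration only, the sketch below reduces that network to a single fully connected unit (weighted sum plus sigmoid) followed by a threshold. The names `combine` and `classify`, the weights, the bias, and the 0.5 threshold are all hypothetical and not taken from the study.

```python
import math

N_MODELS = 14  # ensemble size stated in the abstract

# Hypothetical parameters: in a real system these would be learned.
# Chosen so that an average CNN probability of 0.5 maps to an output of 0.5.
WEIGHTS = [4.0 / N_MODELS] * N_MODELS
BIAS = -2.0


def combine(cnn_probs, weights=WEIGHTS, bias=BIAS):
    """One fully connected unit: weighted sum of per-CNN probabilities, then sigmoid."""
    if len(cnn_probs) != len(weights):
        raise ValueError("expected one probability per ensemble member")
    z = sum(w * p for w, p in zip(weights, cnn_probs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def classify(prob_abnormal, threshold=0.5):
    """Turn the ensemble probability into a normal/abnormal label (threshold is assumed)."""
    return "abnormal" if prob_abnormal >= threshold else "normal"


# Usage with made-up per-model outputs for a single image.
example_probs = [0.9] * N_MODELS
print(classify(combine(example_probs)))
```

A learned weighting layer lets better-performing ensemble members contribute more than a plain average would, which is presumably the motivation for the design described above.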
The algorithm correctly classified 237 of the 301 CXRs (78.74%), with a sensitivity of 83.76% (95% CI: 77.85% to 88.62%) and a specificity of 69.23% (95% CI: 59.42% to 77.91%). There were equal numbers of false positive and false negative cases: 32 each (10.63% of the 301 studies).
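The summary statistics can be re-derived from the confusion matrix implied by the reported figures: 165 true positives, 32 false negatives, 72 true negatives and 32 false positives. The sketch below recomputes accuracy, sensitivity and specificity from those counts. The abstract does not say which confidence interval method was used; the exact (Clopper-Pearson) binomial interval is assumed here, so the printed bounds may differ slightly from those reported.

```python
from math import comb

# Counts reconstructed from the reported results (197 abnormal, 104 normal).
TP, FN = 165, 32   # abnormal studies: correctly flagged / missed
TN, FP = 72, 32    # normal studies: correctly cleared / wrongly flagged


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a proportion k/n, found by bisection (assumed method)."""

    def solve(f, target):
        # f is monotonically decreasing in p, so bisect on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


total = TP + FN + TN + FP
accuracy = (TP + TN) / total
sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)

print(f"accuracy    {accuracy:.2%}")
for name, k, n in [("sensitivity", TP, TP + FN), ("specificity", TN, TN + FP)]:
    lo, hi = clopper_pearson(k, n)
    print(f"{name} {k / n:.2%} (95% CI: {lo:.2%} to {hi:.2%})")
```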
Compared with the validation results (70% accuracy), the deep learning algorithm performed better on the stress test (78.74% accuracy), which used a biased dataset containing more abnormal than normal scans.