How to Lie with Statistics: Things To Keep in Mind While Evaluating a Deep Learning Claim (RSNA 2019, Sun Dec 01 – 06)

TEACHING POINTS

1. In today’s age of deep learning and artificial intelligence, a radiologist must know what to watch out for while evaluating a deep learning algorithm’s claim.

2. What is ground truth?

3. Specific points to keep in mind while evaluating:

What is the medium of communication? Is it a video, a pre-print, or an article in a reputable peer-reviewed journal?

What is the performance metric? Accuracy alone is a poor metric: when disease is rare, an algorithm that calls every study normal still scores high accuracy.

What data was the algorithm developed on? Algorithms trained on poor-quality ground truth generally perform poorly.

What data was the algorithm validated on? Algorithms validated on data from the same institution that supplied the training data tend to show falsely inflated performance.

How much data was it tested on? Test data should be not only independent but also adequate, in both number of cases and disease heterogeneity.

What are the implications of the algorithm failing? For example, what if a chest X-ray algorithm misses a critical finding?

4. Try to get access to the actual algorithm and run it in your department.
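
The point about accuracy in item 3 can be illustrated with a minimal sketch. The test set here is entirely hypothetical (1000 studies, 50 of them positive), and the "algorithm" is a stand-in that labels every study as normal; neither comes from the exhibit itself.

```python
# Hypothetical test set: 1000 studies, 50 with the finding (5% prevalence).
# Labels: 1 = finding present, 0 = normal. These numbers are assumptions
# chosen for illustration only.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def summarize(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity from binary labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }

y_true = [1] * 50 + [0] * 950
always_normal = [0] * 1000  # stand-in "algorithm" that never flags a finding

metrics = summarize(y_true, always_normal)
print(metrics)
```

In this sketch the stand-in scores 95% accuracy while detecting none of the 50 positive studies, which is why sensitivity and specificity (or similar prevalence-aware metrics) are worth asking for alongside accuracy.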

TABLE OF CONTENTS/OUTLINE

1. Why should a radiologist know how to evaluate a deep learning algorithm?

2. Performance metrics for evaluating algorithms

3. Data – training and testing

4. When an algorithm fails – implications

5. Run AI in your department!
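
When an algorithm is run on local data, the number of positive cases in the test set limits how precisely its sensitivity can be estimated. The sketch below uses the Wilson score interval, which is one common choice and not necessarily the method the authors have in mind; the case counts (18 of 20 and 180 of 200 detected) are made up for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Two hypothetical local test sets with the same observed sensitivity (90%)
# but different numbers of positive cases.
small_lo, small_hi = wilson_interval(18, 20)
large_lo, large_hi = wilson_interval(180, 200)

print(f"20 positives:  {small_lo:.3f} to {small_hi:.3f}")
print(f"200 positives: {large_lo:.3f} to {large_hi:.3f}")
```

Both test sets report the same point estimate, but the interval from 20 positive cases is much wider than the one from 200, so a headline sensitivity figure means little without the number of cases behind it.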