The promise of machine learning in cancer screening

by Dr. Kevin Lan
An image of a melanoma lesion, with characteristic irregular borders. Image source: Larry Meyer, National Cancer Institute.

Early diagnosis and treatment can greatly improve the survival rate of cancer patients, but the stage at which a cancer is first diagnosed can vary greatly between cancer types, and between individual patients with the same cancer type. Sometimes, a cancer may be missed at first because doctors don’t yet have the proper diagnostic tools, or because symptoms are not present until the cancer has already progressed to a late stage. Unfortunately, diagnosis can also be delayed if a patient does not have easy access to a primary healthcare provider, whether because they live in a rural location, do not have healthcare coverage, or come from a low socioeconomic background. The process of diagnosis can also be drawn out, sometimes requiring complicated tests as well as imaging that can only be interpreted by medical specialists. However, computer scientists are developing new strategies for increasing access to diagnosis, and the speed of diagnosis, by automating some aspects of medical test interpretation.

Computer scientists have two broad strategies for solving these problems. In the first strategy, a list of explicit instructions is given to a computer to perform a task. For example, if we wanted to go through some patient health records and pick out all the patients who have already been diagnosed with skin cancer, we could tell a computer to 1) start at the top of the list of patients, 2) look at the health record, under the “diagnosis” category, 3) if it says “skin cancer”, move the record to a new list of patients, 4) move down by one in the original list, and 5) go back to step 2 until we’ve gone through the entire list. However, not all problems can be solved using this traditional strategy. For example, if we want a program that can look at digital pictures of skin lesions and classify them as “cancer” or “not cancer”, the problem becomes much more complex. To a computer, a digital picture is just a grid of thousands of pixels, each pixel stored as numbers that represent a colour. A doctor may be able to tell you that a skin lesion with irregular borders is more likely to be cancer than a skin lesion with regular borders, but how would you translate this kind of medical knowledge into a computer program that looks at a grid of numbers? This is where a second strategy, called machine learning, comes in.
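The five-step record search described above can be sketched in a few lines of Python. This is only an illustration: the patient records and the field names ("name", "diagnosis") are invented for the example, and matching records are copied to a new list rather than moved out of the original one.

```python
# Hypothetical patient records; names and field names are assumptions for illustration.
records = [
    {"name": "Patient A", "diagnosis": "skin cancer"},
    {"name": "Patient B", "diagnosis": "eczema"},
    {"name": "Patient C", "diagnosis": "skin cancer"},
]


def find_diagnosed(records, target="skin cancer"):
    """Walk the list from the top and collect records whose diagnosis matches."""
    matches = []
    for record in records:                  # steps 1, 4 and 5: go through the list in order
        if record["diagnosis"] == target:   # steps 2 and 3: check the "diagnosis" field
            matches.append(record)          # copy the matching record to the new list
    return matches


skin_cancer_patients = find_diagnosed(records)
print([r["name"] for r in skin_cancer_patients])
```

Every decision the program makes is spelled out in advance by the programmer, which is exactly what becomes impractical when the input is a photograph rather than a text field.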

In machine learning, we do not supply explicit instructions to the computer in order to complete a given task. Instead, we can “train” a computer to complete a task using information or data that we already have. One simple application of machine learning is in teaching a computer to read handwritten digits, similar to what is used by banks to read handwritten checks. To do this, a computer can be given a sample set of “inputs” (thousands of images of handwritten numbers) and a matched set of “outputs” (the actual numbers themselves, e.g. 0, 1, 2, etc.). Using different strategies, an algorithm can then be created to assign the inputs to the outputs, interpreting the information found in the pixels of each image accurately to read the number. The advantage of this approach is that we don’t have to know beforehand that a certain combination or pattern of pixels corresponds to a number; rather, the rules are determined automatically by the computer. As more data are processed by the computer, the algorithm is refined to accommodate the new information (a process called “learning”), and its accuracy generally improves. This particular application of machine learning is also known as image classification, because the goal is to develop an algorithm that automatically assigns pictures to categories.
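To make the idea of learning from matched inputs and outputs concrete, here is a minimal sketch in Python. It uses one of the simplest possible strategies, a nearest-average (nearest-centroid) classifier, which is an assumption chosen for brevity and not the method used by banks or by the study discussed in this article. The tiny 3x3 "images" are invented toy data.

```python
# Toy 3x3 "images" flattened to 9 pixels (1 = ink, 0 = blank), paired with their labels.
# This data is invented for illustration only.
training_images = [
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], 0),  # a clean "0"
    ([1, 1, 1, 1, 0, 1, 0, 1, 1], 0),  # a "0" with a missing corner
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], 1),  # a clean "1"
    ([1, 1, 0, 0, 1, 0, 0, 1, 0], 1),  # a "1" with a small flag
]


def train(examples):
    """Learn one 'average image' per label from the labelled examples."""
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, pixels):
    """Return the label whose average image is closest to the new image."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: distance(model[label], pixels))


model = train(training_images)
new_image = [1, 1, 1, 1, 0, 1, 1, 1, 0]  # a "0" the program has not seen before
print(predict(model, new_image))
```

Nowhere in this code is there a rule saying which pixels make a "0"; the average images are computed from the examples, so adding more labelled examples changes the rules without anyone rewriting the program.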

Image classification can also be adapted to more complex tasks with important implications in medicine. In fact, machine learning was used in a recent study that created an algorithm to classify pictures of skin lesions into one of 757 disease states (including broad categories like “cancer” or “non-cancer”), using a database of 129,450 clinical images. In this database, each picture was previously assigned to one of the 757 disease states by a dermatologist. The researchers then took the majority of the dataset (127,463 images) and used them to train a machine learning algorithm that can assign images to the specific disease classes. Using the remaining 1,942 images, the researchers could then test the accuracy of the algorithm and compare it with the ability of human experts (dermatologists) to classify the images. When classifying each picture as “benign”, “malignant”, or “non-neoplastic”, the algorithm had an accuracy of approximately 72.1%, while dermatologists examining the same images had an accuracy of approximately 66%. The researchers also point out that this algorithm can be readily adapted to smartphone technologies, potentially as a first screening tool for a new skin cancer lesion at home or to assist primary care doctors who do not have the expertise of specialists.
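The accuracy figures above come from a simple calculation: the number of held-out images classified correctly, divided by the number of held-out images. A minimal sketch in Python follows; the five labels are invented for illustration and are not data from the study.

```python
def accuracy(predicted, actual):
    """Fraction of held-out examples where the prediction matches the expert label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Invented labels for five held-out images; not data from the study.
expert_labels = ["benign", "malignant", "benign", "non-neoplastic", "malignant"]
algorithm_guess = ["benign", "malignant", "malignant", "non-neoplastic", "benign"]

print(accuracy(algorithm_guess, expert_labels))  # 3 of the 5 guesses match
```

Keeping the test images separate from the training images matters: an algorithm scored on pictures it has already seen could look accurate simply by memorising them.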

In a real-life situation, the visual examination of a lesion is only one of many strategies that a doctor can use to assess it. For example, skin lesions that look “cancerous” would be sampled for biopsy to determine whether they are truly cancer or not. A dermatologist would also have access to information beyond the visual inspection of the skin that helps in the initial assessment (for example, how long the lesion has been there, or whether there is any family history of skin cancer). Therefore, care should be taken in how this technology is incorporated into existing clinical practices. Machine learning has many additional applications in medical diagnosis, for example, in detecting heart rhythm defects from ECGs, or identifying diabetic retinopathy from images of human retinas. The fact that a computer algorithm can perform certain tasks at a level comparable to human experts suggests that the process of diagnosis can be much more efficient in the future, allowing doctors to dedicate more time to the more human aspects of medicine.


  1. Esteva, A., et al., Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017. 542(7639): p. 115-118.
  3. Gulshan, V., et al., Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA, 2016. 316(22): p. 2402-2410.


