An algorithm that learns to diagnose genetic disease



Scientists from the fields of physics and medicine at the University of Gothenburg and Nova Medical School have collaborated to find a way of using machine learning to help diagnose patients with elevated cholesterol levels.

By training an algorithm with patient data, they have found a way of estimating how likely it is that a patient is suffering from a common genetic disease.

Familial hypercholesterolemia (FH) is a common genetic disorder that affects 1 in 250 people. It interferes with metabolism and causes elevated cholesterol levels. A definitive diagnosis requires a genetic test.

However, these tests are only available at a limited number of university hospitals. All other hospitals use a method called the Dutch lipid score, which is based on clinical characteristics and family history. This test estimates the likelihood of FH rather than diagnosing it, and is a more accessible, but less accurate, tool than a genetic test.

Now a collaborating group of researchers from the Department of Physics and the Department of Molecular and Clinical Medicine at the University of Gothenburg have discovered an alternative method of diagnosing FH.

“We decided to try to apply machine learning to predict if a patient has the hereditary genetic disease or if their clinical manifestations are related to lifestyle,” says Saga Helgadottir, PhD student at the Department of Physics. “To do this, we used three different algorithms: a classification tree, a gradient boosting machine and a neural network. We found that all three performed better than the traditional Dutch lipid score.”

The researchers used half of a data set of patients in Gothenburg to train the algorithms, and the other half of the Gothenburg data set to test how accurately they could predict whether a patient was suffering from FH. For every patient, they provided a few basic variables that any hospital could obtain, like age and blood test results. The scientists then tested the algorithms trained on Gothenburg data using a different data set from a similarly afflicted patient group in Milan. The results showed that the algorithms performed more accurately when “locally trained”, but they still outperformed the Dutch lipid score when used on the external Milan patients’ data.
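The evaluation design described above (train on one half of a local cohort, test on the other half, then test again on an external cohort) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the variable names (ldl, age), the cohort sizes and the 30 percent FH rate are invented, and a one-split classification tree stands in for the three algorithms used in the study. It is not the researchers' code.

```python
import random

def make_cohort(n, ldl_shift, seed):
    """Generate synthetic patients as ((ldl, age), has_fh). Illustrative only."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        has_fh = rng.random() < 0.3  # assumed FH rate in a referred cohort
        # Assumption: FH patients tend to have higher LDL cholesterol.
        ldl = rng.gauss(7.0 if has_fh else 5.0, 1.0) + ldl_shift
        age = rng.uniform(20, 80)
        cohort.append(((ldl, age), has_fh))
    return cohort

def accuracy(records, stump):
    """Share of records where 'feature >= threshold' matches the label."""
    feature, threshold = stump
    correct = sum((x[feature] >= threshold) == y for x, y in records)
    return correct / len(records)

def train_stump(train):
    """Fit a one-split classification tree by exhaustive search over
    every feature and every observed value as a candidate threshold."""
    best_stump, best_acc = None, -1.0
    n_features = len(train[0][0])
    for feature in range(n_features):
        for x, _ in train:
            candidate = (feature, x[feature])
            acc = accuracy(train, candidate)
            if acc > best_acc:
                best_stump, best_acc = candidate, acc
    return best_stump

# Local cohort: shuffle, then split 50/50 into training and test halves.
local = make_cohort(400, ldl_shift=0.0, seed=1)
random.Random(0).shuffle(local)
half = len(local) // 2
train, test = local[:half], local[half:]

# External cohort with a slightly different LDL distribution.
external = make_cohort(200, ldl_shift=0.5, seed=2)

stump = train_stump(train)
local_acc = accuracy(test, stump)
external_acc = accuracy(external, stump)
print(f"split on feature {stump[0]} at {stump[1]:.2f}")
print(f"held-out local accuracy: {local_acc:.2f}")
print(f"external accuracy:       {external_acc:.2f}")
```

Keeping the test half and the external cohort entirely out of training is what makes the two accuracy figures comparable: the first measures performance on unseen local patients, the second on patients from a different population.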

“Depending on how well you train the algorithm, the test would be more or less precise. Basically, hospitals would have the choice of training the algorithm using their own patient data and getting a more accurate result when applying it to new patients in need of a diagnosis, or they could use a pre-trained algorithm and still get better results than when using the Dutch lipid score,” Saga Helgadottir explains.

Moving forward, one potential route for the team would be to train the algorithm on a large quantity of data from different patient groups around the world and compare its accuracy with that of a version trained only on local data. That idea hinges on the availability of such data sets, according to Saga Helgadottir.

The results of the study are published in the European Journal of Preventive Cardiology. Collaborating researchers in the study include Stefano Romeo, Professor in Molecular and Clinical Medicine at the University of Gothenburg and Senior Consultant in Cardiology at Sahlgrenska University Hospital; Ana Pina, specialized physician, data scientist and researcher at Medir, Chronic Diseases Research Center, Nova Medical School; and Giovanni Volpe, senior lecturer at the Department of Physics, University of Gothenburg.

Link to scientific paper:

Text: Carolina Svensson

Saga Helgadottir
031-786 9157

Illustration AI, photo kentoh, Mostphotos.
Photo of Saga Helgadottir (GU)