Shaping the future: topology and big data

18 Mar 2019 - 12:00

UCT mathematical physicist Professor Jeff Murugan applies topological data analysis to how the brain learns

As a mathematical physicist and ‘experimental’ mathematician, University of Cape Town (UCT) Professor Jeff Murugan’s day job revolves around ideas in string theory, quantum gravity and quantum chaos, or as he describes it: “identifying patterns pretty much anywhere that I can find them”.

At a recent Café Scientifique event – hosted by UCT Research Contracts and Innovation (RC&I) and supported by leading intellectual property law firm Spoor & Fisher – Murugan explained how he recently became interested in something called topological data analysis. “I think this is extremely interesting given the focus on big data as a central factor in the Fourth Industrial Revolution,” he says. “While I think most people have an intuitive understanding of what data analysis means, topology is less well known and often confused with geometry.”

A cube is a ‘2’ but so is a sphere

What is topology? Murugan explains by doing some basic origami. “If I take this rectangular piece of paper and attach it end to end, I get a cylinder. Glue together the remaining two ends and you get a torus, which is just fancy mathematics-speak for the surface of a doughnut,” he says.
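
The origami can be mimicked in a few lines of code. This is a minimal sketch under our own modelling assumptions, not anything from Murugan's work: the sheet is drawn as an m-by-n grid of squares (with m, n of at least 3), and gluing two opposite edges is modelled by reading the matching coordinate modulo the grid size. The hypothetical helper `grid_euler` then counts vertices - edges + faces, the shape-describing number Murugan turns to next.

```python
def grid_euler(m, n, glue_x=False, glue_y=False):
    """Vertices - edges + faces for an m-by-n grid of squares drawn on the sheet.

    glue_x glues the left edge of the sheet to the right edge; glue_y glues the
    top edge to the bottom edge. Gluing is modelled by reading a coordinate
    modulo the grid size, so matching points on glued edges become one vertex.
    """
    if m < 3 or n < 3:
        raise ValueError("use m, n >= 3 so gluing never merges two distinct edges")

    def vertex(i, j):
        return (i % m if glue_x else i, j % n if glue_y else j)

    vertices, edges, faces = set(), set(), 0
    for i in range(m):
        for j in range(n):
            # Corners of square (i, j), listed in order around its boundary.
            corners = [vertex(i, j), vertex(i + 1, j), vertex(i + 1, j + 1), vertex(i, j + 1)]
            vertices.update(corners)
            edges.update(frozenset((corners[k], corners[(k + 1) % 4])) for k in range(4))
            faces += 1
    return len(vertices) - len(edges) + faces


print(grid_euler(4, 3))                            # 1: flat rectangle
print(grid_euler(4, 3, glue_x=True))               # 0: cylinder
print(grid_euler(4, 3, glue_x=True, glue_y=True))  # 0: torus
```

The flat sheet scores 1, while the cylinder and the torus both score 0, so a single number is a useful fingerprint but does not by itself pin down every shape.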

“Topology is the study of how to quantify the description of such shapes. One way to do this is to assign a number to the shape. For example, take a simple cube: it has a certain number of faces, edges (where two faces meet) and vertices (where three edges meet).

“It turns out that one useful shape-describing number is given by the combination vertices - edges + faces. This is called the Euler number, named after the 18th-century polymath Leonhard Euler. The Euler number for a cube is 8 - 12 + 6 = 2.”
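
The cube count can be checked mechanically. In this sketch (our own construction, not from the source) the cube's corners are the eight points with coordinates 0 or 1, each face fixes one coordinate, and two corners share an edge exactly when they differ in a single coordinate.

```python
from itertools import combinations, product

# The 8 corners of a unit cube, as (x, y, z) coordinates.
vertices = list(product((0, 1), repeat=3))

# Each face holds one coordinate fixed at 0 or 1: 3 axes x 2 values = 6 faces.
faces = [
    [v for v in vertices if v[axis] == value]
    for axis in range(3)
    for value in (0, 1)
]

# Two corners are joined by an edge exactly when they differ in one coordinate.
edges = [
    (u, v)
    for u, v in combinations(vertices, 2)
    if sum(a != b for a, b in zip(u, v)) == 1
]

euler = len(vertices) - len(edges) + len(faces)
print(len(vertices), len(edges), len(faces), euler)  # 8 12 6 2
```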

“That’s the first thing you have to know about topology,” says Murugan, “it deals with integers – or whole numbers.”

Second, topology is blind to small deformations of the shape. “If you take a sphere and you squash it or stretch it, it makes no difference at all to its topology. One way to see that is that its Euler number remains 2,” he explains. “The only way that it stops being a sphere is if you do something drastic like cut it or puncture it, then it becomes something else. Topologically speaking, a sphere with a puncture is a disc. Puncture it twice and you’ll end up with a cylinder – try it.”
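
Both claims can be tried out with a small sketch, under our own modelling assumptions: a surface is a list of polygonal faces, each a tuple of (hypothetically numbered) vertex labels in order around the face, and a puncture is modelled by deleting one whole face, which widens the hole but, topologically, still leaves a disc. Because only labels are counted, stretching or squashing the coordinates cannot change the result; a tetrahedron, a very differently shaped "sphere", also scores 2.

```python
def euler_number(faces):
    """V - E + F for a surface given as a list of polygonal faces.

    Each face is a tuple of vertex labels in cyclic order. Vertices and edges
    are read off the faces, so positions (and hence stretching) never enter.
    """
    vertices = {v for face in faces for v in face}
    edges = {
        frozenset((face[k], face[(k + 1) % len(face)]))
        for face in faces
        for k in range(len(face))
    }
    return len(vertices) - len(edges) + len(faces)


# Cube corners labelled 0..7; each face lists its four corners in order.
BOTTOM, TOP = (0, 1, 3, 2), (4, 5, 7, 6)
SIDES = [(0, 1, 5, 4), (2, 3, 7, 6), (0, 2, 6, 4), (1, 3, 7, 5)]
cube = [BOTTOM, TOP] + SIDES
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

print(euler_number(cube))           # 2: a sphere, squashed into a cube
print(euler_number(tetrahedron))    # 2: a sphere, squashed differently
print(euler_number([TOP] + SIDES))  # 1: one puncture (bottom removed) leaves a disc
print(euler_number(SIDES))          # 0: two punctures leave a cylinder
```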

Mapping data through shape

But what does topology have to do with big data? “If you think of data as being a series of points in some large space, you can see how using topology to process data could be useful,” says Murugan.
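
One of the simplest topological summaries of a point cloud is how many separate pieces it falls into, tracked across scales (zero-dimensional persistent homology). The sketch below is illustrative only and is not taken from the work Murugan describes: it links points in order of distance, Kruskal-style, and records the distance at which each piece merges into another. Pieces that survive over a long range of scales are robust features, while ones that vanish almost immediately are noise, which echoes topology's blindness to small deformations. The six points are made up and live in four dimensions; `math.dist` accepts points with any number of coordinates.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Scales at which connected components merge as the linking distance grows.

    n points give n - 1 finite merge distances; one component never merges
    and is not listed.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    pairs = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # this link joins two separate pieces
            parent[ri] = rj
            deaths.append(d)
    return deaths


# Two tight clusters of three points each, in four dimensions (made-up data).
points = [
    (0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.0, 0.0), (0.0, 0.1, 0.0, 0.0),
    (5.0, 5.0, 5.0, 5.0), (5.1, 5.0, 5.0, 5.0), (5.0, 5.1, 5.0, 5.0),
]
print([round(d, 2) for d in h0_persistence(points)])  # [0.1, 0.1, 0.1, 0.1, 9.95]
```

Four merges happen at a distance of about 0.1 (inside each cluster) and one only at about 9.95, so across almost every scale the data reads as two pieces. Checking every pair is fine for a toy example but scales poorly to large datasets.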

He explains using the example of the recent discovery of a new kind of breast cancer tumour by a team of three applied mathematicians. “These were not oncologists or any kind of clinician,” he says, “and yet, by looking at the shape of the data they were able to account for a strange anomaly.”

The anomaly was one that had puzzled oncologists for some time. Within the category of breast cancer patients with oestrogen-receptor-positive tumours, known to be very aggressive, there was a subset of patients whose tumours spontaneously resolved, were benign and did not metastasize.

“To better understand this anomaly, these mathematicians studied a dataset of 240 patients with oestrogen-receptor-positive tumours. These tumours have something like 24 000 different gene-expression levels. I don’t know about you, but I lose my keys in three-dimensional space! Imagine looking for something in 24 000-dimensional space and you have an idea of the challenge data scientists face daily.”

To process this big data in a meaningful way, the team turned to topological data analysis. “When they used this method, they were able to visualise the shape of the data in data space and discovered something very interesting: the data, once mapped, looked a bit like a ‘Y’.

“In addition to the two main branches, corresponding to two kinds of malignant tumour, they discovered a third – the benign kind. In essence, what researchers had seen until then was like looking at a ‘Y’ shape from above: you see two arms but you can’t see the stem.”
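
The article does not say which algorithm produced the ‘Y’. Mapper is one widely used topological data analysis method for drawing the shape of data as a graph, and the sketch below is a minimal version of that idea under our own assumptions: the "lens" is simply height, the windows are fixed and overlapping, and clustering links points closer than a hand-picked `eps`. The points are made up, two-dimensional and Y-shaped for readability; `math.dist` would accept points with thousands of coordinates unchanged.

```python
import math
from itertools import combinations


def clusters(points, members, eps):
    """Connected components of the graph joining members closer than eps."""
    parent = {i: i for i in members}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b in combinations(members, 2):
        if math.dist(points[a], points[b]) < eps:
            parent[find(a)] = find(b)
    groups = {}
    for i in members:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def mapper(points, lens, intervals, eps):
    """Minimal Mapper graph: one node per cluster inside each overlapping lens
    interval, and an edge wherever two clusters share a data point."""
    nodes = []
    for lo, hi in intervals:
        members = [i for i, p in enumerate(points) if lo <= lens(p) <= hi]
        nodes.extend(clusters(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges


# Made-up Y-shaped cloud: a vertical stem that splits into two arms at height 3.
stem = [(0.0, 0.25 * k) for k in range(13)]
left = [(-0.25 * k, 3 + 0.25 * k) for k in range(1, 9)]
right = [(0.25 * k, 3 + 0.25 * k) for k in range(1, 9)]
points = stem + left + right

# Lens = height; five overlapping windows of width 1.5 covering heights 0 to 5.
intervals = [(s, s + 1.5) for s in range(5)]
nodes, edges = mapper(points, lens=lambda p: p[1], intervals=intervals, eps=0.4)
degree = [sum(n in e for e in edges) for n in range(len(nodes))]
print(len(nodes), len(edges), sorted(degree))  # 6 5 [1, 1, 1, 2, 2, 3]
```

The resulting graph is itself a ‘Y’: a chain of stem nodes meets a single branching node of degree 3, and the three nodes of degree 1 are the tips of the stem and the two arms.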

AI and radiology

Murugan finds topological data analysis most interesting when it is applied to how the brain learns.

“Another interesting application of this is something I am currently doing with my postgraduate researcher Duncan Robertson and a United Kingdom-based company that is working on artificial intelligence (AI) to read x-rays,” he says. In many contexts around the world, x-ray machines are available, but radiologists are not.

“So you have patients who can get an x-ray but then there is no one close by who can read it,” says Murugan. “If this work is successful, it would make highly reliable remote diagnoses possible, and immediate.

“This is just one example of how these methods can be used.”

“If the field of physics in the 20th century showed us our place in the world, I think that the 21st century will see such diverse fields as physics, data science, medicine and neuroscience come together in ways we have never seen before with one common thread: information.

“Topology, or the ability to assign discrete numbers to things and thereby extract simplicity from complexity, is an invaluable tool in understanding how information works.”

Story: Ambre Nicolson

Photo: Katherine Traut
