Ingredients for Data Science

Posted November 16, 2023

“Sometimes [data science] can feel like alchemy,” says Anna Little, assistant professor of mathematics. “Like we’re just stirring this big pile of math until the results look right.” Little was the featured speaker at the College of Science’s Science at Breakfast speaker series on November 2 at the Natural History Museum of Utah. She titled her remarks “Challenges of the Modern Data Era.”

There are three key challenges in data science today: achieving effective knowledge transfer, producing reliable data visualizations and building physically meaningful machine learning. All are issues that Anna Little’s research aims to address.

Effective knowledge transfer centers on what it means for two tasks to be similar. With so many different applications, it becomes difficult to predict that similarity accurately. “An alternative to assessing similarity is to think about distances depending on conditions in some underlying network, not just the individual points,” Little said, “investigating novel ways of measuring distance.”
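One way to picture a distance that depends on an underlying network, rather than on individual points alone, is a shortest-path distance through a neighbor graph. The sketch below is only an illustration under stated assumptions, not the method from Little's research: the U-shaped points, the `radius` parameter and the function names are all hypothetical.

```python
import heapq
import math

def neighbor_graph(points, radius):
    """Connect every pair of points whose straight-line distance is at most radius."""
    graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= radius + 1e-12:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph

def path_distance(graph, source, target):
    """Length of the shortest path through the graph (Dijkstra's algorithm)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf  # target is not reachable

# Hypothetical data: points laid out along a U shape.
points = [(0, 3), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
graph = neighbor_graph(points, radius=1.0)

straight_line = math.dist(points[0], points[-1])
through_network = path_distance(graph, 0, len(points) - 1)
print(straight_line, through_network)
```

The two tips of the U are close in a straight line but far apart when the distance has to follow the data, which is the kind of difference a network-based notion of distance can capture.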

Reliable data visualization deals with the patterns that we see when looking at data. Modern data tends to have a very large number of features, which makes it difficult to visualize and analyze. Through a process called dimension reduction, one can condense a large table into a smaller one that’s easier to analyze. However, dimension reduction can also cause real patterns to go undetected, create false patterns or make outliers disappear. Little’s research looks into the “best of both worlds” by using linear algorithms with better note properties for the data.
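A toy example can show how dimension reduction makes a pattern go undetected. The sketch below uses the simplest possible linear reduction, keeping only some columns of a table, and it is an assumption for illustration rather than one of the algorithms Little studies. The data and the helper names `project` and `centroid` are hypothetical.

```python
import math

def project(rows, keep):
    """Reduce each row to the listed feature columns (a simple linear projection)."""
    return [tuple(row[i] for i in keep) for row in rows]

def centroid(rows):
    """Average of the rows, feature by feature."""
    return tuple(sum(column) / len(rows) for column in zip(*rows))

def cluster_gap(a, b):
    """Distance between the centers of two groups of rows."""
    return math.dist(centroid(a), centroid(b))

# Hypothetical table with three features; the two groups differ only in the third.
group_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
group_b = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]

gap_full = cluster_gap(group_a, group_b)
gap_lost = cluster_gap(project(group_a, (0, 1)), project(group_b, (0, 1)))
gap_kept = cluster_gap(project(group_a, (0, 2)), project(group_b, (0, 2)))
print(gap_full, gap_lost, gap_kept)
```

Both reductions shrink the table from three features to two, but only one of them preserves the separation between the groups. In the other, the two groups land on top of each other and the pattern disappears.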

For the last challenge, Little reported that machine learning is currently unreliable when it comes to data science. “AI responses aren’t stable,” Little said. “We want a small change in input to lead to a small change in output, but it often leads to a big change in output, and that makes mathematicians very uncomfortable.” Machine learning performs well, but it’s difficult for data scientists to understand why or how it reaches a given conclusion.

It’s important to design machine learning features with the characteristics that one wants, and Little focuses on using translation-invariant features. This means the features all compute the same values, regardless of whether the data has shifted in terms of location or interference.
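A classical example of a feature that does not change when the data shifts is the magnitude of a signal's Fourier transform. The sketch below is offered only as an illustration of translation invariance and is not necessarily the kind of feature Little uses. It assumes shifts wrap around the end of the signal (circular shifts), and the signal values and function names are hypothetical.

```python
import cmath

def dft_magnitudes(signal):
    """Magnitude of each discrete Fourier coefficient of the signal."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

def circular_shift(signal, steps):
    """Move every entry 'steps' places to the right, wrapping around the end."""
    steps %= len(signal)
    return signal[-steps:] + signal[:-steps]

# Hypothetical signal: a small bump, then the same bump moved three places over.
signal = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shifted = circular_shift(signal, 3)

features = dft_magnitudes(signal)
features_shifted = dft_magnitudes(shifted)
largest_difference = max(abs(a - b) for a, b in zip(features, features_shifted))
print(largest_difference)
```

The raw values of the two signals differ, yet the features agree up to floating-point rounding, because a shift changes only the phase of each Fourier coefficient and not its magnitude.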

Anna Little was born in Alabama but spent most of her childhood in Europe. She received a bachelor’s in mathematics from Samford University and completed a PhD in mathematics at Duke University before arriving at the U in 2021.

Data science is crucial, but it comes with plenty of difficulties.