Learning to love the data

In 1994, a young graduate student named Matt Sigman found himself at a biotech firm in Boulder, Colorado, watching scientists run millions of experiments in search of a single result. The company was pioneering a process that used vast libraries of DNA and RNA to find molecules that would bind to specific targets. It was an elegant, almost Darwinian approach to chemistry, "evolving those sequentially to bind better and better and better," Sigman recalls.

But something nagged at him. Of the million compounds being screened, researchers were paying attention only to the tiny fraction that worked. The rest were being set aside: the failures, the near-misses, the data that didn't really "sing." "Wait," he remembers thinking, "there are 999,000 experiments you don't know the answer to, and you're only focused on things that work."

The question of what researchers miss when they chase only success would quietly animate the next three decades of Sigman's career and ultimately lead to one of the most distinctive research programs in modern chemistry.

This spring, the University of Utah professor of chemistry was elected to the American Academy of Arts & Sciences, one of the most prestigious honors in American intellectual life, awarded not merely for expanding knowledge but for changing how an entire field thinks.

Language of molecules

In a recent Nature paper, Sigman and his team described a machine-learning system that, though trained on sparse data, predicts how molecules form, cutting months of lab work to days at a fraction of the cost.

Sigman calls what his lab does "data chemistry." It is a phrase he cheerfully admits was partly a grant-writing invention. "Data chemistry is a sales pitch term we had to use to get money from the NSF," he says, smiling. Even so, the science behind it is serious, and the insight at its core is genuinely original: that chemical reactions, traditionally understood through intuition and bench-top experimentation, can be described mathematically with enough precision to predict outcomes before a new experiment is performed.

The challenge Sigman has spent his career solving is one of the hardest in pharmaceutical chemistry. It turns out that molecules, like human hands, can exist as mirror images of each other, and the human body responds to each image very differently. One mirror image of a drug molecule might cure a disease; its twin might do nothing . . . or cause harm. Reliably producing the correct mirror image, the goal of what chemists call enantioselective catalysis, has traditionally required running fifty, sixty, sometimes hundreds of lab reactions. Sigman's models can reduce that to five or ten, and they do so while also teaching chemists something new about why the reaction works.

"The why and how of reactions are super important to my interests," he says. "We love the nuance of reactions and processes, and the kind of modeling that we do is all about the nuance." His most recent paper, published in Nature, pushed the approach even further: a machine-learning model, trained on data from a select handful of published papers from other labs, that predicts reactions it had never encountered. The key innovation, he explains, lies in how his team describes molecules mathematically. These descriptions are not crude approximations but computational portraits that capture the exact moment a chemical bond forms, what chemists call the transition state. "Our secret sauce is understanding how you describe the molecules mathematically," he says. "If your computations are asking the wrong question, it's junk in, junk out."

Asking hard science questions

Sigman has been at the University of Utah for nearly thirty years, including a term as chair of the Department of Chemistry, and he speaks of the department's culture with genuine warmth. He has won the university's Distinguished Teaching and Distinguished Mentoring awards, honors he counts without hesitation among his most meaningful. "I think my favorite awards are actually the ones I've won here for teaching and mentoring," he says. "That's not accidental. It allows me to take students and help them find their way to being excellent in their work, and therefore we can ask hard science questions."

That philosophy extends to collaboration. His Nature paper involved a joint postdoctoral researcher, Simone Gallarati, who solved a key computational problem, and a close collaborator at UCLA, Abby Doyle, whose graduate student, Erin Bucci, ran the crucial validation experiments. It is a division of labor that Sigman describes as the lifeblood of his research. "I feel completely privileged," he says, "not only to recruit incredibly talented people to my group, but to get to work on hard and interesting problems. Collaboration has been the driver of all of this, really."

AI and intentionality

Asked about a future in which artificial intelligence reshapes not just chemistry but nearly every field, Sigman is thoughtful and honest about the uncertainty. He believes that data-driven models will eventually reduce the need for human chemists at the bench, though how far and how fast, he admits, no one really knows. "Can you remove the chemist from the loop? I don't know the answer to that," he says. What he does know is that every experiment should mean something. "Every experiment should be a physical organic experiment," he says. "You don't want to waste any experiments. You want everything to be intentional."

Intention matters more broadly as well. Scientists, he argues, are by the nature of their work typically too busy, too specialized, and too comfortable inside the lab to make their case to the public and the policymakers who ultimately fund and shape the enterprise. "We've never been great advocates of ourselves," he says, "and we own some of that." It is a rare moment of institutional self-criticism from a scientist who has spent his career pushing chemistry to be more rigorous and more willing to confront uncomfortable data, including, it seems, the data about science's own public standing. The logical next step is to communicate, strategically (and relentlessly), the value of scientific inquiry and higher education to policymakers and the community at large.

It is, in miniature, the ethic that has driven him since that Boulder lab in 1994: the conviction that the data we ignore is as important as the data we celebrate, and that inside the noise, if you're willing to look, there is always a pattern waiting.

by David Pace

Chemist Matt Sigman elected to the American Academy of Arts & Sciences