My wife, Kristin, jokes that whenever someone asks me what I do, her ears perk up, because she likes to hear the latest story. Even though I've been at the University of Chicago and Argonne National Laboratory for 20 years, the answer changes often enough that she needs regular updates, and this is how she gets them.

Lately, I’ve had the privilege of working on a project with the National Institutes of Health. It’s a data management project—no surprise there!—and it’s one in a series of projects aimed at getting a handle on the data being produced by NIH’s strategic initiatives.

NIH is plural: the ‘I’ stands for institutes, and there are quite a few of them, mostly housed together on a campus in Bethesda, Maryland. For the most part, each institute runs its own programs and projects that focus on specific issues, such as cancer or heart disease. The institutes don't often share data, and the data they collect isn't particularly reusable. But there's also a program called the Common Fund, which sponsors a portfolio of research initiatives, each of which generates considerable data that could be useful in other work.

Human BioMolecular Atlas Program, NIH Common Fund (This is not the project I’m working on, but it’s very cool.)

Each Common Fund initiative focuses on a specific topic. Examples include cataloging the microbes that live in our bodies, the interaction between nerves and organs, childhood cancer, and how chromosomes are organized in cell nuclei. Each initiative has a data-gathering component, a sub-team called a Data Coordination Center (DCC), which is responsible for gathering and managing the data and encouraging its use in research programs. Initiatives last up to ten years, and they start and end on a rolling basis, with 12 to 24 active at any given time.

The project I'm involved in aims to help the research community keep track of the data gathered by these initiatives and find new ways to use it. Maintaining awareness of the overall Common Fund data inventory (all the data collected by all the DCCs) is important for at least two reasons. First, it helps ensure the data can be used by appropriate research programs. Second, it helps ensure that the resources needed to maintain the data remain available over time, even after the initiative that produced the data has ended.

You might think it's easy to keep track of this data, but it isn't. More precisely, although it isn't hard to remember what kinds of data were collected, it turns out to be quite a challenge to remember enough about the data (how it was collected, where it came from, how each measurement was made) for it to be useful in other research projects. There's a lot of value in this data, but only if we remember enough about it for it to be reused. Otherwise, it's just a very expensive pile of numbers.

The team I belong to is working on a system that allows the Common Fund DCCs to reliably share all their metadata (the all-important details about the data) with NIH, so it can be remembered after each initiative has concluded. Other important benefits include helping the DCCs make their data as reusable as possible and helping medical researchers find data that could be useful in their work.

I feel privileged to be involved in work like this. It’s projects like these that keep my work at the University of Chicago interesting and worthwhile.
