Common ground: NSF-funded project will standardize how geoscientists describe their data, making information easier to find and use


The National Science Foundation (NSF) has awarded $3.2 million to a multidisciplinary team of researchers working to standardize how Earth science data is described, enabling scientific data search engines that not only make data easier to discover but also make it easier to use.

Shuang Zhang, assistant professor in the Department of Oceanography in Texas A&M University’s College of Arts and Sciences, is a co-principal investigator for the Democratized Cyberinfrastructure for Open Discovery to Enable Research (DeCODER) project, which began Oct. 1.

Kenton McHenry, National Center for Supercomputing Applications (NCSA) associate director for software, is leading the project, which will expand and extend NCSA’s EarthCube GeoCODES framework and community to unify data and tool description and reuse across geoscience domains.

“The internet works because of defined standards and protocols (e.g., TCP/IP, HTTP, HTML),” McHenry said. “This allows software, which must be sustained, to change and evolve over time, with better software with new features to emerge, while still allowing everything to just work from the user perspective. That’s what we are doing here for research data through the adoption of science-on-schema.”

The DeCODER project is a collaborative research effort between NCSA, the San Diego Supercomputer Center (SDSC), Scripps Institution of Oceanography, Virginia Tech, Syracuse University, Texas A&M University and the University of California, Berkeley.

“This effort will assist researchers seeking to reuse data and bridge subdomains, especially in the Earth sciences,” said Christine Kirkpatrick, division director of research data services at SDSC.

The project will leverage the DeCODER platform to enable similar activities and outcomes across scientific communities, such as ecological forecasting, deep ocean science and low-temperature geochemical science.

“The past several decades have seen a proliferation in the amount of data documenting Earth’s low-temperature surface processes, such as global carbon cycling through the river-land-atmosphere system and the interplay between anthropogenic footprints and environmental feedbacks,” said Texas A&M’s Zhang. “Coupling data science techniques with these datasets is helping reveal the intrinsic patterns of nature’s low temperature processes that are sometimes extremely difficult to be captured by classical physical process models.”

However, due to the inherent complexity of Earth’s surface processes, Zhang said the datasets documenting them typically originate from a wide range of disciplines and deposition locations and also vary in size and format, which hinders data-driven discoveries.

“The DeCODER project will help the community of low-temperature geochemistry to build an online searching framework to retrieve the high-dimensional datasets in a more streamlined and efficient way,” he said. “Part of the outcome of DeCODER is expected to greatly push forward the fundamental research in using data to delineate Earth’s surface processes and patterns both on the regional and global scale.”