$1.5 million grant to advance ‘big data’ for genomic research

PULLMAN, Wash. - Scientists at Washington State University have received a grant from the National Science Foundation to help meet the growing needs of the data-driven genomic science community. The Tripal Gateway project will build on existing cyberinfrastructure to enhance the capacity of genomic databases to manage, exchange and process “big data.”

“Now, in a single day some modern DNA sequencers can output as much data as the human genome,” Stephen Ficklin said. “We expect the deluge of data to continue to grow exponentially.”

Ficklin, the lead investigator and a research scientist in the department of horticulture at WSU, said that just as dramatic improvements in computers lowered costs and allowed for mass production, DNA sequencing technologies are undergoing a similar transition. The challenge is no longer the affordability of DNA sequencing, he said.

The WSU project is one of 17 grants, totaling $31 million, awarded by the National Science Foundation Data Infrastructure Building Blocks (DIBBS) program.

Sharing information

Genomic research relies on community databases (websites that house genomic, genetic and breeding data) used by scientists working in the same research area, such as cotton, cacao (chocolate), or plants in the Rosaceae family like apple, cherry and pear.

By creating ways to easily share data between community databases on demand, the project will spare researchers from having to navigate multiple websites to obtain the information they need.

“Genomics scientists who can access large data sets but have limited resources for storing, sharing and analyzing them will benefit from this work,” Ficklin said.

The three-year project will also use software-defined networking technology to quickly transfer large data sets between computational resources and the databases to support data sharing and analysis. Ultimately, it will link existing community databases for fruit and hardwood trees, as well as legumes, into a larger network of online research databases.

Tripal software

The project is based on open-source software known as Tripal (http://tripal.info), originally developed by Ficklin and Meg Staton at Clemson University and significantly enhanced by Dorrie Main at Washington State University and Kirsten Bett at the University of Saskatchewan. Tripal is used by at least 24 different plant and animal databases, including the Genome Database for Rosaceae (GDR) and community databases for 24 crops developed by the Main lab. Main is a co-investigator of the new project.

The project team also includes Sook Jung, Washington State University; Alex Feltus and Kuang-Ching Wang, Clemson University; Meg Staton, University of Tennessee; and Jill Wegrzyn, University of Connecticut.