Rutgers Protein Data Bank expands storage, research access with AWS

Jessica Perry//May 4, 2022

Part of the Rutgers Institute for Quantitative Biomedicine, the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) is expanding its storage capabilities.

Amazon Web Services’ Open Data Sponsorship Program will provide RCSB PDB with more than 100 terabytes of storage for no-cost delivery of Protein Data Bank information to millions of scientists, educators and students worldwide. The arrangement, which will benefit those working in fundamental biology, biomedicine, bioenergy, bioengineering and biotechnology, was announced April 26.


“For more than five decades, the global Protein Data Bank has enabled basic, translational, and clinical research by providing open access to three-dimensional (3D) biostructure information at the atomic level,” said Dr. Stephen Burley, director of the RCSB PDB, founding director of Rutgers Institute for Quantitative Biomedicine, university professor and Henry Rutgers Chair at Rutgers University. “Open access to Protein Data Bank information is central to accelerating scientific discoveries for the benefit of all humanity.”

The AWS Open Data Sponsorship Program covers the cost of storage and egress for publicly available, high-value, cloud-optimized datasets. According to Rutgers, Amazon is working to expand open access to data by making it available for analysis on AWS; developing cloud-native techniques, formats and tools to lower the cost of working with data; and encouraging the development of communities that benefit from shared datasets.

“The Protein Data Bank plays an important role in facilitating discovery and development of life-changing drugs,” added Burley, who also co-leads the Cancer Pharmacology Research Program at Rutgers Cancer Institute of New Jersey. “Freely available 3D biostructure data constitute a public good with far-reaching impacts on patients and their families.”

The RCSB PDB has operated the U.S. data center for the global Protein Data Bank for over two decades. It is currently home to nearly 190,000 experimentally determined 3D structures of proteins, DNA and RNA, freely available with no limitations on usage, according to RCSB PDB. Its archive is jointly managed by the Worldwide Protein Data Bank partnership.

The U.S. data center is housed at Rutgers, as well as at the San Diego Supercomputer Center at the University of California, San Diego, and at the University of California, San Francisco.

“Access to open data sets is improving the way the scientific community can collaborate and accelerate life-changing discoveries,” said AWS Director, US Education, State and Local Government Verticals Josh Weatherly. “The Protein Data Bank provides a vast and diverse repository for researchers in government, academia, and industry to use to develop diagnostics, vaccines, drugs, and other therapeutic treatments.

“AWS can help provide the Protein Data Bank the capacity to scale up to meet the increasing demand to continue to provide free and open access information and unlock the latest analytic capabilities,” he said.

In 2019, the Protein Data Bank was awarded $34.5 million in grants over five years from three federal agencies. At the time, RCSB PDB said the funding marked a 5% increase from the previous five-year period and would cover the entity's ongoing operations as well as an expansion of its reach.