Coming soon: A more powerful supercomputer for A&M research

Beginning in December 2020, Texas A&M University will offer researchers a new flagship high-performance computing (HPC) platform.  Named in memory of programming pioneer Vice Admiral Grace Hopper, the Grace supercomputer, powered by Dell Technologies, will serve as a platform for ground-breaking discoveries and innovations in science and engineering at Texas A&M. The new Grace supercomputer will replace the Ada supercomputer, which has served as the lead supercomputer at Texas A&M’s High Performance Research Computing (HPRC) since 2014.

“The Grace system represents the next generation of supercomputing—reshaping how science and engineering are transforming massive data into solutions that address the world’s greatest challenges,” Texas A&M Vice President for Research Mark A. Barteau said. “In the groundbreaking research performed today, access to superior high-performance computing is vital to our mission of advancing knowledge and inspiring innovation. Grace will enable researchers in the Texas A&M Institute of Data Science (TAMIDS) and across the University to tackle problems once thought impractical or impossible.”

The computing power of a supercomputer is measured in “flops,” which stands for “floating point operations per second.” A floating-point operation is any calculation that involves numbers that contain decimal points. One trillion flops equal a teraflop; one quadrillion flops equal a petaflop, so it takes 1,000 teraflops to equal one petaflop. At peak performance, the current Ada supercomputer delivers 337 teraflops. The new Grace supercomputer will deliver up to 6.2 petaflops (6,200 teraflops), making the platform roughly 18 times more powerful than Ada.
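For readers who want to check the comparison, the unit conversion and the ratio of the two peak figures can be worked out in a few lines. This is only an illustrative sketch: the peak values are the ones quoted above, and the names `TERAFLOP`, `PETAFLOP` and `speedup` are chosen here for the example.

```python
# Unit sizes in flops (floating-point operations per second).
TERAFLOP = 10**12   # one trillion
PETAFLOP = 10**15   # one quadrillion

# Peak figures as quoted in the article.
ada_peak = 337 * TERAFLOP
grace_peak = 6.2 * PETAFLOP


def speedup(new_peak: float, old_peak: float) -> float:
    """Return how many times larger new_peak is than old_peak."""
    return new_peak / old_peak


teraflops_per_petaflop = PETAFLOP // TERAFLOP
ratio = speedup(grace_peak, ada_peak)

print(f"1 petaflop = {teraflops_per_petaflop:,} teraflops")
print(f"Grace peak / Ada peak = {ratio:.1f}x")
```

The ratio comes out at about 18.4, which compares theoretical peak rates only and says nothing about how a particular application will perform on either machine.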

“HPRC has a mission to infuse computational and data analysis technologies into the research and creative activities of every academic discipline at Texas A&M,” said Honggao Liu, executive director of HPRC. “We support compute- and data-intensive workloads and enable researchers to use cutting-edge processor, accelerator and data analytic technologies to solve complex research problems. In this era of converged demand for advanced computing resources, a new supercomputer like Grace is needed to support complex workflows and allow researchers to continue in their pursuit of discoveries and inventions. Grace will allow A&M researchers to take significant strides in HPC, artificial intelligence (AI) and data science, while simultaneously preparing a workforce for exascale computing, which will handle a quintillion (1,000,000,000,000,000,000 or 10^18) calculations per second. Grace will greatly enhance A&M’s research capabilities and competitiveness and allow A&M researchers to keep pace with current trends in research computing technologies.”

The Grace cluster responds to the growing demand for advanced supercomputing in science and engineering research. Over the last four years, Texas A&M HPRC has seen its user base roughly double from more than 1,300 in 2016 to more than 2,600 in 2020.  Grace’s architecture will support Texas A&M researchers in disparate fields such as drug design, materials science, artificial intelligence and machine learning, geosciences, fluid dynamics, biomedical applications, biophysics, genetics, quantum computing, data analytics, population informatics and autonomous vehicles.

Funding for the Grace system comes from Texas A&M University and the Texas A&M Research Development Fund with contributions from the Texas A&M Health Science Center, the Texas A&M Engineering Experiment Station, the Texas A&M Transportation Institute and several individual faculty members in the College of Engineering and the College of Science.

Texas A&M purchased its first supercomputer in 1989 and generally operates two large-scale supercomputers concurrently on the College Station campus. The oldest is replaced every three to four years. The University last replaced one of its supercomputers in 2016.

Primary vendor and partners

Dell Technologies is the primary vendor for the Grace system.

“Texas A&M delivers research and innovation that require incredible breadth and depth to achieve,” said Thierry Pellegrino, VP/GM HPC at Dell Technologies. “Our goal is to help the HPRC team deploy Grace because it enables data analytics, visualization, modeling and simulation that will drive human progress.”

Three additional partners provided specific components for the system: Intel supplied the central processing units, DDN the storage system, and NVIDIA the graphics processing units and the InfiniBand interconnect network.

Trish Damkroger, vice president and general manager of high-performance computing at Intel, said, “The Grace supercomputer will open a broad range of opportunities for Texas A&M University researchers to make groundbreaking discoveries in many application domains. Intel’s expertise in delivering advanced HPC hardware and software environments will enable researchers to harness the power of HPC and artificial intelligence and push the boundaries of what modern supercomputers can be.” 

DDN President and Co-Founder Paul Bloch said, “DDN truly values our long-standing relationship with Texas A&M University, and are thrilled that DDN Intelligent Infrastructure solutions will be integrated into all the shared HPC/AI data storage within the new Grace cluster. Our ES7990X systems are designed to effortlessly and reliably provide unfettered access to complex distributed data sets and supply maximum application efficiency, thereby assuring the best ROI for Texas A&M.”

NVIDIA Director of Higher Education and Research Cheryl Martin said, “Universities and research institutions around the world use NVIDIA’s accelerated computing platform to create high-performance computing facilities that propel innovation. The researchers at Texas A&M using the Grace supercomputer will be able to take advantage of our latest NVIDIA Ampere architecture GPUs and HDR InfiniBand to boost performance and drive discovery with AI, accelerated analytics and simulation.”

Technical specifications

Grace is an integrated computational research platform that combines Dell EMC PowerEdge servers with cutting-edge 2nd Gen Intel Xeon Scalable processors, NVIDIA A100, NVIDIA T4 and NVIDIA RTX 6000 Tensor Core GPUs, an NVIDIA Mellanox HDR100 InfiniBand network, NVMe-based local storage and high-performance DDN EXAScaler® ES7990X™ storage.

The Grace cluster is composed of 800 regular compute nodes, 100 double precision NVIDIA A100 GPU compute nodes, eight large memory (three terabyte) compute nodes, eight single precision NVIDIA T4 GPU compute nodes, nine single precision NVIDIA RTX 6000 GPU compute nodes, five login nodes and six management nodes. All nodes are powered by Intel Xeon Scalable processors.

The Grace cluster has an NVIDIA low-latency HDR InfiniBand interconnect and 5.12 petabytes of high-performance DDN storage running the EXAScaler parallel filesystem. Each regular and GPU compute node is equipped with two 2nd Gen Intel Xeon Scalable 24-core 3.0 GHz processors and 384 GB of DDR4 3200 MHz memory, while each of the eight large-memory nodes has four 2nd Gen Intel Xeon Scalable 20-core 2.5 GHz processors and 3.072 terabytes of DDR4 3200 MHz memory.
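The node counts and per-node figures above can be combined into a rough tally of the cluster's size. The sketch below is a back-of-the-envelope calculation, not an official specification. It assumes that all three GPU node types (A100, T4 and RTX 6000) share the processor and memory configuration given for "each regular and GPU compute node," and it leaves the login and management nodes out of the core and memory totals because no per-node figures are given for them. The `NodeType` class and its field names are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    name: str
    count: int             # number of nodes of this type
    sockets: int           # processors per node
    cores_per_socket: int  # cores per processor
    memory_gb: int         # memory per node, in gigabytes

    @property
    def cores(self) -> int:
        return self.count * self.sockets * self.cores_per_socket

    @property
    def memory(self) -> int:
        return self.count * self.memory_gb


# Counts and per-node figures as stated in the article.
# Assumption: every GPU node type uses the "regular and GPU" configuration.
COMPUTE_NODES = [
    NodeType("regular", 800, 2, 24, 384),
    NodeType("A100 GPU", 100, 2, 24, 384),
    NodeType("T4 GPU", 8, 2, 24, 384),
    NodeType("RTX 6000 GPU", 9, 2, 24, 384),
    NodeType("large memory", 8, 4, 20, 3072),
]

# Login and management nodes: counts only, no hardware details given.
SERVICE_NODES = {"login": 5, "management": 6}

compute_node_count = sum(n.count for n in COMPUTE_NODES)
total_node_count = compute_node_count + sum(SERVICE_NODES.values())
total_cores = sum(n.cores for n in COMPUTE_NODES)
total_memory_gb = sum(n.memory for n in COMPUTE_NODES)

print(f"Compute nodes:        {compute_node_count:,}")
print(f"All nodes:            {total_node_count:,}")
print(f"Compute cores:        {total_cores:,}")
print(f"Compute-node memory:  {total_memory_gb / 1000:,.1f} TB")
```

Under those assumptions the tally gives 925 compute nodes (936 nodes in all), 44,656 processor cores and roughly 377 terabytes of memory across the compute nodes.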

©2020 All rights reserved. DDN and EXAScaler are registered trademarks and ES7990X is a trademark owned by DataDirect Networks.

Intel, the Intel logo and other Intel names are trademarks of Intel Corporation or its subsidiaries.