Microsoft Launches 'World’s First AI Superfactory'
The new data centers are defined by a unique design that includes Nvidia GB200 NVL72 rack-scale systems capable of scaling to hundreds of thousands of Blackwell GPUs.
[Image source: Chetan Jha/MITSMR Middle East]
Taking one step closer to its AI ambitions, Microsoft earlier this week announced the successful connection of data centers in Wisconsin and Atlanta—roughly 700 miles and five states apart—through a high-speed fiber-optic network, creating what it calls the world’s first AI superfactory.
The Atlanta site became operational in October and is part of Microsoft’s Fairwater family alongside the Wisconsin facility.
The Fairwater sites go beyond housing sophisticated silicon and zero-water cooling systems. The two facilities, which will soon link with additional sites under construction across the US, are directly connected through an AI Wide Area Network (AI WAN), enabling faster training of next-generation AI models and completing tasks in weeks instead of months.
Each facility houses hundreds of thousands of advanced AI GPUs, supported by exabytes of storage and millions of CPU cores for operational computation.
“A traditional data center is designed to run millions of separate applications for multiple customers,” said Alistair Speirs, general manager focusing on Azure infrastructure at Microsoft. “The reason we call this an AI superfactory is that it’s running one complex job across millions of pieces of hardware. And it’s not just a single site training an AI model, it’s a network of sites supporting that one job.”
Speirs added that the network is designed to act as a virtual supercomputer for tackling the world’s critical challenges, something that cannot be done in a single facility.
The site’s networks are expected to support training models with hundreds of trillions of parameters.
“To make improvements in the capabilities of the AI, you need to have larger and larger infrastructure to train it,” said Mark Russinovich, CTO, deputy CISO, and technical fellow, Microsoft Azure. “The amount of infrastructure required now to train these models is not just one data center, not two, but multiples of that.”
Unlike many data centers, Fairwater uses a two-story design.
The Fairwater data center’s GPU density poses a heat challenge, which is being addressed by a complex closed-loop cooling system that removes hot liquid from the building through a configuration of pipes, pumps, and chillers before returning it to the GPUs.
Notably, the water used in the Atlanta site’s initial fill is equivalent to the annual water consumption of 20 homes.
“Fairwater exemplifies our vision for a fungible fleet: infra that can serve any workload, anywhere, on fit-for-purpose accelerators and network paths, with maximum performance and efficiency,” shared Satya Nadella, CEO, Microsoft on X.
The global data center market is expected to grow from $347.60 billion in 2024 to $652.01 billion by 2030.
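As a rough sanity check on that projection, the 2024 and 2030 figures above imply a compound annual growth rate of about 11%. A minimal sketch (the dollar figures come from the article; the CAGR formula is standard):

```python
# Implied compound annual growth rate (CAGR) behind the market-size
# projection quoted above. Figures are from the article, in billions of USD.
start_value = 347.60   # global data center market, 2024
end_value = 652.01     # projected market size, 2030
years = 2030 - 2024

# Standard CAGR formula: (end / start) ** (1 / years) - 1
cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.2%}")  # roughly 11% per year
```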
