
Amazon Web Services (AWS) is building a supercomputing cluster called Project Rainier, containing hundreds of thousands of its custom-designed Trainium2 AI chips, to provide massive computing capacity for its investment partner Anthropic. It is set to launch by the end of the year and will span multiple data centers across the United States.
The Indiana site alone contains 30 data centers, each covering 200,000 square feet, and will consume more than 2.2 GW of electricity. Amazon has invested $8 billion in Anthropic, hoping to help it win its competition with OpenAI. The project uses Amazon's custom Trainium2 chips instead of GPUs, making it the largest deployment of AWS's in-house AI chips to date.
Unprecedented in scale

Gadi Hutt, director of product at Amazon's Annapurna Labs, said: "This is the first time we have built such a large training cluster, one that allows Anthropic to train a single model across all of this infrastructure. The scale is truly unprecedented."
Unlike OpenAI's Stargate or xAI's Colossus, Project Rainier is a distributed system spread across multiple locations rather than a single supercomputer. This design allows the system to keep expanding, with no theoretical limit. Anthropic has already begun training AI models on parts of the system. Amazon says the basic computing unit can be "copy and pasted" to grow the overall cluster.
Custom chips challenge Nvidia

At the core of Project Rainier is Amazon's custom Trainium2 chip. Each chip delivers 1.3 petaFLOPS of compute and 96 GB of memory. Although a single chip does not match Nvidia's latest B200 (4.5 petaFLOPS), Amazon's emphasis is on cost-effectiveness rather than raw speed.
Hutt explains: "What customers ask us for is not the fastest chip, but the lowest cost per unit of performance, and of course it must be easy to use." Amazon combines 16 Trainium2 chips into a basic unit, then joins four units to form an "UltraServer" with 64 chips. Thousands of UltraServers are connected to form the complete Project Rainier cluster.
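The topology described above can be sketched as simple arithmetic, using only the figures quoted in the article; the cluster size of "thousands of UltraServers" is not specified, so the 1,000-server example below is a hypothetical value chosen purely for illustration.

```python
# Back-of-the-envelope sketch of Project Rainier's building blocks,
# based on the figures quoted in the article.

CHIPS_PER_UNIT = 16          # Trainium2 chips per basic unit
UNITS_PER_ULTRASERVER = 4    # four units form one UltraServer
PFLOPS_PER_CHIP = 1.3        # petaFLOPS per Trainium2 chip
MEM_GB_PER_CHIP = 96         # memory per chip, in GB

CHIPS_PER_ULTRASERVER = CHIPS_PER_UNIT * UNITS_PER_ULTRASERVER  # 64

def cluster_stats(num_ultraservers):
    """Aggregate chip count, compute, and memory for a given cluster size."""
    chips = num_ultraservers * CHIPS_PER_ULTRASERVER
    return {
        "chips": chips,
        "petaflops": chips * PFLOPS_PER_CHIP,
        "memory_tb": chips * MEM_GB_PER_CHIP / 1024,
    }

# Hypothetical 1,000-UltraServer cluster (illustrative assumption only):
print(cluster_stats(1000))  # 64,000 chips, 83,200 petaFLOPS, 6,000 TB of memory
```

Scaling, in other words, is just a matter of adding more identical UltraServers — the "copy and paste" expansion Amazon describes.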
Building an AI cluster of this scale demands enormous power. Experts estimate that a cluster of 250,000 Trainium2 chips would require 250 to 300 megawatts, comparable to the power draw of xAI's Colossus supercomputer. Amazon is building dedicated network infrastructure for the Indiana facilities, including custom fiber-optic cabling to manage the enormous number of connections, along with a custom network system that promises high-speed links with extremely low latency.
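The expert estimate quoted above implies a rough per-chip power budget, which a quick calculation makes concrete; note the derived per-chip figures are all-in facility numbers (including cooling and networking overhead), not chip-level specifications.

```python
# Rough sanity check on the quoted estimate: 250,000 Trainium2 chips
# drawing 250-300 MW in total. Derived figures include facility
# overhead (cooling, networking), not just the chips themselves.

CHIPS = 250_000
TOTAL_MW_LOW, TOTAL_MW_HIGH = 250, 300

watts_per_chip_low = TOTAL_MW_LOW * 1_000_000 / CHIPS    # 1,000 W per chip
watts_per_chip_high = TOTAL_MW_HIGH * 1_000_000 / CHIPS  # 1,200 W per chip

print(f"~{watts_per_chip_low:.0f} to {watts_per_chip_high:.0f} W per chip, all-in")
```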
Next-generation chips already in development

Amazon has also previewed its third-generation Trainium3 chip, built on a 3-nanometer process and delivering 40% more performance than the current chip. Systems built on the new chip are expected to offer four times the compute of existing ones. This means parts of the Project Rainier facilities may eventually adopt the more powerful Trainium3, much as another Amazon project, Project Ceiba, ultimately switched to Nvidia's newer Blackwell chips.
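The article's "40% higher performance" figure implies a per-chip throughput for Trainium3 that Amazon has not published; the number below is a derived estimate under that stated uplift, not an official specification.

```python
# Derived (unofficial) per-chip estimate for Trainium3, assuming
# exactly 40% more performance than Trainium2's 1.3 petaFLOPS,
# as the article states.

TRAINIUM2_PFLOPS = 1.3
PERF_UPLIFT = 1.40  # "40% higher performance" per the article

trainium3_pflops = TRAINIUM2_PFLOPS * PERF_UPLIFT
print(f"Estimated Trainium3 throughput: ~{trainium3_pflops:.2f} petaFLOPS per chip")
```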
Project Rainier reflects the fierce competition among technology giants to build AI infrastructure. As AI models grow more complex, they require ever more computing resources to train, and companies are building larger systems to maintain a competitive edge. With this project, Amazon strengthens its position in the cloud AI market while giving Anthropic the computing foundation to challenge OpenAI.