AWS Prepares for Massive GenAI Demand with New Ultra-cluster and Diverse Compute Options
December 12, 2023
As generative AI (GenAI) enters its mainstream phase, cloud providers like AWS are scrambling to meet the soaring demand for compute resources. With GPU shortages already impacting smaller companies, AWS is taking several steps to ensure sufficient capacity for its customers.
Ultra-cluster Deployment:
- In partnership with Nvidia, AWS is building a massive ultra-cluster featuring 16,000 Nvidia H200 GPUs and Trainium2 chips. The cluster, rated at 65 exaflops, is slated to go online in 2024 and will serve both Nvidia’s AI modeling service and AWS’s GenAI offerings, such as the Titan LLMs and the Bedrock service.
- This initial deployment is just the first step. AWS plans to further segment the cluster into smaller units catering to specific customer needs and geographic locations.
Diversifying Compute Options:
- While Nvidia remains a key partner, AWS is open to working with other chipmakers such as AMD and Intel. AWS already offers access to Intel Gaudi accelerators through its EC2 DL1 instances.
- AMD’s recently launched MI300X GPU, which AMD says outperforms Nvidia’s current offerings, is on AWS’s radar. AWS is open to collaboration but emphasizes the importance of software compatibility and ease of use for developers.
Addressing Inference Workloads:
- Training models is only half the battle. Running them in production also requires substantial GPU resources.
- There are indications that inference workloads may come to exceed training demands, in which case the GPU shortage could worsen.
- AWS is looking at solutions like Nvidia’s new GPU architecture and the FP8 data type to improve inference efficiency.
- AWS’s custom chip development also takes both training and inference requirements into account, so that resources can be used efficiently.
Current Adoption Landscape:
- While some customers are already in production with GenAI, the majority are still exploring its potential.
- Cost optimization is not yet a top priority, as companies are still focused on learning what the technology can do.
- Distillation and quantization techniques are expected to play a role in making GenAI economically viable once it reaches wider adoption.
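To make the quantization point concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. It is an illustration only, not a description of how AWS or any particular framework implements it: the function names and the sample weights are hypothetical, and production systems typically quantize per channel or per block and may use formats such as FP8 rather than int8.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to int8 codes plus one scale."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.53, 0.98, -1.27, 0.004]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

max_error = max(abs(w - r) for w, r in zip(weights, restored))
fp32_bytes = 4 * len(weights)  # 4 bytes per 32-bit float
int8_bytes = 1 * len(weights)  # 1 byte per int8 code (the single scale is ignored)

print(codes)
print(f"max round-trip error: {max_error:.4f}")
print(f"storage: {fp32_bytes} bytes as FP32 vs {int8_bytes} bytes as int8")
```

The trade-off the sketch shows is the general one: weights take roughly a quarter of the memory of 32-bit floats, at the cost of a rounding error bounded by half the scale. That is why such techniques are expected to matter once inference cost becomes a priority.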
Source: https://www.datanami.com/2023/12/11/how-aws-plans-to-cope-with-genais-insatiable-desire-for-compute/
How are other hyperscalers and chipmakers expected to react?
Hyperscalers:
- Competition: Other hyperscalers like Microsoft Azure, Google Cloud, and Alibaba Cloud will likely feel pressure to match or exceed AWS’s GenAI offerings. This could lead to a race for compute resources and aggressive development of their own AI chip technologies.
- Partnerships: We may see collaborations between different hyperscalers to share resources and expertise in specific areas of GenAI, especially regarding custom chip development or access to scarce GPU resources.
- Differentiation: Hyperscalers will likely focus on differentiating themselves through unique services, tools, and integrations around GenAI. This could include specialized platforms for specific industries, pre-trained AI models for common tasks, or easier-to-use interfaces for developers.
Chipmakers:
- Increased investment: Chipmakers like Nvidia, AMD, and Intel are expected to significantly increase their investments in AI chip development. This will likely lead to a new generation of GPUs and other accelerators specifically designed for GenAI workloads.
- Specialization: We may see more specialized chip architectures emerge, optimized for different types of GenAI tasks like training, inference, or specific application domains.
- Open-source collaboration: Open-source initiatives around chip design and software development could become more prevalent to accelerate innovation and address the growing demand for GenAI solutions.
- Consolidation: In the face of intense competition, smaller chipmakers may struggle to survive and could be acquired by larger players.
Overall:
- The competitive landscape in both the cloud and chipmaking industries will intensify significantly due to the growing demand for GenAI solutions.
- We can expect a flurry of innovation, collaboration, and differentiation as companies scramble to gain a foothold in this rapidly evolving space.
- The ultimate beneficiaries will be businesses and individuals who can leverage these powerful AI technologies to solve real-world problems and create new opportunities.