Over the next decade, Meta’s artificial intelligence (AI) compute needs are expected to grow dramatically as it conducts groundbreaking AI research, builds cutting-edge applications, and pursues its long-term vision. Meta is embarking on an ambitious plan for its next-generation AI infrastructure: custom silicon chips for running AI models, an AI-optimized data center design, and the continued expansion of an advanced GPU supercomputer dedicated to AI research. These efforts aim to support the creation and efficient deployment of larger, more advanced AI models for improved personalization, safer products, and richer experiences. Meta is also transforming software development with CodeCompose, an AI-powered coding assistant that increases developer productivity throughout the software development lifecycle. Through this infrastructure work, Meta is building a scalable foundation for emerging opportunities in generative AI and the metaverse.
Meta has been building out a global infrastructure since opening its first data center in 2010, and it now serves billions of users through its suite of apps. AI has long been an essential part of these systems – from deploying Big Sur GPU hardware in 2015, to developing PyTorch, to building an AI supercomputer dedicated to research. Now Meta is expanding this infrastructure with several innovations:
- MTIA (Meta Training and Inference Accelerator): Meta has designed its own family of accelerator chips, MTIA, tailored specifically to inference workloads. MTIA offers greater compute power and efficiency than CPUs and is customized for Meta’s internal workloads. Combining MTIA chips with GPUs lets Meta deliver better performance, lower latency, and higher efficiency across a variety of workloads.
- Next-generation data center: Meta is developing a new data center design that supports its existing products while accommodating future generations of AI hardware for both training and inference. This AI-optimized design will feature liquid-cooled AI hardware connected by a high-performance network fabric to form large-scale AI training clusters. The design prioritizes speed and cost-effectiveness, and it will also host MSVP, Meta’s in-house ASIC developed specifically to handle its ever-growing video workloads.
- Research SuperCluster (RSC) AI supercomputer: Meta’s RSC is widely regarded as one of the fastest AI supercomputers in the world. It was built to train large AI models for applications such as augmented-reality tools, content-understanding systems, and real-time translation. Its 16,000 accessible GPUs, spread across 2,000 training systems, are interconnected through a 3-level Clos network fabric that ensures high bandwidth between them.
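To give a rough sense of the scale of a fabric like RSC’s: a 3-level folded-Clos (fat-tree) network built from radix-k switches can support up to k³/4 end hosts at full bisection bandwidth. The sketch below uses that standard formula to size an illustrative fabric for 16,000 GPU endpoints; the switch radix it computes is a textbook example, not Meta’s actual hardware choice.

```python
def fat_tree_hosts(k: int) -> int:
    """Maximum end hosts in a 3-level fat-tree built from radix-k switches.

    The classic fat-tree result: k pods, each with (k/2)^2 hosts,
    gives k^3 / 4 hosts total at full bisection bandwidth.
    """
    return k ** 3 // 4

def min_radix(hosts: int) -> int:
    """Smallest even switch radix whose 3-level fat-tree reaches `hosts`."""
    k = 2
    while fat_tree_hosts(k) < hosts:
        k += 2  # fat-tree construction requires an even radix
    return k

k = min_radix(16_000)
print(k, fat_tree_hosts(k))  # → 40 16000
```

With radix-40 switches, k³/4 works out to exactly 16,000 hosts, which is one reason a 3-level Clos topology is a natural fit for a cluster of that size.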
Custom-designing much of its infrastructure lets Meta optimize the end-to-end experience, from the physical layer through the virtual and software layers to the user experience. Because Meta controls its full stack, it can make custom modifications tailored to user needs and flexibly co-locate GPUs, CPUs, network, and storage to support each workload, with power and cooling solutions re-evaluated as part of one cohesive system.
Meta’s ability to design, build, and operate every aspect of its infrastructure lets it adapt as chip design grows more specialized, build purpose-fit AI infrastructure for specific workloads, deploy systems at scale, and improve product efficiency and support. Meta remains dedicated to long-term value creation, capitalizing on its extensive infrastructure to lead in AI for years to come.