Meta's metaverse push begins with the AI Research SuperCluster (RSC), a supercomputer built on NVIDIA A100 GPUs and dedicated to the AI work underpinning the company's future in technology and virtual reality.
Meta already has researchers using the RSC to train models for natural language processing (NLP) and computer vision, with the eventual goal of training AI with trillions of parameters.

The new Research SuperCluster will help Meta's AI researchers develop improved models that can learn from trillions of examples; work across many languages at once; analyze text, images, and video together; build new augmented reality tools; and support several other projects still in the design stage. — Kevin Lee and Shubho Sengupta, Technical Program Manager and Software Engineer at Meta

Meta wants AI-powered applications and research to take the lead in building the virtual universe that much of society still treats as a buzzword. One example of what the new AI supercomputer could enable is real-time voice translation for large groups of people, letting many users collaborate on a project or play a multiplayer game together without waiting on human translators. But the underlying purpose of the Research SuperCluster is to help build new technologies for the metaverse.

Facebook created its AI Research lab in 2013, when the company made a long-term commitment to artificial intelligence. Several AI advances have since made their way into everyday products, and Meta highlights two in particular: transformers, which let AI models reason more effectively by focusing on the relevant parts of their input, and self-supervised learning, which lets algorithms learn from vast numbers of unlabeled examples.

Because high-performance computing infrastructure is crucial to training AI at this scale, Meta says it has been researching and building such systems for years. Its first-generation cluster, designed in 2017, packs 22,000 NVIDIA V100 Tensor Core GPUs into a single cluster and runs roughly 35,000 training jobs a day, giving Meta's research teams high levels of productivity, performance, and reliability. Two years ago, the company concluded that further progress would require a new platform, so it designed the new infrastructure from the ground up around newer GPUs and network fabric technology. The goal: to train AI models with trillions of parameters on exabyte-sized data sets, the equivalent of roughly 36,000 years of high-quality video.
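Training at that scale is typically spread across many GPUs, with gradients synchronized over NCCL, the same collective-communication library Meta cites. The snippet below is a minimal sketch of that pattern, assuming PyTorch launched with torchrun; the model, batch, and hyperparameters are hypothetical placeholders, not Meta's actual training setup.

```python
# Minimal sketch of multi-GPU data-parallel training of the kind RSC runs at
# far larger scale. Assumes PyTorch with the NCCL backend and a launcher such
# as torchrun setting RANK, LOCAL_RANK, and WORLD_SIZE.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; NCCL handles the inter-GPU collectives.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model standing in for a large transformer.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # Synthetic batch; a real job would stream data from a storage tier.
        x = torch.randn(32, 1024, device=local_rank)
        y = torch.randn(32, 1024, device=local_rank)

        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()   # DDP all-reduces gradients over NCCL here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched as, for example, `torchrun --nproc_per_node=8 train_sketch.py`, each process drives one GPU and NCCL keeps the model replicas in sync after every backward pass.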
Meta also cites the need to identify harmful content across social media platforms, including its own. With research into both embodied AI and multimodal AI, the company plans to improve the user experience at scale across its family of applications. Meta also details what is powering the Research SuperCluster in 2022. In early benchmarks, RSC runs computer vision workflows up to 20 times faster, executes the NVIDIA Collective Communication Library (NCCL) up to nine times faster, and trains large-scale NLP models up to three times faster than Meta's previous research systems. In practical terms, models with tens of billions of parameters now "finish training in three weeks, compared with nine weeks before." Once the RSC is complete, its InfiniBand network fabric will link 16,000 GPUs as endpoints, working alongside 4,000 AMD EPYC processors, making it one of the largest such networks deployed. Meta has also built a caching and storage system that can serve 16 TB/s of training data, with plans to scale it to an exabyte.
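To put the NCCL figure in context, fabric throughput is commonly gauged by timing collectives such as all-reduce across the GPUs. Below is a rough sketch of such a measurement, assuming PyTorch with the NCCL backend launched via torchrun; the payload size and iteration counts are arbitrary illustrative choices, not Meta's benchmark methodology.

```python
# Rough sketch of timing an all-reduce over NCCL, the kind of collective whose
# throughput Meta reports improving roughly 9x on RSC.
import os
import time
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # 1 GiB of float32 data per rank (2^28 elements * 4 bytes).
    payload = torch.randn(1 << 28, device=local_rank)

    # Warm up so NCCL can establish its communication channels.
    for _ in range(5):
        dist.all_reduce(payload)
    torch.cuda.synchronize()

    iters = 20
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(payload)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    if dist.get_rank() == 0:
        gib = payload.numel() * payload.element_size() / 2**30
        print(f"all_reduce of {gib:.1f} GiB: {elapsed / iters * 1e3:.1f} ms/iter")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The per-iteration time divided into the payload size gives an effective collective bandwidth, which is what faster NCCL execution over a faster fabric ultimately improves.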
Several partners worked on the RSC project with Meta. Penguin Computing, an SGH company, worked closely with Meta's operations team on hardware integration, cluster deployment, and the supercomputer's control plane. Pure Storage supplied a customizable storage solution. Finally, NVIDIA contributed its AI technologies, including the GPUs, systems, and InfiniBand fabric, along with NCCL, to tie the cluster together. Meta notes that the RSC is already running even though it is still being built out. The company is currently in Phase Two of the project and reiterates that this work lays what it considers the ground floor of the metaverse. Source: Meta AI