Taking on Nvidia? Intel launches a cloud AI chip in China and joins hands with domestic manufacturers to develop AI servers

Original source: Science and Technology Innovation Board Daily

Image credit: Generated by Unbounded AI

On July 11, Intel launched its cloud AI training chip, Habana® Gaudi® 2, in the Chinese market. The chip is designed to meet the needs of large language models, multimodal models, and generative AI models. According to the on-site introduction, it outperforms the Nvidia A100 on some key metrics.

It is understood that the Gaudi2 processor and the Gaudi2 mezzanine card HL-225B are based on the first-generation Gaudi high-performance architecture and feature 24 programmable Tensor Processor Cores (TPCs).

Each chip integrates 21 dedicated 100 Gbps (RoCE v2 RDMA) Ethernet interfaces for interconnect, along with 96 GB of HBM high-speed memory offering a total memory bandwidth of 2.4 TB/s, meeting the needs of large language models, multimodal models, and generative AI models.
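As a software-side illustration of what targeting this hardware looks like, here is a minimal sketch of addressing Gaudi from PyTorch; the `habana_frameworks` bridge, the `"hpu"` device string, and `mark_step()` follow Habana's documented PyTorch integration, while the tiny linear model is a placeholder assumption rather than a real large-model workload.

```python
# Minimal sketch: running a PyTorch module on Gaudi via Habana's PyTorch
# bridge (device string "hpu" and mark_step() per Habana's docs; the tiny
# model below is a placeholder, not a real LLM workload).
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

device = torch.device("hpu")                   # Gaudi is exposed as "hpu"
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024).to(device)

y = model(x)
htcore.mark_step()        # in lazy mode, flushes the accumulated graph for execution
print(y.to("cpu").shape)
```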

According to the on-site introduction, Gaudi2's performance per watt running ResNet-50 is roughly twice that of the NVIDIA A100, and its performance per watt running the 176-billion-parameter BLOOMZ model is about 1.6 times that of the A100.

At the press conference, Liu Jun, senior vice president of Inspur Information and general manager of its AI and HPC business, unveiled the NF5698G7, a new-generation AI server equipped with Gaudi2.

The server, built by Intel in cooperation with Inspur Information around the Gaudi2 deep learning accelerator, integrates eight Gaudi2 HL-225B accelerator cards alongside dual fourth-generation Intel Xeon Scalable processors, with support for built-in AI acceleration engines such as AMX and DSA.

Wang Lei, senior product manager at Inspur Information, emphasized that the NF5698G7 is a new-generation AI server developed specifically for the generative AI market: it supports eight OAM high-speed interconnected Gaudi2 accelerators and will provide AI customers with large-model training and inference capabilities.

Liu Hongcheng, vice president of H3C's compute and storage product line, said that H3C is working with Intel to develop high-performance AI servers based on the Gaudi2 AI accelerator, suited to large-model training and inference.

Meanwhile, Tang Qiming, president of the computing power infrastructure field at Super Fusion Digital Technology Co., Ltd., noted that Super Fusion and Intel will jointly launch new products and solutions based on Gaudi2.

Earlier, in an interview with a reporter from the Science and Technology Innovation Board Daily, Wang Rui, chairman of Intel China, pointed out that the ChatGPT wave has brought a significant increase in computing demand, and that Intel is conducting joint research with Chinese customers including Baidu and Alibaba. Wang Rui also revealed that Intel has laid out both high-performance computing and distributed computing.

An Intel technology expert told the reporter from the Science and Technology Innovation Board Daily about Intel's layout in the large-model field. At the training level, Intel's oneAPI and XPU platforms offer more choices across heterogeneous compute, spanning CPU, GPU, IPU, and Habana accelerators. On the inference side, with the fourth-generation Xeon Scalable processors (Sapphire Rapids) released, the built-in AMX accelerator can address more than 80% of customers' inference needs in the industry. Intel also makes full use of its GPUs, including Ponte Vecchio for training and Arctic Sound for inference.
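By way of illustration, the sketch below shows one way the AMX inference path is exposed to developers, using Intel Extension for PyTorch; the `ipex.optimize` call and bf16 autocast reflect that extension's public API, while the ResNet-50 workload and input shapes are our own placeholder assumptions.

```python
# Hedged sketch: bf16 inference with Intel Extension for PyTorch, which
# dispatches matmuls/convs to AMX tiles on 4th-gen Xeon (Sapphire Rapids)
# when the hardware supports them.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex   # Intel's PyTorch extension

model = models.resnet50(weights=None).eval()        # stand-in workload (assumption)
model = ipex.optimize(model, dtype=torch.bfloat16)  # layout/fusion opts + bf16 weights

x = torch.randn(1, 3, 224, 224)
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    y = model(x)   # heavy ops run through AMX where supported
print(y.shape)
```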

With the boom in large models, cloud computing's business model is evolving toward MaaS (Model as a Service), and the computing demand this brings also deserves attention.

"Its idea is to use the pre-training model to train industry data to form a segmented model for specific scenarios and for various industries. We know that the number of parameters of the general model is very large, for example, GPT-3 can reach 175 billion, Deploying these models will be cumbersome, therefore, large models may need to be distilled and compressed to form a model that can be deployed by the industry." said the above-mentioned technical experts.

In addition, privatized deployment of large models is a latent demand in many industries. "Many vertical industries do not accept SaaS services, especially finance and similar sectors. Therefore, Intel is exploring how to miniaturize these models and deploy them privately on-premises, so they can truly land in the industry."
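One common route to such miniaturization is post-training quantization; the sketch below uses PyTorch's built-in dynamic int8 quantization as a generic example. The article does not specify which compression method Intel is evaluating, so this is purely illustrative.

```python
# Illustrative sketch: dynamic int8 quantization, a common way to shrink a
# model for on-premises deployment (not Intel's stated method).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # Linear weights stored as int8
)

x = torch.randn(1, 512)
print(quantized(x).shape)   # same interface, smaller weights at rest
```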
