Scrambling for the "provisions" of the AI war: Nvidia's AI graphics card prices are soaring

Original text: Shinsei

Image source: Generated by Unbounded AI

From the mining era to the AI era, GPU computing power is back in the spotlight. Domestic and foreign cloud vendors are racing to stockpile compute...

ByteDance has purchased 100,000 A100 and H800 accelerator cards from Nvidia, worth more than US$1 billion (over RMB 7 billion). ByteDance's purchases this year alone are close to Nvidia's total commercial GPU sales in China last year. Another large company has placed an order worth at least 1 billion yuan.

OpenAI, which already uses about 25,000 Nvidia GPUs, says it is still short of GPUs. ChatGPT needs 13.5 EFLOPS of computing power for every 100 million active users, supported by about 69,000 NVIDIA DGX A100 80GB servers, and current global computing power can support only about 100 million daily active users.

In the spot market for these high-end GPUs, even outsiders are eager to try their luck when they see the opportunity; in their eyes, the cards are "gold bricks".

**The A800, originally priced at about 74,000 yuan, has risen to more than 85,000 yuan, and the high-end version to about 100,000 yuan.** At the end of April, the futures (pre-order) quote for an 8-card A800 module was still 900,000 yuan with a 2-week delivery period. Now the quote exceeds 1 million yuan and delivery has stretched to 7 to 8 weeks. In May, the A800 even surged to 100,000 yuan.
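The quotes above imply the percentage increases worked out in this small Python sketch. It uses only the figures reported here. Because the newer prices are given as "more than", the results are treated as lower bounds. The `pct_increase` helper is illustrative.

```python
# Percentage price increases implied by the A800 quotes reported above.
# Newer prices are "more than" figures, so results are lower bounds.

def pct_increase(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    return (new - old) / old * 100

quotes = {
    "A800 single card (yuan)": (74_000, 85_000),
    "8-card A800 module (yuan)": (900_000, 1_000_000),
}

for item, (before, after) in quotes.items():
    print(f"{item}: {before:,} -> {after:,}, at least +{pct_increase(before, after):.1f}%")
```

On these numbers, a single card rose by at least 14.9% and the 8-card module by at least 11.1%. The module's delivery period also roughly quadrupled over the same stretch.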

"The longer you wait on the sidelines, the longer the delivery time and the higher the price." Those who couldn't get single cards bought modules, and those who couldn't get modules ended up buying whole servers.

From single GPU cards to modules to AI servers, prices are rising wildly, shortages are severe, and delivery times keep getting longer. Is market demand really that high? Why are Nvidia GPUs so scarce? Meanwhile, brand-new genuine products come with no guaranteed delivery date, second-hand and gray-market channels are proliferating, deposits are non-refundable once paid, and chaos around AI servers is unfolding...

NVIDIA GPU Market Chaos

"They want 1,000 or 3,000 cards at one go; there's no way I can supply that. Is this demand real or fake?"

"Is the market good? I have a way to get the goods, but I'm afraid I'll get stuck with them."

**In May, Nvidia's GPUs were still out of stock and rising in price.**

According to an agent, the price of Nvidia A100 began to rise in December last year. As of the first half of April this year, its cumulative price increase in five months reached 37.5%; the cumulative price increase of A800 reached 20.0% during the same period.

Lead times have been stretched from one month to three months or longer, and some new orders "may not be delivered until December."

Even companies with large AI server fleets are rationing their GPU resources. Microsoft, facing an internal shortage of AI server hardware, adopted a "quota supply" mechanism. In June, notes from a talk by OpenAI CEO Sam Altman, since deleted, mentioned that OpenAI is also short of GPUs and that the shortage has delayed many of its customers' short-term plans.

Riding the wave of artificial intelligence, Nvidia, the "shovel seller", now almost monopolizes the AI server chip market and has become the biggest winner. Not long ago, its market value exceeded US$1 trillion, making it the first chip company in history to pass that mark.

Because of the US ban, China's spot supply of the Nvidia A100 has been cut off entirely. There are only about 40,000 to 50,000 A100s in China that can be used to train large AI models. Supply is very tight, and internal use is strictly limited.

The A800, the cut-down version of the A100 that is currently in normal supply, only entered production in the third quarter of last year. With new demand rising, it too is short of supply. In early May, the A800 rose to 100,000 yuan in China.

Generally, the high-priced A800s and A100s reported by the media are the top configurations in the series, i.e., versions with 80GB of GPU memory and NVLink interconnect support.

A contact in the GPU business told us, "The A800's spot price changes every day, and short-term exchange-rate swings also affect it. The lowest offer may have risen to 86,000."

As shortages and price hikes intensified, more and more players began shipping and hunting for GPUs, and chaos set in: a flood of inquiries, a hot futures market with steep deposits, goods leaking out through second-hand channels, and pitfalls of every size...

The most immediate impression is noisy demand from every direction. Many practitioners who deal in servers and server GPUs report that, with AI so popular, many people have recently come asking about cards and prices, but few deals actually close.

Among those asking about single A800 or A100 cards, buyers with small needs are interested in price. Those with big needs ask for hundreds of thousands of cards, claiming they are buying for clients and friends, and traders from Huaqiangbei are out in force too. It feels like they are just probing prices.

"When someone asks for that many, how could anyone supply it? I ask whether they care more about price or delivery time, and I never hear back." "Some people have almost no money for the goods; they vanish after asking." Few inquiries turn into actual purchases.

Enterprise-level GPUs such as Nvidia's currently reach the market through two main channels: Nvidia → general distributor → dealer → market, or Nvidia → OEM (server maker) → dealer → market. Cards circulating on the spot market come mainly from distribution, server solution providers, or the second-hand market.

Downstream customers for such enterprise-level products include companies, schools, and server solution providers. Brother Xu, who focuses on the server business, said that real demand from Alibaba and similar companies usually comes as large orders. These big customers get priority and source goods directly from the general distributor or server OEM upstream; small resellers and server parts suppliers further down struggle to get stock. Media reports also quote cloud vendors as saying that big companies such as ByteDance and Alibaba mainly negotiate purchases directly with Nvidia, since distributors and the second-hand market cannot meet their huge needs.

That real demand largely bypasses the spot market does not stop everyone from hunting for stock. When spot goods sell out, buyers must book futures, paying high, non-refundable deposits for delivery dates that may be far off.

Single A800 cards are reportedly still in stock on the market, A100 cards are almost "extinct", and A800 modules are rarely in stock, so buyers have to turn to futures. For an 8-card A800 module, one seller quoted a total price of 1.12 million yuan, a 50% deposit, and delivery in 7 to 8 weeks. On-time delivery is not guaranteed (a few days earlier the quoted delivery time was still 6 weeks), and the deposit is non-refundable. At the end of April, another seller's price for the same 8-card A800 module was still 900,000 yuan, with a 30% to 50% deposit and a 2-week delivery period.

In just over a month, the price of an 8-card A800 module has risen by more than 200,000 yuan, the delivery period has lengthened, and the deposit may also have gone up.
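The two 8-card A800 module quotes above put very different amounts of cash at risk. This Python sketch works out the comparison. The prices and deposit ratios are the sellers' stated terms from the article. The `deposit` helper is a hypothetical illustration, not any seller's actual billing rule.

```python
# Upfront, non-refundable cash required to pre-order one 8-card A800 module,
# using the two quotes reported above (late April vs. about a month later).

def deposit(price: int, ratio: float) -> int:
    """Deposit owed at order time, rounded to the nearest yuan (hypothetical helper)."""
    return round(price * ratio)

april_price = 900_000     # late-April quote: 30%-50% deposit, 2-week delivery
later_price = 1_120_000   # later quote: 50% deposit, 7-8 week delivery

price_rise = later_price - april_price
april_low, april_high = deposit(april_price, 0.30), deposit(april_price, 0.50)
later_dep = deposit(later_price, 0.50)

print(f"Price rise: {price_rise:,} yuan")
print(f"April deposit: {april_low:,} to {april_high:,} yuan")
print(f"Later deposit: {later_dep:,} yuan")
```

On the later terms, a buyer commits 560,000 yuan up front with no refund and no guaranteed date. A month earlier, the same module required 270,000 to 450,000 yuan. The 220,000-yuan gap matches the article's "more than 200,000 yuan" figure.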

A seller claiming to be a source supplier told us that the 8-card modules are in stock, but buyers have already placed orders totaling 500 million yuan, and newcomers can only wait for an arrival notice.

Paying a deposit may secure an early place in the queue, but it carries high risk. An 8-card A800 module usually requires a 50% deposit. A top-of-the-line 80GB NVLink A800 module costs more than 1 million yuan, so the deposit is at least 500,000, and once paid it is non-refundable. One buyer online said that 10 single cards ordered in March and April this year still have not arrived.

The delivery times many sellers promise are not guaranteed, base prices are high, and deposit ratios are high too, so buyers must hand over more real money up front. If the goods never arrive, they can only wait; after all, the money has already been paid.

**If you have a batch of A800s and A100s in stock, everyone sees them not as ordinary graphics cards but as gold bricks.**

Some people see the frenzy and want to get into the GPU business, but they fear being stuck with unsold stock. For individual speculators chasing profit, the real demand is doubtful, and at these prices the game may not be worth the candle.

Second-hand products keep surfacing, and some people buy up used AI chips at high prices. Leaving aside what these GPUs were used for before, their warranty is a problem.

Upstream capacity crunch: if advanced process capacity is not short, where is the problem?

After ChatGPT took off, Internet companies and cloud vendors deployed large AI models more widely and competed for the computing power of Nvidia GPUs. The shortage covers not only the A100 and A800 but also the higher-end H100 and H800. Some ask: if there is no shortage of wafer capacity, why can't GPUs be supplied?

"GPU performance increases 1,000-fold every 10 years" and "the more you buy, the more you save": Huang's Law, the argument goes, will replace Moore's Law. Advanced process nodes can improve GPU performance, but Moore's Law has reached its limits, and servers, unlike mobile phone chips, do not face such demanding space constraints. If advanced process technology is the first choice for GPUs, advanced packaging is the icing on the cake.

Manufacturing GPUs on advanced process nodes is not enough. Under current manufacturing constraints, a big step toward better GPU performance is a package that is smaller, lower-power, and uses fewer pins, with tighter interconnection between chips and between chips and the package substrate. Advanced packaging rests on four key technologies: bumps, RDL (redistribution layers), wafer-level processing, and TSV (through-silicon vias). Mastering any one of them can unlock new packaging capabilities.

Nvidia's V100, A100, A800, H100 and other chips all use TSMC's CoWoS advanced packaging, which addresses the need to integrate memory and compute tightly in high-compute AI chips. TSMC's 7nm foundry capacity is indeed not short, yet this time the bottleneck still lies with TSMC.

**First, the core CoWoS advanced packaging technology belongs to TSMC alone; without TSMC, it cannot be done.**

The technology in the advanced packaging wafers now in short supply is patented by TSMC, so Nvidia has no choice but to go to TSMC. TSMC firmly controls both advanced process manufacturing and advanced packaging. In 2012, TSMC launched its proprietary CoWoS advanced packaging technology, and it has since offered a one-stop service from wafer fabrication to final packaging. The CoWoS family includes CoWoS-S, CoWoS-R and CoWoS-L, among others, and its high-performance computing customers include many first-tier companies such as NVIDIA. Most of TSMC's InFO advanced packaging capacity, meanwhile, is taken up by Apple.

What about outsourcing? Low-tech steps can be outsourced, but the core technology still requires TSMC; other packaging houses only get the leftovers.

Recently, to meet short-term demand, TSMC has outsourced part of the oS (on-Substrate) step, but not the CoWoS process itself; TSMC keeps the most valuable part of advanced packaging in-house.

TSMC offers one-stop service from wafer fabrication to packaging. Generative AI chips such as Google's TPU, Nvidia's GPUs and AMD's MI300 have all adopted it, bringing TSMC a large volume of AIGC orders and driving demand for CoWoS expansion.

**Second, this type of advanced packaging consumes its own packaging capacity, which is currently in short supply.**

On the one hand, advanced packaging is moving upstream into the wafer process, i.e., wafer-level packaging, which fits more pins into a smaller package area. On the other, it is expanding downstream into modules through system-in-package. @手机芯片达人 recently revealed that CoWoS splits into a front-end wafer process that produces the interposer and a back-end packaging process that stacks the dies (die-to-die). TSMC's CoWoS capacity shortfall comes from a lack of 65nm interposers.

The interposer here is made with wafer technology. TSMC's CoWoS is a 2.5D packaging solution: the chips are placed on a silicon interposer, interconnected through high-density wiring on the interposer, and the assembly is then mounted on the package substrate.

GPUs therefore consume additional wafers in the advanced packaging step, i.e., CoWoS capacity. Nomura Securities expects TSMC's annualized CoWoS capacity to grow from 70,000 to 80,000 wafers at the end of 2022 to 140,000 to 150,000 wafers by the end of 2023. With continued expansion, capacity is expected to approach 200,000 wafers by the end of 2024.
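This Python sketch shows the year-over-year growth implied by Nomura's ranges. It takes the reported figures at face value and treats the 2024 figure as a single target. `growth_range` is a hypothetical helper. It pairs opposite ends of adjacent ranges to get the widest band of multiples consistent with both.

```python
# Year-over-year growth implied by Nomura's CoWoS capacity estimates
# (annualized wafers). Ranges are (low, high); the 2024 target is a single value.

capacity = {
    "end-2022": (70_000, 80_000),
    "end-2023": (140_000, 150_000),
    "end-2024": (200_000, 200_000),
}

def growth_range(prev: tuple, curr: tuple) -> tuple:
    """Smallest and largest growth multiple consistent with two (low, high) ranges."""
    low = curr[0] / prev[1]    # pessimistic: low end now vs. high end before
    high = curr[1] / prev[0]   # optimistic: high end now vs. low end before
    return low, high

years = list(capacity)  # dicts keep insertion order in Python 3.7+
for prev_year, curr_year in zip(years, years[1:]):
    lo, hi = growth_range(capacity[prev_year], capacity[curr_year])
    print(f"{prev_year} -> {curr_year}: x{lo:.2f} to x{hi:.2f}")
```

On these estimates, capacity roughly doubles during 2023 (x1.75 to x2.14) and then grows by about another third toward the 2024 target (x1.33 to x1.43).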

Filling the gap in wafer-level packaging capacity has become a top priority. Moreover, as wafer technology advances, the area involved keeps growing. Compared with InFO, CoWoS, aimed at the high-end market, has more connections and a larger package size. According to @手机芯片达人, demand for 65nm interposers is 1.4 times that for the top die (H100).

TSMC's CoWoS advanced packaging capacity is severely short. Demand for CoWoS has nearly doubled since last year and will remain strong next year. Because this advanced packaging is done only at TSMC's fabs in Taiwan, TSMC is expanding directly at each of those fabs to speed up its advanced packaging ramp.

The "graphics card shortage" has spread to servers: is the heat real or hype?

For buyers with a real need, high-end graphics cards like these go into AI servers; such buyers are short of either a card, a module, or a complete machine. **But in terms of real demand, the "gold bricks" actually rank below servers.**

As AI GPUs grow scarce and pricier, the servers built around them are rising in price too. As early as April, it was reported that Inspur would raise AI server prices by about 20%, following Nvidia's decision to stop supplying its top A100 and H100 chips to China. A later source confirmed the increase but did not specify its size. Another source said the server supplier had raised prices only on Nvidia-based AI servers, leaving other server products unchanged.

According to data from Pacific Securities, the top 8 server vendors held 92% of China's AI server market in 2022, with Inspur leading at 37%. Inspur's financial disclosures show that Nvidia has long been one of its main chip suppliers; in 2019, Intel and Nvidia were Inspur's top two chip suppliers.

A single GPU card is just one component of a server, and demand for single cards is more price-sensitive. A server uses multiple GPUs, up to 8 cards. Even the PCIe version of such a server currently costs about 800,000 yuan, and the NVLink version runs to the million-yuan level.

The A800 server's futures price now exceeds 1.2 million yuan, and spot units are scarce. According to a salesperson at a well-known computing service provider, an A800 machine costs 1.68 million yuan on the spot market with a 50% deposit: "35 units arrive at the end of July, and 25 have already been ordered." Even for business use, that is not cheap. Market reports also say the normal delivery date for A800 machines is already booked out to the end of October, and many sellers simply quote 24 weeks, close to six months.

To some who sell complete servers, however popular GPUs may be, they are not as close to real demand as servers, and servers are easier to sell. Brother Xu said he now focuses on servers and is not interested in single cards. Selling a server means sending the configuration to the customer before quoting, whereas cards are more troublesome: you have to check whether the interface is compatible, and so on. In short, servers are less hassle. Honestly, they close more deals and turn over faster, while cards draw plenty of inquiries but very little volume, and only large companies ask for big quantities. "Those who need a lot almost all buy whole machines; the small-quantity buyers are the ones buying cards."

Across the wide range of customer needs, not every AI server order is top-of-the-line or comes in huge volume. Whether it is an A800 PCIe server, an NVLink server, or an H800 HGX server, companies choose configurations to suit their needs. Server prices are also more transparent, leaving more room for price comparison.

GPUs, like hard disks, are server components. Now that AI has made related servers popular, high-end GPUs such as the A800 account for a large share of the cost and are in short supply, so they are also the most price-sensitive part of a complete machine.

**The GPU shortage alone constrains the normal supply of AI servers, which look hot on the market but are mixed with a good deal of false demand.**

This wave of GPU fever recalls 2020, when people frantically speculated in forehead thermometers. Back then, viral sharing in WeChat Moments created false demand far larger than real demand. Supply fell short, spot prices ran far above futures prices, delivery dates were not guaranteed, and sellers paid little for defaulting.

The days of extreme chip shortages have come to an end. High-end GPU capacity will not catch up for a while, but enterprise-level demand has high barriers: it requires formal technical support and after-sales service, and demand has surged, all of which raise the bar for participants. Without long-term effort and a solid foundation, it is hard to take a share of the trend through opportunism alone.
