What optimizations and breakthroughs will large models bring as the race enters its 2.0 phase?

Original text: The Paper, author: Che Xingyun

Image source: Generated by Unbounded AI

In June of this year, major vendors upgraded their ChatGPT-like products. On June 9, Xunfei launched an upgraded version of its Xinghuo cognitive large model; on June 13, 360, which had already released a ChatGPT-like product, followed up with the 360 Smart Brain Big Model Application Conference.

Unlike the large models released around February, the upgraded products that companies have released recently lean more toward the application layer, with the aim of bringing large models into ordinary households.

Judging from the current release, 360 Smart Brain has acquired initial cross-modal generation capabilities. Beyond basic creation tasks such as generating text, tables, and pictures from text, generating text and pictures from pictures, generating text from videos, and editing videos from text, it also redefines the "digital human", giving users a customizable, exclusive "artificial intelligence" that "has a soul, a persona, and a memory".

At present, the application scenario that brings 360 Smart Brain closest to users is 360's existing product suite. Zhou Hongyi said at the press conference that "360 Smart Brain 4.0" will be connected to 360 Security Guard, 360 Browser, 360 Search, and other products to enable human-machine collaboration.

At the press conference, Zhou Hongyi revised his earlier view: "I once said that the gap between domestic large models and ChatGPT is two years, and now I want to take that back." The level of domestic models is now on par with GPT3.5, he said, and at this pace they will catch up with or even surpass GPT4 in the blink of an eye.

In the four months between the release of the initial version and the official release of 360 Smart Brain, what made Zhou Hongyi see such a huge change?

Tech giants chase large models

The "China Artificial Intelligence Large-scale Model Map Research Report" released at the 2023 Zhongguancun Forum shows that China's large AI models are developing vigorously. According to incomplete statistics, 79 large models with more than 1 billion parameters have been released nationwide to date.

Among these, the models from major technology companies have comparatively large parameter counts: Alibaba's Tongyi Qianwen large model has more than 10 trillion parameters, Tencent's Hunyuan large model and Huawei's Pangu large model each have more than one trillion, Baidu's Wenxin large model has more than 200 billion, and JD's Yanxi large model has 100 billion. Models from technology companies in vertical industries generally exceed 100 billion parameters, while models from scientific research institutions are generally at the 100-billion level or below.

In terms of how they are positioning themselves, the major technology companies have built out a comprehensive four-layer strategy spanning the computing power layer, platform layer, model layer, and application layer. Baidu, Alibaba, and Huawei each pursue full-stack in-house research and development from chips to applications: Baidu's "Kunlun chip + PaddlePaddle platform + Wenxin large model + industry applications", Alibaba's "Hanguang 800 chip + M6-OFA base + Tongyi large model + industry applications", and Huawei's "Ascend (Shengteng) chip + MindSpore framework + Pangu large model + industry applications".

In addition, Kingsoft Office released WPS AI on May 31. At present, WPS AI has been integrated into Kingsoft Office components such as light documents, text documents, spreadsheets, presentations, and PDFs. Going forward, it will develop along the strategic directions of AIGC, reading comprehension, question answering, and human-computer interaction, and will be rolled out across the full line of Kingsoft Office products.

Major vendors have rushed into this field largely because regulators moved quickly to introduce measures governing the industry's development. With top-level policy design in place, each vendor can invest in research and development and launch products with confidence.

Since large models began launching in batches in March this year, AI regulatory policy has gradually become clearer, which has also pointed the way for industry applications.

Looking back on the development of the industry as a whole: on April 11, the "Generative Artificial Intelligence Service Management Method" was released for comments; on May 30, the Institute of Information and Communications Technology was jointly preparing the "Kite" open artificial intelligence model license, with the next step being to issue the "Zhikite Open Artificial Intelligence Model License (Draft for Comments)".

Subsequently, first-tier cities followed with their own plans, including the "Implementation Plan for Beijing to Accelerate the Construction of a Globally Influential Artificial Intelligence Innovation Source (2023-2025)".

Against this backdrop, Zhou Hongyi's belief that domestic large models will quickly narrow the gap with ChatGPT seems easy to understand.

What sets 360 Smart Brain apart

According to Zhou Hongyi's plan, 360 will keep upgrading its large model while also taking into account scenario adaptation, productization, flattening, and verticalization.

Under this strategy, 360 Smart Brain targets four main application scenarios: consumers (personal AI assistants), small and medium-sized enterprises (vertical SaaS applications), enterprises, governments, and cities (privately deployed models), and industries (industry-specific vertical models).

To better serve these different scenarios, a general-purpose large model must move beyond taking text in and putting text out: it must also understand images and videos and be able to produce them. This is equivalent to giving the large model "ears" and "eyes", and it lays the foundation for creating a "digital human".

Traditional digital humans only output according to a fixed script, but in the era of large models, 360's digital humans can be customized, with their own personas, memories, and experiences. Currently, there are more than 200 characters on the 360 digital human square platform, divided into two categories: digital celebrities and digital employees. 360 hopes that in the future everyone will have their own AI assistant and the opportunity to converse with historical figures in virtual space, across time and space.

At the demonstration, Zhou Hongyi asked "Zhuge Liang" how he felt about having become material for guichu (meme remix) videos today, and the digital human replied in Zhuge Liang's tone: "The fate of the past and the present is inevitable. In today's situation, the world is in turmoil. Although I am old, I still aim for the world. Today's young people use me as material for guichu videos, and I readily accept this change. I wish my young friends courage on the road ahead as they create a better future."

Zhou Hongyi also emphasized that future digital humans will be able to set their own goals, make plans, and decompose tasks, so that they can call on various vertical models to complete those tasks.

However, these functions are really optimizations of existing large model applications and do not open up a new field. In fact, as large models achieve breakthroughs, the most creative application scenario is autonomous driving.
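The idea of a digital human that sets a goal, decomposes it into steps, and calls vertical models can be sketched roughly as follows. This is only an illustration and not 360's design: the model names, the `plan` function, and the hard-coded steps are all hypothetical, and a real system would presumably ask a large model to produce the plan.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry of vertical models; each is just a stub function here.
VERTICAL_MODELS: Dict[str, Callable[[str], str]] = {
    "search": lambda task: f"[search model] results for: {task}",
    "writing": lambda task: f"[writing model] draft for: {task}",
}


def plan(goal: str) -> List[Tuple[str, str]]:
    """Decompose a goal into (model_name, sub_task) steps.

    The steps are hard-coded for illustration only.
    """
    return [
        ("search", f"collect background on {goal}"),
        ("writing", f"summarize findings on {goal}"),
    ]


def run(goal: str) -> List[str]:
    """Execute each planned step by dispatching to the named vertical model."""
    return [VERTICAL_MODELS[name](task) for name, task in plan(goal)]


results = run("quarterly sales")
for line in results:
    print(line)
```

The point of the sketch is the separation of concerns: planning decides which model handles which sub-task, and the registry makes it possible to add vertical models without changing the dispatch loop.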

Driverless vehicles have a chance to enter the fast lane

Major companies have been investing in autonomous driving since 2016, but as of this year none of them has achieved truly driverless operation.

At present, an L2+ driving system needs more than 10 cameras, along with 1-2 lidars or 3-5 millimeter-wave radars, to provide multi-dimensional data, which can be used for model training only after manual labeling. With the emergence of large models that can recognize images, the time and cost of manual labeling will drop sharply.

According to the Haomo Zhixing (Haomo.AI) DriveGPT press conference in April 2023, the industry's cost of manually labeling an image to obtain information such as lane lines, traffic participants, and traffic lights is about 5 yuan per picture, while the cost with Haomo's DriveGPT is 0.5 yuan. We believe that once technology companies' large model training matures, the marginal cost of automatically labeling a single image will approach 0, and the average cost is expected to fall further.
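The cost comparison above can be worked through with a few lines of arithmetic. The per-image prices (5 yuan and 0.5 yuan) come from the figures quoted in the article; the dataset size, the fixed training cost, and the marginal cost used in the second part are hypothetical values chosen only to illustrate why average cost keeps falling as volume grows.

```python
def labeling_cost(num_images: int, cost_per_image: float) -> float:
    """Total cost of labeling a dataset at a flat per-image price."""
    return num_images * cost_per_image


def average_cost(fixed_training_cost: float,
                 marginal_cost_per_image: float,
                 num_images: int) -> float:
    """Average cost per image when a one-off training cost is spread over volume."""
    return fixed_training_cost / num_images + marginal_cost_per_image


MANUAL_COST_PER_IMAGE = 5.0    # yuan, figure quoted in the article
DRIVEGPT_COST_PER_IMAGE = 0.5  # yuan, figure quoted in the article
NUM_IMAGES = 1_000_000         # hypothetical dataset size

manual_total = labeling_cost(NUM_IMAGES, MANUAL_COST_PER_IMAGE)
auto_total = labeling_cost(NUM_IMAGES, DRIVEGPT_COST_PER_IMAGE)
savings_ratio = 1 - auto_total / manual_total

print(f"manual: {manual_total:,.0f} yuan, automatic: {auto_total:,.0f} yuan")
print(f"savings: {savings_ratio:.0%}")

# Hypothetical: 1,000,000 yuan one-off training cost, 0.01 yuan marginal cost.
for n in (1_000_000, 10_000_000, 100_000_000):
    print(n, round(average_cost(1_000_000.0, 0.01, n), 4))
```

At the quoted prices the automatic approach costs one tenth as much per image. The second loop shows the separate effect the article describes: with a near-zero marginal cost, the average cost per image approaches that marginal cost as the number of labeled images grows.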

According to Zhang Peng, vice president of product projects at Kaiwang Data, speaking in February 2023, data labeling currently relies mainly on manual work supplemented by machine labeling, and 95% of data labeling is still done manually. The introduction of large models can greatly improve the industry's efficiency. Taking Tesla as an example, its manual labeling team had more than 1,000 people in 2021, and the team laid off more than 200 people in 2022.

In addition, in the era of large models, third-party technology giants are expected to help OEMs build their own autonomous driving algorithms and data closed-loop systems by providing a complete tool chain, while the data generation capabilities of large models help narrow the gap in data. The "Android era" of autonomous driving is expected to arrive.

At present, large models are already being applied to data closed loops, simulation, perception algorithms, planning and control algorithms, and other areas. Giants such as Microsoft and Nvidia are racing to position themselves in both large models and autonomous driving, which may spark new developments.

In addition, the emergence of large models promotes the division of labor in the industry, avoids "reinventing the wheel", and accelerates the iteration of sensors and chips, so system costs are expected to drop significantly. Large model developers and players across the autonomous driving industry chain are expected to benefit broadly.

Take Baidu Apollo as an example. First, it pre-trains a base model on image-text data, uses algorithms to recognize, localize, and segment street-view images, and feeds the results into an encoder to build a base library; that is, it establishes a correspondence between images and text over a pool of street-view data.

Second, specific scenes (such as delivery vehicles, wheelchairs, and children) can be retrieved and mined using text or image queries and then used for customized training of the on-vehicle model, which greatly improves how effectively existing data is used.
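Retrieval over such a base library is commonly done by comparing embedding vectors. The sketch below illustrates that general technique with cosine similarity; it is not Baidu's implementation. The three-dimensional vectors, the frame names, and the query vector are hand-made stand-ins for what an image-text encoder would produce.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical base library: frame id -> embedding from an image-text encoder.
BASE_LIBRARY: Dict[str, List[float]] = {
    "frame_001_delivery_van": [0.9, 0.1, 0.0],
    "frame_002_wheelchair": [0.1, 0.9, 0.1],
    "frame_003_child": [0.0, 0.2, 0.9],
    "frame_004_empty_road": [0.3, 0.3, 0.3],
}


def search(query_embedding: List[float],
           library: Dict[str, List[float]],
           top_k: int = 1) -> List[str]:
    """Return the ids of the top_k frames most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), frame_id)
        for frame_id, embedding in library.items()
    ]
    scored.sort(reverse=True)
    return [frame_id for _, frame_id in scored[:top_k]]


# Stand-in for the embedding of the text query "wheelchair".
wheelchair_query = [0.0, 1.0, 0.0]
hits = search(wheelchair_query, BASE_LIBRARY, top_k=1)
print(hits)
```

Because text and images share one embedding space in this setup, the same `search` function would serve an image query as well; the frames it returns are the candidates that would then feed customized training.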

Baidu uses a semi-supervised approach that makes full use of 2D and 3D data to train a large perception model. Distilling the large model into a small model over multiple steps improves the small model's performance, while automatic labeling supports customized training of the small model. Together, these enhance long-range visual 3D perception and improve the multi-modal perception model.
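For readers unfamiliar with distillation, the following is a minimal sketch of the standard soft-label formulation, in which a small "student" model is trained to match the temperature-softened output distribution of a large "teacher". The article does not say which loss Baidu uses, so this is an assumption for illustration; the logits and the temperature value are made up.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


teacher = [4.0, 1.0, 0.5]        # made-up logits from the large model
close_student = [3.5, 1.2, 0.4]  # student that mostly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # student that disagrees with the teacher

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, which is what lets a small on-vehicle model absorb behavior from a much larger one.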

Another leading player, SenseTime, has also stated publicly that AIGC can be used to generate realistic traffic scenes and hard samples for training autonomous driving systems, and that multi-modal data can serve as input to the large model to raise the ceiling on the system's perception of corner-case scenes.

At the same time, a multi-modal large model for autonomous driving can integrate perception and decision-making. At the output end, an environment decoder reconstructs the 3D environment to enable visual understanding of the surroundings; a behavior decoder generates a complete path plan; and a motivation decoder describes the reasoning process in natural language, making the autonomous driving system safer and more reliable.

Once large models deliver these capabilities, the barrier to entry for autonomous driving will keep falling. As leading companies accelerate their driverless projects, more new players will be able to join the field and develop other markets that need path planning beyond road navigation, such as further optimizing path planning for robot vacuums.

Looking at the timeline, after a concentrated wave of large model releases in February and March, a product development period in April and May, and the gradual clarification of policy direction, June has become a period of concentrated releases of large model products and applications. This has also directly led to price cuts for the OpenAI API.

In the foreseeable future, AI technology will continue to iterate and applications will continue to advance. At the same time, more and more major technology companies will launch products to enter this field, which will continue to boost the industry's prosperity and bring users more GPT-like products that meet market demand. Tencent, for example, which has a huge user base, also released a technical solution in the large model field on June 19.

When these companies compete head to head, the industry's development will enter the fast lane, which also means that consumer users will soon be able to use these products. As for who can get users to pay, each vendor will have to rely on its own abilities.
