Understanding AI Through the Lens of Tokens

The rise of tokens is revolutionizing the AI industry, reshaping models, computing power, data, applications, and governance.

Introduction

On May 18, a report titled “Understanding Artificial Intelligence Must Start from Understanding Tokens” was published by Xinhua Daily Telegraph.

In early 2026, a set of data sparked heated discussions in the global AI industry. OpenRouter, the world’s largest AI model API aggregation platform, reported that from February 9 to 15, the token call volume of Chinese large models reached 4.12 trillion, surpassing the 2.94 trillion of U.S. models for the first time in history. This lead continued for several weeks, breaking through 7.3 trillion by mid to late March, with four of the top five models in global call volume coming from China.

The point of these figures is not to compare who has more or less. They mark a quiet revolution in the AI industry’s basic unit of measurement: tokens are becoming the “kilowatt-hour” of the intelligent era. This unit is profoundly reshaping six dimensions: models, computing power, data, applications, industry, and governance. Understanding AI in 2026 must begin with understanding tokens.

Sixfold Reconstruction from a Measurement Unit

The measurement unit of the industrial revolution was the “kilowatt-hour,” allowing energy to be accurately measured, priced, and transported across domains. The information revolution’s unit was “bits” and “bandwidth,” enabling information to be packaged, transmitted, and billed for the first time. The measurement unit of the intelligent revolution is “tokens,” allowing “intelligence” to be segmented, measured, priced, and traded for the first time.

The popularization of the token concept and its rapid growth in call volume are gradually pushing “intelligence” towards industrialization, marketization, and circulation.

Models

The economic value of large models is shifting from one-time training costs to ongoing inference output. Model vendors no longer simply “sell capabilities” but directly “sell tokens”: pricing per million input tokens and per million output tokens has become a global industry norm. The asset attributes of models are transitioning from “weight files” to “the ability to continuously produce tokens.”
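
As a rough illustration of how per-million-token pricing turns usage into a bill, the arithmetic can be sketched as below. This is a minimal sketch: the TokenPrice type, the request_cost function, and every price and token count are hypothetical values chosen for illustration, not any vendor’s actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price sheet, in currency units per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Bill input and output tokens pro rata against their per-million rates."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


# Illustrative numbers only (assumptions, not real prices).
example_price = TokenPrice(input_per_million=1.0, output_per_million=4.0)

# A request that reads 200,000 tokens and writes 50,000 tokens.
cost = request_cost(example_price, input_tokens=200_000, output_tokens=50_000)
print(f"{cost:.2f}")  # prints 0.40
```

Input and output are priced separately in this sketch because the text describes separate per-million rates for each; a flat rate is the special case where the two numbers are equal.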

Computing Power

The focus is shifting from “training computing power” to “inference computing power.” Training workloads are bursty and centralized, while inference workloads are continuous and distributed, which places new demands on latency, energy efficiency, and geographical distribution. Coordination across three tiers of computing power (cloud, edge, and device), along with inference-specific chips and optical interconnects, is becoming the new focus of infrastructure. JPMorgan predicts that China’s inference token consumption will grow by more than two orders of magnitude by 2030 compared to 2025.

Data

Data must be cleaned, labeled, and tokenized before entering large models, much as raw coal must be processed into standard fuel before it can generate power. In long-tail scenarios such as autonomous driving, robot training, and scientific discovery, synthetic data generated through simulation is now applied at scale. The market for data as a factor of production is entering a substantive phase, and “trainability” and “token output density,” rather than data scale alone, are becoming new metrics for pricing data assets. This shift is significant: the valuation of data is beginning to be linked to its actual contribution to the token production chain, which gives the market-based allocation of data a more solid economic foundation.

Applications

Traditional software charges based on seats and functionalities; today, applications are billed according to token call volume and business results. Intelligent agents are becoming the main consumers of tokens, with a complex task potentially consuming hundreds of thousands or even millions of tokens. The “intelligent agent as a service” market is rapidly expanding, with performance-based billing models being implemented at scale in customer service, marketing, compliance, and programming scenarios. The essence of applications is shifting from “delivering functions” to “consuming intelligence.”

Industry

A new industry chain is forming around tokens, encompassing production (models and computing power), distribution (inference networks, APIs, intelligent agent protocols), consumption (applications and intelligent agents), and measurement (evaluation benchmarks, auditing, and trusted verification). The boundaries between model layers, inference service layers, intelligent agent middleware layers, and industry application layers are becoming increasingly clear, with industry-specific intelligent agents becoming mainstream investments. Model vendors, cloud vendors, chip manufacturers, green power operators, and content distribution network providers together form a collaborative ecosystem for the token industry chain. According to data from the China Academy of Information and Communications Technology, the scale of China’s core AI industry is expected to exceed 1.2 trillion yuan by 2026, with the collaborative effects of the entire industry chain becoming evident.

Governance

The governance focus is shifting from “algorithm governance” to “full-chain governance of tokens.” As the AI industry has developed, the governance objects have expanded from “algorithms and code” to the entire chain of token production, circulation, consumption, and cross-border flow: traceability of tokens, identification of synthetic content, cross-border token flow, constraints on computing power and energy consumption, and trusted evaluation and benchmarks—all call for new governance tools and rules. The year 2026 may become a key year for the concentrated implementation of global AI governance rules.

China’s Position in the Global Token Wave

In the global wave brought by tokens, China is forming a unique position supported by multiple factors.

On the production side, domestic models are rising in clusters. A number of domestic model makers, such as MiniMax, Moonshot AI, DeepSeek, Zhipu, Alibaba Qwen, and ByteDance Doubao, are using mixture-of-experts architectures and aggressive engineering optimization to improve performance while cutting inference prices to a fraction of those of comparable global models. On the OpenRouter platform, U.S. users account for 47% and Chinese users for only about 6%, yet Chinese models lead in call volume. That is a vote of confidence from developers worldwide.

On the consumption side, applications are penetrating everyday life at an unprecedented speed. A general practitioner in a county hospital, faced with a suspicious lung CT, has AI outline the nodules and suggest differential diagnoses within seconds and a few thousand tokens, compressing what used to take two weeks of consultation into a single outpatient visit. A farmer in Shouguang, Shandong, photographs a curling cucumber with a smartphone, and a smart agriculture app draws on tokenized agricultural knowledge to tell him whether the cause is thrips or a viral disease and which treatment to use. An elderly person living alone says “I feel chest tightness” to a smart speaker in their dialect, and after a conversation of several thousand tokens, their children’s phones receive an alert and a shared location for emergency services. Delivery riders no longer hear mechanical instructions like “turn right ahead” but receive route planning based on real-time traffic and elevator wait times. AI assistants in government service halls answer inquiries about medical insurance transfers and property registration around the clock, replacing “people running errands” with “tokens running errands”… Tokens are becoming the “invisible labor force” across industries.

At the industry chain level, a full-stack collaborative ecosystem is rapidly taking shape. From domestic chips like Ascend, Cambricon, and Hygon to inference service platforms like Volcano Engine, Alibaba Cloud, and Tencent Cloud, along with a range of open-source middleware and industry-specific intelligent agents, the entire chain covering chips, computing power, models, middleware, and applications is maturing quickly. The “East Data, West Computing” project provides low-cost computing power, while direct green power supply to data centers strengthens the energy foundation.

However, it is essential to recognize that significant room for improvement remains in original model innovation, high-end computing power infrastructure, cross-language and cross-cultural ecosystem influence, and participation in global rule-making.

The second half of the token wave is not a story of having “already won”; it is “just beginning.” In the large global picture opened up by the small token, China is not only a vast market but also a proactive builder and a responsible partner in governance. Understanding tokens means understanding the next phase of artificial intelligence.
