The Small-Town Youth Labeling Big AI Models
Author | Sleepy.md
In Datong, Shanxi, a city once sustained by coal that has since shaken off the coal dust, a sharp pickaxe has left the coal seams behind and is now swinging at another, invisible mine.
Inside the office building of Jinmao International Center in Pingcheng District, there are no longer mine shafts or coal trucks. Instead, there are thousands of closely arranged computer workstations. Shanghai Runxun Cloud Sonic Valley Big Data Smart Service Center occupies several floors, with thousands of young employees wearing headphones, staring at screens, clicking, dragging, and selecting.
According to official data, as of November 2025, Datong had put 745,000 servers into operation and brought in 69 call-center and data labeling enterprises, creating jobs for more than 30,000 people and generating an output value of 750 million yuan. In this digital mine, 94% of the workers are locals.
It's not just Datong. The first batch of data labeling bases identified by the National Bureau of Statistics includes places in the central and western regions such as Yonghe County in Shanxi, Bijie in Guizhou, and Mengzi in Yunnan. At the data labeling base in Yonghe County, 80% of the employees are women. Most of them are rural stay-at-home mothers or rural young people who cannot find suitable jobs.
More than a century ago, the textile factories of Manchester, England, were crowded with landless farmers. Today, young people who cannot find a place in the real economy sit in front of computer screens in these remote county towns.
They do piece-rate work that is futuristic yet extremely primitive, producing the data feed that the AI giants in Beijing, Shenzhen, and Silicon Valley depend on.
No one thinks there's anything wrong with this.
A New Assembly Line on the Loess Plateau
The essence of data labeling is to teach machines about the world.
Autonomous driving systems need to recognize traffic lights and pedestrians, and large models need to distinguish between cats and dogs. Machines have no common sense of their own. A human must first draw a box on an image and tell them "this is a pedestrian"; only after digesting millions of such images do they learn to recognize one.
This job does not require a high education level, only patience, and a finger that can click incessantly.
During the golden age of 2017, a simple 2D box could fetch more than 0.1 yuan, and some companies even offered as much as half a yuan. A fast-clicking labeler could earn five to six hundred yuan by working ten hours a day. In a county town, that was unquestionably a well-paid, respectable job.
But as large models evolved, the harsh reality of this pipeline began to emerge.
By 2023, the unit price of simple image annotation had been driven down to 3 to 4 cents (0.03 to 0.04 yuan), a drop of over 90% from the peak. More demanding 3D point cloud images fare little better. The points are so dense that edges can only be made out after heavy zooming, and annotators must meticulously draw a three-dimensional box, defined by length, width, height, and orientation angle, that wraps tightly around a vehicle or pedestrian. Yet the price for such a complex 3D box is only 5 cents.

The direct consequence of this price plunge is a dramatic increase in labor intensity. To hold onto a monthly salary of two to three thousand yuan, annotators must keep pushing their speed ever higher.
This is by no means an easy white-collar job. In many annotation centers, the management is so strict that it's suffocating; employees are not allowed to answer phone calls during work, and mobile phones must be locked in storage compartments. The system meticulously records every employee's mouse movements and idle time, and if there is a break of more than three minutes, a backend warning will strike like a whip.
Even more frustrating is the error tolerance. The industry's passing accuracy is usually 95% or higher, with some companies requiring 98% to 99%. Under the strictest standard, if you draw 100 boxes and get 2 wrong, the entire image is sent back for rework.
Video is annotated frame by frame, and vehicles changing lanes are often occluded, forcing annotators to rely on imagination to identify each one; in 3D point cloud images, any object with more than 10 points must be boxed. In a complex parking spot project, if a line is drawn too long or something is missed, quality inspection will always find fault. It's common for an image to be reworked four or five times. In the end, an hour's work earns only a few cents.
An annotator in Hunan province posted her settlement statement on social media, showing that after a day's work, she drew over 700 boxes at a rate of 4 cents each, earning a total of 30.2 yuan.
It is a scene of stark disconnect.
On one side are the shiny tech giants at conferences discussing how AGI will liberate humanity; on the other side, in county towns on the Loess Plateau and in the mountains of the southwest, young people stare at screens for eight to ten hours a day, mechanically drawing boxes by the thousands and tens of thousands. Even in their dreams at night, their fingers trace lane lines in the air.
Someone once said that the facade of artificial intelligence is a roaring luxury car, but when you open the door, you'll find a hundred people pedaling bicycles inside, gritting their teeth and pedaling hard.
No one thinks there's anything wrong with this.
The Piecework Craftsman Teaching Machines "How to Love"
After breaking through the bottleneck of image recognition, large models have undergone a deeper evolution, needing to learn to think, converse, and even show "empathy" like humans.
This has given rise to the most critical and expensive part of large model training: RLHF (Reinforcement Learning from Human Feedback).
In simple terms, it involves having real people score AI-generated responses, telling the model which answers are better and more aligned with human values and emotional preferences.
ChatGPT seems "human-like" because countless RLHF annotators behind it are teaching it.
On crowdsourcing platforms, such annotation tasks are often openly priced at 3 to 7 RMB per item. Annotators must assign highly subjective emotional scores to AI responses, judging whether a response is "warm," "empathetic," or "considerate of the user's emotions."
Someone earning a mere couple thousand RMB per month, struggling in the mud of reality, barely able to attend to their own emotions, is now required in the system to act as AI's emotional mentor and arbiter of values.

They must forcibly break down warmth, empathy, and other highly complex, subtle human emotions into cold scores ranging from 1 to 5. If their scores do not match the system's predefined correct answers, their accuracy is deemed insufficient, and deductions are taken from their meager piecework wages.
This is a draining of cognition. Human emotions, morals, and compassion, so intricate and nuanced, are being forcibly squeezed into the algorithm's funnel. On the ice-cold scales of quantification and standardization, they are drained of their last bit of warmth. You marvel at the cyber behemoth on the screen that has learned to write poetry, compose music, show care, and even wear a skin of melancholy sensitivity. Off-screen, that group of once lively humans has, through daily mechanical judgments, regressed into emotionless scoring machines.
This is the most hidden side of the entire industry chain, and it never appears in any funding news or tech whitepaper.
No one thinks there's anything wrong with this.
985 Master's Degree Holder vs. Small-town Youth
As low-level assembly line work is crushed under AI's treads, the cybernetic conveyor belt is spreading upward and beginning to engulf higher-order mental labor.
The appetite of large models has changed. No longer satisfied with chewing on basic common sense, they now demand human expertise and advanced logic.
On major recruitment platforms, a new type of part-time job has begun to appear frequently, with titles such as "Large Model Logical Reasoning Annotation" and "AI Humanities Trainer." These jobs have an extremely high threshold, often requiring a "master's degree or above from Project 985/Project 211 universities" and covering professional fields such as law, medicine, philosophy, and literature.

Many graduate students from prestigious universities are drawn in and join the outsourcing groups of these tech giants. However, they quickly realize that this is no easy mental exercise but a form of mental torture.
Before formally taking on tasks, they must read through dozens of pages of documents on scoring dimensions and evaluation criteria and complete two to three rounds of trial annotation. Even after they qualify, if their accuracy during formal annotation falls below the average level, they lose their qualification and are kicked out of the group chat.
Most suffocating of all, these standards are not fixed. Faced with similar questions and answers, applying the same line of reasoning may yield completely opposite results. It's like working on a never-ending exam paper with no standard answer. Accuracy cannot be improved through effort or study; one can only spin in place endlessly, depleting both mental and physical energy.
This is the new form of exploitation in the era of large models: class folding.
Knowledge, once seen as a golden ladder to break barriers and climb upwards, has now become a more complex digital fodder offered to algorithms for chewing. In the face of the absolute power of algorithms and systems, the master's students from elite universities in their ivory towers and the young people from small towns on the Loess Plateau have embarked on the most bizarre convergence path.
Together, they plummet into this bottomless cyber-mining pit, stripped of their halos, their differences erased, all turned into cheap gears on the conveyor belt that can be replaced at any time.
It's the same overseas. In 2024, Apple laid off a 121-member AI voice annotation team in San Diego. These employees were responsible for improving Siri's multilingual processing capabilities. They once thought they stood at the edge of a tech giant's core business, only to plunge instantly into the abyss of unemployment.
In the eyes of tech giants, whether it's a middle-aged lady running a grocery store in a small county or a logic trainer with a prestigious education, fundamentally, they are all "consumables" that can be replaced at any time.
No one thinks there's anything wrong with this.
A Trillion-Dollar Tower of Babel, Built with a Few Cents of Exploitation
According to data released by the China Academy of Information and Communications Technology (CAICT), the Chinese data annotation market reached 6.08 billion yuan in 2023 and is expected to reach 20 to 30 billion yuan by 2025. Global sales in the data annotation and service market are predicted to skyrocket to 117.1 billion yuan by 2030.
Behind these numbers are tech giants such as OpenAI, Microsoft, and ByteDance, with valuations reaching the trillions of dollars.
However, this sky-high wealth has not flowed to those who truly "feed" AI.
China's data labeling industry has a typical inverted pyramid outsourcing structure. At the top are the tech giants tightly holding the core algorithms; the second level consists of large data service providers; the third level comprises data labeling centers and small to medium-sized outsourcing companies scattered across the country; only at the bottom do we find the foot soldiers paid by the piece, the labeling workers.
Each outsourcing layer takes a hefty cut. When a major tech company offers a unit price of 0.5 RMB, what ends up in the hands of a labeling worker in a county town, after layer upon layer of skimming, may be less than 0.05 RMB.
In his book "Technofeudalism," former Greek Finance Minister Yanis Varoufakis put forth a penetrating argument: today's tech giants are no longer capitalists in the traditional sense but "cloudalists."
What they own is not factories and machinery but algorithms, platforms, and computing power, the digital territories of the cyber era. In this new feudal system, users are not consumers but digital serfs. Every like, comment, and page view on social media is free labor supplying data to the cloudalists.
Meanwhile, the data labeling workers in emerging markets are the lowest-tier digital serfs in this system. They not only have to produce data but also clean, categorize, and rate massive raw data, transforming it into high-quality feed that large models can digest.
This is a hidden cognitive enclosure movement. Just as the Enclosure Acts of 19th century England drove farmers into textile factories, today's AI wave is pushing young people who cannot find a place in the physical economy in front of screens.
AI has not flattened the class divide; instead, it has established a "Data and Blood-Sweat Conveyor Belt" from small counties in central and western China directly to the headquarters of tech giants in Beijing, Shanghai, Guangzhou, and Shenzhen. The narrative of technological revolution is always grand and magnificent, but its foundation is forever the scaled consumption of cheap labor.
No one thinks there's anything wrong with this.
A Tomorrow Without the Need for Humans
The most brutal conclusion is approaching faster and faster.
With the rise of large-scale model capabilities, tasks that once required human labor day and night to complete are being taken over by AI itself.
In April 2023, Li Xiang, the founder of Li Auto, revealed at a forum that the company used to manually label approximately 10 million frames of autonomous driving images a year, with outsourcing costs close to 100 million yuan. After it adopted large models for automated labeling, work that used to take a year could be done in about 3 hours.
That is 1,000 times the efficiency of humans, and it was achieved as early as 2023. Last March, Li Auto released its next-generation MindVLA-o1 automatic annotation engine.
A grimly accurate, self-deprecating saying circulates in the industry: "There is only as much intelligence as there is human labor." But now, tech giants' outsourcing of data annotation has fallen off a cliff, dropping 40% to 50%.
Those young people from small towns who have sat in front of computers for countless days and nights, their eyes bloodshot from the strain, have personally raised a behemoth. And now, this behemoth is turning around, shattering their rice bowls.
As night falls, the office buildings in Datong's Pingcheng District remain as bright as day. The young people on shift silently exchange their weary shells in the elevator lobby. In this folded space imprisoned by innumerable polygons, no one cares about the epic leap of the Transformer architecture on the other side of the ocean, nor does anyone understand the roar of computing power behind the hundred billion parameters.
Their gaze is welded to the red and green progress bar in the back-end system that marks the "passing line," as they calculate whether the meager piecework numbers can add up to a decent life by the end of the month.
On one side, the closing bell of the Nasdaq and the continuous coverage by tech media have the giants raising their glasses in celebration of AGI's advent; on the other side, these digital serfs who have fed AI with their flesh and blood can only, in the midst of aching sleep, nervously wait for the behemoth they have raised with their own hands to nonchalantly kick away their rice bowls on an ordinary morning.
No one thinks there's anything wrong with this.