New Chinese LLM Trained with Zero External Data May Slash AI Costs Again
- RoboCap
- 3 days ago
Researchers from Tsinghua University, the Beijing Institute for General Artificial Intelligence (both in China), and Penn State in the US have developed a framework for what they describe as the first ‘Zero External Interference’ RLVR (Reinforcement Learning with Verifiable Rewards) Large Language Model.
RLVR was first brought to global attention with the emergence of DeepSeek R1, which demonstrated its worth in training LLMs, particularly for tasks requiring complex reasoning. By reducing dependence on human-curated and annotated data, RLVR offered a scalable and efficient alternative in which models teach themselves to reason through problems: outputs are checked automatically against verifiable criteria (such as a known correct answer or a passing test), and the resulting reward signal is fed back into training. This was a big advantage over the supervised learning framework made popular by OpenAI, which required much of this reasoning work, i.e. “how do I solve this problem?”, to be done up front, before the model was released.
But RLVR still has drawbacks, as the framework depends heavily on manually curated datasets of questions and answers for initial training. Also, as the model matures, the pace of its evolution is limited by the amount and complexity of the problems asked of it. This is where the Absolute Zero Reasoner (AZR) framework comes in: it allows the model to iterate on itself without any external (i.e. human) input, either for the foundational question set or for how to solve it.
The technique enables the model to generate questions and verify answers internally, achieving “Zero External Interference and Data.” This advancement significantly reduces human input and further improves the scalability prospects of LLMs by making them cheaper to create in the first place.
However, because AZR relies on a code executor to verify answers, the model’s current capabilities are confined to code-related tasks. The researchers also acknowledged that errors still occur in certain scenarios, underscoring the continued necessity of human oversight, even with the adoption of the ‘Absolute Zero’ framework.
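To make the mechanism concrete, here is a minimal sketch of the propose-execute-verify loop described above. The `StubModel` class, its `propose` and `solve` methods, and the single reward step are illustrative assumptions, not the authors’ actual implementation; the key idea shown is that the Python interpreter itself serves as the verifier, so no human-labelled answer is ever needed.

```python
def run_program(src: str, x):
    """The 'code executor': run proposed code in an isolated namespace.

    A real deployment would sandbox this; plain exec() is for illustration.
    """
    env = {}
    exec(src, env)
    return env["f"](x)


def self_play_step(model):
    # 1. Proposer role: the model invents its own task (a program + input),
    #    so no external question dataset is required.
    src, x = model.propose()

    # 2. The executor runs the program to establish verifiable ground truth.
    y_true = run_program(src, x)

    # 3. Solver role: the same model predicts the output without running it.
    y_pred = model.solve(src, x)

    # 4. Verifiable reward: 1.0 if prediction matches execution, else 0.0.
    #    This signal, not human annotation, drives the RL update.
    return 1.0 if y_pred == y_true else 0.0


class StubModel:
    """Hypothetical stand-in for an LLM, for demonstration only."""

    def propose(self):
        return "def f(x): return x * 2 + 1", 3

    def solve(self, src, x):
        return 7  # a correct guess here; an LLM would reason it out


print(self_play_step(StubModel()))  # → 1.0
```

Note the design point this illustrates: because correctness is decided by actually executing code, the loop only works for tasks a code executor can check, which is exactly the limitation the researchers acknowledge.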