A new era of AI with RL generalization
Two profound blog posts about the new era of AI with RL generalization are worth reading. Both are insightful, just like “The Bitter Lesson”, which we read in 2019.
1. The Second Half by Shunyu Yao (姚顺雨) in 2025
Here are a few excerpts I highlight from Yao’s blog…
- Once we turn all digital worlds into an environment, solve it with smart RL algorithms, we have digital AGI.
- It turned out the most important part of RL might not even be the RL algorithm or environment, but the priors
- language generalizes through reasoning in agents.
- Evaluation “should” run automatically
- Evaluation “should” run i.i.d
- the way to play the new game of the second half is:
  - We develop novel evaluation setups or tasks for real-world utility.
  - We solve them with the recipe or augment the recipe with novel components. Continue the loop.
2. Welcome to the Era of Experience by David Silver and Richard S. Sutton in 2025
Here are a few excerpts I highlight from the paper:
To progress significantly further, a new source of data is required. This data must be generated in a way that continually improves as the agent becomes stronger; any static procedure for synthetically generating data will quickly become outstripped. This can be achieved by allowing agents to learn continually from their own experience, i.e., data that is generated by the agent interacting with its environment. AI is at the cusp of a new period in which experience will become the dominant medium of improvement and ultimately dwarf the scale of human data used in today’s systems.
Our contention is that incredible new capabilities will arise once the full potential of experiential learning is harnessed. This era of experience will likely be characterised by agents and environments that, in addition to learning from vast quantities of experiential data, will break through the limitations of human-centric AI systems in several further dimensions:
- Agents will inhabit streams of experience, rather than short snippets of interaction.
- Their actions and observations will be richly grounded in the environment, rather than interacting via human dialogue alone.
- Their rewards will be grounded in their experience of the environment, rather than coming from human prejudgement.
- They will plan and/or reason about experience, rather than reasoning solely in human terms.
… … …
The era of experience marks a pivotal moment in the evolution of AI. Building on today’s strong foundations, but moving beyond the limitations of human-derived data, agents will increasingly learn from their own interactions with the world. Agents will autonomously interact with environments through rich observations and actions. They will continue to adapt over the course of lifelong streams of experience. Their goals will be directable towards any combination of grounded signals. Furthermore, agents will utilise powerful non-human reasoning, and construct plans that are grounded in the consequences of the agent’s actions upon its environment. Ultimately, experiential data will eclipse the scale and quality of human generated data. This paradigm shift, accompanied by algorithmic advancements in RL, will unlock in many domains new capabilities that surpass those possessed by any human.
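To make the excerpts above a bit more concrete for myself, here is a toy sketch of "learning from experience": an agent that improves only from data it generates by acting, with rewards coming from the environment rather than from human labels. This is not from the paper. The names `CoinFlipEnv` and `run_experience_stream`, the payout probabilities, and the epsilon-greedy bandit setup are all my own assumptions, chosen only because they fit in a few lines.

```python
import random


class CoinFlipEnv:
    """Hypothetical two-armed bandit: arm 1 pays off more often than arm 0."""

    def __init__(self, payout_probs=(0.2, 0.8), seed=0):
        self.payout_probs = payout_probs
        self.rng = random.Random(seed)

    def step(self, action):
        # The reward is produced by the environment, not by a human judgement.
        return 1.0 if self.rng.random() < self.payout_probs[action] else 0.0


def run_experience_stream(env, steps=2000, epsilon=0.1, seed=1):
    """Epsilon-greedy agent that learns action values from its own interaction."""
    rng = random.Random(seed)
    n_actions = len(env.payout_probs)
    values = [0.0] * n_actions  # running estimate of each action's reward
    counts = [0] * n_actions    # how often each action has been tried
    for _ in range(steps):
        # Explore occasionally, otherwise exploit the current estimates.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: values[a])
        reward = env.step(action)
        counts[action] += 1
        # Incremental mean update using data generated by the agent itself.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_experience_stream(CoinFlipEnv())
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", values, "pull counts:", counts, "best:", best_action)
```

The point of the sketch is only the shape of the loop: act, observe a grounded reward, update, repeat. The systems the paper has in mind are of course far richer than a two-armed bandit.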
What’s more, below are a screenshot of Sutton’s X post replying to Google DeepMind’s post and the podcast “Is Human Data Enough? With David Silver”:
- Sutton’s X post
(data source: Richard S. Sutton’s X post)
- podcast: Is Human Data Enough? With David Silver
(video source: Google DeepMind)
Enjoy reading…