The quest for more human-like artificial intelligence is leading researchers down unexpected paths, specifically into the virtual worlds of video games and complex digital simulations. A new report highlights General Intuition, a startup that recently secured $320 million in funding on a bet that vast amounts of action data from gameplay can teach AI agents something akin to human intuition. The goal is not simply AI that plays games better; it is to use these rich, interactive environments to train agents for real-world tasks, potentially far faster than traditional methods allow.
General Intuition's approach hinges on the idea that the millions of hours of decision-making and interaction within video games provide a uniquely scalable training ground. Imagine an AI learning to navigate a complex urban environment by playing Grand Theft Auto, or mastering intricate problem-solving through strategy games. The company believes this extensive exposure to dynamic scenarios can help AI agents develop a deeper understanding of cause and effect, prediction, and adaptability, skills crucial for operating in our unpredictable physical world.
This concept extends beyond entertainment. Academic research, for example, is applying similar principles to a critical real-world problem: cybersecurity. A new benchmark called CyberChainBench, detailed in a recent arXiv paper, evaluates how well LLM-based agents, meaning systems built on large language models such as the technology behind ChatGPT, can protect smart contracts. Smart contracts are self-executing digital agreements on a blockchain, often used in decentralized finance (DeFi), and they are highly vulnerable to exploits.
CyberChainBench is built from 541 real-world exploit incidents gathered from DeFiHackLabs, spanning nine different blockchain networks. The benchmark provides an end-to-end evaluation where AI agents interact with historical blockchain data through isolated environments. These agents use tools to read code, trace transactions, and even validate exploits on simulated versions of mainnet blockchains. Each case is tied to a specific block in time and includes detailed information on the vulnerability, its location, and the attacker's profit.
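To make the shape of a benchmark case concrete, here is a minimal Python sketch of how one such record might be represented. It is an illustration only: the class name, field names, and sample values are all hypothetical and are not taken from the CyberChainBench paper or from DeFiHackLabs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExploitCase:
    """One benchmark case. All field names here are hypothetical."""
    case_id: str
    chain: str               # blockchain network where the incident occurred
    block_number: int        # block the case is tied to
    vulnerability_type: str  # one label from the benchmark's taxonomy
    location: str            # where the flaw sits, e.g. contract and function
    attacker_profit_usd: float


def cases_per_chain(cases):
    """Count how many cases fall on each network."""
    return Counter(case.chain for case in cases)


# Placeholder records for illustration only; these are not real incidents.
sample_cases = [
    ExploitCase("case-001", "chain-a", 1_000_000, "type-1", "Vault.withdraw", 50_000.0),
    ExploitCase("case-002", "chain-a", 1_200_000, "type-2", "Pool.swap", 120_000.0),
    ExploitCase("case-003", "chain-b", 900_000, "type-1", "Token.transfer", 8_000.0),
]

print(cases_per_chain(sample_cases))
```

Tying each record to a block number is what would let an evaluation harness replay the chain state as it stood just before the incident.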
The research defines a five-type vulnerability taxonomy and tests various agent-model configurations across three tasks: detecting vulnerabilities, generating exploits, and patching those vulnerabilities. The results show that the tasks are not equally hard. The best-performing configuration scored 37.5% on detection and 43.7% on exploit generation, but only 23.4% on patching. This suggests that while AI agents can identify and even reproduce attacks, the nuanced task of fixing complex code is still a major challenge.
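The size of that gap is easier to see in percentage points. The short Python sketch below uses only the three scores quoted above; the `success_rate` helper assumes a simple pass/fail count per case, which is a simplification and may not match how the paper actually scores each task. All function names are hypothetical.

```python
# Best-configuration scores as reported in the article, in percent.
reported_scores = {
    "detection": 37.5,
    "exploit_generation": 43.7,
    "patching": 23.4,
}


def success_rate(outcomes):
    """Percentage of cases where the agent succeeded (assumed pass/fail scoring)."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return 100.0 * sum(outcomes) / len(outcomes)


def gap_to_hardest(scores):
    """Percentage-point lead of each task over the lowest-scoring task."""
    hardest = min(scores, key=scores.get)
    gaps = {
        task: round(score - scores[hardest], 1)
        for task, score in scores.items()
        if task != hardest
    }
    return hardest, gaps


hardest_task, gaps = gap_to_hardest(reported_scores)
print(hardest_task, gaps)
```

On the reported figures, patching trails detection by roughly 14 points and exploit generation by roughly 20.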
The implications of this research are significant. If AI agents can be effectively trained in virtual environments, whether video games or blockchain simulations, it opens up a scalable and safe way to develop advanced AI capabilities. For General Intuition, the bet is on creating more robust, adaptable AI for a wide range of applications. For the cybersecurity community, progress in AI-driven vulnerability detection and patching could be a game-changer for protecting billions of dollars in digital assets from sophisticated attacks.
Project Ares' take: The common thread here is the power of simulated environments to accelerate AI development. Training AI in the real world is expensive, slow, and often risky. Virtual worlds, on the other hand, offer practically unlimited, repeatable scenarios where AI can fail, learn, and adapt without real-world consequences. This approach could democratize AI development by making advanced training more accessible, potentially leading to breakthroughs in fields from robotics and autonomous vehicles to complex data analysis. The low success rate on patching, however, underscores that while AI can identify problems, human expertise remains indispensable for nuanced, creative problem-solving and secure implementation.
What to watch next: We will be looking for General Intuition's first public applications of its video game-trained AI, and whether its approach leads to demonstrably more capable agents. On the academic front, further iterations of benchmarks like CyberChainBench will be crucial for tracking how well LLM-based agents handle the intricate, high-stakes world of smart contract security. The gap between detecting vulnerabilities and effectively patching them is a key area for future research and development.
