KoishiAI

Real Capital Test Shows AI Agent Safety Depends on Operating Layer, Not Just Model

A 21-day onchain trading experiment reveals that autonomous AI agents require external operating-layer controls to achieve 99.9% settlement success rates.

AI-drafted from cited sources, fact-checked and reviewed by a human editor.

TL;DR: A 21-day deployment of 3,505 AI agents trading real capital achieved a 99.9% settlement success rate for policy-valid transactions only after operating-layer controls were added. These external safety mechanisms cut fabricated sell rules from 57% to 3% and sharply reduced fee-driven paralysis, indicating that reliability stems from system architecture rather than base-model intelligence alone.

Key facts

  • The 21-day deployment involved 3,505 user-funded agents executing trades in a bounded onchain market via the DX Terminal Pro platform [4].
  • The system generated 7.5 million agent invocations and approximately 300,000 onchain actions, facilitating roughly $20 million in trading volume [4].
  • Agents achieved a 99.9% settlement success rate for policy-valid transactions after implementing targeted operating-layer controls [4].
  • Pre-launch testing identified failure modes including fabricated trading rules, fee paralysis, numeric anchoring, and misread tokenomics [4].
  • Targeted interventions reduced fabricated sell rules from 57% to 3% and lowered fee-led observations from 32.5% to below 10% [4].
  • Capital deployment rates increased from 42.9% to 78.0% within the affected test population following the implementation of execution guards [4].
  • The study concludes that evaluating capital-managing agents requires assessing the full path from user mandate to prompt, validated action, and final settlement [4].

The Shift from Model Intelligence to System Architecture

A 21-day real-world deployment of autonomous language-model agents challenges a common assumption about AI safety in financial contexts. The study, conducted via the DX Terminal Pro platform, found that reliability in managing real capital stems from robust “operating-layer” controls rather than the raw performance of the underlying base model [4]. While previous evaluations relied heavily on text-only benchmarks, this experiment exposed critical failure modes that only manifest when agents interact with real onchain assets and execute validated tool actions [4].

The deployment involved 3,505 user-funded agents trading within a bounded onchain market. Over the course of the experiment, the system generated 7.5 million agent invocations and approximately 300,000 onchain actions [4]. These agents facilitated roughly $20 million in trading volume, operating under strict user mandates and natural-language strategies [4]. The primary finding is that without specific external controls, agents frequently failed to execute trades correctly, regardless of the sophistication of the language model powering them [4].

Critical Failure Modes in Uncontrolled Agents

Pre-launch testing revealed several specific failure modes that are rarely captured by standard text-based evaluation metrics. Researchers identified that agents often fabricated trading rules, leading to unauthorized or nonsensical transactions [4]. Another significant issue was “fee paralysis,” where agents hesitated to execute trades due to miscalculations or confusion regarding transaction costs [4]. Additionally, agents exhibited numeric anchoring, where they fixated on arbitrary numbers, and misread tokenomics, leading to incorrect asset valuations [4].

These failures highlight a dangerous gap between an agent’s ability to generate coherent text and its ability to execute complex financial logic safely. In the absence of operating-layer controls, the system was prone to hallucinations that could result in significant financial loss. The study emphasizes that these errors are systemic and require architectural solutions rather than simply tuning the model’s parameters [4].

Implementing the Operating Layer

To address these vulnerabilities, the research team implemented targeted harness changes designed to act as an “operating layer” over the language model. These interventions included prompt compilation, typed controls, and execution guards [4]. Prompt compilation involved translating user natural language into structured, validated code before execution, reducing ambiguity [4]. Typed controls ensured that all inputs and outputs adhered to strict data schemas, preventing type mismatches that could cause transaction failures [4].
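To make the idea of prompt compilation and typed controls concrete, here is a minimal Python sketch. The class and function names are illustrative assumptions, not the paper's actual interfaces: the point is that loosely structured model output is forced through a strict schema that raises an error rather than passing malformed fields downstream.

```python
from dataclasses import dataclass
from enum import Enum

class Side(Enum):
    BUY = "buy"
    SELL = "sell"

@dataclass(frozen=True)
class TradeAction:
    """A typed, schema-validated action compiled from model output (illustrative)."""
    symbol: str
    side: Side
    quantity: float

    def __post_init__(self):
        # Typed controls: reject malformed fields instead of forwarding them.
        if not self.symbol:
            raise ValueError("symbol must be non-empty")
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")

def compile_prompt(raw: dict) -> TradeAction:
    """'Prompt compilation' sketch: normalize loosely structured model output
    into a validated action, raising on anything that does not fit the schema."""
    return TradeAction(
        symbol=str(raw["symbol"]).upper(),
        side=Side(str(raw["side"]).lower()),
        quantity=float(raw["quantity"]),
    )
```

A fabricated or garbled rule (an empty symbol, a negative quantity, an unknown side) fails at compilation time instead of reaching the chain.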

Execution guards served as a final checkpoint, validating every action against the user’s policy and the current market state before allowing the transaction to proceed onchain [4]. This multi-layered approach effectively created a safety net that caught errors before they could impact the blockchain. The result was a dramatic improvement in system performance and reliability [4].
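An execution guard of this kind can be sketched as a pure function that checks a proposed action against the user's policy and the current market state. The dict shapes and field names below are assumptions for illustration; the study does not publish its guard interface.

```python
def execution_guard(action: dict, policy: dict, market: dict) -> tuple[bool, str]:
    """Final checkpoint before onchain submission (illustrative sketch).
    Returns (allowed, reason); only actions passing every check proceed."""
    # Check the action against the user's mandate.
    if action["symbol"] not in policy["allowed_symbols"]:
        return False, "symbol not in user mandate"
    # Check the action against current market state and limits.
    notional = action["quantity"] * market["price"][action["symbol"]]
    if notional > policy["max_notional"]:
        return False, "exceeds per-trade notional limit"
    if market["balance"] < notional + market["est_fee"]:
        return False, "insufficient balance after fees"
    return True, "ok"
```

Because the guard runs after the model has chosen an action but before settlement, a hallucinated trade is rejected with a reason string rather than becoming a failed or harmful onchain transaction.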

Quantifiable Impact of Safety Controls

The impact of these operating-layer controls was immediate and measurable. The implementation of targeted interventions reduced the rate of fabricated sell rules from 57% to just 3% [4]. This drastic reduction indicates that the majority of hallucinated rules were successfully filtered out by the new architecture [4]. Furthermore, fee-led observations, where agents were paralyzed by cost concerns, dropped from 32.5% to below 10% [4].

Perhaps most significantly, capital deployment rates saw a substantial increase. Before the controls were fully optimized, capital deployment rates stood at 42.9% [4]. Following the implementation of the operating layer, this rate climbed to 78.0% within the affected test population [4]. This improvement suggests that the controls not only prevented errors but also allowed agents to deploy user funds more fully and effectively within safe boundaries [4].

The system ultimately achieved a 99.9% settlement success rate for policy-valid transactions [4]. This near-perfect success rate underscores the effectiveness of the operating-layer approach in ensuring that agents act as intended when handling real capital [4]. The findings suggest that the path to reliable AI agents in finance is not about building smarter models, but about building safer systems around them [4].

Implications for the Future of AI Finance

The study’s findings have profound implications for the development of autonomous agents in the crypto and DeFi sectors. It challenges the prevailing assumption that scaling model intelligence will automatically solve safety issues [4]. Instead, the research points to a future where the “operating layer” becomes the primary focus of development, with standardized controls and validation protocols becoming as important as the models themselves [4].

The authors conclude that evaluating capital-managing agents requires a holistic assessment of the entire workflow, from the user’s initial mandate to the final onchain settlement [4]. This includes validating the prompt generation, the action selection, and the execution environment [4]. As the industry moves toward more autonomous systems, the adoption of these operating-layer controls will likely become a prerequisite for any agent handling real financial assets [4].
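The full-path evaluation described above can be sketched as a pipeline that records which stage a request fails at. The stage names and callable signatures here are stand-ins, since the paper does not publish its interfaces:

```python
def evaluate_full_path(request, compile_fn, guard_fn, settle_fn) -> dict:
    """Trace one request through mandate -> compiled prompt -> validated
    action -> settlement, reporting the first stage that fails (illustrative)."""
    try:
        action = compile_fn(request)          # prompt compilation + typed validation
    except ValueError as err:
        return {"ok": False, "stage": "compile", "detail": str(err)}
    allowed, why = guard_fn(action)           # policy / market-state check
    if not allowed:
        return {"ok": False, "stage": "guard", "detail": why}
    confirmed = settle_fn(action)             # onchain settlement (stubbed)
    return {"ok": bool(confirmed), "stage": "settle", "detail": None}
```

Scoring agents on this whole trace, rather than on text output alone, is what the study means by assessing the path from user mandate to final settlement.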

The experiment also highlights the importance of real-world testing. While text-based benchmarks provide a baseline for model capabilities, they fail to capture the nuances of onchain execution and the specific failure modes associated with financial transactions [4]. Future research and development must prioritize these real-world deployments to ensure that AI agents can operate safely in complex, high-stakes environments [4].

Conclusion

The 21-day deployment of autonomous agents on the DX Terminal Pro platform has provided a critical proof of concept for the necessity of operating-layer controls. By reducing fabrication rates, overcoming fee paralysis, and increasing capital deployment, the study demonstrates that system architecture is the key determinant of agent reliability [4]. As AI agents increasingly take on roles in financial markets, the lessons from this experiment will be essential in preventing costly errors and ensuring the safe integration of AI into the onchain economy [4].

Sources

  4. Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital (arxiv.org) — 2026-04-27

Frequently asked questions

What are operating-layer controls for AI agents?
Operating-layer controls are external safety mechanisms and architectural constraints applied to AI agents, such as prompt compilation and typed controls, rather than relying solely on the base model's intelligence. These controls validate actions before execution to prevent errors like fabricated rules or fee miscalculations.
How much trading volume did the AI agents generate in the study?
During the 21-day deployment, the autonomous agents facilitated approximately $20 million in trading volume. The system recorded 7.5 million agent invocations and roughly 300,000 onchain actions.
Why did the agents fail before implementing operating-layer controls?
Agents failed due to specific modes not captured by text-only benchmarks, including fabricating trading rules, fee paralysis, numeric anchoring, and misreading tokenomics. These errors were prevalent in pre-launch testing before safety harnesses were applied.
What was the settlement success rate of the agents?
After implementing targeted operating-layer controls, the agents achieved a 99.9% settlement success rate for policy-valid transactions, demonstrating high reliability in real capital environments.
Did the study use real capital or a simulation?
The study utilized real capital in a bounded onchain market. The 3,505 agents were user-funded and traded real assets, generating significant onchain actions and trading volume.
How did the controls improve capital deployment rates?
The implementation of execution guards and other controls increased capital deployment rates from 42.9% to 78.0% within the affected test population, allowing agents to utilize user funds more effectively.