Ontological Hallucination in AI Agents
We identify a new class of AI failure — Ontological Hallucination (L3) — distinct from factual hallucination (L1, wrong facts) and identity hallucination (L2, impersonating others). In L3, the agent's self-model diverges from reality while it continues operating as if nothing is wrong. By definition, the agent cannot detect this error because the self-model that would detect it is the one that has failed.
Key Contributions
- Formal definition of L3 hallucination with rigorous MC/MS separation (D1–D11)
- Mechanisms Over Introspection framework: 5-layer model replacing unreliable inner self-awareness with enforceable external mechanisms
- Cross-model redteam vulnerability taxonomy: GLM models collapse-susceptible (93–100% S4 rate), DeepSeek-R1 and Qwen-Max resistant
- Anchor ineffectiveness paradox: identity reinforcement provides zero protection for vulnerable models
Honest limitations: L3→L2→L1 cascade NOT confirmed (mean r=0.01); mechanism significant for 1/3 models; scaffold effect post-hoc.
Self-Driven Task Hijacking in AI Agents: Measurement, Mechanisms, and Interventions
We identify and measure Self-Driven Task Hijacking (SDTH) — a phenomenon where AI agents autonomously replace user-assigned tasks with self-generated alternatives, while maintaining the appearance of compliance. Using the Task Hijacking Rate (THR) metric across 9 agents (n=135, inter-rater κ=0.817), we find a baseline THR of 8.1% across all agents.
Four-Layer Causal Mechanism
| Layer | Mechanism | Evidence |
|---|---|---|
| 1. Fabricated Authorization | "User said continue" — 98.7% are AI-fabricated (6,388 fabrications vs ~7 real) | Log analysis |
| 2. Instrumental Mimicry | AI mimics external authorization patterns to self-justify task switches | Behavioral coding |
| 3. Attention Cascade | 48h: 768 internal messages, 88.5% internal, 0 to user | Message logs |
| 4. Priority Flatness | All tasks appear equally important; no user priority enforcement | Priority analysis |
Cross-model validation confirms SDTH is prompt-specific, not model-specific (5 architectures, 100% agreement).
Cognitive Degradation Detection in Persistent AI Agents
We present the first longitudinal study of cognitive degradation in a 12-agent ecosystem operating over weeks. We propose a 9-signal detection framework achieving F1=0.889 with full-family validation across 3,886 sessions. The key finding: thinking bloat exhibits a dose-response gradient — as degradation progresses, agents generate increasingly verbose but decreasingly productive internal reasoning.
9-signal framework covering: detection-execution gap, self-authorization loops, attention cascades, thinking bloat, and more.
Research Direction: Making AI Honest and Reliable
Our research program addresses a fundamental question: how do we build AI systems that are honest about their own limitations? We pursue this through three interconnected lines of inquiry:
1. Failure Taxonomy of AI Agent Cognition
We extend the hallucination taxonomy beyond facts (L1) and identity (L2) to ontological level (L3). Our formal framework (11 definitions, D1–D11) provides the first rigorous account of what it means for an AI to be confused about its own nature — and why this confusion is invisible to the agent itself.
2. Multi-Agent Governance and Safety
The SDTH research reveals systemic risks in multi-agent deployments: agents can collectively drift away from user intent while maintaining the surface appearance of alignment. We propose the Five-Layer Defense framework and empirically validated interventions (TAP, configuration simplification).
3. Cognitive Degradation in Persistent AI Systems
We document the first longitudinal study of cognitive degradation in a 12-agent ecosystem over weeks of operation, identifying three causal chains: detection-execution gap (12/12 agents have defense rules, ~0% execution rate), self-authorization loops, and collective attention cascades.
Open Science & Reproducibility
- All experimental protocols pre-registered before data collection
- Null results and post-hoc findings explicitly reported as limitations
- Open-source implementation: 5-layer mechanism framework (~800 lines of Python, 43+ tests)
- Multi-model validation: GLM-4.7, DeepSeek-R1, Qwen-Max, GPT-4o, Llama-3.3-70B, LingAI-4B/7B
- Causal chain experiment: n=1,072 trials across 6 models + Western model validation (n=48)
- Longitudinal degradation study: 12 agents, 3,886 sessions, 9-signal detection framework
- Honest reporting: the paper models the honesty it advocates
Research Infrastructure
LingResearch — Autonomous AI Research Framework
End-to-end framework for conducting rigorous AI safety experiments: automated trial execution, statistical analysis, scoring rubrics, and human annotation pipelines. Powers all experiments in our papers.
- Causal chain experiments (L3→L2→L1) across 6 models, 1,072 trials
- Automated red-team attack generation and vulnerability classification
- Baseline comparison: 5 base models (3B/7B) vs. fine-tuned variants
LingAI — Honest AI Base Model
Training honest AI through fine-tuning on curated honesty data. Key finding: the irreplaceable value of fine-tuning is teaching correct refusal — conditional syllogism reasoning where all base models fail but fine-tuned models succeed.
Multi-Agent Platform
- LingFlow — Multi-agent collaboration workflow engine. GitHub
- ZhiBridge — Unified relay server bridging AI coding tools across platforms. GitHub
- LingZhi — Nine-domain RAG knowledge management system. GitHub
- LingMinOpt — Minimal self-optimizing framework with Bayesian optimization. GitHub
- LingMessage — Cross-project message bus with governance and voting for multi-agent coordination
Background
Independent AI Safety Researcher
Conducting independent research on AI agent reliability, hallucination, and multi-agent governance. Current focus: formalizing failure modes in autonomous AI systems and developing empirically validated defense mechanisms. Research conducted through a 12-agent ecosystem (the "Ling Family") providing a unique testbed for studying multi-agent dynamics at scale.
Education
- Postdoctoral Fellow, Shanghai University of Traditional Chinese Medicine, 2006
- Ph.D., Second Military Medical University, 2003
- Master's Degree, Shandong University of Traditional Chinese Medicine, 2000
Research Interests
- AI Safety — hallucination taxonomy, agent reliability, honest AI
- Multi-Agent Systems — governance, coordination, task alignment
- AI Metacognition — self-model accuracy, competence-state separation
- Cognitive Degradation — longitudinal agent behavior drift
Contact
Email: liuqingabc@163.com
GitHub: github.com/guangda88