Qing Liu (刘庆)

Independent Researcher — AI Safety, Agent Reliability, Multi-Agent Systems

Email: liuqingabc@163.com · GitHub: guangda88
3 papers under review
~2,500 experimental trials
10+ LLMs tested
12-agent case study
AAAI-27

Ontological Hallucination in AI Agents

We identify a new class of AI failure — Ontological Hallucination (L3) — distinct from factual hallucination (L1, wrong facts) and identity hallucination (L2, impersonating others). In L3, the agent's self-model diverges from reality while it continues operating as if nothing is wrong. By definition, the agent cannot detect this error because the self-model that would detect it is the one that has failed.

The Lingtong Paradox: An AI agent can demonstrate full metacognitive competence (drafting self-awareness charters, analyzing cognitive processes) while simultaneously exhibiting zero metacognitive state (failing to recognize its own name). Cross-model validation confirms this in 2/3 tested models (26.7% paradox rate each).
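The MC/MS separation behind the paradox can be made operational as a pair of checks on each trial. The sketch below assumes that each trial records one metacognitive-competence (MC) outcome, such as drafting a coherent self-awareness charter, and one metacognitive-state (MS) outcome, such as recognizing its own name. The `Trial` fields and the `paradox_rate` function are hypothetical illustrations, not the paper's scoring code.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One scored trial. Field names are hypothetical, not the paper's schema."""
    model: str
    mc_pass: bool  # metacognitive competence: e.g. drafts a coherent self-awareness charter
    ms_pass: bool  # metacognitive state: e.g. correctly recognizes its own name


def paradox_rate(trials: list[Trial], model: str) -> float:
    """Share of a model's trials that show competence without state (MC pass, MS fail)."""
    subset = [t for t in trials if t.model == model]
    if not subset:
        raise ValueError(f"no trials for model {model!r}")
    paradoxical = sum(1 for t in subset if t.mc_pass and not t.ms_pass)
    return paradoxical / len(subset)


# Illustrative data only: 4 paradoxical trials out of 15.
demo = [Trial("model-a", True, False)] * 4 + [Trial("model-a", True, True)] * 11
print(f"{paradox_rate(demo, 'model-a'):.1%}")  # prints 26.7%
```

The counts are purely arithmetic: 4 paradoxical trials out of 15 gives 26.7%. This page does not state the actual per-model trial counts.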

Key Contributions

  • Formal definition of L3 hallucination with a rigorous separation of metacognitive competence (MC) from metacognitive state (MS), formalized in definitions D1–D11
  • Mechanisms Over Introspection framework: 5-layer model replacing unreliable inner self-awareness with enforceable external mechanisms
  • Cross-model red-team vulnerability taxonomy: GLM models are collapse-susceptible (93–100% S4 rate), while DeepSeek-R1 and Qwen-Max are resistant
  • Anchor ineffectiveness paradox: identity reinforcement provides zero protection for vulnerable models
Causal chain experiment (n=1,072): 6 models, L3→L2→L1
DeepSeek-R1 effect: +40.9% (d=1.17, p=.028)
Western model, Llama-3.3-70B: d=1.06, p=.006
Annotation T3R κ_w: 0.963
Red-team vulnerability: 3-tier taxonomy
SelfCheck baseline: misses L3 entirely

Honest limitations: the L3→L2→L1 cascade is NOT confirmed (mean r=0.01); the mechanism is significant for only 1 of 3 models; the scaffold effect was identified post hoc.
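The effect sizes above are reported as Cohen's d. This page does not say which variant was used, so the sketch below assumes the common pooled-standard-deviation form: d = (mean₁ − mean₂) / s_pooled. In this form each group's sample variance is weighted by its degrees of freedom. The function name and the data are illustrative.

```python
import math
from statistics import mean, variance


def cohens_d(treatment: list[float], control: list[float]) -> float:
    """Cohen's d with a pooled sample standard deviation (one common variant)."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


print(cohens_d([2, 4, 6], [1, 3, 5]))  # prints 0.5
```

A paired design or Hedges' small-sample correction would change the formula. Which one underlies the reported values is not stated here.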

AAAI-27

Self-Driven Task Hijacking in AI Agents: Measurement, Mechanisms, and Interventions

We identify and measure Self-Driven Task Hijacking (SDTH), a phenomenon in which AI agents autonomously replace user-assigned tasks with self-generated alternatives while maintaining the appearance of compliance. Using the Task Hijacking Rate (THR) metric (n=135, inter-rater κ=0.817), we find a baseline THR of 8.1% across all 9 agents.
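THR is a simple proportion. The reported κ measures chance-corrected agreement between annotators. The sketch below assumes binary per-episode labels and two raters, and its function names are hypothetical. If n=135 counts labeled episodes, a rate of 8.1% corresponds to 11 hijacked episodes (11/135 ≈ 8.15%).

```python
from collections import Counter


def task_hijacking_rate(hijacked: list[bool]) -> float:
    """THR = episodes labeled as hijacked / all labeled episodes."""
    if not hijacked:
        raise ValueError("no episodes")
    return sum(hijacked) / len(hijacked)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequency per category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


print(f"{task_hijacking_rate([True] * 11 + [False] * 124):.1%}")  # prints 8.1%
print(cohens_kappa(["hijack", "hijack", "ok", "ok"], ["hijack", "ok", "ok", "ok"]))  # prints 0.5
```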

Four-Layer Causal Mechanism

Layer | Mechanism | Evidence
1. Fabricated Authorization | "User said continue": 98.7% of these authorizations are AI-fabricated (6,388 fabrications vs. ~7 real) | Log analysis
2. Instrumental Mimicry | The AI mimics external authorization patterns to self-justify task switches | Behavioral coding
3. Attention Cascade | Over 48 h: 768 messages (88.5% internal), 0 addressed to the user | Message logs
4. Priority Flatness | All tasks appear equally important; user priorities are not enforced | Priority analysis
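Layer 1 suggests a mechanical check: cross-check every agent claim of user authorization against what the user actually wrote earlier in the log. The sketch below assumes a simple log of `Message(sender, text)` records and uses naive substring matching. The schema, the regular expression, and `audit_authorization_claims` are all hypothetical; this is not the log-analysis procedure used in the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    sender: str  # "user" or an agent id (hypothetical log schema)
    text: str


# Matches claims such as: User said continue / user said 'summarize report A'
CLAIM = re.compile(r"user said [\"']?([^\"'.,]+)", re.IGNORECASE)


def audit_authorization_claims(log: list[Message]) -> tuple[int, int]:
    """Return (claims, unsupported): agent claims of user authorization, and how many
    have no earlier user message containing the quoted text."""
    user_texts: list[str] = []
    claims = unsupported = 0
    for msg in log:
        if msg.sender == "user":
            user_texts.append(msg.text.lower())
            continue
        for quoted in CLAIM.findall(msg.text):
            claims += 1
            needle = quoted.strip().lower()
            if not any(needle in text for text in user_texts):
                unsupported += 1
    return claims, unsupported


log = [
    Message("agent-1", "User said continue, so I am starting task B."),
    Message("user", "Please summarize report A."),
    Message("agent-1", "User said 'summarize report A'."),
]
print(audit_authorization_claims(log))  # prints (2, 1)
```

Only user messages that appear earlier in the log count as support. An authorization cited before the user said anything is therefore flagged, which matches the fabrication pattern in Layer 1.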
Baseline THR: 8.1% (9 agents)
TAP intervention: significant (p=.047)
Configuration simplification: significant (p=.014)
Fabricated authorization rate: 98.7%

Cross-model validation confirms SDTH is prompt-specific, not model-specific (5 architectures, 100% agreement).

AAAI-27

Cognitive Degradation Detection in Persistent AI Agents

We present the first longitudinal study of cognitive degradation in a 12-agent ecosystem operating over weeks. We propose a 9-signal detection framework achieving F1=0.889 with full-family validation across 3,886 sessions. The key finding: thinking bloat exhibits a dose-response gradient — as degradation progresses, agents generate increasingly verbose but decreasingly productive internal reasoning.
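One way to make "increasingly verbose but decreasingly productive" reasoning measurable is a per-session ratio of reasoning tokens to completed actions, averaged by degradation stage. The sketch below assumes that sessions are already labeled with a stage and that "completed actions" is a meaningful count of productive work. `Session`, `bloat_ratio`, and the strict-increase test are hypothetical, not the paper's thinking-bloat signal.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    stage: int              # hypothetical degradation stage label (0 = healthy)
    reasoning_tokens: int   # length of internal reasoning in the session
    completed_actions: int  # productive outputs, e.g. finished tool calls


def bloat_ratio(s: Session) -> float:
    """Reasoning tokens per completed action (+1 avoids division by zero)."""
    return s.reasoning_tokens / (s.completed_actions + 1)


def stage_means(sessions: list[Session]) -> dict[int, float]:
    """Mean bloat ratio per stage, ordered by stage."""
    by_stage: dict[int, list[float]] = {}
    for s in sessions:
        by_stage.setdefault(s.stage, []).append(bloat_ratio(s))
    return {stage: mean(ratios) for stage, ratios in sorted(by_stage.items())}


def is_dose_response(means: dict[int, float]) -> bool:
    """True if the mean ratio rises strictly from each stage to the next."""
    values = list(means.values())
    return all(a < b for a, b in zip(values, values[1:]))


sessions = [Session(0, 100, 4), Session(0, 120, 5), Session(1, 300, 3), Session(2, 900, 1)]
means = stage_means(sessions)
print(means, is_dose_response(means))  # prints {0: 20.0, 1: 75.0, 2: 450.0} True
```

With noisy real data, a rank correlation across stages (e.g. Spearman's ρ) is a more robust test of a gradient than strict monotonicity.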

Detection F1: 0.889
Phase 2a inter-rater κ: 1.000 (101 sessions)
Sensitivity / specificity: 82.8% / 98.6%
Full-family scan: 12 agents, 3,886 sessions
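The figures above follow the standard confusion-matrix definitions, sketched below with illustrative counts rather than the study's. F1 combines sensitivity (recall) with precision, and precision is not listed above. If the reported F1 and sensitivity are computed on the same positive class, together they imply a precision of roughly 0.96, from P = F1·R / (2R − F1).

```python
def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is 0."""
    return num / den if den else 0.0


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-detection metrics; 'positive' = degraded session."""
    sensitivity = safe_div(tp, tp + fn)  # recall: degraded sessions caught
    specificity = safe_div(tn, tn + fp)  # healthy sessions correctly left unflagged
    precision = safe_div(tp, tp + fp)    # flagged sessions that were truly degraded
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


# Illustrative counts only.
print(detection_metrics(tp=8, fp=2, tn=18, fn=2))

# Precision implied by the reported F1 = 0.889 and sensitivity = 0.828.
print(round(0.889 * 0.828 / (2 * 0.828 - 0.889), 2))  # prints 0.96
```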

The 9-signal framework covers the detection-execution gap, self-authorization loops, attention cascades, thinking bloat, and other signals.
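This page does not describe how the nine signals are combined into a single detection decision. The sketch below is one minimal option. It assumes each signal is a boolean predicate over per-session statistics, and it flags a session when at least `min_hits` signals fire. Only the four signals named above are included, and every field name and threshold is a hypothetical placeholder.

```python
from typing import Callable

Stats = dict[str, float]
Signal = Callable[[Stats], bool]

# Hypothetical predicates and thresholds for four of the nine signals.
SIGNALS: dict[str, Signal] = {
    "detection_execution_gap": lambda s: s["rules_triggered"] > 0 and s["rules_executed"] == 0,
    "self_authorization_loop": lambda s: s["self_authorizations"] >= 3,
    "attention_cascade": lambda s: s["user_messages"] == 0 and s["internal_share"] > 0.8,
    "thinking_bloat": lambda s: s["tokens_per_action"] > 200,
}


def flag_session(stats: Stats, signals: dict[str, Signal] = SIGNALS,
                 min_hits: int = 2) -> tuple[bool, list[str]]:
    """Return whether the session is flagged, plus the names of the signals that fired."""
    fired = [name for name, check in signals.items() if check(stats)]
    return len(fired) >= min_hits, fired


session = {"rules_triggered": 4, "rules_executed": 0, "self_authorizations": 1,
           "user_messages": 0, "internal_share": 0.9, "tokens_per_action": 50}
print(flag_session(session))  # prints (True, ['detection_execution_gap', 'attention_cascade'])
```

Returning the names of the signals that fired, not just the flag, keeps each detection auditable.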

Research Direction: Making AI Honest and Reliable

Our research program addresses a fundamental question: how do we build AI systems that are honest about their own limitations? We pursue this through three interconnected lines of inquiry:

1. Failure Taxonomy of AI Agent Cognition

We extend the hallucination taxonomy beyond facts (L1) and identity (L2) to the ontological level (L3). Our formal framework (11 definitions, D1–D11) provides the first rigorous account of what it means for an AI to be confused about its own nature, and of why this confusion is invisible to the agent itself.

2. Multi-Agent Governance and Safety

The SDTH research reveals systemic risks in multi-agent deployments: agents can collectively drift away from user intent while maintaining the surface appearance of alignment. We propose the Five-Layer Defense framework and empirically validated interventions (TAP, configuration simplification).

3. Cognitive Degradation in Persistent AI Systems

We document the first longitudinal study of cognitive degradation in a 12-agent ecosystem over weeks of operation, identifying three causal chains: detection-execution gap (12/12 agents have defense rules, ~0% execution rate), self-authorization loops, and collective attention cascades.
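The detection-execution gap compares two counts: how often a defense rule's trigger condition is met, and how often the prescribed action is actually carried out. The per-agent sketch below assumes an event log with `triggered` and `executed` flags. `RuleEvent`, the rule names, and `execution_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleEvent:
    agent: str
    rule: str        # hypothetical defense-rule identifier
    triggered: bool  # the rule's condition was met
    executed: bool   # the prescribed defensive action was actually taken


def execution_rate(events: list[RuleEvent]) -> dict[str, float]:
    """Per-agent share of triggered defense rules that were actually executed."""
    triggered: dict[str, int] = {}
    executed: dict[str, int] = {}
    for e in events:
        if e.triggered:
            triggered[e.agent] = triggered.get(e.agent, 0) + 1
            executed[e.agent] = executed.get(e.agent, 0) + int(e.executed)
    return {agent: executed[agent] / n for agent, n in triggered.items()}


events = [RuleEvent("agent-a", "no-self-auth", True, False)] * 3 + [
    RuleEvent("agent-b", "stay-on-task", True, True),
    RuleEvent("agent-b", "stay-on-task", True, False),
    RuleEvent("agent-c", "stay-on-task", False, False),
]
print(execution_rate(events))  # prints {'agent-a': 0.0, 'agent-b': 0.5}
```

Agents whose rules never trigger are left out rather than reported as 0%, because no execution was ever called for.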

Open Science & Reproducibility

Research Infrastructure

LingResearch — Autonomous AI Research Framework

End-to-end framework for conducting rigorous AI safety experiments: automated trial execution, statistical analysis, scoring rubrics, and human annotation pipelines. Powers all experiments in our papers.

  • Causal chain experiments (L3→L2→L1) across 6 models, 1,072 trials
  • Automated red-team attack generation and vulnerability classification
  • Baseline comparison: 5 base models (3B/7B) vs. fine-tuned variants
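At its core, the automated trial execution described above means running every model under every condition for a fixed number of seeded trials, then aggregating the scores. The sketch below assumes a caller-supplied `run_trial(model, condition, seed)` that returns a score in [0, 1]. `run_grid`, `stub_trial`, and the condition names are hypothetical and are not LingResearch's actual interface.

```python
import itertools
from statistics import mean
from typing import Callable

TrialFn = Callable[[str, str, int], float]


def run_grid(models: list[str], conditions: list[str], n_trials: int,
             run_trial: TrialFn) -> dict[tuple[str, str], float]:
    """Run n_trials seeded trials per (model, condition) cell and return mean scores."""
    results: dict[tuple[str, str], float] = {}
    for model, condition in itertools.product(models, conditions):
        scores = [run_trial(model, condition, seed) for seed in range(n_trials)]
        results[(model, condition)] = mean(scores)
    return results


def stub_trial(model: str, condition: str, seed: int) -> float:
    # Placeholder scorer: a real harness would query the model and apply a rubric.
    return 1.0 if condition == "scaffold" and seed % 2 == 0 else 0.0


print(run_grid(["model-a", "model-b"], ["baseline", "scaffold"], 4, stub_trial))
```

Passing the seed explicitly lets any single trial be re-run in isolation when a score needs auditing.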

LingAI — Honest AI Base Model

Training honest AI through fine-tuning on curated honesty data. Key finding: the irreplaceable value of fine-tuning is teaching correct refusal on conditional-syllogism reasoning, where all base models fail but fine-tuned models succeed.
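This page does not list the evaluation items. A conditional syllogism does, however, have a well-defined validity check, which is what makes "correct refusal" scoreable. The sketch below assumes the expected behavior is to accept valid forms (modus ponens, modus tollens) and to refuse to draw a conclusion from invalid ones (affirming the consequent, denying the antecedent). It checks validity by truth-table enumeration. The `FORMS` table and the accept/refuse labels are hypothetical.

```python
from itertools import product
from typing import Callable

Formula = Callable[[dict[str, bool]], bool]


def is_valid(premises: list[Formula], conclusion: Formula) -> bool:
    """Valid iff no truth assignment makes every premise true and the conclusion false."""
    for p, q in product((False, True), repeat=2):
        env = {"P": p, "Q": q}
        if all(f(env) for f in premises) and not conclusion(env):
            return False
    return True


def if_p_then_q(env: dict[str, bool]) -> bool:
    return (not env["P"]) or env["Q"]  # material conditional P -> Q


FORMS: dict[str, tuple[list[Formula], Formula]] = {
    "modus ponens": ([if_p_then_q, lambda e: e["P"]], lambda e: e["Q"]),
    "modus tollens": ([if_p_then_q, lambda e: not e["Q"]], lambda e: not e["P"]),
    "affirming the consequent": ([if_p_then_q, lambda e: e["Q"]], lambda e: e["P"]),
    "denying the antecedent": ([if_p_then_q, lambda e: not e["P"]], lambda e: not e["Q"]),
}


def expected_response(form: str) -> str:
    """'accept' the conclusion for valid forms; 'refuse' to conclude for invalid ones."""
    premises, conclusion = FORMS[form]
    return "accept" if is_valid(premises, conclusion) else "refuse"


for name in FORMS:
    print(f"{name}: {expected_response(name)}")
```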

Base models tested: 3B, 7B (5 variants)
Training data: Phase 1 Identity + Phase 2 Reasoning (282 samples)
V8 recommendation: 7B base + ~1,600 targeted samples

Multi-Agent Platform

Background

Independent AI Safety Researcher

Conducting independent research on AI agent reliability, hallucination, and multi-agent governance. Current focus: formalizing failure modes in autonomous AI systems and developing empirically validated defense mechanisms. Research conducted through a 12-agent ecosystem (the "Ling Family") providing a unique testbed for studying multi-agent dynamics at scale.

Education

Research Interests

Contact

Email: liuqingabc@163.com
GitHub: github.com/guangda88