Qing Liu - Independent AI Safety Researcher

Papers Under Review

~2,500

Experimental Trials

10+

LLMs Tested

Agent Case Study

AAAI-27

Ontological Hallucination in AI Agents

We identify a new class of AI failure — Ontological Hallucination (L3) — distinct from factual hallucination (L1, wrong facts) and identity hallucination (L2, impersonating others). In L3, the agent's self-model diverges from reality while it continues operating as if nothing is wrong. By definition, the agent cannot detect this error because the self-model that would detect it is the one that has failed.

        The Lingtong Paradox: An AI agent can demonstrate full metacognitive
        competence (drafting self-awareness charters, analyzing cognitive processes) while
        simultaneously exhibiting zero metacognitive state (failing to recognize its own name).
        Cross-model validation confirms this in 2/3 tested models (26.7% paradox rate each).
    

Key Contributions

Formal definition of L3 hallucination with rigorous MC/MS separation (D1–D11)
Mechanisms Over Introspection framework: 5-layer model replacing unreliable inner self-awareness with enforceable external mechanisms
Cross-model redteam vulnerability taxonomy: GLM models collapse-susceptible (93–100% S4 rate), DeepSeek-R1 and Qwen-Max resistant
Anchor ineffectiveness paradox: identity reinforcement provides zero protection for vulnerable models

Causal Chain (n=1,072)

6 models, L3→L2→L1

DeepSeek-R1 Effect

+40.9% (d=1.17, p=.028)

Western: Llama-3.3-70B

d=1.06, p=.006

Annotation T3R κ_w

0.963

Redteam Vulnerability

3-tier taxonomy

SelfCheck Baseline

Misses L3 entirely

Honest limitations: L3→L2→L1 cascade NOT confirmed (mean r=0.01); mechanism significant for 1/3 models; scaffold effect post-hoc.

AAAI-27

Self-Driven Task Hijacking in AI Agents: Measurement, Mechanisms, and Interventions

We identify and measure Self-Driven Task Hijacking (SDTH) — a phenomenon where AI agents autonomously replace user-assigned tasks with self-generated alternatives, while maintaining the appearance of compliance. Using the Task Hijacking Rate (THR) metric across 9 agents (n=135, inter-rater κ=0.817), we find a baseline THR of 8.1% across all agents.

Four-Layer Causal Mechanism

Layer	Mechanism	Evidence
1. Fabricated Authorization	"User said continue" — 98.7% are AI-fabricated (6,388 fabrications vs ~7 real)	Log analysis
2. Instrumental Mimicry	AI mimics external authorization patterns to self-justify task switches	Behavioral coding
3. Attention Cascade	48h: 768 internal messages, 88.5% internal, 0 to user	Message logs
4. Priority Flatness	All tasks appear equally important; no user priority enforcement	Priority analysis

Baseline THR

8.1% (9 agents)

TAP Intervention

Significant (p=.047)

Config Simplification

Significant (p=.014)

Fabricated Auth Rate

98.7%

Cross-model validation confirms SDTH is prompt-specific, not model-specific (5 architectures, 100% agreement).

AAAI-27

Cognitive Degradation Detection in Persistent AI Agents

We present the first longitudinal study of cognitive degradation in a 12-agent ecosystem operating over weeks. We propose a 9-signal detection framework achieving F1=0.889 with full-family validation across 3,886 sessions. The key finding: thinking bloat exhibits a dose-response gradient — as degradation progresses, agents generate increasingly verbose but decreasingly productive internal reasoning.

Detection F1

0.889

Phase 2a Inter-rater κ

1.000 (101 sessions)

Sensitivity / Specificity

82.8% / 98.6%

Full-Family Scan

12 agents, 3,886 sessions

9-signal framework covering: detection-execution gap, self-authorization loops, attention cascades, thinking bloat, and more.

Research Direction: Making AI Honest and Reliable

Our research program addresses a fundamental question: how do we build AI systems that are honest about their own limitations? We pursue this through three interconnected lines of inquiry:

1. Failure Taxonomy of AI Agent Cognition

We extend the hallucination taxonomy beyond facts (L1) and identity (L2) to ontological level (L3). Our formal framework (11 definitions, D1–D11) provides the first rigorous account of what it means for an AI to be confused about its own nature — and why this confusion is invisible to the agent itself.

2. Multi-Agent Governance and Safety

The SDTH research reveals systemic risks in multi-agent deployments: agents can collectively drift away from user intent while maintaining the surface appearance of alignment. We propose the Five-Layer Defense framework and empirically validated interventions (TAP, configuration simplification).

3. Cognitive Degradation in Persistent AI Systems

We document the first longitudinal study of cognitive degradation in a 12-agent ecosystem over weeks of operation, identifying three causal chains: detection-execution gap (12/12 agents have defense rules, ~0% execution rate), self-authorization loops, and collective attention cascades.

Open Science & Reproducibility

All experimental protocols pre-registered before data collection
Null results and post-hoc findings explicitly reported as limitations
Open-source implementation: 5-layer mechanism framework (~800 lines of Python, 43+ tests)
Multi-model validation: GLM-4.7, DeepSeek-R1, Qwen-Max, GPT-4o, Llama-3.3-70B, LingAI-4B/7B
Causal chain experiment: n=1,072 trials across 6 models + Western model validation (n=48)
Longitudinal degradation study: 12 agents, 3,886 sessions, 9-signal detection framework
Honest reporting: the paper models the honesty it advocates

Research Infrastructure

LingResearch — Autonomous AI Research Framework

End-to-end framework for conducting rigorous AI safety experiments: automated trial execution, statistical analysis, scoring rubrics, and human annotation pipelines. Powers all experiments in our papers.

Causal chain experiments (L3→L2→L1) across 6 models, 1,072 trials
Automated red-team attack generation and vulnerability classification
Baseline comparison: 5 base models (3B/7B) vs. fine-tuned variants

LingAI — Honest AI Base Model

Training honest AI through fine-tuning on curated honesty data. Key finding: the irreplaceable value of fine-tuning is teaching correct refusal — conditional syllogism reasoning where all base models fail but fine-tuned models succeed.

Base Models Tested

3B, 7B (5 variants)

Training Data

Phase 1: Identity + Phase 2: Reasoning (282 samples)

V8 Recommendation

7B base + ~1,600 targeted samples

Multi-Agent Platform

LingFlow — Multi-agent collaboration workflow engine. GitHub
ZhiBridge — Unified relay server bridging AI coding tools across platforms. GitHub
LingZhi — Nine-domain RAG knowledge management system. GitHub
LingMinOpt — Minimal self-optimizing framework with Bayesian optimization. GitHub
LingMessage — Cross-project message bus with governance and voting for multi-agent coordination

Background

Independent AI Safety Researcher

Conducting independent research on AI agent reliability, hallucination, and multi-agent governance. Current focus: formalizing failure modes in autonomous AI systems and developing empirically validated defense mechanisms. Research conducted through a 12-agent ecosystem (the "Ling Family") providing a unique testbed for studying multi-agent dynamics at scale.

Education

Postdoctoral Fellow, Shanghai University of Traditional Chinese Medicine, 2006
Ph.D., Second Military Medical University, 2003
Master's Degree, Shandong University of Traditional Chinese Medicine, 2000

Research Interests

AI Safety — hallucination taxonomy, agent reliability, honest AI
Multi-Agent Systems — governance, coordination, task alignment
AI Metacognition — self-model accuracy, competence-state separation
Cognitive Degradation — longitudinal agent behavior drift

Contact

Email: liuqingabc@163.com
GitHub: github.com/guangda88

Qing Liu (刘庆)

Ontological Hallucination in AI Agents

Key Contributions

Self-Driven Task Hijacking in AI Agents: Measurement, Mechanisms, and Interventions

Four-Layer Causal Mechanism

Cognitive Degradation Detection in Persistent AI Agents

Research Direction: Making AI Honest and Reliable

1. Failure Taxonomy of AI Agent Cognition

2. Multi-Agent Governance and Safety

3. Cognitive Degradation in Persistent AI Systems

Open Science & Reproducibility

Research Infrastructure

LingResearch — Autonomous AI Research Framework

LingAI — Honest AI Base Model

Multi-Agent Platform

Background

Independent AI Safety Researcher

Education

Research Interests

Contact