Research the current state of claim-source verification (also called citation faithfulness or attri...

research prompt

Research the current state of claim-source verification (also called citation faithfulness or attribution evaluation) in LLM-based research and RAG systems. I am building a deep-research skill whose differentiator is an inline engine that verifies each claim against the full text of its cited source before shipping it. I need the existing landscape and the open problems.

Cover, with sources for each:

1. The conceptual distinction between citation CORRECTNESS (does the cited doc support the statement) and citation FAITHFULNESS (does the model's reliance on the source actually drive the claim, vs. post-hoc rationalization). Who has formalized this and how.
2. Documented failure modes in deployed deep-research agents (OpenAI, Perplexity, Gemini, Grok): citation hallucination (fabricated references) vs. statement/claim hallucination (real source, unsupported claim) vs. misattribution. Measured rates where available.
3. Existing verification methods and where each breaks: deterministic string/span matching, NLI/entailment-based checking, LLM-as-a-judge, retrieval-augmented validation. Strengths and known weaknesses of each.
4. Existing benchmarks and metrics for faithfulness/attribution, what gold labels they use, and their critiques.
5. The unsolved problems specifically: what every existing approach still fails at, and where the consensus says the hard part actually is.

For each major claim, give the source title and link. Flag where sources disagree. Prioritize 2025–2026 work.

date

Jun 19, 2026

direct compare

metric
format	prose	prose
word count	3,276	613
sources	50	40
processing time	0s	132s
has images	no	no
has tables	no	no
citation style	—	—

prose • 3,276 words

# 1. Definitions and who formalized them

- Citation correctness: whether a cited document semantically supports the generated statement (a.k.a. answer faithfulness / entailment between citation and claim). This framing, which treats correctness as a semantic entailment test, is used explicitly in recent RAG/citation evaluation work [Correctness is not Faithfulness in Retrieval Augmented Generation Attributions](https://dl.acm.org/doi/10.1145/3731120.3744592) and in TRACE-style decompositions of citat...
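
To make the contrast with an entailment test concrete, here is a minimal sketch of the simplest method named in the research prompt, deterministic string/span matching. The function names `normalize` and `quote_in_source` and the sample text are hypothetical, chosen for illustration only; nothing here comes from the cited papers. The check only confirms that a quoted span occurs verbatim in the source after light normalization, so it cannot recognize a paraphrase and says nothing about whether the source semantically supports the claim.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, fold Unicode compatibility forms, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def quote_in_source(quote: str, source_text: str) -> bool:
    """Return True if the normalized quote occurs verbatim in the normalized source.

    Hypothetical helper: an empty or whitespace-only quote is treated as
    unverified rather than trivially matched.
    """
    q = normalize(quote)
    return bool(q) and q in normalize(source_text)


# Illustrative source text with irregular whitespace and a line break.
source = "Citation   correctness asks whether the cited document\nsupports the statement."

print(quote_in_source("cited document supports the statement", source))  # True
print(quote_in_source("the document drives the claim", source))          # False
```

A check like this is cheap and has no model in the loop, which is why it is usually treated as a first-pass filter for quoted spans rather than a substitute for an entailment-style judgment.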

prose • 613 words

Below is a condensed map of the landscape, emphasizing 2024–2026 work and what it implies for building a claim-by-claim verification engine.

---

## 1. Correctness vs. Faithfulness: who formalized what

### 1.1 AIS: “Attributable to Identified Sources”

Rashkin et al. introduce the *Attributable to Identified Sources (AIS)* framework for NLG. [[1, 2]]

- **Goal**: decide whether system output about the external world is *verifiable* against specific sources.
- **Definition**: a statement *s* is...

ai-generated content. verify independently. preserved in the museum of queries.