Problem

An LLM invents nonexistent facts when answering customer questions. How can hallucinations be reduced?

Solution

Approach

A hallucination is a generated fact that is supported by neither the context nor the training data. Strategies:

Grounding via RAG: put the retrieved context into the prompt together with the instruction "answer only from the context".
Citation requirements: the model cites a source for every statement.
Self-consistency: generate N answers and take the majority.
Chain-of-thought + verification: the model checks its own answer.
Confidence calibration: the model answers "I don't know" when its confidence is low.
Constrained decoding: restrict the output to the domain (JSON schema, regex).
Fine-tuning for refusal: the model learns to decline when it is not confident enough.

Implementation: RAG with citations

PROMPT_TEMPLATE = """Answer the question using the fragments below.
IMPORTANT:
- Use only information from the fragments.
- If the answer is not there, write "Not found in the documents".
- Mark every statement with its source number: [1], [2].

Fragments:
{context}

Question: {question}

Answer:"""
 
def answer_with_citations(question, retriever, llm):
    # Retrieve the top-5 fragments and number them so the model can cite them.
    hits = retriever.retrieve(question, k=5)
    context = "\n\n".join(f"[{i+1}] {h.text}" for i, h in enumerate(hits))
    answer = llm.generate(PROMPT_TEMPLATE.format(context=context, question=question))
    # Verification step (not implemented here): check that each [N] in the
    # answer is actually supported by the content of hits[N-1].
    return answer, hits
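
The verification step mentioned in the comment can start with a purely structural check before any semantic one. The sketch below is an assumption, not part of the original pipeline: `check_citations` is a hypothetical helper that flags citation markers pointing outside the retrieved hits and sentences that carry no marker at all. The sentence splitting is a rough regex heuristic and expects the marker to appear before the sentence-ending punctuation.

```python
import re

def check_citations(answer: str, num_hits: int) -> dict:
    """Structural citation check: report [N] markers that do not map to a
    retrieved hit, and sentences that contain no [N] marker."""
    # Rough split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    cited = [int(n) for n in re.findall(r"\[(\d+)\]", answer)]
    # Valid markers are 1..num_hits, matching the numbering used in the prompt.
    out_of_range = sorted({n for n in cited if not 1 <= n <= num_hits})
    uncited = [s for s in sentences if not re.search(r"\[\d+\]", s)]
    return {"out_of_range": out_of_range, "uncited_sentences": uncited}
```

A non-empty `out_of_range` list means the model cited a source that was never shown to it. This check says nothing about whether a cited fragment really supports the sentence; that still needs an entailment-style check.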

Self-consistency

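from collections import Counter

def self_consistency(prompt, llm, n=5):
    # Sample n answers at a non-zero temperature so that they can differ.
    answers = [llm.generate(prompt, temperature=0.8) for _ in range(n)]
    # Majority vote by exact string match, so it is meaningful only for
    # discrete answers; strip whitespace so it does not split the vote.
    return Counter(a.strip() for a in answers).most_common(1)[0][0]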
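
Majority voting can also double as a cheap confidence signal: if the samples disagree, the model is probably guessing. The sketch below assumes the answers have already been sampled. `vote_or_abstain`, the `min_agreement` threshold of 0.6 and the fallback text are all illustrative choices, not values from the original.

```python
from collections import Counter

def vote_or_abstain(answers, min_agreement=0.6, fallback="I don't know"):
    """Majority vote over normalized answers; abstain when the winning
    answer's share of the samples is below min_agreement."""
    # Normalize case and whitespace so trivial variants count as one answer.
    normalized = [a.strip().lower() for a in answers]
    if not normalized:
        return fallback, 0.0
    winner, count = Counter(normalized).most_common(1)[0]
    agreement = count / len(normalized)
    return (winner if agreement >= min_agreement else fallback), agreement
```

The returned agreement ratio can be logged alongside the answer, which makes it possible to tune the threshold on real traffic instead of guessing it.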

Chain-of-verification (CoVe)

Generate an initial answer.
The model generates verification questions about its own answer.
The model answers each verification question.
The model revises the original answer based on any contradictions found.
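
The four steps above can be sketched as a single function. This is a minimal illustration under assumptions: `llm` is a hypothetical callable that maps a prompt string to a completion string, the prompt wording is invented for the example, and verification questions are assumed to come back one per line. Each verification question is answered without showing the draft, a design choice meant to keep the draft's mistakes from being repeated.

```python
from typing import Callable

def chain_of_verification(question: str, llm: Callable[[str], str]) -> str:
    # Step 1: draft an initial answer.
    draft = llm(f"Answer the question.\nQuestion: {question}")
    # Step 2: ask for verification questions, expected one per line.
    raw = llm(f"List fact-checking questions, one per line, for this answer:\n{draft}")
    checks = [q.strip() for q in raw.splitlines() if q.strip()]
    # Step 3: answer each verification question on its own (the draft is not shown).
    findings = [(q, llm(f"Answer concisely.\nQuestion: {q}")) for q in checks]
    # Step 4: revise the draft in light of the verification answers.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in findings)
    return llm(
        "Revise the draft so that it agrees with the verification answers.\n"
        f"Question: {question}\nDraft: {draft}\nVerification:\n{evidence}\nRevised answer:"
    )
```

With k verification questions this costs k + 3 model calls per user question, so it is usually reserved for answers where a factual error is expensive.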

Metrics

Hallucination rate: share of answers containing unsupported claims (manual annotation).
Faithfulness (RAG): measured with NLI, checking whether every claim is entailed by the context.
Coverage: share of questions that receive a correct answer.
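
Given manually annotated answers, the hallucination rate and coverage reduce to simple ratios. The sketch below assumes a hypothetical `Annotated` record with two boolean labels per answer; the field names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Annotated:
    has_unsupported_claim: bool  # manual label: answer contains an unsupported claim
    is_correct: bool             # manual label: question was answered correctly

def summarize(records: List[Annotated]) -> Dict[str, float]:
    """Compute hallucination rate and coverage over annotated answers."""
    n = len(records)
    if n == 0:
        return {"hallucination_rate": 0.0, "coverage": 0.0}
    return {
        "hallucination_rate": sum(r.has_unsupported_claim for r in records) / n,
        "coverage": sum(r.is_correct for r in records) / n,
    }
```

Tracking both numbers together matters: a model that refuses everything has a hallucination rate of zero and a coverage of zero.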

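# Faithfulness via NLI
from transformers import pipeline
nli = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-large")

def faithful(answer, context):
    # Premise = context, hypothesis = answer. Pass them as a pair so that the
    # tokenizer inserts the separator itself instead of a literal "[SEP]".
    result = nli({"text": context, "text_pair": answer})
    if isinstance(result, list):  # some transformers versions wrap the result in a list
        result = result[0]
    return result["label"] == "entailment"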
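
The check above treats the whole answer as a single hypothesis, while the metric is defined per claim. A possible refinement, sketched below under assumptions, splits the answer into sentences and tests each one separately. `unsupported_claims` and `faithfulness_score` are hypothetical helpers, `entails(premise, hypothesis)` is any boolean entailment function (for example a wrapper around the NLI model), and the regex sentence split is only a rough stand-in for real claim extraction.

```python
import re
from typing import Callable, List

def unsupported_claims(answer: str, context: str,
                       entails: Callable[[str, str], bool]) -> List[str]:
    """Return the sentences of the answer that the context does not entail."""
    # Rough claim extraction: one claim per sentence.
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [c for c in claims if not entails(context, c)]

def faithfulness_score(answer: str, context: str,
                       entails: Callable[[str, str], bool]) -> float:
    """Share of sentences in the answer that are entailed by the context."""
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not claims:
        return 1.0  # nothing was asserted, so nothing is unsupported
    return 1.0 - len(unsupported_claims(answer, context, entails)) / len(claims)
```

Returning the offending sentences, not only a score, makes the result usable at serving time: unsupported sentences can be dropped or sent for regeneration.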

Industry tools

Guardrails AI / NeMo Guardrails: declarative rules.
Ragas / TruLens: RAG quality metrics.
LangSmith: tracing for debugging.

Pitfalls

"Just RAG" does not solve hallucinations: the model can still extrapolate beyond the context. A strict instruction plus citations is required.
Bad retrieval → plausible hallucination: if retrieval returns irrelevant context, the LLM will still try to "find" an answer in it and will make one up.
Temperature=0 does not eliminate hallucinations: it makes sampling deterministic, which is unrelated to factuality.
Familiar hallucination types (URLs, quotes, names): models readily fabricate these.
Evaluation on a benchmark != production: real queries are structured differently.
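
One way to blunt the bad-retrieval pitfall is to refuse before generation when nothing relevant was retrieved. The sketch below is an assumption throughout: it presumes each hit exposes a relevance `score` where higher means more relevant, and `Hit`, `answer_or_refuse`, the 0.5 threshold and the refusal text are all illustrative names and values.

```python
from dataclasses import dataclass
from typing import Callable, List

REFUSAL = "Not found in the documents"  # assumed refusal text

@dataclass
class Hit:
    text: str
    score: float  # assumed: higher means more relevant

def answer_or_refuse(question: str, hits: List[Hit],
                     generate: Callable[[str, List[Hit]], str],
                     min_score: float = 0.5) -> str:
    """Call the generator only when at least one hit clears the threshold."""
    relevant = [h for h in hits if h.score >= min_score]
    if not relevant:
        # Nothing relevant was retrieved: refuse instead of letting the
        # model improvise an answer from unrelated text.
        return REFUSAL
    return generate(question, relevant)
```

The threshold depends on the retriever's score scale and would need tuning on real queries.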

Reference answer

Strategies: RAG grounding with a strict instruction and citations, self-consistency over N samples, chain-of-verification, refusal on low confidence, constrained decoding. Metrics: faithfulness via NLI (entailment) and hallucination rate (manual annotation). Tools: Guardrails AI, Ragas, TruLens.

ML scenario: reducing LLM hallucinations