RAG Implementation Skill for Codex / Claude Code

Builds retrieval-augmented generation (RAG) pipelines that ground model responses in your own documents rather than in generic training knowledge. A RAG implementation covers document ingestion, semantic chunking, embedding, vector storage, hybrid search, reranking, and answer synthesis, so assistants answer from your data and cite their sources.

Category Coding

Platform Codex / Claude Code

Published 2026-04-18

rag, retrieval, embeddings

Use cases

Building an internal knowledge base Q&A system where employees ask questions and get answers cited from company documentation
Creating a documentation assistant that answers questions about an API using the actual API docs rather than the model's training memory
Implementing enterprise search that goes beyond keyword matching to understand the semantic intent of queries
Building a product support bot that answers customer questions using the specific product documentation rather than generic knowledge
Creating a research assistant that synthesizes findings from a corpus of academic papers with citations

Key features

Ingest source documents, apply semantic chunking strategies appropriate to the document type (paragraph-level for prose, section-level for structured docs), and preserve metadata for citation
Generate embeddings for each chunk using a model suited to your data type and language, and index them in a vector store with appropriate filtering capabilities
At query time, retrieve the top-k relevant chunks using vector similarity, optionally blending with keyword search (BM25) for recall
Apply a reranking step to reorder retrieved chunks by actual relevance to the query, not just embedding similarity
Synthesize the answer from the reranked context with explicit citations to source documents, instructing the model to acknowledge when the context does not contain the answer
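
The retrieval step above blends vector and keyword results but does not say how. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as this skill's actual method. The function name, the chunk IDs, and the constant k=60 are illustrative; the two input lists stand in for results from whatever vector index and BM25 index you use.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by chunk ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [chunk_id for chunk_id, _ in fused[:top_n]]


# Hypothetical chunk IDs, as they might come back from each index.
vector_hits = ["c3", "c1", "c7", "c2"]
keyword_hits = ["c1", "c9", "c3", "c4"]

fused_ids = reciprocal_rank_fusion([vector_hits, keyword_hits], top_n=3)
print(fused_ids)  # ['c1', 'c3', 'c9']
```

RRF uses only rank positions, so it avoids having to normalize cosine similarities against BM25 scores. The fused list would then go to the reranking step.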

When to Use This Skill

When building a Q&A system that needs to answer from your specific documents rather than generic knowledge
When you have a large corpus of domain-specific content that general-purpose models do not handle well
When you need to reduce hallucination by grounding model responses in verifiable source documents

Expected Output

A complete RAG pipeline with document ingestion, embedding, vector indexing, retrieval, reranking, and grounded answer synthesis with citations.
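
As a rough illustration of grounded answer synthesis with citations, the sketch below assembles a prompt from reranked chunks. The `Chunk` fields, the `build_grounded_prompt` helper, and the sample data are all hypothetical, and the call to the model itself is left out because it depends on your provider.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str   # file name or URL kept from ingestion metadata
    section: str  # heading the chunk was taken from


def build_grounded_prompt(question, chunks):
    """Number each chunk so the model can cite it as [1], [2], ..."""
    lines = [
        "Answer the question using only the numbered context below.",
        "Cite the supporting passage after each claim, like [1].",
        "If the context does not contain the answer, say so instead of guessing.",
        "",
        "Context:",
    ]
    for i, chunk in enumerate(chunks, start=1):
        lines.append(f"[{i}] ({chunk.source}, {chunk.section}) {chunk.text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


# Hypothetical reranked chunks for illustration only.
reranked = [
    Chunk("New hires accrue vacation from their first day.", "handbook.md", "Leave policy"),
    Chunk("Unused vacation days expire at year end.", "handbook.md", "Carry-over"),
]

prompt = build_grounded_prompt("How many vacation days do new hires get?", reranked)
print(prompt)
```

Keeping the source and section next to each numbered passage lets you map a citation such as [2] back to the original document when you display the answer.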

Frequently Asked Questions

What chunk size should I use for RAG? Smaller chunks (256-512 tokens) preserve semantic coherence but may miss broader context. Larger chunks (512-1024 tokens) capture more context but dilute relevance. Start with 512 tokens and adjust based on your retrieval evaluation: measure recall and precision on your own query set.
How do I evaluate RAG quality beyond gut feeling? Use retrieval metrics (recall@k, MRR) to measure whether the right documents are retrieved, and generation metrics (answer accuracy, citation accuracy) to measure whether the model uses the retrieved context correctly. Build an eval set that pairs each query with its ground-truth answer and sources, and make it representative of production queries.
What is the difference between vector search and hybrid search? Vector search finds semantically similar chunks using embedding distance. Keyword search (BM25) ranks chunks by exact term matches. Hybrid search combines both and typically outperforms either alone, because semantic similarity and keyword matching capture different aspects of relevance.
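
The retrieval metrics named in the evaluation answer above can be computed in a few lines. This is a minimal sketch: the function names and the two-query eval set are made up for illustration, and each entry pairs the chunk IDs a retriever returned (in rank order) with the IDs labeled relevant.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant chunks that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(eval_set, k):
    """Average both metrics over (retrieved, relevant) pairs."""
    n = len(eval_set)
    mean_recall = sum(recall_at_k(r, rel, k) for r, rel in eval_set) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in eval_set) / n
    return mean_recall, mrr


# Hypothetical eval set: (retrieved chunk IDs in rank order, relevant chunk IDs).
eval_set = [
    (["a", "b", "c"], {"b"}),       # relevant chunk found at rank 2
    (["d", "e", "f"], {"f", "x"}),  # one of two relevant chunks, at rank 3
]

mean_recall, mrr = evaluate(eval_set, k=2)
print(f"recall@2={mean_recall:.3f}  MRR={mrr:.3f}")  # recall@2=0.500  MRR=0.417
```

Running the same eval set after each change to chunk size, embedding model, or fusion weights gives a concrete way to compare configurations.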

3 Indexed items

RAG pipeline construction

Research

Builds production-ready retrieval-augmented generation pipelines with deliberate chunking strategies, embedding model selection, vector store configuration, hybrid search blending, and reranking so agents answer from your documents with reduced hallucination and cited sources. This skill focuses on the engineering decisions that separate a working prototype from a production-quality RAG system.

Executing implementation plans

Coding

Executes a pre-written implementation plan in disciplined order, stopping at defined checkpoints to verify assumptions before moving forward. This skill prevents the common pattern of diverging from the plan silently when reality proves it wrong, and it creates natural opportunities to course-correct before small errors compound into large rework.

Contract testing