ResearchBot: Internal Knowledge Base RAG Agent
76%
Search Time Saved
Vs. manual cross-tool searching
92%
Answer Accuracy
Validated against known-correct answers
100%
Citation Coverage
Of claims backed by a linked source
240+
Weekly Active Users
Across the organization
Illustrative Project
ResearchBot is an illustrative example demonstrating a RAG-based internal knowledge agent pattern, not a completed client engagement.
Overview
ResearchBot lets employees ask questions in plain language and get an answer synthesized from the organization's actual internal knowledge: documentation, past support tickets, and Slack discussion history. Each answer carries citations pointing back to the original source, so employees no longer have to search manually across four or five disconnected internal tools.
The Challenge
Internal knowledge was genuinely scattered: some in a wiki, some in resolved support tickets that contained the actual answer to a recurring question, some only ever discussed in a Slack thread that nobody documented elsewhere. The system needed to search across fundamentally different content types and produce a coherent, trustworthy answer — not just a list of possibly-relevant links.
Architecture & Technical Decisions
Unified Embedding Index Across Source Types
Documents, ticket resolutions, and Slack threads were each processed into embeddings and stored in a unified vector index, with metadata preserving the source type and a link back to the original. This let a single query search across all sources simultaneously rather than requiring separate searches per tool.
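As a rough illustration of the unified index idea, the sketch below stores chunks from all three source types in one in-memory list and ranks them with a single cosine-similarity search. Everything named here is an assumption made for the example: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, `UnifiedIndex` stands in for a real vector store, and the URLs are placeholders.

```python
import math
import re
import zlib
from dataclasses import dataclass

DIM = 64  # toy embedding size (assumption, not from the project)


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class IndexedChunk:
    text: str
    source_type: str  # "wiki", "ticket", or "slack"
    url: str          # link back to the original item
    vector: list[float]


class UnifiedIndex:
    """One index for every source type; metadata travels with each chunk."""

    def __init__(self) -> None:
        self.chunks: list[IndexedChunk] = []

    def add(self, text: str, source_type: str, url: str) -> None:
        self.chunks.append(IndexedChunk(text, source_type, url, toy_embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, IndexedChunk]]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, c.vector)), c) for c in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = UnifiedIndex()
index.add("To reset your VPN password, open the self-service portal.",
          "wiki", "https://wiki.example/vpn")
index.add("Resolved: customer export failed because of an expired API token.",
          "ticket", "https://tickets.example/1042")
index.add("alice: is staging down? bob: yes, deploy rollback in progress",
          "slack", "https://slack.example/t/77")

# A single query ranks wiki, ticket, and Slack chunks together.
for score, chunk in index.search("how do I reset my VPN password", k=2):
    print(f"{score:.2f} [{chunk.source_type}] {chunk.url}")
```

Because the source type and link are stored next to each vector, the caller can show where a hit came from without a second lookup.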
Chunking Strategy Tailored Per Source Type
Wiki documents were chunked by section to preserve coherent context. Slack threads were chunked by conversation rather than individual message, since a single message rarely contains a complete answer on its own. Getting this chunking strategy right per source type made a measurable difference in retrieval relevance compared to a one-size-fits-all approach.
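The two chunking rules described above might look roughly like this. The sketch assumes wiki pages use Markdown-style `#` headings and that each Slack message is a dict with hypothetical `thread_id`, `ts`, `user`, and `text` keys; ticket chunking is left out because the write-up does not describe it.

```python
import re
from collections import defaultdict


def chunk_wiki_by_section(page: str) -> list[str]:
    """Split a wiki page at heading lines so each chunk is one whole section.

    Assumes Markdown-style headings ("# ", "## ", ...).
    """
    parts = re.split(r"(?m)^(?=#{1,6} )", page)
    return [p.strip() for p in parts if p.strip()]


def chunk_slack_by_thread(messages: list[dict]) -> list[str]:
    """Group messages by thread so a question stays with its replies."""
    threads: dict[str, list[dict]] = defaultdict(list)
    for msg in messages:
        threads[msg["thread_id"]].append(msg)
    chunks = []
    for thread_msgs in threads.values():
        ordered = sorted(thread_msgs, key=lambda m: m["ts"])
        chunks.append("\n".join(f'{m["user"]}: {m["text"]}' for m in ordered))
    return chunks


# Dispatch table: one chunker per source type instead of one generic splitter.
CHUNKERS = {"wiki": chunk_wiki_by_section, "slack": chunk_slack_by_thread}

page = "# VPN\nIntro text\n## Reset\nStep one\nStep two\n## Troubleshooting\nCheck logs\n"
messages = [
    {"thread_id": "t1", "ts": 1.0, "user": "alice", "text": "How do I rotate the API key?"},
    {"thread_id": "t2", "ts": 2.0, "user": "carol", "text": "Lunch?"},
    {"thread_id": "t1", "ts": 3.0, "user": "bob", "text": "Settings > Keys > Rotate."},
]

wiki_chunks = CHUNKERS["wiki"](page)
slack_chunks = CHUNKERS["slack"](messages)
print(len(wiki_chunks), "wiki chunks;", len(slack_chunks), "Slack chunks")
```

Keeping the question and its reply in one Slack chunk is the point of the thread-level rule: neither message is a useful retrieval target on its own.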
Mandatory Citation Enforcement
The generation prompt requires every factual claim in the answer to be tied to a specific retrieved source, and the system displays the cited sources alongside the answer. If the retrieval step doesn't find anything sufficiently relevant, the system explicitly says so rather than generating an answer from general knowledge that might not reflect the company's actual current practice.
- Unified vector index across docs, tickets, and Slack history with source-type metadata
- Per-source-type chunking strategy rather than one generic approach
- Mandatory citation requirement enforced in the generation prompt
- Explicit 'not found' response when retrieval confidence is low, rather than guessing
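The citation requirement and the explicit "not found" path could be sketched as follows. This is a minimal illustration, not the project's actual prompt or code: `MIN_SCORE`, the prompt wording, and the `generate` callable (a stand-in for whatever LLM client is used) are all assumptions. The post-generation check in `uncited_sentences` goes beyond the prompt-level enforcement described above and is shown only as one possible safeguard; its sentence splitting is deliberately naive.

```python
import re
from typing import Callable

MIN_SCORE = 0.35  # hypothetical relevance threshold; would need tuning on real data
NOT_FOUND = "I couldn't find this in the internal knowledge base."


def build_prompt(question: str, sources: list[dict]) -> str:
    """Number each retrieved source and require a citation on every claim."""
    numbered = "\n".join(
        f"[{i}] ({s['source_type']}) {s['text']}" for i, s in enumerate(sources, 1)
    )
    return (
        "Answer using ONLY the numbered sources below. Every factual sentence "
        "must cite its source number before the final period, e.g. '... [1].' "
        f"If the sources do not answer the question, reply exactly: {NOT_FOUND}\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


def uncited_sentences(answer: str, n_sources: int) -> list[str]:
    """Return sentences with no citation or with an out-of-range source number."""
    bad = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = [int(n) for n in re.findall(r"\[(\d+)\]", sentence)]
        if not cited or any(n < 1 or n > n_sources for n in cited):
            bad.append(sentence)
    return bad


def answer_question(question: str, retrieved: list[tuple[float, dict]],
                    generate: Callable[[str], str]) -> dict:
    """retrieved: (score, source) pairs; generate: maps a prompt to answer text."""
    relevant = [src for score, src in retrieved if score >= MIN_SCORE]
    if not relevant:
        # Nothing relevant enough: say so instead of answering from general knowledge.
        return {"answer": NOT_FOUND, "citations": []}
    answer = generate(build_prompt(question, relevant))
    if answer.strip() == NOT_FOUND or uncited_sentences(answer, len(relevant)):
        return {"answer": NOT_FOUND, "citations": []}
    used = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return {"answer": answer, "citations": [relevant[n - 1]["url"] for n in used]}


def fake_llm(prompt: str) -> str:
    """Stub generator so the sketch runs without a model."""
    return "Use the self-service portal [1]."


retrieved = [
    (0.82, {"text": "Reset the VPN password in the self-service portal.",
            "source_type": "wiki", "url": "https://wiki.example/vpn"}),
    (0.10, {"text": "Lunch?", "source_type": "slack", "url": "https://slack.example/t/2"}),
]
print(answer_question("How do I reset my VPN password?", retrieved, fake_llm))
```

Returning the cited URLs alongside the answer text is what lets the interface display the sources next to the answer.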
Results
- 76% reduction in time employees spent searching across multiple internal tools for an answer
- 92% answer accuracy validated against a set of questions with known-correct answers
- 100% of factual claims in generated answers were backed by a linked, verifiable source
- 240+ weekly active users across the organization within the first quarter after launch
What I Learned
Retrieval quality, not generation quality, determined almost all of the system's perceived usefulness. The work that mattered most was getting the chunking strategy right for each content type and being disciplined about citation. An answer with no clear source, even if factually correct, earned far less trust from users than a shorter answer with a clear, clickable citation attached.