Mujtaba Farooq

ResearchBot: Internal Knowledge Base RAG Agent

SaaS
Next.js, Node.js, OpenAI API, pgvector, PostgreSQL

  • 76% Search Time Saved (vs. manual cross-tool searching)
  • 92% Answer Accuracy (validated against known-correct answers)
  • 100% Citation Coverage (of claims backed by a linked source)
  • 240+ Weekly Active Users (across the organization)

Illustrative Project

ResearchBot is an illustrative example demonstrating a RAG-based internal knowledge agent pattern, not a completed client engagement.

Overview

ResearchBot lets employees ask questions in plain language and get an answer synthesized from the organization's own internal knowledge: documentation, past support tickets, and Slack discussion history. Each answer cites its original sources, so employees no longer have to search manually across four or five disconnected internal tools.

The Challenge

Internal knowledge was genuinely scattered. Some lived in a wiki, some in resolved support tickets that contained the actual answer to a recurring question, and some had only ever been discussed in a Slack thread that nobody documented elsewhere. The system needed to search across fundamentally different content types and produce a coherent, trustworthy answer, not just a list of possibly relevant links.

Architecture & Technical Decisions

Unified Embedding Index Across Source Types

Documents, ticket resolutions, and Slack threads were each processed into embeddings and stored in a unified vector index, with metadata preserving the source type and a link back to the original. This let a single query search across all sources simultaneously rather than requiring separate searches per tool.
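As a rough illustration of the unified index, the sketch below keeps chunks from every source type in one collection and ranks them together by cosine similarity. The type names, field names, and the in-memory scan are assumptions made for the example; the stack above lists pgvector, where the same ranking would typically be expressed with its cosine distance operator (`<=>`) in SQL instead.

```typescript
// Hypothetical schema: the write-up names the source types and the metadata
// (source type plus a link back), but not the exact fields.
type SourceType = "wiki" | "ticket" | "slack";

interface IndexedChunk {
  id: string;
  sourceType: SourceType; // preserved so results can be labeled by origin
  sourceUrl: string;      // link back to the original document, ticket, or thread
  text: string;
  embedding: number[];
}

interface SearchHit {
  chunk: IndexedChunk;
  score: number; // cosine similarity, higher is more relevant
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // treat a zero vector as matching nothing
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// One query ranks wiki, ticket, and Slack chunks in a single pass,
// so no per-tool search is needed.
function searchIndex(
  index: IndexedChunk[],
  queryEmbedding: number[],
  topK: number
): SearchHit[] {
  return index
    .map((chunk) => ({
      chunk,
      score: cosineSimilarity(chunk.embedding, queryEmbedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Because every hit carries `sourceType` and `sourceUrl`, the later citation step can link each retrieved chunk back to where it came from without a second lookup.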

Chunking Strategy Tailored Per Source Type

Wiki documents were chunked by section to preserve coherent context. Slack threads were chunked by conversation rather than individual message, since a single message rarely contains a complete answer on its own. Getting this chunking strategy right per source type made a measurable difference in retrieval relevance compared to a one-size-fits-all approach.
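The two chunking rules described above can be sketched as follows. This is a minimal example under stated assumptions: wiki pages are assumed to be Markdown with `#`-style headings, Slack messages are assumed to carry a `threadId`, and the function and field names are hypothetical. Ticket chunking is left out because the approach for tickets is not described here.

```typescript
// Hypothetical message shape for the example.
interface SlackMessage {
  threadId: string;
  author: string;
  text: string;
}

// Split a wiki page at each heading so every chunk is one coherent section.
// Assumes Markdown-style headings ("# Title" through "###### Title").
function chunkWikiBySection(markdown: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of markdown.split("\n")) {
    const isHeading = /^#{1,6}\s/.test(line);
    if (isHeading && current.length > 0) {
      chunks.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) {
    chunks.push(current.join("\n").trim());
  }
  return chunks.filter((chunk) => chunk.length > 0);
}

// Group Slack messages by thread so a question and its replies stay together,
// since a single message rarely contains a complete answer on its own.
function chunkSlackByThread(messages: SlackMessage[]): string[] {
  const threads = new Map<string, string[]>();
  for (const message of messages) {
    const lines = threads.get(message.threadId) ?? [];
    lines.push(`${message.author}: ${message.text}`);
    threads.set(message.threadId, lines);
  }
  // Map preserves insertion order, so threads come out in first-seen order.
  return Array.from(threads.values(), (lines) => lines.join("\n"));
}
```

Keeping one chunker per source type means each rule can be tuned independently, for example capping very long threads, without affecting how wiki sections are split.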

Mandatory Citation Enforcement

The generation prompt requires every factual claim in the answer to be tied to a specific retrieved source, and the system displays the cited sources alongside the answer. If the retrieval step doesn't find anything sufficiently relevant, the system explicitly says so rather than generating an answer from general knowledge that might not reflect the company's actual current practice.
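One way to picture the citation rule and the "not found" fallback is the sketch below. The relevance threshold of 0.75, the prompt wording, and all names are assumptions for illustration, not the actual prompt. The `invalidCitations` post-check is an extra safeguard added for the example and is not something described above.

```typescript
// Hypothetical shape of a retrieved chunk handed to the generation step.
interface RetrievedSource {
  title: string;
  url: string;
  text: string;
  score: number; // retrieval similarity, higher is more relevant
}

type PromptResult =
  | { kind: "not_found"; message: string }
  | { kind: "prompt"; prompt: string; sources: RetrievedSource[] };

// Hypothetical cutoff; a real value would be tuned against test questions.
const MIN_RELEVANCE = 0.75;

function buildCitedPrompt(
  question: string,
  retrieved: RetrievedSource[],
  minScore: number = MIN_RELEVANCE
): PromptResult {
  const sources = retrieved.filter((source) => source.score >= minScore);
  if (sources.length === 0) {
    // Refuse up front instead of letting the model answer from general knowledge.
    return {
      kind: "not_found",
      message: "No sufficiently relevant internal source was found for this question.",
    };
  }
  const context = sources
    .map((s, i) => `[${i + 1}] ${s.title} (${s.url})\n${s.text}`)
    .join("\n\n");
  const prompt = [
    "Answer the question using only the numbered sources below.",
    "Every factual claim must end with a citation such as [1].",
    "If the sources do not contain the answer, say so instead of guessing.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
  return { kind: "prompt", prompt, sources };
}

// Extra post-check (an assumption, not described in the write-up):
// returns cited numbers that do not match any provided source.
function invalidCitations(answer: string, sourceCount: number): number[] {
  const pattern = /\[(\d+)\]/g;
  const invalid: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    if (n < 1 || n > sourceCount) {
      invalid.push(n);
    }
  }
  return invalid;
}
```

Returning the filtered `sources` alongside the prompt lets the interface display exactly the sources the model was allowed to cite, which is what makes each `[n]` in the answer clickable.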

  • Unified vector index across docs, tickets, and Slack history with source-type metadata
  • Per-source-type chunking strategy rather than one generic approach
  • Mandatory citation requirement enforced in the generation prompt
  • Explicit 'not found' response when retrieval confidence is low, rather than guessing

Results

  • 76% reduction in time employees spent searching across multiple internal tools for an answer
  • 92% answer accuracy validated against a set of questions with known-correct answers
  • 100% of factual claims in generated answers were backed by a linked, verifiable source
  • 240+ weekly active users across the organization within the first quarter after launch

What I Learned

Retrieval quality, not generation quality, determined almost all of the system's perceived usefulness. The work that mattered most was getting the chunking strategy right for each different content type and being disciplined about citation — an answer with no clear source, even if factually correct, got far less trust from users than a shorter answer with a clear, clickable citation attached.
