Use Case

Enterprise Knowledge Base Search with AI

Employees waste hours every week searching for information that exists somewhere in the organization but is impossible to find. We build AI retrieval systems that answer natural language questions accurately, with sources cited.

The Challenge

At a 2,800-person professional services firm, institutional knowledge lives in 140 SharePoint sites, a legacy Confluence instance nobody migrated off, Google Drive folders owned by people who left two years ago, a ServiceNow KB, and Slack channels that search only partially indexes. A new hire's first 90 days are spent mostly asking senior colleagues where things are. A senior consultant loses 3-4 hours a week to information-finding, confirmed through a time study. Questions that feel brand-new get answered from first principles because the last team that answered the same question left no searchable artifact. The firm has rolled out SharePoint Enterprise Search twice; it returns keyword matches that technically contain the search terms but rarely answer the question. People have stopped trusting search and default to asking in Teams.

Our Approach

A Retrieval-Augmented Generation system built on Claude Sonnet 4.5, OpenAI text-embedding-3-large, and Pinecone connects to your existing knowledge sources via their native APIs (Microsoft Graph for SharePoint and OneDrive, Confluence REST, Google Drive, Notion, ServiceNow KB, and Slack search export). Documents are chunked at paragraph level with overlap, embedded into Pinecone, and tagged with source metadata: owner, last modified, access control list, document type. A query agent rewrites the user's question for retrieval, pulls the top 15 candidate chunks, re-ranks with a cross-encoder, synthesizes an answer using only retrieved content, and cites specific sources with deep-links. Access permissions are enforced at query time against Azure AD and Google Workspace, so users only see answers from documents they can access. Unanswered questions feed a knowledge-gap report delivered weekly to content owners.

How We Do It

1. Knowledge Source Indexing

We connect to your document sources through their APIs: Microsoft Graph for SharePoint/OneDrive, Confluence REST, Google Drive API, Notion, ServiceNow KB, GitHub Wiki, and internal PDF repositories via direct scrape. Documents are chunked at semantic boundaries (paragraph, section, table) with 10-15% overlap, embedded with OpenAI text-embedding-3-large, and stored in Pinecone with metadata: source system, owner, created and modified dates, ACL, language, document type. Initial indexing runs in batches; incremental sync runs every 15-30 minutes using change webhooks or delta queries. Failure mode: a document references another by link and the link is to a system we can't index (external vendor portal). The system flags unresolvable references rather than indexing the placeholder text.
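
Below is a minimal sketch of that indexing step, assuming the OpenAI and Pinecone Python clients; the index name, the chunking helper, and the metadata fields are illustrative placeholders rather than the production pipeline.

```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                     # reads OPENAI_API_KEY from the environment
index = Pinecone(api_key="...").Index("kb-search")   # hypothetical index name

def chunk_document(text: str, max_chars: int = 1500, overlap: float = 0.12) -> list[str]:
    """Split on paragraph boundaries, carrying roughly 10-15% overlap forward."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            # seed the next chunk with the tail of the previous one (the overlap)
            current = current[-int(max_chars * overlap):]
        current = (current + "\n\n" + para).strip()
    if current:
        chunks.append(current)
    return chunks

def index_document(doc_id: str, text: str, meta: dict) -> None:
    chunks = chunk_document(text)
    emb = openai_client.embeddings.create(model="text-embedding-3-large", input=chunks)
    index.upsert(vectors=[
        {
            "id": f"{doc_id}#{i}",
            "values": item.embedding,
            # source metadata drives permission filtering and ranking at query time
            "metadata": {**meta, "chunk_text": chunk, "chunk_index": i},
        }
        for i, (chunk, item) in enumerate(zip(chunks, emb.data))
    ])

index_document(
    "sharepoint:policies/remote-work",
    "Employees may work remotely from international locations for up to 30 days...",
    {"source": "sharepoint", "owner": "hr-ops", "modified": "2025-11-02",
     "allowed_groups": ["all-employees"], "doc_type": "policy"},
)
```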

2. Natural Language Search Interface

Users ask questions in plain language through a chat UI, a Slack or Teams bot, or an embedded widget in SharePoint. The query agent rewrites ambiguous questions into retrieval-friendly form (expanding acronyms, adding synonyms from a company glossary), retrieves the top 15 chunks from Pinecone, re-ranks with a cross-encoder (Cohere Rerank v3 or equivalent) to reorder by true relevance, and then passes the top 5-8 chunks to Claude Sonnet 4.5 for synthesis. The response cites specific documents with deep-links and confidence level. Multi-turn conversations maintain context for follow-ups. Failure mode: the user asks something truly novel with no coverage in the knowledge base. The system returns 'I don't have enough to answer this confidently' and logs the gap.
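
A condensed sketch of that query path, assuming the OpenAI, Pinecone, Cohere, and Anthropic Python clients; the index name, relevance threshold, model identifiers, and metadata fields are placeholders, and the real agent layers query rewriting and multi-turn context on top.

```python
import anthropic
import cohere
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()
co = cohere.Client("...")                              # placeholder API key
claude = anthropic.Anthropic()
index = Pinecone(api_key="...").Index("kb-search")     # hypothetical index name

RELEVANCE_FLOOR = 0.35   # illustrative threshold, tuned per deployment

def answer(question: str, user_groups: list[str]) -> str:
    # 1. Embed the (already rewritten and expanded) question.
    q_emb = openai_client.embeddings.create(
        model="text-embedding-3-large", input=[question]
    ).data[0].embedding

    # 2. Retrieve the top 15 candidates the user is allowed to see.
    hits = index.query(
        vector=q_emb, top_k=15, include_metadata=True,
        filter={"allowed_groups": {"$in": user_groups}},
    ).matches

    # 3. Re-rank with a cross-encoder and keep the best handful.
    docs = [h.metadata["chunk_text"] for h in hits]
    reranked = co.rerank(model="rerank-english-v3.0", query=question,
                         documents=docs, top_n=6)
    kept = [(docs[r.index], hits[r.index]) for r in reranked.results
            if r.relevance_score >= RELEVANCE_FLOOR]
    if not kept:
        return "I don't have enough information to answer this confidently."

    # 4. Synthesize strictly from the retrieved content, with citations.
    context = "\n\n".join(
        f"[{i + 1}] ({h.metadata['source']}) {text}" for i, (text, h) in enumerate(kept)
    )
    msg = claude.messages.create(
        model="claude-sonnet-4-5",   # placeholder model identifier
        max_tokens=1024,
        system="Answer only from the numbered sources and cite them as [n]. "
               "If the sources do not cover the question, say so.",
        messages=[{"role": "user",
                   "content": f"Sources:\n{context}\n\nQuestion: {question}"}],
    )
    return msg.content[0].text
```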

3. Access Control and Permission Enforcement

Permissions are enforced at query time, not index time. When a user queries, the system resolves their identity (SSO via Azure AD, Okta, or Google Workspace), pulls their group memberships, and filters Pinecone results to documents the user has read access to. For SharePoint, we query Graph API for effective permissions; for Google Drive, the Drive permissions API; for Confluence, the space permissions API. A user who cannot read a document in the source system cannot receive answers sourced from it. Failure mode: permissions change after indexing (user removed from a group, document ACL updated). The next query picks up the change because permissions are resolved live, not cached.
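
A small sketch of the query-time permission check for the Azure AD case, using Microsoft Graph's transitive membership endpoint; the metadata field name and the way it plugs into the retrieval filter are illustrative assumptions.

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def resolve_group_ids(user_token: str) -> list[str]:
    """Resolve the caller's effective group memberships live from Azure AD.

    Nothing is cached, so a user removed from a group loses access on
    their very next query.
    """
    resp = requests.get(
        f"{GRAPH}/me/transitiveMemberOf?$select=id",
        headers={"Authorization": f"Bearer {user_token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return [g["id"] for g in resp.json()["value"]]

def permission_filter(user_token: str) -> dict:
    """Pinecone metadata filter: only chunks whose ACL intersects the
    user's current groups become retrieval candidates."""
    return {"allowed_groups": {"$in": resolve_group_ids(user_token)}}
```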

4. Gap Analysis and Content Improvement

Every low-confidence answer, every 'I don't know' response, and every user thumbs-down writes to a gap log. A weekly analysis groups related unanswered questions (e.g. 23 variations of 'what is our policy on remote work from international locations') and surfaces them to the relevant knowledge owner with sample questions, a proposed article outline, and a one-click 'claim this topic' action. Answered questions that generate high re-query rates (users asking similar questions repeatedly) signal knowledge that exists but is hard to find. Failure mode: the gap report goes to a distribution list that nobody actually owns. We track response rate on gap reports per owner and escalate stale ones.
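
One way to build that weekly grouping, sketched here as greedy clustering over question embeddings; the similarity threshold, embedding model reuse, and storage details are illustrative assumptions, not the production implementation.

```python
import numpy as np
from openai import OpenAI

openai_client = OpenAI()

def group_gap_questions(questions: list[str], threshold: float = 0.82) -> list[list[str]]:
    """Greedy grouping of the week's unanswered questions: each question joins
    the first group whose seed it is cosine-similar to, otherwise it starts a
    new group. The threshold is an illustrative starting point."""
    emb = openai_client.embeddings.create(
        model="text-embedding-3-large", input=questions
    )
    vecs = np.array([item.embedding for item in emb.data])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # unit-normalize for cosine

    groups: list[dict] = []   # each: {"seed": unit vector, "questions": [...]}
    for question, vec in zip(questions, vecs):
        for group in groups:
            if float(vec @ group["seed"]) >= threshold:
                group["questions"].append(question)
                break
        else:
            groups.append({"seed": vec, "questions": [question]})

    # Largest groups first, so 23 variants of the same policy question land
    # at the top of the knowledge owner's weekly report.
    return [g["questions"] for g in sorted(groups, key=lambda g: -len(g["questions"]))]
```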

What You Get

  • Employees find accurate answers 5x faster than keyword search, measured in time-to-answer on a benchmark question set
  • New employee ramp time decreases 25-35% as onboarding information becomes accessible without asking colleagues
  • Knowledge owner inquiry volume (DMs, help tickets) drops 40% as self-service becomes reliable
  • Documentation gaps identified and addressed systematically: most clients close 100+ gaps per quarter versus previously closing none
  • Every answer comes with source citations and a confidence score, exportable as a CSV of question-answer pairs for compliance review

Where this fits — and where it doesn't

Good fit when

  • Organizations with 500+ employees where knowledge is genuinely distributed across multiple systems, and where document ownership and access control are reasonably well-maintained even if discovery is hard.
  • Use cases where the answer is actually in documents: policies, procedures, product specs, technical documentation, past decisions. The agent can retrieve and synthesize; it can't invent knowledge that doesn't exist somewhere in the corpus.
  • Teams willing to use the gap report as a driver of documentation investment. The agent amplifies existing content and makes absences visible, which creates pressure to close those gaps. Organizations that treat gap reports seriously see compounding improvements.

Not a fit when

  • Organizations with unclear or broken access control. If SharePoint permissions are inherited inconsistently, the agent will surface documents it shouldn't, or hide documents users should see. Fix access control before deployment.
  • Knowledge bases where the source of truth is someone's head, not a document. The agent can index documentation, transcripts, and chat; it can't index tacit knowledge. For these environments, the agent is complementary to a deliberate knowledge-capture effort, not a substitute.
  • Use cases where currency requirements are extreme (e.g. minute-by-minute operational procedures during an incident). Content that moves faster than the indexing cadence won't feel fresh enough.

Technology Stack

Claude Sonnet 4.5, OpenAI text-embedding-3-large, Cohere Rerank v3, Pinecone, LangChain, Microsoft Graph API, Confluence REST API, Azure AD

Integrates with

SharePoint and OneDrive, Confluence, Google Drive, Notion, ServiceNow Knowledge, Slack, Microsoft Teams, GitHub Wiki, Zendesk Guide

Related Services

Multimodal RAG Systems
Generative AI Applications
Enterprise AI Integration

Frequently Asked Questions

How do you handle documents that are outdated or contradict each other?
Document metadata (creation date, last modified date, owner, version) feeds into retrieval ranking, so more recent documents surface first for topics where recency matters. The agent is prompted to prefer recent sources explicitly for time-sensitive questions (e.g. current policies, benefit enrollment windows). When multiple sources genuinely contradict, the agent surfaces the contradiction with both positions cited rather than picking one and hiding the other. For organizations with significant documentation debt, we offer a content audit as part of scoping: we identify the most problematic overlap areas (policies updated in 3 places, procedures in 5 different documents) before indexing, and route cleanup to content owners. The agent quality is bounded by content quality.
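
As an illustration of how recency can feed ranking, the sketch below blends a re-ranker relevance score with an exponential decay on the last-modified date; the weights and half-life are assumptions tuned per deployment, not fixed values from our pipeline.

```python
import math
from datetime import datetime, timezone

def blended_score(relevance: float, modified_iso: str,
                  recency_weight: float = 0.2, half_life_days: float = 365.0) -> float:
    """Blend cross-encoder relevance with an exponential recency decay.

    A document last touched one half-life ago contributes half of the recency
    bonus; evergreen topics can set recency_weight to 0. All constants here
    are illustrative defaults.
    """
    modified = datetime.fromisoformat(modified_iso)
    if modified.tzinfo is None:
        modified = modified.replace(tzinfo=timezone.utc)
    age_days = max((datetime.now(timezone.utc) - modified).days, 0)
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return (1 - recency_weight) * relevance + recency_weight * recency
```
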
Can the system access our SharePoint and respect existing permission groups?
Yes. We integrate with SharePoint Online and OneDrive via Microsoft Graph API. Permissions are resolved at query time against the user's current Azure AD group memberships and each document's effective ACL. A user who cannot read a document in SharePoint (because they're not in the required group, or because item-level permissions exclude them) will not receive answers sourced from that document. This is enforced in two places: the retrieval filter and the synthesis prompt, so there's defense in depth. Permission enforcement at query time is a non-negotiable design requirement, not an optional feature. We do not cache permissions in a way that could grant stale access.
What happens when the AI gives a wrong answer?
Three mechanisms. First, every answer includes inline citations to the specific source passages, so users can verify the answer against the source without leaving the interface. Second, a thumbs-up/thumbs-down and a 'report issue' button capture user feedback, which routes to the knowledge owner and to the content team. Third, a quality dashboard tracks thumbs-down rate by topic, re-query patterns, and specific source documents that produce disagreement. When a pattern emerges (e.g. the agent consistently cites an outdated policy), we update the source and the index within hours. Users are trained during rollout to verify cited sources for decisions that matter, which is a healthy habit regardless of whether the answer is AI-generated.
How much content can the system handle, and is there a limit to knowledge base size?
There's no practical upper bound for enterprise use cases. Our deployments range from 50K to over 2M documents, with query response times under 3 seconds across that range. Pinecone handles horizontal scaling; the embedding-index size is the cost driver. At 2M documents with ~5 chunks per document the vector index runs roughly $800-1,200/month in infrastructure. During scoping we assess your document volume, growth rate, and query volume and right-size the infrastructure. The architecture is read-heavy and scales linearly, so adding a new source (e.g. a 500K document archive) is a planned indexing job rather than an architectural change.
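
For a feel of the scale behind that figure, here is the back-of-envelope sizing, using the 3,072-dimension output of text-embedding-3-large and the chunk count from the answer above; the storage overhead factor is a planning assumption, not a vendor quote.

```python
docs = 2_000_000
chunks_per_doc = 5
dims = 3072              # text-embedding-3-large output dimension
bytes_per_float = 4      # float32

vectors = docs * chunks_per_doc                       # 10,000,000 vectors
raw_gb = vectors * dims * bytes_per_float / 1e9       # ~123 GB of raw embeddings
# Index structures and metadata add overhead on top of the raw vectors;
# the 1.75x factor here is a rough planning assumption, not a vendor figure.
planned_gb = raw_gb * 1.75

print(f"{vectors:,} vectors, ~{raw_gb:.0f} GB raw, ~{planned_gb:.0f} GB planned")
```
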
How does the agent handle edge cases it hasn't seen before?
The agent is explicit about the limits of its knowledge base. When a question has no retrievable match above the relevance threshold, the agent says 'I don't have enough information to answer confidently' and offers to route the question to a knowledge owner or to log it as a gap. For ambiguous questions (multiple interpretations), the agent asks a clarifying follow-up rather than guessing. For questions that fall outside the knowledge base's scope (e.g. personal HR data the agent doesn't have access to), the agent explicitly redirects to the correct system or person. The agent never fabricates an answer. Hallucination is prevented both architecturally (synthesis is constrained to retrieved content) and prompt-level (explicit instructions and grounding checks).
What causes wrong answers, and how do you fix them?
Wrong answers in a RAG system usually trace to one of three root causes. First, the source document is itself wrong or outdated. We fix it at the source and the agent follows. Second, retrieval missed the right document. We tune the embedding model, chunking strategy, or add the document to a higher-priority bucket. Third, synthesis misinterpreted retrieved content. We adjust the synthesis prompt or add grounding constraints. Every user-reported error triggers a short root-cause investigation, and the findings go to the content or retrieval team. Across deployments, reported error rate trends from 4-6% in the first month to under 1% by month 3 as systemic issues are identified and fixed.
How do we audit every decision?
Every query writes to a log: user ID, timestamp, original question, rewritten query, retrieved documents with relevance scores, final answer, citations, and any user feedback. Logs export to your SIEM (Splunk, Elastic, Datadog) or a dedicated analytics database. For regulated environments we add prompt and response logging with configurable retention, PII redaction rules, and access controls on the log itself (not everyone who can query can read query logs). A governance dashboard shows query volume, top topics, coverage gaps, access violations if any, and source attribution frequency. Several financial services clients use the log for FINRA-relevant questions to prove what knowledge was available and when.
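
A sketch of what one such log record might look like before it ships to the SIEM or analytics store; the field names are illustrative, not a fixed schema.

```python
import json
import uuid
from datetime import datetime, timezone

def build_query_log(user_id: str, question: str, rewritten: str,
                    hits: list[dict], answer: str, feedback: str | None = None) -> str:
    """One JSON record per query, suitable for forwarding to Splunk, Elastic,
    or Datadog, or writing to an analytics table. Field names are illustrative."""
    record = {
        "query_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "question": question,
        "rewritten_query": rewritten,
        "retrieved": [
            {"doc_id": h["doc_id"], "score": h["score"], "source": h["source"]}
            for h in hits
        ],
        "answer": answer,
        "citations": [h["doc_id"] for h in hits if h.get("cited")],
        "feedback": feedback,     # thumbs up/down, report-issue, or None
    }
    return json.dumps(record)
```
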
How long to production?
A focused deployment indexing 2-3 primary sources (e.g. SharePoint, Confluence, ServiceNow KB) with 50K-200K documents runs 6-8 weeks. Weeks 1-2 are discovery: source access, permission architecture, user interface choice. Weeks 3-4 build the indexing pipeline and run the first full sync. Weeks 5-6 deploy the query interface (Teams bot, web chat, or embedded widget) and run a beta with 30-50 pilot users. Weeks 7-8 incorporate pilot feedback, tune retrieval and synthesis, and roll out broadly. Adding additional sources (a fourth, fifth, sixth) takes 1-2 weeks per source. Most of the complexity is permissions and access control, not the AI. If your access architecture is clean, rollout is faster.

Related reading

AI Agents vs Chatbots: They're Not the Same Thing

Every week someone tells me they want to build an AI agent when what they actually need is a chatbot. Or worse, they build a chatbot when they need an agent. Here's how to tell the difference.

Build AI In-House vs Hire a Consultancy: The Real 2026 Cost Comparison

The build vs buy decision for AI is more nuanced than most comparisons suggest. Here is what the full cost of each path actually looks like in 2026.

What Does Enterprise RAG Actually Cost? A Breakdown

Enterprise RAG costs range from $40K to $150K+ to build, with $2K-$8K in monthly ongoing costs. Here is a full breakdown by component so you can budget accurately.

Ready to build this for your team?

We take this from concept to production deployment. Usually in 3–6 weeks.

Start Your Project →