Skip to main content
Mujtaba Farooq logoMujtaba
Back to Projects

SupportPilotAIAgentforCustomerSupportTriage

SaaSSaaS
Node.jsNestJSOpenAI APIPostgreSQLRedisBullMQ

<60s

First-Response Time

Down from 4+ hours average

68%

Auto-Resolved Tickets

Resolved without human involvement

96%

Escalation Accuracy

Correctly routed to the right team

+22%

Customer CSAT

Improvement after rollout

Illustrative Project

SupportPilot is an illustrative example demonstrating an AI agent development pattern, not a completed client engagement. It represents the kind of system I design and build for support-heavy businesses.

Overview

SupportPilot is an autonomous support agent designed to sit in front of a company's existing ticketing system. Instead of routing every incoming ticket to a human queue, the agent reads each ticket, looks up relevant account and order context, determines whether it can resolve the request directly, and either takes action or escalates with a full summary attached.

The Challenge

Most support teams face the same bottleneck: the majority of tickets are routine — order status, simple refunds, password resets, basic account questions — but they still consume the same triage time as genuinely complex cases before a human even starts working on them. The goal was to remove that triage tax entirely for the routine majority while making escalation seamless and well-informed for everything else.

The harder constraint was trust: a support agent that occasionally makes a confidently wrong decision — approving a refund it shouldn't, or closing a ticket prematurely — does more damage to customer trust than a slow human queue ever would.

Architecture & Technical Decisions

Tool-Scoped Agent Design

Rather than giving the agent broad access to the support and billing systems, I defined a narrow set of tools it could call: look up order, look up account, issue refund (capped at a configurable dollar threshold), update ticket status, and escalate with summary. Each tool had its own validation layer in code — the agent's reasoning decided when to call a tool, but the tool itself enforced hard limits regardless of what the agent's reasoning concluded.

  • Refund tool hard-capped at the original order total, enforced in code, not by prompt instruction
  • Account lookup tool scoped to read-only fields needed for support context, not full account data
  • Escalation tool required as the fallback for any case below a confidence threshold

Confidence-Based Escalation

The agent was prompted to explicitly state its confidence and reasoning before taking any action with real consequences. Below a calibrated confidence threshold — determined through evaluation against a labeled set of historical tickets — the agent escalates automatically rather than acting. This threshold was tuned conservatively at launch and gradually relaxed for specific action types as evaluation data showed sustained reliability.

Asynchronous Processing With BullMQ

Incoming tickets are queued rather than processed synchronously, since LLM reasoning and multiple tool calls take a few seconds — long enough that blocking the ticketing system's webhook response would be poor practice. A dedicated worker pool processes the queue, with retry logic for transient API failures and dead-letter handling for tickets that fail processing repeatedly.

Full Audit Trail

Every ticket processed by the agent logs its reasoning, the tools it called, the arguments passed, and the final action taken. This made it possible to review edge cases after the fact, build a growing evaluation set from real production tickets, and explain to the support team exactly why the agent did what it did on any given ticket.

Results

  • 68% of incoming tickets resolved end-to-end without human involvement
  • First-response time on auto-resolved tickets dropped to under 60 seconds, down from a 4+ hour average queue time
  • 96% of escalated tickets were correctly routed with accurate context, reducing the back-and-forth that previously happened when tickets were escalated without enough information
  • Customer satisfaction scores rose 22% in the months following rollout, primarily attributed to faster resolution on routine requests

What I Learned

The most valuable engineering work wasn't the agent's reasoning quality — it was the tool design and the hard limits enforced outside the model's control. A well-scoped tool with a hard cap is worth more for safety than a clever prompt asking the model to be careful. Confidence-based escalation, tuned conservatively and relaxed gradually based on real evaluation data, was what made it possible to trust the system enough to actually deploy it.

Related Projects