MMujtaba

CourseOptions — Intelligent Course Discovery Engine

Web App · SaaS
Next.js · Node.js · Elasticsearch · PostgreSQL · AWS · Redis

  • 50K+ Courses Indexed
  • 200+ Providers
  • 500K+ Active Learners
  • <100ms Search Speed

Overview

CourseOptions is an intelligent course discovery engine that aggregates online learning content from 200+ providers — Coursera, Udemy, edX, LinkedIn Learning, and dozens more — into a single, searchable, personalized platform. Half a million learners use it to find the right course for their career goals without spending hours comparing platforms.

I joined as Senior Full-Stack Engineer and owned the ingestion pipeline, the Elasticsearch search layer, and the Next.js frontend from day one.

The Challenge

The data problem was formidable: 200+ providers with wildly inconsistent data schemas, update frequencies ranging from real-time to monthly, and course attributes that didn't map cleanly across platforms. A Coursera "specialization" and a Udemy "course" are structurally different products — our schema had to normalize them without losing fidelity.

On the search side, users expected Google-level relevance. A query for "machine learning for beginners" needed to surface beginner-appropriate courses first, account for course ratings and freshness, and de-rank duplicates from multiple providers. Elasticsearch's out-of-the-box BM25 scoring wasn't enough.

Architecture & Technical Decisions

Multi-Provider Ingestion Pipeline

I built a plugin-style ingestion architecture where each provider has a dedicated adapter implementing a standard interface. Adapters handle auth (OAuth, API keys, scraping where unavoidable), rate limiting, and schema normalization. A central orchestrator schedules jobs per provider based on their update cadence, using BullMQ for reliable execution and retry logic.

  • 200+ provider adapters with pluggable auth strategies
  • Canonical course schema with provider-specific metadata stored as JSONB
  • Deduplication via fingerprinting (title + provider + URL hash) before indexing
  • Dead-letter queue for failed ingestion jobs with Slack alerting
  • Full re-index triggered nightly; incremental sync every 4 hours for top providers
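The adapter contract and the pre-index fingerprinting can be sketched roughly as below. The interface and field names (`CanonicalCourse`, `fetchBatch`) are illustrative stand-ins, not the production schema:

```typescript
import { createHash } from "node:crypto";

// Canonical course shape every adapter normalizes into.
// Field names here are illustrative, not the production schema.
interface CanonicalCourse {
  title: string;
  provider: string;
  url: string;
  metadata: Record<string, unknown>; // provider-specific fields, stored as JSONB
}

// Standard interface each provider adapter implements; the orchestrator
// only ever talks to this contract, never to provider-specific code.
interface ProviderAdapter {
  name: string;
  fetchBatch(cursor?: string): Promise<{ courses: CanonicalCourse[]; next?: string }>;
}

// Deduplication fingerprint: hash of title + provider + URL,
// computed before indexing so duplicates never reach Elasticsearch.
function fingerprint(course: CanonicalCourse): string {
  return createHash("sha256")
    .update(`${course.title}|${course.provider}|${course.url}`)
    .digest("hex");
}
```

Keeping the dedup key outside the adapters means a new provider integration only has to get normalization right; identity is decided centrally.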

Elasticsearch Search & Relevance Tuning

The search layer used a custom scoring model built on top of Elasticsearch's BM25 baseline. I added function score queries that boosted results based on: average rating (weighted by review count), freshness (recency decay), provider reputation score, and enrollment velocity. Query-time boosting for beginner/advanced tags based on inferred user level from their history completed the relevance stack.

  • Multi-field search across title, description, instructor, and tags with per-field boosts
  • Function score query with Gaussian decay for recency and sigmoid for rating confidence
  • Synonym expansion ("ML" → "machine learning") via custom analyzer
  • Auto-complete via edge n-gram tokenization on title field
  • A/B tested scoring weights against click-through rate — improved CTR by 28%
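A simplified version of the relevance query looks roughly like this. The field names (`published_at`, `rating`, `review_count`), the decay scale, and all weights are placeholders, not the A/B-tested production values, and the review-count saturation term stands in for the sigmoid confidence curve:

```typescript
// Sketch of a function_score query over the BM25 baseline:
// multi-field match with per-field boosts, Gaussian recency decay,
// and rating weighted by review volume.
function buildCourseQuery(text: string) {
  return {
    query: {
      function_score: {
        query: {
          multi_match: {
            query: text,
            fields: ["title^3", "description", "instructor^2", "tags^2"],
          },
        },
        functions: [
          // Recency: Gaussian decay from "now" over the publish date.
          { gauss: { published_at: { origin: "now", scale: "180d", decay: 0.5 } } },
          // Rating confidence: rating scaled by a review-count saturation
          // term (placeholder midpoint of 50 reviews).
          {
            script_score: {
              script: {
                source:
                  "double n = doc['review_count'].value; " +
                  "return doc['rating'].value * (n / (n + 50.0))",
              },
            },
            weight: 1.5,
          },
        ],
        score_mode: "sum",
        boost_mode: "multiply",
      },
    },
  };
}
```

Keeping the query builder as a pure function made it straightforward to A/B test weight variants: each experiment arm was just a different set of constants fed into the same structure.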

Frontend Performance

The Next.js frontend used ISR (Incremental Static Regeneration) for course detail pages, pre-rendering the top 10K most-visited courses at build time and regenerating on a 1-hour stale window. Search results pages used streaming SSR to send the page shell immediately while the Elasticsearch query resolved. Core Web Vitals: LCP 0.9s, CLS 0, FID <50ms.
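The ISR setup for a course detail page can be sketched as below, assuming a Pages Router dynamic route like `pages/courses/[slug].tsx`. `getTopCourseSlugs` and `fetchCourse` are hypothetical helpers standing in for the real popularity query and data fetch:

```typescript
// Stub helpers for illustration only; the real implementations query
// the analytics store and the course API respectively.
async function getTopCourseSlugs(n: number): Promise<string[]> {
  return ["intro-to-ml"].slice(0, n);
}
async function fetchCourse(slug: string) {
  return { slug, title: "Example Course" };
}

// Pre-render the most-visited courses at build time; everything else
// is generated on first request ("blocking") and then cached.
export async function getStaticPaths() {
  const slugs = await getTopCourseSlugs(10_000); // top 10K most-visited
  return {
    paths: slugs.map((slug) => ({ params: { slug } })),
    fallback: "blocking",
  };
}

export async function getStaticProps({ params }: { params: { slug: string } }) {
  const course = await fetchCourse(params.slug);
  return {
    props: { course },
    revalidate: 3600, // regenerate at most once per hour (the stale window)
  };
}
```

The `fallback: "blocking"` choice trades a slower first hit on long-tail courses for never serving an empty shell, which kept CLS at zero on detail pages.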

Results

  • 50K+ courses indexed from 200+ providers with <0.1% schema normalization errors
  • Search p95 latency: 94ms including Elasticsearch query + Redis cache check
  • 500K+ monthly active learners with 4.1-minute average session duration
  • Recommendation click-through rate 3.2x higher than generic browse
  • Ingestion pipeline reliability: 99.7% job success rate over 12 months

What I Learned

Search relevance is a product problem before it's a technical problem. The best Elasticsearch configuration in the world won't save you if you don't understand what "good result" means to your users. Instrumenting every search interaction — what users clicked, what they refined, what they abandoned — and feeding that signal back into scoring weights was as valuable as any infrastructure decision.
