CourseOptions — Intelligent Course Discovery Engine
- 50K+ courses indexed
- 200+ providers
- 500K+ active learners
- <100ms search speed
Overview
CourseOptions is an intelligent course discovery engine that aggregates online learning content from 200+ providers — Coursera, Udemy, edX, LinkedIn Learning, and dozens more — into a single, searchable, personalized platform. Half a million learners use it to find the right course for their career goals without spending hours comparing platforms.
I joined as Senior Full-Stack Engineer and owned the ingestion pipeline, the Elasticsearch search layer, and the Next.js frontend from day one.
The Challenge
The data problem was formidable: 200+ providers with wildly inconsistent data schemas, update frequencies ranging from real-time to monthly, and course attributes that didn't map cleanly across platforms. A Coursera "specialization" and a Udemy "course" are structurally different products — our schema had to normalize them without losing fidelity.
On the search side, users expected Google-level relevance. A query for "machine learning for beginners" needed to surface beginner-appropriate courses first, account for course ratings and freshness, and de-rank duplicates of the same course listed by multiple providers. Elasticsearch's out-of-the-box scoring wasn't enough on its own.
Architecture & Technical Decisions
Multi-Provider Ingestion Pipeline
I built a plugin-style ingestion architecture where each provider has a dedicated adapter implementing a standard interface. Adapters handle auth (OAuth, API keys, scraping where unavoidable), rate limiting, and schema normalization. A central orchestrator schedules jobs per provider based on each provider's update cadence, using BullMQ for reliable execution and retry logic.
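A minimal TypeScript sketch of what such an adapter contract could look like. The names here (`CanonicalCourse`, `ProviderAdapter`, `Cadence`, `jobOptionsFor`, `runAdapter`) and the retry numbers are assumptions for illustration, not the production code. The object returned by `jobOptionsFor` follows the repeat/attempts/backoff option shape that BullMQ accepts when a job is added, but no queue is created here.

```typescript
// Canonical course shape; field names are illustrative, not the real schema.
interface CanonicalCourse {
  title: string;
  provider: string;
  url: string;
  providerMeta: Record<string, unknown>; // provider-specific extras
}

// Hypothetical cadence buckets; real providers may need finer granularity.
type Cadence = "hourly" | "daily" | "monthly";

// The contract each provider adapter implements in this sketch.
interface ProviderAdapter<Raw> {
  readonly provider: string;
  readonly cadence: Cadence;
  // Returns one page of raw records plus a cursor for the next page, if any.
  fetchPage(cursor?: string): Promise<{ items: Raw[]; nextCursor?: string }>;
  // Maps one raw provider record onto the canonical schema.
  normalize(raw: Raw): CanonicalCourse;
}

const HOUR_MS = 60 * 60 * 1000;
const REPEAT_EVERY_MS: Record<Cadence, number> = {
  hourly: HOUR_MS,
  daily: 24 * HOUR_MS,
  monthly: 30 * 24 * HOUR_MS,
};

// Job options: a repeat interval, a retry count, and exponential backoff
// between retries. The specific values are assumptions.
function jobOptionsFor(cadence: Cadence) {
  return {
    repeat: { every: REPEAT_EVERY_MS[cadence] },
    attempts: 5,
    backoff: { type: "exponential", delay: 30_000 },
  };
}

// Drains every page from an adapter and returns the normalized courses.
async function runAdapter<Raw>(
  adapter: ProviderAdapter<Raw>,
): Promise<CanonicalCourse[]> {
  const courses: CanonicalCourse[] = [];
  let cursor: string | undefined;
  do {
    const page = await adapter.fetchPage(cursor);
    courses.push(...page.items.map((raw) => adapter.normalize(raw)));
    cursor = page.nextCursor;
  } while (cursor !== undefined);
  return courses;
}
```

Keeping `normalize` on the adapter means the orchestrator never sees provider-specific shapes; it only schedules work and collects canonical records.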
- 200+ provider adapters with pluggable auth strategies
- Canonical course schema with provider-specific metadata stored as JSONB
- Deduplication via fingerprinting (title + provider + URL hash) before indexing
- Dead-letter queue for failed ingestion jobs with Slack alerting
- Full re-index triggered nightly; incremental sync every 4 hours for top providers
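The fingerprint-based deduplication step can be sketched roughly as follows. This assumes the fingerprint is a SHA-256 hash over the normalized title, provider, and URL together; the exact normalization and hash function are not stated above, and `CourseKey`, `fingerprint`, and `dedupe` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// The three fields the fingerprint is built from (illustrative names).
interface CourseKey {
  title: string;
  provider: string;
  url: string;
}

// Hashes the normalized title + provider + URL into a stable hex string.
// Trimming and lowercasing are assumed normalization rules.
function fingerprint(course: CourseKey): string {
  const parts = [
    course.title.trim().toLowerCase(),
    course.provider.trim().toLowerCase(),
    course.url.trim(),
  ];
  // A NUL separator keeps "ab" + "c" distinct from "a" + "bc".
  return createHash("sha256").update(parts.join("\u0000")).digest("hex");
}

// Keeps the first course seen for each fingerprint and drops later repeats.
function dedupe<T extends CourseKey>(courses: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const course of courses) {
    const key = fingerprint(course);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(course);
    }
  }
  return unique;
}
```

Because the provider is part of the key, this sketch only collapses repeats of the same listing; the same course offered by two providers keeps two distinct fingerprints.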
Elasticsearch Search & Relevance Tuning
The search layer used a custom scoring model built on top of Elasticsearch's BM25 baseline. I added function score queries that boosted results on four signals: average rating (weighted by review count), freshness (recency decay), provider reputation score, and enrollment velocity. Query-time boosts on beginner/advanced tags, driven by a user level inferred from each user's history, completed the relevance stack.
- Multi-field search across title, description, instructor, and tags with per-field boosts
- Function score query with Gaussian decay for recency and sigmoid for rating confidence
- Synonym expansion ("ML" → "machine learning") via custom analyzer
- Auto-complete via edge n-gram tokenization on title field
- A/B tested scoring weights against click-through rate — improved CTR by 28%
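As a rough illustration of how the per-field boosts, recency decay, and rating confidence could combine in one request, here is a sketch of a search body builder. The field names (`published_at`, `avg_rating`, `review_count`), boost values, and decay parameters are all assumptions, since the real mapping and tuned weights are not shown here. The sigmoid is written out in the script rather than relying on a built-in helper.

```typescript
interface SearchOptions {
  text: string;
  size?: number;
}

// Builds an Elasticsearch request body: a BM25 multi_match query wrapped in
// a function_score query that adds recency and rating-confidence signals.
function buildSearchBody({ text, size = 20 }: SearchOptions) {
  return {
    size,
    query: {
      function_score: {
        // Text relevance across several fields; "^n" is a per-field boost.
        query: {
          multi_match: {
            query: text,
            fields: ["title^3", "tags^2", "instructor^1.5", "description"],
          },
        },
        functions: [
          {
            // Gaussian recency decay: full score within 30 days of now,
            // falling to half one year beyond that offset.
            gauss: {
              published_at: {
                origin: "now",
                offset: "30d",
                scale: "365d",
                decay: 0.5,
              },
            },
          },
          {
            // Rating scaled by a sigmoid of review count, so a 5.0 from
            // three reviews counts for less than a 4.6 from thousands.
            script_score: {
              script: {
                source:
                  "doc['avg_rating'].value / (1.0 + Math.exp(-params.k * (doc['review_count'].value - params.midpoint)))",
                params: { k: 0.05, midpoint: 50 },
              },
            },
          },
        ],
        score_mode: "sum", // add the function values together
        boost_mode: "multiply", // then multiply into the BM25 score
      },
    },
  };
}
```

The returned object can be passed as the body of a search request; keeping the weights in one builder function also makes it easier to swap in alternative values for an A/B test.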
Frontend Performance
The Next.js frontend used ISR (Incremental Static Regeneration) for course detail pages, pre-rendering the top 10K most-visited courses at build time and regenerating on a 1-hour stale window. Search results pages used streaming SSR to send the page shell immediately while the Elasticsearch query resolved. Core Web Vitals: LCP 0.9s, CLS 0, FID <50ms.
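The ISR setup described above might look roughly like this, assuming the Pages Router (`getStaticPaths` / `getStaticProps`). The data helpers are in-memory stand-ins, the `fallback: "blocking"` choice for courses outside the top 10K is an assumption, and in a real page file both functions would be exported and typed with the types from `next`.

```typescript
type Course = { slug: string; title: string };

// Hypothetical data-access helpers, stubbed with in-memory data.
const COURSES: Course[] = [
  { slug: "intro-to-ml", title: "Intro to Machine Learning" },
  { slug: "advanced-sql", title: "Advanced SQL" },
];

async function getTopCourseSlugs(limit: number): Promise<string[]> {
  return COURSES.slice(0, limit).map((course) => course.slug);
}

async function getCourseBySlug(slug: string): Promise<Course | null> {
  return COURSES.find((course) => course.slug === slug) ?? null;
}

// Pre-render the most-visited course pages at build time.
async function getStaticPaths() {
  const slugs = await getTopCourseSlugs(10_000);
  return {
    paths: slugs.map((slug) => ({ params: { slug } })),
    // Assumption: other courses render on first request, then get cached.
    fallback: "blocking" as const,
  };
}

// Serve the cached page and regenerate it in the background once it is
// more than one hour (3600 seconds) old.
async function getStaticProps({ params }: { params: { slug: string } }) {
  const course = await getCourseBySlug(params.slug);
  if (course === null) {
    return { notFound: true as const };
  }
  return { props: { course }, revalidate: 3600 };
}
```

With this shape, a stale page is still served instantly while regeneration happens in the background, so visitors never wait on a rebuild.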
Results
- 50K+ courses indexed from 200+ providers with <0.1% schema normalization errors
- Search p95 latency: 94ms including Elasticsearch query + Redis cache check
- 500K+ monthly active learners with 4.1-minute average session duration
- Recommendation click-through rate 3.2x higher than generic browse
- Ingestion pipeline reliability: 99.7% job success rate over 12 months
What I Learned
Search relevance is a product problem before it's a technical problem. The best Elasticsearch configuration in the world won't save you if you don't understand what a "good result" means to your users. Instrumenting every search interaction (what users clicked, what they refined, what they abandoned) and feeding that signal back into scoring weights was as valuable as any infrastructure decision.