Overview
Executive summary
The client's storefront relied on manually curated bestseller lists on every page. Conversion on recommendation rails had plateaued, high-margin accessories and replacement parts were rarely shown, and each new category launch required weeks of merchandising rule tuning.
We replaced that stack with a real-time AI platform that combines retrieval, ranking, and serving (sub-80 ms p95 via CDN edge). Merchandising keeps control through a rule overlay: pinning promos or blocking categories needs no deploy.
Challenge
Business problems and how we solved them
1. Static bestseller rails for every shopper
Homepage, PDP, and cart showed the same products regardless of session, so attach rates on cables, chargers, and accessories stayed weak.
Technique · Session-aware two-tower retrieval + learning-to-rank (LTR).
Solution · User tower: last 20 views (Transformer), device, category affinity. Item tower: MiniLM specs, ViT pack-shots, accessory graph. LightGBM re-scores the top 120 ANN candidates on 28 business features to fill 12 slots.
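The retrieve-then-re-rank flow described above can be sketched in a few lines. This is a minimal illustration, not the production code: brute-force cosine search stands in for the ANN index, and a caller-supplied `score_fn` stands in for the LightGBM ranker. The function names, the toy vectors, and the margin-based score are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(user_vec: Vector, item_vecs: Dict[str, Vector],
             k: int = 120) -> List[Tuple[str, float]]:
    """Stage 1: top-k items by similarity (brute force stands in for ANN)."""
    scored = [(sku, cosine(user_vec, vec)) for sku, vec in item_vecs.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def rerank(candidates: List[Tuple[str, float]],
           score_fn: Callable[[str, float], float],
           slots: int = 12) -> List[str]:
    """Stage 2: re-score candidates and keep enough to fill the rail."""
    rescored = [(sku, score_fn(sku, sim)) for sku, sim in candidates]
    rescored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [sku for sku, _ in rescored[:slots]]


# Toy usage: margin is a hypothetical business feature added to similarity.
items = {"cable": (1.0, 0.0), "charger": (0.9, 0.1), "case": (0.0, 1.0)}
margin = {"cable": 0.1, "charger": 0.5, "case": 0.2}
candidates = retrieve((1.0, 0.0), items, k=2)
rail = rerank(candidates, lambda sku, sim: sim + margin[sku], slots=2)
print(rail)  # ['charger', 'cable']
```

The split keeps the expensive model off the full catalog: only the retrieved candidates ever reach the ranker.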
2. Long-tail SKUs invisible in recommendations
High-margin parts appeared in fewer than 5% of recommendation impressions.
Technique · Hybrid CF + content embeddings + MMR diversity.
Solution · ALS warms popular SKUs; cold items use content vectors day one. MMR on category and price band. Dashboard flags zero-impression SKUs after 72h.
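To make the MMR step concrete, here is a small sketch of greedy maximal marginal relevance over category and price band. The similarity definition (0.5 for a shared category plus 0.5 for a shared price band), the `lam` trade-off value, and all SKU names are assumptions for illustration; the source does not specify them.

```python
from typing import Callable, Dict, List, Tuple


def make_similarity(attrs: Dict[str, Tuple[str, str]]) -> Callable[[str, str], float]:
    """attrs maps sku -> (category, price_band). Hypothetical similarity:
    0.5 for a shared category plus 0.5 for a shared price band."""
    def similarity(a: str, b: str) -> float:
        cat_a, band_a = attrs[a]
        cat_b, band_b = attrs[b]
        return 0.5 * (cat_a == cat_b) + 0.5 * (band_a == band_b)
    return similarity


def mmr(candidates: List[Tuple[str, float]],
        similarity: Callable[[str, str], float],
        k: int, lam: float = 0.7) -> List[str]:
    """Greedy MMR: each pick maximizes
    lam * relevance - (1 - lam) * max similarity to items already picked."""
    remaining = dict(candidates)
    selected: List[str] = []
    while remaining and len(selected) < k:
        def mmr_score(sku: str) -> float:
            redundancy = max((similarity(sku, s) for s in selected), default=0.0)
            return lam * remaining[sku] - (1.0 - lam) * redundancy
        best = max(sorted(remaining), key=mmr_score)  # sorted() makes ties deterministic
        selected.append(best)
        del remaining[best]
    return selected


attrs = {
    "cable_a": ("cables", "low"),
    "cable_b": ("cables", "low"),
    "filter": ("parts", "mid"),
}
ranked = [("cable_a", 0.9), ("cable_b", 0.85), ("filter", 0.6)]
print(mmr(ranked, make_similarity(attrs), k=2, lam=0.5))  # ['cable_a', 'filter']
```

In the toy run, the second cable loses its slot to a lower-scored part from a different category and price band, which is the effect the diversity step is meant to have on long-tail SKUs.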
3. Merchandising blocked on engineering
Marketing could not pin campaigns without code deploys.
Technique · Rule overlay DSL after LTR + shadow traffic.
Solution · JSON rules for pins, boosts, blocks, margin floors. Shadow mode tests rule sets on 5% traffic before publish.
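As an illustration of how a rule overlay might sit after the ranker, the sketch below applies pins, boosts, blocks, and a margin floor to an already ranked list. The JSON schema, field names, and SKUs are hypothetical; the source only says the rules are JSON and cover these four rule types.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical rule set; the real schema is not given in the source.
RULES_JSON = """
{
  "pins":   [{"sku": "promo-hub", "slot": 0}],
  "boosts": [{"category": "chargers", "factor": 1.5}],
  "blocks": [{"category": "clearance"}],
  "margin_floor": 0.15
}
"""


def apply_rules(ranked: List[Tuple[str, float]],
                catalog: Dict[str, dict],
                rules: dict, slots: int = 12) -> List[str]:
    """ranked: (sku, score) pairs from the ranker.
    catalog: sku -> {"category": str, "margin": float}."""
    blocked = {b["category"] for b in rules.get("blocks", [])}
    boosts = {b["category"]: b["factor"] for b in rules.get("boosts", [])}
    floor = rules.get("margin_floor", 0.0)
    pins = sorted(rules.get("pins", []), key=lambda p: p["slot"])
    pinned = {p["sku"] for p in pins}

    kept = []
    for sku, score in ranked:
        meta = catalog[sku]
        # Pinned SKUs are placed separately; blocked or low-margin SKUs drop out.
        if sku in pinned or meta["category"] in blocked or meta["margin"] < floor:
            continue
        kept.append((sku, score * boosts.get(meta["category"], 1.0)))
    kept.sort(key=lambda pair: (-pair[1], pair[0]))

    rail = [sku for sku, _ in kept]
    for pin in pins:  # ascending slot order keeps earlier pins in place
        rail.insert(min(pin["slot"], len(rail)), pin["sku"])
    return rail[:slots]


rules = json.loads(RULES_JSON)
catalog = {
    "case": {"category": "cases", "margin": 0.30},
    "charger": {"category": "chargers", "margin": 0.40},
    "old-cable": {"category": "clearance", "margin": 0.50},
    "thin-margin": {"category": "cases", "margin": 0.05},
}
ranked = [("thin-margin", 0.95), ("case", 0.9), ("old-cable", 0.8), ("charger", 0.7)]
print(apply_rules(ranked, catalog, rules))  # ['promo-hub', 'charger', 'case']
```

Because the overlay is data rather than code, a rule set like this can be evaluated in shadow mode and published without touching the ranker. In this sketch pins bypass the block and margin checks, which is a design assumption rather than documented behavior.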
4. Latency and peak-traffic risk
Legacy stack could not personalize under 100 ms at peak.
Technique · Edge cache + HNSW ANN + precomputed user vectors.
Solution · Redis user vectors refresh every 30 s. Lambda@Edge cache (TTL 90 s). pgvector ANN ~12 ms p95; full path under 80 ms p95.
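The two freshness windows above (30 s for user vectors, 90 s for cached rails) follow the same time-to-live pattern. The sketch below is an in-process stand-in to show the idea only; it does not use the Redis or Lambda@Edge APIs, and the class and key names are assumptions.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory TTL cache; entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # fresh entry: skip the expensive path
        value = compute()              # miss or stale: recompute and store
        self._store[key] = (now, value)
        return value


# Hypothetical wiring that mirrors the TTLs quoted above.
user_vector_cache = TTLCache(ttl_seconds=30)
rail_cache = TTLCache(ttl_seconds=90)

rail = rail_cache.get_or_compute(
    ("user-123", "homepage_hero"),
    lambda: ["sku-1", "sku-2"],        # stands in for the full ranking path
)
```

The rail TTL is longer than the vector TTL, so a cached rail can briefly lag the freshest user vector; that trade of freshness for latency is what the two numbers imply.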
System design
Architecture diagrams
Request flow, two-tower retrieval, ranking funnel, and the backend service mesh that powers real-time personalization.
Request flow: Storefront → edge → ranker → retrieval → features, with an event loop for online learning.
Two-tower retrieval: User and item towers; HNSW returns top-K before LTR.
Ranking funnel: 12k SKUs → ANN (K=120) → LTR → rules & MMR → 12 slots.
Backend service mesh: Kafka, Redis, PyTorch, pgvector, ranker, rules, API, edge CDN, trainer.
Engineering
Data pipeline, serving, and MLOps
Event ingestion
Kafka partitions events by user_id; consumers fold them into real-time feature hashes in Redis.
- product_viewed
- add_to_cart
- purchase_completed
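A rough sketch of the ingestion idea: keying by user_id sends all of a user's events to one partition, so a single consumer can update that user's feature hash in order. Everything here is illustrative. `zlib.crc32` stands in for Kafka's key hashing (the Java client's default partitioner uses murmur2), a plain dict stands in for a Redis hash, and the field names are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict

EVENT_TYPES = {"product_viewed", "add_to_cart", "purchase_completed"}


def partition_for(user_id: str, num_partitions: int) -> int:
    """Deterministic partition for a key (crc32 used here as a stand-in hash)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def apply_event(features: Dict[str, Dict[str, object]], event: dict) -> None:
    """Fold one event into the per-user feature hash (dict stands in for Redis)."""
    if event["type"] not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event['type']}")
    user = features[event["user_id"]]
    counter = f"count:{event['type']}"
    user[counter] = int(user.get(counter, 0)) + 1
    user["last_sku"] = event["sku"]


features: Dict[str, Dict[str, object]] = defaultdict(dict)
events = [
    {"type": "product_viewed", "user_id": "u1", "sku": "cable"},
    {"type": "product_viewed", "user_id": "u1", "sku": "charger"},
    {"type": "add_to_cart", "user_id": "u1", "sku": "charger"},
]
for event in events:
    apply_event(features, event)

print(features["u1"])
# {'count:product_viewed': 2, 'last_sku': 'charger', 'count:add_to_cart': 1}
```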
Storage & retrieval
pgvector HNSW index for sub-15 ms ANN lookups at catalog scale.
ef_construction: 128
m: 16
Redis TTL: 30 s
Cold start: cohort centroids
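For reference, the HNSW parameters above map directly onto pgvector's index options. The helper below only builds SQL strings. The table name `items`, the columns `sku` and `embedding`, and the choice of cosine distance are assumptions, since the source does not name the schema or the metric.

```python
HNSW_M = 16
HNSW_EF_CONSTRUCTION = 128
TOP_K = 120


def create_index_sql(table: str = "items", column: str = "embedding") -> str:
    """DDL for a pgvector HNSW index using cosine distance (assumed metric).
    Identifiers are interpolated, so pass only trusted constants."""
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw ON {table} "
        f"USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {HNSW_M}, ef_construction = {HNSW_EF_CONSTRUCTION});"
    )


def ann_query_sql(table: str = "items", column: str = "embedding",
                  k: int = TOP_K) -> str:
    """Top-k nearest neighbours; %s is a driver placeholder for the user vector."""
    return f"SELECT sku FROM {table} ORDER BY {column} <=> %s::vector LIMIT {k};"


print(create_index_sql())
print(ann_query_sql())
```

`<=>` is pgvector's cosine-distance operator, and an `ORDER BY ... LIMIT` query in this shape is what allows the planner to use the HNSW index.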
Edge serving
FastAPI batch ranker with ONNX item tower; Lambda@Edge caches personalized rails globally.
- homepage_hero
- pdp_fbt
- cart_upsell
Release governance
NDCG@12 and catalog-coverage gates run before every promotion. Auto-rollback triggers if CTR drops more than 3%. Traffic ramps in stages:
- 10%
- 50%
- 100%
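The gates and staged ramp can be expressed as a few small functions. This is a sketch under stated assumptions: DCG uses linear gain, the 3% CTR threshold is read as a relative drop against baseline, and a return value of 0 is a made-up convention for "roll back". None of these details are confirmed by the source.

```python
import math
from typing import Iterable, Sequence

STAGES = (10, 50, 100)  # traffic percentages from the rollout list above


def ndcg_at_k(relevances: Sequence[float], k: int = 12) -> float:
    """NDCG@k with linear gain; relevances are listed in ranked order."""
    def dcg(rels: Sequence[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def catalog_coverage(recommended_skus: Iterable[str], catalog_size: int) -> float:
    """Share of the catalog that appeared in at least one recommendation."""
    return len(set(recommended_skus)) / catalog_size


def next_stage(current: int, baseline_ctr: float, candidate_ctr: float,
               max_drop: float = 0.03) -> int:
    """Return the next traffic percentage, or 0 to signal rollback.
    Assumes the threshold is a relative drop versus the baseline CTR."""
    if baseline_ctr > 0 and (baseline_ctr - candidate_ctr) / baseline_ctr > max_drop:
        return 0
    idx = STAGES.index(current)
    return STAGES[min(idx + 1, len(STAGES) - 1)]


print(next_stage(10, baseline_ctr=0.050, candidate_ctr=0.049))  # 50
print(next_stage(50, baseline_ctr=0.050, candidate_ctr=0.048))  # 0
```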





