As a trusted advisor, leader, and collaborator, Rohit applies problem-resolution, analytical, and operational skills to every initiative, and he develops strategic requirements and solution analyses across every stage of the project life cycle, from product readiness through execution.
Rohit excels at designing scalable cloud microservice architectures with Spring Boot and Netflix OSS technologies on AWS and Google Cloud. As a Security Ninja, Rohit looks for ways to resolve application security vulnerabilities through ethical hacking and threat modeling. Rohit is excited about architecting cloud solutions with Docker, Redis, NGINX, RightScale, RabbitMQ, Apigee, Azul Zing, Actuate BIRT reporting, Chef, Splunk, REST Assured, SoapUI, Dynatrace, and EnterpriseDB. In addition, Rohit has developed lambda architecture solutions using Apache Spark, Cassandra, and Camel for real-time analytics and integration projects.
Rohit holds an MBA in Corporate Entrepreneurship from Babson College and a Master's in Computer Science from Boston University and Harvard University. He is a regular speaker at No Fluff Just Stuff, UberConf, RichWeb, GIDS, and other international conferences.
Rohit loves to connect at http://www.productivecloudinnovation.com, on LinkedIn at http://linkedin.com/in/rohit-bhardwaj-cloud, or on Twitter at rbhardwaj1.
Modern system design has entered a new era. It’s no longer enough to optimize for uptime and latency — today’s systems must also be AI-ready, token-efficient, trustworthy, and resilient. Whether building global-scale apps, powering recommendation engines, or integrating GenAI agents, architects need new skills and playbooks to design for scale, speed, and reliability.
This full-day workshop blends classic distributed systems knowledge with AI-native thinking. Through case studies, frameworks, and hands-on design sessions, you’ll learn to design systems that balance performance, cost, resilience, and truthfulness — and walk away with reusable templates you can apply to interviews and real-world architectures.
Target Audience
Enterprise & Cloud Architects → building large-scale, AI-ready systems.
Backend Engineers & Tech Leads → leveling up to system design mastery.
AI/ML & Data Engineers → extending beyond pipelines to full-stack AI systems.
FAANG & Big Tech Interview Candidates → preparing for system design interviews with an AI twist.
Engineering Managers & CTO-track Leaders → guiding teams through AI adoption.
Startup Founders & Builders → scaling AI products without burning money.
Learning Outcomes
By the end of the workshop, participants will be able to:
Apply a 7-step system design framework extended for AI workloads.
Design systems that scale for both requests and tokens.
Architect multi-provider failover and graceful degradation ladders.
Engineer RAG 2.0 pipelines with hybrid search, GraphRAG, and semantic caching.
Implement AI trust & security with guardrails, sandboxing, and red-teaming.
Build observability dashboards that track hallucination rate, drift, and token costs.
Reimagine real-world platforms (Uber, Netflix, Twitter, Instagram) with AI integration.
Practice mock interviews & chaos drills to defend trade-offs under pressure.
Take home reusable templates (AI System Design Canvas, RAG Checklist, Chaos Runbook).
Gain the confidence to lead AI-era system design in interviews, enterprises, or startups.
Workshop Agenda (Full-Day, 8 Hours)
Session 1 – Foundations of Modern System Design (60 min)
The new era: Why classic design is no longer enough.
Architecture KPIs in the AI age: latency, tokens, hallucination %, cost.
Group activity: brainstorm new KPIs.
Session 2 – Frameworks & Mindset (75 min)
The 7-Step System Design Framework (AI-extended).
Scaling for human users vs. scaling for tokens.
Token capacity planning exercise.
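Token capacity planning is, at its core, back-of-the-envelope arithmetic. The Python below is a minimal sketch, not the workshop's own exercise: the `Workload` fields, the per-replica throughput figure, and the 70% target utilization are illustrative assumptions. It also simplifies by weighting input and output tokens equally, although output (decode) tokens usually cost more per token than prompt (prefill) tokens.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    peak_requests_per_sec: float  # peak request rate across all users
    avg_input_tokens: int         # prompt + retrieved context per request
    avg_output_tokens: int        # generated tokens per request


def tokens_per_second(w: Workload) -> float:
    """Tokens (input + output) the fleet must process per second at peak.

    Simplifying assumption: input and output tokens are weighted equally.
    """
    return w.peak_requests_per_sec * (w.avg_input_tokens + w.avg_output_tokens)


def replicas_needed(w: Workload, tokens_per_sec_per_replica: float,
                    target_utilization: float = 0.7) -> int:
    """Smallest replica count that keeps each replica at or below target utilization."""
    usable_capacity = tokens_per_sec_per_replica * target_utilization
    return math.ceil(tokens_per_second(w) / usable_capacity)


peak = Workload(peak_requests_per_sec=50, avg_input_tokens=1500, avg_output_tokens=300)
print(tokens_per_second(peak))        # 90000 tokens/s
print(replicas_needed(peak, 20_000))  # 7, assuming 20k tokens/s per replica
```

Here 50 requests/s at 1,800 tokens each needs 90,000 tokens/s; at an assumed 20,000 tokens/s per replica capped at 70% utilization (14,000 usable), that rounds up to 7 replicas.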
Session 3 – Retrieval & Resilience (75 min)
RAG 2.0 patterns: chunking, hybrid retrieval, GraphRAG, semantic cache.
Multi-provider resilience + graceful degradation ladders.
Whiteboard lab: design a resilient RAG pipeline.
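One common way to implement the hybrid retrieval step of a RAG pipeline is Reciprocal Rank Fusion (RRF), which merges a keyword ranking (for example BM25) with a vector-similarity ranking without normalizing their raw scores. This is a sketch of RRF in general, not necessarily the fusion method covered in the session; the document IDs are hypothetical, and `k=60` is the constant commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1, so agreement across retrievers is rewarded.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from a BM25 index
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. from an ANN vector index
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents found by both retrievers (`doc_a`, `doc_c`) rise above documents found by only one, which is the behavior hybrid search is after.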
Session 4 – Security & Observability (60 min)
Threats: prompt injection, data exfiltration, abuse.
Guardrails, sandboxing, red-teaming.
Observability for LLMs: traces, cost dashboards, drift monitoring.
Activity: STRIDE threat-modeling for an LLM endpoint.
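A first guardrail layer for an LLM endpoint often screens input before it reaches the model. The sketch below is deliberately naive and assumes a regex deny-list: the patterns, the 8,000-character limit, and the e-mail redaction rule are all hypothetical. Pattern matching alone is easy to bypass, so this is a starting point for the STRIDE discussion, not a complete control.

```python
import re
from dataclasses import dataclass

# Hypothetical deny-list; real deployments layer classifiers, allow-lists,
# and output checks on top of (or instead of) regexes like these.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
MAX_PROMPT_CHARS = 8_000  # hypothetical abuse / cost limit


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str
    sanitized: str


def screen_prompt(prompt: str) -> GuardrailResult:
    """Reject oversized or injection-like prompts; redact e-mails from the rest."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return GuardrailResult(False, "prompt too long", "")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            return GuardrailResult(False, "possible prompt injection", "")
    # Redact e-mail addresses before the prompt is sent or logged,
    # narrowing one path for data exfiltration.
    sanitized = EMAIL_PATTERN.sub("[REDACTED_EMAIL]", prompt)
    return GuardrailResult(True, "ok", sanitized)
```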
Session 5 – Real-World System Patterns (90 min)
Uber, Netflix, Instagram, Twitter, Search, Fraud detection, Chatbot.
AI-enhanced vs classic system designs.
Breakout lab: redesign a system with AI augmentation.
Session 6 – Interviews & Chaos Drills (75 min)
Mock interview challenges: travel assistant, vector store sharding.
Peer review of trade-offs, diagrams, storytelling.
Chaos drills: provider outage, token overruns, fallback runbooks.
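A provider-outage drill exercises the degradation ladder: primary model, then a secondary provider, then a cached answer, then a static message. The Python below is a minimal sketch assuming each provider is wrapped in a callable that raises a hypothetical `ProviderError` on failure or timeout; the provider names and stub functions are illustrative, not real SDK calls, and the cache is an exact-match dictionary rather than a semantic cache.

```python
from typing import Callable, Optional

Provider = Callable[[str], str]


class ProviderError(Exception):
    """Hypothetical error raised by a provider adapter on failure or timeout."""


def answer_with_fallback(prompt: str,
                         providers: list[tuple[str, Provider]],
                         cache: dict[str, str],
                         static_message: str = "Assistant temporarily unavailable.") -> tuple[str, str]:
    """Walk the degradation ladder and return (tier_name, answer).

    Rungs in order: each provider by priority, then a cached answer for the
    same prompt, then a static message.
    """
    for name, call in providers:
        try:
            answer = call(prompt)
        except ProviderError:
            continue  # drop to the next rung
        cache[prompt] = answer  # refresh the cache on success
        return name, answer
    cached: Optional[str] = cache.get(prompt)
    if cached is not None:
        return "cache", cached
    return "static", static_message


# Hypothetical stubs used to rehearse an outage drill.
def primary_down(prompt: str) -> str:
    raise ProviderError("primary timed out")


def secondary_up(prompt: str) -> str:
    return f"secondary answer to: {prompt}"
```

Returning the tier name alongside the answer lets a runbook (and a dashboard) record how far down the ladder each request fell during the drill.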
Closing (15 min)
Recap: 3 secrets (Scaling tokens, RAG as index, Resilient degradation).
Templates & takeaways: AI System Design Canvas, RAG Checklist, Chaos Runbook.
Q&A + networking.
Takeaways for Participants
AI System Design Canvas (framework for interviews & real-world reviews).
RAG 2.0 Checklist (end-to-end retrieval playbook).
Chaos Runbook Template (resilience drill starter kit).
AI SLO Dashboard template for observability + FinOps.
Confidence to design and defend AI-ready architectures in both career and enterprise contexts.
AI inference is the new production workload — always on, cost-intensive, and increasingly complex. Many teams face latency spikes at P99, runaway GPU bills, and limited observability across their agentic and RAG pipelines.
This session delivers practical, vendor-aware patterns for reliable and sustainable inference at scale.
You’ll explore queueing, caching, GPU pooling, FinOps, and GreenOps strategies grounded in the Google Cloud AI/ML Well-Architected Framework, Azure AI Workload Guidance, and the Databricks Lakehouse Principles — enabling you to build inference systems that are performant, efficient, and planet-friendly.
Problems Solved
What You’ll Learn
Agenda
Opening & Context: Why inference reliability, observability, and sustainability define the next stage of enterprise AI.
Case study: A RAG system suffering from unpredictable latency and GPU overspend — what went wrong and what patterns fix it.
Pattern 1: Reliable Inference Flow Design. Engineering for bursty demand. Patterns: async queues, back-pressure controls, serverless triggers, GPU pooling vs. autoscaling, and caching strategies for RAG.
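The async-queue and back-pressure ideas in Pattern 1 can be sketched with Python's `asyncio`: a bounded queue sits in front of a fixed worker pool (standing in for a pool of GPU-backed model servers), and requests are shed when the queue is full instead of waiting indefinitely. `InferenceQueue`, `fake_infer`, and the queue and pool sizes are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class InferenceQueue:
    """Bounded queue in front of a fixed pool of inference workers.

    When the queue is full, submit() rejects immediately (load shedding)
    instead of letting queueing delay, and therefore P99 latency, grow unbounded.
    """

    def __init__(self, max_pending: int, workers: int) -> None:
        self.queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
        self.workers = workers
        self.rejected = 0

    def submit(self, prompt: str) -> Optional[asyncio.Future]:
        fut = asyncio.get_running_loop().create_future()
        try:
            self.queue.put_nowait((prompt, fut))
        except asyncio.QueueFull:
            self.rejected += 1
            return None  # caller might map this to HTTP 429 + Retry-After
        return fut

    async def _worker(self, infer: Callable[[str], Awaitable[str]]) -> None:
        while True:
            prompt, fut = await self.queue.get()
            try:
                result = await infer(prompt)
                if not fut.done():
                    fut.set_result(result)
            except Exception as exc:  # surface model errors to the caller
                if not fut.done():
                    fut.set_exception(exc)
            finally:
                self.queue.task_done()

    def start(self, infer: Callable[[str], Awaitable[str]]) -> list:
        return [asyncio.create_task(self._worker(infer)) for _ in range(self.workers)]


async def fake_infer(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a model call
    return prompt.upper()


async def demo() -> tuple:
    q = InferenceQueue(max_pending=2, workers=1)
    # A burst of 5 requests arrives before any worker has drained the queue.
    futures = [q.submit(f"p{i}") for i in range(5)]
    workers = q.start(fake_infer)
    results = [await f for f in futures if f is not None]
    for w in workers:
        w.cancel()
    return results, q.rejected


print(asyncio.run(demo()))  # (['P0', 'P1'], 3)
```

Rejecting early keeps accepted requests fast; whether to shed, queue longer, or scale out is exactly the pooling-vs.-autoscaling trade-off the pattern discusses.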
Pattern 2: Observability & Instrumentation. Full-stack tracing for inference workloads: prompt-level metrics, vector query instrumentation, GPU telemetry, OpenTelemetry integration, and structured prompt logging.
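Structured prompt logging and latency tracking from Pattern 2 can be sketched with the standard library alone. This is a stand-in for OpenTelemetry, not its API: in a real deployment these fields would typically become span attributes exported by the OpenTelemetry SDK, whereas here the hypothetical `traced_inference` appends them to an in-memory list and prints JSON. Hashing the prompt instead of storing it verbatim is one possible privacy choice, not a requirement. The nearest-rank `percentile` helper shows how a P99 figure is derived from raw latency samples.

```python
import hashlib
import json
import math
import time
from contextlib import contextmanager

LOG_RECORDS: list[dict] = []  # stand-in for a log sink or trace exporter


@contextmanager
def traced_inference(model: str, prompt: str):
    """Time one inference call and emit a structured log record.

    The prompt is stored as a SHA-256 hash so records can be correlated
    without retaining raw user text. Callers fill in token counts.
    """
    record = {
        "model": model,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "input_tokens": None,
        "output_tokens": None,
    }
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["latency_ms"] = (time.perf_counter() - start) * 1000
        LOG_RECORDS.append(record)
        print(json.dumps(record))


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


with traced_inference("example-model", "What is RAG?") as rec:
    # Call the model here; token counts usually come from the response.
    rec["input_tokens"], rec["output_tokens"] = 5, 42

print(percentile([120.0, 95.0, 310.0, 101.0, 2400.0], 99))  # 2400.0
```

With only five samples, P99 is simply the slowest request, a reminder that tail-latency figures need enough traffic behind them to be meaningful.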
Pattern 3: FinOps for AI. Controlling inference cost without losing reliability: cost attribution, tagging GPU workloads, quantization and model distillation, choosing preemptible/spot instances, and cross-cloud FinOps tooling (GCP Recommender, Azure Advisor, Databricks Cost Profiler).
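Cost attribution in Pattern 3 relies on tagging each GPU workload with an owner. The sketch below assumes one `team` tag per usage record and uses made-up hourly prices for on-demand and spot capacity; real rates come from your cloud provider's billing data, and the record shape is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GpuUsage:
    team: str          # cost-allocation tag on the workload
    gpu_hours: float
    spot: bool         # preemptible/spot capacity vs. on-demand


# Hypothetical hourly prices for illustration only.
ON_DEMAND_PER_GPU_HOUR = 4.00
SPOT_PER_GPU_HOUR = 1.20


def cost_by_team(usage: list[GpuUsage]) -> dict[str, float]:
    """Attribute GPU spend to teams using each workload's cost tag."""
    totals: dict[str, float] = defaultdict(float)
    for u in usage:
        rate = SPOT_PER_GPU_HOUR if u.spot else ON_DEMAND_PER_GPU_HOUR
        totals[u.team] += u.gpu_hours * rate
    return dict(totals)


usage = [
    GpuUsage(team="search", gpu_hours=10, spot=False),
    GpuUsage(team="search", gpu_hours=20, spot=True),
    GpuUsage(team="chat", gpu_hours=5, spot=False),
]
print({team: round(cost, 2) for team, cost in cost_by_team(usage).items()})
# {'search': 64.0, 'chat': 20.0}
```

Because spot capacity can be reclaimed, a design like this would usually route only interruptible work (batch embedding, offline evaluation) to spot, keeping latency-sensitive inference on on-demand capacity.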
Pattern 4: GreenOps & Sustainability. Reducing the environmental footprint of AI pipelines: SCI (Software Carbon Intensity) metrics, carbon-aware scheduling, time-shifting inference jobs, and sustainable scaling practices.
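The Green Software Foundation's SCI specification defines Software Carbon Intensity as ((E × I) + M) per R, where E is energy consumed, I is the carbon intensity of the electricity used, M is the embodied hardware emissions allocated to the software, and R is the functional unit (for example, one inference). The sketch computes SCI and, for carbon-aware time-shifting, picks the start slot with the lowest forecast intensity for a deferrable batch job. The energy figures, forecast values, and slot labels are hypothetical.

```python
def sci_per_unit(energy_kwh: float, grid_gco2e_per_kwh: float,
                 embodied_gco2e: float, functional_units: int) -> float:
    """SCI = ((E * I) + M) per R, returned in gCO2e per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    return (energy_kwh * grid_gco2e_per_kwh + embodied_gco2e) / functional_units


def greenest_start(forecast_gco2e_per_kwh: dict[str, float], allowed_slots: list[str]) -> str:
    """Carbon-aware time-shifting: choose the allowed slot with the lowest forecast intensity."""
    return min(allowed_slots, key=lambda slot: forecast_gco2e_per_kwh[slot])


# Hypothetical numbers for one batch of 10,000 inferences.
print(sci_per_unit(energy_kwh=2.0, grid_gco2e_per_kwh=400, embodied_gco2e=200,
                   functional_units=10_000))  # 0.1 gCO2e per inference

# Hypothetical grid forecast; the batch job may start in any of the first three slots.
forecast = {"00:00": 180.0, "06:00": 320.0, "12:00": 150.0, "18:00": 410.0}
print(greenest_start(forecast, ["00:00", "06:00", "12:00"]))  # 12:00
```

Time-shifting only applies to work that tolerates delay; interactive inference usually has to run where and when the user asks.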
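Cross-Cloud Well-Architected Anchors. Mapping patterns to major frameworks: the Google Cloud AI/ML Well-Architected Framework, Azure AI Workload Guidance, and the Databricks Lakehouse Principles.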
Wrap-Up & Discussion. Recap of proven design patterns, a FinOps + GreenOps checklist, and architectural recommendations for enterprise AI teams.
Key Framework References
Takeaways
Dynamic Programming (DP) intimidates even seasoned engineers. With the right lens, it’s just optimal substructure + overlapping subproblems turned into code. In this talk, we start from a brute-force recursive baseline, surface the recurrence, convert it to memoization and tabulation, and connect it to real systems (resource allocation, routing, caching). Along the way you’ll see how to use AI tools (ChatGPT, Copilot) to propose recurrences, generate edge cases, and draft tests—while you retain ownership of correctness and complexity. Expect pragmatic patterns you can reuse in interviews and production.
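The brute-force → memoization → tabulation progression can be sketched on 0/1 knapsack, a classic resource-allocation problem. Choosing this problem is an assumption, since the talk does not name its examples. The recurrence is best(i, cap) = max(best(i + 1, cap), values[i] + best(i + 1, cap − weights[i])), with the second option allowed only when weights[i] ≤ cap.

```python
from functools import lru_cache


def knapsack_brute(weights: list[int], values: list[int], capacity: int) -> int:
    """Exponential baseline: for each item, try skipping it or taking it."""
    def best(i: int, cap: int) -> int:
        if i == len(weights):
            return 0
        skip = best(i + 1, cap)
        if weights[i] > cap:
            return skip
        return max(skip, values[i] + best(i + 1, cap - weights[i]))
    return best(0, capacity)


def knapsack_memo(weights: list[int], values: list[int], capacity: int) -> int:
    """Same recurrence; caching on (i, cap) removes the overlapping subproblems."""
    @lru_cache(maxsize=None)
    def best(i: int, cap: int) -> int:
        if i == len(weights):
            return 0
        skip = best(i + 1, cap)
        if weights[i] > cap:
            return skip
        return max(skip, values[i] + best(i + 1, cap - weights[i]))
    return best(0, capacity)


def knapsack_table(weights: list[int], values: list[int], capacity: int) -> int:
    """Bottom-up tabulation in O(n * capacity) time and O(capacity) space."""
    dp = [0] * (capacity + 1)  # dp[c] = best value within capacity c so far
    for w, v in zip(weights, values):
        # Iterate capacity downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]


weights, values = [1, 3, 4, 5], [1, 4, 5, 7]
print(knapsack_brute(weights, values, 7), knapsack_memo(weights, values, 7),
      knapsack_table(weights, values, 7))  # 9 9 9
```

All three agree (take the items of weight 3 and 4); the brute-force version is a useful oracle for checking the faster ones, including against AI-generated edge cases.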
Why Now
Key Framework
Core Content
Learning Outcomes