Rohit Bhardwaj

Director of Architecture, Expert in cloud-native solutions

Rohit Bhardwaj is a Director of Architecture working at Salesforce. Rohit has extensive experience architecting multi-tenant cloud-native solutions in Resilient Microservices Service-Oriented architectures using AWS Stack. In addition, Rohit has a proven ability in designing solutions and executing and delivering transformational programs that reduce costs and increase efficiencies.

As a trusted advisor, leader, and collaborator, Rohit applies problem resolution, analytical, and operational skills to all initiatives and develops strategic requirements and solution analysis through all stages of the project life cycle and product readiness to execution.
Rohit excels in designing scalable cloud microservice architectures using Spring Boot and Netflix OSS technologies using AWS and Google clouds. As a Security Ninja, Rohit looks for ways to resolve application security vulnerabilities using ethical hacking and threat modeling. Rohit is excited about architecting cloud technologies using Dockers, REDIS, NGINX, RightScale, RabbitMQ, Apigee, Azul Zing, Actuate BIRT reporting, Chef, Splunk, Rest-Assured, SoapUI, Dynatrace, and EnterpriseDB. In addition, Rohit has developed lambda architecture solutions using Apache Spark, Cassandra, and Camel for real-time analytics and integration projects.

Rohit has done MBA from Babson College in Corporate Entrepreneurship, Masters in Computer Science from Boston University and Harvard University. Rohit is a regular speaker at No Fluff Just Stuff, UberConf, RichWeb, GIDS, and other international conferences.

Rohit loves to connect on http://www.productivecloudinnovation.com.
http://linkedin.com/in/rohit-bhardwaj-cloud or using Twitter at rbhardwaj1.

Presentations

System Design AI Mastery: Architecting for Scale, Speed, Reliability - Full Day

A hands-on deep dive into building cloud-native, AI-augmented systems

Modern system design has entered a new era. It’s no longer enough to optimize for uptime and latency — today’s systems must also be AI-ready, token-efficient, trustworthy, and resilient. Whether building global-scale apps, powering recommendation engines, or integrating GenAI agents, architects need new skills and playbooks to design for scale, speed, and reliability.

This full-day workshop blends classic distributed systems knowledge with AI-native thinking. Through case studies, frameworks, and hands-on design sessions, you’ll learn to design systems that balance performance, cost, resilience, and truthfulness — and walk away with reusable templates you can apply to interviews and real-world architectures.

Target Audience

Enterprise & Cloud Architects → building large-scale, AI-ready systems.

Backend Engineers & Tech Leads → leveling up to system design mastery.

AI/ML & Data Engineers → extending beyond pipelines to full-stack AI systems.

FAANG & Big Tech Interview Candidates → preparing for system design interviews with an AI twist.

Engineering Managers & CTO-track Leaders → guiding teams through AI adoption.

Startup Founders & Builders → scaling AI products without burning money.

Learning Outcomes

By the end of the workshop, participants will be able to:

Apply a 7-step system design framework extended for AI workloads.

Design systems that scale for both requests and tokens.

Architect multi-provider failover and graceful degradation ladders.

Engineer RAG 2.0 pipelines with hybrid search, GraphRAG, and semantic caching.

Implement AI trust & security with guardrails, sandboxing, and red-teaming.

Build observability dashboards for hallucination %, drift, token costs.

Reimagine real-world platforms (Uber, Netflix, Twitter, Instagram) with AI integration.

Practice mock interviews & chaos drills to defend trade-offs under pressure.

Take home reusable templates (AI System Design Canvas, RAG Checklist, Chaos Runbook).

Gain the confidence to lead AI-era system design in interviews, enterprises, or startups.

Workshop Agenda (Full-Day, 8 Hours)
Session 1 – Foundations of Modern System Design (60 min)

The new era: Why classic design is no longer enough.

Architecture KPIs in the AI age: latency, tokens, hallucination %, cost.

Group activity: brainstorm new KPIs.

Session 2 – Frameworks & Mindset (75 min)

The 7-Step System Design Framework (AI-extended).

Scaling humans vs tokens.

Token capacity planning exercise.

Session 3 – Retrieval & Resilience (75 min)

RAG 2.0 patterns: chunking, hybrid retrieval, GraphRAG, semantic cache.

Multi-provider resilience + graceful degradation ladders.

Whiteboard lab: design a resilient RAG pipeline.

Session 4 – Security & Observability (60 min)

Threats: prompt injection, data exfiltration, abuse.

Guardrails, sandboxing, red-teaming.

Observability for LLMs: traces, cost dashboards, drift monitoring.

Activity: STRIDE threat-modeling for an LLM endpoint.

Session 5 – Real-World System Patterns (90 min)

Uber, Netflix, Instagram, Twitter, Search, Fraud detection, Chatbot.

AI-enhanced vs classic system designs.

Breakout lab: redesign a system with AI augmentation.

Session 6 – Interviews & Chaos Drills (75 min)

Mock interview challenges: travel assistant, vector store sharding.

Peer review of trade-offs, diagrams, storytelling.

Chaos drills: provider outage, token overruns, fallback runbooks.

Closing (15 min)

Recap: 3 secrets (Scaling tokens, RAG as index, Resilient degradation).

Templates & takeaways: AI System Design Canvas, RAG Checklist, Chaos Runbook.

Q&A + networking.

Takeaways for Participants

AI System Design Canvas (framework for interviews & real-world reviews).

RAG 2.0 Checklist (end-to-end retrieval playbook).

Chaos Runbook Template (resilience drill starter kit).

AI SLO Dashboard template for observability + FinOps.

Confidence to design and defend AI-ready architectures in both career and enterprise contexts.

AI Inference at Scale: Reliability, Observability, Cost & Sustainability

Proven patterns to keep P99 low, bills sane, and carbon down for Cross-Cloud

AI inference is the new production workload — always on, cost-intensive, and increasingly complex. Many teams face latency spikes at P99, runaway GPU bills, and limited observability across their agentic and RAG pipelines.
This session delivers practical, vendor-aware patterns for reliable and sustainable inference at scale.
You’ll explore queueing, caching, GPU pooling, FinOps, and GreenOps strategies grounded in the Google Cloud AI/ML Well-Architected Framework, Azure AI Workload Guidance, and the Databricks Lakehouse Principles — enabling you to build inference systems that are performant, efficient, and planet-friendly.

Problems Solved

  • Latency spikes at P95/P99 under bursty inference workloads
  • Runaway GPU/TPU costs and inefficient utilization
  • Lack of observability in multi-agent and vector retrieval pipelines
  • Cache inefficiency and poor vector store tuning
  • Unmeasured energy and carbon footprint in AI workloads

What You’ll Learn

  • When to use serverless triggers, async queues, or GPU pooling
  • How to instrument prompts, vector queries, and GPU utilization end-to-end
  • FinOps guardrails: cost attribution, right-sizing, and preemptible instances
  • GreenOps practices: SCI metrics, time-/region-aware scaling, energy optimization
  • How to map reliability and sustainability principles across GCP, Azure, and Databricks

Agenda
Opening & Context
: Why inference reliability, observability, and sustainability define the next stage of enterprise AI.
Case study: A RAG system suffering from unpredictable latency and GPU overspend — what went wrong and what patterns fix it.

Pattern 1: Reliable Inference Flow Design
Engineering for bursty demand.
Patterns: async queues, back-pressure controls, serverless triggers, GPU pooling vs autoscaling, and caching strategies for RAG.

Pattern 2: Observability & Instrumentation
Full-stack tracing for inference workloads.
Prompt-level metrics, vector query instrumentation, GPU telemetry, OpenTelemetry integration, and structured prompt logging.

Pattern 3: FinOps for AI
Controlling inference cost without losing reliability.
Cost attribution, tagging GPU workloads, quantization and model distillation, choosing preemptible/spot instances, and cross-cloud FinOps tooling (GCP Recommender, Azure Advisor, Databricks Cost Profiler).

Pattern 4: GreenOps & Sustainability
, Reducing the environmental footprint of AI pipelines.
SCI (Software Carbon Intensity) metrics, carbon-aware scheduling, time-shifting inference jobs, and sustainable scaling practices.

Cross-Cloud Well-Architected Anchors
Mapping patterns to major frameworks:

  • Google Cloud AI/ML Well-Architected Framework (Reliability, Cost, Sustainability)
  • Azure AI Workload Guidance and sustainability assessment tools
  • Databricks Lakehouse Well-Architected Principles (governance, performance, sustainability trade-offs)

Wrap-Up & Discussion
Recap of proven design patterns, FinOps + GreenOps checklist, and architectural recommendations for enterprise AI teams.

Key Framework References

  • Google Cloud: AI/ML Well-Architected Framework (Reliability, Cost, Sustainability)
  • Azure: Well-Architected for AI Workloads + Sustainability Tools
  • Databricks: Seven-Pillar Lakehouse Principles
  • FinOps Foundation: AI/ML Cost Allocation & Efficiency Models
  • Green Software Foundation: SCI (Software Carbon Intensity) Metrics

Takeaways

  • Cross-Cloud Inference Pattern Playbook
  • FinOps & GreenOps Implementation Checklist
  • Observability and Instrumentation Reference Map for AI pipelines

Dynamic Programming Demystified: How AI Helps You See the Pattern

From brute force → recurrence → memoization → tabulation with AI as accelerator

Dynamic Programming (DP) intimidates even seasoned engineers. With the right lens, it’s just optimal substructure + overlapping subproblems turned into code. In this talk, we start from a brute-force recursive baseline, surface the recurrence, convert it to memoization and tabulation, and connect it to real systems (resource allocation, routing, caching). Along the way you’ll see how to use AI tools (ChatGPT, Copilot) to propose recurrences, generate edge cases, and draft tests—while you retain ownership of correctness and complexity. Expect pragmatic patterns you can reuse in interviews and production.

Why Now

  • DP = #1 fear topic in interviews.
  • Used in systems: caching, routing, scheduling.
  • 55% faster with Copilot, but needs guardrails.
  • AI adoption is surging — structure required.

Key Framework

  • Find optimal substructure.
  • Spot overlapping subproblems.
  • Start brute force → derive recurrence.
  • Memoization → tabulation.
  • Compare vs. greedy & divide-and-conquer.
  • Use AI for tests & recurrences, not correctness.

Core Content

  • Coin Change: brute force → DP; greedy fails in non-canonical coins.
  • 0/1 Knapsack: DP works, greedy fails; fractional knapsack = greedy.
  • LIS: O(n²) DP vs. O(n log n) patience method.
  • Graphs: shortest path as DP on DAGs.
  • AI Demos: recurrence suggestion, edge-case generation.

Learning Outcomes

  • Know when a problem is DP-worthy.
  • Build recurrence → memoization → tabulation.
  • Decide Greedy vs DP confidently.
  • Apply AI prompts safely (tests, refactors).
  • Map DP to real-world systems.