AI System Debugging Playbook
Debugging AI applications — tracing issues, reproducing failures, and systematic root cause analysis.
Executive Summary
Let me tell you about Priya. She is a tech lead at a mid-size company in Pune. Her boss asked her to build an AI system for their customer support team. Priya is smart — she knows Python, she has used APIs before. But she had never built a production AI system.
She spent three months building something that worked great in demos but fell apart with real customers. The AI gave wrong answers. It was slow. It cost too much. Sound familiar? This playbook exists so you do not have to learn these lessons the hard way like Priya did. Every step here comes from real mistakes and real successes.
The Problem This Playbook Solves
Imagine you are building a house. Would you start by painting the walls? Of course not — you would start with the foundation, then the walls, then the roof, and finally the paint. But in AI projects, I see teams "painting walls" all the time. They jump to the exciting parts (choosing models, writing prompts) and skip the boring-but-critical parts (understanding the problem, preparing data, setting up evaluation).
This playbook gives you the right order. Each step builds on the previous one. Skip a step, and you will have to come back to it later — except now it will cost 10x more time and money to fix. Trust the process, follow the steps, and you will build something that actually works in the real world.
Step 1: Understanding Your Problem (Do Not Skip This!)
I know, I know — you want to start coding. But hear me out. The number one reason AI projects fail is not bad code or wrong models. It is solving the wrong problem, or solving the right problem in the wrong way.
Let me give you a real example. A food delivery company wanted to use AI to "improve customer experience." That is too vague. After digging deeper, they found the real problem: customers were calling support to ask "Where is my order?" 5,000 times a day. Now THAT is a specific problem you can solve with AI.
Before you write a single line of code, answer these questions:
- What specific problem are you solving? — Not "use AI for customer support" but "automatically answer order status questions so support agents can handle complex issues"
- How do you measure success? — "Reduce order status calls by 70% within 3 months"
- What data do you have? — "We have 50,000 past support conversations and our order tracking database"
- What is your budget? — "Rs 50,000 per month for AI infrastructure"
- Who will use this? — "Customers via WhatsApp and our website chat widget"
Write these answers down. Seriously. Pin them on your wall. Every decision you make from now on should be checked against these answers. If something does not help you achieve your specific goal, do not do it.
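If it helps to make this concrete, the answers above can be captured as a small config that your code checks decisions against. This is only an illustrative sketch; the field names and the `within_budget` helper are my own, not part of any framework:

```python
# A hypothetical "project charter" config - field names are illustrative.
# Every feature request and design choice gets checked against these answers.
PROJECT_CHARTER = {
    "problem": "Automatically answer order status questions",
    "success_metric": "Reduce order status calls by 70% within 3 months",
    "data_sources": ["past support conversations", "order tracking database"],
    "monthly_budget_inr": 50_000,
    "channels": ["WhatsApp", "website chat widget"],
}

def within_budget(projected_monthly_cost_inr: float) -> bool:
    """Reject any design whose projected spend blows the charter budget."""
    return projected_monthly_cost_inr <= PROJECT_CHARTER["monthly_budget_inr"]
```

A charter that lives in code instead of a slide deck gets checked on every deploy, not just in the kickoff meeting.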
Step 2: Writing Your First Working Code
Alright, time to get our hands dirty with real code! I am going to walk you through building a complete, working system from scratch. Every single line is explained — no magic, no "just trust me" moments.
Think of this code like a recipe. I will tell you what each ingredient does and why we are adding it. By the end, you will understand not just WHAT the code does, but WHY it does it that way.
```python
# AI System Debugging Playbook - Complete Working Example
# You can copy this entire file and run it!
import json
import time
import hashlib
from datetime import datetime


# ── The Foundation: Your Data Handler ──
# Think of this like organizing your desk before starting work.
# A clean desk = productive work. Clean data = good AI results.
class DataHandler:
    """Handles all data operations for your AI debugging system.

    Real-world analogy: This is like a librarian.
    - Organizes books (data) on shelves (storage)
    - Finds the right book when you ask (retrieval)
    - Keeps track of what is borrowed (logging)
    """

    def __init__(self, data_path="./data"):
        self.data_path = data_path
        self.cache = {}  # In-memory cache for speed
        self.stats = {"reads": 0, "writes": 0, "cache_hits": 0}

    def save(self, key, data):
        """Save data with a unique key.
        Like putting a labeled box on a shelf."""
        self.cache[key] = {
            "data": data,
            "saved_at": datetime.now().isoformat(),
            "checksum": hashlib.md5(json.dumps(data, default=str).encode()).hexdigest(),
        }
        self.stats["writes"] += 1
        return True

    def load(self, key):
        """Load data by key. Check cache first (faster!).
        Like checking your pocket before going to the shelf."""
        if key in self.cache:
            self.stats["cache_hits"] += 1
            return self.cache[key]["data"]
        self.stats["reads"] += 1
        return None  # Not found

    def get_stats(self):
        """How efficient is our data handling?"""
        total = self.stats["reads"] + self.stats["cache_hits"]
        hit_rate = (self.stats["cache_hits"] / max(total, 1)) * 100
        return {
            "total_operations": self.stats["reads"] + self.stats["writes"] + self.stats["cache_hits"],
            "cache_hit_rate": f"{hit_rate:.1f}%",
            "money_saved_by_cache": f"Rs {self.stats['cache_hits'] * 0.05:.2f}",
        }


# ── The Brain: Your AI Processor ──
# This is where the magic happens!
class AIProcessor:
    """Processes requests using AI with smart optimizations.

    Real-world analogy: This is like a smart assistant.
    - Understands what you need (input processing)
    - Finds the best way to help (model selection)
    - Gives you a clear answer (output formatting)
    - Remembers common questions (caching)
    """

    def __init__(self, data_handler):
        self.data = data_handler
        self.request_log = []
        self.daily_cost = 0
        self.daily_limit_inr = 1000  # Rs 1000 per day max

    def process(self, query, context=None):
        """Process a query with full tracking.

        Steps (like making chai):
        1. Boil water (prepare the query)
        2. Add tea leaves (add context)
        3. Add milk and sugar (format nicely)
        4. Strain and serve (validate and return)
        """
        start_time = time.time()
        # Check daily budget
        if self.daily_cost >= self.daily_limit_inr:
            return self._error(f"Daily budget of Rs {self.daily_limit_inr} reached!")
        # Check cache - maybe we answered this before?
        cache_key = hashlib.md5(query.encode()).hexdigest()
        cached = self.data.load(cache_key)
        if cached:
            return {**cached, "from_cache": True, "cost_inr": 0}
        # Process the query
        try:
            result = self._generate_response(query, context)
            cost = self._estimate_cost(query, result)
            self.daily_cost += cost
            # Save to cache for next time
            response = {
                "answer": result,
                "confidence": 0.85,
                "cost_inr": round(cost, 4),
                "latency_ms": round((time.time() - start_time) * 1000),
                "from_cache": False,
            }
            self.data.save(cache_key, response)
            # Log for analysis
            self.request_log.append({
                "query": query[:100],
                "cost": cost,
                "time": datetime.now().isoformat(),
            })
            return response
        except Exception as e:
            return self._error(f"Processing failed: {str(e)}")

    def _generate_response(self, query, context):
        """Generate AI response. Replace with your actual AI call."""
        # This is where you plug in OpenAI, Anthropic, or a local model
        return f"Processed: {query[:80]}"

    def _estimate_cost(self, query, result):
        """Estimate cost in INR."""
        tokens = (len(query.split()) + len(str(result).split())) * 1.3
        return tokens / 1000 * 0.01  # Rs 0.01 per 1K tokens

    def _error(self, message):
        return {"error": message, "cost_inr": 0}

    def daily_report(self):
        """Generate a daily report. Share this with your team!"""
        if not self.request_log:
            return {"status": "No requests processed today."}
        total_requests = len(self.request_log)
        total_cost = sum(r["cost"] for r in self.request_log)
        return {
            "date": datetime.now().strftime("%d %B %Y"),
            "total_requests": total_requests,
            "total_cost": f"Rs {total_cost:.2f}",
            "avg_cost_per_request": f"Rs {total_cost/total_requests:.4f}",
            "projected_monthly_cost": f"Rs {total_cost * 30:,.2f}",
            "budget_status": "Within limits" if total_cost < self.daily_limit_inr else "OVER BUDGET!",
        }


# ── Run the complete system ──
if __name__ == "__main__":
    data = DataHandler()
    ai = AIProcessor(data)
    # Simulate real usage
    queries = [
        "What is our return policy?",
        "How to track my order?",
        "What is our return policy?",  # Same query - should hit cache!
        "I need a refund for order #12345",
    ]
    print("=== Processing Queries ===")
    for q in queries:
        result = ai.process(q)
        cached = "CACHED" if result.get("from_cache") else "NEW"
        cost = result.get("cost_inr", 0)
        print(f"  [{cached}] {q[:40]}... -> Cost: Rs {cost}")
    print("\n=== Daily Report ===")
    report = ai.daily_report()
    for k, v in report.items():
        print(f"  {k}: {v}")
    print("\n=== Data Handler Stats ===")
    for k, v in data.get_stats().items():
        print(f"  {k}: {v}")
```

Let me break down what this code does in plain language:
The DataHandler is like a librarian. When you need information, you first check if it is in your pocket (cache). If yes, great — that is free and instant! If not, you go to the shelf (storage) and get it. Every time you find something, you keep a copy in your pocket for next time. This simple trick can save 30-50% of your AI costs.
The AIProcessor is the brain of the system. When a request comes in, it first checks the budget (are we still within our daily Rs 1,000 limit?), then checks the cache (have we answered this exact question before?), and only then calls the expensive AI model. After getting the answer, it saves it to cache and logs everything.
Notice the daily_report function at the end. This is incredibly important. It tells you exactly how much you are spending, how many requests you are handling, and whether you are within budget. In Indian companies, being able to show your manager a clear cost report is the difference between "keep going" and "shut it down."
The most beautiful part? When we run the same query twice ("What is our return policy?"), the second time it comes from cache — zero cost, instant response. In a real system handling thousands of queries, this saves lakhs of rupees.
Step 3: Testing Your System (The Part Everyone Skips)
Here is a secret that experienced engineers know: the testing phase is where good AI systems become great ones. And it is the phase that most teams skip because it feels "boring." But skipping testing is like skipping the brake test on a new car. Everything seems fine... until it is not.
Let me show you how to test your AI debugging system properly. I promise to make it as painless as possible.
- Happy path testing — Does it work when everything is perfect? Give it clean, clear inputs and check if the outputs make sense. This is like testing if your car starts and drives forward.
- Edge case testing — What happens with weird inputs? Empty strings, very long text, special characters, Hindi text mixed with English. This is like testing if your car handles potholes and speed bumps.
- Load testing — Can it handle many requests at once? If 100 users ask questions simultaneously, does it crash or slow down? This is like testing if your car works in Bangalore traffic, not just on an empty highway.
- Cost testing — Run 1,000 sample requests and check the total cost. Multiply by your expected daily volume. Is it within budget? Many teams discover their system costs 10x more than expected only after launch.
- Failure testing — What happens when the AI model is down? When the internet is slow? When the database is full? Your system should fail gracefully, not crash spectacularly.
Create a simple test file with 50-100 test cases. Include normal questions, tricky questions, questions in Hindi, very long questions, and completely irrelevant questions. Run these tests every time you make a change. This takes 5 minutes and can save you from embarrassing failures in production.
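Here is a minimal sketch of what such a test file could look like. It assumes a `process(query)` function shaped like the `AIProcessor` above; here it is stubbed out so the file runs on its own, so swap the stub for your real processor when wiring it up:

```python
import hashlib

# Stub processor so this test file runs standalone - replace with your
# real AIProcessor.process() when wiring it up.
_cache = {}

def process(query: str) -> dict:
    key = hashlib.md5(query.encode()).hexdigest()
    if key in _cache:
        return {**_cache[key], "from_cache": True, "cost_inr": 0}
    response = {"answer": f"Processed: {query[:80]}", "cost_inr": 0.01, "from_cache": False}
    _cache[key] = response
    return response

# A few of the 50-100 cases you should collect: happy path, edge cases,
# mixed-language input, and hostile special characters.
TEST_CASES = [
    "What is our return policy?",   # happy path
    "",                             # empty input
    "a" * 10_000,                   # very long input
    "मेरा order kahan hai?",         # Hindi mixed with English
    "'; DROP TABLE orders; --",     # special characters
]

def run_tests():
    for q in TEST_CASES:
        result = process(q)
        assert isinstance(result, dict), f"Non-dict response for {q[:30]!r}"
        assert "answer" in result or "error" in result
    # Repeating a query must hit the cache and cost nothing
    repeat = process(TEST_CASES[0])
    assert repeat["from_cache"] and repeat["cost_inr"] == 0
    print(f"All {len(TEST_CASES)} cases passed")

run_tests()
```

Run this on every change; the cache assertion alone will catch the most expensive class of regressions before they reach production.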
Step 4: Going Live — The Launch Checklist
You have built it. You have tested it. Now it is time to go live. But do not just flip a switch and hope for the best. Use this checklist — I call it the "sleep peacefully at night" checklist because if you complete it, you will not get panic calls at 2 AM.
Before launch, make sure you have:
- Monitoring dashboard — You should be able to see, at a glance, how many requests are coming in, what the average response time is, how much money you are spending, and if there are any errors. Tools like Grafana (free) or even a simple Google Sheet work.
- Cost alerts — Set up an alert that sends you a WhatsApp message or email if daily spending exceeds your budget. This is non-negotiable. I have seen teams get surprise bills of Rs 5 lakh because nobody was watching the costs.
- Error handling — When (not if) something goes wrong, your system should show a friendly message to the user, not a scary error page. Something like "I am having trouble right now. Let me connect you with a human agent" is much better than a blank screen.
- Rollback plan — If the new AI system is causing problems, you should be able to switch back to the old system within 5 minutes. Always keep the old system running in parallel for the first month.
- Gradual rollout — Do not launch to 100% of users on day one. Start with 5%, then 20%, then 50%, then 100%. This way, if something is wrong, only a small percentage of users are affected.
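One common way to implement the gradual rollout is deterministic hashing: each user lands in a stable bucket from 0 to 99, so raising the percentage only ever adds users and never flips someone back and forth between systems. A sketch, assuming you have some stable identifier per user (phone number, account ID):

```python
import hashlib

def in_rollout(user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hashing gives each user a stable bucket: someone in the 5% group
    stays in at 20%, 50%, and 100% - no flip-flopping between systems.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < percent

def route(user_id: str, percent: int) -> str:
    """Route each request to the new AI system or the old one."""
    return "new_ai_system" if in_rollout(user_id, percent) else "old_system"
```

At 100% every bucket qualifies and at 0% nobody does, so the same switch doubles as your rollback plan: set the percentage to 0 and all traffic returns to the old system within seconds.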
The first week after launch is the most critical. Check your dashboard every few hours. Read the logs. Talk to users. You will find issues that no amount of testing could have caught. That is normal. Fix them quickly, and within a month, your system will be running smoothly.
Lessons I Learned the Hard Way (So You Do Not Have To)
After helping dozens of Indian teams implement AI debugging, I have collected a list of lessons that I wish someone had told me when I started. Each of these comes from a real mistake that cost real money and real time.
Lesson 1: Start with the cheapest model that works. Everyone wants to use GPT-4 or Claude Opus. But for most tasks, GPT-4o-mini or even a fine-tuned small model works just as well at 1/10th the cost. I worked with a team that switched from GPT-4 to GPT-4o-mini and saved Rs 2 lakh per month with zero quality drop.
Lesson 2: Cache everything. In most applications, 30-40% of queries are repeated or very similar. A simple cache can cut your costs by a third. One team I worked with reduced their monthly bill from Rs 90,000 to Rs 55,000 just by adding caching.
Lesson 3: Log every single request. When something goes wrong (and it will), your logs are your detective toolkit. Without logs, debugging is like finding a needle in a haystack. With logs, it is like following a trail of breadcrumbs.
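Logging does not need heavy infrastructure: appending one JSON line per request to a file is enough to reconstruct most incidents. A minimal sketch, where the file path and field names are illustrative choices, not a standard:

```python
import json
from datetime import datetime, timezone

def log_request(query: str, response: dict, path: str = "requests.jsonl") -> dict:
    """Append one JSON line per request - grep-able, cheap, and enough
    to reconstruct what happened when a user reports a bad answer."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "query": query[:200],  # truncate huge inputs
        "answer": str(response.get("answer", ""))[:200],
        "error": response.get("error"),
        "cost_inr": response.get("cost_inr", 0),
        "from_cache": response.get("from_cache", False),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry
```

When a user says "the bot gave me a wrong answer yesterday," a simple grep over this file, filtered by time and query text, usually finds the exact request in minutes.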
Lesson 4: Set budget alerts before you need them. AI costs can spike unexpectedly. A bug in your code might cause it to call the API in an infinite loop. Without a budget alert, you could wake up to a bill of Rs 50,000 for one night of runaway requests.
Lesson 5: Talk to your users every week. The best improvements come from watching real users interact with your system. They will use it in ways you never imagined, ask questions you never expected, and find bugs you never knew existed.