Visual Debugging with AI: When Screenshots Work (and When They Don't)

03/11/2025

I gave Claude the ability to see my running Flutter app through screenshots. Here's what I learned about when visual feedback enables autonomous development - and when it gets in the way.

This is Part 2 of a 3-part series on my experiment with Claude Desktop and MCP servers. Part 1 covered the foundation setup with Desktop Commander. Part 3 explores the principles of autonomous LLM development.

The Vision: If Claude Could Just See the App

In Part 1, I set up Desktop Commander so Claude could access my terminal and file system. This eliminated the copy-paste workflow for code and errors.

But something was missing.

When I'd tell Claude "the button layout looks wrong," Claude would have to guess:

  • Which button?
  • Wrong in what way?
  • How far off is it?
  • Is it a positioning issue, sizing issue, or styling issue?

I wanted Claude to see the app directly. So I went looking for an MCP server that could provide visual feedback.

That's when I found Mobile MCP - an MCP server that can:

  • Take screenshots of Android/iOS emulators
  • Interact with UI elements
  • Monitor running apps

The promise was compelling: Visual debugging without ambiguity.

Adding Mobile MCP to the Setup

The Installation

With Desktop Commander already set up, adding Mobile MCP was effortless. I simply asked Claude:

Me: "Add Mobile MCP to my Claude Desktop configuration so you can take screenshots of my emulator."

Claude did everything:

  1. Read my existing config file
  2. Added the Mobile MCP server configuration
  3. Saved the updated config
  4. Told me to restart Claude Desktop

I didn't edit a single file. This is the compounding benefit of having terminal access - Claude can set up everything else autonomously.

The configuration Claude created (at ~/Library/Application Support/Claude/claude_desktop_config.json):

```json
{
  "mcpServers": {
    "desktop-commander": {
      "command": "npx",
      "args": ["@wonderwhy-er/desktop-commander@latest"]
    },
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
```
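Whether Claude writes the config or you do, it's worth validating the JSON before restarting, because a malformed config silently disables every MCP server. A minimal sketch using Python's built-in `json.tool`; `sample_config.json` here is a stand-in for the real config path above:

```shell
# Write a sample config (substitute the real path on your machine).
cat > sample_config.json <<'EOF'
{
  "mcpServers": {
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
EOF

# json.tool exits non-zero on invalid JSON, so this only prints on success.
python3 -m json.tool sample_config.json > /dev/null && echo "config OK"
```

If the file is broken, you get a parse error with a line number instead of a silent failure after restart.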

Restart Claude Desktop, and now Claude had both:

  • Terminal/file access (Desktop Commander)
  • Visual feedback (Mobile MCP)

The Complete Workflow

Now Claude could:

  1. Edit code
  2. Run flutter run
  3. Take a screenshot
  4. Analyze the UI
  5. Make adjustments
  6. Hot reload
  7. Take another screenshot
  8. Compare before/after

This felt like the holy grail of autonomous development.

Success Story: The Community Site App

Let me show you where visual debugging really shines.

The Request

Me: "Add a bottom navigation bar to the Community Site app with three tabs: News, Forum, and Settings. Make sure it follows Material Design guidelines."

Claude's Process

Step 1: Code

[edit_block → lib/main.dart]

Claude added the BottomNavigationBar widget with three tabs.

Step 2: Run and capture

[interact_with_process → r]   # Hot reload
[mobile_take_screenshot]

Step 3: Visual analysis

Claude saw the screenshot and immediately identified:

  • ✅ Three tabs present
  • ✅ Icons displaying correctly
  • ❌ Selected tab color too subtle
  • ❌ Icon size slightly small on large screens

Step 4: Refine

[edit_block → lib/main.dart]   # Adjust colors and icon sizes
[interact_with_process → r]    # Hot reload
[mobile_take_screenshot]       # Verify

Step 5: Report

"Navigation bar added and refined. All three tabs working, visual hierarchy clear, follows Material Design guidelines."

Why This Worked

The Community Site app has:

  • Static UI elements (buttons, text, navigation)
  • Input-driven interactions (tap a tab, see content change)
  • Verifiable layouts (spacing, alignment, colors)

A screenshot tells the complete story. Claude could see:

  • Button positioning
  • Text readability
  • Color contrast
  • Layout spacing
  • Icon sizes

Three iterations, 5 minutes, perfect result.

This is where visual debugging is genuinely valuable.

Where It Falls Short: Action Games

Now let me show you where screenshots don't work.

The Problem: TopRacer

TopRacer is a Commodore 64-inspired racing game with:

  • Continuous movement
  • Real-time physics
  • Collision detection
  • Speed/acceleration feel
  • Frame-rate dependent gameplay

The Failed Workflow

Me: "The car doesn't feel fast enough. Increase the base speed."

Claude's attempt:

Step 1: Code change

[edit_block → lib/game.dart] # Increase car speed constant
[interact_with_process → r] # Hot reload
[mobile_take_screenshot]

Step 2: Visual "analysis"

Claude looked at the screenshot showing the car mid-race and said:

"The car appears to be in motion. Speed increased to 120 km/h. Visual check complete."

But this told me nothing. The screenshot showed:

  • A car on a track (static image)
  • Some road markings
  • A speedometer showing "120"

What the screenshot couldn't show:

  • Does the car feel fast?
  • Is the acceleration smooth?
  • Does the speed feel different from before?
  • Is the visual blur appropriate?
  • Do the road markings scroll at the right pace?

More Attempts, Same Problem

Issue: "Collision detection seems off."

Claude: Takes a screenshot. "The car is positioned on the track. Collision boundaries appear correct."

But: You can't see collision detection in a still image. You need to see the car hit a barrier and observe what happens.

Issue: "The steering feels too responsive."

Claude: Takes a screenshot. "Car is turning. Steering implemented correctly."

But: "Feel" requires playing the game, not looking at a screenshot.

Why Screenshots Don't Work for Action Games

1. Motion Can't Be Captured

A racing game is about continuous movement. A screenshot freezes one frame out of thousands. You can't judge:

  • Speed
  • Acceleration
  • Smoothness
  • Timing
  • Responsiveness

2. "Feel" Is Everything

Action games live or die on feel:

  • Does jumping feel "floaty" or "tight"?
  • Does steering feel "responsive" or "sluggish"?
  • Does shooting feel "punchy" or "weak"?

You can't capture feel in a screenshot.

3. State Setup Is Tricky

To verify a feature works, you need to:

  1. Get the game into the right state
  2. Trigger the specific interaction
  3. Capture the result at exactly the right moment

For a Community Site button, this is easy: tap the button, take screenshot.

For a racing game collision:

  1. Position the car near a barrier
  2. Steer toward it at the right angle
  3. Capture the exact frame of impact
  4. Verify collision response

This is nearly impossible to coordinate for a screenshot.

4. Performance Isn't Visual

A screenshot can't show:

  • Frame drops
  • Stuttering
  • Input lag
  • Memory issues

These are critical for action games but invisible to visual inspection.

What Works Instead: Logs and Metrics

For TopRacer, I learned to rely on precise terminal output:

Speed Verification

Instead of: "Does the car look fast?"

Use:

[LOG] Car speed: 120 km/h (target: 100-150 km/h) ✓
[LOG] Acceleration: 0-100 in 2.1s ✓

Collision Detection

Instead of: Screenshot of car near barrier

Use:

[LOG] Collision detected: Car(x:150, y:200) <-> Barrier(x:148, y:200)
[LOG] Collision response: Bounce angle: 45°, speed reduction: 30%

Performance Monitoring

Instead of: "Does it look smooth?"

Use:

[LOG] FPS: 58.2 (target: 60.0)
[LOG] Frame time: 16.8ms (budget: 16.6ms)
[LOG] Dropped frames: 2 in last second

Game State Verification

Instead of: Screenshot showing lap counter

Use:

[LOG] Lap completed: Time: 1:23.45, Best: 1:22.10
[LOG] Checkpoint 1: 0:18.3 ✓
[LOG] Checkpoint 2: 0:42.7 ✓

This is precise. This is verifiable. This enables autonomous debugging.
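Because every metric line shares the `[LOG]` prefix, Claude (or you) can pull the signal out of noisy `flutter run` output with a one-line filter. A sketch, where the `[LOG]` format is this post's own convention rather than a Flutter built-in, and `run_output.txt` stands in for the captured process output:

```shell
# Simulated flutter run output mixing framework chatter with metric lines.
cat > run_output.txt <<'EOF'
Performing hot reload...
[LOG] FPS: 58.2 (target: 60.0)
Reloaded 1 of 612 libraries in 243ms.
[LOG] Collision detected: Car(x:150, y:200) <-> Barrier(x:148, y:200)
EOF

# Keep only the structured metric lines.
grep '^\[LOG\]' run_output.txt
```

Piping the live process output through the same filter gives Claude a clean, machine-readable stream to reason over.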

The Pattern: Match Tool to Context

Through these experiments, I discovered a clear pattern:

✅ Use Screenshots For:

1. Static UI Verification

  • Button positioning
  • Text readability
  • Color contrast
  • Layout spacing
  • Icon sizes

2. Turn-Based Games

  • Chess, card games, puzzle games
  • Board state visualization
  • Menu navigation
  • Score displays

3. Input-Driven Apps

  • Forms and settings
  • Navigation flows
  • Content display
  • Authentication screens

4. Before/After Comparisons

  • "Is this button bigger than before?"
  • "Did the color change?"
  • "Is the text centered now?"

❌ Don't Use Screenshots For:

1. Real-Time Action

  • Racing games
  • Platformers
  • Shooters
  • Rhythm games

2. Performance Issues

  • Frame rate
  • Stuttering
  • Input lag
  • Memory leaks

3. "Feel" and Timing

  • Acceleration curves
  • Jump arcs
  • Weapon recoil
  • Animation timing

4. Dynamic Behavior

  • Physics simulation
  • Collision detection
  • AI behavior
  • Procedural generation

A Hybrid Approach: Community Site + TopRacer

Here's how I now use both approaches:

Community Site Development

Workflow:

  1. Claude edits UI code
  2. Hot reloads the app
  3. Takes screenshot
  4. Analyzes layout visually
  5. Makes refinements
  6. Takes another screenshot
  7. Compares before/after

Result: Rapid UI iteration with visual confirmation.

TopRacer Development

Workflow:

  1. Claude edits game logic
  2. Hot reloads the app
  3. Checks terminal logs
  4. Analyzes metrics (FPS, speed, collisions)
  5. Makes refinements
  6. Checks logs again
  7. Verifies numbers match targets

Result: Precise gameplay tuning with measurable feedback.

Looking to the Future: Live Screen Interaction

Here's what would change the game (literally):

Imagine if Claude could:

  • Watch the game playing in real-time
  • See animations and movement
  • Observe collisions happening
  • Feel timing and responsiveness
  • Play the game itself to test

This would unlock visual debugging for action games.

Instead of: Claude looks at a frozen screenshot: "The car appears positioned correctly."

We'd have: Claude watches 10 seconds of gameplay: "The car accelerates too slowly at low speeds but feels good at high speeds. Recommend adjusting the acceleration curve."

We're not there yet. Current LLMs can't watch live video feeds and interact in real-time.

But I expect this capability within the next 1-2 years. When it arrives, the distinction between "static apps" and "action games" for visual debugging will disappear.

Lessons Learned

1. Visual Debugging Is Context-Dependent

It's not universally good or bad - it depends on what you're building.

For Community Site: Screenshot debugging was genuinely valuable. It accelerated development and provided clear visual confirmation.

For TopRacer: Screenshot debugging was frustrating and misleading. Logs and metrics were far more reliable.

2. Know Your App's "Debuggability Profile"

Before adding Mobile MCP to your workflow, ask:

Is my app primarily:

  • [ ] Static layouts?
  • [ ] Turn-based interactions?
  • [ ] Input-driven (forms, buttons, menus)?

If yes → Screenshots will help.

Is my app primarily:

  • [ ] Real-time action?
  • [ ] Performance-critical?
  • [ ] About "feel" and timing?

If yes → Stick with logs and metrics.

3. Screenshots Excel at Layout, Fail at Behavior

Screenshots answer:

  • "Is this button in the right place?" ✓
  • "Is this text readable?" ✓
  • "Does this color work?" ✓

Screenshots don't answer:

  • "Does this feel responsive?" ✗
  • "Is this running smoothly?" ✗
  • "Does the timing feel right?" ✗

Match the question to the tool.

4. The Future Will Change This

The limitations I discovered are temporary. Once LLMs can:

  • Watch live video
  • Interact with apps in real-time
  • Observe behavior over time

...then visual debugging will work for action games too.

But that's not today's reality.

Should You Add Mobile MCP?

Add it if:

  • You're building static UIs, forms, or content apps
  • You want faster iteration on layouts
  • Visual confirmation is valuable for your use case
  • You're building turn-based games or tools

Skip it if:

  • You're building action games or real-time apps
  • Performance and timing are critical
  • You'd rather rely on logs and metrics
  • You want to keep your setup minimal

My recommendation:

Start with Desktop Commander only (Part 1). Add Mobile MCP later if your use case benefits from screenshots.

Don't force visual debugging where it doesn't fit.

Setup Guide

If you want to experiment with visual debugging:

Prerequisites

  • Desktop Commander already set up (see Part 1)
  • Android emulator running (or iOS simulator)
  • Claude Desktop

Installation

1. Edit your MCP config

Add Mobile MCP to ~/Library/Application Support/Claude/claude_desktop_config.json:

```json
{
  "mcpServers": {
    "desktop-commander": {
      "command": "npx",
      "args": ["@wonderwhy-er/desktop-commander@latest"]
    },
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
```

Or let Claude handle it for you, as described earlier.

2. Restart Claude Desktop

3. Start your emulator

```bash
# List available emulators
flutter emulators

# Launch one (substitute an id from the list above)
flutter emulators --launch <emulator_id>
```

4. Test it works

In a new conversation:

Take a screenshot of my Android emulator

If you see a screenshot, it's working!

Your First Visual Debug

Try this:

Navigate to my Flutter project, run the app on my emulator, take a screenshot, and tell me if the UI follows Material Design guidelines

Claude will analyze your app's visual design and suggest improvements.

What's Next

This post explored when visual debugging works (static apps, forms, turn-based games) and when it doesn't (action games, real-time interactions).

In Part 3, I'll synthesize everything from Parts 1 and 2 to answer the big question: What does autonomous LLM development actually require?

We'll cover:

  • The complete set of principles
  • The "holy grail" checklist
  • Prompting patterns that work
  • The role shift from coder to director
  • What the future holds

The setup is complete. The tools are understood. Now let's talk about the bigger picture.

Resources

Credits

  • Development & writing: John Doktor
  • Infrastructure: Desktop Commander, Mobile MCP
  • Apps tested: Community Site (static UI), TopRacer (action game)

Ready to understand what makes autonomous LLM development actually work? Read Part 3 for the complete picture and lessons learned.