Visual Debugging with AI: When Screenshots Work (and When They Don't)

03/11/2025

I gave Claude the ability to see my running Flutter app through screenshots. Here's what I learned about when visual feedback enables autonomous development - and when it gets in the way.

This is Part 2 of a 3-part series on my experiment with Claude Desktop and MCP servers. Part 1 covered the foundation setup with Desktop Commander. Part 3 explores the principles of autonomous LLM development.

The Vision: If Claude Could Just See the App

In Part 1, I set up Desktop Commander so Claude could access my terminal and file system. This eliminated the copy-paste workflow for code and errors.

But something was missing.

When I'd tell Claude "the button layout looks wrong," Claude would have to guess:

  • Which button?
  • Wrong in what way?
  • How far off is it?
  • Is it a positioning issue, sizing issue, or styling issue?

I wanted Claude to see the app directly. So I went looking for an MCP server that could provide visual feedback.

That's when I found Mobile MCP - an MCP server that can:

  • Take screenshots of Android/iOS emulators
  • Interact with UI elements
  • Monitor running apps

The promise was compelling: Visual debugging without ambiguity.

Adding Mobile MCP to the Setup

The Installation

With Desktop Commander already set up, adding Mobile MCP was effortless. I simply asked Claude:

Me: "Add Mobile MCP to my Claude Desktop configuration so you can take screenshots of my emulator."

Claude did everything:

  1. Read my existing config file
  2. Added the Mobile MCP server configuration
  3. Saved the updated config
  4. Told me to restart Claude Desktop

I didn't edit a single file. This is the compounding benefit of having terminal access - Claude can set up everything else autonomously.

The configuration Claude created (at ~/Library/Application Support/Claude/claude_desktop_config.json):

```json
{
  "mcpServers": {
    "desktop-commander": {
      "command": "npx",
      "args": ["@wonderwhy-er/desktop-commander@latest"]
    },
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
```
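Whether Claude writes the config or you do, it's worth validating the JSON before restarting, because a malformed config silently disables every MCP server. A minimal sketch using Python's built-in `json.tool`; `sample_config.json` here is a stand-in for the real config path above:

```shell
# Write a sample config (substitute the real path on your machine).
cat > sample_config.json <<'EOF'
{
  "mcpServers": {
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
EOF

# json.tool exits non-zero on invalid JSON, so this only prints on success.
python3 -m json.tool sample_config.json > /dev/null && echo "config OK"
```

If the file is broken, you get a parse error with a line number instead of a silent failure after restart.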

Restart Claude Desktop, and now Claude had both:

  • Terminal/file access (Desktop Commander)
  • Visual feedback (Mobile MCP)

The Complete Workflow

Now Claude could:

  1. Edit code
  2. Run flutter run
  3. Take a screenshot
  4. Analyze the UI
  5. Make adjustments
  6. Hot reload
  7. Take another screenshot
  8. Compare before/after

This felt like the holy grail of autonomous development.

Success Story: The Community Site App

Let me show you where visual debugging really shines.

The Request

Me: "Add a bottom navigation bar to the Community Site app with three tabs: News, Forum, and Settings. Make sure it follows Material Design guidelines."

Claude's Process

Step 1: Code

[edit_block → lib/main.dart]

Claude added the BottomNavigationBar widget with three tabs.

Step 2: Run and capture

[interact_with_process → r]   # Hot reload
[mobile_take_screenshot]

Step 3: Visual analysis

Claude saw the screenshot and immediately identified:

  • ✅ Three tabs present
  • ✅ Icons displaying correctly
  • ❌ Selected tab color too subtle
  • ❌ Icon size slightly small on large screens

Step 4: Refine

[edit_block → lib/main.dart]   # Adjust colors and icon sizes
[interact_with_process → r]    # Hot reload
[mobile_take_screenshot]       # Verify

Step 5: Report

"Navigation bar added and refined. All three tabs working, visual hierarchy clear, follows Material Design guidelines."

Why This Worked

The Community Site app has:

  • Static UI elements (buttons, text, navigation)
  • Input-driven interactions (tap a tab, see content change)
  • Verifiable layouts (spacing, alignment, colors)

A screenshot tells the complete story. Claude could see:

  • Button positioning
  • Text readability
  • Color contrast
  • Layout spacing
  • Icon sizes

Three iterations, 5 minutes, perfect result.

This is where visual debugging is genuinely valuable.

Where It Falls Short: Action Games

Now let me show you where screenshots don't work.

The Problem: TopRacer

TopRacer is a Commodore 64-inspired racing game with:

  • Continuous movement
  • Real-time physics
  • Collision detection
  • Speed/acceleration feel
  • Frame-rate dependent gameplay

The Failed Workflow

Me: "The car doesn't feel fast enough. Increase the base speed."

Claude's attempt:

Step 1: Code change

[edit_block → lib/game.dart] # Increase car speed constant
[interact_with_process → r] # Hot reload
[mobile_take_screenshot]

Step 2: Visual "analysis"

Claude looked at the screenshot showing the car mid-race and said:

"The car appears to be in motion. Speed increased to 120 km/h. Visual check complete."

But this told me nothing. The screenshot showed:

  • A car on a track (static image)
  • Some road markings
  • A speedometer showing "120"

What the screenshot couldn't show:

  • Does the car feel fast?
  • Is the acceleration smooth?
  • Does the speed feel different from before?
  • Is the visual blur appropriate?
  • Do the road markings scroll at the right pace?

More Attempts, Same Problem

Issue: "Collision detection seems off."

Claude: Takes a screenshot. "The car is positioned on the track. Collision boundaries appear correct."

But: You can't see collision detection in a still image. You need to see the car hit a barrier and observe what happens.

Issue: "The steering feels too responsive."

Claude: Takes a screenshot. "Car is turning. Steering implemented correctly."

But: "Feel" requires playing the game, not looking at a screenshot.

Why Screenshots Don't Work for Action Games

1. Motion Can't Be Captured

A racing game is about continuous movement. A screenshot freezes one frame out of thousands. You can't judge:

  • Speed
  • Acceleration
  • Smoothness
  • Timing
  • Responsiveness

2. "Feel" Is Everything

Action games live or die on feel:

  • Does jumping feel "floaty" or "tight"?
  • Does steering feel "responsive" or "sluggish"?
  • Does shooting feel "punchy" or "weak"?

You can't capture feel in a screenshot.

3. State Setup Is Tricky

To verify a feature works, you need to:

  1. Get the game into the right state
  2. Trigger the specific interaction
  3. Capture the result at exactly the right moment

For a Community Site button, this is easy: tap the button, take screenshot.

For a racing game collision:

  1. Position the car near a barrier
  2. Steer toward it at the right angle
  3. Capture the exact frame of impact
  4. Verify collision response

This is nearly impossible to coordinate for a screenshot.

4. Performance Isn't Visual

A screenshot can't show:

  • Frame drops
  • Stuttering
  • Input lag
  • Memory issues

These are critical for action games but invisible to visual inspection.

What Works Instead: Logs and Metrics

For TopRacer, I learned to rely on precise terminal output:

Speed Verification

Instead of: "Does the car look fast?"

Use:

[LOG] Car speed: 120 km/h (target: 100-150 km/h) ✓
[LOG] Acceleration: 0-100 in 2.1s ✓

Collision Detection

Instead of: Screenshot of car near barrier

Use:

[LOG] Collision detected: Car(x:150, y:200) <-> Barrier(x:148, y:200)
[LOG] Collision response: Bounce angle: 45°, speed reduction: 30%

Performance Monitoring

Instead of: "Does it look smooth?"

Use:

[LOG] FPS: 58.2 (target: 60.0)
[LOG] Frame time: 16.8ms (budget: 16.6ms)
[LOG] Dropped frames: 2 in last second

Game State Verification

Instead of: Screenshot showing lap counter

Use:

[LOG] Lap completed: Time: 1:23.45, Best: 1:22.10
[LOG] Checkpoint 1: 0:18.3 ✓
[LOG] Checkpoint 2: 0:42.7 ✓

This is precise. This is verifiable. This enables autonomous debugging.
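Because every metric line shares the `[LOG]` prefix, Claude (or you) can pull the signal out of noisy `flutter run` output with a one-line filter. A sketch, where the `[LOG]` format is this post's own convention rather than a Flutter built-in, and `run_output.txt` stands in for the captured process output:

```shell
# Simulated flutter run output mixing framework chatter with metric lines.
cat > run_output.txt <<'EOF'
Performing hot reload...
[LOG] FPS: 58.2 (target: 60.0)
Reloaded 1 of 612 libraries in 243ms.
[LOG] Collision detected: Car(x:150, y:200) <-> Barrier(x:148, y:200)
EOF

# Keep only the structured metric lines.
grep '^\[LOG\]' run_output.txt
```

Piping the live process output through the same filter gives Claude a clean, machine-readable stream to reason over.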

The Pattern: Match Tool to Context

Through these experiments, I discovered a clear pattern:

✅ Use Screenshots For:

1. Static UI Verification

  • Button positioning
  • Text readability
  • Color contrast
  • Layout spacing
  • Icon sizes

2. Turn-Based Games

  • Chess, card games, puzzle games
  • Board state visualization
  • Menu navigation
  • Score displays

3. Input-Driven Apps

  • Forms and settings
  • Navigation flows
  • Content display
  • Authentication screens

4. Before/After Comparisons

  • "Is this button bigger than before?"
  • "Did the color change?"
  • "Is the text centered now?"

❌ Don't Use Screenshots For:

1. Real-Time Action

  • Racing games
  • Platformers
  • Shooters
  • Rhythm games

2. Performance Issues

  • Frame rate
  • Stuttering
  • Input lag
  • Memory leaks

3. "Feel" and Timing

  • Acceleration curves
  • Jump arcs
  • Weapon recoil
  • Animation timing

4. Dynamic Behavior

  • Physics simulation
  • Collision detection
  • AI behavior
  • Procedural generation

A Hybrid Approach: Community Site + TopRacer

Here's how I now use both approaches:

Community Site Development

Workflow:

  1. Claude edits UI code
  2. Hot reloads the app
  3. Takes screenshot
  4. Analyzes layout visually
  5. Makes refinements
  6. Takes another screenshot
  7. Compares before/after

Result: Rapid UI iteration with visual confirmation.

TopRacer Development

Workflow:

  1. Claude edits game logic
  2. Hot reloads the app
  3. Checks terminal logs
  4. Analyzes metrics (FPS, speed, collisions)
  5. Makes refinements
  6. Checks logs again
  7. Verifies numbers match targets

Result: Precise gameplay tuning with measurable feedback.

Looking to the Future: Live Screen Interaction

Here's what would change the game (literally):

Imagine if Claude could:

  • Watch the game playing in real-time
  • See animations and movement
  • Observe collisions happening
  • Feel timing and responsiveness
  • Play the game itself to test

This would unlock visual debugging for action games.

Instead of: Claude looks at a frozen screenshot: "The car appears positioned correctly."

We'd have: Claude watches 10 seconds of gameplay: "The car accelerates too slowly at low speeds but feels good at high speeds. Recommend adjusting the acceleration curve."

We're not there yet. Current LLMs can't watch live video feeds and interact in real-time.

But I expect this capability within the next 1-2 years. When it arrives, the distinction between "static apps" and "action games" for visual debugging will disappear.

Lessons Learned

1. Visual Debugging Is Context-Dependent

It's not universally good or bad - it depends on what you're building.

For Community Site: Screenshot debugging was genuinely valuable. It accelerated development and provided clear visual confirmation.

For TopRacer: Screenshot debugging was frustrating and misleading. Logs and metrics were far more reliable.

2. Know Your App's "Debuggability Profile"

Before adding Mobile MCP to your workflow, ask:

Is my app primarily:

  • [ ] Static layouts?
  • [ ] Turn-based interactions?
  • [ ] Input-driven (forms, buttons, menus)?

If yes → Screenshots will help.

Is my app primarily:

  • [ ] Real-time action?
  • [ ] Performance-critical?
  • [ ] About "feel" and timing?

If yes → Stick with logs and metrics.

3. Screenshots Excel at Layout, Fail at Behavior

Screenshots answer:

  • "Is this button in the right place?" ✓
  • "Is this text readable?" ✓
  • "Does this color work?" ✓

Screenshots don't answer:

  • "Does this feel responsive?" ✗
  • "Is this running smoothly?" ✗
  • "Does the timing feel right?" ✗

Match the question to the tool.

4. The Future Will Change This

The limitations I discovered are temporary. Once LLMs can:

  • Watch live video
  • Interact with apps in real-time
  • Observe behavior over time

...then visual debugging will work for action games too.

But that's not today's reality.

Should You Add Mobile MCP?

Add it if:

  • You're building static UIs, forms, or content apps
  • You want faster iteration on layouts
  • Visual confirmation is valuable for your use case
  • You're building turn-based games or tools

Skip it if:

  • You're building action games or real-time apps
  • Performance and timing are critical
  • You'd rather rely on logs and metrics
  • You want to keep your setup minimal

My recommendation:

Start with Desktop Commander only (Part 1). Add Mobile MCP later if your use case benefits from screenshots.

Don't force visual debugging where it doesn't fit.

Setup Guide

If you want to experiment with visual debugging:

Prerequisites

  • Desktop Commander already set up (see Part 1)
  • Android emulator running (or iOS simulator)
  • Claude Desktop

Installation

1. Edit your MCP config

Add Mobile MCP to ~/Library/Application Support/Claude/claude_desktop_config.json:

```json
{
  "mcpServers": {
    "desktop-commander": {
      "command": "npx",
      "args": ["@wonderwhy-er/desktop-commander@latest"]
    },
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
```

Or let Claude handle it for you, as described earlier.

2. Restart Claude Desktop

3. Start your emulator

```bash
# List available emulators
flutter emulators

# Launch one (substitute an id from the list above)
flutter emulators --launch <emulator_id>
```

4. Test it works

In a new conversation:

Take a screenshot of my Android emulator

If you see a screenshot, it's working!

Your First Visual Debug

Try this:

Navigate to my Flutter project, run the app on my emulator, take a screenshot, and tell me if the UI follows Material Design guidelines

Claude will analyze your app's visual design and suggest improvements.

What's Next

This post explored when visual debugging works (static apps, forms, turn-based games) and when it doesn't (action games, real-time interactions).

In Part 3, I'll synthesize everything from Parts 1 and 2 to answer the big question: What does autonomous LLM development actually require?

We'll cover:

  • The complete set of principles
  • The "holy grail" checklist
  • Prompting patterns that work
  • The role shift from coder to director
  • What the future holds

The setup is complete. The tools are understood. Now let's talk about the bigger picture.

Resources

Credits

  • Development & writing: John Doktor
  • Infrastructure: Desktop Commander, Mobile MCP
  • Apps tested: Community Site (static UI), TopRacer (action game)

Ready to understand what makes autonomous LLM development actually work? Read Part 3 for the complete picture and lessons learned.