Visual Debugging with AI: When Screenshots Work (and When They Don't)
I gave Claude the ability to see my running Flutter app through screenshots. Here's what I learned about when visual feedback enables autonomous development - and when it gets in the way.
This is Part 2 of a 3-part series on my experiment with Claude Desktop and MCP servers. Part 1 covered the foundation setup with Desktop Commander. Part 3 explores the principles of autonomous LLM development.
The Vision: If Claude Could Just See the App
In Part 1, I set up Desktop Commander so Claude could access my terminal and file system. This eliminated the copy-paste workflow for code and errors.
But something was missing.
When I'd tell Claude "the button layout looks wrong," Claude would have to guess:
- Which button?
- Wrong in what way?
- How far off is it?
- Is it a positioning issue, sizing issue, or styling issue?
I wanted Claude to see the app directly. So I went looking for an MCP server that could provide visual feedback.
That's when I found Mobile MCP - an MCP server that can:
- Take screenshots of Android/iOS emulators
- Interact with UI elements
- Monitor running apps
The promise was compelling: Visual debugging without ambiguity.
Adding Mobile MCP to the Setup
The Installation
With Desktop Commander already set up, adding Mobile MCP was effortless. I simply asked Claude:
Me: "Add Mobile MCP to my Claude Desktop configuration so you can take screenshots of my emulator."
Claude did everything:
- Read my existing config file
- Added the Mobile MCP server configuration
- Saved the updated config
- Told me to restart Claude Desktop
I didn't edit a single file. This is the compounding benefit of having terminal access - Claude can set up everything else autonomously.
The configuration Claude created (at ~/Library/Application Support/Claude/claude_desktop_config.json):
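A typical entry looks like the sketch below. This is based on each project's README rather than the exact file Claude generated for me, so package names and versions may differ in your setup:

```json
{
  "mcpServers": {
    "desktop-commander": {
      "command": "npx",
      "args": ["-y", "@wonderwhy-er/desktop-commander"]
    },
    "mobile-mcp": {
      "command": "npx",
      "args": ["-y", "@mobilenext/mobile-mcp@latest"]
    }
  }
}
```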
After restarting Claude Desktop, Claude had both:
- Terminal/file access (Desktop Commander)
- Visual feedback (Mobile MCP)
The Complete Workflow
Now Claude could:
- Edit code
- Run flutter run
- Take a screenshot
- Analyze the UI
- Make adjustments
- Hot reload
- Take another screenshot
- Compare before/after
This felt like the holy grail of autonomous development.
Success Story: The Community Site App
Let me show you where visual debugging really shines.
The Request
Me: "Add a bottom navigation bar to the Community Site app with three tabs: News, Forum, and Settings. Make sure it follows Material Design guidelines."
Claude's Process
Step 1: Code
[edit_block → lib/main.dart]
Claude added the BottomNavigationBar widget with three tabs.
Step 2: Run and capture
[interact_with_process → r]  # Hot reload
[mobile_take_screenshot]
Step 3: Visual analysis
Claude saw the screenshot and immediately identified:
- ✅ Three tabs present
- ✅ Icons displaying correctly
- ❌ Selected tab color too subtle
- ❌ Icon size slightly small on large screens
Step 4: Refine
[edit_block → lib/main.dart]  # Adjust colors and icon sizes
[interact_with_process → r]   # Hot reload
[mobile_take_screenshot]      # Verify
Step 5: Report
"Navigation bar added and refined. All three tabs working, visual hierarchy clear, follows Material Design guidelines."
Why This Worked
The Community Site app has:
- Static UI elements (buttons, text, navigation)
- Input-driven interactions (tap a tab, see content change)
- Verifiable layouts (spacing, alignment, colors)
A screenshot tells the complete story. Claude could see:
- Button positioning
- Text readability
- Color contrast
- Layout spacing
- Icon sizes
Three iterations, 5 minutes, perfect result.
This is where visual debugging is genuinely valuable.
Where It Falls Short: Action Games
Now let me show you where screenshots don't work.

The Problem: TopRacer
TopRacer is a Commodore 64-inspired racing game with:
- Continuous movement
- Real-time physics
- Collision detection
- Speed/acceleration feel
- Frame-rate dependent gameplay
The Failed Workflow
Me: "The car doesn't feel fast enough. Increase the base speed."
Claude's attempt:
Step 1: Code change
[edit_block → lib/game.dart] # Increase car speed constant
[interact_with_process → r] # Hot reload
[mobile_take_screenshot]
Step 2: Visual "analysis"
Claude looked at the screenshot showing the car mid-race and said:
"The car appears to be in motion. Speed increased to 120 km/h. Visual check complete."
But this told me nothing. The screenshot showed:
- A car on a track (static image)
- Some road markings
- A speedometer showing "120"
What the screenshot couldn't show:
- Does the car feel fast?
- Is the acceleration smooth?
- Does the speed feel different from before?
- Is the visual blur appropriate?
- Do the road markings scroll at the right pace?
More Attempts, Same Problem
Issue: "Collision detection seems off."
Claude: takes a screenshot, then reports: "The car is positioned on the track. Collision boundaries appear correct."
But: You can't see collision detection in a still image. You need to see the car hit a barrier and observe what happens.
Issue: "The steering feels too responsive."
Claude: takes a screenshot, then reports: "Car is turning. Steering implemented correctly."
But: "Feel" requires playing the game, not looking at a screenshot.
Why Screenshots Don't Work for Action Games
1. Motion Can't Be Captured
A racing game is about continuous movement. A screenshot freezes one frame out of thousands. You can't judge:
- Speed
- Acceleration
- Smoothness
- Timing
- Responsiveness
2. "Feel" Is Everything
Action games live or die on feel:
- Does jumping feel "floaty" or "tight"?
- Does steering feel "responsive" or "sluggish"?
- Does shooting feel "punchy" or "weak"?
You can't capture feel in a screenshot.
3. State Setup Is Tricky
To verify a feature works, you need to:
- Get the game into the right state
- Trigger the specific interaction
- Capture the result at exactly the right moment
For a Community Site button, this is easy: tap the button, take screenshot.
For a racing game collision:
- Position the car near a barrier
- Steer toward it at the right angle
- Capture the exact frame of impact
- Verify collision response
This is nearly impossible to coordinate for a screenshot.
4. Performance Isn't Visual
A screenshot can't show:
- Frame drops
- Stuttering
- Input lag
- Memory issues
These are critical for action games but invisible to visual inspection.
What Works Instead: Logs and Metrics
For TopRacer, I learned to rely on precise terminal output:
Speed Verification
Instead of: "Does the car look fast?"
Use:
[LOG] Car speed: 120 km/h (target: 100-150 km/h) ✓
[LOG] Acceleration: 0-100 in 2.1s ✓
Collision Detection
Instead of: Screenshot of car near barrier
Use:
[LOG] Collision detected: Car(x:150, y:200) <-> Barrier(x:148, y:200)
[LOG] Collision response: Bounce angle: 45°, speed reduction: 30%
Performance Monitoring
Instead of: "Does it look smooth?"
Use:
[LOG] FPS: 58.2 (target: 60.0)
[LOG] Frame time: 16.8ms (budget: 16.6ms)
[LOG] Dropped frames: 2 in last second
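The numbers in these performance logs are simple arithmetic over per-frame durations: FPS is frame count divided by elapsed time, and a "dropped" frame is one that exceeds the 16.6ms budget of a 60 FPS target. A minimal sketch (in Python, since the Flutter-side instrumentation isn't shown here; the function name is mine):

```python
def frame_metrics(frame_times_ms, budget_ms=1000 / 60):
    """Compute average FPS and dropped-frame count from per-frame durations (ms)."""
    total_s = sum(frame_times_ms) / 1000          # total elapsed time in seconds
    fps = len(frame_times_ms) / total_s           # frames per second over the window
    dropped = sum(1 for t in frame_times_ms if t > budget_ms)  # over-budget frames
    return fps, dropped

# 60 frames: mostly on budget, plus two slow frames
times = [16.6] * 58 + [33.3, 33.3]
fps, dropped = frame_metrics(times)
print(f"FPS: {fps:.1f} (target: 60.0), dropped frames: {dropped}")
# prints: FPS: 58.3 (target: 60.0), dropped frames: 2
```

The same arithmetic the game would run internally; the point is that the output is a number Claude can compare against a target, not a picture it has to interpret.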
Game State Verification
Instead of: Screenshot showing lap counter
Use:
[LOG] Lap completed: Time: 1:23.45, Best: 1:22.10
[LOG] Checkpoint 1: 0:18.3 ✓
[LOG] Checkpoint 2: 0:42.7 ✓
This is precise. This is verifiable. This enables autonomous debugging.
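Because the log lines carry their own targets, verification can even be automated. Here's a hypothetical checker (the log format follows the examples above; the regex and function name are mine, not part of any MCP server) that parses a line like `[LOG] Car speed: 120 km/h (target: 100-150 km/h)` and reports whether the value is in range:

```python
import re

# Matches lines like: [LOG] Car speed: 120 km/h (target: 100-150 km/h)
LINE = re.compile(
    r"\[LOG\] (?P<name>[^:]+): (?P<value>[\d.]+) (?P<unit>\S+) "
    r"\(target: (?P<lo>[\d.]+)-(?P<hi>[\d.]+)"
)

def check_log_line(line):
    """Return (metric name, within-target?) or None if the line doesn't match."""
    m = LINE.search(line)
    if not m:
        return None
    value = float(m.group("value"))
    lo, hi = float(m.group("lo")), float(m.group("hi"))
    return m.group("name"), lo <= value <= hi

print(check_log_line("[LOG] Car speed: 120 km/h (target: 100-150 km/h)"))
# → ('Car speed', True)
```

With something like this wired into the workflow, "verify the change" becomes a pass/fail check rather than a judgment call about a screenshot.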
The Pattern: Match Tool to Context
Through these experiments, I discovered a clear pattern:
✅ Use Screenshots For:
1. Static UI Verification
- Button positioning
- Text readability
- Color contrast
- Layout spacing
- Icon sizes
2. Turn-Based Games
- Chess, card games, puzzle games
- Board state visualization
- Menu navigation
- Score displays
3. Input-Driven Apps
- Forms and settings
- Navigation flows
- Content display
- Authentication screens
4. Before/After Comparisons
- "Is this button bigger than before?"
- "Did the color change?"
- "Is the text centered now?"
❌ Don't Use Screenshots For:
1. Real-Time Action
- Racing games
- Platformers
- Shooters
- Rhythm games
2. Performance Issues
- Frame rate
- Stuttering
- Input lag
- Memory leaks
3. "Feel" and Timing
- Acceleration curves
- Jump arcs
- Weapon recoil
- Animation timing
4. Dynamic Behavior
- Physics simulation
- Collision detection
- AI behavior
- Procedural generation
A Hybrid Approach: Community Site + TopRacer
Here's how I now use both approaches:
Community Site Development
Workflow:
- Claude edits UI code
- Hot reloads the app
- Takes screenshot ✓
- Analyzes layout visually
- Makes refinements
- Takes another screenshot
- Compares before/after
Result: Rapid UI iteration with visual confirmation.
TopRacer Development
Workflow:
- Claude edits game logic
- Hot reloads the app
- Checks terminal logs ✓
- Analyzes metrics (FPS, speed, collisions)
- Makes refinements
- Checks logs again
- Verifies numbers match targets
Result: Precise gameplay tuning with measurable feedback.
Looking to the Future: Live Screen Interaction
Here's what would change the game (literally):
Imagine if Claude could:
- Watch the game playing in real-time
- See animations and movement
- Observe collisions happening
- Feel timing and responsiveness
- Play the game itself to test
This would unlock visual debugging for action games.
Instead of: Claude looks at a frozen screenshot and reports, "The car appears positioned correctly."
We'd have: Claude watches 10 seconds of gameplay and reports, "The car accelerates too slowly at low speeds but feels good at high speeds. Recommend adjusting the acceleration curve."
We're not there yet. Current LLMs can't watch live video feeds and interact in real-time.
But I expect this capability within the next 1-2 years. When it arrives, the distinction between "static apps" and "action games" for visual debugging will disappear.
Lessons Learned
1. Visual Debugging Is Context-Dependent
It's not universally good or bad - it depends on what you're building.
For Community Site: Screenshot debugging was genuinely valuable. It accelerated development and provided clear visual confirmation.
For TopRacer: Screenshot debugging was frustrating and misleading. Logs and metrics were far more reliable.
2. Know Your App's "Debuggability Profile"
Before adding Mobile MCP to your workflow, ask:
Is my app primarily:
- [ ] Static layouts?
- [ ] Turn-based interactions?
- [ ] Input-driven (forms, buttons, menus)?
If yes → Screenshots will help.
Is my app primarily:
- [ ] Real-time action?
- [ ] Performance-critical?
- [ ] About "feel" and timing?
If yes → Stick with logs and metrics.
3. Screenshots Excel at Layout, Fail at Behavior
Screenshots answer:
- "Is this button in the right place?" ✓
- "Is this text readable?" ✓
- "Does this color work?" ✓
Screenshots don't answer:
- "Does this feel responsive?" ✗
- "Is this running smoothly?" ✗
- "Does the timing feel right?" ✗
Match the question to the tool.
4. The Future Will Change This
The limitations I discovered are temporary. Once LLMs can:
- Watch live video
- Interact with apps in real-time
- Observe behavior over time
...then visual debugging will work for action games too.
But that's not today's reality.
Should You Add Mobile MCP?
Add it if:
- You're building static UIs, forms, or content apps
- You want faster iteration on layouts
- Visual confirmation is valuable for your use case
- You're building turn-based games or tools
Skip it if:
- You're building action games or real-time apps
- Performance and timing are critical
- You'd rather rely on logs and metrics
- You want to keep your setup minimal
My recommendation:
Start with Desktop Commander only (Part 1). Add Mobile MCP later if your use case benefits from screenshots.
Don't force visual debugging where it doesn't fit.
Setup Guide
If you want to experiment with visual debugging:
Prerequisites
- Desktop Commander already set up (see Part 1)
- Android emulator running (or iOS simulator)
- Claude Desktop
Installation
1. Edit your MCP config
Add Mobile MCP to ~/Library/Application Support/Claude/claude_desktop_config.json - or let Claude handle it for you, as shown earlier.
2. Restart Claude Desktop
3. Start your emulator
4. Test it works
In a new conversation:
Take a screenshot of my Android emulator
If you see a screenshot, it's working!
Your First Visual Debug
Try this:
Navigate to my Flutter project, run the app on my emulator, take a screenshot, and tell me if the UI follows Material Design guidelines
Claude will analyze your app's visual design and suggest improvements.
What's Next
This post explored when visual debugging works (static apps, forms, turn-based games) and when it doesn't (action games, real-time interactions).
In Part 3, I'll synthesize everything from Parts 1 and 2 to answer the big question: What does autonomous LLM development actually require?
We'll cover:
- The complete set of principles
- The "holy grail" checklist
- Prompting patterns that work
- The role shift from coder to director
- What the future holds
The setup is complete. The tools are understood. Now let's talk about the bigger picture.
Resources
- Mobile MCP: github.com/mobile-next/mobile-mcp
- Desktop Commander: github.com/wonderwhy-er/desktop-commander
- Flutter Emulator Setup: docs.flutter.dev
Credits
- Development & writing: John Doktor
- Infrastructure: Desktop Commander, Mobile MCP
- Apps tested: Community Site (static UI), TopRacer (action game)
Ready to understand what makes autonomous LLM development actually work? Read Part 3 for the complete picture and lessons learned.