You built it with AI. Now how do you review it?
Vibe coding collapsed the build phase: prompt, ship, iterate. What it didn't collapse is review. AI-built apps have a distinct failure profile — plausible-looking UI with broken edges — and the feedback loop most builders use to catch it (screenshots in Slack, notes in a doc) is the slowest part of their stack. Here's a review process that keeps up.
Why AI-built apps need a different kind of review
An agent-built app fails differently than a hand-built one. Hand-built apps accumulate bugs where the developer got tired. Agent-built apps fail where the prompt was silent. The happy path usually works — that's what you described. What breaks is everything you didn't mention:
- Empty and error states. The dashboard looks great with the seed data the agent generated. What renders with zero items, a failed fetch, or a 40-character username?
- Responsive breakpoints. Agents build to the viewport they're told about. Nobody told them about 390px.
- Copy drift. Placeholder microcopy ("Manage your items"), inconsistent terminology between pages, button labels that describe the implementation instead of the action.
- Accessibility. Missing focus states, low-contrast text on that gradient the agent loved, unlabeled icon buttons.
- Interaction edges. Double-submits, loading states that never resolve, hover-only affordances on touch devices.
These rarely show up in unit tests, and most don't show up on your own machine either, because you always click the same path. They show up when someone else uses the deployed thing.
The only way to find the places where the prompt was silent is to put other people on the live app.
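To make the first bullet above concrete, here is a minimal sketch of the states a list page has to handle beyond the happy path. Every name in it (`ViewState`, `resolveViewState`, `truncateLabel`, the 24-character default) is hypothetical and chosen for illustration; it is not taken from any particular framework or from Pincushion.

```typescript
// Hypothetical view-state model: every name here is illustrative.
type ViewState<T> =
  | { kind: "loading" }
  | { kind: "error"; message: string }
  | { kind: "empty" }
  | { kind: "ready"; items: T[] };

// Decide what a list page should render, covering the states a
// happy-path prompt tends to leave out.
function resolveViewState<T>(
  loading: boolean,
  error: Error | null,
  items: T[] | null
): ViewState<T> {
  if (loading) return { kind: "loading" };
  if (error !== null) return { kind: "error", message: error.message };
  if (items === null || items.length === 0) return { kind: "empty" };
  return { kind: "ready", items };
}

// Clamp long display names so an unusually long username
// cannot push the layout around. The default limit is an assumption.
function truncateLabel(label: string, max: number = 24): string {
  if (label.length <= max) return label;
  return label.slice(0, max - 1) + "…";
}
```

If an agent-built page has no equivalent of the `empty` and `error` branches, that is usually a sign the prompt never mentioned them, and it is exactly what a reviewer with an empty test account will hit first.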
The bottleneck: feedback arrives as prose
So you send the URL to a friend, a client, or a designer. Their feedback comes back as texts and screenshots: "the button at the bottom looks off on my phone." Now you're doing archaeology — which button, which page, which phone, which state? You reconstruct all of that into a prompt for your coding agent, which is absurd when you think about it: a human translating structured visual information into prose so a machine can translate it back.
The fix isn't better screenshots. It's capturing feedback in the format your agent actually consumes: the element's CSS selector, the surrounding DOM, the viewport it was seen at, a screenshot, and the reviewer's note — bundled together at the moment the reviewer clicks. That's the format review feedback should have when the entity fixing it is the same kind of agent that built it.
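As a rough illustration of that bundle, the sketch below models one piece of feedback as a typed record and renders it as text an agent could act on. The interface and field names (`FeedbackPin`, `domSnippet`, `screenshotUrl`, and so on) and the example values are assumptions made for this example; they are not Pincushion's actual schema.

```typescript
// Hypothetical shape of one feedback "work packet".
// Field names are illustrative, not a real schema.
interface Viewport {
  width: number;
  height: number;
}

interface FeedbackPin {
  pageUrl: string;       // page the reviewer was on
  selector: string;      // CSS selector of the clicked element
  domSnippet: string;    // surrounding DOM, as serialized HTML
  viewport: Viewport;    // viewport the issue was seen at
  screenshotUrl: string; // where the captured screenshot lives
  note: string;          // the reviewer's own words
}

// Render a pin as a prompt fragment for a coding agent.
function toAgentPrompt(pin: FeedbackPin): string {
  return [
    `Page: ${pin.pageUrl}`,
    `Element: ${pin.selector}`,
    `Viewport: ${pin.viewport.width}x${pin.viewport.height}`,
    `DOM: ${pin.domSnippet}`,
    `Screenshot: ${pin.screenshotUrl}`,
    `Reviewer note: ${pin.note}`,
  ].join("\n");
}

// Made-up example data.
const examplePin: FeedbackPin = {
  pageUrl: "https://example.com/checkout",
  selector: "form#checkout button[type=submit]",
  domSnippet: "<button type=\"submit\" class=\"btn\">Pay</button>",
  viewport: { width: 390, height: 844 },
  screenshotUrl: "https://example.com/screenshots/1.png",
  note: "The button at the bottom looks off on my phone.",
};

console.log(toAgentPrompt(examplePin));
```

Compare the printed text with the original complaint: the note is the same, but the page, element, and viewport no longer have to be reconstructed by hand.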
A review loop that closes itself
1. Put the review surface on the deployed app
Reviewers use the Pincushion Chrome extension, or you drop a one-line script tag in your layout so anyone who opens the site can pin without installing anything. Either way, reviewers click the actual element and type a note. Reviewers are free and unlimited, with no accounts to provision.
2. Let the pins pile up during the session
Each pin captures the work packet automatically: selector, DOM snippet, screenshot, viewport, thread. Discussion happens on the pin, not in a channel. You approve the ones you agree with — approval is the quality gate, so a reviewer's stray opinion never auto-ships.
3. Hand the batch to your agent
In Cursor, Claude Code, Windsurf, or Copilot agent mode, run /implement. The agent pulls approved pins via MCP, grouped by page so that one branch addresses one page's feedback. It then greps the source by selector, makes the changes, and records the commit, branch, and PR on each pin.
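The approval gate and the per-page grouping can be sketched in a few lines. This is an illustration of the idea only, not Pincushion's implementation: the `ReviewPin` shape, the helper names, and the `review/` branch prefix are all hypothetical.

```typescript
// Hypothetical pin shape; not a real schema.
interface ReviewPin {
  id: string;
  pagePath: string;  // e.g. "/settings/billing"
  selector: string;
  note: string;
  approved: boolean; // set by the project owner, not the reviewer
}

// Keep only approved pins and bucket them by page,
// so each page's feedback can be handled on its own branch.
function groupApprovedByPage(pins: ReviewPin[]): Map<string, ReviewPin[]> {
  const byPage = new Map<string, ReviewPin[]>();
  for (const pin of pins) {
    if (!pin.approved) continue; // unapproved opinions never reach the agent
    const bucket = byPage.get(pin.pagePath) ?? [];
    bucket.push(pin);
    byPage.set(pin.pagePath, bucket);
  }
  return byPage;
}

// Derive a branch name from a page path. The naming scheme is an assumption.
function branchNameFor(pagePath: string): string {
  const slug = pagePath
    .replace(/[^a-z0-9]+/gi, "-")
    .replace(/^-+|-+$/g, "")
    .toLowerCase();
  return `review/${slug || "home"}`;
}
```

Grouping by page keeps each diff small and reviewable, and it means a rejected change on one page does not block fixes on another.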
4. Verify on the next deploy
The deploy hook links the live URL back to each resolved pin, and Pincushion AI can re-check the page and write a verdict of verified, regressed, or inconclusive onto the pin (a Pro feature). Your reviewer sees that their feedback went from pin to production without ever hearing the word "ticket."
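One way to think about the three verdicts is as a closed set that tells you which pins still need a human. The sketch below assumes a hypothetical `CheckedPin` shape and helper name; only the three verdict labels come from the text above.

```typescript
// The three verdict labels named above; everything else is illustrative.
type Verdict = "verified" | "regressed" | "inconclusive";

interface CheckedPin {
  id: string;
  verdict: Verdict;
}

// Anything that is not "verified" deserves a second look after the deploy.
function needsAnotherLook(pins: CheckedPin[]): CheckedPin[] {
  return pins.filter((pin) => pin.verdict !== "verified");
}
```

Treating inconclusive the same as regressed is a deliberately cautious choice in this sketch: an automated check that cannot confirm a fix should not quietly close the loop.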
What to actually ask reviewers to look at
| Pass | What to check | Who's good at it |
|---|---|---|
| States pass | Empty, loading, error, and overflow states on every page | Anyone — give them a test account with no data |
| Phone pass | Every page at a real phone width, thumbs only, no hover | Anyone with a phone |
| Copy pass | Terminology consistency, button labels, empty-state copy, tone | Your most word-picky friend |
| Flow pass | Signup → first value, checkout, invite — done cold, no guidance | Someone who's never seen the app |
| A11y pass | Keyboard-only navigation, focus visibility, contrast | Tab through it yourself; you'll find plenty |
You don't need a QA department. You need three people and a pin tool. Each pass takes fifteen minutes, and each finding arrives as something your agent can fix without a conversation. If you can't recruit anyone today, /critique makes Pincushion AI drop up to three high-signal UI/copy/a11y pins on a page — a decent first pass while you wait for humans.
The economics of the loop
The whole point of building with AI is cycle speed, and review is now the slowest cycle. Compressing the gap between "feedback arrives" and "fix is deployed" from days of Slack archaeology to one agent run is the highest-leverage change you can make to an AI-built project's quality. Pincushion's free tier covers it: one project, unlimited manual pins, 25 AI actions a month, and a 14-day Pro trial (unlimited automation, realtime push, unlimited projects) on signup.