
How I Built a Security Scanner in 48 Hours with Cursor + Claude Code

I built b4uship.com — a pre-launch security scanner for vibe coders — in a single weekend. Here's the full build log: what I chose, what broke, and the irony of using AI to catch AI-generated vulnerabilities.

March 28, 2026 · 10 min read

Vibe coding is everywhere now. You type a prompt in Cursor, Claude generates the code, and you have a working prototype in hours.

The problem is what happens next. That code goes straight to production.

AI-generated code works. It runs. But it hasn't been reviewed for security. Environment variables leak to the client side. SQL queries take raw user input. CORS is set to allow everything. When you're moving fast — build, deploy, ship — these things slip through.

b4uship.com is the product I built to solve this. Paste your code or enter a URL, and it scans for security issues before you ship. The target audience is people like me: solo builders and vibe coders who don't have a security team.

This is the build log. What I decided, what tools I used, and where things broke — in chronological order over 48 hours.

Day 0: From Idea to Spec

Saturday Morning — The Problem

I mentor junior developers on the side. Over the past few months, I started seeing a new pattern: people with little or no coding background were building full web apps with vibe coding tools — Cursor, Bolt, Replit Agent — and shipping them live.

The apps worked. They looked decent. But when I reviewed the code, it was a different story.

Stripe secret keys hardcoded in client-side JavaScript. CORS set to origin: "*". API endpoints concatenating user input directly into database queries. .env files committed to public repos. These aren't obscure vulnerabilities. They're basics that any experienced developer would catch in seconds. But if you've never written code before vibe coding, you don't know what you don't know.

I kept seeing the same issues across different mentees' projects. The tools that made building fast also made shipping dangerous code fast. And these people had no way to check — they didn't know what to look for, and they couldn't afford a security audit.

The idea was simple: a lightweight scanner you run once before you deploy. Not a full enterprise security suite. Just a quick pre-launch check that catches the most common mistakes vibe coders make.

Saturday Afternoon — MVP Scope

I gave myself 48 hours. That meant cutting ruthlessly.

What made it in:

  • Code scan: Paste code or connect a GitHub repo, get a security report
  • URL scan: Enter a live URL, check its security posture
  • CWE-based categorization: Map findings to Common Weakness Enumeration standards
  • Instant results: No signup required for basic scans
  • AI review: LLM-powered analysis that explains findings in plain English

What didn't make it:

  • User account system (added later)
  • Scan history (added later)
  • CI/CD integration
  • Auto-fix PRs (added later)

This scope decision saved the project. The temptation to add features was constant, but "a shipped imperfect product beats an unshipped perfect one" was the rule.

Day 1: Backend — The Scan Engine

Saturday Evening — Architecture Decision

The biggest decision was the backend architecture. I needed something that could run security scans — CPU-intensive work like cloning repos, parsing files, running regex patterns, calling an LLM — without maintaining servers.

I chose Modal.com for the serverless backend. It spins up containers on demand, bills per second, and shuts down when idle. For a product with zero users (at the time), paying only for actual compute was the right call.

The full stack:

  • Backend: Python on Modal.com (serverless functions)
  • Frontend: Next.js 14 with App Router, TypeScript
  • Database: Neon Postgres (serverless)
  • AI: Google Gemini 2.5 Flash for code review
  • Auth: NextAuth v5 with GitHub OAuth
  • Payments: Stripe
  • Deployment: Vercel (frontend) + Modal.com (backend)

The Python backend was a deliberate choice. Security tooling and regex pattern matching are Python's strong suit, and the LangChain ecosystem for Gemini integration was mature.

Sunday Early Morning — Static Analysis Engine

This is where the most time went. I built a static analyzer with 27 regex-based rules across four categories.

Security rules (13 total):

  • Hardcoded API keys — OpenAI (sk-*), Google (AIza*), Stripe (sk_live_*, sk_test_*), AWS (AKIA*)
  • Committed .env files
  • eval() and new Function() usage
  • SQL injection via string concatenation
  • CORS wildcard (origin: "*")
  • dangerouslySetInnerHTML in React (XSS vector)
  • Plain-text password comparison
  • Auth endpoints without rate limiting

Performance rules (5): Unoptimized images, synchronous file operations, useEffect without dependency arrays.

Code quality rules (5): TypeScript any abuse, empty catch blocks, TODO comments left in production.

Deploy readiness rules (4): Localhost URLs, missing environment variable validation, no error boundaries.

The scanner flow for code: clone the repo with git clone --depth=1, traverse all .js, .ts, .tsx, .jsx, .py, .env files, run all 27 pattern checks, deduplicate (max 10 findings per rule to reduce noise), and generate a structured report.
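The per-file loop can be sketched in a few lines. This is a minimal illustration, not the production scanner: the rule IDs, regexes, and `Finding` shape here are stand-ins for a small subset of the real rule set.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Illustrative subset of the rule set described above.
RULES = {
    "hardcoded-openai-key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "hardcoded-aws-key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "eval-usage": re.compile(r"\beval\s*\("),
    "cors-wildcard": re.compile(r"origin\s*:\s*['\"]\*['\"]"),
}

SCAN_EXTENSIONS = {".js", ".ts", ".tsx", ".jsx", ".py", ".env"}
MAX_FINDINGS_PER_RULE = 10  # dedupe cap to reduce noise

@dataclass
class Finding:
    rule_id: str
    path: str
    line: int
    snippet: str

def scan_repo(root: str) -> list[Finding]:
    """Walk a checked-out repo and apply every rule to every line."""
    findings: list[Finding] = []
    per_rule: dict[str, int] = {}
    for path in Path(root).rglob("*"):
        if path.suffix not in SCAN_EXTENSIONS and path.name != ".env":
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # directories, unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            for rule_id, pattern in RULES.items():
                if per_rule.get(rule_id, 0) >= MAX_FINDINGS_PER_RULE:
                    continue  # cap noisy rules
                if pattern.search(line):
                    findings.append(Finding(rule_id, str(path), lineno, line.strip()))
                    per_rule[rule_id] = per_rule.get(rule_id, 0) + 1
    return findings
```

The dedupe cap matters more than it looks: a repo with one bad habit (say, `console.log` of secrets everywhere) would otherwise bury every other finding in the report.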

Then the interesting part: the findings get sent to Gemini 2.5 Flash for AI review. It receives the top 25 findings plus the first 200 lines of up to 15 source files. It returns:

  • A plain-English summary of the project's security health
  • 3–5 top priority fixes with copy-paste code
  • Patterns detected across the codebase
  • An overall risk level (critical / high / moderate / low)
  • Additional insights that static rules missed
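Capping the payload keeps the prompt bounded no matter how large the repo is. A sketch of that trimming step, assuming findings carry a `severity` field; the function and field names are my own, not the production API:

```python
# Hypothetical payload builder for the AI review step. The caps (25 findings,
# 15 files, 200 lines) come from the text; everything else is illustrative.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def build_review_payload(findings: list[dict], files: dict[str, str]) -> dict:
    """Trim scan output so it fits comfortably in a single LLM prompt."""
    # Most severe findings first; keep only the top 25.
    top = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])[:25]
    # First 200 lines of at most 15 source files.
    excerpts = {
        path: "\n".join(source.splitlines()[:200])
        for path, source in list(files.items())[:15]
    }
    return {"findings": top, "files": excerpts}
```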

The grading system:

  • F: Any critical finding
  • D: 3+ high severity findings
  • C: 1–2 high severity findings
  • B/B+: Medium severity only
  • A-/A+: Low or no findings
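The grade table translates almost directly into code. A minimal sketch; the post doesn't spell out how the B/B+ and A-/A+ splits are decided within each band, so this version collapses each band to one grade:

```python
# Direct transcription of the grading table, with hypothetical tie-breaking
# inside the B/B+ and A-/A+ bands.
def grade(counts: dict[str, int]) -> str:
    """counts maps severity name -> number of findings."""
    if counts.get("critical", 0) > 0:
        return "F"                      # any critical finding
    high = counts.get("high", 0)
    if high >= 3:
        return "D"                      # 3+ high severity
    if high >= 1:
        return "C"                      # 1-2 high severity
    if counts.get("medium", 0) > 0:
        return "B"                      # B/B+ band: medium only
    if counts.get("low", 0) > 0:
        return "A-"                     # low findings only
    return "A+"                         # clean scan
```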

Sunday Morning — Dynamic URL Scanner

While the static analyzer handles code, the URL scanner handles live sites. It runs 11 security checks concurrently via asyncio.gather:

  1. SSL/TLS — Certificate validity, expiration, TLS version
  2. Security headers — CSP, X-Frame-Options, HSTS, X-Content-Type-Options, Referrer-Policy, Permissions-Policy
  3. CORS — Wildcard detection, reflection attack testing (sends Origin: https://evil-site.com)
  4. HTTPS redirect — Whether HTTP redirects to HTTPS
  5. Server info leakage — Exposed Server and X-Powered-By headers
  6. Sensitive file exposure — Probes for /.env, /.git/config, /.DS_Store, /backup.sql, /wp-config.php.bak
  7. Directory listing — Checks /, /assets/, /static/, /uploads/ for autoindex
  8. Dangerous HTTP methods — Tests if PUT, DELETE, TRACE are enabled
  9. Cookie security — HttpOnly, Secure, SameSite flags
  10. Open redirect — Tests redirect params (?redirect=, ?url=, ?next=) with https://evil.com
  11. Mixed content — Scans HTTPS pages for HTTP resource URLs

All 11 checks run in parallel. Total timeout: 15 seconds for the entire scan.
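The fan-out shape is simple: every check is an async function returning a result dict, and the whole batch shares one timeout budget. A sketch with two stub checks standing in for the real 11; names and result fields are illustrative:

```python
import asyncio

# Stub checks standing in for the 11 real ones.
async def check_https_redirect(url: str) -> dict:
    return {"check": "https_redirect", "ok": True}

async def check_security_headers(url: str) -> dict:
    return {"check": "security_headers", "ok": True}

async def run_url_scan(url: str, timeout: float = 15.0) -> list[dict]:
    """Run all checks concurrently under a single 15-second budget."""
    checks = [check_https_redirect(url), check_security_headers(url)]
    results = await asyncio.wait_for(
        asyncio.gather(*checks, return_exceptions=True),
        timeout=timeout,
    )
    # A check that raises becomes an error entry instead of killing the scan.
    return [
        r if isinstance(r, dict) else {"check": "error", "ok": False}
        for r in results
    ]
```

`return_exceptions=True` is the important detail: one flaky target endpoint shouldn't abort the other ten checks.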

This was the part where Cursor + Claude Code worked best together. I'd describe a check in natural language to Cursor, get the initial implementation, then use Claude Code to review the logic and catch edge cases. The CORS reflection test, for example, went through three iterations because the first two versions had false positive issues.
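The hard part of a CORS reflection test is the classification, not the request. A sketch of the decision logic, assuming the scanner has already sent a request with `Origin: https://evil-site.com` and passes the (lowercased) response headers in; the header-dict interface and category names are my own:

```python
EVIL_ORIGIN = "https://evil-site.com"

def classify_cors(headers: dict[str, str]) -> str:
    """Classify a response to a probe request carrying a hostile Origin.

    `headers` is assumed to use lowercase header names.
    """
    allow = headers.get("access-control-allow-origin", "")
    creds = headers.get("access-control-allow-credentials", "").lower()
    if allow == "*":
        # Wildcard is risky, but browsers refuse to pair it with credentials.
        return "wildcard"
    if allow == EVIL_ORIGIN and creds == "true":
        # Server echoed our hostile origin AND allows credentials:
        # the genuinely dangerous combination.
        return "reflected-with-credentials"
    if allow == EVIL_ORIGIN:
        return "reflected"
    return "ok"
```

Distinguishing "reflected" from "reflected with credentials" is exactly the kind of nuance that caused the false positives: echoing an origin without credentials is much less severe than echoing it with them.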

Day 2: Frontend + Deployment

Sunday Afternoon — UI

The frontend needed three views:

  1. Landing page — What the product does, pricing, a demo
  2. App interface — Dual-tab design: URL scan vs. code scan
  3. Report view — Findings grouped by category with severity badges

I used Tailwind CSS with a glassmorphic design — frosted glass cards, gradient accent blobs, dark backgrounds. No component library like shadcn/ui. Just custom Tailwind classes and Lucide icons.

The UI work is where Cursor's productivity was highest. "Build a card component showing scan findings grouped by severity" — and a reasonable component came out. Layout and basic styling were usable without modification.

But the details needed manual work. Color-coding by severity (Critical = red, High = orange, Medium = yellow, Low = blue), scan progress animations, the blur overlay for unpaid findings — these took human polish. AI-generated UI reaches "functional" fast but "polished" still needs a person.

The report view component ended up at 1,300+ lines. It includes:

  • Findings grouped by check ID with expandable file lists
  • "Copy Fix Prompt for Cursor" buttons that generate ready-to-paste prompts
  • A blur overlay that hides Critical and High findings on free scans
  • AI review section with priority fixes and pattern analysis
  • One-click auto-fix PR generator (creates a GitHub branch and opens a PR with the fix)
  • Stripe checkout integration for the $29 unlock

Sunday Evening — Deployment

Backend went to Modal.com with two endpoint configurations:

  • Code scan: 1.0 CPU core, 512MB memory, 120-second timeout
  • URL scan: 0.5 CPU core, 256MB memory, 30-second timeout

Frontend deployed to Vercel. I set up proper security headers in next.config.js — because a security scanner without its own security headers would be embarrassing:

  • Content Security Policy with strict connect-src
  • HSTS with 2-year max-age and preload
  • X-Frame-Options: DENY
  • X-Content-Type-Options: nosniff
  • CORS restricted to https://b4uship.com only

Domain: b4uship.com — "Before You Ship." The project originally started as "BreakMyVibe," but that name was too aggressive. B4UShip communicates what it does: scan before you ship.

The last thing I did before calling it shipped was run b4uship on its own code. Dogfooding.

The results? It found a few medium-severity items — some security headers I'd missed during development, and a couple of places where error messages were leaking internal system paths. I fixed them before the final deploy. The irony of a security scanner finding issues in its own code wasn't lost on me.

What I Learned Building with Vibe Coding

48 hours of Cursor + Claude Code as primary development tools. Here's what I actually experienced — not just the good parts.

What Worked

Boilerplate and repetitive code. Project scaffolding, API route structure, basic component generation. These tasks are where AI code generation genuinely saves time. Without it, 48 hours wouldn't have been enough.

Debugging assistant. Showing Claude Code a block of code and asking "what could go wrong with this input?" caught issues I would have missed. It's particularly good at spotting security problems — which makes sense, since it's trained on vast amounts of code including security-focused discussions.

API integration. Stripe checkout, GitHub OAuth, Neon Postgres queries — the pattern-based work of wiring up third-party services. Cursor generated mostly correct integration code.

What Broke

Hallucinated library APIs. Cursor generated Modal.com SDK calls using methods that didn't exist in the version I had installed. Modal's API had changed, and the model was trained on older documentation. I spent a solid hour debugging a "method not found" error before realizing the AI was writing code for an API that no longer existed. The fix was reading the actual docs — something the AI should have done but couldn't.

Overly permissive defaults. This was the most ironic problem. The AI-generated error handlers were returning full stack traces and internal file paths to the client. The CORS config defaulted to origin: "*". For a security scanner. I was building a tool to catch exactly these mistakes, and the AI was generating them in the tool itself. I caught it during my own review, but it perfectly validated the product thesis: AI writes code that works on the happy path and ignores security on every other path.

Sensitive data in responses. When the scan engine found a hardcoded API key, the initial implementation included the full key in the finding report. Which meant b4uship would be collecting and displaying people's secrets in plaintext. I had to build a masking layer that truncates sensitive values — showing sk-abc123***MASKED*** instead of the full key. This is the kind of thing AI doesn't think about because it optimizes for functionality, not for the security implications of the functionality.
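The masking layer can be as small as one substitution pass. A sketch with an illustrative pattern list; the production rules cover more key formats:

```python
import re

# Illustrative secret formats: OpenAI, Stripe live, AWS, Google.
SECRET_PATTERN = re.compile(
    r"\b(sk_live_[A-Za-z0-9]+|sk-[A-Za-z0-9_\-]{6,}"
    r"|AKIA[0-9A-Z]{16}|AIza[A-Za-z0-9_\-]+)"
)

def mask_secrets(text: str, keep: int = 8) -> str:
    """Keep only the first `keep` characters of any detected secret."""
    return SECRET_PATTERN.sub(lambda m: m.group(0)[:keep] + "***MASKED***", text)
```

Applied to a finding snippet, `mask_secrets('key = sk-abc123XYZ999888', keep=9)` yields `key = sk-abc123***MASKED***` — enough prefix for the owner to recognize which key leaked, without the report itself becoming a secrets store.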

The pattern is consistent: AI-generated code works on the happy path. Normal input, normal network, normal environment. But production is mostly abnormal situations, and handling those still requires human judgment — or at minimum, careful human review.

That's b4uship's reason for existing. Build fast with vibe coding, but run one check before you ship.

How Cursor and Claude Code Split the Work

Using both tools simultaneously, a natural division emerged:

  • Cursor: Code generation, autocomplete, file-level work. Best for building new features from scratch.
  • Claude Code: Code review, debugging, architecture discussion, analyzing complex logic. Best for improving existing code and finding problems.

This combination is genuinely faster than solo development. But "faster" and "better" are different. The speed increase means you need to spend proportionally more time on verification. That's the lesson from this build.

Where Things Stand

b4uship.com is live. Here are the real numbers:

  • Signups: 1 (from a mentoring session referral)
  • Paid conversions: 0
  • Revenue: $0

The product works. You can scan a GitHub repo or a live URL right now without signing up. The scan runs, findings appear, the grade shows up. Payment works — Stripe is integrated, $29 unlocks the full report with Critical and High severity findings.

Nobody has paid yet.

The pricing model:

  • Free: Unlimited scans, see your grade (A+ to F), view Medium and Low findings in full detail. Critical and High findings show titles only — the details are blurred.
  • Launch Scan ($29 one-time): Full report unlocked. All Critical and High findings with exact file, line number, and copy-paste fix code. AI security review with priority recommendations.
  • Guardian ($29/month): Coming soon. Auto-scan on every push, Slack alerts, auto-fix PRs, grade trend tracking.

Current scan capabilities:

Static code analysis — 27 pattern-based rules covering hardcoded secrets, injection vulnerabilities, XSS vectors, auth weaknesses, deployment readiness. Plus Gemini AI review that catches patterns the regex rules miss.

Dynamic URL scanning — 11 concurrent checks covering SSL/TLS, security headers, CORS, sensitive file exposure, cookie security, open redirects, and more.

Auto-fix PRs — One-click GitHub PR generation with the suggested fix. The bot creates a branch, applies the change, and opens the PR with severity, CWE reference, and explanation.

What's on the roadmap:

  • CI/CD integration (scan on every push)
  • More language support (currently focused on JS/TS/Python)
  • Dependency vulnerability scanning
  • Browser extension for instant page-level checks

Try it at b4uship.com. Paste your code or enter a URL — no signup needed.

Next Post

In the next post, I'll share actual scan results from vibe-coded projects. What security issues show up most frequently in AI-generated code, broken down by CWE category. If you're shipping vibe-coded projects, the results might make you uncomfortable. But knowing the problems before your users find them is the whole point.

This is part of the $0 to $10K series on contentstailor.com. One engineer, six products, zero revenue — documenting the process of building and growing products with AI.
