How to audit on-page SEO without falling into guesswork
A data-driven on-page audit framework built on Search Console, crawler and log file evidence. Technical checklist, zero mysticism.
Any on-page audit that opens with 'I think the title could be better' should close with a resignation letter. The problem isn't the opinion, it's the missing baseline. Before touching a single tag you need three things: a raw 16-month Search Console export, a full crawl (Screaming Frog or Sitebulb), and ideally a log file sample. Without that, every recommendation is a guess. In a recent audit for a 40k-URL ecommerce site we found that 31% of indexable pages had received zero organic clicks in 12 months. That's the kind of number that resets the conversation with the client.
The first cut of the audit splits pages into four buckets: ranking well (top 10), ranking poorly (11-30), invisible (31+), and zombies (no impressions). Each bucket needs different surgery. Top 10 pages get CTR work, usually via title and snippet experiments. That's the fine-grained work I detailed in 'Title tags that convert: 7 patterns tested on real SERPs' and 'Does meta description still matter? What CTR data shows'. Zombie pages are usually caused by cannibalization, intent mismatches, or decaying content. Each bucket becomes its own sheet, prioritized by potential traffic, computed as (monthly search volume * expected CTR at the target position - current monthly clicks).
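A minimal sketch of the bucketing and the prioritization formula, under stated assumptions: the CTR curve below is a made-up placeholder (use the curve measured from your own Search Console export), and the function and field names are illustrative, not part of any tool.

```python
# Hypothetical CTR-by-position curve. These numbers are placeholders, not
# benchmarks: derive the real curve from your own GSC data.
ASSUMED_CTR = {1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}


def bucket(avg_position, impressions):
    """Assign a page to one of the four audit buckets."""
    if impressions == 0 or avg_position is None:
        return "zombie"
    if avg_position <= 10:
        return "top10"
    if avg_position <= 30:
        return "11-30"
    return "31+"


def potential_clicks(monthly_volume, target_position, current_clicks):
    """monthly volume * expected CTR at the target position - current clicks."""
    expected_ctr = ASSUMED_CTR.get(target_position, 0.0)
    return monthly_volume * expected_ctr - current_clicks


if __name__ == "__main__":
    print(bucket(18.0, 240))              # page ranking on page two
    print(potential_clicks(1000, 3, 40))  # upside if it reached position 3
```

Sorting each sheet by potential_clicks in descending order gives the priority list. A negative value means the page already beats the assumed curve at that target position.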
Then the crawl. I run Screaming Frog with JavaScript rendering enabled, connected to GSC and GA4 through their APIs. That gives me, on the same row, status code, depth, word count, title, H1, canonical, hreflang, impressions, clicks and sessions. Structural errors surface fast: duplicate titles, missing H1s, canonicals pointing to 404s, redirect chains. On one B2B SaaS client we found 1,200 pages that had self-referencing canonicals, were listed as priority in the sitemap, and were simultaneously excluded from the index via meta noindex. Pure chaos. The details matter, as I argued in 'Canonical tags: common mistakes bleeding your organic traffic' and 'Headings H1-H6: the structure Google actually reads'.
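Contradictory signals like these are easy to flag once the crawl is exported to rows. The sketch below assumes each row is a dict; the keys (url, canonical, canonical_status, meta_robots, in_sitemap, redirect_hops) are hypothetical names, not the column headers of any specific crawler export, so map them to whatever your tool produces.

```python
def conflicting_signals(row):
    """Return the contradictions found in one crawl row (a dict).

    All keys are hypothetical; rename them to match your crawl export.
    """
    issues = []
    noindex = "noindex" in (row.get("meta_robots") or "").lower()
    self_canonical = row.get("canonical") == row.get("url")

    if noindex and row.get("in_sitemap"):
        issues.append("noindex page listed in sitemap")
    if noindex and self_canonical:
        issues.append("noindex page with self-referencing canonical")
    if row.get("canonical_status") == 404:
        issues.append("canonical points to a 404")
    if (row.get("redirect_hops") or 0) > 1:
        issues.append("redirect chain")
    return issues


if __name__ == "__main__":
    example = {
        "url": "https://example.com/pricing",
        "canonical": "https://example.com/pricing",
        "canonical_status": 200,
        "meta_robots": "noindex, follow",
        "in_sitemap": True,
        "redirect_hops": 0,
    }
    for issue in conflicting_signals(example):
        print(issue)
```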
The technical layer comes third. Here we measure real Core Web Vitals (field data from CrUX, not just Lighthouse lab runs), server response time by template, average image weight and HTML-to-JS ratio. In a news portal audit, mobile LCP averaged 4.8s because the hero image was served at 1920px with no srcset. We swapped to responsive AVIF and LCP dropped to 2.1s in three weeks, with an 18% lift in organic clicks on pages that were already ranking. You can only make that causal case when you measure before and after. For images specifically see 'Image optimization: alt text, weight and LCP in practice'. When JavaScript breaks indexing, revisit 'JavaScript SEO: rendering, hydration, and indexing'.
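To compare templates before and after a fix, it helps to summarize field LCP the way CrUX does, at the 75th percentile, and rate it against Google's published thresholds (good up to 2.5s, poor above 4.0s). The sketch below assumes you have per-page-view LCP samples from your own field monitoring, tagged with a template name; the sample values and template labels are hypothetical.

```python
import math
from collections import defaultdict


def p75(values):
    """75th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate_lcp(seconds):
    """Rate an LCP value against the 2.5s / 4.0s thresholds."""
    if seconds <= 2.5:
        return "good"
    if seconds <= 4.0:
        return "needs improvement"
    return "poor"


def lcp_by_template(samples):
    """samples: iterable of (template, lcp_seconds) pairs."""
    grouped = defaultdict(list)
    for template, lcp in samples:
        grouped[template].append(lcp)
    summary = {}
    for template, values in grouped.items():
        value = p75(values)
        summary[template] = (value, rate_lcp(value))
    return summary


if __name__ == "__main__":
    # Hypothetical field samples, in seconds.
    samples = [("article", 4.8), ("article", 5.1), ("article", 3.9),
               ("article", 4.4), ("home", 2.0), ("home", 2.2)]
    for template, (value, rating) in lcp_by_template(samples).items():
        print(template, value, rating)
```

Running the same summary on samples collected before and after the change gives the two numbers the before-and-after comparison needs.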
Content only enters after the skeleton stands. Here I track three per-URL metrics: query coverage (how many unique queries the page ranks for), depth score (words + headings + entities covered vs top-3 competitors), and a dwell signal proxy (sessions / impressions, adjusted by position). Pages with high query coverage but a mediocre average position usually need intent work, not more words. Pages with a good position but poor CTR have a title/snippet problem. That segmentation avoids the classic 'rewrite everything' trap that costs a fortune and moves nothing, the core argument in 'Rewrite or rebuild: making the call with SERP data'.
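The two triage rules above can be written down as a small function. This is only a sketch: the cut-offs (50 queries for "high" coverage, position 5 for "good", position 10 for "mediocre") and the expected_ctr input are assumptions chosen for illustration, and should be calibrated on your own data.

```python
def diagnose(query_coverage, avg_position, ctr, expected_ctr,
             min_coverage=50, good_position=5.0, mediocre_position=10.0):
    """Suggest the type of content work for one URL.

    All thresholds are illustrative assumptions, not fixed rules.
    """
    # Good position but CTR below what that position should earn.
    if avg_position <= good_position and ctr < expected_ctr:
        return "title/snippet"
    # Ranks for many queries, but none of them well.
    if query_coverage >= min_coverage and avg_position > mediocre_position:
        return "intent"
    return "no obvious action"


if __name__ == "__main__":
    print(diagnose(query_coverage=120, avg_position=14.3, ctr=0.01, expected_ctr=0.02))
    print(diagnose(query_coverage=30, avg_position=3.1, ctr=0.02, expected_ctr=0.10))
```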
The final block is interlinking and architecture. I use the crawl to build a graph (URL -> linked URLs) and compute a simple internal PageRank. I compare it against real organic traffic: pages with high internal PR and low traffic are obvious upside, while pages with low internal PR and high traffic are underexploited. On average, moving 15-25 contextual links from hubs to money pages lifts positions by 2-4 spots within 6 weeks. That work underpinned everything I wrote in 'Smart interlinking: the internal authority map'. Finish with a 90-day plan capped at 12 prioritized items, each with a hypothesis, a metric and a review date. An audit that becomes an 80-page PDF nobody executes is worth zero. Takeaway: if you can't summarize the findings in a spreadsheet with an estimated-sessions impact column, you didn't audit, you opined.
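For reference, a simple internal PageRank over the crawl graph can be sketched in a few lines of plain Python. The damping factor of 0.85 and the fixed 50 iterations are conventional defaults, not values from the audits described here, and the example URLs are hypothetical.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links: dict mapping each URL to the list of URLs it links to.
    Returns a dict of URL -> score; scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outlinks): spread its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {
        "/hub": ["/a", "/b"],
        "/a": ["/hub"],
        "/b": ["/hub"],
        "/c": ["/hub"],  # links out, but nothing links to it
    }
    for url, score in sorted(internal_pagerank(graph).items(),
                             key=lambda item: item[1], reverse=True):
        print(f"{url}\t{score:.4f}")
```

Joining these scores with clicks per URL and sorting by the gap between the two rankings surfaces the mismatches described above.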