GEO Implementation Checklist: 22 Items
The GEO implementation checklist has 22 items across 6 layers: (1) AI crawler access via robots.txt, (2) discovery via llms.txt and sitemap, (3) meta tags with Open Graph and article dates, (4) JSON-LD schema, (5) inverted pyramid content structure, (6) Core Web Vitals optimization.
Use this checklist when auditing an existing site or launching a new one. Items are grouped by layer and ordered by impact within each layer; work through them in sequence for maximum effect.
Layer 1: AI Crawler Access
| # | Task | Impact |
|---|---|---|
| 1 | robots.txt with all 8 AI crawlers explicitly allowed (GPTBot, OAI-SearchBot, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Google-Extended, BingBot) | Critical |
| 2 | SSR or SSG active — never CSR-only for indexable content | Critical |
Why it’s critical: A site that blocks AI crawlers or serves only client-side JavaScript is invisible to all generative AI systems, regardless of content quality. Fix this first.
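A minimal robots.txt covering item 1 might look like the following. The sitemap URL is a placeholder, and you may prefer per-path rules to a blanket Allow:

```text
# Explicitly allow the eight AI crawlers from item 1
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: BingBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```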
Layer 2: Discovery
| # | Task | Impact |
|---|---|---|
| 3 | llms.txt at site root with site description and all major pages | Critical |
| 4 | XML sitemap with <lastmod> on all URLs | Critical |
| 22 | Segmented sitemaps by content type (blog, guides, products) | Medium |
Why it matters: Even if AI crawlers can access your site, they need to find your pages. llms.txt provides a curated, human-readable map. XML sitemap provides a machine-readable index with recency signals.
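A sketch of an llms.txt file. The format is an emerging convention (an H1, a blockquote summary, then link lists); all URLs and page names here are placeholders:

```markdown
# Example Site

> Practical guides on generative engine optimization (GEO) for marketing and engineering teams.

## Guides

- [GEO Implementation Checklist](https://yoursite.com/guides/geo-checklist): 22-item audit across 6 layers
- [Schema Markup Basics](https://yoursite.com/guides/schema-basics): JSON-LD for Article, FAQPage, HowTo

## About

- [Company](https://yoursite.com/about): who we are and how to reach us
```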
Layer 3: Meta Tags
| # | Task | Impact |
|---|---|---|
| 5 | <title> specific + <meta name="description"> as direct answer (≤160 chars) | Critical |
| 6 | Open Graph complete: og:type, og:title, og:description, og:url, og:site_name, og:image, og:locale | High |
| 7 | article:published_time + article:modified_time on all article pages | High |
| 8 | <link rel="canonical"> on every page | High |
Key detail on dates: Recency is a primary scoring dimension in AI citation algorithms. Pages without article:published_time get no freshness signal. Update article:modified_time every time you revise content.
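Items 5 through 8 combine roughly like this in a page's `<head>`. Titles, URLs, dates, and copy are placeholders; the `article:*` properties follow the Open Graph article extension:

```html
<head>
  <title>GEO Implementation Checklist: 22 Items in 6 Layers</title>
  <meta name="description" content="A 22-item GEO checklist covering crawler access, discovery, meta tags, schema, content structure, and Core Web Vitals.">
  <link rel="canonical" href="https://yoursite.com/guides/geo-checklist">

  <meta property="og:type" content="article">
  <meta property="og:title" content="GEO Implementation Checklist: 22 Items">
  <meta property="og:description" content="A 22-item GEO checklist across 6 layers.">
  <meta property="og:url" content="https://yoursite.com/guides/geo-checklist">
  <meta property="og:site_name" content="Example Site">
  <meta property="og:image" content="https://yoursite.com/images/geo-checklist.png">
  <meta property="og:locale" content="en_US">

  <meta property="article:published_time" content="2025-01-15T09:00:00+00:00">
  <meta property="article:modified_time" content="2025-06-01T12:00:00+00:00">
</head>
```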
Layer 4: Schema Markup (JSON-LD)
| # | Task | Impact |
|---|---|---|
| 9 | JSON-LD Article schema with publisher + dates on all content pages | Critical |
| 10 | FAQPage schema on pages with question sections | High |
| 11 | HowTo schema on tutorials and step-by-step guides | Medium |
| 12 | BreadcrumbList schema for site hierarchy | Medium |
Data point: Schema markup increases precise information extraction from 16% to 54% (Semrush, 10,000-page study). Pages with correct JSON-LD are 2.5x more likely to appear in AI-generated answers.
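Item 9 as a JSON-LD block. Names, URLs, and dates are placeholders; publisher and both dates are included per the checklist:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "GEO Implementation Checklist: 22 Items",
  "datePublished": "2025-01-15",
  "dateModified": "2025-06-01",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": {
    "@type": "Organization",
    "name": "Example Site",
    "logo": { "@type": "ImageObject", "url": "https://yoursite.com/logo.png" }
  },
  "mainEntityOfPage": "https://yoursite.com/guides/geo-checklist"
}
</script>
```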
Layer 5: Content Structure
| # | Task | Impact |
|---|---|---|
| 13 | Inverted pyramid: answer in first 1-2 sentences after every H2 | Critical |
| 14 | Answer capsules of 40-60 words at section starts | High |
| 15 | Statistics with source citation | High |
| 16 | Direct quotes from named experts | High |
| 17 | H1→H2→H3 semantic hierarchy, one concept per heading | High |
| 18 | Semantic HTML: <article>, <time>, <cite>, <address> | Medium |
| 19 | Descriptive anchor text on internal links — no “click here” | Medium |
Research source: Princeton/Georgia Tech GEO study (2023) quantified the impact: cited statistics +40%, expert quotes +37%, external source references +30%.
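Items 13, 14, 17, 18, and 19 can be seen together in one hypothetical section. The opening paragraph is the 40-60 word answer capsule, placed directly after the H2:

```html
<article>
  <h2>What is llms.txt?</h2>
  <!-- Answer capsule: direct answer in the first sentences after the H2 -->
  <p>llms.txt is a plain Markdown file at a site's root that gives AI systems
     a curated map of the site: a short description plus links to the most
     important pages. It complements the XML sitemap, which remains the
     machine-readable index carrying recency signals.</p>
  <p>Published <time datetime="2025-06-01">June 1, 2025</time>.
     See the <a href="/guides/llms-txt">full llms.txt guide</a>.</p>
</article>
```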
Layer 6: Performance
| # | Task | Impact |
|---|---|---|
| 20 | LCP < 2.5s, INP < 200ms, CLS < 0.1 | High |
| 21 | External brand mentions in industry publications | Medium |
On Core Web Vitals: These are not an accelerator — they’re a minimum threshold. A slow site can be excluded from AI citations even with excellent content. A fast site doesn’t gain advantage from speed alone, but a slow site loses citations.
Priority Order for New Sites
If you’re starting from scratch, implement in this order:
- robots.txt (takes 5 minutes, unlocks everything else)
- SSR/SSG (architectural decision — do this before building content)
- Base meta tags (title, description, canonical on every page)
- JSON-LD Article schema (add to page template so it applies everywhere)
- article:published_time (add to article template)
- llms.txt (create once, update as you add pages)
- Content structure (apply inverted pyramid and answer capsules as you write)
- Open Graph (add to page template)
- FAQPage/HowTo schema (add to specific page types)
- Core Web Vitals (optimize once technical foundation is in place)
Audit Template
For existing sites, use this audit flow:
Step 1: Check robots.txt
→ curl https://yoursite.com/robots.txt | grep -E "GPTBot|ClaudeBot|Perplexity"
→ Expected: Allow: / for each bot
Step 2: Check llms.txt
→ curl https://yoursite.com/llms.txt
→ Expected: Markdown file with site description and page list
Step 3: Check meta tags on a sample page
→ View source → search for article:published_time
→ Expected: ISO 8601 date
Step 4: Check schema on a sample page
→ Google Rich Results Test: https://search.google.com/test/rich-results
→ Expected: Valid Article schema detected
Step 5: Check Core Web Vitals
→ PageSpeed Insights: https://pagespeed.web.dev
→ Expected: LCP < 2.5s, INP < 200ms, CLS < 0.1
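The robots.txt check in step 1 can be scripted once the file has been fetched (e.g. via curl). The sketch below is a simplified group parser, not a full robots.txt implementation; it only reports whether each bot's own group contains an explicit `Allow: /`:

```python
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-User",
           "Claude-SearchBot", "PerplexityBot", "Google-Extended", "BingBot"]

def audit_robots(robots_txt: str) -> dict:
    """Map each AI bot to True if its user-agent group has 'Allow: /'."""
    groups = []                 # list of (agent_names, directives)
    agents, directives = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if directives:      # a new group starts; flush the previous one
                groups.append((agents, directives))
                agents, directives = [], []
            agents.append(value)
        elif key in ("allow", "disallow"):
            directives.append((key, value))
    if agents:
        groups.append((agents, directives))

    results = {}
    for bot in AI_BOTS:
        allowed = False
        for group_agents, group_directives in groups:
            if bot in group_agents:           # last matching group wins
                allowed = ("allow", "/") in group_directives
        results[bot] = allowed
    return results
```

A bot covered only by a `User-agent: *` group reports False here, which matches the checklist's stricter requirement that each crawler be explicitly allowed.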
Full 22-Item Reference
- robots.txt with all 8 AI crawlers allowed
- SSR or SSG (no CSR-only)
- llms.txt at site root
- XML sitemap with lastmod
- Title + meta description on every page
- Open Graph complete
- article:published_time + article:modified_time
- Canonical URL on every page
- JSON-LD Article schema (all content pages)
- FAQPage schema (where applicable)
- HowTo schema (tutorials)
- BreadcrumbList schema
- Inverted pyramid content structure
- Answer capsules (40-60 words)
- Cited statistics
- Named expert quotes
- Semantic heading hierarchy
- Semantic HTML elements
- Descriptive anchor text
- Core Web Vitals (LCP, INP, CLS)
- External brand mentions
- Segmented sitemaps