GEO Implementation Checklist: 22 Items
The GEO implementation checklist has 22 items across 6 layers: (1) AI crawler access via robots.txt, (2) discovery via llms.txt and sitemap, (3) meta tags with Open Graph and article dates, (4) JSON-LD schema, (5) inverted pyramid content structure, (6) Core Web Vitals optimization.
Use this checklist when auditing an existing site or launching a new one. Items are grouped by layer and ordered by impact within each layer; work through them in sequence for maximum effect.
Layer 1: AI Crawler Access
| # | Task | Impact |
|---|---|---|
| 1 | robots.txt with all 8 AI crawlers explicitly allowed (GPTBot, OAI-SearchBot, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Google-Extended, BingBot) | Critical |
| 2 | SSR or SSG active — never CSR-only for indexable content | Critical |
Why it’s critical: A site that blocks AI crawlers or serves only client-side JavaScript is invisible to all generative AI systems, regardless of content quality. Fix this first.
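A minimal robots.txt covering item 1 might look like the following. The sitemap URL is a placeholder, and you may prefer per-path rules to a blanket Allow:

```text
# Explicitly allow the eight AI crawlers from item 1
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: BingBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```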
Layer 2: Discovery
| # | Task | Impact |
|---|---|---|
| 3 | llms.txt at site root with site description and all major pages | Critical |
| 4 | XML sitemap with <lastmod> on all URLs | Critical |
| 22 | Segmented sitemaps by content type (blog, guides, products) | Medium |
Why it matters: Even if AI crawlers can access your site, they need to find your pages. llms.txt provides a curated, human-readable map. XML sitemap provides a machine-readable index with recency signals.
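A sketch of an llms.txt file. The format is an emerging convention (an H1, a blockquote summary, then link lists); all URLs and page names here are placeholders:

```markdown
# Example Site

> Practical guides on generative engine optimization (GEO) for marketing and engineering teams.

## Guides

- [GEO Implementation Checklist](https://yoursite.com/guides/geo-checklist): 22-item audit across 6 layers
- [Schema Markup Basics](https://yoursite.com/guides/schema-basics): JSON-LD for Article, FAQPage, HowTo

## About

- [Company](https://yoursite.com/about): who we are and how to reach us
```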
Layer 3: Meta Tags
| # | Task | Impact |
|---|---|---|
| 5 | <title> specific + <meta name="description"> as direct answer (≤160 chars) | Critical |
| 6 | Open Graph complete: og:type, og:title, og:description, og:url, og:site_name, og:image, og:locale | High |
| 7 | article:published_time + article:modified_time on all article pages | High |
| 8 | <link rel="canonical"> on every page | High |
Key detail on dates: Recency is a primary scoring dimension in AI citation algorithms. Pages without article:published_time get no freshness signal. Update article:modified_time every time you revise content.
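Items 5 through 8 combine roughly like this in a page's `<head>`. Titles, URLs, dates, and copy are placeholders; the `article:*` properties follow the Open Graph article extension:

```html
<head>
  <title>GEO Implementation Checklist: 22 Items in 6 Layers</title>
  <meta name="description" content="A 22-item GEO checklist covering crawler access, discovery, meta tags, schema, content structure, and Core Web Vitals.">
  <link rel="canonical" href="https://yoursite.com/guides/geo-checklist">

  <meta property="og:type" content="article">
  <meta property="og:title" content="GEO Implementation Checklist: 22 Items">
  <meta property="og:description" content="A 22-item GEO checklist across 6 layers.">
  <meta property="og:url" content="https://yoursite.com/guides/geo-checklist">
  <meta property="og:site_name" content="Example Site">
  <meta property="og:image" content="https://yoursite.com/images/geo-checklist.png">
  <meta property="og:locale" content="en_US">

  <meta property="article:published_time" content="2025-01-15T09:00:00+00:00">
  <meta property="article:modified_time" content="2025-06-01T12:00:00+00:00">
</head>
```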
Layer 4: Schema Markup (JSON-LD)
| # | Task | Impact |
|---|---|---|
| 9 | JSON-LD Article schema with publisher + dates on all content pages | Critical |
| 10 | FAQPage schema on pages with question sections | High |
| 11 | HowTo schema on tutorials and step-by-step guides | Medium |
| 12 | BreadcrumbList schema for site hierarchy | Medium |
Data point: Schema markup increases precise information extraction from 16% to 54% (Semrush, 10,000-page study). Pages with correct JSON-LD are 2.5x more likely to appear in AI-generated answers.
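Item 9 as a JSON-LD block. Names, URLs, and dates are placeholders; publisher and both dates are included per the checklist:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "GEO Implementation Checklist: 22 Items",
  "datePublished": "2025-01-15",
  "dateModified": "2025-06-01",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": {
    "@type": "Organization",
    "name": "Example Site",
    "logo": { "@type": "ImageObject", "url": "https://yoursite.com/logo.png" }
  },
  "mainEntityOfPage": "https://yoursite.com/guides/geo-checklist"
}
</script>
```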
Layer 5: Content Structure
| # | Task | Impact |
|---|---|---|
| 13 | Inverted pyramid: answer in first 1-2 sentences after every H2 | Critical |
| 14 | Answer capsules of 40-60 words at section starts | High |
| 15 | Statistics with source citation | High |
| 16 | Direct quotes from named experts | High |
| 17 | H1→H2→H3 semantic hierarchy, one concept per heading | High |
| 18 | Semantic HTML: <article>, <time>, <cite>, <address> | Medium |
| 19 | Descriptive anchor text on internal links — no “click here” | Medium |
Research source: Princeton/Georgia Tech GEO study (2023) quantified the impact: cited statistics +40%, expert quotes +37%, external source references +30%.
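Items 13, 14, 17, 18, and 19 can be seen together in one hypothetical section. The opening paragraph is the 40-60 word answer capsule, placed directly after the H2:

```html
<article>
  <h2>What is llms.txt?</h2>
  <!-- Answer capsule: direct answer in the first sentences after the H2 -->
  <p>llms.txt is a plain Markdown file at a site's root that gives AI systems
     a curated map of the site: a short description plus links to the most
     important pages. It complements the XML sitemap, which remains the
     machine-readable index carrying recency signals.</p>
  <p>Published <time datetime="2025-06-01">June 1, 2025</time>.
     See the <a href="/guides/llms-txt">full llms.txt guide</a>.</p>
</article>
```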
Layer 6: Performance
| # | Task | Impact |
|---|---|---|
| 20 | LCP < 2.5s, INP < 200ms, CLS < 0.1 | High |
| 21 | External brand mentions in industry publications | Medium |
On Core Web Vitals: These are not an accelerator — they’re a minimum threshold. A slow site can be excluded from AI citations even with excellent content. A fast site doesn’t gain advantage from speed alone, but a slow site loses citations.
Priority Order for New Sites
If you’re starting from scratch, implement in this order:
- robots.txt (takes 5 minutes, unlocks everything else)
- SSR/SSG (architectural decision — do this before building content)
- Base meta tags (title, description, canonical on every page)
- JSON-LD Article schema (add to page template so it applies everywhere)
- article:published_time (add to article template)
- llms.txt (create once, update as you add pages)
- Content structure (apply inverted pyramid and answer capsules as you write)
- Open Graph (add to page template)
- FAQPage/HowTo schema (add to specific page types)
- Core Web Vitals (optimize once technical foundation is in place)
Audit Template
For existing sites, use this audit flow:
Step 1: Check robots.txt
→ curl https://yoursite.com/robots.txt | grep -E "GPTBot|ClaudeBot|Perplexity"
→ Expected: Allow: / for each bot
Step 2: Check llms.txt
→ curl https://yoursite.com/llms.txt
→ Expected: Markdown file with site description and page list
Step 3: Check meta tags on a sample page
→ View source → search for article:published_time
→ Expected: ISO 8601 date
Step 4: Check schema on a sample page
→ Google Rich Results Test: https://search.google.com/test/rich-results
→ Expected: Valid Article schema detected
Step 5: Check Core Web Vitals
→ PageSpeed Insights: https://pagespeed.web.dev
→ Expected: LCP < 2.5s, INP < 200ms, CLS < 0.1
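The robots.txt check in step 1 can be scripted once the file has been fetched (e.g. via curl). The sketch below is a simplified group parser, not a full robots.txt implementation; it only reports whether each bot's own group contains an explicit `Allow: /`:

```python
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-User",
           "Claude-SearchBot", "PerplexityBot", "Google-Extended", "BingBot"]

def audit_robots(robots_txt: str) -> dict:
    """Map each AI bot to True if its user-agent group has 'Allow: /'."""
    groups = []                 # list of (agent_names, directives)
    agents, directives = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if directives:      # a new group starts; flush the previous one
                groups.append((agents, directives))
                agents, directives = [], []
            agents.append(value)
        elif key in ("allow", "disallow"):
            directives.append((key, value))
    if agents:
        groups.append((agents, directives))

    results = {}
    for bot in AI_BOTS:
        allowed = False
        for group_agents, group_directives in groups:
            if bot in group_agents:           # last matching group wins
                allowed = ("allow", "/") in group_directives
        results[bot] = allowed
    return results
```

A bot covered only by a `User-agent: *` group reports False here, which matches the checklist's stricter requirement that each crawler be explicitly allowed.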
Full 22-Item Reference
- robots.txt with all 8 AI crawlers allowed
- SSR or SSG (no CSR-only)
- llms.txt at site root
- XML sitemap with lastmod
- Title + meta description on every page
- Open Graph complete
- article:published_time + article:modified_time
- Canonical URL on every page
- JSON-LD Article schema (all content pages)
- FAQPage schema (where applicable)
- HowTo schema (tutorials)
- BreadcrumbList schema
- Inverted pyramid content structure
- Answer capsules (40-60 words)
- Cited statistics
- Named expert quotes
- Semantic heading hierarchy
- Semantic HTML elements
- Descriptive anchor text
- Core Web Vitals (LCP, INP, CLS)
- External brand mentions
- Segmented sitemaps