GEO Implementation Checklist: 22 Items

The GEO implementation checklist has 22 items across 6 layers: (1) AI crawler access via robots.txt, (2) discovery via llms.txt and sitemap, (3) meta tags with Open Graph and article dates, (4) JSON-LD schema, (5) inverted pyramid content structure, (6) Core Web Vitals optimization.

Use this checklist when auditing an existing site or launching a new one. Items are ordered by impact — complete them in sequence for maximum effect.

Layer 1: AI Crawler Access

| # | Task | Impact |
|---|------|--------|
| 1 | robots.txt with all 8 AI crawlers explicitly allowed (GPTBot, OAI-SearchBot, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Google-Extended, BingBot) | Critical |
| 2 | SSR or SSG active — never CSR-only for indexable content | Critical |

Why it’s critical: A site that blocks AI crawlers or serves only client-side JavaScript is invisible to all generative AI systems, regardless of content quality. Fix this first.
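A minimal robots.txt covering item 1 might look like the sketch below. Consecutive User-agent lines form one group that shares the directives that follow, so a single Allow covers all eight bots. The sitemap URL is a placeholder — use your own domain and policy:

```txt
# Explicitly allow AI crawlers (item 1)
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: Claude-User
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: BingBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```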

Layer 2: Discovery

| # | Task | Impact |
|---|------|--------|
| 3 | llms.txt at site root with site description and all major pages | Critical |
| 4 | XML sitemap with `<lastmod>` on all URLs | Critical |
| 22 | Segmented sitemaps by content type (blog, guides, products) | Medium |

Why it matters: Even if AI crawlers can access your site, they need to find your pages. llms.txt provides a curated, human-readable map. XML sitemap provides a machine-readable index with recency signals.
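As a sketch, an llms.txt following the common convention (an H1 site name, a blockquote description, then linked page lists) might look like this — all names and URLs are placeholders:

```markdown
# Example Site

> Practical guides on technical SEO and generative engine optimization.

## Guides
- [GEO Implementation Checklist](https://yoursite.com/guides/geo-checklist): 22-item audit checklist across 6 layers
- [Schema Markup Basics](https://yoursite.com/guides/schema-basics): JSON-LD for Article, FAQPage, and HowTo

## About
- [About Us](https://yoursite.com/about): Who we are and how to contact us
```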

Layer 3: Meta Tags

| # | Task | Impact |
|---|------|--------|
| 5 | `<title>` specific + `<meta name="description">` as direct answer (≤160 chars) | Critical |
| 6 | Open Graph complete: og:type, og:title, og:description, og:url, og:site_name, og:image, og:locale | High |
| 7 | article:published_time + article:modified_time on all article pages | High |
| 8 | `<link rel="canonical">` on every page | High |

Key detail on dates: Recency is a primary scoring dimension in AI citation algorithms. Pages without article:published_time get no freshness signal. Update article:modified_time every time you revise content.
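Items 5-8 combined in one `<head>` might look like the fragment below. Titles, URLs, and dates are placeholders; the article:* dates use ISO 8601 with a timezone offset:

```html
<head>
  <title>GEO Implementation Checklist: 22 Items</title>
  <meta name="description" content="A 22-item GEO checklist across 6 layers: crawler access, discovery, meta tags, schema, content structure, and performance.">
  <link rel="canonical" href="https://yoursite.com/guides/geo-checklist">

  <meta property="og:type" content="article">
  <meta property="og:title" content="GEO Implementation Checklist: 22 Items">
  <meta property="og:description" content="A 22-item GEO checklist across 6 layers.">
  <meta property="og:url" content="https://yoursite.com/guides/geo-checklist">
  <meta property="og:site_name" content="Example Site">
  <meta property="og:image" content="https://yoursite.com/images/geo-checklist.png">
  <meta property="og:locale" content="en_US">

  <meta property="article:published_time" content="2025-01-15T09:00:00+00:00">
  <meta property="article:modified_time" content="2025-06-02T14:30:00+00:00">
</head>
```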

Layer 4: Schema Markup (JSON-LD)

| # | Task | Impact |
|---|------|--------|
| 9 | JSON-LD Article schema with publisher + dates on all content pages | Critical |
| 10 | FAQPage schema on pages with question sections | High |
| 11 | HowTo schema on tutorials and step-by-step guides | Medium |
| 12 | BreadcrumbList schema for site hierarchy | Medium |

Data point: Schema markup increases precise information extraction from 16% to 54% (Semrush, 10,000-page study). Pages with correct JSON-LD are 2.5x more likely to appear in AI-generated answers.
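A minimal JSON-LD Article block for item 9 might look like this sketch — every name, URL, and date below is a placeholder:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "GEO Implementation Checklist: 22 Items",
  "datePublished": "2025-01-15T09:00:00+00:00",
  "dateModified": "2025-06-02T14:30:00+00:00",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": {
    "@type": "Organization",
    "name": "Example Site",
    "logo": { "@type": "ImageObject", "url": "https://yoursite.com/logo.png" }
  }
}
</script>
```

Keep dateModified in sync with article:modified_time — the two freshness signals should agree.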

Layer 5: Content Structure

| # | Task | Impact |
|---|------|--------|
| 13 | Inverted pyramid: answer in first 1-2 sentences after every H2 | Critical |
| 14 | Answer capsules of 40-60 words at section starts | High |
| 15 | Statistics with source citation | High |
| 16 | Direct quotes from named experts | High |
| 17 | H1→H2→H3 semantic hierarchy, one concept per heading | High |
| 18 | Semantic HTML: `<article>`, `<time>`, `<cite>`, `<address>` | Medium |
| 19 | Descriptive anchor text on internal links — no “click here” | Medium |

Research source: Princeton/Georgia Tech GEO study (2023) quantified the impact: cited statistics +40%, expert quotes +37%, external source references +30%.
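Items 13, 14, and 18 can work together in one section. The markup below is an illustrative sketch — heading, text, and date are placeholders:

```html
<article>
  <h2>What is an answer capsule?</h2>
  <!-- Inverted pyramid (item 13): the direct answer opens the section,
       phrased as a self-contained capsule (item 14) -->
  <p>An answer capsule is a 40-60 word summary placed immediately after
     a heading. It gives AI systems a quotable, self-contained answer
     before any supporting detail appears.</p>
  <p>Supporting context, examples, and caveats follow in later paragraphs.</p>
  <!-- Semantic HTML (item 18): machine-readable date -->
  <time datetime="2025-06-02">Updated June 2, 2025</time>
</article>
```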

Layer 6: Performance

| # | Task | Impact |
|---|------|--------|
| 20 | LCP < 2.5s, INP < 200ms, CLS < 0.1 | High |
| 21 | External brand mentions in industry publications | Medium |

On Core Web Vitals: These are not an accelerator — they’re a minimum threshold. A slow site can be excluded from AI citations even with excellent content. A fast site doesn’t gain advantage from speed alone, but a slow site loses citations.
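The threshold logic in item 20 can be expressed as a small hypothetical helper — cwv_passes and its metric names are illustrative, not part of any real API:

```python
# "Good" thresholds from item 20: LCP < 2.5 s, INP < 200 ms, CLS < 0.1
THRESHOLDS = {"lcp": 2.5, "inp": 200, "cls": 0.1}

def cwv_passes(metrics: dict) -> dict:
    """Return {metric: True/False} for each measured value vs. its threshold."""
    return {name: metrics[name] < limit for name, limit in THRESHOLDS.items()}

# One slow interaction metric is enough to fail the minimum threshold
result = cwv_passes({"lcp": 1.9, "inp": 240, "cls": 0.05})
print(result)  # {'lcp': True, 'inp': False, 'cls': True}
```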

Priority Order for New Sites

If you’re starting from scratch, implement in this order:

  1. robots.txt (takes 5 minutes, unlocks everything else)
  2. SSR/SSG (architectural decision — do this before building content)
  3. Base meta tags (title, description, canonical on every page)
  4. JSON-LD Article schema (add to page template so it applies everywhere)
  5. article:published_time (add to article template)
  6. llms.txt (create once, update as you add pages)
  7. Content structure (apply inverted pyramid and answer capsules as you write)
  8. Open Graph (add to page template)
  9. FAQPage/HowTo schema (add to specific page types)
  10. Core Web Vitals (optimize once technical foundation is in place)

Audit Template

For existing sites, use this audit flow:

Step 1: Check robots.txt
  → curl https://yoursite.com/robots.txt | grep -E "GPTBot|ClaudeBot|Perplexity"
  → Expected: Allow: / for each bot

Step 2: Check llms.txt
  → curl https://yoursite.com/llms.txt
  → Expected: Markdown file with site description and page list

Step 3: Check meta tags on a sample page
  → View source → search for article:published_time
  → Expected: ISO 8601 date

Step 4: Check schema on a sample page
  → Google Rich Results Test: https://search.google.com/test/rich-results
  → Expected: Valid Article schema detected

Step 5: Check Core Web Vitals
  → PageSpeed Insights: https://pagespeed.web.dev
  → Expected: LCP < 2.5s, INP < 200ms, CLS < 0.1
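Step 1 can be partially automated. The sketch below is a hypothetical helper (allowed_bots is not a real library function) with deliberately simplified parsing — it handles user-agent groups and a bare Allow: /, but not wildcard paths or allow/disallow precedence:

```python
# Simplified robots.txt check for Step 1 of the audit.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]

def allowed_bots(robots_txt: str) -> list[str]:
    """Return the AI bots that fall in a group containing 'Allow: /'."""
    allowed = set()
    group = []                      # user-agents in the current record
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            group = []              # a blank line ends the record
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            group.append(value.lower())
        elif field == "allow" and value == "/":
            allowed.update(group)
    return [bot for bot in AI_BOTS if bot.lower() in allowed]

sample = """User-agent: GPTBot
User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Disallow: /
"""
print(allowed_bots(sample))  # ['GPTBot', 'ClaudeBot']
```

Feed it the body of `curl https://yoursite.com/robots.txt` and compare the result against the 8-bot list from item 1.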

Full 22-Item Reference

  1. robots.txt with all 8 AI crawlers allowed
  2. SSR or SSG (no CSR-only)
  3. llms.txt at site root
  4. XML sitemap with lastmod
  5. Title + meta description on every page
  6. Open Graph complete
  7. article:published_time + article:modified_time
  8. Canonical URL on every page
  9. JSON-LD Article schema (all content pages)
  10. FAQPage schema (where applicable)
  11. HowTo schema (tutorials)
  12. BreadcrumbList schema
  13. Inverted pyramid content structure
  14. Answer capsules (40-60 words)
  15. Cited statistics
  16. Named expert quotes
  17. Semantic heading hierarchy
  18. Semantic HTML elements
  19. Descriptive anchor text
  20. Core Web Vitals (LCP, INP, CLS)
  21. External brand mentions
  22. Segmented sitemaps