Make content easy for LLMs to parse
Analyzes how well LLMs can parse and understand the content
- Use semantic HTML headings, paragraphs, and lists — LLMs prefer structured markup
- Avoid content locked behind JavaScript rendering or requiring user interaction
- Write clear, self-contained sections that make sense out of full-page context
- Structured data (JSON-LD) provides machine-readable context alongside human-readable text
Rule Details
LLMs and answer engines such as Google AI Overviews (opens in new tab), Bing Copilot, and Perplexity extract content from web pages and synthesize answers. Pages structured for LLM parsability are cited more accurately and more often, especially when they already follow strong structured-data practices.
Code Examples
<!-- ✅ Good: Semantic structure — LLMs can extract by section -->
<article>
<h1>How to Build a Sourdough Starter</h1>
<p>A sourdough starter is a fermented mixture of flour and water that
captures wild yeast and lactic acid bacteria from the environment.
It takes 7–14 days to become active enough for baking.</p>
<h2>Ingredients</h2>
<ul>
<li>100g whole wheat flour</li>
<li>100ml room-temperature water (filtered or left overnight to dechlorinate)</li>
</ul>
<h2>Day 1: Initial Mix</h2>
<p>Combine flour and water in a clean jar. Stir vigorously until no
dry flour remains. Cover loosely and leave at room temperature (70–75°F).</p>
</article><!-- ❌ Poor: Content in div soup — hard to extract sections -->
<div class="wrapper">
<div class="content-block">
<div class="title-area">Sourdough Starter</div>
<div class="text">Mix flour and water...</div>
</div>
</div>Why It Matters
AI assistants and answer engines (including Google's AI Overviews) extract and cite content from web pages—pages with clear structure and explicit context are more likely to be accurately cited and surfaced in AI-generated responses.
What LLMs Need From Your Content
| Signal | Why It Matters |
|---|---|
| Semantic HTML headings | Defines topic hierarchy; LLMs use headings to segment and label content |
| Self-contained paragraphs | Paragraphs may be quoted independently from surrounding context |
| Explicit Q&A structure | FAQPage schema maps directly to answer-engine question matching |
| Server-rendered text | JavaScript-only content may not be accessible during crawl |
| JSON-LD schema | Provides machine-readable context without ambiguity |
FAQ Schema for Answer Extraction
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "How long does it take to make a sourdough starter?",
"acceptedAnswer": {
"@type": "Answer",
"text": "A sourdough starter typically takes 7–14 days to become reliably active for baking. In the first 3–4 days you may see activity that then slows — this is normal. Consistent daily feeding and warm temperatures (75–80°F) speed up the process."
}
},
{
"@type": "Question",
"name": "What flour is best for a sourdough starter?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Whole wheat or rye flour works best initially because the bran contains more wild yeast and bacteria. Once established, you can switch to unbleached all-purpose flour."
}
}
]
}
</script>HowTo Schema for Step-by-Step Content
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "How to Build a Sourdough Starter",
"totalTime": "P14D",
"step": [
{
"@type": "HowToStep",
"name": "Day 1: Initial mix",
"text": "Combine 100g whole wheat flour and 100ml water. Stir well. Cover loosely.",
"position": 1
},
{
"@type": "HowToStep",
"name": "Days 2–7: Daily feeding",
"text": "Each day, discard 80g of the starter and feed with 50g flour and 50ml water.",
"position": 2
}
]
}
</script>Server-Side Rendering
Content loaded by JavaScript after the initial HTML response may be missed by LLM crawlers:
// ❌ Bad: Content only available after JS execution
useEffect(() => {
setContent(fetchContent())
}, [])
// ✅ Good: Content in initial HTML response (SSR/SSG)
export async function getServerSideProps() {
const content = await fetchContent()
return { props: { content } }
}Writing Self-Contained Sentences
LLMs often extract individual sentences or paragraphs without surrounding context:
❌ "As mentioned above, this approach improves performance."
(What approach? What was mentioned above?)
✅ "Using lazy loading for images improves page performance by reducing initial download size."
(Complete, context-free sentence)Exceptions
- Necessary utility or compliance pages can be intentionally brief and should not be judged by the same editorial-depth expectations as ranking-focused content.
- AI-assisted drafting is not a failure by itself; flag unsupported claims, missing editorial review, or low-originality output instead.
- When a page has both trust-signal issues and crawl/index problems, make the page eligible to rank first and then improve the content quality signals.
Verification
Automated Checks
- Inspect rendered HTML and HTTP headers to confirm the expected metadata or crawlability signal is present.
- Test the affected URL with Google Search Console or equivalent tooling where relevant.
- Re-crawl a representative page set after deployment.
Manual Checks
- Confirm the change does not create conflicting canonical-url, robots, or structured-data signals.
Use with AI
Copy these prompts to use with your AI assistant, or install the MCP server to use directly from Claude, Cursor, or Windsurf.
Check
Verify implementation
Evaluate whether the page content is parseable by an LLM. Check: (1) Is content in semantic HTML tags (<h1>–<h6>, <p>, <ul>, <ol>, <table>)? (2) Is key content accessible without JavaScript? (3) Are section headings descriptive enough to stand alone? (4) Does the page have JSON-LD structured data? (5) Are there FAQ sections or explicit Q&A patterns that match common search queries?
Fix
Auto-fix issues
Restructure content into explicit HTML sections with descriptive headings. Replace JavaScript-rendered content with server-side rendered HTML. Add JSON-LD schema (Article, FAQPage, HowTo) to annotate the content type. Write headings and lead sentences that work as standalone answers—assume the reader only sees one paragraph.
Explain
Learn more
Large language models and answer engines process web content by extracting text from HTML. Pages that use semantic markup, clear headings, and server-rendered content are parsed more accurately than JavaScript-heavy or visually-structured pages. As AI-generated answers increasingly cite specific web sources, well-structured content is more likely to be accurately quoted and linked.
Review
Code review
Check the page's rendered HTML for: (1) proper heading hierarchy (h1→h2→h3), (2) content wrapped in semantic elements (<article>, <section>, <main>), (3) key content visible in initial HTML response (not injected by JS), (4) presence of FAQPage, HowTo, or Article JSON-LD schema, (5) absence of content hidden behind modals, tabs, or accordions that require JS interaction.
