Schema Markup for AI Visibility: What Crawlers Actually Read
AI crawlers don't read your site the way humans do. They can't admire your hero image or appreciate your brand voice. What they can read – fluently – is structured data. Schema markup is how you translate your content into a language AI actually understands.
What schema markup actually is
Schema markup is a standardized vocabulary – maintained at schema.org – that lets you describe the content on your pages in a way machines can parse without guessing. Instead of hoping a crawler figures out that the number 49.99 on your page is a price, you explicitly label it as one.
Think of it as metadata with structure. You wrap facts about your content – names, prices, ratings, publication dates, authorship – in a format that any crawler can ingest on the first pass. No interpretation needed. No ambiguity.
There are three formats for embedding schema: Microdata, RDFa, and JSON-LD. In practice, JSON-LD won. Google recommends it. AI crawlers handle it most reliably. It lives in a <script type="application/ld+json"> tag in your page's <head>, completely separate from your visible HTML, which means it doesn't clutter your templates and is easy to generate server-side.
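A minimal block looks like this (the type and values are placeholders; full per-type examples follow in the implementation guide):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "Example Page"
}
</script>
```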
For bot CRO specifically, schema markup is the single highest-impact change you can make to improve how AI systems understand and represent your content. When GPTBot, ClaudeBot, or PerplexityBot crawls your page, structured data gives it an unambiguous summary of the page before it has to interpret your body content.
Schema types that matter for AI visibility
Schema.org defines hundreds of types. Most of them are irrelevant for AI crawlers. Here are the five that consistently improve how AI systems understand and cite your content.
Product
The most critical type for ecommerce. Defines name, description, price, availability, brand, reviews, and images. AI shopping agents rely on Product schema to compare offerings across stores. Without it, your product is invisible to recommendation engines.
FAQPage
Maps question-and-answer pairs directly into structured data. AI assistants pull from FAQ schema when answering user queries. If your FAQ content is only in HTML, crawlers have to guess which text is a question and which is the answer. Schema removes that ambiguity.
Organization
Establishes your brand identity – name, logo, contact info, social profiles, founding date. AI systems use Organization schema to build knowledge graph entries and determine brand authority. It is the foundation for entity recognition.
HowTo
Breaks instructional content into discrete steps with tools, supplies, and time estimates. AI assistants surface HowTo schema when users ask procedural questions. Each step becomes a self-contained unit the AI can reference or quote.
BreadcrumbList
Defines the hierarchical path to a page within your site. Crawlers use this to understand site architecture and content relationships. It tells AI systems that your running-shoes page lives under footwear, which lives under products – context that plain URLs rarely communicate.
How AI crawlers use structured data
Traditional search engine crawlers like Googlebot use schema to generate rich snippets – star ratings, price ranges, FAQ dropdowns in search results. AI crawlers use it differently.
When GPTBot or ClaudeBot crawls a page, structured data serves as the index card for that page's content. The crawler ingests your JSON-LD block first, building a structured representation of what the page offers. Then it processes the body content with that context already established.
This has three practical implications. First, pages with schema are more likely to be cited in AI-generated answers because the AI already has a clean, structured understanding of the content. Second, the AI can represent your content more accurately – it knows your product costs $49.99, not that it "might cost around fifty dollars." Third, schema helps AI systems de-duplicate information. If three sites sell the same product, the one with complete Product schema gets the most precise representation.
Botjar's crawl replays show you exactly which schema blocks each AI crawler parses and whether they encounter errors during extraction. You can see in real time what GPTBot "understood" about your page versus what you intended it to understand. Learn more about how these crawlers work in our guide to AI crawlers.
Implementation guide: JSON-LD examples
Every JSON-LD block follows the same basic structure: a @context pointing to schema.org and a @type declaring what you are describing. Here are production-ready examples for each schema type.
Product schema
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Ultra Running Shoe Pro",
  "description": "Lightweight trail running shoe with carbon plate.",
  "brand": {
    "@type": "Brand",
    "name": "TrailCo"
  },
  "image": "https://example.com/shoe.jpg",
  "offers": {
    "@type": "Offer",
    "price": "149.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "url": "https://example.com/products/ultra-pro"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.7",
    "reviewCount": "342"
  }
}
</script>

FAQPage schema
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is bot CRO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Bot CRO is conversion rate optimization for non-human visitors – AI crawlers, search bots, and data collectors."
      }
    },
    {
      "@type": "Question",
      "name": "How does schema help AI visibility?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Schema gives AI crawlers structured data they can parse without guessing, increasing accuracy of citations and recommendations."
      }
    }
  ]
}
</script>

Organization schema
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Your Company",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "foundingDate": "2023",
  "sameAs": [
    "https://twitter.com/yourcompany",
    "https://linkedin.com/company/yourcompany"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer service",
    "email": "support@example.com"
  }
}
</script>

BreadcrumbList schema
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://example.com"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Products",
      "item": "https://example.com/products"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Running Shoes",
      "item": "https://example.com/products/running-shoes"
    }
  ]
}
</script>

Place each JSON-LD block inside a <script type="application/ld+json"> tag in your page's <head>. You can include multiple schema blocks on the same page – a Product block, a BreadcrumbList block, and an Organization block can all coexist.
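The fifth type covered earlier, HowTo, follows the same pattern. A sketch with placeholder content (the name, time, tools, and steps are illustrative, not taken from a real page):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to add JSON-LD to a page",
  "totalTime": "PT15M",
  "tool": [
    { "@type": "HowToTool", "name": "Text editor" }
  ],
  "step": [
    {
      "@type": "HowToStep",
      "position": 1,
      "name": "Write the JSON-LD block",
      "text": "Create a script tag with type application/ld+json in the page head."
    },
    {
      "@type": "HowToStep",
      "position": 2,
      "name": "Validate it",
      "text": "Run the block through a schema validator before deploying."
    }
  ]
}
</script>
```

Each step object becomes a self-contained unit an AI assistant can quote when answering a procedural question.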
Common schema mistakes
Getting schema on your page is step one. Getting it right is step two. These are the errors Botjar detects most frequently across client sites.
Missing required properties
A Product schema without an offers block is technically valid but practically useless. AI crawlers need price and availability to make recommendations. Include every property you can populate accurately – brand, image, ratings – not just the required minimums.
Stale or incorrect data
Schema that says a product is in stock when it is actually sold out, or lists last year's price, actively damages your AI visibility. Crawlers compare your schema against your page content. Discrepancies erode trust scores. Generate schema dynamically from your data source, never hardcode it.
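A minimal sketch of dynamic generation in Python, assuming a hypothetical product record (the dict fields and the helper name are illustrative, not a real API). Because the block is rendered from the same data source as the page, price and availability cannot drift:

```python
import json

def product_jsonld(product: dict) -> str:
    """Render a Product JSON-LD block from a live product record,
    so the markup always matches what the page actually shows."""
    availability = ("https://schema.org/InStock" if product["in_stock"]
                    else "https://schema.org/OutOfStock")
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "offers": {
            "@type": "Offer",
            "price": f"{product['price']:.2f}",
            "priceCurrency": product["currency"],
            "availability": availability,
        },
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")

# Regenerate on every render or deploy, never hardcode the values.
tag = product_jsonld({
    "name": "Ultra Running Shoe Pro",
    "price": 149.99,
    "currency": "USD",
    "in_stock": True,
})
```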
Using Microdata instead of JSON-LD
Microdata is embedded inline in your HTML, which makes it fragile – template changes can silently break your schema. JSON-LD lives in a separate script block, making it portable, testable, and easier to maintain. Every major AI crawler handles JSON-LD more reliably.
Duplicate or conflicting schema
Multiple Product schemas on a single product page – often caused by theme plugins and apps adding their own – confuse crawlers. Audit your page source. You should see exactly one schema block per entity being described.
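That audit can be scripted. This sketch counts top-level @type values across the JSON-LD blocks in a saved page source (regex extraction is a simplification; a production audit would use a real HTML parser, and nested @graph structures are not handled here):

```python
import json
import re
from collections import Counter

def schema_type_counts(html: str) -> Counter:
    """Count top-level @type values across all JSON-LD blocks in a page.
    A count above 1 for an entity type like Product is a red flag."""
    pattern = r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>'
    counts = Counter()
    for block in re.findall(pattern, html, re.DOTALL):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # an unparseable block is itself worth flagging
        items = data if isinstance(data, list) else [data]
        for item in items:
            counts[item.get("@type", "unknown")] += 1
    return counts
```

Running this against a product page and seeing, say, two Product entries usually means a theme and a plugin are each injecting their own block.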
Schema on JavaScript-rendered pages without SSR
If your schema is injected via client-side JavaScript, some crawlers will never see it. GPTBot and ClaudeBot have limited JavaScript rendering capability. Ensure your schema is present in the initial HTML response, not injected after hydration.
Testing and validation
After implementing schema, validate it before you let crawlers discover it. Here is the testing workflow we recommend.
Google Rich Results Test
Paste your URL into Google's Rich Results Test. It will parse your schema and flag any errors or warnings. Fix all errors before proceeding. This tool catches syntax issues, missing required fields, and type mismatches.
Schema.org Validator
The official schema.org validator checks compliance with the full vocabulary, including properties that Google does not yet support but AI crawlers may use. It catches issues the Rich Results Test misses.
Botjar Schema Pulse
Botjar's Schema Pulse feature monitors your schema markup continuously. It alerts you when schema breaks due to deployment changes, when data falls out of sync, and when new schema types become relevant for your content. It also shows you exactly what each AI crawler extracted from your schema on every crawl.
For a broader perspective on how AI systems evaluate your site beyond just schema, read our guide on AI Visibility Score.
See what crawlers actually read on your site
Botjar shows you every schema block each AI crawler parses – and what it misses. Stop guessing. Start measuring.
Get Your Free Bot Audit