GEO spoke — developer reference

Structured data for AI search

Structured data is the single most under-deployed lever for AI search visibility. AI platforms read JSON-LD preferentially over body text when extracting facts for citation. The leverage gap is large; the cost to close it is small. This is the complete JSON-LD reference — every schema that materially affects AI citation, with production-ready code, validation tools, and the seven mistakes that produce broken or sparse schema in production.

Updated May 2026

TL;DR

  1. AI platforms prefer JSON-LD over body text for fact extraction. A page with comprehensive JSON-LD is materially more citable than the same page without it.
  2. Use JSON-LD (not Microdata, not RDFa). Google explicitly prefers it; AI platforms parse it most reliably; it's decoupled from visible markup.
  3. Tier 1 (everyone deploys): Organization, Article, FAQPage, WebSite. FAQPage is the single highest individual citation lever; Organization+sameAs is the entity-recognition foundation.
  4. Validate every priority page in Google Rich Results Test + Schema Markup Validator before deploy. Schema that doesn't match visible content is now an active negative signal.

What JSON-LD is and why AI platforms prefer it

JSON-LD stands for JavaScript Object Notation for Linked Data: a JSON-based format for embedding structured data in web pages, standardized by the W3C. The format itself is vocabulary-agnostic; for web structured data it is paired with schema.org, a vendor-neutral vocabulary founded by Google, Microsoft, Yahoo, and Yandex.

JSON-LD lives inside <script type="application/ld+json"> tags in your HTML, typically in the <head>. It does not affect human-visible page rendering. It exists exclusively for machine consumption.
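To make the "machine consumption" point concrete, the sketch below pulls every application/ld+json block out of an HTML string and parses it, roughly as a crawler might. It is an illustration only: the extractJsonLd name and the regex-based approach are assumptions made for brevity, and a production crawler would use a real HTML parser.

```typescript
// Minimal sketch: collect every JSON-LD block found in an HTML string.
// Regex extraction is an assumption for brevity, not how any specific crawler works.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type\s*=\s*["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch (_err) {
      // Malformed JSON is skipped; a real audit tool would report it instead.
    }
  }
  return blocks;
}

const sampleHtml = `
<html><head>
<script type="application/ld+json">
{ "@context": "https://schema.org", "@type": "Organization", "name": "Citare" }
</script>
</head><body><h1>Citare</h1></body></html>`;

const foundBlocks = extractJsonLd(sampleHtml);
console.log(foundBlocks.length, JSON.stringify(foundBlocks[0]));
```

Nothing in the visible body (the h1) is touched; only the script payload is read.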

AI platforms — Google AI Overview, Gemini, ChatGPT, Claude, Perplexity — face an extraction problem when generating answers. From any page: what is the canonical brand name? The price? Who is the author? When was it last updated? Body text is ambiguous. JSON-LD is unambiguous. When a page provides both, AI platforms prefer JSON-LD for factual extraction. Citation context becomes more reliable when the model extracts structured claims rather than parsing prose.

JSON-LD vs Microdata vs RDFa

Schema.org markup can be expressed in three formats. JSON-LD is the recommended one in 2026: Google explicitly prefers it, AI platforms parse it most reliably, and it's decoupled from visible markup so design changes don't break your structured data. Microdata embeds attributes (itemscope, itemtype, itemprop) inline in HTML; RDFa does the same with its own attribute set (vocab, typeof, property). Both are legacy. Use JSON-LD.

Schema priority — what to deploy in order

Not all schemas have equal citation leverage. Deploy in tier order; measure lift before moving to the next tier.

Tier 1

Everyone deploys these

Organization · Homepage

Entity recognition foundation. Powers Knowledge Graph + sameAs disambiguation across every AI platform.

Article · Every long-form content page

Blog posts, guides, research. Carries dateModified (the freshness signal Google AI Overview, abbreviated AIO, uses for citation weighting).

FAQPage · Any page with Q&A content

The single highest-leverage schema for AIO citation. AI extracts answers verbatim from FAQPage mainEntity.

WebSite + SearchAction · Homepage

Signals canonical site identity. Google retired the sitelinks search box rich result in November 2024, so SearchAction no longer produces a Google rich result by itself; it still describes your site search to other consumers.

Tier 2

Most brands deploy these

WebPage · Non-Article content pages

Wrapper for pages that aren't blog/guide content (pricing, about, contact).

BreadcrumbList · Every page beyond homepage

Navigation hierarchy signal. Helps AI platforms understand your IA.

Person · Author bylines + team pages

Linked from Article.author. E-E-A-T authorial credibility signal.

Product · Every ecommerce product page

Powers AI citation in 'best of', 'top', and recommendation queries.

LocalBusiness · Every physical-location page

AIO geo-contextualization + Gemini local query handling.

Tier 3

Use-case specific

  • HowTo · Procedural / step-by-step content
  • Recipe · Recipe content (food vertical)
  • Event · Events with date / location / registration
  • JobPosting · Hiring pages
  • SoftwareApplication · SaaS product pages
  • Service · Service offering pages
  • Review + AggregateRating · Review content + social proof signals
  • VideoObject · Video content with thumbnails + duration + transcript

Production-ready code samples

Copy-pasteable JSON-LD for the highest-leverage schemas. Replace example values; validate in Rich Results Test; ship.

Organization schema

Homepage · highest entity-recognition leverage

Tells AI platforms what your brand entity is. Powers Knowledge Graph, sameAs disambiguation, and canonical reference identity across every AI platform. The sameAs array is the single highest-leverage property — aim for 5-10 canonical references (LinkedIn, Crunchbase, Wikipedia where applicable, official social channels).

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Citare",
  "alternateName": "Citare AI",
  "url": "https://citare.ai",
  "logo": "https://citare.ai/logo.png",
  "description": "AI search visibility platform measuring brand presence across ChatGPT, Gemini, Claude, Perplexity, and Google AI Overview.",
  "foundingDate": "2024",
  "founder": [
    { "@type": "Person", "name": "Ravi RDP" }
  ],
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Bangalore",
    "addressRegion": "KA",
    "addressCountry": "IN"
  },
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "customer support",
    "email": "support@citare.ai",
    "availableLanguage": ["English"]
  },
  "areaServed": "Worldwide",
  "sameAs": [
    "https://www.linkedin.com/company/citare-ai",
    "https://x.com/citare_ai",
    "https://www.crunchbase.com/organization/citare",
    "https://github.com/citare-ai"
  ]
}

Watch out: common mistakes include sparse properties (only name + url), a logo URL pointing to a non-public CDN, an address without addressCountry, and sameAs entries that redirect or have gone stale.
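Several of these Organization mistakes can be caught before deploy with a small offline check. The sketch below is one possible shape for it; the lintOrganization name, the warning wording, and the "two properties or fewer counts as sparse" threshold are assumptions for illustration. Redirecting or stale sameAs URLs still need a live HTTP check, which this sketch does not attempt.

```typescript
// Hypothetical pre-deploy lint for an Organization block (offline checks only).
type OrgSchema = {
  name?: string;
  url?: string;
  address?: { addressCountry?: string; [key: string]: unknown };
  sameAs?: string[];
  [key: string]: unknown;
};

function lintOrganization(org: OrgSchema): string[] {
  const warnings: string[] = [];
  // Ignore JSON-LD keywords such as @context and @type when counting properties.
  const props = Object.keys(org).filter((key) => key.charAt(0) !== "@");
  if (props.length <= 2) {
    warnings.push("sparse schema: only " + props.join(" + "));
  }
  if (org.address && !org.address.addressCountry) {
    warnings.push("address is missing addressCountry");
  }
  for (const link of org.sameAs || []) {
    if (link.indexOf("https://") !== 0) {
      warnings.push("sameAs entry is not an https URL: " + link);
    }
  }
  return warnings;
}

const sparseWarnings = lintOrganization({
  "@type": "Organization",
  name: "Citare",
  url: "https://citare.ai",
});
console.log(sparseWarnings);
```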

Article schema

Every long-form page

Applies to blog posts, guides, research, and editorial content. The dateModified property is the freshness signal AIO uses for citation weighting; pages with a recent dateModified are cited at higher rates. Use Article as the default, BlogPosting only if your CMS or audience expects that framing, and NewsArticle only for genuine news (it has stricter rich-result rules).

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Measure AI Search Visibility",
  "description": "A complete framework for measuring AI search visibility — query design, persona dispatch, citation parsing, surface rate, competitor benchmarking.",
  "image": ["https://citare.ai/guides/measure-ai-search-visibility/hero.png"],
  "author": {
    "@type": "Person",
    "name": "Ravi RDP",
    "url": "https://citare.ai/team/ravi"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Citare",
    "url": "https://citare.ai",
    "logo": {
      "@type": "ImageObject",
      "url": "https://citare.ai/logo.png"
    }
  },
  "datePublished": "2026-05-04",
  "dateModified": "2026-05-20",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://citare.ai/guides/measure-ai-search-visibility"
  },
  "keywords": "AI search visibility measurement, surface rate, persona dispatch"
}

Watch out: Update dateModified whenever you make non-trivial revisions. Don't game it — Google penalizes dateModified updates without real content changes.
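One way to keep dateModified honest is to derive it from the content rather than from the deploy timestamp. The sketch below bumps the date only when the body text changes beyond whitespace; the StoredArticle shape, the function names, and the whitespace-only definition of "trivial" are all assumptions you would adapt to your CMS.

```typescript
// Hypothetical record of what was last published.
type StoredArticle = { body: string; dateModified: string };

// Collapse whitespace so reflowed or re-indented text does not count as a change.
function normalizeBody(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Returns the dateModified to publish: unchanged content keeps the old date.
function nextDateModified(stored: StoredArticle, newBody: string, today: string): string {
  return normalizeBody(stored.body) === normalizeBody(newBody)
    ? stored.dateModified
    : today;
}

const storedArticle: StoredArticle = {
  body: "GEO is  a new discipline.",
  dateModified: "2026-05-04",
};

// Whitespace-only edit: date stays put.
console.log(nextDateModified(storedArticle, "GEO is a new discipline.\n", "2026-05-20"));
// Real revision: date moves to today.
console.log(nextDateModified(storedArticle, "GEO is an established discipline.", "2026-05-20"));
```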

FAQPage schema

Highest single AIO citation lift

The single schema with the largest measurable lift on AIO citation rate. AI platforms extract question-answer pairs directly from FAQPage mainEntity into their generated answers. Effective FAQ content has three properties: question phrasing matches conversational queries (not SEO target lists), answers are direct and self-contained, and answers are factually dense with specific claims.

{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is Generative Engine Optimization (GEO)?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "GEO is the practice of optimizing your brand and content to be cited or recommended by AI-powered search platforms including Google AI Overview, ChatGPT, Gemini, Claude, and Perplexity."
      }
    },
    {
      "@type": "Question",
      "name": "How is GEO different from SEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "SEO optimizes pages to rank in Google's link-based results. GEO optimizes content to be cited in AI-generated answers. Ranking logic, measurement methodology, and optimization tactics are structurally different."
      }
    }
  ]
}

Watch out: Schema FAQ MUST match visible page content — Google penalizes invisible-FAQ tricks. Aim for 8-15 questions per FAQPage on content-heavy guides. Front-load the answer; AIO extracts the first 1-3 sentences typically.
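The simplest way to guarantee that FAQ schema matches visible content is to generate both from the same data. The sketch below assumes a hypothetical Faq array as the single source of truth; buildFaqPage and renderFaqHtml are illustrative names, and the h3/p markup is a placeholder for whatever your templates actually render.

```typescript
// Single source of truth for both the visible FAQ and the schema.
type Faq = { question: string; answer: string };

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Builds the FAQPage JSON-LD object from the shared array.
function buildFaqPage(faqs: Faq[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((faq) => ({
      "@type": "Question",
      name: faq.question,
      acceptedAnswer: { "@type": "Answer", text: faq.answer },
    })),
  };
}

// Renders the visible section from the same array, so the two cannot drift apart.
function renderFaqHtml(faqs: Faq[]): string {
  return faqs
    .map((faq) => "<h3>" + escapeHtml(faq.question) + "</h3><p>" + escapeHtml(faq.answer) + "</p>")
    .join("\n");
}

const faqItems: Faq[] = [
  { question: "What is Generative Engine Optimization (GEO)?", answer: "GEO is the practice of optimizing content to be cited by AI search platforms." },
  { question: "How is GEO different from SEO?", answer: "SEO targets ranked links. GEO targets citations in AI-generated answers." },
];

const faqSchema = buildFaqPage(faqItems);
const faqHtml = renderFaqHtml(faqItems);
console.log(faqSchema.mainEntity.length, faqHtml.length > 0);
```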

Product schema

Ecommerce — every product page

Powers ecommerce AI citation in 'best of', 'top', and recommendation queries. Product identifiers help AI platforms disambiguate products: sku is your own merchant-level code, while gtin, mpn, and isbn are recognized across retailers. aggregateRating + review are heavily cited as social proof signals in recommendation queries.

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Citare Pro Plan",
  "description": "AI search visibility monitoring across ChatGPT, Gemini, Claude, Perplexity, and Google AI Overview with persona-anchored dispatch and competitor benchmarking.",
  "image": ["https://citare.ai/products/pro/hero.png"],
  "brand": { "@type": "Brand", "name": "Citare" },
  "sku": "CITARE-PRO-MONTHLY",
  "offers": {
    "@type": "Offer",
    "url": "https://citare.ai/pricing",
    "priceCurrency": "USD",
    "price": "119.00",
    "priceValidUntil": "2026-12-31",
    "availability": "https://schema.org/InStock",
    "seller": { "@type": "Organization", "name": "Citare" }
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.7",
    "reviewCount": "42"
  }
}

Watch out: For multi-variant products (size/color/configuration), use ProductGroup with hasVariant arrays of individual Product items. Do not collapse variants into a single Product if they have different prices or SKUs.
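As a rough illustration of the ProductGroup pattern, the sketch below builds one group with a separate Product (own SKU, own price) per size. The product itself ("Citare Logo Tee"), the SKUs, and the buildProductGroup helper are hypothetical; productGroupID, variesBy, and hasVariant are the schema.org properties that tie the variants together.

```typescript
// Hypothetical variant record; only size varies in this sketch.
type Variant = { sku: string; size: string; price: string };

function buildProductGroup(groupName: string, groupId: string, currency: string, variants: Variant[]) {
  return {
    "@context": "https://schema.org",
    "@type": "ProductGroup",
    name: groupName,
    productGroupID: groupId,
    // Declares which property distinguishes the variants.
    variesBy: ["https://schema.org/size"],
    hasVariant: variants.map((variant) => ({
      "@type": "Product",
      name: groupName + " (" + variant.size + ")",
      sku: variant.sku,
      size: variant.size,
      offers: {
        "@type": "Offer",
        priceCurrency: currency,
        price: variant.price,
        availability: "https://schema.org/InStock",
      },
    })),
  };
}

const teeGroup = buildProductGroup("Citare Logo Tee", "CITARE-TEE", "USD", [
  { sku: "CITARE-TEE-S", size: "S", price: "24.00" },
  { sku: "CITARE-TEE-XXL", size: "XXL", price: "27.00" },
]);
console.log(JSON.stringify(teeGroup, null, 2));
```

Each variant keeps its own sku and offers, which is the point of the warning above: different prices never get flattened into one Product.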

LocalBusiness schema

Physical-location pages

Powers AIO geo-contextualization (city-level brand citation) and Gemini local query handling. Required for any brand with physical presence. Multi-location brands should deploy distinct LocalBusiness schema on each location's landing page. Don't collapse all locations into homepage Organization schema; that dilutes geo-specific signals.

{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Citare HQ",
  "image": "https://citare.ai/locations/bangalore.jpg",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 MG Road",
    "addressLocality": "Bangalore",
    "addressRegion": "KA",
    "postalCode": "560001",
    "addressCountry": "IN"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 12.9716,
    "longitude": 77.5946
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "09:00",
      "closes": "18:00"
    }
  ],
  "telephone": "+91-80-1234-5678",
  "priceRange": "$$",
  "url": "https://citare.ai/locations/bangalore",
  "areaServed": "India"
}

Watch out: Include geo coordinates + openingHoursSpecification for full AIO eligibility on 'near me' and 'open now' queries. priceRange uses $ symbols ($ to $$$$).

HowTo schema

Procedural content

Highly citable structure for 'how to X' queries on AIO and Perplexity. Rewards explicit step structure with names, descriptions, and optional URLs anchoring each step to a section of the page. Express totalTime in ISO 8601 duration format (PT15M = 15 minutes).

{
  "@context": "https://schema.org",
  "@type": "HowTo",
  "name": "How to Configure robots.txt for AI Crawlers",
  "description": "Step-by-step guide to allowing GPTBot, ClaudeBot, PerplexityBot, and Google-Extended in robots.txt for AI search visibility.",
  "totalTime": "PT15M",
  "tool": [
    { "@type": "HowToTool", "name": "Text editor" },
    { "@type": "HowToTool", "name": "FTP or file system access to web root" }
  ],
  "step": [
    {
      "@type": "HowToStep",
      "name": "Open your robots.txt file",
      "text": "Locate robots.txt at the root of your web server. If none exists, create one.",
      "url": "https://citare.ai/ai-bot-crawlers#step-1"
    },
    {
      "@type": "HowToStep",
      "name": "Add the AI crawler allow list",
      "text": "Add named-bot allow rules for Googlebot, Google-Extended, Bingbot, GPTBot, ClaudeBot, and PerplexityBot.",
      "url": "https://citare.ai/ai-bot-crawlers#step-2"
    }
  ]
}
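If your CMS stores task length as a number of minutes, totalTime has to be converted to an ISO 8601 duration before it goes into the schema. A minimal sketch, assuming whole minutes as input and a hypothetical helper name:

```typescript
// Converts whole minutes into an ISO 8601 duration string for HowTo.totalTime.
function minutesToIsoDuration(totalMinutes: number): string {
  if (!isFinite(totalMinutes) || totalMinutes < 0 || Math.floor(totalMinutes) !== totalMinutes) {
    throw new Error("totalMinutes must be a non-negative whole number");
  }
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  if (hours === 0 && minutes === 0) {
    return "PT0M";
  }
  return "PT" + (hours > 0 ? hours + "H" : "") + (minutes > 0 ? minutes + "M" : "");
}

console.log(minutesToIsoDuration(15)); // PT15M
console.log(minutesToIsoDuration(90)); // PT1H30M
```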

Validation tools

Use both validators on every priority page before deploy. Google's tool is the practical one for rich-result eligibility; the schema.org validator catches issues Google's may miss.

Google Rich Results Test

Tests for Google rich-result eligibility. Validates schema and shows preview. Use for every priority page.

Schema Markup Validator

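Vendor-neutral validation against the schema.org vocabulary. It checks every type, including ones Google has no rich result for, but it does not enforce Google's required-property rules, so use it alongside Google's tool rather than instead of it.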

Google Search Console

Once indexed, GSC reports schema errors + warnings across all pages. Set up email alerts for new errors.

Citare JSON-LD Inspector

Free tool — paste a URL, see every JSON-LD block on the page. Faster than DevTools for triage.

Citare JSON-LD Generator

Free tool — form-driven builder for Organization, Article, FAQPage, Product schemas. No code required.

Citare Schema Coverage Audit

Free tool — score your page's AI citation readiness 0-100 including schema coverage + content patterns.

Seven mistakes that produce broken or sparse schema

1. Schema doesn't match visible content

Google explicitly penalizes this. If your FAQPage schema lists 10 questions but the visible page shows only 3, the markup violates Google's structured data guidelines, even though automated validators will still pass it. The 'invisible schema' trick is now an active negative signal; visible content and schema must agree.

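2. Missing required properties

Schema.org itself marks no property as required, so 'required' here means the practical minimum that consumers expect. For Organization that is name + url; for Article, headline + image + author + datePublished + publisher; for Product, name + offers. Validators flag missing properties wherever Google defines them as required; AI platforms may simply ignore a block that lacks them.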
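The per-type minimums listed above are easy to encode as a lookup and check in CI. This is a sketch under assumptions: the MINIMUM_PROPERTIES table simply mirrors the properties named in this section, and missingProperties is a hypothetical helper, not part of any validator.

```typescript
// Minimum property sets as listed in this guide (not an official schema.org rule).
const MINIMUM_PROPERTIES: Record<string, string[]> = {
  Organization: ["name", "url"],
  Article: ["headline", "image", "author", "datePublished", "publisher"],
  Product: ["name", "offers"],
};

// Returns the names of minimum properties that are absent or empty on a block.
function missingProperties(block: Record<string, unknown>): string[] {
  const schemaType = String(block["@type"]);
  const expected = MINIMUM_PROPERTIES[schemaType] || [];
  return expected.filter((prop) => {
    const value = block[prop];
    return value === undefined || value === null || value === "";
  });
}

const thinArticle = {
  "@context": "https://schema.org",
  "@type": "Article",
  headline: "How to Measure AI Search Visibility",
  author: { "@type": "Person", name: "Ravi RDP" },
};
console.log(missingProperties(thinArticle));
```

Types that are not in the table return an empty list, so the check stays silent for schemas this guide does not give a minimum for.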

3. Wrong nesting: strings where objects expected

Article.author should nest a Person object, not a bare string. Product.offers.seller should nest an Organization. Flat string values where objects are expected silently fail extraction.
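When the bare string comes from a CMS field, the fix can be applied at render time instead of by hand. A minimal sketch, assuming a hypothetical normalizeAuthor helper that accepts either form:

```typescript
type PersonNode = { "@type": "Person"; name: string; url?: string };

// Wraps a bare author string in a Person object; passes existing objects through.
function normalizeAuthor(author: string | PersonNode): PersonNode {
  if (typeof author === "string") {
    return { "@type": "Person", name: author };
  }
  return author;
}

const fromString = normalizeAuthor("Ravi RDP");
const fromObject = normalizeAuthor({
  "@type": "Person",
  name: "Ravi RDP",
  url: "https://citare.ai/team/ravi",
});
console.log(JSON.stringify(fromString), JSON.stringify(fromObject));
```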

4. Stale datePublished / dateModified

Update dateModified whenever the page changes. Stale dates depress AIO citation. Don't game it: Google penalizes dateModified updates without real content changes. Schema accuracy is itself a citation signal.

5. Generic placeholder values in production

'Brand Name' or 'Lorem Ipsum' left in production schema is the most common embarrassment. Validators won't always catch it. Audit before deploy.
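Because validators treat placeholder text as valid strings, this audit is worth scripting. The sketch below walks a JSON-LD object and reports the path of every string that looks like a placeholder; the pattern list is an assumption and should be extended with whatever your own templates ship with.

```typescript
// Assumed placeholder patterns; extend with your own template defaults.
const PLACEHOLDER_PATTERNS: RegExp[] = [/lorem ipsum/i, /^brand name$/i, /example\.com/i];

// Recursively collects JSON paths whose string value matches a placeholder pattern.
function findPlaceholders(value: unknown, path: string = "$"): string[] {
  if (typeof value === "string") {
    return PLACEHOLDER_PATTERNS.some((pattern) => pattern.test(value)) ? [path] : [];
  }
  let hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      hits = hits.concat(findPlaceholders(item, path + "[" + index + "]"));
    });
  } else if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    Object.keys(obj).forEach((key) => {
      hits = hits.concat(findPlaceholders(obj[key], path + "." + key));
    });
  }
  return hits;
}

const draftSchema = {
  "@type": "Organization",
  name: "Brand Name",
  description: "Lorem ipsum dolor sit amet",
  sameAs: ["https://www.linkedin.com/company/citare-ai", "https://example.com/profile"],
};
console.log(findPlaceholders(draftSchema));
```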

6. Multiple conflicting schemas on one page

Three different Article schema blocks on one page tells AI platforms nothing. Consolidate into one block per type per page. Multiple types (Article + FAQPage + BreadcrumbList) is correct; multiple of the same type is broken.
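Duplicate blocks usually creep in when a plugin and a theme both emit the same type. A quick check over the parsed blocks of a page might look like the sketch below; duplicateTypes is a hypothetical helper, and it only inspects top-level @type values, not nested nodes.

```typescript
// Returns every top-level @type that appears more than once on a page.
function duplicateTypes(blocks: Array<Record<string, unknown>>): string[] {
  const counts: Record<string, number> = {};
  for (const block of blocks) {
    const schemaType = String(block["@type"]);
    counts[schemaType] = (counts[schemaType] || 0) + 1;
  }
  return Object.keys(counts).filter((schemaType) => (counts[schemaType] || 0) > 1);
}

const pageBlocks = [
  { "@type": "Article", headline: "From the theme" },
  { "@type": "Article", headline: "From the SEO plugin" },
  { "@type": "FAQPage" },
  { "@type": "BreadcrumbList" },
];
console.log(duplicateTypes(pageBlocks));
```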

7. sameAs URLs that redirect or 404

Organization sameAs must contain canonical URLs, not redirects. A LinkedIn URL that 301s to a new handle weakens entity recognition. Audit sameAs URLs quarterly; replace stale entries.

Implementation patterns

The same JSON-LD can be embedded several ways. Pick the one that matches your stack.

Static HTML

Embed JSON-LD directly in the HTML <head>:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Brand Name"
}
</script>

Next.js / React (server-rendered)

Render JSON-LD via a small component using dangerouslySetInnerHTML in your layout or page:

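// app/blog/[slug]/page.tsx
import { JsonLd } from "@/components/JsonLd";
// Hypothetical data helper; swap in your own CMS or database call.
import { fetchPost } from "@/lib/posts";

// Next.js 15+ passes params as a Promise. On 13/14 it is a plain object,
// so type it as { slug: string } there; awaiting it is harmless either way.
type Props = { params: Promise<{ slug: string }> };

export default async function BlogPost({ params }: Props) {
  const { slug } = await params;
  const post = await fetchPost(slug);

  return (
    <>
      <JsonLd
        data={{
          "@context": "https://schema.org",
          "@type": "Article",
          "headline": post.title,
          "datePublished": post.publishedAt,
          "dateModified": post.updatedAt,
          "author": { "@type": "Person", "name": post.author.name },
          "publisher": { "@type": "Organization", "name": "Citare" },
        }}
      />
      <article>{post.body}</article>
    </>
  );
}

// components/JsonLd.tsx
export function JsonLd({ data }: { data: unknown }) {
  // Escape "<" so a value containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return (
    <script
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: json }}
    />
  );
}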

CMS-driven sites

Major CMSes have schema plugins:

  • WordPress — Yoast SEO, Rank Math, Schema Pro
  • Sanity — custom Studio plugins or rendered server-side at build
  • Contentful — render via your application layer
  • Webflow — embed via the custom-code field
  • Shopify — built-in Product schema; supplement with Yoast or Schema App for Article + Organization

Plugin-driven schema is convenient but check the output. Plugins frequently produce minimal schema (just type and name); add missing properties manually.

Frequently asked questions

Why do AI platforms prefer JSON-LD over body text?

AI platforms face an extraction problem when generating answers. From any given page: what is the canonical brand name? What is the price? Who is the author? When was it last updated? Body text is ambiguous; JSON-LD is unambiguous. When a page provides both, AI platforms prefer JSON-LD for factual extraction — the citation context becomes more reliable when the model extracts structured claims rather than parsing prose.

Should I use JSON-LD, Microdata, or RDFa?

JSON-LD. Google explicitly prefers it; AI platforms parse it most reliably; it's decoupled from visible HTML structure so design changes don't break your structured data. Microdata and RDFa are legacy — they work but are not the recommended path in 2026.

Should I add schema to every page or only key pages?

Every page that serves a clear purpose should have schema appropriate to its content type. Homepage → Organization + WebSite. Article pages → Article. Product pages → Product. FAQ pages → FAQPage. Plain pages → WebPage. Schema-on-every-page is the goal state. There is no penalty for legitimate schema across many pages.

Is there a penalty for too much schema?

No penalty for legitimate schema across many pages. There IS penalty for schema that doesn't match visible content, schema with placeholder values, or duplicate / conflicting schemas on the same page. Quality and accuracy matter, not quantity.

How long until Google picks up new schema?

Expect 24-72 hours for Googlebot to recrawl and notice schema changes. Rich Results Test reflects new schema within hours of deploy. AI Overview citation lift from new schema typically registers in 4-8 weeks, once Google's relevance evaluation has cycled through.

Can I use multiple schema types on one page?

Yes. A blog post might carry Article + Person (author) + Organization (publisher) + FAQPage (Q&A section) + BreadcrumbList — all in one HTML file. Use multiple <script type='application/ld+json'> blocks or combine into a single @graph array. Both work; @graph is preferred for pages with many types.
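For pages that already build separate blocks, merging them into one @graph can be done mechanically. The sketch below assumes every input block uses the schema.org context, hoists that context to the top level, and places the blocks under @graph; toGraph is an illustrative name.

```typescript
// Merges standalone JSON-LD blocks into one document with a shared @context.
// Assumes every block uses the schema.org context.
function toGraph(blocks: Array<Record<string, unknown>>) {
  const nodes = blocks.map((block) => {
    const node: Record<string, unknown> = {};
    Object.keys(block).forEach((key) => {
      // The per-block @context is dropped; the top-level one covers all nodes.
      if (key !== "@context") {
        node[key] = block[key];
      }
    });
    return node;
  });
  return { "@context": "https://schema.org", "@graph": nodes };
}

const mergedGraph = toGraph([
  { "@context": "https://schema.org", "@type": "Article", headline: "How to Measure AI Search Visibility" },
  { "@context": "https://schema.org", "@type": "BreadcrumbList", itemListElement: [] },
]);
console.log(JSON.stringify(mergedGraph, null, 2));
```

The result goes into a single script tag, which keeps a many-type page down to one block to maintain.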

Do AI crawlers use the same schema as Google?

Effectively yes. AI platforms read the same schema.org JSON-LD that Google uses for rich results. Some platforms (notably Perplexity) are adding proprietary extensions, but the core schema.org vocabulary is universal. Deploy standard schema.org first; add proprietary extensions only where they unlock specific platform features.

What's the difference between schema.org and JSON-LD?

Schema.org is the vocabulary — the dictionary of types and properties (Organization, Article, Product, etc.). JSON-LD is the format used to express that vocabulary in JSON syntax embedded in HTML. Schema.org defines what 'Organization' means; JSON-LD specifies how to write it in your HTML. They are complementary, not alternatives.

Generate or inspect schema in 30 seconds

Citare's free tools cover the JSON-LD workflow end-to-end. Generate schema from a form, inspect what your competitors deploy, score your AI citation readiness, and check structured data coverage — all without a signup.
