
GEO Glossary: Complete Dictionary of AI Citation and Generative Engine Optimization Terms

Every term you need to understand AI visibility, from AI Overviews and E-E-A-T to Princeton's 9-method framework and llms.txt.

Generative Engine Optimization (GEO) has its own vocabulary. AI platforms, research frameworks, technical standards, and structured data types all come with terminology that matters for understanding and implementing AI citability strategy. This glossary defines every major term — bookmark it as your reference.

Core GEO Concepts

GEO (Generative Engine Optimization)
The practice of optimizing web content to appear in AI-generated responses from systems like ChatGPT, Google AI Overviews, Perplexity, Claude, and Microsoft Copilot. Analogous to SEO for traditional search engines, but focused on different content signals. While SEO optimizes for ranked link visibility, GEO optimizes for being cited as a source in generated answers. The GEO market is projected to grow from $848 million in 2024 to $33.7 billion by 2034.
GEO vs SEO
SEO (Search Engine Optimization) focuses on ranking in Google's traditional blue-link results through keyword optimization, backlinks, and technical factors. GEO focuses on being cited in AI-generated answers through E-E-A-T signals, structured data, statistical content, and authoritative tone. Research shows that 88% of AI citation sources are NOT in the organic SERP top 10 — strong SEO does not guarantee AI citation.
AI Citability
The likelihood that an AI system will cite a specific piece of content when generating an answer related to that content's topic. AI citability is determined by structured data quality, E-E-A-T signals, content structure, GEO optimization, and AI bot accessibility. Measured by CiteOps.ai as a 0–100 score.
Princeton 9-Method Framework
A framework from Princeton University research identifying nine content optimization techniques that most reliably increase AI citation frequency. The nine methods, in order of impact weight: (1) citations to authoritative sources (0.40), (2) statistics and numerical data (0.37), (3) direct quotations (0.30), (4) authoritative tone (0.25), (5) fluency (0.225), (6) easy-to-understand language (0.20), (7) technical terms (0.18), (8) vocabulary diversity (0.15), and (9) avoidance of keyword stuffing (−0.10 penalty).
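As an illustrative sketch only (not CiteOps.ai's actual scoring code), the nine weights above can be combined into a simple citation-impact estimate. The signal names and the idea of scoring each method's presence from 0 to 1 are hypothetical conventions for this example:

```python
# Weights from the Princeton 9-method framework as listed above.
PRINCETON_WEIGHTS = {
    "cite_sources": 0.40,
    "statistics": 0.37,
    "quotations": 0.30,
    "authoritative_tone": 0.25,
    "fluency": 0.225,
    "easy_language": 0.20,
    "technical_terms": 0.18,
    "vocabulary_diversity": 0.15,
    "keyword_stuffing": -0.10,  # penalty: presence reduces the estimate
}

def citation_impact(signals: dict) -> float:
    """Weighted sum of detected signals, each scored 0.0-1.0 (hypothetical scale)."""
    return round(sum(PRINCETON_WEIGHTS[name] * strength
                     for name, strength in signals.items()), 3)
```

For example, a page with strong source citations and statistics but detectable keyword stuffing would net 0.40 + 0.37 − 0.10 = 0.67 on this illustrative scale.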
AI Citation
When an AI system references a specific website or piece of content as a source in its generated response. AI citations typically appear as numbered references, linked source cards, or inline attributions. Being cited drives brand visibility even when users do not click through to the source page.
Generative Search
A search paradigm in which AI generates a synthesized answer to a query rather than returning a list of links. Examples include Google AI Overviews, Perplexity's answer engine, and ChatGPT's browsing mode. Generative search changes the competitive landscape because only the sources cited in the generated answer receive visibility.
Zero-Click AI Answers
AI-generated responses that fully answer a user's query without requiring a click to any source website. Over 1 billion queries per week are answered by AI without users visiting underlying websites. Zero-click AI answers represent a direct threat to organic traffic for brands not optimized for AI citation.

E-E-A-T Signals

E-E-A-T (Experience, Expertise, Authority, Trust)
Google's framework for evaluating content quality and credibility, used in its Search Quality Rater Guidelines and as a training signal for AI systems. First introduced as E-A-T (without Experience), the Experience dimension was added in 2022 to reward first-hand knowledge. High E-E-A-T content is substantially more likely to be cited by AI systems.
Experience (E-E-A-T)
The first E in E-E-A-T. Signals that the content author has direct, first-hand experience with the topic. Detected through author bios describing personal background, first-person pronouns indicating direct involvement, case studies with specific details, and content with verifiable specificity (dates, locations, names).
Expertise (E-E-A-T)
The second E in E-E-A-T. Signals professional or subject-matter knowledge. Detected through Person schema markup with jobTitle and affiliation, professional credentials mentioned in author bios, technical depth and precision in content, industry-specific terminology used correctly, and citations to academic or professional sources.
Authority (E-E-A-T)
The A in E-E-A-T. Signals that the content creator or organization is recognized as an authority by others. Detected through Organization schema, external citations from high-authority domains, press mentions and media coverage, industry awards and recognition, and participation in professional organizations.
Trust (E-E-A-T)
The T in E-E-A-T. Signals that the site is safe, transparent, and legitimate. Detected through HTTPS encryption, privacy policy presence, contact information (email, phone, address), terms of service, and security indicators for e-commerce sites.
Author Schema
JSON-LD structured data using @type: Person to identify content authors. Includes properties like name, jobTitle, url (author profile page), sameAs (social profiles), and affiliation (Organization). Author schema is a strong E-E-A-T signal that tells AI systems who created the content and validates their credentials.
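A minimal illustrative Person block, using the properties listed above (names and URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Head of Content",
  "url": "https://example.com/authors/jane-doe",
  "sameAs": ["https://www.linkedin.com/in/janedoe"],
  "affiliation": {
    "@type": "Organization",
    "name": "Example Co"
  }
}
```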

AI Platforms

AI Overviews
Google's AI-generated summaries shown at the top of search results, powered by Google's Gemini model. Previously called Search Generative Experience (SGE). AI Overviews synthesize information from multiple web sources and cite 3–5 pages per answer. Optimization for AI Overviews requires strong E-E-A-T, Core Web Vitals, and Organization schema.
ChatGPT (with Browsing)
OpenAI's ChatGPT in browsing mode, which retrieves real-time web content when answering queries. Primarily trained on Common Crawl and curated web content. ChatGPT favors content with clear E-E-A-T signals, FAQ content, and unique statistical claims. GPTBot must be allowed in robots.txt for OpenAI to access your content.
Perplexity AI
An AI-powered answer engine that retrieves real-time web content and presents synthesized answers with inline citations. Perplexity favors factual content, external citations to primary sources, and recency signals. PerplexityBot must be allowed in robots.txt. Perplexity's citation model is especially transparent, showing source cards for each cited page.
Microsoft Copilot
Microsoft's AI assistant powered by OpenAI models and Bing's search index. Copilot favors Open Graph markup, Bing Webmaster Tools verification, and well-structured content. Optimization for Copilot is closely aligned with Bing SEO best practices alongside standard GEO signals.
Claude (Anthropic)
Anthropic's AI assistant trained on diverse web content with emphasis on helpfulness and safety. Claude favors clear structure, Q&A formats, and cited claims. ClaudeBot and anthropic-ai must be allowed in robots.txt for Anthropic to access and potentially include your content.
SGE (Search Generative Experience)
Google's earlier name for what is now called AI Overviews. SGE was the experimental phase (2023–2024) before Google rolled out AI-generated summaries globally. The term SGE is still used in some research and documentation but has been replaced by "AI Overviews" in Google's current product branding.

Structured Data and Schema

JSON-LD (JavaScript Object Notation for Linked Data)
The structured data format recommended by Google for adding machine-readable metadata to web pages. JSON-LD is embedded in HTML as a <script type="application/ld+json"> block and does not require modifying existing HTML markup. AI systems parse JSON-LD directly to understand page entities and relationships.
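For example, a page can declare its owning entity with a small embedded block like this (values are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Example Co",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png"
}
</script>
```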
FAQPage Schema
JSON-LD structured data of @type: FAQPage containing mainEntity arrays of Question and Answer pairs. The highest-weight schema type for AI citability. AI systems are specifically trained to identify and present Q&A format content. Pages with FAQPage schema are substantially more likely to have their content extracted verbatim in AI-generated answers.
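A minimal FAQPage block with a single Question/Answer pair (the content is a placeholder):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is GEO?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Generative Engine Optimization is the practice of optimizing web content to appear in AI-generated responses."
    }
  }]
}
```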
Organization Schema
JSON-LD structured data of @type: Organization that establishes a site's entity identity. Key properties: name, url, logo (ImageObject), sameAs (array of social profile URLs), contactPoint. Organization schema tells AI systems who owns the site and validates authority signals.
BreadcrumbList Schema
JSON-LD structured data of @type: BreadcrumbList containing ListItem elements with position, name, and item (URL). Provides content hierarchy context that helps AI systems understand where a page fits within a site's structure. Used to improve content organization signals.
DefinedTermSet Schema
JSON-LD structured data of @type: DefinedTermSet containing hasDefinedTerm arrays of DefinedTerm objects. The appropriate schema for glossary and dictionary pages. Each DefinedTerm includes name, description, termCode, and url. Signals to AI systems that this page is a definitional reference resource, increasing citation likelihood for term-related queries.
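A glossary page like this one could be marked up along these lines (a sketch; the termCode and URL values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "DefinedTermSet",
  "name": "GEO Glossary",
  "hasDefinedTerm": [{
    "@type": "DefinedTerm",
    "name": "GEO",
    "description": "Optimizing web content to appear in AI-generated responses.",
    "termCode": "geo",
    "url": "https://example.com/glossary#geo"
  }]
}
```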
HowTo Schema
JSON-LD structured data of @type: HowTo that marks up step-by-step instructional content. Includes name, description, totalTime, estimatedCost, and step arrays with HowToStep items. AI systems frequently extract and present HowTo content for procedural queries.
llms.txt
A proposed standard file placed at a website's root (as /llms.txt) to help AI language models understand the site. Proposed by Jeremy Howard of fast.ai. Markdown format. Provides a concise, machine-readable summary with sections for What We Do, How It Works, Pricing, and Links. The companion file llms-full.txt (at /llms-full.txt) provides expanded detail for AI systems that can process more content.
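A minimal llms.txt, following the Markdown shape described above (the company name, summary, and links are placeholders):

```markdown
# Example Co

> Example Co provides an AI citability audit for marketing teams.

## What We Do
- One-line summaries of the core product and its audience.

## Links
- [Docs](https://example.com/docs)
- [Pricing](https://example.com/pricing)
```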
llms-full.txt
The detailed companion to llms.txt, following the llms-full.txt specification from llmstxt.org. Markdown format, under 100KB. Contains expanded feature descriptions, methodology explanations, pricing details, use cases, and technical documentation. Used by AI systems that can process more content when building comprehensive knowledge about a website.

AI Crawlers and Robots.txt

GPTBot
OpenAI's web crawler, used to collect training data for ChatGPT and other OpenAI models. User agent string: GPTBot. Can be controlled via robots.txt with: User-agent: GPTBot / Disallow: / (to block) or Allow: / (to permit). Blocking GPTBot prevents OpenAI from including your content in future model training and may reduce ChatGPT citation frequency.
Google-Extended
Google's crawler specifically for AI training data, separate from Googlebot (which indexes for traditional search). User agent string: Google-Extended. Blocking Google-Extended prevents your content from being used to train Google's AI models (Gemini, etc.) but does not affect traditional Google Search indexing. Mistakenly blocking Google-Extended can reduce AI Overviews citation frequency.
PerplexityBot
Perplexity AI's web crawler for real-time content retrieval. User agent string: PerplexityBot. Blocking PerplexityBot prevents Perplexity from citing your content in its answers. Since Perplexity is a real-time answer engine (not primarily a training data crawler), blocking it has immediate impact on citation frequency.
ClaudeBot
Anthropic's web crawler for training data and content retrieval. User agent string: ClaudeBot. Also check for anthropic-ai as an additional Anthropic crawler identifier. Blocking ClaudeBot reduces the likelihood of your content being cited by Claude.
robots.txt
A file at the root of a website (at /robots.txt) that instructs web crawlers which pages they can and cannot access. Uses User-agent and Disallow/Allow directives. A critical GEO factor: many sites accidentally block AI crawlers through overly broad robots.txt rules. CiteOps.ai audits robots.txt for 9 AI crawler identifiers.
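For example, a robots.txt that explicitly permits the four AI crawlers defined in this glossary would contain:

```
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /
```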

Scoring and Metrics

AI Citability Score
CiteOps.ai's 0–100 composite metric measuring how likely an AI system is to cite a given web page. Calculated from six weighted categories: meta/SEO (15%), schema/structured data (15%), content quality (10%), technical (10%), E-E-A-T signals (25%), and GEO readiness (25%). Score ranges: 90–100 (Excellent), 70–89 (Good), 50–69 (Needs Work), 0–49 (Poor).
AI Ready Score
A focused sub-metric within CiteOps.ai that measures citation-specific readiness with higher weight on GEO and E-E-A-T. Formula: (Schema × 0.25) + (Content × 0.15) + (E-E-A-T × 0.25) + (GEO × 0.35). GEO receives the highest weight because research shows it is the strongest predictor of AI citation.
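The formula above translates directly into code; this sketch assumes each component score is on the same 0–100 scale:

```python
def ai_ready_score(schema: float, content: float, eeat: float, geo: float) -> float:
    """AI Ready Score = (Schema x 0.25) + (Content x 0.15) + (E-E-A-T x 0.25) + (GEO x 0.35)."""
    return schema * 0.25 + content * 0.15 + eeat * 0.25 + geo * 0.35
```

A page scoring 80 in every category scores 80 overall, since the four weights sum to 1.0.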
GEO Readiness Score
A 0–100 score measuring how well content is optimized for generative engine citation, based on the Princeton 9-method framework. Evaluates citations, statistics, quotations, authoritative tone, readability, technical terms, vocabulary diversity, fluency, and keyword stuffing. The GEO Readiness Score contributes 25% of the Overall AI Citability Score.
E-E-A-T Score
A 0–100 score measuring the strength of Experience, Expertise, Authority, and Trust signals on a page. Each of the four dimensions contributes 25%. The E-E-A-T Score contributes 25% of the Overall AI Citability Score, making it the joint-highest weighted category alongside GEO Readiness.
Platform Readiness Score
Per-platform scores within CiteOps.ai indicating how well a page is optimized for each AI system: ChatGPT, Google AI Overviews, Perplexity, Microsoft Copilot, and Claude. Each platform has different citation behaviors and optimization requirements.

Technical Terms

Flesch Reading Ease
A readability formula that scores text on a 0–100 scale based on average sentence length and average syllable count per word. Higher scores indicate easier reading. GEO research shows content with higher readability scores is more frequently cited by AI systems (the "easy to understand" Princeton method). Target range for web content: 60–70.
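The standard Flesch formula can be computed from three counts (note that raw scores can fall slightly outside 0–100 for extreme text):

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier reading."""
    return (206.835
            - 1.015 * (words / sentences)   # average sentence length
            - 84.6 * (syllables / words))   # average syllables per word
```

For example, a 100-word passage with 5 sentences and 130 syllables scores about 76.6, comfortably inside the 60–70+ target range for web content.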
Core Web Vitals
Google's standardized metrics for measuring user experience: LCP (Largest Contentful Paint — loading performance), CLS (Cumulative Layout Shift — visual stability), INP (Interaction to Next Paint — interactivity), FCP (First Contentful Paint), and TTFB (Time to First Byte). Poor Core Web Vitals correlate with lower content quality signals in AI training data. CiteOps.ai measures all five metrics.
Keyword Stuffing
The practice of over-repeating keywords or key phrases in content beyond what reads naturally. Penalized by both traditional search engines and AI systems. In the Princeton 9-method framework, keyword stuffing has a weight of −0.10, meaning it actively reduces AI citation likelihood. CiteOps.ai's GEO scoring includes a keyword stuffing detection penalty.
Canonical URL
The preferred URL for a page, specified via <link rel="canonical" href="..."> in the page <head>. Prevents duplicate content issues when a page is accessible at multiple URLs. A missing or incorrect canonical tag can reduce AI system confidence in which version of a page is authoritative.
Open Graph Protocol
A metadata standard developed by Facebook that uses <meta property="og:..."> tags to control how pages appear when shared on social media. Key tags: og:title, og:description, og:image, og:url, og:type. Open Graph tags are also read by AI systems (especially Microsoft Copilot, which uses Bing's indexing) and contribute to the meta/SEO score in CiteOps.ai.
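A typical set of tags in the page head, using the key properties listed above (values are placeholders):

```html
<meta property="og:title" content="GEO Glossary" />
<meta property="og:description" content="Definitions of key GEO and AI citation terms." />
<meta property="og:image" content="https://example.com/og.png" />
<meta property="og:url" content="https://example.com/glossary" />
<meta property="og:type" content="article" />
```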
SSRF (Server-Side Request Forgery)
A security vulnerability where an attacker tricks a server into making requests to internal network resources. Relevant to GEO tools that fetch competitor URLs: a tool without SSRF protection could be used to probe internal services. CiteOps.ai blocks private IP addresses (127.x, 10.x, 192.168.x, etc.) in competitor URL input to prevent this attack vector.
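A minimal sketch of the literal-IP portion of such a guard, using Python's standard library (this is illustrative, not CiteOps.ai's implementation; a production guard must also resolve hostnames and re-check after redirects, which are omitted here):

```python
import ipaddress
from urllib.parse import urlparse

def is_private_target(url: str) -> bool:
    """Return True when the URL's host is a literal private, loopback,
    or link-local IP address — a request that should be rejected."""
    host = urlparse(url).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP; DNS resolution needed to verify
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

With this check, fetches to 127.x, 10.x, and 192.168.x targets are refused before any request is made.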
Chrome Extension Manifest V3
The current standard for Chrome extensions, using service workers instead of persistent background pages, the chrome.scripting API instead of tabs.executeScript, and chrome.action instead of chrome.browserAction. Manifest V3 has stronger security guarantees than V2. CiteOps.ai is built on Manifest V3.

This glossary is maintained by CiteOps.ai and updated as the GEO field evolves. For implementation guidance, see our blog posts on getting cited by ChatGPT and GEO vs SEO.