Generative Engine Optimization: How to Dominate AI Search

Generative Engine Optimization (GEO) is the practice of optimizing content to maximize visibility in AI-generated responses. Research published on arXiv demonstrates that strategic content optimization can increase visibility in AI search engines by up to 40%.

Core Stats

~90%: potential visibility increase with GEO optimization & publication

"GEO focuses on maximizing visibility within AI-generated answers through evidence-based optimization."

- arXiv Research

Based on groundbreaking research published on arXiv, Generative Engine Optimization (GEO) represents the next evolution in search optimization—focusing on AI-generated responses rather than traditional search rankings.

Frequently Asked Questions

What problem is this paper trying to solve?

Generative AI systems like ChatGPT, Claude, Gemini and Perplexity are becoming answer engines, not just side tools. Google’s AI Overviews already appear on ~18% of queries, with click-through on those results dropping from 15% to 8%; over a quarter of such searches become zero-click. Perplexity alone handled about 780M queries in May 2025. Yet brands have almost no visibility into how these systems pick sources. Traditional SEO only optimizes for ranking links, not being cited as the authority inside AI answers. This paper maps how AI engines source evidence, how that differs from Google, and what brands must do to become the default answer. It builds on the 2024 GEO work by Aggarwal et al., which showed sites could increase LLM citations by re-engineering content for model preferences.

How did the authors study real AI search behavior?

They started from users, not engines. The team sampled 10 Reddit communities where people openly post prompts and screenshots of AI conversations (e.g., /r/ChatGPT, /r/ClaudeAI), collecting 1,000 “hot” and 1,000 “new” posts per subreddit over a month in 2025. They manually coded these threads to build (1) a taxonomy of 12 high-level AI intents (coding help, prompt tuning, learning/explanation, workflow automation, content drafting, personal advice, etc.) and (2) a separate taxonomy of purchase-related behaviors. This grounded the rest of the experiments in what people actually ask AIs, not synthetic prompts.
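
To make the collection step concrete, here is a minimal sketch of how such sampling could be scripted, assuming the PRAW Reddit API client and placeholder credentials; the paper does not specify its tooling.

```python
import praw  # Reddit API client; credentials below are placeholders

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="geo-intent-study",
)

SUBREDDITS = ["ChatGPT", "ClaudeAI"]  # the paper samples 10 communities

posts = []
for name in SUBREDDITS:
    sub = reddit.subreddit(name)
    # 1,000 "hot" and 1,000 "new" posts per subreddit, as in the paper
    for listing in (sub.hot(limit=1000), sub.new(limit=1000)):
        for post in listing:
            posts.append({"subreddit": name,
                          "title": post.title,
                          "body": post.selftext})
```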

What kinds of tasks do people use AI for most?

From the Reddit sample, AI is heavily used as a working assistant rather than just a Q&A box. Frequent intents include:

- Coding assistance and debugging
- Improving prompts / making ChatGPT smarter
- Explaining concepts and teaching
- Drafting and rewriting content
- Workflow / automation (e.g., generating scripts, checklists, outreach emails)
- Decision support and product comparison

The taxonomy spans 12 meta-intents and shows that many prompts are long, multi-step, and context-rich rather than simple keyword queries. This supports the paper's core claim: AI search is moving from "retrieve ten blue links" to "act as an agent that reasons, summarizes, and recommends."

How do people actually use AI for shopping and purchases?

The authors zoomed in on eight shopping-focused subreddits and coded prompts into 14 purchase-intent categories, covering discovery (“what brands exist?”), shortlisting, side-by-side comparisons, spec checks, budget planning, risk checks (returns, warranty, scams), and post-purchase questions. Qualitatively, the majority of these conversations are decision support, not “where can I buy X.” Users ask AIs to assemble checklists, explain trade-offs, and sanity-check deals across categories from laptops and cameras to skincare and credit cards. Many queries mirror funnel stages marketers care about—awareness, consideration, justification—but happen entirely inside the AI chat window. The takeaway: the battle for the shopping cart is increasingly fought in the agent’s reasoning process rather than on SERP ads or brand sites.

What strategic implications do these user behaviors have for brands?

The paper argues that these behaviors have four big implications for brands:

- **Shortlist power:** Many prompts ask AI to "narrow down" options. You're either on the shortlist or you don't exist; there's no page two.
- **Agency over retrieval:** Users ask AI to plan and decide ("plan my build", "design my stack") instead of just "give me links", so engines favor sources that help them reason, compare, and justify.
- **Trust over awareness:** Questions like "Is X legit?" and "What's the catch?" push engines toward independent reviewers and expert explainers, not brand pages.
- **Full-lifecycle influence:** Prompts cover pre-purchase, usage, troubleshooting, and resale. Brands that only optimize for "buy now" moments miss most AI-mediated touchpoints.

Together, this motivates a Generative Engine Optimization (GEO) strategy rather than SEO alone.

How were the main engine-vs-Google experiments designed?

The paper runs several large-scale experiments:

- **Regional & vertical comparison:** 1,000 ranking prompts (10 categories × 100) across the U.S. and Canada, focusing on Consumer Electronics, Automotive, and Software. For each query they collected Google's top-10 results and GPT's up-to-10 citations, normalized them to domains, then measured Top-5/Top-10 overlap and media-type mix (Brand / Earned / Social).
- **Cross-language:** Parallel prompts in English plus five languages (Spanish, French, German, Chinese, Japanese), run through Google, Claude, GPT, Perplexity, and Gemini; they computed domain-overlap Jaccard scores and language shares.
- **Brand-set experiment:** 10 verticals × 10 prompts on "best-known" vs. niche brands, run across Claude, GPT, Perplexity, and Gemini to compare answers and media mix.
- **Vertical freshness & local search experiments:** Consumer electronics vs. automotive, plus "near me" categories like restaurants and dentists.

All citations were classified as Brand, Earned, or Social.
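
To illustrate the overlap measurement, here is a minimal sketch assuming citations are normalized to their registrable domains and compared with Jaccard similarity; the paper's exact normalization rules are not spelled out here, and the URLs below are placeholders.

```python
from urllib.parse import urlparse

def base_domain(url: str) -> str:
    """Normalize a URL to its host, stripping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def domain_jaccard(urls_a, urls_b) -> float:
    """Jaccard similarity of the two citation lists' domain sets."""
    a = {base_domain(u) for u in urls_a}
    b = {base_domain(u) for u in urls_b}
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical top-10 lists for one query
google_top10 = ["https://www.rtings.com/review", "https://www.cnet.com/best"]
gpt_citations = ["https://rtings.com/review", "https://www.theverge.com/guide"]

print(f"domain Jaccard: {domain_jaccard(google_top10, gpt_citations):.2f}")
```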

How much do AI engines rely on Brand vs Earned vs Social compared with Google?

Across experiments, AI engines are overwhelmingly Earned-heavy while Google is more Brand-heavy and Social-inclusive. In the brand-set study for well-known brands, ChatGPT’s links were 93.5% Earned, 6.5% Brand, 0% Social; Claude: 87.3% Earned, 6.8% Brand, 5.9% Social. Perplexity surfaced far more Social: 23.8% Social, 67.4% Earned, 8.8% Brand; Gemini sat in the middle with 63.4% Earned, 25.1% Brand, 11.5% Social. For niche brands, ChatGPT went to 95.1% Earned and still 0% Social; even the loosest engine (Perplexity) kept Earned at 73.4%. In general intent-based analysis, Google tends to follow Brand ≫ Earned ≫ Social, while AI systems follow Earned ≫ Brand ≫ Social.

How consistent are different AI engines with each other?

For brands (the actual names recommended), AI engines agree surprisingly often: for well-known brands, answer agreement across engine pairs ranges from 76–81%; for niche brands it's slightly lower but still 71–76%. So if a brand makes it into one engine's shortlist, it's likely to appear in the others. However, domains (which sites are cited) are much less stable and vary by vertical. Cross-language domain overlaps are often below 0.1 Jaccard for Google and only modestly higher for Gemini or Perplexity; GPT's cross-language overlaps are near zero, effectively swapping site ecosystems by language. This means GEO is simultaneously a multi-engine game (for brands) and a per-engine, per-language game (for which specific pages get cited).
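
As an illustration, a pairwise agreement computation might look like the sketch below, assuming agreement is measured as the share of brands two engines have in common (the paper's exact formula is not given here); the shortlists are hypothetical.

```python
from itertools import combinations

# Hypothetical shortlists from one prompt; real answers come from the engines.
answers = {
    "gpt":        {"Sony", "Bose", "Sennheiser"},
    "claude":     {"Sony", "Bose", "Apple"},
    "perplexity": {"Sony", "Bose", "Sennheiser"},
    "gemini":     {"Sony", "Apple", "Sennheiser"},
}

for (e1, brands1), (e2, brands2) in combinations(answers.items(), 2):
    agreement = len(brands1 & brands2) / len(brands1 | brands2)
    print(f"{e1:>10} vs {e2:<10}: {agreement:.0%}")
```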

What did the local search experiments show?

The authors tested "near me"-style prompts across categories like restaurants, dentists, plumbers, electricians, and home cleaning, comparing each AI engine's suggested businesses with Google's local results. Top-10 overlaps with Google were generally low: in home cleaning, for example, Google and Claude shared only about 20.6% of top-10 domains, and other pairings and categories were often similar or lower. Overlaps between AI engines were sometimes higher than AI–Google overlap, implying the AIs may share a "parallel" local ecosystem distinct from Google's map pack and review platforms. Practically, being strong in Google Maps and local SEO does not guarantee presence in AI local recommendations; brands need specific GEO work for local queries.

How sensitive are AI results to language and paraphrasing of queries?

**Language:** Google's cross-language domain overlap typically sits between 0 and 0.1 Jaccard; Gemini and Perplexity are slightly higher in some language pairs (e.g., Gemini up to ≈0.32 in EN–DE laptops), while Claude is much higher, reusing the same authority domains across languages. GPT is lower even than Google, with near-zero overlap: it almost completely swaps site ecosystems by language. Across languages, all AI engines keep the Earned ≫ Brand ≫ Social pattern. GPT and Perplexity are the most local-language heavy, Claude is extremely English-biased, and Gemini is balanced.

**Paraphrasing:** Within one language, paraphrasing rarely changes the recommended brands. Brand overlaps across paraphrases often exceed 0.6–0.7 for GPT and sit around 0.4–0.7 for Gemini. Domains are more fluid, but still more stable than in the language experiment. Google's media mix shifts more than the AI engines' when prompts are reworded.

How fresh and diverse are the domains AIs cite in key verticals?

In consumer electronics, all engines showed high coverage of dated links (Claude ≈92.5%), with a mean article age around 117 days and a median of 62 days for Claude; GPT was similar. Perplexity drew from a more heterogeneous mix (about 31.6% Brand, 53.3% Earned, 15.1% Social), blending YouTube and retailers with editorial sites like RTINGS and CNET. Claude and GPT were more homogeneous, heavily Earned and editorial. In automotive, Claude and GPT again leaned on Earned (≈81–83%) with only ~17–18% Brand, and their content was older (mean age ≈331 days for Claude). Perplexity returned more Brand (34.6%) and Social (9.9%) sources and fresher, retail-linked content. Overall, AI engines provide solid mid-term editorial coverage but can lag in fast-moving categories unless, like Perplexity, they blend in fresher social and retail content.

What is Generative Engine Optimization (GEO) according to this paper?

GEO is defined as the discipline of optimizing content so AI systems cite and recommend you, not just rank your pages. It builds on earlier work by Aggarwal et al. (2024), who showed that “GEO-tuned” pages earned significantly more LLM citations than unoptimized baselines, by structuring data and explanations for models rather than humans alone. This paper extends GEO from single-model experiments to a full landscape across Google, Claude, ChatGPT, Gemini, and Perplexity, adding dimensions like media-type mix, language sensitivity, freshness, and local search. The central empirical finding is clear: AI systems show an overwhelming, consistent bias toward Earned media across regions, languages, and verticals, which demands a different optimization strategy than classic SEO.

What concrete GEO agenda does the paper propose for practitioners?

The authors outline a multi-pillar GEO strategy:

- **Engineer for agency & scannability:** Treat your site as an API. Implement rigorous technical SEO and rich Schema.org markup for products, specs, prices, reviews, and availability so AI agents can parse you (a minimal markup sketch follows this list).
- **Dominate Earned media:** Shift resources to PR, expert collaborations, and high-authority reviews; backlinks from these domains shape both Google rankings and AI trust.
- **Structure for justification:** Provide comparison tables, explicit pros/cons, and clear value claims that can be quoted directly inside answers.
- **Develop language-specific authority:** Because engines behave differently across languages, build local-language authority sites and coverage instead of relying solely on English content.
- **Monitor multi-engine, multi-stage presence:** Track where you appear (or don't) across engines, intents, and purchase stages, then iterate content and outreach accordingly.
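
As a concrete example of the first pillar, here is a minimal Schema.org Product sketch serialized as JSON-LD from Python; every field value is a hypothetical placeholder.

```python
import json

# Hypothetical product; every value is a placeholder.
product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example X100 Wireless Headphones",
    "description": "Over-ear wireless headphones with active noise cancelling.",
    "brand": {"@type": "Brand", "name": "ExampleBrand"},
    "offers": {
        "@type": "Offer",
        "price": "199.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "312",
    },
}

# Embedded in the page head as <script type="application/ld+json">...</script>
print(json.dumps(product_jsonld, indent=2))
```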

How different are Google and AI engines in consumer electronics and software?

In consumer electronics, Claude and GPT rely on Earned media for ~93.7% and ~93.6% of citations, with Brand and Social almost absent. Perplexity is more mixed: ~31.6% Brand, 53.3% Earned, 15.1% Social, driven by BestBuy (Brand) and YouTube (Social) alongside RTINGS and CNET. In software, Google in Canada favors Brand (53.8%) and Social (14.4%), with only 31.8% Earned; GPT inverts this to 74.2% Earned, 25.8% Brand, 0% Social. In the U.S., Google is 43.7% Brand, 10.9% Social, 45.4% Earned, while GPT returns 72.7% Earned, 26.7% Brand, and almost no Social. Overall, AI search systematically shifts clicks from vendor and social sites toward editorial “earned” reviews.

What did the overlap experiments show about Reddit and community content?

Overlap between Google and GPT in electronics is modest: smartphones and laptops see only 15–32% overlap at top-5 and 20–41% at top-10 domains. The authors infer that AI engines deprioritize community-driven platforms like Reddit in favor of professional reviews and publisher domains, whereas Google still mixes in more community and brand sources. This helps explain why, even for tech questions where Reddit dominates human search behavior, AI answers tend to cite big review sites rather than forum threads—shrinking the role of user-generated knowledge in discovery.

How do AI engines treat well-known vs niche brands differently?

For well-known brands, Claude cites 87.3% Earned, 6.8% Brand, 5.9% Social; ChatGPT goes further to 93.5% Earned, 6.5% Brand, 0% Social. Perplexity is looser (67.4% Earned, 8.8% Brand, 23.8% Social), and Gemini sits between them (63.4% Earned, 25.1% Brand, 11.5% Social). For niche brands, ChatGPT climbs to 95.1% Earned and 4.9% Brand, again 0% Social; Claude is 86.3% Earned, 10.6% Brand, 3.2% Social; Perplexity 73.4% Earned, 9.1% Brand, 17.5% Social; Gemini 66.4% Earned, 21.2% Brand, 12.7% Social. So smaller brands depend even more on third-party validation: AI engines rarely link to their own sites unless strong earned coverage exists.

How much do AI engines agree with each other on brand recommendations?

Despite different link choices, brand answers are surprisingly consistent. Across well-known brands, average agreement between engine pairs ranges 76–81%; for niche brands it’s slightly lower, 71–76%. This means once a brand “breaks into” AI awareness, it tends to appear across multiple engines. The divergence is more in which domains they cite to justify that choice, not which product names they surface. That strengthens the paper’s claim that GEO should focus on (a) getting into shortlists at all and then (b) shaping which earned outlets dominate the justification layer.

What did the vertical freshness analysis reveal?

Freshness is measured via coverage of dated links, mean/median age, and a 1/(1+age) freshness score. In consumer electronics, Claude has 92.5% of links with dates, mean age ~117 days, median 62 days, freshness score 0.0617 (0.0571 coverage-adjusted). GPT is similar. Perplexity returns somewhat newer but less consistently dated results. In automotive, Claude’s coverage drops to ~61% and mean article age jumps to ~331 days (median 148), with a coverage-adjusted freshness score of 0.0269; GPT behaves similarly. Perplexity again injects fresher, more retail and social content. So AI search is reasonably fresh for gadgets but leans on older, static pages for cars and long-lifecycle products.
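
The aggregation below is an inference rather than a quoted formula: averaging per-link scores 1/(1+age) and multiplying by dated-link coverage reproduces the reported relationship (0.0617 × 92.5% ≈ 0.0571). A minimal sketch under that assumption:

```python
from statistics import mean, median

# Hypothetical ages (days) for one engine's dated citations in a vertical;
# one additional link had no recoverable date.
ages_days = [5, 30, 62, 200, 410]
total_links = 6

coverage = len(ages_days) / total_links                # share of links with dates
freshness = mean(1 / (1 + age) for age in ages_days)   # per-link 1/(1+age), averaged
adjusted = freshness * coverage                        # coverage-adjusted score

print(f"coverage={coverage:.1%}  mean_age={mean(ages_days):.0f}d  "
      f"median_age={median(ages_days):.0f}d  "
      f"freshness={freshness:.4f}  adjusted={adjusted:.4f}")
```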

How low is AI–Google overlap in local search?

For "near me" queries, overlap in cited domains is small. In the local experiment, domain overlap between AI search and Google is:

- Home cleaning: 20.6%
- Roofing: 17.1%
- Tax preparation: 15.4%
- Dentists: 11.9%
- Auto repair: 2.5%
- IT support: 0.1%

The authors conclude that AI engines are much less aligned with Google's Maps/local pack results than they are in broad verticals like electronics or health. They likely use different signals and business directories, so ranking in Google's local results does not guarantee presence in AI recommendations, especially in fragmented or specialized services.

What does the language sensitivity experiment tell us?

Across ten verticals and six languages, Google's cross-language domain overlap is typically 0–0.1 Jaccard, with the highest cell (EN–ES electric vehicles) just above 0.1. Relative to this:

- Claude shows much higher cross-language stability, reusing the same authority domains across languages.
- Perplexity and Gemini have mostly very low overlaps, with a few moderate cells.
- GPT has the lowest cross-language overlap: domain sets across languages are "consistently near-zero," indicating different site ecosystems per language.

However, brand overlap is much higher than domain overlap; engines often recommend the same brands but justify them with different local-language sites.

Does language change the type of sources AI engines use?

When aggregating all verticals and languages, the authors find that engine identity matters more than language: all AIs follow an Earned ≫ Brand ≫ Social pattern, regardless of language. GPT and Claude are the most Earned-heavy; Perplexity and Gemini allocate a larger share to Brand and Social but still keep Earned dominant. So translating content alone is not enough. A French or Japanese version of a brand's site only helps if the engine already regards that domain (or its local reviewers) as an authority. GEO has to be localized at the ecosystem level: building non-English earned coverage, not just non-English landing pages.

How exactly did the authors classify and date links?

Each URL returned by an engine was:

- Normalized to its base domain.
- Classified as Brand, Earned, or Social, using a rule list of known social domains plus GPT prompts for ambiguous cases.
- Crawled for dates in HTML metadata and JSON-LD, which feed the dated-link coverage and age statistics used in the freshness analysis.
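
A condensed sketch of that pipeline appears below; the social-domain rule list and date patterns are illustrative, and the GPT fallback for ambiguous classification is omitted.

```python
import json
import re
from urllib.parse import urlparse

# Illustrative rule list; the paper's actual list is not reproduced here.
SOCIAL_DOMAINS = {"reddit.com", "youtube.com", "quora.com", "x.com", "facebook.com"}

def base_domain(url: str) -> str:
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def classify(url: str, brand_domains: set) -> str:
    """Brand / Earned / Social via rules; the paper adds GPT prompts for ambiguous cases."""
    domain = base_domain(url)
    if domain in SOCIAL_DOMAINS:
        return "Social"
    if domain in brand_domains:
        return "Brand"
    return "Earned"

def extract_date(html: str):
    """Look for a publication date in JSON-LD blocks, then in meta tags."""
    for block in re.findall(r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>',
                            html, re.S):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and data.get("datePublished"):
            return data["datePublished"]
    m = re.search(r'<meta[^>]+(?:published_time|datePublished)[^>]+content="([^"]+)"', html)
    return m.group(1) if m else None
```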

What overarching patterns do the authors emphasize for practitioners?

Across experiments, three consistent patterns emerge:

- **AI search is structurally Earned-first.** In software, GPT shifts from Google's 43.7% Brand / 10.9% Social / 45.4% Earned mix to 72.7% Earned and near-zero Social; similar reversals appear across verticals.
- **Social is nearly absent in AI citations.** Reddit, YouTube, Quora, and similar sites form a small fraction of citations, except on Perplexity, which keeps Social near 15–24% depending on the experiment.
- **Overlap with Google is modest, especially in local and product niches.** Smartphones/laptops overlap only 15–32% (top-5), and local services like IT support see as little as 0.1% overlap.

Their inference: GEO must treat AI search as its own channel, heavily centered on editorial authority and structured justification, rather than as a side effect of traditional SEO.