How Perplexity and ChatGPT Choose Sources: An AEO Deep Dive

Shruti Sonali · 9 min read

How AI Answer Engines Work

The Basic Architecture

User query
    │
    ▼
┌────────────────────────────┐
│   Query Understanding      │
│   - Intent classification  │
│   - Entity recognition     │
│   - Topic identification   │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│   Source Retrieval         │
│   - Web search/index       │
│   - Knowledge base         │
│   - Trusted sources        │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│   Source Evaluation        │
│   - Authority scoring      │
│   - Relevance matching     │
│   - Freshness check        │
└─────────────┬──────────────┘
              │
              ▼
┌────────────────────────────┐
│   Answer Synthesis         │
│   - Information extraction │
│   - Citation assignment    │
│   - Response generation    │
└────────────────────────────┘

The Key Insight

AI systems don't just find information; they also decide which sources deserve to be cited. That evaluation happens before the answer is generated, so a page that fails it never reaches the synthesis step.

The Source Selection Process

Step 1: Candidate Retrieval

AI first gathers potential sources:

| Method         | What Gets Retrieved               |
| -------------- | --------------------------------- |
| Web search     | Top-ranking pages for query       |
| Index lookup   | Pre-crawled authoritative sources |
| Knowledge base | Verified facts and entities       |

Step 2: Authority Evaluation

Sources are scored on trust signals. No answer engine publishes its weights, so treat the breakdown below as an illustrative model:

Authority Score = 
  Domain reputation (30%)
  + Topic expertise (25%)
  + Content freshness (20%)
  + External validation (15%)
  + Technical quality (10%)
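
The weighted sum above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the weights are the illustrative ones from the formula, while the signal names, the 0.0 to 1.0 scale, and the `authority_score` function are assumptions made for this example, not any vendor's actual scoring code.

```python
# Illustrative weights taken from the formula above (not published by any vendor).
WEIGHTS = {
    "domain_reputation": 0.30,
    "topic_expertise": 0.25,
    "content_freshness": 0.20,
    "external_validation": 0.15,
    "technical_quality": 0.10,
}


def authority_score(signals: dict) -> float:
    """Return the weighted sum of signals, each assumed to be in 0.0..1.0."""
    missing = WEIGHTS.keys() - signals.keys()
    if missing:
        raise ValueError(f"missing signals: {sorted(missing)}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


# Hypothetical page: strong domain, very fresh, few external mentions.
example = {
    "domain_reputation": 0.8,
    "topic_expertise": 0.6,
    "content_freshness": 1.0,
    "external_validation": 0.4,
    "technical_quality": 0.9,
}
print(round(authority_score(example), 2))  # 0.74
```

Because the model is additive, a weak signal lowers the score without eliminating the page. A strong domain can partly compensate for thin external validation.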

Step 3: Relevance Matching

How well does the source answer the query?

| Factor          | Evaluation                       |
| --------------- | -------------------------------- |
| Topic alignment | Does content cover query topic?  |
| Answer presence | Is there a direct answer?        |
| Depth           | How thoroughly is topic covered? |
| Specificity     | Does it match query specificity? |

Step 4: Citation Assignment

Deciding what to cite:

  • Factual claims → Cite authoritative source
  • Statistics → Cite original research
  • Definitions → Cite recognized authority
  • Opinions → Cite expert or don't cite
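
The mapping in this list can be written as a small lookup table. The sketch below is hypothetical: `ClaimType`, `PREFERRED_SOURCE`, and `citation_target` are names invented for this example, and real systems do not expose their citation rules in this form.

```python
from enum import Enum
from typing import Optional


class ClaimType(Enum):
    FACT = "factual claim"
    STATISTIC = "statistic"
    DEFINITION = "definition"
    OPINION = "opinion"


# Preferred kind of source per claim type, mirroring the list above.
PREFERRED_SOURCE = {
    ClaimType.FACT: "authoritative source",
    ClaimType.STATISTIC: "original research",
    ClaimType.DEFINITION: "recognized authority",
    ClaimType.OPINION: "expert",
}


def citation_target(claim_type: ClaimType, expert_available: bool = True) -> Optional[str]:
    """Return the kind of source to cite, or None when nothing is cited."""
    if claim_type is ClaimType.OPINION and not expert_available:
        return None  # opinions with no expert behind them go uncited
    return PREFERRED_SOURCE[claim_type]


print(citation_target(ClaimType.STATISTIC))                       # original research
print(citation_target(ClaimType.OPINION, expert_available=False))  # None
```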

Perplexity's Citation System

How Perplexity Works

Perplexity is explicitly designed as an "answer engine" with visible citations:

Query: "What is the best CRM for startups?"

Perplexity response:
"For startups, HubSpot CRM is often recommended as the 
best option due to its free tier and scalability [1]. 
Pipedrive offers simpler pipeline management [2], while 
Salesforce provides enterprise features but at higher 
cost [3]."

[1] hubspot.com/crm/startups
[2] pipedrive.com/blog/startup-crm
[3] salesforce.com/solutions/startups

Perplexity Citation Factors

| Factor             | Weight | Description                  |
| ------------------ | ------ | ---------------------------- |
| Domain authority   | High   | Established, trusted domains |
| Content recency    | High   | Recently updated content     |
| Direct relevance   | High   | Exact topic match            |
| First-party source | Medium | Original over aggregated     |
| Structured content | Medium | Easy to parse and extract    |

Getting Cited by Perplexity

What works:

  • Clear, direct answers to specific questions
  • Recent publication dates
  • Authoritative domain (not new/unknown)
  • Original data or insights
  • Well-structured content with headings

What doesn't work:

  • Generic, rehashed content
  • Outdated information
  • Thin content without substance
  • Paywalled content

ChatGPT's Source Evaluation

How ChatGPT Uses Sources

ChatGPT with web browsing:

  1. Searches the web for relevant pages
  2. Reads and evaluates content
  3. Synthesizes answer
  4. Cites sources (when using browsing)

ChatGPT's Implicit Authority Model

Even without browsing, ChatGPT's training reflects source authority:

| Source Type     | How It Influences Training     |
| --------------- | ------------------------------ |
| Wikipedia       | Heavy influence on definitions |
| Academic papers | Scientific claims              |
| Major news      | Current events                 |
| Official docs   | Technical information          |
| Quality blogs   | Industry perspectives          |

Optimization for ChatGPT

For training influence (long-term):

  • Publish content that gets linked/cited widely
  • Appear on Wikipedia and Wikidata
  • Get mentioned in authoritative sources

For browsing citations (immediate):

  • Rank well in search, including Bing (ChatGPT's browsing has relied on Bing results)
  • Have clear, extractable answers
  • Maintain current, accurate content

Google AI Overview Sources

How AI Overviews Select Sources

Google's AI Overviews draw from:

  1. Search index - Already-ranked pages
  2. Knowledge Graph - Verified entity data
  3. Structured data - Schema markup
  4. Quality signals - E-E-A-T indicators

The AI Overview Algorithm

Source Selection for AI Overview:

1. Identify top-ranking pages for query
2. Evaluate E-E-A-T signals
3. Check for direct answer content
4. Prefer pages with:
   - Schema markup
   - Clear heading structure
   - Answer-first content
   - Recent updates
5. Synthesize from 3-5 top sources
6. Display with source links

What Gets Featured

| Content Type        | AI Overview Likelihood |
| ------------------- | ---------------------- |
| Direct definitions  | Very High              |
| Step-by-step guides | High                   |
| Comparison tables   | High                   |
| Expert analysis     | Medium-High            |
| News/current events | Medium                 |
| Opinion content     | Low                    |

Authority Signals That Matter

Domain-Level Signals

| Signal             | Why It Matters       |
| ------------------ | -------------------- |
| Domain age         | Established presence |
| Backlink profile   | External validation  |
| Brand searches     | Recognition          |
| HTTPS              | Security baseline    |
| Clean link profile | No spam signals      |

Content-Level Signals

| Signal              | Why It Matters              |
| ------------------- | --------------------------- |
| Author expertise    | Named, credentialed authors |
| Publication date    | Recency indication          |
| Update frequency    | Maintained content          |
| Citation of sources | Verifiable claims           |
| Depth of coverage   | Comprehensive treatment     |

Entity-Level Signals

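| Signal                                 | Why It Matters         |
| -------------------------------------- | ---------------------- |
| Knowledge Graph presence               | Verified entity        |
| Wikipedia mention                      | Third-party validation |
| Consistent NAP (name, address, phone)  | Verifiable identity    |
| Schema markup                          | Explicit entity data   |
| Cross-platform presence                | Corroboration          |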

Content Factors That Drive Citations

The Citation-Worthy Content Formula

Citation Likelihood = 
  Uniqueness (Is this information hard to find elsewhere?)
  × Authority (Should readers trust this source?)
  × Extractability (Can AI easily use this?)
  × Relevance (Does this answer the query?)
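
The practical consequence of multiplying rather than adding is that one failing factor sinks the whole score. A minimal sketch, assuming each factor is rated from 0.0 to 1.0 (the scale and the `citation_likelihood` function are assumptions for illustration, not a measured model):

```python
from math import prod


def citation_likelihood(uniqueness: float, authority: float,
                        extractability: float, relevance: float) -> float:
    """Multiply the four factors, each assumed to be in 0.0..1.0."""
    factors = (uniqueness, authority, extractability, relevance)
    if any(not 0.0 <= f <= 1.0 for f in factors):
        raise ValueError("each factor must be between 0.0 and 1.0")
    return prod(factors)


# Middling on every factor.
print(citation_likelihood(0.5, 0.5, 0.5, 0.5))  # 0.0625
# Excellent content that the crawler cannot read (for example, behind a paywall).
print(citation_likelihood(0.9, 0.9, 0.0, 0.9))  # 0.0
```

Under this model, fixing the weakest factor usually pays off more than polishing an already strong one.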

What Makes Content Extractable

Good (easy to extract):

    What is Schema Markup?

    Schema markup is a standardized vocabulary of tags
    that you add to HTML to help search engines and AI
    systems understand your content. It is commonly
    written in JSON-LD and defines entities, relationships,
    and properties in a machine-readable format.

Bad (hard to extract):

    You might be wondering about how to make search
    engines understand your content better. Well,
    there's this thing called schema markup, which is
    basically like... let me explain with an example...
Content Patterns That Get Cited

| Pattern              | Example                        | Why It Works          |
| -------------------- | ------------------------------ | --------------------- |
| Definition + context | "X is Y. It works by..."       | Direct answer + depth |
| Statistics + source  | "According to [study], 65%..." | Unique data           |
| Step-by-step         | "1. First... 2. Then..."       | Actionable            |
| Comparison table     | "Feature A vs B"               | Decision support      |
| Expert quote         | "[Expert] says..."             | Authority             |

What Gets Ignored

Content That AI Skips

| Content Type              | Why Ignored          |
| ------------------------- | -------------------- |
| Generic definitions       | Available everywhere |
| Thin content              | No depth to extract  |
| Outdated information      | Recency signals fail |
| Opinion without expertise | No authority         |
| Heavily promotional       | Bias signals         |
| Paywalled content         | Can't access         |
| Duplicate content         | Original preferred   |

Red Flags for AI Systems

Content signals that reduce citations:

  • No author attribution
  • No publication date
  • Generic stock photos
  • Thin word count
  • Keyword stuffing
  • Excessive ads
  • Poor mobile experience
  • Slow load times

Optimization Strategies

Strategy 1: Be the Definitive Source

Create content that AI must cite because no alternative exists:

  • Original research and data
  • First-party case studies
  • Expert interviews
  • Industry surveys
  • Proprietary analysis

Strategy 2: Structure for Extraction

[Question as heading]

[Direct answer in first 40-60 words]

[Supporting section]

[Elaboration with specifics...]

[Related topic]

[Additional context...]
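
A quick way to audit a draft against this template is to check that the heading is a question and that the first paragraph lands in the 40-60 word range. The sketch below assumes the heading and paragraphs are separated by blank lines; `split_blocks` and `is_answer_first` are hypothetical helper names for this example.

```python
def split_blocks(section: str) -> list:
    """Split a section into non-empty blocks separated by blank lines."""
    return [block.strip() for block in section.split("\n\n") if block.strip()]


def is_answer_first(section: str, low: int = 40, high: int = 60) -> bool:
    """True when the heading is a question and the first paragraph has low..high words."""
    blocks = split_blocks(section)
    if len(blocks) < 2:
        return False  # heading only, no answer paragraph
    heading, answer = blocks[0], blocks[1]
    return heading.endswith("?") and low <= len(answer.split()) <= high


# Placeholder text: a question heading followed by a 45-word answer.
draft = "What is schema markup?\n\n" + " ".join(["word"] * 45) + "\n\nSupporting detail."
print(is_answer_first(draft))  # True
```

Word count is only a proxy. A 50-word paragraph that never answers the question still fails the intent of the template.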



Strategy 3: Build Entity Authority

  1. Claim your entity - Knowledge panel, Wikipedia
  2. Implement schema - Organization, Person, Product
  3. Build consistency - Same info everywhere
  4. Get corroboration - Third-party mentions
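
For the schema step, Organization markup is typically embedded as JSON-LD inside a script tag. The sketch below generates a minimal block with Python's `json` module; the company name and URLs are placeholders, and `organization_jsonld` is a hypothetical helper, not part of any library.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list) -> str:
    """Build a minimal schema.org Organization description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs lists other profiles of the same entity (the corroboration step).
        "sameAs": list(same_as),
    }
    return json.dumps(data, indent=2)


# Placeholder values for illustration only.
markup = organization_jsonld(
    "Example Co",
    "https://www.example.com",
    ["https://www.linkedin.com/company/example-co"],
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Keeping the name and URL identical to what appears on other platforms supports the consistency step as well.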

Strategy 4: Maintain Freshness


| Action            | Frequency     |
| ----------------- | ------------- |
| Review and update | Quarterly     |
| Add new data      | As available  |
| Update dates      | After changes |
| Check accuracy    | Ongoing       |
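
The quarterly review can be tracked with a short script. This sketch approximates a quarter as 90 days and assumes you keep a record of when each page was last reviewed; the paths, dates, and the `pages_due_for_review` function are made up for illustration.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


def pages_due_for_review(last_reviewed: dict, today: date) -> list:
    """Return the pages whose last review is more than REVIEW_INTERVAL old, sorted."""
    return sorted(
        path for path, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_INTERVAL
    )


# Hypothetical review log.
log = {
    "/blog/what-is-schema-markup": date(2024, 1, 15),
    "/blog/crm-comparison": date(2024, 5, 1),
}
print(pages_due_for_review(log, today=date(2024, 6, 1)))
# ['/blog/what-is-schema-markup']
```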


Strategy 5: Target Specific Query Types


| Query Type     | Content Strategy    |
| -------------- | ------------------- |
| "What is X"    | Definition-focused  |
| "How to X"     | Step-by-step guide  |
| "X vs Y"       | Comparison content  |
| "Best X for Y" | Recommendation list |
| "X cost/price" | Pricing page/table  |
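
When planning content against a keyword list, the table above can be applied with a simple pattern matcher. The regular expressions below are rough heuristics written for this example, not how any answer engine classifies queries. The first matching rule wins.

```python
import re
from typing import Optional

# Ordered rules: (pattern, content strategy from the table above).
STRATEGIES = [
    (re.compile(r"^what (is|are)\b", re.IGNORECASE), "Definition-focused"),
    (re.compile(r"^how to\b", re.IGNORECASE), "Step-by-step guide"),
    (re.compile(r"\bvs\.?\s", re.IGNORECASE), "Comparison content"),
    (re.compile(r"^best\b.*\bfor\b", re.IGNORECASE), "Recommendation list"),
    (re.compile(r"\b(cost|price|pricing)\b", re.IGNORECASE), "Pricing page/table"),
]


def content_strategy(query: str) -> Optional[str]:
    """Return the suggested content strategy, or None when no rule matches."""
    cleaned = query.strip()
    for pattern, strategy in STRATEGIES:
        if pattern.search(cleaned):
            return strategy
    return None


for q in ["What is schema markup?", "HubSpot vs Pipedrive", "Best CRM for startups"]:
    print(q, "->", content_strategy(q))
```

Queries that match no rule return `None`, which is a useful prompt to check manually what the searcher actually wants.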

Written by Shruti Sonali, Web Designer & Strategist