How Perplexity and ChatGPT Choose Sources: An AEO Deep Dive
How AI Answer Engines Work
The Basic Architecture
User query
│
▼
┌────────────────────────────┐
│ Query Understanding │
│ - Intent classification │
│ - Entity recognition │
│ - Topic identification │
└─────────────┬──────────────┘
│
▼
┌────────────────────────────┐
│ Source Retrieval │
│ - Web search/index │
│ - Knowledge base │
│ - Trusted sources │
└─────────────┬──────────────┘
│
▼
┌────────────────────────────┐
│ Source Evaluation │
│ - Authority scoring │
│ - Relevance matching │
│ - Freshness check │
└─────────────┬──────────────┘
│
▼
┌────────────────────────────┐
│ Answer Synthesis │
│ - Information extraction │
│ - Citation assignment │
│ - Response generation │
└────────────────────────────┘
The Key Insight
AI systems don't just find information; they also evaluate which sources deserve to be cited. This evaluation happens before the answer is generated.
The Source Selection Process
Step 1: Candidate Retrieval
The engine first gathers candidate sources:
| Method | What Gets Retrieved |
|---|---|
| Web search | Top-ranking pages for query |
| Index lookup | Pre-crawled authoritative sources |
| Knowledge base | Verified facts and entities |
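As a rough sketch of how candidates from these three methods might be pooled, the snippet below merges result lists and drops duplicate URLs. The `Candidate` type, the `merge_candidates` helper, and the example URLs are hypothetical; no answer engine documents its retrieval code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    url: str
    method: str  # "web_search", "index_lookup", or "knowledge_base"


def merge_candidates(*result_lists: list[Candidate]) -> list[Candidate]:
    """Pool results from several retrieval methods, keeping the first hit per URL."""
    seen: set[str] = set()
    merged: list[Candidate] = []
    for results in result_lists:
        for candidate in results:
            key = candidate.url.rstrip("/").lower()  # crude URL normalization
            if key not in seen:
                seen.add(key)
                merged.append(candidate)
    return merged


web = [Candidate("https://example.com/crm-guide", "web_search")]
index = [
    Candidate("https://example.com/crm-guide/", "index_lookup"),  # same page again
    Candidate("https://example.org/crm-research", "index_lookup"),
]
pool = merge_candidates(web, index)
print([c.url for c in pool])
```

Deduplicating before scoring means each page is evaluated once, whichever method surfaced it.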
Step 2: Authority Evaluation
Sources are scored on trust signals. The weights below are an illustrative model; answer engines do not publish exact weightings:
Authority Score =
Domain reputation (30%)
+ Topic expertise (25%)
+ Content freshness (20%)
+ External validation (15%)
+ Technical quality (10%)
Step 3: Relevance Matching
How well does the source answer the query?
| Factor | Evaluation |
|---|---|
| Topic alignment | Does content cover query topic? |
| Answer presence | Is there a direct answer? |
| Depth | How thoroughly is topic covered? |
| Specificity | Does it match query specificity? |
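Steps 2 and 3 can be read as a weighted sum followed by a relevance adjustment. The sketch below uses the weights from the Authority Score breakdown in Step 2; the 0-to-1 signal values, the `rank_sources` helper, and the choice to multiply authority by relevance are assumptions for illustration, not a documented algorithm.

```python
# Weights taken from the Authority Score breakdown in Step 2.
AUTHORITY_WEIGHTS = {
    "domain_reputation": 0.30,
    "topic_expertise": 0.25,
    "content_freshness": 0.20,
    "external_validation": 0.15,
    "technical_quality": 0.10,
}


def authority_score(signals: dict[str, float]) -> float:
    """Weighted sum of trust signals, each assumed to be in [0, 1]."""
    return sum(w * signals.get(name, 0.0) for name, w in AUTHORITY_WEIGHTS.items())


def rank_sources(sources: list[dict]) -> list[dict]:
    """Order sources by authority x relevance (an assumed combination rule)."""
    return sorted(
        sources,
        key=lambda s: authority_score(s["signals"]) * s["relevance"],
        reverse=True,
    )


sources = [
    {"name": "established-site", "relevance": 0.5,
     "signals": {name: 0.9 for name in AUTHORITY_WEIGHTS}},
    {"name": "niche-expert", "relevance": 0.9,
     "signals": {name: 0.6 for name in AUTHORITY_WEIGHTS}},
]
for s in rank_sources(sources):
    print(s["name"], round(authority_score(s["signals"]) * s["relevance"], 3))
```

Under this assumed rule, a highly relevant page can outrank a more authoritative but less relevant one.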
Step 4: Citation Assignment
Deciding what to cite:
- Factual claims → Cite authoritative source
- Statistics → Cite original research
- Definitions → Cite recognized authority
- Opinions → Cite expert or don't cite
Perplexity's Citation System
How Perplexity Works
Perplexity is explicitly designed as an "answer engine" with visible citations:
Query: "What is the best CRM for startups?"
Perplexity response:
"For startups, HubSpot CRM is often recommended as the
best option due to its free tier and scalability [1].
Pipedrive offers simpler pipeline management [2], while
Salesforce provides enterprise features but at higher
cost [3]."
[1] hubspot.com/crm/startups
[2] pipedrive.com/blog/startup-crm
[3] salesforce.com/solutions/startups
Perplexity Citation Factors
| Factor | Weight | Description |
|---|---|---|
| Domain authority | High | Established, trusted domains |
| Content recency | High | Recently updated content |
| Direct relevance | High | Exact topic match |
| First-party source | Medium | Original over aggregated |
| Structured content | Medium | Easy to parse and extract |
Getting Cited by Perplexity
What works:
- Clear, direct answers to specific questions
- Recent publication dates
- Authoritative domain (not new/unknown)
- Original data or insights
- Well-structured content with headings
What doesn't work:
- Generic, rehashed content
- Outdated information
- Thin content without substance
- Paywalled content
ChatGPT's Source Evaluation
How ChatGPT Uses Sources
With web browsing enabled, ChatGPT:
1. Searches the web for relevant pages
2. Reads and evaluates the content
3. Synthesizes an answer
4. Cites the sources it used
ChatGPT's Implicit Authority Model
Even without browsing, ChatGPT's training reflects source authority:
| Source Type | How It Influences Training |
|---|---|
| Wikipedia | Heavy influence on definitions |
| Academic papers | Scientific claims |
| Major news | Current events |
| Official docs | Technical information |
| Quality blogs | Industry perspectives |
Optimization for ChatGPT
For training influence (long-term):
- Publish content that gets linked/cited widely
- Appear on Wikipedia and Wikidata
- Get mentioned in authoritative sources
For browsing citations (immediate):
- Rank well in search (ChatGPT's web search has relied on Bing's index)
- Have clear, extractable answers
- Maintain current, accurate content
Google AI Overview Sources
How AI Overviews Select Sources
Google's AI Overviews draw from:
- Search index - Already-ranked pages
- Knowledge Graph - Verified entity data
- Structured data - Schema markup
- Quality signals - E-E-A-T indicators
The AI Overview Algorithm
Source Selection for AI Overview:
1. Identify top-ranking pages for query
2. Evaluate E-E-A-T signals
3. Check for direct answer content
4. Prefer pages with:
- Schema markup
- Clear heading structure
- Answer-first content
- Recent updates
5. Synthesize from 3-5 top sources
6. Display with source links
What Gets Featured
| Content Type | AI Overview Likelihood |
|---|---|
| Direct definitions | Very High |
| Step-by-step guides | High |
| Comparison tables | High |
| Expert analysis | Medium-High |
| News/current events | Medium |
| Opinion content | Low |
Authority Signals That Matter
Domain-Level Signals
| Signal | Why It Matters |
|---|---|
| Domain age | Established presence |
| Backlink profile | External validation |
| Brand searches | Recognition |
| HTTPS | Security baseline |
| Clean link profile | No spam signals |
Content-Level Signals
| Signal | Why It Matters |
|---|---|
| Author expertise | Named, credentialed authors |
| Publication date | Recency indication |
| Update frequency | Maintained content |
| Citation of sources | Verifiable claims |
| Depth of coverage | Comprehensive treatment |
Entity-Level Signals
| Signal | Why It Matters |
|---|---|
| Knowledge Graph presence | Verified entity |
| Wikipedia mention | Third-party validation |
| Consistent NAP (name, address, phone) | Verifiable identity |
| Schema markup | Explicit entity data |
| Cross-platform presence | Corroboration |
Content Factors That Drive Citations
The Citation-Worthy Content Formula
Citation Likelihood =
Uniqueness (Is this information hard to find elsewhere?)
× Authority (Should readers trust this source?)
× Extractability (Can AI easily use this?)
× Relevance (Does this answer the query?)
What Makes Content Extractable
**Good (easy to extract):**
```markdown
What is Schema Markup?
Schema markup is a standardized vocabulary of tags
that you add to HTML to help search engines and AI
systems understand your content. It is most often
written in JSON-LD and defines entities, relationships,
and properties in a machine-readable format.
```
**Bad (hard to extract):**
```markdown
You might be wondering about how to make search
engines understand your content better. Well,
there's this thing called schema markup, which is
basically like... let me explain with an example...
```
Content Patterns That Get Cited
| Pattern | Example | Why It Works |
|---|---|---|
| Definition + context | "X is Y. It works by..." | Direct answer + depth |
| Statistics + source | "According to [study], 65%..." | Unique data |
| Step-by-step | "1. First... 2. Then..." | Actionable |
| Comparison table | "Feature A vs B" | Decision support |
| Expert quote | "[Expert] says..." | Authority |
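Because the Citation Likelihood formula in this section multiplies its factors, a weak score on any one of them drags the whole product down. A minimal sketch, assuming each factor is scored from 0 to 1 (the scale and the `citation_likelihood` name are illustrative):

```python
from math import prod


def citation_likelihood(uniqueness: float, authority: float,
                        extractability: float, relevance: float) -> float:
    """Product of the four factors, each assumed to be scored in [0, 1]."""
    factors = (uniqueness, authority, extractability, relevance)
    if not all(0.0 <= f <= 1.0 for f in factors):
        raise ValueError("each factor must be between 0 and 1")
    return prod(factors)


# Strong on three factors, but the answer is buried in hard-to-extract prose.
buried = citation_likelihood(0.9, 0.9, 0.1, 0.9)
# Moderate on every factor.
balanced = citation_likelihood(0.6, 0.6, 0.6, 0.6)
print(round(buried, 4), round(balanced, 4))
```

With these example inputs the evenly moderate page scores higher than the page that is strong everywhere except extractability, which is the practical consequence of multiplying rather than adding.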
What Gets Ignored
Content That AI Skips
| Content Type | Why Ignored |
|---|---|
| Generic definitions | Available everywhere |
| Thin content | No depth to extract |
| Outdated information | Recency signals fail |
| Opinion without expertise | No authority |
| Heavily promotional | Bias signals |
| Paywalled content | Can't access |
| Duplicate content | Original preferred |
Red Flags for AI Systems
Content signals that reduce citations:
• No author attribution
• No publication date
• Generic stock photos
• Thin word count
• Keyword stuffing
• Excessive ads
• Poor mobile experience
• Slow load times
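A few of these red flags can be checked mechanically. The sketch below audits a page described by a plain dictionary; the field names, the 300-word threshold, and the 3-second load-time limit are assumptions chosen for illustration, not values any AI system publishes.

```python
def find_red_flags(page: dict) -> list[str]:
    """Return the machine-checkable red flags that apply to `page`."""
    flags = []
    if not page.get("author"):
        flags.append("no author attribution")
    if not page.get("published"):
        flags.append("no publication date")
    if page.get("word_count", 0) < 300:      # assumed threshold for "thin"
        flags.append("thin word count")
    if page.get("load_seconds", 0.0) > 3.0:  # assumed threshold for "slow"
        flags.append("slow load time")
    return flags


page = {"author": "", "published": "2024-05-01", "word_count": 180, "load_seconds": 1.2}
print(find_red_flags(page))
```

Flags such as keyword stuffing or generic stock photos need judgment or heavier tooling and are left out of this sketch.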
Optimization Strategies
Strategy 1: Be the Definitive Source
Create content that AI must cite because no alternative exists:
- Original research and data
- First-party case studies
- Expert interviews
- Industry surveys
- Proprietary analysis
Strategy 2: Structure for Extraction
[Question as heading]
[Direct answer in first 40-60 words]
[Supporting section]
[Elaboration with specifics...]
[Related topic]
[Additional context...]
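One way to check a draft against this template is to count the words in the first paragraph under each question heading. The sketch assumes Markdown input with `#`-style headings and blank lines between paragraphs, and it treats any heading ending in "?" as a question; `check_answer_length` is a hypothetical helper.

```python
import re


def check_answer_length(markdown: str, low: int = 40, high: int = 60) -> dict[str, int]:
    """Return {question heading: word count} for question headings whose
    first paragraph falls outside the low..high word range."""
    problems: dict[str, int] = {}
    # re.split with one capture group yields [preamble, heading, body, heading, body, ...].
    parts = re.split(r"^#+[ \t]+(.*)$", markdown, flags=re.MULTILINE)
    for heading, body in zip(parts[1::2], parts[2::2]):
        heading = heading.strip()
        if not heading.endswith("?"):
            continue  # assumption: only question headings need a direct answer
        paragraphs = [p for p in body.strip().split("\n\n") if p.strip()]
        words = len(paragraphs[0].split()) if paragraphs else 0
        if not low <= words <= high:
            problems[heading] = words
    return problems


draft = "## What is Schema Markup?\n\nSchema markup is a vocabulary of tags.\n\nMore detail follows."
print(check_answer_length(draft))
```

The example draft is flagged because its opening answer is only a few words long; expanding it into the 40-60 word range clears the report.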
### Strategy 3: Build Entity Authority
1. **Claim your entity** - Knowledge panel, Wikipedia
2. **Implement schema** - Organization, Person, Product
3. **Build consistency** - Same info everywhere
4. **Get corroboration** - Third-party mentions
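For step 2, schema.org data is commonly embedded in a page as a JSON-LD script tag. The sketch below builds an `Organization` object in Python; the company name and URLs are placeholders, and `organization_jsonld` is a hypothetical helper.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Render a schema.org Organization as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,  # profiles that corroborate the same entity
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


tag = organization_jsonld(
    "Example Co",
    "https://www.example.com",
    ["https://www.linkedin.com/company/example-co"],
)
print(tag)
```

Using the same name and URL here as on every other profile supports step 3, and the `sameAs` links point to the third-party presence described in step 4.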
### Strategy 4: Maintain Freshness
| Action | Frequency |
| ----------------- | ------------- |
| Review and update | Quarterly |
| Add new data | As available |
| Update dates | After changes |
| Check accuracy | Ongoing |
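The quarterly review in this table can be tracked with a small script. This sketch approximates a quarter as 90 days and assumes each page records a last-reviewed date; both are illustrative choices, and the page paths are made up.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # "quarterly", approximated as 90 days


def pages_due_for_review(pages: dict[str, date], today: date) -> list[str]:
    """Return the paths whose last review is older than the review interval."""
    return sorted(
        path for path, reviewed in pages.items()
        if today - reviewed > REVIEW_INTERVAL
    )


pages = {
    "/guides/schema-markup": date(2024, 1, 10),
    "/guides/aeo-basics": date(2024, 5, 20),
}
print(pages_due_for_review(pages, today=date(2024, 6, 1)))
```

Passing `today` explicitly keeps the function easy to test; in a scheduled job it would typically be `date.today()`.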
### Strategy 5: Target Specific Query Types
| Query Type | Content Strategy |
| -------------- | ------------------- |
| "What is X" | Definition-focused |
| "How to X" | Step-by-step guide |
| "X vs Y" | Comparison content |
| "Best X for Y" | Recommendation list |
| "X cost/price" | Pricing page/table |
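The mapping in this table can be expressed as a small pattern matcher. The regular expressions below are a rough, assumed approximation of each query type, and `content_strategy` is a hypothetical helper; real queries are messier.

```python
import re
from typing import Optional

# (pattern, strategy) pairs in priority order; strategy names come from the table.
QUERY_STRATEGIES = [
    (re.compile(r"\b(vs|versus)\b", re.IGNORECASE), "Comparison content"),
    (re.compile(r"^best\b.*\bfor\b", re.IGNORECASE), "Recommendation list"),
    (re.compile(r"\b(cost|price|pricing)\b", re.IGNORECASE), "Pricing page/table"),
    (re.compile(r"^how to\b", re.IGNORECASE), "Step-by-step guide"),
    (re.compile(r"^what (is|are)\b", re.IGNORECASE), "Definition-focused"),
]


def content_strategy(query: str) -> Optional[str]:
    """Return the strategy for the first pattern that matches the query."""
    cleaned = query.strip()
    for pattern, strategy in QUERY_STRATEGIES:
        if pattern.search(cleaned):
            return strategy
    return None  # query type not covered by the table


for q in ["What is schema markup", "best CRM for startups", "HubSpot vs Salesforce"]:
    print(q, "->", content_strategy(q))
```

The patterns are checked in order, so a query such as "what is the cost of X" is treated as a pricing query rather than a definition query; that priority is a design choice, not something the table prescribes.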