Why AI Search Ignores Your Content — and How to Diagnose It

AI search systems don't match keywords — they decompose queries into themes, extract concepts, score content chunks against each theme, apply attention weighting, and select citations based on coverage completeness. Content that scores well for one theme but misses others will fail the overall retrieval threshold. Diagnosing this requires replicating the full pipeline for a specific query against a specific page.

AI Query Simulation in QueryBurst replicates this process as a 9-stage pipeline, covering query fan-out, concept extraction, RAG scoring, attention scoring, citation selection, and more, with per-stage visibility into where content passes or fails.

How It Works

The 9-Stage Pipeline

The simulation replicates the core retrieval and generation pipeline used by modern AI systems:

1. QUERY FAN-OUT
   Your query → Multiple thematic queries
   Prioritized by importance (P1/P2/P3)

2. CONCEPT EXTRACTION
   Each theme → Key concepts AI focuses on
   Types: entities, intent, modifiers, context
   Weighted by importance

3. PAGE SCORING (RAG)
   How well your page covers each theme
   Traditional semantic search scoring
   Per-theme and overall page score

4. ATTENTION SCORING
   Concept-level weighted scoring
   Simulates what LLM "pays attention to"
   Identifies specific concept gaps

5. SITE-WIDE SEARCH
   Searches your entire website
   Two-phase retrieval (page → chunks)
   Finds best content across all pages

6. COVERAGE ANALYSIS
   Compares target page vs site best
   Identifies internal linking opportunities
   Shows where other pages beat yours

7. AI RESPONSE GENERATION
   Simulates actual AI response
   Shows which chunks get cited
   Tracks citations from your page

8. ITERATIVE REFINEMENT
   Identifies gaps in initial response
   Runs follow-up queries
   Generates refined response

9. RECOMMENDATIONS
   Actionable optimization suggestions
   Content gaps, internal links, enhancements
   Prioritized by impact

Starting a Simulation

Query Input

Enter a question a user might ask an AI assistant:

Good queries:

  • "best golf clubs for beginners to improve my game"
  • "how to choose ergonomic office chair for back pain"
  • "what are the benefits of meditation for anxiety"

Tips:

  • Use natural language (how users actually talk to AI)
  • Include intent and context
  • Think about what a complete answer needs

Simulation Time

Duration: 30-60 seconds

  • Shows real-time progress through stages
  • Results are cached for instant re-access
  • Can re-run to verify improvements

Understanding Results

Score Summary

Two key scores at the top:

Page Score (RAG)

  • How well this specific page answers the query
  • Traditional semantic search scoring
  • 75+ = Strong coverage
  • 60-74 = Good coverage
  • 45-59 = Weak coverage
  • <45 = Poor coverage

Site Score

  • Best coverage available across your entire site
  • Shows if other pages have better content
  • Higher than page score = opportunity to link or consolidate

Themes Count

  • Number of thematic queries generated
  • Typically 5-10 themes
  • More themes = more comprehensive query

Actions Count

  • Number of optimization recommendations
  • Prioritized by impact

Thematic Fan-Out

What it shows: How AI decomposes your query into specific themes to research.

Visual representation:

  • Query at top
  • Branches out to themes below
  • Size indicates importance
  • Hover for full expanded query and reasoning

Priority levels:

Priority | Visual                 | Meaning
P1       | Large, violet, glowing | Primary focus - must answer these
P2       | Medium, gray           | Supporting information
P3       | Small, subtle          | Contextual background

Example: Query: "best latex mattress for back pain"

  • P1: "Latex mattress recommendations for back pain relief"
  • P1: "Latex vs memory foam for spinal support"
  • P2: "Firmness levels for back pain sufferers"
  • P2: "Certifications and materials quality"
  • P3: "Price ranges and value comparison"

What it means: AI focuses most on P1 themes. If your page scores poorly on P1 themes, it is unlikely to be cited.

Concept Extraction

What it shows: The specific concepts AI would focus on when reading your content.

Concept types:

Type     | Color  | What It Is   | Example
Entity   | Cyan   | Named things | "GOLS certification", "Dunlop latex"
Intent   | Violet | User goals   | "pain relief", "durability"
Modifier | Amber  | Qualifiers   | "organic", "best", "affordable"
Context  | Gray   | Background   | "Australia", "2026"

Weight:

  • Shows how much attention AI gives each concept
  • Higher weight = more important to address
  • 60%+ weight concepts are critical

Use case: Check if your content covers high-weight concepts. Missing a 70% weight concept means AI won't consider your content comprehensive.
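
The weight rule above can be expressed as a simple filter. The sketch below is illustrative only: the `Concept` class, its field names, the example weights, and the 0.6 cutoff (taken from the "60%+ weight concepts are critical" guideline) are assumptions, not QueryBurst's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    name: str
    kind: str      # "entity", "intent", "modifier", or "context"
    weight: float  # 0.0-1.0, share of attention given to the concept


CRITICAL_WEIGHT = 0.6  # assumed from the "60%+ is critical" guideline


def critical_concepts(concepts):
    """Return concepts at or above the critical weight, heaviest first."""
    critical = [c for c in concepts if c.weight >= CRITICAL_WEIGHT]
    return sorted(critical, key=lambda c: c.weight, reverse=True)


# Hypothetical weights for concepts from the table above.
concepts = [
    Concept("GOLS certification", "entity", 0.7),
    Concept("pain relief", "intent", 0.8),
    Concept("affordable", "modifier", 0.4),
    Concept("Australia", "context", 0.2),
]
for c in critical_concepts(concepts):
    print(f"{c.name} ({c.kind}): {c.weight:.0%}")
```

Checking a draft against this short list first keeps the review focused on the concepts most likely to decide whether the content reads as comprehensive.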

Page Scoring (RAG)

What it shows: How well this specific page covers each theme using traditional semantic search.

Priority breakdown:

  • Shows your average score for P1, P2, P3 themes separately
  • P1 score is most important—must be strong to get cited

Theme details (expandable):

  • Each theme with its score
  • Best matching chunk from your page
  • Heading context (where the chunk appears)

Color coding:

  • Green (75%+) = Strong coverage
  • Blue (60-74%) = Good coverage
  • Amber (45-59%) = Weak coverage
  • Red (<45%) = Poor/missing coverage

What to focus on: P1 themes with weak scores. These are critical gaps.
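
As a rough illustration of the priority breakdown and color bands described above, the following sketch averages theme scores per priority and maps each average to a band. The function names and the example scores are hypothetical; only the band thresholds and theme names come from this page.

```python
from collections import defaultdict


def coverage_band(score):
    """Map a 0-100 score to the color bands listed above."""
    if score >= 75:
        return "green (strong)"
    if score >= 60:
        return "blue (good)"
    if score >= 45:
        return "amber (weak)"
    return "red (poor/missing)"


def average_by_priority(themes):
    """Average theme scores separately for each priority level."""
    buckets = defaultdict(list)
    for priority, _name, score in themes:
        buckets[priority].append(score)
    return {p: sum(s) / len(s) for p, s in sorted(buckets.items())}


# Hypothetical scores for the mattress example themes.
themes = [
    ("P1", "Latex mattress recommendations for back pain relief", 82),
    ("P1", "Latex vs memory foam for spinal support", 48),
    ("P2", "Firmness levels for back pain sufferers", 66),
    ("P2", "Certifications and materials quality", 70),
    ("P3", "Price ranges and value comparison", 30),
]
for priority, avg in average_by_priority(themes).items():
    print(priority, round(avg, 1), coverage_band(avg))
```

In this made-up data the P1 average lands in the "good" band, yet one P1 theme sits at 48. That is why the per-theme detail matters more than the average alone.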

Attention Scoring (Concept-Level)

Advanced scoring that simulates the attention mechanism of an LLM.

What's different from RAG:

  • Breaks themes into weighted concepts
  • Scores each concept individually
  • Calculates weighted overall score
  • Shows specific concept gaps

Concept breakdown (per theme):

  • Lists all extracted concepts
  • Shows coverage score for each
  • Identifies which concepts you're missing
  • Links to best matching content

Example: Theme: "Latex mattress certifications"

  • Concept: "GOLS certification" (70% weight) → 85% coverage ✓
  • Concept: "Oeko-Tex standard" (50% weight) → 20% coverage ✗
  • Concept: "organic materials" (60% weight) → 0% coverage ✗

Gap tags: Concepts you don't address at all are highlighted in red.

Why this matters: Even if your overall theme score is decent, missing high-weight concepts means AI sees your content as incomplete.
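
To make the weighting concrete, here is a minimal sketch of a concept-weighted score using the certification example above. It assumes the overall score is a weighted mean of concept coverage, normalized by the sum of weights; the actual formula used by the simulation is not documented here, and the function names are hypothetical.

```python
def attention_score(concepts):
    """Weighted mean of concept coverage; weights are concept importance."""
    total_weight = sum(weight for _name, weight, _cov in concepts)
    if total_weight == 0:
        return 0.0
    weighted = sum(weight * cov for _name, weight, cov in concepts)
    return weighted / total_weight


def concept_gaps(concepts):
    """Concepts with no coverage at all (the red gap tags)."""
    return [name for name, _weight, cov in concepts if cov == 0]


# (name, weight 0-1, coverage 0-100) from the certification example above
concepts = [
    ("GOLS certification", 0.7, 85),
    ("Oeko-Tex standard", 0.5, 20),
    ("organic materials", 0.6, 0),
]
print(round(attention_score(concepts), 1))
print(concept_gaps(concepts))
```

Under this assumed formula the theme scores roughly 38.6 despite one concept reaching 85% coverage, because the two poorly covered concepts carry more than half of the total weight.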

Site-Wide Coverage

What it shows: Whether other pages on your site have better content for each theme.

Columns:

  • Pri - Theme priority (1/2/3)
  • Theme - Theme name
  • Page - This page's score
  • Best - Best score from any page on site
  • Source - Which page has the best content

Key insights:

✅ Green checkmark: This page has the best content (ideal)

📄 Other page name: Another page beats you (opportunity)

  • Consider adding internal link to that page
  • Or consolidate content onto this page

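Two-Phase Retrieval: The site-wide search first ranks pages by averaging their top chunks, then retrieves the best chunks from the top-ranked pages.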

Internal Linking Opportunities: Shows top 3 suggestions for linking to other pages that better cover specific themes.
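
One plausible way to derive such suggestions from the coverage table is to rank themes by how far another page outscores this one. The sketch below assumes that ranking rule; the `CoverageRow` type, the URLs, and the scores are all hypothetical.

```python
from typing import NamedTuple


class CoverageRow(NamedTuple):
    priority: int
    theme: str
    page_score: float
    best_score: float
    source: str  # URL path of the best-scoring page


def link_suggestions(rows, current_page, limit=3):
    """Themes where another page beats this one, largest gap first."""
    beaten = [r for r in rows
              if r.source != current_page and r.best_score > r.page_score]
    beaten.sort(key=lambda r: r.best_score - r.page_score, reverse=True)
    return beaten[:limit]


# Hypothetical coverage rows for a page at /latex-mattress.
rows = [
    CoverageRow(1, "Recommendations for back pain", 80, 80, "/latex-mattress"),
    CoverageRow(1, "Latex vs memory foam", 50, 78, "/latex-vs-memory-foam"),
    CoverageRow(2, "Firmness levels", 62, 70, "/firmness-guide"),
    CoverageRow(2, "Certifications", 40, 85, "/certifications"),
    CoverageRow(3, "Price ranges", 30, 55, "/pricing"),
]
for row in link_suggestions(rows, "/latex-mattress"):
    print(f"Link to {row.source} for '{row.theme}'")
```

A real implementation might also weight the gap by theme priority so that P1 themes surface first; that choice is left out here to keep the sketch small.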

Retrieved Chunks

What it shows: The specific chunks that AI would retrieve and use to generate its response.

Grouped by theme:

  • Each theme shows its top retrieved chunks
  • Chunks can come from this page or others
  • Score shows semantic relevance

Color coding:

  • Green background = From this page (good!)
  • Gray background = From other pages (missed opportunity)

Expandable detail:

  • Click any chunk to see full text
  • Shows heading context
  • Raw similarity score
  • Source page URL

Citation indicator: Chunks with very high scores (75%+) are likely to be cited in the AI response.

Use case:

  • See which of your chunks are competitive
  • Identify why other pages' chunks rank higher
  • Learn from winning chunk patterns
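
The citation indicator and color coding in this section amount to a threshold plus a source check. A small sketch follows, assuming each chunk is a dictionary with hypothetical `url`, `score`, and `heading` keys.

```python
CITATION_THRESHOLD = 75  # from the "75%+" citation indicator above


def likely_citations(chunks, page_url):
    """Split high-scoring chunks into those from this page and from others."""
    own, other = [], []
    for chunk in chunks:
        if chunk["score"] < CITATION_THRESHOLD:
            continue
        (own if chunk["url"] == page_url else other).append(chunk)
    return own, other


# Hypothetical retrieved chunks for one theme.
chunks = [
    {"url": "/latex-mattress", "score": 82, "heading": "Spinal support"},
    {"url": "/certifications", "score": 79, "heading": "GOLS explained"},
    {"url": "/latex-mattress", "score": 61, "heading": "Care tips"},
]
own, other = likely_citations(chunks, "/latex-mattress")
print(len(own), "likely citations from this page;", len(other), "from others")
```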

AI Response Preview

What it shows: A simulated AI response built from your content, in the style of the answers ChatGPT or Perplexity would generate.

Initial Response:

  • Generated from retrieved chunks
  • Shows citations (numbered sources)
  • Highlights which citations are from your page

Refined Response (if available):

  • After running follow-up queries
  • Fills gaps found in initial response
  • Additional citations shown separately
  • Toggle between initial and refined

Citations:

  • Numbered like real AI responses
  • Green = From this page (success!)
  • Gray = From other pages
  • Shows page title and URL

Follow-Up Queries:

  • Lists additional queries AI ran
  • Shows reason for each follow-up
  • Indicates gaps in initial retrieval

Queries Answered:

  • Shows which gaps were successfully filled
  • Additional evidence found

Still Missing:

  • Aspects AI couldn't answer even after follow-ups
  • True content gaps on your site

Use case: See if your page contributes to the final answer. If not, see what's missing.

Recommendations

What it shows: Prioritized, actionable suggestions based on the simulation.

Priority levels:

Priority | Color | When Used
High     | Red   | Critical gaps in P1 themes, missing citations
Medium   | Amber | Opportunities to improve coverage
Low      | Gray  | Minor enhancements

Recommendation types:

Internal Link:

  • Links to suggest from this page to others
  • Helps users find better content for specific themes

Content Enhancement:

  • Specific topics to expand on this page
  • Based on concept gaps
  • Focus on high-weight missing concepts

New Content:

  • Themes with no good coverage anywhere on site
  • True content gap opportunities

Concept Gap:

  • Missing high-weight concepts
  • Add specific examples, data, or explanations

Visibility:

  • Content exists but scores low
  • Improve clarity, add keywords, restructure

Each recommendation includes:

  • Clear title and description
  • Specific action to take
  • Expected impact

Common Use Cases

1. Pre-Publication Check

Goal: Verify content will perform well in AI responses before publishing

Flow:

  1. Run simulation with target query
  2. Review page score and theme coverage
  3. Check attention scoring for concept gaps
  4. Iterate on content based on recommendations
  5. Re-run to verify improvements

2. Competitive Content Analysis

Goal: Understand why other pages get cited over yours

Flow:

  1. Run simulation for key query
  2. Check site-wide coverage
  3. See which chunks get retrieved
  4. Review AI response—are you cited?
  5. Examine retrieved chunks from other pages
  6. Learn from patterns in high-scoring content

3. Internal Linking Strategy

Goal: Optimize internal links to surface best content

Flow:

  1. Run simulation on hub/category page
  2. Review site-wide coverage
  3. Note where other pages score higher
  4. Add contextual links to those pages
  5. Re-run to verify improved coverage

4. Content Gap Discovery

Goal: Find missing topics/angles to create content for

Flow:

  1. Run simulation for target query
  2. Review thematic fan-out
  3. Check which themes have weak coverage
  4. Review "Still Missing" in AI response
  5. Create content for high-priority gaps

5. Concept Coverage Audit

Goal: Ensure comprehensive topic coverage

Flow:

  1. Run simulation
  2. Expand attention scoring section
  3. Review concept breakdown per theme
  4. Note high-weight concepts with low coverage
  5. Add specific content addressing those concepts

Best Practices

Writing Good Queries

✅ Do use conversational language
✅ Do include user intent and context
✅ Do think comprehensively (what makes a complete answer?)
✅ Do test multiple related queries

❌ Don't use keyword stuffing
❌ Don't use branded queries (unless testing brand)
❌ Don't make queries too narrow

Interpreting Scores

Page Score Guidelines:

  • 80%+ = Excellent, highly competitive
  • 70-79% = Strong, likely to be cited
  • 60-69% = Good but improvable
  • 50-59% = Weak, unlikely to be cited
  • <50% = Poor, major gaps

When site score >> page score:

  • You have the content, just on the wrong page
  • Consider internal linking
  • Or consolidate content

When both scores low:

  • True content gap
  • Opportunity to create definitive resource
  • Focus on high-priority themes first
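
The two situations above can be folded into a small decision helper. This is only a sketch: the `low=60` cutoff and the `gap=15` margin standing in for "site score >> page score" are assumed values, not thresholds documented by the tool.

```python
def diagnose(page_score, site_score, low=60, gap=15):
    """Suggest a next step from the page and site scores (0-100)."""
    if page_score < low and site_score < low:
        # Nothing on the site covers the query well.
        return "content gap: create or expand content"
    if site_score - page_score >= gap:
        # Good content exists, but on a different page.
        return "link or consolidate"
    return "refine this page"


print(diagnose(50, 80))
print(diagnose(30, 55))
print(diagnose(72, 75))
```

The both-low check runs first so that a weak site score is never mistaken for content worth linking to.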

Optimization Workflow

  1. Run baseline - Understand current state
  2. Prioritize P1 - Focus on primary themes first
  3. Address concept gaps - Add high-weight missing concepts
  4. Check chunks - Ensure best chunks are retrievable
  5. Re-run - Verify improvements
  6. Iterate - Keep refining until competitive

Score History

Automatic tracking of all simulations for this page.

Features:

  • Shows query and timestamp
  • Displays page and site scores
  • Quick access to cached reports
  • Re-run any historical query

Use cases:

  • Track improvements over time
  • Compare scores before/after edits
  • A/B test different query phrasings
  • Monitor score trends

Click any entry to load full cached report instantly (no re-simulation needed).

Technical Details

Scoring Methods

RAG (Page Scoring):

  • Traditional cosine similarity
  • Chunk embeddings vs theme query embeddings
  • Normalized to 0-100 scale
  • Fast, deterministic
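
A minimal sketch of this scoring, using plain Python lists in place of real embeddings. How the similarity is normalized to 0-100 is not specified above, so clamping negatives to zero and multiplying by 100 is an assumption, as are the function names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rag_score(chunk_embedding, theme_embedding):
    """Cosine similarity clamped to [0, 1] and scaled to 0-100."""
    sim = cosine_similarity(chunk_embedding, theme_embedding)
    return max(0.0, min(1.0, sim)) * 100


def theme_score(chunk_embeddings, theme_embedding):
    """Score of the best-matching chunk for a theme."""
    return max(rag_score(c, theme_embedding) for c in chunk_embeddings)


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
theme = [4.0, 3.0]
page_chunks = [[3.0, 4.0], [0.0, 1.0]]
print(round(theme_score(page_chunks, theme), 1))
```

Because the same inputs always produce the same similarity, re-running this step on unchanged content gives identical scores, which matches the "deterministic" property noted above.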

Attention-Weighted (Concept-Level Scoring):

  • Concept-level decomposition
  • Weighted by concept importance
  • Simulates LLM attention mechanism
  • More accurate but slower

Retrieval Strategy

Two-Phase (Site-Wide):

  1. Page aggregation - Score each page by averaging its top chunks
  2. Chunk retrieval - Get the best chunks from the top pages

This prevents long-tail pages from dominating with many mediocre chunks.
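
The two phases can be sketched as follows. The parameter names and defaults (`top_k`, `top_pages`, `per_page`) and the example URLs are assumptions chosen for illustration; the tool's actual values are not documented here.

```python
def two_phase_retrieve(chunks, top_k=3, top_pages=2, per_page=2):
    """chunks: list of (page_url, chunk_text, score) tuples."""
    by_page = {}
    for url, text, score in chunks:
        by_page.setdefault(url, []).append((score, text))

    # Phase 1: rank pages by the mean of their top_k chunk scores.
    page_scores = {}
    for url, scored in by_page.items():
        best = sorted(scored, reverse=True)[:top_k]
        page_scores[url] = sum(s for s, _ in best) / len(best)
    ranked = sorted(page_scores, key=page_scores.get, reverse=True)[:top_pages]

    # Phase 2: take the best chunks from the winning pages only.
    results = []
    for url in ranked:
        for score, text in sorted(by_page[url], reverse=True)[:per_page]:
            results.append((url, text, score))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical site: two focused pages and one page with many mediocre chunks.
chunks = [("/guide", "a1", 90), ("/guide", "a2", 80), ("/faq", "c1", 70)]
chunks += [("/archive", f"b{i}", 60) for i in range(6)]
for url, text, score in two_phase_retrieve(chunks):
    print(url, text, score)
```

In this example `/archive` contributes six of the nine chunks but is dropped in phase 1 because its average is the lowest. A single-pass top-N over all chunks would have let it fill most of the remaining slots.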

Direct (Page Scoring):

  • Searches chunks directly
  • Faster for single-page analysis

Caching

Simulation results are cached:

  • Full report stored for 30 days
  • Instant re-access from history
  • Re-run forces fresh analysis
  • Useful for before/after comparisons
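
The caching behavior described above could look roughly like this. The `SimulationCache` class, its `get` method, and the injected `run_simulation` callable are hypothetical; only the 30-day retention and the forced re-run come from this page.

```python
import time

CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days, per the retention above


class SimulationCache:
    def __init__(self, run_simulation, clock=time.time):
        self._run = run_simulation  # hypothetical: (page, query) -> report
        self._clock = clock         # injectable so expiry can be tested
        self._store = {}

    def get(self, page, query, force=False):
        """Return a cached report, or run a fresh simulation if needed."""
        key = (page, query)
        entry = self._store.get(key)
        now = self._clock()
        if not force and entry and now - entry[0] < CACHE_TTL_SECONDS:
            return entry[1]
        report = self._run(page, query)
        self._store[key] = (now, report)
        return report


# Stand-in for the real simulation, which takes 30-60 seconds.
cache = SimulationCache(lambda page, query: {"page": page, "query": query})
first = cache.get("/latex-mattress", "best latex mattress for back pain")
again = cache.get("/latex-mattress", "best latex mattress for back pain")
print(first is again)
```

Passing `force=True` mirrors the re-run action: it bypasses the stored report and replaces it with a fresh one.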

Limitations

What this simulates:

  • Core retrieval mechanisms
  • Theme decomposition and concept extraction
  • Citation selection logic
  • Multi-turn refinement

What it doesn't simulate:

  • Real-time web results (uses your crawled pages only)
  • User engagement signals
  • Personalization
  • Platform-specific biases (Google vs ChatGPT)

Remember: This is a simulation based on industry understanding of RAG systems. Real AI systems vary and evolve constantly. Use for directional optimization, not guarantees.

Related Tools

  • Retrieval Optimizer - Test specific chunks against competitor SERP results
  • Answer Spy - Discover criteria AI looks for before optimizing content
  • Content Clarity - Ensure chunks are self-contained for better retrieval
  • Topic Coverage - Verify keyword alignment for semantic search