How to Extract Entities and Semantic Relationships from Page Content

Entity Analysis in QueryBurst extracts the named entities on a page (people, organizations, products, concepts, locations, events, attributes) and the semantic relationships between them, then scores the results for density, coherence, and alignment, among other metrics. Results are visualized as an arc diagram, a filterable entity list, relationship triples, topic clusters, and a document view whose attention mode highlights what an AI system is likely to focus on. Pages with low entity density or orphan entities (mentioned but not connected to any other entity) are flagged.

How to Access Entity Analysis

  1. Navigate to the Page Reports tab in QueryBurst
  2. Select a page from the crawled pages list
  3. Click on the Entities tab in the page details view
  4. Click Start Entity Analysis to run the analysis

Note: Entity Analysis is computationally intensive and may take 30-60 seconds depending on content length.

Understanding the Results

Summary Bar

After analysis completes, you'll see a compact summary showing:

Metric        | Description                                 | Good Score
Overall Score | Combined score of all metrics               | 70+
Density       | How many entities per unit of text          | 60+
Relations     | How many relationships between entities     | 50+
Coherence     | How well entities cluster into topics       | 65+
Alignment     | How well content matches expected topics    | 60+
Semantic      | Entity signal vs filler word ratio          | 40+

You'll also see counts of:

  • Entities - Total unique entities extracted
  • Triples - Total relationships discovered
  • Clusters - Topic groups formed
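
As a quick way to read the summary bar, the sketch below compares a set of scores against the "Good Score" thresholds from the table. The thresholds come from this page; the function name, the lowercase metric keys, and the example scores are hypothetical, and this is not QueryBurst's own code or export format.

```python
# "Good Score" thresholds copied from the summary table.
GOOD_THRESHOLDS = {
    "overall": 70,
    "density": 60,
    "relations": 50,
    "coherence": 65,
    "alignment": 60,
    "semantic": 40,
}


def below_threshold(scores: dict[str, float]) -> list[str]:
    """Return the metrics whose score falls short of its 'good' threshold."""
    return [
        metric
        for metric, minimum in GOOD_THRESHOLDS.items()
        if scores.get(metric, 0) < minimum  # a missing metric counts as 0
    ]


# Hypothetical scores for one page (not real QueryBurst output).
example_scores = {
    "overall": 72,
    "density": 55,
    "relations": 61,
    "coherence": 70,
    "alignment": 58,
    "semantic": 44,
}
print(below_threshold(example_scores))  # ['density', 'alignment']
```

A page can clear the overall threshold while individual metrics still fall short, so it is worth checking each metric rather than the combined score alone.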

Primary Topics

Tags showing the main topics your content covers, derived from entity clustering.

Inferred Topic

An AI-generated summary of what your page appears to be about, based on the extracted entities and relationships.

Orphan Entities Warning

Entities that are mentioned but have no relationships to other entities. These represent potential content gaps where concepts are introduced but not connected to your main narrative.
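
To make the definition concrete, here is a minimal sketch of orphan detection, assuming entities are plain strings and relationships are (subject, predicate, object) tuples. The function name and sample data are illustrative assumptions, not QueryBurst internals.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)


def find_orphans(entities: set[str], triples: list[Triple]) -> set[str]:
    """Return entities that appear in no triple, as subject or object."""
    connected: set[str] = set()
    for subject, _predicate, obj in triples:
        connected.add(subject)
        connected.add(obj)
    return entities - connected


# Illustrative data only.
entities = {"Organic mattress", "natural latex", "GOTS", "free shipping"}
triples = [
    ("Organic mattress", "is made from", "natural latex"),
    ("Organic mattress", "certified by", "GOTS"),
]
print(sorted(find_orphans(entities, triples)))  # ['free shipping']
```

Here "free shipping" is mentioned but never tied to anything else, which is the kind of entity the warning surfaces.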

View Modes

Graph View

An interactive arc diagram visualization showing:

  • Nodes - Entities positioned in a circle, sized by importance (salience/mentions)
  • Arcs - Curved lines connecting related entities
  • Colors - Each entity type has a distinct color

Interactions:

  • Hover over an entity to highlight its connections
  • Click an entity to lock the selection and see all its relationships in the detail panel
  • Click again to deselect

The right panel shows all relationships for the selected entity in the format:

Subject → predicate → Object

Entities View

A filterable list of all extracted entities:

Entity Types:

TypeColorExamples
PersonBlueNames, roles, titles
OrganizationPurpleCompanies, institutions, brands
ProductEmeraldProducts, services, offerings
ConceptAmberAbstract ideas, methodologies
LocationRosePlaces, regions, addresses
EventCyanDates, occasions, milestones
AttributeSlateProperties, characteristics

Features:

  • Filter by entity type using the pills at the top
  • Click any entity row to expand and see context - where it appears in your content
  • See mention count for each entity
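
If you want to reason about the entity list outside the UI, the type filter can be sketched as below. The `Entity` record and its field names are hypothetical, and sorting by mention count is an assumption made for the example rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    type: str      # e.g. "Person", "Organization", "Product"
    mentions: int  # how many times the entity appears on the page


def filter_entities(entities: list[Entity], types: set[str]) -> list[Entity]:
    """Keep entities of the selected types, most-mentioned first."""
    selected = [entity for entity in entities if entity.type in types]
    return sorted(selected, key=lambda entity: entity.mentions, reverse=True)


# Illustrative data only.
page_entities = [
    Entity("GOTS", "Organization", 2),
    Entity("Organic mattress", "Product", 7),
    Entity("natural latex", "Product", 4),
]
for entity in filter_entities(page_entities, {"Product"}):
    print(entity.name, entity.mentions)
# Organic mattress 7
# natural latex 4
```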

Relationships View

Shows all semantic triples extracted from your content in the format:

Subject → predicate → Object

Examples:

  • "Organic mattress" → is made from → "natural latex"
  • "Company" → founded in → "2015"
  • "Product" → certified by → "GOTS"

Relationships reveal how your content connects concepts and are crucial for AI understanding.
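
The "Subject → predicate → Object" format maps naturally onto a three-field record. The sketch below parses lines written in that format and looks up every relationship that involves a given entity. The `Triple` class and helper names are assumptions for illustration; QueryBurst's own data model may differ.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_triple(line: str) -> Triple:
    """Parse one 'Subject → predicate → Object' line, dropping quotes."""
    parts = [part.strip().strip('"') for part in line.split("→")]
    if len(parts) != 3:
        raise ValueError(f"expected 'Subject → predicate → Object', got: {line!r}")
    return Triple(*parts)


def relationships_for(entity: str, triples: list[Triple]) -> list[Triple]:
    """Return triples where the entity is either the subject or the object."""
    return [t for t in triples if entity in (t.subject, t.obj)]


lines = [
    '"Organic mattress" → is made from → "natural latex"',
    '"Company" → founded in → "2015"',
    '"Product" → certified by → "GOTS"',
]
triples = [parse_triple(line) for line in lines]
for t in relationships_for("GOTS", triples):
    print(f"{t.subject} → {t.predicate} → {t.obj}")
# Product → certified by → GOTS
```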

Clusters View

Groups of semantically related entities that form topic clusters.

Each cluster shows:

  • Primary topic - The central concept
  • Entity count - How many entities belong to this cluster
  • Member entities - All entities in the cluster

Well-structured content typically has:

  • 2-5 clear topic clusters
  • Strong central themes with supporting entities
  • Minimal overlap between clusters
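
One way to check the last two guidelines yourself is to count the clusters and measure how many entities they share. The sketch below uses Jaccard similarity (shared members divided by combined members) as the overlap measure; that choice, the function names, and the sample clusters are assumptions, not the metric QueryBurst reports.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of members two clusters have in common (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_report(clusters: dict[str, set[str]]) -> dict[str, object]:
    """Check the 2-5 cluster guideline and find the largest pairwise overlap."""
    overlaps = [
        jaccard(clusters[x], clusters[y]) for x, y in combinations(clusters, 2)
    ]
    return {
        "cluster_count_ok": 2 <= len(clusters) <= 5,
        "max_overlap": max(overlaps, default=0.0),
    }


# Illustrative clusters: primary topic -> member entities.
clusters = {
    "Materials": {"natural latex", "organic cotton", "wool"},
    "Certifications": {"GOTS", "GOLS", "organic cotton"},
}
print(cluster_report(clusters))
# {'cluster_count_ok': True, 'max_overlap': 0.2}
```

The two clusters share one of five distinct entities, so the overlap is 0.2. What counts as "minimal" is a judgment call for your content.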

Document View

The most powerful view for understanding how entities appear in context. Four sub-modes:

Normal Mode

Full document text with entities highlighted in their type colors. Hover over any entity to see its type and mention count.

Attention Mode

Dims all non-entity text, making entities "pop" visually. This shows what an AI might focus on when extracting meaning from your content.

Condensed Mode

Removes all non-entity text entirely, showing only the entities in sequence. This represents the "essence" of your content after removing filler words.
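
The idea behind Condensed Mode can be approximated in a few lines: scan the text and keep only the entity mentions, in the order they appear. This is a simplified sketch that matches known entity names with a regular expression; the function name is hypothetical and real extraction is more involved than string matching.

```python
import re


def condense(text: str, entity_names: list[str]) -> list[str]:
    """Return entity mentions in order of appearance, dropping all other text."""
    if not entity_names:
        return []
    # Try longer names first so "organic mattress" is preferred over "organic".
    alternatives = sorted(entity_names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [match.group(0) for match in pattern.finditer(text)]


text = "Our organic mattress is made from natural latex and is certified by GOTS."
print(condense(text, ["GOTS", "organic", "organic mattress", "natural latex"]))
# ['organic mattress', 'natural latex', 'GOTS']
```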

Themes Mode

Shows the extracted core themes with their descriptions and member entities. This is the final "distilled" representation of your content.

Entity Type Filters: Use the colored pills to toggle which entity types are highlighted. This helps focus on specific aspects of your content.

Chunk Density Scores: Each content chunk shows a density percentage indicating how much of that section's text consists of meaningful entities vs filler.
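
As a rough illustration of a chunk density percentage, the sketch below divides the number of words that belong to entity mentions by the total number of words in the chunk. The exact formula QueryBurst uses is not documented here, so treat this word-count ratio and the function names as assumptions.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def chunk_density(chunk: str, mentions: list[str]) -> float:
    """Percentage of the chunk's words that are part of an entity mention."""
    total = word_count(chunk)
    if total == 0:
        return 0.0
    entity_words = sum(word_count(mention) for mention in mentions)
    return 100.0 * entity_words / total


chunk = "Our organic mattress is made from natural latex."
mentions = ["organic mattress", "natural latex"]  # mentions found in this chunk
print(chunk_density(chunk, mentions))  # 50.0
```

Four of the eight words in the chunk belong to entities, giving 50%. A chunk made up mostly of connective filler would score far lower.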

Watch Distillation

Click the Watch Distillation button to see an animated visualization of how your content is progressively condensed:

  1. Web Page → The original page
  2. Extracted Text → Raw text content (word count)
  3. Attention → Key terms highlighted (filler fading)
  4. Condensed → Only entities remain
  5. Themes → Final core themes

This visualization helps explain how AI systems might process and understand your content.

Interpreting Results

Healthy Entity Structure

✅ High entity density - Content rich in meaningful concepts
✅ Many relationships - Concepts are connected, not just listed
✅ Clear topic clusters - Content is well-organized thematically
✅ Few orphan entities - All concepts tie back to main narrative
✅ Strong coherence - Topics stay focused, don't drift

Warning Signs

⚠️ Low entity density - Too much filler, not enough substance
⚠️ Few relationships - Concepts mentioned but not connected
⚠️ Many orphan entities - Disconnected ideas that confuse readers
⚠️ Too many clusters - Content lacks focus, tries to cover too much
⚠️ Low coherence - Topics jump around without clear structure

Score Interpretation

Score Range | Interpretation
80-100      | Excellent - Well-structured, entity-rich content
60-79       | Good - Solid foundation with room for improvement
40-59       | Fair - Consider adding connections and depth
0-39        | Needs work - Content may be too thin or unfocused
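
If you track scores in a spreadsheet or script, the bands above translate directly into a lookup. The function name is hypothetical; the band boundaries are taken from the table.

```python
def interpret_score(score: float) -> str:
    """Map a 0-100 score to the interpretation band from the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Needs work"


for value in (85, 60, 59, 12):
    print(value, interpret_score(value))
# 85 Excellent
# 60 Good
# 59 Fair
# 12 Needs work
```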

Tips & Best Practices

Improving Entity Density

  1. Be specific - Use proper nouns and specific terms instead of generic ones
  2. Name things - Give products, features, and concepts clear names
  3. Add context - Explain what entities are when introducing them

Building Stronger Relationships

  1. Use connecting language - "X is designed for Y", "A includes B"
  2. Explain causation - "Because of X, we developed Y"
  3. Show hierarchy - "X, which is part of Y, enables Z"

Reducing Orphan Entities

  1. Connect back - When introducing a new concept, relate it to something already mentioned
  2. Use transitions - Bridge between topics explicitly
  3. Summarize - Recap how concepts relate at section ends

Improving Topic Coherence

  1. Outline first - Plan content structure before writing
  2. One topic per section - Don't mix unrelated concepts
  3. Use headers - Clear section breaks help maintain focus

Technical Details

How Entity Extraction Works

  1. Chunking - Content is split into semantic chunks
  2. NER Processing - Named Entity Recognition identifies entities
  3. Type Classification - Entities are categorized by type
  4. Relationship Extraction - Semantic triples are identified
  5. Embedding - Entities are converted to vectors
  6. Clustering - Similar entities are grouped by semantic similarity
  7. Topic Inference - An overall topic is synthesized
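
Steps 5 and 6 are the least intuitive, so here is a toy sketch of the general idea: represent each entity as a vector, then group entities whose vectors point in a similar direction. The two-dimensional vectors, the greedy grouping strategy, and the 0.8 threshold are all made up for illustration; the embedding model and clustering algorithm QueryBurst actually uses are not described here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_entities(
    vectors: dict[str, list[float]], threshold: float = 0.8
) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters: list[list[str]] = []
    for name, vector in vectors.items():
        for cluster in clusters:
            seed = vectors[cluster[0]]  # first member acts as the cluster's seed
            if cosine(vector, seed) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no similar cluster found, start a new one
    return clusters


# Hand-made toy vectors standing in for real embeddings.
toy_vectors = {
    "natural latex": [0.9, 0.1],
    "organic cotton": [0.8, 0.2],
    "GOTS": [0.1, 0.9],
    "GOLS": [0.2, 0.8],
}
print(cluster_entities(toy_vectors))
# [['natural latex', 'organic cotton'], ['GOTS', 'GOLS']]
```

The materials end up in one group and the certifications in another, which is the kind of grouping the Clusters view presents as topic clusters.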

Entity Salience

Entities are ranked by "salience" - how important they are to the content:

  • Frequency of mentions
  • Position in content (earlier = more important)
  • Relationship connections
  • Semantic centrality in clusters
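
To show how these four signals could combine into a single ranking, the sketch below computes a weighted sum. The weights, field names, and normalization are hypothetical choices made for the example; QueryBurst does not publish its salience formula on this page.

```python
from dataclasses import dataclass


@dataclass
class EntityStats:
    name: str
    mentions: int          # frequency of mentions
    first_position: float  # 0.0 = start of content, 1.0 = end
    connections: int       # number of relationships involving the entity
    centrality: float      # 0.0-1.0, how central it is within its cluster


# Hypothetical weights; they only need to sum to 1 for a 0-1 result.
WEIGHTS = {"frequency": 0.4, "position": 0.2, "connections": 0.3, "centrality": 0.1}


def salience(stats: EntityStats, max_mentions: int, max_connections: int) -> float:
    """Weighted sum of the four signals, each scaled to the 0-1 range."""
    frequency = stats.mentions / max_mentions if max_mentions else 0.0
    position = 1.0 - stats.first_position  # earlier = more important
    connections = stats.connections / max_connections if max_connections else 0.0
    return (
        WEIGHTS["frequency"] * frequency
        + WEIGHTS["position"] * position
        + WEIGHTS["connections"] * connections
        + WEIGHTS["centrality"] * stats.centrality
    )


def rank_by_salience(entities: list[EntityStats]) -> list[str]:
    """Return entity names ordered from most to least salient."""
    max_mentions = max((e.mentions for e in entities), default=0)
    max_connections = max((e.connections for e in entities), default=0)
    ranked = sorted(
        entities,
        key=lambda e: salience(e, max_mentions, max_connections),
        reverse=True,
    )
    return [e.name for e in ranked]


# Illustrative data only.
stats = [
    EntityStats("free shipping", mentions=2, first_position=0.7, connections=1, centrality=0.3),
    EntityStats("Organic mattress", mentions=6, first_position=0.0, connections=4, centrality=0.9),
]
print(rank_by_salience(stats))  # ['Organic mattress', 'free shipping']
```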

Analysis Caching

Previous analyses are saved and can be loaded from the history panel. This lets you:

  • Compare analyses over time as content changes
  • Quickly review past results without re-running
  • Track improvements in entity structure

Related Features

  • Page Reports - Page-level health metrics, filtering, and semantic search
  • Links Analysis - Internal linking structure
  • AI Simulation - How AI models respond to queries about your page
  • Knowledge Graph - Site-wide entity profiles and relationships