How to Measure Whether AI Could Replace Your Content
The AI Content Similarity tool in QueryBurst generates an AI baseline for each section of a page, given only the section heading and the article title, then scores the original against it across semantic similarity, vocabulary overlap, bigram overlap, section length uniformity, and template diversity. High similarity across multiple signals indicates commodity content: material an AI model can reproduce without citing the original. Use it to find sections that need differentiation.
What it does
- Fetches your page and extracts its heading structure
- Generates section baselines — what a generic AI would write for each section, given only the heading and article title
- Generates a full article from outline — reverse-engineers your article's structure and has AI write from that blueprint
- Compares originals vs baselines using multiple signals
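The first two steps can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not QueryBurst's actual implementation: the names `HeadingExtractor`, `extract_headings`, and `baseline_prompt` are hypothetical, and the prompt wording is an assumption.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collects (level, text) pairs for h1-h6 elements in document order."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS and self._level is None:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Collapse runs of whitespace left over from inline markup.
            text = " ".join("".join(self._parts).split())
            if text:
                self.headings.append((self._level, text))
            self._level = None


def extract_headings(html):
    """Return the heading structure of an HTML page as (level, text) pairs."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.headings


def baseline_prompt(article_title, heading):
    """Hypothetical prompt: the baseline sees only the title and the heading."""
    return (
        f"Article title: {article_title}\n"
        f"Section heading: {heading}\n"
        "Write this section."
    )
```

Fetching the page and calling a model are left out on purpose, since the tool's HTTP client and model are not specified here.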
Signals
- Semantic Similarity — how close in meaning are the original and baseline? (embedding cosine similarity)
- Vocabulary Overlap — are they using the same words? (Jaccard on content words)
- Bigram Overlap — are they using the same phrases? (F1 on bigram overlap)
- Section Length Uniformity — AI produces uniform section lengths; humans vary
- Template Diversity — AI repeats the same structural template; humans vary
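The first three signals can be approximated with a few lines of Python. This is a sketch under stated assumptions, not the tool's exact scoring: the tokenizer, the small `STOPWORDS` set used to define "content words", and the function names are all illustrative, and `cosine` expects embedding vectors from whatever embedding model you use.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stopword list stands in for a real content-word filter.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "with", "that", "this"}


def tokens(text):
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def content_words(text):
    return [t for t in tokens(text) if t not in STOPWORDS]


def jaccard(original, baseline):
    """Vocabulary overlap: |A intersect B| / |A union B| on content-word sets."""
    a, b = set(content_words(original)), set(content_words(baseline))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bigrams(text):
    """Multiset of adjacent token pairs."""
    toks = tokens(text)
    return Counter(zip(toks, toks[1:]))


def bigram_f1(original, baseline):
    """F1 of bigram precision and recall; symmetric in its two arguments."""
    o, b = bigrams(original), bigrams(baseline)
    overlap = sum((o & b).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(b.values())
    recall = overlap / sum(o.values())
    return 2 * precision * recall / (precision + recall)


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Jaccard works on sets, so it ignores word frequency, while the bigram score uses counts, so repeated phrases weigh more. That difference is one reason the two signals can disagree on the same section.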
Reading the results
- High semantic + high vocabulary = content reads like what any AI would produce
- Low semantic + low vocabulary = genuine originality, expertise, or unique perspective
- The side-by-side view is the primary output — scores quantify what you can see
Score ranges (from testing)
| Signal | Generic/AI-like | Human/Original |
|---|---|---|
| Semantic | 0.82–0.86 | 0.67–0.80 |
| Vocabulary | 0.10–0.33 | 0.03–0.12 |
| Section Length CV (coefficient of variation) | < 0.20 (uniform) | > 0.40 (varied) |
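For the section length row, CV is the coefficient of variation: the standard deviation of section lengths divided by their mean. A minimal sketch follows, assuming lengths are measured in whitespace-separated words and using the population standard deviation; the tool may count or normalize differently, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def section_length_cv(section_texts):
    """Coefficient of variation of section word counts (std dev / mean)."""
    lengths = [len(text.split()) for text in section_texts]
    if len(lengths) < 2:
        raise ValueError("need at least two sections to measure variation")
    avg = mean(lengths)
    if avg == 0:
        return 0.0
    return pstdev(lengths) / avg


def cv_band(cv):
    """Map a CV value onto the bands from the table above."""
    if cv < 0.20:
        return "uniform"
    if cv > 0.40:
        return "varied"
    return "in between"
```

As an example, two sections of 2 and 10 words have a mean of 6 and a population standard deviation of 4, giving a CV of about 0.67, which lands in the "varied" band.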