It started when I first saw ChatGPT as a traffic source in Google Analytics.
At the time, I was not intentionally optimizing for AI, so it was an interesting novelty, and certainly an indicator of what was to come.
After some digging, I noticed that a few product pages were the landing pages for this new traffic. The query strings came in empty, so I couldn't see what people had asked ChatGPT before landing there.
The Boring Pages Won
It was a B2B website for HVAC parts. The product catalog was very technical: long pages with lists and bullets. No UGC, no fresh content. Just boring lists of industrial parts and products. Detailed technical specs, part sizes, dimensions.
Nothing you'd put in a content marketing case study.
If you think about it, with schema tags, structured data, and JSON-LD in place, it is not hard to see why AI crawlers would pick up that type of content. It was as if ChatGPT had been caught in my net of tags and structured markup. Those pages were easy for machines to parse because they were factual, organized, and unambiguous. AI "sees" extractable structure more than creativity or marketing polish.
For many commercial and technical searches, the best-performing AI-source content may be the most boring page on the site. The question is whether your competitors have figured that out yet.
What I Changed After That
I had no idea what RAG, AEO, or LLM Visibility meant at the time. I just knew the machines were picking the pages where the answer was already prepared.
From that moment on, I would always try to sneak in blog posts, technical charts, tables, bullet lists. Then I worked to improve the lists that were already getting AI traffic:
- Clearer hierarchy, consistent H1/H2 sections
- Standardized spec tables
- Compatibility blocks: dimensions, materials, colors
- Installation notes and use cases
AI systems seem to cite pages where the answer is already prepared.
Another thing I started doing: mixing technical content with normal text. I'd add content above and below the specs. A short plain-English summary at the top: what this part does, where it's used, who it's for, then detailed specs, then FAQs, comparisons, troubleshooting, and related models. That helps both humans and AI systems understand context.
All that content would feed very detailed schema tags and structured data wherever possible: Product, FAQ, HowTo, Organization, Breadcrumbs, availability, identifiers. The easier it is to classify, the easier it is to retrieve.
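As a minimal illustration of the Product markup described above, here is a JSON-LD sketch. The part name, SKU, brand, and values are all made up for the example; an FAQPage block would follow the same pattern:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Condenser Fan Motor, 1/4 HP",
  "sku": "CFM-1425",
  "brand": { "@type": "Brand", "name": "ExampleHVAC" },
  "additionalProperty": [
    { "@type": "PropertyValue", "name": "Voltage", "value": "208-230V" },
    { "@type": "PropertyValue", "name": "Speed", "value": "1075 RPM" },
    { "@type": "PropertyValue", "name": "Shaft Diameter", "value": "1/2 in" }
  ],
  "offers": {
    "@type": "Offer",
    "price": "89.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```

The point is not the specific fields but that every spec already on the page gets a machine-readable counterpart.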
The Attribution Problem
Initially, I was tracking it in Google Analytics as traffic from known AI platforms: ChatGPT, Copilot, Perplexity. I could see it growing, but I was not certain whether the growth was due to my recent updates or to the fact that these AI platforms were launching and growing at the same time.
There was no clean control group. Query strings arrived empty. Attribution was muddy.
Only after reading more on the subject and going deeper did I start to see other possible KPIs and metrics to follow. In fact, I even created my own AI Visibility tool to help me: (a) understand it, and (b) start measuring, monitoring, and comparing pages.
It gives me crawler access checks, an AEO score, LLM visibility, GEO information, and some technical signals. Users paste any URL to see what AI crawlers can fetch, how well the page is structured for AI citation, and how LLMs answer queries on that topic, plus AEO scoring, GEO recommendations, sitemap discovery, and a robots.txt builder.
What an AEO Audit Actually Shows
Here is what a real audit looks like on an underperforming page: 1,404 words of content, one H2 heading, two H3s, no FAQ section, no direct answer sentence at the top.
| Signal | Value | Assessment |
|---|---|---|
| Retrieval Readiness | 62/100 | Structurally weak |
| AEO Score | 42/100, Grade D | Poor extraction probability |
| H2 headings | 1 for 1,404 words | Far too flat |
| Words per H2 section | 1,404 | Unparseable block |
| FAQ section | None | Missing |
| Answer block | Missing | No entry point |
| Lists / tables | Present | One strength |
| Structured data | Multiple types | Another strength |
| JS render risk | Low | Accessible |
The gap between Retrieval Readiness (62) and AEO Score (42) on the same page tells you something. A page can have decent raw content (lists, low render risk, good word count) and still score poorly because the structure isn't giving AI systems clean entry points to extract from.
What the audit surfaced as high-priority fixes:
- Add a direct answer sentence in the first paragraph. AI systems seem to prefer pages that define the topic upfront.
- Add a comprehensive heading structure. One H2 for 1,404 words averages 1,404 words per section, far too dense for AI parsing.
- Add an FAQ section. AI systems appear to favor FAQ content for specific questions.
- Break long sections into more specific subsections with descriptive headings.
These are structural fixes, not content rewrites. The words were already there.
Who Is and Isn't Prepared
I worked inside large organizations long enough to know how content decisions actually get made. Homepage layout was decided by a committee of editorial, upper management, and IT. Marketing was usually not in the room.
Nobody was asking whether a crawler could read the page, because nobody was thinking about crawlers beyond Google. AI visibility wasn't on the radar then. For most of those organizations, it still isn't. The decision-making structure hasn't changed. The crawlers have.
I ran Amazon through the same tool: a standard product page, a bamboo cutting board set. Score: 43 out of 100.
Not because the page is poorly written. Amazon product pages have reasonable structure, decent word count, lists, and tables. The problem is their robots.txt. Many major AI crawlers are blocked. (as of 04/17/2026)
| AI Crawler | Amazon Access |
|---|---|
| GPTBot | Blocked |
| ChatGPT-User | Blocked |
| ClaudeBot | Blocked |
| Google-Extended | Blocked |
| PerplexityBot | Blocked |
| CCBot | Blocked |
| Bytespider | Blocked |
This is prioritization, not ignorance. Amazon has its own AI strategy and its own data. They don't need ChatGPT sending traffic to product pages they'd rather control directly. Fair enough.
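For a site that wants the opposite outcome, the corresponding robots.txt rules are simple to express. The user-agent strings below are the ones these vendors publish, but they change; verify the current names against each vendor's crawler documentation:

```
# Allow major AI crawlers (verify current UA strings with each vendor)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /
```

Note that Google-Extended controls AI training use, not regular Search indexing, so allowing or blocking it is a separate decision from SEO.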
But Amazon's decision reveals something about everyone else.
The companies most at risk are not the giants. The giants are either making deliberate choices or are already embedded deeply enough in LLM training data that brand-level citations happen regardless. Their brand authority predates the optimization problem.
The vulnerable ones sit in the middle: companies with real content, real web presence, and real marketing budgets, but no visibility into whether AI systems can actually read their pages. Big enough to have slow, committee-built websites. Not big enough to have brand authority baked into model training data.
Inside those organizations, Legal owned content review, IT owned the CMS, and nobody had mapped what the crawlers could and couldn't reach. For most organizations, that's still true.
And the companies that move first on structure will be the ones that own those citations when the market catches up.
For smaller businesses, the problem is often more fundamental: thin content, no schema, aging WordPress installations. AEO strategy is premature when the foundation isn't there.
The companies that should be paying attention now are the ones in between.
What to Measure If You're Starting From Zero
| Signal | What to Check | Good | Poor |
|---|---|---|---|
| Crawler access | robots.txt allows AI bots | All major crawlers allowed | Any blocked |
| Heading structure | H2 count vs word count | Recommended: 1 H2 per 300-400 words | 1 H2 per 1,400+ words |
| Answer block | Direct answer in first paragraph | Present and specific | Missing or buried |
| FAQ section | Structured Q&A on page | Present | Absent |
| Structured data | JSON-LD schema types present | Product, FAQ, HowTo | None detected |
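Some of these checks can be sketched in a few lines. The script below is an illustration using only Python's standard library; the 400-words-per-H2 threshold comes from the table above, not from any official standard:

```python
from html.parser import HTMLParser
import re

class StructureAudit(HTMLParser):
    """Collects H2 counts and visible text from raw HTML."""
    def __init__(self):
        super().__init__()
        self.h2_count = 0
        self.text_parts = []
        self._skip = False  # ignore script/style content

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.h2_count += 1
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)

def audit(html: str) -> dict:
    parser = StructureAudit()
    parser.feed(html)
    words = len(re.findall(r"\w+", " ".join(parser.text_parts)))
    per_section = words / max(parser.h2_count, 1)
    return {
        "words": words,
        "h2_count": parser.h2_count,
        "words_per_h2": round(per_section),
        # rough threshold from the table: ~300-400 words per H2
        "too_flat": per_section > 400,
        "has_faq": "faq" in html.lower(),
    }
```

A real audit would also check robots.txt, JSON-LD types, and whether the first paragraph contains a direct answer, but even this much catches the "one H2 for 1,404 words" problem automatically.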
None of these require a new content strategy. Most can be addressed by restructuring pages that already exist.
The content is often there. The structure usually isn't. Every week without these fixes is a potential missed opportunity for AI citations.
The Underlying Pattern
Most updates that improved AI citation probability on those HVAC pages also improved the page for human readers. Clearer structure, faster access to the answer, less ambiguity about what the page covers.
Unlike Google and its elusive “Content Quality” guidelines, AI systems seem less focused on reading your content for its quality. They scan it for structure, specificity, and extractability. Pages that prepare the answer in advance get cited. Pages that make the reader work to find it don't.
Structure precedes everything else.