Elevator Pitch
- A simple server-side method can block poorly behaved LLM crawlers without using JavaScript by exploiting their tendency to ignore robots.txt and mishandle hidden links.
Key Takeaways
- Designate a "poisoned" path disallowed in robots.txt and serve special responses to cookie-less requests.
- Serve hidden links and redirects to trick sloppy crawlers into identifying themselves.
- Use cookies and specific headers to block or permit access based on crawler behavior.
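The takeaways above can be sketched as a single request-handling function. This is a minimal illustration, not the original author's implementation: the path `/poisoned/`, the cookie name `visitor`, its values `ok` and `crawler`, and the function name `handle` are all assumptions made up for this example.

```python
from http.cookies import SimpleCookie

# Hypothetical names chosen for illustration only.
POISONED_PATH = "/poisoned/"  # should match a Disallow rule in robots.txt
COOKIE_NAME = "visitor"

# Page served to cookie-less clients: it reloads itself after one second
# and carries a hidden link that only a careless crawler would follow.
INTERSTITIAL = (
    "<!doctype html><html><head>"
    '<meta http-equiv="refresh" content="1">'
    "</head><body>"
    f'<a href="{POISONED_PATH}" rel="nofollow" hidden>do not follow</a>'
    "</body></html>"
)


def handle(path, cookie_header):
    """Return (status, headers, body) for one incoming request."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(COOKIE_NAME)
    mark = morsel.value if morsel else None

    # Anyone requesting the poisoned path ignored robots.txt and the
    # hidden attribute, so mark the client as a crawler and refuse.
    if path.startswith(POISONED_PATH):
        headers = {"Set-Cookie": f"{COOKIE_NAME}=crawler; Path=/"}
        return 403, headers, "Forbidden"

    # A client previously marked as a crawler stays blocked.
    if mark == "crawler":
        return 403, {}, "Forbidden"

    # A client that came back with the "ok" cookie gets the real page.
    if mark == "ok":
        return 200, {}, f"real content for {path}"

    # No cookie yet: set one and serve the interstitial with the trap link.
    headers = {"Set-Cookie": f"{COOKIE_NAME}=ok; Path=/"}
    return 200, headers, INTERSTITIAL
```

In this sketch a browser stores the cookie, reloads, and sees the real page, while a client that follows the hidden link overwrites its cookie and is refused from then on. A client that never stores cookies keeps receiving the interstitial.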
Most Memorable Quotes
- "Name a poisoned path on your website, and disallow crawling of this path in robots.txt."
- "Well-behaved search engine crawlers will respect robots.txt and avoid this link, while slop-crawlers that don’t read robots.txt might follow it."
- "This scheme is obviously not flawless, but in practice it seems to work quite well, and at least it has no false positives."
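The first two quotes can be illustrated with Python's standard `urllib.robotparser`, which evaluates rules the way a compliant crawler would. The path `/poisoned/`, the user agent `ExampleBot`, and the `example.com` URLs are placeholders assumed for this sketch, not values from the original post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the poisoned path for everyone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /poisoned/
"""


def may_fetch(user_agent, url):
    """Return True if a robots.txt-respecting crawler may request url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A compliant crawler skips the poisoned path but may crawl other pages.
print(may_fetch("ExampleBot", "https://example.com/poisoned/"))
print(may_fetch("ExampleBot", "https://example.com/articles/"))
```

A crawler that never performs this check has no reason to avoid the hidden link, which is what lets the server tell the two kinds of client apart.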
Source URL • Original: 295 words • Summary: 138 words