Elevator Pitch

  • A simple server-side method can block poorly behaved LLM crawlers without JavaScript by exploiting their tendency to ignore robots.txt and to follow hidden links.

Key Takeaways

  • Designate a "poisoned" path, disallow it in robots.txt, and serve a special response to any request that arrives without a cookie.
  • Serve hidden links and redirects to trick sloppy crawlers into identifying themselves.
  • Use cookies and specific headers to block or admit each client according to how it behaved.
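
The takeaways above can be sketched as a small request handler. This is only a minimal illustration under assumptions, not the original author's implementation: the path names (`/poison/`, `/verify`), the cookie name `visitor`, and its values `ok` and `blocked` are all hypothetical, and the exact responses and headers used in the source post may differ.

```python
from dataclasses import dataclass, field

# All three names below are hypothetical placeholders, not taken from the source.
POISON_PATH = "/poison/"  # must also be disallowed in robots.txt
VERIFY_PATH = "/verify"   # endpoint that marks a client as admitted
COOKIE = "visitor"        # cookie that records how the client behaved


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def handle(path: str, cookies: dict) -> Response:
    """Decide what to serve based on the path and the client's cookie."""
    state = cookies.get(COOKIE)

    # Anything that fetches the poisoned path ignored robots.txt: mark it.
    if path.startswith(POISON_PATH):
        return Response(403, {"Set-Cookie": f"{COOKIE}=blocked; Path=/"}, "Blocked")

    # A client already marked as blocked stays blocked.
    if state == "blocked":
        return Response(403, body="Blocked")

    # The redirect target admits the client and sends it back to the site.
    if path == VERIFY_PATH:
        return Response(302, {"Set-Cookie": f"{COOKIE}=ok; Path=/", "Location": "/"})

    # Admitted clients get the real content.
    if state == "ok":
        return Response(200, body=f"content for {path}")

    # Cookie-less request: serve a page with a no-JavaScript redirect to the
    # verify endpoint plus a hidden link to the poisoned path.
    page = (
        f'<meta http-equiv="refresh" content="1; url={VERIFY_PATH}">'
        f'<a href="{POISON_PATH}" hidden rel="nofollow">do not follow</a>'
    )
    return Response(200, body=page)
```

In this sketch a browser follows the meta refresh and picks up the admitting cookie, while a client that follows the hidden link is marked as blocked. A real deployment would also need a cookie that clients cannot simply forge, which this illustration does not attempt.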

Most Memorable Quotes

  • "Name a poisoned path on your website, and disallow crawling of this path in robots.txt."
  • "Well-behaved search engine crawlers will respect robots.txt and avoid this link, while slop-crawlers that don’t read robots.txt might follow it."
  • "This scheme is obviously not flawless, but in practice it seems to work quite well, and at least it has no false positives."
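
To illustrate why a well-behaved crawler never reaches the poisoned path, the following sketch parses a sample robots.txt with Python's standard `urllib.robotparser` and asks whether two URLs may be fetched. The path `/poison/`, the host `example.com`, and the user agent `ExampleBot` are assumptions made for the example, not values from the source.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the poisoned path for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /poison/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if robots.txt allows this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


# A crawler that honors robots.txt skips the trap but can crawl normal pages.
print(may_fetch("https://example.com/blog/"))        # allowed
print(may_fetch("https://example.com/poison/trap"))  # disallowed
```

A crawler that performs this check never requests the poisoned path, so only clients that skip robots.txt can be caught by it.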

Original: 295 words. Summary: 138 words.