Elevator Pitch

  • The article recounts how a Fediverse server admin uncovered a pipeline in which FBI contractors scraped public posts from decentralized social networks, often using evasive techniques, and funneled them through Facebook into law enforcement investigative tools. The scraping caused headaches for server operators and raises questions about privacy and data misuse.

Key Takeaways

  • Shady data brokers like SocialGist (via BoardReader.com) scrape public Fediverse content, often ignoring admin requests to stop, and disguise their activities using proxies and browser emulation.
  • The scraped data is processed, sometimes poorly, then fed into systems used by Facebook and the FBI, which apply keyword and sentiment analysis to flag posts for investigation. Technical ignorance on the receiving end means the source and context of a post are sometimes misunderstood.
  • Attempts to thwart the scrapers included technical countermeasures and “data poisoning” (feeding fake data). These efforts eventually revealed a law enforcement investigation into a notorious online hoaxer and highlighted systemic issues with law enforcement’s approach to decentralized platforms.

Most Memorable Aspects

  • The FBI mistakenly attributed a threatening post to the wrong server due to BoardReader’s mishandling of Fediverse data, leading to a real-world law enforcement inquiry based on flawed inputs.
  • The admin’s creative technical battle against persistent scrapers included feeding them endless, randomized nonsense to pollute their databases and provoke a human response.
  • The story’s final twist: the FBI’s fervor was actually due to tracking a prolific swatter, not high-profile financial executives, tying together months of perplexing activity.

Direct Quotes

  • “There are scrapers getting data out of fedi without identifying themselves and at least one of them is selling data to the FBI.”
  • “So the pipeline was my terrible awk script generating JSON that represented gibberish posts, and that went out through lighttpd, then nginx, then it left my machine and went into BoardReader's crawlers, from there into their index... and straight out to Facebook, and presumably from there to the FBI, and from there into whatever UI that was that they were using to search.”
  • “If you want data from fedi, just make a fake instance and cram it onto a bunch of relays... at least you don't break anyone else's server, it's easier than scraping, and the data gets delivered to you in real-time.”
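
The data-poisoning step described in the quotes above (a script emitting JSON that represents gibberish posts) can be illustrated with a minimal sketch. The original used an awk script served through lighttpd and nginx; the Python below is a stand-in, not the author's code. The function names and the `id` and `content` fields are hypothetical and do not reflect the JSON shape the article's script actually produced.

```python
import json
import random
import string


def gibberish_post(rng, post_id, words=12):
    """Build one fake post as a dict (field names are hypothetical)."""
    # Each "word" is 3 to 9 random lowercase letters.
    text = " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in range(rng.randint(3, 9)))
        for _ in range(words)
    )
    return {"id": str(post_id), "content": text}


def gibberish_feed(seed, count):
    """Return a JSON array of `count` nonsense posts, reproducible per seed."""
    rng = random.Random(seed)
    return json.dumps([gibberish_post(rng, i) for i in range(count)])


if __name__ == "__main__":
    print(gibberish_feed(seed=1, count=3))
```

Seeding the generator keeps the output reproducible for testing; a real deployment along the lines the article describes would presumably vary the output on every request so the scraper never sees the same page twice.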
