A few years ago, I wrote a short series of posts about things I found on GitHub, looking at commands from shell history files, common pipe chains, and words from custom spell-check dictionaries. At the same time, I also released a tiny tool (more of a hack, really) that let you grab files from GitHub en masse, using a GitHub code search query.

Pulling the data for these posts using the GitHub search API used to be rather easy: I'd simply search for the exact filename I wanted, like so:

path:.bash_history

I was interested to see that queries of this sort recently stopped working: GitHub now requires you to specify both a search term and a path. There are all sorts of possible explanations for this change, but my guess is that it might be to prevent exactly the kind of trawling I've been amusing myself with. It used to be all too easy to search for "path:id_rsa" and turn up slip-ups with real consequences.
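To make the new shape of the query concrete, here is a minimal sketch that builds a code search URL combining a term with a path qualifier. It only constructs the URL and makes no request. The function name build_code_search_url and the example term "sudo" are my own illustrations, and I'm assuming the search API accepts the same term-plus-path form as the web search; a real request would also need authentication.

```python
from urllib.parse import urlencode

# Public REST endpoint for GitHub code search.
SEARCH_URL = "https://api.github.com/search/code"


def build_code_search_url(term, path, page=1):
    """Build a code search URL from a search term and a path qualifier.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    if not term.strip():
        # A bare "path:..." query is the form that stopped working,
        # so refuse to build one without a search term.
        raise ValueError("a search term is required alongside the path")
    query = f"{term} path:{path}"
    # urlencode percent-encodes the colon and turns the space into "+".
    return f"{SEARCH_URL}?{urlencode({'q': query, 'page': page})}"


if __name__ == "__main__":
    # "sudo" is an arbitrary example term, not one taken from the posts.
    print(build_code_search_url("sudo", ".bash_history"))
```

The obvious cost is that every query now needs a term, so pulling all files with a given name means running many searches and merging the results.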

What to do about this

I pulled the data I discuss above last week, and spent a day or so wondering what to do about it. One could, for instance, write a script to lodge a ticket against each affected repo, noting that a browser profile was checked in and that this could be dangerous. This would annoy a lot of people mightily, since many of the checked-in profiles were put there intentionally by people who understand the issues.