The fastest search, or: find your password
A couple of weeks ago, a monstrous set of login data was passed around in the dark web. Now known as collections #1 – #5, the files contain an estimated 2.2 billion of unique email address/password combinations. And that means cracked passwords, mind you, not only hashes.
Some of my account data have been leaked ages ago by Adobe and Dropbox (at that time, I believed in simple – 8-digit – passwords for test accounts). Incidentally, I had already deleted these accounts when the breach became public.
Currently, all of my accounts are secured by an at least 25-digit random password with an actual entropy not lower than 100 bits, which is essentially impossible to brute force even from unsalted SHA1 hashes. Hence, none of my accounts should show up in the collections mentioned above. If one did, it would show that the password had been saved in plain text, and I would react correspondingly by deleting this account.
The most comprehensive list of password hashes (including collection #1) has been assembled by Troy Hunt. We can download and search this list very easily as shown in the following.
$ cd /hdd/Downloads/pwned $ wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v4.7z $ cp pwned-passwords-sha1-ordered-by-hash-v4.7z /ssd/temp/pwned $ cd /ssd/temp/pwned/ $ unp ppwned-passwords-sha1-ordered-by-hash-v4.7z $ wc -l <pwned-passwords-sha1-ordered-by-hash-v4.txt 551509767 $ awk -F: '{ SUM += $2 } END { print SUM }' pwned-passwords-sha1-ordered-by-hash-v4.txt 3344070078
551 million passwords, and 3.344 billion accounts: holy cow. What's the fastest way to search for a single string in such a huge file? C't 5/2019 has actually an entire article on this issue, complete with a github repository.
Pina Merkert points out that grep, the standard tool for searching text files under Linux, does not exploit the fact that Troy Hunt offers the download also in a version where the passwords are ordered by hash (that's the version I've downloaded above). The linear search performed by grep is indeed rather slow:
$ time echo -n "111111" | sha1sum | awk '{print toupper($1)}' | grep -f - pwned-passwords-sha1-ordered-by-hash-v4.txt 3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220 real 0m48.227s user 0m30.939s sys 0m5.752s
The possibility to order the data allows a binary search to be performed, which is potentially orders of magnitude faster – O(log[n]) vs. O(n) – than a linear search. Pina's python script is indeed dramatically faster:
$ time ./binary_search.py '111111' Searching for hash 3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D of password "111111". Password found at byte 5824044773: "3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220" Your password "111111" was in 3093220 leaks or hacked databases! Please change it immediately. real 0m0.042s user 0m0.035s sys 0m0.007s
But my one-liner is four times faster than her 57-line script: 😉
$ time look $(echo -n "111111" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt 3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220 real 0m0.011s user 0m0.009s sys 0m0.005s
'Look', by the way, is a binary search tool, part of the util-linux package and thus present on any Linux installation (and also on the Windows subsystem for Linux, I guess). Unfortunately, several distributions (such as Debian and Ubuntu) manage to ship a broken version of look since about 10 years. There's an unofficial patched version available on GitHub if you'd like to try.
With the help of 'look', one can also very easily go through lists of passwords – let's take these completely arbitrary examples:
less topten.dat 123456 123456789 qwerty password 111111 12345678 abc123 1234567 password1 12345 $ time while read -r password; do look $(echo -n "$password" | sha1sum | awk '{print toupper($1)}') pwned-passwords-sha1-ordered-by-hash-v4.txt; done <topten.dat 7C4A8D09CA3762AF61E59520943DC26494F8941B:23174662 F7C3BC1D808E04732ADF679965CCC34CA7AE3441:7671364 B1B3773A05C0ED0176787A4F1574FF0075F7521E:3810555 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804 3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D:3093220 7C222FB2927D828AF22F592134E8932480637C0D:2889079 6367C48DD193D56EA7B0BAAD25B19455E529F5EE:2834058 20EABE5D64B0E216796E834F52D61FD0B70332FC:2484157 E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D:2401761 8CB2237D0679CA88DB6464EAC60DA96345513964:2333232 real 0m0.053s user 0m0.048s sys 0m0.023s
If you ever had a shred of doubt that humanity has a glorious future, look at these 10 ingenious passwords, which secure the 54 million accounts of our most brilliant minds!