Passphrase wordlist and hashcat rules for offline cracking of long, complex passwords
People think they are getting smarter by using passphrases. Let’s prove them wrong!
This project includes a massive wordlist of phrases (over 20 million) and two hashcat rule files for GPU-based cracking. The rules will create over 1,000 permutations of each phrase.
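To use this project, you need:
- hashcat
- The wordlist (passphrases.txt)
- The two hashcat rule files (passphrase-rule1.rule and passphrase-rule2.rule)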
WORDLIST LAST UPDATED: 2021-10-04
Generally, you will use these with hashcat’s -a 0 mode, which takes a wordlist and allows rule files. It is important to use the rule files in the correct order, as rule #1 mostly handles capital letters and spaces, and rule #2 deals with permutations.
Here is an example for NTLMv2 hashes:

    hashcat -a 0 -m 5600 hashes.txt passphrases.txt -r passphrase-rule1.rule -r passphrase-rule2.rule -O -w 3

If you use the -O option, watch out for what the maximum password length is set to; it may be too short.
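Because -O can lower the maximum candidate length, it may be worth checking how many phrases in a wordlist would exceed a given limit before running a long job. The following is a minimal Python sketch, not part of this project. The limit of 27 is only an assumed placeholder (the real value depends on the hash mode and hashcat version, and hashcat typically prints the kernel's supported maximum at startup), and `count_over_limit` is a hypothetical helper name.

```python
from typing import Iterable

# Assumed placeholder limit for illustration only; check what hashcat
# reports for your hash mode when -O is enabled.
ASSUMED_MAX_LEN = 27


def count_over_limit(phrases: Iterable[str], max_len: int = ASSUMED_MAX_LEN) -> int:
    """Count phrases whose UTF-8 encoded length exceeds max_len bytes."""
    return sum(
        1 for p in phrases
        if len(p.rstrip("\r\n").encode("utf-8")) > max_len
    )


sample = ["take the red pill", "the quick brown fox jumps over the lazy dog"]
print(count_over_limit(sample))
```

To check a real file, pass an open file handle instead of the sample list, since iterating a file yields one line at a time. Keep in mind that the rules can also lengthen a candidate (for example by appending characters), so a phrase just under the limit may still be dropped after mutation.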
Some sources are pulled from a static dataset, like a Kaggle upload. Others I generate myself using various scripts and APIs. I might one day automate that via CI, but for now you can see how I update the dynamic sources here.
| Source file | Type | Description |
|---|---|---|
| wiktionary-2021-09-29.txt | dynamic | Article titles scraped from Wiktionary’s index dump here. |
| wikipedia-2021-09-29.txt | dynamic | Article titles scraped from the Wikipedia |
| urban-dictionary-2021-09-29.txt | dynamic | Urban Dictionary dataset pulled using this script. |
| know-your-meme-2021-09-29.txt | dynamic | Meme titles from Know Your Meme scraped using my tool here. |
| imdb-titles-2021-09-29.txt | dynamic | IMDB dataset using the “primaryTitle” column from |
| global-poi-2021-09-29.txt | dynamic | Global POI dataset using the ‘allCountries’ file from 29-Sept-2021. |
| billboard-titles-2021-10-04.txt | dynamic | Album and track names using Ultimate Music Database, scraped with a fork of mwkling’s tool, modified to grab Billboard Singles (1940-2021) and Billboard Albums (1970-2021) charts. |
| billboard-artists-2021-10-04.txt | dynamic | Artist names using Ultimate Music Database, scraped with a fork of mwkling’s tool, modified to grab Billboard Singles (1940-2021) and Billboard Albums (1970-2021) charts. |
| book.txt | static | Kaggle dataset with titles from over 300,000 books. |
| | static (could be dynamic in future) | Song lyrics for Rolling Stone’s “top 100” artists using my lyric scraping tool. |
| cornell-movie-titles-raw.txt | static | Movie titles from this Cornell project. |
| cornell-movie-lines.txt | static | Movie lines from this Cornell project. |
| author-quotes-raw.txt | static | Quotables dataset on Kaggle. |
| 1800-phrases-raw.txt | static | 1,800 English Phrases. |
| 15k-phrases-raw.txt | static | 15,000 Useful Phrases. |
The rule files are designed to both “shape” the password and to mutate it. Shaping is based on the idea that human beings follow fairly predictable patterns when choosing a password, such as capitalising the first letter of each word and following the phrase with a number or special character. Mutations are also fairly predictable, such as replacing letters with visually-similar special characters.
Given the phrase "take the red pill", the first hashcat rule will output the following:

    take the red pill
    take-the-red-pill
    take.the.red.pill
    take_the_red_pill
    taketheredpill
    Take the red pill
    TAKE THE RED PILL
    tAKE THE RED PILL
    Taketheredpill
    tAKETHEREDPILL
    TAKETHEREDPILL
    Take The Red Pill
    TakeTheRedPill
    Take-The-Red-Pill
    Take.The.Red.Pill
    Take_The_Red_Pill
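To make the "shaping" idea concrete, here is a rough Python approximation of what the first rule file does to a phrase. It is only a sketch for illustration: the real work is done by hashcat rules, not Python. The function name `shape` is hypothetical, and it reproduces only a subset of the candidates listed above (separator swaps, per-word capitalisation, and full upper case).

```python
from typing import List


def shape(phrase: str) -> List[str]:
    """Approximate the shaping step: vary separators and capitalisation."""
    words = phrase.split()
    candidates = []

    # Original words joined with different separators (or none).
    for sep in (" ", "-", ".", "_", ""):
        candidates.append(sep.join(words))

    # Capitalise the first letter of every word, then vary separators again.
    titled = [w.capitalize() for w in words]
    for sep in (" ", "", "-", ".", "_"):
        candidates.append(sep.join(titled))

    # Whole phrase in upper case, with and without spaces.
    candidates.append(phrase.upper())
    candidates.append("".join(words).upper())
    return candidates


for candidate in shape("take the red pill"):
    print(candidate)
```

Expressing these transformations as rules rather than pre-computing them keeps the wordlist small on disk and lets the GPU generate the variants at crack time.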
Adding the second hashcat rule makes things a bit more interesting: it returns a huge list per candidate. Here are a couple of examples:

    T@k3Th3R3dPill!
    T@ke-The-Red-Pill
    taketheredpill2020!
    T0KE THE RED PILL
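The mutation step can be sketched in Python in a similar way. This is an illustration under assumptions, not the contents of the second rule file: the substitution table (a to @, e to 3) and the suffix list are inferred from the examples above, and `mutate` is a hypothetical name.

```python
from typing import List

# Assumed substitutions, inferred from the example candidates only.
LEET = str.maketrans({"a": "@", "e": "3"})

# Assumed suffixes, inferred from the example candidates only.
SUFFIXES = ["", "!", "2020!"]


def mutate(candidate: str) -> List[str]:
    """Return the candidate and its leetspeak form, each with every suffix."""
    variants = []
    for base in (candidate, candidate.translate(LEET)):
        for suffix in SUFFIXES:
            variants.append(base + suffix)
    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(variants))


print(mutate("TakeTheRedPill"))
print(mutate("taketheredpill"))
```

Applied to the shaped candidates "TakeTheRedPill" and "taketheredpill", this yields "T@k3Th3R3dPill!" and "taketheredpill2020!" among its results, matching two of the examples above.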
Optionally, some researchers might be interested in:
- The raw source files mentioned in the table above. You can download them by appending the file name to
- The script I use to clean the raw sources into the wordlist here.
The cleanup script works like this:
    $ python3.6 cleanup.py infile.txt outfile.txt
    Reading from ./infile.txt: 505 MB
    Wrote to ./outfile.txt: 250 MB
    Elapsed time: 0:02:53.062531
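For readers who want a feel for what such a cleanup pass might involve, here is a minimal Python sketch that produces the same style of report. It is not the project's cleanup.py. The cleaning steps shown (collapsing whitespace, lowercasing, dropping blank lines and duplicates) are assumptions chosen for illustration, and `clean` is a hypothetical function name.

```python
import os
import sys
import time
from datetime import timedelta


def clean(in_path: str, out_path: str) -> None:
    """Normalise and de-duplicate phrases, reporting sizes and elapsed time."""
    start = time.monotonic()
    print(f"Reading from {in_path}: {os.path.getsize(in_path) / 1e6:.0f} MB")

    seen = set()
    with open(in_path, encoding="utf-8", errors="ignore") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Assumed cleaning steps: collapse whitespace and lowercase.
            phrase = " ".join(line.split()).lower()
            # Skip blank lines and phrases already written.
            if phrase and phrase not in seen:
                seen.add(phrase)
                dst.write(phrase + "\n")

    print(f"Wrote to {out_path}: {os.path.getsize(out_path) / 1e6:.0f} MB")
    print(f"Elapsed time: {timedelta(seconds=time.monotonic() - start)}")


if __name__ == "__main__" and len(sys.argv) == 3:
    clean(sys.argv[1], sys.argv[2])
```

Lowercasing here would make sense only because the first rule file reintroduces capitalisation later. Note that the in-memory set grows with the number of unique phrases, so a very large source file may need a disk-backed approach such as sorting instead.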