Fuzzing is fun. But fuzzing is even more fun when you have a solid wordlist to work with. When it comes to hunting down subdomains there are a few lists out there to plug into your fuzzer, but most are small, one-shot affairs. I set out to build a list of popular subdomains which was comprehensive and could be easily kept up-to-date.
For this project I needed to get hold of DNS records. A lot of DNS records. After trying various sources, I settled upon Rapid7's Project Sonar Forward DNS data set, which includes "... regular DNS lookup for all names gathered from the other scan types, such as HTTP data, SSL Certificate names, reverse DNS records, etc". Rapid7's data set uses a really nice mix of real-world sources and is regularly updated. Perfect.
The first challenge was how to handle 1.4 billion (68 GiB) raw DNS records in a reasonable amount of time; my first attempt at processing the data took well over a week to complete on a reasonably beefy server, hardly ideal for updating frequently.