Digit Digger is an intelligent piece of data mining software I developed in anticipation of the emerging SMS marketing boom. Unlike email marketing, which yields only a 4% open rate, SMS text messages are opened by 97% of all recipients. This makes SMS texting one of the greatest marketing mediums of all time. If you haven’t seen a marketing message hit your cellphone in the form of an SMS text message, rest assured you will.
Digit Digger mines cell phone numbers from Craigslist ads, allowing the user of the software to target the demographics that most closely resemble the people they are trying to market their products or services to.
For example, if you sold women’s dresses, it could be assumed that your target demographic would be women (I am using this example for simplicity’s sake). With this in mind, you would set up Digit Digger to mine all the categories, in any and all cities on Craigslist, that relate to your target demographic. These categories would likely include Beauty & Health, Baby & Kids, Jewelry and so on.
De-Masking Algorithms Capture More Results
Digit Digger then systematically goes through each specified city and category, searching the body of each Craigslist ad for a 10-digit telephone number. It also has a built-in algorithm to de-mask numbers that the author has disguised to avoid being harvested. Examples of these disguised numbers are shown below.
- 360903711eight
- (360)-nine3 seven-7118
- three six zero nine zero three seven one one eight
Unlimited Possibilities
The possibilities are endless. With Digit Digger, you can create locally targeted lists for local businesses, targeted lists for select products, and more. By broadly mining Craigslist, you can create lists based on area code, state, city, and so on.
Windows Azure Services | Database Storage in the Cloud
Harvesting this much data is one thing; managing it is another. Early builds just wrote each campaign’s results to a CSV file, but it quickly became apparent that to properly manage and sort the data, a real database solution would have to be integrated.
Digit Digger communicates with a hosted Windows Azure SQL database server to store its results in the cloud. Each record is stored across several columns, including phone number, area code, state, city, post title, post link and date of entry. Using basic Transact-SQL statements, you can query the database through the Windows Azure customer portal or through Microsoft’s SQL Server Management Studio to pull and/or export the queried results.
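To make the record layout above a little more concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module as a local stand-in for the Azure SQL database, purely so the example runs anywhere. The table name, column names and helper functions are hypothetical and are not taken from Digit Digger itself, and the sample rows use fictional 555 numbers.

```python
import sqlite3

# Hypothetical schema: one column per field listed in the post.
# The phone number is the primary key, so duplicates are rejected.
SCHEMA = """
CREATE TABLE IF NOT EXISTS records (
    phone_number TEXT PRIMARY KEY,
    area_code    TEXT NOT NULL,
    state        TEXT,
    city         TEXT,
    post_title   TEXT,
    post_link    TEXT,
    date_entered TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_record(conn, phone, state, city, title, link, entered):
    """Insert one record; return True if it was new, False if a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO records VALUES (?, ?, ?, ?, ?, ?, ?)",
        (phone, phone[:3], state, city, title, link, entered),
    )
    conn.commit()
    return cur.rowcount == 1


def count_by_area_code(conn):
    """Return a dict mapping each area code to its number of records."""
    rows = conn.execute(
        "SELECT area_code, COUNT(*) FROM records "
        "GROUP BY area_code ORDER BY area_code"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    store = open_store()
    add_record(store, "2065550100", "WA", "Seattle", "Sample post",
               "http://example.com/1", "2013-01-01")
    add_record(store, "2065550100", "WA", "Seattle", "Sample repost",
               "http://example.com/2", "2013-01-02")  # ignored as duplicate
    print(count_by_area_code(store))
```

Making the phone number the key is one simple way to keep only new and unique numbers; the real database may well handle duplicates differently.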
Production Results
Current benchmarks have only been done on one machine. A five-year-old laptop running Windows Vista with 4 GB of RAM and a 1.7 GHz processor will run 8 campaigns at full throttle, consuming 98% of the CPU. Running 16 campaigns, and harvesting only the first page of results, you can successfully cover every city and category on Craigslist (United States only) in 24 hours. That’s a total of approximately 1.5 million ads viewed and analyzed, returning a yield of about 60-80 thousand new and unique numbers that are written to the database. First runs will yield upwards of 100k, but afterwards you get more duplicates, as people tend to top post on a regular basis.
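For a sense of scale, the figures above can be turned into rates with a little arithmetic. The short Python sketch below does only that; the constant and function names are my own, and the inputs are the approximate numbers quoted in the paragraph above, so treat the results as ballpark values.

```python
# Approximate figures quoted above for one 24-hour pass.
ADS_PER_DAY = 1_500_000
CAMPAIGNS = 16
UNIQUE_LOW, UNIQUE_HIGH = 60_000, 80_000
SECONDS_PER_DAY = 24 * 60 * 60


def ads_per_second(ads, seconds=SECONDS_PER_DAY):
    """Average number of ads processed per second."""
    return ads / seconds


def yield_rate(unique_numbers, ads):
    """Fraction of viewed ads that produced a new, unique number."""
    return unique_numbers / ads


if __name__ == "__main__":
    total_rate = ads_per_second(ADS_PER_DAY)
    print(f"Overall: {total_rate:.1f} ads per second")
    print(f"Per campaign: {total_rate / CAMPAIGNS:.2f} ads per second")
    print(f"Yield: {yield_rate(UNIQUE_LOW, ADS_PER_DAY):.1%} "
          f"to {yield_rate(UNIQUE_HIGH, ADS_PER_DAY):.1%}")
```

That works out to roughly 17 ads per second across all 16 campaigns, or about one ad per second per campaign, with around 4% to 5% of the ads viewed producing a new number.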
Below is a video demonstration of the Digit Digger program.