I intended to publish a comprehensive analysis of Hillsborough County, FL (and I may eventually do that). However, I went down a micro rabbit hole, so for now I want to focus on a data integrity discussion rather than analyzing the highest volume of data possible.
First, I ran the entire county through the EDA Excel file. Then I ran a HASH of [LAST]+[REG DATE]+[BIRTH DATE]. This HASH yielded 60 records (30 pairs) that share a HASH value with another record. Let’s break that down further.
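For anyone who wants to try the matching step outside Excel, here is a minimal Python sketch of the idea. The field names (`last`, `reg_date`, `birth_date`), the helper names, and the sample rows are all made up for illustration. This is not the formula inside the EDA Excel file; it only shows one way to group records that share the same composite key.

```python
import hashlib
from collections import defaultdict

def composite_key(record, fields):
    """Hash the chosen fields together (field names are hypothetical)."""
    # Normalize case and whitespace so "Smith" and "SMITH " produce the same key.
    raw = "|".join(str(record[f]).strip().upper() for f in fields)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def find_matches(records, fields):
    """Return only the groups where two or more records share a key."""
    groups = defaultdict(list)
    for rec in records:
        groups[composite_key(rec, fields)].append(rec)
    return [grp for grp in groups.values() if len(grp) > 1]

# Made-up sample rows, not real voter data.
sample = [
    {"id": 1, "last": "Smith", "reg_date": "2022-03-01", "birth_date": "2005-07-04"},
    {"id": 2, "last": "SMITH", "reg_date": "2022-03-01", "birth_date": "2005-07-04"},
    {"id": 3, "last": "Jones", "reg_date": "2022-03-01", "birth_date": "2005-07-04"},
]

matches = find_matches(sample, ["last", "reg_date", "birth_date"])
```

In this sample, records 1 and 2 land in the same group and record 3 stands alone. Passing a longer field list (for example, one that includes an address column) makes the key stricter without changing the rest of the code.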
Here you can see the distribution of registration dates, birth dates and age. Take note of the age. This HASH group is decidedly on the young side. This graph explains it better. The bottom axis is the age of the person when they registered.
Of the people who match this HASH group for the entire county, 80% of them are less than 20 years old. 47% are less than 18 years old.
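The percentages above depend on each person's age on the day they registered. As a rough sketch of that arithmetic in Python (the function names and the date pairs are invented for illustration, and I am assuming age is counted in whole years):

```python
from datetime import date

def age_at_registration(birth, registered):
    """Whole years between the birth date and the registration date."""
    years = registered.year - birth.year
    # Subtract one if the birthday had not yet occurred in the registration year.
    if (registered.month, registered.day) < (birth.month, birth.day):
        years -= 1
    return years

def share_under(ages, limit):
    """Percentage of ages strictly below `limit`."""
    return 100.0 * sum(1 for a in ages if a < limit) / len(ages)

# Made-up (birth date, registration date) pairs, not real voter data.
pairs = [
    (date(2006, 5, 10), date(2022, 6, 1)),
    (date(2004, 9, 30), date(2022, 6, 1)),
    (date(2003, 1, 15), date(2022, 6, 1)),
    (date(1980, 2, 20), date(2022, 6, 1)),
]
ages = [age_at_registration(b, r) for b, r in pairs]
```

Calling `share_under(ages, 18)` and `share_under(ages, 20)` on the real HASH group is how figures like the two percentages above would be produced.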
Wait, what? Less than 18 years old? Yes, apparently in FL we can “inflate” the voter rolls with 16-year-olds while they sit on the rolls waiting to vote in two years (or be voted FOR perhaps). This young age group ends up filling up this HASH query consistently.
To further prove the point, let’s look at the profile of all age groups in Hillsborough County when they “decide” to register to vote. The bottom axis is age.
Look at the huge surge in “responsible” 16- to 18-year-olds who are socially responsible enough to register at rates far higher than any other age bracket. WOW!
Getting back to the micro analysis of the HASH group...
Let’s delve into more details. As you read, ask yourself whether all of this sounds like “real life” to you. If it doesn’t, what is it then? Bad data? Fraud? Honest mistakes?
Within the group of 60 HASH matches, there is a subgroup of 48 (24 pairs) where the ADDRESS also matches. So the HASH that would find these records would be [ADDRESS]+[LAST]+[REG DATE]+[BIRTH DATE]. Remarkable. Within this group you will find the following:
Of those 24 pairs, 19 living at the same address are also the same GENDER. The other 5 pairs are a different gender. Roughly 80% (19 of 24) identical twins?
Within that subset of 19, 2 pairs have phone numbers that are identical except for the last digit, and in each pair that last digit is exactly one higher than in the other number.
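A check like this is easy to automate across every pair. Below is a small Python sketch, with hypothetical function names and made-up phone numbers, that flags two numbers when everything but the last digit matches and the last digits are one apart. Note the assumption that a 9 followed by a 0 does not count as "one higher".

```python
def digits_only(phone):
    """Strip punctuation and spaces so differently formatted numbers compare equal."""
    return "".join(ch for ch in phone if ch.isdigit())

def sequential_last_digit(phone_a, phone_b):
    """True if the numbers match except for last digits that differ by exactly one."""
    a, b = digits_only(phone_a), digits_only(phone_b)
    if len(a) == 0 or len(a) != len(b):
        return False
    if a[:-1] != b[:-1]:
        return False
    # Assumption: no wraparound, so a last digit of 9 and 0 is not flagged.
    return abs(int(a[-1]) - int(b[-1])) == 1

# Made-up numbers, not real voter data.
flag_close = sequential_last_digit("(863) 555-0101", "863-555-0102")
flag_far = sequential_last_digit("8635550101", "8635550103")
flag_same = sequential_last_digit("8635550101", "8635550101")
```

Only the first pair is flagged here. The second differs by two in the last digit, and the third is an exact duplicate.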
Finally, this group of 60 HASH matches occurs in only three cities and, wait for it... they are all immediately adjacent to one another.
I don’t have a rational explanation for why people in those three cities “behave” this way.
Perhaps it is worth a deeper dive specifically into the rolls associated with these remarkable cities? Avon Park, Lake Placid, Sebring.