This week we dove a little into D3.js (which made manipulating SVG quite a bit easier). I decided to use this very old dataset about extramarital affairs. There were a number of different things in the data, but I looked specifically at how religious people reported themselves to be, and what percentage of people with the same levels of religiousness had cheated on their spouse.
Originally, I had visualized it by the number of people in each category, but since there was a disparity between how many people were in each religiousness group, I thought showing the percentage would be more interesting.
But I also didn’t want to completely lose the information about the sheer number of people in each category. (For example, the “somewhat religious” group was much larger than the “anti religious” group.) So I added an effect so that when you hover over a bar, it changes color and tells you how many people were in that category.
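To make the percentage-versus-count idea concrete, here is a minimal sketch in plain JavaScript of how each group's percentage can be computed while keeping the raw count around for a hover label. This is an illustration, not the code from this project: the field names `religiousness` and `hadAffair`, the function `summarizeByGroup`, and the sample rows are all made up for the example.

```javascript
// Hypothetical sample rows. The field names `religiousness` and `hadAffair`
// are assumptions for illustration, not the real dataset's column names.
const rows = [
  { religiousness: "anti religious", hadAffair: true },
  { religiousness: "anti religious", hadAffair: false },
  { religiousness: "somewhat religious", hadAffair: true },
  { religiousness: "somewhat religious", hadAffair: false },
  { religiousness: "somewhat religious", hadAffair: false },
  { religiousness: "somewhat religious", hadAffair: false },
];

// Group rows by religiousness, counting the total number of people and
// the number who reported an affair in each group.
function summarizeByGroup(rows) {
  const groups = new Map();
  for (const row of rows) {
    const group = groups.get(row.religiousness) || { total: 0, affairs: 0 };
    group.total += 1;
    if (row.hadAffair) group.affairs += 1;
    groups.set(row.religiousness, group);
  }
  // Keep both numbers: `percent` can drive the bar height,
  // `total` can be shown in the hover label.
  return Array.from(groups, ([label, group]) => ({
    label,
    total: group.total,
    percent: (group.affairs / group.total) * 100,
  }));
}

const summary = summarizeByGroup(rows);
console.log(summary);
```

With a summary shaped like this, the bar height can come from `percent` and the hover text from `total`, so the size difference between groups stays visible even though the bars show percentages.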
(The screenshots above remove the cursor for some reason but you get the idea.)
One note is that I realize “cheaters” might not necessarily be an accurate word for what’s going on. There’s some nuance in what exactly “extramarital affair” means, and it doesn’t always mean that someone is going behind the back of their spouse without their knowledge. I’m making some assumptions about this 1969 survey, though – that the people are referring to cheating and not ethical non-monogamy.
Anyway! You can check it out here. Code is also below.
We were also asked to take a look at I Quant NY this week. I looked specifically at this post called “Parking Immunity? Diplomats Owe NYC $16 Million in Unpaid Parking Tickets. A Closer Look at the Worst Offenders.” The writer looked at a dataset showing unpaid parking tickets in NYC, and found something interesting — license plates of diplomats seemed to rack up the highest number of unpaid tickets.
I thought this was a clever insight gleaned from the data, but I am curious to know how Ben manipulated the data to get the info he ended up with. He writes:
“Whats not so cool is the fact that the City loaded many rows of the data in there twice accidentally. That meant there were multiple rows with the same ticket number and conflicting outstanding debt amounts. Though I understand that data errors happen, I don’t understand how the City can keep putting out data sets with no ownership and no effective way to send in fixes. A city who cares about the usability of its Open Data can do better.”
He then says he “cleaned up” the data. But I wish I knew more about what he did to change it, and how he knew for sure that the duplicates were accidents.
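I don’t know what Ben actually did, but one way to at least inspect the problem before “cleaning” anything would be to separate rows that are exact repeats from rows that share a ticket number but disagree on the amount. The sketch below is only a guess at that kind of audit: the field names `ticketNumber` and `amountDue`, the function `auditDuplicates`, and the sample rows are hypothetical and not taken from the City’s dataset.

```javascript
// Hypothetical ticket rows. `ticketNumber` and `amountDue` are assumed
// field names, not the real column names in the City's data.
const tickets = [
  { ticketNumber: "A1", amountDue: 115 },
  { ticketNumber: "A1", amountDue: 115 },
  { ticketNumber: "B2", amountDue: 65 },
  { ticketNumber: "B2", amountDue: 0 },
  { ticketNumber: "C3", amountDue: 45 },
];

// Sort ticket numbers into three buckets: seen once, repeated with the
// same amount every time, or repeated with conflicting amounts.
function auditDuplicates(rows) {
  const amountsByTicket = new Map();
  for (const row of rows) {
    if (!amountsByTicket.has(row.ticketNumber)) {
      amountsByTicket.set(row.ticketNumber, []);
    }
    amountsByTicket.get(row.ticketNumber).push(row.amountDue);
  }

  const report = { unique: [], exactDuplicates: [], conflicting: [] };
  for (const [ticketNumber, amounts] of amountsByTicket) {
    if (amounts.length === 1) {
      report.unique.push(ticketNumber);
    } else if (new Set(amounts).size === 1) {
      report.exactDuplicates.push(ticketNumber);
    } else {
      report.conflicting.push(ticketNumber);
    }
  }
  return report;
}

const report = auditDuplicates(tickets);
console.log(report);
```

Exact repeats are fairly safe to collapse into one row, but the conflicting ones are where a judgment call comes in (keep the highest amount? the lowest? drop the ticket?), and that choice is exactly the part I wish the post had spelled out.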