Last week the Georgia Department of Public Health posted a chart alongside its COVID Daily Status Report, intended to show the most affected counties in the state by number of cases over the prior fifteen days. Unfortunately, the chart was designed so that its initial view did not give viewers information they could easily interpret.
Alberto Cairo, author of How Charts Lie, examined the problems with the chart as initially presented in a thorough blog post. In short, the bars were not sorted chronologically, which kept viewers from seeing the trends in the data. After the problem was pointed out, the Georgia Department of Public Health reposted the visualization with a more straightforward design.
Because this example shows how easily chart design can change what viewers understand, the data seemed ripe for exploration with Knarr. For our first-ever Data Diary, Knarr CTO Speros Kokenes pulled the closest approximation of the Georgia data he could find, since the exact data behind the original chart was not publicly available. He used data from USAfacts.org, which reports the daily number of new COVID cases by county.
To begin, Speros dragged and dropped the CSV file containing daily new COVID cases for every county in the United States into Knarr. He then narrowed the data to the five counties in the original Georgia chart and to the 15-day period in question. Next, he drew a chart of those dates and added the "New Confirmed Cases" dimension for easy comparison. This produced a simple bar chart of the total number of cases per day across the five counties. To reproduce the original chart, he then broke those totals down by county by adding county name as a dimension. Because this chart was already sorted chronologically, the result showed the original chart's data in a form that was much easier to interpret.
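Outside a GUI tool, the same filter-then-aggregate steps can be sketched in a few lines of Python. This is a minimal sketch, not how Knarr works internally; it assumes the data has been reshaped into hypothetical long-format records of `(county, date, new_cases)`, which may not match the actual USAfacts CSV layout. Sorting the results by date keeps the output in chronological order, which is the property the original chart lacked.

```python
from collections import defaultdict
from datetime import date, timedelta


def window_totals(records, counties, start, days=15):
    """Filter to the chosen counties and date window, then aggregate.

    records: iterable of (county, date, new_cases) tuples; this long
    format is an assumption made for illustration.
    Returns (totals per day, totals per (day, county)), both sorted by date.
    """
    end = start + timedelta(days=days - 1)  # inclusive end of the window
    by_day = defaultdict(int)               # overall bars: one total per day
    by_day_county = defaultdict(int)        # breakdown: one value per (day, county)
    for county, day, new_cases in records:
        if county in counties and start <= day <= end:
            by_day[day] += new_cases
            by_day_county[(day, county)] += new_cases
    # Sort chronologically (then by county name within a day).
    return dict(sorted(by_day.items())), dict(sorted(by_day_county.items()))
```

Calling `window_totals(records, {"County A", "County B"}, date(2020, 5, 1))` with hypothetical county names returns the daily totals for the simple bar chart and the per-county breakdown used to split each bar.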
From this version, Speros re-creates a chart similar to the one Alberto offered, with individual bars for each day. In Notes, he takes a Snapshot of this interpretation and saves it, then moves on to build different views of the same information. There are usually many ways to look at the same data, and the next interpretation demonstrates just that: grouping the daily numbers by county and viewing the groups side by side makes it easy to compare daily trends across the five counties.
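Viewing counties side by side amounts to pivoting the data into one aligned daily series per county. The sketch below is a hypothetical illustration, not Knarr's implementation; it assumes long-format `(county, date, new_cases)` records and fills days with no reported cases with zero so every county's series covers the same dates.

```python
from datetime import date, timedelta


def series_by_county(records, counties, start, days=15):
    """Build one daily series per county over a shared date range.

    records: iterable of (county, date, new_cases) tuples (assumed format).
    Returns (dates, {county: [count per date]}); missing days count as 0.
    """
    dates = [start + timedelta(days=i) for i in range(days)]
    index = {d: i for i, d in enumerate(dates)}       # date -> position in series
    series = {c: [0] * days for c in sorted(counties)}
    for county, day, new_cases in records:
        i = index.get(day)
        if county in series and i is not None:         # skip other counties/dates
            series[county][i] += new_cases
    return dates, series
```

Aligning every series to the same date list is what makes the per-county panels directly comparable: the same position always refers to the same day.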
Next, Speros uses Knarr to examine longer-term trends by viewing these counties' cases across all available dates, not just the 15-day period. For anyone who likes exploring large data sets, this is the fun part: he then looks at the same data for every county in the US.
In the next segment, Speros adds a US Census dataset of population estimates by county. This lets a user view cases per 100,000 residents, which puts the data in a whole new perspective. In this view, the rising number of cases in Georgia's Hall County stands out, because its population is relatively small compared with the other counties in this assessment.
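The normalization itself is simple arithmetic: rate = cases × 100,000 / population. The sketch below joins case counts to population estimates on a shared county key; the dictionary inputs are assumptions for illustration (in practice a FIPS code, if both files carry one, is a more reliable join key than a county name), and counties without a population match are skipped rather than guessed.

```python
def per_100k(cases_by_county, population_by_county):
    """Convert raw case counts to cases per 100,000 residents.

    Both arguments are dicts keyed by the same county identifier
    (an assumed join key). Counties with no population entry, or a
    population of zero, are omitted from the result.
    """
    rates = {}
    for county, cases in cases_by_county.items():
        pop = population_by_county.get(county)
        if not pop:
            continue  # no census match: skip instead of dividing by zero
        # Multiply before dividing to keep round inputs exact in floating point.
        rates[county] = cases * 100_000 / pop
    return rates
```

With hypothetical inputs, a county of 200,000 people with 200 cases has a rate of 100 per 100,000, twice that of a county of 1,000,000 people with 500 cases. That reversal of the raw ranking is exactly the effect described for Hall County.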
Analyzing this data in different ways with Knarr makes it clear that the same data can be shown in many forms, and that the form can change what the reader takes the data to mean. This is why we believe data exploration is such a crucial part of the data analysis process. To join us on the data exploration journey, visit knarr.io and sign up for free to do your own exploring.
Header Image from Georgia Department of Public Health, prior to correction, May 2020