Noise to Signal




What I Learned Recreating 29 Storytelling with Data Charts in R



I’m putting the finishing touches on a project that I started just over a month ago: re-creating the visualizations of Storytelling with Data using R. All the code is available on GitHub here. If you’re not familiar with the Storytelling with Data (SWD) book, it’s a master class in communication through data and has become a must-read for data professionals. The lessons shared in this book have inspired many to prioritize context, reduce clutter, and focus their audience’s attention through color, size, and position. A great example is shown below.

[Figure: "Employee feedback over time". Percent favorable by survey category, survey years 2014 and 2015.]

The author, Cole Nussbaumer Knaflic, has also built a community portal that allows data visualization enthusiasts to collaborate and critique each other’s work.

Recently, I’ve discussed how R can be an analyst’s best friend. While I firmly believe that to be the case, I also like to challenge that notion. Are there areas where R falls apart? Can it do everything an analyst needs to do? It was through this critical lens that I thought to put R to the test and see if I could re-create all of the charts in SWD. The results are in and I’d love to share what I learned along the way.

This project wouldn’t have been possible were it not for the fact that the book’s author provided the raw data and original charts (built in Excel, no less!). Before finding that link, I had considered estimating each chart’s data points by sight, which would have been insane! It demonstrates, however, how excited I was to get started.

The beauty of working with R’s ggplot2 library (the “gg” stands for “grammar of graphics”) is that any plot can be deconstructed into its constituent parts, each of which has corresponding functions. Those parts are:

  • geometry (ex: line, bar, point, text)
  • scale (ex: x-axis, y-axis, color, fill)
  • mapping of data to scales (ex: car type -> x-axis)
  • theme (ex: Title font, caption color)
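To make those four parts concrete, here is a minimal sketch of how each one shows up as a separate piece of a ggplot2 call. The data frame and its values are invented for illustration and are not taken from the SWD data files.

```r
library(ggplot2)

# Hypothetical data, invented for illustration (not from the SWD files)
feedback <- data.frame(
  category  = c("Culture", "Leadership", "Rewards"),
  favorable = c(0.82, 0.64, 0.47)
)

p <- ggplot(feedback, aes(x = category, y = favorable)) +  # mapping of data to scales
  geom_col(fill = "grey60") +                              # geometry: bars
  scale_y_continuous(                                      # scale: y-axis limits and labels
    limits = c(0, 1),
    labels = scales::label_percent()
  ) +
  labs(title = "Percent favorable by category") +
  theme_minimal() +                                        # theme: overall look
  theme(plot.title = element_text(face = "bold"))          # theme: title font

print(p)
```

Swapping `geom_col()` for another geometry, or changing the theme, leaves the other parts untouched, which is what makes it practical to read a chart as a list of functions.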

This meant that, over time, it became second nature to look at an SWD chart and pick out the corresponding ggplot2 functions that would be necessary. Here are some of the before-and-after outputs:

[Figure: "Cost per mile by miles driven". Original vs. ggplot recreation. X-axis: miles driven per month; y-axis: cost per mile, with the average marked.]

[Figure: "Ticket volume over time". Original vs. ggplot recreation. Received vs. processed tickets by month, Jan to Dec 2014. Annotation: "2 employees quit in May. We nearly kept up with incoming volume in the following two months, but fell behind with the increase in Aug and haven't been able to catch up since." Footnote: "Data source: XYZ Dashboard, as of 12/31/2014".]

Lessons Learned

If you can dream it, you can build it in ggplot2

The results speak for themselves. That my collaborator (whom I’ll introduce later) and I were able to recreate each of the plots with near-pixel perfection shows the power and flexibility of ggplot2. The breadth and diversity of chart types covered in SWD provide compelling evidence that ggplot2 can handle nearly anything you throw at it. This is often true with standard ggplot2 functions alone, and especially true once you consider companion packages such as ggtext and grid, which can handle edge cases and unique plotting requirements. That said…

Just because you can dream it doesn’t mean that you *should* build it in ggplot2

It became clear after manually positioning annotation text for the 40th time that some tasks are best left to WYSIWYG presentation software (like PowerPoint) rather than code. If an element can’t be mapped back to data and is difficult to arrange in code, it may be better to leave it for a downstream tool. The broader question to keep in mind is: can my chart accept new data points, and how much effort must I expend to update it? Annotations added in PowerPoint pass this test: if the data changes, you can easily retype and rearrange a line of text.
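For a sense of what manual positioning looks like in code, here is a small sketch using ggplot2’s annotate(). The data, coordinates, and label text are all hypothetical; the point is only that the position and wording are hard-coded rather than mapped from the data.

```r
library(ggplot2)

# Hypothetical monthly ticket counts, invented for illustration
tickets <- data.frame(
  month    = 1:6,
  received = c(150, 160, 155, 170, 210, 230)
)

p <- ggplot(tickets, aes(x = month, y = received)) +
  geom_line() +
  # The x/y position and the label are typed in by hand, not mapped from
  # data, so they must be revisited manually whenever the data changes.
  annotate(
    "text",
    x = 4.2, y = 225,
    label = "Volume jumped in month 5",
    hjust = 1, size = 3.5
  )

print(p)
```

If a new month of data arrived, the line would update on its own, but the annotation would stay where it was typed, which is the maintenance cost described above.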

Consistent practice is the key to mastery

While I have much left to learn, my level of comfort and ability with ggplot2 before and after this project is incomparable. I went from someone who had to Google every single function and parameter to someone who could smoothly generate visualizations and transformations as they came to mind. The key was (almost) daily practice as well as having some external motivation from a collaborator who joined halfway through. Speaking of collaboration…

The data viz and R communities are strong. Whenever possible, collaborate and seek out new perspectives

When I started this project, it was meant as a personal challenge. As an afterthought, I posted on the SWD forums asking if anyone else would want to share in the fun. Thankfully, Wal McConnell took me up on that offer and helped in ways I didn’t realize were needed. It taught me an important lesson about R: no single person can keep track of all its packages and capabilities, so you’ll produce your best work when you can collaborate with others.

A few key tidyverse transformations go a long way

Last, I’ll share some more technical tips related to the data and visual transformations I came back to over and over again. These are the tidyverse functions I once had only a passing familiarity with but now know by heart. If you’re only going to memorize a few functions, these are the ones.

  • Data Transformation
    • pivot_longer / pivot_wider – Your first step in transforming data is making sure it’s tidy. This often involves some sort of pivot.
    • case_when – Creates switch-style statements that manipulate/mutate values based on conditions.
    • forcats::fct_relevel() – ggplot2 takes its cues from factor levels when ordering elements. If you need a different order, you often need to go back and relevel the underlying factor.
  • ggplot2
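As a rough sketch of how the three data transformation functions above fit together, the example below pivots a wide table into tidy form, buckets values with case_when(), and reorders a factor with fct_relevel(). The table, column names, thresholds, and level order are all invented for illustration.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical wide survey data, invented for illustration
survey_wide <- tibble(
  category = c("Culture", "Leadership", "Rewards"),
  `2014`   = c(0.82, 0.64, 0.47),
  `2015`   = c(0.88, 0.61, 0.52)
)

survey_long <- survey_wide %>%
  # One row per category/year combination (tidy format)
  pivot_longer(
    cols      = c(`2014`, `2015`),
    names_to  = "year",
    values_to = "favorable"
  ) %>%
  mutate(
    # Switch-style bucketing; the first matching condition wins
    rating = case_when(
      favorable >= 0.75 ~ "high",
      favorable >= 0.50 ~ "medium",
      TRUE              ~ "low"
    ),
    # Set an explicit level order so ggplot2 draws categories in this order
    category = fct_relevel(factor(category), "Rewards", "Leadership", "Culture")
  )

print(survey_long)
```

Passing survey_long to ggplot() would then order the categories by the factor levels set here rather than alphabetically.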

Did anyone ask for this project? Absolutely not. Did I gain a ton of experience that I’ll apply to my professional work? Yes! It goes to show that it’s more important to set goals than to fret over which goals to set. I hope this helps inspire others to take on similar challenges.

You can find the GitHub repository with code and images here: https://github.com/adamribaudo/storytelling-with-data-ggplot




Author

Adam Ribaudo


Adam Ribaudo is the owner and founder of Noise to Signal LLC. He works with clients to ensure that their marketing technologies work together to provide measurable outcomes.

Discussion

01. Nam Nguyen


Incredible project! Thank you to you both for sharing your work. This will definitely be bookmarked as a R Visualization Cookbook for me!


Bringing clarity to marketers in a noisy world © 2020