What I Learned Recreating 29 Storytelling with Data Charts in R
I’m putting the finishing touches on a project that I started just over a month ago: re-creating the visualizations of Storytelling with Data using R. All the code is available on GitHub here. If you’re not familiar with the Storytelling with Data (SWD) book, it’s a master class in communicating with data and has become a must-read for data professionals. The lessons shared in this book have inspired many to prioritize context, reduce clutter, and focus their audience’s attention through color, size, and position. A great example is shown below.
The author, Cole Nussbaumer Knaflic, has also built a community portal that allows data visualization enthusiasts to collaborate and critique each other’s work.
Recently, I’ve discussed how R can be an analyst’s best friend. While I firmly believe that to be the case, I also like to challenge that notion. Are there areas where R falls apart? Can it do everything an analyst needs to do? It was through this critical lens that I thought to put R to the test and see if I could re-create all of the charts in SWD. The results are in and I’d love to share what I learned along the way.
This project wouldn’t have been possible were it not for the fact that the book’s author provided the raw data and original charts (built in Excel, no less!). Before finding that link, I had considered estimating each chart’s data points by sight, which would have been insane! It demonstrates, however, how excited I was to get started.
The beauty of working with R’s ggplot2 library (the “gg” stands for “grammar of graphics”) is that any plot can be deconstructed into its constituent parts, each of which has corresponding functions. Those parts are:
- geometry (ex: line, bar, point, text)
- scale (ex: x-axis, y-axis, color, fill)
- mapping of data to scales (ex: car type -> x-axis)
- theme (ex: Title font, caption color)
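As a minimal sketch of how those four parts map onto functions, the example below uses the mpg dataset that ships with ggplot2. The dataset, titles, and colors are chosen purely for illustration and are not taken from the book.

```r
library(ggplot2)

# mpg is bundled with ggplot2; it stands in for a book dataset (illustrative assumption)
p <- ggplot(mpg, aes(x = class, fill = drv)) +    # mapping: car type -> x-axis, drive train -> fill
  geom_bar() +                                     # geometry: bars (counts per class)
  scale_fill_brewer(palette = "Greys") +           # scale: how fill values become colors
  scale_y_continuous(expand = c(0, 0)) +           # scale: remove padding around the y-axis range
  labs(title = "Vehicles by class", caption = "Source: ggplot2::mpg") +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold"),      # theme: title font
    plot.caption = element_text(color = "grey50")  # theme: caption color
  )

print(p)
```

Each line answers one question about the chart: what shape, which variable goes where, how values are translated into positions and colors, and how the non-data elements look.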
This meant that, over time, it became second nature to look at an SWD chart and pick out the corresponding ggplot2 functions that would be necessary. Here are some of the before and after outputs:
If you can dream it, you can build it in ggplot2
The results speak for themselves. That my collaborator (whom I’ll introduce later) and I were able to recreate each of the plots with near-pixel-perfection shows the power and flexibility of ggplot2. The breadth and diversity of chart types covered in SWD provide compelling evidence that ggplot2 can tackle nearly anything you throw at it. This is often true with standard ggplot2 functions alone, and especially true once you consider add-on packages such as ggtext and grid, which can handle edge cases and unique plotting requirements. That said…
Just because you can dream it doesn’t mean that you *should* build it in ggplot2
It became clear after manually positioning annotation text for the 40th time that some tasks are best left to WYSIWYG presentation software (like PowerPoint) rather than code. If an element can’t be mapped back to data and is difficult to arrange in code, it may be better to leave it for a downstream tool. The broader question to keep in mind is: can my chart accept new data points, and how much effort must I expend to update it? Annotations pass this test when handled downstream: if the data changes, you can easily retype and rearrange a line of text in PowerPoint.
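To make the distinction concrete, here is a rough sketch contrasting a label that is mapped from data with a free-form annotation. The `tickets` data frame and its values are hypothetical, invented only for this example.

```r
library(ggplot2)

# Hypothetical monthly ticket counts, invented purely for illustration
tickets <- data.frame(
  month = 1:6,
  received = c(160, 184, 241, 149, 180, 161)
)

p <- ggplot(tickets, aes(x = month, y = received)) +
  geom_line() +
  # Data-driven label: taken from the last row, so it follows the data
  geom_text(
    data = tickets[nrow(tickets), ],
    aes(label = received),
    hjust = -0.3
  ) +
  # Free-form annotation: x, y, and the wording are hard-coded, so all three
  # have to be revisited by hand whenever the data changes
  annotate(
    "text", x = 3.2, y = 235,
    label = "Spike in month 3",
    hjust = 0, size = 3.5
  ) +
  # Leave extra room on the right so the end label is not clipped
  scale_x_continuous(breaks = 1:6, expand = expansion(mult = c(0.05, 0.15)))

print(p)
```

The `geom_text()` layer survives a data refresh untouched, while the `annotate()` call is exactly the kind of element that may be cheaper to place in a presentation tool.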
Consistent practice is the key to mastery
While I have much left to learn, my comfort and ability with ggplot2 before and after this project are beyond comparison. I went from someone who had to Google every single function and parameter to someone who could smoothly generate visualizations and transformations as they came to mind. The key was (almost) daily practice, as well as external motivation from a collaborator who joined halfway through. Speaking of collaboration…
The data viz and R communities are strong. Whenever possible, collaborate and seek out new perspectives
When I started this project, it was meant as a personal challenge. As an afterthought, I posted on the SWD forums asking if anyone else would want to share in the fun. Thankfully, Wal McConnell took me up on that offer and helped in ways I didn’t realize were needed. It taught me an important lesson about R: no single person can keep track of all its packages and capabilities, so you’ll produce your best work when you collaborate with others.
A few key tidyverse transformations go a long way
Last, I’ll share some more technical tips related to the data and visual transformations I came back to over and over again. These are the tidyverse functions I had only a passing familiarity with before the project but now know by heart. If you’re only going to memorize a few functions, these are the ones.
- Data Transformation
  - pivot_longer() / pivot_wider() – Your first step in transforming data is making sure it’s tidy. This often involves some sort of pivot.
  - case_when() – Creates switch-style statements that mutate values based on conditions.
  - forcats::fct_relevel() – ggplot2 takes its cues from factor levels when ordering elements. If you need a different order, you often need to go back to the original factor.
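The three functions above often appear together in one pipeline. The sketch below shows one way that might look; the `survey_wide` data, the column names, and the 40% threshold are all hypothetical and chosen only for illustration.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical wide-format survey results, invented for illustration
survey_wide <- tibble(
  item = c("Item A", "Item B"),
  agree = c(55, 30),
  neutral = c(25, 30),
  disagree = c(20, 40)
)

survey_long <- survey_wide %>%
  # Tidy the data: one row per item/response pair
  pivot_longer(
    cols = c(agree, neutral, disagree),
    names_to = "response",
    values_to = "pct"
  ) %>%
  mutate(
    # Switch-style recoding: the first matching condition wins
    emphasis = case_when(
      response == "disagree" & pct >= 40 ~ "highlight",
      response == "disagree" ~ "secondary",
      TRUE ~ "background"
    ),
    # Set the level order that ggplot2 would use for stacking and legends
    response = fct_relevel(factor(response), "disagree", "neutral", "agree")
  )

print(survey_long)
```

From here, `emphasis` could be mapped to a fill or color scale to draw attention to the rows that matter, and `pivot_wider()` reverses the reshape if a wide table is needed again.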
Did anyone ask for this project? Absolutely not. Did I gain a ton of experience that I’ll apply to my professional work? Yes! It goes to show that it’s more important to set goals than to fret over which goals to set. I hope this helps inspire others to take on similar challenges.
You can find the GitHub repository with code and images here: https://github.com/adamribaudo/storytelling-with-data-ggplot