Reverse Path Analysis with R, Shiny, and Google Cloud Run
A few topics I’ve been following have converged into a single project that I’ll describe and demonstrate here. I’ve blogged quite a bit about the googleAnalyticsR package, which powers a number of my latest projects. The creator of that package, Mark Edmondson, has recently evangelized Google Cloud Run and Google Cloud Build as cheap and effective ways to move your R projects into the cloud. I’ll be honest: until about three days ago, I had no idea what these products did. I see the light now and am excited to share.
The last relevant thread I’ve been following is a Network Analysis Shiny app created by Search Discovery’s Jamarius Taylor. As soon as I saw his application, I knew I would be able to improve upon Google Analytics’ “User Flow” tool (shown below). If you’ve ever used this tool, you know how much pain one must endure to get any useful information out of it.
Something I always thought was missing from this tool was reverse path analysis. Reverse path analysis is the visualization of user journeys that reach a specific page or fire a specific event. The visual is a directed acyclic graph in which nodes represent pages and/or events and edges represent the transitions between pages/events during a journey. The size of nodes and edges often represents the volume of traffic.
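To make the idea concrete, here is a minimal Python sketch of the aggregation behind a reverse path graph. It assumes you already have each session’s ordered list of page paths, which is an assumption and not necessarily how the app queries Google Analytics. It keeps only journeys that reach the target page, looks at up to `max_steps` pages before it, and counts each transition. Keying nodes by (page, steps before the target) is one way, chosen here as an assumption, to keep the graph acyclic even when a user revisits a page.

```python
from collections import Counter
from typing import Iterable, Sequence


def reverse_path_edges(
    journeys: Iterable[Sequence[str]],
    target: str,
    max_steps: int = 3,
) -> Counter:
    """Count transitions in the last `max_steps` pages before `target`.

    Nodes are (page, steps_before_target) pairs, so the target is step 0,
    the page just before it is step 1, and so on. Because every edge goes
    from step k to step k - 1, the resulting graph cannot contain a cycle.
    """
    edges: Counter = Counter()
    for journey in journeys:
        pages = list(journey)
        if target not in pages:
            continue  # this journey never reached the target page
        cut = pages.index(target)  # first time the target was reached
        # Keep up to max_steps pages before the target, plus the target itself.
        window = pages[max(0, cut - max_steps): cut + 1]
        n = len(window)
        for i in range(n - 1):
            src = (window[i], n - 1 - i)
            dst = (window[i + 1], n - 2 - i)
            edges[(src, dst)] += 1
    return edges
```

With the default `max_steps=3`, a journey of /a, /b, /a, /target yields the nodes (/a, 3), (/b, 2), (/a, 1), and (/target, 0), so the revisit to /a does not create a cycle.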
With that in mind, I created a Shiny app here that pulls data from Google Analytics and transforms it into the format the visNetwork package needs to draw a reverse path analysis. Right now it only looks at pages, but it could be extended to blend pages and events. You can find the code here.
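visNetwork draws from two data frames: nodes with an `id` column (plus optional columns such as `label`, and `value`, which scales node size) and edges with `from` and `to` columns (plus an optional `value` that scales edge width). The Python sketch below, with hypothetical function and variable names, shows one way to reshape aggregated transition counts into rows with those columns; in the Shiny app this reshaping would be done in R. It also fills `level`, which vis.js uses to position nodes when a hierarchical layout is enabled.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A node is (page path, steps before the target page); the target is step 0.
Node = Tuple[str, int]


def to_visnetwork_rows(
    edge_counts: Dict[Tuple[Node, Node], int],
) -> Tuple[List[dict], List[dict]]:
    """Reshape transition counts into visNetwork-style node and edge rows."""
    ids: Dict[Node, int] = {}
    volume: Counter = Counter()
    for (src, dst), count in edge_counts.items():
        for node in (src, dst):
            ids.setdefault(node, len(ids) + 1)  # integer ids in first-seen order
        # In a reverse path graph each non-target node has one outgoing edge
        # per journey, so its outgoing total is the traffic passing through it.
        volume[src] += count
        if dst[1] == 0:
            volume[dst] += count  # the target page only has incoming edges
    nodes = [
        {"id": node_id, "label": page, "level": step, "value": volume[(page, step)]}
        for (page, step), node_id in ids.items()
    ]
    edges = [
        {"from": ids[src], "to": ids[dst], "value": count}
        for (src, dst), count in edge_counts.items()
    ]
    return nodes, edges
```

Written out as CSV or JSON, these rows map directly onto the nodes and edges data frames passed to `visNetwork(nodes, edges)`.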
Adding Cloud Run
Extending this project further, I wanted to see exactly where Google Cloud Run might fit in. Google Cloud Run is Google’s serverless container hosting service, which makes it dead simple to host your container in the cloud. It offers features and scalability similar to Cloud Functions but removes the restriction on which programming languages and libraries you can use. It can also be compared to Kubernetes Engine, but it is meant for simpler workloads, which makes it easier to configure and deploy.
This all clicked for me after I ran through @randy3k’s sample Shiny app and pushed it to Cloud Run. With a Dockerfile and a few simple commands, you can make your Shiny app accessible to the world (with a few restrictions around concurrent users mentioned here). These commands are as follows:
docker build . -t gcr.io/GCP_PROJECT/APP_NAME
docker push gcr.io/GCP_PROJECT/APP_NAME
gcloud run deploy --image gcr.io/GCP_PROJECT/APP_NAME --platform managed --max-instances 1
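If you run these often, a small script keeps them in one place. This is a hypothetical Python wrapper around the same three commands, not part of any Google tooling: by default it only prints them, and with `dry_run=False` it runs each one and stops at the first failure.

```python
import shlex
import subprocess
from typing import List


def deploy_commands(project: str, app: str) -> List[List[str]]:
    """The docker/gcloud build, push, and deploy commands as argument lists."""
    image = f"gcr.io/{project}/{app}"
    return [
        ["docker", "build", ".", "-t", image],
        ["docker", "push", image],
        ["gcloud", "run", "deploy", "--image", image,
         "--platform", "managed", "--max-instances", "1"],
    ]


def deploy(project: str, app: str, dry_run: bool = True) -> None:
    """Print each command; unless dry_run is True, also run it."""
    for cmd in deploy_commands(project, app):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on failure, so a failed
            # build is never pushed and a failed push is never deployed.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    deploy("GCP_PROJECT", "APP_NAME")  # placeholders; dry run only prints
```

As in the original commands, no service name or region is passed to `gcloud run deploy`, so gcloud will typically prompt for them.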
That’s it! Three commands and your Shiny app has its own URL, plus the benefits of GCP’s security, scalability, and more.
In case it’s helpful, my Dockerfile is as follows:
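# Base image with R, Shiny Server, and the tidyverse preinstalled
FROM rocker/shiny-verse
# Install the app's extra R packages; --error makes the build fail if an install fails
RUN install2.r --error \
    googleAnalyticsR \
    googleAuthR \
    visNetwork
# Copy in the app, the GA service account key, and a config replacing Shiny Server's default
COPY app.R /srv/shiny-server/app.R
COPY service_account.json /srv/shiny-server/service_account.json
COPY shiny-customized.config /etc/shiny-server/shiny-server.conf
# Cloud Run sends requests to port 8080 by default
EXPOSE 8080
USER shiny
CMD ["/usr/bin/shiny-server"]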
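The contents of shiny-customized.config aren’t shown above. A likely reason for a custom config is that Shiny Server listens on port 3838 by default, while Cloud Run routes requests to port 8080 unless told otherwise. As a hypothetical sketch (a real config may need more settings), this Python helper prints a config modeled on Shiny Server’s default one with only the port changed.

```python
def shiny_server_conf(port: int = 8080) -> str:
    """Render a minimal shiny-server.conf that serves /srv/shiny-server on `port`.

    Modeled on Shiny Server's default config, with the port changed from 3838.
    """
    return f"""run_as shiny;

server {{
  listen {port};

  location / {{
    site_dir /srv/shiny-server;
    log_dir /var/log/shiny-server;
    directory_index on;
  }}
}}
"""


if __name__ == "__main__":
    # Cloud Run sends requests to port 8080 unless the service is configured otherwise.
    print(shiny_server_conf(8080), end="")
```

Running `python make_conf.py > shiny-customized.config` (the script name is hypothetical) would produce the file the Dockerfile copies.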
Adding Continuous Integration
The last component I wanted to add was continuous integration, so that my Docker container would rebuild and redeploy to Google Cloud Run after every commit. In my case, this had a very practical benefit: my main PC runs Windows Home Edition, which doesn’t support Docker by default. This CI process offloads the work of rebuilding the Docker container to Google.
Setting up CI was dead simple. Here are the steps:
- Create a GitHub repository for your code. I made mine private to protect my service account JSON, but further configuration would allow me to pull that JSON from Google Secret Manager instead.
- In the Cloud Run UI, select your application, then click “Set up Continuous Deployment”.
- Authenticate with GitHub and select your repository.
- Accept the remaining default settings, ensuring that the “Build Type” is set to “Dockerfile”.
That’s it! Once these steps are complete, Cloud Run creates an associated trigger in Google Cloud Build that listens for commits to your repository’s master branch. Each commit starts a new build that both builds your container and deploys it to Cloud Run.
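Each triggered build does roughly what the manual build, push, and deploy commands do, but on Google’s machines. As an illustrative sketch, not the exact configuration Cloud Run generates, this Python snippet emits an equivalent build config as JSON, a format Cloud Build accepts alongside YAML. `$PROJECT_ID` and `$COMMIT_SHA` are Cloud Build’s built-in substitutions; the service name and region passed in are placeholders.

```python
import json


def cloudbuild_config(service: str, region: str) -> dict:
    """Cloud Build config that builds, pushes, and deploys one Cloud Run service."""
    # $PROJECT_ID and $COMMIT_SHA are filled in by Cloud Build;
    # $COMMIT_SHA is only populated for builds started by a trigger.
    image = f"gcr.io/$PROJECT_ID/{service}:$COMMIT_SHA"
    return {
        "steps": [
            {"name": "gcr.io/cloud-builders/docker", "args": ["build", "-t", image, "."]},
            {"name": "gcr.io/cloud-builders/docker", "args": ["push", image]},
            {
                "name": "gcr.io/google.com/cloudsdktool/cloud-sdk",
                "entrypoint": "gcloud",
                "args": [
                    "run", "deploy", service,
                    "--image", image,
                    "--region", region,
                    "--platform", "managed",
                    "--max-instances", "1",
                ],
            },
        ],
        "images": [image],
    }


if __name__ == "__main__":
    # "APP_NAME" and "us-central1" are placeholders for your service and region.
    print(json.dumps(cloudbuild_config("APP_NAME", "us-central1"), indent=2))
```

Saved as cloudbuild.json in the repository, a file like this could be selected as the trigger’s build configuration instead of the Dockerfile build type; the Cloud Build service account would also need permission to deploy to Cloud Run.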
Despite learning so much during this project, there’s still plenty I don’t understand. In particular, Cloud Run seems to offer powerful options around environment variables, pulling from Google Secret Manager, load balancing, and advanced configuration through the application’s cloudbuild.yaml file. In fact, Mark Edmondson has built an entire R package, googleCloudRunner, focused on building cloudbuild.yaml files and deploying Cloud Run applications, and I haven’t even touched it.
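As a taste of the Secret Manager piece, here is a minimal Python sketch that uses the google-cloud-secret-manager client library to read a secret such as a service account JSON. The project and secret names are hypothetical; it assumes the secret already exists and that the caller’s credentials can access it. An R app would need its own equivalent, which this does not show.

```python
def secret_version_name(project: str, secret: str, version: str = "latest") -> str:
    """Resource name Secret Manager uses for one version of a secret."""
    return f"projects/{project}/secrets/{secret}/versions/{version}"


def read_secret(project: str, secret: str, version: str = "latest") -> str:
    """Fetch a secret's payload as text (requires google-cloud-secret-manager)."""
    # Imported here so the name helper above works without the library installed.
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    response = client.access_secret_version(
        request={"name": secret_version_name(project, secret, version)}
    )
    return response.payload.data.decode("UTF-8")
```

For example, `read_secret("my-project", "ga-service-account")` would return the stored JSON as a string, which could then be written to a file or parsed.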
I expect to follow this thread further. The promise of combining the capabilities of R and cloud services is too compelling not to. Let me know if you have any questions or comments at @adamribaudo.