Contributed"> Unlock GA4 Insights: BigQuery SQL Recipes for Key Metrics - The New Stack
TNS
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Data / Observability

Unlock GA4 Insights: BigQuery SQL Recipes for Key Metrics

Use these SQL queries and techniques for extracting key Google Analytics 4 metrics directly from BigQuery, enabling precise, customizable reporting without data thresholds.
Oct 8th, 2024 10:00am by
Featued image for: Unlock GA4 Insights: BigQuery SQL Recipes for Key Metrics
Photo by Mareks Mangūzis on Unsplash.

Google Analytics 4 directly integrates with BigQuery, making it easy to set up a basic data warehouse that can be queried with SQL. However, working with the GA4 schema in practice is challenging. Recreating the same metrics and measurements we’re used to seeing inside the GA interface is much more complex than you’d expect.

One of the most common requests I receive from clients related to BigQuery is help working with GA4 data in BigQuery. Over the years, I’ve built up a library of snippets that can be used in BigQuery to replicate specific GA4 UI metrics closely.

These snippets are tested and confirmed to replicate a GA4 profile. Other sources around the web typically use more complex or expensive queries than what is found here. I’ve used these in practice to consolidate separate GA4 profiles (usually many) into a single optimized table. They should cover many of the common, and even some not-so-common, scenarios most data engineers encounter while working with BigQuery and Google Analytics 4.

These snippets cover many of the most commonly referenced metrics, including:

  • Pageviews
  • Sessions
  • Average session duration
  • Engaged sessions
  • Engagement rate
  • Users
  • Bounces
  • Bounce rate
  • Organic views
  • Organic sessions
  • Organic users
  • Mobile views
  • Mobile sessions
  • Mobile users

This isn’t 100% of what’s available, but in my experience, it covers more than 95% of the requests most people encounter.

SQL Snippets

Here are the specific SELECT statements that can capture these metrics. I’ll include a complete example at the end, but these snippets should make it easier to understand each metric individually. Each is an aggregate clause typically run again daily aggregated data (or a similar period). In practice, you would drop these clauses into a query that looks something like this:


Replacing the {AGGREGATE CLAUSES GO HERE} with the specific ones you want to use. When working with GA4 data, you must understand and query partitioned data again. This is what the _TABLE_SUFFIX where clause is for: it reduces the amount of data a query scans only to include the days we want to be included. Using a simple where clause against the event_date may seem like an obvious alternative, but take note of the query cost and size: it will be substantially less with this approach due to partitioning.

Pageviews

A simple count of the total number of page_view events in a given period.

Sessions

Google combines the user_pseudo_id with a ga_session_id to define a unique session. The user_pseudo_id is an identifier for a unique device with a long time window (up to 2 years). The ga_session_id is generated when a new session starts, which by default is after 30 minutes of inactivity. Combining the two gives you the same method the GA4 UI uses to keep count.

Average Session Duration

A subquery is needed to calculate the average session duration accurately. This subquery gets a list of all unique session IDs and the length they lasted in seconds. Once we have this, we can determine the average by dividing the total seconds for all sessions by the total number of sessions.

Engaged Sessions

To get the total number of engaged sessions, we can filter a list of all unique session IDs to count only the ones marked as “engaged” within the event_params column.

Engagement Rate

Engagement rate is the percentage of all sessions that are engaged, which can be handled in a single clause like this:

Users

The user_pseudo_id ID can be used to find the total number of unique users for a given period. If your GA4 profile uses the user_id event parameter, it’s worth using that instead. Ultimately, it comes down to how your organization wants to define a “user.”

Bounces

Bounces are defined as people who visit and immediately leave, visiting no other pages on the site. The new GA4 UI uses the concept of engaged vs. unengaged sessions. Bounces are direct visits from unengaged sessions. A session can only become engaged on a subsequent request. Based on this, we can safely consider a count of all unengaged sessions as our “bounces” metric.

Bounce Rate

The bounce rate can then be considered the total number of all unengaged sessions divided by the total unique sessions. For example, if a page has 100 tracked visits and 50 marked as engaged, we can divide 50 by 100 to get a bounce rate of 0.5 or 50%.

Organic Metrics

Very similar to our pageview count, this approach filters down the pageview count to include only those with a traffic_source “medium” containing the word “organic.”


Similarly, we can use the same filtering approach with unique sessions and user pseudo-ids to get a list of organic pageviews, sessions, and users during a specific period.

Mobile Sessions

The same pattern can be applied to get total counts of mobile pageviews, sessions, and users.

Full Example

The following BigQuery SQL will generate a report with daily aggregated data for standard web metrics frequently used in GA4, with a date range covering all days in August 2024.

I’ve replaced the BigQuery project and dataset names in the example SQL below, so if you’re trying to replicate this in your BigQuery environment, you’ll need to update/replace the following values:

  • bigquery-project-id – The BigQuery project ID you’re working in
  • {GA4-PROFILE-ID} – The GA4 profile ID you want to report on
  • 20240801 – Start date for the report (YYYYMMDD)
  • 20240831 – End date for the report (YYYYMMDD)

After updating these aspects with your own values, you should be able to run the query and get results similar to those shown in the screenshot above.


The result is a tabular report that can easily be pulled into a data visualization tool like Tableau or Looker to generate an automated analytics dashboard. By using a direct query of BigQuery data, we’re avoiding thresholds, so while the amounts you see returned will be slightly off by a margin of 1-2% (depending on your data), they are actually more accurate than what is shown in the GA4 interface because they are precise counts.

Conclusion

This should provide you with a few solid examples of how to extract GA4 metrics directly from BigQuery, avoiding data thresholds, putting you in control of your data, and allowing you to report precisely what you need.

Group Created with Sketch.
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.