Unlock GA4 Insights: BigQuery SQL Recipes for Key Metrics
Google Analytics 4 directly integrates with BigQuery, making it easy to set up a basic data warehouse that can be queried with SQL. However, working with the GA4 schema in practice is challenging. Recreating the same metrics and measurements we’re used to seeing inside the GA interface is much more complex than you’d expect.
One of the most common requests I receive from clients related to BigQuery is help working with GA4 data in BigQuery. Over the years, I’ve built up a library of snippets that can be used in BigQuery to replicate specific GA4 UI metrics closely.
These snippets are tested and confirmed to replicate a GA4 profile. Other sources around the web typically use more complex or expensive queries than what is found here. I’ve used these in practice to consolidate separate GA4 profiles (usually many) into a single optimized table. They should cover many of the common, and even some not-so-common, scenarios most data engineers encounter while working with BigQuery and Google Analytics 4.
These snippets cover many of the most commonly referenced metrics, including:
- Pageviews
- Sessions
- Average session duration
- Engaged sessions
- Engagement rate
- Users
- Bounces
- Bounce rate
- Organic views
- Organic sessions
- Organic users
- Mobile views
- Mobile sessions
- Mobile users
This isn’t 100% of what’s available, but in my experience, it covers more than 95% of the requests most people encounter.
SQL Snippets
Here are the specific SELECT
statements that can capture these metrics. I’ll include a complete example at the end, but these snippets should make it easier to understand each metric individually. Each is an aggregate clause typically run again daily aggregated data (or a similar period). In practice, you would drop these clauses into a query that looks something like this:
1 2 3 4 5 6 7 |
select event_date, {AGGREGATE CLAUSES GO HERE} from `bigquery-project-id.analytics_{GA4-PROFILE-ID}.events_*` where _TABLE_SUFFIX BETWEEN '20240801' AND '20240831' group by 1 order by event_date asc |
Replacing the {AGGREGATE CLAUSES GO HERE}
with the specific ones you want to use. When working with GA4 data, you must understand and query partitioned data again. This is what the _TABLE_SUFFIX
where clause is for: it reduces the amount of data a query scans only to include the days we want to be included. Using a simple where clause against the event_date
may seem like an obvious alternative, but take note of the query cost and size: it will be substantially less with this approach due to partitioning.
Pageviews
A simple count of the total number of page_view
events in a given period.
1 |
countIF(event_name = 'page_view') AS views |
Sessions
Google combines the user_pseudo_id
with a ga_session_id
to define a unique session. The user_pseudo_id
is an identifier for a unique device with a long time window (up to 2 years). The ga_session_id
is generated when a new session starts, which by default is after 30 minutes of inactivity. Combining the two gives you the same method the GA4 UI uses to keep count.
1 2 3 4 |
count(distinct concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) ) AS sessions |
Average Session Duration
A subquery is needed to calculate the average session duration accurately. This subquery gets a list of all unique session IDs and the length they lasted in seconds. Once we have this, we can determine the average by dividing the total seconds for all sessions by the total number of sessions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
WITH sessions AS ( SELECT CONCAT( user_pseudo_id, (SELECT VALUE.int_value FROM UNNEST(event_params) WHERE KEY = 'ga_session_id') ) AS session_id, (MAX(event_timestamp) - MIN(event_timestamp)) / 1000000 AS session_length_in_seconds FROM `ntg-consolidated-analytics.analytics_331827279.events_*` WHERE _table_suffix = '20240914' GROUP BY session_id ) round((select sum(session_length_in_seconds) / count(distinct session_id) from sessions), 2) AS avg_session_duration |
Engaged Sessions
To get the total number of engaged sessions, we can filter a list of all unique session IDs to count only the ones marked as “engaged” within the event_params
column.
1 2 3 4 5 6 |
count(distinct case when (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end ) as engaged_sessions |
Engagement Rate
Engagement rate is the percentage of all sessions that are engaged, which can be handled in a single clause like this:
1 2 3 4 5 6 7 8 9 10 11 12 |
round(safe_divide(count(distinct case WHEN (select value.string_value from unnest(event_params) where key='session_engaged') = '1' then concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) end), count(distinct concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ))), 4 ) AS engagement_rate |
Users
The user_pseudo_id
ID can be used to find the total number of unique users for a given period. If your GA4 profile uses the user_id
event parameter, it’s worth using that instead. Ultimately, it comes down to how your organization wants to define a “user.”
1 |
count(distinct user_pseudo_id) AS users |
Bounces
Bounces are defined as people who visit and immediately leave, visiting no other pages on the site. The new GA4 UI uses the concept of engaged vs. unengaged sessions. Bounces are direct visits from unengaged sessions. A session can only become engaged on a subsequent request. Based on this, we can safely consider a count of all unengaged sessions as our “bounces” metric.
1 2 3 4 5 6 |
count(distinct concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id'))) - count(distinct case WHEN (select value.string_value from unnest(event_params) where key='session_engaged') = '1' then concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) end) AS bounces |
Bounce Rate
The bounce rate can then be considered the total number of all unengaged sessions divided by the total unique sessions. For example, if a page has 100 tracked visits and 50 marked as engaged, we can divide 50 by 100 to get a bounce rate of 0.5 or 50%.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
round( safe_divide( count( distinct concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) )-count( distinct case WHEN ( select value.string_value from unnest(event_params) where key='session_engaged' )='1' then concat( user_pseudo_id, ( select value.int_value from unnest(event_params) where key='ga_session_id' ) ) end ), count( distinct concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) ) ), 4 ) AS bounce_rate |
Organic Metrics
Very similar to our pageview count, this approach filters down the pageview count to include only those with a traffic_source “medium” containing the word “organic.”
1 |
countIF(event_name = 'page_view' and traffic_source.medium = 'organic') AS organic_views |
Similarly, we can use the same filtering approach with unique sessions and user pseudo-ids to get a list of organic pageviews, sessions, and users during a specific period.
1 |
count(distinct concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id' and traffic_source.medium like '%organic%'))) AS organic_sessions |
Mobile Sessions
The same pattern can be applied to get total counts of mobile pageviews, sessions, and users.
1 |
count(distinct concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id' and device.category = 'mobile'))) AS mobile_sessions |
Full Example
The following BigQuery SQL will generate a report with daily aggregated data for standard web metrics frequently used in GA4, with a date range covering all days in August 2024.
I’ve replaced the BigQuery project and dataset names in the example SQL below, so if you’re trying to replicate this in your BigQuery environment, you’ll need to update/replace the following values:
bigquery-project-id
– The BigQuery project ID you’re working in{GA4-PROFILE-ID}
– The GA4 profile ID you want to report on20240801
– Start date for the report (YYYYMMDD)20240831
– End date for the report (YYYYMMDD)
After updating these aspects with your own values, you should be able to run the query and get results similar to those shown in the screenshot above.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 |
with sessions AS ( select concat( user_pseudo_id, (select value.int_value from unnest(event_params) where KEY = 'ga_session_id') ) AS session_id, (max(event_timestamp) - min(event_timestamp)) / 1000000 AS session_length_in_seconds from `bigquery-project-id.analytics_{GA4-PROFILE-ID}.events_*` where _table_suffix = '20240914' group by session_id ) select event_date, countIF(event_name = 'page_view') AS views, count(distinct concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) ) AS sessions, round((select sum(session_length_in_seconds) / count(distinct session_id) from sessions), 2) AS avg_session_duration, count(distinct case when (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end ) as engaged_sessions, round(safe_divide(count(distinct case WHEN (select value.string_value from unnest(event_params) where key='session_engaged') = '1' then concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) end), count(distinct concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ))), 4 ) AS engagement_rate, count(distinct user_pseudo_id) AS users, count(distinct concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id'))) - count(distinct case WHEN (select value.string_value from unnest(event_params) where key='session_engaged') = '1' then concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) end) AS bounces, round( safe_divide( count( distinct concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) )-count( distinct case WHEN ( select value.string_value from unnest(event_params) where key='session_engaged' )='1' then concat( user_pseudo_id, ( select value.int_value from unnest(event_params) where key='ga_session_id' ) ) end ), count( distinct concat( user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id') ) ) ), 4 ) AS bounce_rate, countIF(event_name = 'page_view' and traffic_source.medium = 'organic') AS organic_views, count(distinct concat(user_pseudo_id, (select value.int_value from unnest(event_params) where key='ga_session_id' and traffic_source.medium like '%organic%'))) AS organic_sessions, count(distinct IF(traffic_source.medium like '%organic%', user_pseudo_id, null)) AS organic_users, count(distinct IF(traffic_source.medium like '%organic%' and device.category = 'mobile', user_pseudo_id, null)) AS organic_mobile_users, count(IF (device.category = 'mobile' and event_name = 'page_view' and traffic_source.medium like '%organic%', 1, null)) AS organic_mobile_views, count(IF (device.category = 'mobile' and event_name = 'page_view', 1, null)) AS mobile_views, count(distinct (select value.int_value from unnest(event_params) where key='ga_session_id' and device.category='mobile') ) AS mobile_sessions, count( IF ( ( select value.string_value from unnest(event_params) where key='session_engaged' and device.category='mobile' )='1', 1, null ) ) AS mobile_engaged_sessions from `bigquery-project-id.analytics_{GA4-PROFILE-ID}.events_*` where _TABLE_SUFFIX BETWEEN '20240801' AND '20240831' group by 1 order by event_date asc |
The result is a tabular report that can easily be pulled into a data visualization tool like Tableau or Looker to generate an automated analytics dashboard. By using a direct query of BigQuery data, we’re avoiding thresholds, so while the amounts you see returned will be slightly off by a margin of 1-2% (depending on your data), they are actually more accurate than what is shown in the GA4 interface because they are precise counts.
Conclusion
This should provide you with a few solid examples of how to extract GA4 metrics directly from BigQuery, avoiding data thresholds, putting you in control of your data, and allowing you to report precisely what you need.