As of 1/17/17, we now refer to "apps" as "projects."
For almost any product, there are key sequences of events that users should progress through in order to successfully begin or continue using the product; this sequence is commonly called a "funnel".
For example, for a messaging app, the key initial funnel might have three steps:
(1) The "openApp" event
(2) The "viewMessage" event
(3) The "sendMessage" event
In this section, we will demonstrate how to do funnel analysis in Redshift by using the three-stage texting app funnel described above as an example. To do this, we will create each step in the funnel as a SQL "View" - essentially a saved query that we can use without retyping the query. The values in red are what you will need to replace with your own.
Table of Contents
- Query Objective: Obtain a List of Users for Each Step of a Funnel
- Query Objective: Adding Steps to a Funnel
- Query Objective: Getting the List of Users Who Did (or Did Not) Reach a Step in a Funnel
- Query Objective: Funnels Where Users Did Event X, Then Y, With No Other Events in Between
- Query Objective: Funnels Where Users Did Event Y After Event X, Within 24 Hours of Event X
Note: Tracking the number of users who make it (and do not make it) to each stage in a funnel is crucial, as it identifies which parts of your product's user experience flow are smooth and which parts are bottlenecks that need improvement.
Query Objective: Obtain a List of Users for Each Step of a Funnel
CREATE VIEW Funnel_Step_1 AS ( SELECT DISTINCT user_id FROM app123.openApp WHERE DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06' );
This view, which we name 'Funnel_Step_1', captures the users who opened the product during March 1st and 2nd. Next, we use the 'Funnel_Step_1' view to construct the view for the second step in the funnel:
CREATE VIEW Funnel_Step_2 AS ( SELECT DISTINCT app123.viewMessage.user_id FROM app123.viewMessage INNER JOIN Funnel_Step_1 ON app123.viewMessage.user_id = Funnel_Step_1.user_id WHERE DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06' );
'Funnel_Step_2' captures the subset of the users from 'Funnel_Step_1' who also did the 'viewMessage' event during the first two days of March; that is, the users who did both the 'openApp' and 'viewMessage' events. Finally, we use 'Funnel_Step_2' to construct the view for the third step of the funnel:
CREATE VIEW Funnel_Step_3 AS ( SELECT DISTINCT app123.sendMessage.user_id FROM app123.sendMessage INNER JOIN Funnel_Step_2 ON app123.sendMessage.user_id = Funnel_Step_2.user_id WHERE DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06' );
'Funnel_Step_3' captures the subset of the users from 'Funnel_Step_2' (which, recall, is itself a subset of users from 'Funnel_Step_1') who also did the 'sendMessage' event during the first two days of March.
Now that we have created the views for our funnel, we can analyze each step. First, we can look at the count of users who made it to steps 1, 2, and 3, respectively, using the queries:
SELECT count(*) FROM Funnel_Step_1; SELECT count(*) FROM Funnel_Step_2; SELECT count(*) FROM Funnel_Step_3;
Query Objective: Adding Steps to a Funnel
While our example funnel here has three steps, you can add as many steps to your funnel as you'd like. Let’s add a step to our above funnel:
CREATE VIEW Funnel_Step_4 AS ( SELECT DISTINCT app123.NEXT_EVENT.user_id FROM app123.NEXT_EVENT INNER JOIN Funnel_Step_3 ON app123.NEXT_EVENT.user_id = Funnel_Step_3.user_id WHERE DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06' );
Query Objective: Getting the List of Users Who Did (or Did Not) Reach a Step in a Funnel
In addition to getting the counts of users for each step in the funnel, you can also get the list of user_ids for the users who did (or did not) reach a given step in the funnel.
To get the list of users who reached step X but then did not reach step X+1 -- referred to as users who “dropped off” the funnel at step X+1 -- use the query below. Here we obtain the users who reached step 2 of our example funnel (so they did the 'openApp' and 'viewMessage' events) but did not reach step 3 (so they did not do the 'sendMessage' event):
SELECT Funnel_Step_2.user_id FROM Funnel_Step_2 LEFT JOIN Funnel_Step_3 ON Funnel_Step_2.user_id = Funnel_Step_3.user_id WHERE Funnel_Step_3.user_id IS NULL;
Query Objective: Funnels Where Users Did Event X, then Y, With No Other Events in Between
In our dashboard, users are counted as converted as long as they complete the next funnel step on the same day or up until 30 days they have entered the funnel. To get a list of users who did your first step in the funnel and immediately proceeded to do the next event, we will need to start using partition functions. Let's say we are looking at a funnel with events 'openApp' → 'viewMessage', and we only want to look at the number of users who did 'viewMessage' immediately after 'openApp', with no other events in between.
In this case, we must query on the events table instead of individual event tables as the individual events table does not give us information on what events immediately follows.
To get a list of the number of users who did the 'openApp' event, use the query:
SELECT COUNT(DISTINCT amplitude_id) FROM( SELECT amplitude_id, event_type, event_time, LEAD(event_type, 1) OVER(PARTITION BY amplitude_id ORDER BY event_time) AS next_event_type FROM events123) WHERE next_event_type= 'viewMessage' AND event_type='openApp' AND DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06'
The inner subquery selects amplitude_id, event_type, and event_time along with the PARTITION function. PARTITION is similar to the group function, but it does not aggregate the IDs (each row with the same amplitude_id stays independent). In addition, within each partition we have chosen to sort by event_time, so the event_time is sorted from earliest to latest. The LEAD function with value 1 returns the value for the row that is one after the current row and the AS function names that column next_event_type. Note the LEAD function only works within the partition (see the null values in the sample table below).
From this resulting table, we are only selecting the rows where the next_event_type has the value 'viewMessage' (second event in the funnel) and the event_type with the value 'openApp' (first event in the funnel) AND only events on March 1st and March 2nd.
A simplified example of the partition function can be seen below:
In this above example, we would have two rows and two users who would satisfy this requirement (users A and B, row 1 and row 4.)
Query Objective: Funnels Where Users Did Event Y After Event X, Within 24 Hours of Event X
CREATE OR REPLACE VIEW openApp_funnel_1 AS
SELECT * FROM (
SELECT amplitude_id, event_time, row_number() OVER
(PARTITION BY amplitude_id ORDER BY event_time ASC)
AND DATE(event_time) BETWEEN '2017-03-30' AND '2017-04-06')
WHERE row_number = 1;
The inner SELECT creates a table with the amplitude_id and the time at which the user did the 'openAPP' event. The table is partitioned by amplitude_id and within each partition, the event times are sorted from least to greatest. Each row in each partition is given a row number. The outer SELECT picks only the first row of each partition -- this is the first time the user did the 'openApp' event in the given window.
The inner SELECT makes a table that looks like this:
The outer SELECT makes a table that looks like this:
CREATE OR REPLACE VIEW openApp_funnel_2 AS SELECT DISTINCT(amplitude_id) FROM ( SELECT openApp_funnel_1.amplitude_id, DATEDIFF('milliseconds', openApp_funnel_1.event_time, app123.viewMessage.event_time) AS dt FROM openApp_funnel_1 INNER JOIN app123.viewMessage ON openApp_funnel_1.amplitude_id = app123.viewMessage.amplitude_id WHERE DATE(app123.viewMessage.event_time) BETWEEN '2017-03-30' AND '2017-04-06') WHERE dt > 0 AND dt <= 86400000;
The inner SELECT JOINs the openApp_funnel_1 table with the 'viewMessage' events table on amplitude_id. It selects the amplitude_id and the difference of time ('dt') between the 2nd event and the 1st event. However, for the time difference, we have to use the DATEDIFF() function because Redshift does not recognize intervals (the output you would get if you just normally subtracted the dates). In the WHERE clause, the upper bound is +1 day because the activation event could happen during the next day.
The outer SELECT picks just the amplitude_id's where the difference is greater than 0 milliseconds (meaning the 2nd event happened after the first event) and less than 86400000 milliseconds (1 day).
SELECT COUNT(*) FROM openApp_funnel_1; SELECT COUNT(*) FROM openApp_funnel_2;
To get a weekly rate, divide the 2nd value by the 1st value.