How Many Twitter Accounts Are Selected in Tailored Posts Sessions?

This question came to me last week from one of our product managers. Let’s set about answering it! To do so, we’ll gather updates sent in the past months from Tailored Posts sessions, calculate the average number of Twitter profiles selected for each user, then average that average. As of today, Tailored Posts has been rolled out to around 50% of Buffer users.

Findings

The vast majority of Tailored Posts sessions that include at least one Twitter profile selected only have a single Twitter profile selected. Only around 8% of sessions in the past month have had more than one Twitter profile selected. This still equates to millions of sessions – my back of the napkin estimate is around one million sessions with multiple Twitter profiles selected.

This makes sense to me, since most sessions come from users on the free plan. These users are less likely, or even unable, to have multiple Twitter profiles selected.

Data collection

We’ll need to load the libraries we use later on.

# load libraries
library(buffer)
library(dplyr)
library(ggplot2)
library(hrbrthemes)
library(scales)

Now let’s collecte the updates.

select 
  m.update_id
  , m.composer_type
  , m.composer_session_created_at
  , m.composer_session_id
  , up.profile_id
  , up.profile_service
  , up.user_id
  , u.billing_plan
from dbt.multiple_composer_updates as m
inner join dbt.updates as up on m.update_id = up.id
inner join dbt.users as u on up.user_id = u.user_id
where m.composer_type in ('multiple_extension', 'multiple_web_dashboard')
and m.composer_session_created_at >= (current_date - 8)
and up.profile_service = 'twitter'

Great, we have two million Twitter updates to work with. At this point we can start grouping by the user_id and composer_session_id fields to find the number of Twitter profiles selected in each session.

# group by composer session
by_session <- posts %>% 
  group_by(user_id, billing_plan, composer_session_id) %>% 
  summarise(profiles = n_distinct(profile_id))

# group by user
by_user <- by_session %>% 
  group_by(user_id, billing_plan) %>% 
  summarise(avg_profiles = mean(profiles))

Exploratory Analysis

Now we can plot the overall distribution of the number of Twitter profiles selected. We can see in the graphs below that the vast majority of tailored posts sessions have only one Twitter profile selected.

We can also plot the cumulative distribution function (CDF). We can see that over 90% of sessions in which at least one Twitter profile was selected, only a single Twitter profile was selected. Around 96% of sessions have two or less Twitter profiles selected, and around 98% of sessions had five or less Twitter profiles selected.

We can also count the absolute number of sessions in which at least two Twitter profiles were selected.

# count sessions with at least two twitter profiles
table(by_session$profiles >= 2)
## 
##   FALSE    TRUE 
## 1395682  109713

Let’s also look at the absolute number of sessions in which at least five Twitter profiles were selected.

# count sessions with at least five twitter profiles
table(by_session$profiles >= 5)
## 
##   FALSE    TRUE 
## 1467526   37869

We can break this down by plan.

# determine if five profiles are selected
by_session <- by_session %>% 
  mutate(has_two_profiles = profiles > 1,
         has_five_profiles = profiles >= 5)

# break down by plan
by_session %>% 
  group_by(billing_plan, has_five_profiles) %>% 
  summarise(users = n_distinct(user_id), 
            sessions = n_distinct(composer_session_id)) %>% 
  filter(has_five_profiles)
## # A tibble: 9 x 4
## # Groups:   billing_plan [9]
##   billing_plan  has_five_profiles users sessions
##   <chr>         <lgl>             <int>    <int>
## 1 agency        T                    21     1115
## 2 awesome       T                   109     3572
## 3 business      T                    19     3524
## 4 enterprise200 T                     1       10
## 5 enterprise300 T                     1       14
## 6 enterprise400 T                     1       88
## 7 enterprise600 T                     1        1
## 8 individual    T                    78    19342
## 9 small         T                    60    10203

Let’s count the number of users that had at least one session in which multpiple Twitter profiles were selected.

by_session %>% 
  group_by(has_two_profiles) %>% 
  summarise(users = n_distinct(user_id), 
            sessions = n_distinct(composer_session_id)) %>%  
  filter(has_two_profiles)
## # A tibble: 1 x 3
##   has_two_profiles users sessions
##   <lgl>            <int>    <int>
## 1 T                 4381   109713

Let’s try a slightly different approach. Instead of plotting the distribution of the number of Twitter profiles selected for all composer sessions, we can look at the distribution of the average number of Twitter profiles selected per user. This way, users that have posted very frequently will have less of an influence on the distribution.

Let’s see how many users had two or more Twitter profiles selected on average in the past week.

# how many users had two or more Twitter profiles selected on average
table(by_user$avg_profiles > 2)
## 
## FALSE  TRUE 
## 90361   291
# determine if two profiles are selected
by_user <- by_user %>% 
  mutate(has_two_profiles = avg_profiles >= 2,
         has_three_profiles = avg_profiles >= 3)

# break down by plan
by_user %>% 
  group_by(billing_plan, has_three_profiles) %>% 
  summarise(users = n_distinct(user_id)) %>% 
  filter(has_three_profiles)
## # A tibble: 8 x 3
## # Groups:   billing_plan [8]
##   billing_plan  has_three_profiles users
##   <chr>         <lgl>              <int>
## 1 agency        T                     16
## 2 awesome       T                     43
## 3 business      T                      7
## 4 enterprise200 T                      1
## 5 enterprise300 T                      1
## 6 enterprise400 T                      1
## 7 individual    T                     51
## 8 small         T                     22

Average Number of Twitter Profiles Selected Per User

We’ll use the same techniques, but use the by_user dataframe we created earlier. We can see that the distribution is even more heavily skewed to the left!

The CDF below is pretty interesting, around 99% of users have had two or less Twitter profiles selected (on average) in thier Tailored Posts sessions.

Removing Free Plan Users

What would the distribution look like for users on paid plans? Similar story here.