Revisiting Churn Surveys

A Tidy Text Analysis

Back in July, I analyzed responses from Buffer’s churn surveys. In this post we’ll recreate that analysis with more recent data. The goal is to see if general themes and trends have changed over time. It will also help remind us of the reasons why people choose to leave Buffer.

We’ll use data collected from four separate surveys that represent different types of churn:

  • The exit survey prompts users to explain why they are abandoning the Buffer product completely.
  • The business churn survey prompts users to explain why they are canceling their Business subscriptions.
  • The awesome downgrade survey prompts users to explain why they are canceling their Awesome subscriptions.
  • The business downgrade awesome survey asks why users downgrade from a Business to an Awesome subscription.

We’ve gathered the data in this look. We can use the get_look() function from the buffer package to import all of the survey responses into a dataframe.

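# load the packages used below, then get data from looker
library(buffer); library(dplyr); library(lubridate); library(tidytext)
responses <- get_look(3949)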

Now we just need to clean the data a bit.

# rename columns
colnames(responses) <- c('created_at', 'user_id', 'type', 'reason', 'specifics', 'details')

# set reasons as character vectors
responses$specifics <- as.character(responses$specifics)
responses$details <- as.character(responses$details)

Now we’ll set the dates and remove null values.

# set date as a date object
responses$created_at <- as.Date(responses$created_at, format = '%Y-%m-%d')

# get the month
responses <- responses %>%
  mutate(month = floor_date(created_at, unit = 'months'))

# remove the reason and specifics columns
responses$reason <- NULL
responses$specifics <- NULL

# remove empty and placeholder responses
responses <- responses %>%
  filter(details != "" & details != '[No reason supplied]' & details != 'false')

We’re down to around 18 thousand responses from November 2015 until January 2018! We’re now ready to do some exploratory analysis on the responses.

Tidy Text

We define the tidy text format as being a table with one token per row. A token can be a word or an n-gram. Within our tidy text framework, we need to both break the comments into individual tokens and transform them into a tidy data structure.

To do this, we use the unnest_tokens() function from the tidytext package. This breaks the churn survey responses into individual words, one word per row, while retaining the attributes (survey type, user_id, etc.) of each word.

# unnest the tokens
text_df <- responses %>%
  unnest_tokens(word, details)
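
To make the one-word-per-row idea concrete, here is a small sketch using two made-up responses. The toy_responses data is invented for illustration and does not come from the surveys.

```r
library(dplyr)
library(tidytext)

# two made-up responses standing in for the real survey data
toy_responses <- tibble(
  user_id = c("u1", "u2"),
  type    = c("exit_survey", "business_churn_survey"),
  details = c("Too expensive for me", "We lost a client")
)

# one row per word; user_id and type are carried along with each word
toy_tokens <- toy_responses %>%
  unnest_tokens(word, details)

toy_tokens
```

By default unnest_tokens() also lowercases the words and strips punctuation, which is why we don't need to normalize case ourselves before counting.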

Now that the data is in one-word-per-row format, we can manipulate it with tidy tools like dplyr. Often in text analysis, we will want to remove stop words; stop words are words that are not useful for an analysis, typically extremely common words such as “the”, “of”, “to”, and so forth in English. We can remove stop words (kept in the tidytext dataset stop_words) with an anti_join().

First, let’s remove a few useful words from the stop_words dataset so that they stay in our data. We want to keep “not”, “no”, “too”, “does”, “doesn’t”, “can”, and “can’t”.

# collect stop words
data(stop_words)

# words to keep
keep_words <- c("not", "no", "too", "does", "doesn't", "can", "can't")

# limit stop words
stop_words <- stop_words %>% 
  filter(!(word %in% keep_words))

# remove stop words from our dataset
text_df <- text_df %>%
  anti_join(stop_words, by = "word")

We now have a tidy dataframe. :)

Exploratory analysis

Let’s begin by plotting the most commonly occurring words in all of the churn surveys.
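
The code behind this plot isn't shown here, so the following is only a minimal sketch of how one might draw it, assuming ggplot2. The helper name plot_top_words() and the default cutoff of 20 words are assumptions, and the toy data is made up.

```r
library(dplyr)
library(ggplot2)

# hypothetical helper: bar chart of the most frequent words
# in a data frame that has one row per word (a `word` column)
plot_top_words <- function(df, n_words = 20) {
  df %>%
    count(word, sort = TRUE) %>%       # count occurrences, most frequent first
    head(n_words) %>%                  # keep the top n_words
    mutate(word = reorder(word, n)) %>% # order bars by frequency
    ggplot(aes(x = word, y = n)) +
    geom_col() +
    coord_flip() +
    labs(x = NULL, y = "Number of occurrences")
}

# made-up stand-in for text_df
toy_text_df <- tibble(word = c("not", "not", "not", "buffer", "buffer", "plan"))
p <- plot_top_words(toy_text_df, n_words = 2)
```

With the real data, the call would be plot_top_words(text_df).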

Certainly interesting, but not very useful. It would help to gather some context about each word.

Diving deeper into word frequency

Another way to analyze a term’s frequency is to calculate the inverse document frequency (idf), which is defined as:

idf(term) = ln(number of collections / number of collections containing term)

A term’s inverse document frequency (idf) decreases the weight for commonly used words and increases the weight for words that are used more sparsely. This can be combined with the overall term frequency to calculate a term’s tf-idf (the two quantities multiplied together), the frequency of a term adjusted for how rarely it is used.

The idea of tf-idf is to find the important words for the content of each collection of words (the different surveys being the collections of words) by decreasing the weight for commonly used words and increasing the weight for words that are not used very much in an entire collection of documents, in this case the text of all of the surveys combined. We want to find words that are most unique to each type of churn survey.
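
The arithmetic can be checked by hand. Below is a small sketch that computes tf, idf, and tf-idf directly from the definition above and compares the result with tidytext's bind_tf_idf(). The toy_counts data is made up for illustration.

```r
library(dplyr)
library(tidytext)

# made-up word counts for two "documents" (survey types)
toy_counts <- tibble(
  type = c("exit_survey", "exit_survey",
           "business_churn_survey", "business_churn_survey"),
  word = c("not", "duplicate", "not", "client"),
  n    = c(3, 1, 2, 2)
)

# number of documents in the collection
n_docs <- n_distinct(toy_counts$type)

by_hand <- toy_counts %>%
  group_by(type) %>%
  mutate(tf = n / sum(n)) %>%                      # share of the document's words
  group_by(word) %>%
  mutate(idf = log(n_docs / n_distinct(type))) %>% # ln(docs / docs containing word)
  ungroup() %>%
  mutate(tf_idf = tf * idf)

# the same quantities computed by tidytext
with_tidytext <- toy_counts %>%
  bind_tf_idf(word, type, n)
```

Here “not” appears in both toy documents, so its idf is ln(2 / 2) = 0 and its tf-idf is 0 no matter how often it is used.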

The bind_tf_idf function takes a tidy text dataset as input with one row per word, per document. One column (word) contains the terms, one column contains the documents (type), and the last necessary column contains the counts, how many times each document contains each term (n).

# calculate the frequency of words for each survey
survey_words <- text_df %>%
  count(type, word, sort = TRUE) %>%
  ungroup()

# calculate the total number of words for each survey
total_words <- survey_words %>% 
  group_by(type) %>% 
  summarize(total = sum(n))

# join the total words back into the survey_words data frame
survey_words <- left_join(survey_words, total_words, by = "type") %>%
  filter(type != "")

# view data 
head(survey_words)
## # A tibble: 6 x 4
##                       type    word     n total
##                     <fctr>   <chr> <int> <int>
## 1 awesome_downgrade_survey     not  2701 56249
## 2 awesome_downgrade_survey  buffer  1960 56249
## 3              exit_survey account  1846 34951
## 4              exit_survey     not  1479 34951
## 5 awesome_downgrade_survey  social  1085 56249
## 6 awesome_downgrade_survey    plan  1053 56249

There is one row in this data frame for each word-survey combination. n is the number of times that word is used in that survey and total is the total number of words in that survey’s responses.

The bind_tf_idf function

With the counts in place, we can calculate tf-idf for each word in each survey type. Here each survey type is a document, and all of the survey responses together form the collection.

# calculate tf_idf
survey_words <- survey_words %>%
  bind_tf_idf(word, type, n)

# view sample
head(survey_words)
## # A tibble: 6 x 7
##                       type    word     n total         tf   idf tf_idf
##                     <fctr>   <chr> <int> <int>      <dbl> <dbl>  <dbl>
## 1 awesome_downgrade_survey     not  2701 56249 0.04801863     0      0
## 2 awesome_downgrade_survey  buffer  1960 56249 0.03484506     0      0
## 3              exit_survey account  1846 34951 0.05281680     0      0
## 4              exit_survey     not  1479 34951 0.04231639     0      0
## 5 awesome_downgrade_survey  social  1085 56249 0.01928923     0      0
## 6 awesome_downgrade_survey    plan  1053 56249 0.01872033     0      0

The idf and tf_idf will be 0 for words that appear in every type of survey, like “not” and “buffer” in the sample above. Let’s visualize high tf_idf words for each type of churn survey.
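
The plotting code isn't shown here, so this is only a minimal sketch of one way to draw such a chart, assuming ggplot2. The helper name plot_top_tf_idf() and the default of 10 words per survey are assumptions, and the tf_idf values in the toy data are made up.

```r
library(dplyr)
library(ggplot2)

# hypothetical helper: highest tf-idf words per survey type, one panel per survey
plot_top_tf_idf <- function(df, n_words = 10) {
  df %>%
    group_by(type) %>%
    slice_max(tf_idf, n = n_words, with_ties = FALSE) %>% # top words per survey
    ungroup() %>%
    mutate(word = reorder(word, tf_idf)) %>%              # order bars by tf-idf
    ggplot(aes(x = word, y = tf_idf, fill = type)) +
    geom_col(show.legend = FALSE) +
    facet_wrap(~ type, scales = "free") +
    coord_flip() +
    labs(x = NULL, y = "tf-idf")
}

# made-up stand-in for survey_words
toy_survey_words <- tibble(
  type   = c("exit_survey", "exit_survey",
             "business_churn_survey", "business_churn_survey"),
  word   = c("duplicate", "address", "client", "agency"),
  tf_idf = c(0.004, 0.003, 0.005, 0.002)
)
p <- plot_top_tf_idf(toy_survey_words, n_words = 1)
```

With the real data, the call would be plot_top_tf_idf(survey_words).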

The words that appear in these graphs appear more frequently in the specific survey type than they do in the other surveys. It’s interesting to see “finances” appear more commonly in the Awesome downgrade survey than in the other surveys. The results under business_churn_survey do not appear to be very helpful, but prices (and price differences) are the top two terms listed in the business downgrade survey. The terms “duplicate”, “address”, and “created” indicate that account creation issues are a common reason for deleting Buffer accounts. Makes sense.

We don’t have much context and are required to speculate on what the meaning and emotion behind the words might be. It may be beneficial to look at groups of words to help us gather more information. We’ll explore this by looking at n-grams later in the analysis.

For now, let’s explore how different topics have changed over time.

Change over time

What words and topics have become more frequent, or less frequent, over time? These could give us a sense of what has become more and less important in our customers’ eyes.

We can first count the number of times each word is used each month, then fit a logistic regression model for each word and use the broom package to tidy the results, so that we can examine whether the frequency of each word increases or decreases over time. Every term will then have a growth rate (as an exponential term) associated with it.

Let’s start by defining a function that will plot the 12 terms with the highest coefficients of change for any particular survey type.
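
The function itself isn't shown here. As a rough illustration of the model-fitting step described above, here is a minimal sketch that returns the slope for each word rather than a plot. The helper name word_slopes(), the min_total cutoff, and the choice to model the count against the remaining words in that month are all assumptions, and the toy data is made up.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# hypothetical helper: slope of each word's monthly share within one survey type
# expects columns `type`, `month` (a Date) and `word`
word_slopes <- function(df, survey_type, min_total = 50) {
  monthly <- df %>%
    filter(type == survey_type) %>%
    count(month, word, name = "count") %>%
    group_by(month) %>%
    mutate(month_total = sum(count)) %>%    # all words used that month
    group_by(word) %>%
    mutate(word_total = sum(count)) %>%     # all uses of this word
    ungroup() %>%
    filter(word_total >= min_total) %>%     # drop rare words
    mutate(month_num = as.numeric(month)) %>% # days since 1970, so slopes are per day
    select(word, month_num, count, month_total)

  monthly %>%
    nest(data = -word) %>%
    mutate(
      # logistic regression: word used vs. any other word used, by month
      model = map(data, ~ glm(cbind(count, month_total - count) ~ month_num,
                              data = .x, family = "binomial")),
      coefs = map(model, tidy)
    ) %>%
    select(word, coefs) %>%
    unnest(cols = coefs) %>%
    filter(term == "month_num") %>%
    arrange(desc(estimate))
}

# made-up data: "instagram" grows, "price" shrinks, "buffer" stays flat
months <- seq(as.Date("2017-01-01"), by = "month", length.out = 6)
toy_text_df <- bind_rows(
  tibble(month = rep(months, times = 1:6), word = "instagram"),
  tibble(month = rep(months, times = 6:1), word = "price"),
  tibble(month = rep(months, each = 20), word = "buffer")
) %>%
  mutate(type = "business_churn_survey")

slopes <- word_slopes(toy_text_df, "business_churn_survey", min_total = 10)
```

Taking head(slopes, 12) would give the 12 terms with the highest slopes, which could then be plotted month by month.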

Now let’s plot the terms with the highest slopes in the Business churn survey.

# plot change words for business survey
plot_change("business_churn_survey") + labs(subtitle = "Business Churn Survey")

It’s interesting to see “client” and “clients” mentioned so frequently in the Business churn survey responses. I wonder what context they were used in? The term “instagram” seems to be coming up with increased frequency. I wonder if there are some improvements we can make to the Instagram Reminders process. The “afford” and “cost” terms indicate that price is still a factor for some folks. It’s also quite interesting to see “scheduling” appear more frequently – this makes sense, given that we’ve redesigned the posting schedule, but it is useful to know that it is causing people to churn.

Let’s look at a few of the business churn survey responses that include “client”.

# get responses with 'client' (%like% is available from the data.table package)
responses %>% 
  filter(type == 'business_churn_survey' & tolower(details) %like% 'client') %>% 
  select(details) %>% 
  head(10)
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    details
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               The calendar functionality just isn't strong enough for what we need - there has been a lot of issues too with profiles disconnected without us being informed.\\n\\nTHe price point sits too high for what is being offered here and that is hwy we have decided to move to another platform that has better functions, bulk upload and actual reports that show data in a way a client would understand.\\n\\nThanks,\\n
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Weve had an issue several times now where a post for one Profile got postet on all linked profiles. since were taking care of our Clients social media profiles this is a nogo. None of them wants to find someone elses Posts on their profiles. thats why we want to cancel our businessplan.
## 3  So expensive and all about managing multiple networks from one place, then one can't even make proper posting groups that work, when I want to rebuffer a post I can't rebuffer it to groups but just to the very same one network I am rebuffering from.\\n\\nInstagram reminders are a joke.\\n\\nAnalytics are a joke\\n\\nand the biggest joke of them all, with this pricing that you have your stupid Reply and your even more cheekier strategy of pre announcing analytics as something WOOOW in 2018 and then also probably charge it extra.\\n\\nSorry but for such a big product like buffer with such a budget and so many clients it is really sad to see how this product has no edge at all and all innovation and talent has been probably wasted due to corporate greed.\\n\\nDefinitely  never coming back, probably one of the worst Value for Money tools I have come across so far.\\n\\nThe only thing, the interface looks nice but also only at first glance, then it has also major flaws, which really shouldn't be in a product that has been for so long on the market, but I guess that is what happens when all money goes into sponsored blog posts and those shitty things rather than into development and innovation.
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          We lost a couple of clients.....we will be back
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Not quite what we were after. Social listening is important to us, no IG scheduling available either. Too much for something we could do natively while we have a small number of social clients.
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 I no longer need to use a scheduling program for my biggest client, who lost their sponsorship and has to downsize their marketing.  I really liked the plan, just a shift in my and my client's needs. 
## 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               We have less clients so we were able to combine all clients to one account
## 8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Change in client arrangement.\\nLack of Instagram integration
## 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                We're dropping our client
## 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           lost client, can't afford it just for myself.

Interestingly, it seems like there are a couple of themes: users explaining problems with Buffer in the context of what their clients need, and people who lost clients and no longer need, or can no longer afford, Buffer. Makes sense. Let’s try to get a feel for why people include “post” in their responses.

# get responses with 'post' (%like% is available from the data.table package)
responses %>% 
  filter(type == 'business_churn_survey' & tolower(details) %like% 'post') %>% 
  select(details) %>% 
  head(10)
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    details
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Hi there,\\n\\nWe were very happy with the support and the program. However we have decided to use one system to schedule all our posts, manage our community and have insights in the stats of all platforms, so we don't need Buffer at the moment. Thank you for the great support!\\n\\nBest wishes, Nicolet
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Weve had an issue several times now where a post for one Profile got postet on all linked profiles. since were taking care of our Clients social media profiles this is a nogo. None of them wants to find someone elses Posts on their profiles. thats why we want to cancel our businessplan.
## 3  So expensive and all about managing multiple networks from one place, then one can't even make proper posting groups that work, when I want to rebuffer a post I can't rebuffer it to groups but just to the very same one network I am rebuffering from.\\n\\nInstagram reminders are a joke.\\n\\nAnalytics are a joke\\n\\nand the biggest joke of them all, with this pricing that you have your stupid Reply and your even more cheekier strategy of pre announcing analytics as something WOOOW in 2018 and then also probably charge it extra.\\n\\nSorry but for such a big product like buffer with such a budget and so many clients it is really sad to see how this product has no edge at all and all innovation and talent has been probably wasted due to corporate greed.\\n\\nDefinitely  never coming back, probably one of the worst Value for Money tools I have come across so far.\\n\\nThe only thing, the interface looks nice but also only at first glance, then it has also major flaws, which really shouldn't be in a product that has been for so long on the market, but I guess that is what happens when all money goes into sponsored blog posts and those shitty things rather than into development and innovation.
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 currently we need the auto post feature, not the scheduled post. thanks.
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           I am switching to Paypal... You still need to improve the overall stability of Instagram and posting... It’s not reliable and we have to babysit this process.
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Great platform - we have decided to move in a different direction with social that involves more direct engagement with individuals and less \\"curated posts\\".  It's been a great ride with Buffer!  \\n\\nThanks.
## 7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Not being able to tag/link other contacts in posts is a real show stopper.
## 8                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        I have stopped using a planning service and prefer to just post myself for now to save the money.
## 9                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Like Buffer - but needed to go in a different direction. Found the ability to archive and reuse posts too critical to workflow.
## 10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   I'm constantly having to manually post items and \\"re-authorize\\" pages on Buffer. I've had enough.

These run the gamut – people switching services, costs, bugs, profile connection issues, etc.

Let’s create the same change plots as above, but this time for the Awesome downgrade survey.

# plot change words for awesome survey
plot_change("awesome_downgrade_survey") + labs(subtitle = "Awesome Churn Survey")

There are a couple of interesting things here. It seems that Buffer “lacks” some features, and some people might have closed their businesses. Let’s look at some responses that include the term “active”.

# get responses with 'active'
responses %>% 
  filter(type == 'awesome_downgrade_survey' & tolower(details) %like% 'active') %>% 
  select(details) %>% 
  head(10)
##                                                                                                                                                                                                                                                                                                                                                details
## 1                                                                                                                                                                                                                                                                                                                        I'm not actively using Buffer
## 2                                                                                                                                                                                                        I'm not currently actively managing clients' accounts, am taking a bit of a break. But I'll definitely be back when I do - I LOVE Buffer! :-)
## 3                                                                                                                                                                                                                                                    I switched to crowd fire, it's a little more interactive and reminds me to do social media stuff.
## 4  I was getting lots of errors and needed to keep. Reconnecting my profiles and when I deleted them. And tried to add them back it said I needed to upgrade to add more profiles when I had 8 active on an awesome plan with 10 slots... Please help. I do not want to switch to another service but may do so if these problems are not resolved.\\n
## 5                                                                                                                                                                                            I love buffer but the accounts I'm using it for aren't as active any more so it's not worth me spending money on. Don't worry, it's not you, it's me! :) 
## 6                                                                                                                                                                                             Switching to ContentStudio. Does the same plus much more. I paid like $59 for lifetime access (early adopter promo). Their development is crazy active. 
## 7                                                                                                                                                                                                                                                                                                                     Not active enough at the moment.
## 8                                                                                                                                                                                                                                                            I'm not as active as I should be on social media, so I don't use the potential of Buffer.
## 9                                                                                                                                                                                                                                                                                                   I have someone else actively updating my LinkedIn 
## 10                                                                                                                                                                                                                                                                                                            Not so active anymore on social media...

It appears that users are telling us that they are not active enough on social media! I wonder how much of this effect is under Buffer’s control, and what it means for the market for Buffer as a whole.

Let’s move on to look at groups of words, instead of only looking at single terms.

N-grams

What if we looked at groups of words instead of just single words? We can check which words tend to appear immediately after one another, and which words tend to appear together in the same document.

We’ve been using the unnest_tokens function to tokenize by word, but we can also use the function to tokenize into consecutive sequences of words, called n-grams. By seeing how often word X is followed by word Y, we can then build a model of the relationships between them.

We do this by adding the token = "ngrams" option to unnest_tokens(), and setting n to the number of words we wish to capture in each n-gram. When we set n to 2, we are examining groups of 2 consecutive words, often called “bigrams”:

# unnest bigrams from responses
bigrams <- responses %>%
  unnest_tokens(bigram, details, token = "ngrams", n = 2)

# view the bigrams
head(bigrams$bigram)
## [1] "we have"         "have choosen"    "choosen another" "another app"    
## [5] "mainly 2"        "2 reasons"

Great! Each token is now represented by a bigram. Let’s take a quick look at the most common bigrams:

# Count the most common bigrams
bigrams %>%
  count(bigram, sort = TRUE) %>% 
  head(10)
## # A tibble: 10 x 2
##          bigram     n
##           <chr> <int>
##  1 social media  1186
##  2    not using  1045
##  3         i am  1011
##  4     using it  1006
##  5      i don't   980
##  6       i have   909
##  7    no longer   803
##  8       use it   782
##  9      need to   773
## 10   don't need   710

As we might expect, a lot of the most common bigrams are pairs of common words. This is a useful time to use tidyr’s separate(), which splits a column into multiple columns based on a delimiter. This lets us separate each bigram into two columns, “word1” and “word2”, at which point we can remove cases where either is a stop-word.

# separate words in bigrams
separated <- bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

# define our own stop words
word <- c("i", "i'm", "it", "the", "at", "to", "right", "just", "a", "an",
          "that", "but", "as", "so", "will", "for", "longer", "i'll", "of", "my",
          "n", "do", "did", "am", "with", "been", "and", "we")

# create tibble of stop words
stopwords <- tibble(word)

# filter out stop-words
filtered <- separated %>%
  filter(!word1 %in% stopwords$word) %>%
  filter(!word2 %in% stopwords$word)

# calculate new bigram counts
bigram_counts <- filtered %>% 
  count(word1, word2, sort = TRUE)

head(bigram_counts)
## # A tibble: 6 x 3
##     word1 word2     n
##     <chr> <chr> <int>
## 1  social media  1186
## 2     not using  1045
## 3   don't  need   710
## 4      be  back   587
## 5   don't   use   474
## 6 awesome  plan   438

We can already glean some useful information from this. We’ll use tidyr’s unite() function to recombine the columns into one. Then we’ll plot the most common bigrams included in all of the surveys.
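The plotting code isn’t shown here, so here is a minimal sketch of what that step might look like. It runs on a small stand-in table, toy_counts, built from the six rows printed above rather than on the full bigram_counts. The object names and the plot styling are my own assumptions.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# stand-in for bigram_counts, using the six rows printed above
toy_counts <- tibble(
  word1 = c("social", "not", "don't", "be", "don't", "awesome"),
  word2 = c("media", "using", "need", "back", "use", "plan"),
  n     = c(1186L, 1045L, 710L, 587L, 474L, 438L)
)

# recombine the two word columns into a single bigram column
united <- toy_counts %>%
  unite(bigram, word1, word2, sep = " ")

# bar chart of the most common bigrams, largest at the top
bigram_plot <- united %>%
  head(10) %>%
  mutate(bigram = reorder(bigram, n)) %>%
  ggplot(aes(x = bigram, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Occurrences", title = "Most Common Bigrams")

bigram_plot
```

Swapping toy_counts for bigram_counts should give the equivalent plot for the full set of responses.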

The most common bigram is “social media”, which makes sense. It’s more interesting to see that the next two most common bigrams are “not using” and “don’t need”. This seems like a clear signal that Buffer wasn’t filling these users’ needs in one way or another, which led them to leaving the product.

Bigrams like “be back”, “another account”, and “have another” indicate that these users either have another Buffer account, or need to stop using it only temporarily.

A bigram can also be treated as a term in a document in the same way that we treated individual words. For example, we can look at the tf-idf of these bigrams across the surveys. These tf-idf values can be visualized within each segment, just as we did for single words earlier. We’ll exclude the Business downgrade (to Awesome) survey, because the sample is not large enough to give us anything useful.
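As a rough sketch of the calculation, tidytext’s bind_tf_idf() can treat each survey type as a document and each bigram as a term. The rows below are invented for illustration, and the survey labels "exit_survey" and "business_churn_survey" are assumed names, not necessarily the values in the real type column.

```r
library(dplyr)
library(tidytext)

# invented toy bigrams; the survey labels are assumptions
toy_bigrams <- tibble(
  type = c("exit_survey", "exit_survey", "exit_survey",
           "business_churn_survey", "business_churn_survey",
           "business_churn_survey"),
  bigram = c("social media", "another account", "another account",
             "social media", "too expensive", "too expensive")
)

# count bigrams per survey, then compute tf-idf with survey as the document
bigram_tf_idf <- toy_bigrams %>%
  count(type, bigram) %>%
  bind_tf_idf(bigram, type, n) %>%
  arrange(desc(tf_idf))

bigram_tf_idf
```

In this toy example “social media” appears in both surveys, so its idf (and therefore its tf-idf) is zero, while the bigrams that appear in only one survey rise to the top. That is the property that makes tf-idf useful for spotting themes unique to a single survey.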

Very interesting! We can spot themes much more easily here. People who delete their Buffer accounts are more likely to cite account creation issues. Image uploads, cost, and need are themes more unique to the Business churn survey, and scheduling, FB groups, and upgrading again later are more unique to the Awesome churn survey.

Visualizing a network of bigrams with ggraph

As one common visualization, we can arrange the words into a network, or “graph.” Here we’ll be referring to a “graph” not in the sense of a visualization, but as a combination of connected nodes. A graph can be constructed from a tidy object since it has three variables:

  • from: the node an edge is coming from
  • to: the node an edge is going towards
  • weight: A numeric value associated with each edge

The igraph package has many powerful functions for manipulating and analyzing networks. One way to create an igraph object from tidy data is the graph_from_data_frame() function, which takes a data frame of edges with columns for “from”, “to”, and edge attributes (in this case n):

Let’s create a bigram graph object.

# filter for only relatively common combinations
bigram_graph <- bigram_counts %>%
  filter(n > 80) %>%
  graph_from_data_frame()

bigram_graph
## IGRAPH DN-- 86 91 -- 
## + attr: name (v/c), n (e/n)
## + edges (vertex names):
##  [1] social  ->media     not     ->using     don't   ->need     
##  [4] be      ->back      don't   ->use       awesome ->plan     
##  [7] thank   ->you       using   ->buffer    love    ->buffer   
## [10] too     ->expensive don't   ->have      buffer  ->is       
## [13] is      ->not       another ->account   use     ->buffer   
## [16] you     ->guys      business->plan      come    ->back     
## [19] this    ->account   more    ->than      signed  ->up       
## [22] be      ->able      this    ->time      buffer  ->account  
## + ... omitted several edges

We can convert an igraph object into a ggraph with the ggraph function, after which we add layers to it, much like layers are added in ggplot2. For example, for a basic graph we need to add three layers: nodes, edges, and text.
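Here is a minimal sketch of those three layers. To keep it self-contained it builds a small graph from the six most common bigrams printed earlier instead of using the full bigram_graph. The layout, arrow sizes, and theme are my own choices, not necessarily the ones used for the plot in this post.

```r
library(ggplot2)
library(igraph)
library(ggraph)

# stand-in edge list built from the six most common bigrams shown earlier
toy_edges <- data.frame(
  word1 = c("social", "not", "don't", "be", "don't", "awesome"),
  word2 = c("media", "using", "need", "back", "use", "plan"),
  n     = c(1186, 1045, 710, 587, 474, 438),
  stringsAsFactors = FALSE
)

toy_graph <- graph_from_data_frame(toy_edges)

# arrow heads so that edge direction (word1 -> word2) is visible
a <- grid::arrow(type = "closed", length = grid::unit(2, "mm"))

set.seed(2018)  # the "fr" layout is random, so fix the seed

network_plot <- ggraph(toy_graph, layout = "fr") +
  # edges: darker means a more common bigram
  geom_edge_link(aes(edge_alpha = n), arrow = a,
                 end_cap = circle(2, "mm"), show.legend = FALSE) +
  # nodes: one point per word
  geom_node_point(color = "lightblue", size = 4) +
  # text: label each node with its word
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_void()

network_plot
```

Mapping edge_alpha to n is what makes the most common connections show up as the darkest arrows.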

This is a visualization of a Markov chain, a common model in text processing. In a Markov chain, each choice of word depends only on the previous word. In this case, a random generator following this model might spit out “buffer”, then “is”, then “great”, by following each word to the most common words that follow it. To make the visualization interpretable, I chose to show only the most common word-to-word connections. What can we learn from this graph?
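As a brief aside before reading the graph, the idea of following each word to its most common successor can be sketched in a few lines of base R. This is a simplified, greedy variant: a true Markov chain generator would sample the next word at random in proportion to the counts. The follow_chain() helper is hypothetical, and the table only contains the six bigrams printed earlier.

```r
# stand-in for bigram_counts, using the six rows printed earlier
toy_counts <- data.frame(
  word1 = c("social", "not", "don't", "be", "don't", "awesome"),
  word2 = c("media", "using", "need", "back", "use", "plan"),
  n     = c(1186, 1045, 710, 587, 474, 438),
  stringsAsFactors = FALSE
)

# hypothetical helper: repeatedly jump to the most common next word
follow_chain <- function(counts, start, max_steps = 5) {
  chain <- start
  current <- start
  for (i in seq_len(max_steps)) {
    candidates <- counts[counts$word1 == current, ]
    # stop when no bigram starts with the current word
    if (nrow(candidates) == 0) break
    current <- candidates$word2[which.max(candidates$n)]
    chain <- c(chain, current)
  }
  chain
}

follow_chain(toy_counts, "don't")
```

Starting from “don’t”, the walk picks “need” (710) over “use” (474) and then stops, because no bigram in this small table starts with “need”.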

We can use this graph to visualize some details about the text structure. For example, we can see that “buffer” and “plan” form the centers of groups of nodes.

We also see pairs or triplets along the outside that form common short phrases (“can’t afford”, “too expensive”, or “don’t need”).

I see that “not” is at the center of a cluster of nodes. The most common connections are “not using” and “social media”, as indicated by the darkest arrows.

What would this graph look like if we only looked at the responses of the business churn survey?
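The only change needed is to filter the responses by survey type before tokenizing. Here is a sketch of that pipeline on a few invented responses. The "business_churn_survey" label is an assumed value for the type column, and the stop-word filtering step is left out for brevity.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# invented responses; the survey labels are assumptions
toy_responses <- tibble(
  type = c("business_churn_survey", "business_churn_survey", "exit_survey"),
  details = c("too expensive for us",
              "too expensive right now",
              "have another account")
)

# keep one survey, tokenize into bigrams, and count word pairs
business_counts <- toy_responses %>%
  filter(type == "business_churn_survey") %>%
  unnest_tokens(bigram, details, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  count(word1, word2, sort = TRUE)

# build the graph object for this survey only
business_graph <- igraph::graph_from_data_frame(business_counts)

business_counts
```

On the real data we would also filter out rare bigrams before building the graph, as we did for the full set of responses, and then pass business_graph to ggraph().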

We can see that “don’t need” and “don’t use” are common themes again. Cost is another factor. There is a gratitude corner in the top left - cool! Competitors Hubspot and Sprout are also included.

Conclusions

It still feels important to figure out why users stop using and needing Buffer. In many cases it could be due to external factors like business needs, market forces, or layoffs, but in other cases it could be due to Buffer itself. Perhaps Buffer could have a better engagement loop. Or perhaps Buffer could help users who become inactive by suggesting content to share.

Another theme that appears repeatedly is cost. We know that the current pricing structure isn’t completely ideal, so it feels good to be working towards a more individualized structure over the next few months.

Account issues, i.e. duplicate accounts, seem to be a big problem. We’ll be addressing those soon, so I’m optimistic that we’ll see less of that theme in future responses.

There is a general theme of gratitude in these responses - “i love buffer” was a phrase that appeared often in each survey. It’s comforting to know that people like the product and team – I hope that we’ll be able to use some of these learnings to give them a better experience. :)