Towards the end of 2017, Buffer sent out a survey to gauge the state of social media going into 2018. We received over 1,700 responses, which we downloaded as a CSV file and stored as an R data object. The questions and responses can be viewed here.
The survey consisted of 30 questions about how users use social media.
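# load the packages used below, then get response data
library(buffer); library(dplyr); library(tidyr)
library(ggplot2); library(hrbrthemes); library(scales)
responses <- readRDS('state-of-social.rds')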
The data is quite messy and untidy. Let’s begin by addressing the column names. We can use the safe_names() function in the buffer package to take care of uppercase letters and periods in the column names.
# clean up column names
colnames(responses) <- safe_names(colnames(responses))
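The buffer package is not shown here, so as a rough sketch of what a helper like safe_names() might do, the hypothetical function below lowercases names and swaps periods for underscores. The name clean_names_sketch and its exact rules are assumptions for illustration, not the package's actual implementation.

```r
# Hypothetical stand-in for a column-name cleaner; the real safe_names()
# in the buffer package may apply different rules.
clean_names_sketch <- function(x) {
  x <- tolower(x)                       # remove uppercase letters
  x <- gsub(".", "_", x, fixed = TRUE)  # replace periods with underscores
  x
}

# toy column names containing uppercase letters and periods
raw_names <- c("X.", "What.industry.are.you.in.")
cleaned <- clean_names_sketch(raw_names)
print(cleaned)
```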
The column names in the dataframe are just the questions. These are too long to work with efficiently, so we’ll replace them with question numbers q1 through q87. You might wonder why there are 88 columns in our dataframe when the survey only had 30 questions. One column identifies the respondent, and the other 87 exist because some of the questions in the survey allow for multiple responses.
For example, the question “Which of the following channels does your business use currently?” could be answered with any combination of social networks, e.g. Facebook alone, Facebook and Snapchat, or LinkedIn, Twitter, and Facebook. This survey question can be broken apart into eight individual questions, one for each social network. The question really asks “does your business use Facebook?”, “does your business use Twitter?”, “does your business use Pinterest?”, etc.
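To make the column explosion concrete, here is a minimal base R sketch of how one multi-select answer can be expanded into one yes/no column per network. The toy answers, the comma separator, and the variable names are assumptions for illustration; the survey export may already arrive in this wide form.

```r
# toy multi-select answers (made up for illustration)
answers <- c("Facebook", "Facebook, Snapchat", "LinkedIn, Twitter, Facebook")

# split each answer into its selected networks
selected <- strsplit(answers, ", ", fixed = TRUE)

# one logical column per network: did the respondent select it?
networks <- sort(unique(unlist(selected)))
wide <- sapply(networks, function(net) {
  vapply(selected, function(s) net %in% s, logical(1))
})

print(wide)
```

Each row is a respondent and each column answers a single question of the form “does your business use this network?”.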
Now, let’s create a lookup table that will help us reference the worded question for each question number.
# create lookup table for questions
questions <- tibble(number = character(87), question = character(87))
# set the question numbers
questions$number <- paste0('q', seq_len(87))
# set the worded questions
questions$question <- colnames(select(responses, -x_))
Nice, now we’ll be able to look up the question for each question number. Let’s replace the column names in our responses dataframe.
# set column names
column_names <- c('user_id', questions$number)
# replace column names
colnames(responses) <- column_names
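With a lookup table like this, finding the wording of a question is a simple filter on the number column. The sketch below uses a two-row toy table with placeholder question text, and the helper name get_question is hypothetical.

```r
# toy lookup table; the question text is placeholder text
questions_toy <- data.frame(
  number = paste0("q", 1:2),
  question = c("first_question_text", "second_question_text"),
  stringsAsFactors = FALSE
)

# hypothetical helper: return the worded question for a question number
get_question <- function(lookup, number) {
  lookup$question[lookup$number == number]
}

print(get_question(questions_toy, "q2"))
```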
Alright, I think we’re ready for some exploratory analysis!
Exploratory analysis
The first few questions we ask are about the companies the respondents work for. We ask about their industries, size, and social media profiles. Let’s plot a few histograms to see what some common combinations are.
We can see that over 400 respondents work in marketing, PR, or advertising. They represent around 23% of all respondents. Media and publishing, non-profits, consumer products, and education make up another 37% of respondents.
Let’s take a look at the sizes of these companies.
Most of the respondents worked for small companies. The most common company size was just a single person! Let’s explore the relationship between industry and company size.
table(responses$q1, responses$q2)
##
##                                   1,001+ 101-200 11-25 201-500 26-50
##                                4      0       0     0       0     0
##   Consumer products            1     19       7    14       9    10
##   Ecommerce                    0      2       4     8       2     6
##   Education                    2     23      11     7      19     4
##   Financial Services           0     13       5     8       3     8
##   Government                   0     12       7     6       5     1
##   IT and Services              0     12       3    17       7     8
##   Law and Legal Services       0      3       0     4       0     2
##   Marketing, PR, Advertising   0      8       9    59       8    29
##   Media and Publishing         1      9       7    18       4    12
##   Medical and Healthcare       0     13       2     7       4     5
##   Non-profit                   1      6      17    32       7    26
##   Other                        2     20      22    30      12    18
##   Software                     1      5       8    13       7    12
##   Travel and Tourism           0      3       4     1       5     3
##
##                              501-1,000 51-100 Fewer than 10 Just me
##                                      0      0             0       1
##   Consumer products                 10     11            35      29
##   Ecommerce                          2      5            15      18
##   Education                          9     12            28      21
##   Financial Services                 1      5             5       7
##   Government                         3      0             0       1
##   IT and Services                    9      6            25      15
##   Law and Legal Services             2      2             2       1
##   Marketing, PR, Advertising         5     16           130     154
##   Media and Publishing               7     10            46      80
##   Medical and Healthcare             2      7             8       9
##   Non-profit                         5     18            41      24
##   Other                             11     17            54      79
##   Software                           4     11            17       3
##   Travel and Tourism                 2      1            14       6
We can see that the most common respondents work at Marketing, PR, and Advertising companies with fewer than 10 employees, including one-person businesses.
We can plot the combinations with at least 25 respondents.
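One way to pull out those combinations is to flatten the cross-tabulation into one row per industry and size pair and keep the pairs that meet the threshold. The sketch below runs on a small made-up sample with a threshold of 2 in place of 25, so its output says nothing about the survey itself; the column names industry and size are assumptions.

```r
# toy responses (made up for illustration)
toy <- data.frame(
  industry = c("Software", "Software", "Education", "Software", "Education"),
  size     = c("Just me", "Just me", "11-25", "11-25", "11-25"),
  stringsAsFactors = FALSE
)

# flatten the cross-tab into one row per (industry, size) pair
combos <- as.data.frame(table(toy$industry, toy$size),
                        stringsAsFactors = FALSE)
names(combos) <- c("industry", "size", "respondents")

# keep the pairs that meet the threshold (25 in the survey, 2 here)
common <- combos[combos$respondents >= 2, ]
print(common)
```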
We can see that only a few of the most popular combinations have a team size over 10. Now let’s try to answer a few specific questions.
Are B2B companies more likely to create Instagram strategy?
responses %>%
  select(q4, q23) %>%
  # drop blank answers first so they don't inflate the denominator
  filter(q4 != '' & q23 != '') %>%
  group_by(q4, q23) %>%
  summarise(respondents = n()) %>%
  # the result is still grouped by q4, so percent is the share within each q4 group
  mutate(percent = respondents / sum(respondents)) %>%
  filter(q23 == "Yes") %>%
  ungroup() %>%
  mutate(q4 = reorder(q4, percent)) %>%
  ggplot(aes(x = q4, y = percent)) +
  geom_bar(stat = 'identity') +
  scale_y_continuous(labels = percent) +
  coord_flip() +
  theme_ipsum() +
  labs(x = NULL, y = NULL, title = "Instagram strategy",
       subtitle = "Percentage of companies that created a story in 2017")
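Each bar in this kind of chart is a share within one business type, not a share of all respondents. A minimal base R sketch of the same calculation, on made-up data, uses prop.table() with margin = 1 to normalise each row of a cross-tab. The column names type and answer are assumptions.

```r
# toy data (made up): business type and a yes/no answer
toy <- data.frame(
  type   = c("B2B", "B2B", "B2B", "B2B", "B2C", "B2C"),
  answer = c("Yes", "No", "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# counts per (type, answer), then row-wise shares within each type
counts <- table(toy$type, toy$answer)
shares <- prop.table(counts, margin = 1)

# share of each business type answering "Yes"
print(shares[, "Yes"])
```

Because every row sums to one, groups of very different sizes can be compared directly.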
How do B2B and B2C companies measure ROI?
Let’s begin by making a side-by-side plot to show the absolute percentages. We need to do a small transformation first.
# create ROI dataframe
roi <- responses %>%
select(q4, q74) %>%
rename(roi = q74) %>%
filter((q4 == 'B2B' | q4 == 'B2C') & roi != '') %>%
mutate(roi = as.character(roi))
# replace long engagement value with truncated one
engagement_rows <- grep("Engagement", roi$roi)
roi[engagement_rows, ]$roi <- "Engagement"
Now for the side by side plots.
library(gridExtra)
# b2c plot
b2c <- roi %>%
filter(roi != '' & q4 == 'B2C') %>%
group_by(roi) %>%
summarise(n = n()) %>%
mutate(percent = n / sum(n)) %>%
mutate(roi = reorder(roi, n)) %>%
ggplot(aes(x = roi, y = percent)) +
geom_bar(stat = 'identity') +
scale_y_continuous(labels = percent) +
coord_flip() +
theme_ipsum() +
labs(x = NULL, y = NULL, title = "How Social Media ROI is Measured",
subtitle = "For B2C Companies")
# b2b plot
b2b <- roi %>%
filter(roi != '' & q4 == 'B2B') %>%
group_by(roi) %>%
summarise(n = n()) %>%
mutate(percent = n / sum(n)) %>%
mutate(roi = reorder(roi, n)) %>%
ggplot(aes(x = roi, y = percent)) +
geom_bar(stat = 'identity') +
scale_y_continuous(labels = percent) +
coord_flip() +
theme_ipsum() +
labs(x = NULL, y = NULL, title = "",
subtitle = "For B2B Companies")
# plot two plots together
grid.arrange(b2c, b2b, nrow = 1)
Company size and most important networks
We’ll first determine if a company is considered small or large.
# categorize company size: up to 50 employees is "small", blank answers become NA
responses <- responses %>%
  mutate(size = ifelse(q2 == "", NA,
                ifelse(q2 %in% c("Just me", "Fewer than 10", "11-25", "26-50"),
                       "small", "large")))
table(responses$size)
##
## large small
## 539 1236
Now let’s repeat what we’ve done above.
Do large businesses publish videos more frequently?
To answer this question, we’ll reuse the size column created in the previous section. A small business is one that has between one and fifty employees, and any business with over 50 employees counts as a large business.
The question regarding video posting frequency is question number 25 in our dataframe. Let’s review the distribution of responses for both small and large businesses.
library(gridExtra)
# small business plot
small <- responses %>%
filter(q25 != '' & size == 'small') %>%
group_by(q25) %>%
summarise(n = n_distinct(user_id)) %>%
mutate(percent = n / sum(n)) %>%
mutate(q25 = reorder(q25, n)) %>%
ggplot(aes(x = q25, y = percent)) +
geom_bar(stat = 'identity') +
scale_y_continuous(labels = percent) +
coord_flip() +
theme_ipsum() +
labs(x = NULL, y = NULL, title = "Frequency of Video Uploads",
subtitle = "For Small Businesses (1-50 employees)")
# large business plot
large <- responses %>%
filter(q25 != '' & size == 'large') %>%
group_by(q25) %>%
summarise(n = n_distinct(user_id)) %>%
mutate(percent = n / sum(n)) %>%
mutate(q25 = reorder(q25, n)) %>%
ggplot(aes(x = q25, y = percent)) +
geom_bar(stat = 'identity') +
scale_y_continuous(labels = percent) +
coord_flip() +
theme_ipsum() +
labs(x = NULL, y = NULL, title = "Frequency of Video Uploads",
subtitle = "For Large Businesses (51+ employees)")
# plot two plots together
grid.arrange(small, large, nrow = 1)
Now we can calculate log ratios to determine if large businesses are more or less likely to publish videos at any frequency.
So, it looks like “small” businesses that responded to our survey were more than twice as likely as “large” businesses to never create video content.
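A log ratio of this kind compares each answer's share among small businesses with its share among large businesses on a log2 scale, so a value of 1 means twice as likely and -1 means half as likely. The sketch below uses made-up counts and add-one smoothing; the numbers are for illustration only and are not the survey results.

```r
# made-up counts of answers by business size (not survey results)
counts_toy <- data.frame(
  answer = c("Never", "Monthly", "Weekly"),
  small  = c(39, 29, 29),
  large  = c(9, 19, 19),
  stringsAsFactors = FALSE
)

# add-one smoothing, then convert counts to shares within each size group
small_share <- (counts_toy$small + 1) / sum(counts_toy$small + 1)
large_share <- (counts_toy$large + 1) / sum(counts_toy$large + 1)

# log2 ratio: positive means more common among small businesses
counts_toy$log_ratio <- log2(small_share / large_share)
print(counts_toy)
```

In this toy example “Never” has a share of 0.4 among small businesses and 0.2 among large ones, which gives a log ratio of exactly 1.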
What factors are holding small businesses back from creating video content?
We’ll use the same approach as in the previous two questions. The question numbers are 32-38. This will take some data tidying first. We’ll need to count the number of users that responded to each question and bind them into a single data frame.
# get counts for each response
dont_post <- responses %>%
select(user_id, size, q32) %>%
rename(reason = q32) %>%
filter(reason == "We don't post video content")
other <- responses %>%
select(user_id, size, q33) %>%
rename(reason = q33) %>%
filter(reason == "Other")
time <- responses %>%
select(user_id, size, q34) %>%
rename(reason = q34) %>%
filter(reason == "Lack of time")
budget <- responses %>%
select(user_id, size, q35) %>%
rename(reason = q35) %>%
filter(reason == "No budget to produce video")
film <- responses %>%
select(user_id, size, q36) %>%
rename(reason = q36) %>%
filter(reason == 'Not sure how to film/edit')
what <- responses %>%
select(user_id, size, q37) %>%
rename(reason = q37) %>%
filter(reason == 'Not sure what video to create')
by_reason <- dont_post %>%
bind_rows(other) %>%
bind_rows(time) %>%
bind_rows(budget) %>%
bind_rows(film) %>%
bind_rows(what)
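The blocks above all follow one pattern: pick a column, rename it to reason, and keep the rows where the option was selected. As a sketch of that pattern, the loop below stacks several answer columns into one long data frame using base R. The toy data, the choice of two columns, and the assumption that unselected options are stored as empty strings are all illustrative.

```r
# toy responses (made up); an empty string means the option was not selected
toy <- data.frame(
  user_id = 1:3,
  size    = c("small", "small", "large"),
  q34     = c("Lack of time", "", "Lack of time"),
  q35     = c("", "No budget to produce video", ""),
  stringsAsFactors = FALSE
)

# stack each answer column into (user_id, size, reason) rows
reason_cols <- c("q34", "q35")
pieces <- lapply(reason_cols, function(col) {
  piece <- toy[toy[[col]] != "", c("user_id", "size", col)]
  names(piece)[3] <- "reason"
  piece
})
by_reason_toy <- do.call(rbind, pieces)

# count respondents per size and reason
print(table(by_reason_toy$size, by_reason_toy$reason))
```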
Now let’s create a side-by-side plot.
Measuring ROI
Let’s take the same approach to see if there is any difference in how small and large companies measure ROI.
# small business plot
small <- responses %>%
filter(q74 != '' & size == 'small') %>%
group_by(q74) %>%
summarise(n = n_distinct(user_id)) %>%
mutate(percent = n / sum(n)) %>%
mutate(q74 = reorder(q74, n)) %>%
ggplot(aes(x = q74, y = percent)) +
geom_bar(stat = 'identity') +
scale_y_continuous(labels = percent) +
coord_flip() +
theme_ipsum() +
labs(x = NULL, y = NULL, title = "How Social Media ROI is Measured",
subtitle = "Small Businesses (1-50 employees)")
# large business plot
large <- responses %>%
filter(q74 != '' & size == 'large') %>%
group_by(q74) %>%
summarise(n = n_distinct(user_id)) %>%
mutate(percent = n / sum(n)) %>%
mutate(q74 = reorder(q74, n)) %>%
ggplot(aes(x = q74, y = percent)) +
geom_bar(stat = 'identity') +
scale_y_continuous(labels = percent) +
coord_flip() +
theme_ipsum() +
labs(x = NULL, y = NULL, title = "",
subtitle = "Large Businesses (51+ employees)")
# plot two plots together
grid.arrange(small, large, nrow = 1)
Now let’s plot the log ratios.
roi <- responses %>%
select(size, q74) %>%
filter(q74 != '') %>%
mutate(q74 = as.character(q74))
engagement_rows <- grep("Engagement", roi$q74)
roi[engagement_rows, ]$q74 <- "Engagement"
# calculate log ratios for the counts
counts <- roi %>%
  filter(q74 != '') %>%
  count(size, q74) %>%
  spread(size, n, fill = 0) %>%
  # add-one smoothing on both groups keeps zero counts from producing infinite ratios
  mutate(total = small + large,
         small = (small + 1) / sum(small + 1),
         large = (large + 1) / sum(large + 1),
         log_ratio = log2(small / large),
         abs_ratio = abs(log_ratio)) %>%
  arrange(desc(log_ratio))
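A log2 ratio of k corresponds to a 2^|k|-fold difference, which is why log ratios between -3 and 3 are often labeled as fold changes such as 8X, 4X, 2X, and Same. A small sketch of that mapping, using a hypothetical helper named fold_label:

```r
# hypothetical helper: turn a log2 ratio into a fold-change label
fold_label <- function(log_ratio) {
  ifelse(log_ratio == 0, "Same", paste0(2^abs(log_ratio), "X"))
}

fold_labels <- fold_label(seq(-3, 3))
print(fold_labels)
```

The sign of the ratio is dropped on purpose: it only indicates which group the difference favours.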
# plot the ratios
counts %>%
  # log_ratio is log2(small / large), so negative values lean towards "large"
  group_by(direction = ifelse(log_ratio < 0, "More 'large'", "More 'small'")) %>%
  top_n(15, abs_ratio) %>%
  ungroup() %>%
  mutate(q74 = reorder(q74, log_ratio)) %>%
  ggplot(aes(q74, log_ratio, color = direction)) +
  geom_segment(aes(x = q74, xend = q74,
                   y = 0, yend = log_ratio),
               size = 1.1, alpha = 0.6) +
  geom_point(size = 3.5) +
  coord_flip() +
  theme_ipsum() +
  labs(x = NULL,
       y = NULL,
       title = "How ROI is Measured",
       subtitle = "For Small and Large Businesses") +
  # let the legend use the direction values so labels cannot be swapped
  scale_color_discrete(name = "") +
  scale_y_continuous(breaks = seq(-3, 3),
                     labels = c("8X", "4X", "2X",
                                "Same", "2X", "4X", "8X"))