DIY Twitter analytics (part 2: correlations)

I’ve been working with the Twitter API to develop my own Twitter analytics tool chain, and have been documenting the results on this blog. My last post on the subject described clustering my followers by their hashtag use to see whose tweets are most like mine. My goal for this project is to figure out how best to position my tweets for maximum impact, who the main influencers in my subjects are, and so on. I haven’t figured all this out yet, but I am making progress.

In this post I describe correlation studies I conducted relating a Twitter user’s number of followers to the number of retweets and favorites that user receives. The results surprised me a bit: A user’s number of followers is positively correlated with the number of favorites they receive (no surprise), but it is not correlated with the number of retweets they receive (this surprised me).

There is one major caveat to this study: The population I investigated, my approximately 250 followers, is not a random sample of Twitter users. Perhaps this is a problem, but I’m not sure. Later I’ll repeat this work on a random sample.

For each analysis presented below I only considered “original” tweets, meaning I ignored tweets that a user in my study population had retweeted from another source. I think this makes the analysis more authentic in that it relies solely on content that didn’t already have retweet/favorite “momentum”.

Getting the Source Data

I used the Twitter API to download up to the last 200 tweets from each of my followers. The JSON output returned by the API contains each tweet’s count of favorites and retweets, which I correlated with the number of followers each user had. I downloaded each user’s information, including the number of followers they have, in a separate API query. The tweet information also contains a flag indicating whether the tweet is a retweet that originated from another user; I used this flag to filter such retweets out of the analysis.
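The filtering and totaling step described above could be sketched roughly as follows. This is a minimal illustration, not my actual script: it assumes the tweets have already been downloaded and parsed into Python dictionaries, and it assumes the v1.1-style field names `favorite_count`, `retweet_count`, and `retweeted_status` (present only on retweets). The helper name `summarize_user` and the sample data are made up for the example.

```python
def summarize_user(tweets):
    """Total favorites and retweets over one user's original tweets.

    `tweets` is a list of dicts parsed from the API's JSON output.
    """
    # Assumption: a tweet retweeted from another user carries a
    # "retweeted_status" key; original tweets do not.
    originals = [t for t in tweets if "retweeted_status" not in t]
    return {
        "n_original": len(originals),
        "favorites": sum(t.get("favorite_count", 0) for t in originals),
        "retweets": sum(t.get("retweet_count", 0) for t in originals),
    }


# Made-up sample data, for illustration only.
sample_tweets = [
    {"favorite_count": 3, "retweet_count": 1},
    {"favorite_count": 0, "retweet_count": 2},
    # This one is a retweet of someone else's tweet, so it is ignored.
    {"favorite_count": 9, "retweet_count": 50, "retweeted_status": {}},
]
summary = summarize_user(sample_tweets)
print(summary)
```

Keeping the count of original tweets alongside the totals makes it easy to normalize per tweet later.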

Correlation of Favorites with Number of Followers

First I simply examined the potential correlation between the number of favorites each user had received for all their original tweets and the number of followers the user has:

[Figure: original_favorite_count, total favorites on original tweets and number of followers]

The Spearman correlation coefficient for this pair of data is 0.778. I consider this value high enough to say that a correlation exists.
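For readers who want to see what the Spearman coefficient measures, here is a small pure-Python sketch: it ranks each variable (tied values share the mean of their ranks) and then takes the ordinary Pearson correlation of the ranks. The function names and the five made-up data points are only for illustration; in practice a statistics library would do the same job.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up numbers, not my real data.
followers = [10, 250, 40, 1200, 95]
favorites = [1, 30, 2, 45, 60]
rho = spearman(followers, favorites)
print(round(rho, 3))  # 0.7
```

Because only the ranks matter, a handful of users with huge follower counts cannot dominate the result the way they could with a plain Pearson correlation.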

Then I normalized each user’s count of favorites by the number of that user’s tweets in the data set, and reran the investigation:

[Figure: normalized_original_favorite_count, favorites per original tweet and number of followers]

The Spearman correlation coefficient for this pair of data is 0.658. This is not as strong as the previous correlation, but still sufficient for me to assert that a correlation exists.
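The normalization is just a per-tweet average. A rough sketch is below, with made-up numbers; the helper name `per_tweet_rate` is hypothetical, and skipping users who have no original tweets in the data set is an assumption of this example rather than a rule stated above.

```python
def per_tweet_rate(total, n_tweets):
    """Average count per original tweet, or None if there are no tweets."""
    if n_tweets == 0:
        return None  # assumption: such users are left out of the analysis
    return total / n_tweets


# Made-up users: (followers, total favorites, number of original tweets).
users = [
    (120, 30, 150),
    (900, 45, 60),
    (40, 0, 0),
]

normalized = []
for followers, favorites, n_original in users:
    rate = per_tweet_rate(favorites, n_original)
    if rate is not None:
        normalized.append((followers, rate))

print(normalized)  # [(120, 0.2), (900, 0.75)]
```

Normalizing this way keeps a prolific user from looking popular simply because they contributed more tweets to the data set.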

Lack of Correlation of Retweets with Number of Followers

Next I investigated the relationship between the number of retweets each user had received for all their original tweets and the number of followers the user has:

[Figure: original_retweet_count, total retweets of original tweets and number of followers]

The Spearman correlation coefficient for this pair of data is 0.128. Therefore I don’t think a correlation exists.

Then I normalized each user’s count of retweets by the number of that user’s tweets in the dataset, and reran the analysis:

[Figure: normalized_original_retweet_count, retweets per original tweet and number of followers]

The Spearman correlation coefficient for this pair of data is -0.019. Again, I don’t think a correlation exists.

Why Though?

Two friends of mine shared the same good theory about why these results appear: Marking a tweet as a “favorite” requires less commitment than retweeting it, since a favorited tweet does not appear in the user’s own tweet record the way a retweet would. Users are therefore more likely to favorite a tweet than to retweet it.
