A repository for tracking #NICAR18 tweets
================
This is a dedicated repository for tracking #NICAR18 tweets (the official hashtag of 2018 annual Computer-Assisted Reporting Conference).
Whether you lookup the status IDs or search/stream new tweets, you’ll need to make sure to install the rtweet package. The code below will install [if it’s not already] and load rtweet.
## install rtweet if not already
if (!requireNamespace("rtweet", quietly = TRUE)) {
install.packages("rtweet")
}
## load rtweet
library(rtweet)
#>
#> Attaching package: 'rtweet'
#> The following object is masked from 'package:tfse':
#>
#> round_time
Our data collection method is described in detail below. However, if you want to get straight to the data, simply run the following code:
## download status IDs file
download.file(
"https://github.com/computer-assisted-reporting/NICAR18/blob/master/data/search-ids.rds?raw=true",
"NICAR18_status_ids.rds"
)
## read status IDs fromdownloaded file
ids <- readRDS("NICAR18_status_ids.rds")
## lookup data associated with status ids
rt <- rtweet::lookup_tweets(ids$status_id)
One of the easiest ways to gather Twitter data is to search for the data (using Twitter’s REST API). Unlike streaming, searching makes it possible to go back in time. Unfortunately, Twitter sets a rather restrictive cap–roughly nine days–on how far back you can go. Regardless, searching for tweets is often the preferred method. For example, the code below is setup in such a way that it can be executed once [or even several times] a day throughout the conference. See the R code here.
Here’s some example code showing what essentially we’re doing to collect the data:
## search terms
nicar18conf <- c("NICAR18", "NICAR2018", "IRE_NICAR")
## search for up to 10,000 tweets mentioning nicar18
rt <- search_tweets(paste(nicar18conf, collapse = " OR "), n = 10000)
To explore the Twitter data, we recommend using the tidyverse packages. We’re also using a customized ggplot2 theme. See the R code here.
To create the image below, the data were summarized into a time series-like data frame and then plotted in order depict the frequency of tweets–aggregated in two-hour intevals–about #nicar18 over time. See the R code here.
Next, some sentiment analysis of the tweets so far. See the R code here.
The image below depicts a quick and dirty visualization of the semantic network (connections via retweet, quote, mention, or reply) as it is observed in the data. See the R code here.
Ideally, the network visualization would be an interactive, searchable graphic. Since it’s not, I’ve printed out the node size values below.