So you use Twitter and have heard Splunk can do “Big Data”. By tapping into Twitter’s API you can use Splunk to investigate the stream of tweets being generated across the globe.
The great thing about using Splunk for this is that you have complete control of the data, so what you can build is incredibly flexible. A few basic ideas I’ve had include tracking hashtags, following specific influencers, or tracking tweets by location in real time.
What’s more, it takes a matter of minutes before you can start analysing the wealth of data being generated. This post will show you how.
Prerequisites
- A basic understanding of Splunk
- Splunk installed and running (free download here)
- A basic Twitter account
Step 1: Create a Twitter App
Go to: “dev.twitter.com” > “Sign in / up” > select “Create App”.
It doesn’t really matter what name you enter when creating the app (especially if it’s not going to be public) although I’d recommend using something you can remember. Same goes for description and website.
The callback field can be left blank. I won’t go into why or when this should be used in this post.
Step 2: Generate API Keys
Once your app has been created click the “API Keys” tab. You should see “Your access token” with a button “Create my access token”. Press this button.
You should now see your API keys. Don’t worry about noting them down; we can come back to this page at any time. You will want to keep them secret though (the app above will be deleted by the time you read this!).
Step 3: Install the REST API Modular Input in Splunk
To get this feed into Splunk we’ll use Damien Dallimore’s REST API Modular input for Splunk. You can download the app here with full instructions on how to install it.
Step 4: Configure Your RESTful Twitter Input
We’re on the home straight now! We just need to give Splunk the credentials to tap into the Twitter API.
In Splunk navigate to “Settings > Data Inputs > REST”, and select “Add new”.
4.1 OAuth settings:
REST API Input Name = TwitterFeed (optional)
Endpoint URL = https://stream.twitter.com/1.1/statuses/filter.json
HTTP Method = GET
Authentication Type = oauth1
OAUTH1 Client (Consumer) Key = <YOUR_CLIENT_KEY>
OAUTH1 Client (Consumer) Secret = <YOUR_CLIENT_SECRET>
OAUTH1 Access Token = <YOUR_ACCESS_KEY>
OAUTH1 Access Token Secret = <YOUR_ACCESS_SECRET>
If you need to retrieve your OAUTH keys created in Step 2 go to: “dev.twitter.com” > “My apps” > “[Your App]” > “API Keys” > “Test OAuth”.
Note that we will be using version 1.1 of Twitter’s API, which imposes rate limits on its endpoints. If you’re only collecting a small number of tweets every 15 minutes this shouldn’t be a problem. If you’re planning on collecting thousands, you should probably read this first.
4.2 Argument settings:
At this point you should read the Twitter API docs if you are unfamiliar with the arguments that can be passed.
Example 1
URL Arguments: track=#worldcup^stall_warnings=true
Here I am using the ‘track’ streaming API parameter. In this case, I am collecting tweets that contain the hashtag #worldcup. Note that if you want to track multiple keywords, you separate them with commas. However, the REST API configuration screen also expects a comma delimiter between key=value pairs by default. That is why I have used a “^” delimiter between the pairs instead: I need the commas for my track values.
Example 2
URL Arguments: follow=21756213^stall_warnings=true
Now I am collecting Tweets using the “follow” streaming API parameter for the account @himynamesdave (that’s me). Note that when using the follow parameter you must use the user’s ID, not the username. If you’re unsure how to find a user ID, this site will help you.
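To see why the delimiter matters, here is a minimal Python sketch of splitting a URL arguments string into key=value pairs. The function name split_url_args is hypothetical and the second hashtag (#brazil) is made up for illustration. This is not the modular input's actual code; it only assumes that the string is split on one delimiter character and each pair on its first “=”.

```python
def split_url_args(raw, delimiter=","):
    """Split 'k1=v1<delim>k2=v2' into a dict (illustrative only)."""
    args = {}
    for pair in raw.split(delimiter):
        if not pair.strip():
            continue  # ignore empty fragments such as a trailing delimiter
        key, _, value = pair.partition("=")  # split on the first "=" only
        args[key.strip()] = value.strip()
    return args


# With "^" between pairs, the commas inside the track value survive.
good = split_url_args("track=#worldcup,#brazil^stall_warnings=true", delimiter="^")
print(good)

# With "," between pairs, the second keyword is torn away from track.
bad = split_url_args("track=#worldcup,#brazil,stall_warnings=true", delimiter=",")
print(bad)
```

In the first call the track value keeps both keywords. In the second, “#brazil” ends up as a stray key of its own, which is exactly the problem the “^” delimiter avoids.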
4.3 Response settings:
Response Type = json
Streaming Request = True
Request Timeout = 86400 (optional)
Delimiter = ^ (or whatever delimiter you used in the URL arguments field)
Set Sourcetype = Manual
Sourcetype = tweets (optional)
Note that for steps 4.1 to 4.3, I have only included the fields that are essential to configure (unless marked optional). Everything else can be left blank or as default (unless you need to enter a proxy to get out to the internet, etc).
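With Streaming Request enabled, the connection stays open and Twitter sends each tweet as one JSON object per line, with blank lines in between to keep the connection alive. Here is a rough Python sketch of how a consumer might handle such a stream. The function name iter_tweets and the sample payload are made up for illustration, and this is not the modular input's actual code.

```python
import json


def iter_tweets(lines):
    """Yield tweet dicts from an iterable of raw stream lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive line, nothing to index
        message = json.loads(line)
        if "warning" in message:
            continue  # stall warning (requested via stall_warnings=true), not a tweet
        yield message


# Hypothetical sample of what might arrive on the open connection.
sample_stream = [
    '{"id": 1, "text": "Kick off! #worldcup", "retweet_count": 0}\r\n',
    "\r\n",
    '{"warning": {"code": "FALLING_BEHIND", "message": "Your connection is falling behind", "percent_full": 60}}\r\n',
    '{"id": 2, "text": "What a goal #worldcup", "retweet_count": 0}\r\n',
]

tweets = list(iter_tweets(sample_stream))
print([t["text"] for t in tweets])
```

The sketch drops stall warnings for simplicity; in practice you would probably want to log them, since they tell you the client is falling behind the stream.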
4.4 inputs.conf:
For reference, your new REST input configuration can also be found in: “<SPLUNK_HOME>/etc/apps/launcher/local/inputs.conf”.
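If you want to eyeball that file without exposing your keys, a small Python sketch along these lines could help. The stanza below is an assumption of roughly what the saved input looks like: the key names mirror the fields from steps 4.1 to 4.3 but may differ in your version of the modular input, so check them against your own file. redact_stanza is a hypothetical helper, not part of Splunk.

```python
import configparser

# Assumed shape of the saved stanza; key names may differ in your version.
SAMPLE_INPUTS_CONF = """
[rest://TwitterFeed]
endpoint = https://stream.twitter.com/1.1/statuses/filter.json
http_method = GET
auth_type = oauth1
oauth1_client_key = <YOUR_CLIENT_KEY>
oauth1_client_secret = <YOUR_CLIENT_SECRET>
oauth1_access_token = <YOUR_ACCESS_KEY>
oauth1_access_token_secret = <YOUR_ACCESS_SECRET>
url_args = track=#worldcup^stall_warnings=true
response_type = json
streaming_request = 1
request_timeout = 86400
delimiter = ^
sourcetype = tweets
"""


def redact_stanza(conf_text, stanza):
    """Return the stanza's settings with the OAuth values masked."""
    # Split on "=" only and disable interpolation so values are read verbatim.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(conf_text)
    return {
        key: "********" if key.startswith("oauth1_") else value
        for key, value in parser[stanza].items()
    }


for key, value in redact_stanza(SAMPLE_INPUTS_CONF, "rest://TwitterFeed").items():
    print(f"{key} = {value}")
```

To run it against your real configuration, read the contents of your own inputs.conf into a string and pass that in place of SAMPLE_INPUTS_CONF.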
Step 5: Check Your Input is Working
Using a Splunk search will allow you to check your data is being received and indexed:
sourcetype="tweets"
Note that you will only start to see Tweets once your input receives a new one from Twitter (we will not be able to pull Tweets historically).
See the latest tweet:
sourcetype="tweets" | fields text | head 1
Look at Tweet volume over time:
sourcetype="tweets" | timechart count(_raw)
Or count the tweets that report a retweet_count (swap count for sum to add the values up instead):
sourcetype="tweets" | stats count(retweet_count)
… you get the idea.
Start polling other accounts or searches to build up a bigger picture of what’s happening by repeating the steps above.
Step 6: Enrich Your Tweets
Why not start by analysing the sentiment of your Tweets? Splunker David Carasso has built a Sentiment App for Splunk that will help you do this.
Alternatively, use the REST API Modular Input to bring other social media sources into Splunk. Foursquare, Facebook and LinkedIn are just a few others that spring to mind.
Let me know what mashups you dream up (and build!).