
New support for authoring modular inputs in Node.js


Modular inputs allow you to teach Splunk Enterprise new ways to pull in events from internal systems, third-party APIs, or even devices. Modular inputs extend Splunk Enterprise and are deployed on the Splunk Enterprise instance or on a forwarder. In version 1.4.0 of the Splunk SDK for JavaScript, we added support for creating modular inputs in Node.js!

In this post, I’ll show you how to create a modular input with Node.js that pulls commit data from GitHub into Splunk.

Why Node.js

Node.js is designed for I/O intensive workloads. It offers great support for streaming data into and out of a Node application in an asynchronous manner. It also has great support for JSON out of the box. Finally, Node.js has a huge ecosystem of packages available via npm that are at your disposal. An input pulls data from a source and then streams those results directly into a Splunk instance. This makes modular inputs a great fit for Node.js.

Getting started

You can get the Splunk SDK for JavaScript from npm (npm install splunk-sdk), the Splunk Developer Portal or by grabbing the source from our GitHub repo. You can find out more about the SDK here. The SDK includes two sample modular inputs: random numbers and GitHub commits. For the remainder of this post we’ll look at the GitHub example.

This input indexes all commits on the master branch of a GitHub repository using GitHub’s API. This example illustrates how to pull in data from an external source, as well as showing how to create checkpoints when you are periodically polling in order to prevent duplicate events from getting created.

Prerequisites

Installing the example

  1. Set the $SPLUNK_HOME environment variable to the root directory of your Splunk Enterprise instance.
  2. Copy the GitHub example from
    /splunk-sdk-javascript/examples/modularinputs/github_commits

    to

    $SPLUNK_HOME/etc/apps
  3. Open a command prompt or terminal window and go to the following directory:
    $SPLUNK_HOME/etc/apps/github_commits/bin/app
  4. Type npm install. This installs the required Node modules, including the splunk-sdk module itself and the github module.
  5. Restart Splunk Enterprise by typing the following into the command line:
    $SPLUNK_HOME/bin/splunk restart

Configuring the GitHub commits modular input example

Modular inputs integrate with Splunk Enterprise, allowing Splunk administrators to create new instances and provide the necessary configuration right in the UI, just like other Splunk inputs. To see this in action, follow these steps:

  1. From Splunk Home, click the Settings menu. Under Data, click Data inputs, and find “GitHub commits”, the input you just added. Click Add new on that row.
  2. Click Add new and fill in:
    • name (whatever name you want to give this input)
    • owner (the owner of the GitHub repository, this is a GitHub username or org name)
    • repository (the name of the GitHub repository)
    • (optional) token if using a private repository and/or to avoid GitHub’s API limits

    To get a GitHub API token, visit the GitHub settings page and make sure the repo and public_repo scopes are selected.

  3. Save your input, and navigate back to Splunk Home.
  4. Do a search for sourcetype=github_commits and you should see some events indexed. If your repository has a large number of commits, indexing them may take a few moments.

Analyzing GitHub commit data

Now that your GitHub repository’s commit data has been indexed by Splunk Enterprise, you can leverage the power of Splunk’s Search Processing Language to do interesting things with your data. Below are some example searches you can run:

  • Want to know who the top contributors are for this repository? Run this search:
    sourcetype="github_commits" source="github_commits://[your input name]" | stats count by author | sort count DESC
    

    JS-SDK-contributer-table

  • Want to see a graph of the repository’s commits over time? Run this search:
    sourcetype="github_commits" source="github_commits://[your input name]" | timechart count(sha) as "Number of commits"

    Then click the Visualization tab, and select Line from the drop-down of visualization types (Pie may already be selected).

Write your own modular input with the Splunk SDK for JavaScript

Adding a modular input to Splunk Enterprise is a two-step process: First, write a modular input script, and then package the script with several accompanying files and install it as a Splunk app.

Writing a modular input

A modular input will:

  1. Return an introspection scheme. The introspection scheme defines the behavior and endpoints of the script.  When Splunk Enterprise starts, it runs the input to determine the modular input’s behavior and configuration.
  2. Validate the script’s configuration (optional). Whenever a user creates or edits an input, Splunk Enterprise can call the input to validate the configuration.
  3. Stream events into Splunk. The input streams event data that can be indexed by Splunk Enterprise. Splunk Enterprise invokes the input and waits for it to stream events.

To create a modular input in Node.js, first require the splunk-sdk Node module. In our examples, we’ve also assigned the classes we’ll be using to variables, for convenience. At the very least, we recommend defining a ModularInputs variable as shown here:

var splunkjs        = require("splunk-sdk");
var ModularInputs   = splunkjs.ModularInputs;
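
The rest of this post also refers to the Scheme, Argument, Event, and Logger classes. These live on the ModularInputs namespace, so a reasonable set of convenience assignments (a sketch; pull in only the classes your script actually uses) is:

var Scheme   = ModularInputs.Scheme;
var Argument = ModularInputs.Argument;
var Event    = ModularInputs.Event;
var Logger   = ModularInputs.Logger;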

The preceding three steps are accomplished as follows using the Splunk SDK for JavaScript:

  1. Return the introspection scheme: Define the getScheme method on the exports object.
  2. Validate the script’s configuration (optional): Define the validateInput method on the exports object. This is required if you set the scheme returned by getScheme to use external validation (that is, set Scheme.useExternalValidation to true).
  3. Stream events into Splunk: Define the streamEvents method on the exports object.

In addition, you must run the script by calling the ModularInputs.execute method, passing in the exports object you just configured along with the module object which contains the state of this script:

ModularInputs.execute(exports, module);

To see the full GitHub commits input source code, see here.

Woah. Let’s take a deeper dive into the code so we can understand what’s really going on.

The getScheme method

When Splunk Enterprise starts, it looks for all the modular inputs defined by its configuration and tries to run them with the argument --scheme. The scheme tells Splunk which arguments need to be provided for the input; these arguments are then used to populate the UI when a user creates an instance of the input. Splunk expects each modular input to print a description of itself in XML to stdout. The SDK’s modular input framework takes care of all the details of formatting and printing the XML; you only need to implement a getScheme method that returns a new Scheme object, which makes your job much easier!

As mentioned earlier, we will be adding all methods to the exports object.

Let’s begin by defining getScheme, creating a new Scheme object, and setting its description:

exports.getScheme = function() {
        var scheme = new Scheme("GitHub Commits"); 
        scheme.description = "Streams events of commits in the specified GitHub repository (must be public, unless setting a token).";

For this scheme, the modular input will show up as “GitHub Commits” in Splunk.

Next, specify whether you want to use external validation or not by setting the useExternalValidation property (the default is true). If you set external validation to true without implementing the validateInput method on the exports object, the script will accept anything as valid. We want to make sure the GitHub repository exists, so we’ll define validateInput once we finish with getScheme.

       scheme.useExternalValidation = true;

If you set useSingleInstance to true (the default is false), Splunk will launch a single process that executes the script and handles all instances of the modular input; you are then responsible for implementing the proper handling for all instances within the script. Setting useSingleInstance to false allows us to set an optional interval parameter, in seconds or as a cron schedule (available under More settings when creating an input).

      scheme.useSingleInstance = false;

The GitHub commits example has three required arguments (name, owner, and repository), and one optional argument (token). Let’s recap what these are for:

  • name: The name of this modular input definition (ex: Splunk SDK for JavaScript)
  • owner: The GitHub organization or user that owns the repository (ex: splunk)
  • repository: The GitHub repository (ex: splunk-sdk-javascript). Don’t forget to set the token argument if the repository is private
  • token: A GitHub access token with at least the repo and public_repo scopes enabled. To get an access token, see the steps outlined earlier in this post.

Now let’s see how these arguments are defined within the Scheme. We need to set the args property of the Scheme object we just created to an array of Argument objects:

      scheme.args = [
            new Argument({
                name: "owner",
                dataType: Argument.dataTypeString,
                description: "GitHub user or organization that created the repository.",
                requiredOnCreate: true,
                requiredOnEdit: false
            }),
            new Argument({
                name: "repository",
                dataType: Argument.dataTypeString,
                description: "Name of a public GitHub repository, owned by the specified owner.",
                requiredOnCreate: true,
                requiredOnEdit: false
            }),
            new Argument({
                name: "token",
                dataType: Argument.dataTypeString,
                description: "(Optional) A GitHub API access token. Required for private repositories (the token must have the 'repo' and 'public_repo' scopes enabled). Recommended to avoid GitHub's API limit, especially if setting an interval.",
                requiredOnCreate: false,
                requiredOnEdit: false
            })
        ];

Each Argument constructor takes a JavaScript object with the required property name and the following optional properties:

  • dataType: What kind of data is this argument? (Argument.dataTypeBoolean, Argument.dataTypeNumber, or Argument.dataTypeString)
  • description: A description for the user entering this argument (string)
  • requiredOnCreate: Is this a required argument? (boolean)
  • requiredOnEdit: Does a new value need to be specified when editing this input? (boolean)

After adding the arguments to the scheme, return the scheme and close the function:

        return scheme;
    };

The validateInput method

The validateInput method is where the configuration of an input is validated, and it is only needed if you’ve set your modular input to use external validation. If validateInput does not call the done callback with an error argument, the input is assumed to be valid; if it does pass an error to done, Splunk is told that the configuration is not valid.

When you use external validation, after splunkd calls the modular input with the --scheme argument to get the scheme, it calls it again with the --validate-arguments argument for each instance of the modular input in its configuration files, feeding XML on stdin to the modular input to validate all enabled inputs. Splunk calls the modular input the same way again whenever the modular input’s configuration is changed.

In our GitHub Commits example, we’re using external validation since we want to make sure the repository is valid. Our validateInput method uses the GitHub API to check that there is at least one commit on the master branch of the specified repository:

    exports.validateInput = function(definition, done) { 
        var owner = definition.parameters.owner;
        var repository = definition.parameters.repository;
        var token = definition.parameters.token;

        var GitHub = new GitHubAPI({version: "3.0.0"});

        try {
            if (token && token.length > 0) {
                GitHub.authenticate({
                    type: "oauth",
                    token: token
                });
            }

            GitHub.repos.getCommits({
                headers: {"User-Agent": SDK_UA_STRING},
                user: owner,
                repo: repository,
                per_page: 1,
                page: 1
            }, function (err, res) {
                if (err) {
                    done(err);
                }
                else {
                    if (res.message) {
                        done(new Error(res.message));
                    }
                    else if (res.length === 1 && res[0].hasOwnProperty("sha")) {
                        done();
                    }
                    else {
                        done(new Error("Expected only the latest commit, instead found " + res.length + " commits."));
                    }
                }
            });
        }
        catch (e) {
            done(e);
        }
    };

The streamEvents method

Here’s the best and most important part, streaming events!

The streamEvents method is where the event streaming happens. Events are streamed to stdout, and the InputDefinition object passed in determines what gets streamed. In the GitHub commits example, for each input the arguments are retrieved before connecting to the GitHub API; then we go through each commit in the repository on the master branch.

Creating Events and Checkpointing

For each commit, we’ll check to see if we’ve already indexed it by looking in a checkpoint file. This is a file that Splunk allows us to create in order to track which data has already been processed so that we can prevent duplicates. If we have indexed the commit, we simply move on – we don’t want duplicate commit data in Splunk. If we haven’t indexed the commit, we’ll create an Event object, set its properties, write the event using the EventWriter, then append the commit’s unique SHA to the checkpoint file. We create a new checkpoint file for each input (in this case, each repository).

The getDisplayDate function is used to transform the date we get back from the GitHub API into a more readable format.

exports.streamEvents = function(name, singleInput, eventWriter, done) {
        // Get the checkpoint directory out of the modular input's metadata.
        var checkpointDir = this._inputDefinition.metadata["checkpoint_dir"];

        var owner = singleInput.owner;
        var repository = singleInput.repository;
        var token      = singleInput.token;

        var alreadyIndexed = 0;

        var GitHub = new GitHubAPI({version: "3.0.0"});

        if (token && token.length > 0) {
            GitHub.authenticate({
                type: "oauth",
                token: token
            });
        }

        var page = 1;
        var working = true;

        Async.whilst(
            function() {
                return working;
            },
            function(callback) {
                try {
                    GitHub.repos.getCommits({
                        headers: {"User-Agent": SDK_UA_STRING},
                        user: owner,
                        repo: repository,
                        per_page: 100,
                        page: page
                    }, function (err, res) {
                        if (err) {
                            callback(err);
                            return;
                        }

                        if (res.meta.link.indexOf("rel=\"next\"") < 0) {
                            working = false;
                        }
                        
                        var checkpointFilePath  = path.join(checkpointDir, owner + " " + repository + ".txt");
                        var checkpointFileNewContents = "";
                        var errorFound = false;

                        var checkpointFileContents = "";
                        try {
                            checkpointFileContents = utils.readFile("", checkpointFilePath);
                        }
                        catch (e) {
                            fs.appendFileSync(checkpointFilePath, "");
                        }

                        for (var i = 0; i < res.length && !errorFound; i++) {
                            var json = {
                                sha: res[i].sha,
                                api_url: res[i].url,
                                url: "https://github.com/" + owner + "/" + repository + "/commit/" + res[i].sha
                            };

                            if (checkpointFileContents.indexOf(res[i].sha + "\n") < 0) {
                                var commit = res[i].commit;

                                json.message = commit.message.replace(/(\n|\r)+/g, " ");
                                json.author = commit.author.name;
                                json.rawdate = commit.author.date;
                                json.displaydate = getDisplayDate(commit.author.date.replace("T|Z", " ").trim());

                                try {
                                    var event = new Event({
                                        stanza: repository,
                                        sourcetype: "github_commits",
                                        data: JSON.stringify(json),
                                        time: Date.parse(json.rawdate)
                                    });
                                    eventWriter.writeEvent(event);

                                    checkpointFileNewContents += res[i].sha + "\n";
                                    Logger.info(name, "Indexed a GitHub commit with sha: " + res[i].sha);
                                }
                                catch (e) {
                                    errorFound = true;
                                    working = false;
                                    Logger.error(name, e.message, eventWriter._err);
                                    fs.appendFileSync(checkpointFilePath, checkpointFileNewContents);

                                    done(e);
                                    return;
                                }
                            }
                            else {
                                alreadyIndexed++;
                            }
                        }

                        fs.appendFileSync(checkpointFilePath, checkpointFileNewContents);

                        if (alreadyIndexed > 0) {
                            Logger.info(name, "Skipped " + alreadyIndexed.toString() + " already indexed GitHub commits from " + owner + "/" + repository);
                        }

                        page++;
                        alreadyIndexed = 0;
                        callback();
                    });
                }
                catch (e) {
                    callback(e);
                }
            },
            function(err) {
                done(err);
            }
        );
    };
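
The getDisplayDate helper referenced above is part of the full sample rather than this excerpt; a minimal stand-in (an assumption, not the sample’s exact implementation) just reformats the timestamp returned by the GitHub API:

function getDisplayDate(date) {
    // Turn the ISO-8601 style timestamp returned by the GitHub API
    // (for example "2014-10-01T12:34:56Z") into a friendlier local-time string.
    return new Date(date).toString().replace(/ GMT.*$/, "");
}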

Logging (optional)

Logging is an optional feature we’ve included with modular inputs in the Splunk SDK for JavaScript.

It’s best practice for your modular input script to log diagnostic data to splunkd.log ($SPLUNK_HOME/var/log/splunk/splunkd.log). Use a Logger method to write log messages, which include a standard splunkd.log severity level (such as “DEBUG”, “WARN”, “ERROR” and so on) and a descriptive message. For instance, the following code is from the GitHub Commits streamEvents example, and logs a message if any GitHub commits have already been indexed:

if (alreadyIndexed > 0) {
    Logger.info(name, "Skipped " + alreadyIndexed.toString() + " already indexed GitHub commits from " + owner + "/" + repository);
}

Here we call the Logger.info method to log a message with the info severity. We also pass in the name argument, which the user set when creating the input.

That’s all the code you have to write to get started with modular inputs using the Splunk SDK for JavaScript!

Add the modular input to Splunk Enterprise

With your modular input completed, you’re ready to integrate it into Splunk Enterprise. First, package the input, and then install the modular input as a Splunk app.

Package the input

Files

Create the following files with the content indicated. Wherever you see modinput_name — whether in the file name or its contents — replace it with the name of your modular input JavaScript file. For example, if your script’s file name is github_commits.js, give the file indicated as modinput_name.cmd the name github_commits.cmd.

If you haven’t already, now is a good time to set your $SPLUNK_HOME environment variable.

We need to make sure all the names match up here, or Splunk will have problems recognizing your modular input.

modinput_name.cmd

@"%SPLUNK_HOME%"\bin\splunk cmd node "%~dp0\app\modinput_name.js" %*

modinput_name.sh

#!/bin/bash

current_dir=$(dirname "$0")
"$SPLUNK_HOME/bin/splunk" cmd node "$current_dir/app/modinput_name.js" $@

package.json

When creating this file, replace the values given with the corresponding values for your modular input. All values (except the splunk-sdk dependency, which should stay at ">=1.4.0") can be changed.

{
    "name": "modinput_name",
    "version": "0.0.1",
    "description": "My great modular input",
    "main": "modinput_name.js",
    "dependencies": {
        "splunk-sdk": ">=1.4.0"
    },
    "author": "Me"
}

app.conf

When creating this file, replace the values given with the corresponding values for your modular input:

  • The is_configured value determines whether the modular input is preconfigured on install, or whether the user should configure it.
  • The is_visible value determines whether the modular input is visible to the user in Splunk Web.

[install]
is_configured = 0

[ui]
is_visible = 0
label = My modular input

[launcher]
author=Me
description=My great modular input
version = 1.0

inputs.conf.spec

When creating this file, in addition to replacing modinput_name with the name of your modular input’s JavaScript file, do the following:

  • After the asterisk (*), type a description for your modular input.
  • Add any arguments to your modular input as shown. You must list every argument that you define in the getScheme method of your script.

The file should look something like this:

[github_commits://<name>]
*Generates events of GitHub commits from a specified repository.

owner = <value>
repository = <value>
token = <value>

File structure

Next, create a directory that corresponds to the name of your modular input script—for instance, “modinput_name” — in a location such as your Documents directory. (It can be anywhere; you’ll copy the directory over to your Splunk Enterprise directory at the end of this process.)

  1. Within this directory, create the following directory structure:
    modinput_name/
        bin/
            app/
        default/
        README/
  2. Copy your modular input script (modinput_name.js) and the files you created in the previous section so that your directory structure looks like this:
    modinput_name/
        bin/
            modinput_name.cmd
            modinput_name.sh
            app/
                package.json
                modinput_name.js
        default/
            app.conf
        README/
            inputs.conf.spec
Install the modular input

Before using your modular input as a data input for your Splunk Enterprise instance, you must first install it.

  1. Set the SPLUNK_HOME environment variable to the root directory of your Splunk Enterprise instance.
  2. Copy the directory you created in “Package the input” to the following directory:
    $SPLUNK_HOME/etc/apps/
  3. Open a command prompt or terminal window and go to the following directory, where modinput_name is the name of your modular input script:
    $SPLUNK_HOME/etc/apps/modinput_name/bin/app
  4. Type the following, and then press Enter or Return: npm install
  5. Restart Splunk Enterprise: From Splunk Home, click the Settings menu. Under System, click Server Controls. Click Restart Splunk; alternatively you can just run
    $SPLUNK_HOME/bin/splunk restart

    from command prompt or terminal.

Your modular input should now appear alongside the native Splunk inputs: from Splunk Home, click the Settings menu, then under Data, click Data inputs, and find the name of the modular input you just created.

In Summary

In this post you’ve seen how to create a modular input using the Splunk SDK for JavaScript.

Now you can use your Node.js skills to extend Splunk and pull data from any source, even GitHub!


What’s in store for Developers at .conf2014


In less than a week, .conf2014 kicks off at the MGM Grand in Las Vegas. As in past years, there will not only be tons of great keynotes, sessions and training for the entire Splunk community, but also plenty of things tailored just for developers.

  • Once again, Splunk University starts off the week with hands-on training, including an intense Splunk App Developer Bootcamp
  • This year we’re introducing the Splunk Dev Lounge, a dedicated space for hacking on Splunk throughout the conference. All throughout the week, you’ll find members of the engineering and evangelist teams ready to answer any question or guide you in the right direction. We’ll also have chalk talk sessions (heavy on code, light on slides) led by Splunkers as well as partners like Auth0 (maybe you read about them in USA Today) and community developers like Rich and Erica who won the Splunk App Contest last year with their Splunk for your Car App. Whether you’re new to Splunk and want to dig in at your own pace or a Splunk Ninja looking to pick up new skills building Splunk Apps, working with the SDKs or extending Splunk with modular inputs and custom search commands, the Splunk Dev Lounge is the place to be at .conf2014.
  • We have another full docket of sessions, covering everything from Splunk for .NET developers (led by community luminary and Splunker Glenn Block) to Splunking the Java Virtual Machine (led by Splunk Worldwide Evangelist Damien Dallimore) to building Splunk Apps to DevOps topics.
  • We’ll also have a Splunk for Developers booth at the Splunk Apps Showcase, with Splunkers on hand to show you the latest and greatest in building Splunk Apps, integrating Splunk with the REST API and SDKs and using Splunk to monitor App Dev and DevOps processes. Stop by to see what’s new and find out how you can do more as a developer with Splunk.

Throw in the great networking and socializing, and .conf2014 looks to be the biggest and best .conf yet. See you in Vegas!

Get your Community on at .conf2014!


Community is HUGE at Splunk, and we’re doing it up big at this year’s .conf with our own gigantic Community Lounge. Here’s a sampling of what’s in the works:

Masters of IRC panel discussion

Wednesday, Oct 8th 11am-12noon on the Community Stage

Join us for an informal panel discussion with 6-7 of our most knowledgeable, longtime customers from the #splunk IRC channel. They will be taking your questions and sharing best practices and stories from their long years of experience deploying and maintaining Splunk at scale. Bring your questions! Whisky optional, but recommended :).

Learn how to start your own Splunk User Group (and meet other people who do, too)

Wednesday, Oct 8th, 12:15pm – 12:45pm on the Community Stage

Tony Reinke, leader of the most excellent Splunk402 User Group in Nebraska, will tell us how to make friends and meet with them on a regular basis to talk about your shared interest: Splunk! Come find out how to start a user group, or just to meet people who might be starting one in your area. Bring business cards!

Gamers, splunked

Monday: 5pm–7pm 
Tuesday: 8am-9am, 12pm–6pm
Wednesday: 8am–7pm
Thursday: 8am–2pm

Like last year, the Gamer Lounge will be open every day to meet your fragging needs. We’ll be splunking Team Fortress 2 and Minecraft, with some MAJOR live dashboard awesomeness for all to ogle. Come see how much fun it is to set fire to a spy or build a sentry turret and wreak havoc on the opposing team, then look at your own player data onscreen. If resource management and construction is more your style, play a little Minecraft and watch your resource usage and progress on a map! Here’s a sneak (in-development) preview of one of the many new dashboards in the TF2 app:

sneak preview!  .conf2014 TF2 app for the Gamer Lounge

Many thanks to Splunkers Jesse Miller, Stephen Luedtke, Vladimir Skoryk, Satoshi Kawasaki, Michael Szebenyi, and Jeff Bernt for making these apps happen!

Now Time For the Splunk Weather Forecast


Raspberry Pi, Air Pi, and Splunk

If you were at .conf last week you would have likely seen some of the exciting Internet of Things projects people are using Splunk for. I think Ed Hunsinger put it best:

So far I’ve heard about @splunk being used for planes (Royal Flying Doctor), trains (New York Air Brake), and automobiles (VW). #splunkconf

@edhunsinger

Watching .conf2014 from afar in the UK, I got excited about some of my own IoT projects. Then I remembered Brian Gillmore’s call for cool projects using Splunk with the RaspberryPi. At the same moment, by pure chance, I got an email telling me AirPi circuit boards (a RaspberryPi-connected weather station) were back in stock.

And it was settled. I would build a RaspberryPi weather station and Splunk the data. Here’s how I did it.

Step 1: Assemble the AirPi

Splunk AirPi Build

Essentially you can Splunk most data generated from a RaspberryPi and the additional components you hook up. For this project though I decided to use an AirPi circuit board and the bundled components (temperature, humidity, sound, light, air quality, and pressure sensors). You can pick up one here.

You’ll need to solder the board yourself, but don’t let this put you off. As Tom Hartley (the creator of the AirPi) notes, “we’ve had many people learn to solder using the AirPi kits!”. Soldering kits can be bought very cheaply off eBay too. Just make sure you use Rosin Core solder.

Step 2: Install the AirPi Code

AirPi Source Code

There are very detailed instructions on how to install the AirPi code onto your RaspberryPi here.

For this project I used a forked version of the source code created by Haydn Williams. Haydn added a write-to-CSV output function. Following the instructions, I wrote all my AirPi logs out to a CSV file located on an external hard drive.

Step 3: Configure the RaspberryPi Splunk Forwarder

Splunk RaspberryPi Forwarder

Grab the RaspberryPi Forwarder from apps.splunk.com. The great thing about the RaspberryPi Forwarder is that it works in exactly the same way as a regular Splunk Forwarder.

If you get stuck there is full documentation here. Brian Gillmore also wrote a great post on getting started with the Raspberry Pi Forwarder here too.

In my setup I used the Forwarder to monitor the CSV file I created as an output in step 2.
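
For reference, the monitor itself is just a standard inputs.conf stanza on the Pi; something along these lines would do it (the file path below is a placeholder for wherever your AirPi CSV lives, and the rpi index matches the searches in the next step):

[monitor:///media/usbdrive/airpi/airpi_output.csv]
index = rpi
sourcetype = csv
disabled = 0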

Step 4: Start Splunking

Splunk AirPi Feed

I set up my logs to be stored in an index named “rpi” so a simple search returns my AirPi’s output (assuming everything has worked well).

index="rpi"

I put the output in an easy-to-read table and noticed very quickly that not everything was working as expected.

index=rpi | table SimpleTime AirQuality Humidity LightLevel Pressure Temp_BMP Temp_DHT UVLevel

The results for “Pressure” and “AirQuality” remained static for all results (likely a problem with the sensors).

Step 5: Make it look sexy

AirPi Splunk Dashboard

I created a simple dashboard in under 10 minutes which shows me real-time and historic information about what the sensors on my AirPi have logged.

Here are some of my searches:

Light Level vs. UV Level

index=rpi | timechart avg(LightLevel) as LightLevel avg(UVLevel) as "UV Level" span=1h

Temperature vs. Humidity

index=rpi | timechart avg(Temp_DHT) as "Temp DHT" avg(Temp_BMP) as "Temp BMP" avg(Humidity) as "Humidity" span=1h

Here’s the full code for the dashboard to get you started.

Step 6: Take on the Splunk RaspberryPi challenge

Can you improve on my weather station? Or do you have another exciting RaspberryPi project brewing where Splunk could help you collect and understand the data?

Let me know in the comments – I’d love to see what you’re working on!

Delegated admin


The role hierarchy in Splunk allows a user who has the ‘edit_user’ capability to create other Splunk users and grant them any role, including admin. But what if you want to delegate user creation to a ‘mini-admin’ who should be able to create only users, but not more admins?

Starting with 6.2, we have the concept of a delegated admin, who can create users that may only belong to a pre-provided list of roles. This is a way of enforcing the principle that users can only create other users with privileges that are a subset of their own.

Let us see how this can be achieved.

Step 1 – Create a new role with the ‘edit_user’ capability and pass in an additional attribute called ‘grantable_roles’ at the time of role creation. You can do so using curl or ‘splunk _internal’.

Delegated_Admin_User

Here, we have created a new role called ‘delegated_admin’. A user belonging to this role can create users, but these users have to belong to the user or power role.
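
As a rough sketch of the curl approach (the host, credentials and grantable role list below are placeholders), the role can be created against the authorization/roles REST endpoint:

$ curl -k -u admin:changeme https://localhost:8089/services/authorization/roles \
    -d name=delegated_admin \
    -d capabilities=edit_user \
    -d grantable_roles=user \
    -d grantable_roles=power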

 

Step 2 – Create a user for that role. Let us call the new user ‘delegated-admin’.

Create_delegated_admin_user

 

Step 3 – User ‘delegated_admin’ now creates new users.

Delegated_Admin_Creates_User

 

But he is prevented from creating users outside the set of ‘grantable_roles’. Thus, a delegated admin cannot build a new user with permissions that he himself does not already have.

delegated_admin_error

 

Protocol Data Inputs


It must have been about a year ago now that I was talking with a data scientist at a Splunk Live event about some of the quite advanced use cases he was trying to achieve with Splunk. That conversation seeded some ideas in my mind; they fermented for a while as I toyed with designs, and over the last couple of months I’ve chipped away at creating a new Splunk app, Protocol Data Inputs (PDI).

So what is this all about? Well, to put it quite simply, it is a modular input for receiving data via a number of different protocols, with some pretty cool bells and whistles.

pdi

 

So let’s break down some of the features.

Core Architecture

PDI is implemented as a modular input, but the internal architecture relies on a little bit of special sauce: a framework called Vertx is utilized under the hood. I came across this framework because in a past life I had a great deal of success building very robust and scalable applications using Netty, and Vertx builds upon Netty. So I have become a bit of a fanboy of this ecosystem :)

This framework provides for an implementation that is:

  • asynchronous
  • event driven (reactive)
  • polyglot (code custom data handlers in numerous different languages)
  • non-blocking IO
  • scales over all your available cores
  • can serve high volumes of concurrent client connections

Server

So what sort of protocols can PDI establish servers for?

  • TCP
  • TCP with TLS, optional client certificate authentication
  • UDP (unicast and multicast)
  • HTTP (PUT and POST methods only; data in request body and file uploads)
  • HTTPS (PUT and POST methods only; data in request body and file uploads), optional client certificate authentication
  • Websockets
  • SockJS

Event Bus

When the server receives some raw data (bytes), it does not alter the data in any way. It simply receives the data and places it on an internal event bus. On the other side of this event bus is a data handler that you configure for each stanza you set up. It is then the job of the data handler to take these raw bytes and do something with them, i.e. turn them into text and output it to Splunk.

Data Handler

The way in which the Modular Input processes the received raw data is entirely pluggable with custom implementations should you wish.

This allows you to :

  • pre process the raw data before indexing
  • transform the data into a more optimum state for Splunk
  • perform custom computations on the data that the Splunk Search language is not the best fit for
  • decode binary data (encrypted, compressed, images, proprietary protocols, EBCDIC, etc.)
  • enforce CIM compliance on the data you feed into the Splunk indexing pipeline
  • basically do anything programmatic to the raw byte data you want

To do this, you code a Vertx Verticle to handle the received data.

These data handlers can be written in numerous JVM languages.

You then place the handler in the protocol_ta/bin/datahandlers directory.

On the Splunk config screen for the Modular Input there is a field where you can then specify the name of this handler to be applied.

If you don’t need a custom handler then a default handler is used. This simply rolls out any received bytes as text.

To get started, you can refer to the default handler examples in the datahandlers directory.

Polyglot

As mentioned above, you can write your data handlers in numerous languages.

Currently available languages and their file suffixes are:

  • Javascript .js
  • CoffeeScript .coffee
  • Ruby .rb
  • Python .py
  • Groovy .groovy
  • Java .java (compiled to .class)
  • Scala .scala
  • Clojure .clj
  • PHP .php
  • Ceylon .ceylon

Experimental Nashorn support is included for js and coffee (requires Java 8). To use the Nashorn JS/Coffee engine rather than the default Rhino engine, edit protocol_ta/bin/vertx_conf/langs.properties.

A DynJS Javascript language engine is also available. This means that you can use Nodyn to run your Node.js scripts as Vertx Verticles. Check out this blog from the Nodyn team.

Scalability

Due to the nature of the async, event-driven, non-blocking architecture, the out-of-the-box default settings may well suffice for you. But you can certainly “turn the amp up to 11” if you need to.

In the above diagram, you can see just one instance each for the server and the data handler. However, if you want to achieve more scale and more effectively utilise your CPU cores, you can declaratively add more instances.

TLS

This is provisioned using your own Java Keystore that you can create using the keytool utility that is part of the JDK.

Client Authentication

Client certificate based authentication can be enabled for the TLS channels you setup.

Vertx Modules and Repositories

Any required Vertx modules, such as the various language modules for the polyglot functionality, will be dynamically downloaded from online repositories and installed in your protocol_ta/bin/vertx_modules directory.

You can edit your repository locations in protocol_ta/bin/vertx_conf/repos.txt

But we already have TCP/UDP natively in Splunk

Yes, we do. And by all means use those if they suit your use case. But if you want to perform some custom data handling and pre-processing of the received data before it gets indexed (above and beyond what you can accomplish using Splunk conf files), then this modular input presents another option for you. Furthermore, it also implements several other protocols for sending data to Splunk.

Links

View PDI Source and example Data Handlers on Github

Download PDI App from apps.splunk.com

Vertx

Presentation by Tim Fox (Tim is employed by Red Hat where he is the creator and project lead for Vertx)

What Next?

Download it, use it, tell me how to make it better, create data handlers in your language of choice and share them with the community. But most importantly… get crazy and innovative with your data!

Stay-Hungry-Stay-Foolish

 

The Bank of Splunk


Spend by City

No, we’re not diversifying into a financial services company…

I recently received a letter from Her Majesty’s Revenue and Customs. If you’re reading from the US, they perform many of the same duties as the Internal Revenue Service. Thankfully it wasn’t a demand for unpaid taxes, but a breakdown of how my taxes had been spent over the previous year on things like education and welfare.

For a long time I’ve wanted to quantify my monthly financial accounts, similar to this letter, starting from when I first opened my bank account. Unfortunately in the UK we don’t have a product that works like MINT to do this just yet… but we do have Splunk.

Using Splunk I’ve now started to track my bank account activity, as well as other financial service products I’m signed up to. One of these is Nexonia, an expense tracking application we currently use at Splunk. Here’s how…

Expense Data

Nexonia Report

To get Nexonia data into Splunk I used the report function. I created a report to grab all my previous expenses, selecting the output as CSV. I then took this CSV and imported it into Splunk. Simple, right?
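
Once the CSV is indexed, the breakdowns below fall out of simple stats searches. For example, spend by city might look something like this (the sourcetype and the City/Total field names are assumptions that depend on how your Nexonia report is laid out):

sourcetype="nexonia_expenses" | stats sum(Total) as TotalSpend by City | sort - TotalSpend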

What Did I Discover?

Bank of Splunk Type

I’ve spent the most money in London, followed by Dubai and San Francisco. However, the data for London is skewed, as most of my airfare is booked out of London.

Accommodation accounts for 35% of my total spend. Airfare is 24%, with all other items fairly evenly spread between 2% and 4% of total spend.

I spent the most money in July 2014, almost double the second-highest month (October 2014).

There is a correlation between Foursquare check-ins and the total amount of money spent.

The total amount I’ve spent is… a secret!

Try it yourself

Splunk Nexonia Dashboard

If you’re a Nexonia user you can grab my Splunk searches here. Most other services offer CSV export functionality and will work just as seamlessly with Splunk.

Where it gets interesting is when other data sources are added to Splunk. I used Foursquare check-ins, but what effect does the weather have on spending, for example?

I’m going to be using Splunk to track my finances each month to help me budget better. I’ll let you know how I get on!

Building a great Splunk App for Apptitude


How do I build an app that’s going to stand out as the best among an intensely competitive pool? That’s a question that’s on a lot of minds as Splunk Apptitude gets rolling.

Splunk has introduced a program that rewards the best Splunk App in two categories, with a big cash payout. Apptitude is getting the attention of a lot of users and partners, Splunkers who may have created apps for their own purposes, but who never considered submitting their work to the Splunk Apps site.

So, what does it take to earn glory, karma, and the admiration of your peers? All you have to do is create and publish a solid, winning Splunk app in one of the contest categories! Requirements for packaging, distribution and directory hierarchy are described on the Resources page.

What does it take to earn cash, or the chance at a free .conf pass? Getting to the top requires a little more attention to detail, and your app has to really stand out from the crowd. For those who are setting their sights on a win, below are a few tips that might help. These tips come from a guide that details many best practices for building a Splunk app, which we have collected into one place for easy reference:

  • Consider building with a team. The time period for this first round of Apptitude is short. More hands make for faster iteration—plus, it’s more fun!
  • If you’ve never submitted an app to Splunk Apps before, consider using the app template embedded in Splunk. It is at Apps>Manage Apps>Create App. Using this template will help you keep the elements of your app in the right locations.
  • Make sure you read the Package your app or add-on page in our docs, and follow all of the advice there, such as testing knowledge object permissions. Before you submit the app, try it on a clean Splunk install and make sure everything is accessible by a non-admin user.
  • Use the Common Information Model so that data from your app can be seamlessly integrated with other data sources and apps. Here’s a recent blog post with links to several CIM resources.
  • Check out the Dashboard Examples app. This is especially important if you are a veteran Splunker. You are likely to find some kickass stuff in that app which you didn’t even know existed! This app reads like a cookbook, and has complete SPL and XML sample code for every example.
  • Pay close attention to naming conventions.

And watch out for little things like:

  • Make sure to parameterize index names into eventtypes or macros (see the sketch after this list). That way you only have to update them in one place instead of modifying every query that uses them.
  • Store all your customizations in default, not local.
  • Don’t hardcode the paths in a data collection script. Use the environment variable such as $SPLUNK_HOME to be resilient to Splunk configuration changes and platform differences.
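
As a minimal sketch of the first tip (the macro and index names are made up for illustration), a search macro defined in macros.conf keeps the index name in one place:

[myapp_index]
definition = index=my_app_data

Searches in the app then call `myapp_index` instead of hardcoding index=my_app_data, so changing the index later means editing a single stanza rather than every query.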

We also have these books to recommend:

Last but not least, if you get stuck, or just want to help others, collaborate using the Splunk Answers site, or any of our other community resources! We’ve even created an Apptitude tag, which can be used as an FAQ for the contest itself.

Happy Splunking!


Making Sense: Manufacturing, Splunk and Industrial Data


Recently, in the online publication Manufacturers Monthly, Denise Carson published a piece called “Harnessing Operational Intelligence”, and really made the case for using big-data and platforms like Splunk to deal with “rising costs and the tyranny of distance”. Denise explained that operational intelligence has the potential to help manufacturers do things smarter and remain competitive in the face of massive volumes, velocity, and variety of data.

In the same week, in the “Smart Business” section of the Chinese language ITHome.com, Yu Zhihao wrote about how a Korean semiconductor company was using Splunk and big data to perform real-time analysis of the semiconductor production line, and was quickly getting to the bottom of production issues through advanced analytics and real-time alerting.

In that article (open in Chrome to translate from Chinese), details are presented on what is really a very advanced application of Splunk. As you may know, Splunk Enterprise is a highly extensible machine data platform. Just as Splunk’s core framework was extended for applications including Enterprise Security and PCI Compliance, customers and partners are now recognizing that by harnessing the core capabilities of Splunk, and by providing some code extension and domain expertise on top of the platform, you can build world-class solutions for specific domains, even in emerging areas like the internet of things, industrial environments and manufacturing. Let’s look a little further into how this specific manufacturer built a manufacturing management solution with Splunk.

Manu1

First, the semiconductor manufacturer uses Splunk to monitor 300GB of daily log data from the Tibco middleware already installed in their environment. While it appears that this data is loaded through file system monitoring and a Universal Forwarder, loading data from TibcoEMS and other message queues could also be done in real-time with the JMS Messaging Modular Input. There are other message queue modular inputs built for Splunk to index data from Amazon Kinesis, Apache Kafka, MQTT, SNMP, and AMQP; please check out our IoT Solutions webpage for more information on accessing data through these methods. You can also easily access real-time industrial data from manufacturing equipment through the Kepware Industrial Data Forwarder for Splunk.

Splunk ingests this data into a tier of indexers, and a set of rules (close to 300 according to the article) run in Splunk to identify abnormal manufacturing process steps and alert facility operators of the issue. Operators can then re-adjust the machine parameters to improve yield (2% reported) and to reduce raw material cost by 5%.

In another facility, an integrated circuit (IC) packaging and testing plant, quality assurance (QA) data from the testing equipment is entered into Splunk from a summarization layer for search, exploration and analytics.

manu2

Through a brilliant extension of Splunk with the R Project for Statistical Computing (see the R Project App for Splunk) and SciKit-Learn, novel complex event processing and machine learning capabilities were built into the solution to enable advanced statistical processing of the QA data. Insights gained through this approach apparently reduced IC packaging defect rates by 5%!

As Splunk continues to be applied to novel applications, there will be a continued demand for development on Splunk’s machine data platform to provide solutions based on specific domain expertise. Manufacturing is just one of the areas where we are seeing interest in and acceleration of Splunk adoption. Oil and Gas Production, Smart Buildings, and Transportation are all areas where operational intelligence is gaining momentum, and there will be clear demand for applications and solutions built for these industries. As Denise Carson noted, Splunk and operational intelligence have the potential to truly break down data silos and provide “a pragmatic approach” for industries to “optimize and innovate for sustainability and competitive advantage”.

As a catalyst to innovation in all areas of operational intelligence, Splunk is holding the Apptitude contest, an online competition for the next big app in Splunk. I’d love to see someone build a market-changing application for industrial data and win this thing. Think of the possibilities – automated fault detection and diagnosis? Some advancement in rules processing? A unique visualization platform for industrial environments? The possibilities are only limited by your imagination. And to spark that imagination, and a bit of action, we are offering a 1st prize of $20,000 cash and a free trip to .conf2015. Competition will be stiff; let’s see what you have!

Any questions, concerns, thoughts, or results sets from general musings? I can always be reached at bgilmore@splunk.com, through our IoT and Industrial “Ask the Experts”, on Twitter @BrianMGilmore, or on LinkedIn at http://www.linkedin.com/in/industrialdata

Popular Cisco Networks App Recognized with Splunk “Revolution Award”


The first inkling I had of the usefulness of the Cisco Networks App for Splunk Enterprise (formerly Cisco IOS) came from a Cisco field team who helped their customer get the app working and immediately identified multiple issues with flapping ports. In the months that followed I’ve had the pleasure of getting to know Datametrix senior consultant, Splunk app developer and general rock star Mikael Bjerkeland.


Godfrey Sullivan, Chairman and CEO, Splunk Inc. (L) congratulates Mikael Bjerkeland, Sr. Consultant at Datametrix AS (R) on his 2014 Revolution Award

At .conf2014 Mikael was recognized with a much-deserved Splunk 2014 Revolution Award. ComputerWorld Norway profiled the award and the Cisco networking app in a fantastic article (“Norsk programvaresuksess”) that anyone using Splunk and Cisco networking gear should read.

For folks who don’t speak Norwegian, here’s a quick recap …

Several years ago Mikael was inspired to start experimenting with Splunk after attending a Splunk app development session. Norwegian Cisco Gold Certified Partner Datametrix has extensive expertise in designing and delivering end-to-end turn-key networking, datacenter, collaboration and security solutions for local organizations. Roughly 80% of Datametrix’s business is related to Cisco products & services, and Mikael works closely with customers using a range of Cisco networking gear and network management systems such as Cisco Prime Infrastructure. He has led a number of projects focused on manual task automation.

As a result, Mikael is intimately familiar with the challenges associated with managing complex networks and need for centralized visibility across switches, routers, wireless devices and other infrastructure components. Mikael explains in the ComputerWorld article,

“I noticed that despite how widely Cisco routing and switching technologies have been deployed there wasn’t any Splunk app for Cisco IOS. This seemed like a huge void, so I decided to build one.”

“I want customers to engage in managing their devices and not just forget them immediately after putting them into operation.”

Mikael published his Cisco networking app to Splunk Apps in February 2013, with no promotion or fanfare. Requests for installation support and enhancements plus simple notes of thanks for an “excellent” app poured in.

The app provides simple yet valuable functionality, according to Mikael.

“My app extracts specific fields for values contained in the log file and runs a number of analyses on events, fault codes and other data. Users can correlate this with other log data and look at trends over time to identify network errors before systems are impacted. You simply can’t get this with the naked eye by logging in and looking at raw log files.”

Today, Mikael’s Cisco networking app covers a variety of Cisco networking devices including switches, routers and data center switches that support the standard Cisco format for syslog. This includes devices using IOS, IOS XE, IOS XR and NX-OS plus the Cisco WLC WLAN controller. The app has been regularly updated; the 2.0 release is compatible with Splunk Enterprise 6.2 and 6.1, fully CIM-compliant and includes a variety of enhanced dashboards and drilldowns, and Smart Call Home support. Check out a cool demo here:

With more than 7000 downloads, Mikael’s Cisco app ranks as one of the most popular Splunk apps.

Pretty cool, eh?

A hearty congratulations to Mikael and Datametrix … can’t wait to see what you do next!

Learn more:

Make it flash! Make it flash!


Splunk Traffic Lights

Splunk ships with some really neat visualisation options, from bar charts to gauges. Though sometimes they just don’t fit your requirements.

Whether that be something as simple as a custom icon or a super-slick D3 visualisation, Splunk’s framework makes it really easy to display your data in any number of ways.

One of the things I get asked a lot is: “Can we have a traffic light?”. The answer – yes! Let me show you how to light Splunk up in this post.

The Basics

You might have used Splunk’s built in rangemap command. It’s pretty awesome. You can group results based on defined ranges within your search. For example:

... | rangemap field=count low=0-0 elevated=1-100 default=severe

This example sets the value of each event’s range field to “low” if its count field is 0 (zero); “elevated”, if between 1-100; “severe”, otherwise.

Using rangemap we now have a great foundation to start playing with how the search results are displayed.

For example, using a Single Value Panel in a dashboard you can add colour to the value depending on its severity. Red for bad, green for good. You get the idea.

Splunk Single Value Decorations

Splunk will do this automatically when you use the default ranges (low, guarded, elevated, high and severe) because the default Splunk CSS has already mapped these categories to the colours: green, blue, yellow, orange and red, respectively.

You can make your own categories by editing the application.css for a particular app. For example this defines a colour for a Single Value Panel:

.SingleValue .purple {
        background-color: #660066;
        color: #ffffff;
 }

Here we have defined a CSS class that can be referred to by the range name purple (i.e. purple=100-200).
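
For example, extending the earlier rangemap search with this new category could look like the following (the thresholds are just illustrative):

... | rangemap field=count low=0-0 elevated=1-99 purple=100-200 default=severe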

Just remember that you will need to edit the dashboard Single Value Panel XML to make the colour show up. In simple XML:

<option name="classField">range</option>

You promised traffic lights!

The next step is to associate an image to a range. We can do this in CSS like this:

.single-value {
	 background-repeat: no-repeat;
     padding-left: 60px; /* Push the text over so that it doesn't sit on top of the image. Change this according to the dimensions of your image. */
     padding-top: 40px;
     height:100px;
}

.single-value.low{
     background-image: url('green-light-39x100.png'); /* Replace with your image. See http://goo.gl/yxW7O */
 }

Save the stylesheet and traffic light image in:

$APPHOME > appserver > static

Then all you need to do is reference the stylesheet in your dashboard's XML:

<dashboard stylesheet="YOUR_STYLESHEET_NAME.css">

This code will show the image "green-light-39x100.png" in a Single Value Panel when the result falls within the specified low range.

To use more lights than just green we just need to add new images to the other range classes. This will switch the image displayed as the output value moves between ranges.
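For example, assuming you have saved amber and red images alongside the green one (the filenames below are placeholders), the extra classes might look something like this:

.single-value.elevated {
     background-image: url('amber-light-39x100.png'); /* Replace with your image. */
}

.single-value.severe {
     background-image: url('red-light-39x100.png'); /* Replace with your image. */
}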

To make things easier for you, I've packaged all the setup instructions and example code into an app you can clone to build your own apps from. You can grab it here.

Once you’ve made it through the traffic

Splunk Dashboard Examples

You should also download the Dashboard Examples app. This app contains guided instructions on how to build tons of new visualisations and views into your Splunk dashboards.

SSSL (Splunk Secure Sockets Layer)


Splunk SSL

The primary reason why SSL is used is to keep sensitive information sent across the internet encrypted so that only the intended recipient can understand it.

This is important because the information you send on the internet is passed from computer to computer to get to the destination server. Any computer in between you and the server can see your credit card numbers, usernames, passwords, Splunk searches and other sensitive information if it is not encrypted.

When an SSL certificate is used, the information should become unreadable to everyone except for the server you are sending the information to. This protects it from possible prying eyes.

It is often important to make sure the connection between Splunk Web and the browser you're searching from is encrypted. Using an SSL certificate to do this is a piece of cake. Here's how.

Create a new private key and certificate signing request

SSL CSR

Before obtaining a certificate you'll need to generate a certificate signing request (CSR). DigiCert's OpenSSL CSR wizard can build the OpenSSL command for you, or you can write it yourself.

Once you've generated the command, just paste it into your terminal. Here's an example command:

$ openssl req -new -newkey rsa:2048 -nodes -out my_domain.csr -keyout my_domain.key -subj "/C=GB/ST=w9/L=London/O=Splunk/OU=Dept. of Awesome/CN=himynamesdave.com"

You will see both a .csr (CSR) and .key (Private Key) file have been created and stored in the current working directory.

my_domain.csr
my_domain.key

You'll then want to convert the private key (.key) to an RSA private key by navigating to the directory the key file is stored in and running the following command:

$ openssl rsa -in my_domain.key -out my_domain.rsa.key

Purchase the certificate

CSR Content

You can then go ahead and choose a certificate to purchase. During the registration phase you will need to provide the content of the CSR (my_domain.csr) to the certificate authority who will then create a new server certificate and sign it. Most certificate providers will walk you through this process.

Once this is complete the certificate authority will issue your certificate. You’ll probably receive 2 files from them that look something like this:

my_splunk_domain.crt
my_bundle.crt

Sometimes you will also receive intermediate certificates. In this case you need to bundle the intermediate and server certificates into a single certificate by concatenating them together (the right type, in the right order – server certificate first, then the intermediates) and setting that as the server certificate (my_splunk_domain.crt).

Splunk uses .pem certificate files, not the .crt files the certificate authority is probably going to provide. We therefore need to concatenate the .crt files provided by our certificate authority into a single .pem file that Splunk will understand. You can do this by running:

$ cat my_splunk_domain.crt my_bundle.crt > my_splunk_bundle.pem

Configure Splunk SSL

Copy the .pem file and the RSA private key (my_domain.rsa.key) to the following directory in your Splunk instance:

$SPLUNK_HOME/share/splunk/certs/

Now that the files are in place, we need to tell Splunk to accept connections over SSL and where to find the .pem and .key files. To do this, edit web.conf here:

$SPLUNK_HOME/etc/system/local/web.conf

With the following code:

[settings]
httpport = 443
enableSplunkWebSSL = 1
privKeyPath = /certs/my_domain.rsa.key
caCertPath = /certs/my_splunk_bundle.pem

After a quick restart of Splunk, SSL over port 443 should be enabled, allowing users to access Splunk Web via a secure connection.

This should work for most browsers. In some cases certificates provided by unknown authorities may be flagged.

If you run into problems, check that port 443 is open to receive connections – this stumped me for some time! P.S. Splunk Answers is also a wealth of knowledge :)
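If you want to double-check what certificate chain Splunk Web is actually serving, a quick test from the command line (the hostname below is a placeholder) is:

$ openssl s_client -connect splunk.example.com:443 -showcerts

If the handshake completes and the chain includes your CA bundle, the browser side should be happy too.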

SMail: Splunking Your Inbox


Splunk GMail

Google sent me a nice message to start the year – “Your inbox is reaching its limit”.

Looking at my GMail inbox I have well over 70k emails, taking up just under 15GB of space. I’m interested in how this number is made up – who emails me the most, who I email, what time I’m most productive, etc.

I decided to download my GMail archive using Google Takeout to analyse the data. Here’s how I did it.

Download Your Inbox

Google Takeout

First, use Google Takeout to download your GMail mailbox. Depending on the amount of emails you have accumulated this might take a while. My ~15GB took about an hour.

Once complete, Google will give you a .zip file. Download and unzip it. You should see a file named something like “<my_gmail_inbox>.mbox”.

Upload the .mbox file to Splunk

If you're confident editing props.conf directly, ignore the next paragraph.

Using the file uploader in Splunk, select your .mbox file using the option “Preview Data Before Indexing”. We will use the data preview to teach Splunk what .mbox events look like so that they are indexed correctly.

Using the “Advanced Mode” tab you can create the props.conf in the GUI. To get data indexing correctly, I suggest a props.conf structure similar to the following:

[gmail-mbox] #remove this line if using the Splunk GUI "Advanced Tab"
MAX_EVENTS = 100000
BREAK_ONLY_BEFORE = From\s.+?@
MAX_TIMESTAMP_LOOKAHEAD = 150
NO_BINARY_CHECK = 1
TRUNCATE = 100000
MAX_DAYS_AGO=3652

Let me describe what is being set here:

  • MAX_EVENTS = Specifies the maximum number of input lines to add to any event. Example="100000". Default="256". Some of my messages were well over 1,000 lines, so I set this far higher than the default.
  • BREAK_ONLY_BEFORE = Splunk creates a new event if it encounters a new line that matches the regular expression set. Example="From\s.+?@". This breaks the GMail events in the correct place (before lines starting "From xxxx@…").
  • TRUNCATE = The maximum line length (in bytes). Example="100000". Default="10000". The 100000 used in this example is unlikely to be exceeded unless a really messy message turns up.
  • MAX_DAYS_AGO = Specifies the maximum number of days in the past, from the current date, that an extracted date can be valid. Example="3652". Default="2000". Given that I had messages older than 5 years (1826 days), I increased this to 10 years (3652 days).

More information can be found in the docs here. You should read them :)

Indexing the data

Splunk GMail
Now all you need to do is set this input as a new sourcetype (in the props.conf above I’ve used “gmail-mbox”) and then upload the file into Splunk.

A simple search for “sourcetype=gmail-mbox” should show all your events indexed and broken apart nicely.

As you can see from the screenshot above, the events can vary quite drastically, e.g. from a 21-line event to an 821-line event. I have a number of events that are thousands of lines long (mainly the result of email bodies filled with HTML).

The histogram returned immediately gives us a good indication of month-on-month message volume. Note, this search shows both sent and received messages from your GMail account.
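If you want that month-on-month view as an explicit search rather than reading it off the event histogram, something like this should do it (assuming the gmail-mbox sourcetype used above):

sourcetype="gmail-mbox" | timechart span=1mon count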

Field extraction

You’ll see that fields will not have been extracted correctly from your events, so we need to teach Splunk what this new .mbox format looks like.

For this first exercise I am only interested in the "labels", "to", and "from" fields. Here are the extractions I used in my props.conf:

[gmail-mbox] #remove this line if using the Splunk GUI "Advanced Tab"
... # variables set earlier
EXTRACT-gmail-mbox-labels = X-Gmail-Labels\s*:\s*(?P<gmail_labels>[\w]+,[\w]+)
EXTRACT-gmail-mbox-from = From\s*:\s*(.*?)(?P<gmail_from>[\w]+@[\w]+\.[\w]+)
EXTRACT-gmail-mbox-to = To\s*:\s*(.*?)(?P<gmail_to>[\w]+@[\w]+\.[\w]+)

As you can see my regular expression skills are weak and I’m sure you can improve upon these extractions. I needed a fair bit of help just to get this far.

If anyone wants to share how they would pull out fields from GMail’s .mbox file format (or similar email format for that matter), join the conversation over on Splunk Answers or leave a comment on the post. Lots of kudos on offer :)

Search on

Splunk Gmail Top Senders

Here are some example searches to get you started.

Number of emails you’ve sent

sourcetype="gmail-mbox" gmail_labels=*Sent* | stats count

People you’ve received the most emails from:

sourcetype="gmail-mbox" NOT gmail_from=my@email.com | top limit=10 gmail_from

People you’ve sent the most emails to:

sourcetype="gmail-mbox" NOT gmail_to=my@email.com | top limit=10 gmail_from

To do

  • More interesting queries
  • Fine tune existing extractions
  • Add more extractions

A custom search command for Yelp


A while ago we posted on search commands and how to build a basic generating command which creates dummy “Hello World” events. Generating commands can be used for much more including talking to external APIs. For example, a fun command to think about would be allowing you to search for restaurants, theaters, etc using Yelp’s API. We’ve posted a sample Yelp search command that does just that. You can find it on github here.

Using the command you can do things like search for Sushi and Italian restaurants in SF:

| yelp location="San Francisco" term=sushi,italian

Or if you are an adventurer, you can find out where to make that next skydive when you visit New Zealand :-)

| yelp location="Auckland, New Zealand" term="Sky diving"

If you clone the repo you’ll get all the source for the command to see how you can implement one. In the readme, you will see the details for setup and usage.

Enjoy building custom search commands and happy yelping!

Notes on Splunk CIM


So you want to work with the Splunk Common Information Model, and you’re not sure where to start… developers first working with the CIM and Add-ons are sometimes confused by its minimalist design, particularly if they’re familiar with the broadly used Desktop Management Task Force CIM. Here’s some notes on the CIM’s design that hopefully will help clear things up. First, we’ll look at how it’s used, and then we’ll talk about why the Splunk CIM is designed the way that it is.

The Splunk CIM describes concepts via tags rather than entities via database columns, and the first thing to understand when you're trying to work with it is the event type. Events are the raw material that we work with, including metric measurements or inventory reports, which are just another type of event to Splunk. Events don't necessarily have neat definitions of what it all means though, so we use the event type to recognize events and tags to label them. The CIM's data models then read tagged events and collect recognizable fields to model so that an app can more easily use these events.

It's also important to realize that a given data source might involve lots of data models. For instance, a network firewall will of course produce events for Network Traffic, but it's also got something to say to Authentication, Change Analysis, Inventory, and Performance models. And if it's a modern firewall with deep packet inspection capabilities, it's probably got awareness that's useful to the Web and Malware models too. You can follow this idea as deeply or shallowly as you need… for instance the firewall may generate alerts, but that doesn't mean you have to model them to the Alerts data model in CIM if you don't want.

The takeaway here is that the Splunk CIM doesn't look at a data source and build a data model to describe that source. Instead, it describes a loose set of concepts and lets people model their data to those concepts where it makes sense.
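To make that concrete with a hypothetical example (the sourcetype and search below are invented), an Add-on for a firewall might declare an event type in eventtypes.conf and tag it in tags.conf so the Network Traffic data model can pick the events up:

# eventtypes.conf
[acme_firewall_traffic]
search = sourcetype=acme:fw:traffic

# tags.conf
[eventtype=acme_firewall_traffic]
network = enabled
communicate = enabled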

Let’s look at this concept more closely as an app developer. If I want to see something in an app that I’m building, I start at a use case, which depends on a series of ideas:

  1. obviously, I need a use case — the report, correlation search, key indicator, or alert that I want my user to see. This can be done in a lot of ways, but the easiest and most maintainable option is to write a search that tests against structured data for an expected outcome.
  2. that means the next thing I’ll want is a data model, or a structured representation of the attributes and objects that the use case is going to test. As David Wheeler wrote, “All problems in computer science can be solved by another level of indirection.” The extra effort of abstracting raw data to a data model is worthwhile because it provides a stable interface, so that I can deal with changes in the raw data by having an Add-on or set of Add-ons apply Splunk’s late-binding schema to the data sources. It also means that supporting the app is easier, because I can separate problems of “getting data in” from problems of “looking at the data”.
  3. so, I also need an Add-on, or a set of searches and regular expressions that will tag and name the raw data for use in the data model. Luckily there are lots of these on apps.splunk.com, and they’re easy to build as well.
  4. Technically I’m all done solving my problem now, but in Splunk supported apps we also write an eventgen config and a unit test, using samples of data that represent the condition that the use case is looking for. That way we know immediately when a logic error or platform change causes something to break.

When one first looks at this stack of action items, it’s easy to think that a given data model, such as Network Traffic, must therefore include every attribute that the use case might ever want. For instance, if I am doing a Cisco Security panel, I will want to report on the dozens of complex decisions that Cisco gear has made, not the simplistic Network Traffic model. Expanding the Network Traffic model to cover all the attributes available from all sources of network traffic would be missing the design goal of the CIM, in my opinion. The Splunk CIM is a least common denominator of all network traffic sources — it’s simplistic, and therefore slim. It’s easily used by people who aren’t deeply familiar with the underlying technologies, and because it’s built on Splunk, this least common denominator approach doesn’t lose anything from the rich raw data sources. In other words, CIM’s Network Traffic model can be used to understand Stream captures, NetFlow data, Cisco logs, and Check Point logs at the same time, and any interesting indicators can be followed up by drilling into the full fidelity data.
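As a sketch of what a use case search against the model (rather than against the raw events) can look like, the following uses the CIM's Network Traffic model; the blocked-traffic threshold is arbitrary:

| tstats count from datamodel=Network_Traffic where All_Traffic.action=blocked by All_Traffic.src
| where count > 100

The same search keeps working whether the events behind it came from Stream, NetFlow, Cisco, or Check Point, as long as an Add-on maps them to the model.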

Of course, there are tradeoffs in using a least common denominator model, and it might be instructive to review a different approach such as the Desktop Management Task Force Common Information Model mentioned at the beginning of this post. The DMTF CIM is different from ours in that it’s hierarchical in nature, like a traditional database. In grossly simplified terms, there’s a master entity node like “computer” or “asset”, and then sub-nodes which describe things like “network” or “cdrom”. At the top is the ultimate node that everything inherits from, and everything must fit into an a priori entity model before you can work with it.

This means that each section of the DMTF CIM model is very complex. Quoting from the DMTF CIM tutorial: “The Device Model will not be reviewed in its entirety – because its scope is too large, addressing all the various aspects of hardware functionality, configuration and state. In fact, the Device Model can be broken down to individual components (cooling and power, processors, storage, etc.) that are managed individually.” This is necessary because most possible configurations have to be defined in the structure, and it can easily lead to implementations that contain leftover concepts like columns that describe the type of floppy drive installed in your servers.

By describing the least common denominator of concepts in individual models containing loosely structured knowledge objects, the Splunk CIM can keep it very simple and stupid, and our equivalent of Device Model is described in a single web page. In other words, we shift work from the model architect to the application developer because it allows that developer greater flexibility. By keeping the complexity down and using datamodels as pointers to raw data, we keep performance up, which is a good thing for everyone.


Splunk App for SalesForce


Do you manage a Salesforce environment and would like to analyze who is accessing what? Would you like to find out who is exporting sensitive data? Would you like to detect any Salesforce related suspicious activities or any slow running reports, dashboards, SOQL queries?

If the answer to the above is yes, you should check out the Splunk App for Salesforce which has been recently released as a service on Splunk Cloud. This App relies on the Salesforce Event Log File that exposes Salesforce access logs. In addition to that, you can also leverage this app to collect and index any data from the standard Salesforce objects. In other words, you can use this app to index structured and unstructured salesforce data.
For a quick peek at the app, check out the Splunk App for Salesforce Demo

 

Architecture

 

In a nutshell, this App provides deep insight in the following areas:

• Application Management: You can view various dashboards that let you detect slow running Salesforce reports, slow running dashboards, stale or unused reports. You can also have deep insight into your APEX backend performance such as slow running classes, SOQL queries, triggers, VisualForce pages and much more.

 


 

• Adoption and Usage Analytics: You can use this app to perform trend analysis for all access to Salesforce by user, group, regions. You can find out what browsers, platforms/OS (mobile or PC) your users are connecting from. For example, you can also leverage this app to detect if some access related issues are caused by old/unsupported versions of the browser.

 

Browser Analytics

 

• Security: You can detect security threats by analyzing login patterns and also trigger alerts if there are, for example, high login requests from a given IP. The app can also prevent data loss by monitoring Report exports, accessed documents, previews, etc.

 

Data Export/Access

 

• Chatter feed: You can also automatically trigger Chatter feed entries for alerting your Salesforce admins of all anomalies.

 

Chatter feed alert

 

You can test drive the app for free by signing up for the Online Sandbox. The setup is pretty straightforward and should take you a few minutes, provided you have the right access to your Salesforce instance. Make sure you have met all the prerequisites listed in the app documentation.

Stay tuned for more. Happy Splunking!

git commit -a -m “Splunking Github Blog”


Github Splunk Analysis

I <3 Github. Splunk <3’s Github (check out our repos here). I am told it is just a coincidence our HQ is opposite theirs.

One of the neat things about Github I am just starting to explore is their API. You can use it to do loads of things, from interrogating user activity to searching for keywords within code. I recently saw this analysis of the most popular programming languages hosted on Github and I was inspired to recreate it within Splunk.

Indexing Github data into Splunk makes it super-simple to start exploring it. In this post I wanted to show you some of my first experiments connecting Splunk into the Github API.

The Prep Work

Github Token

First download and install the Github Modular Input. This will enable us to make the API calls to Github.

Now you'll need to grab a Github token. This avoids the stricter rate limiting imposed on unauthenticated requests to the API. To do this: log onto github.com > settings > applications > generate new token

Store it somewhere safe.

And that’s it :)

Add an Input

Github Input

In the Splunk GUI head to: settings > data inputs > github commits > add new

In this example I am going to be querying a repo from our own Splunk org. I’ll use our Javascript SDK repo.

owner: splunk
repository: splunk-sdk-javascript
token: <YOUR_TOKEN>

And that’s it :)

Start searching

Github Splunk Search

If you use the example above, a basic search should return 100 results, matching the per_page value set in the API call.

source="source="github_commits://github-commits""

Here’s some other simple searches we can immediately run on this dataset:

The most active users in the organisation:

source="github_commits://github-commits"| stats count(type) as count by author | sort - count

… or least:

source="github_commits://github-commits"| stats count(type) as count by author | sort count

Repository activity over time:

source="github_commits://github-commits" | timechart count(_raw) as activity

Repository activity over time by user:

source="github_commits://github-commits" | timechart count(_raw) as activity by author

You get the idea.

Now you’ve got the basics nailed go away and show me some cool stuff :)

Announcing the Splunk Developer Guidance


Greetings, Splunk Developer Community!

This week we are announcing the new Splunk Developer Guidance program at the Splunk Partner Summit Americas 2015. The main objective is to provide our developer community with tools and guidance to build amazing apps on the Splunk platform and enrich users' experience in gaining insights from their machine data – wherever it might come from and whatever domain they might be specializing in!

We are fully aware that the first thing most devs are looking for is code that they can take apart, learn from, and reuse. That's why we built reference apps for you. The reference apps are complete, end-to-end, real-world apps built with our partners that are meant to showcase various underlying technologies as well as good and proven practices & patterns for building Splunk solutions. There are components in the apps that you may find useful and applicable to your own use cases. Whether it's a data provider (such as one for Google Drive), a visualization widget (for example, a clustered dendrogram or activity heatmap), or a wrapper library of an underlying Splunk feature (for example, kvstore_backbone), feel free to reuse them! Reuse for the win!

The apps are available through Splunkbase, and we have also opened up our development repos, which include the evolution of both code and associated tests. They come with sample data sets and eventgen configuration. To top it off, we provide associated guidance covering architectural considerations as well as various tips and tricks to help you become more productive, competent, and successful.

Here’s how you get all this goodness:

Splunk Reference App - PAS

Splunk Reference App - Auth0

Part I of Splunk Developer Guide, “The Journey,” contains the following chapters:

It should take you fewer than 5 minutes to download and install the reference apps. Do follow the installation instructions from the Release Notes.

Reference-App-Splunk-Cloud

We are excited to announce that the Splunk Reference App – PAS also comes preinstalled in the Splunk Cloud Sandbox. Just remember to turn on the eventgen and wait a few minutes for new sample data to start streaming in.

We’d like to thank our partners (Conducive Consulting and Auth0) together with our beta users and technical reviewers who provided thoughtful feedback on the Splunk Developer Guidance. To provide feedback about this release, to get help with any problems, or to stay connected with other developers building on Splunk Enterprise please visit:

You can also contribute to our code base and revise/propose new content for the Splunk Developer Guide. We accept pull requests.

Happy coding!

The Splunk SDK for JavaScript gets support for Node.js v0.12 and io.js!


We've just released an update to the Splunk SDK for JavaScript, v1.7, with some great new features! Most importantly, support for Node.js v0.12.x and io.js.

You can get it on npm or GitHub, and docs are available at dev.splunk.com.

New features and APIs

  • Added Service.getJob() method for getting a Job by its sid.
  • Added Service.ConfigurationFile.getDefaultStanza() method for getting the [default] stanza of a conf file.
  • Can now stream JavaScript objects with modular inputs by passing an object as the data parameter to the Event constructor; that object will then be passed to JSON.stringify() (see the sketch after this list).
    • Updated the GitHub commits example to show this functionality.
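A minimal sketch of what that looks like inside a modular input's streamEvents callback (the stanza name, sourcetype, and data object below are invented for illustration):

var splunkjs = require("splunk-sdk");
var Event = splunkjs.ModularInputs.Event;

// Inside streamEvents(name, singleInput, eventWriter, done):
var event = new Event({
    stanza: name,                                 // e.g. "github_commits://my-input"
    sourcetype: "github_commits",
    data: { sha: "abc123", author: "octocat" }    // an object; the SDK passes it through JSON.stringify()
});
eventWriter.writeEvent(event);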

New Examples

  • The node/helloworld/get_job.js example shows how to get a Job by its sid.
  • The node/helloworld/endpoint_instantiation.js example shows how to access unsupported REST API endpoints.
    • Now you can roll your own endpoints!

Troubleshooting connectivity issues to Splunk’s API from the SDK


A common problem we see customers struggle with is how to diagnose connectivity issues with any of our SDKs. In this post, I’ll show you a few tried and true practices that can help you figure out what might be going wrong.

There are two main families of errors folks see. One has to do with general connectivity / connection info, and the other has to do with security config on the client.

General connectivity issues 

This means that you are unable to successfully connect to the API. The best way I find to diagnose this is to drop to a terminal and use curl to log in to the Splunk API and see the results. The command to use is:

curl -k https://[server:port]/services/auth/login -d username=xxx -d password=xxx

If you get a response like: 

<response>
   <sessionKey>G^b8zfS3YGWFSJTxY7c5fpqso4DZr6_O_z9lcXk^v...</sessionKey>
</response>

This means the API is definitely accessible. If not, then it means a number of possibilities.

  • If you get this: <msg type="WARN">Remote login disabled by 'allowRemoteLogin' in server.conf</msg> then you need to tweak the server.conf setting.
  • If you get this: <msg type="WARN">Remote login has been disabled for 'admin' with the default password. Either set the password, or override by changing the 'allowRemoteLogin' setting in your server.conf file.</msg> then you are trying to connect with admin and you have not yet changed the admin password from the default of "changeme".
  • If you get this: curl: (7) Failed connect to 10.80.9.131:8088; No error then either the URI is wrong, the port is not correct, or the port is not opened on the firewall.
  • If you get an empty reply, then you are using the wrong scheme, i.e. http when it should be https.

If you get a valid response using curl yet the SDK is still failing, then the credentials / URI passed in the code that uses the SDK could be wrong. Check your app’s configuration.
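If you are using the Splunk SDK for JavaScript, for example, it is worth double-checking that the values you pass to the Service constructor match what worked with curl; a minimal sketch (host, port, and credentials are placeholders):

var splunkjs = require("splunk-sdk");

var service = new splunkjs.Service({
    scheme: "https",
    host: "localhost",          // the same host that worked with curl
    port: "8089",               // the management port, not the Splunk Web port
    username: "admin",
    password: "your-password"
});

service.login(function(err, success) {
    if (err) {
        console.log("Login failed:", err);
        return;
    }
    console.log("Login succeeded:", success);
});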

Security configuration issues

The second family of issues relates to either certificate validation failing, or the security protocol configuration in Splunk.

Certificate validation

Depending on the SDK you are using, there is another kind of error you might see, which relates to certificate validation. By default, most platforms will automatically throw an exception when the HTTPS certificate is not valid. Splunk's default certificate is not trusted by these platforms, which causes this failure.

This can be disabled within the application code. Depending on the language/runtime stack this differs, as some require it to be done in the app setup, and others (like Node.js) allow you to do it when you make the call. For example if you are using our C# SDK, then you can turn off cert validation using code like this: 

ServicePointManager.ServerCertificateValidationCallback += (sender, certificate, chain, sslPolicyErrors) =>
{
     return true;
};
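In Node.js, the equivalent knob when you are making the HTTPS call yourself (this is generic Node, not an SDK-specific setting) is the rejectUnauthorized option on the request; a minimal sketch:

var https = require("https");

var req = https.request({
    hostname: "localhost",          // placeholder
    port: 8089,                     // Splunk management port
    path: "/services/auth/login",
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    rejectUnauthorized: false       // skip certificate validation for certs the client does not trust
}, function(res) {
    res.on("data", function(chunk) { process.stdout.write(chunk); });
});

req.write("username=admin&password=changeme");
req.end();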

Security protocol

You might also see an error that indicates the client is unable to negotiate a connection. Generally this means that the security protocol the client is using to connect is not within the set that Splunk is configured for in server.conf. You might, for example, be using SSLv2 while Splunk is configured for SSLv3 or TLS only.

Again, each platform generally has a way to configure this protocol. Using C# again, here is how to configure the security protocol to use TLS:

ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls;

Once the protocols on client and server match, you should be able to make the connection.

Does that cover everything?

I’d like to think it does, but I am almost positive it doesn’t ;-). However, it covers the most common problems I have seen over the past few years, and it may cover yours!

If you have any other tips to share, put them in the comments, and I’ll add them to the post!
