New support for authoring modular inputs in Node.js

Modular inputs allow you to teach Splunk Enterprise new ways to pull in events from internal systems, third party APIs or even devices. Modular Inputs extend Splunk Enterprise and are deployed on the Splunk Enterprise instance or on a forwarder. In version 1.4.0 of the Splunk SDK for JavaScript we added support for creating modular inputs in Node.js!

In this post, I’ll show you how to create a modular input with Node.js that pulls commit data from GitHub into Splunk.

Why Node.js

Node.js is designed for I/O intensive workloads. It offers great support for streaming data into and out of a Node application in an asynchronous manner. It also has great support for JSON out of the box. Finally, Node.js has a huge ecosystem of packages available via npm that are at your disposal. An input pulls data from a source and then streams those results directly into a Splunk instance. This makes modular inputs a great fit for Node.js.

Getting started

You can get the Splunk SDK for JavaScript from npm (npm install splunk-sdk), the Splunk Developer Portal or by grabbing the source from our GitHub repo. You can find out more about the SDK here. The SDK includes two sample modular inputs: random numbers and GitHub commits. For the remainder of this post we’ll look at the GitHub example.

This input indexes all commits on the master branch of a GitHub repository using GitHub’s API. This example illustrates how to pull in data from an external source, and how to create checkpoints when polling periodically so that duplicate events are not created.

Prerequisites

You have installed a Splunk Enterprise instance, version 5.0 or later.
You have Node.js installed (v0.8 or later; check this by running node -v from the command line).
You have downloaded the Splunk SDK for JavaScript.

Installing the example

Set the $SPLUNK_HOME environment variable to the root directory of your Splunk Enterprise instance.

Copy the GitHub example from

/splunk-sdk-javascript/examples/modularinputs/github_commits

to

$SPLUNK_HOME/etc/apps

Open a command prompt or terminal window and go to the following directory:
```
$SPLUNK_HOME/etc/apps/github_commits/bin/app
```
Then type npm install. This installs the required Node modules, including the splunk-sdk itself and the github module.
Restart Splunk Enterprise by typing the following into the command line:
```
$SPLUNK_HOME/bin/splunk restart
```

Configuring the GitHub commits modular input example

Modular Inputs integrate with Splunk Enterprise, allowing Splunk Administrators to create new instances and provide necessary configuration right in the UI similar to other inputs in Splunk. To see this in action, follow these steps:

From Splunk Home, click the Settings menu. Under Data, click Data inputs, and find “GitHub commits”, the input you just added. Click Add new on that row.
Fill in the following fields:
- name (whatever name you want to give this input)
- owner (the owner of the GitHub repository, this is a GitHub username or org name)
- repository (the name of the GitHub repository)
- (optional) token if using a private repository and/or to avoid GitHub’s API limits
To get a GitHub API token visit the GitHub settings page and make sure the repo and public_repo scopes are selected.
Save your input, and navigate back to Splunk Home.
Do a search for sourcetype=github_commits and you should see some events indexed; if your repository has a large number of commits indexing them may take a few moments.

Analyzing GitHub commit data

Now that your GitHub repository’s commit data has been indexed by Splunk Enterprise, you can leverage the power of Splunk’s Search Processing Language to do interesting things with your data. Below are some example searches you can run:

Want to know who the top contributors are for this repository? Run this search:

sourcetype="github_commits" source="github_commits://[your input name]" | stats count by author | sort count DESC

Want to see a graph of the repository’s commits over time? Run this search:
```
sourcetype="github_commits" source="github_commits://[your input name]" | timechart count(sha) as "Number of commits"
```
Then click the Visualization tab, and select line from the drop-down for visualization types (pie may already be selected).

Write your own modular input with the Splunk SDK for JavaScript

Adding a modular input to Splunk Enterprise is a two-step process: First, write a modular input script, and then package the script with several accompanying files and install it as a Splunk app.

Writing a modular input

A modular input will:

Return an introspection scheme. The introspection scheme defines the behavior and endpoints of the script. When Splunk Enterprise starts, it runs the input to determine the modular input’s behavior and configuration.
Validate the script’s configuration (optional). Whenever a user creates or edits an input, Splunk Enterprise can call the input to validate the configuration.
Stream events into Splunk. The input streams event data that can be indexed by Splunk Enterprise. Splunk Enterprise invokes the input and waits for it to stream events.

To create a modular input in Node.js, first require the splunk-sdk Node module. In our examples, we’ve also assigned the classes we’ll be using to variables, for convenience. At the very least, we recommend defining a ModularInputs variable as shown here:

var splunkjs        = require("splunk-sdk");
var ModularInputs   = splunkjs.ModularInputs;

The preceding three steps are accomplished as follows using the Splunk SDK for JavaScript:

Return the introspection scheme: Define the getScheme method on the exports object.
Validate the script’s configuration (optional): Define the validateInput method on the exports object. This is required if you set the scheme returned by getScheme to use external validation (that is, set Scheme.useExternalValidation to true).
Stream events into Splunk: Define the streamEvents method on the exports object.

In addition, you must run the script by calling the ModularInputs.execute method, passing in the exports object you just configured along with the module object which contains the state of this script:

ModularInputs.execute(exports, module);

To see the full GitHub commits input source code, see here.

Woah. Let’s take a deeper dive into the code so we can understand what’s really going on.

The getScheme method

When Splunk Enterprise starts, it looks for all the modular inputs defined by its configuration, and tries to run them with the argument --scheme. The scheme lets your input tell Splunk which arguments must be provided for the input; these arguments are then used to populate the UI when a user creates an instance of the input. Splunk expects each modular input to print a description of itself in XML to stdout. The SDK’s modular input framework takes care of all the details of formatting the XML and printing it. You only need to implement a getScheme method that returns a new Scheme object, which makes your job much easier!

As mentioned earlier, we will be adding all methods to the exports object.

Let’s begin by defining getScheme, creating a new Scheme object, and setting its description:

exports.getScheme = function() {
        var scheme = new Scheme("GitHub Commits"); 
        scheme.description = "Streams events of commits in the specified GitHub repository (must be public, unless setting a token).";

For this scheme, the modular input will show up as “GitHub Commits” in Splunk.

Next, specify whether you want to use external validation or not by setting the useExternalValidation property (the default is true). If you set external validation to true without implementing the validateInput method on the exports object, the script will accept anything as valid. We want to make sure the GitHub repository exists, so we’ll define validateInput once we finish with getScheme.

       scheme.useExternalValidation = true;

If you set useSingleInstance to true (the default is false), Splunk will launch a single process executing the script which will handle all instances of the modular input. You are then responsible for implementing the proper handling for all instances within the script. Setting useSingleInstance to false will allow us to set an optional interval parameter in seconds or as a cron schedule (available under more settings when creating an input).

      scheme.useSingleInstance = false;

The GitHub commits example has 3 required arguments (name, owner, repository), and one optional argument (token). Let’s recap what these are for:

name: The name of this modular input definition (ex: Splunk SDK for JavaScript)
owner: The GitHub organization or user that owns the repository (ex: splunk)
repository: The GitHub repository (ex: splunk-sdk-javascript). Don’t forget to set the token argument if the repository is private.
token: A GitHub access token with at least the repo and public_repo scopes enabled. To get an access token, see the steps outlined earlier in this post.

Now let’s see how these arguments are defined within the Scheme. We need to set the args property of the Scheme object we just created to an array of Argument objects:

      scheme.args = [
            new Argument({
                name: "owner",
                dataType: Argument.dataTypeString,
                description: "GitHub user or organization that created the repository.",
                requiredOnCreate: true,
                requiredOnEdit: false
            }),
            new Argument({
                name: "repository",
                dataType: Argument.dataTypeString,
                description: "Name of a public GitHub repository, owned by the specified owner.",
                requiredOnCreate: true,
                requiredOnEdit: false
            }),
            new Argument({
                name: "token",
                dataType: Argument.dataTypeString,
                description: "(Optional) A GitHub API access token. Required for private repositories (the token must have the 'repo' and 'public_repo' scopes enabled). Recommended to avoid GitHub's API limit, especially if setting an interval.",
                requiredOnCreate: false,
                requiredOnEdit: false
            })
        ];

Each Argument constructor takes a single JavaScript object as its parameter, with the required property name and the following optional properties:

dataType: What kind of data is this argument? (Argument.dataTypeBoolean, Argument.dataTypeNumber, or Argument.dataTypeString)
description: A description for the user entering this argument (string)
requiredOnCreate: Is this a required argument? (boolean)
requiredOnEdit: Does a new value need to be specified when editing this input? (boolean)

After adding arguments to the scheme, return the scheme and close the function:

        return scheme;
    };
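
To make the "description of itself in XML" a little less abstract, here is a minimal sketch of the kind of conversion the SDK performs for you. It does not use the SDK: schemeToXml and escapeXml are hypothetical helpers, the scheme is a plain object rather than a real Scheme, and the element names are my reading of Splunk's modular input scheme format, so treat the output as approximate rather than exactly what the SDK prints.

```javascript
// Hypothetical helper: escapes characters that are special in XML.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: renders a plain-object scheme as introspection XML.
// Element names are an assumption based on Splunk's scheme format.
function schemeToXml(scheme) {
    var lines = [
        "<scheme>",
        "  <title>" + escapeXml(scheme.title) + "</title>",
        "  <description>" + escapeXml(scheme.description) + "</description>",
        "  <use_external_validation>" + scheme.useExternalValidation + "</use_external_validation>",
        "  <use_single_instance>" + scheme.useSingleInstance + "</use_single_instance>",
        "  <endpoint>",
        "    <args>"
    ];
    scheme.args.forEach(function (arg) {
        lines.push(
            "      <arg name=\"" + escapeXml(arg.name) + "\">",
            "        <description>" + escapeXml(arg.description) + "</description>",
            "        <data_type>" + escapeXml(arg.dataType) + "</data_type>",
            "        <required_on_create>" + arg.requiredOnCreate + "</required_on_create>",
            "        <required_on_edit>" + arg.requiredOnEdit + "</required_on_edit>",
            "      </arg>"
        );
    });
    lines.push("    </args>", "  </endpoint>", "</scheme>");
    return lines.join("\n");
}

var xml = schemeToXml({
    title: "GitHub Commits",
    description: "Streams events of commits in the specified GitHub repository.",
    useExternalValidation: true,
    useSingleInstance: false,
    args: [{
        name: "owner",
        dataType: "string",
        description: "GitHub user or organization that created the repository.",
        requiredOnCreate: true,
        requiredOnEdit: false
    }]
});
console.log(xml);
```

In a real input you never write this yourself; returning a Scheme object from getScheme is enough.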

The validateInput method

The validateInput method is where the configuration of an input is validated, and is only needed if you’ve set your modular input to use external validation. If validateInput does not call the done callback with an error argument, the input is assumed to be valid. If it does pass an error to done, the error is reported to Splunk and the configuration is treated as invalid.

When you use external validation, after splunkd calls the modular input with the --scheme argument to get the scheme, it calls it again with the --validate-arguments argument for each instance of the modular inputs in its configuration files, feeding XML on stdin to the modular input to validate all enabled inputs. Splunk calls the modular input the same way again whenever the modular input’s configuration is changed.
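
This calling convention boils down to a small dispatch on the command-line arguments. ModularInputs.execute handles that for you; the sketch below only illustrates the idea. pickHandler is a hypothetical function, not part of the SDK, and it assumes that a run with no arguments is the one that should stream events.

```javascript
// Hypothetical helper: maps the arguments Splunk passes on the command line
// to the name of the exports method that should handle the call.
function pickHandler(argv) {
    if (argv.length === 0) {
        // Assumption: no arguments means "stream events".
        return "streamEvents";
    }
    if (argv[0] === "--scheme") {
        return "getScheme";
    }
    if (argv[0] === "--validate-arguments") {
        return "validateInput";
    }
    throw new Error("Unexpected argument: " + argv[0]);
}

console.log(pickHandler(["--scheme"]));
console.log(pickHandler(["--validate-arguments"]));
console.log(pickHandler([]));
```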

In our GitHub Commits example, we’re using external validation since we want to make sure the repository is valid. Our validateInput method contains logic that uses the GitHub API to check that there is at least one commit on the master branch of the specified repository:

    exports.validateInput = function(definition, done) { 
        var owner = definition.parameters.owner;
        var repository = definition.parameters.repository;
        var token = definition.parameters.token;

        var GitHub = new GitHubAPI({version: "3.0.0"});

        try {
            if (token && token.length > 0) {
                GitHub.authenticate({
                    type: "oauth",
                    token: token
                });
            }

            GitHub.repos.getCommits({
                headers: {"User-Agent": SDK_UA_STRING},
                user: owner,
                repo: repository,
                per_page: 1,
                page: 1
            }, function (err, res) {
                if (err) {
                    done(err);
                }
                else {
                    if (res.message) {
                        done(new Error(res.message));
                    }
                    else if (res.length === 1 && res[0].hasOwnProperty("sha")) {
                        done();
                    }
                    else {
                        done(new Error("Expected only the latest commit, instead found " + res.length + " commits."));
                    }
                }
            });
        }
        catch (e) {
            done(e);
        }
    };
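
If you want to experiment with the done callback contract without touching the network, a purely local check is enough. The following is a minimal sketch, not code from the example: validateParameters is a hypothetical function, and the definition object only mimics the parameters property used above.

```javascript
// Hypothetical pre-check: validates the shape of the parameters only,
// without calling the GitHub API.
function validateParameters(definition, done) {
    var params = definition.parameters || {};
    var required = ["owner", "repository"];

    for (var i = 0; i < required.length; i++) {
        var value = params[required[i]];
        if (typeof value !== "string" || value.trim().length === 0) {
            // Passing an error to done marks the configuration as invalid.
            done(new Error("Missing required parameter: " + required[i]));
            return;
        }
    }

    // Calling done with no arguments marks the configuration as valid.
    done();
}

validateParameters({ parameters: { owner: "splunk", repository: "" } }, function (err) {
    console.log(err ? "invalid: " + err.message : "valid");
});
```

A check like this could run before the API call so that obviously incomplete input fails fast.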

The streamEvents method

Here’s the best and most important part, streaming events!

The streamEvents method is where the event streaming happens. Events are streamed into stdout using an InputDefinition object as input that determines what events are streamed. In the case of the GitHub commits example, for each input, the arguments are retrieved before connecting to the GitHub API. Then, we go through each commit in the repository on the master branch.
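
To make the shape of each event concrete, here is a small sketch of how one commit from the API response could be flattened into the fields the example indexes. commitToFields is a hypothetical helper and the sample commit is made-up data; only the field names follow the example's code.

```javascript
// Hypothetical helper: flattens one commit object from the API response
// into the fields the example indexes.
function commitToFields(owner, repository, apiCommit) {
    var commit = apiCommit.commit;
    return {
        sha: apiCommit.sha,
        api_url: apiCommit.url,
        url: "https://github.com/" + owner + "/" + repository + "/commit/" + apiCommit.sha,
        // Collapse line breaks so each commit message stays on one line.
        message: commit.message.replace(/(\n|\r)+/g, " "),
        author: commit.author.name,
        rawdate: commit.author.date
    };
}

// Made-up sample data in the shape the example code reads.
var sample = {
    sha: "abc123",
    url: "https://api.github.com/repos/splunk/splunk-sdk-javascript/commits/abc123",
    commit: {
        message: "Fix typo\n\nin README",
        author: { name: "Jane Doe", date: "2014-09-18T17:05:09Z" }
    }
};

var fields = commitToFields("splunk", "splunk-sdk-javascript", sample);
console.log(JSON.stringify(fields));
// Date.parse returns the event time in milliseconds since the epoch.
console.log(Date.parse(fields.rawdate));
```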

Creating Events and Checkpointing

For each commit, we’ll check to see if we’ve already indexed it by looking in a checkpoint file. This is a file that Splunk allows us to create in order to track which data has been already processed so that we can prevent duplicates. If we have indexed the commit, we simply move on – we don’t want to have duplicate commit data in Splunk. If we haven’t indexed the commit we’ll create an Event object, set its properties, write the event using the EventWriter, then append the unique SHA for the commit to the checkpoint file. We will create a new checkpoint file for each input (in this case, each repository).
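
The checkpointing idea can be tried in isolation. The sketch below is a simplified stand-in, not the example's code: recordNewShas is a hypothetical helper, and it writes to a temporary directory instead of the checkpoint directory Splunk provides.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical helper: returns the SHAs that are not yet in the checkpoint
// file and appends them to it, one per line.
function recordNewShas(checkpointFilePath, shas) {
    var seen = Object.create(null);
    if (fs.existsSync(checkpointFilePath)) {
        fs.readFileSync(checkpointFilePath, "utf8").split("\n").forEach(function (sha) {
            if (sha) {
                seen[sha] = true;
            }
        });
    }

    var fresh = shas.filter(function (sha) {
        if (seen[sha]) {
            return false;
        }
        seen[sha] = true;
        return true;
    });

    if (fresh.length > 0) {
        fs.appendFileSync(checkpointFilePath, fresh.join("\n") + "\n");
    }
    return fresh;
}

// Assumption: a temporary directory stands in for Splunk's checkpoint directory.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), "checkpoint-"));
var file = path.join(dir, "splunk splunk-sdk-javascript.txt");

console.log(recordNewShas(file, ["a1", "b2"]));       // first poll
console.log(recordNewShas(file, ["a1", "b2", "c3"])); // second poll
```

On the second poll only the commit that was not seen before comes back, which is exactly what keeps duplicates out of the index.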

The getDisplayDate function transforms the date we get back from the GitHub API into a more readable format.
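
The body of getDisplayDate isn't shown in this post. As a rough idea of what such a helper might do, here is a sketch under my own assumptions: toDisplayDate is a hypothetical name, and the output format (and the choice to stay in UTC) is mine, not necessarily what the example produces.

```javascript
// Hypothetical stand-in for a display-date helper. Formats an ISO 8601
// timestamp in UTC so the result does not depend on the local time zone.
function toDisplayDate(isoDate) {
    var months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
    var date = new Date(isoDate);
    if (isNaN(date.getTime())) {
        throw new Error("Unparseable date: " + isoDate);
    }

    // Left-pads single-digit values with a zero.
    function pad(n) {
        return (n < 10 ? "0" : "") + n;
    }

    return months[date.getUTCMonth()] + " " + date.getUTCDate() + ", " + date.getUTCFullYear() +
        " - " + pad(date.getUTCHours()) + ":" + pad(date.getUTCMinutes()) + ":" +
        pad(date.getUTCSeconds()) + " UTC";
}

console.log(toDisplayDate("2014-09-18T17:05:09Z")); // Sep 18, 2014 - 17:05:09 UTC
```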

exports.streamEvents = function(name, singleInput, eventWriter, done) {
        // Get the checkpoint directory out of the modular input's metadata.
        var checkpointDir = this._inputDefinition.metadata["checkpoint_dir"];

        var owner = singleInput.owner;
        var repository = singleInput.repository;
        var token      = singleInput.token;

        var alreadyIndexed = 0;

        var GitHub = new GitHubAPI({version: "3.0.0"});

        if (token && token.length > 0) {
            GitHub.authenticate({
                type: "oauth",
                token: token
            });
        }

        var page = 1;
        var working = true;

        Async.whilst(
            function() {
                return working;
            },
            function(callback) {
                try {
                    GitHub.repos.getCommits({
                        headers: {"User-Agent": SDK_UA_STRING},
                        user: owner,
                        repo: repository,
                        per_page: 100,
                        page: page
                    }, function (err, res) {
                        if (err) {
                            callback(err);
                            return;
                        }

                        if (res.meta.link.indexOf("rel=\"next\"") < 0) {
                            working = false;
                        }
                        
                        var checkpointFilePath  = path.join(checkpointDir, owner + " " + repository + ".txt");
                        var checkpointFileNewContents = "";
                        var errorFound = false;

                        var checkpointFileContents = "";
                        try {
                            checkpointFileContents = utils.readFile("", checkpointFilePath);
                        }
                        catch (e) {
                            fs.appendFileSync(checkpointFilePath, "");
                        }

                        for (var i = 0; i < res.length && !errorFound; i++) {
                            var json = {
                                sha: res[i].sha,
                                api_url: res[i].url,
                                url: "https://github.com/" + owner + "/" + repository + "/commit/" + res[i].sha
                            };

                            if (checkpointFileContents.indexOf(res[i].sha + "\n") < 0) {
                                var commit = res[i].commit;

                                json.message = commit.message.replace(/(\n|\r)+/g, " ");
                                json.author = commit.author.name;
                                json.rawdate = commit.author.date;
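                                // Note: replace("T|Z", " ") looks for the literal text "T|Z" (a string, not a pattern),
                                // so the ISO 8601 date reaches getDisplayDate unchanged.
                                json.displaydate = getDisplayDate(commit.author.date.replace("T|Z", " ").trim());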

                                try {
                                    var event = new Event({
                                        stanza: repository,
                                        sourcetype: "github_commits",
                                        data: JSON.stringify(json),
                                        time: Date.parse(json.rawdate)
                                    });
                                    eventWriter.writeEvent(event);

                                    checkpointFileNewContents += res[i].sha + "\n";
                                    Logger.info(name, "Indexed a GitHub commit with sha: " + res[i].sha);
                                }
                                catch (e) {
                                    errorFound = true;
                                    working = false;
                                    Logger.error(name, e.message, eventWriter._err);
                                    fs.appendFileSync(checkpointFilePath, checkpointFileNewContents);

                                    done(e);
                                    return;
                                }
                            }
                            else {
                                alreadyIndexed++;
                            }
                        }

                        fs.appendFileSync(checkpointFilePath, checkpointFileNewContents);

                        if (alreadyIndexed > 0) {
                            Logger.info(name, "Skipped " + alreadyIndexed.toString() + " already indexed GitHub commits from " + owner + "/" + repository);
                        }

                        page++;
                        alreadyIndexed = 0;
                        callback();
                    });
                }
                catch (e) {
                    callback(e);
                }
            },
            function(err) {
                done(err);
            }
        );
    };
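
The loop above decides whether to keep paging by looking for rel="next" in the response's Link header. Here is that check as a standalone sketch. hasNextPage is a hypothetical helper, the URLs are illustrative, and the guard for a missing header reflects my assumption that the header can be absent when all results fit on one page.

```javascript
// Hypothetical helper: reports whether a Link header advertises a next page.
function hasNextPage(linkHeader) {
    // Assumption: no header at all means there is nothing more to fetch.
    if (typeof linkHeader !== "string") {
        return false;
    }
    return linkHeader.split(",").some(function (part) {
        return /;\s*rel="next"/.test(part);
    });
}

// Illustrative header values.
var middlePage = "<https://api.github.com/example?page=3>; rel=\"next\", " +
                 "<https://api.github.com/example?page=9>; rel=\"last\"";
var lastPage   = "<https://api.github.com/example?page=1>; rel=\"first\", " +
                 "<https://api.github.com/example?page=8>; rel=\"prev\"";

console.log(hasNextPage(middlePage)); // true
console.log(hasNextPage(lastPage));   // false
console.log(hasNextPage(undefined));  // false
```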

Logging (optional)

Logging is an optional feature we’ve included with modular inputs in the Splunk SDK for JavaScript.

It’s best practice for your modular input script to log diagnostic data to splunkd.log ($SPLUNK_HOME/var/log/splunk/splunkd.log). Use a Logger method to write log messages, which include a standard splunkd.log severity level (such as “DEBUG”, “WARN”, “ERROR” and so on) and a descriptive message. For instance, the following code is from the GitHub Commits streamEvents example, and logs a message if any GitHub commits have already been indexed:

if (alreadyIndexed > 0) {
    Logger.info(name, "Skipped " + alreadyIndexed.toString() + " already indexed GitHub commits from " + owner + "/" + repository);
}

Here we call the Logger.info method to log a message with the info severity. We’re also passing in the name argument, which the user set when creating the input.

That’s all the code you have to write to get started with modular inputs using the Splunk SDK for JavaScript!

Add the modular input to Splunk Enterprise

With your modular input completed, you’re ready to integrate it into Splunk Enterprise. First, package the input, and then install the modular input as a Splunk app.

Package the input

Files

Create the following files with the content indicated. Wherever you see modinput_name — whether in the file name or its contents — replace it with the name of your modular input JavaScript file. For example, if your script’s file name is github_commits.js, give the file indicated as modinput_name.cmd the name github_commits.cmd.

If you haven’t already, now is a good time to set your $SPLUNK_HOME environment variable.

We need to make sure all the names match up here, or Splunk will have problems recognizing your modular input.

modinput_name.cmd

@"%SPLUNK_HOME%"\bin\splunk cmd node "%~dp0\app\modinput_name.js" %*

modinput_name.sh

#!/bin/bash

current_dir=$(dirname "$0")
# Quote "$@" so that arguments containing spaces are passed through intact.
"$SPLUNK_HOME/bin/splunk" cmd node "$current_dir/app/modinput_name.js" "$@"

package.json

When creating this file, replace the values given with the corresponding values for your modular input. All values (except the splunk-sdk dependency, which should stay at “>=1.4.0”) can be changed.

{
    "name": "modinput_name",
    "version": "0.0.1",
    "description": "My great modular input",
    "main": "modinput_name.js",
    "dependencies": {
        "splunk-sdk": ">=1.4.0"
    },
    "author": "Me"
}

app.conf

When creating this file, replace the values given with the corresponding values for your modular input:

The is_configured value determines whether the modular input is preconfigured on install, or whether the user should configure it.
The is_visible value determines whether the modular input is visible to the user in Splunk Web.

[install]
is_configured = 0

[ui]
is_visible = 0
label = My modular input

[launcher]
author=Me
description=My great modular input
version = 1.0

inputs.conf.spec

When creating this file, in addition to replacing modinput_name with the name of your modular input’s JavaScript file, do the following:

After the asterisk (*), type a description for your modular input.
Add any arguments to your modular input as shown. You must list every argument that you define in the getScheme method of your script.

The file should look something like this:

[github_commits://<name>]
*Generates events of GitHub commits from a specified repository.

owner = <value>
repository = <value>
token = <value>
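
Since every argument from getScheme has to appear in this file, you may prefer to generate it. The sketch below shows one way to do that; buildInputsConfSpec is a hypothetical helper of my own, and it simply reproduces the layout shown above.

```javascript
// Hypothetical helper: builds the text of inputs.conf.spec from the input
// name, a description, and the list of argument names.
function buildInputsConfSpec(inputName, description, argNames) {
    var lines = [
        "[" + inputName + "://<name>]",
        "*" + description,
        ""
    ];
    argNames.forEach(function (argName) {
        lines.push(argName + " = <value>");
    });
    return lines.join("\n") + "\n";
}

console.log(buildInputsConfSpec(
    "github_commits",
    "Generates events of GitHub commits from a specified repository.",
    ["owner", "repository", "token"]
));
```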

File structure

Next, create a directory that corresponds to the name of your modular input script—for instance, “modinput_name” — in a location such as your Documents directory. (It can be anywhere; you’ll copy the directory over to your Splunk Enterprise directory at the end of this process.)

Within this directory, create the following directory structure:
```
modinput_name/
    bin/
        app/
    default/
    README/
```

Copy your modular input script (modinput_name.js) and the files you created in the previous section so that your directory structure looks like this:

modinput_name/
    bin/
        modinput_name.cmd
        modinput_name.sh
        app/
            package.json
            modinput_name.js
    default/
        app.conf
    README/
        inputs.conf.spec
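
If you'd rather script the empty directory skeleton than create it by hand, something like the following would do. It is only a sketch: scaffoldModularInput is a hypothetical helper, it creates the folders in a temporary directory, and it relies on the recursive option of fs.mkdirSync, which requires a much newer Node.js than the minimum listed in the prerequisites.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical helper: creates the empty directory skeleton for a modular
// input under parentDir and returns the path of the new root directory.
function scaffoldModularInput(parentDir, modinputName) {
    var root = path.join(parentDir, modinputName);
    ["bin/app", "default", "README"].forEach(function (subDir) {
        // recursive: true also creates missing parents such as bin/.
        fs.mkdirSync(path.join(root, subDir), { recursive: true });
    });
    return root;
}

// Assumption: a temporary directory stands in for your working location.
var workDir = fs.mkdtempSync(path.join(os.tmpdir(), "modinput-"));
var root = scaffoldModularInput(workDir, "modinput_name");
console.log(root);
console.log(fs.readdirSync(root));
```

You would still copy the script and the files from the previous section into place yourself.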

Install the modular input

Before using your modular input as a data input for your Splunk Enterprise instance, you must first install it.

Set the SPLUNK_HOME environment variable to the root directory of your Splunk Enterprise instance.
Copy the directory you created in Package the input to the following directory:
```
$SPLUNK_HOME/etc/apps/
```
Open a command prompt or terminal window and go to the following directory, where modinput_name is the name of your modular input script:
```
$SPLUNK_HOME/etc/apps/modinput_name/bin/app
```
Type the following, and then press Enter or Return: npm install
Restart Splunk Enterprise: From Splunk Home, click the Settings menu. Under System, click Server Controls. Click Restart Splunk; alternatively you can just run
```
$SPLUNK_HOME/bin/splunk restart
```
from command prompt or terminal.

Your modular input should now appear alongside the native Splunk inputs. To see it, go to Splunk Home and click the Settings menu. Under Data, click Data inputs, and find the name of the modular input you just created.

In Summary

In this post you’ve seen how to create a modular input using the Splunk SDK for JavaScript.

Now you can use your Node.js skills to extend Splunk and pull data from any source, even GitHub!