
Introducing Splunkbase Curated Experience


There are about 1,200 apps in Splunkbase today. Up until now, the typical ways to find an app on Splunkbase have been to search for it or to narrow the list using several filter criteria. We have not recommended apps to our user community in the past. With the launch of the curated experience at Splunk .conf2016, we are changing this by bringing the notion of “curation” to Splunkbase.

We believe this will improve the app browsing and discovery experience for our users by highlighting apps that provide the most value. The main emphasis here is on “curation of content” by a team at Splunk – sifting through all the apps on Splunkbase, and highlighting these in specific areas.

Let’s walk through content curation and some other exciting changes we are bringing to Splunkbase!

Content Curation and Recommendation

With this release, we will now have the ability to “curate and recommend” apps in several areas on Splunkbase. This includes:

Categories

The following app categories will be highlighted on Splunkbase. All apps will be reviewed to ensure they are appropriate for the category they are listed under, and existing apps will be re-categorized into these new categories as part of this effort. We will also periodically review and re-categorize content to make sure it remains appropriate for the categories it is designated under.

  1. IT Operations
  2. DevOps
  3. Security, Fraud & Compliance
  4. Business Analytics
  5. IoT & Industrial Data
  6. Utilities

HomepageAndCategories

Category Pages

Each of the categories will have its own dedicated category page. This page will feature up to six apps that are recommended for this area. In addition, this page will have detailed sections on specific featured apps.

CategoryPage

Technology Vendors

We are also adding the concept of apps featured and recommended for various technology vendors that we partner with. At launch we are highlighting the following partners:

  • Cisco
  • EMC
  • AWS
  • Palo Alto Networks

Look for more of our partners to be showcased with corresponding apps over time.

Technologies

Splunk Built and Splunk Certified Content

We are incorporating changes that will provide more prominence to Splunk Built and Splunk Certified content. This will be done via swim lanes on the homepage highlighting content in these groups, as well as by badges on the app card which will signify if any apps or add-ons have been built or certified by Splunk. In addition, you will also be able to filter for Splunk Built and Splunk Certified content on the search page.

SplunkBuiltApps

CertifiedApps

Responsive Design

This launch also features responsive design for the browsing sections of Splunkbase. This means you can now browse for apps and view app details from your mobile device, a much better experience than before. Look for this experience to improve over time as we add features to make the mobile experience even better.

ResponsiveHomepage

User Specific App Recommendations

We are also bringing the notion of user-specific app recommendations to Splunkbase. When logged in to Splunkbase, and on the details page for an app, you should now start seeing recommendations for other related apps. Our recommendation engine will suggest these apps tailored to you based on what you have already downloaded.

RecommendedApps

What’s Next?

We are working on a lot of developer goodness over the next several months. Watch out for the new features in the Splunkbase Developer portal.

Let us know what you think, and as always please feel free to reach out to us at splunkbase-admin@splunk.com with your comments and suggestions!


Encrypt a Modular Input Field without using Setup.XML


Modular Inputs are a great addition to Splunk Enterprise.  One of the things I really like about Modular Inputs is that they allow you to create inputs that “look and feel” as if they were part of the Splunk installation by providing a nice user interface for parameter input.

But, what if you need to encrypt a Modular Input value?  This could be a password, OAuth secret key, or some other confidential piece of information.  Traditional Splunk applications use setup.xml and the storage/passwords endpoint to accomplish this.  If you just need to encrypt an input value specific to the input (as opposed to the entire application), it may be cumbersome to the end user to first run through a setup.xml UI and then the Modular Input UI.  In this blog post, I will show you a technique to encrypt input values without going through a separate setup.xml process.

In this example, we will use a simple username/password combination.  This technique can apply to any field you want encrypted though.

The Technique

When our modular input code runs (which happens immediately after creating the input), the following will happen:

  1. Retrieve the input parameters from inputs.conf for the modular input.
  2. Check if the field we want to encrypt is clear text.
  3. If the field is clear text, create an encrypted credential and mask the field in inputs.conf.
  4. Decrypt the credential so that we can use the clear password in our code.

The Result

After installing the sample application code, go to Settings -> Data Inputs -> Splunk Modular Input Credential Example.

Data-Inputs-Splunk-Modinput-Credential

Create a new Input (this is a very simple example, but you can have as many fields as you want)

New Input

 

After clicking the “Next” button, the password is encrypted (creating a passwords.conf file in the local directory of the application) and masked in inputs.conf (in the same local directory).

Resulting local/inputs.conf

[splunk_modinput_cred_example://Testing123]
password = <nothing to see here>
username = Jason

Resulting local/passwords.conf

[credential::Jason:]
password = $1$oVfptNrGUg==

The code

First, we will set up some global variables to use anywhere in the code:

class MyScript(Script):
    # Define some global variables
    MASK           = "<nothing to see here>"
    APP            = __file__.split(os.sep)[-3]
    USERNAME       = None
    CLEAR_PASSWORD = None
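
For context, these snippets assume the standard splunklib modular input scaffolding around them, roughly like the following sketch (the SDK imports and entry point follow the documented pattern; the surrounding class is the one shown throughout this post):

import os
import sys

# The Splunk Python SDK (splunklib) is assumed to be bundled with the app, e.g. in bin/
from splunklib.modularinput import Script, Scheme, Argument
import splunklib.client as client

# ... MyScript class (globals and methods shown in this post) ...

if __name__ == "__main__":
    sys.exit(MyScript().run(sys.argv))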

The stream_events method is the entry point for the modular input code.  We check to see if the password is masked here and take action if it is not.

def stream_events(self, inputs, ew):
    self.input_name, self.input_items = inputs.inputs.popitem()
    session_key = self._input_definition.metadata["session_key"]
    username = self.input_items["username"]
    password = self.input_items['password']
    self.USERNAME = username
    try:
        # If the password is not masked, mask it.
        if password != self.MASK:
            self.encrypt_password(username, password, session_key)
            self.mask_password(session_key, username)
        self.CLEAR_PASSWORD = self.get_password(session_key, username)
    except Exception as e:
        ew.log("ERROR", "Error: %s" % str(e))
    ew.log("INFO", "USERNAME:%s CLEAR_PASSWORD:%s" % (self.USERNAME, self.CLEAR_PASSWORD))

Here is how we encrypt the password (or any confidential piece of information).

def encrypt_password(self, username, password, session_key):
    args = {'token':session_key}
    service = client.connect(**args)
    try:
        # If the credential already exists, delete it.
        for storage_password in service.storage_passwords:
            if storage_password.username == username:
                service.storage_passwords.delete(username=storage_password.username)
                break
        # Create the credential.
        service.storage_passwords.create(password, username)
    except Exception as e:
        raise Exception, "An error occurred updating credentials. Please ensure your user account has admin_all_objects and/or list_storage_passwords capabilities. Details: %s" % str(e)

And, here is the masking.

def mask_password(self, session_key, username):
    try:
        args = {'token':session_key}
        service = client.connect(**args)
        kind, input_name = self.input_name.split("://")
        item = service.inputs.__getitem__((input_name, kind))
        kwargs = {
            "username": username,
            "password": self.MASK
        }
        item.update(**kwargs).refresh()
    except Exception as e:
        raise Exception("Error updating inputs.conf: %s" % str(e))

Finally, the method to decrypt the credential.

def get_password(self, session_key, username):
    args = {'token':session_key}
    service = client.connect(**args)
    # Retrieve the password from the storage/passwords endpoint 
    for storage_password in service.storage_passwords:
        if storage_password.username == username:
            return storage_password.content.clear_password

Running the following search shows the output from the INFO log message in the stream_events method displaying the clear text username and password (do not do this in your actual code as this was done just to show that we did, in fact, decrypt the password):

index=_internal source="/applications/splunk/var/log/splunk/splunkd.log"

Clear text credentials

 

Putting it all together

A complete working example can be found on GitHub here -> https://github.com/JasonConger/TA_modinput_cred-example

Notes:

SDK – this example makes use of the Splunk Python SDK to abstract a lot of the REST API plumbing for you.

Capabilities  – in order for this example to work, the user creating the modular input needs to have the admin_all_objects capability.

Passwords.conf file – credentials are stored in a separate passwords.conf file starting with Splunk version 6.3.0

Distributed Environments – encrypting/decrypting relies on a splunk.secret key.  In a search head cluster, the captain replicates its splunk.secret file to all other cluster members during initial deployment of the cluster. Reference http://docs.splunk.com/Documentation/Splunk/latest/Security/Deploysecurepasswordsacrossmultipleservers

Building add-ons just got 2.0 times easier


Are you trying to build ES Adaptive Response actions or alert actions and need some help? Are you trying to validate your add-on to see if it is ready to submit for certification? Are you grappling with your add-on setup page and building credential encryptions? If you are, check out Splunk Add-on Builder 2.0.

Below is a brief overview of what’s new in Add-on Builder 2.0:

  • You can now leverage the easy-to-use, step-by-step workflow in Add-on Builder to create alert actions and ES adaptive response actions. No need to deal with .conf files and Python; let the tool do the work for you.

ModAlert1

modalert2

  • The validation process has been enhanced to include App Certification readiness. This validation process can also be performed on apps and add-ons that were created outside of Add-on Builder.


  • New enhanced user experience and step-by-step flow for building data collections. Let the tool automatically generate the Python code for you.

modinput

 

  • Enhanced out-of-box experience for building the setup page for add-ons with proxy support and multi-account support, as well as credentials encryption using the storage password endpoint.


  • New helper function libraries to  make your life easier when building data collections and alert actions.

Click here for a walkthrough example of how to build an ES adaptive response action. Please give Add-on Builder 2.0 a try and let us know your feedback. Happy Splunking and happy data on-boarding!

Important information for customers using Splunk Enterprise 6.2 or earlier


Do you use SSL to secure Splunk Enterprise? Are you still using Splunk Enterprise version 6.2 or earlier? If you answered yes to both of these questions, please read on.

Securing communication with your Splunk instance can be essential in today’s digital environment, especially if it is collecting sensitive information. If communication to/from your Splunk instance can be easily intercepted (e.g. public access to SplunkWeb, Forwarders outside firewall) then this communication should be encrypted using SSL. Additionally, security functionality is constantly being enhanced to combat the evolving threat landscape so you should stay on as current a version of Splunk as possible.

You may have heard that the OpenSSL Software Foundation will cease support for OpenSSL version 1.0.1 as of Dec 31st, 2016. This means that new security vulnerabilities discovered in OpenSSL 1.0.1 will not be patched after this date.

Splunk Enterprise versions 6.0, 6.1, and 6.2 use OpenSSL 1.0.1.  Hence, if you are running version 6.2 or earlier and use SSL to secure Splunk Enterprise we recommend that you upgrade to version 6.3 or higher.

An upgrade can also be a great opportunity for you to benefit from the latest advances in Splunk Enterprise. For example, the latest version (6.5) includes several enhancements that make data analysis faster and easier, lower TCO, and extend the flexibility and value of the platform. Read more about the latest release of Splunk Enterprise.

What do I do next?

If you are currently running version 6.2 or earlier, please feel free to contact your Splunk account manager or partner with any questions. We can help you establish a migration path that is right for your business, and our Professional Services team has several service offerings to assist you with your upgrade.

Please continue to monitor the Splunk Security Portal for the latest Splunk Product Security Announcements.

For more information about how SSL is used to secure Splunk Enterprise, please visit Splunk Enterprise docs.

Thanks,
Thomas Chimento

Creating McAfee ePO Alert and ARF Actions with Add-On Builder


One of the best things about Splunk is the passionate user community. As a group, the community writes amazing Splunk searches, crafts beautiful dashboards, answers thousands of questions, and shares apps and add-ons with the world.

Building high quality add-ons is perhaps one of the more daunting ways to contribute. Since the recently-updated Splunk Add-On Builder 2.0 was released, however, it’s never been easier to build, test, validate and package add-ons for sharing on SplunkBase.

Technical Add-Ons, aka TAs, are specialized Splunk apps that make it easy for Splunk to ingest data, extract and calculate field values, and normalize field names against the Common Information Model (CIM). Since the release of version 6.3, Splunk Enterprise also supports TAs for modular alert actions. This allows users to take actions on Splunk alert search results by integrating with nearly any type of open system.

While I am no developer, I have tinkered with scripted alert actions in the past. Scripted alert actions existed before modular alert actions, but were more difficult to share and implement. When I saw that a new version of the Splunk Add-On Builder had been released, and that it not only supported modular alert actions but also Enterprise Security Adaptive Response Framework (ARF) actions, I had to give it a try. In particular, I wanted to see if I could turn my scripted alert action that tags systems in McAfee ePolicy Orchestrator (ePO) into a modular alert action and ARF action.

I downloaded and installed the Splunk Add-On Builder 2.0 to my home Splunk Enterprise 6.5 server. I went into the app and clicked “Create an add-on.” I then clicked the button to create a modular alert action. Most of the other great features of this tool around data ingestion, extraction and normalization weren’t relevant. I was quickly dropped into a very handy wizard that walks you through the entire process needed to make modular alert actions.

The wizard takes you through all the steps you need to create and describe the add-on, collect initial setup data from the user, and collect data needed for each individual alert. Perhaps the biggest hurdle to creating modular alerts in the past was the effort required to generate the initial setup screens and securely store the passwords. The Add-On Builder takes care of all of that for you! All I had to do was drag a few boxes onto a couple of screens and describe the data I was collecting – the Add-On Builder took care of everything else, including enabling secure password collection/storage, as well as providing sample code to access all the collected data in the alert action script.

setup_param

Collect Setup Info and Passwords Securely

alert_param

Specify Required Alert Inputs

Adding optional functionality to support Enterprise Security 4.5’s great new Adaptive Response Framework was incredibly simple. I had to ensure that I had the latest Common Information Model installed on my system, and just had to fill out 3 drop-down lists and 3 text fields to categorize the action. Enabling Splunk users to automate security responses has never been easier!

setup_wiz1

Simple Enterprise Security ARF Integration

The next step was to actually code the alert action in the tool using a little Python. The Add-On Builder provides a syntax-highlighting GUI for creating/editing the script, sample code so even a coding dunce like me will understand how to work with alert variables and search results, and a robust testing tool with logging. It’s all documented right here and here.

All I had to do was a little cut and paste, a bit of research on how to interface with the McAfee ePO web API, and the usual code troubleshooting that needs to be done when you have a guy with only a history degree writing Python scripts. The helper functions in the sample code made most of it trivially easy. It was even a simple matter to enable robust logging for end users so they can troubleshoot their own deployment of my add-on.

Code_test

Code and Test in the Add-On Builder

The only steps that remained were to validate that my app passed all the recommended best practices, and package it up so I could upload it to SplunkBase. Well, guess what? The Add-On Builder automates that process entirely! There’s a 1-click validation test, along with a button to package the add-on as an SPL file suitable for upload to SplunkBase.

validate

Validate and Package

If you’re a Splunk user that uses McAfee ePO in your environment today, I recommend you check out my add-on. It will enable you to search for anything in Splunk that indicates an issue with an ePO-managed server or endpoint, and automatically tag that system so ePO can apply different policies and tasks as needed to address the issue. In addition, if you use Splunk Enterprise Security, you’ll be able to use this feature automatically when a correlation search fires and/or as an ad-hoc action when investigating notable events.

For example, if a Splunk query detects a server or endpoint is communicating with a known malicious host (e.g. through proxy logs with threat intel), this add-on can be used to tag that system as “compromised” or “infected” in ePO. ePO can then automatically run tag-specific tasks such as aggressive virus scans, and/or apply policies like blocking outbound communications via the endpoint firewall or HIPS on the compromised host. This enables true end-to-end automation between any data in Splunk and McAfee endpoint security tools.

config_alert

Modular Alert Action in Use

And to take this further, if you have an idea for creating your own modular alert action to create a new Splunk integration, I strongly recommend you start by downloading the Splunk Add-On Builder from SplunkBase. It will greatly simplify the process and enable you to give back to the Splunk community. If you do so, please be sure to post a comment here – I’d love to see how others have made use of this incredible tool.

How to: Splunk Analytics for Hadoop on Amazon EMR.


**Please note: The following is an example approach outlining a functional Splunk Analytics for Hadoop environment running on AWS EMR. Please talk to your local Splunk team to determine the best architecture for you.

Using Amazon EMR and Splunk Analytics for Hadoop to explore, analyze and visualize machine data

Machine data can take many forms and comes from a variety of sources: system logs, application logs, service and system metrics, sensor data, etc. In this step-by-step guide, you will learn how to build a big data solution for fast, interactive analysis of data stored in Amazon S3 or Hadoop. This hands-on guide is useful for solution architects, data analysts and developers.

You will need:

  1. An Amazon EMR Cluster
  2. A Splunk Analytics for Hadoop Instance
  3. Amazon S3 bucket with your data
    • Data can also be in Hadoop Distributed File System (HDFS)

Picture1

 

To get started, go into Amazon EMR from the AWS management console page:

Picture2

 

From here, you can manage your existing clusters, or create a new cluster. Click on ‘Create Cluster’:

Picture3

 

This will take you to the configuration page. Set a meaningful cluster name, enable logging (if required) to an existing Amazon S3 bucket, and set the launch mode to cluster:

Picture4

 

Under software configuration, choose Amazon EMR 5.x as per the following:

Picture5

 

Several of the applications included are not required to run Splunk Analytics for Hadoop, however they may make management of your environment easier.

Choose the appropriate instance types, and number of instances according to your requirements:

Picture6

** please note that Splunk recommends Hadoop nodes to be 8 cores / 16 vCPU. The M3.xlarge instances were used for demonstration here only.

For security and access settings, choose those appropriate to your deployment scenario. Using the defaults here can be an appropriate option:

Picture7

 

Click ‘Create Cluster’.

This process may take some time. Keep an eye on the Cluster list for status changes:

Picture8

When the cluster is deployed and ready:

Picture9

 

Clicking on the cluster name will provide the details of the set up:

Picture10

 

At this point, browse around the platform, and get familiar with the operation of the EMR cluster. Hue is a good option for managing the filesystem, and the data that will be analyzed through Splunk Analytics for Hadoop.

Configure Splunk Analytics for Hadoop on AWS AMI instance to connect to EMR Cluster

Installing Splunk Analytics for Hadoop on a separate Amazon EC2 instance, removed from your Amazon EMR cluster, is the Splunk-recommended architectural approach. In order to configure this setup, we run up a Splunk 6.5 AMI from the AWS Marketplace, and then add the necessary Hadoop, Amazon S3 and Java libraries. This last step is further outlined in the Splunk docs at http://docs.splunk.com/Documentation/HadoopConnect/1.2.3/DeployHadoopConnect/HadoopCLI

To kick off, launch a new Amazon EC2 instance from the AWS Management Console:

Picture11

 

Search the AWS Marketplace for Splunk and select the Splunk Enterprise 6.5 AMI:

Picture12

 

Choose an instance size to suit your environment and requirements:

Picture13

 

**please note that Splunk recommends minimum hardware specs for a production deployment. More details at http://docs.splunk.com/Documentation/Splunk/6.5.0/Installation/Systemrequirements

From here you can choose to further customize the instance (should you want more storage, or to add custom tags), or just review and launch:

Picture14

 

Now, you’ll need to add the Hadoop, Amazon S3 and Java client libraries to the newly deployed Splunk AMI. To do this, first grab the version of each from the Amazon EMR master node, to ensure that you are matching the libraries on your Splunk server. Once you have them, install them on the Splunk AMI:

Picture15

 

Move the downloaded Hadoop package to /usr/bin and unpack it.

In order to search the Amazon S3 data, we need to ensure we have access to the S3 toolset. Add the following line to the file /usr/bin/hadoop/etc/hadoop/hadoop-env.sh:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*

Finally, we need to set up the necessary authentication to access Amazon S3 via our new virtual index connection. You’ll need an access key ID and a secret access key from your AWS Identity and Access Management (IAM) setup. In this instance, we have set up these credentials for an individual AWS user:

Picture16

 

Ensure that when you create the access key, you record the details. You then need to include these in the file located at /usr/bin/hadoop/etc/hadoop/hdfs-site.xml. Include the following within the <configuration> tag:

<property>
   <name>fs.s3.awsAccessKeyId</name>
   <value>xxxx</value>
</property>
<property>
   <name>fs.s3.awsSecretAccessKey</name>
   <value>xxxx</value>
</property>
<property>
   <name>fs.s3n.awsAccessKeyId</name>
   <value>xxxx</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>xxxx</value>
</property>

You need to include the s3n keys, as that is the mechanism we will use to connect to the Amazon S3 dataset.

Create data to analyze with Splunk Analytics for Hadoop

We have multiple options for connecting to data for investigation within Splunk Analytics for Hadoop. In this guide, we will explore adding files to HDFS via Hue, and connecting to an existing Amazon S3 bucket to explore data.

Option 1 – S3

From the AWS Management Console, go into Amazon S3, and create a new bucket:

Picture17

 

Give the bucket a meaningful name, and specify the region in which you would like it to exist:

Picture18

 

Click create, and add some files to this new bucket as appropriate. You can choose to add the files to the top level, or create a directory structure:

Picture20

 

The files or folders that you create within the Amazon S3 bucket need to have appropriate permissions to allow the Splunk Analytics for Hadoop user to connect and view them. Set these to allow ‘everyone’ read access, and reduce this scope to appropriate users or roles after testing.

 

Option 2 – HDFS

**this option is only relevant if you DO NOT want to leverage Amazon S3 for data storage. You’ll need to ensure that you have assigned appropriate disk space on the Hadoop nodes to leverage this method.

First, let’s create or upload some data in HDFS. We will need a user in HDFS; we will use root, although this may not be the appropriate user in your environment. From the master node:

hadoop fs -mkdir hdfs://masternodeaddress:8020/user/root

hadoop fs -chown root:root hdfs://masternodeaddress:8020/user/root

Now, use Hue to upload data to this new directory. Log in to Hue:

http://masternodeaddress:8888

Login, or create a new user if appropriate.

Select the file browser, navigate to the /user/root directory and create a ‘data’ directory. Navigate into this directory, and then upload some files for use.

This should result in data being available in the Hadoop FS:

Picture21

 

 

Set up Splunk Analytics for Hadoop for data analysis

To proceed, first you’ll need to grab some parameters from the Hadoop nodes:

Collect Hadoop and Yarn variables:

  1. Java Home = type ‘which java’ = /usr/bin/java
  2. Hadoop home = type ‘which hadoop’ = /usr/bin/hadoop
  3. Hadoop version = type ‘hadoop version’ = hadoop 2.7.2-amzn-3
  4. Name node port = In a browser go to http://masternodeaddress:50070 (or click on HDFS name node in the EMR management console screen)
  5. Yarn resource manager scheduler address= In a browser go to http://masternodeaddress:8088/conf (or click on ‘resource manager’ in the EMR management console screen) = look for ‘yarn.resourcemanager.scheduler.address’ = x.x.x:8030
  6. Yarn resource manager address= In a browser go to http://masternodeaddress:8088/conf (or click on ‘resource manager’ in the EMR management console screen) = look for ‘yarn.resourcemanager.address’ = x.x.x:8050

Now, we need to verify that the name node is correct. You can do this by executing this command:

hadoop fs -ls hdfs://masternodeaddress:8020/user/root/data

Now we can configure our Virtual Provider in Splunk. To do this, go to settings, and then Virtual Indexes:

Picture22

 

Then choose to create a new provider:

Picture23

 

Using the parameters that we gathered earlier, fill this section out:

Picture24

Picture25

 

Save this setup, and go to set up a new Virtual Index:

Picture26

 

Here you can specify the path in HDFS that was set up in an earlier step, or choose to point to the S3 bucket that was created:

Option 1 – S3:

Picture27

 

Ensure that you use the s3n prefix here.

Option 2 – HDFS:

Picture28

 

Save this set up, and you should now be able to search the data within Amazon S3 (or HDFS) using Splunk Analytics for Hadoop!

Click search on the virtual index config:

Picture29

 

Which will take you to the Splunk search interface. You should see something like the following:

Picture30

Splunking Kafka At Scale


At Splunk, we love data and we’re not picky about how you get it to us. We’re all about being open, flexible and scaling to meet your needs. We realize that not everybody has the need or desire to install the Universal Forwarder to send data to Splunk. That’s why we created the HTTP Event Collector. This has opened the door to getting a cornucopia of new data sources into Splunk, reliably and at scale.

We’re seeing more customers in Major Accounts looking to integrate their Pub/Sub message brokers with Splunk. Kafka is the most popular message broker that we’re seeing out there but Google Cloud Pub/Sub is starting to make some noise. I’ve been asked multiple times for guidance on the best way to consume data from Kafka.

In the past I’ve just directed people to our officially supported technology add-on for Kafka on Splunkbase. It works well for simple Kafka instances, but if you have a large Kafka cluster comprised of high-throughput topics with tens to hundreds of partitions, it has its limitations. The first is that management is cumbersome. It has multiple configuration topologies and requires multiple collection nodes to facilitate data collection for the given topics. The second is that each data collection node is a simple consumer (single process) with no ability to auto-balance across the other ingest nodes. If you point it at a topic, it takes ownership of all partitions on the topic and consumes via round-robin across the partitions. If your busy topic has many partitions, this won’t scale well and you’ll lag reading the data. You can scale by creating a dedicated input for each partition in the topic and manually assigning ownership of a partition number to each input, but that’s not ideal and creates a burden in configuration overhead. The other issue is that if any worker process dies, the data won’t get read for its assigned partition until it starts back up. Lastly, it requires a full Splunk instance or Splunk Heavy Forwarder to collect the data and forward it to your indexers.

Due to the limitations stated above, a handful of customers have created their own integrations. Unfortunately, nobody has shared what they’ve built or what drivers they’re using. I’ve created an integration in Python using PyKafka, Requests and the Splunk HTTP Event Collector. I wanted to share the code so anybody can use it as a starting point for their Kafka integrations with Splunk. Use it as is or fork it and modify it to suit your needs.

Why should you consider using this integration over the Splunk TA? The first is scalability and availability. The code uses a PyKafka balanced consumer. The balanced consumer coordinates state for several consumers who share a single topic by talking to the Kafka broker and directly to Zookeeper. It registers a consumer group id that is associated with several consumer processes to balance consumption across the topic. If any consumer dies, a rebalance across the remaining available consumers will take place which guarantees you will always consume 100% of your pipeline given available consumers. This allows you to scale, giving you parallelism and high availability in consumption. The code also takes advantage of multiple CPU cores using Python multiprocessing. You can spawn as many consumers as available cores to distribute the workload efficiently. If a single collection node doesn’t keep up with your topic, you can scale horizontally by adding more collection nodes and assigning them to the same consumer group id.

The second reason you should consider using it is the simplified configuration. The code uses a YAML config file that is very well documented and easy to understand. Once you have a base config for your topic, you can lay it over all the collection nodes using your favorite configuration management tool (Chef, Puppet, Ansible, et al.) and modify the number of workers according to the number of cores you want to allocate to data collection (or set to auto to use all available cores).

The other piece you’ll need is a highly available HTTP Event Collector tier to receive the data and forward it on to your Splunk indexers. I’d recommend scenario 3 outlined in the distributed deployment guide for the HEC. It’s comprised of a load balancer and a tier of N HTTP Event Collector instances which are managed by the deployment server.

scenario3

The code utilizes the new HEC RAW endpoint so anything that passes through will go through the Splunk event pipeline (props and transforms). This will require Splunk version >= 6.4.0.
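
To make that concrete, here is a stripped-down sketch of the kind of consume-and-forward loop involved (not the project’s actual code): the broker and Zookeeper hosts, topic, consumer group and HEC token are placeholders, and the real integration adds multiprocessing workers, batching and the YAML-driven configuration.

import requests
from pykafka import KafkaClient

client = KafkaClient(hosts="kafka1:9092,kafka2:9092")
topic = client.topics[b"my_topic"]

# The balanced consumer coordinates partition ownership across every
# consumer process that registers with the same consumer_group.
consumer = topic.get_balanced_consumer(
    consumer_group=b"splunk_hec_forwarder",
    zookeeper_connect="zookeeper1:2181",
    auto_commit_enable=True,
)

hec_url = "https://hec-lb.example.com:8088/services/collector/raw"
headers = {"Authorization": "Splunk 00000000-0000-0000-0000-000000000000"}

for message in consumer:
    if message is not None:
        # Raw endpoint: the payload passes through props/transforms at index time.
        requests.post(hec_url, data=message.value, headers=headers, verify=False)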

Once you’ve got your HEC tier configured, inputs created and your Kafka pipeline flowing with data you’re all set. Just fire up as many instances as necessary for the topics you want to Splunk and you’re off to the races! Feel free to contribute to the code or raise issues and make feature requests on the Github page.

Get the code

Event Calendar Custom Visualization


A while back, I wrote a blog post about using a custom calendar visualization in Simple XML dashboards.  To accomplish this, I used a technique sometimes referred to as escape hatching JavaScript into Simple XML.    While this works okay for a developer, the technique does not lend itself well to the end user.

Splunk Custom Visualizations

Splunk 6.4 introduced reusable custom visualizations, which allow a developer to package up a visualization and integrate it into Splunk just like the native visualizations.  This also addresses the limitation mentioned above – meaning any end user can use the visualization without mucking around with the Simple XML.

So, revisiting the older escape hatch calendar technique, I thought it would be a good exercise to convert the calendar into a custom visualization.  The calendar is now available on Splunkbase, and several new features have been added.

Using the Calendar in Splunk

The calendar expects a search exposing _time and a count.  The timechart search command does a good job of this.  For example, the following search:

index=_internal | timechart span=1d dc(sourcetype) AS sourcetypes dc(source) as sources dc(host) as hosts

produces some nice tabular data like so:

calendar_tabular

The calendar visualization can take this data and visualize it on a calendar like this:

calendar_blog

There are some formatting options as well.

calendar_format

Try it out yourself and go download it on Splunkbase.

 


Personal Dev/Test Licenses give you the freedom to explore



Do you have a new use case to validate? Untapped data sources to investigate? Wouldn’t it be great to explore how Splunk might help other parts of your organization? All without impacting your production systems and license usage…

Free Personal Dev/Test Licenses

At .conf2016 in September, CEO Doug Merritt was clear that we want to make it easier for you to use Splunk across your business. Enforced metering is gone. And exploring new use cases should be hassle-free.

So now any Splunk Enterprise or Splunk Cloud customer employee can get a free personalized Splunk Enterprise Dev/Test software license. Each license is valid for up to 50 GB daily data ingestion and a six-month renewable term, giving you ample power and time to make big things happen with big data.

How do I get one?
Go to www.splunk.com/dev-test

Are there other limitations?
It’s single instance software. You can’t use it in production, nor stack it with other Splunk deployments. There are other nits, but the goal is to give you the freedom to explore without undue constraints. Learn more in the Dev/Test License FAQ.

Splunk On!

Kevin Faulkner

Splunk Challenge 2016 – Catch ’em all at Nanyang Polytechnic!


Splunk Challenge 2016, the annual Splunk challenge that many NYP students have been waiting for, is here! Today, the students will be pitting their analytics’ skills learned using Splunk, against each other as they compete for a chance to take home some great prizes.


Unlike past years, where the students were tasked to look into business and IT operations data, this year the lecturer suggested analyzing “Pokemon” data for the challenge. As the market leader in the data analytics space, keeping what we do fun and innovative is not only important, it also reflects our core values and helps us both attract more talent to the industry and retain it. So here we have today the “Pokemon” Challenge!

The data used may seem light and fun, but to the 40 students in the room the competition is intense. With all eyes glued to the screens and fingers typing away at the keyboards, there is no time to waste.


Within the next few hours, highly sophisticated analytics reports showing statistics and deep insights into Pokemon usage, profiles, locations, etc. appeared on their screens.


The students were so good that it was very difficult to select a winner for the grand prize. Although there were only three prizes, in the eyes of all the lecturers and coordinators helping out at the event, they are all winners in their own right.


Lastly, a big “THANK YOU” to all the lecturers and Splunkers that helped to make this another successful Splunk Challenge. To the students, ’til we “Splunk” again!

Announcing new AWS Lambda Blueprints for Splunk


Splunk and Amazon Web Services (AWS) are continuously collaborating to drive customer success by leveraging both the agility of AWS, and the visibility provided by Splunk. To support that goal, we’re happy to announce new AWS Lambda blueprints to easily stream valuable logs, events and alerts from over 15 AWS services into Splunk to help customers gain critical security and operational insights.
With a point-and-click setup, you can use these blueprints to have Splunk ingest data from AWS services such as Kinesis Streams, CloudWatch Logs, DynamoDB Streams and IoT for further data processing and analytics, in addition to logging AWS Lambda itself for instrumentation and troubleshooting.

Once a Lambda blueprint is configured, events are automatically forwarded in near real time by Lambda to the Splunk HTTP Event Collector without you having to manage a single intermediary server, queue or storage layer. This offers you an easy, fast and cost-effective data ingestion mechanism from AWS services to Splunk Enterprise or Splunk Cloud.

Below is a list of the various blueprints immediately available for you. In addition to updating the original splunk-logging Lambda blueprint, which was released at re:Invent 2015 for the generic use case of logging events from and via AWS Lambda, we have also added new purpose-built blueprints specific to various AWS services to make it a simple plug-and-play process to deliver streaming data from each of those AWS services to Splunk.

As part of this release, we’re also leveraging the latest AWS Lambda features, such as the recently announced support for environment variables, to help you configure these blueprints with minimal code changes.  Simply set your Lambda function environment variables to define your Splunk destination, specify the function trigger(s) or the AWS event source, and see your events stream into Splunk in near real time.

To get started using these blueprints, visit the AWS Lambda Management Console.

To learn more go to http://dev.splunk.com/view/event-collector/SP-CAAAE6Y.

 

I. Stream AWS Kinesis Stream events to Splunk using splunk-kinesis-stream-processor Lambda blueprint

The splunk-kinesis-stream-processor blueprint can be used to automatically poll an Amazon Kinesis stream, parse new records and forward to Splunk.

kinesis-stream-blueprint-diagram-medium

You need to grant AWS Lambda permissions to poll your Amazon Kinesis stream. You grant all of these permissions to an IAM role (execution role) that AWS Lambda can assume to poll the stream and execute the Lambda function on your behalf. You specify the IAM role when you create your Lambda function. To simplify the process, you can use the predefined role AWSLambdaKinesisExecutionRole to create that IAM role for the Lambda function.

If you are interested in learning more about how Amazon Kinesis integrates with Lambda, see Using AWS Lambda with Amazon Kinesis.
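
For reference, the records a Kinesis-triggered Lambda function receives are base64-encoded. The official blueprint is written in Node.js, but the core forwarding logic looks roughly like this minimal Python 2.7 sketch (the HEC environment variable names are illustrative placeholders):

import base64
import json
import os
import urllib2

def lambda_handler(event, context):
    # Each Kinesis record carries a base64-encoded payload.
    batch = ''
    for record in event['Records']:
        data = base64.b64decode(record['kinesis']['data'])
        batch += json.dumps({'event': data})

    # Forward the batch to the HTTP Event Collector event endpoint,
    # e.g. https://host:8088/services/collector/event
    req = urllib2.Request(os.environ['SPLUNK_HEC_URL'], data=batch,
                          headers={'Authorization': 'Splunk ' + os.environ['SPLUNK_HEC_TOKEN']})
    urllib2.urlopen(req)
    return 'Processed %d records.' % len(event['Records'])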

 

II. Stream AWS CloudWatch Logs to Splunk using splunk-cloudwatch-logs-processor Lambda blueprint

The splunk-cloudwatch-logs-processor blueprint can be used to receive a real-time feed of log events from CloudWatch Logs and forward it to Splunk.  The Lambda blueprint takes care of decompressing and decoding the data before sending it to Splunk.

cloudwatch-logs-blueprint-diagram-medium

You need to grant CloudWatch Logs the permission to execute your function. When using the AWS Console, Lambda will add the necessary permissions for Amazon CloudWatch Logs to invoke your Lambda function for the log group you specify when configuring the Lambda trigger in the AWS Console.

To complete the event source mapping, you need to set up a CloudWatch Logs subscription filter to enable the real-time feed of log events from CloudWatch Logs and have it delivered to Lambda. To learn more about Amazon CloudWatch Logs Subscriptions, see Real-time Processing of Log Data with Subscriptions.
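
The CloudWatch Logs case differs in that the payload handed to Lambda is both base64-encoded and gzip-compressed. Again, the official blueprint is Node.js; a minimal Python 2.7 sketch of the equivalent decode-and-forward logic (HEC environment variable names are placeholders) looks like this:

import base64
import gzip
import json
import os
import urllib2
from StringIO import StringIO

def lambda_handler(event, context):
    # CloudWatch Logs delivers a base64-encoded, gzipped JSON document.
    compressed = base64.b64decode(event['awslogs']['data'])
    payload = json.loads(gzip.GzipFile(fileobj=StringIO(compressed)).read())

    # Forward each log event to the HTTP Event Collector event endpoint.
    batch = ''
    for log_event in payload['logEvents']:
        batch += json.dumps({
            'time': log_event['timestamp'] / 1000.0,   # CloudWatch timestamps are in milliseconds
            'source': payload['logGroup'],
            'event': log_event['message'],
        })

    req = urllib2.Request(os.environ['SPLUNK_HEC_URL'], data=batch,
                          headers={'Authorization': 'Splunk ' + os.environ['SPLUNK_HEC_TOKEN']})
    urllib2.urlopen(req)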

 

III. Stream AWS DynamoDB Stream events to Splunk using splunk-dynamodb-stream-processor Lambda blueprint

The splunk-dynamodb-stream-processor blueprint is used to respond to updates made to a DynamoDB table, and forward that activity to Splunk.

dynamodb-stream-blueprint-diagram-medium

You need to create an Amazon DynamoDB Stream for your table. For more info, see Capturing Table Activity with DynamoDB Streams.

You need to grant AWS Lambda permissions to poll your DynamoDB stream. You grant all of these permissions to an IAM role (execution role) that AWS Lambda can assume to poll the stream and execute the Lambda function on your behalf. You specify the IAM role when you create your Lambda function. To simplify the process, you can use the predefined role AWSLambdaDynamoDBExecutionRole to create that IAM role for the Lambda function.

If you are interested in learning more about how DynamoDB integrates with Lambda, see Using AWS Lambda with Amazon DynamoDB.

 

IV. Stream AWS IoT events to Splunk using splunk-iot-processor Lambda blueprint

The splunk-iot-processor blueprint is used to create a Lambda function that responds to and processes MQTT messages which have triggered an AWS IoT rule. These messages are typically sent by an IoT device or an AWS IoT button.

iot-blueprint-diagram-medium

When configuring the Lambda blueprint from the AWS console, you can create a new IoT rule as part of setting AWS IoT as trigger to your Lambda function. When using the AWS Console, Lambda will add the necessary permissions for AWS IoT to invoke your Lambda function. Alternatively, from AWS IoT Console, you can create (or reuse) an IoT rule, and add an action to invoke your Lambda function, as explained in Creating a Lambda Rule in IoT Rule Tutorial.

Conclusion

With serverless computing growing in popularity and streaming data volumes continuing to surge, we hope you’ll find these blueprints very useful. We can’t wait to see how you’ll be using these blueprints as part of your own big data ingestion pipelines. To further help you analyze and make sense of all this data from AWS services, make sure to check out the recently released Splunk App for AWS 5.0

Docker 1.13 with improved Splunk Logging Driver


The evolution of Splunk and Docker continues!  In the early days (2015) of Splunk and Docker we recommended using the native syslog logging driver in Docker Engine.  In February of 2016, Docker 1.10 came out and we contributed the first version of the Splunk Logging Driver in Docker 1.10.  Since that first release we have seen huge adoption. After reviewing feedback and thinking about what is needed for Splunk environments with Docker, we’ve added a bunch of new features!

When I wrote this blog post, Docker 1.13 was still in the Release Candidate stage. If you have a version of Docker lower than 1.13, you can install the release candidate with docker-machine, like this:

$ docker-machine create \
--driver=virtualbox \
--virtualbox-disk-size=200000 \
--virtualbox-memory=4096 \
--virtualbox-cpu-count=4 \
--virtualbox-boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v1.13.0-rc2/boot2docker.iso \
docker-1.13.0-rc2

To try the latest Splunk logging features you have to install a matching or higher version of the Docker client. If you are using docker-machine, you can simply SSH into the newly created machine:

$ docker-machine ssh docker-1.13.0-rc2

Skip verification for HTTP Event Collector endpoint availability

--log-opt splunk-verify-connection=true|false

In the example above and all following examples, we will skip the options we usually specify in the docker command to focus attention on the new parameters. The full command for the Splunk Logging Driver is:

docker run --log-driver splunk --log-opt splunk-url=http://localhost:8088 --log-opt splunk-token=E055502A-D35D-45F7-9457-085A9DB9A49F ubuntu bash -c 'printf "my message\n{\"foo\": \"bar\"}\n"'

Sometimes a logging endpoint isn’t available or is misconfigured.  Docker engineering recommended that the container should fail to start in that case. This topic has been an open discussion since the very first PR was opened.

Since then we have received a lot of feedback on this issue, especially from folks who wanted to run Splunk Enterprise on the same Docker host where they used the Splunk Logging Driver.  Not being in production environments, they were OK with missing logs before the HTTP Event Collector endpoint became available.  Our solution had to ensure that, when using the Splunk logging driver with verification disabled, the container will still start.

For production environments, splunk-verify-connection=false also works nicely with the new retry logic and the configurable buffer in the logging driver.

Support for raw and JSON formats

--log-opt splunk-format=inline|json|raw

The default format stays the same as originally implemented, now known as inline. For this update, we have two more format options: json and raw.

Here’s an example. Let’s prepare a container which will print two messages to standard output: the first line is just a string, the second line is a valid JSON message

docker run ubuntu bash -c 'printf "my message\n{\"foo\": \"bar\"}\n"'
my message
{"foo": "bar"}

Let’s look on how that will be represented in Splunk using various formats.

Inline format

--log-opt splunk-format=inline

This is the default format used in all versions up to docker 1.13.

docker-1.13-inline

Raw format

--log-opt splunk-format=raw

Note: we are still sending events on services/collector/event/1.0 endpoint (not services/collector/raw/1.0), which means that you can use it with Splunk Enterprise 6.3.x.

docker-1.13-raw

By default all messages are prepended with the value specified in the --log-opt tag option, which by default has the value {{.ID}} (you can read about all other available placeholders in Log tags for logging driver). You can remove the tag from the message and, for example, send the container ID or name with --log-opt splunk-source to identify logs from this container in search:

--log-opt splunk-format=raw --log-opt tag="" --log-opt splunk-source=my_container_id

docker-1.13-raw-without-tag

JSON format

--log-opt splunk-format=json

It is very similar to the inline format, but in JSON format we also try to parse the message as a valid JSON document; if parsing fails, we send it as an inline string. Because every message is parsed as JSON, the Splunk Logging Driver adds a small overhead to the logging pipeline.

docker-1.13-json

Performance improvements

The first version of the Splunk Logging Driver had a very simple implementation: for every line in standard output we sent one HTTP request to the HTTP Event Collector, with one little downside, a huge overhead in the communication protocol. What we have learned from our customers is that they might send a lot of messages to standard output. What we have learned from the Docker source code is that the buffer between standard output and the logging driver is set to 1MB; if the buffer fills up, standard output is blocked, which might block your application if you write messages to standard output synchronously.

In Docker 1.13 we have implemented a buffer in the Splunk Logging Driver itself. It can be configured, but not in the standard --log-opt way, as we did not want to confuse people with these options; the defaults should work in most cases. By default we batch a maximum of 1,000 events per HTTP request and flush them at least every 5 seconds. So if you are sending just one message every 10 seconds, you may start seeing a delay of up to 5 seconds. This can be configured with environment variables; see Advanced options.

If you are sending a lot of messages to standard output you will see a huge improvement. For example:

time docker run --log-driver=splunk --log-opt splunk-url=http://localhost:8088 --log-opt splunk-token=E055502A-D35D-45F7-9457-085A9DB9A49F alpine sh -c "seq 1000000" > /dev/null

This simple example generates 1,000,000 lines on standard output. Before our change, the result of time on my laptop was:

real 2m52.159s
user 0m0.080s
sys 0m0.110s

After this change

real 0m14.122s
user 0m0.060s
sys 0m0.080s

As you can see, that is roughly a 12x difference.

Retry logic

Because we implemented our own buffer to improve performance, it was reasonable to also implement retry logic. By default, we can store a maximum of 10,000 events in the buffer. If we reach this maximum, we start dropping events, one batch at a time. Again, this can be configured with the advanced options.

This retry logic works really well with splunk-verify-connection=false, for example when you launch the Splunk Enterprise image at the same time as your own application and there is a small delay before the HTTP Event Collector is ready to receive data.

Gzip compression

--log-opt splunk-gzip=false|true

There is a slight overhead, as data needs to be compressed (the default is off). Compression is handy in cases where you might be charged for traffic between data centers or clouds; enabling gzip compression in the Splunk logging driver can help reduce that cost.

The level of compression can be set with

--log-opt splunk-gzip-level=-1,0,1...9

These numbers come from the gzip package, where -1 is DefaultCompression, 0 is NoCompression, and all other values are compression levels, with 1 being BestSpeed and 9 BestCompression.

Unit test code coverage

Unit test code coverage is the most important part of this update. We rewrote almost everything and introduced more complex logic, as we wanted to be sure the code would be reliable and maintainable. We covered the Splunk Logging Driver with unit tests (80-90% coverage). You can safely improve or change the Splunk Logging Driver and send a PR without worrying that something will break.

Enjoy!

Easily Create Mod Inputs Using Splunk Add-on Builder 2.0 – Part IV


Add-on Builder 2.0 provides capabilities to build modular inputs without writing any code. In this post, however, we focus on using an advanced feature of Splunk’s Add-on Builder 2.0 to write custom Python while taking advantage of its powerful helper functions.

NB: Future versions of Add-on Builder will obviate the need for some of the techniques mentioned below, most notably techniques in step #6 & step #8.


There is a veritable cornucopia of useful resources for building modular inputs at docs.splunk.com, dev.splunk.com, blogs.splunk.com, and more. This post certainly isn’t meant to replace those. No no, this post will simply walk you through leveraging Splunk Add-on Builder 2.0 to create custom code to query an API.

In this post we will create a modular input using some custom code to have more control while also leveraging Splunk Add-on Builder’s powerful helper functions. We’ll additionally explore some caveats between test mode and final-cut behavior.

If you’re looking for Part I of this blog, it doesn’t exist. Neither does Part II or Part III.
Spoiler Alert, neither did Leonard Part I, Leonard Part II, Leonard Part III, Leonard Part IV, or Leonard Part V. Some consider it regrettable that Leonard Part VI exists but I leave that to you to decide.

For a backstory, Part I would have used the Add-on Builder feature to Add a data input using a REST API and Part II would have used the feature to add a data input using shell commands. Part III would have therefore described adding a data input by writing your own code. They’re well described in those linked docs though, so we start where those stories lead us by expanding on Part III in this Part IV installment. Kind of A New Hope for our custom code.

You may have seen the post announcing Splunk’s Add-on Builder 2.0. If not, that would be a good pre-read too.


 

In This Post

Step 1 – Install Add-on Builder v. 2.0
Step 2 – Read through your API documentation
Step 3 – Create Your Add-On
Step 4 – Create Input
Step 5 – Initialize Parameters
Step 6 – Custom Code Primer: Single Instance Mode
Step 7 – Custom Code Auto Generated
Step 8 – Customizing The Auto Generated Code
Step 9 – Entering test values
Step 10 – Run Test
Step 11 – Save Work
Step 12 – Finish
Step 13 – Restart
Step 14 – Cooking With Gas


 

Step 1 – Install Add-on Builder v. 2.0

Download  & install Add-on Builder version 2.0, which we’ll henceforth refer to as AoB

Return to Table of Contents


 

Step 2 – Read through your API documentation

You know what? This should actually be first, so you can decide how to implement this in AoB. The quicker add a data input using a REST API option may be in play. Yet here we are. For this example, we’ll use HackerNews, because I haven’t previously implemented it and it doesn’t require oAuth (I hope to release a Part V to review oAuth before 2017). Here is some documentation about the HackerNews API: https://github.com/HackerNews/API

Reading through it the first time, I note we don’t need an API token or access key. I also notice we need to query the current maximum item number each time, which is returned as text. The data itself will be returned in JSON format. We’ll want to use checkpointing to record how far we read in previous queries and decide whether we need to read more, etc. Fortunately, the AoB custom Python option provides helper functions to do these things, as we’ll see later in this example.
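
As a quick sanity check of the API from a plain Python shell, before involving AoB at all, the two calls we will care about look like this (the endpoints come from the HackerNews documentation; requests is used here only for illustration, since the add-on itself will use AoB’s send_http_request helper):

import requests

API_BASE = "https://hacker-news.firebaseio.com/v0"  # configured later as a global add-on setting

# Highest item id currently known to HackerNews
max_item = requests.get(API_BASE + "/maxitem.json").json()

# A single item, returned as a JSON document
item = requests.get("%s/item/%d.json" % (API_BASE, max_item)).json()
print item.get("type"), item.get("title")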

Return to Table of Contents


 

Step 3 – Create Your Add-On

Click “Create an add-on”

Create Add-On

Fill in Add-on details and click Create

 

Update Add-on Details

Return to Table of Contents


Step 4 – Create Input

Now we’ll define the input. Follow the workflow to create a custom Python input (Configure Data Collection -> Add Data -> Modular Input Code)

Step 4.1

Configure Data Input

Step 4.2

Add Data

Step 4.3

My Python Code

Return to Table of Contents


 

Step 5 – Initialize Parameters

Now we need to specify the data input properties. NB: The default collection interval is 30 seconds; I adjusted it to 300.

Data Input Properties

Next, data input variables are defined.
These are the per-instance variables configured in settings -> data inputs. There will be one of these for each user-configured input.
Generally this would be where user-specific information lives (e.g. API tokens, etc).
As this is a simple example, we will simply use “number of records to query each time” as a way to demo.
Data Input Variables

Finally, we’ll define the Add-on Setup Parameters. These are the global input parameters defined in the Add-on’s setup page. These will be the universal settings available to all inputs configured using this modular input.
In this example, we’ll specify an API base URI and the API version.

Add-on Setup Parameters

Return to Table of Contents


 

Step 6 – Custom Code Primer: Single Instance Mode

Modular inputs are flexible in a number of ways. One such flexibility is that they can execute in single instance or multiple instance mode.

The AoB 2.0 ‘my custom python’ feature always uses single instance mode. This is statically defined in supporting code that is generated for you automatically. Modifying that code is not recommended, as it is re-generated each time you save your custom code in step 8.

It is mentioned here so that you understand that single instance mode runs your custom code once for ALL of its defined inputs. This means if you have three inputs, say foo, bar, and baz (each with its own stanza in inputs.conf), your custom code will need to embed its logic within a loop that iterates over each stanza.

Don’t worry, we’ll solve for that in step 8 in an explicit example.
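If it helps to see the shape of that loop now, here’s a bare-bones sketch of the structure we’ll flesh out in step 8 (the log message is just illustrative):

def collect_events(helper, inputs, ew):
    # Single instance mode: one invocation services every configured input,
    # so iterate over all stanzas rather than assuming there is exactly one.
    for stanza in helper.input_stanzas:
        helper.log_info('collecting for stanza: {}'.format(stanza))
        # per-stanza collection logic goes here (see step 8)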

To further understand this topic, you may read Splunk’s mod input documentation that reviews single & multiple instance modes of a script.

NB: There are plans to make this easier in a future AoB version; this post is specifically written with AoB 2.0 in mind.

Return to Table of Contents


 

Step 7 – Custom Code Auto Generated

This is the code that is generated automatically. Notice all the guidance provided that is commented out, just ready for you to un-comment and use.

Review it here or in your browser, and skip to step 8.

 

# encoding = utf-8

import os
import sys
import time
import datetime
'''
    IMPORTANT
    Edit only the validate_input and collect_events functions.
    Do not edit any other part in this file.
    This file is generated only once when creating
    the modular input.
'''
def validate_input(helper, definition):
    """Implement your own validation logic to validate the input stanza configurations"""
    # This example accesses the modular input variable
    # query_max = definition.parameters.get('query_max', None)
    pass
def collect_events(helper, inputs, ew):
    """Implement your data collection logic here"""
    # The following example accesses the configurations and arguments
    # Get the arguments of this input
    # opt_query_max = helper.get_arg('query_max')
    # Get options from setup page configuration
    # Get the loglevel from the setup page
    # loglevel = helper.get_log_level()
    # Proxy setting configuration
    # proxy_settings = helper.get_proxy()
    # User credentials
    # account = helper.get_user_credential("username")
    # Global variable configuration
    # global_api_uri_base = helper.get_global_setting("api_uri_base")
    # global_api_version = helper.get_global_setting("api_version")
    # Write to the log for this modular input
    # helper.log_error("log message")
    # helper.log_info("log message")
    # helper.log_debug("log message")
    # Set the log level for this modular input
    # helper.set_log_level('debug')
    # helper.set_log_level('info')
    # helper.set_log_level('warning')
    # helper.set_log_level('error')
    # helper function to send http request
    # response = helper.send_http_request(url, method, parameters=None, payload=None,
    #                          headers=None, cookies=None, verify=True, cert=None, timeout=None, use_proxy=True)
    # get the response headers
    # r_headers = response.headers
    # get the response body as text
    # r_text = response.text
    # get response body as json. If the body text is not a json string, raise a ValueError
    # r_json = response.json()
    # get response cookies
    # r_cookies = response.cookies
    # get redirect history
    # historical_responses = response.history
    # get response status code
    # r_status = response.status_code
    # check the response status, if the status is not successful, raise requests.HTTPError
    # response.raise_for_status()
    #
    # checkpoint related helper functions
    # save checkpoint
    # helper.save_check_point(key, state)
    # delete checkpoint
    # helper.delete_check_point(key)
    # get checkpoint
    # state = helper.get_check_point(key)
    #
    '''
    # The following example writes a random number as an event
    import random
    data = str(random.randint(0,100))
    event = helper.new_event(source=helper.get_input_name(), index=helper.get_output_index(), sourcetype=helper.get_sourcetype(), data=data)
    try:
        ew.write_event(event)
    except Exception as e:
        raise e
    '''

Return to Table of Contents


 

Step 8 – Customizing The Auto Generated Code

Here we update auto generated code with our logic.

This is just a quick example so I skipped many important elements of a prod solution including (but not limited to) validation of inputs, logging verbosity flexibility, error handling opportunities, etc.

# encoding = utf-8

import os
import sys
import time
import datetime

'''
    IMPORTANT
    Edit only the validate_input and collect_events functions.
    Do not edit any other part in this file.
    This file is generated only once when creating
    the modular input.
'''
def validate_input(helper, definition):
    """Implement your own validation logic to validate the input stanza configurations"""
    # This example accesses the modular input variable
    # query_max = definition.parameters.get('query_max', None)
    pass

def collect_events(helper, inputs, ew):
  # We import json library for use in massaging data before writing the event
  import json
  
  # Return all the stanzas (per step #6)
  stanzas = helper.input_stanzas
  
  # Iterate through each defined Stanza (per step #6)
  # NB: I only indent this with two spaces so I don't have to re-indent everything else
  for stanza in stanzas:
      
    # Another two-space indentation keeps all the "give-me" code from step #7 in play
    # without more indenting exercises
    helper.log_info('current stanza is: {}'.format(stanza))
    
    """Implement your data collection logic here"""
    # The following example accesses the args per defined input
    opt_query_max = helper.get_arg('query_max')
    # Test mode will yield single instance value, but once deployed, 
    # args are returned in dictionary so we take either one
    if type(opt_query_max) == dict:
        opt_query_max = int(opt_query_max[stanza])
    else:
        opt_query_max = int(opt_query_max)

    # Fetch global variable configuration (add-on setup page vars)
    # same as above regarding dictionary check
    global_api_uri_base = helper.get_global_setting("api_uri_base")
    if type(global_api_uri_base) == dict:
        global_api_uri_base = global_api_uri_base[stanza]
    global_api_version = helper.get_global_setting("api_version")
    if type(global_api_version) == dict:
        global_api_version = global_api_version[stanza]
        
    # now we construct the actual URI from those global vars
    api_uri = '/'.join([global_api_uri_base, 'v' + global_api_version])
    helper.log_info('api uri: {}'.format(api_uri))

    # set method & define url for initial API query
    method = 'GET'
    url = '/'.join([api_uri, 'maxitem.json?print=pretty'])
    # submit query
    response = helper.send_http_request(url, method, parameters=None, payload=None,
                              headers=None, cookies=None, verify=True, cert=None, timeout=None, use_proxy=True)
    # store total number of entries available from API
    num_entries = int(response.text)
    helper.log_info('number of entries available: {}'.format(num_entries))

    # get checkpoint or make one up if it doesn't exist
    state = helper.get_check_point(stanza + '_max_id')
    if not state:
        # get some backlog if it doesn't exist by multiplying number of queries by 10
        # and subtracting from total number of entries available
        state = num_entries - (10 * opt_query_max)
        if state < 0:
            state = 0
    helper.log_info('fetched checkpoint value for {}_max_id: {}'.format(stanza, state))
    
    # Start a loop to grab up to number of queries per invocation without
    # exceeding number of entries available
    count = 0
    while (count < opt_query_max) and (state + count < num_entries):
        helper.log_info('while loop using count: {}, opt_query_max: {}, state: {}, and num_entries: {}'.format(count, opt_query_max, state, num_entries))
        count += 1
        # update url to examine actual record instead of getting number of entries
        url = '/'.join([api_uri, 'item', str(state + count) + '.json?print=pretty'])
        response = helper.send_http_request(url, method, parameters=None, payload=None,
                              headers=None, cookies=None, verify=True, cert=None, timeout=None, use_proxy=True)
        # store result as python dictionary
        r_json = response.json()  
        # massage epoch to a human readable datetime and stash it in key named the same
        if r_json and r_json.get('time'):
            r_json['datetime'] = datetime.datetime.fromtimestamp(r_json['time']).strftime('%Y-%m-%d %H:%M:%S')
        helper.log_info('item {} is: {}'.format(state + count, r_json))   
        # format python dict to json proper
        data = json.dumps(r_json)
        # similar to getting args for input instance, find sourcetype & index
        # regardless of if we're in test mode (single value) or running as input (dict of values)
        st = helper.get_sourcetype()
        if type(st) == dict:
            st = st[stanza]
        idx = helper.get_output_index()
        if type(idx) == dict:
            idx = idx[stanza]      
        # write event to index if all goes well
        # NB: source is modified to reflect input instance in addition to input type
        event = helper.new_event(source=helper.get_input_name() + ':' + stanza, index=idx, sourcetype=st, data=data)
        try:
            ew.write_event(event)
            # assuming everything went well, increment checkpoint value by 1
            state += 1
        except Exception as e:
            raise e 
    # write new checkpoint value
    helper.log_info('saving check point for stanza {} @ {}'.format(stanza + '_max_id', state))
    helper.save_check_point(stanza + '_max_id', state)

Return to Table of Contents


Step 9 – Entering test values

Now that we’ve copied pasta and modified it for our own purposes, we can test!

Be sure to update the forms on the tabs Data input Definition & Add-on Setup Parameters to the left of the Code Editor to make sure test mode has parameters with which to work.

Code Editor

Return to Table of Contents


 

Step 10 – Run Test

I suppose this could have been included in step #9. Once you’ve entered the parameters on both tabs per step #9, run the test by clicking that Test button.

In AoB 2.0, events successfully written to Splunk will be displayed in the Output window on the right-hand side.

Logging

A log for your mod input instance will be in $SPLUNK_HOME/var/log/splunk/<ta_name>_<input_name>.log
In my example, it lives at
/opt/splunk/var/log/splunk/ta_hackernews_hackernews.log

Return to Table of Contents


 

Step 11 – Save Work

Click the Save button

 

Return to Table of Contents


 

Step 12 – Finish

Click the Finish button

Done

Return to Table of Contents


 

Step 13 – Restart

Restart Splunk if you haven’t already been prompted to do so by now.

Return to Table of Contents


 

Step 14 – Cooking With Gas

Your custom code is set up. It is considered good practice to create a sandbox index in which to test your new add-on. You can keep tweaking the add-on via AoB until everything works the way you want, cleaning out the sandbox index as needed, before validating & packaging (using AoB features, of course).

If you get stuck or run into challenges, check the AoB Docs and explore Splunk Answers for Add-on Builder. If you don’t find what you need there, post a new question (be sure to tag it with “Splunk Add-on Builder”).

Return to Table of Contents


 

Kaufland DevSummit2016 – Splunk for DevOps – Faster Insights, better code


The first DevSummit event was recently hosted by Kaufland with 200 people attending for the day to hear presentations about the “World of API”, discuss the latest best practice developments and build ideas in a hackathon. One highlight was the keynote from Markus Andrezak on how technology, business and innovation play together.

Of course, a team of Splunkers (big thanks to my colleagues Mark and Henning) wouldn’t miss such an event and got involved with a booth as well as a presentation. It was amazing to have so many fruitful discussions about how to make data more easily accessible and usable for business, development and operations teams. In the morning Joern Wanke from the Kaufland Omnichannel team presented on how his team gains business insights by monitoring their online shop with Splunk to analyze trends in product usage. Using Splunk’s pattern detection he showed how fast and easy it is to find error patterns in custom application logs and accelerate their development.

In the afternoon I had the chance to give a talk about DevOps and how Splunk can help in this area. Organizations with complex technology stacks often lack visibility across their full product development lifecycle. But visibility is crucial for fast, high-quality product development and for achieving a fast time to market. So DevOps teams are under pressure to increase velocity and agility to support that goal. Often bugs and issues are not found until the next release goes into production. So how can Splunk help in this situation?

With Splunk we can abstract away from all contributing systems and development steps by collecting and analyzing the data they provide. Task tracking, code repo changes, results from continuous integration and build systems, test automation and quality assurance, testing, staging, and finally deployment and production operations – all these moving parts can be analyzed and monitored in real time. This means all contributing teams have a unified view and the ability to make data-driven decisions in their daily work. Don’t forget that they can also save a lot of time when it comes to finding and fixing errors, because development and operations have all the data easily accessible. The development team can also get insights from production logs – without touching critical systems. Finally, it’s also worth mentioning that all teams can get proactive alerting about issues and easily integrate it with their collaborative environments, for example by utilizing custom alert actions or leveraging DevOps-related apps and add-ons from Splunkbase. More details can be found in the presentation slides:

In conclusion, it was a great day with so many good conversations around new use cases to be discovered with Splunk. We’re excited to see how these ideas thrive over the coming year and how Kaufland takes it to the next level with DevSummit 2017. To learn more visit Splunk for DevOps and also check out the Splunk Developer Portal.

Visual link analysis with Splunk and Gephi


As cyber-security risks and attacks have surged in recent years, identity fraud has become all too familiar for the common, unsuspecting user. You might wonder, “why don’t we have the capabilities to eliminate these incidents of fraud completely?” The reality is that fraud is difficult to characterize as it often requires much contextual information about what was occurring before, during, and after the event of concern in order to identify if any fraudulent behavior was even occurring at all. Cyber-security analysts therefore require a host of tools to monitor and investigate fraudulent behavior; tools capable of dealing with large amounts of disparate data sets. It would be great for these security analysts to have a platform to be able to automatically monitor logs of data in real-time, to raise red flags in accordance to certain risky behavior patterns, and then to be able to investigate trends in the data for fraudulent conduct. That’s where Splunk and Gephi come in.

Gephi is an open-source graph visualization software developed in Java. One technique to investigate fraud, which has gained popularity in recent years, is link analysis. Link analysis entails visualizing all of the data of concern and the relationships between elements to identify any significant or concerning patterns – hence Gephi. Here at Splunk, we integrated Gephi 0.9.1 with Splunk by modifying some of the Gephi source code and by creating an intermediary web server to handle all of the passing of data and communication with the Splunk instance via the Splunk API. Some key features that we implemented were:

  • Icon visualization of data types.
  • Expanding and collapsing of nodes into groups by data type.
  • Enhancing the timeline feature to include a Splunk style bar graph.
  • Drilling down into nodes (calling the Splunk API and populating data on the graph).

Gephi can populate a workspace or enrich the data already contained in a workspace by pulling in properly formatted data. We implemented this by setting up two servers: one acts as an intermediary and determines what kinds of data a node can pull in based on its nodetype, and the other contains all the scripts that interact with a Splunk instance to run Splunk searches, pull back the results, and format them in a way Gephi already understands.

To make all this happen, Gephi makes a GET request to the Gephi-Splunk server (GSS) containing the nodetype, which prompts the GSS to return a list of available actions for that nodetype (note: the list is statically defined in Gephi to simplify things for the demos). Each of these actions can be used (along with information about the node) to construct another GET request, which is sent to the GSS and then forwarded to a script server that executes the action. The action is completed by running a script held on the script server; actions involving Splunk searches are completed using Splunk oneshot searches as defined in the Splunk Python SDK (http://dev.splunk.com/view/python-sdk/SP-CAAAEE5). The script server takes the results of the search, formats them, and forwards them to the GSS, which responds to the original request from Gephi with a formatted output that Gephi can render. The architecture is shown visually below.
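To make that a bit more concrete, the script-server piece essentially boils down to running a oneshot search through the Splunk SDK for Python and handing the results back as JSON for the GSS to forward. The sketch below is illustrative only: the host, credentials, index, field names, and the drilldown_ip function are placeholder assumptions, not the project’s actual code.

import json
import splunklib.client as client
import splunklib.results as results

# Placeholder connection details; a real script server would read these from config
service = client.connect(host='localhost', port=8089,
                         username='admin', password='changeme')

def drilldown_ip(ip_address):
    # Oneshot searches block until complete and return results directly,
    # so there is no job to poll or clean up afterwards
    stream = service.jobs.oneshot(
        'search index=main src_ip="{}" | head 100'.format(ip_address))
    # Keep only result rows (ResultsReader also yields diagnostic messages)
    rows = [dict(r) for r in results.ResultsReader(stream) if isinstance(r, dict)]
    return json.dumps(rows)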

splunk-gephi-integration-architecture

The reason for the separation of servers into a “permissions” server and a script server is to make it easier to expand this project to serve multiple use cases and leverage multiple Splunk instances, while keeping organization simple and limited to a single point. In other words, resources are separated, but management is centralized.

Install by following the instructions here: https://github.com/splunk/gephi-splunk-project/tree/master

viz-1

viz-2

The first screenshot shows a use-case in which an analyst might have six IP addresses to be investigated. The analyst can start out with only the six IP addresses shown on the graph, and then choose to select the “drilldown” menu option to make a call to Splunk for more information. Our Gephi instance will then populate the graph with all of the data received from Splunk, creating nodes with connections if the nodes do not already exist in the visualization, and only adding connections if the nodes do already exist in the visualization. The analyst can also choose to “playback” the data via the timeline to see how events were occurring through time.

Shown in the second screenshot is a use case in which an analyst might have a large dataset but no clues of where to start investigating. Importing the data into Gephi would allow for recognition of clusters of correlated events (shown as large red nodes in the screenshot). The timeline would also assist in seeing how these resources were being accessed through time.

In addition to anti-fraud use cases, the Gephi + Splunk integration can be applied to any datasets that have cause and effect relationships. The example we provide is of IP address, username, session ID, and user agent data. In order to use other datasets, you will have to change some of the code to display the correct icons and to drilldown into the nodes correctly (see “Altering Data Sources” section of the github docs).

Disclaimer: This integration is provided “as is” and should not be expected to be supported. The application has not been extensively tested with large data sets, so use with caution. Depending on the searches being run in Splunk, and the size of the underlying data set, searches may take a while to complete.  The purpose of this application was to provide a proof of concept of using the Splunk API with an open-source graph visualization tool. At the moment, there are no official plans to integrate a graph visualization into the Splunk native web framework. If you intend on adapting this integration for your own uses, please be aware that it will require knowledge and use of Java and Python.

More information about Gephi can be found at their website: https://gephi.org/ and on their github repository: https://github.com/gephi/gephi

If you have any comments, questions, or feedback about this project, please send all inquiries to Joe Goldberg at jgoldberg@splunk.com

Special thanks to the Intern Team (Phillip Tow, Nicolas Stone, and Yue Kang) for making all this possible!


Gleb Esman,
Sr. Product Manager, Anti-Fraud


Getting Cloud Native with Splunk and Cloud Foundry


The following is a guest blog post by Matt Cholick, software engineer at Pivotal.

Enterprises are moving to microservices architectures, continuous delivery practices, and embracing DevOps culture. This is the foundation of a modern, “cloud-native” business. At Pivotal, we help companies make this transformation with our Pivotal Cloud Foundry product.

Our customers want to extend the utility of Splunk to include their new cloud-native apps running on Cloud Foundry. To this end, we’ve been working up an integration between these two products. This post reviews our progress so far, and concludes with an invite to our private beta program.

Screen Shot 2017-01-19 at 1.57.30 PM

What is Pivotal Cloud Foundry?

Pivotal Cloud Foundry is a platform, based on open source software, for deploying and operating applications. These apps can be deployed both on-premises and in the top public clouds. The product supports deploying a wide variety of languages (Java, .NET, Node.js, Python, and many others) in a uniform way.

Want to iterate on custom code quickly, but don’t want to re-solve all the problems of building a platform, container orchestration, and elasticity? Then Pivotal Cloud Foundry is worth a look! To learn more, visit the Pivotal Cloud Foundry platform site.

Metrics and logging are a big part of the platform, so let’s jump right into the integration with Splunk.

Cloud Foundry Logging Overview

Loggregator is Cloud Foundry’s logging system. It aggregates and streams logs from all user applications and platform components. The key concepts are:

  • Individual application developers connect to Loggregator to examine the logs of their app
  • Cloud Foundry operators use this same system to monitor the platform itself
  • The Firehose is a Loggregator feature that combines the stream of logs from all apps with metrics data from CF components
  • A nozzle connects to the Firehose via WebSocket to receive these events

I ran a Splunk nozzle locally and captured all events in one of my team’s test environments. This resulted in ~170 events per second (EPS). The average event size varies based on the actual event mix between metrics & logs, and the size of applications’ custom logs. Assuming a conservative average event size of 350 bytes, that’s roughly 60 KB/s (170 × 350 bytes), which translates to almost 5 GB/day of valuable data. This was a small environment (31 VMs), configured for high availability (i.e. nothing was scaled out, but redundancy was configured across availability zones).

At the other end of the spectrum, we recently did some Cloud Foundry scale testing, running applications in 250,000 containers. In a larger environment like that – which is common within Pivotal’s customer base – the underlying platform is over 1,500 VMs (50 times bigger than my test example). Imagine the amount of data that would generate!

With that many platform events, a solution like Splunk Enterprise is really useful for understanding what’s going on, which is where the new Splunk nozzle for the Cloud Foundry Firehose proves helpful.

Splunk + Cloud Foundry

Now for a concrete example: let’s take a look at a message sent by the Gorouter service in Cloud Foundry. This component routes incoming traffic for both applications and the platform itself. The router periodically reports the total number of requests. Here’s a single message from the nozzle.


{
"cf_origin": "firehose",
"delta": 1,
"deployment": "cf",
"event_type": "CounterEvent",
"ip": "192.168.16.22",
"job": "router",
"job_index": "a6b31a06-7fab-4363-979c-0cbaac16b5fd",
"name": "total_requests",
"origin": "gorouter",
"total": 75719
}

The “job” is router; this is the component doing the reporting. Components are scaled out, so “job_index” identifies the individual VM that’s reporting. “CounterEvents” are strictly increasing until that instance of a component is restarted. The name is what’s being counted, and each component reports several values.

After tracking this metric over time, we can run a Splunk search to translate all this data into an interesting graph for this part of the platform:

Visualization

From the graph, it’s obvious this is a test environment: there are only a handful of requests per minute. To demonstrate a situation that might warrant investigation, I deployed an app and started making continual requests against it.

The chart shows nearly an order of magnitude more incoming requests: the sort of event an operator might want to examine further (perhaps to scale out components).

Here’s the underlying Splunk query:


sourcetype="cf:counterevent" source="router*" name=total_requests
| timechart span=2m max(total) as t
| streamstats last(t) as t_prev current=f window=2
| eval delta=t-t_prev | rename delta as requests
| fields _time, requests
| where _time<now()-120

This query takes advantage of several Splunk features to generate the visualization. It uses timechart and streamstats to build a delta across two-minute increments. Summing the delta from the payload would make for a simpler query, but a missed message would really throw off the graphs in that case. The subtraction at the end drops the last time bucket, as there’s nothing to calculate a difference against. After building several visualizations like this, I’ve really become a fan of Splunk’s search language.

The Details

The full solution looks like this:


Pivotal Cloud Foundry                           Splunk Enterprise
+------------------+    BOSH Managed VMs        +---------------+
|                  |    +------------------+    |               |
|                  |    |  +-------------+ |    |               |
|                  |    |  |Splunk heavy +----->|               |
|                  |    |  | forwarder   | |    |               |
|                  |    |  +-----^-------+ |    |               |
|   Loggregator    |    |        |         |    |               |
|   +----------+   |    |        |         |    +---------------+
|   |          |   |    |  +-----+----+    |
|   |          +---------->|  Nozzle  |    |
|   +----------+   |    |  +----------+    |
|                  |    |                  |
+------------------+    +------------------+

Pivotal Cloud Foundry aggregates logs and events, and ships them via the firehose, as described in the previous section.

To harvest events, the solution uses BOSH to deploy and manage a nozzle as well as a Splunk heavy forwarder, both co-located together on VMs. A full description of BOSH is outside the scope of this post, but for the short summary it’s a tool that:

  • Provisions VMs on multiple IaaS providers
  • Installs component software on those VMs
  • Monitors component health

BOSH can scale out the nozzle/forwarder VM as needed, based on the size of the platform.

Co-locating a nozzle with the heavy forwarder enables several features:

  • The forwarder buffers data during events like a network partition or a temporarily downed indexer
  • The forwarder can securely forward data to external Splunk deployment using SSL client authentication
  • The nozzle parses and sends JSON to the local forwarder, so events can be richer than they might otherwise be with a solution like plain text file parsing. Metadata like application info can also be added.
  • The nozzle only forwards locally, so we don’t have to add complexity for features around acknowledgement (as this is handled by the forwarder already)

Next Steps

Do you use the open source version of Cloud Foundry? The Splunk nozzle is easy to run locally; check out the open source nozzle code and test it out. The full, BOSH-managed solution is also available as open source.

We’ve also built a Splunk Add-on for Cloud Foundry which includes pre-built panels that operators can use as a starting point to build dashboards for their installation, in addition to the sample operational dashboard shown above.

For Pivotal Cloud Foundry operators, the tile is currently in closed beta. Contact your account manager if you’re interested in trying out the MVP.

For technical questions or feedback, feel free to contact myself or my Splunk counterpart (Roy Arsan).

Matt Cholick
Software Engineer
Pivotal

Pivotal Cloud Foundry @pivotalcf

Using machine learning for anomaly detection research

$
0
0

Over the last few years I have had many discussions around anomaly detection in Splunk. So it was really great to hear about a thesis dedicated to this topic, and I think it’s worth sharing with the wider community. Thanks to its author, Niklas Netz, in advance!

Obviously anomaly detection is an important topic in all core use case areas of Splunk, but each one has different requirements and data, so unfortunately there is not always an easy button. In IT Operations you want to detect system outages before they actually occur and proactively keep your dependent services up and running to meet your business needs. In Security you want to detect anomalous behavior of entities as potential indicators of breaches before they occur. In Business Analytics you might want to spot customer churn or find patterns that indicate severe business impact. In IoT you may want to find devices that suddenly turn into an unhealthy state or detect anomalies in sensor data that indicate potentially bad product usage.

Before we start with solutions, let’s take a step back and raise a more fundamental question: “What is an anomaly?” or “What does anomaly detection mean (in your context)?” One common answer, from Wikipedia, is “the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.”

So this means that we need to know an “expected pattern” or a normal state, which is often referred to as “baselining”. Sometimes people do not yet have a clear answer to the question of what anomaly or normality means in the context of their use cases, so finding the right approach is obviously even harder.

Historically there have been many in-depth studies around anomaly detection, but recently a thesis was published by Niklas Netz, who took a closer look at different ways to spot anomalies specifically with Splunk. His research was part of a cooperation between Hamburg University of Applied Sciences and the OTTO group, together with Splunk partner LC Systems, who also jointly presented the results at .conf2016:

Splunk_otto_conf_presentation

http://conf.splunk.com/files/2016/slides/anomaly-detection-on-business-items-with-machine-learning-algorithms.pdf

Splunk_otto_thesis_presentation

Now Niklas’ thesis (in German) is published and definitely worth a read for anybody who wants to go into depth and detail on anomaly detection in Splunk. He addresses the basic challenges and compares different approaches and solutions, spanning from basic SPL commands for anomaly detection, through third-party apps, to the Splunk App for Machine Learning. Read the full text here: http://edoc.sub.uni-hamburg.de/haw/volltexte/2016/3691/pdf/Bachelorarbeit_Netz.pdf

As a brief summary, Niklas concluded that getting the right data, and cleaning and transforming it so that it was sufficient for his goals, was the most time-consuming part of the process. He decided to evaluate different machine learning models for categorical classification, labeling data points as anomalies if they crossed a threshold of relative change compared to the hour or day before. So, according to his goal, he defined conditions and engineered features that helped to model what is normal and, in relation to that, what is an anomaly. In his case a RandomForestClassifier did the best job. With his work he paved the road for further development of machine learning and anomaly detection use cases at OTTO, and I hope the wider Splunk community will find his work valuable too.
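For readers who want a feel for that general approach (label points by relative change against the previous period, then train a classifier on engineered features), here is a small, self-contained sketch. It is deliberately simplified and is not Niklas’ actual pipeline; the synthetic data, the 30% threshold and the feature choices are all assumptions for illustration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic "hourly order count" series standing in for real business data
rng = np.random.RandomState(42)
values = rng.poisson(lam=100, size=2000).astype(float)

# Label a point as an anomaly if it moved more than 30% vs. the previous hour
prev = np.roll(values, 1)
rel_change = np.abs(values - prev) / np.maximum(prev, 1.0)
labels = (rel_change > 0.3).astype(int)

# Simple engineered features: current value, previous value, hour of day
hours = np.arange(len(values)) % 24
X = np.column_stack([values, prev, hours])[1:]   # drop index 0 (no real "previous")
y = labels[1:]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print('holdout accuracy: {:.3f}'.format(clf.score(X_test, y_test)))

As the thesis notes, the hard part in practice is the data preparation and feature engineering that comes before this step, not the model training itself.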

Finally I want to share a few links to useful products and resources that help to tackle anomaly detection in Splunk for specific areas or in general:


From API to easy street within minutes


30? 20? …15? It all depends on how well you know your third-party API. The point is that polling data from third-party APIs is easier than ever. CIM mapping is now a fun experience.

Want to find out more about what I mean?  Read the rest of this blog and explore what’s new in Add-on Builder 2.1.0.

REST Connect… and with checkpointing

Interestingly, this blog happens to address a problem I faced on my very first project at Splunk. When I first started at Splunk as a sales engineer, I worked on building a prototype of the ServiceNow Add-on. Writing Python, scripted inputs vs. modular inputs, conf files, setup.xml, packaging, best practices, password encryption, proxies and even checkpointing… the list goes on. Dealing with all of these was tough, to say the least. I kept wondering why this couldn’t be much easier.

Fast forward to today, and an easy solution has finally arrived. You can now build all of the above with the latest version of Add-on Builder, without writing any code or dealing with conf files. If you know your third-party API, you could have the corresponding modular input built in minutes.
One powerful addition to our new data input builder is checkpointing. In case you were wondering, checkpoints are for APIs what file pointers are for file monitoring. Instead of polling all data from an API every time, checkpointing lets you poll incrementally, picking up only new events at each poll. Checkpointing can be a complicated concept, but it is essential to active data polling. Luckily, I can say that it is no longer as complex as it used to be.

For an example of doing this in Add-on Builder 2.1.0, check out Andrea Longdon’s awesome walkthrough using the New York Times API. This cool example shows you how to monitor and index NY Times articles based on user-defined keywords.

Screen Shot 2017-02-20 at 10.17.35 PM
You will be able to define your app/add-on setup and automatically encrypt passwords using the storage/passwords endpoint, all in a drag-and-drop interface.

Screen Shot 2017-02-21 at 2.33.41 PM

 

CIM update at run-time

CIM mapping has the following major enhancements:

  • A new UI that makes it possible to compare fields from your third-party source and CIM model fields side by side.
  • You can also update CIM mapping objects even if they were built outside of Add-on Builder, with no restart needed. In other words, you can now update CIM mappings at run time, in one single view, from Add-on Builder.

 

Screen Shot 2017-02-20 at 10.19.21 PM

What else is new?

  • The Add-on Builder has a new and enhanced setup library consistent with modern Splunk-built add-ons. This gives you more flexibility over the setup components you are building, in addition to automatically handling password encryption.

Screen Shot 2017-02-21 at 10.45.19 PM

  • You can now import and export add-on projects, allowing you to work on an add-on on different computers and share projects with others. For details, see Import and export add-on projects.
  • One of my favorites: no more interruptions caused by having to restart Splunk Enterprise when building new data inputs, creating a new add-on, or any other step. Go through the end-to-end process, undisturbed.

Please check out our latest release. We would love to hear from you. Teaser alert: in the next blog post, I will share how to build a SolarWinds add-on using Add-on Builder 2.1.0.

Happy Splunking!

 
