
Box Plots: Making Custom Visualizations


This is the first of a two part series on implementing Box Plots in Splunk for security use cases.

Analyzing complex data is difficult, which is why people use Splunk. Sometimes patterns in data are not obvious, so it takes various ways of looking at aggregate reports and multiple charts to ascertain the important information buried in the data. A common tool in a data analyst’s arsenal is a box plot. A box plot, also called a box and whisker plot, is a visual method to quickly ascertain the variability and skew of data, as well as the median. For more about using and reading box plots, read the excellent and succinct post by Nathan Yau of the Flowing Data blog “How to Read and Use a Box-and-Whisker Plot.”

Splunk Enterprise 6.4 introduces a new framework for custom visualizations. For anyone interested in building their own, the extensive and well-written docs on building custom visualizations serve as both tutorial and reference material for new visualization apps.

The most difficult part of building visualizations is not creating the Splunk app, especially with the excellent documentation and great community support from Answers, IRC, and Slack (signup online). The basic steps are (largely distilled from http://docs.splunk.com/Documentation/Splunk/6.4.0/AdvancedDev/CustomVizTutorial):

  1. Create working visualization outside the Splunk framework. This usually is as simple as an HTML file to call the JavaScript and the JavaScript code itself, combined with some example input of some sort.
  2. Download and install a fresh, unused Splunk Enterprise 6.4 (or newer) instance on a test machine or a workstation. Do not install any other apps or add-ons. This provides a clean and uncluttered environment for testing the app without potential conflicts or problems from a production environment. This, also, allows for restarting Splunk, reloading configs, or removing and reinstalling the app or Splunk itself at any time during the development process.
  3. Download the example app from the tutorial.
  4. Install the Viz_tutorial_app in the Splunk test instance.
  5. Rename the app directory to match the new app being developed.
  6. Edit the config files as directed in the tutorial for the new app name and other settings.
  7. Perform JavaScript magic to create the working visualization within Splunk. This post will help with this process.
  8. Optionally (and preferably) add user definable options.
  9. Test and package the app.

The difficult part of building a visualization app is the JavaScript code drawing the chart mentioned in steps one and seven above. Most people start with pre-written libraries to save the arduous work of writing the code from a blank screen, which sometimes makes step one easier. However, even when these libraries work in their native form perfectly well, most of them require some massaging before they work correctly within the Splunk Custom Visualization framework.

The most common visualizations are built on top of the Data Driven Documents (D3) library D3.js. The bulk of existing D3 applications are designed to work with some raw, unprocessed data, often supplied by a CSV file or other static source. Because of this, the data input usually must be altered within the JavaScript when building a Splunk custom visualization. In addition, without an analytics engine supplying the data, most D3 applications are written to perform all the mathematical calculations on the static data sources. With Splunk’s superior ability to perform a wide range of calculations on vast amounts of data, it behooves a visualization developer to alter the JavaScript to accept pre-processed data.

Following this paradigm, the D3 Box Plot application started with Jens Grubert’s D3.js Boxplot with Axes and Labels code. Grubert’s code runs well using the static format for a CSV as data input, but the JavaScript is hardcoded for a specific number of columnar inputs from the file. Altering the code is required to change the number of columns and, therefore, the number of box plots displayed in a single chart. Also, the source app performs all the calculations within the JavaScript using raw number inputs from the CSV data file.

Splunk supplies the pre-calculated data for an arbitrary number of box plots and uses the D3 Box Plot app only for display. Therefore, the original code required significant changes to remove the calculations, to alter the inputs to accept only the pre-calculated data needed to draw the visual elements, and to work within the Splunk Custom Visualization framework.

Kevin Kuchta provided significant assistance in reworking the JavaScript from Grubert’s original code into a standalone app that meets the data input requirements and drops the mathematical functions. This was needed to ensure the application could perform all the needed functions before it was converted to work within Splunk. Some of the original code is commented out in case it becomes useful in future editions of the published app, and some has been removed entirely.

Grubert’s code uses two scripts. One is embedded in an HTML file that is used to call the code with a browser, and the other is a standalone file called box.js. During the development phases of altering the code to run outside of Splunk, the embedded script in the HTML was moved to an outside file called boxparse.js and sourced within the HTML.

Setting up the Development Environment

The original HTML file pulls in the visualization library (d3.v3.min.js) and the box.js file with script tags that look like:

<script src="http://d3js.org/d3.v3.min.js"></script>
<script src="d3.v3.min.js"></script>
<script src="box.js"></script>

After pulling the code from between the <script></script> tags immediately following those three lines and putting it into the boxparse.js file, it was sourced by adding:

<script src="boxparse.js"></script>

To test this locally without cross-domain errors in Chrome (the browser of choice for debugging JavaScript today), an in-place web server was run using port 9000 (to not interfere with Splunk running locally on 8000) on the local machine from the directory holding the box plot code using:

python -m SimpleHTTPServer 9000
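If your workstation only has Python 3, the equivalent built-in module does the same thing (only the module name changed):

python3 -m http.server 9000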

This allows for rapid testing using Chrome pointed at http://localhost:9000/.

Changing Inputs and Removing Data Calculations

The next step was to remove the calculation code and alter the inputs so that they are dynamic in the number of data sets (allowing a variable number of box plots to display) and accept pre-calculated values for the final data required to create a box plot.

The required values to create a box plot are:

  • median
  • min
  • max
  • lower quartile (used for lower bound of the box)
  • upper quartile (used for upper bound of the box)
  • interquartile range (the difference between the upper and lower quartiles and called iqr in the app)
  • list of outlier values (not used in the initial version of the Box Plot Viz app)
  • category name or label

The data parsing lives in the boxparse.js code taken from the HTML file, which made it simple to remove the lines starting with:

// parse in the data
d3.csv("data.csv", function(error, csv) {

and ending with:

if (rowMax > max) max = rowMax;
if (rowMin < min) min = rowMin;
});

This section of the original code both reads the input CSV file and performs calculations on the data to find min and max values for each set. All of this code was removed and min and max are now set using:

var yMinValues = [];
for (minind = 0; minind < data.length; minind++) {
    yMinValues.push(data[minind][2][0]);
}
var yMin = Math.min(...yMinValues);

var yMaxValues = [];
for (maxind = 0; maxind < data.length; maxind++) {
    yMaxValues.push(data[maxind][2][1]);
}
var yMax = Math.max(...yMaxValues);

This sets yMin and yMax as new variables for clarity in naming, rather than using the original code’s min and max variable names. This required changing the y-axis from using:

.domain([min, max])

to using:

.domain([yMin, yMax])

The iqr() function to calculate the interquartile range was removed entirely, and references to it were replaced with the iqr variable supplied by the external data (to prepare for conversion to a Splunk Custom Visualization).

Another notable change was to pass the yMin and yMax variables to the d3.box() function like this:

var chart = d3.box({"yMax":yMax,"yMin":yMin})

This passes the values as part of the config object given to d3.box() in box.js. To use them in box.js, the following was added to the bottom of the d3.box() function:

yMin = config.yMin,
yMax = config.yMax;

During testing, an array was created in boxparse.js to hold test data. This was better than using an external file because it simulates how the data will arrive from Splunk in the variable named data. Arbitrarily, the decision was made to use an ordered, indexed array like:

var newFakeData = [
// Column        Quartiles        Whiskers  Outliers  Min  Max
["somedata",  [ 10, 20, 30 ],   [5, 45],  [1, 2],   0,   200],
["otherdata", [ 15, 25, 30 ],   [5, 65],  [],       2,   150],
];

Although outlier support was not included in the initial version, because the required Splunk searches would be too complex for the average user, the ability to read and draw outliers is still in the code. They are merely set to null on input.

The last part, converting box.js to use the new data inputs rather than the internally calculated values, was a fairly lengthy but not difficult process. It required careful review of the code to see where values were passed to the various drawing functions or where variables were set from calculations. In the places where there were calculations, a simple variable assignment replaced the original code.

For example, the original box.js set min and max with:

var g = d3.select(this),
n = d.length,
min = d[0],
max = d[n - 1];

However, the new box.js simply does:

var g = d3.select(this),
min = data[4],
max = data[5];

In the cases where values were calculated in separate functions, those functions were completely replaced with variable assignments.

For example, the original box.js set whiskerData using:

// Compute whiskers. Must return exactly 2 elements, or null.
var whiskerIndices = whiskers && whiskers.call(this, d, i),
whiskerData = whiskerIndices && whiskerIndices.map(function(i) { return d[i]; });

The new box.js instead uses the supplied array of whisker values from the input data:

whiskerData = data[2];

This method was used on the rest of the required variables needed to build a box plot.

After these changes, the box plot loaded via the HTML file and the local test HTTP server.

Conversion to a Splunk App

The next step was to convert the stand alone application to work in Splunk as a Custom Visualization. Kyle Smith, a developer for the Splunk partner Aplura, and author of Splunk Developer’s Guide, provided excellent and thorough guidance in this process. His personal advice and assistance combined with his book were instrumental in the success of this conversion. There were numerous display issues once the app was built in Splunk. This required many iterations of tweaking the code, running the build command, running the search, and more code tweaking.

This process took a fair amount of experimentation with fits and starts down the wrong paths, much as any development process. The final changes are roughly outlined below.

The first thing to do was pull the CSS formatting code from the HTML file and place it into the file at:

$SPLUNK_HOME/etc/apps/viz_boxplot_app/appserver/static/visualizations/boxplot/visualization.css

Next, based on suggestions in the tutorial and Smith, the boxparse.js file was pasted directly into the updateView() function in the supplied source file found at:

$SPLUNK_HOME/etc/apps/viz_boxplot_app/appserver/static/visualizations/boxplot/src/visualization_source.js

An immediate problem to tackle was converting the data supplied by the Splunk search into the format coded above. In future versions this will entail recoding all the data references to pull from the Splunk-supplied data structure, but for now there is code to convert the format and wedge it into the format shown above. Splunk can send the data in three formats, documented in the Custom visualization API reference. The default is row-major, where a JavaScript object is returned with an array containing field names and the row values from the raw JSON results object. There is a column-major option, which does the same for column values. The third option is raw, which returns the full JSON object. The approach used here is raw. This is set in the visualization_source.js file in getInitialDataParams by changing outputMode from:

outputMode: SplunkVisualizationBase.ROW_MAJOR_OUTPUT_MODE,

to:

outputMode: SplunkVisualizationBase.RAW_OUTPUT_MODE,

This makes it possible to pull all the field/value pairs into the app for quick conversion into the indexed array format in the formatData function using:

var newData = _.map(data.results, function(d) {
    return [
        d[data.fields[0].name],
        [ Number(d.lowerquartile), Number(d.median), Number(d.upperquartile) ],
        [ Number(d.lowerwhisker), Number(d.upperwhisker) ],
        [],
        Number(d.min),
        Number(d.max)
    ];
});

This also forces evaluation of the values to numbers for all but the first field, which should be a string holding the values of the split-by (category) field.

The most difficult part to track down at that point was tweaking the display to correctly draw a dynamic Y axis with box plots positioned and sized relative to that scale. After much experimentation, padding was added at the top of the graph with a programmable label offset:

var labeloffset = 75;
var height = 400 + labeloffset - margin.top - margin.bottom;

in visualization_source.js within the updateView code to provide some room at the top of the graph. In addition, the height for the box plots scale was changed from:

.range([yMax, min]);

to:

.range([height, yMin]);

This allowed the Y-axis drawing and the box plot range to use the same values, so the positioning and sizing of the two are relative to each other and 50 on the axis lines up with 50 for each of the box plots drawn.

At this point, all that remained was to complete the steps needed to meet Splunkbase standards, package the app directory into a tar.gz file, and rename it to .spl.

Final Results

The ultimate result is a Box Plot app with results such as the image below (taken from the Box Plot App example screenshot):

BoxPlotViz-example

The app is available on Splunkbase at https://splunkbase.splunk.com/app/3157/.

The field used must be numeric and must be split by another field. The search is longer than for many of the standard visualizations, but the density of information displayed and the requirement to pre-compute all values necessitate the extra length.

The search used in the graph example (which comes with the app) uses a lookup with a series of values for the categories shown to provide numeric data and split by categories.

The specific search is:

| inputlookup boxplotexample.csv | stats median(Cost) AS median, min(Cost) AS min, max(Cost) AS max, p25(Cost) AS lowerquartile, p75(Cost) AS upperquartile by Service | fields - count | where isnotnull(median) | eval iqr=upperquartile-lowerquartile | eval lowerwhisker=median-(1.5*iqr) | eval upperwhisker=median+(1.5*iqr)

If the search starting at | stats ... is copied and the Cost and Service field names are changed, the box plot should draw (see the sketch below). If the number range for one of the split-by field values is of a totally different order of magnitude, the display will likely not be useful for comparisons. In those situations it may be useful to separate split-by field values into different searches, each with its own box plot, grouped by the general range (min and max) of the numeric field. This can be quickly determined by running | stats min(field) AS min, max(field) AS max and then sorting as needed to find common groupings.
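For example, here is a minimal sketch of the same search adapted to a hypothetical web access data set, plotting a response_time field split by host (the index and field names are assumptions for illustration, not part of the app):

index=web sourcetype=access_combined | stats median(response_time) AS median, min(response_time) AS min, max(response_time) AS max, p25(response_time) AS lowerquartile, p75(response_time) AS upperquartile by host | where isnotnull(median) | eval iqr=upperquartile-lowerquartile | eval lowerwhisker=median-(1.5*iqr) | eval upperwhisker=median+(1.5*iqr)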

As an aside for those new to Splunk app development: to speed up reloading app content and configuration file changes, the Splunk docs section "Customization options and caching: Clear client and server assets caches after customization" suggests using http://<host:mport>/debug/refresh and/or http://<host:mport>/<locale_string>/_bump (e.g. http://localhost:8000/en-US/_bump). Use the debug/refresh endpoint to reload XML, configuration, and HTML files, and _bump to reload changes to ../appserver/static directories. Both were used many times during the development of the Box Plot app.

The next installment will show the Box Plot App leveraged for a security use case.


Vote using Splunk


Someone recently challenged me to use Splunk for voting. Splunk is a versatile platform, so why not make a voting app? Sigi and Stephen put the app together one afternoon, and then I tested it out on a live audience during SplunkLive! San Francisco.

 

Picture1 copy

 

It worked like a charm and we gained insight from the audience. That’s when I realized that, although it’s not a typical use case for Splunk, this app could be useful for others, from polling an audience during a presentation to getting consensus from coworkers on a question during a meeting, so I decided to put the app on Splunkbase.

 

 

I finally got around to publishing it. It consists of a few components: a webpage (thanks Sigi!) where users click a letter to cast their vote, the Splunk vote app (thanks Stephen!) which displays the results, and the Lookup File Editor App for Splunk, which makes it easy to create and edit the questions people will vote on. The webpage is a simple page that lets participants select one or more answers and move on to the next question by touching the right arrow at the bottom.

 

Screen Shot 2016-05-18 at 2.13.25 PM

 

When a user clicks on an answer, an event like the one below is sent to your Splunk instance using the HTTP Event Collector.
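The actual event is shown in the screenshot below, but a rough sketch of such a request against the standard HEC endpoint could look like this; the field names (subject, question, answer) are assumptions for illustration, not necessarily what the vote page sends:

curl -k https://your-splunk:8088/services/collector \
    -H 'Authorization: Splunk <your-HEC-token>' \
    -d '{"event": {"subject": "SplunkLive! San Francisco", "question": "1", "answer": "B"}}'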

 

Screen Shot 2016-05-18 at 2.15.06 PM

 

The subject "SplunkLive! San Francisco" is set by appending it to the URL, in this case "my.webserver.com/vote/?subject=SplunkLive!%20San%20Francisco". Simply substitute whatever event or topic you want people to vote on. I recommend using a URL shortener so you can give participants a friendlier URL to type in.

To set up the questions, simply click the "Edit Questions" link in the nav bar. Some example questions are already in the app. Change them to whatever you would like, and make sure to set the subject to match the subject in the URL.

 

Screen Shot 2016-05-18 at 2.23.22 PM

 

The event will appear in the “Select Event” dropdown. Selecting an event and a question in the dropdowns will populate the URL and question panels in the dashboard by pulling from the lookup table.

 

Screen Shot 2016-05-18 at 2.24.03 PM

 

That’s it! Now you can have people vote on topics anytime anywhere using Splunk!

 

 

P.S. If you want to shake things up, check this out next: https://github.com/splunk/parallel-piper
At .conf2015, attendees shook their phones hard enough that it triggered a custom alert action in Splunk and launched Buttercup from a cannon! I’d love to see what others have come up with. Be sure to send me a message if you’ve done something with the shake or vote app.

Configuring Nginx Load Balancer For The HTTP Event Collector


The HTTP Event Collector (HEC) is the perfect way to send data to Splunk, at scale, without a forwarder. If you’re a developer looking to push logs into Splunk over HTTP, or you have an IoT use case, then HEC is for you. We cover multiple deployment scenarios in our docs. I want to focus on a single piece of the following distributed deployment for high availability, throughput, and scale: the load balancer.

You can use any load balancer in front of the HEC but this article focuses on using Nginx to distribute the load. I’m also going to focus on using HTTPS as I’m assuming you care about security of your data in-flight.

You’re going to need to build or install a version of Nginx that enables HTTPS support for an HTTP server.

./configure --with-http_ssl_module

If you install from source and don’t change the prefix then you’ll have everything installed in /usr/local/nginx. The rest of the article will assume this is the install path for Nginx.

Once you’ve got Nginx installed you’re going to need to configure a few key items. First is the SSL certificate. If you’re using the default certificate that ships with Splunk then you’ll need to copy $SPLUNK_HOME/etc/auth/server.pem and place it on your load balancer. I’d highly encourage you to generate your own SSL certificate and use it in place of the default certificate. Here are the docs for configuring Splunk to use your own SSL certificate.
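If you just want to test with your own self-signed certificate rather than the Splunk default, a minimal sketch using openssl (file names are arbitrary choices for this example) is:

# Generate a private key and a self-signed certificate valid for one year
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout nginx-key.pem -out nginx-cert.pem
# Concatenate cert and key into a single file, mirroring the server.pem layout
cat nginx-cert.pem nginx-key.pem > server.pem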

The following configuration assumes you’ve copied server.pem to /usr/local/nginx/conf.

    server {
        # Enable SSL for default HEC port 8088
        listen 8088 ssl;

        # Configure Default Splunk Certificate. 
        # Private key is included in server.pem so use it in both settings.
	ssl_certificate     server.pem;
    	ssl_certificate_key server.pem;		

	location / {
            # HEC supports HTTP Keepalive so let's use it
	    # Default is HTTP/1, keepalive is only enabled in HTTP/1.1
  	    proxy_http_version 1.1;

  	    # Remove the Connection header if the client sends it,
  	    # it could be "close" to close a keepalive connection
  	    proxy_set_header Connection "";

            # Proxy requests to HEC
            proxy_pass https://hec/services/collector;
	}
    }

Next we’ll configure the upstream servers. This is the group of servers that are running the HTTP Event Collector and auto load balancing data to your indexers. Please note that you must use a heavy forwarder as the HEC does not run on a Universal Forwarder.

    
    upstream hec {
        # Pool of heavy forwarders running the HTTP Event Collector.
        # Keep idle upstream connections open so keepalive works end to end.
        keepalive 32;

        server splunk1:8088;
        server splunk2:8088;
    }

Now let’s put it all together in a working nginx.conf

# Tune this depending on your resources
# See the Nginx docs
worker_processes  auto;

events {
    # Tune this depending on your resources
    # See the Nginx docs
    worker_connections  1024;
}


http {
    upstream hec {
        # Pool of heavy forwarders running the HTTP Event Collector.
        # Keep idle upstream connections open so keepalive works end to end.
        keepalive 32;

        server splunk1:8088;
        server splunk2:8088;
    }

    server {
        # Enable SSL for default HEC port 8088
        listen 8088 ssl;

        # Configure Default Splunk Certificate. 
        # Private key is included in server.pem so use it in both settings.
	ssl_certificate     server.pem;
    	ssl_certificate_key server.pem;		

	location / {
            # HEC supports HTTP Keepalive so let's use it
	    # Default is HTTP/1, keepalive is only enabled in HTTP/1.1
  	    proxy_http_version 1.1;

  	    # Remove the Connection header if the client sends it,
  	    # it could be "close" to close a keepalive connection
  	    proxy_set_header Connection "";

            # Proxy requests to HEC
            proxy_pass https://hec/services/collector;
	}
    }
}

When you start Nginx you will be prompted to enter the PEM passphrase for the SSL certificate. The password for the default Splunk SSL certificate is password.
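Once Nginx is running, a quick smoke test is to send an event through the load balancer with curl. Note that with the location / block above, the root path of the load balancer is what maps to /services/collector on the upstream HEC nodes; the host name and token below are placeholders:

curl -k https://your-nginx-host:8088/ \
    -H "Authorization: Splunk <your-HEC-token>" \
    -d '{"event": "hello from behind nginx"}'

A JSON response containing "Success" indicates the request made it through the proxy to one of the HEC nodes.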

There are a bunch of settings you may want to tweak including HTTPS Server Optimization, load balancing method, session persistence, weighted load balancing and health checks.

I’ll leave those settings for you to research and implement as I’m not an expert on them all and everyone’s deployment will differ in complexity and underlying resources.

Hopefully this gives you the foundation for a reliable load balancer for your distributed HTTP Event Collector deployment.

Configuring Nginx With Splunk, REST API & SDK Compatibility


Last year I posted an article on how to configure HAProxy with Splunk, REST API & SDK compatibility. Yesterday, I posted an article on how to configure Nginx as a load balancer in front of a tier of HTTP Event Collectors. Today, I want to iterate on the work I did yesterday and show a basic config for Nginx that’s compatible with Splunk, the REST API and SDK’s.

You’re going to need to build or install a version of Nginx that enables HTTPS support for an HTTP server.

./configure --with-http_ssl_module

If you install from source and don’t change the prefix then you’ll have everything installed in /usr/local/nginx. The rest of the article will assume this is the install path for Nginx.

Once you’ve got Nginx installed you’re going to need to configure a few key items. First is the SSL certificate. If you’re using the default certificate that ships with Splunk then you’ll need to copy $SPLUNK_HOME/etc/auth/server.pem and place it on your load balancer. I’d highly encourage you to generate your own SSL certificate and use it in place of the default certificate. Here are the docs for configuring Splunk to use your own SSL certificate.

The following configuration assumes you’ve copied server.pem to /usr/local/nginx/conf.

    server {
        listen 8089 ssl;
        listen 8000;

        ssl_certificate     server.pem;
        ssl_certificate_key server.pem;

        location / {
            proxy_pass http://splunkweb;
        }

        location /services {
            proxy_pass https://splunkrest;

        }
    }

Next we’ll configure the upstream servers. If you’re using the open source version of Nginx you’ll need to use the IP Hash method for session persistence. If you’re using the commercial version Nginx Plus, you have more options for session persistence methods. Add as many servers as you have to each of the upstream blocks. I used two to illustrate that you can add N servers.

    upstream splunkweb {
        ip_hash;
        server splunk-server-1:8000;
        server splunk-server-2:8000;
    }

    upstream splunkrest {
        ip_hash;
        server splunk-server-1:8089;
        server splunk-server-2:8089;
    }

Now let’s put it all together in a working nginx.conf

worker_processes  auto;

events {
    worker_connections  1024;
}


http {
    upstream splunkweb {
        ip_hash;
        server splunk-server-1:8000;
        server splunk-server-2:8000;
    }

    upstream splunkrest {
        ip_hash;
        server splunk-server-1:8089;
        server splunk-server-2:8089;
    }

    server {
        listen 8089 ssl;
        listen 8000;

        ssl_certificate     server.pem;
        ssl_certificate_key server.pem;

        location / {
            proxy_pass http://splunkweb;
        }

        location /services {
            proxy_pass https://splunkrest;

        }
    }
}

When you start Nginx you will be prompted to enter the PEM passphrase for the SSL certificate. The password for the default Splunk SSL certificate is password.
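After starting Nginx, a quick sanity check is to hit a REST endpoint through the proxy; with the config above, anything under /services is forwarded to the splunkrest upstream (the host and credentials below are placeholders):

curl -k -u admin:yourpassword https://your-nginx-host:8089/services/server/info

You should get back the server info XML from one of the Splunk servers, and browsing to port 8000 on the load balancer should bring up Splunk Web.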

There are a bunch of settings you may want to tweak including HTTPS Server Optimization, load balancing method, weighted load balancing and health checks.

I’ll leave those settings for you to research and implement as I’m not an expert on them all and everyone’s deployment will differ in complexity and underlying resources.

Hopefully this gives you the foundation for a reliable load balancer to use with Splunk, the REST API and SDK’s.

Eureka! Extracting key-value pairs from JSON fields


With the rise of HEC (and with our new Splunk logging driver), we’re seeing more and more of you, our beloved Splunk customers, pushing JSON over the wire to your Splunk instances. One common question we’re hearing is: how can key-value pairs be extracted from fields within the JSON? For example, imagine you send an event like this:

{"event":{"name":"test", "payload":"foo=bar\r\nbar=\"bar bar\"\tboo.baz=boo.baz.baz"}}

This event has two fields, name and payload. Looking at the payload field, however, you can see that it contains additional fields embedded as key-value pairs. Splunk will automatically extract name and payload, but it will not look further into payload to extract the fields within. That is, not unless we tell it to.

Field Extractions to the rescue

Splunk allows you to specify additional field extractions at index or search time which can extract fields from the raw payload of an event (_raw). Thanks to its powerful support for regexes, we can use some regex FU (kudos to Dritan Btincka for the help here on an ultra compact regex!) to extract KVPs from the “payload” specified above.

Setup

To specify the extractions, we will define a new sourcetype httpevent_kvp in %SPLUNK_HOME%/etc/system/local/props.conf by adding the entries below. This regex uses negated character classes to specify the key and values to match on. If you are not a regex guru, that last statement might have made you pop a blood vessel :-)

[httpevent_kvp]
KV_MODE=json
EXTRACT-KVPS = (?:\\[rnt]|:")(?<_KEY_1>[^="\\]+)=(?:\\")?(?<_VAL_1>[^="\\]+)

Next, configure your HEC token to use the sourcetype httpevent_kvp; alternatively, you can set the sourcetype in your JSON when you send your event.
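For reference, here is a sketch of setting the sourcetype per event in the HEC payload itself rather than on the token (the token value is a placeholder):

curl -k https://localhost:8088/services/collector \
    -H 'Authorization: Splunk <your-HEC-token>' \
    -d '{"sourcetype": "httpevent_kvp", "event": {"name": "test", "payload": "foo=bar"}}'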

Restart your Splunk instance, and you’re ready to test.

Testing it out

We’ll use curl to test if the new sourcetype is working.

curl -k https://localhost:8088/services/collector -H 'Authorization: Splunk 
16229CD8-BB6B-449E-BA84-86F9232AC3BC' -d '{"event":{"name":"test",
"payload":"foo=bar\r\nbar=\"bar bar\"\tboo.baz=boo.baz.baz"}}'

Heading to Splunk, we can see that the foo, bar and boo.baz fields were properly extracted as interesting fields.

Kvp interesting fields

Now heading to “All Fields” we can select each of the new fields.

Kvp select fields

And then see the values magically show up!

Kvp fields

Considerations:

There are a few things to consider when using this approach.

  • It does have a cost and is not free: the extractions will run for every search against the sourcetype that has them defined. You’ll want to measure the performance impact to ensure the degradation is acceptable for your query patterns. You may be able to refine these regexes to limit the amount of matching, which will help. Alternatively, index-time extractions will minimize search-time impact, but will slow down data ingest and increase the storage / license hit.
  • The regexes included here may or may not work for your payloads and may have to be tweaked. You’ll need to test to ensure that fields are being extracted properly. For example, with the current regex, if a key is sent like ” foo” with a leading space after the quote, Splunk will extract the field name with the leading space.
  • The approach is brittle as it depends on clients sending data in a format that is compatible with the regexes. You may have to tweak the regexes over time if the format changes or new types of data appear. Unless you have a good idea of the kind of data that is being sent, it may not work for you.

In short, make sure you test.

Summary

Using this approach provides a way to allow you to extract KVPs residing within the values of your JSON fields. This is useful when using our Docker Log driver, and for general cases where you are sending JSON to Splunk.

In the future, hopefully we will support extracting from field values out of the box, in the meanwhile this may work for you.

Note: Special thanks to Martin Müller who provided tweaks to the regexes to improve performance and for his suggestions in the considerations section.

 

Splunking a Microsoft Word document for metadata and content analysis


The Big Data ecosystem is nowadays often summarized with ‘V’s: the 3 Vs of Big Data, the 4 Vs, even the 5 Vs! However many Vs are used, two are always dedicated to Volume and Variety.

Recent news provides particularly rich examples, one of them being the Panama Papers. As explained by Wikipedia:

The Panama Papers are a leaked set of 11.5 million confidential documents that provide detailed information about more than 214,000 offshore companies listed by the Panamanian corporate service provider Mossack Fonseca. The documents […] totaled 2.6 terabytes of data.

This leak illustrates the following pretty well:

  • The need to process huge volumes of data (2.6 TB in that particular case)
  • The need to process different kinds of data (emails, database dumps, PDF documents, Word documents, etc.)

So, let’s see what we could do to Splunk a Word document!

 

A Word document is a Zip file!

As illustrated by the results of the Linux file command, a Word document is a Zip archive.


# file document.docx
document.docx: Zip archive data, at least v2.0 to extract
#

Since Splunk is able to uncompress Zip files to read the logs they contain, let’s see what happens if we try to Splunk a Word document as-is.

MS Word - 001

Pretty ugly. Unfortunately, Splunk 6.4 only provides unintelligible results, as illustrated by the above screenshot, because it cannot index a Word document without preprocessing.

 

Word document format

The XML representation of Word documents was introduced by Microsoft with Word 2003, and it has since evolved into a multi-file representation (aggregated under the now familiar .docx extension). Because no functionality was lost in the move from a binary to an XML representation, the produced XML files can be intimidating: they contain a lot of information related not to the actual content of the file, but to its presentation.

A Microsoft Word 2007 file is a compressed ZIP file, called a package, which contains three major components:

  • Part items, the actual files
  • Content type items, the description of each part item (ex: file XYZ is an image/png)
  • Relationship items, which describe how everything fits together.

Readers expecting a complete and precise description of the format of a Word 2007 document are invited to go through the Walkthrough of Word 2007 XML Format from Microsoft.

 

Uncompress & Index

After using the regular unzip command to extract the files from the docx package into a directory named “document”, the listing of the files is as follows:


# find document/ -type f | sort
document/[Content_Types].xml
document/customXml/item1.xml
document/customXml/itemProps1.xml
document/customXml/_rels/item1.xml.rels
document/docProps/app.xml
document/docProps/core.xml
document/docProps/thumbnail.jpeg
document/_rels/.rels
document/word/document.xml
document/word/fontTable.xml
document/word/media/image1.emf
document/word/media/image2.emf
document/word/media/image3.emf
document/word/media/image4.png
document/word/media/image5.png
document/word/media/image6.png
document/word/media/image7.png
document/word/numbering.xml
document/word/_rels/document.xml.rels
document/word/settings.xml
document/word/stylesWithEffects.xml
document/word/styles.xml
document/word/theme/theme1.xml
document/word/webSettings.xml
#

As we can see, many files are XML, i.e. flat ASCII files that Splunk can ingest. To ingest that directory, a custom sourcetype was created with the TRUNCATE property disabled in props.conf:

TRUNCATE = 0

Setting TRUNCATE = 0 is required to make sure Splunk indexes each file completely (except the binary ones like images; see the NO_BINARY_CHECK option for those).
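For reference, a minimal sketch of such a stanza; the sourcetype name docx_xml and the NO_BINARY_CHECK value are choices made for this example, not prescribed by the post:

[docx_xml]
# Never truncate events; some of the XML parts are a single very long line
TRUNCATE = 0
# Optionally index files that look binary instead of skipping them
NO_BINARY_CHECK = true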

After ingesting the whole directory, here is how one event looks in Splunk:

MS Word - 002

The resulting events are more user friendly, but not really operationally exploitable yet.

 

Content Types

At the root of our document directory, the file [Content_Types].xml contains the content type specifications. As this is a flat XML file, we can parse it with the Splunk spath command to visualize what kind of content our Word document holds, as illustrated by the following screenshot. In that example, we have two kinds of data: XML files and images.

 

The MSDN walkthrough details the construction of that file:

  • A typical content type begins with the word application and is followed by the vendor name.
  • The word vendor is abbreviated to vnd.
  • All content types that are specific to Word begin with application/vnd.ms-word.
  • If a content type is a XML file, then the URI ends with +xml. Other non-XML content types, such as images, do not have this addition.
  • etc…

 

So, using regular Splunk-fu, we can parse our content type file to have access to more useable fields:

MS Word - 003

The search is detailed hereafter:

source="*[Content_Types].xml"
| spath input=_raw
| rename Types.Override{@ContentType} AS ContentType Types.Override{@PartName} AS PartName
| fields PartName ContentType
| eval data = mvzip(ContentType, PartName)
| mvexpand data
| eval tmp = split(data, “,")
| eval ContentType = mvindex(tmp, 0)
| eval PartName = mvindex(tmp, 1)
| eval tmp=split(ContentType, “/“)
| eval family_type=mvindex(tmp,0)
| eval part2=substr(ContentType,len(family_type)+2)
| rex field=part2 “vnd\.(?<vendor>[^.$]+)"
| eval part3=substr(part2, len(vendor)+6)
| eval isXML = if(match(part3, "\+xml$"),"Yes", "No")
| eval filetype = if(match(part3, "\+xml$"),substr(part3, 0, len(part3)-4), part3)
| table PartName family_type vendor isXML filetype ContentType
| sort PartName

 

Document Properties (Word metadata)

Two very interesting files exist within a Word 2007 package: core.xml and app.xml in the docProps directory. A simple parse with the Splunk spath command can give us insight into the author of the document, the creation time, the modified time, the number of pages composing the document, the system on which the document was created, the number of characters, etc.

core.xml

MS Word - 004

app.xml

MS Word - 005

 

Revision IDs (RSID)

To dive deeper into the actual content of such a file, one key mechanism to understand about Word documents is revision identifiers (rsids). It’s very well explained here:

Every time a document is opened and edited a unique ID is generated, and any edits that are made get labeled with that ID. This doesn’t track who made the edits, or what date they were made, but it does allow you to see what was done in a unique session. The list of RSIDS is stored at the top of the document, and then every piece of text is labeled with the RSID from the session that text was entered.

 

Practically speaking, this leads to something like this:

MS Word - 006

Note that the sentence in the analyzed Word document was “When a notable event is raised, a security analyst needs […] or identities. This manual task […]”.

Clearly, a lot of noise surrounds the real content of the document (this “noise” exists by design, but that level of detail isn’t appropriate in our case because we just want access to the words composing the document).

 

Accessing the content of the Word document

As the content is actually XML, it can be parsed the same way as the previous files with the Splunk spath command.

MS Word - 007

 

The problem with this method is that, first, some words or sentences are cut in the middle, and second, we need to know the exact path in the XML tree (here, <w:p><w:r><w:t> under the root <w:document><w:body>).

However, we know for sure that the actual content of the file will be within the <w:body> boundaries. The idea then becomes to extract the content within those boundaries and remove the XML tags.

MS Word - 008

 

The Splunk search is presented hereafter. The result is one field containing the whole content of the file as illustrated above.

source="*/document.xml"
| rex field=_raw "\<w:body\>(?<wbody>.+)\</w:body\>"
| fields wbody
| rex field=wbody mode=sed "s/\<[^>]+\>/ /g"
| table wbody

That’s more practical, but what about searching for a term within the document, which is basically contained in one single field?

One trick is to split the field into multiple values based on punctuation. The output is similar to the first approach with the spath command, the big difference being that words are not cut in the middle!

MS Word - 009

source="*/document.xml"
| rex field=_raw "\<w:body\>(?<wbody>.+)\</w:body\>"
| fields wbody
| rex field=wbody mode=sed "s/\<[^>]+\>/ /g"
| rex field=wbody mode=sed "s/[[:punct:]]/#/g"
| eval wb = split(wbody, "#")
| mvexpand wb
| table wb

From there, we can easily search for simple terms by appending the following to the above search:

| search wb = "*notable*"

In this example, the word “notable” will be searched across the entire document.
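Building on the same split, here is a small sketch of turning the document into a term-frequency table (purely illustrative):

source="*/document.xml"
| rex field=_raw "\<w:body\>(?<wbody>.+)\</w:body\>"
| fields wbody
| rex field=wbody mode=sed "s/\<[^>]+\>/ /g"
| rex field=wbody mode=sed "s/[[:punct:]]/#/g"
| eval wb = split(wbody, "#")
| mvexpand wb
| eval wb = lower(trim(wb))
| where wb != ""
| stats count by wb
| sort - count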

 

 

Conclusion

This article only scratches the surface of the Microsoft Word 2007 format (now a worldwide standard under the references ECMA-376 and ISO/IEC 29500) and does not cover core components such as relationship items. While it is technically possible to Splunk Word documents, it is not an easy task and is operationally limited, as illustrated above.

Now, one question remains, what are your use cases around such feature? (-:

 

Docker? Amazon ECS? Splunk? How they now all seamlessly work together


Today the Amazon EC2 Container Service (ECS) team announced they have added the Splunk native logging driver to the newest version of the ECS agent. This means it’s now easier to implement a comprehensive monitoring solution for running your containers at scale. At Splunk, we’re incredibly excited about this integration because customers running containers in ECS can now receive all the benefits of the logging driver, like better data classification & searching, support for flexible RBAC, and easy and scalable data collection built on top of the Splunk HTTP Event Collector (HEC).

The following is a guest blog post by David Potes, AWS Solutions Architect:

Monitoring containers has been somewhat of a challenge in the past, but the ECS team has been hard at work making it easy to integrate your container logs and metrics into key monitoring ecosystems. Recently, they have added native logging to Splunk in the latest version of the ECS agent. In this article, we’ll look at how to get this up and running and present a few examples of how to get greater insight into your Docker containers on ECS.

If you don’t already have Splunk, that’s OK! You can download a 60-day trial of Splunk, or sign up for a Splunk Cloud trial.

How It Works

Using EC2 Container Service (ECS)? The Splunk logging driver is now a supported option. You can set the Splunk logging driver per container in your Task Definition under the “Log configuration” section. All log messages will be sent to Splunk over a more secure channel, with additional access control and additional data classification options for logs collected from your Docker ecosystem.

Not Using ECS? No problem!

You can configure Splunk logging as the default logging driver by passing the correct options to the Docker daemon, or you can set it at runtime for a specific container.

The receiver will be the HTTP Event Collector (HEC), a highly scalable and secure engine built into Splunk 6.3.0 or later. Our traffic will be secured by both a security token and SSL encryption. One of the great things about HEC is that it’s simple to use with either Splunk Enterprise or Splunk Cloud. There’s no need to deploy a forwarder to gather data, since the logging driver handles all of this for you.

Setting Up the HTTP Event Collector

The first step is to set up the HEC and create a security token. In Splunk, select Settings > Data Inputs, and click on the “HTTP Event Collector” link where the configurations can be applied.  For the full instructions please refer to our online docs.

Configuring your Docker Containers

First, make sure your ECS agent is up to date. Run the following to check your agent version:

curl -s 127.0.0.1:51678/v1/metadata | python -mjson.tool

Refer to the AWS documentation for other options on how to check your ECS Container Agent version.

From an Amazon Linux image, getting the latest ECS agent version is simple. To update your ECS Container Agent, you can follow the instructions available here.

Configuring Splunk logging driver in EC2 Container Services (ECS)

You can set up your “Log configuration” options in the AWS Console for your EC2 Container Service. Under your Task Definition, specify a new “Log configuration” in your existing “Container Definition” under the “STORAGE AND LOGGING” section.

  1. Set the “Log driver” option to splunk
  2. Specify the following mandatory log options, for more details please reference the documentation
    1. splunk-url
    2. splunk-token
    3. splunk-insecureskipverify – set to “true” – required if you don’t specify the certificate options (splunk-capath, splunk-caname)
  3. Specify any other optional parameters (e.g., tag, labels, splunk-source, etc.)
  4. Click on the “Update” button to update your configurations

Figure 2

Figure 2: Sample configuration of Log configuration

Here’s a sample JSON Log configuration for a Task Definition:

"logConfiguration": {
    "logDriver": "splunk",
    "options": {
        "splunk-url": "https://splunkhost:8088",
        "splunk-token": "<your token>",
        "tag": "{{.Name}}",
        "splunk-insecureskipverify": "true"
    }
}

Configuring Splunk logging driver by overriding the docker daemon logging option

Now we will set our logging options in the Docker daemon. We can set Splunk logging on a per-container basis or define it as a default in the Docker daemon settings. We will specify some additional details at runtime to be passed along with our JSON payloads to help identify the source data.

docker run --log-driver=splunk \
    --log-opt splunk-token=<your token> \
    --log-opt splunk-url=https://splunkhost:8088 \
    --log-opt splunk-capath=/path/to/cert/cacert.pem \
    --log-opt splunk-caname=SplunkServerDefaultCert \
    --log-opt tag="{{.Name}}/{{.ID}}" \
    --log-opt source=mytestsystem \
    --log-opt index=test \
    --log-opt sourcetype=apache \
    --log-opt labels=location \
    --log-opt env=TEST \
    --env "TEST=false" \
    --label location=us-west-2 \
    your/application

 

The splunk-token is the security token we generated when setting up the HEC.

The splunk-url is target address of your Splunk Cloud or Splunk Enterprise system.

The next two lines define the name of and the local path to the Splunk CA cert. If you would rather not deploy the CA to your systems, you can set splunk-insecureskipverify to true instead (required if you don’t specify the certificate options splunk-capath and splunk-caname), though doing so reduces the security of your configuration.

The tag will add the name of the container and the full ID of the container. Using the ID option instead would add only the first 12 characters of the container ID.

We can send labels and env values, if these are specified in the container. If there is a collision between a label and env value, the env value will take precedence.

Optionally, but recommended, you can set the sourcetype, source and the target index for your Splunk implementation.

Now that we have started the container with Splunk logging options, we should see events populate in our Splunk searches shortly after the container is running. Using the default sourcetype, and if you set the options as in the example above, you can use the following search to see your data: sourcetype=httpevent
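If you also set the optional index, sourcetype, and source options as in the docker run example above, you can narrow that search accordingly:

index=test sourcetype=apache source=mytestsystem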

Here’s a sample of a container log message logged by the splunk logging driver:

Figure 3

Figure 4

And there you have it. Container monitoring can bring additional complexity to your infrastructure, but it doesn’t have to bring complexity to your job. It’s that easy to configure Splunk to monitor your Docker containers on ECS and in your AWS infrastructure.

Thanks,
David Potes
AWS Solutions Architect

Sending binary data to Splunk and preprocessing it


A while ago I released an app on Splunkbase called Protocol Data Inputs (PDI) that allows you to send text or binary data to Splunk via many different protocols and dynamically apply preprocessors to act on this data prior to indexing in Splunk. You can read more about it here.

I thought I’d share an interesting use case that I was fiddling around with today: what if I wanted to send compressed data (which is a binary payload) to Splunk and index it? Well, this is trivial to accomplish with PDI.

Choose your protocol and binary data payload

PDI supports many different protocols, but for the purposes of this example I just rolled a die and chose HTTP POST. I could have chosen raw TCP, SockJS, or WebSockets; the steps in this blog for handling the binary data are the same.

Likewise for the binary payload. I chose Gzip-compressed data (I could have chosen another compression algorithm) because more people can relate to it in an example blog than to an industry-proprietary binary protocol like ISO 8583 (financial services) or MATIP (aviation), or binary data encodings such as Avro or Protobuf.

Note: Splunk’s HTTP Event Collector can also accept a Gzip payload.

Setup a PDI stanza to listen for HTTP POST requests.

PDI has many options, but for this simple example you only need to choose the protocol and a port number.

Screen Shot 2016-07-28 at 3.09.30 PM

Declare the custom handler to apply to the received compressed data (a binary payload).

You can see this above in the Custom Data Handler section. I’ve bundled this custom handler with the PDI v1.2 release for convenience. Here is the source if you are interested. Handlers can be written in numerous JVM languages and then applied by simply declaring them in your PDI stanza as above and putting the code in the protocol_ta/bin/datahandlers directory; there are more template examples here.

The GZipHandler will intercept the compressed binary payload and decompress it into text for indexing in Splunk.

Send some test data to Splunk.

I just wrote a simple Python script to HTTP POST a compressed payload to Splunk.
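The actual script is shown in the screenshot below, but a minimal sketch of the idea might look like this (the host, port, and payload are assumptions for illustration):

import gzip
import urllib.request

# Gzip-compress a small text payload (binary once compressed)
payload = gzip.compress(b"event one\nevent two\nevent three\n")

# POST the compressed bytes to the port the PDI HTTP stanza listens on
req = urllib.request.Request(
    "http://localhost:8181/",  # assumed PDI host and port
    data=payload,
    headers={"Content-Type": "application/octet-stream"},
)
print(urllib.request.urlopen(req).read())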

Screen Shot 2016-07-28 at 3.08.23 PM

Search for the data in Splunk.

Screen Shot 2016-07-28 at 3.08.54 PM

Voila !

I hope this simple example can get you thinking about unleashing all that valuable binary data you have and sending it to Splunk.


Send data to Splunk via an authenticated TCP Input


Wow, my second blog in 24 hours about Protocol Data Inputs (PDI), but sometimes you just get infected with ideas and have to roll with them.

So my latest idea is about sending text or binary data to Splunk over raw TCP and authenticating access to that TCP input. This is simple to accomplish with PDI.

Setup a PDI stanza to listen for TCP requests

PDI has many options, but for this simple example you only need to choose the protocol (TCP) and a port number.


Screen Shot 2016-07-30 at 3.31.08 PM

Declare a custom handler to authenticate the received data

You can see this above in the Custom Data Handler section. I have declared the handler and the authentication token that the handler should use via a JSON properties string that gets passed to the handler when everything instantiates. This properties string can be any format you want, because the custom data handler you code contains the logic for processing it.

The approach I used for the authentication is deliberately trivial; it’s just an example:

1) Received data is expected to be in the format: token=yourtoken,body=somedata
2) When data is received, the token is checked. If the token matches, data from the body field is indexed; otherwise the data is dropped and an error is logged.

Here is the source if you are interested.

Handlers can be written in numerous JVM languages and then applied by simply declaring them in your PDI stanza as above and putting the code in the protocol_ta/bin/datahandlers directory, there are more template examples here.

Send some test data to Splunk

I just wrote a simple Python script to send some data to Splunk over raw TCP in the payload format that the authentication handler is expecting.
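Again, the script is only shown as a screenshot, but a rough sketch of sending one authenticated payload over raw TCP could look like this (the host, port, and token values are assumptions):

import socket

# Payload format the example auth handler expects: token=...,body=...
payload = b"token=yourtoken,body=hello from an authenticated TCP client\n"

# Connect to the port the PDI TCP stanza listens on (assumed here)
with socket.create_connection(("localhost", 2222)) as conn:
    conn.sendall(payload)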

Screen Shot 2016-07-30 at 3.30.31 PM

Search for the data in Splunk

Screen Shot 2016-07-31 at 11.42.38 AM

 

If the token authentication fails, the data is dropped and an error is logged in Splunk.

Screen Shot 2016-07-30 at 4.06.39 PM

And that’s it. Pretty simple to roll your own token auth handler and make your TCP inputs that much more secure.

Note: TCP was used for this example, but this exact same handler will work with any of the PDI protocol options; just choose another protocol and you’re off to the races!

Screen Shot 2016-07-30 at 3.53.49 PM

Secure Splunk Web in Five Minutes Using Let’s Encrypt


Configuring SSL for your public-facing Splunk instance is time-consuming, expensive, and essential in today’s digital environment. Whether you choose a cloud provider or self-hosting, RTFM-ing how to generate the keys correctly and configure how Splunk should use them can be quite confusing. Last year a new certificate authority, Let’s Encrypt, was born in an effort to streamline the CA process and make SSL encryption more widely available to users (the service is FREE). In this short tutorial, we will cover how to make use of this new CA to secure your Splunk instance and stop using self-signed certs. Using SSL will help you secure your Splunk instance against MITM attacks, and Let’s Encrypt follows all of the SSL best practices with none of the frustration.

The only requirements for this five-minute tutorial are:

  • Root/Sudo Access to the server running Splunk Web
  • Ownership of a publicly accessible domain name
  • Internet connectivity for the Splunk server

Configure the domain

One important requirement is for the publicly accessible domain to have an A record associated with the host you are creating a cert for. Additionally the @ record must also route to a publicly accessible server.

Example DNS Settings for AnthonyTellez.com:

dns_config

Install Certbot & Generate Certs

Thanks to the EFF, there is an easy way to automate the cert process using Certbot.
You can find the exact instructions for installing it on your flavor of Linux here: https://certbot.eff.org/
From the dropdown, select “none of the above” and the operating system you are using.
For this example, we are using Ubuntu 16.04 (Xenial).

Install Certbot on the Splunk server you wish to secure with SSL using: sudo apt-get install letsencrypt

Once installed, use the following command line options for certbot, substituting your domain & subdomain.

$ letsencrypt certonly --standalone -d anthonytellez.com -d splunk-es.anthonytellez.com

At the prompt, fill out your information for key recovery and agree to the TOS.

certbot_inteface

On successful completion, you should see the following message:

cert_bot_good

Take note of the expiration date; you can renew whenever you need to.
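Let’s Encrypt certificates are valid for 90 days, so plan on renewing regularly. A sketch of a renewal with the same client (if you used the standalone authenticator, stop whatever is bound to the HTTP/HTTPS ports first), plus an example cron entry:

letsencrypt renew

# Example cron entry: attempt renewal twice a day, quietly
0 3,15 * * * letsencrypt renew --quiet

After a renewal, remember to re-copy the new fullchain.pem and privkey.pem into the directory you configure for Splunk below and restart Splunk.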

Configure Splunkweb

Take a quick peek in /etc/letsencrypt/live/


root@splunk-es:~# cd /etc/letsencrypt/live/anthonytellez.com/
root@splunk-es:/etc/letsencrypt/live/anthonytellez.com# ls
cert.pem chain.pem fullchain.pem privkey.pem

You will see four .pem files; you only need to copy the two needed for Splunk Web SSL (fullchain.pem & privkey.pem). The quickest way to get Splunk configured (and to remember where things are) is to create a directory in /opt/splunk/etc/auth/. In my case, I created a directory named after the domain to keep things simple and memorable.


mkdir /opt/splunk/etc/auth/anthonytellez
cp fullchain.pem privkey.pem /opt/splunk/etc/auth/anthonytellez/
chown -R splunk:splunk /opt/splunk/

Configure Splunk web to make use of the certs in $SPLUNK_HOME/etc/system/local/web.conf:


[settings]
enableSplunkWebSSL = 1
privKeyPath = etc/auth/anthonytellez/privkey.pem
caCertPath = /opt/splunk/etc/auth/anthonytellez/fullchain.pem

Restart Splunk using: ./splunk restart and direct your browser to the https version of Splunk web.

In our example the URL would be: https://splunk-es.anthonytellez.com:8000

splunk_ssl

If you need additional examples, take a peek at docs.splunk.com: Configure Splunk Web to use the key and certificate files.

Handling HTTP Event Collector (HEC) Content-Length too large errors without pulling your hair out


Once you start using HEC, you want to send it more and more data, and as you do, your payloads are going to increase in size, especially if you start batching. Unfortunately, as soon as you exceed a request payload size of close to 1 MB (for example if you use our Akamai app or send events from AWS Lambda), you’ll get error status 413 with a not-so-friendly message:

“Content-Length of XXXXX too large (maximum is 1000000) “

At this point you might feel tempted to pull your hair out, but you have options. You are hitting this error because HEC has a pre-defined limit on the maximum content length of a request, and that limit is configurable via limits.conf.

If you look in $SPLUNK_HOME/etc/system/default/limits.conf you'll see the following:

# The max request content length.
max_content_length = 1000000

All you need to do is raise that limit in $SPLUNK_HOME/etc/system/local/limits.conf and restart your Splunk instance, and you'll be good to go. If you are hosted in Splunk Cloud, our support folks will be more than happy to take care of it for you.
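For example, a minimal local override might look like this (a sketch; the setting sits under the [http_input] stanza in limits.conf, and the 5MB value below is just an illustration; size it to your largest expected batch):

# $SPLUNK_HOME/etc/system/local/limits.conf
[http_input]
# raise the maximum HEC request payload from the 1MB default to roughly 5MB
max_content_length = 5000000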

As a side note, we’ll be upping this default in our next release to 800MB, so that you are never bothered by this error again.

Android ANR troubleshooting with MINT


Being involved with shippable software for mobile and desktop, I realize that there is a class of problems that are not easy to troubleshoot.

Crashes are probably the easiest problems to reproduce in QA and Engineering environments, so they are the easiest to fix. One class of problems that often requires more time and possibly a code redesign is application sluggishness. It falls into the gray area of software development that everybody tries to address during the design and implementation stages; it seldom shows up in QA or other controlled environments, but it always happens when an actual user is trying to use the app.

Modern mobile apps are complex creatures. A lot of things are happening as a result of user input or internal processes in the background that are also trying to update the UI. Apps can also issue many backend calls to keep the UI up to date. 

We all like a smooth UI experience with our apps. Android addresses UI issues by implementing an Application Not Responding (ANR) mechanism, which forcefully terminates non-responding apps. The timeout is enforced by the system and the data is available in the LogCat.

In the 5.1 release of the Splunk MINT SDK for Android, we’ve given you a way to monitor and troubleshoot your app’s ANR issues. Just opt-in for ANR monitoring for your app by calling:

Mint.startANRMonitoring(5000/*timeout*/, true/*ignoreDebugger*/);
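For context, here is a minimal sketch of where that call might live, assuming the common pattern of initializing MINT in your Application subclass; the class name and API key are placeholders:

public class MyApp extends android.app.Application {
    // (remember to register this class via android:name in AndroidManifest.xml)
    @Override
    public void onCreate() {
        super.onCreate();
        // Initialize MINT first (placeholder API key), then opt in to ANR monitoring.
        com.splunk.mint.Mint.initAndStartSession(this, "YOUR_MINT_API_KEY");
        // Report an ANR when the main thread is blocked for more than 5 seconds,
        // ignoring pauses caused by an attached debugger.
        com.splunk.mint.Mint.startANRMonitoring(5000, true);
    }
}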

ANR events will then be available in Splunk Enterprise. Run this search to view them:

sourcetype="mint:error" "extraData.ANR"=true

Example:

[Screenshot: an ANR event in Splunk]

Please note that the stacktrace field in the event should be interpreted as a thread dump of your application threads (see the link to the documentation and example below).

Our monitoring feature will help you identify common causes of ANRs, such as application deadlocks and unexpectedly long-running or stalled HTTP requests.

Additional Reading: 

 

Tracing Objective-C Methods


You can write very fast programs in Objective-C, but you can also write very slow ones. Performance isn’t a characteristic of a language but of a language implementation, and more importantly, of the programs written in that language. Performance optimization requires that you measure the time to perform a task, then try algorithm and coding changes to make the task faster.

The most important performance factor is the quality of the libraries used in developing an application: good-quality libraries reduce the performance impact. So to help you improve performance in your apps, we've updated the Splunk MINT SDK for iOS to provide an easy way to trace method performance using macros.

To trace an Objective-C method, add the MINT_METHOD_TRACE_START macro to the beginning of your method and the MINT_METHOD_TRACE_STOP macro to the end of it.

For example:

- (void)anyMethod {
    MINT_METHOD_TRACE_START
    ...
    MINT_METHOD_TRACE_STOP
}

If you are not using ARC, use the MINT_NONARC_METHOD_TRACE_STOP macro to avoid a memory leak issue.

The trace macros automatically pick up performance metrics for your method and send them to Splunk. The trace report contains the following fields:

  • method
  • elapsedTime
  • threadID

To view the event information, run the following search in Splunk:

index=mint sourcetype=mint:methodinvocation

Here is an example event:

{
    apiKey: 6d8c9a39
    appEnvironment: Staging
    appRunningState: Background
    appVersionCode: 1
    appVersionName: 3.1
    batteryLevel: -100
    carrier: NA
    connection: WIFI
    currentView: MainViewController
    device: iPad5,3
    elapsedTime: 1105708
    extraData: {
    }
    locale: GB
    method: -[MainViewController mintMeta]
    msFromStart: 1450
    osVersion: 9.2.1
    packageName: WhiteHouse
    platform: iOS
    remoteIP: 185.75.2.2
    screenOrientation: Portrait
    sdkVersion: 5.1.0
    session_id: E9F4BE3D-0CEB-4461-9442-145101E5EE67
    state: CONNECTED
    threadID: 10759
    transactions: [
    ]
    userIdentifier: XXXXXXXX
    uuid: XXXXXXXX
}
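To get a quick ranking of your slowest methods, you could aggregate on the elapsedTime field along these lines (a sketch; adjust the time range and grouping to suit your app):

index=mint sourcetype=mint:methodinvocation
| stats count avg(elapsedTime) AS avg_elapsed max(elapsedTime) AS max_elapsed BY method
| sort - avg_elapsed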

iOS Memory Warnings


Memory on mobile devices is a shared resource, and apps that manage memory improperly run out of memory and crash. iOS manages the memory footprint of an application by controlling the lifetime of all objects using object ownership, which is part of the compiler and runtime feature called Automatic Reference Counting (ARC). When you start interacting with an object, you're said to own that object, which means that it's guaranteed to exist as long as you're using it. When you're done with the object, you relinquish ownership, and if the object has no other owners, the OS destroys the object and frees up the memory. Not relinquishing ownership of an object causes memory to leak and the app to crash. ARC takes away much of the pain of memory management, but you still need to be careful with retain cycles, global data structures, and lower-level classes that don't support ARC.

A memory warning is a signal sent to your app when the system is running low on memory, often because the app is leaking. If the app terminates because of a memory leak, it won't generate a crash report. Because of that, you might not be able to find and fix the leak in your production app unless you have already implemented the memory warning delegate in your ViewController classes to free up memory.

To help you manage memory, the Splunk MINT SDK for iOS has a memory warning feature that collects the memory footprint and the class that received the memory warning. When an app terminates but doesn’t send a crash report, that means the app received a memory warning and sent the memory footprint to Splunk. So, go check your MINT data in Splunk Enterprise for recent memory warnings, which might help you fix memory issues in your mobile apps.

The Splunk MINT SDK for iOS automatically starts monitoring for memory warnings on initialization; there is no need to do anything extra.

The memory warning information contains the following fields:
• className
• totalMemory
• usedMemory
• wiredMemory
• activeMemory
• inactiveMemory
• freeMemory
• purgableMemory

To view memory information, run a search in Splunk Web for the mint:memorywarning sourcetype, for example:

index=mint sourcetype=mint:memorywarning

Here is an example event:

{
    activeMemory: 9118
    apiKey: 12345
    appEnvironment: Testing
    appRunningState: Foreground
    appVersionCode: 1
    appVersionName: 1.0
    batteryLevel: -100
    carrier: NA
    className: LoginViewController
    connection: WIFI
    currentView: LoginViewController
    device: x86_64
    extraData: {
    }
    freeMemory: 3040
    inactiveMemory: 1511
    locale: US
    message: Received memory warning
    msFromStart: 4334
    osVersion: 9.3
    packageName: SplunkTests
    platform: iOS
    purgableMemory: 210
    remoteIP: 204.107.141.240
    screenOrientation: Portrait
    sdkVersion: 5.0.0
    session_id: 1C048628-A709-44BC-9110-25069C7FC736
    state: CONNECTED
    totalMemory: 16384
    transactions: {
    }
    usedMemory: 454
    userIdentifier: XXXXXXXX
    uuid: XXXXXXXX
    wiredMemory: 2112
}
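To see which views receive the most warnings and how much memory they were using at the time, you could summarize by class, for example (a sketch using the fields listed above):

index=mint sourcetype=mint:memorywarning
| stats count max(usedMemory) AS peak_used_memory BY className
| sort - count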

To monitor memory warnings as they happen, create a real-time alert like this:
1. In Splunk Web, run this search: index=mint sourcetype=mint:memorywarning
2. Select Save As > Alert.
3. For Alert Type, click Real-time.
4. Click Add Actions to select an alert action.
5. Click Save.

#splunkconf16 preview: IT Operations Track – Choose your own adventure!


Does anyone else remember the 'Choose Your Own Adventure' books from the 90s? I do, and this year's #splunkconf16 has me almost as excited as getting a brand spankin' new pile of books. Just kidding, the 2016 user conference is going to be much, much better!

[Images: 'Choose Your Own Adventure' book covers]

(No, this is not an ITSI Glass Table)

 

Splunk .conf2016 is coming up fast, and everyone on the Splunk team is excited to head down to the happiest place on earth for this year’s user conference. Check out some key details below about the great sessions that will be featured in the Splunk IT Operations track this year at .conf 2016. This year, we’ve made it easy for you by parsing the sessions into some easy-to-follow tracks. Session speakers will be covering everything from how to drive critical business decisions to maximizing operational efficiencies to Splunk for DevOps to smarter IT analytics with Splunk IT Service Intelligence. Below we sort through ~200 sessions to find a series to attend based on your interests. So go ahead, choose your own adventure!

ITSI Beginner:

These sessions are for customers who are new to our premium solution offering for IT professionals, IT Service Intelligence. They will give you an overview of how you can leverage IT Service Intelligence in your organization to make better business decisions.

  • Introduction to Splunk IT Service Intelligence with Alok Bhide, Principal Product Manager, Splunk Inc. and David Millis, Staff Architect, IT Operations Analytics, Splunk Inc
    • Tuesday, September 27, 2016 at 10:30am -11:15am AND Wednesday, September 28, 2016 at 1:10pm- 1:55pm
  • Earn a Seat at the Business Table with Splunk IT Service Intelligence with Erickson Delgado, Architect, Development Operations, Carnival Corporation and Juan Echeverry, Application Automation Engineer, Carnival Corporation, and Marc Franco, Manager, Web Operations, Carnival Corporation
    • Tuesday, September 27, 2016 at 11:35am-12:20pm
  • How Anaplan Used Splunk Cloud and ITSI to Monitor Our Cloud Platform with Martin Hempstock, Monitoring and Metrics Architect, Anaplan
    • Tuesday, September 27, 2016 at 3:15pm-4:00pm
  • Modernizing Enterprise Monitoring at the World Bank Group Using Splunk It Service Intelligence with Michael Makar, Sr Manager, Enterprise Monitoring, World Bank Group
    • Tuesday, September 27, 2016 at 5:25pm-6:10pm
  • Splunk IT Service Intelligence: Keep Your Boss and Their Bosses Informed and Happy (and Still Have Time to Sleep at Night)! With Jonathan LeBaugh, ITOA Architect, Splunk
    • Thursday, September 29, 2016 at 2:35pm-3:20pm

ITSI Advanced:

These sessions are for customers who are already familiar with our premium solution offering for IT professionals, IT Service Intelligence. They go into greater detail on the why, what, and how of maximizing the productivity of your current or future IT Service Intelligence deployment.

  • Machine learning and Anomaly Detection in Splunk IT Service Intelligence with Alex Cruise, Senior Dev. Manager/Architect, Splunk and Fred Zhang, Senior Data Scientist, Splunk
    • Tuesday September 27, 2016 at 4:20pm- 5:05pm
  • An Ongoing Mission of Service Discovery with Michael Donnelly, ITOA Solutions Architect, Splunk and Ross Lazerowitz, Product Manager, Splunk
    • Thursday, September 29, 2016 at 11:20am-12:05pm
  • Anatomy of a Successful Splunk IT Service Intelligence Deployment with Martin Wiser, ITOA Practitioner, Splunk
    • Tuesday, September 27, 2016 at 12:40pm-1:25pm

IT Troubleshooting (and monitoring!):

These sessions are for customers looking to learn more about Splunk for application management, using Splunk to reduce costs and drive operational efficiencies, and how to get started with Splunk.

  • Splunk gone wild! Innovating a large Splunk solution at the speed of management with Kevin Dalian, Team Lead- Tools and Automation, Ford Motor Company and Glen Upreti, Professional Services Consultant, Sierra-Cedar
    • Thursday, September 29, 2016 at 11:20am-12:05pm
  • How MD Anderson Cancer Center Uses Splunk to Deliver World Class Healthcare When Patients Need it the Most with Ed Gonzalez, Manager- Web Operations, MD Anderson Cancer Center, and Jeffrey Tacy, Senior Systems Analyst, MD Anderson Cancer Center
    • Thursday, September 29, 2016 at 10:15am-11:00am
  • Splunking your Mobile Apps with Bill Emmett, Director, Solutions Marketing, Splunk, and Panagiotis Papadopoulos, Product Management Director, Splunk
    • Thursday, September 29, 2016 at 12:25pm-1:10pm
  • Great, We Have Splunk at Yahoo!… Now What? With Dileep Eduri, Production Engineering, Yahoo and Indumathy Rajagopalan, Service Engineer, Yahoo and Francois Richard, Senior Engineering Director, Yahoo, and Tripati Kumar Subudhi, Senior DevOps, Yahoo
    • Tuesday, September 27, 2016 at 11:35am-12:20pm
  • The Truthiness of Wire Data: Using Splunk App for Stream for Performance Monitoring with David Cavuto, Product Manager, Splunk
    • Thursday, September 29, 2016 at 12:25pm-1:10pm

DevOps and Emerging Trends:

Check out these sessions to learn more about how you can leverage Splunk within your organization to move to continuous delivery and implement a DevOps culture shift.

  • Biz-PMO-Dev-QA-Sec-Build-Stage-Ops-Biz: Shared Metrics as a Forcing Function for End-to-End Enterprise Collaboration with Andi Mann, Chief Technology Advocate, Splunk Inc
    • Wednesday, September 28, 2016 at 4:35pm-5:20pm
  • Splunks of War: Creating a better game development process through data analytics with Phil Cousins, Principal Software Engineer, The Coalition, Microsoft
    • Tuesday, September 27, 2016 at 3:15pm-4:00pm
  • Puppet and Splunk: Better Together with CTO and Chief Architect, Puppet and Stela Udovicic, Senior Product Marketing Manager, Splunk
    • Tuesday September 27, 2016 at 4:20pm-5:05pm
  • Splunking the User Experience: Going Beyond Application Logs with Doug Erkkila, PAS Capacity Management Analyst, CSAA Insurance Group
    • Thursday, September 29, 2016 at 1:30pm-2:15pm
  • Data That Matters, A DevOps Expert Panel featuring Phil Cousins, Microsoft and Doug Erkkila, CSAA Insurance Group, and Deepak Giridharagopal, Puppet and Andi Mann, Splunk, and Sumit Nagal, Intuit, and Hal Rottenberg, Splunk
    • Wednesday, September 28, 2016 at 1:10pm-1:55pm


Buttercup and pals in the Seattle office are pumped for .conf

On top of these awesome sessions we have lined up, we'll have 3 days of Splunk University training, 70 technology partners presenting, over 4,000 Splunk enthusiasts, and the Splunk search party. It's not too late to register for .conf2016 and head down to Disney World!

Follow all the conversations coming out of #splunkconf16!


Splunk at ThingMonk 2016


Hi everyone,

I'm Duncan Turnbull and I am the technical lead for the Analytics and IoT practice team here at Splunk in Europe. This means I get to spend my time listening, explaining, showing and talking to organizations across EMEA about how to use their machine data to solve business problems and find the value in it using Splunk's software.

I'm delighted to be at Redmonk's ThingMonk event this year for the Hack Day on day 0. I'll be there to see what we can build on the day, build some cool things myself, and showcase how to use all the data from these sensors. Last year we had Matt Davies and James Hodge from Splunk present, and you can see what they got up to.

To help you get started with Splunk and using the data from all these connected things, feel free to sign up to the free trial of Splunk Cloud or download the free version of Splunk that will run on any respectable laptop.

If you want to bring in data via MQTT or HTTP, we can help! Maybe you want to send alerts via Twilio when something happens, extending Splunk's platform. At another hackathon with Deutsche Bahn, Philipp Drieger, Robert Fujara and the team from Splunk were winners, analyzing data on rail and track performance. Target do robotics analytics to make their supply chain more efficient. Gatwick Airport use Splunk across the airport to capture IoT sensor data from all kinds of sources. What will you do with your data? Hopefully we can find out at ThingMonk next week. I've got a workshop on Monday where we'll go through how to start using IoT and machine data in Splunk.

Looking forward to seeing you there.

Duncan Turnbull
EMEA Business Analytics & IoT Technical Lead
Splunk

Follow all the conversations coming out of #thingmonk.

Talk to Splunk with Amazon Alexa


What do you think the future experience of interacting with your data is going to be like? Is it going to be logging in by way of a user interface and then using your mouse, keyboard, or gestures to view and interact with something on a display panel, or is it going to be more like simply talking with another person?

Introducing the “Talk to Splunk with Amazon Alexa” App

This is a Splunk app that enables your Splunk instance to interface with Amazon Alexa by way of a custom Alexa skill, thereby provisioning a natural language interface for Splunk.

You can then use an Alexa device such as Amazon's Echo, Tap, or Dot, or another 3rd-party hardware device, to tell or ask Splunk anything you want.

  • Get answers to questions based on Splunk searches
  • Ask for information, such as search command descriptions
  • Return static responses and audio file snippets
  • Developer extension hooks to plug in ANY custom voice-driven requests and actions you want

The App also allows you to train your Splunk instance to the conversational vocabulary for your specific use case.

Vision

The ultimate vision I foresee here is a future where you can completely do away with your keyboard, mouse, monitor, and login prompt.

Even right now there are use cases where having to look at a monitor or operate an input device is simply counterproductive, infeasible, or unsafe. Industrial operating environments immediately come to mind.

You should be able to be transparently and dynamically authenticated based on your voice signature and then simply converse with your data the way you would talk to another person, asking questions or requesting that some action be performed.

This app is a step in the direction of this vision.

 

Go forth and talk!

This App is now available for download on Splunkbase.

Comprehensive usage documentation can be found here where the source code is also available.

I encourage feedback and collaboration, particularly around creating custom dynamic actions to share with the community, so we can organically grow the capabilities of this offering!

 


I can’t make my time range picker pick my time field.


When you are working with Hadoop using Hunk, or with Splunk when the time field you want to work with is not _time, you may want to drive a dashboard's time picker with some other time field. The same problem comes up when _time exists but isn't the time field you want to use for the current search.

Here is a solution you might use to make time selections work in every case including in panels.

| inputlookup SampleData.csv
| eval _time=strptime(claim_filing_date,"%Y-%m-%d")
| sort - _time
| addinfo
| where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")

Let's break this search down into its parts.

| inputlookup SampleData.csv

This is an example of pulling in data directly from a .csv file. It behaves just like it would from one of your searches against a Hadoop file that has no _time value.

Add enough filters to the search so that you aren't working with the entire data set. In Hadoop, this could be a serious situation, leading to copying literally all of your data into a sort. Remember: filter first, munge later. Get as specific as you can and the search will run in the least amount of time. Do this by filtering data like this:

index=myindex something="thisOneThing" someThingElse="thatThing" myTimeField="06-26-2016"

| eval _time=strptime(claim_filing_date,"%Y-%m-%d")

This converts the date in “claim_filing_date” into epoch time and stores it in “_time”.

Learn to specify Date and Time variables here.

http://docs.splunk.com/Documentation/Splunk/6.4.3/SearchReference/Commontimeformatvariables

| sort - _time

This sorts all of the records by time since they weren’t in that order before.

| addinfo

This statement adds info_min_time and info_max_time fields which are the min and max of the new values for _time that you have. The statement is needed for the time control in reports and panels to make it work properly.

   | where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")

This is where the magic happens. Here we are filtering the results based on comparisons between your _time field and the time range you created with the time picker.

Notice that we also had to compare against "+Infinity". This is what Splunk uses for the info_max_time field when you select All time on the time picker. When you move this search into a dashboard panel, we will have to do a few more things before it works as you expect.

For the purpose of this demonstration, we need to format the output to make it easier to understand the results.

| eval Start_Time=strftime(info_min_time,"%m/%d/%y")
| eval Stop_Time=strftime(info_max_time,"%m/%d/%y")
| table claim_filing_date _time Start_Time info_min_time
        Stop_Time info_max_time "Provider Name"

That's it. Now you have a working search; try it with your data.

 


Next, you may want to put this into a dashboard panel, which in this case would look like this:

[Screenshot: the search results in a dashboard panel]

Now let's add a time picker and a start button.

[Screenshot: the panel with a time picker and start button added]

Finally, there are two things we have to do with the panel to make it work, so click Edit Source.
[Screenshot: the Edit Source option]

 

We are going to provide the time evaluation in our where clause, so we don't need the token="field1" statement on the time input. So remove this:

 

 

 

[Screenshot: the token="field1" attribute to remove from the time input]

 

 

Next, we need to remove the earliest and latest clauses from just after the query.

 

 

 

 

That's it. Save the changes and enjoy the results.
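For reference, the edited source might end up looking roughly like this (a minimal Simple XML sketch; the dashboard label, lookup name, and submit button are illustrative, and your panel will keep whatever else you already had in it):

<form>
  <label>Claims by Filing Date</label>
  <fieldset submitButton="true">
    <input type="time">
      <label>Select Time</label>
    </input>
  </fieldset>
  <row>
    <panel>
      <table>
        <search>
          <query>| inputlookup SampleData.csv
| eval _time=strptime(claim_filing_date,"%Y-%m-%d")
| sort - _time
| addinfo
| where _time&gt;=info_min_time AND (_time&lt;=info_max_time OR info_max_time="+Infinity")
| eval Start_Time=strftime(info_min_time,"%m/%d/%y")
| eval Stop_Time=strftime(info_max_time,"%m/%d/%y")
| table claim_filing_date _time Start_Time info_min_time Stop_Time info_max_time "Provider Name"</query>
        </search>
      </table>
    </panel>
  </row>
</form>

Because the time input has no token, it sets the form's default time range, which is what addinfo then reports as info_min_time and info_max_time.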

 


 

One last thing. These statements could be added to a macro which you would call like this in your search.

`setsorttime(claim_filing_date, %Y-%m-%d)`

To do this, you define the macro, here is the link for building macros…

http://docs.splunk.com/Documentation/Splunk/6.4.3/Knowledge/Definesearchmacros

In the process, you will make these changes to macros.conf

[setsorttime]
 args = sortdatetime, datetimeformat
 definition = eval _time=strptime($sortdatetime$,"$datetimeformat$") 
     | sort _time 
     | addinfo 
     | where _time>=info_min_time AND (_time<=info_max_time OR info_max_time="+Infinity")

Then your search from above would look like this.

| inputlookup SampleData.csv | `setsorttime(claim_filing_date, %Y-%m-%d)` 
| eval Start_Time=strftime(info_min_time,"%m/%d/%y") 
| eval Stop_Time=strftime(info_max_time,"%m/%d/%y") 
| table claim_filing_date _time Start_Time info_min_time 
        Stop_Time info_max_time "Provider Name"

OK, let's try this now.

 


Good, you’re doing great. Now maybe you would like to add a Radio Button to allow you to pick the field you want to sort on.

[Screenshot: the radio button input for picking a date field]


In the panel editor, add the radio button input and give it a Label of "Pick a Date" and a Token Name of selected_date_field.

 

 

 

[Screenshot: the input configuration showing the label and token name]

 

 

Now we can add a few fields to select from. Note that we are simply adding field names and a pretty description of each field.

 

 

 

[Screenshots: adding field names and display labels as the input's choices]

 

Now we need to add the Radio Button variable to the search string. Click Edit Search like this.

 

 

 

You can see where I have inserted $selected_date_field$. This is the magic sauce which will choose the field to use for the Time Picker.
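In practice, only the eval line in the panel's search changes; it ends up looking something like this (a sketch; the format string still needs to match whichever field is chosen):

| eval _time=strptime($selected_date_field$,"%Y-%m-%d")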

 

 

 

 

[Screenshot: the panel search with $selected_date_field$ inserted]

 

Also notice that we added the new fields and values to the report to make it easier to understand what the panel is doing.

 

 

 

 

So this should look something like your very own masterpiece.

[Screenshot: the finished dashboard panel]

 


 

That's it. Now you can sort on any time field you have and use it with the time picker anytime and anywhere you want.

Use Knowledge Wisely.

SplunkYoda

If you feel the Force, send me a note.

Splunk your Google Analytics


Gain more insight into site performance and user activity by correlating Google Analytics data within Splunk.

A customer of mine recently wanted to understand more about the journey that retail consumers take when they arrive at its website. They recognized that consumers who have previously bought from the site will have more familiarity with the design and layout than those visiting the site for the first time. In addition, consumers who went directly to the site would have a greater brand engagement than those who were referred from an affiliate site.

If only we could implement a method to back up the data that gets submitted to Google Analytics, sending a copy to the local Apache web server logs and from there into Splunk.

Using the following change to the client-side Google Analytics JavaScript code block already implemented on their site, we were able to start sending the Google Analytics payload back to the local site web server.

<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXX-YY', 'auto');

// START local backup of GA data request for Splunk
ga(function(tracker) {
  var originalSendHitTask = tracker.get('sendHitTask');
  tracker.set('sendHitTask', function(model) {
    var payLoad = model.get('hitPayload');
    originalSendHitTask(model);
    var gifRequest = new XMLHttpRequest();
    // Send __ua.gif to the local server
    var gifPath = "/__ua.gif";
    gifRequest.open('get', gifPath + '?' + payLoad, true);
    gifRequest.send();
  });
});
// END local backup of GA data request for Splunk

ga('send', 'pageview');
</script>

The code snippet simply sends an XMLHttpRequest containing the payload to a 1×1 pixel .gif file uploaded to the local web server. The .gif file simply acts as an endpoint to receive the requests so they get logged locally.

This method captures all of the GA tracking information configured on a site and any additional client side information unavailable to standard server side web logs i.e. Screen Resolution, Viewable Screen Size, Screen Colour Depth & User Language.

Leveraging the Client ID generated by the Google Analytics library also allows you to identify users even before they log into the site, easily providing previously unknown information about user behavior.
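Once the __ua.gif requests are being indexed from the Apache access logs, a search along these lines can pull the interesting parameters out of the payload (a sketch only; the sourcetype is whatever you use for your Apache logs, and cid, t, and sr are the standard Measurement Protocol names for client ID, hit type, and screen resolution):

sourcetype=access_combined "__ua.gif"
| rex "__ua\.gif\?(?<ga_payload>[^\s\"]+)"
| rex field=ga_payload "(?:^|&)cid=(?<client_id>[^&]+)"
| rex field=ga_payload "(?:^|&)t=(?<hit_type>[^&]+)"
| rex field=ga_payload "(?:^|&)sr=(?<screen_resolution>[^&]+)"
| stats count BY client_id hit_type screen_resolution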

Although this gathers the same data as Google Analytics, there was a discrepancy between the numbers returned by Splunk and those in the Google Analytics ad hoc dashboards. Further research revealed that Google Analytics performs data sampling to provide satisfactory performance for ad hoc reporting.

https://support.google.com/analytics/answer/1042498?hl=en


Splunk does no sampling (unless you want it to) and gives un-sampled statistics on visitor activity. Additionally, with this extra tracking information, Splunk gives a more complete view of interaction for a single user across multiple devices, and even for multiple users behind a proxy.

So what are you waiting for? Splunk your Google Analytics data to enrich and correlate data from your users' interaction with your web site!

Introducing AppInspect


Yesterday at .conf2016 we announced the general availability of Splunk AppInspect, the first static and dynamic analysis tool for Splunk apps. It was built, and is used, by the team that administers the Splunk App Certification program to speed the certification process, and we're now able to share it with developers who want the same insights into their apps, whether they plan to release them to Splunkbase or not.

“AppInspect has been invaluable in bringing Splunk certification testing into our automated build environment, helping us to create Splunk Apps that are ready for App Certification on the first upload to SplunkBase.” – Kyle Smith, Aplura, LLC

All developers want to get their work done faster, with fewer errors and less debugging.  Splunk AppInspect makes that possible with a suite of over 165 individual checks in 36 different areas of a Splunk app.

AppInspect evaluates a Splunk app across 36 different technical areas, including:

  • Alert Actions
  • Configuration files
  • Custom search commands
  • Custom visualizations
  • Custom workflow actions
  • Data models
  • Directory structure
  • Deprecated files
  • Modular Inputs
  • Saved searches

AppInspect is available as either a standalone tool that provides static analysis on a local machine, or through a RESTful API that provides both static and dynamic analysis. Splunk AppInspect is ready for all stages of the software development lifecycle, including automated unit testing, manual code reviews, and integration with continuous integration build systems.
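For example, a local run against a packaged app might look something like this (a sketch; the exact command name and options depend on the version you install, and my_app.tgz is a placeholder):

splunk-appinspect inspect my_app.tgz

The output lists each check and its result, including failures like the one shown in the example below.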

Example – Extracting fields in transforms.conf

Using transforms.conf to adjust data at index time is an essential tool for Splunk apps, but as anyone who has ever written a regular expression will tell you, it can be tricky to get right. Let's look at an example:
[field_eval]
FORMAT = field_parent::$2
MV_ADD = 1
REGEX = ( *[\(,\+\-\/\/\*] *|^)([a-zA-Z_'\{\}][\w'\{\}\.]++)(?!(\(| [Aa][Ss]))
SOURCE_KEY = conf_value

When we run the app through Splunk AppInspect we get the following failure message:

[ failure ] Check that all capture groups are used in transforms.conf. Groups not used for capturing should use the non-capture group syntax
The format option in [field_eval] stanza of transforms.conf did not include $1, $3

In the REGEX there are two capture groups that have not been used. It is possible that the developer has done one of two things:

  1. Forgotten to include the captured fields in their FORMAT string. If this is the case, then the developer updates their FORMAT field.
  2. Captured fields that are not needed.  If this is the case then the developer converts the capture groups from
    ( *[\(,\+\-\/\/\*] *|^)

    to
    (?: *[\(,\+\-\/\/\*] *|^)

    to use the non-capturing group format.

In either case it would be extremely time-consuming to check each and every transform manually to confirm that all of the fields have been used. AppInspect completes this check in under a minute.

How to get it

We encourage you to download AppInspect, test it out, and see how your app does. View the documentation here, including an API reference. We'd love to hear from you; reach out to us at appinspect@splunk.com.

If you need help getting started with Splunk AppInspect you can email appinspect@splunk.com or ask on Splunk Answers with tag AppInspect.
