
Hunk: Raw data to analytics in < 60 minutes


Finally, I got a bit of down time to sit down and get to the third part of the “Hunk: Splunk Analytics for Hadoop Intro” series of blogs, a follow-up to part 1 and part 2.

Summary of what we’ll do

1. Set up the environment
2. Configure Hunk
3. Analyze some data

So let’s get started…

Minutes 0 – 20: Set up the environment

In order to get up and running with Hunk you’ll need the following software packages available/installed on the server running Hunk:
1. Hunk bits – you’ll be able to get these once you sign up for the Hunk Beta program
2. Java – at least version 1.6 (or whatever is required by the Hadoop client libraries)
3. Hadoop client libraries – you can get these from the Hadoop vendor that you’re using or if you’re using the Apache distro you can fetch them from here

Installing the Hunk bits is pretty straightforward:

#1. untar the package
> tar -xvf splunkbeta-6.0-<BUILD#>-Linux-x86_64.tgz
#2. start Splunk
> ./splunkbeta/bin/splunk start

Download and follow the instructions for installing/updating Java and the Hadoop client libraries, and make sure you keep note of JAVA_HOME and HADOOP_HOME as we’ll need them in the next section.

Minutes 20 – 40: Configure Hunk

Configuring Hunk can be done either by (a) using our Manager UI, by going to Settings > Virtual Indexes, or (b) by editing a conf file, indexes.conf. Here I’ll walk you through editing indexes.conf (we are changing a few things in the setup UI and I don’t want this post to be out of date by the time you read it).

We are going to work with the following file:

$SPLUNK_HOME/etc/system/local/indexes.conf

First: we need to tell Hunk about the Hadoop cluster where the data resides and how to communicate with it – in Hunk terminology this is an “External Results Provider” (ERP). The following stanza shows an example of how we define a Hunk ERP.

[provider:hadoop-dev01]
# this exact setting is required
vix.family = hadoop

# location of the Hadoop client libraries and Java
vix.env.HADOOP_HOME = /opt/hadoop/hadoop-dev01
vix.env.JAVA_HOME = /opt/java/latest/

# job tracker and default file system
vix.fs.default.name = hdfs://hadoop-dev01-nn.splunk.com:8020
vix.mapred.job.tracker = hadoop-dev01-jt.splunk.com:8021

# uncomment this line if you're running Hadoop 2.0 with MRv1
#vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-h2.0.jar

vix.splunk.home.hdfs = /home/ledion/hunk
vix.splunk.setup.package = /opt/splunkbeta-6.0-171187-Linux-x86_64.tgz

Most of the above configs are self-explanatory; however, I will take a few lines to explain some of them:

[stanza name]
This must start with “provider:” in order for Hunk to treat it as an ERP; the rest of the string is the name of the provider, so feel free to get more creative than me :)

vix.splunk.home.hdfs
This is a path in HDFS (or whatever the default file system is) that you want this Hunk instance to use as its working directory (scratch space).

vix.splunk.setup.package
This is a path on the Hunk server where Hunk can find a Linux x86_64 Hunk package, which will be shipped and used on the TaskTracker/DataNodes.

Second: we need to define a virtual index which will contain the data that we want to analyze. For this post I’m going to use Apache access log data which is partitioned by date and is stored in HDFS in a directory structure that looks like this:

/home/ledion/data/weblogs/20130628/access.log.gz
/home/ledion/data/weblogs/20130627/access.log.gz
/home/ledion/data/weblogs/20130626/access.log.gz
....

Now, let’s configure a virtual index (in the same indexes.conf file as above) that encapsulates this data

[hunk]
# name of the provider stanza we defined above
# without the "provider:" prefix
vix.provider = hadoop-dev01

# path to data that this virtual index encapsulates
vix.input.1.path = /home/ledion/data/weblogs/...
vix.input.1.accept = /access\.log\.gz$
vix.input.1.ignore = ^$

# (optional) time range extraction from paths
vix.input.1.et.regex = /home/ledion/data/weblogs/(\d+)
vix.input.1.et.format = yyyyMMdd
vix.input.1.et.offset = 0

vix.input.1.lt.regex = /home/ledion/data/weblogs/(\d+)
vix.input.1.lt.format = yyyyMMdd
vix.input.1.lt.offset = 86400

There are a number of things to note in the virtual index stanza definition:

vix.input.1.path
Points to a directory under the default file system (e.g. HDFS) of the provider where the data of this virtual index lives. NOTE: the “…” at the end of the path denotes that Hunk should recursively include the content of subdirectories.

vix.input.1.accept and vix.input.1.ignore allow you to specify regular expressions to filter in/out files (based on the full path) that should/not be considered part of this virtual index. Note that ignore takes precedence over accept. In the above example vix.input.1.ignore is not needed, but I included it to illustrate its availability. A common use case for using it is to ignore temporary files, or files that are currently being written to.

So far so good, but what the heck is all that “.et/lt” stuff?

Glad you asked :) In case you are not familiar with Splunk, time is a first class concept in Splunk and thus, by extension, in Hunk too. Given that the data is organized in a directory structure using date partitioning (and this is a very common practice), the “.et/lt” stuff is used to tell Hunk the time range of data that it can expect to find under a directory. The logic goes like this: match the regular expression against the path, concatenate all the capturing groups, then interpret that string using the given format string, and finally add/subtract a number of seconds (offset) from the resulting time. The offset comes in handy when you want to extend the extracted time range to build in some safety, e.g. a few minutes of a given day end up in the next/previous day’s dir, or there’s a timezone difference between the directory structure and the Hunk server. We do the whole time extraction routine twice in order to come up with a time range, i.e. we extract an earliest time and a latest time. When time range extraction is configured, Hunk is able to skip/ignore directories/files which fall outside of the search’s time range. In Hunk speak this is known as: time based partition pruning.
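
To make that concrete, here is a small illustrative sketch (plain Python, not Hunk code – just the same logic expressed in a few lines, with the Java-style format string mapped to its strptime equivalent) of how the et/lt settings above turn a path into a time range:

import re, time, calendar

def extract_time(path, regex, fmt, offset_seconds):
    # match the regex against the path and concatenate all capturing groups
    groups = "".join(re.search(regex, path).groups())
    # interpret the concatenated string using the format (assumed UTC here),
    # then apply the offset
    return calendar.timegm(time.strptime(groups, fmt)) + offset_seconds

path = "/home/ledion/data/weblogs/20130628/access.log.gz"
regex = r"/home/ledion/data/weblogs/(\d+)"
earliest = extract_time(path, regex, "%Y%m%d", 0)      # et: yyyyMMdd, offset 0
latest   = extract_time(path, regex, "%Y%m%d", 86400)  # lt: yyyyMMdd, offset 86400
# a search whose time range does not overlap [earliest, latest) can skip this file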

Third: we need to tell Hunk how to schematize the data at search time. At this point we’re entering classic Splunk setup and configuration. In order for Hunk to bind a schema to the data we need to edit another configuration file.

We are going to work with the following file:

$SPLUNK_HOME/etc/system/local/props.conf
[source::/home/ledion/data/weblogs/...]
priority = 100
sourcetype = access_combined

This stanza tells Hunk to assign sourcetype access_combined to all the data in our virtual index (i.e. all the data under /home/ledion/data/weblogs/). The access_combined sourcetype is defined in $SPLUNK_HOME/etc/system/default/props.conf and defines how access log data should be processed (e.g. each event is a single line, where to find the timestamp and how to extract fields from the raw event).
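
For a sense of what that means in practice, here is the kind of raw access_combined event this data contains (the values below are made up), from which Splunk’s built-in configuration extracts fields such as clientip, method, uri, status and bytes at search time – the status field is what the timechart example below groups by:

127.0.0.1 - - [28/Jun/2013:09:15:31 -0700] "GET /index.html HTTP/1.1" 200 3562 "http://www.splunk.com/" "Mozilla/5.0"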

Minutes 40 – 59: Analyze your Hadoop data

Now we’re ready to start exploring and analyzing our data. We simply run searches against the virtual index data as if it were a native Splunk index. I’m going to show two examples, highlighting data exploration and analytics.

1. explore the raw data

 index=hunk 

Use Hunk to explore your Hadoop data

2. get a chart showing the status codes over a 30 day window, using daily buckets

 index=hunk | timechart span=1d count by status 
Hunk report example

Minutes 59 – ∞: Keep on Hunking!

There are countless ways to slice, dice and analyze your data with Hunk. Sign up for the Hunk Beta program to get access to the latest bits and take it for a spin. We’d love to get your feedback on how to make Hunk better for you.

Stay tuned for another post where I’ll walk you through how to extend the data formats supported by Hunk as well as how to add your own UDFs …


A Developer’s Smorgasbord


First bite of the Cherry(py)

I didn’t always work at Splunk. In fact, many moons ago I used to be a Splunk customer. At the time we were simply looking for a means to better consolidate our enterprise’s numerous sources of log data into a centralized repository. A colleague of mine mentioned this product called Splunk, and hence the journey began. Like many, this started with getting some log files indexed into Splunk and creating some trivial searches and Simple XML dashboards. This very quickly led to more data sources and more elaborate dashboards. Then the bloke sitting next to me saw what I was doing and wanted in on the action, then the adjacent team, and then the floor. This internal viral growth required setting up a simple Splunk cluster under my desk with user/role access controls, data retention policies, backups, etc. But more importantly, this was when I really started to become a Splunk Developer.

Appetite for construction

I’m a coder. I like making things. And when it comes to working with other products, platforms & frameworks, I am naturally drawn to those that allow me to open them up and extend, customize and augment them to my specific needs. This doesn’t necessarily mean open source, but they need to have an open architecture with as many developer “hooks” as possible. This openness is also what leads to and drives community and collaboration. A thriving community is the bedrock of a platform’s developer ecosystem, and within this realm great ideas are seeded and emerge.

Even though my Developer journey started with Splunk several versions ago, there were already enough developer hooks in the product to satisfy my needs and allow me to shape Splunk to suit the dynamics of the environment in which I was utilizing it. Proprietary closed platforms force you to adapt to their way. It should be the opposite. You should be able to adapt the platform to your way, your data, your requirements. And this adaptation should be as simple and timely as possible to accomplish. Splunk ticked all my boxes.

All you can eat buffet

The Splunk Developer landscape today is vast and growing, with hooks into numerous areas of the platform.

What follows is an overview of all the main areas where you can currently develop atop the Splunk platform.

In many ways it is great to be spoiled for choice; it adds to the agility of the platform. But sometimes, so much choice can be overwhelming, especially to the newbie developer. So I will provide a brief overview of each development hook, with links to more substantial documentation, as well as my take as to why you might consider that particular development option.

CLI (Splunk’s Command Line Interface)

You can use the Splunk CLI to monitor, configure and search Splunk via a terminal/shell interface or wrapping the commands in a shell script.

Consider using this if you are not able to program to the REST API or use a language SDK and are more suited to simple shell commands.

Furthermore, the CLI has some functionality over and above what you can do with REST, e.g. starting/stopping the server, cleaning indexes, additional troubleshooting tools and more.
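
For example, assuming a local instance and illustrative admin credentials, a few common CLI invocations from $SPLUNK_HOME/bin look like this:

# start/stop/check the server – functionality not available over REST
./splunk start
./splunk status

# run a search from the shell
./splunk search 'index=_internal | head 5' -auth admin:changeme

# wipe the events from an index
./splunk clean eventdata -index main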

http://docs.splunk.com/Documentation/Splunk/5.0.3/Admin/AbouttheCLI

REST API

You can interact with most of the functionality of Splunk via the exposed REST API. This will typically be managing, searching and inputting data. Just think of your experience using Splunk Web. What you can do there, you can perform programmatically directly via REST.

You can perform this interaction from the command line using a program such as cURL, or programmatically in your code.

You might want to use the REST API directly if you are unable to use our language SDKs, or perhaps you are using a language that we don’t currently have an SDK for (e.g. R or Perl), or you might be hitting a custom REST endpoint that isn’t available via an SDK.
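
As a quick illustration (host, port and credentials here are the usual local defaults and purely illustrative), creating and then reading a search job over REST with cURL looks roughly like this:

# create a search job on the management port; the response contains a job SID
curl -k -u admin:changeme https://localhost:8089/services/search/jobs \
    -d search="search index=_internal | head 5"

# once the job is done, fetch its results (replace <sid> with the returned ID)
curl -k -u admin:changeme --get \
    https://localhost:8089/services/search/jobs/<sid>/results -d output_mode=json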

http://dev.splunk.com/view/rest-api-overview/SP-CAAADP8

SDKs (Software Development Kits)

Our language SDKs build upon the underlying REST API by providing a higher level language interface, currently in 6 different language offerings (Python, Java, JavaScript, PHP, Ruby, C#). They make it even simpler to manage, search and input data into Splunk by abstracting away all the REST plumbing, so you can instead focus your efforts on productive coding in the language that best suits your development needs.

You’ll typically want to use an SDK to leverage your existing developer language skills and the Splunk REST API to code solutions that integrate with Splunk, be it a data-only integration or perhaps a custom user interface, or you may have a requirement to utilize the core Splunk platform to build a standalone big data solution on top of it. The reality is, the potential use cases are numerous.
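
For instance, with the Splunk SDK for Python a one-shot search boils down to a few lines (connection details are illustrative):

import splunklib.client as client
import splunklib.results as results

# connect to splunkd's management port
service = client.connect(host="localhost", port=8089,
                         username="admin", password="changeme")

# run a blocking one-shot search and iterate over the results
reader = results.ResultsReader(service.jobs.oneshot("search index=_internal | head 5"))
for item in reader:
    if isinstance(item, dict):  # skip any informational messages
        print item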

http://dev.splunk.com/view/sdks/SP-CAAADP7

Splunkbase Apps and Add-ons

At a very simple level, Splunk apps & add-ons are a packaging up of the various configurations, searches, knowledge objects, UI components and customizations, inputs, role definitions, field extractions etc. that you might typically create via Splunk Web. An App will typically have a user interface that sits atop multiple other Splunk objects. An add-on typically serves a single reusable purpose to extend the Splunk platform (e.g. a custom input), and won’t have a UI or a setup screen. Apps are typically themed around a specific use case (e.g. Splunk for Active Directory), whereas an add-on will generally be generic and reusable across many diverse use cases (e.g. the SNMP Modular Input).

You’ll want to create an app or add-on if you have something you want to share with the community (free or charged) on Splunkbase.

The power of the community working together is a great thing: you might contribute an app, and conversely someone else might have contributed an app that you can benefit from. Everyone wins. Furthermore, it promotes modularity and reuse, all good things for Splunk productivity. Also, many Splunk partners create apps to integrate with their products, thereby leveraging these connected communities for the market.

You might also choose to create apps at your organization and share these internally. Building out your internal Splunk apps not only promotes modularity but makes it easier to secure access in a multi-user/multi-departmental environment via Splunk user/role based permissions and access policies.

You can also bundle up code that isn’t a traditional Splunkbase app by definition, but is something you want to share on Splunkbase, e.g. a code library or perhaps something you have created using an SDK.

http://docs.splunk.com/Documentation/Splunk/5.0.3/AdvancedDev/AppIntro

Scripted inputs

Out of the box, Splunk has simple generic input options available for getting data from a file or receiving data over TCP/UDP.

But you can also create your own custom scripts, in any language, to obtain data from any source. These are called scripted inputs.

You can then bundle these up as Splunk add-ons, share them on Splunkbase and also browse Splunkbase for any scripted inputs others have created.

You’ll want to create a scripted input when the data is not available by file or UDP/TCP, or perhaps you want to perform some additional pre-processing of the raw data before sending it to Splunk. Scripted inputs have been largely superseded by Modular Inputs since Splunk 5.0, but their bare bones simplicity and speed of development still make them an effective tool in the developer’s arsenal. And if you have a Splunk environment pre 5.0, then Modular Inputs are obviously not available.
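
A scripted input really is just a program whose stdout gets indexed. As a minimal sketch (the metric, file names and sourcetype are illustrative), the script below emits a single load-average event each time Splunk runs it, and the accompanying inputs.conf stanza tells Splunk how often to run it:

#!/usr/bin/env python
# bin/loadavg.py - anything printed to stdout is indexed by Splunk (Unix-only metric)
import os, time

load1, load5, load15 = os.getloadavg()
print "%s load1=%.2f load5=%.2f load15=%.2f" % (
    time.strftime("%Y-%m-%d %H:%M:%S"), load1, load5, load15)

# inputs.conf
[script://$SPLUNK_HOME/etc/apps/myapp/bin/loadavg.py]
interval = 60
sourcetype = loadavg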

http://docs.splunk.com/Documentation/Splunk/5.0.3/AdvancedDev/ScriptedInputsIntro

Modular Inputs

Modular Inputs build upon Scripted Inputs by elevating the creation of custom input add-ons to first class citizen status in Splunk. By that I mean that once you install them, it is as if they are natively part of Splunk, just like File or TCP inputs. You are still achieving the same basic premise as with Scripted Inputs, providing some custom, reusable way of getting at data, but a Modular Input is tightly integrated into the Splunk lifecycle (install, logging, validation, runtime, setup page in the Splunk Manager UI, manageable via REST etc.), and the end user experience of configuring access to their data is much simpler and easier.

I would advocate creating a Modular Input if you are using Splunk 5+ and wish to provide the simplest and most seamless experience for your users in setting up access to their data sources. And just like with Scripted Inputs, you can use any language you like, although I’d generally recommend Python, as Splunk comes with its own Python interpreter and this should make your Modular Input platform independent.
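
To illustrate the difference from the end user's perspective: once a (hypothetical) modular input called random_numbers is installed, configuring an instance of it is just another stanza in inputs.conf, or a form in the Manager UI, with the input's own parameters validated for you:

# inputs.conf - the random_numbers input and its min/max parameters are illustrative
[random_numbers://my_numbers]
min = 1
max = 10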

http://docs.splunk.com/Documentation/Splunk/5.0.3/AdvancedDev/ModInputsIntro

Custom search commands

Splunk’s search language is incredibly powerful and extensive for deriving a wide range of analytical insights from your indexed data. But there may well be times when a particular search command doesn’t quite do what you want, or you have a need for an entirely new search command. The good news is that you can extend the Splunk search language with your own custom commands. These can also be bundled up as Splunk add-ons and shared on Splunkbase. Custom search commands are written in Python.

Create a custom search command when you need search processing that is not available out of the box, will reuse the custom search command logic frequently, and want to maintain an integrated approach for your users to processing your data by keeping the processing logic as single searches/saved searches. By “integrated approach” I mean that it could alternatively be possible to perform part of the search, output the results via REST and then perform the additional processing in some custom code.
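
As a rough sketch of the shape of such a command (the command and field names are made up, and the splunk.Intersplunk helper shown is the traditional way these scripts read and return results – the docs below describe the full contract):

# bin/shout.py - annotates each event with an upper-cased copy of its host field
import splunk.Intersplunk

# read the events piped to the command by splunkd
results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()

for result in results:
    result["shouted_host"] = result.get("host", "").upper()

# hand the (modified) events back to the search pipeline
splunk.Intersplunk.outputResults(results)

The script would then be registered via a commands.conf stanza in your app (e.g. a [shout] stanza with filename = shout.py) and invoked as ... | shout in a search.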

http://docs.splunk.com/Documentation/Splunk/5.0.3/AdvancedDev/SearchScripts

Custom alert scripts

The alerting channels Splunk provides by default are email and RSS. But let’s say, for example, that you wanted to send alerts via SMS, to a messaging queue, as an SNMP trap or directly to a trouble ticket system. Then you can code your own scripted alerts to perform this functionality in any language.

You will pretty much always need to go down this path if you have an alerting requirement beyond email/RSS. However, it would also be possible to code some external alerting program that is scheduled, executes searches or looks up saved search results via the REST API, and then sends alerts to some channel based on your alerting criteria being met.
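
A skeleton of such a script might look like the following (the positional arguments used are the commonly documented ones for scripted alerts – number of events, saved search name and the path to the gzipped results file – while the SMS gateway endpoint is entirely made up):

#!/bin/bash
# A minimal scripted alert, referenced from the alert's configuration.
EVENT_COUNT=$1     # number of events returned by the triggering search
SEARCH_NAME=$4     # name of the saved search
RESULTS_FILE=$8    # gzipped CSV file containing the triggering results

curl -s -X POST "http://sms-gateway.example.com/send" \
    -d "msg=Splunk alert '$SEARCH_NAME' fired with $EVENT_COUNT events"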

http://docs.splunk.com/Documentation/Splunk/5.0.3/Alert/Configuringscriptedalerts

Custom REST endpoints

Splunk’s REST API is very thorough, but it can also be extended with your own custom REST endpoints that you could then integrate with programmatically. You would have to use the REST API directly, as custom REST endpoints won’t have interfaces in our language SDKs. Custom REST endpoints can also be accessed via an App’s setup page in Splunk Web.

You will typically write a custom REST endpoint when there is specific server side functionality you require that is not exposed or available via the standard REST API.
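
At a sketch level (the stanza names and values here are illustrative – the restmap.conf documentation linked below has the authoritative list of settings), the wiring is a restmap.conf entry in your app that maps a URL to a Python handler class living in the app's bin directory:

# restmap.conf
[script:hello]
match = /myapp/hello
handler = hello_handler.HelloHandler

The handler class then implements methods for the HTTP verbs it supports (e.g. a GET handler) and the endpoint becomes reachable on the management port alongside the built-in REST API.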

http://blogs.splunk.com/2011/08/16/bulding-custom-rest-endpoints-conf-2011-demo/

http://docs.splunk.com/Documentation/Splunk/5.0.3/AdvancedDev/SetupExampleCustom

http://docs.splunk.com/Documentation/Splunk/5.0.4/admin/restmapconf

Custom authentication handlers

Splunk ships with 2 authentication mechanisms: internal Splunk authentication and LDAP.

But Splunk also has a scripted authentication API for use with an external authentication system, so you can develop your own authentication handlers.

When you require authentication to external systems other than LDAP, you will need to create a custom authentication handler.
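
Roughly speaking (the stanza below is an illustrative sketch – see the docs linked underneath and the sample auth scripts that ship with Splunk for the exact protocol), you point authentication.conf at a script that answers calls such as userLogin, getUserInfo and getUsers:

# authentication.conf
[authentication]
authType = Scripted
authSettings = script

[script]
scriptPath = "$SPLUNK_HOME/bin/python" "$SPLUNK_HOME/etc/system/bin/my_auth_handler.py"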

http://docs.splunk.com/Documentation/Splunk/5.0.3/Security/ConfigureSplunkToUsePAMOrRADIUSAuthentication

The App Framework

The Splunk App Framework allows you to build your own custom MVC apps and components that run inside the Python based Splunk Web application server.

The App Framework layer API provides the abstraction needed to easily integrate your application into core Splunk software. The API provides access to the same libraries that power Splunk Web, including the REST API, CherryPy application server, and Mako templating facilities.

You should consider going down this path if you need to create custom Advanced XML modules and view templates (for presenting a custom UI experience), custom controllers (for interactions from the browser) or custom model components (for interacting with splunkd). The code you develop can also be bundled up as an app/add-on and shared on Splunkbase, e.g. a package of custom D3 visualizations.

http://dev.splunk.com/view/app-framework/SP-CAAADVF

The “NEW” App Framework

The “NEW” App framework (currently in beta), allows developers to leverage their existing skills with languages and web frameworks that they are already familiar with to create custom Splunk web apps. It is built on top of the Django web framework and incorporates the Splunk Javascript and Python SDKs. The Splunk Application Framework is really a set of mostly-optional layers from which you can pick and choose. The goal of the framework is to provide you with tools to make working with searches and results quick and easy, while offering the greatest flexibility for different developers and their environments.

Choosing this approach is really a question of whether it is going to make it simpler for developers to create Splunk web apps and whether it is going to lead to greater productivity for you. The App Framework is very powerful but requires a rather proprietary skillset to get productive. The “NEW” App Framework is about allowing you to take advantage of cross platform skills you may already have.

http://dev.splunk.com/view/app-framework/SP-CAAAEMA

Coding outside the square

You can also create whatever other tools, utilities & libraries related to Splunk that you want; you don’t have to rely directly on Splunk hooks. And there are already some cool projects on Github created internally and by the developer community. Here’s a sampling of them.

iOS Logging Libraries : https://github.com/carasso/splogger

Splunk Storm Mobile Analytics : https://github.com/nicholaskeytholeong/splunk-storm-mobile-analytics

Android SDK for Splunk : https://github.com/damiendallimore/splunk-sdk-android

SplunkJavaLogging : https://github.com/splunk/splunk-library-javalogging

SplunkJavaAgent  : https://github.com/damiendallimore/SplunkJavaAgent

Powershell Resource Kit : https://github.com/splunk/splunk-reskit-powershell

FluentD Connector : https://github.com/parolkar/fluent-plugin-splunk

Flurry Connector : https://github.com/splunk/splunk-flurry

Apache Camel Component for Splunk : https://github.com/pax95/camel-splunk

Spring Integration Adaptors for Splunk : https://github.com/SpringSource/spring-integration-extensions/tree/master/spring-integration-splunk

You really are only limited by your imagination!

A few more morsels to take away

That’s quite a selection of development options, isn’t it? Hopefully this has given you a pretty good overview of the landscape that’s currently out there for Splunk developers.

And don’t forget, Splunk’s Worldwide Users’ Conference is coming up very soon and there is some great content for developers this year.

Splunk Conf Developer Track : http://conf.splunk.com/sessions

Splunk Conf Hackathon : http://conf.splunk.com/view/conference-hackathon/SP-CAAAG7C

Splunk Conf Contests : http://conf.splunk.com/view/conference-contest/SP-CAAAG3B

Here are a few other links/resources to further help you out in your Splunk developer journey:

Free Developer License : http://dev.splunk.com/page/developer_license_sign_up

Splunk Dev : http://dev.splunk.com

Splunk Docs : http://docs.splunk.com

Github : https://github.com/splunk

Twitter : @splunkdev , @damiendallimore

Go forth and develop!!

There’s Still Time to Enter the Splunk App Dev Contest


The inaugural Splunk App Dev Contest is heading into the home stretch but there’s still plenty of time to enter and win. The Splunk App Dev Contest is a chance to show off your development chops and win up to $10,000! What kinds of things can you build and enter into the contest? You can build a Splunk App using JavaScript and Django with the new Framework (requires Splunk 5.0+) or with Advanced XML. You can integrate Splunk functionality and data into a web, mobile, desktop or server application using the Splunk REST API or Splunk SDKs for Java, JavaScript, Python, C#, Ruby and PHP. You can build something else amazing that we haven’t thought of and blow away the judges. Spend some time on http://dev.splunk.com/, get your brain storming and get started! Here are the details:

Best Application Category:

1st Prize $10,000
2nd Prize $5,000
3rd Prize $3,000

Best Application for Social Impact Category:

$500 donation to a nonprofit of the winner’s choice (sponsored by Splunk>4Good)

When does the contest run?

Your projects need to be created and final on https://www.hackerleague.org/hackathons/splunk-app-dev-contest/hacks by end of the day, August 30th. Contact the team at devinfo@splunk.com if you have any questions and follow us on Twitter @splunkdev (we respond to DMs all the time).

Where can I find the full official rules?

Click here for the official rules.

Commence Hacking!

Splunk .conf2013 Revolution Awards: Nominate your favorite use of Splunk for good


Splunk Revolution Award Nominations are almost closed! Read and act quickly, as nominations are still being accepted through the end of August!

We’re always excited to hear about the cool, inspiring ways our customers are using Splunk® software. That’s why we established the Splunk Revolution Awards—to distinguish individual Splunk users and recognize their achievements in multiple categories. So here’s your chance to nominate yourself or one of your colleagues! Fame and glory (okay, a nifty awards plaque) could be yours!

Splunk Revolution Awards will have winners in 5 categories: Developers Award, Do-Gooder Award, Enterprise Award, Innovation Award and Ninja Award. You can read all about the Awards, make your nominations and see full contest rules here.

Through Splunk4Good I am constantly encountering awesome Splunk use cases with a social impact. I am hoping to see some of your favorite use cases for good submitted as Do-Gooder nominations.

The “Do-Gooder” Award
Are you doing “green” things with Splunk software or using it for a social cause or humanitarian effort? Are you helping a charity or working for a non-profit? Here’s your chance to tell us what you’ve done to try and make the world a better place!

Entries will be accepted July 16 through August 31, 2013. Awards will be given out during Splunk’s Annual User Conference in Las Vegas Sept 30-Oct 3. If you haven’t already registered, do it now. I simply can’t wait to see all my favorite Splunk experts, customers and partners in one place :)

Submit those nominations here and see you at .conf2013!

Build a Splunk App by end of Aug, you could win something cool


Splunk’s Annual User Conf is coming up in just six weeks (Sept 30-Oct 3) and is seriously one of my favorite times of year.

I am especially excited .conf2013 has so much awesomeness for developers.

In addition to the Hackathon we are holding during .conf2013 and the Splunk Revolution Awards we are giving out at .conf2013 – there is also the Splunk App Dev Contest! Chief Legal Counsel suggests I refer you to the official rules (tl;dr = Build a Splunk App by end of Aug, you could win something cool.)

We’re holding a contest to see what amazing projects can be built in the Splunk community. Build something awesome with Splunk software and you can win a prize worth up to $10,000!

Best Application Category:

1st Prize $10,000
2nd Prize $5,000
3rd Prize $3,000

Best Application for Social Impact Category:

$500 donation to a nonprofit of the winner’s choice

What kinds of projects are eligible for the contest?

You can build a Splunk App using JavaScript and Django with the Splunk Application Framework (requires Splunk 5.0+) or with Advanced XML! You can integrate Splunk functionality and data into a web, mobile, desktop or server application using the Splunk REST API or Splunk SDKs for Java, JavaScript, Python, C#, Ruby and PHP! Spend some time on http://dev.splunk.com/, get your brain storming and get started!

When does the contest run?

June 10th through August 30th 2013

To read all about the contest, how to submit, full rules, etc – head over to Splunk App Dev Contest!

Mobile Analytics (iOS) with Splunk and Storm (Part 2)


In the previous article, “Mobile Analytics (iOS) with Storm”, we discussed sending stack traces of uncaught exceptions from apps running on the iOS platform into Storm with the Storm REST API. We hope that the article covered the basic steps to help iOS app developers jumpstart into realizing the potential of using Splunk and Storm to develop better quality apps.

Great news … iOS developers are now able to send the above-mentioned stack traces via TCP into both Splunk Enterprise and Storm, and it is very simple to configure this library in an iOS app. The steps are described below:

CREATE A STORM ACCOUNT
[1] Create a Splunk Storm account by registering yourself here: https://www.splunkstorm.com

DOWNLOAD AND CONFIGURE THE LIBRARY
[1] Then download the logging library from http://splunk-base.splunk.com/apps/92296/mobile-analytics-with-splunk-storm-ios or  https://github.com/nicholaskeytholeong/splunk-storm-mobile-analytics/blob/master/ios/splunkmobileanalytics.zip

[2] Unzip it and drag the splunkmobileanalytics folder into the project.

[3] Select Relative to Project at Reference Type, then click Add.


[4] In the AppDelegate interface file (AppDelegate.h), import Storm.h, like so:

#import
...
#import "Storm.h"
// other awesome codes that you are writing

[5] In the AppDelegate implementation file (AppDelegate.m), provide the stormTCPHost and stormTCPPort values in the TCPHost message

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // Override point for customization after application launch.
    // Set the view controller as the window's root view controller and display.
    self.window.rootViewController = self.viewController;
    [self.window makeKeyAndVisible];

    [Storm TCPHost:@"YOUR_STORM_TCP_HOST" TCPPortNum:YOUR_STORM_TCP_PORT];

    return YES;
}

[6] You are set and Splunk Storm is now integrated seamlessly into your iOS mobile app!

BONUS SECTION
[1] Splunk Enterprise was briefly mentioned earlier in this article and many of you might be asking “Would this library send data to Splunk Enterprise via TCP?” Yes! And the configuration steps are very similar, except that we are importing the “Splunk.h” header file. An example is as follows:

[2] In the AppDelegate interface file (AppDelegate.h), import Splunk.h, like so:

#import
...
#import "Splunk.h"
// other awesome codes that you are writing

[3] In the AppDelegate implementation file (AppDelegate.m), provide the splunkTCPHost and splunkTCPPort values in the TCPHost message

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions {
    // Override point for customization after application launch.
    // Set the view controller as the window's root view controller and display.
    self.window.rootViewController = self.viewController;
    [self.window makeKeyAndVisible];

    [Splunk TCPHost:@"YOUR_SPLUNK_TCP_HOST" TCPPortNum:YOUR_SPLUNK_TCP_PORT];

    return YES;
}

[4] It is very easy to configure a TCP input with Splunk Enterprise. The complete tutorial is available here for your perusal: http://docs.splunk.com/Documentation/Splunk/latest/Data/Monitornetworkports
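
In short, a TCP input is just an inputs.conf stanza (or a few clicks in Manager); a minimal sketch listening on an illustrative port and tagging the data with an illustrative sourcetype would be:

# $SPLUNK_HOME/etc/system/local/inputs.conf
[tcp://9090]
sourcetype = ios_crash_log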

TIME FOR SOME ANALYTICS
Splunk Enterprise and Storm will receive the data from your iOS app when it is correctly configured, and this is a sample snapshot of what the stack trace looks like.

Please give it a try and feel free to file any issues or suggestions here: https://github.com/nicholaskeytholeong/splunk-storm-mobile-analytics/issues

We look forward to getting your feedback to further improve this mobile analytics library and seeing you at .conf2013! Register here if you have not :)

Analyzing iOS data with Splunk Enterprise


This article describes in detail the steps to:

  1. configure the iOS library
  2. install Splunk Enterprise and a Splunk app to receive data forwarded from iOS mobile devices
  3. run basic Splunk searches

CONFIGURE THE iOS LIBRARY

  1. Download the logging library from http://splunk-base.splunk.com/apps/92296/mobile-analytics-with-splunk-storm-ios or  https://github.com/nicholaskeytholeong/splunk-storm-mobile-analytics/blob/master/ios/splunkmobileanalytics.zip
  2. Unzip it and drag the splunkmobileanalytics folder into the project
  3. Select Relative to Project at Reference Type, then click Add.
  4. In the AppDelegate interface file (AppDelegate.h), import Splunk.h, like so:
  5. In the AppDelegate implementation file (AppDelegate.m), provide the SPLUNK_HOST_URL and TCP_PORT values in the message
  6. You are set! Splunk Enterprise is now integrated seamlessly into your iOS mobile app!

INSTALL SPLUNK ENTERPRISE AND SPLUNK APP

  1. Download the latest Splunk Enterprise from http://www.splunk.com/download
  2. Install Splunk Enterprise (in this article we assume a very simple Splunk deployment – your Splunk instance is both a receiver and an indexer)
  3. Download the app “Mobile Analytics with Splunk” from Splunk Apps http://apps.splunk.com/app/1578
  4. You may also install the app automatically from Splunk UI if you wish
  5. The app will be listed if it is installed correctly
  6. Go to the TCP inputs page in Splunk UI. You will notice that Splunk is listening to port 9090
  7. You may change the incoming port in Splunk Enterprise. To do this:
    vi $SPLUNK_HOME/etc/apps/mobileanalytics/default/inputs.conf
    [tcp://<ANOTHER_PORT>]
    

    ** Don’t forget to update the port number in AppDelegate.m with “ANOTHER_PORT”

  8. Restart your Splunk instance

BASIC SPLUNK SEARCHES

  1. Hypothetically this is your stacktrace of the uncaught exception in your mobile app
  2. Remember the data forwarding that we configured earlier? The search summary page will update itself with the received data from the iOS device
  3. This is a simple search to filter only iOS events: sourcetype=”ios_crash_log”
  4. This is a sample search to count the different types of uncaught exceptions that caused the app to crash (see the sketch below)
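
A sketch of what such a search could look like (the field holding the exception class depends on the field extractions in the app, so exception_type below is an assumption):

sourcetype="ios_crash_log" | stats count by exception_type | sort - count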

We hope that you find this article useful for forwarding data from iOS apps and configuring Splunk Enterprise. Feedback and suggestions are always welcome.

Introducing Weblog Add-on


Another exciting day at Splunk and another great product release! I am thrilled to announce the release of the Weblog Add-on. During .conf2011, we announced the beta release of the Splunk App for Web Intelligence. We learned quite a bit from this beta release. After over 7500 downloads of the Web Intelligence beta App, we decided to close the beta and work on a product that closely aligns to customer needs. The Weblog Add-on has a couple of key features:

1) Field Extraction: Easy to map fields from Apache or IIS weblogs. This includes both standard fields and the ability to create and map custom fields. No need to write code in configuration files to map fields.

2) Event-Type Library: Making event-types from Web Intelligence beta 1.0 available as a library to enable end users to build their custom Web Intelligence app

Let’s spend a couple of minutes on how the add-on works. Once you identify a source or sourcetype, the add-on allows users to map fields with sample data. Simply drag and drop the field name that matches the data. If there are custom fields, simply label the custom field and map it to the data. Header rows are not required to be present or accurate. Splunk’s DELIMS capabilities are used to provide higher performance in high-volume environments than that provided by regular expression-based configuration builders. Users can also remove leading or trailing characters and pick a delimiter by simply clicking on the buttons.
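
Behind the scenes this kind of delimiter-based extraction corresponds to a transforms.conf DELIMS/FIELDS configuration rather than a regular expression; a hand-written equivalent (stanza, sourcetype and field names here are illustrative) would look something like:

# transforms.conf
[weblog_fields]
DELIMS = " "
FIELDS = clientip, ident, user, timestamp, request, status, bytes

# props.conf
[my_weblog_sourcetype]
REPORT-weblog = weblog_fields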

Most websites are different and differ in their KPIs. We want to provide our customers with flexibility in creating their own dashboards. To make it easy, nearly 230 event-types are available under the manager tab. You can pick one of the available event-types or create a new one.

For more details or to download the Weblog Add-on, use this link.  For a live demo or to dive deeper, come and attend .conf2013 in Las Vegas. Did you register yet?  Time is running out….

Happy Splunking!


Realtime alerts of mobile app crashlog


In the previous articles, we discussed how to include the library to forward crash logs from iOS and Android mobile apps into Splunk Enterprise, install a Splunk app to aggregate the forwarded logs from mobile devices, and then perform some simple analytics with the indexed data. If you have been following the write-ups closely and Splunk-ing the valuable data from iOS and Android mobile apps, you might be interested to know how to set up an alerting mechanism in the event of a crash.

We are going to discuss in particular how to configure realtime alerts via email with PDF attachment using Splunk Enterprise. It takes very little time (timed myself – 2 minutes at most; pro Splunkers out there, you might take even less time 8) ) to enable a realtime alert, and the steps are as follows (a savedsearches.conf sketch follows the list):

  1. CONFIGURE EMAIL SETTINGS
    • Go to Manager » System Settings » Email System Settings
    • Provide the values for Mail Host, Username and Password
    • Don’t forget to Enable SSL

  2. CREATE REALTIME ALERT
    • A very simple search command is used in this example: sourcetype=”ios_crash_log”, with a real-time 1-minute window
    • Click Create and select Alert

  3. SCHEDULE ALERT
    • Provide a name for the alert. In this example it is Realtime iOS Crash Alert
    • Schedule the alert as Trigger in real-time whenever a result matches
    • Click the Next » button

  4. CONFIGURE ALERT TO SEND EMAIL
    • Check Send email
    • Provide the email address(es) of the recipient(s)
    • Check to attach results as PDF
    • Check to show triggered alerts in Alerts manager
    • Click the Next » button

  5. SHARE ALERT
    • Select Share as read-only to all users of the current app (if you want to share the search result with all users)
    • Click the Finish » button

  6. ALERT CONFIGURED SUCCESSFULLY
    • Congratulations! You have successfully created a real-time email alert with PDF attachment
    • Click the OK button to conclude the configuration

  7. RECEIVE EMAIL NOTIFICATION
    • Now that you have successfully configured a realtime alert with Splunk, check your email client for the alert
    • In this example, the email is sent from mobile.dev.acct@gmail.com (which was configured in Step 1)
    • You will notice that the subject of the email is Splunk Alert: Realtime iOS Crash Alert (this was configured in Step 4, where $name$ is replaced with the name of the alert)

  8. PDF ATTACHMENT
    • Voila! A PDF attachment of the realtime search result
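
For those who prefer conf files to the UI, the alert created above ends up as a saved search; a rough savedsearches.conf equivalent (the settings shown are commonly documented ones, but treat the exact stanza as an illustrative sketch and adjust the recipient address) looks like this:

[Realtime iOS Crash Alert]
search = sourcetype="ios_crash_log"
dispatch.earliest_time = rt-1m
dispatch.latest_time = rt
enableSched = 1
action.email = 1
action.email.to = recipient@example.com
action.email.sendpdf = 1
alert.track = 1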


Hurray … you can now set up realtime alerts with Splunk! This is A.W.E.S.O.M.E 8)

Also, if you have been thinking about signing up to attend .conf2013, think no more, because (copied directly from http://conf.splunk.com) …

The 4th Annual Splunk Worldwide Users’ Conference is the best way to deepen your practical knowledge of Splunk, learn best practices and check out new solutions, apps and add-ons. Connect with hundreds of your peers, see how others apply Splunk technology to real-world projects and become more involved in the Splunk community. Together, we’ll find ways for our data to show us new approaches, opportunities, and innovations. The conference features three days of breakout sessions, plus two pre-conference days for Splunk University–aka Splunk hands-on training classes.

OHI! Open Humanitarian Initiative Code Sprint


Splunk4Good is excited to sponsor the Open Humanitarian Initiative Code Sprint happening Sept 9-13 in Washington, DC, Birmingham, England and remotely via the Humanitarian Toolbox.

The Open Humanitarian Initiative (OHI) is working to revolutionize how information is shared in humanitarian response by engaging NGOs, academic institutions, private sector technology companies, donors and governments in a shared vision that advocates for open data.

The OHI Code Sprint will gather technologists and subject matter experts the week of Sept 9-13 to work on data, mapping and visualization problems in the humanitarian and disaster response space.

Interested in mapping, data structures, humanitarian assisted disaster relief, design++? Check out the OHI Eventbrite page to learn more about how you can participate in DC, UK or virtually.

Updates to the Splunk SDKs for Java, Python and JavaScript


The Splunk SDKs for Java, Python and JavaScript have been refreshed with a handful of important updates for developers working with the Splunk platform:

Modular input support in the Splunk SDKs for Java and Python, already available in the Splunk SDK for C#, makes it easier for developers to manage custom inputs without having to code directly to the REST API. Modular Inputs allow you to get data into Splunk in a programmatic, reusable manner when it isn’t accessible via TCP/UDP, isn’t in an acceptable file format, or requires pre-processing. There’s sample code in both the Splunk SDK for Java and the Splunk SDK for Python that shows how to work with modular inputs.

Go download the updated SDKs and get coding!

The Splunk SDK for Python gets modular input support


Support for modular inputs in Splunk 5.0 and later enables you to add new types of inputs to Splunk that are treated as native Splunk inputs.

Last week Jon announced updates to the Splunk SDKs for Java, Python, and JavaScript; now we’ll take a deep dive into modular input support for the Splunk SDK for Python.

The latest release of the Splunk SDK for Python brings modular input support. The Splunk SDKs for C# (see Developing Modular Inputs in C#) and Java also have this functionality, as of versions 1.0.0.0 and 1.2, respectively. The Splunk SDK for Python enables you to use Python to create new modular inputs for Splunk.

Getting started

The Splunk SDK for Python comes with two example modular input apps: random numbers and Github forks. You can get the Splunk SDK for Python on dev.splunk.com. Once you have the Splunk SDK for Python, you can build the .spl files for these examples and install them via the app manager in Splunkweb. Do this by running python setup.py dist at the root level of the SDK; the .spl files will be in the build directory.

Now I’ll walk you through the random numbers example.

Random numbers example

The random numbers example app will generate Splunk events containing a random number between the two specified values. Let’s get into the steps for creating this modular input.

Inherit from the Script class

As with all modular inputs, our script should inherit from the abstract base class Script from splunklib.modularinput.script in the Splunk SDK for Python. Subclasses must override the get_scheme and stream_events functions, and, if the scheme returned by get_scheme has Scheme.use_external_validation set to True, the validate_input function.

Below, I’ve created a MyScript class in a new file called random_numbers.py which inherits from Script, and added the imports that will be used by the functions we will override.

import random, sys
from splunklib.modularinput import *
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

class MyScript(Script):
    # TODO: fill in this class

Override get_scheme

Now that we have a class set up, let’s override the get_scheme function from the Script class. We need to create a Scheme object, add some arguments, and return the Scheme object.

    def get_scheme(self):
        scheme = Scheme("Random Numbers")

        scheme.description = "Streams events containing a random number."
        # If you set external validation to True, without overriding
        # validate_input, the script will accept anything as valid.
        # Generally you only need external validation if there are
        # relationships you must maintain among the parameters,
        # such as requiring min to be less than max in this
        # example, or you need to check that some resource is
        # reachable or valid. Otherwise, Splunk lets you
        # specify a validation string for each argument
        # and will run validation internally using that string.
        scheme.use_external_validation = True
        scheme.use_single_instance = True

        min_argument = Argument("min")
        min_argument.data_type = Argument.data_type_number
        min_argument.description = "Minimum random number to be produced by this input."
        min_argument.required_on_create = True
        # If you are not using external validation, add something like:
        #
        # setValidation("min > 0")
        scheme.add_argument(min_argument)

        max_argument = Argument("max")
        max_argument.data_type = Argument.data_type_number
        max_argument.description = "Maximum random number to be produced by this input."
        max_argument.required_on_create = True
        scheme.add_argument(max_argument)

        return scheme

Optional: Override validate_input

Since we set scheme.use_external_validation to True in our get_scheme function, we need to specify some validation for our modular input in the validate_input function.

This is one of the great features of modular inputs: you’re able to validate data before it gets into Splunk.

In this example, we are using external validation to verify that min is less than max. If validate_input does not raise an exception, the input is assumed to be valid. Otherwise, the exception is printed as an error message when telling splunkd that the configuration is invalid.

    def validate_input(self, validation_definition):
        # Get the parameters from the ValidationDefinition object,
        # then typecast the values as floats
        minimum = float(validation_definition.parameters["min"])
        maximum = float(validation_definition.parameters["max"])

        if minimum >= maximum:
            raise ValueError("min must be less than max; found min=%f, max=%f" % (minimum, maximum))

Override stream_events

The stream_events function handles all the action: Splunk calls this modular input without arguments, streams XML describing the inputs to stdin, and waits for XML on stdout describing events.

    def stream_events(self, inputs, ew):
        # Go through each input for this modular input
        for input_name, input_item in inputs.inputs.iteritems():
            # Get the values, cast them as floats
            minimum = float(input_item["min"])
            maximum = float(input_item["max"])

            # Create an Event object, and set its data fields
            event = Event()
            event.stanza = input_name
            event.data = "number=\"%s\"" % str(random.uniform(minimum, maximum))

            # Tell the EventWriter to write this event
            ew.write_event(event)

Bringing it all together

Let’s bring all the functions together for our complete MyScript class. In addition, we need to add these 2 lines at the end of random_numbers.py to actually run the modular input script:

if __name__ == "__main__":
    sys.exit(MyScript().run(sys.argv))

Here is the complete random_numbers.py:

import random, sys

from splunklib.modularinput import *

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

class MyScript(Script):
    def get_scheme(self):
        scheme = Scheme("Random Numbers")

        scheme.description = "Streams events containing a random number."
        # If you set external validation to True, without overriding
        # validate_input, the script will accept anything as valid.
        # Generally you only need external validation if there are
        # relationships you must maintain among the parameters,
        # such as requiring min to be less than max in this
        # example, or you need to check that some resource is
        # reachable or valid. Otherwise, Splunk lets you
        # specify a validation string for each argument
        # and will run validation internally using that string.
        scheme.use_external_validation = True
        scheme.use_single_instance = True

        min_argument = Argument("min")
        min_argument.data_type = Argument.data_type_number
        min_argument.description = "Minimum random number to be produced by this input."
        min_argument.required_on_create = True
        # If you are not using external validation, add something like:
        #
        # setValidation("min > 0")
        scheme.add_argument(min_argument)

        max_argument = Argument("max")
        max_argument.data_type = Argument.data_type_number
        max_argument.description = "Maximum random number to be produced by this input."
        max_argument.required_on_create = True
        scheme.add_argument(max_argument)

        return scheme

    def validate_input(self, validation_definition):
        # Get the parameters from the ValidationDefinition object,
        # then typecast the values as floats
        minimum = float(validation_definition.parameters["min"])
        maximum = float(validation_definition.parameters["max"])

        if minimum >= maximum:
            raise ValueError("min must be less than max; found min=%f, max=%f" % (minimum, maximum))

    def stream_events(self, inputs, ew):
        # Go through each input for this modular input
        for input_name, input_item in inputs.inputs.iteritems():
            # Get the values, cast them as floats
            minimum = float(input_item["min"])
            maximum = float(input_item["max"])

            # Create an Event object, and set its data fields
            event = Event()
            event.stanza = input_name
            event.data = "number=\"%s\"" % str(random.uniform(minimum, maximum))

            # Tell the EventWriter to write this event
            ew.write_event(event)

if __name__ == "__main__":
    sys.exit(MyScript().run(sys.argv))

Optional: set up logging

It’s best practice for your modular input script to log diagnostic data to splunkd.log. Use an EventWriter‘s log method to write log messages, which include both a standard splunkd.log level (such as DEBUG or ERROR) and a descriptive message.
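
For example, the stream_events method above could be extended with a couple of log calls; EventWriter.INFO and EventWriter.ERROR are severity constants provided by the class (which the wildcard import at the top of random_numbers.py already pulls in):

    def stream_events(self, inputs, ew):
        # log a diagnostic message to splunkd.log before doing any work
        ew.log(EventWriter.INFO, "random_numbers: starting to stream events")

        for input_name, input_item in inputs.inputs.iteritems():
            try:
                minimum = float(input_item["min"])
                maximum = float(input_item["max"])
            except ValueError:
                ew.log(EventWriter.ERROR,
                       "random_numbers: min/max for %s are not numbers" % input_name)
                continue

            event = Event()
            event.stanza = input_name
            event.data = "number=\"%s\"" % str(random.uniform(minimum, maximum))
            ew.write_event(event)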

Add the modular input to Splunk

We’ve got our script ready; now let’s prepare to add this modular input to Splunk.

Package the script and the SDK library

To add a modular input that you’ve created in Python to Splunk, you’ll need to first add the script as a Splunk app.

1. Create a directory that corresponds to the name of your modular input script—for instance, random_numbers—in a location such as your Documents directory. (You’ll copy the directory over to your Splunk directory at the end of this process.)
2. In the directory you just created, create the following three empty directories:
  • bin
  • default
  • README
3. From the root level of the Splunk SDK for Python, copy the splunklib directory into the bin directory you just created.
4. Copy the modular input Python script (for instance, random_numbers.py) into the bin directory. Your app directory structure should now look like the following:
.../
  bin/
    app_name.py
    splunklib/
      __init__.py
      ...
  default/
  README/

Create an app.conf file

Within the default directory, create a file called app.conf. This file is used to maintain the state of an app or customize certain aspects of it in Splunk. The contents of the app.conf file can be very simple:

[install]
is_configured = 0

[ui]
is_visible = 1
label = My App

[launcher]
author = Splunk Inc
description = My app is awesome.
version = 1.0

For more examples of what to put in the app.conf file, see the corresponding files in the modular inputs examples.

Create an inputs.conf.spec file

You need to define the configuration for your modular input by creating an inputs.conf.spec file manually. See Create a modular input spec file in the main Splunk documentation for instructions, or take a look at the SDK samples’ inputs.conf.spec file, which is in the application’s README directory. For instance, the following is the contents of the random numbers example’s inputs.conf.spec file:

[random_numbers://<name>]
*Generates events containing a random floating point number.

min = <value>
max = <value>

Move the modular input script into your Splunk install

Your directory structure should look something like this:

.../
  bin/
    app_name.py
    splunklib/
      __init__.py
      ...
  default/
    app.conf
  README/
    inputs.conf.spec

The final step to install the modular input is to copy the app directory to the following path: $SPLUNK_HOME/etc/apps/

Restart Splunk, and on the App menu, click Manage apps. If you wrote your modular input script correctly, the name of the modular input—for instance, Random Numbers—will appear here. If not, go back and double-check your script. You can do this by running python random_numbers.py --scheme and python random_numbers.py --validate-arguments from the bin directory of your modular input. These commands will verify that your scheme and arguments are configured correctly; they will also catch any indenting issues which could cause errors.

If your modular input appears in the list of apps, in Splunk Manager (or, in Splunk 6.0 or later, the Settings menu), under Data, click Data inputs. Your modular input will also be listed here. Click Add new, fill in any settings your modular input requires, and click Save.

Congratulations, you’ve now configured an instance of your modular input as a Splunk input!

Splunking Foursquare


I tend to travel quite a bit in my role at Splunk. The other day I was wondering to myself how far I had traveled in the last week, the last month, the last year. It just so happens that I am a Foursquare user, not because I like to hoard mayorships across the globe, but rather because I tend to use Foursquare checkins to help me remember where I have been. Now you get where I am going with this, because “where have I been” actually means “a lot of cool location metadata” that I can have fun with.

I was looking around online for a simple tool that could hook into Foursquare to tell me how far I have traveled and where I have been, and visually geo-plot this for me. Nothing that I tried really appealed. Fortunately, I have all the tools at my disposal to very simply do what I want myself.

Getting at the Foursquare checkin data

Foursquare has a comprehensive REST API that makes it easy to get at your data.

In particular, your checkins.

In order to poll your checkin events, you first need to:

1) register a Foursquare App

This will generate your CLIENT ID and CLIENT SECRET, which you’ll need in the next step.

2) acquire your OAUTH2 token

You will need the returned token for when you set up the REST input in Splunk.

    Setting up the REST input in Splunk

    To poll the Foursquare REST API from Splunk, I am using the REST API Modular Input, freely available on Splunkbase.

    This is a completely generic modular input for polling any REST API, so we can use it with Foursquare.

    Setting up the input stanza is very simple. You just need to provide the Endpoint URI and OAUTH2 token.

    I have also specified a custom response handler. The reason is that Foursquare returns a single JSON document with all the checkin events aggregated. I want to parse this document and split each checkin out into an individually indexed event in Splunk.

    The REST API Modular Input provides an extension mechanism for adding any custom request/response functionality you may require over and above what is provided out of the box. The custom response handler I added for Foursquare checkin responses is very simple.
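
    To give a flavor of what that handler does, here is a sketch of just the splitting logic. This is not the app's actual handler class, and the exact handler interface is defined by the REST API Modular Input itself, so treat the function shape as illustrative:

    import json

    def split_checkins(raw_response_output):
        """Turn Foursquare's single aggregated JSON response into one JSON
        document per checkin, so each can be indexed as its own event."""
        data = json.loads(raw_response_output)
        items = data.get("response", {}).get("checkins", {}).get("items", [])
        return [json.dumps(item) for item in items]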

    Searching over the checkin data

    Now, if everything has gone to plan, you will have all your Foursquare checkin events indexed in Splunk in JSON format.

    If we drill down on the venue.location field, we can see the geolocation data, which is what I am interested in.

    Calculate the distance between checkin events

    With this latitude and longitude data, I can calculate the distance between two geographic locations. But how do I perform the underlying trigonometry, which has to account for the earth's curvature (a spherical calculation), to derive an accurate distance metric? Well, there is already a well-known formula for this: the Haversine formula.
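
    For reference, the Haversine calculation itself is compact. Here is a minimal Python sketch, using the Earth's mean radius in miles:

    from math import radians, sin, cos, asin, sqrt

    def haversine_miles(lat1, lon1, lat2, lon2):
        # Great-circle distance between two lat/long points, in miles.
        earth_radius_miles = 3959.0
        dlat = radians(lat2 - lat1)
        dlon = radians(lon2 - lon1)
        a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
        return 2 * earth_radius_miles * asin(sqrt(a))

    # Roughly 2,570 miles between San Francisco and New York City.
    print(haversine_miles(37.77, -122.42, 40.71, -74.01))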

    Wouldn't it be nice if there were a Splunk search command that used the Haversine formula to take two points on the globe in latitude/longitude format and give me the distance as an output field?

    Bazinga! The Splunk community has spoken, and there is a freely downloadable add-on on Splunkbase that does just this.

    Let’s get visual

    In the search below I extract the latitude and longitude from the current event and the previous event, and apply the haversine search command to output the distance between the two. You'll notice that I am also deduping, just as a safeguard in case my polling from Foursquare has overlapped and pulled in some duplicate checkin events.

    index=main sourcetype="4sq_checkins" |dedup id| sort - createdAt | rename venue.location.lat as "lat" | rename venue.location.lng as "long" | streamstats current=f global=f window=1 first(lat) as next_lat first(long) as next_long first(venue.name) as pointB_venue by sourcetype | strcat lat "," long pointA | haversine originField=pointA units=mi inputFieldLat=next_lat inputFieldLon=next_long outputField=distance_miles | strcat next_lat "," next_long  pointB | rename venue.name as "pointA_venue" | table pointA pointA_venue pointB pointB_venue  distance_miles | where distance_miles > 0

    Now I can easily calculate my total distance traveled over a given period of time by piping into a stats command to sum up the distances.

    ....| stats sum(eval(ceil(distance_miles))) as "Total Distance Travelled (Miles)"

    And I can also output a "_geo" field that allows me to plot my checkins on a map using the popular Google Maps App on Splunkbase.

    index=main sourcetype="4sq_checkins" |dedup id| sort - createdAt |strcat venue.location.lat "," venue.location.lng _geo

    But don’t stop there

    There is a world of data available out there that the Splunk REST API Modular Input can help you tap into.

    Check out this recent blog.

    Go and tap into this data, search over it, correlate it, and turn it into powerful analytics and visualizations.

    As we say, "Your Data, No Limits".

    Splunk at the Connected Car Expo

    Splunk has teamed up with the LA Auto Show’s Connected Car Expo to sponsor the first ever FASTPITCH competition. The FASTPITCH takes place on the first day of the Expo – November 19, 2013 – and targets entrepreneurs and independent ventures in the seed, start-up or early stage of development within the automotive or technology sector, offering them the opportunity to pitch their original business concept to a judging panel of VCs, incubators and CEOs.

    Splunk welcomes entrepreneurs and developers to enter the FASTPITCH for a chance to win a 12-month Splunk Start Program Enterprise License as well as great exposure to the automotive and technology industries.

    Excited to say I will be on the FASTPITCH judging panel and look forward to the innovative submissions!

    About the competition

    FASTPITCH targets visionaries with new ideas: independent ventures in the seed, start-up or early stage of development within the automotive or technology sector (with <$500K investment), who will have the opportunity to pitch their original business concept to a judging panel of VCs, incubators and CEOs at this year's CCE.

    Entrant individuals/groups should attempt to solve an issue related to the connected car and/or automotive industry; original ideas can address topics including automotive data, mapping, emissions, environmental issues, in-car apps, in-car entertainment, parking and telematics.

    How to submit your idea

    To enter the Splunk FASTPITCH Competition, please complete and submit the following items to info@connectedcarexpo.com

    • Completed Application Form (PDF)  DOWNLOAD APPLICATION HERE
    • Presentation Deck of Company (10 slide max; submit as PDF, PowerPoint, Keynote, Prezi or Sliderocket)

    Deadlines

    EARLY BIRD ENTRY DEADLINE: 6pm (PST) on Tuesday, September 17, 2013

    FINAL ENTRY DEADLINE: 6pm (PST) on Monday, September 30, 2013

    See CCE for full Rules & Conditions

    Prizes

    FIRST PLACE:
    Splunk Start Program Enterprise License: 20GB for 12-month term, including Splunk education, marketing and support teams ($40,000-Value)

    SECOND PLACE:
    CCE 2014 Exhibit Space ($30,000-Value)

    THIRD PLACE:
    Top Tech Showcase Space at CCE 2014 ($9,000-Value)

    Fastpitch Timeline

    Aug 13 – Call for Entry
    Sep 17 – Early Bird Entry Deadline
    Sep 30 – Final Entry Deadline
    Oct 16 – Eight Finalists Announced
    Nov 19 – CCE FASTPITCH Competition

    Find more info on the CCE website and keep up with the latest on Twitter: @connectedcarLA.

    See you in LA!

    Blurred lines – digital intelligence, application management, big data and customer experience

    Firstly – no references to Robin Thicke – there's only trouble going down that route (though I do wonder how much big data has been generated by that video – maybe I'll try the Splunk Twitter App to find out…). On to less trivial things: I've been spending a lot of time with customers over the last couple of weeks talking about the blurring lines between four things – application management, digital intelligence, big data and customer experience – and the impact they have on each other.

    Clean lines

    Customer experience management is a hot topic right now, and there's a correlation between it and digital intelligence. The better the information and intelligence you have on your customer, the better the experience you can offer them. By making all the customer interaction data from all the channels (mobile, web, social etc.) accessible, you can optimize the customer's experience, increase conversion rate and maximize the chance of a positive outcome. To give you a customer example, here's an excerpt from the Splunk Tesco case study:

    "Tesco developers and business/web analysts and operations teams needed a better understanding of what products and website features customers were engaging and what pathways resulted in the highest lead conversions."


    So the lines are pretty clean here: if you analyze the digital intelligence of a customer's interactions, you can improve the customer experience.

    Blurring the lines (a bit) with application management

    Here's where the lines start to get blurred – if I add improved application management to the mix, what impact does that have on the customer's experience? I was talking to another leading EMEA retailer last week. They explained how managing all the tiers of their application infrastructure with Splunk helped them reduce errors in the handoff between their ecommerce site and their payment provider's. Their story was all about application management augmenting digital intelligence, with the end result of a much better customer experience and hence improved conversion rates – a blurring of three things, but with a positive outcome.


    Really blurred lines – mixing in big data

    To blur the lines further – how does big data fit into all of this? With the examples above – we’re talking about real-time data that is machine generated. It is high velocity, variable and has veracity (it is uncertain what you are going to get) but the volume isn’t always high. It might be – it really depends on your definition of volume. Is 100GB of real-time data a day from machines big data or not?

    Where the lines truly get blurred is when you start to add big volumes of historical data. If you’ve been following the growth of Hadoop and the announcement of the beta of Hunk (Splunk analytics for Hadoop) you’ll get an idea of how the lines between big data, application management and digital intelligence are blurring and what the outcome is for customer experience. As an example – a very large EMEA organization on the Hunk beta is looking at how they can get the value from all of their data in Hadoop to drive better decisions across multiple business units with a focus on multi-channel customer experience.

    Putting it all together

    If I can take what all of my customers are doing right now in real time with all of my customer interactions from the last year and add it to my real-time and historical application data – what impact does that have on my customer experience?


    With a bit of imagination – there are some exciting possibilities (and the potential for spectacular failures). However, there are some quite fundamental questions we need answered so we can make sure that the advertising scene from Minority Report becomes reality…

    So I’ll leave you some of those fundamental questions about the blurred lines between digital intelligence, application management, customer experience and big data:

    • How are you going to manage and get access to that data?
    • How are you going to let marketing use it?
    • How are IT and ops going to manage it?
    • How are you going to build those next generation “killer apps” on top of your data?

    If you’re coming to .conf2013 – see you there – you’ll see how customers are answering some of those questions.

    If you’re not coming – feel free to ask any questions in the comments below…

    As always, thanks for reading…


    What’s in Store for Developers at .conf2013

    We’re a week away from .conf and attending developers have a lot to be excited about. Last year, we held our first ever Splunk Hackathon that produced some pretty amazing winners.

    This year we’re doing it again, Monday evening from 5:30-10:30pm. We’ll have special guests, Splunkers ready to help and plenty of food and beverages to power you through the hacking. Sign-up now and get ready for another great time.

    Before we get hacking, Splunk University will hold two days of classes on Sunday and Monday, including classes on Building Splunk Apps and Developing with the Splunk SDKs for Java and Python. These classes will be a great way to build skills and warm up for the hackathon Monday night.

    After Tuesday morning’s keynote, we’ll have an entire track for Developing on Splunk, with hands-on sessions on the Splunk SDKs, modular inputs, the new Framework for building Splunk Apps, semantic logging, dynamic lookups and more! Splunkers, partners and customers alike will be presenting, showing and teaching everything you need to know to develop with, and on, Splunk.

    Splunking jQuery Conference: drive user experience online and on site!

    jQuery Portland 2013 Conference

    Last June, the jQuery Foundation held their conference in beautiful Portland, Oregon. As a Diamond Sponsor, we wanted to build something that would be beneficial to the jQuery community as part of our Splunk4Good initiatives. What's better than Splunking the entire conference?

    To see the end result, check out this interactive infographic showcasing Splunk-powered web analytics applied to the conference website. The complete Splunk dashboard can be found here.

    The goal is to capture client-side data (e.g. pageviews, link/button clicks, hovers), and build powerful analytics & visualizations in order to tackle the following business questions:

    1. Which topics are visitors most interested in?
    2. What are the top traffic sources for visitors who purchase tickets?
    3. How are visitors interacting with the site, including time leading to ticket purchase?

    To help answer these questions, we collected a sizeable clickstream during the week of the conference:

    • 8,500+ unique visitors with 15,000+ pageviews
    • 240,000+ client-side events including 60,000+ clicks

    1. Which topics are visitors most interested in?

    Every time a visitor expanded a talk to read its description, it was recorded as a click. The underlying assumption is that the talks with the most clicks are the talks that are most intriguing to conference attendees, and therefore tend to be the most popular.

    Popular Talks & Speakers

    Actionable insights:

    • Organizers can ensure future conference programs include highly targeted content by focusing on the most popular tracks, as listed above.
    • Organizers can more optimally plan room capacity based on a talk’s anticipated popularity. For example, “jQuery UI Widgets vs HTML5” by TJ was a front-runner early on before the conference started; with that predictive knowledge we could prevent major room overflows.

    2. What are the top traffic sources for visitors who purchase tickets?

    While the jQuery conference site is primarily for content consumption, it does act as a store front (yes, they sell tickets!). So we tracked the different steps (initial pageview, click on the ‘buy’ button, and order confirmation) that led to ticket sales, and broke down the funnel by traffic source.

    Purchase Funnel Visualization

    Actionable insights:

    • Surprisingly, a local site, calagator.org, had the highest conversion ratio at 3.85%, or 8x that of jQuery's own blog. It turns out that site caters to the niche technology community of Portland. Conference marketers should therefore consider featuring events on local community portals to capitalize on such high-conversion sources.
    • Conversions from blog.jquery.com are about 2x those of the rest of the jQuery properties such as jquery.com, api.jquery.com and events.jquery.org. Thus conference marketers may want to lean more heavily on the jQuery blog and increase cross-promotion there to market their event, more so than on other jQuery subdomains.

    3. How are visitors interacting with the site?

    Finally, as with most sites, a lot of insight can be drawn from a view of each visitor's timeline of actions: this is similar to Google Analytics Visitors Flow, except at the more granular event level rather than the page level. Each visitor has their own ‘swimlane’ composed of a series of shapes: rectangles depict time spent on a specific section, and circles depict specific actions such as clicks. We like to think of this as the Visitors Behavior Flow:

    Real-Time Visitor Behavior Flow

    Actionable insights:

    • Using the Visitors Behavior Flow, you can monitor user behavior on the site in real time to detect potential user experience problems early on, or even provide targeted, timely support or promotions.
    • While this sample data is small, you can visually observe that people who actually purchased tickets spent about the same amount of time on both the training and program sections, yet the training section is at the bottom of the page. This could mean training classes are at least as important as the session talks to this audience. Organizers may want to focus on the training section just as much as the program itself to further drive attendance and ticket sales.

    Now it’s your turn to create your own powerful data visualizations on Splunk!

    While Splunk offers easy-to-create dashboard visualizations, this infographic was built exclusively using open source JavaScript libraries such as jQuery, Backbone.js, Require.js, and D3.js – tools web developers are already familiar with – in order to show how one can fully customize the way Splunk search results are rendered.
    Email me (Roy Arsan) or Shirley Wu and share with us your own innovative ways of visualizing your Splunk data to better understand your users.

    Stay tuned for technical details on how to track & collect data from your own sites, or join us at our Digital Intelligence booth at Splunk .conf 2013 next week in Las Vegas:

    Your Data, No Limits!

    Hunk Setup using Hortonworks Hadoop Sandbox

    Hortonworks Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop examples. Recently, Hortonworks and Splunk released a tutorial and video showing how to install Hunk and connect it to the Hortonworks Hadoop Sandbox.

    This blog summarizes the configurations used as part of the Hunk setup.

    Configurations for Hadoop Provider:

    • Java Home: /usr/jdk/jdk1.6.0_31
    • Hadoop Home: /usr/lib/hadoop
    • Hadoop Version: Hadoop version 1.x (MR1)
    • Job Tracker: sandbox:50300
    • File System: hdfs://sandbox:8021
    • Splunk search recordreader: com.splunk.mr.input.SimpleCSVRecordReader, com.splunk.mr.input.ValueAvroRecordReader

    Configurations for Hadoop Virtual Indexes:

    • Name: hadoop_sports
    • Path to data in HDFS: /user/hue/raanan/…
    • Whitelist: \.csv$
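
    If you prefer editing indexes.conf over using the setup UI, the virtual index above maps onto a stanza roughly like the following. The provider name is a placeholder for whatever you called your Hadoop provider, and the path is kept as the tutorial shows it:

    [hadoop_sports]
    vix.provider = <your Hadoop provider name>
    vix.input.1.path = /user/hue/raanan/...
    vix.input.1.accept = \.csv$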

    For more Hunk details and examples go to the blog:

    http://blogs.splunk.com/2013/08/03/hunk-intro-part-3/

    What’s New for Developers in Splunk 6

    With Splunk Enterprise 6, we've delivered capabilities to bring operational intelligence to everyone across the organization. Key to driving operational intelligence across the enterprise with Splunk are, of course, developers. Developers instrument the logs, integrate the data and build the apps to make it happen. In Splunk 6 there are two great new features that make it easier for developers to quickly and efficiently build powerful Splunk apps: the Splunk Web Framework and Data Models.

    The Splunk Web Framework

    The Splunk Web Framework, which was first made available in preview in February, enables developers to use the tools and languages they know to build Splunk apps with custom dashboards, flexible UI and custom data visualizations. Building a Splunk app now looks and feels like building any modern web application, making Splunk development accessible to millions of professional developers around the world. With the Web Framework, developers can easily integrate third-party data visualizations and UI components working with HTML5 and JavaScript.

    One of the benefits of the Web Framework is its flexibility – developers can choose to build their Splunk app using Simple XML, JavaScript or Django (or any combination thereof). Simple XML is ideal for fast, lightweight app customization and building and requires minimal coding knowledge, making it well-suited for Splunk power users in IT to get fast visualization and analytics from their machine data. You can also edit and convert a SimpleXML dashboard to HTML with one click to do more powerful customization and integration with JavaScript.

    Developers looking for more advanced functionality and capabilities can build Splunk apps from the ground up using popular, standards-based web technologies: JavaScript and Django. The Web Framework lets developers quickly create Splunk apps by using prebuilt components, styles, templates, and reusable samples as well as supporting the development of custom logic, interactions, components, and UI.

    Here’s an example of how to use a Django template tag to create a Chart view using the Web Framework:

    
    {% chart id="mychart" managerid="mysearch" type="line" %}
    

    Now here's an example of how to use JavaScript to create a Chart view using the Web Framework:

    
        var deps = [
            "splunkjs/mvc",
            "splunkjs/mvc/chartview"
        ];
        require(deps, function(mvc) {
            var Chart = require("splunkjs/mvc/chartview");
            new Chart({
                id: "mychart",
                managerid: "mysearch",
                "type": "line",
                el: $("#mychart")
            }).render();
        });
    

    Data Models

    Data Models define meaningful relationships in underlying machine data, making the data in Splunk more useful to a broader base of users. Unlike data models in the traditional structured world, Splunk Data Models focus on machine data and data mashups between machine data and structured data. Splunk software is founded on the ability to flexibly search and analyze highly diverse machine data employing late-binding or search-time techniques for schema creation (“schema-on-the-fly”), and Data Models are no exception – they define relationships in the underlying data, while leaving the raw machine data intact, and map these relationships at search time.

    Data Models power the new Pivot interface by defining an abstract model of the underlying machine data and the meaningful relationships in that data, so business analysts can more quickly and easily derive insights from their machine data. Data Models also allow developers to abstract away the search language syntax, making Splunk queries more manageable and portable. With Data Models, developers no longer have to embed long, often cryptic query strings in their applications. And since Data Models have inheritance, the relationships between Data Models can be programmatically managed. And of course you can use Data Models when building apps with the Web Framework. Working with Data Models allows developers to focus on coding rather than the search language.

    Thanks to Data Models and Pivot, this search:

    
    ( sourcetype="access_*" OR sourcetype="iis*" ) ( uri="*" ) uri=* uri_path=* status=* clientip=* referer=* useragent=* ( status=2* ) ( uri_path!=*.php OR uri_path!=*.html OR uri_path!=*.shtml OR uri_path!=*.rhtml OR uri_path!=*.asp ) ( uri_path=*.avi OR uri_path=*.swf ) ( uri_path=*.itpc OR uri_path=*.xml ) | litsearch ( sourcetype=access_* OR sourcetype=iis* ) ( uri="*" ) uri=* uri_path=* status=* clientip=* referer=* useragent=* ( status=2* ) ( uri_path!=*.php OR uri_path!=*.html OR uri_path!=*.shtml OR uri_path!=*.rhtml OR uri_path!=*.asp ) ( uri_path=*.avi OR uri_path=*.swf ) ( uri_path=*.itpc OR uri_path=*.xml ) | eval newX = " " | eval "useragent ::: status"='useragent'+" ::: "+'status' | addinfo type=count label=prereport_events | fields keepcolorder=t "newX" "prestats_reserved_*" "psrsvd_*" "useragent ::: status" | fillnull value=NULL "useragent ::: status" | prestats count by newX "useragent ::: status"

    Can be managed as:

    | pivot WebIntelligence PodcastDownload count(PodcastDownload) AS "Count of PodcastDownload" SPLITCOL useragent SPLITCOL status FILTER uri isNotNull NUMCOLS 100

    Get Started

    The Splunk App for Unix 5.0 is finally here!

    | history | search app="*nix"

    Those of you who have been Splunk users for more than 4 years remember the glorious launch of the original Splunk App for Unix.  Back in those days, the app shipped with the core product alongside the Splunk App for Windows and had some pretty cutting-edge features, including knowledge objects, dashboards, and saved searches with out-of-the-box email alerts (we're still sorry, Paul S.).

    Well, it took a while for us to follow up that triumphant release, but wait no longer: the new app is finally here!  And oh, what’s better, the app is FREE!!!  Read on for the technical details of the app.

    Visualizations

    One of the primary goals of the Splunk App for Unix 5.0 is to introduce new visualizations.  As much as we all love pie charts and line charts, we all hate pie charts and line charts.  In the home view, you can see two radial graphs, which are designed to:

    • Allow users to set color-based thresholds
    • Maximize screen real estate for outliers
    • Allow a quick, at-a-glance view into several metrics

    One of the coolest little things about the radial gauges is the “tracers”: the dotted line that appears when a particular reading decreases in value.  We felt it was pretty obvious when a value was suddenly elevated, but less so when a value suddenly went down.  The tracers give the busy analyst, whose time is usually split among several screens and indicators, a better chance of noticing changes on those huge monitors on the NOC/SOC wall.

    Speaking of those ubiquitous monitors, clicking on the “expand” link on the home dashboard takes you to a special full screen version of the home dashboard that is designed for their resolution and contrast.

    Workflow

    One of the things that I found frustrating about the last version of the Splunk App for Unix was that if I wanted to view two different metrics (say, CPU and Memory utilization), I had to load up two or three different dashboards in browser tabs and switch between them.  Another thing that wasn’t great for folks with large environments was that you could seldom see more than the top 10 hosts in any given metric.

    The new metrics view offers several improvements over the old paradigm:

    • Choose which hosts you want to focus on
    • Build simple time series reports
    • Use shape and color to find trends and outliers
    • Visualize two sets of metrics side-by-side

    If you want to get more information on a given host or hosts, the hosts view allows you to do just that.  Users of the Splunk App for Hadoop Ops or S.O.S might recognize this view, but we've embellished it a bit for this release:

    • Ability to switch between nodes and table views
    • Ability to filter by different host categories and groups (more on that below)
    • Ability to pin hosts and subsequently compare their snapshots

    Actionable Indicators

    One of the most important things for our users was more and better actionable indicators.  Put another way, visualizations and workflow are useful for solving problems, but what tells you that there is a problem in the first place?

    The headlines feature on the home page allows you to link a headline, or short message, to a scheduled saved search that has been configured as an alert.  That way, you can get a truncated, context-specific indicator that helps you know when it is time to investigate.  Moreover, you may not need to see every alert that is fired on your system; in that case, just set up headlines for the alerts that are interesting enough to demand additional action.

    OK, so you've seen the indicator fire – what's so actionable about that?  Click on the indicator to be redirected to the alerts view, where you can see what was happening on the affected systems in the five minutes before the alert fired.  Of course, you can also view the results from the fired alert in the handy, familiar search view.

    Asset Categorization

    When we talked to our customers, one of the most consistent pieces of feedback we got was that asset management was a particularly hard problem for them in Splunk.  Specifically, they emphasized that they view their hosts through many different lenses.  For example, one team might be interested in pivoting and filtering on hosts based on their data center location, while others might be interested in tier (dev, test, QA, prod) and still others in business unit (Finance, Accounting, HR, Sales).

    To help accomplish this, we introduced the concept of categories and groups.  Categories represent the view of your hosts that you want to take (datacenter, business unit, etc.) and groups allow you to compartmentalize your hosts into discrete buckets within the given category.  Hosts can only be a member of one group per category, but can be a member of many categories.  For example, host01 can be in the “east” group of the “datacenter” category and the “prod” group of the “tier” category, but can't be in the “east” and “west” groups of the datacenter category.  That doesn't even make sense!

    What’s next

    You tell us!  Most of the features in the app were dictated by you, the customer.  That’s just how we roll.  Thus, it is up to you to tell us where to go next.  Drop us a line on Splunk Answers to share your ideas.

    One last note: thank you to Cary, Ian, Roy, Liu-Yuan, Barry, James, Malcolm, Jack, Stela, and the rest of the team that collaborated on this app with me.  It was fun, challenging, and most of all rewarding to work with all of you.  Here’s to next time!
