Channel: Dev – Splunk Blogs

Still using 3rd party web analytics providers? Build your own using Splunk!


Why Build Your Own (BYO) Client-Side Analytics?

There are many 3rd party web analytics providers such as Google Analytics and Omniture SiteCatalyst. However, with the flexibility of Splunk as a general-purpose analytics tool, many site owners opt to build their own client-side analytics powered by Splunk. Last month we talked about how the jQuery Foundation's conference website leveraged Splunk to collect and analyze all client-side events.

Compared to off-the-shelf web analytics tools, building your own client-side analytics gives you significant advantages:

  • Avoid giving away your users’ data to 3rd party providers
  • Own the complete raw client-side data (as opposed to an aggregation or a sampling), and access it securely – and for free
  • Unlimited tracking and customization: none of the collection limits or caps on custom dimensions/variables imposed by leading web analytics providers
  • Correlate client-side data with your existing server-side logs or offline metadata

To learn more about the difference between server-side and client-side data, check out the first part of this previous blog post.

Let’s show you how you can easily instrument your own sites:

Client-Side Analytics using Splunk

1) Tracking

Going through the 3 blue stages from right to left in the above diagram, the first step, tracking, is achieved by pasting a JavaScript snippet into your page to load a small analytics library. To help you with that, we're providing an easy-to-use analytics library, sp.js, that gives you:

  • Page-level tracking such as unique visitors and pageviews data out of the box
  • Event-level tracking such as user interactions with an easy-to-use API

Simply add this script tag before the closing </head> tag on your page. This will asynchronously fetch the JavaScript library sp.js from a global CDN without impacting the page load time:

<script type="text/javascript">
var sp=sp||[];(function(){var e=["init","identify","track","trackLink","pageview"],t=function(e){return function(){sp.push([e].concat(Array.prototype.slice.call(arguments,0)))}};for(var n=0;n<e.length;n++)sp[e[n]]=t(e[n])})(),sp.load=function(e,o){sp._endpoint=e;if(o){sp.init(o)};var t=document.createElement("script");t.type="text/javascript",t.async=!0,t.src=("https:"===document.location.protocol?"https://":"http://")+"d21ey8j28ejz92.cloudfront.net/analytics/v1/sp.min.js";var n=document.getElementsByTagName("script")[0];n.parentNode.insertBefore(t,n)};
sp.load("https://www.example.com"); // Replace with your own collector URL
</script>

In the last line of the above script, make sure to replace https://www.example.com with the address of your data collector, discussed in the following section.
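
Once the library is loaded, tracking calls go through the stub methods defined in the snippet (init, identify, track, trackLink, pageview). As a rough illustration only (the event name and property keys below are made up, not part of sp.js itself), recording a pageview and a custom event might look like this:

// Illustrative calls only; event name and property keys are placeholders.
sp.pageview();                            // record a pageview for the current URL
sp.track('Click Program Description', {   // record a custom user interaction
  speaker: 'Jane Doe',
  title: 'Scaling Client-Side Analytics',
  expand: true
});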

2) Collection

To use sp.js, you must specify an endpoint to which tracking calls are made. Behind that endpoint, a single collection server (or a distributed collection tier) responds to these calls and collects the tracked events into a log file, say events.log.

Again, to help you with this BYO project, we're providing sample code on GitHub for a Node.js-based backend collector server, with instructions on how to run it.
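
Conceptually, all such a collector has to do is accept the tracking requests and append them to events.log. The following is a minimal sketch of that idea, not the official sample; the port, the JSON line format, and the permissive CORS header are assumptions:

// Minimal collector sketch: log every incoming tracking call as one JSON line.
var http = require('http');
var fs = require('fs');
var url = require('url');

http.createServer(function (req, res) {
  var parsed = url.parse(req.url, true);
  // Record the request path and query parameters along with a server-side timestamp.
  var entry = JSON.stringify({ t: Date.now() / 1000, path: parsed.pathname, data: parsed.query });
  fs.appendFile('events.log', entry + '\n', function () {});
  // Allow cross-origin calls from the pages being tracked.
  res.writeHead(200, { 'Access-Control-Allow-Origin': '*' });
  res.end();
}).listen(8080);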

Once deployed, copy the collector server address and use it in the last line of the script tag as mentioned above.

3) Analytics & Visualization

Finally, the file events.log can be ingested into Splunk either by using a Splunk forwarder to send data to your existing Splunk deployment, or by running a local Splunk instance that continuously monitors that file.

Once the data is in Splunk, the sky is the limit: set up Splunk monitoring & alerts, analyze with Splunk dashboards, or build custom visualizations for traffic segmentation, A/B testing, funnel analysis, etc.

Client-side tracking in action:

Consider the following website showing a program schedule that consists of sessions. In this particular case, a call was made by sp.js to track a user's mouse click that expands a session description. Note that, as with many client-side interactions, this mouse click cannot be tracked from web server logs because it doesn't trigger a web server request.

Client-side tracking example

Notice the tracked data consists of the following fields (an illustrative payload follows the list):

  • Event e - custom user event such as ‘Click Program Description’
  • Properties kv - a set of key-value pairs representing properties associated with the event, such as the speaker name clicked, the title of the talk, and whether expand is true (as opposed to false for collapse). Properties also contain an automatically generated id field, a universally unique identifier for the visitor.
  • Timestamp t - an automatically generated field specifying the exact client-side timestamp
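
Put together, one tracked event as it reaches the collector might look roughly like the object below. The structure (e, kv with an id, and t) comes from the description above; the specific values are made up:

// Illustrative payload only; the values are placeholders.
var exampleEvent = {
  e: "Click Program Description",              // custom event name
  kv: {
    speaker: "Jane Doe",
    title: "Scaling Client-Side Analytics",
    expand: true,
    id: "3f2b1c9e-7d4a-4c6f-9a1e-8b2d5f0c4a77" // auto-generated unique visitor id
  },
  t: 1384300800.125                            // client-side timestamp (epoch seconds)
};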

Finally, the following snapshot shows how this tracked event is monitored in real-time in Splunk as it gets collected and logged:

Real-time ingestion of client-side event into Splunk

Code References – Available for you to use!

Publicly available sp.js JavaScript Library for Tracking:

https://github.com/splunk/splunk-demo-collector-for-analyticsjs#setup

Sample Node.js based Backend Server for Collection:

https://github.com/splunk/splunk-demo-collector-for-analyticsjs


Meet Your Splunk App Dev Contest Winners: Splunk for Your Car


When we announced the inaugural Splunk App Dev Contest back in June, we were looking forward to seeing great work done by Splunk developers from all over the world. We received submissions from China, South Africa, India and the United States, with use cases spanning digital marketing, network monitoring, ERP management and the Internet of Things.

The 1st place team (and winners of the Best App for Social Good), Rich Acosta and Erica Feldman, built a Splunk for Your Car app using an Android phone, an OBD2 Bluetooth adapter, Dropbox, Google Maps and, of course, Splunk. Splunk for Your Car provides individuals with a number of key driving metrics like Average Speed, Maximum Speed, Time Spent Driving, Distance Driven and even a "Green Score", calculated from several variables and patterns within each logged trip, that represents driving efficiency. Rich and Erica extract data from an application called Torque Pro, which captures real-time data from a car's OBD2 port, using a custom Android application they wrote that formats the data into Splunk-friendly logs placed in a folder on Dropbox. The data is visualized in a variety of charts and graphs and enhanced with mapping done via Google Maps. Unlike other trip logging solutions, Splunk for Your Car provides data directly back to the consumer without passing through a third party like an insurance company or car dealership, providing privacy and control over your own data.

Coming in 2nd place in the Splunk App Dev Contest is KnowledgeAlpha (available on GitHub), written by Ethan Tian using the Splunk Web Framework. KnowledgeAlpha is an infrastructure that makes data in Splunk more easily usable by a broader base of business and operational users. A Splunk power user, familiar with the Splunk Search Processing Language, creates a saved search that is exposed through a simple search bar, into which business and operational users can enter keywords that match the "knowledge definition" of the underlying saved search. The results are then organized into various charts that make them easier to understand.

Earning 3rd place is the PingMan app (also available on GitHub), built by GH Yang using the Splunk Web Framework and modular inputs. PingMan is a lightweight, persistent node management app that includes detailed reporting on node responsiveness and performance.

Congratulations to all of the winners and thanks to all of the teams who submitted projects for consideration. Stay tuned for more contests, showcases and hackathons!

Hunk is a Big Data Platform for Building Applications on Hadoop


Hunk is not only a revolutionary new software product for exploring, analyzing and visualizing data in Hadoop, it's also a powerful platform for rapidly building applications powered by data stored in the Hadoop Distributed File System (HDFS). If you're a developer, you can build on the Hunk platform using your choice of popular languages, frameworks and tools without having to manually program MapReduce jobs. Hunk enables you to work with data in Hadoop using your existing skills and a variety of standards-based technologies. If you're familiar with the developer platform for Splunk Enterprise, you know everything you need to know to develop with Hunk. Apart from some fundamental differences between Hunk and Splunk Enterprise (the data in Hadoop is at rest, so there are no "real-time" searches, and Hunk doesn't manage data collection, so functionality related to data ingestion such as ingestion REST API endpoints and modular inputs doesn't apply), the developer experience is consistent. So what exactly can you do with Hunk?

Build Big Data apps

With Hunk, you can build apps powered by data stored in HDFS  that deliver insights like clickstream analysis, deep customer behavioral modeling and security analysis at enterprise-grade scale. By connecting your data in HDFS to a virtual index in Hunk, you gain access to the capabilities of the Splunk Web Framework (which is integrated into Hunk).

The Splunk Web Framework makes building an app on top of Hadoop look and feel like building any modern web application. You can quickly style and customize your Splunk app using Splunk’s visual dashboard editor or convert your dashboard to HTML with one click for more powerful customization and integration with JavaScript and HTML5. If you’re a web developer familiar with modern web development technologies and model-view patterns, you can easily build apps on Hunk with advanced functionality and capabilities using JavaScript and the popular Django framework.

Integrate and Extend Hunk

Hunk also lets you integrate data from HDFS into other applications and systems across the enterprise, from custom-built mobile reporting apps to Web Parts in Microsoft SharePoint. You can do it easily using the REST API and Software Development Kits (SDKs) for Java, JavaScript, Python, C#, Ruby and PHP. Hunk provides a fully-documented and supported REST API with over 200 endpoints that lets developers programmatically search and visualize data in Hunk from any application. The Splunk SDKs include documentation, code samples, resources and tools to make it faster and more efficient to program against the Splunk REST API using constructs and syntax familiar to developers experienced with Java, Python, JavaScript, PHP, Ruby and C#.
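
For example, here is roughly what a search against Hunk from Node.js looks like with the Splunk SDK for JavaScript. This is a sketch: the connection details and the virtual index name are placeholders, and error handling is kept to a minimum:

// Sketch: query a Hunk virtual index over the REST API via the Splunk SDK for JavaScript.
var splunkjs = require('splunk-sdk');

var service = new splunkjs.Service({
  scheme: 'https',
  host: 'localhost',      // placeholder Hunk host
  port: 8089,             // default management port
  username: 'admin',
  password: 'changeme'
});

service.login(function (err, success) {
  if (err || !success) { throw err || new Error('login failed'); }
  // A virtual index is searched just like a native index.
  service.oneshotSearch(
    'search index=my_virtual_index | stats count by status',
    { count: 0 },
    function (err, results) {
      if (err) { throw err; }
      console.log(results.fields, results.rows);
    });
});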

Search and Correlate Data in HDFS

Hunk offers ad hoc exploration, analysis and visualization of historical data at rest in Hadoop. You can dynamically query data in HDFS using the familiar search bar or write a custom search command in a few lines of Python without having to cobble together a bunch of Apache projects and components or set up MapReduce. Hunk utilizes the Splunk Search Processing Language (SPL™), the industry-leading method that lets you interactively explore data across large, diverse data sets. With Hunk’s schema-on-the-fly, you can immediately query and interrogate raw data in Hadoop through visual interactions and SPL for deeper analysis.

The developer platform enables you to expand the search language and write your own custom search commands to perform specific processing or calculations. Let's say you have customer-related text data in HDFS from Twitter, product reviews and other direct feedback. Hunk lets you write your own sentiment analysis command in Python to analyze the data sitting in HDFS.

Also, customers with both Splunk Enterprise and Hunk licenses can search across data stored both in Hadoop and in native indexes in Splunk Enterprise – all in the same search.

Get Started

Hunk: Raw data to analytics in < 60 minutes


Update: now with UI setup instructions

Finally, I got a bit of down time to sit down and get to the third part of the "Hunk: Splunk Analytics for Hadoop Intro" series of blogs, a follow-up to part 1 and part 2.

Summary of what we’ll do

1. Set up the environment
2. Configure Hunk
3. Analyze some data

So let’s get started…

Minutes 0 – 20: Set up the environment

In order to get up and running with Hunk you’ll need the following software packages available/installed on the server running Hunk:
1. Hunk bits – download Hunk and you can play with it free for 60 days
2. Java – at least version 1.6 (or whatever is required by the Hadoop client libraries)
3. Hadoop client libraries – you can get these from the Hadoop vendor that you’re using, or if you’re using the Apache distro you can fetch them from here

Installing the Hunk bits is pretty straightforward:

#1. untar the package 
> tar -xvf splunk-6.0-<BUILD#>-Linux-x86_64.tgz 
#2. start Splunk
> ./splunk/bin/splunk start 

Download and follow the instructions for installing/updating Java and the Hadoop client libraries, and make sure you keep note of JAVA_HOME and HADOOP_HOME as we’ll need them in the next section.

Minutes 20 – 40: Configure Hunk using the UI

Configuring Hunk can be done either by (a) using the Manager UI, by going to Settings > Virtual Indexes, or (b) by editing a conf file, indexes.conf. Here we’ll cover both methods, starting with the UI (thus Minutes 20-40 appear twice).

1. Login to Hunk (default user: admin, password: changeme)
hunk login

2. Navigate to Settings > Virtual Indexes
Settings > Virtual Indexes

3. Create a new External Results Provider (ERP):
create new provider

4. Specify some environment information, such as Java home and Hadoop home, and some cluster information, such as Hadoop version, JobTracker host/port, default filesystem, etc.
populate provider form

5. After saving the provider, you can move on to creating virtual indexes for it by switching to the “Virtual Indexes” tab
new virtual index

6. The main configuration requirement for a virtual index is a path that points to the data you want the virtual index to represent. You can optionally specify a whitelist regex that matches only the files you want to be part of the index. Also, if the data is partitioned by time, as in my case, you can tell Hunk how the time partitioning is implemented (read this section if you’re interested in how time partitioning works)
create new vix

7. After saving the virtual index you can immediately start exploring its content by simply hitting “Search”
saved virtual index

You can skip the next section if you’re not interested in learning how to configure Hunk using the conf files.

Minutes 20 – 40: Configure Hunk using the conf files

In this section I’ll walk you through configuring Hunk using the configuration files. We are going to work with the following file:

$SPLUNK_HOME/etc/system/local/indexes.conf

First: we need to tell Hunk about the Hadoop cluster where the data resides and how to communicate with it – in Hunk terminology this would be an “External Results Provider” (ERPs). The following stanza shows an example of how we define a Hunk ERP.

[provider:hadoop-dev01]
# this exact setting is required
vix.family = hadoop

# location of the Hadoop client libraries and Java
vix.env.HADOOP_HOME = /opt/hadoop/hadoop-dev01
vix.env.JAVA_HOME = /opt/java/latest/

# job tracker and default file system
vix.fs.default.name = hdfs://hadoop-dev01-nn.splunk.com:8020
vix.mapred.job.tracker = hadoop-dev01-jt.splunk.com:8021

# uncomment this line if you're running Hadoop 2.0 with MRv1
#vix.command.arg.3 = $SPLUNK_HOME/bin/jars/SplunkMR-s6.0-h2.0.jar

vix.splunk.home.hdfs = /home/ledion/hunk
vix.splunk.setup.package = /opt/splunkbeta-6.0-171187-Linux-x86_64.tgz  

Most of the above configs are self-explanatory; however, I will take a few lines to explain some of them:

[stanza name]
This must start with “provider:” in order for Hunk to treat it as an ERP; the rest of the string is the name of the provider, so feel free to get more creative than me :)

vix.splunk.home.hdfs
This is a path in HDFS (or whatever the default file system is) that you want this Hunk instance to use as its working directory (scratch space)

vix.splunk.setup.package
This is a path in the Hunk server where Hunk can find a Linux x86_64 Hunk package which will be shipped and used on the TaskTracker/DataNodes.

Second: we need to define a virtual index which will contain the data that we want to analyze. For this post I’m going to use Apache access log data which is partitioned by date and is stored in HDFS in a directory structure that looks like this:

/home/ledion/data/weblogs/20130628/access.log.gz
/home/ledion/data/weblogs/20130627/access.log.gz
/home/ledion/data/weblogs/20130626/access.log.gz
....

Now, let’s configure a virtual index (in the same indexes.conf file as above) that encapsulates this data

[hunk]
# name of the provider stanza we defined above
# without the "provider:" prefix
vix.provider = hadoop-dev01

# path to data that this virtual index encapsulates 
vix.input.1.path = /home/ledion/data/weblogs/...
vix.input.1.accept = /access\.log\.gz$
vix.input.1.ignore = ^$

# (optional) time range extraction from paths 
vix.input.1.et.regex = /home/ledion/data/weblogs/(\d+)
vix.input.1.et.format = yyyyMMdd
vix.input.1.et.offset = 0

vix.input.1.lt.regex = /home/ledion/data/weblogs/(\d+)
vix.input.1.lt.format = yyyyMMdd
vix.input.1.lt.offset = 86400

There are a number of things to note in the virtual index stanza definition:

vix.input.1.path
Points to a directory under the default file system (e.g. HDFS) of the provider where the data of this virtual index lives. NOTE: the “…” at the end of the path denote that Hunk should recursively include the content of subdirectories.

vix.input.1.accept and vix.input.1.ignore allow you to specify regular expressions to filter in/out files (based on the full path) that should/not be considered part of this virtual index. Note that ignore takes precedence over accept. In the above example vix.input.1.ignore is not needed, but I included it to illustrate its availability. A common use case for using it is to ignore temporary files, or files that are currently being written to.

So far so good, but what the heck is all that “.et/lt” stuff?

Glad you asked :) In case you are not familiar with Splunk, time is a first class concept in Splunk and thus, by extension, in Hunk too. Given that the data is organized in a directory structure using date partitioning (and this is a very common practice), the “.et/lt” stuff is used to tell Hunk the time range of data that it can expect to find under a directory. The logic goes like this: match the regular expression against the path, concatenate all the capturing groups, then interpret that string using the given format string, and finally add/subtract a number of seconds (offset) from the resulting time. The offset comes in handy when you want to extend the extracted time range to build in some safety, e.g. a few minutes of a given day end up in the next/previous day’s dir, or there’s a timezone difference between the directory structure and the Hunk server. We do the whole time extraction routine twice in order to come up with a time range, i.e. extract an earliest time and a latest time. When time range extraction is configured, Hunk is able to skip/ignore directories/files which fall outside of the search’s time range. In Hunk speak this is known as: time based partition pruning.
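
To make the rule concrete, here is the same extraction logic re-implemented in a few lines of JavaScript and applied to one of the example paths above. This is purely an illustration of the described logic, not Hunk code:

// Illustration of the et/lt extraction rule: regex match -> concatenate capture groups
// -> parse with the format (yyyyMMdd) -> apply the offset in seconds.
function extractTime(path, regex, offsetSeconds) {
  var m = path.match(regex);
  if (!m) return null;
  var s = m.slice(1).join('');                                   // e.g. "20130628"
  var year = +s.slice(0, 4), month = +s.slice(4, 6), day = +s.slice(6, 8);
  return Date.UTC(year, month - 1, day) / 1000 + offsetSeconds;  // epoch seconds
}

var path = '/home/ledion/data/weblogs/20130628/access.log.gz';
var re = /\/home\/ledion\/data\/weblogs\/(\d+)/;
var et = extractTime(path, re, 0);      // earliest time: 2013-06-28 00:00:00
var lt = extractTime(path, re, 86400);  // latest time:   2013-06-29 00:00:00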

Third: we need to tell Hunk how to schematize the data at search time. At this point we’re entering classic Splunk setup and configuration. In order for Hunk to bind a schema to the data we need to edit another configuration file.

We are going to work with the following file:

$SPLUNK_HOME/etc/system/local/props.conf
[source::/home/ledion/data/weblogs/...]
priority = 100
sourcetype = access_combined

This stanza tells Hunk to assign sourcetype access_combined to all the data in our virtual index (i.e. all the data under /home/ledion/data/weblogs/). The access_combined sourcetype is defined in $SPLUNK_HOME/etc/system/default/props.conf and defines how access log data should be processed (e.g. each event is a single line, where to find the timestamp and how to extract fields from the raw event).

Minutes 40 – 59: Analyze your Hadoop data

Now we’re ready to start exploring and analyzing our data. We simply run searches against the virtual index data as if it were a native Splunk index. I’m going to show two examples, highlighting data exploration and analytics.

1. explore the raw data

 index=hunk 

Use Hunk to explore your Hadoop data

2. get a chart showing the status codes over a 30 day window, using daily buckets

 index=hunk | timechart span=1d count by status 
Hunk report example

Minutes 59 – ∞: Keep on Hunking!

There’s an unlimited number of ways to slice, dice and analyze your data with Hunk. Take the latest Hunk bits for a spin – we’d love to get your feedback on how to make Hunk better for you.

Learn more about how you can use Hunk to search images stored in your Hadoop cluster

Stay tuned for another post where I’ll walk you through how to extend the data formats supported by Hunk as well as how to add your own UDFs …

Splunk and Ford Test Drive Open Data Development in Connected Cars


Splunk Inc. and Ford Motor Company collaborated to analyze real-time automotive data to gain insight into driving patterns and vehicle performance.

Using Ford OpenXC to gather data from connected vehicles, Splunk employees hit the streets of San Francisco in a Ford Focus Electric Vehicle and a gas-powered Ford Escape. The data was indexed, analyzed and visualized in Splunk® Enterprise and is now publicly available.

Check out the Connected Car dashboards and watch the video to see all the fun we had!

Want to know more about how we built the project? Keep reading for the technical deep dive.

OpenXC – what is it?

OpenXC is open source hardware/software that allows you to pull a wealth of data off your vehicle in real time and extend vehicle capabilities via pluggable modules.

In short, OpenXC is an API for your car.

If you know a bit about vehicle telemetry, you are probably wondering how this is different from OBD-II scanners. There are many differences, including the number of data points collected, platforms supported and quality of the data. The OpenXC FAQ does a great job of addressing this in depth if you want the full read.

Getting the data from OpenXC into Splunk

This was incredibly easy. We used the OpenXC vehicle interface and followed the step-by-step instructions to get data streaming via Bluetooth from the OpenXC vehicle interface to our laptop.

The OpenXC vehicle interface sends data in JSON format. Sample OpenXC data:

{"name":"accelerator_pedal_position","value":0,"timestamp":1365512404.143000}

In Splunk we configured a new data input to ingest this JSON data and put it in index=openxc. A few field extractions later and we were ready to start creating searches and dashboards.

Creating Searches and Dashboards

I will walk you through a few of the dashboards and try to highlight some of the important panels. If there is a dashboard or panel I don’t describe that you are interested in learning more about, leave a comment or send us a tweet at @Splunk4Good. Existing Splunk users reading this post with a watchful eye might notice the “tstats” search command being used. To make these dashboards as speedy as our drivers, we used the “tscollect” command to take searches from the raw data and create an accelerated dataset, then used “tstats” to analyze it. This data acceleration technology was introduced in Splunk 5.0 and has come to the forefront in the High Performance Analytics Store that the Pivot / Data Model features make use of in Splunk 6.

Electric Powered and Gas Powered Dashboards

These dashboards display OpenXC data from the 2014 Ford Focus Electric and the gas-powered 2013 Ford Escape.

Distance panel – This panel utilizes a Single Value visualization. The OpenXC odometer signal is in kilometers by default, so we converted to miles. Here is the search driving that panel:

|tstats earliest(odo) as s_odo latest(odo) as e_odo from electricdata where driver=$driver$       | eval miles_driven=(e_odo-s_odo)*0.621        | fields miles_driven

MPG panel - This panel utilizes a Single Value visualization. We are using the OpenXC signals fuel_consumed_since_restart and odometer (converted from the default of kilometers to miles). At first we were shocked by the low MPG (~5-7 MPG). Upon further inspection we think our findings are legit given the very short distance (~2 miles), top speeds reached (~50-65 MPH) and city traffic conditions. Here is the search driving that panel:

| tstats earliest(odo) as s_odo latest(odo) as e_odo earliest(fuel_consumed_since_restart) as sfuel, latest(fuel_consumed_since_restart) as efuel from gasdata where driver=$driver$       |  eval gallons_spent=(efuel-sfuel)*0.264        |  eval miles_driven=(e_odo-s_odo)*0.621        |  eval mpg=miles_driven/gallons_spent       | fields mpg

Average Speed panel – This panel utilizes a Radial Gauge visualization. The OpenXC vehicle_speed signal is in kilometers by default, so we converted to miles. Here is the search driving that panel:

| tstats avg(speed) as Speed from electricdata where driver=$driver$ | eval Speed = (Speed*0.621)

Speed range map panel – This panel utilizes the Splunk Native Maps feature to plot the coordinates and speed. We are using the OpenXC latitude and longitude signals.

Here is the search driving that panel:

host=electric source=*erik* (name=latitude AND value=37*) OR (name=longitude AND value=-122*) OR (name=vehicle_speed) | eval {name}=value | transaction startswith=(name=vehicle_speed) | search latitude=* longitude=* | eval latitude=mvindex(latitude,0) | eval longitude=mvindex(longitude,0) | eval mpg_speed=(vehicle_speed*0.621) | bin span=10 mpg_speed | dedup latitude longitude sortby +_time | geostats  latfield=latitude longfield=longitude count by mpg_speed maxzoomlevel=18


Average Speed and Acceleration over time panel - This panel utilizes an Area visualization to compare speed and acceleration. Negative acceleration is deceleration. We are using the OpenXC vehicle_speed signal (converted to miles, as it is in kilometers by default). Acceleration, by definition, is the rate at which the velocity of a body changes with time. The search retrieves data from the Splunk 6.0 High Performance Analytics Store using the “tstats” command, grouping by the _time field. Next we use “autoregress” to set up the data so we can calculate a moving average. Further in the search string we calculate acceleration by creating the “accel_mss” field (comparing the current speed and time with the previous speed and time). Speed and acceleration are then charted over time. Here is the search driving that panel:

|tstats max(speed) as speed from electricdata where driver=$driver$ groupby _time span=1s| autoregress _time as prev_time p=1 | autoregress speed as prev_speed p=1 | eval accel_mss=(speed-prev_speed)*0.277/(_time-prev_time) | timechart max(speed) as Speed avg(eval(accel_mss*10)) as Acceleration | eval Speed = (Speed * 0.621)
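
To illustrate how the accel_mss expression above behaves, here is the same arithmetic on made-up numbers (the 0.277 factor converts km/h to m/s):

// Worked example of the accel_mss formula; the speeds and times are invented.
var prevSpeed = 36, speed = 54;   // km/h, one second apart
var prevTime = 0, time = 1;       // seconds
var accelMss = (speed - prevSpeed) * 0.277 / (time - prevTime);
console.log(accelMss);            // ~5 m/s^2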


Comparison Dashboard

Accelerator Pedal Position panel – This panel utilizes a column visualization. We are using the OpenXC accelerator_pedal_position signal (0-100%) and visualizing the weighted count per driver and car of times accelerator was in a given position.

|tstats max(accelerator_pedal_position) as accelerator_pedal_position from comparisondata groupby _time $split$ span=1s | search accelerator_pedal_position!=0 | eval accelerator_pedal_position = round(accelerator_pedal_position/5)*5| eventstats count by $split$ | eval $split$=case($split$ LIKE "%erik-electric%", "Erik-Electric",$split$ LIKE "%marklar-electric%", "Marklar-Electric",$split$ LIKE "%gerald-escape%", "Gerald-Gas",$split$ LIKE "%erik-escape%", "Erik-Gas",$split$ LIKE "%marklar-escape%", "Marklar-Gas",$split$ LIKE "erik", "Erik", $split$ LIKE "marklar", "Marklar", $split$ LIKE "gerald", "Gerald",$split$ LIKE "electric", "Electric", $split$ LIKE "escape", "Gas" )|eval weight = 100/count | chart sum(weight) by accelerator_pedal_position $split$


Steering Wheel Angle panel – This panel utilizes a column visualization. We are using the OpenXC steering_wheel_angle signal (-600 to +600 degrees) and visualizing the weighted count per driver and car of times steering wheel was in a given position.

|tstats max(steering_wheel_angle) as steering_wheel_angle max(vehicle_speed) as vehicle_speed from comparisondata groupby _time $split$ span=1s| streamstats last(vehicle_speed) as vehicle_speed| search vehicle_speed!=0 | eval steering_wheel_angle = round(steering_wheel_angle/60)*60 | eventstats count by $split$ | eval $split$=case($split$ LIKE "%erik-electric%", "Erik-Electric",$split$ LIKE "%marklar-electric%", "Marklar-Electric",$split$ LIKE "%gerald-escape%", "Gerald-Gas",$split$ LIKE "%erik-escape%", "Erik-Gas",$split$ LIKE "%marklar-escape%", "Marklar-Gas",$split$ LIKE "erik", "Erik", $split$ LIKE "marklar", "Marklar", $split$ LIKE "gerald", "Gerald",$split$ LIKE "electric", "Electric", $split$ LIKE "escape", "Gas" )| eval weight = 100/count | chart sum(weight) by steering_wheel_angle $split$


Transparency

Want the raw data to see if you can find new insights or bugs in our analysis? Great! Here is a zip of the gas-powered vehicle data. The electric data is from a developer build of the OpenXC Vehicle Interface, and our friends at Ford have asked us to hold back on releasing that raw data just yet.

Thanks

This was one of the coolest Splunk4Good projects I have been able to pull off! First I have to thank Ford, especially TJ, for being excellent collaborators and also our consultant Jon Guzik who made this collaboration possible.

This project is part of Splunk4Good, Splunk’s corporate social responsibility program, and was entirely volunteer driven.

Thank you to the entire team that made this project a reality, but a very special thanks to the brilliant Michael Wilde, Chief of Awesome in the CTO’s Office. He worked tirelessly with the team to ensure the Connected Car Dashboards were…Awesome :)

My experience building a Splunk application


I joined Splunk a couple of weeks ago and my first challenge was to learn everything I could about how to build Splunk applications. The best way to do that is to write your own application – and this is exactly what I did.

The application I wrote has two parts. The first part is a very simple scripted input for Firebase; the second part is built with the Splunk Web Framework and shows objects and their routes on Google Maps, using either real-time data or playback of historical data.

I hope that my experience gives you some ideas about how you can extend Splunk for your needs.

Prepare environment

Installing Splunk

The first thing you need to do is download the Splunk software to your local machine. You can find instructions on the Step-by-step installation instructions page. In my case, I was using Mac OS X. The easiest way is probably just to download the tar file and extract it somewhere (I’ll assume this folder is set in the $SPLUNK_HOME variable). The following command launches Splunk:

$SPLUNK_HOME/bin$ ./splunk start

This command launches both the server and UI components of Splunk. On first launch it will ask you to read and accept the license agreement. If you have permission issues launching it, just don’t forget to give the splunk binary executable permissions:

$SPLUNK_HOME/bin$ chmod +x splunk

After you launch it, open the web page http://localhost:8000, where you will be asked to enter your user name and password (the page shows the default user name and password).

To stop Splunk, use the following command:

$SPLUNK_HOME/bin$ ./splunk stop

Installing application

The next step is to install the application I built into Splunk. Splunk keeps all applications under $SPLUNK_HOME/etc/apps. If you need to install an application manually (without using http://apps.splunk.com/), you just copy it into the $SPLUNK_HOME/etc/apps folder and that is it.

If you are familiar with git, you can clone my application repository somewhere, for example to ~/Desktop:

~/Desktop$ git clone https://github.com/splunk/splunk-demo-app-firebase

After that you will need to copy the routemap folder to the $SPLUNK_HOME/etc/apps folder. If you don’t know how to use git or don’t have a git client installed, that’s OK too: you can download the splunk-demo-app-firebase zip archive, unzip it somewhere, and copy the routemap folder to $SPLUNK_HOME/etc/apps.

To load the application you need to restart Splunk:

$SPLUNK_HOME/bin$ ./splunk restart

After all these steps you should be able to see the Route Map application on Splunk Dashboard:

Splunk Dashboard

If everything is installed and you have an Internet connection, then after navigating to the Route Map application you should be able to see buses and their routes on the map:

Route Map demo application

Building Application

Now, let’s talk about the application and what I used to build it.

Firebase scripted input

Node.js and bash part

The first part of the application uses a scripted input. Using scripted inputs is one of the simplest ways to import custom data into Splunk.

I needed to find a way to import data from one of the datasets of the Firebase database, which provides real-time information about bus locations in San Francisco (sf-muni). On the Firebase documentation website I found that it has an SDK for Node.js. At first I wrote a simple Node.js application that just writes all the data it gets to console output; you can find this application under bin/app. You can launch it with the following command if you have Node.js installed:

.../routemap/bin/app$ node app.js

If you don’t have Node.js on your machine, that is not a problem. Splunk ships with it, so you can run the following command using the cmd option:

$SPLUNK_HOME/bin$ ./splunk cmd node $SPLUNK_HOME/etc/apps/routemap/bin/app/app.js

If everything is installed properly you should see serialized JSON in the output, like this:

{"dirTag":"08X__IB","heading":134,"id":6272,"lat":37.7180899,"lon":-122.44271,"predictable":"true","routeTag":"8X","secsSinceReport":25,"speedKmHr":0,"ts":1383004678.15}
{"dirTag":"28_IB2","heading":357,"id":8231,"lat":37.7480959,"lon":-122.47592,"predictable":"true","routeTag":28,"secsSinceReport":2,"speedKmHr":42,"ts":1383004701.141}
{"dirTag":"49_IB2","heading":1,"id":7011,"lat":37.79403,"lon":-122.42289,"predictable":"true","routeTag":49,"secsSinceReport":16,"speedKmHr":24,"ts":1383004846.601}
{"dirTag":"08AX_OB","heading":168,"id":6247,"lat":37.785683,"lon":-122.40955,"predictable":"true","routeTag":"8AX","secsSinceReport":15,"speedKmHr":16,"ts":1383004487.437}
{"dirTag":"14_OB2","heading":210,"id":7013,"lat":37.7162999,"lon":-122.44122,"predictable":"true","routeTag":14,"secsSinceReport":34,"speedKmHr":22,"ts":1383004809.27}
{"dirTag":"38_IB1","heading":267,"id":6414,"lat":37.777813,"lon":-122.492805,"predictable":"true","routeTag":38,"secsSinceReport":2,"speedKmHr":13,"ts":1383004860.681}

If you go one level up from this folder you will also find a simple bash script, launch_app.sh, which launches this Node.js application using Splunk:

#!/bin/bash

current_dir=$(dirname "$0")
"$SPLUNK_HOME/bin/splunk" cmd node "$current_dir/app/app.js"

For Windows I wrote a similar script, launch_app.cmd.

Splunk integration

OK, we have an application that writes all the data it gets to the console. The next step is to configure the application to make it work with Splunk. In the ./default folder you can find two configuration files, inputs.conf:

# Linux Bash script
[script://./bin/launch_app.sh]
disabled = 0
sourcetype = firebase
source = sf-muni-data
host = publicdata-transit.firebaseio.com

# Windows Batch script
[script://.\bin\launch_app.cmd]
disabled = 0
sourcetype = firebase
source = sf-muni-data
host = publicdata-transit.firebaseio.com

and props.conf:

[firebase]
NO_BINARY_CHECK = 1
TIME_PREFIX="ts":

In inputs.conf I specified how to launch my scripted input and the default values for the source, sourcetype and host fields. You can find detailed documentation for inputs.conf in Splunk’s documentation. I also highly recommend reading About default fields (host, source, sourcetype, and more).

The props.conf file helps Splunk recognize the timestamp values in my events. Using the Splunk data preview page you can easily find the right set of properties for your input. This was very helpful for me, so I’d like to explain in detail how you can do this.

For example, you can run the following command to generate preview.log for the Firebase app:

.../firebase/bin/app$ node app.js >> preview.log

Now you can use the preview.log file to find out which properties you need to use. On the Splunk dashboard page, click Add Data:

Add data

After that, choose the From files and directories link:

Add data from file

The next step is Preview data before indexing; just set the path to the preview.log file:

Specify file for preview

On the next page you could choose Start new source type, but we are not going to import anything; we just want to find the right properties for our events. As you can see, Splunk failed to parse the timestamp:

Failed to parse timestamp

Click the adjust timestamp and event break settings link at the top of the page, go to the Timestamps tab, enter "ts": in Timestamp is always prefaced by a pattern, and click the Apply button:

Fixed timestamp

OK, it looks like this is exactly what we want. The next step is to open the Advanced mode (props.conf) tab and copy the Currently applied settings into your props.conf file.

Advanced mode

Route map demo view

The second part of my application is a Django Bindings-based extension for Splunk with one custom view. To get a template for your first Web Framework app, you just need to run one command:

$SPLUNK_HOME/etc/apps/framework$ ./splunkdj createapp my_app_name

After creating the application folders and template, I spent most of my time working on the JavaScript files under django/routemap/static/routemap and the HTML layout in map.html.

Search Macros

Before starting work on your application, think about the data you are going to use and how you are going to use it. In my case, I knew that I was going to use the events I imported with my Firebase scripted input. I also wanted my Route Map application to work with any type of event, with only one requirement: these events need to have geo data (latitude and longitude) and timestamps (all events in Splunk have timestamps, right?). So I decided to find an easy way for users to prepare their data for display with my application. And this is how I discovered search macros. In my case I built normalize macros (macros.conf):

# macro prepares data for showing on map with grouping by one field
[normalize(4)]
args = ts, lat, lon, field1
definition = sort $ts$ | eval data=$ts$+";"+$lat$+";"+$lon$ | table data, $field1$

In macros.conf you can find more macros with the same name, normalize; if you take a deeper look at them you will see that they are all the same, the only difference being that they return more fields with the final table command. These are the fields that these macros expect:

  • ts – event timestamp. I use this field for sorting events on the server side to make sure that I have all points in order on the client side.
  • lat – latitude.
  • lon – longitude.
  • field1, field2, field3, field4 – you can specify at minimum one field, or up to 4 fields, by which you want to group these events. Basically, these fields help me to identify objects.

So, for example, if you have events in Splunk from a source cars-positions that stores information about cars and their positions over time, where each event has timestamp, latitude, longitude and plate-number fields, you can write the following search command to use my Route Map application with your data:

source="cars-positions" | `normalize(ts=timestamp, lat=latitude, lon=longitude, field1=plate-number)`

Splunk Web Framework

Before building your own application with the Web Framework, it’s helpful to learn how the JavaScript libraries included with the Web Framework work:

  • UnderscoreJS – helps you to do a lot of manipulations with Arrays and objects.
  • Backbone.js – helps you to keep your application more MVC structured.
  • RequireJS – helps to manage dependencies in Splunk.
  • jQuery – helps to do manipulations with DOM.

You don’t really need to know all these libraries. If you have never used these JavaScript libraries and you don’t want to go too deep, you can look at the examples in the documentation for common implementations.

SplunkJS

If you want to build a Splunk app with the Web Framework it is also good to learn about the SplunkJS stack. In my application I used only three SplunkJS components (you can see how in the file mapObjectsPageController.js):

  • SearchManager – This is the SplunkJS component that helps you get data from Splunk. Make sure you read How to create a search manager using SplunkJS Stack. One thing worth mentioning is that if you need real-time data you have to subscribe to the preview event, and for non-real-time data you have to subscribe to the results event (you can also keep using the preview event, but to get all events it is better to use results). The documentation has a note about this: “Note that real-time searches don’t finish and they never fire a search:done event.” This is actually why I subscribe to both the preview and results events. I handle preview when the application is in real-time mode:
    // When we are in real-time we get only events on preview
    var previewData = this.searchManager.data("preview", {count: 0, output_mode: 'json'});
    previewData.on('data', function() {
      if (previewData.hasData() && this.mapObjectsView.viewModel.realtime()) {
        dataHandler(previewData.data().results);
      }
    }.bind(this));

    And I handle results when it is not in real-time mode:

    // When we are not in real-time we get events on results
    var resultsData = this.searchManager.data('results', {count: 0, output_mode: 'json'});
    resultsData.on('data', function() {
      if (resultsData.hasData() && !this.mapObjectsView.viewModel.realtime()) {
        dataHandler(resultsData.data().results);
      }
    }.bind(this));

    The data handler function is the same for both of them; it parses the latitude, longitude and timestamp fields and calls the renderPoints method:

    var dataHandler = function(results) {
      var dataPoints = [];
    
      for (var rIndex = 0; rIndex < results.length; rIndex++) {
        var result = results[rIndex];
    
        if (result.data) {
          var data = result.data.split(';');
          var point = { ts: parseFloat(data[0]), lat: parseFloat(data[1]), lon: parseFloat(data[2]) };
          delete result['data'];
          dataPoints.push({obj: result, point: point});
        }
      }
    
      this.mapObjectsView.renderPoints(dataPoints);
    }.bind(this);
  • The other two components, SearchBarView and SearchControlsView, provide basic search bar and search controls views:
    Search controls
    I have a small trick for how I use SearchManager and SearchBarView: when a user wants to see real-time data, I set the search time range to [rt-30, rt], which only fetches new events from the last 30 seconds, and at the same time I use the selected real-time interval as a time window for how long the application needs to keep points in history. For example, if the user wants to see real-time data in a 30-minute window, I keep all points in memory for 30 minutes but ask the server only for events from the last 30 seconds. I don’t do any statistics on the server, so I don’t expect anything to change in the past. This is my handler for time range events:
    // Update the search manager when the timerange in the searchbar changes
    this.searchBarView.timerange.on('change', function(timerange) {
      this.mapObjectsView.viewModel.realtime((/^rt(now)?$/).test(timerange.latest_time));
      if (this.mapObjectsView.viewModel.realtime()) {
        var timeWindow = parseTimeWindow(timerange.earliest_time);
        // In case of real-time we use Search time range as a time window 
        // for how long we want to keep data on client. But we always ask 
        // server only for new events with range -30 seconds.
        this.mapObjectsView.viewModel.timeWindow(timeWindow);
        this.searchManager.search.set({ latest_time: 'rt', earliest_time: 'rt-30' });
      } else {
        this.mapObjectsView.viewModel.timeWindow(null);
        this.searchManager.search.set(this.searchBarView.timerange.val());
      }
    }.bind(this));

Application architecture

Ok, now that I’ve told you everything I learned about building Splunk Applications, let’s talk about the Route Map application architecture and how to customize it.

  • mapObjectsDictionary.js – this file has two types, MapObject and MapObjectsDictionary. MapObject represents each object on the map; it keeps information about all points in time for the object and calculates its current position, via the calculatePos method, when somebody changes currentTime. MapObjectsDictionary is a container that stores all instances of MapObject.
  • mapObjectsViewModel.js – this file contains one type, MapObjectsViewModel, which keeps all the properties required by my view and can replay how objects moved in time from beginTime to endTime.
  • mapObjectsView.js – this file contains two types, RoutesMapView and MapObjectListViewItem. The first view type represents the whole view you can see on the page, including the map, special controls and the list of all objects on the map, which you can find on the left. MapObjectListViewItem represents each item on the left.
  • mapObjectsPageController.js – this file has all the logic to connect the views with Splunk.

Two modes

The Route Map application can work with real-time data as well as replay historical data, depending on what you choose in the search controls.

In real-time mode you can only track the locations of objects at the current time. The application also stores the routes of these objects for the time window you choose in the search controls:

Real time

If you choose a specific time range in the past, like “Last 15 minutes”, you can replay object movements on the map; you can also specify the Speed and Refresh Rate (how often you want to redraw points on the map):

Playback mode

In both modes you can filter which objects you see on the map, as well as turn route display off and on. To find an object on the map you can click on the corresponding color in the list.

Links

  1. Route Map Demo Application on http://apps.splunk.com/.
  2. Splunk Installation Manual.
  3. Splunk Apps.
  4. Splunk Web Framework Overview.
  5. Splunk Web Framework Reference.
  6. Splunk Web Framework on GitHub.
  7. UnderscoreJS.
  8. Backbone.js.
  9. RequireJS.
  10. jQuery.
  11. GMaps.js – kudos to the author of this library; it helps a lot when getting started with Google Maps.
  12. splunk-demo-app-routemap on GitHub.

Comparing week-over-week results


Comparing week-over-week results is a pain in Splunk. You have to do absurd math with crazy date calculations for even the simplest comparison of a single week to another week.

No more. I wrote a convenient search command called timewrap that does it all, for arbitrary time periods, over *multiple* periods (compare the last 5 weeks). Compare week-over-week, day-over-day, month-over-month, quarter-over-quarter, year-over-year, or any multiple (e.g. two week periods over two week periods). It also supports multiple series (e.g., min, max, and avg over the last few weeks).

After a ‘timechart’ command, just add “| timewrap 1w” to compare week-over-week, or use ‘h’ (hour), ‘m’ (month), ‘q’ (quarter), ‘y’ (year).

I’ve done my part. Now do yours — download it, give feedback, let me know of problems, and rate the app.

Thanks.

 

timewrap

Help us grow the Splunk developer platform with your ideas and votes


Hello Splunk Developers!

I recently joined Splunk, working on our developer platform efforts and driving our SDKs and tools. We are excited to be taking our dev platform forward and continuing to bring you better and better support for integrating with Splunk, extending Splunk, and building Splunk applications. On the dev platform team we are now planning out what we’re going to do in the future. We’d love to have you help us figure out where we go next:

  • Should we invest in SDKs for mobile devices like iOS and Android?
  • Are there specific Splunk features like Data Models you’d like to see surfaced in our existing SDKs?
  • Should we be adding a new kind of charting to the web framework?
  • Are there language specific capabilities we could be better leveraging?
  • Should we provide richer support for Splunk extensions like our new custom search commands in different languages?

To get your input on these questions and more, we recently launched a new UserVoice site at splunkdev.uservoice.com. Using this site you can tell us what new features you’d like to see, either by submitting new requests or by voting on existing ones. You can also comment on why you see a feature as valuable (or not).

Below is a screenshot of what you’ll see when you enter the site:

uservoice

If you don’t have a UserVoice account you’ll be asked to create one. Once you log in you can vote on existing feature requests.

 


 

And you can easily make a suggestion for a new feature or multiple features!

 

feature

 

Once the requests are in, anyone who has signed up can comment on the features. This includes members of our team who can weigh in on the conversation or ask for more clarity.

We’ll be monitoring UserVoice going forward to feed your ideas into our planning efforts. It’s your way to tell us what you want to see and connect directly with our team! Is there some killer feature that’s just critical to your business? Let us know!

We look forward to seeing you at splunkdev.uservoice.com.


Splunk Alerts and Charts on Your iPhone


Now Splunk is EVERYWHERE!

Push alerts and charts from your Splunk servers to your cellphone, even when you’re on the beach. Get your Splunk data conveniently on the go. Available now!

EVERYWHERE is a one-way data push from firewalled Splunk servers to mobile devices, via a cloud-based service run by Splunk or your own organization.

Go here:  Get the app for your Splunk server, sign up for the cloud services, and get the iPhone app.

Not an official Splunk product, but a really useful skunkworks project.

How to debug Django applications with pdb, PyCharm, and Visual Studio


Using a debugger is a common way to find out what is wrong with your application, but debugging a Django application in Splunk might not be so obvious. It is possible, though, and I’ll show you how using pdb, PyCharm, and Visual Studio.

Disclaimer: Don’t try this in a production environment.

Python interpreter

Splunk ships with a Python interpreter. To launch it, use the splunk cmd command (see Command line tools for use with Support):

Windows

%SPLUNK_HOME%\bin > splunk.exe cmd python

Mac OS / Linux

$SPLUNK_HOME/bin $ ./splunk cmd python

To help run this command, let’s create a couple of small shell scripts under $SPLUNK_HOME/bin:

Windows (save it as python_splunk.cmd)

"%~dp0\splunk.exe" cmd python %*

Mac OS / Linux (save it as python_splunk.sh)

#!/bin/bash
"$(dirname "$0")/splunk" cmd python "$@"

Note: However you name the scripts, just make sure the name starts with python. Otherwise, you’ll run into this PY-11992 issue.

On Mac OS / Linux, be sure to give permissions to execute the script:

Mac OS / Linux

$SPLUNK_HOME/bin $ chmod +x python_splunk.sh

Note: We need a shell script for two reasons. First, a script simplifies the way we can execute Python code with the interpreter included with Splunk. Second, we need to have an executable file to set up the Python interpreter in Visual Studio and PyCharm. These IDEs do not allow you to specify a command with an executable and a set of parameters.

Now we can run the Python interpreter from the shell with the script:

Windows

%SPLUNK_HOME%\bin > python_splunk.cmd

Mac OS / Linux

$SPLUNK_HOME/bin $ ./python_splunk.sh

Discovering the start point of the SplunkWeb service

Before I show you how to debug SplunkWeb, I want to show you how to find out what you need to launch it manually, so that if something changes in the future (in how we launch SplunkWeb), these steps will help you to diagnose the change and find a new start point.

Currently Splunk 6.0.1 is the latest stable version. If you are using this version, feel free to skip this section. If you skip these steps and if debugging does not work for you, follow the steps in this section to find what has been changed.

Note: All commands in this section are from a Mac OS terminal because Splunk for Windows has an actual SplunkWeb.exe service and it is not so easy to find out how it starts Python code.

I assume that you have Splunk installed on your development box. If Splunk is not running, start it from a terminal:

Mac OS / Linux

$SPLUNK_HOME/bin $ ./splunk start

Now run this command:

Mac OS / Linux

$SPLUNK_HOME/bin $ ps x -o command= | grep "^python"

This command shows us all processes that have python as an executable. This is the output I see:

Mac OS / Linux

python -O /Users/dgladkikh/dev/splunk/bin/splunk 6.0.1/lib/python2.7/site-packages/splunk/appserver/mrsparkle/root.py start

In my case I had only one result. If you have more than one result, it should be easy for you to figure out which one is SplunkWeb. The command in the output is the command that Splunk uses to launch the splunkweb service.

Let’s see if we can use it. Stop the splunkweb service:

Mac OS / Linux

$SPLUNK_HOME/bin $ ./splunk stop splunkweb

Let’s try to launch splunkweb manually by using the script we created earlier:

Mac OS / Linux

$SPLUNK_HOME/bin $ ./python_splunk.sh -O "./../lib/python2.7/site-packages/splunk/appserver/mrsparkle/root.py" start

Now you know how to launch SplunkWeb manually. To stop it, press Control + C in the terminal.

In Windows, Python libraries are located in a different folder relative to $SPLUNK_HOME:

Windows

%SPLUNK_HOME%\bin > python_splunk.cmd -O ".\..\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\root.py" start

Debugging with pdb, the Python debugger

If you like to debug in a terminal, you can debug using the pdb module. Change the command line to:

Mac OS / Linux

$SPLUNK_HOME/bin $ ./python_splunk.sh -m pdb "./../lib/python2.7/site-packages/splunk/appserver/mrsparkle/root.py" start

Windows

%SPLUNK_HOME%\bin > python_splunk.cmd -m pdb ".\..\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\root.py" start

Python pdb

I’ve dropped the -O switch, which is responsible for optimization. There is no point using it for development. See the official Python documentation to learn more about Python command-line arguments.

After this command you should get a breakpoint on the first executable line. See the documentation for the pdb module to learn how to continue (c) execution and how to work with breakpoints.

Debugging with PyCharm

I used the free Community Edition of PyCharm. Download it, install it, and launch it.

On the Welcome screen, choose Open Directory and select your $SPLUNK_HOME directory (where you installed Splunk):

PyCharm Welcome

Note for Mac OS / Linux: You might have issues if you try to open a directory that you do not have write access to. Opening the SPLUNK_HOME folder is not a requirement; you can open the directory containing your application instead. Just make sure that you set up all the paths to Splunk in PyCharm as I do below.

Note for Windows: You need to launch PyCharm with elevated permissions (Run As Administrator).

Now set up the Python interpreter. We’ll use the scripts python_splunk.sh / python_splunk.cmd that we created earlier.

Open PyCharm Preferences ⌘, (Settings… in Windows Ctrl+Alt+S) and go to the Project Interpreter / Python Interpreters. Click the Add + button, select Local… and choose $SPLUNK_HOME/bin/python_splunk.sh (%SPLUNK_HOME%\bin\python_splunk.cmd in Windows). PyCharm will ask you to set it up as Project Interpreter, click Yes.

Note for Windows: If you don’t see anything on the Paths tab, make sure to launch PyCharm with elevated permissions.

PyCharm Python Interpreters

Next we need to set up how to launch this project:

  1. Open Run / Edit Configurations…, click the + button and select Python. Name it as you wish (SplunkWeb for example).
  2. Set Working Directory to $SPLUNK_HOME/bin. In my case it is /Users/dgladkikh/dev/splunk/bin/splunk 6.0.1/bin/.
  3. Set Script as ./../lib/python2.7/site-packages/splunk/appserver/mrsparkle/root.py (on Windows it is …\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\root.py). This is the script that we found earlier.
  4. Set Script parameters to start.
  5. Verify that you have the right Python interpreter (the one we set in PyCharm Preferences above).

This is what you should see:

PyCharm Run/Debug Configurations

Finally, verify that splunkd is running and splunkweb is not:

Mac OS / Linux

$SPLUNK_HOME/bin $ ./splunk stop splunkweb
$SPLUNK_HOME/bin $ ./splunk start splunkd

Windows

%SPLUNK_HOME%\bin > splunk.exe stop splunkweb
%SPLUNK_HOME%\bin > splunk.exe start splunkd

In PyCharm press the Debug button to start the splunkweb server. You can place breakpoints in your source code and open Splunk in a browser.

PyCharm Run/Debug Configurations

Debugging with Visual Studio

Install Visual Studio and Python Tools for Visual Studio. In my case I installed Visual Studio 2013 Update 1 and Python Tools 2.0 for Visual Studio 2013.

Launch Visual Studio with elevated permissions (Run as Administrator).

First we need to configure the Python interpreter:

  1. Go to the Visual Studio Options -> Python Tools -> Environment Options.
  2. Click on Add Environment, name it as you wish, for example Python-Splunk.
  3. In Environment Settings -> Path specify the path to python_splunk.cmd (see first section of this article).
  4. In Environment Settings -> Windows Path specify the path to %SPLUNK_HOME%\bin.
  5. In Environment Settings -> Library Path specify the location of the Python libraries %SPLUNK_HOME%\Python-2.7\Lib.
  6. Specify the architecture x64 (x86 if you are on a 32-bit machine).
  7. Specify the language version, 2.7.

This is what you should see:

Visual Studio Options

Let’s create a new project. Open File -> New Project… and select From Existing Python Code:

From Existing Python Code

Select the path to %SPLUNK_HOME%:

Create new project

Select the Python interpreter, which we created above:

Project Python Interpreter

Save the project:

Project Finish

Open Project Properties and specify Startup File on the General tab as .\Python-2.7\Lib\site-packages\splunk\appserver\mrsparkle\root.py:

Project Properties - General

On the Debug tab specify Script Arguments as start:

Project Properties - Debug

Save your settings, then verify that splunkd is running and splunkweb is not:

Windows

%SPLUNK_HOME%\bin > splunk.exe stop splunkweb
%SPLUNK_HOME%\bin > splunk.exe start splunkd

Now start your project in Visual Studio. In my case Visual Studio told me that I have some errors in my project. Just ignore the errors and click Yes to launch anyway.

Visual Studio Errors

Place breakpoints where you want, then open Splunk in a browser.

Visual Studio Errors

Note: To stop debugging, close the terminal window with Python/SplunkWeb. When I tried to stop debugging from within Visual Studio, it crashed.

Enjoy!

Splunk Eclipse plug-in and Custom Search, new tools for the developer arsenal


Today we’re excited to announce two new additions for the Splunk Developer ecosystem: the Splunk Plug-in for Eclipse and rich Custom Search support in the Splunk SDK for Python.

Splunk Plug-in for Eclipse

Developers can use the Splunk Plug-in for Eclipse to build applications that use and extend Splunk. Eclipse is the tool of choice for over 10 million Java developers around the world, including many at Splunk customers.

Java SDK template

The plug-in contains a project template for building a new Splunk SDK for Java application. This is ideal for building an application that searches against Splunk data or does automation. The project template includes snippets for performing common SDK tasks, as well as infrastructure for wiring up the application to log application data directly to Splunk utilizing popular log frameworks like Apache Log4J.

Below you can see a screenshot of using the plugin to create a new Splunk SDK for Java application.

New Java SDK Project

Once I have given it the basic info for my app, I can click Finish and a skeleton application is generated for me.

New Java SDK Wizard

The project comes loaded with a set of snippets for performing common tasks with the SDK. It is as easy as typing “spl + ctrl + space” in Eclipse to bring up a list of snippets you can select from.

Java SDK Snippets

Selecting “splconnect”, for example, will inject code for connecting to a Splunk instance.

Selecting a snippet

Using the new template makes it really easy to kick start your Java development with Splunk!

Modular Inputs

Additionally, the Plug-in includes support for creating modular inputs to extend Splunk. Modular inputs are useful for pulling in streams of data from additional data sources, such as other internal systems or public APIs like Google, Facebook and Twitter.

Using the modular input template removes a lot of the plumbing you would otherwise need to create a Java modular input. The template will even create a fully working starter implementation for you to play with. Below is a screenshot of creating a new input using the defaults, which includes creating the starter.

Creating a modular input

Once I click “Finish” you can see that the Splunk application, configuration and default implementation are created for me! Once my app is ready to go, the plug-in even includes Ant integration for packaging up an SPL file that I can publish to Splunk Apps or distribute internally. Right-clicking on the build.xml brings up a menu to use this feature.

Build SPL menu option

Selecting “Ant Build” takes care of everything for you. Below you can see the output of the successful build.

Building the Spl file

You can now grab the generated TestSplunkInput.spl and start sharing it right away. 

Custom Search Commands

One of the pieces of common feedback we hear from developers is that you love the support for creating custom commands in Python, but you wish there was a more first-class and simplified experience. Well, we’ve heard you loud and clear! We’re happy to announce that today we’re introducing a new library for creating custom search commands within the Splunk SDK for Python. Using the library, you will find it easy to author new commands in a strongly typed and far less error-prone manner. The library also dramatically reduces the amount of code you need to write.

The new commands can be used for introducing complex algorithms like geo-fencing or linear regression, for doing dynamic queries against public APIs (such as social or open government APIs), or for addressing domain-specific concerns like retrieving data from an internal system or applying custom business logic on top of Splunk data.

In the box you’ll find three types of base search commands you can derive from.

  • StreamingCommand allows you to pipe data from another Splunk command and apply custom transformation logic or filtering. For example, if you want to filter results based on some domain-specific logic, a streaming command is perfect (a minimal sketch follows this list).
  • ReportingCommand allows you to run map/reduce type operations. If you have a large set of data that you need to consolidate down or perform complex mathematical calculations on, then a ReportingCommand is what you want. 
  • GeneratingCommand allows you to generate transient data dynamically, which can be piped into other commands. You can use it to do ad hoc queries against external systems. It is a great complement to our indexing features.
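To give a feel for the programming model, here is a minimal sketch of a streaming command built with the new library. The command name, field name and filtering logic are made up purely for illustration; the overall shape (the Configuration decorator, an Option, and a stream() generator) is the pattern the library uses.

import sys
from splunklib.searchcommands import dispatch, StreamingCommand, Configuration, Option, validators

@Configuration()
class FilterThresholdCommand(StreamingCommand):
    # Keeps only events whose (hypothetical) 'value' field exceeds the supplied threshold.
    threshold = Option(require=True, validate=validators.Integer())

    def stream(self, records):
        for record in records:
            # Each record arrives as a dictionary of field names to values.
            try:
                if int(record.get('value', 0)) > self.threshold:
                    yield record
            except ValueError:
                continue  # skip events whose 'value' field is not numeric

dispatch(FilterThresholdCommand, sys.argv, sys.stdin, sys.stdout, __name__)

Packaged into an app, a command like this could then be piped to directly from a search (assuming it was registered under the name filterthreshold): ... | filterthreshold threshold=42.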

In future posts we’ll dive into more detail on how to actually build these.

Just for illustration, you can use a custom GeneratingCommand to retrieve listings dynamically, in an ad hoc manner, from Yelp’s API based on a set of parameters such as the location and the type of business. Using the SDK, I created a new “yelp” command for just this purpose.

Below you can see using the command to retrieve a list of sushi restaurants near our Splunk Seattle office in order by distance.

Using the yelp command

You can use it for more than just finding a good sushi meal, such as finding local theaters.

Theater results from Yelp

You could even imagine going further and getting the local show times for each theater by using something like the Fandango API.

To implement this command, I created a YelpCommand class in Python which inherits from GeneratingCommand.

Yelp Command Code

Next, I declaratively specified the options the command accepts using the Option decorator. Finally, I implemented the generate function, which in this case queries Yelp using the Python requests package, extracts the data I am interested in, and yields it back to Splunk. Once the data is returned from a custom search command, you can leverage the full power of Splunk to pipe the results to other commands for aggregating the data, creating charts, etc. You can even join existing events within Splunk to the custom search command to open up new ways of enhancing your data.
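Purely for illustration, the overall shape of such a command looks roughly like the sketch below. The endpoint URL, parameter names and response fields are placeholders rather than the actual Yelp API (which also requires authentication).

import sys, time
import requests
from splunklib.searchcommands import dispatch, GeneratingCommand, Configuration, Option, validators

@Configuration()
class YelpCommand(GeneratingCommand):
    term = Option(require=True)        # e.g. term="sushi"
    location = Option(require=True)    # e.g. location="Seattle, WA"
    limit = Option(require=False, default=10, validate=validators.Integer())

    def generate(self):
        # Hypothetical endpoint and parameters; the real Yelp API needs proper credentials.
        response = requests.get('https://api.example.com/v2/search',
                                params={'term': self.term, 'location': self.location, 'limit': self.limit})
        for business in response.json().get('businesses', []):
            yield {'_time': time.time(),
                   'name': business.get('name'),
                   'distance': business.get('distance'),
                   '_raw': str(business)}

dispatch(YelpCommand, sys.argv, sys.stdin, sys.stdout, __name__)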

I’ll be blogging on how to build the Yelp command soon, but if you want it now, you’ll find it on Github!

Getting Started

To get the updated Python SDK with search commands or the new Eclipse tooling, head on over to dev.splunk.com. On the site you’ll find several walkthroughs on the new Eclipse plug-in as well as custom search commands. You can download either the SDK or the tools, and in the SDK you’ll find custom search samples as well.

In the coming weeks our team will be blogging more in detail on how you can use these features including how to build custom commands like the Yelp command, as well as using the Eclipse plug-in.

So go check it out, and if you have any feedback on features you’d like to see, head on over to our UserVoice site at splunkdev.uservoice.com.

Command Modular Input Use Case Series


Modular Inputs and Scripted Inputs provide a great way to develop custom programs to collect and index virtually any kind of data that you can set your mind to.

But whatever platform you have deployed Splunk on, you will also have a whole bevy of other inputs just waiting for you to tap into to get that data into Splunk: the various programs that come with the platform and those that you have installed on it.

This is actually why I created the Command Modular Input that I introduced in a recent blog post: a means to leverage the power of your existing system programs and get their data into Splunk as simply as possible, rather than having to create custom wrappers for each program you want to get data from.

Now the use cases are limitless, really. And since I released this Modular Input on Splunk Apps, I’ve heard of customers, staff and partners using it for many interesting use cases, from a utility to rattle up a quick POC through to production Splunking.

 

So I have decided to start a blog series on these use cases, and this is episode 1.

 

Command Modular Input Use Case: Agentless monitoring of remote files

Use Case

You have files on a remote machine that you want to monitor without installing a remote agent.

Solution

The first thing I want to do is identify the system commands that will do this for me.

  • tail: streaming; appended file data is returned.
  • stat: non-streaming; just a periodic poll of the attributes of the file(s).
  • cat: non-streaming; used to periodically poll the full file contents, or perhaps to create a baseline image of the file contents.

The Command Modular Input is able to operate in streaming and non-streaming mode depending on the nature of the command output.

These commands will return raw command output. The Command Modular Input has the ability to plug in custom response handlers that can be used for data pre-processing and custom formatting of output (i.e., you might convert the output to JSON).

So now that we have our commands, how can we execute them remotely? Well, you can execute the commands remotely over SSH.

Here is an example of how you can configure a Command Modular Input stanza to execute a remote tail command over SSH.

remotetailsetup

 

And likewise for the other commands:

stanzasetup

If you are wondering “hey, where is the password?”, well, it’s generally a best practice to use shared keys, as detailed in these links.

http://linuxers.org/howto/how-run-commands-remote-machine-using-ssh

http://linuxers.org/article/ssh-login-without-password

Here are some screenshots of what the raw indexed output from these remotely executed commands looks like:

remotetail / remotestat / remotecat

Here is a screencast of the remote tailing in action. In this example I have a Command Modular Input stanza executing a tail command against a Raspberry Pi 7,000 miles away on the other side of the world, streaming file appends back to Splunk and searching over them in real time.

 

RedMonk Chats with Customers and Partners about Development and DevOps with Splunk


RedMonk analyst Donnie Berkholz sat down with a few Splunk customers at various locations to discuss everything from DevOps and continuous deployment to building Splunk Apps with the Web Framework in Splunk 6. First Donnie sat down with Nick DiSanto of Snap Interactive, who talks about how they use Splunk to monitor continuous deployment and for troubleshooting, remarking “every single developer and product person uses Splunk on a daily basis”. Donnie also sat down with Steve Dodson and Kevin Conklin of Prelert, who discuss why they chose to build on Splunk and the flexibility of the Web Framework in Splunk 6. Finally, Donnie talks with Ashish Bhutiani of Function1 about the Web Framework.

There are a ton of great insights and perspectives in this video. Check it out!

redmonk-video

The new developer tool chain for data, panel participation at DeveloperWeek



 

A few weeks ago I had the pleasure of partaking in a panel at DeveloperWeek entitled “Next Gen Data Dev: NoSQL, NewSQL, Graph Databases, Hadoop, Machine Learning….”. On the panel I was joined by Emil Eifrem, CEO of Neo Technology and co-founder of Neo4j, as well as Ankur Goyal, Director of Engineering for MemSQL. The high-level theme was around the kinds of tools that have emerged for developers to work with data, and whether or not a new breed of developers is emerging. The panel started off with quick introductions on each of the products.

  • Ankur described MemSQL as the fastest database. MemSQL is a highly performant, distributed, transactional SQL database with an in-memory write-back store. Because it is fully SQL compliant it has the advantage of working with the existing ecosystem of SQL products. It combines the best of both worlds allowing fast queries in memory with the benefits of persistence. Unlike something like Redis, it is NOT just a key-value store so you can gain the schema benefits. It also includes support for a new JSON type which allows storing JSON blobs in a column, but then being able to index and query against it.
  •  Emil described Neo4j as a graph database. Neo stores data as a collection of nodes with properties that have complex relationships. It then provides a graph query language, which allows you to traverse the nodes and ask rich questions about the data. Similar to MemSQL, it is transactional, performs and scales really well.
  • I described Splunk as a product and platform for operational intelligence. Splunk can ingest evented / time-stamped data from any source. Splunk applies the schema at search time rather than requiring the data to conform to a specific format. It allows you to aggregate all of the data in a single place and then query or visualize against it.

Here is a summary of the major themes that I took away from the discussion.

  • Big Data is just data. It is not some magic unicorn type of data; it literally is just data. What’s different is that data is often coming from many more sources, and in a large enough volume and frequency, that traditional database solutions are not ideal for processing it.
  • Graph databases are a new type of DB that allow solving some really interesting problems in particular with regards to when there are deep relationships and networks for that data. For example a canonical use case is a social network. You could easily model a social network for friends whether it be Twitter, Facebook, etc. in a Graph DB. You could add all the friends preferences, likes, etc. Then you could start asking deep questions about the network with an instant response as opposed to a traditional database which would result in a complex query of joins which might take several hours to get the same results, or which would require building a data warehouse. 
  • New SQL is a new approach to an old solution. New SQL databases are modern relational DBs that are architected to be much more performant than their traditional counterparts. They do this through a variety of techniques, including moving as much data and processing as possible into memory and using write-back solutions for persistence. Solutions like MemSQL are embracing JSON as a storage format, allowing a hybrid of traditional structured storage and free-form document data. The big attraction for New SQL is that you don’t have to re-architect your app. Your existing SQL workloads should get a huge performance gain simply by moving to a New SQL store.
  • There is no need for a new breed of developer. There was a pretty unanimous sentiment echoed by the panel that tools and SDKs need to rise to the occasion and meet the developer rather than vice-versa.
  • The era of one solution to rule them all is over. In the past it was common to try to use one database for everything. Now we live in a world where there are many different fit-for-purpose data storage options from SQL and New SQL, to key-value stores (like Redis), document databases (Mongo DB, Couch) and now graph storage. Building an application today is very much about choosing the right solution for the right problem. Fortunately we have many different options at our disposal, but this raises new challenges. For example how as a developer do you manage an “entity” that is persisted across a SQL store and a graph database? How do you retrieve it? These are gaps that the tool chain needs to fill.

Splunk stands at an interesting intersection for all of these different sources. It is not a database per se, but it can act as an intermediary for storage and retrieval of data with these various stores.

For example, we discussed the idea of using Splunk to take in events and then send updates back to Neo to update the graph, or possibly doing a lookup against the graph based on an incoming event. It could operate in a similar fashion with MemSQL. Another interesting idea would be to leverage MemSQL’s support for JDBC to query against Splunk’s new ODBC connector.

It was a great experience to be part of the panel, and we were all agreeing more than disagreeing. I left feeling excited about the continued energy and innovation occurring in this space and the part we play. Products like Neo, MemSQL and Splunk are offering newer and more efficient ways to process the increasing volume, sources, and types of data. This just makes it easier for developers to get their jobs done.

Splunk’s New Web Framework, Volkswagen’s Data Lab, and the Internet of Things.


There are many incredible features in Splunk 6. Pivot, Data Models and integrated maps really stole the show at .conf2013. But I really have to give credit to our developer team in Seattle for the massive leap forward in user interface possibilities with the addition of the integrated web framework, which is included in Splunk 6 but is also available as an app download for Splunk 5.

In the midst of all that Splunk 6 excitement at .conf, I was introduced (at the Internet of Things pavilion) to the team at Volkswagen Data Lab, and had some great discussions with them about their interest in using Splunk as a  platform for the management, analysis, and visualization of data from connected cars. Splunk had recently done a project with Ford that had gone quite well, and I really hoped that we would be able to work with VW as well…

 

Fast-forward 5 months – CeBIT 2014. 

VolkswagenTweet

What you may not recognize behind them is an amazing Splunk visualization of the connected Volkswagen e-ups that were used as shuttles at CeBIT. Yes, you can make data-driven dashboards that look like that using Splunk 6’s web framework.

 

Here’s a close-up:

VolkswagenDashboard

There are some very interesting concepts and innovations in this dashboard. First is its capability to replay any vehicle’s journey for the selected time range. In the lower left, you can see the scrub controls, and vehicle activity is marked by a simple histogram.  All available sensors on the vehicle are “played back” in real-time or fast-forward mode, including vehicle speed, engine RPM, battery status, vehicle range, outdoor temperature, door and headlight status.

 

Here’s what Splunk’s CEO, Godfrey Sullivan, had to say about the project:

GodfreyVolkswagen

 

 

Concepts like the ones demonstrated in this project really show off Splunk’s capabilities when it comes to ingesting, analyzing and visualizing ALL data, but I really feel like this one should connect with the many out there considering using Splunk as a framework for their Industrial and Internet of Things data. The possibilities are endless: just download Splunk 6, connect your data and start your engines!


Using Splunk as a data store for developers


A number of years ago, I wrote a blog entry called Everybody Splunk with the Splunk SDK, which succinctly encouraged developers to put data into Splunk for their applications and then search on the indexed data to avoid doing sequential search on unstructured text. Since it’s been a while and I don’t expect people to memorize the dissertations of ancient history (to paraphrase Bob Dylan), I’ve decided to write about the topic again, but this time in more detail with explanations on how to proceed.

Why Splunk as a Data Store?

Some may proclaim that there are many NoSQL-like data stores out there already, so why use Splunk for an application data store? The answers point to simplicity, performance, and scale. You can easily put any type of time series text into Splunk without having to worry about its format, while Splunk at the same time provides free universal forwarders to send data from remote places, whether the data comes from a file, a network port, or the output of an API (known to Splunk users as scripted input). We call this universal indexing. All data separated by punctuation in the event stream gets indexed. This leads to the performance aspect. If all data is indexed, search speed is incredibly fast for any search term. To make matters even better, a computer science concept called bloom filters, used in Splunk, makes searching even faster than simply indexing all the data, especially when performing needle-in-the-haystack searches. Scale is achieved by the implicit use of the MapReduce algorithm for horizontally scaling hosts that index the data. The user of Splunk does not have to write or think about MapReduce, as it happens under the covers.

Getting data in is one thing, but getting it out is quite another. The ability to use “Google-like” searches with AND (implicit), OR, and NOT to retrieve events makes for a natural search experience. However, the real power of Splunk is the included Splunk Search Processing Language (the commands after the pipe symbol), which does wonders for productivity and analysis. If you combine universal indexing, a scalable engine to do the work, and a comprehensive set of commands to become productive quickly, you’ll see why I recommend using Splunk as a developer data store.

Steps to get started

In this year’s blog entry on this topic, I will list out the steps for those who want to get started. I am assuming that you are a software developer who is looking into a technology to use as a data store.

  1. Download Splunk and install it. You can start with the free version of Splunk. Download Universal Forwarders if you plan to send data from remote locations. If this is your first time using Splunk, try the tutorial.
  2. Get Data In.  After that, use the web interface to test retrieving data and to test out the Splunk search language.
  3. For the software developer, use one of the open source SDKs to interact with Splunk using either Java, Python, JavaScript, Ruby, C#, or PHP. Each SDK follows this pattern to retrieve data (a minimal Python sketch follows this list):
    • Connect to Splunk.
    • Authenticate (which may be implicit with configuration files with some languages).
    • Request a Search Job to execute a search. The search will be the same type of search text string you executed from the web interface.
    • Iterate over the results to do something with them. Results for matching events can come back as raw text, JSON, XML, or CSV formatted.
    • Disconnect, if needed.
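Here is a minimal sketch of that pattern using the Splunk SDK for Python; the host, credentials and search string are placeholders you would replace with your own.

import splunklib.client as client
import splunklib.results as results

# Connect and authenticate (placeholder host and credentials)
service = client.connect(host='localhost', port=8089, username='admin', password='changeme')

# Request a search job; exec_mode='blocking' waits until the job has finished
job = service.jobs.create('search index=main error | head 10', exec_mode='blocking')

# Iterate over the results and do something with them
for result in results.ResultsReader(job.results()):
    if isinstance(result, dict):  # skip informational messages
        print(result)

# Disconnect, if needed
service.logout()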

This should get you started. More docs are at the Splunk developer website. For certain SDK languages, there may be more integrations that adhere to the culture of the language. For instance, the Java SDK works inside of Eclipse, NetBeans, and Spring.

To sum it up, the ease of getting time series data stored into Splunk with full fidelity, the ability to have it be universally indexed, the capability to scale to large amounts of data, and the inclusion of a powerful set of search commands is why I am advocating using Splunk as a data store.

SplunkDataStore

P.S. In the Everybody Splunk blog entry, I started a rap, but never did finish it for the developer version. Here it is in its entirety.

Everybody Splunk.
Superstars Dunk.
Everyone say hey.
Find the needle in the hay.
Let Splunk show you the way.

Everybody Splunk.
Correlation Funk.
Everybody search.
No need to lurch.
Let Splunk show you the way.

Everybody Splunk.
Don’t be a monk.
Everyone can play.
Shorten your day.
Let Splunk show you the way.

 

 

Splunk as a Recipient on the JMS Grid


A number of years ago, I was fascinated by the idea of SETI@home. The idea was that home computers, while idling, would be sent calculations to perform in the search for extraterrestrial life. If you wanted to participate, you would register your computer with the project and your unused cycles would be utilized for calculations sent back to the main servers. You could call it a poor man’s grid, but I thought of it as a massive extension for overworked servers. I thought the whole idea could be applied to the Java Messaging Service (JMS) used in J2EE application servers.

Background

Almost a decade ago, I would walk around corporations at “closing” time and see a massive array of desktops idling by. I thought: what if these machines, and all the other machines in the company, could be utilized to perform units of work on behalf of common servers? Each machine could be a JMS client that happily turns itself on in off hours and receives a message encapsulated with data to perform some calculation. The client would receive an object from a queue and call one interface method:

void doWork();

The method would perform the calculation on the client and encapsulate the results in the same object, and the client would then place this object on a reply queue. The application server would then receive the message via message-driven beans and store the results in some back-end store such as a database.

ClassArchitecture

Keep in mind that these were the days before Hadoop, where mapped jobs can be sent to task nodes (although even in Hadoop, jobs are executed on servers, not on underutilized desktops, laptops, and virtual machines). So, my idea utilized JMS for the distribution of work. I created a self-contained framework for this idea that you can download for your own use at Github. Although my implementation was tested on an Oracle WebLogic Server on a Windows machine, it is generic enough to run on any JMS implementation on any platform. As long as the article is still being kept around, you can read about the entire implementation here.

The use cases for these types of calculations span many fields, from executing banking applications and gathering scientific analysis to performing linear optimization and building mathematical models. For my demo within the framework, I chose arbitrary matrix multiplication (used in weather forecasting apps) and mentioned that the results could be stored in a relational database.

Enter Splunk

What does this have to do with Splunk? Examine what a typical matrix would look like here:

matrix_sample

Notice that not only is it time series data, but it also could have arbitrary size. Storing a 3×3 matrix vs. storing, say, a 25×25 matrix in a relational database for historical posterity and further calculations may not be trivial. It is simple to do with Splunk. (Remember, Splunk did not exist a decade ago, so you can excuse me for not mentioning it as a data store in the original article.) This means that, using my framework from Github, you can now easily store this time series text data into Splunk.

JMS_Splunk

There are two ways to integrate the results into Splunk. One is to have the message-driven beans store their results into a rotated file that is monitored by Splunk (or, preferably, Universal Forwarders). The other is to have JMS clients receive the results from the reply queue and send their results to standard output to be picked up by Splunk (or, again, preferably Universal Forwarders). There is a technology add-on modular input in the app store to index the results from JMS clients that you can utilize for this latter approach.

You could even have the message driven bean recipients store their results into HDFS within a Hadoop cluster. You can then use Splunk’s HUNK product to query for the results using the Splunk search language without having to write Hadoop Map-Reduce jobs on your own.

Using the Search Language

Speaking of the Splunk Search language, here are some examples of what you can do with the matrix demo results stored within Splunk. First, you can treat each column as a multi-value field and simply view the values.

matrix_with_values

Next, you can create a visualization with the average of each column. Here’s a picture from when I first did this a few years ago within Splunk:

matix_avg_values

If you want to simply perform arbitrary math on the columns, here’s an example that creates multi-value fields for each column, uses mod 50 on each value, averages the column results and then rounds off the results into an integer.

matrix_eval

Conclusion

Some may say that my ideas may sound outdated as we have existing frameworks for message passing and mapping jobs, but the simplicity of my approach is not matched. In other words, anyone who has used J2EE before can try this at home and it addresses the simple notion of utilizing a corporation’s peripheral computing power. The introduction of Splunk to receive these messages adds another dimension to the original work as unstructured time series data now has a scalable home for further analysis using a powerful search language.

Building custom search commands in Python part I – A simple Generating command


Custom search commands in our Python SDK allow you to extend Splunk’s search language and teach it new capabilities. In this and other upcoming posts we’re going to look at how to develop several different search commands to illustrate what you can do with this.

In this post, we’re going to focus on building a very basic Generating command.  A generating command generates events which can be from any source, for example an internal system, or an external API. We’re going to create a GenerateHello command that will generate Hello World events based on a supplied count. The command is not very useful in itself, but it is a quick way to see how you can author custom commands.

Below is a screenshot of using the command we’re going to build. As you can see it outputs a series of “Hello World” events.

Screen Shot 2014 04 14 at 3 52 49 PM

Creating the Application skeleton

Custom search commands are deployed via a Splunk application. As with any Splunk app there is a specific file layout and some configuration files that are required. Fortunately the Splunk SDK for Python includes a template which you can use as a start point. Here are the steps to create a new app using the template.

  • Go to your $SPLUNK_HOME/etc/apps folder and create a new folder called generatehello_app.
  • In another folder, e.g. “~/”, clone the Splunk SDK for Python using the following command: git clone git@github.com:splunk/splunk-sdk-python.git.
  • Next copy the contents of “./splunk-sdk-python/examples/search_commands_template” to $SPLUNK_HOME/etc/apps/generatehello_app.
  • Copy the “./splunk-sdk-python/splunklib” folder into the $SPLUNK_HOME/etc/apps/generatehello_app/bin folder.
  • Go into the bin folder of the new app. You will see 3 .py files. Delete report.py and stream.py as we’re creating a generating command.
  • Rename generating.py to generatehello.py.

Now we need to do some search and replacing in the template. Edit bin/generatehello.py, as well as app.conf, commands.conf and logging.conf in the default folder.

  • Replace each instance of $(command.title()) with GenerateHello.
  • Replace each instance of $(command.lower()) with helloworld as this is the name of the command.
  • Replace each of the remaining $(…) values with the appropriate information based on the name. The specific values in this case don’t really matter, but you must put something.

Implementing the GenerateHello Command

Authoring a search command involves two main steps: first, specify the parameters for the search command; second, implement the generate() function with logic that creates events and returns them to Splunk.

Edit generatehello.py in the bin folder and paste the following code:

import sys, time
from splunklib.searchcommands import \
    dispatch, GeneratingCommand, Configuration, Option, validators

@Configuration()
class GenerateHelloCommand(GeneratingCommand):
    count = Option(require=True, validate=validators.Integer())

    def generate(self):
        for i in range(1, self.count + 1):
            text = 'Hello World %d' % i
            yield {'_time': time.time(), 'event_no': i, '_raw': text }

dispatch(GenerateHelloCommand, sys.argv, sys.stdin, sys.stdout, __name__)
What the code is doing:
  • The GenerateHelloCommand derives from GeneratingCommand
  • A count parameter is declared for the command using the Option decorator. Parameters can be optional or required; this one is specified as required. Additionally, a validator is specified constraining the value to be an integer.
  • A for loop runs to generate Python hash objects representing events.
  • event_no is set on each event representing the message count. This surfaces in Splunk as a field that can be selected in the field picker.
  • _time is set on each event to a timestamp. Splunk expects that all events have a timestamp associated, so this must be set.
  • The _raw key is set. This field is optional; if Splunk sees it, it will display it by default in Raw or List mode. _raw commonly contains the full source event, as it does here.

Testing the command outside of Splunk

Now that the command is created, we can run it right at the command line. This is really convenient in particular for debugging and you’ll probably prefer using this technique as you develop commands.
Enter the following at the terminal to run the command and have it generate 5 events.
python generatehello.py __EXECUTE__ count=5 < /dev/null
 
You should see output similar to the following showing that the events are getting properly generated:
Screen Shot 2014 04 14 at 5 26 13 PM

If you try to run without passing count, you’ll get an error as count is required:

Screen Shot 2014 04 14 at 5 26 24 PM

And if a letter is provided for count, it will also fail as it is not an Integer:

Screen Shot 2014 04 14 at 5 26 37 PM

Finally, any error output that is generated is also available within the log file which is in the root of the application:

Screen Shot 2014 04 14 at 4 02 58 PM

Testing the command inside of Splunk

Now that we can see that the command is working, we can test it out in Splunk. If your Splunk instance is already running, you’ll need to restart it to have it load the new command, either using the Splunk CLI or from the Splunk UI.

Once you have logged in to the instance you should see that the new app has been loaded. Notice the “Generate Hello” app below:

Screen Shot 2014 04 14 at 3 50 30 PM

Clicking on that app will then take you to the search screen. Entering “| helloworld count=5” at the search bar should return the expected “Hello World” events as in the screenshot below. You can see Splunk has picked up our events!

 

Screen Shot 2014 04 14 at 3 52 49 PM

You can see in the screenshot that the event_no field has also been picked up, allowing it to be selected if the view is switched to “Table”.

It’s that easy!

Next steps

Now that the search command is complete, the app can be packaged up as any other application and then published on Splunk Apps or put on a share from which it can be imported into a Splunk instance. You can also configure permissions for the app, or change its visibility to allow the commands to be accessible from anywhere within Splunk.

Summary

In this first post we built a very simple example of a generating command, but even in its simplicity, it shows its power! In the next post you’ll see how you can use this to build a more real-world generating command.

You can find a working version of the command we created in this post, along with instructions on installation in our Python SDK source on Github in the examples folder.

Enjoy!

Reflections on a Splunk developer’s journey : Part 1


It seems like only yesterday

…that I was writing my first Splunk App. It was the openness and extensibility of the Splunk platform that attracted me to this pursuit in the first place, and when I discovered the thriving community on Splunkbase (now called Splunk Apps / Answers), I just had to contribute. 11,000+ downloads across 9 different freely available community offerings later, I am feeling somewhat reflective. So in this 2 part blog series I want to share with you some of my lessons learned from developing and supporting Splunk community Apps/Add-ons (part 1) and then some musings on why you should consider developing Splunk Apps/Add-ons yourself and contribute to the Splunk developer ecosystem (part 2).

Some lessons learned

Keep it simple

Try to make the App/Add-on as simple as possible in terms of install, setup, configuration and navigability. I try to implement solutions with the fewest things for the user to do and know in order to get up, running and productive. You can still have complexity, but just make sure you abstract it from the users behind an elegant, intuitive and robust experience.

Documentation

It’s going to make your life and your users’ lives a lot simpler if you have concise, easy-to-follow documentation that covers everything you’d expect to encounter in the normal usage of your App/Add-on. My documentation is also continually evolving as I respond to questions from users, add new features and learn about what users expect. A step-by-step troubleshooting guide is also useful.

Logging

Vital for responding to issues, particularly with coded components such as Custom Search Commands and Modular Inputs. Be verbose enough to be able to remotely diagnose issues. Send your logs to Splunk’s logging directory, and your users can then search over them in the “_internal” index via the Splunk Search app.
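As a simple illustration, a Python-based component can do this with nothing more than the standard library. The file name below is hypothetical, and the sketch assumes SPLUNK_HOME is set in the environment (which it typically is when Splunk launches your script):

import logging
import os
from logging.handlers import RotatingFileHandler

# Writing under $SPLUNK_HOME/var/log/splunk means the events end up in the _internal index.
log_file = os.path.join(os.environ['SPLUNK_HOME'], 'var', 'log', 'splunk', 'my_addon.log')

logger = logging.getLogger('my_addon')
logger.setLevel(logging.DEBUG)
handler = RotatingFileHandler(log_file, maxBytes=1000000, backupCount=5)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
logger.addHandler(handler)

logger.info('polling remote source host=%s', 'example.com')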

Listen to your users

They are a fantastic source of ideas and features. You’ll be able to think of many features, but the community will think of more once your App/Add-on is in the wild. Continually improve your App/Add-on based on feedback you get directly or via other forums such as Splunk Answers.

Support and Answers

Depending on the nature of your App/Add-on (community or paid for), your “SLA” will vary. I make my email available for direct contact and try to reply as promptly as possible, but where possible I also try to direct users to Splunk Answers so that the community can benefit from the collaborative knowledge base.

Crowdsourcing

Some Apps/Add-ons might have so many potential use cases and target data sources that it is not really feasible to perform completely thorough tests. Leverage the power of the community to help you out.

Reference Architectures

Your App/Add-on might not just be deployed on a single monolithic Splunk instance. In all likelihood it will be deployed in a distributed Splunk architecture. Also, the target data sources will have their own specific architectural properties and deployment scenarios. So some simple reference guidelines for how to deploy your App/Add-on in various scenarios will be very useful to your end users.

Github

Splunk Apps is good for distributing your releases, but is not a source code control system. I make all my code available on Github. In my mind this drives more community collaboration.

Legal Schmegal

OK, OK, I know you are a developer, but still be aware of any legal obligations, e.g. when using 3rd party libraries or exporting crypto code.

Platform independence

I try to make all my Apps/Add-ons supported on all available Splunk OS architectures. This is just going to expand your potential user base. For example, with Modular Inputs, I prefer to code in Python and avoid using any native platform calls or libraries.

Choose the right language

Although Splunk ships with a Python runtime, one of the great things about the platform is that for the most part you are really not limited to any one language; for example, you might use Java or C# to build a Modular Input if those languages were best suited to the task.

Licensing

Depending on the nature of your App/Add-on (community or paid for), you should provide an appropriate license. When uploading an App or Add-on to Splunk Apps, several different open source license options are available to choose from.

Develop / Build / Test / Release Environment

A vital part of the equation. I use Eclipse with various plugins (Egit for Github integration, various language plugins), Ant for building my releases, Sublime for text editing, and a few Linux and Windows virtual machines running on my Mac for deployment and testing. I’ve tried to make my environment as simple and streamlined as possible so I can turn around releases in a timely and robust manner.

Social Media is your friend

Get the word out! Twitter, LinkedIn, Facebook etc. Github is also a great social platform for publicizing your App/Add-on.

Credit where credit is due

If you have collaborators/ contributors, share the love, be cool and acknowledge them.

Look and Feel

You don’t have to be graphic designer of the year, but it’s worth a little bit of effort to create your own app icons and, at the simplest, perhaps a custom CSS. Perhaps you even have several offerings, so you can adopt a graphical theme across them all to tie in your “brand”. Be careful not to use copyrighted images though.

Naming

Be aware of naming conventions if you want your App/Add-on to be approved and published to Splunk Apps (see the documentation links below).

 

Aside from the actual technical aspects of creating Apps/Add-ons, those are the key learnings I’ve drawn from experience. Of course, over at splunk.com there is a wealth of other technical detail about building apps, packaging and submitting them.

 

http://docs.splunk.com/Documentation/Splunkbase/latest/Splunkbase/Introduction

http://apps.splunk.com/develop

http://dev.splunk.com

http://docs.splunk.com/Documentation/Splunk/6.0.2/AdvancedDev/Whatsinthismanual

 

So that’s a lot of What, How and Where. Stay tuned for Part 2 of this blog series for some ideas on Why you should develop.

Announcing the Splunk Add-on for Check Point OPSEC LEA 2.1.0


Check Point administrators, rejoice: the Splunk Add-on for Check Point OPSEC LEA 2.1.0 has been released! The free update provides useful improvements to almost every aspect of the add-on.

 

User Interface

The old OPSEC interface has been completely overhauled and streamlined. The interface is no longer stuck in the past and should look right at home on your Splunk 6 search heads.

manage

 

The manage connections page now offers a much more powerful overview of your Check Point connections. As you can see on the screenshot, every connection has a set of metrics available. These differ based upon the connection type. An audit connection displays the timestamp of the last event collected. A normal connection displays throughput over the last 24 hours and the last 15 minutes. Simply clicking on a table row will display these metrics. These searches also employ accelerated data models, so they’re quite fast. We hope these metrics will save you from constantly running searches for more information about your connections.

 

There are additional improvements for larger Check Point deployments. Have a hundred connections? That’s unfortunate, but connection name filtering is here to help! A quick search in the filter bar can whittle down the number of connections. Pagination helps keep the list of connections readable, only displaying twenty connections at a time. Finally, most of the columns can be sorted. This is particularly helpful when you need to group your connections by connection type.

 

create

 

With the old add-on, it was very time consuming to create a connection to a dedicated log server. As you may know, Check Point log servers don’t have a certificate authority. Dealing with this required an ugly workaround to pull the certificate from the MDS. We’re very happy to say that the new workflow fixes this problem! With the new version, the MDS can be specified directly in the pull cert workflow. Pulling a cert will also no longer lock your browser with a synchronous AJAX request!

 

Performance

Connections now support online (realtime) mode. This helps decrease latency, since events are pulled as soon as possible. The add-on typically waits 30 seconds between trips to the Check Point server. However, note that completely saturated connections will probably not gain much performance. Try experimenting with this feature to see if it will actually improve performance for your connections.

 

The new version is available at http://apps.splunk.com/app/1454/ and is completely free! I would like to thank Caleb, Alex, Cary and Roussi and the rest of the team for all the hard work they put into this new release. Happy Splunking!
