Hal here, your friendly Lorax and developer evangelist! I wanted to share with everyone a guest post from a Splunker whom I met and see regularly at the Metro Atlanta Splunk User Group, Robert Labrie. Robert is a DevOps Engineer at The Network Inc, a company which builds solutions that prevent, detect and remediate misconduct to help companies maintain ethical cultures.
This post is about how Robert approached building out a new architecture, and of course, how to index the data generated by all of the components. Without further ado, take it away, Robert!
The team at TNWDevLabs started a new effort to develop an internal SaaS product. It’s a greenfield project, and since everything is new, it let us pick up some new technology and workflows, including neo4j and nodejs. In my role as DevOps Engineer, the big change was running all the application components in Docker containers hosted on CoreOS.
CoreOS is a minimalist version of Linux, basically life support for Docker containers. Installing applications directly on the host is discouraged, which raises a question: "How am I going to get my application logs into Splunk?"
With a more traditional Linux system, you would just install a Splunk forwarder on the host and pick up files from the file system.
With CoreOS, you can't really install applications on the host, so one option is to put a forwarder in every container.
That would work, but running multiple instances of Splunk on the same host is wasteful, and it still doesn't give you any information about the host itself.
CoreOS leverages SystemD, which has an improved logging facility called JournalD. All system events (updates, etcd, fleet, etc.) from CoreOS are written to JournalD, and since apps running inside Docker containers generally log to stdout, all of those events are sent to JournalD as well. With everything already funneling into one place, getting JournalD into Splunk is the obvious solution.
The first step was to get the Splunk Universal Forwarder into a docker container. There were already some around, but I wanted to take a different approach. The idea is instead of trying to manage .conf files and getting them into the container, I leverage the Deployment Server feature already built into Splunk. The result is a public image called tnwinc/splunkforwarder. It takes two parameters passed in as environment variables: DEPLOYMENTSERVER and CLIENTNAME. These two parameters are fed into $SPLUNK_HOME/etc/system/local/deploymentclient.conf when the container is started. This is the bare minimum to get the container up and talking to the deployment server.
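For reference, the two environment variables map onto the standard deployment-client settings. Assuming the values shown later in this post, the generated deploymentclient.conf would look something like this (a sketch, not the exact output of the image's startup script):

```ini
# $SPLUNK_HOME/etc/system/local/deploymentclient.conf
[deployment-client]
# CLIENTNAME environment variable
clientName = core-01

[target-broker:deploymentServer]
# DEPLOYMENTSERVER environment variable
targetUri = splunk.example.com:8089
```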
Setting up CoreOS
Running CoreOS, the container is started as a service. The service definition might look like:
ExecStart=/usr/bin/docker run -h %H -e DEPLOYMENTSERVER=splunk.example.com:8089 -e CLIENTNAME=%H -v /var/splunk:/opt/splunk --name splunk tnwinc/splunkforwarder
| Option | Description |
| --- | --- |
| `-h %H` | Sets the hostname inside the container to match the hostname of the CoreOS host. `%H` gets expanded to the CoreOS hostname when the .service file is processed. |
| `-e DEPLOYMENTSERVER` | The host and port of your deployment server. |
| `-e CLIENTNAME=%H` | This is the friendly client name. |
| `-v /var/splunk:/opt/splunk` | Exposes the real directory /var/splunk as /opt/splunk inside the container. This directory persists on disk when the container is restarted. |
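Put together, a minimal unit file might look like the sketch below. The ExecStart line is the one from above; the unit scaffolding (names, Requires/After, the cleanup in ExecStartPre) is illustrative and should be adapted to your setup:

```ini
# splunk.service (illustrative)
[Unit]
Description=Splunk Universal Forwarder container
After=docker.service
Requires=docker.service

[Service]
# Remove any stale container from a previous run (ignore failure with "-")
ExecStartPre=-/usr/bin/docker rm -f splunk
ExecStart=/usr/bin/docker run -h %H -e DEPLOYMENTSERVER=splunk.example.com:8089 -e CLIENTNAME=%H -v /var/splunk:/opt/splunk --name splunk tnwinc/splunkforwarder
ExecStop=/usr/bin/docker stop splunk
Restart=always

[Install]
WantedBy=multi-user.target
```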
Now that Splunk is running and has a place to live, we need to feed data to it. To do this, I set up another service, which uses the journalctl tool to export the journal to a text file. This is required because Splunk can't read JournalD's native binary format:
ExecStart=/bin/bash -c '/usr/bin/journalctl --no-tail -f -o json > /var/splunk/journald'
This command dumps everything already in the journal in JSON format (more on that later), then follows the journal. The process doesn't exit and continues writing to /var/splunk/journald, which appears as /opt/splunk/journald inside the container.
Also note that since the journald export file will continue to grow, I added an ExecStartPre directive that trims the journal before the export happens:
ExecStartPre=/usr/bin/journalctl --vacuum-size=10M
Since I’m not appending, every time the service starts, the file is replaced. You may want to consider a timer to restart the service on a regular interval, based on usage.
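Combining the two directives, the export service might look like the following sketch (the journalctl lines are the ones from above; the rest of the unit is illustrative):

```ini
# journald-export.service (illustrative)
[Unit]
Description=Export JournalD to JSON for the Splunk forwarder
After=systemd-journald.service

[Service]
# Trim the journal so each fresh export stays small
ExecStartPre=/usr/bin/journalctl --vacuum-size=10M
# Dump existing entries as JSON, then follow; ">" truncates the file on each start
ExecStart=/bin/bash -c '/usr/bin/journalctl --no-tail -f -o json > /var/splunk/journald'
Restart=always

[Install]
WantedBy=multi-user.target
```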
Get the data into Splunk
This was my first experience with Deployment Server; it's pretty slick. The clients came up and reached out to the deployment server on their own. My app lives in $SPLUNK_HOME/etc/deployment-apps/journald/local. I'm not going to re-hash the process of setting up a deployment server; there is great documentation at Splunk.com on how to do it.
My inputs.conf simply monitors the file inside the container:
[monitor:///opt/splunk/journald]
sourcetype = journald
The outputs.conf then feeds it back to the appropriate indexer:
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = indexer.example.com:9997

[tcpout-server://indexer.example.com:9997]
Do something useful with it
Nothing about getting the data into Splunk is special, but the props.conf is fun, so I'm covering it separately. Writing the journal out as JSON structures the data in a very nice format, and the following props.conf helps Splunk understand it:
[journald]
KV_MODE = json
MAX_TIMESTAMP_LOOKAHEAD = 10
NO_BINARY_CHECK = 1
SHOULD_LINEMERGE = false
TIME_FORMAT = %s
TIME_PREFIX = \"__REALTIME_TIMESTAMP\" : \"
pulldown_type = 1
TZ = UTC
| Setting | Description |
| --- | --- |
| `KV_MODE = json` | Magically parse JSON data. Thanks, Splunk! |
| `TIME_PREFIX` | This ugly bit of regex pulls the timestamp out of a field called __REALTIME_TIMESTAMP. |
| `TIME_FORMAT` | Standard strptime format; %s is seconds since the epoch. |
| `MAX_TIMESTAMP_LOOKAHEAD` | JournalD timestamps are in microseconds (16 digits). This setting tells Splunk to use only the first 10, i.e. the seconds. |
Once that app was published to all the agents, I could query the data out of Splunk. It looks great!
This is where the JSON output format from journald really shines. I get PID, UID, the command line, executable, message, and more. For me, coming from a Windows background, this is the kind of logging Windows has had for 20 years, now finally on Linux and easily analyzed with Splunk.
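To make that concrete, here is a trimmed, illustrative journal event as it might appear in the export file (field names are standard journald JSON fields; the values are made up):

```json
{ "__REALTIME_TIMESTAMP" : "1430496000000000", "_HOSTNAME" : "core-01", "_PID" : "612", "_UID" : "0", "_COMM" : "docker", "_CMDLINE" : "/usr/bin/docker run tnwinc/someapp", "MESSAGE" : "app listening on port 3000" }
```

With KV_MODE = json, each of these keys becomes a search-time field, so a search along the lines of `sourcetype=journald _COMM=docker` just works. Note how the props.conf above lines up with this event: TIME_PREFIX matches `"__REALTIME_TIMESTAMP" : "` and MAX_TIMESTAMP_LOOKAHEAD keeps the first 10 of the 16 digits, which is the epoch time in seconds.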
The journal provides a standard transport, so I don’t have to think about application logging on Linux ever again. The contract with the developers is: You get it into the journal, I’ll get it into Splunk.
Edit @ 5/1/15: added detail about managing size of journald export file.