A few weeks ago I had the pleasure of taking part in a panel at DeveloperWeek entitled “Next Gen Data Dev: NoSQL, NewSQL, Graph Databases, Hadoop, Machine Learning…”. On the panel I was joined by Emil Eifrem, CEO of Neo Technology and co-founder of Neo4j, as well as Ankur Goyal, Director of Engineering for MemSQL. The high-level theme was the kinds of tools that have emerged for developers to work with data, and whether or not a new breed of developer is emerging. The panel started off with quick introductions to each of the products.
- Ankur described MemSQL as the fastest database. MemSQL is a highly performant, distributed, transactional SQL database with an in-memory write-back store. Because it is fully SQL-compliant, it has the advantage of working with the existing ecosystem of SQL products. It combines the best of both worlds, pairing fast in-memory queries with the benefits of persistence. Unlike something like Redis, it is NOT just a key-value store, so you get the benefits of a schema. It also includes support for a new JSON type, which allows storing JSON blobs in a column and then indexing and querying against them.
- Emil described Neo4j as a graph database. Neo stores data as a collection of nodes with properties, connected by complex relationships. It then provides a graph query language, which allows you to traverse the nodes and ask rich questions about the data. Like MemSQL, it is transactional, and it performs and scales really well.
- I described Splunk as a product and platform for operational intelligence. Splunk can ingest event-based, time-stamped data from any source. Splunk applies the schema in real time rather than requiring the data to conform to a specific format up front. It allows you to aggregate all of the data in a single place and then query or visualize it.
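To make the idea of applying a schema at read time a little more concrete, here is a minimal sketch in plain Python. It is not Splunk's implementation or API. The sample events, the `key=value` format, and the function names `extract_fields`, `search`, and `count_by` are all assumptions made up for illustration. The point is only that the raw events are stored untouched and the fields are pulled out when a question is asked.

```python
import re

# Made-up sample data: raw events are kept exactly as they arrived.
# No schema is imposed at ingest time.
RAW_EVENTS = [
    "2014-02-20T10:15:01Z level=ERROR service=checkout msg=timeout",
    "2014-02-20T10:15:03Z level=INFO service=search latency_ms=42",
    "2014-02-20T10:15:07Z level=ERROR service=search msg=unavailable",
]

# Assumed event format: a timestamp followed by key=value pairs.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_event):
    """Apply a schema at read time: the timestamp plus whatever pairs exist."""
    timestamp, _, rest = raw_event.partition(" ")
    fields = dict(KEY_VALUE.findall(rest))
    fields["timestamp"] = timestamp
    return fields


def search(events, **criteria):
    """Return the extracted fields of every event matching all criteria."""
    results = []
    for raw in events:
        fields = extract_fields(raw)
        if all(fields.get(key) == value for key, value in criteria.items()):
            results.append(fields)
    return results


def count_by(events, field):
    """Count events per value of a field, skipping events that lack it."""
    counts = {}
    for raw in events:
        value = extract_fields(raw).get(field)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts
```

Calling `search(RAW_EVENTS, level="ERROR")` returns the two error events, even though the events do not all carry the same fields. A new field such as `latency_ms` becomes queryable as soon as it shows up in the data, with no migration step.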
Here is a summary of the major themes that I took away from the discussion.
- Big Data is just data. It is not some magic unicorn type of data; it literally is just data. What’s different is that the data often comes from many more sources, and in a large enough volume and frequency that traditional database solutions are not ideal for processing it.
- Graph databases are a new type of database that can solve some really interesting problems, particularly when the data contains deep relationships and networks. The canonical use case is a social network. You could easily model a network of friends, whether it be Twitter, Facebook, etc., in a graph database, and add all of the friends’ preferences, likes, and so on. Then you could start asking deep questions about the network and get an instant response. A traditional database would need a complex query full of joins that might take several hours to return the same results, or would require building a data warehouse.
- NewSQL is a new approach to an old solution. NewSQL databases are modern relational databases that are architected to be much more performant than their traditional counterparts. They do this through a variety of techniques, including moving as much data and processing as possible into memory and using write-back solutions for persistence. Solutions like MemSQL are embracing JSON as a storage format, allowing a hybrid of traditional structured storage and free-form document data. The big attraction of NewSQL is that you don’t have to re-architect your app. Your existing SQL workloads should get a huge performance gain simply by moving to a NewSQL store.
- There is no need for a new breed of developer. The panel was pretty much unanimous that tools and SDKs need to rise to the occasion and meet the developer, rather than vice versa.
- The era of one solution to rule them all is over. In the past it was common to try to use one database for everything. Now we live in a world where there are many different fit-for-purpose data storage options, from SQL and NewSQL to key-value stores (like Redis), document databases (MongoDB, Couch) and now graph storage. Building an application today is very much about choosing the right solution for the right problem. Fortunately we have many different options at our disposal, but this raises new challenges. For example, how do you as a developer manage an “entity” that is persisted across a SQL store and a graph database? How do you retrieve it? These are gaps that the tool chain needs to fill.
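The social network example from the graph database theme can be sketched in a few lines of plain Python. This is only an illustration of graph traversal, not Neo4j's query language or storage model. The `FRIENDS` and `LIKES` data and the function names `within_hops` and `people_who_like` are made up for the example. The question asked is "who within N hops of me likes a given thing?", which in a relational schema would typically mean one self-join on the friends table per hop.

```python
from collections import deque

# Made-up sample network: each person maps to the set of their friends.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol", "frank"},
    "erin": {"carol"},
    "frank": {"dave"},
}

# Made-up preferences attached to each person.
LIKES = {
    "alice": {"jazz"},
    "bob": {"hiking"},
    "carol": {"jazz", "chess"},
    "dave": {"jazz"},
    "erin": {"chess"},
    "frank": {"jazz", "hiking"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: map each reachable person to their distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the starting person is not part of the answer
    return distance


def people_who_like(graph, likes, start, interest, max_hops):
    """Names of people within max_hops of start who like the interest."""
    reachable = within_hops(graph, start, max_hops)
    return sorted(p for p in reachable if interest in likes.get(p, ()))
```

For instance, `people_who_like(FRIENDS, LIKES, "alice", "jazz", 2)` walks outward from `alice` and only touches the people it actually reaches. Going one hop deeper is a change to a parameter rather than another join in the query.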
Splunk stands at an interesting intersection for all of these different sources. It is not a database per se, but it can act as an intermediary for storing and retrieving data with these various stores.
For example, we discussed the idea of using Splunk to take in events and then send updates back to Neo to update the graph, or possibly doing a lookup against the graph based on an incoming event. It could operate in a similar fashion with MemSQL. Another interesting idea would be to leverage MemSQL’s support for JDBC to query against Splunk’s new ODBC connector.
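The two integration ideas above, updating the graph from events and looking up graph context for an incoming event, might look roughly like this. It is a sketch under heavy assumptions: `InMemoryGraph` is a hypothetical stand-in for a graph store, not a Neo4j driver, and the event shape (`user`, `action`, `target`) and relationship names are invented for the example. Nothing here reflects an actual Splunk or Neo4j API.

```python
class InMemoryGraph:
    """Hypothetical stand-in for a graph store; not a real driver API."""

    def __init__(self):
        # node -> {relationship name -> set of target nodes}
        self.edges = {}

    def add_edge(self, source, relationship, target):
        self.edges.setdefault(source, {}).setdefault(relationship, set()).add(target)

    def neighbors(self, source, relationship):
        return set(self.edges.get(source, {}).get(relationship, set()))


def apply_event(graph, event):
    """Update path: turn one incoming event into a graph update."""
    if event["action"] == "follow":
        graph.add_edge(event["user"], "FOLLOWS", event["target"])
    elif event["action"] == "like":
        graph.add_edge(event["user"], "LIKES", event["target"])
    # Any other action leaves the graph unchanged.


def enrich_event(graph, event):
    """Lookup path: attach graph context to a copy of an incoming event."""
    enriched = dict(event)
    enriched["follows"] = sorted(graph.neighbors(event["user"], "FOLLOWS"))
    return enriched
```

In this shape the event pipeline stays the system of record for what happened, while the graph holds the derived relationships that are awkward to compute from raw events alone.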
It was a great experience to be part of the panel, and we agreed more than we disagreed. I left feeling excited about the continued energy and innovation in this space and the part we play in it. Products like Neo, MemSQL and Splunk are offering newer and more efficient ways to process the increasing volume, sources, and types of data. All of this makes it easier for developers to get their jobs done.