A comparison of all the popular time series databases. Prometheus, Influx, M3Db, Levitate
Time series analysis is nothing new and has been used in many industries for many years. However, in the world of software development, the concept has started gaining popularity in the past decade as new and exciting time series databases emerge.
While using a time series database (sometimes shortened to TSDB) isn’t necessary for most companies, it can provide a lot of benefits. For instance, setting up monitoring systems for digital services and IoT can quickly become a complex topic, with some companies hiring full-time people to work solely on setting up and managing their monitoring systems. However, even a basic use case like monitoring CPU usage can showcase the advantage of a time series database. When you look at the CPU usage of a virtual machine (VM), you don’t just want a momentary look into the system; you want to aggregate time-series data so you know what the usage looks like over time. This can help you identify critical issues like big spikes in usage.
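To make the CPU example concrete, here is a minimal sketch in plain Python of what "aggregating over time" means. The sample data, the 60-second window, and the 90% threshold are all made up for illustration; a real time series database does this kind of bucketing for you at far larger scale.

```python
from collections import defaultdict
from statistics import mean


def average_per_window(samples, window_seconds):
    """Group (unix_timestamp, cpu_percent) samples into fixed windows and average each window."""
    buckets = defaultdict(list)
    for timestamp, cpu_percent in samples:
        # Align each sample to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(cpu_percent)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def find_spikes(window_averages, threshold_percent):
    """Return the start times of windows whose average usage exceeds the threshold."""
    return [start for start, average in window_averages.items() if average > threshold_percent]


# Hypothetical samples: (seconds since start, CPU usage in percent).
samples = [(0, 20.0), (30, 22.0), (60, 95.0), (90, 97.0), (120, 25.0)]
averages = average_per_window(samples, window_seconds=60)
spikes = find_spikes(averages, threshold_percent=90.0)
```

A single momentary reading at second 120 would look healthy; only the aggregated view shows that the window starting at second 60 was a spike.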
Of course, this information can also be stored in a traditional database, so why use a time series database? Because it is a specially developed tool and provides features that are specifically needed for time-series datasets, like being able to ingest incredibly large numbers of data points per second as well as being able to query the data in a structured way for visualization.
In this article, you’ll be introduced to eight different options for time series databases. These options will be compared on four different parameters: ease of use (which includes maintenance and query language support), installation experience, pricing, and customer support.
Prometheus is one of the most popular time series databases available and is the de facto standard in systems like Kubernetes.
Ease of Use
If you’ve ever worked with or considered working with Kubernetes, it’s likely you’ll have heard about Prometheus. It’s the standard for Kubernetes monitoring because it’s very easy to use. For instance, even tools like InfluxDB expose Prometheus endpoints to query.
You can query Prometheus by using its PromQL query language, and you can instrument your applications by using one of its client libraries. Officially, the project supports popular programming languages like Go, Python, and Rust. However, there are also third-party libraries for languages like C#, Node.js, or PHP. You can view all the supported libraries in the Prometheus docs.
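As a rough illustration of querying with PromQL, the sketch below sends an expression to the Prometheus HTTP API using only the Python standard library. The server address and the metric name `http_requests_total` are assumptions chosen for the example; substitute whatever your own setup exposes.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Prometheus server is reachable at this address.
PROMETHEUS_URL = "http://localhost:9090"

# Assumption: a counter named http_requests_total exists on your server.
EXAMPLE_PROMQL = "rate(http_requests_total[5m])"


def build_query_url(base_url, promql):
    """Build the URL for an instant query against the /api/v1/query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_instant_query(base_url, promql):
    """Send the query and return the decoded JSON response (requires a running server)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as response:
        return json.load(response)


query_url = build_query_url(PROMETHEUS_URL, EXAMPLE_PROMQL)
```

Calling `run_instant_query(PROMETHEUS_URL, EXAMPLE_PROMQL)` against a live server returns a JSON document whose `data` field holds the query result.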
Installation of Prometheus
Setting up Prometheus is easy if you know how Prometheus works. Kubernetes does not ship with Prometheus by default, but it is commonly deployed into a cluster with a Helm chart or the Prometheus Operator. On other systems, you’ll have to download a binary, and from there, you can install it.
The tool is configured via a prometheus.yml file. This is where the needed Prometheus knowledge comes into play. It does come with a sample configuration you can use; however, it’s a good idea to understand this file before deploying the tool.
Prometheus is, by default, free to use. You don’t need any license and are free to deploy the tool however you want. If you aren’t interested in hosting the tool yourself, there are also managed versions of Prometheus. There isn’t a managed solution provided by the project itself, but companies like Last9, Google Cloud Platform, Microsoft Azure, and Amazon Web Services offer some form of managed service. This again cements Prometheus as one of the most popular time series databases.
Besides the official GitHub repo, where you can open an issue, there aren’t official support channels. When you’re working with Prometheus, you’re relying on the community to help you with your questions. However, because the tool is so popular, this shouldn’t scare you away. You are bound to find an existing answer quickly or get one quickly if you post a question.
InfluxDB, an open-source time series database platform, is a popular time series database due to its powerful API and toolset for real-time applications.
Ease of Use
One of the most important things to consider when choosing a new tool to implement in your toolchain is how easy it is to use. This will play a huge role for your developers when they need to implement it into your applications.
InfluxDB is popular in the eyes of many, partly because of how easy it is to use. In terms of writing to the database, you can choose among a variety of options, including no-code solutions like a Telegraf plug-in, or you can interact with the API directly using language-specific client libraries.
Querying data is straightforward as well, and you can use Flux, a scripting language developed specifically for interacting with InfluxDB. Similar to Python, Flux is an easy-to-use language.
Installation of InfluxDB
If you want to use InfluxDB while developing, you can install it locally on Windows or Mac. You can also find instructions for installing InfluxDB with a few simple commands in Docker or Kubernetes.
As soon as you’ve installed InfluxDB, there are no necessary configurations that need to be made, which means you can get started quickly.
InfluxDB can be downloaded and deployed for free on your own infrastructure. In this case, the base cost of InfluxDB is whatever you pay for that infrastructure. However, InfluxDB also offers a hosted solution called InfluxDB Cloud. Here, you can choose between a rate-limited free plan and a usage-based plan where you pay $0.002 USD per MB, $0.01 USD per 100 query executions, $0.002 USD per GB-hour of storage, and $0.09 USD per GB of data transferred out of InfluxDB.
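To get a feel for how the usage-based rates add up, here is a back-of-the-envelope estimate in Python. The rates are the ones quoted above; the workload numbers are invented, and the sketch assumes the per-MB rate applies to data written into InfluxDB. Check the current pricing page before relying on any of this.

```python
# Rates quoted in this article (USD). Assumption: the per-MB rate is for data written.
PRICE_PER_MB_WRITTEN = 0.002
PRICE_PER_100_QUERIES = 0.01
PRICE_PER_GB_HOUR_STORED = 0.002
PRICE_PER_GB_TRANSFERRED_OUT = 0.09


def estimate_monthly_cost(mb_written, query_executions, gb_stored, hours_stored, gb_transferred_out):
    """Return a per-category cost breakdown plus the total, in USD."""
    breakdown = {
        "writes": mb_written * PRICE_PER_MB_WRITTEN,
        "queries": (query_executions / 100) * PRICE_PER_100_QUERIES,
        "storage": gb_stored * hours_stored * PRICE_PER_GB_HOUR_STORED,
        "data_out": gb_transferred_out * PRICE_PER_GB_TRANSFERRED_OUT,
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


# Hypothetical workload: 10 GB written, 500,000 queries,
# 50 GB stored for a 730-hour month, and 100 GB transferred out.
estimate = estimate_monthly_cost(
    mb_written=10_000,
    query_executions=500_000,
    gb_stored=50,
    hours_stored=730,
    gb_transferred_out=100,
)
```

For this made-up workload, storage and query executions dominate the bill, not the writes, which is worth keeping in mind when comparing against self-hosting.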
The chances that you will have to contact anyone to find answers to your questions when working with InfluxDB are slim, as they provide you with well-written documentation. In addition to that, they have a great community where you are likely to find that someone else has already asked your question.
Should you run into a scenario where you’re not able to find an answer, you can either post your question in the community forum for yourself or contact the official InfluxDB support channel. However, this is only available for paying customers.
kdb+’s ease of use heavily depends on what your use case is. When you install it, you also install the q language, whose query syntax resembles SQL. You can launch q from the command line, and from there, you can interact with kdb+.
If you’re looking to implement kdb+ in an application, you’re going to have to spend some time researching. There are client libraries available for C# and Python; otherwise, you’ll have to rely on an ODBC connection.
The documentation for kdb+ can also be considered somewhat lackluster in terms of real-world usage, relying on a bunch of theoretical examples.
Installation of kdb+
Getting kdb+ installed isn’t necessarily a challenging experience; however, it can be a tedious one. You aren’t provided with any easy-to-use one-liners for installing the database. Should you be looking to install kdb+ in a cloud provider, you can find options for it on most marketplaces; however, it will be installed on a regular VM, as kdb+ is in no way developed to be cloud native.
kdb+ is free to use for personal use; however, for commercial use cases, you need to buy a commercial license. Unfortunately, there are no public numbers for what this license costs; however, anecdotal evidence by some users puts the cost at around $100,000 USD per year.
With a price tag like that of kdb+, it’s fair to expect quite a lot. Thankfully, buying a kdb+ license means you’re getting access to a big support team that is ready to help you with your queries. This includes access to a first-line support team but also to some of the best q programmers, making it very likely you will get an answer to your questions.
M3DB is a great option for those who aren’t necessarily looking to learn new technology but instead are looking for something that works with existing technologies.
Ease of Use
There are three ways to use M3DB. You can use the binary provided by the company, you can interact with it programmatically, or you can use any tool that supports Prometheus or InfluxDB.
M3DB wasn’t developed to revolutionize the market, but it tries to optimize what already exists. Because of this, it supports both the InfluxDB and Prometheus interfaces. You can even add it as a data source in Grafana by using the Prometheus type. This means that you have plenty of options for interacting with the tool in your daily workflow.
Installation of M3DB
There are two primary installation options for M3DB. One option is Kubernetes, where the installation procedure is pretty straightforward. The other option is to install M3DB directly on your machine, in which case it becomes more complex.
Deploying M3DB yourself includes networking, configuring the hosts, and setting up namespaces. You can view the complete installation instructions in the M3 docs.
M3DB has no paid options, essentially making it a free tool. However, nothing is ever free. When you manage something yourself, you still need to pay the engineers who set up and maintain the tool. You must also pay for the underlying resources needed to host the tool. If you don’t want to host M3DB for yourself, there is one third-party option for a managed solution: Aiven.
There are no official support channels for M3DB; instead, you must rely on the community. The community posts issues on the official GitHub repo and Slack channel. Even with no official support channels, you’re likely to find support for your questions, as there are over 375 active members on the Slack channel and 78 contributors across over 20 companies on the GitHub repo. However, it’s still something to keep in mind when considering M3DB.
Grafana is a popular tool for visualizing data and time series metrics; however, it relies on external data sources to do so. To complement it, Grafana Labs developed its own time series database, Grafana Mimir.
Ease of Use
Mimir has a unique position in time series databases because it isn’t a stand-alone product. The purpose of Mimir is to provide a long-term storage solution for Prometheus. Because of this, Mimir is also very easy to use in your daily workflow, as it behaves like an extension of Prometheus. This means you can use Prometheus connectors to query data from the tool.
Installation of Mimir
Mimir is likely one of the easiest products on this list to deploy. The tool is typically run from its container image, meaning you’ll either deploy it locally using Docker or, more likely, you’ll deploy it in Kubernetes. This can be done quickly with the officially supported Helm chart.
Mimir has no paid option, meaning the only cost is the price you will be spending for the underlying resources and the engineering hours put into setting up and maintaining the tool.
Mimir was developed by Grafana, meaning you can expect the same amount of support you would typically get from Grafana. When using a free account, you’re limited to the official documentation and the community. With no specific numbers on how many people the community consists of, you’re likely to get support, but it’s also expected that getting a proper answer can take a few days.
Used by many big companies like GitHub, Reddit, and Lyft, Graphite is a robust database that focuses on running well no matter what kind of hardware you’re running it on.
Ease of Use
Graphite is one of the less polished options on this list. This isn’t a commentary on the build of the tool but instead on the documentation. The documentation for Graphite is well-written, and there are answers to most of your questions; however, it lacks a sense of structure, meaning it can quickly become confusing when you’re working with it and need answers fast. Besides that, there are three ways to get data into Graphite: plaintext, pickle, or Advanced Message Queuing Protocol (AMQP).
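Of the three ingestion methods, plaintext is the simplest to try out. The sketch below formats a metric as a `path value timestamp` line and sends it over TCP. The metric path and host are made up, and port 2003 is assumed to be the plaintext listener, which is the usual default for Graphite’s Carbon daemon.

```python
import socket
import time


def format_plaintext_metric(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, path, value, timestamp=None):
    """Open a TCP connection to Carbon and send a single plaintext metric line."""
    line = format_plaintext_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as connection:
        connection.sendall(line.encode("ascii"))


# Hypothetical metric path and a fixed timestamp so the output is reproducible.
example_line = format_plaintext_metric("servers.web01.cpu.usage", 42.5, 1700000000)
```

With a Graphite instance running locally, `send_metric("localhost", 2003, "servers.web01.cpu.usage", 42.5)` would record one data point stamped with the current time.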
Installation of Graphite
Getting started with Graphite is easy because they provide a container image you can deploy within a few minutes. Should you decide containers aren’t for you, you can also choose to install from source, using pip, virtualenv, Synthesize, or REsynthesize.
Graphite is free to use, and there are no paid options. As with Mimir, the only cost will be in terms of paying for underlying resources and engineering hours.
Getting support for Graphite relies entirely on the community. This is apparent when you click the support link on the Graphite homepage, which leads directly to a Stack Exchange page. The community isn’t hugely active, with only a few questions being posted per month.
TimescaleDB is one of the biggest time series databases and is used by some of the biggest companies in the world, like Marvel Studios, Apple Inc., and Walmart.
Ease of Use
TimescaleDB is the only tool on this list that utilizes regular structured SQL as the basis for interacting with the system. In this case, it’s PostgreSQL specifically. If you’ve worked with PostgreSQL before, using TimescaleDB will be easy. Even if you haven’t worked with PostgreSQL before, it behaves like any other SQL database, and you can use your favourite object-relational mapping (ORM) tool in your application to write and query data.
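As a sketch of what "regular SQL" looks like in practice, the snippet below keeps the statements as plain strings and runs them through any Python DB-API cursor (for example, one from a PostgreSQL driver). The table and column names are hypothetical; `create_hypertable` and `time_bucket` are TimescaleDB functions, so check their current signatures in the Timescale docs.

```python
# Hypothetical schema for CPU measurements; plain PostgreSQL DDL.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS cpu_usage (
    time      TIMESTAMPTZ      NOT NULL,
    host      TEXT             NOT NULL,
    usage_pct DOUBLE PRECISION
);
"""

# TimescaleDB-specific: turn the table into a hypertable partitioned by time.
CREATE_HYPERTABLE_SQL = "SELECT create_hypertable('cpu_usage', 'time', if_not_exists => TRUE);"

# Ordinary SQL aggregation, with time_bucket grouping rows into 5-minute windows.
AVERAGE_PER_BUCKET_SQL = """
SELECT time_bucket('5 minutes', time) AS bucket, host, avg(usage_pct) AS avg_usage
FROM cpu_usage
WHERE time > now() - INTERVAL '1 hour'
GROUP BY bucket, host
ORDER BY bucket;
"""


def set_up_schema(cursor):
    """Create the table and convert it to a hypertable using a DB-API cursor."""
    cursor.execute(CREATE_TABLE_SQL)
    cursor.execute(CREATE_HYPERTABLE_SQL)


def fetch_recent_averages(cursor):
    """Return 5-minute average CPU usage per host for the last hour."""
    cursor.execute(AVERAGE_PER_BUCKET_SQL)
    return cursor.fetchall()
```

Apart from the two TimescaleDB function calls, nothing here differs from working with a regular PostgreSQL table, which is why existing SQL tooling carries over.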
Installation of TimescaleDB
There are two options when installing TimescaleDB. You can either deploy it on your own hardware or use Timescale Cloud. To deploy it locally, you just need to follow the instructions provided by Timescale, which can be done on all major platforms. If you want to use the hosted solution, it’s as easy as signing up for an account and creating a service.
If you deploy Timescale yourself, there are no costs related to the software directly. However, you have to keep in mind the cost of the underlying resources as well as the engineering hours you put into setting up and maintaining the tool. If you instead choose the hosted option, the price will depend on the size of the nodes. You can get it for $39 USD for 25 GB storage, 0.5 vCPUs, and 2 GB RAM. This goes all the way up to $9,229 USD for 16 TB storage, 32 vCPUs, and 128 GB RAM.
If you’re running the service on your own hardware, you can only expect to get support from the community. However, if you opt for the hosted version, a basic support package is included. With this package, you can ask all the questions you want; however, you should expect responses to arrive as support staff become available. You can also buy an enterprise package where response time is based on severity.
Apache Druid focuses heavily on being an analytics database, whereas other databases focus on simply being able to ingest, store, and query data fast. This makes Apache Druid a good choice for use cases like business intelligence and metric dashboards.
Ease of Use
Once you’ve gotten Apache Druid set up, you can choose to use either the provided GUI or the scripts downloaded during the installation process. Note that these are standalone scripts; Apache Druid does not come with a CLI tool. This means you cannot simply execute a command like druidctl load-data from anywhere in your terminal; you will always have to find the script. A minor inconvenience, but one to be aware of.
Getting data into Druid takes some getting used to. In most cases, you will be defining a JSON file called an “ingestion task spec”. This file will contain different parameters like what type of ingestion you want to use, what the data schema looks like, and, of course, where the data comes from. With the file created, you can tell Druid to ingest the data by clicking around in the GUI or using one of the provided scripts.
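To give a sense of what such a spec contains, the sketch below builds one as a Python dictionary and serializes it to JSON. The field names follow the shape of Druid’s native batch ingestion format as best understood here, and the data source name, directory, and column names are invented, so verify the structure against the Druid ingestion reference before using it.

```python
import json


def build_ingestion_spec(data_source, base_dir, file_filter, timestamp_column, dimensions):
    """Build a minimal batch ingestion spec covering the three parts described above."""
    return {
        "type": "index_parallel",  # what type of ingestion to use
        "spec": {
            "dataSchema": {  # what the data looks like
                "dataSource": data_source,
                "timestampSpec": {"column": timestamp_column, "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {  # where the data comes from
                "type": "index_parallel",
                "inputSource": {"type": "local", "baseDir": base_dir, "filter": file_filter},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


# Hypothetical values for illustration only.
spec = build_ingestion_spec(
    data_source="cpu_metrics",
    base_dir="/data/metrics",
    file_filter="*.json",
    timestamp_column="timestamp",
    dimensions=["host", "region"],
)
spec_json = json.dumps(spec, indent=2)
```

Writing `spec_json` to a file gives you something you can paste into the GUI or hand to one of the provided scripts.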
One of the significant things to be aware of when contemplating using Druid is that there is no native support for client libraries, meaning you will have to implement a custom or a third-party solution if you want to ingest data directly from one of your applications. The most common solution to this problem is to pipe your data into Apache Kafka, from which Druid can then automatically load the data.
Installation of Apache Druid
How easy it is to get started with Apache Druid depends on your prior knowledge. There are mainly two options for setting up the database: Docker or local. If you have existing knowledge about Docker, Apache Druid will be easy to set up as it works almost like any other Docker container. The main difference is that Apache Druid requires several dependencies, meaning you’ll have to use a Docker Compose file (or something resembling it) to spin up all the dependencies.
If you choose to install Apache Druid locally, you’ll have to be familiar with installing and setting up Java, as Druid requires a JDK. Neither of these options is inherently tricky, even if you don’t have prior knowledge. However, it’s something to keep in mind.
Apache Druid is completely free and open-source, with no options for paying anything in terms of license or hosting. However, you need to consider engineering hours and hosting costs. Apache Druid doesn’t offer any hosting services but it still needs to be hosted somewhere like Azure, AWS, or Google Cloud.
Because Apache Druid is open-source and supported by Apache, you can be sure to find a great community. It might not be as big as some of the other databases on this list. However, the members of the community are devoted.
The best way to find support for Apache Druid is via the #druid channel on Apache Slack. However, there’s also a forum, a meetup community, a GitHub repo, and a few more places where you can find support.
Some time series databases are better than others, especially when considering the use cases. Some of these solutions position themselves as complete solutions for time series workloads, like InfluxDB and Prometheus, while others focus on optimizing parts of the time series workload, like M3DB and Mimir.
The option that is right for you will largely depend on what use case you’re trying to solve and whether you want something that can be hosted. It’s important always to remember that the price may be more than you initially thought, and you need to consider the cost of hosting something yourself and maintaining the platform. It’s up to you to evaluate where you believe the cost is best spent.
If you’re interested in taking the work of scaling and managing a time-series database off your team’s plate, try Levitate. Our managed Prometheus database is built to deliver massive scale and high performance while taking the strain of managing your time-series database off your team.
Thanks to Kasper for contributing to this article.