
Jul 15th, ‘22/13 min read

Comparing Popular Time Series Databases

A comparison of all the popular time series databases: Prometheus, InfluxDB, M3DB, Levitate.


Time series analysis is nothing new and has been used in many industries for years. However, in software development, the concept has started gaining popularity in the past decade as new and exciting time series databases emerge. This post explores the best time series databases and compares them against each other.

While using a time series database (sometimes shortened to TSDB) isn’t necessary for most companies, it can provide many benefits. For instance, setting up monitoring systems for digital services and IoT can quickly become complex, with some companies hiring full-time people to work solely on setting up and managing their monitoring systems. However, even a basic use case like monitoring CPU usage can showcase the advantage of a time series database. When you look at the CPU usage of a virtual machine (VM), you don’t just want a momentary look into the system; you want to aggregate time-series data to know what the usage looks like over time. This can help you identify critical issues like big spikes in usage.
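To make the CPU example concrete, here is a minimal sketch of what aggregating over time means. It isn’t tied to any of the databases below; the function names, the 60-second window, and the 90% threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def bucket_averages(samples, window_seconds=60):
    """Group (unix_timestamp, value) samples into fixed windows and average each window."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align every timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def find_spikes(averages, threshold=90.0):
    """Return the window start times whose average exceeds the threshold."""
    return [start for start, avg in averages.items() if avg > threshold]


# Hypothetical CPU usage samples: (seconds since start, percent used).
samples = [(0, 20.0), (30, 30.0), (60, 95.0), (90, 97.0), (120, 25.0)]
averages = bucket_averages(samples)
print(averages)               # {0: 25.0, 60: 96.0, 120: 25.0}
print(find_spikes(averages))  # [60]
```

A time series database does this kind of windowing for you, at a much higher ingestion rate and with a query language on top.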

Of course, this information can also be stored in a traditional database, so why use a time series database? Because it is a specially developed tool that provides features that are specifically needed for time-series datasets, like being able to ingest incredibly large numbers of data points per second and query the data in a structured way for visualization.

In this article, you’ll be introduced to eight options for time series databases. These options will be compared on four different parameters: ease of use (which includes maintenance and query language support), installation experience, pricing, and customer support.


Prometheus

Prometheus is one of the most popular time series databases available and is the de facto standard for monitoring in systems like Kubernetes.

Ease of Use

If you’ve ever worked with or considered working with Kubernetes, you’ll likely have heard about Prometheus. It’s the standard for Kubernetes monitoring, in part because it’s very easy to use. Its reach extends to other tools as well: even InfluxDB exposes Prometheus endpoints to query.

You can interact with Prometheus using its PromQL query language or one of its client libraries. Officially, the project supports popular programming languages like Go, Python, and Rust. Third-party libraries also exist for languages like C#, Node.js, and PHP. You can view all the supported libraries in the Prometheus docs.
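As a rough illustration of querying with PromQL, the sketch below talks to the Prometheus HTTP API (the /api/v1/query endpoint) using only the Python standard library. The base URL http://localhost:9090 and the helper names are assumptions for this example, not part of any official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def run_query(base_url, promql):
    """Send the query to a running Prometheus server (assumed reachable at base_url)."""
    with urlopen(build_query_url(base_url, promql)) as response:
        return parse_instant_vector(json.load(response))


# Building the URL does not need a running server.
print(build_query_url("http://localhost:9090", "up"))
```

With a server running, run_query("http://localhost:9090", "up") would return one pair per scraped target.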

Installation of Prometheus


Setting up Prometheus is easy if you know how Prometheus works. Kubernetes doesn’t ship with Prometheus, but it’s commonly installed there through a Helm chart or an operator. On other systems, you download a binary and run it from there.

The tool is configured via a prometheus.yml file. This is where the needed Prometheus knowledge comes into play. It does come with a sample configuration you can use; however, it’s a good idea to understand this file before deploying the tool.
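To give a feel for what goes into prometheus.yml, here is a small sketch that renders a minimal configuration with one scrape job. The keys used (global, scrape_interval, scrape_configs, job_name, static_configs, targets) are standard Prometheus configuration keys; the helper function, the job name, and the target address are assumptions for illustration.

```python
def render_prometheus_config(job_name, targets, scrape_interval="15s"):
    """Render a minimal prometheus.yml with a single statically configured job."""
    target_list = ", ".join(f'"{target}"' for target in targets)
    return (
        "global:\n"
        f"  scrape_interval: {scrape_interval}\n"
        "scrape_configs:\n"
        f'  - job_name: "{job_name}"\n'
        "    static_configs:\n"
        f"      - targets: [{target_list}]\n"
    )


# Hypothetical job that scrapes a single exporter on the local machine.
print(render_prometheus_config("node", ["localhost:9100"]))
```

A real deployment usually adds more to this file, such as alerting rules and service discovery, which is why reading the sample configuration first pays off.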

Cost

Prometheus is free to use. You don’t need any license and can deploy the tool however you want. If you aren’t interested in hosting the tool yourself, there are managed versions of Prometheus. The Prometheus project doesn’t provide a managed solution itself, but companies like Last9, Google Cloud Platform, Microsoft Azure, and Amazon Web Services offer managed services. This again cements Prometheus as one of the most popular time series databases.

Support

Besides the official GitHub repo, where you can open an issue, there aren’t any official support channels. When working with Prometheus, you rely on the community to help you with your questions. However, this shouldn’t scare you away: because the tool is so popular, you are bound to find an existing answer quickly or get one quickly if you post a question.

InfluxDB

InfluxDB, an open-source time series database platform, is popular due to its powerful API and toolset for real-time applications.

Ease of Use

One of the most important things to consider when choosing a new tool for your toolchain is its ease of use. This will play a huge role for your developers when they need to integrate it into your applications.

InfluxDB is popular in the eyes of many, partly because of how easy it is to use. For writing to the database, you can choose among various options, from no-code solutions like a Telegraf plug-in to interacting with the API directly through language-specific client libraries.

Querying data is also straightforward, and you can use Flux, a scripting language developed specifically for interacting with InfluxDB. Similar to Python, Flux is an easy-to-use language.
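As a hedged illustration of what a Flux query looks like, the sketch below assembles one as a string. The Flux functions used (from, range, filter, aggregateWindow) exist in Flux; the helper function and the bucket, measurement, and field names are made up for this example.

```python
def build_flux_query(bucket, measurement, field, window="1m", lookback="-1h"):
    """Assemble a Flux query that averages one field over fixed windows."""
    return "\n".join([
        f'from(bucket: "{bucket}")',
        f"  |> range(start: {lookback})",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}" and r._field == "{field}")',
        f"  |> aggregateWindow(every: {window}, fn: mean)",
    ])


# Hypothetical bucket and measurement names.
print(build_flux_query("monitoring", "cpu", "usage_percent"))
```

The pipe-forward operator (|>) chains each step, so the query reads top to bottom: pick a bucket, limit the time range, filter the series, then aggregate.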

Installation of InfluxDB

Getting started with InfluxDB is easy, as you can find well-written instructions for installing it on the most popular platforms.

For instance, if you want to use it while developing, you can install it locally on Windows or Mac. You can also find instructions for installing InfluxDB with simple commands in Docker or Kubernetes.

Once you’ve installed InfluxDB, no further configuration is required, which means you can get started quickly.

Cost

InfluxDB can be downloaded and deployed for free on your own infrastructure. In this case, the base cost of InfluxDB is whatever you pay for that infrastructure. However, InfluxDB also offers a hosted solution called InfluxDB Cloud. Here, you can choose between a free plan that is rate-limited and a usage-based plan where, at the time of writing, you pay $0.002 USD per MB, $0.01 USD per 100 query executions, $0.002 USD per GB-hour of storage, and $0.09 USD per GB of data transferred out of InfluxDB.
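To see how the usage-based prices add up, here is a small back-of-the-envelope estimator. It uses only the four rates quoted above; the assumption that the per-MB rate applies to data written, the function name, and the example workload are all illustrative and not taken from InfluxDB’s pricing page.

```python
from decimal import Decimal

# Rates quoted in this article, in USD.
PRICE_PER_MB = Decimal("0.002")           # assumed to apply to data written
PRICE_PER_100_QUERIES = Decimal("0.01")
PRICE_PER_GB_HOUR = Decimal("0.002")
PRICE_PER_GB_OUT = Decimal("0.09")


def estimate_cost(mb_written, query_executions, gb_stored, hours, gb_out):
    """Estimate the usage-based bill for one billing period."""
    writes = Decimal(str(mb_written)) * PRICE_PER_MB
    queries = Decimal(str(query_executions)) / Decimal(100) * PRICE_PER_100_QUERIES
    storage = Decimal(str(gb_stored)) * Decimal(str(hours)) * PRICE_PER_GB_HOUR
    transfer = Decimal(str(gb_out)) * PRICE_PER_GB_OUT
    return writes + queries + storage + transfer


# Hypothetical month: 10,000 MB written, 50,000 queries,
# 5 GB stored for 720 hours, and 20 GB transferred out.
total = estimate_cost(10_000, 50_000, 5, 720, 20)
print(f"${total:.2f}")  # $34.00
```

In this made-up workload, writes ($20.00) dominate, followed by storage ($7.20), queries ($5.00), and transfer ($1.80).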

Support

The chances that you will have to contact anyone to find answers to your questions when working with InfluxDB are slim, as they provide you with well-written documentation. In addition, they have a great community where you are likely to find that someone else has already asked your question.

Should you run into a scenario where you cannot find an answer, you can either post your question in the community forum or contact the official InfluxDB support channel. However, the latter is only available to paying customers.

kdb+

kdb+ is a database that can be accessed by various interfaces, such as the standardized Open Database Connectivity (ODBC) interface or client libraries in languages like Python.

Ease of Use

kdb+’s ease of use depends heavily on your use case. When you install it, you also install q, a language that heavily resembles SQL. You can launch q from the command line and interact with kdb+ from there.

If you want to implement kdb+ in an application, you must spend some time researching. Client libraries are available for C# and Python; otherwise, you must rely on an ODBC connection.

The documentation for kdb+ can also be considered somewhat lackluster regarding real-world usage, relying on many theoretical examples.

Installation of kdb+

Getting kdb+ installed isn’t necessarily a challenging experience; however, it can be tedious. You aren’t provided with any easy-to-use one-liners for installing the database. Should you be looking to install kdb+ in a cloud provider, you can find options for it on most marketplaces; however, it will be installed on a regular VM, as kdb+ is in no way developed to be cloud native.

Cost

kdb+ is free for personal use; however, you need to buy a license for commercial use cases. Unfortunately, there are no public numbers for what this license costs, but anecdotal evidence from some users puts the cost at around $100,000 USD per year.

Support

With a price tag like that of kdb+, it’s fair to expect good support. Thankfully, buying a kdb+ license gives you access to a big support team ready to help you with your queries. This includes a first-line support team as well as some of the best q programmers, making it very likely you will get an answer to your questions.

M3DB

M3DB is a great option for those who aren’t necessarily looking to learn new technology but are looking for something that works with existing technologies.

Ease of Use

There are three ways to use M3DB. You can use the binary provided by the project, interact with it programmatically, or use any tool that supports Prometheus or InfluxDB.

M3DB wasn’t developed to revolutionize the market, but it tries to optimize what already exists. Because of this, it supports both the InfluxDB and Prometheus interfaces. You can even add it as a data source in Grafana by using the Prometheus type. This means you have plenty of options for interacting with the tool in your daily workflow.

Installation of M3DB

There are two primary installation options for M3DB. One option is Kubernetes, where the installation procedure is pretty straightforward. The other option is to install M3DB directly on your machine, in which case it becomes more complex.

Deploying M3DB yourself involves networking, configuring the hosts, and setting up namespaces. You can view the complete installation instructions in the M3DB docs.

Cost

M3DB has no paid options, essentially making it a free tool. However, nothing is ever free. When you manage something yourself, you must pay the engineers who set up and maintain the tool. You must also pay for the underlying resources needed to host the tool. If you don’t want to host M3DB yourself, there is one third-party option for a managed solution: Aiven.

Support

There are no official support channels for M3DB; instead, you must rely on the community, which is active on the official GitHub repo and Slack channel. Even with no official support channels, you’re likely to find support for your questions, as there are over 375 active members on the Slack channel and 78 contributors across over 20 companies on the GitHub repo. However, the lack of official support is still something to keep in mind when considering M3DB.

Mimir


Grafana is a popular tool for visualizing data and time series metrics; however, it does so by leveraging various external data sources. Because of that, Grafana developed its own time series database, Grafana Mimir.

Ease of Use

Mimir is unique among the time series databases on this list because it isn’t a stand-alone product. The purpose of Mimir is to provide a long-term storage solution for Prometheus. Because of this, Mimir is also very easy to use in your daily workflow, as it behaves like an extension of Prometheus. This means you can use Prometheus connectors to query data from the tool.

Installation of Mimir

Mimir is likely one of the easiest products on this list to deploy. The tool is only distributed as a container image, meaning you’ll either deploy it locally using Docker or, more likely, in Kubernetes. The latter can be done quickly with the officially supported Helm chart.

Cost

Mimir has no paid option; the only cost is what you pay for the underlying resources and the engineering hours put into setting up and maintaining the tool.

Support

Mimir was developed by Grafana, meaning you can expect the same level of support you would typically get from Grafana. When using a free account, you’re limited to the official documentation and the community. There are no specific numbers on how big the community is; you’re likely to get support, but expect that getting a proper answer can take a few days.

Graphite

Used by many big companies like GitHub, Reddit, and Lyft, Graphite is a robust database that focuses on running well no matter what kind of hardware you’re running it on.

Ease of Use

Graphite is one of the less polished options on this list. This isn’t a commentary on the build of the tool but instead on the documentation. The documentation for Graphite is well-written, and there are answers to most of your questions; however, it lacks a sense of structure, meaning it can quickly become confusing when you’re working with it and need answers fast. Besides that, there are three ways to get data into Graphite: plaintext, pickle, or Advanced Message Queuing Protocol (AMQP).
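Of the three ingestion methods, the plaintext protocol is the simplest: each data point is one line containing the metric path, the value, and a Unix timestamp, sent over TCP. The sketch below shows one way to do this from Python. The helper names and the metric path are hypothetical, and port 2003 is assumed to be the plaintext listener on your Graphite host.

```python
import socket
import time


def format_plaintext_metric(path, value, timestamp=None):
    """Format one data point as '<metric path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send a formatted line to the plaintext listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Formatting works without a running Graphite instance.
line = format_plaintext_metric("servers.web01.cpu.usage", 42.5, 1657843200)
print(repr(line))  # 'servers.web01.cpu.usage 42.5 1657843200\n'
```

With Graphite running, send_metric(line) would record the data point under the dotted metric path.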

Installation of Graphite

Getting started with Graphite is easy because they provide a container image you can deploy within a few minutes. Should you decide containers aren’t for you, you can also choose to install from source, using pip, virtualenv, Synthesize, or REsynthesize.

Cost

Graphite is free to use, and there are no paid options. As with Mimir, the only cost will be paying for underlying resources and engineering hours.

Support

Getting support for Graphite relies entirely on the community. This becomes apparent when you click the support link on the Graphite homepage, which takes you directly to a Stack Exchange page. The community isn’t hugely active, with only a few questions being posted per month.

TimescaleDB

TimescaleDB is one of the biggest time series databases used by some of the biggest companies in the world, like Marvel Studios, Apple Inc., and Walmart.

Ease of Use

TimescaleDB is the only tool on this list that uses regular structured SQL as the basis for interacting with the system. In this case, it’s PostgreSQL specifically. If you’ve worked with PostgreSQL before, using TimescaleDB will be easy. Even if you haven’t, it’s just like any other SQL-like language, and you can use your favorite object-relational mapping (ORM) tool in your application to write and query data.
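Because TimescaleDB speaks SQL, a time series query is just a regular SELECT with a time-bucketing function. The sketch below builds two such statements as strings: create_hypertable and time_bucket are TimescaleDB functions, while the table name, column names, and helper function are assumptions for this example. Running the statements would require a PostgreSQL driver or ORM, which isn’t shown here.

```python
# Converts a regular PostgreSQL table (hypothetical name) into a hypertable
# partitioned on its 'time' column.
CREATE_HYPERTABLE_SQL = "SELECT create_hypertable('cpu_metrics', 'time');"


def build_bucket_query(table, value_column, interval="5 minutes"):
    """Build a query that averages a column over fixed time buckets.

    String interpolation is used for illustration only; never build SQL
    this way from untrusted input.
    """
    return (
        f"SELECT time_bucket('{interval}', time) AS bucket, "
        f"avg({value_column}) AS avg_value "
        f"FROM {table} "
        "GROUP BY bucket ORDER BY bucket;"
    )


print(CREATE_HYPERTABLE_SQL)
print(build_bucket_query("cpu_metrics", "usage"))
```

Apart from the two TimescaleDB functions, everything here is plain PostgreSQL, which is what makes the tool approachable for anyone who already knows SQL.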

Installation of TimescaleDB

There are two options when installing TimescaleDB. You can either deploy it on your own hardware or use Timescale Cloud. To deploy it locally, you need to follow the instructions provided by Timescale, which can be done on all major platforms. If you want to use the hosted solution, it’s as easy as signing up for an account and creating a service.

Cost

If you deploy Timescale yourself, there are no direct costs related to the software. However, you have to consider the cost of the underlying resources and the engineering hours you put into setting up and maintaining the tool. If you instead choose the hosted option, the price will depend on the size of the nodes. At the low end, you can get it for $39 USD for 25 GB storage, 0.5 vCPUs, and 2 GB RAM. At the high end, it costs $9,229 USD for 16 TB storage, 32 vCPUs, and 128 GB RAM.

Support

If you’re running the service on your own hardware, you can only expect to get support from the community. However, a basic support package is included if you opt for the hosted version. With this package, you can ask all your questions, but responses arrive only as support staff become available. You can also buy an enterprise package where response time is based on severity.

Apache Druid

Apache Druid focuses heavily on being an analytics database, whereas many other databases focus simply on ingesting, storing, and querying data fast. This focus makes Apache Druid a good choice for use cases like business intelligence and metric dashboards.

Ease of Use

Once you’ve gotten Apache Druid set up, you can choose to either use the provided GUI or the scripts downloaded during installation. Note that these are standalone scripts; Apache Druid does not come with a CLI tool. This means you cannot simply execute a command like druidctl load-data (a hypothetical example) from anywhere in your terminal; you will always have to find the script. A minor inconvenience, but one to be aware of.

Getting data into Druid takes some getting used to. In most cases, you will define a JSON file called an “ingestion task spec”. This file will contain different parameters like what type of ingestion you want to use, what the data schema looks like, and where the data comes from. With the file created, you can tell Druid to ingest the data by clicking around in the GUI or using one of the provided scripts.
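To give a sense of what an ingestion task spec contains, the sketch below builds a minimal native batch spec as a Python dictionary and prints it as JSON. The overall layout (dataSchema, ioConfig, and tuningConfig inside spec) follows Druid’s native batch format as best understood here; the data source name, column names, and file path are hypothetical, so treat it as a starting point to check against the Druid docs rather than a complete spec.

```python
import json


def build_ingestion_spec(datasource, base_dir, file_filter):
    """Build a minimal native batch ingestion spec for local JSON files."""
    return {
        "type": "index_parallel",  # the type of ingestion
        "spec": {
            # What the data schema looks like.
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["host"]},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            # Where the data comes from.
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "local", "baseDir": base_dir, "filter": file_filter},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


# Hypothetical data source and input location.
spec = build_ingestion_spec("cpu_metrics", "/tmp/druid-input", "cpu-*.json")
print(json.dumps(spec, indent=2))
```

You would save the printed JSON to a file and hand it to Druid through the GUI or one of the provided scripts.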

One significant thing to be aware of when considering Druid is that there is no native support for client libraries. You must implement a custom or a third-party solution to ingest data directly from one of your applications. The most common way to solve this problem is to pipe your data into Apache Kafka, from which Druid can automatically load the data.

Installation of Apache Druid

How easy Apache Druid is to get started depends on your prior knowledge. There are mainly two options for setting up the database: Docker or local. If you have existing knowledge about Docker, Apache Druid will be easy to set up as it works almost like any other Docker container. The main difference is that Apache Druid requires several dependencies, meaning you’ll have to use a Docker Compose file (or something resembling it) to spin up all the dependencies.

If you choose to install Apache Druid locally, you must be familiar with installing and setting up Java, as Druid requires a JDK. Neither of these options is inherently tricky, even if you don’t have prior knowledge. However, it’s something to keep in mind.

Cost

Apache Druid is free and open-source, with no paid options for licensing or hosting. However, you need to consider engineering hours and hosting costs. The project doesn’t offer hosting services, so you must host it yourself somewhere like Azure, AWS, or Google Cloud.

Support

Because Apache Druid is open-source and supported by Apache, you can expect to find a great community. It might not be as big as those of some other databases on this list, but its members are devoted.

The best way to find support for Apache Druid is via the #druid channel on Apache Slack. There’s also a forum, a meetup community, a GitHub repo, and a few more places to find support.

Conclusion

Some time series databases are better than others, especially when considering the use cases. Some of these solutions are complete solutions for time series workloads, like InfluxDB and Prometheus, while others focus on optimizing parts of the time series workload, like M3DB and Mimir.

The option that is right for you will largely depend on what use case you’re trying to solve and whether you want something that can be hosted. It’s important to remember that the price may be more than you initially thought, and you need to consider the cost of hosting something and maintaining the platform. It’s up to you to evaluate where you believe the cost is best spent.

If you’re interested in taking the work of scaling and managing a time-series database off your team’s plate, try Levitate. Our managed Prometheus-compatible database is built to deliver massive scale and high performance while taking the strain of managing your time-series database off your team.

Thanks to Kasper for contributing to this article.
