InfluxDB Tips #1

InfluxDB is designed for fast reads and writes, at the expense of not supporting in-place updates the way a traditional relational database does.

If you only need to change or add field values, and not the measurement name or tags, then you can overwrite existing data by writing points with the same measurement, tag set (tag keys and values), and timestamp. You can think of the combination of measurement, tags, and timestamp as forming a primary key, or unique row.
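To make the “primary key” idea concrete, here is a toy in-memory model in Python. It is not InfluxDB code. It only assumes InfluxDB's documented duplicate-point behaviour: a write with the same measurement, tag set, and timestamp as an existing point merges its fields into that point, and the new values win any ties. The helper names (`series_key`, `write`) are hypothetical.

```python
# Toy model (not InfluxDB code): a point's identity is measurement + tag set +
# timestamp, and re-writing the same identity merges/overwrites its fields.

def series_key(measurement, tags, timestamp):
    """Build a hashable identity; tag order does not matter, so sort the pairs."""
    return (measurement, tuple(sorted(tags.items())), timestamp)


def write(store, measurement, tags, timestamp, fields):
    """Write a point: fields are merged into any existing point with the same identity."""
    key = series_key(measurement, tags, timestamp)
    store.setdefault(key, {}).update(fields)


store = {}
write(store, "cpu", {"dc": "ABC", "host": "web1"}, 1000, {"usage_idle": 90.0})
# Same measurement, tags (in a different order) and timestamp:
# usage_idle is overwritten and usage_user is added to the same point.
write(store, "cpu", {"host": "web1", "dc": "ABC"}, 1000, {"usage_idle": 85.0, "usage_user": 5.0})
# A different tag value is a different point; the original is untouched.
write(store, "cpu", {"dc": "XYZ", "host": "web1"}, 1000, {"usage_idle": 85.0})

print(len(store))
```

The last write shows why changing a tag can't be done as an overwrite: it creates a second point instead of modifying the first.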

If you need to change the measurement name or change tags in any way, then it’s quite an involved process that requires some thought to avoid making mistakes.

Disclaimer

I am not an expert or authority on InfluxDB; I’m just sharing my real-world experience. I cannot be held responsible if anyone follows the instructions in this article without fully understanding them and the consequences for their data.

It’s very easy to overlook something when performing the steps described below and end up with lost or incorrect data. This is why it’s a good idea to take a backup of the bucket (or the whole instance) and test the process on a copy before attempting it on your “live” data.

Changing a tag

Say, for example, you want to change the value of the dc tag from ABC to XYZ for the cpu measurement collected by Telegraf.

The first step is to make sure that any new data from Telegraf already uses the new dc value; otherwise you will just end up having to run through this process again.

The next step is to make sure the query returns exactly the right set of data. I do this in the built-in InfluxDB Data Explorer web interface, switching the display mode from the default Graph to Table so that I can see the group keys. At this stage I use a fairly small time range (1 hour) so that the query doesn’t take too long to return, or return so much data that it ends up “upsetting” the browser.

from(bucket: "telegraf/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["dc"] == "ABC")

Once I’m happy that my query returns only the data I want to change, I append the following two functions.

  |> set(key: "dc", value: "XYZ")
  |> to(bucket: "telegraf/autogen")

The set() function changes the dc tag from its existing value to XYZ, and the to() function writes the data out, in this case to the same bucket we are reading from.
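The effect of set() followed by to() on the same bucket can be sketched with a toy model in plain Python (not Flux). Rows are dicts shaped loosely like Flux tables, and `row_key`, `write_rows` and `retag` are hypothetical helpers. The matching rows are copied with the tag changed and written back. Nothing is removed, so the old rows stay alongside the new ones.

```python
# Toy model in plain Python, not Flux: each row is one stored value with
# _measurement, _time, _field, _value plus tag columns (keys without a leading "_").

def row_key(row):
    """Identity of a stored value: measurement, tag set, timestamp and field."""
    tags = tuple(sorted((k, v) for k, v in row.items() if not k.startswith("_")))
    return (row["_measurement"], tags, row["_time"], row["_field"])


def write_rows(bucket, rows):
    """Like to(): write rows into the bucket; an existing identity is overwritten."""
    for row in rows:
        bucket[row_key(row)] = row


def retag(bucket, measurement, tag, old, new):
    """filter() -> set() -> to() on the same bucket; returns the number of rows written."""
    # filter(): rows of the measurement that still carry the old tag value
    selected = [r for r in bucket.values()
                if r["_measurement"] == measurement and r.get(tag) == old]
    # set(): copies with the tag changed; the originals are left as they are
    rewritten = [{**r, tag: new} for r in selected]
    # to(): write the copies back; nothing is deleted, so the old rows remain
    write_rows(bucket, rewritten)
    return len(rewritten)


bucket = {}
write_rows(bucket, [
    {"_measurement": "cpu", "dc": "ABC", "_time": 1, "_field": "usage_idle", "_value": 90.0},
    {"_measurement": "cpu", "dc": "ABC", "_time": 2, "_field": "usage_idle", "_value": 80.0},
    {"_measurement": "mem", "dc": "ABC", "_time": 1, "_field": "used", "_value": 42},
])
copied = retag(bucket, "cpu", "dc", "ABC", "XYZ")
print(copied, len(bucket))
```

In this model, running the copy a second time overwrites the copies rather than duplicating them, because each rewritten row has the same identity as before.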

If you run the query as is, you will only be working on the last hour of data. To make sure all the data is included, change the start of the range to something like -10000w (10,000 weeks). You may also want to stop all the data being displayed in the browser by assigning the result to a variable instead of returning it to the browser (yielding it), so the final query would be as below.

x = from(bucket: "telegraf/autogen")
  |> range(start: -10000w)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> filter(fn: (r) => r["dc"] == "ABC")
  |> set(key: "dc", value: "XYZ")
  |> to(bucket: "telegraf/autogen")

When you run this query you will get an error that includes “this Flux script returns no streaming data”. This is expected: the data is still written to the bucket, it just isn’t streamed back to the browser.

At this stage you should have two sets of data: one with the old tag value, and a nearly identical set with the new one. I usually run some checks to make sure everything looks correct, such as counting the number of records for each dc tag value. The counts for the old and new values should be almost identical, but the count for the new value should increase over time as Telegraf adds more data, while the count for the old value should stay static.

from(bucket: "telegraf/autogen")
  |> range(start: -10000w)
  |> filter(fn: (r) => r["_measurement"] == "cpu")
  |> group(columns: ["dc"])
  |> count()
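As a rough illustration of this check (a Python toy, not Flux), counting rows per dc value and comparing the old and new counts could look like the sketch below. The pass/fail rule in `copy_looks_complete` is a hypothetical helper. It is my reading of the check described above, not an InfluxDB feature: the old value should still have rows, and the new value should have at least as many.

```python
from collections import Counter


def count_by_tag(rows, measurement, tag):
    """Count rows of one measurement per tag value, similar in spirit to
    filter(_measurement) |> group(columns: [tag]) |> count()."""
    return Counter(r.get(tag) for r in rows if r["_measurement"] == measurement)


def copy_looks_complete(counts, old, new):
    """Heuristic: the old value still has rows, and the new value has at least as
    many, since Telegraf keeps adding rows with the new value after the copy."""
    return counts[old] > 0 and counts[new] >= counts[old]


rows = (
    [{"_measurement": "cpu", "dc": "ABC", "_time": t} for t in (1, 2)]
    + [{"_measurement": "cpu", "dc": "XYZ", "_time": t} for t in (1, 2, 3)]
    + [{"_measurement": "mem", "dc": "ABC", "_time": 1}]
)
counts = count_by_tag(rows, "cpu", "dc")
print(dict(counts), copy_looks_complete(counts, "ABC", "XYZ"))
```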

Deleting the old data

This should go without saying: be very careful with your --predicate parameter when using the delete command, because deleted data can’t be recovered without a backup. It’s always best to test on a copy of your data first.
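To see why the predicate deserves so much care, here is a toy Python model of a predicate-based delete. It is not the InfluxDB implementation. It assumes the time range covers all the data and treats the predicate as a list of hypothetical (column, operator, value) terms joined with AND. Flipping a single = to != completely changes which rows disappear.

```python
import operator

# Comparison operators supported by this toy model.
OPS = {"=": operator.eq, "!=": operator.ne}


def delete(rows, terms):
    """Return the rows that survive a delete whose predicate is
    term1 AND term2 AND ..., each term being (column, op, value)."""
    def matches(row):
        return all(OPS[op](row.get(col), value) for col, op, value in terms)
    return [row for row in rows if not matches(row)]


rows = [
    {"_measurement": "cpu", "dc": "ABC"},
    {"_measurement": "cpu", "dc": "XYZ"},
    {"_measurement": "mem", "dc": "ABC"},
    {"_measurement": "disk", "dc": "ABC"},
]

# Intended: remove only the old cpu rows.
kept = delete(rows, [("_measurement", "=", "cpu"), ("dc", "=", "ABC")])
# One character different: the old cpu rows survive, and the ABC rows of
# every OTHER measurement are deleted instead.
oops = delete(rows, [("_measurement", "!=", "cpu"), ("dc", "=", "ABC")])
print(kept)
print(oops)
```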

To delete the old data we need to use the command line (or the API). Both terms in the predicate use =, so only points in the cpu measurement whose dc tag is ABC are deleted.

influx delete -c live --bucket telegraf/autogen --org "Your Org" --start 1970-01-01T00:00:00Z --stop "$(date -u +"%Y-%m-%dT%H:%M:%SZ")" --predicate '_measurement="cpu" AND dc="ABC"'

The -c parameter selects which config to use and is only needed if you have multiple configs (for example, multiple InfluxDB instances).
The --bucket, --org, --start, and --stop parameters are required. In this case, --start is an arbitrary date that is guaranteed to be earlier than the first data point, and --stop is simply the current date/time as returned by the Linux date utility.
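If you script this step, building the argument list in Python avoids shell-quoting mistakes in the predicate. This is a sketch. `build_delete_argv` is a hypothetical helper that only uses the flags from the influx delete command in this article. It prints the command for review rather than running it.

```python
import shlex
from datetime import datetime, timezone


def build_delete_argv(bucket, org, measurement, tag, value, config=None, stop=None):
    """Build argv for `influx delete`, removing one tag value from one measurement."""
    if stop is None:
        # Same format as `date -u +"%Y-%m-%dT%H:%M:%SZ"`
        stop = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Both terms use "=": only rows of this measurement with the old tag value match.
    predicate = f'_measurement="{measurement}" AND {tag}="{value}"'
    argv = ["influx", "delete"]
    if config:
        argv += ["-c", config]
    argv += [
        "--bucket", bucket,
        "--org", org,
        "--start", "1970-01-01T00:00:00Z",  # arbitrary date before the first point
        "--stop", stop,
        "--predicate", predicate,
    ]
    return argv


argv = build_delete_argv("telegraf/autogen", "Your Org", "cpu", "dc", "ABC", config="live")
print(shlex.join(argv))  # review this before running anything
# Only after reviewing it (and testing on a copy of the data):
# import subprocess; subprocess.run(argv, check=True)
```

Passing a list to subprocess.run bypasses the shell entirely, so the double quotes inside the predicate reach influx unchanged.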