Iperf3 and Grafana – Monitoring speed between sites

I have previously written some posts about how I monitor bandwidth performance with Iperf2 and Check MK. That works fine, but only with the older Iperf, and I had to use the classic Nagios check, which limited the parameters I could pass to Iperf. I have now found a good way of using Iperf3 with Grafana. Why would you want to do this?

  • Because running Speedtest will only measure the internet connection
  • You want to make sure the speed across your sites is good
  • You want to measure this over time.

After doing this you will get something like this. In my case it checks the performance from our head office to our other offices. The bandwidth at each site is 1 Gbit/s.

First, download Iperf3 and put the folder on your destination test server. This can be a Windows or Linux server. It is a good idea to use the same OS, and the same hardware, on all sites. If one site is on an old switch or behind a firewall, that can affect the result.

When you have installed Iperf, run it in server mode with “iperf3 -s”. Make sure it is started automatically, since on Windows the command prompt will close if the server is restarted.
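On Windows, one way to handle this is a scheduled task that starts the server at boot. A minimal sketch, assuming you unpacked Iperf3 to C:\iperf3 (the path and task name are my own examples):

schtasks /create /tn "iperf3-server" /tr "C:\iperf3\iperf3.exe -s" /sc onstart /ru SYSTEM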

In my case Grafana, Telegraf, InfluxDB and Iperf are installed on the same server, an Ubuntu server. Install Iperf on your server like this:

sudo apt install iperf3
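If your destination servers run Linux, a small systemd unit keeps the iperf3 server running across reboots. This is just a sketch; the unit name and paths are my own assumptions:

# /etc/systemd/system/iperf3.service (path and name are examples)
[Unit]
Description=iperf3 server
After=network-online.target

[Service]
ExecStart=/usr/bin/iperf3 -s
Restart=always

[Install]
WantedBy=multi-user.target

Enable it with “sudo systemctl enable --now iperf3”.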

You can now test if it is working. In my case I use “iperf3 -c 10.90.10.2 -w 32M -P 4”. You will get output similar to this:

Connecting to host 10.90.10.2, port 5201
[  5] local 10.31.10.122 port 52114 connected to 10.90.10.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  77.5 MBytes   650 Mbits/sec  2427   1.00 MBytes
[  5]   1.00-2.00   sec  46.2 MBytes   388 Mbits/sec     0   1.06 MBytes
[  5]   2.00-3.00   sec  48.8 MBytes   409 Mbits/sec     0   1.19 MBytes
[  5]   3.00-4.00   sec  27.5 MBytes   231 Mbits/sec   495    663 KBytes
[  5]   4.00-5.00   sec  30.0 MBytes   252 Mbits/sec     0    727 KBytes
[  5]   5.00-6.00   sec  33.8 MBytes   283 Mbits/sec     0    896 KBytes
[  5]   6.00-7.00   sec  45.0 MBytes   377 Mbits/sec     0   1.18 MBytes
[  5]   7.00-8.00   sec  60.0 MBytes   503 Mbits/sec     0   1.60 MBytes
[  5]   8.00-9.00   sec  82.5 MBytes   692 Mbits/sec     0   2.17 MBytes
[  5]   9.00-10.00  sec   105 MBytes   881 Mbits/sec     0   2.87 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   556 MBytes   467 Mbits/sec  2922             sender
[  5]   0.00-10.00  sec   516 MBytes   433 Mbits/sec                 receiver

The reason we use “-w 32M” is that I have different response times to my sites. TCP throughput is roughly capped at window size divided by round-trip time, so running the command without a larger window size gives poor results on the high-latency links.

For example:

Site one. Ping response time is about 25 ms.

Running without -w:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  8.55 MBytes  71.7 Mbits/sec     0    420 KBytes

I get about 71 Mbit/s, compared to about 460 Mbit/s when using the larger window size.

Site two. Ping response time is about 2 ms.

Running without -w:

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  42.6 MBytes   358 Mbits/sec     0    211 KBytes

I get about 358 Mbit/s, compared to about 460 Mbit/s when using the larger window size.

To summarize:

With the default window size, sites with high response times will show poor performance and sites with low response times will show good performance, even though the links are the same speed.

With “-w 32M”, a window size large enough for the high-latency links, the performance is similar in both cases.
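The rule of thumb behind this is the bandwidth-delay product: to fill a link, the TCP window must be at least bandwidth × round-trip time. A quick calculation with the site one numbers from above:

1 Gbit/s × 0.025 s = 25 Mbit ≈ 3.1 MBytes of window needed to fill the link
420 KBytes × 8 / 0.025 s ≈ 134 Mbit/s ceiling with the default window

So 32M gives plenty of headroom, while the default window can never fill a 1 Gbit/s link at 25 ms.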

OK, Iperf is working now. Next we want to schedule it so we can get nice graphs over time. I have used Telegraf for this. Telegraf is a collection agent you can use for monitoring together with InfluxDB.

Install Telegraf; the documentation is on the InfluxData site. Also install the InfluxDB database; there is plenty of information on how to do this.
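On Ubuntu, both packages come from InfluxData's apt repository. Roughly like this, but the repository setup changes over time, so check the current instructions on docs.influxdata.com first (the key URL and the “focal” release name below are assumptions):

# Add the InfluxData repository (key URL and release name are examples, adjust to your system)
curl -fsSL https://repos.influxdata.com/influxdata-archive_compat.key | sudo apt-key add -
echo "deb https://repos.influxdata.com/ubuntu focal stable" | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt update
sudo apt install telegraf influxdb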

When Telegraf is installed, open the conf file with “sudo vi /etc/telegraf/telegraf.conf” and add sections like this:

[[inputs.exec]]
  commands = ["iperf3 -c 10.90.10.2 -w 32M --json"]
  interval = "50m"
  timeout = "240s"
  data_format = "json"
  json_query = "end"
  name_override = "iperf3-Site1"

[[inputs.exec]]
  commands = ["iperf3 -c 10.35.10.16 -w 32M --json"]
  interval = "60m"
  timeout = "240s"
  data_format = "json"
  json_query = "end"
  name_override = "iperf3-Site2"

Add as many checks as you want, but give them different intervals (as above, 50m and 60m) so they do not run at the same time; simultaneous tests would compete for bandwidth and skew the results.

Restart Telegraf and then test it with “telegraf --test”. The Iperf runs are now scheduled, and the best part is that the data is saved in your Influx database.
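For reference, on Ubuntu that is:

sudo systemctl restart telegraf
telegraf --test --config /etc/telegraf/telegraf.conf

Note that the test run actually executes the iperf3 commands, so it takes a little while.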

Make sure you have set up a database in Influx. In my case it looks like this in the telegraf.conf file:

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
  ## The full HTTP or UDP URL for your InfluxDB instance.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  # urls = ["unix:///var/run/influxdb.sock"]
  # urls = ["udp://127.0.0.1:8089"]
  urls = ["http://127.0.0.1:8086"]

  ## The target database for metrics; will be created as needed.
  database = "telegraf"

  ## HTTP Basic Auth
  username = "myusername"
  password = "mypassword"
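Before moving on to Grafana you can check that data is actually arriving, for example with the influx CLI (this is for InfluxDB 1.x; the measurement name comes from name_override above, and add -username/-password if you enabled authentication):

influx -database telegraf -execute 'SELECT * FROM "iperf3-Site1" ORDER BY time DESC LIMIT 3'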

Now head to your Grafana site, add the Influx data source and test it. If everything has worked as expected, some Iperf data will now be in the database.

Start by adding a simple panel. Make sure to set the unit to bits/s. Choose Influx as the data source and select the field sum_sent_bits_per_second, or any other field you want. You might want to add another panel for received as well.
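In the query editor this ends up as InfluxQL along these lines, where $timeFilter and $__interval are Grafana's built-in macros and the measurement name comes from the Telegraf config above:

SELECT mean("sum_sent_bits_per_second") FROM "iperf3-Site1" WHERE $timeFilter GROUP BY time($__interval) fill(null)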

I have added all sites to one graph.

You can add any panels you want. In my case I also added one showing sent bandwidth.

So to summarize:

  • Iperf is great, but it has no scheduling of its own
  • Influx and Telegraf are great for saving the data over time
  • Grafana can display this data in many ways.