4o1x5.dev/content/post/guides/nix/monitoring-via-prometheus/index.md

---
title: Monitor instances
description: Export metrics, collect them, visualize them.
date: 2024-05-17 07:00:00+0000
image: chris-yang-1tnS_BVy9Jk-unsplash.jpg
categories:
  - Nix
  - Guide
  - Sysadmin
  - Monitoring

tags:
  - Nix
  - Nginx
  - Prometheus
  - Exporters
  - Monitoring
  - Docker compose
draft: false

writingTime: "20m"
---

# Monitoring

Monitoring your instances allow you to keep track of servers load and its health overtime. Even looking at the stats once a day can make a huge difference as it allows you to prevent catastrophic disasters before they even happen.
I have been monitoring my servers with this method for years and I had many cases I was grateful for setting it all up.
In this small article I have included two guides to set these services up. First is with [NixOs](#nixos) and I also explain with [docker-compose](#docker-compose) but it's very sore as the main focus of this article is NixOS.

![Made with Excalidraw](graph1.png)

**Prometheus**
Prometheus is an open-source monitoring system. It helps to track, collect, and analyze
metrics from various applications and infrastructure components. It collects metrics from other software called _exporters_ that server a HTTP endpoint that return data in the prometheus data format.
Here is an example from `node-exporter`

```nix
# curl http://localhost:9100

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.54196405e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 4213.44
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0.06
node_cpu_seconds_total{cpu="0",mode="softirq"} 743.4
...
```

**Grafana**
Grafana is an open-source data visualization and monitoring platform. It has hundreds of features embedded that can help you query from data sources like Prometheus, InfluxDB, MySQL and so on...

## NixOs

Nix makes it trivial to set up these services, as there are already predefined options for it in nixpkgs. I will give you example configuration files below that you can just copy and paste.

I have a guide on [remote deployment](/p/remote-deployments-on-nixos/) for NixOs, below you can see an example on a folder structure you can use to deploy the services.
{{< filetree/container >}}

{{< filetree/folder name="server1" state="closed" >}}

{{< filetree/folder name="services" state="closed" >}}
{{< filetree/file name="some-service.nix" >}}
{{< filetree/folder name="monitoring" state="closed" >}}
{{< filetree/file name="prometheus.nix" >}}
{{< filetree/file name="grafana.nix" >}}
{{< filetree/folder name="exporters" state="closed" >}}
{{< filetree/file name="node.nix" >}}
{{< filetree/file name="smartctl.nix" >}}
{{< /filetree/folder >}}
{{< /filetree/folder >}}
{{< /filetree/folder >}}

{{< filetree/file name="configuration.nix" >}}
{{< filetree/file name="flake.nix" >}}
{{< filetree/file name="flake.lock" >}}

{{< /filetree/folder >}}

{{< /filetree/container >}}

### Exporters

First is node-exporter. It exports all kind of system metrics ranging from cpu usage, load average and even systemd service count.

#### Node-exporter

```nix
# /services/monitoring/exporters/node.nix
{ pkgs, ... }: {
  services.prometheus.exporters.node = {
    enable = true;
    #port = 9001; #default is 9100
    enabledCollectors = [ "systemd" ];
  };
}
```

#### Smartctl

Smartctl is a tool included in the smartmontools package. It is a collection of monitoring tools for hard-drives, SSDs and filesystems.
This exporter enables you to check up on the health of your drive(s). And it will also give you a wall notifications if one of your drives has a bad sector(s), which mainly suggests it's dying off.

```nix
# /services/monitoring/exporters/smartctl.nix
{ pkgs, ... }: {
  # exporter
  services.prometheus.exporters.smartctl = {
    enable = true;
    devices = [ "/dev/sda" ];
  };
  # for wall notifications
  services.smartd = {
    enable = true;
    notifications.wall.enable = true;
    devices = [
      {
        device = "/dev/sda";
      }
    ];
  };

}
```

If you happen to have other drives you can just use `lsblk` to check their paths

```bash
nix-shell -p util-linux --command lsblk
```

For example here is my pc's drives

```
NAME                              MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                                 8:0    1     0B  0 disk
nvme1n1                           259:0    0 476,9G  0 disk
├─nvme1n1p1                       259:1    0   512M  0 part  /boot
├─nvme1n1p2                       259:2    0 467,6G  0 part
│ └─luks-bbb8e429-bee1-4b5e-8ce8-c54f5f4f29a2
│                                 254:0    0 467,6G  0 crypt /nix/store
│                                                            /
└─nvme1n1p3                       259:3    0   8,8G  0 part
  └─luks-f7e86dde-55a5-4306-a7c2-cf2d93c9ee0b
                                  254:1    0   8,8G  0 crypt [SWAP]
nvme0n1                           259:4    0 931,5G  0 disk  /mnt/data
```

### Prometheus

Now that we have setup these two exporters we need to somehow collect their metrics.
Here is a config file for prometheus, with the scrape configs already written down.

```nix
# /services/monitoring/prometheus.nix
{pkgs, config, ... }:{

  services.prometheus = {
    enable = true;

    scrapeConfigs = [
      {
        job_name = "node";
        scrape_interval = "5s";
        static_configs = [
          {
            targets = [ "localhost:${toString config.services.prometheus.exporters.node.port}" ];
            labels = { alias = "node.server1.local"; };
          }
        ];
      }
      {
        job_name = "smartctl";
        scrape_interval = "5s";
        static_configs = [
          {
            targets = [ "localhost:${toString config.services.prometheus.exporters.smartctl.port}" ];
            labels = { alias = "smartctl.server1.local"; };
          }
        ];
      }
    ];
  };
}
```

I recommend setting the 5s delay to a bigger number if you have little storage as you can imagine it can generate a lot of data.
~16kB average per scrape (node-exporter). 1 day has 86400 seconds, divide that by 5 thats 17280 scrapes a day.
17280 \* 16 = 276480 kB. Thats 270 megabytes a day. And if you have multiple servers that causes X times as much.
30 days of scarping is about 8 gigabytes (1x). **But remember, by default prometheus stores data for 30 days!**

### Grafana

Now let's get onto gettin' a sexy dashboard like this. First we gotta setup grafana.

![Node exporter full (id 1860)](20240518_1958.png)

```nix
# /services/monitoring/grafana.nix
{ pkgs, config, ... }:
let
  grafanaPort = 3000;
in
{
  services.grafana = {
    enable = true;
    settings.server = {
      http_port = grafanaPort;
      http_addr = "0.0.0.0";
    };
    provision = {
      enable = true;
      datasources.settings.datasources = [
        {
          name = "prometheus";
          type = "prometheus";
          url = "http://127.0.0.1:${toString config.services.prometheus.port}";
          isDefault = true;
        }
      ];
    };
  };

  networking.firewall = {
    allowedTCPPorts = [ grafanaPort ];
    allowedUDPPorts = [ grafanaPort ];
  };
}
```

If you want to access it via the internet, change the following:

- `http_addr = "127.0.0.1"`
- remove the firewall allowed ports

This insures data will only flow thru the nginx reverse proxy

Remember to set `networking.domain = "example.com"` to your domain.

```nix
# /services/nginx.nix
{ pkgs, config, ... }:
let
  url = "http://127.0.0.1:${toString config.services.grafana.settings.server.http_port}";
in {
  services.nginx = {
    enable = true;

    virtualHosts = {
      "grafana.${config.networking.domain}" = {
        # Auto cert by let's encrypt
        forceSSL = true;
        enableACME = true;

        locations."/" = {
          proxyPass = url;
          extraConfig = "proxy_set_header Host $host;";
        };

        locations."/api" = {
          extraConfig = ''
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header Host $host;
          '';
          proxyPass = url;
        };
      };
    };
  };

  # enable 80 and 443 ports for nginx
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [
      443
      80
    ];
    allowedUDPPorts = [
      443
      80
    ];
  };
}
```

### Log in

The default user is `admin` and password is `admin`. Grafana will ask you to change it upon logging-in!

### Add the dashboards

For node-exporter you can go to dashboards --> new --> import --> paste in `1860`
Now you can see all the metrics of all your server(s).

## Docker-compose

{{< filetree/container >}}

{{< filetree/folder name="monitoring-project" state="closed" >}}

{{< filetree/file name="docker-compose.yml" >}}
{{< filetree/file name="prometheus.nix" >}}

{{< /filetree/folder >}}

{{< /filetree/container >}}

### Compose project

I did not include a reverse proxy, neither smartctl as I forgot how to actually do it, that's how long I've been using nix :/

```yaml
# docker-compose.yml
version: "3.8"

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data: {}

services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    hostname: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.rootfs=/rootfs"
      - "--path.sysfs=/host/sys"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    hostname: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.console.libraries=/etc/prometheus/console_libraries"
      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
    networks:
      - monitoring

 grafana:
    image: grafana/grafana:latest
    container_name: grafana
    networks:
      - monitoring
    restart: unless-stopped
    ports:
     - '3000:3000'
```

```yaml
# ./prometheus.yml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]
```

```bash
docker compose up -d
```

### Setup prometheus as data source inside grafana

Head to Connections --> Data sources --> Add new data source --> Prometheus
Type in http://prometheus:9090 as the URL, on the bottom click `Save & test`.

Now you can add the dashboards, [explained in this section](#add-the-dashboards)

Photo by <a href="https://unsplash.com/@chrisyangchrisfilm?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Chris Yang</a> on <a href="https://unsplash.com/photos/silhouette-photography-of-man-1tnS_BVy9Jk?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a>