---
title: Monitor instances
description: Export metrics, collect them, visualize them.
date: 2024-05-17 07:00:00+0000
image: chris-yang-1tnS_BVy9Jk-unsplash.jpg
categories:
- Nix
- Guide
- Sysadmin
- Monitoring
tags:
- Nix
- Nginx
- Prometheus
- Exporters
- Monitoring
- Docker compose
draft: false
---
# Monitoring
Monitoring your instances lets you keep track of server load and health over time. Even looking at the stats once a day can make a huge difference, as it lets you catch problems before they turn into catastrophic failures.
I have been monitoring my servers with this method for years, and there have been many cases where I was grateful for setting it all up.
In this small article I have included two guides for setting these services up. The first uses [NixOS](#nixos), and I also cover [docker-compose](#docker-compose), but only briefly, as the main focus of this article is NixOS.
![Made with Excalidraw](graph1.png)
**Prometheus**
Prometheus is an open-source monitoring system. It helps you track, collect, and analyze metrics from various applications and infrastructure components. It collects these metrics from small pieces of software called _exporters_, which serve an HTTP endpoint that returns data in the Prometheus text format.
Here is an example from `node-exporter`:
```
# curl http://localhost:9100/metrics
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.54196405e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 4213.44
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0.06
node_cpu_seconds_total{cpu="0",mode="softirq"} 743.4
...
```
**Grafana**
Grafana is an open-source data visualization and monitoring platform. It ships with a lot of features out of the box and can query data sources like Prometheus, InfluxDB, MySQL and many more.
## NixOS
Nix makes it trivial to set up these services, as there are already predefined options for them in nixpkgs. I will give you example configuration files below that you can just copy and paste.
I have a guide on [remote deployment](/p/remote-deployments-on-nixos/) for NixOS; below is an example of a folder structure you can use to deploy the services.
{{< filetree/container >}}
{{< filetree/folder name="server1" state="closed" >}}
{{< filetree/folder name="services" state="closed" >}}
{{< filetree/file name="some-service.nix" >}}
{{< filetree/folder name="monitoring" state="closed" >}}
{{< filetree/file name="prometheus.nix" >}}
{{< filetree/file name="grafana.nix" >}}
{{< filetree/folder name="exporters" state="closed" >}}
{{< filetree/file name="node.nix" >}}
{{< filetree/file name="smartctl.nix" >}}
{{< /filetree/folder >}}
{{< /filetree/folder >}}
{{< /filetree/folder >}}
{{< filetree/file name="configuration.nix" >}}
{{< filetree/file name="flake.nix" >}}
{{< filetree/file name="flake.lock" >}}
{{< /filetree/folder >}}
{{< /filetree/container >}}
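Your `configuration.nix` then only has to import these modules. A minimal sketch, assuming the folder structure above:
```nix
# configuration.nix (relevant part)
{ ... }: {
  imports = [
    ./services/monitoring/prometheus.nix
    ./services/monitoring/grafana.nix
    ./services/monitoring/exporters/node.nix
    ./services/monitoring/exporters/smartctl.nix
  ];
}
```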
### Exporters
The first one is node-exporter. It exports all kinds of system metrics, ranging from CPU usage and load average to the number of systemd services.
#### Node-exporter
```nix
# /services/monitoring/exporters/node.nix
{ pkgs, ... }: {
  services.prometheus.exporters.node = {
    enable = true;
    # port = 9001; # default is 9100
    enabledCollectors = [ "systemd" ];
  };
}
```
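By default the NixOS firewall keeps the exporter's port closed to the outside, which is fine here since Prometheus scrapes it over localhost. If you ever want to scrape it from a separate monitoring host, the exporter modules also expose a firewall toggle; just a sketch, not needed for the single-host setup in this article:
```nix
# only needed when scraping from another machine
services.prometheus.exporters.node.openFirewall = true; # opens the exporter's port (9100)
```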
#### Smartctl
Smartctl is a tool from the smartmontools package, a collection of monitoring tools for hard drives and SSDs.
This exporter lets you keep an eye on the health of your drive(s). The smartd service below will also send you a wall notification if one of your drives develops bad sectors, which usually means it is dying.
```nix
# /services/monitoring/exporters/smartctl.nix
{ pkgs, ... }: {
  # exporter
  services.prometheus.exporters.smartctl = {
    enable = true;
    devices = [ "/dev/sda" ];
  };
  # for wall notifications
  services.smartd = {
    enable = true;
    notifications.wall.enable = true;
    devices = [
      {
        device = "/dev/sda";
      }
    ];
  };
}
```
If you happen to have other drives, you can use `lsblk` to check their paths:
```bash
nix-shell -p util-linux --command lsblk
```
For example, here are the drives in my PC:
```
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 1 0B 0 disk
nvme1n1 259:0 0 476,9G 0 disk
├─nvme1n1p1 259:1 0 512M 0 part /boot
├─nvme1n1p2 259:2 0 467,6G 0 part
│ └─luks-bbb8e429-bee1-4b5e-8ce8-c54f5f4f29a2
│ 254:0 0 467,6G 0 crypt /nix/store
│ /
└─nvme1n1p3 259:3 0 8,8G 0 part
└─luks-f7e86dde-55a5-4306-a7c2-cf2d93c9ee0b
254:1 0 8,8G 0 crypt [SWAP]
nvme0n1 259:4 0 931,5G 0 disk /mnt/data
```
### Prometheus
Now that we have set up these two exporters, we need to collect their metrics somehow.
Here is a config file for Prometheus, with the scrape configs already written down.
```nix
# /services/monitoring/prometheus.nix
{ pkgs, config, ... }: {
  services.prometheus = {
    enable = true;
    scrapeConfigs = [
      {
        job_name = "node";
        scrape_interval = "5s";
        static_configs = [
          {
            targets = [ "localhost:${toString config.services.prometheus.exporters.node.port}" ];
            labels = { alias = "node.server1.local"; };
          }
        ];
      }
      {
        job_name = "smartctl";
        scrape_interval = "5s";
        static_configs = [
          {
            targets = [ "localhost:${toString config.services.prometheus.exporters.smartctl.port}" ];
            labels = { alias = "smartctl.server1.local"; };
          }
        ];
      }
    ];
  };
}
```
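After a rebuild, Prometheus listens on port 9090 by default, and you can verify that both exporters are being scraped on its `/targets` page (e.g. `http://localhost:9090/targets`).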
I recommend raising the 5s scrape interval if you don't have much storage, because as you can imagine it generates a lot of data.
node-exporter produces roughly 16 kB per scrape. A day has 86400 seconds; divide that by 5 and you get 17280 scrapes a day.
17280 \* 16 = 276480 kB. That's about 270 megabytes a day, and with multiple servers it is that many times more.
30 days of scraping is therefore about 8 gigabytes per server. **Keep in mind that by default Prometheus only keeps data for 15 days; the retention time is configurable.**
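If you want to tune either of these, the NixOS module exposes both knobs; a minimal sketch (the values are just examples):
```nix
services.prometheus = {
  # how long data is kept (the default is "15d")
  retentionTime = "30d";
  # default scrape interval for jobs that don't set their own
  globalConfig.scrape_interval = "30s";
};
```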
### Grafana
Now let's get onto getting a sexy dashboard like this one. First we have to set up Grafana.
![Node exporter full (id 1860)](20240518_1958.png)
```nix
# /services/monitoring/grafana.nix
{ pkgs, config, ... }:
let
  grafanaPort = 3000;
in
{
  services.grafana = {
    enable = true;
    settings.server = {
      http_port = grafanaPort;
      http_addr = "0.0.0.0";
    };
    provision = {
      enable = true;
      datasources.settings.datasources = [
        {
          name = "prometheus";
          type = "prometheus";
          url = "http://127.0.0.1:${toString config.services.prometheus.port}";
          isDefault = true;
        }
      ];
    };
  };
  networking.firewall = {
    allowedTCPPorts = [ grafanaPort ];
    allowedUDPPorts = [ grafanaPort ];
  };
}
```
If you want to access it over the internet through nginx (set up below), change the following:
- set `http_addr = "127.0.0.1"`
- remove the firewall `allowedTCPPorts`/`allowedUDPPorts` entries

This ensures traffic only flows through the nginx reverse proxy.
Remember to set `networking.domain = "example.com"` to your own domain.
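With those changes, the relevant part of `grafana.nix` would look roughly like this (just a sketch; the rest of the file stays the same):
```nix
services.grafana.settings.server = {
  http_port = grafanaPort;
  http_addr = "127.0.0.1"; # only reachable through the reverse proxy
};
```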
```nix
# /services/nginx.nix
{ pkgs, config, ... }:
let
  url = "http://127.0.0.1:${toString config.services.grafana.settings.server.http_port}";
in {
  services.nginx = {
    enable = true;
    virtualHosts = {
      "grafana.${config.networking.domain}" = {
        # Automatic certificate from Let's Encrypt
        forceSSL = true;
        enableACME = true;
        locations."/" = {
          proxyPass = url;
          extraConfig = "proxy_set_header Host $host;";
        };
        locations."/api" = {
          extraConfig = ''
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header Host $host;
          '';
          proxyPass = url;
        };
      };
    };
  };
  # open ports 80 and 443 for nginx
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [
      443
      80
    ];
    allowedUDPPorts = [
      443
      80
    ];
  };
}
```
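One thing to keep in mind: for `enableACME` to actually obtain certificates, you also need to accept the Let's Encrypt terms and set a contact e-mail somewhere in your configuration (the address below is a placeholder):
```nix
security.acme = {
  acceptTerms = true;
  defaults.email = "admin@example.com";
};
```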
### Log in
The default user is `admin` and the password is `admin`. Grafana will ask you to change it when you first log in!
### Add the dashboards
For node-exporter, go to Dashboards --> New --> Import --> paste in dashboard ID `1860`.
Now you can see all the metrics of all your server(s).
## Docker-compose
{{< filetree/container >}}
{{< filetree/folder name="monitoring-project" state="closed" >}}
{{< filetree/file name="docker-compose.yml" >}}
{{< filetree/file name="prometheus.nix" >}}
{{< /filetree/folder >}}
{{< /filetree/container >}}
### Compose project
I did not include a reverse proxy or the smartctl exporter, as I have forgotten how to actually do it; that's how long I've been using Nix :/
```yaml
# docker-compose.yml
version: "3.8"
networks:
monitoring:
driver: bridge
volumes:
prometheus_data: {}
services:
node-exporter:
image: prom/node-exporter:latest
container_name: node-exporter
restart: unless-stopped
hostname: node-exporter
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- "--path.procfs=/host/proc"
- "--path.rootfs=/rootfs"
- "--path.sysfs=/host/sys"
- "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
networks:
- monitoring
prometheus:
image: prom/prometheus:latest
container_name: prometheus
restart: unless-stopped
hostname: prometheus
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus_data:/prometheus
command:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus"
- "--web.console.libraries=/etc/prometheus/console_libraries"
- "--web.console.templates=/etc/prometheus/consoles"
- "--web.enable-lifecycle"
networks:
- monitoring
grafana:
image: grafana/grafana:latest
container_name: grafana
networks:
- monitoring
restart: unless-stopped
ports:
- '3000:3000'
```
```yaml
# ./prometheus.yml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]
```
```bash
docker compose up -d
```
### Set up Prometheus as a data source inside Grafana
Head to Connections --> Data sources --> Add new data source --> Prometheus.
Type in `http://prometheus:9090` as the URL (the containers share the `monitoring` network, so Grafana can reach Prometheus by its service name), then click `Save & test` at the bottom.
Now you can add the dashboards, as [explained in this section](#add-the-dashboards).
Photo by <a href="https://unsplash.com/@chrisyangchrisfilm?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Chris Yang</a> on <a href="https://unsplash.com/photos/silhouette-photography-of-man-1tnS_BVy9Jk?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a>