---
title: Monitor instances
description: Export metrics, collect them, visualize them.
date: 2024-05-17 07:00:00+0000
image: chris-yang-1tnS_BVy9Jk-unsplash.jpg
categories:
  - Nix
  - Guide
  - Sysadmin
  - Monitoring

tags:
  - Nix
  - Nginx
  - Prometheus
  - Exporters
  - Monitoring
  - Docker compose
draft: false

writingTime: "20m"
---

# Monitoring

Monitoring your instances allows you to keep track of your servers' load and health over time. Even looking at the stats once a day can make a huge difference, as it lets you catch problems before they turn into disasters.
I have been monitoring my servers with this method for years, and there were many occasions when I was grateful for having set it all up.
In this short article I have included two guides to set these services up. The first is for [NixOS](#nixos). I also cover [docker-compose](#docker-compose), but only briefly, as the main focus of this article is NixOS.

![Made with Excalidraw](graph1.png)

**Prometheus**
Prometheus is an open-source monitoring system. It helps you track, collect, and analyze metrics from various applications and infrastructure components. It collects metrics from other programs called _exporters_, which serve an HTTP endpoint that returns data in the Prometheus data format.
Here is an example from `node-exporter`:

```text
# curl http://localhost:9100/metrics

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 2.54196405e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 4213.44
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 0.06
node_cpu_seconds_total{cpu="0",mode="softirq"} 743.4
...
```

**Grafana**
Grafana is an open-source data visualization and monitoring platform. It can query and visualize data from sources such as Prometheus, InfluxDB, MySQL and many others.

## NixOS

Nix makes it trivial to set up these services, as nixpkgs already ships predefined options for them. Below I give example configuration files that you can just copy and paste.

I have a guide on [remote deployment](/p/remote-deployments-on-nixos/) for NixOS. Below you can see an example folder structure you can use to deploy the services.
{{< filetree/container >}}

{{< filetree/folder name="server1" state="closed" >}}

  {{< filetree/folder name="services" state="closed" >}}
    {{< filetree/file name="some-service.nix" >}}
    {{< filetree/folder name="monitoring" state="closed" >}}
      {{< filetree/file name="prometheus.nix" >}}
      {{< filetree/file name="grafana.nix" >}}
      {{< filetree/folder name="exporters" state="closed" >}}
        {{< filetree/file name="node.nix" >}}
        {{< filetree/file name="smartctl.nix" >}}
      {{< /filetree/folder >}}
    {{< /filetree/folder >}}
  {{< /filetree/folder >}}

  {{< filetree/file name="configuration.nix" >}}
  {{< filetree/file name="flake.nix" >}}
  {{< filetree/file name="flake.lock" >}}

{{< /filetree/folder >}}

{{< /filetree/container >}}

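The files in the tree above still have to be pulled into the system configuration. A minimal sketch of how that could look, assuming `configuration.nix` sits next to the `services` folder as in the tree and that you use every monitoring module from this article:

```nix
# /configuration.nix (only the part relevant to monitoring)
{ ... }: {
  imports = [
    # Paths are relative to this file and follow the folder structure above
    ./services/monitoring/exporters/node.nix
    ./services/monitoring/exporters/smartctl.nix
    ./services/monitoring/prometheus.nix
    ./services/monitoring/grafana.nix
  ];
}
```

Leave out any entry you do not need; each module is independent of the others except that Prometheus and Grafana read the ports of the services they talk to from `config`.
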
### Exporters

First is node-exporter. It exports all kinds of system metrics, ranging from CPU usage and load average to the systemd service count.

#### Node-exporter

```nix
# /services/monitoring/exporters/node.nix
{ pkgs, ... }: {
  services.prometheus.exporters.node = {
    enable = true;
    # port = 9001; # default is 9100
    enabledCollectors = [ "systemd" ];
  };
}
```

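If Prometheus runs on the same machine as the exporter, one optional hardening step is to bind the exporter to the loopback interface so the metrics are not reachable from the network. This is only a sketch of that idea: `listenAddress` and `openFirewall` are options of the nixpkgs exporter module, but whether you want them depends on your setup, and a remote Prometheus would need the opposite.

```nix
# /services/monitoring/exporters/node.nix (variant, assuming Prometheus runs locally)
{ ... }: {
  services.prometheus.exporters.node = {
    enable = true;
    enabledCollectors = [ "systemd" ];
    # Only accept connections from this machine; Prometheus scrapes via localhost
    listenAddress = "127.0.0.1";
    # Keep the exporter port closed in the firewall (this is also the default)
    openFirewall = false;
  };
}
```
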
#### Smartctl

Smartctl is a tool included in the smartmontools package, a collection of monitoring tools for hard drives and SSDs.
This exporter lets you check up on the health of your drive(s). The `smartd` service configured alongside it will also send you a wall notification if one of your drives develops bad sectors, which usually suggests it's dying.

```nix
# /services/monitoring/exporters/smartctl.nix
{ pkgs, ... }: {
  # exporter
  services.prometheus.exporters.smartctl = {
    enable = true;
    devices = [ "/dev/sda" ];
  };
  # for wall notifications
  services.smartd = {
    enable = true;
    notifications.wall.enable = true;
    devices = [
      {
        device = "/dev/sda";
      }
    ];
  };

}
```

If you have other drives, you can use `lsblk` to check their paths:

```bash
nix-shell -p util-linux --command lsblk
```

For example, here are my PC's drives:

```
NAME                                          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                                               8:0  1     0B  0 disk
nvme1n1                                         259:0  0 476,9G  0 disk
├─nvme1n1p1                                     259:1  0   512M  0 part  /boot
├─nvme1n1p2                                     259:2  0 467,6G  0 part
│ └─luks-bbb8e429-bee1-4b5e-8ce8-c54f5f4f29a2
│                                               254:0  0 467,6G  0 crypt /nix/store
│                                                                        /
└─nvme1n1p3                                     259:3  0   8,8G  0 part
  └─luks-f7e86dde-55a5-4306-a7c2-cf2d93c9ee0b
                                                254:1  0   8,8G  0 crypt [SWAP]
nvme0n1                                         259:4  0 931,5G  0 disk  /mnt/data
```

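With more than one drive, the device list has to be repeated for the exporter and for `smartd`, which expect slightly different shapes. A small sketch of one way to keep them in sync; the `drives` binding is a name I made up for illustration, and the two paths are simply the NVMe disks from the `lsblk` output above:

```nix
# /services/monitoring/exporters/smartctl.nix (variant for several drives)
{ ... }:
let
  # Assumption: these are the drives you want monitored; adjust to your own lsblk output
  drives = [ "/dev/nvme0n1" "/dev/nvme1n1" ];
in {
  services.prometheus.exporters.smartctl = {
    enable = true;
    # The exporter takes a plain list of device paths
    devices = drives;
  };
  services.smartd = {
    enable = true;
    notifications.wall.enable = true;
    # smartd takes a list of attribute sets, one per device
    devices = map (d: { device = d; }) drives;
  };
}
```
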
### Prometheus

Now that we have set up these two exporters, we need to collect their metrics somehow.
Here is a config file for Prometheus, with the scrape configs already written.

```nix
# /services/monitoring/prometheus.nix
{ pkgs, config, ... }: {

  services.prometheus = {
    enable = true;

    scrapeConfigs = [
      {
        job_name = "node";
        scrape_interval = "5s";
        static_configs = [
          {
            targets = [ "localhost:${toString config.services.prometheus.exporters.node.port}" ];
            labels = { alias = "node.server1.local"; };
          }
        ];
      }
      {
        job_name = "smartctl";
        scrape_interval = "5s";
        static_configs = [
          {
            targets = [ "localhost:${toString config.services.prometheus.exporters.smartctl.port}" ];
            labels = { alias = "smartctl.server1.local"; };
          }
        ];
      }
    ];
  };
}
```

I recommend raising the 5s scrape interval if you are short on storage, as you can imagine it generates a lot of data.
Each node-exporter scrape returns ~16 kB on average. A day has 86400 seconds; divided by 5, that's 17280 scrapes a day.
17280 \* 16 = 276480 kB, which is about 270 megabytes a day. With multiple servers, multiply that by the number of servers.
30 days of scraping is about 8 gigabytes (1x). This is a rough upper bound based on the raw scrape size; Prometheus compresses samples on disk, so real usage is usually much lower. **But remember, by default Prometheus only keeps data for 15 days!**
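To redo this back-of-the-envelope estimate for your own numbers, the arithmetic can be written as a tiny Nix expression. This is only a sketch: the file name and all binding names are made up, the 16 kB figure is the rough average quoted above, and it models raw scrape size rather than what Prometheus really writes to disk.

```nix
# storage-estimate.nix (hypothetical file name)
let
  scrapeIntervalSeconds = 5;
  kbPerScrape = 16;   # rough node-exporter average from the estimate above
  retentionDays = 30; # the 30-day window used in the estimate above
  servers = 1;

  # Integer division: 86400 seconds in a day
  scrapesPerDay = 86400 / scrapeIntervalSeconds;
  kbPerDay = scrapesPerDay * kbPerScrape * servers;
in {
  inherit scrapesPerDay kbPerDay;
  mbPerDay = kbPerDay / 1024;
  # Dividing by a float keeps the fractional part
  gbForRetention = kbPerDay * retentionDays / 1024.0 / 1024.0;
}
```

Evaluating it with `nix-instantiate --eval --strict storage-estimate.nix` should reproduce the figures above: 17280 scrapes and 276480 kB (270 MB) per day, and roughly 7.9 GB for 30 days.
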
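If you decide to scrape less often or keep data for a different period, both can be set from Nix. A minimal sketch, assuming the values `30s` and `15d` purely as illustrations; `globalConfig.scrape_interval` and `retentionTime` are options of the nixpkgs Prometheus module:

```nix
# /services/monitoring/prometheus.nix (additional settings)
{ ... }: {
  services.prometheus = {
    # Default interval for every job that does not set its own scrape_interval
    globalConfig.scrape_interval = "30s";
    # How long samples are kept on disk before they are deleted
    retentionTime = "15d";
  };
}
```

Note that a `scrape_interval` set inside a job takes precedence over the global one, so the per-job `"5s"` entries have to be removed or changed for the global value to apply.
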
### Grafana

Now let's get on to getting a sexy dashboard like this. First we gotta set up Grafana.

![Node exporter full (id 1860)](20240518_1958.png)

```nix
# /services/monitoring/grafana.nix
{ pkgs, config, ... }:
let
  grafanaPort = 3000;
in
{
  services.grafana = {
    enable = true;
    settings.server = {
      http_port = grafanaPort;
      http_addr = "0.0.0.0";
    };
    provision = {
      enable = true;
      datasources.settings.datasources = [
        {
          name = "prometheus";
          type = "prometheus";
          url = "http://127.0.0.1:${toString config.services.prometheus.port}";
          isDefault = true;
        }
      ];
    };
  };

  networking.firewall = {
    # Grafana only speaks HTTP, so opening the TCP port is enough
    allowedTCPPorts = [ grafanaPort ];
  };
}
```

If you want to access it via the internet, change the following:

- `http_addr = "127.0.0.1"`
- remove the firewall allowed ports

This ensures data will only flow through the nginx reverse proxy.

Remember to set `networking.domain = "example.com"` to your domain.

```nix
# /services/nginx.nix
{ pkgs, config, ... }:
let
  url = "http://127.0.0.1:${toString config.services.grafana.settings.server.http_port}";
in {
  services.nginx = {
    enable = true;

    virtualHosts = {
      "grafana.${config.networking.domain}" = {
        # Auto cert by let's encrypt
        forceSSL = true;
        enableACME = true;

        locations."/" = {
          proxyPass = url;
          extraConfig = "proxy_set_header Host $host;";
        };

        locations."/api" = {
          extraConfig = ''
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection $connection_upgrade;
            proxy_set_header Host $host;
          '';
          proxyPass = url;
        };
      };
    };
  };

  # open ports 80 and 443 for nginx (plain HTTP and HTTPS only need TCP)
  networking.firewall = {
    enable = true;
    allowedTCPPorts = [
      443
      80
    ];
  };
}
```

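One thing the nginx module above does not cover: `enableACME = true` relies on the NixOS ACME module, which to my knowledge will not build until you accept the Let's Encrypt terms and provide a contact address. A minimal sketch of that prerequisite, with a placeholder e-mail you must replace:

```nix
# /services/nginx.nix (additional settings needed for enableACME)
{ ... }: {
  security.acme = {
    # Accept the Let's Encrypt terms of service
    acceptTerms = true;
    # Placeholder address used for expiry notices; replace it with your own
    defaults.email = "admin@example.com";
  };
}
```

The domain also has to resolve to the server and port 80 must be reachable, since the default challenge is served over HTTP.
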
### Log in

The default user is `admin` and the password is `admin`. Grafana will ask you to change it the first time you log in!

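If the instance is reachable from the internet, you may prefer not to start with `admin`/`admin` at all. One possible approach, sketched here with a hypothetical secret path, is to let Grafana read the initial password from a file through its `$__file{...}` provider, so the password never lands in the world-readable Nix store:

```nix
# /services/monitoring/grafana.nix (additional settings)
{ ... }: {
  services.grafana.settings.security = {
    admin_user = "admin";
    # Hypothetical path: create this file yourself and make it readable by the grafana user.
    # "$__file{" is not Nix interpolation, so it is passed to Grafana unchanged.
    admin_password = "$__file{/var/lib/secrets/grafana-admin-password}";
  };
}
```

As far as I know, this value is only applied when Grafana first creates its database, so it will not reset the password of an instance that already exists.
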
### Add the dashboards

For node-exporter, go to Dashboards --> New --> Import and paste in `1860`.
Now you can see all the metrics of all your server(s).

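On NixOS the import can also be made declarative instead of clicking through the UI. The following is only a sketch under a few assumptions: you have downloaded the JSON of dashboard `1860` yourself and saved it as `dashboards/node-exporter-full.json` next to the module (a made-up location), and the provider name is arbitrary.

```nix
# /services/monitoring/grafana.nix (additional settings)
{ ... }: {
  services.grafana.provision.dashboards.settings.providers = [
    {
      name = "local dashboards";
      # Directory Grafana scans for dashboard JSON files
      options.path = "/etc/grafana-dashboards";
    }
  ];

  # Hypothetical local file: the JSON of dashboard 1860 saved next to this module
  environment.etc."grafana-dashboards/node-exporter-full.json".source =
    ./dashboards/node-exporter-full.json;
}
```

Dashboards exported for sharing may reference the data source through an input placeholder, so you might have to edit the JSON to point at the `prometheus` data source before it renders.
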
## Docker-compose

{{< filetree/container >}}

{{< filetree/folder name="monitoring-project" state="closed" >}}

  {{< filetree/file name="docker-compose.yml" >}}
  {{< filetree/file name="prometheus.yml" >}}

{{< /filetree/folder >}}

{{< /filetree/container >}}

### Compose project
|
|
|
|
I did not include a reverse proxy, neither smartctl as I forgot how to actually do it, that's how long I've been using nix :/
|
|
|
|
```yaml
# docker-compose.yml
version: "3.8"

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data: {}

services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    hostname: node-exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.rootfs=/rootfs"
      - "--path.sysfs=/host/sys"
      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    hostname: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.console.libraries=/etc/prometheus/console_libraries"
      - "--web.console.templates=/etc/prometheus/consoles"
      - "--web.enable-lifecycle"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    networks:
      - monitoring
    restart: unless-stopped
    ports:
      - '3000:3000'
```

```yaml
# ./prometheus.yml
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]
```

```bash
docker compose up -d
```

### Set up Prometheus as a data source inside Grafana

Head to Connections --> Data sources --> Add new data source --> Prometheus.
Type in `http://prometheus:9090` as the URL, then click `Save & test` at the bottom.

Now you can add the dashboards, as [explained in this section](#add-the-dashboards).

Photo by <a href="https://unsplash.com/@chrisyangchrisfilm?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Chris Yang</a> on <a href="https://unsplash.com/photos/silhouette-photography-of-man-1tnS_BVy9Jk?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a>