
Dashboard User Guide

This guide walks you through monitoring your stack with the included Grafana dashboards. It shows how to use each dashboard and suggests what to look out for.

Availability - How well are things running?

Availability Dashboard

Open the Cogstack Monitoring Dashboard on localhost/grafana.

Use the percentage uptime charts at the top to see the availability over a given time period. For example: “Over the last 8 hours, my service had 99.5% availability.”

Use the time filter in the top right corner of the page to change the window. For example, set it to 30 days to see availability over the whole month.

Look for trends like:

  • Has there been a full outage of a service for 5 minutes, where the 5m availability drops to 0?
  • Is there some disruption over the time period, where my 5m availability stays high, but my 6h availability is going down?
  • Have we met the service level objective, if we set the time threshold to 30 days?
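The comparisons above come down to computing the same success ratio over different windows. The following is a minimal sketch of that arithmetic, not the dashboard's actual query: the function names, the `(timestamp, success)` sample format, the one-probe-per-minute data and the 99.5% target are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def availability(samples, now, window):
    """Percentage of successful probes in the `window` ending at `now`.

    `samples` is a list of (timestamp, success) pairs (hypothetical format).
    Returns None when the window contains no samples.
    """
    start = now - window
    in_window = [ok for ts, ok in samples if start < ts <= now]
    if not in_window:
        return None
    return 100.0 * sum(in_window) / len(in_window)


def meets_slo(samples, now, window, target_percent):
    """True when availability over `window` is at least `target_percent`."""
    value = availability(samples, now, window)
    return value is not None and value >= target_percent


# Hypothetical data: one probe per minute for 6 hours,
# with a 5-minute outage that ended 10 minutes ago.
now = datetime(2024, 1, 1, 12, 0)
samples = []
for i in range(360):
    ts = now - timedelta(minutes=i)
    failed = 10 <= i < 15
    samples.append((ts, not failed))

short = availability(samples, now, timedelta(minutes=5))
long = availability(samples, now, timedelta(hours=6))
print(f"5m availability: {short:.2f}%")
print(f"6h availability: {long:.2f}%")
print("6h SLO of 99.5% met:", meets_slo(samples, now, timedelta(hours=6), 99.5))
```

With this data the 5m window has already recovered to 100%, while the 6h window still carries the outage (355 of 360 probes succeeded), which is the "short window high, long window dropping" pattern described in the list.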

Use the filters at the top, or click in the table, to narrow the view down to specific targets, services or hosts.

See Setup Probing to do the full setup of probers.

Inventory - What is running?

Docker Metrics Dashboard

Use the Docker Metrics dashboard to check which containers are running, where, and whether they're healthy. This is useful for verifying deployments or diagnosing issues.

The dashboard above includes the hostnames, IP addresses and any other details configured.

Check for things like:

  • Containers not running where you thought they should be by looking at the hostname for each container
  • Containers restarting unexpectedly, by looking at the "Running" column in the table
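The same checks can be expressed as a comparison between where you expect containers to be and what the table reports. This is only a sketch under stated assumptions: the `expected` mapping, the row keys (`name`, `host`, `running`) and the container and host names are hypothetical and do not reflect the dashboard's real data model.

```python
def check_inventory(expected, observed):
    """Compare expected container placement with observed rows.

    expected: dict of container name -> hostname it should run on
    observed: list of dicts with hypothetical keys "name", "host", "running"
    Returns a list of human-readable problem descriptions.
    """
    seen = {row["name"]: row for row in observed}
    problems = []
    for name, host in expected.items():
        row = seen.get(name)
        if row is None:
            problems.append(f"{name}: not found on any host")
        elif row["host"] != host:
            problems.append(f"{name}: expected on {host}, found on {row['host']}")
        elif not row["running"]:
            problems.append(f"{name}: present on {host} but not running")
    return problems


# Hypothetical example data, as you might read it off the table.
expected = {"nifi": "vm-01", "elasticsearch": "vm-02", "grafana": "vm-03"}
observed = [
    {"name": "nifi", "host": "vm-01", "running": True},
    {"name": "elasticsearch", "host": "vm-03", "running": True},
    {"name": "grafana", "host": "vm-03", "running": False},
]

for problem in check_inventory(expected, observed):
    print(problem)
```

An empty result means every expected container was found running on its expected host.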

See Telemetry to set this up.

Telemetry - How can I see details of resources?

Some additional dashboards are set up to provide more metrics.

VM Metrics

VM Metrics Dashboard

Open the VM Metrics dashboard on localhost/grafana.

Select a VM from the host dropdown.

Look for things like:

  • CPU Usage: is a process using too much CPU?
  • Memory Usage: are you running out of RAM?
  • Disk IO / Space: is disk space running low?
  • Trends over time, by setting the time filter to 30 days. Is your disk usage increasing over time?
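To make the 30-day trend question concrete, you can fit a straight line to daily disk usage and estimate how long until the disk fills. The sketch below is an illustration only: it assumes one usage reading per day (as a percentage) and roughly linear growth, and the function name and sample values are hypothetical.

```python
def days_until_full(daily_usage_percent, capacity_percent=100.0):
    """Estimate days until the disk is full from a least-squares trend.

    daily_usage_percent: one reading per day, oldest first (hypothetical input).
    Returns None when there are too few readings or usage is not growing.
    """
    n = len(daily_usage_percent)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(daily_usage_percent) / n
    # Ordinary least-squares slope, in percentage points per day.
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(daily_usage_percent))
    den = sum((x - mean_x) ** 2 for x in range(n))
    slope = num / den
    if slope <= 0:
        return None
    # Project forward from the most recent reading.
    return (capacity_percent - daily_usage_percent[-1]) / slope


# Hypothetical 30 days of readings growing by 0.5 points per day from 50%.
usage = [50 + 0.5 * day for day in range(30)]
print("Estimated days until full:", days_until_full(usage))
```

Real disk usage is rarely this smooth, so treat the result as a rough prompt to investigate rather than a forecast.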

Elasticsearch Metrics

Elasticsearch Metrics Dashboard

Open the Elasticsearch Metrics dashboard on localhost/grafana.

This dashboard helps you understand how your Elasticsearch or OpenSearch cluster is behaving.

Look at:

  • Cluster health status — shows yellow/red states immediately
  • Index size per shard — to detect unbalanced index growth
  • Query latency and throughput — useful during heavy search loads
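One rough way to judge "unbalanced" shard growth is to compare each shard's size with the median. The sketch below is not something the dashboard computes: the shard names, sizes and the 2x threshold are assumptions chosen for illustration, and a sensible threshold depends on your indices.

```python
from statistics import median


def unbalanced_shards(shard_sizes_gb, ratio=2.0):
    """Return shards whose size exceeds `ratio` times the median shard size.

    shard_sizes_gb: dict of shard label -> size in GB (hypothetical input).
    """
    if not shard_sizes_gb:
        return {}
    mid = median(shard_sizes_gb.values())
    if mid <= 0:
        return {}
    return {name: size for name, size in shard_sizes_gb.items()
            if size > ratio * mid}


# Hypothetical sizes as read from the "index size per shard" panel.
sizes = {"logs-0": 10, "logs-1": 11, "logs-2": 9, "logs-3": 40}
print(unbalanced_shards(sizes))
```

Here the median is 10.5 GB, so only the 40 GB shard exceeds the 2x threshold and is reported.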

See Telemetry to set this up.

Alerting - When should I look at this?

Alerting is set up using Grafana Alerts, but is paused by default.

When alerts are set up, the Grafana graphs will show when the alerts fired.

Alerts Firing on dashboard

Two sets of rules are defined in this project:

See Alerting to set this up.