2 | 2 |
3 | 3 | ## Description |
4 | 4 |
5 | | -We now include a monitoring stack for introspection on a running Kubernetes cluster. The stack includes 3 components: |
| 5 | +We now include a monitoring stack for introspection into a running Kubernetes cluster. The stack includes 4 components: |
6 | 6 |
7 | | -* [Telegraf](https://docs.influxdata.com/telegraf) - Metrics collection daemon written by team behind InfluxDB. |
8 | | -* [InfluxDB](https://docs.influxdata.com/influxdb) - Time series database |
9 | | -* [Grafana](http://grafana.org/) - Graphing tool for time series data |
| 7 | +* [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) (KSM), a simple service that listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects. |
| 8 | +* [Node Exporter](http://github.com/prometheus/node_exporter), a Prometheus exporter for hardware and OS metrics exposed by *NIX kernels. |
| 9 | +* [Prometheus](https://prometheus.io/), a systems and service monitoring system and a [Cloud Native Computing Foundation](https://cncf.io/) project. |
| 10 | +* [Grafana](http://grafana.org/), a graphing tool for time series data. |
10 | 11 |
11 | 12 | ## Architecture Diagram |
12 | 13 |
13 | 14 | ``` |
14 | | -                       ┌────────┐ |
15 | | -                       │ Router │                      ┌────────┐     ┌─────┐ |
16 | | -                       └────────┘                      │ Logger │◀───▶│Redis│ |
17 | | -                            │                          └────────┘     └─────┘ |
18 | | -                        Log file                            ▲ |
19 | | -                            │                               │ |
20 | | -                            ▼                               │ |
21 | | -┌────────┐             ┌─────────┐    logs/metrics   ┌──────────────┐ |
22 | | -│App Logs│──Log File──▶│ fluentd │───────topics─────▶│ Redis Stream │ |
23 | | -└────────┘             └─────────┘                   └──────────────┘ |
24 | | -                                                            │ |
25 | | -                                                            │ |
26 | | -┌─────────────┐                                             │ |
27 | | -│    HOST     │                                             ▼ |
28 | | -│  Telegraf   │───┐                                    ┌────────┐ |
29 | | -└─────────────┘   │                                    │Telegraf│ |
30 | | -                  │                                    └────────┘ |
31 | | -┌─────────────┐   │                                         │ |
32 | | -│    HOST     │   │    ┌───────────┐                        │ |
33 | | -│  Telegraf   │───┼───▶│ InfluxDB  │◀────Wire ──────────────┘ |
34 | | -└─────────────┘   │    └───────────┘     Protocol |
35 | | -                  │          ▲ |
36 | | -┌─────────────┐   │          │ |
37 | | -│    HOST     │   │          ▼ |
38 | | -│  Telegraf   │───┘    ┌──────────┐ |
39 | | -└─────────────┘        │ Grafana  │ |
40 | | -                       └──────────┘ |
| 15 | +┌────────────────┐ |
| 16 | +│      HOST      │ |
| 17 | +│  node-exporter │◀──┐                   ┌──────────────────┐ |
| 18 | +└────────────────┘   │                   │kube-state-metrics│ |
| 19 | +                     │                   └──────────────────┘ |
| 20 | +┌────────────────┐   │                            ▲ |
| 21 | +│      HOST      │   │    ┌────────────┐          │ |
| 22 | +│  node-exporter │◀──┼────│ Prometheus │──────────┘ |
| 23 | +└────────────────┘   │    └────────────┘ |
| 24 | +                     │          ▲ |
| 25 | +┌────────────────┐   │          │ |
| 26 | +│      HOST      │   │          ▼ |
| 27 | +│  node-exporter │◀──┘     ┌──────────┐ |
| 28 | +└────────────────┘         │ Grafana  │ |
| 29 | +                           └──────────┘ |
41 | 30 | ``` |
42 | 31 |
43 | 32 | ## [Grafana](https://grafana.com/) |
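The architecture diagram in this change shows Prometheus pulling (scraping) metrics from node-exporter on each host and from kube-state-metrics. As a minimal sketch only, here is how such pull-based collection is commonly declared in Prometheus's own `prometheus.yml` format, assuming Kubernetes node discovery for node-exporter on its default port 9100 and a hypothetical `kube-state-metrics` Service in a hypothetical `drycc` namespace on port 8080. The chart's actual generated configuration may differ.

```yaml
# Sketch of a prometheus.yml fragment; not the chart's generated config.
scrape_configs:
  # One target per node, discovered through the Kubernetes API.
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Node discovery yields the kubelet address (port 10250);
      # rewrite it to node-exporter's default port 9100.
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
  # HYPOTHETICAL Service name and namespace for kube-state-metrics.
  - job_name: kube-state-metrics
    static_configs:
      - targets: ['kube-state-metrics.drycc.svc:8080']
```

Node discovery of this kind needs Prometheus to have RBAC permission to list nodes through the Kubernetes API.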
@@ -75,44 +64,28 @@ If you wish to have persistence for Grafana you can set `enabled` to `true` in t |
75 | 64 |
76 | 65 | If you wish to provide your own Grafana instance you can set `grafanaLocation` in the `values.yaml` file before running `helm install`. |
77 | 66 |
78 | | -## [InfluxDB](https://docs.influxdata.com/influxdb) |
79 | | -InfluxDB writes data to the host disk; however, if the InfluxDB pod dies and comes back on another host, the data will not be recovered. The InfluxDB Admin UI is also exposed through the router allowing users to access the query engine by going to `influx.mydomain.com`. You will need to configure where to find the `influx-api` endpoint by clicking the "gear" icon at the top right and changing the host to `influx-api.mydomain.com` and port to `80`. |
| 67 | +## [Prometheus](https://prometheus.io/) |
| 68 | +Prometheus writes data to the host disk; however, if the Prometheus pod dies and is rescheduled on another host, the data will not be recovered. The Prometheus graph UI is also exposed through the router, allowing users to access the query engine by going to `prometheus.mydomain.com`. |
80 | 69 |
81 | 70 | ### On Cluster Persistence |
82 | | -If you wish to have persistence for InfluxDB you can set `enabled` to `true` in the `values.yaml` file before running `helm install`. |
| 71 | +You can enable or disable `node-exporter` and `kube-state-metrics` by setting their `enabled` value to `true` or `false` in the `values.yaml` file. |
| 72 | +If you wish to have persistence for Prometheus, set `enabled` to `true` under `prometheus.prometheus-server.persistence` in the `values.yaml` file before running `helm install`. |
83 | 73 |
84 | 74 | ``` |
85 | | - influxdb: |
86 | | - # Configure the following ONLY if you want persistence for on-cluster grafana |
87 | | - # GCP PDs and EBS volumes are supported only |
88 | | - persistence: |
89 | | - enabled: true # Set to true to enable persistence |
90 | | - size: 5Gi # PVC size |
| 75 | +prometheus: |
| 76 | +  prometheus-server: |
| 77 | +    persistence: |
| 78 | +      enabled: true  # Set to true to enable persistence |
| 79 | +      size: 10Gi     # PVC size |
| 80 | +node-exporter: |
| 81 | +  enabled: true |
| 82 | +kube-state-metrics: |
| 83 | +  enabled: true |
91 | 84 | ``` |
92 | 85 |
93 | | -### Off Cluster Influxdb |
94 | | - |
95 | | -To use off-cluster Influx v2, please provide the following values in the `values.yaml` file before running `helm install`. |
96 | | - |
97 | | -* `influxdbLocation=off-cluster` |
98 | | -* `url = "http://my-influxhost.com:8086"` |
99 | | -* `bucket = "metrics"` |
100 | | -* `org = "drycc"` |
101 | | -* `token = "MysuperSecurePassword"` |
102 | | - |
103 | | - |
104 | | -## [Telegraf](https://docs.influxdata.com/telegraf) |
105 | | - |
106 | | -Telegraf is the metrics collection daemon used within the monitoring stack. It will collect and send the following metrics to InfluxDB: |
107 | | - |
108 | | -* System level metrics such as CPU, Load Average, Memory, Disk, and Network stats |
109 | | -* Container level metrics such as CPU and Memory |
110 | | -* Kubernetes metrics such as API request latency, Pod Startup Latency, and number of running pods |
111 | | - |
112 | | -It is possible to send these metrics to other endpoints besides InfluxDB. For more information please consult the following [file](https://github.com/drycc/monitor/blob/main/telegraf/rootfs/config.toml.tpl) |
113 | | - |
114 | | -### Customizing the Monitoring Stack |
| 86 | +### Off Cluster Prometheus |
115 | 87 |
116 | | -To learn more about customizing each of the above components please visit the [Tuning Component Settings][] section. |
| 88 | +To use off-cluster Prometheus, please provide the following values in the `values.yaml` file before running `helm install`. |
117 | 89 |
118 | | -[Tuning Component Settings]: tuning-component-settings.md#customizing-the-monitor |
| 90 | +* `global.prometheusLocation=off-cluster` |
| 91 | +* `url = "http://my.prometheus.url:9090"` |
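The dotted key `global.prometheusLocation` follows Helm's usual `--set` notation, which corresponds to a nested block in `values.yaml`. A minimal sketch of both settings written as YAML follows; the placement of `url` under a top-level `prometheus:` key is a hypothetical assumption, since the text above does not say where the chart expects it, so check the chart's own `values.yaml` before relying on it.

```yaml
# Hypothetical values.yaml fragment for an off-cluster Prometheus.
global:
  # global.prometheusLocation from the list above, written as nested YAML.
  prometheusLocation: off-cluster
# ASSUMPTION: the chart's real location for `url` may differ.
prometheus:
  url: "http://my.prometheus.url:9090"
```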