src/managing-workflow/platform-monitoring.md
70 additions & 5 deletions
Original file line number
Diff line number
Diff line change
@@ -52,18 +52,32 @@ Grafana will preload several dashboards to help operators get started with monit
52
52
These dashboards are meant as starting points and don't include every item that might be desirable to monitor in a
53
53
production installation.
54
54
55
-
Deis Workflow monitoring does not currently write data to the host filesystem or to long-term storage. If the Grafana
56
-
instance fails, modified dashboards are lost. Until there is a solution to persist this, export dashboards and store
57
-
them separately in version control.
55
+
By default, Deis Workflow monitoring does not write data to the host filesystem or to long-term storage. If the Grafana instance fails, modified dashboards are lost.
56
+
57
+
### On Cluster Persistence
58
+
59
+
To persist Grafana data, set `enabled` to `true` under `grafana.persistence` in the `values.yaml` file before running `helm install`.
60
+
61
+
```
62
+
grafana:
63
+
  # Configure the following ONLY if you want persistence for on-cluster Grafana
64
+
  # Only GCP PDs and EBS volumes are supported
65
+
  persistence:
66
+
    enabled: true # Set to true to enable persistence
67
+
    size: 5Gi # PVC size
68
+
```
69
+
70
+
If your cluster does not already have a `standard` StorageClass, create one as described in [PVC Dynamic Provisioning](#pvc-dynamic-provisioning); Kubernetes v1.4.x and v1.5.x do not create it by default.
71
+
58
72
59
73
### Off Cluster Grafana
60
74
61
-
It is recommended that users provide their own installation for Grafana if possible. The current deployment of Grafana within Workflow is not durable across pod restarts which means custom dashboards that are created after startup will not be restored when the pod comes back up. If you wish to provide your own Grafana instance you can set `grafana_location` in the `values.yaml` file before running `helm install`.
75
+
If you wish to provide your own Grafana instance, you can set `grafana_location` in the `values.yaml` file before running `helm install`.
62
76
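As a rough sketch of the setting described above, the fragment below marks Grafana as off-cluster in `values.yaml`. The placement of `grafana_location` under a `global` key and the value `off-cluster` are assumptions that this page does not confirm; check the `values.yaml` shipped with your chart version before relying on them.

```yaml
# Assumption: the key sits under `global` and accepts "on-cluster" or "off-cluster".
global:
  grafana_location: "off-cluster"
```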
63
77
## InfluxDB
64
78
65
79
InfluxDB writes data to the host disk; however, if the InfluxDB pod dies and comes back on
66
-
another host the data will not be recovered. We intend to fix this in a future release. The InfluxDB Admin UI is also
80
+
another host, the data will not be recovered. To keep the data, enable on-cluster persistence. The InfluxDB Admin UI is also
67
81
exposed through the router, allowing users to access the query engine by going to `influx.mydomain.com`. You will need to
68
82
configure where to find the `influx-api` endpoint by clicking the "gear" icon at the top right and changing the host to
69
83
`influxapi.mydomain.com` and port to `80`.
@@ -75,6 +89,22 @@ You can choose to not expose the Influx UI and API to the world by updating
75
89
`$CHART_HOME/workspace/workflow-$WORKFLOW_RELEASE/manifests/deis-monitor-influxdb-ui-svc.yaml` and removing the
76
90
following line - `router.deis.io/routable: "true"`.
77
91
92
+
### On Cluster Persistence
93
+
94
+
To persist InfluxDB data, set `enabled` to `true` under `influxdb.persistence` in the `values.yaml` file before running `helm install`.
95
+
96
+
```
97
+
influxdb:
98
+
  # Configure the following ONLY if you want persistence for on-cluster InfluxDB
99
+
  # Only GCP PDs and EBS volumes are supported
100
+
  persistence:
101
+
    enabled: true # Set to true to enable persistence
102
+
    size: 5Gi # PVC size
103
+
```
104
+
105
+
If your cluster does not already have a `standard` StorageClass, create one as described in [PVC Dynamic Provisioning](#pvc-dynamic-provisioning); Kubernetes v1.4.x and v1.5.x do not create it by default.
106
+
107
+
78
108
### Off Cluster InfluxDB
79
109
80
110
To use off-cluster Influx, please provide the following values in the `values.yaml` file before running `helm install`.
@@ -85,6 +115,41 @@ To use off-cluster Influx, please provide the following values in the `values.ya
85
115
* `user = "InfluxUser"`
86
116
* `password = "MysuperSecurePassword"`
87
117
118
+
119
+
## PVC Dynamic Provisioning
120
+
121
+
Kubernetes v1.4.x introduced Dynamic Provisioning and Storage Classes; see the [Kubernetes blog post](http://blog.kubernetes.io/2016/10/dynamic-provisioning-and-storage-in-kubernetes.html) for details.
122
+
123
+
To use persistence for Grafana and InfluxDB, you also need to create a `StorageClass` object in the Kubernetes cluster with `kubectl create -f storage-standard.yaml`.
124
+
125
+
Note: GCE/GKE and AWS have different `StorageClass` settings.
126
+
127
+
GCE/GKE `storage-standard.yaml` manifest:
128
+
129
+
```
130
+
kind: StorageClass
131
+
apiVersion: storage.k8s.io/v1beta1
132
+
metadata:
133
+
  name: standard
134
+
provisioner: kubernetes.io/gce-pd
135
+
parameters:
136
+
  type: pd-standard
137
+
```
138
+
139
+
140
+
AWS `storage-standard.yaml` manifest:
141
+
142
+
```
143
+
kind: StorageClass
144
+
apiVersion: storage.k8s.io/v1beta1
145
+
metadata:
146
+
  name: standard
147
+
provisioner: kubernetes.io/aws-ebs
148
+
parameters:
149
+
  type: gp2
150
+
```
151
+
152
+
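For orientation, a claim that requests storage from the `standard` class could look like the sketch below. The claim name `example-claim` is hypothetical, and the chart is expected to create its own claims when persistence is enabled, so this only illustrates how a claim selects a class on Kubernetes v1.4.x and v1.5.x, where the class is chosen through the beta annotation.

```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example-claim  # hypothetical name, for illustration only
  annotations:
    # Selects the StorageClass named "standard" on v1.4.x / v1.5.x
    volume.beta.kubernetes.io/storage-class: standard
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi  # matches the PVC size used in the values.yaml examples
```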
88
153
## Telegraf
89
154
90
155
Telegraf is the metrics collection daemon used within the monitoring stack. It will collect and send the following metrics to InfluxDB: