diff --git a/Makefile b/Makefile index 26084ae48d95a219f8f0ba824f8ca93db6192b69..8ff8135605e21ffa0e9a0418db3e80fa957efae7 100644 --- a/Makefile +++ b/Makefile @@ -5,7 +5,7 @@ image: generate: image @echo ">> Compiling assets and generating Kubernetes manifests" - docker run --rm -v `pwd`:/go/src/github.com/coreos/prometheus-operator/contrib/kube-prometheus --workdir /go/src/github.com/coreos/prometheus-operator/contrib/kube-prometheus po-jsonnet make generate-raw + docker run --rm -u=$(shell id -u $(USER)):$(shell id -g $(USER)) -v `pwd`:/go/src/github.com/coreos/prometheus-operator/contrib/kube-prometheus --workdir /go/src/github.com/coreos/prometheus-operator/contrib/kube-prometheus po-jsonnet make generate-raw generate-raw: - ./hack/scripts/generate-manifests.sh + ./hack/scripts/build-jsonnet.sh example-dist/base/kube-prometheus.jsonnet manifests diff --git a/README.md b/README.md index 4ada050d3407663a4c160f752ac296a02cd70f95..7defae27272d6a860cb9448e74b74344bc28b387 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,7 @@ # kube-prometheus +> Note that everything in the `contrib/kube-prometheus/` directory is experimental and may change significantly at any time. + This repository collects Kubernetes manifests, [Grafana](http://grafana.com/) dashboards, and [Prometheus rules](https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/) combined with documentation and scripts to provide single-command deployments of end-to-end @@ -46,16 +48,15 @@ install Simply run: ```bash -export KUBECONFIG=<path> # defaults to "~/.kube/config" cd contrib/kube-prometheus/ hack/cluster-monitoring/deploy ``` -After all pods are ready, you can reach: +After all pods are ready, you can reach each of the UIs by port-forwarding: -* Prometheus UI on node port `30900` -* Alertmanager UI on node port `30903` -* Grafana on node port `30902` +* Prometheus UI on node port `kubectl -n monitoring port-forward prometheus-k8s-0 9090` +* Alertmanager UI on node port `kubectl -n monitoring port-forward alertmanager-main-0 9093` +* Grafana on node port `kubectl -n monitoring port-forward $(kubectl get pods -n monitoring -lapp=grafana -ojsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}') 3000` To tear it all down again, run: @@ -63,9 +64,53 @@ To tear it all down again, run: hack/cluster-monitoring/teardown ``` +## Customizing + +As everyone's infrastructure is slightly different, different organizations have different requirements. Thereby there may be modifications you want to do on kube-prometheus to fit your needs. + +The kube-prometheus stack is intended to be a jsonnet library for organizations to consume and use in their own infrastructure repository. Below is an example how it can be used to deploy the stack properly on minikube. + +The three "distribution" examples we have assembled can be found in: + +* `example-dist/base`: contains the plain kube-prometheus stack for organizations to build on. +* `example-dist/kubeadm`: contains the kube-prometheus stack with slight modifications to work properly monitoring kubeadm clusters and exposes UIs on NodePorts for demonstration purposes. +* `example-dist/bootkube`: contains the kube-prometheus stack with slight modifications to work properly on clusters created with bootkube. + +The examples in `example-dist/` are purely meant for demonstration purposes, the `kube-prometheus.jsonnet` file should live in your organizations infrastructure repository and use the kube-prometheus library provided here. + +Examples of additoinal modifications you may want to make could be adding an `Ingress` object for each of the UIs, but the point of this is that as opposed to other solutions out there, this library does not need to yield all possible customization options, it's all up to the user to customize! + +### minikube kubeadm example + +See `example-dist/kubeadm` for an example for deploying on minikube, using the minikube kubeadm bootstrapper. The `example-dist/kubeadm/kube-prometheus.jsonnet` file renders the kube-prometheus manifests using jsonnet and then merges the result with kubeadm specifics, such as information on how to monitor kube-controller-manager and kube-scheduler as created by kubeadm. In addition for demonstration purposes, it converts the services selecting Prometheus, Alertmanager and Grafana to NodePort services. + +Let's give that a try, and create a minikube cluster: + +``` +minikube delete && minikube start --kubernetes-version=v1.9.6 --memory=4096 --bootstrapper=kubeadm --extra-config=kubelet.authentication-token-webhook=true --extra-config=kubelet.authorization-mode=Webhook --extra-config=scheduler.address=0.0.0.0 --extra-config=controller-manager.address=0.0.0.0 +``` + +Then we can render the manifests for kubeadm (because we are using the minikube kubeadm bootstrapper): + +``` +docker run --rm \ + -v `pwd`:/go/src/github.com/coreos/prometheus-operator/contrib/kube-prometheus \ + --workdir /go/src/github.com/coreos/prometheus-operator/contrib/kube-prometheus \ + po-jsonnet \ + ./hack/scripts/build-jsonnet.sh example-dist/kubeadm/kube-prometheus.jsonnet example-dist/kubeadm/manifests +``` + +> Note the `po-jsonnet` docker image is built using [this Dockerfile](/scripts/jsonnet/Dockerfile), you can also build it using `make image` from the `contrib/kube-prometheus` folder. + +Then the stack can be deployed using + +``` +hack/cluster-monitoring/deploy example-dist/kubeadm +``` + ## Monitoring custom services -The example manifests in [manifests/examples/example-app](/contrib/kube-prometheus/manifests/examples/example-app) +The example manifests in [examples/example-app](/contrib/kube-prometheus/examples/example-app) deploy a fake service exposing Prometheus metrics. They additionally define a new Prometheus server and a [`ServiceMonitor`](https://github.com/coreos/prometheus-operator/blob/master/Documentation/design.md#servicemonitor), which specifies how the example service should be monitored. @@ -76,10 +121,13 @@ manage its life cycle. hack/example-service-monitoring/deploy ``` -After all pods are ready you can reach the Prometheus server on node port `30100` and observe -how it monitors the service as specified. Same as before, this Prometheus server automatically -discovers the Alertmanager cluster deployed in the [Monitoring Kubernetes](#Monitoring-Kubernetes) -section. +After all pods are ready you can reach the Prometheus server similar to the Prometheus server above: + +```bash +kubectl port-forward prometheus-frontend-0 9090 +``` + +Then you can access Prometheus through `http://localhost:9090/`. Teardown: diff --git a/build.sh b/build.sh deleted file mode 100755 index a42a6bb76b6ac8eee63469d2be28351f18054da3..0000000000000000000000000000000000000000 --- a/build.sh +++ /dev/null @@ -1,19 +0,0 @@ -#!/usr/bin/env bash -set -e -set -x - -prefix="tmp/manifests" -json="tmp/manifests.json" - -rm -rf ${prefix} -mkdir -p $(dirname "${json}") -jsonnet -J /home/brancz/.jsonnet-bundler/src/git/git@github.com-ksonnet-ksonnet-lib/master jsonnet/kube-prometheus.jsonnet > ${json} - -files=$(jq -r 'keys[]' ${json}) - -for file in ${files}; do - dir=$(dirname "${file}") - path="${prefix}/${dir}" - mkdir -p ${path} - jq -r ".[\"${file}\"]" ${json} | yaml2json | json2yaml > "${prefix}/${file}" -done diff --git a/example-dist/base/kube-prometheus.jsonnet b/example-dist/base/kube-prometheus.jsonnet new file mode 100644 index 0000000000000000000000000000000000000000..01760e6545710034cbcb09a403d539f51aa0aca9 --- /dev/null +++ b/example-dist/base/kube-prometheus.jsonnet @@ -0,0 +1,6 @@ +local kubePrometheus = import "kube-prometheus.libsonnet"; + +local namespace = "monitoring"; +local objects = kubePrometheus.new(namespace); + +{[path]: std.manifestYamlDoc(objects[path]) for path in std.objectFields(objects)} diff --git a/example-dist/bootkube/.gitignore b/example-dist/bootkube/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..4ea90de6f06de4520a0fc0bb782f154778860149 --- /dev/null +++ b/example-dist/bootkube/.gitignore @@ -0,0 +1,2 @@ +tmp/ +manifests/ diff --git a/example-dist/bootkube/kube-prometheus.jsonnet b/example-dist/bootkube/kube-prometheus.jsonnet new file mode 100644 index 0000000000000000000000000000000000000000..fa731106ba1d9083e96cecbbd80b4ebb0b6853bd --- /dev/null +++ b/example-dist/bootkube/kube-prometheus.jsonnet @@ -0,0 +1,36 @@ +local k = import "ksonnet.beta.3/k.libsonnet"; +local service = k.core.v1.service; +local servicePort = k.core.v1.service.mixin.spec.portsType; +local kubePrometheus = import "kube-prometheus.libsonnet"; + +local namespace = "monitoring"; + +local controllerManagerService = service.new("kube-controller-manager-prometheus-discovery", {"k8s-app": "kube-controller-manager"}, servicePort.newNamed("http-metrics", 10252, 10252)) + + service.mixin.metadata.withNamespace("kube-system") + + service.mixin.metadata.withLabels({"k8s-app": "kube-controller-manager"}); + +local schedulerService = service.new("kube-scheduler-prometheus-discovery", {"k8s-app": "kube-scheduler"}, servicePort.newNamed("http-metrics", 10251, 10251)) + + service.mixin.metadata.withNamespace("kube-system") + + service.mixin.metadata.withLabels({"k8s-app": "kube-scheduler"}); + +local kubeDNSService = service.new("kube-dns-prometheus-discovery", {"k8s-app": "kube-dns"}, [servicePort.newNamed("http-metrics-skydns", 10055, 10055), servicePort.newNamed("http-metrics-dnsmasq", 10054, 10054)]) + + service.mixin.metadata.withNamespace("kube-system") + + service.mixin.metadata.withLabels({"k8s-app": "kube-dns"}); + +local objects = kubePrometheus.new(namespace) + + { + "prometheus-k8s/prometheus-k8s-service.yaml"+: + service.mixin.spec.withPorts(servicePort.newNamed("web", 9090, "web") + servicePort.withNodePort(30900)) + + service.mixin.spec.withType("NodePort"), + "alertmanager-main/alertmanager-main-service.yaml"+: + service.mixin.spec.withPorts(servicePort.newNamed("web", 9093, "web") + servicePort.withNodePort(30903)) + + service.mixin.spec.withType("NodePort"), + "grafana/grafana-service.yaml"+: + service.mixin.spec.withPorts(servicePort.newNamed("http", 3000, "http") + servicePort.withNodePort(30902)) + + service.mixin.spec.withType("NodePort"), + "prometheus-k8s/kube-controller-manager-prometheus-discovery-service.yaml": controllerManagerService, + "prometheus-k8s/kube-scheduler-prometheus-discovery-service.yaml": schedulerService, + "prometheus-k8s/kube-dns-prometheus-discovery-service.yaml": kubeDNSService, + }; + +{[path]: std.manifestYamlDoc(objects[path]) for path in std.objectFields(objects)} diff --git a/example-dist/kubeadm/.gitignore b/example-dist/kubeadm/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..4ea90de6f06de4520a0fc0bb782f154778860149 --- /dev/null +++ b/example-dist/kubeadm/.gitignore @@ -0,0 +1,2 @@ +tmp/ +manifests/ diff --git a/example-dist/kubeadm/kube-prometheus.jsonnet b/example-dist/kubeadm/kube-prometheus.jsonnet new file mode 100644 index 0000000000000000000000000000000000000000..50ce102092917b97a80bffd79a19af98a369687b --- /dev/null +++ b/example-dist/kubeadm/kube-prometheus.jsonnet @@ -0,0 +1,31 @@ +local k = import "ksonnet.beta.3/k.libsonnet"; +local service = k.core.v1.service; +local servicePort = k.core.v1.service.mixin.spec.portsType; +local kubePrometheus = import "kube-prometheus.libsonnet"; + +local namespace = "monitoring"; + +local controllerManagerService = service.new("kube-controller-manager-prometheus-discovery", {component: "kube-controller-manager"}, servicePort.newNamed("http-metrics", 10252, 10252)) + + service.mixin.metadata.withNamespace("kube-system") + + service.mixin.metadata.withLabels({"k8s-app": "kube-controller-manager"}); + +local schedulerService = service.new("kube-scheduler-prometheus-discovery", {component: "kube-scheduler"}, servicePort.newNamed("http-metrics", 10251, 10251)) + + service.mixin.metadata.withNamespace("kube-system") + + service.mixin.metadata.withLabels({"k8s-app": "kube-scheduler"}); + +local objects = kubePrometheus.new(namespace) + + { + "prometheus-k8s/prometheus-k8s-service.yaml"+: + service.mixin.spec.withPorts(servicePort.newNamed("web", 9090, "web") + servicePort.withNodePort(30900)) + + service.mixin.spec.withType("NodePort"), + "alertmanager-main/alertmanager-main-service.yaml"+: + service.mixin.spec.withPorts(servicePort.newNamed("web", 9093, "web") + servicePort.withNodePort(30903)) + + service.mixin.spec.withType("NodePort"), + "grafana/grafana-service.yaml"+: + service.mixin.spec.withPorts(servicePort.newNamed("http", 3000, "http") + servicePort.withNodePort(30902)) + + service.mixin.spec.withType("NodePort"), + "prometheus-k8s/kube-controller-manager-prometheus-discovery-service.yaml": controllerManagerService, + "prometheus-k8s/kube-scheduler-prometheus-discovery-service.yaml": schedulerService, + }; + +{[path]: std.manifestYamlDoc(objects[path]) for path in std.objectFields(objects)} diff --git a/manifests/examples/basic-auth/secrets.yaml b/examples/basic-auth/secrets.yaml similarity index 100% rename from manifests/examples/basic-auth/secrets.yaml rename to examples/basic-auth/secrets.yaml diff --git a/manifests/examples/basic-auth/service-monitor.yaml b/examples/basic-auth/service-monitor.yaml similarity index 100% rename from manifests/examples/basic-auth/service-monitor.yaml rename to examples/basic-auth/service-monitor.yaml diff --git a/manifests/examples/example-app/example-app.yaml b/examples/example-app/example-app.yaml similarity index 100% rename from manifests/examples/example-app/example-app.yaml rename to examples/example-app/example-app.yaml diff --git a/manifests/examples/example-app/prometheus-frontend-alertmanager-discovery-role-binding.yaml b/examples/example-app/prometheus-frontend-alertmanager-discovery-role-binding.yaml similarity index 100% rename from manifests/examples/example-app/prometheus-frontend-alertmanager-discovery-role-binding.yaml rename to examples/example-app/prometheus-frontend-alertmanager-discovery-role-binding.yaml diff --git a/manifests/examples/example-app/prometheus-frontend-alertmanager-discovery-role.yaml b/examples/example-app/prometheus-frontend-alertmanager-discovery-role.yaml similarity index 100% rename from manifests/examples/example-app/prometheus-frontend-alertmanager-discovery-role.yaml rename to examples/example-app/prometheus-frontend-alertmanager-discovery-role.yaml diff --git a/manifests/examples/example-app/prometheus-frontend-role-binding.yaml b/examples/example-app/prometheus-frontend-role-binding.yaml similarity index 100% rename from manifests/examples/example-app/prometheus-frontend-role-binding.yaml rename to examples/example-app/prometheus-frontend-role-binding.yaml diff --git a/manifests/examples/example-app/prometheus-frontend-role.yaml b/examples/example-app/prometheus-frontend-role.yaml similarity index 100% rename from manifests/examples/example-app/prometheus-frontend-role.yaml rename to examples/example-app/prometheus-frontend-role.yaml diff --git a/manifests/examples/example-app/prometheus-frontend-service-account.yaml b/examples/example-app/prometheus-frontend-service-account.yaml similarity index 100% rename from manifests/examples/example-app/prometheus-frontend-service-account.yaml rename to examples/example-app/prometheus-frontend-service-account.yaml diff --git a/manifests/examples/example-app/prometheus-frontend-svc.yaml b/examples/example-app/prometheus-frontend-svc.yaml similarity index 100% rename from manifests/examples/example-app/prometheus-frontend-svc.yaml rename to examples/example-app/prometheus-frontend-svc.yaml diff --git a/manifests/examples/example-app/prometheus-frontend.yaml b/examples/example-app/prometheus-frontend.yaml similarity index 100% rename from manifests/examples/example-app/prometheus-frontend.yaml rename to examples/example-app/prometheus-frontend.yaml diff --git a/manifests/examples/example-app/servicemonitor-frontend.yaml b/examples/example-app/servicemonitor-frontend.yaml similarity index 100% rename from manifests/examples/example-app/servicemonitor-frontend.yaml rename to examples/example-app/servicemonitor-frontend.yaml diff --git a/manifests/custom-metrics-api/.gitignore b/experimental/custom-metrics-api/.gitignore similarity index 100% rename from manifests/custom-metrics-api/.gitignore rename to experimental/custom-metrics-api/.gitignore diff --git a/manifests/custom-metrics-api/README.md b/experimental/custom-metrics-api/README.md similarity index 100% rename from manifests/custom-metrics-api/README.md rename to experimental/custom-metrics-api/README.md diff --git a/manifests/custom-metrics-api/custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml b/experimental/custom-metrics-api/custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml rename to experimental/custom-metrics-api/custom-metrics-apiserver-auth-delegator-cluster-role-binding.yaml diff --git a/manifests/custom-metrics-api/custom-metrics-apiserver-auth-reader-role-binding.yaml b/experimental/custom-metrics-api/custom-metrics-apiserver-auth-reader-role-binding.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-apiserver-auth-reader-role-binding.yaml rename to experimental/custom-metrics-api/custom-metrics-apiserver-auth-reader-role-binding.yaml diff --git a/manifests/custom-metrics-api/custom-metrics-apiserver-deployment.yaml b/experimental/custom-metrics-api/custom-metrics-apiserver-deployment.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-apiserver-deployment.yaml rename to experimental/custom-metrics-api/custom-metrics-apiserver-deployment.yaml diff --git a/manifests/custom-metrics-api/custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml b/experimental/custom-metrics-api/custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml rename to experimental/custom-metrics-api/custom-metrics-apiserver-resource-reader-cluster-role-binding.yaml diff --git a/manifests/custom-metrics-api/custom-metrics-apiserver-service-account.yaml b/experimental/custom-metrics-api/custom-metrics-apiserver-service-account.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-apiserver-service-account.yaml rename to experimental/custom-metrics-api/custom-metrics-apiserver-service-account.yaml diff --git a/manifests/custom-metrics-api/custom-metrics-apiserver-service.yaml b/experimental/custom-metrics-api/custom-metrics-apiserver-service.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-apiserver-service.yaml rename to experimental/custom-metrics-api/custom-metrics-apiserver-service.yaml diff --git a/manifests/custom-metrics-api/custom-metrics-apiservice.yaml b/experimental/custom-metrics-api/custom-metrics-apiservice.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-apiservice.yaml rename to experimental/custom-metrics-api/custom-metrics-apiservice.yaml diff --git a/manifests/custom-metrics-api/custom-metrics-cluster-role.yaml b/experimental/custom-metrics-api/custom-metrics-cluster-role.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-cluster-role.yaml rename to experimental/custom-metrics-api/custom-metrics-cluster-role.yaml diff --git a/manifests/custom-metrics-api/custom-metrics-resource-reader-cluster-role.yaml b/experimental/custom-metrics-api/custom-metrics-resource-reader-cluster-role.yaml similarity index 100% rename from manifests/custom-metrics-api/custom-metrics-resource-reader-cluster-role.yaml rename to experimental/custom-metrics-api/custom-metrics-resource-reader-cluster-role.yaml diff --git a/manifests/custom-metrics-api/deploy.sh b/experimental/custom-metrics-api/deploy.sh similarity index 100% rename from manifests/custom-metrics-api/deploy.sh rename to experimental/custom-metrics-api/deploy.sh diff --git a/manifests/custom-metrics-api/gencerts.sh b/experimental/custom-metrics-api/gencerts.sh similarity index 100% rename from manifests/custom-metrics-api/gencerts.sh rename to experimental/custom-metrics-api/gencerts.sh diff --git a/manifests/custom-metrics-api/hpa-custom-metrics-cluster-role-binding.yaml b/experimental/custom-metrics-api/hpa-custom-metrics-cluster-role-binding.yaml similarity index 100% rename from manifests/custom-metrics-api/hpa-custom-metrics-cluster-role-binding.yaml rename to experimental/custom-metrics-api/hpa-custom-metrics-cluster-role-binding.yaml diff --git a/manifests/custom-metrics-api/teardown.sh b/experimental/custom-metrics-api/teardown.sh similarity index 100% rename from manifests/custom-metrics-api/teardown.sh rename to experimental/custom-metrics-api/teardown.sh diff --git a/manifests/metrics-server/auth-delegator.yaml b/experimental/metrics-server/auth-delegator.yaml similarity index 100% rename from manifests/metrics-server/auth-delegator.yaml rename to experimental/metrics-server/auth-delegator.yaml diff --git a/manifests/metrics-server/auth-reader.yaml b/experimental/metrics-server/auth-reader.yaml similarity index 100% rename from manifests/metrics-server/auth-reader.yaml rename to experimental/metrics-server/auth-reader.yaml diff --git a/manifests/metrics-server/metrics-apiservice.yaml b/experimental/metrics-server/metrics-apiservice.yaml similarity index 100% rename from manifests/metrics-server/metrics-apiservice.yaml rename to experimental/metrics-server/metrics-apiservice.yaml diff --git a/manifests/metrics-server/metrics-server-cluster-role-binding.yaml b/experimental/metrics-server/metrics-server-cluster-role-binding.yaml similarity index 100% rename from manifests/metrics-server/metrics-server-cluster-role-binding.yaml rename to experimental/metrics-server/metrics-server-cluster-role-binding.yaml diff --git a/manifests/metrics-server/metrics-server-cluster-role.yaml b/experimental/metrics-server/metrics-server-cluster-role.yaml similarity index 100% rename from manifests/metrics-server/metrics-server-cluster-role.yaml rename to experimental/metrics-server/metrics-server-cluster-role.yaml diff --git a/manifests/metrics-server/metrics-server-deployment.yaml b/experimental/metrics-server/metrics-server-deployment.yaml similarity index 100% rename from manifests/metrics-server/metrics-server-deployment.yaml rename to experimental/metrics-server/metrics-server-deployment.yaml diff --git a/manifests/metrics-server/metrics-server-service-account.yaml b/experimental/metrics-server/metrics-server-service-account.yaml similarity index 100% rename from manifests/metrics-server/metrics-server-service-account.yaml rename to experimental/metrics-server/metrics-server-service-account.yaml diff --git a/manifests/metrics-server/metrics-server-service.yaml b/experimental/metrics-server/metrics-server-service.yaml similarity index 100% rename from manifests/metrics-server/metrics-server-service.yaml rename to experimental/metrics-server/metrics-server-service.yaml diff --git a/hack/cluster-monitoring/deploy b/hack/cluster-monitoring/deploy index a4f7c184403f2cd93569b3fd61443c15ecfaecde..41e051876498ccf22256c52b731d2a86be297d2d 100755 --- a/hack/cluster-monitoring/deploy +++ b/hack/cluster-monitoring/deploy @@ -1,40 +1,24 @@ #!/usr/bin/env bash -if [ -z "${KUBECONFIG}" ]; then - export KUBECONFIG=~/.kube/config -fi +manifest_prefix=${1-.} -# CAUTION - setting NAMESPACE will deploy most components to the given namespace -# however some are hardcoded to 'monitoring'. Only use if you have reviewed all manifests. +kubectl create namespace monitoring -if [ -z "${NAMESPACE}" ]; then - NAMESPACE=monitoring -fi - -kubectl create namespace "$NAMESPACE" - -kctl() { - kubectl --namespace "$NAMESPACE" "$@" -} - -kctl apply -f manifests/prometheus-operator +kubectl apply -f ${manifest_prefix}/manifests/prometheus-operator/ # Wait for CRDs to be ready. printf "Waiting for Operator to register custom resource definitions..." -until kctl get customresourcedefinitions servicemonitors.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done -until kctl get customresourcedefinitions prometheuses.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done -until kctl get customresourcedefinitions alertmanagers.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done -until kctl get servicemonitors.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done -until kctl get prometheuses.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done -until kctl get alertmanagers.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done +until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done +until kubectl get customresourcedefinitions prometheuses.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done +until kubectl get customresourcedefinitions alertmanagers.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done +until kubectl get servicemonitors.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done +until kubectl get prometheuses.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done +until kubectl get alertmanagers.monitoring.coreos.com > /dev/null 2>&1; do sleep 1; printf "."; done echo "done!" -kctl apply -f manifests/node-exporter -kctl apply -f manifests/kube-state-metrics -kctl apply -f manifests/grafana/grafana-credentials.yaml -kctl apply -f manifests/grafana -find manifests/prometheus -type f ! -name prometheus-k8s-roles.yaml ! -name prometheus-k8s-role-bindings.yaml -exec kubectl --namespace "$NAMESPACE" apply -f {} \; -kubectl apply -f manifests/prometheus/prometheus-k8s-roles.yaml -kubectl apply -f manifests/prometheus/prometheus-k8s-role-bindings.yaml -kctl apply -f manifests/alertmanager/ +kubectl apply -f ${manifest_prefix}/manifests/node-exporter/ +kubectl apply -f ${manifest_prefix}/manifests/kube-state-metrics/ +kubectl apply -f ${manifest_prefix}/manifests/grafana/ +kubectl apply -f ${manifest_prefix}/manifests/prometheus-k8s/ +kubectl apply -f ${manifest_prefix}/manifests/alertmanager-main/ diff --git a/hack/cluster-monitoring/minikube-deploy b/hack/cluster-monitoring/minikube-deploy deleted file mode 100755 index 64cb86be7e2ab34634741d5943d43a077ee9bc0d..0000000000000000000000000000000000000000 --- a/hack/cluster-monitoring/minikube-deploy +++ /dev/null @@ -1,17 +0,0 @@ -#!/usr/bin/env bash - -# We assume that the kubelet uses token authN and authZ, as otherwise -# Prometheus needs a client certificate, which gives it full access to the -# kubelet, rather than just the metrics. Token authN and authZ allows more fine -# grained and easier access control. Simply start minikube with the following -# command (you can of course adapt the version and memory to your needs): -# -# $ minikube delete && minikube start --kubernetes-version=v1.9.1 --memory=4096 --bootstrapper=kubeadm --extra-config=kubelet.authentication-token-webhook=true --extra-config=kubelet.authorization-mode=Webhook --extra-config=scheduler.address=0.0.0.0 --extra-config=controller-manager.address=0.0.0.0 -# -# In future versions of minikube and kubeadm this will be the default, but for -# the time being, we will have to configure it ourselves. - -hack/cluster-monitoring/deploy - -kubectl --namespace=kube-system apply -f manifests/k8s/kubeadm/ - diff --git a/hack/cluster-monitoring/minikube-teardown b/hack/cluster-monitoring/minikube-teardown deleted file mode 100755 index 3a4c986ed06f08edf4bd845c5223f90f4168ea43..0000000000000000000000000000000000000000 --- a/hack/cluster-monitoring/minikube-teardown +++ /dev/null @@ -1,6 +0,0 @@ -#!/usr/bin/env bash - -hack/cluster-monitoring/teardown - -kubectl --namespace=kube-system delete -f manifests/k8s/minikube - diff --git a/hack/cluster-monitoring/self-hosted-deploy b/hack/cluster-monitoring/self-hosted-deploy deleted file mode 100755 index 7cbce37d5304346fe71dd8d69be17b9debeeb0ee..0000000000000000000000000000000000000000 --- a/hack/cluster-monitoring/self-hosted-deploy +++ /dev/null @@ -1,6 +0,0 @@ -#!/usr/bin/env bash - -hack/cluster-monitoring/deploy - -kubectl apply -f manifests/k8s/self-hosted - diff --git a/hack/cluster-monitoring/self-hosted-teardown b/hack/cluster-monitoring/self-hosted-teardown deleted file mode 100755 index f9d7da9f7cb805344eedc997cb73b0fc0e5143cc..0000000000000000000000000000000000000000 --- a/hack/cluster-monitoring/self-hosted-teardown +++ /dev/null @@ -1,6 +0,0 @@ -#!/usr/bin/env bash - -hack/cluster-monitoring/teardown - -kubectl delete -f manifests/k8s/self-hosted - diff --git a/hack/cluster-monitoring/teardown b/hack/cluster-monitoring/teardown index b2c4c544bbdf19bb7047d720b45f20bd5e668f81..0ef9a6b3d80e6ed7184694fc51fbc499e6609e10 100755 --- a/hack/cluster-monitoring/teardown +++ b/hack/cluster-monitoring/teardown @@ -1,30 +1,4 @@ #!/usr/bin/env bash -if [ -z "${KUBECONFIG}" ]; then - export KUBECONFIG=~/.kube/config -fi - -# CAUTION - NAMESPACE must match its value when deploy script was run. -# Some resources are always deployed to the monitoring namespace. - -if [ -z "${NAMESPACE}" ]; then - NAMESPACE=monitoring -fi - -kctl() { - kubectl --namespace "$NAMESPACE" "$@" -} - -kctl delete -f manifests/node-exporter -kctl delete -f manifests/kube-state-metrics -kctl delete -f manifests/grafana -find manifests/prometheus -type f ! -name prometheus-k8s-roles.yaml ! -name prometheus-k8s-role-bindings.yaml -exec kubectl --namespace "$NAMESPACE" delete -f {} \; -kubectl delete -f manifests/prometheus/prometheus-k8s-roles.yaml -kubectl delete -f manifests/prometheus/prometheus-k8s-role-bindings.yaml -kctl delete -f manifests/alertmanager - -# Hack: wait a bit to let the controller delete the deployed Prometheus server. -sleep 5 - -kctl delete -f manifests/prometheus-operator +kubectl delete namespace monitoring diff --git a/hack/example-service-monitoring/deploy b/hack/example-service-monitoring/deploy index 18b0ef6a6120bb470bec1becc00128ec6de797f4..4912dd9658d18db27dba7e900a08edb0b03ed1b1 100755 --- a/hack/example-service-monitoring/deploy +++ b/hack/example-service-monitoring/deploy @@ -1,3 +1,3 @@ #!/usr/bin/env bash -kubectl apply -f manifests/examples/example-app +kubectl apply -f examples/example-app diff --git a/hack/example-service-monitoring/teardown b/hack/example-service-monitoring/teardown index a5fc176023d8ed3c33512afa26a4ce8a9f809f16..62b546dea28c9d8519059b1f49beff49fabc38f0 100755 --- a/hack/example-service-monitoring/teardown +++ b/hack/example-service-monitoring/teardown @@ -1,3 +1,3 @@ #!/usr/bin/env bash -kubectl delete -f manifests/examples/example-app +kubectl delete -f examples/example-app diff --git a/hack/scripts/build-jsonnet.sh b/hack/scripts/build-jsonnet.sh new file mode 100755 index 0000000000000000000000000000000000000000..7189962f7aecb5c45489c0214ba67e6fd9834425 --- /dev/null +++ b/hack/scripts/build-jsonnet.sh @@ -0,0 +1,25 @@ +#!/usr/bin/env bash +set -e +set -x + +jsonnet="${1-kube-prometheus.jsonnet}" +prefix="${2-manifests}" +json="tmp/manifests.json" + +rm -rf ${prefix} +mkdir -p $(dirname "${json}") +jsonnet \ + -J $GOPATH/src/github.com/ksonnet/ksonnet-lib \ + -J $GOPATH/src/github.com/grafana/grafonnet-lib \ + -J $GOPATH/src/github.com/coreos/prometheus-operator/contrib/kube-prometheus/jsonnet \ + -J $GOPATH/src/github.com/brancz/kubernetes-grafana/src/kubernetes-jsonnet \ + ${jsonnet} > ${json} + +files=$(jq -r 'keys[]' ${json}) + +for file in ${files}; do + dir=$(dirname "${file}") + path="${prefix}/${dir}" + mkdir -p ${path} + jq -r ".[\"${file}\"]" ${json} | gojsontoyaml -yamltojson | gojsontoyaml > "${prefix}/${file}" +done diff --git a/hack/scripts/generate-alertmanager-config-secret.sh b/hack/scripts/generate-alertmanager-config-secret.sh deleted file mode 100755 index b0b4aaef77eb4467a51b2b979cd916b41ceed798..0000000000000000000000000000000000000000 --- a/hack/scripts/generate-alertmanager-config-secret.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash - -cat <<-EOF -apiVersion: v1 -kind: Secret -metadata: - name: alertmanager-main -data: - alertmanager.yaml: $(cat assets/alertmanager/alertmanager.yaml | base64 --wrap=0) -EOF - diff --git a/hack/scripts/generate-dashboards-configmap.sh b/hack/scripts/generate-dashboards-configmap.sh deleted file mode 100755 index 47ccfc127d876d85d7c46bf5e6e9d7f61f89dca6..0000000000000000000000000000000000000000 --- a/hack/scripts/generate-dashboards-configmap.sh +++ /dev/null @@ -1,39 +0,0 @@ -#!/bin/bash -set -e -set +x - -cat <<-EOF -apiVersion: v1 -kind: ConfigMap -metadata: - name: grafana-dashboard-definitions-0 -data: -EOF - -for f in assets/grafana/generated/*-dashboard.json -do - rm -rf $f -done - -virtualenv -p python3 .env 2>&1 > /dev/null -source .env/bin/activate 2>&1 > /dev/null -pip install -Ur requirements.txt 2>&1 > /dev/null -for f in assets/grafana/*.dashboard.py -do - basefilename=$(basename $f) - JSON_FILENAME="assets/grafana/generated/${basefilename%%.*}-dashboard.json" - generate-dashboard $f -o $JSON_FILENAME 2>&1 > /dev/null -done - -cp assets/grafana/raw-json-dashboards/*-dashboard.json assets/grafana/generated/ - -for f in assets/grafana/generated/*-dashboard.json -do - basefilename=$(basename $f) - echo " $basefilename: |+" - if [ "$basefilename" = "etcd-dashboard.json" ]; then - hack/scripts/wrap-dashboard.sh $f prometheus-etcd | sed "s/^/ /g" - else - hack/scripts/wrap-dashboard.sh $f prometheus | sed "s/^/ /g" - fi -done diff --git a/hack/scripts/generate-grafana-credentials-secret.sh b/hack/scripts/generate-grafana-credentials-secret.sh deleted file mode 100755 index e877b080b1442d829c386e74a8c93c222a4138fe..0000000000000000000000000000000000000000 --- a/hack/scripts/generate-grafana-credentials-secret.sh +++ /dev/null @@ -1,20 +0,0 @@ -#!/bin/bash - -if [ "$#" -ne 2 ]; then - echo "Usage: $0 user password" - exit 1 -fi - -user=$1 -password=$2 - -cat <<-EOF -apiVersion: v1 -kind: Secret -metadata: - name: grafana-credentials -data: - user: $(echo -n ${user} | base64 --wrap=0) - password: $(echo -n ${password} | base64 --wrap=0) -EOF - diff --git a/hack/scripts/generate-manifests.sh b/hack/scripts/generate-manifests.sh deleted file mode 100755 index 6f14056b9c69cf75ebcea27ce866eca7ce56521e..0000000000000000000000000000000000000000 --- a/hack/scripts/generate-manifests.sh +++ /dev/null @@ -1,26 +0,0 @@ -#!/bin/bash -set -e -set +x - -# Generate Alert Rules ConfigMap -hack/scripts/generate-rules-configmap.sh > manifests/prometheus/prometheus-k8s-rules.yaml - -# Generate Dashboard ConfigMap -hack/scripts/generate-dashboards-configmap.sh > manifests/grafana/grafana-dashboard-definitions.yaml - -# Generate Dashboard ConfigMap with configmap-generator tool -# Max Size per ConfigMap: 240000 -# Input dir: assets/grafana -# output file: manifests/grafana/grafana-dashboards.yaml -# grafana deployment output file: manifests/grafana/grafana-deployment.yaml -test -f manifests/grafana/grafana-dashboard-definitions.yaml && rm -f manifests/grafana/grafana-dashboard-definitions.yaml -test -f manifests/grafana/grafana-deployment.yaml && rm -f manifests/grafana/grafana-deployment.yaml -test -f manifests/grafana/grafana-dashboards.yaml && rm -f manifests/grafana/grafana-dashboards.yaml -hack/grafana-dashboards-configmap-generator/bin/grafana_dashboards_generate.sh -s 240000 -i assets/grafana/generated -o manifests/grafana/grafana-dashboard-definitions.yaml -g manifests/grafana/grafana-deployment.yaml -d manifests/grafana/grafana-dashboards.yaml - -# Generate Grafana Credentials Secret -hack/scripts/generate-grafana-credentials-secret.sh admin admin > manifests/grafana/grafana-credentials.yaml - -# Generate Secret for Alertmanager config -hack/scripts/generate-alertmanager-config-secret.sh > manifests/alertmanager/alertmanager-config.yaml - diff --git a/hack/scripts/generate-rules-configmap.sh b/hack/scripts/generate-rules-configmap.sh deleted file mode 100755 index 96c5433f18229ee71e645b864d7da44693842922..0000000000000000000000000000000000000000 --- a/hack/scripts/generate-rules-configmap.sh +++ /dev/null @@ -1,18 +0,0 @@ -#!/bin/bash - -cat <<-EOF -apiVersion: v1 -kind: ConfigMap -metadata: - name: prometheus-k8s-rules - labels: - role: alert-rules - prometheus: k8s -data: -EOF - -for f in assets/prometheus/rules/*.rules.y*ml -do - echo " $(basename "$f"): |+" - cat $f | sed "s/^/ /g" -done diff --git a/hack/scripts/wrap-dashboard.sh b/hack/scripts/wrap-dashboard.sh deleted file mode 100755 index d3b04085902aa87e8fefa548144a9f59ced9a2a7..0000000000000000000000000000000000000000 --- a/hack/scripts/wrap-dashboard.sh +++ /dev/null @@ -1,51 +0,0 @@ -#!/bin/bash -eu - -# Intended usage: -# * Edit dashboard in Grafana (you need to login first with admin/admin -# login/password). -# * Save dashboard in Grafana to check is specification is correct. -# Looks like this is the only way to check if dashboard specification -# has errors. -# * Download dashboard specification as JSON file in Grafana: -# Share -> Export -> Save to file. -# * Drop dashboard specification in assets folder: -# mv Nodes-1488465802729.json assets/grafana/node-dashboard.json -# * Regenerate Grafana configmap: -# ./hack/scripts/generate-manifests.sh -# * Apply new configmap: -# kubectl -n monitoring apply -f manifests/grafana/grafana-cm.yaml - -if [ "$#" -ne 2 ]; then - echo "Usage: $0 path-to-dashboard.json grafana-prometheus-datasource-name" - exit 1 -fi - -dashboardjson=$1 -datasource_name=$2 -inputname="DS_PROMETHEUS" - -if [ "$datasource_name" = "prometheus-etcd" ]; then - inputname="DS_PROMETHEUS-ETCD" -fi - -cat <<EOF -{ - "dashboard": -EOF - -cat $dashboardjson - -cat <<EOF -, - "inputs": [ - { - "name": "$inputname", - "pluginId": "prometheus", - "type": "datasource", - "value": "$datasource_name" - } - ], - "overwrite": true -} -EOF - diff --git a/jsonnet/alertmanager/alertmanager-main-secret.libsonnet b/jsonnet/alertmanager/alertmanager-main-secret.libsonnet index fca165667692cf2cd3fdba34ef527f5e93951b09..a8f9011b3bd6ed062661fe8b9de3535d1d91de8d 100644 --- a/jsonnet/alertmanager/alertmanager-main-secret.libsonnet +++ b/jsonnet/alertmanager/alertmanager-main-secret.libsonnet @@ -1,25 +1,8 @@ local k = import "ksonnet.beta.3/k.libsonnet"; local secret = k.core.v1.secret; -local plainConfig = "global: - resolve_timeout: 5m -route: - group_by: ['job'] - group_wait: 30s - group_interval: 5m - repeat_interval: 12h - receiver: 'null' - routes: - - match: - alertname: DeadMansSwitch - receiver: 'null' -receivers: -- name: 'null'"; - -local config = std.base64(plainConfig); - { - new(namespace):: - secret.new("alertmanager-main", {"alertmanager.yaml": config}) + + new(namespace, plainConfig):: + secret.new("alertmanager-main", {"alertmanager.yaml": std.base64(plainConfig)}) + secret.mixin.metadata.withNamespace(namespace) } diff --git a/jsonnet/kube-prometheus.jsonnet b/jsonnet/kube-prometheus.jsonnet deleted file mode 100644 index 3a0ef2cfc477bc4db79eb6c852c54f77f0f8f582..0000000000000000000000000000000000000000 --- a/jsonnet/kube-prometheus.jsonnet +++ /dev/null @@ -1,62 +0,0 @@ -local k = import "ksonnet.beta.3/k.libsonnet"; - -local alertmanager = import "alertmanager/alertmanager.libsonnet"; -local ksm = import "kube-state-metrics/kube-state-metrics.libsonnet"; -local nodeExporter = import "node-exporter/node-exporter.libsonnet"; -local po = import "prometheus-operator/prometheus-operator.libsonnet"; -local prometheus = import "prometheus/prometheus.libsonnet"; - -local namespace = "monitoring"; - -local objects = { - "alertmanager-main/alertmanager-main-secret.yaml": alertmanager.config.new(namespace), - "alertmanager-main/alertmanager-main-service-account.yaml": alertmanager.serviceAccount.new(namespace), - "alertmanager-main/alertmanager-main-service.yaml": alertmanager.service.new(namespace), - "alertmanager-main/alertmanager-main.yaml": alertmanager.alertmanager.new(namespace), - - "kube-state-metrics/kube-state-metrics-cluster-role-binding": ksm.clusterRoleBinding.new(namespace), - "kube-state-metrics/kube-state-metrics-cluster-role.yaml": ksm.clusterRole.new(), - "kube-state-metrics/kube-state-metrics-deployment.yaml": ksm.deployment.new(namespace), - "kube-state-metrics/kube-state-metrics-role-binding.yaml": ksm.roleBinding.new(namespace), - "kube-state-metrics/kube-state-metrics-role.yaml": ksm.role.new(namespace), - "kube-state-metrics/kube-state-metrics-service-account.yaml": ksm.serviceAccount.new(namespace), - "kube-state-metrics/kube-state-metrics-service.yaml": ksm.service.new(namespace), - - "node-exporter/node-exporter-cluster-role-binding.yaml": nodeExporter.clusterRoleBinding.new(namespace), - "node-exporter/node-exporter-cluster-role.yaml": nodeExporter.clusterRole.new(), - "node-exporter/node-exporter-daemonset.yaml": nodeExporter.daemonset.new(namespace), - "node-exporter/node-exporter-service-account.yaml": nodeExporter.serviceAccount.new(namespace), - "node-exporter/node-exporter-service.yaml": nodeExporter.service.new(namespace), - - "prometheus-operator/prometheus-operator-cluster-role-binding.yaml": po.clusterRoleBinding.new(namespace), - "prometheus-operator/prometheus-operator-cluster-role.yaml": po.clusterRole.new(), - "prometheus-operator/prometheus-operator-deployment.yaml": po.deployment.new(namespace), - "prometheus-operator/prometheus-operator-service.yaml": po.service.new(namespace), - "prometheus-operator/prometheus-operator-service-account.yaml": po.serviceAccount.new(namespace), - - "prometheus-k8s/prometheus-k8s-cluster-role-binding.yaml": prometheus.clusterRoleBinding.new(namespace), - "prometheus-k8s/prometheus-k8s-cluster-role.yaml": prometheus.clusterRole.new(), - "prometheus-k8s/prometheus-k8s-service-account.yaml": prometheus.serviceAccount.new(namespace), - "prometheus-k8s/prometheus-k8s-service.yaml": prometheus.service.new(namespace), - "prometheus-k8s/prometheus-k8s.yaml": prometheus.prometheus.new(namespace), - "prometheus-k8s/prometheus-k8s-role-binding-config.yaml": prometheus.roleBindingConfig.new(namespace), - "prometheus-k8s/prometheus-k8s-role-binding-namespace.yaml": prometheus.roleBindingNamespace.new(namespace), - "prometheus-k8s/prometheus-k8s-role-binding-kube-system.yaml": prometheus.roleBindingKubeSystem.new(namespace), - "prometheus-k8s/prometheus-k8s-role-binding-default.yaml": prometheus.roleBindingDefault.new(namespace), - "prometheus-k8s/prometheus-k8s-role-config.yaml": prometheus.roleConfig.new(namespace), - "prometheus-k8s/prometheus-k8s-role-namespace.yaml": prometheus.roleNamespace.new(namespace), - "prometheus-k8s/prometheus-k8s-role-kube-system.yaml": prometheus.roleKubeSystem.new(), - "prometheus-k8s/prometheus-k8s-role-default.yaml": prometheus.roleDefault.new(), - "prometheus-k8s/prometheus-k8s-service-monitor-alertmanager.yaml": prometheus.serviceMonitorAlertmanager.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-apiserver.yaml": prometheus.serviceMonitorApiserver.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-coredns.yaml": prometheus.serviceMonitorCoreDNS.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-kube-controller-manager.yaml": prometheus.serviceMonitorControllerManager.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-kube-scheduler.yaml": prometheus.serviceMonitorScheduler.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-kube-state-metrics.yaml": prometheus.serviceMonitorKubeStateMetrics.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-kubelet.yaml": prometheus.serviceMonitorKubelet.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-node-exporter.yaml": prometheus.serviceMonitorNodeExporter.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-prometheus-operator.yaml": prometheus.serviceMonitorPrometheusOperator.new(namespace), - "prometheus-k8s/prometheus-k8s-service-monitor-prometheus.yaml": prometheus.serviceMonitorPrometheus.new(namespace), -}; - -{[path]: std.manifestYamlDoc(objects[path]) for path in std.objectFields(objects)} diff --git a/jsonnet/kube-prometheus.libsonnet b/jsonnet/kube-prometheus.libsonnet new file mode 100644 index 0000000000000000000000000000000000000000..44ee06a5dfd41b0c2bd49b8b2be81d508914cb43 --- /dev/null +++ b/jsonnet/kube-prometheus.libsonnet @@ -0,0 +1,85 @@ +local k = import "ksonnet.beta.3/k.libsonnet"; + +local alertmanager = import "alertmanager/alertmanager.libsonnet"; +local ksm = import "kube-state-metrics/kube-state-metrics.libsonnet"; +local nodeExporter = import "node-exporter/node-exporter.libsonnet"; +local po = import "prometheus-operator/prometheus-operator.libsonnet"; +local prometheus = import "prometheus/prometheus.libsonnet"; +local grafana = import "grafana/grafana.libsonnet"; + +local alertmanagerConfig = importstr "../assets/alertmanager/alertmanager.yaml"; + +local ruleFiles = { + "alertmanager.rules.yaml": importstr "../assets/prometheus/rules/alertmanager.rules.yaml", + "etcd3.rules.yaml": importstr "../assets/prometheus/rules/etcd3.rules.yaml", + "general.rules.yaml": importstr "../assets/prometheus/rules/general.rules.yaml", + "kube-controller-manager.rules.yaml": importstr "../assets/prometheus/rules/kube-controller-manager.rules.yaml", + "kube-scheduler.rules.yaml": importstr "../assets/prometheus/rules/kube-scheduler.rules.yaml", + "kube-state-metrics.rules.yaml": importstr "../assets/prometheus/rules/kube-state-metrics.rules.yaml", + "kubelet.rules.yaml": importstr "../assets/prometheus/rules/kubelet.rules.yaml", + "kubernetes.rules.yaml": importstr "../assets/prometheus/rules/kubernetes.rules.yaml", + "node.rules.yaml": importstr "../assets/prometheus/rules/node.rules.yaml", + "prometheus.rules.yaml": importstr "../assets/prometheus/rules/prometheus.rules.yaml", +}; + +{ + new(namespace):: + { + "grafana/grafana-dashboard-definitions.yaml": grafana.dashboardDefinitions.new(namespace), + "grafana/grafana-dashboard-sources.yaml": grafana.dashboardSources.new(namespace), + "grafana/grafana-datasources.yaml": grafana.dashboardDatasources.new(namespace), + "grafana/grafana-deployment.yaml": grafana.deployment.new(namespace), + "grafana/grafana-service-account.yaml": grafana.serviceAccount.new(namespace), + "grafana/grafana-service.yaml": grafana.service.new(namespace), + + "alertmanager-main/alertmanager-main-secret.yaml": alertmanager.config.new(namespace, alertmanagerConfig), + "alertmanager-main/alertmanager-main-service-account.yaml": alertmanager.serviceAccount.new(namespace), + "alertmanager-main/alertmanager-main-service.yaml": alertmanager.service.new(namespace), + "alertmanager-main/alertmanager-main.yaml": alertmanager.alertmanager.new(namespace), + + "kube-state-metrics/kube-state-metrics-cluster-role-binding.yaml": ksm.clusterRoleBinding.new(namespace), + "kube-state-metrics/kube-state-metrics-cluster-role.yaml": ksm.clusterRole.new(), + "kube-state-metrics/kube-state-metrics-deployment.yaml": ksm.deployment.new(namespace), + "kube-state-metrics/kube-state-metrics-role-binding.yaml": ksm.roleBinding.new(namespace), + "kube-state-metrics/kube-state-metrics-role.yaml": ksm.role.new(namespace), + "kube-state-metrics/kube-state-metrics-service-account.yaml": ksm.serviceAccount.new(namespace), + "kube-state-metrics/kube-state-metrics-service.yaml": ksm.service.new(namespace), + + "node-exporter/node-exporter-cluster-role-binding.yaml": nodeExporter.clusterRoleBinding.new(namespace), + "node-exporter/node-exporter-cluster-role.yaml": nodeExporter.clusterRole.new(), + "node-exporter/node-exporter-daemonset.yaml": nodeExporter.daemonset.new(namespace), + "node-exporter/node-exporter-service-account.yaml": nodeExporter.serviceAccount.new(namespace), + "node-exporter/node-exporter-service.yaml": nodeExporter.service.new(namespace), + + "prometheus-operator/prometheus-operator-cluster-role-binding.yaml": po.clusterRoleBinding.new(namespace), + "prometheus-operator/prometheus-operator-cluster-role.yaml": po.clusterRole.new(), + "prometheus-operator/prometheus-operator-deployment.yaml": po.deployment.new(namespace), + "prometheus-operator/prometheus-operator-service.yaml": po.service.new(namespace), + "prometheus-operator/prometheus-operator-service-account.yaml": po.serviceAccount.new(namespace), + + "prometheus-k8s/prometheus-k8s-cluster-role-binding.yaml": prometheus.clusterRoleBinding.new(namespace), + "prometheus-k8s/prometheus-k8s-cluster-role.yaml": prometheus.clusterRole.new(), + "prometheus-k8s/prometheus-k8s-service-account.yaml": prometheus.serviceAccount.new(namespace), + "prometheus-k8s/prometheus-k8s-service.yaml": prometheus.service.new(namespace), + "prometheus-k8s/prometheus-k8s.yaml": prometheus.prometheus.new(namespace), + "prometheus-k8s/prometheus-k8s-rules.yaml": prometheus.rules.new(namespace, ruleFiles), + "prometheus-k8s/prometheus-k8s-role-binding-config.yaml": prometheus.roleBindingConfig.new(namespace), + "prometheus-k8s/prometheus-k8s-role-binding-namespace.yaml": prometheus.roleBindingNamespace.new(namespace), + "prometheus-k8s/prometheus-k8s-role-binding-kube-system.yaml": prometheus.roleBindingKubeSystem.new(namespace), + "prometheus-k8s/prometheus-k8s-role-binding-default.yaml": prometheus.roleBindingDefault.new(namespace), + "prometheus-k8s/prometheus-k8s-role-config.yaml": prometheus.roleConfig.new(namespace), + "prometheus-k8s/prometheus-k8s-role-namespace.yaml": prometheus.roleNamespace.new(namespace), + "prometheus-k8s/prometheus-k8s-role-kube-system.yaml": prometheus.roleKubeSystem.new(), + "prometheus-k8s/prometheus-k8s-role-default.yaml": prometheus.roleDefault.new(), + "prometheus-k8s/prometheus-k8s-service-monitor-alertmanager.yaml": prometheus.serviceMonitorAlertmanager.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-apiserver.yaml": prometheus.serviceMonitorApiserver.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-coredns.yaml": prometheus.serviceMonitorCoreDNS.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-kube-controller-manager.yaml": prometheus.serviceMonitorControllerManager.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-kube-scheduler.yaml": prometheus.serviceMonitorScheduler.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-kube-state-metrics.yaml": prometheus.serviceMonitorKubeStateMetrics.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-kubelet.yaml": prometheus.serviceMonitorKubelet.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-node-exporter.yaml": prometheus.serviceMonitorNodeExporter.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-prometheus-operator.yaml": prometheus.serviceMonitorPrometheusOperator.new(namespace), + "prometheus-k8s/prometheus-k8s-service-monitor-prometheus.yaml": prometheus.serviceMonitorPrometheus.new(namespace), + } +} diff --git a/jsonnet/prometheus-operator/prometheus-operator-deployment.libsonnet b/jsonnet/prometheus-operator/prometheus-operator-deployment.libsonnet index 2ad7f5260bebe19fdfd4a3bb3378be61f86d07c9..0212bc966e1e122b498717a56136b81bb7eefff9 100644 --- a/jsonnet/prometheus-operator/prometheus-operator-deployment.libsonnet +++ b/jsonnet/prometheus-operator/prometheus-operator-deployment.libsonnet @@ -2,7 +2,7 @@ local k = import "ksonnet.beta.3/k.libsonnet"; local rawVersion = importstr "../../../../VERSION"; local removeLineBreaks = function(str) std.join("", std.filter(function(c) c != "\n", std.stringChars(str))); -local version = removeLineBreaks(rawVersion); +local version = "v0.18.1";//removeLineBreaks(rawVersion); local deployment = k.apps.v1beta2.deployment; local container = k.apps.v1beta2.deployment.mixin.spec.template.spec.containersType; @@ -12,7 +12,7 @@ local targetPort = 8080; local podLabels = {"k8s-app": "prometheus-operator"}; local operatorContainer = - container.new("prometheus-operator", "quay.io/coreos/prometheus-operator:v" + version) + + container.new("prometheus-operator", "quay.io/coreos/prometheus-operator:" + version) + container.withPorts(containerPort.newNamed("http", targetPort)) + container.withArgs(["--kubelet-service=kube-system/kubelet", "--config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1"]) + container.mixin.resources.withRequests({cpu: "100m", memory: "50Mi"}) + diff --git a/jsonnet/prometheus/prometheus-k8s-rules.libsonnet b/jsonnet/prometheus/prometheus-k8s-rules.libsonnet new file mode 100644 index 0000000000000000000000000000000000000000..abe98fa943dd632ff6fbda18a2ea7dd7af3f0183 --- /dev/null +++ b/jsonnet/prometheus/prometheus-k8s-rules.libsonnet @@ -0,0 +1,8 @@ +local k = import "ksonnet.beta.3/k.libsonnet"; +local configMap = k.core.v1.configMap; + +{ + new(namespace, ruleFiles):: + configMap.new("prometheus-k8s-rules", ruleFiles) + + configMap.mixin.metadata.withNamespace(namespace) +} diff --git a/jsonnet/prometheus/prometheus-k8s-service.libsonnet b/jsonnet/prometheus/prometheus-k8s-service.libsonnet index 96781d695ecc8e98d97e71733bb97a8057ee167b..add240ddd0ec37ee01f40299717c9f9e6f72fa7e 100644 --- a/jsonnet/prometheus/prometheus-k8s-service.libsonnet +++ b/jsonnet/prometheus/prometheus-k8s-service.libsonnet @@ -4,7 +4,6 @@ local servicePort = k.core.v1.service.mixin.spec.portsType; local prometheusPort = servicePort.newNamed("web", 9090, "web"); - { new(namespace):: service.new("prometheus-k8s", {app: "prometheus", prometheus: "k8s"}, prometheusPort) + diff --git a/jsonnet/prometheus/prometheus.libsonnet b/jsonnet/prometheus/prometheus.libsonnet index edc75c0893d5415c4c25ec7f78be4f1f20ef0a7d..99fb9f541ab5a6b8f7ccb72b6c1af96c3da48f0b 100644 --- a/jsonnet/prometheus/prometheus.libsonnet +++ b/jsonnet/prometheus/prometheus.libsonnet @@ -9,6 +9,7 @@ roleNamespace:: import "prometheus-k8s-role-namespace.libsonnet", roleKubeSystem:: import "prometheus-k8s-role-kube-system.libsonnet", roleDefault:: import "prometheus-k8s-role-default.libsonnet", + rules:: import "prometheus-k8s-rules.libsonnet", serviceAccount:: import "prometheus-k8s-service-account.libsonnet", serviceMonitorAlertmanager:: import "prometheus-k8s-service-monitor-alertmanager.libsonnet", serviceMonitorApiserver:: import "prometheus-k8s-service-monitor-apiserver.libsonnet", diff --git a/manifests/alertmanager/alertmanager-config.yaml b/manifests/alertmanager-main/alertmanager-main-secret.yaml similarity index 91% rename from manifests/alertmanager/alertmanager-config.yaml rename to manifests/alertmanager-main/alertmanager-main-secret.yaml index 62d39016214739e8f04a9158aad07df4a488013b..4a143fbb16883e5afd3375518d4abf12632398fe 100644 --- a/manifests/alertmanager/alertmanager-config.yaml +++ b/manifests/alertmanager-main/alertmanager-main-secret.yaml @@ -1,6 +1,8 @@ apiVersion: v1 +data: + alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0Kcm91dGU6CiAgZ3JvdXBfYnk6IFsnam9iJ10KICBncm91cF93YWl0OiAzMHMKICBncm91cF9pbnRlcnZhbDogNW0KICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJlY2VpdmVyOiAnbnVsbCcKICByb3V0ZXM6CiAgLSBtYXRjaDoKICAgICAgYWxlcnRuYW1lOiBEZWFkTWFuc1N3aXRjaAogICAgcmVjZWl2ZXI6ICdudWxsJwpyZWNlaXZlcnM6Ci0gbmFtZTogJ251bGwnCg== kind: Secret metadata: name: alertmanager-main -data: - alertmanager.yaml: Z2xvYmFsOgogIHJlc29sdmVfdGltZW91dDogNW0Kcm91dGU6CiAgZ3JvdXBfYnk6IFsnam9iJ10KICBncm91cF93YWl0OiAzMHMKICBncm91cF9pbnRlcnZhbDogNW0KICByZXBlYXRfaW50ZXJ2YWw6IDEyaAogIHJlY2VpdmVyOiAnbnVsbCcKICByb3V0ZXM6CiAgLSBtYXRjaDoKICAgICAgYWxlcnRuYW1lOiBEZWFkTWFuc1N3aXRjaAogICAgcmVjZWl2ZXI6ICdudWxsJwpyZWNlaXZlcnM6Ci0gbmFtZTogJ251bGwnCg== + namespace: monitoring +type: Opaque diff --git a/manifests/alertmanager-main/alertmanager-main-service-account.yaml b/manifests/alertmanager-main/alertmanager-main-service-account.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5c06d5e40cb4b3d99ad5388af1bb401458b63213 --- /dev/null +++ b/manifests/alertmanager-main/alertmanager-main-service-account.yaml @@ -0,0 +1,5 @@ +apiVersion: v1 +kind: ServiceAccount +metadata: + name: alertmanager-main + namespace: monitoring diff --git a/manifests/alertmanager/alertmanager-service.yaml b/manifests/alertmanager-main/alertmanager-main-service.yaml similarity index 78% rename from manifests/alertmanager/alertmanager-service.yaml rename to manifests/alertmanager-main/alertmanager-main-service.yaml index a5413102dd60338d6dffffa29810d4abc0e5783b..0c7567932b707ebe7c30571f009ab7381b254ee4 100644 --- a/manifests/alertmanager/alertmanager-service.yaml +++ b/manifests/alertmanager-main/alertmanager-main-service.yaml @@ -4,13 +4,12 @@ metadata: labels: alertmanager: main name: alertmanager-main + namespace: monitoring spec: - type: NodePort ports: - name: web - nodePort: 30903 port: 9093 - protocol: TCP targetPort: web selector: alertmanager: main + app: alertmanager diff --git a/manifests/alertmanager/alertmanager.yaml b/manifests/alertmanager-main/alertmanager-main.yaml similarity index 70% rename from manifests/alertmanager/alertmanager.yaml rename to manifests/alertmanager-main/alertmanager-main.yaml index 3157d5874a5ec995fec3b9ebcaa44026ba5b27c9..84e72ec521f19dc69889b3ea36d9ab18db85b284 100644 --- a/manifests/alertmanager/alertmanager.yaml +++ b/manifests/alertmanager-main/alertmanager-main.yaml @@ -1,9 +1,11 @@ apiVersion: monitoring.coreos.com/v1 kind: Alertmanager metadata: - name: main labels: alertmanager: main + name: main + namespace: monitoring spec: replicas: 3 + serviceAccountName: alertmanager-main version: v0.14.0 diff --git a/manifests/etcd/etcd-bootkube-gce.yaml b/manifests/etcd/etcd-bootkube-gce.yaml deleted file mode 100644 index 94e1c020b33f48906da12a50def47c441ab982cc..0000000000000000000000000000000000000000 --- a/manifests/etcd/etcd-bootkube-gce.yaml +++ /dev/null @@ -1,28 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - name: etcd-k8s - labels: - k8s-app: etcd -spec: - type: ClusterIP - clusterIP: None - ports: - - name: api - port: 2379 - protocol: TCP ---- -apiVersion: v1 -kind: Endpoints -metadata: - name: etcd-k8s - labels: - k8s-app: etcd -subsets: -- addresses: - - ip: 10.142.0.2 - nodeName: 10.142.0.2 - ports: - - name: api - port: 2379 - protocol: TCP diff --git a/manifests/etcd/etcd-bootkube-vagrant-multi.yaml b/manifests/etcd/etcd-bootkube-vagrant-multi.yaml deleted file mode 100644 index 51293c60e96839b4586e11ad9c77033bb554d5b3..0000000000000000000000000000000000000000 --- a/manifests/etcd/etcd-bootkube-vagrant-multi.yaml +++ /dev/null @@ -1,28 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - name: etcd-k8s - labels: - k8s-app: etcd -spec: - type: ClusterIP - clusterIP: None - ports: - - name: api - port: 2379 - protocol: TCP ---- -apiVersion: v1 -kind: Endpoints -metadata: - name: etcd-k8s - labels: - k8s-app: etcd -subsets: -- addresses: - - ip: 172.17.4.51 - nodeName: 172.17.4.51 - ports: - - name: api - port: 2379 - protocol: TCP diff --git a/manifests/grafana/grafana-credentials.yaml b/manifests/grafana/grafana-credentials.yaml deleted file mode 100644 index c3da1b631131dee0940720aa1f85aa62c7fa0f18..0000000000000000000000000000000000000000 --- a/manifests/grafana/grafana-credentials.yaml +++ /dev/null @@ -1,7 +0,0 @@ -apiVersion: v1 -kind: Secret -metadata: - name: grafana-credentials -data: - user: YWRtaW4= - password: YWRtaW4= diff --git a/manifests/grafana/grafana-dashboard-definitions.yaml b/manifests/grafana/grafana-dashboard-definitions.yaml index dbd2a30d47eeb977dab15bdca78d9b95abe646f4..df4c5203b2537f72ed93f6d438ef9a188e7e2961 100644 --- a/manifests/grafana/grafana-dashboard-definitions.yaml +++ b/manifests/grafana/grafana-dashboard-definitions.yaml @@ -1,7360 +1,5403 @@ apiVersion: v1 -kind: ConfigMap -metadata: - name: grafana-dashboard-definitions-0 data: - deployment-dashboard.json: |+ + deployments-dashboard.json: |- { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": false, - "graphTooltip": 1, - "hideControls": false, - "links": [], - "rows": [ - { - "collapse": false, - "editable": false, - "height": "200px", - "panels": [ + "annotations": { + "list": [ + + ] + }, + "editable": false, + "gnetId": null, + "graphTooltip": 0, + "hideControls": false, + "id": null, + "links": [ + + ], + "refresh": "", + "rows": [ { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 8, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "cores", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "CPU", - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 2, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 4, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "lineColor": "rgb(31, 120, 193)", + "show": true + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$deployment_namespace\",pod_name=\u007e\"$deployment_name.*\"}[3m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "CPU", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 3, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 4, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "lineColor": "rgb(31, 120, 193)", + "show": true + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(container_memory_usage_bytes{namespace=\"$deployment_namespace\",pod_name=\u007e\"$deployment_name.*\"}) / 1024^3", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Memory", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 4, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 4, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "lineColor": "rgb(31, 120, 193)", + "show": true + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$deployment_namespace\",pod_name=\u007e\"$deployment_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$deployment_namespace\",pod_name=\u007e\"$deployment_name.*\"}[3m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Network", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 9, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "GB", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "80%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(container_memory_usage_bytes{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}) / 1024^3", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Memory", - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" + "collapse": false, + "collapsed": false, + "height": "100px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 5, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "max(kube_deployment_spec_replicas{namespace=\"$deployment_namespace\",deployment=\"$deployment_name\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Desired Replicas", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 6, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "min(kube_deployment_status_replicas_available{namespace=\"$deployment_namespace\",deployment=\"$deployment_name\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Available Replicas", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 7, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "max(kube_deployment_status_observed_generation{namespace=\"$deployment_namespace\",deployment=\"$deployment_name\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Observed Generation", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 8, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "max(kube_deployment_metadata_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Metadata Generation", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "Bps", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": false - }, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 9, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "max(kube_deployment_status_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "current replicas", + "refId": "A" + }, + { + "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "available", + "refId": "B" + }, + { + "expr": "max(kube_deployment_status_replicas_unavailable{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "unavailable", + "refId": "C" + }, + { + "expr": "min(kube_deployment_status_replicas_updated{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "updated", + "refId": "D" + }, + { + "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "desired", + "refId": "E" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Replicas", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" + } + ], + "schemaVersion": 14, + "style": "dark", + "tags": [ + + ], + "templating": { + "list": [ + { + "allValue": null, + "current": { + + }, + "datasource": "prometheus", + "hide": 0, + "includeAll": false, + "label": "Namespace", + "multi": false, + "name": "deployment_namespace", + "options": [ + + ], + "query": "label_values(kube_deployment_metadata_generation, namespace)", + "refresh": 2, + "regex": "", + "sort": 0, + "tagValuesQuery": "", + "tags": [ + + ], + "tagsQuery": "", + "type": "query", + "useTags": false }, { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$deployment_namespace\",pod_name=~\"$deployment_name.*\"}[3m]))", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Network", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" + "allValue": null, + "current": { + + }, + "datasource": "prometheus", + "hide": 0, + "includeAll": false, + "label": "Name", + "multi": false, + "name": "deployment_name", + "options": [ + + ], + "query": "label_values(kube_deployment_metadata_generation{namespace=\"$deployment_namespace\"}, deployment)", + "refresh": 2, + "regex": "", + "sort": 0, + "tagValuesQuery": "", + "tags": [ + + ], + "tagsQuery": "", + "type": "query", + "useTags": false + } + ] + }, + "time": { + "from": "now-1h", + "to": "now" + }, + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ], + "time_options": [ + "5m", + "15m", + "1h", + "6h", + "12h", + "24h", + "2d", + "7d", + "30d" + ] + }, + "timezone": "browser", + "title": "Deployments", + "version": 0 + } + kubernetes-capacity-planning-dashboard.json: |- + { + "annotations": { + "list": [ + + ] }, - { - "collapse": false, - "editable": false, - "height": "100px", - "panels": [ + "editable": false, + "gnetId": null, + "graphTooltip": 0, + "hideControls": false, + "id": null, + "links": [ + + ], + "refresh": "", + "rows": [ { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": false - }, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "metric": "kube_deployment_spec_replicas", - "refId": "A", - "step": 600 - } - ], - "title": "Desired Replicas", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 2, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 6, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum(rate(node_cpu{mode=\"idle\"}[2m])) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{cpu}}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Idle CPU", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "percent", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "percent", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + }, + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 3, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 6, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum(node_load1)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "load 1m", + "refId": "A" + }, + { + "expr": "sum(node_load5)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "load 5m", + "refId": "B" + }, + { + "expr": "sum(node_load15)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "load 15m", + "refId": "C" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "System Load", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "percent", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "percent", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 6, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Available Replicas", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 4, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 9, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "memory used", + "refId": "A" + }, + { + "expr": "sum(node_memory_Buffers)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "memory buffers", + "refId": "B" + }, + { + "expr": "sum(node_memory_Cached)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "memory cached", + "refId": "C" + }, + { + "expr": "sum(node_memory_MemFree)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "memory free", + "refId": "D" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Memory Usage", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 5, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "Memory Usage", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_deployment_status_observed_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Observed Generation", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 6, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 9, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum(rate(node_disk_bytes_read[5m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "read", + "refId": "A" + }, + { + "expr": "sum(rate(node_disk_bytes_written[5m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "written", + "refId": "B" + }, + { + "expr": "sum(rate(node_disk_io_time_ms[5m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "io time", + "refId": "C" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Disk I/O", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 7, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"}) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "Disk Space Usage", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_deployment_metadata_generation{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Metadata Generation", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "350px", - "panels": [ + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 8, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 6, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum(rate(node_network_receive_bytes{device!\u007e\"lo\"}[5m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Network Received", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + }, + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 9, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 6, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum(rate(node_network_transmit_bytes{device!\u007e\"lo\"}[5m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Network Transmitted", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" + }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 1, - "isNew": true, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "max(kube_deployment_status_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "current replicas", - "refId": "A", - "step": 30 - }, - { - "expr": "min(kube_deployment_status_replicas_available{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "available", - "refId": "B", - "step": 30 - }, - { - "expr": "max(kube_deployment_status_replicas_unavailable{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "unavailable", - "refId": "C", - "step": 30 - }, - { - "expr": "min(kube_deployment_status_replicas_updated{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "updated", - "refId": "D", - "step": 30 - }, - { - "expr": "max(kube_deployment_spec_replicas{deployment=\"$deployment_name\",namespace=\"$deployment_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "desired", - "refId": "E", - "step": 30 - } - ], - "title": "Replicas", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "none", - "label": "", - "logBase": 1, - "show": true - }, - { - "format": "short", - "label": "", - "logBase": 1, - "show": false - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 10, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 9, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum(kube_pod_info)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "Current Number of Pods", + "refId": "A" + }, + { + "expr": "sum(kube_node_status_capacity_pods)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "Maximum Capacity of Pods", + "refId": "B" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Cluster Pod Utilization", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 11, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "Pod Utilization", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [ - { - "allValue": ".*", - "current": {}, - "datasource": "prometheus", - "hide": 0, - "includeAll": false, - "label": "Namespace", - "multi": false, - "name": "deployment_namespace", - "options": [], - "query": "label_values(kube_deployment_metadata_generation, namespace)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": null, - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - }, - { - "allValue": null, - "current": {}, - "datasource": "prometheus", - "hide": 0, - "includeAll": false, - "label": "Deployment", - "multi": false, - "name": "deployment_name", - "options": [], - "query": "label_values(kube_deployment_metadata_generation{namespace=\"$deployment_namespace\"}, deployment)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "deployment", - "type": "query", - "useTags": false - } - ] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Deployment", - "version": 1 + "schemaVersion": 14, + "style": "dark", + "tags": [ + + ], + "templating": { + "list": [ + + ] + }, + "time": { + "from": "now-24h", + "to": "now" + }, + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ], + "time_options": [ + "5m", + "15m", + "1h", + "6h", + "12h", + "24h", + "2d", + "7d", + "30d" + ] + }, + "timezone": "browser", + "title": "Kubernetes Capacity Planning", + "version": 0 } - etcd-dashboard.json: |+ + kubernetes-cluster-health-dashboard.json: |- { - "__inputs": [ - { - "name": "prometheus", - "label": "prometheus", - "description": "", - "type": "datasource", - "pluginId": "prometheus", - "pluginName": "Prometheus" - } - ], - "__requires": [ - { - "type": "grafana", - "id": "grafana", - "name": "Grafana", - "version": "4.5.2" + "annotations": { + "list": [ + + ] + }, + "editable": false, + "gnetId": null, + "graphTooltip": 0, + "hideControls": false, + "id": null, + "links": [ + + ], + "refresh": "10s", + "rows": [ + { + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 2, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(up{job=\u007e\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Control Plane Components Down", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "Everything UP and healthy", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 3, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Alerts Firing", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 4, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(ALERTS{alertstate=\"pending\",alertname!=\"DeadMansSwitch\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Alerts Pending", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 5, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "count(increase(kube_pod_container_status_restarts[1h]) > 5)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Crashlooping Pods", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" + }, + { + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 6, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(kube_node_status_condition{condition=\"Ready\",status!=\"true\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Node Not Ready", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 7, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(kube_node_status_condition{condition=\"DiskPressure\",status=\"true\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Node Disk Pressure", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 8, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Node Memory Pressure", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 9, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(kube_node_spec_unschedulable)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Node Unschedulable", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" + } + ], + "schemaVersion": 14, + "style": "dark", + "tags": [ + + ], + "templating": { + "list": [ + + ] + }, + "time": { + "from": "now-1h", + "to": "now" }, - { - "type": "panel", - "id": "graph", - "name": "Graph", - "version": "" + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ], + "time_options": [ + "5m", + "15m", + "1h", + "6h", + "12h", + "24h", + "2d", + "7d", + "30d" + ] }, - { - "type": "datasource", - "id": "prometheus", - "name": "Prometheus", - "version": "1.0.0" + "timezone": "browser", + "title": "Kubernetes Cluster Health", + "version": 0 + } + kubernetes-cluster-status-dashboard.json: |- + { + "annotations": { + "list": [ + + ] }, - { - "type": "panel", - "id": "singlestat", - "name": "Singlestat", - "version": "" - } - ], - "annotations": { - "list": [] - }, - "description": "etcd sample Grafana dashboard with Prometheus", - "editable": false, - "gnetId": null, - "graphTooltip": 0, - "hideControls": false, - "id": null, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "height": "250px", - "panels": [ + "editable": false, + "gnetId": null, + "graphTooltip": 0, + "hideControls": false, + "id": null, + "links": [ + + ], + "refresh": "10s", + "rows": [ { - "cacheTimeout": null, - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "error": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 28, - "interval": null, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "nullText": null, - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "tableColumn": "", - "targets": [ - { - "expr": "sum(etcd_server_has_leader)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "metric": "etcd_server_has_leader", - "refId": "A", - "step": 20 - } - ], - "thresholds": "", - "title": "Up", - "type": "singlestat", - "valueFontSize": "200%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 2, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 6, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(up{job=\u007e\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Control Plane UP", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 3, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 6, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Alerts Firing", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 0, - "id": 23, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 5, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(grpc_server_started_total{grpc_type=\"unary\"}[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "RPC Rate", - "metric": "grpc_server_started_total", - "refId": "A", - "step": 4 - }, - { - "expr": "sum(rate(grpc_server_handled_total{grpc_type=\"unary\",grpc_code!=\"OK\"}[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "RPC Failed Rate", - "metric": "grpc_server_handled_total", - "refId": "B", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "RPC Rate", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "ops", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(245, 54, 54, 0.9)", + "rgba(237, 129, 40, 0.89)", + "rgba(50, 172, 45, 0.97)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 4, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "(sum(up{job=\"apiserver\"} == 1) / count(up{job=\"apiserver\"})) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "50, 80", + "title": "API Servers UP", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(245, 54, 54, 0.9)", + "rgba(237, 129, 40, 0.89)", + "rgba(50, 172, 45, 0.97)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 5, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / count(up{job=\"kube-controller-manager\"})) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "50, 80", + "title": "Controller Managers UP", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(245, 54, 54, 0.9)", + "rgba(237, 129, 40, 0.89)", + "rgba(50, 172, 45, 0.97)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 6, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / count(up{job=\"kube-scheduler\"})) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "50, 80", + "title": "Schedulers Up", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 7, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "count(increase(kube_pod_container_status_restarts{namespace=\u007e\"kube-system|tectonic-system\"}[1h]) > 5)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Crashlooping Control Plane Pods", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 0, - "id": 41, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 4, - "stack": true, - "steppedLine": false, - "targets": [ - { - "expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Watch\",grpc_type=\"bidi_stream\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Watch Streams", - "metric": "grpc_server_handled_total", - "refId": "A", - "step": 4 - }, - { - "expr": "sum(grpc_server_started_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"}) - sum(grpc_server_handled_total{grpc_service=\"etcdserverpb.Lease\",grpc_type=\"bidi_stream\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Lease Streams", - "metric": "grpc_server_handled_total", - "refId": "B", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Active Streams", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": "", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 8, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(100 - (avg by (instance) (rate(node_cpu{job=\"node-exporter\",mode=\"idle\"}[5m])) * 100)) / count(node_cpu{job=\"node-exporter\",mode=\"idle\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "CPU Utilization", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 9, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "Memory Utilization", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 10, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "Filesystem Utilization", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 11, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "Pod Utilization", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" } - ], - "repeat": null, - "repeatIteration": null, - "repeatRowId": null, - "showTitle": false, - "title": "Row", - "titleSize": "h6" + ], + "schemaVersion": 14, + "style": "dark", + "tags": [ + + ], + "templating": { + "list": [ + + ] + }, + "time": { + "from": "now-1h", + "to": "now" }, - { - "collapse": false, - "height": "250px", - "panels": [ + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ], + "time_options": [ + "5m", + "15m", + "1h", + "6h", + "12h", + "24h", + "2d", + "7d", + "30d" + ] + }, + "timezone": "browser", + "title": "Kubernetes Cluster Status", + "version": 0 + } + kubernetes-kubelet-dashboard.json: |- + { + "annotations": { + "list": [ + + ] + }, + "editable": false, + "gnetId": null, + "graphTooltip": 0, + "hideControls": false, + "id": null, + "links": [ + + ], + "refresh": "", + "rows": [ { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "decimals": null, - "editable": false, - "error": false, - "fill": 0, - "grid": {}, - "id": 1, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 4, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "etcd_debugging_mvcc_db_total_size_in_bytes", - "format": "time_series", - "hide": false, - "interval": "", - "intervalFactor": 2, - "legendFormat": "{{instance}} DB Size", - "metric": "", - "refId": "A", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "DB Size", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 2, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 2, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "lineColor": "rgb(31, 120, 193)", + "show": true + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(kubelet_running_pod_count{instance=\u007e\"$instance\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Count", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 3, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 10, + "stack": true, + "steppedLine": false, + "targets": [ + { + "expr": "kubelet_running_pod_count{instance=\u007e\"$instance\"}", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{ instance }}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Count", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": true, + "title": "Pods", + "titleSize": "h4", + "type": "row" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 0, - "grid": {}, - "id": 3, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 1, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 4, - "stack": false, - "steppedLine": true, - "targets": [ - { - "expr": "histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (instance, le))", - "format": "time_series", - "hide": false, - "intervalFactor": 2, - "legendFormat": "{{instance}} WAL fsync", - "metric": "etcd_disk_wal_fsync_duration_seconds_bucket", - "refId": "A", - "step": 4 - }, - { - "expr": "histogram_quantile(0.99, sum(rate(etcd_disk_backend_commit_duration_seconds_bucket[5m])) by (instance, le))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "{{instance}} DB fsync", - "metric": "etcd_disk_backend_commit_duration_seconds_bucket", - "refId": "B", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Disk Sync Duration", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "s", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": false - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "#299c46", + "rgba(237, 129, 40, 0.89)", + "#d44a3a" + ], + "datasource": "prometheus", + "format": "none", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": false, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 4, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 2, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "lineColor": "rgb(31, 120, 193)", + "show": true + }, + "tableColumn": "", + "targets": [ + { + "expr": "sum(kubelet_running_container_count{instance=\u007e\"$instance\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "", + "title": "Count", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "0", + "value": "null" + } + ], + "valueName": "current" + }, + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 5, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 10, + "stack": true, + "steppedLine": false, + "targets": [ + { + "expr": "kubelet_running_container_count{instance=\u007e\"$instance\"}", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{ instance }}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Count", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": true, + "title": "Containers", + "titleSize": "h4", + "type": "row" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 0, - "id": 29, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 4, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "process_resident_memory_bytes", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "{{instance}} Resident Memory", - "metric": "process_resident_memory_bytes", - "refId": "A", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Memory", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "description": "Rate of Kubelet Operations in 5min", + "fill": 1, + "gridPos": { + + }, + "id": 6, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum(rate(kubelet_runtime_operations{instance=\u007e\"$instance\"}[5m])) by (instance)", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{ instance }}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Operations", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": true, + "title": "Kubelet", + "titleSize": "h4", + "type": "row" } - ], - "repeat": null, - "repeatIteration": null, - "repeatRowId": null, - "showTitle": false, - "title": "New row", - "titleSize": "h6" + ], + "schemaVersion": 14, + "style": "dark", + "tags": [ + + ], + "templating": { + "list": [ + { + "allValue": null, + "current": { + + }, + "datasource": "prometheus", + "hide": 0, + "includeAll": true, + "label": null, + "multi": false, + "name": "instance", + "options": [ + + ], + "query": "label_values(kubelet_running_pod_count,instance)", + "refresh": 2, + "regex": "", + "sort": 0, + "tagValuesQuery": "", + "tags": [ + + ], + "tagsQuery": "", + "type": "query", + "useTags": false + } + ] }, - { - "collapse": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 5, - "id": 22, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 3, - "stack": true, - "steppedLine": false, - "targets": [ - { - "expr": "rate(etcd_network_client_grpc_received_bytes_total[5m])", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "{{instance}} Client Traffic In", - "metric": "etcd_network_client_grpc_received_bytes_total", - "refId": "A", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Client Traffic In", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "Bps", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, + "time": { + "from": "now-1h", + "to": "now" + }, + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ], + "time_options": [ + "5m", + "15m", + "1h", + "6h", + "12h", + "24h", + "2d", + "7d", + "30d" + ] + }, + "timezone": "browser", + "title": "Kubelet", + "version": 0 + } + nodes.json: |- + { + "annotations": { + "list": [ + + ] + }, + "editable": false, + "gnetId": null, + "graphTooltip": 0, + "hideControls": false, + "id": null, + "links": [ + + ], + "refresh": "", + "rows": [ { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 5, - "id": 21, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 3, - "stack": true, - "steppedLine": false, - "targets": [ - { - "expr": "rate(etcd_network_client_grpc_sent_bytes_total[5m])", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "{{instance}} Client Traffic Out", - "metric": "etcd_network_client_grpc_sent_bytes_total", - "refId": "A", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Client Traffic Out", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "Bps", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 2, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 6, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "100 - (avg by (cpu) (irate(node_cpu{mode=\"idle\", instance=\"$server\"}[5m])) * 100)", + "format": "time_series", + "intervalFactor": 10, + "legendFormat": "{{cpu}}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Idle CPU", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "percent", + "label": null, + "logBase": 1, + "max": 100, + "min": 0, + "show": true + }, + { + "format": "percent", + "label": null, + "logBase": 1, + "max": 100, + "min": 0, + "show": true + } + ] + }, + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 3, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 6, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "node_load1{instance=\"$server\"} * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "load 1m", + "refId": "A" + }, + { + "expr": "node_load5{instance=\"$server\"} * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "load 5m", + "refId": "B" + }, + { + "expr": "node_load15{instance=\"$server\"} * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "load 15m", + "refId": "C" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "System load", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "percent", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "percent", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 0, - "id": 20, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 3, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(etcd_network_peer_received_bytes_total[5m])) by (instance)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "{{instance}} Peer Traffic In", - "metric": "etcd_network_peer_received_bytes_total", - "refId": "A", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Peer Traffic In", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "Bps", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 4, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 9, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "memory used", + "refId": "A" + }, + { + "expr": "node_memory_Buffers{instance=\"$server\"}", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "memory buffers", + "refId": "B" + }, + { + "expr": "node_memory_Cached{instance=\"$server\"}", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "memory cached", + "refId": "C" + }, + { + "expr": "node_memory_MemFree{instance=\"$server\"}", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "memory free", + "refId": "D" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Memory Usage", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ] + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 5, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "((node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}) / node_memory_MemTotal{instance=\"$server\"}) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "Memory Usage", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "decimals": null, - "editable": false, - "error": false, - "fill": 0, - "grid": {}, - "id": 16, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 3, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(etcd_network_peer_sent_bytes_total[5m])) by (instance)", - "format": "time_series", - "hide": false, - "interval": "", - "intervalFactor": 2, - "legendFormat": "{{instance}} Peer Traffic Out", - "metric": "etcd_network_peer_sent_bytes_total", - "refId": "A", - "step": 4 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Peer Traffic Out", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "Bps", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "repeat": null, - "repeatIteration": null, - "repeatRowId": null, - "showTitle": false, - "title": "New row", - "titleSize": "h6" - }, - { - "collapse": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 0, - "id": 40, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(etcd_server_proposals_failed_total[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Proposal Failure Rate", - "metric": "etcd_server_proposals_failed_total", - "refId": "A", - "step": 2 - }, - { - "expr": "sum(etcd_server_proposals_pending)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Proposal Pending Total", - "metric": "etcd_server_proposals_pending", - "refId": "B", - "step": 2 - }, - { - "expr": "sum(rate(etcd_server_proposals_committed_total[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Proposal Commit Rate", - "metric": "etcd_server_proposals_committed_total", - "refId": "C", - "step": 2 - }, - { - "expr": "sum(rate(etcd_server_proposals_applied_total[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Proposal Apply Rate", - "refId": "D", - "step": 2 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Raft Proposals", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": "", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 6, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + { + "alias": "read", + "yaxis": 1 + }, + { + "alias": "io time", + "yaxis": 2 + } + ], + "spaceLength": 10, + "span": 9, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum by (instance) (rate(node_disk_bytes_read{instance=\"$server\"}[2m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "read", + "refId": "A" + }, + { + "expr": "sum by (instance) (rate(node_disk_bytes_written{instance=\"$server\"}[2m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "written", + "refId": "B" + }, + { + "expr": "sum by (instance) (rate(node_disk_io_time_ms{instance=\"$server\"}[2m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "io time", + "refId": "C" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Disk I/O", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "ms", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ] + }, + { + "cacheTimeout": null, + "colorBackground": false, + "colorValue": false, + "colors": [ + "rgba(50, 172, 45, 0.97)", + "rgba(237, 129, 40, 0.89)", + "rgba(245, 54, 54, 0.9)" + ], + "datasource": "prometheus", + "format": "percent", + "gauge": { + "maxValue": 100, + "minValue": 0, + "show": true, + "thresholdLabels": false, + "thresholdMarkers": true + }, + "gridPos": { + + }, + "id": 7, + "interval": null, + "links": [ + + ], + "mappingType": 1, + "mappingTypes": [ + { + "name": "value to text", + "value": 1 + }, + { + "name": "range to text", + "value": 2 + } + ], + "maxDataPoints": 100, + "nullPointMode": "connected", + "nullText": null, + "postfix": "", + "postfixFontSize": "50%", + "prefix": "", + "prefixFontSize": "50%", + "rangeMaps": [ + { + "from": "null", + "text": "N/A", + "to": "null" + } + ], + "span": 3, + "sparkline": { + "fillColor": "rgba(31, 118, 189, 0.18)", + "full": false, + "lineColor": "rgb(31, 120, 193)", + "show": false + }, + "tableColumn": "", + "targets": [ + { + "expr": "(sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"}) - sum(node_filesystem_free{device!=\"rootfs\",instance=\"$server\"})) / sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"}) * 100", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "" + } + ], + "thresholds": "80, 90", + "title": "Disk Space Usage", + "type": "singlestat", + "valueFontSize": "80%", + "valueMaps": [ + { + "op": "=", + "text": "N/A", + "value": "null" + } + ], + "valueName": "current" + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "decimals": 0, - "editable": false, - "error": false, - "fill": 0, - "id": 19, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "max": false, - "min": false, - "rightSide": false, - "show": false, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "changes(etcd_server_leader_changes_seen_total[1d])", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "{{instance}} Total Leader Elections Per Day", - "metric": "etcd_server_leader_changes_seen_total", - "refId": "A", - "step": 2 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Total Leader Elections Per Day", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "buckets": null, - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 8, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 6, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "rate(node_network_receive_bytes{instance=\"$server\",device!\u007e\"lo\"}[5m])", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{device}}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Network Received", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ] + }, + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 9, + "legend": { + "alignAsTable": false, + "avg": false, + "current": false, + "max": false, + "min": false, + "rightSide": false, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 6, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "rate(node_network_transmit_bytes{instance=\"$server\",device!\u007e\"lo\"}[5m])", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{device}}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Network Transmitted", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": null, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" } - ], - "repeat": null, - "repeatIteration": null, - "repeatRowId": null, - "showTitle": false, - "title": "New row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-15m", - "to": "now" - }, - "timepicker": { - "now": true, - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "etcd", - "version": 4 + "schemaVersion": 14, + "style": "dark", + "tags": [ + + ], + "templating": { + "list": [ + { + "allValue": null, + "current": { + + }, + "datasource": "prometheus", + "hide": 0, + "includeAll": false, + "label": null, + "multi": false, + "name": "server", + "options": [ + + ], + "query": "label_values(node_boot_time, instance)", + "refresh": 2, + "regex": "", + "sort": 0, + "tagValuesQuery": "", + "tags": [ + + ], + "tagsQuery": "", + "type": "query", + "useTags": false + } + ] + }, + "time": { + "from": "now-1h", + "to": "now" + }, + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ], + "time_options": [ + "5m", + "15m", + "1h", + "6h", + "12h", + "24h", + "2d", + "7d", + "30d" + ] + }, + "timezone": "browser", + "title": "Nodes", + "version": 0 } - kubernetes-capacity-planning-dashboard.json: |+ + pods-dashboard.json: |- { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": false, - "gnetId": 22, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ + "annotations": { + "list": [ + + ] + }, + "editable": false, + "gnetId": null, + "graphTooltip": 0, + "hideControls": false, + "id": null, + "links": [ + + ], + "refresh": "", + "rows": [ { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 3, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(node_cpu{mode=\"idle\"}[2m])) * 100", - "hide": false, - "intervalFactor": 10, - "legendFormat": "", - "refId": "A", - "step": 50 - } - ], - "title": "Idle CPU", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "percent", - "label": "cpu usage", - "logBase": 1, - "min": 0, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 2, + "legend": { + "alignAsTable": true, + "avg": true, + "current": true, + "max": false, + "min": false, + "rightSide": true, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum by(container_name) (container_memory_usage_bytes{namespace=\"$namespace\", pod_name=\"$pod\", container_name=\u007e\"$container\", container_name!=\"POD\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "Current: {{ container_name }}", + "refId": "A" + }, + { + "expr": "sum by(container) (kube_pod_container_resource_requests_memory_bytes{namespace=\"$namespace\", pod=\"$pod\", container=\u007e\"$container\", container!=\"POD\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "Requested: {{ container }}", + "refId": "B" + }, + { + "expr": "sum by(container) (kube_pod_container_resource_limits_memory_bytes{namespace=\"$namespace\", pod=\"$pod\", container=\u007e\"$container\", container!=\"POD\"})", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "Limit: {{ container }}", + "refId": "C" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Memory Usage", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 9, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(node_load1)", - "intervalFactor": 4, - "legendFormat": "load 1m", - "refId": "A", - "step": 20, - "target": "" - }, - { - "expr": "sum(node_load5)", - "intervalFactor": 4, - "legendFormat": "load 5m", - "refId": "B", - "step": 20, - "target": "" - }, - { - "expr": "sum(node_load15)", - "intervalFactor": 4, - "legendFormat": "load 15m", - "refId": "C", - "step": 20, - "target": "" - } - ], - "title": "System Load", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "percentunit", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 3, + "legend": { + "alignAsTable": true, + "avg": true, + "current": true, + "max": false, + "min": false, + "rightSide": true, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sum by (container_name) (rate(container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{ container_name }}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "CPU Usage", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "short", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" + }, { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 4, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 9, - "stack": true, - "steppedLine": false, - "targets": [ - { - "expr": "sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)", - "intervalFactor": 2, - "legendFormat": "memory usage", - "metric": "memo", - "refId": "A", - "step": 10, - "target": "" + "collapse": false, + "collapsed": false, + "height": "250px", + "panels": [ + { + "aliasColors": { + + }, + "bars": false, + "dashLength": 10, + "dashes": false, + "datasource": "prometheus", + "fill": 1, + "gridPos": { + + }, + "id": 4, + "legend": { + "alignAsTable": true, + "avg": true, + "current": true, + "max": false, + "min": false, + "rightSide": true, + "show": true, + "total": false, + "values": false + }, + "lines": true, + "linewidth": 1, + "nullPointMode": "null", + "percentage": false, + "pointradius": 5, + "points": false, + "renderer": "flot", + "repeat": null, + "seriesOverrides": [ + + ], + "spaceLength": 10, + "span": 12, + "stack": false, + "steppedLine": false, + "targets": [ + { + "expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{pod_name=\"$pod\"}[1m])))", + "format": "time_series", + "intervalFactor": 2, + "legendFormat": "{{ pod_name }}", + "refId": "A" + } + ], + "thresholds": [ + + ], + "timeFrom": null, + "timeShift": null, + "title": "Network I/O", + "tooltip": { + "shared": true, + "sort": 0, + "value_type": "individual" + }, + "type": "graph", + "xaxis": { + "buckets": null, + "mode": "time", + "name": null, + "show": true, + "values": [ + + ] + }, + "yaxes": [ + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + }, + { + "format": "bytes", + "label": null, + "logBase": 1, + "max": null, + "min": 0, + "show": true + } + ] + } + ], + "repeat": null, + "repeatIteration": null, + "repeatRowId": null, + "showTitle": false, + "title": "Dashboard Row", + "titleSize": "h6", + "type": "row" + } + ], + "schemaVersion": 14, + "style": "dark", + "tags": [ + + ], + "templating": { + "list": [ + { + "allValue": null, + "current": { + + }, + "datasource": "prometheus", + "hide": 0, + "includeAll": false, + "label": "Namespace", + "multi": false, + "name": "namespace", + "options": [ + + ], + "query": "label_values(kube_pod_info, namespace)", + "refresh": 2, + "regex": "", + "sort": 0, + "tagValuesQuery": "", + "tags": [ + + ], + "tagsQuery": "", + "type": "query", + "useTags": false }, { - "expr": "sum(node_memory_Buffers)", - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory buffers", - "metric": "memo", - "refId": "B", - "step": 10, - "target": "" + "allValue": null, + "current": { + + }, + "datasource": "prometheus", + "hide": 0, + "includeAll": false, + "label": "Pod", + "multi": false, + "name": "pod", + "options": [ + + ], + "query": "label_values(kube_pod_info{namespace=\u007e\"$namespace\"}, pod)", + "refresh": 2, + "regex": "", + "sort": 0, + "tagValuesQuery": "", + "tags": [ + + ], + "tagsQuery": "", + "type": "query", + "useTags": false }, { - "expr": "sum(node_memory_Cached)", - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory cached", - "metric": "memo", - "refId": "C", - "step": 10, - "target": "" - }, - { - "expr": "sum(node_memory_MemFree)", - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory free", - "metric": "memo", - "refId": "D", - "step": 10, - "target": "" - } - ], - "title": "Memory Usage", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "min": "0", - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100", - "intervalFactor": 2, - "metric": "", - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "80, 90", - "title": "Memory Usage", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "246px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 6, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "read", - "yaxis": 1 - }, - { - "alias": "{instance=\"172.17.0.1:9100\"}", - "yaxis": 2 - }, - { - "alias": "io time", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(node_disk_bytes_read[5m]))", - "hide": false, - "intervalFactor": 4, - "legendFormat": "read", - "refId": "A", - "step": 20, - "target": "" - }, - { - "expr": "sum(rate(node_disk_bytes_written[5m]))", - "intervalFactor": 4, - "legendFormat": "written", - "refId": "B", - "step": 20 - }, - { - "expr": "sum(rate(node_disk_io_time_ms[5m]))", - "intervalFactor": 4, - "legendFormat": "io time", - "refId": "C", - "step": 20 - } - ], - "title": "Disk I/O", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "ms", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percentunit", - "gauge": { - "maxValue": 1, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 12, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})", - "intervalFactor": 2, - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "0.75, 0.9", - "title": "Disk Space Usage", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 8, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "transmitted", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(node_network_receive_bytes{device!~\"lo\"}[5m]))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 10, - "target": "" - } - ], - "title": "Network Received", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "bytes", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 10, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "transmitted", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(node_network_transmit_bytes{device!~\"lo\"}[5m]))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "", - "refId": "B", - "step": 10, - "target": "" - } - ], - "title": "Network Transmitted", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "bytes", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "276px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 11, - "isNew": true, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 11, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(kube_pod_info)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Current number of Pods", - "refId": "A", - "step": 10 - }, - { - "expr": "sum(kube_node_status_capacity_pods)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Maximum capacity of pods", - "refId": "B", - "step": 10 - } - ], - "title": "Cluster Pod Utilization", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "80, 90", - "title": "Pod Utilization", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-1h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Capacity Planning", - "version": 4 - } - kubernetes-cluster-health-dashboard.json: |+ - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": false, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "refresh": "10s", - "rows": [ - { - "collapse": false, - "editable": false, - "height": "254px", - "panels": [ - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 1, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Control Plane Components Down", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "Everything UP and healthy", - "value": "null" - }, - { - "op": "=", - "text": "", - "value": "" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Alerts Firing", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(ALERTS{alertstate=\"pending\",alertname!=\"DeadMansSwitch\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "3, 5", - "title": "Alerts Pending", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 4, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "count(increase(kube_pod_container_status_restarts[1h]) > 5)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Crashlooping Pods", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(kube_node_status_condition{condition=\"Ready\",status!=\"true\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Node Not Ready", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 6, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(kube_node_status_condition{condition=\"DiskPressure\",status=\"true\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Node Disk Pressure", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(kube_node_status_condition{condition=\"MemoryPressure\",status=\"true\"})", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Node Memory Pressure", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 8, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(kube_node_spec_unschedulable)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Nodes Unschedulable", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Cluster Health", - "version": 9 - } - kubernetes-cluster-status-dashboard.json: |+ - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": false, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "rows": [ - { - "collapse": false, - "editable": false, - "height": "129px", - "panels": [ - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 6, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(up{job=~\"apiserver|kube-scheduler|kube-controller-manager\"} == 0)", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Control Plane UP", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "UP", - "value": "null" - } - ], - "valueName": "total" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 6, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 6, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(ALERTS{alertstate=\"firing\",alertname!=\"DeadMansSwitch\"})", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "3, 5", - "title": "Alerts Firing", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": true, - "title": "Cluster Health", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "168px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 1, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"apiserver\"} == 1) / count(up{job=\"apiserver\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "API Servers UP", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / count(up{job=\"kube-controller-manager\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "Controller Managers UP", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / count(up{job=\"kube-scheduler\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "Schedulers UP", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - }, - { - "colorBackground": false, - "colorValue": true, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 4, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "count(increase(kube_pod_container_status_restarts{namespace=~\"kube-system|tectonic-system\"}[1h]) > 5)", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "1, 3", - "title": "Crashlooping Control Plane Pods", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": true, - "title": "Control Plane Status", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "158px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 8, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "sum(100 - (avg by (instance) (rate(node_cpu{job=\"node-exporter\",mode=\"idle\"}[5m])) * 100)) / count(node_cpu{job=\"node-exporter\",mode=\"idle\"})", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "80, 90", - "title": "CPU Utilization", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "((sum(node_memory_MemTotal) - sum(node_memory_MemFree) - sum(node_memory_Buffers) - sum(node_memory_Cached)) / sum(node_memory_MemTotal)) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "80, 90", - "title": "Memory Utilization", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 9, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(node_filesystem_size{device!=\"rootfs\"}) - sum(node_filesystem_free{device!=\"rootfs\"})) / sum(node_filesystem_size{device!=\"rootfs\"})", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "80, 90", - "title": "Filesystem Utilization", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 10, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "100 - (sum(kube_node_status_capacity_pods) - sum(kube_pod_info)) / sum(kube_node_status_capacity_pods) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "80, 90", - "title": "Pod Utilization", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": true, - "title": "Capacity Planning", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Cluster Status", - "version": 3 - } - kubernetes-control-plane-status-dashboard.json: |+ - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": false, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "rows": [ - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 1, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"apiserver\"} == 1) / sum(up{job=\"apiserver\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "API Servers UP", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"kube-controller-manager\"} == 1) / sum(up{job=\"kube-controller-manager\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "Controller Managers UP", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(up{job=\"kube-scheduler\"} == 1) / sum(up{job=\"kube-scheduler\"})) * 100", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "thresholds": "50, 80", - "title": "Schedulers UP", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 4, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(sum by(instance) (rate(apiserver_request_count{code=~\"5..\"}[5m])) / sum by(instance) (rate(apiserver_request_count[5m]))) * 100", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 600 - } - ], - "thresholds": "5, 10", - "title": "API Server Request Error Rate", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "0", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 7, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by(verb) (rate(apiserver_latency_seconds:quantile[5m]) >= 0)", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 30 - } - ], - "title": "API Server Request Latency", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 5, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "cluster:scheduler_e2e_scheduling_latency_seconds:quantile", - "format": "time_series", - "intervalFactor": 2, - "refId": "A", - "step": 60 - } - ], - "title": "End to End Scheduling Latency", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "dtdurations", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 6, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by(instance) (rate(apiserver_request_count{code!~\"2..\"}[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Error Rate", - "refId": "A", - "step": 60 - }, - { - "expr": "sum by(instance) (rate(apiserver_request_count[5m]))", - "format": "time_series", - "intervalFactor": 2, - "legendFormat": "Request Rate", - "refId": "B", - "step": 60 - } - ], - "title": "API Server Request Rates", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Control Plane Status", - "version": 3 - } - kubernetes-resource-requests-dashboard.json: |+ - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": false, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "editable": false, - "height": "300px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "description": "This represents the total [CPU resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu) in the cluster.\nFor comparison the total [allocatable CPU cores](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 1, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "min(sum(kube_node_status_allocatable_cpu_cores) by (instance))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "Allocatable CPU Cores", - "refId": "A", - "step": 20 - }, - { - "expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "Requested CPU Cores", - "refId": "B", - "step": 20 - } - ], - "title": "CPU Cores", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": "CPU Cores", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "max(sum(kube_pod_container_resource_requests_cpu_cores) by (instance)) / min(sum(kube_node_status_allocatable_cpu_cores) by (instance)) * 100", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 240 - } - ], - "thresholds": "80, 90", - "title": "CPU Cores", - "transparent": false, - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "CPU Cores", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "300px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "description": "This represents the total [memory resource requests](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-memory) in the cluster.\nFor comparison the total [allocatable memory](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node-allocatable.md) is also shown.", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 3, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "null", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "min(sum(kube_node_status_allocatable_memory_bytes) by (instance))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "Allocatable Memory", - "refId": "A", - "step": 20 - }, - { - "expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance))", - "hide": false, - "intervalFactor": 2, - "legendFormat": "Requested Memory", - "refId": "B", - "step": 20 - } - ], - "title": "Memory", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "label": "Memory", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 4, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "max(sum(kube_pod_container_resource_requests_memory_bytes) by (instance)) / min(sum(kube_node_status_allocatable_memory_bytes) by (instance)) * 100", - "intervalFactor": 2, - "legendFormat": "", - "refId": "A", - "step": 240 - } - ], - "thresholds": "80, 90", - "title": "Memory", - "transparent": false, - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Memory", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [] - }, - "time": { - "from": "now-3h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Kubernetes Resource Requests", - "version": 2 - } - nodes-dashboard.json: |+ - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "description": "Dashboard to get an overview of one server", - "editable": false, - "gnetId": 22, - "graphTooltip": 0, - "hideControls": false, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 3, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "100 - (avg by (cpu) (irate(node_cpu{mode=\"idle\", instance=\"$server\"}[5m])) * 100)", - "hide": false, - "intervalFactor": 10, - "legendFormat": "{{cpu}}", - "refId": "A", - "step": 50 - } - ], - "title": "Idle CPU", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "percent", - "label": "cpu usage", - "logBase": 1, - "max": 100, - "min": 0, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 9, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "node_load1{instance=\"$server\"}", - "intervalFactor": 4, - "legendFormat": "load 1m", - "refId": "A", - "step": 20, - "target": "" - }, - { - "expr": "node_load5{instance=\"$server\"}", - "intervalFactor": 4, - "legendFormat": "load 5m", - "refId": "B", - "step": 20, - "target": "" - }, - { - "expr": "node_load15{instance=\"$server\"}", - "intervalFactor": 4, - "legendFormat": "load 15m", - "refId": "C", - "step": 20, - "target": "" - } - ], - "title": "System Load", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "percentunit", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 4, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "node_memory_SwapFree{instance=\"172.17.0.1:9100\",job=\"prometheus\"}", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 9, - "stack": true, - "steppedLine": false, - "targets": [ - { - "expr": "node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}", - "hide": false, - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory used", - "metric": "", - "refId": "C", - "step": 10 - }, - { - "expr": "node_memory_Buffers{instance=\"$server\"}", - "interval": "", - "intervalFactor": 2, - "legendFormat": "memory buffers", - "metric": "", - "refId": "E", - "step": 10 - }, - { - "expr": "node_memory_Cached{instance=\"$server\"}", - "intervalFactor": 2, - "legendFormat": "memory cached", - "metric": "", - "refId": "F", - "step": 10 - }, - { - "expr": "node_memory_MemFree{instance=\"$server\"}", - "intervalFactor": 2, - "legendFormat": "memory free", - "metric": "", - "refId": "D", - "step": 10 - } - ], - "title": "Memory Usage", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "min": "0", - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percent", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "((node_memory_MemTotal{instance=\"$server\"} - node_memory_MemFree{instance=\"$server\"} - node_memory_Buffers{instance=\"$server\"} - node_memory_Cached{instance=\"$server\"}) / node_memory_MemTotal{instance=\"$server\"}) * 100", - "intervalFactor": 2, - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "80, 90", - "title": "Memory Usage", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 6, - "isNew": true, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "read", - "yaxis": 1 - }, - { - "alias": "{instance=\"172.17.0.1:9100\"}", - "yaxis": 2 - }, - { - "alias": "io time", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 9, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by (instance) (rate(node_disk_bytes_read{instance=\"$server\"}[2m]))", - "hide": false, - "intervalFactor": 4, - "legendFormat": "read", - "refId": "A", - "step": 20, - "target": "" - }, - { - "expr": "sum by (instance) (rate(node_disk_bytes_written{instance=\"$server\"}[2m]))", - "intervalFactor": 4, - "legendFormat": "written", - "refId": "B", - "step": 20 - }, - { - "expr": "sum by (instance) (rate(node_disk_io_time_ms{instance=\"$server\"}[2m]))", - "intervalFactor": 4, - "legendFormat": "io time", - "refId": "C", - "step": 20 - } - ], - "title": "Disk I/O", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "ms", - "logBase": 1, - "show": true - } - ] - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(50, 172, 45, 0.97)", - "rgba(237, 129, 40, 0.89)", - "rgba(245, 54, 54, 0.9)" - ], - "datasource": "prometheus", - "editable": false, - "format": "percentunit", - "gauge": { - "maxValue": 1, - "minValue": 0, - "show": true, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "hideTimeOverride": false, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "(sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"}) - sum(node_filesystem_free{device!=\"rootfs\",instance=\"$server\"})) / sum(node_filesystem_size{device!=\"rootfs\",instance=\"$server\"})", - "intervalFactor": 2, - "refId": "A", - "step": 60, - "target": "" - } - ], - "thresholds": "0.75, 0.9", - "title": "Disk Space Usage", - "transparent": false, - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "current" - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 8, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "transmitted", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "rate(node_network_receive_bytes{instance=\"$server\",device!~\"lo\"}[5m])", - "hide": false, - "intervalFactor": 2, - "legendFormat": "{{device}}", - "refId": "A", - "step": 10, - "target": "" - } - ], - "title": "Network Received", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "bytes", - "logBase": 1, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 10, - "isNew": false, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [ - { - "alias": "transmitted", - "yaxis": 2 - } - ], - "spaceLength": 10, - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "rate(node_network_transmit_bytes{instance=\"$server\",device!~\"lo\"}[5m])", - "hide": false, - "intervalFactor": 2, - "legendFormat": "{{device}}", - "refId": "B", - "step": 10, - "target": "" - } - ], - "title": "Network Transmitted", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "bytes", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [ - { - "allValue": null, - "current": {}, - "datasource": "prometheus", - "hide": 0, - "includeAll": false, - "label": null, - "multi": false, - "name": "server", - "options": [], - "query": "label_values(node_boot_time, instance)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - } - ] - }, - "time": { - "from": "now-1h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Nodes", - "version": 2 - } - pods-dashboard.json: |+ - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": false, - "graphTooltip": 1, - "hideControls": false, - "links": [], - "refresh": false, - "rows": [ - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 1, - "isNew": false, - "legend": { - "alignAsTable": true, - "avg": true, - "current": true, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by(container_name) (container_memory_usage_bytes{pod_name=\"$pod\", container_name=~\"$container\", container_name!=\"POD\"})", - "interval": "10s", - "intervalFactor": 1, - "legendFormat": "Current: {{ container_name }}", - "metric": "container_memory_usage_bytes", - "refId": "A", - "step": 15 - }, - { - "expr": "kube_pod_container_resource_requests_memory_bytes{pod=\"$pod\", container=~\"$container\"}", - "interval": "10s", - "intervalFactor": 2, - "legendFormat": "Requested: {{ container }}", - "metric": "kube_pod_container_resource_requests_memory_bytes", - "refId": "B", - "step": 20 - }, - { - "expr": "kube_pod_container_resource_limits_memory_bytes{pod=\"$pod\", container=~\"$container\"}", - "interval": "10s", - "intervalFactor": 2, - "legendFormat": "Limit: {{ container }}", - "metric": "kube_pod_container_resource_limits_memory_bytes", - "refId": "C", - "step": 20 - } - ], - "title": "Memory Usage", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "Row", - "titleSize": "h6" - }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 2, - "isNew": false, - "legend": { - "alignAsTable": true, - "avg": true, - "current": true, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by (container_name)(rate(container_cpu_usage_seconds_total{image!=\"\",container_name!=\"POD\",pod_name=\"$pod\"}[1m]))", - "intervalFactor": 2, - "legendFormat": "{{ container_name }}", - "refId": "A", - "step": 30 - }, - { - "expr": "kube_pod_container_resource_requests_cpu_cores{pod=\"$pod\", container=~\"$container\"}", - "interval": "10s", - "intervalFactor": 2, - "legendFormat": "Requested: {{ container }}", - "metric": "kube_pod_container_resource_requests_cpu_cores", - "refId": "B", - "step": 20 - }, - { - "expr": "kube_pod_container_resource_limits_cpu_cores{pod=\"$pod\", container=~\"$container\"}", - "interval": "10s", - "intervalFactor": 2, - "legendFormat": "Limit: {{ container }}", - "metric": "kube_pod_container_resource_limits_memory_bytes", - "refId": "C", - "step": 20 - } - ], - "title": "CPU Usage", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "Row", - "titleSize": "h6" + "allValue": null, + "current": { + + }, + "datasource": "prometheus", + "hide": 0, + "includeAll": true, + "label": "Container", + "multi": false, + "name": "container", + "options": [ + + ], + "query": "label_values(kube_pod_container_info{namespace=\"$namespace\", pod=\"$pod\"}, container)", + "refresh": 2, + "regex": "", + "sort": 0, + "tagValuesQuery": "", + "tags": [ + + ], + "tagsQuery": "", + "type": "query", + "useTags": false + } + ] }, - { - "collapse": false, - "editable": false, - "height": "250px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 3, - "isNew": false, - "legend": { - "alignAsTable": true, - "avg": true, - "current": true, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": true, - "show": true, - "total": false, - "values": true - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sort_desc(sum by (pod_name) (rate(container_network_receive_bytes_total{pod_name=\"$pod\"}[1m])))", - "intervalFactor": 2, - "legendFormat": "{{ pod_name }}", - "refId": "A", - "step": 30 - } - ], - "title": "Network I/O", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "bytes", - "logBase": 1, - "show": true - }, - { - "format": "short", - "logBase": 1, - "show": true - } - ] - } - ], - "showTitle": false, - "title": "New Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [ - { - "allValue": ".*", - "current": {}, - "datasource": "prometheus", - "hide": 0, - "includeAll": true, - "label": "Namespace", - "multi": false, - "name": "namespace", - "options": [], - "query": "label_values(kube_pod_info, namespace)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - }, - { - "allValue": null, - "current": {}, - "datasource": "prometheus", - "hide": 0, - "includeAll": false, - "label": "Pod", - "multi": false, - "name": "pod", - "options": [], - "query": "label_values(kube_pod_info{namespace=~\"$namespace\"}, pod)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - }, - { - "allValue": ".*", - "current": {}, - "datasource": "prometheus", - "hide": 0, - "includeAll": true, - "label": "Container", - "multi": false, - "name": "container", - "options": [], - "query": "label_values(kube_pod_container_info{namespace=\"$namespace\", pod=\"$pod\"}, container)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - } - ] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "Pods", - "version": 1 - } - statefulset-dashboard.json: |+ - { - "__inputs": [ - { - "description": "", - "label": "prometheus", - "name": "prometheus", - "pluginId": "prometheus", - "pluginName": "Prometheus", - "type": "datasource" - } - ], - "annotations": { - "list": [] - }, - "editable": false, - "graphTooltip": 1, - "hideControls": false, - "links": [], - "rows": [ - { - "collapse": false, - "editable": false, - "height": "200px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 8, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "cores", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "CPU", - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 9, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "GB", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "80%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(container_memory_usage_bytes{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}) / 1024^3", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Memory", - "type": "singlestat", - "valueFontSize": "110%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "Bps", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": false - }, - "id": 7, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfix": "", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 4, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": true - }, - "targets": [ - { - "expr": "sum(rate(container_network_transmit_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m])) + sum(rate(container_network_receive_bytes_total{namespace=\"$statefulset_namespace\",pod_name=~\"$statefulset_name.*\"}[3m]))", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Network", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" + "time": { + "from": "now-1h", + "to": "now" }, - { - "collapse": false, - "editable": false, - "height": "100px", - "panels": [ - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": false - }, - "id": 5, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "metric": "kube_statefulset_replicas", - "refId": "A", - "step": 600 - } - ], - "title": "Desired Replicas", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 6, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Available Replicas", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 3, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_statefulset_status_observed_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Observed Generation", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - }, - { - "colorBackground": false, - "colorValue": false, - "colors": [ - "rgba(245, 54, 54, 0.9)", - "rgba(237, 129, 40, 0.89)", - "rgba(50, 172, 45, 0.97)" - ], - "datasource": "prometheus", - "editable": false, - "format": "none", - "gauge": { - "maxValue": 100, - "minValue": 0, - "show": false, - "thresholdLabels": false, - "thresholdMarkers": true - }, - "id": 2, - "links": [], - "mappingType": 1, - "mappingTypes": [ - { - "name": "value to text", - "value": 1 - }, - { - "name": "range to text", - "value": 2 - } - ], - "maxDataPoints": 100, - "nullPointMode": "connected", - "postfixFontSize": "50%", - "prefix": "", - "prefixFontSize": "50%", - "rangeMaps": [ - { - "from": "null", - "text": "N/A", - "to": "null" - } - ], - "span": 3, - "sparkline": { - "fillColor": "rgba(31, 118, 189, 0.18)", - "full": false, - "lineColor": "rgb(31, 120, 193)", - "show": false - }, - "targets": [ - { - "expr": "max(kube_statefulset_metadata_generation{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "refId": "A", - "step": 600 - } - ], - "title": "Metadata Generation", - "type": "singlestat", - "valueFontSize": "80%", - "valueMaps": [ - { - "op": "=", - "text": "N/A", - "value": "null" - } - ], - "valueName": "avg" - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" + "timepicker": { + "refresh_intervals": [ + "5s", + "10s", + "30s", + "1m", + "5m", + "15m", + "30m", + "1h", + "2h", + "1d" + ], + "time_options": [ + "5m", + "15m", + "1h", + "6h", + "12h", + "24h", + "2d", + "7d", + "30d" + ] }, - { - "collapse": false, - "editable": false, - "height": "350px", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "dashLength": 10, - "dashes": false, - "datasource": "prometheus", - "editable": false, - "error": false, - "fill": 1, - "grid": { - "threshold1Color": "rgba(216, 200, 27, 0.27)", - "threshold2Color": "rgba(234, 112, 112, 0.22)" - }, - "id": 1, - "isNew": true, - "legend": { - "alignAsTable": false, - "avg": false, - "current": false, - "hideEmpty": false, - "hideZero": false, - "max": false, - "min": false, - "rightSide": false, - "show": true, - "total": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "spaceLength": 10, - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "min(kube_statefulset_status_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "available", - "refId": "B", - "step": 30 - }, - { - "expr": "max(kube_statefulset_replicas{statefulset=\"$statefulset_name\",namespace=\"$statefulset_namespace\"}) without (instance, pod)", - "intervalFactor": 2, - "legendFormat": "desired", - "refId": "E", - "step": 30 - } - ], - "title": "Replicas", - "tooltip": { - "msResolution": true, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "none", - "label": "", - "logBase": 1, - "show": true - }, - { - "format": "short", - "label": "", - "logBase": 1, - "show": false - } - ] - } - ], - "showTitle": false, - "title": "Dashboard Row", - "titleSize": "h6" - } - ], - "schemaVersion": 14, - "sharedCrosshair": false, - "style": "dark", - "tags": [], - "templating": { - "list": [ - { - "allValue": ".*", - "current": {}, - "datasource": "prometheus", - "hide": 0, - "includeAll": false, - "label": "Namespace", - "multi": false, - "name": "statefulset_namespace", - "options": [], - "query": "label_values(kube_statefulset_metadata_generation, namespace)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": null, - "tags": [], - "tagsQuery": "", - "type": "query", - "useTags": false - }, - { - "allValue": null, - "current": {}, - "datasource": "prometheus", - "hide": 0, - "includeAll": false, - "label": "StatefulSet", - "multi": false, - "name": "statefulset_name", - "options": [], - "query": "label_values(kube_statefulset_metadata_generation{namespace=\"$statefulset_namespace\"}, statefulset)", - "refresh": 1, - "regex": "", - "sort": 0, - "tagValuesQuery": "", - "tags": [], - "tagsQuery": "statefulset", - "type": "query", - "useTags": false - } - ] - }, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "timezone": "browser", - "title": "StatefulSet", - "version": 1 + "timezone": "browser", + "title": "Pods", + "version": 0 } ---- +kind: ConfigMap +metadata: + name: grafana-dashboard-definitions + namespace: monitoring diff --git a/manifests/grafana/grafana-dashboard-sources.yaml b/manifests/grafana/grafana-dashboard-sources.yaml new file mode 100644 index 0000000000000000000000000000000000000000..61fdcf611110388a60708078701e6cfddfb54ebd --- /dev/null +++ b/manifests/grafana/grafana-dashboard-sources.yaml @@ -0,0 +1,18 @@ +apiVersion: v1 +data: + dashboards.yaml: |- + [ + { + "folder": "", + "name": "0", + "options": { + "path": "/grafana-dashboard-definitions/0" + }, + "org_id": 1, + "type": "file" + } + ] +kind: ConfigMap +metadata: + name: grafana-dashboards + namespace: monitoring diff --git a/manifests/grafana/grafana-dashboards.yaml b/manifests/grafana/grafana-dashboards.yaml deleted file mode 100644 index 772d3f64902f652365e4b7b2714f483cdde42b6e..0000000000000000000000000000000000000000 --- a/manifests/grafana/grafana-dashboards.yaml +++ /dev/null @@ -1,12 +0,0 @@ -apiVersion: v1 -kind: ConfigMap -metadata: - name: grafana-dashboards -data: - dashboards.yaml: |+ - - name: '0' - org_id: 1 - folder: '' - type: file - options: - folder: /grafana-dashboard-definitions/0 diff --git a/manifests/grafana/grafana-datasources.yaml b/manifests/grafana/grafana-datasources.yaml index 33c3081f7497e91daba85b737908c8e30314adec..5ed25a025c4f3247985dac8b4736321e5ddebcb8 100644 --- a/manifests/grafana/grafana-datasources.yaml +++ b/manifests/grafana/grafana-datasources.yaml @@ -1,15 +1,20 @@ apiVersion: v1 +data: + prometheus.yaml: |- + { + "datasources": [ + { + "access": "proxy", + "etitable": false, + "name": "prometheus", + "org_id": 1, + "type": "prometheus", + "url": "http://prometheus-k8s.monitoring.svc:9090", + "version": 1 + } + ] + } kind: ConfigMap metadata: name: grafana-datasources -data: - prometheus.yaml: |+ - datasources: - - name: prometheus - type: prometheus - access: proxy - org_id: 1 - url: http://prometheus-k8s.monitoring.svc:9090 - version: 1 - editable: false - + namespace: monitoring diff --git a/manifests/grafana/grafana-deployment.yaml b/manifests/grafana/grafana-deployment.yaml index 9eb8750fa024ac446b633066bbf8c2643c7756d9..9d7ae88f09fd855cc40f13fb031539adaa54df40 100644 --- a/manifests/grafana/grafana-deployment.yaml +++ b/manifests/grafana/grafana-deployment.yaml @@ -1,48 +1,59 @@ -apiVersion: apps/v1beta1 +apiVersion: apps/v1beta2 kind: Deployment metadata: + labels: + app: grafana name: grafana + namespace: monitoring spec: replicas: 1 + selector: + matchLabels: + app: grafana template: metadata: labels: app: grafana spec: - securityContext: - runAsNonRoot: true - runAsUser: 65534 containers: - - name: grafana - image: quay.io/coreos/monitoring-grafana:5.0.3 - volumeMounts: - - name: grafana-storage - mountPath: /data - - name: grafana-datasources - mountPath: /grafana/conf/provisioning/datasources - - name: grafana-dashboards - mountPath: /grafana/conf/provisioning/dashboards - - name: grafana-dashboard-definitions-0 - mountPath: /grafana-dashboard-definitions/0 + - image: quay.io/coreos/monitoring-grafana:5.0.3 + name: grafana ports: - - name: web - containerPort: 3000 + - containerPort: 3000 + name: http resources: - requests: - memory: 100Mi - cpu: 100m limits: - memory: 200Mi cpu: 200m + memory: 200Mi + requests: + cpu: 100m + memory: 100Mi + volumeMounts: + - mountPath: /data + name: grafana-storage + readOnly: false + - mountPath: /grafana/conf/provisioning/datasources + name: grafana-datasources + readOnly: false + - mountPath: /grafana/conf/provisioning/dashboards + name: grafana-dashboards + readOnly: false + - mountPath: /grafana-dashboard-definitions/0 + name: grafana-dashboard-definitions + readOnly: false + securityContext: + runAsNonRoot: true + runAsUser: 65534 + serviceAccountName: grafana volumes: - - name: grafana-storage - emptyDir: {} - - name: grafana-datasources - configMap: + - emptyDir: {} + name: grafana-storage + - configMap: name: grafana-datasources - - name: grafana-dashboards - configMap: + name: grafana-datasources + - configMap: name: grafana-dashboards - - name: grafana-dashboard-definitions-0 - configMap: - name: grafana-dashboard-definitions-0 + name: grafana-dashboards + - configMap: + name: grafana-dashboard-definitions + name: grafana-dashboard-definitions diff --git a/manifests/grafana/grafana-service-account.yaml b/manifests/grafana/grafana-service-account.yaml new file mode 100644 index 0000000000000000000000000000000000000000..3ed3e031e7445fc347e27f8f7134d70c384f0be3 --- /dev/null +++ b/manifests/grafana/grafana-service-account.yaml @@ -0,0 +1,5 @@ +apiVersion: v1 +kind: ServiceAccount +metadata: + name: grafana + namespace: monitoring diff --git a/manifests/grafana/grafana-service.yaml b/manifests/grafana/grafana-service.yaml index fbcee40dcb2b0ca0020c410c3bfdabd4ec21911a..45f77a0dca1d20f1051fc4c19053b7afc9fb5b19 100644 --- a/manifests/grafana/grafana-service.yaml +++ b/manifests/grafana/grafana-service.yaml @@ -2,14 +2,11 @@ apiVersion: v1 kind: Service metadata: name: grafana - labels: - app: grafana + namespace: monitoring spec: - type: NodePort ports: - - port: 3000 - protocol: TCP - nodePort: 30902 - targetPort: web + - name: http + port: 3000 + targetPort: http selector: app: grafana diff --git a/manifests/k8s/kubeadm/kube-controller-manager.yaml b/manifests/k8s/kubeadm/kube-controller-manager.yaml deleted file mode 100644 index bd8d7cb5a975a8d709f068f551a65360b97b0631..0000000000000000000000000000000000000000 --- a/manifests/k8s/kubeadm/kube-controller-manager.yaml +++ /dev/null @@ -1,17 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - namespace: kube-system - name: kube-controller-manager-prometheus-discovery - labels: - k8s-app: kube-controller-manager -spec: - selector: - component: kube-controller-manager - type: ClusterIP - clusterIP: None - ports: - - name: http-metrics - port: 10252 - targetPort: 10252 - protocol: TCP diff --git a/manifests/k8s/kubeadm/kube-scheduler.yaml b/manifests/k8s/kubeadm/kube-scheduler.yaml deleted file mode 100644 index 2d90097a9ba63a45ac36a2bdf8d38bbb748e87ba..0000000000000000000000000000000000000000 --- a/manifests/k8s/kubeadm/kube-scheduler.yaml +++ /dev/null @@ -1,17 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - namespace: kube-system - name: kube-scheduler-prometheus-discovery - labels: - k8s-app: kube-scheduler -spec: - selector: - component: kube-scheduler - type: ClusterIP - clusterIP: None - ports: - - name: http-metrics - port: 10251 - targetPort: 10251 - protocol: TCP diff --git a/manifests/k8s/self-hosted/kube-controller-manager.yaml b/manifests/k8s/self-hosted/kube-controller-manager.yaml deleted file mode 100644 index a2983101ab87a56a1aad5a3e72304b8ae544b23f..0000000000000000000000000000000000000000 --- a/manifests/k8s/self-hosted/kube-controller-manager.yaml +++ /dev/null @@ -1,17 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - namespace: kube-system - name: kube-controller-manager-prometheus-discovery - labels: - k8s-app: kube-controller-manager -spec: - selector: - k8s-app: kube-controller-manager - type: ClusterIP - clusterIP: None - ports: - - name: http-metrics - port: 10252 - targetPort: 10252 - protocol: TCP diff --git a/manifests/k8s/self-hosted/kube-dns.yaml b/manifests/k8s/self-hosted/kube-dns.yaml deleted file mode 100644 index e032771489b144e695aefe9a4b1b26dd39bdc91d..0000000000000000000000000000000000000000 --- a/manifests/k8s/self-hosted/kube-dns.yaml +++ /dev/null @@ -1,21 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - namespace: kube-system - name: kube-dns-prometheus-discovery - labels: - k8s-app: kube-dns -spec: - selector: - k8s-app: kube-dns - type: ClusterIP - clusterIP: None - ports: - - name: http-metrics-skydns - port: 10055 - targetPort: 10055 - protocol: TCP - - name: http-metrics-dnsmasq - port: 10054 - targetPort: 10054 - protocol: TCP diff --git a/manifests/k8s/self-hosted/kube-scheduler.yaml b/manifests/k8s/self-hosted/kube-scheduler.yaml deleted file mode 100644 index 0fe05dd74a339f4d152e3b15cf7035322f42e975..0000000000000000000000000000000000000000 --- a/manifests/k8s/self-hosted/kube-scheduler.yaml +++ /dev/null @@ -1,17 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - namespace: kube-system - name: kube-scheduler-prometheus-discovery - labels: - k8s-app: kube-scheduler -spec: - selector: - k8s-app: kube-scheduler - type: ClusterIP - clusterIP: None - ports: - - name: http-metrics - port: 10251 - targetPort: 10251 - protocol: TCP diff --git a/manifests/kube-state-metrics/kube-state-metrics-cluster-role-binding.yaml b/manifests/kube-state-metrics/kube-state-metrics-cluster-role-binding.yaml index 8284fc151317205c95baf4c62add202ebd8e22ea..9a8f3111abb8f0f418960ff0cdd56aaf95037076 100644 --- a/manifests/kube-state-metrics/kube-state-metrics-cluster-role-binding.yaml +++ b/manifests/kube-state-metrics/kube-state-metrics-cluster-role-binding.yaml @@ -1,4 +1,4 @@ -apiVersion: rbac.authorization.k8s.io/v1beta1 +apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: kube-state-metrics diff --git a/manifests/kube-state-metrics/kube-state-metrics-cluster-role.yaml b/manifests/kube-state-metrics/kube-state-metrics-cluster-role.yaml index ef5e91ac61b225f148413fca0151bdc51e123286..cae184834cfff42306b4d56fff6cff4004267ba8 100644 --- a/manifests/kube-state-metrics/kube-state-metrics-cluster-role.yaml +++ b/manifests/kube-state-metrics/kube-state-metrics-cluster-role.yaml @@ -1,10 +1,13 @@ -apiVersion: rbac.authorization.k8s.io/v1beta1 +apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: kube-state-metrics rules: -- apiGroups: [""] +- apiGroups: + - "" resources: + - configmaps + - secrets - nodes - pods - services @@ -15,31 +18,49 @@ rules: - persistentvolumes - namespaces - endpoints - verbs: ["list", "watch"] -- apiGroups: ["extensions"] + verbs: + - list + - watch +- apiGroups: + - extensions resources: - daemonsets - deployments - replicasets - verbs: ["list", "watch"] -- apiGroups: ["apps"] + verbs: + - list + - watch +- apiGroups: + - apps resources: - statefulsets - verbs: ["list", "watch"] -- apiGroups: ["batch"] + verbs: + - list + - watch +- apiGroups: + - batch resources: - cronjobs - jobs - verbs: ["list", "watch"] -- apiGroups: ["autoscaling"] + verbs: + - list + - watch +- apiGroups: + - autoscaling resources: - horizontalpodautoscalers - verbs: ["list", "watch"] -- apiGroups: ["authentication.k8s.io"] + verbs: + - list + - watch +- apiGroups: + - authentication.k8s.io resources: - tokenreviews - verbs: ["create"] -- apiGroups: ["authorization.k8s.io"] + verbs: + - create +- apiGroups: + - authorization.k8s.io resources: - subjectaccessreviews - verbs: ["create"] \ No newline at end of file + verbs: + - create diff --git a/manifests/kube-state-metrics/kube-state-metrics-deployment.yaml b/manifests/kube-state-metrics/kube-state-metrics-deployment.yaml index 61f918eb57d638de82466d7fbb9b64fdc1adb85d..bd6d9475db3006b31c5cbab509f49469ced3d3fc 100644 --- a/manifests/kube-state-metrics/kube-state-metrics-deployment.yaml +++ b/manifests/kube-state-metrics/kube-state-metrics-deployment.yaml @@ -1,80 +1,95 @@ -apiVersion: extensions/v1beta1 +apiVersion: apps/v1beta2 kind: Deployment metadata: + labels: + app: kube-state-metrics name: kube-state-metrics + namespace: monitoring spec: replicas: 1 + selector: + matchLabels: + app: kube-state-metrics template: metadata: labels: app: kube-state-metrics spec: - serviceAccountName: kube-state-metrics - securityContext: - runAsNonRoot: true - runAsUser: 65534 containers: - - name: kube-rbac-proxy-main - image: quay.io/brancz/kube-rbac-proxy:v0.2.0 - args: - - "--secure-listen-address=:8443" - - "--upstream=http://127.0.0.1:8081/" + - args: + - --secure-listen-address=:8443 + - --upstream=http://127.0.0.1:8081/ + image: quay.io/coreos/kube-rbac-proxy:v0.3.0 + name: kube-rbac-proxy-main ports: - - name: https-main - containerPort: 8443 + - containerPort: 8443 + name: https-main resources: - requests: - memory: 20Mi - cpu: 10m limits: - memory: 40Mi cpu: 20m - - name: kube-rbac-proxy-self - image: quay.io/brancz/kube-rbac-proxy:v0.2.0 - args: - - "--secure-listen-address=:9443" - - "--upstream=http://127.0.0.1:8082/" + memory: 40Mi + requests: + cpu: 10m + memory: 20Mi + - args: + - --secure-listen-address=:9443 + - --upstream=http://127.0.0.1:8082/ + image: quay.io/coreos/kube-rbac-proxy:v0.3.0 + name: kube-rbac-proxy-self ports: - - name: https-self - containerPort: 9443 + - containerPort: 9443 + name: https-self resources: + limits: + cpu: 20m + memory: 40Mi requests: - memory: 20Mi cpu: 10m + memory: 20Mi + - args: + - --host=127.0.0.1 + - --port=8081 + - --telemetry-host=127.0.0.1 + - --telemetry-port=8082 + image: quay.io/coreos/kube-state-metrics:v1.3.0 + name: kube-state-metrics + resources: limits: - memory: 40Mi - cpu: 20m - - name: kube-state-metrics - image: quay.io/coreos/kube-state-metrics:v1.2.0 - args: - - "--host=127.0.0.1" - - "--port=8081" - - "--telemetry-host=127.0.0.1" - - "--telemetry-port=8082" - - name: addon-resizer - image: gcr.io/google_containers/addon-resizer:1.0 + cpu: 102m + memory: 180Mi + requests: + cpu: 102m + memory: 180Mi + - command: + - /pod_nanny + - --container=kube-state-metrics + - --cpu=100m + - --extra-cpu=2m + - --memory=150Mi + - --extra-memory=30Mi + - --threshold=5 + - --deployment=kube-state-metrics + env: + - name: MY_POD_NAME + valueFrom: + fieldRef: + apiVersion: v1 + fieldPath: metadata.name + - name: MY_POD_NAMESPACE + valueFrom: + fieldRef: + apiVersion: v1 + fieldPath: metadata.namespace + image: quay.io/coreos/addon-resizer:1.0 + name: addon-resizer resources: limits: - cpu: 100m + cpu: 10m memory: 30Mi requests: - cpu: 100m + cpu: 10m memory: 30Mi - env: - - name: MY_POD_NAME - valueFrom: - fieldRef: - fieldPath: metadata.name - - name: MY_POD_NAMESPACE - valueFrom: - fieldRef: - fieldPath: metadata.namespace - command: - - /pod_nanny - - --container=kube-state-metrics - - --cpu=100m - - --extra-cpu=2m - - --memory=150Mi - - --extra-memory=30Mi - - --threshold=5 - - --deployment=kube-state-metrics + securityContext: + runAsNonRoot: true + runAsUser: 65534 + serviceAccountName: kube-state-metrics diff --git a/manifests/kube-state-metrics/kube-state-metrics-role-binding.yaml b/manifests/kube-state-metrics/kube-state-metrics-role-binding.yaml index a93c396505f49e6b9b1ca545d9342a767b294b79..dcad8055aba90e8ad0f12ecf2ac6d8de7e39ff01 100644 --- a/manifests/kube-state-metrics/kube-state-metrics-role-binding.yaml +++ b/manifests/kube-state-metrics/kube-state-metrics-role-binding.yaml @@ -1,12 +1,12 @@ -apiVersion: rbac.authorization.k8s.io/v1beta1 +apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: kube-state-metrics + namespace: monitoring roleRef: apiGroup: rbac.authorization.k8s.io kind: Role - name: kube-state-metrics-resizer + name: kube-state-metrics-addon-resizer subjects: - kind: ServiceAccount name: kube-state-metrics - diff --git a/manifests/kube-state-metrics/kube-state-metrics-role.yaml b/manifests/kube-state-metrics/kube-state-metrics-role.yaml index 6bf21fb882e15683cc57654f2efc33d82f8d3eff..0063ffb453a8d1f254755e7209a73f74ec1a5195 100644 --- a/manifests/kube-state-metrics/kube-state-metrics-role.yaml +++ b/manifests/kube-state-metrics/kube-state-metrics-role.yaml @@ -1,15 +1,21 @@ -apiVersion: rbac.authorization.k8s.io/v1beta1 +apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: - name: kube-state-metrics-resizer + name: kube-state-metrics + namespace: monitoring rules: -- apiGroups: [""] +- apiGroups: + - "" resources: - pods - verbs: ["get"] -- apiGroups: ["extensions"] + verbs: + - get +- apiGroups: + - extensions + resourceNames: + - kube-state-metrics resources: - deployments - resourceNames: ["kube-state-metrics"] - verbs: ["get", "update"] - + verbs: + - get + - update diff --git a/manifests/kube-state-metrics/kube-state-metrics-service-account.yaml b/manifests/kube-state-metrics/kube-state-metrics-service-account.yaml index 997793528ad8b9970a4b5100cc644ea1ceafcb41..fff1028b442c69109cb8fa8e5f808f2a856838f8 100644 --- a/manifests/kube-state-metrics/kube-state-metrics-service-account.yaml +++ b/manifests/kube-state-metrics/kube-state-metrics-service-account.yaml @@ -2,3 +2,4 @@ apiVersion: v1 kind: ServiceAccount metadata: name: kube-state-metrics + namespace: monitoring diff --git a/manifests/kube-state-metrics/kube-state-metrics-service.yaml b/manifests/kube-state-metrics/kube-state-metrics-service.yaml index b4422685c270f2ad45556ba1e8c3b2a09fe4431c..3e88b562b615e5ff0a3f9441f377c418c0449b9a 100644 --- a/manifests/kube-state-metrics/kube-state-metrics-service.yaml +++ b/manifests/kube-state-metrics/kube-state-metrics-service.yaml @@ -2,20 +2,16 @@ apiVersion: v1 kind: Service metadata: labels: - app: kube-state-metrics k8s-app: kube-state-metrics name: kube-state-metrics + namespace: monitoring spec: - clusterIP: None ports: - name: https-main port: 8443 targetPort: https-main - protocol: TCP - name: https-self port: 9443 targetPort: https-self - protocol: TCP selector: app: kube-state-metrics - diff --git a/manifests/node-exporter/node-exporter-cluster-role.yaml b/manifests/node-exporter/node-exporter-cluster-role.yaml index 932b7762c43051937e9ead2a721e9b2414eecdf9..ad783ae9bfdb3eb2c57e8d7bc9be580e75880bbf 100644 --- a/manifests/node-exporter/node-exporter-cluster-role.yaml +++ b/manifests/node-exporter/node-exporter-cluster-role.yaml @@ -3,11 +3,15 @@ kind: ClusterRole metadata: name: node-exporter rules: -- apiGroups: ["authentication.k8s.io"] +- apiGroups: + - authentication.k8s.io resources: - tokenreviews - verbs: ["create"] -- apiGroups: ["authorization.k8s.io"] + verbs: + - create +- apiGroups: + - authorization.k8s.io resources: - subjectaccessreviews - verbs: ["create"] + verbs: + - create diff --git a/manifests/node-exporter/node-exporter-daemonset.yaml b/manifests/node-exporter/node-exporter-daemonset.yaml index f92113e87ccb0afd13b9a65ad2166a05b2911384..1284e93dd260a3ad37adcfd65ed771dcc67752de 100644 --- a/manifests/node-exporter/node-exporter-daemonset.yaml +++ b/manifests/node-exporter/node-exporter-daemonset.yaml @@ -1,69 +1,63 @@ -apiVersion: extensions/v1beta1 +apiVersion: apps/v1beta2 kind: DaemonSet metadata: + labels: + app: node-exporter name: node-exporter + namespace: monitoring spec: - updateStrategy: - rollingUpdate: - maxUnavailable: 1 - type: RollingUpdate + selector: + matchLabels: + app: node-exporter template: metadata: labels: app: node-exporter - name: node-exporter spec: - serviceAccountName: node-exporter - securityContext: - runAsNonRoot: true - runAsUser: 65534 - hostNetwork: true - hostPID: true containers: - - image: quay.io/prometheus/node-exporter:v0.15.2 - args: - - "--web.listen-address=127.0.0.1:9101" - - "--path.procfs=/host/proc" - - "--path.sysfs=/host/sys" + - args: + - --web.listen-address=127.0.0.1:9101 + - --path.procfs=/host/proc + - --path.sysfs=/host/sys + image: quay.io/prometheus/node-exporter:v0.15.2 name: node-exporter resources: - requests: - memory: 30Mi - cpu: 100m limits: - memory: 50Mi - cpu: 200m + cpu: 102m + memory: 180Mi + requests: + cpu: 102m + memory: 180Mi volumeMounts: - - name: proc - readOnly: true - mountPath: /host/proc - - name: sys - readOnly: true - mountPath: /host/sys - - name: kube-rbac-proxy - image: quay.io/brancz/kube-rbac-proxy:v0.2.0 - args: - - "--secure-listen-address=:9100" - - "--upstream=http://127.0.0.1:9101/" + - mountPath: /host/proc + name: proc + readOnly: false + - mountPath: /host/sys + name: sys + readOnly: false + - args: + - --secure-listen-address=:9100 + - --upstream=http://127.0.0.1:9101/ + image: quay.io/coreos/kube-rbac-proxy:v0.3.0 + name: kube-rbac-proxy ports: - containerPort: 9100 - hostPort: 9100 name: https resources: - requests: - memory: 20Mi - cpu: 10m limits: - memory: 40Mi cpu: 20m - tolerations: - - effect: NoSchedule - operator: Exists + memory: 40Mi + requests: + cpu: 10m + memory: 20Mi + securityContext: + runAsNonRoot: true + runAsUser: 65534 + serviceAccountName: node-exporter volumes: - - name: proc - hostPath: + - hostPath: path: /proc - - name: sys - hostPath: + name: proc + - hostPath: path: /sys - + name: sys diff --git a/manifests/node-exporter/node-exporter-service-account.yaml b/manifests/node-exporter/node-exporter-service-account.yaml index 703a274882355461607aeea10d00f2127186810a..8a03ac1641f68cbca54468aae86c79a20156058a 100644 --- a/manifests/node-exporter/node-exporter-service-account.yaml +++ b/manifests/node-exporter/node-exporter-service-account.yaml @@ -2,3 +2,4 @@ apiVersion: v1 kind: ServiceAccount metadata: name: node-exporter + namespace: monitoring diff --git a/manifests/node-exporter/node-exporter-service.yaml b/manifests/node-exporter/node-exporter-service.yaml index 8aa3774792925629be50c08eed611247de866fd7..101a9769daff5d003f9970c58c04233cc4914095 100644 --- a/manifests/node-exporter/node-exporter-service.yaml +++ b/manifests/node-exporter/node-exporter-service.yaml @@ -2,16 +2,13 @@ apiVersion: v1 kind: Service metadata: labels: - app: node-exporter k8s-app: node-exporter name: node-exporter + namespace: monitoring spec: - type: ClusterIP - clusterIP: None ports: - name: https port: 9100 - protocol: TCP + targetPort: https selector: app: node-exporter - diff --git a/manifests/prometheus-k8s/prometheus-k8s-cluster-role-binding.yaml b/manifests/prometheus-k8s/prometheus-k8s-cluster-role-binding.yaml new file mode 100644 index 0000000000000000000000000000000000000000..554bb6f8767c0a8a346614c685649321a176b640 --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-cluster-role-binding.yaml @@ -0,0 +1,12 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: prometheus-k8s +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: prometheus-k8s +subjects: +- kind: ServiceAccount + name: prometheus-k8s + namespace: monitoring diff --git a/manifests/prometheus-k8s/prometheus-k8s-cluster-role.yaml b/manifests/prometheus-k8s/prometheus-k8s-cluster-role.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d5c4598304eaf7fe0c183e34969ca734f793aa75 --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-cluster-role.yaml @@ -0,0 +1,15 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: prometheus-k8s +rules: +- apiGroups: + - "" + resources: + - nodes/metrics + verbs: + - get +- nonResourceURLs: + - /metrics + verbs: + - get diff --git a/manifests/prometheus-k8s/prometheus-k8s-role-binding-config.yaml b/manifests/prometheus-k8s/prometheus-k8s-role-binding-config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f21ff3c5978a18636dd43473d13eb3e62a697f8b --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-role-binding-config.yaml @@ -0,0 +1,13 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: prometheus-k8s-config + namespace: monitoring +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: prometheus-k8s-config +subjects: +- kind: ServiceAccount + name: prometheus-k8s-config + namespace: monitoring diff --git a/manifests/prometheus-k8s/prometheus-k8s-role-binding-default.yaml b/manifests/prometheus-k8s/prometheus-k8s-role-binding-default.yaml new file mode 100644 index 0000000000000000000000000000000000000000..c40397108012dde1390d3e5e5301c77f2d41b1fb --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-role-binding-default.yaml @@ -0,0 +1,13 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: prometheus-k8s + namespace: default +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: prometheus-k8s +subjects: +- kind: ServiceAccount + name: prometheus-k8s + namespace: monitoring diff --git a/manifests/prometheus-k8s/prometheus-k8s-role-binding-kube-system.yaml b/manifests/prometheus-k8s/prometheus-k8s-role-binding-kube-system.yaml new file mode 100644 index 0000000000000000000000000000000000000000..250c7307540671cb11c18185cdb7a80d38c42647 --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-role-binding-kube-system.yaml @@ -0,0 +1,13 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: prometheus-k8s + namespace: kube-system +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: prometheus-k8s +subjects: +- kind: ServiceAccount + name: prometheus-k8s + namespace: monitoring diff --git a/manifests/prometheus-k8s/prometheus-k8s-role-binding-namespace.yaml b/manifests/prometheus-k8s/prometheus-k8s-role-binding-namespace.yaml new file mode 100644 index 0000000000000000000000000000000000000000..068c77d37d9b1c54b6216994b01efd260b64a35a --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-role-binding-namespace.yaml @@ -0,0 +1,13 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: prometheus-k8s + namespace: monitoring +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: prometheus-k8s +subjects: +- kind: ServiceAccount + name: prometheus-k8s + namespace: monitoring diff --git a/manifests/prometheus-k8s/prometheus-k8s-role-config.yaml b/manifests/prometheus-k8s/prometheus-k8s-role-config.yaml new file mode 100644 index 0000000000000000000000000000000000000000..5f1cd043919a18141c2331c05836d5ef45d099eb --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-role-config.yaml @@ -0,0 +1,12 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: prometheus-k8s-config + namespace: monitoring +rules: +- apiGroups: + - "" + resources: + - configmaps + verbs: + - get diff --git a/manifests/prometheus-k8s/prometheus-k8s-role-default.yaml b/manifests/prometheus-k8s/prometheus-k8s-role-default.yaml new file mode 100644 index 0000000000000000000000000000000000000000..1c336117a26c5efa32dca57a1b73c2a8b4245162 --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-role-default.yaml @@ -0,0 +1,17 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: prometheus-k8s + namespace: default +rules: +- apiGroups: + - "" + resources: + - nodes + - services + - endpoints + - pods + verbs: + - get + - list + - watch diff --git a/manifests/prometheus-k8s/prometheus-k8s-role-kube-system.yaml b/manifests/prometheus-k8s/prometheus-k8s-role-kube-system.yaml new file mode 100644 index 0000000000000000000000000000000000000000..d82fe3ab8b2d12352dcf390a4dc8cf4d47277b1c --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-role-kube-system.yaml @@ -0,0 +1,17 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: prometheus-k8s + namespace: kube-system +rules: +- apiGroups: + - "" + resources: + - nodes + - services + - endpoints + - pods + verbs: + - get + - list + - watch diff --git a/manifests/prometheus-k8s/prometheus-k8s-role-namespace.yaml b/manifests/prometheus-k8s/prometheus-k8s-role-namespace.yaml new file mode 100644 index 0000000000000000000000000000000000000000..343cfc6dff7e41ff59ced5a55f093da3032aecd7 --- /dev/null +++ b/manifests/prometheus-k8s/prometheus-k8s-role-namespace.yaml @@ -0,0 +1,17 @@ +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: prometheus-k8s + namespace: monitoring +rules: +- apiGroups: + - "" + resources: + - nodes + - services + - endpoints + - pods + verbs: + - get + - list + - watch diff --git a/manifests/prometheus/prometheus-k8s-rules.yaml b/manifests/prometheus-k8s/prometheus-k8s-rules.yaml similarity index 83% rename from manifests/prometheus/prometheus-k8s-rules.yaml rename to manifests/prometheus-k8s/prometheus-k8s-rules.yaml index c45bbbbc88bd8d3d204120cca5dca48a3ccdff3d..90cb3f3e165c6a07d0206fa98682b00814bc02ae 100644 --- a/manifests/prometheus/prometheus-k8s-rules.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-rules.yaml @@ -1,12 +1,6 @@ apiVersion: v1 -kind: ConfigMap -metadata: - name: prometheus-k8s-rules - labels: - role: alert-rules - prometheus: k8s data: - alertmanager.rules.yaml: |+ + alertmanager.rules.yaml: | groups: - name: alertmanager.rules rules: @@ -40,7 +34,7 @@ data: description: Reloading Alertmanager's configuration has failed for {{ $labels.namespace }}/{{ $labels.pod}}. summary: Alertmanager's configuration reload failed - etcd3.rules.yaml: |+ + etcd3.rules.yaml: | groups: - name: ./etcd3.rules rules: @@ -164,7 +158,7 @@ data: annotations: description: etcd instance {{ $labels.instance }} commit durations are high summary: high commit durations - general.rules.yaml: |+ + general.rules.yaml: | groups: - name: general.rules rules: @@ -204,7 +198,7 @@ data: description: '{{ $labels.job }}: {{ $labels.namespace }}/{{ $labels.pod }} instance will exhaust in file/socket descriptors within the next hour' summary: file descriptors soon exhausted - kube-controller-manager.rules.yaml: |+ + kube-controller-manager.rules.yaml: | groups: - name: kube-controller-manager.rules rules: @@ -218,7 +212,7 @@ data: controllers are not making progress. runbook: https://coreos.com/tectonic/docs/latest/troubleshooting/controller-recovery.html#recovering-a-controller-manager summary: Controller manager is down - kube-scheduler.rules.yaml: |+ + kube-scheduler.rules.yaml: | groups: - name: kube-scheduler.rules rules: @@ -277,7 +271,7 @@ data: to nodes. runbook: https://coreos.com/tectonic/docs/latest/troubleshooting/controller-recovery.html#recovering-a-scheduler summary: Scheduler is down - kube-state-metrics.rules.yaml: |+ + kube-state-metrics.rules.yaml: | groups: - name: kube-state-metrics.rules rules: @@ -337,7 +331,7 @@ data: description: Pod {{$labels.namespace}}/{{$labels.pod}} was restarted {{$value}} times within the last hour summary: Pod is restarting frequently - kubelet.rules.yaml: |+ + kubelet.rules.yaml: | groups: - name: kubelet.rules rules: @@ -386,7 +380,7 @@ data: description: Kubelet {{$labels.instance}} is running {{$value}} pods, close to the limit of 110 summary: Kubelet is close to pod limit - kubernetes.rules.yaml: |+ + kubernetes.rules.yaml: | groups: - name: kubernetes.rules rules: @@ -477,7 +471,7 @@ data: description: No API servers are reachable or all have disappeared from service discovery summary: No API servers are reachable - + - alert: K8sCertificateExpirationNotice labels: severity: warning @@ -485,7 +479,7 @@ data: description: Kubernetes API Certificate is expiring soon (less than 7 days) summary: Kubernetes API Certificate is expiering soon expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="604800"}) > 0 - + - alert: K8sCertificateExpirationNotice labels: severity: critical @@ -493,7 +487,7 @@ data: description: Kubernetes API Certificate is expiring in less than 1 day summary: Kubernetes API Certificate is expiering expr: sum(apiserver_client_certificate_expiration_seconds_bucket{le="86400"}) > 0 - node.rules.yaml: |+ + node.rules.yaml: | groups: - name: node.rules rules: @@ -541,105 +535,53 @@ data: description: device {{$labels.device}} on node {{$labels.instance}} is running full within the next 2 hours (mounted at {{$labels.mountpoint}}) summary: Node disk is running full within 2 hours - prometheus.rules.yaml: |+ - groups: - - name: prometheus.rules - rules: - - alert: PrometheusConfigReloadFailed - expr: prometheus_config_last_reload_successful == 0 - for: 10m - labels: - severity: warning - annotations: - description: Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}} - summary: Reloading Promehteus' configuration failed - - - alert: PrometheusNotificationQueueRunningFull - expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > prometheus_notifications_queue_capacity - for: 10m - labels: - severity: warning - annotations: - description: Prometheus' alert notification queue is running full for {{$labels.namespace}}/{{ - $labels.pod}} - summary: Prometheus' alert notification queue is running full - - - alert: PrometheusErrorSendingAlerts - expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m]) - > 0.01 - for: 10m - labels: - severity: warning - annotations: - description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{ - $labels.pod}} to Alertmanager {{$labels.Alertmanager}} - summary: Errors while sending alert from Prometheus - - - alert: PrometheusErrorSendingAlerts - expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m]) - > 0.03 - for: 10m - labels: - severity: critical - annotations: - description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{ - $labels.pod}} to Alertmanager {{$labels.Alertmanager}} - summary: Errors while sending alerts from Prometheus - - - alert: PrometheusNotConnectedToAlertmanagers - expr: prometheus_notifications_alertmanagers_discovered < 1 - for: 10m - labels: - severity: warning - annotations: - description: Prometheus {{ $labels.namespace }}/{{ $labels.pod}} is not connected - to any Alertmanagers - summary: Prometheus is not connected to any Alertmanagers - - - alert: PrometheusTSDBReloadsFailing - expr: increase(prometheus_tsdb_reloads_failures_total[2h]) > 0 - for: 12h - labels: - severity: warning - annotations: - description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}} - reload failures over the last four hours.' - summary: Prometheus has issues reloading data blocks from disk - - - alert: PrometheusTSDBCompactionsFailing - expr: increase(prometheus_tsdb_compactions_failed_total[2h]) > 0 - for: 12h - labels: - severity: warning - annotations: - description: '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}} - compaction failures over the last four hours.' - summary: Prometheus has issues compacting sample blocks - - - alert: PrometheusTSDBWALCorruptions - expr: tsdb_wal_corruptions_total > 0 - for: 4h - labels: - severity: warning - annotations: - description: '{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead - log (WAL).' - summary: Prometheus write-ahead log is corrupted - - - alert: PrometheusNotIngestingSamples - expr: rate(prometheus_tsdb_head_samples_appended_total[5m]) <= 0 - for: 10m - labels: - severity: warning - annotations: - description: "Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples." - summary: "Prometheus isn't ingesting samples" - - - alert: PrometheusTargetScapesDuplicate - expr: increase(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0 - for: 10m - labels: - severity: warning - annotations: - description: "{{$labels.namespace}}/{{$labels.pod}} has many samples rejected due to duplicate timestamps but different values" - summary: Prometheus has many samples rejected + prometheus.rules.yaml: "groups:\n- name: prometheus.rules\n rules:\n - alert: + PrometheusConfigReloadFailed\n expr: prometheus_config_last_reload_successful + == 0\n for: 10m\n labels:\n severity: warning\n annotations:\n description: + Reloading Prometheus' configuration has failed for {{$labels.namespace}}/{{$labels.pod}}\n + \ summary: Reloading Promehteus' configuration failed\n\n - alert: PrometheusNotificationQueueRunningFull\n + \ expr: predict_linear(prometheus_notifications_queue_length[5m], 60 * 30) > + prometheus_notifications_queue_capacity\n for: 10m\n labels:\n severity: + warning\n annotations:\n description: Prometheus' alert notification queue + is running full for {{$labels.namespace}}/{{\n $labels.pod}}\n summary: + Prometheus' alert notification queue is running full \n\n - alert: PrometheusErrorSendingAlerts\n + \ expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])\n + \ > 0.01\n for: 10m\n labels:\n severity: warning\n annotations:\n + \ description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{\n + \ $labels.pod}} to Alertmanager {{$labels.Alertmanager}}\n summary: + Errors while sending alert from Prometheus\n\n - alert: PrometheusErrorSendingAlerts\n + \ expr: rate(prometheus_notifications_errors_total[5m]) / rate(prometheus_notifications_sent_total[5m])\n + \ > 0.03\n for: 10m\n labels:\n severity: critical\n annotations:\n + \ description: Errors while sending alerts from Prometheus {{$labels.namespace}}/{{\n + \ $labels.pod}} to Alertmanager {{$labels.Alertmanager}}\n summary: + Errors while sending alerts from Prometheus\n\n - alert: PrometheusNotConnectedToAlertmanagers\n + \ expr: prometheus_notifications_alertmanagers_discovered < 1\n for: 10m\n + \ labels:\n severity: warning\n annotations:\n description: Prometheus + {{ $labels.namespace }}/{{ $labels.pod}} is not connected\n to any Alertmanagers\n + \ summary: Prometheus is not connected to any Alertmanagers\n\n - alert: + PrometheusTSDBReloadsFailing\n expr: increase(prometheus_tsdb_reloads_failures_total[2h]) + > 0\n for: 12h\n labels:\n severity: warning\n annotations:\n description: + '{{$labels.job}} at {{$labels.instance}} had {{$value | humanize}}\n reload + failures over the last four hours.'\n summary: Prometheus has issues reloading + data blocks from disk\n\n - alert: PrometheusTSDBCompactionsFailing\n expr: + increase(prometheus_tsdb_compactions_failed_total[2h]) > 0\n for: 12h\n labels:\n + \ severity: warning\n annotations:\n description: '{{$labels.job}} + at {{$labels.instance}} had {{$value | humanize}}\n compaction failures + over the last four hours.'\n summary: Prometheus has issues compacting sample + blocks\n\n - alert: PrometheusTSDBWALCorruptions\n expr: tsdb_wal_corruptions_total + > 0\n for: 4h\n labels:\n severity: warning\n annotations:\n description: + '{{$labels.job}} at {{$labels.instance}} has a corrupted write-ahead\n log + (WAL).'\n summary: Prometheus write-ahead log is corrupted\n\n - alert: + PrometheusNotIngestingSamples\n expr: rate(prometheus_tsdb_head_samples_appended_total[5m]) + <= 0\n for: 10m\n labels:\n severity: warning\n annotations:\n description: + \"Prometheus {{ $labels.namespace }}/{{ $labels.pod}} isn't ingesting samples.\"\n + \ summary: \"Prometheus isn't ingesting samples\"\n\n - alert: PrometheusTargetScapesDuplicate\n + \ expr: increase(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) + > 0\n for: 10m\n labels:\n severity: warning\n annotations:\n description: + \"{{$labels.namespace}}/{{$labels.pod}} has many samples rejected due to duplicate + timestamps but different values\"\n summary: Prometheus has many samples + rejected\n" +kind: ConfigMap +metadata: + name: prometheus-k8s-rules + namespace: monitoring diff --git a/manifests/prometheus/prometheus-k8s-service-account.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-account.yaml similarity index 74% rename from manifests/prometheus/prometheus-k8s-service-account.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-account.yaml index 58d5342dfcebb4f667b0f2a916d8e52396651a03..3e55fad675b8691f3aab29f152fdaa5416605745 100644 --- a/manifests/prometheus/prometheus-k8s-service-account.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-account.yaml @@ -2,3 +2,4 @@ apiVersion: v1 kind: ServiceAccount metadata: name: prometheus-k8s + namespace: monitoring diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-alertmanager.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-alertmanager.yaml similarity index 81% rename from manifests/prometheus/prometheus-k8s-service-monitor-alertmanager.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-alertmanager.yaml index 19669e3e8f6a8fc9bd8343dcce34b06ea34fc680..e4e75ccc736027f662b95a4ff190afa3f61f50f5 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-alertmanager.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-alertmanager.yaml @@ -1,16 +1,17 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: alertmanager labels: k8s-app: alertmanager + name: alertmanager + namespace: monitoring spec: - selector: - matchLabels: - alertmanager: main + endpoints: + - interval: 30s + port: web namespaceSelector: matchNames: - monitoring - endpoints: - - port: web - interval: 30s + selector: + matchLabels: + alertmanager: main diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-apiserver.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-apiserver.yaml similarity index 81% rename from manifests/prometheus/prometheus-k8s-service-monitor-apiserver.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-apiserver.yaml index 40361f04f8c24eb2639cda02b310d93281d0d5f5..0cffe5418bf65194c8cb7d2b2e71e87394ab5310 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-apiserver.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-apiserver.yaml @@ -1,23 +1,24 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: kube-apiserver labels: k8s-app: apiserver + name: kube-apiserver + namespace: monitoring spec: - jobLabel: component - selector: - matchLabels: - component: apiserver - provider: kubernetes - namespaceSelector: - matchNames: - - default endpoints: - - port: https + - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token interval: 30s + port: https scheme: https tlsConfig: caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt serverName: kubernetes - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + jobLabel: component + namespaceSelector: + matchNames: + - default + selector: + matchLabels: + component: apiserver + provider: kubernetes diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-coredns.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-coredns.yaml similarity index 69% rename from manifests/prometheus/prometheus-k8s-service-monitor-coredns.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-coredns.yaml index 362ac89959cf8e2be1835e6b08ef20feeca01b7e..12a4c5bf9fb3858dfb520d13c8b44b9a78bca341 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-coredns.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-coredns.yaml @@ -4,16 +4,17 @@ metadata: labels: k8s-app: coredns name: coredns + namespace: monitoring spec: + endpoints: + - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + interval: 15s + port: http-metrics jobLabel: k8s-app + namespaceSelector: + matchNames: + - kube-system selector: matchLabels: - k8s-app: coredns component: metrics - namespaceSelector: - matchNames: - - kube-system - endpoints: - - port: http-metrics - interval: 15s - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + k8s-app: coredns diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-kube-controller-manager.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-controller-manager.yaml similarity index 82% rename from manifests/prometheus/prometheus-k8s-service-monitor-kube-controller-manager.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-controller-manager.yaml index 681c320d87c13c05f267e4d9104538eb3ecfe748..dfb2a25d7508d24d71d1a2f47769c1c1427d4bd7 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-kube-controller-manager.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-controller-manager.yaml @@ -1,17 +1,18 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: kube-controller-manager labels: k8s-app: kube-controller-manager + name: kube-controller-manager + namespace: monitoring spec: - jobLabel: k8s-app endpoints: - - port: http-metrics - interval: 30s - selector: - matchLabels: - k8s-app: kube-controller-manager + - interval: 30s + port: http-metrics + jobLabel: k8s-app namespaceSelector: matchNames: - kube-system + selector: + matchLabels: + k8s-app: kube-controller-manager diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-kube-scheduler.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-scheduler.yaml similarity index 81% rename from manifests/prometheus/prometheus-k8s-service-monitor-kube-scheduler.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-scheduler.yaml index 6927f58e2995498101f5463378303bb909bdc7ff..f00db0e474023495ed4a825d61674367b1dfd7b8 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-kube-scheduler.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-scheduler.yaml @@ -1,17 +1,18 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: kube-scheduler labels: k8s-app: kube-scheduler + name: kube-scheduler + namespace: monitoring spec: - jobLabel: k8s-app endpoints: - - port: http-metrics - interval: 30s - selector: - matchLabels: - k8s-app: kube-scheduler + - interval: 30s + port: http-metrics + jobLabel: k8s-app namespaceSelector: matchNames: - kube-system + selector: + matchLabels: + k8s-app: kube-scheduler diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-kube-state-metrics.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-state-metrics.yaml similarity index 71% rename from manifests/prometheus/prometheus-k8s-service-monitor-kube-state-metrics.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-state-metrics.yaml index 1433a5feb168e30e8ffcb012970ccd1870ca05fb..cca52f696071938a307dc6386421fe0443a69cca 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-kube-state-metrics.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-kube-state-metrics.yaml @@ -1,28 +1,29 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: kube-state-metrics labels: k8s-app: kube-state-metrics + name: kube-state-metrics + namespace: monitoring spec: - jobLabel: k8s-app - selector: - matchLabels: - k8s-app: kube-state-metrics - namespaceSelector: - matchNames: - - monitoring endpoints: - - port: https-main - scheme: https - interval: 30s + - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token honorLabels: true - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + interval: 30s + port: https-main + scheme: https tlsConfig: insecureSkipVerify: true - - port: https-self - scheme: https + - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token interval: 30s - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + port: https-self + scheme: https tlsConfig: insecureSkipVerify: true + jobLabel: k8s-app + namespaceSelector: + matchNames: + - monitoring + selector: + matchLabels: + k8s-app: kube-state-metrics diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-kubelet.yaml similarity index 71% rename from manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-kubelet.yaml index 16c9752df755bb69d7c04ee8f14cccd8548c7322..06ec7fc84b3efda7befa08d1d6256136bbac4ef4 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-kubelet.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-kubelet.yaml @@ -1,29 +1,30 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: kubelet labels: k8s-app: kubelet + name: kubelet + namespace: monitoring spec: - jobLabel: k8s-app endpoints: - - port: https-metrics - scheme: https + - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token interval: 30s + port: https-metrics + scheme: https tlsConfig: insecureSkipVerify: true - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token - - port: https-metrics - scheme: https - path: /metrics/cadvisor - interval: 30s + - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token honorLabels: true + interval: 30s + path: /metrics/cadvisor + port: https-metrics + scheme: https tlsConfig: insecureSkipVerify: true - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token - selector: - matchLabels: - k8s-app: kubelet + jobLabel: k8s-app namespaceSelector: matchNames: - kube-system + selector: + matchLabels: + k8s-app: kubelet diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-node-exporter.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-node-exporter.yaml similarity index 78% rename from manifests/prometheus/prometheus-k8s-service-monitor-node-exporter.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-node-exporter.yaml index 0dd72e759a7234996a16b2e724ca703baea6b442..529f2944f61f18a616a1a99121214e660ae3effd 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-node-exporter.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-node-exporter.yaml @@ -1,21 +1,22 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: node-exporter labels: k8s-app: node-exporter + name: node-exporter + namespace: monitoring spec: - jobLabel: k8s-app - selector: - matchLabels: - k8s-app: node-exporter - namespaceSelector: - matchNames: - - monitoring endpoints: - - port: https - scheme: https + - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token interval: 30s - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + port: https + scheme: https tlsConfig: insecureSkipVerify: true + jobLabel: k8s-app + namespaceSelector: + matchNames: + - monitoring + selector: + matchLabels: + k8s-app: node-exporter diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-prometheus-operator.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-prometheus-operator.yaml similarity index 90% rename from manifests/prometheus/prometheus-k8s-service-monitor-prometheus-operator.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-prometheus-operator.yaml index 0b8028e778b4fa8aef77abaed10860c8e4b09ba7..10e0059aa4c6f3785dc0beda7e2b36fbf8b91a44 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-prometheus-operator.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-prometheus-operator.yaml @@ -1,9 +1,10 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: prometheus-operator labels: k8s-app: prometheus-operator + name: prometheus-operator + namespace: monitoring spec: endpoints: - port: http diff --git a/manifests/prometheus/prometheus-k8s-service-monitor-prometheus.yaml b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-prometheus.yaml similarity index 81% rename from manifests/prometheus/prometheus-k8s-service-monitor-prometheus.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service-monitor-prometheus.yaml index c3d11e57b38fb77a1d26b19a0c7dd0a8d8b044ec..90b25476e057ae87e33a0708b8b3f01865bcabe2 100644 --- a/manifests/prometheus/prometheus-k8s-service-monitor-prometheus.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service-monitor-prometheus.yaml @@ -1,16 +1,17 @@ apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: - name: prometheus labels: k8s-app: prometheus + name: prometheus + namespace: monitoring spec: - selector: - matchLabels: - prometheus: k8s + endpoints: + - interval: 30s + port: web namespaceSelector: matchNames: - monitoring - endpoints: - - port: web - interval: 30s + selector: + matchLabels: + prometheus: k8s diff --git a/manifests/prometheus/prometheus-k8s-service.yaml b/manifests/prometheus-k8s/prometheus-k8s-service.yaml similarity index 77% rename from manifests/prometheus/prometheus-k8s-service.yaml rename to manifests/prometheus-k8s/prometheus-k8s-service.yaml index 5cd3b65b7b50a0e21da2416a65d771151f375536..85b007f8b787324932918166acdc470aaef2b21c 100644 --- a/manifests/prometheus/prometheus-k8s-service.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s-service.yaml @@ -4,13 +4,12 @@ metadata: labels: prometheus: k8s name: prometheus-k8s + namespace: monitoring spec: - type: NodePort ports: - name: web - nodePort: 30900 port: 9090 - protocol: TCP targetPort: web selector: + app: prometheus prometheus: k8s diff --git a/manifests/prometheus/prometheus-k8s.yaml b/manifests/prometheus-k8s/prometheus-k8s.yaml similarity index 54% rename from manifests/prometheus/prometheus-k8s.yaml rename to manifests/prometheus-k8s/prometheus-k8s.yaml index 8f243eb08eb6612ef5e47d2bbc19313378189994..324d96c71f3d74c6ea37bd8a0dfbf9eb5ead1388 100644 --- a/manifests/prometheus/prometheus-k8s.yaml +++ b/manifests/prometheus-k8s/prometheus-k8s.yaml @@ -1,29 +1,27 @@ apiVersion: monitoring.coreos.com/v1 kind: Prometheus metadata: - name: k8s labels: prometheus: k8s + name: k8s + namespace: monitoring spec: + alerting: + alertmanagers: + - name: alertmanager-main + namespace: monitoring + port: web replicas: 2 - version: v2.2.1 - serviceAccountName: prometheus-k8s - serviceMonitorSelector: - matchExpressions: - - {key: k8s-app, operator: Exists} - ruleSelector: - matchLabels: - role: alert-rules - prometheus: k8s resources: requests: - # 2Gi is default, but won't schedule if you don't have a node with >2Gi - # memory. Modify based on your target and time-series count for - # production use. This value is mainly meant for demonstration/testing - # purposes. memory: 400Mi - alerting: - alertmanagers: - - namespace: monitoring - name: alertmanager-main - port: web + ruleSelector: + matchLabels: + prometheus: k8s + role: alert-rules + serviceAccountName: prometheus-k8s + serviceMonitorSelector: + matchExpressions: + - key: k8s-app + operator: Exists + version: v2.2.1 diff --git a/manifests/prometheus-operator/prometheus-operator-cluster-role-binding.yaml b/manifests/prometheus-operator/prometheus-operator-cluster-role-binding.yaml index e7e03a29bcb9a50a9f4e80853d25dc9782568f39..8f1b4d36ca2d94b20ab75794e37849845c0b71d4 100644 --- a/manifests/prometheus-operator/prometheus-operator-cluster-role-binding.yaml +++ b/manifests/prometheus-operator/prometheus-operator-cluster-role-binding.yaml @@ -1,4 +1,4 @@ -apiVersion: rbac.authorization.k8s.io/v1beta1 +apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: prometheus-operator diff --git a/manifests/prometheus-operator/prometheus-operator-cluster-role.yaml b/manifests/prometheus-operator/prometheus-operator-cluster-role.yaml index 1b13c89951c4a0c88ac94d88a97840f9adb558a5..9ecd917e9d2258457a22a80ee032f3afc6d04abf 100644 --- a/manifests/prometheus-operator/prometheus-operator-cluster-role.yaml +++ b/manifests/prometheus-operator/prometheus-operator-cluster-role.yaml @@ -1,4 +1,4 @@ -apiVersion: rbac.authorization.k8s.io/v1beta1 +apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: prometheus-operator @@ -8,13 +8,13 @@ rules: resources: - thirdpartyresources verbs: - - "*" + - '*' - apiGroups: - apiextensions.k8s.io resources: - customresourcedefinitions verbs: - - "*" + - '*' - apiGroups: - monitoring.coreos.com resources: @@ -24,31 +24,45 @@ rules: - alertmanagers/finalizers - servicemonitors verbs: - - "*" + - '*' - apiGroups: - apps resources: - statefulsets - verbs: ["*"] -- apiGroups: [""] + verbs: + - '*' +- apiGroups: + - "" resources: - configmaps - secrets - verbs: ["*"] -- apiGroups: [""] + verbs: + - '*' +- apiGroups: + - "" resources: - pods - verbs: ["list", "delete"] -- apiGroups: [""] + verbs: + - list + - delete +- apiGroups: + - "" resources: - services - - endpoints - verbs: ["get", "create", "update"] -- apiGroups: [""] + verbs: + - get + - create + - update +- apiGroups: + - "" resources: - nodes - verbs: ["list", "watch"] -- apiGroups: [""] + verbs: + - list + - watch +- apiGroups: + - "" resources: - namespaces - verbs: ["list"] + verbs: + - list diff --git a/manifests/prometheus-operator/prometheus-operator.yaml b/manifests/prometheus-operator/prometheus-operator-deployment.yaml similarity index 97% rename from manifests/prometheus-operator/prometheus-operator.yaml rename to manifests/prometheus-operator/prometheus-operator-deployment.yaml index d00301111466c972e8f97e5175ade17ded9e5ac9..1c70ef0a51a2b25756191298de971d18ea77ec16 100644 --- a/manifests/prometheus-operator/prometheus-operator.yaml +++ b/manifests/prometheus-operator/prometheus-operator-deployment.yaml @@ -4,6 +4,7 @@ metadata: labels: k8s-app: prometheus-operator name: prometheus-operator + namespace: monitoring spec: replicas: 1 selector: diff --git a/manifests/prometheus-operator/prometheus-operator-service-account.yaml b/manifests/prometheus-operator/prometheus-operator-service-account.yaml index 38d18cce71105613f954591c54f1eaa4ddd15b06..11898a6fbef1c3e2e47f4f554967e5d5651c808c 100644 --- a/manifests/prometheus-operator/prometheus-operator-service-account.yaml +++ b/manifests/prometheus-operator/prometheus-operator-service-account.yaml @@ -2,3 +2,4 @@ apiVersion: v1 kind: ServiceAccount metadata: name: prometheus-operator + namespace: monitoring diff --git a/manifests/prometheus-operator/prometheus-operator-service.yaml b/manifests/prometheus-operator/prometheus-operator-service.yaml index 8882d4a793ed26bc1e4fb3c85134ca16061a25f4..8a82538740f2cb4f3563918b4e27669e24ea89d0 100644 --- a/manifests/prometheus-operator/prometheus-operator-service.yaml +++ b/manifests/prometheus-operator/prometheus-operator-service.yaml @@ -2,14 +2,11 @@ apiVersion: v1 kind: Service metadata: name: prometheus-operator - labels: - k8s-app: prometheus-operator + namespace: monitoring spec: - type: ClusterIP ports: - name: http port: 8080 targetPort: http - protocol: TCP selector: k8s-app: prometheus-operator diff --git a/manifests/prometheus/prometheus-k8s-role-bindings.yaml b/manifests/prometheus/prometheus-k8s-role-bindings.yaml deleted file mode 100644 index 5f190e7ab1a21e09673c6a9bef9e683b22e98304..0000000000000000000000000000000000000000 --- a/manifests/prometheus/prometheus-k8s-role-bindings.yaml +++ /dev/null @@ -1,54 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1beta1 -kind: RoleBinding -metadata: - name: prometheus-k8s - namespace: monitoring -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: Role - name: prometheus-k8s -subjects: -- kind: ServiceAccount - name: prometheus-k8s - namespace: monitoring ---- -apiVersion: rbac.authorization.k8s.io/v1beta1 -kind: RoleBinding -metadata: - name: prometheus-k8s - namespace: kube-system -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: Role - name: prometheus-k8s -subjects: -- kind: ServiceAccount - name: prometheus-k8s - namespace: monitoring ---- -apiVersion: rbac.authorization.k8s.io/v1beta1 -kind: RoleBinding -metadata: - name: prometheus-k8s - namespace: default -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: Role - name: prometheus-k8s -subjects: -- kind: ServiceAccount - name: prometheus-k8s - namespace: monitoring ---- -apiVersion: rbac.authorization.k8s.io/v1beta1 -kind: ClusterRoleBinding -metadata: - name: prometheus-k8s -roleRef: - apiGroup: rbac.authorization.k8s.io - kind: ClusterRole - name: prometheus-k8s -subjects: -- kind: ServiceAccount - name: prometheus-k8s - namespace: monitoring diff --git a/manifests/prometheus/prometheus-k8s-roles.yaml b/manifests/prometheus/prometheus-k8s-roles.yaml deleted file mode 100644 index 4f738e776e7b5de3a5d6fe5a543392bf36ba0213..0000000000000000000000000000000000000000 --- a/manifests/prometheus/prometheus-k8s-roles.yaml +++ /dev/null @@ -1,55 +0,0 @@ -apiVersion: rbac.authorization.k8s.io/v1beta1 -kind: Role -metadata: - name: prometheus-k8s - namespace: monitoring -rules: -- apiGroups: [""] - resources: - - nodes - - services - - endpoints - - pods - verbs: ["get", "list", "watch"] -- apiGroups: [""] - resources: - - configmaps - verbs: ["get"] ---- -apiVersion: rbac.authorization.k8s.io/v1beta1 -kind: Role -metadata: - name: prometheus-k8s - namespace: kube-system -rules: -- apiGroups: [""] - resources: - - services - - endpoints - - pods - verbs: ["get", "list", "watch"] ---- -apiVersion: rbac.authorization.k8s.io/v1beta1 -kind: Role -metadata: - name: prometheus-k8s - namespace: default -rules: -- apiGroups: [""] - resources: - - services - - endpoints - - pods - verbs: ["get", "list", "watch"] ---- -apiVersion: rbac.authorization.k8s.io/v1beta1 -kind: ClusterRole -metadata: - name: prometheus-k8s -rules: -- apiGroups: [""] - resources: - - nodes/metrics - verbs: ["get"] -- nonResourceURLs: ["/metrics"] - verbs: ["get"] diff --git a/manifests/prometheus/prometheus-k8s-service-coredns-metrics.yaml b/manifests/prometheus/prometheus-k8s-service-coredns-metrics.yaml deleted file mode 100644 index cd90a55e614a0c39fa5a59a96f91adc93ef121c6..0000000000000000000000000000000000000000 --- a/manifests/prometheus/prometheus-k8s-service-coredns-metrics.yaml +++ /dev/null @@ -1,18 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - name: coredns-prometheus-discovery - namespace: kube-system - labels: - k8s-app: coredns - component: metrics -spec: - ports: - - name: http-metrics - port: 9153 - protocol: TCP - targetPort: 9153 - selector: - k8s-app: coredns - type: ClusterIP - clusterIP: None diff --git a/requirements.txt b/requirements.txt deleted file mode 100644 index fdc2b200388fc3a7d1fc9f67c8e846aebff90d8b..0000000000000000000000000000000000000000 --- a/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -git+https://github.com/aknuds1/grafanalib.git@v0.4.0 \ No newline at end of file