To secure GitOps-based deployments and reduce the risk of compromise, the GitOps deployment in the Shivering-Isles Infrastructure only accepts signed commits. This prevents a workload from being deployed if an attacker manages to push a commit to the GitOps repository. The git forge itself is in charge of preventing rollbacks in the commit history. Rollbacks could also be prevented by referencing git tags instead of git branches, but tags are less practical.
Further, all secrets stored in the GitOps repository are encrypted using [SOPS](https://getsops.io/), along with non-sensitive but irrelevant information, such as DNS names.
@@ -8,21 +8,26 @@ In the Shivering-Isles Infrastructure various apps have an own set of SLOs to va
Besides maintaining reasonable SLOs, other SRE practices are implemented, such as post-mortems and especially the practice of reducing toil. All components of the infrastructure have a maintenance budget; if it is depleted, it's time to fix the component or get rid of it.
Service Level Objectives
---
All public-facing apps and infrastructure components should have a Service Level Objective (SLO). The most basic SLOs for web apps are the availability and latency measured through the ingress controller. [An example of an SLO definition is the Shivering-Isles blog.](https://git.shivering-isles.com/shivering-isles/infrastructure-gitops/-/blob/797843c960f82a1974e2c3b632f0d45e5de9d6fe/apps/k8s01/blog/slo.yaml)
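To make the two basic indicators concrete, here is a minimal sketch of how availability and latency could be derived from ingress request data. The `Request` type, the sample data, the choice to count only 5xx responses as failures, and the 100 ms threshold are all assumptions for illustration; they are not taken from the linked `slo.yaml`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one request seen by the ingress controller."""
    status: int         # HTTP status code returned to the client
    duration_ms: float  # time taken to serve the request


def availability(requests: list[Request]) -> float:
    """Share of requests that did not end in a server error (5xx)."""
    if not requests:
        return 1.0  # no traffic means nothing failed
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float) -> float:
    """Share of requests served within threshold_ms."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.duration_ms <= threshold_ms)
    return fast / len(requests)


# Illustrative sample: one server error and one slow request out of four.
sample = [
    Request(200, 35.0),
    Request(200, 120.0),
    Request(503, 15.0),
    Request(404, 20.0),
]

print(f"availability: {availability(sample):.2%}")
print(f"latency (<= 100 ms): {latency_sli(sample, 100.0):.2%}")
```

Client errors such as 404 are counted as successful here because they usually reflect a bad request rather than a failing service; a real SLO may draw that line differently.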
Apps that provide more insight via metrics can have app-specific SLOs to optimise for user-impacting situations that aren't covered by basic web metrics. [An example is the sidekiq SLO for Mastodon.](https://git.shivering-isles.com/shivering-isles/infrastructure-gitops/-/blob/797843c960f82a1974e2c3b632f0d45e5de9d6fe/apps/k8s01/mastodon/slo.yaml#L9-21)
The actual objectives in the Shivering-Isles infrastructure are often relatively low, at around 95 percent.
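To put a 95 percent objective into perspective, the following sketch converts an objective into an error budget. The 30-day window and the request counts are assumptions chosen for illustration; the source does not state which window the actual SLOs use.

```python
from datetime import timedelta


def error_budget(objective: float, window: timedelta) -> timedelta:
    """Time a service may be unavailable within the window without missing the objective."""
    if not 0.0 < objective <= 1.0:
        raise ValueError("objective must be in (0, 1]")
    return window * (1.0 - objective)


def budget_remaining(objective: float, total: int, failed: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed = total * (1.0 - objective)
    if allowed == 0:
        return 0.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed


# Assumed 30-day window: a 95 % objective tolerates about 36 hours of downtime.
print(error_budget(0.95, timedelta(days=30)))

# Assumed traffic: 1000 requests with 25 failures leave about half the budget.
print(round(budget_remaining(0.95, total=1000, failed=25), 2))
```

For comparison, a 99.9 percent objective over the same assumed window would leave only about 43 minutes, which shows how much more room for maintenance and failure a 95 percent objective gives.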
Learning about SRE
---
A good start is this short video series by Google:
Further, there is the [Google SRE book](https://sre.google/sre-book/introduction/) as a recommended read.
Further, there are some good talks from SREcon: