Polish default EKS installation

Introduction

EKS is a great solution for those who cannot afford a dedicated team looking after the k8s layer alone (compared to KOPS deployments). But just as most software/utilities come not only with a set of default settings you should adjust, but also with limitations, the same applies to EKS (and most likely to any managed kubernetes service, since only the control plane responsibility is handed over to the provider and YOU are still responsible for your workers and workflows).

One of the biggest limitations, in my opinion, is that you cannot modify the enabled admission controllers yourself. Both AlwaysPullImages and DenyEscalatingExec are disabled and there is no way to enable them to improve the security of your cluster. Although the latter has been deprecated and finally removed in v1.18 (the recommended approach is to use a mix of RBAC and PSP, or a custom admission plugin). The importance of AlwaysPullImages is described well in this blog post. And this is just a single example of a limitation which may impact your EKS clusters.

NOTE: I’ve performed these actions on EKS 1.17.

Etcd encryption

When you’re creating your EKS cluster, make sure to check the “use envelope encryption for etcd” checkbox.

More details here.

NOTE: You will have to recreate the cluster if you did not enable this feature initially.
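
If you create clusters with eksctl rather than the console, the same feature can be enabled in the cluster config. A minimal sketch, assuming eksctl and an existing KMS key (cluster name, region and key ARN are placeholders):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster    # placeholder
  region: eu-west-1   # placeholder
secretsEncryption:
  # ARN of an existing symmetric KMS key used for envelope encryption of secrets
  keyARN: arn:aws:kms:eu-west-1:111122223333:key/REPLACE_ME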

CoreDNS

By default the coredns deployment comes with a soft podAntiAffinity rule and without a PodDisruptionBudget - this is wrong.

While there is only a small chance of colocating both replicas on a single worker, it can still happen due to the soft rule. And it eventually will. It has happened to me a few times and it will happen to you too, it’s just a matter of time.

First of all, who runs a kubernetes cluster with a single worker these days?

Assuming the above has happened and you’re in a position where a worker has both coredns replicas scheduled on it (and no PDB), and that worker goes down ungracefully or you start draining it: DNS outage - you guessed it.

Therefore I propose replacing the default coredns deployment with the manifest below. There are two differences if you compare it to the default:

  • replaced the soft PodAntiAffinity constraint with a hard one
  • updated coredns to 1.6.9 as there are a few updates to its plugins

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
    kubernetes.io/name: CoreDNS
  name: coredns
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      eks.amazonaws.com/component: coredns
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        eks.amazonaws.com/compute-type: ec2
      creationTimestamp: null
      labels:
        eks.amazonaws.com/component: coredns
        k8s-app: kube-dns
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - kube-dns
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: coredns/coredns:1.6.9
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
        - mountPath: /tmp
          name: tmp
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - emptyDir: {}
        name: tmp
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: coredns
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
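
After applying the manifest, you can quickly confirm that the replicas landed on different workers and that the PDB exists:

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system get pdb coredns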

CNI

EKS comes with aws-vpc-cni out of the box. Be familiar with its limitations.
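
The best known one is pod density: with aws-vpc-cni the number of pods per node is capped by the instance type’s ENI and IP address limits, roughly:

max pods = ENIs × (IPv4 addresses per ENI − 1) + 2

For example an m5.large (3 ENIs, 10 IPv4 addresses each) tops out at 3 × 9 + 2 = 29 pods, no matter how much CPU and memory the node still has free.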

I strongly advise considering a replacement, either Calico or Cilium. This blog post may make you prefer Cilium.

IRSA

Enable IRSA from day 0. This is a must; otherwise all your pods will have node privileges.
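
With IRSA enabled (an OIDC provider associated with the cluster), giving a pod its own IAM role boils down to annotating its service account. A sketch, where the role ARN is a placeholder for a role you have created with a trust policy for the cluster’s OIDC provider:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app          # placeholder
  namespace: default
  annotations:
    # placeholder ARN; pods using this service account get credentials for this role
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-role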

Other options you may want to consider instead are kiam (necessarily in pair with cert-manager - see this blog post and the kiam docs) or kube2iam.

PodSecurityPolicies

The first (and most important) thing to note is that EKS comes by default with a single eks.privileged PodSecurityPolicy applied. You can read about it in the documentation. In short, it allows ALL authenticated users to deploy privileged containers onto your cluster. Why does it matter?

ANY authenticated user can:

  • mount ANY directory from your hosts into their containers
  • run ANY containers in your host network
  • run ANY containers as root
  • run containers with ALL capabilities

Trust me, this is not what you want to have on any kind of cluster (minikube does not count).
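
To make it concrete, under eks.privileged nothing stops any authenticated user from scheduling something like this sketch (a root, privileged, host-network pod with the whole node filesystem mounted):

apiVersion: v1
kind: Pod
metadata:
  name: not-so-innocent        # hypothetical name
spec:
  hostNetwork: true            # joins the node's network namespace
  containers:
  - name: shell
    image: alpine:3.12
    command: ["sleep", "3600"]
    securityContext:
      privileged: true         # full capabilities, no isolation
      runAsUser: 0             # root
    volumeMounts:
    - name: host-root
      mountPath: /host         # entire node filesystem available under /host
  volumes:
  - name: host-root
    hostPath:
      path: /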

Start by removing the defaults (you can always restore them whenever needed, see the docs):

kubectl delete PodSecurityPolicy eks.privileged
kubectl delete ClusterRole eks:podsecuritypolicy:privileged
kubectl delete ClusterRoleBinding eks:podsecuritypolicy:authenticated

The next step is to add a slightly less permissive policy so that aws-node, kube-proxy and coredns can still be scheduled. My example was generated with kube-psp-advisor. You may need to adjust this policy later, or add a new one and update the role, if you want to run other things in the kube-system namespace.

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  name: psp-for-kube-system
spec:
  allowPrivilegeEscalation: true
  allowedHostPaths:
  - pathPrefix: /etc/cni/net.d
  - pathPrefix: /lib/modules
  - pathPrefix: /opt/cni/bin
  - pathPrefix: /run/xtables.lock
  - pathPrefix: /var/log
  - pathPrefix: /var/run/docker.sock
  - pathPrefix: /var/run/dockershim.sock
  defaultAddCapabilities:
  - NET_BIND_SERVICE
  fsGroup:
    rule: RunAsAny
  hostNetwork: true
  hostPorts:
  - max: 61678
    min: 0
  privileged: true
  readOnlyRootFilesystem: false
  requiredDropCapabilities:
  - all
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - emptyDir
  - configMap
  - hostPath
  - secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: psp-for-kube-system
  namespace: kube-system
rules:
- apiGroups:
  - policy
  resourceNames:
  - psp-for-kube-system
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp-for-kube-system
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: psp-for-kube-system
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
  namespace: kube-system

What has happened here is that you’ve:

  • created a new PSP
  • created a new role within the kube-system namespace which allows using the new PSP
  • created a new rolebinding within the kube-system namespace to bind the role to the system:serviceaccounts group, so kubernetes controllers can use this new policy when scheduling new pods

Because PSP checks are performed at scheduling time, you need to roll out all pods in the kube-system namespace. The easiest way to achieve this is simply kubectl -n kube-system delete pod --all --wait=false. Within a few seconds you should see all pods being recreated. You can verify the PSP assignment with kubectl get pod -n kube-system -o yaml | grep psp - the output should contain a few lines of kubernetes.io/psp: psp-for-kube-system.

Now it is time to apply a default, more restrictive policy. Start with this one (adjust it to your needs, readOnlyRootFilesystem in particular):

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
  name: 0-default-restricted-psp
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  readOnlyRootFilesystem: true
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: default-restricted-psp
rules:
- apiGroups:
  - policy
  resourceNames:
  - 0-default-restricted-psp
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-restricted-psp
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: default-restricted-psp
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
  namespace: kube-system
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
  namespace: kube-system

Similar to the kube-system case, you’ve:

  • created a new default restricted PSP
  • created a new clusterrole which allows using the new PSP
  • created a new clusterrolebinding to bind the clusterrole to the system:authenticated group, so ANY authenticated user can use this new policy when scheduling new pods, and kubernetes controllers can schedule pods using it as well

If you’re authenticating to EKS as the same user who created the cluster, you are effectively God / root / cluster-admin. You should not use that user on a daily basis.

Now, create new (cluster)role(s) and (cluster)rolebinding(s), map different IAM users/roles with the aws-auth configmap and enjoy a safer cluster.
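
A sketch of an additional mapRoles entry in the aws-auth configmap (the role ARN and group name below are placeholders); the developers group then becomes the subject of your own (cluster)rolebindings:

# appended under data.mapRoles in the aws-auth configmap (kube-system namespace)
- rolearn: arn:aws:iam::111122223333:role/eks-developers   # placeholder
  username: developer:{{SessionName}}
  groups:
    - developers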

Gatekeeper

While PSPs are still in beta, a new approach has arisen, called OPA Gatekeeper. The big difference here is that the admission webhook verifies your objects before they actually land in etcd, instead of only at pod creation time.

With Gatekeeper you can do things like ensure your deployments contain a securityContext or whitelist certain container repositories. The downside is that you need to learn Rego.
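
As an illustration, a sketch of a constraint that whitelists image repositories, assuming Gatekeeper and the K8sAllowedRepos constraint template from the gatekeeper-library are already installed (the registry prefix is a placeholder):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repos
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    repos:
    - "111122223333.dkr.ecr.eu-west-1.amazonaws.com/"   # placeholder: your registry prefix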

Highly Recommended

Use managed worker pools

You already opted in for a managed service. Opt in for managed workers as well; it won’t hurt. You don’t want to manage and update AMIs yourself, trust me. Amazon Linux 2 is good enough for a start.
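
A minimal sketch of a managed node group in eksctl terms (cluster name, region and sizes are placeholders):

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster    # placeholder
  region: eu-west-1   # placeholder
managedNodeGroups:
- name: workers
  instanceType: m5.large
  minSize: 2
  maxSize: 5
  desiredCapacity: 3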

That said, I’d recommend you trial Flatcar Linux or Bottlerocket. Sadly CoreOS is dead, but these two are continuing its legacy. Go check them out, really.

Summary

EKS is a great service to start with, no doubt. But make sure you’re familiar with the shared responsibility model and understand that using a managed service won’t protect you from failures. As stated in the k8s failure stories, 99% of outages were caused not by control plane issues (which you effectively delegate responsibility for), but by the parts of the service you are responsible for.

Since EKS comes with these limitations, if security is your main concern I’d still encourage you to invest time and effort into a self-managed solution like KOPS.

These are my recommendations for EKS so far; stay tuned for more (I’ll update this page eventually).