Introduction #
EKS is a great solution for those who cannot afford a dedicated team looking after the k8s layer alone (compared to KOPS deployments). But just as most software comes not only with default settings you should adjust but also with limitations, the same applies to EKS (and most likely to any managed kubernetes service, as only the control plane responsibility is handed over to the provider and YOU are still responsible for your workers and workflows).
One of the biggest limitations in my opinion is that you cannot modify the enabled admission controllers yourself. Both AlwaysPullImages and DenyEscalatingExec are disabled and there is no way to enable them to improve the security of your cluster. Although the latter has been deprecated and finally removed in v1.18 (the recommended approach is to use a mix of RBAC and PSP or a custom admission plugin). The importance of AlwaysPullImages is greatly described in this blog post. And this is just a single example of a limitation which may impact your EKS clusters.
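If the admission layer can’t enforce it for you, a partial workaround (a sketch only, since it relies on every manifest author remembering it rather than the cluster enforcing it) is to set imagePullPolicy: Always explicitly in your own workloads, which is what the AlwaysPullImages admission controller would do cluster-wide:
pod-pull-policy.yaml #
apiVersion: v1
kind: Pod
metadata:
  name: pull-policy-example   # hypothetical pod
spec:
  containers:
  - name: app
    image: 111122223333.dkr.ecr.eu-west-1.amazonaws.com/app:1.0.0   # placeholder image
    imagePullPolicy: Always   # forces a credentialed pull instead of reusing a cached image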
NOTE: I’ve performed these actions on EKS 1.17.
Etcd encryption #
When you’re creating your EKS cluster, make sure to check the “use envelope encryption for etcd” checkbox.
More details here.
NOTE: You will have to recreate cluster if you have not initially enabled this feature.
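If you create clusters with eksctl instead of the console, the same can be set in the cluster config; a sketch, assuming a reasonably recent eksctl version, with the KMS key ARN, name and region as placeholders:
cluster.yaml #
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster      # placeholder
  region: eu-west-1     # placeholder
secretsEncryption:
  # KMS key used for envelope encryption of Kubernetes secrets in etcd
  keyARN: arn:aws:kms:eu-west-1:111122223333:key/REPLACE-ME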
CoreDNS #
By default the coredns deployment comes with a soft podAntiAffinity rule and without a PodDisruptionBudget - this is wrong.
While there is only a small chance of colocating both replicas on a single worker, it can still happen because the rule is soft. And it eventually will. It happened to me a few times, it will happen to you too, it’s only a matter of time.
First of all, who runs a kubernetes cluster with a single worker these days?
Assume that situation has happened and you’re in a position where one worker has both coredns replicas scheduled on it (+ no PDB), and that worker goes down ungracefully or you start to drain it. DNS outage - you’re right.
Therefore I propose to replace the default coredns deployment with the attached manifest. There are two differences compared to the default:
- replaced the soft PodAntiAffinity constraint with a hard one
- updated coredns to 1.6.9 as there are a few updates to plugins
coredns.yaml #
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
    kubernetes.io/name: CoreDNS
  name: coredns
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      eks.amazonaws.com/component: coredns
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        eks.amazonaws.com/compute-type: ec2
      creationTimestamp: null
      labels:
        eks.amazonaws.com/component: coredns
        k8s-app: kube-dns
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/os
                operator: In
                values:
                - linux
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values:
                - kube-dns
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: coredns/coredns:1.6.9
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
        - mountPath: /tmp
          name: tmp
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      - key: CriticalAddonsOnly
        operator: Exists
      volumes:
      - emptyDir: {}
        name: tmp
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: coredns
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
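Once applied, verify that the two replicas really landed on different workers:
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide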
CNI #
EKS comes with aws-vpc-cni out of the box. Be familiar with its limitations.
I strongly advise considering a replacement with either Calico or Cilium. This blog post may make you prefer Cilium.
IRSA #
Enable IRSA from day 0. This is a must, otherwise all your pods will have the node’s privileges.
Other options you’d want to consider instead are kiam (necessarily paired with cert-manager - see this blog post and the kiam docs) or kube2iam.
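For illustration, once the cluster’s OIDC provider and an IAM role with the proper trust policy exist, an IRSA-enabled service account looks roughly like this (role ARN and names are placeholders):
irsa-serviceaccount.yaml #
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app                      # hypothetical service account
  namespace: default
  annotations:
    # IAM role assumed by pods using this service account
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-role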
PodSecurityPolicies #
The first (and most important) thing to note is that EKS comes by default with a single eks.privileged PodSecurityPolicy applied. You can read about it in the documentation. In short, it allows ALL authenticated users to deploy privileged containers onto your cluster. Why bother?
ANY authenticated user can:
- mount ANY directory from your hosts into their containers
- run ANY containers in your host network
- run ANY containers as root
- run containers with ALL capabilities
- …
Trust me, this is not what you want to have on any kind of cluster (minikube does not count).
Start by removing the default (you can always restore it whenever needed, see docs):
kubectl delete PodSecurityPolicy eks.privileged
kubectl delete ClusterRole eks:podsecuritypolicy:privileged
kubectl delete ClusterRoleBinding eks:podsecuritypolicy:authenticated
The next step is to add a slightly less permissive policy so that aws-node, kube-proxy and coredns can be scheduled. My example was generated with kube-psp-advisor. I assume you may adjust this policy later, or add a new one and update the role, if you want to run other things in the kube-system namespace.
psp-for-kube-system.yaml #
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
  name: psp-for-kube-system
spec:
  allowPrivilegeEscalation: true
  allowedHostPaths:
  - pathPrefix: /etc/cni/net.d
  - pathPrefix: /lib/modules
  - pathPrefix: /opt/cni/bin
  - pathPrefix: /run/xtables.lock
  - pathPrefix: /var/log
  - pathPrefix: /var/run/docker.sock
  - pathPrefix: /var/run/dockershim.sock
  defaultAddCapabilities:
  - NET_BIND_SERVICE
  fsGroup:
    rule: RunAsAny
  hostNetwork: true
  hostPorts:
  - max: 61678
    min: 0
  privileged: true
  readOnlyRootFilesystem: false
  requiredDropCapabilities:
  - all
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
  - emptyDir
  - configMap
  - hostPath
  - secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: psp-for-kube-system
  namespace: kube-system
rules:
- apiGroups:
  - policy
  resourceNames:
  - psp-for-kube-system
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: psp-for-kube-system
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: psp-for-kube-system
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
  namespace: kube-system
What has happened there is that you’ve:
- created a new PSP
- created a new role within the kube-system namespace which allows using the new PSP
- created a new rolebinding within the kube-system namespace to bind the role to system:serviceaccounts, so kubernetes controllers can use this new policy while scheduling new pods
Because PSP checks are performed at scheduling time, you need to roll out all pods in the kube-system namespace. The easiest way to achieve this is simply kubectl -n kube-system delete pod --all --wait=false. In a few seconds, you should see all pods being recreated. You can verify the psp assignment with kubectl get pod -n kube-system -o yaml | grep psp - as output you should get a few lines of kubernetes.io/psp: psp-for-kube-system.
Now it is time to apply a default, more restricted policy. Start with this one (adjust to your needs, e.g. readOnlyRootFilesystem):
psp-restricted.yaml #
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
  name: 0-default-restricted-psp
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  readOnlyRootFilesystem: true
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: default-restricted-psp
rules:
- apiGroups:
  - policy
  resourceNames:
  - 0-default-restricted-psp
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-restricted-psp
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: default-restricted-psp
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
  namespace: kube-system
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
  namespace: kube-system
If you’re authenticating to EKS as the same user who created the cluster, you are effectively God / root / cluster-admin. You should not use that user on a daily basis.
With the manifests above you’ve:
- created a new default restricted PSP
- created a new clusterrole which allows using the new PSP
- created a new clusterrolebinding to bind the clusterrole to the system:authenticated group, so ANY authenticated user can use this new policy while scheduling new pods, which also allows kubernetes controllers to schedule pods using this policy
Now, create new (cluster)role(s) and (cluster)rolebinding(s), map different IAM users/roles with the aws-auth configmap (an example snippet follows) and enjoy a safer cluster.
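As a sketch of that last step (ARNs, usernames and group names are placeholders), mapping a separate, non-admin IAM role to a kubernetes group via aws-auth looks roughly like this; the node role entry is whatever EKS/eksctl already put there and must stay intact:
aws-auth.yaml #
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # existing node role entry created for your workers - keep it as is
    - rolearn: arn:aws:iam::111122223333:role/eks-node-instance-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    # additional role for day-to-day, non-admin access
    - rolearn: arn:aws:iam::111122223333:role/eks-developers
      username: developer
      groups:
        - developers   # bind this group to your new (cluster)roles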
Gatekeeper #
While PSPs are still in beta, a new approach has emerged and it is called OPA Gatekeeper. The big difference here is that its webhook verifies your objects before they actually land in etcd, instead of only at pod creation time.
With Gatekeeper you can do things like ensuring your deployments contain a securityContext or whitelisting certain container repositories. The downside: you need to learn Rego.
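For a taste of it, here is a rough sketch of a ConstraintTemplate plus Constraint that whitelists image repositories, loosely based on the gatekeeper-library k8sallowedrepos example (names and the repo list are placeholders):
allowed-repos.yaml #
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          satisfied := [good | repo = input.parameters.repos[_]; good = startswith(container.image, repo)]
          not any(satisfied)
          msg := sprintf("container <%v> uses image <%v> which is not from an allowed repo", [container.name, container.image])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repos
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "111122223333.dkr.ecr.eu-west-1.amazonaws.com/"   # placeholder registry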
Highly Recommended #
Use managed worker pools #
You already opted in for a managed service. Opt in for managed workers as well, it won’t hurt. You don’t want to manage and update AMIs by yourself, trust me. Amazon Linux 2 is good enough for a start.
Although I’d recommend you trial Flatcar Linux or Bottlerocket. Sadly CoreOS is dead, but these two are continuing its legacy. Go check them out, really.
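If you use eksctl, a managed node group is only a few lines of cluster config (a sketch; names, types and sizes are placeholders):
managed-nodegroup.yaml #
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster      # placeholder
  region: eu-west-1     # placeholder
managedNodeGroups:
  - name: workers
    instanceType: m5.large
    desiredCapacity: 3
    minSize: 3
    maxSize: 6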
Summary #
EKS is a great service to start with, no doubt. But make sure you’re familiar with the shared responsibility model and understand that using a managed service won’t save you from failures. As the k8s failure stories show, 99% of outages were caused not by control plane issues (for which you effectively delegate responsibility), but by the part of the service you’re responsible for.
As EKS comes with limitations, if security is your main concern I’d still encourage you to invest time and effort into a self-managed solution like KOPS.
These are my recommendations for EKS so far, stay tuned for more (I’ll update this page eventually).