How do I use Topology Aware Hints in Amazon EKS?

I want to use Topology Aware Hints (TAH) in my Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

Resolution

Note: TAH might not be suitable for clusters that use Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances, Horizontal Pod Autoscaling, or auto scaling. With these configurations, you might not achieve an endpoint allocation that's proportional to the CPU cores allocated to the nodes in each Availability Zone. When that happens, the allowed overhead threshold is exceeded and hints aren't used. Also, if pod assignment constraints prohibit endpoint redistribution, then kube-proxy doesn't use TAH.

Prerequisites

Make sure that your Amazon EKS cluster version is 1.24 or later.
Set up an Amazon EKS cluster and a managed node group with three nodes. All nodes must have the same CPU capacity, and the nodes must be distributed across three Availability Zones.
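
The prerequisites above can be checked from the command line. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The function name check_prereqs is hypothetical and used only for illustration.

```shell
# check_prereqs (hypothetical helper): prints the information needed to
# verify the prerequisites by hand.
check_prereqs() {
  # Client and server versions; the server version should be 1.24 or later.
  kubectl version

  # Nodes with an extra column that shows the Availability Zone label.
  kubectl get nodes -L topology.kubernetes.io/zone

  # Allocatable CPU for each node; the values should match across nodes.
  kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu
}
```

Run check_prereqs and confirm that the three nodes report three different zones and the same CPU value.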

To use TAH in Amazon EKS, complete the following steps:

Create a new namespace:

Note: Replace example-namespace with your namespace name.

apiVersion: v1
kind: Namespace
metadata:
  name: "example-namespace"
  labels:
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted

Use the BusyBox image to create a sample deployment:

Note: Replace example-deployment-name with your deployment name and example-namespace with your namespace name.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment-name
  namespace: example-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      dnsPolicy: Default
      enableServiceLinks: false
      automountServiceAccountToken: false
      securityContext:
        seccompProfile:
          type: RuntimeDefault
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
      containers:
        - name: busybox
          image: public.ecr.aws/docker/library/busybox:latest
          command: ["/bin/sh"]
          args:
            - "-c"
            - |
              echo "<html><body><h1>PodName: $MY_POD_NAME  NodeName: $MY_NODE_NAME podIP:$MY_POD_IP</h1></body></html>" > /tmp/index.html;
              while true; do
                printf 'HTTP/1.1 200 OK\n\n%s\n' "$(cat /tmp/index.html)" | nc -l -p 8080
              done
          ports:
            - containerPort: 8080
          env:
          - name: MY_NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
            requests:
              memory: "64Mi"
              cpu: "250m"
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          volumeMounts:
          - name: tmp
            mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
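
The deployment manifest doesn't define topology spread constraints, so the scheduler isn't required to place one replica in each Availability Zone. A quick check along the following lines might help before you continue. This is a sketch, and the function name show_demo_pods is hypothetical.

```shell
# show_demo_pods NAMESPACE DEPLOYMENT (hypothetical helper)
show_demo_pods() {
  # Wait until the rollout of the deployment completes.
  kubectl -n "$1" rollout status "deployment/$2"

  # List the pods with the node that each one runs on. The app=demo label
  # comes from the deployment manifest.
  kubectl -n "$1" get pods -l app=demo -o wide
}
```

For example, run show_demo_pods example-namespace example-deployment-name and compare the NODE column with the zone of each node.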

Expose the deployment as a ClusterIP service, and add the service.kubernetes.io/topology-mode: auto annotation:

Note: Replace example-service-name with your service name and example-namespace with your namespace name. In Kubernetes version 1.27 and later, the annotation is service.kubernetes.io/topology-mode: auto. In earlier versions, use service.kubernetes.io/topology-aware-hints: auto instead.

apiVersion: v1
kind: Service
metadata:
  name: example-service-name
  namespace: example-namespace
  annotations:
    service.kubernetes.io/topology-mode: auto
spec:
  selector:
    app: demo
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
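
Because the annotation key depends on the Kubernetes version, a small helper can select it from the minor version number. This is a sketch only. The function name topology_annotation is hypothetical, and you pass the minor version yourself (for example, 26 for version 1.26).

```shell
# topology_annotation MINOR (hypothetical helper): prints the annotation key
# for a cluster that runs Kubernetes 1.MINOR.
topology_annotation() {
  if [ "$1" -ge 27 ]; then
    echo "service.kubernetes.io/topology-mode"
  else
    echo "service.kubernetes.io/topology-aware-hints"
  fi
}

# Example use on an existing service (not run here):
#   kubectl -n example-namespace annotate service example-service-name \
#     "$(topology_annotation 26)=auto"
```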

Check whether the hints are populated in the EndpointSlices of the service:

Note: Replace example-namespace with your namespace name and example-service-name with your service name.

kubectl get 'endpointslices.discovery.k8s.io' -l kubernetes.io/service-name=example-service-name -n example-namespace -o yaml

Example output:

endpoints:
- addresses:
  - 10.0.21.125
  conditions:
    ready: true
    serving: true
    terminating: false
  hints:
    forZones:
    - name: eu-west-1b
  nodeName: ip-10-0-17-215.eu-west-1.compute.internal
  targetRef:
    kind: Pod
    name: example-deployment-name-5875bbbb7c-m2j8t
    namespace: example-namespace
    uid: 4e789648-965e-4caa-91db-bd27d240ea59
  zone: eu-west-1b
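
The full YAML output can be long when the service has many endpoints. As an alternative view, the following sketch prints one line for each endpoint with its address, its zone, and the hinted zones. It uses the kubectl JSONPath output format. The function name list_hints is hypothetical.

```shell
# list_hints SERVICE NAMESPACE (hypothetical helper)
# Output columns: endpoint address, endpoint zone, zones listed in hints.forZones.
list_hints() {
  kubectl get endpointslices.discovery.k8s.io \
    -l "kubernetes.io/service-name=$1" -n "$2" \
    -o jsonpath='{range .items[*].endpoints[*]}{.addresses[0]}{"\t"}{.zone}{"\t"}{.hints.forZones[*].name}{"\n"}{end}'
}
```

For example, run list_hints example-service-name example-namespace. If the third column is empty, then no hints are assigned to that endpoint.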

Deploy a test pod on a specific node to check whether traffic is routed to a pod in the same Availability Zone:

Note: Replace example-node-name with your node name.

kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot --overrides='{"spec": { "nodeSelector": {"kubernetes.io/hostname":"example-node-name"}}}'

From the shell of the test pod, send a request to the service to find the pod and node that respond:

curl example-service-name.example-namespace:80

Example output:

PodName: 7b7b9bf455-c27z9  HTTP/1.1 200 OK
NodeName: ip-10-0-9-45.eu-west-1.compute.internal
HTTP/1.1 200 OK
podIP: example-10.0.11.140

Use PodName and NodeName from the preceding output to check whether traffic aligns with the same Availability Zone where your test pod is deployed.
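
To compare Availability Zones, you can read the zone label of each node. The following sketch assumes that the nodes carry the standard topology.kubernetes.io/zone label. The function name node_zone is hypothetical.

```shell
# node_zone NODE (hypothetical helper): prints the Availability Zone label of a node.
# The dots inside the label key are escaped so that JSONPath treats it as one key.
node_zone() {
  kubectl get node "$1" -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}'
}
```

Run node_zone once with the node name of your test pod and once with the NodeName value from the curl output. When TAH is in effect, both commands are expected to print the same zone.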

Scale the deployment to four replicas, and then inspect the EndpointSlices:

Note: Replace example-namespace with your namespace name and example-deployment-name with your deployment name.
```
kubectl -n example-namespace scale deployments example-deployment-name --replicas=4
```
Note: With four replicas across three Availability Zones, at least one zone holds 50% of the endpoints, although each zone provides only about one third of the CPU capacity. This imbalance exceeds the 20% overhead threshold, so hints are no longer populated in the EndpointSlices and kube-proxy doesn't use TAH.
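
The arithmetic behind this note can be sketched with a simplified model. The model assumes that every zone has the same CPU capacity, so each zone should receive an equal share of traffic. For no endpoint to carry more than 20% above its fair share, each zone needs at least ceil((replicas / zones) / 1.2) endpoints. If those minimums add up to more than the number of replicas, then a balanced allocation isn't possible. This is an illustration of the threshold only, not the EndpointSlice controller's actual code, and the function name hints_feasible is hypothetical.

```shell
# hints_feasible REPLICAS ZONES (hypothetical helper, simplified model)
# Assumes equal CPU capacity in every zone and a 20% overhead threshold.
hints_feasible() {
  replicas=$1
  zones=$2
  # ceil((replicas / zones) / 1.2), computed with integer arithmetic.
  min_per_zone=$(( (10 * replicas + 12 * zones - 1) / (12 * zones) ))
  if [ $(( min_per_zone * zones )) -le "$replicas" ]; then
    echo "feasible"
  else
    echo "not feasible"
  fi
}

hints_feasible 3 3   # prints: feasible (one endpoint for each zone)
hints_feasible 4 3   # prints: not feasible (each zone would need two endpoints)
```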

Related information

Topology Aware Routing on the Kubernetes website

Exploring the effect of Topology Aware Hints on network traffic in Amazon Elastic Kubernetes Service
