Zero Trust in Kubernetes

We are already attacked. What are you gonna do about it?

Introduction

The typical enterprise has changed in many ways. This evolution brought many benefits, but it also made legacy technologies incompatible with the modern technological landscape. Gone are the days when we could clearly define a perimeter within which everything is "trustworthy". With the rise of cloud computing and remote work, it has become futile to even try to define one. This shift has resulted in a new security paradigm: zero trust.

What is zero trust?

The "zero" in zero trust is an exaggeration. For software to work, something needs to trust something else. Zero trust is about reducing trust to the bare minimum, not eliminating it entirely!

Zero trust is primarily focused on data and service protection (a service being any application, API, or other resource that is provided to users), but it should be expanded to include all enterprise assets (devices) and users, human or non-human.

Zero trust starts from the assumption that an attacker is already present in the environment and that an enterprise-owned environment is no more trustworthy than a non-enterprise-owned one.

“[N]o actor, system, network, or service operating outside or within the 
security perimeter is trusted. Instead, we must verify anything and everything 
attempting to establish access. It is a dramatic paradigm shift in [the] 
philosophy of how we secure our infrastructure, networks, and data, from verify
once at the perimeter to continual verification of each user, device, 
application, and transaction.”
~ Department of Defense's Reference Architecture.

In this new approach, an enterprise must not assume any implicit trust. Risks are continually analyzed and evaluated, and protections are put in place to mitigate them.

The protective measures are:

  1. Least privilege: Access to resources is minimized strictly to those subjects which need those resources.

  2. Continuous analysis: Every access request is identified, and its security posture is vetted each time rather than once.

What zero trust is:

  • A set of guiding principles for workflow, system design and operations that can be used to improve the security posture of any classification or sensitivity level.

  • Implementing zero trust doesn't mean you no longer need firewalls or other defenses. It means that you verify not just once at the perimeter, but every time and everywhere.

What Zero trust is not:

  • It's not a wholesale replacement of technology – rather, Zero Trust is a series of implementation steps.

Why is Zero trust important for Kubernetes?

  • Kubernetes clusters are complex. They contain many moving parts, such as nodes, pods, and services, often running many microservices. This complexity makes it difficult to secure every component. Zero trust simplifies the problem by assuming that no entity is trustworthy by default.

  • Kubernetes is also used as a platform for running multiple applications or services, sometimes for different teams. In such cases, resources are shared. Zero trust enables segmentation and isolation. This means applications cannot access each other's resources. In this way, access to resources is restricted even within a shared cluster.

  • Kubernetes is widely used for microservices. In such a use case, secure communication between microservices is crucial for the proper functioning of the app. Zero trust addresses this by enforcing stringent access controls on service-to-service traffic.

  • One of the core principles of zero trust is least privilege. Granting each user and workload only the access it needs reduces the attack surface.

  • Applying zero trust means monitoring continuously, that is, regularly monitoring and auditing activity such as pod-to-pod communication. This helps in detecting anomalies.

A trust zone is an area of the network in which assets or resources are trusted. In zero trust, trust zones are shrunk to be as small as possible. Typically, an entity enters a trust zone only after it has been vetted through authentication and authorization.

Zero trust principles

The basic principles that guide a zero trust implementation are:

  • The entire enterprise private network is not an implicit trust zone. This means that all devices and users on the network, including those inside the enterprise perimeter, must be authenticated and authorized before they can access any resources.

  • Devices on the network may not be owned or configurable by the enterprise. This includes personal devices (Bring Your Own Device, BYOD), contractor devices, and guest devices. Networks built on a zero trust architecture (ZTA) must be able to accommodate these devices while maintaining security.

  • No resource is inherently trusted. This includes devices, users, applications, and data. ZTA networks must continuously evaluate the security posture of all resources before granting access.

  • Not all enterprise resources are on enterprise-owned infrastructure. Some enterprise resources may be located in the cloud or on other non-enterprise networks. ZTA networks must be able to provide secure access to these resources.

  • Remote subjects and assets cannot fully trust their local network connection. This means that devices and users on non-enterprise networks, such as remote workers and cloud-based applications, must assume that the local network is potentially hostile. ZTA networks must provide secure communication for remote subjects and assets.

  • Consistent security policy and posture for assets and workflows. Assets and workflows that move between enterprise and non-enterprise infrastructure must maintain a consistent security posture. This ensures that security is maintained even when devices and workloads transition between different network environments.

Implementing zero trust in Kubernetes

The following are some of the many ways we can implement zero-trust principles in Kubernetes:

Identity and access management (IAM)

There are two main ways to implement identity and access management (IAM) in Kubernetes:

  1. Use Kubernetes role-based access control (RBAC)

  2. Use an external IAM provider

Role-Based Access Control (RBAC)

Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within your organization.

RBAC authorization uses the rbac.authorization.k8s.io API group to drive authorization decisions, allowing you to dynamically configure policies through the Kubernetes API.

To enable RBAC, start the API server with the --authorization-mode flag set to a comma-separated list that includes RBAC; for example:

kube-apiserver --authorization-mode=Node,RBAC --other-options --more-options

API Objects

The RBAC API declares four kinds of Kubernetes objects: Role, ClusterRole, RoleBinding, and ClusterRoleBinding. We can describe or amend these objects using tools such as kubectl.

Role

A Role always sets permissions within a particular namespace; when you create a Role, you have to specify the namespace it belongs in.
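As a minimal sketch, a Role that grants read-only access to pods could look like the following. The namespace default and the name pod-reader are illustrative choices, not something this post depends on.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default    # illustrative namespace; a Role always lives in exactly one
  name: pod-reader      # illustrative name
rules:
- apiGroups: [""]       # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]   # read-only verbs, in the spirit of least privilege
```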

ClusterRole

ClusterRole, by contrast, is a non-namespaced resource. The resources have different names (Role and ClusterRole) because a Kubernetes object always has to be either namespaced or not namespaced; it can't be both.

If you want to define a role within a namespace, use a Role; if you want to define a role cluster-wide, use a ClusterRole.
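For comparison, here is a sketch of a ClusterRole that allows reading secrets. The name secret-reader matches the role referenced by the ClusterRoleBinding example in this post; the rules themselves are an assumption about what such a role would contain.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # No namespace field here: ClusterRoles are not namespaced
  name: secret-reader
rules:
- apiGroups: [""]        # "" indicates the core API group
  resources: ["secrets"]
  verbs: ["get", "watch", "list"]
```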

RoleBinding

A role binding grants the permissions defined in a role to a user or set of users. It holds a list of subjects (users, groups, or service accounts), and a reference to the role being granted. A RoleBinding grants permissions within a specific namespace.

A RoleBinding may reference any Role in the same namespace. Alternatively, a RoleBinding can reference a ClusterRole and bind that ClusterRole to the namespace of the RoleBinding.
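A RoleBinding along these lines would grant a single user the permissions of a Role. This is only a sketch: the user jane, the namespace default, and the Role name pod-reader are all hypothetical, and the Role is assumed to already exist in that namespace.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default     # permissions apply only inside this namespace
subjects:
- kind: User
  name: jane             # hypothetical user; name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role             # could also be ClusterRole, scoped down to this namespace
  name: pod-reader       # hypothetical Role assumed to exist in the same namespace
  apiGroup: rbac.authorization.k8s.io
```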

ClusterRoleBinding

A ClusterRoleBinding grants the permissions defined in a role to a user or set of users across the cluster.

The following ClusterRoleBinding allows any user in the group "manager" to read secrets in any namespace.

apiVersion: rbac.authorization.k8s.io/v1
# This cluster role binding allows anyone in the "manager" group to read secrets in any namespace.
kind: ClusterRoleBinding
metadata:
  name: read-secrets-global
subjects:
- kind: Group
  name: manager # Name is case sensitive
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io

In this YAML definition, kind specifies the kind of object being defined in the file, that is, ClusterRoleBinding. metadata holds information about the binding, here name: read-secrets-global. The subjects field lists who the role is bound to: any user in the Group manager. roleRef identifies the role being bound, which is secret-reader, of kind ClusterRole.

After you create a binding, you cannot change the Role or ClusterRole it refers to. To point it at a different role, you have to delete the binding object and create a new one.

Use an external IAM provider

An external IAM provider is a third-party service that can be used to manage users and their permissions for Kubernetes. External IAM providers can offer a number of features that are not available in Kubernetes RBAC, such as support for multiple authentication methods.

There are several different external IAM providers available, such as Okta, Azure Active Directory, and AWS IAM.

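How much you rely on each depends on your specific needs. The two are not mutually exclusive: a common setup lets the external provider authenticate users while RBAC authorizes what those users can do inside the cluster.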

Microsegmentation

Microsegmentation is a security technique that logically divides a network into small segments and isolates each segment from the others. In this way, security breaches are contained and attackers are prevented from moving laterally within the network.

Microsegmentation can be implemented in Kubernetes using a variety of tools:

  1. One common way is to use network policies.

  2. Another way is to use a service mesh.

Network Policies

Kubernetes NetworkPolicies are a way to control traffic flow between pods and other network entities at the IP address or port level (OSI layer 3 or 4).

  • NetworkPolicies are defined based on the needs of an application. For example, you might have a NetworkPolicy that allows a pod to communicate with a specific database service, but blocks all other traffic.

  • NetworkPolicies can be used to control both ingress and egress traffic. Ingress traffic is traffic that is coming into a pod, while egress traffic is traffic that is leaving a pod.

  • NetworkPolicies are defined using a selector. The selector is used to identify the pods that the NetworkPolicy applies to. For example, you might have a NetworkPolicy that applies to all pods with the label app=web.

  • NetworkPolicies can also be used to control traffic to and from IP blocks. An IP block, also known as an IP range, is a contiguous range of IP addresses, written in CIDR notation (for example, 10.0.0.0/16). This can be useful for controlling traffic to and from external services, such as databases or load balancers.

  • To use NetworkPolicies, you must be using a networking solution which supports NetworkPolicy. Some popular networking solutions that support NetworkPolicy include:

    • Calico

    • Cilium

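    • Canal (Flannel networking paired with Calico policy enforcement; Flannel has traditionally not enforced NetworkPolicy on its own)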

    • Weave Net

By default, if no policies exist in a namespace, then all ingress and egress traffic is allowed to and from pods in that namespace.
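Because the default is allow-all, a zero trust setup usually starts by denying everything and then opening only the paths an application needs. The two policies below are a minimal sketch of that idea. The namespace shop, the labels app=web and app=db, and port 5432 are assumptions made up for illustration.

```yaml
# Policy 1: deny all ingress and egress for every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: shop            # hypothetical namespace
spec:
  podSelector: {}            # empty selector matches all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
---
# Policy 2: allow only pods labeled app=web to reach the database pods on TCP 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-db
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: db                # hypothetical label on the database pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web           # hypothetical label on the web pods
    ports:
    - protocol: TCP
      port: 5432
```

Since the first policy also denies egress, the web pods would additionally need an egress rule toward the database (and usually toward DNS) before this works end to end.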

Use a Service Mesh

A service mesh is a dedicated infrastructure layer that provides application-level networking, security, and observability for microservices. It sits between the microservices and the underlying infrastructure, and it handles all of the communication between them.

Service meshes are typically implemented using a sidecar proxy pattern.

A sidecar proxy is a lightweight proxy that runs alongside each microservice. It intercepts all of the microservice's network traffic and routes it through the service mesh.

  • A service mesh can enforce its own traffic policies, for example which service identities may call which services. In that way we achieve microsegmentation at the application layer.

Some of the most popular service meshes include:

  • Istio

  • Linkerd

  • Consul Connect
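To make the microsegmentation idea concrete, here is a sketch of how it might look with Istio, one of the meshes listed above. It assumes Istio is installed with sidecars injected and mutual TLS enabled (service account identities are only available then). The namespace shop, the labels, and the service account web are hypothetical.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-web-to-api
  namespace: shop                 # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: api                    # hypothetical label on the protected workload
  action: ALLOW
  rules:
  - from:
    - source:
        # Identity of the caller's service account, taken from its mTLS certificate
        principals: ["cluster.local/ns/shop/sa/web"]
    to:
    - operation:
        methods: ["GET"]          # only read requests are allowed
```

Once an ALLOW policy selects a workload, requests to that workload that match none of its rules are denied, so this single object restricts the api pods to GET calls from the web service account.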

Pod Security Standards

  • The Pod Security Standards define three different policies to broadly cover the security spectrum: Privileged, Baseline, and Restricted.

  • These policies are cumulative and range from highly permissive to highly restrictive.

Privileged

  • This policy is purposely open and entirely unrestricted. It is aimed at system- and infrastructure-level workloads managed by privileged, trusted users.

Baseline

The Baseline policy is a set of security controls designed to prevent known privilege escalations in Kubernetes pods. (Privilege escalation is a security exploit in which an attacker gains a higher level of privileges or permissions than they were initially granted, allowing them to perform unauthorized actions within a system or network.) It is aimed at ease of adoption for common containerized workloads, while still providing a basic level of security.

The policy restricts the following:

  • HostProcess containers: These are Windows containers that run with privileged access to the host operating system. This can be a security risk, so the policy disallows them.

  • Host namespaces: These are the host's network, process ID (PID), and IPC namespaces, which a pod can request through hostNetwork, hostPID, and hostIPC. The policy disallows sharing of these namespaces.

  • Privileged containers: These are containers that have all capabilities enabled. This can also be a security risk, so the policy disallows them.

  • Capabilities: These are fine-grained privileges that Linux splits root's powers into, such as binding to low-numbered ports (NET_BIND_SERVICE) or changing file ownership (CHOWN). The policy disallows adding capabilities beyond a limited set.

  • HostPath volumes: These are volumes that are mounted from the host's filesystem. This can be a security risk, so the policy disallows them.

  • HostPorts: These are ports that are exposed to the host's network. The policy either disallows hostPorts entirely or restricts them to a known list.

  • AppArmor: AppArmor is a Linux security module that can be used to restrict the capabilities of a container. The policy either disallows overriding or disabling the default AppArmor profile, or restricts overrides to an allowed set of profiles.

  • SELinux: SELinux is a Linux security module that can be used to label containers and control their access to resources. The policy restricts the ability to set the SELinux type, user, or role.

  • /proc Mount Type: The /proc filesystem provides information about the host system. The policy requires the default /proc masks to be set, which reduces the attack surface.

  • Seccomp profile: Seccomp is a Linux security feature that can be used to restrict the syscalls that a container can make. The policy disallows setting the seccomp profile to Unconfined.

  • Sysctls: Sysctls are kernel parameters that can be used to configure the Linux kernel. The policy disallows setting sysctls except for an allowed "safe" subset.

The Baseline policy is a good starting point for securing Kubernetes pods. However, it is important to note that it is not a complete security solution. You may need to implement additional security measures, such as network segmentation and intrusion detection, to protect your Kubernetes cluster.
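This post does not cover how the standards are enforced. One built-in option is the Pod Security Admission controller, which reads labels on a namespace. The sketch below assumes that controller is enabled in your cluster; the namespace name team-a is hypothetical.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                                     # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline   # reject pods that violate Baseline
    pod-security.kubernetes.io/warn: restricted    # warn on pods that would violate Restricted
    pod-security.kubernetes.io/audit: restricted   # record those violations in the audit log
```

Enforcing Baseline while only warning on Restricted is one way to tighten gradually: workloads keep running, and the warnings show what would break under the stricter level.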

Restricted

  • The Restricted policy for Kubernetes pods is a set of security controls that are designed to enforce current Pod hardening best practices at the expense of some compatibility. It is targeted at operators and developers of security-critical applications, as well as lower-trust users.

  • The Restricted policy includes all of the controls from the Baseline policy, plus the following:

    Volume Types: The Restricted policy only permits a fixed set of volume types. Besides csi and emptyDir volumes, these are:

    • ConfigMap: A ConfigMap is a Kubernetes object that is used to store configuration data in key-value pairs. ConfigMaps are often used to store environment variables, configuration files, and other types of configuration data. ConfigMaps are a good choice for storing data that needs to be accessed by multiple containers or pods.

    • DownwardAPI: The DownwardAPI is a Kubernetes feature that allows you to inject information about the current pod into the environment of a container. This information includes things like the pod name, namespace, and labels. The DownwardAPI is a good choice for injecting information that needs to be available to all containers in a pod.

    • Ephemeral volume: An ephemeral volume is a temporary volume that is created and deleted with the pod. Ephemeral volumes are a good choice for storing data that only needs to be available for the lifetime of a pod.

    • PersistentVolumeClaim: A PersistentVolumeClaim (PVC) is a Kubernetes object that is used to request a PersistentVolume (PV). A PV is a Kubernetes object that represents a piece of physical storage, such as a disk or a directory on a network filesystem. PVCs are a good choice for storing data that needs to be persisted across pod restarts.

    • Projected volume: A projected volume is a type of volume that allows you to project information from different sources into a single volume. This information can include things from ConfigMaps, Secrets, and DownwardAPI. Projected volumes are a good choice for storing data that needs to be combined from multiple sources.

    • Secret: A Secret is a Kubernetes object that is used to store sensitive data, such as passwords, keys, and certificates. By default, Secrets are stored unencrypted in the cluster's data store (etcd), so encryption at rest has to be configured explicitly. Combined with tight RBAC rules, Secrets are a good choice for storing data that needs to be kept secure.

    Seccomp: Seccomp is a Linux security feature that allows you to restrict the syscalls that a process can make. Kubernetes supports seccomp profiles, which define sets of allowed syscalls. The Restricted policy requires the seccomp profile to be explicitly set to one of the allowed values (RuntimeDefault or Localhost). This shrinks the kernel attack surface available to a compromised container.
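To give a feel for what a pod under the Restricted policy looks like, here is a minimal sketch. Note that Restricted also requires settings not described in this post (running as non-root, disallowing privilege escalation, dropping all capabilities), which are included so the example is complete. The pod name, image, user ID, and ConfigMap are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo                   # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                    # refuse to start containers as root
    runAsUser: 10001                      # placeholder non-root UID
    seccompProfile:
      type: RuntimeDefault                # explicitly set, as Restricted requires
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: config
      mountPath: /etc/app
      readOnly: true
  volumes:
  - name: config
    configMap:                            # one of the permitted volume types
      name: app-config                    # hypothetical ConfigMap
```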

Network encryption

Network encryption in Kubernetes is the process of encrypting traffic between pods, nodes, and other Kubernetes resources. This helps to protect your data from unauthorized access and tampering.

There are two main ways to implement network encryption in Kubernetes:

  • Node-to-node encryption: This encrypts traffic between nodes in the cluster. It is typically implemented using a VPN or other tunneling protocol.

  • Pod-to-pod encryption: This encrypts traffic between pods, whether they run on the same node or on different nodes. It is typically implemented using a service mesh with mutual TLS (mTLS).

Node-to-node encryption is important for protecting your data from attackers who can observe or tamper with traffic on the network between nodes. Pod-to-pod encryption goes further: because each workload proves its identity, it also limits what an attacker who has compromised a single pod can read or impersonate.

Kubernetes components talk to the API server over TLS, but traffic between pods is not encrypted by default. Encrypting it protects data from man-in-the-middle attacks.
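As one possible illustration of pod-to-pod encryption, the sketch below assumes Istio is the service mesh and that istio-system is its root namespace (the default). Applied there, the policy requires mutual TLS for all workloads in the mesh.

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # root namespace, so the policy applies mesh-wide
spec:
  mtls:
    mode: STRICT            # reject plaintext traffic between workloads
```

Other meshes have their own way of expressing this; Linkerd, for instance, applies mTLS between meshed pods without a separate policy object.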

End-to-end security monitoring

End-to-end security monitoring is the process of collecting and analyzing data from all aspects of your Kubernetes environment to identify and respond to security threats.

There are many tools and services to achieve end-to-end monitoring. Some of them are:

  • Prometheus

  • Grafana (visualization tool)

  • Sysdig, etc
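As a small example of what such monitoring can look like, here is a sketch of a Prometheus alerting rule that fires when the API server rejects an unusual number of requests as forbidden. It assumes Prometheus already scrapes the API server's metrics. The group name, alert name, and threshold are made up and would need tuning for a real cluster.

```yaml
groups:
- name: zero-trust-examples              # hypothetical rule group
  rules:
  - alert: HighRateOfForbiddenApiRequests
    # Requests per second answered with HTTP 403 over the last 5 minutes
    expr: sum(rate(apiserver_request_total{code="403"}[5m])) > 1
    for: 10m                             # must stay above the threshold for 10 minutes
    labels:
      severity: warning
    annotations:
      summary: "Unusually many API requests are being denied; someone may be probing for access."
```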

Zero trust best practices for Kubernetes

The following are some best practices for implementing zero trust principles in Kubernetes:

  • Traffic should be authenticated and authorized regardless of whether it is destined for the API server or for pods. Zero trust principles should be applied to both the control plane and the data plane.

  • Use a service mesh

  • Use Ingress controllers. An Ingress controller can help implement zero trust principles by only allowing authenticated and authorized traffic to reach your pods. A number of ingress controllers are available.

  • Implement a least privilege authorization model.

  • Use strong encryption for all traffic.

  • Implement continuous authentication and authorization.

  • Monitor your environment for suspicious activity.
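As a starting point for the ingress and encryption items above, here is a sketch of an Ingress that terminates TLS. It assumes an ingress controller registered under the class name nginx is installed. The host, the Secret shop-tls, and the Service web are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: shop                # hypothetical namespace
spec:
  ingressClassName: nginx        # assumes a controller registered under this class
  tls:
  - hosts:
    - shop.example.com           # placeholder host
    secretName: shop-tls         # hypothetical Secret of type kubernetes.io/tls
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web            # hypothetical Service
            port:
              number: 80
```

Authenticating requests at the ingress is controller-specific (usually annotations or an external auth service), so it is left out of this sketch.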

Conclusion

Thank you so much for reading this REALLY LONG blog. A lot of effort went into writing it, and I would really appreciate some feedback. Until next time, keep learning, keep sharing!