<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kubernetes Blog</title><link>https://kubernetes.io/</link><description>The Kubernetes blog is used by the project to communicate new features, community reports, and any news that might be relevant to the Kubernetes community.</description><generator>Hugo -- gohugo.io</generator><language>en</language><image><url>https://raw.githubusercontent.com/kubernetes/kubernetes/master/logo/logo.png</url><title>The Kubernetes project logo</title><link>https://kubernetes.io/</link></image><atom:link href="https://kubernetes.io/feed.xml" rel="self" type="application/rss+xml"/><item><title>Kubernetes v1.35: Restricting executables invoked by kubeconfigs via exec plugin allowList added to kuberc</title><link>https://kubernetes.io/blog/2026/01/09/kubernetes-v1-35-kuberc-credential-plugin-allowlist/</link><pubDate>Fri, 09 Jan 2026 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2026/01/09/kubernetes-v1-35-kuberc-credential-plugin-allowlist/</guid><description>
&lt;p>Did you know that &lt;code>kubectl&lt;/code> can run arbitrary executables, including shell
scripts, with the full privileges of the invoking user, and without your
knowledge? Whenever you download or auto-generate a &lt;code>kubeconfig&lt;/code>, the
&lt;code>users[n].exec.command&lt;/code> field can specify an executable to fetch credentials on
your behalf. Don't get me wrong, this is an incredible feature that allows you
to authenticate to the cluster with external identity providers. Nevertheless,
you probably see the problem: Do you know exactly what executables your &lt;code>kubeconfig&lt;/code>
is running on your system? Do you trust the pipeline that generated your &lt;code>kubeconfig&lt;/code>?
If there has been a supply-chain attack on the code that generates the kubeconfig,
or if the generating pipeline has been compromised, an attacker might well be
doing unsavory things to your machine by tricking your &lt;code>kubeconfig&lt;/code> into running
arbitrary code.&lt;/p>
&lt;p>To give the user more control over what gets run on their system, &lt;a href="https://git.k8s.io/community/sig-auth">SIG-Auth&lt;/a> and &lt;a href="https://git.k8s.io/community/sig-cli">SIG-CLI&lt;/a> added the credential plugin policy and allowlist as a beta feature to
Kubernetes v1.35. This is available to any client built on the &lt;code>client-go&lt;/code> library
by filling out the &lt;a href="https://github.com/kubernetes/client-go/blob/master/tools/clientcmd/api/types.go#L290">ExecProvider.PluginPolicy&lt;/a> struct on a REST config. To
broaden the impact of this change, Kubernetes v1.35 also lets you manage this without
writing a line of application code. You can configure &lt;code>kubectl&lt;/code> to enforce
the policy and allowlist by adding two fields to the &lt;code>kuberc&lt;/code> configuration
file: &lt;code>credentialPluginPolicy&lt;/code> and &lt;code>credentialPluginAllowlist&lt;/code>. Adding one or
both of these fields restricts which credential plugins &lt;code>kubectl&lt;/code> is allowed to execute.&lt;/p>
&lt;h2 id="how-it-works">How it works&lt;/h2>
&lt;p>A full description of this functionality is available in our &lt;a href="https://kubernetes.io/docs/reference/kubectl/kuberc/">official documentation&lt;/a> for kuberc,
but this blog post gives a brief overview of the new security knobs. The new
fields are in beta and available without enabling any feature gates.&lt;/p>
&lt;p>The simplest configuration is to leave the new fields unspecified:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubectl.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Preference&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This will keep &lt;code>kubectl&lt;/code> acting as it always has, and all plugins will be
allowed.&lt;/p>
&lt;p>The next example is functionally identical, but it is more explicit and
therefore preferred if it's actually what you want:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubectl.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Preference&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">credentialPluginPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>AllowAll&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you &lt;em>don't know&lt;/em> whether or not you're using exec credential plugins, try
setting your policy to &lt;code>DenyAll&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubectl.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Preference&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">credentialPluginPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>DenyAll&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you &lt;em>are&lt;/em> using credential plugins, you'll quickly find out what &lt;code>kubectl&lt;/code> is
trying to execute. You'll get an error like the following.&lt;/p>
&lt;blockquote>
&lt;p>Unable to connect to the server: getting credentials: plugin &amp;quot;cloudco-login&amp;quot; not allowed: policy set to &amp;quot;DenyAll&amp;quot;&lt;/p>
&lt;/blockquote>
&lt;p>If there is insufficient information for you to debug the issue, increase the
logging verbosity when you run your next command. For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># increase or decrease verbosity if the issue is still unclear&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>kubectl get pods --verbosity &lt;span style="color:#666">5&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="selectively-allowing-plugins">Selectively allowing plugins&lt;/h3>
&lt;p>What if you need the &lt;code>cloudco-login&lt;/code> plugin to do your daily work? That is why
there's a third option for your policy, &lt;code>Allowlist&lt;/code>. To allow a specific plugin,
set the policy and add the &lt;code>credentialPluginAllowlist&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubectl.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Preference&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">credentialPluginPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Allowlist&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">credentialPluginAllowlist&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>/usr/local/bin/cloudco-login&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>get-identity&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You'll notice that there are two entries in the allowlist. One is
specified by full path; the other, &lt;code>get-identity&lt;/code>, is just a basename. When
you specify only a basename, the full path is looked up using Go's
&lt;code>exec.LookPath&lt;/code>, which does not expand globs or handle wildcards;
globbing is not supported at this time. Both forms are acceptable, but a full
path is preferable because it narrows the scope of allowed binaries even
further.&lt;/p>
&lt;h3 id="future-enhancements">Future enhancements&lt;/h3>
&lt;p>Currently, an allowlist entry has only one field, &lt;code>name&lt;/code>. In the future, we
(Kubernetes SIG CLI) want to see other requirements added. One idea that seems
useful is checksum verification whereby, for example, a binary would only be allowed
to run if it has the sha256 sum
&lt;code>b9a3fad00d848ff31960c44ebb5f8b92032dc085020f857c98e32a5d5900ff9c&lt;/code> &lt;strong>and&lt;/strong>
exists at the path &lt;code>/usr/bin/cloudco-login&lt;/code>.&lt;/p>
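&lt;p>To make that concrete, a future allowlist entry might look something like the sketch below. This is purely hypothetical: the &lt;code>sha256&lt;/code> field does not exist in the v1beta1 API and its final name and shape are undecided.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">credentialPluginPolicy: Allowlist
credentialPluginAllowlist:
  - name: /usr/bin/cloudco-login
    # hypothetical field, not part of the current API
    sha256: b9a3fad00d848ff31960c44ebb5f8b92032dc085020f857c98e32a5d5900ff9c
&lt;/code>&lt;/pre>&lt;/div>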
&lt;p>Another possibility is only allowing binaries that have been signed by one of a
set of a trusted signing keys.&lt;/p>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>The credential plugin policy is still under development and we are very interested
in your feedback. We'd love to hear what you like about it and what problems
you'd like to see it solve. Or, if you have the cycles to contribute one of the
above enhancements, they'd be a great way to get started contributing to
Kubernetes. Feel free to join in the discussion on Slack:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://kubernetes.slack.com/archives/C2GL57FJ4">#sig-cli&lt;/a>,&lt;/li>
&lt;li>&lt;a href="https://kubernetes.slack.com/archives/C0EN96KUY">#sig-auth&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.35: Mutable PersistentVolume Node Affinity (alpha)</title><link>https://kubernetes.io/blog/2026/01/08/kubernetes-v1-35-mutable-pv-nodeaffinity/</link><pubDate>Thu, 08 Jan 2026 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2026/01/08/kubernetes-v1-35-mutable-pv-nodeaffinity/</guid><description>
&lt;p>The PersistentVolume &lt;a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/#node-affinity">node affinity&lt;/a> API
dates back to Kubernetes v1.10.
It is widely used to express that volumes may not be equally accessible by all nodes in the cluster.
This field was previously immutable;
as of Kubernetes v1.35 it is mutable (alpha). This change opens the door to more flexible online volume management.&lt;/p>
&lt;h2 id="why-make-node-affinity-mutable">Why make node affinity mutable?&lt;/h2>
&lt;p>This raises an obvious question: why make node affinity mutable now?
While stateless workloads like Deployments can be changed freely
and the changes will be rolled out automatically by re-creating every Pod,
PersistentVolumes (PVs) are stateful and cannot be re-created easily without losing data.&lt;/p>
&lt;p>However, storage providers evolve and storage requirements change.
Most notably, multiple providers are offering regional disks now.
Some of them even support live migration from zonal to regional disks, without disrupting the workloads.
This change can be expressed through the
&lt;a href="https://kubernetes.io/docs/concepts/storage/volume-attributes-classes/">VolumeAttributesClass&lt;/a> API,
which recently graduated to GA in Kubernetes v1.34.
However, even if the volume is migrated to regional storage,
Kubernetes still prevents scheduling Pods to other zones because of the node affinity recorded in the PV object.
In this case, you may want to change the PV node affinity from:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodeAffinity&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">required&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodeSelectorTerms&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">matchExpressions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>topology.kubernetes.io/zone&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>In&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">values&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- us-east1-b&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>to:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodeAffinity&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">required&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodeSelectorTerms&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">matchExpressions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>topology.kubernetes.io/region&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>In&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">values&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- us-east1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>As another example, providers sometimes offer new generations of disks.
New disks cannot always be attached to older nodes in the cluster.
This constraint can also be expressed through PV node affinity, which ensures Pods are scheduled onto nodes that can attach the disk.
But when the disk is upgraded, new Pods using it can still be scheduled to older nodes.
To prevent this, you may want to change the PV node affinity from:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodeAffinity&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">required&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodeSelectorTerms&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">matchExpressions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>provider.com/disktype.gen1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>In&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">values&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- available&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>to:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodeAffinity&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">required&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodeSelectorTerms&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">matchExpressions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>provider.com/disktype.gen2&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>In&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">values&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- available&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>So, PV node affinity is mutable now: a first step towards more flexible online volume management.
While it is a simple change that removes one validation from the API server,
there is still a long way to go to integrate it well with the rest of the Kubernetes ecosystem.&lt;/p>
&lt;h2 id="try-it-out">Try it out&lt;/h2>
&lt;p>This feature is for you if you are a Kubernetes cluster administrator,
and your storage provider offers online volume updates that you want to use,
but those updates can affect which nodes can access the volume.&lt;/p>
&lt;p>Note that changing PV node affinity alone will not actually change the accessibility of the underlying volume.
Before using this feature,
you must first update the underlying volume in the storage provider,
and understand which nodes can access the volume after the update.
You can then enable this feature and keep the PV node affinity in sync.&lt;/p>
&lt;p>Currently, this feature is in alpha state.
It is disabled by default and may be subject to change.
To try it out, enable the &lt;code>MutablePVNodeAffinity&lt;/code> feature gate on the kube-apiserver; you can then edit the PV &lt;code>spec.nodeAffinity&lt;/code> field.
Typically only administrators can edit PVs, so make sure you have the right RBAC permissions.&lt;/p>
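&lt;p>For example, on a cluster where the API server runs as a static Pod (the kubeadm manifest layout is assumed here), enabling the gate is a one-flag change:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml"># excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --feature-gates=MutablePVNodeAffinity=true
    # ... other flags unchanged ...
&lt;/code>&lt;/pre>&lt;/div>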
&lt;h3 id="race-condition-between-updating-and-scheduling">Race condition between updating and scheduling&lt;/h3>
&lt;p>There are only a few factors outside of a Pod that can affect the scheduling decision, and PV node affinity is one of them.
It is fine to allow more nodes to access the volume by relaxing node affinity,
but there is a race condition when you try to tighten node affinity:
there is no guarantee about when the scheduler observes the modified PV in its cache,
so there is a small window where it may place a Pod on a node that can no longer access the volume.
In this case, the Pod will be stuck in the &lt;code>ContainerCreating&lt;/code> state.&lt;/p>
&lt;p>One mitigation currently under discussion is for the kubelet to fail Pod startup if the PersistentVolume’s node affinity is violated.
This has not landed yet.
So if you are trying this out now, please watch subsequent Pods that use the updated PV,
and make sure they are scheduled onto nodes that can access the volume.
If you update a PV and immediately start new Pods in a script, they may not be placed as intended.&lt;/p>
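&lt;p>A minimal way to check placement by hand (the Pod name below is an assumption):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash"># confirm which node each new Pod landed on
kubectl get pods -o wide
# a Pod stuck in ContainerCreating on a stale node is the symptom to watch for
kubectl describe pod my-app-0 | grep -A 5 Events:
&lt;/code>&lt;/pre>&lt;/div>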
&lt;h2 id="future-integration-with-csi-container-storage-interface">Future integration with CSI (Container Storage Interface)&lt;/h2>
&lt;p>Currently, it is up to the cluster administrator to modify both the PV's node affinity and the underlying volume in the storage provider.
But manual operations are error-prone and time-consuming.
We would like to eventually integrate this with VolumeAttributesClass,
so that an unprivileged user can modify their PersistentVolumeClaim (PVC) to trigger storage-side updates,
with the PV node affinity updated automatically when appropriate, without a cluster administrator's intervention.&lt;/p>
&lt;h2 id="we-welcome-your-feedback-from-users-and-storage-driver-developers">We welcome your feedback from users and storage driver developers&lt;/h2>
&lt;p>As noted earlier, this is only a first step.&lt;/p>
&lt;p>If you are a Kubernetes user,
we would like to learn how you use (or will use) PV node affinity.
Is it beneficial to update it online in your case?&lt;/p>
&lt;p>If you are a CSI driver developer,
would you be willing to implement this feature? How would you like the API to look?&lt;/p>
&lt;p>Please provide your feedback via:&lt;/p>
&lt;ul>
&lt;li>Slack channel &lt;a href="https://kubernetes.slack.com/messages/sig-storage">#sig-storage&lt;/a>.&lt;/li>
&lt;li>Mailing list &lt;a href="https://groups.google.com/a/kubernetes.io/g/sig-storage">kubernetes-sig-storage&lt;/a>.&lt;/li>
&lt;li>The KEP issue &lt;a href="https://kep.k8s.io/5381">Mutable PersistentVolume Node Affinity&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>For any inquiries or specific questions related to this feature, please reach out to the &lt;a href="https://github.com/kubernetes/community/tree/master/sig-storage">SIG Storage community&lt;/a>.&lt;/p></description></item><item><title>Kubernetes v1.35: A Better Way to Pass Service Account Tokens to CSI Drivers</title><link>https://kubernetes.io/blog/2026/01/07/kubernetes-v1-35-csi-sa-tokens-secrets-field-beta/</link><pubDate>Wed, 07 Jan 2026 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2026/01/07/kubernetes-v1-35-csi-sa-tokens-secrets-field-beta/</guid><description>
&lt;p>If you maintain a CSI driver that uses service account tokens,
Kubernetes v1.35 brings a refinement you'll want to know about.
Since the introduction of the &lt;a href="https://kubernetes-csi.github.io/docs/token-requests.html">TokenRequests feature&lt;/a>,
service account tokens requested by CSI drivers have been passed to them through the &lt;code>volume_context&lt;/code> field.
While this has worked, it's not the ideal place for sensitive information,
and we've seen instances where tokens were accidentally logged in CSI drivers.&lt;/p>
&lt;p>Kubernetes v1.35 introduces a beta solution to address this:
&lt;em>CSI Driver Opt-in for Service Account Tokens via Secrets Field&lt;/em>.
This allows CSI drivers to receive service account tokens
through the &lt;code>secrets&lt;/code> field in &lt;code>NodePublishVolumeRequest&lt;/code>,
which is the appropriate place for sensitive data in the CSI specification.&lt;/p>
&lt;h2 id="understanding-the-existing-approach">Understanding the existing approach&lt;/h2>
&lt;p>When CSI drivers use the &lt;a href="https://kubernetes-csi.github.io/docs/token-requests.html">TokenRequests feature&lt;/a>,
they can request service account tokens for workload identity
by configuring the &lt;code>TokenRequests&lt;/code> field in the CSIDriver spec.
These tokens are passed to drivers as part of the volume attributes map,
using the key &lt;code>csi.storage.k8s.io/serviceAccount.tokens&lt;/code>.&lt;/p>
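&lt;p>For reference, the value stored under that key is a JSON-serialized map keyed by audience, roughly like the following (the audience, token, and timestamp here are placeholders):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">{
  "example.com": {
    "token": "...",
    "expirationTimestamp": "2026-01-07T12:00:00Z"
  }
}
&lt;/code>&lt;/pre>&lt;/div>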
&lt;p>The &lt;code>volume_context&lt;/code> field works, but it's not designed for sensitive data.
Because of this, there are a few challenges:&lt;/p>
&lt;p>First, the &lt;a href="https://github.com/kubernetes-csi/csi-lib-utils/tree/master/protosanitizer">&lt;code>protosanitizer&lt;/code>&lt;/a> tool that CSI drivers use doesn't treat volume context as sensitive,
so service account tokens can end up in logs when gRPC requests are logged.
This happened with &lt;a href="https://github.com/kubernetes-sigs/secrets-store-csi-driver/security/advisories/GHSA-g82w-58jf-gcxx">CVE-2023-2878&lt;/a> in the Secrets Store CSI Driver
and &lt;a href="https://github.com/kubernetes/kubernetes/issues/124759">CVE-2024-3744&lt;/a> in the Azure File CSI Driver.&lt;/p>
&lt;p>Second, each CSI driver that wants to avoid this issue needs to implement its own sanitization logic,
which leads to inconsistency across drivers.&lt;/p>
&lt;p>The CSI specification already has a &lt;code>secrets&lt;/code> field in &lt;code>NodePublishVolumeRequest&lt;/code>
that's designed exactly for this kind of sensitive information.
The challenge is that we can't just change where we put the tokens
without breaking existing CSI drivers that expect them in volume context.&lt;/p>
&lt;h2 id="how-the-opt-in-mechanism-works">How the opt-in mechanism works&lt;/h2>
&lt;p>Kubernetes v1.35 introduces an opt-in mechanism that lets CSI drivers choose
how they receive service account tokens.
This way, existing drivers continue working as they do today,
and drivers can move to the more appropriate secrets field when they're ready.&lt;/p>
&lt;p>CSI drivers can set a new field in their CSIDriver spec:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">#&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># CAUTION: this is an example configuration.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># Do not use this for your own cluster!&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic">#&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>storage.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>CSIDriver&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>example-csi-driver&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># ... existing fields ...&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tokenRequests&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">audience&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;example.com&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">expirationSeconds&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">3600&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># New field for opting into secrets delivery&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">serviceAccountTokenInSecrets&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">true&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># defaults to false&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The behavior depends on the &lt;code>serviceAccountTokenInSecrets&lt;/code> field:&lt;/p>
&lt;p>When set to &lt;code>false&lt;/code> (the default), tokens are placed in &lt;code>VolumeContext&lt;/code> with the key &lt;code>csi.storage.k8s.io/serviceAccount.tokens&lt;/code>, just like today.
When set to &lt;code>true&lt;/code>, tokens are placed only in the &lt;code>Secrets&lt;/code> field with the same key.&lt;/p>
&lt;h2 id="about-the-beta-release">About the beta release&lt;/h2>
&lt;p>The &lt;code>CSIServiceAccountTokenSecrets&lt;/code> feature gate is enabled by default
on both kubelet and kube-apiserver.
Since the &lt;code>serviceAccountTokenInSecrets&lt;/code> field defaults to &lt;code>false&lt;/code>,
enabling the feature gate doesn't change any existing behavior.
All drivers continue receiving tokens via volume context unless they explicitly opt in.
This is why we felt comfortable starting at beta rather than alpha.&lt;/p>
&lt;h2 id="guide-for-csi-driver-authors">Guide for CSI driver authors&lt;/h2>
&lt;p>If you maintain a CSI driver that uses service account tokens, here's how to adopt this feature.&lt;/p>
&lt;h3 id="adding-fallback-logic">Adding fallback logic&lt;/h3>
&lt;p>First, update your driver code to check both locations for tokens.
This makes your driver compatible with both the old and new approaches:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">const&lt;/span> serviceAccountTokenKey = &lt;span style="color:#b44">&amp;#34;csi.storage.k8s.io/serviceAccount.tokens&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> &lt;span style="color:#00a000">getServiceAccountTokens&lt;/span>(req &lt;span style="color:#666">*&lt;/span>csi.NodePublishVolumeRequest) (&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>, &lt;span style="color:#0b0;font-weight:bold">error&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Check secrets field first (new behavior when driver opts in)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> tokens, ok &lt;span style="color:#666">:=&lt;/span> req.Secrets[serviceAccountTokenKey]; ok {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> tokens, &lt;span style="color:#a2f;font-weight:bold">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Fall back to volume context (existing behavior)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> tokens, ok &lt;span style="color:#666">:=&lt;/span> req.VolumeContext[serviceAccountTokenKey]; ok {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> tokens, &lt;span style="color:#a2f;font-weight:bold">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> &lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>, fmt.&lt;span style="color:#00a000">Errorf&lt;/span>(&lt;span style="color:#b44">&amp;#34;service account tokens not found&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This fallback logic is backward compatible and safe to ship in any driver version,
even before clusters upgrade to v1.35.&lt;/p>
&lt;h3 id="rollout-sequence">Rollout sequence&lt;/h3>
&lt;p>CSI driver authors need to follow a specific sequence when adopting this feature to avoid breaking existing volumes.&lt;/p>
&lt;p>&lt;strong>Driver preparation&lt;/strong> (can happen anytime)&lt;/p>
&lt;p>You can start preparing your driver right away by shipping the fallback logic shown above,
which checks both the secrets field and the volume context for tokens.
We encourage you to add it early, cut releases, and even backport it to maintenance branches where feasible.&lt;/p>
&lt;p>&lt;strong>Cluster upgrade and feature enablement&lt;/strong>&lt;/p>
&lt;p>Once your driver has the fallback logic deployed, here's the safe rollout order for enabling the feature in a cluster:&lt;/p>
&lt;ol>
&lt;li>Complete the kube-apiserver upgrade to 1.35 or later&lt;/li>
&lt;li>Complete kubelet upgrade to 1.35 or later on all nodes&lt;/li>
&lt;li>Ensure CSI driver version with fallback logic is deployed (if not already done in preparation phase)&lt;/li>
&lt;li>Fully complete CSI driver DaemonSet rollout across all nodes&lt;/li>
&lt;li>Update your CSIDriver manifest to set &lt;code>serviceAccountTokenInSecrets: true&lt;/code>&lt;/li>
&lt;/ol>
&lt;h3 id="important-constraints">Important constraints&lt;/h3>
&lt;p>The most important thing to remember is timing.
If your CSI driver DaemonSet and CSIDriver object are in the same manifest or Helm chart,
you need two separate updates.
Deploy the new driver version with fallback logic first,
wait for the DaemonSet rollout to complete,
then update the CSIDriver spec to set &lt;code>serviceAccountTokenInSecrets: true&lt;/code>.&lt;/p>
&lt;p>Also, don't update the CSIDriver before all driver pods have rolled out.
If you do, volume mounts will fail on nodes still running the old driver version,
since those pods only check volume context.&lt;/p>
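&lt;p>Put together, the two-step flip might look like this (the namespace, DaemonSet name, and manifest file names are assumptions for illustration):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash"># first update: roll out the driver version that contains the fallback logic
kubectl -n kube-system apply -f csi-driver-daemonset.yaml
kubectl -n kube-system rollout status daemonset/example-csi-driver-node
# second update: only after the rollout completes, opt into secrets delivery
kubectl apply -f csidriver.yaml  # this revision sets serviceAccountTokenInSecrets: true
&lt;/code>&lt;/pre>&lt;/div>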
&lt;h2 id="why-this-matters">Why this matters&lt;/h2>
&lt;p>Adopting this feature helps in a few ways:&lt;/p>
&lt;ul>
&lt;li>It eliminates the risk of accidentally logging service account tokens as part of volume context in gRPC requests&lt;/li>
&lt;li>It uses the CSI specification's designated field for sensitive data, which is exactly what that field is for&lt;/li>
&lt;li>The &lt;code>protosanitizer&lt;/code> tool automatically handles the secrets field correctly, so you don't need driver-specific workarounds&lt;/li>
&lt;li>It's opt-in, so you can migrate at your own pace without breaking existing deployments&lt;/li>
&lt;/ul>
&lt;h2 id="call-to-action">Call to action&lt;/h2>
&lt;p>We (Kubernetes SIG Storage) encourage CSI driver authors to adopt this feature and provide feedback
on the migration experience.
If you have thoughts on the API design or run into any issues during adoption,
please reach out to us on the
&lt;a href="https://kubernetes.slack.com/archives/C8EJ01Z46">#csi&lt;/a> channel on Kubernetes Slack
(for an invitation, visit &lt;a href="https://slack.k8s.io/">https://slack.k8s.io/&lt;/a>).&lt;/p>
&lt;p>You can follow along on
&lt;a href="https://kep.k8s.io/5538">KEP-5538&lt;/a>
to track progress across the coming Kubernetes releases.&lt;/p></description></item><item><title>Kubernetes v1.35: Extended Toleration Operators to Support Numeric Comparisons (Alpha)</title><link>https://kubernetes.io/blog/2026/01/05/kubernetes-v1-35-numeric-toleration-operators/</link><pubDate>Mon, 05 Jan 2026 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2026/01/05/kubernetes-v1-35-numeric-toleration-operators/</guid><description>
&lt;p>Many production Kubernetes clusters blend on-demand (higher-SLA) and spot/preemptible (lower-SLA) nodes to optimize costs while maintaining reliability for critical workloads. Platform teams need a safe default that keeps most workloads away from risky capacity, while allowing specific workloads to opt-in with explicit thresholds like &amp;quot;I can tolerate nodes with failure probability up to 5%&amp;quot;.&lt;/p>
&lt;p>Today, Kubernetes taints and tolerations can match exact values or check for existence, but they can't compare numeric thresholds. You'd need to create discrete taint categories, use external admission controllers, or accept less-than-optimal placement decisions.&lt;/p>
&lt;p>In Kubernetes v1.35, we're introducing &lt;strong>Extended Toleration Operators&lt;/strong> as an alpha feature. This enhancement adds &lt;code>Gt&lt;/code> (Greater Than) and &lt;code>Lt&lt;/code> (Less Than) operators to &lt;code>spec.tolerations&lt;/code>, enabling threshold-based scheduling decisions that unlock new possibilities for SLA-based placement, cost optimization, and performance-aware workload distribution.&lt;/p>
&lt;h2 id="the-evolution-of-tolerations">The evolution of tolerations&lt;/h2>
&lt;p>Historically, Kubernetes supported two primary toleration operators:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;code>Equal&lt;/code>&lt;/strong>: The toleration matches a taint if the key and value are exactly equal&lt;/li>
&lt;li>&lt;strong>&lt;code>Exists&lt;/code>&lt;/strong>: The toleration matches a taint if the key exists, regardless of value&lt;/li>
&lt;/ul>
&lt;p>While these worked well for categorical scenarios, they fell short for numeric comparisons. Starting with v1.35, we are closing this gap.&lt;/p>
&lt;p>Consider these real-world scenarios:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>SLA requirements&lt;/strong>: Schedule high-availability workloads only on nodes with failure probability below a certain threshold&lt;/li>
&lt;li>&lt;strong>Cost optimization&lt;/strong>: Allow cost-sensitive batch jobs to run on cheaper nodes that exceed a specific cost-per-hour value&lt;/li>
&lt;li>&lt;strong>Performance guarantees&lt;/strong>: Ensure latency-sensitive applications run only on nodes with disk IOPS or network bandwidth above minimum thresholds&lt;/li>
&lt;/ul>
&lt;p>Without numeric comparison operators, cluster operators have had to resort to workarounds like creating multiple discrete taint values or using external admission controllers, neither of which scale well or provide the flexibility needed for dynamic threshold-based scheduling.&lt;/p>
&lt;h2 id="why-extend-tolerations-instead-of-using-nodeaffinity">Why extend tolerations instead of using NodeAffinity?&lt;/h2>
&lt;p>You might wonder: NodeAffinity already supports numeric comparison operators, so why extend tolerations? While NodeAffinity is powerful for expressing pod preferences, taints and tolerations provide critical operational benefits:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Policy orientation&lt;/strong>: NodeAffinity is per-pod, requiring every workload to explicitly opt-out of risky nodes. Taints invert control—nodes declare their risk level, and only pods with matching tolerations may land there. This provides a safer default; most pods stay away from spot/preemptible nodes unless they explicitly opt-in.&lt;/li>
&lt;li>&lt;strong>Eviction semantics&lt;/strong>: NodeAffinity has no eviction capability. Taints support the &lt;code>NoExecute&lt;/code> effect with &lt;code>tolerationSeconds&lt;/code>, enabling operators to drain and evict pods when a node's SLA degrades or spot instances receive termination notices.&lt;/li>
&lt;li>&lt;strong>Operational ergonomics&lt;/strong>: Centralized, node-side policy is consistent with other safety taints like disk-pressure and memory-pressure, making cluster management more intuitive.&lt;/li>
&lt;/ul>
&lt;p>This enhancement preserves the well-understood safety model of taints and tolerations while enabling threshold-based placement for SLA-aware scheduling.&lt;/p>
&lt;h2 id="introducing-gt-and-lt-operators">Introducing Gt and Lt operators&lt;/h2>
&lt;p>Kubernetes v1.35 introduces two new operators for tolerations:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;code>Gt&lt;/code> (Greater Than)&lt;/strong>: The toleration matches if the taint's numeric value is greater than the toleration's value&lt;/li>
&lt;li>&lt;strong>&lt;code>Lt&lt;/code> (Less Than)&lt;/strong>: The toleration matches if the taint's numeric value is less than the toleration's value&lt;/li>
&lt;/ul>
&lt;p>When a pod tolerates a taint with &lt;code>Lt&lt;/code>, it's saying &amp;quot;I can tolerate nodes where this metric is &lt;em>less than&lt;/em> my threshold&amp;quot;. Since tolerations allow scheduling, the pod can run on nodes where the taint value is less than the toleration value. Think of it as: &amp;quot;I tolerate nodes that stay below my maximum threshold&amp;quot;; with &lt;code>Gt&lt;/code>, the reverse: &amp;quot;I tolerate nodes that are above my minimum requirement&amp;quot;.&lt;/p>
&lt;p>These operators work with numeric taint values and enable the scheduler to make sophisticated placement decisions based on continuous metrics rather than discrete categories.&lt;/p>
&lt;div class="alert alert-info" role="alert">&lt;h4 class="alert-heading">Note:&lt;/h4>&lt;p>Numeric values for &lt;code>Gt&lt;/code> and &lt;code>Lt&lt;/code> operators must be positive 64-bit integers without leading zeros. For example, &lt;code>&amp;quot;100&amp;quot;&lt;/code> is valid, but &lt;code>&amp;quot;0100&amp;quot;&lt;/code> (with leading zero) and &lt;code>&amp;quot;0&amp;quot;&lt;/code> (zero value) are not permitted.&lt;/p>
&lt;p>The &lt;code>Gt&lt;/code> and &lt;code>Lt&lt;/code> operators work with all taint effects: &lt;code>NoSchedule&lt;/code>, &lt;code>NoExecute&lt;/code>, and &lt;code>PreferNoSchedule&lt;/code>.&lt;/p>
&lt;/div>
&lt;h2 id="use-cases-and-examples">Use cases and examples&lt;/h2>
&lt;p>Let's explore how Extended Toleration Operators solve real-world scheduling challenges.&lt;/p>
&lt;h3 id="example-1-spot-instance-protection-with-sla-thresholds">Example 1: Spot instance protection with SLA thresholds&lt;/h3>
&lt;p>Many clusters mix on-demand and spot/preemptible nodes to optimize costs. Spot nodes offer significant savings but have higher failure rates. You want most workloads to avoid spot nodes by default, while allowing specific workloads to opt-in with clear SLA boundaries.&lt;/p>
&lt;p>First, taint spot nodes with their failure probability (for example, 15% annual failure rate):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Node&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>spot-node-1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">taints&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;failure-probability&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;15&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoExecute&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>On-demand nodes have much lower failure rates:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Node&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ondemand-node-1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">taints&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;failure-probability&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;2&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoExecute&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Critical workloads can specify strict SLA requirements:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>payment-processor&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tolerations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;failure-probability&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Lt&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;5&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoExecute&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tolerationSeconds&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">30&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>app&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>payment-app:v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This pod will &lt;strong>only&lt;/strong> schedule on nodes with &lt;code>failure-probability&lt;/code> less than 5 (meaning &lt;code>ondemand-node-1&lt;/code> with 2% but not &lt;code>spot-node-1&lt;/code> with 15%). The &lt;code>NoExecute&lt;/code> effect with &lt;code>tolerationSeconds: 30&lt;/code> means if a node's SLA degrades (for example, cloud provider changes the taint value), the pod gets 30 seconds to gracefully terminate before forced eviction.&lt;/p>
&lt;p>Meanwhile, a fault-tolerant batch job can explicitly opt in to spot instances:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>batch-job&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tolerations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;failure-probability&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Lt&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;20&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoExecute&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>worker&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>batch-worker:v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This batch job tolerates nodes with a failure probability below 20%, so it can run on both on-demand and spot nodes, maximizing cost savings while accepting higher risk.&lt;/p>
&lt;h3 id="example-2-ai-workload-placement-with-gpu-tiers">Example 2: AI workload placement with GPU tiers&lt;/h3>
&lt;p>AI and machine learning workloads often have specific hardware requirements. With Extended Toleration Operators, you can create GPU node tiers and ensure workloads land on appropriately powered hardware.&lt;/p>
&lt;p>Taint GPU nodes with their compute capability score:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Node&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gpu-node-a100&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">taints&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;gpu-compute-score&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;1000&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoSchedule&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">---&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Node&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gpu-node-t4&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">taints&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;gpu-compute-score&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;500&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoSchedule&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A heavy training workload can require high-performance GPUs:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>model-training&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tolerations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;gpu-compute-score&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Gt&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;800&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoSchedule&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>trainer&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ml-trainer:v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">limits&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nvidia.com/gpu&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">1&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This ensures the training pod only schedules on nodes with compute scores greater than 800 (like the A100 node), preventing placement on lower-tier GPUs that would slow down training.&lt;/p>
&lt;p>Meanwhile, inference workloads with less demanding requirements can use any available GPU:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>model-inference&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tolerations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;gpu-compute-score&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Gt&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;400&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoSchedule&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>inference&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ml-inference:v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">limits&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nvidia.com/gpu&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">1&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="example-3-cost-optimized-workload-placement">Example 3: Cost-optimized workload placement&lt;/h3>
&lt;p>For batch processing or non-critical workloads, you might want to minimize costs by running on cheaper nodes, even if they have lower performance characteristics.&lt;/p>
&lt;p>Nodes can be tainted with their cost rating:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">taints&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;cost-per-hour&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;50&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoSchedule&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A cost-sensitive batch job can express its tolerance for expensive nodes:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">tolerations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;cost-per-hour&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Lt&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;100&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoSchedule&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This batch job will schedule on nodes costing less than $100/hour but avoid more expensive nodes. Combined with Kubernetes scheduling priorities, this enables sophisticated cost-tiering strategies where critical workloads get premium nodes while batch workloads efficiently use budget-friendly resources.&lt;/p>
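&lt;p>As a sketch of the other half of that tiering strategy, a critical workload could use the existing &lt;code>Exists&lt;/code> operator to tolerate every cost tier, so that only its priority and resource requests, not cost, constrain its placement:&lt;/p>
&lt;pre tabindex="0">&lt;code># Critical pods tolerate any cost-per-hour taint, so premium nodes
# that batch jobs are priced out of remain available to them.
tolerations:
- key: cost-per-hour
  operator: Exists
  effect: NoSchedule
&lt;/code>&lt;/pre>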
&lt;h3 id="example-4-performance-based-placement">Example 4: Performance-based placement&lt;/h3>
&lt;p>Storage-intensive applications often require minimum disk performance guarantees. With Extended Toleration Operators, you can enforce these requirements at the scheduling level.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">tolerations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;disk-iops&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Gt&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;3000&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoSchedule&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This toleration ensures the pod only schedules on nodes where &lt;code>disk-iops&lt;/code> exceeds 3000. The &lt;code>Gt&lt;/code> operator expresses &amp;quot;I need a node whose taint value is greater than this minimum&amp;quot;.&lt;/p>
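&lt;p>For completeness, a node taint that satisfies this toleration might look like the following; the node name is illustrative:&lt;/p>
&lt;pre tabindex="0">&lt;code># 5000 exceeds the 3000 minimum, so the pod above can schedule here.
apiVersion: v1
kind: Node
metadata:
  name: fast-ssd-node
spec:
  taints:
  - key: disk-iops
    value: &amp;#34;5000&amp;#34;
    effect: NoSchedule
&lt;/code>&lt;/pre>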
&lt;h2 id="how-to-use-this-feature">How to use this feature&lt;/h2>
&lt;p>Extended Toleration Operators is an &lt;strong>alpha feature&lt;/strong> in Kubernetes v1.35. To try it out:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Enable the feature gate&lt;/strong> on both your API server and scheduler:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>--feature-gates&lt;span style="color:#666">=&lt;/span>&lt;span style="color:#b8860b">TaintTolerationComparisonOperators&lt;/span>&lt;span style="color:#666">=&lt;/span>&lt;span style="color:#a2f">true&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>&lt;strong>Taint your nodes&lt;/strong> with numeric values representing the metrics relevant to your scheduling needs:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span> kubectl taint nodes node-1 failure-probability&lt;span style="color:#666">=&lt;/span>5:NoSchedule
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kubectl taint nodes node-2 disk-iops&lt;span style="color:#666">=&lt;/span>5000:NoSchedule
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>&lt;strong>Use the new operators&lt;/strong> in your pod specifications:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tolerations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;failure-probability&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Lt&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;1&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">effect&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;NoSchedule&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ol>
&lt;div class="alert alert-info" role="alert">&lt;h4 class="alert-heading">Note:&lt;/h4>As an alpha feature, Extended Toleration Operators may change in future releases and should be used with caution in production environments. Always test thoroughly in non-production clusters first.&lt;/div>
&lt;h2 id="what-s-next">What's next?&lt;/h2>
&lt;p>This alpha release is just the beginning. As we gather feedback from the community, we plan to:&lt;/p>
&lt;ul>
&lt;li>Add support for &lt;a href="https://github.com/kubernetes/enhancements/issues/5500">CEL (Common Expression Language) expressions&lt;/a> in tolerations and node affinity for even more flexible scheduling logic, including semantic versioning comparisons&lt;/li>
&lt;li>Improve integration with cluster autoscaling for threshold-aware capacity planning&lt;/li>
&lt;li>Graduate the feature to beta and eventually GA with production-ready stability&lt;/li>
&lt;/ul>
&lt;p>We're particularly interested in hearing about your use cases! Do you have scenarios where threshold-based scheduling would solve problems? Are there additional operators or capabilities you'd like to see?&lt;/p>
&lt;h2 id="getting-involved">Getting involved&lt;/h2>
&lt;p>This feature is driven by the &lt;a href="https://github.com/kubernetes/community/tree/master/sig-scheduling">SIG Scheduling&lt;/a> community. Please join us to connect with the community and to share your ideas and feedback on this feature and beyond.&lt;/p>
&lt;p>You can reach the maintainers of this feature at:&lt;/p>
&lt;ul>
&lt;li>Slack: &lt;a href="https://kubernetes.slack.com/messages/sig-scheduling">#sig-scheduling&lt;/a> on Kubernetes Slack&lt;/li>
&lt;li>Mailing list: &lt;a href="https://groups.google.com/g/kubernetes-sig-scheduling">kubernetes-sig-scheduling@googlegroups.com&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>For questions or specific inquiries related to Extended Toleration Operators, please reach out to the SIG Scheduling community. We look forward to hearing from you!&lt;/p>
&lt;h2 id="how-can-i-learn-more">How can I learn more?&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/">Taints and Tolerations&lt;/a> for understanding the fundamentals&lt;/li>
&lt;li>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#numeric-comparison-operators">Numeric comparison operators&lt;/a> for details on using &lt;code>Gt&lt;/code> and &lt;code>Lt&lt;/code> operators&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/5471">KEP-5471: Extended Toleration Operators for Threshold-Based Placement&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.35: New level of efficiency with in-place Pod restart</title><link>https://kubernetes.io/blog/2026/01/02/kubernetes-v1-35-restart-all-containers/</link><pubDate>Fri, 02 Jan 2026 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2026/01/02/kubernetes-v1-35-restart-all-containers/</guid><description>
&lt;p>The release of Kubernetes 1.35 introduces a powerful new feature that provides a much-requested capability: the ability to trigger a full, in-place restart of a Pod. This feature, &lt;em>Restart All Containers&lt;/em> (alpha in 1.35), offers an efficient way to reset a Pod's state compared to the resource-intensive approach of deleting and recreating the entire Pod. It is especially useful for AI/ML workloads, allowing application developers to concentrate on their core training logic while offloading complex failure-handling and recovery mechanisms to sidecars and declarative Kubernetes configuration. With &lt;code>RestartAllContainers&lt;/code> and other planned enhancements, Kubernetes continues to add building blocks for creating flexible, robust, and efficient platforms for AI/ML workloads.&lt;/p>
&lt;p>This new functionality is available by enabling the &lt;code>RestartAllContainersOnContainerExits&lt;/code> feature gate. This alpha feature extends the &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-restart-rules">&lt;em>Container Restart Rules&lt;/em> feature&lt;/a>, which graduated to beta in Kubernetes 1.35.&lt;/p>
&lt;h2 id="the-problem-when-a-single-container-restart-isn-t-enough-and-recreating-pods-is-too-costly">The problem: when a single container restart isn't enough and recreating pods is too costly&lt;/h2>
&lt;p>Kubernetes has long supported restart policies at the Pod level (&lt;code>restartPolicy&lt;/code>) and, more recently, at the &lt;a href="https://kubernetes.io/blog/2025/08/29/kubernetes-v1-34-per-container-restart-policy/">individual container level&lt;/a>. These policies are great for handling crashes in a single, isolated process. However, many modern applications have more complex inter-container dependencies. For instance:&lt;/p>
&lt;ul>
&lt;li>An &lt;strong>init container&lt;/strong> prepares the environment by mounting a volume or generating a configuration file. If the main application container corrupts this environment, simply restarting that one container is not enough. The entire initialization process needs to run again.&lt;/li>
&lt;li>A &lt;strong>watcher sidecar&lt;/strong> monitors system health. If it detects an error state that is unrecoverable in place but retriable from a clean slate, it must trigger a restart of the main application container.&lt;/li>
&lt;li>A &lt;strong>sidecar&lt;/strong> that manages a remote resource fails. Even if the sidecar restarts on its own, the main container may be stuck trying to access an outdated or broken connection.&lt;/li>
&lt;/ul>
&lt;p>In all these cases, the desired action is not to restart a single container, but all of them. Previously, the only way to achieve this was to delete the Pod and have a controller (like a Job or ReplicaSet) create a new one. This process is slow and expensive, involving the scheduler, node resource allocation, and re-initialization of networking and storage.&lt;/p>
&lt;p>This inefficiency becomes even worse when handling large-scale AI/ML workloads (&amp;gt;= 1,000 Nodes with one Pod per Node). A common requirement for these synchronous workloads is that when a failure occurs (such as a Node crash), all Pods in the fleet must be recreated to reset the state before training can resume, even if the other Pods were not directly affected by the failure. Deleting, creating, and scheduling thousands of Pods simultaneously creates a massive bottleneck. The overhead of such failures is estimated to cost &lt;a href="https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.qwqcnzf96avw">$100,000 per month in wasted resources&lt;/a>.&lt;/p>
&lt;p>Handling these failures for AI/ML training jobs used to require complex integrations touching both the training framework and Kubernetes, which are often fragile and toilsome. This feature introduces a Kubernetes-native solution, improving system robustness and allowing application developers to concentrate on their core training logic.&lt;/p>
&lt;p>Another major benefit of restarting Pods in place is that keeping Pods on their assigned Nodes allows for further optimizations. For example, one can implement node-level caching tied to a specific Pod identity, something that is impossible when Pods are unnecessarily being recreated on different Nodes.&lt;/p>
&lt;h2 id="introducing-the-restartallcontainers-action">Introducing the &lt;code>RestartAllContainers&lt;/code> action&lt;/h2>
&lt;p>To address this, Kubernetes v1.35 adds a new action to the container restart rules: &lt;code>RestartAllContainers&lt;/code>. When a container exits in a way that matches a rule with this action, the kubelet initiates a fast, &lt;strong>in-place&lt;/strong> restart of the Pod.&lt;/p>
&lt;p>This in-place restart is highly efficient because it preserves the Pod's most important resources:&lt;/p>
&lt;ul>
&lt;li>The Pod's UID, IP address and network namespace.&lt;/li>
&lt;li>The Pod's sandbox and any attached devices.&lt;/li>
&lt;li>All volumes, including &lt;code>emptyDir&lt;/code> and mounted volumes from PVCs.&lt;/li>
&lt;/ul>
&lt;p>After terminating all running containers, the Pod's startup sequence is re-executed from the very beginning. This means all &lt;strong>init containers&lt;/strong> are run again in order, followed by the sidecar and regular containers, ensuring a completely fresh start in a known-good environment. With the exception of ephemeral containers (which are terminated), all other containers—including those that previously succeeded or failed—will be restarted, regardless of their individual restart policies.&lt;/p>
&lt;h2 id="use-cases">Use cases&lt;/h2>
&lt;h3 id="1-efficient-restarts-for-ml-batch-jobs">1. Efficient restarts for ML/Batch jobs&lt;/h3>
&lt;p>For ML training jobs, &lt;a href="https://kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/#roadmap-for-failure-modes-container-code-failed">rescheduling a worker Pod on failure&lt;/a> is a costly operation that wastes valuable compute resources. On a 1,000-node training cluster, rescheduling overhead can waste &lt;a href="https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.qwqcnzf96avw">over $100,000 in compute resources monthly&lt;/a>.&lt;/p>
&lt;p>With the &lt;code>RestartAllContainers&lt;/code> action, you can address this with a much faster, hybrid recovery strategy: recreate only the &amp;quot;bad&amp;quot; Pods (for example, those on unhealthy Nodes) while triggering &lt;code>RestartAllContainers&lt;/code> for the remaining healthy Pods. Benchmarks show this reduces the recovery overhead &lt;a href="https://docs.google.com/document/d/16zexVooHKPc80F4dVtUjDYK9DOpkVPRNfSv0zRtfFpk/edit?tab=t.0#bookmark=id.cwkee8kar0i5">from minutes to a few seconds&lt;/a>.&lt;/p>
&lt;p>With in-place restarts, a watcher sidecar can monitor the main training process. If it encounters a specific, retriable error, the watcher can exit with a designated code to trigger a fast reset of the worker Pod, allowing it to restart from the last checkpoint without involving the Job controller. This capability is now natively supported by Kubernetes.&lt;/p>
&lt;p>Read more about future development and the JobSet integration in the &lt;a href="https://github.com/kubernetes-sigs/jobset/issues/467">JobSet in-place restart proposal (JobSet issue #467)&lt;/a>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ml-worker-pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Never&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">initContainers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># This init container will re-run on every in-place restart&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>setup-environment&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-repo/setup-worker:1.0&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>watcher-sidecar&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-repo/watcher:1.0&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Always&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicyRules&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">action&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>RestartAllContainers&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">onExit&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">exitCodes&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>In&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># A specific exit code from the watcher triggers a full pod restart&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">values&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#666">88&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>main-application&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-repo/training-app:1.0&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="2-re-running-init-containers-for-a-clean-state">2. Re-running init containers for a clean state&lt;/h3>
&lt;p>Imagine a scenario where an init container is responsible for fetching credentials or setting up a shared volume. If the main application fails in a way that corrupts this shared state, you need the &lt;a href="https://github.com/kubernetes/enhancements/issues/3676">init container to rerun&lt;/a>.&lt;/p>
&lt;p>By configuring the main application to exit with a specific code upon detecting such a corruption, you can trigger the &lt;code>RestartAllContainers&lt;/code> action, guaranteeing that the init container provides a clean setup before the application restarts.&lt;/p>
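&lt;p>A minimal sketch of that wiring, assuming the application signals corruption with the hypothetical exit code 42 (the image names are placeholders too):&lt;/p>
&lt;pre tabindex="0">&lt;code>spec:
  restartPolicy: Never
  initContainers:
  - name: fetch-credentials        # re-runs on every in-place restart
    image: my-repo/credential-fetcher:1.0
  containers:
  - name: main-application
    image: my-repo/app:1.0
    restartPolicy: Never
    restartPolicyRules:
    - action: RestartAllContainers
      onExit:
        exitCodes:
          operator: In
          values: [42]             # app exits 42 on detected corruption
&lt;/code>&lt;/pre>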
&lt;h3 id="3-handling-high-rate-of-similar-tasks-execution">3. Handling high rate of similar tasks execution&lt;/h3>
&lt;p>There are cases where each task is best represented as its own Pod execution, and each task requires a clean slate. The task may be a game-session backend or the processing of a queue item. If the rate of tasks is high, running the whole cycle of Pod creation, scheduling, and initialization is simply too expensive, especially when tasks are short. The ability to restart all containers from scratch enables a Kubernetes-native way to handle this scenario without custom solutions or frameworks.&lt;/p>
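&lt;p>As an illustrative sketch (the exit code and image are hypothetical), a task runner can exit with a designated code after each completed task, triggering a fast in-place reset before the next task is picked up:&lt;/p>
&lt;pre tabindex="0">&lt;code>spec:
  restartPolicy: Never
  containers:
  - name: task-runner
    image: my-repo/task-runner:1.0
    restartPolicy: Never
    restartPolicyRules:
    - action: RestartAllContainers
      onExit:
        exitCodes:
          operator: In
          values: [99]   # convention: 99 means the task is done, reset for the next one
&lt;/code>&lt;/pre>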
&lt;h2 id="how-to-use-it">How to use it&lt;/h2>
&lt;p>To try this feature, you must enable the &lt;code>RestartAllContainersOnContainerExits&lt;/code> feature gate on your Kubernetes cluster components (API server and kubelet) running Kubernetes v1.35+. This alpha feature extends the &lt;code>ContainerRestartRules&lt;/code> feature, which graduated to beta in v1.35 and is enabled by default.&lt;/p>
&lt;p>Once enabled, you can add &lt;code>restartPolicyRules&lt;/code> to any container (init, sidecar, or regular) and use the &lt;code>RestartAllContainers&lt;/code> action.&lt;/p>
&lt;p>The feature is designed to be easy to adopt for existing applications. However, if an application does not follow certain best practices, enabling it may cause issues for the application or for observability tooling. When enabling the feature, make sure that all containers are reentrant and that external tooling is prepared for init containers to re-run. Also, when restarting all containers, the kubelet does not run &lt;code>preStop&lt;/code> hooks. This means containers must be designed to handle abrupt termination without relying on &lt;code>preStop&lt;/code> hooks for graceful shutdown.&lt;/p>
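&lt;p>Because &lt;code>preStop&lt;/code> hooks are skipped, any cleanup has to live in the process itself. A minimal sketch of handling the termination signal in a shell entrypoint (the helper commands are illustrative placeholders):&lt;/p>
&lt;pre tabindex="0">&lt;code>containers:
- name: worker
  image: busybox:1.36
  command: [&amp;#34;/bin/sh&amp;#34;, &amp;#34;-c&amp;#34;]
  args:
  - |
    # Flush state on SIGTERM instead of relying on a preStop hook.
    # flush_state and do_work stand in for application-specific logic.
    trap 'flush_state; exit 0' TERM
    while true; do do_work; sleep 1; done
&lt;/code>&lt;/pre>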
&lt;h2 id="observing-the-restart">Observing the restart&lt;/h2>
&lt;p>To make this process observable, a new Pod condition, &lt;code>AllContainersRestarting&lt;/code>, is added to the Pod's status. When a restart is triggered, this condition becomes &lt;code>True&lt;/code> and it reverts to &lt;code>False&lt;/code> once all containers have terminated and the Pod is ready to start its lifecycle anew. This provides a clear signal to users and other cluster components about the Pod's state.&lt;/p>
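&lt;p>While a restart is in flight, the Pod status might therefore include a condition like the following (the timestamp is illustrative):&lt;/p>
&lt;pre tabindex="0">&lt;code>status:
  conditions:
  - type: AllContainersRestarting
    status: &amp;#34;True&amp;#34;
    lastTransitionTime: &amp;#34;2026-01-02T10:30:00Z&amp;#34;
&lt;/code>&lt;/pre>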
&lt;p>All containers restarted by this action will have their restart count incremented in the container status.&lt;/p>
&lt;h2 id="learn-more">Learn more&lt;/h2>
&lt;ul>
&lt;li>Read the official documentation on &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-all-containers">Pod Lifecycle&lt;/a>.&lt;/li>
&lt;li>Read the detailed proposal in the &lt;a href="https://kep.k8s.io/5532">KEP-5532: Restart All Containers on Container Exits&lt;/a>.&lt;/li>
&lt;li>Read the proposal for JobSet in-place restart in &lt;a href="https://github.com/kubernetes-sigs/jobset/issues/467">JobSet issue #467&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h2 id="we-want-your-feedback">We want your feedback!&lt;/h2>
&lt;p>As an alpha feature, &lt;code>RestartAllContainers&lt;/code> is ready for you to experiment with, and any use cases and feedback are welcome. This feature is driven by the &lt;a href="https://github.com/kubernetes/community/blob/master/sig-node/README.md">SIG Node&lt;/a> community. If you are interested in getting involved, sharing your thoughts, or contributing, please join us!&lt;/p>
&lt;p>You can reach SIG Node through:&lt;/p>
&lt;ul>
&lt;li>Slack: &lt;a href="https://kubernetes.slack.com/messages/sig-node">#sig-node&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://groups.google.com/forum/#!forum/kubernetes-sig-node">Mailing list&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes 1.35: Enhanced Debugging with Versioned z-pages APIs</title><link>https://kubernetes.io/blog/2025/12/31/kubernetes-v1-35-structured-zpages/</link><pubDate>Wed, 31 Dec 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/12/31/kubernetes-v1-35-structured-zpages/</guid><description>
&lt;p>Debugging Kubernetes control plane components can be challenging, especially when you need to quickly understand the runtime state of a component or verify its configuration. With Kubernetes 1.35, we're enhancing the z-pages debugging endpoints with structured, machine-parseable responses that make it easier to build tooling and automate troubleshooting workflows.&lt;/p>
&lt;h2 id="what-are-z-pages">What are z-pages?&lt;/h2>
&lt;p>z-pages are special debugging endpoints exposed by Kubernetes control plane components. Introduced as an alpha feature in Kubernetes 1.32, these endpoints provide runtime diagnostics for components like &lt;code>kube-apiserver&lt;/code>, &lt;code>kube-controller-manager&lt;/code>, &lt;code>kube-scheduler&lt;/code>, &lt;code>kubelet&lt;/code> and &lt;code>kube-proxy&lt;/code>. The name &amp;quot;z-pages&amp;quot; comes from the convention of using &lt;code>/*z&lt;/code> paths for debugging endpoints.&lt;/p>
&lt;p>Currently, Kubernetes supports two primary z-page endpoints:&lt;/p>
&lt;dl>
&lt;dt>&lt;code>/statusz&lt;/code>&lt;/dt>
&lt;dd>Displays high-level component information including version information, start time, uptime, and available debug paths&lt;/dd>
&lt;dt>&lt;code>/flagz&lt;/code>&lt;/dt>
&lt;dd>Shows all command-line arguments and their values used to start the component (with confidential values redacted for security)&lt;/dd>
&lt;/dl>
&lt;p>These endpoints are valuable for human operators who need to quickly inspect component state, but until now, they only returned plain text output that was difficult to parse programmatically.&lt;/p>
&lt;h2 id="what-s-new-in-kubernetes-1-35">What's new in Kubernetes 1.35?&lt;/h2>
&lt;p>Kubernetes 1.35 introduces structured, versioned responses for both &lt;code>/statusz&lt;/code> and &lt;code>/flagz&lt;/code> endpoints. This enhancement maintains backward compatibility with the existing plain text format while adding support for machine-readable JSON responses.&lt;/p>
&lt;h3 id="backward-compatible-design">Backward compatible design&lt;/h3>
&lt;p>The new structured responses are opt-in. Without specifying an &lt;code>Accept&lt;/code> header, the endpoints continue to return the familiar plain text format:&lt;/p>
&lt;pre tabindex="0">&lt;code>$ curl --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
--key /etc/kubernetes/pki/apiserver-kubelet-client.key \
--cacert /etc/kubernetes/pki/ca.crt \
https://localhost:6443/statusz
kube-apiserver statusz
Warning: This endpoint is not meant to be machine parseable, has no formatting compatibility guarantees and is for debugging purposes only.
Started: Wed Oct 16 21:03:43 UTC 2024
Up: 0 hr 00 min 16 sec
Go version: go1.23.2
Binary version: 1.35.0-alpha.0.1595
Emulation version: 1.35
Paths: /healthz /livez /metrics /readyz /statusz /version
&lt;/code>&lt;/pre>&lt;h3 id="structured-json-responses">Structured JSON responses&lt;/h3>
&lt;p>To receive a structured response, include the appropriate &lt;code>Accept&lt;/code> header:&lt;/p>
&lt;pre tabindex="0">&lt;code>Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz
&lt;/code>&lt;/pre>&lt;p>This returns a versioned JSON response:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;kind&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;Statusz&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;apiVersion&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;config.k8s.io/v1alpha1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;metadata&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;name&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;kube-apiserver&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;startTime&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;2025-10-29T00:30:01Z&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;uptimeSeconds&amp;#34;&lt;/span>: &lt;span style="color:#666">856&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;goVersion&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;go1.23.2&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;binaryVersion&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;1.35.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;emulationVersion&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;1.35&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;paths&amp;#34;&lt;/span>: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;/healthz&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;/livez&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;/metrics&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;/readyz&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;/statusz&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;/version&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Similarly, &lt;code>/flagz&lt;/code> supports structured responses with the header:&lt;/p>
&lt;pre tabindex="0">&lt;code>Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz
&lt;/code>&lt;/pre>&lt;p>Example response:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;kind&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;Flagz&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;apiVersion&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;config.k8s.io/v1alpha1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;metadata&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;name&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;kube-apiserver&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;flags&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;advertise-address&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;192.168.8.4&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;allow-privileged&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;true&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;authorization-mode&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;[Node,RBAC]&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;enable-priority-and-fairness&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;true&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#008000;font-weight:bold">&amp;#34;profiling&amp;#34;&lt;/span>: &lt;span style="color:#b44">&amp;#34;true&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="why-structured-responses-matter">Why structured responses matter&lt;/h2>
&lt;p>The addition of structured responses opens up several new possibilities:&lt;/p>
&lt;h3 id="1-automated-health-checks-and-monitoring">1. &lt;strong>Automated health checks and monitoring&lt;/strong>&lt;/h3>
&lt;p>Instead of parsing plain text, monitoring tools can now easily extract specific fields. For example, you can programmatically check whether a component is running with an unexpected emulated version, or verify that critical flags are set correctly.&lt;/p>
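&lt;p>As a minimal sketch of such a check, the snippet below reads one flag from the &lt;code>/flagz&lt;/code> response shape shown earlier. The certificate paths match the complete &lt;code>curl&lt;/code> example later in this post and may differ on your cluster.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-bash" data-lang="bash"># Extract a single flag value from the structured /flagz response
profiling=$(curl -s \
  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
  --cacert /etc/kubernetes/pki/ca.crt \
  -H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz" \
  https://localhost:6443/flagz | jq -r .flags.profiling)

# Alert if a critical flag has an unexpected value
if [ "$profiling" = "true" ]; then
  echo "warning: profiling is enabled on the kube-apiserver"
fi
&lt;/code>&lt;/pre>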
&lt;h3 id="2-better-debugging-tools">2. &lt;strong>Better debugging tools&lt;/strong>&lt;/h3>
&lt;p>Developers can build sophisticated debugging tools that compare configurations across multiple components or track configuration drift over time. The structured format makes it trivial to &lt;code>diff&lt;/code> configurations or validate that components are running with expected settings.&lt;/p>
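&lt;p>As a sketch of such a tool, the snippet below diffs the live flags of two control plane endpoints; the &lt;code>fetch_flagz&lt;/code> helper and the hostnames are illustrative.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-bash" data-lang="bash"># Fetch the flags of one endpoint as sorted JSON (helper is illustrative)
fetch_flagz() {
  curl -s \
    --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
    --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
    --cacert /etc/kubernetes/pki/ca.crt \
    -H "Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz" \
    "$1/flagz" | jq -S .flags
}

# Diff the effective flags of two kube-apiserver instances
diff &lt;(fetch_flagz https://cp-1:6443) &lt;(fetch_flagz https://cp-2:6443)
&lt;/code>&lt;/pre>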
&lt;h3 id="3-api-versioning-and-stability">3. &lt;strong>API versioning and stability&lt;/strong>&lt;/h3>
&lt;p>By introducing versioned APIs (starting with &lt;code>v1alpha1&lt;/code>), we provide a clear path to stability. As the feature matures, we'll introduce &lt;code>v1beta1&lt;/code> and eventually &lt;code>v1&lt;/code>, giving you confidence that your tooling won't break with future Kubernetes releases.&lt;/p>
&lt;h2 id="how-to-use-structured-z-pages">How to use structured z-pages&lt;/h2>
&lt;h3 id="prerequisites">Prerequisites&lt;/h3>
&lt;p>Both endpoints require feature gates to be enabled:&lt;/p>
&lt;ul>
&lt;li>&lt;code>/statusz&lt;/code>: Enable the &lt;code>ComponentStatusz&lt;/code> feature gate&lt;/li>
&lt;li>&lt;code>/flagz&lt;/code>: Enable the &lt;code>ComponentFlagz&lt;/code> feature gate&lt;/li>
&lt;/ul>
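&lt;p>Feature gates use the standard Kubernetes flag syntax. As a sketch (how you set component flags depends on how your cluster is managed, for example via static Pod manifests), you would add:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-bash" data-lang="bash"># Enable both structured z-page endpoints on a control plane component
# (other flags omitted)
kube-apiserver --feature-gates=ComponentStatusz=true,ComponentFlagz=true
&lt;/code>&lt;/pre>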
&lt;h3 id="example-getting-structured-responses">Example: Getting structured responses&lt;/h3>
&lt;p>Here's an example using &lt;code>curl&lt;/code> to retrieve structured JSON responses from the kube-apiserver:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># Get structured statusz response&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>curl &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> --key /etc/kubernetes/pki/apiserver-kubelet-client.key &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> --cacert /etc/kubernetes/pki/ca.crt &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> -H &lt;span style="color:#b44">&amp;#34;Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Statusz&amp;#34;&lt;/span> &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> https://localhost:6443/statusz | jq .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># Get structured flagz response&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>curl &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> --key /etc/kubernetes/pki/apiserver-kubelet-client.key &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> --cacert /etc/kubernetes/pki/ca.crt &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> -H &lt;span style="color:#b44">&amp;#34;Accept: application/json;v=v1alpha1;g=config.k8s.io;as=Flagz&amp;#34;&lt;/span> &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span> https://localhost:6443/flagz | jq .
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;div class="alert alert-info" role="alert">&lt;h4 class="alert-heading">Note:&lt;/h4>The examples above use client certificate authentication and verify the server's certificate using &lt;code>--cacert&lt;/code>.
If you need to bypass certificate verification in a test environment, you can use &lt;code>--insecure&lt;/code> (or &lt;code>-k&lt;/code>),
but this should never be done in production as it makes you vulnerable to man-in-the-middle attacks.&lt;/div>
&lt;h2 id="important-considerations">Important considerations&lt;/h2>
&lt;h3 id="alpha-feature-status">Alpha feature status&lt;/h3>
&lt;p>The structured z-page responses are an &lt;strong>alpha&lt;/strong> feature in Kubernetes 1.35. This means:&lt;/p>
&lt;ul>
&lt;li>The API format may change in future releases&lt;/li>
&lt;li>These endpoints are intended for debugging, not production automation&lt;/li>
&lt;li>You should avoid relying on them for critical monitoring workflows until they reach beta or stable status&lt;/li>
&lt;/ul>
&lt;h3 id="security-and-access-control">Security and access control&lt;/h3>
&lt;p>z-pages expose internal component information and require proper access controls. Here are the key security considerations:&lt;/p>
&lt;p>&lt;strong>Authorization&lt;/strong>: Access to z-page endpoints is restricted to members of the &lt;code>system:monitoring&lt;/code> group, which follows the same authorization model as other debugging endpoints like &lt;code>/healthz&lt;/code>, &lt;code>/livez&lt;/code>, and &lt;code>/readyz&lt;/code>. This ensures that only authorized users and service accounts can access debugging information. If your cluster uses RBAC, you can manage access by granting appropriate permissions to this group.&lt;/p>
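&lt;p>As an RBAC sketch, the manifest below grants the &lt;code>system:monitoring&lt;/code> group read access to the two structured endpoints; the role and binding names are illustrative.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml"># Grant read access to the structured z-page endpoints (names illustrative)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: zpages-reader
rules:
- nonResourceURLs:
  - /statusz
  - /flagz
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: zpages-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: zpages-reader
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:monitoring
&lt;/code>&lt;/pre>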
&lt;p>&lt;strong>Authentication&lt;/strong>: The authentication requirements for these endpoints depend on your cluster's configuration. Unless anonymous authentication is enabled for your cluster, you typically need to use authentication mechanisms (such as client certificates) to access these endpoints.&lt;/p>
&lt;p>&lt;strong>Information disclosure&lt;/strong>: These endpoints reveal configuration details about your cluster components, including:&lt;/p>
&lt;ul>
&lt;li>Component versions and build information&lt;/li>
&lt;li>All command-line arguments and their values (with confidential values redacted)&lt;/li>
&lt;li>Available debug endpoints&lt;/li>
&lt;/ul>
&lt;p>Only grant access to trusted operators and debugging tools. Avoid exposing these endpoints to unauthorized users or automated systems that don't require this level of access.&lt;/p>
&lt;h3 id="future-evolution">Future evolution&lt;/h3>
&lt;p>As the feature matures, we (Kubernetes SIG Instrumentation) expect to:&lt;/p>
&lt;ul>
&lt;li>Introduce &lt;code>v1beta1&lt;/code> and eventually &lt;code>v1&lt;/code> versions of the API&lt;/li>
&lt;li>Gather community feedback on the response schema&lt;/li>
&lt;li>Potentially add additional z-page endpoints based on user needs&lt;/li>
&lt;/ul>
&lt;h2 id="try-it-out">Try it out&lt;/h2>
&lt;p>We encourage you to experiment with structured z-pages in a test environment:&lt;/p>
&lt;ol>
&lt;li>Enable the &lt;code>ComponentStatusz&lt;/code> and &lt;code>ComponentFlagz&lt;/code> feature gates on your control plane components&lt;/li>
&lt;li>Try querying the endpoints with both plain text and structured formats&lt;/li>
&lt;li>Build a simple tool or script that uses the structured data&lt;/li>
&lt;li>Share your feedback with the community&lt;/li>
&lt;/ol>
&lt;h2 id="learn-more">Learn more&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://kubernetes.io/docs/reference/instrumentation/zpages/">z-pages documentation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/4827-component-statusz/README.md">KEP-4827: Component Statusz&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/4828-component-flagz/README.md">KEP-4828: Component Flagz&lt;/a>&lt;/li>
&lt;li>Join the discussion in the &lt;a href="https://kubernetes.slack.com/archives/C20HH14P7">#sig-instrumentation&lt;/a> channel on Kubernetes Slack&lt;/li>
&lt;/ul>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>We'd love to hear your feedback! The structured z-pages feature is designed to make Kubernetes easier to debug and monitor. Whether you're building internal tooling, contributing to open source projects, or just exploring the feature, your input helps shape the future of Kubernetes observability.&lt;/p>
&lt;p>If you have questions, suggestions, or run into issues, please reach out to SIG Instrumentation. You can find us on Slack or at our regular &lt;a href="https://github.com/kubernetes/community/tree/master/sig-instrumentation">community meetings&lt;/a>.&lt;/p>
&lt;p>Happy debugging!&lt;/p></description></item><item><title>Kubernetes v1.35: Watch Based Route Reconciliation in the Cloud Controller Manager</title><link>https://kubernetes.io/blog/2025/12/30/kubernetes-v1-35-watch-based-route-reconciliation-in-ccm/</link><pubDate>Tue, 30 Dec 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/12/30/kubernetes-v1-35-watch-based-route-reconciliation-in-ccm/</guid><description>
&lt;p>Up to and including Kubernetes v1.34, the route controller in Cloud Controller Manager (CCM)
implementations built using the &lt;a href="https://github.com/kubernetes/cloud-provider">k8s.io/cloud-provider&lt;/a> library reconciles
routes at a fixed interval. This causes unnecessary API requests to the cloud provider when
there are no changes to routes. Other controllers implemented through the same library already
use watch-based mechanisms, leveraging informers to avoid unnecessary API calls. A new feature gate
is being introduced in v1.35 to allow changing the behavior of the route controller to use watch-based informers.&lt;/p>
&lt;h2 id="what-s-new">What's new?&lt;/h2>
&lt;p>The feature gate &lt;code>CloudControllerManagerWatchBasedRoutesReconciliation&lt;/code> has been
introduced to &lt;a href="https://github.com/kubernetes/cloud-provider">k8s.io/cloud-provider&lt;/a> in alpha stage by &lt;a href="https://github.com/kubernetes/community/blob/master/sig-cloud-provider/README.md">SIG Cloud Provider&lt;/a>.
To enable this feature, pass &lt;code>--feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true&lt;/code>
to the CCM implementation you are using.&lt;/p>
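&lt;p>As a sketch (how you pass flags depends on how your cloud provider deploys the CCM):&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-bash" data-lang="bash"># Enable the alpha watch-based route reconciliation (other flags omitted)
cloud-controller-manager \
  --feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true
&lt;/code>&lt;/pre>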
&lt;h2 id="about-the-feature-gate">About the feature gate&lt;/h2>
&lt;p>When this feature gate is enabled, the route reconciliation loop is triggered whenever a node is
added or deleted, or when the &lt;code>.spec.podCIDRs&lt;/code> or &lt;code>.status.addresses&lt;/code> fields are updated.&lt;/p>
&lt;p>An additional reconciliation is performed at a random interval between 12h and 24h,
chosen at the controller's start time.&lt;/p>
&lt;p>This feature gate does not modify the logic within the reconciliation loop.
Therefore, users of a CCM implementation should not experience significant
changes to their existing route configurations.&lt;/p>
&lt;h2 id="how-can-i-learn-more">How can I learn more?&lt;/h2>
&lt;p>For more details, refer to the &lt;a href="https://kep.k8s.io/5237">KEP-5237&lt;/a>.&lt;/p></description></item><item><title>Kubernetes v1.35: Introducing Workload Aware Scheduling</title><link>https://kubernetes.io/blog/2025/12/29/kubernetes-v1-35-introducing-workload-aware-scheduling/</link><pubDate>Mon, 29 Dec 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/12/29/kubernetes-v1-35-introducing-workload-aware-scheduling/</guid><description>
&lt;p>Scheduling large workloads is a much more complex and fragile operation than scheduling a single Pod,
as it often requires considering all Pods together instead of scheduling each one independently.
For example, when scheduling a machine learning batch job, you often need to place each worker strategically,
such as on the same rack, to make the entire process as efficient as possible.
At the same time, the Pods that are part of such a workload are very often identical
from the scheduling perspective, which fundamentally changes how this process should look.&lt;/p>
&lt;p>There are many custom schedulers adapted to perform workload scheduling efficiently,
but considering how common and important workload scheduling is to Kubernetes users,
especially in the AI era with the growing number of use cases,
it is high time to make workloads a first-class citizen for &lt;code>kube-scheduler&lt;/code> and support them natively.&lt;/p>
&lt;h2 id="workload-aware-scheduling">Workload aware scheduling&lt;/h2>
&lt;p>The recent 1.35 release of Kubernetes delivered the first tranche of &lt;em>workload aware scheduling&lt;/em> improvements.
These are part of a wider effort to improve the scheduling and management of workloads.
The effort will span many SIGs and releases, gradually expanding
the capabilities of the system toward the north star goal:
seamless workload scheduling and management in Kubernetes, including,
but not limited to, preemption and autoscaling.&lt;/p>
&lt;p>Kubernetes v1.35 introduces the Workload API that you can use to describe the desired shape
as well as scheduling-oriented requirements of the workload. It comes with an initial implementation
of &lt;em>gang scheduling&lt;/em> that instructs the &lt;code>kube-scheduler&lt;/code> to schedule gang Pods in an &lt;em>all-or-nothing&lt;/em> fashion.
Finally, we improved the scheduling of identical Pods (which typically make up a gang) to speed up the process,
thanks to the &lt;em>opportunistic batching&lt;/em> feature.&lt;/p>
&lt;h2 id="workload-api">Workload API&lt;/h2>
&lt;p>The new Workload API resource is part of the &lt;code>scheduling.k8s.io/v1alpha1&lt;/code>
&lt;a class='glossary-tooltip' title='A set of related paths in the Kubernetes API.' data-toggle='tooltip' data-placement='top' href='https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-groups-and-versioning' target='_blank' aria-label='API group'>API group&lt;/a>.
This resource acts as a structured, machine-readable definition of the scheduling requirements
of a multi-Pod application. While user-facing workloads like Jobs define what to run, the Workload resource
determines how a group of Pods should be scheduled and how its placement should be managed
throughout its lifecycle.&lt;/p>
&lt;p>A Workload allows you to define a group of Pods and apply a scheduling policy to them.
Here is what a gang scheduling configuration looks like: it defines a &lt;code>podGroup&lt;/code> named &lt;code>workers&lt;/code>
and applies the &lt;code>gang&lt;/code> policy with a &lt;code>minCount&lt;/code> of 4.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>scheduling.k8s.io/v1alpha1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Workload&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>training-job-workload&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">namespace&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>some-ns&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">podGroups&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>workers&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">policy&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">gang&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># The gang is schedulable only if 4 pods can run at once&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">minCount&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">4&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When you create your Pods, you link them to this Workload using the new &lt;code>workloadRef&lt;/code> field:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>worker-0&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">namespace&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>some-ns&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">workloadRef&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>training-job-workload&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">podGroup&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>workers&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>...&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="how-gang-scheduling-works">How gang scheduling works&lt;/h2>
&lt;p>The &lt;code>gang&lt;/code> policy enforces &lt;em>all-or-nothing&lt;/em> placement. Without gang scheduling,
a Job might be partially scheduled, consuming resources without being able to run,
leading to resource wastage and potential deadlocks.&lt;/p>
&lt;p>When you create Pods that are part of a gang-scheduled pod group, the scheduler's &lt;code>GangScheduling&lt;/code>
plugin manages the lifecycle independently for each pod group (or replica key):&lt;/p>
&lt;ol>
&lt;li>
&lt;p>When you create your Pods (or a controller makes them for you),
the scheduler blocks them from scheduling, until:&lt;/p>
&lt;ul>
&lt;li>The referenced Workload object is created.&lt;/li>
&lt;li>The referenced pod group exists in a Workload.&lt;/li>
&lt;li>The number of pending Pods in that group meets your &lt;code>minCount&lt;/code>.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Once enough Pods arrive, the scheduler tries to place them. However,
instead of binding them to nodes immediately, the Pods wait at a &lt;code>Permit&lt;/code> gate.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The scheduler checks if it has found valid assignments for the entire group (at least the &lt;code>minCount&lt;/code>).&lt;/p>
&lt;ul>
&lt;li>If there is room for the group, the gate opens, and all Pods are bound to nodes.&lt;/li>
&lt;li>If only a subset of the group's Pods was successfully scheduled within a timeout (set to 5 minutes),
the scheduler rejects &lt;strong>all&lt;/strong> of the Pods in the group.
They go back to the queue, freeing up the reserved resources for other workloads.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
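&lt;p>To see this lifecycle in action, you can create the Workload and its Pods and watch the group schedule together. The commands below are a minimal sketch; the manifest file names are illustrative.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-bash" data-lang="bash"># Create the Workload and only 3 of the 4 worker Pods:
# the Pods stay Pending because minCount is not met
kubectl apply -f workload.yaml
kubectl apply -f worker-0.yaml -f worker-1.yaml -f worker-2.yaml
kubectl get pods -n some-ns

# Creating the 4th Pod allows the scheduler to bind the whole gang at once
kubectl apply -f worker-3.yaml
kubectl get pods -n some-ns --watch
&lt;/code>&lt;/pre>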
&lt;p>We'd like to point out that while this is a first implementation, the Kubernetes project firmly
intends to improve and expand the gang scheduling algorithm in future releases.
Benefits we hope to deliver include a single-cycle scheduling phase for a whole gang,
workload-level preemption, and more, moving towards the north star goal.&lt;/p>
&lt;h2 id="opportunistic-batching">Opportunistic batching&lt;/h2>
&lt;p>In addition to explicit gang scheduling, v1.35 introduces &lt;em>opportunistic batching&lt;/em>.
This is a Beta feature that improves scheduling latency for identical Pods.&lt;/p>
&lt;p>Unlike gang scheduling, this feature does not require the Workload API
or any explicit opt-in on the user's part. It works opportunistically within the scheduler
by identifying Pods that have identical scheduling requirements (container images, resource requests,
affinities, etc.). When the scheduler processes a Pod, it can reuse the feasibility calculations
for subsequent identical Pods in the queue, significantly speeding up the process.&lt;/p>
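&lt;p>For example, the replicas of a plain Deployment usually share identical images, resource requests, and affinities, so they can benefit from batching. A minimal sketch (names and sizes are illustrative):&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml"># 50 identical Pods: the scheduler can reuse feasibility results across them
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batchable-workers
spec:
  replicas: 50
  selector:
    matchLabels:
      app: batchable-workers
  template:
    metadata:
      labels:
        app: batchable-workers
    spec:
      containers:
      - name: worker
        image: registry.k8s.io/e2e-test-images/agnhost:2.45
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
&lt;/code>&lt;/pre>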
&lt;p>Most users will benefit from this optimization automatically, without taking any special steps,
provided their Pods meet the following criteria.&lt;/p>
&lt;h3 id="restrictions">Restrictions&lt;/h3>
&lt;p>Opportunistic batching works under specific conditions. All fields used by the &lt;code>kube-scheduler&lt;/code>
to find a placement must be identical between Pods. Additionally, using some features
disables the batching mechanism for those Pods to ensure correctness.&lt;/p>
&lt;p>Note that you may need to review your &lt;code>kube-scheduler&lt;/code> configuration
to ensure it is not implicitly disabling batching for your workloads.&lt;/p>
&lt;p>See the &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/scheduler-perf-tuning/#enabling-opportunistic-batching">docs&lt;/a> for more details about restrictions.&lt;/p>
&lt;h2 id="the-north-star-vision">The north star vision&lt;/h2>
&lt;p>The project has a broad ambition to deliver workload aware scheduling.
These new APIs and scheduling enhancements are just the first steps.
In the near future, the effort aims to tackle:&lt;/p>
&lt;ul>
&lt;li>Introducing a workload scheduling phase&lt;/li>
&lt;li>Improved support for multi-node &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/">DRA&lt;/a>
and topology aware scheduling&lt;/li>
&lt;li>Workload-level preemption&lt;/li>
&lt;li>Improved integration between scheduling and autoscaling&lt;/li>
&lt;li>Improved interaction with external workload schedulers&lt;/li>
&lt;li>Managing placement of workloads throughout their entire lifecycle&lt;/li>
&lt;li>Multi-workload scheduling simulations&lt;/li>
&lt;/ul>
&lt;p>And more. The priority and implementation order of these focus areas
are subject to change. Stay tuned for further updates.&lt;/p>
&lt;h2 id="getting-started">Getting started&lt;/h2>
&lt;p>To try the workload aware scheduling improvements:&lt;/p>
&lt;ul>
&lt;li>Workload API: Enable the
&lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#GenericWorkload">&lt;code>GenericWorkload&lt;/code>&lt;/a>
feature gate on both &lt;code>kube-apiserver&lt;/code> and &lt;code>kube-scheduler&lt;/code>, and ensure the &lt;code>scheduling.k8s.io/v1alpha1&lt;/code>
&lt;a class='glossary-tooltip' title='A set of related paths in the Kubernetes API.' data-toggle='tooltip' data-placement='top' href='https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-groups-and-versioning' target='_blank' aria-label='API group'>API group&lt;/a> is enabled.&lt;/li>
&lt;li>Gang scheduling: Enable the
&lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#GangScheduling">&lt;code>GangScheduling&lt;/code>&lt;/a>
feature gate on &lt;code>kube-scheduler&lt;/code> (requires the Workload API to be enabled).&lt;/li>
&lt;li>Opportunistic batching: As a Beta feature, it is enabled by default in v1.35.
You can disable it using the
&lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#OpportunisticBatching">&lt;code>OpportunisticBatching&lt;/code>&lt;/a>
feature gate on &lt;code>kube-scheduler&lt;/code> if needed.&lt;/li>
&lt;/ul>
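&lt;p>Concretely, the flags might look like this (a sketch; how you set them depends on how your control plane is deployed):&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-bash" data-lang="bash"># kube-apiserver: enable the Workload API and its alpha API group
kube-apiserver \
  --feature-gates=GenericWorkload=true \
  --runtime-config=scheduling.k8s.io/v1alpha1=true

# kube-scheduler: enable gang scheduling on top of the Workload API
kube-scheduler \
  --feature-gates=GenericWorkload=true,GangScheduling=true
&lt;/code>&lt;/pre>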
&lt;p>We encourage you to try out workload aware scheduling in your test clusters
and share your experiences to help shape the future of Kubernetes scheduling.
You can send your feedback by:&lt;/p>
&lt;ul>
&lt;li>Reaching out via &lt;a href="https://kubernetes.slack.com/archives/C09TP78DV">Slack (#sig-scheduling)&lt;/a>.&lt;/li>
&lt;li>Commenting on the &lt;a href="https://github.com/kubernetes/kubernetes/issues/132192">workload aware scheduling tracking issue&lt;/a>&lt;/li>
&lt;li>Filing a new &lt;a href="https://github.com/kubernetes/enhancements/issues">issue&lt;/a> in the Kubernetes repository.&lt;/li>
&lt;/ul>
&lt;h2 id="learn-more">Learn more&lt;/h2>
&lt;ul>
&lt;li>Read the KEPs for
&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/4671-gang-scheduling">Workload API and gang scheduling&lt;/a> and
&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/5598-opportunistic-batching">Opportunistic batching&lt;/a>.&lt;/li>
&lt;li>Track the &lt;a href="https://github.com/kubernetes/kubernetes/issues/132192">Workload aware scheduling issue&lt;/a>
for recent updates.&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.35: Fine-grained Supplemental Groups Control Graduates to GA</title><link>https://kubernetes.io/blog/2025/12/23/kubernetes-v1-35-fine-grained-supplementalgroups-control-ga/</link><pubDate>Tue, 23 Dec 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/12/23/kubernetes-v1-35-fine-grained-supplementalgroups-control-ga/</guid><description>
&lt;p>On behalf of Kubernetes SIG Node, we are pleased to announce the graduation of &lt;em>fine-grained supplemental groups control&lt;/em> to General Availability (GA) in Kubernetes v1.35!&lt;/p>
&lt;p>The new Pod field, &lt;code>supplementalGroupsPolicy&lt;/code>, was introduced as an opt-in alpha feature in Kubernetes v1.31, and then graduated to beta in v1.33.
Now, the feature is generally available.
This feature gives you more precise control over supplemental groups in Linux containers, which can strengthen your security posture, particularly when accessing volumes.
Moreover, it also enhances the transparency of UID/GID details in containers, offering improved security oversight.&lt;/p>
&lt;p>If you are planning to upgrade your cluster from v1.32 or an earlier version, please be aware that breaking behavioral changes were introduced in beta (v1.33).
For more details, see the &lt;a href="https://kubernetes.io/blog/2025/05/06/kubernetes-v1-33-fine-grained-supplementalgroups-control-beta/#the-behavioral-changes-introduced-in-beta">behavioral changes introduced in beta&lt;/a> and
the &lt;a href="https://kubernetes.io/blog/2025/05/06/kubernetes-v1-33-fine-grained-supplementalgroups-control-beta/#upgrade-consideration">upgrade considerations&lt;/a> sections of the previous blog for graduation to beta.&lt;/p>
&lt;h2 id="motivation-implicit-group-memberships-defined-in-etc-group-in-the-container-image">Motivation: Implicit group memberships defined in &lt;code>/etc/group&lt;/code> in the container image&lt;/h2>
&lt;p>Even though the majority of Kubernetes cluster admins/users may not be aware of this,
by default Kubernetes &lt;em>merges&lt;/em> group information from the Pod with information defined in &lt;code>/etc/group&lt;/code> in the container image.&lt;/p>
&lt;p>Here's an example: a Pod manifest that specifies &lt;code>spec.securityContext.runAsUser: 1000&lt;/code>, &lt;code>spec.securityContext.runAsGroup: 3000&lt;/code>, and &lt;code>spec.securityContext.supplementalGroups: [4000]&lt;/code> as part of the Pod's security context.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>implicit-groups-example&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">securityContext&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">runAsUser&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">1000&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">runAsGroup&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">3000&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">supplementalGroups&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#666">4000&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>example-container&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>registry.k8s.io/e2e-test-images/agnhost:2.45&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;sh&amp;#34;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;-c&amp;#34;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;sleep 1h&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">securityContext&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">allowPrivilegeEscalation&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">false&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>What is the result of the &lt;code>id&lt;/code> command in the &lt;code>example-container&lt;/code> container? The output should be similar to this:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-none" data-lang="none">uid=1000 gid=3000 groups=3000,4000,50000
&lt;/code>&lt;/pre>&lt;p>Where does group ID &lt;code>50000&lt;/code> in supplementary groups (&lt;code>groups&lt;/code> field) come from, even though &lt;code>50000&lt;/code> is not defined in the Pod's manifest at all? The answer is the &lt;code>/etc/group&lt;/code> file in the container image.&lt;/p>
&lt;p>Checking &lt;code>/etc/group&lt;/code> in the container image reveals something like the following:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-none" data-lang="none">user-defined-in-image:x:1000:
group-defined-in-image:x:50000:user-defined-in-image
&lt;/code>&lt;/pre>&lt;p>The last entry shows that the container's primary user &lt;code>1000&lt;/code> belongs to the group &lt;code>50000&lt;/code>.&lt;/p>
&lt;p>Thus, the group membership defined in &lt;code>/etc/group&lt;/code> in the container image for the container's primary user is &lt;em>implicitly&lt;/em> merged to the information from the Pod. Please note that this was a design decision the current CRI implementations inherited from Docker, and the community never really reconsidered it until now.&lt;/p>
&lt;h3 id="what-s-wrong-with-it">What's wrong with it?&lt;/h3>
&lt;p>The &lt;em>implicitly&lt;/em> merged group information from &lt;code>/etc/group&lt;/code> in the container image poses a security risk. These implicit GIDs can't be detected or validated by policy engines because there's no record of them in the Pod manifest. This can lead to unexpected access control issues, particularly when accessing volumes (see &lt;a href="https://issue.k8s.io/112879">kubernetes/kubernetes#112879&lt;/a> for details) because file permission is controlled by UID/GIDs in Linux.&lt;/p>
&lt;h2 id="fine-grained-supplemental-groups-control-in-a-pod-supplementarygroupspolicy">Fine-grained supplemental groups control in a Pod: &lt;code>supplementaryGroupsPolicy&lt;/code>&lt;/h2>
&lt;p>To tackle this problem, a Pod's &lt;code>.spec.securityContext&lt;/code> now includes a &lt;code>supplementalGroupsPolicy&lt;/code> field.&lt;/p>
&lt;p>This field lets you control how Kubernetes calculates the supplementary groups for container processes within a Pod. The available policies are:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;em>Merge&lt;/em>: The group membership defined in &lt;code>/etc/group&lt;/code> for the container's primary user will be merged. If not specified, this policy will be applied (i.e. as-is behavior for backward compatibility).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Strict&lt;/em>: Only the group IDs specified in &lt;code>fsGroup&lt;/code>, &lt;code>supplementalGroups&lt;/code>, or &lt;code>runAsGroup&lt;/code> are attached as supplementary groups to the container processes. Group memberships defined in &lt;code>/etc/group&lt;/code> for the container's primary user are ignored.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>I'll explain how the &lt;code>Strict&lt;/code> policy works. The following Pod manifest specifies &lt;code>supplementalGroupsPolicy: Strict&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>strict-supplementalgroups-policy-example&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">securityContext&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">runAsUser&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">1000&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">runAsGroup&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">3000&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">supplementalGroups&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#666">4000&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">supplementalGroupsPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Strict&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>example-container&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>registry.k8s.io/e2e-test-images/agnhost:2.45&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;sh&amp;#34;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;-c&amp;#34;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;sleep 1h&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">securityContext&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">allowPrivilegeEscalation&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">false&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The result of the &lt;code>id&lt;/code> command in the &lt;code>example-container&lt;/code> container should be similar to this:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-none" data-lang="none">uid=1000 gid=3000 groups=3000,4000
&lt;/code>&lt;/pre>&lt;p>You can see that the &lt;code>Strict&lt;/code> policy excludes group &lt;code>50000&lt;/code> from &lt;code>groups&lt;/code>!&lt;/p>
&lt;p>Thus, ensuring &lt;code>supplementalGroupsPolicy: Strict&lt;/code> (enforced by some policy mechanism) helps prevent the implicit supplementary groups in a Pod.&lt;/p>
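&lt;p>As an example of such a policy mechanism, a ValidatingAdmissionPolicy could reject Pods that do not opt in to &lt;code>Strict&lt;/code>. The sketch below is illustrative: the names are hypothetical, and a matching ValidatingAdmissionPolicyBinding is still required to enforce it.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml"># Hypothetical policy requiring supplementalGroupsPolicy: Strict on new Pods
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-strict-supplemental-groups
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE"]
      resources: ["pods"]
  validations:
  - expression: >-
      has(object.spec.securityContext) &amp;&amp;
      has(object.spec.securityContext.supplementalGroupsPolicy) &amp;&amp;
      object.spec.securityContext.supplementalGroupsPolicy == 'Strict'
    message: Pods must set supplementalGroupsPolicy to Strict
&lt;/code>&lt;/pre>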
&lt;div class="alert alert-info" role="alert">&lt;h4 class="alert-heading">Note:&lt;/h4>&lt;p>A container with sufficient privileges can change its process identity.
The &lt;code>supplementalGroupsPolicy&lt;/code> field only affects the initial process identity.&lt;/p>
&lt;p>Read on for more details.&lt;/p>
&lt;/div>
&lt;h2 id="attached-process-identity-in-pod-status">Attached process identity in Pod status&lt;/h2>
&lt;p>This feature also exposes the process identity attached to the first container process of each container
via the &lt;code>.status.containerStatuses[].user.linux&lt;/code> field. This is helpful for checking whether implicit group IDs are attached.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#00f;font-weight:bold">...&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">status&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containerStatuses&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ctr&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">user&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">linux&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">gid&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">3000&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">supplementalGroups&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#666">3000&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#666">4000&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">uid&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">1000&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">...&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;div class="alert alert-info" role="alert">&lt;h4 class="alert-heading">Note:&lt;/h4>&lt;p>Please note that the values in &lt;code>status.containerStatuses[].user.linux&lt;/code> field is &lt;em>the firstly attached&lt;/em>
process identity to the first container process in the container. If the container has sufficient privilege
to call system calls related to process identity (e.g. &lt;a href="https://man7.org/linux/man-pages/man2/setuid.2.html">&lt;code>setuid(2)&lt;/code>&lt;/a>, &lt;a href="https://man7.org/linux/man-pages/man2/setgid.2.html">&lt;code>setgid(2)&lt;/code>&lt;/a> or &lt;a href="https://man7.org/linux/man-pages/man2/setgroups.2.html">&lt;code>setgroups(2)&lt;/code>&lt;/a>, etc.), the container process can change its identity. Thus, the &lt;em>actual&lt;/em> process identity will be dynamic.&lt;/p>
&lt;p>There are several ways to restrict these permissions in containers. We suggest the following as simple solutions:&lt;/p>
&lt;ul>
&lt;li>setting &lt;code>privileged: false&lt;/code> and &lt;code>allowPrivilegeEscalation: false&lt;/code> in your container's &lt;code>securityContext&lt;/code>, or&lt;/li>
&lt;li>conforming your Pod to the &lt;a href="https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted">&lt;code>Restricted&lt;/code> policy in the Pod Security Standards&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Also, the kubelet has no visibility into NRI plugins or container runtime internals. A cluster administrator configuring nodes, or a highly privileged workload running with local administrator permissions, may change the supplemental groups for any Pod. However, this is outside the scope of Kubernetes control and should not be a concern for security-hardened nodes.&lt;/p>
&lt;/div>
&lt;h2 id="strict-policy-requires-up-to-date-container-runtimes">&lt;code>Strict&lt;/code> policy requires up-to-date container runtimes&lt;/h2>
&lt;p>The high-level container runtime (e.g. containerd, CRI-O) plays a key role in calculating the supplementary group IDs
that will be attached to the containers. Thus, &lt;code>supplementalGroupsPolicy: Strict&lt;/code> requires a CRI runtime that supports this feature.
The old behavior (&lt;code>supplementalGroupsPolicy: Merge&lt;/code>) can work with a CRI runtime that does not support this feature,
because this policy is fully backward compatible.&lt;/p>
&lt;p>Here are some CRI runtimes that support this feature, and the versions you need
to be running:&lt;/p>
&lt;ul>
&lt;li>containerd: v2.0 or later&lt;/li>
&lt;li>CRI-O: v1.31 or later&lt;/li>
&lt;/ul>
&lt;p>You can check whether the feature is supported via the Node's &lt;code>.status.features.supplementalGroupsPolicy&lt;/code> field. Please note that this field is different from the &lt;code>status.declaredFeatures&lt;/code> field introduced in &lt;a href="https://github.com/kubernetes/enhancements/issues/5328">KEP-5328: Node Declared Features (formerly Node Capabilities)&lt;/a>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Node&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">...&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">status&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">features&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">supplementalGroupsPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">true&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>As container runtime support for this feature becomes universal, various security policies may start enforcing the more secure &lt;code>Strict&lt;/code> behavior. It is best practice to ensure that your Pods are ready for this enforcement and that all supplemental groups are declared transparently in the Pod spec rather than in images.&lt;/p>
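&lt;p>To check a node from the command line, you can query the field directly (a one-liner sketch; replace the node name):&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-bash" data-lang="bash"># Prints "true" if the node's runtime supports supplementalGroupsPolicy
kubectl get node my-node -o jsonpath='{.status.features.supplementalGroupsPolicy}'
&lt;/code>&lt;/pre>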
&lt;h2 id="getting-involved">Getting involved&lt;/h2>
&lt;p>This enhancement was driven by the &lt;a href="https://github.com/kubernetes/community/tree/master/sig-node">SIG Node&lt;/a> community.
Please join us to connect with the community and share your ideas and feedback around the above feature and
beyond. We look forward to hearing from you!&lt;/p>
&lt;h2 id="how-can-i-learn-more">How can I learn more?&lt;/h2>
&lt;!-- https://github.com/kubernetes/website/pull/46920 -->
&lt;ul>
&lt;li>&lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/security-context/">Configure a Security Context for a Pod or Container&lt;/a>
for the further details of &lt;code>supplementalGroupsPolicy&lt;/code>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/enhancements/issues/3619">KEP-3619: Fine-grained SupplementalGroups control&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.35: Kubelet Configuration Drop-in Directory Graduates to GA</title><link>https://kubernetes.io/blog/2025/12/22/kubernetes-v1-35-kubelet-config-drop-in-directory-ga/</link><pubDate>Mon, 22 Dec 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/12/22/kubernetes-v1-35-kubelet-config-drop-in-directory-ga/</guid><description>
&lt;p>With the recent v1.35 release of Kubernetes, support for a kubelet configuration drop-in directory is generally available.
The newly stable feature simplifies the management of kubelet configuration across large, heterogeneous clusters.&lt;/p>
&lt;p>With v1.35, the kubelet command line argument &lt;code>--config-dir&lt;/code> is production-ready and fully supported,
allowing you to specify a directory containing kubelet configuration drop-in files.
All files in that directory will be automatically merged with your main kubelet configuration.
This allows cluster administrators to maintain a cohesive &lt;em>base configuration&lt;/em> for kubelets while enabling targeted customizations for different node groups or use cases, without complex tooling or manual configuration management.&lt;/p>
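&lt;p>In practice, that can look like the following (paths are illustrative; kubelet flags are usually set by your init system or bootstrap tooling):&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-bash" data-lang="bash"># Base configuration plus a drop-in directory; files in the directory
# are merged over the main configuration
kubelet \
  --config=/etc/kubernetes/kubelet-config.yaml \
  --config-dir=/etc/kubernetes/kubelet.conf.d
&lt;/code>&lt;/pre>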
&lt;h2 id="the-problem-managing-kubelet-configuration-at-scale">The problem: managing kubelet configuration at scale&lt;/h2>
&lt;p>As Kubernetes clusters grow larger and more complex, they often include heterogeneous node pools with different hardware capabilities, workload requirements, and operational constraints. This diversity necessitates different kubelet configurations across node groups—yet managing these varied configurations at scale becomes increasingly challenging. Several pain points emerge:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Configuration drift&lt;/strong>: Different nodes may have slightly different configurations, leading to inconsistent behavior&lt;/li>
&lt;li>&lt;strong>Node group customization&lt;/strong>: GPU nodes, edge nodes, and standard compute nodes often require different kubelet settings&lt;/li>
&lt;li>&lt;strong>Operational overhead&lt;/strong>: Maintaining separate, complete configuration files for each node type is error-prone and difficult to audit&lt;/li>
&lt;li>&lt;strong>Change management&lt;/strong>: Rolling out configuration changes across heterogeneous node pools requires careful coordination&lt;/li>
&lt;/ul>
&lt;p>Before this support was added to Kubernetes, cluster administrators had to choose between using a single monolithic configuration file for all nodes,
manually maintaining multiple complete configuration files, or relying on separate tooling. Each approach had its own drawbacks.
This graduation to stable gives cluster administrators a fully supported fourth way to solve that challenge.&lt;/p>
&lt;h2 id="example-use-cases">Example use cases&lt;/h2>
&lt;h3 id="managing-heterogeneous-node-pools">Managing heterogeneous node pools&lt;/h3>
&lt;p>Consider a cluster with multiple node types: standard compute nodes, high-capacity nodes (such as those with GPUs or large amounts of memory), and edge nodes with specialized requirements.&lt;/p>
&lt;h4 id="base-configuration">Base configuration&lt;/h4>
&lt;p>File: &lt;code>00-base.conf&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubelet.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>KubeletConfiguration&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">clusterDNS&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#b44">&amp;#34;10.96.0.10&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">clusterDomain&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>cluster.local&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="high-capacity-node-override">High-capacity node override&lt;/h4>
&lt;p>File: &lt;code>50-high-capacity-nodes.conf&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubelet.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>KubeletConfiguration&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">maxPods&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">50&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">systemReserved&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;4Gi&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">cpu&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;1000m&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="edge-node-override">Edge node override&lt;/h4>
&lt;p>File: &lt;code>50-edge-nodes.conf&lt;/code> (edge compute typically has lower capacity)&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubelet.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>KubeletConfiguration&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">evictionHard&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory.available&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;500Mi&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">nodefs.available&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;5%&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>With this structure, high-capacity nodes apply both the base configuration and the capacity-specific overrides, while edge nodes apply the base configuration with edge-specific settings.&lt;/p>
&lt;h3 id="gradual-configuration-rollouts">Gradual configuration rollouts&lt;/h3>
&lt;p>When rolling out configuration changes, you can:&lt;/p>
&lt;ol>
&lt;li>Add a new drop-in file with a high numeric prefix (e.g., &lt;code>99-new-feature.conf&lt;/code>; see the sketch after this list)&lt;/li>
&lt;li>Test the changes on a subset of nodes&lt;/li>
&lt;li>Gradually roll out to more nodes&lt;/li>
&lt;li>Once stable, merge changes into the base configuration&lt;/li>
&lt;/ol>
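&lt;p>As a sketch of step 1, a drop-in file such as &lt;code>99-new-feature.conf&lt;/code> might toggle a single setting on the test nodes before being folded into the base configuration. The setting shown here (&lt;code>serializeImagePulls&lt;/code>) is purely illustrative:&lt;/p>
&lt;pre>&lt;code class="language-yaml"># 99-new-feature.conf - merged last because of its high numeric prefix
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
serializeImagePulls: false
&lt;/code>&lt;/pre>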
&lt;h2 id="viewing-the-merged-configuration">Viewing the merged configuration&lt;/h2>
&lt;p>Since configuration is now spread across multiple files, you can inspect the final merged configuration using the kubelet's &lt;code>/configz&lt;/code> endpoint:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># Start kubectl proxy&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>kubectl proxy
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># In another terminal, fetch the merged configuration&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># Change the &amp;#39;&amp;lt;node-name&amp;gt;&amp;#39; placeholder before running the curl command&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>curl -X GET http://127.0.0.1:8001/api/v1/nodes/&amp;lt;node-name&amp;gt;/proxy/configz | jq .
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This shows the actual configuration the kubelet is using after all merging has been applied.
The merged configuration also includes any configuration settings that were specified via kubelet command-line arguments.&lt;/p>
&lt;p>For detailed setup instructions, configuration examples, and merging behavior, see the official documentation:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/#kubelet-conf-d">Set Kubelet Parameters Via A Configuration File&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kubernetes.io/docs/reference/node/kubelet-config-directory-merging/">Kubelet Configuration Directory Merging&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="good-practices">Good practices&lt;/h2>
&lt;p>When using the kubelet configuration drop-in directory:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Test configurations incrementally&lt;/strong>: Always test new drop-in configurations on a subset of nodes before rolling out cluster-wide to minimize risk&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Version control your drop-ins&lt;/strong>: Store your drop-in configuration files in version control (or the configuration source from which these are generated) alongside your infrastructure as code to track changes and enable easy rollbacks&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Use numeric prefixes for predictable ordering&lt;/strong>: Name files with numeric prefixes (e.g., &lt;code>00-&lt;/code>, &lt;code>50-&lt;/code>, &lt;code>90-&lt;/code>) to explicitly control merge order and make the configuration layering obvious to other administrators&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Be mindful of temporary files&lt;/strong>: Some text editors automatically create backup files (such as &lt;code>.bak&lt;/code>, &lt;code>.swp&lt;/code>, or files with &lt;code>~&lt;/code> suffix) in the same directory when editing. Ensure these temporary or backup files are not left in the configuration directory, as they may be processed by the kubelet&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="acknowledgments">Acknowledgments&lt;/h2>
&lt;p>This feature was developed through the collaborative efforts of &lt;a href="https://github.com/kubernetes/community/tree/master/sig-node">SIG Node&lt;/a>. Special thanks to all contributors who helped design, implement, test, and document this feature across its journey from alpha in v1.28, through beta in v1.30, to GA in v1.35.&lt;/p>
&lt;p>To provide feedback on this feature, join the &lt;a href="https://github.com/kubernetes/community/tree/master/sig-node">Kubernetes Node Special Interest Group&lt;/a>, participate in discussions on the &lt;a href="http://slack.k8s.io/">public Slack channel&lt;/a> (#sig-node), or file an issue on &lt;a href="https://github.com/kubernetes/kubernetes/issues">GitHub&lt;/a>.&lt;/p>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>If you have feedback or questions about kubelet configuration management, or want to share your experience using this feature, join the discussion:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/kubernetes/community/tree/master/sig-node">SIG Node community page&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://slack.k8s.io/">Kubernetes Slack&lt;/a> in the #sig-node channel&lt;/li>
&lt;li>&lt;a href="https://groups.google.com/forum/#!forum/kubernetes-sig-node">SIG Node mailing list&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>SIG Node would love to hear about your experiences using this feature in production!&lt;/p></description></item><item><title>Avoiding Zombie Cluster Members When Upgrading to etcd v3.6</title><link>https://kubernetes.io/blog/2025/12/21/preventing-etcd-zombies/</link><pubDate>Sun, 21 Dec 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/12/21/preventing-etcd-zombies/</guid><description>
&lt;p>&lt;em>This article is a mirror of an &lt;a href="https://etcd.io/blog/2025/zombie_members_upgrade/">original&lt;/a> that was recently published to the official etcd blog&lt;/em>.
The &lt;a href="https://etcd.io/blog/2025/zombie_members_upgrade/#key-takeaway">key takeaway&lt;/a>?
Always upgrade to etcd v3.5.26 or later before moving to v3.6. This ensures your cluster is automatically repaired and avoids zombie members.&lt;/p>
&lt;h2 id="issue-summary">Issue summary&lt;/h2>
&lt;p>Recently, the etcd community addressed an issue that may appear when users &lt;a href="https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/">upgrade from v3.5 to v3.6&lt;/a>. This bug can cause the cluster to report &amp;quot;zombie members&amp;quot;, which are etcd nodes that were removed from the database cluster some time ago, and are re-appearing and joining database consensus. The etcd cluster is then inoperable until these zombie members are removed.&lt;/p>
&lt;p>In etcd v3.5 and earlier, the v2store was the source of truth for membership data, even though the v3store was also present. As a part of our &lt;a href="https://github.com/etcd-io/etcd/issues/12913">v2store deprecation plan&lt;/a>, in v3.6 the v3store is the source of truth for cluster membership. Through a &lt;a href="https://github.com/etcd-io/etcd/issues/20967">bug report&lt;/a> we found out that, in some older clusters, v2store and v3store could become inconsistent. This inconsistency manifests after upgrading as seeing old, removed &amp;quot;zombie&amp;quot; cluster members re-appearing in the cluster.&lt;/p>
&lt;h2 id="the-fix-and-upgrade-path">The fix and upgrade path&lt;/h2>
&lt;p>We’ve added a &lt;a href="https://github.com/etcd-io/etcd/pull/20995">mechanism in etcd v3.5.26&lt;/a> to automatically sync v3store from v2store, ensuring that affected clusters are repaired before upgrading to 3.6.x.&lt;/p>
&lt;p>To support the many users currently upgrading to 3.6, we have provided the following safe upgrade path:&lt;/p>
&lt;ol>
&lt;li>Upgrade your cluster to &lt;a href="https://github.com/etcd-io/etcd/releases/tag/v3.5.26">v3.5.26&lt;/a> or later.&lt;/li>
&lt;li>Wait and confirm that all members are healthy post-update (for example, using the checks sketched after this list).&lt;/li>
&lt;li>Upgrade to v3.6.&lt;/li>
&lt;/ol>
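&lt;p>For step 2, a minimal health check might look like the following sketch, assuming &lt;code>etcdctl&lt;/code> is already configured with your cluster's endpoints and client certificates:&lt;/p>
&lt;pre>&lt;code class="language-bash"># Verify that every member reports healthy after the v3.5.26 rollout
etcdctl endpoint health --cluster -w table

# Confirm that the member list contains only the expected members
etcdctl member list -w table
&lt;/code>&lt;/pre>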
&lt;p>We are unable to provide a safe workaround for users who cannot update to v3.5.26. As such, if v3.5.26 is not available from your packaging source or vendor, you should delay upgrading to v3.6 until it is.&lt;/p>
&lt;h2 id="additional-technical-detail">Additional technical detail&lt;/h2>
&lt;p>&lt;strong>Information below is offered for reference only. Users can follow the safe upgrade path without knowledge of the following details.&lt;/strong>&lt;/p>
&lt;p>This issue is encountered with clusters that have been running in production on etcd v3.5.25 or earlier. It is a side effect of adding and removing members from the cluster, or recovering the cluster from failure. The issue therefore becomes more likely the older the etcd cluster is, but it cannot be ruled out for any cluster regardless of age.&lt;/p>
&lt;p>etcd maintainers, working with issue reporters, have found three possible triggers for the issue based on symptoms and an analysis of etcd code and logs:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Bug in &lt;code>etcdctl snapshot restore&lt;/code> (v3.4 and older versions)&lt;/strong>: When restoring a snapshot using &lt;code>etcdctl snapshot restore&lt;/code>, etcdctl was supposed to remove existing members before adding the new ones. In v3.4, due to a bug, old members were not removed, resulting in zombie members. Refer to the &lt;a href="https://github.com/etcd-io/etcd/issues/20967#issuecomment-3618010356">comment on etcdctl&lt;/a>.&lt;/li>
&lt;li>&lt;strong>&lt;code>--force-new-cluster&lt;/code> in v3.5 and earlier versions&lt;/strong>: In rare cases, forcibly creating a new single-member cluster did not fully remove old members, leaving zombies. The issue was &lt;a href="https://github.com/etcd-io/etcd/pull/20339">resolved&lt;/a> in v3.5.22. Please refer to &lt;a href="https://github.com/etcd-io/raft/pull/300">this PR&lt;/a> in the Raft project for detailed technical information.&lt;/li>
&lt;li>&lt;strong>&lt;code>--unsafe-no-sync&lt;/code> enabled&lt;/strong>: If &lt;code>--unsafe-no-sync&lt;/code> is enabled, in rare cases etcd might persist a membership change to v3store but crash before writing it to the WAL, causing inconsistency between v2store and v3store. This is a problem for single-member clusters. For multi-member clusters, forcibly creating a new single-member cluster from the crashed node’s data may lead to zombie members.&lt;/li>
&lt;/ol>
&lt;div class="alert alert-info" role="alert">
&lt;h4 class="alert-heading">Note&lt;/h4>
&lt;code>--unsafe-no-sync&lt;/code> is generally not recommended, as it may break the guarantees given by the consensus protocol.
&lt;/div>
&lt;p>Importantly, there may be other triggers for v2store and v3store membership data becoming inconsistent that we have not yet found. This means that you cannot assume that you are safe just because you have not performed any of the three actions above.
Once users are upgraded to etcd v3.6, v3store becomes the source of membership data, and further inconsistency is not possible.&lt;/p>
&lt;p>Advanced users who want to verify the consistency between v2store and v3store can follow the steps described in this &lt;a href="https://github.com/etcd-io/etcd/issues/20967#issuecomment-3590609775">comment&lt;/a>. This check is not required to fix the issue, nor does SIG etcd recommend bypassing the v3.5.26 update regardless of the results of the check.&lt;/p>
&lt;h2 id="key-takeaway">Key takeaway&lt;/h2>
&lt;p>Always upgrade to &lt;a href="https://github.com/etcd-io/etcd/releases/tag/v3.5.26">v3.5.26&lt;/a> or later before moving to v3.6. This ensures your cluster is automatically repaired and avoids zombie members.&lt;/p>
&lt;h2 id="acknowledgements">Acknowledgements&lt;/h2>
&lt;p>We would like to thank &lt;a href="https://github.com/thechristschn">Christian Baumann&lt;/a> for reporting this long-standing upgrade issue. His report and follow-up work helped bring the issue to our attention so that we could investigate and resolve it upstream.&lt;/p></description></item><item><title>Kubernetes 1.35: In-Place Pod Resize Graduates to Stable</title><link>https://kubernetes.io/blog/2025/12/19/kubernetes-v1-35-in-place-pod-resize-ga/</link><pubDate>Fri, 19 Dec 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/12/19/kubernetes-v1-35-in-place-pod-resize-ga/</guid><description>
&lt;p>This release marks a major step: more than 6 years after its initial conception,
the &lt;strong>In-Place Pod Resize&lt;/strong> feature (also known as In-Place Pod Vertical Scaling), first introduced as
alpha in Kubernetes v1.27 and promoted to beta in Kubernetes v1.33, is now &lt;strong>stable (GA)&lt;/strong> in Kubernetes
1.35!&lt;/p>
&lt;p>This graduation is a major milestone for improving resource efficiency and flexibility for workloads
running on Kubernetes.&lt;/p>
&lt;h2 id="what-is-in-place-pod-resize">What is in-place Pod Resize?&lt;/h2>
&lt;p>In the past, the CPU and memory resources allocated to a container in a Pod were immutable. This meant changing
them required deleting and recreating the entire Pod. For stateful services, batch jobs, or latency-sensitive
workloads, this was an incredibly disruptive operation.&lt;/p>
&lt;p>In-Place Pod Resize makes CPU and memory requests and limits mutable, allowing you to adjust these resources
within a running Pod, often without requiring a container restart.&lt;/p>
&lt;p>&lt;strong>Key Concept:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Desired Resources:&lt;/strong> A container's &lt;code>spec.containers[*].resources&lt;/code> field now represents the desired
resources. For CPU and memory, these fields are now mutable.&lt;/li>
&lt;li>&lt;strong>Actual Resources:&lt;/strong> The &lt;code>status.containerStatuses[*].resources&lt;/code> field reflects the resources currently
configured for a running container.&lt;/li>
&lt;li>&lt;strong>Triggering a Resize:&lt;/strong> You can request a resize by updating the desired &lt;code>requests&lt;/code>
and &lt;code>limits&lt;/code> in the Pod's specification via the new &lt;code>resize&lt;/code> subresource, as shown in the example after this list.&lt;/li>
&lt;/ul>
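&lt;p>For example, the following sketch resizes the CPU request of a running Pod through the &lt;code>resize&lt;/code> subresource; the Pod name &lt;code>my-app&lt;/code> and container name &lt;code>app&lt;/code> are placeholders:&lt;/p>
&lt;pre>&lt;code class="language-bash"># Patch the desired resources via the 'resize' subresource
kubectl patch pod my-app --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"}}}]}}'

# Inspect status.containerStatuses[*].resources to see the actual allocation
kubectl get pod my-app -o yaml
&lt;/code>&lt;/pre>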
&lt;h2 id="how-can-i-start-using-in-place-pod-resize">How can I start using in-place Pod Resize?&lt;/h2>
&lt;p>Detailed usage instructions and examples are provided in the official documentation:
&lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/">Resize CPU and Memory Resources assigned to Containers&lt;/a>.&lt;/p>
&lt;h2 id="how-does-this-help-me">How does this help me?&lt;/h2>
&lt;p>In-place Pod Resize is a foundational building block that unlocks seamless, vertical autoscaling and
improvements to workload efficiency.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Resources adjusted without disruption&lt;/strong> Workloads sensitive to latency or restarts can have their resources
modified in-place without downtime or loss of state.&lt;/li>
&lt;li>&lt;strong>More powerful autoscaling&lt;/strong> Autoscalers can now adjust resources with less
impact. For example, Vertical Pod Autoscaler (VPA)'s &lt;code>InPlaceOrRecreate&lt;/code> update mode, which leverages this
feature, has graduated to beta. This allows resources to be adjusted automatically and seamlessly based on
usage with minimal disruption.
&lt;ul>
&lt;li>See &lt;a href="https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support">AEP-4016&lt;/a> for more details.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Address transient resource needs&lt;/strong> Workloads that temporarily need more resources can be adjusted quickly. This enables features like the CPU Startup Boost (&lt;a href="https://github.com/kubernetes/autoscaler/pull/7863">AEP-7862&lt;/a>) where applications can request more CPU during startup and then automatically scale back down.&lt;/li>
&lt;/ul>
&lt;p>Here are a few example use cases:&lt;/p>
&lt;ul>
&lt;li>A game server that needs to adjust its size with shifting player count.&lt;/li>
&lt;li>A pre-warmed worker that can be shrunk while unused but inflated on the first request.&lt;/li>
&lt;li>A service that scales dynamically with load for efficient bin-packing.&lt;/li>
&lt;li>A runtime that needs increased resources for JIT compilation at startup.&lt;/li>
&lt;/ul>
&lt;h2 id="changes-between-beta-1-33-and-stable-1-35">Changes between beta (1.33) and stable (1.35)&lt;/h2>
&lt;p>Since the initial beta in v1.33, development effort has primarily been around stabilizing the feature and
improving its usability based on community feedback. Here are the primary changes for the stable release:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Memory limit decrease&lt;/strong> Decreasing memory limits was previously prohibited. This restriction has been
lifted, and memory limit decreases are now permitted. The Kubelet attempts to prevent OOM-kills by allowing the
resize only if the current memory usage is below the new desired limit. However, this check is best-effort and
not guaranteed.&lt;/li>
&lt;li>&lt;strong>Prioritized resizes&lt;/strong> If a node doesn't have enough room to accept all resize requests, &lt;em>Deferred&lt;/em> resizes
are reattempted based on the following priority:
&lt;ul>
&lt;li>PriorityClass&lt;/li>
&lt;li>QoS class&lt;/li>
&lt;li>How long the resize has been &lt;em>Deferred&lt;/em>, with older requests prioritized first&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Pod Level Resources (Alpha)&lt;/strong> Support for in-place Pod Resize with Pod Level Resources has been introduced
behind its own feature gate, which is alpha in v1.35.&lt;/li>
&lt;li>&lt;strong>Increased observability&lt;/strong>: There are now new Kubelet metrics and Pod events specifically associated with
In-Place Pod Resize to help users track and debug resource changes.&lt;/li>
&lt;/ul>
&lt;h2 id="what-s-next">What's next?&lt;/h2>
&lt;p>The graduation of In-Place Pod Resize to stable opens the door for powerful integrations across the Kubernetes
ecosystem. There are several areas for further improvement that are currently planned.&lt;/p>
&lt;h3 id="integration-with-autoscalers-and-other-projects">Integration with autoscalers and other projects&lt;/h3>
&lt;p>There are planned integrations with several autoscalers and other projects to improve workload efficiency at a larger scale. Some projects under discussion:&lt;/p>
&lt;ul>
&lt;li>VPA CPU startup boost (&lt;a href="https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/7862-cpu-startup-boost">AEP-7862&lt;/a>): Allows applications to request more CPU at startup and scale back down after a specific period of time.&lt;/li>
&lt;li>VPA Support for in-place updates (&lt;a href="https://github.com/kubernetes/autoscaler/tree/455d29039bf6b1eb9f784f498f28769a8698bc21/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support">AEP-4016&lt;/a>): VPA support for &lt;code>InPlaceOrRecreate&lt;/code> has recently graduated to beta, with the eventual goal being to graduate
the feature to stable. Support for &lt;code>InPlace&lt;/code> mode is still being worked on; see &lt;a href="https://github.com/kubernetes/autoscaler/pull/8818">this pull request&lt;/a>.&lt;/li>
&lt;li>Ray autoscaler: Plans to leverage In-Place Pod Resize to improve workload efficiency. See &lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/ray-on-gke-new-features-for-ai-scheduling-and-scaling">this Google Cloud blog post&lt;/a> for more details.&lt;/li>
&lt;li>Agent-sandbox &amp;quot;Soft-Pause&amp;quot;: Investigating leveraging in-place Pod Resize for improved latency. See the &lt;a href="https://github.com/kubernetes-sigs/agent-sandbox/issues/103">GitHub issue&lt;/a> for more details.&lt;/li>
&lt;li>Runtime support: Java and Python runtimes do not support resizing memory without a restart. There is an open
conversation with the Java developers; see &lt;a href="https://bugs.openjdk.org/browse/JDK-8359211">this OpenJDK issue&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>If you have a project that could benefit from integration with in-place pod resize, please reach out using
the channels listed in the feedback section!&lt;/p>
&lt;h3 id="feature-expansion">Feature expansion&lt;/h3>
&lt;p>Today, In-Place Pod Resize is prohibited when used in combination with: swap, the static CPU Manager, and the
static Memory Manager. Additionally, resources other than CPU and memory are still immutable. Expanding the set
of supported features and resources is under consideration as more feedback about community needs comes in.&lt;/p>
&lt;p>There are also plans to support workload preemption; if there is not enough room on the node for the resize of
a high-priority pod, the goal is to enable policies to automatically evict a lower-priority pod or upsize
the node.&lt;/p>
&lt;h3 id="improved-stability">Improved stability&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Resolve kubelet-scheduler race conditions&lt;/strong> There are known race conditions between the kubelet and
scheduler with regards to in-place pod resize. Work is underway to resolve these issues over the next few releases. See the &lt;a href="https://github.com/kubernetes/kubernetes/issues/126891">issue&lt;/a> for more details.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Safer memory limit decrease&lt;/strong> The Kubelet's best-effort check for OOM-kill prevention can be made even
safer by moving the memory usage check into the container runtime itself. See the &lt;a href="https://github.com/kubernetes/kubernetes/issues/135670">issue&lt;/a> for more details.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="providing-feedback">Providing feedback&lt;/h2>
&lt;p>We are looking to build further on this foundational feature, so please share your feedback on how to improve and
extend it. You can share your feedback through GitHub issues, mailing lists, or the Slack channels of
the Kubernetes &lt;a href="https://kubernetes.slack.com/archives/C0BP8PW9G">#sig-node&lt;/a> and &lt;a href="https://kubernetes.slack.com/archives/C09R1LV8S">#sig-autoscaling&lt;/a> communities.&lt;/p>
&lt;p>Thank you to everyone who contributed to making this long-awaited feature a reality!&lt;/p></description></item><item><title>Kubernetes v1.35: Job Managed By Goes GA</title><link>https://kubernetes.io/blog/2025/12/18/kubernetes-v1-35-job-managedby-for-jobs-goes-ga/</link><pubDate>Thu, 18 Dec 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/12/18/kubernetes-v1-35-job-managedby-for-jobs-goes-ga/</guid><description>
&lt;p>In Kubernetes v1.35, the ability to specify an external Job controller (through &lt;code>.spec.managedBy&lt;/code>) graduates to General Availability.&lt;/p>
&lt;p>This feature allows external controllers to take full responsibility for Job reconciliation, unlocking powerful scheduling patterns like multi-cluster dispatching with &lt;a href="https://kueue.sigs.k8s.io/docs/concepts/multikueue/">MultiKueue&lt;/a>.&lt;/p>
&lt;h2 id="why-delegate-job-reconciliation">Why delegate Job reconciliation?&lt;/h2>
&lt;p>The primary motivation for this feature is to support multi-cluster batch scheduling architectures, such as MultiKueue.&lt;/p>
&lt;p>The MultiKueue architecture distinguishes between a Management Cluster and a pool of Worker Clusters:&lt;/p>
&lt;ul>
&lt;li>The Management Cluster is responsible for dispatching Jobs but not executing them. It needs to accept Job objects to track status, but it skips the creation and execution of Pods.&lt;/li>
&lt;li>The Worker Clusters receive the dispatched Jobs and execute the actual Pods.&lt;/li>
&lt;li>Users usually interact with the Management Cluster. Because the status is automatically propagated back, they can observe the Job's progress &amp;quot;live&amp;quot; without accessing the Worker Clusters.&lt;/li>
&lt;li>In the Worker Clusters, the dispatched Jobs run as regular Jobs managed by the built-in Job controller, with no &lt;code>.spec.managedBy&lt;/code> set.&lt;/li>
&lt;/ul>
&lt;p>By using &lt;code>.spec.managedBy&lt;/code>, the MultiKueue controller on the Management Cluster can take over the reconciliation of a Job. It copies the status from the &amp;quot;mirror&amp;quot; Job running on the Worker Cluster back to the Management Cluster.&lt;/p>
&lt;p>Why not just disable the Job controller? While one could theoretically achieve this by disabling the built-in Job controller entirely, this is often impossible or impractical for two reasons:&lt;/p>
&lt;ol>
&lt;li>Managed Control Planes: In many cloud environments, the Kubernetes control plane is locked, and users cannot modify controller manager flags.&lt;/li>
&lt;li>Hybrid Cluster Role: Users often need a &amp;quot;hybrid&amp;quot; mode where the Management Cluster dispatches some heavy workloads to remote clusters but still executes smaller or control-plane-related Jobs in the Management Cluster. &lt;code>.spec.managedBy&lt;/code> allows this granularity on a per-Job basis.&lt;/li>
&lt;/ol>
&lt;h2 id="how-spec-managedby-works">How &lt;code>.spec.managedBy&lt;/code> works&lt;/h2>
&lt;p>The &lt;code>.spec.managedBy&lt;/code> field indicates which controller is responsible for the Job. Specifically, there are two modes of operation:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Standard&lt;/strong>: If unset or set to the reserved value &lt;code>kubernetes.io/job-controller&lt;/code>, the built-in Job controller reconciles the Job as usual (standard behavior).&lt;/li>
&lt;li>&lt;strong>Delegation&lt;/strong>: If set to any other value, the built-in Job controller skips reconciliation entirely for that Job.&lt;/li>
&lt;/ul>
&lt;p>To prevent orphaned Pods or resource leaks, this field is immutable. You cannot transfer a running Job from one controller to another.&lt;/p>
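&lt;p>A minimal sketch of a delegated Job follows; the controller name and image are placeholders:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: batch/v1
kind: Job
metadata:
  name: dispatched-job
spec:
  managedBy: example.com/multi-cluster-dispatcher  # any non-reserved value delegates the Job
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example/worker:latest      # placeholder image
&lt;/code>&lt;/pre>
&lt;p>With this manifest, the built-in Job controller ignores the Job entirely, and the named external controller is expected to reconcile it and keep its status conformant with the Job API.&lt;/p>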
&lt;p>If you are looking into implementing an external controller, be aware that your controller needs to be conformant with the definitions for the &lt;a href="https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/job-v1/">Job API&lt;/a>.
To enforce that conformance, a significant part of the effort went into introducing extensive Job status validation rules.
Navigate to the &lt;a href="#how-can-you-learn-more">How can you learn more?&lt;/a> section for more details.&lt;/p>
&lt;h2 id="ecosystem-adoption">Ecosystem Adoption&lt;/h2>
&lt;p>The &lt;code>.spec.managedBy&lt;/code> field is rapidly becoming the standard interface for delegating control in the Kubernetes batch ecosystem.&lt;/p>
&lt;p>Various custom workload controllers are adding this field (or an equivalent) to allow MultiKueue to take over their reconciliation and orchestrate them across clusters:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/kubernetes-sigs/jobset">JobSet&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.kubeflow.org/docs/components/training/">Kubeflow Trainer&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.ray.io/en/latest/cluster/kubernetes/">KubeRay&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://project-codeflare.github.io/appwrapper/">AppWrapper&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://tekton.dev/docs/">Tekton Pipelines&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>While it is possible to use &lt;code>.spec.managedBy&lt;/code> to implement a custom Job controller from scratch, we haven't observed that yet. The feature is specifically designed to support delegation patterns, like MultiKueue, without reinventing the wheel.&lt;/p>
&lt;h2 id="how-can-you-learn-more">How can you learn more?&lt;/h2>
&lt;p>If you want to dig deeper:&lt;/p>
&lt;p>Read the user-facing documentation for:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Jobs&lt;/a>,&lt;/li>
&lt;li>&lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#delegation-of-managing-a-job-object-to-external-controller">Delegation of managing a Job object to an external controller&lt;/a>, and&lt;/li>
&lt;li>&lt;a href="https://kueue.sigs.k8s.io/docs/concepts/multikueue/">MultiKueue&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Deep dive into the design history:&lt;/p>
&lt;ul>
&lt;li>The Kubernetes Enhancement Proposal (KEP) &lt;a href="https://github.com/kubernetes/enhancements/issues/4368">Job's managed-by mechanism&lt;/a> including introduction of the extensive &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/4368-support-managed-by-for-batch-jobs#job-status-validation">Job status validation rules&lt;/a>.&lt;/li>
&lt;li>The Kueue KEP for &lt;a href="https://github.com/kubernetes-sigs/kueue/tree/main/keps/693-multikueue">MultiKueue&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Explore how MultiKueue uses &lt;code>.spec.managedBy&lt;/code> in practice in the task guide for &lt;a href="https://kueue.sigs.k8s.io/docs/tasks/run/multikueue/job/">running Jobs across clusters&lt;/a>.&lt;/p>
&lt;h2 id="acknowledgments">Acknowledgments&lt;/h2>
&lt;p>As with any Kubernetes feature, a lot of people helped shape this one through design discussions, reviews, test runs,
and bug reports.&lt;/p>
&lt;p>We would like to thank, in particular:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/soltysh">Maciej Szulik&lt;/a> - for guidance, mentorship, and reviews.&lt;/li>
&lt;li>&lt;a href="https://github.com/atiratree">Filip Křepinský&lt;/a> - for guidance, mentorship, and reviews.&lt;/li>
&lt;/ul>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>This work was sponsored by the Kubernetes
&lt;a href="https://github.com/kubernetes/community/tree/master/wg-batch">Batch Working Group&lt;/a>
in close collaboration with the
&lt;a href="https://github.com/kubernetes/community/tree/master/sig-apps">SIG Apps&lt;/a>,
and with strong input from the
&lt;a href="https://github.com/kubernetes/community/tree/master/sig-scheduling">SIG Scheduling&lt;/a> community.&lt;/p>
&lt;p>If you are interested in batch scheduling, multi-cluster solutions, or further improving the Job API:&lt;/p>
&lt;ul>
&lt;li>Join us in the Batch WG and SIG Apps meetings.&lt;/li>
&lt;li>Subscribe to the &lt;a href="https://kubernetes.slack.com/messages/wg-batch">WG Batch Slack channel&lt;/a>.&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.35: Timbernetes (The World Tree Release)</title><link>https://kubernetes.io/blog/2025/12/17/kubernetes-v1-35-release/</link><pubDate>Wed, 17 Dec 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/12/17/kubernetes-v1-35-release/</guid><description>
&lt;p>&lt;strong>Editors&lt;/strong>: Aakanksha Bhende, Arujjwal Negi, Chad M. Crowell, Graziano Casto, Swathi Rao&lt;/p>
&lt;p>Similar to previous releases, the release of Kubernetes v1.35 introduces new stable, beta, and alpha features. The consistent delivery of high-quality releases underscores the strength of our development cycle and the vibrant support from our community.&lt;/p>
&lt;p>This release consists of 60 enhancements, including 17 stable, 19 beta, and 22 alpha features.&lt;/p>
&lt;p>There are also some &lt;a href="#deprecations-removals-and-community-updates">deprecations and removals&lt;/a> in this release; make sure to read about those.&lt;/p>
&lt;h2 id="release-theme-and-logo">Release theme and logo&lt;/h2>
&lt;figure class="release-logo ">
&lt;img src="https://kubernetes.io/blog/2025/12/17/kubernetes-v1-35-release/k8s-v1.35.png"
alt="Kubernetes v1.35 Timbernetes logo: a storybook hex badge with a glowing world tree whose branches cradle Earth and a white Kubernetes wheel; three cheerful squirrels stand below—a wizard in a plum robe holding an LGTM scroll, a warrior with an axe and blue Kubernetes shield, and a lantern-carrying rogue in a navy cloak—on green grass above a gold ribbon reading World Tree Release, backed by soft mountains and cloud-swept sky"/>
&lt;/figure>
&lt;p>2025 began in the shimmer of Octarine: The Color of Magic (v1.33) and rode the gusts Of Wind &amp;amp; Will (v1.34). We close the year with our hands on the World Tree, inspired by Yggdrasil, the tree of life that binds many realms. Like any great tree, Kubernetes grows ring by ring and release by release, shaped by the care of a global community.&lt;/p>
&lt;p>At its center sits the Kubernetes wheel wrapped around the Earth, grounded by the resilient maintainers, contributors and users who keep showing up. Between day jobs, life changes, and steady open-source stewardship, they prune old APIs, graft new features and keep one of the world’s largest open source projects healthy.&lt;/p>
&lt;p>Three squirrels guard the tree: a wizard holding the LGTM scroll for reviewers, a warrior with an axe and Kubernetes shield for the release crews who cut new branches, and a rogue with a lantern for the triagers who bring light to dark issue queues.&lt;/p>
&lt;p>Together, they stand in for a much larger adventuring party. Kubernetes v1.35 adds another growth ring to the World Tree, a fresh cut shaped by many hands, many paths and a community whose branches reach higher as its roots grow deeper.&lt;/p>
&lt;h2 id="spotlight-on-key-updates">Spotlight on key updates&lt;/h2>
&lt;p>Kubernetes v1.35 is packed with new features and improvements. Here are a few select updates the Release Team would like to highlight!&lt;/p>
&lt;h3 id="stable-in-place-update-of-pod-resources">Stable: In-place update of Pod resources&lt;/h3>
&lt;p>Kubernetes has graduated in-place updates for Pod resources to General Availability (GA).
This feature allows users to adjust CPU and memory resources without restarting Pods or Containers. Previously, changing the resource settings (requests and limits) of an existing Pod required recreating it, which could disrupt workloads, particularly stateful or batch applications. The new in-place functionality allows for smoother, nondisruptive vertical scaling, improves efficiency, and can also simplify development.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/1287">KEP #1287&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="beta-pod-certificates-for-workload-identity-and-security">Beta: Pod certificates for workload identity and security&lt;/h3>
&lt;p>Previously, delivering certificates to pods required external controllers (cert-manager, SPIFFE/SPIRE), CRD orchestration, and Secret management, with rotation handled by sidecars or init containers. Kubernetes v1.35 enables native workload identity with automated certificate rotation, drastically simplifying service mesh and zero-trust architectures.&lt;/p>
&lt;p>Now, the &lt;code>kubelet&lt;/code> generates keys, requests certificates via &lt;code>PodCertificateRequest&lt;/code> objects, and writes credential bundles directly to the Pod's filesystem. The &lt;code>kube-apiserver&lt;/code> enforces node restriction at admission time, eliminating the most common pitfall for third-party signers: accidentally violating node isolation boundaries. This enables pure mTLS flows with no bearer tokens in the issuance path.&lt;/p>
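&lt;p>As a hedged sketch of what consuming this can look like, based on the projected volume source described in KEP #4317 (the signer name and image are placeholders):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: mtls-client
spec:
  containers:
  - name: app
    image: registry.example/app:latest      # placeholder image
    volumeMounts:
    - name: certs
      mountPath: /var/run/pod-certificates
      readOnly: true
  volumes:
  - name: certs
    projected:
      sources:
      - podCertificate:                          # per KEP #4317
          signerName: example.com/mesh-signer    # placeholder signer
          keyType: ED25519
          credentialBundlePath: credentialbundle.pem
&lt;/code>&lt;/pre>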
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4317">KEP #4317&lt;/a> led by SIG Auth.&lt;/p>
&lt;h3 id="alpha-node-declared-features-before-scheduling">Alpha: Node declared features before scheduling&lt;/h3>
&lt;p>When control planes enable new features but nodes lag behind (permitted by Kubernetes skew policy), the scheduler can place pods requiring those features onto incompatible older nodes.
The node declared features framework allows nodes to declare their supported Kubernetes features. With the new alpha feature enabled, a Node reports the features it supports, publishing this information to the control plane via a new &lt;code>.status.declaredFeatures&lt;/code> field. The &lt;code>kube-scheduler&lt;/code>, admission controllers, and third-party components can then use these declarations. For example, you can enforce scheduling and API validation constraints to ensure that Pods run only on compatible nodes.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5328">KEP #5328&lt;/a> led by SIG Node.&lt;/p>
&lt;h2 id="features-graduating-to-stable">Features graduating to Stable&lt;/h2>
&lt;p>&lt;em>This is a selection of some of the improvements that are now stable following the v1.35 release.&lt;/em>&lt;/p>
&lt;h3 id="prefersamenode-traffic-distribution">PreferSameNode traffic distribution&lt;/h3>
&lt;p>The &lt;code>trafficDistribution&lt;/code> field for Services has been updated to provide more explicit control over traffic routing. A new option, &lt;code>PreferSameNode&lt;/code>, has been introduced to let services strictly prioritize endpoints on the local node if available, falling back to remote endpoints otherwise.&lt;/p>
&lt;p>Simultaneously, the existing &lt;code>PreferClose&lt;/code> option has been renamed to &lt;code>PreferSameZone&lt;/code>. This change makes the API self-explanatory by explicitly indicating that traffic is preferred within the current availability zone. While &lt;code>PreferClose&lt;/code> is preserved for backward compatibility, &lt;code>PreferSameZone&lt;/code> is now the standard for zonal routing, ensuring that both node-level and zone-level preferences are clearly distinguished.&lt;/p>
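&lt;p>A minimal sketch of a Service that prefers node-local endpoints (the selector and port are placeholders):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  name: node-local-cache
spec:
  selector:
    app: cache               # placeholder selector
  ports:
  - port: 6379
  trafficDistribution: PreferSameNode
&lt;/code>&lt;/pre>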
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3015">KEP #3015&lt;/a> led by SIG Network.&lt;/p>
&lt;h3 id="job-api-managed-by-mechanism">Job API managed-by mechanism&lt;/h3>
&lt;p>The Job API now includes a &lt;code>managedBy&lt;/code> field that allows an external controller to handle Job status synchronization. This feature, which graduates to stable in Kubernetes v1.35, is primarily driven by &lt;a href="https://github.com/kubernetes-sigs/kueue/tree/main/keps/693-multikueue">MultiKueue&lt;/a>, a multi-cluster dispatching system where a Job created in a management cluster is mirrored and executed in a worker cluster, with status updates propagated back. To enable this workflow, the built-in Job controller must not act on a particular Job resource so that the Kueue controller can manage status updates instead.&lt;/p>
&lt;p>The goal is to allow clean delegation of Job synchronization to another controller. It does not aim to pass custom parameters to that controller or modify CronJob concurrency policies.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4368">KEP #4368&lt;/a> led by SIG Apps.&lt;/p>
&lt;h3 id="reliable-pod-update-tracking-with-metadata-generation">Reliable Pod update tracking with &lt;code>.metadata.generation&lt;/code>&lt;/h3>
&lt;p>Historically, the Pod API lacked the &lt;code>metadata.generation&lt;/code> field found in other Kubernetes objects such as Deployments.
Because of this omission, controllers and users had no reliable way to verify whether the &lt;code>kubelet&lt;/code> had actually processed the latest changes to a Pod's specification. This ambiguity was particularly problematic for features like &lt;a href="#stable-in-place-update-of-pod-resources">In-Place Pod Vertical Scaling&lt;/a>, where it was difficult to know exactly when a resource resize request had been enacted.&lt;/p>
&lt;p>Kubernetes v1.33 added &lt;code>.metadata.generation&lt;/code> fields for Pods, as an alpha feature. That field is now stable in the v1.35 Pod API, which means that every time a Pod's &lt;code>spec&lt;/code> is updated, the &lt;code>.metadata.generation&lt;/code> value is incremented. As part of this improvement, the Pod API also gained a &lt;code>.status.observedGeneration&lt;/code> field, which reports the generation that the &lt;code>kubelet&lt;/code> has successfully seen and processed. Each Pod condition also contains its own individual &lt;code>observedGeneration&lt;/code> field that clients can report and/or observe.&lt;/p>
&lt;p>Because this feature has graduated to stable in v1.35, it is available for all workloads.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5067">KEP #5067&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="configurable-numa-node-limit-for-topology-manager">Configurable NUMA node limit for topology manager&lt;/h3>
&lt;p>The &lt;a href="https://kubernetes.io/docs/concepts/policy/node-resource-managers/">topology manager&lt;/a> historically used a hard-coded limit of 8 for the maximum number of NUMA nodes it can support, preventing state explosion during affinity calculation. (There's an important detail here; a &lt;em>NUMA node&lt;/em> is not the same as a Node in the Kubernetes API.) This limit on the number of NUMA nodes prevented Kubernetes from fully utilizing modern high-end servers, which increasingly feature CPU architectures with more than 8 NUMA nodes.&lt;/p>
&lt;p>Kubernetes v1.31 introduced a new, &lt;strong>beta&lt;/strong> &lt;code>max-allowable-numa-nodes&lt;/code> option to the topology manager policy configuration. In Kubernetes v1.35, that option is stable. Cluster administrators who enable it can use servers with more than 8 NUMA nodes.&lt;/p>
&lt;p>Although the configuration option is stable, the Kubernetes community is aware of the poor performance for large NUMA hosts, and there is a &lt;a href="https://kep.k8s.io/5726">proposed enhancement&lt;/a> (KEP-5726) that aims to improve on it. You can learn more about this by reading &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/">Control Topology Management Policies on a node&lt;/a>.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4622">KEP #4622&lt;/a> led by SIG Node.&lt;/p>
&lt;h2 id="new-features-in-beta">New features in Beta&lt;/h2>
&lt;p>&lt;em>This is a selection of some of the improvements that are now beta following the v1.35 release.&lt;/em>&lt;/p>
&lt;h3 id="expose-node-topology-labels-via-downward-api">Expose node topology labels via Downward API&lt;/h3>
&lt;p>Accessing node topology information, such as region and zone, from within a Pod has typically required querying the Kubernetes API server. While functional, this approach creates complexity and security risks by necessitating broad RBAC permissions or sidecar containers just to retrieve infrastructure metadata. Kubernetes v1.35 promotes the capability to expose node topology labels directly via the Downward API to beta.&lt;/p>
&lt;p>The &lt;code>kubelet&lt;/code> can now inject standard topology labels, such as &lt;code>topology.kubernetes.io/zone&lt;/code> and &lt;code>topology.kubernetes.io/region&lt;/code>, into Pods as environment variables or projected volume files. The primary benefit is a safer and more efficient way for workloads to be topology-aware. This allows applications to natively adapt to their availability zone or region without dependencies on the API server, strengthening security by upholding the principle of least privilege and simplifying cluster configuration.&lt;/p>
&lt;p>&lt;strong>Note:&lt;/strong> Kubernetes now injects available topology labels to every Pod so that they can be used as inputs to the &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/downward-api/">downward API&lt;/a>. With the v1.35 upgrade, most cluster administrators will see several new labels added to each Pod; this is expected as part of the design.&lt;/p>
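&lt;p>A minimal sketch, assuming the kubelet has injected the zone label onto the Pod as described above (the image is a placeholder):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: topology-aware-app
spec:
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
    env:
    - name: NODE_ZONE
      valueFrom:
        fieldRef:
          fieldPath: metadata.labels['topology.kubernetes.io/zone']
&lt;/code>&lt;/pre>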
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4742">KEP #4742&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="native-support-for-storage-version-migration">Native support for storage version migration&lt;/h3>
&lt;p>In Kubernetes v1.35, the native support for storage version migration graduates to beta and is enabled by default. This move integrates the migration logic directly into the core Kubernetes control plane (&amp;quot;in-tree&amp;quot;), eliminating the dependency on external tools.&lt;/p>
&lt;p>Historically, administrators relied on manual &amp;quot;read/write loops&amp;quot;—often piping &lt;code>kubectl get&lt;/code> into &lt;code>kubectl replace&lt;/code>—to update schemas or re-encrypt data at rest. This method was inefficient and prone to conflicts, especially for large resources like Secrets. With this release, the built-in controller automatically handles update conflicts and consistency tokens, providing a safe, streamlined, and reliable way to ensure stored data remains current with minimal operational overhead.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4192">KEP #4192&lt;/a> led by SIG API Machinery.&lt;/p>
&lt;h3 id="mutable-volume-attach-limits">Mutable Volume attach limits&lt;/h3>
&lt;p>A CSI (Container Storage Interface) driver is a Kubernetes plugin that provides a consistent way for storage systems to be exposed to containerized workloads. The &lt;code>CSINode&lt;/code> object records details about all CSI drivers installed on a node. However, a mismatch can arise between the reported and actual attachment capacity on nodes. When volume slots are consumed after a CSI driver starts up, the &lt;code>kube-scheduler&lt;/code> may assign stateful pods to nodes without sufficient capacity, leaving those pods stuck in a &lt;code>ContainerCreating&lt;/code> state.&lt;/p>
&lt;p>Kubernetes v1.35 makes &lt;code>CSINode.spec.drivers[*].allocatable.count&lt;/code> mutable so that a node’s available volume attachment capacity can be updated dynamically. It also allows CSI drivers to control how frequently the &lt;code>allocatable.count&lt;/code> value is updated on all nodes by introducing a configurable refresh interval, defined through the &lt;code>CSIDriver&lt;/code> object. Additionally, it automatically updates &lt;code>CSINode.spec.drivers[*].allocatable.count&lt;/code> on detecting a failure in volume attachment due to insufficient capacity. Although this feature graduated to beta in v1.34 with the feature flag &lt;code>MutableCSINodeAllocatableCount&lt;/code> disabled by default, it remains in beta for v1.35 to allow time for feedback, but the feature flag is enabled by default.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4876">KEP #4876&lt;/a> led by SIG Storage.&lt;/p>
&lt;h3 id="opportunistic-batching">Opportunistic batching&lt;/h3>
&lt;p>Historically, the Kubernetes scheduler processes pods sequentially with time complexity of &lt;code>O(num pods × num nodes)&lt;/code>, which can result in redundant computation for compatible pods. This KEP introduces an opportunistic batching mechanism that aims to improve performance by identifying such compatible Pods via &lt;code>Pod scheduling signature&lt;/code> and batching them together, allowing shared filtering and scoring results across them.&lt;/p>
&lt;p>The pod scheduling signature ensures that two pods with the same signature are “the same” from a scheduling perspective. It takes into account not only the pod and node attributes, but also the other pods in the system and global data about the pod placement. This means that any pod with the given signature will get the same scores/feasibility results from any arbitrary set of nodes.&lt;/p>
&lt;p>The batching mechanism consists of two operations that can be invoked whenever needed: &lt;em>create&lt;/em> and &lt;em>nominate&lt;/em>. Create builds a new set of batch information from the scheduling results of Pods that have a valid signature. Nominate uses the batch information from create to set the nominated node name for a new Pod whose signature matches the canonical Pod’s signature.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5598">KEP #5598&lt;/a> led by SIG Scheduling.&lt;/p>
&lt;h3 id="maxunavailable-for-statefulsets">&lt;code>maxUnavailable&lt;/code> for StatefulSets&lt;/h3>
&lt;p>A StatefulSet runs a group of Pods and maintains a sticky identity for each of those Pods. This is critical for stateful workloads requiring stable network identifiers or persistent storage. When a StatefulSet's &lt;code>.spec.updateStrategy.&amp;lt;type&amp;gt;&lt;/code> is set to &lt;code>RollingUpdate&lt;/code>, the StatefulSet controller will delete and recreate each Pod in the StatefulSet. It will proceed in the same order as Pod termination (from the largest ordinal to the smallest), updating each Pod one at a time.&lt;/p>
&lt;p>Kubernetes v1.24 added a new &lt;strong>alpha&lt;/strong> field to a StatefulSet's &lt;code>rollingUpdate&lt;/code> configuration settings, called &lt;code>maxUnavailable&lt;/code>. That field wasn't part of the Kubernetes API unless your cluster administrator explicitly opted in.
In Kubernetes v1.35 that field is beta and is available by default. You can use it to define the maximum number of pods that can be unavailable during an update. This setting is most effective in combination with &lt;code>.spec.podManagementPolicy&lt;/code> set to Parallel. You can set &lt;code>maxUnavailable&lt;/code> as either a positive number (example: 2) or a percentage of the desired number of Pods (example: 10%). If this field is not specified, it will default to 1, to maintain the previous behavior of only updating one Pod at a time. This improvement allows stateful applications (that can tolerate more than one Pod being down) to finish updating faster.&lt;/p>
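&lt;p>A minimal sketch combining &lt;code>maxUnavailable&lt;/code> with &lt;code>Parallel&lt;/code> pod management (the names and image are placeholders):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 5
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: registry.example/web:latest   # placeholder image
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # update up to two Pods at a time
&lt;/code>&lt;/pre>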
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/961">KEP #961&lt;/a> led by SIG Apps.&lt;/p>
&lt;h3 id="configurable-credential-plugin-policy-in-kuberc">Configurable credential plugin policy in &lt;code>kuberc&lt;/code>&lt;/h3>
&lt;p>The optional &lt;a href="https://kubernetes.io/docs/reference/kubectl/kuberc/">&lt;code>kuberc&lt;/code> file&lt;/a> is a way to separate server configurations and cluster credentials from user preferences without disrupting already running CI pipelines with unexpected outputs.&lt;/p>
&lt;p>As part of the v1.35 release, &lt;code>kuberc&lt;/code> gains additional functionality that allows users to configure a credential plugin policy. This change introduces two fields: &lt;code>credentialPluginPolicy&lt;/code>, which allows or denies plugins globally, and &lt;code>credentialPluginAllowlist&lt;/code>, which specifies a list of allowed plugins.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3104">KEP #3104&lt;/a> as a cooperation between SIG Auth and SIG CLI.&lt;/p>
&lt;h3 id="kyaml">KYAML&lt;/h3>
&lt;p>YAML is a human-readable format of data serialization. In Kubernetes, YAML files are used to define and configure resources, such as Pods, Services, and Deployments. However, complex YAML is difficult to read. YAML's significant whitespace requires careful attention to indentation and nesting, while its optional string-quoting can lead to unexpected type coercion (see: The Norway Bug). While JSON is an alternative, it lacks support for comments and has strict requirements for trailing commas and quoted keys.&lt;/p>
&lt;p>KYAML is a safer and less ambiguous subset of YAML designed specifically for Kubernetes. Introduced as an opt-in alpha feature in v1.34, this feature graduated to beta in Kubernetes v1.35 and has been enabled by default. It can be disabled by setting the environment variable &lt;code>KUBECTL_KYAML=false&lt;/code>.&lt;/p>
&lt;p>KYAML addresses challenges pertaining to both YAML and JSON. All KYAML files are also valid YAML files. This means you can write KYAML and pass it as an input to any version of kubectl. This also means that you don’t need to write in strict KYAML for the input to be parsed.&lt;/p>
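&lt;p>As an illustrative sketch of the style, a small ConfigMap rendered as KYAML uses explicit braces and always-quoted strings, and remains valid YAML:&lt;/p>
&lt;pre>&lt;code class="language-yaml">{
  apiVersion: "v1",
  kind: "ConfigMap",
  metadata: {
    name: "example",
  },
  data: {
    country: "NO",   # always quoted, so never coerced to a boolean
  },
}
&lt;/code>&lt;/pre>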
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5295">KEP #5295&lt;/a> led by SIG CLI.&lt;/p>
&lt;h3 id="configurable-tolerance-for-horizontalpodautoscalers">Configurable tolerance for HorizontalPodAutoscalers&lt;/h3>
&lt;p>The Horizontal Pod Autoscaler (HPA) has historically relied on a fixed, global 10% tolerance for scaling actions. A drawback of this hardcoded value was that workloads requiring high sensitivity, such as those needing to scale on a 5% load increase, were often blocked from scaling, while others might oscillate unnecessarily.&lt;/p>
&lt;p>With Kubernetes v1.35, the configurable tolerance feature graduates to beta and is enabled by default. This enhancement allows users to define a custom tolerance window on a per-resource basis within the HPA &lt;code>behavior&lt;/code> field. By setting a specific tolerance (e.g., lowering it to 0.05 for 5%), operators gain precise control over autoscaling sensitivity, ensuring that critical workloads react quickly to small metric changes, without requiring cluster-wide configuration adjustments.&lt;/p>
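&lt;p>A minimal sketch of the per-HPA tolerance (the target and metric are placeholders):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sensitive-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sensitive-app   # placeholder target
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      tolerance: 0.05   # react to a 5% metric change instead of the default 10%
&lt;/code>&lt;/pre>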
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4951">KEP #4951&lt;/a> led by SIG Autoscaling.&lt;/p>
&lt;h3 id="support-for-user-namespaces-in-pods">Support for user namespaces in Pods&lt;/h3>
&lt;p>Kubernetes is adding support for user namespaces, allowing pods to run with isolated user and group ID mappings instead of sharing host IDs. This means containers can operate as root internally while actually being mapped to an unprivileged user on the host, reducing the risk of privilege escalation in the event of a compromise. The feature improves pod-level security and makes it safer to run workloads that need root inside the container. Over time, support has expanded to both stateless and stateful Pods through id-mapped mounts.&lt;/p>
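&lt;p>Opting a Pod into a user namespace is a one-line change; a minimal sketch (the image is a placeholder):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false   # run this Pod in a new user namespace
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
&lt;/code>&lt;/pre>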
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/127">KEP #127&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="volumesource-oci-artifact-and-or-image">VolumeSource: OCI artifact and/or image&lt;/h3>
&lt;p>When creating a Pod, you often need to provide data, binaries, or configuration files for your containers. Previously, this meant baking the content into the main container image or using a custom init container to download and unpack files into an &lt;code>emptyDir&lt;/code>. Both of these approaches are still valid. Kubernetes v1.31 added support for the &lt;code>image&lt;/code> volume type, allowing Pods to declaratively pull and unpack OCI container image artifacts into a volume. This lets you package and deliver data-only artifacts such as configs, binaries, or machine learning models using standard OCI registry tools.&lt;/p>
&lt;p>With this feature, you can fully separate your data from your container image and remove the need for extra init containers or startup scripts. The image volume type has been in beta since v1.33 and is enabled by default in v1.35. Please note that using this feature requires a compatible container runtime, such as containerd v2.1 or later.&lt;/p>
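&lt;p>A minimal sketch of an &lt;code>image&lt;/code> volume; the registry reference is hypothetical:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Pod
metadata:
  name: image-volume-example   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: model
      mountPath: /data/model
  volumes:
  - name: model
    image:
      reference: registry.example/models/sample:v1   # hypothetical data-only OCI artifact
      pullPolicy: IfNotPresent
&lt;/code>&lt;/pre>&lt;/div>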
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4639">KEP #4639&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="enforced-kubelet-credential-verification-for-cached-images">Enforced &lt;code>kubelet&lt;/code> credential verification for cached images&lt;/h3>
&lt;p>The &lt;code>imagePullPolicy: IfNotPresent&lt;/code> setting currently allows a Pod to use a container image that is already cached on a node, even if the Pod itself does not possess the credentials to pull that image. A drawback of this behavior is that it creates a security vulnerability in multi-tenant clusters: if a Pod with valid credentials pulls a sensitive private image to a node, a subsequent unauthorized Pod on the same node can access that image simply by relying on the local cache.&lt;/p>
&lt;p>This enhancement introduces a mechanism where the &lt;code>kubelet&lt;/code> enforces credential verification for cached images. Before allowing a Pod to use a locally cached image, the &lt;code>kubelet&lt;/code> checks whether the Pod has valid credentials to pull it. This ensures that only authorized workloads can use private images, regardless of whether they are already present on the node, significantly hardening the security posture of shared clusters.&lt;/p>
&lt;p>In Kubernetes v1.35, this feature has graduated to beta and is enabled by default. Users can still disable it by setting the &lt;code>KubeletEnsureSecretPulledImages&lt;/code> feature gate to false. Additionally, the &lt;code>imagePullCredentialsVerificationPolicy&lt;/code> flag allows operators to configure the desired security level, ranging from a mode that prioritizes backward compatibility to a strict enforcement mode that offers maximum security.&lt;/p>
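&lt;p>As a hedged sketch, the policy is set in the kubelet configuration file; the value shown here is one documented mode, but consult the kubelet configuration reference for the full list of supported modes:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Verify pull credentials for cached images, except those preloaded onto the node.
imagePullCredentialsVerificationPolicy: NeverVerifyPreloadedImages
&lt;/code>&lt;/pre>&lt;/div>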
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/2535">KEP #2535&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="fine-grained-container-restart-rules">Fine-grained Container restart rules&lt;/h3>
&lt;p>Historically, the &lt;code>restartPolicy&lt;/code> field was defined strictly at the Pod level, forcing the same behavior on all containers within a Pod. A drawback of this global setting was the lack of granularity for complex workloads, such as AI/ML training jobs. These often required &lt;code>restartPolicy: Never&lt;/code> for the Pod to manage job completion, yet individual containers would benefit from in-place restarts for specific, retriable errors (like network glitches or GPU init failures).&lt;/p>
&lt;p>Kubernetes v1.35 addresses this by enabling &lt;code>restartPolicy&lt;/code> and &lt;code>restartPolicyRules&lt;/code> within the container API itself. This allows users to define restart strategies for individual regular and init containers that operate independently of the Pod's overall policy. For example, a container can now be configured to restart automatically only if it exits with a specific error code, avoiding the expensive overhead of rescheduling the entire Pod for a transient failure.&lt;/p>
&lt;p>In this release, the feature has graduated to beta and is enabled by default. Users can immediately leverage &lt;code>restartPolicyRules&lt;/code> in their container specifications to optimize recovery times and resource utilization for long-running workloads, without altering the broader lifecycle logic of their Pods.&lt;/p>
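&lt;p>A sketch of a container-level rule; the exit code and names are arbitrary examples:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Pod
metadata:
  name: restart-rules-example   # hypothetical name
spec:
  restartPolicy: Never          # the Pod as a whole is not restarted
  containers:
  - name: trainer
    image: registry.k8s.io/pause:3.10
    restartPolicy: Never
    restartPolicyRules:         # in-place restart only for a retriable exit code
    - action: Restart
      exitCodes:
        operator: In
        values: [42]
&lt;/code>&lt;/pre>&lt;/div>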
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5307">KEP #5307&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="csi-driver-opt-in-for-service-account-tokens-via-secrets-field">CSI driver opt-in for service account tokens via secrets field&lt;/h3>
&lt;p>Providing ServiceAccount tokens to Container Storage Interface (CSI) drivers has traditionally relied on injecting them into the &lt;code>volume_context&lt;/code> field. This approach presents a significant security risk because &lt;code>volume_context&lt;/code> is intended for non-sensitive configuration data and is frequently logged in plain text by drivers and debugging tools, potentially leaking credentials.&lt;/p>
&lt;p>Kubernetes v1.35 introduces an opt-in mechanism for CSI drivers to receive ServiceAccount tokens via the dedicated secrets field in the NodePublishVolume request. Drivers can now enable this behavior by setting the &lt;code>serviceAccountTokenInSecrets&lt;/code> field to true in their CSIDriver object, instructing the &lt;code>kubelet&lt;/code> to populate the token securely.&lt;/p>
&lt;p>The primary benefit is the prevention of accidental credential exposure in logs and error messages. This change ensures that sensitive workload identities are handled via the appropriate secure channels, aligning with best practices for secret management while maintaining backward compatibility for existing drivers.&lt;/p>
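&lt;p>A hedged sketch of the opt-in on a CSIDriver object; the driver name and audience are hypothetical:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: secrets.csi.example.com    # hypothetical driver
spec:
  tokenRequests:
  - audience: example-vault        # hypothetical audience
  serviceAccountTokenInSecrets: true   # deliver tokens via the NodePublishVolume secrets field
&lt;/code>&lt;/pre>&lt;/div>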
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5538">KEP #5538&lt;/a> led by SIG Auth in cooperation with SIG Storage.&lt;/p>
&lt;h3 id="deployment-status-count-of-terminating-replicas">Deployment status: count of terminating replicas&lt;/h3>
&lt;p>Historically, the Deployment status provided details on available and updated replicas but lacked explicit visibility into Pods that were in the process of shutting down. A drawback of this omission was that users and controllers could not easily distinguish between a stable Deployment and one that still had Pods executing cleanup tasks or adhering to long grace periods.&lt;/p>
&lt;p>Kubernetes v1.35 promotes the &lt;code>terminatingReplicas&lt;/code> field within the Deployment status to beta. This field provides a count of Pods that have a deletion timestamp set but have not yet been removed from the system. This feature is a foundational step in a larger initiative to improve how Deployments handle Pod replacement, laying the groundwork for future policies regarding when to create new Pods during a rollout.&lt;/p>
&lt;p>The primary benefit is improved observability for lifecycle management tools and operators. By exposing the number of terminating Pods, external systems can now make more informed decisions such as waiting for a complete shutdown before proceeding with subsequent tasks without needing to manually query and filter individual Pod lists.&lt;/p>
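&lt;p>For example, assuming a Deployment named &lt;code>example&lt;/code> and the beta feature active, you can read the new field directly:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">kubectl get deployment example -o jsonpath='{.status.terminatingReplicas}'
&lt;/code>&lt;/pre>&lt;/div>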
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3973">KEP #3973&lt;/a> led by SIG Apps.&lt;/p>
&lt;h2 id="new-features-in-alpha">New features in Alpha&lt;/h2>
&lt;p>&lt;em>This is a selection of some of the improvements that are now alpha following the v1.35 release.&lt;/em>&lt;/p>
&lt;h3 id="gang-scheduling-support-in-kubernetes">Gang scheduling support in Kubernetes&lt;/h3>
&lt;p>Scheduling interdependent workloads, such as AI/ML training jobs or HPC simulations, has traditionally been challenging because the default Kubernetes scheduler places Pods individually. This often leads to partial scheduling where some Pods start while others wait indefinitely for resources, resulting in deadlocks and wasted cluster capacity.&lt;/p>
&lt;p>Kubernetes v1.35 introduces native support for so-called &amp;quot;gang scheduling&amp;quot; via the new Workload API and PodGroup concept. This feature implements an &amp;quot;all-or-nothing&amp;quot; scheduling strategy, ensuring that a defined group of Pods is scheduled only if the cluster has sufficient resources to accommodate the entire group simultaneously.&lt;/p>
&lt;p>The primary benefit is improved reliability and efficiency for batch and parallel workloads. By preventing partial deployments, it eliminates resource deadlocks and ensures that expensive cluster capacity is utilized only when a complete job can run, significantly optimizing the orchestration of large-scale data processing tasks.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4671">KEP #4671&lt;/a> led by SIG Scheduling.&lt;/p>
&lt;h3 id="constrained-impersonation">Constrained impersonation&lt;/h3>
&lt;p>Historically, the &lt;code>impersonate&lt;/code> verb in Kubernetes RBAC functioned on an all-or-nothing basis: once a user was authorized to impersonate a target identity, they gained all associated permissions. A drawback of this broad authorization was that it violated the principle of least privilege, preventing administrators from restricting impersonators to specific actions or resources.&lt;/p>
&lt;p>Kubernetes v1.35 introduces a new alpha feature, constrained impersonation, which adds a secondary authorization check to the impersonation flow. When enabled via the &lt;code>ConstrainedImpersonation&lt;/code> feature gate, the API server verifies not only the basic &lt;code>impersonate&lt;/code> permission but also checks if the impersonator is authorized for the specific action using new verb prefixes (e.g., &lt;code>impersonate-on:&amp;lt;mode&amp;gt;:&amp;lt;verb&amp;gt;&lt;/code>). This allows administrators to define fine-grained policies—such as permitting a support engineer to impersonate a cluster admin solely to view logs, without granting full administrative access.&lt;/p>
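&lt;p>As an illustrative sketch only, building on the verb pattern quoted above (the exact rule shape may differ; consult the KEP and documentation), an RBAC rule limiting an impersonator to reading logs could look roughly like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: constrained-log-reader   # hypothetical name
rules:
- apiGroups: [&amp;quot;&amp;quot;]
  resources: [&amp;quot;pods/log&amp;quot;]
  # Illustrative use of the impersonate-on:&lt;mode>:&lt;verb> pattern
  verbs: [&amp;quot;impersonate-on:user:get&amp;quot;]
&lt;/code>&lt;/pre>&lt;/div>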
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5284">KEP #5284&lt;/a> led by SIG Auth.&lt;/p>
&lt;h3 id="flagz-for-kubernetes-components">Flagz for Kubernetes components&lt;/h3>
&lt;p>Verifying the runtime configuration of Kubernetes components, such as the API server or &lt;code>kubelet&lt;/code>, has traditionally required privileged access to the host node or process arguments. To address this, the &lt;code>/flagz&lt;/code> endpoint was introduced to expose command-line options via HTTP. However, its output was initially limited to plain text, making it difficult for automated tools to parse and validate configurations reliably.&lt;/p>
&lt;p>In Kubernetes v1.35, the &lt;code>/flagz&lt;/code> endpoint has been enhanced to support structured, machine-readable JSON output. Authorized users can now request a versioned JSON response using standard HTTP content negotiation, while the original plain text format remains available for human inspection. This update significantly improves observability and compliance workflows, allowing external systems to programmatically audit component configurations without fragile text parsing or direct infrastructure access.&lt;/p>
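&lt;p>For instance, an authorized client can pick the format using content negotiation; the host, port, and token handling below are illustrative:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash"># Plain text, for humans (against the API server):
kubectl get --raw /flagz

# Structured JSON, for tools:
curl -k -H &amp;quot;Authorization: Bearer ${TOKEN}&amp;quot; \
     -H &amp;quot;Accept: application/json&amp;quot; \
     https://&lt;apiserver>:6443/flagz
&lt;/code>&lt;/pre>&lt;/div>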
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4828">KEP #4828&lt;/a> led by SIG Instrumentation.&lt;/p>
&lt;h3 id="statusz-for-kubernetes-components">Statusz for Kubernetes components&lt;/h3>
&lt;p>Troubleshooting Kubernetes components like the &lt;code>kube-apiserver&lt;/code> or &lt;code>kubelet&lt;/code> has traditionally involved parsing unstructured logs or text output, which is brittle and difficult to automate. While a basic &lt;code>/statusz&lt;/code> endpoint existed previously, it lacked a standardized, machine-readable format, limiting its utility for external monitoring systems.&lt;/p>
&lt;p>In Kubernetes v1.35, the &lt;code>/statusz&lt;/code> endpoint has been enhanced to support structured, machine-readable JSON output. Authorized users can now request this format using standard HTTP content negotiation to retrieve precise status data—such as version information and health indicators—without relying on fragile text parsing. This improvement provides a reliable, consistent interface for automated debugging and observability tools across all core components.&lt;/p>
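&lt;p>The same content negotiation applies here; an illustrative request against the API server:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">curl -k -H &amp;quot;Authorization: Bearer ${TOKEN}&amp;quot; \
     -H &amp;quot;Accept: application/json&amp;quot; \
     https://&lt;apiserver>:6443/statusz
&lt;/code>&lt;/pre>&lt;/div>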
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4827">KEP #4827&lt;/a> led by SIG Instrumentation.&lt;/p>
&lt;h3 id="ccm-watch-based-route-controller-reconciliation-using-informers">CCM: watch-based route controller reconciliation using informers&lt;/h3>
&lt;p>Managing network routes within cloud environments has traditionally relied on the Cloud Controller Manager (CCM) periodically polling the cloud provider's API to verify and update route tables. This fixed-interval reconciliation approach can be inefficient, often generating a high volume of unnecessary API calls and introducing latency between a node state change and the corresponding route update.&lt;/p>
&lt;p>For the Kubernetes v1.35 release, the cloud-controller-manager library introduces a watch-based reconciliation strategy for the route controller. Instead of relying on a timer, the controller now utilizes informers to watch for specific Node events, such as additions, deletions, or relevant field updates and triggers route synchronization only when a change actually occurs.&lt;/p>
&lt;p>The primary benefit is a significant reduction in cloud provider API usage, which lowers the risk of hitting rate limits and reduces operational overhead. Additionally, this event-driven model improves the responsiveness of the cluster's networking layer by ensuring that route tables are updated immediately following changes in cluster topology.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5237">KEP #5237&lt;/a> led by SIG Cloud Provider.&lt;/p>
&lt;h3 id="extended-toleration-operators-for-threshold-based-placement">Extended toleration operators for threshold-based placement&lt;/h3>
&lt;p>Kubernetes v1.35 introduces SLA-aware scheduling by enabling workloads to express reliability requirements. The feature adds numeric comparison operators to tolerations, allowing pods to match or avoid nodes based on SLA-oriented taints such as service guarantees or fault-domain quality.&lt;/p>
&lt;p>The primary benefit is enhancing the scheduler with more precise placement. Critical workloads can demand higher-SLA nodes, while lower priority workloads can opt into lower SLA ones. This improves utilization and reduces cost without compromising reliability.&lt;/p>
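&lt;p>A hedged sketch of such a toleration; the taint key is hypothetical, while the operator and value follow the pattern described in the KEP:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">tolerations:
- key: example.com/sla   # hypothetical numeric taint key
  operator: Gt           # tolerate only nodes whose taint value exceeds 950
  value: &amp;quot;950&amp;quot;
  effect: NoExecute      # evict if the node's value drops to or below the threshold
&lt;/code>&lt;/pre>&lt;/div>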
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5471">KEP #5471&lt;/a> led by SIG Scheduling.&lt;/p>
&lt;h3 id="mutable-container-resources-when-job-is-suspended">Mutable container resources when Job is suspended&lt;/h3>
&lt;p>Running batch workloads often involves trial and error with resource limits. Currently, the Job specification is immutable, meaning that if a Job fails due to an Out of Memory (OOM) error or insufficient CPU, the user cannot simply adjust the resources; they must delete the Job and create a new one, losing the execution history and status.&lt;/p>
&lt;p>Kubernetes v1.35 introduces the capability to update resource requests and limits for Jobs that are in a suspended state. Enabled via the &lt;code>MutableJobPodResourcesForSuspendedJobs&lt;/code> feature gate, this enhancement allows users to pause a failing Job, modify its Pod template with appropriate resource values, and then resume execution with the corrected configuration.&lt;/p>
&lt;p>The primary benefit is a smoother recovery workflow for misconfigured jobs. By allowing in-place corrections during suspension, users can resolve resource bottlenecks without disrupting the Job's lifecycle identity or losing track of its completion status, significantly improving the developer experience for batch processing.&lt;/p>
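&lt;p>The flow is suspend, patch, resume. A sketch with &lt;code>kubectl&lt;/code>, where the Job name, field path, and values are illustrative:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash"># Pause the Job
kubectl patch job example --type merge -p '{&amp;quot;spec&amp;quot;:{&amp;quot;suspend&amp;quot;:true}}'

# Raise the memory request in the Pod template while suspended
kubectl patch job example --type json -p '[
  {&amp;quot;op&amp;quot;: &amp;quot;replace&amp;quot;,
   &amp;quot;path&amp;quot;: &amp;quot;/spec/template/spec/containers/0/resources/requests/memory&amp;quot;,
   &amp;quot;value&amp;quot;: &amp;quot;2Gi&amp;quot;}]'

# Resume with the corrected configuration
kubectl patch job example --type merge -p '{&amp;quot;spec&amp;quot;:{&amp;quot;suspend&amp;quot;:false}}'
&lt;/code>&lt;/pre>&lt;/div>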
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5440">KEP #5440&lt;/a> led by SIG Apps.&lt;/p>
&lt;h2 id="other-notable-changes">Other notable changes&lt;/h2>
&lt;h3 id="continued-innovation-in-dynamic-resource-allocation-dra">Continued innovation in Dynamic Resource Allocation (DRA)&lt;/h3>
&lt;p>The &lt;a href="https://kep.k8s.io/4381">core functionality&lt;/a> graduated to stable in v1.34, but could still be turned off; in v1.35 it is always enabled. Several alpha features have also been significantly improved and are ready for testing. We encourage users to provide feedback on these capabilities to help clear the path for their promotion to beta in upcoming releases.&lt;/p>
&lt;h4 id="extended-resource-requests-via-dra">Extended Resource Requests via DRA&lt;/h4>
&lt;p>Several functional gaps compared to Extended Resource requests via Device Plugins were closed, for example scoring and the reuse of devices in init containers.&lt;/p>
&lt;h4 id="device-taints-and-tolerations">Device Taints and Tolerations&lt;/h4>
&lt;p>The new &amp;quot;None&amp;quot; effect can be used to report a problem without immediately affecting scheduling or running Pods. DeviceTaintRule now provides status information about an ongoing eviction, so the &amp;quot;None&amp;quot; effect can serve as a &amp;quot;dry run&amp;quot; before actually evicting Pods, as the sketch after the following list shows:&lt;/p>
&lt;ul>
&lt;li>Create DeviceTaintRule with &amp;quot;effect: None&amp;quot;.&lt;/li>
&lt;li>Check the status to see how many pods would be evicted.&lt;/li>
&lt;li>Replace &amp;quot;effect: None&amp;quot; with &amp;quot;effect: NoExecute&amp;quot;.&lt;/li>
&lt;/ul>
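&lt;p>A heavily hedged sketch of such a dry-run rule; the API version, selector fields, and driver name are illustrative and may differ in your cluster:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">apiVersion: resource.k8s.io/v1alpha3   # illustrative; check the current alpha API version
kind: DeviceTaintRule
metadata:
  name: gpu-health-dry-run             # hypothetical name
spec:
  deviceSelector:
    driver: gpu.example.com            # hypothetical DRA driver
  taint:
    key: example.com/unhealthy
    effect: None                       # dry run: surface the impact without evicting
&lt;/code>&lt;/pre>&lt;/div>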
&lt;h4 id="partitionable-devices">Partitionable Devices&lt;/h4>
&lt;p>Devices belonging to the same partitionable device may now be defined in different ResourceSlices.
You can read more in the &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices">official documentation&lt;/a>.&lt;/p>
&lt;h4 id="consumable-capacity-device-binding-conditions">Consumable Capacity, Device Binding Conditions&lt;/h4>
&lt;p>Several bugs were fixed and more tests were added.
You can learn more about &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#consumable-capacity">Consumable Capacity&lt;/a> and &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#device-binding-conditions">Binding Conditions&lt;/a> in the official documentation.&lt;/p>
&lt;h3 id="comparable-resource-version-semantics">Comparable resource version semantics&lt;/h3>
&lt;p>Kubernetes v1.35 changes the way that clients are allowed to interpret &lt;a href="https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions">resource versions&lt;/a>.&lt;/p>
&lt;p>Before v1.35, the only supported comparison that clients could make was to check for string equality: if two resource versions were equal, they were the same. Clients could also provide a resource version to the API server and ask the control plane to do internal comparisons, such as streaming all events since a particular resource version.&lt;/p>
&lt;p>In v1.35, all in-tree resource versions meet a new, stricter definition: the values are a special form of decimal number. Because they can be compared, clients can perform their own comparisons between two different resource versions.
For example, a client reconnecting after a crash can detect whether it missed updates while it was away, as distinct from the case where an update occurred but nothing was missed in the meantime.&lt;/p>
&lt;p>This change in semantics enables other important use cases such as &lt;a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/storage-version-migration/">storage version migration&lt;/a>, performance improvements to &lt;em>informers&lt;/em> (a client helper concept), and controller reliability. All of those cases require knowing whether one resource version is newer than another.&lt;/p>
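&lt;p>Because the values are now plain decimal numbers, even a shell comparison can order them; a sketch with an illustrative object name and a previously recorded value:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">rv_before=12345   # a resourceVersion recorded earlier (illustrative)
rv_now=$(kubectl get deployment example -o jsonpath='{.metadata.resourceVersion}')
if [ &amp;quot;${rv_now}&amp;quot; -gt &amp;quot;${rv_before}&amp;quot; ]; then
  echo &amp;quot;the object changed since we last saw it&amp;quot;
fi
&lt;/code>&lt;/pre>&lt;/div>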
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5504">KEP #5504&lt;/a> led by SIG API Machinery.&lt;/p>
&lt;h2 id="graduations-deprecations-and-removals-in-v1-35">Graduations, deprecations, and removals in v1.35&lt;/h2>
&lt;h3 id="graduations-to-stable">Graduations to stable&lt;/h3>
&lt;p>This section lists the features that graduated to stable (also known as &lt;em>general availability&lt;/em>). For a full list of updates, including new features and graduations from alpha to beta, see the release notes.&lt;/p>
&lt;p>This release includes a total of 14 enhancements promoted to stable:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://kep.k8s.io/4540">Add CPUManager policy option to restrict reservedSystemCPUs to system daemons and interrupt processing&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/5067">Pod Generation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/5468">Invariant Testing&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/1287">In-Place Update of Pod Resources&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3619">Fine-grained SupplementalGroups control&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3983">Add support for a drop-in kubelet configuration directory&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/5589">Remove gogo protobuf dependency for Kubernetes API types&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4210">kubelet image GC after a maximum age&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3673">Kubelet limit of Parallel Image Pulls&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4622">Add a TopologyManager policy option for MaxAllowableNUMANodes&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/859">Include kubectl command metadata in http request headers&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3015">PreferSameNode Traffic Distribution (formerly PreferLocal traffic policy / Node-level topology)&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4368">Job API managed-by mechanism&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4006">Transition from SPDY to WebSockets&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="deprecations-removals-and-community-updates">Deprecations, removals and community updates&lt;/h3>
&lt;p>As Kubernetes develops and matures, features may be deprecated, removed, or replaced with better
ones to improve the project's overall health. See the Kubernetes
&lt;a href="https://kubernetes.io/docs/reference/using-api/deprecation-policy/">deprecation and removal policy&lt;/a> for more details on this process. Kubernetes v1.35 includes several deprecations and removals.&lt;/p>
&lt;h4 id="ingress-nginx-retirement">Ingress NGINX retirement&lt;/h4>
&lt;p>For years, the Ingress NGINX controller has been a popular choice for routing traffic into Kubernetes clusters. It was flexible, widely adopted, and served as the standard entry point for countless applications.&lt;/p>
&lt;p>However, maintaining the project has become unsustainable. With a severe shortage of maintainers and mounting technical debt, the community recently made the difficult decision to retire it. This isn't strictly part of the v1.35 release, but it's such an important change that we wanted to highlight it here.&lt;/p>
&lt;p>Consequently, the Kubernetes project announced that Ingress NGINX will receive only best-effort maintenance until &lt;strong>March 2026&lt;/strong>. After this date, it will be archived with no further updates. The recommended path forward is to migrate to the &lt;a href="https://gateway-api.sigs.k8s.io/">Gateway API&lt;/a>, which offers a more modern, secure, and extensible standard for traffic management.&lt;/p>
&lt;p>You can find more in the &lt;a href="https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/">official blog post&lt;/a>.&lt;/p>
&lt;h4 id="removal-of-cgroup-v1-support">Removal of cgroup v1 support&lt;/h4>
&lt;p>When it comes to managing resources on Linux nodes, Kubernetes has historically relied on cgroups (control groups). While the original cgroup v1 was functional, it was often inconsistent and limited. That is why Kubernetes introduced support for cgroup v2 back in v1.25, offering a much cleaner, unified hierarchy and better resource isolation.&lt;/p>
&lt;p>Because cgroup v2 is now the modern standard, Kubernetes is ready to retire the legacy cgroup v1 support in v1.35. This is an important notice for cluster administrators: if you are still running nodes on older Linux distributions that don't support cgroup v2, your &lt;code>kubelet&lt;/code> will fail to start. To avoid downtime, you will need to migrate those nodes to systems where cgroup v2 is enabled.&lt;/p>
&lt;p>To learn more, read &lt;a href="https://kubernetes.io/docs/concepts/architecture/cgroups/">about cgroup v2&lt;/a>;&lt;br>
you can also track the switchover work via &lt;a href="https://kep.k8s.io/5573">KEP-5573: Remove cgroup v1 support&lt;/a>.&lt;/p>
&lt;h4 id="deprecation-of-ipvs-mode-in-kube-proxy">Deprecation of ipvs mode in kube-proxy&lt;/h4>
&lt;p>Years ago, Kubernetes adopted the &lt;a href="https://kubernetes.io/docs/reference/networking/virtual-ips/#proxy-mode-ipvs">&lt;code>ipvs&lt;/code>&lt;/a> mode in &lt;code>kube-proxy&lt;/code> to provide faster load balancing than the standard &lt;a href="https://kubernetes.io/docs/reference/networking/virtual-ips/#proxy-mode-iptables">&lt;code>iptables&lt;/code>&lt;/a>. While it offered a performance boost, keeping it in sync with evolving networking requirements created too much technical debt and complexity.&lt;/p>
&lt;p>Because of this maintenance burden, Kubernetes v1.35 deprecates &lt;code>ipvs&lt;/code> mode. Although the mode remains available in this release, &lt;code>kube-proxy&lt;/code> will now emit a warning on startup when configured to use it. The goal is to streamline the codebase and focus on modern standards. For Linux nodes, you should begin transitioning to &lt;a href="https://kubernetes.io/docs/reference/networking/virtual-ips/#proxy-mode-nftables">&lt;code>nftables&lt;/code>&lt;/a>, which is now the recommended replacement.&lt;/p>
&lt;p>You can find more in &lt;a href="https://kep.k8s.io/5495">KEP-5495: Deprecate ipvs mode in kube-proxy&lt;/a>.&lt;/p>
&lt;h4 id="final-call-for-containerd-v1-x">Final call for containerd v1.X&lt;/h4>
&lt;p>While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases, this is the final version with such support. The SIG Node community has designated v1.35 as the last release to support the containerd v1.X series.&lt;/p>
&lt;p>This serves as an important reminder: before upgrading to the next Kubernetes version, you must switch to containerd 2.0 or later. To help identify which nodes need attention, you can monitor the &lt;code>kubelet_cri_losing_support&lt;/code> metric within your cluster.&lt;/p>
&lt;p>You can find more in the &lt;a href="https://kubernetes.io/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/#announcement-kubernetes-is-deprecating-containerd-v1-y-support">official blog post&lt;/a> or in &lt;a href="https://kep.k8s.io/4033">KEP-4033: Discover cgroup driver from CRI&lt;/a>.&lt;/p>
&lt;h4 id="improved-pod-stability-during-kubelet-restarts">Improved Pod stability during &lt;code>kubelet&lt;/code> restarts&lt;/h4>
&lt;p>Previously, restarting the &lt;code>kubelet&lt;/code> service often caused a temporary disruption in Pod status. During a restart, the kubelet would reset container states, causing healthy Pods to be marked as &lt;code>NotReady&lt;/code> and removed from load balancers, even if the application itself was still running correctly.&lt;/p>
&lt;p>To address this reliability issue, this behavior has been corrected to ensure seamless node maintenance. The &lt;code>kubelet&lt;/code> now properly restores the state of existing containers from the runtime upon startup. This ensures that your workloads remain &lt;code>Ready&lt;/code> and traffic continues to flow uninterrupted during &lt;code>kubelet&lt;/code> restarts or upgrades.&lt;/p>
&lt;p>You can find more in &lt;a href="https://kep.k8s.io/4781">KEP-4781: Fix inconsistent container ready state after kubelet restart&lt;/a>.&lt;/p>
&lt;h2 id="release-notes">Release notes&lt;/h2>
&lt;p>Check out the full details of the Kubernetes v1.35 release in our &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.35.md">release notes&lt;/a>.&lt;/p>
&lt;h2 id="availability">Availability&lt;/h2>
&lt;p>Kubernetes v1.35 is available for download on &lt;a href="https://github.com/kubernetes/kubernetes/releases/tag/v1.35.0">GitHub&lt;/a> or on the &lt;a href="https://kubernetes.io/releases/download/">Kubernetes download page&lt;/a>.&lt;/p>
&lt;p>To get started with Kubernetes, check out these &lt;a href="https://kubernetes.io/docs/tutorials/">interactive tutorials&lt;/a> or run local Kubernetes clusters using &lt;a href="https://minikube.sigs.k8s.io/">minikube&lt;/a>. You can also easily install v1.35 using &lt;a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/">kubeadm&lt;/a>.&lt;/p>
&lt;h2 id="release-team">Release team&lt;/h2>
&lt;p>Kubernetes is only possible with the support, commitment, and hard work of its community. Each release team is made up of dedicated community volunteers who work together to build the many pieces that make up the Kubernetes releases you rely on. This requires the specialized skills of people from all corners of our community, from the code itself to its documentation and project management.&lt;/p>
&lt;p>&lt;a href="https://github.com/cncf/memorials/blob/main/han-kang.md">We honor the memory of Han Kang&lt;/a>, a long-time contributor and respected engineer whose technical excellence and infectious enthusiasm left a lasting impact on the Kubernetes community. Han was a significant force within SIG Instrumentation and SIG API Machinery, earning a &lt;a href="https://www.kubernetes.dev/community/awards/2021/">2021 Kubernetes Contributor Award&lt;/a> for his critical work and sustained commitment to the project's core stability. Beyond his technical contributions, Han was deeply admired for his generosity as a mentor and his passion for building connections among people. He was known for &amp;quot;opening doors&amp;quot; for others, whether guiding new contributors through their first pull requests or supporting colleagues with patience and kindness. Han’s legacy lives on through the engineers he inspired, the robust systems he helped build, and the warm, collaborative spirit he fostered within the cloud native ecosystem.&lt;/p>
&lt;p>We would like to thank the entire &lt;a href="https://github.com/kubernetes/sig-release/blob/master/releases/release-1.35/release-team.md">Release Team&lt;/a> for the hours spent hard at work to deliver the Kubernetes v1.35 release to our community. The Release Team's membership ranges from first-time shadows to returning team leads with experience forged over several release cycles. We are incredibly grateful to our Release Lead, &lt;a href="https://github.com/drewhagen">Drew Hagen&lt;/a>, whose hands-on guidance and vibrant energy not only navigated us through complex challenges but also fueled the community spirit behind this successful release.&lt;/p>
&lt;h2 id="project-velocity">Project velocity&lt;/h2>
&lt;p>The CNCF K8s &lt;a href="https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;var-period=m&amp;var-repogroup_name=All">DevStats&lt;/a> project aggregates a number of interesting data points related to the velocity of Kubernetes and various sub-projects. This includes everything from individual contributions to the number of companies that are contributing and is an illustration of the depth and breadth of effort that goes into evolving this ecosystem.&lt;/p>
&lt;p>During the v1.35 release cycle, which spanned 14 weeks from 15th September 2025 to 17th December 2025, Kubernetes received contributions from as many as 85 different companies and 419 individuals. In the wider cloud native ecosystem, the figure goes up to 281 companies, counting 1769 total contributors.&lt;/p>
&lt;p>Note that a &amp;quot;contribution&amp;quot; is counted when someone makes a commit, creates an issue or PR, reviews a PR, or comments on issues and PRs (including for blogs and documentation).&lt;br>
If you are interested in contributing, visit &lt;a href="https://www.kubernetes.dev/docs/guide/#getting-started">Getting Started&lt;/a> on our contributor website.&lt;/p>
&lt;p>Sources for this data:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;from=1757890800000&amp;to=1765929599000&amp;var-period=d28&amp;var-repogroup_name=Kubernetes&amp;var-repo_name=kubernetes%2Fkubernetes">Companies contributing to Kubernetes&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;from=1757890800000&amp;to=1765929599000&amp;var-period=d28&amp;var-repogroup_name=All&amp;var-repo_name=kubernetes%2Fkubernetes">Overall ecosystem contributions&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="events-update">Events update&lt;/h2>
&lt;p>Explore upcoming Kubernetes and cloud native events, including KubeCon + CloudNativeCon, KCD, and other notable conferences worldwide. Stay informed and get involved with the Kubernetes community!&lt;/p>
&lt;p>&lt;strong>February 2026&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.kcddelhi.com/index.html">&lt;strong>KCD - Kubernetes Community Days: New Delhi&lt;/strong>&lt;/a>: Feb 21, 2026 | New Delhi, India&lt;/li>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-guadalajara-presents-kcd-guadalajara-open-source-contributor-summit/cohost-kcd-guadalajara">&lt;strong>KCD - Kubernetes Community Days: Guadalajara&lt;/strong>&lt;/a>: Feb 23, 2026 | Guadalajara, Mexico&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>March 2026&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/">&lt;strong>KubeCon + CloudNativeCon Europe 2026&lt;/strong>&lt;/a>: Mar 23-26, 2026 | Amsterdam, Netherlands&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>May 2026&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-toronto-presents-kcd-toronto-canada-2026/">&lt;strong>KCD - Kubernetes Community Days: Toronto&lt;/strong>&lt;/a>: May 13, 2026 | Toronto, Canada&lt;/li>
&lt;li>&lt;a href="https://cloudnativefinland.org/kcd-helsinki-2026/">&lt;strong>KCD - Kubernetes Community Days: Helsinki&lt;/strong>&lt;/a>: May 20, 2026 | Helsinki, Finland&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>June 2026&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-india/">&lt;strong>KubeCon + CloudNativeCon India 2026&lt;/strong>&lt;/a>: Jun 18-19, 2026 | Mumbai, India&lt;/li>
&lt;li>&lt;a href="https://community.cncf.io/kcd-kuala-lumpur-2026/">&lt;strong>KCD - Kubernetes Community Days: Kuala Lumpur&lt;/strong>&lt;/a>: Jun 27, 2026 | Kuala Lumpur, Malaysia&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>July 2026&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-japan/">&lt;strong>KubeCon + CloudNativeCon Japan 2026&lt;/strong>&lt;/a>: Jul 29-30, 2026 | Yokohama, Japan&lt;/li>
&lt;/ul>
&lt;p>You can find the latest event details &lt;a href="https://community.cncf.io/events/#/list">here&lt;/a>.&lt;/p>
&lt;h2 id="upcoming-release-webinar">Upcoming release webinar&lt;/h2>
&lt;p>Join members of the Kubernetes v1.35 Release Team on &lt;strong>Wednesday, January 14, 2026, at 5:00 PM (UTC)&lt;/strong> to learn about the highlights of this release. For more information and registration, visit the &lt;a href="https://community.cncf.io/events/details/cncf-cncf-online-programs-presents-cloud-native-live-kubernetes-v135-release/">event page&lt;/a> on the CNCF Online Programs site.&lt;/p>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>The simplest way to get involved with Kubernetes is by joining one of the many &lt;a href="https://github.com/kubernetes/community/blob/master/sig-list.md">Special Interest Groups&lt;/a> (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly &lt;a href="https://github.com/kubernetes/community/tree/master/communication">community meeting&lt;/a>, and through the channels below. Thank you for your continued feedback and support.&lt;/p>
&lt;ul>
&lt;li>Follow us on Bluesky &lt;a href="https://bsky.app/profile/kubernetes.io">@Kubernetesio&lt;/a> for the latest updates&lt;/li>
&lt;li>Join the community discussion on &lt;a href="https://discuss.kubernetes.io/">Discuss&lt;/a>&lt;/li>
&lt;li>Join the community on &lt;a href="http://slack.k8s.io/">Slack&lt;/a>&lt;/li>
&lt;li>Post questions (or answer questions) on &lt;a href="http://stackoverflow.com/questions/tagged/kubernetes">Stack Overflow&lt;/a>&lt;/li>
&lt;li>Share your Kubernetes &lt;a href="https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform">story&lt;/a>&lt;/li>
&lt;li>Read more about what’s happening with Kubernetes on the &lt;a href="https://kubernetes.io/blog/">blog&lt;/a>&lt;/li>
&lt;li>Learn more about the &lt;a href="https://github.com/kubernetes/sig-release/tree/master/release-team">Kubernetes Release Team&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.35 Sneak Peek</title><link>https://kubernetes.io/blog/2025/11/26/kubernetes-v1-35-sneak-peek/</link><pubDate>Wed, 26 Nov 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/11/26/kubernetes-v1-35-sneak-peek/</guid><description>
&lt;p>As the release of Kubernetes v1.35 approaches, the Kubernetes project continues to evolve. Features may be deprecated, removed, or replaced to improve the project's overall health. This blog post outlines planned changes for the v1.35 release that the release team believes you should be aware of to ensure the continued smooth operation of your Kubernetes cluster(s), and to keep you up to date with the latest developments. The information below is based on the current status of the v1.35 release and is subject to change before the final release date.&lt;/p>
&lt;h2 id="deprecations-and-removals-for-kubernetes-v1-35">Deprecations and removals for Kubernetes v1.35&lt;/h2>
&lt;h3 id="cgroup-v1-support">cgroup v1 support&lt;/h3>
&lt;p>On Linux nodes, container runtimes typically rely on cgroups (short for &amp;quot;control groups&amp;quot;).
Support for using cgroup v2 has been stable in Kubernetes since v1.25, providing an alternative to the original v1 cgroup support. While cgroup v1 provided the initial resource control mechanism, it suffered from well-known
inconsistencies and limitations. Adding support for cgroup v2 allowed use of a unified control group hierarchy, improved resource isolation, and served as the foundation for modern features, making legacy cgroup v1 support ready for removal.
The removal of cgroup v1 support will only impact cluster administrators running nodes on older Linux distributions that do not support cgroup v2; on those nodes, the &lt;code>kubelet&lt;/code> will fail to start. Administrators must migrate their nodes to systems with cgroup v2 enabled. More details on compatibility requirements will be available in a blog post soon after the v1.35 release.&lt;/p>
&lt;p>To learn more, read &lt;a href="https://kubernetes.io/docs/concepts/architecture/cgroups/">about cgroup v2&lt;/a>;&lt;br>
you can also track the switchover work via &lt;a href="https://kep.k8s.io/5573">KEP-5573: Remove cgroup v1 support&lt;/a>.&lt;/p>
&lt;h3 id="deprecation-of-ipvs-mode-in-kube-proxy">Deprecation of ipvs mode in kube-proxy&lt;/h3>
&lt;p>Many releases ago, the Kubernetes project implemented an &lt;a href="https://kubernetes.io/docs/reference/networking/virtual-ips/#proxy-mode-ipvs">ipvs&lt;/a> mode in &lt;code>kube-proxy&lt;/code>. It was adopted as a way to provide high-performance service load balancing, with better performance than the existing &lt;code>iptables&lt;/code> mode. However, maintaining feature parity between ipvs and other kube-proxy modes became difficult, due to technical complexity and diverging requirements. This created significant technical debt and made the ipvs backend impractical to support alongside newer networking capabilities.&lt;/p>
&lt;p>The Kubernetes project intends to deprecate kube-proxy &lt;code>ipvs&lt;/code> mode in the v1.35 release, to streamline the &lt;code>kube-proxy&lt;/code> codebase. For Linux nodes, the recommended &lt;code>kube-proxy&lt;/code> mode is already &lt;a href="https://kubernetes.io/docs/reference/networking/virtual-ips/#proxy-mode-nftables">nftables&lt;/a>.&lt;/p>
&lt;p>You can find more in &lt;a href="https://kep.k8s.io/5495">KEP-5495: Deprecate ipvs mode in kube-proxy&lt;/a>&lt;/p>
&lt;h3 id="kubernetes-is-deprecating-containerd-v1-y-support">Kubernetes is deprecating containerd v1.y support&lt;/h3>
&lt;p>While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases of containerd, as a consequence of automated cgroup driver detection, the Kubernetes SIG Node community has formally agreed upon a final support timeline for containerd v1.X. Kubernetes v1.35 is the last release to offer this support (aligned with containerd 1.7 EOL).&lt;/p>
&lt;p>This is a final warning that if you are using containerd 1.X, you must switch to 2.0 or later before upgrading Kubernetes to the next version. You are able to monitor the &lt;code>kubelet_cri_losing_support&lt;/code> metric to determine if any nodes in your cluster are using a containerd version that will soon be unsupported.&lt;/p>
&lt;p>You can find more in the &lt;a href="https://kubernetes.io/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/#announcement-kubernetes-is-deprecating-containerd-v1-y-support">official blog post&lt;/a> or in &lt;a href="https://kep.k8s.io/4033">KEP-4033: Discover cgroup driver from CRI&lt;/a>&lt;/p>
&lt;h2 id="featured-enhancements-of-kubernetes-v1-35">Featured enhancements of Kubernetes v1.35&lt;/h2>
&lt;p>The following enhancements are some of those likely to be included in the v1.35 release. This is not a commitment, and the release content is subject to change.&lt;/p>
&lt;h3 id="node-declared-features">Node declared features&lt;/h3>
&lt;p>When scheduling Pods, Kubernetes uses node labels, taints, and tolerations to match workload requirements with node capabilities. However, managing feature compatibility becomes challenging during cluster upgrades due to version skew between the control plane and nodes. This can lead to Pods being scheduled on nodes that lack required features, resulting in runtime failures.&lt;/p>
&lt;p>The &lt;em>node declared features&lt;/em> framework will introduce a standard mechanism for nodes to declare their supported Kubernetes features. With the new alpha feature enabled, a Node reports the features it can support, publishing this information to the control plane through a new &lt;code>.status.declaredFeatures&lt;/code> field. Then, the &lt;code>kube-scheduler&lt;/code>, admission controllers and third-party components can use these declarations. For example, you can enforce scheduling and API validation constraints, ensuring that Pods run only on compatible nodes.&lt;/p>
&lt;p>This approach reduces manual node labeling, improves scheduling accuracy, and prevents incompatible pod placements proactively. It also integrates with the Cluster Autoscaler for informed scale-up decisions. Feature declarations are temporary and tied to Kubernetes feature gates, enabling safe rollout and cleanup.&lt;/p>
&lt;p>Targeting alpha in v1.35, &lt;em>node declared features&lt;/em> aims to solve version skew scheduling issues by making node capabilities explicit, enhancing reliability and cluster stability in heterogeneous version environments.&lt;/p>
&lt;p>To learn more about this before the official documentation is published, you can read &lt;a href="https://kep.k8s.io/5328">KEP-5328&lt;/a>.&lt;/p>
&lt;h3 id="in-place-update-of-pod-resources">In-place update of Pod resources&lt;/h3>
&lt;p>Kubernetes is graduating in-place updates for Pod resources to General Availability (GA). This feature allows users to adjust &lt;code>cpu&lt;/code> and &lt;code>memory&lt;/code> resources without restarting Pods or containers. Before this enhancement, such modifications required recreating Pods, which could disrupt workloads, particularly for stateful or batch applications.
As a beta feature, recent Kubernetes releases already allowed you to change resource settings (requests and limits) for existing Pods. This allows for smoother &lt;a href="https://kubernetes.io/docs/concepts/workloads/autoscaling/vertical-pod-autoscale/">vertical scaling&lt;/a>, improves efficiency, and can also simplify solution development.&lt;/p>
&lt;p>The Container Runtime Interface (CRI) has also been improved, extending the &lt;code>UpdateContainerResources&lt;/code> API for Windows and future runtimes while allowing &lt;code>ContainerStatus&lt;/code> to report real-time resource configurations. Together, these changes make scaling in Kubernetes faster, more flexible, and disruption-free.
The feature was introduced as alpha in v1.27, graduated to beta in v1.33, and is targeting graduation to stable in v1.35.&lt;/p>
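&lt;p>For example, the &lt;code>resize&lt;/code> subresource lets you bump a running container's CPU request in place; the Pod and container names and the value are illustrative:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">kubectl patch pod example --subresource resize --type merge -p \
  '{&amp;quot;spec&amp;quot;:{&amp;quot;containers&amp;quot;:[{&amp;quot;name&amp;quot;:&amp;quot;app&amp;quot;,&amp;quot;resources&amp;quot;:{&amp;quot;requests&amp;quot;:{&amp;quot;cpu&amp;quot;:&amp;quot;600m&amp;quot;}}}]}}'
&lt;/code>&lt;/pre>&lt;/div>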
&lt;p>You can find more in &lt;a href="https://kep.k8s.io/1287">KEP-1287: In-place Update of Pod Resources&lt;/a>&lt;/p>
&lt;h3 id="pod-certificates">Pod certificates&lt;/h3>
&lt;p>When running microservices, Pods often require a strong cryptographic identity to authenticate with each other using mutual TLS (mTLS). While Kubernetes provides Service Account tokens, these are designed for authenticating to the API server, not for general-purpose workload identity.&lt;/p>
&lt;p>Before this enhancement, operators had to rely on complex, external projects like SPIFFE/SPIRE or cert-manager to provision and rotate certificates for their workloads. But what if you could issue a unique, short-lived certificate to your Pods natively and automatically? KEP-4317 is designed to enable such native workload identity. It opens up various possibilities for securing pod-to-pod communication by allowing the &lt;code>kubelet&lt;/code> to request and mount certificates for a Pod via a projected volume.&lt;/p>
&lt;p>This provides a built-in mechanism for workload identity, complete with automated certificate rotation, significantly simplifying the setup of service meshes and other zero-trust network policies. This feature was introduced as alpha in v1.34 and is targeting beta in v1.35.&lt;/p>
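&lt;p>A heavily hedged sketch of the projected volume, based on the KEP; the signer name is hypothetical, and the field names may evolve during beta:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Pod
metadata:
  name: pod-cert-example   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: workload-identity
      mountPath: /var/run/identity
  volumes:
  - name: workload-identity
    projected:
      sources:
      - podCertificate:                       # field names per KEP-4317
          signerName: example.com/my-signer   # hypothetical signer
          keyType: ED25519
          credentialBundlePath: credentials.pem
&lt;/code>&lt;/pre>&lt;/div>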
&lt;p>You can find more in &lt;a href="https://kep.k8s.io/4317">KEP-4317: Pod Certificates&lt;/a>&lt;/p>
&lt;h3 id="numeric-values-for-taints">Numeric values for taints&lt;/h3>
&lt;p>Kubernetes is enhancing &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/">taints and tolerations&lt;/a> by adding numeric comparison operators, such as &lt;code>Gt&lt;/code> (Greater Than) and &lt;code>Lt&lt;/code> (Less Than).&lt;/p>
&lt;p>Previously, tolerations supported only exact (&lt;code>Equal&lt;/code>) or existence (&lt;code>Exists&lt;/code>) matches, which were not suitable for numeric properties such as reliability SLAs.&lt;/p>
&lt;p>With this change, a Pod can use a toleration to &amp;quot;opt-in&amp;quot; to nodes that meet a specific numeric threshold. For example, a Pod can require a Node with an SLA taint value greater than 950 (&lt;code>operator: Gt&lt;/code>, &lt;code>value: &amp;quot;950&amp;quot;&lt;/code>).&lt;/p>
&lt;p>This approach is more powerful than Node Affinity because it supports the NoExecute effect, allowing Pods to be automatically evicted if a node's numeric value drops below the tolerated threshold.&lt;/p>
&lt;p>You can find more in &lt;a href="https://kep.k8s.io/5471">KEP-5471: Enable SLA-based Scheduling&lt;/a>&lt;/p>
&lt;h3 id="user-namespaces">User namespaces&lt;/h3>
&lt;p>When running Pods, you can use &lt;code>securityContext&lt;/code> to drop privileges, but containers inside the Pod often still run as root (UID 0). This poses a significant security risk, because that container UID 0 maps directly to the host's root user.&lt;/p>
&lt;p>Before this enhancement, a container breakout vulnerability could grant an attacker full root access to the node. But what if you could dynamically remap the container's root user to a safe, unprivileged user on the host? KEP-127 adds exactly this: native support for Linux user namespaces. It opens up various possibilities for pod security by isolating container and host user/group IDs. This allows a process to have root privileges (UID 0) within its namespace, while running as a non-privileged, high-numbered UID on the host.&lt;/p>
&lt;p>Released as alpha in v1.25 and beta in v1.30, this feature continues to progress through beta maturity, paving the way for truly &amp;quot;rootless&amp;quot; containers that drastically reduce the attack surface for a whole class of security vulnerabilities.&lt;/p>
&lt;p>You can find more in &lt;a href="https://kep.k8s.io/127">KEP-127: User Namespaces&lt;/a>&lt;/p>
&lt;h3 id="support-for-mounting-oci-images-as-volumes">Support for mounting OCI images as volumes&lt;/h3>
&lt;p>When provisioning a Pod, you often need to bundle data, binaries, or configuration files for your containers.
Before this enhancement, people often included that kind of data directly into the main container image, or required a custom init container to download and unpack files into an &lt;code>emptyDir&lt;/code>. You can still take either of those approaches, of course.&lt;/p>
&lt;p>But what if you could populate a volume directly from a data-only artifact in an OCI registry, just like pulling a container image? Kubernetes v1.31 added support for the &lt;code>image&lt;/code> volume type, allowing Pods to pull and unpack OCI container image artifacts into a volume declaratively.&lt;/p>
&lt;p>This allows for seamless distribution of data, binaries, or ML models using standard registry tooling, completely decoupling data from the container image and eliminating the need for complex init containers or startup scripts.
This volume type has been in beta since v1.33 and will likely be enabled by default in v1.35.&lt;/p>
&lt;p>You can try out the beta version of &lt;a href="https://kubernetes.io/docs/concepts/storage/volumes/#image">&lt;code>image&lt;/code> volumes&lt;/a>, or you can learn more about the plans from &lt;a href="https://kep.k8s.io/4639">KEP-4639: OCI Volume Source&lt;/a>.&lt;/p>
&lt;h2 id="want-to-know-more">Want to know more?&lt;/h2>
&lt;p>New features and deprecations are also announced in the Kubernetes release notes. We will formally announce what's new in &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.35.md">Kubernetes v1.35&lt;/a> as part of the CHANGELOG for that release.&lt;/p>
&lt;p>The Kubernetes v1.35 release is planned for &lt;strong>December 17, 2025&lt;/strong>. Stay tuned for updates!&lt;/p>
&lt;p>You can also see the announcements of changes in the release notes for:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.34.md">Kubernetes v1.34&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.33.md">Kubernetes v1.33&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.32.md">Kubernetes v1.32&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md">Kubernetes v1.31&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md">Kubernetes v1.30&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>The simplest way to get involved with Kubernetes is by joining one of the many &lt;a href="https://github.com/kubernetes/community/blob/master/sig-list.md">Special Interest Groups&lt;/a> (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly &lt;a href="https://github.com/kubernetes/community/tree/master/communication">community meeting&lt;/a>, and through the channels below. Thank you for your continued feedback and support.&lt;/p>
&lt;ul>
&lt;li>Follow us on Bluesky &lt;a href="https://bsky.app/profile/kubernetes.io">@kubernetes.io&lt;/a> for the latest updates&lt;/li>
&lt;li>Join the community discussion on &lt;a href="https://discuss.kubernetes.io/">Discuss&lt;/a>&lt;/li>
&lt;li>Join the community on &lt;a href="http://slack.k8s.io/">Slack&lt;/a>&lt;/li>
&lt;li>Post questions (or answer questions) on &lt;a href="https://serverfault.com/questions/tagged/kubernetes">Server Fault&lt;/a> or &lt;a href="http://stackoverflow.com/questions/tagged/kubernetes">Stack Overflow&lt;/a>&lt;/li>
&lt;li>Share your Kubernetes &lt;a href="https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform">story&lt;/a>&lt;/li>
&lt;li>Read more about what’s happening with Kubernetes on the &lt;a href="https://kubernetes.io/blog/">blog&lt;/a>&lt;/li>
&lt;li>Learn more about the &lt;a href="https://github.com/kubernetes/sig-release/tree/master/release-team">Kubernetes Release Team&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes Configuration Good Practices</title><link>https://kubernetes.io/blog/2025/11/25/configuration-good-practices/</link><pubDate>Tue, 25 Nov 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/11/25/configuration-good-practices/</guid><description>
&lt;p>Configuration is one of those things in Kubernetes that seems small until it's not. It sits at the heart of every Kubernetes workload, and a missing quote, a wrong API version, or a misplaced YAML indent can ruin your entire deployment.&lt;/p>
&lt;p>This blog brings together tried-and-tested configuration best practices: the small habits that make your Kubernetes setup clean, consistent, and easier to manage.
Whether you are just starting out or already deploying apps daily, these are the little things that keep your cluster stable and your future self sane.&lt;/p>
&lt;p>&lt;em>This blog is inspired by the original &lt;em>Configuration Best Practices&lt;/em> page, which has evolved through contributions from many members of the Kubernetes community.&lt;/em>&lt;/p>
&lt;h2 id="general-configuration-practices">General configuration practices&lt;/h2>
&lt;h3 id="use-the-latest-stable-api-version">Use the latest stable API version&lt;/h3>
&lt;p>Kubernetes evolves fast, and older APIs eventually get deprecated and stop working. So, whenever you define resources, make sure you are using the latest stable API version.
You can always check with:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>kubectl api-resources
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This simple step saves you from future compatibility issues.&lt;/p>
&lt;h3 id="store-configuration-in-version-control">Store configuration in version control&lt;/h3>
&lt;p>Never apply manifest files straight from your desktop. Always keep them in a version control system like Git; it's your safety net.
If something breaks, you can instantly roll back to a previous commit, compare changes, or recreate your cluster setup without panic.&lt;/p>
&lt;h3 id="write-configs-in-yaml-not-json">Write configs in YAML not JSON&lt;/h3>
&lt;p>Write your configuration files using YAML rather than JSON. Both work technically, but YAML is just easier for humans: it's cleaner to read, less noisy, and widely used in the community.&lt;/p>
&lt;p>YAML has some sneaky gotchas with boolean values:
use only &lt;code>true&lt;/code> or &lt;code>false&lt;/code>, and
don't write &lt;code>yes&lt;/code>, &lt;code>no&lt;/code>, &lt;code>on&lt;/code>, or &lt;code>off&lt;/code>.
They might work in one version of YAML but break in another. To be safe, quote anything that looks like a Boolean (for example &lt;code>&amp;quot;yes&amp;quot;&lt;/code>), as the snippet below shows.&lt;/p>
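&lt;p>For example, quoting makes the intent unambiguous (an illustrative snippet):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">env:
- name: DEBUG
  value: &amp;quot;true&amp;quot;    # env var values must be strings; always quote them
- name: COUNTRY_CODE
  value: &amp;quot;NO&amp;quot;      # unquoted NO can be parsed as boolean false by YAML 1.1 parsers
&lt;/code>&lt;/pre>&lt;/div>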
&lt;h3 id="keep-configuration-simple-and-minimal">Keep configuration simple and minimal&lt;/h3>
&lt;p>Avoid setting default values that are already handled by Kubernetes. Minimal manifests are easier to debug, cleaner to review and less likely to break things later.&lt;/p>
&lt;h3 id="group-related-objects-together">Group related objects together&lt;/h3>
&lt;p>If your Deployment, Service and ConfigMap all belong to one app, put them in a single manifest file.&lt;br>
It's easier to track changes and apply them as a unit.
See the &lt;a href="https://github.com/kubernetes/examples/blob/master/web/guestbook/all-in-one/guestbook-all-in-one.yaml">Guestbook all-in-one.yaml&lt;/a> file for an example of this syntax.&lt;/p>
&lt;p>You can even apply entire directories with:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>kubectl apply -f configs/
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>One command and, boom, everything in that folder gets deployed.&lt;/p>
&lt;h3 id="add-helpful-annotations">Add helpful annotations&lt;/h3>
&lt;p>Manifest files are not just for machines, they are for humans too. Use annotations to describe why something exists or what it does. A quick one-liner can save hours when debugging later and also allows better collaboration.&lt;/p>
&lt;p>The most helpful annotation to set is &lt;code>kubernetes.io/description&lt;/code>. It's like writing a comment, except that it gets copied into the API so that everyone else can see it even after you deploy.&lt;/p>
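&lt;p>For instance, a minimal sketch (the object name and description text are made up):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: v1
kind: ConfigMap
metadata:
  name: checkout-config    # hypothetical name
  annotations:
    kubernetes.io/description: "Runtime settings for the checkout app; owned by the payments team."
data:
  logLevel: info
&lt;/code>&lt;/pre>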
&lt;h2 id="managing-workloads-pods-deployments-and-jobs">Managing Workloads: Pods, Deployments, and Jobs&lt;/h2>
&lt;p>A common early mistake in Kubernetes is creating Pods directly. Pods work, but they don't reschedule themselves if something goes wrong.&lt;/p>
&lt;p>&lt;em>Naked Pods&lt;/em> (Pods not managed by a controller, such as &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/">Deployment&lt;/a> or a &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/">StatefulSet&lt;/a>) are fine for testing, but in real setups, they are risky.&lt;/p>
&lt;p>Why?
Because if the node hosting that Pod dies, the Pod dies with it and Kubernetes won't bring it back automatically.&lt;/p>
&lt;h3 id="use-deployments-for-apps-that-should-always-be-running">Use Deployments for apps that should always be running&lt;/h3>
&lt;p>A Deployment, which both creates a ReplicaSet to ensure that the desired number of Pods is always available, and specifies a strategy to replace Pods (such as &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment">RollingUpdate&lt;/a>), is almost always preferable to creating Pods directly.
You can roll out a new version, and if something breaks, roll back instantly.&lt;/p>
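&lt;p>A minimal Deployment might look like this (the names and image are illustrative):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # the ReplicaSet keeps 3 Pods running
  selector:
    matchLabels:
      app.kubernetes.io/name: web
  strategy:
    type: RollingUpdate      # replace Pods gradually during updates
  template:
    metadata:
      labels:
        app.kubernetes.io/name: web
    spec:
      containers:
      - name: web
        image: nginx:1.27    # illustrative image
&lt;/code>&lt;/pre>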
&lt;h3 id="use-jobs-for-tasks-that-should-finish">Use Jobs for tasks that should finish&lt;/h3>
&lt;p>A &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Job&lt;/a> is perfect when you need something to run once and then stop, like a database migration or a batch processing task.
It will retry if the Pod fails and report success when it's done.&lt;/p>
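&lt;p>A sketch of such a Job (the names and image are made up):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate           # hypothetical migration task
spec:
  backoffLimit: 4            # retry a failing Pod up to 4 times
  template:
    spec:
      restartPolicy: Never   # let the Job controller handle retries
      containers:
      - name: migrate
        image: registry.example.com/migrator:1.0   # illustrative image
&lt;/code>&lt;/pre>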
&lt;h2 id="service-configuration-and-networking">Service Configuration and Networking&lt;/h2>
&lt;p>Services are how your workloads talk to each other inside (and sometimes outside) your cluster. Without them, your pods exist but can't reach anyone. Let's make sure that doesn't happen.&lt;/p>
&lt;h3 id="create-services-before-workloads-that-use-them">Create Services before workloads that use them&lt;/h3>
&lt;p>When Kubernetes starts a Pod, it automatically injects environment variables for existing Services.
So, if a Pod depends on a Service, create a &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/">Service&lt;/a> &lt;strong>before&lt;/strong> its corresponding backend workloads (Deployments or StatefulSets), and before any workloads that need to access it.&lt;/p>
&lt;p>For example, if a Service named &lt;code>foo&lt;/code> exists, all containers will get the following variables in their initial environment:&lt;/p>
&lt;pre tabindex="0">&lt;code>FOO_SERVICE_HOST=&amp;lt;the host the Service runs on&amp;gt;
FOO_SERVICE_PORT=&amp;lt;the port the Service runs on&amp;gt;
&lt;/code>&lt;/pre>&lt;p>DNS-based discovery doesn't have this problem, but creating the Service first is a good habit anyway.&lt;/p>
&lt;h3 id="use-dns-for-service-discovery">Use DNS for Service discovery&lt;/h3>
&lt;p>If your cluster has the DNS &lt;a href="https://kubernetes.io/docs/concepts/cluster-administration/addons/">add-on&lt;/a> (most do), every Service automatically gets a DNS entry. That means you can access it by name instead of IP:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl http://my-service.default.svc.cluster.local
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It's one of those features that makes Kubernetes networking feel magical.&lt;/p>
&lt;h3 id="avoid-hostport-and-hostnetwork-unless-absolutely-necessary">Avoid &lt;code>hostPort&lt;/code> and &lt;code>hostNetwork&lt;/code> unless absolutely necessary&lt;/h3>
&lt;p>You'll sometimes see these options in manifests:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">hostPort&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">8080&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">hostNetwork&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">true&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>But here's the thing:
They tie your Pods to specific nodes, making them harder to schedule and scale, because each &amp;lt;&lt;code>hostIP&lt;/code>, &lt;code>hostPort&lt;/code>, &lt;code>protocol&lt;/code>&amp;gt; combination must be unique. If you don't specify the &lt;code>hostIP&lt;/code> and &lt;code>protocol&lt;/code> explicitly, Kubernetes will use &lt;code>0.0.0.0&lt;/code> as the default &lt;code>hostIP&lt;/code> and &lt;code>TCP&lt;/code> as the default &lt;code>protocol&lt;/code>.
Unless you're debugging or building something like a network plugin, avoid them.&lt;/p>
&lt;p>If you just need local access for testing, try &lt;a href="https://kubernetes.io/docs/reference/kubectl/generated/kubectl_port-forward/">&lt;code>kubectl port-forward&lt;/code>&lt;/a>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>kubectl port-forward deployment/web 8080:80
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>See &lt;a href="https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/">Use Port Forwarding to access applications in a cluster&lt;/a> to learn more.
Or if you really need external access, use a &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport">&lt;code>type: NodePort&lt;/code> Service&lt;/a>. That's the safer, Kubernetes-native way.&lt;/p>
&lt;h3 id="use-headless-services-for-internal-discovery">Use headless Services for internal discovery&lt;/h3>
&lt;p>Sometimes, you don't want Kubernetes to load balance traffic. You want to talk directly to each Pod. That's where &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/#headless-services">headless Services&lt;/a> come in.&lt;/p>
&lt;p>You create one by setting &lt;code>clusterIP: None&lt;/code>.
Instead of a single IP, DNS gives you a list of all the Pod IPs, perfect for apps that manage connections themselves.&lt;/p>
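&lt;p>A minimal sketch of a headless Service (the name, selector and port are illustrative):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: v1
kind: Service
metadata:
  name: db                   # hypothetical name
spec:
  clusterIP: None            # headless: DNS returns the Pod IPs directly
  selector:
    app.kubernetes.io/name: db
  ports:
  - port: 5432
&lt;/code>&lt;/pre>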
&lt;h2 id="working-with-labels-effectively">Working with labels effectively&lt;/h2>
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/">Labels&lt;/a> are key/value pairs that are attached to objects such as Pods.
Labels help you organize, query and group your resources.
They don't do anything by themselves, but they make everything else from Services to Deployments work together smoothly.&lt;/p>
&lt;h3 id="use-semantics-labels">Use semantics labels&lt;/h3>
&lt;p>Good labels help you understand what's what, even after months later.
Define and use &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/">labels&lt;/a> that identify semantic attributes of your application or Deployment.
For example;&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">labels&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">app.kubernetes.io/name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>myapp&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">app.kubernetes.io/component&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>web&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tier&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>frontend&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">phase&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>test&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>&lt;code>app.kubernetes.io/name&lt;/code> : what the app is&lt;/li>
&lt;li>&lt;code>app.kubernetes.io/component&lt;/code> : what role it plays (web/database)&lt;/li>
&lt;li>&lt;code>tier&lt;/code> : which layer it belongs to (frontend/backend)&lt;/li>
&lt;li>&lt;code>phase&lt;/code> : which stage it's in (test/prod)&lt;/li>
&lt;/ul>
&lt;p>You can then use these labels to make powerful selectors.
For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>kubectl get pods -l &lt;span style="color:#b8860b">tier&lt;/span>&lt;span style="color:#666">=&lt;/span>frontend
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This will list all frontend Pods across your cluster, no matter which Deployment they came from.
Basically, you are not manually listing Pod names; you are just describing what you want.
See the &lt;a href="https://github.com/kubernetes/examples/tree/master/web/guestbook/">guestbook&lt;/a> app for examples of this approach.&lt;/p>
&lt;h3 id="use-common-kubernetes-labels">Use common Kubernetes labels&lt;/h3>
&lt;p>Kubernetes actually recommends a set of &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/">common labels&lt;/a>. It's a standardized way to name things across your different workloads or projects.
Following this convention makes your manifests cleaner, and it means that tools such as &lt;a href="https://headlamp.dev/">Headlamp&lt;/a>, &lt;a href="https://github.com/kubernetes/dashboard#introduction">dashboard&lt;/a>, or third-party monitoring systems can all
automatically understand what's running.&lt;/p>
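&lt;p>A sketch of what that recommended label set can look like on a workload (the values here are illustrative):&lt;/p>
&lt;pre tabindex="0">&lt;code>labels:
  app.kubernetes.io/name: mysql
  app.kubernetes.io/instance: mysql-abcxyz
  app.kubernetes.io/version: "8.0"
  app.kubernetes.io/component: database
  app.kubernetes.io/part-of: wordpress
  app.kubernetes.io/managed-by: helm
&lt;/code>&lt;/pre>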
&lt;h3 id="manipulate-labels-for-debugging">Manipulate labels for debugging&lt;/h3>
&lt;p>Since controllers (like ReplicaSets or Deployments) use labels to manage Pods, you can remove a label to “detach” a Pod temporarily.&lt;/p>
&lt;p>Example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>kubectl label pod mypod app-
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>app-&lt;/code> part removes the label key &lt;code>app&lt;/code>.
Once that happens, the controller won’t manage that Pod anymore.
It’s like isolating it for inspection, a “quarantine mode” for debugging. To interactively remove or add labels, use &lt;a href="https://kubernetes.io/docs/reference/kubectl/generated/kubectl_label/">&lt;code>kubectl label&lt;/code>&lt;/a>.&lt;/p>
&lt;p>You can then check its logs, exec into it, and, once done, delete it manually.
That’s a super underrated trick every Kubernetes engineer should know.&lt;/p>
&lt;h2 id="handy-kubectl-tips">Handy kubectl tips&lt;/h2>
&lt;p>These small tips make life much easier when you are working with multiple manifest files or clusters.&lt;/p>
&lt;h3 id="apply-entire-directories">Apply entire directories&lt;/h3>
&lt;p>Instead of applying one file at a time, apply the whole folder:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># Using server-side apply is also a good practice&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>kubectl apply -f configs/ --server-side
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This command looks for &lt;code>.yaml&lt;/code>, &lt;code>.yml&lt;/code> and &lt;code>.json&lt;/code> files in that folder and applies them all together.
It's faster, cleaner and helps keep things grouped by app.&lt;/p>
&lt;h3 id="use-label-selectors-to-get-or-delete-resources">Use label selectors to get or delete resources&lt;/h3>
&lt;p>You don't always need to type out resource names one by one.
Instead, use &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors">selectors&lt;/a> to act on entire groups at once:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>kubectl get pods -l &lt;span style="color:#b8860b">app&lt;/span>&lt;span style="color:#666">=&lt;/span>myapp
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>kubectl delete pod -l &lt;span style="color:#b8860b">phase&lt;/span>&lt;span style="color:#666">=&lt;/span>&lt;span style="color:#a2f">test&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>It's especially useful in CI/CD pipelines, where you want to clean up test resources dynamically.&lt;/p>
&lt;h3 id="quickly-create-deployments-and-services">Quickly create Deployments and Services&lt;/h3>
&lt;p>For quick experiments, you don't always need to write a manifest. You can spin up a Deployment right from the CLI:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>kubectl create deployment webapp --image&lt;span style="color:#666">=&lt;/span>nginx
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then expose it as a Service:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>kubectl expose deployment webapp --port&lt;span style="color:#666">=&lt;/span>&lt;span style="color:#666">80&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This is great when you just want to test something before writing full manifests.
Also, see &lt;a href="https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/">Use a Service to Access an Application in a cluster&lt;/a> for an example.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Cleaner configuration leads to calmer cluster administrators.
Stick to a few simple habits: keep configuration simple and minimal, version-control everything,
use consistent labels, and avoid relying on naked Pods. You'll save yourself hours of debugging down the road.&lt;/p>
&lt;p>The best part?
Clean configurations stay readable. Even after months, you or anyone on your team can glance at them and know exactly what’s happening.&lt;/p></description></item><item><title>Ingress NGINX Retirement: What You Need to Know</title><link>https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/</link><pubDate>Tue, 11 Nov 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/</guid><description>
&lt;p>To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of &lt;a href="https://github.com/kubernetes/ingress-nginx/">Ingress NGINX&lt;/a>. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. &lt;strong>Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.&lt;/strong>&lt;/p>
&lt;p>We recommend migrating to one of the many alternatives. Consider &lt;a href="https://gateway-api.sigs.k8s.io/guides/">migrating to Gateway API&lt;/a>, the modern replacement for Ingress. If you must continue using Ingress, many alternative Ingress controllers are &lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/">listed in the Kubernetes documentation&lt;/a>. Continue reading for further information about the history and current state of Ingress NGINX, as well as next steps.&lt;/p>
&lt;h2 id="about-ingress-nginx">About Ingress NGINX&lt;/h2>
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">Ingress&lt;/a> is the original user-friendly way to direct network traffic to workloads running on Kubernetes. (&lt;a href="https://kubernetes.io/docs/concepts/services-networking/gateway/">Gateway API&lt;/a> is a newer way to achieve many of the same goals.) In order for an Ingress to work in your cluster, there must be an &lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/">Ingress controller&lt;/a> running. There are many Ingress controller choices available, which serve the needs of different users and use cases. Some are cloud-provider specific, while others have more general applicability.&lt;/p>
&lt;p>&lt;a href="https://www.github.com/kubernetes/ingress-nginx">Ingress NGINX&lt;/a> was an Ingress controller, developed early in the history of the Kubernetes project as an example implementation of the API. It became very popular due to its tremendous flexibility, breadth of features, and independence from any particular cloud or infrastructure provider. Since those days, many other Ingress controllers have been created within the Kubernetes project by community groups, and by cloud native vendors. Ingress NGINX has continued to be one of the most popular, deployed as part of many hosted Kubernetes platforms and within innumerable independent users’ clusters.&lt;/p>
&lt;h2 id="history-and-challenges">History and Challenges&lt;/h2>
&lt;p>The breadth and flexibility of Ingress NGINX has caused maintenance challenges. Changing expectations about cloud native software have also added complications. What were once considered helpful options have sometimes come to be considered serious security flaws, such as the ability to add arbitrary NGINX configuration directives via the &amp;quot;snippets&amp;quot; annotations. Yesterday’s flexibility has become today’s insurmountable technical debt.&lt;/p>
&lt;p>Despite the project’s popularity among users, Ingress NGINX has always struggled with insufficient or barely-sufficient maintainership. For years, the project has had only one or two people doing development work, on their own time, after work hours and on weekends. Last year, the Ingress NGINX maintainers &lt;a href="https://kccncna2024.sched.com/event/1hoxW/securing-the-future-of-ingress-nginx-james-strong-isovalent-marco-ebert-giant-swarm">announced&lt;/a> their plans to wind down Ingress NGINX and develop a replacement controller together with the Gateway API community. Unfortunately, even that announcement failed to generate additional interest in helping maintain Ingress NGINX or develop InGate to replace it. (InGate development never progressed far enough to create a mature replacement; it will also be retired.)&lt;/p>
&lt;h2 id="current-state-and-next-steps">Current State and Next Steps&lt;/h2>
&lt;p>Currently, Ingress NGINX is receiving best-effort maintenance. SIG Network and the Security Response Committee have exhausted our efforts to find additional support to make Ingress NGINX sustainable. To prioritize user safety, we must retire the project.&lt;/p>
&lt;p>In March 2026, Ingress NGINX maintenance will be halted, and the project will be &lt;a href="https://github.com/kubernetes-retired/">retired&lt;/a>. After that time, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. The GitHub repositories will be made read-only and left available for reference.&lt;/p>
&lt;p>Existing deployments of Ingress NGINX will not be broken. Existing project artifacts such as Helm charts and container images will remain available.&lt;/p>
&lt;p>In most cases, you can check whether you use Ingress NGINX by running &lt;code>kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx&lt;/code> with cluster administrator permissions.&lt;/p>
&lt;p>We would like to thank the Ingress NGINX maintainers for their work in creating and maintaining this project; their dedication remains impressive. This Ingress controller has powered billions of requests in datacenters and homelabs all around the world. In a lot of ways, Kubernetes wouldn’t be where it is without Ingress NGINX, and we are grateful for so many years of incredible effort.&lt;/p>
&lt;p>&lt;strong>SIG Network and the Security Response Committee recommend that all Ingress NGINX users begin migration to Gateway API or another Ingress controller immediately.&lt;/strong> Many options are listed in the Kubernetes documentation: &lt;a href="https://gateway-api.sigs.k8s.io/guides/">Gateway API&lt;/a>, &lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress-controllers/">Ingress&lt;/a>. Additional options may be available from vendors you work with.&lt;/p></description></item><item><title>Announcing the 2025 Steering Committee Election Results</title><link>https://kubernetes.io/blog/2025/11/09/steering-committee-results-2025/</link><pubDate>Sun, 09 Nov 2025 15:10:00 -0500</pubDate><guid>https://kubernetes.io/blog/2025/11/09/steering-committee-results-2025/</guid><description>
&lt;p>The &lt;a href="https://github.com/kubernetes/community/tree/master/elections/steering/2025">2025 Steering Committee Election&lt;/a> is now complete. The Kubernetes Steering Committee consists of 7 seats, 4 of which were up for election in 2025. Incoming committee members serve a term of 2 years, and all members are elected by the Kubernetes Community.&lt;/p>
&lt;p>The Steering Committee oversees the governance of the entire Kubernetes project. With that great power comes great responsibility. You can learn more about the steering committee’s role in their &lt;a href="https://github.com/kubernetes/steering/blob/master/charter.md">charter&lt;/a>.&lt;/p>
&lt;p>Thank you to everyone who voted in the election; your participation helps support the community’s continued health and success.&lt;/p>
&lt;h2 id="results">Results&lt;/h2>
&lt;p>Congratulations to the elected committee members whose two year terms begin immediately (listed in alphabetical order by GitHub handle):&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Kat Cosgrove (&lt;a href="https://github.com/katcosgrove">@katcosgrove&lt;/a>), Minimus&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Paco Xu (&lt;a href="https://github.com/pacoxu">@pacoxu&lt;/a>), DaoCloud&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Rita Zhang (&lt;a href="https://github.com/ritazh">@ritazh&lt;/a>), Microsoft&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Maciej Szulik (&lt;a href="https://github.com/soltysh">@soltysh&lt;/a>), Defense Unicorns&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>They join continuing members:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Antonio Ojea (&lt;a href="https://github.com/aojea">@aojea&lt;/a>), Google&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Benjamin Elder (&lt;a href="https://github.com/BenTheElder">@BenTheElder&lt;/a>), Google&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Sascha Grunert (&lt;a href="https://github.com/saschagrunert">@saschagrunert&lt;/a>), Red Hat&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>Maciej Szulik and Paco Xu are returning Steering Committee Members.&lt;/p>
&lt;h2 id="big-thanks">Big thanks!&lt;/h2>
&lt;p>Thank you and congratulations on a successful election to this round’s election officers:&lt;/p>
&lt;ul>
&lt;li>Christoph Blecker (&lt;a href="https://github.com/cblecker">@cblecker&lt;/a>)&lt;/li>
&lt;li>Nina Polshakova (&lt;a href="https://github.com/npolshakova">@npolshakova&lt;/a>)&lt;/li>
&lt;li>Sreeram Venkitesh (&lt;a href="https://github.com/sreeram-venkitesh">@sreeram-venkitesh&lt;/a>)&lt;/li>
&lt;/ul>
&lt;p>Thanks to the Emeritus Steering Committee Members. Your service is appreciated by the community:&lt;/p>
&lt;ul>
&lt;li>Stephen Augustus (&lt;a href="https://github.com/justaugustus">@justaugustus&lt;/a>), Bloomberg&lt;/li>
&lt;li>Patrick Ohly (&lt;a href="https://github.com/pohly">@pohly&lt;/a>), Intel&lt;/li>
&lt;/ul>
&lt;p>And thank you to all the candidates who came forward to run for election.&lt;/p>
&lt;h2 id="get-involved-with-the-steering-committee">Get involved with the Steering Committee&lt;/h2>
&lt;p>This governing body, like all of Kubernetes, is open to all. You can follow along with Steering Committee &lt;a href="https://bit.ly/k8s-steering-wd">meeting notes&lt;/a> and weigh in by filing an issue or creating a PR against their &lt;a href="https://github.com/kubernetes/steering">repo&lt;/a>. They have an open meeting on &lt;a href="https://github.com/kubernetes/steering">the first Wednesday of every month at 8am PT&lt;/a>. They can also be contacted at their public mailing list &lt;a href="mailto:steering@kubernetes.io">steering@kubernetes.io&lt;/a>.&lt;/p>
&lt;p>You can see what the Steering Committee meetings are all about by watching past meetings on the &lt;a href="https://www.youtube.com/playlist?list=PL69nYSiGNLP1yP1B_nd9-drjoxp0Q14qM">YouTube Playlist&lt;/a>.&lt;/p>
&lt;hr>
&lt;p>&lt;em>This post was adapted from one written by the &lt;a href="https://github.com/kubernetes/community/tree/master/communication/contributor-comms">Contributor Comms Subproject&lt;/a>. If you want to write stories about the Kubernetes community, learn more about us.&lt;/em>&lt;/p>
&lt;p>&lt;em>This article was revised in November 2025 to update the information about when the steering committee meets.&lt;/em>&lt;/p></description></item><item><title>Gateway API 1.4: New Features</title><link>https://kubernetes.io/blog/2025/11/06/gateway-api-v1-4/</link><pubDate>Thu, 06 Nov 2025 09:00:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/11/06/gateway-api-v1-4/</guid><description>
&lt;p>&lt;img alt="Gateway API logo" src="https://kubernetes.io/blog/2025/11/06/gateway-api-v1-4/gateway-api-logo.svg">&lt;/p>
&lt;p>Ready to rock your Kubernetes networking? The Kubernetes SIG Network community is proud to present the General Availability (GA) release of Gateway API v1.4.0! Released on October 6, 2025, version 1.4.0 reinforces the path for modern, expressive, and extensible service networking in Kubernetes.&lt;/p>
&lt;p>Gateway API v1.4.0 brings three new features to the &lt;em>Standard channel&lt;/em>
(Gateway API's GA release channel):&lt;/p>
&lt;ul>
&lt;li>&lt;strong>BackendTLSPolicy for TLS between gateways and backends&lt;/strong>&lt;/li>
&lt;li>&lt;strong>&lt;code>supportedFeatures&lt;/code> in GatewayClass status&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Named rules for Routes&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>and introduces three new experimental features:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Mesh resource for service mesh configuration&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Default gateways&lt;/strong> to ease configuration burden&lt;/li>
&lt;li>&lt;strong>&lt;code>externalAuth&lt;/code> filter for HTTPRoute&lt;/strong>&lt;/li>
&lt;/ul>
&lt;h2 id="graduations-to-standard-channel">Graduations to Standard Channel&lt;/h2>
&lt;h3 id="backend-tls-policy">Backend TLS policy&lt;/h3>
&lt;p>Leads: &lt;a href="https://github.com/candita">Candace Holman&lt;/a>, &lt;a href="https://github.com/snorwin">Norwin Schnyder&lt;/a>, &lt;a href="https://github.com/kl52752">Katarzyna Łach&lt;/a>&lt;/p>
&lt;p>GEP-1897: &lt;a href="https://github.com/kubernetes-sigs/gateway-api/issues/1897">BackendTLSPolicy&lt;/a>&lt;/p>
&lt;p>&lt;a href="https://gateway-api.sigs.k8s.io/api-types/backendtlspolicy">BackendTLSPolicy&lt;/a> is a new Gateway API type for specifying the TLS configuration
of the connection from the Gateway to backend pod(s).
Prior to the introduction of BackendTLSPolicy, there was no API specification
that allowed encrypted traffic on the hop from Gateway to backend.&lt;/p>
&lt;p>The &lt;code>BackendTLSPolicy&lt;/code> &lt;code>validation&lt;/code> configuration requires a hostname. This &lt;code>hostname&lt;/code>
serves two purposes: it is used as the SNI header when connecting to the backend, and,
for authentication, the certificate presented by the backend must match this hostname,
&lt;em>unless&lt;/em> &lt;code>subjectAltNames&lt;/code> is explicitly specified.&lt;/p>
&lt;p>If &lt;code>subjectAltNames&lt;/code> (SANs) are specified, the &lt;code>hostname&lt;/code> is only used for SNI, and authentication is performed against the SANs instead. If you still need to authenticate against the hostname value in this case, you MUST add it to the &lt;code>subjectAltNames&lt;/code> list.&lt;/p>
&lt;p>The BackendTLSPolicy &lt;code>validation&lt;/code> configuration also requires either &lt;code>caCertificateRefs&lt;/code> or &lt;code>wellKnownCACertificates&lt;/code>.
&lt;code>caCertificateRefs&lt;/code> refers to one or more (up to 8) PEM-encoded TLS certificate bundles. If there are no specific certificates to use,
then, depending on your implementation, you may set &lt;code>wellKnownCACertificates&lt;/code>
to &amp;quot;System&amp;quot; to tell the Gateway to use an implementation-specific set of trusted CA certificates.&lt;/p>
&lt;p>In this example, the BackendTLSPolicy is configured to use certificates defined in the &lt;code>auth-cert&lt;/code> ConfigMap
to establish a TLS-encrypted upstream connection, where the Pods backing the &lt;code>auth&lt;/code> Service are expected to serve a
valid certificate for &lt;code>auth.example.com&lt;/code>. It uses &lt;code>subjectAltNames&lt;/code> with a Hostname type, but you may also use a URI type.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gateway.networking.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>BackendTLSPolicy&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>tls-upstream-auth&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">targetRefs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Service&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>auth&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">group&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">sectionName&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;https&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">validation&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">caCertificateRefs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">group&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># core API group&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ConfigMap&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>auth-cert&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">subjectAltNames&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">type&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Hostname&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">hostname&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;auth.example.com&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In this example, the BackendTLSPolicy is configured to use system certificates to establish a TLS-encrypted connection to the backend, where the Pods backing the &lt;code>dev&lt;/code> Service are expected to serve a valid certificate for &lt;code>dev.example.com&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gateway.networking.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>BackendTLSPolicy&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>tls-upstream-dev&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">targetRefs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Service&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>dev&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">group&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">sectionName&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;btls&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">validation&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">wellKnownCACertificates&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;System&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">hostname&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>dev.example.com&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>More information on the configuration of TLS in Gateway API can be found in &lt;a href="https://gateway-api.sigs.k8s.io/guides/tls/">Gateway API - TLS Configuration&lt;/a>.&lt;/p>
&lt;h3 id="status-information-about-the-features-that-an-implementation-supports">Status information about the features that an implementation supports&lt;/h3>
&lt;p>Leads: &lt;a href="https://github.com/liorlieberman">Lior Lieberman&lt;/a>, &lt;a href="https://github.com/bexxmodd">Beka Modebadze&lt;/a>&lt;/p>
&lt;p>GEP-2162: &lt;a href="https://github.com/kubernetes-sigs/gateway-api/blob/main/geps/gep-2162/index.md">Supported features in GatewayClass Status&lt;/a>&lt;/p>
&lt;p>GatewayClass status has a new field, &lt;code>supportedFeatures&lt;/code>.
This addition allows implementations to declare the set of features they support. This provides a clear way for users and tools to understand the capabilities of a given GatewayClass.&lt;/p>
&lt;p>This feature's name for conformance tests (and GatewayClass status reporting) is &lt;strong>SupportedFeatures&lt;/strong>.
Implementations must populate the &lt;code>supportedFeatures&lt;/code> field in the &lt;code>.status&lt;/code> of the GatewayClass &lt;strong>before&lt;/strong> the GatewayClass
is accepted, or in the same operation.&lt;/p>
&lt;p>Here’s an example of &lt;code>supportedFeatures&lt;/code> published under a GatewayClass' &lt;code>.status&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gateway.networking.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>GatewayClass&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">...&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">status&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">conditions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">lastTransitionTime&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;2022-11-16T10:33:06Z&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">message&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Handled by Foo controller&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">observedGeneration&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">1&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">reason&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Accepted&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">status&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;True&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">type&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Accepted&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">supportedFeatures&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- HTTPRoute&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- HTTPRouteHostRewrite&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- HTTPRoutePortRedirect&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- HTTPRouteQueryParamMatching&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The graduation of SupportedFeatures to Standard helped improve the conformance testing process for Gateway API.
The conformance test suite will now automatically run tests based on the features populated in the GatewayClass' status.
This creates a strong, verifiable link between an implementation's declared capabilities and the test results,
making it easier for implementers to run the correct conformance tests and for users to trust the conformance reports.&lt;/p>
&lt;p>This means that when the SupportedFeatures field is populated in the GatewayClass status, there will be no need for additional
conformance test flags like &lt;code>--supported-features&lt;/code>, &lt;code>--exempt&lt;/code> or &lt;code>--all-features&lt;/code>.
It's important to note that Mesh features are an exception to this and can be tested for conformance by using
&lt;em>Conformance Profiles&lt;/em>, or by manually providing any combination of feature-related flags until the dedicated resource
graduates from the experimental channel.&lt;/p>
&lt;h3 id="named-rules-for-routes">Named rules for Routes&lt;/h3>
&lt;p>Lead: &lt;a href="https://github.com/guicassolato">Guilherme Cassolato&lt;/a>&lt;/p>
&lt;p>GEP-995: &lt;a href="https://gateway-api.sigs.k8s.io/geps/gep-995">Adding a new name field to all xRouteRule types (HTTPRouteRule, GRPCRouteRule, etc.)&lt;/a>&lt;/p>
&lt;p>This enhancement enables route rules to be explicitly identified and referenced across the Gateway API ecosystem.
Some of the key use cases include:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Status:&lt;/strong> Allowing status conditions to reference specific rules directly by name.&lt;/li>
&lt;li>&lt;strong>Observability:&lt;/strong> Making it easier to identify individual rules in logs, traces, and metrics.&lt;/li>
&lt;li>&lt;strong>Policies:&lt;/strong> Enabling policies (&lt;a href="https://gateway-api.sigs.k8s.io/geps/gep-713">GEP-713&lt;/a>) to target specific route rules via the &lt;code>sectionName&lt;/code> field in their &lt;code>targetRef[s]&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Tooling:&lt;/strong> Simplifying filtering and referencing of route rules in tools such as &lt;code>gwctl&lt;/code>, &lt;code>kubectl&lt;/code>, and general-purpose utilities like &lt;code>jq&lt;/code> and &lt;code>yq&lt;/code>.&lt;/li>
&lt;li>&lt;strong>Internal configuration mapping:&lt;/strong> Facilitating the generation of internal configurations that reference route rules by name within gateway and mesh implementations.&lt;/li>
&lt;/ul>
&lt;p>This follows the same well-established pattern already adopted for Gateway listeners, Service ports, Pods (and containers),
and many other Kubernetes resources.&lt;/p>
&lt;p>While the new name field is &lt;strong>optional&lt;/strong> (so existing resources remain valid), its use is &lt;strong>strongly encouraged&lt;/strong>.
Implementations are not expected to assign a default value, but they may enforce constraints such as immutability.&lt;/p>
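&lt;p>For illustration, here is a minimal sketch of a named rule on an HTTPRoute (the route, gateway, rule and backend names are all made up):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store
spec:
  parentRefs:
  - name: example-gateway
  rules:
  - name: checkout         # the new, optional rule name
    matches:
    - path:
        type: PathPrefix
        value: /checkout
    backendRefs:
    - name: checkout
      port: 8080
&lt;/code>&lt;/pre>
&lt;p>A policy's &lt;code>targetRef&lt;/code> or a status condition can then point at &lt;code>sectionName: checkout&lt;/code> instead of relying on the rule's position in the list.&lt;/p>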
&lt;p>Finally, keep in mind that the &lt;a href="https://gateway-api.sigs.k8s.io/geps/gep-995/?h=995#format">name format&lt;/a> is validated,
and other fields (such as &lt;a href="https://gateway-api.sigs.k8s.io/reference/spec/?h=sectionname#sectionname">&lt;code>sectionName&lt;/code>&lt;/a>)
may impose additional, indirect constraints.&lt;/p>
&lt;h2 id="experimental-channel-changes">Experimental channel changes&lt;/h2>
&lt;h3 id="enabling-external-auth-for-httproute">Enabling external Auth for HTTPRoute&lt;/h3>
&lt;p>Giving Gateway API the ability to enforce authentication (and possibly authorization as well) at the Gateway or HTTPRoute level has been a highly requested feature for a long time. (See the &lt;a href="https://github.com/kubernetes-sigs/gateway-api/issues/1494">GEP-1494 issue&lt;/a> for some background.)&lt;/p>
&lt;p>This Gateway API release adds an Experimental filter in HTTPRoute that tells the Gateway API implementation to call out to an external service to authenticate (and, optionally, authorize) requests.&lt;/p>
&lt;p>This filter is based on the &lt;a href="https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_authz_filter#config-http-filters-ext-authz">Envoy ext_authz API&lt;/a>, and allows talking to an Auth service that uses either gRPC or HTTP for its protocol.&lt;/p>
&lt;p>Both methods allow the configuration of what headers to forward to the Auth service, with the HTTP protocol allowing some extra information like a prefix path.&lt;/p>
&lt;p>An HTTP example might look like this (note that it requires the Experimental channel to be installed, and an implementation that supports External Auth to actually understand the config):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gateway.networking.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>HTTPRoute&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>require-auth&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">namespace&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>default&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">parentRefs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>your-gateway-here&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">rules&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">matches&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">path&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">type&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Prefix&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>/admin&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">filters&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">type&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ExternalAuth&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">externalAuth&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">protocol&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>HTTP&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">backendRef&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>auth-service&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">http&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># These headers are always sent for the HTTP protocol,&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># but are included here for illustrative purposes&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">allowedHeaders&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- Host&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- Method&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- Path&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- Content-Length&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- Authorization&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">backendRefs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>admin-backend&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">port&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">8080&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This allows the backend Auth service to use the supplied headers to make a determination about the authentication for the request.&lt;/p>
&lt;p>When a request is allowed, the external Auth service responds with a 200 HTTP status code, optionally including extra headers to be added to the request that is forwarded to the backend. When the request is denied, the Auth service responds with a 403 HTTP status code.&lt;/p>
&lt;p>Since the Authorization header is used by many authentication schemes, this mechanism supports Basic, OAuth, JWT, and other common authentication and authorization methods.&lt;/p>
&lt;h3 id="mesh-resource">Mesh resource&lt;/h3>
&lt;p>Lead(s): &lt;a href="https://github.com/kflynn">Flynn&lt;/a>&lt;/p>
&lt;p>GEP-3949: &lt;a href="https://github.com/kubernetes-sigs/gateway-api/issues/3949">Mesh-wide configuration and supported features&lt;/a>&lt;/p>
&lt;p>Gateway API v1.4.0 introduces a new experimental Mesh resource, which provides a way to configure mesh-wide settings and discover the features supported by a given mesh implementation. This resource is analogous to the Gateway resource and will initially be used mainly for conformance testing, with plans to extend its use to off-cluster Gateways in the future.&lt;/p>
&lt;p>The Mesh resource is cluster-scoped and, as an experimental feature, is named &lt;code>XMesh&lt;/code> and resides in the &lt;code>gateway.networking.x-k8s.io&lt;/code> API group. A key field is &lt;code>controllerName&lt;/code>, which specifies the mesh implementation responsible for the resource. The resource's &lt;code>status&lt;/code> stanza indicates whether the mesh implementation has accepted it and lists the features the mesh supports.&lt;/p>
&lt;p>One of the goals of this GEP is to avoid making it more difficult for users to adopt a mesh. To simplify adoption, mesh implementations are expected to create a default Mesh resource upon startup if one with a matching &lt;code>controllerName&lt;/code> doesn't already exist. This avoids the need for manual creation of the resource to begin using a mesh.&lt;/p>
&lt;p>The new &lt;code>XMesh&lt;/code> API kind, within the &lt;code>gateway.networking.x-k8s.io/v1alpha1&lt;/code> API group,
provides a central point for mesh configuration and feature discovery.&lt;/p>
&lt;p>A minimal XMesh object specifies the &lt;code>controllerName&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gateway.networking.x-k8s.io/v1alpha1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>XMesh&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>one-mesh-to-mesh-them-all&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">controllerName&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>one-mesh.example.com/one-mesh&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The mesh implementation populates the &lt;code>status&lt;/code> field to confirm it has accepted the resource and to list its supported features:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">status&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">conditions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">type&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Accepted&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">status&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;True&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">reason&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Accepted&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">supportedFeatures&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>MeshHTTPRoute&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>OffClusterGateway&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="introducing-default-gateways">Introducing default Gateways&lt;/h3>
&lt;p>Lead(s): &lt;a href="https://github.com/kflynn">Flynn&lt;/a>&lt;/p>
&lt;p>GEP-3793: &lt;a href="https://github.com/kubernetes-sigs/gateway-api/issues/3793">Allowing Gateways to program some routes by default&lt;/a>.&lt;/p>
&lt;p>For application developers, one common piece of feedback has been the need to explicitly name a parent Gateway for every single north-south Route. While this explicitness prevents ambiguity, it adds friction, especially for developers who just want to expose their application to the outside world without worrying about the underlying infrastructure's naming scheme. To address this, we have introduced the concept of &lt;strong>Default Gateways&lt;/strong>.&lt;/p>
&lt;h4 id="for-application-developers-just-use-the-default">For application developers: Just &amp;quot;use the default&amp;quot;&lt;/h4>
&lt;p>As an application developer, you often don't care about the specific Gateway your traffic flows through; you just want it to work. With this enhancement, you can now create a Route and simply ask it to use a default Gateway.&lt;/p>
&lt;p>This is done by setting the new &lt;code>useDefaultGateways&lt;/code> field in your Route's &lt;code>spec&lt;/code>.&lt;/p>
&lt;p>Here’s a simple &lt;code>HTTPRoute&lt;/code> that uses a default Gateway:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gateway.networking.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>HTTPRoute&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-route&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">useDefaultGateways&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>All&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">rules&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">backendRefs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-service&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">port&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">80&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>That's it! No more need to hunt down the correct Gateway name for your environment. Your Route is now a &amp;quot;defaulted Route.&amp;quot;&lt;/p>
&lt;h4 id="for-cluster-operators-you-re-still-in-control">For cluster operators: You're still in control&lt;/h4>
&lt;p>This feature doesn't take control away from cluster operators (&amp;quot;Chihiro&amp;quot;).
In fact, they have explicit control over which Gateways can act as a default. A Gateway will only accept these &lt;em>defaulted Routes&lt;/em> if it is configured to do so.&lt;/p>
&lt;p>You can also use a ValidatingAdmissionPolicy to either require or even forbid the use of default Gateways by Routes.&lt;/p>
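&lt;p>As a rough sketch of the forbid variant: the ValidatingAdmissionPolicy API below is standard Kubernetes, but the match rules and CEL expression are illustrative, not something defined by Gateway API itself. A binding is included because a policy has no effect until it is bound:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: forbid-defaulted-routes
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: [&amp;quot;gateway.networking.k8s.io&amp;quot;]
      apiVersions: [&amp;quot;v1&amp;quot;]
      operations: [&amp;quot;CREATE&amp;quot;, &amp;quot;UPDATE&amp;quot;]
      resources: [&amp;quot;httproutes&amp;quot;]
  validations:
  # Reject Routes that opt in to default Gateways
  - expression: &amp;quot;!has(object.spec.useDefaultGateways)&amp;quot;
    message: Routes in this cluster must name their parent Gateways explicitly.
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: forbid-defaulted-routes
spec:
  policyName: forbid-defaulted-routes
  validationActions: [Deny]
&lt;/code>&lt;/pre>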
&lt;p>As a cluster operator, you can designate a Gateway as a default
by setting the (new) &lt;code>.spec.defaultScope&lt;/code> field:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gateway.networking.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Gateway&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-default-gateway&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">namespace&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>default&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">defaultScope&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>All&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># ... other gateway configuration&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Operators can choose to have no default Gateways, or even multiple.&lt;/p>
&lt;h4 id="how-it-works-and-key-details">How it works and key details&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>To maintain a clean, GitOps-friendly workflow, a default Gateway does &lt;em>not&lt;/em> modify the &lt;code>spec.parentRefs&lt;/code> of your Route. Instead, the binding is reflected in the Route's &lt;code>status&lt;/code> field. You can always inspect the &lt;code>status.parents&lt;/code> stanza of your Route to see exactly which Gateway or Gateways have accepted it (see the example after this list). This preserves your original intent and avoids conflicts with CD tools.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The design explicitly supports having multiple Gateways designated as defaults within a cluster. When this happens, a defaulted Route will bind to &lt;em>all&lt;/em> of them. This enables cluster operators to perform zero-downtime migrations and testing of new default Gateways.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>You can create a single Route that handles both north-south traffic (traffic entering or leaving the cluster, via a default Gateway) and east-west/mesh traffic (traffic between services within the cluster), by explicitly referencing a Service in &lt;code>parentRefs&lt;/code>.&lt;/p>
&lt;/li>
&lt;/ul>
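&lt;p>For example, a defaulted Route bound to a default Gateway might report status along these lines (the conditions and controller name vary by implementation; this is an illustrative sketch):&lt;/p>
&lt;pre>&lt;code class="language-yaml">status:
  parents:
  - parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: my-default-gateway
      namespace: default
    controllerName: gateway.example.com/controller   # illustrative
    conditions:
    - type: Accepted
      status: &amp;quot;True&amp;quot;
      reason: Accepted
&lt;/code>&lt;/pre>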
&lt;p>Default Gateways represent a significant step forward in making the Gateway API simpler and more intuitive for everyday use cases, bridging the gap between the flexibility needed by operators and the simplicity desired by developers.&lt;/p>
&lt;h3 id="configuring-client-certificate-validation">Configuring client certificate validation&lt;/h3>
&lt;p>Lead(s): &lt;a href="https://github.com/arkodg">Arko Dasgupta&lt;/a>, &lt;a href="https://github.com/kl52752">Katarzyna Łach&lt;/a>&lt;/p>
&lt;p>GEP-91: &lt;a href="https://github.com/kubernetes-sigs/gateway-api/pull/3942">Address connection coalescing security issue&lt;/a>&lt;/p>
&lt;p>This release brings updates for configuring client certificate validation, addressing a critical security vulnerability related to connection reuse.
HTTP connection coalescing is a web performance optimization that allows a client to reuse an existing TLS connection
for requests to different domains. While this reduces the overhead of establishing new connections, it introduces a security risk
in the context of API gateways.
Because a single TLS connection can be reused across multiple Listeners, client certificate configuration
needs to be shared across those Listeners to avoid unauthorized access.&lt;/p>
&lt;h4 id="why-sni-based-mtls-is-not-the-answer">Why SNI-based mTLS is not the answer&lt;/h4>
&lt;p>One might think that using Server Name Indication (SNI) to differentiate between Listeners would solve this problem.
However, TLS SNI is not a reliable mechanism for enforcing security policies in a connection coalescing scenario.
A client may reuse a single TLS connection for multiple peers, as long as they are all covered by the same certificate.
This means that a client could establish a connection by indicating one peer identity (using SNI), and then reuse that connection
to access a different virtual host that is listening on the same IP address and port. That reuse, which is controlled by client-side
heuristics, could bypass mutual TLS policies that were specific to the second Listener's configuration.&lt;/p>
&lt;p>Here's an example to help explain it:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gateway.networking.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Gateway&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>wildcard-tls-gateway&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">gatewayClassName&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>example&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">listeners&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>foo-https&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">protocol&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>HTTPS&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">port&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">443&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">hostname&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>foo.example.com&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tls&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">certificateRefs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">group&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># core API group&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Secret&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>foo-example-com-cert&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># SAN: foo.example.com&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>wildcard-https&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">protocol&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>HTTPS&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">port&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">443&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">hostname&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;*.example.com&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tls&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">certificateRefs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">group&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># core API group&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Secret&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>wildcard-example-com-cert&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># SAN: *.example.com&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>I have configured a Gateway with two listeners that have overlapping hostnames.
My intention is for the &lt;code>foo-https&lt;/code> listener to be accessible only by clients presenting the &lt;code>foo-example-com-cert&lt;/code> certificate.
In contrast, the &lt;code>wildcard-https&lt;/code> listener should allow access to a broader audience using any certificate valid for the &lt;code>*.example.com&lt;/code> domain.&lt;/p>
&lt;p>Consider a scenario where a client initially connects to &lt;code>foo.example.com&lt;/code>. The server requests and successfully validates the
&lt;code>foo-example-com-cert&lt;/code> certificate, establishing the connection. Subsequently, the same client wishes to access other sites within this domain,
such as &lt;code>bar.example.com&lt;/code>, which is handled by the &lt;code>wildcard-https&lt;/code> listener. Due to connection reuse,
clients can access &lt;code>wildcard-https&lt;/code> backends without an additional TLS handshake on the existing connection.
This process functions as expected.&lt;/p>
&lt;p>However, a critical security vulnerability arises when the order of access is reversed.
If a client first connects to &lt;code>bar.example.com&lt;/code> and presents a valid &lt;code>bar.example.com&lt;/code> certificate, the connection is successfully established.
If this client then attempts to access &lt;code>foo.example.com&lt;/code>, the existing connection's client certificate will not be re-validated.
This allows the client to bypass the specific certificate requirement for the &lt;code>foo&lt;/code> backend, leading to a serious security breach.&lt;/p>
&lt;h4 id="the-solution-per-port-tls-configuration">The solution: per-port TLS configuration&lt;/h4>
&lt;p>The updated Gateway API gains a &lt;code>tls&lt;/code> field in the &lt;code>.spec&lt;/code> of a Gateway, which allows you to define a default client certificate
validation configuration for all Listeners and then, if needed, override it on a per-port basis. This provides a flexible and
powerful way to manage your TLS policies.&lt;/p>
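&lt;p>Concretely, a Gateway might set a strict default and relax it for a single port. The sketch below assumes the &lt;code>TLSConfig&lt;/code> type carries the existing per-listener &lt;code>frontendValidation&lt;/code> and &lt;code>caCertificateRefs&lt;/code> shape; the names are illustrative:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: per-port-tls-gateway
spec:
  gatewayClassName: example
  tls:
    # Client certificate validation applied to all Listeners by default
    default:
      frontendValidation:
        caCertificateRefs:
        - kind: ConfigMap
          group: &amp;quot;&amp;quot;
          name: strict-client-ca
    # Override for every Listener on port 8443
    perPort:
    - port: 8443
      tls:
        frontendValidation:
          caCertificateRefs:
          - kind: ConfigMap
            group: &amp;quot;&amp;quot;
            name: partner-client-ca
  # listeners omitted for brevity
&lt;/code>&lt;/pre>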
&lt;p>Here’s a look at the updated API definitions (shown as Go source code):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">// GatewaySpec defines the desired state of Gateway.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> GatewaySpec &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#666">...&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// GatewayTLSConfig specifies frontend tls configuration for gateway.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> TLS &lt;span style="color:#666">*&lt;/span>GatewayTLSConfig &lt;span style="color:#b44">`json:&amp;#34;tls,omitempty&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">// GatewayTLSConfig specifies frontend tls configuration for gateway.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> GatewayTLSConfig &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Default specifies the default client certificate validation configuration
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> Default TLSConfig &lt;span style="color:#b44">`json:&amp;#34;default&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// PerPort specifies tls configuration assigned per port.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> PerPort []TLSPortConfig &lt;span style="color:#b44">`json:&amp;#34;perPort,omitempty&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">// TLSPortConfig describes a TLS configuration for a specific port.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> TLSPortConfig &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// The Port indicates the Port Number to which the TLS configuration will be applied.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> Port PortNumber &lt;span style="color:#b44">`json:&amp;#34;port&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// TLS store the configuration that will be applied to all Listeners handling
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#080;font-style:italic">// HTTPS traffic and matching given port.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> TLS TLSConfig &lt;span style="color:#b44">`json:&amp;#34;tls&amp;#34;`&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="breaking-changes">Breaking changes&lt;/h2>
&lt;h3 id="breaking-grpcroute">Standard GRPCRoute - &lt;code>.spec&lt;/code> field required (technicality)&lt;/h3>
&lt;p>The promotion of GRPCRoute to Standard introduces a minor but technically breaking change regarding the presence of the top-level &lt;code>.spec&lt;/code> field.
As part of achieving Standard status, the Gateway API has tightened the OpenAPI schema validation within the GRPCRoute
CustomResourceDefinition (CRD)
to explicitly ensure the &lt;code>spec&lt;/code> field is required for all GRPCRoute resources.
This change enforces stricter conformance to Kubernetes object standards and enhances the resource's stability and predictability.
While it is highly unlikely that users were defining GRPCRoute objects without any specification, any existing automation
or manifests that relied on a relaxed interpretation allowing a completely absent &lt;code>spec&lt;/code> field will now fail validation
and &lt;strong>must&lt;/strong> be updated to include the &lt;code>.spec&lt;/code> field, even if empty.&lt;/p>
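&lt;p>In practice, this simply means every GRPCRoute manifest needs an explicit &lt;code>spec&lt;/code> stanza, even an empty one:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: gateway.networking.k8s.io/v1
kind: GRPCRoute
metadata:
  name: minimal-grpcroute
spec: {}   # an explicit (possibly empty) spec is now required
&lt;/code>&lt;/pre>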
&lt;h3 id="breaking-httproute">Experimental CORS support in HTTPRoute - breaking change for &lt;code>allowCredentials&lt;/code> field&lt;/h3>
&lt;p>The Gateway API subproject has introduced a breaking change to the Experimental CORS support in HTTPRoute, concerning the &lt;code>allowCredentials&lt;/code> field
within the CORS policy.
This field's definition has been strictly aligned with the upstream CORS specification, which dictates that the corresponding
&lt;code>Access-Control-Allow-Credentials&lt;/code> header must represent a Boolean value.
Previously, the implementation might have been overly permissive, potentially accepting non-standard or string representations such as
&lt;code>true&lt;/code> due to relaxed schema validation.
Users who were configuring CORS rules must now review their manifests and ensure the value for &lt;code>allowCredentials&lt;/code>
strictly conforms to the new, more restrictive schema.
Any existing HTTPRoute definitions that do not adhere to this stricter validation will now be rejected by the API server,
requiring a configuration update to maintain functionality.&lt;/p>
&lt;h2 id="improving-the-development-and-usage-experience">Improving the development and usage experience&lt;/h2>
&lt;p>As part of this release, we also made several improvements to the developer experience workflow:&lt;/p>
&lt;ul>
&lt;li>Added &lt;a href="https://github.com/kubernetes-sigs/kube-api-linter">Kube API Linter&lt;/a> to the CI/CD pipelines, reducing the burden on API reviewers and the number of common mistakes.&lt;/li>
&lt;li>Improved the execution time of CRD tests by using &lt;a href="https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/envtest">&lt;code>envtest&lt;/code>&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>Additionally, as part of the effort to improve the Gateway API usage experience, we removed ambiguities and long-standing tech debt from our documentation website:&lt;/p>
&lt;ul>
&lt;li>The API reference is now explicit when a field is &lt;code>experimental&lt;/code>.&lt;/li>
&lt;li>The GEP (Gateway API Enhancement Proposal) navigation bar is now automatically generated, reflecting the real status of each enhancement.&lt;/li>
&lt;/ul>
&lt;h2 id="try-it-out">Try it out&lt;/h2>
&lt;p>Unlike other Kubernetes APIs, you don't need to upgrade to the latest version of
Kubernetes to get the latest version of Gateway API. As long as you're running
Kubernetes 1.26 or later, you'll be able to get up and running with this version
of Gateway API.&lt;/p>
&lt;p>To try out the API, follow the &lt;a href="https://gateway-api.sigs.k8s.io/guides/">Getting Started Guide&lt;/a>.&lt;/p>
&lt;p>As of this writing, seven implementations are already conformant with Gateway API v1.4.0. In alphabetical order:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/kgateway-dev/kgateway/releases/tag/v2.2.0-alpha.1">Agent Gateway (with kgateway)&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/airlock/microgateway/releases/tag/4.8.0-alpha1">Airlock Microgateway&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/envoyproxy/gateway/releases/tag/v1.6.0-rc.1">Envoy Gateway&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://docs.cloud.google.com/kubernetes-engine/docs/concepts/gateway-api">GKE Gateway&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/istio/istio/releases/tag/1.28.0-rc.1">Istio&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kgateway-dev/kgateway/releases/tag/v2.1.0">kgateway&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/traefik/traefik/releases/tag/v3.6.0-rc1">Traefik Proxy&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>Wondering when a feature will be added? There are lots of opportunities to get
involved and help define the future of Kubernetes routing APIs for both ingress
and service mesh.&lt;/p>
&lt;ul>
&lt;li>Check out the &lt;a href="https://gateway-api.sigs.k8s.io/guides">user guides&lt;/a> to see what use-cases can be addressed.&lt;/li>
&lt;li>Try out one of the &lt;a href="https://gateway-api.sigs.k8s.io/implementations/">existing Gateway controllers&lt;/a>.&lt;/li>
&lt;li>Or &lt;a href="https://gateway-api.sigs.k8s.io/contributing/">join us in the community&lt;/a>
and help us build the future of Gateway API together!&lt;/li>
&lt;/ul>
&lt;p>The maintainers would like to thank &lt;em>everyone&lt;/em> who's contributed to Gateway
API, whether in the form of commits to the repo, discussion, ideas, or general
support. We could never have made this kind of progress without the support of
this dedicated and active community.&lt;/p>
&lt;h2 id="related-kubernetes-blog-articles">Related Kubernetes blog articles&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://kubernetes.io/blog/2025/06/02/gateway-api-v1-3/">Gateway API v1.3.0: Advancements in Request Mirroring, CORS, Gateway Merging, and Retry Budgets&lt;/a>
(June 2025)&lt;/li>
&lt;li>&lt;a href="https://kubernetes.io/blog/2024/11/21/gateway-api-v1-2/">Gateway API v1.2: WebSockets, Timeouts, Retries, and More&lt;/a>
(November 2024)&lt;/li>
&lt;li>&lt;a href="https://kubernetes.io/blog/2024/05/09/gateway-api-v1-1/">Gateway API v1.1: Service mesh, GRPCRoute, and a whole lot more&lt;/a>
(May 2024)&lt;/li>
&lt;li>&lt;a href="https://kubernetes.io/blog/2023/11/28/gateway-api-ga/">New Experimental Features in Gateway API v1.0&lt;/a>
(November 2023)&lt;/li>
&lt;li>&lt;a href="https://kubernetes.io/blog/2023/10/31/gateway-api-ga/">Gateway API v1.0: GA Release&lt;/a>
(October 2023)&lt;/li>
&lt;/ul></description></item><item><title>7 Common Kubernetes Pitfalls (and How I Learned to Avoid Them)</title><link>https://kubernetes.io/blog/2025/10/20/seven-kubernetes-pitfalls-and-how-to-avoid/</link><pubDate>Mon, 20 Oct 2025 08:30:00 -0700</pubDate><guid>https://kubernetes.io/blog/2025/10/20/seven-kubernetes-pitfalls-and-how-to-avoid/</guid><description>
&lt;p>It’s no secret that Kubernetes can be both powerful and frustrating at times. When I first started dabbling with container orchestration, I made more than my fair share of mistakes, enough to compile a whole list of pitfalls. In this post, I want to walk through seven big gotchas I’ve encountered (or seen others run into) and share some tips on how to avoid them. Whether you’re just kicking the tires on Kubernetes or already managing production clusters, I hope these insights save you a little extra stress.&lt;/p>
&lt;h2 id="1-skipping-resource-requests-and-limits">1. Skipping resource requests and limits&lt;/h2>
&lt;p>&lt;strong>The pitfall&lt;/strong>: Not specifying CPU and memory requirements in Pod specifications. This typically happens because Kubernetes does not require these fields, and workloads can often start and run without them—making the omission easy to overlook in early configurations or during rapid deployment cycles.&lt;/p>
&lt;p>&lt;strong>Context&lt;/strong>:
In Kubernetes, resource requests and limits are critical for efficient cluster management. Resource requests ensure that the scheduler reserves the appropriate amount of CPU and memory for each pod, guaranteeing that it has the necessary resources to operate. Resource limits cap the amount of CPU and memory a pod can use, preventing any single pod from consuming excessive resources and potentially starving other pods.
When resource requests and limits are not set:&lt;/p>
&lt;ol>
&lt;li>Resource Starvation: Pods may get insufficient resources, leading to degraded performance or failures. This is because Kubernetes schedules pods based on these requests. Without them, the scheduler might place too many pods on a single node, leading to resource contention and performance bottlenecks.&lt;/li>
&lt;li>Resource Hoarding: Conversely, without limits, a pod might consume more than its fair share of resources, impacting the performance and stability of other pods on the same node. This can lead to issues such as other pods getting evicted or killed by the Out-Of-Memory (OOM) killer due to lack of available memory.&lt;/li>
&lt;/ol>
&lt;h3 id="how-to-avoid-it">How to avoid it:&lt;/h3>
&lt;ul>
&lt;li>Start with modest &lt;code>requests&lt;/code> (for example &lt;code>100m&lt;/code> CPU, &lt;code>128Mi&lt;/code> memory) and see how your app behaves; see the sketch after this list.&lt;/li>
&lt;li>Monitor real-world usage and refine your values; the &lt;a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/">HorizontalPodAutoscaler&lt;/a> can help automate scaling based on metrics.&lt;/li>
&lt;li>Keep an eye on &lt;code>kubectl top pods&lt;/code> or your logging/monitoring tool to confirm you’re not over- or under-provisioning.&lt;/li>
&lt;/ul>
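&lt;p>As a concrete starting point, here is a minimal sketch using those values (the image and names are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
  - name: app
    image: registry.example.com/demo-app:1.0.0
    resources:
      requests:
        cpu: 100m        # the scheduler reserves this much CPU
        memory: 128Mi    # and this much memory
      limits:
        memory: 256Mi    # above this, the container is OOM-killed
&lt;/code>&lt;/pre>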
&lt;p>&lt;strong>My reality check&lt;/strong>: Early on, I never thought about memory limits. Things seemed fine on my local cluster. Then, in a larger environment, Pods got &lt;em>OOMKilled&lt;/em> left and right. Lesson learned.
For detailed instructions on configuring resource requests and limits for your containers, please refer to &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/">Assign Memory Resources to Containers and Pods&lt;/a>
(part of the official Kubernetes documentation).&lt;/p>
&lt;h2 id="2-underestimating-liveness-and-readiness-probes">2. Underestimating liveness and readiness probes&lt;/h2>
&lt;p>&lt;strong>The pitfall&lt;/strong>: Deploying containers without explicitly defining how Kubernetes should check their health or readiness. This tends to happen because Kubernetes will consider a container “running” as long as the process inside hasn’t exited. Without additional signals, Kubernetes assumes the workload is functioning—even if the application inside is unresponsive, initializing, or stuck.&lt;/p>
&lt;p>&lt;strong>Context&lt;/strong>:&lt;br>
Liveness, readiness, and startup probes are mechanisms Kubernetes uses to monitor container health and availability.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Liveness probes&lt;/strong> determine if the application is still alive. If a liveness check fails, the container is restarted.&lt;/li>
&lt;li>&lt;strong>Readiness probes&lt;/strong> control whether a container is ready to serve traffic. Until the readiness probe passes, the container is removed from Service endpoints.&lt;/li>
&lt;li>&lt;strong>Startup probes&lt;/strong> help distinguish between long startup times and actual failures.&lt;/li>
&lt;/ul>
&lt;h3 id="how-to-avoid-it-1">How to avoid it:&lt;/h3>
&lt;ul>
&lt;li>Add a simple HTTP &lt;code>livenessProbe&lt;/code> to check a health endpoint (for example &lt;code>/healthz&lt;/code>) so Kubernetes can restart a hung container; see the sketch after this list.&lt;/li>
&lt;li>Use a &lt;code>readinessProbe&lt;/code> to ensure traffic doesn’t reach your app until it’s warmed up.&lt;/li>
&lt;li>Keep probes simple. Overly complex checks can create false alarms and unnecessary restarts.&lt;/li>
&lt;/ul>
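&lt;p>For instance, a minimal pair of probes could look like this, assuming your app serves a &lt;code>/healthz&lt;/code> endpoint on port 8080 (the paths and timings are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-yaml">containers:
- name: web
  image: registry.example.com/web:1.2.3   # illustrative
  ports:
  - containerPort: 8080
  livenessProbe:                # restart the container if this fails
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 10
  readinessProbe:               # hold traffic until this passes
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 5
&lt;/code>&lt;/pre>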
&lt;p>&lt;strong>My reality check&lt;/strong>: I once forgot a readiness probe for a web service that took a while to load. Users hit it prematurely, got weird timeouts, and I spent hours scratching my head. A 3-line readiness probe would have saved the day.&lt;/p>
&lt;p>For comprehensive instructions on configuring liveness, readiness, and startup probes for containers, please refer to &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/">Configure Liveness, Readiness and Startup Probes&lt;/a>
in the official Kubernetes documentation.&lt;/p>
&lt;h2 id="3-we-ll-just-look-at-container-logs-famous-last-words">3. “We’ll just look at container logs” (famous last words)&lt;/h2>
&lt;p>&lt;strong>The pitfall&lt;/strong>: Relying solely on container logs retrieved via &lt;code>kubectl logs&lt;/code>. This often happens because the command is quick and convenient, and in many setups, logs appear accessible during development or early troubleshooting. However, &lt;code>kubectl logs&lt;/code> only retrieves logs from currently running or recently terminated containers, and those logs are stored on the node’s local disk. As soon as the container is deleted, evicted, or the node is restarted, the log files may be rotated out or permanently lost.&lt;/p>
&lt;h3 id="how-to-avoid-it-2">How to avoid it:&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Centralize logs&lt;/strong> using CNCF tools like &lt;a href="https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent">Fluentd&lt;/a> or &lt;a href="https://fluentbit.io/">Fluent Bit&lt;/a> to aggregate output from all Pods.&lt;/li>
&lt;li>&lt;strong>Adopt OpenTelemetry&lt;/strong> for a unified view of logs, metrics, and (if needed) traces. This lets you spot correlations between infrastructure events and app-level behavior.&lt;/li>
&lt;li>&lt;strong>Pair logs with Prometheus metrics&lt;/strong> to track cluster-level data alongside application logs. If you need distributed tracing, consider CNCF projects like &lt;a href="https://www.jaegertracing.io/">Jaeger&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>My reality check&lt;/strong>: The first time I lost Pod logs to a quick restart, I realized how flimsy “kubectl logs” can be on its own. Since then, I’ve set up a proper pipeline for every cluster to avoid missing vital clues.&lt;/p>
&lt;h2 id="4-treating-dev-and-prod-exactly-the-same">4. Treating dev and prod exactly the same&lt;/h2>
&lt;p>&lt;strong>The pitfall&lt;/strong>: Deploying the same Kubernetes manifests with identical settings across development, staging, and production environments. This often occurs when teams aim for consistency and reuse, but overlook that environment-specific factors—such as traffic patterns, resource availability, scaling needs, or access control—can differ significantly. Without customization, configurations optimized for one environment may cause instability, poor performance, or security gaps in another.&lt;/p>
&lt;h3 id="how-to-avoid-it-3">How to avoid it:&lt;/h3>
&lt;ul>
&lt;li>Use environment overlays or &lt;a href="https://kustomize.io/">kustomize&lt;/a> to maintain a shared base while customizing resource requests, replicas, or config for each environment; see the sketch after this list.&lt;/li>
&lt;li>Extract environment-specific configuration into ConfigMaps and/or Secrets. You can use a specialized tool such as &lt;a href="https://github.com/bitnami-labs/sealed-secrets">Sealed Secrets&lt;/a> to manage confidential data.&lt;/li>
&lt;li>Plan for scale in production. Your dev cluster can probably get away with minimal CPU/memory, but prod might need significantly more.&lt;/li>
&lt;/ul>
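&lt;p>A minimal production overlay might look like the following sketch (paths and names are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-yaml"># overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base                  # shared manifests
patches:
- target:
    kind: Deployment
    name: demo-app
  patch: |-
    - op: replace
      path: /spec/replicas
      value: 10               # prod runs more replicas than dev
&lt;/code>&lt;/pre>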
&lt;p>&lt;strong>My reality check&lt;/strong>: One time, I scaled up &lt;code>replicaCount&lt;/code> from 2 to 10 in a tiny dev environment just to “test.” I promptly ran out of resources and spent half a day cleaning up the aftermath. Oops.&lt;/p>
&lt;h2 id="5-leaving-old-stuff-floating-around">5. Leaving old stuff floating around&lt;/h2>
&lt;p>&lt;strong>The pitfall&lt;/strong>: Leaving unused or outdated resources—such as Deployments, Services, ConfigMaps, or PersistentVolumeClaims—running in the cluster. This often happens because Kubernetes does not automatically remove resources unless explicitly instructed, and there is no built-in mechanism to track ownership or expiration. Over time, these forgotten objects can accumulate, consuming cluster resources, increasing cloud costs, and creating operational confusion, especially when stale Services or LoadBalancers continue to route traffic.&lt;/p>
&lt;h3 id="how-to-avoid-it-4">How to avoid it:&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Label everything&lt;/strong> with a purpose or owner label. That way, you can easily query resources you no longer need.&lt;/li>
&lt;li>&lt;strong>Regularly audit&lt;/strong> your cluster: run &lt;code>kubectl get all -n &amp;lt;namespace&amp;gt;&lt;/code> to see what’s actually running, and confirm it’s all legit.&lt;/li>
&lt;li>&lt;strong>Adopt Kubernetes’ Garbage Collection&lt;/strong>: &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/">K8s docs&lt;/a> show how to remove dependent objects automatically.&lt;/li>
&lt;li>&lt;strong>Leverage policy automation&lt;/strong>: Tools like &lt;a href="https://kyverno.io/">Kyverno&lt;/a> can automatically delete or block stale resources after a certain period, or enforce lifecycle policies so you don’t have to remember every single cleanup step.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>My reality check&lt;/strong>: After a hackathon, I forgot to tear down a “test-svc” pinned to an external load balancer. Three weeks later, I realized I’d been paying for that load balancer the entire time. Facepalm.&lt;/p>
&lt;h2 id="6-diving-too-deep-into-networking-too-soon">6. Diving too deep into networking too soon&lt;/h2>
&lt;p>&lt;strong>The pitfall&lt;/strong>: Introducing advanced networking solutions—such as service meshes, custom CNI plugins, or multi-cluster communication—before fully understanding Kubernetes' native networking primitives. This commonly occurs when teams implement features like traffic routing, observability, or mTLS using external tools without first mastering how core Kubernetes networking works, including Pod-to-Pod communication, ClusterIP Services, DNS resolution, and basic ingress traffic handling. As a result, network-related issues become harder to troubleshoot, especially when overlays introduce additional abstractions and failure points.&lt;/p>
&lt;h3 id="how-to-avoid-it-5">How to avoid it:&lt;/h3>
&lt;ul>
&lt;li>Start small: a Deployment, a Service, and a basic ingress controller such as one based on NGINX (e.g., Ingress-NGINX).&lt;/li>
&lt;li>Make sure you understand how traffic flows within the cluster, how service discovery works, and how DNS is configured.&lt;/li>
&lt;li>Only move to a full-blown mesh or advanced CNI features when you actually need them; complex networking adds overhead.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>My reality check&lt;/strong>: I tried Istio on a small internal app once, then spent more time debugging Istio itself than the actual app. Eventually, I stepped back, removed Istio, and everything worked fine.&lt;/p>
&lt;h2 id="7-going-too-light-on-security-and-rbac">7. Going too light on security and RBAC&lt;/h2>
&lt;p>&lt;strong>The pitfall&lt;/strong>: Deploying workloads with insecure configurations, such as running containers as the root user, using the &lt;code>latest&lt;/code> image tag, disabling security contexts, or assigning overly broad RBAC roles like &lt;code>cluster-admin&lt;/code>. These practices persist because Kubernetes does not enforce strict security defaults out of the box, and the platform is designed to be flexible rather than opinionated. Without explicit security policies in place, clusters can remain exposed to risks like container escape, unauthorized privilege escalation, or accidental production changes due to unpinned images.&lt;/p>
&lt;h3 id="how-to-avoid-it-6">How to avoid it:&lt;/h3>
&lt;ul>
&lt;li>Use &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/">RBAC&lt;/a> to define roles and permissions within Kubernetes (see the sketch after this list). While RBAC is the default and most widely supported authorization mechanism, Kubernetes also allows the use of alternative authorizers. For more advanced or external policy needs, consider solutions like &lt;a href="https://open-policy-agent.github.io/gatekeeper/">OPA Gatekeeper&lt;/a> (based on Rego), &lt;a href="https://kyverno.io/">Kyverno&lt;/a>, or custom webhooks using policy languages such as CEL or &lt;a href="https://cedarpolicy.com/">Cedar&lt;/a>.&lt;/li>
&lt;li>Pin images to specific versions (no more &lt;code>:latest&lt;/code>!). This helps you know what’s actually deployed.&lt;/li>
&lt;li>Look into &lt;a href="https://kubernetes.io/docs/concepts/security/pod-security-admission/">Pod Security Admission&lt;/a> (or other solutions like Kyverno) to enforce non-root containers, read-only filesystems, etc.&lt;/li>
&lt;/ul>
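&lt;p>As a sketch of a narrowly scoped alternative to handing out &lt;code>cluster-admin&lt;/code> (the namespace, user, and names are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a
rules:
- apiGroups: [&amp;quot;&amp;quot;]          # core API group
  resources: [&amp;quot;pods&amp;quot;]
  verbs: [&amp;quot;get&amp;quot;, &amp;quot;list&amp;quot;, &amp;quot;watch&amp;quot;]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
subjects:
- kind: User
  name: jane                    # illustrative user
  apiGroup: rbac.authorization.k8s.io
&lt;/code>&lt;/pre>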
&lt;p>&lt;strong>My reality check&lt;/strong>: I never had a huge security breach, but I’ve heard plenty of cautionary tales. If you don’t tighten things up, it’s only a matter of time before something goes wrong.&lt;/p>
&lt;h2 id="final-thoughts">Final thoughts&lt;/h2>
&lt;p>Kubernetes is amazing, but it’s not psychic: it won’t magically do the right thing if you don’t tell it what you need. By keeping these pitfalls in mind, you’ll avoid a lot of headaches and wasted time. Mistakes happen (trust me, I’ve made my share), but each one is a chance to learn more about how Kubernetes truly works under the hood.
If you’re curious to dive deeper, the &lt;a href="https://kubernetes.io/docs/home/">official docs&lt;/a> and the &lt;a href="http://slack.kubernetes.io/">community Slack&lt;/a> are excellent next steps. And of course, feel free to share your own horror stories or success tips, because at the end of the day, we’re all in this cloud native adventure together.&lt;/p>
&lt;p>&lt;strong>Happy Shipping!&lt;/strong>&lt;/p></description></item><item><title>Spotlight on Policy Working Group</title><link>https://kubernetes.io/blog/2025/10/18/wg-policy-spotlight-2025/</link><pubDate>Sat, 18 Oct 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/10/18/wg-policy-spotlight-2025/</guid><description>
&lt;p>&lt;em>(Note: The Policy Working Group has completed its mission and is no longer active. This article reflects its work, accomplishments, and insights into how a working group operates.)&lt;/em>&lt;/p>
&lt;p>In the complex world of Kubernetes, policies play a crucial role in managing and securing clusters. But have you ever wondered how these policies are developed, implemented, and standardized across the Kubernetes ecosystem? To answer that, let's take a look back at the work of the Policy Working Group.&lt;/p>
&lt;p>The Policy Working Group was dedicated to a critical mission: providing an overall architecture that encompasses both current policy-related implementations and future policy proposals in Kubernetes. Their goal was both ambitious and essential: to develop a universal policy architecture that benefits developers and end-users alike.&lt;/p>
&lt;p>Through collaborative methods, this working group strove to bring clarity and consistency to the often complex world of Kubernetes policies. By focusing on both existing implementations and future proposals, they ensured that the policy landscape in Kubernetes remains coherent and accessible as the technology evolves.&lt;/p>
&lt;p>This blog post dives deeper into the work of the Policy Working Group, guided by insights from its former co-chairs:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://twitter.com/JimBugwadia">Jim Bugwadia&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://twitter.com/poonam_lamba">Poonam Lamba&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://twitter.com/sudermanjr">Andy Suderman&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>&lt;em>Interviewed by &lt;a href="https://twitter.com/arujjval">Arujjwal Negi&lt;/a>.&lt;/em>&lt;/p>
&lt;p>These co-chairs explained what the Policy Working Group was all about.&lt;/p>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>&lt;strong>Hello, thank you for the time! Let’s start with some introductions, could you tell us a bit about yourself, your role, and how you got involved in Kubernetes?&lt;/strong>&lt;/p>
&lt;p>&lt;strong>Jim Bugwadia&lt;/strong>: My name is Jim Bugwadia, and I am a co-founder and the CEO at Nirmata which provides solutions that automate security and compliance for cloud-native workloads. At Nirmata, we have been working with Kubernetes since it started in 2014. We initially built a Kubernetes policy engine in our commercial platform and later donated it to CNCF as the Kyverno project. I joined the CNCF Kubernetes Policy Working Group to help build and standardize various aspects of policy management for Kubernetes and later became a co-chair.&lt;/p>
&lt;p>&lt;strong>Andy Suderman&lt;/strong>: My name is Andy Suderman and I am the CTO of Fairwinds, a managed Kubernetes-as-a-Service provider. I began working with Kubernetes in 2016 building a web conferencing platform. I am an author and/or maintainer of several Kubernetes-related open-source projects such as Goldilocks, Pluto, and Polaris. Polaris is a JSON-schema-based policy engine, which started Fairwinds' journey into the policy space and my involvement in the Policy Working Group.&lt;/p>
&lt;p>&lt;strong>Poonam Lamba&lt;/strong>: My name is Poonam Lamba, and I currently work as a Product Manager for Google Kubernetes Engine (GKE) at Google. My journey with Kubernetes began back in 2017 when I was building an SRE platform for a large enterprise, using a private cloud built on Kubernetes. Intrigued by its potential to revolutionize the way we deployed and managed applications at the time, I dove headfirst into learning everything I could about it. Since then, I've had the opportunity to build the policy and compliance products for GKE. I lead and contribute to GKE CIS benchmarks. I am involved with the Gatekeeper project, I have contributed to the Policy WG for over 2 years, and I served as a co-chair for the group.&lt;/p>
&lt;p>&lt;em>Responses to the following questions represent an amalgamation of insights from the former co-chairs.&lt;/em>&lt;/p>
&lt;h2 id="about-working-groups">About Working Groups&lt;/h2>
&lt;p>&lt;strong>One thing even I am not aware of is the difference between a working group and a SIG. Can you help us understand what a working group is and how it is different from a SIG?&lt;/strong>&lt;/p>
&lt;p>Unlike SIGs, working groups are temporary and focused on tackling specific, cross-cutting issues or projects that may involve multiple SIGs. Their lifespan is defined, and they disband once they've achieved their objective. Generally, working groups don't own code or have long-term responsibility for managing a particular area of the Kubernetes project.&lt;/p>
&lt;p>(To know more about SIGs, visit the &lt;a href="https://github.com/kubernetes/community/blob/master/sig-list.md">list of Special Interest Groups&lt;/a>)&lt;/p>
&lt;p>&lt;strong>You mentioned that Working Groups involve multiple SIGS. What SIGS was the Policy WG closely involved with, and how did you coordinate with them?&lt;/strong>&lt;/p>
&lt;p>The group collaborated closely with Kubernetes SIG Auth throughout its existence, and more recently, the group also worked with SIG Security since its formation. The collaboration occurred in a few ways. We provided periodic updates during the SIG meetings to keep them informed of our progress and activities. Additionally, we utilized other community forums to maintain open lines of communication and ensure our work aligned with the broader Kubernetes ecosystem. This collaborative approach helped the group stay coordinated with related efforts across the Kubernetes community.&lt;/p>
&lt;h2 id="policy-wg">Policy WG&lt;/h2>
&lt;p>&lt;strong>Why was the Policy Working Group created?&lt;/strong>&lt;/p>
&lt;p>We recognized that Kubernetes is powered by a highly declarative, fine-grained, and extensible configuration management system, and that this is what enables such a broad set of use cases. We've observed that a Kubernetes configuration manifest may have different portions that are important to various stakeholders. For example, some parts may be crucial for developers, while others might be of particular interest to security teams or address operational concerns. Given this complexity, we believe that policies governing the usage of these intricate configurations are essential for success with Kubernetes.&lt;/p>
&lt;p>Our Policy Working Group was created specifically to research the standardization of policy definitions and related artifacts. We saw a need to bring consistency and clarity to how policies are defined and implemented across the Kubernetes ecosystem, given the diverse requirements and stakeholders involved in Kubernetes deployments.&lt;/p>
&lt;p>&lt;strong>Can you give me an idea of the work you did in the group?&lt;/strong>&lt;/p>
&lt;p>We worked on several Kubernetes policy-related projects. Our initiatives included:&lt;/p>
&lt;ul>
&lt;li>We worked on a Kubernetes Enhancement Proposal (KEP) for the Kubernetes Policy Reports API. This aims to standardize how policy reports are generated and consumed within the Kubernetes ecosystem.&lt;/li>
&lt;li>We conducted a CNCF survey to better understand policy usage in the Kubernetes space. This helped gauge the practices and needs across the community at the time.&lt;/li>
&lt;li>We wrote a paper to guide users in achieving PCI-DSS compliance for containers. This is intended to help organizations meet important security standards in their Kubernetes environments.&lt;/li>
&lt;li>We also worked on a paper highlighting how shifting security down can benefit organizations. This focuses on the advantages of implementing security measures earlier in the development and deployment process.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Can you tell us what were the main objectives of the Policy Working Group and some of your key accomplishments?&lt;/strong>&lt;/p>
&lt;p>The charter of the Policy WG was to help standardize policy management for Kubernetes and educate the community on best practices.&lt;/p>
&lt;p>To accomplish this we updated the Kubernetes documentation (&lt;a href="https://kubernetes.io/docs/concepts/policy">Policies | Kubernetes&lt;/a>), produced several whitepapers (&lt;a href="https://github.com/kubernetes/sig-security/blob/main/sig-security-docs/papers/policy/CNCF_Kubernetes_Policy_Management_WhitePaper_v1.pdf">Kubernetes Policy Management&lt;/a>, &lt;a href="https://github.com/kubernetes/sig-security/blob/main/sig-security-docs/papers/policy_grc/Kubernetes_Policy_WG_Paper_v1_101123.pdf">Kubernetes GRC&lt;/a>), and created the Policy Reports API (&lt;a href="https://github.com/kubernetes-retired/wg-policy-prototypes/blob/master/policy-report/docs/api-docs.md">API reference&lt;/a>), which standardizes reporting across various tools. Several popular tools, such as Falco, Trivy, Kyverno, and kube-bench, support the Policy Report API. A major milestone for the Policy WG was working to promote the Policy Reports API to a SIG-level API and find it a stable home.&lt;/p>
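&lt;p>To give a feel for what that standardization looks like in practice, here is a minimal sketch of a report following the Policy Reports API. The field names follow the v1alpha2 API reference linked above, while the policy, workload, and namespace names are hypothetical:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  name: example-policy-report   # hypothetical name
  namespace: default
summary:
  pass: 1
  fail: 1
results:
- policy: require-team-label    # hypothetical policy
  result: fail
  severity: medium
  message: The Deployment is missing the required 'team' label.
  resources:
  - apiVersion: apps/v1
    kind: Deployment
    name: web                   # hypothetical workload
    namespace: default
- policy: disallow-latest-tag   # hypothetical policy
  result: pass
&lt;/code>&lt;/pre>
&lt;p>Because every tool emits the same schema, a single dashboard or controller can consume reports from Falco, Trivy, Kyverno, and others without tool-specific glue.&lt;/p>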
&lt;p>Beyond that, as &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/">ValidatingAdmissionPolicy&lt;/a> and &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/mutating-admission-policy/">MutatingAdmissionPolicy&lt;/a> approached GA in Kubernetes, a key goal of the WG was to guide and educate the community on the tradeoffs and appropriate usage patterns for these built-in API objects and other CNCF policy management solutions like OPA/Gatekeeper and Kyverno.&lt;/p>
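&lt;p>For readers who haven't tried the built-in policy objects yet, here is an illustrative &lt;code>ValidatingAdmissionPolicy&lt;/code> that expresses a rule in CEL. The policy name, match rules, and expression are hypothetical, and a &lt;code>ValidatingAdmissionPolicyBinding&lt;/code> is also required to put the policy into effect:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: replica-limit             # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [apps]
      apiVersions: [v1]
      operations: [CREATE, UPDATE]
      resources: [deployments]
  validations:
  # Reject Deployments that request more than 5 replicas.
  - expression: object.spec.replicas &lt;= 5
    message: Deployments may not request more than 5 replicas.
&lt;/code>&lt;/pre>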
&lt;h2 id="challenges">Challenges&lt;/h2>
&lt;p>&lt;strong>What were some of the major challenges that the Policy Working Group worked on?&lt;/strong>&lt;/p>
&lt;p>During our work in the Policy Working Group, we encountered several challenges:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>One of the main issues we faced was finding time to consistently contribute. Given that many of us have other professional commitments, it can be difficult to dedicate regular time to the working group's initiatives.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Another challenge we experienced was related to our consensus-driven model. While this approach ensures that all voices are heard, it can sometimes lead to slower decision-making processes. We valued thorough discussion and agreement, but this can occasionally delay progress on our projects.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>We've also encountered occasional differences of opinion among group members. These situations require careful navigation to ensure that we maintain a collaborative and productive environment while addressing diverse viewpoints.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Lastly, we've noticed that newcomers to the group may find it difficult to contribute effectively without consistent attendance at our meetings. The complex nature of our work often requires ongoing context, which can be challenging for those who aren't able to participate regularly.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Can you tell me more about those challenges? How did you discover each one? What has the impact been? What were some strategies you used to address them?&lt;/strong>&lt;/p>
&lt;p>There are no easy answers, but having more contributors and maintainers greatly helps! Overall the CNCF community is great to work with and is very welcoming to beginners. So, if folks out there are hesitating to get involved, I highly encourage them to attend a WG or SIG meeting and just listen in.&lt;/p>
&lt;p>It often takes a few meetings to fully understand the discussions, so don't feel discouraged if you don't grasp everything right away. We made a point to emphasize this and encouraged new members to review documentation as a starting point for getting involved.&lt;/p>
&lt;p>Additionally, differences of opinion were valued and encouraged within the Policy-WG. We adhered to the CNCF core values and resolved disagreements by maintaining respect for one another. We also strove to timebox our decisions and assign clear responsibilities to keep things moving forward.&lt;/p>
&lt;hr>
&lt;p>This is where our discussion about the Policy Working Group ends. The working group, and especially the people who took part in this article, hope this gave you some insights into the group's aims and workings. You can get more info about Working Groups &lt;a href="https://github.com/kubernetes/community/blob/master/committee-steering/governance/wg-governance.md">here&lt;/a>.&lt;/p></description></item><item><title>Introducing Headlamp Plugin for Karpenter - Scaling and Visibility</title><link>https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/</link><pubDate>Mon, 06 Oct 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/</guid><description>
&lt;p>Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.&lt;/p>
&lt;p>Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.&lt;/p>
&lt;p>The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The Karpenter plugin was created as part of an LFX mentorship project.&lt;/p>
&lt;p>The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. The rest of this post gives a brief tour of the plugin.&lt;/p>
&lt;h2 id="map-view-of-karpenter-resources-and-how-they-relate-to-kubernetes-resources">Map view of Karpenter Resources and how they relate to Kubernetes resources&lt;/h2>
&lt;p>Easily see how Karpenter resources like NodeClasses, NodePools, and NodeClaims connect with core Kubernetes resources like Pods and Nodes.&lt;/p>
&lt;p>&lt;img alt="Map view showing relationships between resources" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/mini-map-view.png">&lt;/p>
&lt;h2 id="visualization-of-karpenter-metrics">Visualization of Karpenter Metrics&lt;/h2>
&lt;p>Get instant insights into resource usage versus limits, allowed disruptions, pending Pods, provisioning latency, and more.&lt;/p>
&lt;p>&lt;img alt="NodePool default metrics shown with controls to see different frequencies" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/chart-1.png">&lt;/p>
&lt;p>&lt;img alt="NodeClaim default metrics shown with controls to see different frequencies" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/chart-2.png">&lt;/p>
&lt;h2 id="scaling-decisions">Scaling decisions&lt;/h2>
&lt;p>See which instances are being provisioned for your workloads and understand why Karpenter made those choices. This is helpful while debugging.&lt;/p>
&lt;p>&lt;img alt="Pod Placement Decisions data including reason, from, pod, message, and age" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/pod-decisions.png">&lt;/p>
&lt;p>&lt;img alt="Node decision data including Type, Reason, Node, From, Message" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/node-decisions.png">&lt;/p>
&lt;h2 id="config-editor-with-validation-support">Config editor with validation support&lt;/h2>
&lt;p>Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.&lt;br>
&lt;img alt="Config editor with validation support" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/config-editor.png">&lt;/p>
&lt;h2 id="real-time-view-of-karpenter-resources">Real time view of Karpenter resources&lt;/h2>
&lt;p>View and track Karpenter-specific resources, such as NodeClaims, in real time as your cluster scales up and down.&lt;/p>
&lt;p>&lt;img alt="Node claims data including Name, Status, Instance Type, CPU, Zone, Age, and Actions" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/node-claims.png">&lt;/p>
&lt;p>&lt;img alt="Node Pools data including Name, NodeClass, CPU, Memory, Nodes, Status, Age, Actions" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/nodepools.png">&lt;/p>
&lt;p>&lt;img alt="EC2 Node Classes data including Name, Cluster, Instance Profile, Status, IAM Role, Age, and Actions" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/nodeclass.png">&lt;/p>
&lt;h2 id="dashboard-for-pending-pods">Dashboard for Pending Pods&lt;/h2>
&lt;p>View all pending Pods with unmet scheduling requirements or FailedScheduling events, with highlights explaining why they couldn't be scheduled.&lt;/p>
&lt;p>&lt;img alt="Pending Pods data including Name, Namespace, Type, Reason, From, and Message" src="https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/pending-pods.png">&lt;/p>
&lt;h3 id="karpenter-providers">&lt;strong>Karpenter Providers&lt;/strong>&lt;/h3>
&lt;p>This plugin should work with most Karpenter providers, but so far it has only been tested with the providers listed in the table below. Some providers also expose extra provider-specific information; the table shows which of these the plugin displays.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Provider Name&lt;/th>
&lt;th>Tested&lt;/th>
&lt;th>Extra provider specific info supported&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;a href="https://github.com/aws/karpenter-provider-aws">AWS&lt;/a>&lt;/td>
&lt;td>✅&lt;/td>
&lt;td>✅&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/Azure/karpenter-provider-azure">Azure&lt;/a>&lt;/td>
&lt;td>✅&lt;/td>
&lt;td>✅&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/cloudpilot-ai/karpenter-provider-alibabacloud">AlibabaCloud&lt;/a>&lt;/td>
&lt;td>❌&lt;/td>
&lt;td>❌&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/bizflycloud/karpenter-provider-bizflycloud">Bizfly Cloud&lt;/a>&lt;/td>
&lt;td>❌&lt;/td>
&lt;td>❌&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/kubernetes-sigs/karpenter-provider-cluster-api">Cluster API&lt;/a>&lt;/td>
&lt;td>❌&lt;/td>
&lt;td>❌&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/cloudpilot-ai/karpenter-provider-gcp">GCP&lt;/a>&lt;/td>
&lt;td>❌&lt;/td>
&lt;td>❌&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/sergelogvinov/karpenter-provider-proxmox">Proxmox&lt;/a>&lt;/td>
&lt;td>❌&lt;/td>
&lt;td>❌&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;a href="https://github.com/zoom/karpenter-oci">Oracle Cloud Infrastructure (OCI)&lt;/a>&lt;/td>
&lt;td>❌&lt;/td>
&lt;td>❌&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>Please &lt;a href="https://github.com/headlamp-k8s/plugins/issues">submit an issue&lt;/a> if you test one of the untested providers, or if you want support for a particular provider (PRs are also gladly accepted).&lt;/p>
&lt;h2 id="how-to-use">How to use&lt;/h2>
&lt;p>Please see the &lt;a href="https://github.com/headlamp-k8s/plugins/tree/main/karpenter">plugins/karpenter/README.md&lt;/a> for instructions on how to use it.&lt;/p>
&lt;h2 id="feedback-and-questions">Feedback and Questions&lt;/h2>
&lt;p>Please &lt;a href="https://github.com/headlamp-k8s/plugins/issues">submit an issue&lt;/a> if you use Karpenter and have any other ideas or feedback. Or come to the &lt;a href="https://kubernetes.slack.com/?redir=%2Fmessages%2Fheadlamp">#headlamp channel on the Kubernetes Slack&lt;/a> for a chat.&lt;/p>
&lt;p>We're excited to announce the alpha support for a &lt;em>changed block tracking&lt;/em> mechanism. This enhances
the Kubernetes storage ecosystem by providing an efficient way for
&lt;a href="https://kubernetes.io/docs/concepts/storage/volumes/#csi">CSI&lt;/a> storage drivers to identify changed
blocks in PersistentVolume snapshots. With a driver that can use the feature, you could benefit
from faster and more resource-efficient backup operations.&lt;/p>
&lt;p>If you're eager to try this feature, you can &lt;a href="#getting-started">skip to the Getting Started section&lt;/a>.&lt;/p>
&lt;h2 id="what-is-changed-block-tracking">What is changed block tracking?&lt;/h2>
&lt;p>Changed block tracking enables storage systems to identify and track modifications at the block level
between snapshots, eliminating the need to scan entire volumes during backup operations. The
improvement is a change to the Container Storage Interface (CSI), and also to the storage support
in Kubernetes itself.
With the alpha feature enabled, your cluster can:&lt;/p>
&lt;ul>
&lt;li>Identify allocated blocks within a CSI volume snapshot&lt;/li>
&lt;li>Determine changed blocks between two snapshots of the same volume&lt;/li>
&lt;li>Streamline backup operations by focusing only on changed data blocks&lt;/li>
&lt;/ul>
&lt;p>For Kubernetes users managing large datasets, this API enables significantly more efficient
backup processes. Backup applications can now focus only on the blocks that have changed,
rather than processing entire volumes.&lt;/p>
&lt;div class="alert alert-info" role="alert">&lt;h4 class="alert-heading">Note:&lt;/h4>As of now, the Changed Block Tracking API is supported only for block volumes and not for
file volumes. CSI drivers that manage file-based storage systems will not be able to
implement this capability.&lt;/div>
&lt;h2 id="benefits-of-changed-block-tracking-support-in-kubernetes">Benefits of changed block tracking support in Kubernetes&lt;/h2>
&lt;p>As Kubernetes adoption grows for stateful workloads managing critical data, the need for efficient
backup solutions becomes increasingly important. Traditional full backup approaches face challenges with:&lt;/p>
&lt;ul>
&lt;li>&lt;em>Long backup windows&lt;/em>: Full volume backups can take hours for large datasets, making it difficult
to complete within maintenance windows.&lt;/li>
&lt;li>&lt;em>High resource utilization&lt;/em>: Backup operations consume substantial network bandwidth and I/O
resources, especially for large data volumes and data-intensive applications.&lt;/li>
&lt;li>&lt;em>Increased storage costs&lt;/em>: Repetitive full backups store redundant data, causing storage
requirements to grow linearly even when only a small percentage of data actually changes between
backups.&lt;/li>
&lt;/ul>
&lt;p>The Changed Block Tracking API addresses these challenges by providing native Kubernetes support for
incremental backup capabilities through the CSI interface.&lt;/p>
&lt;h2 id="key-components">Key components&lt;/h2>
&lt;p>The implementation consists of three primary components:&lt;/p>
&lt;ol>
&lt;li>&lt;em>CSI SnapshotMetadata Service API&lt;/em>: A gRPC API that provides volume
snapshot and changed block data.&lt;/li>
&lt;li>&lt;em>SnapshotMetadataService API&lt;/em>: A Kubernetes CustomResourceDefinition (CRD) that
advertises CSI driver metadata service availability and connection details to
cluster clients (an example custom resource is sketched after this list).&lt;/li>
&lt;li>&lt;em>External Snapshot Metadata Sidecar&lt;/em>: An intermediary component that connects CSI
drivers to backup applications via a standardized gRPC interface.&lt;/li>
&lt;/ol>
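&lt;p>To make the second component more concrete, here is a minimal sketch of what a &lt;code>SnapshotMetadataService&lt;/code> custom resource could look like. The field names follow the enhancement proposal and the &lt;code>external-snapshot-metadata&lt;/code> documentation, but the driver name, address, and audience below are hypothetical:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: cbt.storage.k8s.io/v1alpha1
kind: SnapshotMetadataService
metadata:
  # By convention, the resource is named after the CSI driver it advertises.
  name: hostpath.csi.k8s.io
spec:
  # gRPC endpoint of the external-snapshot-metadata sidecar (hypothetical).
  address: snapshot-metadata.csi-hostpath.svc:6443
  # Audience expected in the ServiceAccount tokens presented by clients (hypothetical).
  audience: snapshot-metadata-client
  # Base64-encoded CA bundle used to validate the sidecar's TLS certificate (elided).
  caCert: LS0tLS1CRUdJTi...
&lt;/code>&lt;/pre>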
&lt;h2 id="implementation-requirements">Implementation requirements&lt;/h2>
&lt;h3 id="storage-provider-responsibilities">Storage provider responsibilities&lt;/h3>
&lt;p>If you're an author of a storage integration with Kubernetes and want to support the changed block tracking feature, you must implement specific requirements:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;em>Implement CSI RPCs&lt;/em>: Storage providers need to implement the &lt;code>SnapshotMetadata&lt;/code> service as defined in the &lt;a href="https://github.com/container-storage-interface/spec/blob/master/csi.proto">CSI specifications protobuf&lt;/a>. This service requires server-side streaming implementations for the following RPCs:&lt;/p>
&lt;ul>
&lt;li>&lt;code>GetMetadataAllocated&lt;/code>: For identifying allocated blocks in a snapshot&lt;/li>
&lt;li>&lt;code>GetMetadataDelta&lt;/code>: For determining changed blocks between two snapshots&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Storage backend capabilities&lt;/em>: Ensure the storage backend has the capability to track and report block-level changes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Deploy external components&lt;/em>: Integrate with the &lt;code>external-snapshot-metadata&lt;/code> sidecar to expose the snapshot metadata service.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Register custom resource&lt;/em>: Register the &lt;code>SnapshotMetadataService&lt;/code> resource using a CustomResourceDefinition and create a &lt;code>SnapshotMetadataService&lt;/code> custom resource that advertises the availability of the metadata service and provides connection details.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Support error handling&lt;/em>: Implement proper error handling for these RPCs according to the CSI specification requirements.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="backup-solution-responsibilities">Backup solution responsibilities&lt;/h3>
&lt;p>A backup solution looking to leverage this feature must:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;em>Set up authentication&lt;/em>: The backup application must provide a Kubernetes ServiceAccount token when using the
&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3314-csi-changed-block-tracking#the-kubernetes-snapshotmetadata-service-api">Kubernetes SnapshotMetadataService API&lt;/a>.
Appropriate access grants, such as RBAC RoleBindings, must be established to authorize the backup application
ServiceAccount to obtain such tokens (a sketch of such a grant follows this list).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Implement streaming client-side code&lt;/em>: Develop clients that implement the streaming gRPC APIs defined in the
&lt;a href="https://github.com/kubernetes-csi/external-snapshot-metadata/blob/main/proto/schema.proto">schema.proto&lt;/a> file.
Specifically:&lt;/p>
&lt;ul>
&lt;li>Implement streaming client code for &lt;code>GetMetadataAllocated&lt;/code> and &lt;code>GetMetadataDelta&lt;/code> methods&lt;/li>
&lt;li>Handle server-side streaming responses efficiently as the metadata comes in chunks&lt;/li>
&lt;li>Process the &lt;code>SnapshotMetadataResponse&lt;/code> message format with proper error handling&lt;/li>
&lt;/ul>
&lt;p>The &lt;code>external-snapshot-metadata&lt;/code> GitHub repository provides a convenient
&lt;a href="https://github.com/kubernetes-csi/external-snapshot-metadata/tree/master/pkg/iterator">iterator&lt;/a>
support package to simplify client implementation.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Handle large dataset streaming&lt;/em>: Design clients to efficiently handle large streams of block metadata that
could be returned for volumes with significant changes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>Optimize backup processes&lt;/em>: Modify backup workflows to use the changed block metadata to identify and only
transfer changed blocks to make backups more efficient, reducing both backup duration and resource consumption.&lt;/p>
&lt;/li>
&lt;/ol>
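&lt;p>As an illustration of the authentication step above, the following is a minimal sketch of an RBAC grant that lets a backup application discover the advertised &lt;code>SnapshotMetadataService&lt;/code> resources and request tokens for its own ServiceAccount. All names are hypothetical, and the exact permissions your driver and backup tool need may differ; consult the &lt;code>external-snapshot-metadata&lt;/code> documentation for the authoritative setup:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backup-app-cbt-access            # hypothetical name
rules:
# Read the custom resources that advertise the metadata service.
- apiGroups: [cbt.storage.k8s.io]
  resources: [snapshotmetadataservices]
  verbs: [get, list]
# Request tokens for the backup ServiceAccount (may be unnecessary if
# tokens are provisioned another way).
- apiGroups: [&amp;quot;&amp;quot;]
  resources: [serviceaccounts/token]
  verbs: [create]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backup-app-cbt-access-binding    # hypothetical name
subjects:
- kind: ServiceAccount
  name: backup-app                       # hypothetical backup application ServiceAccount
  namespace: backup-system               # hypothetical namespace
roleRef:
  kind: ClusterRole
  name: backup-app-cbt-access
  apiGroup: rbac.authorization.k8s.io
&lt;/code>&lt;/pre>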
&lt;h2 id="getting-started">Getting started&lt;/h2>
&lt;p>To use changed block tracking in your cluster:&lt;/p>
&lt;ol>
&lt;li>Ensure your CSI driver supports volume snapshots and implements the snapshot metadata capabilities with the required &lt;code>external-snapshot-metadata&lt;/code> sidecar&lt;/li>
&lt;li>Make sure the &lt;code>SnapshotMetadataService&lt;/code> API is registered via its CustomResourceDefinition (CRD)&lt;/li>
&lt;li>Verify the presence of a SnapshotMetadataService custom resource for your CSI driver&lt;/li>
&lt;li>Create clients that can access the API using appropriate authentication (via Kubernetes ServiceAccount tokens)&lt;/li>
&lt;/ol>
&lt;p>The API provides two main functions:&lt;/p>
&lt;ul>
&lt;li>&lt;code>GetMetadataAllocated&lt;/code>: Lists blocks allocated in a single snapshot&lt;/li>
&lt;li>&lt;code>GetMetadataDelta&lt;/code>: Lists blocks changed between two snapshots&lt;/li>
&lt;/ul>
&lt;h2 id="what-s-next">What’s next?&lt;/h2>
&lt;p>Depending on feedback and adoption, the Kubernetes developers hope to promote the CSI Snapshot Metadata implementation to beta in a future release.&lt;/p>
&lt;h2 id="where-can-i-learn-more">Where can I learn more?&lt;/h2>
&lt;p>For those interested in trying out this new feature:&lt;/p>
&lt;ul>
&lt;li>Official Kubernetes CSI Developer &lt;a href="https://kubernetes-csi.github.io/docs/external-snapshot-metadata.html">Documentation&lt;/a>&lt;/li>
&lt;li>The &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3314-csi-changed-block-tracking">enhancement proposal&lt;/a> for the snapshot metadata feature.&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes-csi/external-snapshot-metadata">GitHub repository&lt;/a> for implementation and release status of &lt;code>external-snapshot-metadata&lt;/code>&lt;/li>
&lt;li>Complete gRPC protocol definitions for snapshot metadata API: &lt;a href="https://github.com/kubernetes-csi/external-snapshot-metadata/blob/main/proto/schema.proto">schema.proto&lt;/a>&lt;/li>
&lt;li>Example snapshot metadata client implementation: &lt;a href="https://github.com/kubernetes-csi/external-snapshot-metadata/tree/main/examples/snapshot-metadata-lister">snapshot-metadata-lister&lt;/a>&lt;/li>
&lt;li>End-to-end example with csi-hostpath-driver: &lt;a href="https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/docs/example-ephemeral.md">example documentation&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="how-do-i-get-involved">How do I get involved?&lt;/h2>
&lt;p>This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together.
On behalf of SIG Storage, I would like to offer a huge thank you to the contributors who helped review the design and implementation of the project, including but not limited to the following:&lt;/p>
&lt;ul>
&lt;li>Ben Swartzlander (&lt;a href="https://github.com/bswartz">bswartz&lt;/a>)&lt;/li>
&lt;li>Carl Braganza (&lt;a href="https://github.com/carlbraganza">carlbraganza&lt;/a>)&lt;/li>
&lt;li>Daniil Fedotov (&lt;a href="https://github.com/hairyhum">hairyhum&lt;/a>)&lt;/li>
&lt;li>Ivan Sim (&lt;a href="https://github.com/ihcsim">ihcsim&lt;/a>)&lt;/li>
&lt;li>Nikhil Ladha (&lt;a href="https://github.com/Nikhil-Ladha">Nikhil-Ladha&lt;/a>)&lt;/li>
&lt;li>Prasad Ghangal (&lt;a href="https://github.com/PrasadG193">PrasadG193&lt;/a>)&lt;/li>
&lt;li>Praveen M (&lt;a href="https://github.com/iPraveenParihar">iPraveenParihar&lt;/a>)&lt;/li>
&lt;li>Rakshith R (&lt;a href="https://github.com/Rakshith-R">Rakshith-R&lt;/a>)&lt;/li>
&lt;li>Xing Yang (&lt;a href="https://github.com/xing-yang">xing-yang&lt;/a>)&lt;/li>
&lt;/ul>
&lt;p>Thanks also to everyone who has contributed to the project, including others who helped review the
&lt;a href="https://github.com/kubernetes/enhancements/pull/4082">KEP&lt;/a> and the
&lt;a href="https://github.com/container-storage-interface/spec/pull/551">CSI spec PR&lt;/a>.&lt;/p>
&lt;p>For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system,
join the &lt;a href="https://github.com/kubernetes/community/tree/master/sig-storage">Kubernetes Storage Special Interest Group&lt;/a> (SIG).
We always welcome new contributors.&lt;/p>
&lt;p>The SIG also holds regular &lt;a href="https://docs.google.com/document/d/15tLCV3csvjHbKb16DVk-mfUmFry_Rlwo-2uG6KNGsfw/edit">Data Protection Working Group meetings&lt;/a>.
New attendees are welcome to join our discussions.&lt;/p></description></item><item><title>Kubernetes v1.34: Pod Level Resources Graduated to Beta</title><link>https://kubernetes.io/blog/2025/09/22/kubernetes-v1-34-pod-level-resources/</link><pubDate>Mon, 22 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/22/kubernetes-v1-34-pod-level-resources/</guid><description>
&lt;p>On behalf of the Kubernetes community, I am thrilled to announce that the Pod Level Resources feature has graduated to Beta in the Kubernetes v1.34 release and is enabled by default! This significant milestone introduces a new layer of flexibility for defining and managing resource allocation for your Pods. This flexibility stems from the ability to specify CPU and memory resources for the Pod as a whole. Pod level resources can be combined with the container-level specifications to express the exact resource requirements and limits your application needs.&lt;/p>
&lt;h2 id="pod-level-specification-for-resources">Pod-level specification for resources&lt;/h2>
&lt;p>Until recently, resource specifications that applied to Pods were primarily defined
at the individual container level. While effective, this approach sometimes required
duplicating or meticulously calculating resource needs across multiple containers
within a single Pod. As a beta feature, Kubernetes allows you to specify the CPU,
memory and hugepages resources at the Pod-level. This means you can now define
resource requests and limits for an entire Pod, enabling easier resource sharing
without requiring granular, per-container management of these resources where
it's not needed.&lt;/p>
&lt;h2 id="why-does-pod-level-specification-matter">Why does Pod-level specification matter?&lt;/h2>
&lt;p>This feature enhances resource management in Kubernetes by offering &lt;em>flexible resource management&lt;/em> at both the Pod and container levels.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>It provides a consolidated approach to resource declaration, reducing the need for
meticulous, per-container management, especially for Pods with multiple
containers.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Pod-level resources enable containers within a pod to share unused resources
amongst themselves, promoting efficient utilization within the pod. For example,
it prevents sidecar containers from becoming performance bottlenecks. Previously,
a sidecar (e.g., a logging agent or service mesh proxy) hitting its individual CPU
limit could be throttled and slow down the entire Pod, even if the main
application container had plenty of spare CPU. With pod-level resources, the
sidecar and the main container can share the Pod's resource budget, ensuring smooth
operation during traffic spikes: either the whole Pod is throttled, or all
containers keep working.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>When both pod-level and container-level resources are specified, pod-level
requests and limits take precedence. This gives you, and cluster administrators,
a powerful way to enforce overall resource boundaries for your Pods.&lt;/p>
&lt;p>For scheduling, if a pod-level request is explicitly defined, the scheduler uses
that specific value to find a suitable node, instead of the aggregated requests of
the individual containers. At runtime, the pod-level limit acts as a hard ceiling
for the combined resource usage of all containers. Crucially, this pod-level limit
is the absolute enforcer; even if the sum of the individual container limits is
higher, the total resource consumption can never exceed the pod-level limit.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Pod-level resources are &lt;strong>prioritized&lt;/strong> in influencing the Quality of Service (QoS) class of the Pod (see the sketch after this list).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For Pods running on Linux nodes, the Out-Of-Memory (OOM) score adjustment
calculation considers both pod-level and container-level resources requests.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Pod-level resources are &lt;strong>designed to be compatible with existing Kubernetes functionalities&lt;/strong>, ensuring a smooth integration into your workflows.&lt;/p>
&lt;/li>
&lt;/ul>
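&lt;p>As a sketch of that QoS point: our reading of the feature documentation is that when pod-level requests equal pod-level limits for both CPU and memory, the Pod can be classified as &lt;code>Guaranteed&lt;/code> even if its containers declare no per-container resources. The manifest below is illustrative only; the Pod name and image are placeholders:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: qos-demo            # hypothetical name
spec:
  # Pod-level requests equal pod-level limits, so the pod-level values,
  # which take priority for QoS, would make this Pod Guaranteed.
  resources:
    requests:
      cpu: 500m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 128Mi
  containers:
  - name: app
    image: nginx            # no per-container resources specified
&lt;/code>&lt;/pre>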
&lt;h2 id="how-to-specify-resources-for-an-entire-pod">How to specify resources for an entire Pod&lt;/h2>
&lt;p>Using &lt;code>PodLevelResources&lt;/code> &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/">feature
gate&lt;/a> requires
Kubernetes v1.34 or newer for all cluster components, including the control plane
and every node. This feature gate is in beta and enabled by default in v1.34.&lt;/p>
&lt;h3 id="example-manifest">Example manifest&lt;/h3>
&lt;p>You can specify CPU, memory and hugepages resources directly in the Pod spec manifest at the &lt;code>resources&lt;/code> field for the entire Pod.&lt;/p>
&lt;p>Here’s an example demonstrating a Pod with both CPU and memory requests and limits
defined at the Pod level:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>pod-resources-demo&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">namespace&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>pod-resources-example&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># The &amp;#39;resources&amp;#39; field at the Pod specification level defines the overall&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># resource budget for all containers within this Pod combined.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># Pod-level resources&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># &amp;#39;limits&amp;#39; specifies the maximum amount of resources the Pod is allowed to use.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># The sum of the limits of all containers in the Pod cannot exceed these values.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">limits&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">cpu&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;1&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># The entire Pod cannot use more than 1 CPU core.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;200Mi&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># The entire Pod cannot use more than 200 MiB of memory.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># &amp;#39;requests&amp;#39; specifies the minimum amount of resources guaranteed to the Pod.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># This value is used by the Kubernetes scheduler to find a node with enough capacity.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requests&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">cpu&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;1&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># The Pod is guaranteed 1 CPU core when scheduled.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;100Mi&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># The Pod is guaranteed 100 MiB of memory when scheduled.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>main-app-container&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>nginx&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>...&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># This container has no resource requests or limits specified.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>auxiliary-container&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>fedora&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;sleep&amp;#34;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;inf&amp;#34;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>...&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># This container has no resource requests or limits specified.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In this example, the &lt;code>pod-resources-demo&lt;/code> Pod as a whole requests 1 CPU and 100 MiB of memory, and is limited to 1 CPU and 200 MiB of memory. The containers within will operate under these overall Pod-level constraints, as explained in the next section.&lt;/p>
&lt;h3 id="interaction-with-container-level-resource-requests-or-limits">Interaction with container-level resource requests or limits&lt;/h3>
&lt;p>When both pod-level and container-level resources are specified, &lt;strong>pod-level requests and limits take precedence&lt;/strong>. This means the node allocates resources based on the pod-level specifications.&lt;/p>
&lt;p>Consider a Pod with two containers where pod-level CPU and memory requests and
limits are defined, and only one container has its own explicit resource
definitions:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>pod-resources-demo&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">namespace&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>pod-resources-example&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">limits&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">cpu&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;1&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;200Mi&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requests&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">cpu&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;1&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;100Mi&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>main-app-container&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>nginx&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requests&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">cpu&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;0.5&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;50Mi&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>auxiliary-container&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>fedora&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;sleep&amp;#34;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;inf&amp;#34;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># This container has no resource requests or limits specified.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ul>
&lt;li>
&lt;p>Pod-Level Limits: The pod-level limits (cpu: &amp;quot;1&amp;quot;, memory: &amp;quot;200Mi&amp;quot;) establish an absolute boundary for the entire Pod. The sum of resources consumed by all its containers is enforced at this ceiling and cannot be surpassed.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Resource Sharing and Bursting: Containers can dynamically borrow any unused capacity, allowing them to burst as needed, so long as the Pod's aggregate usage stays within the overall limit.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Pod-Level Requests: The pod-level requests (cpu: &amp;quot;1&amp;quot;, memory: &amp;quot;100Mi&amp;quot;) serve as the foundational resource guarantee for the entire Pod. This value informs the scheduler's placement decision and represents the minimum resources the Pod can rely on during node-level contention.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Container-Level Requests: Container-level requests create a priority system within
the Pod's guaranteed budget. Because main-app-container has an explicit request
(cpu: &amp;quot;0.5&amp;quot;, memory: &amp;quot;50Mi&amp;quot;), under resource pressure it takes
precedence for its share of resources over the auxiliary-container, which has no
such explicit claim.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="limitations">Limitations&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>First of all, &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/#pod-update-and-replacement">in-place
resize&lt;/a> of pod-level
resources is &lt;strong>not supported&lt;/strong> for Kubernetes v1.34 (or earlier). Attempting to
modify the &lt;em>pod-level&lt;/em> resource limits or requests on a running Pod results in an
error: the resize is rejected. The v1.34 implementation of pod-level resources
focuses on allowing the initial declaration of an overall resource envelope that
applies to the &lt;strong>entire Pod&lt;/strong>. That is distinct from in-place pod resize, which
(despite what the name might suggest) allows you
to make dynamic adjustments to &lt;em>container&lt;/em> resource
requests and limits, within a &lt;em>running&lt;/em> Pod,
and potentially without a container restart. In-place resizing is also not yet a
stable feature; it graduated to beta in the v1.33 release.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Only CPU, memory, and hugepages resources can be specified at pod-level.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Pod-level resources are not supported for Windows pods. If the Pod specification
explicitly targets Windows (e.g., by setting spec.os.name: &amp;quot;windows&amp;quot;), the API
server will reject the Pod during the validation step. If the Pod is not explicitly
marked for Windows but is scheduled to a Windows node (e.g., via a nodeSelector),
the Kubelet on that Windows node will reject the Pod during its admission process.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The Topology Manager, Memory Manager and CPU Manager do not
align pods and containers based on pod-level resources as these resource managers
don't currently support pod-level resources.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="getting-started-and-providing-feedback">Getting started and providing feedback&lt;/h4>
&lt;p>Ready to explore the &lt;em>Pod Level Resources&lt;/em> feature? You'll need a Kubernetes cluster running version 1.34 or later. The &lt;code>PodLevelResources&lt;/code> &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/">feature gate&lt;/a> is enabled by default in v1.34; confirm that it is enabled across your control plane and all nodes.&lt;/p>
&lt;p>As this feature moves through Beta, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels:&lt;/p>
&lt;ul>
&lt;li>Slack: &lt;a href="https://kubernetes.slack.com/messages/sig-node">#sig-node&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://groups.google.com/forum/#!forum/kubernetes-sig-node">Mailing list&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/community/labels/sig%2Fnode">Open Community Issues/PRs&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.34: Recovery From Volume Expansion Failure (GA)</title><link>https://kubernetes.io/blog/2025/09/19/kubernetes-v1-34-recover-expansion-failure/</link><pubDate>Fri, 19 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/19/kubernetes-v1-34-recover-expansion-failure/</guid><description>
&lt;p>Have you ever made a typo when expanding your persistent volumes in Kubernetes? Meant to specify &lt;code>2TB&lt;/code>
but specified &lt;code>20TiB&lt;/code>? This seemingly innocuous problem was surprisingly hard to fix, and it took the project almost 5 years to address.
&lt;a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes">Automated recovery from storage expansion failures&lt;/a> has been around for a while in beta; however, with the v1.34 release, we have graduated this to
&lt;strong>general availability&lt;/strong>.&lt;/p>
&lt;p>While it was always possible to recover from failing volume expansions manually, it usually required cluster-admin access and was tedious to do (see the aforementioned link for more information).&lt;/p>
&lt;p>What if you make a mistake and then realize it immediately?
With Kubernetes v1.34, as long as the expansion to the previously requested size
hasn't finished, you can simply reduce the requested size of the PersistentVolumeClaim (PVC), and Kubernetes will
automatically work towards the corrected size. Any quota consumed by the failed expansion is returned to you, and the associated PersistentVolume is resized to the
latest size you specified.&lt;/p>
&lt;p>I'll walk through an example of how all of this works.&lt;/p>
&lt;h2 id="reducing-pvc-size-to-recover-from-failed-expansion">Reducing PVC size to recover from failed expansion&lt;/h2>
&lt;p>Imagine that you are running out of disk space for one of your database servers, and you want to expand the PVC from the previously
specified &lt;code>10TB&lt;/code> to &lt;code>100TB&lt;/code>, but you make a typo and specify &lt;code>1000TB&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>PersistentVolumeClaim&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>myclaim&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">accessModes&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- ReadWriteOnce&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requests&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">storage&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>1000TB&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># newly specified size - but incorrect!&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now, you may be out of disk space on your disk array, or you may simply have run out of allocated quota with your cloud provider. Either way, assume that the expansion to &lt;code>1000TB&lt;/code> is never going to succeed.&lt;/p>
&lt;p>In Kubernetes v1.34, you can simply correct your mistake and request a new PVC size
that is smaller than the mistaken one, provided it is still larger than the original size
of the actual PersistentVolume.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>PersistentVolumeClaim&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>myclaim&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">accessModes&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- ReadWriteOnce&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requests&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">storage&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>100TB&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># Corrected size; has to be greater than 10TB.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># You cannot shrink the volume below its actual size.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This requires no admin intervention. Even better, any surplus Kubernetes quota that you temporarily consumed will be automatically returned.&lt;/p>
&lt;p>This fault recovery mechanism does have a caveat: whatever new size you specify for the PVC, it &lt;strong>must&lt;/strong> still be higher than the original size recorded in &lt;code>.status.capacity&lt;/code>.
Since Kubernetes doesn't support shrinking your PV objects, you can never go below the size that was originally allocated for your PVC request.&lt;/p>
&lt;h2 id="improved-error-handling-and-observability-of-volume-expansion">Improved error handling and observability of volume expansion&lt;/h2>
&lt;p>Implementing what might look like a relatively minor change also required us to almost
fully redo how volume expansion works under the hood in Kubernetes.
There are new API fields available on PVC objects that you can monitor to observe the progress of volume expansion.&lt;/p>
&lt;h3 id="improved-observability-of-in-progress-expansion">Improved observability of in-progress expansion&lt;/h3>
&lt;p>You can query &lt;code>.status.allocatedResourceStatus['storage']&lt;/code> of a PVC to monitor progress of a volume expansion operation.
For a typical block volume, this should transition through &lt;code>ControllerResizeInProgress&lt;/code>, &lt;code>NodeResizePending&lt;/code>, and &lt;code>NodeResizeInProgress&lt;/code>, and then become nil/empty when volume expansion has finished.&lt;/p>
&lt;p>If, for some reason, expanding the volume to the requested size is not feasible, the status should instead report &lt;code>ControllerResizeInfeasible&lt;/code> or &lt;code>NodeResizeInfeasible&lt;/code>.&lt;/p>
&lt;p>You can also observe the size towards which Kubernetes is working by watching &lt;code>pvc.status.allocatedResources&lt;/code>.&lt;/p>
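&lt;p>For example, while an expansion from 10Ti to 100Ti is underway, the PVC status might look roughly like this (illustrative values; the exact fields present depend on your CSI driver and on where the expansion is in its lifecycle):&lt;/p>
&lt;pre tabindex="0">&lt;code>status:
  capacity:
    storage: 10Ti                 # actual size of the underlying volume
  allocatedResources:
    storage: 100Ti                # size Kubernetes is working towards
  allocatedResourceStatus:
    storage: ControllerResizeInProgress
&lt;/code>&lt;/pre>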
&lt;h3 id="improved-error-handling-and-reporting">Improved error handling and reporting&lt;/h3>
&lt;p>Kubernetes now retries failed volume expansions at a slower rate, making fewer requests to both the storage system and the Kubernetes API server.&lt;/p>
&lt;p>Errors observed during volume expansion are now reported as conditions on PVC objects and, unlike events, they persist. Kubernetes populates &lt;code>pvc.status.conditions&lt;/code> with the error keys &lt;code>ControllerResizeError&lt;/code> or &lt;code>NodeResizeError&lt;/code> when volume expansion fails.&lt;/p>
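&lt;p>As a rough sketch (the message text is driver-specific and invented here), a failed controller-side expansion could surface on the PVC as:&lt;/p>
&lt;pre tabindex="0">&lt;code>status:
  conditions:
  - type: ControllerResizeError
    status: "True"
    message: "resize failed: rpc error: code = ResourceExhausted desc = insufficient space in storage pool"
&lt;/code>&lt;/pre>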
&lt;h3 id="fixes-long-standing-bugs-in-resizing-workflows">Fixes long standing bugs in resizing workflows&lt;/h3>
&lt;p>This feature has also allowed us to fix long-standing bugs in the resizing workflow, such as &lt;a href="https://github.com/kubernetes/kubernetes/issues/115294">Kubernetes issue #115294&lt;/a>.
If you observe anything broken, please report your bugs to &lt;a href="https://github.com/kubernetes/kubernetes/issues/new/choose">https://github.com/kubernetes/kubernetes/issues&lt;/a>, along with details about how to reproduce the problem.&lt;/p>
&lt;p>Working on this feature through its lifecycle was challenging and it wouldn't have been possible to reach GA
without feedback from &lt;a href="https://github.com/msau42">@msau42&lt;/a>, &lt;a href="https://github.com/jsafrane">@jsafrane&lt;/a> and &lt;a href="https://github.com/xing-yang">@xing-yang&lt;/a>.&lt;/p>
&lt;p>All of the contributors who worked on this also appreciate the input provided by &lt;a href="https://github.com/thockin">@thockin&lt;/a> and &lt;a href="https://github.com/liggitt">@liggitt&lt;/a> at various Kubernetes contributor summits.&lt;/p></description></item><item><title>Kubernetes v1.34: DRA Consumable Capacity</title><link>https://kubernetes.io/blog/2025/09/18/kubernetes-v1-34-dra-consumable-capacity/</link><pubDate>Thu, 18 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/18/kubernetes-v1-34-dra-consumable-capacity/</guid><description>
&lt;p>Dynamic Resource Allocation (DRA) is a Kubernetes API for managing scarce resources across Pods and containers.
It enables flexible resource requests, going beyond simply allocating &lt;em>N&lt;/em> devices to support more granular usage scenarios.
With DRA, users can request specific types of devices based on their attributes, define custom configurations tailored to their workloads, and even share the same resource among multiple containers or Pods.&lt;/p>
&lt;p>In this blog, we focus on the device sharing feature and dive into a new capability introduced in Kubernetes 1.34: &lt;em>DRA consumable capacity&lt;/em>,
which extends DRA to support finer-grained device sharing.&lt;/p>
&lt;h2 id="background-device-sharing-via-resourceclaims">Background: device sharing via ResourceClaims&lt;/h2>
&lt;p>From the beginning, DRA introduced the ability for multiple Pods to share a device by referencing the same ResourceClaim.
This design decouples resource allocation from specific hardware, allowing for more dynamic and reusable provisioning of devices.&lt;/p>
&lt;p>In Kubernetes 1.33, the new support for &lt;em>partitionable devices&lt;/em> allowed resource drivers to advertise slices of a device that are available, rather than exposing the entire device as an all-or-nothing resource.
This enabled Kubernetes to model shareable hardware more accurately.&lt;/p>
&lt;p>But there was still a missing piece: DRA didn't yet support scenarios
where the device driver manages fine-grained, dynamic portions of a device resource (like network bandwidth) based on user demand,
or where those resources are shared independently of ResourceClaims, which are restricted by their spec and namespace.&lt;/p>
&lt;p>That’s where &lt;em>consumable capacity&lt;/em> for DRA comes in.&lt;/p>
&lt;h2 id="benefits-of-dra-consumable-capacity-support">Benefits of DRA consumable capacity support&lt;/h2>
&lt;p>Here's a taste of what you get in a cluster with the &lt;code>DRAConsumableCapacity&lt;/code>
&lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/">feature gate&lt;/a> enabled.&lt;/p>
&lt;h3 id="device-sharing-across-multiple-resourceclaims-or-devicerequests">Device sharing across multiple ResourceClaims or DeviceRequests&lt;/h3>
&lt;p>Resource drivers can now support sharing the same device — or even a slice of a device — across multiple ResourceClaims or across multiple DeviceRequests.&lt;/p>
&lt;p>This means that Pods from different namespaces can simultaneously share the same device,
if permitted and supported by the specific DRA driver.&lt;/p>
&lt;h3 id="device-resource-allocation">Device resource allocation&lt;/h3>
&lt;p>Kubernetes extends the allocation algorithm in the scheduler to support allocating a portion of a device's resources, as defined in the &lt;code>capacity&lt;/code> field.
The scheduler ensures that the total allocated capacity across all consumers never exceeds the device’s total capacity, even when shared across multiple ResourceClaims or DeviceRequests.
This is very similar to the way the scheduler allows Pods and containers to share allocatable resources on Nodes;
in this case, it allows them to share allocatable (consumable) resources on Devices.&lt;/p>
&lt;p>This feature expands support for scenarios where the device driver is able to manage resources &lt;strong>within&lt;/strong> a device and on a per-process basis — for example,
allocating a specific amount of memory (e.g., 8 GiB) from a virtual GPU,
or setting bandwidth limits on virtual network interfaces allocated to specific Pods. This aims to provide safe and efficient resource sharing.&lt;/p>
&lt;h3 id="distinctattribute-constraint">DistinctAttribute constraint&lt;/h3>
&lt;p>This feature also introduces a new constraint: &lt;code>DistinctAttribute&lt;/code>, which is the complement of the existing &lt;code>MatchAttribute&lt;/code> constraint.&lt;/p>
&lt;p>The primary goal of &lt;code>DistinctAttribute&lt;/code> is to prevent the same underlying device from being allocated multiple times within a single ResourceClaim, which could happen since we are allocating shares (or subsets) of devices.
This constraint ensures that each allocation refers to a distinct resource, even if they belong to the same device class.&lt;/p>
&lt;p>It is useful for use cases such as allocating network devices connecting to different subnets to expand coverage or provide redundancy across failure domains.&lt;/p>
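&lt;p>As a sketch of how this can look in a ResourceClaim, mirroring the shape of the existing &lt;code>matchAttribute&lt;/code> constraint (the request names and the &lt;code>resource.example.com/subnet&lt;/code> attribute are hypothetical):&lt;/p>
&lt;pre tabindex="0">&lt;code>spec:
  devices:
    requests:
    - name: nic-primary
      exactly:
        deviceClassName: nic.example.com
    - name: nic-secondary
      exactly:
        deviceClassName: nic.example.com
    constraints:
    - requests: ["nic-primary", "nic-secondary"]
      distinctAttribute: resource.example.com/subnet  # the two allocations must differ in this attribute
&lt;/code>&lt;/pre>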
&lt;h2 id="how-to-use-consumable-capacity">How to use consumable capacity?&lt;/h2>
&lt;p>&lt;code>DRAConsumableCapacity&lt;/code> is introduced as an alpha feature in Kubernetes 1.34. The &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/">feature gate&lt;/a> &lt;code>DRAConsumableCapacity&lt;/code> must be enabled in kubelet, kube-apiserver, kube-scheduler and kube-controller-manager.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>--feature-gates&lt;span style="color:#666">=&lt;/span>...,DRAConsumableCapacity&lt;span style="color:#666">=&lt;/span>&lt;span style="color:#a2f">true&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="as-a-dra-driver-developer">As a DRA driver developer&lt;/h3>
&lt;p>As a DRA driver developer writing in Golang, you can make a device within a ResourceSlice allocatable to multiple ResourceClaims (or &lt;code>devices.requests&lt;/code>) by setting &lt;code>AllowMultipleAllocations&lt;/code> to &lt;code>true&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>Device {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#666">...&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> AllowMultipleAllocations: ptr.&lt;span style="color:#00a000">To&lt;/span>(&lt;span style="color:#a2f;font-weight:bold">true&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#666">...&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Additionally, you can define a policy to restrict how each device's &lt;code>Capacity&lt;/code> should be consumed by each &lt;code>DeviceRequest&lt;/code> by defining the &lt;code>RequestPolicy&lt;/code> field in &lt;code>DeviceCapacity&lt;/code>.
The example below shows how to define a policy that requires a GPU with 40 GiB of memory to allocate at least 5 GiB per request, with each allocation in multiples of 5 GiB.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>DeviceCapacity{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Value: resource.&lt;span style="color:#00a000">MustParse&lt;/span>(&lt;span style="color:#b44">&amp;#34;40Gi&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> RequestPolicy: &lt;span style="color:#666">&amp;amp;&lt;/span>CapacityRequestPolicy{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Default: ptr.&lt;span style="color:#00a000">To&lt;/span>(resource.&lt;span style="color:#00a000">MustParse&lt;/span>(&lt;span style="color:#b44">&amp;#34;5Gi&amp;#34;&lt;/span>)),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ValidRange: &lt;span style="color:#666">&amp;amp;&lt;/span>CapacityRequestPolicyRange {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Min: ptr.&lt;span style="color:#00a000">To&lt;/span>(resource.&lt;span style="color:#00a000">MustParse&lt;/span>(&lt;span style="color:#b44">&amp;#34;5Gi&amp;#34;&lt;/span>)),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Step: ptr.&lt;span style="color:#00a000">To&lt;/span>(resource.&lt;span style="color:#00a000">MustParse&lt;/span>(&lt;span style="color:#b44">&amp;#34;5Gi&amp;#34;&lt;/span>)),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This will be published to the ResourceSlice, as partially shown below:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>resource.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ResourceSlice&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">...&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">devices&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gpu0&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">allowMultipleAllocations&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">true&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">capacity&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>40Gi&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requestPolicy&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">default&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>5Gi&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">validRange&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">min&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>5Gi&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">step&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>5Gi&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>An allocated device with a specified portion of consumed capacity will have a &lt;code>ShareID&lt;/code> field set in the allocation status.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>claim.Status.Allocation.Devices.Results[i].ShareID
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This &lt;code>ShareID&lt;/code> allows the driver to distinguish between different allocations that refer to the &lt;strong>same device or same statically-partitioned slice&lt;/strong> but come from &lt;strong>different &lt;code>ResourceClaim&lt;/code> requests&lt;/strong>.&lt;br>
It acts as a unique identifier for each shared slice, enabling the driver to manage and enforce resource limits independently across multiple consumers.&lt;/p>
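&lt;p>In YAML form, the relevant fragment of a ResourceClaim status might look roughly like this (the &lt;code>shareID&lt;/code> value and the consumed amount are made up; treat this as a sketch of the fields described above rather than authoritative API output):&lt;/p>
&lt;pre tabindex="0">&lt;code>status:
  allocation:
    devices:
      results:
      - request: req0
        driver: resource.example.com
        pool: node-1
        device: gpu0
        shareID: 22b1bd92-1157-4b87-9d07-8b9f6ebbb72e  # distinguishes this share of gpu0
        consumedCapacity:
          memory: 10Gi
&lt;/code>&lt;/pre>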
&lt;h3 id="as-a-consumer">As a consumer&lt;/h3>
&lt;p>As a consumer (or user), you can request a device's resources with a ResourceClaim like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>resource.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ResourceClaim&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">...&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">devices&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requests&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># for devices&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>req0&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">exactly&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">deviceClassName&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>resource.example.com&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">capacity&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requests&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># for resources which must be provided by those devices&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">memory&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>10Gi&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This configuration ensures that the requested device can provide at least 10GiB of &lt;code>memory&lt;/code>.&lt;/p>
&lt;p>Note that &lt;strong>any&lt;/strong> &lt;code>resource.example.com&lt;/code> device that has at least 10GiB of memory can be allocated.
If a device that does not support multiple allocations is chosen, the allocation would consume the entire device.
To filter only devices that support multiple allocations, you can define a selector like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">selectors&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">cel&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">expression&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>|-&lt;span style="color:#b44;font-style:italic">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b44;font-style:italic"> device.allowMultipleAllocations == true&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="integration-with-dra-device-status">Integration with DRA device status&lt;/h2>
&lt;p>In device sharing, general device information is provided through the resource slice.
However, some details are set dynamically after allocation.
These can be conveyed using the &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#resourceclaim-device-status">&lt;code>.status.devices&lt;/code>&lt;/a> field of a ResourceClaim.
That field is only published in clusters where the &lt;code>DRAResourceClaimDeviceStatus&lt;/code>
feature gate is enabled.&lt;/p>
&lt;p>If you do have &lt;em>device status&lt;/em> support available, a driver can expose additional device-specific information beyond the &lt;code>ShareID&lt;/code>.
One especially useful case is virtual networks, where a driver can include the assigned IP address(es) in the status.
This is valuable for both network service operations and troubleshooting.&lt;/p>
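&lt;p>For instance, a networking driver could publish the assigned addresses along these lines (the driver name, interface, and IP are invented for illustration):&lt;/p>
&lt;pre tabindex="0">&lt;code>status:
  devices:
  - driver: cni.networking.example.com
    pool: node-1
    device: vnet-attachment-0
    networkData:
      interfaceName: net1
      ips:
      - 10.9.8.7/24
&lt;/code>&lt;/pre>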
&lt;p>You can find more information by watching our recording at: &lt;a href="https://sched.co/1x71v">KubeCon Japan 2025 - Reimagining Cloud Native Networks: The Critical Role of DRA&lt;/a>.&lt;/p>
&lt;h2 id="what-can-you-do-next">What can you do next?&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Check out the &lt;a href="https://github.com/kubernetes-sigs/cni-dra-driver">CNI DRA Driver project&lt;/a>&lt;/strong> for an example of DRA integration in Kubernetes networking. Try integrating with network resources like &lt;code>macvlan&lt;/code>, &lt;code>ipvlan&lt;/code>, or smart NICs.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Start enabling the &lt;code>DRAConsumableCapacity&lt;/code> feature gate and experimenting with virtualized or partitionable devices. Specify your workloads with &lt;em>consumable capacity&lt;/em> (for example: fractional bandwidth or memory).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Let us know your feedback:&lt;/p>
&lt;ul>
&lt;li>✅ What worked well?&lt;/li>
&lt;li>⚠️ What didn’t?&lt;/li>
&lt;/ul>
&lt;p>If you encounter issues or see opportunities to enhance this feature,
please &lt;a href="https://github.com/kubernetes/enhancements/issues">file a new issue&lt;/a>
and reference &lt;a href="https://github.com/kubernetes/enhancements/issues/5075">KEP-5075&lt;/a> there,
or reach out via &lt;a href="https://kubernetes.slack.com/archives/C0409NGC1TK">Slack (#wg-device-management)&lt;/a>.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="conclusion">Conclusion&lt;/h3>
&lt;p>Consumable capacity support enhances the device sharing capability of DRA by allowing effective device sharing across namespaces and across claims, tailored to each Pod’s actual needs.
It also empowers drivers to enforce capacity limits, improves scheduling accuracy, and unlocks new use cases like bandwidth-aware networking and multi-tenant device sharing.&lt;/p>
&lt;p>Try it out, experiment with consumable resources, and help shape the future of dynamic resource allocation in Kubernetes!&lt;/p>
&lt;h3 id="further-reading">Further Reading&lt;/h3>
&lt;ul>
&lt;li>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/">DRA in the Kubernetes documentation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/4815-dra-partitionable-devices">KEP for DRA Partitionable Devices&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4817-resource-claim-device-status">KEP for DRA Device Status&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/5075-dra-consumable-capacity">KEP for DRA Consumable Capacity&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.kubernetes.dev/resources/release/#kubernetes-v134">Kubernetes 1.34 Release Notes&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.34: Pods Report DRA Resource Health</title><link>https://kubernetes.io/blog/2025/09/17/kubernetes-v1-34-pods-report-dra-resource-health/</link><pubDate>Wed, 17 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/17/kubernetes-v1-34-pods-report-dra-resource-health/</guid><description>
&lt;p>The rise of AI/ML and other high-performance workloads has made specialized hardware like GPUs, TPUs, and FPGAs a critical component of many Kubernetes clusters. However, as discussed in a &lt;a href="https://kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/">previous blog post about navigating failures in Pods with devices&lt;/a>, when this hardware fails, it can be difficult to diagnose, leading to significant downtime. With the release of Kubernetes v1.34, we are excited to announce a new alpha feature that brings much-needed visibility into the health of these devices.&lt;/p>
&lt;p>This work extends the functionality of &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4680-add-resource-health-to-pod-status">KEP-4680&lt;/a>, which first introduced a mechanism for reporting the health of devices managed by Device Plugins. Now, this capability is being extended to &lt;em>Dynamic Resource Allocation (DRA)&lt;/em>. Controlled by the &lt;code>ResourceHealthStatus&lt;/code> feature gate, this enhancement allows DRA drivers to report device health directly into a Pod's &lt;code>.status&lt;/code> field, providing crucial insights for operators and developers.&lt;/p>
&lt;h2 id="why-expose-device-health-in-pod-status">Why expose device health in Pod status?&lt;/h2>
&lt;p>For stateful applications or long-running jobs, a device failure can be disruptive and costly. By exposing device health in the &lt;code>.status&lt;/code> field for a Pod, Kubernetes provides a standardized way for users and automation tools to quickly diagnose issues. If a Pod is failing, you can now check its status to see if an unhealthy device is the root cause, saving valuable time that might otherwise be spent debugging application code.&lt;/p>
&lt;h2 id="how-it-works">How it works&lt;/h2>
&lt;p>This feature introduces a new, optional communication channel between the Kubelet and DRA drivers, built on three core components.&lt;/p>
&lt;h3 id="a-new-grpc-health-service">A new gRPC health service&lt;/h3>
&lt;p>A new gRPC service, &lt;code>DRAResourceHealth&lt;/code>, is defined in the &lt;code>dra-health/v1alpha1&lt;/code> API group. DRA drivers can implement this service to stream device health updates to the Kubelet. The service includes a &lt;code>NodeWatchResources&lt;/code> server-streaming RPC that sends the health status (&lt;code>Healthy&lt;/code>, &lt;code>Unhealthy&lt;/code>, or &lt;code>Unknown&lt;/code>) for the devices it manages.&lt;/p>
&lt;h3 id="kubelet-integration">Kubelet integration&lt;/h3>
&lt;p>The Kubelet’s &lt;code>DRAPluginManager&lt;/code> discovers which drivers implement the health service. For each compatible driver, it starts a long-lived &lt;code>NodeWatchResources&lt;/code> stream to receive health updates. The DRA Manager then consumes these updates and stores them in a persistent &lt;code>healthInfoCache&lt;/code> that can survive Kubelet restarts.&lt;/p>
&lt;h3 id="populating-the-pod-status">Populating the Pod status&lt;/h3>
&lt;p>When a device's health changes, the DRA manager identifies all Pods affected by the change and triggers a Pod status update. A new field, &lt;code>allocatedResourcesStatus&lt;/code>, is now part of the &lt;code>v1.ContainerStatus&lt;/code> API object. The Kubelet populates this field with the current health of each device allocated to the container.&lt;/p>
&lt;h2 id="a-practical-example">A practical example&lt;/h2>
&lt;p>If a Pod is in a &lt;code>CrashLoopBackOff&lt;/code> state, you can use &lt;code>kubectl describe pod &amp;lt;pod-name&amp;gt;&lt;/code> to inspect its status. If an allocated device has failed, the output will now include the &lt;code>allocatedResourcesStatus&lt;/code> field, clearly indicating the problem:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">status&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containerStatuses&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-gpu-intensive-container&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># ... other container statuses&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">allocatedResourcesStatus&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;claim:my-gpu-claim&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">resourceID&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;example.com/gpu-a1b2-c3d4&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">health&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;Unhealthy&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This explicit status makes it clear that the issue is with the underlying hardware, not the application.&lt;/p>
&lt;p>You can now improve your failure detection logic to react to unhealthy devices associated with a Pod, for example by de-scheduling the Pod.&lt;/p>
&lt;h2 id="how-to-use-this-feature">How to use this feature&lt;/h2>
&lt;p>As this is an alpha feature in Kubernetes v1.34, you must take the following steps to use it:&lt;/p>
&lt;ol>
&lt;li>Enable the &lt;code>ResourceHealthStatus&lt;/code> feature gate on your kube-apiserver and kubelets (a minimal kubelet example follows this list).&lt;/li>
&lt;li>Ensure you are using a DRA driver that implements the &lt;code>v1alpha1 DRAResourceHealth&lt;/code> gRPC service.&lt;/li>
&lt;/ol>
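&lt;p>For the kubelet, enabling the gate via its configuration file is a minimal change; the kube-apiserver takes the equivalent &lt;code>--feature-gates=ResourceHealthStatus=true&lt;/code> command-line flag:&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ResourceHealthStatus: true
&lt;/code>&lt;/pre>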
&lt;h2 id="dra-drivers">DRA drivers&lt;/h2>
&lt;p>If you are developing a DRA driver, make sure to think about your device failure detection strategy and ensure that your driver integrates with this feature. This way, your driver will improve the user experience and simplify the debugging of hardware issues.&lt;/p>
&lt;h2 id="what-s-next">What's next?&lt;/h2>
&lt;p>This is the first step in a broader effort to improve how Kubernetes handles device failures. As we gather feedback on this alpha feature, the community is planning several key enhancements before graduating to Beta:&lt;/p>
&lt;ul>
&lt;li>&lt;em>Detailed health messages:&lt;/em> To improve the troubleshooting experience, we plan to add a human-readable message field to the gRPC API. This will allow DRA drivers to provide specific context for a health status, such as &amp;quot;GPU temperature exceeds threshold&amp;quot; or &amp;quot;NVLink connection lost&amp;quot;.&lt;/li>
&lt;li>&lt;em>Configurable health timeouts:&lt;/em> The timeout for marking a device's health as &amp;quot;Unknown&amp;quot; is currently hardcoded. We plan to make this configurable, likely on a per-driver basis, to better accommodate the different health-reporting characteristics of various hardware.&lt;/li>
&lt;li>&lt;em>Improved post-mortem troubleshooting:&lt;/em> We will address a known limitation where health updates may not be applied to pods that have already terminated. This fix will ensure that the health status of a device at the time of failure is preserved, which is crucial for troubleshooting batch jobs and other &amp;quot;run-to-completion&amp;quot; workloads.&lt;/li>
&lt;/ul>
&lt;p>This feature was developed as part of &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4680-add-resource-health-to-pod-status">KEP-4680&lt;/a>, and community feedback is crucial as we work toward graduating it to Beta. We have more improvements of device failure handling in k8s and encourage you to try it out and share your experiences with the SIG Node community!&lt;/p></description></item><item><title>Kubernetes v1.34: Moving Volume Group Snapshots to v1beta2</title><link>https://kubernetes.io/blog/2025/09/16/kubernetes-v1-34-volume-group-snapshot-beta-2/</link><pubDate>Tue, 16 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/16/kubernetes-v1-34-volume-group-snapshot-beta-2/</guid><description>
&lt;p>Volume group snapshots were &lt;a href="https://kubernetes.io/blog/2023/05/08/kubernetes-1-27-volume-group-snapshot-alpha/">introduced&lt;/a>
as an Alpha feature with the Kubernetes 1.27 release and moved to &lt;a href="https://kubernetes.io/blog/2024/12/18/kubernetes-1-32-volume-group-snapshot-beta/">Beta&lt;/a> in the Kubernetes 1.32 release.
The recent release of Kubernetes v1.34 moved that support to a second beta.
The support for volume group snapshots relies on a set of
&lt;a href="https://kubernetes-csi.github.io/docs/group-snapshot-restore-feature.html#volume-group-snapshot-apis">extension APIs for group snapshots&lt;/a>.
These APIs allow users to take crash consistent snapshots for a set of volumes.
Behind the scenes, Kubernetes uses a label selector to group multiple PersistentVolumeClaims
for snapshotting.
A key aim is to allow you to restore that set of snapshots to new volumes and
recover your workload from a crash-consistent recovery point.&lt;/p>
&lt;p>This new feature is only supported for &lt;a href="https://kubernetes-csi.github.io/docs/">CSI&lt;/a> volume drivers.&lt;/p>
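&lt;p>For example, a VolumeGroupSnapshot that groups PVCs by label could look like this (the names and the snapshot class are placeholders):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: groupsnapshot.storage.k8s.io/v1beta2
kind: VolumeGroupSnapshot
metadata:
  name: my-group-snapshot
  namespace: demo
spec:
  volumeGroupSnapshotClassName: csi-groupsnapclass
  source:
    selector:
      matchLabels:
        app: my-app  # every matching PVC in the namespace is snapshotted together
&lt;/code>&lt;/pre>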
&lt;h2 id="what-s-new-in-beta-2">What's new in Beta 2?&lt;/h2>
&lt;p>While testing the beta version, we encountered an &lt;a href="https://github.com/kubernetes-csi/external-snapshotter/issues/1271">issue&lt;/a> where the &lt;code>restoreSize&lt;/code> field was not set for individual VolumeSnapshotContents and VolumeSnapshots if the CSI driver did not implement the ListSnapshots RPC call.
We evaluated various options &lt;a href="https://docs.google.com/document/d/1LLBSHcnlLTaP6ZKjugtSGQHH2LGZPndyfnNqR1YvzS4/edit?tab=t.0">here&lt;/a> and decided to address this by releasing a new beta of the API.&lt;/p>
&lt;p>Specifically, v1beta2 adds a VolumeSnapshotInfo struct, which contains information for an individual volume snapshot that is a member of a volume group snapshot.
VolumeSnapshotInfoList, a list of VolumeSnapshotInfo, is added to VolumeGroupSnapshotContentStatus, replacing VolumeSnapshotHandlePairList.
It holds the snapshot information returned by the CSI driver to identify snapshots on the storage system,
and it is populated by the csi-snapshotter sidecar based on the CreateVolumeGroupSnapshotResponse returned by the CSI driver's CreateVolumeGroupSnapshot call.&lt;/p>
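&lt;p>Conceptually, the content status then carries per-snapshot details along these lines (a rough sketch based on the description above; the field names and values are illustrative, not authoritative):&lt;/p>
&lt;pre tabindex="0">&lt;code>status:
  volumeGroupSnapshotHandle: groupsnap-314
  volumeSnapshotInfoList:
  - volumeHandle: vol-1
    snapshotHandle: snap-1a
    restoreSize: 10737418240  # now populated even when ListSnapshots is not implemented
    readyToUse: true
&lt;/code>&lt;/pre>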
&lt;p>The existing v1beta1 API objects will be converted to the new v1beta2 API objects by a conversion webhook.&lt;/p>
&lt;h2 id="what-s-next">What’s next?&lt;/h2>
&lt;p>Depending on feedback and adoption, the Kubernetes project plans to push the volume
group snapshot implementation to general availability (GA) in a future release.&lt;/p>
&lt;h2 id="how-can-i-learn-more">How can I learn more?&lt;/h2>
&lt;ul>
&lt;li>The &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/3476-volume-group-snapshot">design spec&lt;/a>
for the volume group snapshot feature.&lt;/li>
&lt;li>The &lt;a href="https://github.com/kubernetes-csi/external-snapshotter">code repository&lt;/a> for volume group
snapshot APIs and controller.&lt;/li>
&lt;li>CSI &lt;a href="https://kubernetes-csi.github.io/docs/">documentation&lt;/a> on the group snapshot feature.&lt;/li>
&lt;/ul>
&lt;h2 id="how-do-i-get-involved">How do I get involved?&lt;/h2>
&lt;p>This project, like all of Kubernetes, is the result of hard work by many contributors
from diverse backgrounds working together. On behalf of SIG Storage, I would like to
offer a huge thank you to the contributors who stepped up these last few quarters
to help the project reach beta:&lt;/p>
&lt;ul>
&lt;li>Ben Swartzlander (&lt;a href="https://github.com/bswartz">bswartz&lt;/a>)&lt;/li>
&lt;li>Hemant Kumar (&lt;a href="https://github.com/gnufied">gnufied&lt;/a>)&lt;/li>
&lt;li>Jan Šafránek (&lt;a href="https://github.com/jsafrane">jsafrane&lt;/a>)&lt;/li>
&lt;li>Madhu Rajanna (&lt;a href="https://github.com/Madhu-1">Madhu-1&lt;/a>)&lt;/li>
&lt;li>Michelle Au (&lt;a href="https://github.com/msau42">msau42&lt;/a>)&lt;/li>
&lt;li>Niels de Vos (&lt;a href="https://github.com/nixpanic">nixpanic&lt;/a>)&lt;/li>
&lt;li>Leonardo Cecchi (&lt;a href="https://github.com/leonardoce">leonardoce&lt;/a>)&lt;/li>
&lt;li>Saad Ali (&lt;a href="https://github.com/saad-ali">saad-ali&lt;/a>)&lt;/li>
&lt;li>Xing Yang (&lt;a href="https://github.com/xing-yang">xing-yang&lt;/a>)&lt;/li>
&lt;li>Yati Padia (&lt;a href="https://github.com/yati1998">yati1998&lt;/a>)&lt;/li>
&lt;/ul>
&lt;p>For those interested in getting involved with the design and development of CSI or
any part of the Kubernetes Storage system, join the
&lt;a href="https://github.com/kubernetes/community/tree/master/sig-storage">Kubernetes Storage Special Interest Group&lt;/a> (SIG).
We always welcome new contributors.&lt;/p>
&lt;p>We also hold regular &lt;a href="https://github.com/kubernetes/community/tree/master/wg-data-protection">Data Protection Working Group meetings&lt;/a>.
New attendees are welcome to join our discussions.&lt;/p></description></item><item><title>Kubernetes v1.34: Decoupled Taint Manager Is Now Stable</title><link>https://kubernetes.io/blog/2025/09/15/kubernetes-v1-34-decoupled-taint-manager-is-now-stable/</link><pubDate>Mon, 15 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/15/kubernetes-v1-34-decoupled-taint-manager-is-now-stable/</guid><description>
&lt;p>This enhancement separates the responsibility of managing node lifecycle and pod eviction into two distinct components.
Previously, the node lifecycle controller handled both marking nodes as unhealthy with NoExecute taints and evicting pods from them.
Now, a dedicated taint eviction controller manages the eviction process, while the node lifecycle controller focuses solely on applying taints.
This separation not only improves code organization but also makes it easier to improve the taint eviction controller or to build custom implementations of taint-based eviction.&lt;/p>
&lt;h2 id="what-s-new">What's new?&lt;/h2>
&lt;p>The feature gate &lt;code>SeparateTaintEvictionController&lt;/code> has been promoted to GA in this release.
Users can optionally disable taint-based eviction by setting &lt;code>--controllers=*,-taint-eviction-controller&lt;/code>
in kube-controller-manager; the leading &lt;code>*&lt;/code> keeps all other default controllers enabled.&lt;/p>
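&lt;p>For example, on clusters where kube-controller-manager runs as a static Pod, the flag can be added to its manifest (a trimmed-down sketch; the image version and the other flags your cluster needs are omitted):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: registry.k8s.io/kube-controller-manager:v1.34.0
    command:
    - kube-controller-manager
    - --controllers=*,-taint-eviction-controller  # keep the defaults, minus taint eviction
&lt;/code>&lt;/pre>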
&lt;h2 id="how-can-i-learn-more">How can I learn more?&lt;/h2>
&lt;p>For more details, refer to the &lt;a href="http://kep.k8s.io/3902">KEP&lt;/a> and to the beta announcement article: &lt;a href="https://kubernetes.io/blog/2023/12/19/kubernetes-1-29-taint-eviction-controller/">Kubernetes 1.29: Decoupling taint manager from node lifecycle controller&lt;/a>.&lt;/p>
&lt;h2 id="how-to-get-involved">How to get involved?&lt;/h2>
&lt;p>We offer a huge thank you to all the contributors who helped with design,
implementation, and review of this feature and helped move it from beta to stable:&lt;/p>
&lt;ul>
&lt;li>Ed Bartosh (@bart0sh)&lt;/li>
&lt;li>Yuan Chen (@yuanchen8911)&lt;/li>
&lt;li>Aldo Culquicondor (@alculquicondor)&lt;/li>
&lt;li>Baofa Fan (@carlory)&lt;/li>
&lt;li>Sergey Kanzhelev (@SergeyKanzhelev)&lt;/li>
&lt;li>Tim Bannister (@lmktfy)&lt;/li>
&lt;li>Maciej Skoczeń (@macsko)&lt;/li>
&lt;li>Maciej Szulik (@soltysh)&lt;/li>
&lt;li>Wojciech Tyczynski (@wojtek-t)&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.34: Autoconfiguration for Node Cgroup Driver Goes GA</title><link>https://kubernetes.io/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/</link><pubDate>Fri, 12 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/12/kubernetes-v1-34-cri-cgroup-driver-lookup-now-ga/</guid><description>
&lt;p>Historically, configuring the correct cgroup driver has been a pain point for users running new
Kubernetes clusters. On Linux systems, there are two different cgroup drivers:
&lt;code>cgroupfs&lt;/code> and &lt;code>systemd&lt;/code>. In the past, both the &lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/">kubelet&lt;/a>
and CRI implementation (like CRI-O or containerd) needed to be configured to use
the same cgroup driver, or else the kubelet would misbehave without any explicit
error message. This was a source of headaches for many cluster admins. Now, we've
(almost) arrived at the end of that headache.&lt;/p>
&lt;h2 id="automated-cgroup-driver-detection">Automated cgroup driver detection&lt;/h2>
&lt;p>In v1.28.0, the SIG Node community introduced the feature gate
&lt;code>KubeletCgroupDriverFromCRI&lt;/code>, which instructs the kubelet to ask the CRI
implementation which cgroup driver to use. You can read more &lt;a href="https://kubernetes.io/blog/2024/08/21/cri-cgroup-driver-lookup-now-beta/">here&lt;/a>.
After many releases of waiting for each CRI implementation to have major versions released
and packaged in major operating systems, this feature has gone GA as of Kubernetes 1.34.0.&lt;/p>
&lt;p>In addition to setting the feature gate, a cluster admin needs to ensure their
CRI implementation is new enough:&lt;/p>
&lt;ul>
&lt;li>containerd: Support was added in v2.0.0&lt;/li>
&lt;li>CRI-O: Support was added in v1.28.0&lt;/li>
&lt;/ul>
&lt;h2 id="announcement-kubernetes-is-deprecating-containerd-v1-y-support">Announcement: Kubernetes is deprecating containerd v1.y support&lt;/h2>
&lt;p>While CRI-O releases versions that match Kubernetes versions, and thus CRI-O
versions without this behavior are no longer supported, containerd maintains its
own release cycle. containerd support for this feature is only in v2.0 and
later, but Kubernetes 1.34 still supports containerd 1.7 and other LTS releases
of containerd.&lt;/p>
&lt;p>The Kubernetes SIG Node community has formally agreed upon a final support
timeline for containerd v1.y. The last Kubernetes release to offer this support
will be the last released version of v1.35, and support will be dropped in
v1.36.0. To assist administrators in managing this future transition,
a new detection mechanism is available: you can monitor
the &lt;code>kubelet_cri_losing_support&lt;/code> metric to determine whether any nodes in your cluster
are using a containerd version that will soon be outdated. The presence of
this metric with a version label of &lt;code>1.36.0&lt;/code> will indicate that the node's containerd
runtime is not new enough for the upcoming requirements. Consequently, an
administrator will need to upgrade containerd to v2.0 or a later version before,
or at the same time as, upgrading the kubelet to v1.36.0.&lt;/p></description></item><item><title>Kubernetes v1.34: Mutable CSI Node Allocatable Graduates to Beta</title><link>https://kubernetes.io/blog/2025/09/11/kubernetes-v1-34-mutable-csi-node-allocatable-count/</link><pubDate>Thu, 11 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/11/kubernetes-v1-34-mutable-csi-node-allocatable-count/</guid><description>
&lt;p>The &lt;a href="https://kep.k8s.io/4876">functionality for CSI drivers to update information about attachable volume count on the nodes&lt;/a>, first introduced as Alpha in Kubernetes v1.33, has graduated to &lt;strong>Beta&lt;/strong> in the Kubernetes v1.34 release! This marks a significant milestone in enhancing the accuracy of stateful pod scheduling by reducing failures due to outdated attachable volume capacity information.&lt;/p>
&lt;h2 id="background">Background&lt;/h2>
&lt;p>Traditionally, Kubernetes &lt;a href="https://kubernetes-csi.github.io/docs/introduction.html">CSI drivers&lt;/a> report a static maximum volume attachment limit when initializing. However, actual attachment capacities can change during a node's lifecycle for various reasons, such as:&lt;/p>
&lt;ul>
&lt;li>Manual or external operations attaching/detaching volumes outside of Kubernetes control.&lt;/li>
&lt;li>Dynamically attached network interfaces or specialized hardware (GPUs, NICs, etc.) consuming available slots.&lt;/li>
&lt;li>Multi-driver scenarios, where one CSI driver’s operations affect available capacity reported by another.&lt;/li>
&lt;/ul>
&lt;p>Static reporting can cause Kubernetes to schedule pods onto nodes that appear to have capacity but don't, leading to pods stuck in a &lt;code>ContainerCreating&lt;/code> state.&lt;/p>
&lt;h2 id="dynamically-adapting-csi-volume-limits">Dynamically adapting CSI volume limits&lt;/h2>
&lt;p>With this new feature, Kubernetes enables CSI drivers to dynamically adjust and report node attachment capacities at runtime. This ensures that the scheduler, as well as other components relying on this information, have the most accurate, up-to-date view of node capacity.&lt;/p>
&lt;h3 id="how-it-works">How it works&lt;/h3>
&lt;p>Kubernetes supports two mechanisms for updating the reported node volume limits:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Periodic Updates:&lt;/strong> CSI drivers specify an interval to periodically refresh the node's allocatable capacity.&lt;/li>
&lt;li>&lt;strong>Reactive Updates:&lt;/strong> An immediate update triggered when a volume attachment fails due to exhausted resources (&lt;code>ResourceExhausted&lt;/code> error).&lt;/li>
&lt;/ul>
&lt;h3 id="enabling-the-feature">Enabling the feature&lt;/h3>
&lt;p>To use this beta feature, the &lt;code>MutableCSINodeAllocatableCount&lt;/code> feature gate must be enabled in these components:&lt;/p>
&lt;ul>
&lt;li>&lt;code>kube-apiserver&lt;/code>&lt;/li>
&lt;li>&lt;code>kubelet&lt;/code>&lt;/li>
&lt;/ul>
&lt;h3 id="example-csi-driver-configuration">Example CSI driver configuration&lt;/h3>
&lt;p>Below is an example of configuring a CSI driver to enable periodic updates every 60 seconds:&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
name: example.csi.k8s.io
spec:
nodeAllocatableUpdatePeriodSeconds: 60
&lt;/code>&lt;/pre>&lt;p>This configuration directs kubelet to periodically call the CSI driver's &lt;code>NodeGetInfo&lt;/code> method every 60 seconds, updating the node’s allocatable volume count. Kubernetes enforces a minimum update interval of 10 seconds to balance accuracy and resource usage.&lt;/p>
&lt;h3 id="immediate-updates-on-attachment-failures">Immediate updates on attachment failures&lt;/h3>
&lt;p>When a volume attachment operation fails due to a &lt;code>ResourceExhausted&lt;/code> error (gRPC code &lt;code>8&lt;/code>), Kubernetes immediately updates the allocatable count instead of waiting for the next periodic update. The Kubelet then marks the affected pods as Failed, enabling their controllers to recreate them. This prevents pods from getting permanently stuck in the &lt;code>ContainerCreating&lt;/code> state.&lt;/p>
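&lt;p>Both update paths ultimately refresh the allocatable volume count that the driver publishes on the CSINode object, which the scheduler consults when placing stateful pods; conceptually (illustrative values):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: node-1
spec:
  drivers:
  - name: example.csi.k8s.io
    nodeID: node-1
    allocatable:
      count: 39  # refreshed at runtime instead of staying at the value reported at registration
&lt;/code>&lt;/pre>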
&lt;h2 id="getting-started">Getting started&lt;/h2>
&lt;p>To enable this feature in your Kubernetes v1.34 cluster:&lt;/p>
&lt;ol>
&lt;li>Enable the feature gate &lt;code>MutableCSINodeAllocatableCount&lt;/code> on the &lt;code>kube-apiserver&lt;/code> and &lt;code>kubelet&lt;/code> components.&lt;/li>
&lt;li>Update your CSI driver configuration by setting &lt;code>nodeAllocatableUpdatePeriodSeconds&lt;/code>.&lt;/li>
&lt;li>Monitor and observe improvements in scheduling accuracy and pod placement reliability.&lt;/li>
&lt;/ol>
&lt;h2 id="next-steps">Next steps&lt;/h2>
&lt;p>This feature is currently in beta and the Kubernetes community welcomes your feedback. Test it, share your experiences, and help guide its evolution to GA stability.&lt;/p>
&lt;p>Join discussions in the &lt;a href="https://github.com/kubernetes/community/tree/master/sig-storage">Kubernetes Storage Special Interest Group (SIG-Storage)&lt;/a> to shape the future of Kubernetes storage capabilities.&lt;/p></description></item><item><title>Kubernetes v1.34: Use An Init Container To Define App Environment Variables</title><link>https://kubernetes.io/blog/2025/09/10/kubernetes-v1-34-env-files/</link><pubDate>Wed, 10 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/10/kubernetes-v1-34-env-files/</guid><description>
&lt;p>Kubernetes typically uses ConfigMaps and Secrets to set environment variables,
which introduces additional API calls and complexity.
For example, you need to separately manage the Pods of your workloads
and their configurations, while ensuring orderly
updates for both the configurations and the workload Pods.&lt;/p>
&lt;p>Alternatively, you might be using a vendor-supplied container
that requires environment variables (such as a license key or a one-time token),
but you don’t want to hard-code them or mount volumes just to get the job done.&lt;/p>
&lt;p>If that's the situation you are in, you now have a new (alpha) way to
achieve that. Provided you have the &lt;code>EnvFiles&lt;/code>
&lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/">feature gate&lt;/a>
enabled across your cluster, you can tell the kubelet to load a container's
environment variables from a volume (the volume must be part of the Pod that
the container belongs to).
Specifically, this feature lets you load environment variables directly from a file in an emptyDir volume,
without actually mounting that file into the container.
It’s a simple yet elegant solution to some surprisingly common problems.&lt;/p>
&lt;h2 id="what-s-this-all-about">What’s this all about?&lt;/h2>
&lt;p>At its core, this feature allows you to point your container to a file,
one generated by an &lt;code>initContainer&lt;/code>,
and have Kubernetes parse that file to set your environment variables.
The file lives in an &lt;code>emptyDir&lt;/code> volume (a temporary storage space that lasts as long as the pod does).
Your main container doesn’t need to mount the volume.
The kubelet will read the file and inject these variables when the container starts.&lt;/p>
&lt;h2 id="how-it-works">How It Works&lt;/h2>
&lt;p>Here's a simple example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">initContainers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>generate-config&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>busybox&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#39;sh&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;-c&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;echo &amp;#34;CONFIG_VAR=HELLO&amp;#34; &amp;gt; /config/config.env&amp;#39;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">volumeMounts&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>config-volume&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">mountPath&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>/config&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>app-container&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gcr.io/distroless/static&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">env&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>CONFIG_VAR&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">valueFrom&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">fileKeyRef&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">path&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>config.env&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">volumeName&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>config-volume&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">key&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>CONFIG_VAR&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">volumes&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>config-volume&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">emptyDir&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>{}&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Using this approach is a breeze.
You define your environment variables in the pod spec using the &lt;code>fileKeyRef&lt;/code> field,
which tells Kubernetes where to find the file and which key to pull.
The file itself follows the standard &lt;code>.env&lt;/code> syntax (think &lt;code>KEY=VALUE&lt;/code>),
and (for this alpha stage at least) you must ensure that it is written into
an &lt;code>emptyDir&lt;/code> volume; other volume types aren't supported for this feature.
At least one init container must mount that &lt;code>emptyDir&lt;/code> volume (to write the file),
but the main container doesn’t need to—it just gets the variables handed to it at startup.&lt;/p>
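&lt;p>For reference, the file that the init container writes in the example above is plain &lt;code>KEY=VALUE&lt;/code> text; inspecting it from the init container would show:&lt;/p>
&lt;pre tabindex="0">&lt;code>$ cat /config/config.env
CONFIG_VAR=HELLO
&lt;/code>&lt;/pre>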
&lt;h2 id="a-word-on-security">A word on security&lt;/h2>
&lt;p>While this feature supports handling sensitive data such as keys or tokens,
note that its implementation relies on &lt;code>emptyDir&lt;/code> volumes mounted into the Pod.
Operators with node filesystem access could therefore
easily retrieve this sensitive data through pod directory paths.&lt;/p>
&lt;p>If you store sensitive data such as keys or tokens using this feature,
make sure your cluster security policies protect nodes
against unauthorized access; otherwise, that confidential information could be exposed.&lt;/p>
&lt;h2 id="summary">Summary&lt;/h2>
&lt;p>This feature eliminates a number of complex workarounds used today, simplifying
application authoring and opening the door to more use cases. Kubernetes stays flexible and
open to feedback. Tell us how you use this feature, or what is missing.&lt;/p>
&lt;p>For years, the Kubernetes community has been on a mission to improve the stability and performance predictability of the API server.
A major focus of this effort has been taming &lt;strong>list&lt;/strong> requests, which have historically been a primary source of high memory usage and heavy load on the &lt;code>etcd&lt;/code> datastore.
With each release, we've chipped away at the problem, and today, we're thrilled to announce the final major piece of this puzzle.&lt;/p>
&lt;p>The &lt;em>snapshottable API server cache&lt;/em> feature has graduated to &lt;strong>Beta&lt;/strong> in Kubernetes v1.34,
culminating a multi-release effort to allow virtually all read requests to be served directly from the API server's cache.&lt;/p>
&lt;h2 id="evolving-the-cache-for-performance-and-stability">Evolving the cache for performance and stability&lt;/h2>
&lt;p>The path to the current state involved several key enhancements over recent releases that paved the way for today's announcement.&lt;/p>
&lt;h3 id="consistent-reads-from-cache-beta-in-v1-31">Consistent reads from cache (Beta in v1.31)&lt;/h3>
&lt;p>While the API server has long used a cache for performance, a key milestone was guaranteeing &lt;em>consistent reads of the latest data&lt;/em> from it. This v1.31 enhancement allowed the watch cache to be used for strongly-consistent read requests for the first time, a huge win as it enabled &lt;em>filtered collections&lt;/em> (e.g. &amp;quot;a list of pods bound to this node&amp;quot;) to be safely served from the cache instead of etcd, dramatically reducing its load for common workloads.&lt;/p>
&lt;h3 id="taming-large-responses-with-streaming-beta-in-v1-33">Taming large responses with streaming (Beta in v1.33)&lt;/h3>
&lt;p>Another key improvement was tackling the problem of memory spikes when transmitting large responses. The streaming encoder, introduced in v1.33, allowed the API server to send list items one by one, rather than buffering the entire multi-gigabyte response in memory. This made the memory cost of sending a response predictable and minimal, regardless of its size.&lt;/p>
&lt;h3 id="the-missing-piece">The missing piece&lt;/h3>
&lt;p>Despite these huge improvements, a critical gap remained. Any request for a historical &lt;code>LIST&lt;/code>—most commonly used for paginating through large result sets—still had to bypass the cache and query &lt;code>etcd&lt;/code> directly. This meant that the cost of &lt;em>retrieving&lt;/em> the data was still unpredictable and could put significant memory pressure on the API server.&lt;/p>
&lt;h2 id="kubernetes-1-34-snapshots-complete-the-picture">Kubernetes 1.34: snapshots complete the picture&lt;/h2>
&lt;p>The &lt;em>snapshottable API server cache&lt;/em> solves this final piece of the puzzle.
This feature enhances the watch cache, enabling it to generate efficient, point-in-time snapshots of its state.&lt;/p>
&lt;p>Here’s how it works: for each update, the cache creates a lightweight snapshot.
These snapshots are &amp;quot;lazy copies,&amp;quot; meaning they don't duplicate objects but simply store pointers, making them incredibly memory-efficient.&lt;/p>
&lt;p>When a &lt;strong>list&lt;/strong> request for a historical &lt;code>resourceVersion&lt;/code> arrives, the API server now finds the corresponding snapshot and serves the response directly from its memory.
This closes the final major gap, allowing paginated requests to be served entirely from the cache.&lt;/p>
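&lt;p>For example, when a client paginates a list with a &lt;code>limit&lt;/code> and a &lt;code>continue&lt;/code> token, every follow-up page is evaluated at the resource version of the first page; those follow-up requests are exactly the kind of historical lists that snapshots can now answer. A brief sketch using &lt;code>kubectl&lt;/code>:&lt;/p>
&lt;pre tabindex="0">&lt;code># First page: up to 500 objects are returned along with a continue token
kubectl get pods --all-namespaces --chunk-size=500

# kubectl follows the continue token automatically; each follow-up page is
# read at the resource version of the first page, which the API server can
# now serve from an in-memory snapshot instead of querying etcd
&lt;/code>&lt;/pre>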
&lt;h2 id="a-new-era-of-api-server-performance">A new era of API Server performance 🚀&lt;/h2>
&lt;p>With this final piece in place, the synergy of these three features ushers in a new era of API server predictability and performance:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Get data from cache&lt;/strong>: &lt;em>Consistent reads&lt;/em> and &lt;em>snapshottable cache&lt;/em> work together to ensure nearly all read requests—whether for the latest data or a historical snapshot—are served from the API server's memory.&lt;/li>
&lt;li>&lt;strong>Send data via stream&lt;/strong>: &lt;em>Streaming list responses&lt;/em> ensure that sending this data to the client has a minimal and constant memory footprint.&lt;/li>
&lt;/ol>
&lt;p>The result is a system where the resource cost of read operations is almost fully predictable and much more resilient to spikes in request load.
This means dramatically reduced memory pressure, a lighter load on &lt;code>etcd&lt;/code>, and a more stable, scalable, and reliable control plane for all Kubernetes clusters.&lt;/p>
&lt;h2 id="how-to-get-started">How to get started&lt;/h2>
&lt;p>With its graduation to Beta, the &lt;code>SnapshottableCache&lt;/code> feature gate is &lt;strong>enabled by default&lt;/strong> in Kubernetes v1.34. There are no actions required to start benefiting from these performance and stability improvements.&lt;/p>
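&lt;p>If you want to opt out while you qualify the release, you can still disable the feature gate explicitly. A sketch of the relevant API server flag, shown for illustration rather than as a recommendation:&lt;/p>
&lt;pre tabindex="0">&lt;code>kube-apiserver --feature-gates=SnapshottableCache=false
&lt;/code>&lt;/pre>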
&lt;h2 id="acknowledgements">Acknowledgements&lt;/h2>
&lt;p>Special thanks for designing, implementing, and reviewing these critical features go to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Ahmad Zolfaghari&lt;/strong> (&lt;a href="https://github.com/ah8ad3">@ah8ad3&lt;/a>)&lt;/li>
&lt;li>&lt;strong>Ben Luddy&lt;/strong> (&lt;a href="https://github.com/benluddy">@benluddy&lt;/a>) – &lt;em>Red Hat&lt;/em>&lt;/li>
&lt;li>&lt;strong>Chen Chen&lt;/strong> (&lt;a href="https://github.com/z1cheng">@z1cheng&lt;/a>) – &lt;em>Microsoft&lt;/em>&lt;/li>
&lt;li>&lt;strong>Davanum Srinivas&lt;/strong> (&lt;a href="https://github.com/dims">@dims&lt;/a>) – &lt;em>Nvidia&lt;/em>&lt;/li>
&lt;li>&lt;strong>David Eads&lt;/strong> (&lt;a href="https://github.com/deads2k">@deads2k&lt;/a>) – &lt;em>Red Hat&lt;/em>&lt;/li>
&lt;li>&lt;strong>Han Kang&lt;/strong> (&lt;a href="https://github.com/logicalhan">@logicalhan&lt;/a>) – &lt;em>CoreWeave&lt;/em>&lt;/li>
&lt;li>&lt;strong>haosdent&lt;/strong> (&lt;a href="https://github.com/haosdent">@haosdent&lt;/a>) – &lt;em>Shopee&lt;/em>&lt;/li>
&lt;li>&lt;strong>Joe Betz&lt;/strong> (&lt;a href="https://github.com/jpbetz">@jpbetz&lt;/a>) – &lt;em>Google&lt;/em>&lt;/li>
&lt;li>&lt;strong>Jordan Liggitt&lt;/strong> (&lt;a href="https://github.com/liggitt">@liggitt&lt;/a>) – &lt;em>Google&lt;/em>&lt;/li>
&lt;li>&lt;strong>Łukasz Szaszkiewicz&lt;/strong> (&lt;a href="https://github.com/p0lyn0mial">@p0lyn0mial&lt;/a>) – &lt;em>Red Hat&lt;/em>&lt;/li>
&lt;li>&lt;strong>Maciej Borsz&lt;/strong> (&lt;a href="https://github.com/mborsz">@mborsz&lt;/a>) – &lt;em>Google&lt;/em>&lt;/li>
&lt;li>&lt;strong>Madhav Jivrajani&lt;/strong> (&lt;a href="https://github.com/MadhavJivrajani">@MadhavJivrajani&lt;/a>) – &lt;em>UIUC&lt;/em>&lt;/li>
&lt;li>&lt;strong>Marek Siarkowicz&lt;/strong> (&lt;a href="https://github.com/serathius">@serathius&lt;/a>) – &lt;em>Google&lt;/em>&lt;/li>
&lt;li>&lt;strong>NKeert&lt;/strong> (&lt;a href="https://github.com/NKeert">@NKeert&lt;/a>)&lt;/li>
&lt;li>&lt;strong>Tim Bannister&lt;/strong> (&lt;a href="https://github.com/lmktfy">@lmktfy&lt;/a>)&lt;/li>
&lt;li>&lt;strong>Wei Fu&lt;/strong> (&lt;a href="https://github.com/fuweid">@fuweid&lt;/a>) - &lt;em>Microsoft&lt;/em>&lt;/li>
&lt;li>&lt;strong>Wojtek Tyczyński&lt;/strong> (&lt;a href="https://github.com/wojtek-t">@wojtek-t&lt;/a>) – &lt;em>Google&lt;/em>&lt;/li>
&lt;/ul>
&lt;p>...and many others in SIG API Machinery. This milestone is a testament to the community's dedication to building a more scalable and robust Kubernetes.&lt;/p></description></item><item><title>Kubernetes v1.34: VolumeAttributesClass for Volume Modification GA</title><link>https://kubernetes.io/blog/2025/09/08/kubernetes-v1-34-volume-attributes-class/</link><pubDate>Mon, 08 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/08/kubernetes-v1-34-volume-attributes-class/</guid><description>
&lt;p>The VolumeAttributesClass API, which empowers users to dynamically modify volume attributes, has officially graduated to General Availability (GA) in Kubernetes v1.34. This marks a significant milestone, providing a robust and stable way to tune your persistent storage directly within Kubernetes.&lt;/p>
&lt;h2 id="what-is-volumeattributesclass">What is VolumeAttributesClass?&lt;/h2>
&lt;p>At its core, VolumeAttributesClass is a cluster-scoped resource that defines a set of mutable parameters for a volume. Think of it as a &amp;quot;profile&amp;quot; for your storage, allowing cluster administrators to expose different quality-of-service (QoS) levels or performance tiers.&lt;/p>
&lt;p>Users can then specify a &lt;code>volumeAttributesClassName&lt;/code> in their PersistentVolumeClaim (PVC) to indicate which class of attributes they desire. The magic happens through the Container Storage Interface (CSI): when a PVC referencing a VolumeAttributesClass is updated, the associated CSI driver interacts with the underlying storage system to apply the specified changes to the volume.&lt;/p>
&lt;p>This means you can now:&lt;/p>
&lt;ul>
&lt;li>Dynamically scale performance: Increase IOPS or throughput for a busy database, or reduce it for a less critical application.&lt;/li>
&lt;li>Optimize costs: Adjust attributes on the fly to match your current needs, avoiding over-provisioning.&lt;/li>
&lt;li>Simplify operations: Manage volume modifications directly within the Kubernetes API, rather than relying on external tools or manual processes.&lt;/li>
&lt;/ul>
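&lt;p>To make that concrete, here is a minimal sketch of a VolumeAttributesClass and a PVC that references it, assuming the &lt;code>storage.k8s.io/v1&lt;/code> API that ships with GA. The driver name and parameters are illustrative (they assume the AWS EBS CSI driver; check which mutable parameters your driver actually supports):&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: storage.k8s.io/v1
kind: VolumeAttributesClass
metadata:
  name: gold
driverName: ebs.csi.aws.com   # illustrative: use your CSI driver's name
parameters:                   # driver-specific, mutable attributes
  iops: "4000"
  throughput: "125"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
  volumeAttributesClassName: gold   # change this to modify the volume
&lt;/code>&lt;/pre>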
&lt;h2 id="what-is-new-from-beta-to-ga">What is new from Beta to GA&lt;/h2>
&lt;p>The GA release brings two major enhancements over beta.&lt;/p>
&lt;h3 id="cancellation-support-when-errors-occur">Cancellation support when errors occur&lt;/h3>
&lt;p>To improve resilience and user experience, the GA release introduces explicit cancel support when a requested volume modification encounters an error. If the underlying storage system or CSI driver indicates that the requested changes cannot be applied (e.g., due to invalid arguments), users can cancel the operation and revert the volume to its previous stable configuration, preventing the volume from being left in an inconsistent state.&lt;/p>
&lt;h3 id="quota-support-based-on-scope">Quota support based on scope&lt;/h3>
&lt;p>While VolumeAttributesClass doesn't add a new quota type, the Kubernetes control plane can be configured to enforce quotas on PersistentVolumeClaims that reference a specific VolumeAttributesClass.&lt;/p>
&lt;p>This is achieved by using the &lt;code>scopeSelector&lt;/code> field in a ResourceQuota to target PVCs that have &lt;code>.spec.volumeAttributesClassName&lt;/code> set to a particular VolumeAttributesClass name. Please see more details &lt;a href="https://kubernetes.io/docs/concepts/policy/resource-quotas/#resource-quota-per-volumeattributesclass">here&lt;/a>.&lt;/p>
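&lt;p>As an illustrative sketch, a quota that caps how many PVCs may reference a hypothetical &lt;code>gold&lt;/code> class could look like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: v1
kind: ResourceQuota
metadata:
  name: gold-volume-quota
spec:
  hard:
    persistentvolumeclaims: "10"   # at most 10 PVCs may reference the class
  scopeSelector:
    matchExpressions:
    - scopeName: VolumeAttributesClass
      operator: In
      values: ["gold"]
&lt;/code>&lt;/pre>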
&lt;h2 id="drivers-support-volumeattributesclass">Drivers support VolumeAttributesClass&lt;/h2>
&lt;ul>
&lt;li>Amazon EBS CSI Driver: The AWS EBS CSI driver has robust support for VolumeAttributesClass and allows you to modify parameters like volume type (e.g., gp2 to gp3, io1 to io2), IOPS, and throughput of EBS volumes dynamically.&lt;/li>
&lt;li>Google Compute Engine (GCE) Persistent Disk CSI Driver (pd.csi.storage.gke.io): This driver also supports dynamic modification of persistent disk attributes, including IOPS and throughput, via VolumeAttributesClass.&lt;/li>
&lt;/ul>
&lt;h2 id="contact">Contact&lt;/h2>
&lt;p>For any inquiries or specific questions related to VolumeAttributesClass, please reach out to the &lt;a href="https://github.com/kubernetes/community/tree/master/sig-storage">SIG Storage community&lt;/a>.&lt;/p></description></item><item><title>Kubernetes v1.34: Pod Replacement Policy for Jobs Goes GA</title><link>https://kubernetes.io/blog/2025/09/05/kubernetes-v1-34-pod-replacement-policy-for-jobs-goes-ga/</link><pubDate>Fri, 05 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/05/kubernetes-v1-34-pod-replacement-policy-for-jobs-goes-ga/</guid><description>
&lt;p>In Kubernetes v1.34, the &lt;em>Pod replacement policy&lt;/em> feature has reached general availability (GA).
This blog post describes the Pod replacement policy feature and how to use it in your Jobs.&lt;/p>
&lt;h2 id="about-pod-replacement-policy">About Pod Replacement Policy&lt;/h2>
&lt;p>By default, the Job controller immediately recreates Pods as soon as they fail or begin terminating (when they have a deletion timestamp).&lt;/p>
&lt;p>As a result, while some Pods are terminating, the total number of running Pods for a Job can temporarily exceed the specified parallelism.
For Indexed Jobs, this can even mean multiple Pods running for the same index at the same time.&lt;/p>
&lt;p>This behavior works fine for many workloads, but it can cause problems in certain cases.&lt;/p>
&lt;p>For example, popular machine learning frameworks like TensorFlow and
&lt;a href="https://jax.readthedocs.io/en/latest/">JAX&lt;/a> expect exactly one Pod per worker index.
If two Pods run at the same time, you might encounter errors such as:&lt;/p>
&lt;pre tabindex="0">&lt;code>/job:worker/task:4: Duplicate task registration with task_name=/job:worker/replica:0/task:4
&lt;/code>&lt;/pre>&lt;p>Additionally, starting replacement Pods before the old ones fully terminate can lead to:&lt;/p>
&lt;ul>
&lt;li>Scheduling delays by kube-scheduler as the nodes remain occupied.&lt;/li>
&lt;li>Unnecessary cluster scale-ups to accommodate the replacement Pods.&lt;/li>
&lt;li>Temporary bypassing of quota checks by workload orchestrators like &lt;a href="https://kueue.sigs.k8s.io/">Kueue&lt;/a>.&lt;/li>
&lt;/ul>
&lt;p>With Pod replacement policy, Kubernetes gives you control over when the control plane
replaces terminating Pods, helping you avoid these issues.&lt;/p>
&lt;h2 id="how-pod-replacement-policy-works">How Pod Replacement Policy works&lt;/h2>
&lt;p>This enhancement means that Jobs in Kubernetes have an optional field &lt;code>.spec.podReplacementPolicy&lt;/code>.&lt;br>
You can choose one of two policies:&lt;/p>
&lt;ul>
&lt;li>&lt;code>TerminatingOrFailed&lt;/code> (default): Replaces Pods as soon as they start terminating.&lt;/li>
&lt;li>&lt;code>Failed&lt;/code>: Replaces Pods only after they fully terminate and transition to the &lt;code>Failed&lt;/code> phase.&lt;/li>
&lt;/ul>
&lt;p>Setting the policy to &lt;code>Failed&lt;/code> ensures that a new Pod is only created after the previous one has completely terminated.&lt;/p>
&lt;p>For Jobs with a Pod Failure Policy, the default &lt;code>podReplacementPolicy&lt;/code> is &lt;code>Failed&lt;/code>, and no other value is allowed.
See &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-failure-policy">Pod Failure Policy&lt;/a> to learn more about Pod Failure Policies for Jobs.&lt;/p>
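&lt;p>For illustration, the combination could look like this partial sketch (the failure rule and exit code are hypothetical, and the manifest is not complete):&lt;/p>
&lt;pre tabindex="0">&lt;code># CAUTION: partial sketch, not a complete Job manifest
apiVersion: batch/v1
kind: Job
spec:
  podReplacementPolicy: Failed   # the only value allowed with podFailurePolicy
  podFailurePolicy:
    rules:
    - action: FailJob            # illustrative: fail the whole Job on exit code 42
      onExitCodes:
        operator: In
        values: [42]
  template:
    spec:
      restartPolicy: Never       # required when using a Pod failure policy
      containers:
      - name: worker
        image: your-image
&lt;/code>&lt;/pre>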
&lt;p>You can check how many Pods are currently terminating by inspecting the Job’s &lt;code>.status.terminating&lt;/code> field:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>kubectl get job myjob -o&lt;span style="color:#666">=&lt;/span>&lt;span style="color:#b8860b">jsonpath&lt;/span>&lt;span style="color:#666">=&lt;/span>&lt;span style="color:#b44">&amp;#39;{.status.terminating}&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="example">Example&lt;/h2>
&lt;p>Here’s a Job example that executes a task two times (&lt;code>spec.completions: 2&lt;/code>) in parallel (&lt;code>spec.parallelism: 2&lt;/code>) and
replaces Pods only after they fully terminate (&lt;code>spec.podReplacementPolicy: Failed&lt;/code>):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>batch/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Job&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>example-job&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">completions&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">2&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">parallelism&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#666">2&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">podReplacementPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Failed&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">template&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Never&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>worker&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>your-image&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If a Pod receives a SIGTERM signal (deletion, eviction, preemption...), it begins terminating.
When the container handles termination gracefully, cleanup may take some time.&lt;/p>
&lt;p>When the Job starts, we will see two Pods running:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>kubectl get pods
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>NAME READY STATUS RESTARTS AGE
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-qr8kf 1/1 Running &lt;span style="color:#666">0&lt;/span> 2s
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-stvb4 1/1 Running &lt;span style="color:#666">0&lt;/span> 2s
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Let's delete one of the Pods (&lt;code>example-job-qr8kf&lt;/code>).&lt;/p>
&lt;p>With the &lt;code>TerminatingOrFailed&lt;/code> policy, as soon as one Pod (&lt;code>example-job-qr8kf&lt;/code>) starts terminating, the Job controller immediately creates a new Pod (&lt;code>example-job-b59zk&lt;/code>) to replace it.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>kubectl get pods
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>NAME READY STATUS RESTARTS AGE
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-b59zk 1/1 Running &lt;span style="color:#666">0&lt;/span> 1s
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-qr8kf 1/1 Terminating &lt;span style="color:#666">0&lt;/span> 17s
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-stvb4 1/1 Running &lt;span style="color:#666">0&lt;/span> 17s
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>With the &lt;code>Failed&lt;/code> policy, the new Pod (&lt;code>example-job-b59zk&lt;/code>) is not created while the old Pod (&lt;code>example-job-qr8kf&lt;/code>) is terminating.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>kubectl get pods
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>NAME READY STATUS RESTARTS AGE
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-qr8kf 1/1 Terminating &lt;span style="color:#666">0&lt;/span> 17s
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-stvb4 1/1 Running &lt;span style="color:#666">0&lt;/span> 17s
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When the terminating Pod has fully transitioned to the &lt;code>Failed&lt;/code> phase, a new Pod is created:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-shell" data-lang="shell">&lt;span style="display:flex;">&lt;span>kubectl get pods
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>NAME READY STATUS RESTARTS AGE
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-b59zk 1/1 Running &lt;span style="color:#666">0&lt;/span> 1s
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>example-job-stvb4 1/1 Running &lt;span style="color:#666">0&lt;/span> 25s
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="how-can-you-learn-more">How can you learn more?&lt;/h2>
&lt;ul>
&lt;li>Read the user-facing documentation for &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-replacement-policy">Pod Replacement Policy&lt;/a>,
&lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#backoff-limit-per-index">Backoff Limit per Index&lt;/a>, and
&lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-failure-policy">Pod Failure Policy&lt;/a>.&lt;/li>
&lt;li>Read the KEPs for &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3939-allow-replacement-when-fully-terminated">Pod Replacement Policy&lt;/a>,
&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3850-backoff-limits-per-index-for-indexed-jobs">Backoff Limit per Index&lt;/a>, and
&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures">Pod Failure Policy&lt;/a>.&lt;/li>
&lt;/ul>
&lt;h2 id="acknowledgments">Acknowledgments&lt;/h2>
&lt;p>As with any Kubernetes feature, multiple people contributed to getting this
done, from testing and filing bugs to reviewing code.&lt;/p>
&lt;p>As this feature moves to stable after 2 years, we would like to thank the following people:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/kannon92">Kevin Hannon&lt;/a> - for writing the KEP and the initial implementation.&lt;/li>
&lt;li>&lt;a href="https://github.com/mimowo">Michał Woźniak&lt;/a> - for guidance, mentorship, and reviews.&lt;/li>
&lt;li>&lt;a href="https://github.com/alculquicondor">Aldo Culquicondor&lt;/a> - for guidance, mentorship, and reviews.&lt;/li>
&lt;li>&lt;a href="https://github.com/soltysh">Maciej Szulik&lt;/a> - for guidance, mentorship, and reviews.&lt;/li>
&lt;li>&lt;a href="https://github.com/dejanzele">Dejan Zele Pejchev&lt;/a> - for taking over the feature and promoting it from Alpha through Beta to GA.&lt;/li>
&lt;/ul>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>This work was sponsored by the Kubernetes
&lt;a href="https://github.com/kubernetes/community/tree/master/wg-batch">batch working group&lt;/a>
in close collaboration with the
&lt;a href="https://github.com/kubernetes/community/tree/master/sig-apps">SIG Apps&lt;/a> community.&lt;/p>
&lt;p>If you are interested in working on new features in this space, we recommend
subscribing to our &lt;a href="https://kubernetes.slack.com/messages/wg-batch">Slack&lt;/a>
channel and attending the regular community meetings.&lt;/p></description></item><item><title>Kubernetes v1.34: PSI Metrics for Kubernetes Graduates to Beta</title><link>https://kubernetes.io/blog/2025/09/04/kubernetes-v1-34-introducing-psi-metrics-beta/</link><pubDate>Thu, 04 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/04/kubernetes-v1-34-introducing-psi-metrics-beta/</guid><description>
&lt;p>As Kubernetes clusters grow in size and complexity, understanding the health and performance of individual nodes becomes increasingly critical. We are excited to announce that as of Kubernetes v1.34, &lt;strong>Pressure Stall Information (PSI) Metrics&lt;/strong> has graduated to Beta.&lt;/p>
&lt;h2 id="what-is-pressure-stall-information-psi">What is Pressure Stall Information (PSI)?&lt;/h2>
&lt;p>&lt;a href="https://docs.kernel.org/accounting/psi.html">Pressure Stall Information (PSI)&lt;/a> is a feature of the Linux kernel (version 4.20 and later)
that provides a canonical way to quantify pressure on infrastructure resources,
in terms of whether demand for a resource exceeds current supply.
It moves beyond simple resource utilization metrics and instead
measures the amount of time that tasks are stalled due to resource contention.
This is a powerful way to identify and diagnose resource bottlenecks that can impact application performance.&lt;/p>
&lt;p>PSI exposes metrics for CPU, memory, and I/O, categorized as either &lt;code>some&lt;/code> or &lt;code>full&lt;/code> pressure:&lt;/p>
&lt;dl>
&lt;dt>&lt;code>some&lt;/code>&lt;/dt>
&lt;dd>The percentage of time that &lt;strong>at least one&lt;/strong> task is stalled on a resource. This indicates some level of resource contention.&lt;/dd>
&lt;dt>&lt;code>full&lt;/code>&lt;/dt>
&lt;dd>The percentage of time that &lt;strong>all&lt;/strong> non-idle tasks are stalled on a resource simultaneously. This indicates a more severe resource bottleneck.&lt;/dd>
&lt;/dl>
&lt;figure>
&lt;img src="https://kubernetes.io/blog/2025/09/04/kubernetes-v1-34-introducing-psi-metrics-beta/psi-metrics-some-vs-full.svg"
alt="Diagram illustrating the difference between &amp;#39;some&amp;#39; and &amp;#39;full&amp;#39; PSI pressure."/> &lt;figcaption>
&lt;h4>PSI: &amp;#39;Some&amp;#39; vs. &amp;#39;Full&amp;#39; Pressure&lt;/h4>
&lt;/figcaption>
&lt;/figure>
&lt;p>These metrics are aggregated over 10-second, 1-minute, and 5-minute rolling windows, providing a comprehensive view of resource pressure over time.&lt;/p>
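&lt;p>Under the hood, these numbers come from the Linux kernel's PSI accounting files. For illustration, the raw kernel output (with made-up values) looks like this:&lt;/p>
&lt;pre tabindex="0">&lt;code>$ cat /proc/pressure/memory
some avg10=0.00 avg60=1.25 avg300=0.80 total=463922
full avg10=0.00 avg60=0.10 avg300=0.05 total=121963
&lt;/code>&lt;/pre>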
&lt;h2 id="psi-metrics-in-kubernetes">PSI metrics in Kubernetes&lt;/h2>
&lt;p>With the &lt;code>KubeletPSI&lt;/code> feature gate enabled, the kubelet can now collect PSI metrics from the Linux kernel and expose them through two channels: the &lt;a href="https://kubernetes.io/docs/reference/instrumentation/node-metrics/#summary-api-source">Summary API&lt;/a> and the &lt;code>/metrics/cadvisor&lt;/code> Prometheus endpoint. This allows you to monitor and alert on resource pressure at the node, pod, and container level.&lt;/p>
&lt;p>The following new metrics are available in Prometheus exposition format via &lt;code>/metrics/cadvisor&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>&lt;code>container_pressure_cpu_stalled_seconds_total&lt;/code>&lt;/li>
&lt;li>&lt;code>container_pressure_cpu_waiting_seconds_total&lt;/code>&lt;/li>
&lt;li>&lt;code>container_pressure_memory_stalled_seconds_total&lt;/code>&lt;/li>
&lt;li>&lt;code>container_pressure_memory_waiting_seconds_total&lt;/code>&lt;/li>
&lt;li>&lt;code>container_pressure_io_stalled_seconds_total&lt;/code>&lt;/li>
&lt;li>&lt;code>container_pressure_io_waiting_seconds_total&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>These metrics, along with the data from the Summary API, provide a granular view of resource pressure, enabling you to pinpoint the source of performance issues and take corrective action (a query sketch follows this list). For example, you can use these metrics to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Identify memory leaks:&lt;/strong> A steadily increasing &lt;code>some&lt;/code> pressure for memory can indicate a memory leak in an application.&lt;/li>
&lt;li>&lt;strong>Optimize resource requests and limits:&lt;/strong> By understanding the resource pressure of your workloads, you can more accurately tune their resource requests and limits.&lt;/li>
&lt;li>&lt;strong>Autoscale workloads:&lt;/strong> You can use PSI metrics to trigger autoscaling events, ensuring that your workloads have the resources they need to perform optimally.&lt;/li>
&lt;/ul>
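&lt;p>As a starting point, here is a hedged PromQL sketch built on the metric names listed above. It assumes the &lt;code>waiting&lt;/code> variants track &lt;code>some&lt;/code> pressure and the &lt;code>stalled&lt;/code> variants track &lt;code>full&lt;/code> pressure; the threshold is illustrative:&lt;/p>
&lt;pre tabindex="0">&lt;code># Approximate fraction of time at least one task was stalled on memory
# ("some" pressure), per container, over the last 5 minutes
rate(container_pressure_memory_waiting_seconds_total[5m])

# Alert-style expression: sustained "full" CPU pressure above 10%
rate(container_pressure_cpu_stalled_seconds_total[5m]) &gt; 0.10
&lt;/code>&lt;/pre>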
&lt;h2 id="how-to-enable-psi-metrics">How to enable PSI metrics&lt;/h2>
&lt;p>To enable PSI metrics in your Kubernetes cluster, you need to:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Ensure your nodes are running a Linux kernel version 4.20 or later and are using cgroup v2.&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Enable the &lt;code>KubeletPSI&lt;/code> feature gate on the kubelet&lt;/strong> (see the sketch after this list).&lt;/li>
&lt;/ol>
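&lt;p>For example, the feature gate can be switched on through the kubelet configuration file. A minimal sketch; merge it into your existing configuration rather than replacing it:&lt;/p>
&lt;pre tabindex="0">&lt;code>apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletPSI: true
&lt;/code>&lt;/pre>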
&lt;p>Once enabled, you can start scraping the &lt;code>/metrics/cadvisor&lt;/code> endpoint with your Prometheus-compatible monitoring solution or query the Summary API to collect and visualize the new PSI metrics. Note that PSI is a Linux-kernel feature, so these metrics are not available on Windows nodes. Your cluster can contain a mix of Linux and Windows nodes, and on the Windows nodes the kubelet does not expose PSI metrics.&lt;/p>
&lt;h2 id="what-s-next">What's next?&lt;/h2>
&lt;p>We are excited to bring PSI metrics to the Kubernetes community and look forward to your feedback. As a beta feature, we are actively working on improving and extending this functionality towards a stable GA release. We encourage you to try it out and share your experiences with us.&lt;/p>
&lt;p>To learn more about PSI metrics, check out the official &lt;a href="https://kubernetes.io/docs/reference/instrumentation/understand-psi-metrics/">Kubernetes documentation&lt;/a>. You can also get involved in the conversation on the &lt;a href="https://kubernetes.slack.com/messages/sig-node">#sig-node&lt;/a> Slack channel.&lt;/p></description></item><item><title>Kubernetes v1.34: Service Account Token Integration for Image Pulls Graduates to Beta</title><link>https://kubernetes.io/blog/2025/09/03/kubernetes-v1-34-sa-tokens-image-pulls-beta/</link><pubDate>Wed, 03 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/03/kubernetes-v1-34-sa-tokens-image-pulls-beta/</guid><description>
&lt;p>The Kubernetes community continues to advance security best practices
by reducing reliance on long-lived credentials.
Following the successful &lt;a href="https://kubernetes.io/blog/2025/05/07/kubernetes-v1-33-wi-for-image-pulls/">alpha release in Kubernetes v1.33&lt;/a>,
&lt;em>Service Account Token Integration for Kubelet Credential Providers&lt;/em>
has now graduated to &lt;strong>beta&lt;/strong> in Kubernetes v1.34,
bringing us closer to eliminating long-lived image pull secrets from Kubernetes clusters.&lt;/p>
&lt;p>This enhancement allows credential providers
to use workload-specific service account tokens to obtain registry credentials,
providing a secure, ephemeral alternative to traditional image pull secrets.&lt;/p>
&lt;h2 id="what-s-new-in-beta">What's new in beta?&lt;/h2>
&lt;p>The beta graduation brings several important changes
that make the feature more robust and production-ready:&lt;/p>
&lt;h3 id="required-cachetype-field">Required &lt;code>cacheType&lt;/code> field&lt;/h3>
&lt;p>&lt;strong>Breaking change from alpha&lt;/strong>: The &lt;code>cacheType&lt;/code> field is &lt;strong>required&lt;/strong>
in the credential provider configuration when using service account tokens.
This field is new in beta and must be specified to ensure proper caching behavior.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># CAUTION: this is not a complete configuration example, just a reference for the &amp;#39;tokenAttributes.cacheType&amp;#39; field.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">tokenAttributes&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">serviceAccountTokenAudience&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;my-registry-audience&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">cacheType&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;ServiceAccount&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># Required field in beta&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requireServiceAccount&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">true&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Choose between two caching strategies:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>&lt;code>Token&lt;/code>&lt;/strong>: Cache credentials per service account token
(use when credential lifetime is tied to the token).
This is useful when the credential provider transforms the service account token into registry credentials
with the same lifetime as the token, or when registries support Kubernetes service account tokens directly.
Note: The kubelet cannot send service account tokens directly to registries;
credential provider plugins are needed to transform tokens into the username/password format expected by registries.&lt;/li>
&lt;li>&lt;strong>&lt;code>ServiceAccount&lt;/code>&lt;/strong>: Cache credentials per service account identity
(use when credentials are valid for all pods using the same service account)&lt;/li>
&lt;/ul>
&lt;h3 id="isolated-image-pull-credentials">Isolated image pull credentials&lt;/h3>
&lt;p>The beta release provides stronger security isolation for container images
when using service account tokens for image pulls.
It ensures that pods can only access images that were pulled using ServiceAccounts they're authorized to use.
This prevents unauthorized access to sensitive container images
and enables granular access control where different workloads can have different registry permissions
based on their ServiceAccount.&lt;/p>
&lt;p>When credential providers use service account tokens,
the system tracks ServiceAccount identity (namespace, name, and &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids">UID&lt;/a>) for each pulled image.
When a pod attempts to use a cached image,
the system verifies that the pod's ServiceAccount matches exactly with the ServiceAccount
that was used to originally pull the image.&lt;/p>
&lt;p>Administrators can revoke access to previously pulled images
by deleting and recreating the ServiceAccount,
which changes the UID and invalidates cached image access.&lt;/p>
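&lt;p>In practice, that revocation is as simple as the following sketch, using &lt;code>registry-access-sa&lt;/code>, the illustrative ServiceAccount name from the example later in this post (note that any annotations on the ServiceAccount would need to be reapplied after recreating it):&lt;/p>
&lt;pre tabindex="0">&lt;code># Deleting and recreating the ServiceAccount changes its UID, which
# invalidates previously recorded image pull authorizations
kubectl delete serviceaccount registry-access-sa --namespace default
kubectl create serviceaccount registry-access-sa --namespace default
&lt;/code>&lt;/pre>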
&lt;p>For more details about this capability,
see the &lt;a href="https://kubernetes.io/docs/concepts/containers/images/#ensureimagepullcredentialverification">image pull credential verification&lt;/a> documentation.&lt;/p>
&lt;h2 id="how-it-works">How it works&lt;/h2>
&lt;h3 id="configuration">Configuration&lt;/h3>
&lt;p>Credential providers opt into using ServiceAccount tokens
by configuring the &lt;code>tokenAttributes&lt;/code> field:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">#&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># CAUTION: this is an example configuration.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># Do not use this for your own cluster!&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic">#&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubelet.config.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>CredentialProviderConfig&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">providers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-credential-provider&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">matchImages&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#b44">&amp;#34;*.myregistry.io/*&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">defaultCacheDuration&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;10m&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>credentialprovider.kubelet.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">tokenAttributes&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">serviceAccountTokenAudience&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;my-registry-audience&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">cacheType&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;ServiceAccount&amp;#34;&lt;/span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># New in beta&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requireServiceAccount&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">true&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">requiredServiceAccountAnnotationKeys&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#b44">&amp;#34;myregistry.io/identity-id&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">optionalServiceAccountAnnotationKeys&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#b44">&amp;#34;myregistry.io/optional-annotation&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="image-pull-flow">Image pull flow&lt;/h3>
&lt;p>At a high level, &lt;code>kubelet&lt;/code> coordinates with your credential provider
and the container runtime as follows:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>When the image is not present locally:&lt;/p>
&lt;ul>
&lt;li>&lt;code>kubelet&lt;/code> checks its credential cache using the configured &lt;code>cacheType&lt;/code>
(&lt;code>Token&lt;/code> or &lt;code>ServiceAccount&lt;/code>)&lt;/li>
&lt;li>If needed, &lt;code>kubelet&lt;/code> requests a ServiceAccount token for the pod's ServiceAccount
and passes it, plus any required annotations, to the credential provider&lt;/li>
&lt;li>The provider exchanges that token for registry credentials
and returns them to &lt;code>kubelet&lt;/code>&lt;/li>
&lt;li>&lt;code>kubelet&lt;/code> caches credentials per the &lt;code>cacheType&lt;/code> strategy
and pulls the image with those credentials&lt;/li>
&lt;li>&lt;code>kubelet&lt;/code> records the ServiceAccount coordinates (namespace, name, UID)
associated with the pulled image for later authorization checks&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>When the image is already present locally:&lt;/p>
&lt;ul>
&lt;li>&lt;code>kubelet&lt;/code> verifies the pod's ServiceAccount coordinates
match the coordinates recorded for the cached image&lt;/li>
&lt;li>If they match exactly, the cached image can be used
without pulling from the registry&lt;/li>
&lt;li>If they differ, &lt;code>kubelet&lt;/code> performs a fresh pull
using credentials for the new ServiceAccount&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>With image pull credential verification enabled:&lt;/p>
&lt;ul>
&lt;li>Authorization is enforced using the recorded ServiceAccount coordinates,
ensuring pods only use images pulled by a ServiceAccount
they are authorized to use&lt;/li>
&lt;li>Administrators can revoke access by deleting and recreating a ServiceAccount;
the UID changes and previously recorded authorization no longer matches&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="audience-restriction">Audience restriction&lt;/h3>
&lt;p>The beta release builds on service account node audience restriction
(beta since v1.33) to ensure &lt;code>kubelet&lt;/code> can only request tokens for authorized audiences.
Administrators configure allowed audiences using RBAC to enable kubelet to request service account tokens for image pulls:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">#&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># CAUTION: this is an example configuration.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># Do not use this for your own cluster!&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic">#&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>rbac.authorization.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ClusterRole&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubelet-credential-provider-audiences&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">rules&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">verbs&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;request-serviceaccounts-token-audience&amp;#34;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">apiGroups&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;my-registry-audience&amp;#34;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resourceNames&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;registry-access-sa&amp;#34;&lt;/span>&lt;span style="color:#008000;font-weight:bold">] # Optional&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>specific SA&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="getting-started-with-beta">Getting started with beta&lt;/h2>
&lt;h3 id="prerequisites">Prerequisites&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Kubernetes v1.34 or later&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Feature gate enabled&lt;/strong>:
&lt;code>KubeletServiceAccountTokenForCredentialProviders=true&lt;/code> (beta, enabled by default)&lt;/li>
&lt;li>&lt;strong>Credential provider support&lt;/strong>:
Update your credential provider to handle ServiceAccount tokens&lt;/li>
&lt;/ol>
&lt;h3 id="migration-from-alpha">Migration from alpha&lt;/h3>
&lt;p>If you're already using the alpha version,
the migration to beta requires minimal changes:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Add &lt;code>cacheType&lt;/code> field&lt;/strong>:
Update your credential provider configuration to include the required &lt;code>cacheType&lt;/code> field&lt;/li>
&lt;li>&lt;strong>Review caching strategy&lt;/strong>:
Choose between &lt;code>Token&lt;/code> and &lt;code>ServiceAccount&lt;/code> cache types based on your provider's behavior&lt;/li>
&lt;li>&lt;strong>Test audience restrictions&lt;/strong>:
Ensure your RBAC configuration, or other cluster authorization rules, will properly restrict token audiences&lt;/li>
&lt;/ol>
&lt;h3 id="example-setup">Example setup&lt;/h3>
&lt;p>Here's a complete example
for setting up a credential provider with service account tokens
(this example assumes your cluster uses RBAC authorization):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">#&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># CAUTION: this is an example configuration.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># Do not use this for your own cluster!&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic">#&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># Service Account with registry annotations&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ServiceAccount&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>registry-access-sa&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">namespace&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>default&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">annotations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">myregistry.io/identity-id&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;user123&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">---&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># RBAC for audience restriction&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>rbac.authorization.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ClusterRole&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>registry-audience-access&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">rules&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">verbs&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;request-serviceaccounts-token-audience&amp;#34;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">apiGroups&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resources&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;my-registry-audience&amp;#34;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">resourceNames&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;registry-access-sa&amp;#34;&lt;/span>&lt;span style="color:#008000;font-weight:bold">] # Optional&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>specific ServiceAccount&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">---&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>rbac.authorization.k8s.io/v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ClusterRoleBinding&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubelet-registry-audience&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">roleRef&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">apiGroup&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>rbac.authorization.k8s.io&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>ClusterRole&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>registry-audience-access&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">subjects&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Group&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>system:nodes&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">apiGroup&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>rbac.authorization.k8s.io&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">---&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># Pod using the ServiceAccount&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">serviceAccountName&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>registry-access-sa&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>my-app&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>myregistry.example/my-app:latest&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="what-s-next">What's next?&lt;/h2>
&lt;p>For Kubernetes v1.35, we (Kubernetes SIG Auth) expect the feature to stay in beta,
and we will continue to solicit feedback.&lt;/p>
&lt;p>You can learn more about this feature
on the &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-credential-provider/#service-account-token-for-image-pulls">service account token for image pulls&lt;/a>
page in the Kubernetes documentation.&lt;/p>
&lt;p>You can also follow along on the
&lt;a href="https://kep.k8s.io/4412">KEP-4412&lt;/a>
to track progress across the coming Kubernetes releases.&lt;/p>
&lt;h2 id="call-to-action">Call to action&lt;/h2>
&lt;p>In this blog post,
I have covered the beta graduation of ServiceAccount token integration
for Kubelet Credential Providers in Kubernetes v1.34.
I discussed the key improvements,
including the required &lt;code>cacheType&lt;/code> field
and enhanced integration with the Ensure Secret Pulled Images feature.&lt;/p>
&lt;p>We have been receiving positive feedback from the community during the alpha phase
and would love to hear more as we stabilize this feature for GA.
In particular, we would like feedback from credential provider implementors
as they integrate with the new beta API and caching mechanisms.
Please reach out to us on the &lt;a href="https://kubernetes.slack.com/archives/C04UMAUC4UA">#sig-auth-authenticators-dev&lt;/a> channel on Kubernetes Slack.&lt;/p>
&lt;h2 id="how-to-get-involved">How to get involved&lt;/h2>
&lt;p>If you are interested in getting involved in the development of this feature,
share feedback, or participate in any other ongoing SIG Auth projects,
please reach out on the &lt;a href="https://kubernetes.slack.com/archives/C0EN96KUY">#sig-auth&lt;/a> channel on Kubernetes Slack.&lt;/p>
&lt;p>You are also welcome to join the bi-weekly &lt;a href="https://github.com/kubernetes/community/blob/master/sig-auth/README.md#meetings">SIG Auth meetings&lt;/a>,
held every other Wednesday.&lt;/p></description></item><item><title>Kubernetes v1.34: Introducing CPU Manager Static Policy Option for Uncore Cache Alignment</title><link>https://kubernetes.io/blog/2025/09/02/kubernetes-v1-34-prefer-align-by-uncore-cache-cpumanager-static-policy-optimization/</link><pubDate>Tue, 02 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/02/kubernetes-v1-34-prefer-align-by-uncore-cache-cpumanager-static-policy-optimization/</guid><description>
&lt;p>A new CPU Manager Static Policy Option called &lt;code>prefer-align-cpus-by-uncorecache&lt;/code> was introduced in Kubernetes v1.32 as an alpha feature, and has graduated to &lt;strong>beta&lt;/strong> in Kubernetes v1.34.
This CPU Manager Policy Option is designed to optimize performance for specific workloads running on processors with a &lt;em>split uncore cache&lt;/em> architecture.
In this article, I'll explain what that means and why it's useful.&lt;/p>
&lt;h2 id="understanding-the-feature">Understanding the feature&lt;/h2>
&lt;h3 id="what-is-uncore-cache">What is uncore cache?&lt;/h3>
&lt;p>Until relatively recently, nearly all mainstream computer processors had a
monolithic last-level cache that was shared across every core in a multi-core
CPU package.
This monolithic cache is also referred to as &lt;em>uncore cache&lt;/em>
(because it is not linked to a specific core), or as Level 3 cache.
As well as the Level 3 cache, there is other cache, commonly called Level 1 and Level 2 cache,
that &lt;strong>is&lt;/strong> associated with a specific CPU core.&lt;/p>
&lt;p>In order to reduce access latency between the CPU cores and their cache, recent AMD64 and ARM
architecture based processors have introduced a &lt;em>split uncore cache&lt;/em> architecture,
where the last-level-cache is divided into multiple physical caches,
that are aligned to specific CPU groupings within the physical package.
The shorter distances within the CPU package help to reduce latency.
&lt;img alt="Diagram showing monolithic cache on the left and split cache on the right" src="https://kubernetes.io/blog/2025/09/02/kubernetes-v1-34-prefer-align-by-uncore-cache-cpumanager-static-policy-optimization/mono_vs_split_uncore.png">&lt;/p>
&lt;p>Kubernetes is able to place workloads in a way that accounts for the cache
topology within the CPU package(s).&lt;/p>
&lt;h3 id="cache-aware-workload-placement">Cache-aware workload placement&lt;/h3>
&lt;p>The matrix below shows the &lt;a href="https://github.com/nviennot/core-to-core-latency">CPU-to-CPU latency&lt;/a> measured in nanoseconds (lower is better) when
passing a packet between CPUs, via its cache coherence protocol on a processor that
uses split uncore cache.
In this example, the processor package consists of 2 uncore caches.
Each uncore cache serves 8 CPU cores.
&lt;img alt="Table showing CPU-to-CPU latency figures" src="https://kubernetes.io/blog/2025/09/02/kubernetes-v1-34-prefer-align-by-uncore-cache-cpumanager-static-policy-optimization/c2c_latency.png">
Blue entries in the matrix represent latency between CPUs sharing the same uncore cache, while grey entries indicate latency between CPUs corresponding to different uncore caches. Latency between CPUs that correspond to different caches is higher than the latency between CPUs that belong to the same cache.&lt;/p>
&lt;p>With &lt;code>prefer-align-cpus-by-uncorecache&lt;/code> enabled, the
&lt;a href="https://kubernetes.io/docs/concepts/policy/node-resource-managers/#static-policy">static CPU Manager&lt;/a> attempts to allocates CPU resources for a container, such that all CPUs assigned to a container share the same uncore cache.
This policy operates on a best-effort basis, aiming to minimize the distribution of a container's CPU resources across uncore caches, based on the
container's requirements, and accounting for allocatable resources on the node.&lt;/p>
&lt;p>By running a workload, where it can, on a set of CPUs that use the smallest feasible number of uncore caches, applications benefit from reduced cache latency (as seen in the matrix above),
and from reduced contention against other workloads, which can result in overall higher throughput.
The benefit only shows up if your nodes use a split uncore cache topology for their processors.&lt;/p>
&lt;p>The diagram below illustrates uncore cache alignment when the feature is enabled.&lt;/p>
&lt;p>&lt;img alt="Diagram showing an example workload CPU assignment, default static policy, and with prefer-align-cpus-by-uncorecache" src="https://kubernetes.io/blog/2025/09/02/kubernetes-v1-34-prefer-align-by-uncore-cache-cpumanager-static-policy-optimization/cache-align-diagram.png">&lt;/p>
&lt;p>By default, Kubernetes does not account for uncore cache topology; containers are assigned CPU resources using a packed methodology.
As a result, Container 1 and Container 2 can experience a noisy neighbor impact due to
cache access contention on Uncore Cache 0. Additionally, Container 2 will have CPUs distributed across both caches, which can introduce cross-cache latency.&lt;/p>
&lt;p>With &lt;code>prefer-align-cpus-by-uncorecache&lt;/code> enabled, each container is isolated on an individual cache. This resolves the cache contention between the containers and minimizes the cache latency for the CPUs being utilized.&lt;/p>
&lt;h2 id="use-cases">Use cases&lt;/h2>
&lt;p>Common use cases can include telco applications like vRAN, Mobile Packet Core, and Firewalls. It's important to note that the optimization provided by &lt;code>prefer-align-cpus-by-uncorecache&lt;/code> can be dependent on the workload. For example, applications that are memory bandwidth bound may not benefit from uncore cache alignment, as utilizing more uncore caches can increase memory bandwidth access.&lt;/p>
&lt;h2 id="enabling-the-feature">Enabling the feature&lt;/h2>
&lt;p>To enable this feature, set the CPU Manager Policy to &lt;code>static&lt;/code> and enable the CPU Manager Policy Options with &lt;code>prefer-align-cpus-by-uncorecache&lt;/code>.&lt;/p>
&lt;p>For Kubernetes 1.34, the feature is in the beta stage and requires the &lt;code>CPUManagerPolicyBetaOptions&lt;/code>
&lt;a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/">feature gate&lt;/a> to also be enabled.&lt;/p>
&lt;p>Append the following to the kubelet configuration file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>KubeletConfiguration&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubelet.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">featureGates&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>...&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">CPUManagerPolicyBetaOptions&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">true&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">cpuManagerPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;static&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">cpuManagerPolicyOptions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">prefer-align-cpus-by-uncorecache&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;true&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">reservedSystemCPUs&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;0&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#00f;font-weight:bold">...&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you're making this change to an existing node, remove the &lt;code>cpu_manager_state&lt;/code> file and then restart kubelet.&lt;/p>
&lt;p>&lt;code>prefer-align-cpus-by-uncorecache&lt;/code> can be enabled on nodes with a monolithic uncore cache processor. On such nodes, the option behaves as a best-effort socket alignment, packing CPU resources onto the socket much like the default static CPU Manager policy.&lt;/p>
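&lt;p>Keep in mind that the static CPU Manager only grants exclusive CPUs to
containers in &lt;code>Guaranteed&lt;/code> QoS Pods that request whole CPUs, so uncore cache
alignment applies to workloads like the following minimal sketch (the image name
is a placeholder):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: uncore-aligned-app
spec:
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
    resources:
      # Integer CPU requests equal to limits give the Pod Guaranteed QoS and the
      # container exclusive CPUs. With prefer-align-cpus-by-uncorecache, the
      # kubelet tries to pick all 8 CPUs from a single uncore cache if it can.
      requests:
        cpu: "8"
        memory: 4Gi
      limits:
        cpu: "8"
        memory: 4Gi
&lt;/code>&lt;/pre>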
&lt;h2 id="further-reading">Further reading&lt;/h2>
&lt;p>See &lt;a href="https://kubernetes.io/docs/concepts/policy/node-resource-managers/">Node Resource Managers&lt;/a> to learn more about the CPU Manager and the available policies.&lt;/p>
&lt;p>Reference the documentation for &lt;code>prefer-align-cpus-by-uncorecache&lt;/code> &lt;a href="https://kubernetes.io/docs/concepts/policy/node-resource-managers/#prefer-align-cpus-by-uncorecache">here&lt;/a>.&lt;/p>
&lt;p>Please see the &lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4800-cpumanager-split-uncorecache">Kubernetes Enhancement Proposal&lt;/a> for more information on how &lt;code>prefer-align-cpus-by-uncorecache&lt;/code> is implemented.&lt;/p>
&lt;h2 id="getting-involved">Getting involved&lt;/h2>
&lt;p>This feature is driven by &lt;a href="https://github.com/Kubernetes/community/blob/master/sig-node/README.md">SIG Node&lt;/a>. If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please attend the SIG Node meeting for more details.&lt;/p></description></item><item><title>Kubernetes v1.34: DRA has graduated to GA</title><link>https://kubernetes.io/blog/2025/09/01/kubernetes-v1-34-dra-updates/</link><pubDate>Mon, 01 Sep 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/09/01/kubernetes-v1-34-dra-updates/</guid><description>
&lt;p>Kubernetes 1.34 is here, and it has brought a huge wave of enhancements for Dynamic Resource Allocation (DRA)! This
release marks a major milestone with many APIs in the &lt;code>resource.k8s.io&lt;/code> group graduating to General Availability (GA),
unlocking the full potential of how you manage devices on Kubernetes. On top of that, several key features have
moved to beta, and a fresh batch of new alpha features promise even more expressiveness and flexibility.&lt;/p>
&lt;p>Let's dive into what's new for DRA in Kubernetes 1.34!&lt;/p>
&lt;h2 id="the-core-of-dra-is-now-ga">The core of DRA is now GA&lt;/h2>
&lt;p>The headline feature of the v1.34 release is that the core of DRA has graduated to General Availability.&lt;/p>
&lt;p>Kubernetes &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/">Dynamic Resource Allocation (DRA)&lt;/a> provides
a flexible framework for managing specialized hardware and infrastructure resources, such as GPUs or FPGAs. DRA
provides APIs that enable each workload to specify the properties of the devices it needs, while leaving it to the
scheduler to allocate actual devices, allowing increased reliability and improved utilization of expensive hardware.&lt;/p>
&lt;p>With the graduation to GA, DRA is stable and will be part of Kubernetes for the long run. The community can still
expect a steady stream of new features being added to DRA over the next several Kubernetes releases, but they will
not make any breaking changes to DRA. So users and developers of DRA drivers can start adopting DRA with confidence.&lt;/p>
&lt;p>Starting with Kubernetes 1.34, DRA is enabled by default; the DRA features that have reached beta are &lt;strong>also&lt;/strong> enabled by default.
That's because the default API version for DRA is now the stable &lt;code>v1&lt;/code> version, and not the earlier versions
(e.g., &lt;code>v1beta1&lt;/code> or &lt;code>v1beta2&lt;/code>) that needed explicit opt-in.&lt;/p>
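&lt;p>As a taste of the stable API, here is a minimal sketch of a ResourceClaim
using the &lt;code>v1&lt;/code> request layout, plus a Pod that consumes it. The
&lt;code>gpu.example.com&lt;/code> DeviceClass and the image are placeholders; a real
DeviceClass would be installed by a DRA driver:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com   # placeholder DeviceClass
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-consumer
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu          # bind the claim above to this Pod
  containers:
  - name: app
    image: registry.example/app:latest     # placeholder image
    resources:
      claims:
      - name: gpu                          # refers to spec.resourceClaims entry
&lt;/code>&lt;/pre>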
&lt;h2 id="features-promoted-to-beta">Features promoted to beta&lt;/h2>
&lt;p>Several powerful features have been promoted to beta, adding more control, flexibility, and observability to resource
management with DRA.&lt;/p>
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#admin-access">Admin access labelling&lt;/a> has been updated.
In v1.34, you can restrict admin-level device access to the people (or software) authorized to use it. This is meant
as a way to avoid privilege escalation if a DRA driver grants additional privileges when admin access is requested
and to avoid accessing devices which are in use by normal applications, potentially in another namespace.
The restriction works by ensuring that only users with access to a namespace with the
&lt;code>resource.k8s.io/admin-access: &amp;quot;true&amp;quot;&lt;/code> label are authorized to create
ResourceClaim or ResourceClaimTemplates objects with the &lt;code>adminAccess&lt;/code> field set to true. This ensures that non-admin users cannot misuse the feature.&lt;/p>
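&lt;p>For example, an administrator might label a dedicated namespace and create
admin-access claims only there. The sketch below assumes the &lt;code>v1&lt;/code> request
layout and a placeholder DeviceClass name:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Namespace
metadata:
  name: dra-admin
  labels:
    resource.k8s.io/admin-access: "true"   # only claims here may set adminAccess
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: diagnostics
  namespace: dra-admin
spec:
  devices:
    requests:
    - name: probe
      exactly:
        deviceClassName: gpu.example.com   # placeholder DeviceClass
        adminAccess: true                  # may see devices in use elsewhere
&lt;/code>&lt;/pre>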
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#prioritized-list">Prioritized list&lt;/a> lets users specify
a list of acceptable devices for their workloads, rather than just a single type of device. So while the workload
might run best on a single high-performance GPU, it might also be able to run on 2 mid-level GPUs. The scheduler will
attempt to satisfy the alternatives in the list in order, so the workload will be allocated the best set of devices
available on the node.&lt;/p>
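&lt;p>Expressed as a claim, the GPU example from the previous paragraph might look
like the sketch below, assuming the &lt;code>v1&lt;/code> request layout and placeholder
DeviceClass names; the scheduler tries the alternatives in order:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-with-fallback
spec:
  devices:
    requests:
    - name: gpu
      firstAvailable:                          # alternatives, best first
      - name: high-end
        deviceClassName: big-gpu.example.com   # placeholder
      - name: mid-level
        deviceClassName: mid-gpu.example.com   # placeholder
        allocationMode: ExactCount
        count: 2                               # two mid-level GPUs instead
&lt;/code>&lt;/pre>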
&lt;p>The kubelet's API has been updated to report on Pod resources allocated through DRA. This allows node monitoring agents
to know the allocated DRA resources for Pods on a node and makes it possible to use the DRA information in the PodResources API
to develop new features and integrations.&lt;/p>
&lt;h2 id="new-alpha-features">New alpha features&lt;/h2>
&lt;p>Kubernetes 1.34 also introduces several new alpha features that give us a glimpse into the future of resource management with DRA.&lt;/p>
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#extended-resource">Extended resource mapping&lt;/a> support in DRA allows
cluster administrators to advertise DRA-managed resources as &lt;em>extended resources&lt;/em>, allowing developers to consume them using
the familiar, simpler request syntax while still benefiting from dynamic allocation. This makes it possible for existing
workloads to start using DRA without modifications, simplifying the transition to DRA for both application developers and
cluster administrators.&lt;/p>
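&lt;p>On the workload side, that familiar syntax is an ordinary extended resource
request. How the resource name maps onto a DRA DeviceClass is configured by the
cluster administrator through the alpha API (not shown here), so treat
&lt;code>example.com/gpu&lt;/code> below as a placeholder:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: extended-resource-consumer
spec:
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
    resources:
      requests:
        example.com/gpu: 1               # satisfied by DRA behind the scenes
      limits:
        example.com/gpu: 1               # extended resources require requests == limits
&lt;/code>&lt;/pre>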
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#consumable-capacity">Consumable capacity&lt;/a> introduces a flexible
device sharing model where multiple, independent resource claims from unrelated
pods can each be allocated a share of the same underlying physical device. This new capability is managed through optional,
administrator-defined sharing policies that govern how a device's total capacity is divided and enforced by the platform for
each request. This allows for sharing of devices in scenarios where pre-defined partitions are not viable.&lt;/p>
&lt;p>For more information, see &lt;a href="https://kubernetes.io/blog/2025/09/18/kubernetes-v1-34-dra-consumable-capacity/">Kubernetes v1.34: DRA Consumable Capacity&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#binding-conditions">Binding conditions&lt;/a> improve scheduling
reliability for certain classes of devices by allowing the Kubernetes scheduler to delay binding a pod to a node until its
required external resources, such as attachable devices or FPGAs, are confirmed to be fully prepared. This prevents premature
pod assignments that could lead to failures and ensures more robust, predictable scheduling by explicitly modeling resource
readiness before the pod is committed to a node.&lt;/p>
&lt;p>&lt;em>Resource health status&lt;/em> for DRA improves observability by exposing the health status of devices allocated to a Pod via Pod Status.
This works whether the device is allocated through DRA or Device Plugin. This makes it easier to understand the cause of an
unhealthy device and respond properly.&lt;/p>
&lt;p>For more information, see &lt;a href="https://kubernetes.io/blog/2025/09/17/kubernetes-v1-34-pods-report-dra-resource-health/">Kubernetes v1.34: Pods Report DRA Resource Health&lt;/a>.&lt;/p>
&lt;h2 id="what-s-next">What’s next?&lt;/h2>
&lt;p>While DRA got promoted to GA this cycle, the hard work on DRA doesn't stop. There are several features in alpha and beta that
we plan to bring to GA in the next couple of releases, and we will continue to improve the performance, scalability,
and reliability of DRA. So expect an equally ambitious set of features in DRA for the 1.35 release.&lt;/p>
&lt;h2 id="getting-involved">Getting involved&lt;/h2>
&lt;p>A good starting point is joining the WG Device Management &lt;a href="https://kubernetes.slack.com/archives/C0409NGC1TK">Slack channel&lt;/a> and &lt;a href="https://docs.google.com/document/d/1qxI87VqGtgN7EAJlqVfxx86HGKEAc2A3SKru8nJHNkQ/edit?tab=t.0#heading=h.tgg8gganowxq">meetings&lt;/a>, which happen at US/EU- and EU/APAC-friendly time slots.&lt;/p>
&lt;p>Not all enhancement ideas are tracked as issues yet, so come talk to us if you want to help or have some ideas yourself! We have work to do at all levels, from difficult core changes to usability enhancements in kubectl, which could be picked up by newcomers.&lt;/p>
&lt;h2 id="acknowledgments">Acknowledgments&lt;/h2>
&lt;p>A huge thanks to the new contributors to DRA this cycle:&lt;/p>
&lt;ul>
&lt;li>Alay Patel (&lt;a href="https://github.com/alaypatel07">alaypatel07&lt;/a>)&lt;/li>
&lt;li>Gaurav Kumar Ghildiyal (&lt;a href="https://github.com/gauravkghildiyal">gauravkghildiyal&lt;/a>)&lt;/li>
&lt;li>JP (&lt;a href="https://github.com/Jpsassine">Jpsassine&lt;/a>)&lt;/li>
&lt;li>Kobayashi Daisuke (&lt;a href="https://github.com/KobayashiD27">KobayashiD27&lt;/a>)&lt;/li>
&lt;li>Laura Lorenz (&lt;a href="https://github.com/lauralorenz">lauralorenz&lt;/a>)&lt;/li>
&lt;li>Sunyanan Choochotkaew (&lt;a href="https://github.com/sunya-ch">sunya-ch&lt;/a>)&lt;/li>
&lt;li>Swati Gupta (&lt;a href="https://github.com/guptaNswati">guptaNswati&lt;/a>)&lt;/li>
&lt;li>Yu Liao (&lt;a href="https://github.com/yliaog">yliaog&lt;/a>)&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.34: Finer-Grained Control Over Container Restarts</title><link>https://kubernetes.io/blog/2025/08/29/kubernetes-v1-34-per-container-restart-policy/</link><pubDate>Fri, 29 Aug 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/08/29/kubernetes-v1-34-per-container-restart-policy/</guid><description>
&lt;p>The release of Kubernetes 1.34 introduces a new alpha feature
that gives you more granular control over container restarts within a Pod. This
feature, named &lt;strong>Container Restart Policy and Rules&lt;/strong>, allows you to specify a
restart policy for each container individually, overriding the Pod's global
restart policy. In addition, it also allows you to conditionally restart
individual containers based on their exit codes. This feature is available
behind the alpha feature gate &lt;code>ContainerRestartRules&lt;/code>.&lt;/p>
&lt;p>This has been a long-requested feature. Let's dive into how it works and how you
can use it.&lt;/p>
&lt;h2 id="the-problem-with-a-single-restart-policy">The problem with a single restart policy&lt;/h2>
&lt;p>Before this feature, the &lt;code>restartPolicy&lt;/code> was set at the Pod level. This meant
that all containers in a Pod shared the same restart policy (&lt;code>Always&lt;/code>,
&lt;code>OnFailure&lt;/code>, or &lt;code>Never&lt;/code>). While this works for many use cases, it can be
limiting in others.&lt;/p>
&lt;p>For example, consider a Pod with a main application container and an init
container that performs some initial setup. You might want the main container
to always restart on failure, but the init container should only run once and
never restart. With a single Pod-level restart policy, this wasn't possible.&lt;/p>
&lt;h2 id="introducing-per-container-restart-policies">Introducing per-container restart policies&lt;/h2>
&lt;p>With the new &lt;code>ContainerRestartRules&lt;/code> feature gate, you can now specify a
&lt;code>restartPolicy&lt;/code> for each container in your Pod's spec. You can also define
&lt;code>restartPolicyRules&lt;/code> to control restarts based on exit codes. This gives you
the fine-grained control you need to handle complex scenarios.&lt;/p>
&lt;h2 id="use-cases">Use cases&lt;/h2>
&lt;p>Let's look at some real-life use cases where per-container restart policies can
be beneficial.&lt;/p>
&lt;h3 id="in-place-restarts-for-training-jobs">In-place restarts for training jobs&lt;/h3>
&lt;p>In ML research, it's common to orchestrate a large number of long-running AI/ML
training workloads. In these scenarios, workload failures are unavoidable. When
a workload fails with a retriable exit code, you want the container to restart
quickly without rescheduling the entire Pod, which consumes a significant amount
of time and resources. Restarting the failed container &amp;quot;in-place&amp;quot; is critical
for better utilization of compute resources. The container should only restart
&amp;quot;in-place&amp;quot; if it failed due to a retriable error; otherwise, the container and
Pod should terminate and possibly be rescheduled.&lt;/p>
&lt;p>This can now be achieved with container-level &lt;code>restartPolicyRules&lt;/code>. The workload
can exit with different codes to represent retriable and non-retriable errors.
With &lt;code>restartPolicyRules&lt;/code>, the workload can be restarted in-place quickly, but
only when the error is retriable.&lt;/p>
&lt;h3 id="try-once-init-containers">Try-once init containers&lt;/h3>
&lt;p>Init containers are often used to perform initialization work for the main
container, such as setting up environments and credentials. Sometimes, you want
the main container to always be restarted, but you don't want to retry
initialization if it fails.&lt;/p>
&lt;p>With a container-level &lt;code>restartPolicy&lt;/code>, this is now possible. The init container
can be executed only once, and its failure would be considered a Pod failure. If
the initialization succeeds, the main container can always be restarted.&lt;/p>
&lt;h3 id="pods-with-multiple-containers">Pods with multiple containers&lt;/h3>
&lt;p>For Pods that run multiple containers, you might have different restart
requirements for each container. Some containers might have a clear definition
of success and should only be restarted on failure. Others might need to be
always restarted.&lt;/p>
&lt;p>This is now possible with a container-level &lt;code>restartPolicy&lt;/code>, allowing individual
containers to have different restart policies.&lt;/p>
&lt;h2 id="how-to-use-it">How to use it&lt;/h2>
&lt;p>To use this new feature, you need to enable the &lt;code>ContainerRestartRules&lt;/code> feature
gate on your Kubernetes cluster control-plane and worker nodes running
Kubernetes 1.34+. Once enabled, you can specify the &lt;code>restartPolicy&lt;/code> and
&lt;code>restartPolicyRules&lt;/code> fields in your container definitions.&lt;/p>
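&lt;p>For example, on each node the gate can be switched on in the kubelet
configuration file; control plane components take the equivalent
&lt;code>--feature-gates=ContainerRestartRules=true&lt;/code> command-line flag:&lt;/p>
&lt;pre>&lt;code class="language-yaml">kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
featureGates:
  ContainerRestartRules: true
&lt;/code>&lt;/pre>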
&lt;p>Here are some examples:&lt;/p>
&lt;h3 id="example-1-restarting-on-specific-exit-codes">Example 1: Restarting on specific exit codes&lt;/h3>
&lt;p>In this example, the container should restart if and only if it fails with a
retriable error, represented by exit code 42.&lt;/p>
&lt;p>To achieve this, the container has &lt;code>restartPolicy: Never&lt;/code>, and a restart
policy rule that tells Kubernetes to restart the container in-place if it exits
with code 42.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>restart-on-exit-codes&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">annotations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">kubernetes.io/description&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;This Pod only restart the container only when it exits with code 42.&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Never&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>restart-on-exit-codes&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>docker.io/library/busybox:1.28&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#39;sh&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;-c&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;sleep 60 &amp;amp;&amp;amp; exit 0&amp;#39;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Never &lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># Container restart policy must be specified if rules are specified&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicyRules&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># Only restart the container if it exits with code 42&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">action&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Restart&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">exitCodes&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">operator&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>In&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">values&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#666">42&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="example-2-a-try-once-init-container">Example 2: A try-once init container&lt;/h3>
&lt;p>In this example, the Pod's main container should always be restarted once the initialization succeeds.
However, the initialization should only be tried once.&lt;/p>
&lt;p>To achieve this, the Pod has an &lt;code>Always&lt;/code> restart policy. The &lt;code>init-once&lt;/code>
init container will only try once. If it fails, the Pod will fail. This allows
the Pod to fail if the initialization fails, but to keep running once the
initialization succeeds.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>fail-pod-if-init-fails&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">annotations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">kubernetes.io/description&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;This Pod has an init container that runs only once. After initialization succeeds, the main container will always be restarted.&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Always&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">initContainers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>init-once &lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># This init container will only try once. If it fails, the Pod will fail.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>docker.io/library/busybox:1.28&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#39;sh&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;-c&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;echo &amp;#34;Failing initialization&amp;#34; &amp;amp;&amp;amp; sleep 10 &amp;amp;&amp;amp; exit 1&amp;#39;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Never&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>main-container&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#080;font-style:italic"># This container will always be restarted once initialization succeeds.&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>docker.io/library/busybox:1.28&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#39;sh&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;-c&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;sleep 1800 &amp;amp;&amp;amp; exit 0&amp;#39;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="example-3-containers-with-different-restart-policies">Example 3: Containers with different restart policies&lt;/h3>
&lt;p>In this example, there are two containers with different restart requirements. One
should always be restarted, while the other should only be restarted on failure.&lt;/p>
&lt;p>This is achieved by using a different container-level &lt;code>restartPolicy&lt;/code> on each of
the two containers.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">metadata&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#a2f;font-weight:bold">on&lt;/span>-failure-pod&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">annotations&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">kubernetes.io/description&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;This Pod has two containers with different restart policies.&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">spec&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">containers&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>restart-on-failure&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>docker.io/library/busybox:1.28&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#39;sh&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;-c&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;echo &amp;#34;Not restarting after success&amp;#34; &amp;amp;&amp;amp; sleep 10 &amp;amp;&amp;amp; exit 0&amp;#39;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>OnFailure&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>restart-always&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">image&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>docker.io/library/busybox:1.28&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#39;sh&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;-c&amp;#39;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#39;echo &amp;#34;Always restarting&amp;#34; &amp;amp;&amp;amp; sleep 1800 &amp;amp;&amp;amp; exit 0&amp;#39;&lt;/span>]&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">restartPolicy&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Always&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="learn-more">Learn more&lt;/h2>
&lt;ul>
&lt;li>Read the documentation for
&lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-restart-rules">container restart policy&lt;/a>.&lt;/li>
&lt;li>Read the KEP for the
&lt;a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5307-container-restart-policy">Container Restart Rules&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="roadmap">Roadmap&lt;/h2>
&lt;p>More actions and signals to restart Pods and containers are coming! Notably,
there are plans to add support for restarting the entire Pod. Planning and
discussions on these features are in progress. Feel free to share feedback or
requests with the SIG Node community!&lt;/p>
&lt;h2 id="your-feedback-is-welcome">Your feedback is welcome!&lt;/h2>
&lt;p>This is an alpha feature, and the Kubernetes project would love to hear your feedback.
Please try it out. This feature is driven by the
&lt;a href="https://github.com/Kubernetes/community/blob/master/sig-node/README.md">SIG Node&lt;/a>.
If you are interested in helping develop this feature, sharing feedback, or
participating in any other ongoing SIG Node projects, please reach out to the
SIG Node community!&lt;/p>
&lt;p>You can reach SIG Node by several means:&lt;/p>
&lt;ul>
&lt;li>Slack: &lt;a href="https://kubernetes.slack.com/messages/sig-node">#sig-node&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://groups.google.com/forum/#!forum/kubernetes-sig-node">Mailing list&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/community/labels/sig%2Fnode">Open Community Issues/PRs&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Kubernetes v1.34: User preferences (kuberc) are available for testing in kubectl 1.34</title><link>https://kubernetes.io/blog/2025/08/28/kubernetes-v1-34-kubectl-kuberc-beta/</link><pubDate>Thu, 28 Aug 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/08/28/kubernetes-v1-34-kubectl-kuberc-beta/</guid><description>
&lt;p>Have you ever wished you could enable &lt;a href="https://kep.k8s.io/3895">interactive delete&lt;/a>,
by default, in &lt;code>kubectl&lt;/code>? Or maybe, you'd like to have custom aliases defined,
but not necessarily &lt;a href="https://github.com/ahmetb/kubectl-aliases">generate hundreds of them manually&lt;/a>?
Look no further. &lt;a href="https://git.k8s.io/community/sig-cli/">SIG-CLI&lt;/a>
has been working hard to add &lt;a href="https://kep.k8s.io/3104">user preferences to kubectl&lt;/a>,
and we are happy to announce that this functionality is reaching beta as part
of the Kubernetes v1.34 release.&lt;/p>
&lt;h2 id="how-it-works">How it works&lt;/h2>
&lt;p>A full description of this functionality is available &lt;a href="https://kubernetes.io/docs/reference/kubectl/kuberc/">in our official documentation&lt;/a>,
but this blog post will answer both of the questions from the beginning of this
article.&lt;/p>
&lt;p>Before we dive into details, let's quickly cover what the user preferences file
looks like and where to place it. By default, &lt;code>kubectl&lt;/code> will look for a &lt;code>kuberc&lt;/code>
file in your default &lt;a href="https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/">kubeconfig&lt;/a>
directory, which is &lt;code>$HOME/.kube&lt;/code>. Alternatively, you can specify this location
using the &lt;code>--kuberc&lt;/code> option or the &lt;code>KUBERC&lt;/code> environment variable.&lt;/p>
&lt;p>Just like every Kubernetes manifest, the &lt;code>kuberc&lt;/code> file starts with an &lt;code>apiVersion&lt;/code>
and a &lt;code>kind&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">apiVersion&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kubectl.config.k8s.io/v1beta1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">kind&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Preference&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#080;font-style:italic"># the user preferences will follow here&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="defaults">Defaults&lt;/h3>
&lt;p>Let's start by setting default values for &lt;code>kubectl&lt;/code> command options. Our goal
is to always use interactive delete, which means we want the &lt;code>--interactive&lt;/code>
option for &lt;code>kubectl delete&lt;/code> to always be set to &lt;code>true&lt;/code>. This can be achieved
with the following addition to our &lt;code>kuberc&lt;/code> file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">defaults&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>delete&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">options&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>interactive&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">default&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;true&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In the example above, I'm introducing the &lt;code>defaults&lt;/code> section, which allows users to
define default values for &lt;code>kubectl&lt;/code> options. In this case, we're setting the
&lt;code>--interactive&lt;/code> option for &lt;code>kubectl delete&lt;/code> to &lt;code>true&lt;/code> by default. This default
can be overridden by explicitly providing a different value, such as
&lt;code>kubectl delete --interactive=false&lt;/code>, in which case the explicit option takes
precedence.&lt;/p>
&lt;p>Another default highly encouraged by SIG-CLI is using &lt;a href="https://kubernetes.io/docs/reference/using-api/server-side-apply/">Server-Side Apply&lt;/a>.
To do so, you can add the following snippet to your preferences:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># continuing defaults section&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>apply&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">options&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>server-side&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">default&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;true&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="aliases">Aliases&lt;/h3>
&lt;p>The ability to define aliases saves precious seconds when typing
commands. You most likely already have a shell alias defined for &lt;code>kubectl&lt;/code>, because
typing seven letters takes longer than just pressing &lt;code>k&lt;/code>.&lt;/p>
&lt;p>For this reason, the ability to define aliases was a must-have when we decided
to implement user preferences, alongside defaulting. To define an alias for any
of the built-in commands, expand your &lt;code>kuberc&lt;/code> file with the following addition:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">aliases&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>gns&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>get&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">prependArgs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- namespace&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">options&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>output&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">default&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>json&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>There's a lot going on above, so let me break this down. First, we're introducing
a new section: &lt;code>aliases&lt;/code>. Here, we're defining a new alias &lt;code>gns&lt;/code>, which is mapped
to the &lt;code>get&lt;/code> command. Next, we're defining arguments (the &lt;code>namespace&lt;/code> resource)
that will be inserted right after the command name. Additionally, we're setting the
&lt;code>--output=json&lt;/code> option for this alias. The structure of the &lt;code>options&lt;/code> block is identical
to the one in the &lt;code>defaults&lt;/code> section.&lt;/p>
&lt;p>You probably noticed that we've introduced a mechanism for prepending arguments,
and you might wonder if there is a complementary setting for appending them (in
other words, adding to the end of the command, after user-provided arguments).
This can be achieved through the &lt;code>appendArgs&lt;/code> block, shown below:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic"># continuing aliases section&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>runx&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">command&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>run&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">options&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>image&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">default&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>busybox&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>namespace&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">default&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>test-ns&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">appendArgs&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- --&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- custom-arg&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Here, we're introducing another alias: &lt;code>runx&lt;/code>, which invokes the &lt;code>kubectl run&lt;/code> command,
passing &lt;code>--image&lt;/code> and &lt;code>--namespace&lt;/code> options with predefined values, and also
appending &lt;code>--&lt;/code> and &lt;code>custom-arg&lt;/code> at the end of the invocation.&lt;/p>
&lt;h2 id="debugging">Debugging&lt;/h2>
&lt;p>We hope that &lt;code>kubectl&lt;/code> user preferences will open up new possibilities for our users.
Whenever you're in doubt, feel free to run &lt;code>kubectl&lt;/code> with increased verbosity.
At &lt;code>-v=5&lt;/code>, you should get all the debugging information this feature can emit,
which will be crucial when reporting issues.&lt;/p>
&lt;p>To learn more, I encourage you to read through &lt;a href="https://kubernetes.io/docs/reference/kubectl/kuberc/">our official documentation&lt;/a>
and the &lt;a href="https://git.k8s.io/enhancements/keps/sig-cli/3104-introduce-kuberc/README.md">actual proposal&lt;/a>.&lt;/p>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>The kubectl user preferences feature has reached beta, and we are very interested
in your feedback. We'd love to hear what you like about it and what problems
you'd like to see it solve. Feel free to join the &lt;a href="https://kubernetes.slack.com/archives/C2GL57FJ4">SIG-CLI Slack channel&lt;/a>,
or open an issue against the &lt;a href="https://git.k8s.io/kubectl/">kubectl repository&lt;/a>.
You can also join us at our &lt;a href="https://git.k8s.io/community/sig-cli/#meetings">community meetings&lt;/a>,
which happen every other Wednesday, and share your stories with us.&lt;/p></description></item><item><title>Kubernetes v1.34: Of Wind &amp; Will (O' WaW)</title><link>https://kubernetes.io/blog/2025/08/27/kubernetes-v1-34-release/</link><pubDate>Wed, 27 Aug 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/08/27/kubernetes-v1-34-release/</guid><description>
&lt;p>&lt;strong>Editors:&lt;/strong> Agustina Barbetta, Alejandro Josue Leon Bellido, Graziano Casto, Melony Qin, Dipesh Rawat&lt;/p>
&lt;p>Similar to previous releases, the release of Kubernetes v1.34 introduces new stable, beta, and alpha features. The consistent delivery of high-quality releases underscores the strength of our development cycle and the vibrant support from our community.&lt;/p>
&lt;p>This release consists of 58 enhancements. Of those enhancements, 23 have graduated to Stable, 22 have entered Beta, and 13 have entered Alpha.&lt;/p>
&lt;p>There are also some &lt;a href="#deprecations-and-removals">deprecations and removals&lt;/a> in this release; make sure to read about those.&lt;/p>
&lt;h2 id="release-theme-and-logo">Release theme and logo&lt;/h2>
&lt;figure class="release-logo ">
&lt;img src="https://kubernetes.io/blog/2025/08/27/kubernetes-v1-34-release/k8s-v1.34.png"
alt="Kubernetes v1.34 logo: Three bears sail a wooden ship with a flag featuring a paw and a helm symbol on the sail, as wind blows across the ocean"/>
&lt;/figure>
&lt;p>A release powered by the wind around us — and the will within us.&lt;/p>
&lt;p>Every release cycle, we inherit winds that we don't really control — the state
of our tooling, documentation, and the historical quirks of our project.
Sometimes these winds fill our sails, sometimes they push us sideways or die
down.&lt;/p>
&lt;p>What keeps Kubernetes moving isn't the perfect winds, but the will of our
sailors who adjust the sails, man the helm, chart the courses and keep the ship
steady. The release happens not because conditions are always ideal, but because
of the people who build it, the people who release it, and the bears&lt;sup>
^&lt;/sup>, cats, dogs, wizards, and curious minds who keep Kubernetes sailing
strong — no matter which way the wind blows.&lt;/p>
&lt;p>This release, &lt;strong>Of Wind &amp;amp; Will (O' WaW)&lt;/strong>, honors the winds that have shaped us,
and the will that propels us forward.&lt;/p>
&lt;p>&lt;sub>^ Oh, and you wonder why bears? Keep wondering!&lt;/sub>&lt;/p>
&lt;h2 id="spotlight-on-key-updates">Spotlight on key updates&lt;/h2>
&lt;p>Kubernetes v1.34 is packed with new features and improvements. Here are a few select updates the Release Team would like to highlight!&lt;/p>
&lt;h3 id="stable-the-core-of-dra-is-ga">Stable: The core of DRA is GA&lt;/h3>
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/">Dynamic Resource Allocation&lt;/a> (DRA)
enables more powerful ways to select, allocate, share, and configure
GPUs, TPUs, NICs and other devices.&lt;/p>
&lt;p>Since the v1.30 release, DRA has been based around claiming devices using
&lt;em>structured parameters&lt;/em> that are opaque to the core of Kubernetes.
This enhancement took inspiration from dynamic provisioning for storage volumes.
DRA with structured parameters relies on a set of supporting API kinds under &lt;code>resource.k8s.io&lt;/code>:
the ResourceClaim, DeviceClass, ResourceClaimTemplate, and ResourceSlice types.
It also extends the &lt;code>.spec&lt;/code> for Pods with a new &lt;code>resourceClaims&lt;/code> field.&lt;br>
The &lt;code>resource.k8s.io/v1&lt;/code> APIs have graduated to stable and are now available by default.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4381">KEP #4381&lt;/a> led by WG Device Management.&lt;/p>
&lt;h3 id="beta-projected-serviceaccount-tokens-for-kubelet-image-credential-providers">Beta: Projected ServiceAccount tokens for &lt;code>kubelet&lt;/code> image credential providers&lt;/h3>
&lt;p>The &lt;code>kubelet&lt;/code> credential providers, used for pulling private container images, traditionally relied on long-lived Secrets stored on the node or in the cluster. This approach increased security risks and management overhead, as these credentials were not tied to the specific workload and did not rotate automatically.&lt;br>
To solve this, the &lt;code>kubelet&lt;/code> can now request short-lived, audience-bound ServiceAccount tokens for authenticating to container registries. This allows image pulls to be authorized based on the Pod's own identity rather than a node-level credential.&lt;br>
The primary benefit is a significant security improvement. It eliminates the need for long-lived Secrets for image pulls, reducing the attack surface and simplifying credential management for both administrators and developers.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4412">KEP #4412&lt;/a> led by SIG Auth and SIG Node.&lt;/p>
&lt;h3 id="alpha-support-for-kyaml-a-kubernetes-dialect-of-yaml">Alpha: Support for KYAML, a Kubernetes dialect of YAML&lt;/h3>
&lt;p>KYAML aims to be a safer and less ambiguous subset of YAML, designed specifically for Kubernetes. Whatever version of Kubernetes your cluster runs, starting with kubectl v1.34 you can use KYAML as a new output format.&lt;/p>
&lt;p>KYAML addresses specific challenges with both YAML and JSON. YAML's significant whitespace requires careful attention to indentation and nesting, while its optional string-quoting can lead to unexpected type coercion (for example: &lt;a href="https://hitchdev.com/strictyaml/why/implicit-typing-removed/">&amp;quot;The Norway Bug&amp;quot;&lt;/a>). Meanwhile, JSON lacks comment support and has strict requirements for trailing commas and quoted keys.&lt;/p>
&lt;p>You can write KYAML and pass it as an input to any version of &lt;code>kubectl&lt;/code>, because all KYAML files are also valid YAML. With &lt;code>kubectl&lt;/code> v1.34, you are also able to &lt;a href="https://kubernetes.io/docs/reference/kubectl/#syntax-1">request KYAML output&lt;/a> (as in &lt;code>kubectl get -o kyaml …&lt;/code>) by setting the environment variable &lt;code>KUBECTL_KYAML=true&lt;/code>. If you prefer, you can still request the output in JSON or YAML format.&lt;/p>
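&lt;p>To give you a feel for the format, here is a short, illustrative sketch of what KYAML can look like. It uses flow-style braces, double-quoted strings, and trailing commas, and it remains valid YAML (the resource shown is a made-up example):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">{
  apiVersion: &amp;quot;v1&amp;quot;,
  kind: &amp;quot;ConfigMap&amp;quot;,
  metadata: {
    name: &amp;quot;example&amp;quot;,
    namespace: &amp;quot;default&amp;quot;,
  },
  data: {
    # unlike JSON, comments are allowed
    greeting: &amp;quot;hello&amp;quot;,
  },
}
&lt;/code>&lt;/pre>&lt;/div>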
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5295">KEP #5295&lt;/a> led by SIG CLI.&lt;/p>
&lt;h2 id="features-graduating-to-stable">Features graduating to Stable&lt;/h2>
&lt;p>&lt;em>This is a selection of some of the improvements that are now stable following the v1.34 release.&lt;/em>&lt;/p>
&lt;h3 id="delayed-creation-of-job-s-replacement-pods">Delayed creation of Job’s replacement Pods&lt;/h3>
&lt;p>By default, the Job controller creates replacement Pods immediately when a Pod starts terminating, causing both Pods to run simultaneously. This can cause resource contention in constrained clusters, where the replacement Pod may struggle to find available nodes until the original Pod fully terminates. The situation can also trigger unwanted cluster autoscaler scale-ups.
Additionally, some machine learning frameworks like TensorFlow and &lt;a href="https://jax.readthedocs.io/en/latest/">JAX&lt;/a> require only one Pod per index to run at a time, making simultaneous Pod execution problematic.
This feature introduces &lt;code>.spec.podReplacementPolicy&lt;/code> in Jobs. You may choose to create replacement Pods only when the Pod is fully terminated (has &lt;code>.status.phase: Failed&lt;/code>). To do this, set &lt;code>.spec.podReplacementPolicy: Failed&lt;/code>.&lt;br>
Introduced as alpha in v1.28, this feature has graduated to stable in v1.34.&lt;/p>
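&lt;p>As an illustration, here is a minimal sketch of a Job that only creates replacement Pods once the previous Pod has fully terminated (the name and image are illustrative):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: batch/v1
kind: Job
metadata:
  name: training-job               # illustrative name
spec:
  podReplacementPolicy: Failed     # only replace Pods that are fully terminated
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: example.com/trainer:latest   # illustrative image
&lt;/code>&lt;/pre>&lt;/div>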
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3939">KEP #3939&lt;/a> led by SIG Apps.&lt;/p>
&lt;h3 id="recovery-from-volume-expansion-failure">Recovery from volume expansion failure&lt;/h3>
&lt;p>This feature allows users to cancel volume expansions that are unsupported by the underlying storage provider, and retry volume expansion with smaller values that may succeed.&lt;br>
Introduced as alpha in v1.23, this feature has graduated to stable in v1.34.&lt;/p>
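&lt;p>In practice, recovery means editing the PersistentVolumeClaim and lowering the requested size. A minimal sketch, with illustrative names and sizes:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                # illustrative name
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 20Gi         # was raised to 100Gi, which the provider rejected; retrying with a smaller value
&lt;/code>&lt;/pre>&lt;/div>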
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/1790">KEP #1790&lt;/a> led by SIG Storage.&lt;/p>
&lt;h3 id="volumeattributesclass-for-volume-modification">VolumeAttributesClass for volume modification&lt;/h3>
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/storage/volume-attributes-classes/">VolumeAttributesClass&lt;/a> has graduated to stable in v1.34. VolumeAttributesClass is a generic, Kubernetes-native API for modifying volume parameters like provisioned IO. It allows workloads to vertically scale their volumes on-line to balance cost and performance, if supported by their provider.&lt;br>
Like all new volume features in Kubernetes, this API is implemented via the &lt;a href="https://kubernetes-csi.github.io/docs/">container storage interface (CSI)&lt;/a>. Your provisioner-specific CSI driver must support the new ModifyVolume API which is the CSI side of this feature.&lt;/p>
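&lt;p>A VolumeAttributesClass might look like the following sketch; the driver name and parameter keys depend entirely on your CSI driver and are illustrative here:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: storage.k8s.io/v1
kind: VolumeAttributesClass
metadata:
  name: gold                           # illustrative name
driverName: example.csi.vendor.com     # illustrative CSI driver
parameters:                            # driver-specific keys (illustrative)
  iops: '5000'
  throughput: '200'
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>A workload opts in by referencing the class from its PersistentVolumeClaim through &lt;code>.spec.volumeAttributesClassName&lt;/code>.&lt;/p>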
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3751">KEP #3751&lt;/a> led by SIG Storage.&lt;/p>
&lt;h3 id="structured-authentication-configuration">Structured authentication configuration&lt;/h3>
&lt;p>Kubernetes v1.29 introduced a configuration file format to manage API server client authentication, moving away from the previous reliance on a large set of command-line options.
The &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/#using-authentication-configuration">AuthenticationConfiguration&lt;/a> kind allows administrators to support multiple JWT authenticators, CEL expression validation, and dynamic reloading.
This change significantly improves the manageability and auditability of the cluster's authentication settings - and has graduated to stable in v1.34.&lt;/p>
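&lt;p>As a rough sketch, a minimal AuthenticationConfiguration with a single JWT authenticator could look like this; the issuer URL, audience, and claim mapping are illustrative:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: apiserver.config.k8s.io/v1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://issuer.example.com    # illustrative OIDC issuer
    audiences:
    - my-cluster                       # illustrative audience
  claimMappings:
    username:
      claim: sub
      prefix: 'oidc:'
&lt;/code>&lt;/pre>&lt;/div>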
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3331">KEP #3331&lt;/a> led by SIG Auth.&lt;/p>
&lt;h3 id="finer-grained-authorization-based-on-selectors">Finer-grained authorization based on selectors&lt;/h3>
&lt;p>Kubernetes authorizers, including webhook authorizers and the built-in node authorizer, can now make authorization decisions based on field and label selectors in incoming requests. When you send &lt;strong>list&lt;/strong>, &lt;strong>watch&lt;/strong> or &lt;strong>deletecollection&lt;/strong> requests with selectors, the authorization layer can now evaluate access with that additional context.&lt;/p>
&lt;p>For example, you can write an authorization policy that only allows listing Pods bound to a specific &lt;code>.spec.nodeName&lt;/code>.
The client (perhaps the kubelet on a particular node) must specify
the field selector that the policy requires, otherwise the request is forbidden.
This change makes it feasible to set up least privilege rules, provided that the client knows how to conform to the restrictions you set.
Kubernetes v1.34 now supports more granular control in environments like per-node isolation or custom multi-tenant setups.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4601">KEP #4601&lt;/a> led by SIG Auth.&lt;/p>
&lt;h3 id="restrict-anonymous-requests-with-fine-grained-controls">Restrict anonymous requests with fine-grained controls&lt;/h3>
&lt;p>Instead of fully enabling or disabling anonymous access, you can now configure a strict list of endpoints where unauthenticated requests are allowed. This provides a safer alternative for clusters that rely on anonymous access to health or bootstrap endpoints like &lt;code>/healthz&lt;/code>, &lt;code>/readyz&lt;/code>, or &lt;code>/livez&lt;/code>.&lt;/p>
&lt;p>With this feature, accidental RBAC misconfigurations that grant broad access to anonymous users can be avoided without requiring changes to external probes or bootstrapping tools.&lt;/p>
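&lt;p>This is configured in the same AuthenticationConfiguration file; a minimal sketch that keeps anonymous access only for the health endpoints:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: apiserver.config.k8s.io/v1
kind: AuthenticationConfiguration
anonymous:
  enabled: true
  conditions:          # anonymous requests are only allowed for these paths
  - path: /healthz
  - path: /livez
  - path: /readyz
&lt;/code>&lt;/pre>&lt;/div>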
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4633">KEP #4633&lt;/a> led by SIG Auth.&lt;/p>
&lt;h3 id="more-efficient-requeueing-through-plugin-specific-callbacks">More efficient requeueing through plugin-specific callbacks&lt;/h3>
&lt;p>The &lt;code>kube-scheduler&lt;/code> can now make more accurate decisions about when to retry scheduling Pods that were previously unschedulable. Each scheduling plugin can now register callback functions that tell the scheduler whether an incoming cluster event is likely to make a rejected Pod schedulable again.&lt;/p>
&lt;p>This reduces unnecessary retries and improves overall scheduling throughput - especially in clusters using dynamic resource allocation. The feature also lets certain plugins skip the usual backoff delay when it is safe to do so, making scheduling faster in specific cases.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4247">KEP #4247&lt;/a> led by SIG Scheduling.&lt;/p>
&lt;h3 id="ordered-namespace-deletion">Ordered Namespace deletion&lt;/h3>
&lt;p>Semi-random resource deletion order can create security gaps or unintended behavior, such as Pods persisting after their associated NetworkPolicies are deleted.&lt;br>
This improvement introduces a more structured deletion process for Kubernetes &lt;a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/">namespaces&lt;/a> to ensure secure and deterministic resource removal. By enforcing a structured deletion sequence that respects logical and security dependencies, this approach ensures Pods are removed before other resources.&lt;br>
This feature was introduced in Kubernetes v1.33 and graduated to stable in v1.34. The graduation improves security and reliability by mitigating risks from non-deterministic deletions, including the vulnerability described in &lt;a href="https://github.com/advisories/GHSA-r56h-j38w-hrqq">CVE-2024-7598&lt;/a>.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5080">KEP #5080&lt;/a> led by SIG API Machinery.&lt;/p>
&lt;h3 id="streaming-list-responses">Streaming &lt;strong>list&lt;/strong> responses&lt;/h3>
&lt;p>Handling large &lt;strong>list&lt;/strong> responses in Kubernetes previously posed a significant scalability challenge. When clients requested extensive resource lists, such as thousands of Pods or Custom Resources, the API server was required to serialize the entire collection of objects into a single, large memory buffer before sending it. This process created substantial memory pressure and could lead to performance degradation, impacting the overall stability of the cluster.&lt;br>
To address this limitation, a streaming encoding mechanism for collections (list responses)
has been introduced. For the JSON and Kubernetes Protobuf response formats, that streaming mechanism
is automatically active and the associated feature gate is stable.
The primary benefit of this approach is the avoidance of large memory allocations on the API server, resulting in a much smaller and more predictable memory footprint.
Consequently, the cluster becomes more resilient and performant, especially in large-scale environments where frequent requests for extensive resource lists are common.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5116">KEP #5116&lt;/a> led by SIG API Machinery.&lt;/p>
&lt;h3 id="resilient-watch-cache-initialization">Resilient watch cache initialization&lt;/h3>
&lt;p>Watch cache is a caching layer inside &lt;code>kube-apiserver&lt;/code> that maintains an eventually consistent cache of cluster state stored in etcd. In the past, issues could occur when the watch cache was not yet initialized during &lt;code>kube-apiserver&lt;/code> startup or when it required re-initialization.&lt;/p>
&lt;p>To address these issues, the watch cache initialization process has been made more resilient to failures, improving control plane robustness and ensuring controllers and clients can reliably establish watches. This improvement was introduced as beta in v1.31 and is now stable.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4568">KEP #4568&lt;/a> led by SIG API Machinery and SIG Scalability.&lt;/p>
&lt;h3 id="relaxing-dns-search-path-validation">Relaxing DNS search path validation&lt;/h3>
&lt;p>Previously, the strict validation of a Pod's DNS &lt;code>search&lt;/code> path in Kubernetes often created integration challenges in complex or legacy network environments. This restrictiveness could block configurations that were necessary for an organization's infrastructure, forcing administrators to implement difficult workarounds.&lt;br>
To address this, relaxed DNS validation was introduced as alpha in v1.32 and has now graduated to stable in v1.34. A common use case involves Pods that need to communicate with both internal Kubernetes services and external domains. By setting a single dot (&lt;code>.&lt;/code>) as the first entry in the &lt;code>searches&lt;/code> list of the Pod's &lt;code>.spec.dnsConfig&lt;/code>, administrators can prevent the system's resolver from appending the cluster's internal search domains to external queries. This avoids generating unnecessary DNS requests to the internal DNS server for external hostnames, improving efficiency and preventing potential resolution errors.&lt;/p>
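&lt;p>Here is a minimal sketch of that pattern; everything except the leading dot entry is illustrative:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Pod
metadata:
  name: dns-example                    # illustrative name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # illustrative image
  dnsConfig:
    searches:
    - '.'    # keeps the resolver from appending cluster search domains to external queries
&lt;/code>&lt;/pre>&lt;/div>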
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4427">KEP #4427&lt;/a> led by SIG Network.&lt;/p>
&lt;h3 id="support-for-direct-service-return-dsr-in-windows-kube-proxy">Support for Direct Service Return (DSR) in Windows &lt;code>kube-proxy&lt;/code>&lt;/h3>
&lt;p>DSR provides performance optimizations by allowing return traffic routed through load balancers to bypass the load balancer and respond directly to the client, reducing load on the load balancer and improving overall latency. For information on DSR on Windows, read &lt;a href="https://techcommunity.microsoft.com/blog/networkingblog/direct-server-return-dsr-in-a-nutshell/693710">Direct Server Return (DSR) in a nutshell&lt;/a>.&lt;br>
Initially introduced in v1.14, this feature has graduated to stable in v1.34.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5100">KEP #5100&lt;/a> led by SIG Windows.&lt;/p>
&lt;h3 id="sleep-action-for-container-lifecycle-hooks">Sleep action for Container lifecycle hooks&lt;/h3>
&lt;p>A Sleep action for containers’ PreStop and PostStart lifecycle hooks was introduced to provide a straightforward way to manage graceful shutdowns and improve overall container lifecycle management.&lt;br>
The Sleep action allows containers to pause for a specified duration after starting or before termination. Using a negative or zero sleep duration returns immediately, resulting in a no-op.&lt;br>
The Sleep action was introduced in Kubernetes v1.29, with zero value support added in v1.32. Both features graduated to stable in v1.34.&lt;/p>
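&lt;p>For example, a container can pause briefly before termination so that load balancers have time to drain connections. A minimal sketch (container name and image are illustrative):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">containers:
- name: web                  # illustrative container
  image: nginx:1.27          # illustrative image
  lifecycle:
    preStop:
      sleep:
        seconds: 5           # pause for five seconds before the container is stopped
&lt;/code>&lt;/pre>&lt;/div>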
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3960">KEP #3960&lt;/a> and &lt;a href="https://kep.k8s.io/4818">KEP #4818&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="linux-node-swap-support">Linux node swap support&lt;/h3>
&lt;p>Historically, the lack of swap support in Kubernetes could lead to workload instability, as nodes under memory pressure often had to terminate processes abruptly. This particularly affected applications with large but infrequently accessed memory footprints and prevented more graceful resource management.&lt;/p>
&lt;p>To address this, configurable per-node swap support was introduced in v1.22. It has progressed through alpha and beta stages and has graduated to stable in v1.34. The primary mode, &lt;code>LimitedSwap&lt;/code>, allows Pods to use swap within their existing memory limits, providing a direct solution to the problem. By default, the &lt;code>kubelet&lt;/code> is configured with &lt;code>NoSwap&lt;/code> mode, which means Kubernetes workloads cannot use swap.&lt;/p>
&lt;p>This feature improves workload stability and allows for more efficient resource utilization. It enables clusters to support a wider variety of applications, especially in resource-constrained environments, though administrators must consider the potential performance impact of swapping.&lt;/p>
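&lt;p>To opt a node in, an administrator sets the swap behavior in the kubelet configuration. A minimal sketch, assuming swap is already provisioned on the node:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false            # let the kubelet start on a node that has swap enabled
memorySwap:
  swapBehavior: LimitedSwap  # Pods may swap within their memory limits; the default is NoSwap
&lt;/code>&lt;/pre>&lt;/div>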
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/2400">KEP #2400&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="allow-special-characters-in-environment-variables">Allow special characters in environment variables&lt;/h3>
&lt;p>The environment variable validation rules in Kubernetes have been relaxed
to allow nearly all printable ASCII characters in variable names, excluding &lt;code>=&lt;/code>.
This change supports scenarios where workloads require nonstandard characters in variable names - for example, frameworks like .NET Core that use &lt;code>:&lt;/code> to represent nested configuration keys.&lt;/p>
&lt;p>The relaxed validation applies to environment variables defined directly in Pod spec,
as well as those injected using &lt;code>envFrom&lt;/code> references to ConfigMaps and Secrets.&lt;/p>
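&lt;p>For example, a .NET-style nested configuration key can now be set directly. A minimal sketch with illustrative names:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">containers:
- name: dotnet-app                     # illustrative container
  image: example.com/app:latest        # illustrative image
  env:
  - name: 'Logging:LogLevel:Default'   # the ':' in the name was previously rejected by validation
    value: Warning
&lt;/code>&lt;/pre>&lt;/div>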
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4369">KEP #4369&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="taint-management-is-separated-from-node-lifecycle">Taint management is separated from Node lifecycle&lt;/h3>
&lt;p>Historically, the &lt;code>TaintManager&lt;/code>'s logic for applying NoSchedule and NoExecute taints to nodes based on their condition (NotReady, Unreachable, etc.) was tightly coupled with the node lifecycle controller. This tight coupling made the code harder to maintain and test, and it also limited the flexibility of the taint-based eviction mechanism.
This KEP refactors the &lt;code>TaintManager&lt;/code> into its own separate controller within the Kubernetes controller manager. It is an internal architectural improvement designed to increase code modularity and maintainability. This change allows the logic for taint-based evictions to be tested and evolved independently, but it has no direct user-facing impact on how taints are used.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3902">KEP #3902&lt;/a> led by SIG Scheduling and SIG Node.&lt;/p>
&lt;h2 id="new-features-in-beta">New features in Beta&lt;/h2>
&lt;p>&lt;em>This is a selection of some of the improvements that are now beta following the v1.34 release.&lt;/em>&lt;/p>
&lt;h3 id="pod-level-resource-requests-and-limits">Pod-level resource requests and limits&lt;/h3>
&lt;p>Defining resource needs for Pods with multiple containers has been challenging,
as requests and limits could only be set on a per-container basis.
This forced developers to either over-provision resources for each container or meticulously
divide the total desired resources, making configuration complex and often leading to
inefficient resource allocation.
To simplify this, the ability to specify resource requests and limits at the Pod level was introduced.
This allows developers to define an overall resource budget for a Pod,
which is then shared among its constituent containers.
This feature was introduced as alpha in v1.32 and has graduated to beta in v1.34,
with HPA now supporting pod-level resource specifications.&lt;/p>
&lt;p>The primary benefit is a more intuitive and straightforward way to manage resources for multi-container Pods.
It ensures that the total resources used by all containers do not exceed the Pod's defined limits,
leading to better resource planning, more accurate scheduling,
and more efficient utilization of cluster resources.&lt;/p>
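&lt;p>A minimal sketch of a Pod-level budget shared by two containers (names and sizes are illustrative):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Pod
metadata:
  name: pod-level-resources            # illustrative name
spec:
  resources:                           # budget for the whole Pod, shared by its containers
    requests:
      cpu: '1'
      memory: 1Gi
    limits:
      cpu: '2'
      memory: 2Gi
  containers:
  - name: app
    image: example.com/app:latest      # illustrative image
  - name: sidecar
    image: example.com/sidecar:latest  # illustrative image
&lt;/code>&lt;/pre>&lt;/div>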
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/2837">KEP #2837&lt;/a> led by SIG Scheduling and SIG Autoscaling.&lt;/p>
&lt;h3 id="kuberc-file-for-kubectl-user-preferences">&lt;code>.kuberc&lt;/code> file for &lt;code>kubectl&lt;/code> user preferences&lt;/h3>
&lt;p>A &lt;code>.kuberc&lt;/code> configuration file allows you to define preferences for &lt;code>kubectl&lt;/code>, such as default options and command aliases. Unlike the kubeconfig file, the &lt;code>.kuberc&lt;/code> configuration file does not contain cluster details, usernames or passwords.&lt;br>
This feature was introduced as alpha in v1.33, gated behind the environment variable &lt;code>KUBECTL_KUBERC&lt;/code>. It has graduated to beta in v1.34 and is enabled by default.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3104">KEP #3104&lt;/a> led by SIG CLI.&lt;/p>
&lt;h3 id="external-serviceaccount-token-signing">External ServiceAccount token signing&lt;/h3>
&lt;p>Traditionally, Kubernetes manages ServiceAccount tokens using static signing keys that are loaded from disk at &lt;code>kube-apiserver&lt;/code> startup. This feature introduces an &lt;code>ExternalJWTSigner&lt;/code> gRPC service for out-of-process signing, enabling Kubernetes distributions to integrate with external key management solutions (for example, HSMs, cloud KMSes) for ServiceAccount token signing instead of static disk-based keys.&lt;/p>
&lt;p>Introduced as alpha in v1.32, this external JWT signing capability advances to beta and is enabled by default in v1.34.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/740">KEP #740&lt;/a> led by SIG Auth.&lt;/p>
&lt;h3 id="dra-features-in-beta">DRA features in beta&lt;/h3>
&lt;h4 id="admin-access-for-secure-resource-monitoring">Admin access for secure resource monitoring&lt;/h4>
&lt;p>DRA supports controlled administrative access via the &lt;code>adminAccess&lt;/code> field in ResourceClaims or ResourceClaimTemplates, allowing cluster operators to access devices already in use by others for monitoring or diagnostics. This privileged mode is limited to users authorized to create such objects in namespaces labeled &lt;code>resource.k8s.io/admin-access: &amp;quot;true&amp;quot;&lt;/code>, ensuring regular workloads remain unaffected. Graduating to beta in v1.34, this feature provides secure introspection capabilities while preserving workload isolation through namespace-based authorization checks.&lt;/p>
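&lt;p>The namespace gate mentioned above is an ordinary label; a minimal sketch:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Namespace
metadata:
  name: device-monitoring                 # illustrative name
  labels:
    resource.k8s.io/admin-access: 'true'  # only ResourceClaims in such namespaces may request admin access
&lt;/code>&lt;/pre>&lt;/div>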
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5018">KEP #5018&lt;/a> led by WG Device Management and SIG Auth.&lt;/p>
&lt;h4 id="prioritized-alternatives-in-resourceclaims-and-resourceclaimtemplates">Prioritized alternatives in ResourceClaims and ResourceClaimTemplates&lt;/h4>
&lt;p>While a workload might run best on a single high-performance GPU, it might also be able to run on two mid-level GPUs.&lt;br>
With the feature gate &lt;code>DRAPrioritizedList&lt;/code> (now enabled by default), ResourceClaims and ResourceClaimTemplates get a new field named &lt;code>firstAvailable&lt;/code>. This field is an ordered list that allows users to specify that a request may be satisfied in different ways, including allocating nothing at all if specific hardware is not available. The scheduler will attempt to satisfy the alternatives in the list in order, so the workload will be allocated the best set of devices available in the cluster.&lt;/p>
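&lt;p>A sketch of the GPU example above: prefer one large GPU, and fall back to two mid-level ones. The device class names are illustrative, and the exact field layout should be checked against the &lt;code>resource.k8s.io&lt;/code> version in your cluster:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-claim                              # illustrative name
spec:
  devices:
    requests:
    - name: gpu
      firstAvailable:                          # alternatives, in order of preference
      - name: one-large
        deviceClassName: large-gpu.example.com   # illustrative class
        count: 1
      - name: two-medium
        deviceClassName: medium-gpu.example.com  # illustrative class
        count: 2
&lt;/code>&lt;/pre>&lt;/div>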
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4816">KEP #4816&lt;/a> led by WG Device Management.&lt;/p>
&lt;h4 id="the-kubelet-reports-allocated-dra-resources">The &lt;code>kubelet&lt;/code> reports allocated DRA resources&lt;/h4>
&lt;p>The &lt;code>kubelet&lt;/code>'s API has been updated to report on Pod resources allocated through DRA. This allows node monitoring agents to discover the allocated DRA resources for Pods on a node. Additionally, it enables node components to use the PodResourcesAPI and leverage this DRA information when developing new features and integrations.&lt;br>
Starting from Kubernetes v1.34, this feature is enabled by default.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3695">KEP #3695&lt;/a> led by WG Device Management.&lt;/p>
&lt;h3 id="kube-scheduler-non-blocking-api-calls">&lt;code>kube-scheduler&lt;/code> non-blocking API calls&lt;/h3>
&lt;p>The &lt;code>kube-scheduler&lt;/code> makes blocking API calls during scheduling cycles, creating performance bottlenecks. This feature introduces asynchronous API handling through a prioritized queue system with request deduplication, allowing the scheduler to continue processing Pods while API operations complete in the background. Key benefits include reduced scheduling latency, prevention of scheduler thread starvation during API delays, and immediate retry capability for unschedulable Pods. The implementation maintains backward compatibility and adds metrics for monitoring pending API operations.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5229">KEP #5229&lt;/a> led by SIG Scheduling.&lt;/p>
&lt;h3 id="mutating-admission-policies">Mutating admission policies&lt;/h3>
&lt;p>&lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/mutating-admission-policy/">MutatingAdmissionPolicies&lt;/a> offer a declarative, in-process alternative to mutating admission webhooks. This feature leverages CEL's object instantiation and JSON Patch strategies, combined with Server Side Apply’s merge algorithms.&lt;br>
This significantly simplifies admission control by allowing administrators to define mutation rules directly in the API server.&lt;br>
Introduced as alpha in v1.32, mutating admission policies has graduated to beta in v1.34.&lt;/p>
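&lt;p>As a rough sketch, a policy that adds a label to every newly created Pod could look like this (the policy name and label are illustrative):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingAdmissionPolicy
metadata:
  name: add-environment-label    # illustrative name
spec:
  matchConstraints:
    resourceRules:
    - apiGroups: ['']
      apiVersions: [v1]
      operations: [CREATE]
      resources: [pods]
  failurePolicy: Fail
  reinvocationPolicy: IfNeeded
  mutations:
  - patchType: ApplyConfiguration
    applyConfiguration:
      expression: >
        Object{
          metadata: Object.metadata{
            labels: {'environment': 'test'}
          }
        }
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>As with validating admission policies, the policy only takes effect once it is bound to matching resources via a MutatingAdmissionPolicyBinding.&lt;/p>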
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3962">KEP #3962&lt;/a> led by SIG API Machinery.&lt;/p>
&lt;h3 id="snapshottable-api-server-cache">Snapshottable API server cache&lt;/h3>
&lt;p>The &lt;code>kube-apiserver&lt;/code>'s caching mechanism (watch cache) efficiently serves requests for the latest observed state. However, &lt;strong>list&lt;/strong> requests for previous states (for example, via pagination or by specifying a &lt;code>resourceVersion&lt;/code>) often bypass this cache and are served directly from etcd. This direct etcd access significantly increases performance costs and can lead to stability issues, particularly with large resources, due to memory pressure from transferring large data blobs.&lt;br>
With the &lt;code>ListFromCacheSnapshot&lt;/code> feature gate enabled by default, &lt;code>kube-apiserver&lt;/code> will attempt to serve the response from a snapshot if one is available with a &lt;code>resourceVersion&lt;/code> older than requested. The &lt;code>kube-apiserver&lt;/code> starts with no snapshots, creates a new snapshot on every watch event, and keeps snapshots until it detects that etcd has been compacted or the cache fills with events older than 75 seconds. If the provided &lt;code>resourceVersion&lt;/code> is unavailable, the server falls back to etcd.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4988">KEP #4988&lt;/a> led by SIG API Machinery.&lt;/p>
&lt;h3 id="tooling-for-declarative-validation-of-kubernetes-native-types">Tooling for declarative validation of Kubernetes-native types&lt;/h3>
&lt;p>Prior to this release, validation rules for the
APIs built into Kubernetes were written entirely by hand, which made them difficult for maintainers to discover, understand, improve, or test.
There was no single way to find all the validation rules that might apply to an API.
&lt;em>Declarative validation&lt;/em> benefits Kubernetes maintainers by making API development, maintenance, and review easier while enabling programmatic inspection for better tooling and documentation.
For people using Kubernetes libraries to write their own code
(for example: a controller), the new approach streamlines adding new fields through IDL tags, rather than complex validation functions.
This change helps speed up API creation by automating validation boilerplate,
and provides more relevant error messages by performing validation on versioned types.&lt;br>
This enhancement (which graduated to beta in v1.33 and continues as beta in v1.34) brings CEL-based validation rules to native Kubernetes types. It allows for more granular and declarative validation to be defined directly in the type definitions, improving API consistency and developer experience.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5073">KEP #5073&lt;/a> led by SIG API Machinery.&lt;/p>
&lt;h3 id="streaming-informers-for-list-requests">Streaming informers for &lt;strong>list&lt;/strong> requests&lt;/h3>
&lt;p>The streaming informers feature, which has been in beta since v1.32, gains further beta refinements in v1.34. This capability allows &lt;strong>list&lt;/strong> requests to return data as a continuous stream of objects from the API server’s watch cache, rather than assembling paged results directly from etcd. By reusing the same mechanics used for &lt;strong>watch&lt;/strong> operations, the API server can serve large datasets while keeping memory usage steady and avoiding allocation spikes that can affect stability.&lt;/p>
&lt;p>In this release, the &lt;code>kube-apiserver&lt;/code> and &lt;code>kube-controller-manager&lt;/code> both take advantage of the new &lt;code>WatchList&lt;/code> mechanism by default. For the &lt;code>kube-apiserver&lt;/code>, this means list requests are streamed more efficiently, while the &lt;code>kube-controller-manager&lt;/code> benefits from a more memory-efficient and predictable way to work with informers. Together, these improvements reduce memory pressure during large list operations, and improve reliability under sustained load, making list streaming more predictable and efficient.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3157">KEP #3157&lt;/a> led by SIG API Machinery and SIG Scalability.&lt;/p>
&lt;h3 id="graceful-node-shutdown-handling-for-windows-nodes">Graceful node shutdown handling for Windows nodes&lt;/h3>
&lt;p>The &lt;code>kubelet&lt;/code> on Windows nodes can now detect system shutdown events and begin graceful termination of running Pods. This mirrors existing behavior on Linux and helps ensure workloads exit cleanly during planned shutdowns or restarts.&lt;br>
When the system begins shutting down, the &lt;code>kubelet&lt;/code> reacts by using standard termination logic. It respects the configured lifecycle hooks and grace periods, giving Pods time to stop before the node powers off. The feature relies on Windows pre-shutdown notifications to coordinate this process. This enhancement improves workload reliability during maintenance, restarts, or system updates. It is now in beta and enabled by default.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4802">KEP #4802&lt;/a> led by SIG Windows.&lt;/p>
&lt;h3 id="in-place-pod-resize-improvements">In-place Pod resize improvements&lt;/h3>
&lt;p>Graduated to beta and enabled by default in v1.33, in-place Pod resizing receives further improvements in v1.34. These include support for decreasing memory usage and integration with Pod-level resources.&lt;/p>
&lt;p>This feature remains in beta in v1.34. For detailed usage instructions and examples, refer to the documentation: &lt;a href="https://kubernetes.io/docs/tasks/configure-pod-container/resize-container-resources/">Resize CPU and Memory Resources assigned to Containers&lt;/a>.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/1287">KEP #1287&lt;/a> led by SIG Node and SIG Autoscaling.&lt;/p>
&lt;h2 id="new-features-in-alpha">New features in Alpha&lt;/h2>
&lt;p>&lt;em>This is a selection of some of the improvements that are now alpha following the v1.34 release.&lt;/em>&lt;/p>
&lt;h3 id="pod-certificates-for-mtls-authentication">Pod certificates for mTLS authentication&lt;/h3>
&lt;p>Authenticating workloads within a cluster, especially for communication with the API server, has primarily relied on ServiceAccount tokens. While effective, these tokens aren't always ideal for establishing a strong, verifiable identity for mutual TLS (mTLS) and can present challenges when integrating with external systems that expect certificate-based authentication.&lt;br>
Kubernetes v1.34 introduces a built-in mechanism for Pods to obtain X.509 certificates via &lt;a href="https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/#pod-certificate-requests">PodCertificateRequests&lt;/a>. The &lt;code>kubelet&lt;/code> can request and manage certificates for Pods, which can then be used to authenticate to the Kubernetes API server and other services using mTLS.
The primary benefit is a more robust and flexible identity mechanism for Pods. It provides a native way to implement strong mTLS authentication without relying solely on bearer tokens, aligning Kubernetes with standard security practices and simplifying integrations with certificate-aware observability and security tooling.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4317">KEP #4317&lt;/a> led by SIG Auth.&lt;/p>
&lt;h3 id="restricted-pod-security-standard-now-forbids-remote-probes">&amp;quot;Restricted&amp;quot; Pod security standard now forbids remote probes&lt;/h3>
&lt;p>The &lt;code>host&lt;/code> field within probes and lifecycle handlers allows users to specify an entity other than the &lt;code>podIP&lt;/code> for the &lt;code>kubelet&lt;/code> to probe.
However, this opens up a route for misuse and for attacks that bypass security controls, since the &lt;code>host&lt;/code> field could be set to &lt;strong>any&lt;/strong> value, including security sensitive external hosts, or localhost on the node.
In Kubernetes v1.34, Pods only meet the
&lt;a href="https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted">Restricted&lt;/a>
Pod security standard if they either leave the &lt;code>host&lt;/code> field unset, or don't use this
kind of probe at all.
You can use &lt;em>Pod security admission&lt;/em>, or a third-party solution, to enforce that Pods meet this standard. Because these are security controls, check
the documentation to understand the limitations and behavior of the enforcement mechanism you choose.&lt;/p>
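&lt;p>Concretely, a probe that satisfies the Restricted standard simply leaves &lt;code>host&lt;/code> unset, so the kubelet targets the Pod's own IP. A minimal sketch:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">containers:
- name: app                          # illustrative container
  image: example.com/app:latest      # illustrative image
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
      # no 'host' field: the kubelet probes the Pod's own podIP
&lt;/code>&lt;/pre>&lt;/div>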
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4940">KEP #4940&lt;/a> led by SIG Auth.&lt;/p>
&lt;h3 id="use-status-nominatednodename-to-express-pod-placement">Use &lt;code>.status.nominatedNodeName&lt;/code> to express Pod placement&lt;/h3>
&lt;p>When the &lt;code>kube-scheduler&lt;/code> takes time to bind Pods to Nodes, cluster autoscalers may not understand that a Pod will be bound to a specific Node. Consequently, they may mistakenly consider the Node as underutilized and delete it.&lt;br>
To address this issue, the &lt;code>kube-scheduler&lt;/code> can use &lt;code>.status.nominatedNodeName&lt;/code> not only to indicate ongoing preemption but also to express Pod placement intentions. By enabling the &lt;code>NominatedNodeNameForExpectation&lt;/code> feature gate, the scheduler uses this field to indicate where a Pod will be bound. This exposes internal reservations to help external components make informed decisions.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5278">KEP #5278&lt;/a> led by SIG Scheduling.&lt;/p>
&lt;h3 id="dra-features-in-alpha">DRA features in alpha&lt;/h3>
&lt;h4 id="resource-health-status-for-dra">Resource health status for DRA&lt;/h4>
&lt;p>It can be difficult to know when a Pod is using a device that has failed or is temporarily unhealthy, which makes troubleshooting Pod crashes challenging or impossible.&lt;br>
Resource Health Status for DRA improves observability by exposing the health status of devices allocated to a Pod in the Pod’s status. This makes it easier to identify the cause of Pod issues related to unhealthy devices and respond appropriately.&lt;br>
To enable this functionality, the &lt;code>ResourceHealthStatus&lt;/code> feature gate must be enabled, and the DRA driver must implement the &lt;code>DRAResourceHealth&lt;/code> gRPC service.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4680">KEP #4680&lt;/a> led by WG Device Management.&lt;/p>
&lt;h4 id="extended-resource-mapping">Extended resource mapping&lt;/h4>
&lt;p>Extended resource mapping provides a simpler alternative to DRA's expressive and flexible approach by offering a straightforward way to describe resource capacity and consumption. This feature enables cluster administrators to advertise DRA-managed resources as &lt;em>extended resources&lt;/em>, allowing application developers and operators to continue using the familiar container’s &lt;code>.spec.resources&lt;/code> syntax to consume them.&lt;br>
This enables existing workloads to adopt DRA without modifications, simplifying the transition to DRA for both application developers and cluster administrators.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5004">KEP #5004&lt;/a> led by WG Device Management.&lt;/p>
&lt;h4 id="dra-consumable-capacity">DRA consumable capacity&lt;/h4>
&lt;p>Kubernetes v1.33 added support for resource drivers to advertise slices of a device that are available, rather than exposing the entire device as an all-or-nothing resource. However, this approach couldn't handle scenarios where device drivers manage fine-grained, dynamic portions of a device resource based on user demand, or share those resources independently of ResourceClaims, which are restricted by their spec and namespace.&lt;br>
Enabling the &lt;code>DRAConsumableCapacity&lt;/code> feature gate
(introduced as alpha in v1.34)
allows resource drivers to share the same device, or even a slice of a device, across multiple ResourceClaims or across multiple DeviceRequests.
The feature also extends the scheduler to support allocating portions of device resources,
as defined in the &lt;code>capacity&lt;/code> field.
This DRA feature improves device sharing across namespaces and claims, tailoring it to Pod needs. It enables drivers to enforce capacity limits, enhances scheduling, and supports new use cases like bandwidth-aware networking and multi-tenant sharing.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5075">KEP #5075&lt;/a> led by WG Device Management.&lt;/p>
&lt;h4 id="device-binding-conditions">Device binding conditions&lt;/h4>
&lt;p>The Kubernetes scheduler gets more reliable by delaying binding a Pod to a Node until its required external resources, such as attachable devices or FPGAs, are confirmed to be ready.&lt;br>
This delay mechanism is implemented in the &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#pre-bind">PreBind phase&lt;/a> of the scheduling framework. During this phase, the scheduler checks whether all required device conditions are satisfied before proceeding with binding. This enables coordination with external device controllers, ensuring more robust, predictable scheduling.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5007">KEP #5007&lt;/a> led by WG Device Management.&lt;/p>
&lt;h3 id="container-restart-rules">Container restart rules&lt;/h3>
&lt;p>Currently, all containers within a Pod follow the same &lt;code>.spec.restartPolicy&lt;/code> when they exit or crash. However, Pods that run multiple containers might have different restart requirements for each container. For example, you may not want to retry an init container that performs one-time setup if it fails. Similarly, in ML research environments with long-running training workloads, containers that fail with retriable exit codes should restart quickly in place, rather than triggering Pod recreation and losing progress.&lt;br>
Kubernetes v1.34 introduces the &lt;code>ContainerRestartRules&lt;/code> feature gate. When enabled, a &lt;code>restartPolicy&lt;/code> can be specified for each container within a Pod. A &lt;code>restartPolicyRules&lt;/code> list can also be defined to override &lt;code>restartPolicy&lt;/code> based on the last exit code. This provides the fine-grained control needed to handle complex scenarios and enables better utilization of compute resources.&lt;/p>
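&lt;p>A sketch of the ML example with the feature gate enabled: restart the container in place only when it exits with a retriable code. The names and the exit code are illustrative:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0">&lt;code class="language-yaml" data-lang="yaml">apiVersion: v1
kind: Pod
metadata:
  name: training-pod                     # illustrative name
spec:
  restartPolicy: Never                   # Pod-level default
  containers:
  - name: trainer
    image: example.com/trainer:latest    # illustrative image
    restartPolicy: Never                 # container-level policy
    restartPolicyRules:                  # overrides the policy based on the last exit code
    - action: Restart
      exitCodes:
        operator: In
        values: [42]                     # illustrative retriable exit code
&lt;/code>&lt;/pre>&lt;/div>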
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/5307">KEP #5307&lt;/a> led by SIG Node.&lt;/p>
&lt;h3 id="load-environment-variables-from-files-created-in-runtime">Load environment variables from files created in runtime&lt;/h3>
&lt;p>Application developers have long requested greater flexibility in declaring environment variables.
Traditionally, environment variables are declared on the API server side via static values, ConfigMaps, or Secrets.&lt;/p>
&lt;p>Behind the &lt;code>EnvFiles&lt;/code> feature gate, Kubernetes v1.34 introduces the ability to declare environment variables at runtime.
One container (typically an init container) can generate the variable and store it in a file,
and a subsequent container can start with the environment variable loaded from that file.
This approach eliminates the need to &amp;quot;wrap&amp;quot; the target container's entry point,
enabling more flexible in-Pod container orchestration.&lt;/p>
&lt;p>This feature particularly benefits AI/ML training workloads,
where each Pod in a training Job requires initialization with runtime-defined values.&lt;/p>
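&lt;p>A minimal sketch of the pattern, assuming the alpha &lt;code>fileKeyRef&lt;/code> field shape described in the KEP (image names are placeholders): an init container writes a &lt;code>KEY=VALUE&lt;/code> file to a shared volume, and the main container sources one key from it as an environment variable.&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Pod
metadata:
  name: env-from-file
spec:
  restartPolicy: Never
  volumes:
  - name: runtime-config
    emptyDir: {}
  initContainers:
  - name: generate
    image: registry.example/init:latest   # placeholder image
    command:
    - sh
    - -c
    - echo TRAIN_SEED=42 > /config/config.env   # produce the env file at runtime
    volumeMounts:
    - name: runtime-config
      mountPath: /config
  containers:
  - name: app
    image: registry.example/app:latest    # placeholder image
    env:
    - name: TRAIN_SEED
      valueFrom:
        fileKeyRef:                # alpha field behind the EnvFiles feature gate
          volumeName: runtime-config
          path: config.env
          key: TRAIN_SEED
&lt;/code>&lt;/pre>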
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3721">KEP #3721&lt;/a> led by SIG Node.&lt;/p>
&lt;h2 id="graduations-deprecations-and-removals-in-v1-34">Graduations, deprecations, and removals in v1.34&lt;/h2>
&lt;h3 id="graduations-to-stable">Graduations to stable&lt;/h3>
&lt;p>This lists all the features that graduated to stable (also known as &lt;em>general availability&lt;/em>). For a full list of updates including new features and graduations from alpha to beta, see the release notes.&lt;/p>
&lt;p>This release includes a total of 23 enhancements promoted to stable:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://kep.k8s.io/4369">Allow almost all printable ASCII characters in environment variables&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3939">Allow for recreation of pods once fully terminated in the job controller&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4818">Allow zero value for Sleep Action of PreStop Hook&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/647">API Server tracing&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/24">AppArmor support&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4601">Authorize with Field and Label Selectors&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/2340">Consistent Reads from Cache&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3902">Decouple TaintManager from NodeLifecycleController&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4033">Discover cgroup driver from CRI&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4381">DRA: structured parameters&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3960">Introducing Sleep Action for PreStop Hook&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/2831">Kubelet OpenTelemetry Tracing&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3751">Kubernetes VolumeAttributesClass ModifyVolume&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/2400">Node memory swap support&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4633">Only allow anonymous auth for configured endpoints&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/5080">Ordered namespace deletion&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4247">Per-plugin callback functions for accurate requeueing in kube-scheduler&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4427">Relaxed DNS search string validation&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/4568">Resilient Watchcache Initialization&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/5116">Streaming Encoding for LIST Responses&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/3331">Structured Authentication Config&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/5100">Support for Direct Service Return (DSR) and overlay networking in Windows kube-proxy&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kep.k8s.io/1790">Support recovery from volume expansion failure&lt;/a>&lt;/li>
&lt;/ul>
&lt;h3 id="deprecations-and-removals">Deprecations and removals&lt;/h3>
&lt;p>As Kubernetes develops and matures, features may be deprecated, removed, or replaced with better
ones to improve the project's overall health. See the Kubernetes
&lt;a href="https://kubernetes.io/docs/reference/using-api/deprecation-policy/">deprecation and removal policy&lt;/a> for more details on
this process. Kubernetes v1.34 includes a couple of deprecations.&lt;/p>
&lt;h4 id="manual-cgroup-driver-configuration-is-deprecated">Manual cgroup driver configuration is deprecated&lt;/h4>
&lt;p>Historically, configuring the correct cgroup driver has been a pain point for users running Kubernetes clusters.
Kubernetes v1.28 added a way for the &lt;code>kubelet&lt;/code>
to query the CRI implementation and find which cgroup driver to use. That automated detection is now
&lt;strong>strongly recommended&lt;/strong> and support for it has graduated to stable in v1.34.
If your CRI container runtime does not support the
ability to report the cgroup driver it needs, you
should upgrade or change your container runtime.
The &lt;code>cgroupDriver&lt;/code> configuration setting in the &lt;code>kubelet&lt;/code> configuration file is now deprecated.
The corresponding command-line option &lt;code>--cgroup-driver&lt;/code> was previously deprecated,
as Kubernetes recommends using the configuration file instead.
Both the configuration setting and command-line option will be removed in a future release;
that removal will not happen before the v1.36 minor release.&lt;/p>
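&lt;p>For reference, this is the now-deprecated setting in question; once your container runtime reports its cgroup driver over the CRI, you can simply remove it from your kubelet configuration file:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Deprecated: drop this line once your CRI runtime reports the driver itself
cgroupDriver: systemd
&lt;/code>&lt;/pre>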
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4033">KEP #4033&lt;/a> led by SIG Node.&lt;/p>
&lt;h4 id="kubernetes-to-end-containerd-1-x-support-in-v1-36">Kubernetes to end containerd 1.x support in v1.36&lt;/h4>
&lt;p>While Kubernetes v1.34 still supports containerd 1.7 and other LTS releases of containerd,
as a consequence of automated cgroup driver detection, the Kubernetes SIG Node community
has formally agreed upon a final support timeline for containerd v1.X.
The last Kubernetes release to offer this support will be v1.35 (aligned with containerd 1.7 EOL).
Consider this an early warning: if you are using containerd 1.X, plan to switch to containerd 2.0+ soon.
You can monitor the &lt;code>kubelet_cri_losing_support&lt;/code> metric to determine whether any nodes in your
cluster are using a containerd version that will soon be unsupported.&lt;/p>
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/4033">KEP #4033&lt;/a> led by SIG Node.&lt;/p>
&lt;h4 id="preferclose-traffic-distribution-is-deprecated">&lt;code>PreferClose&lt;/code> traffic distribution is deprecated&lt;/h4>
&lt;p>The &lt;code>spec.trafficDistribution&lt;/code> field within a Kubernetes &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/">Service&lt;/a> allows users to express preferences for how traffic should be routed to Service endpoints.&lt;/p>
&lt;p>&lt;a href="https://kep.k8s.io/3015">KEP-3015&lt;/a> deprecates &lt;code>PreferClose&lt;/code> and introduces two additional values: &lt;code>PreferSameZone&lt;/code> and &lt;code>PreferSameNode&lt;/code>. &lt;code>PreferSameZone&lt;/code> is an alias for the existing &lt;code>PreferClose&lt;/code> to clarify its semantics. &lt;code>PreferSameNode&lt;/code> allows connections to be delivered to a local endpoint when possible, falling back to a remote endpoint when not possible.&lt;/p>
&lt;p>This feature was introduced in v1.33 behind the &lt;code>PreferSameTrafficDistribution&lt;/code> feature gate. It has graduated to beta in v1.34 and is enabled by default.&lt;/p>
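&lt;p>Using the new values is a one-line change on the Service; for example, a minimal sketch preferring node-local endpoints:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: v1
kind: Service
metadata:
  name: my-service          # illustrative name
spec:
  selector:
    app: my-app
  ports:
  - port: 80
  trafficDistribution: PreferSameNode   # or PreferSameZone, the renamed PreferClose
&lt;/code>&lt;/pre>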
&lt;p>This work was done as part of &lt;a href="https://kep.k8s.io/3015">KEP #3015&lt;/a> led by SIG Network.&lt;/p>
&lt;h2 id="release-notes">Release notes&lt;/h2>
&lt;p>Check out the full details of the Kubernetes v1.34 release in our &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.34.md">release notes&lt;/a>.&lt;/p>
&lt;h2 id="availability">Availability&lt;/h2>
&lt;p>Kubernetes v1.34 is available for download on &lt;a href="https://github.com/kubernetes/kubernetes/releases/tag/v1.34.0">GitHub&lt;/a> or on the &lt;a href="https://kubernetes.io/releases/download/">Kubernetes download page&lt;/a>.&lt;/p>
&lt;p>To get started with Kubernetes, check out these &lt;a href="https://kubernetes.io/docs/tutorials/">interactive tutorials&lt;/a> or run local Kubernetes clusters using &lt;a href="https://minikube.sigs.k8s.io/">minikube&lt;/a>. You can also easily install v1.34 using &lt;a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/">kubeadm&lt;/a>.&lt;/p>
&lt;h2 id="release-team">Release Team&lt;/h2>
&lt;p>Kubernetes is only possible with the support, commitment, and hard work of its community. Each release team is made up of dedicated community volunteers who work together to build the many pieces that make up the Kubernetes releases you rely on. This requires the specialized skills of people from all corners of our community, from the code itself to its documentation and project management.&lt;/p>
&lt;p>&lt;a href="https://github.com/cncf/memorials/blob/main/rodolfo-martinez.md">We honor the memory of Rodolfo &amp;quot;Rodo&amp;quot; Martínez Vega&lt;/a>, a dedicated contributor whose passion for technology and community building left a mark on the Kubernetes community. Rodo served as a member of the Kubernetes Release Team across multiple releases, including v1.22-v1.23 and v1.25-v1.30, demonstrating unwavering commitment to the project's success and stability.&lt;br>
Beyond his Release Team contributions, Rodo was deeply involved in fostering the Cloud Native LATAM community, helping to bridge language and cultural barriers in the space. His work on the Spanish version of Kubernetes documentation and the CNCF Glossary exemplified his dedication to making knowledge accessible to Spanish-speaking developers worldwide. Rodo's legacy lives on through the countless community members he mentored, the releases he helped deliver, and the vibrant LATAM Kubernetes community he helped cultivate.&lt;/p>
&lt;p>We would like to thank the entire &lt;a href="https://github.com/kubernetes/sig-release/blob/master/releases/release-1.34/release-team.md">Release Team&lt;/a> for the hours spent hard at work to deliver the Kubernetes v1.34 release to our community. The Release Team's membership ranges from first-time shadows to returning team leads with experience forged over several release cycles. A very special thanks goes out to our release lead, Vyom Yadav, for guiding us through a successful release cycle, for his hands-on approach to solving challenges, and for bringing the energy and care that drives our community forward.&lt;/p>
&lt;h2 id="project-velocity">Project Velocity&lt;/h2>
&lt;p>The CNCF K8s &lt;a href="https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;var-period=m&amp;var-repogroup_name=All">DevStats&lt;/a> project aggregates a number of interesting data points related to the velocity of Kubernetes and various sub-projects. This includes everything from individual contributions to the number of companies that are contributing and is an illustration of the depth and breadth of effort that goes into evolving this ecosystem.&lt;/p>
&lt;p>During the v1.34 release cycle, which spanned 15 weeks from 19th May 2025 to 27th August 2025, Kubernetes received contributions from as many as 106 different companies and 491 individuals. In the wider cloud native ecosystem, the figure goes up to 370 companies, counting 2235 total contributors.&lt;/p>
&lt;p>Note that &amp;quot;contribution&amp;quot; counts when someone makes a commit, code review, comment, creates an issue or PR, reviews a PR (including blogs and documentation) or comments on issues and PRs.&lt;br>
If you are interested in contributing, visit &lt;a href="https://www.kubernetes.dev/docs/guide/#getting-started">Getting Started&lt;/a> on our contributor website.&lt;/p>
&lt;p>Source for this data:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;from=1747609200000&amp;to=1756335599000&amp;var-period=d28&amp;var-repogroup_name=Kubernetes&amp;var-repo_name=kubernetes%2Fkubernetes">Companies contributing to Kubernetes&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://k8s.devstats.cncf.io/d/11/companies-contributing-in-repository-groups?orgId=1&amp;from=1747609200000&amp;to=1756335599000&amp;var-period=d28&amp;var-repogroup_name=All&amp;var-repo_name=kubernetes%2Fkubernetes">Overall ecosystem contributions&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="event-update">Event Update&lt;/h2>
&lt;p>Explore upcoming Kubernetes and cloud native events, including KubeCon + CloudNativeCon, KCD, and other notable conferences worldwide. Stay informed and get involved with the Kubernetes community!&lt;/p>
&lt;p>&lt;strong>August 2025&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-colombia-presents-kcd-colombia-2025/">&lt;strong>KCD - Kubernetes Community Days: Colombia&lt;/strong>&lt;/a>: Aug 28, 2025 | Bogotá, Colombia&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>September 2025&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-cloud-native-sydney-presents-cloudcon-sydney-sydney-international-convention-centre-910-september/">&lt;strong>CloudCon Sydney&lt;/strong>&lt;/a>: Sep 9–10, 2025 | Sydney, Australia.&lt;/li>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-sf-bay-area-presents-kcd-san-francisco-bay-area/">&lt;strong>KCD - Kubernetes Community Days: San Francisco Bay Area&lt;/strong>&lt;/a>: Sep 9, 2025 | San Francisco, USA&lt;/li>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-washington-dc-presents-kcd-washington-dc-2025/">&lt;strong>KCD - Kubernetes Community Days: Washington DC&lt;/strong>&lt;/a>: Sep 16, 2025 | Washington, D.C., USA&lt;/li>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-sofia-presents-kubernetes-community-days-sofia/">&lt;strong>KCD - Kubernetes Community Days: Sofia&lt;/strong>&lt;/a>: Sep 18, 2025 | Sofia, Bulgaria&lt;/li>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-el-salvador-presents-kcd-el-salvador/">&lt;strong>KCD - Kubernetes Community Days: El Salvador&lt;/strong>&lt;/a>: Sep 20, 2025 | San Salvador, El Salvador&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>October 2025&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-warsaw-presents-kcd-warsaw-2025/">&lt;strong>KCD - Kubernetes Community Days: Warsaw&lt;/strong>&lt;/a>: Oct 9, 2025 | Warsaw, Poland&lt;/li>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-uk-presents-kubernetes-community-days-uk-edinburgh-2025/">&lt;strong>KCD - Kubernetes Community Days: Edinburgh&lt;/strong>&lt;/a>: Oct 21, 2025 | Edinburgh, United Kingdom&lt;/li>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-sri-lanka-presents-kcd-sri-lanka-2025/">&lt;strong>KCD - Kubernetes Community Days: Sri Lanka&lt;/strong>&lt;/a>: Oct 26, 2025 | Colombo, Sri Lanka&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>November 2025&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-porto-presents-kcd-porto-2025/">&lt;strong>KCD - Kubernetes Community Days: Porto&lt;/strong>&lt;/a>: Nov 3, 2025 | Porto, Portugal&lt;/li>
&lt;li>&lt;a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/">&lt;strong>KubeCon + CloudNativeCon North America 2025&lt;/strong>&lt;/a>: Nov 10-13, 2025 | Atlanta, USA&lt;/li>
&lt;li>&lt;a href="https://sessionize.com/kcd-hangzhou-and-oicd-2025/">&lt;strong>KCD - Kubernetes Community Days: Hangzhou&lt;/strong>&lt;/a>: Nov 15, 2025 | Hangzhou, China&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>December 2025&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://community.cncf.io/events/details/cncf-kcd-suisse-romande-presents-kcd-suisse-romande/">&lt;strong>KCD - Kubernetes Community Days: Suisse Romande&lt;/strong>&lt;/a>: Dec 4, 2025 | Geneva, Switzerland&lt;/li>
&lt;/ul>
&lt;p>You can find the latest event details &lt;a href="https://community.cncf.io/events/#/list">here&lt;/a>.&lt;/p>
&lt;h2 id="upcoming-release-webinar">Upcoming Release Webinar&lt;/h2>
&lt;p>Join members of the Kubernetes v1.34 Release Team on &lt;strong>Wednesday, September 24th 2025 at 4:00 PM (UTC)&lt;/strong>, to learn about the release highlights of this release. For more information and registration, visit the &lt;a href="https://community.cncf.io/events/details/cncf-cncf-online-programs-presents-cloud-native-live-kubernetes-v134-release/">event page&lt;/a> on the CNCF Online Programs site.&lt;/p>
&lt;h2 id="get-involved">Get Involved&lt;/h2>
&lt;p>The simplest way to get involved with Kubernetes is by joining one of the many &lt;a href="https://github.com/kubernetes/community/blob/master/sig-list.md">Special Interest Groups&lt;/a> (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly &lt;a href="https://github.com/kubernetes/community/tree/master/communication">community meeting&lt;/a>, and through the channels below. Thank you for your continued feedback and support.&lt;/p>
&lt;ul>
&lt;li>Follow us on Bluesky &lt;a href="https://bsky.app/profile/kubernetes.io">@Kubernetesio&lt;/a> for the latest updates&lt;/li>
&lt;li>Join the community discussion on &lt;a href="https://discuss.kubernetes.io/">Discuss&lt;/a>&lt;/li>
&lt;li>Join the community on &lt;a href="http://slack.k8s.io/">Slack&lt;/a>&lt;/li>
&lt;li>Post questions (or answer questions) on &lt;a href="http://stackoverflow.com/questions/tagged/kubernetes">Stack Overflow&lt;/a>&lt;/li>
&lt;li>Share your Kubernetes &lt;a href="https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform">story&lt;/a>&lt;/li>
&lt;li>Read more about what’s happening with Kubernetes on the &lt;a href="https://kubernetes.io/blog/">blog&lt;/a>&lt;/li>
&lt;li>Learn more about the &lt;a href="https://github.com/kubernetes/sig-release/tree/master/release-team">Kubernetes Release Team&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Tuning Linux Swap for Kubernetes: A Deep Dive</title><link>https://kubernetes.io/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/</link><pubDate>Tue, 19 Aug 2025 10:30:00 -0800</pubDate><guid>https://kubernetes.io/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/</guid><description>
&lt;p>The Kubernetes &lt;a href="https://kubernetes.io/docs/concepts/cluster-administration/swap-memory-management/">NodeSwap feature&lt;/a>, likely to graduate to &lt;em>stable&lt;/em> in the upcoming Kubernetes v1.34 release,
allows swap usage, a significant shift from the conventional practice of disabling swap for performance predictability.
This article focuses exclusively on tuning swap on Linux nodes, where this feature is available. By allowing Linux nodes to use secondary storage for additional virtual memory when physical RAM is exhausted, node swap support aims to improve resource utilization and reduce out-of-memory (OOM) kills.&lt;/p>
&lt;p>However, enabling swap is not a &amp;quot;turn-key&amp;quot; solution. The performance and stability of your nodes under memory pressure depend critically on a set of Linux kernel parameters. Misconfiguration can lead to performance degradation and interfere with the kubelet's eviction logic.&lt;/p>
&lt;p>In this blog post, I'll dive into the critical Linux kernel parameters that govern swap behavior. I will explore how these parameters influence Kubernetes workload performance, swap utilization, and crucial eviction mechanisms.
I will present various test results showcasing the impact of different configurations, and share my findings on achieving optimal settings for stable and high-performing Kubernetes clusters.&lt;/p>
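&lt;p>As context for the tuning discussion that follows, here is a minimal sketch of the kubelet configuration that permits swap on a Linux node where swap space has already been provisioned:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false            # allow the kubelet to start on a node with swap enabled
memorySwap:
  swapBehavior: LimitedSwap  # Burstable Pods may swap, within limits derived from their requests
&lt;/code>&lt;/pre>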
&lt;h2 id="introduction-to-linux-swap">Introduction to Linux swap&lt;/h2>
&lt;p>At a high level, the Linux kernel manages memory through pages, typically 4KiB in size. When physical memory becomes constrained, the kernel's page replacement algorithm decides which pages to move to swap space. While the exact logic is a sophisticated optimization, this decision-making process is influenced by certain key factors:&lt;/p>
&lt;ol>
&lt;li>Page access patterns (how recently pages are accessed)&lt;/li>
&lt;li>Page dirtiness (whether pages have been modified)&lt;/li>
&lt;li>Memory pressure (how urgently the system needs free memory)&lt;/li>
&lt;/ol>
&lt;h3 id="anonymous-vs-file-backed-memory">Anonymous vs File-backed memory&lt;/h3>
&lt;p>It is important to understand that not all memory pages are the same. The kernel distinguishes between anonymous and file-backed memory.&lt;/p>
&lt;p>&lt;strong>Anonymous memory&lt;/strong>: This is memory that is not backed by a specific file on the disk, such as a program's heap and stack. From the application's perspective this is private memory, and when the kernel needs to reclaim these pages, it must write them to a dedicated swap device.&lt;/p>
&lt;p>&lt;strong>File-backed memory&lt;/strong>: This memory is backed by a file on a filesystem. This includes a program's executable code, shared libraries, and filesystem caches. When the kernel needs to reclaim these pages, it can simply discard them if they have not been modified (&amp;quot;clean&amp;quot;). If a page has been modified (&amp;quot;dirty&amp;quot;), the kernel must first write the changes back to the file before it can be discarded.&lt;/p>
&lt;p>While a system without swap can still reclaim clean file-backed pages under memory pressure by dropping them, it has no way to offload anonymous memory. Enabling swap provides this capability, allowing the kernel to move less-frequently accessed memory pages to disk, conserving RAM and avoiding system-wide OOM kills.&lt;/p>
&lt;h3 id="key-kernel-parameters-for-swap-tuning">Key kernel parameters for swap tuning&lt;/h3>
&lt;p>To effectively tune swap behavior, Linux provides several kernel parameters that can be managed via &lt;code>sysctl&lt;/code>.&lt;/p>
&lt;ul>
&lt;li>&lt;code>vm.swappiness&lt;/code>: This is the most well-known parameter. It is a value from 0 to 200 (100 in older kernels) that controls the kernel's preference for swapping anonymous memory pages versus reclaiming file-backed memory pages (page cache).
&lt;ul>
&lt;li>&lt;strong>High value (eg: 90+)&lt;/strong>: The kernel will be aggressive in swapping out less-used anonymous memory to make room for file-cache.&lt;/li>
&lt;li>&lt;strong>Low value (eg: &amp;lt; 10)&lt;/strong>: The kernel will strongly prefer dropping file cache pages over swapping anonymous memory.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>vm.min_free_kbytes&lt;/code>: This parameter tells the kernel to keep a minimum amount of memory free as a buffer. When the amount of free memory drops below this safety buffer, the kernel starts reclaiming pages more aggressively (swapping, and eventually resorting to OOM kills).
&lt;ul>
&lt;li>&lt;strong>Function:&lt;/strong> It acts as a safety lever to ensure the kernel has enough memory for critical allocation requests that cannot be deferred.&lt;/li>
&lt;li>&lt;strong>Impact on swap&lt;/strong>: Setting a higher &lt;code>min_free_kbytes&lt;/code> effectively raises the floor for free memory, causing the kernel to initiate swap earlier under memory pressure.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;code>vm.watermark_scale_factor&lt;/code>: This setting controls the gap between different watermarks: &lt;code>min&lt;/code>, &lt;code>low&lt;/code> and &lt;code>high&lt;/code>, which are calculated based on &lt;code>min_free_kbytes&lt;/code>.
&lt;ul>
&lt;li>&lt;strong>Watermarks explained&lt;/strong>:
&lt;ul>
&lt;li>&lt;code>low&lt;/code>: When free memory is below this mark, the &lt;code>kswapd&lt;/code> kernel process wakes up to reclaim pages in the background. This is when a swapping cycle begins.&lt;/li>
&lt;li>&lt;code>min&lt;/code>: When free memory hits this minimum level, aggressive direct reclamation begins and blocks allocating processes. Failing to reclaim enough pages causes OOM kills.&lt;/li>
&lt;li>&lt;code>high&lt;/code>: Memory reclamation stops once the free memory reaches this level.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Impact&lt;/strong>: A higher &lt;code>watermark_scale_factor&lt;/code> creates a larger buffer between the &lt;code>low&lt;/code> and &lt;code>min&lt;/code> watermarks. This gives &lt;code>kswapd&lt;/code> more time to reclaim memory gradually before the system hits a critical state.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>In a typical server workload, you might have a long-running process whose memory becomes 'cold'. A higher &lt;code>swappiness&lt;/code> value can free up RAM for other active processes by swapping out that cold memory, letting those processes keep their file cache.&lt;/p>
&lt;p>Tuning &lt;code>min_free_kbytes&lt;/code> and &lt;code>watermark_scale_factor&lt;/code> to move the swapping window earlier gives &lt;code>kswapd&lt;/code> more room to offload memory to disk and helps prevent OOM kills during sudden memory spikes.&lt;/p>
&lt;h2 id="swap-tests-and-results">Swap tests and results&lt;/h2>
&lt;p>To understand the real impact of these parameters, I designed a series of stress tests.&lt;/p>
&lt;h3 id="test-setup">Test setup&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Environment&lt;/strong>: GKE on Google Cloud&lt;/li>
&lt;li>&lt;strong>Kubernetes version&lt;/strong>: 1.33.2&lt;/li>
&lt;li>&lt;strong>Node configuration&lt;/strong>: &lt;code>n2-standard-2&lt;/code> (8GiB RAM, 50GB swap on a &lt;code>pd-balanced&lt;/code> disk, without encryption), Ubuntu 22.04&lt;/li>
&lt;li>&lt;strong>Workload&lt;/strong>: A custom Go application designed to allocate memory at a configurable rate, generate file-cache pressure, and simulate different memory access patterns (random vs sequential).&lt;/li>
&lt;li>&lt;strong>Monitoring&lt;/strong>: A sidecar container capturing system metrics every second.&lt;/li>
&lt;li>&lt;strong>Protection&lt;/strong>: Critical system components (kubelet, container runtime, sshd) were prevented from swapping by setting &lt;code>memory.swap.max=0&lt;/code> in their respective cgroups.&lt;/li>
&lt;/ul>
&lt;h3 id="test-methodology">Test methodology&lt;/h3>
&lt;p>I ran a stress-test pod on nodes with different swappiness settings (0, 60, and 90) and varied the &lt;code>min_free_kbytes&lt;/code> and &lt;code>watermark_scale_factor&lt;/code> parameters to observe the outcomes under heavy memory allocation and I/O pressure.&lt;/p>
&lt;h4 id="visualizing-swap-in-action">Visualizing swap in action&lt;/h4>
&lt;p>The graph below, from a 100MBps stress test, shows swap in action. As free memory (in the &amp;quot;Memory Usage&amp;quot; plot) decreases, swap usage (&lt;code>Swap Used (GiB)&lt;/code>) and swap-out activity (&lt;code>Swap Out (MiB/s)&lt;/code>) increase. Critically, as the system relies more on swap, the I/O activity and corresponding wait time (&lt;code>IO Wait %&lt;/code> in the &amp;quot;CPU Usage&amp;quot; plot) also rise, indicating CPU stress.&lt;/p>
&lt;p>&lt;img alt="Graph showing CPU, Memory, Swap utilization and I/O activity on a Kubernetes node" src="https://kubernetes.io/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/swap_visualization.png" title="swap visualization">&lt;/p>
&lt;h3 id="findings">Findings&lt;/h3>
&lt;p>My initial tests with default kernel parameters (&lt;code>swappiness=60&lt;/code>, &lt;code>min_free_kbytes=68MB&lt;/code>, &lt;code>watermark_scale_factor=10&lt;/code>) quickly led to OOM kills and even unexpected node restarts under high memory pressure. By selecting appropriate kernel parameters, however, a good balance between node stability and performance can be achieved.&lt;/p>
&lt;h4 id="the-impact-of-swappiness">The impact of &lt;code>swappiness&lt;/code>&lt;/h4>
&lt;p>The &lt;code>swappiness&lt;/code> parameter directly influences the kernel's choice between reclaiming anonymous memory (swapping) and dropping page cache. To observe this, I ran a test where one pod generated and held file-cache pressure, followed by a second pod allocating anonymous memory at 100MB/s, and watched which type of memory the kernel preferred to reclaim.&lt;/p>
&lt;p>My findings reveal a clear trade-off:&lt;/p>
&lt;ul>
&lt;li>&lt;code>swappiness=90&lt;/code>: The kernel proactively swapped out the inactive anonymous memory to keep the file cache. This resulted in high and sustained swap usage and significant I/O activity (&amp;quot;Blocks Out&amp;quot;), which in turn caused spikes in I/O wait on the CPU.&lt;/li>
&lt;li>&lt;code>swappiness=0&lt;/code>: The kernel favored dropping file-cache pages, delaying swap consumption. However, it's critical to understand that this &lt;strong>does not disable swapping&lt;/strong>. When memory pressure was high, the kernel still swapped anonymous memory to disk.&lt;/li>
&lt;/ul>
&lt;p>The choice is workload-dependent. For workloads sensitive to I/O latency, a lower swappiness is preferable. For workloads that rely on a large and frequently accessed file cache, a higher swappiness may be beneficial, provided the underlying disk is fast enough to handle the load.&lt;/p>
&lt;h4 id="tuning-watermarks-to-prevent-eviction-and-oom-kills">Tuning watermarks to prevent eviction and OOM kills&lt;/h4>
&lt;p>The most critical challenge I encountered was the interaction between rapid memory allocation and Kubelet's eviction mechanism. When my test pod, which was deliberately configured to overcommit memory, allocated it at a high rate (e.g., 300-500 MBps), the system quickly ran out of free memory.&lt;/p>
&lt;p>With default watermarks, the buffer for reclamation was too small. Before &lt;code>kswapd&lt;/code> could free up enough memory by swapping, the node would hit a critical state, leading to two potential outcomes:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Kubelet eviction:&lt;/strong> If kubelet's eviction manager detected &lt;code>memory.available&lt;/code> was below its threshold, it would evict the pod.&lt;/li>
&lt;li>&lt;strong>OOM killer:&lt;/strong> In some high-rate scenarios, the OOM killer would activate before eviction could complete, sometimes killing higher-priority pods that were not the source of the pressure.&lt;/li>
&lt;/ol>
&lt;p>To mitigate this, I tuned the watermarks:&lt;/p>
&lt;ol>
&lt;li>Increased &lt;code>min_free_kbytes&lt;/code> to 512MiB: This forces the kernel to start reclaiming memory much earlier, providing a larger safety buffer.&lt;/li>
&lt;li>Increased &lt;code>watermark_scale_factor&lt;/code> to 2000: This widened the gap between the &lt;code>low&lt;/code> and &lt;code>high&lt;/code> watermarks (from ≈337MB to ≈591MB in my test node's &lt;code>/proc/zoneinfo&lt;/code>), effectively increasing the swapping window.&lt;/li>
&lt;/ol>
&lt;p>This combination gave &lt;code>kswapd&lt;/code> a larger operational zone and more time to swap pages to disk during memory spikes, successfully preventing both premature evictions and OOM kills in my test runs.&lt;/p>
&lt;p>The table below compares watermark levels from &lt;code>/proc/zoneinfo&lt;/code> (non-NUMA node):&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;code>min_free_kbytes=67584KiB&lt;/code> and &lt;code>watermark_scale_factor=10&lt;/code>&lt;/th>
&lt;th>&lt;code>min_free_kbytes=524288KiB&lt;/code> and &lt;code>watermark_scale_factor=2000&lt;/code>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Node 0, zone Normal &lt;br>   pages free 583273 &lt;br>   boost 0 &lt;br>   min 10504 &lt;br>   low 13130 &lt;br>   high 15756 &lt;br>   spanned 1310720 &lt;br>   present 1310720 &lt;br>   managed 1265603&lt;/td>
&lt;td>Node 0, zone Normal &lt;br>   pages free 470539 &lt;br>   min 82109 &lt;br>   low 337017 &lt;br>   high 591925&lt;br>   spanned 1310720&lt;br>   present 1310720 &lt;br>   managed 1274542&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The graph below reveals that the kernel buffer size and scaling factor play a crucial role in determining how the system responds to memory load. With the right combination of these parameters, the system can effectively use swap space to avoid eviction and maintain stability.&lt;/p>
&lt;p>&lt;img alt="A side-by-side comparison of different min_free_kbytes settings, showing differences in Swap, Memory Usage and Eviction impact" src="https://kubernetes.io/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/memory-and-swap-growth.png" title="Memory and Swap Utilization with min_free_kbytes">&lt;/p>
&lt;h3 id="risks-and-recommendations">Risks and recommendations&lt;/h3>
&lt;p>Enabling swap in Kubernetes is a powerful tool, but it comes with risks that must be managed through careful tuning.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Risk of performance degradation&lt;/strong> Swapping is orders of magnitude slower than accessing RAM. If an application's active working set is swapped out, its performance will suffer dramatically due to high I/O wait times (thrashing). Swap should preferably be provisioned on SSD-backed storage to improve performance.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Risk of masking memory leaks&lt;/strong> Swap can hide memory leaks in applications, which might otherwise lead to a quick OOM kill. With swap, a leaky application might slowly degrade node performance over time, making the root cause harder to diagnose.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Risk of disabling evictions&lt;/strong> The kubelet proactively monitors the node for memory pressure and terminates pods to reclaim resources. Improper tuning can lead to OOM kills before the kubelet has a chance to evict pods gracefully. A properly configured &lt;code>min_free_kbytes&lt;/code> is essential to ensure the kubelet's eviction mechanism remains effective.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="kubernetes-context">Kubernetes context&lt;/h3>
&lt;p>Together, the kernel watermarks and the kubelet eviction threshold create a series of memory pressure zones on a node. The eviction-threshold parameters need to be adjusted so that Kubernetes-managed evictions occur before OOM kills.&lt;/p>
&lt;p>&lt;img alt="Preferred thresholds for effective swap utilization" src="https://kubernetes.io/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/swap-thresholds.png" title="Recommended Thresholds">&lt;/p>
&lt;p>As the diagram shows, an ideal configuration creates a large enough 'swapping zone' (between the &lt;code>high&lt;/code> and &lt;code>min&lt;/code> watermarks) that the kernel can handle memory pressure by swapping before available memory drops into the eviction/direct-reclaim zone.&lt;/p>
&lt;h3 id="recommended-starting-point">Recommended starting point&lt;/h3>
&lt;p>Based on these findings, I recommend the following as a starting point for Linux nodes with swap enabled; a sketch of one way to apply these settings follows the list. You should benchmark this with your own workloads.&lt;/p>
&lt;ul>
&lt;li>&lt;code>vm.swappiness=60&lt;/code>: The Linux default is a good starting point for general-purpose workloads. However, the ideal value is workload-dependent, and swap-sensitive applications may need more careful tuning.&lt;/li>
&lt;li>&lt;code>vm.min_free_kbytes=500000&lt;/code> (500MB): Set this to a reasonably high value (e.g., 2-3% of total node memory) to give the node an adequate safety buffer.&lt;/li>
&lt;li>&lt;code>vm.watermark_scale_factor=2000&lt;/code>: Create a larger window for &lt;code>kswapd&lt;/code> to work with, preventing OOM kills during sudden memory allocation spikes.&lt;/li>
&lt;/ul>
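&lt;p>One way to roll these values out across a node pool is a small privileged DaemonSet that applies them on each node; the sketch below assumes that approach (all names are illustrative). A node tuning operator or your provider's node configuration mechanism works just as well.&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: swap-sysctl-tuner    # illustrative name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: swap-sysctl-tuner
  template:
    metadata:
      labels:
        app: swap-sysctl-tuner
    spec:
      initContainers:
      - name: apply-sysctls
        image: busybox:1.36
        securityContext:
          privileged: true   # required to write node-level sysctls
        command:
        - sh
        - -c
        - |
          sysctl -w vm.swappiness=60
          sysctl -w vm.min_free_kbytes=500000
          sysctl -w vm.watermark_scale_factor=2000
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.10   # keeps the Pod (and settings) in place
        resources:
          requests:
            cpu: 1m
            memory: 8Mi
&lt;/code>&lt;/pre>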
&lt;p>I encourage you to run benchmark tests with your own workloads in test environments when setting up swap for the first time in your Kubernetes cluster. Swap performance is sensitive to environmental differences such as CPU load, disk type (SSD vs HDD), and I/O patterns.&lt;/p></description></item><item><title>Introducing Headlamp AI Assistant</title><link>https://kubernetes.io/blog/2025/08/07/introducing-headlamp-ai-assistant/</link><pubDate>Thu, 07 Aug 2025 20:00:00 +0100</pubDate><guid>https://kubernetes.io/blog/2025/08/07/introducing-headlamp-ai-assistant/</guid><description>
&lt;p>&lt;em>This announcement originally &lt;a href="https://headlamp.dev/blog/2025/08/07/introducing-the-headlamp-ai-assistant">appeared&lt;/a> on the Headlamp blog.&lt;/em>&lt;/p>
&lt;p>To simplify Kubernetes management and troubleshooting, we're thrilled to
introduce &lt;a href="https://github.com/headlamp-k8s/plugins/tree/main/ai-assistant#readme">Headlamp AI Assistant&lt;/a>: a powerful new plugin for Headlamp that helps
you understand and operate your Kubernetes clusters and applications with
greater clarity and ease.&lt;/p>
&lt;p>Whether you're a seasoned engineer or just getting started, the AI Assistant offers:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Fast time to value:&lt;/strong> Ask questions like &lt;em>&amp;quot;Is my application healthy?&amp;quot;&lt;/em> or
&lt;em>&amp;quot;How can I fix this?&amp;quot;&lt;/em> without needing deep Kubernetes knowledge.&lt;/li>
&lt;li>&lt;strong>Deep insights:&lt;/strong> Start with high-level queries and dig deeper with prompts
like &lt;em>&amp;quot;List all the problematic pods&amp;quot;&lt;/em> or &lt;em>&amp;quot;How can I fix this pod?&amp;quot;&lt;/em>&lt;/li>
&lt;li>&lt;strong>Focused &amp;amp; relevant:&lt;/strong> Ask questions in the context of what you're viewing
in the UI, such as &lt;em>&amp;quot;What's wrong here?&amp;quot;&lt;/em>&lt;/li>
&lt;li>&lt;strong>Action-oriented:&lt;/strong> Let the AI take action for you, like &lt;em>&amp;quot;Restart that
deployment&amp;quot;&lt;/em>, with your permission.&lt;/li>
&lt;/ul>
&lt;p>Here is a demo of the AI Assistant in action as it helps troubleshoot an
application running with issues in a Kubernetes cluster:&lt;/p>
&lt;div class="youtube-quote-sm">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/GzXkUuCTcd4?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" title="Headlamp AI Assistant"
>&lt;/iframe>
&lt;/div>
&lt;h2 id="hopping-on-the-ai-train">Hopping on the AI train&lt;/h2>
&lt;p>Large Language Models (LLMs) have transformed not just how we access data but
also how we interact with it. The rise of tools like ChatGPT opened a world of
possibilities, inspiring a wave of new applications. Asking questions or giving
commands in natural language is intuitive, especially for users who aren't deeply
technical. Now everyone can quickly ask how to do X or Y, without feeling awkward
or having to traverse pages and pages of documentation like before.&lt;/p>
&lt;p>Therefore, Headlamp AI Assistant brings a conversational UI to &lt;a href="https://headlamp.dev">Headlamp&lt;/a>,
powered by LLMs that Headlamp users can configure with their own API keys.
It is available as a Headlamp plugin, making it easy to integrate into your
existing setup. Users can enable it by installing the plugin and configuring
it with their own LLM API keys, giving them control over which model powers
the assistant. Once enabled, the assistant becomes part of the Headlamp UI,
ready to respond to contextual queries and perform actions directly from the
interface.&lt;/p>
&lt;h2 id="context-is-everything">Context is everything&lt;/h2>
&lt;p>As expected, the AI Assistant is focused on helping users with Kubernetes
concepts. Yet, while there is a lot of value in responding to Kubernetes
related questions from Headlamp's UI, we believe that the great benefit of such
an integration is when it can use the context of what the user is experiencing
in an application. So, the Headlamp AI Assistant knows what you're currently
viewing in Headlamp, and this makes the interaction feel more like working
with a human assistant.&lt;/p>
&lt;p>For example, if a pod is failing, users can simply ask &lt;em>&amp;quot;What's wrong here?&amp;quot;&lt;/em>
and the AI Assistant will respond with the root cause, like a missing
environment variable or a typo in the image name. Follow-up prompts like
&lt;em>&amp;quot;How can I fix this?&amp;quot;&lt;/em> allow the AI Assistant to suggest a fix, streamlining
what used to take multiple steps into a quick, conversational flow.&lt;/p>
&lt;p>Sharing the context from Headlamp is not a trivial task though, so it's
something we will keep working on perfecting.&lt;/p>
&lt;h2 id="tools">Tools&lt;/h2>
&lt;p>Context from the UI is helpful, but sometimes additional capabilities are
needed. If the user is viewing the pod list and wants to identify problematic
deployments, switching views should not be necessary. To address this, the AI
Assistant includes support for a Kubernetes tool. This allows asking questions
like &amp;quot;Get me all deployments with problems&amp;quot; prompting the assistant to fetch
and display relevant data from the current cluster. Likewise, if the user
requests an action like &amp;quot;Restart that deployment&amp;quot; after the AI points out what
deployment needs restarting, it can also do that. For &amp;quot;write&amp;quot;
operations, the AI Assistant checks with the user for permission before running them.&lt;/p>
&lt;h2 id="ai-plugins">AI Plugins&lt;/h2>
&lt;p>Although the initial version of the AI Assistant is already useful for
Kubernetes users, future iterations will expand its capabilities. Currently,
the assistant supports only the Kubernetes tool, but further integration with
Headlamp plugins is underway. For example, we could get richer insights for
GitOps via the Flux plugin, monitoring through Prometheus, package management
with Helm, and more.&lt;/p>
&lt;p>And of course, as the popularity of MCP grows, we are looking into how to
integrate it as well, in a more plug-and-play fashion.&lt;/p>
&lt;h2 id="try-it-out">Try it out!&lt;/h2>
&lt;p>We hope this first version of the AI Assistant helps users manage Kubernetes
clusters more effectively and assists newcomers in navigating the learning
curve. We invite you to try out this early version and give us your feedback.
The AI Assistant plugin can be installed from Headlamp's Plugin Catalog in the
desktop version, or by using the container image when deploying Headlamp.
Stay tuned for the future versions of the Headlamp AI Assistant!&lt;/p></description></item><item><title>Kubernetes v1.34 Sneak Peek</title><link>https://kubernetes.io/blog/2025/07/28/kubernetes-v1-34-sneak-peek/</link><pubDate>Mon, 28 Jul 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/07/28/kubernetes-v1-34-sneak-peek/</guid><description>
&lt;p>Kubernetes v1.34 is coming at the end of August 2025.
This release will not include any removal or deprecation, but it is packed with an impressive number of enhancements.
Here are some of the features we are most excited about in this cycle!&lt;/p>
&lt;p>Please note that this information reflects the current state of v1.34 development and may change before release.&lt;/p>
&lt;h2 id="featured-enhancements-of-kubernetes-v1-34">Featured enhancements of Kubernetes v1.34&lt;/h2>
&lt;p>The following list highlights some of the notable enhancements likely to be included in the v1.34 release,
but is not an exhaustive list of all planned changes.
This is not a commitment and the release content is subject to change.&lt;/p>
&lt;h3 id="the-core-of-dra-targets-stable">The core of DRA targets stable&lt;/h3>
&lt;p>&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/">Dynamic Resource Allocation&lt;/a> (DRA) provides a flexible way to categorize,
request, and use devices like GPUs or custom hardware in your Kubernetes cluster.&lt;/p>
&lt;p>Since the v1.30 release, DRA has been based around claiming devices using &lt;em>structured parameters&lt;/em> that are opaque to the core of Kubernetes.
The relevant enhancement proposal, &lt;a href="https://kep.k8s.io/4381">KEP-4381&lt;/a>, took inspiration from dynamic provisioning for storage volumes.
DRA with structured parameters relies on a set of supporting API kinds: ResourceClaim, DeviceClass, ResourceClaimTemplate,
and ResourceSlice API types under &lt;code>resource.k8s.io&lt;/code>, while extending the &lt;code>.spec&lt;/code> for Pods with a new &lt;code>resourceClaims&lt;/code> field.
The core of DRA is targeting graduation to stable in Kubernetes v1.34.&lt;/p>
&lt;p>With DRA, device drivers and cluster admins define device classes that are available for use.
Workloads can claim devices from a device class within device requests.
Kubernetes allocates matching devices to specific claims and places the corresponding Pods on nodes that can access the allocated devices.
This framework provides flexible device filtering using CEL, centralized device categorization, and simplified Pod requests, among other benefits.&lt;/p>
&lt;p>Once this feature has graduated, the &lt;code>resource.k8s.io/v1&lt;/code> APIs will be available by default.&lt;/p>
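&lt;p>To make the flow concrete, here is a minimal sketch of a claim and a Pod that consumes it; the device class name and image are placeholders, and the request layout follows the &lt;code>resource.k8s.io/v1&lt;/code> shape described above (check the current API reference for the exact field names in your version):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com   # hypothetical device class
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu   # bind the Pod to the claim above
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
    resources:
      claims:
      - name: gpu                   # this container uses the allocated device
&lt;/code>&lt;/pre>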
&lt;h3 id="serviceaccount-tokens-for-image-pull-authentication">ServiceAccount tokens for image pull authentication&lt;/h3>
&lt;p>The &lt;a href="https://kubernetes.io/docs/concepts/security/service-accounts/">ServiceAccount&lt;/a> token integration for &lt;code>kubelet&lt;/code> credential providers is likely to reach beta and be enabled by default in Kubernetes v1.34.
This allows the &lt;code>kubelet&lt;/code> to use these tokens when pulling container images from registries that require authentication.&lt;/p>
&lt;p>That support already exists as alpha, and is tracked as part of &lt;a href="https://kep.k8s.io/4412">KEP-4412&lt;/a>.&lt;/p>
&lt;p>The existing alpha integration allows the &lt;code>kubelet&lt;/code> to use short-lived, automatically rotated ServiceAccount tokens (that follow OIDC-compliant semantics) to authenticate to a container image registry.
Each token is scoped to one associated Pod; the overall mechanism replaces the need for long-lived image pull Secrets.&lt;/p>
&lt;p>Adopting this new approach reduces security risks, supports workload-level identity, and helps cut operational overhead.
It brings image pull authentication closer to modern, identity-aware good practice.&lt;/p>
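&lt;p>Configuration happens in the kubelet's credential provider config. The sketch below is an assumption based on KEP-4412's design; the plugin name and audience are illustrative, and the &lt;code>tokenAttributes&lt;/code> field names may differ in the released API.&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: example-registry-plugin        # hypothetical plugin binary
  matchImages:
  - &amp;quot;*.registry.example.com&amp;quot;
  defaultCacheDuration: &amp;quot;10m&amp;quot;
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  tokenAttributes:                     # assumed fields from KEP-4412
    serviceAccountTokenAudience: registry.example.com
    requireServiceAccount: true
&lt;/code>&lt;/pre>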
&lt;h3 id="pod-replacement-policy-for-deployments">Pod replacement policy for Deployments&lt;/h3>
&lt;p>After a change to a &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/">Deployment&lt;/a>, terminating pods may stay up for a considerable amount of time and may consume additional resources.
As part of &lt;a href="https://kep.k8s.io/3973">KEP-3973&lt;/a>, the &lt;code>.spec.podReplacementPolicy&lt;/code> field will be introduced (as alpha) for Deployments.&lt;/p>
&lt;p>If your cluster has the feature enabled, you'll be able to select one of two policies:&lt;/p>
&lt;dl>
&lt;dt>&lt;code>TerminationStarted&lt;/code>&lt;/dt>
&lt;dd>Creates new pods as soon as old ones start terminating, resulting in faster rollouts at the cost of potentially higher resource consumption.&lt;/dd>
&lt;dt>&lt;code>TerminationComplete&lt;/code>&lt;/dt>
&lt;dd>Waits until old pods fully terminate before creating new ones, resulting in slower rollouts but ensuring controlled resource consumption.&lt;/dd>
&lt;/dl>
&lt;p>This feature makes Deployment behavior more predictable by letting you choose when new pods should be created during updates or scaling.
It's beneficial when working in clusters with tight resource constraints or with workloads with long termination periods.&lt;/p>
&lt;p>It's expected to be available as an alpha feature and can be enabled using the &lt;code>DeploymentPodReplacementPolicy&lt;/code> and &lt;code>DeploymentReplicaSetTerminatingReplicas&lt;/code> feature gates in the API server and kube-controller-manager.&lt;/p>
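&lt;p>In manifest terms, the policy is a single new field on the Deployment spec; a minimal sketch, assuming the alpha feature gates above are enabled (names are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: slow-shutdown-app
spec:
  replicas: 3
  podReplacementPolicy: TerminationComplete   # alpha field from KEP-3973
  selector:
    matchLabels:
      app: slow-shutdown-app
  template:
    metadata:
      labels:
        app: slow-shutdown-app
    spec:
      terminationGracePeriodSeconds: 120      # long shutdowns make the policy matter
      containers:
      - name: app
        image: registry.example/app:latest    # placeholder image
&lt;/code>&lt;/pre>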
&lt;h3 id="production-ready-tracing-for-kubelet-and-api-server">Production-ready tracing for &lt;code>kubelet&lt;/code> and API Server&lt;/h3>
&lt;p>To address the longstanding challenge of debugging node-level issues by correlating disconnected logs,
&lt;a href="https://kep.k8s.io/2831">KEP-2831&lt;/a> provides deep, contextual insights into the &lt;code>kubelet&lt;/code>.&lt;/p>
&lt;p>This feature instruments critical &lt;code>kubelet&lt;/code> operations, particularly its gRPC calls to the Container Runtime Interface (CRI), using the vendor-agnostic OpenTelemetry standard.
It allows operators to visualize the entire lifecycle of events (for example: a Pod startup) to pinpoint sources of latency and errors.
Its most powerful aspect is the propagation of trace context; the &lt;code>kubelet&lt;/code> passes a trace ID with its requests to the container runtime, enabling runtimes to link their own spans.&lt;/p>
&lt;p>This effort is complemented by a parallel enhancement, &lt;a href="https://kep.k8s.io/647">KEP-647&lt;/a>, which brings the same tracing capabilities to the Kubernetes API server.
Together, these enhancements provide a more unified, end-to-end view of events, simplifying the process of pinpointing latency and errors from the control plane down to the node.
These features have matured through the official Kubernetes release process.
&lt;a href="https://kep.k8s.io/2831">KEP-2831&lt;/a> was introduced as an alpha feature in v1.25, while &lt;a href="https://kep.k8s.io/647">KEP-647&lt;/a> debuted as alpha in v1.22.
Both enhancements were promoted to beta together in the v1.27 release.
Looking forward, Kubelet Tracing (&lt;a href="https://kep.k8s.io/2831">KEP-2831&lt;/a>) and API Server Tracing (&lt;a href="https://kep.k8s.io/647">KEP-647&lt;/a>) are now targeting graduation to stable in the upcoming v1.34 release.&lt;/p>
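&lt;p>Enabling kubelet tracing is a small addition to the kubelet configuration file; a minimal sketch, assuming an OTLP collector is reachable on the node (lower the sampling rate for production use):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
tracing:
  endpoint: localhost:4317         # OTLP gRPC collector endpoint (assumed node-local)
  samplingRatePerMillion: 1000000  # trace everything; reduce this outside of testing
&lt;/code>&lt;/pre>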
&lt;h3 id="prefersamezone-and-prefersamenode-traffic-distribution-for-services">&lt;code>PreferSameZone&lt;/code> and &lt;code>PreferSameNode&lt;/code> traffic distribution for Services&lt;/h3>
&lt;p>The &lt;code>spec.trafficDistribution&lt;/code> field within a Kubernetes &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/">Service&lt;/a> allows users to express preferences for how traffic should be routed to Service endpoints.&lt;/p>
&lt;p>&lt;a href="https://kep.k8s.io/3015">KEP-3015&lt;/a> deprecates &lt;code>PreferClose&lt;/code> and introduces two additional values: &lt;code>PreferSameZone&lt;/code> and &lt;code>PreferSameNode&lt;/code>.
&lt;code>PreferSameZone&lt;/code> is equivalent to the current &lt;code>PreferClose&lt;/code>.
&lt;code>PreferSameNode&lt;/code> prioritizes sending traffic to endpoints on the same node as the client.&lt;/p>
&lt;p>This feature was introduced in v1.33 behind the &lt;code>PreferSameTrafficDistribution&lt;/code> feature gate.
It is targeting graduation to beta in v1.34 with its feature gate enabled by default.&lt;/p>
&lt;h3 id="support-for-kyaml-a-kubernetes-dialect-of-yaml">Support for KYAML: a Kubernetes dialect of YAML&lt;/h3>
&lt;p>KYAML aims to be a safer and less ambiguous YAML subset, and was designed specifically
for Kubernetes. Whatever version of Kubernetes you use, you'll be able to use KYAML for writing manifests
and/or Helm charts.
You can write KYAML and pass it as an input to &lt;strong>any&lt;/strong> version of &lt;code>kubectl&lt;/code>,
because all KYAML files are also valid as YAML.
With kubectl v1.34, we expect you'll also be able to request KYAML output from &lt;code>kubectl&lt;/code> (as in &lt;code>kubectl get -o kyaml …&lt;/code>).
If you prefer, you can still request the output in JSON or YAML format.&lt;/p>
&lt;p>KYAML addresses specific challenges with both YAML and JSON.
YAML's significant whitespace requires careful attention to indentation and nesting,
while its optional string-quoting can lead to unexpected type coercion (for example: &lt;a href="https://hitchdev.com/strictyaml/why/implicit-typing-removed/">&amp;quot;The Norway Bug&amp;quot;&lt;/a>).
Meanwhile, JSON lacks comment support and has strict requirements for trailing commas and quoted keys.&lt;/p>
&lt;p>&lt;a href="https://kep.k8s.io/5295">KEP-5295&lt;/a> introduces KYAML, which tries to address the most significant problems by:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Always double-quoting value strings&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Leaving keys unquoted unless they are potentially ambiguous&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Always using &lt;code>{}&lt;/code> for mappings (associative arrays)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Always using &lt;code>[]&lt;/code> for lists&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>This might sound a lot like JSON, because it is! But unlike JSON, KYAML supports comments, allows trailing commas, and doesn't require quoted keys.&lt;/p>
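&lt;p>Here is a small, hand-written illustration of what those rules produce in practice (the ConfigMap itself is just an example):&lt;/p>
&lt;pre>&lt;code class="language-yaml">{
  apiVersion: &amp;quot;v1&amp;quot;,
  kind: &amp;quot;ConfigMap&amp;quot;,
  metadata: {
    name: &amp;quot;kyaml-example&amp;quot;,
  },
  data: {
    # Unlike JSON, KYAML allows comments and trailing commas.
    greeting: &amp;quot;hello&amp;quot;,
    country: &amp;quot;NO&amp;quot;,   # always-quoted values sidestep the Norway Bug
  },
}
&lt;/code>&lt;/pre>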
&lt;p>We're hoping to see KYAML introduced as a new output format for &lt;code>kubectl&lt;/code> v1.34.
As with all these features, none of these changes are 100% confirmed; watch this space!&lt;/p>
&lt;p>As a format, KYAML is and will remain a &lt;strong>strict subset of YAML&lt;/strong>, ensuring that any compliant YAML parser can parse KYAML documents.
Kubernetes does not require you to provide input specifically formatted as KYAML, and we have no plans to change that.&lt;/p>
&lt;h3 id="fine-grained-autoscaling-control-with-hpa-configurable-tolerance">Fine-grained autoscaling control with HPA configurable tolerance&lt;/h3>
&lt;p>&lt;a href="https://kep.k8s.io/4951">KEP-4951&lt;/a> introduces a new feature that allows users to configure autoscaling tolerance on a per-HPA basis,
overriding the default cluster-wide 10% tolerance setting that often proves too coarse-grained for diverse workloads.
The enhancement adds an optional &lt;code>tolerance&lt;/code> field to the HPA's &lt;code>spec.behavior.scaleUp&lt;/code> and &lt;code>spec.behavior.scaleDown&lt;/code> sections,
enabling different tolerance values for scale-up and scale-down operations,
which is particularly valuable since scale-up responsiveness is typically more critical than scale-down speed for handling traffic surges.&lt;/p>
&lt;p>Released as alpha in Kubernetes v1.33 behind the &lt;code>HPAConfigurableTolerance&lt;/code> feature gate, this feature is expected to graduate to beta in v1.34.
This improvement helps to address scaling challenges with large deployments, where for scaling in,
a 10% tolerance might mean leaving hundreds of unnecessary Pods running.
Using the new, more flexible approach would enable workload-specific optimization for both
responsive and conservative scaling behaviors.&lt;/p>
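&lt;p>In manifest form, the new knob sits under the existing &lt;code>behavior&lt;/code> stanza; a minimal sketch, assuming the &lt;code>HPAConfigurableTolerance&lt;/code> feature gate is enabled (the target and values are illustrative):&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 10
  maxReplicas: 500
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      tolerance: 0.05   # react to a 5% deviation when scaling out
    scaleDown:
      tolerance: 0.2    # require a 20% deviation before scaling in
&lt;/code>&lt;/pre>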
&lt;h2 id="want-to-know-more">Want to know more?&lt;/h2>
&lt;p>New features and deprecations are also announced in the Kubernetes release notes.
We will formally announce what's new in &lt;a href="https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.34.md">Kubernetes v1.34&lt;/a> as part of the CHANGELOG for that release.&lt;/p>
&lt;p>The Kubernetes v1.34 release is planned for &lt;strong>Wednesday 27th August 2025&lt;/strong>. Stay tuned for updates!&lt;/p>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>The simplest way to get involved with Kubernetes is to join one of the many &lt;a href="https://github.com/kubernetes/community/blob/master/sig-list.md">Special Interest Groups&lt;/a> (SIGs) that align with your interests.
Have something you'd like to broadcast to the Kubernetes community? Share your voice at our weekly &lt;a href="https://github.com/kubernetes/community/tree/master/communication">community meeting&lt;/a>, and through the channels below.
Thank you for your continued feedback and support.&lt;/p>
&lt;ul>
&lt;li>Follow us on Bluesky &lt;a href="https://bsky.app/profile/kubernetes.io">@kubernetes.io&lt;/a> for the latest updates&lt;/li>
&lt;li>Join the community discussion on &lt;a href="https://discuss.kubernetes.io/">Discuss&lt;/a>&lt;/li>
&lt;li>Join the community on &lt;a href="http://slack.k8s.io/">Slack&lt;/a>&lt;/li>
&lt;li>Post questions (or answer questions) on &lt;a href="https://serverfault.com/questions/tagged/kubernetes">Server Fault&lt;/a> or &lt;a href="http://stackoverflow.com/questions/tagged/kubernetes">Stack Overflow&lt;/a>&lt;/li>
&lt;li>Share your Kubernetes &lt;a href="https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform">story&lt;/a>&lt;/li>
&lt;li>Read more about what's happening with Kubernetes on the &lt;a href="https://kubernetes.io/blog/">blog&lt;/a>&lt;/li>
&lt;li>Learn more about the &lt;a href="https://github.com/kubernetes/sig-release/tree/master/release-team">Kubernetes Release Team&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Post-Quantum Cryptography in Kubernetes</title><link>https://kubernetes.io/blog/2025/07/18/pqc-in-k8s/</link><pubDate>Fri, 18 Jul 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/07/18/pqc-in-k8s/</guid><description>
&lt;p>The world of cryptography is on the cusp of a major shift with the advent of
quantum computing. While powerful quantum computers are still largely
theoretical for many applications, their potential to break current
cryptographic standards is a serious concern, especially for long-lived
systems. This is where &lt;em>Post-Quantum Cryptography&lt;/em> (PQC) comes in. In this
article, I'll dive into what PQC means for TLS and, more specifically, for the
Kubernetes ecosystem. I'll explain what the (surprising) state of PQC in
Kubernetes is and what the implications are for current and future clusters.&lt;/p>
&lt;h2 id="what-is-post-quantum-cryptography">What is Post-Quantum Cryptography&lt;/h2>
&lt;p>Post-Quantum Cryptography refers to cryptographic algorithms that are thought to
be secure against attacks by both classical and quantum computers. The primary
concern is that quantum computers, using algorithms like &lt;a href="https://en.wikipedia.org/wiki/Shor%27s_algorithm">Shor's Algorithm&lt;/a>,
could efficiently break widely used public-key cryptosystems such as RSA and
Elliptic Curve Cryptography (ECC), which underpin much of today's secure
communication, including TLS. The industry is actively working on standardizing
and adopting PQC algorithms. One of the first to be standardized by &lt;a href="https://www.nist.gov/">NIST&lt;/a> is
the Module-Lattice Key Encapsulation Mechanism (&lt;code>ML-KEM&lt;/code>), formerly known as
Kyber, and now standardized as &lt;a href="https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.203.pdf">FIPS-203&lt;/a> (PDF download).&lt;/p>
&lt;p>It is difficult to predict when quantum computers will be able to break
classical algorithms. However, it is clear that we need to start migrating to
PQC algorithms now, as the next section shows. To get a feeling for the
predicted timeline we can look at a &lt;a href="https://nvlpubs.nist.gov/nistpubs/ir/2024/NIST.IR.8547.ipd.pdf">NIST report&lt;/a> covering the transition to
post-quantum cryptography standards. It declares that systems using classical
crypto should be deprecated after 2030 and disallowed after 2035.&lt;/p>
&lt;h2 id="timelines">Key exchange vs. digital signatures: different needs, different timelines&lt;/h2>
&lt;p>In TLS, there are two main cryptographic operations we need to secure:&lt;/p>
&lt;p>&lt;strong>Key Exchange&lt;/strong>: This is how the client and server agree on a shared secret to
encrypt their communication. If an attacker records encrypted traffic today,
they could decrypt it in the future once they gain access to a quantum computer
capable of breaking the key exchange - a &amp;quot;harvest now, decrypt later&amp;quot;
attack. This makes migrating KEMs to PQC an immediate priority.&lt;/p>
&lt;p>&lt;strong>Digital Signatures&lt;/strong>: These are primarily used to authenticate the server (and
sometimes the client) via certificates. The authenticity of a server is
verified at the time of connection. While important, the risk of an attack
today is much lower, because the decision of trusting a server cannot be abused
after the fact. Additionally, current PQC signature schemes often come with
significant computational overhead and larger key/signature sizes compared to
their classical counterparts.&lt;/p>
&lt;p>Another significant hurdle in the migration to PQ certificates is the upgrade
of root certificates. These certificates have long validity periods and are
installed in many devices and operating systems as trust anchors.&lt;/p>
&lt;p>Given these differences, the focus for immediate PQC adoption in TLS has been
on hybrid key exchange mechanisms. These combine a classical algorithm (such as
Elliptic Curve Diffie-Hellman Ephemeral (ECDHE)) with a PQC algorithm (such as
&lt;code>ML-KEM&lt;/code>). The resulting shared secret is secure as long as at least one of the
component algorithms remains unbroken. The &lt;code>X25519MLKEM768&lt;/code> hybrid scheme is the
most widely supported one.&lt;/p>
&lt;h2 id="state-of-kems">State of PQC key exchange mechanisms (KEMs) today&lt;/h2>
&lt;p>Support for PQC KEMs is rapidly improving across the ecosystem.&lt;/p>
&lt;p>&lt;strong>Go&lt;/strong>: The Go standard library's &lt;code>crypto/tls&lt;/code> package introduced support for
&lt;code>X25519MLKEM768&lt;/code> in version 1.24 (released February 2025). Crucially, it's
enabled by default when there is no explicit configuration, i.e.,
&lt;code>Config.CurvePreferences&lt;/code> is &lt;code>nil&lt;/code>.&lt;/p>
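&lt;p>As a minimal sketch (assuming Go 1.24 or newer), the snippet below shows the flip side of that default: once you set &lt;code>CurvePreferences&lt;/code> explicitly, the hybrid group is only offered if you list it yourself:&lt;/p>
&lt;pre>&lt;code class="language-go">package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// With Config.CurvePreferences left nil, Go 1.24+ offers the hybrid
	// post-quantum group X25519MLKEM768 automatically. Setting it
	// explicitly overrides that default, so the hybrid group must be
	// listed to stay enabled.
	cfg := &amp;tls.Config{
		CurvePreferences: []tls.CurveID{tls.X25519MLKEM768, tls.X25519},
	}

	conn, err := tls.Dial("tcp", "kubernetes.io:443", cfg)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Go 1.24 does not expose the negotiated group on ConnectionState;
	// use an external tool (such as the openssl check shown later in
	// this article) to observe which group was selected.
	fmt.Println("handshake complete:", tls.VersionName(conn.ConnectionState().Version))
}
&lt;/code>&lt;/pre>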
&lt;p>&lt;strong>Browsers &amp;amp; OpenSSL&lt;/strong>: Major browsers like Chrome (version 131, November 2024)
and Firefox (version 135, February 2025), as well as OpenSSL (version 3.5.0,
April 2025), have also added support for the &lt;code>ML-KEM&lt;/code> based hybrid scheme.&lt;/p>
&lt;p>Apple is also &lt;a href="https://support.apple.com/en-lb/122756">rolling out support&lt;/a> for &lt;code>X25519MLKEM768&lt;/code> in version
26 of their operating systems. Given the proliferation of Apple devices, this
will have a significant impact on global PQC adoption.&lt;/p>
&lt;p>For a more detailed overview of the state of PQC in the wider industry,
see &lt;a href="https://blog.cloudflare.com/pq-2024/">this blog post by Cloudflare&lt;/a>.&lt;/p>
&lt;h2 id="post-quantum-kems-in-kubernetes-an-unexpected-arrival">Post-quantum KEMs in Kubernetes: an unexpected arrival&lt;/h2>
&lt;p>So, what does this mean for Kubernetes? Kubernetes components, including the
API server and kubelet, are built with Go.&lt;/p>
&lt;p>As of Kubernetes v1.33, released in April 2025, the project uses Go 1.24. A
quick check of the Kubernetes codebase reveals that &lt;code>Config.CurvePreferences&lt;/code>
is not explicitly set. This leads to a fascinating conclusion: Kubernetes
v1.33, by virtue of using Go 1.24, supports hybrid post-quantum
&lt;code>X25519MLKEM768&lt;/code> for TLS connections by default!&lt;/p>
&lt;p>You can test this yourself. If you set up a Minikube cluster running Kubernetes
v1.33.0, you can connect to the API server using a recent OpenSSL client:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-console" data-lang="console">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#000080;font-weight:bold">$&lt;/span> minikube start --kubernetes-version&lt;span style="color:#666">=&lt;/span>v1.33.0
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#000080;font-weight:bold">$&lt;/span> kubectl cluster-info
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#888">Kubernetes control plane is running at https://127.0.0.1:&amp;lt;PORT&amp;gt;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#888">&lt;/span>&lt;span style="color:#000080;font-weight:bold">$&lt;/span> kubectl config view --minify --raw -o &lt;span style="color:#b8860b">jsonpath&lt;/span>&lt;span style="color:#666">=&lt;/span>&lt;span style="color:#b62;font-weight:bold">\&amp;#39;&lt;/span>&lt;span style="color:#666">{&lt;/span>.clusters&lt;span style="color:#666">[&lt;/span>0&lt;span style="color:#666">]&lt;/span>.cluster.certificate-authority-data&lt;span style="color:#666">}&lt;/span>&lt;span style="color:#b62;font-weight:bold">\&amp;#39;&lt;/span> | base64 -d &amp;gt; ca.crt
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#000080;font-weight:bold">$&lt;/span> openssl version
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#888">OpenSSL 3.5.0 8 Apr 2025 (Library: OpenSSL 3.5.0 8 Apr 2025)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#888">&lt;/span>&lt;span style="color:#000080;font-weight:bold">$&lt;/span> &lt;span style="color:#a2f">echo&lt;/span> -n &lt;span style="color:#b44">&amp;#34;Q&amp;#34;&lt;/span> | openssl s_client -connect 127.0.0.1:&amp;lt;PORT&amp;gt; -CAfile ca.crt
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#888">[...]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#888">Negotiated TLS1.3 group: X25519MLKEM768
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#888">[...]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#888">DONE
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Lo and behold, the negotiated group is &lt;code>X25519MLKEM768&lt;/code>! This is a significant
step towards making Kubernetes quantum-safe, seemingly without a major
announcement or dedicated KEP (Kubernetes Enhancement Proposal).&lt;/p>
&lt;h2 id="the-go-version-mismatch-pitfall">The Go version mismatch pitfall&lt;/h2>
&lt;p>An interesting wrinkle emerged with Go versions 1.23 and 1.24. Go 1.23
included experimental support for a draft version of &lt;code>ML-KEM&lt;/code>, identified as
&lt;code>X25519Kyber768Draft00&lt;/code>. This was also enabled by default if
&lt;code>Config.CurvePreferences&lt;/code> was &lt;code>nil&lt;/code>. Kubernetes v1.32 used Go 1.23. However,
Go 1.24 removed the draft support and replaced it with the standardized version
&lt;code>X25519MLKEM768&lt;/code>.&lt;/p>
&lt;p>What happens if a client and server are using mismatched Go versions (one on
1.23, the other on 1.24)? They won't have a common PQC KEM to negotiate, and
the handshake will fall back to classical ECC curves (e.g., &lt;code>X25519&lt;/code>). How
could this happen in practice?&lt;/p>
&lt;p>Consider a scenario:&lt;/p>
&lt;p>A Kubernetes cluster is running v1.32 (using Go 1.23 and thus
&lt;code>X25519Kyber768Draft00&lt;/code>). A developer upgrades their &lt;code>kubectl&lt;/code> to v1.33,
compiled with Go 1.24, only supporting &lt;code>X25519MLKEM768&lt;/code>. Now, when &lt;code>kubectl&lt;/code>
communicates with the v1.32 API server, they no longer share a common PQC
algorithm. The connection will downgrade to classical cryptography, silently
losing the PQC protection that has been in place. This highlights the
importance of understanding the implications of Go version upgrades, and the
details of the TLS stack.&lt;/p>
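&lt;p>One way to detect such a downgrade, sketched below, is a client that fails closed: if it offers &lt;em>only&lt;/em> the hybrid group, the handshake with a server limited to classical curves (or to the older draft scheme) fails loudly instead of silently falling back. This is a diagnostic, not a recommendation, since it breaks connectivity to any non-PQC endpoint:&lt;/p>
&lt;pre>&lt;code class="language-go">package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	cfg := &amp;tls.Config{
		// Offer only the hybrid post-quantum group: a peer without
		// X25519MLKEM768 support has no group in common with us.
		CurvePreferences:   []tls.CurveID{tls.X25519MLKEM768},
		InsecureSkipVerify: true, // probe only: skips cert checks, never use in production
	}

	// Hypothetical endpoint; point this at your API server address.
	conn, err := tls.Dial("tcp", "127.0.0.1:6443", cfg)
	if err != nil {
		fmt.Println("no post-quantum key exchange possible:", err)
		return
	}
	defer conn.Close()
	fmt.Println("post-quantum key exchange negotiated")
}
&lt;/code>&lt;/pre>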
&lt;h2 id="limitation-packet-size">Limitations: packet size&lt;/h2>
&lt;p>One practical consideration with &lt;code>ML-KEM&lt;/code> is the size of its public keys
with encoded key sizes of around 1.2 kilobytes for &lt;code>ML-KEM-768&lt;/code>.
This can cause the initial TLS &lt;code>ClientHello&lt;/code> message not to fit inside
a single TCP/IP packet, given the typical networking constraints
(most commonly, the standard Ethernet frame size limit of 1500
bytes). Some TLS libraries or network appliances might not handle this
gracefully, assuming the Client Hello always fits in one packet. This issue
has been observed in some Kubernetes-related projects and networking
components, potentially leading to connection failures when PQC KEMs are used.
More details can be found at &lt;a href="https://tldr.fail/">tldr.fail&lt;/a>.&lt;/p>
&lt;h2 id="state-of-post-quantum-signatures">State of Post-Quantum Signatures&lt;/h2>
&lt;p>While KEMs are seeing broader adoption, PQC digital signatures are further
behind in terms of widespread integration into standard toolchains. NIST has
published standards for PQC signatures, such as &lt;code>ML-DSA&lt;/code> (&lt;code>FIPS-204&lt;/code>) and
&lt;code>SLH-DSA&lt;/code> (&lt;code>FIPS-205&lt;/code>). However, implementing these in a way that's broadly
usable (e.g., for PQC Certificate Authorities) &lt;a href="https://blog.cloudflare.com/another-look-at-pq-signatures/#the-algorithms">presents challenges&lt;/a>:&lt;/p>
&lt;p>&lt;strong>Larger Keys and Signatures&lt;/strong>: PQC signature schemes often have significantly
larger public keys and signature sizes compared to classical algorithms like
Ed25519 or RSA. For instance, Dilithium2 keys can be 30 times larger than
Ed25519 keys, and certificates can be 12 times larger.&lt;/p>
&lt;p>&lt;strong>Performance&lt;/strong>: Signing and verification operations &lt;a href="https://pqshield.github.io/nist-sigs-zoo/">can be substantially slower&lt;/a>.
While some algorithms are on par with classical algorithms, others may have a
much higher overhead, sometimes on the order of 10x to 1000x worse performance.
To improve this situation, NIST is running a
&lt;a href="https://csrc.nist.gov/news/2024/pqc-digital-signature-second-round-announcement">second round of standardization&lt;/a> for PQC signatures.&lt;/p>
&lt;p>&lt;strong>Toolchain Support&lt;/strong>: Mainstream TLS libraries and CA software do not yet have
mature, built-in support for these new signature algorithms. The Go team, for
example, has indicated that &lt;code>ML-DSA&lt;/code> support is a high priority, but the
soonest it might appear in the standard library is Go 1.26 &lt;a href="https://github.com/golang/go/issues/64537#issuecomment-2877714729">(as of May 2025)&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://github.com/cloudflare/circl">Cloudflare's CIRCL&lt;/a> (Cloudflare Interoperable Reusable Cryptographic Library)
implements some PQC signature schemes, like variants of Dilithium, and
they maintain a &lt;a href="https://github.com/cloudflare/go">fork of Go (cfgo)&lt;/a> that integrates CIRCL. Using &lt;code>cfgo&lt;/code>, it's
possible to experiment with generating certificates signed with PQC algorithms
like Ed25519-Dilithium2. However, this requires using a custom Go toolchain and
is not yet part of the mainstream Kubernetes or Go distributions.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>The journey to a post-quantum secure Kubernetes is underway, and perhaps
further along than many realize, thanks to the proactive adoption of &lt;code>ML-KEM&lt;/code>
in Go. With Kubernetes v1.33, users are already benefiting from hybrid post-quantum key
exchange in many TLS connections by default.&lt;/p>
&lt;p>However, awareness of potential pitfalls, such as Go version mismatches leading
to downgrades and issues with Client Hello packet sizes, is crucial. While PQC
for KEMs is becoming a reality, PQC for digital signatures and certificate
hierarchies is still in earlier stages of development and adoption for
mainstream use. As Kubernetes maintainers and contributors, staying informed
about these developments will be key to ensuring the long-term security of the
platform.&lt;/p></description></item><item><title>Navigating Failures in Pods With Devices</title><link>https://kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/</link><pubDate>Thu, 03 Jul 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/</guid><description>
&lt;p>Kubernetes is the de facto standard for container orchestration, but when it
comes to handling specialized hardware like GPUs and other accelerators, things
get a bit complicated. This blog post dives into the challenges of managing
failure modes when operating pods with devices in Kubernetes, based on insights
from &lt;a href="https://sched.co/1i7pT">Sergey Kanzhelev and Mrunal Patel's talk at KubeCon NA
2024&lt;/a>. You can follow the links to
&lt;a href="https://static.sched.com/hosted_files/kccncna2024/b9/KubeCon%20NA%202024_%20Navigating%20Failures%20in%20Pods%20With%20Devices_%20Challenges%20and%20Solutions.pptx.pdf?_gl=1*191m4j5*_gcl_au*MTU1MDM0MTM1My4xNzMwOTE4ODY5LjIxNDI4Nzk1NDIuMTczMTY0ODgyMC4xNzMxNjQ4ODIy*FPAU*MTU1MDM0MTM1My4xNzMwOTE4ODY5">slides&lt;/a>
and
&lt;a href="https://www.youtube.com/watch?v=-YCnOYTtVO8&amp;list=PLj6h78yzYM2Pw4mRw4S-1p_xLARMqPkA7&amp;index=150">recording&lt;/a>.&lt;/p>
&lt;h2 id="the-ai-ml-boom-and-its-impact-on-kubernetes">The AI/ML boom and its impact on Kubernetes&lt;/h2>
&lt;p>The rise of AI/ML workloads has brought new challenges to Kubernetes. These
workloads often rely heavily on specialized hardware, and any device failure can
significantly impact performance and lead to frustrating interruptions. As
highlighted in the 2024 &lt;a href="https://ai.meta.com/research/publications/the-llama-3-herd-of-models/">Llama
paper&lt;/a>,
hardware issues, particularly GPU failures, are a major cause of disruption in
AI/ML training. You can also learn how much effort NVIDIA spends on handling
device failures and maintenance in the KubeCon talk by &lt;a href="https://kccncna2024.sched.com/event/1i7kJ/all-your-gpus-are-belong-to-us-an-inside-look-at-nvidias-self-healing-geforce-now-infrastructure-ryan-hallisey-piotr-prokop-pl-nvidia">Ryan Hallisey and Piotr
Prokop All-Your-GPUs-Are-Belong-to-Us: An Inside Look at NVIDIA's Self-Healing
GeForce NOW
Infrastructure&lt;/a>
(&lt;a href="https://www.youtube.com/watch?v=iLnHtKwmu2I">recording&lt;/a>) as they see 19
remediation requests per 1000 nodes a day!
We also see data centers offering spot consumption models and overcommit on
power, making device failures commonplace and a part of the business model.&lt;/p>
&lt;p>However, Kubernetes' view of resources is still very static: a resource is
either there or not, and if it is there, the assumption is that it will stay
there, fully functional. Kubernetes lacks good support for handling full or partial
hardware failures. These long-standing assumptions, combined with the overall
complexity of a setup, lead to a variety of failure modes, which we discuss here.&lt;/p>
&lt;h3 id="understanding-ai-ml-workloads">Understanding AI/ML workloads&lt;/h3>
&lt;p>Generally, all AI/ML workloads require specialized hardware, have challenging
scheduling requirements, and are expensive when idle. AI/ML workloads typically
fall into two categories - training and inference. Here is an oversimplified
view of those categories’ characteristics, which are different from traditional workloads
like web services:&lt;/p>
&lt;dl>
&lt;dt>Training&lt;/dt>
&lt;dd>These workloads are resource-intensive, often consuming entire
machines and running as gangs of pods. Training jobs are usually &amp;quot;run to
completion&amp;quot; - but that could be days, weeks or even months. Any failure in a
single pod can necessitate restarting the entire step across all the pods.&lt;/dd>
&lt;dt>Inference&lt;/dt>
&lt;dd>These workloads are usually long-running or run indefinitely,
and can be small enough to consume a subset of a Node’s devices or large enough to span
multiple nodes. They often require downloading huge files with the model
weights.&lt;/dd>
&lt;/dl>
&lt;p>These workload types specifically break many past assumptions:&lt;/p>
&lt;table>&lt;caption style="display: none;">Workload assumptions before and now&lt;/caption>
&lt;thead>
&lt;tr>
&lt;th style="text-align:left">Before&lt;/th>
&lt;th style="text-align:left">Now&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td style="text-align:left">Can get a better CPU and the app will work faster.&lt;/td>
&lt;td style="text-align:left">Require a &lt;strong>specific&lt;/strong> device (or &lt;strong>class of devices&lt;/strong>) to run.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">When something doesn’t work, just recreate it.&lt;/td>
&lt;td style="text-align:left">Allocation or reallocation is expensive.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Any node will work. No need to coordinate between Pods.&lt;/td>
&lt;td style="text-align:left">Scheduled in a special way - devices often connected in a cross-node topology.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Each Pod can be plug-and-play replaced if failed.&lt;/td>
&lt;td style="text-align:left">Pods are a part of a larger task. Lifecycle of an entire task depends on each Pod.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Container images are slim and easily available.&lt;/td>
&lt;td style="text-align:left">Container images may be so big that they require special handling.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Long initialization can be offset by slow rollout.&lt;/td>
&lt;td style="text-align:left">Initialization may be long and should be optimized, sometimes across many Pods together.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td style="text-align:left">Compute nodes are commoditized and relatively inexpensive, so some idle time is acceptable.&lt;/td>
&lt;td style="text-align:left">Nodes with specialized hardware can be an order of magnitude more expensive than those without, so idle time is very wasteful.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>The existing failure model relies on those old assumptions. It may still work for
the new workload types, but it has limited knowledge about devices and is very
expensive for them - in some cases, prohibitively so. You will see
more examples later in this article.&lt;/p>
&lt;h3 id="why-kubernetes-still-reigns-supreme">Why Kubernetes still reigns supreme&lt;/h3>
&lt;p>This article does not go deeper into the question of why not to start fresh for
AI/ML workloads, given how different they are from traditional Kubernetes
workloads. Despite many challenges, Kubernetes remains the platform of choice
for AI/ML workloads. Its maturity, security, and rich ecosystem of tools make it
a compelling option. While alternatives exist, they often lack the years of
development and refinement that Kubernetes offers. And the Kubernetes developers
are actively addressing the gaps identified in this article and beyond.&lt;/p>
&lt;h2 id="the-current-state-of-device-failure-handling">The current state of device failure handling&lt;/h2>
&lt;p>This section outlines different failure modes and the best practices and DIY
(Do-It-Yourself) solutions used today. The next section describes a roadmap
of improving things for those failure modes.&lt;/p>
&lt;h3 id="failure-modes-k8s-infrastructure">Failure modes: K8s infrastructure&lt;/h3>
&lt;p>In order to understand the failures related to the Kubernetes infrastructure,
you need to understand how many moving parts are involved in scheduling a Pod on
the node. The sequence of events when a Pod is scheduled on a Node is as
follows:&lt;/p>
&lt;ol>
&lt;li>&lt;em>Device plugin&lt;/em> is scheduled on the Node&lt;/li>
&lt;li>&lt;em>Device plugin&lt;/em> is registered with the &lt;em>kubelet&lt;/em> via local gRPC&lt;/li>
&lt;li>&lt;em>Kubelet&lt;/em> uses &lt;em>device plugin&lt;/em> to watch for devices and updates capacity of
the node&lt;/li>
&lt;li>&lt;em>Scheduler&lt;/em> places a &lt;em>user Pod&lt;/em> on a Node based on the updated capacity&lt;/li>
&lt;li>&lt;em>Kubelet&lt;/em> asks &lt;em>Device plugin&lt;/em> to &lt;strong>Allocate&lt;/strong> devices for a &lt;em>User Pod&lt;/em>&lt;/li>
&lt;li>&lt;em>Kubelet&lt;/em> creates a &lt;em>User Pod&lt;/em> with the allocated devices attached to it&lt;/li>
&lt;/ol>
&lt;p>This diagram shows some of those actors involved:&lt;/p>
&lt;figure>
&lt;img src="https://kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/k8s-infra-devices.svg"
alt="The diagram shows relationships between the kubelet, Device plugin, and a user Pod. It shows that kubelet connects to the Device plugin named my-device, kubelet reports the node status with the my-device availability, and the user Pod requesting the 2 of my-device."/>
&lt;/figure>
&lt;p>As there are so many actors interconnected, every one of them and every
connection may experience interruptions. This leads to many exceptional
situations that are often considered failures, and may cause serious workload
interruptions:&lt;/p>
&lt;ul>
&lt;li>Pods failing admission at various stages of their lifecycle&lt;/li>
&lt;li>Pods unable to run on perfectly fine hardware&lt;/li>
&lt;li>Scheduling taking an unexpectedly long time&lt;/li>
&lt;/ul>
&lt;figure>
&lt;img src="https://kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/k8s-infra-failures.svg"
alt="The same diagram as one above it, however it has an overlayed orange bang drawings over individual components with the text indicating what can break in that component. Over the kubelet text reads: &amp;#39;kubelet restart: looses all devices info before re-Watch&amp;#39;. Over the Device plugin text reads: &amp;#39;device plugin update, evictIon, restart: kubelet cannot Allocate devices or loses all devices state&amp;#39;. Over the user Pod text reads: &amp;#39;slow pod termination: devices are unavailable&amp;#39;."/>
&lt;/figure>
&lt;p>The goal for Kubernetes is to make the interaction between these components as
reliable as possible. The kubelet already implements retries, grace periods, and
other techniques to improve it. The roadmap section goes into details on other
edge cases that the Kubernetes project tracks. However, all these improvements
only work when these best practices are followed:&lt;/p>
&lt;ul>
&lt;li>Configure and restart the kubelet and the container runtime (such as containerd or CRI-O)
as early as possible, so as not to interrupt the workload.&lt;/li>
&lt;li>Monitor device plugin health and carefully plan for upgrades.&lt;/li>
&lt;li>Do not overload the node with less-important workloads to prevent interruption
of device plugin and other components.&lt;/li>
&lt;li>Configure user Pod tolerations to handle node readiness flakes (see the sketch after this list).&lt;/li>
&lt;li>Configure and code graceful termination logic carefully to not block devices
for too long.&lt;/li>
&lt;/ul>
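&lt;p>For the tolerations item above, here is a minimal sketch expressed with the Go API types; the five-minute window is illustrative and should be tuned to the workload:&lt;/p>
&lt;pre>&lt;code class="language-go">package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// Keep an expensive, device-attached Pod bound to its node through
	// short readiness flakes instead of evicting it immediately.
	seconds := int64(300)
	tolerations := []corev1.Toleration{
		{
			Key:               "node.kubernetes.io/not-ready",
			Operator:          corev1.TolerationOpExists,
			Effect:            corev1.TaintEffectNoExecute,
			TolerationSeconds: &amp;seconds,
		},
		{
			Key:               "node.kubernetes.io/unreachable",
			Operator:          corev1.TolerationOpExists,
			Effect:            corev1.TaintEffectNoExecute,
			TolerationSeconds: &amp;seconds,
		},
	}
	fmt.Printf("%+v\n", tolerations)
}
&lt;/code>&lt;/pre>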
&lt;p>Another class of Kubernetes infra-related issues is driver-related. With
traditional resources like CPU and memory, no compatibility checks between the
application and hardware were needed. With special devices like hardware
accelerators, there are new failure modes. Device drivers installed on the node:&lt;/p>
&lt;ul>
&lt;li>Must match the hardware&lt;/li>
&lt;li>Must be compatible with the app&lt;/li>
&lt;li>Must work with other drivers (like &lt;a href="https://developer.nvidia.com/nccl">nccl&lt;/a>,
etc.)&lt;/li>
&lt;/ul>
&lt;p>Best practices for handling driver versions:&lt;/p>
&lt;ul>
&lt;li>Monitor driver installer health&lt;/li>
&lt;li>Plan upgrades of infrastructure and Pods to match the version&lt;/li>
&lt;li>Have canary deployments whenever possible&lt;/li>
&lt;/ul>
&lt;p>Following the best practices in this section and using device plugins and device
driver installers from trusted and reliable sources generally eliminate this
class of failures. Kubernetes is tracking work to make this space even better.&lt;/p>
&lt;h3 id="failure-modes-device-failed">Failure modes: device failed&lt;/h3>
&lt;p>There is very little handling of device failure in Kubernetes today. Device
plugins report the device failure only by changing the count of allocatable
devices. And Kubernetes relies on standard mechanisms like liveness probes or
container failures to allow Pods to communicate the failure condition to the
kubelet. However, Kubernetes does not correlate device failures with container
crashes and does not offer any mitigation beyond restarting the container while
being attached to the same device.&lt;/p>
&lt;p>This is why many plugins and DIY solutions exist to handle device failures based
on various signals.&lt;/p>
&lt;h4 id="health-controller">Health controller&lt;/h4>
&lt;p>In many cases a failed device results in an unrecoverable, very expensive
node doing nothing. A simple DIY solution is a &lt;em>node health controller&lt;/em>. The
controller compares the device allocatable count with the capacity and, if
the capacity is greater, starts a timer. Once the timer reaches a threshold,
the health controller kills and recreates the node, as sketched below.&lt;/p>
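&lt;p>A minimal sketch of that comparison, written with client-go, could look like the following; the extended resource name is hypothetical, and the timer and node-recreation logic are left out:&lt;/p>
&lt;pre>&lt;code class="language-go">package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Hypothetical extended resource advertised by a device plugin.
	device := corev1.ResourceName("example.com/gpu")

	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		capacity := node.Status.Capacity[device]
		allocatable := node.Status.Allocatable[device]
		// Allocatable below capacity means some devices are unhealthy. A
		// real controller would start a timer here and recreate the node
		// once the condition persists past a threshold.
		if allocatable.Cmp(capacity) &lt; 0 {
			fmt.Printf("node %s: only %s of %s devices healthy\n",
				node.Name, allocatable.String(), capacity.String())
		}
	}
}
&lt;/code>&lt;/pre>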
&lt;p>There are problems with the &lt;em>health controller&lt;/em> approach:&lt;/p>
&lt;ul>
&lt;li>Root cause of the device failure is typically not known&lt;/li>
&lt;li>The controller is not workload aware&lt;/li>
&lt;li>Failed device might not be in use and you want to keep other devices running&lt;/li>
&lt;li>The detection may be too slow as it is very generic&lt;/li>
&lt;li>The node may be part of a bigger set of nodes and simply cannot be deleted in
isolation without other nodes&lt;/li>
&lt;/ul>
&lt;p>There are variations of the health controller solving some of the problems
above. The overall theme here though is that to best handle failed devices, you
need customized handling for the specific workload. Kubernetes doesn’t yet offer
enough abstraction to express how critical the device is for a node, for the
cluster, and for the Pod it is assigned to.&lt;/p>
&lt;h4 id="pod-failure-policy">Pod failure policy&lt;/h4>
&lt;p>Another DIY approach for device failure handling is a per-pod reaction on a
failed device. This approach is applicable for &lt;em>training&lt;/em> workloads that are
implemented as Jobs.&lt;/p>
&lt;p>A Pod can define special exit codes for device failures: for example, whenever
unexpected device behavior is encountered, the Pod exits with a special exit code.
The Pod failure policy can then handle the device failure in a special way, as the
sketch after this paragraph shows. Read more in &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-failure-policy">Handling retriable and non-retriable pod failures with Pod failure
policy&lt;/a>.&lt;/p>
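&lt;p>Here is a sketch of what such a Job could look like, constructed with the Go API types and printed as YAML; the exit code 42, the container name, and the image are made up for illustration:&lt;/p>
&lt;pre>&lt;code class="language-go">package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func ptr[T any](v T) *T { return &amp;v }

func main() {
	job := batchv1.Job{
		TypeMeta:   metav1.TypeMeta{APIVersion: "batch/v1", Kind: "Job"},
		ObjectMeta: metav1.ObjectMeta{Name: "training-job"},
		Spec: batchv1.JobSpec{
			BackoffLimit: ptr(int32(3)),
			PodFailurePolicy: &amp;batchv1.PodFailurePolicy{
				Rules: []batchv1.PodFailurePolicyRule{{
					// Do not count the made-up "device failed" exit code
					// against backoffLimit; just run the Pod again.
					Action: batchv1.PodFailurePolicyActionIgnore,
					OnExitCodes: &amp;batchv1.PodFailurePolicyOnExitCodesRequirement{
						ContainerName: ptr("main"),
						Operator:      batchv1.PodFailurePolicyOnExitCodesOpIn,
						Values:        []int32{42},
					},
				}},
			},
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					// Pod failure policy requires restartPolicy: Never.
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "main",
						Image: "registry.example/trainer:latest",
					}},
				},
			},
		},
	}

	manifest, err := yaml.Marshal(job)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(manifest))
}
&lt;/code>&lt;/pre>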
&lt;p>There are some problems with the &lt;em>Pod failure policy&lt;/em> approach for Jobs:&lt;/p>
&lt;ul>
&lt;li>There is no well-known &lt;em>device failed&lt;/em> condition, so this approach does not work for the
generic Pod case&lt;/li>
&lt;li>Error codes must be coded carefully and in some cases are hard to guarantee.&lt;/li>
&lt;li>Only works with Jobs with &lt;code>restartPolicy: Never&lt;/code>, due to the limitation of a pod
failure policy feature.&lt;/li>
&lt;/ul>
&lt;p>So, this solution has limited applicability.&lt;/p>
&lt;h4 id="custom-pod-watcher">Custom pod watcher&lt;/h4>
&lt;p>A slightly more generic approach is to implement a Pod watcher as a DIY solution,
or to use third-party tools offering this functionality. The pod watcher is
most often used to handle device failures for inference workloads.&lt;/p>
&lt;p>Since Kubernetes just keeps a pod assigned to a device, even if the device is
reportedly unhealthy, the idea is to detect this situation with the pod watcher
and apply some remediation. It often involves obtaining device health status and
its mapping to Pods using the Pod Resources API on the node, as the sketch below
shows. If a device fails, the watcher can delete the attached Pod as a remediation;
the owning ReplicaSet then recreates the Pod on a healthy device.&lt;/p>
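&lt;p>Below is a sketch of the core of such a watcher, using the kubelet's Pod Resources API; the socket path is the conventional default, and joining the device list with health data plus the actual Pod deletion are left out:&lt;/p>
&lt;pre>&lt;code class="language-go">package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)

func main() {
	// The kubelet serves the Pod Resources API on a node-local unix
	// socket; reading it is a privileged operation.
	conn, err := grpc.Dial("unix:///var/lib/kubelet/pod-resources/kubelet.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	client := podresourcesv1.NewPodResourcesListerClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	resp, err := client.List(ctx, &amp;podresourcesv1.ListPodResourcesRequest{})
	if err != nil {
		panic(err)
	}

	// Build the Pod-to-device mapping; a real watcher would join this
	// with device health data and delete Pods attached to failed devices.
	for _, pod := range resp.GetPodResources() {
		for _, container := range pod.GetContainers() {
			for _, devices := range container.GetDevices() {
				fmt.Printf("%s/%s container %s uses %s %v\n",
					pod.GetNamespace(), pod.GetName(), container.GetName(),
					devices.GetResourceName(), devices.GetDeviceIds())
			}
		}
	}
}
&lt;/code>&lt;/pre>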
&lt;p>The other reasons to implement this watcher:&lt;/p>
&lt;ul>
&lt;li>Without it, the Pod will keep being assigned to the failed device forever.&lt;/li>
&lt;li>There is no &lt;em>descheduling&lt;/em> for a pod with &lt;code>restartPolicy=Always&lt;/code>.&lt;/li>
&lt;li>There are no built-in controllers that delete Pods in CrashLoopBackoff.&lt;/li>
&lt;/ul>
&lt;p>Problems with the &lt;em>custom pod watcher&lt;/em>:&lt;/p>
&lt;ul>
&lt;li>The signal for the pod watcher is expensive to get, and involves some
privileged actions.&lt;/li>
&lt;li>It is a custom solution and it assumes the importance of a device for a Pod.&lt;/li>
&lt;li>The pod watcher relies on external controllers to reschedule a Pod.&lt;/li>
&lt;/ul>
&lt;p>There are more variations of DIY solutions for handling device failures or
upcoming maintenance. Overall, Kubernetes has enough extension points to
implement these solutions. However, some extension points require higher
privilege than users may be comfortable with or are too disruptive. The roadmap
section goes into more details on specific improvements in handling the device
failures.&lt;/p>
&lt;h3 id="failure-modes-container-code-failed">Failure modes: container code failed&lt;/h3>
&lt;p>When the container code fails, or something bad happens to it such as an
out-of-memory condition, Kubernetes knows how to handle those cases: it either
restarts the container or, if the Pod has &lt;code>restartPolicy: Never&lt;/code>,
fails the Pod so it can be scheduled onto another node. Kubernetes has limited
expressiveness in what counts as a failure (for example, a non-zero exit code or
a liveness probe failure) and in how to react to such a failure (mostly either
always restart or immediately fail the Pod).&lt;/p>
&lt;p>This level of expressiveness is often not enough for the complicated AI/ML
workloads. AI/ML pods are better rescheduled locally or even in-place as that
would save on image pulling time and device allocation. AI/ML pods are often
interconnected and need to be restarted together. This adds another level of
complexity and optimizing it often brings major savings in running AI/ML
workloads.&lt;/p>
&lt;p>There are various DIY solutions for orchestrating Pod failure handling. The most
typical one is to wrap the main executable in each container with an orchestrator,
which restarts the main executable whenever the job needs to be restarted because
some other Pod has failed (see the sketch below).&lt;/p>
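&lt;p>A toy sketch of such a wrapper is shown below; the SIGUSR1 restart trigger from an external controller is a made-up convention, and real systems typically use a rendezvous library or a control connection instead:&lt;/p>
&lt;pre>&lt;code class="language-go">package main

import (
	"log"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	if len(os.Args) &lt; 2 {
		log.Fatal("usage: wrapper &lt;command> [args...]")
	}

	// Hypothetical "restart the step" trigger, sent by an external
	// controller when some other Pod in the job fails.
	restart := make(chan os.Signal, 1)
	signal.Notify(restart, syscall.SIGUSR1)

	for {
		cmd := exec.Command(os.Args[1], os.Args[2:]...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Start(); err != nil {
			log.Fatal(err)
		}

		done := make(chan error, 1)
		go func() { done &lt;- cmd.Wait() }()

		select {
		case &lt;-restart:
			// Stop the current attempt and loop to restart in-place,
			// keeping the Pod, its image, and its devices allocated.
			_ = cmd.Process.Signal(syscall.SIGTERM)
			&lt;-done
		case err := &lt;-done:
			if err == nil {
				return // the training step completed successfully
			}
			log.Printf("main executable failed: %v; restarting in-place", err)
		}
	}
}
&lt;/code>&lt;/pre>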
&lt;p>Solutions like this are very fragile and elaborate, but in large training jobs
they are often worth the money they save compared to a regular JobSet
delete/recreate cycle. Making these solutions less fragile and more streamlined
by developing new hooks and extension points in Kubernetes will make them
easy to apply to smaller jobs, benefiting everybody.&lt;/p>
&lt;h3 id="failure-modes-device-degradation">Failure modes: device degradation&lt;/h3>
&lt;p>Not all device failures are terminal for the overall workload or batch job.
As the hardware stack gets more and more
complex, misconfiguration on one of the hardware stack layers, or driver
failures, may result in devices that are functional, but lagging on performance.
One device that is lagging behind can slow down the whole training job.&lt;/p>
&lt;p>We see reports of such cases more and more often. Kubernetes has no way to
express this type of failure today, and since it is the newest failure mode,
hardware vendors offer little best-practice guidance for detection, and there is
little third-party tooling for remediation of these situations.&lt;/p>
&lt;p>Typically, these failures are detected from observed workload
characteristics, for example, the expected speed of AI/ML training steps on
particular hardware. Remediation for these issues is highly dependent on the workload's needs.&lt;/p>
&lt;h2 id="roadmap">Roadmap&lt;/h2>
&lt;p>As outlined in the section above, Kubernetes offers a lot of extension points
which are used to implement various DIY solutions. The AI/ML space is
developing very fast, with changing requirements and usage patterns. SIG Node is
taking a measured approach: enabling more extension points for implementing
workload-specific scenarios, rather than introducing new semantics to support
specific scenarios directly. This means prioritizing making information about failures
readily available over implementing automatic remediations for those failures
that might only be suitable for a subset of workloads.&lt;/p>
&lt;p>This approach ensures there are no drastic changes to workload handling that
could break existing, well-oiled DIY solutions or the experience of running
more traditional workloads.&lt;/p>
&lt;p>Many error handling techniques used today work for AI/ML, but are very
expensive. SIG Node will invest in extension points to make them cheaper, with
the understanding that cutting these costs for AI/ML is critical.&lt;/p>
&lt;p>The following is the set of specific investments we envision for various failure
modes.&lt;/p>
&lt;h3 id="roadmap-for-failure-modes-k8s-infrastructure">Roadmap for failure modes: K8s infrastructure&lt;/h3>
&lt;p>The area of Kubernetes infrastructure is the easiest to understand and very
important to make right for the upcoming transition from Device Plugins to DRA.
SIG Node is tracking many work items in this area, most notably the following:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/issues/127460">integrate kubelet with the systemd watchdog · Issue
#127460&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/issues/128696">DRA: detect stale DRA plugin sockets · Issue
#128696&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/issues/127803">Support takeover for devicemanager/device-plugin · Issue
#127803&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/issues/127457">Kubelet plugin registration reliability · Issue
#127457&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/issues/128167">Recreate the Device Manager gRPC server if failed · Issue
#128167&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/kubernetes/kubernetes/issues/128043">Retry pod admission on device plugin grpc failures · Issue
#128043&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>Basically, every interaction between Kubernetes components must be made reliable,
via either kubelet improvements or best practices in plugin development
and deployment.&lt;/p>
&lt;h3 id="roadmap-for-failure-modes-device-failed">Roadmap for failure modes: device failed&lt;/h3>
&lt;p>For device failures, some patterns are already emerging in common scenarios
that Kubernetes can support. However, the very first step is to make information
about failed devices more easily available. This is the focus of the work in
&lt;a href="https://kep.k8s.io/4680">KEP 4680&lt;/a> (Add Resource Health Status to the Pod Status for
Device Plugin and DRA).&lt;/p>
&lt;p>Longer-term ideas, which are yet to be tested, include:&lt;/p>
&lt;ul>
&lt;li>Integrate device failures into Pod Failure Policy.&lt;/li>
&lt;li>Node-local retry policies, enabling pod failure policies for Pods with
restartPolicy=OnFailure and possibly beyond that.&lt;/li>
&lt;li>Ability to &lt;em>deschedule&lt;/em> pod, including with the &lt;code>restartPolicy: Always&lt;/code>, so it can
get a new device allocated.&lt;/li>
&lt;li>Add device health to the ResourceSlice used to represent devices in DRA,
rather than simply withdrawing an unhealthy device from the ResourceSlice.&lt;/li>
&lt;/ul>
&lt;h3 id="roadmap-for-failure-modes-container-code-failed">Roadmap for failure modes: container code failed&lt;/h3>
&lt;p>The main improvements to handle container code failures for AI/ML workloads
all target cheaper error handling and recovery. The savings mostly come from
reusing pre-allocated resources as much as possible: reusing Pods by restarting
containers in-place, restarting containers on the same node instead of
rescheduling whenever possible, snapshotting support, and re-scheduling that
prioritizes the same node to save on image pulls.&lt;/p>
&lt;p>Consider this scenario: a big training job needs 512 Pods to run, and one of
them fails. That means all Pods need to be interrupted and synced up to
restart the failed step. The most efficient way to achieve this is generally to
reuse as many Pods as possible by restarting them in-place, while replacing the
failed Pod to clear its error, as demonstrated in this picture:&lt;/p>
&lt;figure>
&lt;img src="https://kubernetes.io/blog/2025/07/03/navigating-failures-in-pods-with-devices/inplace-pod-restarts.svg"
alt="The picture shows 512 pod, most ot them are green and have a recycle sign next to them indicating that they can be reused, and one Pod drawn in red, and a new green replacement Pod next to it indicating that it needs to be replaced."/>
&lt;/figure>
&lt;p>It is possible to implement this scenario, but all solutions implementing it are
fragile due to lack of certain extension points in Kubernetes. Adding these
extension points to implement this scenario is on the Kubernetes roadmap.&lt;/p>
&lt;h3 id="roadmap-for-failure-modes-device-degradation">Roadmap for failure modes: device degradation&lt;/h3>
&lt;p>There is very little done in this area - there is no clear detection signal,
very limited troubleshooting tooling, and no built-in semantics to express a
&amp;quot;degraded&amp;quot; device in Kubernetes. There has been discussion of adding data on
device performance or degradation in the ResourceSlice used by DRA to represent
devices, but it is not yet clearly defined. There are also projects like
&lt;a href="https://github.com/medik8s/node-healthcheck-operator">node-healthcheck-operator&lt;/a>
that can be used for some scenarios.&lt;/p>
&lt;p>We expect developments in this area to come from hardware vendors and cloud providers, and mostly DIY
solutions in the near future. As more users get exposed to AI/ML workloads, this
is a space that needs feedback on the patterns used.&lt;/p>
&lt;h2 id="join-the-conversation">Join the conversation&lt;/h2>
&lt;p>The Kubernetes community encourages feedback and participation in shaping the
future of device failure handling. Join SIG Node and contribute to the ongoing
discussions!&lt;/p>
&lt;p>This blog post provides a high-level overview of the challenges and future
directions for device failure management in Kubernetes. By addressing these
issues, Kubernetes can solidify its position as the leading platform for AI/ML
workloads, ensuring resilience and reliability for applications that depend on
specialized hardware.&lt;/p></description></item><item><title>Image Compatibility In Cloud Native Environments</title><link>https://kubernetes.io/blog/2025/06/25/image-compatibility-in-cloud-native-environments/</link><pubDate>Wed, 25 Jun 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/06/25/image-compatibility-in-cloud-native-environments/</guid><description>
&lt;p>In industries where systems must run very reliably and meet strict performance criteria, such as telecommunications, high-performance computing, or AI, containerized applications often need a specific operating system configuration or the presence of specific hardware.
It is common practice to require the use of specific versions of the kernel, its configuration, device drivers, or system components.
Despite the existence of the &lt;a href="https://opencontainers.org/">Open Container Initiative (OCI)&lt;/a>, a governing community to define standards and specifications for container images, there has been a gap in expression of such compatibility requirements.
The need to address this issue has led to different proposals and, ultimately, an implementation in Kubernetes' &lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html">Node Feature Discovery (NFD)&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html">NFD&lt;/a> is an open source Kubernetes project that automatically detects and reports &lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#available-features">hardware and system features&lt;/a> of cluster nodes. This information helps users to schedule workloads on nodes that meet specific system requirements, which is especially useful for applications with strict hardware or operating system dependencies.&lt;/p>
&lt;h2 id="the-need-for-image-compatibility-specification">The need for image compatibility specification&lt;/h2>
&lt;h3 id="dependencies-between-containers-and-host-os">Dependencies between containers and host OS&lt;/h3>
&lt;p>A container image is built on a base image, which provides a minimal runtime environment, often a stripped-down Linux userland, completely empty or distroless. When an application requires certain features from the host OS, compatibility issues arise. These dependencies can manifest in several ways:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Drivers&lt;/strong>:
Host driver versions must match the supported range of a library version inside the container to avoid compatibility problems. Examples include GPUs and network drivers.&lt;/li>
&lt;li>&lt;strong>Libraries or Software&lt;/strong>:
The container must come with a specific version or range of versions for a library or software to run optimally in the environment. Examples from high performance computing are MPI, EFA, or Infiniband.&lt;/li>
&lt;li>&lt;strong>Kernel Modules or Features&lt;/strong>:
Specific kernel features or modules must be present. Examples include support for write-protected huge page faults, or the presence of VFIO.&lt;/li>
&lt;li>And more…&lt;/li>
&lt;/ul>
&lt;p>While containers in Kubernetes are the most likely unit of abstraction for these needs, the definition of compatibility can extend further to include other container technologies such as Singularity and other OCI artifacts such as binaries from a spack binary cache.&lt;/p>
&lt;h3 id="multi-cloud-and-hybrid-cloud-challenges">Multi-cloud and hybrid cloud challenges&lt;/h3>
&lt;p>Containerized applications are deployed across various Kubernetes distributions and cloud providers, where different host operating systems introduce compatibility challenges.
Often those have to be pre-configured before workload deployment or are immutable.
For instance, different cloud providers will include different operating systems like:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>RHCOS/RHEL&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Photon OS&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Amazon Linux 2&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Container-Optimized OS&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Azure Linux OS&lt;/strong>&lt;/li>
&lt;li>And more...&lt;/li>
&lt;/ul>
&lt;p>Each OS comes with unique kernel versions, configurations, and drivers, making compatibility a non-trivial issue for applications requiring specific features.
It must be possible to quickly assess a container for its suitability to run on any specific environment.&lt;/p>
&lt;h3 id="image-compatibility-initiative">Image compatibility initiative&lt;/h3>
&lt;p>An effort was made within the &lt;a href="https://github.com/opencontainers/wg-image-compatibility">Open Containers Initiative Image Compatibility&lt;/a> working group to introduce a standard for image compatibility metadata.
A specification for compatibility would allow container authors to declare required host OS features, making compatibility requirements discoverable and programmable.
The specification implemented in Kubernetes Node Feature Discovery is one of the discussed proposals.
It aims to:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Define a structured way to express compatibility in OCI image manifests.&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Support a compatibility specification alongside container images in image registries.&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Allow automated validation of compatibility before scheduling containers.&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>The concept has since been implemented in the Kubernetes Node Feature Discovery project.&lt;/p>
&lt;h3 id="implementation-in-node-feature-discovery">Implementation in Node Feature Discovery&lt;/h3>
&lt;p>The solution integrates compatibility metadata into Kubernetes via NFD features and the &lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup">NodeFeatureGroup&lt;/a> API.
This interface enables users to match containers to nodes based on exposed hardware and software features, allowing for intelligent scheduling and workload optimization.&lt;/p>
&lt;h3 id="compatibility-specification">Compatibility specification&lt;/h3>
&lt;p>The compatibility specification is a structured list of compatibility objects containing &lt;em>&lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup">Node Feature Groups&lt;/a>&lt;/em>.
These objects define image requirements and facilitate validation against host nodes.
The feature requirements are described by using &lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/customization-guide.html#available-features">the list of available features&lt;/a> from the NFD project.
The schema has the following structure:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>version&lt;/strong> (string) - Specifies the API version.&lt;/li>
&lt;li>&lt;strong>compatibilities&lt;/strong> (array of objects) - List of compatibility sets.
&lt;ul>
&lt;li>&lt;strong>rules&lt;/strong> (object) - Specifies &lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/usage/custom-resources.html#nodefeaturegroup">NodeFeatureGroup&lt;/a> to define image requirements.&lt;/li>
&lt;li>&lt;strong>weight&lt;/strong> (int, optional) - Node affinity weight.&lt;/li>
&lt;li>&lt;strong>tag&lt;/strong> (string, optional) - Categorization tag.&lt;/li>
&lt;li>&lt;strong>description&lt;/strong> (string, optional) - Short description.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>An example might look like the following:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#008000;font-weight:bold">version&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>v1alpha1&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>&lt;span style="color:#008000;font-weight:bold">compatibilities&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb">&lt;/span>- &lt;span style="color:#008000;font-weight:bold">description&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;My image requirements&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">rules&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;kernel and cpu&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">matchFeatures&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">feature&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>kernel.loadedmodule&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">matchExpressions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">vfio-pci&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>{&lt;span style="color:#008000;font-weight:bold">op&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>Exists}&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">feature&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>cpu.model&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">matchExpressions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">vendor_id&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>{&lt;span style="color:#008000;font-weight:bold">op: In, value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;Intel&amp;#34;&lt;/span>,&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;AMD&amp;#34;&lt;/span>]}&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">name&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#b44">&amp;#34;one of available nics&amp;#34;&lt;/span>&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">matchAny&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">matchFeatures&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">feature&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>pci.device&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">matchExpressions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">vendor&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>{&lt;span style="color:#008000;font-weight:bold">op: In, value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;0eee&amp;#34;&lt;/span>]}&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">class&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>{&lt;span style="color:#008000;font-weight:bold">op: In, value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;0200&amp;#34;&lt;/span>]}&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">matchFeatures&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>- &lt;span style="color:#008000;font-weight:bold">feature&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>pci.device&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">matchExpressions&lt;/span>:&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">vendor&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>{&lt;span style="color:#008000;font-weight:bold">op: In, value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;0fff&amp;#34;&lt;/span>]}&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#bbb"> &lt;/span>&lt;span style="color:#008000;font-weight:bold">class&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>{&lt;span style="color:#008000;font-weight:bold">op: In, value&lt;/span>:&lt;span style="color:#bbb"> &lt;/span>[&lt;span style="color:#b44">&amp;#34;0200&amp;#34;&lt;/span>]}&lt;span style="color:#bbb">
&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="client-implementation-for-node-validation">Client implementation for node validation&lt;/h3>
&lt;p>To streamline compatibility validation, we implemented a &lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/reference/node-feature-client-reference.html">client tool&lt;/a> that allows for node validation based on an image's compatibility artifact.
In this workflow, the image author would generate a compatibility artifact that points to the image it describes in a registry via the referrers API.
When a need arises to assess the fit of an image to a host, the tool can discover the artifact and verify compatibility of an image to a node before deployment.
The client can validate nodes both inside and outside a Kubernetes cluster, extending the utility of the tool beyond the single Kubernetes use case.
In the future, image compatibility could play a crucial role in creating specific workload profiles based on image compatibility requirements, aiding in more efficient scheduling.
Additionally, it could potentially enable automatic node configuration to some extent, further optimizing resource allocation and ensuring seamless deployment of specialized workloads.&lt;/p>
&lt;h3 id="examples-of-usage">Examples of usage&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Define image compatibility metadata&lt;/strong>&lt;/p>
&lt;p>A &lt;a href="https://kubernetes.io/docs/concepts/containers/images/">container image&lt;/a> can have metadata that describes
its requirements based on features discovered from nodes, like kernel modules or CPU models.
The previous compatibility specification example in this article exemplified this use case.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Attach the artifact to the image&lt;/strong>&lt;/p>
&lt;p>The image compatibility specification is stored as an OCI artifact.
You can attach this metadata to your container image using the &lt;a href="https://oras.land/">oras&lt;/a> tool.
The registry only needs to support OCI artifacts; support for arbitrary types is not required.
Keep in mind that the container image and the artifact must be stored in the same registry.
Use the following command to attach the artifact to the image:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>oras attach &lt;span style="color:#b62;font-weight:bold">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#b62;font-weight:bold">&lt;/span>--artifact-type application/vnd.nfd.image-compatibility.v1alpha1 &amp;lt;image-url&amp;gt; &lt;span style="color:#b62;font-weight:bold">\&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;path-to-spec&amp;gt;.yaml:application/vnd.nfd.image-compatibility.spec.v1alpha1+yaml
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>&lt;strong>Validate image compatibility&lt;/strong>&lt;/p>
&lt;p>After attaching the compatibility specification, you can validate whether a node meets the
image's requirements. This validation can be done using the
&lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/reference/node-feature-client-reference.html">nfd client&lt;/a>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>nfd compat validate-node --image &amp;lt;image-url&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>&lt;strong>Read the output from the client&lt;/strong>&lt;/p>
&lt;p>Finally, you can read the report generated by the tool, or use your own tools to act on the generated JSON report.&lt;/p>
&lt;p>&lt;img alt="validate-node command output" src="https://kubernetes.io/blog/2025/06/25/image-compatibility-in-cloud-native-environments/validate-node-output.png">&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>The addition of image compatibility to Kubernetes through Node Feature Discovery underscores the growing importance of addressing compatibility in cloud native environments.
It is only a start, as further work is needed to integrate compatibility into scheduling of workloads within and outside of Kubernetes.
However, with this feature integrated into Kubernetes, mission-critical workloads can now define and validate host OS requirements more efficiently.
Moving forward, the adoption of compatibility metadata within Kubernetes ecosystems will significantly enhance the reliability and performance of specialized containerized applications, ensuring they meet the stringent requirements of industries like telecommunications and high-performance computing, or of any environment that requires special hardware or host OS configuration.&lt;/p>
&lt;h2 id="get-involved">Get involved&lt;/h2>
&lt;p>Join the &lt;a href="https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/contributing/">Kubernetes Node Feature Discovery&lt;/a> project if you're interested in getting involved with the design and development of the Image Compatibility API and tools.
We always welcome new contributors.&lt;/p></description></item><item><title>Changes to Kubernetes Slack</title><link>https://kubernetes.io/blog/2025/06/16/changes-to-kubernetes-slack/</link><pubDate>Mon, 16 Jun 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/06/16/changes-to-kubernetes-slack/</guid><description>
&lt;p>&lt;strong>UPDATE&lt;/strong>: We’ve received notice from Salesforce that our Slack workspace &lt;strong>WILL NOT BE DOWNGRADED&lt;/strong> on June 20th. Stand by for more details, but for now, there is no urgency to back up private channels or direct messages.&lt;/p>
&lt;p>&lt;del>Kubernetes Slack will lose its special status and will be changing into a standard free Slack on June 20, 2025&lt;/del>. Sometime later this year, our community may move to a new platform. If you are responsible for a channel or private channel, or a member of a User Group, you will need to take some actions as soon as you can.&lt;/p>
&lt;p>For the last decade, Slack has supported our project with a free customized enterprise account. They have let us know that they can no longer do so, particularly since our Slack is one of the largest and most active ones on the platform. As such, they will be downgrading it to a standard free Slack while we decide on, and implement, other options.&lt;/p>
&lt;p>On Friday, June 20, we will be subject to the &lt;a href="https://slack.com/help/articles/27204752526611-Feature-limitations-on-the-free-version-of-Slack">feature limitations of free Slack&lt;/a>. The limitations that will affect us most are retaining only 90 days of history and having to disable several apps and workflows that we currently use. The Slack Admin team will do their best to manage these limitations.&lt;/p>
&lt;p>Responsible channel owners, members of private channels, and members of User Groups should &lt;a href="https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md#what-actions-do-channel-owners-and-user-group-members-need-to-take-soon">take some actions&lt;/a> to prepare for the downgrade and preserve information as soon as possible.&lt;/p>
&lt;p>The CNCF Projects Staff have proposed that our community look at migrating to Discord. Because we have been pushing the limits of Slack for some time, they have already explored what a Kubernetes Discord would look like. Discord would allow us to implement new tools and integrations that would help the community, such as GitHub group membership synchronization. The Steering Committee will discuss and decide on our future platform.&lt;/p>
&lt;p>Please see our &lt;a href="https://github.com/kubernetes/community/blob/master/communication/slack-migration-faq.md">FAQ&lt;/a>, and check the &lt;a href="https://groups.google.com/a/kubernetes.io/g/dev/">kubernetes-dev mailing list&lt;/a> and the &lt;a href="https://kubernetes.slack.com/archives/C9T0QMNG4">#announcements channel&lt;/a> for further news. If you have specific feedback on our Slack status join the &lt;a href="https://github.com/kubernetes/community/issues/8490">discussion on GitHub&lt;/a>.&lt;/p></description></item><item><title>Enhancing Kubernetes Event Management with Custom Aggregation</title><link>https://kubernetes.io/blog/2025/06/10/enhancing-kubernetes-event-management-custom-aggregation/</link><pubDate>Tue, 10 Jun 2025 00:00:00 +0000</pubDate><guid>https://kubernetes.io/blog/2025/06/10/enhancing-kubernetes-event-management-custom-aggregation/</guid><description>
&lt;p>Kubernetes &lt;a href="https://kubernetes.io/docs/reference/kubernetes-api/cluster-resources/event-v1/">Events&lt;/a> provide crucial insights into cluster operations, but as clusters grow, managing and analyzing these events becomes increasingly challenging. This blog post explores how to build custom event aggregation systems that help engineering teams better understand cluster behavior and troubleshoot issues more effectively.&lt;/p>
&lt;h2 id="the-challenge-with-kubernetes-events">The challenge with Kubernetes events&lt;/h2>
&lt;p>In a Kubernetes cluster, events are generated for various operations - from pod scheduling and container starts to volume mounts and network configurations. While these events are invaluable for debugging and monitoring, several challenges emerge in production environments:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Volume&lt;/strong>: Large clusters can generate thousands of events per minute&lt;/li>
&lt;li>&lt;strong>Retention&lt;/strong>: Default event retention is limited to one hour&lt;/li>
&lt;li>&lt;strong>Correlation&lt;/strong>: Related events from different components are not automatically linked&lt;/li>
&lt;li>&lt;strong>Classification&lt;/strong>: Events lack standardized severity or category classifications&lt;/li>
&lt;li>&lt;strong>Aggregation&lt;/strong>: Similar events are not automatically grouped&lt;/li>
&lt;/ol>
&lt;p>To learn more about Events in Kubernetes, read the &lt;a href="https://kubernetes.io/docs/reference/kubernetes-api/cluster-resources/event-v1/">Event&lt;/a> API reference.&lt;/p>
&lt;h2 id="real-world-value">Real-World value&lt;/h2>
&lt;p>Consider a production environment with dozens of microservices where users report intermittent transaction failures:&lt;/p>
&lt;p>&lt;strong>Traditional event handling:&lt;/strong> Engineers waste hours sifting through thousands of standalone events spread across namespaces. By the time they investigate, the older events have long since been purged, and correlating pod restarts with node-level issues is practically impossible.&lt;/p>
&lt;p>&lt;strong>With custom event aggregation:&lt;/strong> The system groups related events across resources, instantly surfacing correlation patterns such as volume mount timeouts preceding pod restarts. Historical data shows that the same pattern appeared during past traffic spikes, pointing to a storage scalability issue in minutes rather than hours.&lt;/p>
&lt;p>The benefit of this approach is that organizations that implement it commonly cut their troubleshooting time significantly and increase system reliability by detecting patterns early.&lt;/p>
&lt;h2 id="building-an-event-aggregation-system">Building an Event aggregation system&lt;/h2>
&lt;p>This post explores how to build a custom event aggregation system that addresses these challenges, aligned to Kubernetes best practices. I've picked the Go programming language for my example.&lt;/p>
&lt;h3 id="architecture-overview">Architecture overview&lt;/h3>
&lt;p>This event aggregation system consists of three main components:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Event Watcher&lt;/strong>: Monitors the Kubernetes API for new events&lt;/li>
&lt;li>&lt;strong>Event Processor&lt;/strong>: Processes, categorizes, and correlates events&lt;/li>
&lt;li>&lt;strong>Storage Backend&lt;/strong>: Stores processed events for longer retention&lt;/li>
&lt;/ol>
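&lt;p>Before looking at each component in detail, here's a minimal, hypothetical sketch of how the three pieces could fit together. It assumes the &lt;code>EventWatcher&lt;/code>, &lt;code>EventProcessor&lt;/code>, and &lt;code>EventStorage&lt;/code> types shown in the rest of this post, plus the imports from the watcher example below:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">// Sketch: connect the watcher, processor, and storage backend.
func run(ctx context.Context, config *rest.Config, processor *EventProcessor, storage EventStorage) error {
    watcher, err := NewEventWatcher(config)
    if err != nil {
        return err
    }
    events, err := watcher.Watch(ctx)
    if err != nil {
        return err
    }
    // Consume events until the channel closes or the context is cancelled.
    for event := range events {
        processed := processor.Process(event)
        if err := storage.Store(ctx, processed); err != nil {
            // In production, buffer or retry here; see the
            // reliability practices later in this post.
            continue
        }
    }
    return nil
}
&lt;/code>&lt;/pre>&lt;/div>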
&lt;p>Here's a sketch for how to implement the event watcher:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">package&lt;/span> main
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">import&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;context&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> metav1 &lt;span style="color:#b44">&amp;#34;k8s.io/apimachinery/pkg/apis/meta/v1&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;k8s.io/client-go/kubernetes&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#b44">&amp;#34;k8s.io/client-go/rest&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> eventsv1 &lt;span style="color:#b44">&amp;#34;k8s.io/api/events/v1&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> EventWatcher &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> clientset &lt;span style="color:#666">*&lt;/span>kubernetes.Clientset
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> &lt;span style="color:#00a000">NewEventWatcher&lt;/span>(config &lt;span style="color:#666">*&lt;/span>rest.Config) (&lt;span style="color:#666">*&lt;/span>EventWatcher, &lt;span style="color:#0b0;font-weight:bold">error&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> clientset, err &lt;span style="color:#666">:=&lt;/span> kubernetes.&lt;span style="color:#00a000">NewForConfig&lt;/span>(config)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> err &lt;span style="color:#666">!=&lt;/span> &lt;span style="color:#a2f;font-weight:bold">nil&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> &lt;span style="color:#a2f;font-weight:bold">nil&lt;/span>, err
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> &lt;span style="color:#666">&amp;amp;&lt;/span>EventWatcher{clientset: clientset}, &lt;span style="color:#a2f;font-weight:bold">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> (w &lt;span style="color:#666">*&lt;/span>EventWatcher) &lt;span style="color:#00a000">Watch&lt;/span>(ctx context.Context) (&lt;span style="color:#666">&amp;lt;-&lt;/span>&lt;span style="color:#a2f;font-weight:bold">chan&lt;/span> &lt;span style="color:#666">*&lt;/span>eventsv1.Event, &lt;span style="color:#0b0;font-weight:bold">error&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> events &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#a2f">make&lt;/span>(&lt;span style="color:#a2f;font-weight:bold">chan&lt;/span> &lt;span style="color:#666">*&lt;/span>eventsv1.Event)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> watcher, err &lt;span style="color:#666">:=&lt;/span> w.clientset.&lt;span style="color:#00a000">EventsV1&lt;/span>().&lt;span style="color:#00a000">Events&lt;/span>(&lt;span style="color:#b44">&amp;#34;&amp;#34;&lt;/span>).&lt;span style="color:#00a000">Watch&lt;/span>(ctx, metav1.ListOptions{})
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> err &lt;span style="color:#666">!=&lt;/span> &lt;span style="color:#a2f;font-weight:bold">nil&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> &lt;span style="color:#a2f;font-weight:bold">nil&lt;/span>, err
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">go&lt;/span> &lt;span style="color:#a2f;font-weight:bold">func&lt;/span>() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">defer&lt;/span> &lt;span style="color:#a2f">close&lt;/span>(events)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">for&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">select&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#a2f;font-weight:bold">case&lt;/span> event, ok &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#666">&amp;lt;-&lt;/span>watcher.&lt;span style="color:#00a000">ResultChan&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> !ok {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                &lt;span style="color:#080;font-style:italic">// The watch channel was closed by the server; stop forwarding.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                &lt;span style="color:#a2f;font-weight:bold">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> e, ok &lt;span style="color:#666">:=&lt;/span> event.Object.(&lt;span style="color:#666">*&lt;/span>eventsv1.Event); ok {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                events &lt;span style="color:#666">&amp;lt;-&lt;/span> e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">case&lt;/span> &lt;span style="color:#666">&amp;lt;-&lt;/span>ctx.&lt;span style="color:#00a000">Done&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> watcher.&lt;span style="color:#00a000">Stop&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> events, &lt;span style="color:#a2f;font-weight:bold">nil&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="event-processing-and-classification">Event processing and classification&lt;/h3>
&lt;p>The event processor enriches events with additional context and classification:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> EventProcessor &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> categoryRules []CategoryRule
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> correlationRules []CorrelationRule
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> ProcessedEvent &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Event &lt;span style="color:#666">*&lt;/span>eventsv1.Event
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Category &lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Severity &lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> CorrelationID &lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Metadata &lt;span style="color:#a2f;font-weight:bold">map&lt;/span>[&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>]&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> (p &lt;span style="color:#666">*&lt;/span>EventProcessor) &lt;span style="color:#00a000">Process&lt;/span>(event &lt;span style="color:#666">*&lt;/span>eventsv1.Event) &lt;span style="color:#666">*&lt;/span>ProcessedEvent {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> processed &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#666">&amp;amp;&lt;/span>ProcessedEvent{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Event: event,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Metadata: &lt;span style="color:#a2f">make&lt;/span>(&lt;span style="color:#a2f;font-weight:bold">map&lt;/span>[&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>]&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Apply classification rules
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> processed.Category = p.&lt;span style="color:#00a000">classifyEvent&lt;/span>(event)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> processed.Severity = p.&lt;span style="color:#00a000">determineSeverity&lt;/span>(event)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Generate correlation ID for related events
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> processed.CorrelationID = p.&lt;span style="color:#00a000">correlateEvent&lt;/span>(event)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Add useful metadata
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> processed.Metadata = p.&lt;span style="color:#00a000">extractMetadata&lt;/span>(event)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> processed
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="implementing-event-correlation">Implementing Event correlation&lt;/h3>
&lt;p>One of the key features you could implement is a way of correlating related Events.
Here's an example correlation strategy:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> (p &lt;span style="color:#666">*&lt;/span>EventProcessor) &lt;span style="color:#00a000">correlateEvent&lt;/span>(event &lt;span style="color:#666">*&lt;/span>eventsv1.Event) &lt;span style="color:#0b0;font-weight:bold">string&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Correlation strategies:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#080;font-style:italic">// 1. Time-based: Events within a time window
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#080;font-style:italic">// 2. Resource-based: Events affecting the same resource
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#080;font-style:italic">// 3. Causation-based: Events with cause-effect relationships
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> correlationKey &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#00a000">generateCorrelationKey&lt;/span>(event)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> correlationKey
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> &lt;span style="color:#00a000">generateCorrelationKey&lt;/span>(event &lt;span style="color:#666">*&lt;/span>eventsv1.Event) &lt;span style="color:#0b0;font-weight:bold">string&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Example: Combine namespace, resource type, and name
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> fmt.&lt;span style="color:#00a000">Sprintf&lt;/span>(&lt;span style="color:#b44">&amp;#34;%s/%s/%s&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        event.Regarding.Namespace,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        event.Regarding.Kind,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        event.Regarding.Name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="event-storage-and-retention">Event storage and retention&lt;/h2>
&lt;p>For long-term storage and analysis, you'll probably want a backend that supports:&lt;/p>
&lt;ul>
&lt;li>Efficient querying of large event volumes&lt;/li>
&lt;li>Flexible retention policies&lt;/li>
&lt;li>Support for aggregation queries&lt;/li>
&lt;/ul>
&lt;p>Here's a sample storage interface:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> EventStorage &lt;span style="color:#a2f;font-weight:bold">interface&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#00a000">Store&lt;/span>(context.Context, &lt;span style="color:#666">*&lt;/span>ProcessedEvent) &lt;span style="color:#0b0;font-weight:bold">error&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#00a000">Query&lt;/span>(context.Context, EventQuery) ([]ProcessedEvent, &lt;span style="color:#0b0;font-weight:bold">error&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#00a000">Aggregate&lt;/span>(context.Context, AggregationParams) ([]EventAggregate, &lt;span style="color:#0b0;font-weight:bold">error&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> EventQuery &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> TimeRange TimeRange
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Categories []&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Severity []&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> CorrelationID &lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Limit &lt;span style="color:#0b0;font-weight:bold">int&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> AggregationParams &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> GroupBy []&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> TimeWindow &lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Metrics []&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="good-practices-for-event-management">Good practices for Event management&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Resource Efficiency&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Implement rate limiting for event processing&lt;/li>
&lt;li>Use efficient filtering at the API server level&lt;/li>
&lt;li>Batch events for storage operations (see the sketch after this list)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Scalability&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Distribute event processing across multiple workers&lt;/li>
&lt;li>Use leader election for coordination&lt;/li>
&lt;li>Implement backoff strategies for API rate limits&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Reliability&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Handle API server disconnections gracefully&lt;/li>
&lt;li>Buffer events during storage backend unavailability&lt;/li>
&lt;li>Implement retry mechanisms with exponential backoff&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
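&lt;p>As a concrete illustration of the first and third practices above, here is a hedged sketch of batching events for storage with exponential-backoff retries. It assumes the &lt;code>EventStorage&lt;/code> and &lt;code>ProcessedEvent&lt;/code> types from earlier, plus the standard &lt;code>context&lt;/code> and &lt;code>time&lt;/code> packages; the batch size, flush interval, and backoff values are illustrative, not recommendations:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">// Sketch: accumulate processed events and flush them in batches,
// retrying a failed flush with exponential backoff.
type BatchingStore struct {
    backend  EventStorage
    buffer   []*ProcessedEvent
    maxBatch int           // illustrative: e.g. 100 events per batch
    interval time.Duration // illustrative: e.g. 5 * time.Second
}

func (b *BatchingStore) Run(ctx context.Context, in &amp;lt;-chan *ProcessedEvent) {
    ticker := time.NewTicker(b.interval)
    defer ticker.Stop()
    for {
        select {
        case e, ok := &amp;lt;-in:
            if !ok {
                b.flush(ctx) // drain remaining events on shutdown
                return
            }
            b.buffer = append(b.buffer, e)
            if len(b.buffer) &amp;gt;= b.maxBatch {
                b.flush(ctx)
            }
        case &amp;lt;-ticker.C:
            b.flush(ctx) // flush partially filled batches periodically
        case &amp;lt;-ctx.Done():
            b.flush(ctx)
            return
        }
    }
}

func (b *BatchingStore) flush(ctx context.Context) {
    delay := 100 * time.Millisecond
    for attempt := 0; attempt &amp;lt; 5 &amp;amp;&amp;amp; len(b.buffer) &amp;gt; 0; attempt++ {
        if err := b.storeBuffered(ctx); err == nil {
            b.buffer = b.buffer[:0]
            return
        }
        // Back off before retrying; a real system would also cap the
        // total retry time and persist events it cannot deliver.
        time.Sleep(delay)
        delay *= 2
    }
}

func (b *BatchingStore) storeBuffered(ctx context.Context) error {
    for _, e := range b.buffer {
        if err := b.backend.Store(ctx, e); err != nil {
            return err // note: a retried flush re-sends the whole buffer
        }
    }
    return nil
}
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The same structure extends naturally to a storage backend with a native bulk-write API, in which case &lt;code>storeBuffered&lt;/code> would issue a single call per batch.&lt;/p>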
&lt;h2 id="advanced-features">Advanced features&lt;/h2>
&lt;h3 id="pattern-detection">Pattern detection&lt;/h3>
&lt;p>Implement pattern detection to identify recurring issues:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> PatternDetector &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> patterns &lt;span style="color:#a2f;font-weight:bold">map&lt;/span>[&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>]&lt;span style="color:#666">*&lt;/span>Pattern
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> threshold &lt;span style="color:#0b0;font-weight:bold">int&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> (d &lt;span style="color:#666">*&lt;/span>PatternDetector) &lt;span style="color:#00a000">Detect&lt;/span>(events []ProcessedEvent) []Pattern {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Group similar events
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> groups &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#00a000">groupSimilarEvents&lt;/span>(events)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Analyze frequency and timing
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> patterns &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#00a000">identifyPatterns&lt;/span>(groups)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> patterns
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> &lt;span style="color:#00a000">groupSimilarEvents&lt;/span>(events []ProcessedEvent) &lt;span style="color:#a2f;font-weight:bold">map&lt;/span>[&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>][]ProcessedEvent {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#a2f">make&lt;/span>(&lt;span style="color:#a2f;font-weight:bold">map&lt;/span>[&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>][]ProcessedEvent)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">for&lt;/span> _, event &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#a2f;font-weight:bold">range&lt;/span> events {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Create similarity key based on event characteristics
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> similarityKey &lt;span style="color:#666">:=&lt;/span> fmt.&lt;span style="color:#00a000">Sprintf&lt;/span>(&lt;span style="color:#b44">&amp;#34;%s:%s:%s&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> event.Event.Reason,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            event.Event.Regarding.Kind,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            event.Event.Regarding.Namespace,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Group events with the same key
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> groups[similarityKey] = &lt;span style="color:#a2f">append&lt;/span>(groups[similarityKey], event)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> groups
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> &lt;span style="color:#00a000">identifyPatterns&lt;/span>(groups &lt;span style="color:#a2f;font-weight:bold">map&lt;/span>[&lt;span style="color:#0b0;font-weight:bold">string&lt;/span>][]ProcessedEvent) []Pattern {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">var&lt;/span> patterns []Pattern
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">for&lt;/span> key, events &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#a2f;font-weight:bold">range&lt;/span> groups {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Only consider groups with enough events to form a pattern
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> &lt;span style="color:#a2f">len&lt;/span>(events) &amp;lt; &lt;span style="color:#666">3&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">continue&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#080;font-style:italic">// Sort events by time (events.k8s.io/v1 exposes these via Deprecated* timestamp fields)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> sort.&lt;span style="color:#00a000">Slice&lt;/span>(events, &lt;span style="color:#a2f;font-weight:bold">func&lt;/span>(i, j &lt;span style="color:#0b0;font-weight:bold">int&lt;/span>) &lt;span style="color:#0b0;font-weight:bold">bool&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> events[i].Event.DeprecatedLastTimestamp.Time.&lt;span style="color:#00a000">Before&lt;/span>(events[j].Event.DeprecatedLastTimestamp.Time)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> })
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Calculate time range and frequency
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span>        firstSeen &lt;span style="color:#666">:=&lt;/span> events[&lt;span style="color:#666">0&lt;/span>].Event.DeprecatedFirstTimestamp.Time
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        lastSeen &lt;span style="color:#666">:=&lt;/span> events[&lt;span style="color:#a2f">len&lt;/span>(events)&lt;span style="color:#666">-&lt;/span>&lt;span style="color:#666">1&lt;/span>].Event.DeprecatedLastTimestamp.Time
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> duration &lt;span style="color:#666">:=&lt;/span> lastSeen.&lt;span style="color:#00a000">Sub&lt;/span>(firstSeen).&lt;span style="color:#00a000">Minutes&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">var&lt;/span> frequency &lt;span style="color:#0b0;font-weight:bold">float64&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> duration &amp;gt; &lt;span style="color:#666">0&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> frequency = &lt;span style="color:#a2f">float64&lt;/span>(&lt;span style="color:#a2f">len&lt;/span>(events)) &lt;span style="color:#666">/&lt;/span> duration
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#080;font-style:italic">// Create a pattern if it meets threshold criteria
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> frequency &amp;gt; &lt;span style="color:#666">0.5&lt;/span> { &lt;span style="color:#080;font-style:italic">// More than 1 event per 2 minutes
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> pattern &lt;span style="color:#666">:=&lt;/span> Pattern{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Type: key,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Count: &lt;span style="color:#a2f">len&lt;/span>(events),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> FirstSeen: firstSeen,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> LastSeen: lastSeen,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> Frequency: frequency,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> EventSamples: events[:&lt;span style="color:#a2f">min&lt;/span>(&lt;span style="color:#666">3&lt;/span>, &lt;span style="color:#a2f">len&lt;/span>(events))], &lt;span style="color:#080;font-style:italic">// Keep up to 3 samples
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#080;font-style:italic">&lt;/span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> patterns = &lt;span style="color:#a2f">append&lt;/span>(patterns, pattern)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">return&lt;/span> patterns
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>With this implementation, the system can identify recurring patterns such as node pressure events, pod scheduling failures, or networking issues that occur with a specific frequency.&lt;/p>
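&lt;p>Note that these sketches reference a &lt;code>Pattern&lt;/code> type without defining it. Based on the fields they populate, one plausible shape (an assumption, relying on the standard &lt;code>time&lt;/code> package) would be:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">// A plausible definition of the Pattern type used above,
// inferred from the fields the detection sketches populate.
type Pattern struct {
    Type         string           // similarity key (reason:kind:namespace)
    Count        int              // number of events in the group
    FirstSeen    time.Time        // earliest event in the window
    LastSeen     time.Time        // latest event in the window
    Frequency    float64          // events per minute across the window
    EventSamples []ProcessedEvent // up to three representative events
}
&lt;/code>&lt;/pre>&lt;/div>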
&lt;h3 id="real-time-alerts">Real-time alerts&lt;/h3>
&lt;p>The following example provides a starting point for building an alerting system based on event patterns. It is not a complete solution but a conceptual sketch to illustrate the approach.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-go" data-lang="go">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">type&lt;/span> AlertManager &lt;span style="color:#a2f;font-weight:bold">struct&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rules []AlertRule
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> notifiers []Notifier
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#a2f;font-weight:bold">func&lt;/span> (a &lt;span style="color:#666">*&lt;/span>AlertManager) &lt;span style="color:#00a000">EvaluateEvents&lt;/span>(events []ProcessedEvent) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">for&lt;/span> _, rule &lt;span style="color:#666">:=&lt;/span> &lt;span style="color:#a2f;font-weight:bold">range&lt;/span> a.rules {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a2f;font-weight:bold">if&lt;/span> rule.&lt;span style="color:#00a000">Matches&lt;/span>(events) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> alert &lt;span style="color:#666">:=&lt;/span> rule.&lt;span style="color:#00a000">GenerateAlert&lt;/span>(events)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> a.&lt;span style="color:#00a000">notify&lt;/span>(alert)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>A well-designed event aggregation system can significantly improve cluster observability and troubleshooting capabilities. By implementing custom event processing, correlation, and storage, operators can better understand cluster behavior and respond to issues more effectively.&lt;/p>
&lt;p>The solutions presented here can be extended and customized based on specific requirements while maintaining compatibility with the Kubernetes API and following best practices for scalability and reliability.&lt;/p>
&lt;h2 id="next-steps">Next steps&lt;/h2>
&lt;p>Future enhancements could include:&lt;/p>
&lt;ul>
&lt;li>Machine learning for anomaly detection&lt;/li>
&lt;li>Integration with popular observability platforms&lt;/li>
&lt;li>Custom event APIs for application-specific events&lt;/li>
&lt;li>Enhanced visualization and reporting capabilities&lt;/li>
&lt;/ul>
&lt;p>For more information on Kubernetes events and custom &lt;a href="https://kubernetes.io/docs/concepts/architecture/controller/">controllers&lt;/a>,
refer to the official Kubernetes &lt;a href="https://kubernetes.io/docs/">documentation&lt;/a>.&lt;/p></description></item></channel></rss>