
kubectl: fix infinite drain retry for pods migrated to other nodes #138237

Open
sakshar2303 wants to merge 1 commit into kubernetes:master from sakshar2303:fix-kubectl-drain-migration


@sakshar2303

Description

This PR fixes a bug in kubectl drain where the eviction process could enter an infinite retry loop for pods that have already migrated to a different node.

The Issue

Currently, when the Eviction API returns a 429 Too Many Requests error (often due to a PodDisruptionBudget), kubectl retries the eviction every 5 seconds. However, it does not re-verify the pod's state during these retries. In scenarios involving StatefulSets or rapid rescheduling, a pod might be deleted and recreated on a different node while kubectl is still retrying the old eviction. This causes kubectl to attempt evicting a pod that is no longer on the target node, potentially blocking the drain operation indefinitely.

Changes

  • Updated the eviction retry logic in staging/src/k8s.io/kubectl/pkg/drain/evict.go.
  • Added a check to re-fetch the Pod from the API server during retries.
  • If pod.Spec.NodeName no longer matches the node being drained, the pod is treated as already moved off the node and the drain proceeds to the next pod.

Validation

  • Verified that the drain continues if the pod's NodeName changes during the retry window.
  • Verified that API errors during the pod re-fetch are handled gracefully.
  • Added/Updated unit tests in staging/src/k8s.io/kubectl/pkg/drain/drain_test.go.

Fixes # (Link the issue here once you've created it, or mention it's a found bug)

/sig cli
/area kubectl
/kind bug

Signed-off-by: Sakshar Dhawan <sakshardhawanfzk@gmail.com>
@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the following labels on Apr 6, 2026:

  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
  • sig/cli: Categorizes an issue or PR as relevant to SIG CLI.
  • do-not-merge/release-note-label-needed: Indicates that a PR should not merge because it's missing one of the release note labels.
  • area/kubectl
  • kind/bug: Categorizes issue or PR as related to a bug.

github-project-automation (bot) moved this to Needs Triage in SIG CLI on Apr 6, 2026.

@k8s-ci-robot added the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) on Apr 6, 2026.
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@k8s-ci-robot added the needs-priority label (Indicates a PR lacks a `priority/foo` label and requires one.) on Apr 6, 2026.
@k8s-ci-robot
Contributor

Hi @sakshar2303. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


@k8s-ci-robot added the following labels on Apr 6, 2026:

  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sakshar2303
Once this PR has been reviewed and has the lgtm label, please assign soltysh for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot requested review from seans3 and soltysh on April 6, 2026 at 19:47.
@sakshar2303
Author

@seans3 @soltysh, please review the changes.


Labels

  • area/kubectl
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • do-not-merge/release-note-label-needed: Indicates that a PR should not merge because it's missing one of the release note labels.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
  • needs-priority: Indicates a PR lacks a `priority/foo` label and requires one.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/cli: Categorizes an issue or PR as relevant to SIG CLI.
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: Needs Triage


2 participants