fix: serialize shim shutdown and quiesce guest before poweroff#32
Merged
Conversation
Replace the checksum-flaky golangci-lint install.sh download (which failed CI with a SHA-256 mismatch on the cached tarball) with a pinned `go install` that routes through the Go module proxy + GOSUMDB. Also bump all actions to their latest major versions: checkout v6, setup-go v6, setup-task v2, setup-buildx-action v4, login-action v4, action-gh-release v3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The CI bump to golangci-lint v2.12.2 surfaced new findings. Address them: - .golangci.yml: rename deprecated gomodguard -> gomodguard_v2; exclude gosec G703 (path traversal taint, same intent as the existing G304 exclusion) and G602 in tests; configure goconst with ignore-tests and min-occurrences: 4 to avoid churn on generic short literals. - Remove dead code: service.getContainerID (unused). - Extract constants for repeated high-value literals: QMP/log field keys (cpu_id, slot_id, data, container, exec, container_id, resource, required), mount types/options (tmpfs, nosuid, noexec, nodev, cgroup2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
newTestService() left service.vmLifecycle nil, but the production constructor (NewTaskService) always initializes it via NewManager(). The shutdown cleanup orchestrator's VMShutdown phase calls vmLifecycle.Shutdown unconditionally, so the nil fixture panicked with a nil pointer dereference in TestShutdownReleasesNetworkOnce (a linux-only test). This was latent: CI never reached the test step because the golangci-lint install was failing first. Initialize vmLifecycle with lifecycle.NewManager() so the fixture mirrors production; Manager.Shutdown is a safe no-op with no instance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "valid configuration" case pointed confDir at the host's /etc/cni/net.d, so NewCNIManager failed with "no CNI configuration files found" on CI runners (and any clean environment) that lack CNI configs. This was latent: CI never reached the test step before because the golangci-lint install was failing first. Write a valid conflist into a t.TempDir() and use that as confDir so the test no longer depends on host CNI state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
The release task failed with curl exit 92 ("HTTP/2 stream not closed
cleanly: PROTOCOL_ERROR") while fetching nerdctl-full from GitHub's CDN,
a known intermittent HTTP/2 issue.
Add a download() helper that forces HTTP/1.1 (eliminating the root
cause), retries transient failures, and writes to a temp file renamed on
success so a failed attempt never leaves a truncated tarball that a later
run would reuse. Use it for the nerdctl-full and erofs-snapshotter
downloads.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each container runs in its own shim process, but CNI plugins (bridge, firewall) mutate shared host state (iptables chains, bridges). When a new container's CNI ADD overlaps a prior container's asynchronous CNI DEL, the two race on the kernel xtables lock and the firewall plugin intermittently aborts the ADD with an empty error — observed in TestContainerdRunMultipleContainers (3rd container). Wrap Setup/Teardown in a host-wide advisory file lock (flock on /run/spinbox/cni.lock) so the plugin chain runs serially across all shim processes. Internal cleanup paths use new setupLocked/teardownLocked helpers to avoid recursive lock acquisition (flock is not reentrant). The lock is released on fd close, so a crashed shim never leaves it held. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
beginRPC/shutdownOnceto prevent new RPCs during shutdown and ensure in-flight calls drain before exitPrepareShutdownRPC so the host can flush the guest writable layer before VM poweroff, ensuring snapshot/commit consistencyapi/next.pb.txtdescriptor snapshotTest plan
PrepareShutdownguest servicebeginRPC/ shutdown serialization in task servicetask lintpassesgo test -race ./...passes🤖 Generated with Claude Code