Skip to content

fix(llamacpp): resolve backend tag via registry API to honor mirrors#974

Draft
ilopezluna wants to merge 2 commits into
mainfrom
fix/llamacpp-backend-resolve-via-registry
Draft

fix(llamacpp): resolve backend tag via registry API to honor mirrors#974
ilopezluna wants to merge 2 commits into
mainfrom
fix/llamacpp-backend-resolve-via-registry

Conversation

@ilopezluna

Copy link
Copy Markdown
Contributor

Context

downloadLatestLlamaCpp resolved the llama.cpp backend tag to a digest by calling the Docker Hub HTTP API (hub.docker.com/v2/.../tags/<tag>). That endpoint is only reachable from the public internet, so customers behind a corporate registry mirror (Artifactory / Nexus / Harbor) with no direct egress to Docker Hub could not resolve the backend image even though the mirror could serve it.

This change resolves the tag through the containerd registry resolver instead. Resolution now goes through the same registryutil.RegistryHosts path as the image pull, honoring configured mirrors and docker login credentials.

Changes

  • Resolve via registry API — new dockerhub.ResolveDigest(ctx, ref, mirrors) resolves a tag to a digest using the registry resolver (manifest HEAD/GET only, no blobs). downloadLatestLlamaCpp uses it instead of the Hub HTTP API.

Hardening from review

  • Fail fast on terminal errors — the shared retry() now short-circuits non-retryable errors (errdefs.IsNotFound / IsUnauthorized / canceled / deadline-exceeded) instead of looping 10×1s (~9s). A missing tag or bad credentials no longer stalls the install/startup path. Applies to both ResolveDigest and PullPlatform.
  • Build the resolver once — extracted a shared newResolver() helper (reused by fetch()); the resolver is built once outside the retry loop rather than rebuilt and re-reading ~/.docker/config.json on every attempt.
  • Diagnosability — log the resolved digest and the host/name that served it, so a re-download loop behind a mirror that re-pushes a converted image is debuggable.
  • Cleanup — dropped the now-dead *http.Client parameter from the llama.cpp download path (the shared Install interface boundary is unchanged) and removed the unreachable empty-digest branch.

Tests

  • goleak TestMain to catch goroutine/connection leaks.
  • Mirror-resolution test (resolver hits the mirror, not registry-1.docker.io).
  • Canceled-context test and hermetic retry() tests for terminal-error fast-fail vs. transient-error retry.

Verification

  • go build ./... and per-platform GOOS=darwin/windows/linux go build ./...
  • go vet on both packages
  • go test ./pkg/internal/dockerhub/... (goleak clean) and ./pkg/inference/backends/llamacpp/...

🤖 Generated with Claude Code

Resolve the llama.cpp backend tag to a digest through the containerd
registry resolver instead of the Docker Hub HTTP API. This honors
configured registry mirrors (corporate Artifactory / Nexus / Harbor) and
`docker login` credentials, so customers behind a private mirror with no
direct egress to registry-1.docker.io can resolve and pull the backend.

Hardening from review:
- retry() now short-circuits terminal errors (not-found, unauthorized,
  canceled/expired context) instead of looping 10x with 1s sleeps, so a
  missing tag or bad credentials no longer blocks startup ~9s. Applies to
  both ResolveDigest and PullPlatform.
- Extract a shared newResolver() helper; build the resolver once outside
  the retry loop rather than rebuilding (and re-reading config.json) per
  attempt.
- Log the resolved digest and the host that served it, to diagnose a
  re-download loop behind a mirror that re-pushes a converted image.
- Drop the now-dead *http.Client parameter from the llama.cpp download
  path and the unreachable empty-digest branch.
- Add goleak and tests for mirror resolution and terminal-error fast-fail.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the llamacpp backend to resolve image tags to digests using the Registry HTTP API v2, which honors registry mirrors and local docker credentials. It also introduces a fast-fail mechanism for terminal errors during retries and adds comprehensive unit tests. The feedback highlights a critical issue where the resolver is still recreated on every retry in PullPlatform, and suggests expanding the terminal error checks to include 403 Forbidden (errdefs.IsDenied) to fail fast on permission errors.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pkg/internal/dockerhub/download.go Outdated
Comment thread pkg/internal/dockerhub/download.go
Address review feedback on PR #974:

- PullPlatform now builds the resolver once and passes it into fetch(),
  instead of fetch() rebuilding it (and re-reading ~/.docker/config.json)
  on every retry attempt.
- isTerminal() now fails fast on 401/403. The containerd resolver only
  maps 404 to errdefs.ErrNotFound; other 4xx statuses surface as
  remoteerrors.ErrUnexpectedStatus carrying the raw code, so we inspect
  StatusCode via errors.As (covering both 401 and 403). 429 is left
  retryable, matching the resolver's internal rate-limit handling.
- Add hermetic retry() tests for 401/403 fast-fail and 429 retry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant