[CI] Preprocessor cache hits = 0% across all Linux wheel builds (sccache + PEP 517 build isolation) #2220

@leofang

Description

Summary

SCCACHE_GHA_USE_PREPROCESSOR_CACHE_MODE=true is enabled in build-wheel.yml#L58, but the preprocessor cache has not served a single hit in any CI run I've sampled (PR #2218 today, main today at 00:23 UTC, and main on 2026-05-27). Every compiled translation unit is recorded as a preprocessor cache miss. The downstream object cache still hits ~100% via GHAC (the GitHub Actions cache backend), so wheel build correctness is fine, but the preprocessor stage is pure overhead.

Evidence

From job Build linux-64, CUDA 13.3.0 / py3.12 (PR #2218):

                       cuda.bindings   cuda.core   cuda.core (prev CTK)
Preprocessor hits                  0           0                     0
Preprocessor misses               29          42                    42
Cache hits                        29          42                    40
Cache hits rate              100.00 %    100.00 %               95.24 %
Avg preprocessor miss        0.094 s    17.993 s               14.148 s

Same pattern in main run 27516906885 / job 81327125521 and main run 26482898891 / job 77984135920 (3 weeks ago) — 0 hits in every case.

Root cause

pip's per-run build-isolation overlay lives in a randomly named directory, /tmp/pip-build-env-<RANDOM>/overlay/..., and that random path lands in the -I arguments of every c++ invocation, e.g.:

c++ ... -I/tmp/build-env-_283dvkj/include ... -c build/cython/cuda/bindings/runtime.cpp

rapidsai/sccache hashes the raw compiler arguments into the preprocessor cache key without any basedir stripping for flag values — see preprocessor_cache_entry_hash_key in preprocessor_cache.rs#L493-L497. Different random -I path each CI run → different preprocessor key → guaranteed miss. The object cache survives because its key is derived from the preprocessor output, which is byte-identical between runs (the overlay dir contributes no headers to the expansion).
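To make the failure mode concrete, here is a minimal Python sketch under a deliberately simplified model: the preprocessor key hashes the raw argument list plus the source text, and the object key hashes only the preprocessed output. The helper names (`preprocessor_key`, `object_key`, `compile_args`) and the second random directory name are hypothetical; this is not sccache's actual hashing code.

```python
import hashlib


def preprocessor_key(args: list[str], source: str) -> str:
    """Hypothetical preprocessor-cache key: raw compiler args + source text.

    Flag values are hashed verbatim, with no basedir stripping.
    """
    h = hashlib.sha256()
    for arg in args:
        h.update(arg.encode())
        h.update(b"\0")  # separator so ["-I", "a"] and ["-Ia"] hash differently
    h.update(source.encode())
    return h.hexdigest()


def object_key(preprocessed: str) -> str:
    """Hypothetical object-cache key: derived from preprocessor output only."""
    return hashlib.sha256(preprocessed.encode()).hexdigest()


def compile_args(overlay: str) -> list[str]:
    # Mirrors the logged invocation; only the random overlay path varies.
    return ["c++", f"-I{overlay}/include", "-c",
            "build/cython/cuda/bindings/runtime.cpp"]


SOURCE = "#include <Python.h>\nint f() { return 0; }\n"
# Assumption: the overlay contributes no headers, so the expansion is identical.
PREPROCESSED = "int f() { return 0; }\n"

run_a = compile_args("/tmp/build-env-_283dvkj")  # path from the CI log
run_b = compile_args("/tmp/build-env-x9q2m1ab")  # hypothetical next CI run

pp_a, pp_b = preprocessor_key(run_a, SOURCE), preprocessor_key(run_b, SOURCE)
obj_a, obj_b = object_key(PREPROCESSED), object_key(PREPROCESSED)

print("preprocessor keys match:", pp_a == pp_b)  # False: guaranteed miss
print("object keys match:", obj_a == obj_b)      # True: object cache hit
```

The only input that differs between the two simulated runs is the overlay directory name, which is enough to change the preprocessor key while leaving the object key untouched.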

SCCACHE_BASEDIRS would not help either: it strips paths from the input file only, not from flag values. (ccache's CCACHE_BASEDIR does both — sccache lacks the equivalent.)
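A similar sketch, under the same simplified model, shows why input-only basedir stripping cannot rescue the key: the input path becomes relative, but the random -I value is still hashed verbatim. The function `strip_input_basedir`, the `/project` basedir, and the absolute source path are hypothetical illustrations of the behavior described above, not sccache code.

```python
import hashlib


def key(args: list[str]) -> str:
    """Hypothetical cache key: hash of the NUL-separated argument list."""
    return hashlib.sha256("\0".join(args).encode()).hexdigest()


def strip_input_basedir(args: list[str], input_path: str, basedir: str) -> list[str]:
    """Rewrite only the input-file argument relative to basedir (hypothetical).

    Flag values such as -I<path> are left untouched, which is the
    behavior the issue attributes to SCCACHE_BASEDIRS.
    """
    prefix = basedir.rstrip("/") + "/"
    out = []
    for arg in args:
        if arg == input_path and arg.startswith(prefix):
            out.append(arg[len(prefix):])  # input file becomes basedir-relative
        else:
            out.append(arg)  # flags (including -I<random>) pass through as-is
    return out


BASEDIR = "/project"  # hypothetical checkout location
SRC = f"{BASEDIR}/build/cython/cuda/bindings/runtime.cpp"


def args_for(overlay: str) -> list[str]:
    return ["c++", f"-I{overlay}/include", "-c", SRC]


a = strip_input_basedir(args_for("/tmp/build-env-_283dvkj"), SRC, BASEDIR)
b = strip_input_basedir(args_for("/tmp/build-env-x9q2m1ab"), SRC, BASEDIR)

print(a[-1])             # build/cython/cuda/bindings/runtime.cpp
print(key(a) == key(b))  # False: the -I value still differs between runs
```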

Perf implication

Lookups aren't free. In the linked job:

  • cuda.core: 42 misses × ~18 s avg lookup ≈ 12.6 min wasted
  • cuda.core (prev CTK): 42 misses × ~14 s ≈ 9.9 min wasted

So preprocessor cache mode currently costs roughly 22 min per Linux job (12.6 + 9.9 min; cuda.bindings adds only ~3 s) without delivering a single hit. Multiplied across the build matrix (6 Python versions × 2 arches × 2 CTKs in the wheel jobs), this adds up to a non-trivial CI cost.
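The arithmetic can be reproduced directly from the stats table. This is a small sketch; the matrix-wide figure rests on the assumption, not verified here, that every job in the matrix behaves like the single sampled job.

```python
# Per-job numbers copied from the sccache stats table
# (Build linux-64, CUDA 13.3.0 / py3.12, PR #2218).
STATS = {
    # target: (preprocessor misses, avg preprocessor miss time in seconds)
    "cuda.bindings": (29, 0.094),
    "cuda.core": (42, 17.993),
    "cuda.core (prev CTK)": (42, 14.148),
}

# Matrix size as stated in the issue. Assumption: every job in the
# matrix costs the same as the sampled job.
JOBS = 6 * 2 * 2  # Python versions x arches x CTKs

# Wasted time per target = misses x average miss time, converted to minutes.
per_target_min = {name: n * t / 60 for name, (n, t) in STATS.items()}
per_job_min = sum(per_target_min.values())

for name, minutes in per_target_min.items():
    print(f"{name:22s} {minutes:6.2f} min")
print(f"{'per job':22s} {per_job_min:6.2f} min")
print(f"{'matrix (assumed)':22s} {per_job_min * JOBS / 60:6.2f} h")
```

With these inputs the per-job total comes to about 22.5 minutes, almost all of it from the two cuda.core builds; the matrix-wide line is only as good as the equal-cost assumption.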

Proposed fix

Quick win: stop setting SCCACHE_GHA_USE_PREPROCESSOR_CACHE_MODE until the underlying issue is addressed. PR to follow.

Real fix (separate work): switch cibuildwheel to --no-build-isolation, so pip's overlay dir never appears in -I to begin with. That also unlocks reusing the scikit-build / Cython build dir across runs.

Metadata

Assignees

Labels

CI/CD (CI/CD infrastructure), bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests