Skip to content

Add file-based transfer APIs to build cache provider interface and all cache plugins#5746

Open
Copilot wants to merge 50 commits into
mainfrom
copilot/stream-cache-entry-for-http-plugin
Open

Add file-based transfer APIs to build cache provider interface and all cache plugins#5746
Copilot wants to merge 50 commits into
mainfrom
copilot/stream-cache-entry-for-http-plugin

Conversation

Copilot AI commented Apr 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds optional file-based transfer APIs to the build cache provider interface (ICloudBuildCacheProvider) and implements them across all cache plugins (HTTP, Amazon S3, Azure Blob Storage). When enabled via the useDirectFileTransfersForBuildCache experiment flag, cache entries are transferred directly between local files and cloud storage without buffering entire contents in memory, preventing out-of-memory errors for large build outputs.

Details

Core changes:

  • ICloudBuildCacheProvider gains two optional methods: tryDownloadCacheEntryToFileAsync and tryUploadCacheEntryFromFileAsync. Providers that don't implement them gracefully fall back to the existing buffer-based APIs.
  • OperationBuildCache conditionally uses the file-based path when useDirectFileTransfersForBuildCache is enabled and the provider supports it. Includes cleanup of partial files on failed downloads.
  • WebClient is refactored to extract a shared _makeRawRequestAsync core used by both buffer and streaming request paths, with a new fetchStreamAsync method and Content-Encoding decompression support for streaming responses.
  • FileSystem in node-core-library gains createReadStream, createWriteStream, and createWriteStreamAsync methods (wrapped in _wrapException for consistent error handling).
  • FileSystemBuildCacheProvider is simplified — the stream method is removed since cloud providers now handle file I/O directly.

Plugin implementations:

  • HTTP plugin: Downloads via fetchStreamAsyncpipeline() to file. Uploads via createReadStreamfetchStreamAsync. Uses maxAttempts: 1 for uploads (stream consumed after first attempt), with credential fallback skipped for stream bodies.
  • Amazon S3 plugin: Downloads via streaming GET with retry support. Uploads hash the tarball on disk first via _hashFileAsync, then stream with the SHA-256 hash included in the AWS Signature V4 request — restoring full payload signing (no more UNSIGNED_PAYLOAD). No retry on uploads.
  • Azure Blob Storage plugin: Downloads via SDK-native blobClient.downloadToFile(). Uploads via blockBlobClient.uploadFile(). Parent directory creation ensured before download.

Gating:

  • Controlled by useDirectFileTransfersForBuildCache in experiments.json. Defaults to off. Falls back to buffer-based APIs if the cloud provider plugin doesn't implement the file-based methods.

How it was tested

  • Unit tests for HttpBuildCacheProvider (14 tests): buffer and file-based GET/SET, 404 cache miss, credential fallback skip for file uploads, write-not-allowed checks, retry behavior, pipeline assertions
  • Unit tests for AmazonS3Client (38 tests): buffer and file-based GET/SET, signed payload hash verification (not UNSIGNED-PAYLOAD), download retry on transient 5xx, no-retry on upload, credential validation, pipeline assertions
  • All existing tests continue to pass (51 S3 total, 14 HTTP total)
  • Verified backwards compatibility: when the experiment flag is off (default), all code paths are identical to the pre-change behavior

Impacted documentation

  • experiments.schema.json updated with useDirectFileTransfersForBuildCache description
  • common/reviews/api/rush-lib.api.md updated with new API surface
  • common/reviews/api/node-core-library.api.md updated with new FileSystem stream methods

@iclanton

iclanton commented Apr 5, 2026

Copy link
Copy Markdown
Member

@copilot - add APIs to FileSystem in node-core-library for the fs APIs you're making use of in this PR. Ideally, none of our projects (except for FileSystem itself) should use fs. Everything should use FileSystem. The change to node-core-library will also need a changefile. Take a look at the conventions in FileSystem and follow that pattern.

Copilot AI commented Apr 5, 2026

Copy link
Copy Markdown
Contributor Author

@copilot - add APIs to FileSystem in node-core-library for the fs APIs you're making use of in this PR. Ideally, none of our projects (except for FileSystem itself) should use fs. Everything...

Added createReadStream() and createWriteStream() static methods to FileSystem in node-core-library, along with FileSystemReadStream and FileSystemWriteStream type aliases (following the same pattern as FileSystemStats and FolderItem). Updated FileSystemBuildCacheProvider and OperationBuildCache to use the new FileSystem APIs instead of importing from node:fs directly. Change file for @rushstack/node-core-library included as a minor bump. Commit: 468caf7

Copilot AI requested a review from iclanton April 5, 2026 04:32
@iclanton

iclanton commented Apr 5, 2026

Copy link
Copy Markdown
Member

@copilot - are there async variants of those fs APIs? Should we expose and use those too?

Copilot AI commented Apr 5, 2026

Copy link
Copy Markdown
Contributor Author

@copilot - are there async variants of those fs APIs? Should we expose and use those too?

No, fs.createReadStream() and fs.createWriteStream() don't have async variants — they're already inherently asynchronous. They return stream objects immediately and all the actual I/O happens asynchronously through the stream events/pipeline. The fs/promises module doesn't include them for this reason.

@iclanton

iclanton commented Apr 5, 2026

Copy link
Copy Markdown
Member

@copilot - take a look at the other FileSystem APIs. Many have an optional options parameter with useful utilities. For example, the write operations have an option to create the parent folder if it doesn't exist. Consider adding useful options to these new APIs

Copilot AI commented Apr 5, 2026

Copy link
Copy Markdown
Contributor Author

@copilot - take a look at the other FileSystem APIs. Many have an optional options parameter with useful utilities. For example, the write operations have an option to create the parent folder if it...

Added IFileSystemCreateWriteStreamOptions with ensureFolderExists to createWriteStream(), following the same pattern as IFileSystemWriteBinaryFileOptions and other write APIs. Updated FileSystemBuildCacheProvider to use the new option instead of manual ensureFolderAsync(). For createReadStream, there aren't useful options from the existing FileSystem conventions (read methods use encoding/convertLineEndings which don't apply to raw streams). Commit: 91161d8

@iclanton

iclanton commented Apr 5, 2026

Copy link
Copy Markdown
Member

@copilot - can we add stream support to the other cache plugins?

Copilot AI commented Apr 5, 2026

Copy link
Copy Markdown
Contributor Author

@copilot - can we add stream support to the other cache plugins?

Added streaming support to both remaining cache plugins:

Amazon S3 plugin (733ef5c):

  • Refactored AmazonS3Client._makeRequestAsync to extract signing into _buildSignedRequest
  • Added _makeStreamRequestAsync using WebClient.fetchStreamAsync() with UNSIGNED-PAYLOAD for AWS Sig V4 signing of streaming uploads
  • Added getObjectStreamAsync() (with retry support) and uploadObjectStreamAsync() (single attempt, since consumed streams can't be replayed)
  • Added tryGetCacheEntryStreamByIdAsync and trySetCacheEntryStreamAsync to AmazonS3BuildCacheProvider

Azure Storage plugin (733ef5c):

  • Added tryGetCacheEntryStreamByIdAsync using blobClient.download() with readableStreamBody
  • Added trySetCacheEntryStreamAsync using blockBlobClient.uploadStream()
  • Added _logBlobError helper for consistent error handling across buffer and stream methods

All three cache plugins (HTTP, S3, Azure) now implement both streaming methods. All existing tests pass (45 S3, 10 Azure).

Copilot AI changed the title Add streaming APIs to build cache provider interface and HTTP plugin Add streaming APIs to build cache provider interface and all cache plugins Apr 5, 2026
@iclanton

iclanton commented Apr 5, 2026

Copy link
Copy Markdown
Member

@copilot - Can you put usage of this functionality behind a Rush experiment?

Copilot AI commented Apr 5, 2026

Copy link
Copy Markdown
Contributor Author

@copilot - Can you put usage of this functionality behind a Rush experiment?

Added a useStreamingBuildCache experiment flag in experiments.json. The streaming code paths in OperationBuildCache now only activate when both the experiment is enabled AND the cloud provider implements the streaming methods. Without the experiment enabled, all cache operations use the existing buffer-based approach. Commit: 2493ed8

@iclanton iclanton left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - This change introduces a lot of duplicated code. I've called out some instances of it. Can you review with that in mind and clean that up?

Comment thread libraries/rush-lib/src/logic/buildCache/FileSystemBuildCacheProvider.ts Outdated
Comment thread libraries/rush-lib/src/utilities/WebClient.ts Outdated
Comment thread rush-plugins/rush-amazon-s3-build-cache-plugin/src/AmazonS3Client.ts Outdated
Comment thread rush-plugins/rush-amazon-s3-build-cache-plugin/src/AmazonS3Client.ts Outdated
Comment thread rush-plugins/rush-amazon-s3-build-cache-plugin/src/AmazonS3Client.ts Outdated
iclanton and others added 8 commits April 8, 2026 14:20
The HTTP and S3 providers both ensure the parent directory exists
before writing the cache entry file. The Azure provider was missing
this, which would cause failures on a fresh machine where the build
cache folder hasn't been created yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rush change file: replace "streaming APIs" with file-based transfer
  API names and mention the experiment flag
- node-core-library change file: fix broken backtick formatting
- Azure provider: update JSDoc from "stream" to "file-based"
- HTTP test names: replace "stream consumed" / "stream bodies" with
  file-based language

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The delay variable is in milliseconds but the log message said "s".
This produced misleading output like "Will retry request in 4000s...".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… signing

Remove unused onBlobAlreadyExists parameter from _trySetBlobDataAsync

This callback was previously used to drain incoming streams when the
blob already existed. With the switch to file-based APIs, no callers
pass this parameter anymore.
- HTTP: add 404 cache miss test for tryDownloadCacheEntryToFileAsync
- HTTP: add pipeline assertion in download success test
- S3: add retry test for downloadObjectToFileAsync on transient 5xx
- S3: add pipeline assertions in download success/miss tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For stream-body requests, the log read "unknown bytes" which is
awkward. Change to "unknown length" for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add clarifying comment on maxAttempts: 1 for uploads explaining why
  the parameter exists (shared between download with retries and upload
  without)
- Replace S3 download snapshot containing auth headers with explicit
  field assertions, avoiding credential-looking strings in snapshots
- Update inline snapshot for "unknown length" wording change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@iclanton iclanton force-pushed the copilot/stream-cache-entry-for-http-plugin branch from dd9bf94 to f8fecd4 Compare April 8, 2026 22:26
Comment thread common/reviews/api/rush-lib.api.md
// Clean up any partial file left by the failed download so it isn't
// mistaken for a valid cache entry on the next build.
try {
await FileSystem.deleteFileAsync(targetPath);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this only deletes the target file when the provider throws during download.

However, providers generally swallow download failures and return false/undefined instead of throwing:

  • HTTP provider catches any error in tryDownloadCacheEntryToFileAsync() and returns false
  • S3 provider catches errors and returns false
  • Azure provider catches errors in _tryGetBlobDataAsync() and ultimately returns false

At the same time, the local cache provider considers a cache entry valid if the file merely exists, without validating size, checksum, or completeness.

It seems possible that a partially written (truncated or corrupted) file from a failed streamed download could remain on disk and be treated as a valid cache hit on subsequent runs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - take a look at this concern.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — the cleanup now runs for any non-success case (both false return and thrown exception), not just in the catch block. Moved deleteFileAsync outside the try/catch so it triggers whenever cloudCacheHit remains false. This prevents partially written files from being treated as valid cache entries on subsequent runs.

@iclanton

Copy link
Copy Markdown
Member

@copilot - this branch has conflicts. Rebase or merge main into this branch and resolve conflicts.

…urns false

Providers may catch errors internally and return false instead of throwing,
leaving a partially written file on disk. Move the cleanup logic outside
the catch block so it runs for any non-success case, preventing corrupted
partial files from being treated as valid cache entries on subsequent builds.
Copilot AI added 2 commits June 15, 2026 02:14
…entry-for-http-plugin

# Conflicts:
#	common/reviews/api/rush-lib.api.md
#	libraries/rush-lib/src/api/ExperimentsConfiguration.ts
#	libraries/rush-lib/src/schemas/experiments.schema.json

Copilot AI commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@copilot - this branch has conflicts. Rebase or merge main into this branch and resolve conflicts.

Merged main and resolved three conflicts (experiments.schema.json, ExperimentsConfiguration.ts, rush-lib.api.md) — all from the new strictChangefileValidation experiment added on main. Also updated the HTTP plugin test file to use jest.mocked() instead of bare mocked() to match main's updated convention. All tests pass (14 HTTP, 51 S3, 10 Azure).

}
const response: IWebClientResponse = await this._makeSignedRequestAsync('GET', objectName);
return this._handleGetResponseAsync(
response.status,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - restructure the responses in these changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restructured _handleGetResponseAsync to accept the response object (IWebClientResponseBase) directly instead of individual status/statusText/ok fields. Also removed the getError callback — error messages are now constructed inline from the response object, making the call sites cleaner. Commit: the latest push.

body: entryStream,
warningText: 'Could not write cache entry',
// Streaming uploads cannot be retried because the stream is consumed
maxAttempts: 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - what can be done about this concern?

Copilot AI requested a review from iclanton June 15, 2026 03:22
cleanup?: () => void
): Promise<RetryableRequestResponse<T | undefined>> {
if (ok) {
if (response.ok) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - destructure the response variable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destructured response into { ok, status, statusText } at the top of the method. All 51 S3 tests pass.

Copilot AI requested a review from iclanton June 15, 2026 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Needs triage

Development

Successfully merging this pull request may close these issues.

5 participants