feat(storage): implement GCS overrun logging and resume prevention#16150
feat(storage): implement GCS overrun logging and resume prevention#16150v-pratap wants to merge 10 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces overrun logging and transcoding detection when reading objects from GCS, updating classes like ReadRange, AsyncReaderConnectionResume, and ObjectReadStreambuf to track received bytes and suppress warnings or checksum validation for transcoded data. The code review identified several critical issues: data races in ObjectDescriptorImpl::OnRead due to accessing metadata_ after unlocking and in AsyncReaderConnectionResume::Reconnect due to accessing impl_ without a lock; potential undefined behavior from signed integer overflow when negating p.start at its minimum value; duplicated overrun check logic in OverrunLoggingObjectReadSource; and a bug where casting a negative requested_length_ to std::size_t causes an unsigned wrap-around, bypassing the overrun check.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #16150 +/- ##
========================================
Coverage 92.20% 92.21%
========================================
Files 2264 2264
Lines 209092 209725 +633
========================================
+ Hits 192802 193405 +603
- Misses 16290 16320 +30 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Overall looks good - added some minor comments. |
This commit implements overrun logging, satisfied-resume prevention, and server-side transcoding bypass for both the synchronous and asynchronous GCS download paths. Synchronous Path: - Introduced a private 'OverrunLoggingObjectReadSource' decorator inside 'object_read_streambuf.cc' that wraps the transport source to track received bytes lock-free. - Logs a warning with the exact overshoot byte count, bucket, and object name if GCS sends more bytes than requested. - Leaves 'object_read_streambuf.h' and 'object_read_stream.h' completely clean, untouched, and thread-safe. Asynchronous Path (gRPC BiDi Stream & Streaming Reader): - Integrated chunk-level byte tracking inside 'ReadRange' and 'AsyncReaderConnectionResume'. - Under server-side transcoding (gzip), suppresses overrun warnings and bypasses checksum validation failures (since decompressed bytes naturally exceed the compressed request and mismatch the original checksums). - Implemented satisfied-resume prevention: if a connection fails but the requested number of bytes has already been received, we refuse to reconnect to prevent GCS from sending unwanted extra bytes, and return Status::OK directly, masking any subsequent stream errors. - Defined overloaded constructors to maintain 100% backward compatibility for all existing tests and client code. Testing: - Added 15 new, comprehensive unit tests covering all overrun, tail-read, transcoding, and resume scenarios.
- Fix data race on metadata_ in ObjectDescriptorImpl::OnRead by extracting is_transcoded status before unlocking the mutex. - Fix data race on impl_ in AsyncReaderConnectionResume::Reconnect by calling CurrentImpl() to safely acquire the lock. - Fix potential signed integer negation overflow undefined behavior in ObjectDescriptorImpl::Read by explicitly checking for numeric_limits min() and capping the limit. - De-duplicate overrun check logic in OverrunLoggingObjectReadSource by calling CheckOverrun() directly in Read(). - Fix potential signed-to-unsigned cast wrap-around bug in CheckOverrun() by checking that requested_length_ is non-negative before casting.
- Dynamically verify if server-side decompressive transcoding actually occurred by comparing total received bytes to the object's original stored size (when available). - Prevents silently ignoring real data corruption when a user explicitly requests compressed gzip-encoded bytes and receives a corrupted file. - Falls back to the standard is_transcoded flag if the object size is unknown (maintaining 100% backward compatibility and preventing test breakages). - Applied this robust check to both ReadRange (async BiDi range reads) and AsyncReaderConnectionResume (async streaming reader). - Added a new unit test 'TranscodingFailsOnRealChecksumMismatch' to verify that a checksum mismatch is correctly reported as an error if the received size matches the compressed object size. - Resolved a compiler signedness warning in read_range.cc by casting object_size to std::size_t. - Made the 'logged_warning_' flag in ObjectReadStreambuf a std::atomic<bool> to ensure absolute thread-safety.
- Refactored overrun warning logs to print once at the end of the download (stream Close or completion/failure) rather than immediately on the first overrunning chunk. - Aligns the C++ GCS client's logging behavior with the official Google Cloud Go client (cloud.google.com/go/storage). - Ensures that the warning log reports the true, final total overrun byte count instead of a partial overrun count from the first chunk. - Sync Path: Removed CheckOverrun from the hot Read() path in ObjectReadStreambuf, moving it to Close() and the decorator destructor (~OverrunLoggingObjectReadSource). - Async Range Path: Deferred CheckOverrun in ReadRange to the final range_end and OnFinish paths. - Async Resume Path: Deferred CheckOverrun in AsyncReaderConnectionResume to the final status.ok, satisfied, and stop paths. - Updated all unit tests in object_read_streambuf_test.cc, read_range_test.cc, and reader_connection_resume_test.cc to drive the streams to completion (EOF/Close) before verifying the logs.
- Converted the logged_warning_ check-and-set in ObjectReadStreambuf to a thread-safe atomic exchange (!logged_warning_.exchange(true)) to prevent any potential check-then-act race conditions. - Added explicit static_cast<std::size_t> casts to requested_length_ in both ReadRange and AsyncReaderConnectionResume overrun logging statements. This prevents signed/unsigned mixing arithmetic and conforms perfectly to strict compiler sign-comparison flags. - Documented and verified architectural correctness of total_received_bytes_ relative counting (essential for ranged reads starting at non-zero offsets) and the >0 read_limit check (mandatory for GCS proto3 default semantics).
…isfy GCB checkers
1e5b546 to
957d8a0
Compare
|
|
||
| if (data.range_end()) { | ||
| status_ = Status{}; | ||
| CheckOverrun(); |
There was a problem hiding this comment.
Should we do this if we have satisfied the requested length, even if range_end is false?
There was a problem hiding this comment.
I think yes, Once we get all the bytes the user asked for, the download is basically done. If we wait for the server to close the stream, any sudden network drop in the meantime would fail the download, even though we already have all the data. Completing immediately avoids this issue.
…satisfied in ReadRange
This commit implements overrun logging, satisfied-resume prevention, and server-side transcoding bypass for both the synchronous and asynchronous GCS download paths.
Synchronous Path:
Asynchronous Path (gRPC BiDi Stream & Streaming Reader):