gh-151307: Bound zipfile reads for forged compressed sizes #151509
rohitjavvadi wants to merge 1 commit into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9067b6f30d
    try:
        fileobj.seek(0, os.SEEK_END)
        self._compress_end = fileobj.tell()
    finally:
        fileobj.seek(self._orig_compress_start)
Avoid seeking arbitrary streams to EOF for every member
When the ZIP is backed by another seekable file-like object whose seek() is not constant-time, this EOF probe makes every ZipFile.open() scan the whole backing stream. A concrete supported case is ZipFile over a deflated ZipExtFile: ZipExtFile.seek(0, SEEK_END) reaches the end by repeatedly reading/decompressing, so opening each member of a nested ZIP becomes O(size of the outer member) before any member data is read, causing a large regression for nested archives with many entries.
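The cost described in this comment can be observed with a small sketch. It assumes a hypothetical `CountingBytesIO` helper (not part of `zipfile`) that counts the bytes read from the backing buffer, and it only shows the underlying effect on a single deflated member rather than a full nested archive: seeking a deflated `ZipExtFile` to its end pulls compressed data from the backing file instead of jumping there directly.

```python
import io
import os
import zipfile


class CountingBytesIO(io.BytesIO):
    """Hypothetical helper: a BytesIO that counts bytes returned by read()."""

    def __init__(self, *args):
        super().__init__(*args)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


def bytes_read_by_eof_seek(payload):
    """Return (backing bytes read, final position, member size) for an EOF seek."""
    backing = CountingBytesIO()
    # Build a one-member deflated archive in memory.
    with zipfile.ZipFile(backing, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("member.bin", payload)
    with zipfile.ZipFile(backing) as zf:
        info = zf.getinfo("member.bin")
        with zf.open(info) as member:
            # Ignore reads made while parsing the directory and local header.
            backing.bytes_read = 0
            member.seek(0, os.SEEK_END)
            return backing.bytes_read, member.tell(), info.file_size


read_count, position, size = bytes_read_by_eof_seek(os.urandom(64 * 1024))
print(read_count, position, size)
```

If a `ZipFile` were layered on top of such a member, an EOF probe on every `open()` would repeat this work once per entry, which is the regression the comment warns about.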
930fc93 to 26535f5
26535f5 to 6531eb7
The forged ZIP from gh-151307 can make ZipExtFile._read2() pass a central-directory-controlled compressed size directly to the underlying file object's read(n). In the local reproducer, a 160-byte archive made the unpatched code call read(2147483647) twice before failing with EOFError.

This keeps the existing overlap warning behavior for duplicate-name entries, but bounds the actual low-level read request:
After the change, the same 160-byte archive still fails as truncated, but the largest underlying read request is 125 bytes and there are no oversized reads.
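The general idea of bounding a read request can be sketched as follows. This is an illustration only, not the code in this PR: `RecordingFile` and `bounded_read` are hypothetical names, the `end` offset is assumed to be known already, and the 160-byte buffer and offset 40 are arbitrary values chosen for the example.

```python
import io


class RecordingFile(io.BytesIO):
    """Hypothetical helper: a BytesIO that records every requested read size."""

    def __init__(self, *args):
        super().__init__(*args)
        self.requests = []

    def read(self, size=-1):
        self.requests.append(size)
        return super().read(size)


def bounded_read(fileobj, requested, end):
    """Read at most `requested` bytes, never asking for more than remain before `end`.

    `end` is assumed to be the known end offset of the backing data.
    """
    remaining = max(end - fileobj.tell(), 0)
    return fileobj.read(min(requested, remaining))


backing = RecordingFile(bytes(160))
backing.seek(40)
data = bounded_read(backing, 2**31 - 1, end=160)
print(len(data), backing.requests)  # prints: 120 [120]
```

The forged size of 2147483647 never reaches the backing object; only the clamped request does, so a truncated archive can still be detected without an oversized read.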
Fixes gh-151307.
Testing
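- Oversized read requests in the reproducer before the change: [2147483647, 2147483647]
- Oversized read requests in the reproducer after the change: []
- ./python.exe -m test test_zipfile -m test_forged_compress_size_read_is_bounded -v
- ./python.exe -m test test_zipfile -v
- git diff --check
- make patchcheck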