Skip to content

gh-151308: Avoid huge pre-allocation in wave.readframes() for crafted files#151488

Open
iamsharduld wants to merge 1 commit into
python:mainfrom
iamsharduld:gh-151308-wave-readframes
Open

gh-151308: Avoid huge pre-allocation in wave.readframes() for crafted files#151488
iamsharduld wants to merge 1 commit into
python:mainfrom
iamsharduld:gh-151308-wave-readframes

Conversation

@iamsharduld

Copy link
Copy Markdown
Contributor

A WAV data chunk records its size in a 4-byte header field that is not
validated against the data actually present in the file. A small,
truncated, or maliciously crafted file can therefore claim a chunk of
several gigabytes and make wave.Wave_read.readframes() pre-allocate that
much memory via a single file.read(chunksize) call, leading to a
MemoryError (or memory exhaustion) from a tiny input.

When the underlying file is seekable, this clamps each read in the internal
_Chunk.read() to the number of bytes physically available, so we never
allocate more than the file can actually provide. The data returned for
valid files is unchanged.

Only the raw file object is probed, never a parent _Chunk, so the probe
can't seek to an untrusted chunk size (which would overflow on 32-bit
platforms such as WASI). Non-seekable streams retain the previous
behaviour, since their size can't be probed without buffering; the
realistic attack vector is a .wav file on disk, which is fully covered.

…rafted files

A WAV data chunk records its size in a 4-byte header field that is not
validated against the data actually present in the file.  A small,
truncated, or maliciously crafted file could therefore claim a chunk of
several gigabytes and make wave.Wave_read.readframes() pre-allocate that
much memory via a single file.read(chunksize) call, leading to a
MemoryError (or memory exhaustion) from a tiny input.

When the underlying file is seekable, clamp each read in the internal
_Chunk.read() to the number of bytes physically available, so we never
allocate more than the file can actually provide.  The data returned for
valid files is unchanged.
@iamsharduld

Copy link
Copy Markdown
Contributor Author

The only red check, Docs / Check for removed HTML IDs, fails in the Find merge base step with fatal: shallow file has changed since we read it — the known shallow-clone CI flake from #151365 (git 2.54.0 regression), unrelated to this PR: it changes no docs, and the doc-diff steps were all skipped. Could a maintainer re-run that job? All other checks pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant