Skip to content

Speed up unwrap() by resolving keys from the key map instead of items()#521

Open
tfoutrein wants to merge 1 commit into
python-poetry:masterfrom
AstekGroup:perf/unwrap-from-keymap
Open

Speed up unwrap() by resolving keys from the key map instead of items()#521
tfoutrein wants to merge 1 commit into
python-poetry:masterfrom
AstekGroup:perf/unwrap-from-keymap

Conversation

@tfoutrein

Copy link
Copy Markdown
Contributor

What

Container.unwrap() and Table.unwrap() (TOML → plain dict) iterated self.items() — the inherited MutableMapping view. That routes every key through __getitem__item(), which rebuilds a SingleKey from the bare string for each key (running the key-type detection genexpr + recomputing _original) only to look the value back up. The structured Key objects already sit in the container's _map next to their body index.

Resolve straight from _map instead:

for key, idx in self._map.items():
    if isinstance(idx, tuple):
        value = OutOfOrderTableProxy(self, idx)   # unchanged path
    else:
        value = self._body[idx][1]
    unwrapped[key.key] = value.unwrap() if hasattr(value, "unwrap") else value

_map iterates in the same insertion order as items(), so key order is preserved. Out-of-order keys (a tuple index) still go through OutOfOrderTableProxy, so their validation and fragment merge are unchanged — in particular a conflicting out-of-order fragment (a value later redefined as a table) still raises KeyAlreadyPresent from unwrap(), exactly as before. Table.unwrap() now delegates to the inner container's unwrap().

This is the conversion-path analogue of the native __contains__ work in #483: stop rebuilding keys that already exist.

Benchmark

unwrap() on a parsed document, median, interleaved A/B vs master:

document speedup
large flat keys ~12.7×
poetry.lock-like ~9.5×
nested tables ~8.6×
pyproject.toml-like ~3.1×

Parsing is untouched.

Tests

Full suite incl. the toml-test conformance submodule passes. No behaviour change — the output dict (keys, values and order) plus exception parity are verified by a differential over ~9k generated documents (out-of-order / conflicting / nested / arrays-of-tables / dotted / inline, with del + __setitem__ mutations). Added regression tests for out-of-order unwrap() and for the conflicting-fragment case that must still raise.

@tfoutrein tfoutrein force-pushed the perf/unwrap-from-keymap branch 2 times, most recently from d39fa3c to 2098ed5 Compare June 15, 2026 17:51
Comment thread tests/test_toml_document.py Outdated

def test_unwrap_raises_on_conflicting_out_of_order_fragment() -> None:
# `b` is first a value, then redefined as a table by an out-of-order
# fragment. The conflict is only detected when the out-of-order proxy is

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bug, right? the parsing should fail

do you have a fix lined up for this, or intend to open a bug report?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right — this is a genuine pre-existing bug, not something this PR introduces. An out-of-order value-vs-table redefinition like this is invalid TOML and should be rejected at parse(). tomlkit already does that for the in-order form ([a] / b = true / [a.b] raises KeyAlreadyPresent at parse); only the out-of-order continuation (split by an unrelated table) slips past parse and surfaces lazily when the OutOfOrderTableProxy is built. stdlib tomllib rejects the same input.

On scope: #521 is a pure performance change to unwrap() (it resolves keys from _map instead of iterating items()) and deliberately preserves current behaviour. The assertion you flagged is a regression guard — it proves the faster unwrap() still goes through the proxy and raises on this invalid doc rather than silently merging it into a corrupted dict. I've reworded it to clearly document the current (deferred) behaviour rather than bless it as correct.

I've opened #523 for the parse-time defect — it deserves its own report and is distinct from this perf PR. It also has a strictly-worse sibling: the dotted-key variant ([a] / b.c = 1 / [zz] / [a.b] / d = 2) is silently accepted and merged, which tomllib rejects. Happy to look at a parse-time fix separately, or coordinate with #505/#507 since they share the proxy construction path.

Container.unwrap() and Table.unwrap() iterated self.items() (the inherited
MutableMapping view), which routes every key through __getitem__ -> item(),
rebuilding a SingleKey from the bare string on each key only to look the
value back up. The structured Key objects already sit in the container's
_map alongside their body index.

Resolve straight from _map instead: O(1) per in-order key, no SingleKey
rebuild. Out-of-order keys (a tuple index) still go through
OutOfOrderTableProxy, so their validation (and fragment merge) is unchanged
-- in particular a conflicting out-of-order fragment still raises
KeyAlreadyPresent from unwrap() exactly as before. Table.unwrap() delegates
to the inner container's unwrap().

No behaviour change: full suite incl. toml-test conformance passes; the
output dict (keys, values AND order) plus exception parity are verified by a
differential over ~9k generated documents (out-of-order / conflicting /
nested / AoT / dotted / inline, with del+setitem mutations). ~9-13x faster
unwrap() on key-heavy documents.
@tfoutrein tfoutrein force-pushed the perf/unwrap-from-keymap branch from 2098ed5 to b214c44 Compare June 15, 2026 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants