Speed up unwrap() by resolving keys from the key map instead of items()#521
Speed up unwrap() by resolving keys from the key map instead of items()#521tfoutrein wants to merge 1 commit into
Conversation
d39fa3c to
2098ed5
Compare
|
|
||
| def test_unwrap_raises_on_conflicting_out_of_order_fragment() -> None: | ||
| # `b` is first a value, then redefined as a table by an out-of-order | ||
| # fragment. The conflict is only detected when the out-of-order proxy is |
There was a problem hiding this comment.
this is a bug, right? the parsing should fail
do you have a fix lined up for this, or intend to open a bug report?
There was a problem hiding this comment.
You're right — this is a genuine pre-existing bug, not something this PR introduces. An out-of-order value-vs-table redefinition like this is invalid TOML and should be rejected at parse(). tomlkit already does that for the in-order form ([a] / b = true / [a.b] raises KeyAlreadyPresent at parse); only the out-of-order continuation (split by an unrelated table) slips past parse and surfaces lazily when the OutOfOrderTableProxy is built. stdlib tomllib rejects the same input.
On scope: #521 is a pure performance change to unwrap() (it resolves keys from _map instead of iterating items()) and deliberately preserves current behaviour. The assertion you flagged is a regression guard — it proves the faster unwrap() still goes through the proxy and raises on this invalid doc rather than silently merging it into a corrupted dict. I've reworded it to clearly document the current (deferred) behaviour rather than bless it as correct.
I've opened #523 for the parse-time defect — it deserves its own report and is distinct from this perf PR. It also has a strictly-worse sibling: the dotted-key variant ([a] / b.c = 1 / [zz] / [a.b] / d = 2) is silently accepted and merged, which tomllib rejects. Happy to look at a parse-time fix separately, or coordinate with #505/#507 since they share the proxy construction path.
Container.unwrap() and Table.unwrap() iterated self.items() (the inherited MutableMapping view), which routes every key through __getitem__ -> item(), rebuilding a SingleKey from the bare string on each key only to look the value back up. The structured Key objects already sit in the container's _map alongside their body index. Resolve straight from _map instead: O(1) per in-order key, no SingleKey rebuild. Out-of-order keys (a tuple index) still go through OutOfOrderTableProxy, so their validation (and fragment merge) is unchanged -- in particular a conflicting out-of-order fragment still raises KeyAlreadyPresent from unwrap() exactly as before. Table.unwrap() delegates to the inner container's unwrap(). No behaviour change: full suite incl. toml-test conformance passes; the output dict (keys, values AND order) plus exception parity are verified by a differential over ~9k generated documents (out-of-order / conflicting / nested / AoT / dotted / inline, with del+setitem mutations). ~9-13x faster unwrap() on key-heavy documents.
2098ed5 to
b214c44
Compare
What
Container.unwrap()andTable.unwrap()(TOML → plaindict) iteratedself.items()— the inheritedMutableMappingview. That routes every key through__getitem__→item(), which rebuilds aSingleKeyfrom the bare string for each key (running the key-type detection genexpr + recomputing_original) only to look the value back up. The structuredKeyobjects already sit in the container's_mapnext to their body index.Resolve straight from
_mapinstead:_mapiterates in the same insertion order asitems(), so key order is preserved. Out-of-order keys (a tuple index) still go throughOutOfOrderTableProxy, so their validation and fragment merge are unchanged — in particular a conflicting out-of-order fragment (a value later redefined as a table) still raisesKeyAlreadyPresentfromunwrap(), exactly as before.Table.unwrap()now delegates to the inner container'sunwrap().This is the conversion-path analogue of the native
__contains__work in #483: stop rebuilding keys that already exist.Benchmark
unwrap()on a parsed document, median, interleaved A/B vsmaster:poetry.lock-likepyproject.toml-likeParsing is untouched.
Tests
Full suite incl. the toml-test conformance submodule passes. No behaviour change — the output dict (keys, values and order) plus exception parity are verified by a differential over ~9k generated documents (out-of-order / conflicting / nested / arrays-of-tables / dotted / inline, with
del+__setitem__mutations). Added regression tests for out-of-orderunwrap()and for the conflicting-fragment case that must still raise.