
gh-149079: Optimize sorting in unicodedata.normalize() #150782

Merged
encukou merged 1 commit into python:main from serhiy-storchaka:unicodedata-normalize-optimize on Jun 15, 2026

serhiy-storchaka (Member) commented Jun 2, 2026

Sort the Py_UCS4 buffer instead of the PyUnicodeObject. This avoids having to use PyUnicode_READ() and PyUnicode_WRITE().

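For context, the step being sorted here is canonical ordering: within each run of combining marks (characters whose canonical combining class is nonzero), the marks are stably sorted by combining class, while starters (class 0) never move. The pure-Python sketch below shows that reordering on a plain list of code points, which stands in for the Py_UCS4 buffer. It is an illustration under that assumption, not the C code from this PR; `canonical_order` and `_ccc` are hypothetical names, and the class lookup uses the standard `unicodedata.combining()` function.

```python
import unicodedata


def _ccc(cp):
    """Canonical combining class of a code point (0 for starters)."""
    return unicodedata.combining(chr(cp))


def canonical_order(buf):
    """Hypothetical helper: reorder combining marks in a code-point list, in place.

    Starters (class 0) stay where they are; each maximal run of non-starters
    is sorted by combining class. sorted() is stable, so marks with equal
    classes keep their relative order, as canonical ordering requires.
    """
    n = len(buf)
    i = 0
    while i < n:
        if _ccc(buf[i]) == 0:
            i += 1
            continue
        # Find the end of this run of non-starters.
        j = i
        while j < n and _ccc(buf[j]) != 0:
            j += 1
        buf[i:j] = sorted(buf[i:j], key=_ccc)
        i = j
    return buf


s = "a" + "\u0300\u0327" * 2
buf = [ord(ch) for ch in s]  # plain code-point list, playing the role of Py_UCS4[]
canonical_order(buf)
print([hex(cp) for cp in buf])  # U+0327 (class 202) now precedes U+0300 (class 230)
```

In the benchmarks in this thread, every run pairs U+0300 (class 230) followed by U+0327 (class 202), so each run is out of order and the sort does real work on every call. In C, working on a Py_UCS4 array means each comparison and move is a direct array access rather than a PyUnicode_READ()/PyUnicode_WRITE() call on the string object, which is the motivation the description gives.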
serhiy-storchaka (Member, Author) commented Jun 2, 2026

./python -m timeit 's=("a"+"\u0300\u0327"*1000)*100; from unicodedata import normalize' -- 'normalize("NFC", s)'

Baseline: 100 loops, best of 5: 3.76 msec per loop
This PR: 100 loops, best of 5: 3.57 msec per loop

./python -m timeit 's=("a"+"\u0300\u0327"*9)*10000; from unicodedata import normalize' -- 'normalize("NFC", s)'

Baseline: 100 loops, best of 5: 3.99 msec per loop
This PR: 100 loops, best of 5: 3.84 msec per loop
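To put the two timing pairs above on a common scale, the relative reduction in per-loop time can be computed directly from the reported best-of-5 figures. This is plain arithmetic on the numbers in the comment, not a new measurement; `time_reduction` is a hypothetical helper name.

```python
def time_reduction(baseline_ms, new_ms):
    """Percentage of per-loop time saved relative to the baseline."""
    return (baseline_ms - new_ms) / baseline_ms * 100


# Best-of-5 msec per loop, copied from the two timeit runs above.
runs = {
    "100 runs of 2000 marks": (3.76, 3.57),
    "10000 runs of 18 marks": (3.99, 3.84),
}
for label, (baseline, pr) in runs.items():
    print(f"{label}: {time_reduction(baseline, pr):.1f}% less time")
```

That works out to roughly 5% and 4% for these two runs; the review later in the thread reports its own 10% to 20% figure from a separate run.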

eendebakpt (Contributor) commented

@serhiy-storchaka On your benchmark, I can get from 3.75 ms down to 2.0 ms by using a more efficient search in find_nfc_index. See main...eendebakpt:gh-149079-find-nfc-index. The changes in this PR look good at first sight.

serhiy-storchaka (Member, Author) commented

Quoting eendebakpt: "by using a more efficient search in find_nfc_index."

Looks interesting. But this is a different issue, not directly related to #149079. Can you open a new issue?

eendebakpt (Contributor) commented

Quoting serhiy-storchaka: "Looks interesting. But this is a different issue, not directly related to #149079. Can you open a new issue?"

Done in #150889

eendebakpt (Contributor) left a comment

I benchmarked the PR (rebased onto main first) and found a 10% to 20% speedup on the benchmarks.

encukou merged commit 9074876 into python:main on Jun 15, 2026 (57 checks passed)
encukou (Member) commented Jun 15, 2026

Thank you!
