GH-48210: [C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on s390x by Vishwanatha-HD · Pull Request #48211 · apache/arrow

Vishwanatha-HD · 2025-11-21T15:38:18Z

Rationale for this change

This PR enables Parquet DB support on big-endian (s390x) systems by fixing the Bloom filter logic.

What changes are included in this PR?

The fix changes the following file:
cpp/src/parquet/bloom_filter.cc

Are these changes tested?

Yes. The changes were tested on s390x to confirm correct behavior, and on x86 to confirm that no regression is introduced.

Are there any user-facing changes?

No

GitHub main Issue link: #48151

GitHub Issue: [C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on Big-Endian (s390x) systems #48210

github-actions · 2025-11-21T15:38:48Z

⚠️ GitHub issue #48210 has been automatically assigned in GitHub to PR creator.

k8ika0s · 2025-11-23T22:20:42Z

@Vishwanatha-HD

Bloom filters are one of those parts of Parquet where tiny byte-order details end up mattering way more than you’d expect, so it’s good to see attention landing here.

Something I ran into on s390x is that the xxhash input/output tends to stay a lot more predictable if the bitset words are kept in a single canonical order (LE in our case) and the reader/writer treat them as such. In my own experiments I normalized the bitset once at the boundary and let the rest of the logic operate on native values.

In this patch, the per-word FromLittleEndian/ToLittleEndian inside the find/insert loops definitely keeps things correct, though it does create a slightly tighter coupling between the hashing logic and the byte-swapping. I only mention it because it can sometimes show up in profiling when bloom filters are exercised heavily over wide row groups.

Not calling this out as a problem — the behavior you’re targeting here lines up with what I’ve seen on s390x, especially around making sure the mask checks behave the same across BE/LE hosts. Just sharing observations in case it’s useful while these pieces get polished.
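The "normalize once at the boundary" idea described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PR or from Arrow: `HostIsLittleEndian`, `ByteSwap32`, `BitsetFromLittleEndianBytes`, and `BitsetToLittleEndianBytes` are hypothetical names, and the bitset is assumed to be a plain byte buffer holding 32-bit little-endian words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: detect host byte order at runtime.
inline bool HostIsLittleEndian() {
  const uint16_t probe = 1;
  uint8_t first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Hypothetical helper: reverse the four bytes of a 32-bit word.
inline uint32_t ByteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

// On load: convert the serialized little-endian bytes to native words once,
// so the find/insert loops can work on native values without swapping.
std::vector<uint32_t> BitsetFromLittleEndianBytes(const std::vector<uint8_t>& bytes) {
  assert(bytes.size() % sizeof(uint32_t) == 0);
  std::vector<uint32_t> words(bytes.size() / sizeof(uint32_t));
  if (!words.empty()) std::memcpy(words.data(), bytes.data(), bytes.size());
  if (!HostIsLittleEndian()) {
    for (uint32_t& w : words) w = ByteSwap32(w);
  }
  return words;
}

// On store: convert the native words back to little-endian bytes once.
std::vector<uint8_t> BitsetToLittleEndianBytes(std::vector<uint32_t> words) {
  if (!HostIsLittleEndian()) {
    for (uint32_t& w : words) w = ByteSwap32(w);
  }
  std::vector<uint8_t> bytes(words.size() * sizeof(uint32_t));
  if (!bytes.empty()) std::memcpy(bytes.data(), words.data(), bytes.size());
  return bytes;
}
```

The trade-off is a one-time pass (and, in this sketch, a copy) over the bitset at read and write time in exchange for swap-free inner loops. Whether that pays off would depend on how heavily the filter is probed.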

Vishwanatha-HD · 2026-06-03T12:19:58Z

@pitrou @kou @kiszk @zanmato1984
Hi all,
I know it has been a long time since I opened my PRs. Could you please help with reviewing and merging them upstream? If there is anything more you would like me to do, I would be happy to work on it. Please advise.
I have verified that with my fix all 113 test cases pass without any issues.
Thanks.

kou · 2026-06-03T21:41:40Z

Could you rebase on main?

Copilot

Pull request overview

This pull request fixes Parquet Bloom Filter bitset access so Bloom filters are interpreted/stored in little-endian word order, enabling correct behavior on big-endian architectures (notably s390x) while preserving existing behavior on little-endian systems.

Changes:

Read Bloom filter bitset words using arrow::bit_util::FromLittleEndian() during lookups.
Write Bloom filter bitset words using a read/modify/write cycle with FromLittleEndian() / ToLittleEndian() during inserts.
Add the required Arrow endian utilities include.
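As a rough illustration of the per-word approach summarized above, the following self-contained sketch reads and writes each 32-bit word of a block in little-endian order. It does not use Arrow: `LoadLE32` and `StoreLE32` are hypothetical stand-ins for `arrow::bit_util::FromLittleEndian()` / `ToLittleEndian()`, and `kWordsPerBlock`, `InsertMasks`, and `FindMasks` are names assumed for this example, not the identifiers in `bloom_filter.cc`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the Arrow endian helpers: decode/encode a 32-bit
// word byte by byte, so the result is identical on LE and BE hosts.
inline uint32_t LoadLE32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

inline void StoreLE32(uint8_t* p, uint32_t v) {
  p[0] = static_cast<uint8_t>(v);
  p[1] = static_cast<uint8_t>(v >> 8);
  p[2] = static_cast<uint8_t>(v >> 16);
  p[3] = static_cast<uint8_t>(v >> 24);
}

// A split-block Bloom filter block holds eight 32-bit words.
constexpr int kWordsPerBlock = 8;
using BlockMasks = std::array<uint32_t, kWordsPerBlock>;

// Insert: read/modify/write every word of the block in little-endian order.
void InsertMasks(std::vector<uint8_t>& bitset, size_t block_index,
                 const BlockMasks& masks) {
  assert((block_index + 1) * kWordsPerBlock * 4 <= bitset.size());
  for (int i = 0; i < kWordsPerBlock; ++i) {
    uint8_t* word = bitset.data() + (block_index * kWordsPerBlock + i) * 4;
    StoreLE32(word, LoadLE32(word) | masks[i]);
  }
}

// Lookup: every mask bit must be present in the corresponding word.
bool FindMasks(const std::vector<uint8_t>& bitset, size_t block_index,
               const BlockMasks& masks) {
  assert((block_index + 1) * kWordsPerBlock * 4 <= bitset.size());
  for (int i = 0; i < kWordsPerBlock; ++i) {
    const uint8_t* word = bitset.data() + (block_index * kWordsPerBlock + i) * 4;
    if ((LoadLE32(word) & masks[i]) == 0) return false;
  }
  return true;
}
```

Because the words are decoded from bytes on every access, the serialized bitset stays byte-identical across hosts, at the cost of a conversion per word on big-endian machines.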


+  uint32_t* raw_bitset32 = reinterpret_cast<uint32_t*>(data_->mutable_data());

  for (int i = 0; i < kBitsSetPerBlock; i++) {
+    const int word_index = bucket_index * kBitsSetPerBlock + i;



Vishwanatha-HD · 2026-06-15T18:20:22Z

Could you rebase on main?

Hi @kou,
I have rebased this PR onto main. Please check. Thanks.

Vishwanatha-HD requested a review from wgtmac as a code owner November 21, 2025 15:38

Vishwanatha-HD mentioned this pull request Nov 21, 2025

[C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on Big-Endian (s390x) systems #48210

Open

github-actions Bot added the Component: Parquet, Component: C++, and awaiting review labels Nov 21, 2025

k8ika0s mentioned this pull request Nov 21, 2025

GH-48213: [C++][Parquet] Fix endianness and test failures on s390x (big-endian) (supersedes partial fixes) #48212

Closed

Vishwanatha-HD mentioned this pull request Nov 21, 2025

[C++][Parquet] Enable Parquet DB support on Big Endian (IBM Z) systems #48151

Open

Vishwanatha-HD force-pushed the fixBloomFilters branch from 6f6199c to 7dbe358 Compare November 22, 2025 05:03

kou changed the title ~~GH-48210 Fix Bloom Filter logic to enable Parquet DB support on s390x~~ GH-48210: [C++][Parquet] Fix Bloom Filter logic to enable Parquet DB support on s390x Nov 22, 2025

Vishwanatha-HD force-pushed the fixBloomFilters branch from 7dbe358 to 70dd0c1 Compare November 29, 2025 13:15

kou requested a review from Copilot June 3, 2026 21:41

Copilot started reviewing on behalf of kou June 3, 2026 21:41

Copilot AI reviewed Jun 3, 2026


Comment thread cpp/src/parquet/bloom_filter.cc


apacheGH-48210 Fix Bloom Filter logic to enable Parquet DB support on…

41e672e

… s390x

Vishwanatha-HD force-pushed the fixBloomFilters branch from 70dd0c1 to 41e672e Compare June 15, 2026 18:19

Vishwanatha-HD requested a review from pitrou as a code owner June 15, 2026 18:19

Vishwanatha-HD wants to merge 1 commit into apache:main from Vishwanatha-HD:fixBloomFilters
