GH-38849: [C++][Parquet] Add support for list view and large list view #50160

Open
HuaHuaY wants to merge 5 commits into apache:main from HuaHuaY:list_view_type

@HuaHuaY HuaHuaY commented Jun 12, 2026

Contributor

Rationale for this change

This change lets us write an Arrow list view array to a Parquet file and read the list view type back from Parquet.

What changes are included in this PR?

The Parquet writer sorts the list views by their offsets before writing the array, and the Parquet reader generates a sizes array when reading list view data.
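
For readers unfamiliar with the list view layout, here is a minimal sketch of the two steps described above, written against plain `std::vector` rather than the Arrow API. It is not the code in this PR: `SortedViewOrder` and `SizesFromOffsets` are hypothetical helper names, and the sketch assumes 32-bit offsets and that "sorting by offsets" means computing the order in which views appear in the child array.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the PR): returns the indices of the list
// views ordered by ascending offset. Views with equal offsets keep their
// original relative order because the sort is stable.
std::vector<std::size_t> SortedViewOrder(const std::vector<int32_t>& offsets) {
  std::vector<std::size_t> order(offsets.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&offsets](std::size_t a, std::size_t b) {
                     return offsets[a] < offsets[b];
                   });
  return order;
}

// Hypothetical helper (not from the PR): derives a sizes array from
// list-style offsets, which hold one more entry than there are lists.
std::vector<int32_t> SizesFromOffsets(const std::vector<int32_t>& list_offsets) {
  std::vector<int32_t> sizes;
  if (list_offsets.empty()) return sizes;
  sizes.reserve(list_offsets.size() - 1);
  for (std::size_t i = 0; i + 1 < list_offsets.size(); ++i) {
    sizes.push_back(list_offsets[i + 1] - list_offsets[i]);
  }
  return sizes;
}
```

For example, view offsets `{4, 0, 2}` give the order `{1, 2, 0}`, and list offsets `{0, 2, 2, 5}` give sizes `{2, 0, 3}`.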

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@wgtmac wgtmac left a comment

Member

Generally looks good to me. I've left some comments. cc @mapleFU @pitrou

Comment thread cpp/src/parquet/arrow/reader.cc
Comment thread cpp/src/parquet/arrow/reader.cc Outdated
Comment thread cpp/src/parquet/arrow/writer.cc
Comment thread cpp/src/parquet/arrow/arrow_reader_writer_test.cc
}

TEST(ArrowReadWrite, LargeListView) {
auto values = ArrayFromJSON(::arrow::int32(), "[1, 2, 3, 4, 5]");

Member

Can we fold these into a single test parameterized on schema and table to reduce the duplicated lines? I don't remember whether RecordBatchFromJSON supports list view natively.

@HuaHuaY HuaHuaY Jun 15, 2026

Contributor Author

I don't think there's much duplicate code here, and the two tests are not easy to merge: without store_schema enabled, the large list type requires manually changing the inner array name from item to element.

@github-actions github-actions Bot added the "awaiting committer review" label and removed the "awaiting review" label Jun 15, 2026
Comment thread cpp/src/parquet/properties.h Outdated