[SPARK-57463][SQL] Render nanosecond-precision timestamp types in the Thrift server via the Types Framework #56519

Open
MaxGekk wants to merge 1 commit into apache:master from MaxGekk:nanos-thriftserver

MaxGekk commented Jun 15, 2026

What changes were proposed in this pull request?

This PR implements the Types Framework thriftTypeName hook for the nanosecond-capable timestamp types TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p) (p in 7-9), so they are usable over the Spark Thrift / JDBC server, reaching parity with the microsecond TimestampType / TimestampNTZType.

SparkExecuteStatementOperation resolves a column's Thrift TTypeId via TypeApiOps(typ).flatMap(_.thriftTypeName). TimestampNanosTypeApiOps did not override TypeApiOps.thriftTypeName (it defaulted to None), and the nanos types are not in the toTTypeIdDefault fallback, so a nanos column hit case other => throw new IllegalArgumentException("Unrecognized type name: ...").

The change overrides thriftTypeName in the abstract base TimestampNanosTypeApiOps (inherited by both the NTZ and LTZ subclasses) to return Some("STRING_TYPE"), mirroring the reference TimeTypeApiOps:

override def thriftTypeName: Option[String] = Some("STRING_TYPE")

STRING_TYPE is the correct mapping because RowSetUtils already serializes these values as a string column (TStringColumn), rendered at the column precision by HiveResult.toHiveString (consistent with cast-to-string). No changes were needed in SparkExecuteStatementOperation or RowSetUtils.
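The resolution path described above can be modeled in a few lines. This is a minimal sketch, not the Spark source: `TypeApiOps`, `thriftTypeName`, and `TimestampNTZNanosType` are names taken from this PR, while the `lookup` parameter, the `fallback` table, and the `NanosOpsBefore` / `NanosOpsAfter` objects are hypothetical stand-ins used only to show why a `None` hook ends in `IllegalArgumentException` and why the override avoids it.

```scala
object ThriftTypeSketch {
  // Simplified stand-ins for Spark data types (assumption: the real classes carry more state).
  sealed trait DataType { def typeName: String }
  case object StringType extends DataType { val typeName = "string" }
  final case class TimestampNTZNanosType(precision: Int) extends DataType {
    val typeName: String = s"timestamp_ntz($precision)"
  }

  // The framework hook: None means "no Thrift mapping provided by the framework".
  trait TypeApiOps { def thriftTypeName: Option[String] = None }

  // Before the change: the ops object inherits the default None.
  object NanosOpsBefore extends TypeApiOps

  // After the change: the hook is overridden, as in the PR.
  object NanosOpsAfter extends TypeApiOps {
    override def thriftTypeName: Option[String] = Some("STRING_TYPE")
  }

  // Hypothetical fallback for types the framework does not map.
  private def fallback(typ: DataType): String = typ match {
    case StringType => "STRING_TYPE"
    case other =>
      throw new IllegalArgumentException(s"Unrecognized type name: ${other.typeName}")
  }

  // Ask the framework first, then fall back; `lookup` is a hypothetical registry.
  def resolve(typ: DataType, lookup: DataType => Option[TypeApiOps]): String =
    lookup(typ).flatMap(_.thriftTypeName).getOrElse(fallback(typ))
}
```

With `NanosOpsBefore` the lookup yields `None`, so the nanos type falls through to the fallback and throws; with `NanosOpsAfter` the hook short-circuits the fallback entirely.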

Note: related Hive CLI rendering through the framework is tracked separately by SPARK-57386.

Why are the changes needed?

Without this change, nanosecond-precision timestamps cannot be retrieved via the Hive Thrift server. With the preview flag enabled, such a query fails:

0: jdbc:hive2://localhost:10000/default> SET spark.sql.timestampNanosTypes.enabled=true;
0: jdbc:hive2://localhost:10000/default> SELECT timestamp_ntz'2021-01-01 01:02:03.000000001';
Error: java.lang.IllegalArgumentException: Unrecognized type name: timestamp_ntz(9) (state=,code=0)

This is analogous to the ANSI-interval issue fixed by SPARK-35017 (Unrecognized type name: day-time interval).

Does this PR introduce any user-facing change?

Yes. After the changes, nanosecond timestamp columns are returned over JDBC as strings rendered at the column precision (the nanos types are a preview feature gated by spark.sql.timestampNanosTypes.enabled):

0: jdbc:hive2://localhost:10000/default> SET spark.sql.timestampNanosTypes.enabled=true;
0: jdbc:hive2://localhost:10000/default> SELECT timestamp_ntz'2021-01-01 01:02:03.000000001' AS ntz9;
+--------------------------------+
|              ntz9              |
+--------------------------------+
| 2021-01-01 01:02:03.000000001  |
+--------------------------------+
0: jdbc:hive2://localhost:10000/default> SELECT timestamp_ltz'2021-01-01 01:02:03.123456789' AS ltz9;
+--------------------------------+
|              ltz9              |
+--------------------------------+
| 2021-01-01 01:02:03.123456789  |
+--------------------------------+
0: jdbc:hive2://localhost:10000/default> SELECT CAST('2021-01-01 01:02:03.123456789' AS TIMESTAMP_NTZ(7)) AS ntz7;
+------------------------------+
|             ntz7             |
+------------------------------+
| 2021-01-01 01:02:03.1234567  |
+------------------------------+
0: jdbc:hive2://localhost:10000/default> SELECT CAST('2021-01-01 01:02:03.123456789' AS TIMESTAMP_LTZ(8)) AS ltz8;
+-------------------------------+
|             ltz8              |
+-------------------------------+
| 2021-01-01 01:02:03.12345678  |
+-------------------------------+
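Because the values arrive as plain strings, a client has to parse them itself if it needs a temporal value. The sketch below illustrates the string shape seen in the transcripts above using only `java.time`; it is not `HiveResult.toHiveString`. The `render` and `parse` helpers are hypothetical, and `render` assumes exactly `p` fractional digits with extra digits truncated, which matches the transcripts but is not confirmed for values with trailing zeros.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object NanosRenderingSketch {
  private val secondsFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Render with exactly `precision` fractional digits (7 to 9), truncating the rest.
  // Assumption: fixed-width output, no trimming of trailing zeros.
  def render(ts: LocalDateTime, precision: Int): String = {
    require(precision >= 7 && precision <= 9, s"precision must be in 7-9, got $precision")
    val nanos = f"${ts.getNano}%09d" // zero-padded 9-digit nanosecond field
    s"${ts.format(secondsFmt)}.${nanos.take(precision)}"
  }

  // Client side: turn the JDBC string back into a LocalDateTime.
  // ISO_LOCAL_DATE_TIME accepts 0 to 9 fractional digits once the space becomes 'T'.
  def parse(s: String): LocalDateTime = LocalDateTime.parse(s.replace(' ', 'T'))
}
```

For example, `NanosRenderingSketch.parse("2021-01-01 01:02:03.1234567").getNano` gives `123456700`, so no precision is lost on the client even though the wire type is a string.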

With the flag off (the production default), a nanos literal continues to degrade to a microsecond timestamp, unchanged by this PR:

0: jdbc:hive2://localhost:10000/default> SELECT timestamp_ntz'2021-01-01 01:02:03.000000001' AS ntz;
+------------------------+
|          ntz           |
+------------------------+
| 2021-01-01 01:02:03.0  |
+------------------------+

How was this patch tested?

  1. New tests under sql/hive-thriftserver:
    • SparkExecuteStatementOperationSuite: asserts toTTableSchema maps TimestampNTZNanosType(p) / TimestampLTZNanosType(p) (p in 7-9) to TTypeId.STRING_TYPE.
    • HiveThriftBinaryServerSuite: an end-to-end JDBC test that enables the flag, queries NTZ/LTZ at precisions 7-9, and asserts both the column metadata (VARCHAR / "string") and the rendered fractional digits.
$ build/sbt -Phive -Phive-thriftserver "hive-thriftserver/testOnly *SparkExecuteStatementOperationSuite"
$ build/sbt -Phive -Phive-thriftserver "hive-thriftserver/testOnly *HiveThriftBinaryServerSuite -- -z nanosecond"
  2. Manually verified end-to-end against a running Thrift server with beeline (the before/after transcripts shown above).
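The end-to-end check described in the list above can be approximated from any JDBC client. The following is a rough sketch rather than the code in `HiveThriftBinaryServerSuite`: the `NanosJdbcCheck` object, the `Case` class, and the `run` helper are hypothetical, and `run` assumes a reachable Thrift server plus a Hive JDBC driver on the classpath. Only standard `java.sql` calls are used, and the expectation that the column reports `java.sql.Types.VARCHAR` follows the metadata assertion mentioned in this PR.

```scala
import java.sql.{DriverManager, Types}

object NanosJdbcCheck {
  final case class Case(query: String, expectedFraction: String)

  // One case per flavor (NTZ, LTZ) and precision (7 to 9).
  def cases: Seq[Case] =
    for {
      flavor <- Seq("NTZ", "LTZ")
      p <- 7 to 9
    } yield Case(
      s"SELECT CAST('2021-01-01 01:02:03.123456789' AS TIMESTAMP_${flavor}(${p}))",
      "123456789".take(p))

  // Needs a live server, e.g. url = "jdbc:hive2://localhost:10000/default".
  def run(url: String): Unit = {
    val conn = DriverManager.getConnection(url)
    try {
      val stmt = conn.createStatement()
      // Enable the preview flag for this session, as in the transcripts.
      stmt.execute("SET spark.sql.timestampNanosTypes.enabled=true")
      cases.foreach { c =>
        val rs = stmt.executeQuery(c.query)
        try {
          assert(rs.next(), s"no row for: ${c.query}")
          // Column metadata should describe a string column.
          assert(rs.getMetaData.getColumnType(1) == Types.VARCHAR)
          // The rendered value should end with exactly the expected digits.
          assert(rs.getString(1).endsWith("." + c.expectedFraction))
        } finally rs.close()
      }
    } finally conn.close()
  }
}
```

Generating the cases separately from the JDBC calls keeps the expected fractional digits checkable without a server.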

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

… Thrift server via the Types Framework

Override the Types Framework `thriftTypeName` hook in `TimestampNanosTypeApiOps`
to return `STRING_TYPE` for TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p) (p in 7-9), so
these columns are usable over the Spark Thrift / JDBC server instead of failing
type resolution. Values are already rendered as strings at the column precision
via HiveResult.toHiveString. Adds type-mapping and end-to-end JDBC tests.