# Accessing Trino 423 Trino 423 is the current version of Trino (Presto) now available for Treasure Data queries. The new features and improvements are - Performance Improvements - New UDFs - New SQL support - JDBC Improvements - tzdata version update # Performance Improvements Trino 423 realizes performance issues in the following areas: - join - improved join performance. ([#5981](https://github.com/trinodb/trino/issues/5981),[#8974](https://github.com/trinodb/trino/issues/8974),[#13352](https://github.com/trinodb/trino/issues/13352)) - improved performance of certain queries involving LEFT, RIGHT and FULL JOIN when one side of the join is known to produce a single row. ([#7090](https://github.com/trinodb/trino/issues/7090)) - improved performance of queries that contain joins on varchar keys of different length. ([#7644](https://github.com/trinodb/trino/issues/7644)) - improved performance of inequality joins. ([#9307](https://github.com/trinodb/trino/issues/9307)) - improved performance of joins involving a small table on one side. ([#9851](https://github.com/trinodb/trino/issues/9851)) - improved performance of queries that contain JOIN and UNION clauses. ([#11935](https://github.com/trinodb/trino/issues/11935)) - improved join performance when one side of the join is small. ([#12257](https://github.com/trinodb/trino/issues/12257)) - improved performance of queries involving joins on a single bigint column. ([#13432](https://github.com/trinodb/trino/issues/13432)) - improved planning performance for queries involving joins. ([#17458](https://github.com/trinodb/trino/issues/17458)) - in, exists - improved performance for queries using IN predicate with moderate to large number of constants. ([#8833](https://github.com/trinodb/trino/issues/8833)) - improved performance of IN () expressions. ([#8639](https://github.com/trinodb/trino/issues/8639)) - improved query planning performance for queries containing large IN predicates. ([#9874](https://github.com/trinodb/trino/issues/9874)) - improved planning performance of queries with large IN lists. ([#11902](https://github.com/trinodb/trino/issues/11902),[#11918](https://github.com/trinodb/trino/issues/11918),[#11956](https://github.com/trinodb/trino/issues/11956)) - improved performance of queries involving correlated IN or EXISTS predicates. ([#12047](https://github.com/trinodb/trino/issues/12047)) - group by - improved performance of GROUP BY with a large number of groups. ([#11011](https://github.com/trinodb/trino/issues/11011),[#11361](https://github.com/trinodb/trino/issues/11361)) - improved performance of queries that contain GROUP BY clauses. ([#12095](https://github.com/trinodb/trino/issues/12095)) - improved planning performance for queries with many GROUP BY clauses. ([#15292](https://github.com/trinodb/trino/issues/15292)) - functions - improved performance of queries using `rank()` window function. ([#6333](https://github.com/trinodb/trino/issues/6333)) - improved performance of `sum()` and `avg()` for decimal types. ([#6591](https://github.com/trinodb/trino/pull/6591),[#8878](https://github.com/trinodb/trino/issues/8878)) - improved performance of decimal aggregations. ([#9640](https://github.com/trinodb/trino/issues/9640),[#13573](https://github.com/trinodb/trino/issues/13573)) - improved performance of queries that contain `array_distinct(array_sort(…))` expressions. ([#8777](https://github.com/trinodb/trino/issues/8777)) - improved performance of queries with window functions. ([#15994](https://github.com/trinodb/trino/issues/15994)) - improved performance of queries with `row_number()` and `rank()` window functions. ([#16753](https://github.com/trinodb/trino/issues/16753)) - types - improved performance of specific queries which compare table columns of type timestamp with date literals. ([#11170](https://github.com/trinodb/trino/issues/11170)) - improved performance of queries that compare date columns with `timestamp(n)` with time zone literals. ([#5798](https://github.com/trinodb/trino/issues/5798)) - improved performance of queries that process row or array data. ([#9402](https://github.com/trinodb/trino/issues/9402)) - improved performance of map and row types. ([#10469](https://github.com/trinodb/trino/issues/10469)) - improved read performance for row data types. ([#12926](https://github.com/trinodb/trino/issues/12926)) - improved performance of queries involving row type or certain aggregations such as sum, avg, etc. ([#12762](https://github.com/trinodb/trino/issues/12762)) - improved performance of queries that process string data. ([#12798](https://github.com/trinodb/trino/issues/12798)) - improved performance of queries with an UNNEST clause. ([#10506](https://github.com/trinodb/trino/issues/10506)) - improved performance of LIKE expressions. ([#15999](https://github.com/trinodb/trino/issues/15999)) - improved performance of UNION ALL queries. ([#17265](https://github.com/trinodb/trino/issues/17265)) # New UDFs The following new UDFs are available in Trino 423: - `version()` ([#4627](https://github.com/trinodb/trino/issues/4627)) - `to_geojson_geometry()` and `from_geojson_geometry()` ([#6355](https://github.com/trinodb/trino/issues/6355)) - `soundex()` ([#4022](https://github.com/trinodb/trino/issues/4022)) - `format_number()` ([#1878](https://github.com/trinodb/trino/issues/1878)) - `geometry_nearest_points()`. ([#8280](https://github.com/trinodb/trino/issues/8280)) - support for `listagg()`. ([#4835](https://github.com/trinodb/trino/issues/4835)) - contains function to check whether a CIDR contains an IP address. ([#9654](https://github.com/trinodb/trino/issues/9654)) - `trim_array()` ([#11238](https://github.com/trinodb/trino/issues/11238)) - `to_base32()` and `from_base32()` ([#11439](https://github.com/trinodb/trino/issues/11439)) - `json_exists`, `json_query`, `json_value`, `json_array`, and `json_object` [JSON functions](https://trino.io/docs/current/functions/json.html). ([#9081](https://github.com/trinodb/trino/issues/9081)) - `sinh()` ([#16494](https://github.com/trinodb/trino/issues/16494)) - `quantile_at_value()` ([#16736](https://github.com/trinodb/trino/issues/16736)) - `array_histogram()` ([#14725](https://github.com/trinodb/trino/issues/14725)) - support for `any_value()` ([#17777](https://github.com/trinodb/trino/issues/17777)) # New SQL support The following SQL support has been added: - Support for WINDOW clause. ([#651](https://github.com/trinodb/trino/issues/651)) - Support for [MATCH_RECOGNIZE](https://trino.io/docs/current/sql/match-recognize.html). ([#6111](https://github.com/trinodb/trino/issues/6111)) - Support for standard SQL trim syntax. ([#11236](https://github.com/trinodb/trino/issues/11236)) - Support for correlated sub-queries in DELETE queries. ([#9447](https://github.com/trinodb/trino/issues/9447)) - Support for [recursive member access](https://trino.io/docs/current/functions/json.html#json-descendant-member-accessor) access to the [JSON path language](https://trino.io/docs/current/functions/json.html#json-path-language). ([#16854](https://github.com/trinodb/trino/issues/16854)) - Support for CUBE and ROLLUP with composite sets. ([#16981](https://github.com/trinodb/trino/issues/16981)) - Support for [EXECUTE IMMEDIATE](https://trino.io/docs/current/sql/execute-immediate.html). ([#17341](https://github.com/trinodb/trino/issues/17341)) - Support for underscores in numeric literals. ([#17776](https://github.com/trinodb/trino/issues/17776)) - Support for hexadecimal, binary, and octal numeric literals. ([#17776](https://github.com/trinodb/trino/issues/17776)) # JDBC improvements Trino 423 implements the following method: - `PreparedStatement.getParameterMetaData()` ([#2978](https://github.com/trinodb/trino/issues/2978)) # tzdata Trino 423 updates the `tzdata` version. With this new version, the `Pacific/Kanton`time zone is now supported; however, as a result, queries can no longer reference the `US/Pacific-New` zone.([#6660](https://github.com/trinodb/trino/issues/6660),[#10679](https://github.com/trinodb/trino/issues/10679)) # Migration Guide Currently, the default Presto query engine is Presto 350. You can specify the version of Trino/Presto you want to use by prefacing your SQL query with a "magic comment:" **Presto 350** ``` -- @TD engine_version: 350 ``` **Trino 423** ``` -- @TD engine_version: 423 ``` Using pytd, you can add the magic comment in the query with `db` parameter as shown here. ``` client.query('-- @TD engine_version: 350 \n select * from nasdaq limit 10', db='sample_datasets') ``` # Trino 423 Limitations and Differences While most existing Presto 350 queries are compatible with Trino 423, some queries will require changes. ## Casting from double or real to varchar Follow the SQL standard when casting from double or real to varchar. ``` presto> with t as (select * from (VALUES DOUBLE '10.3', DOUBLE '100.0') AS t (k)) select CAST("k" AS VARCHAR) from t; _col0 ------- 10.3 100.0 (2 rows) ``` ``` trino> with t as (select * from (VALUES DOUBLE '10.3', DOUBLE '100.0') AS t (k)) select CAST("k" AS VARCHAR) from t; _col0 -------- 1.03E1 1.0E2 (2 rows) To get the same result as 350, you need to cast to decimal: ``` If you want to get the same result as Presto 350, you need to cast to decimal: ```sql trino> with t as (select * from (VALUES DOUBLE '10.3', DOUBLE '100.0') AS t (k)) select CAST(CAST("k" AS DECIMAL(10,1)) AS VARCHAR) from t; _col0 ------- 10.3 100.0 (2 rows) ``` ## Handling return type from unixtime When using time zone, change return type of `from_unixtime()`, and `from_unixtime_nanos()`, to `timestamp(p)` with time zone. ``` presto> select from_unixtime(time) from sample_datasets.www_access order by time limit 1; _col0 ------------------------- 2014-10-03 07:20:45.000 (1 row) presto> select from_unixtime_nanos(1699503369586210000); _col0 ------------------------- 2023-11-09 04:16:09.586 (1 row) ``` ``` trino> select from_unixtime(time) from sample_datasets.www_access order by time limit 1; _col0 ------------------------------------ 2014-10-03 16:20:45.000 Asia/Tokyo (1 row) trino> select from_unixtime_nanos(1699503369586210000); _col0 ------------------------------------------ 2023-11-09 13:16:09.586210000 Asia/Tokyo (1 row) ``` ## Differences in to_unixtime In the Treasure Data implementation of Trino 423 the `to_unixtime` function continues to operate in the same way as Presto 350. Be aware, however, that the Trino 423 specification introduced new timestamp semantics that required timestamps to be cast to `timestapm(p) with time zone` before being passed to `to_unixtime`. ## Casting from row to json does not return a JSON array in Trino 423 ``` presto> select cast(row("code", "host", "time") as json) from sample_datasets.www_access; _col0 ------------------------------------ [200,"132.174.221.142",1412374112] [200,"104.216.96.28",1412374101] ... ``` ``` trino> select cast(row("code", "host", "time") as json) from sample_datasets.www_access; _col0 --------------------------------------------- {"":200,"":"60.183.134.133","":1412366395} {"":200,"":"76.45.157.42","":1412366383} ... ``` ## Invalid timestamp literals can't be casted In Presto 350, an invalid timestamp literal can be casted. However, this is a bug and has been fixed in Trino 423. ``` presto> select cast('2023-12-13 14.18.34.000' as timestamp); _col0 ------------------------- 2023-12-13 00:00:00.000 (1 row) # The literals are converted with due validation, so the same literal raises an error presto> select TIMESTAMP '2023-12-13 14.18.34.000'; Query 20231213_052357_26435_7amdj failed: line 1:8: '2023-12-13 14.18.34.000' is not a valid timestamp literal ``` ``` # In Trino 423, the query fails with an error. trino> select cast('2023-12-13 14.18.34.000' as timestamp); Query 20231213_053737_00008_abjfn failed: Value cannot be cast to timestamp: 2023-12-13 14.18.34.000 ``` Additionally, when using Trino 423, ISO-8601 (cast to timestamp) also fails. (In Presto 350, casting ISO-8601 to timestamp is not correct.) ``` # incorrect presto> SELECT cast('2023-12-13T14:18:34+0000' as timestamp); _col0 ------------------------- 2023-12-13 00:00:00.000 (1 row) # correct presto> SELECT from_iso8601_timestamp('2023-12-13T14:18:34+0000'); _col0 ----------------------------- 2023-12-13 14:18:34.000 UTC (1 row) ``` ``` # In 423, the query fails. trino> SELECT cast('2023-12-13T14:18:34+0000' as timestamp); Query 20231213_060050_00012_abjfn failed: Value cannot be cast to timestamp: 2023-12-13T14:18:34+0000 # correct trino> SELECT from_iso8601_timestamp('2023-12-13T14:18:34+0000'); _col0 ----------------------------- 2023-12-13 14:18:34.000 UTC (1 row) ``` Generally, use - `from_iso8601_timestamp` - `from_iso8601_timestamp_nanos` - `from_iso8601_date` - `to_iso8601` ## Changes to the handling of row expressions for IN predicates, quantified comaprisons, and scalar subqueries Trino 423 fixed the handling of row expressions for IN predicates, quantified comparisons and scalar subqueries. ``` # incorrect behavior presto> SELECT -> time, (code, host) -> FROM -> sample_datasets.www_access -> WHERE -> TD_TIME_RANGE(time, '2014-10-04 01:13:00', '2014-10-04 01:13:05') -> AND (time, code, host) IN -> ( -> SELECT (time, code, host) FROM sample_datasets.www_access -> ); time | _col1 ------------+----------------------- 1412385182 | {200, 40.81.151.94} 1412385184 | {200, 216.51.135.153} 1412385180 | {200, 76.75.28.148} (3 rows) ``` ``` # In 423, the query fails with a type mismatch error. trino> SELECT -> time, (code, host) -> FROM -> sample_datasets.www_access -> WHERE -> TD_TIME_RANGE(time, '2014-10-04 01:13:00', '2014-10-04 01:13:05') -> AND (time, code, host) IN -> ( -> SELECT (time, code, host) FROM sample_datasets.www_access -> ); Query 20231110_073823_00001_ymnng failed: line 7:26: Value expression and result of subquery must be of the same type: row(bigint, bigint, varchar) vs row(row(bigint, bigint, varchar)) # Need to change: # (time, code, host) IN (SELECT (time, code, host) FROM sample_datasets.www_access) # to: # (time, code, host) IN (SELECT time, code, host FROM sample_datasets.www_access) ``` ## Differences in casting numeric values to varchar(n) In Presto 350, even when the varchar type length was too short, a query was successful. In Trino 423, the query fails with an invalid cast argument error. **Numeric Values** ``` presto> select CAST(12345678900 AS varchar(11)); _col0 ------------- 12345678900 (1 row) presto> select CAST(12345678900 AS varchar(50)); _col0 ------------- 12345678900 (1 row) presto> select CAST(12345678900 AS varchar(10)); _col0 ------------ 1234567890 (1 row) ``` ``` trino> select CAST(12345678900 AS varchar(11)); _col0 ------------- 12345678900 (1 row) trino> select CAST(12345678900 AS varchar(50)); _col0 ------------- 12345678900 (1 row) trino> select CAST(12345678900 AS varchar(10)); Query 20240130_084534_31703_24gcm failed: Value 12345678900 cannot be represented as varchar(10) ``` **Date** ``` presto> select CAST(DATE '2013-02-02' AS varchar(10)); _col0 ------------ 2013-02-02 (1 row) presto> select CAST(DATE '2013-02-02' AS varchar(50)); _col0 ------------ 2013-02-02 (1 row) presto> select CAST(DATE '2013-02-02' AS varchar(9)); _col0 ----------- 2013-02-0 (1 row) ``` ``` trino> select CAST(DATE '2013-02-02' AS varchar(10)); _col0 ------------ 2013-02-02 (1 row) trino> select CAST(DATE '2013-02-02' AS varchar(50)); _col0 ------------ 2013-02-02 (1 row) trino> select CAST(DATE '2013-02-02' AS varchar(9)); Query 20240131_022940_06992_24gcm failed: Value 2013-02-02 cannot be represented as varchar(9) ``` **Encoding Values** For example, `now()` returns a `timestamp(3)` with time zone type value. ``` select cast((now()) AS varchar(10)); // => 2024-01-31 ``` ``` select cast((now()) AS varchar(10)); // => Query 20240131_024017_07317_24gcm failed: Value [2024-01-31 02:40:17.842 UTC] does not fit in type varchar(10) ``` ## UNNEST ARRAY with 255+ items causes an error According to the Trino community, > Due to the way arrays literals are modeled in Trino, there’s currently a physical limitation on how many items you can have in an array declaration. It’s possible that before it worked “by chance” based on when the array declaration was being converted into a function call and which optimizer rules ran in between. Consider using the alternatives listed [here](https://github.com/trinodb/trino/issues/16033#issuecomment-1487717358). ## CREATE TABLE and DROP TABLE no longer return result sets In earlier versions of Presto, DDL queries would return outputs. With Trino 423, CREATE TABLE and DROP TABLE no longer return result sets. In case you use `pytd` library, update to v1.5.2`or later to support this result sets handling. Earlier versions will return`TypeError: 'NoneType' object is not iterable` when pytd fetches a result of CREATE TABLE and DROP TABLE. ## information_schema.role_authorization_descriptors table has been removed Becasue the table was only intended to be used in the SQL views that define the actual tables, the `information_schema.role_authorization_descriptors` table has been removed. ## Name of catalog changes In 423, the catalog is `td`. However, to ensure backward compatibility, `td-presto` continues as an alias for `td`. The same URL works as is in 423. However, be aware of the alias does not display in the results you get back from `information_schema` or `system.jdbc`. **information_schema** ``` presto> select table_catalog, table_schema, table_name from information_schema.tables where table_schema = 'sample_datasets' and table_name = 'www_access'; table_catalog | table_schema | table_name ---------------+-----------------+------------ td-presto | sample_datasets | www_access (1 row) presto> select table_catalog, table_schema, table_name, column_name from information_schema.columns where table_schema = 'sample_datasets' and table_name = 'www_access'; table_catalog | table_schema | table_name | column_name ---------------+-----------------+------------+------------- td-presto | sample_datasets | www_access | user td-presto | sample_datasets | www_access | host ... ``` ``` trino> select table_catalog, table_schema, table_name from information_schema.tables where table_schema = 'sample_datasets' and table_name = 'www_access'; table_catalog | table_schema | table_name ---------------+-----------------+------------ td | sample_datasets | www_access (1 row) trino> select table_catalog, table_schema, table_name, column_name from information_schema.columns where table_schema = 'sample_datasets' and table_name = 'www_access'; table_catalog | table_schema | table_name | column_name ---------------+-----------------+------------+------------- td | sample_datasets | www_access | user td | sample_datasets | www_access | host ... ``` **JDBC** ``` select TABLE_SCHEM, TABLE_CATALOG from system.jdbc.schemas where TABLE_CATALOG = 'td-presto' select TABLE_CAT, TABLE_SCHEM, TABLE_NAME from system.jdbc.tables where TABLE_CAT = 'td-presto' select TABLE_CAT, TABLE_SCHEM, COLUMN_NAME from system.jdbc.columns where TABLE_CAT = 'td-presto' and TABLE_NAME = 'www_access' ``` ``` select TABLE_SCHEM, TABLE_CATALOG from system.jdbc.schemas where TABLE_CATALOG = 'td' select TABLE_CAT, TABLE_SCHEM, TABLE_NAME from system.jdbc.tables where TABLE_CAT = 'td' select TABLE_CAT, TABLE_SCHEM, COLUMN_NAME from system.jdbc.columns where TABLE_CAT = 'td' and TABLE_NAME = 'www_access' ``` ## Some extra information eliminated from SHOW COLUMNS and information_schema.columns In Trino 423, SHOW COLUMNS and `information_schema.columns` do not show extra information. ``` presto> SHOW COLUMNS FROM sample_datasets.www_access Like '%host%'; Column | Type | Extra | Comment --------+---------+--------------+--------- host | varchar | keyName=host | (1 row) presto> select column_name, data_type, extra_info, comment from information_schema.columns where table_schema = 'sample_datasets' and table_name = 'www_access' and column_name Like '%host%'; column_name | data_type | extra_info | comment -------------+-----------+--------------+--------- host | varchar | keyName=host | NULL (1 row) ``` ``` trino> SHOW COLUMNS FROM sample_datasets.www_access Like '%host%'; Column | Type | Extra | Comment --------+---------+-------+--------- host | varchar | | (1 row) trino> select column_name, data_type, extra_info, comment from information_schema.columns where table_schema = 'sample_datasets' and table_name = 'www_access' and column_name Like '%host%'; column_name | data_type | extra_info | comment -------------+-----------+------------+--------- host | varchar | NULL | NULL (1 row) ``` To see keys names with Trino 423, use SHOW CREATE TABLE. ``` presto> SHOW CREATE TABLE your_db.language; Create Table ----------------------------------------------- CREATE TABLE "td-presto".your_db.language ( name varchar, time bigint ) (1 row) ``` ``` trino> SHOW CREATE TABLE your_db.language; Create Table --------------------------------------------- CREATE TABLE td.your_db.language ( name varchar WITH ( key_name = 'name' ), time bigint WITH ( key_name = 'time' ) ) (1 row) ``` # Further Reading - [Trino concepts](https://trino.io/docs/current/overview/concepts.html) - [Trino Functions and Operators](https://trino.io/docs/current/functions.html)