# Accessing Presto 350 (Legacy)

> **Info:** The current version of Presto on Treasure Data is Trino 423. For additional information about Trino 423, including release information and known limitations, see [Accessing Trino 423](/en/tools/presto/trino/#accessing-trino-423).

# TD Presto API Release Information for Presto 350 (Legacy)

All native Presto functions are supported in Treasure Data. The following sections provide notes and information about the features, changes, and limitations of Presto 350.

- [Performance Improvements](#performance-improvements)
- [New UDFs](#new-udfs)
- [New SQL support](#new-sql-support)
- [JDBC Improvements](#jdbc-improvements)
- [Time Partitioning Shuffle Magic Comment for INSERT CTAS](#time-partitioning-shuffle-magic-comment-for-insert-ctas)
- [Migration Guide](#migration-guide-for-presto-350-legacy)

## Performance Improvements

Here is a list of updates related to performance improvements:

- Performance improvements in querying `information_schema` tables. [#999](https://github.com/trinodb/trino/pull/999/), [#1306](https://github.com/trinodb/trino/pull/1306/), [#1329](https://github.com/trinodb/trino/pull/1329), [#1543](https://github.com/trinodb/trino/pull/1543), [#2488](https://github.com/trinodb/trino/pull/2488)
- Pushdown support for some functions and operators in the PostgreSQL (Data Tank) connector. [#3881](https://github.com/trinodb/trino/pull/3881), [#4111](https://github.com/trinodb/trino/pull/4111), [#4112](https://github.com/trinodb/trino/issues/4112), [#5261](https://github.com/trinodb/trino/pull/5261)
- Improved query performance by reducing worker-to-worker communication overhead. [#6126](https://github.com/trinodb/trino/issues/6126), [#5905](https://github.com/trinodb/trino/issues/5905), [#5949](https://github.com/trinodb/trino/pull/5949)
- Improved performance of `ORDER BY ... LIMIT` queries. [#6072](https://github.com/trinodb/trino/issues/6072)
- Improved performance of queries with uncorrelated `IN` clauses. [#5582](https://github.com/trinodb/trino/issues/5582)
- Improved performance of queries that use the `decimal` type. [#4730](https://github.com/trinodb/trino/pull/4730), [#4886](https://github.com/trinodb/trino/pull/4886), [#5181](https://github.com/trinodb/trino/pull/5181)
- Improved performance and accuracy of `approx_percentile()`. [#5158](https://github.com/trinodb/trino/pull/5158)
- Improved performance of certain cross join queries. [#5276](https://github.com/trinodb/trino/pull/5276)
- Reduced latency for queries that perform a broadcast join of a large table. [#5237](https://github.com/trinodb/trino/pull/5237)
- Improved performance of queries involving comparisons between `DOUBLE` or `REAL` values and integer values. [#3533](https://github.com/trinodb/trino/pull/3533)
- Improved performance of queries involving `row_number()`. [#3614](https://github.com/trinodb/trino/pull/3614)
- Improved performance of queries containing a `LIKE` predicate. [#3618](https://github.com/trinodb/trino/pull/3618)
- Improved performance for queries that read fields from nested structures. [#2672](https://github.com/trinodb/trino/pull/2672)
- Improved performance of queries involving constant scalar subqueries. [#3432](https://github.com/trinodb/trino/pull/3432)
- Improved query performance by removing redundant data reshuffling. [#2853](https://github.com/trinodb/trino/pull/2853)
- Improved performance of inequality joins involving `BETWEEN`. [#2859](https://github.com/trinodb/trino/pull/2859)
- Improved join performance for dictionary encoded data. [#2862](https://github.com/trinodb/trino/pull/2862)
- Improved performance of queries containing redundant scalar subqueries. [#2456](https://github.com/trinodb/trino/pull/2456)
- Improved performance of `INSERT` and `CREATE TABLE ... AS` queries containing redundant `ORDER BY` clauses. [#2044](https://github.com/trinodb/trino/pull/2044)
- Improved performance when processing columns of `map` type. [#2015](https://github.com/trinodb/trino/pull/2015)
- Reduced query memory usage by improving retained size estimation for `VARCHAR` and `CHAR` types. [#4123](https://github.com/trinodb/trino/pull/4123)
- Improved performance of certain join queries by reducing the amount of data that needs to be scanned. [#1673](https://github.com/trinodb/trino/pull/1673)
- Improved performance of queries containing complex predicates. [#1515](https://github.com/trinodb/trino/pull/1515)
- Improved performance of certain window functions when using bounded window frames (e.g., `ROWS BETWEEN ... PRECEDING AND ... FOLLOWING`). [#464](https://github.com/trinodb/trino/pull/464)
- Improved performance of certain queries involving coercions and complex expressions in `JOIN` conditions. [#1390](https://github.com/trinodb/trino/pull/1390)
## New UDFs

Many new UDFs are available. See the [Trino official documentation](https://trino.io/docs/current/functions.html) to understand the usage of each function. A short example follows the list.

- `contains_sequence()`
- `concat_ws()`
- `murmur3()`
- `from_unixtime_nanos()`
- T-Digest functions: see [T-Digest functions](https://trino.io/docs/current/functions/tdigest.html) in the Trino documentation
- `from_iso8601_timestamp_nanos()`
- `human_readable_seconds()`
- `bitwise_left_shift()`, `bitwise_right_shift()`, and `bitwise_right_shift_arithmetic()`
- `luhn_check()`
- `approx_most_frequent()`
- `random(m, n)`
- `starts_with()`
- `regexp_count()`, `regexp_position()`
- `strpos(string, substring, instance)`
- Geospatial functions: `to_encoded_polyline()`, `from_encoded_polyline()`, `line_interpolate_point()`, `line_interpolate_points()`, `geometry_from_hadoop_shape()`, `ST_Length(SphericalGeography)`
- `at_timezone()`, `with_timezone()`
- `last_day_of_month()`
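As a quick, illustrative sketch of a few of these functions in one query (expected outputs noted in comments, based on the Trino documentation; the aliases are arbitrary):

```sql
SELECT
  concat_ws(',', 'a', 'b', 'c')        AS joined,     -- 'a,b,c'
  human_readable_seconds(96)           AS duration,   -- '1 minute, 36 seconds'
  last_day_of_month(DATE '2021-02-10') AS month_end,  -- 2021-02-28
  starts_with('presto350', 'presto')   AS has_prefix  -- true
```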
## New SQL support

- Add `IF EXISTS` and `IF NOT EXISTS` syntax to `ALTER TABLE`. [#4651](https://github.com/trinodb/trino/pull/4651)
- Add support for `INTERSECT ALL` and `EXCEPT ALL`. [#2152](https://github.com/trinodb/trino/pull/2152)
- Add support for the `DISTINCT` clause in aggregations within correlated subqueries. [#5904](https://github.com/trinodb/trino/pull/5904)
- Add support for `RANGE BETWEEN ... PRECEDING AND ... FOLLOWING` window frames. [#609](https://github.com/trinodb/trino/issues/609)
- Add support for window frames based on `GROUPS`. [#5713](https://github.com/trinodb/trino/issues/5713)
- Add support for `extract()` with `TIMEZONE_HOUR` and `TIMEZONE_MINUTE` for `time with time zone` values. [#5668](https://github.com/trinodb/trino/issues/5668)
- Add support for correlated subqueries in recursive queries. [#4877](https://github.com/trinodb/trino/pull/4877)
- Add support for the `IN` predicate with subqueries in outer join conditions. [#4151](https://github.com/trinodb/trino/pull/4151)
- Add support for quantified comparisons (e.g., `> ALL (...)`) in aggregation queries. [#4128](https://github.com/trinodb/trino/pull/4128)
- Add support for the variable-precision `TIME` type. [#4381](https://github.com/trinodb/trino/pull/4381)
- Add support for the variable-precision `TIME WITH TIME ZONE` type. [#4905](https://github.com/trinodb/trino/pull/4905)
- Add support for the variable-precision `TIMESTAMP` (without time zone) type. [#3783](https://github.com/trinodb/trino/pull/3783)
- Add support for the variable-precision `TIMESTAMP WITH TIME ZONE` type. [#3947](https://github.com/trinodb/trino/pull/3947)
- Allow inserting values of a larger type into a smaller type when the values fit, for example, `BIGINT` into `SMALLINT`, or `VARCHAR(10)` into `VARCHAR(3)`. Values that don't fit cause an error at runtime. [#2061](https://github.com/trinodb/trino/pull/2061)
- Allow using `.*` on expressions of type `ROW` in the `SELECT` clause to convert the fields of a row into multiple columns. [#1017](https://github.com/trinodb/trino/pull/1017)
- Allow references to tables in the enclosing query when using `.*`. [#1867](https://github.com/trinodb/trino/pull/1867)
- Add support for `IGNORE NULLS` for window functions. [#1244](https://github.com/trinodb/trino/pull/1244)
- Add support for `INNER` and `OUTER` joins involving `UNNEST`. [#1522](https://github.com/trinodb/trino/pull/1522)
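To illustrate two of these additions, here is a minimal sketch using inline `VALUES` data (the literals and aliases are arbitrary):

```sql
-- INTERSECT ALL keeps duplicates that appear in both inputs: returns 1, 1
SELECT * FROM (VALUES 1, 1, 2) AS t(x)
INTERSECT ALL
SELECT * FROM (VALUES 1, 1, 3) AS u(x);

-- .* on a ROW expression expands its fields into columns: id = 1, name = 'apple'
SELECT (CAST(ROW(1, 'apple') AS ROW(id BIGINT, name VARCHAR))).*;
```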
## JDBC Improvements

Query parameters are now supported in `LIMIT`, `OFFSET`, and `FETCH FIRST` clauses.

```java
Connection conn = ...
PreparedStatement stmt = conn.prepareStatement(
    "SELECT * FROM sample_datasets.www_access OFFSET ? LIMIT ?");
stmt.setInt(1, 10); // OFFSET = 10
stmt.setInt(2, 20); // LIMIT = 20
ResultSet rs = stmt.executeQuery();
...
```

## Time Partitioning Shuffle Magic Comment for INSERT CTAS

Treasure Data supports a TD-specific setting to customize the partition size of INSERT/CTAS queries for better query performance. The following is an example of the `time_partitioning_range` session property set as a magic comment:

```sql
-- set session time_partitioning_range = '12h'
```

This comment can also be used to disable shuffling by specifying `no-shuffle` as the value; in that case, however, the partition size cannot be customized. Presto 350 introduces a new magic comment, `time_partitioning_shuffle = 'true'|'false'`, so that you can specify partition shuffling and the time partition range independently, as shown below.
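For example, the two magic comments might be combined as follows. This is a sketch based on the description above; the table names are hypothetical, and the chosen values are illustrative:

```sql
-- set session time_partitioning_shuffle = 'false'
-- set session time_partitioning_range = '12h'
INSERT INTO my_target_table
SELECT * FROM my_source_table
```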
## Migration Guide for Presto 350 (Legacy)

While most existing Presto 317 queries are compatible with the new version 350, some queries require syntax changes.

### Run the New Presto Version

You can specify the Presto version that runs your queries by using the following query hints.

Presto 350:

```sql
-- @TD engine_version: 350
```

Presto 317:

```sql
-- @TD engine_version: 317
```

You can request that Treasure Data Support change the default engine version to Presto 350.

#### Next Steps

We plan to start migrating the default Presto version in the near future. The target date will be announced soon, but we recommend that you verify your Presto queries and switch to the new version as soon as possible. When we announce the schedule, we will share a list of potentially incompatible Presto queries, identified by our simulation, with the users who run them. If you have any further inquiries, reach out to support@treasure-data.com or your Customer Success representative.

### Changed Presto Queries

- [SELECT DISTINCT, ORDER BY](#select-distinct-order-by)
- [lag() and lead() ORDER BY](#lag-and-lead-order-by)
- [Window Frame](#window-frame)
- [TIME and TIMESTAMP Behavior Changes](#time-and-timestamp-behavior-changes)
- [Query Length Limitation](#query-length-limitation)
- [checksum()](#checksum)
- [approx percentile](#approx-percentile)
- [double NaN to integer](#double-nan-to-integer)
- [information_schema.columns](#information_schemacolumns)
- [Length check for CHAR(x) and VARCHAR(x)](#length-check-for-charx-and-varcharx)
- [Trailing delimiter in ARRAY_JOIN() result](#trailing-delimiter-in-array_join-result)
- [Invalid value error at TRY(CAST(VARCHAR as TIMESTAMP))](#invalid-value-error-at-trycastvarchar-as-timestamp)

#### SELECT DISTINCT, ORDER BY

When `SELECT DISTINCT` is used with `ORDER BY`, the `ORDER BY` expressions must appear in the select list. The following query works in the current version of Presto:

```sql
SELECT DISTINCT
  from_unixtime(time),
  COUNT(*),
  time
FROM sample_datasets.www_access
GROUP BY from_unixtime(time), time
ORDER BY from_unixtime(time)
```

This query fails on Presto 350 with an error message like `For SELECT DISTINCT, ORDER BY expressions must appear in select list`. To solve this issue on Presto 350, rewrite the query as follows:

```sql
SELECT DISTINCT
  from_unixtime(time),
  COUNT(*),
  time
FROM sample_datasets.www_access
GROUP BY from_unixtime(time), time
ORDER BY from_unixtime(sample_datasets.www_access.time)
```

#### lag() and lead() ORDER BY

`lag()` and `lead()` require an `ORDER BY` clause in Presto 350. For example, the following query works on the current version, but not on Presto 350:

```sql
SELECT
  time,
  LAG(path, 1) OVER () AS lag_data,
  path,
  LEAD(path, 1) OVER () AS lead_data
FROM sample_datasets.www_access
```

To make this query work on Presto 350, you must use an explicit `ORDER BY` clause:

```sql
SELECT
  time,
  LAG(path, 1) OVER (ORDER BY time) AS lag_data,
  path,
  LEAD(path, 1) OVER (ORDER BY time) AS lead_data
FROM sample_datasets.www_access
```

#### Window Frame

Presto 350 has a new semantic check for `lag()` and `lead()`: a window frame cannot be specified. For example, the following query runs on the current version, but the frame has no effect on the results there; on Presto 350, the query fails:

```sql
SELECT
  time,
  user,
  LAG(path, 1) OVER (ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS lag_data,
  path,
  LEAD(path, 1) OVER (ORDER BY time ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS lead_data
FROM sample_datasets.www_access
```

To fix this query for Presto 350, remove `ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING`:

```sql
SELECT
  time,
  user,
  LAG(path, 1) OVER (ORDER BY time) AS lag_data,
  path,
  LEAD(path, 1) OVER (ORDER BY time) AS lead_data
FROM sample_datasets.www_access
```

#### TIME and TIMESTAMP Behavior Changes

In Presto 350, `TIME` and `TIMESTAMP` behavior follows the SQL standard, which introduces some incompatibilities with the current version. Presto has the following data types to represent time and timestamp values:

- TIME
- TIME WITH TIME ZONE
- TIMESTAMP
- TIMESTAMP WITH TIME ZONE

In the current version, `TIME` and `TIMESTAMP` behave much like the `WITH TIME ZONE` types, but in Presto 350 they don't have an associated time zone. This might affect query results.

**Examples**

The result of the following query differs depending on the Presto version:

```sql
SELECT CAST('2000-01-01 00:00:00 US/Eastern' AS TIMESTAMP)
--> 2000-01-01 05:00:00.000 on Presto 317
--> 2000-01-01 00:00:00.000 on Presto 350
```

When you convert `WITH TIME ZONE` types to `TIME` or `TIMESTAMP`, the time zone is taken into account in the current version, but not in Presto 350. If you have queries that contain such conversions, you might need to revise them for Presto 350. The query in the previous example can be rewritten as follows:

```sql
SELECT CAST(
  -- Convert to UTC before casting to TIMESTAMP
  TIMESTAMP '2000-01-01 00:00:00 US/Eastern' AT TIME ZONE 'UTC'
  AS TIMESTAMP)
```

or simply use the `TIMESTAMP WITH TIME ZONE` type instead:

```sql
SELECT CAST('2000-01-01 00:00:00 US/Eastern' AS TIMESTAMP WITH TIME ZONE)
```

In addition, political time zones (e.g., `America/Los_Angeles`) are no longer allowed in `TIME WITH TIME ZONE` values, to avoid issues around DST and possible future policy changes. Therefore, the following query works on the current version, but not on Presto 350:

```sql
SELECT TIME '01:02:03.456 America/Los_Angeles'
```

Rewrite this query with a fixed offset:

```sql
SELECT TIME '01:02:03.456 -07:00'
```

Note that time zone names are case-sensitive in Presto 350. For example, `Asia/tokyo` is invalid and has to be `Asia/Tokyo` when using Presto 350.

Query results can also be formatted differently: `SELECT CURRENT_TIME` returns a value like `09:29:52.540 UTC` on the current version, but `09:29:52.540+00:00` on Presto 350.

#### Query Length Limitation

Presto 350 might generate longer bytecode internally, so long queries might hit the query length limitation. If you get error messages like the following, you need to shorten the query:

- `java.lang.IllegalArgumentException: bad parameter count 256`
- `io.prestosql.spi.PrestoException: Query exceeded maximum columns`

#### checksum()

The `checksum()` function implementation has changed in Presto 350, so `checksum()` generates different results than previous Presto versions for the same input.
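A minimal sketch of where this matters (the inline `VALUES` table is illustrative): the same input rows yield different `checksum()` values on Presto 317 and Presto 350, so checksums computed on one version should not be compared with checksums computed on the other.

```sql
-- checksum() returns a VARBINARY aggregate over the input rows;
-- identical data produces different values on Presto 317 vs. 350.
SELECT checksum(c)
FROM (VALUES 1, 2, 3) AS t(c)
```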
#### approx percentile

`approx_percentile()` now uses T-digest as its internal data structure, which is more accurate and faster than the previous implementation, so the function produces slightly different results. However, if you specify the `accuracy` parameter, `approx_percentile()` doesn't use T-digest, because T-digest doesn't support the `accuracy` parameter. If you want to benefit from the T-digest-based `approx_percentile()`, consider dropping the `accuracy` parameter.

> **Info:** `approx_percentile()` doesn't allow infinite values (for example, the result of dividing by zero). If such a value is given, the query fails with `java.lang.IllegalArgumentException: value must be finite` or `java.lang.IllegalArgumentException: value is NaN`. In this case, you have to exclude infinite and NaN values before calling `approx_percentile()`, or you can use the old version of `approx_percentile()` by intentionally specifying the `accuracy` parameter as a workaround, as follows:

```sql
-- Before
SELECT approx_percentile(
  column1 / column2, -- can be an infinite value
  0.5                -- percentile
)
FROM ...

-- Exclude invalid values
SELECT approx_percentile(
  column1 / column2, -- can be an infinite value
  0.5                -- percentile
) FILTER (WHERE is_finite(column1 / column2))
FROM ...

-- Use the old version
SELECT approx_percentile(
  column1 / column2, -- can be an infinite value
  1,                 -- weight
  0.5,               -- percentile
  0.01               -- accuracy
)
FROM ...
```

#### double NaN to integer

Presto 350 doesn't allow converting a NaN value to `INTEGER`, whereas `CAST` converts it to 0 in the current version. The following query works on the current version but fails on Presto 350 with the error message `Cannot cast double NaN to integer`:

```sql
SELECT CAST(nan() AS INTEGER)
```

You can restore the original behavior by using `TRY_CAST` and `COALESCE` as follows:

```sql
SELECT COALESCE(TRY_CAST(nan() AS INTEGER), 0)
```

#### information_schema.columns

The `comment` and `extra_info` columns have been removed from `information_schema.columns`. If you have queries that refer to these columns, you have to revise them. Here is a query result from `information_schema.columns` on Presto 350:

```
presto> select * from information_schema.columns limit 10;
 table_catalog |    table_schema    | table_name |  column_name  | ordinal_position | column_default | is_nullable | data_type
---------------+--------------------+------------+---------------+------------------+----------------+-------------+-----------
 td-presto     | information_schema | tables     | table_catalog |                1 | NULL           | YES         | varchar
 td-presto     | information_schema | tables     | table_schema  |                2 | NULL           | YES         | varchar
 ...
```

#### Length check for CHAR(x) and VARCHAR(x)

A length check for `CHAR(x)` and `VARCHAR(x)` has been introduced in Presto 350. Extra characters are simply truncated from the results, but you might see the following errors in some cases:

```bash
io.prestosql.spi.PrestoException: Could not serialize column 'col1' of type 'varchar(10)' at position 1:2
io.prestosql.spi.PrestoException: Character count exceeds length limit 10: base64:xxxxxxxxxxxx
```

You might see this error if your query contains `CAST(xxx AS VARCHAR(10)) AS col1` and the value of `col1` is longer than 10 characters. In this case, revise the query to limit the value length, or drop the length limit by using `CAST(xxx AS VARCHAR)`, as shown below.
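A minimal sketch of both fixes, reusing the hypothetical column `xxx` from the error description above:

```sql
-- Truncate explicitly so the value fits the declared length
SELECT CAST(substr(xxx, 1, 10) AS VARCHAR(10)) AS col1 FROM ...

-- Or cast without a length limit
SELECT CAST(xxx AS VARCHAR) AS col1 FROM ...
```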
#### Trailing delimiter in ARRAY_JOIN() result

In the current version, `ARRAY_JOIN()` appends the delimiter if the last element is NULL, but this is not the case in Presto 350. For example, the following query yields `A,B,` on the current Presto version, but `A,B` on Presto 350; the trailing comma (`,`) disappears on Presto 350:

```sql
SELECT ARRAY_JOIN(ARRAY['A', 'B', NULL], ',')
```

#### Invalid value error at TRY(CAST(VARCHAR as TIMESTAMP))

The following query yields NULL on Presto 317 because, although the timestamp format is valid, 32 is not a valid day of the month:

```sql
SELECT TRY(CAST('2022-01-32 00:00:00' AS TIMESTAMP))
```

On Presto 350, you might see an error like this instead:

```
Invalid value for DayOfMonth (valid values 1 - 28/31): 32
```

To keep the NULL behavior, use `TRY_CAST` instead of the combination of `TRY` and `CAST`:

```sql
SELECT TRY_CAST('2022-01-32 00:00:00' AS TIMESTAMP)
```

## Known Limitations with Presto 350 (Legacy) on TD

The Presto query engine has some known limitations:

- [Column Name Escaping](#column-name-escaping)
- [DELETE Statement](#delete-statement)
- [No DELETE Without a WHERE Clause](#no-delete-without-a-where-clause)
- [DELETE against Data Tank](#delete-against-data-tank)
- [Limit number of input Partitions for DELETE query](#limit-number-of-input-partitions-for-delete-query)
- [Multiple DELETE jobs on the same table are NOT allowed when there is overwrap on a partition file](#multiple-delete-jobs-on-the-same-table-are-not-allowed-when-there-is-overwrap-on-a-partition-file)
- [Undoing Delete Statements](#undoing-delete-statements)
- [DELETE statement Timeout Error](#delete-statement-timeout-error)
- [Presto DELETE Statement Resource Consumption](#presto-delete-statement-resource-consumption)
- [6-Hour Limit for Presto Queries](#6-hour-limit-for-presto-queries)
- [JOIN Order](#join-order)
- [INSERT OVERWRITE Statements are NOT Supported](#insert-overwrite-statements-are-not-supported)
- [Error Message - Killed by the system because this query generates output for more than 100GB](#error-message---killed-by-the-system-because-this-query-generates-output-for-more-than-100gb)
- [Error Message - Output size too large. Exceeds maximum file size: 25GB](#error-message---output-size-too-large-exceeds-maximum-file-size-25gb)
- [Error Message - Accessing too many tables - Maximum 300 tables allowed](#error-message---accessing-too-many-tables---maximum-300-tables-allowed)
- [Error Message - Creating partitions exceeds the limit of 500000](#error-message---creating-partitions-exceeds-the-limit-of-500000)
- [Error Message - Remote page is too large](#error-message---remote-page-is-too-large)

### Column Name Escaping

When a column name matches the name of a reserved keyword, the column name needs to be quoted. In Hive, as in most SQL-based query languages, the quotation character is the backtick (`` ` ``). In Presto, a column name is quoted with the double-quote character (`"`) instead.
See the following comparison:

```
SELECT `join` FROM mytbl  # Hive
SELECT "join" FROM mytbl  # Presto
```

Quoting a column name in single quotes (`'`) causes the query parser to interpret the content as a string literal, which might produce an unexpected result. For example:

```
SELECT "join" FROM mytbl LIMIT 10

result:
value1
value2
value3
value4
...
value10

SELECT 'join' FROM mytbl

result:
join
join
join
join
...
join
```

### DELETE Statement

The Presto DELETE statement has several known limitations.

### No DELETE Without a WHERE Clause

To reduce the occurrence of unintentional DELETE commands, DELETE statements require you to include a WHERE clause. Error sample:

```
Does not support DELETE statement without WHERE condition
```

### DELETE against Data Tank

DELETE statements cannot be issued against data stored in Data Tank. Connect directly to the PostgreSQL Data Tank to issue DELETE statements on data stored there.

### Limit number of input Partitions for DELETE query

Treasure Data limits the number of input partitions to 500K for DELETE queries. DELETE queries exceeding the input partition limit may fail with an error like the following:

```
Cannot delete too many input files more than 500000. Try reducing input files by using 'TD_TIME_RANGE
```

### Multiple DELETE jobs on the same table are NOT allowed when there is overwrap on a partition file

Multiple DELETE jobs on the same table are allowed only when they don't overlap on a partition file. If the table is partitioned in 1-hour units, different time ranges can usually be deleted concurrently, but some data files can span large time ranges, so concurrent execution is not always guaranteed. More precisely, multiple jobs cannot remove the same partition file at the same time. Likewise, when a DELETE job is running and the internal partitioning optimization process (which deletes small files and creates larger ones) merges the data that the DELETE involves, an error occurs: `Metadata transaction conflicted with others`. In this case, retry the DELETE job manually.

### Undoing Delete Statements

If you want to roll back an executed DELETE statement and recover the deleted data, you must contact Treasure Data technical support and provide the job ID of the job that executed the Presto DELETE statement. In many cases, you can undo a delete statement. However, the following situations prevent the rollback of Presto DELETE statements:

- When new partitions have been deleted by another DELETE statement.
- When original partitions have been discarded, which can happen after a retention period.
- When a table is dropped and a new table with the same name is created.
- When new partitions have been further modified or replaced, which can happen as a result of internal storage maintenance processes.

### DELETE statement Timeout Error

Depending on the number of partitions that contain rows to be deleted, a DELETE operation can take several hours. If too many partitions are affected, the operation can time out and the job can fail with the following message:

```
org.plazmadb.metadb.MetadataSQLException: ERROR: canceling statement due to statement timeout
```

In that case, rewrite the DELETE statement to affect fewer rows or fewer partitions. The most effective way to do this is to include a WHERE clause that limits the time range of the DELETE with TD_TIME_RANGE or TD_INTERVAL.
For example, if you have a year's worth of game history data, instead of:

```sql
DELETE FROM game_history WHERE player_id = 1000;
```

try:

```sql
DELETE FROM game_history
WHERE player_id = 1000
  AND TD_TIME_RANGE(time, '2018-01-01', '2018-02-01', 'PDT')
```

and then delete further time ranges until all target rows are deleted. TD support can provide a workflow that simplifies automating this process.

### Presto DELETE Statement Resource Consumption

DELETE queries issue jobs and consume resources. Because of the storage and indexing methods used in Treasure Data, a DELETE query can be resource intensive. To prevent a DELETE query from performing a full table scan, you should use:

- a time expression, like TD_TIME_RANGE or TD_INTERVAL, if the table uses default time-based partitioning
- an equality predicate on all partition keys, if the table uses user-defined partitioning

For example, if your table is partitioned by time and you are trying to delete a set of users that were created in June 2017, include the time restriction even if it is not a critical parameter for your DELETE statement:

```sql
DELETE FROM logtable
WHERE userid IN (1234, 1235, 1236)
  AND TD_TIME_RANGE(time, '2017-06-01', '2017-07-01')
```

Including the time restriction significantly improves the speed of your query and reduces the resources this query consumes.

### 6-Hour Limit for Presto Queries

Presto queries that run for more than 6 hours are automatically canceled. Typically, Presto queries that run for more than a day or so are failing queries. For long-running queries, you can rewrite the Presto query as a Hive query.

### JOIN Order

Presto does not support cost-based JOIN optimizations, meaning JOINs are not automatically reordered based on table size. Make sure that smaller tables are on the right-hand side of the JOIN; they must fit in memory. Otherwise, out-of-memory exceptions cause the query to fail.

```sql
SELECT ... FROM large_table JOIN small_table
```

### INSERT OVERWRITE Statements are NOT Supported

Presto does not support INSERT OVERWRITE statements. Make sure that you delete the table data before using [INSERT INTO](/en/tools/presto/hive_and_presto_query_engine_reference#insert-into).

### Error Message - Killed by the system because this query generates output for more than 100GB

This error is caused by a single query generating more than 100GB of output. As a workaround, use the [result_output_redirect](/en/tools/presto/presto_query_faqs#use-result_output_redirect) option or [CREATE TABLE AS](https://api-docs.treasuredata.com/en/tools/presto/presto_query_faqs/#how-can-i-speed-up-queries-that-produce-massive-results):

- `result_output_redirect` option solution (experimental, see the sketch below): [Use result_output_redirect](https://api-docs.treasuredata.com/en/tools/presto/presto_query_faqs/#use-result_output_redirect)
- CREATE TABLE AS solution: [Use CREATE TABLE (table) AS SELECT …](https://api-docs.treasuredata.com/en/tools/presto/presto_query_faqs/#use-create-table-table-as-select-)
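For reference, enabling `result_output_redirect` via a magic comment might look like the following sketch. The property name comes from the FAQ linked above; treat the exact value syntax as an assumption, and note that the table and column names are hypothetical:

```sql
-- set session result_output_redirect = 'true'
SELECT col1, col2
FROM my_large_table
```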
### Error Message - Output size too large. Exceeds maximum file size: 25GB

This message might appear if `result_output_redirect` is enabled by default or used in the magic comment, and the query output exceeds the 25GB maximum file size.

### Error Message - Accessing too many tables - Maximum 300 tables allowed

This error is caused by a single query scanning more than 300 tables. As a workaround, tune the SQL query by reducing the number of scanned tables, or divide the single query into multiple jobs.

- [Presto Performance Tuning](/en/tools/presto/presto_performance_tuning#presto-performance-tuning)

### Error Message - Creating partitions exceeds the limit of 500000

This error is caused by a single query trying to commit more than 500,000 (500K) partitions. As a workaround, use user-defined partitioning (UDP) or tune the parameters for UDP (see the sketch at the end of this page):

- [User-Defined Partitioning](/en/tools/presto/presto_performance_tuning#defining-partitioning-for-presto)
- Tune parameters for UDP. For example:
  - Reduce `bucket_count` to 256 (512 by default).
  - Update `max_time_range` to adopt a broader time range.
  - Increase `max_file_size` to 512MB (256MB by default).

### Error Message - Remote page is too large

This error is caused by hitting the limit of the intermediate buffers in Presto, typically because a single row or column is too large. The limit is set to 64MB in TD Presto. As a workaround, reduce the size of a single row:

- Reduce the number of columns to be scanned.
- Reduce the size of a large column using [SMART_DIGEST()](/en/tools/presto/api#smart_digest).
- Use the [Hive engine](/en/tools/presto/quickstart#choosing-an-analytics-engine) instead of Presto.
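Finally, here is the UDP sketch referenced in the partition-limit workaround above. It assumes the `bucketed_on`/`bucket_count` table properties described in the User-Defined Partitioning documentation; the table and column names are hypothetical, and the exact property names and values should be verified against that page for your schema:

```sql
-- Hypothetical UDP table tuned to create fewer partitions
CREATE TABLE my_udp_table (
  time    BIGINT,
  user_id BIGINT,
  payload VARCHAR
)
WITH (
  bucketed_on  = ARRAY['user_id'],  -- user-defined partition key (assumption)
  bucket_count = 256                -- reduced from the default of 512
)
```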