# Writing Trino(Presto) Queries The following information can help you learn how to write Trino(Presto) queries. ## Trino(Presto) Options Consider the following Trino(Presto) features when creating queries to run in Treasure Data. ### Use the WITH Clause for Nested Queries The WITH clause is useful for nested queries as shown in this example query: ``` SELECT a, b, c FROM ( SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a ) tbl_alias ``` The same query can be written with the WITH clause as follows: ``` WITH tbl_alias AS( SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a ) SELECT a, b, c FROM tbl_alias ``` Notice that the query has been “de-nested”. The following example shows multiple subqueries use WITH. Each subquery is delimited by a comma. ``` WITH tbl1 AS( SELECT a, MAX(b) AS b, MIN(c) AS c FROM tbl GROUP BY a ), tbl2 AS( SELECT a, AVG(d) AS d FROM another_tbl GROUP BY a ) SELECT tbl1. * , tbl2. * FROM tbl1 JOIN tbl2 ON tbl1.a = tbl2.a ``` ### Using VALUES for Prototyping If you want to quickly test Trino(Presto) syntax, you can use VALUES to create a table immediately. ``` SELECT a, b, c FROM ( VALUES( 1, 2.0, 'x' ), ( 3, 4.0, 'y' ), ( 5, 6.0, 'z' ) ) tbl( a, b, c ) ``` The above query returns the following table: | **a** | **b** | **c** | | --- | --- | --- | | 1 | 2.0 | 'x' | | 3 | 4.0 | 'y' | | 5 | 6.0 | 'z' | ### Clean up a Table To clean up a table before using CREATE TABLE AS or INSERT INTO statements, use multiple statements split by semi-colon. ``` DROP TABLE IF EXISTS mytable ; CREATE TABLE mytable AS SELECT . . . ; ``` ### ALTER TABLE - DROP COLUMN You can include the SQL DDL statement ALTER TABLE...DROP COLUMN SQL in your Treasure Data queries to, for example, deduplicate data. An example is as follows: ``` DROP TABLE IF EXISTS task_temp ; CREATE TABLE task_temp AS SELECT * FROM ( SELECT * , row_number( ) over( partition BY id ORDER BY TD_TIME_PARSE(lastmodifieddate) DESC ) AS rnk FROM task_raw ) WHERE rnk = 1 AND isdeleted = 0 ORDER BY lastmodifieddate DESC ; ALTER TABLE task_temp DROP COLUMN rnk ; ``` ## Limitation in Trino(Presto) on Multiple Updates Multiple-statement execution is not guarded by a transaction, therefore never write multiple update operations in a single job. For example, if you write two or more INSERT INTO statements in a single job, it may produce duplicated records: ``` # I1 INSERT INTO mytable SELECT . . . ; # I2 INSERT INTO mytable SELECT . . . ; ``` If the system finds a sporadic failure during the job execution, it will start the job from the scratch, so the actual execution could be I1 (success), I2 (failure), (retry from the scratch), I1 (success) and I2 (success). In this case, I1 was executed twice and I2 was executed only once. The result is duplicated records in the target table. ## Improving Performance of Very Large Trino(Presto) Queries You can parallelize the query result output process by using the CREATE TABLE AS SELECT statement. If you DROP the table before running the query, your performance is significantly better. The result output performance will be 5x faster than running SELECT *. Treasure Data Trino(Presto) skips the JSON output process and directly produces a 1-hour partitioned table. Without using DROP TABLE, Trino(Presto) uses JSON text to materialize query results. And if the result table contains 100GB of data, the coordinator transfers more than 100GB of JSON text to save the query result. So, even if the query computation is almost finished, output of theJSON results takes a long time. To clean up the result table beforehand: 1. Add a DROP TABLE statement at the top of your query. 2. Use CREATE TABLE (table) AS SELECT … For example, your query might look like this: ``` DROP TABLE IF EXISTS my_result; CREATE TABLE my_result AS SELECT * FROM my_table; ``` ## Supported Time Formats in TD TIME FORMAT UDF This is a summary of the time formats supported by the `TD_TIME_FORMAT_UDF` udf. **Formats** | Syntax | Date or Time Component | Presentation | Examples | | --- | --- | --- | --- | | G | Era designator | Text | AD | | yyyy | Year | Year | 1996 | | yy | Year | Year (2 digits) | 96 | | MMMM | Month in year | Month long name | July | | MMM | Month in year | Month short name | Jul | | MM, M | Month in year | Month number | 07 | | ww, w | Week in year | Number | 06 | | DDD, DD, D | Day in year | Number | 189 | | dd, d | Day in month | Number | 10 | | EEEE | Day in week | Text | Tuesday | | E, EEE | Day in week | Text (short form) | Tue | | a | Am/pm marker | Text | PM | | HH, H | Hour in day (0-23) | Number | 00 | | kk, k | Hour in day (1-24) | Number | 24 | | KK, K | Hour in AM/PM (0-11) | Number | 00 | | hh, h | Hour in AM/PM (1-12) | Number | 12 | | mm, m | Minute in hour | Number | 30 | | ss, s | Second in minute | Number | 55 | | SSS, SS, S | Millisecond | Number | 978 | | wyear | Week of the year | Number | TD_TIME_FORMAT(1582859260, 'wyear' TD_TIME_FORMAT(1582859260, 'wyear', 'UTC')) | | zzzz | Time zone | Zone long name | Pacific Standard Time, or GMT+01:00 | | z | Time zone | Zone short name | PST, or GMT+01:00 | | Z | Time zone | Zone offset | -0800 | | u | Day number of week (1-7) | Number | 1 (for Monday) | **Example** This example formats a UNIX timestamp into a date formatted string. You can use the sample_datasets database for example to try it out: **SQL** ``` SELECT time, TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:mm:ss z') FROM www_access LIMIT 1 ; ``` **Result** ``` 2012-01-01 00: 00: 00 UTC ``` ## SQL DDL Command Differences in Hive and Trino(Presto) Data Definition Language (DDL) is a concept in SQL that can be used to create and modify the structure of database objects or schema objects. You can use DDL statements to create, manage, and delete tables, indexes, users, and other schema objects. As it pertains to data-processing, Trino(Presto) is optimized for quickly processing smaller volumes of information while Hive is better for querying more expansive databases requiring large amounts of memory. Both Hive and Trino(Presto) use DDL commands but in slightly different ways. The following table illustrates DDL commands required for Hive and Trino(Presto). **SQL DDL Commands and Their Equivalents** | SQL DDL Commands | Hive SQL DDL | Trino(Presto) SQL DDL | Query Definition | | --- | --- | --- | --- | | ALTER | ALTER | ALTER | Modifies the structure of an existing table. | | CREATE | CREATE | CREATE | Creates a database or objects including tables, views, stored procedures, etc. | | DROP | DROP | DROP | Deletes an existing database or an object within a database. | | TRUNCATE | TRUNCATE | DELETE | Removes all the records from a table. | ## Trino(Presto) Query Signatures You can view query signatures from the Trino(Presto) Utilization dashboard and use the query signatures to identify and group similar queries. The Trino(Presto) Utilization dashboard is available upon request for most customers. Trino(Presto) query signatures are clusters of Trino(Presto) queries share common SQL statements. Similar jobs, jobs with only a few differences (such as the date range computed), are grouped together. You can then analyze the clusters of queries and determine which queries require the most or least resources in a given month. The Trino(Presto) Query Details table, which is posted on the Trino(Presto) Utilization dashboard, includes a table column: Query Names or Signature. - [Query Signature Definitions for Common Letters](#query-signature-definitions-for-common-letters) - [Example of Query Signatures](#example-of-query-signatures) ### Query Signature Definitions for Common Letters Query Signatures are subject to change. The following list defines common query letters that help to decipher the query that the query signature represents: | Query Letter | Description | | --- | --- | | S | Select `|S\[\* \]` means SELECT `\*` is used (retrieving all columns. This argument might be inefficient because our storage is columnar format.) | | Cnt | count(x) | | CntD | count(distinct x) (If query has this signature, it means a single node process = slow!) | | T | table | | G | Group by | | J | Join | | O | Order by (single node process = slow!) | | L | Limit | | CT | Create table | | I | Insert Into | | DT | Delete table | | D | Delete | | E | Duplicate Elimination (select distinct, union, etc.) | | U | Union All | | E(U(…)) | Union (with duplicate elimination) | ### Example of Query Signatures The following example shows SQL statements mapped to query signatures: | Query Signature | Results | | --- | --- | | SELECT 1 | `S(V)` | | SELECT * FROM t | `S * t - > #` | | INSERT INTO A SELECT * FROM B | `B I(``S *``) B - > A` | | SELECT * FROM t, ‘query’ | `S[*](T)` | | SELECT * FROM t, ‘table’ | `t - > #` | ## Trino(Presto) Queries What follows is not a complete set of SQL statements. These are just some basic SQL functions that you might use in your queries. Info Before you begin writing your queries, review [Trino(Presto) Query Runtime Limits and Query Hint Override](/en/tools/presto/presto_troubleshooting#troubleshooting-presto-query-runtime-limits-and-query-hint-override). ### Select Data You can use the SELECT statement to select data from a database. The data returned is stored in a results table, called the result-set. SELECT … FROM … WHERE … LIMIT **Example** ``` SELECT contact_id, last_name, first_name FROM contacts WHERE website = "Company_Organization.com" ORDER BY contact_id DESC LIMIT 5; ``` ### Aggregate Data Use aggregate functions to perform a calculation on one or more values and return a single value. #### GROUP BY Use the `GROUP BY` command to group the result set (used with aggregate functions: COUNT, MAX, MIN, SUM, AVG). **Example** ``` SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country; ``` #### MIN()/MAX() The MIN() function returns the smallest value of the selected column. **Example** ``` SELECT MIN(columnname) FROM table_name WHERE condition; ``` The MAX() function returns the largest value of the selected column. **Example** ``` SELECT MAX(column_name) FROM table_name WHERE condition; ``` #### MIN_BY/MAX_BY `MIN_BY` takes two arguments and returns the value of the first argument for which the value of the second argument is minimized. ``` SELECT MIN_BY(close_date, close_value) as date_of_min_sale FROM sales_pipeline ``` `MAX_BY` takes two arguments and returns the value of the first argument for which the value of the second argument is maximized. #### APPROX_DISTINCT `APPROX_DISTINCT` calculates an approximate count of the number of distinct values. **Example** ``` SELECT APPROX_DISTINCT(close_value) FROM sales_pipeline ``` #### APPROX_PERCENTILE `APPROX_PRECENTILE` calculates the value at a given percentile of a distribution of values. **Example** ``` SELECT APPROX_PERCENTILE(close_value, 0.75) FROM sales_pipeline ``` ### Work with Time Ranges #### Choose a Target Data Time Range Use [TD_TIME_RANGE](/en/tools/presto/api#td_time_range) and [TD_INTERVAL](/en/tools/presto/api#td_interval). #### Format Date Time Strings Use Use [TD_TIME_FORMAT](/en/tools/presto/api#td_time_format) and [TD_TIME_STRING](/en/tools/presto/api#td_time_string). #### Truncate Time into Hour, Day, and Month Use [TD_DATE_TRUNC](/en/tools/presto/api#td_date_trunc) #### Specify a Query Target Range Use [TD_SCHEDULED_TIME](/en/tools/presto/api#td_scheduled_time) ### Create and Update Tables from SQL Results The following SQL statements can help you create and update tables in Trino(Presto). #### CREATE TABLE … AS Create a new table containing the result of a select query. Use CREATE TABLE to create an empty table. You might want to use these optional clauses: - `IF NOT EXISTS` clause causes the error to be suppressed if the table already exists. - `WITH` clause can be used to set properties on the newly created table. **Example** ``` CREATE TABLE [ IF NOT EXISTS ] table_name [ ( column_alias, ... ) ] [ COMMENT table_comment ] [ WITH ( property_name = expression [, ...] ) ] AS query [ WITH [ NO ] DATA ] ``` #### INSERT INTO Insert data into an existing table. The following example uses `person` as the name of the table. **Example** ``` insert into person values (1, 'james', 10, '20190301'); ``` ### Delete Records Delete rows from a table. If you specify a `WHERE` clause, only the matching rows are deleted. Otherwise, all rows from the table are deleted. The WHERE clause also helps reduce unintentional DELETE commands. DELETE statements are helpful for: - Cleaning up error records -Removing duplicate records - Removing unnecessary or unexpected results from result tables #### Limitations Review [Delete Statement Syntax](/en/tools/presto/api#delete-statement-syntax) and known limitations for the [Trino(Presto) Delete Statement](/en/tools/presto/td_presto_api_release_info#delete-statement). #### Syntax ``` DELETE FROM [ WHERE ] ``` Where: - is the name of the table for deletion - `` is the clause to specify a search condition **Examples** Delete all products shipped by boat. ``` DELETE FROM products WHERE shipmode = 'BOAT'; ``` Delete all products for low priority orders. ``` DELETE FROM products WHERE orderkey IN (SELECT orderkey FROM orders WHERE priority = 'LOW'); ``` Delete all products. ``` DELETE FROM products; ``` ### Perform Window Functions Window functions perform calculations across rows of the query result. They run after the `HAVING` clause but before the `ORDER BY` clause. Invoking a window function requires special syntax using the `OVER` clause to specify the window. A window has these three components: - **Partition Specification**. Separates the input rows into different partitions. This is analogous to how the `GROUP BY` clause separates rows into different groups for aggregate functions. - **Ordering Specification**. Determines the order in which input rows are processed by the window function. - **Window Frame**. Specifies a sliding window of rows to be processed by the function for a given row. If the frame is not specified, it defaults to `RANGE UNBOUNDED PRECEDING, which is the same as `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This frame contains all rows from the start of the partition up to the last peer of the current row. For example, the following query ranks orders for each clerk by price. **Example** ``` SELECT orderkey, clerk, totalprice, rank() OVER (PARTITION BY clerk ORDER BY totalprice DESC) AS rnk FROM orders ORDER BY clerk, rnk ``` ### Access Array Type Data #### ARRAY_AGG You can use the aggregate function, array_agg, with ARRAY data types that allow arrays to be created as part of the SELECT list of a query. **Example** The following target table contains one row per employee and stores all phone numbers associated with an employee in an ARRAY type. ``` CREATE SET TABLE employeePhoneInfo (emp_id INTEGER, emp_name VARCHAR(30), emp_phone emp_phone_array); INSERT INTO employeePhoneInfo SELECT emp_id, emp_name, ARRAY_AGG(emp_phone, NEW emp_phone_array()) FROM employee GROUP BY emp_id,emp_name WHERE emp_id < 100; ``` #### AWS Lambda AWS Lambda is a compute service that lets you run code without provisioning or managing servers. **Example** This example creates an array (fruits). The `has_elements` flag identifies whether or not more elements need to be processed. ``` exports.handler = (event, context, callback) => var fruits = ['apple', 'orange', 'pinaple']; event.fruits = fruits; if (event.fruits.length > 0) event.has_elements = true; else event.has_elements = false; callback(null, event); ; ``` This example processes the array elements. We want to remove processed elements from the array. The `.shift()` method removes the first element from the array. ``` exports.handler = (event, context, callback) => let fruit = event.fruits.shift(); // Process array element console.log('Processing fruit: ' + fruit); // Array still has elements: if (event.fruits.length > 0) event.has_elements = true; else event.has_elements = false; // Log array elements (for demonstration purpose) console.log('Elements in array: ' + event.fruits); console.log('Array has more elements: ' + event.has_elements); callback(null, event); ; ``` ### Process Data #### Data Type Conversion Functions Trino(Presto) implicity converts numeric and character values to the correct type if such a conversion is possible. Trino(Presto) does not convert between character and numeric types. For example, a query that expects a `varchar` does not automatically convert a `bigint` value to an equivalent `varchar`. When necessary, values can be explicitly cast to a particular type. **Conversion Functions** `cast(value AS type) → type` Explicitly cast a value as a type. This can be used to cast a varchar to a numeric value type and vice versa. `try_cast(value AS type) → type` Like `cast()`, but returns null if the cast fails. #### String Functions You can write string functions to extract, find like, replace, and split data. Learn more about [Trino Expression Functions](https://trino.io/docs/current/functions/regexp.html). #### Process JSON Use JSON functions and operators process data. Learn more about [Trino JSON Functions and Operators](https://trino.io/docs/current/functions/json.html) #### Numbers (integer, float) There are many mathematical operators available in Trino(Presto) that you can use to process data. Here are a few you might find useful: - `ceil(_x_) → [same as input]` This is an alias for `ceiling()`. - `floor(x) → [same as input]` Returns `x` rounded down to the nearest integer. - `round(x, d) → [same as input]` Returns `x` rounded to `d` decimal places To review all mathematical operators, see [Trino(Presto) Documentation: Mathematical Functions and Operators](https://trino.io/docs/current/functions/math.html).