SQL Tips for Hive and Trino(Presto)
For tips specifically concerning Trino(Presto) performance, refer to Trino(Presto) Performance Tuning.
Use of PIVOT and UNPIVOT
You can use the PIVOT and UNPIVOT operators in standard SQL, Hive, and Trino(Presto).
The PIVOT operator transforms rows into columns. The UNPIVOT operator transforms columns into rows.
Example Tables
Example of vertical table (vtable)
uid | key | value |
---|---|---|
101 | c1 | 11 |
101 | c2 | 12 |
101 | c3 | 13 |
102 | c1 | 21 |
102 | c2 | 22 |
102 | c3 | 23 |
Example of horizontal table (htable)
uid | c1 | c2 | c3 |
---|---|---|---|
101 | 11 | 12 | 13 |
102 | 21 | 22 | 23 |
Standard SQL
The following examples show how you can use an SQL syntax query language.
PIVOT
SELECT
uid,
kv['c1'] AS c1,
kv['c2'] AS c2,
kv['c3'] AS c3
FROM (
SELECT
uid,
to_map(
KEY,
VALUE
) kv
FROM
vtable
GROUP BY
uid
) t
uid | c1 | c2 | c3 |
---|---|---|---|
101 | 11 | 12 | 13 |
102 | 21 | 22 | 23 |
UNPIVOT
SELECT uid, 'c1' AS key, c1 AS value FROM htable
UNION ALL
SELECT uid, 'c2' AS key, c2 AS value FROM htable
UNION ALL
SELECT uid, 'c3' AS key, c3 AS value FROM htable
uid | key | value |
---|---|---|
101 | c1 | 11 |
102 | c1 | 21 |
101 | c2 | 12 |
102 | c2 | 22 |
101 | c3 | 13 |
102 | c3 | 23 |
Hive
The following examples show how you can use Hive.
PIVOT
Hive on Treasure Data supports to_map
UDAF, which can generate Map type, and then transforms rows into columns. The general Hive function doesn’t offer the same support.
SELECT
uid,
kv['c1'] AS c1,
kv['c2'] AS c2,
kv['c3'] AS c3
FROM (
SELECT uid, to_map(key, value) kv
FROM vtable
GROUP BY uid
) t
uid | c1 | c2 | c3 |
---|---|---|---|
101 | 11 | 12 | 13 |
102 | 21 | 22 | 23 |
UNPIVOT
LATERAL VIEW explode
function transforms columns into rows.
SELECT
t1.uid,
t2.key,
t2.value
FROM
htable t1
LATERAL VIEW
explode(
MAP(
'c1',
c1,
'c2',
c2,
'c3',
c3
)
) t2 AS KEY,
VALUE
uid | key | value |
---|---|---|
101 | c1 | 11 |
101 | c2 | 12 |
101 | c3 | 13 |
102 | c1 | 21 |
102 | c2 | 22 |
102 | c3 | 23 |
Trino(Presto)
The following examples show how you can use Trino(Presto).
PIVOT
This SQL transforms rows into columns by map_agg
function.
SELECT
uid,
kv['c1'] AS c1,
kv['c2'] AS c2,
kv['c3'] AS c3
FROM (
SELECT
uid,
map_agg(
KEY,
VALUE
) kv
FROM
vtable
GROUP BY
uid
) t
uid | c1 | c2 | c3 |
---|---|---|---|
101 | 11 | 12 | 13 |
102 | 21 | 22 | 23 |
UNPIVOT
CROSS JOIN unnest
function is similar to LATERAL VIEW explode
function. It also transforms columns into rows.
SELECT
t1.uid,
t2.key,
t2.value
FROM
htable t1 CROSS
JOIN
unnest(
array['c1',
'c2',
'c3'],
array[c1,
c2,
c3]
) t2(
KEY,
VALUE
)
uid | key | value |
---|---|---|
101 | c1 | 11 |
101 | c2 | 12 |
101 | c3 | 13 |
102 | c1 | 21 |
102 | c2 | 22 |
102 | c3 | 23 |
Getting Data from Multiple Databases
From within in Hive and Trino(Presto), you can create a single query to obtain data from several databases or analyze data in different databases.
Example Databases and Tables
- Example of animals table in zoo_a database
id | name | sex |
---|---|---|
1 | Lion | m |
2 | Gorilla | m |
3 | Zebra | f |
4 | Giraffe | f |
- Example of emp animals in zoo_b database
animal_id | name | sex |
---|---|---|
101 | Lion | m |
102 | Bear | f |
103 | Elephant | f |
104 | Gorilla | f |
105 | Tiger | m |
106 | Monkey | m |
107 | Rhinoceros | f |
Use of Fully Qualified Table Names
If you use the fully qualified name of a table in the format database.table
, you can query and process data from multiple databases.
SELECT
name,
sex
FROM
zoo_a.animals
UNION
SELECT
name,
sex
FROM
zoo_b.animals
name | sex |
---|---|
Lion | m |
Giraffe | f |
Bear | f |
Monkey | m |
Gorilla | m |
Gorilla | f |
Elephant | f |
Rhinoceros | f |
Tiger | m |
Zebra | f |
Treasure Data Native Data Types
Because Treasure Data supports so many different sources and targets of data, Treasure Data uses a specific set of primitive data types native to our platform.
TD processes queries using different processing engines (Trino(Presto) and Hive). Each engine has its own data type system, and the TD native data types map to types available in the query engine.
When read from and written to the underlying storage layer in TD, which uses MessagePack mpc1 format, these TD types map to MessagePack's formats as well.
Schema relation
Treasure Data | Trino(Presto) | Hive |
---|---|---|
int | bigint | smallint |
int | bigint | int |
long | bigint | bigint |
double | double | decimal |
float | double | float |
double | double | double |
Convert to string or int | boolean | boolean |
string | varchar | string or varchar |
string or Convert to long | date | string |
string or Convert to long | timestamp | timestamp |
Our storage stores data as MessagePack mpc1 format. They support INT as:
positive fixint | uint16 | int16 |
negative fixint | unit32 | int32 |
uint8 | int8 |
For example, the Hive query result for ResultWorker a float data type becomes a double data type.
TD Primitive Data Types | Hive Query Result for ResultWorker | Trino(Presto) Result msgpack ValueType |
---|---|---|
int | int > int | Integer |
long | bigint > long | integer |
float | float > double | float |
double | double > double | float |
string | string > string | string |