Skip to content

SQL Tips for Hive and Trino(Presto)

For tips specifically concerning Trino(Presto) performance, refer to Trino(Presto) Performance Tuning.

Use of PIVOT and UNPIVOT

You can use the PIVOT and UNPIVOT operators in standard SQL, Hive, and Trino(Presto).

The PIVOT operator transforms rows into columns. The UNPIVOT operator transforms columns into rows.

Example Tables

Example of vertical table (vtable)

uidkeyvalue
101c111
101c212
101c313
102c121
102c222
102c323

Example of horizontal table (htable)

uidc1c2c3
101111213
102212223

Standard SQL

The following examples show how you can use an SQL syntax query language.

PIVOT

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
    SELECT
      uid,
      to_map(
        KEY,
        VALUE
      ) kv
    FROM
      vtable
    GROUP BY
      uid
  ) t
uidc1c2c3
101111213
102212223

UNPIVOT

SELECT uid, 'c1' AS key, c1 AS value FROM htable
UNION ALL
SELECT uid, 'c2' AS key, c2 AS value FROM htable
UNION ALL
SELECT uid, 'c3' AS key, c3 AS value FROM htable
uidkeyvalue
101c111
102c121
101c212
102c222
101c313
102c323

Hive

The following examples show how you can use Hive.

PIVOT

Hive on Treasure Data supports to_map UDAF, which can generate Map type, and then transforms rows into columns. The general Hive function doesn’t offer the same support.

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, to_map(key, value) kv
  FROM vtable
  GROUP BY uid
) t
uidc1c2c3
101111213
102212223

UNPIVOT

LATERAL VIEW explode function transforms columns into rows.

SELECT
  t1.uid,
  t2.key,
  t2.value
FROM
  htable t1
LATERAL VIEW
  explode(
    MAP(
      'c1',
      c1,
      'c2',
      c2,
      'c3',
      c3
    )
  ) t2 AS KEY,
  VALUE
uidkeyvalue
101c111
101c212
101c313
102c121
102c222
102c323

Trino(Presto)

The following examples show how you can use Trino(Presto).

PIVOT

This SQL transforms rows into columns by map_agg function.

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
    SELECT
      uid,
      map_agg(
        KEY,
        VALUE
      ) kv
    FROM
      vtable
    GROUP BY
      uid
  ) t
uidc1c2c3
101111213
102212223

UNPIVOT

CROSS JOIN unnest function is similar to LATERAL VIEW explode function. It also transforms columns into rows.

SELECT
  t1.uid,
  t2.key,
  t2.value
FROM
  htable t1 CROSS
JOIN
  unnest(
    array['c1',
    'c2',
    'c3'],
    array[c1,
    c2,
    c3]
  ) t2(
    KEY,
    VALUE
  )
uidkeyvalue
101c111
101c212
101c313
102c121
102c222
102c323

Getting Data from Multiple Databases

From within in Hive and Trino(Presto), you can create a single query to obtain data from several databases or analyze data in different databases.

Example Databases and Tables

  • Example of animals table in zoo_a database
idnamesex
1Lionm
2Gorillam
3Zebraf
4Giraffef
  • Example of emp animals in zoo_b database
animal_idnamesex
101Lionm
102Bearf
103Elephantf
104Gorillaf
105Tigerm
106Monkeym
107Rhinocerosf

Use of Fully Qualified Table Names

If you use the fully qualified name of a table in the format database.table, you can query and process data from multiple databases.

SELECT
  name,
  sex
FROM
  zoo_a.animals
UNION
SELECT
name,
sex
FROM
zoo_b.animals
namesex
Lionm
Giraffef
Bearf
Monkeym
Gorillam
Gorillaf
Elephantf
Rhinocerosf
Tigerm
Zebraf

Treasure Data Native Data Types

Because Treasure Data supports so many different sources and targets of data, Treasure Data uses a specific set of primitive data types native to our platform.

TD processes queries using different processing engines (Trino(Presto) and Hive). Each engine has its own data type system, and the TD native data types map to types available in the query engine.

When read from and written to the underlying storage layer in TD, which uses MessagePack mpc1 format, these TD types map to MessagePack's formats as well.

Schema relation

Treasure DataTrino(Presto)Hive
intbigintsmallint
intbigintint
longbigintbigint
doubledoubledecimal
floatdoublefloat
doubledoubledouble
Convert to string or intbooleanboolean
stringvarcharstring or varchar
string or Convert to longdatestring
string or Convert to longtimestamptimestamp

Our storage stores data as MessagePack mpc1 format. They support INT as:

positive fixintuint16int16
negative fixintunit32int32
uint8int8

For example, the Hive query result for ResultWorker a float data type becomes a double data type.

TD Primitive Data TypesHive Query Result for ResultWorkerTrino(Presto) Result msgpack ValueType
intint > intInteger
longbigint > longinteger
floatfloat > doublefloat
doubledouble > doublefloat
stringstring > stringstring