# databricks>: Execute SQL statements on Databricks

**databricks>** operator executes SQL statements on Databricks SQL warehouses.


```
_export:
  databricks:
    host: my-workspace.cloud.databricks.com
    warehouse_id: abc123def456
    catalog: my_catalog
    schema: my_schema

+query_from_file:
  databricks>: queries/query.sql

+query_with_parameters:
  databricks>:
  statement: select * from my_table where column1 = :value1 and column2 = :value2
  parameters:
    - name: value1
      value: some_value
    - name: value2
      value: 123
      type: INT

+store_query_results:
  databricks>:
  statement: select count(*) as total_count from my_table
  store_last_results: true

+send_email:
    mail>: body.txt
    to: [me@example.com]
    subject: Total count is ${databricks.last_results.total_count}
```

## Secrets

When you don't know how to set secrets, please refer to [Managing Workflow Secret](https://docs.treasuredata.com/articles/pd/about-workflow-secret-management)

### Personal Access Token Authentication (Default)

* **databricks.pat**: STRING
The Databricks Personal Access Token to use for authentication. Use this when `authentication_method: pat` is specified.


### OAuth Authentication M2M

* **databricks.client_id**: STRING
* **databricks.client_secret**: STRING
OAuth client credentials for Databricks authentication. Use these when `authentication_method: oauth` is specified.


## Options

* **databricks>**: FILE.sql
Path to a query template file. This file can contain `${...}` syntax to embed variables.
Examples:


```
  databricks>: queries/my_query.sql
```

* **statement**: STRING
The SQL statement to be executed. Use this to specify the SQL statement inline instead of using a file. This parameter can contain `${...}` syntax to embed variables.
Examples:


```
  statement: select current_version()
```

* **host**: STRING
The hostname of your Databricks workspace (without https://). *Required*.
Examples:


```
  host: my-workspace.cloud.databricks.com
```

* **warehouse_id**: STRING
The ID of the SQL warehouse to use for query execution. *Required*.
Examples:


```
  warehouse_id: abc123def456ghi789
```

* **catalog**: STRING
The catalog to use for the query. If not specified, uses the warehouse's default catalog.
Examples:


```
  catalog: my_catalog
```

* **schema**: STRING
The schema to use for the query. If not specified, uses the warehouse's default schema.
Examples:


```
  schema: my_schema
```

* **authentication_method**: pat | oauth
The authentication method to use. *Default*: `pat`.
Examples:


```
  authentication_method: pat
```


```
  authentication_method: oauth
```

* **parameters**: LIST
List of parameters to bind to the query using Databricks parameter binding syntax (`:parameter_name`). Each parameter must have `name` and `value` fields, with an optional `type` field.
**Required fields**:
  - `name`: Parameter name (must match the `:parameter_name` in your SQL)
  - `value`: Parameter value as a string
**Optional fields**:
  - `type`: Parameter type hint. If it is not specified, STRING is used. Please refer to [Databricks documentation](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-datatypes) for supported types.
Examples:


```
  parameters:
    - name: user_id
      value: "12345"
      type: STRING
    - name: min_score
      value: "100"
      type: INT
    - name: is_active
      value: "true"
      type: BOOLEAN
```

* **store_last_results**: BOOLEAN
Whether to store the results of the SQL statement in the store params. *Default*: `false`.
If true, the first row of the result set is stored in the `databricks.last_results` parameter as an object, where column names serve as the keys. Individual column values can be accessed using `${databricks.last_results.column_name}` syntax.
Examples:


```
  store_last_results: true
```

* **_timeout**: DURATION
The maximum time to wait for the task to complete before timing out. The statement execution is canceled if the timeout is reached.
Examples:


```
  _timeout: 30m
```

## Output parameters

* **databricks.last_statement.id**
The statement ID of the executed query.
Examples:


```
  01f0b498-9c6e-1b84-80a6-d4216951d152
```

* **databricks.last_statement.num_records**
The number of records returned by the query.
Examples:


```
  1523
```

* **databricks.last_results**
The first row of the query results as an object. This is only set when `store_last_results: true` is specified.
For example, if the query `select count(*) AS total_count from my_table` returns a single row with a column named `total_count`, you can access it as `${databricks.last_results.total_count}`.


## Limitations

The operator uses the [Databricks REST API - Statement Execution](https://docs.databricks.com/api/workspace/statementexecution)

The **databricks>** operator does not support multiple statements in a single task. Each query should contain only one SQL statement due to Databricks limitation.

If `store_last_results: true` is specified, the size of first row must be less than 4MiB and each column must be less than 1MiB. Otherwise, the task will fail.