The databricks> operator executes SQL statements on Databricks SQL warehouses.
_export:
  databricks:
    host: my-workspace.cloud.databricks.com
    warehouse_id: abc123def456
    catalog: my_catalog
    schema: my_schema

+query_from_file:
  databricks>: queries/query.sql

+query_with_parameters:
  databricks>:
  statement: select * from my_table where column1 = :value1 and column2 = :value2
  parameters:
    - name: value1
      value: some_value
    - name: value2
      value: 123
      type: INT

+store_query_results:
  databricks>:
  statement: select count(*) as total_count from my_table
  store_last_results: true

+send_email:
  mail>: body.txt
  to: [me@example.com]
  subject: Total count is ${databricks.last_results.total_count}

The following authentication credentials are set as secrets. If you don't know how to set secrets, refer to Managing Workflow Secret.
databricks.pat: STRING
The Databricks personal access token used for authentication. Use this when authentication_method: pat is specified (the default).
databricks.client_id: STRING
databricks.client_secret: STRING
OAuth client credentials for Databricks authentication. Use these when authentication_method: oauth is specified.
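As a sketch, a task that authenticates with OAuth instead of a personal access token only needs authentication_method: oauth; the client credentials stay in secrets and never appear in the workflow file. The task name check_connection and the host and warehouse ID values are placeholders, and databricks.client_id and databricks.client_secret are assumed to have been registered as secrets beforehand.

```yaml
# Connection settings shared by all databricks> tasks (placeholder values).
_export:
  databricks:
    host: my-workspace.cloud.databricks.com
    warehouse_id: abc123def456

# Hypothetical task: switches this task to OAuth authentication.
# databricks.client_id and databricks.client_secret are read from secrets,
# so they are intentionally not written here.
+check_connection:
  databricks>:
  statement: select current_version()
  authentication_method: oauth
```

Setting authentication_method on the task keeps the choice visible next to the statement it applies to; omitting it falls back to the default, pat.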
databricks>: FILE.sql
Path to a query template file. This file can contain ${...} syntax to embed variables.
Examples:
databricks>: queries/my_query.sql
statement: STRING
The SQL statement to execute. Use this to specify the statement inline instead of in a file. This parameter can contain ${...} syntax to embed variables.
Examples:
statement: select current_version()
host: STRING
The hostname of your Databricks workspace (without https://). Required.
Examples:
host: my-workspace.cloud.databricks.com
warehouse_id: STRING
The ID of the SQL warehouse to use for query execution. Required.
Examples:
warehouse_id: abc123def456ghi789
catalog: STRING
The catalog to use for the query. If not specified, the warehouse's default catalog is used.
Examples:
catalog: my_catalog
schema: STRING
The schema to use for the query. If not specified, the warehouse's default schema is used.
Examples:
schema: my_schema
authentication_method: pat | oauth
The authentication method to use. Default: pat.
Examples:
authentication_method: pat
authentication_method: oauth
parameters: LIST
List of parameters to bind to the query using Databricks parameter binding syntax (:parameter_name). Each parameter must have name and value fields and may have an optional type field.
Required fields:
name: Parameter name (must match the :parameter_name in your SQL)
value: Parameter value as a string
Optional fields:
type: Parameter type hint. If it is not specified, STRING is used. Refer to the Databricks documentation for supported types.
Examples:
parameters:
  - name: user_id
    value: "12345"
    type: STRING
  - name: min_score
    value: "100"
    type: INT
  - name: is_active
    value: "true"
    type: BOOLEAN
store_last_results: BOOLEAN
Whether to store the results of the SQL statement in the store params. Default: false.
If true, the first row of the result set is stored in the databricks.last_results parameter as an object whose keys are the column names. Individual column values can be accessed using ${databricks.last_results.column_name} syntax.
Examples:
store_last_results: true
_timeout: DURATION
The maximum time to wait for the task to complete. If the timeout is reached, the statement execution is canceled.
Examples:
_timeout: 30m
After the statement runs, the operator sets the following output parameters:
databricks.last_statement.id
The statement ID of the executed query.
Examples:
01f0b498-9c6e-1b84-80a6-d4216951d152
databricks.last_statement.num_records
The number of records returned by the query.
Examples:
1523
databricks.last_results
The first row of the query results as an object. This is set only when store_last_results: true is specified.
For example, if the query select count(*) AS total_count from my_table returns a single row with a column named total_count, you can access the value as ${databricks.last_results.total_count}.
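The sketch below stores a row with two columns and reads both them and the statement ID in a later task. The task names, the updated_at column, and the use of echo> to print the values are illustrative assumptions, not part of the operator.

```yaml
# Hypothetical task: stores the first (and only) row of an aggregate query.
+summarize:
  databricks>:
  statement: select count(*) as total_count, max(updated_at) as last_updated from my_table
  store_last_results: true

# Each column of the stored row is addressed by its column name.
# The message is quoted because it contains ": ", which YAML would
# otherwise try to parse as a mapping.
+report:
  echo>: "Statement ${databricks.last_statement.id}: ${databricks.last_results.total_count} rows, last updated ${databricks.last_results.last_updated}"
```

Aliasing each aggregate with "as" gives the stored object predictable keys; without an alias, the key would be whatever column name Databricks generates for the expression.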
The operator uses the Databricks REST API (Statement Execution).
The databricks> operator does not support multiple statements in a single task. Due to a limitation of the Databricks Statement Execution API, each query must contain exactly one SQL statement.
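For instance, work that would otherwise be a DELETE followed by an INSERT can be written as two sequential tasks with one statement each. The task names, my_staging_table, and the event_date filter are hypothetical.

```yaml
# Each databricks> task runs exactly one SQL statement.
# Digdag runs sibling tasks one after another unless _parallel is set.
+delete_old_rows:
  databricks>:
  statement: delete from my_table where event_date < '2024-01-01'

+insert_new_rows:
  databricks>:
  statement: insert into my_table select * from my_staging_table
```

Because the two statements are executed separately, a failure in the second task does not undo the first, so design each step so that it can be retried safely.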
If store_last_results: true is specified, the first row must be smaller than 4 MiB in total and each column value must be smaller than 1 MiB; otherwise, the task fails.
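One way to stay within these limits is to select only the columns later tasks need, instead of storing a wide row with select *. In this sketch the error_log table and its id, message, and created_at columns are hypothetical; substr caps the stored text at 1,000 characters, and order by with limit 1 makes explicit which row is stored.

```yaml
+latest_error:
  databricks>:
  # Folded scalar (>-): the lines below are joined into one SQL statement.
  statement: >-
    select id, substr(message, 1, 1000) as message_head
    from error_log
    order by created_at desc
    limit 1
  store_last_results: true
```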