# databricks>: Execute SQL statements on Databricks **databricks>** operator executes SQL statements on Databricks SQL warehouses. ``` _export: databricks: host: my-workspace.cloud.databricks.com warehouse_id: abc123def456 catalog: my_catalog schema: my_schema +query_from_file: databricks>: queries/query.sql +query_with_parameters: databricks>: statement: select * from my_table where column1 = :value1 and column2 = :value2 parameters: - name: value1 value: some_value - name: value2 value: 123 type: INT +store_query_results: databricks>: statement: select count(*) as total_count from my_table store_last_results: true +send_email: mail>: body.txt to: [me@example.com] subject: Total count is ${databricks.last_results.total_count} ``` ## Secrets When you don't know how to set secrets, please refer to [Managing Workflow Secret](https://docs.treasuredata.com/articles/pd/about-workflow-secret-management) ### Personal Access Token Authentication (Default) * **databricks.pat**: STRING The Databricks Personal Access Token to use for authentication. Use this when `authentication_method: pat` is specified. ### OAuth Authentication M2M * **databricks.client_id**: STRING * **databricks.client_secret**: STRING OAuth client credentials for Databricks authentication. Use these when `authentication_method: oauth` is specified. ## Options * **databricks>**: FILE.sql Path to a query template file. This file can contain `${...}` syntax to embed variables. Examples: ``` databricks>: queries/my_query.sql ``` * **statement**: STRING The SQL statement to be executed. Use this to specify the SQL statement inline instead of using a file. This parameter can contain `${...}` syntax to embed variables. Examples: ``` statement: select current_version() ``` * **host**: STRING The hostname of your Databricks workspace (without https://). *Required*. Examples: ``` host: my-workspace.cloud.databricks.com ``` * **warehouse_id**: STRING The ID of the SQL warehouse to use for query execution. *Required*. Examples: ``` warehouse_id: abc123def456ghi789 ``` * **catalog**: STRING The catalog to use for the query. If not specified, uses the warehouse's default catalog. Examples: ``` catalog: my_catalog ``` * **schema**: STRING The schema to use for the query. If not specified, uses the warehouse's default schema. Examples: ``` schema: my_schema ``` * **authentication_method**: pat | oauth The authentication method to use. *Default*: `pat`. Examples: ``` authentication_method: pat ``` ``` authentication_method: oauth ``` * **parameters**: LIST List of parameters to bind to the query using Databricks parameter binding syntax (`:parameter_name`). Each parameter must have `name` and `value` fields, with an optional `type` field. **Required fields**: - `name`: Parameter name (must match the `:parameter_name` in your SQL) - `value`: Parameter value as a string **Optional fields**: - `type`: Parameter type hint. If it is not specified, STRING is used. Please refer to [Databricks documentation](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-datatypes) for supported types. Examples: ``` parameters: - name: user_id value: "12345" type: STRING - name: min_score value: "100" type: INT - name: is_active value: "true" type: BOOLEAN ``` * **store_last_results**: BOOLEAN Whether to store the results of the SQL statement in the store params. *Default*: `false`. If true, the first row of the result set is stored in the `databricks.last_results` parameter as an object, where column names serve as the keys. Individual column values can be accessed using `${databricks.last_results.column_name}` syntax. Examples: ``` store_last_results: true ``` * **_timeout**: DURATION The maximum time to wait for the task to complete before timing out. The statement execution is canceled if the timeout is reached. Examples: ``` _timeout: 30m ``` ## Output parameters * **databricks.last_statement.id** The statement ID of the executed query. Examples: ``` 01f0b498-9c6e-1b84-80a6-d4216951d152 ``` * **databricks.last_statement.num_records** The number of records returned by the query. Examples: ``` 1523 ``` * **databricks.last_results** The first row of the query results as an object. This is only set when `store_last_results: true` is specified. For example, if the query `select count(*) AS total_count from my_table` returns a single row with a column named `total_count`, you can access it as `${databricks.last_results.total_count}`. ## Limitations The operator uses the [Databricks REST API - Statement Execution](https://docs.databricks.com/api/workspace/statementexecution) The **databricks>** operator does not support multiple statements in a single task. Each query should contain only one SQL statement due to Databricks limitation. If `store_last_results: true` is specified, the size of first row must be less than 4MiB and each column must be less than 1MiB. Otherwise, the task will fail.