Skip to content

databricks>: Execute SQL statements on Databricks

databricks> operator executes SQL statements on Databricks SQL warehouses.

_export:
  databricks:
    host: my-workspace.cloud.databricks.com
    warehouse_id: abc123def456
    catalog: my_catalog
    schema: my_schema

+query_from_file:
  databricks>: queries/query.sql

+query_with_parameters:
  databricks>:
  statement: select * from my_table where column1 = :value1 and column2 = :value2
  parameters:
    - name: value1
      value: some_value
    - name: value2
      value: 123
      type: INT

+store_query_results:
  databricks>:
  statement: select count(*) as total_count from my_table
  store_last_results: true

+send_email:
    mail>: body.txt
    to: [me@example.com]
    subject: Total count is ${databricks.last_results.total_count}

Secrets

When you don't know how to set secrets, please refer to Managing Workflow Secret

Personal Access Token Authentication (Default)

  • databricks.pat: STRING

    The Databricks Personal Access Token to use for authentication. Use this when authentication_method: pat is specified.

OAuth Authentication M2M

  • databricks.client_id: STRING

  • databricks.client_secret: STRING

    OAuth client credentials for Databricks authentication. Use these when authentication_method: oauth is specified.

Options

  • databricks>: FILE.sql

    Path to a query template file. This file can contain ${...} syntax to embed variables.

    Examples:

  databricks>: queries/my_query.sql
  • statement: STRING

    The SQL statement to be executed. Use this to specify the SQL statement inline instead of using a file. This parameter can contain ${...} syntax to embed variables.

    Examples:

  statement: select current_version()
  • host: STRING

    The hostname of your Databricks workspace (without https://). Required.

    Examples:

  host: my-workspace.cloud.databricks.com
  • warehouse_id: STRING

    The ID of the SQL warehouse to use for query execution. Required.

    Examples:

  warehouse_id: abc123def456ghi789
  • catalog: STRING

    The catalog to use for the query. If not specified, uses the warehouse's default catalog.

    Examples:

  catalog: my_catalog
  • schema: STRING

    The schema to use for the query. If not specified, uses the warehouse's default schema.

    Examples:

  schema: my_schema
  • authentication_method: pat | oauth

    The authentication method to use. Default: pat.

    Examples:

  authentication_method: pat
  authentication_method: oauth
  • parameters: LIST

    List of parameters to bind to the query using Databricks parameter binding syntax (:parameter_name). Each parameter must have name and value fields, with an optional type field.

    Required fields:

    • name: Parameter name (must match the :parameter_name in your SQL)
    • value: Parameter value as a string

    Optional fields:

    • type: Parameter type hint. If it is not specified, STRING is used. Please refer to Databricks documentation for supported types. Examples:
  parameters:
    - name: user_id
      value: "12345"
      type: STRING
    - name: min_score
      value: "100"
      type: INT
    - name: is_active
      value: "true"
      type: BOOLEAN
  • store_last_results: BOOLEAN

    Whether to store the results of the SQL statement in the store params. Default: false.

    If true, the first row of the result set is stored in the databricks.last_results parameter as an object, where column names serve as the keys. Individual column values can be accessed using ${databricks.last_results.column_name} syntax.

    Examples:

  store_last_results: true
  • _timeout: DURATION

    The maximum time to wait for the task to complete before timing out. The statement execution is canceled if the timeout is reached.

    Examples:

  _timeout: 30m

Output parameters

  • databricks.last_statement.id

    The statement ID of the executed query.

    Examples:

  01f0b498-9c6e-1b84-80a6-d4216951d152
  • databricks.last_statement.num_records

    The number of records returned by the query.

    Examples:

  1523
  • databricks.last_results

    The first row of the query results as an object. This is only set when store_last_results: true is specified.

    For example, if the query select count(*) AS total_count from my_table returns a single row with a column named total_count, you can access it as ${databricks.last_results.total_count}.

Limitations

The operator uses the Databricks REST API - Statement Execution

The databricks> operator does not support multiple statements in a single task. Each query should contain only one SQL statement due to Databricks limitation.

If store_last_results: true is specified, the size of first row must be less than 4MiB and each column must be less than 1MiB. Otherwise, the task will fail.