
bq_load>: Importing Data into Google BigQuery

The bq_load> operator can be used to import data into Google BigQuery tables.

_export:
  bq:
    dataset: my_dataset

+ingest:
  bq_load>: gs://my_bucket/data.csv
  destination_table: my_data

+process:
  bq>: queries/process.sql
  destination_table: my_result
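
Options for the load (described below) are set as additional keys on the same task. For example, a task that appends gzipped CSV files to an existing table might look like the following sketch (bucket, dataset, and table names are placeholders):

+ingest_csv:
  bq_load>: gs://my_bucket/logs_*.csv.gz
  destination_table: my_dataset.my_table
  source_format: CSV
  skip_leading_rows: 1
  write_disposition: WRITE_APPEND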

Secrets

This operator authenticates to Google Cloud using the gcp.credential secret. If you are not sure how to set secrets, refer to Managing Workflow Secrets.

Options

  • bq_load>: URI | LIST

    A URI or list of URIs identifying files in GCS to import.

    Examples:

  bq_load>: gs://my_bucket/data.csv
  bq_load>:
    - gs://my_bucket/data1.csv.gz
    - gs://my_bucket/data2_*.csv.gz
  • dataset: NAME

    The dataset that the destination table is located in or should be created in. Can also be specified directly in the table reference.

    Examples:

  dataset: my_dataset
  dataset: my_project:my_dataset
  • destination_table: NAME

    The table to store the imported data in.

    Examples:

  destination_table: my_result_table
  destination_table: some_dataset.some_table
  destination_table: some_project:some_dataset.some_table

You can append a date in $YYYYMMDD form to the end of the table name to store data in a specific partition. See the Creating and Updating Date-Partitioned Tables document for details.

  destination_table: some_dataset.some_partitioned_table$20160101
  • location: LOCATION

    The location where the job should run. The source GCS bucket and the table must be in this location. See BigQuery locations for a list of available locations.

    Examples:

  location: asia-northeast1
  • project: NAME

    The project that the table is located in or should be created in. Can also be specified directly in the table reference or the dataset parameter.
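
    Example (my_project is a placeholder project ID):

  project: my_project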

  • source_format: CSV | NEWLINE_DELIMITED_JSON | AVRO | DATASTORE_BACKUP

    The format of the files to be imported. Default: CSV.

    Examples:

  source_format: CSV
  source_format: NEWLINE_DELIMITED_JSON
  source_format: AVRO
  source_format: DATASTORE_BACKUP
  • field_delimiter: CHARACTER

    The separator used between fields in CSV files to be imported. Default: ,.

    Examples:

  field_delimiter: '\t'
  • create_disposition: CREATE_IF_NEEDED | CREATE_NEVER

    Specifies whether the destination table should be automatically created when performing the import.

    • CREATE_IF_NEEDED: (default) The destination table is created if it does not already exist.
    • CREATE_NEVER: The destination table must already exist, otherwise the import will fail.

    Examples:

  create_disposition: CREATE_IF_NEEDED
  create_disposition: CREATE_NEVER
  • write_disposition: WRITE_TRUNCATE | WRITE_APPEND | WRITE_EMPTY

    Specifies whether to permit importing data to an already existing destination table.

    • WRITE_TRUNCATE: If the destination table already exists, any data in it will be overwritten.
    • WRITE_APPEND: If the destination table already exists, any data in it will be appended to.
    • WRITE_EMPTY: (default) The import fails if the destination table already exists and is not empty.

    Examples:

  write_disposition: WRITE_TRUNCATE
  write_disposition: WRITE_APPEND
  write_disposition: WRITE_EMPTY
  • skip_leading_rows: INTEGER

    The number of leading rows to skip in CSV files to import. Default: 0.

    Examples:

  skip_leading_rows: 1
  • encoding: UTF-8 | ISO-8859-1

    The character encoding of the data in the files to import. Default: UTF-8.

    Examples:

  encoding: ISO-8859-1
  • quote: CHARACTER

    The quote character of the data in the files to import. Default: '"'.

    Examples:

  quote: ''
  quote: "'"
  • max_bad_records: INTEGER

    The maximum number of bad records to ignore before failing the import. Default: 0.

    Examples:

  max_bad_records: 100
  • allow_quoted_newlines: BOOLEAN

    Whether to allow quoted data sections that contain newline characters in a CSV file. Default: false.
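
    For example, to accept CSV fields that contain quoted newlines:

  allow_quoted_newlines: true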

  • allow_jagged_rows: BOOLEAN

    Whether to accept rows that are missing trailing optional columns in CSV files. Default: false.
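
    For example, to accept rows that are missing trailing optional columns:

  allow_jagged_rows: true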

  • ignore_unknown_values: BOOLEAN

    Whether to ignore extra values in data that are not represented in the table schema. Default: false.
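
    For example, to ignore extra values that have no matching column in the table schema:

  ignore_unknown_values: true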

  • projection_fields: LIST

    A list of names of Cloud Datastore entity properties to load. Requires source_format: DATASTORE_BACKUP.
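
    Example (the entity property names are illustrative):

  source_format: DATASTORE_BACKUP
  projection_fields:
    - user_id
    - created_at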

  • autodetect: BOOLEAN

    Whether to automatically infer options and schema for CSV and JSON sources. Default: false.
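
    For example, to let BigQuery infer the schema of a CSV or JSON source:

  autodetect: true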

  • schema_update_options: LIST

    A list of destination table schema updates that may be automatically performed when performing the import.

    Examples:

  schema_update_options:
    - ALLOW_FIELD_ADDITION
    - ALLOW_FIELD_RELAXATION
  • schema: OBJECT | STRING

    A table schema. It accepts an inline object, or a path to a JSON or YAML file.

    Example:

    You can write the schema directly within the .dig file.

    +step:
      bq_load>: gs://<bucket>/path/to_file
      ...
      schema:
        fields:
          - name: "name"
            type: "string"
          ...

  Or you can write it in an external file, either as JSON:

  {
    "fields": [
      {"name": "name", "type": "STRING"},
      ...
    ]
  }

  or as YAML:

  fields:
    - name: "name"
      type: "string"
    ...

Then specify the file path in the workflow. Supported formats are YAML and JSON. If the path's extension is .json, bq_load tries to parse the file as JSON; otherwise it is parsed as YAML.

+step:
  bq_load>: gs://<bucket>/path/to_file
  ...
  schema: path/to/schema.json
  # or
  # schema: path/to/schema.yml

Output parameters

  • bq.last_job_id

    The ID of the BigQuery job that performed this import.

    Note: The bq.last_jobid parameter is kept only for backward compatibility. Do not use it; it will be removed in a near-future release.
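
    Like other output parameters, bq.last_job_id can be referenced from a later task using ${...} syntax. A minimal sketch (the echo> task is only for illustration):

  +load:
    bq_load>: gs://my_bucket/data.csv
    destination_table: my_table

  +log_job:
    echo>: "BigQuery load job: ${bq.last_job_id}"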