# What is ID Unification?

## Overview

ID Unification is the process of stitching together multiple tables using various identifiers to assign a unique customer ID (`canonical_id` or `persistent_id`) to each user. In simpler terms, it consolidates identifiers like `cookie_id` and email addresses from various user data sources to identify and group "the same person."

Since customer data often contains different identifiers across different data sources, simply aggregating this data doesn't link these sources together. This necessitates the ID Unification process to make the data usable.

![Relationship Between Different Identifiers](/assets/1-1-1.7184d27f664ca062a998cd352bbf6e97db0fa2025ad688ced24fed94e1fd2f33.8cab2a52.png)

The above diagram illustrates the relationships between the IDs (identifiers) associated with users. Below, we outline common types of IDs that are unified, showing how the process integrates data to uniquely identify individuals.

## Types of IDs Linked to Users

### User-Associated IDs

These are identifiers issued by various services, such as membership IDs or email addresses used during registration.

**Examples:** `member_id`, `customer_id`, `email_address`

### Device-Associated IDs

These are IDs issued per device, such as ADID or IDFA. They are typically used when collecting application logs.

**Examples:** `ADID`, `IDFA`

### Browser-Associated IDs

These are cookie IDs issued per browser and per source (1st-party or 3rd-party). Having both 1st-party and 3rd-party `cookie_id` values is advantageous for stitching across data sources.

**Examples:** `cookie_id`, `td_ssc_id`

## ID Unification Feature Provided by Treasure Workflow

ID Unification is provided as a standard feature accessible to all users. To use it, you mainly need to prepare two files:

- A .dig file to invoke the unification workflow.
- A .yml file defining the data sources and stitching keys for ID Unification.


### .dig File to Invoke the Unification Workflow

The .dig file makes an HTTP call to invoke the Unification Workflow. This approach eliminates the need to download workflow code from GitHub and ensures all users receive updates simultaneously.


```yaml
+call_unification:
  http_call>: https://api-cdp.treasuredata.com/unifications/workflow_call
  headers:
    - authorization: ${secret:td.apikey}
  method: POST
  …
```

### .yml File for Defining Data Sources and Stitching Keys

The `.yml` file specifies the source tables and the keys used for stitching. While `.dig` files remain relatively consistent across use cases, `.yml` files depend entirely on the user's data structure and must be carefully written. Copy-pasting a template or another user's file won't suffice.


```yaml
name: test_id_unification_ex1

keys:
  - name: td_client_id
  - name: td_global_id

tables:
  - database: test_id_unification_ex1
    table: ex1_site_aaa
    key_columns:
      - {column: td_client_id, key: td_client_id}
      - {column: td_global_id, key: td_global_id}
…
```
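Because the `.yml` file is written by hand for each data structure, a quick consistency check can catch simple mistakes before a run. The following is a minimal sketch, not part of Treasure Data's tooling: it assumes the configuration has already been loaded into a Python dictionary that mirrors only the fragment shown above, and it assumes (as the example suggests, but the text does not state) that every `key` referenced under `key_columns` should also be declared under `keys`. The function name `undeclared_keys` is hypothetical.

```python
# Dictionary mirroring only the visible part of the example .yml file.
# The elided portion of the file is intentionally not reproduced here.
config = {
    "name": "test_id_unification_ex1",
    "keys": [{"name": "td_client_id"}, {"name": "td_global_id"}],
    "tables": [
        {
            "database": "test_id_unification_ex1",
            "table": "ex1_site_aaa",
            "key_columns": [
                {"column": "td_client_id", "key": "td_client_id"},
                {"column": "td_global_id", "key": "td_global_id"},
            ],
        }
    ],
}


def undeclared_keys(cfg):
    """Return (table, column, key) tuples whose key is not listed under `keys`.

    Hypothetical helper: assumes every key used in key_columns must be declared.
    """
    declared = {entry["name"] for entry in cfg["keys"]}
    missing = []
    for table in cfg["tables"]:
        for key_column in table["key_columns"]:
            if key_column["key"] not in declared:
                missing.append(
                    (table["table"], key_column["column"], key_column["key"])
                )
    return missing


problems = undeclared_keys(config)
print(problems if problems else "all stitching keys are declared")
```

An empty result only means the file is internally consistent under the stated assumption. It does not confirm that the referenced databases, tables, or columns exist.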

## Defining Inputs and Outputs of ID Unification

### Inputs of ID Unification

ID Unification requires enumerating:

- All tables with identifiers that can be stitched together.
- The identifiers (keys) within each table used for stitching.


These keys are used to traverse all tables and consolidate data.
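To make the idea of traversing tables by shared keys concrete, here is a minimal sketch that groups identifiers with a union-find structure. It illustrates the general stitching concept only and is not Treasure Data's actual implementation. The table names, the `email` key, and all identifier values are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: each table is a list of rows, and each row maps a
# key name to the identifier value observed in that row.
tables = {
    "site_aaa": [
        {"td_client_id": "c1", "td_global_id": "g1"},
        {"td_client_id": "c2", "td_global_id": "g1"},
    ],
    "site_bbb": [
        {"td_client_id": "c2", "email": "a@example.com"},
        {"td_client_id": "c9", "email": "z@example.com"},
    ],
}

parent = {}


def find(node):
    """Return the representative of the group that contains node."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving keeps chains short
        node = parent[node]
    return node


def union(a, b):
    """Merge the groups that contain a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a


# Identifiers that appear together in one row are treated as the same person.
for rows in tables.values():
    for row in rows:
        ids = [(key, value) for key, value in row.items() if value is not None]
        if ids:
            find(ids[0])  # register rows that carry a single identifier
        for other in ids[1:]:
            union(ids[0], other)

# Collect every identifier under its group representative.
groups = defaultdict(set)
for node in list(parent):
    groups[find(node)].add(node)

for members in groups.values():
    print(sorted(members))
```

In this sample data, `c1` and `a@example.com` never share a row, yet they end up in the same group because `g1` links `c1` to `c2`, and `c2` links to the email address. Each resulting group corresponds to one individual who would receive a single `canonical_id`.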

![InputIDU](/assets/1-2-1.76003422475e77a45ff177182010afb40f7febec4f823934f0b7f1384bcd1971.8cab2a52.png)

### Outputs of ID Unification

The most significant output of ID Unification is the assignment of a `canonical_id` to each identified individual. This process enriches the source tables, appending the `canonical_id` to facilitate further operations.

![Output](/assets/1-3-1.dc1953b799083cebc2f5b74c7e8e712562feefaf15b7d9a136a0fe6e1901dfbe.8cab2a52.png)

All stitched tables are output with the `canonical_id`, enabling table joins and analysis at the user level. For instance, using this ID as a join key allows unification of other tables for user-based aggregation and analysis.
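As a rough illustration of user-level aggregation, the sketch below combines two enriched tables by `canonical_id`. The table contents, column names other than `canonical_id`, and the ID values are hypothetical; in practice this would normally be a join in a query engine rather than in-memory Python.

```python
from collections import defaultdict

# Hypothetical enriched tables: every row already carries a canonical_id,
# even though the original identifiers differ between the two sources.
pageviews = [
    {"canonical_id": "u1", "td_client_id": "c1", "path": "/home"},
    {"canonical_id": "u1", "td_client_id": "c2", "path": "/cart"},
    {"canonical_id": "u2", "td_client_id": "c9", "path": "/home"},
]
purchases = [
    {"canonical_id": "u1", "email": "a@example.com", "amount": 30},
    {"canonical_id": "u2", "email": "z@example.com", "amount": 12},
    {"canonical_id": "u1", "email": "a@example.com", "amount": 8},
]

# Aggregate both tables per user, using canonical_id as the join key.
summary = defaultdict(lambda: {"pageviews": 0, "revenue": 0})
for row in pageviews:
    summary[row["canonical_id"]]["pageviews"] += 1
for row in purchases:
    summary[row["canonical_id"]]["revenue"] += row["amount"]

for canonical_id, stats in sorted(summary.items()):
    print(canonical_id, stats)
```

Without the shared `canonical_id`, the page views recorded under `c1` and `c2` and the purchases recorded under an email address could not be attributed to the same user.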

### ID Unification for Audience Studio

ID Unification is a fundamental tool for utilizing Audience Studio. It begins by enumerating all source tables (attribute_table, behavior_table) and the keys within these tables.

![Output](/assets/1-4-1.430f8c9e0310d067cc191e7eabfe97a1aed066d72319157fe17fc3663c19484b.8cab2a52.png)

### Outputs for Audience Studio

In addition to enriched source tables, Audience Studio outputs a master_table containing the `canonical_id`. This ensures that all tables needed for Master Segment creation (master_table, attribute_table, and behavior_table) are prepared with enriched data.

![Output](/assets/1-5-1.d0c33db0ffe3444e4ae8ca316cf734a30973069e2313e9859fe57b1c3fc3fdcf.8cab2a52.png)

## Workflow Examples

The workflow examples in this ID Unification documentation are published at [Treasure Boxes](https://github.com/treasure-data/treasure-boxes/tree/master/tool-box/id-unification-samples). You can try them out in your Treasure Data account.