# Multiple Canonical IDs

Up until the previous page, only one `canonical_id` was created by setting all the keys specified in `merge_by_keys:`. In this page, we introduce how to configure multiple `canonical_id`s.

We will use the data from [Learning canonical_id for Cookie ID](/en/tools/idu/p3_canonical_id_learning).

## Case1: Generating Multiple Canonical IDs Independently

By specifying multiple `canonical_id`s in `canonical_ids:`, it is possible to generate multiple independent `canonical_id`s simultaneously. However, only one `canonical_id` can be set as the Parent Segment, which is generally the one configured with all keys in `merge_by_keys:`.


```yaml
name: test_id_unification_ex3

keys:
  - name: td_client_id
    invalid_texts: ['']

  - name: td_global_id
    valid_regexp: "3rd_*"
    invalid_texts: ['']

tables:
  - database: test_id_unification_ex3
    table: site_aaa
    key_columns:
      - {column: td_client_id, key: td_client_id}
      - {column: td_global_id, key: td_global_id}

  - database: test_id_unification_ex3
    table: site_xxx
    key_columns:
      - {column: td_client_id, key: td_client_id}
      - {column: td_global_id, key: td_global_id}

  - database: test_id_unification_ex3
    table: site_yyy
    key_columns:
      - {column: td_client_id, key: td_client_id}
      - {column: td_global_id, key: td_global_id}

  - database: test_id_unification_ex3
    table: site_zzz
    key_columns:
      - {column: td_client_id, key: td_client_id}
      - {column: td_global_id, key: td_global_id}

canonical_ids:
  - name: brand_id_ax
    merge_by_keys: [td_client_id, td_global_id]
    source_tables: ['site_aaa', 'site_xxx']
    merge_iterations: 5

  - name: brand_id_yz
    merge_by_keys: [td_client_id, td_global_id]
    source_tables: ['site_yyy', 'site_zzz']
    merge_iterations: 5

  - name: unified_brand_id
    merge_by_keys: [td_client_id, td_global_id]
    merge_iterations: 5

master_tables:
  - name: master_table_ex3
    canonical_id: unified_brand_id

    attributes:    
      - name: td_client_id
        invalid_texts: ['']
        source_columns:
          - {table: 'site_aaa', order: first, order_by: td_client_id, priority: 1}
          - {table: 'site_xxx', order: first, order_by: td_client_id, priority: 1}
          - {table: 'site_yyy', order: first, order_by: td_client_id, priority: 1}
          - {table: 'site_zzz', order: first, order_by: td_client_id, priority: 1}                    
      - name: td_global_id
        valid_regexp: "3rd_*"
        invalid_texts: ['']
        source_columns:
          - {table: 'site_aaa', order: last, order_by: time, priority: 1}
          - {table: 'site_xxx', order: last, order_by: time, priority: 1}
          - {table: 'site_yyy', order: last, order_by: time, priority: 1}
          - {table: 'site_zzz', order: last, order_by: time, priority: 1}
```

`brand_id_ax` is created by performing unification on data from only two sites of the same brand, with the goal of assigning a `canonical_id` specific to this brand. Similarly, `brand_id_yz` is configured for another brand. Additionally, a `unified_brand_id` spanning all brands is created and set for the `master_table`.

In this configuration, the following three `canonical_id`s will execute their unification algorithms in parallel:

- `brand_id_ax`
- `brand_id_yz`
- `unified_brand_id`


Below are the final graphs for each unification algorithm.

### brand_id_ax_graph

![canonical_id_1](/assets/brand_id_ax_graph.e62af58fe1b93aa1dd8021ac723b64ea300c19ff10a957e62c9f174e731aaa90.1bb9a569.png)

### brand_id_yz_graph

![canonical_id_1](/assets/brand_id_yz_graph.2c175874c18443f6e8be9e707fe78bb7c3583c1726273c02bf3abfb915a79d30.1bb9a569.png)

### unified_brand_id_graph

![canonical_id_1](/assets/unified_brand_id_graph.4cd799a8089317c224d66028f75eca9bb3880e4083474c5773b4da49f6c7ff1e.1bb9a569.png)

### About the Configuration of `source_tables:`

In this example, `source_tables:` is specified as follows in the `canonical_ids:` configuration:


```yaml
canonical_ids:
  - name: brand_id_ax
    merge_by_keys: [td_client_id, td_global_id]
    source_tables: ['site_aaa', 'site_xxx']
    merge_iterations: 5

  - name: brand_id_yz
    merge_by_keys: [td_client_id, td_global_id]
    source_tables: ['site_yyy', 'site_zzz']
    merge_iterations: 5

  - name: unified_brand_id
    merge_by_keys: [td_client_id, td_global_id]
    merge_iterations: 5
```

In this setup, the unification algorithm uses only the keys from the tables specified in `source_tables:`. This configuration can be useful when you want to generate `canonical_id`s for specific brands or games in addition to `canonical_id`s spanning all data sources.

## Case2. Creating Master Tables for Each `canonical_id`

Next, we will introduce how to create a `master_table` for each `canonical_id`. Multiple `master_table` entries can be enumerated as shown below:


```yaml
master_tables:
  - name: master_table_ex3
    canonical_id: unified_brand_id

    attributes:
      - name: td_client_id
        invalid_texts: ['']
        source_columns:
          - {table: '${td.tbl_aaa}', order: first, order_by: td_client_id, priority: 1}
          - {table: '${td.tbl_xxx}', order: first, order_by: td_client_id, priority: 1}
          - {table: '${td.tbl_yyy}', order: first, order_by: td_client_id, priority: 1}
          - {table: '${td.tbl_zzz}', order: first, order_by: td_client_id, priority: 1}                    
      - name: td_global_id
        valid_regexp: "3rd_*"
        invalid_texts: ['']
        source_columns:
          - {table: '${td.tbl_aaa}', order: last, order_by: time, priority: 1}
          - {table: '${td.tbl_xxx}', order: last, order_by: time, priority: 1}
          - {table: '${td.tbl_yyy}', order: last, order_by: time, priority: 1}
          - {table: '${td.tbl_zzz}', order: last, order_by: time, priority: 1}                    

  - name: master_table_brand_id_ax
    canonical_id: brand_id_ax

    attributes:    
      - name: td_client_id
        invalid_texts: ['']
        source_columns:
          - {table: '${td.tbl_aaa}', order: first, order_by: td_client_id, priority: 1}
          - {table: '${td.tbl_xxx}', order: first, order_by: td_client_id, priority: 1}
      - name: td_global_id
        valid_regexp: "3rd_*"
        invalid_texts: ['']
        source_columns:
          - {table: '${td.tbl_aaa}', order: last, order_by: time, priority: 1}
          - {table: '${td.tbl_xxx}', order: last, order_by: time, priority: 1}

  - name: master_table_brand_id_yz
    canonical_id: brand_id_yz

    attributes:    
      - name: td_client_id
        invalid_texts: ['']
        source_columns:
          - {table: '${td.tbl_yyy}', order: first, order_by: td_client_id, priority: 1}
          - {table: '${td.tbl_zzz}', order: first, order_by: td_client_id, priority: 1}
      - name: td_global_id
        valid_regexp: "3rd_*"
        invalid_texts: ['']
        source_columns:
          - {table: '${td.tbl_yyy}', order: last, order_by: time, priority: 1}
          - {table: '${td.tbl_zzz}', order: last, order_by: time, priority: 1}
```

### `master_table`

The resulting `master_table` for each case would look like this:

#### `master_table_ex3`

| unified_brand_id | td_client_id | td_global_id |
|  --- | --- | --- |
| jiioKQC2uMFH | aaa_001 | 3rd_018 |


#### `master_table_brand_id_ax`

| brand_id_ax | td_client_id | td_global_id |
|  --- | --- | --- |
| jiioKQC2uMFH | aaa_001 | 3rd_005 |
| 0_LjXD4ScR1H | aaa_002 | 3rd_010 |
| A0T5jlq77XJH | aaa_003 | 3rd_017 |


### `master_table_brand_id_yz`

| brand_id_yz | td_client_id | td_global_id |
|  --- | --- | --- |
| NCnsgr0MusJH | yyy_001 | 3rd_001 |
| UA9xZODmd_pH | yyy_002 | 3rd_008 |
| lb8seLF7YZBH | yyy_003 | 3rd_012 |
| jaUfFmYfVjJH | yyy_004 | 3rd_018 |
| nBTjvWWxqydH | zzz_001 | 3rd_002 |


### `enriched_ Table`

Even if multiple `master_table`s are created, only one `enriched_table` is generated per source table. The key difference from Case1 is that all `canonical_id`s are enriched into the source tables, allowing linkage with any `master_table`. However, for `brand_id_ax` and `brand_id_yz` specified under `source_tables`, enrichment is limited to the designated tables. The enriched tables are shown below:

#### `enriched_site_aaa`

| site | td_client_id | td_global_id | brand_id_ax | unified_brand_id |
|  --- | --- | --- | --- | --- |
| aaa.jp | aaa_001 | 3rd_001 | jiioKQC2uMFH | jiioKQC2uMFH |
| aaa.jp | aaa_001 | 3rd_002 | jiioKQC2uMFH | jiioKQC2uMFH |
| aaa.jp | aaa_001 | 3rd_003 | jiioKQC2uMFH | jiioKQC2uMFH |
| aaa.jp | aaa_001 | 3rd_004 | jiioKQC2uMFH | jiioKQC2uMFH |


Here, the same value appears for both `brand_id_ax` and `unified_brand_id`, as both were based on `aaa_001`, which happened to be the final leader, generating the respective `canonical_id`. However, the two `canonical_id`s are not inherently related.

#### `enriched_site_xxx`

| site | td_client_id | td_global_id | brand_id_ax | unified_brand_id | time |
|  --- | --- | --- | --- | --- | --- |
| xxx.jp |  |  |  |  | 1672876800 |
| xxx.jp |  |  |  |  | 1673740800 |
| xxx.jp |  |  |  |  | 1674604800 |
| xxx.jp | xxx_001 | 3rd_004 | jiioKQC2uMFH | jiioKQC2uMFH | 1675555200 |
| xxx.jp | xxx_001 | 3rd_005 | jiioKQC2uMFH | jiioKQC2uMFH | 1676419200 |


#### `enriched_site_yyy`

| site | td_client_id | td_global_id | brand_id_yz | unified_brand_id | time |
|  --- | --- | --- | --- | --- | --- |
| yyy.jp | yyy_001 | 3rd_001 | NCnsgr0MusJH | jiioKQC2uMFH | 1672876800 |
| yyy.jp |  |  |  |  | 1673740800 |
| yyy.jp |  |  |  |  | 1674604800 |
| yyy.jp |  |  |  |  | 1675555200 |
| yyy.jp | yyy_002 | 3rd_005 | UA9xZODmd_pH | jiioKQC2uMFH | 1676419200 |


#### `enriched_site_zzz`

| site | td_client_id | td_global_id | brand_id_yz | unified_brand_id | time |
|  --- | --- | --- | --- | --- | --- |
| zzz.jp |  |  |  |  | 1672876800 |
| zzz.jp | zzz_001 | 3rd_002 | nBTjvWWxqydH | jiioKQC2uMFH | 1673740800 |
| zzz.jp |  |  |  |  | 1674604800 |
| zzz.jp |  |  |  |  | 1675555200 |
| zzz.jp |  |  |  |  | 1676419200 |


## Case3. Generating a Unified `canonical_id` by Merging Multiple `canonical_id`s

By using `merge_by_canonical_ids:` in the configuration for each `canonical_id` within `canonical_ids:`, you can generate a unified `canonical_id` by merging other `canonical_id`s.


```yaml
canonical_ids:
  - name: brand_id_ax
    merge_by_keys: [td_client_id, td_global_id]
    source_tables: ['site_aaa', 'site_xxx']
    merge_iterations: 5

  - name: brand_id_yz
    merge_by_keys: [td_client_id, td_global_id]
    source_tables: ['site_yyy', 'site_zzz']
    merge_iterations: 5

  - name: unified_brand_id
    merge_by_keys: []
    merge_by_canonical_ids: [brand_id_ax, brand_id_yz]
    merge_iterations: 5
```

In this configuration, `unified_brand_id` is generated by merging `brand_id_ax` and `brand_id_yz`.

Since `unified_brand_id` is set as the `canonical_id` for the `master_table`, it must cover all source tables. Therefore, it is crucial to ensure that the combined `source_tables:` of `brand_id_ax` and `brand_id_yz` cover all necessary tables.

Although the `merge_by_keys:` for `unified_brand_id` is empty, the configuration works because `merge_by_canonical_ids` is specified.

Under this setup, the Unification Algorithm will first process `brand_id_ax` and `brand_id_yz` in parallel. Once these are complete, the Unification Algorithm for `unified_brand_id` will execute.

### Unification Algorithm with `merge_by_canonical_ids`

When `merge_by_canonical_ids:` is specified, the initial graph of the Unification Algorithm takes on a completely different structure.

In addition to the keys set in `merge_by_keys:`, the keys from the `merge_by_keys:` of the canonical IDs specified in `merge_by_canonical_ids:`, as well as the canonical IDs themselves, are added as keys.

The configuration for `unified_brand_id` will be used to explain this process.


```yaml
  - name: unified_brand_id
    merge_by_keys: []
    merge_by_canonical_ids: [brand_id_ax, brand_id_yz]
    merge_iterations: 5
```

#### `unified_brand_id_graph_unify_loop_0`

Under this configuration, `graph_unify_loop_0` will look as follows:

Despite having an empty `merge_by_keys:`, all keys from all data sources and the canonical IDs themselves are included.

#### Creation Rules for `graph_unify_loop_0`

The initial graph starts by merging the following graphs, meaning that the process begins with a highly interconnected state:

1. The final graph of `brand_id_ax`
2. The final graph of `brand_id_yz`
3. The initial graph generated from the combined source tables of `brand_id_ax` and `brand_id_yz`


#### Rules for Determining `merge_by_keys:`

The applicable `merge_by_keys:` configuration is determined by the following rules:

- The canonical IDs specified in `merge_by_canonical_ids:` are added as stitching keys.
- Then, `merge_by_keys:` is determined in the following order of precedence:
  1. Keys in the `merge_by_keys:` of `unified_brand_id` have the highest priority.
  2. Canonical IDs from `brand_id_ax` and `brand_id_yz`, as specified in `merge_by_canonical_ids:`, have the next highest priority in the order listed.
  3. The `merge_by_keys:` settings of `brand_id_ax` and `brand_id_yz` (excluding keys already used by `unified_brand_id`) have the lowest priority.


In this example, the final `merge_by_keys:` is as follows:


```yaml
merge_by_keys: [canonical_id (brand_id_ax), canonical_id (brand_id_yz), td_client_id, td_global_id]
```

The Unification Algorithm is applied to `graph_unify_loop_0` using this priority order.

#### `unified_brand_id_graph`

![canonical_id_1](/assets/unified_brand_id_graph.4cd799a8089317c224d66028f75eca9bb3880e4083474c5773b4da49f6c7ff1e.1bb9a569.png)

| key | canonical_id |
|  --- | --- |
| 04zahkB6hA9G | wONJPFlQsJyj |


In this configuration, the `canonical_id` is generated based on `canonical_id (brand_id_ax)`. However, generating a `canonical_id` based on another `canonical_id` means that any changes in the base keys of the original `canonical_id` will also affect this one, adding complexity. To avoid this, consider specifying keys in the `merge_by_keys:` of `unified_brand_id`:


```yaml
  - name: unified_brand_id
    merge_by_keys: [td_global_id]
    merge_by_canonical_ids: [brand_id_ax, brand_id_yz]
    merge_iterations: 5
```

In this case, the true priority of `merge_by_keys:` becomes:


```yaml
merge_by_keys: [td_global_id, canonical_id (brand_id_ax), canonical_id (brand_id_yz), td_client_id]
```

The initial graph remains the same, but the final graph will use `td_global_id` as the leader.

## Case4. Setting `canonical_id` as an Attribute in the `master_table`

A `canonical_id` can be set as an attribute in the Parent Segment.


```yaml
canonical_ids:
  - name: brand_id_ax
    merge_by_keys: [td_client_id, td_global_id]    
    source_tables: ['site_aaa','site_xxx']
    merge_iterations: 5

  - name: brand_id_yz
    merge_by_keys: [td_client_id, td_global_id]
    source_tables: ['site_yyy','site_zzz']
    merge_iterations: 5

  - name: unified_brand_id
    merge_by_keys: []
    merge_by_canonical_ids: [brand_id_ax, brand_id_yz]
    merge_iterations: 5

master_tables:
  - name: master_table_ex3
    canonical_id: unified_brand_id

    attributes:    
      - name: brand_id_ax
        source_canonical_id: brand_id_ax

      - name: brand_id_yz
        source_canonical_id: brand_id_yz

      - name: td_client_id
        invalid_texts: ['']
        source_columns:
          - {table: 'site_aaa', order: first, order_by: td_client_id, priority: 1}
          - {table: 'site_xxx', order: first, order_by: td_client_id, priority: 1}
          - {table: 'site_yyy', order: first, order_by: td_client_id, priority: 1}
          - {table: 'site_zzz', order: first, order_by: td_client_id, priority: 1}                    
      - name: td_global_id
        valid_regexp: "3rd_*"
        invalid_texts: ['']
        source_columns:
          - {table: 'site_aaa', order: last, order_by: time, priority: 1}
          - {table: 'site_xxx', order: last, order_by: time, priority: 1}
          - {table: 'site_yyy', order: last, order_by: time, priority: 1}
          - {table: 'site_zzz', order: last, order_by: time, priority: 1}
```

In this configuration, the canonical IDs `brand_id_ax` and `brand_id_yz` are set within the `attributes:` section. Other canonical IDs can also be configured as attributes, but they must always be related to the `canonical_id` of the `master_table` (in this case, `unified_brand_id`) through `merge_by_canonical_ids:`.

It is also worth reiterating that `brand_id_ax` and `brand_id_yz` cannot be set as the canonical ID of the `master_table`. This is because they do not cover all source tables, which prevents the enriched canonical ID from being applied to all tables.

### `master_table`

The output `master_table` generates records based only on the number of `unified_brand_id` specified in `canonical_id:`. It is important to understand that attributes do not influence this record generation.

As seen in Case1, `brand_id_ax` and `brand_id_yz` each contained multiple canonical IDs. However, when referenced as attributes, only the canonical ID corresponding to the most recent key (in terms of the `time` column) is included.

| unified_brand_id | td_client_id | td_global_id | brand_id_ax | brand_id_yz |
|  --- | --- | --- | --- | --- |
| wONJPFlQsJyj | aaa_001 | 3rd_018 | XuCLIRp30bxG | RXcPcKu6CO1G |