About persistent_id

As mentioned in About canonical_id, canonical_id is not guaranteed to be immutable. In some cases, a different canonical_id may be assigned to a user who was previously identified as the same person.

To address this issue, a separate mechanism called persistent_id has been introduced to provide a persistent, robust value similar to canonical_id. This page describes the characteristics of persistent_id and explains how to configure it.

Note

If persistent_id is used instead of canonical_id, users do not need to set merge_by_keys to obtain a robust ID.

Warning

persistent_id is not supported by do_not_merge_key.

Mechanism for persistent_id to Retain a Robust Value

The mechanism for generating values for canonical_id and persistent_id is illustrated using the following examples:

| Day | td_client_id | td_global_id |
|---|---|---|
| 2024-03-01 | aaa_002 | 3rd_001 |
| 2024-03-02 | aaa_001 | 3rd_001 |

Both canonical_id and persistent_id can be configured as follows:

canonical_ids:
  - name: cid
    merge_by_keys: [td_client_id, td_global_id]

persistent_ids:
  - name: pid
    merge_by_keys: [td_client_id, td_global_id]

How canonical_id is Generated

Under this configuration, the smallest value in td_client_id (in this case, aaa_001) is selected as the leader, and the canonical_id is generated based on this value.

How persistent_id is Generated

For persistent_id, the td_client_id value that appeared earliest in time is chosen as the leader. In this example, aaa_002 (first seen on 2024-03-01) is the leader, and the persistent_id is generated based on this value.

Example 1: Daily Transition of Key Values Selected as Leader

To understand how persistent_id is generated, let’s examine the transition of the leader as new data is added each day.

Day 1

| Day | td_client_id | td_global_id |
|---|---|---|
| 2024-03-01 | aaa_002 | 3rd_001 |

The leaders for each ID on Day 1 are as follows:

| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-01 | aaa_002 | aaa_002 |

Day 2

| Day | td_client_id | td_global_id |
|---|---|---|
| 2024-03-01 | aaa_002 | 3rd_001 |
| 2024-03-02 | aaa_001 | 3rd_001 |

On Day 2, the leaders are as follows:

| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-01 | aaa_002 | aaa_002 |
| 2024-03-02 | aaa_001 | aaa_002 |

The leader for canonical_id changes from the previous day, resulting in a new canonical_id value. However, the leader for persistent_id remains unchanged, so the same persistent_id value is retained.

Because the leader is always the key value that appeared earliest in time, key values that arrive later cannot displace it, so the persistent_id stays the same as new data is added.
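The daily transition above can be replayed with a minimal Python sketch. It is only an illustrative model of the leader selection described on this page, not the product's implementation: the function names are hypothetical, "smallest" is assumed to mean ordinary string comparison, and same-day ties are assumed to fall back to the smaller value.

```python
from datetime import date

# Rows from the example table: (day, td_client_id, td_global_id).
RECORDS = [
    (date(2024, 3, 1), "aaa_002", "3rd_001"),
    (date(2024, 3, 2), "aaa_001", "3rd_001"),
]


def canonical_leader(records):
    """Smallest td_client_id by value (assumed: plain string comparison)."""
    return min(client_id for _, client_id, _ in records)


def persistent_leader(records):
    """td_client_id of the earliest record; same-day ties fall back to value."""
    _, client_id, _ = min(records, key=lambda r: (r[0], r[1]))
    return client_id


def leader_history(records):
    """Recompute both leaders on each day from all records seen so far."""
    history = []
    for day in sorted({r[0] for r in records}):
        seen = [r for r in records if r[0] <= day]
        history.append(
            (day.isoformat(), canonical_leader(seen), persistent_leader(seen))
        )
    return history


for row in leader_history(RECORDS):
    print(row)
# ('2024-03-01', 'aaa_002', 'aaa_002')
# ('2024-03-02', 'aaa_001', 'aaa_002')
```

The second column changes on Day 2 while the third does not, which mirrors the leader table for Day 2.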

Example 2: Behavior When Two Individuals Are Linked

When two individuals are linked, persistent_id ensures that the value of the earlier key remains as the leader. Consider the following example:

| Day | site_aaa td_client_id | site_aaa td_global_id | site_bbb td_client_id | site_bbb td_global_id |
|---|---|---|---|---|
| 2024-03-01 | | | bbb_001 | 3rd_001 |
| 2024-03-02 | aaa_001 | 3rd_002 | | |
| 2024-03-03 | aaa_001 | 3rd_001 | | |

Day 2

Person 1

| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-02 | bbb_001 | bbb_001 |

Person 2

| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-02 | aaa_001 | aaa_001 |

Day 3

When Person 1 and Person 2 are linked on Day 3:

| Day | Leader (canonical_id) | Leader (persistent_id) |
|---|---|---|
| 2024-03-03 | aaa_001 | bbb_001 |

In this case, canonical_id merges into the leader of Person 2 (aaa_001, the smallest value), while persistent_id merges into the leader of Person 1 (bbb_001, the earliest value).
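The linking in this example can be sketched in Python as a small union-find over key values. This is a simplified model under stated assumptions, not the actual stitching algorithm: rows are grouped whenever they share a td_client_id or td_global_id value, `leaders_as_of` is a hypothetical helper, and the leader rules are the "smallest value" and "earliest value" rules described earlier on this page.

```python
from collections import defaultdict

# (day, td_client_id, td_global_id) rows from the Example 2 events.
# ISO-formatted dates compare correctly as plain strings.
RECORDS = [
    ("2024-03-01", "bbb_001", "3rd_001"),
    ("2024-03-02", "aaa_001", "3rd_002"),
    ("2024-03-03", "aaa_001", "3rd_001"),
]


def leaders_as_of(records, day):
    """Group rows that share a key value, then pick both leaders per group."""
    parent = {}

    def find(key):
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    seen = [r for r in records if r[0] <= day]
    for _, client_id, global_id in seen:
        # Union the two keys observed together in one row.
        parent[find(("client", client_id))] = find(("global", global_id))

    groups = defaultdict(list)
    for row in seen:
        groups[find(("client", row[1]))].append(row)

    result = []
    for rows in groups.values():
        canonical = min(r[1] for r in rows)  # smallest td_client_id value
        persistent = min(rows)[1]            # td_client_id of the earliest row
        result.append((canonical, persistent))
    return sorted(result)


print(leaders_as_of(RECORDS, "2024-03-02"))
# [('aaa_001', 'aaa_001'), ('bbb_001', 'bbb_001')]
print(leaders_as_of(RECORDS, "2024-03-03"))
# [('aaa_001', 'bbb_001')]
```

On Day 2 the sketch yields two separate people; on Day 3 the shared 3rd_001 value joins them into one group whose canonical leader is aaa_001 and whose persistent leader is bbb_001.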

Cases Where persistent_id May Change

There are two scenarios where persistent_id can change:

  1. When past records are added or deleted: Changes to past records may affect the leader selection.
  2. When time is deprioritized in merge_by_keys: The key priority can be explicitly set so that other keys take precedence over time.
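The first scenario can be illustrated with a short Python sketch. It assumes the "earliest value wins" rule described above; the back-dated row (2024-02-28, aaa_003) is invented for illustration and does not appear in the examples on this page.

```python
# (day, td_client_id) pairs from Example 1; ISO dates sort as strings.
RECORDS = [
    ("2024-03-01", "aaa_002"),
    ("2024-03-02", "aaa_001"),
]


def persistent_leader(records):
    # The earliest day wins; min() on (day, td_client_id) tuples
    # breaks same-day ties by the smaller value (an assumption).
    return min(records)[1]


before = persistent_leader(RECORDS)

# Hypothetical back-dated row that is loaded after the fact.
after_backfill = persistent_leader(RECORDS + [("2024-02-28", "aaa_003")])

# Deleting the earliest row also moves the leader.
after_delete = persistent_leader(RECORDS[1:])

print(before, after_backfill, after_delete)
# aaa_002 aaa_003 aaa_001
```

In both cases the set of past records changes, so the earliest key value, and therefore the leader, changes with it.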

For instance, the following configuration places time after td_client_id:

persistent_ids:
  - name: pid
    merge_by_keys: [td_client_id, time, td_global_id]

This configuration prioritizes td_client_id over time, potentially leading to changes in the persistent_id value when higher-priority keys are introduced later.
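The effect of the key order can be sketched in Python, with heavy caveats. The ranking rule below is a hypothetical model and is not confirmed by this page: it assumes that time has the highest priority when it is not listed, that any key listed before time outranks first-seen time, and that remaining ties are broken by first-seen day, key order, and value. `pick_leader` and the sample candidates are illustrative only.

```python
DEFAULT_KEYS = ["td_client_id", "td_global_id"]
CLIENT_FIRST = ["td_client_id", "time", "td_global_id"]

# (key_name, value, first_seen_day) candidates for one person (made-up data).
DAY_1 = [("td_global_id", "3rd_001", "2024-03-01")]
DAY_2 = DAY_1 + [("td_client_id", "aaa_001", "2024-03-02")]


def pick_leader(candidates, merge_by_keys):
    """Return the leader value under the assumed priority model."""
    keys = list(merge_by_keys)
    if "time" not in keys:
        keys.insert(0, "time")  # assumed default: time has top priority
    time_pos = keys.index("time")

    def rank(candidate):
        key_name, value, first_seen = candidate
        key_pos = keys.index(key_name)
        # Keys listed before "time" get their own tier; all others share one.
        tier = min(key_pos, time_pos)
        return (tier, first_seen, key_pos, value)

    return min(candidates, key=rank)[1]


# With time at top priority, the leader is stable across days.
print(pick_leader(DAY_1, DEFAULT_KEYS), pick_leader(DAY_2, DEFAULT_KEYS))
# 3rd_001 3rd_001

# With td_client_id ahead of time, a later td_client_id takes over.
print(pick_leader(DAY_1, CLIENT_FIRST), pick_leader(DAY_2, CLIENT_FIRST))
# 3rd_001 aaa_001
```

Under this assumed model, the second configuration lets a td_client_id that arrives on a later day replace an earlier leader, which is the kind of change in persistent_id described above.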