Business Intelligence Dictionary: Surrogate Keys Explained | NSBI
Surrogate Key | Definition
Primary Keys and Foreign Keys are also known as business keys since they are primarily used in various operational databases. When we load a Data Warehouse, however, those keys may no longer uniquely identify each tuple in a table, therefore a new kind of key is introduced - a Surrogate Key.
NSBI Tutorials aim to make Business Intelligence (BI) and Data Warehousing (DWH) attractive to non-technical people as well as to those who are now entering the field and are excited by the numerous ways data is changing our world. NSBI Tutorials are written and delivered by Nick Shopov, (BI Software Developer & DWH Consultant).
Surrogate Key | Example
Let's build on the previous example for Primary and Foreign Keys:
Now imagine that we have to consolidate all our sales from different operational databases into a single data warehouse. There are numerous potential issues considering the Primary Key of table "Employees":
1. 123456 may also be used as a national ID in some other country, then it's no longer unique for Jack.
2. Joanna maybe just got married and legally changed her name. Then 654456 will no longer be unique if her past sales and new sales continue to exist in the same table. Also see Slowly Changing Dimensions.
3. Designers of "Employees" table in some other location maybe not even use "National ID Num" but prefer another incremental indicator, e.g. '1' for the first employee, '2', '3', etc.
All these potential issues require that a new key be introduced in the data warehouse - the Surrogate Key. It is usually derived as a function of the natural Primary Key. The "function" to be used may vary across businesses.
One common simple function is appending the current system time to the business Primary key. See example below:
First record is loaded on 13.Oct.2014, 16h10, 22sec., its business key is 123456. Next record is loaded in 1 second, etc. (we take this approach for simplicity, usually we measure miliseconds). Now let's observe the potential issues again:
- 1. Jack and Ahmed have unique surrogate keys.
- 2. Joanna P and Joanna J are one and the same person but with different unique surrogate keys.
- 3. Artem and Mestan also have unique surrogate keys, although DB2 ID pattern is different from DB1 ID pattern.
Surrogate keys are proven unique.