Table 2. Count of unique values in each field using Chinese characters, Jyutping, Pinyin, Pinyin_notone (Pinyin without tonal information) and HKG-romanisation for 771 names. Brackets show the percentage loss (−) or addition (+) of unique values under each romanisation method relative to the original Chinese characters.
Unique count | Chinese | Jyutping | Pinyin | Pinyin_notone | HKG-romanisation |
Surname (1 char) | 123 | 117 (−4.9%) | 120 (−2.4%) | 108 (−12.2%) | 152 (+23.6%) |
1-to-1 | | 35 | 39 | 34 | 27 |
1-to-1 (occurred >1) | | 66 | 67 | 54 | 24 |
Many-to-1 | | 16 | 14 | 20 | 11 |
1-to-Many (contain duplications) | | – | – | – | 49 |
Forename (1-2 char) | 743 | 641 (−13.7%) | 679 (−8.6%) | 642 (−13.4%) | 687 (−7.5%) |
1-to-1 | | 555 | 608 | 557 | 600 |
1-to-1 (occurred >1) | | 8 | 13 | 11 | 3 |
Many-to-1 | | 77 | 57 | 74 | 60 |
1-to-Many (contain duplications) | | – | – | – | 14 |
Full name | 771 | 767 (−0.5%) | 769 (−0.3%) | 763 (−1.0%) | 770 (−0.1%) |
We also report how many names have 1-to-1, 1-to-many, and many-to-1 correspondence between the Chinese characters and each romanised form.
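As an illustration only, the bracketed percentages and the correspondence categories could be computed along the following lines. This is a minimal sketch, not the procedure used to build Table 2: the function names `pct_change` and `classify` are hypothetical, the toy pairs are invented for the example, and the labelling rule (a character with several romanised forms is 1-to-many; a romanised form shared by several characters is many-to-1; everything else is 1-to-1) is an assumption that may differ from the counting rules behind the table.

```python
from collections import defaultdict


def pct_change(original: int, romanised: int) -> float:
    """Relative change in the number of unique values, in percent (1 d.p.)."""
    return round((romanised - original) / original * 100, 1)


def classify(pairs):
    """Label each distinct (character, romanisation) pair.

    Assumed rule (hypothetical, for illustration):
      - '1-to-many' if the character maps to more than one romanised form
      - 'many-to-1' if the romanised form is shared by more than one character
      - '1-to-1' otherwise
    """
    by_char = defaultdict(set)  # character -> set of romanised forms
    by_rom = defaultdict(set)   # romanised form -> set of characters
    for ch, rom in pairs:
        by_char[ch].add(rom)
        by_rom[rom].add(ch)

    labels = {}
    for ch, rom in set(pairs):
        if len(by_char[ch]) > 1:
            labels[(ch, rom)] = "1-to-many"
        elif len(by_rom[rom]) > 1:
            labels[(ch, rom)] = "many-to-1"
        else:
            labels[(ch, rom)] = "1-to-1"
    return labels


# Toy data: two surnames that differ only in tone collapse to the same
# toneless form, while a third stays distinct.
toneless = [("李", "li"), ("利", "li"), ("王", "wang")]
toneless_labels = classify(toneless)

# Toy data: one character written with two different romanised spellings.
variant_labels = classify([("李", "Lee"), ("李", "Li")])

# Surname row of Table 2: 123 unique characters vs 152 HKG-romanised forms.
surname_hkg_change = pct_change(123, 152)
```

Applied to the surname counts in Table 2, `pct_change(123, 152)` reproduces the +23.6% reported for HKG-romanisation, and `pct_change(123, 117)` the −4.9% reported for Jyutping.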