PII Matching Methods

About

GTR has established standards for verifying PII, based on strict regulatory requirements and with user experience in mind. Please follow these standards to improve verification accuracy.

There are two types of entities to validate: 1. Legal Person (Company/Entity/Enterprise), 2. Natural Person (Individual/Citizen).

They have different verification methods and applicable conditions. Note the attributes and parameters of each verification standard below.

The PII Verify Fields chapter lists all Verify Fields in a table. The Verify Rules ID column of that table refers to the methods described on this page.

For example, field 121001 uses the NAME_FUZZY_VD method to match the data; the method is described below the table.

VerifyFields	IVMS Name	Direction	Entity Type	Verify Rules ID	Format	How to Fill	Other Description	IVMS Field Name
121001	Legal Person Name	OriginatingVASP	Legal Person	NAME_FUZZY_VD			Company legal person name	legalPersonName

Pre-processing

Natural Person Name

The name fields are primaryIdentifier (Last Name) and secondaryIdentifier (First Name); the middle name is included as part of the Last Name.

If your KYC/B database cannot distinguish the First Name from the Last Name, put the full name in primaryIdentifier.

Names are always provided as a list, so there may be several names to verify. The MATCHED strategy is: if the name matches any one entry in the list, it should be considered MATCHED.

Natural Person Local Name

Most systems cannot separate non-English names from English names, so all local names should be treated as Natural Person Names and verified in the same array.

You can merge the local name array and the name array into one list and verify it; if any entry matches, flag the result as MATCHED.
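The merge-and-verify rule can be sketched as follows. This is a minimal illustration, not GTR's implementation: `isNameMatched` and `exactIgnoreCase` are hypothetical names, and the `similarity` argument stands for any function that returns a score between 0 and 1 (for example NAME_FUZZY_VD, described later on this page).

```javascript
// Hypothetical helper: `similarity` is any function returning a score in [0, 1].
// The default of 0.8 is the threshold recommended for NAME_FUZZY_VD.
function isNameMatched(kycName, names, localNames, similarity, threshold = 0.8) {
    // Treat local names as ordinary names: verify both arrays as one list.
    const candidates = [...(names ?? []), ...(localNames ?? [])];
    // MATCHED as soon as any candidate reaches the threshold.
    return candidates.some(candidate => similarity(candidate, kycName) >= threshold);
}

// Stand-in similarity for this demo only: exact, case-insensitive comparison.
const exactIgnoreCase = (a, b) => (a.toLowerCase() === b.toLowerCase() ? 1 : 0);

const matched = isNameMatched("王小明", ["Wang Xiaoming"], ["王小明"], exactIgnoreCase);
console.log(matched ? "MATCHED" : "MISMATCHED"); // MATCHED
```

Passing the similarity function as a parameter keeps the list logic independent of the scoring method, so the same helper works for Legal Person Name lists as well.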

Legal Person Name

Legal Person Name is a list of single-field names; if any entry in the list matches, consider the result MATCHED.

NAME_FUZZY_VD

Input: Name-1, Name-2 Output: 0-1 (Similarity) Threshold: 0.8 (Recommended)

NAME_FUZZY_VD is a fuzzy matching method that checks the similarity between two names (A, B). It is suited to comparing the entries of a list one by one, and it is case-insensitive.

For example, suppose the following list comes from the decrypted PII data sent by the counterparty VASP.

[
    "John Wick",
    "Wick John",
    "John",
    "Wick"
]

In our KYC/B, the name is JohnWick, so the name matching method is applied as follows:

[
    NAME_FUZZY_VD("John Wick", "JohnWick"),
    NAME_FUZZY_VD("Wick John", "JohnWick"),
    NAME_FUZZY_VD("John", "JohnWick"),
    NAME_FUZZY_VD("Wick", "JohnWick")
]

Applying the function yields one similarity score per entry (illustrative values):

[
    0.94,
    0.91,
    0.6,
    0.2
]

Two of the similarity scores in the list are greater than 0.8, so the name should be considered MATCHED.

Preprocessing

Convert to lower case

Convert all names to lowercase.

For example (KYC):

LastName: Maynard → maynard
MiddleName: Victor P. → victor p.
FirstName: Ausburn → ausburn

For example (IVMS101):

primaryIdentifier: Maynard Victor P. → maynard victor p.
secondaryIdentifier: Ausburn → ausburn
legalPersonNameIdentifier: Happy Company Co., Ltd → happy company co., ltd

Replace with regular expressions

Apply a regular expression to each field to remove whitespace and special characters, using the following pattern:

[-,\.\s&%#^?!@{}\[\]()><*"'~\/;:$\\\|\/_=+-]

For example, the name "maynard victor p. ausburn" becomes "maynardvictorpausburn", and the company name "happy company co., ltd" becomes "happycompanycoltd" after applying the pattern.
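The two preprocessing steps can be combined into one small function. This is a sketch only: the helper name `normalizeName` and the constant `SPECIAL_CHARS` are assumptions made for illustration; the pattern itself is the one given above.

```javascript
// The pattern from this section: whitespace plus the listed special characters.
const SPECIAL_CHARS = /[-,\.\s&%#^?!@{}\[\]()><*"'~\/;:$\\\|\/_=+-]/g;

// Hypothetical helper: lowercase first, then strip the characters above.
function normalizeName(name) {
    return name.toLowerCase().replace(SPECIAL_CHARS, "");
}

console.log(normalizeName("Maynard Victor P. Ausburn")); // "maynardvictorpausburn"
console.log(normalizeName("Happy Company Co., Ltd"));    // "happycompanycoltd"
```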


GTR's fuzzy matching method uses the algorithm described below:

Algorithm Details

The algorithm measures similarity between two names by combining multiple techniques:

Tokenization: Split names into individual words
Sorting: Arrange tokens alphabetically to normalize word order
Levenshtein Distance: Calculate similarity between token pairs
Threshold Filtering: Only count matches at or above the similarity threshold (0.7)
Missing Token Penalty: Penalize unmatched tokens

Step-by-Step Process

Step 1: Preprocessing

Convert to lowercase
Remove special characters using regex pattern
Split into tokens (words)
Sort tokens alphabetically

function preprocess(name) {
    // Convert to lowercase and remove special characters
    const cleaned = name.toLowerCase()
        .replace(/[-,\.\s&%#^?!@{}\[\]()><*"'~\/;:$\\\|\/_=+-]/g, '');
    
    // For token-based matching, keep spaces for splitting
    const forTokens = name.toLowerCase()
        .replace(/[-,\.&%#^?!@{}\[\]()><*"'~\/;:$\\\|\/_=+-]/g, ' ')
        .split(/\s+/)
        .filter(token => token.length > 0)
        .sort();
    
    return { cleaned, tokens: forTokens };
}

// Example:
// Input: "John A. Smith"
// Output: { cleaned: "johnasmith", tokens: ["a", "john", "smith"] }

Step 2: Token Matching with Levenshtein

function levenshteinSimilarity(str1, str2) {
    const maxLen = Math.max(str1.length, str2.length);
    if (maxLen === 0) return 1.0;
    
    const distance = levenshteinDistance(str1, str2);
    return 1 - (distance / maxLen);
}

function matchTokens(tokens1, tokens2, threshold = 0.7, missingPenalty = 0.2) {
    const smaller = tokens1.length <= tokens2.length ? tokens1 : tokens2;
    const larger = tokens1.length > tokens2.length ? tokens1 : tokens2;
    
    let totalScore = 0;
    const used = new Set();
    
    // Match tokens from smaller list to larger list
    for (const token of smaller) {
        let bestMatch = -1;
        let bestScore = 0;
        
        for (let i = 0; i < larger.length; i++) {
            if (used.has(i)) continue;
            
            const similarity = levenshteinSimilarity(token, larger[i]);
            if (similarity > bestScore) {
                bestMatch = i;
                bestScore = similarity;
            }
        }
        
        if (bestScore >= threshold) {
            totalScore += bestScore;
            used.add(bestMatch);
        } else {
            // Missing token penalty
            totalScore += Math.max(0, bestScore - missingPenalty);
        }
    }
    
    // Penalize unmatched tokens in larger list
    const unmatchedCount = larger.length - used.size;
    totalScore -= unmatchedCount * missingPenalty;
    
    // Normalize by average token count
    const avgTokenCount = (tokens1.length + tokens2.length) / 2;
    return Math.max(0, Math.min(1, totalScore / avgTokenCount));
}
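The code above calls levenshteinDistance without defining it. A minimal sketch of that helper follows, using the standard dynamic-programming formulation with two rows; it is an assumption about what the helper does, not GTR's own code. It compares UTF-16 code units, so characters outside the Basic Multilingual Plane count as two units.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshteinDistance(a, b) {
    if (a === b) return 0;
    if (a.length === 0) return b.length;
    if (b.length === 0) return a.length;

    // previous[j] = distance between the first i-1 chars of a and the first j chars of b
    let previous = Array.from({ length: b.length + 1 }, (_, j) => j);

    for (let i = 1; i <= a.length; i++) {
        const current = [i]; // turning i chars of a into an empty string costs i deletions
        for (let j = 1; j <= b.length; j++) {
            const substitutionCost = a[i - 1] === b[j - 1] ? 0 : 1;
            current[j] = Math.min(
                previous[j] + 1,                    // deletion
                current[j - 1] + 1,                 // insertion
                previous[j - 1] + substitutionCost  // substitution or match
            );
        }
        previous = current;
    }
    return previous[b.length];
}

console.log(levenshteinDistance("john", "jon"));       // 1
console.log(levenshteinDistance("kitten", "sitting")); // 3
```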

Step 3: Complete NAME_FUZZY_VD Implementation

function NAME_FUZZY_VD(name1, name2) {
    const processed1 = preprocess(name1);
    const processed2 = preprocess(name2);
    
    // Token-based similarity (primary method)
    const tokenSimilarity = matchTokens(processed1.tokens, processed2.tokens);
    
    // Character-based similarity (fallback for short names)
    const charSimilarity = levenshteinSimilarity(processed1.cleaned, processed2.cleaned);
    
    // Use token-based if either name has multiple tokens, otherwise character-based
    const hasMultipleTokens = processed1.tokens.length > 1 || processed2.tokens.length > 1;
    
    return hasMultipleTokens ? tokenSimilarity : charSimilarity;
}

Examples

// Example 1: Different word order
NAME_FUZZY_VD("John Smith", "Smith John")
// Tokens: ["john", "smith"] vs ["john", "smith"]
// Result: 1.0 (token lists are identical after sorting)

// Example 2: With typo
NAME_FUZZY_VD("John Smith", "Jon Smith")
// Tokens: ["john", "smith"] vs ["jon", "smith"]
// "john" vs "jon": similarity 0.75 (above 0.7 threshold)
// Result: (0.75 + 1.0) / 2 = 0.875

// Example 3: Missing token
NAME_FUZZY_VD("John A Smith", "John Smith")
// Tokens: ["a", "john", "smith"] vs ["john", "smith"]
// "john" and "smith" match exactly; "a" is left unmatched and costs 0.2
// Result: (2.0 - 0.2) / 2.5 = 0.72

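// Example 4: Company name with different spacing
NAME_FUZZY_VD("Happy Company Co., Ltd", "HappyCompanyCo Ltd")
// Cleaned strings are identical: "happycompanycoltd" vs "happycompanycoltd"
// Tokens: ["co", "company", "happy", "ltd"] vs ["happycompanyco", "ltd"]
// Result: ~0.23 with the token-based path above, because only "ltd" matches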

This algorithm is designed to tolerate common name variations such as different word order, typos, and missing middle names.

TYPE

Input: Type-1, Type-2 Output: MATCH or MISMATCH (Boolean)

TYPE refers to the type name ID. It checks whether two Type values are the same; they must match 100%, and the comparison is case-insensitive.

Simple Implementation

function TYPE(type1, type2) {
    return type1.toLowerCase() === type2.toLowerCase();
}

Examples

// Example 1: Match
TYPE("CCPT", "ccpt")  // true (MATCH)

// Example 2: Mismatch
TYPE("CCPT", "RAID")  // false (MISMATCH)
// CCPT = Passport number, RAID = Registration authority identifier - different types

// Example 3: Case insensitive
TYPE("PASSPORT", "passport")  // true (MATCH)

ABS_CI

Input: Value-1, Value-2 Output: MATCH or MISMATCH (Boolean)

ABS_CI checks whether two Value inputs are the same; they must match 100%, and the comparison is case-insensitive.

Simple Implementation

function ABS_CI(value1, value2) {
    return value1.toLowerCase() === value2.toLowerCase();
}

Examples

// Example 1: Country codes match
ABS_CI("US", "us")  // true (MATCH)

// Example 2: Country codes mismatch
ABS_CI("US", "UK")  // false (MISMATCH)

// Example 3: Case insensitive
ABS_CI("Singapore", "SINGAPORE")  // true (MATCH)

// Example 4: Exact match required
ABS_CI("New York", "New York City")  // false (MISMATCH)

FUZZY_TEXT

Input: Text-1, Text-2 Output: 0-1 (Similarity) Threshold: 0.7 (Recommended)

FUZZY_TEXT is a fuzzy matching method that checks the similarity between two texts (Text-1, Text-2); it is case-insensitive.

Simple Implementation

function FUZZY_TEXT(text1, text2, threshold = 0.7) {
    const similarity = levenshteinSimilarity(
        text1.toLowerCase(), 
        text2.toLowerCase()
    );
    return similarity >= threshold;
}

function levenshteinSimilarity(str1, str2) {
    const maxLen = Math.max(str1.length, str2.length);
    if (maxLen === 0) return 1.0;
    
    const distance = levenshteinDistance(str1, str2);
    return 1 - (distance / maxLen);
}

Examples

// Example 1: Partial address match
FUZZY_TEXT("New York City, A Street", "A Street")
// Similarity: ~0.35, Result: false (below 0.7 threshold)

// Example 2: Similar addresses
FUZZY_TEXT("123 Main Street", "123 Main St")
// Similarity: ~0.73, Result: true (MATCH)

// Example 3: Typo in address
FUZZY_TEXT("Wall Street", "Wal Street")
// Similarity: ~0.91, Result: true (MATCH)

// Example 4: Different addresses
FUZZY_TEXT("Wall Street", "Park Avenue")
// Similarity: ~0.18, Result: false (MISMATCH)

POST_CODE

Input: PostCode-1, PostCode-2 Output: MATCH or MISMATCH (Boolean)

POST_CODE checks whether two PostCode values are the same; they must match 100%, and the comparison is case-insensitive.

Postcodes must be preprocessed to remove all non-digit characters, using the pattern below:

[^0-9]

Simple Implementation

function POST_CODE(postcode1, postcode2) {
    // Remove all non-digit characters
    const cleaned1 = postcode1.replace(/[^0-9]/g, '');
    const cleaned2 = postcode2.replace(/[^0-9]/g, '');
    
    return cleaned1 === cleaned2;
}

Examples

// Example 1: Same postcode different format
POST_CODE("171-0023", "1710023")  // true (MATCH)
// Both become "1710023" after preprocessing

// Example 2: Different postcodes
POST_CODE("171-0023", "249-3203")  // false (MISMATCH)
// "1710023" vs "2493203"

// Example 3: Alphanumeric postcode
POST_CODE("SW1A 1AA", "SW1A1AA")  // true (MATCH)
// Both become "11" after removing non-digits (the letters in a UK postcode are discarded)

// Example 4: US ZIP codes
POST_CODE("10001-1234", "10001")  // false (MISMATCH)
// "100011234" vs "10001"

NONE

NONE means this field is not used for matching or verification.
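One way to wire the Verify Rules ID column to the methods on this page is a small dispatcher that skips NONE fields. This is a sketch under assumptions: `verifyField`, the `methods` registry, and the "SKIPPED" label are hypothetical and not part of the GTR specification.

```javascript
// Hypothetical dispatcher: `methods` maps a Verify Rules ID to a comparison
// function that returns true (MATCH) or false (MISMATCH).
function verifyField(ruleId, received, stored, methods) {
    // NONE fields take no part in matching; "SKIPPED" is an illustrative label.
    if (ruleId === "NONE") return "SKIPPED";
    if (!Object.prototype.hasOwnProperty.call(methods, ruleId)) {
        throw new Error(`Unknown Verify Rules ID: ${ruleId}`);
    }
    return methods[ruleId](received, stored) ? "MATCH" : "MISMATCH";
}

// Minimal registry for the demo; ABS_CI mirrors the definition on this page.
const methods = {
    ABS_CI: (a, b) => a.toLowerCase() === b.toLowerCase(),
};

console.log(verifyField("ABS_CI", "US", "us", methods));       // "MATCH"
console.log(verifyField("NONE", "anything", "else", methods)); // "SKIPPED"
```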
