Facial Recognition: How to Train Models and Keep Improving Over Time

Executive Summary

Incode’s facial recognition technology is designed for accurate and fast identification/verification of a person’s identity based on facial images, without the need for physical document checks or human intervention. It ensures high recognition accuracy regardless of gender, age, race, or environmental conditions during capture.

Despite challenges related to image quality in documents due to factors such as wear and security features, Incode’s True Positive Rate (TPR) for selfie-vs-document photo comparisons reaches 98.53%, while the False Positive Rate (FPR) is maintained at 0.001%, ensuring a high level of security.

Our technology has undergone independent evaluations and demonstrates excellent results:
In the NIST FRVT testing, we achieved an FNIR of 0.0038 on the visa-border dataset with an FPIR ≤ 0.003, placing us #1 among IDV providers, and in the top 15 of the best overall FR algorithms used for various purposes.

In the RIVTD 2023 test by DHS S&T, our technology was one of the 3 systems meeting the criteria with a document extraction failure rate (FTXR) below 1%, using a conservative threshold where the False Match Rate (FMR) was three times lower than the expected 1:10,000 and had FNMR below 1%.

These impressive independent recognitions are not a coincidence, but the result of consistent efforts to challenge our own algorithm with rigorous testing: we use extensive and well-balanced proprietary datasets containing more than 40 million images and over 4.5 million unique identities, and we work tirelessly to improve our facial recognition models, employing advanced methods and effective training strategies and regularly updating and enhancing the training data.

In this paper, you will get an inside view of what face recognition is and its most pressing challenges today, and learn from our experts about the training and testing processes that build high-performing facial recognition models.

What is Face Recognition?

Face Recognition is a technology that allows for identifying or verifying a person’s identity by analyzing unique facial features.

It has become an integral part of our daily lives, integrating into various sectors: facial recognition can not only enhance security and prevent fraud but also simplify access to services and make them more personalized.

According to market forecasts, the global market for facial recognition technologies is expected to grow substantially in the coming years, reflecting rising demand for such technologies.

Two main tasks are distinguished in Face Recognition:

  • 1:1 Verification: verifying whether two facial images belong to the same individual.
  • 1:N Identification: identifying an individual from a large database by comparing a single facial image (1) against numerous stored images (N) to find the correct match.

During verification in our system, the user presents an identity document and takes a photo of their face. Each of the received images is sent for face detection and quality assessment. It is important to understand that face detection is only the process of determining the presence of a face in an image and its location, not its recognition. Although this process may seem simple, the step of detecting a face, assessing its quality, and selecting the best image plays a crucial role in achieving high-quality and effective recognition.

If the face is not detected, or if the quality of the face on the selfie or ID is insufficient for further recognition (for example, if the image is too dark or overexposed), a new capture and assessment of the next frame takes place.

Once faces of sufficient quality are detected on both the selfie and the ID images, the system extracts unique features – high-dimensional representations of complex facial patterns and unique characteristics. These features are transformed into a numerical representation of the biometric data of the face. This vector represents a unique “facial signature,” which allows it to be distinguished from other users’ faces.

In the verification system, the next step is comparing the obtained face embeddings from the selfie and the identity document. If the face on the selfie matches the face on the presented ID, and after validating the selfie and the document (this is done to prevent fraud, more details can be found in the corresponding documents), the data is entered into the system and can be used later for user identification.

The user identification process at the initial stages is similar to verification, except that only the selfie is used. At the comparison stage, the embedding of the captured face is compared with the registered embeddings stored in the database.
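Conceptually, both flows reduce to comparing embeddings. The sketch below illustrates 1:1 verification and 1:N identification with a cosine-similarity comparison and an illustrative threshold; the production scoring function and thresholds differ.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face embeddings (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(selfie_emb, id_emb, threshold=0.6):
    """1:1 verification: do the two embeddings belong to the same person?"""
    return cosine_similarity(selfie_emb, id_emb) >= threshold

def identify(probe_emb, gallery, threshold=0.6):
    """1:N identification: find the best match in a gallery of
    (user_id, embedding) pairs, or None if nothing clears the threshold."""
    best_id, best_score = None, -1.0
    for user_id, emb in gallery:
        score = cosine_similarity(probe_emb, emb)
        if score > best_score:
            best_id, best_score = user_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```

The threshold of 0.6 here is purely illustrative; in practice it is chosen from error-rate curves, as discussed later in this paper.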

All technologies used in our system are based on advanced machine learning models. These innovative solutions provide maximum accuracy and efficiency in detecting faces and their landmarks, extracting face embeddings, and comparing those embeddings.

This high level of accuracy was not achieved overnight; researchers started exploring automatic face detection and recognition quite some time ago.

The chart below starts from the early 1990s, although the first attempts at automating the process were made in the mid-20th century. Early approaches, based on empirical knowledge of facial images, worked well only in strictly defined conditions. Classical machine learning helped to address some of the issues, but it still couldn’t account for the high face variability. This situation persisted until the mid-2010s when significant advances in neural networks made it possible to apply them effectively. Neural network-based methods are now demonstrating the highest levels of efficiency.

Deep learning has overcome many limitations of earlier methods, providing high accuracy and reliability in facial recognition across various conditions. However, despite the progress made, facial recognition technologies still face certain challenges. Factors such as capturing conditions, image characteristics, and the similarity of close relatives’ faces can affect recognition quality. Therefore, ongoing efforts are focused on improving algorithms to ensure maximum accuracy and robustness in a wide range of application scenarios.

Key Challenges in Face Recognition

The development of an effective, reliable, and robust facial recognition algorithm presents us, as developers, with a number of technical challenges. The quality of recognition can be influenced by various factors, ranging from the conditions of image capture and the characteristics of the images to the similarity of faces among close relatives.

Varying capturing conditions and image characteristics

For example, varying capturing conditions and image characteristics affect the accuracy of facial recognition models. Real-world conditions are rarely ideal: lighting can be poor or uneven, the angle may deviate from a frontal view, and facial expressions may vary.

Shadows, glare, and overexposed areas can hide or distort important facial features. Poor or uneven lighting makes it difficult to extract these features, which can lead to recognition errors.

When a face is captured at an angle, the relative positioning of facial features changes. A lack of sufficient data with different angles during training results in the model’s inability to properly handle such images.

Different facial expressions can affect the appearance of wrinkles and folds on the face, altering the texture features used by the model. If the model is not trained on various facial expressions, it may perceive images of the same person with different expressions as different faces.

All those factors can significantly reduce the effectiveness of models if they are not trained to handle such variations. Therefore, it is important to develop models that can maintain high recognition accuracy despite the diversity of external factors.

Including images with a wide range of external conditions in training datasets, as well as using models invariant to rotation and scale, can greatly improve the reliability of recognition systems. The application of data augmentation, normalization, and preprocessing of facial images also contributes to creating a high-quality method.
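As a toy illustration of the data augmentation mentioned above, the sketch below applies brightness jitter (simulating under- and over-exposure) and random horizontal flips to a grayscale image represented as nested lists of pixel values; real training pipelines use far richer augmentations and dedicated image libraries.

```python
import random

def augment(image, rng=None):
    """Apply simple photometric/geometric augmentations to a grayscale
    image given as nested lists of pixel values in [0, 255]."""
    rng = rng or random.Random(0)
    out = [row[:] for row in image]
    # Brightness jitter: simulate poor or uneven exposure seen in real captures.
    gain = rng.uniform(0.6, 1.4)
    out = [[min(255, max(0, int(p * gain))) for p in row] for row in out]
    # Horizontal flip: reduces sensitivity to left/right pose variation.
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]
    return out
```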

High variability in facial appearance

Another factor that affects recognition accuracy is high variability in facial appearance. The use of makeup, wearing beards or mustaches, glasses or hats, and natural aging can significantly alter a person’s appearance.

Makeup can significantly change the visual perception of a face, hide natural skin texture features, and alter the visibility of facial contours.

Accessories, beards, and mustaches can obscure parts of the face that contain important features, change the facial contour, and conceal the chin and jawline.


These changes can significantly complicate facial recognition tasks as they affect the key features used by models for identification. Therefore, it is essential to develop methods that allow models to effectively handle high facial variability.

Difficulty in Comparing Selfies with Photos on ID Documents

Another factor affecting the accuracy of the recognition method is the difficulty in comparing faces from different domains, specifically the challenge of comparing selfies with photos on ID documents. IDs often contain security features designed to prevent document forgery, which creates visual obstructions for facial recognition systems.

Holograms, semi-transparent images, or patterns applied over the photograph can create glare and distortions, reduce the contrast and clarity of the photo, introduce visual noise, and obscure facial details.

The glossy surface of the document can produce reflections during scanning or photographing, making recognition more difficult.

Large and diverse dataset collection and labeling

Developing models that work fairly and ensure accuracy for all demographic groups, like age, gender, and race, is a necessary requirement for modern systems aiming for leadership in the global market. This creates the need to identify and mitigate biases that may arise from skewed distributions or underrepresentation of certain groups in the data, which in turn requires the use of large and diverse datasets.

Unfortunately, using open datasets does not fully meet the need for diversity and completeness of data. Open datasets often contain biases, as they predominantly feature individuals from a single demographic group (e.g., LFW), specific age groups (e.g., MS-Celeb-1M), or have a limited number of identities (e.g., DemogPairs). Additionally, open datasets are not always available for commercial use.

Therefore, the collection, preparation, and labeling of proprietary datasets is critical for training fair and accurate facial recognition models. However, the process of collecting, labeling, and checking the quality of data for facial recognition tasks is labor-intensive and often requires significant human and machine resources. A small dataset cannot capture the full complexity and diversity of human faces; thus, the training dataset must contain millions of images. This leads to the need for automated annotation, as it becomes practically impossible to manually process such large volumes of data. Human oversight is still necessary to verify the accuracy of automated annotations, but it is most effective in the final stages or for quality control.

Another challenge in data collection for facial recognition is compliance with laws and regulations. A human face is considered biometric data, which is classified as sensitive information in most countries and subject to strict protection. During data collection, it is essential to ensure that there are no violations of the laws of the country where the data is collected, that the data is securely protected, and that access to it is carefully controlled. Failure to comply with these regulations can lead to reputational and financial losses.

Similar faces differentiation

Finally, the issue that requires special attention is distinguishing between twins or close relatives, as they share similar biometric characteristics. This is an especially challenging task due to the abovementioned natural changes in appearance, such as aging or makeup. When trying to enhance the model’s ability to differentiate faces, it may begin to lose its generalization capabilities for the same person’s face.

Look at the image below. The first two pictures show twin brothers photographed in 2013, and the next two pictures show the same twins ten years later. Are you sure we didn’t swap the brothers in the photos?

Identical twins John and Edward Grimes (Jedward): on the left, in 2013; on the right, a newer photo taken in 2023.

These are the same challenges faced by the model when attempting to differentiate twins. Unfortunately, differentiation of similar faces still remains a challenge for facial recognition systems. Although evolving technologies bring us closer to solving this problem, a definitive solution has yet to be found.

Core technologies and processes

All technologies used in our verification and identification system are based on advanced ML models. These innovative solutions ensure maximum accuracy and efficiency in face and facial landmark detection, facial embedding extraction, and embedding comparison.

Face and facial landmark detection

Face and facial landmark detection is a critical component of the system. It is particularly challenging when dealing with diverse conditions such as occlusion, varying lighting, and real-time constraints, especially considering that the solution needs to be lightweight and fast without compromising accuracy.

By using advanced detection approaches, along with modern data processing techniques, we achieved a balance of efficiency, robustness, and precision, enabling seamless integration into real-time applications.

As a result, the face detection model returns the coordinates of the bounding box surrounding each detected face, the coordinates of facial landmarks such as the eyes, mouth, and nose, as well as the confidence score that the detected object is indeed a face. Based on this information, Incode selects the most prominent face for further processing. For clients who wish to work with the detected faces themselves, we can provide information about all found faces.

The most prominent face is then sent for processing by the model, which computes the face embedding and performs the comparison.

Face alignment

Before extracting the face embedding, the face alignment procedure is performed. For face alignment, five key points are used: the outer corners of the eyes, the tip of the nose, and the outer corners of the mouth. The need for face alignment arises from the sensitivity of neural network algorithms to affine transformations, such as rotation and scaling.

The similarity transformation we use includes scaling in addition to rotations and translations. This approach helps maintain a more realistic appearance of the faces, even as their size changes, although it does not preserve the exact distances between points.
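The five-point alignment step can be sketched as a least-squares fit of a similarity transform (rotation, uniform scale, and translation) mapping the detected landmarks onto a canonical template. The template coordinates below are illustrative assumptions for a 112x112 crop, not Incode's actual values.

```python
# Canonical destination positions of the five landmarks (outer eye corners,
# nose tip, mouth corners) in a 112x112 crop -- illustrative values only.
TEMPLATE = [(38.3, 51.7), (73.5, 51.5), (56.0, 71.7), (41.5, 92.4), (70.7, 92.2)]

def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src landmarks onto dst.
    Parameterized as u = a*x - b*y + tx, v = b*x + a*y + ty, where
    a = s*cos(theta), b = s*sin(theta); solved via normal equations."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, -y, 1.0, 0.0]); rhs.append(u)
        rows.append([y, x, 0.0, 1.0]); rhs.append(v)
    n = 4
    # Normal equations: (A^T A) p = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atb[col], atb[piv] = atb[piv], atb[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            atb[r] -= f * atb[col]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (atb[r] - sum(ata[r][c] * p[c] for c in range(r + 1, n))) / ata[r][r]
    a, b, tx, ty = p
    return [[a, -b, tx], [b, a, ty]]  # 2x3 affine matrix for warping the face
```

The resulting 2x3 matrix is then used to warp the face crop before embedding extraction; note that, as stated above, this transform preserves shape but not exact inter-point distances.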

Face embedding extraction and comparison

For face embedding extraction, we use deep learning techniques, particularly convolutional neural networks (CNNs). This approach has demonstrated state-of-the-art performance in various tasks and is widely adopted in both academic research and industry applications. Alternative approaches, such as traditional machine learning methods or handcrafted feature-based approaches, were considered but were found to be less effective in handling the complexity and variability of face and ID data.

After extraction, face embeddings are compared with the embedding of a specific person (or with embeddings in the database).

In the end, the face recognition model returns a score between 0 and 100, which represents the confidence that the face in the selfie matches the picture in the identity document or in another selfie. A value of 100 represents the highest likelihood that the pictures are from the same person, and a value of 0 represents the lowest. Once a threshold value is set, the model behaves as a binary classifier.
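Assuming a cosine-similarity backbone, one possible mapping from similarity to the 0-100 score range, and the resulting binary classifier once a threshold is fixed, might look like this; both the mapping and the threshold are illustrative, not Incode's production values.

```python
def to_score(cosine_sim):
    """Map a cosine similarity in [-1, 1] to the 0-100 score range.
    This linear mapping is one possibility; the production mapping may differ."""
    return round(50.0 * (cosine_sim + 1.0), 2)

def is_match(score, threshold=70.0):
    """With a fixed threshold, the scorer behaves as a binary classifier.
    The threshold here is illustrative, not a production setting."""
    return score >= threshold
```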

Vector Database

To store face embeddings, we designed our own Incode Vector Database. The Incode Vector Database is designed to provide a reliable and efficient way to store and manage large sets of facial vectors, enabling quick and accurate face recognition. The database is designed to store large amounts of data and can scale to accommodate increasing data volumes without compromising performance or reliability.

With Vector Database, users can easily store and manage their facial vector data, and perform a variety of operations, including searching and indexing. The database is optimized for fast search and retrieval of facial vectors, enabling users to quickly and accurately match faces against a large dataset.
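The store/search contract of such a database can be illustrated with a brute-force in-memory stand-in; Incode's actual Vector Database uses proprietary indexing to scale far beyond what a linear scan allows, so this sketch only shows the interface.

```python
import heapq
import math

class NaiveVectorStore:
    """Minimal in-memory stand-in for a vector database: stores L2-normalized
    embeddings and returns the top-k most similar faces by inner product."""

    def __init__(self):
        self._items = []  # list of (user_id, normalized embedding)

    def add(self, user_id, emb):
        norm = math.sqrt(sum(x * x for x in emb))
        self._items.append((user_id, [x / norm for x in emb]))

    def search(self, emb, k=1):
        """Return up to k (similarity, user_id) pairs, best first."""
        norm = math.sqrt(sum(x * x for x in emb))
        q = [x / norm for x in emb]
        scored = [(sum(a * b for a, b in zip(q, e)), uid)
                  for uid, e in self._items]
        return heapq.nlargest(k, scored)
```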

How to train Facial Recognition for High Performance

The success of deep model training depends on several factors: architecture, loss functions, training data, sampling strategies, and many others.

Model architecture

Deep convolutional networks were chosen as our architecture, as they represent the current state-of-the-art approach in face recognition tasks. However, we would like to highlight a specific training feature that enables the efficient handling of a large number of identities.

The model we train solves not a binary task of comparing two faces, but a task of obtaining face embeddings. The model’s training is organized as a multi-class classification task, where each class represents a unique identity (during the training, each face is considered a separate class). This way, the model learns to distinguish among many different identities.

If we visualize the process, it might look like we are trying to place faces on an imaginary sphere during training, where the faces of one person are close to each other, and the faces of different people are located far apart.
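The hypersphere intuition corresponds to L2-normalizing both the embedding and each per-identity weight vector, so that classification logits become scaled cosines, in the spirit of margin-based softmax losses such as ArcFace. The sketch below shows only this normalized-cosine step; the margin term and the exact loss used in production are omitted, and the scale value is a common-practice assumption.

```python
import math

def cosine_logits(embedding, class_weights, scale=64.0):
    """Training-time classification over identities on a hypersphere:
    both the embedding and each identity's weight vector are L2-normalized,
    so each logit is a scaled cosine of the angle between them."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    e = normalize(embedding)
    return [scale * sum(a * b for a, b in zip(e, normalize(w)))
            for w in class_weights]
```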

Training datasets

In addition to an architecture, models capable of representing a large number of identities require vast amounts of data. The main datasets used in the industry, along with their key features, can be found in the table below.

  • VGGFace2 — ~3,300,000 images, ~9,000 identities. A large-scale dataset for recognizing faces across pose and age, with identity labels obtained automatically from web pages. This dataset has been retracted and should not be used.
  • IMDb-Face — ~1,700,000 images, ~59,000 identities. Images obtained from the IMDb website and manually cleaned from 2.0 million raw images. The authors note that the distribution of identities in the VGG-Face dataset may not be representative of the global human population.
  • MS-Celeb-1M — ~10,000,000 images, ~100,000 identities. A large-scale face recognition dataset with identity labels obtained automatically from web pages. This dataset has been retracted and should not be used.
  • WebFace260M — ~260,000,000 images, ~4,000,000 identities. A large-scale face recognition dataset collected from the web.
  • WebFace42M — ~42,000,000 images, ~2,000,000 identities. A cleaned version of WebFace260M.
  • Glint360K — ~17,000,000 images, ~360,000 identities. One of the largest and cleanest publicly available face recognition datasets.

The usage of open datasets, unfortunately, does not fully cover the need for diversity and complexity of data. Open datasets often contain a limited number of faces, small variations in conditions, and biases. Moreover, they are not always available for commercial use and may not align with the real-world domain.

To solve this, Incode has built its own dataset, reflecting real usage scenarios of our products and covering a wide range of age groups, genders, and races. The images in the dataset were captured under various lighting conditions, with different backgrounds, angles, and facial expressions. Additionally, by using our own data, we not only ensure its quality and relevance but also compliance with all legal and ethical standards.

The training dataset contains ~41,000,000 images corresponding to ~4,600,000 identities. This huge dataset was extracted from the production environment, using data from clients who signed the ‘ML Consent’. Proprietary clustering and cleaning algorithms were used for data labeling.

Sampling strategy

However, effective model training depends not only on the data itself but also on the sampling strategies. To optimize the training process, we use a strategy of selecting hard negative examples (those that the model confuses with positive ones), which encourages the model to better differentiate between similar faces.

All these methods allow the models to generalize information more effectively, avoid overfitting, and accurately distinguish even similar faces, which is especially important in real-world scenarios.
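The core of hard-negative selection can be sketched as picking the impostor pairs the current model scores highest; the details of the production sampler are proprietary, and this only illustrates the idea.

```python
def hardest_negatives(impostor_pairs, k):
    """Select the k impostor (non-mated) pairs the current model scores
    highest, i.e. the negatives it most easily confuses with genuine pairs.
    `impostor_pairs` is a list of (score, pair_id) tuples."""
    return sorted(impostor_pairs, key=lambda p: p[0], reverse=True)[:k]
```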

Model testing Key Aspects

Key Error Metrics

False Negative Rate (FNR)

False Negative Rate (FNR) defines the proportion of cases where the system fails to recognize a registered user. FNR is one of the key metrics in evaluating the performance of face recognition models. In different testing scenarios, this metric may be referred to by different names: in 1:1 tasks, it is often called False Non-Match Rate (FNMR), and in 1:N tasks, it is known as False Non-Identification Rate (FNIR). However, at its core, it is the same metric that assesses the system’s ability to correctly identify a registered face.

FNR is calculated as:

FNR = FN / (FN + TP)

where

FN (False Negative) – the number of false rejections where the system fails to recognize a registered user; the number of mated comparisons below a threshold.

TP (True Positive) – the number of correct recognitions, where the system correctly identifies a registered user; the number of mated comparisons at or above a threshold.

The lower the FNR, the less likely it is that a legitimate user will face identification issues, which helps maintain customer satisfaction. A high FNR can lead to longer customer service times, reduced trust in the system, and increased costs for manual checks. Therefore, minimizing the false negative rate is essential when developing a facial recognition system.

False Positive Rate – FPR

A False Positive Rate (FPR) defines the proportion of cases where the system incorrectly identifies an unregistered user as registered. FPR is crucial for ensuring the security and trust of a facial recognition system, as a high rate of false positives can lead to unauthorized access. FPR may also be referred to by different names depending on the testing scenario: False Positive Match Rate (FPMR) in 1:1 tasks and False Positive Identification Rate (FPIR) in 1:N tasks.

FPR is calculated as:

FPR = FP / (FP + TN)

where

FP (False Positive) – the number of false acceptances where the system incorrectly identifies an unregistered user as registered; the number of impostor comparisons at or above the threshold.

TN (True Negative) – the number of correct rejections, where the system correctly does not recognize an unregistered face; the number of non-mated comparisons below the threshold.

A high FPR can lead to unauthorized access, which is particularly dangerous in access control and security applications. Moreover, frequent false positives can reduce customer trust in the system and may cause them to stop using it. Therefore, minimizing the number of false positives is critical when developing a facial recognition system.

Eventually, it is essential to find an optimal threshold that balances both metrics: False Negative Rate and False Positive Rate.
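Both error rates follow directly from their definitions and can be computed from comparison scores:

```python
def fnr(mated_scores, threshold):
    """FNR = FN / (FN + TP): the share of mated (genuine) comparisons
    that fall below the threshold."""
    fn = sum(1 for s in mated_scores if s < threshold)
    return fn / len(mated_scores)

def fpr(nonmated_scores, threshold):
    """FPR = FP / (FP + TN): the share of non-mated (impostor) comparisons
    at or above the threshold."""
    fp = sum(1 for s in nonmated_scores if s >= threshold)
    return fp / len(nonmated_scores)
```

Sweeping the threshold over the observed score range and plotting the two rates against each other yields the ROC and DET curves discussed below.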

Threshold selection

In our particular case, for all our models, we choose thresholds at the points where FPR=0, which means that the model doesn’t have any tolerance to false positives, and then shift them by approximately 5% to make sure they will work for new data that the model hasn’t seen before, for example, if the threshold is 0.5 we will put 0.525 as the threshold.

To make a system more secure, the threshold can be shifted even higher if the higher false negative rate is not an issue. We highly recommend not moving thresholds without approval from Incode’s side because TPR/FPR dependency isn’t linear (ROC curves show that), and their values highly depend on the input data. Also, you shouldn’t consider the threshold as a probability, so, for example, the threshold of 0.3 for the recognition model doesn’t mean the probability of successful recognition equals 0.3. The threshold is just some number that helps to classify input data.
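The selection procedure described above can be sketched as follows, treating the highest impostor score on the evaluation set as the zero-FPR point and then applying the ~5% safety margin; this is a simplification of the actual procedure.

```python
def pick_threshold(nonmated_scores, margin=0.05):
    """Choose a threshold at the zero-FPR point on the evaluation set
    (the highest impostor score observed), then shift it up by ~5% as a
    safety margin for data the model has not seen before."""
    zero_fpr_point = max(nonmated_scores)
    return zero_fpr_point * (1.0 + margin)
```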

Visual representation of model performance

To select the threshold and evaluate the overall performance of facial recognition models, visual representations of model performance are often used. The most widely used ones are ROC curves (Receiver Operating Characteristic) and DET curves (Detection Error Tradeoff). The ROC curve shows the relationship between TPR (True Positive Rate, the proportion of mated comparisons at or above a threshold) and FPR (False Positive Rate) at various threshold values, allowing for a visual assessment of the balance between false positives and true positives. The closer the ROC curve is to the point (0, 1), the better the algorithm separates the classes.

DET curves are used to visualize the tradeoff between two key error metrics: False Negative Rate (FNR) and False Positive Rate (FPR), providing a clearer understanding of how changes in the recognition threshold affect the number of errors. The closer the DET curve is to the point (0, 0), the better the algorithm separates the classes.

Both DET and ROC curves often use logarithmic scales to display FPR. This makes them more informative for analyzing performance in low-error conditions, which is especially useful in facial recognition systems, where even small changes in error rates can have critical implications.
Below are examples of ROC and DET curves for two abstract models. The graphs show that the blue model demonstrates better performance, staying closer to the zero point on the DET curve and closer to the point (0, 1) on the ROC curve.

Testing protocol

For testing and evaluating model performance, and comparing with competitors, Incode conducts testing on its own databases and participates in open vendor technology evaluation competitions for facial recognition.

Internal testing

For internal testing, Incode sampled ~2M unique images of ~800k unique identities from production (properly distributed across countries, devices, etc.). From those images, we formed ~5,000,000 genuine pairs and ~2,000,000,000,000 imposter pairs.

Then, ~2M genuine pairs and ~2.4M of the hardest imposter pairs (those with the highest confidence) for selfie comparison and a similar number of pairs for selfie vs ID comparison were randomly selected. This formed a dataset of ~4,000,000 selfie vs selfie pairs and ~5,500,000 selfie vs ID pairs in total. This was done primarily to speed up our testing process without compromising the quality of the test.

For analysis, we use False Positive Rate (FPR, the proportion of cases where the system incorrectly identifies an unregistered user as registered) and False Negative Rate (FNR, the proportion of cases where the system fails to recognize a registered user).

Evaluating by NIST

Incode’s face recognition algorithms have been tested by NIST since 2018. The NIST Face Recognition Technology Evaluation (FRTE, formerly known as FRVT) is a process specifically focused on assessing the accuracy, security, and performance of facial recognition technologies.

There are two main tracks: Face Recognition Technology Evaluation (FRTE) 1:1 Verification, which assesses the accuracy, reliability, and efficiency of face recognition algorithms in verifying whether two facial images belong to the same individual, and Face Recognition Technology Evaluation (FRTE) 1:N Identification, which evaluates the performance of face recognition algorithms in identifying an individual from a large database by comparing a single facial image (1) against numerous stored images (N) to find the correct match.

NIST uses the following datasets for testing:

  • Visa images (6.2M images) refer to photographs collected as part of visa applications.
  • Mugshot images (1.4M images) are collected in the United States during routine post-arrest booking processes.
  • Application Images (1.1M images) are collected during attended interviews at U.S. immigration offices.
  • Border Crossing Images (2.6M images) are taken with a webcam, oriented by an immigration officer toward a cooperating subject.
  • Kiosk images (1M images) are taken at kiosks equipped with cameras designed to capture a centred face (non-cooperative scenario).

For 1:1, the metrics used are FMR (False Match Rate, the frequency at which a biometric system incorrectly matches the biometric input of one person to the stored embedding of another person; essentially, it measures the rate of false positives) and FNMR (False Non-Match Rate, the frequency at which a biometric system fails to recognize a legitimate user, incorrectly identifying them as a non-match; this corresponds to false negatives).

For 1:N, the metrics used are FNIR (False Negative Identification Rate, the percentage of cases where the algorithm fails to identify a registered subject) and FPIR (False Positive Identification Rate, the percentage of cases where the algorithm incorrectly identifies an unregistered subject).

DHS Science & Technology RIVTD Track 2 (facial recognition) Test

Incode’s face recognition algorithms participated in the DHS Science & Technology RIVTD Track 2 (facial recognition) test. The 2023 Remote Identity Validation Technology Demonstration (RIVTD) is an independent US government test. It includes an evaluation of 18 remote identity validation systems based on selfie and document matching accuracy, failure-to-extract rates, and demographic fairness.

Remote identity validation technologies enable people to assert their identity online without going to a physical location. RIVTD measures the performance of these systems and the degree to which they may reduce fraud while maintaining access to services.

RIVTD was divided into three tracks for three different steps in the remote identity validation process. The first track, document validation, tested systems that determine if an ID is genuine or not. The second track, match to document, tested systems that determine if a person in a selfie is the same person pictured on an ID document. The third track, presentation attack detection, tested subsystems that determine if a presentation is from a legitimate or fraudulent user, like someone using a mask to impersonate someone.

A total of 1,633 volunteers participated in Remote Identity Validation Tech (RIVTD) Track 2 over two data collections: Maryland Test Facility (MdTF), May 2023 and Remote Collection, September 2023. Each volunteer used each smartphone to provide one controlled and one uncontrolled selfie image. Test team personnel used each smartphone to collect one controlled document image (only the front of the document was used).

The following metrics were used for testing:

  • Selfie FTXR (Selfie Failure to Extract Rate) measures the percentage of times a system fails to extract useful biometric data from a selfie
  • Document FTXR (Document Failure to Extract Rate) measures the percentage of times a system fails to extract useful biometric data from a document image
  • FNMR (False Non-Match Rate) measures the probability that a system incorrectly determines that matching photos on a selfie and ID document do not belong to the same person.

There were three options for threshold setting:

  • E (Expected) – the FMR (False Match Rate) is 1:10,000, meaning only one incorrect match occurs per 10,000 impostor comparisons;
  • P (Permissive) – the FMR is three times larger than the expected 1:10,000;
  • C (Conservative) – the FMR is three times smaller than the expected 1:10,000.
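As a rough illustration of the metrics defined above, the sketch below computes FTXR, FNMR, and FMR from toy comparison results; the record format and score values are assumptions for the example, not RIVTD's actual data schema.

```python
# Hypothetical illustration of the RIVTD Track 2 metrics; the `attempts`
# record format and all scores below are toy assumptions.

def failure_to_extract_rate(attempts):
    """FTXR: share of attempts where no biometric data could be extracted."""
    failures = sum(1 for a in attempts if not a["extracted"])
    return failures / len(attempts)

def false_non_match_rate(genuine_scores, threshold):
    """FNMR: share of same-person comparisons scored below the threshold."""
    misses = sum(1 for s in genuine_scores if s < threshold)
    return misses / len(genuine_scores)

def false_match_rate(impostor_scores, threshold):
    """FMR: share of different-person comparisons scored at or above the threshold."""
    hits = sum(1 for s in impostor_scores if s >= threshold)
    return hits / len(impostor_scores)

# Toy example: 1 extraction failure out of 100 attempts -> FTXR = 1%
attempts = [{"extracted": True}] * 99 + [{"extracted": False}]
print(failure_to_extract_rate(attempts))  # 0.01
```

Tightening the threshold (the C setting above) lowers FMR at the cost of a higher FNMR, which is why passing the conservative setting with FNMR still below 1% is a demanding criterion.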

Key Advantages of Our Technology

Use of large and balanced proprietary datasets

The use of our proprietary large and balanced datasets is a key factor in ensuring that our models are accurate, fair, and capable of performing well under a wide range of external and internal conditions.

Since the data collection, cleaning, and labeling processes are fully under our control, we ensure high quality, representativeness, and diversity in our data, which is crucial for maintaining industry leadership and offering our clients reliable and effective facial recognition technologies.

During data collection, we pay special attention to including a wide range of age groups, genders, ethnicities, and races. This ensures that our models perform accurately and fairly for all users. Moreover, our datasets include images captured under various lighting conditions with different backgrounds, angles, and facial expressions. In terms of ID images, our datasets cover a wide variety of identification types. This makes the models more robust to variations and enhances their reliability in real-world scenarios.

By using our proprietary data, we can guarantee its quality, relevance, and compliance with all legal and ethical standards, as we collect data only from clients who have signed the ‘ML Consent’ agreement.

We have two large internal datasets:

  • The first dataset is used for training and contains ~41M unique images of ~4.6M unique identities.
  • The second dataset is used for testing and contains ~2M unique images of ~800k unique identities.

Both datasets include selfie and ID data.

To ensure data quality, we apply semi-automated labeling methods using our proprietary clustering and cleaning algorithms, which allow us to efficiently process and structure massive amounts of data. This has enabled us to create datasets that are not only huge in size but also diverse, covering a wide range of faces and reflecting real-world usage scenarios of our products.
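The clustering step can be pictured with a minimal sketch: group face embeddings whose similarity to a cluster's first member exceeds a threshold, so images of the same identity land together. The greedy strategy, the 2-D vectors, and the 0.8 threshold are illustrative assumptions; our actual algorithms are proprietary and operate on learned face embeddings.

```python
import math

# Minimal sketch of similarity-based identity clustering for dataset
# labeling. Vectors and threshold are toy assumptions for illustration.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster(embeddings, threshold=0.8):
    """Greedily assign each embedding to the first cluster whose
    representative (first member) is similar enough, else start a new one."""
    clusters = []
    for emb in embeddings:
        for c in clusters:
            if cosine(emb, c[0]) >= threshold:
                c.append(emb)
                break
        else:
            clusters.append([emb])
    return clusters

# Two near-identical faces group together; the third starts its own cluster.
faces = [(1.0, 0.0), (0.98, 0.05), (0.0, 1.0)]
print(len(cluster(faces)))  # 2
```

In practice such automatic grouping is followed by human review of ambiguous clusters, which is what makes the labeling semi-automated.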

Outstanding accuracy in comparing selfies with ID document photos

Because our data reflects real-world usage scenarios of our products, we achieve outstanding accuracy when comparing selfies with ID documents, significantly surpassing our competitors.

Our model is trained to recognize faces even in the presence of holograms, watermarks, and other security elements that create visual interference on document photos.

Our internal tests confirm that in the general task of comparing selfies with selfies, all vendors show comparable performance, with accuracy around 99.5–99.9%. However, in the critically important task of identity verification — comparing selfies with document photos — our technology significantly outperforms competitors, reaching an accuracy of 98.5%, while the closest competitor achieves only 90%.

The following ROC curve demonstrates the superiority of our model over competitors at various levels of False Positive Rate.
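Each point on such a ROC curve reports the True Positive Rate achievable while keeping the False Positive Rate at or below a target. The sketch below computes one such point from score lists; the scores and the target are toy values, not our production data.

```python
# Sketch of computing one ROC operating point: the best TPR whose
# threshold keeps FPR at or below a target. Scores are toy values.

def tpr_at_fpr(genuine, impostor, target_fpr):
    """Scan candidate thresholds (the observed scores) and return the
    highest TPR whose FPR does not exceed the target."""
    best_tpr = 0.0
    for t in sorted(set(genuine) | set(impostor), reverse=True):
        fpr = sum(s >= t for s in impostor) / len(impostor)
        if fpr <= target_fpr:
            tpr = sum(s >= t for s in genuine) / len(genuine)
            best_tpr = max(best_tpr, tpr)
    return best_tpr

genuine = [0.9, 0.8, 0.7, 0.3]   # same-person comparison scores
impostor = [0.6, 0.5, 0.2, 0.1]  # different-person comparison scores
print(tpr_at_fpr(genuine, impostor, 0.0))  # 0.75
```

A model whose curve sits above a competitor's delivers a higher TPR at every such FPR target, which is what the figure shows.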

Below are examples that we successfully solved, unlike our competitors, clearly demonstrating our advantages.

Document photos often contain security elements that create interference. Moreover, we take into account factors such as lighting, face angle, and image quality while developing models that are robust to these variations. Our technology accurately processes such images. In the left example, we correctly identified two different people, while competitors made a mistake. In the right example, we accurately recognized one person, whereas competitors incorrectly identified them as different individuals.

Leading positions in NIST and DHS S&T RIVTD Track 2 rankings

Our facial recognition systems demonstrate outstanding results in official tests conducted by leading organizations such as the National Institute of Standards and Technology (NIST) and the U.S. Department of Homeland Security (DHS S&T). These achievements confirm our leadership in the facial recognition technology market and highlight the high quality of our solutions.

NIST Face Technology Evaluations (FRTE)

Incode demonstrates outstanding results in terms of FNIR and FPIR metrics, ranking among the top performers with values well within the range expected of highly effective face recognition algorithms.

In the first metric (False Negative Identification Rates (FNIR) for the case where a threshold is set to limit the False Positive Identification Rate (FPIR) to 0.003), we have 0.0038 on visa-border, and we are in the top 15 across all datasets, except for Mugshot/Profile and Mugshot/Mugshot ΔT ≥ 12 YRS.

Gallery   Probe     N     incode-006 FNIR (T > 0)
Mugshot   Mugshot   12M   0.0023
Mugshot   Mugshot   16M   0.0014
Mugshot   Webcam    16M   0.0096
Visa      Border    16M   0.0038
Visa      Kiosk     16M   0.0681

For the second metric (False Negative Identification Rates (FNIR) for the case where the threshold is set to zero and the algorithm returns a fixed number (50) of candidates), we have 0.0025 FNIR on visa-border (16M), and we are ranked 13th for Mugshot/Mugshot and 14th for Border/Border ΔT ≥ 10 YRS.

Gallery   Probe     N     incode-006 FNIR (Rank 1, T = 0)
Mugshot   Mugshot   12M   0.0013
Mugshot   Mugshot   16M   0.0011
Mugshot   Webcam    16M   0.0080
Visa      Border    16M   0.0025
Visa      Kiosk     16M   0.0536

In addition, Incode's performance is almost unaffected by the size of the enrolled population.

Remote Identity Validation Technology Demonstration (RIVTD) 2023

Eighteen vendors applied, and only 16 were accepted. Results are anonymized, so we don't know who the other vendors were or how any individual vendor performed. Incode demonstrated distinguished results, being 1 of only:

  • 6 that had a documented “failure to extract rate” (FTXR) below 1%;
  • 5 that met the conservative (C) threshold, where the FMR (False Match Rate) is three times smaller than the expected 1:10,000;
  • 9 that had FNMR below 1%;
  • 3 for which all of the above is true.

These results confirm the advantage of our technology in the critically important task of identity verification, where comparing selfies with ID documents requires high accuracy and robustness to variations in capturing conditions.

High efficiency in real-world use cases

Our facial recognition technology demonstrates high efficiency in real-world conditions, where accurate and fast identification is required. Our clients report that the Incode system makes fewer errors compared to competitors, enables quick user identification, and works effectively in low-light conditions, at various angles, and even with security elements present on documents.

Leaders across various industries trust Incode:

Nothing proves the reliability and accuracy of our technology better than real-world examples of its use. In the following examples, we will show how our system enhances security at stadiums and increases the average throughput during passenger check-in at airports.

Case Study 1: Get into games securely with Fan ID

Before the event in Querétaro, insecurity and violence had already impacted Liga MX, with various incidents that put the safety of fans, players, and officials at risk. However, the most notable incident was the mass brawl at Estadio Corregidora in 2022, during a match between Querétaro and Atlas, which resulted in severe sanctions, stadium closures, and an urgent call to improve security at sporting events. These incidents have led to the implementation of measures such as FAN ID to control access and strengthen security in the stadiums.

The Mexican Football Federation (FMF) and Liga MX, along with its 18 clubs, partnered with Incode to implement the FAN ID system in all Liga MX and Mexican National Team matches. This system aims to enhance stadium security by enabling efficient biometric identification of fans and controlling access in a safer and more organized manner, creating a more secure environment for all attendees. This system enhances security levels at stadiums by quickly detecting individuals banned from the event (visitors previously involved in criminal activities and listed on the blacklist).

Since its implementation in January 2023, more than 3 million FAN ID registrations have been generated by fans using over 150 types of identification (INE, passports, driver’s licenses), including foreign nationals.

Cultural adaptation has been steadily improving, and the impact of this implementation appears to be reflected in the increased attendance at Liga MX matches when comparing different tournaments:

  • Clausura (spring tournament):
    • 3.08 million people attended in 2022
    • 4.1 million people attended in 2023 (FAN ID was used)
    • 4.16 million people attended in 2024 (FAN ID was used)
  • Apertura (fall tournament):
    • 3.57 million people attended in 2022
    • 3.78 million people attended in 2023 (FAN ID was used)

Feedback from Visitors and Staff:

“When we began to apply more police presence in the stadiums, FAN ID, and reduced the presence of visiting supporter groups, we started to see a correlation with increased attendance at the stadiums.”

— Mikel Arriola, President of Liga MX

“We aimed for the solution to be designed with the fan’s convenience in mind; the reality is that it’s a very simple digital, online process that should take two or three minutes to complete.”

— Germán Elvira, Director of Digital Marketing of Liga MX

Case Study 2: Accelerated Registration at Airports

In modern airports, the passenger registration and control process must be as fast and reliable as possible. Viva Aerobus, one of Mexico's largest airlines, operating through Mexico City International Airport (AICM) and Monterrey International Airport, partnered with Incode to implement a contactless registration system using facial recognition, eliminating the need for paper boarding passes and manual document checks by staff. This significantly enhanced both passenger comfort and airport operational efficiency.

Before implementing our system, the boarding process was more time-consuming and prone to human errors. After deploying the system, it not only significantly reduced processing time and errors but also enhanced security measures.

At AICM, where the system crosschecks information between the ID, boarding pass, and the passenger’s face, the processing time is reduced by 55-60%, decreasing verification from 8-9 seconds to just 3-4 seconds while also eliminating errors associated with human verification.

Monterrey International Airport only verifies that the passenger has a valid boarding pass, and although the boarding process may take slightly longer, the system enhances security by ensuring that the correct person is presenting their boarding pass.

In both cases, passengers benefit from a seamless boarding experience using their biometrics.

Feedback from Passengers and Airport Staff
We couldn’t record passengers’ names as it would disrupt the boarding process, but we overheard several comments.

“Esto se siente como en los aeropuertos de Estados Unidos”
English: “This feels like airports in the United States”

“Que bueno que ya no tenemos que mostrar los documentos todo el tiempo. Mucho más rápido”
English: “It’s great that we don’t have to show our documents all the time. Much faster.”

“Es como lo que vi en Estados Unidos/Europa, me sorprende verlo aquí.”
English: “It’s like what I saw in the US/Europe, I’m surprised to see it here.”

Continuous model improvements

We regularly conduct experiments with various modern architectures and training approaches, exploring opportunities for improvement. As part of these studies, we analyze the effectiveness of different activation functions and loss functions, experiment with modern approaches such as transformers, and examine methods of data sampling and augmentation. Additionally, we keep track of technological trends and explore the possibility of integrating them to enhance the functionality of our systems.
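One family of loss functions explored in this kind of work adds an angular margin to the target-class logit, as popularized by losses such as ArcFace. The sketch below shows only that logit adjustment; the margin and scale are common published defaults, not our production hyperparameters, and the function name is ours.

```python
import math

# Illustrative sketch of an angular-margin logit adjustment of the kind
# used in modern face recognition losses (e.g. ArcFace). Margin and scale
# are common published defaults, not production hyperparameters.

def margin_logit(cos_theta, margin=0.5, scale=64.0):
    """Penalize the target-class logit by `margin` radians before softmax.

    cos_theta: cosine similarity between the face embedding and its class
    weight vector. Adding the margin forces embeddings of the same identity
    to cluster more tightly on the hypersphere.
    """
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return scale * math.cos(theta + margin)

# The adjusted logit is strictly harder to satisfy than the plain scaled
# logit, so training pushes same-identity embeddings closer together.
print(margin_logit(0.9) < 64.0 * 0.9)  # True
```

During training this adjusted logit replaces the plain one for the true class only, while all other classes keep their unmodified scaled cosines.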

We regularly update and expand our datasets, which is one of the key factors in maintaining and improving the performance quality of our models. The continuous collection of new data allows our models to adapt to changing conditions and ensure high recognition accuracy across various scenarios. New data enables the models to handle evolving conditions, such as the emergence of new document types with enhanced security features that impact the overall quality of facial images.

In addition to data collection, we experiment with increasing data diversity by generating synthetic data using generative models. This allows for dataset expansion without compromising privacy while ensuring broader coverage of various demographic groups.

The accuracy improvements of our algorithm over time on the NIST Mugshot-to-Mugshot dataset demonstrate that, despite already excellent absolute FNMR values, we continuously enhance the efficiency of our models.

Any questions or comments? Get in touch: