Incode’s Technology
Frontier AI Lab for Fraud Prevention
Incode is advancing the state of the art in AI models that identify and combat fraud. By leveraging foundational models trained on unique global fraud datasets and capable of continuous learning, Incode not only stops today’s fraud but also evolves at the speed of emerging gen-AI fraud threats.
Challenges
Fraud is accelerating with the rise of generative AI: deepfakes, synthetic identities, and automated impersonation agents can now be created and deployed at unprecedented scale, producing adversarial attacks that evolve as fast as the models that generate them.
Solution
Incode develops foundational models for identity, document, and behavior analysis. These models are trained on proprietary datasets and designed to adapt in real time to adversarial fraud attempts, staying ahead of evolving AI fraud.
VLM, LLM, Agent
Multimodal and Agentic Models for Real-Time Fraud Adaptation
To interpret complex identity, document, and behavioral signals, Incode develops Vision-Language Models, Large Language Models, and reasoning agents that work across modalities to evaluate fraud patterns and support adaptive detection as new attacks emerge.
Identity Vision-Language Model (VLM)
Incode’s VLM is trained on global identity data, documents, templates, and synthetic fraud samples across 200+ regions. With few-shot learning, it adapts quickly to new attack types and unseen document formats. It analyzes visual and textual signals to detect tampering, synthetics, deepfakes, and altered documents with high precision.
What it powers:
Tamper & Synthetic Detection: Identifies deepfakes, swaps, edits, and synthetic visual content.
Document Intelligence: Classifies document types, performs OCR, and extracts structured fields.
Visual–Text Consistency Checking: Cross-verifies that images, text, and metadata align and are authentic.
Performance: Superior accuracy vs. traditional ML classifiers in production benchmarks and fraud simulations.
Key performance improvements:
- 67% fewer errors in fake ID detection
- +4.0% gain in document tampering accuracy
- 25% fewer errors in signature fraud detection
- ~80% faster training cycles with optimized VLM architecture
- 0% APCER on challenging liveness dataset
Fraud Large Language Model (LLM)
Incode’s Fraud LLM is trained on proprietary fraud datasets, identity metadata, behavioral sequences, device signals, and transactional flows. Through few-shot and transfer learning, it adapts rapidly to emerging fraud tactics and contextual manipulation attempts. It interprets complex, multi-source patterns in real time to uncover hidden anomalies and intent.
What it powers:
Anomaly & Pattern Detection: Detects unusual sequences, behaviors, and fraud tactics.
Risk Signal Extraction: Derives structured insights from messy, multi-source data.
Fraud-Intent Classification: Distinguishes benign user mistakes from coordinated fraud attempts.
Performance: Early internal testing shows significant gains over traditional rule-based and ML classifiers.
Reasoning Agents for Risk Intelligence
Reasoning Agents are trained on hundreds of signals and millions of labeled identity and fraud outcomes. Using reinforcement learning and client-specific histories, they refine decision boundaries over time. They orchestrate outputs from VLMs, LLMs, device telemetry, and behavioral insights into a unified, context-aware risk assessment.
What it powers:
Holistic Risk Decisions: Combine multi-modal signals to deliver real-time, context-aware outcomes.
Signal Coordination & Conflict Resolution: Weigh and balance signals when they disagree.
Adaptive Verification: Tailor decision logic to each client’s risk profile and tolerance.
Performance: Designed to minimize human intervention while reducing error rates. Early Risk AI Agent results show reduced fraud and higher approval rates for genuine users.
Mexico:
- FAR improves by ≈47.6%
- FRR improves by ≈60.3%
USA:
- FAR improves by ≈12.1%
- FRR improves by ≈19.1%
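The signal-fusion step described above can be sketched as a weighted logistic combination of per-model scores with a review band for uncertain sessions. The signal names, weights, and thresholds below are illustrative assumptions, not Incode’s actual agent logic:

```python
import math

def fuse_risk_signals(signals: dict[str, float], weights: dict[str, float],
                      bias: float = -2.0) -> float:
    """Combine per-model scores (each in 0..1) into one calibrated risk
    probability via a weighted logistic model. Names and weights are
    hypothetical placeholders."""
    z = bias + sum(weights[name] * score for name, score in signals.items())
    return 1.0 / (1.0 + math.exp(-z))

def decide(risk: float, approve_below: float = 0.2, reject_above: float = 0.8) -> str:
    # Route mid-band sessions to review instead of forcing a hard call.
    if risk < approve_below:
        return "approve"
    if risk > reject_above:
        return "reject"
    return "review"

signals = {"face_match": 0.05, "liveness": 0.10, "doc_tamper": 0.92, "device_risk": 0.60}
weights = {"face_match": 1.5, "liveness": 2.0, "doc_tamper": 3.0, "device_risk": 1.0}
risk = fuse_risk_signals(signals, weights)
decision = decide(risk)
```

In a real deployment the weights would be learned (e.g., via reinforcement learning on labeled outcomes, as described above) and the approve/reject thresholds tuned per client to trade FAR against FRR.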
Training data
Comprehensive Training Data
Incode’s multimodal and agentic models rely on a strong data foundation: global coverage, structured training pipelines, early exposure to high-fraud environments, and real-time network intelligence. The following sections outline how these data layers support accurate, secure, and adaptive AI model performance.
Data access
Extensive Data Coverage for Model Enhancement
Incode trains resilient models with data from government databases, enterprise integrations, and billions of global verifications. This scale provides the diversity and depth required to train models with broad regional coverage and resilience against a wide range of fraud tactics.
Global Coverage
200+
Countries
4,600+
Document types
Data at Scale
400M+
Unique Identities
4.1B+
Identity checks
Enterprise Coverage
700+
Enterprise clients
20+
Industries served
Source of Truth Data
670+
Connections to verified identity databases
15+
Biometric government source-of-truth connections
Fraud Data Infrastructure
Turning global data signals into real‑time fraud defense intelligence
Building on this global data access foundation, Incode uses structured labeling, synthetic data generation, and continuous stress-testing to turn raw signals into high-quality training data for fraud-resilient AI models.
Labeling Pipelines
Human and automated review across millions of records.
200+ human labelers creating training data and measuring AI performance for each customer.
Synthetic data
120+ synthetic data generation tools.
Generation of tampered documents, presentation attacks, and deepfakes to train models on rare threats.
Fraud Lab
Red-team environment to simulate and replay real-world attack scenarios.
Continuous stress-testing of models to strengthen defenses against new fraud tactics.
Access to the best training data for fraud prevention
Models trained with high-fraud regions data
We started in Latin America, one of the world’s highest-fraud regions, covering 66% of the adult population. This early exposure to complex, large-scale fraud hardened our models from the start.
Building on this foundation, Incode now serves 700+ enterprises worldwide, including 8 of the top 10 banks in North America and reaching about 65% of U.S. adults. The same models hardened in high-fraud markets are now deployed globally, supporting customers across industries and regions.
Because new fraud techniques often appear first in LATAM, our strong presence there gives us an early-warning advantage, helping models adapt faster and deliver stronger fraud prevention before threats spread to other regions.
Cross Organization Fraud Detection
Trust Graph
Trust Graph is Incode’s privacy-preserving identity intelligence network. By connecting and comparing billions of biometric, data, and document signals, it flags fraudulent anomaly patterns, first-party fraud, and previously identified fraudsters without exposing any PII.
This network effect not only improves fraud detection in real time but also enriches model training data, improving the density, diversity, and adaptiveness of Incode’s AI models and making them more accurate and resilient over time.
Incode’s Vector Face Database
This in-house technology powers Trust Graph by enabling instant search across hundreds of millions of identity embeddings generated by our recognition models. Optimized for speed, it delivers sub-20 ms response times with full recall while maintaining distributed reliability and in-memory performance.
Performance
Millisecond search at massive scale.
Reliability
Distributed with zone-level resilience and replication.
Flexibility
Customizable index structures, replication factors, and comparison functions.
Security
Encryption, retention controls, and full auditability.
Vector Face Database
Incode’s Proprietary Vector Face Database
Designed for storing and searching facial embeddings at scale, it supports both 1:1 verification (matching a probe to a claimed identity) and 1:N identification (comparing a probe against large galleries to find the best matches).
Performance
20–40 ms search times across hundreds of millions of vectors
Elastic Resilience
Vector indexing, sharding, parallel querying, and autoscaling
Efficiency
Caching reduces tail latency for frequent queries
Security
Built-in encryption, retention controls, and auditability
Identity Density
How Incode Measures Identity Confidence
Identity density expresses how confidently a user’s identity can be confirmed. Incode measures it by combining deterministic records with probabilistic AI signals, powered by our global data foundation, multimodal models, and Trust Graph intelligence.
Rich Identity Density Through Deterministic and Probabilistic Mapping
Layered Identity Verification
Deterministic mapping anchors identities with hard, verified facts such as biometric government sources or previously verified checks.
Probabilistic mapping uses AI-driven ML models that analyze patterns across face, document, device, and behavior to expand coverage and detect anomalies when direct records are limited.
Deterministic Sources
Incode’s Network
400M+ identities confirmed by Incode.
Biometric SOTs
15+ connections to biometric government sources of truth.
Probabilistic Sources
VLMs, LLMs, Intelligent Agents, and Incode’s deep learning face, document, and fraud-detection models expand coverage to identities not captured by deterministic data sources.
Together, deterministic and probabilistic sources help Incode create denser identity coverage by adding more datapoints, more signals, and greater certainty when verifying an identity.
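As a toy illustration of this layering, deterministic anchors could set a high confidence floor while each probabilistic model signal closes part of the remaining uncertainty. The formula and weights here are hypothetical, not Incode’s actual density metric:

```python
def identity_density(deterministic_hits: int, probabilistic_scores: list[float]) -> float:
    """Toy identity-density score in 0..1. A deterministic anchor (a verified
    government or network record) dominates; each probabilistic signal then
    closes part of the remaining uncertainty. Values are illustrative."""
    base = 0.9 if deterministic_hits > 0 else 0.0  # hard records anchor the identity
    remaining = 1.0 - base
    for s in probabilistic_scores:
        s = max(0.0, min(1.0, s))       # clamp model scores to valid range
        remaining *= (1.0 - s)          # each signal shrinks residual doubt
    return 1.0 - remaining
```

Under this sketch, a verified government match plus one strong model signal yields near-total confidence, while probabilistic signals alone still accumulate meaningful (but lower) density — matching the “more datapoints, more signals, greater certainty” framing above.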
ML Models
Deep Learning Models
Face Intelligence and Core Models
End-to-end face perception that detects faces, creates robust embeddings, and matches identities at scale via a vector engine, continuously improving through calibration and hard‑case mining.
3rd Party Validation
- NIST: #1-ranked technology for facial recognition
- 1:1 NIST Certified
- 1:N NIST Certified
- FIDO Face certification
- DHS RIVTD: Incode was one of only 3 vendors to meet all key benchmarks (FTXR <1%, FNMR <1%, FMR below 1:10,000).
Generic Face Detector
Detects and localizes faces in selfies and ID images, serving as the foundation for downstream tasks such as recognition, liveness, and document validation. The model is trained to handle varied image conditions, including rotation, occlusions, and non-human distractors. Evaluated on datasets covering selfies, IDs, rotated samples, negatives, and non-human inputs.
Face Recognition 1:1
Performs one-to-one biometric matching by comparing a live selfie against the portrait extracted from a government ID. The model is optimized to minimize both false accepts and false rejects under strict thresholds. Evaluated on a dataset of 5.8M+ selfie–ID pairs.
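The 1:1 comparison step can be sketched as cosine similarity between L2-normalized embeddings against a fixed operating threshold. The threshold value below is an illustrative assumption; in practice it is tuned on evaluation data to balance false accepts against false rejects:

```python
import numpy as np

def match_1to1(selfie_emb: np.ndarray, id_portrait_emb: np.ndarray,
               threshold: float = 0.62) -> bool:
    """1:1 verification sketch: cosine similarity of L2-normalized embeddings
    compared to an operating threshold (0.62 is illustrative, not tuned)."""
    a = selfie_emb / np.linalg.norm(selfie_emb)
    b = id_portrait_emb / np.linalg.norm(id_portrait_emb)
    return float(a @ b) >= threshold
```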
Face Recognition 1:N
Performs one-to-many biometric search by embedding a live selfie into a high-dimensional feature space and comparing it against a gallery of enrolled identities. The model is designed for scalability and efficiency, supporting large databases while maintaining strict accuracy thresholds. It minimizes false accepts and false rejects through optimized indexing and similarity scoring. Evaluated on a dataset of 5.8M+ selfies.
FaceDB (Vector Matching Engine)
A vector database and matching engine for facial templates. It enables 1:N identification and 1:1 authentication by converting faces into embeddings (high-dimensional vectors) and comparing them efficiently. Built as a C++ binary with an Elixir orchestration layer, FaceDB uses HNSW indexing for similarity search and supports both standalone and clustered deployments with autoscaling and index migration.
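To illustrate the embed-and-compare idea, the sketch below uses a brute-force gallery; an engine like the FaceDB described above replaces the exhaustive scan with an approximate-nearest-neighbor index (HNSW) to keep search fast at hundreds of millions of templates. All names here are hypothetical:

```python
import numpy as np

class TinyFaceGallery:
    """Brute-force stand-in for a vector face database, for illustration only.
    Production engines use ANN indexing (e.g., HNSW) instead of a full scan."""
    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list[str] = []
        self.embs = np.empty((0, dim), dtype=np.float32)

    def enroll(self, identity_id: str, emb: np.ndarray) -> None:
        emb = emb / np.linalg.norm(emb)  # store L2-normalized templates
        self.ids.append(identity_id)
        self.embs = np.vstack([self.embs, emb.astype(np.float32)])

    def identify(self, probe: np.ndarray, k: int = 5) -> list[tuple[str, float]]:
        """1:N identification: return the top-k gallery matches by cosine
        similarity (a dot product, since all vectors are normalized)."""
        probe = probe / np.linalg.norm(probe)
        sims = self.embs @ probe
        top = np.argsort(-sims)[:k]
        return [(self.ids[i], float(sims[i])) for i in top]
```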
Liveness and Core Models
Multi-modal defenses that distinguish real users and physical IDs from spoofs and deepfakes using spatial, temporal, and device-aware signals with continual hard‑negative training.
3rd Party Validation
- First passive liveness technology to be certified in the market
- iBeta ISO/IEC 30107‑3 Presentation Attack Detection (PAD) Level 2 confirmation for passive liveness.
Face Liveness
Detects whether a selfie comes from a live human rather than a spoof (photo, screen replay, mask, or deepfake). Incode’s default passive liveness has been evaluated on a dataset of 150,000+ spoof attempts, covering replays, paper copies, 2D masks, and 3D masks.
Document Liveness
Determines whether an identity document presented to the camera is a genuine physical ID or a spoof (printed copy, photo, or screen replay). Paper ID Liveness v5.0 has been evaluated on a dataset of 34,000+ spoof attempts, covering paper copies and screen replays across diverse document types.
Deepfake Defense and Core Models
Multi-modal models that detect and block AI-generated fraud (deepfakes, face swaps, document injections, and synthetic identities) by combining pixel-level artifact analysis with generative-pattern inconsistency detection and cross-signal checks across face, liveness, document, barcode, and metadata inputs.
3rd Party Validation
- #2 in the ICCV 2025 DeepID Challenge, a benchmark focused on detecting Gen-AI generated identity documents.
- Ranked #1 in deepfake attack detection by Hochschule Darmstadt, outperforming commercial vendors and research labs
Deepfakes
Digital Liveness (Deepfake & Injection Detection): Detects whether a selfie has been synthetically generated, altered, or injected (e.g., face morphs, swaps, or AI-generated deepfakes). Evaluated on a dataset of 40,000+ digital spoof attempts.
Gen-AI Documents
Age Assurance and Core Models
Policy-ready age estimation that provides calibrated predictions with uncertainty bounds and fairness constraints, routing edge cases to secondary verification.
3rd Party Validation
- NIST: Each model ranked among the top 3 in the market for the lowest average MAE across all ages
- NIST: Fastest Response Time among age verification vendors for Age estimation
- ACCS accreditation under PAS 1296 (Age Check Certification Scheme, UKAS‑accredited)
Age Estimation
AI model that estimates a user’s age from a selfie, designed to enforce age-based compliance while minimizing bias across demographics, trained on a dataset of 200,000+ images.
Document Intelligence and Core Models
Document understanding that classifies type, extracts and validates OCR, MRZ, and barcodes, and detects tampering, fusing signals into a document authenticity score that adapts with active learning.
Document Type Classification
Determines the category and regional origin of an identity document by analyzing its visual layout, textual content, and structural patterns. The model leverages multimodal machine learning techniques to distinguish between document types and issuing authorities, even under varied formats and scan conditions. Evaluated on a large-scale dataset of diverse global identity documents.
Tamper Detection
Detects whether the portrait or key fields on an identity document have been digitally manipulated (e.g., replacement of the main photo, text-field alterations, or other digital edits). The model leverages pixel-level anomaly detection and cross-template consistency checks to identify tampering. Evaluated on a dataset of 8,200+ tampered ID samples.
ID Text Readability
Determines whether the text fields on an identity document are readable for automated data extraction. The model processes cropped ID images with a binary mask over text zones and classifies them into three categories: unreadable, no text fields of interest, readable. Evaluated on both test and production datasets, it shows significant improvements over earlier segmentation-based approaches.
ID Cropping (Web + Mobile)
Detects and crops identity documents from camera frames while also estimating text and barcode readability, as well as image quality factors such as blur and glare. Both models are designed to ensure that captured IDs meet readability standards for automated processing, across mobile and web environments.
Barcode Validation
Ensures that barcode data extracted from identity documents is correct, complete, and compliant before use downstream. The model performs multiple layers of validation:
- Symbology Checks: Confirm the barcode type (e.g., PDF417) and enforce structural rules.
- Decoding Integrity: Verify error-correction and checksums; perform re-encoding round-trip checks.
- Spec Compliance: Validate against standards (e.g., AAMVA), required fields, delimiters, and length constraints.
- Data Consistency: Cross-check fields against OCR/MRZ and validate logical values (DOB, expiration, issue date).
- Security & Anti-Tamper: Detect truncation, padding abuse, malformed segments, and flag anomalies or signature/hash mismatches.
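The spec-compliance and data-consistency layers above can be sketched as checks over decoded barcode fields. The element IDs follow the AAMVA convention for US driver’s licenses, but the required set and date logic below are a simplified, hypothetical subset:

```python
from datetime import datetime

# Illustrative subset of AAMVA DL/ID element IDs; a real validator covers many more.
REQUIRED_ELEMENTS = {"DAQ", "DBA", "DBB", "DBD"}  # ID number, expiry, DOB, issue date

def validate_aamva_fields(fields: dict[str, str]) -> list[str]:
    """Sketch of spec-compliance and consistency checks on decoded PDF417
    fields. Returns a list of issues (empty means the checks passed)."""
    issues: list[str] = []
    missing = REQUIRED_ELEMENTS - fields.keys()
    if missing:
        issues.append(f"missing required elements: {sorted(missing)}")
        return issues
    def parse(eid: str) -> datetime:
        # US-jurisdiction AAMVA dates use MMDDCCYY.
        return datetime.strptime(fields[eid], "%m%d%Y")
    dob, issued, expires = parse("DBB"), parse("DBD"), parse("DBA")
    if not (dob < issued < expires):
        issues.append("date fields are not logically ordered (DOB < issue < expiry)")
    return issues
```

A production validator would add the remaining layers from the list above: symbology and checksum verification on the raw barcode, full AAMVA field/length rules, and cross-checks against OCR and MRZ values.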
Fraud & Risk Defense
A real-time orchestration layer that combines model outputs with a trust graph to score and route risk decisions, optimizing thresholds through continuous feedback and counterfactual evaluation.
Risk AI agent
Integrates 100+ tabular features from face, document, liveness, and event signals to estimate the probability that a session is fraudulent. The model reduces the need for manual reviews while improving fraud detection and approval rates for legitimate users.
Trust Graph
A fraud defense layer that links users, devices, sessions, and documents to uncover coordinated attacks. It maintains a global fraud list and connects traces like faces, device hashes, and document numbers, flagging anomalies such as shared devices, repeated IDs, or deceptive personal details.
- Trace Types: Face, Face-on-ID, Device Hash, Document Number, Personal Number, Voter Number
- Fraud Reasons: Liveness Attacks, Document Tampering, Deceptive Info, Loan/Credit Abuse, Account Selling, Rewards Abuse, Circular Trading, False Invoicing
- Global Fraud List: Continuously updated across clients
Evasion Fraud
A model designed to block advanced fraud attempts that try to bypass liveness and face-recognition systems. It targets adversarial behaviors such as extreme expressions, partial or half-masks, occlusions, and other attempts to manipulate on-device capture. The model was evaluated on a dataset of 54,000 samples collected from production environments, representing real-world evasion scenarios.
Device and Behavior Intelligence
Machine learning models that analyze device, network, and user-interaction signals to assess session integrity and risk. Inputs include hardware and OS characteristics, network attributes, emulator or automation indicators, and behavioral telemetry such as typing cadence, swipe velocity, or cursor movement. The models identify anomalies, automation, and high-risk usage patterns.
Behavioral Model
Machine learning models that analyze patterns of human interaction to assess authenticity and risk. Inputs include typing cadence, keystroke dynamics, swipe velocity, scrolling behavior, and cursor movement. These models detect anomalies such as scripted activity, replayed interactions, or unusual usage patterns that may indicate fraud or automation.
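One simple heuristic of this kind, sketched with illustrative thresholds: scripted input tends to show near-uniform inter-keystroke timing, while human typing has natural jitter. Real behavioral models learn such patterns from data rather than using a fixed rule:

```python
from statistics import mean, pstdev

def looks_scripted(inter_key_ms: list[float], min_cv: float = 0.15) -> bool:
    """Flag keystroke sequences whose timing is suspiciously uniform.
    Uses the coefficient of variation (stdev/mean) of inter-key intervals;
    the 0.15 threshold is an illustrative assumption."""
    if len(inter_key_ms) < 5:
        return False  # too little data to judge
    m = mean(inter_key_ms)
    if m == 0:
        return True  # zero-delay input is a strong automation signal
    return pstdev(inter_key_ms) / m < min_cv
```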
Device Signal Model
Machine learning models that evaluate the integrity and risk profile of the device and network used in a session. Inputs include hardware and OS characteristics, browser and app attributes, IP and network indicators, emulator or automation flags, and sensor data. These models identify signs of compromised, emulated, or suspicious devices to prevent unauthorized access and fraud.
Governance
Comprehensive governance framework covering data practices, security, model development, fairness, and compliance to ensure responsible AI.
Data Practices
Purpose limited, minimized, encrypted data with regional compliance options.
Access & Security
Role-based controls, secure SDLC, HSM key management, and full audit logging.
Dataset Quality
Curated, balanced datasets with pseudonymization and continuous QA.
Model Development
Reproducible pipelines, versioned training, and performance-driven tuning.
Fairness & Bias
Bias testing across demographics with remediation and ongoing monitoring.
Deployment Controls
Staged rollouts, canary checks, kill-switches, and secure microservice deploys.
Monitoring & Feedback
Real-time dashboards, drift detection, and fraud-focused production alerts.
Retention & Deletion
Configurable retention, verified deletion, and GDPR/CCPA-aligned policies.
Incident & Continuity
24/7 monitoring, DR readiness, and fast response to emerging fraud vectors.
Compliance & Transparency
SOC2/ISO-certified, GDPR/CCPA/LGPD compliant, with a public Trust Center.
Data Practices
- Purpose-limited collection strictly for fraud prevention and security, with explicit user consent where required.
- Data minimization combined with configurable regional residency and jurisdictional compliance options.
- Encryption in transit and at rest using AES-256, with secure key management.
- Rigorous data curation: controlled collection timelines, balanced datasets across age, skin tone, device, spoof type, and environmental conditions.
Access & Security
- Role-based access with least privilege, supported by continuous monitoring and full audit logging.
- Secrets and key management via HSMs or customer-managed keys, with rotation policies.
- Secure SDLC practices: containerized environments, dependency scanning, peer reviews, and formal change controls.
- Encrypted model storage with obfuscated decryption keys embedded in codebase.
Dataset Quality
- Human-plus-automated labeling with three-way consensus protocols and QA sampling.
- Pseudonymization wherever possible, and version-controlled datasets with full lineage.
- Continuous inflow of “in-the-wild” production data to minimize bias and drift.
Model Development
- Isolated training environments with fully reproducible pipelines.
- Explicit versioning of data, code, packages, hyperparameters, and artifacts for reproducibility.
- Structured training protocols: algorithm selection, hyperparameter tuning, loss function design, and model optimization.
- Iterative performance evaluation with ROC curve analysis, ensuring thresholds tuned for zero-tolerance false positives in sensitive tasks.
Fairness & Bias
- Pre- and post-training bias analysis across demographics (age, sex, ethnicity) and document types.
- Slice testing to detect uneven error rates across cohorts, with remediation steps.
- Continuous re-evaluation of fairness as models evolve, supported by statistical testing and stakeholder feedback loops.
Deployment Controls
- Staged rollouts with canary testing, kill-switches, rollback options, and policy-driven promotion to production.
- Integration validation with statistical and functional tests prior to deployment.
- Secure containerized deployment across microservices and SDK environments.
Monitoring & Feedback
- Real-time dashboards for fraud-capture, accuracy, latency, and inference times.
- Automated drift detection for data and models, combined with analyst investigation workflows.
- Production monitoring tuned to flag anomalies, spoofing attempts, and unsupported documents.
Retention & Deletion
- Configurable retention windows defined by customer or regulation.
- Verified deletion workflows, with immutable audit trails confirming compliance.
- Automated enforcement of retention policies aligned to GDPR/CCPA.
Incident & Continuity
- 24/7 monitoring, severity-based SLAs, and structured post-incident reviews.
- Regional redundancy, regular backup validation, and disaster recovery drills.
- Fraud Lab integration ensures quick identification of new attack vectors and response playbooks.
Compliance & Transparency
- SOC 2 Type II, ISO/IEC 27001, and ISO/IEC 30107 certifications; participation in NIST FRTE/FATE and iBeta PAD testing.
- GDPR, CCPA, LGPD compliance with full DPA/DPIA processes.
- Subprocessor due diligence with a publicly available and regularly updated list.
- Trust Center with transparency resources: Privacy Policy, Security Whitepaper, certifications, and audit reports.
Our Governance Documentation