I never imagined that the first blog post in a long time wouldn’t be about my research but about recent events that are impeding the completion of what’s been a very long development cycle. In all of the discussions of AI with which I’m bombarded daily, I haven’t seen anyone talk about this risk: the volatility of the companies themselves.
I’ve been using OpenAI’s products for several years now. They’ve become an integral part of my research, and while I have also used several other models, I tend to come back to OpenAI repeatedly. I pay for the top tier of ChatGPT (which is a solid research tool), and I’ve been using API access as a critical component of what drives my research (even though AI is not my research, it is the tool that demonstrates why what I’ve been doing matters).
Two days ago, OpenAI changed API access to a number of their models to remove certain functionality unless you go through a “verification” process. I admit I’ve seldom had one of these automated processes go well, but in this case OpenAI has done the worst job I could imagine. I used my driving licence, and it then wanted me to take pictures of my face with my phone. Whoever thought that asking you to push a button on the screen while you’re looking away from the screen was a good idea is probably a graduate of the Marquis de Sade school of UX/UI design. Or they’re really systems software developers (who believe a CLI is “all anyone needs”).
Don’t get me wrong, I chuckle watching these coding editors use “sed”, as most developers now don’t know why it’s called sed (ahem), let alone have used ed (the editor from UNIX – it was certainly there in V6, but from what I can tell it went away at some point). Aside: the memories of using ed to edit files on a hard TTY (greenbar tractor-feed paper, with a keyboard) are still lurking in the recesses of my memory.
Turns out the verification “failed” due to some sort of technical issue, so the third-party provider working on OpenAI’s behalf said “you’ll have to get a new verification link.”
I was eventually able to find a chat interface to OpenAI’s support (their Discord has a “bug reports” channel, but I’m not allowed to post to it). The answer from support was:
If your verification failed, you won’t be able to access certain features—like GPT-4o and other O Series models—via the API. These models require a verified organization to use. For more details, please refer to our article here: API organization verification.
Regards,
Sneha J
OpenAI Support
Hence, a very real risk of relying upon any AI supplier is that they’re inherently unstable. I read that this new verification was added to prevent companies like DeepSeek from using OpenAI’s models to train their own. Lesson: OpenAI (and likely all of the providers) are risky to rely upon because they can quickly change direction and leave you without any resolution.
I decided to dig more (which is not what I should be doing, but this one really has me annoyed, partly because I really do not like the perspective of companies that seem to think “this would be a great business if it weren’t for the customers”). Naturally, I decided to turn to ChatGPT. I’ll link the entire conversation (in case you want to dig into it), but for this post I’ll summarize my findings about these sorts of “identity verification services”. Note that I did this as a “deep research” request.
The response is long. I’ll summarize it here and include the full text following.
Summary (provided via ChatGPT, reviewed by me, not modified):
Camera-based identity verification systems (such as those offered by Persona, Onfido, Jumio, and ID.me) have made significant progress, achieving high accuracy—often exceeding 99%—in controlled conditions, according to industry benchmarks and third-party evaluations. However, real-world results are more variable, with legitimate users frequently facing verification failures due to common practical issues such as poor lighting, unclear photos, device limitations, or overly strict one-time verification attempts. These problems disproportionately affect certain demographics and older devices. While top-tier providers implement sophisticated biometric comparisons and liveness detection to reduce errors, the best user experiences combine automated checks with manual fallback processes and reasonable retry policies, highlighting that reliability depends as much on thoughtful implementation and fairness as on technical accuracy.
Query and Response
Here’s a public link to the entire conversation:
I’ll research the accuracy and reliability of identity verification services that use camera-based biometric comparison and document scanning—both generally across the industry and specifically for withpersona.com, if data is available.
I’ll also look for:
- Reported accuracy metrics (false positives/negatives)
- Common failure scenarios (e.g., document positioning, lighting, chip scanning)
- Any published benchmarks, user feedback, or audits
- Regulatory commentary or standards (e.g., NIST guidelines)
I’ll let you know what I find!
Accuracy and Reliability of Camera-Based Identity Verification
Overview of Camera-Based Identity Verification
Camera-based third-party identity verification (IDV) typically involves two key steps: document verification (scanning a photo ID for authenticity) and selfie biometrics (capturing a live facial image to compare with the ID photo). This process is widely used for remote Know Your Customer (KYC) and user onboarding. Modern services combine optical checks (OCR of text, security feature detection, and even RFID chip reading on e-passports) with facial recognition (matching the selfie to the ID photo) and often include liveness detection to ensure the person is real and present. The goal is to accurately approve legitimate users while catching impostors, which is measured by metrics like false acceptance and rejection rates, as described below.
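To make that flow concrete, here is a minimal Python sketch of how the steps typically chain together. It is illustrative only: the helper functions (authenticate_document, liveness_check, face_match), the data shapes, and the 0.85 threshold are hypothetical stand-ins rather than any vendor’s actual API.

```python
# Hypothetical sketch of a remote IDV pipeline; illustrative stand-ins only.
from dataclasses import dataclass

@dataclass
class DocumentCheck:
    authentic: bool
    portrait: bytes  # the photo cropped from the ID document

def authenticate_document(id_image: bytes) -> DocumentCheck:
    # Placeholder: a real system OCRs the text, inspects security features,
    # and (for e-passports) may read the RFID chip over NFC.
    return DocumentCheck(authentic=True, portrait=id_image)

def liveness_check(frames: list) -> bool:
    # Placeholder: passive systems analyze a selfie or short clip; active
    # systems ask the user to turn their head, blink, etc.
    return len(frames) > 0

def face_match(selfie: bytes, portrait: bytes) -> float:
    # Placeholder: returns a similarity score in [0, 1].
    return 0.92

def verify_identity(id_image: bytes, frames: list, threshold: float = 0.85):
    """Chain the three checks; any failure short-circuits with a reason."""
    doc = authenticate_document(id_image)
    if not doc.authentic:
        return False, "document failed authenticity checks"
    if not liveness_check(frames):
        return False, "liveness check failed"
    score = face_match(frames[0], doc.portrait)
    if score < threshold:  # the relying party tunes this FAR/FRR trade-off
        return False, f"face match score {score:.2f} below threshold"
    return True, "verified"

print(verify_identity(b"<id photo>", [b"<selfie frame>"]))
```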
Key Accuracy Metrics and Industry Benchmarks
False Acceptance Rate (FAR) and False Rejection Rate (FRR) are the standard metrics for accuracy. FAR is the percentage of fraudulent or mismatched users that are erroneously accepted (i.e. impostors slipped through), while FRR is the percentage of legitimate users incorrectly rejected (false negatives). An ideal system has both FAR and FRR as low as possible, but in practice there is a trade-off – tightening thresholds to reduce FAR often raises FRR. Many providers report confidence scores for matches, and clients choose a threshold (e.g. require similarity score above X% to pass).
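As a worked illustration of that trade-off, the short sketch below (toy, invented scores rather than real data) computes FAR and FRR at a few thresholds: raising the threshold pushes FAR down and FRR up.

```python
# Toy illustration of the FAR/FRR trade-off; scores and labels are invented.

def far_frr(scores, is_genuine, threshold):
    """Return (FAR, FRR) for a given accept threshold."""
    impostor = [s for s, g in zip(scores, is_genuine) if not g]
    genuine = [s for s, g in zip(scores, is_genuine) if g]
    far = sum(s >= threshold for s in impostor) / len(impostor)  # frauds accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # real users rejected
    return far, frr

# Similarity scores in [0, 1]: genuine attempts tend high, impostors low,
# but the distributions overlap, which is what forces the trade-off.
scores     = [0.95, 0.91, 0.88, 0.72, 0.65, 0.80, 0.55, 0.40, 0.78, 0.30]
is_genuine = [True, True, True, True, True, False, False, False, False, False]

for t in (0.5, 0.7, 0.9):
    far, frr = far_frr(scores, is_genuine, t)
    print(f"threshold={t:.1f}  FAR={far:.0%}  FRR={frr:.0%}")
# threshold=0.5  FAR=60%  FRR=0%
# threshold=0.7  FAR=40%  FRR=20%
# threshold=0.9  FAR=0%   FRR=60%
```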
Vendor-Reported Performance: Top vendors claim extremely high accuracy. For example, Onfido (a major IDV provider) announced that with its latest AI system (“Motion” with Atlas AI), false rejection and false acceptance rates are both below 0.1% (Onfido launches the next generation of facial biometric technology | Onfido). Onfido also reports improving its false acceptance rate 10× to ~0.01% on average after bias-mitigation efforts (Onfido’s Real Identity Platform Improves Performance by 12x | Onfido). Jumio, another leading provider, integrates iProov’s liveness technology; iProov is known for “industry-leading accuracy and low false rejection rates” that yield high user pass rates (Jumio Adds iProov’s Award-Winning Liveness Detection to its KYX Platform | iProov). Socure (which offers a combined document and data-driven verification) claims up to 99% identity verification success rates for mainstream user populations (Socure Verifies Over 2.7 Billion Identity Requests in 2024, Achieves …). Persona (withpersona.com) has not publicly disclosed specific error rates, but its platform is certified by independent labs – their selfie liveness passed rigorous iBeta Level-2 testing with 0% spoof acceptance (APCER) and was evaluated by NIST and DHS, indicating high accuracy. Persona emphasizes that its models showed “no material bias across age, sex, or skin tone” in internal and third-party tests (Industry-Leading, Lab-Certified Face Recognition | Persona).
Independent Benchmark Results: Neutral evaluations reveal a wider range of performance across the industry:
- A 2024 General Services Administration (GSA) study (with Clarkson Univ.) tested five major remote IDV solutions (selfie-to-ID match) on ~4,000 people. Results varied dramatically – the best system had about a 10.5% false negative (false rejection) rate (with a 95% confidence interval ±4.5%), while another system had over a 50% false rejection rate for genuine users (GSA testing finds variations in the accuracy of digital ID verification tech – Nextgov/FCW) ([2409.12318] A large-scale study of performance and equity of commercial remote identity verification technologies across demographics). In other words, some solutions were fairly accurate, but at least one failed half of legitimate users, highlighting reliability issues in parts of the industry. Notably, two out of five solutions were deemed “equitable” (no significant performance bias across demographics), but others showed higher failure rates for certain groups (e.g. one vendor had significantly higher rejection of Black/African American users and those with darker skin tones) ([2409.12318] A large-scale study of performance and equity of commercial remote identity verification technologies across demographics).
- The DHS Science & Technology RIVTD evaluation (2023–2024) provides a detailed benchmark of top vendors’ capabilities in document authentication, face matching, and spoof detection. In the document authenticity test, 12 document verification systems were tested with 1,000 genuine IDs and 1,000 fakes on various smartphones. The performance depended on the ID issuer and device used; DHS recommended choosing systems with <10% error rates for document detection (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update) (this threshold is being incorporated into NIST SP 800-63-4 guidelines). In the face match test (selfie vs. ID photo), many algorithms performed extremely well under controlled conditions: over half of the systems matched legitimate users with >99% accuracy, achieving False Non-Match Rates below 1% at a strict False Match Rate of 0.01% (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). (In fact, 9 out of 16 algorithms had FNMR <1% at FAR 1:10,000 (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update).) However, there were also low performers (one “worst” system failed to match any IDs correctly). On average, 55% of errors stemmed from the ID photo extraction (poor image capture from the document) (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). The impostor test in this study found that when a random person’s selfie was tested against someone else’s ID, many systems correctly rejected those attempts >99.99% of the time (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). But if the impostor shared similar demographics to the victim, the success rate of impostors was 10× higher (a sign that impostors who look similar to the target are harder to catch) (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update).
- For liveness/presentation attack detection (PAD), the DHS test showed wide variance in user experience. Active liveness systems (those asking users to turn their head or perform an action) had bona fide pass rates ranging from only 41% up to 94%, with a median around 85% (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). This means some active systems were very finicky (failing almost 60% of real users in the worst case), whereas the best were quite seamless. Interestingly, error rates were higher for older users – 9% for ages 18–45 vs. 20% for 46+ in active liveness checks (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). Passive liveness systems (which analyze a selfie image or short video clip without explicit user action) generally operated faster and more consistently; the median passive system had virtually 0% error for real users, though the worst passive system only succeeded 62% of the time (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). In terms of security, at least two active and two passive systems caught 100% of spoof attacks in the test (i.e. 0% spoof acceptance), whereas one poor system failed to detect 88% of spoof attempts (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). Device type had a significant impact on both usability and security performance for liveness – results varied widely across different smartphones, indicating hardware/camera differences can affect reliability (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update).
Summary of Provider Accuracy (Approximate):
| Provider | False Rejection Rate (FRR) – Legitimate Users Failed | False Acceptance Rate (FAR) – Fraud Passed | Notable Accuracy Credentials |
|---|---|---|---|
| Onfido (Motion platform) | <0.1% (vendor reported) | <0.1% (vendor reported); ~0.01% average after bias mitigation | iBeta Level 2 certified liveness; ICO-audited for fairness (Onfido press releases) |
| Persona (withpersona.com) | Not published (estimated high-90s% pass rate) | Not published (threshold adjustable) | iBeta Level-2 liveness passed (0% spoof success); NIST-tested face match; in-house bias testing shows no material bias (Persona) |
| Jumio (w/ iProov liveness) | ~1% or lower (implied) | Very low (implied) | Uses iProov Genuine Presence Assurance; known for high conversion and “low false rejection” (iProov press release) |
| ID.me (US govt-focused) | ~10–15% (automated process failure) (CyberScoop) | N/A (fallback to manual review) | Paravision face-match algorithm; <1% false match error claimed in 1:1 lab testing (CyberScoop) |
Sources: Onfido and Jumio figures from vendor press releases; ID.me figure from House Oversight Committee findings (ID.me disclosed ~10–15% of users could not be verified via automated selfie match) (A year after outcry, IRS still doesn’t offer taxpayers alternative to ID.me | CyberScoop). Persona figures from lab certifications and Persona’s disclosures.
Common Causes of Failures and False Rejections
Even with advanced algorithms, failed verifications happen for a variety of practical reasons:
- User Error & Poor Image Quality: The most common cause of failure is low-quality images provided by users (Buyer’s Guide to Identity Verification Solutions | Persona). If the camera capture is blurry, too small, or has glare, the system may be unable to read the ID or identify the face. Persona notes that “one of the most common reasons users fail verification checks is because their photos aren’t detailed enough for the tool to read.” (Buyer’s Guide to Identity Verification Solutions | Persona) This can be due to a dirty camera lens, low lighting, or motion blur, which make the ID text or face illegible. In such cases, the document or selfie will be rejected as “not legible.” (Complete Guide to Document Verification: Process, Benefits & Compliance | Persona) Many systems try to guide users (e.g. on-screen prompts to retake a blurry photo), but if the user does not or cannot capture a clear image, false rejections rise.
- Document Issues – Wear, Expiry, or Mismatch: Physical IDs that are damaged, worn, or have faint print can cause the automated checks to fail. An ID with worn-out security features or a scratched photo might not pass authenticity verification. Expired documents are usually rejected by policy (Complete Guide to Document Verification: Process, Benefits & Compliance | Persona). Additionally, if the document data doesn’t match the user’s input data (name, DOB, etc.), the system flags a problem (Complete Guide to Document Verification: Process, Benefits & Compliance | Persona). These lead to legitimate users being flagged if, for example, they changed their name or input a nickname that doesn’t exactly match their ID. (Persona’s guide gives the example that a legal name change not reflected across documents can trigger a false negative (Complete Guide to Document Verification: Process, Benefits & Compliance | Persona).)
- Lighting and Environment: Good lighting and camera focus are critical. Backlighting or extreme glare on an ID can foil OCR and face detection. Conversely, very low light or shadows can obscure a face. Many mobile verification SDKs now include auto-capture and exposure adjustment – e.g. waiting until the ID is steady and focused before snapping, or asking the user to move to a brighter area. Still, in real-world use, users may attempt verification in suboptimal conditions (nighttime indoor lighting, etc.) leading to higher failure rates.
- Hardware and Device Variation: The type of camera or phone used can significantly affect reliability. Higher-end smartphones with good cameras tend to produce better results. The DHS RIVTD tests found performance differences across phone models were so large that they specifically recommend vendors ensure broad device compatibility (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). Older phones may have low resolution or poor autofocus. Additionally, if using NFC to read a passport’s RFID chip, not all phones have NFC readers, and even when they do, aligning the chip correctly can be tricky for users (leading to failure to acquire the chip data). Some providers report Failure to Acquire rates as part of their metrics for chip reading. For example, FIDO Alliance has testing concepts for Document Failure-to-Acquire Rate (when the system can’t even read/scan the document) (Document Authenticity Verification Requirements – FIDO Alliance) – a high failure-to-acquire can translate to user friction. Browser or OS issues can also interfere (e.g. a user not granting camera permission, outdated browser not supporting the video feed, etc., will prevent the capture altogether).
- Liveness and Anti-Spoof Sensitivity: Liveness detection adds another potential point of failure. Systems must balance security vs. convenience. If liveness checks are too strict, they might falsely flag genuine users (e.g. unusual lighting might be misinterpreted as a mask or screen spoof). For instance, some face liveness AI might mis-read glasses glare as a spoof attempt or think a low-resolution selfie is a “replay” attack (What is Selfie Identity Verification? | Persona). Persona cautions that even advanced liveness can produce “false negatives” – e.g. “someone’s eyeglasses may fool the system… or a low-resolution photo may trick the system into thinking it’s a digital replay” (What is Selfie Identity Verification? | Persona). In those cases, a real user could be wrongly rejected. The DHS active liveness results (41–94% genuine pass rate) underline how some implementations are much more user-friendly than others (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update). Generally, active liveness (asking the user to turn their head, blink, etc.) provides strong security but can fail if the user doesn’t follow instructions correctly or has mobility issues. Passive liveness (just analyzing a static selfie or short video) is easier on the user but technically challenging; the best passive systems achieve low errors, but others might either mistakenly reject users or allow spoofs if not robust (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update).
- One-Time Attempts and Rigid Processes: Some verification flows only give users a single attempt or a very limited number of retries, which can lead to high drop-off if that attempt fails. For security, a business might choose to lock the verification after one failed try to prevent attackers from brute-forcing the process. However, this “one-and-done” policy can be harsh on legitimate users who made a mistake. Real-world user reports illustrate this pain point: for example, during state unemployment verifications with ID.me, some users were only allowed one automated attempt and then had to wait in a long queue for a video call after a failure (Three Key Problems with the Government’s Use of a Flawed Facial Recognition Service | ACLU). If no alternative path is provided, a single camera glitch can force the user out of the fast lane entirely. User experience reports on forums frequently cite frustration when “the system won’t accept my ID” and there’s no clear recourse – e.g., LinkedIn’s Persona-based verification initially gave some users trouble with few retry options or support channels (Linkedin and Persona : r/privacy – Reddit). A balanced approach is to allow a couple of retakes for issues like glare, while still halting endless tries for security.
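One balanced pattern, sketched below under assumed limits (the three-attempt cap and the reason strings are illustrative, not taken from any cited provider), is to allow a few retries for benign capture failures, halt immediately on suspected fraud, and route exhausted attempts to manual review rather than a hard fail.

```python
# Hypothetical retry policy: bounded automated attempts, then human fallback.
MAX_AUTO_ATTEMPTS = 3

def run_verification_flow(attempt_fn, escalate_fn):
    """Try automated verification up to MAX_AUTO_ATTEMPTS, then escalate."""
    for attempt in range(1, MAX_AUTO_ATTEMPTS + 1):
        ok, reason = attempt_fn()
        if ok:
            return "verified"
        if reason == "suspected_fraud":
            # Do not let a likely attacker probe the system repeatedly.
            return "blocked"
        # Benign capture problems (blur, glare) justify a guided retake.
        print(f"attempt {attempt} failed ({reason}); asking user to retake")
    escalate_fn()  # automated path exhausted: manual review, not a hard fail
    return "pending_manual_review"

# Example: first capture is blurry, the retake succeeds.
results = iter([(False, "blurry_image"), (True, "ok")])
print(run_verification_flow(lambda: next(results),
                            lambda: print("queued for manual review")))
```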
Provider Comparison and Approaches
The IDV industry has several major providers – including Persona, Jumio, Onfido, and ID.me (each profiled below), as well as others like Veriff, Socure, LexisNexis, and Incode – all with different strengths. Below is a brief comparison of how they stack up on accuracy and reliability, and their known approaches:
- Persona (withpersona.com): Persona offers an integrated, multi-layer verification with document scanning, face recognition, database checks, etc. Rather than disclosing a single “accuracy” percentage, Persona highlights its ensemble of models and independent certifications. According to Persona, their face matching and liveness AI were tested by NIST and DHS and showed “defense-grade” results (Industry-Leading, Lab-Certified Face Recognition | Persona). They achieved iBeta Level 2 compliance (meaning their liveness detection blocked 100% of spoof attempts in a certified lab test). Persona also focuses on bias mitigation – they’ve written about measuring and reducing demographic bias in face verification, and claim their models show no significant performance gaps across different ages, genders, or skin tones (Industry-Leading, Lab-Certified Face Recognition | Persona). In practice, Persona’s clients can configure the strictness of checks (e.g. what confidence score to accept, whether to require NFC chip verification, etc.). This flexibility allows tuning fraud vs. conversion trade-offs. Persona emphasizes conversion and user experience (auto-capturing images at the right moment, providing real-time feedback to correct issues) to reduce accidental failures. For example, their SDK provides “dynamic error handling” and tips to improve image quality (Industry-Leading, Lab-Certified Face Recognition | Persona). Real-world use: Persona is used by companies like cryptocurrency exchanges, fintech apps, and even LinkedIn for ID verification. Users have reported occasional friction (such as needing multiple tries to get a clear photo), but Persona’s system typically does allow retry attempts and even manual review options if automated steps fail (Persona ID verification : r/linkedin – Reddit). Overall, Persona’s accuracy appears on par with top-tier vendors, given its certifications, though exact FAR/FRR stats aren’t public.
- Onfido: Onfido is a well-established provider that has invested heavily in AI (their “Atlas” AI engine) and has a large global customer base. Onfido’s published metrics are impressive: their system is 95% automated, with most checks done in seconds (Onfido’s Real Identity Platform Improves Performance by 12x | Onfido). They report <0.1% FAR and FRR under optimal settings (Onfido launches the next generation of facial biometric technology | Onfido), and have demonstrated improvements in reducing bias (a 10× reduction in false accept disparities) (How Onfido mitigates AI bias in facial recognition). Onfido’s liveness and face match are also iBeta Level 2 certified and were audited for fairness by the UK’s ICO (Onfido’s Real Identity Platform Improves Performance by 12x | Onfido). One notable aspect is Onfido’s “Motion” active liveness, which has the user turn their head in a short video; this was introduced to combat deepfakes and 3D mask attacks, and was found compliant with the stringent ISO 30107-3 standard (Onfido launches the next generation of facial biometric technology | Onfido). In customer deployments, Onfido often balances auto-approval with fallback to manual review for edge cases. They promote a “hybrid” approach where AI handles the bulk of verifications and questionable cases get human review – this helps keep false rejects low without letting fraud through. Onfido’s scale (tens of millions of verifications per year) suggests their reported error rates are averaged over many scenarios; certain populations or documents might see higher friction, but they continuously retrain on new data. In sum, Onfido is viewed as having very high accuracy (enterprise-grade) and is frequently benchmarked in analyst reports. In Gartner’s 2024 Critical Capabilities report, for instance, Onfido and Persona both ranked highly (Persona was noted as a top performer across use cases) in part due to their accuracy and flexibility.
- Jumio: Jumio is another leading vendor known for a broad identity platform. Historically, Jumio’s selfie verification had solid accuracy, and in 2021 they partnered with iProov to further enhance liveness and face match performance (Jumio Adds iProov’s Award-Winning Liveness Detection to its KYX Platform | iProov). iProov’s system (which uses a brief face scan with illuminated colors) is designed for inclusivity and high pass rates – it’s used by government agencies like DHS and the UK Home Office (Jumio Adds iProov’s Award-Winning Liveness Detection to its KYX Platform | iProov). This indicates strong performance under strict testing. While Jumio hasn’t publicly quoted specific error rates recently, the integration of iProov suggests false rejection rates well under 1–2% in practice and excellent spoof resistance. (iProov’s own tests with governments showed near-perfect spoof blocking and 99%+ completion rates, even for older users or those less tech-savvy (Jumio Adds iProov’s Award-Winning Liveness Detection to its KYX Platform | iProov).) Jumio’s document verification is also respected; they were an early mover in ID authenticity checks and likely participated in the DHS or other evaluations (though results were anonymized). A U.S. GSA privacy assessment in 2023 listed Jumio among vendors to be studied for equity (GSA testing finds variations in the accuracy of digital ID verification tech – Nextgov/FCW). Jumio also supports NFC scanning for passports and even Face Match against chip data – using the high-resolution photo in the RFID chip to compare with the selfie, which can improve match accuracy when available. Overall, Jumio’s strategy is to offer high-security, compliance-focused solutions (they often highlight GDPR compliance and data security), while leveraging top-tier biometric tech for reliability. Clients of Jumio (banks, airlines, etc.) often report good verification rates, but like others, issues can arise with users on old devices or unfamiliar with the process (which Jumio mitigates with UI guidance and their “Netverify” SDK’s automatic capture features).
- ID.me: ID.me is somewhat unique as it has been heavily used by government agencies in the US (e.g. IRS, state unemployment systems) and has a mixed reputation in terms of user experience. On the one hand, ID.me’s identity verification has caught a huge amount of fraud during the pandemic (they claim to have blocked substantial identity theft attempts, including by requiring selfies where criminals had only stolen documents). On the other hand, ID.me’s automated face match has a relatively high false rejection rate, requiring many users to undergo manual video calls. In testimony to Congress, ID.me revealed that 10–15% of users could not be verified through the automated selfie-matching process (Chairs Maloney, Clyburn Release Evidence Facial Recognition …). In real numbers, ID.me has stated that only ~70-85% of people complete verification self-serve for certain programs, and the rest need human intervention (A year after outcry, IRS still doesn’t offer taxpayers alternative to ID.me | CyberScoop). Those who fail initially must join a video chat with a “Trusted Referee,” which led to backlogs — there were reports of users waiting hours or even being “booted out” of virtual queues due to technical difficulties (Three Key Problems with the Government’s Use of a Flawed Facial Recognition Service | ACLU). The high FRR can be attributed to a combination of factors: many users verifying were not in ideal conditions (some had limited broadband or older devices), and ID.me’s system settings erred on the side of fraud prevention, possibly with stricter thresholds. ID.me’s CEO has claimed their face match algorithm (provided by Paravision) is very accurate in lab terms – he cited false match error rates “as low as less than 1%, with insignificant variation across race/sex”, for 1:1 matching (A year after outcry, IRS still doesn’t offer taxpayers alternative to ID.me | CyberScoop). However, real-world conditions (aging of ID photos, low-quality selfies, user errors) meant about a 10-15% false non-match rate in practice. ID.me also controversially was using a 1:many face search (comparing the selfie against a larger database to prevent duplicate identities) which is generally less accurate and raised privacy concerns (A year after outcry, IRS still doesn’t offer taxpayers alternative to ID.me | CyberScoop). They have since downplayed this 1:many usage after backlash. From an industry perspective, the ID.me case underscores that even if an algorithm is top-tier, operational decisions (like offering no in-person alternative, or allowing only one try) can impact effective reliability. Regulators and advocacy groups (ACLU, etc.) noted ID.me’s system was “nearly universally reviled by users for its poor service and difficult verification process.” (Three Key Problems with the Government’s Use of a Flawed Facial Recognition Service | ACLU) The company has responded by increasing its support staff and claiming to have cut video call wait times by 86% and average waits under 10 minutes (A year after outcry, IRS still doesn’t offer taxpayers alternative to ID.me | CyberScoop). Going forward, the federal government (Login.gov) is exploring other solutions, emphasizing that any chosen system must be equitable and highly accurate for all users.
- Other Notable Providers: Socure Verify, Incode, LexisNexis ThreatMetrix, TransUnion, Veriff, Microsoft (Azure AD Verify), Google (Cloud Identity Toolkit), etc., all offer ID verification services with broadly similar technology. Many have published case studies or white papers with glowing statistics, but fewer have third-party audits available. For example, Socure asserts that its AI-based approach (which combines document verification with extensive data source cross-checks) achieves 8–10% higher verification rates for “hard-to-identify” demographics compared to competitors (Socure Launches Compliance Product Suite to Optimize ID …) – indicating a focus on maximizing inclusivity. Incode and HyperVerge have boasted about meeting all benchmarks in the DHS tests (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update), suggesting top-tier accuracy. What often distinguishes providers is the workflow flexibility and fallback procedures: e.g., some offer integrated manual review services or allow additional identity evidence (like a second ID, a utility bill, or knowledge-based questions) if the primary check fails. These mitigations can raise overall success rates. Providers also differentiate with geographical coverage (ability to recognize IDs from many countries), compliance certifications (GDPR, SOC 2, ISO27001), and whether they keep data onshore for certain jurisdictions. All these factors play into reliability – e.g., an IDV service that can read the MRZ on an international passport and verify its chip will be more reliable for foreign users than one that only knows US IDs.
Regulatory Standards and Evaluations
Given the critical role of ID verification in security and access, standards and regulations have emerged to guide accuracy and fairness requirements:
- NIST SP 800-63: The National Institute of Standards and Technology’s Special Publication 800-63 (in particular 800-63A) provides a framework for digital identity proofing at various Identity Assurance Levels (IAL). For remote ID verification (IAL2/IAL3), NIST recommends the use of document authentication plus biometric comparison. While SP 800-63-3 (current as of 2017) doesn’t mandate specific error rates, the draft SP 800-63-4 is expected to incorporate findings from recent evaluations. As noted, DHS’s benchmark suggested selecting systems with <10% document and biometric error rates (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update), so future guidance may explicitly call for vendors meeting that bar. NIST also runs the Face Recognition Vendor Tests (FRVT) – an ongoing benchmark of face matching algorithms. Many vendors (or the algorithm suppliers they use) participate, and top algorithms in 1:1 verification now achieve extremely low error rates (FNMR well below 0.5% at FAR=1e-6 in some cases). However, these FRVT tests use high-quality images; NIST acknowledges that real-world performance will be worse due to capture issues. Still, agencies refer to FRVT rankings when vetting technology. Additionally, NIST and the GSA have placed huge emphasis on demographic bias testing, as exemplified by the GSA’s study ([2409.12318] A large-scale study of performance and equity of commercial remote identity verification technologies across demographics). Any vendor selling to government is under pressure to demonstrate that their false match/reject rates do not disproportionately impact any race, gender, or age group. This has led to internal testing and improvements – e.g., Onfido and Microsoft both have published methodologies for reducing bias in face AI (How Onfido mitigates AI bias in facial recognition).
- GDPR and Data Protection: In the EU (and other jurisdictions with similar privacy laws), the use of facial recognition and biometrics for identity verification must comply with GDPR. Biometrics are considered “special category” personal data under GDPR, requiring explicit user consent (or another narrow legal basis) and subject to strict security and minimization requirements. IDV providers usually obtain the user’s consent to process their ID photo and selfie for verification purposes. They also have to handle data retention carefully – many offer options to auto-delete biometric data after verification or store it only in a hashed form, to alleviate privacy concerns. For example, Persona’s policy lets business clients configure how long data is stored, to help them meet regional privacy rules. Providers targeting Europe often undergo third-party audits and certify to standards like ISO 27001 or SOC2, and some join the EU-U.S. Data Privacy Framework to lawfully transfer data. Another aspect is GDPR’s accuracy principle – organizations processing personal data must ensure it’s accurate and up-to-date. For IDV, this can be interpreted as a need to ensure the verification results are correct (to not wrongly deny someone access due to a false negative). In practice, a false rejection might be seen as an “inaccuracy” in personal data processing. While not usually litigated, it’s something companies pay attention to in order to avoid claims of algorithmic discrimination under GDPR or related laws.
- Certification Schemes: Beyond NIST and GDPR, there are industry certifications. We’ve mentioned iBeta PAD certification (which is essentially required by many banks/fintechs to ensure liveness spoof resilience at ISO 30107 Levels 1 or 2). Many providers proudly cite passing iBeta Level 1 or 2 (Onfido, Persona, Facetec, iProov, etc. all have). There’s also the UK Digital Identity and Attributes Trust Framework (DIATF), which certifies IDV providers for use in verifying identities in the UK – companies like Jumio, Onfido, and Yoti have been certified, which involves meeting performance and security benchmarks. Similarly, in Canada, the DIACC’s trust framework and in Australia the “TDIF” set requirements for biometric accuracy (often referencing back to NIST FRVT results or ISO standards). NIST 800-63-3 at IAL2 effectively requires agencies to use services that have demonstrable equivalent assurance to an in-person check of photo ID; this has driven agencies to demand evidence from vendors (test results, audits).
- Audit and Transparency: Some providers have undergone independent audits or published white papers with performance data. For instance, Onfido published a white paper on reducing bias with detailed breakdowns of false acceptance rates by demographic after various training interventions (How Onfido mitigates AI bias in facial recognition). Microsoft’s Face API team similarly published data on how they reduced error rate disparities. These are positive steps, but not all vendors share such detail publicly. The U.S. Government (GSA) study on equity (GSA testing finds variations in the accuracy of digital ID verification tech – Nextgov/FCW), once finalized in 2025, will likely shine light on each tested vendor’s strengths/weaknesses (if vendors consent to be named), which could push the industry toward more transparency. In Europe, the proposed EU AI Act could classify “remote biometric identification” systems as high-risk, meaning providers might have to undergo conformity assessments and provide documentation on accuracy, testing, and bias mitigation as a legal requirement.
Real-World User Experiences and Limitations
While metrics and certifications tell one side of the story, user experience in the wild often uncovers limitations. Identity verification, when it works seamlessly, barely gets noticed – but when it fails, users can be very vocal. Here are some real-world insights:
- Demographic Disparities: As noted, certain groups have historically faced higher error rates in face matching. Older adults sometimes struggle with the selfie step (they may have more trouble aligning their face or may present an appearance that differs significantly from their ID photo taken years earlier). People with very dark skin tones have been shown in some studies to experience higher false rejection in facial recognition systems that were not properly trained – the GSA study confirmed one vendor had this issue, rejecting a disproportionate number of Black users ([2409.12318] A large-scale study of performance and equity of commercial remote identity verification technologies across demographics). This not only frustrates users but can deny access to essential services. In response, companies are diversifying training data and testing. For example, Microsoft and FaceTec both improved their algorithms after early bias critiques. Persona explicitly mentions using ethically sourced, diverse data and testing for bias (Industry-Leading, Lab-Certified Face Recognition | Persona). Still, users occasionally report anecdotes like “I had to try multiple times, but my lighter-skinned friend got through on first try” – individual experiences vary, and perception of bias can harm trust even if unintentional.
- One-Time/One-Channel Verification: Some implementations (especially in government or high-security contexts) give users no fallback options – e.g., no alternative to doing the selfie. The ACLU criticized systems that “don’t provide an accessible offline alternative”, noting that forcing everyone through a selfie upload can exclude those without smartphones or with disabilities (Three Key Problems with the Government’s Use of a Flawed Facial Recognition Service | ACLU). A harsh reality emerged during COVID: unemployment claimants who had poor internet or no webcam simply had no way to verify when states only offered the online ID.me route (Three Key Problems with the Government’s Use of a Flawed Facial Recognition Service | ACLU). This taught agencies that having alternative verification methods (in-person, mail-in, or at least video chat on low bandwidth) is important for equity. From a user’s view, a reliable system isn’t just one that’s accurate when working, but one that offers help when it fails. Many vendors now offer omni-channel support: e.g., some partner with networks of retail locations where a user can show their ID to a clerk as a backup, or they offer postal verification. These aren’t camera-based, but they improve overall reliability of the identity proofing process.
- Strict Retry Policies: As mentioned, a single failed attempt can put a user in “identity verification limbo.” Some exchanges or apps allow only one submission of documents to prevent fraudsters from trial-and-error. But genuine users also get only one shot – if their camera glitched or their hands shook, they might be locked out. Users have complained on forums about scenarios like being banned from a platform because the ID verification failed once and there was no second chance. Good practice is to allow at least a small number of retries (since most failures are benign issues like blur). Manual review is the ultimate fallback: companies like Persona and Onfido offer services or tools for a human agent to review the documents and selfie if automation can’t make a definitive decision. While manual review is slower (minutes or hours instead of seconds) and costlier, it dramatically increases overall success rates by rescuing false rejects. For example, one fintech noted that adding a human-overread for failed automated checks raised their total verification pass rate several percentage points and saved many customer relationships (Jumio Adds iProov’s Award-Winning Liveness Detection to its KYX Platform | iProov) (since those users would have been denied by AI alone). However, not all companies utilize this – some low-cost providers or strict compliance scenarios simply reject and require the user to contact support. The user experience in those cases can be painful.
- Harsh Implementation Stories: A notable case was the IRS’s attempted rollout of mandatory ID.me in 2022. Taxpayers were alarmed at having to submit selfies, and reports surfaced of people unable to verify in time to meet filing deadlines. Under public pressure, the IRS dropped the requirement (though it still offers ID.me as an option) (A year after outcry, IRS still doesn’t offer taxpayers alternative to ID.me | CyberScoop). This showed that public acceptance of facial verification is still tenuous if people feel it’s not reliable or private enough. In contrast, when private-sector users perceive a clear benefit (e.g., faster onboarding for a bank account or higher security), they tend to accept it – especially younger users, 77% of whom find biometrics more convenient than traditional methods according to surveys (Onfido launches the next generation of facial biometric technology | Onfido). Providers must navigate this by being transparent (explaining why they need a selfie) and providing recourse. Some users have also highlighted accessibility issues – for example, people who are blind or have low vision might not be able to center their ID or face in the frame without help. Liveness checks that require specific movements could be hard for those with limited mobility. Regulations (like the ADA in the U.S.) are beginning to consider these aspects; providers have started including accessibility features (such as voice instructions, ability to use keyboard instead of screen tapping, etc.).
In summary, modern camera-based identity verification services are highly accurate under ideal conditions – often above 95–99% success – but real-world factors can reduce those rates. Industry leaders like Persona, Onfido, and Jumio have achieved low error rates through advanced AI and liveness checks, as evidenced by independent audits and certifications (Onfido’s Real Identity Platform Improves Performance by 12x | Onfido). However, studies and user feedback reveal that common failure points include poor image quality, suboptimal lighting, device variability, and stringent process rules. When comparing providers, it’s clear they all have had to address the FAR vs. FRR trade-off: some, like ID.me, initially prioritized fraud prevention (low FAR) at the expense of user experience (higher FRR ~10%+ requiring manual intervention) (A year after outcry, IRS still doesn’t offer taxpayers alternative to ID.me | CyberScoop). Others, like Onfido’s Motion or Jumio with iProov, strive for a better balance, leveraging tech that keeps both types of errors extremely low (sub-1%) so that most legitimate users sail through while stopping nearly all impostors (Onfido launches the next generation of facial biometric technology | Onfido).
Conclusion
Camera-based third-party ID verification has rapidly advanced in accuracy due to improved AI models, huge training datasets, and rigorous testing. The best systems today verify identities with minimal errors – e.g., false acceptance rates on the order of 0.01% and false rejection rates well under 1% in controlled settings (Onfido’s Real Identity Platform Improves Performance by 12x | Onfido) (Onfido launches the next generation of facial biometric technology | Onfido). They incorporate multi-faceted checks (document chip reading, face matching, liveness, fraud analytics) to boost reliability and security. Nonetheless, no system is foolproof. Environmental and human factors will cause some legitimate users to fail automated checks, which is why backup procedures (retries, manual review, alternative verification) are crucial for a fair and inclusive implementation. Industry benchmarks from NIST, GSA, and DHS show that while many vendors perform at a high level, there is significant variance – indicating organizations must carefully evaluate providers (perhaps even conduct a “bake-off” pilot test of their own) (Buyer’s Guide to Identity Verification Solutions | Persona) rather than trusting glossy marketing stats alone.
Looking forward, ongoing independent evaluations (like the upcoming peer-reviewed GSA report in 2025 (GSA testing finds variations in the accuracy of digital ID verification tech – Nextgov/FCW) and the planned DHS Remote Identity Validation Rally in 2025 (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update)) will shed more light on each service’s strengths and weaknesses. Providers that invest in usability, broad device support, and bias reduction are likely to stand out. Likewise, compliance with evolving standards (NIST 800-63-4, AI Act, etc.) will be a differentiator – accuracy isn’t just a technical goal but a regulatory expectation and ethical mandate. Users’ real-world experiences remind us that an identity verification system’s success is measured not just by percentages in a lab, but by its ability to handle the diversity of people and conditions out in the world. In that regard, the industry is moving in a positive direction: error rates continue to drop, and awareness of edge cases is growing. With multi-layered approaches (document + biometric + database checks) and human-in-the-loop fail-safes, camera-based ID verification services can achieve both high accuracy and robust reliability, enabling security without shutting out the honest users who just want to prove “I am who I claim to be.”
Sources:
- GSA/Clarkson University study on remote ID proofing (2024) (GSA testing finds variations in the accuracy of digital ID verification tech – Nextgov/FCW) ([2409.12318] A large-scale study of performance and equity of commercial remote identity verification technologies across demographics)
- DHS S&T RIVTD Evaluations (2023) summarized by Biometric Update (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update)
- Persona documentation and blog (Complete Guide to Document Verification: Process, Benefits & Compliance | Persona) (Industry-Leading, Lab-Certified Face Recognition | Persona)
- Onfido press releases and white papers (Onfido’s Real Identity Platform Improves Performance by 12x | Onfido) (Onfido launches the next generation of facial biometric technology | Onfido)
- iProov Press (Jumio partnership) (Jumio Adds iProov’s Award-Winning Liveness Detection to its KYX Platform | iProov)
- House Oversight Committee findings on ID.me (2022) (A year after outcry, IRS still doesn’t offer taxpayers alternative to ID.me | CyberScoop)
- ACLU analysis of ID.me issues (Three Key Problems with the Government’s Use of a Flawed Facial Recognition Service | ACLU)
- Socure performance claims (Socure Launches Compliance Product Suite to Optimize ID …)
- NIST and FIDO standards (ISO 30107 PAD, NIST SP 800-63-3/4) (Understanding the results of DHS S&T’s RIVTD biometrics assessment | Biometric Update).