Are contemporary facial recognition algorithms making human facial comparison performance worse?

Eden Clothier, Dana Michalski, Christopher Malec, Marcin Nowina-Krowicki

Published in: Forensic Science International (2024)

Facial recognition plays a vital role in several security and law enforcement workflows, such as passport control and criminal investigations. The identification process typically involves a facial recognition system comparing an image against a large database of faces and returning a list of probable matches, called a candidate list, for review. A human then examines the returned images to determine whether any of them is a match. Most evaluations of these systems examine the performance of the algorithm or the human in isolation, without accounting for the interaction that occurs in operational contexts. To ensure optimal whole-system performance, it is important to understand how the output produced by an algorithm can affect human performance. Users of facial recognition systems have claimed anecdotally that the images returned by new algorithms are more similar in appearance than those returned by old algorithms, making it harder to decide whether a match is present. This paper explores whether these claims are true and whether the latest facial recognition algorithms decrease human performance compared with an old algorithm from the same company. We examined the performance of 40 novice participants on 120 face matching trials. Each trial required the participant to compare a face image against a candidate list containing eight possible matches returned by either a new or an old algorithm (60 trials of each). Overall, participants were more likely to make errors when presented with a candidate list from a new algorithm; specifically, they were more likely to misidentify an incorrect identity as a match. Participants were more accurate, more confident, and faster on candidate lists from the older algorithm. These findings suggest that new algorithms are generating more plausible matches, making the task of determining a match harder for humans.
We propose strategies that may improve performance and offer recommendations for future research.
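The candidate-list workflow and the error type reported above can be illustrated with a minimal sketch. It assumes, hypothetically, that each face is reduced to a feature vector and that the system ranks gallery faces by cosine similarity; the abstract does not describe how either the new or the old algorithm works internally. The helper names (`candidate_list`, `score_decision`, `error_rates`), the outcome labels, and the possibility of trials with no mate in the gallery are all illustrative assumptions, not details taken from the study.

```python
import math
from typing import Dict, List, Mapping, Optional, Sequence

# Hypothetical representation: each face is a fixed-length feature vector
# (assumption; the abstract does not describe either algorithm's internals).
Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def candidate_list(probe: Vector, gallery: Mapping[str, Vector], k: int = 8) -> List[str]:
    """Return the IDs of the k gallery faces most similar to the probe, best first.

    k = 8 mirrors the list length used in the study; the ranking rule is assumed.
    """
    ranked = sorted(
        gallery.items(),
        key=lambda item: cosine_similarity(probe, item[1]),
        reverse=True,
    )
    return [identity for identity, _ in ranked[:k]]


def score_decision(candidates: Sequence[str], true_mate: Optional[str],
                   chosen: Optional[str]) -> str:
    """Classify one reviewer decision on a candidate list.

    true_mate: the probe's true identity, or None for a (hypothetical)
    trial in which no mate exists in the gallery.
    chosen: the ID the reviewer selected, or None for "no match".
    """
    mate_in_list = true_mate is not None and true_mate in candidates
    if chosen is None:
        # Declining to pick is only an error if the mate was actually shown.
        return "miss" if mate_in_list else "correct_rejection"
    if chosen == true_mate:
        return "hit"
    # Selecting any non-mate is the error type the study reports rising.
    return "misidentification"


def error_rates(outcomes: Mapping[str, Sequence[str]]) -> Dict[str, float]:
    """Proportion of miss + misidentification outcomes per algorithm condition."""
    errors = ("miss", "misidentification")
    return {
        algorithm: sum(o in errors for o in results) / len(results)
        for algorithm, results in outcomes.items()
    }


if __name__ == "__main__":
    # Toy 2-D "embeddings"; all values are invented for illustration.
    gallery = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.0, 1.0], "D": [-1.0, 0.0]}
    shortlist = candidate_list([1.0, 0.05], gallery, k=2)
    print(shortlist)  # ['A', 'B']
    print(score_decision(shortlist, true_mate="A", chosen="B"))  # misidentification
```

In this framing, the paper's central finding corresponds to a higher share of `misidentification` outcomes on lists from the new algorithm, which the authors attribute to those lists containing more plausible non-matching candidates (in the toy example, `B` sits very close to the true mate `A`).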
