The Dangerous Junk Science of Vocal Risk Assessment
25 Nov 2018 – Is it possible to tell whether someone is a criminal just from looking at their face or listening to the sound of their voice? The idea may seem ludicrous, like something out of science fiction — Big Brother in “1984” detects any unconscious look “that carried with it the suggestion of abnormality” — and yet, some companies have recently begun to answer this question in the affirmative. AC Global Risk, a startup founded in 2016, claims to be able to determine your level of “risk” as an employee or an asylum-seeker based not on what you say, but how you say it.
The California-based company offers an automated screening system known as a Remote Risk Assessment, or RRA. Here’s how it works: Clients of AC Global Risk help develop automated, yes-or-no interview questions. The group of people selected for a given screening then answer these simple questions in their native language during a 10-minute interview that can be conducted over the phone. The RRA then measures the characteristics of their voice to produce an evaluation report that scores each individual on a spectrum from low to high risk. CEO Alex Martin has said that the company’s proprietary risk analysis can “forever change for the better how human risk is measured.”
AC Global Risk, which boasts the consulting firm of Robert Gates, Condoleezza Rice, and Stephen Hadley on its advisory board, has advertised contracts with the U.S. Special Operations Command in Afghanistan, the Ugandan Wildlife Authority, and the security teams at Palantir, Apple, Facebook, and Google, among others. The extensive use of risk screening in these and other markets, Martin has said, has proven that it is “highly accurate, scalable, cost-effective, and capable of high throughput.” AC Global Risk claims that its RRA system can simultaneously process hundreds of individuals anywhere in the world. Now, in response to President Donald Trump’s calls for the “extreme vetting” of immigrants, the company has pitched itself as the ultimate solution for “the monumental refugee crisis the U.S. and other countries are currently experiencing.”
It’s a proposal that would seem to appeal to the U.S. Department of Homeland Security. The DHS has already funded research to develop similar AI technology for the border. The program, known as the Automated Virtual Agent for Truth Assessments in Real-Time, or AVATAR, used artificial intelligence to measure changes in the voice, posture, and facial gestures of travelers in order to flag those who appeared untruthful or seemed to pose a potential risk. In 2012 it was tested by volunteers at the U.S.-Mexico border. The European Union has also funded research into technology that would reduce “the workload and subjective errors caused by human agents.”
Some of the leading experts in vocal analytics, algorithmic bias, and machine learning find the trend toward digital polygraph tests troubling, pointing to the faulty methodology of companies like AC Global Risk. “There is some information in dynamic changes in the voice and they’re detecting it. This is perfectly plausible,” explained Alex Todorov, a Princeton University psychologist who studies the science of social perception and first impressions. “But the question is, How unambiguous is this information at detecting the category of people they’ve defined as risky? There is always ambiguity in these kinds of signals.”
Over the past year, the American Civil Liberties Union and others have reported that Border Patrol agents have been seizing people from Greyhound buses based on their appearance or accent. Because Customs and Border Protection agents already use information about how someone speaks or looks as a pretext to search individuals in the 100-mile border zone, or to deny individuals entry to the U.S., experts fear that vocal emotion detection software could make such biases routine, pervasive, and seemingly “objective.”
AC Global Risk declined to respond to repeated requests for comment for this article. The company also did not respond to a list of detailed questions about how the technology works. In public appearances, however, Martin has claimed that the company’s proprietary analytical processes can determine someone’s risk level with greater than 97 percent accuracy. (AVATAR, meanwhile, claims an accuracy rate of between 60 and 70 percent.) Several leading audiovisual experts who reviewed AC Global Risk’s publicly available materials for The Intercept used the word “bullshit” or “bogus” to describe the company’s claims. “From an ethical point of view, it’s very dubious and shady to give the impression that recognizing deception from only the voice can be done with any accuracy,” said Björn Schuller, a professor at the University of Augsburg who has led the field’s major academic challenge event to advance the state of the art in vocal emotion detection. “Anyone who says they can do this should themselves be seen as a risk.”
Trump’s Extreme Vetting Initiative has called for software that can automatically “determine and evaluate an applicant’s probability of becoming a positively contributing member of society” and predict “whether an applicant intends to commit criminal or terrorist acts after entering the United States,” as The Intercept reported last summer. AC Global Risk has pitched itself as the perfect tool for carrying out this initiative, offering to assess “the risk levels of individuals with unknown loyalties, such as refugees and visa applicants.” The DHS, the company says, would then decide how to act on the results of those reports. “With four levels to work with (low, average, potential, and high) it would not be hard to establish Departmental protocols according to risk level,” the company stated on its blog.
Risk assessments in themselves are nothing new. In recent years, algorithms have been introduced at nearly every stage in the criminal justice process, from policing and bail to sentencing and parole. The arrival of such techniques has not been without controversy. Many of these automated tools have been criticized for their opacity, secrecy, and bias. In many cases, officers, courts, and the public are not equipped — or allowed — to interrogate their underlying assumptions, training sets, or conclusions. Chief among the concerns of skeptical experts is that the objective aura of machine learning may provide plausible cover for discrimination.
AC Global Risk provides few public details about how its technology works. It does not publish white papers backing up its research claims and has not released the scientific pedigrees of its researchers. The company did not answer questions about what qualities (pitch, speed, inflection) and features the product measures. “As much as the use of risk assessment in criminal justice settings is problematic, it’s much more accurate compared to this company’s tool,” said Suresh Venkatasubramanian, a computer scientist at the University of Utah who focuses on algorithmic fairness.
If any of AC Global Risk’s claims for its technology are valid, they would represent the cutting edge of what researchers think is possible to ascertain from the human voice. Vocal assessments can be excellent at quickly discerning demographic information. This information might be general — such as someone’s age, gender, or dialect — but it can also be quite personal, revealing the particular region someone is from, as well as any health problems they might have.
Last month, Amazon was issued a patent that would allow its virtual assistant Alexa to determine users’ vocal features, including language, accent, gender, and age. However, when it comes to determining emotions from the voice, accuracy remains a major concern. Schuller, the co-founder of audEERING, a voice analytics company, says that it’s currently not possible to tell whether someone is lying (if lying is, in fact, one of the company’s indices for risk) from the voice at greater than 70 percent accuracy, which is around the same as an average human judgment.
Schuller said that it is possible to detect intoxication, sincerity, and deception, but again, the success rate is similar to an average human’s abilities. “With a solid label, you can sometimes beat the human, but if something claims zero error, it should be taken with a grain of salt,” he said.
Central to assessing the validity of AC Global Risk’s claims is what fits under the amorphous label of risk and who defines it. “They’re defining risk as self-evident, as though it’s a universal quality,” said Joseph Pugliese, an Australian academic whose work focuses on biometric discrimination. “It assumes that people already know what risk is, whereas of course the question of who defines the parameters of risk and what constitutes those is politically loaded.”
CEO Alex Martin has spoken of looking “for actual risk along the continuum that is present in every human.” Yet the idea that risk is an innate and legible human trait, and that this trait can be ascertained from the voice alone, rests on flawed assumptions, explained Todorov, the Princeton psychologist. Distinguishing how people actually feel from how we perceive them to feel has been a notoriously difficult problem in machine learning, Todorov continued. The evaluative setting may further compound the potential for mistaken impressions. “People at the border are already in fraught and highly emotionally charged circumstances,” Pugliese said. “How can they comply in a so-called normal way?”
A New Physiognomy?
AC Global Risk is part of a growing number of companies making outsized claims about the abilities of their behavioral analytics software. Encouraged by the observational prowess of artificial intelligence, many biometrics vendors and AI companies have been selling corporations and governments the ability to determine entire personalities from our facial expressions, movements, and voices. A biometrics vendor at the 2014 Winter Olympics in Russia, for instance, scanned the expressions of attendees in order to give the country’s security agency, the FSB, the ability to “detect someone who appears unremarkable but whose agitated mental state signals an imminent threat.”
Some skeptical experts who study AI and human behavior have framed these tools as part of a growing resurgence of interest in physiognomy, the practice of looking to the body for signs of moral character and criminal intent. In the late 19th century, Cesare Lombroso’s precise measurements of the skulls and facial features of “born criminals” lent a scientific veneer to this interpretative practice. Yet while the efforts of criminal anthropologists like Lombroso have since been relegated to the dustbin of dangerous junk science, the desire to infer someone’s moral character or hidden thoughts from physical features and behaviors has persisted.
Underlying the efforts of AC Global Risk and similar companies, Pugliese says, is an assumption that the correlations of big data can circumvent the scientific method. These “physiognomic” applications are especially troubling, he explains, given that machine learning algorithms are inherently designed to find superficial patterns (whether or not those patterns are “real”) among the data they’re given. “When they say they are triaging for risk, there is a self-evident notion that they have an objective purchase on the signs that constitute ‘criminal intent,’” Pugliese says. “But we don’t know what actual signs would constitute these criminal predictors.”
Yet exposing the pseudoscientific premises of this technology does not necessarily make corporations and governments any less likely to use it. The power of these technologies, as with so many other predictive and risk-based systems, relies predominantly on their promise of efficacy and speed. “Their main claim is efficiency, making things faster, and in that sense, of course, it will work,” Venkatasubramanian explained. In other words, whether that efficiency helps or harms the life chances of those encountering these systems is beside the point. The Remote Risk Assessment will be seen to be working insofar as humans enact its recommendations. As Todorov and two machine learning experts wrote in an essay voicing their concerns about this general trend: “Whether intentional or not, this ‘laundering’ of human prejudice through computer algorithms can make those biases appear to be justified objectively.”