In the burgeoning field of computer science known as machine learning, engineers often refer to the artificial intelligences they create as “black box” systems: Once a machine learning engine has been trained from a collection of example data to perform anything from facial recognition to malware detection, it can take in queries—Whose face is that? Is this app safe?—and spit out answers without anyone, not even its creators, fully understanding the mechanics of the decision-making inside that box.
But researchers are increasingly proving that even when the inner workings of those machine learning engines are inscrutable, they aren’t exactly secret. In fact, they’ve found that the guts of those black boxes can be reverse-engineered and even fully reproduced (stolen, as one group of researchers puts it) with the very same methods used to create them.
In a paper they released earlier this month titled “Stealing Machine Learning Models via Prediction APIs,” a team of computer scientists at Cornell Tech, the Swiss institute EPFL in Lausanne, and the University of North Carolina detail how they were able to reverse engineer machine learning-trained AIs based only on sending them queries and analyzing the responses. By training their own AI with the target AI’s output, they found they could produce software that was able to predict with near-100% accuracy the responses of the AI they’d cloned, sometimes after a few thousand or even just hundreds of queries.
“You’re taking this black box and through this very narrow interface, you can reconstruct its internals, reverse engineering the box,” says Ari Juels, a Cornell Tech professor who worked on the project. “In some cases, you can actually do a perfect reconstruction.”
Taking the Innards of a Black Box
The trick, they point out, could be used against services offered by companies like Amazon, Google, Microsoft, and BigML that allow users to upload data into machine learning engines and publish or share the resulting model online, in some cases with a pay-by-the-query business model. The researchers’ method, which they call an extraction attack, could duplicate AI engines meant to be proprietary, or in some cases even recreate the sensitive private data an AI has been trained with. “Once you’ve recovered the model for yourself, you don’t have to pay for it, and you can also get serious privacy breaches,” says Florian Tramer, the EPFL researcher who worked on the AI-stealing project before taking a position at Stanford.
In other cases, the technique might allow hackers to reverse engineer and then defeat machine-learning-based security systems meant to filter spam and malware, Tramer adds. “After a few hours’ work…you’d end up with an extracted model you could then evade if it were used on a production system.”
The researchers’ technique works by essentially using machine learning itself to reverse engineer machine learning software. To take a simple example, a machine-learning-trained spam filter might put out a simple spam or not-spam judgment of a given email, along with a “confidence value” that reveals how likely it is to be correct in its decision. That answer can be interpreted as a point on either side of a boundary that represents the AI’s decision threshold, and the confidence value shows its distance from that boundary. Repeatedly trying test emails against that filter reveals the precise line that defines that boundary. The technique can be scaled up to far more complex, multidimensional models that give precise answers rather than mere yes-or-no responses. (The trick even works when the target machine learning engine doesn’t provide those confidence values, the researchers say, but requires tens or hundreds of times more queries.)
Stealing a Steak-Preference Predictor
The researchers tested their attack against two services: Amazon’s machine learning platform and the online machine learning service BigML. They tried reverse engineering AI models built on those platforms from a series of common data sets. On Amazon’s platform, for instance, they tried “stealing” an algorithm that predicts a person’s salary based on demographic factors like their employment, marital status, and credit score, and another that tries to recognize one-through-ten numbers based on images of handwritten digits. In the demographics case, they found that they could reproduce the model without any discernible difference after 1,485 queries; the digit-recognition case took just 650.
On the BigML service, they tried their extraction technique on one algorithm that predicts German citizens’ credit scores based on their demographics and on another that predicts how people like their steak cooked—rare, medium, or well-done—based on their answers to other lifestyle questions. Replicating the credit score engine took just 1,150 queries, and copying the steak-preference predictor took just over 4,000.
Not every machine learning algorithm is so easily reconstructed, says Nicholas Papernot, a researcher at Penn State University who worked on another machine learning reverse engineering project earlier this year. The examples in the latest AI-stealing paper reconstruct relatively simple machine-learning engines. More complex ones might take far more computation to attack, he says, especially if machine learning interfaces learn to hide their confidence values. “If machine learning platforms decide to use larger models or hide the confidence values, then it becomes much harder for the attacker,” Papernot says. “But this paper is interesting because they show that the current models of machine learning services are shallow enough that they can be extracted.”
Amazon declined WIRED’s request for an on-the-record comment on the researchers’ work, and BigML didn’t respond. But when the researchers contacted the companies, they say Amazon responded that the risk of their AI-stealing attacks was reduced by the fact that Amazon doesn’t make its machine learning engines public, instead only allowing users to share access among collaborators. In other words, the company warned, take care who you share your AI with.
From Face Recognition to Face Reconstruction
Aside from merely stealing AI, the researchers warn that their attack also makes it easier to reconstruct the often-sensitive data it’s trained on. They point to another paper published late last year that showed it’s possible to reverse engineer a facial recognition AI that responds to images with guesses of the person’s name. That method would send the target AI repeated test pictures, tweaking the images until they homed in on the pictures that the machine learning engine was trained on and reproduced the actual face images without the researchers’ computer having ever actually seen them. By first performing their AI-stealing attack and then running the face-reconstruction technique on their own stolen copy of the AI, on a computer they controlled, the researchers showed they could reassemble the face images far faster: They reconstructed 40 distinct faces in just 10 hours, compared to 16 hours when they performed the facial reconstruction on the original AI engine.
The notion of reverse engineering machine learning engines, in fact, has been advancing in the AI research community for months. In February another group of researchers showed they could reproduce a machine learning system with about 80 percent accuracy, compared with the near-100 percent success of the Cornell and EPFL researchers. Even then, they found that by testing inputs on their reconstructed model, they could often learn how to trick the original. When they applied that technique to AI engines designed to recognize numbers or street signs, for instance, they found they could cause the engine to make incorrect judgments in between 84 percent and 96 percent of cases.
The latest research into reconstructing machine learning engines could make that deception even easier. And if that machine learning is applied to security- or safety-critical tasks like self-driving cars or filtering malware, the ability to steal and analyze them could have troubling implications. Black-box or not, it may be wise to consider keeping your AI out of sight.