Exposing.AI Lets You Find Your Old Photos in AI Training Datasets


To build facial recognition systems, tech companies are getting help from an unexpected source: people's faces. To develop such systems, companies, universities, and government laboratories use millions of images collected from many online sources.


Artificial intelligence (AI)-based facial recognition systems do not magically become smart. They learn by identifying patterns in human-generated data: photographs, voice recordings, books, Wikipedia articles, and all sorts of other materials. And humans may not even be aware that they are contributing to AI training.


Liz O'Sullivan, technology director of the Surveillance Technology Oversight Project, and researcher Adam Harvey have created an online tool, Exposing.AI, that allows people to find their old photographs in collections of images used for AI training, The New York Times reports.


In 2006, Canadian documentary filmmaker Brett Gaylor posted photos from his honeymoon on the then-popular Flickr service. Fifteen years later, he used an early version of Exposing.AI, provided to him by Adam Harvey, and found that those images were scattered across different datasets that could be used to train facial recognition systems around the world.


Gaylor wondered how his photographs could have moved from place to place. He was then told that the images could be used in surveillance systems in the United States and elsewhere, and that one of those systems was even used to track Uyghurs in China.


Flickr, which has been bought and sold by many companies over the years and is now owned by the photo-sharing service SmugMug, allowed users to share their photos under a so-called Creative Commons license. This license means that third parties are allowed to use these photographs with certain restrictions, although in practice these restrictions may be ignored. In 2014, Yahoo!, which owned Flickr at the time, used many of these photographs in a dataset intended for computer vision research.


O'Sullivan and Harvey have been trying for years to create a tool with which users can figure out how all the data they generate is being used. However, the task turned out to be more difficult than they expected. The researchers wanted their tool to take a photograph of someone and, using facial recognition technology, instantly tell that person how many times his or her face appeared in the AI training datasets. But they worried that such a tool could be put to bad uses, by stalkers, companies, and intelligence agencies.


In the end, the researchers were forced to limit the functionality of Exposing.AI and the results it produces. In its current form, the tool is not as effective as its creators would like. But the researchers worry that they cannot uncover the scope of the problem without making it worse.


Exposing.AI itself does not use facial recognition technology. The tool finds images only if the user already has a way to point to them on the internet, for example via a web address. Only photos posted on Flickr can be found, and a search requires a Flickr username and a tag or web address that identifies those photos. This provides adequate security and privacy protection, the researchers said.
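To make that lookup process concrete, here is a minimal, hypothetical sketch of the general idea: matching a Flickr username against a local list of image URLs from a training dataset, without any face recognition. This is not Exposing.AI's actual code; the index file, its "photo_url" column, and the URL pattern are assumptions made for illustration.

```python
# Hypothetical sketch: check whether a Flickr user's photos appear in a
# dataset's list of image URLs. This is NOT Exposing.AI's actual code;
# the index file format and URL pattern are illustrative assumptions.
import csv
import re
import sys

# Flickr photo URLs typically embed the uploader's ID, e.g.
# https://www.flickr.com/photos/12345678@N00/987654321/
FLICKR_URL = re.compile(r"flickr\.com/photos/([^/]+)/(\d+)")

def find_matches(index_csv: str, flickr_user: str):
    """Yield rows from the dataset index whose photo URL belongs to flickr_user.

    index_csv is assumed to be a CSV with a 'photo_url' column listing
    the image URLs included in the training dataset.
    """
    with open(index_csv, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            match = FLICKR_URL.search(row.get("photo_url", ""))
            if match and match.group(1) == flickr_user:
                yield row

if __name__ == "__main__":
    # Usage: python check_dataset.py dataset_index.csv 12345678@N00
    dataset_index, user_id = sys.argv[1], sys.argv[2]
    hits = list(find_matches(dataset_index, user_id))
    print(f"{len(hits)} photos by {user_id} found in {dataset_index}")
```

The point of this design, as the article describes it, is that nothing is discovered unless the user already knows how to point at their own photos; the sketch only compares known identifiers against a published list.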

