The vast majority of computer vision research leads to technology that surveils human beings, a new preprint study that analyzed more than 20,000 computer vision papers and 11,000 patents spanning three decades has found. Crucially, the study found that computer vision papers often refer to human beings as “objects,” a convention that both obfuscates how common surveillance of humans is in the field, and objectifies humans by definition.
I am totally in favor of criticizing researchers for doing science that actually serves corporate interests. I wrote a whole thing doing that just last week. I actually fully agree with the main point made by the researchers here, that people in fields like machine vision are often unwilling to grapple with the real-world impacts of their work, but I think complaining that they use the word “object” for humans is distracting, and a bit of a misfire. “Object detection” is just the term of art for recognizing anything, humans included, and of course humans are the object that interests us most. It’s a bit like complaining that I objectified humans by calling them a “thing” when I included humans in “anything” in my previous sentence.
Again, I fully agree with much of their main thesis. This is a really important point:
As co-author Luca Soldaini said on a call with 404 Media, even in the seemingly benign context of computer vision enabled cameras on self-driving cars, which are ostensibly there to detect and prevent collision with human beings, computer vision is often eventually used for surveillance.
“The way I see it is that even benign applications like that, because data that involves humans is collected by an automatic car, even if you’re doing this for object detection, you’re gonna have images of humans, of pedestrians, or people inside the car—in practice collecting data from folks without their consent,” Soldaini said.
Soldaini also pointed to instances when this data was eventually used for surveillance, like police requesting self-driving car footage for video evidence.
And I do agree that sometimes, it’s wise to update our language to be more respectful, but I’m not convinced that in this instance it’s the smoking gun they’re portraying it as. The structures that make this technology evil here are very well understood, and they matter much more than the fairly banal language we’re using to describe the tech.
The structures that make this technology evil here are very well understood, and they matter much more than the fairly banal language we’re using to describe the tech.
Conversely, the fairly banal language used to describe the tech is how the structures that make technology evil are concealed.
Calling humans human rather than objects, even if object detection is what AI does, re-instills certain objects with a whole host of features that distinguish them from other objects. It won’t matter for the AI, obviously. But it will matter for the people involved with creating and using it.
I mean, imagine if Tesla shows “Object Identified” as it barrels over a misplaced jaywalker. My previous sentence buries the horror of someone being murdered. Similarly, humans are understood to have rights, thoughts, feelings, whole worlds that exist inside of their heads, and they exist within a social ecosystem where their presence is fundamental to its health. “Object” captures none of that. But identifying human objects as human does.
Relabeling human objects as human reintroduces all the associated values of being human into AI object detection discussions. And so it becomes easier to see how the evils of technology are acting on us, rather than concealing them.
This still just feels like a muddying of technical language. If you were to write an article about autopilot killing somebody and use object to refer to them, that's certainly dehumanization, but saying that an object detection algorithm performs poorly on humans doesn't feel like it is.
Part of the problem is that in general we aren't talking about specialized human detection models that incorporate things like pose estimation. Instead it is almost always a general object detection alg, and referring to the same models differently based on the subject just adds muddiness.
I'm mostly familiar with AI within healthcare, and in my workplace, any released model is going to have a number of conversations and evaluations about the technical performance, practical impact on patients, and general ethics of the model. Those conversations blend, but it's harmful to make the language less clear in any one of those contexts.
I love being objectified, it’s so hot. Treat me like a filthy object