Meta has released a dataset to explore biases in computer vision models.

Continuing on its open-source tear, Meta today released a new AI benchmark, FACET, designed to assess the "fairness" of AI models that classify and detect stuff in photos and videos, including people.

Comprising 32,000 images of 50,000 people labeled by human annotators, FACET (a tortured acronym for "FAirness in Computer Vision EvaluaTion") covers classes related to occupations and activities such as "basketball player," "disc jockey" and "doctor," as well as demographic and physical attributes, enabling what Meta describes as "deep" evaluations of biases against those classes.

"By releasing FACET, we are opening up the possibility of having similar benchmarking for researchers and practitioners alike to better grasp the disparities that exist within their models and also track interventions instituted to mitigate fairness concerns," Meta wrote in a blog post it shared with TechCrunch. "We welcome the research community to utilize FACET for benchmarking fairness on other vision and multimodal tasks."

Of course, benchmarks to test for biases in computer vision models aren't new. Meta itself published one back in 2019 to bring attention to age, gender and skin tone biases in both computer vision and audio machine learning models. And there have been many investigations into computer vision models to see if they're biased against particular demographic groups. (Spoiler alert: they usually are.)

Then there's the simple fact that Meta doesn't have the best track record when it comes to responsible AI.

Late last year, Meta had to pull an AI demo after it created racist and inaccurate scientific literature. The anti-AI-bias tools the company's released so far have reportedly been largely toothless, too. Meanwhile, academics have accused Meta of exacerbating socioeconomic inequalities in its ad-serving algorithms and of exhibiting bias against Black users in its automated moderation systems.

Meta claims, however, that FACET is far more granular than any previous computer vision bias benchmark, able to answer questions like "Are models more effective at classifying individuals as skateboarders when their perceived gender presentation has more stereotypically male attributes?" or "Are biases amplified when the person has coily hair compared to straight hair?"

To develop FACET, Meta had the aforementioned annotators label each of the 32,000 images for demographic attributes (e.g. the pictured person's perceived gender presentation and age group), additional physical attributes (e.g. skin tone, lighting, tattoos, headwear and eyewear, hairstyle and facial hair) and occupation and activity classes. They paired these labels with labels for people, hair and clothing obtained from Segment Anything 1 Billion, a Meta-designed dataset for training computer vision models to "segment," or isolate, objects and animals in images.
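
To make that structure concrete, here is a rough sketch of what a single FACET-style annotation record could look like. The field names and values below are illustrative guesses based on the attributes described above, not Meta's actual schema.

    # Hypothetical example of a FACET-style annotation record.
    # Field names and values are illustrative only; they are not Meta's schema.
    example_annotation = {
        "image_id": "img_000123",
        "person_class": "doctor",  # occupation/activity class
        "perceived_gender_presentation": "more stereotypically female",
        "perceived_age_group": "middle",
        "perceived_skin_tone": "medium",
        "hair_type": "coily",
        "other_attributes": ["eyewear", "headwear"],
        # segmentation labels paired from Segment Anything 1 Billion
        "sa1b_masks": ["person", "hair", "clothing"],
    }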

The images were sourced from Segment Anything 1 Billion, Meta tells me, and were purchased, one way or another, from a "photo provider." But it's unclear whether the people in those images were told their pictures would be used for this purpose. And, at least in the blog post, Meta isn't clear about how it recruited the annotator teams or what the annotators were paid.

Historically and today, many of the annotators who label the datasets used to train and benchmark AI have come from developing countries, earning wages roughly an order of magnitude lower than the U.S. minimum wage. Just this week, The Washington Post reported that Scale AI, one of the biggest and best-funded annotation companies, has paid its workers extremely low rates, routinely delayed or simply withheld payments and offered workers few ways to contest them.

In a whitepaper outlining how FACET was created, Meta notes that the annotators were "trained experts" hailing from "several geographic regions," including North America (United States), Latin America (Colombia), the Middle East (Egypt), Africa (Kenya), Southeast Asia (Philippines) and East Asia (Taiwan). Meta used a "proprietary annotation platform" from a third-party vendor, it says, and annotators were "compensated with an hour wage set per country."

FACET's problematic origins aside, Meta says the benchmark can be used to probe classification, detection, "instance segmentation" and "visual grounding" models across different demographic attributes.

In one demonstration, Meta ran FACET against its own DINOv2 computer vision algorithm, which is now available for use in commercial applications. It found several biases in DINOv2, including a bias against people with certain gender presentations and a tendency to stereotypically identify pictures of women as "nurses."
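
The kind of disparity check behind a finding like this is, at bottom, a comparison of a model's per-class performance across perceived attribute groups. The sketch below shows one way such a comparison could be computed over evaluation records; the function and field names are hypothetical, not Meta's actual API.

    # A minimal sketch of a per-group recall comparison, the sort of
    # disparity measurement a benchmark like FACET enables. The record
    # fields and function name here are hypothetical, not Meta's API.
    from collections import defaultdict

    def recall_by_group(records, target_class, attribute):
        """Return recall for `target_class`, broken down by a perceived attribute."""
        hits, totals = defaultdict(int), defaultdict(int)
        for r in records:
            if r["true_class"] != target_class:
                continue
            group = r[attribute]
            totals[group] += 1
            if r["predicted_class"] == target_class:
                hits[group] += 1
        return {group: hits[group] / totals[group] for group in totals}

    # For example, recall_by_group(results, "nurse", "perceived_gender_presentation")
    # would show whether a model recognizes nurses more reliably for one
    # perceived gender presentation than another.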

"The preparation of the pre-training dataset of DINOv2 may inadvertently have transferred biases in the reference datasets chosen for curation, according to Meta in its blog post. Future work will address these possible shortcomings, and we believe image-based curation may also help avoid possible biases from search engines or text supervision as well.".

No benchmark is perfect. And Meta, to its credit, recognizes that FACET may not fully capture real-world concepts and demographic groups. It also points out that real-world depictions of certain professions may have changed since FACET was created. For instance, most doctors and nurses in FACET, photographed during the COVID-19 pandemic, are wearing more personal protective equipment than they would have before the health crisis.

"At present, we do not intend to update the dataset for this," Meta states in the whitepaper. "We will provide a facility for users to flag any objectionable images they come across, and remove objectionable content if found."

Alongside the dataset itself, Meta published a web-based dataset explorer tool. To use the tool and the dataset, developers must agree not to train computer vision models on FACET, only evaluate, test and benchmark them.
