newsynews

This commit is contained in:
clarissa 2023-06-05 17:38:28 -07:00
parent 162e43760a
commit ab8afec0ae
1 changed file with 54 additions and 1 deletion


@@ -7,4 +7,57 @@
+ Link roundup:
+ stable-diffusion and the american smile
+ department of ed recommendations about generative AI in the classroom
+
+
* Text
Hi everyone,
Another week, another summary of the news wrapped inside an essay. Or maybe that's the other way 'round. Or maybe it's like three essays in a trenchcoat. You, the people, should decide.
So first here's a bit of a link roundup of interesting, odd, and upsetting things in recent days.
First, a kind of fascinating consequence of how large image generation models such as stable-diffusion are trained:
https://medium.com/@socialcreature/ai-and-the-american-smile-76d23a0fbfaf
As you might be able to guess from the URL, this is a piece about how these models have a tendency to give the people in their images a very particular, American cultural conception of how one should smile in photos. I'll admit this hadn't even really occurred to me, despite being a person who literally never smiles in photos by American standards, but it makes perfect sense. Of course, given how these datasets are built, you're going to get some kind of deep bias in very cultural/contextual things like how people show emotion. Despite what some forensic pop-science types will tell you, there is no magic shortcut to understanding someone's interiority, either through the face or through the even more dubious "micro-expressions".
So what's the solution? This is one of those times where I think we're getting into the inherent problems of large models: can you actually make an unbiased generative model? Or maybe I should phrase it as "can you make a model suitable for all domains, that is unbiased?"
See, I have rather complicated feelings about the concept of bias in large generative models, because once you're at the scale of "a non-trivial portion of the internet is my data" the bias isn't like, say, the proctoring software that couldn't recognize someone with dark skin as a student taking the test---bias in function, bias in who it works for---but something more like "reflecting the large-scale bias of our society". That may sound like I'm splitting hairs, but I think the distinction is actually really important.
Let's set aside for the moment whether you even want automated proctoring software, okay? But we can imagine what it would look like for it to be unbiased: everyone's face is registered equally well, regardless of skin tone, hair, dress, makeup, disfigurement, &c.
What would it mean, though, for stable-diffusion to be unbiased in how it generates images? Seriously. What would it look like? When we ask for an astronaut or a scientist, should we get a statistical distribution of races and genders that:
+ reflects the stories we culturally tell in the US and western Europe, where most of the images come from
+ reflects the average of the stories we tell globally
+ reflects the actual distribution of these jobs globally
+ reflects a flat distribution that values all of these equally
Well, okay, so the last option certainly sounds like a lack of bias.
What if, instead, we ask for something more ignoble like "serial killer" or "oil conglomerate CEO"? Do we want that to be a completely flat distribution? Is that fair when, historically and due to bias in our society itself, some people have been far more likely to commit acts of violence and domination? One attempt at fairness becomes a whitewashing of historicity in another.
Or let's go really hard here and examine what---if reddit threads are to be believed---is the main thing people use stable-diffusion for: generating photos of attractive women. Let's leave aside some of the more lurid descriptions you might see when looking through prompting galleries and just focus on something like "beautiful" or "attractive". Since I'm picking fights with various disciplines already, I'm going to say: sorry, evopsych, there is no objective, biologically determined idea of what makes another person attractive. That is the most contingent of contingencies, without an ounce of necessity to it.
So what should stable-diffusion do if you ask for a realistic photo of an attractive person? What on earth would that even mean? There's literally no answer that's going to be unbiased, other than the model just throwing up its hands and giving you a picture of a good squirrel instead.
This is what I mean when I say that I don't know if a large model like this can even be unbiased.
It's the same problem I have with LLMs: the very nature of trying to create a universal generator means that you are picking answers to these questions and countless more and yet presenting it as a view-from-nowhere.
That doesn't mean I think things like image generators are inherently bad. I think LoRAs (low-rank adaptations) are a step forward because they involve honestly making choices for yourself about what kinds of outputs you want. I've included a few links about LoRAs below:
https://softwarekeep.com/help-center/how-to-use-stable-diffusion-lora-models
https://huggingface.co/blog/lora
https://replicate.com/blog/lora-faster-fine-tuning-of-stable-diffusion
But the basic idea is that it's an effective kind of fine-tuning that people can potentially do on their own, to create specific kinds of images of specific kinds of subjects by feeding in examples of them. Now, before you go running off to check out the world of LoRAs, please remember what I said about what stable-diffusion seems to mostly be used for and consider, then, what kinds of specific subjects and poses people are looking for.
I'm saying you're going to find a lot of NSFW content, okay? I'm being kinda snarky about it but you're going to find a lot of stuff you don't want to be looking at on a work computer, so please just keep that in mind.
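If you're curious what the "low-rank" part actually means mechanically, here's a rough sketch in plain PyTorch. This is my own toy illustration rather than the code behind any of the links above: the idea is to freeze a model's existing weight matrices and train only a pair of much smaller matrices whose product gets added on top.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen linear layer with a trainable low-rank update.
    # Instead of fine-tuning the full weight matrix W, we learn two small
    # matrices A (rank x in_features) and B (out_features x rank) and add
    # their scaled product to the frozen output. Only A and B get gradients.
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the original weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen output plus the scaled low-rank correction
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

# Toy usage: only the two small matrices are trainable.
layer = LoRALinear(nn.Linear(512, 512), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)         # 8192 (two 8x512 matrices)
print(512 * 512 + 512)   # 262656 parameters in the frozen base layer

The toy numbers at the end are just to show that the trainable piece is a tiny fraction of the frozen layer, which is why this kind of fine-tuning is something people can realistically do at home on their own examples.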
Pulling it back around, though: if we could massively increase the sample efficiency---that is, reduce the number of examples that CLIP-based image generation needs in order to learn---then maybe we could start making models that reflect the stories and images we want to tell rather than a smeared average of the zeitgeist. Imagine the ways we could tell stories if we had that.