I posted this as a thread on Mastodon (I’m trying to practice threads, like climbing a mountain, because it’s there). I kind of like it and want it somewhere where I can find it as well 😛
First, an example of what AI thinks might be a bird from Emily Oliver.
The non-maven non-geek tends to think of Artificial Intelligence like this:
- get powerful computer
- furnish with amazing software
- Presto! AI.
In reality it works like this: 1) get powerful computer. Check. 2) get software. Check. Then the missing step — missing because most people are only dimly aware of it — train the AI on a dataset.
The dataset is selected by the geeks making the AI. (It doesn’t have to be, but that’s how it currently is.) If their dataset is current US physics grads, it’ll be more or less three-quarters white men. If they’re making a resume-reading AI for employers and train it on that, it’ll favor white men, because its training told it that being a white man is a common trait of physics experts.
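To make that concrete, here’s a toy sketch (with made-up numbers and made-up group labels, not any real system or real data) of how a naive model can end up doing nothing but echoing the base rates of its training set:

```python
# Hypothetical illustration: a "hiring model" that only learns how common
# each group was in its training data, then scores candidates by that.
from collections import Counter

# Pretend training set drawn from current physics grads: ~3/4 one group.
# These labels and proportions are invented for illustration.
training_labels = ["white_man"] * 75 + ["everyone_else"] * 25

counts = Counter(training_labels)
total = sum(counts.values())

def score(candidate_group: str) -> float:
    """Score a candidate purely by how common their group was in training."""
    return counts[candidate_group] / total

print(score("white_man"))      # 0.75
print(score("everyone_else"))  # 0.25
```

No one writes a model this crude on purpose, but statistically this is the trap: if the training data skews, a model that optimizes for "looks like past successes" reproduces the skew.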
It’s obvious if you think about it for a second, but an AI is only as good as its training. It’s almost human that way.
A visual example makes clear how very small differences, mistakes a human would never make, are enough to make nonsense of AI results. Something to remember when AI makes the first cuts on college and job and mortgage and parole applications. From Daniel Solis.
These are from datasets of bird illustrations; the AI was then told to draw a bird. It doesn’t always produce nonsensical edge cases, but it does rather often. So, clearly, it is ESSENTIAL to have public access to the training dataset and methods. (See also Emily Bender.)
Commercial AI, the ones making those resume-reading decisions, all — without exception as far as I know — hide everything under “proprietary.” Think about that as you look at the “birds.”