From the TED Talk by Kasia Chmielinski: Why AI needs a "nutrition label"
Unscramble the Blue Letters
Now, I've been asking myself a lot of questions about how can we understand the data quality before we use it. And this emerges from two daedecs of bldnuiig these kinds of systems. The way I was tiearnd to build semstys is similar to how people do it today. You build for the middle of the distribution. That's your nomarl user. So for me, a lot of my training data sets would include iiaoontfrmn about people from the Western world who speak English, who have certain normative ctacstrihiercas. And it took me an elbangmsrrsaiy long amount of time to realize that I was not my own user. So I ifdentiy as non-binary, as mixed race, I wear a hearing aid and I just wasn't represented in the data sets that I was using. And so I was building systems that literally didn't work for me. And for example, I once built a system that repeatedly told me that I was a white Eastern-European lady. This did a real number on my identity.
Open Cloze
Now, I've been asking myself a lot of questions about how can we understand the data quality before we use it. And this emerges from two _______ of ________ these kinds of systems. The way I was _______ to build _______ is similar to how people do it today. You build for the middle of the distribution. That's your ______ user. So for me, a lot of my training data sets would include ___________ about people from the Western world who speak English, who have certain normative _______________. And it took me an ______________ long amount of time to realize that I was not my own user. So I ________ as non-binary, as mixed race, I wear a hearing aid and I just wasn't represented in the data sets that I was using. And so I was building systems that literally didn't work for me. And for example, I once built a system that repeatedly told me that I was a white Eastern-European lady. This did a real number on my identity.
Solution
- embarrassingly
- information
- systems
- characteristics
- normal
- trained
- building
- decades
- identify
Original Text
Now, I've been asking myself a lot of questions about how can we understand the data quality before we use it. And this emerges from two decades of building these kinds of systems. The way I was trained to build systems is similar to how people do it today. You build for the middle of the distribution. That's your normal user. So for me, a lot of my training data sets would include information about people from the Western world who speak English, who have certain normative characteristics. And it took me an embarrassingly long amount of time to realize that I was not my own user. So I identify as non-binary, as mixed race, I wear a hearing aid and I just wasn't represented in the data sets that I was using. And so I was building systems that literally didn't work for me. And for example, I once built a system that repeatedly told me that I was a white Eastern-European lady. This did a real number on my identity.
Frequently Occurring Word Combinations
ngrams of length 2
| collocation | frequency |
| --- | --- |
| training data | 3 |
| nutrition labels | 3 |
| dataset nutrition | 3 |
| generative ai | 3 |
| artificial intelligence | 2 |
| stop eating | 2 |
| data set | 2 |
| data quality | 2 |
| data sets | 2 |
| include information | 2 |
| data nutrition | 2 |
| food nutrition | 2 |
| building ai | 2 |
| building datasets | 2 |
| transparency labeling | 2 |
| food packaging | 2 |
| private actors | 2 |
| basic principles | 2 |
Important Words
- aid
- amount
- build
- building
- built
- characteristics
- data
- decades
- distribution
- embarrassingly
- emerges
- english
- hearing
- identify
- identity
- include
- information
- kinds
- lady
- literally
- long
- lot
- middle
- mixed
- normal
- normative
- number
- people
- quality
- questions
- race
- real
- realize
- repeatedly
- represented
- sets
- similar
- speak
- system
- systems
- time
- today
- told
- trained
- training
- understand
- user
- wear
- western
- white
- work
- world