Full Transcript
From the TED Talk by Mainak Mazumdar: How bad data keeps us from good AI
Unscramble the Blue Letters
As a data sneistict, I'm here to tell you, it's not the algorithm, but the biased data that's rlpsnibosee for these decisions. To make AI possible for humanity and society, we need an urgent reest. Instead of algorithms, we need to focus on the data. We're spending time and money to slace AI at the expense of designing and collecting high-quality and contextual data. We need to stop the data, or the biased data that we already have, and fucos on three things: data infrastructure, data quality and data literacy.
In June of this year, we saw embarrassing bias in the Duke University AI model called PULSE, which enhanced a blurry iagme into a recognizable photograph of a person. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American images were underrepresented in the training set, ladineg to wrong decisions and pdoenrtiics. Probably this is not the first time you have seen an AI misidentify a Black person's image. Despite an ivmroepd AI methodology, the uentdaoepentisrrren of racial and ethnic populations still left us with besiad results.
Open Cloze
As a data _________, I'm here to tell you, it's not the algorithm, but the biased data that's ___________ for these decisions. To make AI possible for humanity and society, we need an urgent _____. Instead of algorithms, we need to focus on the data. We're spending time and money to _____ AI at the expense of designing and collecting high-quality and contextual data. We need to stop the data, or the biased data that we already have, and _____ on three things: data infrastructure, data quality and data literacy.
In June of this year, we saw embarrassing bias in the Duke University AI model called PULSE, which enhanced a blurry _____ into a recognizable photograph of a person. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American images were underrepresented in the training set, _______ to wrong decisions and ___________. Probably this is not the first time you have seen an AI misidentify a Black person's image. Despite an ________ AI methodology, the ___________________ of racial and ethnic populations still left us with ______ results.
Solution
- improved
- predictions
- reset
- leading
- scale
- biased
- focus
- responsible
- scientist
- image
- underrepresentation
Original Text
As a data scientist, I'm here to tell you, it's not the algorithm, but the biased data that's responsible for these decisions. To make AI possible for humanity and society, we need an urgent reset. Instead of algorithms, we need to focus on the data. We're spending time and money to scale AI at the expense of designing and collecting high-quality and contextual data. We need to stop the data, or the biased data that we already have, and focus on three things: data infrastructure, data quality and data literacy.
In June of this year, we saw embarrassing bias in the Duke University AI model called PULSE, which enhanced a blurry image into a recognizable photograph of a person. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American images were underrepresented in the training set, leading to wrong decisions and predictions. Probably this is not the first time you have seen an AI misidentify a Black person's image. Despite an improved AI methodology, the underrepresentation of racial and ethnic populations still left us with biased results.
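The mechanism the passage describes — a group underrepresented in the training set skewing a model's results — can be sketched as a toy data audit. This is a minimal illustration, not an actual bias audit; the group names and the 20% threshold are assumptions chosen for the example:

```python
from collections import Counter

def representation_report(labels, threshold=0.2):
    """For each group in a list of labels, return its share of the
    dataset and whether that share falls below the threshold.

    A hypothetical check on a toy label list; real audits use richer
    demographic metadata and proper fairness metrics.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: (n / total, n / total < threshold)
            for group, n in counts.items()}

# A toy, heavily imbalanced training set
labels = ["white"] * 8 + ["black"] * 1 + ["asian"] * 1
print(representation_report(labels))
# {'white': (0.8, False), 'black': (0.1, True), 'asian': (0.1, True)}
```

A model trained on a set like this sees ten times more examples of one group than another, which is exactly the kind of imbalance that produced the PULSE failure described above.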
Frequently Occurring Word Combinations
ngrams of length 2
| collocation | frequency |
| --- | --- |
| data quality | 3 |
| data infrastructure | 3 |
| biased data | 2 |
| wrong decisions | 2 |
| million people | 2 |
| census data | 2 |
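Bigram frequencies like the ones in the table above can be computed by sliding a two-word window over the normalized text. A minimal sketch (the sample sentence is illustrative, not the talk itself):

```python
from collections import Counter
import re

def bigram_counts(text):
    # Lowercase and extract word tokens, so "Biased data" and
    # "biased data" count as the same collocation.
    words = re.findall(r"[a-z']+", text.lower())
    # Pair each word with its successor and tally the pairs.
    return Counter(zip(words, words[1:]))

counts = bigram_counts("Biased data leads to more biased data.")
print(counts[("biased", "data")])  # 2
```

Running this over the full transcript (rather than the excerpt here) would reproduce counts such as "data quality: 3".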
Important Words
- ai
- algorithm
- algorithms
- bias
- biased
- black
- blurry
- called
- caucasian
- collecting
- contextual
- data
- decisions
- designing
- duke
- embarrassing
- enhanced
- ethnic
- expense
- focus
- humanity
- image
- images
- improved
- incorrectly
- infrastructure
- june
- leading
- left
- literacy
- methodology
- misidentify
- model
- money
- nonwhite
- person
- photograph
- populations
- predictions
- pulse
- quality
- racial
- recognizable
- reset
- responsible
- results
- scale
- scientist
- set
- society
- spending
- stop
- time
- training
- underrepresentation
- underrepresented
- university
- urgent
- wrong
- year