Full Transcript

From the TED Talk by Mainak Mazumdar: How bad data keeps us from good AI


Unscramble the Blue Letters


As a data sneistict, I'm here to tell you, it's not the algorithm, but the biased data that's rlpsnibosee for these decisions. To make AI possible for humanity and society, we need an urgent reest. Instead of algorithms, we need to focus on the data. We're spending time and money to slace AI at the expense of designing and collecting high-quality and contextual data. We need to stop the data, or the biased data that we already have, and fucos on three things: data infrastructure, data quality and data literacy.

In June of this year, we saw embarrassing bias in the Duke University AI model called PULSE, which enhanced a blurry iagme into a recognizable photograph of a person. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American images were underrepresented in the training set, ladineg to wrong decisions and pdoenrtiics. Probably this is not the first time you have seen an AI misidentify a Black person's image. Despite an ivmroepd AI methodology, the uentdaoepentisrrren of racial and ethnic populations still left us with besiad results.

Open Cloze


As a data _________, I'm here to tell you, it's not the algorithm, but the biased data that's ___________ for these decisions. To make AI possible for humanity and society, we need an urgent _____. Instead of algorithms, we need to focus on the data. We're spending time and money to _____ AI at the expense of designing and collecting high-quality and contextual data. We need to stop the data, or the biased data that we already have, and _____ on three things: data infrastructure, data quality and data literacy.

In June of this year, we saw embarrassing bias in the Duke University AI model called PULSE, which enhanced a blurry _____ into a recognizable photograph of a person. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American images were underrepresented in the training set, _______ to wrong decisions and ___________. Probably this is not the first time you have seen an AI misidentify a Black person's image. Despite an ________ AI methodology, the ___________________ of racial and ethnic populations still left us with ______ results.

Solution


  1. improved
  2. predictions
  3. reset
  4. leading
  5. scale
  6. biased
  7. focus
  8. responsible
  9. scientist
  10. image
  11. underrepresentation

Original Text


As a data scientist, I'm here to tell you, it's not the algorithm, but the biased data that's responsible for these decisions. To make AI possible for humanity and society, we need an urgent reset. Instead of algorithms, we need to focus on the data. We're spending time and money to scale AI at the expense of designing and collecting high-quality and contextual data. We need to stop the data, or the biased data that we already have, and focus on three things: data infrastructure, data quality and data literacy.

In June of this year, we saw embarrassing bias in the Duke University AI model called PULSE, which enhanced a blurry image into a recognizable photograph of a person. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American images were underrepresented in the training set, leading to wrong decisions and predictions. Probably this is not the first time you have seen an AI misidentify a Black person's image. Despite an improved AI methodology, the underrepresentation of racial and ethnic populations still left us with biased results.
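
The bias described here is ultimately a data problem: some groups appear far less often in the training set than others, so the model learns them poorly. Below is a minimal sketch in Python, assuming the training images come with a metadata CSV whose "group" column is invented for this illustration, of how that imbalance could be checked before training:

    # Illustrative sketch: audit how well each demographic group is
    # represented in a training set. The file name and the "group"
    # column are assumptions made for this example, not from the talk.
    import csv
    from collections import Counter

    def group_shares(metadata_path: str) -> dict[str, float]:
        """Return each group's share of the training examples."""
        with open(metadata_path, newline="") as f:
            counts = Counter(row["group"] for row in csv.DictReader(f))
        total = sum(counts.values())
        return {group: n / total for group, n in counts.items()}

    if __name__ == "__main__":
        # Flag any group that makes up less than 5% of the data.
        shares = group_shares("training_metadata.csv")
        for group, share in sorted(shares.items(), key=lambda kv: kv[1]):
            flag = "  <-- underrepresented" if share < 0.05 else ""
            print(f"{group:20s} {share:6.1%}{flag}")

A skewed result from a check like this is exactly the condition the talk warns about: no change to the algorithm will compensate for a group the model has barely seen.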

Frequently Occurring Word Combinations


N-grams of length 2

collocation           frequency
data quality          3
data infrastructure   3
biased data           2
wrong decisions       2
million people        2
census data           2



Important Words


  1. ai
  2. algorithm
  3. algorithms
  4. bias
  5. biased
  6. black
  7. blurry
  8. called
  9. caucasian
  10. collecting
  11. contextual
  12. data
  13. decisions
  14. designing
  15. duke
  16. embarrassing
  17. enhanced
  18. ethnic
  19. expense
  20. focus
  21. humanity
  22. image
  23. images
  24. improved
  25. incorrectly
  26. infrastructure
  27. june
  28. leading
  29. left
  30. literacy
  31. methodology
  32. misidentify
  33. model
  34. money
  35. nonwhite
  36. person
  37. photograph
  38. populations
  39. predictions
  40. pulse
  41. quality
  42. racial
  43. recognizable
  44. reset
  45. responsible
  46. results
  47. scale
  48. scientist
  49. set
  50. society
  51. spending
  52. stop
  53. time
  54. training
  55. underrepresentation
  56. underrepresented
  57. university
  58. urgent
  59. wrong
  60. year