
AI Generation: Diverse Dataset Collections for Inclusivity & Empowerment

AIGeneration.blog
3 min read · Oct 11, 2023
Data Speed Time Deluge. Digital Artist: MoniGarr

Creating diverse dataset collections is a requirement for verifying that AI systems are a force for inclusivity and empowerment rather than exclusion. Without that diversity, AI systems rapidly widen the digital divide: privileged people receive all the benefits while marginalized communities are harmed by the same systems.

AI systems trained on diverse dataset collections learn from a wider range of examples, drawn from a wider range of data sources that differ substantially from one another. This makes the systems more accurate, fair and representative of the diverse world we all live in. The following steps can help you create a diverse dataset collection.

  1. Define Inclusion Goals: Identify the attributes, characteristics and aspects of diversity that are important for your AI application. These can include gender, race, age, ethnicity, different abilities, socioeconomic status, language, religion, culture and more.
  2. Fully Informed Prior Consent: Work with the individuals and communities that you defined in step 1 to obtain their fully informed prior consent regarding your current and future data collection and the resulting AI application work. Clearly define each party's expectations (your own, the individuals' and the communities') regarding all aspects of your data collection and resulting AI applications. Verify that individuals also have the fully informed prior consent of the communities they claim to represent. Plan to provide fair, equal reciprocity and benefits for everyone involved in your projects.
  3. Data Sources: Collect your data from a variety of sources that reflect the diversity you wish to represent. The sources can include public databases, surveys, social media, user-generated content and other relevant data repositories.
  4. Random Samples: Verify randomness in your data sampling process. Avoid cherry-picking data that represents only a specific individual, group or viewpoint. Exception: There are use cases for dataset collections that represent very specific attributes (for example, a single age group or gender), provided the data source and resulting AI applications are transparent and carefully planned for those specific use cases.
  5. Balance Representation: Use a balanced representation of different groups within your dataset. Aim for a…
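
Steps 4 and 5 above can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended pipeline: it assumes each record is a dictionary carrying a single group attribute, and the names `balanced_random_sample` and `representation_report`, the `language` field and the toy records are all hypothetical. Real projects usually need to balance several attributes at once and to respect the consent terms agreed in step 2.

```python
import random
from collections import Counter, defaultdict


def balanced_random_sample(records, group_key, per_group, seed=0):
    """Randomly draw up to `per_group` records from each group."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)

    sample = []
    for group in sorted(by_group):
        members = by_group[group]
        # Small groups contribute everything they have; nothing is duplicated.
        k = min(per_group, len(members))
        sample.extend(rng.sample(members, k))
    return sample


def representation_report(records, group_key):
    """Return each group's share of the records as a fraction of the total."""
    counts = Counter(record[group_key] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


# Hypothetical toy data: one group heavily outnumbers the other two.
records = (
    [{"id": i, "language": "en"} for i in range(80)]
    + [{"id": i, "language": "es"} for i in range(80, 95)]
    + [{"id": i, "language": "fr"} for i in range(95, 100)]
)

sample = balanced_random_sample(records, "language", per_group=5)

print("before:", representation_report(records, "language"))
print("after: ", representation_report(sample, "language"))
```

Comparing the two reports shows whether the sampled collection is closer to the balance you set as an inclusion goal in step 1. Capping each group at the same size is only one possible definition of "balanced"; the right target depends on the use case.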
