Development practitioners are using "big data" from cellphones and remittances to track the flow of refugees and migrants. Through this data, they can model the scope and scale of a "problem" and provide real-time information that enables program design teams to adapt interventions. It may even provide the details needed for evidence-based development decision-making.
All of this is good; however, we must remember that such data is usually generated from a subset of the population. For example, women are not equally represented in these data samples because they are significantly less likely to be carrying mobile phones, be connected to the internet, or be active on social media. Therefore, the use of this data automatically excludes significant proportions of the population – not only women, but the elderly, the poor, certain ethnic groups – the list goes on. At this stage in global digital transformation, big data suffers from representation or sample bias and should be used with extreme caution to inform development decision-making.
Big data – particularly passive data such as social media and cellphone metadata or call detail records – is one of the primary model inputs, or sources of training data, for Artificial Intelligence (AI) and Machine Learning (ML).
Fortunately, USAID has begun a much-needed discussion on gender bias in AI and its effect on development initiatives. Identifying the insidious ways gender bias creeps into the AI lifecycle and correcting this bias is crucial to ensure that future programming promotes inclusive development. This blog focuses on getting AI's underlying data right.
Gender Bias in Development Data
Development actors are also producers of data sets via project data collection efforts. USAID now requires all project data to be entered into its Development Data Library in "machine-readable" formats. One may logically assume that this data will be used in AI/ML applications. This would be a good thing if the data going into the AI applications were free from bias. However, development data has long been plagued by gender bias.
The use of samples in which men are disproportionately represented is so prevalent in our project-level data sets that many of us do not even notice it. There are practical reasons why this happens.
First, it is easier to reach men. In many developing countries, women are less present in public spaces – whether as a result of their "time poverty" (they are much more likely to be at home, performing household chores, caring for children, the sick and the elderly) or due to gender norms. Second, women tend to be more protected. Social norms often restrict male–female interaction and make families hesitant to welcome interviewers into the home. These norms also make it more likely that a male will be present if an interview takes place, and much more likely that women will be less forthcoming in discussions where men are present. In larger group settings, women are more likely to be deferential to men.
Whether data is collected through traditional means or drawn from big data sources, the bias in favor of men in sampling is often left unchallenged. We justify using this data by telling ourselves that this is the best we can get. In doing so, we provide space for gender bias to affect the way we interpret the data. Biased data leads to programming that meets the needs of a few (usually men) and reinforces and propagates existing gender inequalities, all while giving the illusion of advancing development objectives.
Addressing Gender Bias in Using Big Data
Closing the digital divide may one day make big data less vulnerable to interpretive bias. In the meantime, it is our responsibility to first acknowledge the limitations of big data in capturing on-the-ground reality, as well as in reflecting the behavior of women and others who are less likely to be carrying mobile phones or using the internet.
We then need to avoid making programming decisions based solely on data that we know is biased in favor of a sub-group of the population. These are hardly big tech solutions, but if successfully implemented, they would minimize the potential for gender bias to creep into the way we use big data and lead to more inclusive programming decisions.
Correcting for the overrepresentation of men in data sets should be done as a matter of data quality and best practice, just as it would be if there were other types of distortions in the data. The dataset needs to be reviewed to ensure that it is representative – in that it contains equal numbers of males and females. If it is skewed in favor of men, then the project must collect more data from women to correct for their underrepresentation.
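To make this review concrete, here is a minimal sketch of what a representativeness check might look like for a tabular project dataset. It assumes a hypothetical "sex" column coded "F"/"M" (not a field defined anywhere in this post) and simply reports how far the sample is from gender parity and how many additional records from women would be needed to reach it.

```python
# A minimal sketch of a gender-representativeness check.
# Assumes a hypothetical "sex" column coded "F"/"M".
import pandas as pd

def gender_balance_report(df: pd.DataFrame, sex_col: str = "sex") -> dict:
    """Summarize how far the sample is from gender parity."""
    counts = df[sex_col].value_counts()
    n_f, n_m = int(counts.get("F", 0)), int(counts.get("M", 0))
    total = n_f + n_m
    share_f = n_f / total if total else float("nan")
    return {
        "female_records": n_f,
        "male_records": n_m,
        "female_share": round(share_f, 3),
        # Additional records from women needed to reach a 50/50 split.
        "additional_female_records_for_parity": max(n_m - n_f, 0),
    }

# Example with toy data: 700 records from men, 300 from women.
df = pd.DataFrame({"sex": ["M"] * 700 + ["F"] * 300})
print(gender_balance_report(df))
# {'female_records': 300, 'male_records': 700, 'female_share': 0.3,
#  'additional_female_records_for_parity': 400}
```

Such a check could be run routinely before a dataset is submitted or analyzed, flagging skew the same way other data-quality problems are flagged.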
Taking Positive Steps
Today, we have an increasingly granular understanding of the extent of the digital divide and digital access. We could apply this information to develop statistical pointers that enable data collectors or publishers to adjust for sampling bias in specific countries.
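One common way such an adjustment could work is post-stratification weighting: if we know the true share of women in a country's population and the share of women in a phone-derived sample, each record can be reweighted accordingly. The sketch below illustrates the idea with placeholder numbers, not real country statistics.

```python
# A minimal sketch of post-stratification weighting by sex.
# The shares used in the example are illustrative placeholders.
def poststratification_weights(sample_share_female: float,
                               population_share_female: float) -> dict:
    """Return per-record weights that re-balance a biased sample."""
    sample_share_male = 1.0 - sample_share_female
    population_share_male = 1.0 - population_share_female
    return {
        # Female records count for more when women are under-sampled.
        "F": population_share_female / sample_share_female,
        "M": population_share_male / sample_share_male,
    }

# Example: women are 50% of the population but only 30% of the mobile-data
# sample, so female records are up-weighted and male records down-weighted.
weights = poststratification_weights(sample_share_female=0.30,
                                     population_share_female=0.50)
print(weights)  # {'F': 1.666..., 'M': 0.714...}
```

Weighting cannot recover the voices of people who never appear in the data at all, which is why it complements, rather than replaces, additional data collection from women.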
If nothing else, we should require that all datasets carry warning labels that clearly state what proportion of men and women are represented in the sample. The warning would caution those who use the data about its limitations and make it harder for development practitioners to "unconsciously" take data from a subset of the population and extrapolate the findings to the entire community.
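A warning label could be as simple as a small machine-readable summary attached to the dataset. The sketch below is one possible format, invented here for illustration (it is not an official USAID schema), and reuses the same hypothetical "sex" column as above.

```python
# A minimal sketch of a machine-readable dataset "warning label".
# The label format is illustrative, not an official schema.
import json
import pandas as pd

def make_warning_label(df: pd.DataFrame, dataset_name: str,
                       sex_col: str = "sex") -> str:
    shares = df[sex_col].value_counts(normalize=True)
    label = {
        "dataset": dataset_name,
        "female_share_of_sample": round(float(shares.get("F", 0.0)), 3),
        "male_share_of_sample": round(float(shares.get("M", 0.0)), 3),
        "warning": ("Sample is not gender-representative; do not extrapolate "
                    "findings to the whole community without reweighting or "
                    "additional data collection."),
    }
    return json.dumps(label, indent=2)

# Example usage with the toy dataset from earlier:
df = pd.DataFrame({"sex": ["M"] * 700 + ["F"] * 300})
print(make_warning_label(df, dataset_name="household_survey_2023"))
```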
In scenarios where data publishers and users are known to each other, there is another simple way to expose bias: to ensure that all parties – those who collect data and those who use it – develop a shared understanding of the data, its limitations and what conclusions can be drawn from it.
In my experience as a Gender and Inclusion Advisor, this exercise has several benefits, the most important of which is interrupting "business as usual" and the tendency to rely on biased data. This creates space for project teams to acknowledge the need to develop programming that is responsive to the needs and interests of those communities that are rendered invisible when using such data. Without such a process, project teams often do not recognize the need to consider women or gender at all – especially if they don't consider it a "gender" project.
We also recommend the collaborative production of a Compendium on Gender Bias in AI for Development to expose where and how gender bias manifests and to discuss the potential negative outcomes across development programming sectors. This can help prompt discussion and raise awareness of gender bias in data for development decision-making.
We do not have all the answers, but we do know that the more we unpack and analyze the data sources and inputs that go into AI/ML models, build in processes that force us to collectively explore how we interpret them, and use them carefully to inform our decision-making, the more equitable our outcomes will be.
Rebecca Sewall is the Senior Advisor, Gender and Inclusion. Members of the Creative Development Lab, which promotes and channels new strategies for addressing development challenges through science, technology and media, contributed to this blog.