The Big (Data) Gender Bias

Posted May 4, 2021
6 minute read


Development implementers are using big data from cellphones and remittances to track the flow of refugees and migrants. Through this data, they can model the scope and scale of a "problem" and provide real-time information that enables program design teams to adapt interventions. It may even provide the details needed for evidence-based development decision-making.

All of this is good; however, we must remember that such data is usually generated from a sub-set of the population. For example, women are not equally represented in these data samples because they are significantly less likely to be carrying mobile phones, be connected to the internet, or be active on social media. Therefore, the use of this data automatically excludes significant proportions of the population – not only women, but the elderly, the poor, certain ethnic groups – the list goes on. At this stage in global digital transformation, big data suffers from representation or sample bias and should be used with extreme caution to inform development decision-making.

Big data – particularly passive data such as social media and cellphone metadata or call detail records – is one of the primary model inputs or training data for Artificial Intelligence (AI) and Machine Learning (ML).

Fortunately, USAID has begun a much-needed discussion on gender bias in AI and its effect on development initiatives. Identifying the insidious ways gender bias creeps into the AI lifecycle and correcting this bias is crucial to ensure that future programming promotes inclusive development. This blog focuses on getting AI's underlying data right.

Gender Bias in Development Data

Development actors are also producers of data sets via project data collection efforts. USAID now requires all project data be entered into its Development Data Library in "machine-readable" formats. One may logically assume that this data may be used in AI/ML applications. This would be a good thing if the data going into the AI applications were free from bias. However, development data has too long been plagued by gender bias.

The use of samples where men are disproportionately represented is so prevalent in our project-level data sets that many of us do not even notice it. There are practical reasons why this happens.

First, it is easier to reach men. In many developing countries, women are less present in public spaces – whether as a result of their time poverty (they are much more likely to be at home, performing household chores, caring for children, the sick and the elderly) or due to gender norms. Second, women tend to be more protected. Social norms often restrict male-female interaction and make families hesitant to welcome interviewers into the home. These norms also make it more likely that a male will be present if an interview takes place, and much more likely that women will be less forthcoming in discussions where men are present. In larger group settings, women are more likely to be deferential to men.

Whether data is collected through traditional means or drawn from big data sources, the bias in favor of men in sampling data is often left unchallenged. We justify using this data by telling ourselves that this is the best we can get. Thus, we provide space for gender bias to impact the way we interpret the data. Biased data leads to programming that meets the needs of a few (usually men) and reinforces and propagates existing gender inequalities, all while giving the illusion of advancing development objectives.

Addressing Gender Bias in Using Big Data 

Closing the digital divide may one day make big data less vulnerable to interpretive bias. In the meantime, it is our responsibility to first acknowledge the limitations of big data in capturing on-the-ground reality and in reflecting the behavior of women and others who are less likely to be carrying mobile phones or using the internet.

 We then need to avoid making programming decisions based solely on data that we know is biased in favor of a sub-group of the population. These are hardly big tech solutions, but if successfully implemented, they would minimize the potential for gender bias to creep into the way we use big data and lead to more inclusive programming decisions. 

Correcting for the overrepresentation of men in data sets should be done as a matter of data quality and best practice, just as it would be if there were other types of distortions in the data. The dataset needs to be reviewed to ensure that it is representative – in that it has equal numbers of males and females. If it is skewed in favor of men, then the project must collect more data from women to correct for the underrepresentation.
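One practical form such a review could take is simply computing the share of women in the sample before using it. The short Python sketch below illustrates this under stated assumptions: the "gender" field name and the 10-percentage-point tolerance around parity are illustrative choices, not a standard.

```python
# Minimal sketch of a representativeness check on survey records.
# Assumes each record carries a self-reported "gender" field; the
# 10-percentage-point tolerance around parity is an arbitrary example.

def check_gender_balance(records, tolerance=0.10):
    """Return the share of women and whether the sample is near parity."""
    genders = [r["gender"] for r in records if r.get("gender")]
    if not genders:
        raise ValueError("No gender information recorded in this dataset.")
    share_women = sum(1 for g in genders if g == "female") / len(genders)
    return share_women, abs(share_women - 0.5) <= tolerance

# Example: a sample that is 70% men would be flagged for follow-up collection.
sample = [{"gender": "male"}] * 70 + [{"gender": "female"}] * 30
share, balanced = check_gender_balance(sample)
print(f"Share of women: {share:.0%} - balanced: {balanced}")
```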

Taking Positive Steps

Today we have an increasingly granular understanding of the extent of the digital divide and digital access. We could apply this information to develop statistical pointers that enable data collectors or publishers to adjust for sampling bias in specific countries.
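As a hedged illustration of what such an adjustment could look like, the sketch below applies post-stratification-style weights so that responses from an under-sampled group count for more in aggregate estimates. The 50/50 population split is a placeholder, not a real country statistic.

```python
# Illustrative reweighting: weight = population share / sample share for each
# group, so under-sampled groups are up-weighted before any summary statistic
# is computed. The population shares used here are placeholders.

from collections import Counter

def gender_weights(sample_genders, population_shares):
    counts = Counter(sample_genders)
    total = len(sample_genders)
    return {g: population_shares[g] / (counts[g] / total) for g in counts}

sample = ["male"] * 70 + ["female"] * 30
weights = gender_weights(sample, {"female": 0.5, "male": 0.5})
print(weights)  # women up-weighted (~1.67), men down-weighted (~0.71)
```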

If nothing else, we should require that all datasets carry warning labels that clearly state what proportion of men and women are represented in the sample. The warning would caution those who use the data about its limitations and make it harder for development practitioners to "unconsciously" take data from a sub-set of the population and extrapolate the findings to the entire community.
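Such a label does not require heavy tooling. Below is a sketch of the kind of metadata record that could travel with a published dataset; the field names and the 45% threshold are invented for this example and do not follow any existing metadata standard.

```python
# Illustrative "warning label" metadata for a published dataset.
# Field names and the 45% threshold are invented for this example.

import json

def gender_warning_label(n_men, n_women):
    total = n_men + n_women
    share_women = n_women / total
    return {
        "sample_size": total,
        "share_men": round(n_men / total, 2),
        "share_women": round(share_women, 2),
        "warning": (
            "Women are underrepresented in this sample; do not extrapolate "
            "findings to the entire community without adjustment."
            if share_women < 0.45
            else "Sample is approximately gender-balanced."
        ),
    }

print(json.dumps(gender_warning_label(n_men=700, n_women=300), indent=2))
```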

In scenarios where data publishers and users are known to each other, there is another simple way to expose bias: to ensure that all parties – those who collect data and those who use it – develop a shared understanding of the data, its limitations and what conclusions can be drawn from it. 

In my experience as a Gender and Inclusion Advisor, this exercise has several benefits, the most important of which is interrupting business as usual and the tendency to rely on biased data. This creates space for project teams to acknowledge the need to develop programming that is responsive to the needs and interests of those communities that are rendered invisible when using such data. Without such a process, project teams often do not recognize the need to consider women or gender at all – especially if they don't consider it a "gender" project.

We also recommend the collaborative production of a Compendium on Gender Bias in AI for Development to expose where and how gender bias manifests and to discuss the potential negative outcomes across development programming sectors. This can help prompt discussion and raise awareness of gender bias in data for development decision-making.

We do not have all the answers, but we do know that the more we unpack and analyze the data sources and inputs that go into AI/ML models – and build in processes that force us to collectively explore how we interpret them and use them carefully to inform our decision-making – the more equitable our outcomes will be.

Rebecca Sewall is the Senior Advisor, Gender and Inclusion. Members of the Creative Development Lab, which promotes and channels new strategies to address development challenges through science, technology and media, contributed to this blog.