  • Research article
  • Open access

Disaggregation of humanitarian data by disability: a realist evaluation of the use of the Washington Group Questions to support more inclusive practices


People with disabilities make up approximately 16% of the world’s population and disproportionately experience the risks and negative impacts of humanitarian emergencies. In humanitarian contexts, understanding who has a disability, where they are located, and what their needs are is crucial to delivering the right assistance at the right place and time. In recent years, global attention to disability inclusion in the humanitarian sector has focused on the generation of disaggregated data, most commonly using one of the Washington Group Sets of Questions. The implicit assumption behind the collection of more and more data disaggregated by disability is that it will lead to more inclusive action and outcomes. Our findings nuance the current push for disability disaggregated data in all settings and advocate a shift away from the blanket application of disaggregation to a more bespoke approach. Humanitarian and development contexts present multiple challenges to disaggregating data sets by demographic factors such as disability, including the use of households rather than individuals as the unit of analysis, small and non-representative samples, and minority languages with limited translation capacity. Through evaluation of the use of the Washington Group set across the world’s largest humanitarian organization, and its cooperating partners, we present five decision-making criteria that can be flexibly but consistently applied across operating contexts. This enables contextualized decision-making that uses consistent logic to predict the likelihood of data disaggregation by disability leading to more inclusive action and outcomes.


In humanitarian emergencies, persons with disabilities face barriers to accessing assistance and are likely to have needs and capacities that are not recognized or addressed by standard humanitarian responses (WFP 2020). Disaggregating data by disability is increasingly championed as a way toward more inclusive humanitarian action. Quantifying and comparing the needs or experiences of persons with disabilities can be a powerful measure to propel humanitarian access and outcomes on an equal basis with others, as enshrined in the Convention on the Rights of Persons with Disabilities (CRPD).

Articles 11 and 31 of the convention directly address ‘situations of risk and humanitarian emergencies’ and ‘statistics and data collection’ respectively (Mitleton-Kelly 2003). Read together, they provide a clear requirement for consideration of disability in data collection activities in humanitarian settings. In recent years, an explicitly rights-based approach to humanitarian action has also seen increased attention, interest, and funding for disability inclusion (DI) in the humanitarian sector (The Washington Group on Disability Statistics 2022; Abualghaib et al, 2019; AusAID 2012). One notable manifestation of this attention is the proliferation of expectations for the disaggregation of data by disability, championed by donors and organizations of persons with disabilities (OPDs). The Washington Group Short Set of Questions (WG-SS), consisting of 6 questions (see Table 1), has emerged as the tool of choice for disaggregating humanitarian data sets by disability (Carden et al, 2021; Leonard Cheshire, Humanity & Inclusion 2018; Cilliers 2001), and specific WG modules, such as the Child Functioning and Labor Force Survey Disability Modules, have been produced in collaboration with UN agencies (Constantino et al, 2020; Dalkin et al, 2021).

Table 1 The Washington Group Short Set of Questions

However, the drive toward increased data disaggregation as an output has overlooked data disaggregation as a process and specifically, the appropriate conditions that enable disaggregated data to go beyond generating numbers to driving better programming and more equal outcomes. The constraints of data collection in emergencies mean that not all data sets can produce meaningful information when disaggregated by demographic factors such as disability. When it comes to disaggregation of humanitarian outcomes by disability, we argue that the ‘how’ is well answered by the WG-SS. Yet, amidst small samples, short timeframes, and untrained data collectors, the questions of when and whether to collect and disaggregate data remain challenging (Darcy n.d.; ECHO 2019). This study makes a unique and timely contribution to the literature by providing evidence-based support to humanitarian practitioners in assessing whether the salient conditions are in place for disaggregation by disability to ‘work’, i.e., produce meaningful and reliable data that drive better and more equal outcomes. Where the decision-making process suggests that these conditions are not in place, remedial steps can be taken prior to the collection and analysis of data.

Correct use of the WG modules supports the collection of data which are ‘comparable throughout the world’, noting that the standard application of the WG-SS (see Table 1) positively identifies an individual if they respond ‘a lot of difficulty’ or ‘cannot do at all’ to at least one question. However, humanitarian emergencies present challenges to standard data collection practices, and unendorsed and untested adaptations of the WG data collection tools are common (Fuhr et al, 2020). Ad hoc adaptations often contradict the core tenets of the methodology and introduce bias in the resultant data. This poses potential risks, as resulting statistics may be inconsistent, thus affecting the credibility of the data or the appropriateness of data-based decisions. This is especially worrisome in humanitarian settings, where the socio-political context is likely sensitive and resources are limited (Carden et al, 2021). If the WG-SS are implemented without the right conditions in place, limited positive impact is likely.
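
The standard cut-off described above can be expressed as a simple rule. The following is a minimal sketch for illustration only, not an official Washington Group tool: the function name, the domain keys, and the shortened domain labels are assumptions introduced here, while the four response categories and the 'at least one question' rule follow the description in the text.

```python
# Minimal sketch of the standard WG-SS cut-off. All names here are
# hypothetical; this is not an official Washington Group tool.

# The six WG-SS functional domains (labels shortened for illustration).
WGSS_DOMAINS = (
    "seeing", "hearing", "walking", "remembering", "self_care", "communicating",
)

# The four standard response categories, in order of severity.
RESPONSES = (
    "no difficulty", "some difficulty", "a lot of difficulty", "cannot do at all",
)

# Responses that meet the standard cut-off.
CUTOFF = frozenset({"a lot of difficulty", "cannot do at all"})


def meets_wgss_cutoff(answers):
    """Return True if at least one domain is answered at or above the cut-off.

    Raises ValueError for missing domains or non-standard responses, since
    ad hoc changes (such as merged categories) make the rule inapplicable.
    """
    for domain in WGSS_DOMAINS:
        if domain not in answers:
            raise ValueError(f"missing domain: {domain}")
        if answers[domain] not in RESPONSES:
            raise ValueError(
                f"non-standard response for {domain}: {answers[domain]!r}"
            )
    return any(answers[domain] in CUTOFF for domain in WGSS_DOMAINS)


# Illustrative respondent: no difficulty except in walking.
respondent = {domain: "no difficulty" for domain in WGSS_DOMAINS}
respondent["walking"] = "a lot of difficulty"
print(meets_wgss_cutoff(respondent))
```

The sketch rejects any response outside the four standard categories. This reflects the point made above: once response options are merged or reworded, the standard rule can no longer be applied and comparability is lost.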

If data are not collected appropriately, are not analyzed, or are not available at the right time or to the right people, the data are unlikely to change the realities experienced by persons with disabilities in emergencies. Quantitative approaches such as disaggregation can focus too much attention on the generation of statistics, with the risk that producing a number becomes viewed as a disability inclusion effort in and of itself, with little tangible action beyond the spreadsheets. This is exacerbated by the siloed nature of humanitarian work, where data users and data generators may comprise different specialties, lacking an integrated approach (WFP 2020).

The WFP is the world’s largest humanitarian organization, mandated with Sustainable Development Goal 2, ‘Zero Hunger’, and delivers food assistance in dozens of countries globally. In 2020, WFP introduced its first disability inclusion road map, aimed at implementing the organization’s obligations regarding disability inclusion (Gear et al, 2018). This was preceded, in 2018, by an organization-wide requirement for data disaggregation by disability, using the WG-SS as initially one, and over time the only, accepted methodology for such disaggregation. This created a rich interview pool with a wealth of implementation experience across highly diverse contexts. This corporate data requirement notwithstanding, across more than 80 operational country offices globally, uptake of disaggregation by disability was ad hoc, with a wide variety of non-standardized, local adaptations often used (Fuhr et al, 2020). This made interviews and observations, rather than interrogation of the secondary disaggregated data, more appropriate to explore the underlying causes of different responses to the WG-SS.

This paper fills a knowledge gap by presenting an evaluation of 4 years of experience using the WG-SS within the world’s largest humanitarian organization, the United Nations World Food Programme (WFP), to understand under what conditions disaggregation of data by disability can be expected to produce inclusive action.


In this section, we detail the use of realist evaluation methodology (Gilmore et al, 2019) to develop a theoretical proposition on the necessary conditions for successful disaggregation of data by disability.

Realist evaluation (RE) has been used in evaluations of aid and development work for at least a decade (Greenhalgh et al, 2017). RE seeks to move beyond a confirmation of whether an intervention works, to understanding how or why it works, for whom, in which contexts, and over what durations (Greenhalgh et al, 2017). Context is a central tenet in the realist approach, as it serves to enable or constrain the mechanisms by which an intervention has an effect (Greenwood et al, 2017). RE does not seek to control causative and confounding variables, but to account for such factors and better harness or mitigate their effects. Therefore, despite RE’s theoretical orientation, it is well suited to producing practical knowledge that can inform humanitarian praxis.

Our initial ‘theory gleaning’ was extensive and made possible through the research team’s embeddedness within WFP at headquarters and country office levels. This embedded position was the result of a global level research partnership between WFP and the Trinity College Dublin (TCD) research team, whereby the TCD team sat closely alongside the HQ DI team and had potential access to all global operations of WFP. The resulting theoretical propositions were refined and consolidated against primary data using key informant interviews (KIIs) with key staff. Finally, the abstracted findings were re-framed as five distinct decision-making criteria that could be used by humanitarian actors to make a quick assessment of the suitability of conditions for implementing WG-SS and, if gaps are identified, to identify actions to take before initiating the data process.

Ethical approval for this research was secured from the relevant review board at the School of Linguistic, Speech and Communication Sciences at Trinity College Dublin, Ireland.

Theory development and refinement

Building a theoretical proposition began with making explicit our 'underlying assumptions' (Guha-Sapir 2020) regarding why and how WG-SS did or would work within WFP operations and what was driving the observed variation. This phase relied on opportunities to listen in on discussions, view relevant email chains, and review data tools and processes, which were a direct result of the embedded structure of the research partnership. Through observing common patterns of actions and discussion among WFP country offices which engaged with the WG-SS, we identified candidate theories which were consolidated into one initial theoretical proposition. These observations were augmented and framed by a review of both the gray and academic literature regarding disability inclusion and disability data within humanitarian action (Holden n.d.; Jagosh et al, 2015; Loeb et al, 2017), and change processes within complex adaptive systems (Loeb et al, 2018; Lough n.d.; Mactaggart et al, 2021; Manzano 2016).

Initial theories were then interrogated against primary data from KIIs with staff in roles likely to have key insight into the process. Given the capacity of interviews to explore the unobservable reasoning underlying the observed outcomes, KIIs focused on why the WG-SS were being used, refused, or adapted across WFP’s operations. Participants were identified according to their expected ability to help clarify the program theory (McVeigh et al, 2021) through 'assisted sense-making' (Mojtahedi et al, 2023) based on factors including position within WFP and explicit experience with disability-related data. Some interviewees spontaneously suggested colleagues with relevant experience, who were then contacted through a WFP gatekeeper, in line with the research protocol. In total, 50 interviews were completed and coded as part of a multi-year research project investigating disability inclusion within WFP. A subset of these referred specifically to data processes and are presented in the results section, but all interviews informed initial theorising. All interviews were recorded and transcribed verbatim, and coding was carried out using NVivo V1.5.

Interviews were coded iteratively, meaning that interviews were analyzed and coded as they were completed, refining the theory in an ongoing process. The initial theoretical propositions regarding how data for disaggregation ‘works’ were used as the analytical unit (Mukumbang et al, 2020; O’Reilly et al, 2021). Retroductive reasoning was used to verify hypothesized, generative causes for observed outcomes, shifting between deductive testing of initial theories against the evidence (KII transcripts), and inductively drawing on the partial findings to refine the theory and subsequent interview guide (Pawson 2003). The use of memos enabled a transparent record of how the theories were refined or augmented through this process (Mukumbang et al, 2020).

Part of this iterative process is shown in Table 2, where a refinement of the initial theory and a rival theory are presented. The interviews followed the three stages of 'theory gleaning, theory refining, and theory consolidation' (Mojtahedi et al, 2023), continuing until little or no new nuance was obtained through additional KIIs. Explicitly realist interviewing techniques sought to bring the interviewee to engage in theorising, adding trustworthiness to the process and results (McVeigh et al, 2021). In this phase, the theory was presented to interview participants in the form of 'if…, then…' statements (see Table 2), i.e., 'If ABC context and resource are available, then XYZ response and outcome will be observed'. This echoes the structure of program theory statements commonly used in humanitarian practice (Pawson 1997; Pearce 2017), with the expectation that a familiar presentation would spur greater comfort in shared theorizing.

Table 2 Phases of theory development

Questions were posed to invite participants’ own reflections in response to an element of the theory, e.g., 'Having a shared understanding of why you want to collect these data seems important. Does that align with your own experience? …In your experience, what is it about that understanding that is so important?'. This approach also has benefits in action-oriented research, where a usable theory is the aim, and so shared ownership of the findings can support up-take and buy-in.

Theory consolidation

As refinement progressed, some participants were consulted and asked to again critique the theory statement and consider whether the refined version better reflected their understanding and experience of how the WG-SS ‘work’. By the consolidation phase, changes were mainly semantic rather than substantive. This increased our confidence that the theoretical output accounted for most of the salient features of the intervention. Once additional interviews were resulting in limited new refinements of the theory, and no refutation of any elements, we considered the theory ready for final consolidation.

Theory repackaging

The theory output is intended to directly inform humanitarian decision making; presenting the theory in a usable format was therefore of paramount importance. One shortcoming of any theoretical proposition (e.g., Table 2, row 4) is that it may feel unwieldy, academic, and ill-suited to informing practical action in an emergency context. To address this, we reformulated the key components of the theory as five criteria. We sought to develop a nuanced output with explanatory capacity across the heterogeneity of humanitarian contexts and sufficient detail to aid a genuine decision. Each of the criteria was accompanied by a question to help decision makers assess whether the criterion was (sufficiently) present in their context. These criteria were then shared with a subsample of the original interviewees for feedback, and some tweaks were made to the terminology used (Table 3).

Table 3 The evolution of a theory


In humanitarian practice, it has been said that '[f]or research evidence to be more operationally relevant it must respond to operational demand' (Pearce et al, 2016). The five criteria present the findings as a transferable knowledge product, designed to encourage uptake and utility for decision-makers and data stewards. The criteria aim to balance robust theory with a field-ready framework for localized decision making.

In line with calls for clear methodological reporting to support the transparency of RE findings (Mukumbang et al, 2020; Refugees 2007), we provide here detail as to how each criterion is supported by primary data and then apply the five criteria to a real scenario.

Following consolidation, the final theoretical output, as shown in the final row of Table 2, was articulated as follows:

When the purpose of disability data is unambiguous and perceived as relevant, and when structures and resources can facilitate actioning data, and if data disaggregation is applied by key staff according to this rationale, with system capacity for learning with experience, then implementation of data disaggregation can sustainably proliferate, and over time will contribute to more systematically inclusive action.

In contrast with this unwieldy sentence are the clear criteria and questions in Table 4, which allow humanitarians considering the use of the WG-SS to make a rapid assessment. A higher proportion of positive responses indicates an increased likelihood that the WG-SS will work, while a negative response would indicate caution in proceeding and highlight an area where preparatory action could be taken. No criterion is a linear prerequisite for the next; however, a positive response to each criterion contributes to an increasingly facilitatory environment for data disaggregation (Figure 1). There is no set threshold for a positive or negative response, reflecting the gray areas or fluctuating circumstances of humanitarian practice. Therefore, although the questions are framed as yes/no, in practice we found that a strict binary response is not always possible, and the user must apply judgement.
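
The logic of the rapid assessment can be sketched as a short checklist helper. This is an illustration under stated assumptions rather than a tool used in the study: the function name and criterion keys are hypothetical labels, the wording of the Table 4 questions is not reproduced, and, in keeping with the text, no pass or fail threshold is applied.

```python
# Hypothetical checklist helper; the criterion keys are illustrative labels
# and the wording of the Table 4 questions is not reproduced here.
CRITERIA = ("purpose", "buy_in", "feasibility", "quality", "analysis_and_action")


def rapid_assessment(responses):
    """Summarize yes/no/unclear answers to the five criteria.

    `responses` maps a criterion to True (yes), False (no) or None (unclear).
    No pass/fail threshold is applied; the user must apply judgement.
    """
    unknown = set(responses) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    # Only an explicit "yes" counts as positive; "no", "unclear" and
    # unanswered criteria are all flagged for preparatory action.
    positive = [c for c in CRITERIA if responses.get(c) is True]
    needs_attention = [c for c in CRITERIA if responses.get(c) is not True]
    return {
        "positive_share": len(positive) / len(CRITERIA),
        "needs_attention": needs_attention,
    }


example = rapid_assessment({
    "purpose": True,
    "buy_in": False,
    "feasibility": True,
    "quality": None,
    "analysis_and_action": True,
})
print(example)
```

Allowing an 'unclear' answer alongside yes and no is a design choice made for this sketch, reflecting the observation that a strict binary response is not always possible. The list of criteria needing attention matters more than the share itself, since it points to where preparatory action could be taken.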

Table 4 Will the WG-SS work for me?
Fig. 1 Five decision-making criteria

Purpose

It is necessary to identify and articulate a clear purpose for data disaggregation by disability that is understood by all involved. This was initially considered so integral that it was not an explicit element of the first theoretical proposition. However, interviews and observation revealed repeated instances where data were disaggregated by disability without any clear purpose, with only a general sense of requirement or expectation, or with a purpose that was misaligned with the use of WG-SS and disaggregation by disability. Our data suggested that ‘purpose’ is a key explainer of why disaggregating data by disability is often not as straightforward in practice as anticipated.

An unanticipated outcome of donor enthusiasm for increased data disaggregation was data that were collected primarily to appease donors, rather than from a ground up desire to generate information viewed as relevant or usable. This was evident in the explanation of one WFP specialist as to how disability data collection had come about: ‘the first time I saw it, the donor specifically asked WFP to ask whether or not there was a person with disability in the household that WFP was assisting.’ This was echoed by a field officer who also saw the purpose of these data as related to the desires of the donor, rather than WFP: ‘I'm not even sure what we do with this data. I mean, except maybe telling those donors [who are] really asking us, out of the 10,000 household[s] we assisted, 10% of these households [included] someone with disability.’

As we conceptualized data as ‘working’ when it can and does support or spur action that tangibly contributes to equal outcomes for the lives of persons with disabilities, data for demonstration purposes was not considered to work. The following quotes are illustrative of the common experience of staff grappling to understand what purpose disability disaggregated data served in relation to their work. Data for disaggregation was often presented as a way of getting started on disability inclusion (DI), but WFP personnel had doubts regarding the capacity of data to fulfil this purpose.

Monitoring specialist: Why do we want to collect this information? Like does this indicator [no. beneficiaries disaggregated by disability] actually give us an idea of how well we're doing?

Humanitarian advisor: What is this really going to show us? How is this fitting in with our mandate?

Elsewhere in the organization, responses showed the power of purpose, with interviewees who had a clear idea of why the data would be important to them.

HQ Advisor: It all comes down […] what you're doing, right? That's one of the things that we're trying to get across. It [data]’s not disaggregated by disability to say ‘this person is disabled’. It's disaggregating by disability to understand what food security looks like for people with disabilities versus those without… it's not status for the sake of it.

Field officer: It's evidence. If you don't have reliable data, you cannot advocate for them… There is no decision we can make without data.

Regional protection advisor: I think if we really are able to capture disability systematically in those early [assessments]… then those assessments will be our evidence and our tool, also our weapon, to actually do more disability inclusion.

Among those who clearly articulated a purpose for collecting data to disaggregate by disability, the purpose varied according to the operational context and activity being implemented, ranging from use as a targeting consideration to serving as a proxy for potential access barriers. The mixed understanding of the purpose of disability disaggregated data underpins the importance of the second criterion, ‘buy-in’, whereby a clear and appropriate purpose for the data supports generation of the necessary buy-in for its uptake.
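
The purpose described by the HQ advisor above, understanding what an outcome looks like for people with disabilities versus those without, amounts to a grouped comparison of an outcome indicator. The sketch below illustrates this with made-up records; the field names (`meets_wgss_cutoff`, `food_secure`) and the helper function are hypothetical and do not represent WFP data structures or indicators.

```python
from collections import defaultdict


def share_by_group(records, outcome_key, group_key="meets_wgss_cutoff"):
    """Return the share of records with a truthy outcome, per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Made-up individual-level records for illustration only; not WFP data.
records = [
    {"meets_wgss_cutoff": True, "food_secure": False},
    {"meets_wgss_cutoff": True, "food_secure": True},
    {"meets_wgss_cutoff": False, "food_secure": True},
    {"meets_wgss_cutoff": False, "food_secure": True},
    {"meets_wgss_cutoff": False, "food_secure": False},
    {"meets_wgss_cutoff": False, "food_secure": True},
]

shares = share_by_group(records, "food_secure")
print(shares[True], shares[False])
```

The comparison only has meaning when both the outcome and the disability identifier are recorded for the same unit. A count of households that include a person with a disability, as in the donor-driven example quoted earlier, cannot be used this way.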

Buy-in

Buy-in refers to the agreement of key staff across the data cycle to utilize the WG-SS, rather than an adaptation or alternative tool. The impact of non-standardized data collection tools and comparability of data is a known issue in humanitarian practice (Robinson et al, 2021). We hypothesized that adaptations to the WG-SS were driven by a desire for maximal efficiency in emergency settings and a lack of understanding of why the WG-SS are formulated as they are. While the practical challenges to data collection in emergency settings are undeniable, ad hoc changes negate comparability (Rohwerder n.d.), introduce biases, and negatively impact the validity and reliability of the data.

This was confirmed by interviews and nuanced by the fact that in many instances, humanitarian practitioners did not consider their adaptations to be major changes, still reporting the use of the 'WG-SS' for everything from minor changes, such as combining response categories, to major changes, such as a single question asking about someone’s ability to use the toilet independently at night.

Program officer: We do not ask directly, ‘do you have difficulty in seeing?’ … So, [we] ask like, ‘is there anybody in the household who has some difficulty’ and give some examples.

KIIs further nuanced our theoretical propositions regarding buy-in, revealing that a lack of buy-in sometimes had little to do with the specific tool being used, and more to do with a general sense of overwhelm:

Protection specialist: Now I worry about gender, now I worry about disability. And then we’re going to have to include race, ethnicity. And, you know, it scares people.

Rendering the disaggregation of data by disability mandatory, as done by WFP, also did not nullify the need for active buy-in, as noted by a regional data specialist: ‘even just with [data disaggregation by disability] being mandatory, it's interesting to me to see, you know, the different levels of reporting coming through [from country offices]’.

Respondents emphasized that buy-in was not an all-or-nothing feature, as it could be present for some key staff but not others or might fluctuate over time. However, buy-in could be nurtured:

DI specialist: After you scratched the surface hard enough, you uncovered a lot more passion on both sides for the approach to disability data… So [initially] there wasn't a lot of buy-in, but with a little bit more pushing, a little more time, I think people saw the potential.

Doing so at senior levels was key to unlocking a cascade of buy-in:

Humanitarian advisor: [can we] systematically disaggregate data by disability… I think is a bit early to say that because it really depends on the level of buy-in and engagement of senior management.

Our initial hypothetical ‘hunch’ (Greenhalgh et al, 2017) that buy-in was key was confirmed by the primary data, but this proposition needed to be refined as interviews revealed buy-in to be one of the most slippery and intangible criteria for success. A negative response to the buy-in criterion suggested time and resources may be most effectively spent ‘getting to yes’ before progressing to data collection. These findings can be used to better adapt training on the WG-SS and disability data to address local concerns and encourage buy-in.

Feasibility

Even in the presence of clear understanding of the purpose of disability disaggregated data and widespread buy-in for use of the WG-SS, there may be concerns regarding feasibility, as relayed by one data specialist who questioned, ‘But the issue is… you will need resources and it’s going to be tricky. Can we do it logistically? Can we really implement that?’.

Not all humanitarian data collection and analysis activities are created equal, and issues such as modality, e.g., remote or in-person, and timing of data collection, e.g., assessment, registration, or monitoring, influenced feasibility. It is important to reflect upon whether it is feasible to both collect the WG-SS and achieve the primary aim of the interaction. The WG questions were originally designed for application in censuses with total population coverage, but humanitarian data collection activities must often use much smaller samples, which presents challenges to generating representative insight.
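
The sampling challenge can be illustrated with a rough calculation. The sketch below is a simplified example under stated assumptions: a hypothetical survey of 300 individuals drawn by simple random sampling, a disability prevalence of 16% borrowed from the global estimate cited earlier, and the normal-approximation margin of error for a proportion at 95% confidence. Real humanitarian samples are rarely simple random samples, so actual uncertainty is likely to differ.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a proportion.

    Assumes simple random sampling; p=0.5 gives the widest margin and
    z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical figures for illustration only.
sample_size = 300   # individuals surveyed
prevalence = 0.16   # assumed share identified as having a disability

subgroup_size = round(sample_size * prevalence)

print("respondents with disabilities:", subgroup_size)
print("margin of error, full sample:", round(margin_of_error(sample_size), 3))
print("margin of error, subgroup:", round(margin_of_error(subgroup_size), 3))
```

Under these assumptions, fewer than 50 respondents fall in the subgroup, and the margin of error for any proportion estimated within it is roughly two and a half times that of the full sample. An apparent gap between groups may therefore reflect sampling noise rather than a real difference.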

Gender specialist: It’s really difficult to sample… this is something that will take a long time for our colleagues at the office to really master, because it's super, super difficult.

While the time taken to administer the questions is a common concern, a data specialist working in a country office with experience of large-scale data collection disagreed that this is a reason to avoid using the WG-SS:

We can definitely push on this [use of the WG-SS] and it’s not a big deal. It’s not asking a lot. We are asking them to just do this type of internationally accepted questions... I have actually seen household data questionnaires like 20, 30 pages long. But when it comes to the disability section, it's not the same. So, I think it's a good thing to show so that it's not a big ask.

Early in our theorizing around feasibility, we hypothesized that contextual factors such as the acuity or scale of the emergency, or the availability of DI-specific funding, could render disaggregation of data by disability more or less feasible. In some instances, this did play a role, yet we did not observe consistent evidence to support these as primary driving factors. What is feasible in one setting may be challenging or impossible in another, and this was more consistently moderated by the approach of key staff, again tying back to clarity of purpose and buy-in among key decision makers.

There are countless offices where more can be achieved and country offices where maybe it cannot be done.

Our findings suggest that a negative response to the feasibility criterion is likely to be nuanced. Effectively addressing feasibility requires a good understanding of the core issue(s) at play, including the challenges presented. Finding solutions often requires input from team members other than disability specialists, and time to find those solutions within the existing context. Input from those who understand local data systems and processes, funding mechanisms, and program design cycles can distinguish modifiable barriers from hard limits and enable efforts to be focused where success is most likely. Where a feasible opportunity is identified and agreed to, quality needs to be considered in advance of data collection and analysis.

Quality

Data have the potential to tell compelling stories that can drive more inclusive humanitarian action. Data quality and reliability are key, as decisions such as resource allocations or targeting of assistance may be based on the results of disaggregated analysis. Quality data build confidence that evidence-based decisions result from a robust process. As the fourth criterion, quality ensures a reflection process to monitor whether the WG-SS are being integrated, applied, and analyzed in a way that enables a trustworthy outcome.

In our initial hypothesizing, quality concerns were an element of buy-in, but KIIs further emphasized their importance, elevating quality to a stand-alone criterion. This is reasonable given the complexity and gravity of the life-saving decisions made based on these data, and the potential for disability disaggregated data to be challenged or rejected in politicized humanitarian contexts such as active conflict zones including Ukraine or Syria, where disability may be a new or contentious consideration.

Protection specialist: In some ways, we hear from people we don't want to have the disaggregated data because we don't want to be seen to be delivering [assistance] for that reason.

Emergency humanitarian response occurs at a frenetic pace that challenges capacity for careful survey design, enumerator training and analysis. In contrast to other demographic survey questions, such as age, which can be used with little to no training, the WG-SS require some familiarity to be used correctly. Where users are unfamiliar with the logic underpinning the WG-SS formulation, it was more likely that the need for training and adherence to the WG-SS methodology would be overlooked, leading to data quality issues.

Field Officer: So, in that [emergency registration] scenario there became a crowd of people, like 50, 70 people. And everyone tries to just complete the formality of the questionnaire instead of the true sense of the questionnaire. So, it harms, it destroys the purpose of the [WG-SS] tool... So, in my view, you can say time is a crucial factor for the implementation.

Conversely, those who had time for training recounted the positive impact on their own understanding of the WG approach and improved workflow when collecting data.

Program officer: [The training] was a very good eye opener for all of us to see how these [WG-SS] get very delicately asked. From the feedback we've got from the enumerators, it is very useful for them as well, when they're conducting the interview that they have failed previously… You are seeing that [sensitivity] in the W.G. questions. They have given us the feedback saying that they got very good responses from the field, from the source.

The evidence underpinning the quality criterion suggests that time is required for trial and error when using the WG-SS in humanitarian action. It is worth noting that even with large scale data collection activities, opportunities for piloting were limited in these contexts. Monitoring and reflection on data collection activities can indicate what is going well and what parts of the process (not the questions themselves) may need to be adjusted to ensure quality evidence generation.

Analysis and action

Identifying a clear purpose for data, generating buy-in, and ensuring the feasibility of the process and quality of the data output are all significant tasks, yet unless the resultant data are analyzed and used, these efforts are in vain. Despite the significant input required simply to produce raw data, the humanitarian sphere comprises 'too much data collected that go unused' (Sandvik 2017). We hypothesized the same risk for data disaggregated by disability within WFP. Observation and data review confirmed this, with KIIs illuminating several mechanisms that generate this risk. In the fifth and final criterion, we observe the cumulative effect of the presence or lack of prior criteria, as many of the constituent mechanisms of ‘analysis and action’ recall earlier criteria.

Having data available did not automatically translate into inclusive action, as noted in contexts where the purpose for these data had not first been clearly considered or articulated, to the dismay of those in support of disability inclusion.

Advisor: There's so much data collection happening. But what's happening with that data? …we don't see the data turning into improved programming.

Data Specialist: We are not even able to analyze everything. Because of capacity, it's not straightforward even to collect everything, to analyze everything, and to have the resources to make sure that we analyze it correctly and that we don't miss [anything].

The impact of the analysis and action criterion is powerful, as a positive outcome implies inclusive action, the ultimate aim of any humanitarian data collection activity. However, the inverse is not neutral: a lack of analysis and action had a negative impact on buy-in, which is likely to affect future data efforts.

Data specialist: Most of the time, this data will be not used. And is it worth the effort? That's the question.

Action may even be an outcome that carries some trepidation, leading to avoidance as hinted at by one Global Advisor: 'if we identify a need, we're going to have to do something about it.'

When such concerns are not adequately addressed earlier in the data process, translating data into action will be much more difficult. Despite these challenges to analysis and use, there was a broad consensus from participants that the purpose of collecting data and generating understanding is ultimately to enable better action.

Field officer: [data are an] important first step but can’t be the last step. [We] have to achieve action.

Program officer: It's critical to start there [with data], so it’s extremely important… And then [there are] other steps we need to follow and make sure that there is also a transition from evidence to operations.

Existing advice to ensure analysts understand how to handle the data generated by the WG-SS remains relevant. However, our research suggests that having an action plan in place before data collection begins, developed with the input and collaboration of all relevant colleagues, including specialists in inclusion, data, and program design, is key to ensuring that data will not exist only on the page but can support programmatic inclusion and more equal humanitarian outcomes.
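As an illustration of what handling WG-SS data can involve at the analysis stage, the following minimal Python sketch applies the cutoff commonly recommended by the Washington Group for disaggregation: a respondent is counted as having a disability if at least one of the six domains is answered 'a lot of difficulty' or 'cannot do at all'. The field names, the 1 to 4 numeric coding, the handling of missing answers, and the `received_assistance` style outcome are assumptions made for this sketch and are not drawn from WFP systems or from the study data.

```python
from collections import defaultdict

# The six WG-SS functional domains (field names are hypothetical).
DOMAINS = ("seeing", "hearing", "walking", "remembering", "self_care", "communicating")

# WG-SS response options; the 1-4 numeric coding is an assumption of this sketch.
NO_DIFFICULTY, SOME_DIFFICULTY, A_LOT_OF_DIFFICULTY, CANNOT_DO = 1, 2, 3, 4


def has_disability(responses):
    """Apply the 'a lot of difficulty or cannot do at all in any domain' cutoff.

    Returns True if any answered domain reaches the cutoff, None if no answered
    domain reaches it but at least one domain is missing, and False otherwise.
    """
    values = [responses.get(domain) for domain in DOMAINS]
    if any(v is not None and v >= A_LOT_OF_DIFFICULTY for v in values):
        return True
    if any(v is None for v in values):
        return None  # incomplete record: do not silently count as 'no disability'
    return False


def disaggregate(records, outcome_key):
    """Share of respondents with a truthy outcome, split by disability status."""
    labels = {True: "with_disability", False: "without_disability", None: "unknown"}
    counts = defaultdict(lambda: [0, 0])  # label -> [outcome count, total]
    for record in records:
        label = labels[has_disability(record["wgss"])]
        counts[label][0] += 1 if record[outcome_key] else 0
        counts[label][1] += 1
    return {label: hits / total for label, (hits, total) in counts.items()}
```

Keeping incomplete records in a separate 'unknown' group is a design choice of this sketch; it makes data quality problems visible in the output rather than hiding them inside the comparison group.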

The criteria in action

In line with the realist assertion that there is ‘nothing so practical as a good theory’ (Greenhalgh et al. 2017), the embedded nature of this research offered a natural proving ground for the final theory, expressed as five consolidated criteria. As anticipated by our rival theory (Table 2), a WFP country office in an acute conflict setting reported fears that the WG-SS would be overly burdensome to administer. They proposed a reworking of the WG-SS and requested guidance on their approach.

As adaptations to the WG-SS are not advised and would not produce valid or comparable data, the researchers instead met remotely with the responsible staff member to apply the five criteria and observe real-time decision making. Table 5 shows a brief record of the response to each criterion and the final decision.

Table 5 Each criterion with answer and brief justification, followed by final decision

As evidenced by Table 5, binary yes/no answers may be impossible in the dynamic, low-information environment that characterizes early emergency responses. This highlights that the criteria need not be used only once but can be employed whenever new data opportunities are identified or planned. In this example, the decision not to utilize the WG-SS at that point in time would ideally be factored into an action plan to find a more feasible data collection opportunity in the future. The criteria proved quick and simple to apply in practice, but one challenge was getting all relevant stakeholders to contribute to the review of the criteria and the final decision.


In this study, we used realist methodology to develop a theoretical proposition describing the circumstances under which disability data disaggregation can be expected to work, i.e., support more inclusive humanitarian action. By interrogating the implicit assumption that disaggregating data by disability is always possible or desirable and will always contribute to more inclusive actions, our results nuance the current push to disaggregate. Disaggregation has been touted as a way of ‘making visible the invisible’ (Cilliers 2001), yet data disaggregated by disability must be collected at the right time and place to have a positive impact on the lives of persons with disabilities in emergencies.

Describing the necessary conditions for successful disaggregation of data by disability is a valuable advancement in the understanding of when and how the WG-SS can be effectively applied in humanitarian action. Figure 1 illustrates that while the criteria build upon one another, the U-shape is not a closed loop: disaggregation by disability should not continue automatically but is only recommended in contexts where a clear purpose and the subsequent criteria have been identified. However, evaluating whether the appropriate conditions are in place remains challenging for humanitarians, many of whom have limited specialist experience with disability-related data.

To address this challenge and attempt to bridge the research-to-practice gap, we transformed our theoretical proposition into five practical criteria. These criteria facilitate a decision-making process to support the generation of data as a means of achieving better outcomes, rather than as an end in itself. The criteria have a cumulative effect in building facilitatory conditions for data disaggregation by disability to work. These criteria make an important, two-fold contribution to the literature by providing (i) empirical evidence addressing the research-to-practice gap of deciding how and whether humanitarian actors should apply the WG-SS in emergency settings and (ii) an example of how RE findings can be practically presented for non-academic users.

The five criteria and linked questions offer the flexibility to be applied across multiple contexts, supporting a common logic between actors and across humanitarian responses. Given the idiosyncrasies of humanitarian action and contexts, applying the criteria often resulted in a mix of yes and no answers. Yet, the purpose of the criteria is not to ensure that disability disaggregation can always be utilized. Rather, the criteria should help to identify and maximize opportunities where disaggregation can confidently proceed, to identify issues that should be addressed prior to data collection, and to identify scenarios where disaggregation of data by disability is an unsuitable approach.

The use of the criteria can lead to a decision that application of the WG-SS is not advisable at that point in time, but due to the rapidly changing nature of emergency humanitarian work, this may change, and vice versa. Briefly documenting the decisions and rationale for each criterion, as demonstrated in Table 5, can create a useful record. Doing so can highlight when a noted condition changes, potentially making data disaggregation a good choice, and it encourages transparency and accountability in decision making around inclusion. This is especially true in emergency scenarios where surge staff may quickly rotate in and out, and in collaborative data collection efforts between humanitarian agencies.
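One way such a record might be kept is sketched below in Python. This is an illustrative structure only, not a tool used in the study: the class and function names are hypothetical, the criterion labels are shorthand for the five criteria discussed in this paper, and the 'uncertain' answer is an assumption added to reflect that binary yes/no answers are not always possible. The decision itself is entered by the team as free text and is deliberately not computed from the answers.

```python
from dataclasses import dataclass, field
from datetime import date

# Shorthand labels for the five criteria discussed in this paper.
CRITERIA = ("purpose", "buy-in", "feasibility", "quality", "analysis and action")
# 'uncertain' is an assumed third option for low-information settings.
ANSWERS = ("yes", "no", "uncertain")


@dataclass
class CriteriaReview:
    """A dated record of one team review of the five criteria (hypothetical)."""
    context: str
    reviewed_on: date
    decision: str  # free-text decision reached by the team, not computed here
    answers: dict = field(default_factory=dict)  # criterion -> (answer, rationale)

    def record(self, criterion, answer, rationale):
        """Store the answer and a brief rationale for one criterion."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion!r}")
        if answer not in ANSWERS:
            raise ValueError(f"answer must be one of {ANSWERS}")
        self.answers[criterion] = (answer, rationale)

    def unresolved(self):
        """Criteria not yet answered 'yes', including those not yet reviewed."""
        return [c for c in CRITERIA
                if self.answers.get(c, ("uncertain", ""))[0] != "yes"]


def changed_criteria(earlier, later):
    """Criteria whose answer differs between two reviews of the same context."""
    return [c for c in CRITERIA
            if earlier.answers.get(c, (None,))[0] != later.answers.get(c, (None,))[0]]
```

Comparing a later review against an earlier one with `changed_criteria` surfaces the conditions that have shifted, which may be useful when surge staff hand over or when agencies share a data collection exercise.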

While guidance on how to utilize the WG-SS has proliferated within the humanitarian sphere, their successful application often relies on the capacity and availability of someone with the right expertise to consume and contextualize the information. The findings discussed in this paper may be applicable to any WG module, or other method used to collect information on disability, but our evidence is drawn only from use of the WG-SS.

We expect that colleagues applying these criteria together will be better able to identify precisely where support, change, or resources are necessary, rapidly revealing next steps. Where disaggregation is not the right fit, alternative methods of understanding the needs of persons with disabilities could include qualitative methods, dedicated surveys, or the use of secondary data. As always, partnership with local OPDs is invaluable to ensuring that the right questions are asked of the right people (AusAID 2012; Wolf-Branigin 2013).

In the right conditions, disaggregation of specific data by disability can provide important information to support and encourage inclusive action. However, given the disproportionate impact of emergencies on persons with disabilities (Wong et al. 2017; Young 2022) and the well-documented exclusion of this group from humanitarian responses (Jagosh et al. 2015), a lack of additional data is not a valid reason for delaying action. Preemptively taking reasonable steps known to support accessibility of humanitarian aid will have a benefit, whether or not it is quantitatively recorded. Contextualized assessment and comparison of the situation of persons with disabilities is useful in fine-tuning programming and justifying additional investment, but available evidence tells us that humanitarian action can and should be designed, targeted, and monitored through a basic inclusion lens.


This study was not without limitations. While we made every effort to include a broad sample of interviewees with key insight, the diversity of humanitarian settings and responses means there may be contexts where these findings are less applicable. Similarly, organizations other than WFP may find that their own data processes have idiosyncrasies that affect the applicability of our findings. The embeddedness of the academic research team within WFP provided valuable access and insight, yet as the interviewer is the tool of analysis within qualitative inquiry, this is also likely to have impacted the analysis generated.


The Washington Group Short Set of Questions provides a comparable way to identify a population at risk for limitations in the ability to participate on an equal basis, in accordance with the Convention on the Rights of Persons with Disabilities. These questions have been extensively tested, including in emergency and development settings, and are the tool of choice for disaggregating data by disability in humanitarian work. The proliferation of the WG-SS throughout humanitarian practice and increasing calls for disaggregated analysis seem based on the assumption that generating, analyzing, and using such data is possible in most, if not all, humanitarian contexts and data activities, and that the availability of such data will usually, if not always, lead to greater inclusion. These assumptions are incorrect. Our theory shows that to support inclusion, data must be collected for a clear purpose, be administered without adaptation by staff who understand and are bought in to the WG approach, using a modality and in a context where the intervention is feasible, with appropriate process monitoring and support, and with a concrete plan for analysis and responsive action. The associated field-ready assessment criteria enable humanitarian actors to assess whether and to what extent these conditions are in place.

Availability of data and materials

The qualitative datasets generated and analyzed during the current study are not publicly available due to the restrictions required by the ethics committee based on sensitivities in the data.



Abbreviations

CRPD: Convention on the Rights of Persons with Disabilities
DI: Disability inclusion
KII: Key informant interview
OPD: Organization of persons with disabilities
RE: Realist evaluation
UN: United Nations
WG: Washington Group
WG-SS: Washington Group Short Set of Questions




The authors are grateful to Jennifer H. Madans and Julie A. Weeks for their influential feedback on drafts of this paper.


This research was funded by the World Food Programme. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Author information


Conceptualization, CO’R and CJ; methodology, CO’R and CJ; validation, CJ; formal analysis, CO’R; investigation, CO’R; data curation, CO’R; writing (original draft preparation), CO’R; writing (review and editing), CJ and CO’R; visualization, CJ; supervision, CJ; project administration, CJ; and funding acquisition, CJ and CO’R. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Caroline Jagoe.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

O’Reilly, C.F., Jagoe, C. Disaggregation of humanitarian data by disability: a realist evaluation of the use of the Washington Group Questions to support more inclusive practices. Int J Humanitarian Action 9, 6 (2024).
