Mathification of Subjectivity
When assessing algorithmic discrimination, it is vital to have a definition of “ground truth”. In
the case of hiring, this notion is inherently subjective: the definition of “good candidate-to-job
fit” can differ from organization to organization, and even among the hiring managers within the
same organization. This makes model audits inconsistent, because these definitions will vary
significantly from one audit vendor to another. In short, it is entirely possible to “game the
system”, allowing vendors to provide audits that reflect a lack of bias where bias truly exists.
The guidance in its current form does allow for one method of avoiding assessment of the human
factor, by permitting analysis of adverse impact by match score alone. However, later in these
comments we will detail just a few scenarios in which this simplified reporting may miss many
forms of bias that remain despite “passing” metrics. Assessing algorithmic discrimination
requires a combination of quantitative and qualitative analysis, in order to contextualize and
fully situate the impact on candidates amid the totality of the system. Candidate positioning,
ranking, and display qualities matter a great deal to a candidate’s likelihood of receiving an
offer. In addition, there are many standardization practices that AEDT vendors can undertake to
limit discrimination, and these can only be uncovered through an assessment of the vendors’ risk
and control practices. By neglecting the qualitative elements of algorithmic impact assessment,
the city paves the way for these reports to be misleading, and ultimately to fail to reflect real-
world discrimination where it exists.
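For context, the quantitative analysis the guidance contemplates typically reduces to a
selection-rate comparison such as the EEOC’s four-fifths rule of thumb. The sketch below, using
entirely hypothetical candidate counts, shows how narrow that calculation is relative to the
qualitative factors discussed above:

```python
# Minimal sketch of an adverse-impact (four-fifths rule) calculation.
# All candidate counts below are hypothetical, for illustration only.

def selection_rate(selected, total):
    """Fraction of candidates in a group who were selected."""
    return selected / total if total else 0.0

def impact_ratio(group_rate, reference_rate):
    """Ratio of a group's selection rate to the reference group's rate."""
    return group_rate / reference_rate if reference_rate else 0.0

# Hypothetical example: 50 of 200 reference-group candidates selected,
# versus 15 of 100 comparison-group candidates.
ref_rate = selection_rate(50, 200)        # 0.25
cmp_rate = selection_rate(15, 100)        # 0.15
ratio = impact_ratio(cmp_rate, ref_rate)  # approximately 0.6

# Under the four-fifths rule of thumb, a ratio below 0.8 suggests
# potential adverse impact. Note, however, that a "passing" ratio
# computed on match scores alone says nothing about ranking, display,
# or downstream human decisions.
```

A report built on this ratio alone would reflect none of the positioning, ranking, or display
effects described above, which is precisely the gap these comments identify.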
Demographic Inference
As we’ve previously stated, employers may possess demographic data for their hired candidates,
but the vendors who provide this technology often make an active effort not to collect this vital
information. As a result, these AEDT vendors often turn to methods like Bayesian Improved
Surname Geocoding (BISG) to infer race and gender characteristics. BISG, the most prevalent of
these methods, was developed in healthcare research and has since been employed at great scale
within the financial sector.

However, beyond concerns around accuracy, these methods pose structural inequities of their
own. Race is a subjective attribute, and one which many have argued can never truly be inferred.
These methods also permit analysis only along a gender binary, obscuring discrimination that
may occur against candidates elsewhere on the gender spectrum. An unintended consequence of
this guidance may therefore be the proliferation of techniques that have already received deep
scrutiny and criticism for their lack of inclusivity and propensity for error. Indeed, these error
rates may in many cases be high enough to obscure discrimination, or the lack thereof: if a set of
candidates is associated with the incorrect protected group, the resulting accuracy may be low
enough to render the report incorrect, and therefore misleading.

Additionally, common inference methods like BISG are only effective in regions where we can
assume that redlining, white flight, and gentrification have homogenized the racial makeup of the
area. This assumption seems broadly inadequate for a city as diverse as New York, where there
may be just as many Black John Smiths in a given zip code as there are white John Smiths.

In our field, the broad consensus is that the only proper way to use demographic data in analysis
is when it is volunteered by the candidates themselves. We recommend to our AEDT vendor
clients that they engage in post-hoc self-identification surveys, despite our expectation that
response rates will be low, because volunteered data yields the greatest accuracy. These surveys
take time, however, and in many cases clients who only began this analysis in the second half of
2022 will not have adequate time to complete the effort prior to the release of their public reports.
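The BISG calculation itself illustrates why geographic homogeneity matters. BISG combines a
surname-based race prior with the racial composition of the candidate’s geography; in a diverse
zip code, the geographic term barely moves the surname prior, so the inference carries little
information. The sketch below uses a common simplification of the Bayesian update and entirely
hypothetical probability tables (stand-ins for the Census surname and geocode data BISG actually
uses):

```python
# Minimal sketch of a BISG-style posterior over race categories.
# This uses a simplified update -- multiplying a surname-based prior by
# the local racial composition and renormalizing -- with hypothetical
# numbers, not real Census surname or geocode tables.

def bisg_posterior(surname_prior, geo_composition):
    """Combine a surname race prior with a geography's racial
    composition, renormalizing to a posterior over race categories."""
    unnorm = {race: surname_prior.get(race, 0.0) * geo_composition.get(race, 0.0)
              for race in surname_prior}
    total = sum(unnorm.values())
    if total == 0.0:
        return dict(surname_prior)  # geography uninformative; fall back
    return {race: p / total for race, p in unnorm.items()}

# Hypothetical prior for a very common surname such as "Smith":
smith_prior = {"white": 0.70, "black": 0.25, "other": 0.05}

# In a highly segregated area, geography dominates the inference...
segregated_area = {"white": 0.95, "black": 0.03, "other": 0.02}
# ...but in a diverse NYC zip code it barely shifts the prior.
diverse_area = {"white": 0.35, "black": 0.35, "other": 0.30}

confident = bisg_posterior(smith_prior, segregated_area)
uncertain = bisg_posterior(smith_prior, diverse_area)
```

In the diverse case the posterior remains close to the surname prior, which is exactly the “Black
John Smith / white John Smith” failure mode described above: the method returns an answer, but
one with little evidentiary value.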