Are algorithms automatically free of bias? What are the effects of group differences and performance ratings on these algorithms? Do these algorithms take into account diversity and life experiences?
2. Some Questions
• What bias might exist in the Hogan Assessments?
• To what extent are the constructs that Hogan measures “white” constructs?
• What data does Hogan have to support a statement that the Hogan assessment data is not ethnically biased?
• How can everyone be on an equal playing field if what shaped my personality is so much different from those who have not faced the same challenges?
• DIF and Bias in the HPI (2006 publication in Assessment)
• Are the assessments biased against neurodiverse candidates?
3. Overview
1. Decision Making and Algorithms
2. Evaluating Algorithms
3. How Hogan Uses Algorithms to Aid Decisions
4. How Hogan Tests for Accuracy and Bias
5. Summary
4. Decision Making & Algorithms
• All people, and organizations, must make decisions
  • You have no choice.
• All decisions are made using algorithms
  • Including personnel decisions, whether assessments are used or not.
• Some algorithms are…
  • 1. More accurate, 2. More transparent, and 3. Less biased.
5. Evaluating Algorithms
• The only way to know an algorithm’s accuracy is to test it
• The only way to know how an algorithm works is to see how all the inputs affect the outputs
• The only way to know an algorithm’s bias is to test it (a minimal sketch of such a test follows)
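To make “test it” concrete, here is a minimal sketch of an accuracy check: fit a simple prediction rule on half of a sample and correlate its held-out predictions with the criterion. The data, effect size, and one-variable linear rule are all invented for illustration; this is not Hogan’s actual validation pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: assessment scores and later performance ratings
# for 200 candidates (both invented for this sketch).
scores = rng.normal(size=200)
performance = 0.3 * scores + rng.normal(size=200)

# Fit the "algorithm" on half the sample; test it on the held-out half.
train, test = slice(0, 100), slice(100, 200)
slope, intercept = np.polyfit(scores[train], performance[train], 1)
predicted = slope * scores[test] + intercept

# The accuracy test: how well do held-out predictions match outcomes?
r = np.corrcoef(predicted, performance[test])[0, 1]
print(f"cross-validated validity r = {r:.2f}")
```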
6. Algorithm Accuracy
• Every single algorithm used at Hogan has been tested and QA’d extensively. We know our assessments predict job performance and reputation.
  • 400+ Criterion-Related Validity Studies
  • Adjective Checklist Research
  • 360 Research
• These algorithms have repeatedly been proven to be more accurate than an interviewer’s judgment and other methods of making personnel decisions
7. Algorithm Transparency
• Every single algorithm used at Hogan is meticulously documented. I can tell you exactly how every input affects every output. (A hypothetical illustration follows.)
• Conversely, I have no idea how a hiring manager processes information gleaned from an interview. I have no idea if the manager even got the same inputs from all candidates.
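As a hypothetical illustration of what “meticulously documented” can mean: when an algorithm is an explicit formula, the effect of every input on the output is known exactly. The scale names and weights below are invented, not Hogan’s actual scoring rules.

```python
# Hypothetical, fully transparent scoring rule: an explicit weighted
# sum of scale scores.  Because every weight is documented, the effect
# of any input on the output is known exactly: raising a scale by one
# point raises the fit score by exactly that scale's weight.
WEIGHTS = {"adjustment": 0.30, "ambition": 0.25, "prudence": 0.20,
           "sociability": 0.15, "learning_approach": 0.10}

def fit_score(scale_scores: dict) -> float:
    """Weighted sum of scale scores (weights are illustrative)."""
    return sum(WEIGHTS[s] * scale_scores[s] for s in WEIGHTS)

print(fit_score({"adjustment": 70, "ambition": 60, "prudence": 55,
                 "sociability": 50, "learning_approach": 65}))  # 61.0
```

No interviewer’s judgment can be written down this way, which is the transparency gap the slide describes.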
8. Algorithm Bias
Two Kinds of Bias
• Group Differences
  • i.e., Do groups get different scores on average?
• Predictive Bias
  • i.e., Do the scores have different implications for different groups?
9. Group Differences
• We repeatedly test for group differences in our items, scales, profiles, and recommendations.
• We consistently find no large or meaningful group differences.
• These results are consistent with a broad literature on personality assessment showing no, or very small, group differences (a sketch of such a check follows).
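A common way to quantify a group difference is a standardized mean difference (Cohen’s d). The sketch below assumes two arrays of scale scores for two demographic groups; all data are invented.

```python
import numpy as np

def cohens_d(group_a: np.ndarray, group_b: np.ndarray) -> float:
    """Standardized mean difference between two groups' scale scores.

    Uses the pooled standard deviation; by common convention,
    |d| < 0.20 is read as a negligible group difference.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * group_a.var(ddof=1) +
                  (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
    return float((group_a.mean() - group_b.mean()) / np.sqrt(pooled_var))

# Illustrative check on one scale for two groups (invented data).
rng = np.random.default_rng(2)
a, b = rng.normal(50, 10, size=500), rng.normal(50, 10, size=500)
print(f"d = {cohens_d(a, b):.2f}")  # near 0: no meaningful difference
```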
10. Predictive Bias
• We have tested for hundreds of possible predictive biases with outcomes including job performance, peer ratings, and 360 ratings
  • For different ethnic, gender, and age groups
• We consistently find no evidence for predictive bias
  • A high/low score on our assessments has the same implications for everyone (the standard regression test is sketched below)
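The standard way to test whether a score “has the same implications for everyone” is a Cleary-style moderated regression: regress the criterion on the score, group membership, and their interaction. A nonsignificant group term (no intercept bias) and a nonsignificant interaction (no slope bias) mean the score predicts the same way in both groups. A sketch, assuming `statsmodels` and invented variable names:

```python
import numpy as np
import statsmodels.api as sm

def cleary_test(score, group, criterion):
    """Cleary-style moderated regression for predictive bias.

    Model: criterion ~ score + group + score:group.
    A significant group coefficient indicates intercept bias; a
    significant interaction indicates slope bias (the score means
    different things for different groups).
    """
    X = sm.add_constant(np.column_stack([score, group, score * group]))
    fit = sm.OLS(criterion, X).fit()
    # p-values are ordered [const, score, group, score:group]
    return fit.pvalues[2], fit.pvalues[3]
```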
12. Algorithm Bias
• Because we know exactly how our algorithms work, we can regularly test them for both kinds of bias. We consistently find no evidence of either group differences or predictive bias.
• Because we have no idea how an interviewer’s judgment works, we have no idea if it contains group differences or predictive bias.
13. Baked In Bias
• “But if the criteria (e.g., performance, 360, ACL) are themselves biased, won’t your algorithms just return that bias?”
• This is complicated, but the answer is no.
14. Why No Baked In Bias?
• Scenario: 100 candidates + manager job performance ratings
• Manager’s ratings are biased
• Build an algorithm to predict performance
• Won’t the algorithm be biased?
• No. The inputs (personality data) do not know about the manager’s bias because personality does not covary with ethnicity, gender, etc.
• The inputs cannot predict the biased part of performance ratings
• However, this will lower the algorithm’s accuracy (see the simulation in the notes at the end)
15. What About DIF?
• DIF = Differential Item Functioning
  • The degree to which items are rated differently by members of different groups who have the same true standing on the trait
• 2006 Paper – “Differential Item Functioning by sex and race in the Hogan Personality Inventory”
  • No group mean differences
  • The study used the wrong scoring key
• Hogan’s own analysis of DIF (much larger sample)
  • 2/728 comparisons showed non-negligible DIF
  • Far fewer than we would expect by chance
  • No evidence for Differential Test Functioning (a standard DIF test is sketched below)
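One standard item-level DIF test is the logistic-regression approach (Swaminathan & Rogers, 1990): condition on a proxy for true trait standing (the total or rest score) and test whether group membership still predicts the item response. This sketch is a generic version of that test, not the specific analysis Hogan ran:

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif(item, total, group):
    """Logistic-regression DIF test for one dichotomous item.

    item:  0/1 responses to the item
    total: total (or rest) score, a proxy for true trait standing
    group: 0/1 group membership

    Conditioning on total, a significant group effect indicates
    uniform DIF; a significant total*group interaction indicates
    non-uniform DIF.  No DIF = both terms nonsignificant.
    """
    X = sm.add_constant(np.column_stack([total, group, total * group]))
    fit = sm.Logit(item, X).fit(disp=0)
    return fit.pvalues[2], fit.pvalues[3]  # (uniform, non-uniform)
```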
16. Bias Against the Neurodiverse?
• Our assessments are not clinical measures and cannot be used to diagnose developmental or behavioral disorders
• Our assessments consist of standard behavioral and attitudinal statements, like those one might get in an interview
• Our assessments are untimed (exception: cognitive)
• We do not collect data on neurodiversity, so we do not know if there are group differences
• There is no reason to expect predictive differences for neurodiverse candidates.
  • A high score on ADJ (Adjustment) has the same implications regardless of neurological differences
17. Summary
• Bias exists, is pervasive, and has real consequences
• Our assessments are designed to combat bias
• Accuracy, transparency, and fairness are central to Hogan
• This is not true for most methods of making personnel decisions
• “If you choose not to decide you still have made a choice”
Imagine a standard validation study scenario. We have personality measured in 100 candidates as well as job performance ratings on these candidates from their managers. Now, let’s further imagine that the managers are biased in their performance ratings against minority candidates (not a crazy assumption given what we already know about people). Now, let’s say we try to predict job performance from the personality data to build algorithms. Will our algorithms be biased against the minority candidates? The answer is no. The reason is that the inputs (the personality data) show no group differences. The minority and majority candidates have the same scores on average. Therefore, the inputs cannot, by definition, predict the biased part of the performance ratings. Yes, it is the case that the biased performance ratings will make our algorithms perform worse, in terms of their accuracy, but this will apply to all people and not benefit majority candidates in the selection process. It *would* create an algorithmic bias *if* the majority and minority candidates also differed on an input (e.g., race) that corresponded with the outcome variable (i.e., job performance ratings). But our assessments are blind to race and race does not covary with our assessments (the inputs) so it cannot have an impact. This is why we constantly preach the use of scientifically valid personality assessments as a way to combat bias.
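The scenario above is easy to simulate. In the sketch below, personality is distributed identically in both groups (no covariation with group membership), managers under-rate one group by a constant amount, and a simple linear “algorithm” is fit to the biased ratings. All names and effect sizes are invented; the point is only that the model’s predictions show no group difference, while the biased criterion lowers apparent accuracy for everyone.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000  # large n so results aren't sampling noise

# Personality inputs: identical distributions in both groups,
# i.e., personality does not covary with group membership.
group = rng.integers(0, 2, size=n)            # 0 = majority, 1 = minority
personality = rng.normal(size=n)

true_performance = 0.5 * personality + rng.normal(size=n)
rating = true_performance - 1.0 * group       # observed, biased criterion

# "Build an algorithm": least-squares fit of ratings on personality.
slope, intercept = np.polyfit(personality, rating, 1)
predicted = slope * personality + intercept

# Predictions show no group difference: the inputs never "saw" the bias.
print("mean prediction, majority:", round(predicted[group == 0].mean(), 3))
print("mean prediction, minority:", round(predicted[group == 1].mean(), 3))

# But the biased criterion lowers apparent accuracy for everyone.
print("r with biased ratings:  ", round(np.corrcoef(predicted, rating)[0, 1], 3))
print("r with true performance:", round(np.corrcoef(predicted, true_performance)[0, 1], 3))
```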
Imagine two people – a man and a woman – who are truly the same height. However, for whatever reason, when the man is measured, he gets a higher score on a particular measuring stick. That measuring stick shows DIF: people with the same true standing get different measured scores.