A deep dive into resume and LinkedIn sourcing and matching solutions claiming to use artificial intelligence, semantic search, and NLP, including how they work, their pros, cons, and limitations, and examples of what sourcers and recruiters can do that even the most advanced automated search and match algorithms can't do. Topics covered include human capital data information retrieval and analysis (HCDIR & A), Boolean and extended Boolean, semantic search, dynamic inference, dark matter resumes and social network profiles, and what I believe to be the ideal resume search and matching solution.
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic Search vs. Human Cognition and Sourcing
1. Talent Sourcing and Matching
Artificial Intelligence &
Black Box Semantic Search
vs.
Human Cognition & Sourcing
Glen Cathey
www.linkedin.com/in/glencathey
www.booleanblackbelt.com
2. What’s the big deal anyway?
Some people believe resume, LinkedIn, and Internet
sourcing is so easy that sourcing is dying, already
dead, or can be performed for $6/hour
3. Resume and LinkedIn sourcing
appears simple and easy on the
surface, however – it is deceptively
difficult and complex
4. Anyone can find candidates because all searches
"work" as long as they are syntactically correct
That doesn’t mean the searches are finding all of the
best candidates!
People make assumptions when creating
searches
Every time an assumption is made, there is room for
error and you unknowingly miss and/or eliminate
results!
5. No single search can return all potentially
qualified people
Every search both includes some qualified
people and excludes some qualified people
Some of the best people have resumes or
social profiles that may not appear to be
obvious or strong matches to your needs
6. People cannot effectively be reduced to and
represented by a text-based document
Job seekers are NOT professional resume or
LinkedIn profile writers
Most people still believe shorter, more concise
resumes and social profiles are better
This means they are removing information from their
resumes that can then no longer be searched for!
7. No one mentions every skill or responsibility
they’ve had, nor describes every environment
they’ve ever worked in
There are many ways of expressing the same
skills and experience
Employers often don’t use the same job titles
for the same job functions
8. People don’t create their resumes and
LinkedIn profiles thinking about how you will
search for them
Sometimes people don’t even use correct
terminology
Anyone easy for you to find is easy for other
recruiters to find = no competitive
advantage!
9. In addition to the people you do find, there are
Dark Matter results of people that exist to be
retrieved, but can't be found through standard,
direct or obvious methods
I estimate Dark Matter to be at least
50% of each source searched
12. “When every business has free and ubiquitous
data, the ability to understand it and extract
value from it becomes the complementary scarce
factor. It leads to intelligence, and the intelligent
business is the successful business, regardless of
its size. Data is the sword of the 21st century,
those who wield it well, the Samurai.”
-Jonathan Rosenberg, SVP, Product Management @ Google
13. Stop wasting time trying to create difficult and
complex Boolean search strings
Let "intelligent search and match applications"
do the work for you
A single query will give you the results you
need - no more re-querying, no more waste of
time!
14. Understand titles, skills, and concepts
Automatically analyze and define relationships
between words and concepts
Intuit and infer experience by context
17. Intuit experience by context = resume parsing
Parsing breaks down and extracts resume
information
Most recent title and employer
Skills and experience
Years of experience – overall, in each position, with
specific skills, in management, etc.
Education
18. Parsing enables structured, fielded search
Search by:
Most recent title
Recent experience
Years of experience
Etc.
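To make the parse-then-search idea concrete, here is a minimal Python sketch. It assumes a hypothetical, tidy one-position-per-line format (`Title YYYY-present` or `Title YYYY-YYYY`), a fixed current year of 2024, and invented helper names (`parse_positions`, `fielded_search`). Real resumes are far messier, and commercial parsers work very differently.

```python
import re
from dataclasses import dataclass

# Hypothetical toy format: one position per line, "<title> <start>-<end>",
# where <end> is a four-digit year or the word "present".
POSITION_RE = re.compile(r"^(?P<title>.+?)\s+(?P<start>\d{4})-(?P<end>\d{4}|present)$")


@dataclass
class Position:
    title: str
    start: int
    end: int  # "present" is resolved to the caller-supplied current year


def parse_positions(text: str, current_year: int) -> list[Position]:
    """Extract positions from lines in the toy format, most recent first."""
    positions = []
    for line in text.splitlines():
        m = POSITION_RE.match(line.strip())
        if m:
            end = current_year if m["end"] == "present" else int(m["end"])
            positions.append(Position(m["title"], int(m["start"]), end))
    return sorted(positions, key=lambda p: p.end, reverse=True)


def fielded_search(resumes: dict[str, list[Position]], recent_title: str, min_years: int) -> list[str]:
    """IDs whose most recent title contains recent_title and whose summed tenure
    is at least min_years. Naive: overlapping positions are double-counted."""
    hits = []
    for rid, positions in resumes.items():
        if not positions:
            continue
        total_years = sum(p.end - p.start for p in positions)
        if recent_title.lower() in positions[0].title.lower() and total_years >= min_years:
            hits.append(rid)
    return hits


resumes = {
    "r1": parse_positions("Infection Preventionist 2015-present\nStaff Nurse 2008-2015", 2024),
    "r2": parse_positions("Infection Preventionist 2022-present", 2024),
}
print(fielded_search(resumes, "infection preventionist", min_years=5))  # ['r1']
```

Even this toy shows where assumptions creep in: someone whose most recent line reads "IP, Infection Control" would never be returned by a title search for "infection preventionist."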
19. Well-developed ontologies and taxonomies
Hierarchical
20. Synonymous terms
Programmer, Software Engineer, Developer
Tax Manager, Manager of Tax
CSR, Customer Service Representative
Ruby on Rails, RoR, Rails, Ruby
Oracle Financials, Oracle Applications, e-Business
Suite, etc.
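A synonym taxonomy like the one above is, at its simplest, a lookup table: a document matches a concept if it mentions any of that concept's variants. The sketch below assumes a small, hypothetical `TAXONOMY` dictionary built from this slide's examples and uses case-insensitive whole-word matching with Python's `re` module.

```python
import re

# Hypothetical taxonomy built from the slide's examples: concept -> known variants.
TAXONOMY = {
    "software engineer": ["programmer", "software engineer", "developer"],
    "tax manager": ["tax manager", "manager of tax"],
    "customer service representative": ["csr", "customer service representative"],
    "ruby on rails": ["ruby on rails", "ror", "rails", "ruby"],
}


def variant_pattern(variants: list[str]) -> re.Pattern:
    """One case-insensitive regex matching any variant as a whole word or phrase."""
    alternatives = "|".join(re.escape(v) for v in sorted(variants, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def matched_concepts(text: str) -> set[str]:
    """Every taxonomy concept the text mentions via any of its variants."""
    return {concept for concept, variants in TAXONOMY.items()
            if variant_pattern(variants).search(text)}


print(sorted(matched_concepts("Senior Developer, 5 years with RoR")))
# ['ruby on rails', 'software engineer']
```

Note that the bare variant "ruby" would also match a candidate named Ruby, and nothing here can match a variant no one thought to add.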
21. Some applications use complex statistical
methods in an attempt to "understand"
language and the relationships between words
Example: Google Distance
22. Keywords with the same or similar meanings in
a natural language sense tend to be "close" in
units of Google distance, while words with
dissimilar meanings tend to be farther apart
23. A measure of semantic interrelatedness
derived from the number of hits returned by
the Google search engine for a given set of
keywords
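The hit-count measure described above is usually formalized as the Normalized Google Distance: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) and f(y) are the hit counts for each term, f(x, y) is the hit count for both together, and N is the total number of pages indexed. The sketch below simply evaluates that formula; the hit counts in the example are made-up numbers, not real search engine results.

```python
import math


def normalized_google_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """NGD from hit counts: fx and fy for each term alone, fxy for both together,
    n for the total number of indexed pages. Smaller values mean more closely related terms."""
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


# Made-up hit counts, not real search engine numbers.
N = 10**9
print(normalized_google_distance(10**6, 10**6, 5 * 10**5, N))  # about 0.1: often found together
print(normalized_google_distance(10**6, 10**6, 10**3, N))      # about 1.0: rarely found together
```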
24. Non-interactive and unsupervised machine
learning technique seeking to automatically
analyze and define relationships between
words and concepts
Clustering is a common technique for statistical
data analysis
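One simple way to illustrate unsupervised term clustering: represent each term by the set of documents that mention it, measure the Jaccard overlap between those sets, and link terms whose overlap clears a threshold (single-linkage clustering). This is one approach among many, not the method any particular vendor uses; the four-document corpus and the 0.5 threshold are invented for illustration.

```python
from itertools import combinations


def term_documents(docs: list[str], terms: list[str]) -> dict[str, set[int]]:
    """Map each single-word, lowercase term to the indices of documents containing it."""
    word_sets = [set(doc.lower().split()) for doc in docs]
    return {t: {i for i, words in enumerate(word_sets) if t in words} for t in terms}


def jaccard(a: set[int], b: set[int]) -> float:
    """Overlap of two document sets: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(docs: list[str], terms: list[str], threshold: float = 0.5) -> list[set[str]]:
    """Single-linkage clustering: two terms share a cluster if a chain of
    pairs with Jaccard similarity >= threshold connects them."""
    postings = term_documents(docs, terms)
    parent = {t: t for t in terms}  # union-find forest, one root per cluster

    def find(t: str) -> str:
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if jaccard(postings[a], postings[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for t in terms:
        clusters.setdefault(find(t), set()).add(t)
    return list(clusters.values())


docs = [
    "infection control surveillance epidemiology",
    "infection control surveillance nurse",
    "java spring hibernate developer",
    "java spring developer",
]
terms = ["infection", "surveillance", "epidemiology", "java", "spring", "hibernate"]
print(cluster_terms(docs, terms))
```

Clusters only show which terms tend to travel together; that makes them related, not necessarily relevant to a given search.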
25. The design and development of algorithms
that allow computers to evolve behaviors
based on empirical data
A major focus is to automatically learn to
recognize complex patterns and make
intelligent decisions and classifications based
on data
26. Aims to classify data (patterns) in resumes
based either on a priori knowledge or on
statistical information extracted from the
patterns
A priori: independent of experience
Example of pattern recognition: spam filters
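Classification from "statistical information extracted from the patterns" is what a classic spam filter does, and the same idea can be sketched for resume text. Below is a minimal multinomial naive Bayes classifier with add-one smoothing; the two labels and the four training lines are hypothetical and far too few to be meaningful, so treat this as an illustration of the technique only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over lowercase whitespace tokens, with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.label_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = "", -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + smoothed log likelihood of each token (unseen tokens count as 0 + 1)
            score = math.log(n_docs / total)
            score += sum(math.log((counts[w] + 1) / denom) for w in text.lower().split())
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, far-too-small training set of resume lines.
texts = [
    "led infection prevention and surveillance program",
    "infection control committee data analysis",
    "developed java web services",
    "built spring microservices in java",
]
labels = ["healthcare", "healthcare", "software", "software"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("infection surveillance data"))  # healthcare
print(model.predict("java services"))                # software
```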
27. Finds approximate matches to a pattern in a
string
Useful for word and phrase variations and
misspellings
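Approximate matching is commonly built on edit distance. The sketch below uses the standard Levenshtein dynamic-programming algorithm plus a hypothetical `fuzzy_find` helper with an assumed tolerance of one edit. Python's standard library also offers `difflib.get_close_matches`, which ranks candidates by a similarity ratio rather than by edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character insertions, deletions, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_find(term: str, text: str, max_edits: int = 1) -> list[str]:
    """Words in text within max_edits of term, ignoring case and surrounding punctuation."""
    words = [w.strip(".,;:()") for w in text.split()]
    return [w for w in words if levenshtein(term.lower(), w.lower()) <= max_edits]


print(levenshtein("kitten", "sitting"))                           # 3
print(fuzzy_find("Hibernate", "Java, Spring, Hibernat, Oracle"))  # ['Hibernat']
```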
29. Reduce time to find relevant matches
Can lessen or eliminate the need for recruiters
to have deep and specialized knowledge within
an industry or skill set
Reduce or even eliminate time spent on
research
30. Go beyond literal, identical lexical matching
Levels the playing field
Can make an inexperienced person look like a
sourcing wizard
Good for teams with low search/sourcing capability
31. Work well for positions where titles effectively
identify matches and where there is a low
volume and variety of keywords
Good for a high volume of unchanging hiring
needs
32. Removes thought from the talent identification
and decision making process
Danger of eliminating the need for recruiters to
understand what they’re searching for
Information technology, healthcare, and other
sectors/verticals can pose serious
challenges to matching apps
33. Apps find some people, bury or eliminate
others
Is finding some people good enough for your
organization?
Shouldn’t your goal be to find ALL of the BEST
people?
34. Matching apps level the playing field
People from different companies using the same
solution will both find and miss the same people
Competitors using the same search and match
solution will have no competitive advantage over
each other!
35. The belief that one search finds all of the best
candidates is intrinsically flawed and simply not
based in reality
Top talent isn't represented by what a search
engine "thinks" has the best resume or profile
AI and semantic search apps favor keyword-rich
resumes and profiles
36. Keyword-poor resumes and profiles may in fact
represent better talent than keyword-rich
resumes and profiles
It’s not just a matter of keyword frequency or
even keyword presence!
AI-powered search & match applications can
only return results that explicitly mention
required keywords and their variants
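To illustrate why frequency-driven ranking favors keyword-rich resumes and cannot surface keyword-poor ones, here is a deliberately naive scorer that just counts keyword occurrences. It is an assumption for illustration, not how any particular product ranks results, and the three resumes are invented.

```python
import re
from collections import Counter


def keyword_score(text: str, keywords: list[str]) -> int:
    """Naive relevance: total occurrences of the keywords (a stand-in for frequency-based ranking)."""
    counts = Counter(re.findall(r"[a-z0-9+#]+", text.lower()))
    return sum(counts[k.lower()] for k in keywords)


# Invented resumes for illustration.
resumes = {
    "stuffed": "Java developer. Java, Java EE, core Java, Java Spring, Java testing.",
    "concise": "Led the Java platform team that rebuilt our trading system.",
    "implicit": "Architected the Spring Boot services behind our trading platform.",
}
ranking = sorted(resumes, key=lambda r: keyword_score(resumes[r], ["java"]), reverse=True)
print(ranking)  # ['stuffed', 'concise', 'implicit']
```

The "implicit" resume describes Spring Boot work, which usually implies Java experience, yet it scores zero and would never be returned by a search requiring "java."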
37. Many people have skills and experience that
are simply not mentioned anywhere in their
resumes!
These people are the Dark Matter of
databases, ATS’s, and social networks, and
they exist but cannot be found via direct
search/match methods – AI or otherwise!
38. Pre-built taxonomies are static, limited in their
completeness and must be continually updated
in order to stay relevant and effective
Taxonomies are only as good as who created
them
Applications can only match on what’s present
and cannot “think outside of the box”
39. Semantic clustering and NLP applications can
retrieve related search terms, but that does
not mean they are relevant for your need!
40. Match primarily on titles and skill terms
True match is at the level of role, responsibilities,
environment, etc.
Some applications rank results in favor of longer
tenure with the current employer
Is someone who has been in their current company
for 5 years really “better” than someone who has
been with their current company for 2 years?
41. Apps don’t "know" what you’re looking for or
what's the best match for your company
Apps are not and cannot be "aware" of people
that were excluded from their search results
Applications are not truly intelligent – they do
not actually "know" or "understand" the
meaning of titles and terms
43. The ability to learn or understand or to deal
with new or trying situations
The ability to apply knowledge to manipulate
one’s environment or to think abstractly
REASON; the power of comprehending and
inferring
Source: Merriam-Webster.com
44. The capability of a machine to imitate
intelligent human behavior
Artificial = humanly contrived
Source: Merriam-Webster.com
45. Dr. Michio Kaku
Theoretical physicist and futurist
specializing in string field theory
Harvard Grad (summa cum laude)
Berkeley Ph.D.
Currently working on completing
Einstein's dream of a unified field
theory
What are his thoughts on AI?
46. “…pattern recognition and common sense are
the two most difficult, unsolved problems in
artificial intelligence theory. Pattern
recognition means the ability to see, hear, and
to understand what you are seeing and
understand what you are hearing. Common
sense means your ability to make sense out of
the world, which even children can perform.”
- Dr. Michio Kaku
47. Dr. Michio Kaku believes the job market of the
future will be “dominated by jobs involving
common sense (e.g. leadership, judgment,
entertainment, art, analysis, creativity) and
pattern recognition (e.g. vision and non-
repetitive jobs). Jobs like brokers, tellers,
agents, low level accountants and jobs
involving inventory and repetition will be
eliminated.”
48. That’s good news for sourcers and recruiters who
perform sourcing!
Sourcing requires judgment, creativity, analysis,
common sense and pattern recognition (instantly
making sense of human capital data)
Sourcers of the future will be human capital data
analysts who are experts in HCDIR & A – Human
Capital Data Information Retrieval and Analysis
49. Matching apps do not have the dynamic ability
to learn, understand, and instantly relate new
concepts through direct experience and
observation
They depend on taxonomies, statistical
models, or semantic clustering to “understand”
relationships and concepts
50. The human mind naturally organizes its
knowledge of the world, instantly relating new
terms and concepts and judging their relevance
51. Example: A sourcer who is completely
unfamiliar with “infection control” can instantly
recognize non-highlighted but related and
relevant terms and incorporate them into new
and improved searches
Carolinas HealthCare System, Charlotte, NC
Infection Preventionist 1997-present
Responsible for all aspects of infection prevention and control for an
800+ bed hospital. Uses science-based research to perform infection
prevention. Conducts all aspects of surveillance, data analysis, and
presents data to interdisciplinary teams, including the Infection
Control Committee.
52. Human sourcers can learn from research and
search results, dynamically and adaptively
identifying related and relevant search terms
and incorporating them into successive searches
to continuously refine and improve searches
for more relevant results
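Software can support this feedback loop, though not replace the judgment in it. A minimal sketch, assuming the sourcer has already marked some results as relevant: count the words that recur across those results but are not yet in the query, and hand the list back to a human to evaluate. The `suggest_terms` name, the tiny stopword list, and the `min_count` threshold are all illustrative assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def suggest_terms(relevant_results: list[str], query_terms: set[str], min_count: int = 2) -> list[str]:
    """Words found in at least min_count relevant results, excluding query terms and stopwords."""
    doc_freq = Counter()
    for text in relevant_results:
        doc_freq.update(set(re.findall(r"[a-z]+", text.lower())))  # count each word once per result
    query = {t.lower() for t in query_terms}
    return sorted(w for w, n in doc_freq.items()
                  if n >= min_count and w not in query and w not in STOPWORDS)


# Hypothetical snippets a sourcer has already judged relevant.
results = [
    "Infection Preventionist: surveillance, data analysis, Infection Control Committee",
    "Infection control practitioner; outbreak surveillance and epidemiology",
    "Epidemiology and surveillance of hospital-acquired infections",
]
print(suggest_terms(results, {"infection", "control"}))  # ['epidemiology', 'surveillance']
```

Deciding whether "epidemiology" is actually relevant for the role is still the sourcer's call.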
53. For example, if a recruiter were sourcing for a
position that required a skill they were
unfamiliar with (e.g., “Cockburn Use Case
Methodology”), they could quickly perform
research to learn more about it
In the next slide, you will see a screen capture
of such research
54. From this quick research, the recruiter would
be able to determine that most people would
not explicitly mention “Cockburn Use Case
Methodology,” let alone “Cockburn” (which the
research revealed is pronounced “Co-burn”) –
thus they would not include the term in their
searches
55. Instead, it would be a better idea to search for
candidates that mention experience with Agile
methodology and simply call and ask them if
they have experience with using Cockburn’s
use case methodology (which many likely
would)
56. Applications using Natural Language Processing
do not truly understand human language
They use complex statistical methods to resolve
the many difficulties associated with making
sense of human language
NLP experts admit that to computers, even simple
sentences can be highly ambiguous when
processed with realistic grammars, yielding
thousands or millions of possible analyses
57. Humans effortlessly and automatically process
and understand language, regardless of
sentence length or complexity, ambiguity,
incorrect grammar, etc.
We can udnretsnad any msseed up stnecene as
lnog as the lsat and frsit lteetrs of wdros are in
the crrcoet plaecs
58. Human sourcers and recruiters can deduce
potential experience, even in the absence of
information (not explicitly mentioned in the
resume/profile)
Applications can only work with what’s
actually mentioned in a resume – if it's not
explicitly mentioned, it can't match on it
59. Applications are not aware that many of the
best people have average resumes
Applications are not aware of the people their
algorithms bury in results or eliminate entirely
Human sourcers can become aware of and
specifically target this Dark Matter
60. How can you target resumes and
LinkedIn profiles that exist, but that
your searches can’t and don’t
retrieve?
61. Well-developed taxonomies,
semantically generated query
clouds and matching
algorithms can help greatly
with automatically searching
for and matching on
synonymous terms, related
words, word variants,
misspellings, etc.
62. Think + Perform Research
For each keyword, phrase, or title you are thinking
of using in your search, realize:
1. Not everyone will explicitly mention what you
think they would or should mention in their
resume/profile
2. There are many different and often unexpected
ways of expressing the same skills and experience
63. Global Experience
What search terms might you use if you are
looking for people with global experience?
How many can you think of off the top of your
head?
64. In a few minutes of exploratory research, a sourcer can
come up with a volume of related and relevant terms
Global, international, foreign, multinational, worldwide
Europe, European, EU, EMEA, Asia, Asia-Pac, Pacific Rim, South America, Latin America, Americas, CALA (Caribbean and Latin America), Middle East
Canada, Japan, China, Russia, India, UK, United Kingdom, etc.
Countries, Offshore, Overseas
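Once a term list like this has been brainstormed, it still has to be turned into a query. A small hypothetical helper can de-duplicate the terms and quote multi-word phrases, assuming an engine that accepts parenthesized `OR` groups with double-quoted phrases. How a given database treats hyphenated terms such as "Asia-Pac" varies, so check its syntax.

```python
def or_group(terms: list[str]) -> str:
    """Join terms into a parenthesized OR group, quoting multi-word phrases and
    dropping case-insensitive duplicates while keeping the first spelling seen."""
    seen: set[str] = set()
    unique: list[str] = []
    for term in terms:
        term = term.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            unique.append(term)
    return "(" + " OR ".join(f'"{t}"' if " " in t else t for t in unique) + ")"


print(or_group(["global", "international", "Asia-Pac", "Latin America", "EMEA", "Global"]))
# (global OR international OR Asia-Pac OR "Latin America" OR EMEA)
```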
65. How can you target people whom your
searches do retrieve, but whose results are
buried (ranked poorly or among "too many"
results to be reviewed) so that you
never find them?
66. Search and matching software powered by
artificial intelligence / black box semantic
search doesn't have a solution to this
challenge
One of the major claims AI/semantic search
applications make is that their solutions can
find the "right people" in one search
67. However, a single search strategy is
intrinsically flawed and limited: no single
search can find all qualified candidates, and
each search both includes some qualified people
and excludes others
I am not aware of any search & match
software that allows for successive searching
via mutually exclusive filtering
68. Run Multiple Searches
Start with maximum qualifications
Use the NOT operator to systematically filter
through mutually exclusive result sets
End with minimum qualifications
70.
1. A and B and C and D and E and F
2. A and B and C and D and E and NOT F
3. A and B and C and D and NOT E and F
4. A and B and C and NOT D and E and F
5. A and B and C and NOT D and NOT E and F
6. A and B and C and D and NOT E and NOT F
7. A and B and C and NOT D and E and NOT F
8. A and B and C and NOT D and NOT E and NOT F
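The successive-search sequence above can be generated mechanically for any number of optional terms, and its key property (every candidate who has the required terms lands in exactly one result set) can be checked on toy data. In this sketch, `A` through `F` stand in for real search terms, candidates are reduced to hypothetical sets of the terms their resumes mention, and the order within each tier (searches with the same number of NOTs) may differ from the slide's.

```python
from itertools import product


def search_plans(optional: list[str]) -> list[tuple[bool, ...]]:
    """Every present/absent pattern for the optional terms, from most to fewest present."""
    return sorted(product([True, False], repeat=len(optional)), key=lambda plan: plan.count(False))


def to_boolean(required: list[str], optional: list[str], plan: tuple[bool, ...]) -> str:
    """Render one plan as a Boolean string, e.g. 'A and B and C and NOT D and E and F'."""
    parts = list(required) + [t if keep else f"NOT {t}" for t, keep in zip(optional, plan)]
    return " and ".join(parts)


def run(candidates: dict[str, set[str]], required: list[str],
        optional: list[str], plan: tuple[bool, ...]) -> set[str]:
    """IDs that have every required term and match the plan exactly on the optional terms."""
    return {cid for cid, terms in candidates.items()
            if all(r in terms for r in required)
            and all((t in terms) == keep for t, keep in zip(optional, plan))}


required, optional = ["A", "B", "C"], ["D", "E", "F"]
plans = search_plans(optional)
for plan in plans:
    print(to_boolean(required, optional, plan))

# Hypothetical candidates, each reduced to the set of terms their resume mentions.
candidates = {
    "p1": {"A", "B", "C", "D", "E", "F"},
    "p2": {"A", "B", "C", "E"},
    "p3": {"A", "B", "C"},
    "p4": {"A", "B", "D"},  # lacks required C, so no search returns it
}
results = [run(candidates, required, optional, plan) for plan in plans]
```

The number of searches doubles with each optional term (2^k for k optional terms), which is why this works best with a handful of "nice to have" criteria.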
72. Probability-Based and Exhaustive!
This approach allows for:
1. The specific targeting of people who theoretically have
the highest probability of being a match based on
information present
2. The specific targeting of people who may be the best
match, but may have keyword/information poor resumes
or profiles, who do not explicitly mention what you think
the "right" person would or should mention
3. The ability to systematically filter through all available
results via manageable and mutually exclusive result sets
– never seeing the same person twice!
74. A mix of “man and machine,” integrating
human knowledge and expertise into computer
systems
Essentially - the best of both worlds:
Autopilot: An artificially intelligent semantic
matching engine
Manual Override: Ability to take complete control
over searches and search results
75. An artificial intelligence semantic matching
engine coupled with taxonomies built by
human SMEs that are continually modified and
improved specifically for the organization
No COTS solution is customized for any specific
employer, industry or discipline, nor is any
100% complete
76. Resume and LinkedIn profile parsing
Structured, contextual search
Most recent title and experience, overall years of
experience, education, etc.
White Box relevance weighting
Configurable by users – no black box!
Searchable tagging for level 5 semantic search
77. Standard and extended Boolean in full text and
field-based search
AND, OR, NOT, configurable proximity, weighting
Configurable proximity enables level 3 semantic
search
Variable term weighting allows users to control
which search terms are more important, and thus
gives them control over true relevance
78. Lucene is a free and open source text search
engine library that supports configurable proximity and
term weighting, and can be integrated into
some existing ATS's/databases
Some Applicant Tracking Systems already have
databases powered by text search engines that
allow for extended Boolean
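As an illustration of what proximity and term weighting look like in practice, the sketch below only builds query strings in the Lucene classic query parser syntax, where `term^N` boosts a term and `"word1 word2"~N` requires the words to appear within N positions of each other. It does not call Lucene itself; whether and how a given ATS exposes this syntax is an assumption to verify, and the helper names are hypothetical.

```python
def weighted_term(term: str, boost: float = 1.0) -> str:
    """A term (or quoted phrase) with an optional boost in Lucene classic syntax, e.g. java^3."""
    text = f'"{term}"' if " " in term else term
    return text if boost == 1.0 else f"{text}^{boost:g}"


def proximity(words: list[str], slop: int) -> str:
    """Lucene classic proximity search: the words within `slop` positions of each other."""
    return '"' + " ".join(words) + f'"~{slop}'


query = (f"({weighted_term('java', 3)} OR j2ee) AND "
         f"{proximity(['spring', 'hibernate'], 10)} AND NOT intern")
print(query)  # (java^3 OR j2ee) AND "spring hibernate"~10 AND NOT intern
```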
79. “Society has reached the point where
one can push a button and immediately
be deluged with…information. This is all
very convenient, of course, but if one is
not careful there is a danger of losing
the ability to think.”
- Eiji Toyoda
80. Data and information require analysis to
support decision making
Just as very expensive Business Intelligence
and Financial Analytics software hasn't
replaced the need for people to make sense of
the data, there is no software solution for HR
and recruiting that replaces the need for
people to analyze and interpret human capital
data to make appropriate decisions
81. Matching apps move/retrieve information, but
only PEOPLE can analyze and interpret for
relevance and make intelligent decisions
Relevant: the ability (as of an information retrieval
system) to retrieve material that satisfies the
needs of the user [1]
Only the user (sourcer/recruiter) can judge
relevance!
[1] Source: Merriam-Webster.com
82. Sourcers and recruiters need technology that
can enable their productivity
Intelligent search and match apps are not a
replacement for creative, curious, investigative
people
Do not seek to automate that which you do not
understand and cannot accomplish manually!
Lost and Found by metrognome0 via Flickr/creative commons
Haystack image
Concept = meaningful combination of words
Decrease content and speak to?
Hierarchical, parent-child, one-way: ontologies apply a larger variety of relation types/categorization
Practice and science of classification
Healthcare - hospital
Source: Wikipedia
Ludwig Wittgenstein’s theories about how words are defined by context
Query clouds
Scientific discipline
A priori: independent of experience – book learning vs. OJT
A posteriori: dependent on experience or empirical evidence
Some people = good enough for your organization? Find the best, and ALL of the best
Means the same candidates are also missed
Statistical NLP - Automatically analyze and define relationships between words and concepts
Relevant: the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user
Infer = to derive as a conclusion from facts or premises
More accurately, created intelligence
B.S., summa cum laude from Harvard University
Ph.D., University of California, Berkeley
Those two problems are at the present time largely unsolved. Now, I think, however, that within a few decades, we should be able to create robots as smart as mice, maybe dogs and cats.
Sourcing requires creativity, interpretive analysis, judgment, and common sense – a natural understanding based on experience
Dynamic – continuous and productive activity or change
Static – showing little/no change
Dynamic Inference = Infer = to derive as a conclusion from facts or premises
Dynamic – continuous and productive activity or change
Aware: having or showing realization, perception, or knowledge
What if? How else? Curious…
The system gives you what it thinks you wanted, and you are able to tell the system what you wanted
A Knowledge Engineering system that integrates human knowledge into computer systems in order to solve complex problems normally requiring a high level of human expertise
An expert system is computer software that attempts to mimic the reasoning of a human specialist
Kaizen and the Toyota Way
Financial data/BI - Tons of software – financial ERP, business intelligence apps – they STILL require people to analyze and interpret
Applications can’t truly analyze/interpret