Formal Concept Analysis applied to Professional Social Networks
Outline: Introduction, FCA, FCA-based Approach, Experiments, Conclusions, Future Work
Paula Silva, Sérgio M. Dias, Wladmir Brandão, Mark Song,
Luis Zárate (supervisor)
Pontifical Catholic University of Minas Gerais, Brazil
ICEIS, 2017
Context and Motivation
Online social networks oriented to business users
LinkedIn is one of the largest and most popular online professional social networks
The size and diversity of user-generated content
The need to help professionals improve their skills and reach job positions
Formal Concept Analysis (FCA) as a mathematical formulation for data analysis, applied to find patterns of professional competence
Goal
Identify professional behaviors from data scraped from LinkedIn
Find the minimum set of skills necessary to reach a job position
For example: {statistics, machine learning, databases} → {data scientist}
Use implication rules, specifically the set of proper implications
Contributions
Our contributions are:
A mapping of the problem domain onto the model of competence
A data set of professionals scraped from LinkedIn
The FCA-based approach
A set of experiments applying FCA to professional career analysis
Formal Concept Analysis
Based on the notions of concept and conceptual hierarchy
A mathematical way to look at data and knowledge, their acquisition
process and analysis based on lattices.
There are three main principles:
Formal context
Formal concept
Implications
Formal Context
Formal context (G, M, I)
a set G of objects
a set M of attributes
a binary relation I ⊆ G × M
a b c d e f g
18 x x
19 x x x
20 x x x x
21 x x x
22 x x
23 x x x x x
24 x x
Table: Example context of LinkedIn users' skills.
Formal Concept
Derivation operators:
For A ⊆ G and B ⊆ M
A′ := {m ∈ M | gIm ∀g ∈ A}
B′ := {g ∈ G | gIm ∀m ∈ B}
Formal concept (A, B)
A ⊆ G, B ⊆ M
A′ = B, B′ = A
A is the concept extent and B is the concept intent
e.g. formal concept ({18, 21, 24}, {software engineering, databases})
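The derivation operators and the concept condition can be sketched in Python as follows. This is only an illustration: the context data is hypothetical (it was chosen so that the example concept above holds, and it is not the example table), and the helper names `intent`, `extent` and `is_concept` are assumptions of this sketch.

```python
# Hypothetical formal context: object id -> set of attributes (skills).
CONTEXT = {
    18: {"software engineering", "databases"},
    19: {"java", "databases", "testing"},
    21: {"software engineering", "databases", "java"},
    24: {"software engineering", "databases"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def intent(objects):
    """A': attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def extent(attributes):
    """B': objects that have every attribute in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def is_concept(objects, attributes):
    """(A, B) is a formal concept when A' = B and B' = A."""
    return intent(objects) == set(attributes) and extent(attributes) == set(objects)


print(is_concept({18, 21, 24}, {"software engineering", "databases"}))
print(is_concept({18, 21}, {"software engineering", "databases"}))
```

The second call fails the check because object 24 also has both attributes, so the extent is larger than {18, 21}.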
Implication Rules
Given a formal context (G, M, I), an implication over M is P → Q, where P, Q ⊆ M.
P is the premise
Q is the conclusion
P → Q holds when P′ ⊆ Q′, that is, every object that has all the attributes in P also has all the attributes in Q.
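One way to test whether an implication P → Q holds in a context is to compare object sets: every object that has all attributes of P must also have all attributes of Q. The sketch below assumes a context stored as a dictionary from object to attribute set; the data and the function names `extent` and `holds` are hypothetical.

```python
def extent(context, attributes):
    """Objects that have every attribute in `attributes`."""
    return {g for g, attrs in context.items() if set(attributes) <= attrs}


def holds(context, premise, conclusion):
    """True when every object with all of `premise` also has all of `conclusion`."""
    return extent(context, premise) <= extent(context, conclusion)


# Hypothetical context: profile id -> skills and positions.
context = {
    1: {"statistics", "machine learning", "data scientist"},
    2: {"statistics", "machine learning", "databases", "data scientist"},
    3: {"databases"},
}

print(holds(context, {"statistics"}, {"data scientist"}))
print(holds(context, {"databases"}, {"data scientist"}))
```

In this toy context the first implication holds, while the second does not because profile 3 has "databases" without "data scientist".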
The Set of Proper Implications
A set I of proper implications satisfies:
The right-hand side of each implication is unitary: if P → m ∈ I, then m ∈ M
Superfluous implications are not allowed: if P → m ∈ I, then m ∉ P
Specializations are not allowed, i.e. left-hand sides are minimal: if P → m ∈ I, then there is no Q → m ∈ I such that Q ⊂ P
For example, {e} → {a} is a proper implication, but {e, g} → {a} is not, because its premise {e, g} is not minimal.
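The three conditions can be illustrated with a naive enumeration. This is a brute-force sketch for small contexts and is not the PropIm algorithm; the context data and the function names are hypothetical, and the empty premise is not considered.

```python
from itertools import combinations


def extent(context, attributes):
    """Objects that have every attribute in `attributes`."""
    return {g for g, attrs in context.items() if set(attributes) <= attrs}


def proper_implications(context, conclusion):
    """Minimal premises P (with support > 0) such that P -> conclusion holds."""
    # Excluding the conclusion from the candidates enforces m not in P.
    attributes = sorted(set().union(*context.values()) - {conclusion})
    target = extent(context, {conclusion})
    premises = []
    # Growing the size level by level means smaller premises are found first.
    for size in range(1, len(attributes) + 1):
        for candidate in map(frozenset, combinations(attributes, size)):
            if any(p < candidate for p in premises):
                continue  # a proper subset is already a premise: not minimal
            objects = extent(context, candidate)
            if objects and objects <= target:
                premises.append(candidate)
    return premises


# Hypothetical context over attributes a, b, e, g.
context = {
    1: {"a", "e"},
    2: {"a", "e", "g"},
    3: {"a", "b"},
    4: {"b", "g"},
}

print(proper_implications(context, "a"))
```

With this data the only premise returned for the conclusion a is {e}; {e, g} is skipped because {e} is already a premise.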
1 - Problem Domain
Building the conceptual model according to the problem to be treated
The classification of informational categories, based on Model of
Competence
Figure: Model adapted from Duran (1998)
1 - Problem Domain
We identified 3 dimensions, 14 aspects and 51 variables
Figure: Professional identification through model of competence
2 - Scraping
The scraping process was divided into two phases:
Selecting initial seeds randomly
Collecting public profile data: people who took undergraduate IT courses in Minas Gerais, Brazil
3 - Preprocessing
We only considered the variables skills and experience
The ETL process:
String cleaning: UTF-8 encoding correction, accent removal, and translation of all terms into English through the Google Translate API
Attribute reduction: specific terms are merged into a generic term based on semantic relevance
Generic term          Specific terms
Java                  JPA, JSF
Software developer    Developer, Programmer, Program developer
Formal context with 366 attributes and 970 objects
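The accent removal and attribute reduction steps might look like the sketch below. It assumes that JPA and JSF reduce to Java and that Developer, Programmer and Program developer reduce to Software developer, as in the example terms above. The dictionary name, the lower-casing and the sample profile are assumptions of this sketch, and the translation step is left out.

```python
import unicodedata

# Specific term -> generic term, following the example terms on the slide.
GENERIC_TERM = {
    "jpa": "java",
    "jsf": "java",
    "developer": "software developer",
    "programmer": "software developer",
    "program developer": "software developer",
}


def strip_accents(text):
    """Remove accents by dropping combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def reduce_attributes(terms):
    """Clean each term and replace specific terms by their generic term."""
    cleaned = (strip_accents(t).strip().lower() for t in terms)
    return {GENERIC_TERM.get(t, t) for t in cleaned}


# Hypothetical raw skills of one profile.
profile = ["JPA", "JSF", " Programmer ", "Gestão"]
print(reduce_attributes(profile))
```

Because the result is a set, JPA and JSF collapse into a single `java` attribute, which is what reduces the number of columns in the formal context.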
4 - Knowledge Extractor
The PropIm algorithm
Finding supersets: If A → b, so ↑ |(A → b) | and ↓ |A|
It computes proper implications with support > 0
Easily scalable strategy
Allows the attributes of interest for the implications' conclusions to be set
Problem complexity: O(|M| |I| (|G| |M| + |I| |M|))
Pruning heuristic to reduce the number of possible attribute combinations
4 - Knowledge Extractor
Input: Formal context (G, M, I)
Output: Set of proper implications I with support greater than 0
I = ∅
foreach m ∈ M do
    P = m′′
    size = 1
    Pa = ∅
    while size < |P| do
        C = all subsets of P with size elements
        PC = getCandidate(C, Pa)
        foreach P1 ⊂ PC do
            if P1′ ≠ ∅ and P1′ ⊂ m′ then
                Pa = Pa ∪ {P1}
                I = I ∪ {P1 → m}
            end
        end
        size++
    end
end
return I
Function getCandidate(C, Pa)
    D = ∅
    foreach a ∈ A | A ⊂ Pa do
        foreach B ⊂ C do
            if a ∉ B then
                D = D ∪ B
            end
        end
    end
    return D
Experiments
The goal was to answer the following questions:
How do proper implications identify relations between skills and positions?
Could we find intersections among sets of skills, and what do these
intersections represent?
Proper Implications to Competence Identification
We selected 20 positions and their 180 skills
The PropIm algorithm extracted 895 proper implications
Figure: Proper implications network. AD = Administrative Director, ITC = IT Consultant
Proper Implications to Competence Identification
Nodes with a high in-degree represent positions associated with more diverse sets of skills
Administrative Director: 163 proper implications
Example: {entrepreneurship, human resources, information management} → {administrative director}
Figure: AD = Administrative Director
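The in-degree of a position node can be sketched as a simple count, assuming that each proper implication contributes one incoming edge to its conclusion node (this edge model is an assumption, since the network construction is not spelled out here). The implication list below is illustrative, and its last entry is entirely hypothetical.

```python
from collections import Counter

# (premise, conclusion) pairs; the first two are the examples from the slides.
implications = [
    (frozenset({"entrepreneurship", "human resources", "information management"}),
     "administrative director"),
    (frozenset({"ABAP", "agile methodology", "BI"}), "it consultant"),
    (frozenset({"hypothetical skill"}), "administrative director"),
]


def in_degree(implications):
    """Number of proper implications that conclude in each position."""
    return Counter(conclusion for _premise, conclusion in implications)


degrees = in_degree(implications)
print(degrees.most_common())
```

Under this assumption, a position with many distinct premises gets a high count, while a position reached by few premises gets a low one.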
Proper Implications to Competence Identification
Nodes with a low in-degree represent job positions that demand more specific sets of skills
IT Consultant: 3 proper implications
Example: {ABAP, agile methodology, BI} → {it consultant}
Figure: ITC = IT Consultant
Intersection Between Skills and Job Positions
Top 3 best jobs in the Information Technology area according to Career Cast research (Cast, 2016)
Edge weight = relative frequency F = Fi / Fp, where Fi is the absolute frequency of the implication and Fp = |m′| is the number of objects that have the conclusion attribute m
Why relative frequency?
{java frameworks} → {software engineer}
Support: 2.47%
Relative frequency: 75.56%
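The difference between the two measures can be sketched as follows. The sketch assumes that support is the fraction of all objects that have both premise and conclusion, and that Fi counts those same objects while Fp counts the objects that have the conclusion; this reading of Fi is an assumption. The toy context is hypothetical and does not reproduce the 2.47% and 75.56% figures.

```python
def extent(context, attributes):
    """Objects that have every attribute in `attributes`."""
    return {g for g, attrs in context.items() if set(attributes) <= attrs}


def support(context, premise, conclusion):
    """Fraction of all objects that have the premise and the conclusion."""
    return len(extent(context, set(premise) | {conclusion})) / len(context)


def relative_frequency(context, premise, conclusion):
    """F = Fi / Fp; assumes at least one object has the conclusion."""
    f_i = len(extent(context, set(premise) | {conclusion}))
    f_p = len(extent(context, {conclusion}))
    return f_i / f_p


# Hypothetical context: 10 profiles, 4 software engineers, 3 of them with java frameworks.
context = {}
for i in range(3):
    context[i] = {"java frameworks", "software engineer"}
context[3] = {"software engineer"}
for i in range(4, 10):
    context[i] = {"databases"}

print(support(context, {"java frameworks"}, "software engineer"))
print(relative_frequency(context, {"java frameworks"}, "software engineer"))
```

In this toy data the implication covers only 3 of 10 profiles, but 3 of the 4 software engineers, so the relative frequency is much higher than the support.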
Figure: P1 = data scientist, P2 = information security analyst, P3 = software engineer
Intersection Between Skills and Job Positions
Nodes P1 and P2 share two sets of skills:
{agile methodology} → {data scientist}
{agile methodology} → {information security analyst}
Figure: P1: data scientist, P2: information security analyst and P3: software engineer job position
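Shared sets of skills between two positions can be found by intersecting the premises of their proper implications. In the sketch below the implication list is illustrative: the {agile methodology} premises come from the slide, the remaining entries are hypothetical, and the function name `shared_premises` is an assumption.

```python
def shared_premises(implications, position_a, position_b):
    """Premises that lead to both positions."""
    premises_a = {p for p, c in implications if c == position_a}
    premises_b = {p for p, c in implications if c == position_b}
    return premises_a & premises_b


implications = [
    (frozenset({"agile methodology"}), "data scientist"),
    (frozenset({"agile methodology"}), "information security analyst"),
    # Hypothetical entries, added only to make the intersection non-trivial.
    (frozenset({"statistics"}), "data scientist"),
    (frozenset({"network security"}), "information security analyst"),
]

print(shared_premises(implications, "data scientist", "information security analyst"))
```

Storing premises as `frozenset` makes them hashable, so the intersection compares whole skill sets rather than individual skills.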
Intersection Between Skills and Job Positions
Skills and positions related to hierarchical transition:
P1: {.NET, automation systems} → IT analyst
P2: {.NET, database, ERP, IT governance} → IT coordinator
P3: {BPM, cloud computing, CRM} → IT manager
P4: {assets management, BI, business management, consulting} → IT director
Figure: P1 to P4 represent the following job positions: P1 is IT analyst, P2 is IT manager, P3 is IT coordinator and P4 is IT director
Conclusions
An FCA-based approach to identify the minimum sets of skills that are necessary to achieve a job position
A set of experiments applying FCA to professional competence analysis
A computational strategy to find proper implications without loss of information
The in-degree of a conclusion node indicates the diversity of the sets of skills leading to the same job position
Shared sets of skills do not determine the job position, but show which positions can be achieved
The disjoint sets (representing hierarchy) show that it is necessary to develop skills of different natures to progress in a career
Future Work
Replicating the experiments for other areas
Exploring other algorithms that extract implications from the concept lattice or from the set of formal concepts
Expanding the analysis to all dimensions of the model of competence
Implementing a web environment with this FCA-based approach to help professionals improve their skills and look for possible job positions
Preprocessing: attribute fusion through correlation analysis
Temporal analysis of career evolution
Estimating the remuneration of people based on posts and professional data
Any questions?
paula.raissa@sga.pucminas.br
paula.csraissa@gmail.com