The document presents a summary of the Naive Bayes classification algorithm. In two steps, the algorithm computes the probability that a new instance belongs to each class in the training set and assigns it to the class with the highest probability. The document illustrates the process with an example that predicts whether a student passes a course, based on usage time and number of posts.
Naive Bayes
1. UFAL - Universidade Federal de Alagoas
UFAL - Instituto de Computação
Naive Bayes
Jonathas Magalhães
jonathas@ic.ufal.br
Magalhães, J.J. IA – 2013
2. Naive Bayes
Based on Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{1}$$
Let X(A_1, A_2, ..., A_n, C) be a training data set;
where C_1, C_2, ..., C_k are the classes, i.e. the possible values of C;
R is a new record to be classified.
The values that R takes on the attributes of X are a_1, a_2, ..., a_n.
3. Naive Bayes
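Steps of the algorithm:
1 Compute the probability P(C = C_i | R), i = 1, 2, ..., k;
2 The output is the class C_j for which P(C = C_j | R) is maximal.
Assuming the attributes are conditionally independent given the class, the probability that an instance belongs to a class is proportional to:
$$P(C = C_i \mid A_1 = a_1, A_2 = a_2, \ldots, A_n = a_n) \propto P(A_1 = a_1 \mid C = C_i) \cdot P(A_2 = a_2 \mid C = C_i) \cdots P(A_n = a_n \mid C = C_i) \cdot P(C = C_i) \tag{2}$$
The denominator P(R) of Bayes' theorem is omitted because it is the same for every class and does not change which class is maximal.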
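The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names, the dictionary layout, and the toy numbers for classes "C1" and "C2" are all assumptions made for the example.

```python
from math import prod

def naive_bayes_scores(priors, likelihoods, record):
    """Step 1: for each class c, compute P(C = c) * prod_j P(A_j = a_j | C = c).

    priors:      {class: P(C = class)}
    likelihoods: {class: {attribute: {value: P(attribute = value | class)}}}
    record:      {attribute: value}, the new record R
    """
    return {
        c: priors[c] * prod(likelihoods[c][attr][val] for attr, val in record.items())
        for c in priors
    }

def classify(priors, likelihoods, record):
    """Step 2: return the class whose score is maximal."""
    scores = naive_bayes_scores(priors, likelihoods, record)
    return max(scores, key=scores.get)

# Hypothetical toy numbers, not taken from the slides.
priors = {"C1": 0.4, "C2": 0.6}
likelihoods = {
    "C1": {"A1": {"x": 0.8, "y": 0.2}},
    "C2": {"A1": {"x": 0.1, "y": 0.9}},
}
scores = naive_bayes_scores(priors, likelihoods, {"A1": "x"})
print(scores)
print(classify(priors, likelihoods, {"A1": "x"}))
```

The scores are unnormalized, which is enough for step 2 because only their order matters.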
4. Naive Bayes – Example
Consider the following data:
X1: Usage time    X2: Number of posts    Y: Passed the course
 2                 4                     No
 3                 6                     No
 4                 8                     No
 4                 4                     No
 5                 7                     No
 6                 5                     No
 6                 6                     Yes
 6                 5                     Yes
 7                 7                     Yes
 8                 5                     Yes
 8                 6                     Yes
10                10                     Yes
5. Naive Bayes
Discretizing the values:
Low: {0, 1, 2, 3}
Medium: {4, 5, 6, 7}
High: {8, 9, 10, 11}
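As a rough sketch, the binning above can be applied to the example data as follows. The function name `discretize` and the lowercase labels are assumptions for illustration; the rows are the twelve records of the example table.

```python
def discretize(value):
    """Map a raw value to the bins listed above."""
    if 0 <= value <= 3:
        return "low"
    if 4 <= value <= 7:
        return "medium"
    if 8 <= value <= 11:
        return "high"
    raise ValueError(f"value {value} is outside the bins 0..11")

# (usage time, number of posts, passed) rows from the example table
rows = [
    (2, 4, "no"), (3, 6, "no"), (4, 8, "no"), (4, 4, "no"),
    (5, 7, "no"), (6, 5, "no"), (6, 6, "yes"), (6, 5, "yes"),
    (7, 7, "yes"), (8, 5, "yes"), (8, 6, "yes"), (10, 10, "yes"),
]

discretized = [(discretize(t), discretize(p), passed) for t, p, passed in rows]
for row in discretized:
    print(row)
```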
7. Naive Bayes
We want to predict whether a student (instance R) with
a medium number of posts (posts = medium), and
a medium usage time (time = medium)
will pass or not (passed = ?).
8. Naive Bayes
Computing P(passed = yes | R):
P(passed = yes | time = medium, posts = medium) ∝
P(posts = medium | passed = yes) * P(time = medium | passed = yes) * P(passed = yes).
P(posts = medium | passed = yes) = ?
P(time = medium | passed = yes) = ?
P(passed = yes) = ?
9. Naive Bayes
P(posts = medium | passed = yes) = 5/6 ≈ 0.83
(5 of the 6 students who passed have a medium number of posts.)
10. Naive Bayes
Computing P(passed = yes | R):
P(passed = yes | time = medium, posts = medium) ∝
0.83 * P(time = medium | passed = yes) * P(passed = yes).
P(posts = medium | passed = yes) = 0.83
P(time = medium | passed = yes) = ?
P(passed = yes) = ?
11. Naive Bayes
P(time = medium | passed = yes) = 3/6 = 0.5
(3 of the 6 students who passed have a medium usage time.)
12. Naive Bayes
Computing P(passed = yes | R):
P(passed = yes | time = medium, posts = medium) ∝
0.83 * 0.5 * P(passed = yes).
P(posts = medium | passed = yes) = 0.83
P(time = medium | passed = yes) = 0.5
P(passed = yes) = ?
13. Naive Bayes
P(passed = yes) = 6/12 = 0.5
(6 of the 12 students passed.)
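The three quantities for the class "yes" can be reproduced by counting rows. The sketch below assumes the example table and the low/medium/high bins given earlier; the helper names `level`, `conditional` and `prior` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# (usage time, number of posts, passed) rows from the example table
ROWS = [
    (2, 4, "no"), (3, 6, "no"), (4, 8, "no"), (4, 4, "no"),
    (5, 7, "no"), (6, 5, "no"), (6, 6, "yes"), (6, 5, "yes"),
    (7, 7, "yes"), (8, 5, "yes"), (8, 6, "yes"), (10, 10, "yes"),
]

def level(value):
    """Bins from the discretization slide: 0-3 low, 4-7 medium, 8-11 high."""
    return "low" if value <= 3 else "medium" if value <= 7 else "high"

def conditional(column, wanted_level, label):
    """P(attribute = wanted_level | passed = label), estimated by counting."""
    in_class = [row for row in ROWS if row[2] == label]
    hits = [row for row in in_class if level(row[column]) == wanted_level]
    return Fraction(len(hits), len(in_class))

def prior(label):
    """P(passed = label), the share of rows with that label."""
    return Fraction(sum(row[2] == label for row in ROWS), len(ROWS))

p_posts = conditional(1, "medium", "yes")  # column 1 is the number of posts
p_time = conditional(0, "medium", "yes")   # column 0 is the usage time
p_yes = prior("yes")
print(p_posts, p_time, p_yes)  # 5/6 1/2 1/2
```

`Fraction` keeps the counts exact, so 3/6 and 6/12 both print in lowest terms as 1/2.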
14. Naive Bayes
Computing P(passed = yes | R):
P(passed = yes | time = medium, posts = medium) ∝
0.83 * 0.5 * 0.5 ≈ 0.21.
P(posts = medium | passed = yes) = 0.83;
P(time = medium | passed = yes) = 0.5;
P(passed = yes) = 0.5.
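The arithmetic can also be checked with the exact fractions from the preceding slides (5/6, 3/6 and 6/12) instead of the rounded decimals:

```latex
\[
\frac{5}{6}\cdot\frac{3}{6}\cdot\frac{6}{12}
  = \frac{5}{6}\cdot\frac{1}{2}\cdot\frac{1}{2}
  = \frac{5}{24} \approx 0.208
\]
```

Both the exact value and the rounded product 0.83 × 0.5 × 0.5 = 0.2075 round to 0.21.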
15. Naive Bayes
Computing P(passed = no | R):
P(passed = no | time = medium, posts = medium) ∝
0.5 * 0.67 * 0.5 ≈ 0.17.
P(posts = medium | passed = no) = 0.5;
P(time = medium | passed = no) = 0.67;
P(passed = no) = 0.5.
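The slides do not show how the values for the class "no" were counted. As a cross-check, the sketch below recounts them, assuming the example table and the bins given earlier are exactly the ones the author used (slide 6 is not part of this transcript, so this cannot be confirmed). The helper `level` and the variable names are hypothetical.

```python
from fractions import Fraction

# (usage time, number of posts, passed) rows from the example table
ROWS = [
    (2, 4, "no"), (3, 6, "no"), (4, 8, "no"), (4, 4, "no"),
    (5, 7, "no"), (6, 5, "no"), (6, 6, "yes"), (6, 5, "yes"),
    (7, 7, "yes"), (8, 5, "yes"), (8, 6, "yes"), (10, 10, "yes"),
]

def level(value):
    """Bins from the discretization slide: 0-3 low, 4-7 medium, 8-11 high."""
    return "low" if value <= 3 else "medium" if value <= 7 else "high"

no_rows = [row for row in ROWS if row[2] == "no"]

# Share of "no" rows whose posts / usage time fall in the medium bin.
p_posts_no = Fraction(sum(level(posts) == "medium" for _, posts, _ in no_rows), len(no_rows))
p_time_no = Fraction(sum(level(time) == "medium" for time, _, _ in no_rows), len(no_rows))
p_no = Fraction(len(no_rows), len(ROWS))
print(p_posts_no, p_time_no, p_no)  # 5/6 2/3 1/2
```

Under that assumption the usage-time value matches the quoted 0.67 (4/6) and the prior matches 0.5, but the posts value comes out as 5/6 ≈ 0.83 rather than the quoted 0.5. The quoted figure may therefore rest on data not shown in this transcript. With 5/6 the product would be 5/6 × 2/3 × 1/2 = 5/18 ≈ 0.28.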
16. Naive Bayes
Classifying the instance:
P(passed = no | R) ∝ 0.17;
P(passed = yes | R) ∝ 0.21.
Since the value for passed = yes is larger than the value for passed = no, the prediction is that the student will pass the course.
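The final comparison can be sketched as follows, taking the per-class factors exactly as quoted on the slides (0.83, 0.5, 0.5 for "yes" and 0.5, 0.67, 0.5 for "no"). The normalization step is an addition for illustration: dividing each score by the sum of the scores turns them into probabilities that add up to 1, without changing which class wins.

```python
# Per-class products, using the factors quoted on the slides.
scores = {
    "yes": 0.83 * 0.5 * 0.5,
    "no": 0.5 * 0.67 * 0.5,
}

# The predicted class is the one with the largest score.
prediction = max(scores, key=scores.get)

# Optional: normalize so that the values sum to 1.
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}

print(prediction)
print({label: round(p, 2) for label, p in posteriors.items()})
```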