Naive Bayes


The document summarizes the Naive Bayes classification algorithm. The algorithm computes the probability that a new instance belongs to each class present in the training set and assigns the instance to the class with the highest probability. The process is illustrated with an example that predicts whether students pass a course based on usage time and number of posts.

UFAL - Universidade Federal de Alagoas
Instituto de Computação

Jonathas Magalhães
jonathas@ic.ufal.br

Magalhães, J.J., IA – 2013

Based on Bayes' theorem:

$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}. \tag{1}
$$

Let $X(A_1, A_2, \ldots, A_n, C)$ be a training data set;
where $C_1, C_2, \ldots, C_k$ are the classes, that is, the possible values of $C$;
$R$ is a new record that must be classified.
The values that $R$ takes on the attributes of $X$ are $a_1, a_2, \ldots, a_n$.
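As a small numeric illustration of Equation (1), the sketch below evaluates Bayes' theorem with exact fractions. The function name `bayes` and the three input probabilities are hypothetical, chosen only for illustration; they do not come from the slides.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) from P(B|A), P(A) and P(B), as in Equation (1)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b

# Hypothetical numbers, chosen only for illustration.
posterior = bayes(Fraction(9, 10), Fraction(1, 5), Fraction(3, 10))
print(posterior)  # 3/5
```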


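Steps of the algorithm:
1. Compute the probability $P(C = C_i \mid R)$ for $i = 1, 2, \ldots, k$;
2. The output is the class $C_j$ for which $P(C = C_j \mid R)$ is maximal.

Assuming the attributes are conditionally independent given the class (the "naive" assumption), the probability of an instance belonging to a class is given, up to a normalizing factor that is the same for every class, by:

$$
P(C = C_i \mid A_1 = a_1, A_2 = a_2, \ldots, A_n = a_n) \propto P(A_1 = a_1 \mid C = C_i)\, P(A_2 = a_2 \mid C = C_i) \cdots P(A_n = a_n \mid C = C_i)\, P(C = C_i). \tag{2}
$$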
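The two steps above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names `train`, `score` and `classify` and the toy records are assumptions, and probabilities are estimated as plain relative frequencies, so an attribute value never seen with a class gives that class a score of zero.

```python
from collections import Counter

def train(records):
    """Count classes and (attribute index, value, class) co-occurrences.

    records is a list of (attribute_values, class_label) pairs.
    """
    class_counts = Counter(label for _, label in records)
    value_counts = Counter()
    for attrs, label in records:
        for i, value in enumerate(attrs):
            value_counts[(i, value, label)] += 1
    return class_counts, value_counts

def score(attrs, label, class_counts, value_counts):
    """Right-hand side of Equation (2) for one class."""
    total = sum(class_counts.values())
    result = class_counts[label] / total  # P(C = Ci)
    for i, value in enumerate(attrs):
        # P(Ai = ai | C = Ci), estimated as a relative frequency
        result *= value_counts[(i, value, label)] / class_counts[label]
    return result

def classify(attrs, class_counts, value_counts):
    """Step 2: return the class with the highest score."""
    return max(class_counts,
               key=lambda c: score(attrs, c, class_counts, value_counts))

# Hypothetical toy records, unrelated to the example in these slides.
toy = [
    (("a", "x"), "pos"),
    (("a", "y"), "pos"),
    (("b", "x"), "neg"),
    (("a", "x"), "neg"),
    (("b", "y"), "neg"),
]
class_counts, value_counts = train(toy)
print(classify(("a", "x"), class_counts, value_counts))  # pos
```

For the toy record `("a", "x")`, the score of `pos` is 2/5 * 2/2 * 1/2 = 0.2 and the score of `neg` is 3/5 * 1/3 * 2/3, roughly 0.13, so `pos` is returned.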


Naive Bayes – Example

Consider the following data:

X1: usage time   X2: number of posts   Y: passed the course
2                4                     No
3                6                     No
4                8                     No
4                4                     No
5                7                     No
6                5                     No
6                6                     Yes
6                5                     Yes
7                7                     Yes
8                5                     Yes
8                6                     Yes
10               10                    Yes


Discretizing the values:
Low: {0, 1, 2, 3}
Medium: {4, 5, 6, 7}
High: {8, 9, 10, 11}
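A minimal sketch of this discretization step, applied to the twelve rows of the data table. The function name `discretize` and the lowercase labels are assumptions made for illustration; the bin boundaries are the ones listed above and are defined for integer values only.

```python
def discretize(value):
    """Map a raw integer value to the Low / Medium / High bins."""
    if 0 <= value <= 3:
        return "low"
    if 4 <= value <= 7:
        return "medium"
    if 8 <= value <= 11:
        return "high"
    raise ValueError(f"value {value} is outside the bins 0..11")

# (usage time, number of posts, passed) rows from the data table.
raw = [
    (2, 4, "no"), (3, 6, "no"), (4, 8, "no"),
    (4, 4, "no"), (5, 7, "no"), (6, 5, "no"),
    (6, 6, "yes"), (6, 5, "yes"), (7, 7, "yes"),
    (8, 5, "yes"), (8, 6, "yes"), (10, 10, "yes"),
]
discretized = [(discretize(t), discretize(p), y) for t, p, y in raw]
print(discretized[0])   # ('low', 'medium', 'no')
print(discretized[-1])  # ('high', 'high', 'yes')
```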


Discretized data: [table not preserved in the source]


We want to predict whether a student (instance R) with...
A medium number of posts (posts = medium), and;
A medium usage time (time = medium).
...will pass or not (passed = ?).


Computing (passed = yes | R):
P(passed = yes | time = medium, posts = medium) ∝ P(posts = medium | passed = yes) * P(time = medium | passed = yes) * P(passed = yes).
P(posts = medium | passed = yes) = ?
P(time = medium | passed = yes) = ?
P(passed = yes) = ?


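Five of the six students who passed have a medium number of posts:

$$
P(\text{posts} = \text{medium} \mid \text{passed} = \text{yes}) = \frac{5}{6} \approx 0.83
$$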


Computing (passed = yes | R):
P(passed = yes | time = medium, posts = medium) ∝ 0.83 * P(time = medium | passed = yes) * P(passed = yes).
P(posts = medium | passed = yes) = 0.83
P(time = medium | passed = yes) = ?
P(passed = yes) = ?


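Three of the six students who passed have a medium usage time:

$$
P(\text{time} = \text{medium} \mid \text{passed} = \text{yes}) = \frac{3}{6} = 0.5
$$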
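The two conditional probabilities computed so far are relative frequencies within the class of students who passed. A minimal sketch of that counting, using the raw values of the six "yes" rows from the data table; the helper names `is_medium` and `fraction_medium` are hypothetical.

```python
from fractions import Fraction

# Raw values for the six students who passed, taken from the data table.
posts_passed = [6, 5, 7, 5, 6, 10]
time_passed = [6, 6, 7, 8, 8, 10]

def is_medium(value):
    """True when the value falls in the Medium bin {4, 5, 6, 7}."""
    return 4 <= value <= 7

def fraction_medium(values):
    """Relative frequency of Medium values within one class."""
    return Fraction(sum(is_medium(v) for v in values), len(values))

p_posts = fraction_medium(posts_passed)
p_time = fraction_medium(time_passed)
print(p_posts, round(float(p_posts), 2))  # 5/6 0.83
print(p_time, float(p_time))              # 1/2 0.5
```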


Computing (passed = yes | R):
P(passed = yes | time = medium, posts = medium) ∝ 0.83 * 0.5 * P(passed = yes).
P(posts = medium | passed = yes) = 0.83
P(time = medium | passed = yes) = 0.5
P(passed = yes) = ?


Six of the twelve students passed:

$$
P(\text{passed} = \text{yes}) = \frac{6}{12} = 0.5
$$
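The class prior is the share of training records carrying each label. A short sketch under the assumption that the labels are listed in the same order as the data table (six "no" rows followed by six "yes" rows); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Class labels of the twelve training records, in table order.
labels = ["no"] * 6 + ["yes"] * 6

counts = Counter(labels)
priors = {label: Fraction(n, len(labels)) for label, n in counts.items()}
print(priors["yes"])  # 1/2
```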


Computing (passed = yes | R):
P(passed = yes | time = medium, posts = medium) ∝ 0.83 * 0.5 * 0.5 ≈ 0.21.
P(posts = medium | passed = yes) = 0.83;
P(time = medium | passed = yes) = 0.5;
P(passed = yes) = 0.5.
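The product for the "yes" class can be checked with exact fractions. The sketch below compares the exact value with the arithmetic on rounded factors; both round to 0.21. The variable names are illustrative only.

```python
from fractions import Fraction

p_posts = Fraction(5, 6)  # P(posts = medium | passed = yes)
p_time = Fraction(3, 6)   # P(time = medium | passed = yes)
p_yes = Fraction(6, 12)   # P(passed = yes)

exact = p_posts * p_time * p_yes
rounded_first = 0.83 * 0.5 * 0.5  # same product with 5/6 rounded to 0.83 first

print(exact, round(float(exact), 2))  # 5/24 0.21
print(round(rounded_first, 2))        # 0.21
```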


Computing (passed = no | R):
P(passed = no | time = medium, posts = medium) ∝ 0.5 * 0.67 * 0.5 ≈ 0.17.
P(posts = medium | passed = no) = 0.5;
P(time = medium | passed = no) = 0.67;
P(passed = no) = 0.5.




Classifying the instance:
(passed = no | R) = 0.17;
(passed = yes | R) = 0.21;
Since (passed = yes | R) > (passed = no | R), the prediction is that the student will pass the course.
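The final decision is an argmax over the per-class scores. The sketch below uses the factors exactly as stated on the slides (they are not recounted from the data table) and, as an optional extra step that the slides do not perform, divides by the sum of the scores so that the two values add up to 1. The dictionary names are hypothetical.

```python
# Per-class scores as stated on the slides (not recounted from the table).
scores = {
    "yes": 0.83 * 0.5 * 0.5,  # about 0.21
    "no": 0.5 * 0.67 * 0.5,   # about 0.17
}

# Pick the class with the highest score.
prediction = max(scores, key=scores.get)
print(prediction)  # yes

# Dividing by the sum turns the scores into probabilities that add up to 1.
total = sum(scores.values())
posterior = {label: s / total for label, s in scores.items()}
print({label: round(p, 2) for label, p in posterior.items()})
# {'yes': 0.55, 'no': 0.45}
```

Normalizing does not change which class wins; it only makes the two numbers directly readable as probabilities.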

Questions?