IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social Media Factors

IRJET Journal
IRJET JournalFast Track Publications

https://irjet.net/archives/V7/i3/IRJET-V7I3171.pdf

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 925
SOCIRANK IDENTIFYING AND RANKING PREVALENT NEWS TOPICS
USING SOCIAL MEDIA FACTORS
SRIKANTH.M1, RIHANA PARVEEN.SK2, SIVA SAI.B3, SRAVANI.M4, VENU GOPAL REDDY.G5
1B.Tech, M.Tech, Associate Professor Dept. of CSE, Tirumala Engineering College, Narasaraopet, Guntur, A.P, India
2345 B.Tech Students, Dept. of CSE, Tirumala Engineering College, Narasaraopet, Guntur, A.P, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Mass media sources, specifically the news media,
have traditionally informed us of daily events. In modern
times, social media services such as Twitter provide an
enormous amount of user- generated data, which have great
potential to contain informative news-related content. For
these resources to be useful, we must find a way to filter noise
and only capture the content that, based on its similarity to
the news media, is considered valuable. However, even after
noise is removed, information overload may still exist in the
remaining data—hence; it is convenient to prioritize it for
consumption. To achieve prioritization, information must be
ranked in order of estimated importance considering three
factors. First, the temporal prevalence of a particular topic in
the news media is a factor of importance, and can be
considered the media focus (MF) of a topic. Second, the
temporal prevalence of the topic in social media indicates its
user attention (UA). Last, the interaction between the social
media users who mention this topic indicates the strength of
the community discussing it, and can be regarded as the user
interaction (UI) toward thetopic. Weproposeanunsupervised
framework—SociRank—which identifies news topics
prevalent in both social media and the news media, and then
ranks them by relevance using their degrees of MF, UA, and UI.
Our experiments show that SociRank improvesthequalityand
variety of automatically identified news topics.
Key Words: Information filtering, social computing, social
network analysis, topic identification, topic ranking.
1. INTRODUCTION
The mining of valuable information from online sources has
become a prominent research area in information
technology in recent years. Historically, knowledge that
apprises the general public ofdailyeventshasbeenprovided
by mass media sources, specifically the newsmedia.Many of
these news media sources have either abandoned their
hardcopy publications and moved to the World Wide Web,
or now produce both hard-copy and Internet versions
simultaneously. These news media sources are considered
reliable because they are published by professional
journalists, who are held accountable for their content. On
the other hand, the Internet, being a free and open forum for
information exchange, has recently seen a fascinating
phenomenon knownassocial media.Insocial media,regular,
nonjournalist users are able to publish unverified content
and express their interest in certain events.
1.1 Selection of Common News Topics
Microblogs have become one of the most popular social
media outlets. One microblogging service in particular,
Twitter, is used by millions of people around the world,
providing enormous amounts of user-generated data. One
may assume that this sourcepotentiallycontainsinformation
with equal or greater value than the news media, but one
mustalso assume that because of the unverifiednatureofthe
source, much of this content is useless. For social media data
to be of any use fortopic identification, wemust find awayto
filter uninformative information and capture only
information which, based on its content similarity to the
news media, may be considered useful or valuable.
The news media presentsprofessionallyverifiedoccurrences
or events, while social media presents the interests of the
audience in these areas, and may thus provide insight into
their popularity. Social media services like Twitter can also
provide additional or supporting information to a particular
news media topic. In summary, truly valuable information
may be thought of as the area in which these two media
sources topically intersect. Unfortunately, even after the
removal of unimportant content, there is still information
overload in the remaining news-related data, which must be
prioritized for consumption.
1.2 Ranking them based on Social Media
A straightforward approach for identifying topics from
different social and news media sources is the application of
topic modeling. Many methods have been proposed in this
area, such as latent Dirichlet allocation (LDA) and
probabilistic latent semantic analysis (PLSA). Topic
modeling is, in essence, the discovery of “topics” in text
corpora by clustering together frequently co-occurring
words. This approach, however, misses out in the temporal
component of prevalent topic detection, that is, it does not
take into account how topics changewithtime.Furthermore,
topic modeling and other topic detection techniques do not
rank topics according to their popularity by taking into
account their prevalence in both news media and social
media.
We propose an unsupervised system—SociRank—which
effectively identifies news topics that are prevalent in both
social media and the news media, and then ranks them by
relevance using their degrees of MF, UA, andUI.Eventhough
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 926
this paper focuses on news topics, it can be easily adaptedto
a wide variety of fields, from science and technology to
culture and sports. To the best of our knowledge, no other
work attempts to employ the use of either the social media
interests of users or their social relationships to aid in the
ranking of topics. Moreover, SociRank undergoes an
empirical framework, comprising and integrating several
techniques, such as keyword extraction, measures of
similarity, graph clustering, and social network analysis.The
effectiveness of our system is validated by extensive
controlled and uncontrolled experiments.
2. Releated work
The main research areas applied in this paper include: topic
identification, topic ranking social, network analysis,
keyword extraction, co-occurrence similarity measures, and
graphclustering. Extensive workhasbeenconductedinmost
of these areas.
2.1 Topic Identification
Much research has been carried out in the field of topic
identification—referred to more formally as topic modeling.
Two traditional methods for detecting topics are LDA and
PLSA. LDA is a generative probabilistic model that can be
applied to different tasks, including topic identification.
PLSA, similarly, is a statistical technique, which can also be
applied to topic modeling. In these approaches, however,
temporal information is lost, which is paramount in
identifying prevalent topics and is an important
characteristic of social media data. Furthermore, LDA and
PLSA only discover topics from text corpora; they do not
rank based on popularity or prevalence.
Wartena and Brusseeimplementeda methodtodetecttopics
by clustering keywords. Their method entails the clustering
ofkeywords—basedondifferentsimilaritymeasures—using
the induced k-bisecting clustering algorithm. Although they
do not employ the use of graphs, they do observe that a
distance measure based on the Jensen–Shannon divergence
(or information radius)ofprobabilitydistributionsperforms
well.
2.2 Topic Ranking
Another major concept that is incorporatedintothispaperis
topic ranking. There are several means by which this task
can be accomplished, traditionally being done by estimating
how frequently and recently a topic has been reported by
mass media.
2.3 Social Network Analysis
In the case of UA, Wang, estimated this factor by using
anonymous website visitor data. Their method counts the
amount of times a site was visited during a particular period
of time, which represents the UA of the topic to which the
site is related. Our belief, on the other hand, is that, although
website usage statistics provide initial proof of attention,
additional data are needed to corroborate it. We employ the
use of social media, specifically Twitter, as a means to
estimate UA. When a user tweets about a particular topic, it
signifies that the user is interested in the topic and it has
captured her attention more so than visiting a website
related to it.
2.4 Keyword Extraction
Concerning the field of keyword or informative term
extraction, many unsupervised and supervised methods
have been proposed. Unsupervised methods for keyword
extraction rely solely on implicit information found in
individual texts or in a text corpus. Supervised methods, on
the other hand, make use of training datasets that have
already been classified.
2.5 Co-Occurrence Similarity
Matsuo and Ishizuka suggested that the co-occurrence
relationship of frequent word pairs from a single document
may provide statistical information to aid in the
identification of the document’s keywords. They proposed
that if the probability distribution of co-occurrencebetween
a term x and all other terms in a document is biased to a
particular subset of frequent terms, then term x is likely to
be a keyword. Even though our intentionisnottoemploy co-
occurrence for keyword extraction, this hypothesis
emphasizes the importance of co-occurrence relationships.
2.6 Graph Clustering
The main purpose of graph clustering in this paper is to
identify and separate TCs, as done in Wartena and Brussee’s
work. Iwasaka and Tanaka-Ishiialsoproposeda methodthat
clusters a co-occurrence graph based on a graph measure
known as transitivity. The basic idea of transitivity is that in
a relationship between three elements, if the relationship
holds between the first and second elements and between
the second and third elements, it also holdsbetweenthefirst
and third elements. They suggested that each output cluster
is expected to have no ambiguity, and that this is only
achieved when the edges of a graph (representing co-
occurrence relations) are transitive.
3. SOCIRANK FRAMEWORK
The goal of our method—SociRank—is to identify,
consolidate and rank the most prevalent topics discussed in
both news media and social media duringa specificperiodof
time. The system framework can be visualized in Fig. 1. To
achieve its goal, the system must undergo four main stages.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 927
1) Preprocessing: Key terms are extracted and filtered from
news and social data corresponding to a particular periodof
time.
2) Key Term Graph Construction: A graphisconstructedfrom
the previously extracted key term set, whose vertices
represent the key terms and edges represent the co-
occurrence similarity between them. The graph, after
processing and pruning, contains slightly joint clusters of
topics popular in both news media and social media.
3) Graph Clustering: The graph is clustered in order to
obtain well-defined and disjoint TCs.
4) Content Selection and Ranking:TheTCsfromthegraph are
selected and ranked using the three relevance factors (MF,
UA, and UI).
Initially, news and tweets data are crawledfromtheInternet
and stored in a database. News articles are obtained from
specific news websites via their RSS feeds and tweets are
crawled from the Twitter public timeline . A user then
requests an output of the top k ranked news topics for a
specified period of time between date d1 (start) and date d2
(end).
3.1 Preprocessing
In the preprocessing stage, the system first queries all news
articles and tweets from the database thatfall withindate d1
and date d2. Additionally, two sets of terms are created: one
for the news articles and one for the tweets, as explained
below.
1) News Term Extraction: The set of terms from the news
data source consists of keywords extracted from all the
queried articles. Due to its simple implementation and
effectiveness, we implement a variant of the popular
TextRank algorithm to extract the top k keywordsfromeach
news article.2 The selected keywords are then lemmatized
using the WordNet lemmatizer in order toconsiderdifferent
inflected forms of a word as a single item. After
lemmatization, all unique terms are added to set N. It is
worth pointing out that, since N is a set, it does not contain
duplicate terms.
2) Tweets Term Extraction: For the tweets data source, the
set of terms are not the tweets’ keywords, but all uniqueand
relevant terms. First, the language of each queried tweet is
identified, disregarding any tweet that is not in English.
From the remaining tweets, all terms that appear in a stop
word list or that are less than three characters in length are
eliminated. The part of speech (POS) of each term in the
tweets is then identified using a POS tagger. This POS tagger
is especially useful because it can identify Twitter-specific
POSs, such as hashtags, mentions, and emoticon symbols.
3.2 Key Term Graph Construction
In this component, a graph G is constructed,whoseclustered
nodes represent the most prevalent news topics in both
news and social media. The vertices in G are unique terms
selected from N and T, and the edges are represented by a
relationship between these terms. In the following sections,
we define a method for selecting the terms and establish a
relationship between them. After the terms and
relationships are identified, the graph is pruned by filtering
out unimportant vertices and edges.
3.3 Graph Clustering
Once graph G has been constructed and its most significant
terms (vertices) and term-pair co-occurrencevalues(edges)
have been selected, the next goal is to identify and separate
well-defined TCs (subgraphs)inthegraph.Before explaining
the graph clustering algorithm, the conceptsof betweenness
and transitivity must first be understood.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 928
3.4 Content Selection and Ranking
Now that the prevalent news-TCs that fall within dates d1
and d2 have been identified, relevant content from the two
media sources that is related tothesetopicsmustbeselected
and finally ranked. Related items from the news media will
represent the MF of the topic. Similarly, related items from
social media (Twitter) will represent the UA—more
specifically, the number of unique Twitter users related to
the selected tweets. Selecting the appropriate items (i.e.,
tweets and news articles) related to the key terms of a topic
is not an easy task, as many other items unrelated to the
desired topic also contain similar key terms.
4. CONCLUSION
In this paper, we proposed an unsupervised method—
SociRank—which identifies news topics prevalent in both
social media and the news media, and then ranks them by
taking into account their MF, UA, and UI as relevancefactors.
The temporal prevalence of a particular topic in the news
media is considered the MF of a topic, which gives us insight
into its mass media popularity. The temporal prevalence of
the topic in social media, specifically Twitter, indicates user
interest, and is considered its UA. Finally, the interaction
between the social media users who mention the topic
indicates the strength of the community discussing it, and is
considered the UI. To the best of our knowledge, no other
work has attempted to employ the use of either theinterests
of social media users or their social relationships to aid in
the ranking of topics.
REFERENCES
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet
allocation,” J. Mach. Learn. Res.,vol.3,pp.993–1022,Jan.
2003.
[2] T. Hofmann, “Probabilistic latent semantic analysis,” in
Proc. 15th Conf. Uncertainty Artif. Intell., 1999,pp.289–
296.
[3] T. Hofmann, “Probabilistic latent semantic indexing,” in
Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf.
Retrieval, Berkeley, CA, USA, 1999, pp. 50–57.
[4] C. Wartena and R. Brussee, “Topic detection by
clustering keywords,” in Proc. 19th Int. Workshop
Database Expert Syst. Appl. (DEXA), Turin, Italy, 2008,
pp. 54–58.
[5] C. D. Manning and H. Schütze, Foundations of Statistical
Natural Language Processing. Cambridge, MA,USA:MIT
Press, 1999.
[6] J. Sankaranarayanan, H. Samet, B. E. Teitler, M. D.
Lieberman, and J. Sperling, “TwitterStand: News in
tweets,” in Proc. 17th ACM SIGSPATIAL Int. Conf. Adv.
Geograph. Inf. Syst., Seattle, WA, USA, 2009, pp. 42–51.
[7] C. Zhang, “Automatic keyword extraction from
documents using conditional random fields,” J. Comput.
Inf. Syst., vol. 4, no. 3, pp. 1169–1180, 2008.
[8] K. Sarkar, M. Nasipuri, and S. Ghose, “A new approach to
keyphrase extraction using neural networks,” Int. J.
Comput. Sci. Issues, vol. 7, no. 3, pp. 16–25, Mar. 2010.
[9] G. Figueroa and Y.-S. Chen, “Collaborative ranking
between supervised and unsupervised approaches for
keyphrase extraction,” in Proc. Conf. Comput. Linguist.
Speech Process. (ROCLING), 2014, pp. 110–124.

Recomendados

IRJET- Identification of Prevalent News from Twitter and Traditional Media us... por
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...IRJET- Identification of Prevalent News from Twitter and Traditional Media us...
IRJET- Identification of Prevalent News from Twitter and Traditional Media us...IRJET Journal
16 visualizações8 slides
[IJET-V2I1P14] Authors:Aditi Verma, Rachana Agarwal, Sameer Bardia, Simran Sh... por
[IJET-V2I1P14] Authors:Aditi Verma, Rachana Agarwal, Sameer Bardia, Simran Sh...[IJET-V2I1P14] Authors:Aditi Verma, Rachana Agarwal, Sameer Bardia, Simran Sh...
[IJET-V2I1P14] Authors:Aditi Verma, Rachana Agarwal, Sameer Bardia, Simran Sh...IJET - International Journal of Engineering and Techniques
177 visualizações6 slides
Text mining on Twitter information based on R platform por
Text mining on Twitter information based on R platformText mining on Twitter information based on R platform
Text mining on Twitter information based on R platformFayan TAO
286 visualizações13 slides
Fuzzy AndANN Based Mining Approach Testing For Social Network Analysis por
Fuzzy AndANN Based Mining Approach Testing For Social Network AnalysisFuzzy AndANN Based Mining Approach Testing For Social Network Analysis
Fuzzy AndANN Based Mining Approach Testing For Social Network AnalysisIJERA Editor
59 visualizações5 slides
nm por
nmnm
nmZainab Nayyar
145 visualizações5 slides
Integrated expert recommendation model for online communitiesst02 por
Integrated expert recommendation model for online communitiesst02Integrated expert recommendation model for online communitiesst02
Integrated expert recommendation model for online communitiesst02IJwest
399 visualizações11 slides

Mais conteúdo relacionado

Mais procurados

Exploiting Wikipedia and Twitter for Text Mining Applications por
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsIRJET Journal
31 visualizações7 slides
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING por
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERINGCATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERINGijaia
74 visualizações12 slides
Data mining on Social Media por
Data mining on Social MediaData mining on Social Media
Data mining on Social Mediahome
898 visualizações10 slides
Predicting the Brand Popularity from the Brand Metadata por
Predicting the Brand Popularity from the Brand MetadataPredicting the Brand Popularity from the Brand Metadata
Predicting the Brand Popularity from the Brand MetadataIJECEIAES
10 visualizações13 slides
Data mining in social network por
Data mining in social networkData mining in social network
Data mining in social networkakash_mishra
13.7K visualizações28 slides
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK por
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKINFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKIAEME Publication
113 visualizações8 slides

Mais procurados(20)

Exploiting Wikipedia and Twitter for Text Mining Applications por IRJET Journal
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining Applications
IRJET Journal31 visualizações
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING por ijaia
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERINGCATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING
ijaia74 visualizações
Data mining on Social Media por home
Data mining on Social MediaData mining on Social Media
Data mining on Social Media
home898 visualizações
Predicting the Brand Popularity from the Brand Metadata por IJECEIAES
Predicting the Brand Popularity from the Brand MetadataPredicting the Brand Popularity from the Brand Metadata
Predicting the Brand Popularity from the Brand Metadata
IJECEIAES10 visualizações
Data mining in social network por akash_mishra
Data mining in social networkData mining in social network
Data mining in social network
akash_mishra13.7K visualizações
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK por IAEME Publication
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORKINFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
INFORMATION RETRIEVAL TOPICS IN TWITTER USING WEIGHTED PREDICTION NETWORK
IAEME Publication113 visualizações
Social Data Mining por Mahesh Meniya
Social Data MiningSocial Data Mining
Social Data Mining
Mahesh Meniya60.9K visualizações
Vol 7 No 1 - November 2013 por ijcsbi
Vol 7 No 1 - November 2013Vol 7 No 1 - November 2013
Vol 7 No 1 - November 2013
ijcsbi234 visualizações
Data collection thru social media por i4box Anon
Data collection thru social mediaData collection thru social media
Data collection thru social media
i4box Anon529 visualizações
Political prediction analysis using text mining and deep learning por Vishwambhar Deshpande
Political prediction analysis using text mining and deep learningPolitical prediction analysis using text mining and deep learning
Political prediction analysis using text mining and deep learning
Vishwambhar Deshpande34 visualizações
1 Crore Projects | ieee 2016 Projects | 2016 ieee Projects in chennai por 1crore projects
1 Crore Projects | ieee 2016 Projects | 2016 ieee Projects in chennai1 Crore Projects | ieee 2016 Projects | 2016 ieee Projects in chennai
1 Crore Projects | ieee 2016 Projects | 2016 ieee Projects in chennai
1crore projects49 visualizações
IRJET - Election Result Prediction using Sentiment Analysis por IRJET Journal
IRJET - Election Result Prediction using Sentiment AnalysisIRJET - Election Result Prediction using Sentiment Analysis
IRJET - Election Result Prediction using Sentiment Analysis
IRJET Journal37 visualizações
CONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTAL por cscpconf
CONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTALCONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTAL
CONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTAL
cscpconf38 visualizações
Contextual model of recommending resources on an academic networking portal por csandit
Contextual model of recommending resources on an academic networking portalContextual model of recommending resources on an academic networking portal
Contextual model of recommending resources on an academic networking portal
csandit432 visualizações
Measuring information credibility in social media using combination of user p... por IJECEIAES
Measuring information credibility in social media using combination of user p...Measuring information credibility in social media using combination of user p...
Measuring information credibility in social media using combination of user p...
IJECEIAES65 visualizações
Prediction of Reaction towards Textual Posts in Social Networks por Mohamed El-Geish
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social Networks
Mohamed El-Geish475 visualizações
Social Media Mining: An Introduction por Ali Abbasi
Social Media Mining: An IntroductionSocial Media Mining: An Introduction
Social Media Mining: An Introduction
Ali Abbasi2.6K visualizações
Jf2516311637 por IJERA Editor
Jf2516311637Jf2516311637
Jf2516311637
IJERA Editor223 visualizações
Social Network Analysis (Part 1) por Vala Ali Rohani
Social Network Analysis (Part 1)Social Network Analysis (Part 1)
Social Network Analysis (Part 1)
Vala Ali Rohani972 visualizações
A Review: Text Classification on Social Media Data por IOSR Journals
A Review: Text Classification on Social Media DataA Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media Data
IOSR Journals212 visualizações

Similar a IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social Media Factors

A REVIEW ON MACHINE LEARNING TECHNIQUES ON SOCIAL MEDIA DATA FOR POLICY MAKING por
A REVIEW ON MACHINE LEARNING TECHNIQUES ON SOCIAL MEDIA DATA FOR POLICY MAKINGA REVIEW ON MACHINE LEARNING TECHNIQUES ON SOCIAL MEDIA DATA FOR POLICY MAKING
A REVIEW ON MACHINE LEARNING TECHNIQUES ON SOCIAL MEDIA DATA FOR POLICY MAKINGDeja Lewis
2 visualizações6 slides
Irjet v4 i73A Survey on Student’s Academic Experiences using Social Media Data53 por
Irjet v4 i73A Survey on Student’s Academic Experiences using Social Media Data53Irjet v4 i73A Survey on Student’s Academic Experiences using Social Media Data53
Irjet v4 i73A Survey on Student’s Academic Experiences using Social Media Data53IRJET Journal
71 visualizações3 slides
Researching Social Media – Big Data and Social Media Analysis por
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
6K visualizações43 slides
Survey of data mining techniques for social por
Survey of data mining techniques for socialSurvey of data mining techniques for social
Survey of data mining techniques for socialFiras Husseini
550 visualizações25 slides
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study... por
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...AIRCC Publishing Corporation
4 visualizações16 slides
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY... por
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...ijcsit
6 visualizações16 slides

Similar a IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social Media Factors(20)

A REVIEW ON MACHINE LEARNING TECHNIQUES ON SOCIAL MEDIA DATA FOR POLICY MAKING por Deja Lewis
A REVIEW ON MACHINE LEARNING TECHNIQUES ON SOCIAL MEDIA DATA FOR POLICY MAKINGA REVIEW ON MACHINE LEARNING TECHNIQUES ON SOCIAL MEDIA DATA FOR POLICY MAKING
A REVIEW ON MACHINE LEARNING TECHNIQUES ON SOCIAL MEDIA DATA FOR POLICY MAKING
Deja Lewis2 visualizações
Irjet v4 i73A Survey on Student’s Academic Experiences using Social Media Data53 por IRJET Journal
Irjet v4 i73A Survey on Student’s Academic Experiences using Social Media Data53Irjet v4 i73A Survey on Student’s Academic Experiences using Social Media Data53
Irjet v4 i73A Survey on Student’s Academic Experiences using Social Media Data53
IRJET Journal71 visualizações
Researching Social Media – Big Data and Social Media Analysis por Farida Vis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
Farida Vis6K visualizações
Survey of data mining techniques for social por Firas Husseini
Survey of data mining techniques for socialSurvey of data mining techniques for social
Survey of data mining techniques for social
Firas Husseini550 visualizações
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study... por AIRCC Publishing Corporation
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...
Increasing the Investment’s Opportunities in Kingdom of Saudi Arabia By Study...
AIRCC Publishing Corporation4 visualizações
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY... por ijcsit
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...
ijcsit6 visualizações
Final_report6 por Priyanshu Kedia
Final_report6Final_report6
Final_report6
Priyanshu Kedia174 visualizações
Selasturkiye Guide To The Top 16 Social Media Research Questions By Mra Imro por Ziya NISANOGLU
Selasturkiye Guide To The Top 16 Social Media Research Questions By Mra ImroSelasturkiye Guide To The Top 16 Social Media Research Questions By Mra Imro
Selasturkiye Guide To The Top 16 Social Media Research Questions By Mra Imro
Ziya NISANOGLU312 visualizações
IRJET- Event Detection and Text Summary by Disaster Warning por IRJET Journal
IRJET- Event Detection and Text Summary by Disaster WarningIRJET- Event Detection and Text Summary by Disaster Warning
IRJET- Event Detection and Text Summary by Disaster Warning
IRJET Journal24 visualizações
The Implementation of Social Media for Educational Objectives por theijes
The Implementation of Social Media for Educational ObjectivesThe Implementation of Social Media for Educational Objectives
The Implementation of Social Media for Educational Objectives
theijes292 visualizações
F017433947 por IOSR Journals
F017433947F017433947
F017433947
IOSR Journals153 visualizações
E017433538 por IOSR Journals
E017433538E017433538
E017433538
IOSR Journals109 visualizações
Twitter_Hashtag_Prediction.pptx por SayaliKawale2
Twitter_Hashtag_Prediction.pptxTwitter_Hashtag_Prediction.pptx
Twitter_Hashtag_Prediction.pptx
SayaliKawale26 visualizações
Jx2517481755 por IJERA Editor
Jx2517481755Jx2517481755
Jx2517481755
IJERA Editor181 visualizações
Jx2517481755 por IJERA Editor
Jx2517481755Jx2517481755
Jx2517481755
IJERA Editor3.3K visualizações
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci... por IRJET Journal
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET Journal21 visualizações
Knime social media_white_paper por Firas Husseini
Knime social media_white_paperKnime social media_white_paper
Knime social media_white_paper
Firas Husseini299 visualizações
Analyzing-Threat-Levels-of-Extremists-using-Tweets por RESHAN FARAZ
Analyzing-Threat-Levels-of-Extremists-using-TweetsAnalyzing-Threat-Levels-of-Extremists-using-Tweets
Analyzing-Threat-Levels-of-Extremists-using-Tweets
RESHAN FARAZ89 visualizações
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on... por IRJET Journal
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET- A Survey on Trend Analysis on Twitter for Predicting Public Opinion on...
IRJET Journal19 visualizações

Mais de IRJET Journal

SOIL STABILIZATION USING WASTE FIBER MATERIAL por
SOIL STABILIZATION USING WASTE FIBER MATERIALSOIL STABILIZATION USING WASTE FIBER MATERIAL
SOIL STABILIZATION USING WASTE FIBER MATERIALIRJET Journal
25 visualizações7 slides
Sol-gel auto-combustion produced gamma irradiated Ni1-xCdxFe2O4 nanoparticles... por
Sol-gel auto-combustion produced gamma irradiated Ni1-xCdxFe2O4 nanoparticles...Sol-gel auto-combustion produced gamma irradiated Ni1-xCdxFe2O4 nanoparticles...
Sol-gel auto-combustion produced gamma irradiated Ni1-xCdxFe2O4 nanoparticles...IRJET Journal
8 visualizações7 slides
Identification, Discrimination and Classification of Cotton Crop by Using Mul... por
Identification, Discrimination and Classification of Cotton Crop by Using Mul...Identification, Discrimination and Classification of Cotton Crop by Using Mul...
Identification, Discrimination and Classification of Cotton Crop by Using Mul...IRJET Journal
8 visualizações5 slides
“Analysis of GDP, Unemployment and Inflation rates using mathematical formula... por
“Analysis of GDP, Unemployment and Inflation rates using mathematical formula...“Analysis of GDP, Unemployment and Inflation rates using mathematical formula...
“Analysis of GDP, Unemployment and Inflation rates using mathematical formula...IRJET Journal
13 visualizações11 slides
MAXIMUM POWER POINT TRACKING BASED PHOTO VOLTAIC SYSTEM FOR SMART GRID INTEGR... por
MAXIMUM POWER POINT TRACKING BASED PHOTO VOLTAIC SYSTEM FOR SMART GRID INTEGR...MAXIMUM POWER POINT TRACKING BASED PHOTO VOLTAIC SYSTEM FOR SMART GRID INTEGR...
MAXIMUM POWER POINT TRACKING BASED PHOTO VOLTAIC SYSTEM FOR SMART GRID INTEGR...IRJET Journal
14 visualizações6 slides
Performance Analysis of Aerodynamic Design for Wind Turbine Blade por
Performance Analysis of Aerodynamic Design for Wind Turbine BladePerformance Analysis of Aerodynamic Design for Wind Turbine Blade
Performance Analysis of Aerodynamic Design for Wind Turbine BladeIRJET Journal
7 visualizações5 slides

Mais de IRJET Journal(20)

SOIL STABILIZATION USING WASTE FIBER MATERIAL por IRJET Journal
SOIL STABILIZATION USING WASTE FIBER MATERIALSOIL STABILIZATION USING WASTE FIBER MATERIAL
SOIL STABILIZATION USING WASTE FIBER MATERIAL
IRJET Journal25 visualizações
Sol-gel auto-combustion produced gamma irradiated Ni1-xCdxFe2O4 nanoparticles... por IRJET Journal
Sol-gel auto-combustion produced gamma irradiated Ni1-xCdxFe2O4 nanoparticles...Sol-gel auto-combustion produced gamma irradiated Ni1-xCdxFe2O4 nanoparticles...
Sol-gel auto-combustion produced gamma irradiated Ni1-xCdxFe2O4 nanoparticles...
IRJET Journal8 visualizações
Identification, Discrimination and Classification of Cotton Crop by Using Mul... por IRJET Journal
Identification, Discrimination and Classification of Cotton Crop by Using Mul...Identification, Discrimination and Classification of Cotton Crop by Using Mul...
Identification, Discrimination and Classification of Cotton Crop by Using Mul...
IRJET Journal8 visualizações
“Analysis of GDP, Unemployment and Inflation rates using mathematical formula... por IRJET Journal
“Analysis of GDP, Unemployment and Inflation rates using mathematical formula...“Analysis of GDP, Unemployment and Inflation rates using mathematical formula...
“Analysis of GDP, Unemployment and Inflation rates using mathematical formula...
IRJET Journal13 visualizações
MAXIMUM POWER POINT TRACKING BASED PHOTO VOLTAIC SYSTEM FOR SMART GRID INTEGR... por IRJET Journal
MAXIMUM POWER POINT TRACKING BASED PHOTO VOLTAIC SYSTEM FOR SMART GRID INTEGR...MAXIMUM POWER POINT TRACKING BASED PHOTO VOLTAIC SYSTEM FOR SMART GRID INTEGR...
MAXIMUM POWER POINT TRACKING BASED PHOTO VOLTAIC SYSTEM FOR SMART GRID INTEGR...
IRJET Journal14 visualizações
Performance Analysis of Aerodynamic Design for Wind Turbine Blade por IRJET Journal
Performance Analysis of Aerodynamic Design for Wind Turbine BladePerformance Analysis of Aerodynamic Design for Wind Turbine Blade
Performance Analysis of Aerodynamic Design for Wind Turbine Blade
IRJET Journal7 visualizações
Heart Failure Prediction using Different Machine Learning Techniques por IRJET Journal
Heart Failure Prediction using Different Machine Learning TechniquesHeart Failure Prediction using Different Machine Learning Techniques
Heart Failure Prediction using Different Machine Learning Techniques
IRJET Journal7 visualizações
Experimental Investigation of Solar Hot Case Based on Photovoltaic Panel por IRJET Journal
Experimental Investigation of Solar Hot Case Based on Photovoltaic PanelExperimental Investigation of Solar Hot Case Based on Photovoltaic Panel
Experimental Investigation of Solar Hot Case Based on Photovoltaic Panel
IRJET Journal3 visualizações
Metro Development and Pedestrian Concerns por IRJET Journal
Metro Development and Pedestrian ConcernsMetro Development and Pedestrian Concerns
Metro Development and Pedestrian Concerns
IRJET Journal2 visualizações
Mapping the Crashworthiness Domains: Investigations Based on Scientometric An... por IRJET Journal
Mapping the Crashworthiness Domains: Investigations Based on Scientometric An...Mapping the Crashworthiness Domains: Investigations Based on Scientometric An...
Mapping the Crashworthiness Domains: Investigations Based on Scientometric An...
IRJET Journal3 visualizações
Data Analytics and Artificial Intelligence in Healthcare Industry por IRJET Journal
Data Analytics and Artificial Intelligence in Healthcare IndustryData Analytics and Artificial Intelligence in Healthcare Industry
Data Analytics and Artificial Intelligence in Healthcare Industry
IRJET Journal3 visualizações
DESIGN AND SIMULATION OF SOLAR BASED FAST CHARGING STATION FOR ELECTRIC VEHIC... por IRJET Journal
DESIGN AND SIMULATION OF SOLAR BASED FAST CHARGING STATION FOR ELECTRIC VEHIC...DESIGN AND SIMULATION OF SOLAR BASED FAST CHARGING STATION FOR ELECTRIC VEHIC...
DESIGN AND SIMULATION OF SOLAR BASED FAST CHARGING STATION FOR ELECTRIC VEHIC...
IRJET Journal54 visualizações
Efficient Design for Multi-story Building Using Pre-Fabricated Steel Structur... por IRJET Journal
Efficient Design for Multi-story Building Using Pre-Fabricated Steel Structur...Efficient Design for Multi-story Building Using Pre-Fabricated Steel Structur...
Efficient Design for Multi-story Building Using Pre-Fabricated Steel Structur...
IRJET Journal12 visualizações
Development of Effective Tomato Package for Post-Harvest Preservation por IRJET Journal
Development of Effective Tomato Package for Post-Harvest PreservationDevelopment of Effective Tomato Package for Post-Harvest Preservation
Development of Effective Tomato Package for Post-Harvest Preservation
IRJET Journal4 visualizações
“DYNAMIC ANALYSIS OF GRAVITY RETAINING WALL WITH SOIL STRUCTURE INTERACTION” por IRJET Journal
“DYNAMIC ANALYSIS OF GRAVITY RETAINING WALL WITH SOIL STRUCTURE INTERACTION”“DYNAMIC ANALYSIS OF GRAVITY RETAINING WALL WITH SOIL STRUCTURE INTERACTION”
“DYNAMIC ANALYSIS OF GRAVITY RETAINING WALL WITH SOIL STRUCTURE INTERACTION”
IRJET Journal5 visualizações
Understanding the Nature of Consciousness with AI por IRJET Journal
Understanding the Nature of Consciousness with AIUnderstanding the Nature of Consciousness with AI
Understanding the Nature of Consciousness with AI
IRJET Journal12 visualizações
Augmented Reality App for Location based Exploration at JNTUK Kakinada por IRJET Journal
Augmented Reality App for Location based Exploration at JNTUK KakinadaAugmented Reality App for Location based Exploration at JNTUK Kakinada
Augmented Reality App for Location based Exploration at JNTUK Kakinada
IRJET Journal6 visualizações
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba... por IRJET Journal
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...
IRJET Journal18 visualizações
Enhancing Real Time Communication and Efficiency With Websocket por IRJET Journal
Enhancing Real Time Communication and Efficiency With WebsocketEnhancing Real Time Communication and Efficiency With Websocket
Enhancing Real Time Communication and Efficiency With Websocket
IRJET Journal5 visualizações
Textile Industrial Wastewater Treatability Studies by Soil Aquifer Treatment ... por IRJET Journal
Textile Industrial Wastewater Treatability Studies by Soil Aquifer Treatment ...Textile Industrial Wastewater Treatability Studies by Soil Aquifer Treatment ...
Textile Industrial Wastewater Treatability Studies by Soil Aquifer Treatment ...
IRJET Journal4 visualizações

Último

DESIGN OF SPRINGS-UNIT4.pptx por
DESIGN OF SPRINGS-UNIT4.pptxDESIGN OF SPRINGS-UNIT4.pptx
DESIGN OF SPRINGS-UNIT4.pptxgopinathcreddy
19 visualizações47 slides
Proposal Presentation.pptx por
Proposal Presentation.pptxProposal Presentation.pptx
Proposal Presentation.pptxkeytonallamon
63 visualizações36 slides
Design of machine elements-UNIT 3.pptx por
Design of machine elements-UNIT 3.pptxDesign of machine elements-UNIT 3.pptx
Design of machine elements-UNIT 3.pptxgopinathcreddy
34 visualizações31 slides
_MAKRIADI-FOTEINI_diploma thesis.pptx por
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptxfotinimakriadi
10 visualizações32 slides
fakenews_DBDA_Mar23.pptx por
fakenews_DBDA_Mar23.pptxfakenews_DBDA_Mar23.pptx
fakenews_DBDA_Mar23.pptxdeepmitra8
16 visualizações34 slides
Update 42 models(Diode/General ) in SPICE PARK(DEC2023) por
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Tsuyoshi Horigome
39 visualizações16 slides

Último(20)

DESIGN OF SPRINGS-UNIT4.pptx por gopinathcreddy
DESIGN OF SPRINGS-UNIT4.pptxDESIGN OF SPRINGS-UNIT4.pptx
DESIGN OF SPRINGS-UNIT4.pptx
gopinathcreddy19 visualizações
Proposal Presentation.pptx por keytonallamon
Proposal Presentation.pptxProposal Presentation.pptx
Proposal Presentation.pptx
keytonallamon63 visualizações
Design of machine elements-UNIT 3.pptx por gopinathcreddy
Design of machine elements-UNIT 3.pptxDesign of machine elements-UNIT 3.pptx
Design of machine elements-UNIT 3.pptx
gopinathcreddy34 visualizações
_MAKRIADI-FOTEINI_diploma thesis.pptx por fotinimakriadi
_MAKRIADI-FOTEINI_diploma thesis.pptx_MAKRIADI-FOTEINI_diploma thesis.pptx
_MAKRIADI-FOTEINI_diploma thesis.pptx
fotinimakriadi10 visualizações
fakenews_DBDA_Mar23.pptx por deepmitra8
fakenews_DBDA_Mar23.pptxfakenews_DBDA_Mar23.pptx
fakenews_DBDA_Mar23.pptx
deepmitra816 visualizações
Update 42 models(Diode/General ) in SPICE PARK(DEC2023) por Tsuyoshi Horigome
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Update 42 models(Diode/General ) in SPICE PARK(DEC2023)
Tsuyoshi Horigome39 visualizações
Searching in Data Structure por raghavbirla63
Searching in Data StructureSearching in Data Structure
Searching in Data Structure
raghavbirla6314 visualizações
Ansari: Practical experiences with an LLM-based Islamic Assistant por M Waleed Kadous
Ansari: Practical experiences with an LLM-based Islamic AssistantAnsari: Practical experiences with an LLM-based Islamic Assistant
Ansari: Practical experiences with an LLM-based Islamic Assistant
M Waleed Kadous7 visualizações
SUMIT SQL PROJECT SUPERSTORE 1.pptx por Sumit Jadhav
SUMIT SQL PROJECT SUPERSTORE 1.pptxSUMIT SQL PROJECT SUPERSTORE 1.pptx
SUMIT SQL PROJECT SUPERSTORE 1.pptx
Sumit Jadhav 22 visualizações
sam_software_eng_cv.pdf por sammyigbinovia
sam_software_eng_cv.pdfsam_software_eng_cv.pdf
sam_software_eng_cv.pdf
sammyigbinovia9 visualizações
Web Dev Session 1.pptx por VedVekhande
Web Dev Session 1.pptxWeb Dev Session 1.pptx
Web Dev Session 1.pptx
VedVekhande13 visualizações
REACTJS.pdf por ArthyR3
REACTJS.pdfREACTJS.pdf
REACTJS.pdf
ArthyR335 visualizações
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc... por csegroupvn
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
Design of Structures and Foundations for Vibrating Machines, Arya-ONeill-Pinc...
csegroupvn6 visualizações
MongoDB.pdf por ArthyR3
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
ArthyR349 visualizações
GDSC Mikroskil Members Onboarding 2023.pdf por gdscmikroskil
GDSC Mikroskil Members Onboarding 2023.pdfGDSC Mikroskil Members Onboarding 2023.pdf
GDSC Mikroskil Members Onboarding 2023.pdf
gdscmikroskil59 visualizações
SPICE PARK DEC2023 (6,625 SPICE Models) por Tsuyoshi Horigome
SPICE PARK DEC2023 (6,625 SPICE Models) SPICE PARK DEC2023 (6,625 SPICE Models)
SPICE PARK DEC2023 (6,625 SPICE Models)
Tsuyoshi Horigome36 visualizações
MK__Cert.pdf por Hassan Khan
MK__Cert.pdfMK__Cert.pdf
MK__Cert.pdf
Hassan Khan16 visualizações
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf por AlhamduKure
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdfASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
AlhamduKure6 visualizações

IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social Media Factors

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 925 SOCIRANK IDENTIFYING AND RANKING PREVALENT NEWS TOPICS USING SOCIAL MEDIA FACTORS SRIKANTH.M1, RIHANA PARVEEN.SK2, SIVA SAI.B3, SRAVANI.M4, VENU GOPAL REDDY.G5 1B.Tech, M.Tech, Associate Professor Dept. of CSE, Tirumala Engineering College, Narasaraopet, Guntur, A.P, India 2345 B.Tech Students, Dept. of CSE, Tirumala Engineering College, Narasaraopet, Guntur, A.P, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Mass media sources, specifically the news media, have traditionally informed us of daily events. In modern times, social media services such as Twitter provide an enormous amount of user- generated data, which have great potential to contain informative news-related content. For these resources to be useful, we must find a way to filter noise and only capture the content that, based on its similarity to the news media, is considered valuable. However, even after noise is removed, information overload may still exist in the remaining data—hence; it is convenient to prioritize it for consumption. To achieve prioritization, information must be ranked in order of estimated importance considering three factors. First, the temporal prevalence of a particular topic in the news media is a factor of importance, and can be considered the media focus (MF) of a topic. Second, the temporal prevalence of the topic in social media indicates its user attention (UA). Last, the interaction between the social media users who mention this topic indicates the strength of the community discussing it, and can be regarded as the user interaction (UI) toward thetopic. Weproposeanunsupervised framework—SociRank—which identifies news topics prevalent in both social media and the news media, and then ranks them by relevance using their degrees of MF, UA, and UI. Our experiments show that SociRank improvesthequalityand variety of automatically identified news topics. Key Words: Information filtering, social computing, social network analysis, topic identification, topic ranking. 1. INTRODUCTION The mining of valuable information from online sources has become a prominent research area in information technology in recent years. Historically, knowledge that apprises the general public ofdailyeventshasbeenprovided by mass media sources, specifically the newsmedia.Many of these news media sources have either abandoned their hardcopy publications and moved to the World Wide Web, or now produce both hard-copy and Internet versions simultaneously. These news media sources are considered reliable because they are published by professional journalists, who are held accountable for their content. On the other hand, the Internet, being a free and open forum for information exchange, has recently seen a fascinating phenomenon knownassocial media.Insocial media,regular, nonjournalist users are able to publish unverified content and express their interest in certain events. 1.1 Selection of Common News Topics Microblogs have become one of the most popular social media outlets. One microblogging service in particular, Twitter, is used by millions of people around the world, providing enormous amounts of user-generated data. One may assume that this sourcepotentiallycontainsinformation with equal or greater value than the news media, but one mustalso assume that because of the unverifiednatureofthe source, much of this content is useless. For social media data to be of any use fortopic identification, wemust find awayto filter uninformative information and capture only information which, based on its content similarity to the news media, may be considered useful or valuable. The news media presentsprofessionallyverifiedoccurrences or events, while social media presents the interests of the audience in these areas, and may thus provide insight into their popularity. Social media services like Twitter can also provide additional or supporting information to a particular news media topic. In summary, truly valuable information may be thought of as the area in which these two media sources topically intersect. Unfortunately, even after the removal of unimportant content, there is still information overload in the remaining news-related data, which must be prioritized for consumption. 1.2 Ranking them based on Social Media A straightforward approach for identifying topics from different social and news media sources is the application of topic modeling. Many methods have been proposed in this area, such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA). Topic modeling is, in essence, the discovery of “topics” in text corpora by clustering together frequently co-occurring words. This approach, however, misses out in the temporal component of prevalent topic detection, that is, it does not take into account how topics changewithtime.Furthermore, topic modeling and other topic detection techniques do not rank topics according to their popularity by taking into account their prevalence in both news media and social media. We propose an unsupervised system—SociRank—which effectively identifies news topics that are prevalent in both social media and the news media, and then ranks them by relevance using their degrees of MF, UA, andUI.Eventhough
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 926 this paper focuses on news topics, it can be easily adaptedto a wide variety of fields, from science and technology to culture and sports. To the best of our knowledge, no other work attempts to employ the use of either the social media interests of users or their social relationships to aid in the ranking of topics. Moreover, SociRank undergoes an empirical framework, comprising and integrating several techniques, such as keyword extraction, measures of similarity, graph clustering, and social network analysis.The effectiveness of our system is validated by extensive controlled and uncontrolled experiments. 2. Releated work The main research areas applied in this paper include: topic identification, topic ranking social, network analysis, keyword extraction, co-occurrence similarity measures, and graphclustering. Extensive workhasbeenconductedinmost of these areas. 2.1 Topic Identification Much research has been carried out in the field of topic identification—referred to more formally as topic modeling. Two traditional methods for detecting topics are LDA and PLSA. LDA is a generative probabilistic model that can be applied to different tasks, including topic identification. PLSA, similarly, is a statistical technique, which can also be applied to topic modeling. In these approaches, however, temporal information is lost, which is paramount in identifying prevalent topics and is an important characteristic of social media data. Furthermore, LDA and PLSA only discover topics from text corpora; they do not rank based on popularity or prevalence. Wartena and Brusseeimplementeda methodtodetecttopics by clustering keywords. Their method entails the clustering ofkeywords—basedondifferentsimilaritymeasures—using the induced k-bisecting clustering algorithm. Although they do not employ the use of graphs, they do observe that a distance measure based on the Jensen–Shannon divergence (or information radius)ofprobabilitydistributionsperforms well. 2.2 Topic Ranking Another major concept that is incorporatedintothispaperis topic ranking. There are several means by which this task can be accomplished, traditionally being done by estimating how frequently and recently a topic has been reported by mass media. 2.3 Social Network Analysis In the case of UA, Wang, estimated this factor by using anonymous website visitor data. Their method counts the amount of times a site was visited during a particular period of time, which represents the UA of the topic to which the site is related. Our belief, on the other hand, is that, although website usage statistics provide initial proof of attention, additional data are needed to corroborate it. We employ the use of social media, specifically Twitter, as a means to estimate UA. When a user tweets about a particular topic, it signifies that the user is interested in the topic and it has captured her attention more so than visiting a website related to it. 2.4 Keyword Extraction Concerning the field of keyword or informative term extraction, many unsupervised and supervised methods have been proposed. Unsupervised methods for keyword extraction rely solely on implicit information found in individual texts or in a text corpus. Supervised methods, on the other hand, make use of training datasets that have already been classified. 2.5 Co-Occurrence Similarity Matsuo and Ishizuka suggested that the co-occurrence relationship of frequent word pairs from a single document may provide statistical information to aid in the identification of the document’s keywords. They proposed that if the probability distribution of co-occurrencebetween a term x and all other terms in a document is biased to a particular subset of frequent terms, then term x is likely to be a keyword. Even though our intentionisnottoemploy co- occurrence for keyword extraction, this hypothesis emphasizes the importance of co-occurrence relationships. 2.6 Graph Clustering The main purpose of graph clustering in this paper is to identify and separate TCs, as done in Wartena and Brussee’s work. Iwasaka and Tanaka-Ishiialsoproposeda methodthat clusters a co-occurrence graph based on a graph measure known as transitivity. The basic idea of transitivity is that in a relationship between three elements, if the relationship holds between the first and second elements and between the second and third elements, it also holdsbetweenthefirst and third elements. They suggested that each output cluster is expected to have no ambiguity, and that this is only achieved when the edges of a graph (representing co- occurrence relations) are transitive. 3. SOCIRANK FRAMEWORK The goal of our method—SociRank—is to identify, consolidate and rank the most prevalent topics discussed in both news media and social media duringa specificperiodof time. The system framework can be visualized in Fig. 1. To achieve its goal, the system must undergo four main stages.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 927 1) Preprocessing: Key terms are extracted and filtered from news and social data corresponding to a particular periodof time. 2) Key Term Graph Construction: A graphisconstructedfrom the previously extracted key term set, whose vertices represent the key terms and edges represent the co- occurrence similarity between them. The graph, after processing and pruning, contains slightly joint clusters of topics popular in both news media and social media. 3) Graph Clustering: The graph is clustered in order to obtain well-defined and disjoint TCs. 4) Content Selection and Ranking:TheTCsfromthegraph are selected and ranked using the three relevance factors (MF, UA, and UI). Initially, news and tweets data are crawledfromtheInternet and stored in a database. News articles are obtained from specific news websites via their RSS feeds and tweets are crawled from the Twitter public timeline . A user then requests an output of the top k ranked news topics for a specified period of time between date d1 (start) and date d2 (end). 3.1 Preprocessing In the preprocessing stage, the system first queries all news articles and tweets from the database thatfall withindate d1 and date d2. Additionally, two sets of terms are created: one for the news articles and one for the tweets, as explained below. 1) News Term Extraction: The set of terms from the news data source consists of keywords extracted from all the queried articles. Due to its simple implementation and effectiveness, we implement a variant of the popular TextRank algorithm to extract the top k keywordsfromeach news article.2 The selected keywords are then lemmatized using the WordNet lemmatizer in order toconsiderdifferent inflected forms of a word as a single item. After lemmatization, all unique terms are added to set N. It is worth pointing out that, since N is a set, it does not contain duplicate terms. 2) Tweets Term Extraction: For the tweets data source, the set of terms are not the tweets’ keywords, but all uniqueand relevant terms. First, the language of each queried tweet is identified, disregarding any tweet that is not in English. From the remaining tweets, all terms that appear in a stop word list or that are less than three characters in length are eliminated. The part of speech (POS) of each term in the tweets is then identified using a POS tagger. This POS tagger is especially useful because it can identify Twitter-specific POSs, such as hashtags, mentions, and emoticon symbols. 3.2 Key Term Graph Construction In this component, a graph G is constructed,whoseclustered nodes represent the most prevalent news topics in both news and social media. The vertices in G are unique terms selected from N and T, and the edges are represented by a relationship between these terms. In the following sections, we define a method for selecting the terms and establish a relationship between them. After the terms and relationships are identified, the graph is pruned by filtering out unimportant vertices and edges. 3.3 Graph Clustering Once graph G has been constructed and its most significant terms (vertices) and term-pair co-occurrencevalues(edges) have been selected, the next goal is to identify and separate well-defined TCs (subgraphs)inthegraph.Before explaining the graph clustering algorithm, the conceptsof betweenness and transitivity must first be understood.
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 03 | Mar 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.34 | ISO 9001:2008 Certified Journal | Page 928 3.4 Content Selection and Ranking Now that the prevalent news-TCs that fall within dates d1 and d2 have been identified, relevant content from the two media sources that is related tothesetopicsmustbeselected and finally ranked. Related items from the news media will represent the MF of the topic. Similarly, related items from social media (Twitter) will represent the UA—more specifically, the number of unique Twitter users related to the selected tweets. Selecting the appropriate items (i.e., tweets and news articles) related to the key terms of a topic is not an easy task, as many other items unrelated to the desired topic also contain similar key terms. 4. CONCLUSION In this paper, we proposed an unsupervised method— SociRank—which identifies news topics prevalent in both social media and the news media, and then ranks them by taking into account their MF, UA, and UI as relevancefactors. The temporal prevalence of a particular topic in the news media is considered the MF of a topic, which gives us insight into its mass media popularity. The temporal prevalence of the topic in social media, specifically Twitter, indicates user interest, and is considered its UA. Finally, the interaction between the social media users who mention the topic indicates the strength of the community discussing it, and is considered the UI. To the best of our knowledge, no other work has attempted to employ the use of either theinterests of social media users or their social relationships to aid in the ranking of topics. REFERENCES [1] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res.,vol.3,pp.993–1022,Jan. 2003. [2] T. Hofmann, “Probabilistic latent semantic analysis,” in Proc. 15th Conf. Uncertainty Artif. Intell., 1999,pp.289– 296. [3] T. Hofmann, “Probabilistic latent semantic indexing,” in Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, Berkeley, CA, USA, 1999, pp. 50–57. [4] C. Wartena and R. Brussee, “Topic detection by clustering keywords,” in Proc. 19th Int. Workshop Database Expert Syst. Appl. (DEXA), Turin, Italy, 2008, pp. 54–58. [5] C. D. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA,USA:MIT Press, 1999. [6] J. Sankaranarayanan, H. Samet, B. E. Teitler, M. D. Lieberman, and J. Sperling, “TwitterStand: News in tweets,” in Proc. 17th ACM SIGSPATIAL Int. Conf. Adv. Geograph. Inf. Syst., Seattle, WA, USA, 2009, pp. 42–51. [7] C. Zhang, “Automatic keyword extraction from documents using conditional random fields,” J. Comput. Inf. Syst., vol. 4, no. 3, pp. 1169–1180, 2008. [8] K. Sarkar, M. Nasipuri, and S. Ghose, “A new approach to keyphrase extraction using neural networks,” Int. J. Comput. Sci. Issues, vol. 7, no. 3, pp. 16–25, Mar. 2010. [9] G. Figueroa and Y.-S. Chen, “Collaborative ranking between supervised and unsupervised approaches for keyphrase extraction,” in Proc. Conf. Comput. Linguist. Speech Process. (ROCLING), 2014, pp. 110–124.