Topic modeling using big data analytics


Farheen Nilofer


INTRODUCTION
WORK FLOW:
Installation of Hadoop on multiple nodes, for distributed processing.
Pre-processing of the data set: cleaning the data and converting it into the desired format.
Passing the converted data to the modeling tool.
Parallelizing the computation and selecting the algorithm.
Comparison of results on the basis of:
    Efficiency of the different modelling algorithms.
    Computation on a single node vs. multiple nodes.
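
The pre-processing step in the work flow above can be sketched as follows. This is a minimal illustration, not the cleaning pipeline used in the project: the stop-word list, the `preprocess` function and the sample documents are all hypothetical.

```python
import re

# Hypothetical stop-word list for illustration; a real run would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "for"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Hypothetical sample documents.
docs = [
    "Hadoop is a framework for distributed processing.",
    "Topic models uncover the hidden themes in a corpus!",
]
cleaned = [preprocess(d) for d in docs]
print(cleaned)
```

The output is one token list per document, which is the kind of "desired format" a modeling tool can consume.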

WHAT is Big Data and Topic Modelling?
Big Data:
Data too large or complex to be stored or processed by traditional computing techniques.
EXAMPLES: black box data, social media, space exploration, power and grid stations, search engine data…
Topic Modelling:
Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections.
IN LAYMAN'S TERMS
A method of text mining to identify patterns in a corpus. Topic modeling helps us develop new ways to search,
browse and summarize large archives of texts.
[Figure: the three Vs of big data: VOLUME, VARIETY, VELOCITY]

WHY Topic Modelling using Big Data?
A topic model creates several different variables for every single word in the corpus. Even with
roughly 2,000 documents, the models we run reach the edge of what an average desktop machine can
handle, and commonly take a day.
Hadoop is a framework that provides the facilities needed to model such a large set of data.
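
To see why the number of variables grows so quickly, here is a rough count of the quantities an LDA-style sampler keeps in memory. This is a sketch under assumptions: only the 2,000 documents figure comes from the text above; the vocabulary size, token count and number of topics are hypothetical, and `lda_variable_count` is an illustrative helper.

```python
def lda_variable_count(num_docs, vocab_size, total_tokens, num_topics):
    """Count the main quantities an LDA-style sampler keeps in memory."""
    topic_assignments = total_tokens              # one latent topic per word occurrence
    doc_topic_counts = num_docs * num_topics      # topic counts per document
    topic_word_counts = num_topics * vocab_size   # word counts per topic
    return topic_assignments + doc_topic_counts + topic_word_counts

# 2,000 documents as in the text; the other sizes are assumed for illustration.
total = lda_variable_count(num_docs=2000, vocab_size=50000,
                           total_tokens=1000000, num_topics=100)
print(total)  # 6200000
```

Every one of these values is touched repeatedly during training, which is why splitting the work across nodes becomes attractive.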
[Figure: time taken to generate 5 MILLION GIGABYTES OF DATA. Labels: till 2003; 2 DAYS (2011); 10 MINS (2013); ?? (2015)]

HADOOP and its COMPONENTS
Hadoop:
An open-source framework written in Java.
It is designed to scale up from single servers to thousands
of machines, each offering local computation and storage.
It has two major components:
HDFS (Hadoop Distributed File System): for storage.
MapReduce: for processing the data (the programming model).
Hadoop Installation:
A cluster of 5 commodity machines (in our case).
Namenode: the manager.
Datanodes: the actual storage and processing units.
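
The MapReduce programming model can be illustrated with a word count simulated in plain Python. This is a single-process sketch of the map, shuffle and reduce phases, not Hadoop's Java API; the function names and the sample input are hypothetical.

```python
from collections import defaultdict

def map_phase(doc):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.lower().split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(docs):
    """Run map over every document, then shuffle, then reduce each key."""
    pairs = [pair for doc in docs for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big topics", "topics in data"])
print(counts)
```

On a real cluster the mappers and reducers run on the datanodes, and the framework performs the shuffle across the network.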

HOW Topic Modelling is Achieved Using
Big Data Analytics?
Proposed Algorithms:
Probabilistic Latent Semantic Indexing (PLSI):
A statistical technique for the analysis of two-mode and co-occurrence data.
Latent Dirichlet Allocation (LDA):
A way of automatically discovering the topics that documents contain.
Pachinko allocation:
Models correlations between topics, in addition to the word correlations that constitute topics.
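
As an illustration of how LDA can discover topics, here is a small collapsed Gibbs sampler. It is a sketch under assumptions, not the implementation used in this project or in any particular tool: the `lda_gibbs` function, the hyperparameters `alpha` and `beta`, and the toy corpus of word ids are all hypothetical.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, iters=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                 # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                               # n(k)
    assign = []                                                  # topic of each token

    # Give every token a random initial topic and record it in the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Sample a new topic and add the token back to the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

# Toy corpus: word ids 0-2 and 3-5 stand for two hypothetical themes.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 4, 3, 5]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=6)
print(doc_topic)
```

Each row of `doc_topic` says how many tokens of that document were assigned to each topic. The inner loop over tokens is the part that becomes expensive on a large corpus.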
[Figure: graphical model with variables d, w and z]

TOPIC MODELLING TOOLS
Tool: Mallet
Model/Algorithm: LDA (including Naïve Bayes, Maximum Entropy, and Decision Trees)
Language: Java
Introduction: Efficient routines for converting text to "features", a wide variety of algorithms ...
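
Converting text to "features" usually means mapping each token to an integer id and counting occurrences. The sketch below shows that idea in plain Python; it is not Mallet's own Java API, and `build_vocabulary`, `to_features` and the sample tokens are hypothetical names used only for illustration.

```python
from collections import Counter

def build_vocabulary(token_docs):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {}
    for tokens in token_docs:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_features(tokens, vocab):
    """Turn one tokenized document into a sparse {word_id: count} feature map."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return dict(counts)

# Hypothetical tokenized documents.
token_docs = [["topic", "model", "topic"], ["hadoop", "model"]]
vocab = build_vocabulary(token_docs)
features = [to_features(doc, vocab) for doc in token_docs]
print(vocab)
print(features)
```

The sparse count maps keep memory proportional to the words each document actually contains, rather than to the whole vocabulary.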

WHERE is Topic Modeling Using Big
Data Applied?
SOME APPLICATIONS OF TOPIC MODELING INCLUDE:
Topic Modeling for analyzing news articles.
Topic Modeling for PageRank in search engines.
Finding patterns in genetic data, images, social graphs.
Topic modeling on historical journals.


1. TOPIC MODELING USING BIG DATA ANALYTICS, by Sarah Masud (12-CSS-57) and Farheen Nilofer (12-CSS-23)


4. TOPIC MODELING IN IMPLEMENTATION


7. COMPARISON OF EXECUTION TIME

