Analysis of Metadata and Topic Modeling for
Academic Articles - MIS Quarterly Journal
Under the supervision of Dr. Arun Rai
By Jigar Mehta
May 12th, 2016
GRA Work Report Submission
Spring 2016
Results and Insights – MIS Quarterly Journal – Descriptive Stats
• #Articles published per year has increased twofold over 20 years
• Avg. #Keywords per article has doubled over 20 years
• 82% of articles belong to two dominant categories: Research Article/Note (60%) and Special Issue (20%)
• Avg. length of articles (number of pages) per year has increased threefold over the last 20 years
• Avg. abstract length per article was higher in ’05-’10 but has been consistent since then (~1500 characters)
• No significant trend in avg. title length (~100 characters) per article, apart from small variations by year
• For the last 5 years: avg. #Tables per article ranges from 7 to 8, whereas avg. #Figures per article is around 4
• For the last 5 years: avg. #References per article per year has seen a small increase (avg. ~85 references)
• On average there are two authors per article
Results and Insights – MIS Quarterly Journal – Content Analysis
• Based on topic modeling of abstracts only, over the last 20 years, these 8 topics are widely discussed by authors:
User/customer-centric approach and attributes
Product/service attributes
Ethics and legal issues
Project outsourcing, teams and offshoring
Scientific studies, analysis methods and models
Firms investments, working and capabilities
Decision support systems and framework
Organizational process development and framework
Topics and their weightage
Ethics and legal issues: 6%
Product/service attributes: 11%
Project outsourcing, teams and offshoring: 11%
Scientific studies, analysis methods and models: 12%
Decision support systems and framework: 13%
Firms investments, working and capabilities: 14%
User-centric: 16%
Organizational process development: 18%
Increasing Trend of Topics:
1. Product/service attributes
2. User-centric focused approach
3. Firms' investment & capability alignment
Decreasing Trend of Topics:
1. Ethics and legal issues
2. Project outsourcing
Consistent Trend of Topics:
1. Organizational process development
2. Scientific studies and models
3. Decision support systems
Visualizing Work Progress
Jan 12th: Project Objective and Framework Discussion
Jan 19th: MISQ Journal - Data Fetch
Feb 2nd: Python Script to create Metadata and Other tables
Feb 16th: Python Script for Base Table Preparation for Analysis
Mar 1st: R code for Word Clouds and Keywords Trend Analysis
Mar 8th: Academic Papers Descriptive Analysis - Code and Results
Mar 22nd: Topic Modeling - R code, results and Presentation
Apr 5th: Topic Modeling - Trend Analysis and Presentation
Apr 19th: Topic Modeling - Multiple iterations & Tableau
May 4th: Final Results
Keywords Trend Analysis – Comparison of Word Clouds across different Time horizons
Top Keywords by trend behavior
Shrinking: cognitive, agility, support, management, electronic, costs, risk, empirical, longitudinal, manufacturing, auctions
Growing: security, privacy, online, social, software, design, data, web, product, statistics, mobile, user-behavior, trust, quality
Persistent: business, knowledge, development, outsourcing, innovation, performance, model, science, network, process, analysis, value
Textual Analysis – Topic Modeling on Abstracts of Papers
Documents and words can be directly observed; topics are latent
Assumptions
Documents
• A document is a mix of topics
• A single document can consist of many topics, but in different proportions
• A topic is a mix of words
• Two documents with the same topics will overlap in their words
• Use statistics to find latent topics represented by groups of words
Topics
• Find topics that are as distinct from each other as possible
• Highlight the most heavily discussed topic(s) in each paper
• Keeping α low leads to a sparse topic distribution within each document
• Keeping β low leads to topics that have fewer words in common
Topic Modeling – Understanding LDA and latent parameters
Understanding Alpha and Beta parameters
α
• A high alpha value means that each document is likely to contain a mixture of most of the topics, and not any single topic specifically.
• A low alpha value places fewer such constraints on documents: a document is more likely to contain a mixture of just a few, or even only one, of the topics.
β
• A high beta value means that each topic is likely to contain a mixture of most of the words, and not any word specifically.
• A low beta value means that a topic may contain a mixture of just a few of the words.
Impact on Content
• In practice, a high alpha value leads to documents being more similar in terms of the topics they contain.
• A high beta value similarly leads to topics being more similar in terms of the words they contain.
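The effect of the concentration parameter can be checked numerically. The sketch below draws document-topic proportions from a symmetric Dirichlet prior and reports how dominant the largest topic is on average. It uses only the Python standard library; the function names, the 8 topics, the sample size and the α values are illustrative assumptions and are not taken from the project's own code.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw one k-dimensional sample from a symmetric Dirichlet(alpha)."""
    # A Dirichlet draw is a vector of independent Gamma(alpha, 1) draws, normalised to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def mean_largest_share(alpha, k=8, n_docs=2000, seed=0):
    """Average share held by the most prominent topic across simulated documents."""
    rng = random.Random(seed)
    return sum(max(sample_dirichlet(alpha, k, rng)) for _ in range(n_docs)) / n_docs

if __name__ == "__main__":
    # Hypothetical alpha values chosen only to contrast sparse and even mixtures.
    for alpha in (0.1, 1.0, 10.0):
        print(f"alpha={alpha}: mean largest topic share = {mean_largest_share(alpha):.2f}")
```

With a low α the largest share should come out close to 1 (documents dominated by one topic), and with a high α it should approach 1/K (documents spread over most topics). The same reasoning applies to β and the word distribution of each topic.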
Multiple Iterations – Tuning α, β, K and N – 60 Topic Models
K (number of topics): 5, 8, 12, 16, 20
N-iterations   N-iterations   α      β
700            1500           0.02   0.02
1000           1500           0.1    0.08
2000           1500           0.3    0.1
5000           1500           0.6    0.4
8000           1500           0.8    0.6
10000          1500           1      0.8
Insights
• As α increases, topics are more evenly distributed in terms of the proportion of documents they hold. Low values cause a sparse topic distribution; high values cause topics to share common themes and hence overlap.
• As β increases, topics become more similar in terms of the words they are made up of. Low values yield distinct topics; high values cause topics to be similar and overlap.
• As K increases, more topics are discovered. Low values cause significant topics to be missed, and higher values can cause overlapping and similar topics.
• As N increases, topic discovery becomes more stable and closer to convergence. Low values indicate unstable and unreliable topic discovery.
Topic Model Result 1
(Topics = 8, Iterations = 1800, alpha = 0.61, beta = 0.4)
Topic Trend over years and Top words for each Topic
User-centric behavior: user, influence, adoption, users, perceived, usage, factors, intention, security, behaviors, behavior, training, individual, acceptance, relationship, affect, support, efficacy, implementation, computer, beliefs
Product/Service attributes: product, service, quality, trust, privacy, price, consumer, electronic, markets, perceived, products, impact, content, market, effects, uncertainty, consumers, internet, sales, find, feedback
Epistemological perspectives in IS: work, theories, managers, professionals, quandaries, deception, ethical, term, increase, stakeholder, normative, challenges, managerial, explored, resolve, law, conflict, turnover, reported, ethics, violating
IS development / Project management (outsourcing/offshoring): project, task, time, communication, projects, groups, group, media, team, teams, members, differences, control, client, tasks, development, cultural, offshore, offshoring, learning, support
Research Design and Methods: studies, field, analysis, modeling, researchers, interpretive, constructs, methods, models, evaluation, case, science, measurement, construct, approach, validity, statistical, principles, structural, issues, techniques
IT Strategy / Business Value: firm, firms, strategic, strategy, risk, alignment, resource, capability, resources, investments, capabilities, level, significant, investment, outsourcing, benefits, industry, findings, network, governance, agility
Changing nature of computing: decision, support, making, virtual, effectiveness, complexity, problem, users', tools, effects, search, user, approach, world, develop, explanations, present, framework, existing, important, interface
Organizational processes: development, innovation, organizations, practice, technologies, analysis, develop, context, work, change, understanding, action, practices, theoretical, framework, case, processes, concept, developing, role, mechanisms
Topic Trend over the years
[Figure: yearly share of each topic, 1997 to 2015. Series: user-centric; product/service attributes; ethics and legal issues; project outsourcing, teams and offshoring; scientific studies, analysis methods and models; firms investments, working and capabilities; decision support systems and framework; organizational process development]
Pearson Correlation (Linear) amongst the topics
Topics, in row and column order:
(1) User-centric behavior
(2) Product/Service attributes
(3) Epistemological perspectives in IS
(4) IS development / Project management (outsourcing/offshoring)
(5) Research Design and Methods
(6) IT Strategy / Business Value
(7) Changing nature of computing
(8) Organizational processes
        (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
(1)    1.00   -0.45    0.08    0.13   -0.12   -0.47   -0.49    0.12
(2)   -0.45    1.00   -0.54   -0.27    0.22    0.21    0.04   -0.23
(3)    0.08   -0.54    1.00    0.20   -0.24   -0.27    0.47   -0.20
(4)    0.13   -0.27    0.20    1.00   -0.17   -0.48   -0.06   -0.17
(5)   -0.12    0.22   -0.24   -0.17    1.00   -0.04   -0.10   -0.38
(6)   -0.47    0.21   -0.27   -0.48   -0.04    1.00    0.15   -0.17
(7)   -0.49    0.04    0.47   -0.06   -0.10    0.15    1.00   -0.49
(8)    0.12   -0.23   -0.20   -0.17   -0.38   -0.17   -0.49    1.00
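A correlation matrix of this kind can be computed as in the minimal sketch below. The slide does not say which series were correlated, so the sketch assumes one series of yearly shares per topic. The topic names and the numbers in `shares` are made up for illustration and are not the deck's data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson (linear) correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(series_by_topic):
    """Pairwise Pearson correlations for a dict of topic name -> yearly shares."""
    names = list(series_by_topic)
    return {a: {b: pearson(series_by_topic[a], series_by_topic[b]) for b in names}
            for a in names}

# Hypothetical yearly shares for three topics (illustrative numbers only).
shares = {
    "topic_a": [0.10, 0.12, 0.15, 0.18, 0.20],
    "topic_b": [0.20, 0.18, 0.15, 0.13, 0.10],
    "topic_c": [0.12, 0.11, 0.13, 0.12, 0.12],
}
matrix = correlation_matrix(shares)
```

Because topic shares within a year sum to 100%, a rise in one topic forces a fall elsewhere, so some negative correlations between topics are expected by construction.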
Topic Model Result 2
(Topics = 8, Iterations = 1500, alpha = 0.02, beta = 0.02)
Hierarchical Topic Distribution of Major Topic
Social Media Topic
Top Keywords
T1: privacy, security, user, efficacy, users, influence, behaviors, resistance, work, computer, affect, compliance, intention, professionals, users', employees, individuals, attitudes, personal, cognitive
T2: virtual, task, media, groups, communication, time, teams, tasks, training, group, team, members, minute, partitioned, support, presented, worlds, periods, ideas, differences
T3: firm, service, firms, project, strategic, projects, risk, alignment, capability, investments, firm's, outsourcing, strategy, client, resource, cost, resources, enterprise, investment, level
T4: field, interpretive, science, principles, case, studies, evaluation, evaluating, published, journals, articles, methods, journal, discipline, critical, researchers, academic, methodology, rationale, issues
T5: acceptance, constructs, measurement, adoption, usage, models, cultural, culture, modeling, construct, perceived, differences, formative, structural, behavioral, validity, behavior, intention, ease, usefulness
T6: quandaries, theories, ethical, managers, stakeholder, fraud, ethics, resolve, explored, normative, violating, detection, deception, managerial, related, increase, real, world, actions, balancing
T7: product, price, decision, trust, products, markets, content, perceived, consumer, consumers, market, uncertainty, commerce, electronic, user, quality, auctions, sales, search, feedback
T8: support, analysis, work, understanding, develop, theoretical, quarterly, suggest, innovation, introduction, practice, level, important, approach, potential, framework, change, associate, action, community
Topic Distribution and Trend over years
[Figure: yearly topic distribution for topics T1 to T8]
Topic Distribution and Trend over years
[Figure: yearly topic distribution for topics T1 to T7]
T1: field, action, issues, theories, practice, researchers, address, discipline, studies, methods, academic, tools, argue, conducted, health, problems, principles, present, core, published
T2: business, strategy, creation, strategic, firms, performance, term, competitive, industry, internet, strategies, advantage, alignment, case, framework, investment, product, sustainability, firm, level
T3: decision, work, served, task, implementation, group, associate, virtual, explanations, team, reviewers, teams, context, findings, computer, support, users, professionals, conditions, making
T4: social, media, community, online, change, technologies, communities, power, complex, practice, networks, network, analysis, features, identity, transfer, people, time, practices, mechanisms
T5: model, theoretical, communication, empirical, analysis, behaviors, future, review, culture, framework, case, data, level, researchers, implications, privacy, contexts, impact, studies, develop
T6: innovation, digital, critical, human, service, context, construct, innovations, conceptual, modeling, technological, base, artifacts, infrastructure, path, phenomena, dominant, developing, lead, realism
T7: design, software, process, learning, world, support, science, problem, expertise, types, specific, project, experience, traditional, approach, control, dimensions, effectiveness, elements, search
StopWords used in Topic Modeling
'knowledge','information','system','research','paper','study','based','literature','article','tion',
'gss','cid','146','pls','font','misq','text','open','pp','vol','1px','post','number',
'quaterly','www','http','website','org','appendencies','border','systems','senior','accepting','2',
'theory','editor','keywords','?','1','mis','oss','technology','organization','management','knowledge',
'organization','organizations','organizational','development'
Appendix
Semantic Relatedness and TF-IDF
Dimensionality Reduction
• Reduce the high-dimensional term vector space to a low-dimensional 'latent' topic space
Semantic Analysis
• Two words co-occurring in a text signal that they are related
• Document frequency determines the strength of the signal
• Co-occurrence index
TF-IDF
• TF (Term Frequency): terms that occur more frequently in a document are more important
• IDF (Inverse Document Frequency): terms that occur in fewer documents are more specific
• TF * IDF indicates the importance of a term relative to the document
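As a worked illustration of the TF * IDF weighting described above, the sketch below uses one common variant: term frequency normalised by document length, multiplied by the unsmoothed log of inverse document frequency. The deck does not state which variant was used, so the formula choice, the function name and the toy documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy tokenised documents (illustrative only).
docs = [
    ["privacy", "security", "user", "privacy"],
    ["firm", "strategy", "user"],
    ["outsourcing", "project", "user"],
]
scores = tf_idf(docs)
```

Here "user" appears in every document, so its IDF is log(3/3) = 0 and it receives no weight, while "privacy" is both frequent in the first document and absent from the others, so it receives the highest weight there.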
Topic Modeling Process – LDA Implementation Steps (Part 1)
• Clean as much noise as possible from the abstracts and lowercase all of them
• Replace all special characters and apply n-gram tokenizing
• Lemmatize: reduce words to their root form, e.g., “reviews” and “reviewing” to “review”
• Remove numbers (e.g., “2014”), HTML tags and symbols
• Create dictionaries and a bag-of-words corpus
• Pass the corpus through the LDA algorithm and evaluate
[Diagram: Preprocessing (Tokenization, Lemmatization, Stopwords Removal); Vector Space Model (Dictionaries, Bag-of-Words); LDA (Tuning Parameters); Topics and their Words]
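The preprocessing steps listed above can be sketched with the Python standard library alone. This is an illustration under several assumptions: `naive_lemma` is a crude stand-in for a real lemmatizer, the stopword set is a tiny sample rather than the deck's full list, only unigrams are produced (no n-grams), and all function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; not the full list used in the project.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is", "are",
             "this", "paper", "study", "research", "information", "system"}

def naive_lemma(token):
    """Crude stand-in for a lemmatizer: strips a trailing 'ing' or plural 's'."""
    if token.endswith("ing") and len(token) > 6:
        return token[:-3]
    if token.endswith("s") and not token.endswith(("ss", "is", "us")) and len(token) > 4:
        return token[:-1]
    return token

def preprocess(abstract):
    """Lowercase, strip HTML tags, drop numbers and symbols, lemmatize, remove stopwords."""
    text = abstract.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # remove HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)  # remove numbers, punctuation and symbols
    tokens = [naive_lemma(t) for t in text.split()]
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]

def build_corpus(abstracts):
    """Build a term -> id dictionary and a bag-of-words list of (id, count) pairs per document."""
    docs = [preprocess(a) for a in abstracts]
    vocabulary = sorted({t for doc in docs for t in doc})
    dictionary = {term: idx for idx, term in enumerate(vocabulary)}
    bows = [sorted(Counter(dictionary[t] for t in doc).items()) for doc in docs]
    return dictionary, bows

abstracts = [
    "<p>This paper reviews user privacy in 2014.</p>",
    "Reviewing privacy and trust of online users.",
]
dictionary, bows = build_corpus(abstracts)
```

The resulting `bows` list is the kind of bag-of-words corpus that an LDA implementation takes as input, together with the dictionary that maps ids back to terms.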
Topic Modeling Generative Process
LDA Implementation Steps (Part 2)
For LDA the generative model consists of the following three steps:
Step 1: Select β
• The term distribution β is determined for each topic by β ∼ Dirichlet(δ).
Step 2: Select α
• The proportions θ of the topic distribution for the document w are determined by θ ∼ Dirichlet(α).
Step 3: Iterate
• For each of the N words wi:
• (a) Choose a topic zi ∼ Multinomial(θ).
• (b) Choose a word wi from a multinomial probability distribution conditioned on the topic zi: p(wi | zi, β).
* β is the term distribution of topics and contains the probability of a word occurring in a given topic.
* The process is purely based on frequency and co-occurrence of words.
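The three generative steps can be simulated directly. The sketch below generates a single synthetic document using only the Python standard library. The vocabulary, the number of topics and the values of α and δ are illustrative assumptions, and the function names are hypothetical. In a full corpus, β would be drawn once and shared by all documents, while θ is drawn per document.

```python
import random

def dirichlet(concentration, size, rng):
    """Symmetric Dirichlet draw built from normalised Gamma(concentration, 1) variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(vocab, n_topics, n_words, alpha, delta, seed=0):
    """Simulate the LDA generative process for one document."""
    rng = random.Random(seed)
    # Step 1: one term distribution per topic, beta_k ~ Dirichlet(delta).
    beta = [dirichlet(delta, len(vocab), rng) for _ in range(n_topics)]
    # Step 2: topic proportions for this document, theta ~ Dirichlet(alpha).
    theta = dirichlet(alpha, n_topics, rng)
    topics, words = [], []
    for _ in range(n_words):
        # Step 3a: choose a topic z_i ~ Multinomial(theta).
        z = rng.choices(range(n_topics), weights=theta)[0]
        # Step 3b: choose a word w_i from p(w_i | z_i, beta).
        w = rng.choices(vocab, weights=beta[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words

# Hypothetical vocabulary and settings, for illustration only.
vocab = ["privacy", "trust", "firm", "strategy", "project", "team"]
theta, topics, words = generate_document(vocab, n_topics=3, n_words=12, alpha=0.5, delta=0.5)
```

Fitting LDA runs this process in reverse: given only the observed words, it infers the latent θ and β that most plausibly generated them.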
[Figure panels: 1996-2000, 2001-2005, 2006-2010, 2011-2015]
Number of Articles Published by the Year of Publication (1977 – 2015)
Total Papers = 1081
Number of Articles Published by the Category of Paper (2000-2015)
# Articles by Category
Research Article: 281
Special Issue: 111
Research Note: 69
Issues and Opinions: 41
Research Essay: 25
Theory and Review: 21
MISQ Review: 7
SIM Paper Competition: 3
Total Papers = 551
Trend of Average # Keywords Per Article by Year (1996 – 2015)
Avg. #Keywords per article has doubled over 20 years
Total Papers = 584
Trend of Average Abstract length per Article by Year (1996 – 2015)
Avg Abs length by year:
1997: 1462, 1998: 1407, 1999: 1329, 2000: 1387, 2001: 1535,
2002: 1181, 2003: 1374, 2004: 1502, 2005: 1948, 2006: 2102,
2007: 1926, 2008: 2079, 2009: 2082, 2010: 1555, 2011: 1497,
2012: 1451, 2013: 1499, 2014: 1477, 2015: 1467
Total Papers = 584
Trend of Average Title length per Article by Year (2000 – 2015)
Average Title Length by year (chart also shows a linear trend line):
2000: 92, 2001: 95, 2002: 85, 2003: 94, 2004: 109, 2005: 101,
2006: 89, 2007: 83, 2008: 94, 2009: 107, 2010: 95, 2011: 86,
2012: 87, 2013: 98, 2014: 103, 2015: 100
Total Papers = 551
Article Size – # Pages Per Article v/s Avg File Size (KBs)
Total Papers = 1,081
[Figure: Avg File Size (KB) and Avg Number of Pages Per Article by year, 1977 to 2015]

Mais conteúdo relacionado

Mais procurados

An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeTraian Rebedea
 
Automated content analysis of cognitive presence: Improving the quality of in...
Automated content analysis of cognitive presence: Improving the quality of in...Automated content analysis of cognitive presence: Improving the quality of in...
Automated content analysis of cognitive presence: Improving the quality of in...Vitomir Kovanovic
 
How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...Traian Rebedea
 
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...Traian Rebedea
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 
Successes and Frontiers of Deep Learning
Successes and Frontiers of Deep LearningSuccesses and Frontiers of Deep Learning
Successes and Frontiers of Deep LearningSebastian Ruder
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue
Transformers to Learn Hierarchical Contexts in Multiparty DialogueTransformers to Learn Hierarchical Contexts in Multiparty Dialogue
Transformers to Learn Hierarchical Contexts in Multiparty DialogueJinho Choi
 
Transfer Learning for Natural Language Processing
Transfer Learning for Natural Language ProcessingTransfer Learning for Natural Language Processing
Transfer Learning for Natural Language ProcessingSebastian Ruder
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Lifeng (Aaron) Han
 
Contributions to the multidisciplinarity of computer science and IS
Contributions to the multidisciplinarity of computer science and ISContributions to the multidisciplinarity of computer science and IS
Contributions to the multidisciplinarity of computer science and ISSaïd Assar
 
LAK13 linkedup tutorial_evaluation_framework
LAK13 linkedup tutorial_evaluation_frameworkLAK13 linkedup tutorial_evaluation_framework
LAK13 linkedup tutorial_evaluation_frameworkHendrik Drachsler
 
Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E...
Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E...Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E...
Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E...Olga Maksimenkova
 

Mais procurados (15)

An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
 
Automated content analysis of cognitive presence: Improving the quality of in...
Automated content analysis of cognitive presence: Improving the quality of in...Automated content analysis of cognitive presence: Improving the quality of in...
Automated content analysis of cognitive presence: Improving the quality of in...
 
How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...How useful are semantic links for the detection of implicit references in csc...
How useful are semantic links for the detection of implicit references in csc...
 
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
PhD Defense: Computer-Based Support and Feedback for Collaborative Chat Conve...
 
Question answering
Question answeringQuestion answering
Question answering
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Successes and Frontiers of Deep Learning
Successes and Frontiers of Deep LearningSuccesses and Frontiers of Deep Learning
Successes and Frontiers of Deep Learning
 
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue
Transformers to Learn Hierarchical Contexts in Multiparty DialogueTransformers to Learn Hierarchical Contexts in Multiparty Dialogue
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue
 
Fuschi current Research and Developments
Fuschi current Research and DevelopmentsFuschi current Research and Developments
Fuschi current Research and Developments
 
Transfer Learning for Natural Language Processing
Transfer Learning for Natural Language ProcessingTransfer Learning for Natural Language Processing
Transfer Learning for Natural Language Processing
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...
 
Contributions to the multidisciplinarity of computer science and IS
Contributions to the multidisciplinarity of computer science and ISContributions to the multidisciplinarity of computer science and IS
Contributions to the multidisciplinarity of computer science and IS
 
LAK13 linkedup tutorial_evaluation_framework
LAK13 linkedup tutorial_evaluation_frameworkLAK13 linkedup tutorial_evaluation_framework
LAK13 linkedup tutorial_evaluation_framework
 
Latex
LatexLatex
Latex
 
Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E...
Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E...Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E...
Providing Cognitive Scaffolding within Computer-Supported Adaptive Learning E...
 

Destaque

Topic Models, LDA and all that
Topic Models, LDA and all thatTopic Models, LDA and all that
Topic Models, LDA and all thatZhibo Xiao
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet AllocationMarco Righini
 
Contents page deconstructions
Contents page deconstructionsContents page deconstructions
Contents page deconstructionsdemig98
 
Centro educativo
Centro educativo Centro educativo
Centro educativo Doris975
 
IMA-Europe 2016 Awards - Introduction
IMA-Europe 2016 Awards - IntroductionIMA-Europe 2016 Awards - Introduction
IMA-Europe 2016 Awards - IntroductionAmina Langedijk
 
Advanced Wireless Report_Styles
Advanced Wireless Report_StylesAdvanced Wireless Report_Styles
Advanced Wireless Report_StylesMalcolm Lenore
 
Modificacion ley 6730 Xumek
Modificacion ley 6730 XumekModificacion ley 6730 Xumek
Modificacion ley 6730 Xumekanitapnegrim
 
Desempleo juvenil Adecco
Desempleo juvenil AdeccoDesempleo juvenil Adecco
Desempleo juvenil Adeccoanitapnegrim
 
կոմիտասի մասին
կոմիտասի մասինկոմիտասի մասին
կոմիտասի մասինvaheanush
 
Zena J. Zahran: Personal Persona Project
Zena J. Zahran:  Personal Persona ProjectZena J. Zahran:  Personal Persona Project
Zena J. Zahran: Personal Persona ProjectZena Zahran
 
Valley's Active Communication Experience
Valley's Active Communication ExperienceValley's Active Communication Experience
Valley's Active Communication ExperienceValley Expo Displays
 
IMIS 2016 - Санкт-Петербургский международный мотосалон
IMIS 2016 - Санкт-Петербургский международный мотосалонIMIS 2016 - Санкт-Петербургский международный мотосалон
IMIS 2016 - Санкт-Петербургский международный мотосалонСергей Терещенко
 
Jusqu'où les entreprises sont-elles prêtes à aller trop loin pour le buzz
Jusqu'où les entreprises sont-elles prêtes à aller trop loin pour le buzzJusqu'où les entreprises sont-elles prêtes à aller trop loin pour le buzz
Jusqu'où les entreprises sont-elles prêtes à aller trop loin pour le buzzXavier Chefneux
 
Cardinal eganfuneralprogram
Cardinal eganfuneralprogramCardinal eganfuneralprogram
Cardinal eganfuneralprogramamason04
 

Destaque (19)

Topic Models, LDA and all that
Topic Models, LDA and all thatTopic Models, LDA and all that
Topic Models, LDA and all that
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
Contents page deconstructions
Contents page deconstructionsContents page deconstructions
Contents page deconstructions
 
Centro educativo
Centro educativo Centro educativo
Centro educativo
 
Lifi
LifiLifi
Lifi
 
IMA-Europe 2016 Awards - Introduction
IMA-Europe 2016 Awards - IntroductionIMA-Europe 2016 Awards - Introduction
IMA-Europe 2016 Awards - Introduction
 
Advanced Wireless Report_Styles
Advanced Wireless Report_StylesAdvanced Wireless Report_Styles
Advanced Wireless Report_Styles
 
Modificacion ley 6730 Xumek
Modificacion ley 6730 XumekModificacion ley 6730 Xumek
Modificacion ley 6730 Xumek
 
Julia Pox Interview
Julia Pox InterviewJulia Pox Interview
Julia Pox Interview
 
Desempleo juvenil Adecco
Desempleo juvenil AdeccoDesempleo juvenil Adecco
Desempleo juvenil Adecco
 
կոմիտասի մասին
կոմիտասի մասինկոմիտասի մասին
կոմիտասի մասին
 
Neurafy_EN
Neurafy_ENNeurafy_EN
Neurafy_EN
 
Zena J. Zahran: Personal Persona Project
Zena J. Zahran:  Personal Persona ProjectZena J. Zahran:  Personal Persona Project
Zena J. Zahran: Personal Persona Project
 
Valley's Active Communication Experience
Valley's Active Communication ExperienceValley's Active Communication Experience
Valley's Active Communication Experience
 
IMIS 2016 - Санкт-Петербургский международный мотосалон
IMIS 2016 - Санкт-Петербургский международный мотосалонIMIS 2016 - Санкт-Петербургский международный мотосалон
IMIS 2016 - Санкт-Петербургский международный мотосалон
 
Jusqu'où les entreprises sont-elles prêtes à aller trop loin pour le buzz
Jusqu'où les entreprises sont-elles prêtes à aller trop loin pour le buzzJusqu'où les entreprises sont-elles prêtes à aller trop loin pour le buzz
Jusqu'où les entreprises sont-elles prêtes à aller trop loin pour le buzz
 
Cardinal eganfuneralprogram
Cardinal eganfuneralprogramCardinal eganfuneralprogram
Cardinal eganfuneralprogram
 

Semelhante a Analysis of Metadata and Topic Modeling for

Manuscript editing | Research data analyst | Data analysis
Manuscript editing | Research data analyst | Data analysisManuscript editing | Research data analyst | Data analysis
Manuscript editing | Research data analyst | Data analysisPubrica
 
School of Accounting Trimester 3A 2013 Information Sheet Tes.docx
School of Accounting Trimester 3A 2013 Information Sheet Tes.docxSchool of Accounting Trimester 3A 2013 Information Sheet Tes.docx
School of Accounting Trimester 3A 2013 Information Sheet Tes.docxkenjordan97598
 
Workshop IEEE na USP – Como aumentar o impacto de suas pesquisas e publicações
Workshop IEEE na USP – Como aumentar o impacto de suas pesquisas e publicaçõesWorkshop IEEE na USP – Como aumentar o impacto de suas pesquisas e publicações
Workshop IEEE na USP – Como aumentar o impacto de suas pesquisas e publicaçõesSIBiUSP
 
Analysis of Metadata and Topic Modeling for

  • 1. Analysis of Metadata and Topic Modeling for Academic Articles - MIS Quarterly Journal. Under the supervision of Dr. Arun Rai. By Jigar Mehta. May 12th, 2016. GRA Work Report Submission, Spring 2016.
  • 2. Results and Insights – MIS Quarterly Journal – Descriptive Stats
      • #Articles published per year has increased twofold over 20 years
      • Avg. #Keywords per article has doubled over 20 years
      • 82% of articles belong to two dominant categories: Research Article/Note (60%) and Special Issue (20%)
      • Avg. length of articles (number of pages) per year has increased threefold over the last 20 years
      • Avg. abstract length per article was higher in ’05-’10 but has been consistent since then (~1500 characters)
      • No significant trend in avg. title length (~100 characters) per article except for small variations by year
      • For the last 5 years: avg. #Tables per article ranges from 7 to 8, whereas avg. #Figures per article is around 4
      • For the last 5 years: avg. #References per article per year has seen a small increase (avg. ~85 references)
      • On average there are two authors per article
  • 3. Results and Insights – MIS Quarterly Journal – Content Analysis
      • Based on topic modeling of abstracts only for the last 20 years, these 8 topics are widely discussed by authors (with their weightage from the "Topics and their weightage" chart):
          Ethics and legal issues: 6%
          Product/service attributes: 11%
          Project outsourcing, teams and offshoring: 11%
          Scientific studies, analysis methods and models: 12%
          Decision support systems and framework: 13%
          Firms investments, working and capabilities: 14%
          User/customer centric – approach and attributes: 16%
          Organizational process development and framework: 18%
      • Increasing trend of topics: 1. Product/service attributes 2. User-centric focused approach 3. Firms investment & capability alignment
      • Decreasing trend of topics: 1. Ethics and legal issues 2. Project outsourcing
      • Consistent trend of topics: 1. Organizational processes dev 2. Scientific studies and models 3. Decision support systems
  • 4. Visualizing Work Progress
      • Jan 12th: Project Objective and Framework Discussion
      • Jan 19th: MISQ Journal - Data Fetch
      • Feb 2nd: Python Script to create Metadata and Other tables
      • Feb 16th: Python Script for Base Table Preparation for Analysis
      • Mar 1st: R code for Word Clouds and Keywords Trend Analysis
      • Mar 8th: Academic Papers Descriptive Analysis - Code and Results
      • Mar 22nd: Topic Modeling - R code, results and Presentation
      • Apr 5th: Topic Modeling - Trend Analysis and Presentation
      • Apr 19th: Topic Modeling - Multiple iterations & Tableau
      • May 4th: Final Results
  • 5. Keywords Trend Analysis – Comparison of Word Clouds across different Time horizons
  • 7. Textual Analysis – Topic Modeling on Abstracts of Papers. Documents and words can be directly observed; topics are latent.
  • 8. Assumptions
      Documents
      • A document is a mix of topics
      • A single document can consist of many topics, but in different proportions
      • A topic is a mix of words
      • Two documents with the same topics will have overlap in words
      • Use statistics to find latent topics represented by groups of words
      Topics
      • To find topics that are as distinct from each other as possible
      • To highlight the most heavily discussed topic(s) in each paper
      • Keeping α low will lead to a sparse topic distribution
      • Keeping β low will lead to topics having fewer words in common
  • 9. Topic Modeling – Understanding LDA and latent parameters
  • 10. Understanding Alpha and Beta parameters
      α
      • A high α value means that each document is likely to contain a mixture of most of the topics, and not any single topic specifically
      • A low α value places fewer such constraints on documents, so a document is more likely to contain a mixture of just a few, or even only one, of the topics
      β
      • A high β value means that each topic is likely to contain a mixture of most of the words, and not any word specifically
      • A low β value means that a topic may contain a mixture of just a few of the words
      Impact on Content
      • In practice, a high α value will lead to documents being more similar in terms of the topics they contain
      • A high β value will similarly lead to topics being more similar in terms of the words they contain
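The effect of α described on this slide can be checked numerically. The sketch below is an illustration only, not the report's code: it draws document-topic mixtures from a symmetric Dirichlet prior using the standard library (normalized Gamma draws) and reports how much of a document its single largest topic takes up on average. The function names, the choice of 8 topics, and the sample size are assumptions made for the example.

```python
import random


def sample_dirichlet(alpha, k, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha) over k topics.

    Uses the standard construction: k independent Gamma(alpha, 1) draws,
    normalized to sum to 1.
    """
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
        total = sum(draws)
        if total > 0:  # guard against all draws underflowing to zero
            return [d / total for d in draws]


def avg_dominant_share(alpha, k=8, n_docs=2000, seed=0):
    """Average share of the largest topic across n_docs simulated documents."""
    rng = random.Random(seed)
    shares = (max(sample_dirichlet(alpha, k, rng)) for _ in range(n_docs))
    return sum(shares) / n_docs


if __name__ == "__main__":
    # Low alpha: one topic dominates each document (sparse mixtures).
    # High alpha: topics are spread more evenly within each document.
    for alpha in (0.02, 0.6, 5.0):
        print(f"alpha={alpha}: dominant topic share ~ {avg_dominant_share(alpha):.2f}")
```

The same construction applies to β if the draws are taken over the vocabulary instead of over the topics.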
  • 11. Multiple Iterations – Tuning α, β, K and N – 60 Topic Models
      K (number of topics): 5, 8, 12, 16, 20
      N-iterations   N-iterations   α      β
      700            1500           0.02   0.02
      1000           1500           0.1    0.08
      2000           1500           0.3    0.1
      5000           1500           0.6    0.4
      8000           1500           0.8    0.6
      10000          1500           1      0.8
      Insights
      • As α increases, topics are more evenly distributed in terms of the proportion of documents they hold. Low values cause a sparse topic distribution; high values cause topics to share common themes and hence overlap.
      • As β increases, topics are made up of more similar words and end up being more similar to each other. Low values give unique topics; high values cause topics to be similar and overlap.
      • As K increases, more topics are discovered. Low values cause significant topics to be missed, and high values can cause overlapping and similar topics.
      • As N increases, topic discovery becomes more stable and is more likely to have converged. Low values indicate unstable and unreliable topic discovery.
  • 12. Topic Model Result 1 (Topics = 8, Iterations = 1800, alpha = 0.61, beta = 0.4)
  • 13. Topic Trend over years and Top words for each Topic
      Top words for each topic:
      • User-centric behavior: user, influence, adoption, users, perceived, usage, factors, intention, security, behaviors, behavior, training, individual, acceptance, relationship, affect, support, efficacy, implementation, computer, beliefs
      • Product/Service attributes: product, service, quality, trust, privacy, price, consumer, electronic, markets, perceived, products, impact, content, market, effects, uncertainty, consumers, internet, sales, find, feedback
      • Epistemological perspectives in IS: work, theories, managers, professionals, quandaries, deception, ethical, term, increase, stakeholder, normative, challenges, managerial, explored, resolve, law, conflict, turnover, reported, ethics, violating
      • IS development / Project management (outsourcing/offshoring): project, task, time, communication, projects, groups, group, media, team, teams, members, differences, control, client, tasks, development, cultural, offshore, offshoring, learning, support
      • Research Design and Methods: studies, field, analysis, modeling, researchers, interpretive, constructs, methods, models, evaluation, case, science, measurement, construct, approach, validity, statistical, principles, structural, issues, techniques
      • IT Strategy / Business Value: firm, firms, strategic, strategy, risk, alignment, resource, capability, resources, investments, capabilities, level, significant, investment, outsourcing, benefits, industry, findings, network, governance, agility
      • Changing nature of computing: decision, support, making, virtual, effectiveness, complexity, problem, users’, tools, effects, search, user, approach, world, develop, explanations, present, framework, existing, important, interface
      • Organizational processes: development, innovation, organizations, practice, technologies, analysis, develop, context, work, change, understanding, action, practices, theoretical, framework, case, processes, concept, developing, role, mechanisms
      Topic Trend over the years (share of each topic per year in %, listed in year order 1997 to 2015, series in legend order):
      • user-centric: 20, 7, 20, 13, 26, 12, 21, 10, 18, 13, 17, 10, 23, 27, 7, 10, 13, 11, 19
      • product/service attributes: 5, 13, 7, 4, 2, 10, 6, 9, 7, 15, 8, 12, 16, 10, 15, 13, 13, 24, 13
      • ethics and legal issues: 10, 8, 12, 13, 7, 4, 3, 7, 4, 5, 6, 3, 3, 7, 6, 4, 5, 3, 3
      • project outsourcing, teams and offshoring: 8, 13, 18, 12, 22, 7, 13, 11, 8, 9, 10, 20, 10, 6, 7, 6, 7, 13, 7
      • scientific studies, analysis methods and models: 10, 10, 10, 12, 10, 13, 12, 7, 9, 8, 10, 15, 20, 12, 19, 21, 10, 9, 9
      • firms investments, working and capabilities: 12, 22, 6, 16, 5, 14, 18, 16, 24, 19, 6, 12, 9, 11, 15, 21, 16, 13, 20
      • decision support systems and framework: 16, 22, 14, 18, 7, 12, 8, 9, 9, 20, 17, 14, 7, 11, 17, 11, 10, 10, 11
      • organizational process development: 19, 5, 13, 13, 21, 28, 20, 31, 21, 12, 26, 14, 12, 17, 14, 13, 25, 19, 19
  • 14. Pearson Correlation (Linear) amongst the topics
      Topics: (1) User-centric behavior; (2) Product/Service attributes; (3) Epistemological perspectives in IS; (4) IS development / Project management (outsourcing/offshoring); (5) Research Design and Methods; (6) IT Strategy / Business Value; (7) Changing nature of computing; (8) Organizational processes
              (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)
      (1)    1.00   -0.45    0.08    0.13   -0.12   -0.47   -0.49    0.12
      (2)   -0.45    1.00   -0.54   -0.27    0.22    0.21    0.04   -0.23
      (3)    0.08   -0.54    1.00    0.20   -0.24   -0.27    0.47   -0.20
      (4)    0.13   -0.27    0.20    1.00   -0.17   -0.48   -0.06   -0.17
      (5)   -0.12    0.22   -0.24   -0.17    1.00   -0.04   -0.10   -0.38
      (6)   -0.47    0.21   -0.27   -0.48   -0.04    1.00    0.15   -0.17
      (7)   -0.49    0.04    0.47   -0.06   -0.10    0.15    1.00   -0.49
      (8)    0.12   -0.23   -0.20   -0.17   -0.38   -0.17   -0.49    1.00
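As a rough illustration of how one cell of this matrix can be computed, the sketch below calculates the Pearson correlation between two yearly topic-share series. It is not the report's code. The two input lists are the first two runs of percentages from the Topic Trend chart on slide 13, and treating them as the user-centric and product/service shares for 1997 to 2015 is an assumption about how the chart labels were laid out. Under that reading the result should come out close to the -0.45 reported for that pair.

```python
import math


def pearson(x, y):
    """Pearson (linear) correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Yearly topic shares in %, 1997-2015 (assumed series assignment, see lead-in).
user_centric = [20, 7, 20, 13, 26, 12, 21, 10, 18, 13, 17, 10, 23, 27, 7, 10, 13, 11, 19]
product_service = [5, 13, 7, 4, 2, 10, 6, 9, 7, 15, 8, 12, 16, 10, 15, 13, 13, 24, 13]

r = pearson(user_centric, product_service)
print(f"correlation(user-centric, product/service) = {r:.2f}")
```

A negative value here means that years with a larger share of user-centric abstracts tend to have a smaller share of product/service abstracts, which is partly expected because the shares within a year sum to 100%.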
  • 15. Topic Model Result 2 (Topics = 8, Iterations = 1500, alpha = 0.02, beta = 0.02)
  • 16. Hierarchical Topic Distribution of Major Topic: Social Media (columns: Topic, Top Keywords)
  • 17. Topic Distribution and Trend over years
      8-topic model, top words per topic:
      • T1: privacy, security, user, efficacy, users, influence, behaviors, resistance, work, computer, affect, compliance, intention, professionals, users’, employees, individuals, attitudes, personal, cognitive
      • T2: virtual, task, media, groups, communication, time, teams, tasks, training, group, team, members, minute, partitioned, support, presented, worlds, periods, ideas, differences
      • T3: firm, service, firms, project, strategic, projects, risk, alignment, capability, investments, firm’, outsourcing, strategy, client, resource, cost, resources, enterprise, investment, level
      • T4: field, interpretive, science, principles, case, studies, evaluation, evaluating, published, journals, articles, methods, journal, discipline, critical, researchers, academic, methodology, rationale, issues
      • T5: acceptance, constructs, measurement, adoption, usage, models, cultural, culture, modeling, construct, perceived, differences, formative, structural, behavioral, validity, behavior, intention, ease, usefulness
      • T6: quandaries, theories, ethical, managers, stakeholder, fraud, ethics, resolve, explored, normative, violating, detection, deception, managerial, related, increase, real, world, actions, balancing
      • T7: product, price, decision, trust, products, markets, content, perceived, consumer, consumers, market, uncertainty, commerce, electronic, user, quality, auctions, sales, search, feedback
      • T8: support, analysis, work, understanding, develop, theoretical, quarterly, suggest, innovation, introduction, practice, level, important, approach, potential, framework, change, associate, action, community
      8-topic model, topic distribution over years (share in %, 19 values per topic in chart order):
      • T1: 11, 27, 4, 13, 1, 12, 12, 11, 20, 16, 5, 7, 7, 9, 14, 19, 13, 14, 21
      • T2: 12, 3, 7, 11, 18, 8, 15, 8, 13, 11, 14, 8, 14, 13, 9, 9, 9, 9, 12
      • T3: 7, 4, 8, 3, 5, 4, 2, 1, 2, 3, 3, 4, 12, 12, 9, 8, 5, 4, 8
      • T4: 2, 0, 14, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 2, 4, 1, 1, 0, 1
      • T5: 61, 49, 38, 59, 56, 64, 54, 66, 53, 44, 62, 52, 45, 52, 46, 50, 59, 49, 46
      • T6: 2, 5, 13, 2, 1, 6, 3, 4, 4, 13, 10, 8, 12, 7, 10, 7, 9, 15, 6
      • T7: 3, 11, 12, 5, 13, 3, 7, 4, 1, 4, 4, 5, 3, 2, 6, 1, 2, 6, 2
      • T8: 1, 1, 4, 7, 6, 2, 7, 6, 7, 8, 3, 15, 7, 4, 2, 6, 3, 4, 4
      7-topic model, topic distribution over years (share in %, 19 values per topic in chart order):
      • T1: 9, 9, 7, 14, 18, 22, 32, 29, 21, 12, 16, 20, 35, 28, 18, 29, 15, 10, 16
      • T2: 27, 16, 16, 24, 5, 15, 8, 16, 21, 12, 9, 8, 5, 22, 12, 13, 14, 7, 8
      • T3: 7, 21, 18, 16, 18, 10, 13, 12, 14, 25, 26, 17, 12, 12, 8, 5, 5, 6, 3
      • T4: 2, 4, 14, 5, 9, 11, 12, 8, 19, 6, 18, 10, 8, 8, 11, 9, 19, 29, 14
      • T5: 29, 22, 25, 20, 32, 26, 20, 15, 16, 20, 16, 18, 19, 13, 25, 21, 22, 19, 20
      • T6: 12, 2, 7, 3, 6, 3, 10, 7, 5, 3, 4, 8, 11, 9, 12, 12, 16, 12, 27
      • T7: 13, 26, 13, 19, 12, 14, 5, 12, 4, 21, 11, 20, 10, 7, 14, 11, 9, 17, 12
      7-topic model, top words per topic:
      • T1: field, action, issues, theories, practice, researchers, address, discipline, studies, methods, academic, tools, argue, conducted, health, problems, principles, present, core, published
      • T2: business, strategy, creation, strategic, firms, performance, term, competitive, industry, internet, strategies, advantage, alignment, case, framework, investment, product, sustainability, firm, level
      • T3: decision, work, served, task, implementation, group, associate, virtual, explanations, team, reviewers, teams, context, findings, computer, support, users, professionals, conditions, making
      • T4: social, media, community, online, change, technologies, communities, power, complex, practice, networks, network, analysis, features, identity, transfer, people, time, practices, mechanisms
      • T5: model, theoretical, communication, empirical, analysis, behaviors, future, review, culture, framework, case, data, level, researchers, implications, privacy, contexts, impact, studies, develop
      • T6: innovation, digital, critical, human, service, context, construct, innovations, conceptual, modeling, technological, base, artifacts, infrastructure, path, phenomena, dominant, developing, lead, realism
      • T7: design, software, process, learning, world, support, science, problem, expertise, types, specific, project, experience, traditional, approach, control, dimensions, effectiveness, elements, search
  • 18. StopWords used in Topic Modeling: 'knowledge', 'information', 'system', 'research', 'paper', 'study', 'based', 'literature', 'article', 'tion', 'gss', 'cid', '146', 'pls', 'font', 'misq', 'text', 'open', 'pp', 'vol', '1px', 'post', 'number', 'quaterly', 'www', 'http', 'website', 'org', 'appendencies', 'border', 'systems', 'senior', 'accepting', '2', 'theory', 'editor', 'keywords', '?', '1', 'mis', 'oss', 'technology', 'organization', 'management', 'knowledge', 'organization', 'organizations', 'organizational', 'development'
  • 20. Semantic Relatedness and TF-IDF
      Dimensionality Reduction
      • Reduce the high-dimensional term vector space to a low-dimensional 'latent' topic space
      Semantic Analysis
      • Two words co-occurring in a text signal that they are related
      • Document frequency determines the strength of the signal
      • Co-occurrence index
      TF-IDF
      • TF (Term Frequency): terms that occur more frequently in a document are more important
      • IDF (Inverse Document Frequency): terms that occur in fewer documents are more specific
      • TF * IDF indicates the importance of a term relative to the document
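The TF * IDF weighting on this slide can be sketched in a few lines. This is a minimal illustration, not the report's implementation: it assumes the common variant TF = count / document length and IDF = ln(N / document frequency), and the three toy "abstracts" are made up for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. TF is the term count divided by the
    document length; IDF is ln(number of documents / documents containing
    the term), so a term found in every document gets weight 0.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized abstracts, for illustration only.
toy_docs = [
    ["study", "user", "adoption"],
    ["study", "firm", "strategy"],
    ["study", "user", "trust"],
]
toy_weights = tf_idf(toy_docs)
for doc_weights in toy_weights:
    print({term: round(w, 3) for term, w in doc_weights.items()})
```

In the toy data "study" appears in every document and so carries no weight, which is the same reasoning behind putting words such as 'study' and 'paper' on a stopword list.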
  • 21. Topic Modeling Process – LDA Implementation Steps (Part 1)
      • Clean as much noise as possible from the abstracts and lowercase them
      • Replace all special characters and apply n-gram tokenization
      • Lemmatize: reduce words to their root form, e.g., “reviews” and “reviewing” to “review”
      • Remove numbers (e.g., “2014”), HTML tags and symbols
      • Create dictionaries and a bag-of-words corpus
      • Pass the corpus through the LDA algorithm and evaluate
      Diagram labels: Preprocessing (Tokenization, Lemmatization, Stopwords Removal); Dictionaries; Bag-of-Words; Vector Space Model; LDA; Tuning Parameters; Topics and their Words
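The preprocessing steps listed on this slide can be sketched with the standard library alone. This is a hedged illustration, not the report's script: the stopword set is a small made-up sample, and `crude_stem` is a deliberately simple suffix stripper standing in for a real lemmatizer, which the slides do not name.

```python
import re
from collections import Counter

# Small illustrative stopword set (assumption; the report's own list is longer).
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "this", "paper", "study"}


def crude_stem(token):
    """Very rough stand-in for lemmatization: strip a trailing 'ing' or 's'."""
    if token.endswith(("ss", "is", "us")):
        return token
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def preprocess(abstract):
    """Lowercase, drop HTML tags, numbers and symbols, remove stopwords, stem."""
    text = abstract.lower()
    text = re.sub(r"<[^>]+>", " ", text)   # HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)  # digits and special characters
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [crude_stem(t) for t in tokens]


def build_corpus(abstracts):
    """Return (dictionary, bag-of-words) where each bag is a list of (term_id, count)."""
    docs = [preprocess(a) for a in abstracts]
    vocab = sorted({t for doc in docs for t in doc})
    dictionary = {term: idx for idx, term in enumerate(vocab)}
    bows = [sorted((dictionary[t], c) for t, c in Counter(doc).items()) for doc in docs]
    return dictionary, bows


sample = "<p>This paper reviews 2014 firms, reviewing users.</p>"
dictionary, bows = build_corpus([sample])
print(preprocess(sample))
print(dictionary, bows)
```

The (term_id, count) pairs are the kind of bag-of-words input that an LDA implementation would consume; n-gram tokenization is left out to keep the sketch short.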
  • 22. Topic Modeling Generative Process – LDA Implementation Steps (Part 2)
      For LDA the generative model consists of the following three steps:
      Step 1: Select β
      • The term distribution β is determined for each topic by β ∼ Dirichlet(δ).
      Step 2: Select α
      • The proportions θ of the topic distribution for the document w are determined by θ ∼ Dirichlet(α).
      Step 3: Iterate
      • For each of the N words w_i:
          (a) Choose a topic z_i ∼ Multinomial(θ).
          (b) Choose a word w_i from a multinomial probability distribution conditioned on the topic z_i: p(w_i | z_i, β).
      * β is the term distribution of topics and contains the probability of a word occurring in a given topic. In this notation the Dirichlet parameter δ plays the role of the "beta" value tuned on the earlier slides.
      * The process is purely based on frequency and co-occurrence of words.
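The three generative steps above can be simulated directly. The sketch below is illustrative only: it samples in the forward direction (from parameters to words), whereas fitting LDA runs the inference in reverse. The toy vocabulary, the number of topics, and the values of α and δ are all assumptions chosen for the example.

```python
import random


def dirichlet(concentration, size, rng):
    """Symmetric Dirichlet draw via normalized Gamma(concentration, 1) samples."""
    while True:
        draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
        total = sum(draws)
        if total > 0:  # guard against all draws underflowing to zero
            return [d / total for d in draws]


def generate_document(n_words, topic_word, alpha, vocab, rng):
    """Generate one document following steps 2 and 3 of the slide.

    topic_word[k] is the term distribution (beta) of topic k over vocab.
    Returns the document's topic proportions theta and its list of words.
    """
    n_topics = len(topic_word)
    theta = dirichlet(alpha, n_topics, rng)                    # Step 2
    words = []
    for _ in range(n_words):                                   # Step 3
        z = rng.choices(range(n_topics), weights=theta)[0]     # (a) topic z_i
        w = rng.choices(vocab, weights=topic_word[z])[0]       # (b) word w_i
        words.append(w)
    return theta, words


rng = random.Random(42)
vocab = ["user", "trust", "firm", "strategy", "team", "project"]  # toy vocabulary
n_topics, alpha, delta = 3, 0.5, 0.1                              # illustrative values

# Step 1: one term distribution per topic, beta_k ~ Dirichlet(delta).
topic_word = [dirichlet(delta, len(vocab), rng) for _ in range(n_topics)]

theta, words = generate_document(12, topic_word, alpha, vocab, rng)
print([round(p, 2) for p in theta])
print(words)
```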
  • 25. Number of Articles Published by the Year of Publication (1977 – 2015) Total Papers = 1081
  • 26. Number of Articles Published by the Category of Paper (2000-2015). Total Papers = 551
      # Articles by Category (bar chart values):
      • Research Article: 281
      • Special Issue: 111
      • Research Note: 69
      • Issues and Opinions: 41
      • Research Essay: 25
      • Theory and Review: 21
      • MISQ Review: 7
      • SIM Paper Competition: 3
  • 27. Trend of Average # Keywords Per Article by Year (1996 – 2015). Avg. #Keywords per article has doubled over 20 years. Total Papers = 584
  • 28. Trend of Average Abstract length per Article by Year (1996 – 2015). Total Papers = 584
      Avg. abstract length (characters) by year, as labeled on the chart:
      1997: 1462, 1998: 1407, 1999: 1329, 2000: 1387, 2001: 1535, 2002: 1181, 2003: 1374, 2004: 1502, 2005: 1948, 2006: 2102, 2007: 1926, 2008: 2079, 2009: 2082, 2010: 1555, 2011: 1497, 2012: 1451, 2013: 1499, 2014: 1477, 2015: 1467
  • 29. Trend of Average Title length per Article by Year (2000 – 2015). Total Papers = 551
      Average title length (characters) by year, with a linear trend line:
      2000: 92, 2001: 95, 2002: 85, 2003: 94, 2004: 109, 2005: 101, 2006: 89, 2007: 83, 2008: 94, 2009: 107, 2010: 95, 2011: 86, 2012: 87, 2013: 98, 2014: 103, 2015: 100
  • 30. Article Size – # Pages Per Article vs. Avg File Size (KB), 1977 – 2015. Total Papers = 1,081. Chart series: Avg File Size (KB) and Avg Number of Pages Per Article.