Developing the Korean Internet Network
Miner (KINM):
E-research Tool for Social Network Analysis
of the Blogosphere in South Korea*



       Anatoliy Gruzd1), Chung Joo Chung2), Jaeeun (Angela) Yoo3),
                                                         Han Woo Park4)



                   2009 Fall Conference of the Korean Data Analysis Society
Anatoliy Gruzd1)
Assistant Professor, School of Information Management, Dalhousie University,
Canada
E-mail : agruzd@gmail.com


Chung Joo Chung2)
Ph.D. Candidate, Department of Communication, State University of New York at
Buffalo, USA
E-mail : idream4you@gmail.com


Jaeeun(Angela) Yoo3)
B.S. Student, Division of Engineering Science, University of Toronto, Canada
E-mail : angela.yoo@utoronto.ca


Han Woo Park4)
(Corresponding Author) Associate Professor, Department of Media and
Information, YeungNam University, Korea.
E-mail : hanpark@ynu.ac.kr

Contents

     Introduction
     Related Studies: Tools for Blog Network Analysis

     Section 1.
       Development of the Korean Internet Network Miner
     Section 2.
       Evaluation of the Name Network Discovery Algorithm

     Conclusion


Introduction
  The growing adoption of e-research tools has led to
  changes in social and communication research (Jankowski, 2009).




   Typical technologies in this domain include LexiURL (Thelwall,
  2009), the Virtual Observatory for the Study of Online Networks
  (Ackland, 2009), and the Internet Community Text Analyzer (ICTA;
  Gruzd, 2009a).




Introduction
  In contrast to the e-research developments in North America and
  Europe, Soon and Park (2009) note that digital tools to support
  e-research are rare in Asia, even in South Korea (Internet World
  Stats, 2008).


    Thus, we attempt to develop an e-research tool for automatic discovery
    of online communication networks on the Korean Web.

     The first section reviews prior studies on large-scale blog network
     analysis using automatic tools.
     The second section illustrates the process of developing our analytic tool,
           the Korean Internet Network Miner (KINM).



Related Studies: Tools for Blog Network Analysis



 The structure of networks can be measured mathematically and
 visualized graphically. The shape of a network emerging from online
 users’ writing and linking choices reflects interest trends.

 Research on blog networks has confirmed that large-scale online
 communities are structurally reflected in higher-density network
 neighborhoods formed through linking (Kelly and Etling, 2008).




Related Studies: Tools for Blog Network Analysis

 Traditional data mining research
 • Traditional data mining research focuses largely on algorithms for
  inferring association rules and other statistical correlation measures
  in a given data set (Kumar et al., 1999; Jung, 2009).




 For example:
 • Kelly and Etling (2008) used the research firm Morningside Analytics for blog selection and data mapping,
   along with the Fruchterman-Reingold "physics model" algorithm, to understand the blog networks of
   the Iranian blogosphere.
 • Gryc and his colleagues (2008) developed categories for the blog networks they studied by analyzing
   the keywords, post classification, and linking patterns of blogs.




Related Studies: Tools for Blog Network Analysis

  Current research

  • The current research uses a web-based system for automated text
    analysis to discover and understand social networks from blog data.

  • It focuses not only on chain networks—social networks based on
    the number of messages exchanged between individuals—but also on
    name networks—social networks built from mining personal
    names and nicknames.




Section 1.
Development of the Korean Internet Network Miner
  As an initial framework for the KINM, we used some of the social network
 discovery and visualization tools and techniques previously developed by
 Gruzd (2009).
  These tools and techniques were developed as part of ICTA, a
 general-purpose web system for content and network analysis of
 computer-mediated communication in English
 (available at http://textanalytics.net).




            Barriers:
            1. Since ICTA only works with texts in English, we had to modify all text
               processing functions to support Korean texts.
            2. ICTA requires the data to be stored in a machine-readable format such
               as an RSS feed. However, after manually examining a number of
               Korean blogs, we noticed that the majority of them do not provide RSS
               feeds for their comment data.

Section 1.
Development of the Korean Internet Network Miner
 We were especially interested in analyzing comment data because
 comments contain most of the social interactions on a blog, which
 makes them a good source for mining social connections among
 blog readers.

 To address the lack of RSS feeds for comments, we created a script using the
 Kapow Mashup Server (http://www.kapowtech.com) to retrieve comments from a
 selected blog automatically and output them as an RSS feed.
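
 The output step can be sketched in Python as follows. This is not the Kapow script, which the slides do not show; it only illustrates turning already-retrieved comments into a minimal RSS-style feed. The function name, the field names (author, text, date) and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def comments_to_rss(blog_title, blog_url, comments):
    """Serialize comments as a minimal RSS-style feed.

    `comments` is a list of dicts with hypothetical keys
    'author', 'text' and 'date' (not taken from the slides).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = blog_title
    ET.SubElement(channel, "link").text = blog_url
    ET.SubElement(channel, "description").text = "Comments on " + blog_title
    for c in comments:
        item = ET.SubElement(channel, "item")
        # Simplification: the nickname is stored in <author> as plain text.
        ET.SubElement(item, "author").text = c["author"]
        ET.SubElement(item, "description").text = c["text"]
        ET.SubElement(item, "pubDate").text = c["date"]
    return ET.tostring(rss, encoding="unicode")


# Invented sample data, for illustration only.
sample_comments = [
    {"author": "너구리", "text": "좋은 글 감사합니다", "date": "2009-04-01"},
    {"author": "jihwaja", "text": "너구리님 의견에 동의합니다", "date": "2009-04-02"},
]
feed = comments_to_rss("bangzza", "http://blog.ohmynews.com/bangzza", sample_comments)
```

 A feed in this shape gives a downstream tool one item per comment, with the commenter's nickname and the comment text in separate fields.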




Section 1.
Development of the Korean Internet Network Miner

  After the blog data were retrieved, they were processed to build two types of networks.

 • First, a chain network was extracted. In the chain network, one
   commentator is connected to another if the first commentator directly
   replied to the second commentator by clicking on the "reply-to" button.

 • However, after manually examining a number of comments on several
   blogs, we found that there are some comments that are not "reply-to"
   comments, but are addressing or referencing a previous poster.

      This observation is in line with a previous empirical
      study of online learning communities by Gruzd (2009a),
      which found that the chain network misses, on
      average, 40% of possible connections.
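
 The chain network described above can be sketched as follows. This is a minimal illustration, not the KINM implementation: the record layout (id, author, reply_to) and the sample comments are assumptions, and ties are simply counted per directed pair of commentators.

```python
from collections import Counter


def build_chain_network(comments):
    """Count directed ties (replier -> original poster) from reply-to links.

    Each comment is a dict with hypothetical keys 'id', 'author' and
    'reply_to' (the id of the comment being replied to, or None).
    """
    by_id = {c["id"]: c for c in comments}
    ties = Counter()
    for c in comments:
        parent = by_id.get(c["reply_to"])
        # Skip top-level comments and replies to one's own comment.
        if parent is not None and parent["author"] != c["author"]:
            ties[(c["author"], parent["author"])] += 1
    return ties


# Invented sample data, for illustration only.
chain_comments = [
    {"id": 1, "author": "bangzza", "reply_to": None},
    {"id": 2, "author": "너구리", "reply_to": 1},
    {"id": 3, "author": "jihwaja", "reply_to": 2},
    {"id": 4, "author": "너구리", "reply_to": 1},
]
chain = build_chain_network(chain_comments)
```

 A comment that addresses an earlier poster without using the "reply-to" button leaves no trace in this structure, which is the gap the name network is meant to close.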
Section 1.
Development of the Korean Internet Network Miner

    Name Network:
 • Instead of just relying on information about who replied to whom, the
  name network method starts by automatically finding all mentions
  of personal names or nicknames in comments and uses them as nodes
  in a social network.

 • Next, to discover ties between nodes, the method connects the sender of
   a comment to all names found in his/her comment. (A more detailed
   description of this method can be found in Gruzd (2009b).)

 • Although the name network approach provides additional information
  about connections among blog commentators, it has its own challenges.
  - The main one is distinguishing a personal name/nickname from a word
    that just appears to be one.
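
 The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of Gruzd (2009b): the candidate names are taken to be the nicknames of everyone who posted in the dataset, and a mention is detected by plain substring matching so that attached suffixes such as "님" do not hide a name. The sample comments are invented.

```python
from collections import Counter


def build_name_network(comments):
    """Connect each comment's sender to every known nickname it mentions.

    Each comment is a dict with hypothetical keys 'author' and 'text'.
    Known nicknames are assumed to be the set of all comment authors.
    """
    known_names = {c["author"] for c in comments}
    ties = Counter()
    for c in comments:
        for name in known_names:
            # Substring match: naive on purpose, so it also produces
            # false positives when a nickname is an ordinary word.
            if name != c["author"] and name in c["text"]:
                ties[(c["author"], name)] += 1
    return ties


# Invented sample data, for illustration only.
name_comments = [
    {"author": "사람", "text": "좋은 글 감사합니다"},
    {"author": "너구리", "text": "jihwaja님 의견에 동의합니다"},
    {"author": "jihwaja", "text": "요즘 사람 많네요"},
]
names = build_name_network(name_comments)
```

 In this sample the tie from "너구리" to "jihwaja" is a genuine mention, while the tie from "jihwaja" to "사람" arises only because an ordinary noun happens to match a poster's nickname.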


Section 1.
Development of the Korean Internet Network Miner


            Figure 1: Sample comment

  For example, the algorithm marked the word "사람" (people) as a
  reference to another person on the blog.

  This happened because at least one comment in the dataset was
  posted by a person with the nickname "사람".

  However, in the sample comment, this word does not refer to another
  online participant; it is used as a noun that means "people".




Section 1.
Development of the Korean Internet Network Miner

 Name Network:
 Another good example of the challenges associated with the name/nickname
 disambiguation problem in comments is the word "2mb".
 1. This word can be used as a nickname by one of the blog commentators.
 2. It could refer to the capacity of a computer memory (2 megabytes).
 3. It could be the alias of the current Korean president, Lee Myung-Bak.




  To address these challenges and develop recommendations for the next
 generation of the name network discovery algorithm, we conducted a
 semi-automated analysis of all names/nicknames discovered from a
 sample dataset using our initial algorithm.

Section 2.
Evaluation of the Name Network Discovery Algorithm

 To evaluate our automated approach for analyzing communication
 networks from blog comments, and specifically the accuracy of the name
 network discovery algorithm, we selected a single blog hosted at
 http://blog.ohmynews.com/bangzza
 and authored by 방짜 (bangzza).




Section 2.
Evaluation of the Name Network Discovery Algorithm

 OhMyNews was ranked as one of the top three web sites in terms of
 blog users in 2009 (Rankey.com, 2009). It is frequently ranked as the most
 popular blog site in Korea, registering over 20 million page views per day
 during the presidential election.

  Users of OhMyNews, known as "news guerrillas", contribute news articles to
 the Web site. OhMyNews allows individuals in far-flung locations to come together,
 share, and build strong ties and a sense of community (united in ideology even
 if separated by geographic distance)
 that fosters a true grassroots movement
 (Streitmatter, 2001).




Section 2.
Evaluation of the Name Network Discovery Algorithm
    For our tests, we retrieved and analyzed a sample set of 943 comments
    (posted between April 2008 and April 2009) from the selected blog.

    In the study, we relied on an interactive tag cloud feature available in
    KINM to explore and evaluate all names and nicknames that were found
    automatically (see Figure 2).




            <Figure 2> An interactive tag cloud showing the 50 most frequently used name/nickname
                                      candidates found in the sample dataset

Section 2.
Evaluation of the Name Network Discovery Algorithm

     The evaluation procedure involved clicking on each word found by the
    name network algorithm and exploring the context where each instance
    of the word was used (see Figure 3). The purpose of this semi-automated
    analysis was to discover which name/nickname candidates were identified
    incorrectly and why.




                        <Figure 3> A list of messages containing "2MB"

Section 2.
Evaluation of the Name Network Discovery Algorithm
  The following set includes clues suggesting that a word is
 likely to be a nickname:

● a word candidate is followed by a context word such as "님" = an honorific or "씨" = Mr./Ms.;
other possibilities, although rare, include "군" = Mr. for younger males or "양" = Miss/Ms. for
younger females at the end of a word candidate, and "미스터" = Mr. or "미스" = Miss at the beginning;

● a word candidate contains a combination of characters (Korean, English and/or Chinese),
symbols (e.g., underscores, hyphens) and numbers;

● a word candidate appears to be a real name, which is almost always three characters:
a single-character last name followed by a two-character first name, which may be found
in a dictionary of first names and/or of the characters commonly used in them;

● a word candidate is a less common, non-topic word (e.g., "너구리" = raccoon);

● a word candidate is followed by punctuation indicating that someone is being addressed (e.g., "/" or ":");

● a word candidate contains patterns indicative of non-native words: phonetic Koreanization of English
(e.g., "미디어몽골" = mediamogul = Media Mogul) or phonetic romanization of Korean (e.g., "jihwaja"
= 지화자).
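
Three of these clues lend themselves to simple pattern checks, sketched below. This is an illustration under stated assumptions rather than the rules as implemented in KINM: only the honorific, addressing-punctuation and mixed-character clues are covered, the function name is hypothetical, and Chinese characters are left out of the mixed-character check for brevity.

```python
import re

HONORIFIC_SUFFIXES = ("님", "씨")  # context words listed on the slide
ADDRESS_MARKS = ("/", ":")         # addressing punctuation listed on the slide


def nickname_clues(candidate, comment):
    """Report which 'likely a nickname' clues fire for one candidate word."""
    kinds = [
        re.search(r"[가-힣]", candidate) is not None,   # Hangul syllables
        re.search(r"[A-Za-z]", candidate) is not None,  # Latin letters
        re.search(r"[0-9_\-]", candidate) is not None,  # digits and symbols
    ]
    return {
        "honorific": any(candidate + s in comment for s in HONORIFIC_SUFFIXES),
        "addressed": any(candidate + p in comment for p in ADDRESS_MARKS),
        "mixed_characters": sum(kinds) >= 2,
    }


# Invented sample comments, for illustration only.
clues_a = nickname_clues("너구리", "너구리님 말씀에 동의합니다")
clues_b = nickname_clues("user_01", "user_01: 감사합니다")
```

The remaining clues (real-name shape, uncommon non-topic words, Koreanized or romanized words) would need dictionaries or frequency data that the slides do not specify.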

Section 2.
Evaluation of the Name Network Discovery Algorithm
 The second set includes clues suggesting that a word is
 NOT likely to be used as a nickname:

● a word candidate is a phrase, for example, if the nickname input (the "FROM" field) is
used more like a subject line (possible indicators include white spaces and length);

● a word candidate consists of a single character (e.g., "a" or "ㄱ");

● a word candidate consists of netspeak, including emoticons (e.g., "=_="), slang and
abbreviations (e.g., using "2MB" to refer to the current Korean president), and onomatopoeia
(e.g., "ㅉㅉ" = tsk tsk, "ㅋㅋ" = heehee, "하하" = haha, "음" = hmm);

● a word candidate appears more than one time in the comment;

● a word candidate consists of random characters (e.g., "ㅁㄴㅇㄹ" or "asdf");

● a word candidate is a short, conversational word or phrase (e.g., "나" = me, "아이고" =
oh no, "그래서" = so/therefore);

● a word candidate is a common word or idea in the given context/topic (e.g., "대한민국" =
Republic of Korea, "쥐체사상" = a newly created word used to refer to political fanatics).
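
Several of these negative clues can also be approximated with simple checks, as in the sketch below. It is an illustration, not the KINM rule set: the length threshold of 10 characters and the small netspeak list are assumptions, and a word made up only of bare consonants or vowels is treated as netspeak or random typing.

```python
import re

# A few netspeak examples from the slide; a real list would be much longer.
NETSPEAK = {"ㅋㅋ", "ㅉㅉ", "하하", "음", "=_="}


def not_nickname_clues(candidate, comment, max_len=10):
    """Report which 'unlikely to be a nickname' clues fire for a candidate.

    `max_len` is a hypothetical threshold, not a value from the slides.
    """
    word = candidate.strip()
    jamo_only = re.fullmatch(r"[ㄱ-ㅎㅏ-ㅣ]+", word) is not None
    return {
        "phrase": " " in word or len(word) > max_len,
        "single_character": len(word) == 1,
        "netspeak_or_random": word in NETSPEAK or jamo_only,
        "repeated": comment.count(word) > 1,
    }


# Invented sample comments, for illustration only.
neg_a = not_nickname_clues("ㅋㅋ", "ㅋㅋ 재밌네요")
neg_b = not_nickname_clues("사람", "사람이 사람을 만난다")
```

Clues that depend on topic or conversational vocabulary, such as "대한민국" or "그래서", would need word lists or corpus statistics and are left out here.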

Conclusion

  This research briefly reviewed some of the studies related to
  large-scale blog analysis using automatic tools.

  We reviewed the process of developing our own analysis tool,
  KINM. The main goal of KINM is to automate the process of
  finding communication networks in the Korean blogosphere
  that accurately represent social interactions among blog
  readers.

  To find these networks, the system relies on a set of text
  mining techniques to look for personal names and
  nicknames in users' comments.

Conclusion

   To address some of the challenges associated with the
  automated discovery of names and nicknames in Korean
  texts, this paper also presented an exploratory study of a
  sample dataset.
  The study suggests a set of additional rules to improve the
  accuracy of the current name/nickname discovery algorithm
  used in KINM.

  These additional rules will be incorporated into KINM and
  evaluated in a subsequent study.


Thank you.


Page  24    2009년 한국자료분석학회 가을철 학술대회

                료분석학회 가을철 학술대회

Mais conteúdo relacionado

Mais procurados

2010 09-17-張志威老師-confucius & “its” intelligent disciples
2010 09-17-張志威老師-confucius & “its” intelligent disciples2010 09-17-張志威老師-confucius & “its” intelligent disciples
2010 09-17-張志威老師-confucius & “its” intelligent disciples
nccuscience
 
On client’s interactive behaviour to design peer selection policies for bitto...
On client’s interactive behaviour to design peer selection policies for bitto...On client’s interactive behaviour to design peer selection policies for bitto...
On client’s interactive behaviour to design peer selection policies for bitto...
IJCNCJournal
 
2000-ACM SIGCHI-The social life of small graphical chat spaces
2000-ACM SIGCHI-The social life of small graphical chat spaces2000-ACM SIGCHI-The social life of small graphical chat spaces
2000-ACM SIGCHI-The social life of small graphical chat spaces
Marc Smith
 
Meyer dig ethno_2013sdp
Meyer dig ethno_2013sdpMeyer dig ethno_2013sdp
Meyer dig ethno_2013sdp
Eric Meyer
 
1999-UIST-Alternative interfaces for chat
1999-UIST-Alternative interfaces for chat1999-UIST-Alternative interfaces for chat
1999-UIST-Alternative interfaces for chat
Marc Smith
 

Mais procurados (20)

Scalable Topic-Specific Influence Analysis on Microblogs
Scalable Topic-Specific Influence Analysis on MicroblogsScalable Topic-Specific Influence Analysis on Microblogs
Scalable Topic-Specific Influence Analysis on Microblogs
 
Personalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) WebPersonalizing Media Interaction on the (Semantic & Social) Web
Personalizing Media Interaction on the (Semantic & Social) Web
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ Nettab
 
Webometrics and Studies of Cultural Diffusion -Psy Gangnam Style on YouTube
Webometrics and Studies of Cultural Diffusion-Psy Gangnam Style on YouTubeWebometrics and Studies of Cultural Diffusion-Psy Gangnam Style on YouTube
Webometrics and Studies of Cultural Diffusion -Psy Gangnam Style on YouTube
 
2010 09-17-張志威老師-confucius & “its” intelligent disciples
2010 09-17-張志威老師-confucius & “its” intelligent disciples2010 09-17-張志威老師-confucius & “its” intelligent disciples
2010 09-17-張志威老師-confucius & “its” intelligent disciples
 
On client’s interactive behaviour to design peer selection policies for bitto...
On client’s interactive behaviour to design peer selection policies for bitto...On client’s interactive behaviour to design peer selection policies for bitto...
On client’s interactive behaviour to design peer selection policies for bitto...
 
2000-ACM SIGCHI-The social life of small graphical chat spaces
2000-ACM SIGCHI-The social life of small graphical chat spaces2000-ACM SIGCHI-The social life of small graphical chat spaces
2000-ACM SIGCHI-The social life of small graphical chat spaces
 
What do we get from Twitter - and what not?
What do we get from Twitter - and what not?What do we get from Twitter - and what not?
What do we get from Twitter - and what not?
 
2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis
 
DCLA14_Haythornthwaite_Absar_Paulin
DCLA14_Haythornthwaite_Absar_PaulinDCLA14_Haythornthwaite_Absar_Paulin
DCLA14_Haythornthwaite_Absar_Paulin
 
Socio Media Connect: A Social Profile based P2P Network
Socio Media Connect: A Social Profile based P2P NetworkSocio Media Connect: A Social Profile based P2P Network
Socio Media Connect: A Social Profile based P2P Network
 
Simplifying Social Network Diagrams
Simplifying Social Network Diagrams Simplifying Social Network Diagrams
Simplifying Social Network Diagrams
 
Network Structure For Social Network
Network Structure For Social NetworkNetwork Structure For Social Network
Network Structure For Social Network
 
Meyer dig ethno_2013sdp
Meyer dig ethno_2013sdpMeyer dig ethno_2013sdp
Meyer dig ethno_2013sdp
 
1999-UIST-Alternative interfaces for chat
1999-UIST-Alternative interfaces for chat1999-UIST-Alternative interfaces for chat
1999-UIST-Alternative interfaces for chat
 
Studying people who can talk back, Meyer 2013 DH at Oxford summer school
Studying people who can talk back, Meyer 2013 DH at Oxford summer schoolStudying people who can talk back, Meyer 2013 DH at Oxford summer school
Studying people who can talk back, Meyer 2013 DH at Oxford summer school
 
LSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social MediaLSS'11: Charting Collections Of Connections In Social Media
LSS'11: Charting Collections Of Connections In Social Media
 
Conversation graphs in Online Social Media
Conversation graphs in Online Social MediaConversation graphs in Online Social Media
Conversation graphs in Online Social Media
 
20111103 con tech2011-marc smith
20111103 con tech2011-marc smith20111103 con tech2011-marc smith
20111103 con tech2011-marc smith
 
Grade 7 Data Communication Lesson
Grade 7 Data Communication LessonGrade 7 Data Communication Lesson
Grade 7 Data Communication Lesson
 

Destaque (7)

인터넷소셜미디어개론3
인터넷소셜미디어개론3인터넷소셜미디어개론3
인터넷소셜미디어개론3
 
Disc 2014 program at a glance
Disc 2014 program at a glanceDisc 2014 program at a glance
Disc 2014 program at a glance
 
Election meeting 2nd
Election meeting 2ndElection meeting 2nd
Election meeting 2nd
 
Chang network analysis
Chang network analysisChang network analysis
Chang network analysis
 
융합사회의 정보분석의 매커니즘과 사례(11aug2010)sm
융합사회의 정보분석의 매커니즘과 사례(11aug2010)sm융합사회의 정보분석의 매커니즘과 사례(11aug2010)sm
융합사회의 정보분석의 매커니즘과 사례(11aug2010)sm
 
김숙현 영남지역언론사의온라인사회자본분석
김숙현 영남지역언론사의온라인사회자본분석김숙현 영남지역언론사의온라인사회자본분석
김숙현 영남지역언론사의온라인사회자본분석
 
웹보메트릭스로 밝혀낸 소셜미디어 감성 (20 nov2011)경주감성과학회 최종본2
웹보메트릭스로 밝혀낸 소셜미디어 감성 (20 nov2011)경주감성과학회 최종본2웹보메트릭스로 밝혀낸 소셜미디어 감성 (20 nov2011)경주감성과학회 최종본2
웹보메트릭스로 밝혀낸 소셜미디어 감성 (20 nov2011)경주감성과학회 최종본2
 

Semelhante a Developing the korean_internet_network_miner_change

A history of clu
A history of cluA history of clu
A history of clu
sugeladi
 
Requirements1) Due 6 Oct2) APA 6th Ed format3) 6 Pages.docx
Requirements1) Due 6 Oct2) APA 6th Ed format3) 6 Pages.docxRequirements1) Due 6 Oct2) APA 6th Ed format3) 6 Pages.docx
Requirements1) Due 6 Oct2) APA 6th Ed format3) 6 Pages.docx
heunice
 
SocialCom 2009 - Social Synchrony
SocialCom 2009 - Social SynchronySocialCom 2009 - Social Synchrony
SocialCom 2009 - Social Synchrony
Munmun De Choudhury
 
What network simulator questions do users ask? a large-scale study of stack o...
What network simulator questions do users ask? a large-scale study of stack o...What network simulator questions do users ask? a large-scale study of stack o...
What network simulator questions do users ask? a large-scale study of stack o...
nooriasukmaningtyas
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of Life
Harish Vaidyanathan
 

Semelhante a Developing the korean_internet_network_miner_change (20)

A Study of Internet RFC Authors using NetDraw and yEd
A Study of Internet RFC Authors using NetDraw and yEdA Study of Internet RFC Authors using NetDraw and yEd
A Study of Internet RFC Authors using NetDraw and yEd
 
Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database  Linked Data Generation for the University Data From Legacy Database
Linked Data Generation for the University Data From Legacy Database
 
Automatic detection of online abuse and analysis of problematic users in wiki...
Automatic detection of online abuse and analysis of problematic users in wiki...Automatic detection of online abuse and analysis of problematic users in wiki...
Automatic detection of online abuse and analysis of problematic users in wiki...
 
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBOA COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
A COMPREHENSIVE STUDY ON DATA EXTRACTION IN SINA WEIBO
 
A history of clu
A history of cluA history of clu
A history of clu
 
On the Navigability of Social Tagging Systems
On the Navigability of Social Tagging SystemsOn the Navigability of Social Tagging Systems
On the Navigability of Social Tagging Systems
 
Online social network
Online social networkOnline social network
Online social network
 
Graph
GraphGraph
Graph
 
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
An Empirical Study On IMDb And Its Communities Based On The Network Of Co-Rev...
 
Q046049397
Q046049397Q046049397
Q046049397
 
Requirements1) Due 6 Oct2) APA 6th Ed format3) 6 Pages.docx
Requirements1) Due 6 Oct2) APA 6th Ed format3) 6 Pages.docxRequirements1) Due 6 Oct2) APA 6th Ed format3) 6 Pages.docx
Requirements1) Due 6 Oct2) APA 6th Ed format3) 6 Pages.docx
 
Jx2517481755
Jx2517481755Jx2517481755
Jx2517481755
 
Jx2517481755
Jx2517481755Jx2517481755
Jx2517481755
 
SocialCom 2009 - Social Synchrony
SocialCom 2009 - Social SynchronySocialCom 2009 - Social Synchrony
SocialCom 2009 - Social Synchrony
 
Webometrics Revisited in Big Data Age_DISC2013
Webometrics Revisited in Big Data Age_DISC2013Webometrics Revisited in Big Data Age_DISC2013
Webometrics Revisited in Big Data Age_DISC2013
 
What network simulator questions do users ask? a large-scale study of stack o...
What network simulator questions do users ask? a large-scale study of stack o...What network simulator questions do users ask? a large-scale study of stack o...
What network simulator questions do users ask? a large-scale study of stack o...
 
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
An Efficient Modified Common Neighbor Approach for Link Prediction in Social ...
 
The Revolution Of Cloud Computing
The Revolution Of Cloud ComputingThe Revolution Of Cloud Computing
The Revolution Of Cloud Computing
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of Life
 
Notes on mining social media updated
Notes on mining social media updatedNotes on mining social media updated
Notes on mining social media updated
 

Mais de Han Woo PARK

4차산업혁명 린든달러 비트코인 알트코인 암호화폐 가상화폐 등
4차산업혁명 린든달러 비트코인 알트코인 암호화폐 가상화폐 등4차산업혁명 린든달러 비트코인 알트코인 암호화폐 가상화폐 등
4차산업혁명 린든달러 비트코인 알트코인 암호화폐 가상화폐 등
Han Woo PARK
 
박한우 교수 프로파일 (31 oct2017)
박한우 교수 프로파일 (31 oct2017)박한우 교수 프로파일 (31 oct2017)
박한우 교수 프로파일 (31 oct2017)
Han Woo PARK
 
박한우 영어 이력서 Curriculum vitae 경희대 행사 제출용
박한우 영어 이력서 Curriculum vitae 경희대 행사 제출용박한우 영어 이력서 Curriculum vitae 경희대 행사 제출용
박한우 영어 이력서 Curriculum vitae 경희대 행사 제출용
Han Woo PARK
 
페이스북 댓글을 통해 살펴본 대구·경북(TK) 촛불집회
페이스북 댓글을 통해 살펴본 대구·경북(TK) 촛불집회페이스북 댓글을 통해 살펴본 대구·경북(TK) 촛불집회
페이스북 댓글을 통해 살펴본 대구·경북(TK) 촛불집회
Han Woo PARK
 
세계산학관협력총회 Watef 패널을 공지합니다
세계산학관협력총회 Watef 패널을 공지합니다세계산학관협력총회 Watef 패널을 공지합니다
세계산학관협력총회 Watef 패널을 공지합니다
Han Woo PARK
 

Mais de Han Woo PARK (20)

소셜 빅데이터를 활용한_페이스북_이용자들의_반응과_관계_분석
소셜 빅데이터를 활용한_페이스북_이용자들의_반응과_관계_분석소셜 빅데이터를 활용한_페이스북_이용자들의_반응과_관계_분석
소셜 빅데이터를 활용한_페이스북_이용자들의_반응과_관계_분석
 
페이스북 선도자 탄핵촛불에서 캠폐인 이동경로
페이스북 선도자 탄핵촛불에서 캠폐인 이동경로페이스북 선도자 탄핵촛불에서 캠폐인 이동경로
페이스북 선도자 탄핵촛불에서 캠폐인 이동경로
 
WATEF 2018 신년 세미나(수정)
WATEF 2018 신년 세미나(수정)WATEF 2018 신년 세미나(수정)
WATEF 2018 신년 세미나(수정)
 
세계트리플헬릭스미래전략학회 WATEF 2018 신년 세미나
세계트리플헬릭스미래전략학회 WATEF 2018 신년 세미나세계트리플헬릭스미래전략학회 WATEF 2018 신년 세미나
세계트리플헬릭스미래전략학회 WATEF 2018 신년 세미나
 
Disc 2015 보도자료 (휴대폰번호 삭제-수정)
Disc 2015 보도자료 (휴대폰번호 삭제-수정)Disc 2015 보도자료 (휴대폰번호 삭제-수정)
Disc 2015 보도자료 (휴대폰번호 삭제-수정)
 
Another Interdisciplinary Transformation: Beyond an Area-studies Journal
Another Interdisciplinary Transformation: Beyond an Area-studies JournalAnother Interdisciplinary Transformation: Beyond an Area-studies Journal
Another Interdisciplinary Transformation: Beyond an Area-studies Journal
 
4차산업혁명 린든달러 비트코인 알트코인 암호화폐 가상화폐 등
4차산업혁명 린든달러 비트코인 알트코인 암호화폐 가상화폐 등4차산업혁명 린든달러 비트코인 알트코인 암호화폐 가상화폐 등
4차산업혁명 린든달러 비트코인 알트코인 암호화폐 가상화폐 등
 
KISTI-WATEF-BK21Plus-사이버감성연구소 2017 동계세미나 자료집
KISTI-WATEF-BK21Plus-사이버감성연구소 2017 동계세미나 자료집KISTI-WATEF-BK21Plus-사이버감성연구소 2017 동계세미나 자료집
KISTI-WATEF-BK21Plus-사이버감성연구소 2017 동계세미나 자료집
 
박한우 교수 프로파일 (31 oct2017)
박한우 교수 프로파일 (31 oct2017)박한우 교수 프로파일 (31 oct2017)
박한우 교수 프로파일 (31 oct2017)
 
Global mapping of artificial intelligence in Google and Google Scholar
Global mapping of artificial intelligence in Google and Google ScholarGlobal mapping of artificial intelligence in Google and Google Scholar
Global mapping of artificial intelligence in Google and Google Scholar
 
박한우 영어 이력서 Curriculum vitae 경희대 행사 제출용
박한우 영어 이력서 Curriculum vitae 경희대 행사 제출용박한우 영어 이력서 Curriculum vitae 경희대 행사 제출용
박한우 영어 이력서 Curriculum vitae 경희대 행사 제출용
 
향기담은 하루찻집
향기담은 하루찻집향기담은 하루찻집
향기담은 하루찻집
 
Twitter network map of #ACPC2017 1st day using NodeXL
Twitter network map of #ACPC2017 1st day using NodeXLTwitter network map of #ACPC2017 1st day using NodeXL
Twitter network map of #ACPC2017 1st day using NodeXL
 
페이스북 댓글을 통해 살펴본 대구·경북(TK) 촛불집회
페이스북 댓글을 통해 살펴본 대구·경북(TK) 촛불집회페이스북 댓글을 통해 살펴본 대구·경북(TK) 촛불집회
페이스북 댓글을 통해 살펴본 대구·경북(TK) 촛불집회
 
Facebook bigdata to understand regime change and migration patterns during ca...
Facebook bigdata to understand regime change and migration patterns during ca...Facebook bigdata to understand regime change and migration patterns during ca...
Facebook bigdata to understand regime change and migration patterns during ca...
 
세계산학관협력총회 Watef 패널을 공지합니다
세계산학관협력총회 Watef 패널을 공지합니다세계산학관협력총회 Watef 패널을 공지합니다
세계산학관협력총회 Watef 패널을 공지합니다
 
2017 대통령선거 후보수락 유튜브 후보수락 동영상 김찬우 박효찬 박한우
2017 대통령선거 후보수락 유튜브 후보수락 동영상 김찬우 박효찬 박한우2017 대통령선거 후보수락 유튜브 후보수락 동영상 김찬우 박효찬 박한우
2017 대통령선거 후보수락 유튜브 후보수락 동영상 김찬우 박효찬 박한우
 
2017년 인포그래픽스 과제모음
2017년 인포그래픽스 과제모음2017년 인포그래픽스 과제모음
2017년 인포그래픽스 과제모음
 
SNS 매개 학습공동체의 학습네트워크 탐색 : 페이스북 그룹을 중심으로
SNS 매개 학습공동체의 학습네트워크 탐색 : 페이스북 그룹을 중심으로SNS 매개 학습공동체의 학습네트워크 탐색 : 페이스북 그룹을 중심으로
SNS 매개 학습공동체의 학습네트워크 탐색 : 페이스북 그룹을 중심으로
 
2016년 촛불집회의 페이스북 댓글 데이터를 통해 본 하이브리드 미디어 현상
2016년 촛불집회의 페이스북 댓글 데이터를 통해 본 하이브리드 미디어 현상2016년 촛불집회의 페이스북 댓글 데이터를 통해 본 하이브리드 미디어 현상
2016년 촛불집회의 페이스북 댓글 데이터를 통해 본 하이브리드 미디어 현상
 

Developing the korean_internet_network_miner_change

  • 1. Developing the Korean Internet Network Miner(KINM): E-research Tool for Social Network Analysis of Blogospherein South Korea* Anatoliy Gruzd1), Chung Joo Chung2), Jaeeun(Angela) Yoo3), 박한우4) 2009년 한국자료분석학회 가을철 학술대회
  • 2. Anatoliy Gruzd1), A ssistant Professor, School of Information Management, Dalhousie University, Canada E-mail : agruzd@gmail.com Chung Joo Chung2) Ph.D. Candidate, Department of Communication, State University of New York at Buffalo, USA E-mail : idream4you@gmail.com Jaeeun(Angela) Yoo3) 3B.S. Student, Division of Engineering Science, University of Toronto, Canada E-mail : angela.yoo@utoronto.ca 박한우4) (Corresponding A uthor) A ssociate Professor, Department of Media and Information, YeungNam University, Korea. E-mail : hanpark@ ynu.ac.kr 2009년 한국자료분석학회 가을철 학술대회
  • 3. Contents: Introduction; Related Studies: Tools for Blog Network Analysis; Section 1. Development of the Korean Internet Network Miner; Section 2. Evaluation of the Name Network Discovery Algorithm; Conclusion.
  • 4. Introduction. The growing adoption of e-research tools has led to changes in social and communication research (Jankowski, 2009). Typical technologies in this domain include LexiURL (Thelwall, 2009), Virtual Observatory for the Study of Online Networks (Ackland, 2009), and Internet Community Text Analyzer (ICTA; Gruzd, 2009a).
  • 5. Introduction. In contrast to the e-research developments in North America and Europe, Soon and Park (2009) note that digital tools to support e-research are rare in Asia, even in South Korea (Internet World Stats, 2008). Thus, we attempt to develop an e-research tool for automatic discovery of online communication networks on the Korean Web. The first section deals with prior studies related to large-scale blog network analysis using automatic tools. The second section illustrates the process of developing our analytic tool, the Korean Internet Network Miner (KINM).
  • 6. Related Studies: Tools for Blog Network Analysis. The structure of networks can be measured mathematically and visualized graphically. The shape of a network emerging from online users' writing and linking choices reflects interest trends. Research on blog networks has confirmed that large-scale online communities are structurally reflected in higher-density network neighborhoods through linking (Kelly and Etling, 2008).
  • 7. Related Studies: Tools for Blog Network Analysis. Traditional data mining research focuses largely on algorithms for inferring association rules and other statistical correlation measures in a given data set (Kumar et al., 1999; Jung, 2009). For example:
      • Kelly and Etling (2008) used the research firm Morningside Analytics for blog selection and data mapping, along with the Fruchterman-Reingold "physics model" algorithm, to understand the blog networks of the Iranian blogosphere.
      • Gryc and his colleagues (2008) developed categories for the blog networks they studied by analyzing keywords, post classification, and linking patterns of blogs.
  • 8. Related Studies: Tools for Blog Network Analysis. The current research uses a web-based system for automated text analysis to discover and understand social networks from blog data. It focuses not only on chain networks (social networks based on the number of messages exchanged between individuals) but also on name networks (social networks built from mining personal names and nicknames).
  • 9. Section 1. Development of the Korean Internet Network Miner. As an initial framework for the KINM, we used some of the social network discovery and visualization tools and techniques previously developed by Gruzd (2009). These tools and techniques were developed as part of ICTA, a general-purpose web system for content and network analysis of computer-mediated communication in English (available at http://textanalytics.net).
  • 10. Section 1. Development of the Korean Internet Network Miner. Barriers to reusing ICTA:
      1. Since ICTA only works with texts in English, we had to modify all text processing functions to support Korean texts.
      2. ICTA requires the data to be stored in a machine-readable format such as an RSS feed. However, after a manual examination of a number of Korean blogs, we noticed that the majority of them do not provide RSS feeds for their comments data.
  • 11. Section 1. Development of the Korean Internet Network Miner. We were especially interested in analyzing comments data because comments contain most of the social interactions on a blog and therefore turned out to be a good source for mining social connections among blog readers. To address the missing RSS feeds, we created a script using the Kapow Mashup Server (http://www.kapowtech.com) to retrieve comments from a selected blog automatically and output them as an RSS feed.
  • 12. Section 1. Development of the Korean Internet Network Miner. After the blog data were retrieved, they were processed to build two types of networks.
      • First, a chain network was extracted. In the chain network, one commentator is connected to another if the first commentator directly replied to the second commentator by clicking on the "reply-to" button.
      • However, after manually examining a number of comments on several blogs, we found that some comments are not "reply-to" comments but still address or reference a previous poster. This observation is in line with a previous empirical study on online learning communities by Gruzd (2009a), which discovered that the chain network misses on average 40% of possible connections.
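The chain network described on this slide can be illustrated with a minimal Python sketch. It is not the KINM implementation: the comment fields (`id`, `author`, `reply_to`), the sample nicknames, and the choice to skip self-replies are all assumptions made for illustration.

```python
from collections import Counter


def build_chain_network(comments):
    """Return a Counter mapping (replier, original_poster) -> number of direct replies.

    Each comment is a dict with hypothetical keys:
    "id", "author", and "reply_to" (the id of the parent comment, or None).
    """
    author_by_id = {c["id"]: c["author"] for c in comments}
    edges = Counter()
    for c in comments:
        parent_id = c.get("reply_to")
        if parent_id is None or parent_id not in author_by_id:
            continue  # top-level comment, or the parent is not in the dataset
        target = author_by_id[parent_id]
        if target != c["author"]:  # assumption: self-replies are not social ties
            edges[(c["author"], target)] += 1
    return edges


# Made-up sample data, not taken from the study's dataset.
sample_comments = [
    {"id": 1, "author": "너구리", "reply_to": None},
    {"id": 2, "author": "jihwaja", "reply_to": 1},
    {"id": 3, "author": "너구리", "reply_to": 2},
    {"id": 4, "author": "jihwaja", "reply_to": 1},
]
chain_edges = build_chain_network(sample_comments)
```

A comment that mentions an earlier poster without using the "reply-to" button has `reply_to` set to `None` here and therefore produces no edge, which is the gap the name network is meant to fill.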
  • 13. Section 1. Development of the Korean Internet Network Miner. Name Network:
      • Instead of relying only on information about who replied to whom, the name network method starts by automatically finding all mentions of personal names or nicknames in comments and uses them as nodes in a social network.
      • Next, to discover ties between nodes, the method connects the sender of a comment to all names found in his/her comment. (A more detailed description of this method can be found in Gruzd (2009b).)
      • Although the name network approach provides additional information about connections among blog commentators, it has its own challenges. One of them is distinguishing between a personal name/nickname and a word that just appears to be one.
  • 14. Section 1. Development of the Korean Internet Network Miner. For example, the algorithm marked the word 사람 (people) as a reference to another person on the blog. This happened because there was at least one comment in the dataset posted by a person with the "사람" nickname. However, in the sample comment, this word does not refer to another online participant; it is used as a noun that means "people". <Figure 1> Sample comment
  • 15. Section 1. Development of the Korean Internet Network Miner. Name Network: Another good example of the challenges associated with the name/nickname disambiguation problem in comments is the word "2mb".
      1. This word can be used as a nickname by one of the blog commentators.
      2. It could refer to the capacity of a computer memory (2 megabytes).
      3. It could be the alias of the current Korean president, Lee Myung-Bak.
    To address these challenges and develop recommendations for the next generation of the name network discovery algorithm, we conducted a semi-automated analysis of all names/nicknames discovered from a sample dataset using our initial algorithm.
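The name network idea, and the "사람" false positive discussed above, can be reproduced with a small Python sketch. It is a simplification under stated assumptions rather than the KINM algorithm: the set of known nicknames is taken to be the set of comment authors, matching is plain substring search, and the sample comments are invented.

```python
from collections import Counter


def build_name_network(comments):
    """Connect each sender to every known nickname mentioned in the comment text.

    Assumption: a "known nickname" is any author seen in the dataset, and a
    mention is a plain substring match (no morphological analysis).
    """
    nicknames = {c["author"] for c in comments}
    edges = Counter()
    for c in comments:
        for name in nicknames:
            if name != c["author"] and name in c["text"]:
                edges[(c["author"], name)] += 1
    return edges


# Made-up comments; the third one uses "사람" as the ordinary noun "people".
sample = [
    {"author": "사람", "text": "좋은 글 감사합니다"},
    {"author": "너구리", "text": "jihwaja님 말씀에 동의합니다"},
    {"author": "jihwaja", "text": "요즘 사람들이 많이 힘들어요"},
]
name_edges = build_name_network(sample)
```

In this toy data the edge from "너구리" to "jihwaja" is a genuine mention, while the edge from "jihwaja" to "사람" is spurious: the word appears only as a common noun, yet it matches a nickname that exists in the dataset.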
  • 16. Section 2. Evaluation of the Name Network Discovery Algorithm. To evaluate our automated approach for analyzing communication networks from blog comments, specifically the accuracy of the name network discovery algorithm, we selected a single blog authored by 방짜 (bangzza) from http://blog.ohmynews.com/bangzza
  • 17. Section 2. Evaluation of the Name Network Discovery Algorithm. OhMyNews was ranked as one of the top three web sites in terms of blog users in 2009 (Rankey.com, 2009). It is frequently ranked as the most popular blog site in Korea, registering over 20 million page views per day during the presidential election. Users of OhMyNews, known as "news guerrillas", contribute news articles on the web site. OhMyNews allows individuals in far-flung locations to come together, share, and build strong ties and a sense of community, united in ideology even if separated by geographic distance, that fosters a true grassroots movement (Streitmatter, 2001).
  • 18. Section 2. Evaluation of the Name Network Discovery Algorithm. For our tests, we retrieved and analyzed a sample set of 943 comments (posted between April 2008 and April 2009) from the selected blog. In the study, we relied on an interactive tag cloud feature available in KINM to explore and evaluate all names and nicknames that were found automatically (see Figure 2). <Figure 2> An interactive tag cloud showing the 50 most frequently used name/nickname candidates found in the sample dataset
  • 19. Section 2. Evaluation of the Name Network Discovery Algorithm. The evaluation procedure involved clicking on each word found by the name network algorithm and exploring the context where each instance of the word was used (see Figure 3). The purpose of this semi-automated analysis was to discover which name/nickname candidates were identified incorrectly and why. <Figure 3> A list of messages containing "2MB"
  • 20. Section 2. Evaluation of the Name Network Discovery Algorithm. The following set includes clues suggesting that a word is likely to be a nickname:
      ● a word candidate is followed by a context word such as "님" (an honorific) or "씨" (Mr./Ms.); other possibilities, although rare, include "군" (Mr., for younger males) or "양" (Miss/Ms., for younger females) at the end of a word candidate, and "미스터" (Mr.) or "미스" (Miss) at the beginning;
      ● a word candidate contains a combination of characters (Korean, English and/or Chinese), symbols (e.g., underscores, hyphens) and numbers;
      ● a word candidate appears to be a real name, which is almost always three characters: a single-character last name followed by a two-character first name, which may be found in a dictionary of first names and/or common characters used therein;
      ● a word candidate is a less common, non-topic word (e.g., "너구리" = raccoon);
      ● a word candidate is followed by punctuation indicative of someone being addressed (e.g., "/" or ":");
      ● a word candidate contains patterns indicative of non-native words: phonetic Koreanization of English (e.g., "미디어몽골" = mediamogul = Media Mogul) or phonetic romanization of Korean (e.g., "jihwaja" = 지화자).
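A few of the positive clues listed on this slide can be expressed as simple pattern checks. The Python sketch below is illustrative only: the function name, the clue labels, and the regular expressions are assumptions, and it covers just the honorific, addressing-punctuation, and mixed-character clues (the dictionary-based and topic-based clues need external word lists that the slides do not describe).

```python
import re

# Context words taken from the slide; everything else here is hypothetical.
HONORIFIC_SUFFIXES = ("님", "씨", "군", "양")
HONORIFIC_PREFIXES = ("미스터", "미스")


def nickname_clues(candidate, comment_text):
    """Return the set of positive clue labels that fire for a candidate word."""
    clues = set()
    escaped = re.escape(candidate)
    suffixes = "|".join(HONORIFIC_SUFFIXES)
    prefixes = "|".join(HONORIFIC_PREFIXES)

    # Candidate directly followed by an honorific such as "님" or "씨".
    if re.search(escaped + r"\s?(?:" + suffixes + r")", comment_text):
        clues.add("honorific_suffix")
    # Candidate preceded by "미스터" or "미스".
    if re.search(r"(?:" + prefixes + r")\s?" + escaped, comment_text):
        clues.add("honorific_prefix")
    # Candidate followed by "/" or ":", which suggests someone is being addressed.
    if re.search(escaped + r"\s?[/:]", comment_text):
        clues.add("address_punctuation")

    # Mix of at least two character classes (Hangul, Latin, digits/symbols).
    has_hangul = re.search(r"[가-힣]", candidate) is not None
    has_latin = re.search(r"[A-Za-z]", candidate) is not None
    has_digit_or_symbol = re.search(r"[0-9_\-]", candidate) is not None
    if sum([has_hangul, has_latin, has_digit_or_symbol]) >= 2:
        clues.add("mixed_characters")
    return clues
```

For instance, `nickname_clues("너구리", "너구리님 글 잘 읽었습니다")` fires only the honorific-suffix clue. Short suffixes such as "양" also occur inside ordinary words, so a check like this would over-trigger on real data and is best treated as one weak signal among several.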
  • 21. Section 2. Evaluation of the Name Network Discovery Algorithm. The second set includes clues suggesting that a word is NOT likely to be used as a nickname:
      ● a word candidate is a phrase, for example, if the nickname input (the "FROM" field) is used more like a subject line (possible indicators include white spaces and length);
      ● a word candidate consists of a single character (e.g., "a" or "ㄱ");
      ● a word candidate consists of netspeak, including emoticons (e.g., "=_="), slang and abbreviations (e.g., using "2MB" to refer to the current Korean president), and onomatopoeia (e.g., "ㅉㅉ" = tsk tsk, "ㅋㅋ" = heehee, "하하" = haha, "음" = hmm);
      ● a word candidate appears more than once in the comment;
      ● a word candidate consists of random characters (e.g., "ㅁㄴㅇㄹ" or "asdf");
      ● a word candidate is a short, conversational word or phrase (e.g., "나" = me, "아이고" = oh no, "그래서" = so/therefore);
      ● a word candidate is a common word or idea in the given context/topic (e.g., "대한민국" = Republic of Korea, "쥐체사상" = a newly created word used to refer to political fanatics).
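The negative clues can be sketched in the same spirit. Again this is a hedged illustration, not the rule set that was later built into KINM: the word lists contain only the examples quoted on the slide, the 20-character length threshold is an arbitrary assumption, and a "jamo only" test stands in for the broader "random characters" clue.

```python
import re

# Example words from the slide; a real system would need much larger lists.
NETSPEAK = {"ㅉㅉ", "ㅋㅋ", "하하", "음", "=_=", "2mb"}
CONVERSATIONAL = {"나", "아이고", "그래서"}
MAX_NICKNAME_LENGTH = 20  # hypothetical threshold for "phrase-like" input


def non_nickname_clues(candidate, comment_text):
    """Return the set of negative clue labels that fire for a candidate word."""
    clues = set()
    if len(candidate) == 1:
        clues.add("single_character")
    if " " in candidate.strip() or len(candidate) > MAX_NICKNAME_LENGTH:
        clues.add("phrase_like")
    if candidate.lower() in NETSPEAK:
        clues.add("netspeak")
    if candidate in CONVERSATIONAL:
        clues.add("conversational")
    # Strings made only of standalone Hangul letters (e.g. "ㅁㄴㅇㄹ") are
    # usually keyboard mashing or onomatopoeia rather than names.
    if re.fullmatch(r"[ㄱ-ㅎㅏ-ㅣ]+", candidate):
        clues.add("jamo_only")
    # The word occurs more than once in the same comment.
    if comment_text.count(candidate) > 1:
        clues.add("repeated_in_comment")
    return clues
```

One possible design, not stated in the slides, is to combine these labels with the positive clues into a score per candidate and keep only candidates whose positive evidence outweighs the negative evidence.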
  • 22. Conclusion. This research briefly reviewed some of the studies related to large-scale blog analysis using automatic tools. We then described the process of developing our own analysis tool, KINM. The main goal of KINM is to automate the process of finding communication networks in the Korean blogosphere that accurately represent social interactions among blog readers. To find these networks, the system relies on a set of text mining techniques to look for personal names and nicknames in users' comments.
  • 23. Conclusion. To address some of the challenges associated with the automated discovery of names and nicknames in Korean texts, this paper also presented an exploratory study of a sample dataset. The study suggests a set of additional rules to improve the accuracy of the current name/nickname discovery algorithm used in KINM. These additional rules will be incorporated into KINM and evaluated in a subsequent study.
  • 24. Thank you.