November 17th, 2011
                       www.know-center.at




Information Quality in
Social Media
Presentation at UNSL


Elisabeth Lex
Agenda

- The Know-Center
- The WIQ-EI project
- Why Information Quality on the Web?
- Selected Results
- Conclusion

The Know-Center – We are...

Austria’s Competence Center for Knowledge Management
and Knowledge Technologies
Link between Science and Industry
A multi-disciplinary team of 40+ Scientists and Developers
Over 575 publications since 2001
100 Master's theses, 26 PhD theses, 4 habilitations
Editors of 2 Journals: Journal of Universal Knowledge
Management, Journal of Universal Computer Science
Organizer of the International Conference on Knowledge
Management and Knowledge Technologies (I-KNOW)

The Know-Center

2 Areas of Research:
  - Knowledge Relationship Discovery:
      - Detecting semantic entities, semantic relations in
        unstructured data
      - Cross-language and cross-domain search and retrieval
      - Automatic analysis of information structure and quality
      - User interfaces for visual analysis of large information
        repositories
  - Knowledge Services:
      - Web 2.0, Collective Intelligence and Social Network Analysis
      - Semantic Technologies, Semantic Web, Semantic Retrieval
      - Communication and Collaboration Technologies
      - Mobile Technologies

The WIQ-EI Project - Goals

Web Information Quality Evaluation Initiative
3 Objectives:
  - Development of Web Content Information Quality Measures
  - Plagiarism Detection and Authorship Attribution
  - Multilingual Opinion and Sentiment Mining

→ Derive algorithms, tools and test data sets

The WIQ-EI Project - Implementation

On a global scale:
  - Researcher exchanges between organisations from
    European (Austria, Germany, Spain, Greece) and
    non-European countries with expertise in topic-relevant
    fields (Argentina, Mexico, India)
  - Carry out research secondments, training and
    dissemination activities, challenges, workshops

Agenda

- The Know-Center
- Why Information Quality on the Web?
- Selected Results
- Conclusion

Introduction

- On the Web: large amount of potentially useful content
  → Navigating is challenging
- Web is changing: User Generated Content, Social Media




  - Social media is up to date
  - Wide audience, highly dynamic
  - Open to (almost) anyone
  - Powerful, e.g. for media resonance analysis







Information Quality of
Social Media is questionable!

What is Information Quality?

A multi-dimensional concept [Klein, 2001]
Different Types of Information Quality (IQ) [Knight2005]
E.g. [Wang1996]:
  - Intrinsic IQ: Accuracy, Objectivity, Believability,
    Reputation
  - Accessibility IQ: Accessibility, Security
  - Contextual IQ: Relevancy, Value-Added, Timeliness,
    Completeness, Amount of Information, Presence of Author
    information [Katerattanakul1999]
  - Representational IQ: Interpretability, Ease of
    Understanding, Concise Representation, Consistent
    Representation

Information Quality – Link to Information
Retrieval, Data Mining

[Figure: The Information Retrieval Process, annotated with:]
  - Text Mining: enables retrieval of core information from
    unstructured text
      - Information Extraction
      - Clustering
      - ...
  - Faceted Search
  - IQ Dimensions:
      - Objectivity
      - Accuracy
      - ...

Our work – Focus on Media Domain

Goal: Assess intrinsic Information Quality in social
media, traditional media, arbitrary Web content
Several IQ dimensions:
  - Objectivity
  - Emotionality
  - Credibility
  - Readability
  - In-depth versus Shallow
  - Expert versus Non-Expert
  - Personal versus Official

Agenda

- The Know-Center
- Why Information Quality in Media Domain?
- Selected Results
- Conclusion

Results
Information Quality Dimension: Objectivity

Task:
  - Objectivity Classification in Blogs
Use features based on style properties
Dataset: TREC Blogs08 - 83 blogs, 12844 blog posts

Results:
  - Accuracy of 87% for Objectivity Classification in Blogs
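
A minimal sketch of what style-based features for objectivity classification could look like. The slides do not list the actual features, so the four features below and the function name `style_features` are assumptions for illustration only, not the features used in the reported experiment.

```python
import re

# Hypothetical list of first-person pronouns, used as a crude subjectivity cue.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "our", "us"}

def style_features(text: str) -> dict:
    """Compute a few simple style features (illustrative feature set, an assumption)."""
    tokens = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        "avg_sentence_length": len(tokens) / max(len(sentences), 1),
        "first_person_ratio": sum(t.lower() in FIRST_PERSON for t in tokens) / n_tokens,
        "exclamation_ratio": text.count("!") / max(len(text), 1),
    }

# Two made-up example posts.
subjective = style_features("I love this! My day was great!")
objective = style_features("The council approved the budget on Monday.")
```

Feature vectors like these would then be passed to a supervised classifier trained on labelled blog posts.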




Results
Information Quality Dimension: Credibility

Rank blogs by credibility
  - Compare blogs with credible source:
      - Quantity structure
      - Content similarity: Nouns, Verbs + Adjectives

Dataset: APA news articles, crawled blogs

Results:
  - Average precision of 83% for blog credibility ranking
  - Correlation between quantity structures of blogs and news,
    e.g. query “Frankreich” (German for “France”): Pearson correlation coefficient 0.79

[Juffinger, Granitzer, Lex 2009] Blog credibility ranking by exploiting verified content. In Proc. of WICOW at WWW'2009.
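
The Pearson correlation coefficient mentioned above can be sketched as follows. Reading "quantity structure" as the number of documents matching a query per time slot is an assumption here, and the counts below are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical counts of blog posts and news articles matching one query per day.
blog_counts = [2, 5, 9, 4, 1]
news_counts = [1, 4, 8, 5, 2]
r = pearson(blog_counts, news_counts)
```

A value of `r` close to 1 means the blog coverage rises and falls together with the news coverage.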
Results
Web Genre and Quality Classification

- ECML/PKDD Discovery Challenge 2010
  - Task 1: Web Genre and Quality Facets
      - News/Editorial, Educational, Discussion, Commercial,
        Personal/Leisure, Web Spam
      - Bias, Trustworthiness, Neutrality
  - Task 2: English Content Quality: Combination of Facets → Quality Score
  - Task 3: Multilingual Content Quality: German, French
- Dataset: English, German, French Web hosts: NLP Features,
  Content Features, Terms, Links
- Approach: Ensemble Classifier Approach (J48, CFC, SVM)

Combined Quality Score




Use Case: Web Archival
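
One way to picture a combined quality score is a weighted average over facet scores. The slides do not say how the facets were combined, so the formula, the equal weights, the facet values, and the name `combined_quality` are all assumptions for illustration.

```python
def combined_quality(facets, weights):
    """Weighted average of facet scores in [0, 1]; higher means better quality."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * facets[name] for name in weights) / total_weight

# Hypothetical facet scores for one Web host; bias is inverted so higher is better.
bias = 0.1
facets = {"trustworthiness": 0.8, "neutrality": 0.6, "unbiasedness": 1.0 - bias}
weights = {"trustworthiness": 1.0, "neutrality": 1.0, "unbiasedness": 1.0}
score = combined_quality(facets, weights)
```

For a use case such as Web archival, hosts could then be ranked by `score` to decide what to archive first.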
Results
Web Genre and Quality Classification
  Challenges:
    - Unbalanced and low-quality training data (training data also
      contained Hungarian, Czech, ... hosts)
    - News and Educational hard to separate
    - Too little training data for German and French hosts

  Results:
    - Method performs best for Educational/Research (NDCG 0.688),
      Commercial (0.694), and Personal/Leisure (0.583)
    - English quality task: NDCG 0.844
    - Multilingual quality task: Use topic-independent features from English
      hosts
        - German: NDCG 0.792
        - French: NDCG 0.823

[Lex et al., 2010]. Assessing the quality of Web content. In Proceedings of the ECML/PKDD Discovery Challenge.
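
The results above are reported as NDCG (normalized discounted cumulative gain). A minimal sketch of one common variant follows; the slides do not state which gain and discount variant the challenge used, so the linear gain with a log2 discount here is an assumption.

```python
from math import log2

def dcg(relevances):
    """Discounted cumulative gain: relevance at rank i is divided by log2(i + 1)."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels of hosts, listed in the order a system ranked them.
perfect = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
```

NDCG is 1.0 for an ideally ordered list and drops as highly relevant items are ranked lower.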
Agenda

- The Know-Center
- Why Information Quality in Social Media?
- Selected Results
- Conclusion

Conclusions
Summary

- Information Quality (IQ) consists of multiple dimensions
- Depends on Use Case
  - BUT: Several dimensions are commonly agreed upon
- IQ dimensions can be combined in one quality score
- Supervised Classification often used to assess IQ
  - However, training data needed!
- Simple style-based features are suited to assessing IQ dimensions

Thank you for your attention!





Information Quality Assessment in the WIQ-EI EU Project

  • 1. November 17th, 2011 www.know-center.at Information Quality in Social Media Presentation at UNSL Elisabeth Lex
  • 2. Agenda: The Know-Center; The WIQ-EI project; Why Information Quality on the Web?; Selected Results; Conclusion
  • 3. The Know Center – We are... Austria’s Competence Center for Knowledge Management and Knowledge Technologies. Link between science and industry. A multi-disciplinary team of 40+ scientists and developers. Over 575 publications since 2001. 100 Master theses, 26 PhD theses, 4 habilitations. Editors of 2 journals: Journal of Universal Knowledge Management, Journal of Universal Computer Science. Organizer of the International Conference on Knowledge Management and Knowledge Technologies (I-KNOW).
  • 4. The Know Center: 2 Areas of Research. Knowledge Relationship Discovery: detecting semantic entities and semantic relations in unstructured data; cross-language and cross-domain search and retrieval; automatic analysis of information structure and quality; user interfaces for visual analysis of large information repositories. Knowledge Services: Web 2.0, Collective Intelligence and Social Network Analysis; Semantic Technologies, Semantic Web, Semantic Retrieval; Communication and Collaboration Technologies; Mobile Technologies.
  • 5. The WIQ-EI Project - Goals. Web Information Quality Evaluation Initiative. 3 Objectives: development of Web content Information Quality measures; plagiarism detection and authorship attribution; multilingual opinion and sentiment mining. Derive algorithms, tools and test data sets.
  • 6. The WIQ-EI Project - Implementation. On a global scale: researcher exchanges between organisations from European countries (Austria, Germany, Spain, Greece) and non-European countries with expertise in topic-relevant fields (Argentina, Mexico, India). Carry out research secondments, training and dissemination activities, challenges, workshops.
  • 7. Agenda: The Know-Center; Why Information Quality on the Web?; Selected Results; Conclusion
  • 8. Introduction. On the Web: a large amount of potentially useful content, so navigating is challenging. The Web is changing: User Generated Content, Social Media.
  • 9. Introduction. On the Web: a large amount of potentially useful content, so navigating is challenging. The Web is changing: User Generated Content, Social Media. Social media: up to date; wide audience, highly dynamic; open to (almost) anyone; powerful, e.g. for media resonance analysis.
  • 10. Introduction. On the Web: a large amount of potentially useful content, so navigating is challenging. The Web is changing: User Generated Content, Social Media. Social media: up to date; wide audience, highly dynamic; open to (almost) anyone; powerful, e.g. for media resonance analysis. Information Quality of Social Media is questionable!
  • 11. What is Information Quality? A multi-dimensional concept [Klein, 2001]. Different types of Information Quality (IQ) [Knight2005]. E.g. [Wang1996]: Intrinsic IQ: Accuracy, Objectivity, Believability, Reputation. Accessibility IQ: Accessibility, Security. Contextual IQ: Relevancy, Value-Added, Timeliness, Completeness, Amount of Information, Presence of Author Information [Katerattanakul1999]. Representational IQ: Interpretability, Ease of Understanding, Concise Representation, Consistent Representation.
  • 12. Information Quality – Link to Information Retrieval, Data Mining. [Diagram: The Information Retrieval Process]
  • 13. Information Quality – Link to Information Retrieval, Text Mining. [Diagram: The Information Retrieval Process, with Text Mining]
  • 14. Information Quality – Link to Information Retrieval, Data Mining. Text Mining (Information Extraction, Clustering, ...) enables retrieval of core information from unstructured text. [Diagram: The Information Retrieval Process]
  • 15. Information Quality – Link to Information Retrieval, Data Mining. Text Mining (Information Extraction, Clustering, ...) enables retrieval of core information from unstructured text. Faceted Search. [Diagram: The Information Retrieval Process]
  • 16. Information Quality – Link to Information Retrieval, Data Mining. Text Mining; Faceted Search. [Diagram: The Information Retrieval Process]
  • 17. Information Quality – Link to Information Retrieval, Data Mining. IQ Dimensions: Objectivity, Accuracy, ... Text Mining; Faceted Search. [Diagram: The Information Retrieval Process]
  • 18. Our work – Focus on Media Domain. Goal: assess intrinsic Information Quality in social media, traditional media, and arbitrary Web content. Several IQ dimensions: Objectivity; Emotionality; Credibility; Readability; In-depth versus Shallow; Expert versus Non-Expert; Personal versus Official.
  • 19. Agenda: The Know-Center; Why Information Quality in Media Domain?; Selected Results; Conclusion
  • 20. Results. Information Quality Dimension: Objectivity. Task: objectivity classification in blogs. Use features based on style properties. Dataset: TREC Blogs08 (83 blogs, 12844 blog posts). Results: accuracy of 87% for objectivity classification in blogs.
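The transcript does not list which style properties were used for slide 20, so the following is only a rough sketch of what style-based features could look like. The three features here (first-person pronoun ratio, exclamation mark ratio, average word length) and the function name `style_features` are illustrative assumptions, not the features behind the reported 87% accuracy.

```python
import re

# Assumption: first-person pronouns as a crude signal of subjective writing.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def style_features(text: str) -> dict[str, float]:
    """Compute a few simple, topic-independent style features for one post."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    n_chars = max(len(text), 1)
    return {
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n_tokens,
        "exclamation_ratio": text.count("!") / n_chars,
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
    }


features = style_features("I love my blog!")
```

Feature vectors of this kind could then be fed to any supervised classifier trained on posts labelled objective or subjective.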
  • 21. Results. Information Quality Dimension: Credibility. Rank blogs by credibility. Compare blogs with a credible source: quantity structure; content similarity (nouns, verbs + adjectives). Dataset: APA news articles, crawled blogs. Results: average precision of 83% for blog credibility ranking; correlation between quantity structures of blogs and news, e.g. query “Frankreich” (German for “France”), Pearson correlation coefficient: 0.79. [Juffinger, Granitzer, Lex 2009] Blog credibility ranking by exploiting verified content. In Proc. of WICOW at WWW 2009.
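To make the correlation figure on slide 21 concrete, here is a minimal sketch of the Pearson correlation coefficient. It assumes that a "quantity structure" can be represented as a series of per-interval document counts for a query; that representation, the function name `pearson`, and the sample counts are assumptions for illustration, not data from the cited paper.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up daily counts of news articles and blog posts matching one query.
news_counts = [3, 5, 9, 4, 2]
blog_counts = [1, 2, 6, 3, 1]
r = pearson(news_counts, blog_counts)
```

A value close to 1 would mean that blog activity for the query rises and falls together with news coverage.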
  • 22. Results. Web Genre and Quality Classification: ECML/PKDD Discovery Challenge 2010. Task 1: Web Genre and Quality Facets (News/Editorial, Educational, Discussion, Commercial, Personal/Leisure, Web Spam; Bias, Trustworthiness, Neutrality). Task 2: English Content Quality: combination of facets into a quality score. Task 3: Multilingual Content Quality: German, French. Dataset: English, German, French Web hosts, with NLP features, content features, terms, links. Approach: ensemble classifier (J48, CFC, SVM).
  • 23. Combined Quality Score. Use Case: Web Archival.
  • 24. Results. Web Genre and Quality Classification. Challenges: unbalanced and low-quality training data (training data also contained Hungarian, Czech, ... hosts); News and Educational hard to separate; too little training data for German and French hosts. Results: method performs best for Educational/Research (NDCG 0.688), Commercial (0.694), and Personal/Leisure (0.583); English quality task: NDCG 0.844; multilingual quality task: use topic-independent features from English hosts (German: NDCG 0.792; French: NDCG 0.823). [Lex et al., 2010]. Assessing the quality of Web content. In Proceedings of the ECML/PKDD Discovery Challenge.
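The results on slide 24 are reported as NDCG (normalized discounted cumulative gain). The sketch below shows one common formulation, with linear gain and a log2 rank discount. The transcript does not say which NDCG variant or cutoff the challenge used, so this is an assumption, and the sample relevance labels are made up.

```python
from math import log2


def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances_in_ranked_order: list[float]) -> float:
    """DCG of the produced ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances_in_ranked_order, reverse=True))
    if ideal == 0:
        return 0.0  # no relevant items at all
    return dcg(relevances_in_ranked_order) / ideal


# Made-up relevance labels of hosts, listed in the order a classifier ranked them.
score = ndcg([2, 3, 0, 1])
```

A score of 1.0 means the hosts were ranked exactly in order of decreasing relevance; lower scores indicate that relevant hosts were placed too far down the list.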
  • 25. Agenda: The Know-Center; Why Information Quality in Social Media?; Selected Results; Conclusion
  • 26. Conclusions. Summary: Information Quality (IQ) consists of multiple dimensions. It depends on the use case, BUT several dimensions are commonly agreed upon. IQ dimensions can be combined into one quality score. Supervised classification is often used to assess IQ; however, training data is needed! Simple style-based features are suited to assessing IQ dimensions.
  • 27. Thank you for your attention!