Searching for patterns in
crowdsourced Information



Silvia Puglisi
Table of contents



- Let me introduce myself..
- What is crowdsourcing?
- Discovering network dynamics and patterns in
unstructured data.
- Where to go from here..
Let me introduce myself..

2007: Graduated in Computer Engineering from Polimi
[Politecnico di Milano].

Thesis on applications in robotics of a model of the
hippocampal spatial function.

The project involved applying a path-planning algorithm
based on neural networks to an e-puck robot.

See http://www.e-puck.org for more info on e-puck.
Let me introduce myself..



2007: Joined Google as Corporate Operations Engineer.

My responsibilities included maintaining, designing,
diagnosing, troubleshooting and/or updating Google
corporate IT infrastructure and user-facing services.
Let me introduce myself..

2010: Joined Google Enterprise team as Technical Account
Manager for Gmail and Postini.

My responsibilities included:
- Develop creative solutions to maximize the adoption of
Google Apps in organisations.
- Work with product and engineering teams to translate
customer needs into a better product experience.
- Develop and implement processes and infrastructure to
scale customer-facing operations.
Let me introduce myself..


2012: Left Google to finish M.Sc. Thesis and prepare for
Ph.D.

2012: Graduated from Trinity College Dublin with an M.Sc.
in Management of Information Systems.

Final Thesis: Proposing a method for evaluating the quality
of crowdsourced geographical information.
What is crowdsourcing?




Crowdsourcing can be defined as the application of Open
    Source principles to fields outside of software.
                                               Howe, 2006.
What is crowdsourcing?




Crowdsourcing takes a decentralized approach to problem
solving, sourcing tasks that have been performed
traditionally by individuals, to a group of people:
                          the crowd.
From crowdsourcing to
spontaneous collaboration.

Crowdsourcing initiatives usually start with a call for
solutions from an organization or an entity.

Although..
Network dynamics are sometimes also an indirect source
of data and answers to specific problems.

Wikipedia is perhaps the most striking example of this
phenomenon, in which people decide to collaborate
spontaneously on a task.
Discovering network dynamics and
patterns in unstructured data.



    “Some twenty years ago I saw, or thought I saw, a
  synchronal or simultaneous flashing of fireflies. I could
 hardly believe my eyes, for such a thing to occur among
      insects is certainly contrary to all natural laws.”
                        Philip Laurent, Science Journal 1917
Discovering network dynamics and
patterns in unstructured data.

Complex network structures describe a wide variety of
systems, of technological and biological importance.

The web itself is an example of a complex network of
pages linked by their hyperlinks.

A social network is instead a network whose nodes are
human beings and whose edges are the various
relationships that occur between them.
The web is a giant bubble of
unstructured data.

The web has hence been developing as an open
environment with infinite possibilities for collaboration and
information sharing.

User activity on the web now generates content that
provides diverse information about the interactions
between different entities and the world around them.

This is enhanced in Social Networks where people
voluntarily share information about anything.
Volunteered Information VS web
pages.



Volunteered information consists of snippets of text, most
of the time just a few words, with other media attached:
photos, videos, sounds.

Volunteered information is to web pages what post-its or
snippets are to books.
Volunteered Information VS web
pages.


Volunteered information does not exhibit an explicit
network structure made up of explicit links between items.

In the case of a web page, this structure is evident, since
one page can link to other pages explicitly.

Links between volunteered information are instead created
by the relationships between the contexts of the documents.
Defining context..

The context of a document is made of the surrounding
circumstances and facts that influence the meaning of a
sentence, a passage, or even just a picture, a video or an
audio file.

Understanding the context is the key to understanding the
semantics of a document and hence how much valuable
information it actually contains.
Defining context..


 Defining context hence means trying to figure out what
  can be automatically inferred regarding:

    - Where was the document created?
    - Who created the document and shared it?
    - What does the document describe?
    - When was it shared?
Context is the key ingredient.


Context is then the ingredient that adds value to
information.

If a document can be contextually linked to other
documents it becomes more relevant.

It means more information can be inferred regarding that
document.
Which context?

Regarding volunteered information, five types of context
can be identified for a given object:

1) personal,
2) social,
3) geographical,
4) temporal,
5) linguistic.
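
As a rough illustration, the five context types could be kept as fields of a small record, and two items compared field by field. This is only a sketch: the `Context` class, the `shared_context` helper, the sample values, and the reading of each field (for example "personal" as the creator) are assumptions made for this example, not part of the talk.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record holding the five context types for one shared item.
@dataclass(frozen=True)
class Context:
    personal: Optional[str] = None      # assumed: who created the item
    social: Optional[str] = None        # assumed: circle it was shared in
    geographical: Optional[str] = None  # assumed: where it was created
    temporal: Optional[str] = None      # assumed: when it was shared
    linguistic: Optional[str] = None    # assumed: language of the text

FIELDS = ("personal", "social", "geographical", "temporal", "linguistic")

def shared_context(a: Context, b: Context) -> list[str]:
    """Return the names of the context fields on which two items agree."""
    return [f for f in FIELDS
            if getattr(a, f) is not None and getattr(a, f) == getattr(b, f)]

# Made-up sample items.
photo = Context(personal="alice", geographical="Dublin",
                temporal="2012-06-01", linguistic="en")
note = Context(personal="bob", geographical="Dublin",
               temporal="2012-06-01", linguistic="en")
links = shared_context(photo, note)
```

Each shared field is a candidate contextual link between the two items; fields left unset are ignored rather than treated as a match.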
A network model.



If context is interpreted as a property of a given object, we
find that at every level each attribute defines a derived
hierarchy in which an element “belongs to” or is a “child”
of another element higher or lower in the hierarchy.
A network model.


Let's imagine the following follower/followed relationships
in a social network..

John Stewart follows Dave Matthews and Stephen Colbert
Tim Reynolds follows Dave Matthews and Stephen Colbert
Stephen Colbert follows John Stewart
Dave Matthews follows John Stewart and Tim Reynolds
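
The relationships above can be written down as a directed graph. The sketch below stores them as a plain dictionary and counts followers and followees; the variable names are chosen for this example only.

```python
# Directed "follows" edges, copied from the example above.
follows = {
    "John Stewart": ["Dave Matthews", "Stephen Colbert"],
    "Tim Reynolds": ["Dave Matthews", "Stephen Colbert"],
    "Stephen Colbert": ["John Stewart"],
    "Dave Matthews": ["John Stewart", "Tim Reynolds"],
}

# Invert the edges to find who follows each person.
followers = {name: [] for name in follows}
for source, targets in follows.items():
    for target in targets:
        followers[target].append(source)

# Out-degree: how many people a user follows.
out_degree = {name: len(targets) for name, targets in follows.items()}
# In-degree: how many people follow a user.
in_degree = {name: len(sources) for name, sources in followers.items()}
```

Every edge is counted once as an outgoing link and once as an incoming link, so the two degree totals always match.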
A network model.

Let's now concentrate on attributes for volunteered
information.

Every attribute could describe a node in our system.

Every edge describes the frequency (or probability) with
which two attributes appear together.

This is particularly true for tag networks.
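
A minimal sketch of such an attribute network for tags, assuming each item is simply a set of tags. The sample items and the `cooccurrence_frequency` helper are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged items; each set holds the tags attached to one item.
items = [
    {"dublin", "pub", "music"},
    {"dublin", "music"},
    {"dublin", "rain"},
    {"music", "guitar"},
]

tag_counts = Counter()   # node weights: how often each tag appears
pair_counts = Counter()  # edge weights: how often two tags appear together
for tags in items:
    tag_counts.update(tags)
    # Sort so each unordered pair is always stored under the same key.
    pair_counts.update(combinations(sorted(tags), 2))

def cooccurrence_frequency(a: str, b: str) -> float:
    """Fraction of items in which both tags appear together."""
    return pair_counts[tuple(sorted((a, b)))] / len(items)
```

Here the tags are the nodes and `pair_counts` holds the edge weights; dividing by the number of items turns a count into an empirical frequency.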
A network model.



Such a model hence consists of N nodes, where each pair of
nodes is connected with probability p, creating a graph with
approximately p N (N-1) / 2 edges distributed randomly.

This is what is called a random graph model, and it is
among the most used models in complex networks theory.
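
A short sketch of this random graph model, assuming undirected edges and nodes numbered 0 to N-1. The function name, the seed and the values of N and p are arbitrary choices for the example.

```python
import random
from itertools import combinations

def random_graph(n: int, p: float, seed: int = 0) -> set[tuple[int, int]]:
    """Return the edge set of a random graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}

n, p = 200, 0.05
edges = random_graph(n, p)
expected = p * n * (n - 1) / 2  # p N (N-1) / 2, i.e. 995 for these values
```

The actual edge count varies from run to run around `expected`, since every one of the N (N-1) / 2 possible pairs is an independent coin flip.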
Small world networks.


It is generally agreed that the relationships between nodes
in such networks are not entirely random, but display some
hints of the underlying organizing principles.

One such principle is the small-world concept, which
describes how, despite their often large size, complex
networks have a relatively short path between any two
nodes (Watts, D. J., & Strogatz, S. H., 1998).
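
The effect can be seen on a toy graph. The sketch below starts from a ring lattice, adds a few random shortcut edges, and compares the average shortest-path length before and after. Note that it adds shortcuts without removing any edge (the Newman-Watts variant) rather than rewiring edges as in the original Watts-Strogatz construction; the sizes and the seed are arbitrary.

```python
import random
from collections import deque

def ring_lattice(n: int, k: int) -> dict[int, set[int]]:
    """Link each node to its k nearest neighbours on a ring (k even)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def add_shortcuts(adj, count, seed=0):
    """Return a copy of adj with `count` random extra edges added."""
    rng = random.Random(seed)
    new = {u: set(vs) for u, vs in adj.items()}
    nodes = list(new)
    added = 0
    while added < count:
        u, v = rng.sample(nodes, 2)
        if v not in new[u]:
            new[u].add(v)
            new[v].add(u)
            added += 1
    return new

def average_path_length(adj) -> float:
    """Mean shortest-path length over all reachable pairs, via BFS from each node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = ring_lattice(20, 4)
shortcut = add_shortcuts(lattice, 5)
lattice_avg = average_path_length(lattice)
shortcut_avg = average_path_length(shortcut)
```

Because shortcuts only add edges, `shortcut_avg` cannot exceed `lattice_avg`; a handful of long-range links is usually enough to shorten typical paths noticeably.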
Properties of small world networks.


A common property of such networks is that the
relationships between the nodes tend to form cliques.

Cliques may represent circles of acquaintances at a social
level, they can describe all the users of an online
community who tend to communicate with one another, or they
can describe relationships between words in different
documents.
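
One common way to measure how close a node's neighbourhood is to a clique is the local clustering coefficient: the fraction of pairs of its neighbours that are themselves linked. A minimal sketch for an undirected graph follows; the small sample graph is hypothetical.

```python
from itertools import combinations

def local_clustering(adj: dict[str, set[str]], node: str) -> float:
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention used here: no pairs of neighbours to check
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return linked / (k * (k - 1) / 2)

# Hypothetical undirected graph: a, b, c form a triangle; d hangs off c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
coefficients = {node: local_clustering(adj, node) for node in adj}
```

A value of 1 means the node and its neighbours form a clique; a value near 0 means the neighbours are mostly unconnected to each other.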
Properties of small world networks.

Another important aspect of complex networks, useful for
understanding their properties and dynamics, is the degree
distribution, i.e. a description of how the number of edges
per node varies across the network.

In fact, we would not expect all nodes in the network to
have the same degree. Node degree is instead characterized
by a probability distribution function P(k), which gives
the probability that a randomly selected node has exactly
k edges.
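
For a concrete graph, P(k) can be estimated by counting how many nodes have each degree. A small sketch, assuming an undirected graph stored as a dictionary of neighbour sets; the sample graph is made up.

```python
from collections import Counter

def degree_distribution(adj: dict[str, set[str]]) -> dict[int, float]:
    """Return P(k): the fraction of nodes that have exactly k edges."""
    degrees = Counter(len(neighbours) for neighbours in adj.values())
    n = len(adj)
    return {k: count / n for k, count in sorted(degrees.items())}

# Hypothetical undirected graph with four nodes.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
P = degree_distribution(adj)
```

In this sample one node has degree 1, two have degree 2 and one has degree 3, so P(1) = 0.25, P(2) = 0.5 and P(3) = 0.25.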
Where to go from here?
Search and Quality Ranking.



In Page and Brin's PageRank algorithm, the rank of a node
in the network (i.e. a web page) can be calculated as
follows:
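
The formula on the original slide was an image and is not part of this transcript. A form consistent with the symbols B_i, R(i), R(j) and N(j) used in this section is the simplified PageRank, given here as an assumption about what the slide showed:

```latex
R(i) = \sum_{j \in B_i} \frac{R(j)}{N(j)}
```

The original PageRank paper also multiplies the sum by a normalization constant c; it is left out here because the slide text does not mention it.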
Search and Quality Ranking.




Where Bi is the set of documents connected to i, R(i) is the
rank of the given document i, R(j) is the rank of a document
j connected to i, and N(j) is the number of connections from
j.
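
A minimal sketch of how such a rank could be computed by repeated substitution, assuming the simplified form in which each document j passes R(j) / N(j) to every document it connects to. The three-page graph and the `simple_rank` function are hypothetical. The plain iteration also assumes every node has at least one outgoing connection, and it does not settle on every graph; the full PageRank algorithm adds a damping factor to handle those cases.

```python
def simple_rank(links: dict[str, list[str]], iterations: int = 50) -> dict[str, float]:
    """Iterate R(i) = sum over j in B_i of R(j) / N(j), from uniform ranks."""
    nodes = list(links)
    rank = {i: 1.0 / len(nodes) for i in nodes}
    for _ in range(iterations):
        new_rank = {i: 0.0 for i in nodes}
        for j, targets in links.items():
            for i in targets:
                # j shares its rank equally among its N(j) connections.
                new_rank[i] += rank[j] / len(targets)
        rank = new_rank
    return rank

# Hypothetical graph: a links to b and c, b links to c, c links to a.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = simple_rank(links)
```

On this graph the ranks settle near 0.4 for a, 0.2 for b and 0.4 for c: c collects rank from both a and b, and passes all of it back to a.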
Search and Quality Ranking.

Both the local clustering coefficient and the degree
distribution for a given node in the network give an estimate
of how much a given node is connected to other nodes
nearby.

Because the model used is built on the document context,
more connections are therefore an indication of a richer
content and a better quality of the information contained in
the document itself.
Privacy and Security.. just some
food for thought.


We said that a common property of small world networks is
that the relationships between the nodes tend to form
cliques.

What if this could be applied to the rules in a stateful
firewall?

What if we want to find out which data we are most likely to
share with which people on a social network?
Questions and Answers.




               ?

Search Engine Optimization SEO PDF for 2024.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 

Searching for patterns in crowdsourced information

  • 1. Searching for patterns in crowdsourced Information Silvia Puglisi
  • 2. Table of contents - Let me introduce myself.. - What is crowdsourcing? - Discovering network dynamics and patterns in unstructured data. - Where to go from here..
  • 3. Let me introduce myself.. 2007: Graduated in Computer Engineering from Polimi [Politecnico di Milano]. Thesis on applications in robotics of a model of the hippocampal spatial function. The project involved applying a path-planning algorithm based on neural networks on an e-puck robot. http://www.e-puck.org for more info on e-puck
  • 4. Let me introduce myself.. 2007: Joined Google as Corporate Operations Engineer. My responsibilities included maintaining, designing, diagnosing, troubleshooting and/or updating Google corporate IT infrastructure and user-facing services.
  • 5. Let me introduce myself.. 2010: Joined Google Enterprise team as Technical Account Manager for Gmail and Postini. My responsibilities included: - Develop creative solutions to maximize the adoption of Google Apps in organisations. - Work with product and engineering teams to translate customer needs into a better product experience. - Develop and implement processes and infrastructure to scale customer-facing operations.
  • 6. Let me introduce myself.. 2012: Left Google to finish M.Sc. Thesis and prepare for Ph.D. 2012: Graduated from Trinity College Dublin in M.Sc. program in Management of Information Systems. Final Thesis: Proposing a method for evaluating the quality of crowdsourced geographical information.
  • 7. What is crowdsourcing? Crowdsourcing can be defined as the application of Open Source principles to fields outside of software. Howe, 2006.
  • 8. What is crowdsourcing? Crowdsourcing takes a decentralized approach to problem solving, sourcing tasks that have been performed traditionally by individuals, to a group of people: the crowd.
  • 9. From crowdsourcing to spontaneous collaboration. Crowdsourcing initiatives usually start with a call for solutions from an organization or an entity. However, network dynamics are sometimes also an indirect source of data and answers to specific problems. Wikipedia is perhaps the most striking example of this phenomenon, in which people decide to collaborate spontaneously on a task.
  • 10. Discovering network dynamics and patterns in unstructured data. “Some twenty years ago I saw, or thought I saw, a synchronal or simultaneous flashing of fireflies. I could hardly believe my eyes, for such a thing to occur among insects is certainly contrary to all natural laws.” Philip Laurent, Science Journal 1917
  • 11. Discovering network dynamics and patterns in unstructured data. Complex network structures describe a wide variety of systems of technological and biological importance. The web itself is an example of a complex network of pages connected by hyperlinks. A social network is instead a network whose nodes are human beings and whose edges are the various relationships between them.
  • 12. The web is a giant bubble of unstructured data. The web has hence been developing as an open environment with infinite possibilities for collaboration and information sharing. User activity on the web now generates content that provides a variety of information about the interaction between different entities and the world around them. This is amplified in social networks, where people voluntarily share information about anything.
  • 13. Volunteered information vs. web pages. Volunteered information consists of snippets of text, most of the time just a few words, with other media attached: photos, videos, sounds. Volunteered information is to web pages what post-its or snippets are to books.
  • 14. Volunteered information vs. web pages. Volunteered information does not exhibit an explicit network structure made of explicit links between items. In the case of a web page this structure is evident, since one page can link to other pages explicitly. Links between items of volunteered information are instead created by the relationships between the contexts of the documents.
  • 15. Defining context.. The context of a document is made of the surrounding circumstances and facts that influence the meaning of a sentence, a passage, or even just a picture, a video or an audio file. Understanding the context is the key to understanding the semantics of a document, and hence how much valuable information it actually contains.
  • 16. Defining context.. Defining context hence means trying to figure out what can be automatically inferred regarding: - Where was the document created? - Who created the document and shared it? - What does the document describe? - When was it shared?
  • 17. Context is the key ingredient. Context is then the ingredient that adds value to information. If a document can be contextually linked to other documents it becomes more relevant. It means more information can be inferred regarding that document.
  • 18. Which context? For volunteered information, five types of context can be identified for a given object: 1) personal, 2) social, 3) geographical, 4) temporal, 5) linguistic.
  • 19. A network model. If context is interpreted as a property for a given object, we find out that at every level, each attribute will define a derived hierarchy in which an element “belongs” or is a “child” of another element higher or lower in the hierarchy.
  • 20. A network model. Let's imagine the following follower/followed relationships in a social network: John Stewart follows Dave Matthews and Stephen Colbert; Tim Reynolds follows Dave Matthews and Stephen Colbert; Stephen Colbert follows John Stewart; Dave Matthews follows John Stewart and Tim Reynolds.
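The follow relationships listed on this slide can be written down as a small directed graph. The sketch below is only an illustration: the adjacency-mapping representation and the helper names `followers_of` and `mutual_follows` are assumptions, not part of the original talk.

```python
# Directed "follows" graph from the slide, as an adjacency mapping:
# each key follows every user in its value set.
follows = {
    "John Stewart": {"Dave Matthews", "Stephen Colbert"},
    "Tim Reynolds": {"Dave Matthews", "Stephen Colbert"},
    "Stephen Colbert": {"John Stewart"},
    "Dave Matthews": {"John Stewart", "Tim Reynolds"},
}


def followers_of(graph, person):
    """Return the set of users who follow `person` (incoming edges)."""
    return {user for user, followed in graph.items() if person in followed}


def mutual_follows(graph):
    """Return the pairs (a, b), with a < b, in which each user follows the other."""
    return {
        (a, b)
        for a, targets in graph.items()
        for b in targets
        if a < b and a in graph.get(b, set())
    }
```

Reading the incoming edges answers "who follows John Stewart?", while the mutual pairs show which relationships are reciprocated.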
  • 22. A network model. Let's now concentrate on the attributes of volunteered information. Every attribute can be a node in our system. Every edge describes the frequency (or probability) with which two attributes appear together. This is particularly true for tag networks.
  • 23. A network model. Such a model hence consists of N nodes, each pair connected with probability p, creating a graph with approximately pN(N-1)/2 randomly distributed edges. This is what is called a random graph model, and it is among the most used models in complex network theory.
  • 24. Small world networks. It is agreed that the relationships between nodes in such networks are not entirely random, but display some hints of the underlying organizing principles. One such principle is the small-world concept, which describes how, despite their often large size, complex networks have a relatively short path between any two nodes (Watts, D. J., & Strogatz, S. H., 1998).
  • 25. Properties of small world networks. A common property of such networks is that the relationships between the nodes tend to form cliques. Cliques may represent circles of acquaintances at a social level, the users of an online community who tend to communicate with each other, or relationships between words in different documents.
  • 26. Properties of small world networks. Another aspect of complex networks that helps to understand their properties and dynamics is the degree distribution, where the degree of a node is the number of edges at that node. We would not expect all nodes in the network to have the same degree; instead the degree is characterized by a probability distribution function P(k), which gives the probability that a randomly selected node has exactly k edges.
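As a rough illustration of such an attribute network, the sketch below weights each pair of tags by the fraction of items in which both appear. The function name `cooccurrence_weights` and the sample tags are hypothetical and chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(tagged_items):
    """Map each unordered tag pair to the fraction of items containing both tags."""
    pair_counts = Counter()
    for tags in tagged_items:
        # Sort so each unordered pair always appears as the same (a, b) key.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    total = len(tagged_items)
    return {pair: count / total for pair, count in pair_counts.items()}


# Hypothetical tag sets attached to four volunteered items.
items = [
    ["dublin", "rain", "street"],
    ["dublin", "rain"],
    ["dublin", "pub"],
    ["rain", "street"],
]
weights = cooccurrence_weights(items)
```

Each key in `weights` is an edge between two attribute nodes, and the value is an empirical estimate of how likely the two attributes are to appear together.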
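A minimal sketch of the random graph model described here, assuming an undirected graph with no self-loops: each of the N(N-1)/2 possible pairs is connected independently with probability p. The names `random_graph` and `expected_edges` are illustrative.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """Return the edge set of an undirected random graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Each of the n(n-1)/2 possible pairs is kept independently with probability p.
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def expected_edges(n, p):
    """Expected number of edges, p * N * (N - 1) / 2."""
    return p * n * (n - 1) / 2
```

For example, `len(random_graph(100, 0.1, seed=1))` should land near `expected_edges(100, 0.1)`, which is 495, although any single sample will deviate from that value.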
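The degree distribution P(k) can be estimated from an edge list by counting how many nodes have each degree. This is a sketch for an undirected graph; the function name `degree_distribution` is an assumption made for the example.

```python
from collections import Counter


def degree_distribution(edges, nodes):
    """Return {k: P(k)}, the fraction of nodes with exactly k edges."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        # An undirected edge contributes to the degree of both endpoints.
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}


# A star graph: node 0 is connected to nodes 1, 2 and 3.
star = degree_distribution([(0, 1), (0, 2), (0, 3)], range(4))
```

In the star graph three nodes have degree 1 and one node has degree 3, so `star` is `{1: 0.75, 3: 0.25}`.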
  • 27. Where to go from here?
  • 28. Search and Quality Ranking. In Page and Brin's PageRank algorithm, the rank of a node in the network (i.e. a web page) can be calculated as follows:
  • 29. Search and Quality Ranking. Where Bi is the set of documents connected to i, R(i) is the rank of the given document i, R(j) is the rank of a document j connected to i, and N(j) is the number of connections from j.
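The formula itself is not captured in this transcript. The symbols defined above match the simplified PageRank definition, R(i) = c * sum over j in Bi of R(j) / N(j), where c is a normalisation constant, so the sketch below assumes that form. It also assumes every node has at least one outgoing connection and omits refinements of the full algorithm; the function name `simple_pagerank` is illustrative.

```python
def simple_pagerank(links, iterations=100):
    """Iterate R(i) = c * sum(R(j) / N(j) for j in Bi) on a dict of outgoing links.

    Assumes every node appears as a key and has at least one outgoing link.
    """
    nodes = list(links)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in nodes}
        for j, targets in links.items():
            share = rank[j] / len(targets)  # R(j) / N(j)
            for i in targets:
                new_rank[i] += share
        total = sum(new_rank.values())
        # The constant c rescales the ranks so that they sum to 1.
        rank = {node: value / total for node, value in new_rank.items()}
    return rank


# A small hypothetical graph: a links to b and c, b links to c, c links to a.
ranks = simple_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

On this graph the ranks settle at about 0.4 for `a` and `c` and 0.2 for `b`: `b` receives only half of `a`'s rank, while `c` collects from both `a` and `b`.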
  • 30. Search and Quality Ranking. Both the local clustering coefficient and the degree of a given node in the network give an estimate of how strongly that node is connected to other nodes nearby. Because the model is built on the document context, more connections indicate richer content and better quality of the information contained in the document itself.
  • 31. Privacy and Security.. just some food for thought. We said that a common property of small world networks is that the relationships between the nodes tend to form cliques. What if this could be applied to the rules in a stateful firewall? What if we want to find out which data we are most likely to share with which people on a social network?
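The local clustering coefficient mentioned here can be computed as the fraction of pairs of a node's neighbours that are themselves connected. The sketch below assumes an undirected graph stored as a dict of neighbour sets; the function name `local_clustering` and the sample graph are illustrative.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of neighbour pairs of `node` that are directly connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to connect
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


# A triangle a-b-c with one extra node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

Here `local_clustering(graph, "b")` is 1.0 because both of its neighbours are linked to each other, while `a` scores 1/3: only one of its three neighbour pairs is connected.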