Searching for patterns in crowdsourced information

Silvia Puglisi

Table of contents

- Let me introduce myself..
- What is crowdsourcing?
- Discovering network dynamics and patterns in
unstructured data.
- Where to go from here..

Let me introduce myself..

2007: Graduated in Computer Engineering from Polimi
[Politecnico di Milano].

Thesis on applications in robotics of a model of the
hippocampal spatial function.

The project involved applying a path-planning algorithm
based on neural networks to an e-puck robot.

http://www.e-puck.org for more info on e-puck


2007: Joined Google as Corporate Operations Engineer.

My responsibilities included maintaining, designing,
diagnosing, troubleshooting and/or updating Google
corporate IT infrastructure and user-facing services.


2010: Joined the Google Enterprise team as Technical
Account Manager for Gmail and Postini.

My responsibilities included:
- Developing creative solutions to maximize the adoption
of Google Apps in organisations.
- Working with product and engineering teams to translate
customer needs into a better product experience.
- Developing and implementing processes and infrastructure
to scale customer-facing operations.


2012: Left Google to finish my M.Sc. thesis and prepare for
a Ph.D.

2012: Graduated from the M.Sc. program in Management of
Information Systems at Trinity College Dublin.

Final Thesis: Proposing a method for evaluating the quality
of crowdsourced geographical information.

What is crowdsourcing?

Crowdsourcing can be defined as the application of Open
Source principles to fields outside of software.
Howe, 2006.

What is crowdsourcing?

Crowdsourcing takes a decentralized approach to problem
solving: tasks that have traditionally been performed by
individuals are sourced to a group of people, the crowd.

From crowdsourcing to
spontaneous collaboration.

Crowdsourcing initiatives usually start with a call for
solutions from an organization or an entity.

Although..
Network dynamics are sometimes also an indirect source
of data and answers to specific problems.

Wikipedia is perhaps the most striking example of this
phenomenon, in which people spontaneously decide to
collaborate on a task.

Discovering network dynamics and
patterns in unstructured data.

“Some twenty years ago I saw, or thought I saw, a
synchronal or simultaneous flashing of fireflies. I could
hardly believe my eyes, for such a thing to occur among
insects is certainly contrary to all natural laws.”
Philip Laurent, Science Journal 1917

Discovering network dynamics and
patterns in unstructured data.

Complex network structures describe a wide variety of
systems of technological and biological importance.

The web itself is an example of a complex network of
pages linked by their hyperlinks.

A social network is instead a network whose nodes are
human beings and whose edges are the various
relationships that occur between them.

The web is a giant bubble of
unstructured data.

The web has hence been developing as an open
environment with infinite possibilities for collaboration and
information sharing.

User activity on the web now generates content that
provides a variety of information about the interaction
between different entities and the world around them.

This is enhanced in Social Networks where people
voluntarily share information about anything.

Volunteered Information VS web
pages.

Volunteered information consists of snippets of text, most
of the time just a few words, with other media attached:
photos, videos, sounds.

Volunteered information is to web pages what post-its or
snippets are to books.

Volunteered Information VS web
pages.

Volunteered information does not exhibit an explicit
network structure made of explicit links between items.

In the case of a web page, this structure is evident, since
one page can link to other pages explicitly.

Links between items of volunteered information are instead
created by the relationships between the contexts of the
documents.

Defining context..

The context of a document is made of the surrounding
circumstances and facts that influence the meaning of a
sentence, a passage, or even just a picture, a video or an
audio file.

Understanding the context is the key to understanding the
semantics of a document, and hence how much valuable
information it actually contains.

Defining context..

Defining context hence means trying to figure out what
can be automatically inferred regarding:

- Where was the document created?
- Who created the document and shared it?
- What does the document describe?
- When was it shared?

Context is the key ingredient.

Context is then the ingredient that adds value to
information.

If a document can be contextually linked to other
documents it becomes more relevant.

It means more information can be inferred regarding that
document.

Which context?

Regarding volunteered information, five types of context
can be identified for a given object:

1) personal,
2) social,
3) geographical,
4) temporal,
5) linguistic.
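
As a rough illustration of the five context types listed above, the sketch below stores them as fields of a small record and reports which types two items share. The class name, the field value formats, and the `shared_context` helper are all hypothetical choices made for this example; the slides do not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Optional

CONTEXT_TYPES = ("personal", "social", "geographical", "temporal", "linguistic")


@dataclass(frozen=True)
class Context:
    """Hypothetical container for the five context types of one item."""
    personal: Optional[str] = None      # assumed: an author identifier
    social: Optional[str] = None        # assumed: a community or group name
    geographical: Optional[str] = None  # assumed: a place name
    temporal: Optional[str] = None      # assumed: a date string
    linguistic: Optional[str] = None    # assumed: a language code


def shared_context(a, b):
    """Return the context types on which both items carry the same known value."""
    return [
        name
        for name in CONTEXT_TYPES
        if getattr(a, name) is not None and getattr(a, name) == getattr(b, name)
    ]


# Invented example items, for illustration only.
photo = Context(geographical="Dublin", temporal="2012-06-01", linguistic="en")
note = Context(personal="user-1", geographical="Dublin", linguistic="en")
common = shared_context(photo, note)
```

Each shared context type could then be treated as one contextual link between the two items.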

A network model.

If context is interpreted as a property of a given object,
we find that at every level each attribute defines a
derived hierarchy in which an element “belongs” to, or is a
“child” of, another element higher or lower in the hierarchy.

A network model.

Let's imagine these following/followed relationships in a
social network..

John Stewart follows Dave Matthews and Stephen Colbert
Tim Reynolds follows Dave Matthews and Stephen Colbert
Stephen Colbert follows John Stewart
Dave Matthews follows John Stewart and Tim Reynolds
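
The four statements above can be read as a small directed graph. The following is a minimal sketch, assuming an adjacency mapping from each user to the set of users they follow; the helper names `followers_of` and `mutual_follows` are made up for this example.

```python
# Directed "follows" graph taken from the statements above:
# each key follows every user in its value set.
follows = {
    "John Stewart": {"Dave Matthews", "Stephen Colbert"},
    "Tim Reynolds": {"Dave Matthews", "Stephen Colbert"},
    "Stephen Colbert": {"John Stewart"},
    "Dave Matthews": {"John Stewart", "Tim Reynolds"},
}


def followers_of(graph, user):
    """Return the users who follow `user` (the incoming edges)."""
    return {source for source, targets in graph.items() if user in targets}


def mutual_follows(graph):
    """Return the unordered pairs of users who follow each other."""
    return {
        frozenset((a, b))
        for a, targets in graph.items()
        for b in targets
        if a in graph.get(b, set())
    }


dave_followers = followers_of(follows, "Dave Matthews")
mutual = mutual_follows(follows)
```

Storing only outgoing edges keeps the structure compact; incoming edges are derived on demand.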

A network model.

Let's now concentrate on attributes for volunteered
information.

Every attribute could describe a node in our system.

Every edge describes the frequency (or probability) with
which two attributes appear together.

This behaviour can be particularly true for tag networks.
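
One possible way to build such a tag network is to count how often two tags appear on the same item and normalise by the number of items. The sketch below assumes that definition of edge weight; the tag sets are invented for illustration and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_items):
    """Count how many items carry each unordered pair of tags."""
    counts = Counter()
    for tags in tagged_items:
        # sorted(set(...)) gives each pair one canonical orientation.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def edge_weights(tagged_items):
    """Empirical co-occurrence frequency: pair count / number of items."""
    total = len(tagged_items)
    return {
        pair: count / total
        for pair, count in cooccurrence_counts(tagged_items).items()
    }


# Invented tag sets, for illustration only.
items = [
    ["dublin", "rain", "street"],
    ["dublin", "rain"],
    ["dublin", "pub"],
    ["rain", "street"],
]
weights = edge_weights(items)
```

Here the nodes are the tags and each weighted pair is an edge of the network.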

A network model.

Such a model hence consists of N nodes, where each pair of
nodes is connected with probability p, creating a graph with
approximately p N (N-1) / 2 randomly distributed edges.

This is what is called a random graph model, and it is
among the most used models in complex networks theory.
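
The edge count quoted above can be checked with a small simulation. This is a minimal sketch, assuming each of the N (N-1) / 2 possible pairs is linked independently with probability p; the function names and the seed parameter are choices made for this example.

```python
import random
from itertools import combinations


def random_graph(n, p, seed=None):
    """Return the edge set of a graph on nodes 0..n-1 where every
    unordered pair is linked independently with probability p."""
    rng = random.Random(seed)
    return {
        (i, j)
        for i, j in combinations(range(n), 2)
        if rng.random() < p
    }


def expected_edges(n, p):
    """Expected number of edges: p * N * (N - 1) / 2."""
    return p * n * (n - 1) / 2


edges = random_graph(200, 0.1, seed=1)
print(len(edges), "edges; expected about", expected_edges(200, 0.1))
```

The simulated count fluctuates around the expected value, which is why the slide says "approximately".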

Small world networks.

It is agreed that the relationships between nodes in such
networks are not entirely random, but display some hints
of the underlying organizing principles.

One such principle is the small-world concept, which
describes how, despite their often large size, complex
networks have a relatively short path between any two
nodes (Watts, D. J., & Strogatz, S. H., 1998).
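
The effect of shortcuts on path length can be illustrated with a toy example. The sketch below is not the Watts and Strogatz rewiring procedure itself: it assumes a simple ring of nodes, adds a single long-range edge, and compares the average shortest path length before and after using breadth-first search. All function names are hypothetical.

```python
from collections import deque


def ring_lattice(n):
    """Undirected ring: node i is linked to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_shortcut(adj, a, b):
    """Add an undirected edge between a and b, in place."""
    adj[a].add(b)
    adj[b].add(a)


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of
    mutually reachable, distinct nodes (breadth-first search)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adj[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


ring = ring_lattice(20)
before = average_path_length(ring)
add_shortcut(ring, 0, 10)  # one long-range link across the ring
after = average_path_length(ring)
```

Even one shortcut lowers the average distance, which is the intuition behind the small-world property.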

Properties of small world networks.

A common property of such networks is that the
relationships between the nodes tend to form cliques.

Cliques may represent circles of acquaintances at a social
level, they can describe all the users of an online
community who tend to communicate with each other, or
they can describe relationships between words in
different documents.

Properties of small world networks.

Another aspect of complex networks that helps explain
their properties and dynamics is the degree distribution,
which describes how the number of edges per node is
spread across the network.

In fact, we would not expect all nodes in the network to
have the same degree. Node degrees are instead
characterized by a probability distribution function P(k),
which gives the probability that a randomly selected node
has exactly k edges.
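
An empirical P(k) can be computed by counting how many nodes have each degree and dividing by the number of nodes. The following is a minimal sketch, assuming an undirected graph stored as an adjacency mapping; the star-shaped example graph is invented.

```python
from collections import Counter


def degree_distribution(adj):
    """Return {k: P(k)}, the fraction of nodes with exactly k edges."""
    degree_counts = Counter(len(neighbours) for neighbours in adj.values())
    n = len(adj)
    return {k: count / n for k, count in sorted(degree_counts.items())}


# Invented star graph: node 0 is linked to four leaves.
star = {
    0: {1, 2, 3, 4},
    1: {0},
    2: {0},
    3: {0},
    4: {0},
}
p_k = degree_distribution(star)
```

In this example four of the five nodes have degree 1 and one node has degree 4.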

Search and Quality Ranking.

In Page and Brin's PageRank algorithm, the rank of a node
in the network (i.e. a web page) can be calculated as
follows:

$$R(i) = \sum_{j \in B_i} \frac{R(j)}{N(j)}$$

where B_i is the set of documents connected to i, R(i) is the
rank of the given document i, R(j) is the rank of a document
j connected to i, and N(j) is the number of connections from
j.
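
A minimal sketch of how such a rank might be computed iteratively, assuming the simplified form in which R(i) is the sum of R(j) / N(j) over all j in B_i, with no damping factor or normalisation constant. It also assumes every page has at least one outgoing link and that every link target is itself a key of the mapping. The three-page graph is invented for illustration.

```python
def simplified_pagerank(links, iterations=100):
    """Iterate R(i) = sum over j linking to i of R(j) / N(j).

    `links` maps each page j to the set of pages it links to,
    so N(j) = len(links[j]). Assumes no page has zero out-links.
    """
    nodes = list(links)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in nodes}
        for j, targets in links.items():
            share = rank[j] / len(targets)  # R(j) / N(j)
            for i in targets:
                new_rank[i] += share
        rank = new_rank
    return rank


# Invented example: A links to B and C, B links to C, C links to A.
links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
ranks = simplified_pagerank(links)
```

On this small graph the iteration settles with A and C ranked equally and B at half their rank, since B receives only half of A's rank.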


Both the local clustering coefficient and the degree of a
given node in the network give an estimate of how well
that node is connected to other nodes nearby.

Because the model used is built on the document context,
more connections are therefore an indication of a richer
content and a better quality of the information contained in
the document itself.
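
The local clustering coefficient mentioned above can be sketched as the fraction of pairs of a node's neighbours that are themselves linked. This assumes an undirected graph stored as an adjacency mapping, and returns 0.0 for nodes with fewer than two neighbours by convention; the example graph is invented.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention chosen here for degree 0 or 1
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * linked / (k * (k - 1))


# Invented graph: a triangle a-b-c with one extra node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
coefficients = {node: local_clustering(graph, node) for node in graph}
```

Under the model described above, a document node with both a high degree and a high coefficient would sit inside a densely connected context.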

Privacy and Security.. just some
food for thought.

We said that a common property of small world networks is
that the relationships between the nodes tend to form
cliques.

What if this could be applied to the rules in a stateful
firewall?

What if we want to find out which data we are most likely to
share with which people on a social network?

Questions and Answers.

?
