Big & Open Data: Challenges for Smartcity

Big and Open data.
Challenges for Smartcity
Victoria López
Grupo G-TeC
www.tecnologiaUCM.es
Universidad Complutense de Madrid
http://grasia.fdi.ucm.es
ICIST 2014
Valencia

Index
• Introduction
• Fighting with Big Data: Genome data
• What is Big Data?
• Technology transfer: Open Data opportunities
• Developing projects for Smartcity.
• Rmap, a real example in Madrid
• Conclusions

Introduction
– Mobile technologies
– Intelligent agents
– Optimization and forecasting
– Bioinformatics, Biostatistics
– …
– www.tecnologiaUCM.es

Fighting with the Big Data
• Every day we need to deal with more and more data.
• For many years, new computers with more memory and higher speed seemed to be the solution to data growth.
• Many research areas were already fighting with big data: bioinformatics, genome data, DNA, RNA, proteins and, in general, all biological data have required computer monitoring and storage in large databases at laboratories and research centers around the world.
The future of genomics rests on the foundation of the Human Genome Project

Fighting with the Big Data
• Each time an organization or an individual is not able to deal with its data, it is facing a big data problem.
• Same philosophy as modern Big Data: large databases distributed around the world, with parallel processing when available and suitable
• (Sequence alignment and Dynamic Programming)
• Biological data as a whole forms one very large database.
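
The pairing of sequence alignment with dynamic programming mentioned above can be illustrated with a small global alignment score in the style of Needleman-Wunsch. This is only a sketch: the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration and do not come from the slides.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (Needleman-Wunsch style).

    The scoring parameters are illustrative assumptions, not values from the talk.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty sequence costs one gap per symbol
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("AGT", "AT"))
```

The table has one cell per pair of prefixes, so time and memory grow with the product of the two sequence lengths, which is one reason genome-scale comparison pushes toward distributed and parallel processing.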

Big Data
From Data Warehouse to Big Data
1970: relational model invented
RDBMS were the mainstream until the 1990s
One size fits all; "elephant" vendors: heavily encoded, even indexing by B-trees.

Alex 'Sandy' Pentland, director of 'Media Lab' at Massachusetts Institute of Technology (MIT)
Nowadays business needs high availability of data, so new techniques must be developed: complex analytics, graph databases

Unstructured data
Who generates Big Data?
Progress and innovation are no longer hampered by the ability to collect data, but by the ability to manage, analyze, synthesize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable way.

Big Data
Big Data 3+1+1 V’s

Big Data
1. High availability is now a requirement
2. Host and cloud computing
3. Running in parallel
   1. Data aggregation process
   2. Analytics on data
   3. GraphDBMS similarities
4. Not only SQL: Cassandra* and MongoDB**
5. Moving toward ACID: people from Google admit ACID is a good idea for working with databases.
*The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance.
**Document-oriented storage

• Main feature: scalability to many nodes
– Scan of 100 TB in 1 node @ 50 MB/sec = 23 days
– Scan in a cluster of 1000 nodes = 33 minutes
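
The two scan times above follow from simple arithmetic, reproduced here as a short sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and perfectly linear scaling with no coordination overhead; the function name is hypothetical.

```python
TB = 10**12  # bytes, decimal units assumed
MB = 10**6   # bytes, decimal units assumed


def scan_seconds(data_bytes, rate_bytes_per_s, nodes=1):
    """Time to scan the data when it is split evenly across `nodes` nodes.

    Assumes ideal linear speed-up, which real clusters only approximate.
    """
    return data_bytes / (rate_bytes_per_s * nodes)


single_node = scan_seconds(100 * TB, 50 * MB)
cluster = scan_seconds(100 * TB, 50 * MB, nodes=1000)

if __name__ == "__main__":
    print(f"1 node:     {single_node / 86400:.1f} days")
    print(f"1000 nodes: {cluster / 60:.1f} minutes")
```

A single node needs 2,000,000 seconds, which is about 23 days; dividing the work over 1000 nodes brings that down to 2,000 seconds, about 33 minutes.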
MapReduce
– Parallel programming model
– Simple concept, smart, suitable for multiple applications
– Big datasets → multi-node and multiprocessor
– Sets of nodes: clusters or grids (distributed programming)
• By Google (2004)
– Able to process 20 PB per day
– Based on Map & Reduce, classical methods in functional programming related to the classic divide & conquer
– Comes from numerical analysis (large matrix products).
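
The Map & Reduce idea described above can be sketched on a single machine with a word count. This is a toy illustration of the programming model only, not Google's or Hadoop's implementation; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: fold the values of one key into a single count."""
    return key, reduce(lambda x, y: x + y, values, 0)


def word_count(documents):
    # In a real cluster each document (or block) would be mapped on a different node
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "open data", "Big open data"]))
```

Because each call to `map_phase` depends only on its own document and each call to `reduce_phase` only on its own key, both phases can in principle be spread across many nodes, which is the divide & conquer structure the slide refers to.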
Big Data: Map Reduce

• Friendly for non-technical users

Big Data: Hadoop
– Used by Yahoo!, Facebook, Twitter, Amazon, eBay…
– Can be used in different architectures: both clusters (in-house) and grid (cloud computing)
http://hadoop.apache.org/

Big Data: Datamining & Scalability
• Data mining techniques (machine learning, data clustering, predictive models, etc.) can be applied to big data through complex analytics
• Modeling prices in electricity Spanish markets under uncertainty. G. Miñana, H. Marrao, R. Caro, J. Gil, V. Lopez, B. González. In: F. Sun et al. (eds.), Knowledge Engineering and Management, Advances in Intelligent Systems and Computing 214, DOI: 10.1007/978-3-642-37832-4_46, Springer-Verlag Berlin Heidelberg 2014
• To get a scalable system
– Aggregation
– Generalization
– (Formal specification)
• Not only many cores, many nodes and out-of-memory data
– Host and cloud computing
– Not all problems can be solved with the same techniques; Hadoop is not enough

Technology transfer
• A great opportunity for researchers working on technology transfer, who can increase their efforts in developing new techniques for
– Monitoring data (sensors, smartphones, …)
– Storing data (cloud computing, Amazon S3, EC2, Google BigQuery, Tableau …)
– Cleaning, integrating & processing data (Data Curation at Scale: The Data Tamer System, M. Stonebraker et al., CIDR 2013)
– Analysing data (R, SAS… but also Google, Amazon, eBay..)
– Fully homomorphic encryption & searching on encrypted data

Open Data
“Open data is data that can be freely used, reused and redistributed by anyone –
subject only, at most, to the requirement to attribute and sharealike.”
OpenDefinition.org
Availability and Access: the data must be
available as a whole and at no more than a
reasonable reproduction cost, preferably by
downloading over the internet. The data
must also be available in a convenient and
modifiable form.
Reuse and Redistribution: the data must be
provided under terms that permit reuse and
redistribution including the intermixing with
other datasets. The data must be machine-
readable.
Universal Participation: everyone must be
able to use, reuse and redistribute – there
should be no discrimination against fields of
endeavour or against persons or groups. For
example, ‘non-commercial’ restrictions that
would prevent ‘commercial’ use, or
restrictions of use for certain purposes (e.g.
only in education), are not allowed.

Why Open Data by Open Knowledge Foundation

Open Data for Smartcity
• What can a citizen expect when living in a city?
• Internet of Things
– Libraries
– Public transportation, traffic monitoring
– Pets, devices, cars, even people
• Intelligent agents
– Interacting without our control
– Credit card control (BBVA use case)

Basic structure
Client/Server pattern
[Diagram: public data, web service, server, web server, client]

[Diagram: new data is collected; a service is given; query; data transfer]
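
The client/server structure outlined above (public data exposed through a web service, a client sending a query, data transferred back) can be sketched with the Python standard library. Everything specific here is an assumption for illustration: the dataset entries, the URL scheme, and the names `OpenDataHandler` and `query_service` are hypothetical and do not describe any real city service.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical public dataset; the entries are invented for illustration only
PUBLIC_DATA = [
    {"type": "bike_parking", "name": "Example stand A"},
    {"type": "recycling_point", "name": "Example point B"},
]


class OpenDataHandler(BaseHTTPRequestHandler):
    """Server side: answers a query such as GET /bike_parking with JSON."""

    def do_GET(self):
        wanted = self.path.strip("/")
        records = [r for r in PUBLIC_DATA if r["type"] == wanted]
        body = json.dumps(records).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def query_service(resource_type):
    """Client side: start a local server, send one query, return the decoded data."""
    server = HTTPServer(("127.0.0.1", 0), OpenDataHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # Bypass any system proxy so the request reaches the local server directly
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/{resource_type}") as response:
            return json.loads(response.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(query_service("bike_parking"))
```

In a deployed service the server would run permanently and read from the city's published datasets; starting it inside `query_service` only keeps the sketch self-contained.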

Data Analytics
From (unstructured) data to value

Mariam Saucedo
Pilar Torralbo
Daniel Sanz
Recycla.me
Ana Alfaro
Sergio Ballesteros
Lidia Sesma
Héctor Martos
Álvaro Bustillo
Arturo Callejo
Belén Abellanas
Jaime Ramos
Ignacio P. de Ziriza
Victor Torres
Alberto Segovia
Miguel Bueno
Mar Octavio de Toledo
Antonio Sanmartín
Carlos Fernández
Resource Map (MAPA DE RECURSOS)
RECYCLA.TE

Madrid – Smart City: RMap
• Parks and gardens
• Parking for
– Cars
– Motorbikes
– Bikes
• Recycling points
– Fixed
– Mobile
– Clothes
• Stations
– Bioethanol
– Gas
– Oil
– Electric
• Routes for bikes
– Cycle lanes
– Safe streets
– Residential Priority Areas
