Big and Open Data
Challenges for Smartcity
Victoria López
Grupo G-TeC
www.tecnologiaUCM.es
Universidad Complutense de Madrid
Big and Open data. Challenges for Smartcity
• Introduction
• Fighting with Big Data: Genome Data
• Big Data. Big Projects
• Open Data. Technology Transfer Opportunities
• Smartcity. Big and Open Systems
• Madrid as Smartcity
• Conclusions
Introduction
Our Goal: to transfer technology and knowledge
– Mobile technologies applied to the environment
– Intelligent agents
– Optimization and forecasting from data
– Bioinformatics, Biostatistics
G-TeC group: statisticians, physicists, mathematicians,
economists and several computer scientists.
– www.tecnologiaUCM.es
Fighting with the Big Data
• Every day we need to deal with more and more data.
• For many years, new computers with more memory and higher
speed seemed to be the solution to data growth (Elephant vendors).
• Many research areas were already fighting with Big Data:
in bioinformatics, genome data, DNA, RNA, proteins and, in general, all
biological data have been captured by computing monitors and
stored in large databases in laboratories and research
centers around the world.
The future of genomics rests on the foundation of the Human Genome Project
Fighting with the Big Data
• Each time an organization or an individual is not able
to deal with its data, it is facing a big data problem.
• The Human Genome Project was managed with the same
philosophy as modern Big Data: large databases
distributed around the world, with parallel processing
when available and suitable.
• Our experience: sequence alignment and its
optimization with Dynamic Programming and
its heuristics.
• The body of biological data is itself a Big Data base.
• Adding new sequences, searching and forecasting are
tasks very similar to those we face in every Big Data
problem.
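As an illustration of sequence alignment with dynamic programming, here is a minimal sketch of a global alignment score in the style of Needleman-Wunsch. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for readability; they are not the parameters used by the group or by FASTA.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best global alignment of a and b (illustrative scoring values)."""
    # previous[j] is the best score for aligning the processed prefix of a with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b costs i gaps
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = previous[j] + gap        # a[i-1] aligned to a gap
            left = current[j - 1] + gap   # b[j-1] aligned to a gap
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept in memory, so the cost is O(len(a) * len(b)) time and O(len(b)) space. Heuristic tools such as BLAST and FASTA avoid filling the whole table for every database entry, which is what makes database-scale search practical.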
22/05/2014
Vineyards in La Geria, Lanzarote
Use Case. Looking for a Fungus
• Application to infections in agricultural
crops when it is not possible to identify
the actual fungus.
• The person responsible needs to decide
what to do, which treatment to
apply, or which procedure is better.
– A fragment of fungus DNA must be
sequenced in the lab.
– Then the scientist looks for it in molecular
databases by means of sequence
searching (“DB homology search”).
– Some alignment algorithms (BLAST, FASTA)
are executed to return the best matches.
• gtttacgctctacaaccctttgtgaacatacctacaactgttg
cttcggcgggtagggtctccgcgaccctcccggcctcccgcct
ccgggcgggtcggcgcccgccggaggataaccaaactctgatt
taacgacgtttcttctgagtggtacaagcaaataatcaaaact
tttaacaaccggatctcttggttctggcatcgatgaagaacgc
agcgaaatgcgataagtaatgtgaat
The sequence
1. EBI: European Bioinformatics Institute
2. Choose among the tools available on the web site
a. Fasta3
b. Select DATABASE:
• Nucleic ACIDS
• FUNGI
c. Fit sequences and run queries
3. A sorted (but not complete) list, from best to
worst similarity, is returned.
Data Base and Algorithm Selection
PIC 2014, Shanghai
EBI Web Site
Web Toolbox in EBI
Algorithm Fasta 3
DATABASES NUCLEIC ACIDS: FUNGI
Fit sequences and run FASTA 3
The output
• FASTA searches a protein or DNA sequence data bank
• version 3.3t09 May 18, 2001
• Please cite:
• W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
• @:1-: 241 nt
•
• vs EMBL Fungi library
• searching /ebi/services/idata/v225/fastadb/em_fun library
• 104701680 residues in 66478 sequences
• statistics extrapolated from 60000 to 61164 sequences
• Expectation_n fit: rho(ln(x))= -1.2290+/-0.000361; mu= 72.1313+/- 0.026
• mean_var=907.6270+/-295.007, 0's: 68 Z-trim: 4246 B-trim: 15652 in 3/79
• Lambda= 0.0426
• FASTA (3.39 May 2001) function [optimized, +5/-4 matrix (5:-4)] ktup: 6
• join: 48, opt: 33, gap-pen: -16/ -4, width: 16
• Scan time: 3.180
• The best scores are: opt bits E(61164)
• EM_FUN:CGL301988 AJ301988.1 Colletotrichum glo (1484) [f] 1184 88 5.7e-17
• EM_FUN:AF090855 AF090855.1 Colletotrichum gloe ( 500) [f] 1205 88 7.3e-17
• EM_FUN:CGL301986 AJ301986.1 Colletotrichum glo (1484) [f] 1166 87 1.2e-16
• EM_FUN:CGL301908 AJ301908.1 Colletotrichum glo (2868) [f] 1148 87 1.3e-16
• EM_FUN:CGL301909 AJ301909.1 Colletotrichum glo (2868) [f] 1148 87 1.3e-16
• EM_FUN:CGL301907 AJ301907.1 Colletotrichum glo (2867) [f] 1148 87 1.3e-16
• EM_FUN:CGL301919 AJ301919.1 Colletotrichum glo (1171) [f] 1166 87 1.6e-16
• EM_FUN:CGL301977 AJ301977.1 Colletotrichum glo (1876) [f] 1148 86 2e-16
• EM_FUN:CFR301912 AJ301912.1 Colletotrichum fra (2870) [f] 1137 86 2.1e-16
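The "best scores" table above can be turned into structured records for further analysis. The following is a minimal sketch that assumes each hit line has the layout shown in the output (entry, accession, truncated description, a number in parentheses read here as the library sequence length, frame, opt score, bit score, E-value). The names `Hit`, `HIT_LINE` and `parse_hit` are hypothetical helpers, not part of FASTA.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    entry: str
    accession: str
    description: str
    length: int      # assumed to be the library sequence length
    opt: int
    bits: int
    e_value: float


# One line of the "best scores" table, with an optional leading slide bullet removed first.
HIT_LINE = re.compile(
    r"^(?P<entry>\S+)\s+(?P<accession>\S+)\s+(?P<description>.*?)\s+"
    r"\(\s*(?P<length>\d+)\)\s+\[\w\]\s+"
    r"(?P<opt>\d+)\s+(?P<bits>\d+)\s+(?P<e_value>\S+)$"
)


def parse_hit(line: str) -> Hit:
    """Parse one hit line; raise ValueError if the layout is not the assumed one."""
    cleaned = line.strip().lstrip("•").strip()
    m = HIT_LINE.match(cleaned)
    if m is None:
        raise ValueError(f"unrecognized hit line: {line!r}")
    return Hit(m["entry"], m["accession"], m["description"],
               int(m["length"]), int(m["opt"]), int(m["bits"]), float(m["e_value"]))


if __name__ == "__main__":
    lines = [
        "• EM_FUN:CGL301988 AJ301988.1 Colletotrichum glo (1484) [f] 1184 88 5.7e-17",
        "• EM_FUN:AF090855 AF090855.1 Colletotrichum gloe ( 500) [f] 1205 88 7.3e-17",
    ]
    best = min((parse_hit(l) for l in lines), key=lambda h: h.e_value)
    print(best.entry, best.e_value)
```

Sorting by E-value rather than by opt score matters here: the second entry has the higher opt score but the first has the smaller E-value, which is the order the tool itself reports.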
Our background in Bioinformatics
• Bioinformatics (Master's in Research in
Informatics, UCM)
• Several Master's theses & publications
– Alignment of sequences with R and RHadoop*
– Analysis & visualization with the R language and
Chernoff faces
– Others
Big Data
From Data Warehouse to Big Data (large Data Bases)
1970: relational model invented.
RDBMS declared mainstream until the 90s.
One-size-fits-all, Elephant vendors: heavily
encoded, even indexing by B-trees.
Alex 'Sandy' Pentland, director of 'Media Lab' at
Massachusetts Institute of Technology (MIT):
The big data revolution,
2013 Campus Party Europe
Nowadays business needs high
availability of data, so new
techniques must be developed:
complex analytics, graph databases
Data volume is increasing
exponentially
– 44x increase from 2009 to 2020
– From 0.8 zettabytes to 35 ZB
Unstructured data
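To make "exponentially" concrete, the two figures on this slide imply an average yearly growth rate. The sketch below only assumes steady compound growth between 2009 and 2020; the function name is hypothetical.

```python
def implied_annual_factor(start: float, end: float, years: int) -> float:
    """Constant yearly multiplier that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years)


start_zb, end_zb = 0.8, 35.0          # zettabytes, figures from the slide
years = 2020 - 2009                   # 11 years

total_factor = end_zb / start_zb      # close to the 44x quoted on the slide
annual_factor = implied_annual_factor(start_zb, end_zb, years)

if __name__ == "__main__":
    print(f"total growth factor: {total_factor:.2f}")
    print(f"implied yearly growth: {(annual_factor - 1) * 100:.0f}%")
```

Under this assumption the volume grows by roughly 41% per year, which means it doubles about every two years.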
Who generates Big Data?
Progress and innovation are no longer hampered by the ability to collect data,
but the ability to manage, analyze, synthesize, visualize, and discover
knowledge from data collected in a timely manner and in a scalable way
Big Data
Big Data 3+1+1 V’s
From data to value
• Big Data Collection
– Monitoring
– Data cleaning and integration
– Hosted Data Platforms and the Cloud
• Big Data Storage
– Modern Data Bases
– Distributed Computing Platforms
– NoSQL, NewSQL
• Big Data Systems
– Security
– Multicore scalability
– Visualization and User Interfaces
• Big Data Analytics
– Fast algorithms
– Data compression
– Machine learning tools
– Visualization & Reporting
The stages proposed by MIT
to deal with Big Data
Big Data in use
1. High availability is now a requirement
2. Hosted (not only in-house) and Cloud computing
3. Running in parallel

1. Data aggregation process
2. Analytics on data
3. GraphDBMSs similarities
4. Not only SQL: Cassandra* and MongoDB**
*The Apache Cassandra database is the right choice when you need
scalability and high availability without compromising performance.
**Document-oriented storage
MONGO
• Main feature: scalability to many nodes
– Scan of 100 TB in 1 node @ 50 MB/sec = 23 days
– Scan in a cluster of 1000 nodes = 33 minutes
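The two scan times can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a perfectly parallel scan with no coordination overhead; `scan_seconds` is a hypothetical helper name.

```python
TB = 10**12  # bytes, decimal units assumed
MB = 10**6


def scan_seconds(total_bytes: float, bytes_per_second_per_node: float, nodes: int = 1) -> float:
    """Time to read total_bytes when every node scans an equal share in parallel."""
    return total_bytes / (bytes_per_second_per_node * nodes)


single_node = scan_seconds(100 * TB, 50 * MB)               # 2,000,000 seconds
cluster = scan_seconds(100 * TB, 50 * MB, nodes=1000)       # 2,000 seconds

if __name__ == "__main__":
    print(f"{single_node / 86400:.1f} days")   # 23.1 days
    print(f"{cluster / 60:.1f} minutes")       # 33.3 minutes
```

Real clusters lose some of this ideal speed-up to scheduling, data skew and network transfer, so the 33 minutes should be read as a lower bound.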
MapReduce
– Parallel programming model
– Simple concept, smart, suitable for multiple applications
– Big datasets → multi-node in multiprocessors
– Sets of nodes: clusters or grids (distributed programming)
• By Google (2004)
– Able to process 20 PB per day
– Based on Map & Reduce, classical methods in functional programming
related to the classic divide & conquer
– Comes from numerical analysis (big matrix products).
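The map and reduce steps can be sketched on a single machine with plain Python. This is only an illustration of the programming model, not Hadoop or Google code: the function names are hypothetical, the "shuffle" step is simulated with a dictionary, and the example counts k-mers (substrings of length k) in DNA fragments to tie in with the earlier use case.

```python
from collections import defaultdict
from functools import reduce


def map_phase(sequence: str, k: int = 3):
    """Map: emit a (k-mer, 1) pair for every window of length k in one chunk."""
    return [(sequence[i:i + k], 1) for i in range(len(sequence) - k + 1)]


def shuffle(pairs):
    """Group emitted values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine all values of each key into a single count."""
    return {key: reduce(lambda x, y: x + y, values) for key, values in groups.items()}


def count_kmers(chunks, k: int = 3):
    # Each chunk could be mapped on a different node; here they run one after another.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk, k)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(count_kmers(["gtttacgctc", "tacaaccctt"]))
```

Because each chunk is mapped independently, k-mers that span a chunk boundary are not counted in this sketch; a real job would split the input with overlapping windows.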
Big Data: Map Reduce
MapReduce
• Friendly for non-technical users
Big Data: Map Reduce
– Used by Yahoo!, Facebook, Twitter,
Amazon, eBay…
– Can be used in different architectures:
both clusters (in-house) and grid
(Cloud computing)
– Storm and Spark follow the same model “in
memory” instead of on disk
http://hadoop.apache.org/
Hadoop
Big Data: Hadoop
More technical information
• http://www.slideshare.net/vlopezlo
• www.hortonworks.com
• www.coursera.com
• www.Bigdatauniversity.com
• www.mit.edu
Technology Transfer Opportunities
• A great opportunity for researchers working to transfer
technology, who can increase their efforts in
developing new techniques in optimization of:
– Monitoring data (Sensors, smartphones, …)
– Storing data (Cloud Computing, Amazon S3, EC2, Google
BigQuery, Tableau …)
– Cleaning, Integrating & Processing data (Data Curation at
Scale: The Data Tamer System, M. Stonebraker et al., CIDR 2013)
– Analysing data (R, SAS… but also Google, Amazon, eBay...)
– Encryption & searching on encrypted data
– Techniques of Data Mining (Machine Learning, Data
Clustering, Predictive Models, ...) which are compatible
with big data by complex analytics
Big Data. Big Projects.
• Google
• eBay
• Amazon
• Twitter
• …
• They develop big projects with their own big data,
but many other businesses also obtain data in order to
analyze it.
• Government data. Public data.
Working with Big Data in G-TeC group
Academia & Industry Working Together
OMUS
Industry: know-how and expertise
Data Collection
Big Data and Analytics
Patents, Intellectual Property and other output
Doctoral Thesis: joint guidance
University: Theoretical Models & Research
Open Data
“Open data is data that can be freely used, reused and redistributed by anyone –
subject only, at most, to the requirement to attribute and sharealike.”
OpenDefinition.org
Availability and Access: the data must be
available as a whole and at no more than a
reasonable reproduction cost, preferably by
downloading over the internet. The data
must also be available in a convenient and
modifiable form.
Reuse and Redistribution: the data must be
provided under terms that permit reuse and
redistribution including the intermixing with
other datasets. The data must be machine-
readable.
Universal Participation: everyone must be
able to use, reuse and redistribute – there
should be no discrimination against fields of
endeavour or against persons or groups. For
example, ‘non-commercial’ restrictions that
would prevent ‘commercial’ use, or
restrictions of use for certain purposes (e.g.
only in education), are not allowed.
Open Data
Why Open Data by Open Knowledge Foundation
Open Data for Smartcity
• What can a citizen expect when living in a
city?
• Internet of Things
– Libraries
– Public transportation, traffic monitoring
– Pets, devices, cars, even people
• Intelligent agents
– Interacting without our control
– Credit card control (BBVA use case)
CKAN
• The Comprehensive Knowledge Archive
Network (CKAN) is a web-based open source
data management system for the storage and
distribution of data, such as spreadsheets
and the contents of databases. It is inspired
by the package management capabilities
common to open source operating systems
like Linux.
• Its code base is maintained by the Open Knowledge
Foundation.
• The system is used both as a public platform on Datahub and
in various government data catalogues (UK's data.gov.uk, the
Dutch National Data Register, the United States government's
Data.gov and the Australian government's "Gov 2.0").
Basic structure
Client/Server pattern
Diagram labels: public data, web service, server, web server, client
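On the client side, this pattern amounts to building a request for the data portal and reading the JSON it returns. The sketch below targets the CKAN Action API `package_search` call; the portal address, the sample response and the helper names are hypothetical, and the parsing is run on a hand-written response so the example works offline.

```python
import json
from typing import List
from urllib.parse import urlencode


def package_search_url(base_url: str, query: str, rows: int = 5) -> str:
    """Build a CKAN Action API URL that searches datasets matching `query`."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> List[str]:
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("the portal reported a failed request")
    return [d.get("title", d.get("name", "")) for d in payload["result"]["results"]]


if __name__ == "__main__":
    # Hypothetical portal address, used only to show the URL shape.
    print(package_search_url("https://opendata.example.org/", "recycling points"))

    # Hand-written sample in the shape CKAN returns; not real data.
    sample = json.dumps({
        "success": True,
        "result": {"count": 2, "results": [
            {"name": "recycling-points", "title": "Recycling points"},
            {"name": "bike-routes", "title": "Routes for bikes"},
        ]},
    })
    print(dataset_titles(sample))
```

In a real client the URL would be fetched over HTTP (for example with `urllib.request.urlopen`) and the body passed to `dataset_titles`.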
Smartcity concept
• Large numbers of people. Big cities.
– Search 7 thousand differences
• Smartcity business.
• The role of technology in the city: efficiency & security
• Standardization of the Smartcity concept (May 2014)
– Better quality of life. Security
– Sustainability
– Innovation opportunities
– Multidisciplinary: social researchers, engineers, architects, …
• Relationships are changing, based on mobile
technologies (smartphones, tablets, Internet of
Things, …)
• Cross-cutting development projects: sensors and monitoring
devices, connectivity, platform, services in the cloud.
Smartcity concept
• Large amount of unstructured information
• Machine learning, big data technologies, the Internet
of Things and intelligent systems are needed.
• Technology development as a service in all areas:
1. Structure:
– Environment, infrastructure (water, energy, material,
mobility, nature), built domain
2. Society:
– public space, functions, people
3. Data:
– information flows, performance
Mariam Saucedo
Pilar Torralbo
Daniel Sanz
Recycla.me
Ana Alfaro
Sergio Ballesteros
Lidia Sesma
Héctor Martos
Álvaro Bustillo
Arturo Callejo
Belén Abellanas
Jaime Ramos
Ignacio P. de Ziriza
Victor Torres
Alberto Segovia
Miguel Bueno
Mar Octavio de Toledo
Antonio Sanmartín
Carlos Fernández
RESOURCE MAP
RECYCLA.TE
• Parks and gardens
• Parking for
– Cars
– Motorbikes
– Bikes
• Recycling Points
– Fixed
– Mobile
– Clothes
• Stations
– Bioethanol
– Gas
– Oil
– Electric
• Routes for bikes
– Cycle lanes
– Safe streets
• Residential Priority Areas
Madrid – Smart City
NEW DATA IS COLLECTED.
A SERVICE IS GIVEN
query
DATA TRANSFER
Recycla.me
Data Analytics, Data Scientist
FROM (UNSTRUCTURED) DATA TO VALUE
PIC 2014
MyConference
Be ready at PIC 2014 with MyConference
App screens: Main Menu, Access to Committees, Venue and localization, Extra Information
https://play.google.com/store/apps/details?id=es.ucm.myconference
Conclusions
Big Data, Open Data and Smartcity
• A great opportunity for researchers working to transfer
technology, who can increase their efforts in developing
new techniques in optimization of:
– Monitoring data
– Storing data
– Cleaning, Integrating & Processing data
– Analysing data
– Encryption & searching on encrypted data
– Techniques of Data Mining
• Much future work remains in developing new smart
cities with respect to environment, security and infrastructure.

Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Big&open data challenges for smartcity-PIC2014 Shanghai

  • 1. Big and Open Data Challenges for Smartcity Victoria López Grupo G-TeC www.tecnologiaUCM.es Universidad Complutense de Madrid
  • 2. Big and Open Data. Challenges for Smartcity • Introduction • Fighting with Big Data: Genome Data • Big Data. Big Projects • Open Data. Technology Transfer Opportunities • Smartcity. Big and Open Systems • Madrid as Smartcity • Conclusions
  • 3. Introduction. Our goal: to transfer technology and knowledge – Mobile technologies applied to the environment – Intelligent agents – Optimization and forecasting from data – Bioinformatics, Biostatistics • G-TeC group: statisticians, physicists, mathematicians, economists and several computer scientists. – www.tecnologiaUCM.es
  • 4. Fighting with Big Data • Every day we need to deal with more and more data. • For many years, new computers with more memory and higher speed seemed to be the solution to data growth (Elephant vendors). • Many research areas were already fighting with Big Data: bioinformatics, genome data, DNA, RNA, proteins and, in general, all biological data have required computer monitoring and storage in large databases at laboratories and research centers around the world. • The future of genomics rests on the foundation of the Human Genome Project.
  • 5. Fighting with Big Data • Each time an organization or an individual is not able to deal with its data, it is facing a big data problem. • The Human Genome Project was managed with the same philosophy as modern Big Data: large databases distributed around the world, with parallel processing when available and suitable. • Our experience: sequence alignment and its optimization with dynamic programming and related heuristics. • The body of biological data is itself a Big Data base. • Adding new sequences, searching and forecasting are tasks very similar to those we face in every Big Data problem.
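The dynamic-programming alignment mentioned in slide 5 can be sketched as follows. This is a minimal Needleman-Wunsch global alignment score, not the group's own implementation. The +5/-4 match/mismatch values mirror the scoring matrix reported in the FASTA output of slide 13; the linear gap penalty of -4 and the function name are assumptions made for illustration.

```python
def global_alignment_score(a: str, b: str, match: int = 5,
                           mismatch: int = -4, gap: int = -4) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against an empty prefix of b costs i gaps
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] aligned to a gap
                curr[j - 1] + gap,   # b[j-1] aligned to a gap
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("acgt", "agt"))
```

The table has len(a) × len(b) cells, which is why exact alignment against a whole database is expensive and why heuristic tools such as FASTA and BLAST are used for the search step.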
  • 6. 22/05/2014 Vineyards in La Geria, Lanzarote. Use Case: Looking for a Fungus • Application to infections in agricultural crops when it is not possible to identify the actual fungus. • The person responsible needs to decide what to do, which treatment to apply, or which procedure is better. – A fragment of fungus DNA must be sequenced in the lab. – Then the scientist looks for it in molecular databases by means of sequence searching (“DB homology search”). – Some alignment algorithms (BLAST, FASTA) are executed to return the best matches. • The sequence: gtttacgctctacaaccctttgtgaacatacctacaactgttg cttcggcgggtagggtctccgcgaccctcccggcctcccgcct ccgggcgggtcggcgcccgccggaggataaccaaactctgatt taacgacgtttcttctgagtggtacaagcaaataatcaaaact tttaacaaccggatctcttggttctggcatcgatgaagaacgc agcgaaatgcgataagtaatgtgaat
  • 7. Use Case: Database and Algorithm Selection (PIC 2014, Shanghai) 1. EBI: European Bioinformatics Institute 2. Choose among the tools available on the web site a. Fasta3 b. Select DATABASE: • Nucleic ACIDS • FUNGI c. Fit sequences and run queries 3. A sorted (but not complete) list, from best to worst similarity, is returned.
  • 8. Use Case: EBI Web Site
  • 9. Use Case: Web Toolbox in EBI
  • 10. Use Case: Algorithm Fasta 3
  • 11. Use Case: Databases, Nucleic Acids: Fungi
  • 12. Use Case: Fit sequences and run FASTA 3
  • 13. Use Case: The output
      FASTA searches a protein or DNA sequence data bank
      version 3.3t09 May 18, 2001
      Please cite: W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
      @:1-: 241 nt
      vs EMBL Fungi library
      searching /ebi/services/idata/v225/fastadb/em_fun library
      104701680 residues in 66478 sequences
      statistics extrapolated from 60000 to 61164 sequences
      Expectation_n fit: rho(ln(x))= -1.2290+/-0.000361; mu= 72.1313+/- 0.026
      mean_var=907.6270+/-295.007, 0's: 68 Z-trim: 4246 B-trim: 15652 in 3/79
      Lambda= 0.0426
      FASTA (3.39 May 2001) function [optimized, +5/-4 matrix (5:-4)] ktup: 6
      join: 48, opt: 33, gap-pen: -16/ -4, width: 16
      Scan time: 3.180
      The best scores are:                                      opt bits E(61164)
      EM_FUN:CGL301988 AJ301988.1 Colletotrichum glo (1484) [f] 1184 88 5.7e-17
      EM_FUN:AF090855 AF090855.1 Colletotrichum gloe ( 500) [f] 1205 88 7.3e-17
      EM_FUN:CGL301986 AJ301986.1 Colletotrichum glo (1484) [f] 1166 87 1.2e-16
      EM_FUN:CGL301908 AJ301908.1 Colletotrichum glo (2868) [f] 1148 87 1.3e-16
      EM_FUN:CGL301909 AJ301909.1 Colletotrichum glo (2868) [f] 1148 87 1.3e-16
      EM_FUN:CGL301907 AJ301907.1 Colletotrichum glo (2867) [f] 1148 87 1.3e-16
      EM_FUN:CGL301919 AJ301919.1 Colletotrichum glo (1171) [f] 1166 87 1.6e-16
      EM_FUN:CGL301977 AJ301977.1 Colletotrichum glo (1876) [f] 1148 86 2e-16
      EM_FUN:CFR301912 AJ301912.1 Colletotrichum fra (2870) [f] 1137 86 2.1e-16
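To work with the ranked hits programmatically, each line of the "best scores" list can be split into fields. The sketch below assumes only the line layout visible in the output above (entry, accession, truncated description, length in parentheses, frame in brackets, then opt, bits and E-value). The `Hit` type and `parse_hit` function are hypothetical names, and other FASTA versions may format these lines differently.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    entry: str
    accession: str
    description: str
    length: int
    opt: int
    bits: int
    evalue: float


# Layout assumed from the slide, e.g.:
# EM_FUN:AF090855 AF090855.1 Colletotrichum gloe ( 500) [f] 1205 88 7.3e-17
HIT_RE = re.compile(
    r"^(?P<entry>\S+)\s+(?P<acc>\S+)\s+(?P<desc>.*?)\s*"
    r"\(\s*(?P<len>\d+)\)\s+\[[fr]\]\s+"
    r"(?P<opt>\d+)\s+(?P<bits>\d+)\s+(?P<e>\S+)$"
)


def parse_hit(line: str) -> Hit:
    """Parse one 'best scores' line into a Hit record."""
    m = HIT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized hit line: {line!r}")
    return Hit(
        entry=m.group("entry"),
        accession=m.group("acc"),
        description=m.group("desc"),
        length=int(m.group("len")),
        opt=int(m.group("opt")),
        bits=int(m.group("bits")),
        evalue=float(m.group("e")),
    )


if __name__ == "__main__":
    lines = [
        "EM_FUN:CGL301988 AJ301988.1 Colletotrichum glo (1484) [f] 1184 88 5.7e-17",
        "EM_FUN:AF090855 AF090855.1 Colletotrichum gloe ( 500) [f] 1205 88 7.3e-17",
    ]
    # The list is ranked by E-value; re-sorting by raw opt score changes the order
    for hit in sorted(map(parse_hit, lines), key=lambda h: h.opt, reverse=True):
        print(hit.accession, hit.opt, hit.evalue)
```

All of the top hits in the slide belong to the genus Colletotrichum, which is the information the grower needs in order to choose a treatment.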
  • 14. Our background in Bioinformatics • Bioinformatics (Master's in Research in Informatics, UCM) • Several Master's theses & publications – Alignment of sequences with R and RHadoop* – Analysis & visualization with the R language and Chernoff faces – Others
  • 15. Big Data: From Data Warehouse to Big Data (large databases) • 1970: relational model invented • RDBMS declared mainstream until the 90s • One size fits all, Elephant vendors: heavily encoded, even indexing by B-trees.
  • 16. Alex 'Sandy' Pentland, director of 'Media Lab' at Massachusetts Institute of Technology (MIT): The big data revolution, Campus Party Europe 2013 • Nowadays business needs high availability of data, so new techniques must be developed: complex analytics, graph databases • Data volume is increasing exponentially – 44x increase from 2009 to 2020 – From 0.8 zettabytes to 35 ZB
  • 17. Who generates Big Data? Unstructured data. Progress and innovation are no longer hampered by the ability to collect data, but by the ability to manage, analyze, synthesize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable way.
  • 18. Big Data: the 3+1+1 V’s
  • 19. From data to value. The MIT proposal stage list to deal with Big Data: • Big Data Collection – Monitoring – Data cleaning and integration – Hosted Data Platforms and the Cloud • Big Data Storage – Modern Data Bases – Distributed Computing Platforms – NoSQL, NewSQL • Big Data Systems – Security – Multicore scalability – Visualization and User Interfaces • Big Data Analytics – Fast algorithms – Data compression – Machine learning tools – Visualization & Reporting
  • 20. Big Data in use 1. High availability is now a requirement 2. Hosted (not only in-house) and cloud computing 3. Running in parallel a. Data aggregation process b. Analytics on data c. Graph DBMS similarities 4. Not only SQL: Cassandra* and MongoDB** *The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. **Document-oriented storage
  • 21. Big Data: MapReduce • Main feature: scalability to many nodes – Scan of 100 TB on 1 node @ 50 MB/sec = 23 days – Scan on a cluster of 1000 nodes = 33 minutes • MapReduce – Parallel programming model – Simple concept, smart, suitable for multiple applications – Big datasets → multi-node, multiprocessor – Sets of nodes: clusters or grids (distributed programming) • By Google (2004) – Able to process 20 PB per day – Based on Map & Reduce, classical methods in functional programming related to the classic divide & conquer – Comes from numerical analysis (big matrix products).
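The scan-time figures and the map/shuffle/reduce pattern from slide 21 can be illustrated with a small single-process sketch. It is not Hadoop code: the function names are hypothetical, the phases run sequentially in one process, and counting k-mers in DNA sequences is an example chosen here to tie in with the genomics use case.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def scan_seconds(total_bytes: float, bytes_per_second: float, nodes: int = 1) -> float:
    """Time to scan a dataset split evenly across nodes (ideal, no overhead)."""
    return total_bytes / (bytes_per_second * nodes)


def map_kmers(sequence: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every window of length k."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine the values of one key into a single count."""
    return key, sum(values)


def map_reduce(sequences: Iterable[str], k: int = 3) -> Dict[str, int]:
    # In a real cluster each sequence would be mapped on a different node
    mapped = (pair for seq in sequences for pair in map_kmers(seq, k))
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    one_node = scan_seconds(100e12, 50e6)           # 100 TB at 50 MB/s
    cluster = scan_seconds(100e12, 50e6, nodes=1000)
    print(f"{one_node / 86400:.1f} days, {cluster / 60:.1f} minutes")
    print(map_reduce(["acgacg", "cga"], k=3))
```

Because each map call sees only its own input and each reduce call sees only one key, both phases can be distributed across nodes without shared state, which is what makes the model scale.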
  • 22. Big Data: MapReduce • Friendly for non-technical users
  • 23. Big Data: Hadoop – Used by Yahoo!, Facebook, Twitter, Amazon, eBay… – Can be used in different architectures: both clusters (in-house) and grid (cloud computing) – Storm and Spark use the same model “in memory” instead of on disk – http://hadoop.apache.org/
  • 24. More technical information • http://www.slideshare.net/vlopezlo • www.hortonworks.com • www.coursera.com • www.Bigdatauniversity.com • www.mit.edu
  • 25. Technology Transfer Opportunities • A great opportunity for researchers working on technology transfer, who can increase their efforts in developing new techniques for optimizing: – Monitoring data (sensors, smartphones, …) – Storing data (Cloud Computing, Amazon S3, EC2, Google BigQuery, Tableau, …) – Cleaning, integrating & processing data (Data Curation at Scale: The Data Tamer System, M. Stonebraker et al., CIDR 2013) – Analysing data (R, SAS… but also Google, Amazon, eBay...) – Encryption & searching on encrypted data – Data mining techniques (machine learning, data clustering, predictive models, ...) that are compatible with big data through complex analytics
  • 26. Big Data. Big Projects. • Google • eBay • Amazon • Twitter • … • They develop big projects with their own big data, but many businesses also obtain their data to run analyses. • Government data. Public data.
  • 27. Working with Big Data in G-TeC group
  • 29. Academia & Industry Working Together (OMUS) • Industry: know-how and expertise • Data Collection • Big Data and Analytics • Patents, Intellectual Property and other output • Doctoral Thesis: joint guidance • University: Theoretical Models & Research
  • 30. Open Data “Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike.” OpenDefinition.org • Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form. • Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable. • Universal Participation: everyone must be able to use, reuse and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
  • 32. Why Open Data, by the Open Knowledge Foundation
  • 33. Open Data for Smartcity • What can a citizen expect when living in a city? • Internet of Things – Libraries – Public transportation, traffic monitoring – Pets, devices, cars, even people • Intelligent agents – Interacting without our control – Credit card control (BBVA use case)
  • 34. CKAN • The Comprehensive Knowledge Archive Network (CKAN) is a web-based open source data management system for the storage and distribution of data, such as spreadsheets and the contents of databases. It is inspired by the package management capabilities common to open source operating systems like Linux. • Its code base is maintained by the Open Knowledge Foundation. • The system is used both as a public platform on Datahub and in various government data catalogues (the UK's data.gov.uk, the Dutch National Data Register, the United States government's Data.gov and the Australian government's “Gov 2.0”).
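A CKAN catalogue can be queried over HTTP through its Action API. The sketch below builds a `package_search` request and extracts dataset titles from the JSON envelope that CKAN returns. The base URL `https://ckan.example.org` is a placeholder, not a real portal, and the helper names are hypothetical; only the standard library is used.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url: str, query: str, rows: int = 5) -> str:
    """Build the URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response: dict) -> List[str]:
    """Extract dataset titles from a decoded package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported a failed request")
    # Fall back to the dataset's URL name when no title is set
    return [d.get("title") or d["name"] for d in response["result"]["results"]]


def search(base_url: str, query: str, rows: int = 5) -> List[str]:
    """Run the search against a live portal (requires network access)."""
    with urlopen(package_search_url(base_url, query, rows), timeout=10) as resp:
        return dataset_titles(json.load(resp))


if __name__ == "__main__":
    # Placeholder portal; replace with a real CKAN instance before running search()
    print(package_search_url("https://ckan.example.org", "bike lanes", rows=3))
```

Keeping URL construction and response parsing separate from the network call makes the sketch easy to check offline against a saved response.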
  • 36. Smartcity concept • Large numbers of people. Big cities. – Search 7 thousand differences • Smartcity business. • The role of technology in the city: efficiency & security • Standardization of the Smartcity concept (May 2014) – Better quality of life. Security – Sustainability – Innovation opportunities – Multidisciplinary: social researchers, engineers, architects, … • Relationships are changing, based on mobile technologies (smartphones, tablets, Internet of Things, …) • Cross-cutting development projects: sensors and monitoring devices, connectivity, platform, services in the cloud.
  • 37. Smartcity concept • Large amount of unstructured information • Machine learning, big data technologies, Internet of Things and intelligent systems are needed. • Technology development as a service in all areas: 1. Structure: – Environment, infrastructure (water, energy, material, mobility, nature), built domain 2. Society: – Public space, functions, people 3. Data: – Information flows, performance
  • 38. Mariam Saucedo, Pilar Torralbo, Daniel Sanz • Recycla.me • Ana Alfaro, Sergio Ballesteros, Lidia Sesma, Héctor Martos, Álvaro Bustillo, Arturo Callejo, Belén Abellanas, Jaime Ramos, Ignacio P. de Ziriza, Victor Torres, Alberto Segovia, Miguel Bueno, Mar Octavio de Toledo, Antonio Sanmartín, Carlos Fernández • MAPA DE RECURSOS (Resource Map) • RECYCLA.TE
  • 39. Madrid – Smart City • Parks and gardens • Parking for – Cars – Motorbikes – Bikes • Recycling points – Fixed – Mobile – Clothes • Stations – Bioethanol – Gas – Oil – Electric • Routes for bikes – Cycle lanes – Safe streets • Residential Priority Areas
  • 41. Diagram labels: Query • Data transfer • New data is collected • A service is given
  • 43. From (unstructured) data to value: Data Analytics, Data Scientist
  • 45. Be ready at PIC 2014 with MyConference • Main Menu • Access to Committees • Venue and location • Extra Information
  • 47. Conclusions: Big Data, Open Data and Smartcity • A great opportunity for researchers working on technology transfer, who can increase their efforts in developing new techniques for optimizing: – Monitoring data – Storing data – Cleaning, integrating & processing data – Analysing data – Encryption & searching on encrypted data – Data mining techniques • Substantial future work on developing new smart cities in the areas of environment, security and infrastructure.

Editor's Notes

  1. GRASIA: Intelligent agents and software engineering