The datascientists workplace of the future, IBM developerDays 2014, Vienna by Romeo Kienzler
Romeo Kienzler
1.
© 2013 IBM Corporation
The Data Scientists Workplace of the Future - Workshop SwissRE, 11.6.14
Romeo Kienzler
IBM Center of Excellence for Data Science, Cognitive Systems and BigData (a joint venture between IBM Research Zurich and IBM Innovation Center DACH)
Source: http://www.kdnuggets.com/2012/04/data-science-history.jpg
2.
The Data Scientists Workplace of the Future - Credits
Romeo Kienzler, IBM Innovation Center
● Parts of these slides have been copied from and/or revised by
  ● Dr. Anand Ranganathan, IBM Watson Research Lab
  ● Dr. Stefan Mück, IBM BigData Leader Europe
  ● Dr. Berthold Rheinwald, IBM Almaden Research Lab
  ● Dr. Diego Kuonen, Statoo Consulting
  ● Dr. Abdel Labbi, IBM Zurich Research Lab
  ● Brandon MacKenzie, IBM Software Group
3.
What is DataScience?
Source: Statoo.com http://slidesha.re/1kmNiX0
4.
What is DataScience?
Source: Statoo.com http://slidesha.re/1kmNiX0
5.
DataScience at present
● Tools (http://blog.revolutionanalytics.com/2014/01/in-data-scientist-survey-r-is-the-most-used-tool-other-than-databases.html)
  ● SQL (42%)
  ● R (33%)
  ● Python (26%)
  ● Excel (25%)
  ● Java, Ruby, C++ (17%)
  ● SPSS, SAS (9%)
● Limitations (Single Node usage)
  ● Main Memory
  ● CPU <> Main Memory Bandwidth
  ● CPU
  ● Storage <> Main Memory Bandwidth (either Single node or SAN)
6.
DataScience at present - Demo
● Assume a 1 TB file on a hard drive
● Split it into 16 files
  ● split -d -n 16 output.json
● Distribute across 4 nodes (4 files per node)
  ● for i in `seq 0 15`; do scp x`printf %02d $i` id@node$((i % 4 + 1)):~/; done
● Perform the calculation in parallel
  ● for n in `seq 1 4`; do ssh id@node$n "cat x[0-9][0-9] | awk -F: '{print \$6}' | grep -i samsung | grep breathtaking | wc -l" & done > result; wait
● Merge the result
  ● awk '{s += $1} END {print s}' result
Source: http://sergeytihon.wordpress.com/2013/03/20/the-data-science-venn-diagram/
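The split, distribute, compute and merge steps of this demo can also be sketched on a single machine. The following is a minimal illustration, not the demo itself: a thread pool stands in for the nodes, the function names and the sample lines are hypothetical, and the `awk` field extraction is left out so that the whole line is searched.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_chunks):
    """Split a list of lines into n_chunks contiguous, nearly equal parts."""
    size, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def count_matches(chunk):
    """Per-chunk step: like `grep -i samsung | grep breathtaking | wc -l`."""
    return sum(
        1 for line in chunk
        if "samsung" in line.lower() and "breathtaking" in line
    )


def parallel_count(lines, n_chunks=16, n_workers=4):
    """Count matches chunk by chunk in a worker pool, then merge by summing."""
    chunks = split_into_chunks(lines, n_chunks)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_matches, chunks))
    return sum(partial_counts)


# Hypothetical sample data; two of every four lines match.
sample = [
    "Samsung screen is breathtaking",
    "samsung battery is fine",
    "a breathtaking view",
    "the SAMSUNG camera is breathtaking",
] * 10
total = parallel_count(sample)
print(total)
```

The merge step is a plain sum of the partial counts, which is why the per-chunk work can run in any order.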
7.
What is BIG data?
8.
What is BIG data?
9.
What is BIG data?
[Diagram: Big Data, Hadoop]
10.
What is BIG data?
[Diagram: Business Intelligence, Data Warehouse]
11.
BigData == Hadoop?
[Diagram: Hadoop, BigData, Hadoop]
12.
What is beyond “Data Warehouse”?
[Diagram: Data Lake, Data Warehouse]
13.
First “BigData” UseCase?
● Google Index
  ● 40 X 10^9 = 40.000.000.000 => 40 billion pages indexed
  ● Will break 100 PB barrier soon
  ● Derived from MapReduce
  ● now “caffeine” based on “percolator”
    ● Incremental vs. batch
    ● In-Memory vs. disk
14.
Map-Reduce → Hadoop → BigInsights
15.
BigData UseCases
● CERN LHC
  ● 25 petabytes per year
● Facebook
  ● Hive Datawarehouse
  ● 300 PB, growing 600 TB / d
  ● > 100 k servers
● Genomics
● Enterprises
  ● Data center analytics (Logfiles, OS/NW monitors, ...)
  ● Predictive Maintenance, Cybersecurity
  ● Social Media Analytics
  ● DWH offload
  ● Call Detail Record (CDR) data preservation http://www.balthasar-glaettli.ch/vorratsdaten/
16.
Why is Big Data important?
17.
BigData Analytics
Source: http://www.strategy-at-risk.com/2008/01/01/what-we-do/
18.
BigData Analytics – Predictive Analytics
"sometimes it's not who has the best algorithm that wins; it's who has the most data."
(C) Google Inc., The Unreasonable Effectiveness of Data¹
No Sampling => Work with full dataset => No p-Value/z-Scores anymore
¹ http://www.csee.wvu.edu/~gidoretto/courses/2011-fall-cp/reading/TheUnreasonable%20EffectivenessofData_IEEE_IS2009.pdf
19.
We need Data Parallelism
20.
Aggregated Bandwidth between CPU, Main Memory and Hard Drive
1 TB (at 10 GByte/s per node)
- 1 Node - 100 sec
- 10 Nodes - 10 sec
- 100 Nodes - 1 sec
- 1000 Nodes - 100 msec
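The scan times on this slide follow from dividing the data volume by the aggregated bandwidth of all nodes. A small worked sketch, assuming decimal units (1 TB = 10^12 bytes), 10 GByte/s per node and perfectly even data distribution; the function name is illustrative:

```python
TB = 10**12  # decimal terabyte, an assumption about the slide's units
GB = 10**9


def scan_time_seconds(data_bytes, nodes, bytes_per_second_per_node):
    """Time to read all data once when every node scans its share in parallel."""
    return data_bytes / (nodes * bytes_per_second_per_node)


for nodes in (1, 10, 100, 1000):
    seconds = scan_time_seconds(1 * TB, nodes, 10 * GB)
    print(f"{nodes} node(s): {seconds} s")
```

Real clusters would fall short of this ideal because of skew, coordination and network overhead, which the sketch ignores.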
21.
Fault Tolerance / Commodity Hardware
AMD Turion II Neo N40L (2x 1,5GHz / 2MB / 15W), 8 GB RAM, 3TB SEAGATE Barracuda 7200.14
< CHF 500
100 K => 200 X (2, 4, 3) => 400 Cores, 1,6 TB RAM, 200 TB HD
MTBF ~ 365 d > 1,5 d
Source: http://www.cloudcomputingpatterns.org/Watchdog
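The cluster figures on this slide can be reproduced with simple arithmetic. This is a sketch under stated assumptions: "100 K" is read as a budget of CHF 100,000, each node costs exactly CHF 500, node failures are independent, and the 200 TB disk figure is interpreted as usable capacity after three-way replication, which the slide does not state.

```python
BUDGET_CHF = 100_000       # assumption: "100 K" means CHF 100,000
PRICE_PER_NODE_CHF = 500   # slide says "< CHF 500", so this is an upper bound
CORES_PER_NODE = 2
RAM_GB_PER_NODE = 8
DISK_TB_PER_NODE = 3
NODE_MTBF_DAYS = 365
REPLICATION_FACTOR = 3     # assumption, not stated on the slide

nodes = BUDGET_CHF // PRICE_PER_NODE_CHF
total_cores = nodes * CORES_PER_NODE
total_ram_tb = nodes * RAM_GB_PER_NODE / 1000
raw_disk_tb = nodes * DISK_TB_PER_NODE
usable_disk_tb = raw_disk_tb / REPLICATION_FACTOR

# With independent failures, the cluster sees one node failure
# roughly every MTBF / nodes days.
days_between_failures = NODE_MTBF_DAYS / nodes

print(nodes, total_cores, total_ram_tb, usable_disk_tb, days_between_failures)
```

Under these assumptions a node fails somewhere in the cluster about every 1.8 days, which is consistent with the slide's "> 1,5 d" and explains why fault tolerance has to be built into the software.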
22.
NoSQL Databases
Column Store
– Hadoop / HBASE
– Cassandra
– Amazon SimpleDB
JSON / Document Store
– MongoDB
– CouchDB
Key / Value Store
– Amazon DynamoDB
– Voldemort
Graph DBs
– DB2 SPARQL Extension
– Neo4J
MP RDBMS
– DB2 DPF, DB2 pureScale, PureData for Operational Analytics
– Oracle RAC
– Greenplum
http://nosql-database.org/ > 150
23.
CAP Theorem / Brewer's Theorem¹
It is impossible for a distributed computer system to simultaneously guarantee all 3 properties
– Consistency (all nodes see the same data at the same time)
– Availability (every request receives a response indicating whether it succeeded or failed)
– Partition tolerance (the system continues to operate despite failure of part of the system)
What about ACID?
– Atomicity
– Consistency
– Isolation
– Durability
BASE, the new ACID
– Basically Available
– Soft state
– Eventual consistency
  • Monotonic Read Consistency
  • Monotonic Write Consistency
  • Read Your Own Writes
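To make "Read Your Own Writes" concrete, here is a toy sketch of one way a client session could enforce it on top of an eventually consistent store. All class and function names are hypothetical and the replication is simulated in memory; it is not how any particular database listed in this deck implements the guarantee.

```python
class Replica:
    """In-memory copy of the data with a version counter."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


def replicate(primary, replica):
    """Simulated asynchronous replication catching up with the primary."""
    replica.data = dict(primary.data)
    replica.version = primary.version


class Session:
    """Client session that never reads state older than its own last write."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.last_written_version = 0

    def write(self, key, value):
        version = self.primary.version + 1
        self.primary.apply(version, key, value)
        self.last_written_version = version

    def read(self, key):
        # Use a replica only if it has caught up with this session's writes.
        for replica in self.replicas:
            if replica.version >= self.last_written_version:
                return replica.data.get(key)
        return self.primary.data.get(key)


primary, lagging = Replica(), Replica()
session = Session(primary, [lagging])
session.write("x", 1)
stale_value = lagging.data.get("x")    # a naive replica read misses the write
own_value = session.read("x")          # the session falls back to the primary
replicate(primary, lagging)
value_after_sync = session.read("x")   # now served by the replica
print(stale_value, own_value, value_after_sync)
```

The replica is still only eventually consistent; the session merely remembers the version of its last write and refuses to read anything older.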
24.
What role is the cloud playing here?
25.
“Elastic” Scale-Out
Source: http://www.cloudcomputingpatterns.org/Continuously_Changing_Workload
26.
“Elastic” Scale-Out of
27.
“Elastic” Scale-Out of CPU Cores
28.
“Elastic” Scale-Out of CPU Cores, Storage
29.
“Elastic” Scale-Out of CPU Cores, Storage, Memory
30.
“Elastic” Scale-Out: linear
Source: http://www.cloudcomputingpatterns.org/Elastic_Platform
31.
How do Databases Scale-Out?
Shared Disk Architectures
32.
How do Databases Scale-Out?
Shared Nothing Architectures
33.
Hadoop?
Shared Nothing Architecture?
Shared Disk Architecture?
34.
Data Science on Hadoop
● SQL (42%)
● R (33%)
● Python (26%)
● Excel (25%)
● Java, Ruby, C++ (17%)
● SPSS, SAS (9%)
[Diagram: Data Science, Hadoop]
35.
Large Scale Data Ingestion
● Traditionally
  ● Crawl to local file system (e.g. wget http://www.heise.de/newsticker/)
  ● Export RDBMS data to CSV (local file system)
  ● Batched FTP server uploads
  ● Then: Copy to HDFS
● BigInsights
  ● Use one of the built-in importers
  ● Imports directly into HDFS
  ● Use Eclipse tooling to deploy custom importers easily
36.
Large Scale Data Ingestion (ETL on M/R)
● Modern ETL (Extract, Transform, Load) tools support Hadoop as
  ● Source, Sink (HDFS)
  ● Engine (MapReduce)
● Example: InfoSphere DataStage
37.
Real-Time / In-Memory Data Ingestion
● If volume can be reduced dramatically during first processing steps
  ● Feature Extraction of
    ● Video
    ● Audio
    ● Semistructured Text (e.g. Logfiles)
    ● Structured Text
  ● Filtering
  ● Compression
● Recommendation: Usage of Streaming Engines
  ● IBM InfoSphere Streams
  ● Twitter Storm (now Apache incubator)
  ● Apache Spark Streaming
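The idea of reducing volume during the first processing steps can be sketched with plain Python generators standing in for a streaming engine. This does not use the API of InfoSphere Streams, Storm or Spark Streaming; the log format, the feature choice and all function names are assumptions made for illustration.

```python
import zlib


def extract_features(lines):
    """Feature extraction: reduce each log line to (level, message length)."""
    for line in lines:
        level, _, message = line.partition(" ")
        yield level, len(message)


def keep_errors(features):
    """Filtering: drop everything except ERROR records."""
    for level, length in features:
        if level == "ERROR":
            yield level, length


def ingest(lines):
    """Process records one at a time and return the compressed, reduced output."""
    reduced = [
        f"{level},{length}"
        for level, length in keep_errors(extract_features(lines))
    ]
    return zlib.compress("\n".join(reduced).encode("utf-8"))


# Hypothetical log lines in "<LEVEL> <message>" form.
logs = [
    "INFO service started",
    "ERROR disk failure on node 7",
    "INFO heartbeat ok",
    "ERROR timeout",
]
stored = ingest(logs)
print(zlib.decompress(stored).decode("utf-8"))
```

Because the generators consume one record at a time, only the reduced records need to be held or written, which is the property that makes in-memory ingestion feasible.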
38.
Real-Time / In-Memory Data Ingestion
● If volume can be reduced dramatically during first processing steps
  ● Feature Extraction of
    ● Video
    ● Audio
    ● Semistructured Text (e.g. Logfiles)
    ● Structured Text
  ● Filtering
  ● Compression
39.
SQL on Hadoop
● IBM BigSQL (ANSI 92 compliant)
● HIVE (SQL dialect)
● Cloudera Impala
● Lingual
● ...
[Diagram: SQL, Hadoop]
40.
BigSQL V3.0 – ANSI SQL 92 compliant
IBM BigInsights v3.0, with Big SQL 3.0, is the only Hadoop distribution to successfully run ALL 99 TPC-DS queries and ALL 22 TPC-H queries without modification.
Source: http://www.ibmbigdatahub.com/blog/big-deal-about-infosphere-biginsights-v30-big-sql
41.
BigSQL V3.0 – Architecture
42.
BigSQL V3.0 – Demo (small)
● 32 GB Data, ~650.000.000 rows (small, Innovation Center Zurich)
● 3 TB Data, ~60.937.500.000 rows (middle, Innovation Center Zurich)
● 0.7 PB Data, ~1.421875×10¹³ rows (large, Innovation Center Hursley)
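The row counts of the middle and large data sets appear to be linear extrapolations of the small one. A short check, assuming decimal units (3 TB = 3,000 GB, 0.7 PB = 700,000 GB) and a constant average row size; the names are illustrative:

```python
SMALL_GB = 32
SMALL_ROWS = 650_000_000


def scaled_rows(size_gb):
    """Row count if the data set grows linearly with its size in GB."""
    return SMALL_ROWS * size_gb // SMALL_GB


middle_rows = scaled_rows(3_000)     # 3 TB
large_rows = scaled_rows(700_000)    # 0.7 PB
bytes_per_row = SMALL_GB * 10**9 / SMALL_ROWS

print(middle_rows, large_rows, round(bytes_per_row, 1))
```

The average of roughly 49 bytes per row is plausible for a comma-separated record with six short fields.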
43.
BigSQL V3.0 – Demo (small)
CREATE EXTERNAL TABLE trace (
  hour integer,
  employeeid integer,
  departmentid integer,
  clientid integer,
  date string,
  timestamp string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/biadmin/32Gtest';

select count(hour), hour from trace group by hour order by hour;
-- This command runs on 32 GB / ~650.000.000 rows in HDFS
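For readers less familiar with SQL, the aggregation in this query can be mirrored on a small in-memory sample. The sketch below is a single-machine stand-in with invented sample rows, not what Big SQL executes; it returns (hour, count) pairs, whereas the query returns the count first.

```python
import csv
import io
from collections import Counter

# Column order taken from the CREATE EXTERNAL TABLE statement.
COLUMNS = ["hour", "employeeid", "departmentid", "clientid", "date", "timestamp"]


def count_by_hour(csv_text):
    """Equivalent of: select count(hour), hour ... group by hour order by hour."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS)
    counts = Counter(int(row["hour"]) for row in reader)
    return sorted(counts.items())


# Hypothetical rows in the comma-delimited layout of the table.
sample = (
    "9,1,10,100,2014-06-11,09:15:00\n"
    "9,2,10,101,2014-06-11,09:30:00\n"
    "8,3,11,102,2014-06-11,08:05:00\n"
)
result = count_by_hour(sample)
print(result)
```

On Hadoop the same grouping runs in parallel over the file blocks, with partial counts per hour merged at the end.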
44.
BigSQL V3.0 – Demo (small)
45.
BigSQL V3.0 – Demo (small)
46.
R on Hadoop
● IBM BigR (based on the SystemML research project at IBM Almaden)
● RHadoop
● RHIPE
● ...
47.
BigR (based on SystemML)
Example: Gaussian Non-negative Matrix Factorization

Java Implementation (>1500 lines of code)

package gnmf;

import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class MatrixGNMF {
    public static void main(String[] args) throws IOException, URISyntaxException {
        if(args.length < 10) {
            System.out.println("missing parameters");
            System.out.println("expected parameters: [directory of v] [directory of w] [directory of h] "
                + "[k] [num mappers] [num reducers] [replication] [working directory] "
                + "[final directory of w] [final directory of h]");
            System.exit(1);
        }
        String vDir = args[0];
        String wDir = args[1];
        String hDir = args[2];
        int k = Integer.parseInt(args[3]);
        int numMappers = Integer.parseInt(args[4]);
        int numReducers = Integer.parseInt(args[5]);
        int replication = Integer.parseInt(args[6]);
        String outputDir = args[7];
        String wFinalDir = args[8];
        String hFinalDir = args[9];
        JobConf mainJob = new JobConf(MatrixGNMF.class);
        String vDirectory;
        String wDirectory;
        String hDirectory;
        FileSystem.get(mainJob).delete(new Path(outputDir));
        vDirectory = vDir;
        hDirectory = hDir;
        wDirectory = wDir;
        String workingDirectory;
        String resultDirectoryX;
        String resultDirectoryY;
        long start = System.currentTimeMillis();
        System.gc();
        System.out.println("starting calculation");
        System.out.print("calculating X = WT * V... ");
        workingDirectory = UpdateWHStep1.runJob(numMappers, numReducers, replication,
            UpdateWHStep1.UPDATE_TYPE_H, vDirectory, wDirectory, outputDir, k);
        resultDirectoryX = UpdateWHStep2.runJob(numMappers, numReducers, replication,
            workingDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");
        System.out.print("calculating Y = WT * W * H... ");
        workingDirectory = UpdateWHStep3.runJob(numMappers, numReducers, replication,
            wDirectory, outputDir);
        resultDirectoryY = UpdateWHStep4.runJob(numMappers, replication, workingDirectory,
            UpdateWHStep4.UPDATE_TYPE_H, hDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");
        System.out.print("calculating H = H .* X ./ Y... ");
        workingDirectory = UpdateWHStep5.runJob(numMappers, numReducers, replication,
            hDirectory, resultDirectoryX, resultDirectoryY, hFinalDir, k);
        System.out.println("done");
        FileSystem.get(mainJob).delete(new Path(resultDirectoryX));
        FileSystem.get(mainJob).delete(new Path(resultDirectoryY));
        System.out.print("storing back H... ");
        FileSystem.get(mainJob).delete(new Path(hDirectory));
        hDirectory = workingDirectory;
        System.out.println("done");
        System.out.print("calculating X = V * HT... ");
        workingDirectory = UpdateWHStep1.runJob(numMappers, numReducers, replication,
            UpdateWHStep1.UPDATE_TYPE_W, vDirectory, hDirectory, outputDir, k);
        resultDirectoryX = UpdateWHStep2.runJob(numMappers, numReducers, replication,
            workingDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");
        System.out.print("calculating Y = W * H * HT... ");
        workingDirectory = UpdateWHStep3.runJob(numMappers, numReducers, replication,
            hDirectory, outputDir);
        resultDirectoryY = UpdateWHStep4.runJob(numMappers, replication, workingDirectory,
            UpdateWHStep4.UPDATE_TYPE_W, wDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");
        System.out.print("calculating W = W .* X ./ Y... ");
        workingDirectory = UpdateWHStep5.runJob(numMappers, numReducers, replication,
            wDirectory, resultDirectoryX, resultDirectoryY, wFinalDir, k);
        System.out.println("done");
        FileSystem.get(mainJob).delete(new Path(resultDirectoryX));
        FileSystem.get(mainJob).delete(new Path(resultDirectoryY));
        System.out.print("storing back W... ");
        FileSystem.get(mainJob).delete(new Path(wDirectory));
        // ... (truncated on the slide)

package gnmf;

import gnmf.io.MatrixObject;
import gnmf.io.MatrixVector;
import gnmf.io.TaggedIndex;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class UpdateWHStep2 {
    static class UpdateWHStep2Mapper extends MapReduceBase
            implements Mapper<TaggedIndex, MatrixVector, TaggedIndex, MatrixVector> {
        @Override
        public void map(TaggedIndex key, MatrixVector value,
                OutputCollector<TaggedIndex, MatrixVector> out, Reporter reporter)
                throws IOException {
            out.collect(key, value);
        }
    }

    static class UpdateWHStep2Reducer extends MapReduceBase
            implements Reducer<TaggedIndex, MatrixVector, TaggedIndex, MatrixObject> {
        @Override
        public void reduce(TaggedIndex key, Iterator<MatrixVector> values,
                OutputCollector<TaggedIndex, MatrixObject> out, Reporter reporter)
                throws IOException {
            MatrixVector result = null;
            while(values.hasNext()) {
                MatrixVector current = values.next();
                if(result == null) {
                    result = current.getCopy();
                } else {
                    result.addVector(current);
                }
            }
            if(result != null) {
                out.collect(new TaggedIndex(key.getIndex(), TaggedIndex.TYPE_VECTOR_X),
                    new MatrixObject(result));
            }
        }
    }

    public static String runJob(int numMappers, int numReducers, int replication,
            String inputDir, String outputDir) throws IOException {
        String workingDirectory = outputDir + System.currentTimeMillis() + "-UpdateWHStep2/";
        JobConf job = new JobConf(UpdateWHStep2.class);
        job.setJobName("MatrixGNMFUpdateWHStep2");
        job.setInputFormat(SequenceFileInputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(inputDir));
        // ... (truncated on the slide)

package gnmf;

import gnmf.io.MatrixCell;
import gnmf.io.MatrixFormats;
import gnmf.io.MatrixObject;
import gnmf.io.MatrixVector;
import gnmf.io.TaggedIndex;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class UpdateWHStep1 {
    public static final int UPDATE_TYPE_H = 0;
    public static final int UPDATE_TYPE_W = 1;

    static class UpdateWHStep1Mapper extends MapReduceBase
            implements Mapper<TaggedIndex, MatrixObject, TaggedIndex, MatrixObject> {
        private int updateType;

        @Override
        public void map(TaggedIndex key, MatrixObject value,
                OutputCollector<TaggedIndex, MatrixObject> out, Reporter reporter)
                throws IOException {
            if(updateType == UPDATE_TYPE_W && key.getType() == TaggedIndex.TYPE_CELL) {
                MatrixCell current = (MatrixCell) value.getObject();
                out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_CELL),
                    new MatrixObject(new MatrixCell(key.getIndex(), current.getValue())));
            } else {
                out.collect(key, value);
            }
        }

        @Override
        public void configure(JobConf job) {
            updateType = job.getInt("gnmf.updateType", 0);
        }
    }

    static class UpdateWHStep1Reducer extends MapReduceBase
            implements Reducer<TaggedIndex, MatrixObject, TaggedIndex, MatrixVector> {
        private double[] baseVector = null;
        private int vectorSizeK;

        @Override
        public void reduce(TaggedIndex key, Iterator<MatrixObject> values,
                OutputCollector<TaggedIndex, MatrixVector> out, Reporter reporter)
                throws IOException {
            if(key.getType() == TaggedIndex.TYPE_VECTOR) {
                if(!values.hasNext())
                    throw new RuntimeException("expected vector");
                MatrixFormats current = values.next().getObject();
                if(!(current instanceof MatrixVector))
                    throw new RuntimeException("expected vector");
                baseVector = ((MatrixVector) current).getValues();
            } else {
                while(values.hasNext()) {
                    MatrixCell current = (MatrixCell) values.next().getObject();
                    if(baseVector == null) {
                        out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_VECTOR),
                            new MatrixVector(vectorSizeK));
                    } else {
                        if(baseVector.length == 0)
                            throw new RuntimeException("base vector is corrupted");
                        MatrixVector resultingVector = new MatrixVector(baseVector);
                        resultingVector.multiplyWithScalar(current.getValue());
                        if(resultingVector.getValues().length == 0)
                            throw new RuntimeException("multiplying with scalar failed");
                        out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_VECTOR),
                            resultingVector);
                    }
                }
                baseVector = null;
            }
        }

        @Override
        public void configure(JobConf job) {
            vectorSizeK = job.getInt("dml.matrix.gnmf.k", 0);
            // ... (truncated on the slide)

Equivalent SystemML Implementation (10 lines of code)
Experimenting with multiple variants!

W = W*max(V%*%t(H) - alphaW JW, 0)/(W%*%H%*%t(H))
H = H*max(t(W)%*%V - alphaH JH, 0)/(t(W)%*%W%*%H)

W = W*((S*V)%*%t(H))/((S*(W%*%H))%*%t(H))
H = H*(t(W)%*%(S*V))/(t(W)%*%(S*(W%*%H)))

W = W*(V/(W%*%H) %*% t(H))/(E%*%t(H))
H = H*(t(W)%*%(V/(W%*%H)))/(t(W)%*%E)
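The update order driven by the Java `main` method (first H = H .* X ./ Y with X = WT * V and Y = WT * W * H, then the analogous update for W) can be sketched on a single machine. This is a minimal sketch in plain Python with nested lists, not the SystemML or MapReduce implementation; the helper names, the small `EPS` guard against division by zero, and the test matrix are assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
EPS = 1e-12  # assumption: small constant that keeps denominators non-zero

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a %*% b."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def scale(m: Matrix, num: Matrix, den: Matrix) -> Matrix:
    """Element-wise m .* num ./ den."""
    return [[x * n / (d + EPS) for x, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(m, num, den)]

def gnmf_step(v: Matrix, w: Matrix, h: Matrix) -> Tuple[Matrix, Matrix]:
    """One multiplicative update of H, then of W (same order as the Java driver)."""
    wt = transpose(w)
    h = scale(h, matmul(wt, v), matmul(matmul(wt, w), h))   # H = H .* (WT V) ./ (WT W H)
    ht = transpose(h)
    w = scale(w, matmul(v, ht), matmul(matmul(w, h), ht))   # W = W .* (V HT) ./ (W H HT)
    return w, h

def squared_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

# Made-up non-negative 4x3 matrix of rank 2, factorized with k = 2.
v = matmul([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]],
           [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
rng = random.Random(0)
k = 2
w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(len(v))]
h = [[rng.uniform(0.1, 1.0) for _ in range(len(v[0]))] for _ in range(k)]

errors = [squared_error(v, w, h)]
for _ in range(200):
    w, h = gnmf_step(v, w, h)
    errors.append(squared_error(v, w, h))
```

Each update only multiplies by ratios of non-negative terms, so W and H stay non-negative and the squared reconstruction error does not increase from one iteration to the next.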
48.
BigR (based on SystemML)
SystemML compiles hybrid runtime plans ranging from in-memory, single machine (CP) to large-scale, cluster (MR) compute
● Challenge
  ● Guaranteed hard memory constraints (budget of JVM size)
  ● for arbitrarily complex ML programs
● Key Technical Innovations
  ● CP & MR Runtime: Single machine & MR operations, integrated runtime
  ● Caching: Reuse and eviction of in-memory objects
  ● Cost Model: Accurate time and worst-case memory estimates
  ● Optimizer: Cost-based runtime plan generation
  ● Dynamic Recompiler: Re-optimization for initial unknowns
[Chart: runtime over data size for CP, CP/MR, and MR plans]
● Hybrid Plans: Gradually exploit MR parallelism
  ● High performance computing for small data sizes.
  ● Scalable computing for large data sizes.
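The decision between CP and MR can be illustrated with a toy cost model. This is only a sketch of the idea of a worst-case memory estimate checked against a fixed budget; it is not SystemML's optimizer, and `MatrixMeta`, `choose_backend`, `plan_matmul`, and the 2 GiB budget are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import List, Optional

BYTES_PER_DOUBLE = 8

@dataclass(frozen=True)
class MatrixMeta:
    """Matrix dimensions; None means the size is not yet known at compile time."""
    rows: Optional[int]
    cols: Optional[int]

    def dense_bytes(self) -> Optional[int]:
        """Worst-case (dense) memory estimate, or None if a dimension is unknown."""
        if self.rows is None or self.cols is None:
            return None
        return self.rows * self.cols * BYTES_PER_DOUBLE

def choose_backend(operands: List[MatrixMeta], budget_bytes: int) -> str:
    """Pick in-memory (CP) only if all operands provably fit into the budget."""
    sizes = [m.dense_bytes() for m in operands]
    if any(s is None for s in sizes):
        return "MR"  # conservative choice until sizes are known
    return "CP" if sum(sizes) <= budget_bytes else "MR"

def plan_matmul(a: MatrixMeta, b: MatrixMeta, budget_bytes: int) -> str:
    """Plan a %*% b: both inputs and the output must fit for a CP plan."""
    out = MatrixMeta(a.rows, b.cols)
    return choose_backend([a, b, out], budget_bytes)

budget = 2 * 1024 ** 3  # assumed memory budget of 2 GiB

small_plan = plan_matmul(MatrixMeta(1000, 100), MatrixMeta(100, 1000), budget)
large_plan = plan_matmul(MatrixMeta(10_000_000, 1000), MatrixMeta(1000, 10), budget)
unknown_plan = plan_matmul(MatrixMeta(None, 100), MatrixMeta(100, 10), budget)
# Once the unknown dimension has been observed at runtime, planning is repeated.
recompiled_plan = plan_matmul(MatrixMeta(5000, 100), MatrixMeta(100, 10), budget)
```

The last two calls mimic the recompilation idea: an operation with an unknown input size is first planned conservatively and then re-planned once the size is known.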
49.
BigR Architecture
[Architecture diagram: R Clients, IBM R Packages, SystemML Statistics Engine, Embedded R Execution, Data Sources]
● Pull data (summaries) to R client
● Or, push R functions right on the data
50.
BigR Demo (small)
● 32 GB Data, ~650,000,000 rows (small, Innovation Center Zurich)
● 3 TB Data, ~60,937,500,000 rows (middle, Innovation Center Zurich)
● 0.7 PB Data, ~1.421875×10¹³ rows (large, Innovation Center Hursley)
51.
BigR Demo (small)
library(bigr)
bigr.connect(host="bigdata", port=7052, database="default",
             user="biadmin", password="xxx")
is.bigr.connected()
tbr <- bigr.frame(dataSource="DEL",
                  coltypes=c("numeric", "numeric", "numeric", "numeric", "character", "character"),
                  dataPath="/user/biadmin/32Gtest",
                  delimiter=",", header=FALSE, useMapReduce=TRUE)
h <- bigr.histogram.stats(tbr$V1, nbins=24)
52.
BigR Demo (small)
  class bins   counts centroids
1   ALL    0 18289280  1.583333
2   ALL    1    15360  2.750000
3   ALL    2    55040  3.916667
4   ALL    3   189440  5.083333
5   ALL    4   579840  6.250000
6   ALL    5  5292160  7.416667
7   ALL    6  8074880  8.583333
8   ALL    7 15653120  9.750000
...
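The evenly spaced centroids in the output above suggest equal-width bins whose centroid is the bin midpoint; that reading is an assumption, not documented BigR behavior. Under that assumption, the three columns can be reproduced locally as follows (the function name `histogram_stats` and the sample data are made up).

```python
from typing import Iterable, List, Tuple

def histogram_stats(values: Iterable[float], nbins: int) -> List[Tuple[int, int, float]]:
    """Return (bin index, count, centroid) for nbins equal-width bins.

    Sketch only: the data is materialized in a list, whereas a cluster
    implementation would compute min/max and the counts in distributed passes.
    """
    data = list(values)
    lo, hi = min(data), max(data)
    width = (hi - lo) / nbins
    if width == 0:
        width = 1.0  # all values identical: everything falls into bin 0
    counts = [0] * nbins
    for x in data:
        # The maximum value belongs to the last bin, not to a bin past the end.
        idx = min(int((x - lo) / width), nbins - 1)
        counts[idx] += 1
    return [(b, counts[b], lo + (b + 0.5) * width) for b in range(nbins)]

sample_hours = [1, 1, 2, 3, 5, 5, 5]
stats = histogram_stats(sample_hours, nbins=4)
for bin_index, count, centroid in stats:
    print(bin_index, count, centroid)
```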
53.
BigR Demo (small)
54.
BigR Demo (small)
jpeg('hist.jpg')
# This command runs on 32 GB / ~650,000,000 rows in HDFS
bigr.histogram(tbr$V1, nbins=24)
dev.off()
55.
BigR Demo (small)
Sampling, Resampling, Bootstrapping vs Whole Dataset Processing
What is your experience?
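For readers who have not used the resampling side of this comparison, a percentile bootstrap on a small sample looks roughly like this. It is a minimal sketch with made-up data; `bootstrap_mean_ci` and its parameters are hypothetical names, and the "whole dataset" is simulated with a list.

```python
import random
from typing import List, Tuple

def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)

def bootstrap_mean_ci(sample: List[float], n_resamples: int = 1000,
                      alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement and record the mean of every resample.
    means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Simulated "whole dataset": hour-of-day values (made up for illustration).
rng = random.Random(42)
population = [float(rng.randint(0, 23)) for _ in range(100_000)]

# Whole dataset processing: one exact pass over everything.
exact_mean = mean(population)

# Sampling plus bootstrapping: 1,000 rows and an interval estimate.
sample = rng.sample(population, 1000)
sample_mean = mean(sample)
ci_lower, ci_upper = bootstrap_mean_ci(sample)
print(exact_mean, sample_mean, (ci_lower, ci_upper))
```

The trade-off is visible in the output: the sample gives an interval instead of the exact value, but it touches 1% of the rows.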
56.
Python on Hadoop
57.
SPSS on Hadoop
58.
SPSS on Hadoop
59.
BigSheets Demo (small)
● 32 GB Data, ~650,000,000 rows (small, Innovation Center Zurich)
● 3 TB Data, ~60,937,500,000 rows (middle, Innovation Center Zurich)
● 0.7 PB Data, ~1.421875×10¹³ rows (large, Innovation Center Hursley)
60.
BigSheets Demo (small)
61.
BigSheets Demo (small)
This command runs on 32 GB / ~650,000,000 rows in HDFS
62.
BigSheets Demo (small)
63.
Text Extraction (SystemT, AQL)
64.
Text Extraction (SystemT, AQL)
65.
If this is not enough? → BigData AppStore
66.
BigData AppStore, Eclipse Tooling
● Write your apps in
  ● Java (MapReduce)
  ● PigLatin, Jaql
  ● BigSQL/Hive/BigR
● Deploy them to BigInsights via Eclipse
● Automatically
  ● Schedule
  ● Update
    ● HDFS files
    ● BigSQL tables
    ● BigSheets collections
67.
Questions?
http://www.ibm.com/software/data/bigdata/
Twitter: @RomeoKienzler, @IBMEcosystem_DE, @IBM_ISV_Alps
68.
DFT/Audio Analytics (as promised)
library(tuneR)

a <- readWave("whitenoisesine.wav")
f <- fft(a@left)
jpeg('rplot_wnsine.jpg')
plot(Re(f)^2)
dev.off()

a <- readWave("whitenoise.wav")
f <- fft(a@left)
jpeg('rplot_wn.jpg')
plot(Re(f)^2)
dev.off()

a <- readWave("whitenoisesine.wav")
brv <- as.bigr.vector(a@left)
al <- as.list(a@left)
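The experiment above (a sine hidden in white noise versus white noise alone) can be reproduced without audio files. This is a minimal sketch using a direct O(n²) DFT in plain Python on synthetic signals; it uses the squared magnitude |F|² rather than `Re(f)^2` as on the slide, and the signal length, amplitude, and frequency bin are assumptions.

```python
import cmath
import math
import random
from typing import List

def dft(signal: List[float]) -> List[complex]:
    """Direct discrete Fourier transform (O(n^2), fine for short signals)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def power_spectrum(signal: List[float]) -> List[float]:
    """Squared magnitude of every DFT coefficient."""
    return [abs(c) ** 2 for c in dft(signal)]

n = 256
sine_bin = 16  # assumed: the sine completes 16 cycles within the window
rng = random.Random(1)

white_noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
white_noise_sine = [
    2.0 * math.sin(2 * math.pi * sine_bin * t / n) + white_noise[t] for t in range(n)
]

spectrum_noise = power_spectrum(white_noise)
spectrum_sine = power_spectrum(white_noise_sine)

# Only the first half of the spectrum is inspected (the second half mirrors it
# for real-valued input); bin 0 is skipped because it only holds the mean.
peak_bin = max(range(1, n // 2), key=lambda k: spectrum_sine[k])
print(peak_bin)
```

The sine shows up as a single dominant bin in `spectrum_sine`, while `spectrum_noise` stays comparatively flat.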
69.
Backup Slides
84.
Map-Reduce
Source: http://www.cloudcomputingpatterns.org/Map_Reduce
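As a reminder of the pattern, the per-hour count from the demos can be written as explicit map, shuffle, and reduce phases. This is a single-process sketch in plain Python, not Hadoop code; the function names and the sample records are made up.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(record: str) -> Iterator[Tuple[int, int]]:
    """Map: emit (hour, 1) for one CSV record whose first column is the hour."""
    hour = int(record.split(",", 1)[0])
    yield hour, 1

def shuffle(pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: int, values: List[int]) -> Tuple[int, int]:
    """Reduce: sum the values of one key."""
    return key, sum(values)

def map_reduce(records: Iterable[str]) -> Dict[int, int]:
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in sorted(shuffle(mapped).items()))

records = [
    "9,1,10,100,2014-01-01,09:15:00",
    "9,2,10,101,2014-01-01,09:40:00",
    "10,1,10,100,2014-01-01,10:05:00",
]
counts_per_hour = map_reduce(records)
print(counts_per_hour)
```

On a cluster the map calls run in parallel on different data blocks and the shuffle moves data across the network; here all three phases run in one process.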