[WebMuses] Big Data for the Confused
Presented by @przemur
Presentation plan
• choosing replication parameters for a Hadoop node
• Pig or Hive for ETL?
• building your own cluster or using the cloud?
The real plan for the meeting
• What is “Big Data”?
• Robots that write MapReduce jobs
• Invited guests: Harimata, GE Healthcare
• Gnomes and Data Science
Big Data means “a collection of data sets so large and
complex that it becomes difficult to process using on-hand 
database management tools or traditional data processing 
applications.” (Wikipedia) 
http://www.winshuttle.com/big-data-timeline/
http://plyojump.com/classes/mainframe_era.php
http://escience.washington.edu/content/hyak-0
[Diagram: a single Data / Computer / Program setup next to many Data items with several Computers and Programs]
[Diagram: five Computers, each with its own Data and Program, plus JobTracker, NameNode, …]
http://www.tik.ee.ethz.ch/~ddosvax/cluster/
2005
[Diagram: five Computers, each with its own Data and Program, plus ResourceManager, NameNode, …; HDFS]
MapReduce
[Diagram: Map, Shuffle, Reduce. Two Computers run the Program over their Data and produce the Map phase results; two further Computers turn the Map phase results into the final results.]
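The three phases named on this slide can be sketched in plain Java without a cluster. This is only an illustration under stated assumptions: the class `MapShuffleReduceSketch` and its methods `mapPhase`, `shufflePhase`, `reducePhase` and `wordCount` are hypothetical names, everything runs in one process, and none of it is Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapShuffleReduceSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> mapPhase(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: collect all values emitted for the same key.
    public static Map<String, List<Integer>> shufflePhase(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the grouped values of one key into a single sum.
    public static int reducePhase(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in order over all input lines.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(mapPhase(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shufflePhase(mapped).forEach((word, ones) -> result.put(word, reducePhase(ones)));
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, science=1}
        System.out.println(wordCount(List.of("big data big", "data science")));
    }
}
```

On a real cluster the map and reduce steps would run on different machines and the shuffle would move data over the network; here the split into three methods only makes the data flow visible.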
…
public class WordCount {

    // Map phase: emit (word, 1) for every token of the input line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
…
-- Pig Latin
input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
filtered_words = FILTER words BY word MATCHES '\\w+';
word_groups = GROUP filtered_words BY word;
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';

-- HiveQL
CREATE TABLE input (line STRING);
LOAD DATA LOCAL INPATH 'input.tsv' OVERWRITE INTO TABLE input;
SELECT word, COUNT(*) FROM input LATERAL VIEW
    explode(split(line, ' ')) lTable AS word GROUP BY word
ORDER BY word;
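For readers who do not have Pig or Hive at hand, the same pipeline (tokenize, keep only word-like tokens, group, count, order by count) can be approximated on a single machine with Java streams. This is a rough sketch, not an equivalent of either script: the class `WordCountPipelineSketch` and the method `countWords` are hypothetical names, tokens are split on whitespace only (a simplification of what Pig's `TOKENIZE` does), and ties are ordered alphabetically just to keep the result deterministic.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountPipelineSketch {

    // Returns (word, count) entries ordered by count, highest first;
    // equal counts are ordered alphabetically by word.
    public static List<Map.Entry<String, Long>> countWords(List<String> lines) {
        Map<String, Long> counts = lines.stream()
                // split every line into tokens and flatten them into one stream
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // keep only tokens made entirely of word characters
                .filter(word -> word.matches("\\w+"))
                // group identical words and count each group
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "quick!" is dropped by the filter because of the exclamation mark
        for (Map.Entry<String, Long> entry : countWords(List.of("to be or not to be", "be quick!"))) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

The sketch orders by count like the Pig script does; the Hive query on the slide orders by word instead.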
[Chart: vertical axis from 0 to 200; horizontal axis April, May, June, July]
2003
Data Science vs Big Data ???
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Where to find more information?
• http://www.meetup.com/datakrk/ 
• https://github.com/onurakpolat/awesome-bigdata 
• https://class.coursera.org/datasci-001/lecture 
• https://www.codeschool.com/courses/try-r 
• …
Special thanks to:
