2. I am Simon Su
var simon = { say: console.log };
simon.aboutme = 'http://about.me/peihsinsu';
simon.nodejs = 'http://opennodes.arecord.us';
simon.googleshare = 'http://gappsnews.blogspot.tw';
simon.nodejsblog = 'http://nodejs-in-example.blogspot.tw';
simon.blog = 'http://peihsinsu.blogspot.com';
simon.slideshare = 'http://slideshare.net/peihsinsu/';
simon.email = 'simonsu.mail@gmail.com';
simon.say('Good luck to everybody!');
3. I am Sunny Hu
var sunny = { say: console.log };
sunny.aboutme = 'https://plus.google.com/u/0/+sunnyHU/posts';
sunny.email = 'sunnyhu@mitac.com.tw';
sunny.language = ['Java', '.NET', 'NodeJS', 'SQL'];
sunny.skill = ['Project management', 'System Analysis',
  'System design', 'Car ho lan'];
sunny.say('Writing code is too dreary, so keep your mood sunny');
10. What are the components of Hadoop...
● Strategy: Your idea for filtering information from the given datasets
● MapReduce: Mass computing power to load and process the requirements in parallel
● HDFS: Persistent storage for parallel access, preferably with good performance...
11. You have better choices in the cloud...
● Strategy: Nothing can replace a good idea..., but fast...
● MapReduce: Cloud machines with unlimited resources, preferably with lower and scalable pricing...
● HDFS: Object storage services, such as Google Cloud Storage, AWS S3...
38. Workflow...
1. Dump sample data from [publicdata:samples.shakespeare]
2. Run MapReduce to count how often each word appears
3. Write the result to a specific BigQuery table
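The map and reduce steps of this workflow can be sketched in plain Java. This is only an illustration under stated assumptions: it does not call Hadoop or BigQuery, the class name `WordCountSketch` and the sample rows are made up, and it assumes each input row carries a word together with a count for one corpus.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of the word-count workflow (no Hadoop, no BigQuery).
class WordCountSketch {

    // Map phase: turn one input row (word, count) into a key/value pair.
    static Map.Entry<String, Long> map(String word, long wordCount) {
        return new SimpleEntry<>(word, wordCount);
    }

    // Reduce phase: sum all values that were emitted for the same key.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the dumped sample data.
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(map("the", 3));
        pairs.add(map("king", 1));
        pairs.add(map("the", 2));
        System.out.println(reduce(pairs)); // prints {king=1, the=5}
    }
}
```

In a real job the reduced totals would then be written to the output table instead of printed.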
39. Look into source code...
● BigQueryInputFormat class
● Input parameters
● Mapper
● BigQueryOutputFormat class
● Output parameters
● Reducer
40. BigQueryInputFormat
● Using a user-specified query to select the appropriate
BigQuery objects.
● Splitting the results of the query evenly among the Hadoop
nodes.
● Parsing the splits into Java objects to pass to the mapper
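To make "splitting evenly" concrete, here is a minimal sketch of one way to divide query results among nodes. It is not the connector's actual logic, which the slides do not show; the class `EvenSplitSketch` and its `split` method are hypothetical, and rows are simply kept in order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: divide rows into contiguous splits, one per node.
class EvenSplitSketch {

    // Returns `nodes` splits whose sizes differ by at most one row.
    static <T> List<List<T>> split(List<T> rows, int nodes) {
        if (nodes <= 0) {
            throw new IllegalArgumentException("nodes must be positive");
        }
        List<List<T>> splits = new ArrayList<>();
        int base = rows.size() / nodes;   // rows every split receives
        int extra = rows.size() % nodes;  // the first `extra` splits get one more
        int start = 0;
        for (int i = 0; i < nodes; i++) {
            int size = base + (i < extra ? 1 : 0);
            splits.add(new ArrayList<>(rows.subList(start, start + size)));
            start += size;
        }
        return splits;
    }
}
```

For example, 7 rows over 3 nodes would give splits of 3, 2 and 2 rows.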
41. Input parameters
● Project Id: GCP project id, e.g. hadoop-conf-2014
● Input Table Id: [optional projectId]:[datasetId].[tableId]
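The table id format can be illustrated with a small parser. This is a sketch, not part of the connector: `TableIdSketch` is a hypothetical helper, and it assumes the last `:` separates the project id and that the first `.` after it separates dataset and table.

```java
// Hypothetical helper that splits "[optional projectId]:[datasetId].[tableId]".
class TableIdSketch {
    final String projectId; // null when the project id is omitted
    final String datasetId;
    final String tableId;

    TableIdSketch(String projectId, String datasetId, String tableId) {
        this.projectId = projectId;
        this.datasetId = datasetId;
        this.tableId = tableId;
    }

    static TableIdSketch parse(String qualified) {
        String projectId = null;
        String rest = qualified;
        // Assumption: the last ':' marks the end of the project id.
        int colon = qualified.lastIndexOf(':');
        if (colon == 0) {
            throw new IllegalArgumentException("Empty project id: " + qualified);
        }
        if (colon > 0) {
            projectId = qualified.substring(0, colon);
            rest = qualified.substring(colon + 1);
        }
        // The dataset id and table id are separated by the first '.'.
        int dot = rest.indexOf('.');
        if (dot <= 0 || dot == rest.length() - 1) {
            throw new IllegalArgumentException("Expected datasetId.tableId: " + qualified);
        }
        return new TableIdSketch(projectId, rest.substring(0, dot), rest.substring(dot + 1));
    }
}
```

Parsing `publicdata:samples.shakespeare` this way yields project `publicdata`, dataset `samples` and table `shakespeare`.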
43. BigQueryOutputFormat class
● Provides Hadoop with the ability to write JsonObject
values directly into a BigQuery table
● An extension of the Hadoop OutputFormat class
44. Output parameters
● Project Id: GCP project id, e.g. hadoop-conf-2014
● Output Table Id: [optional projectId]:[datasetId].[tableId]
● Output Table Schema: [{'name': 'Name','type': 'STRING'},
{'name': 'Number','type': 'INTEGER'}]
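The schema string in the last bullet can be assembled from field names and types. A minimal sketch, assuming the schema is passed as text in exactly the form shown on the slide; `OutputSchemaSketch` and `toSchema` are hypothetical names, and no quoting or escaping of field names is attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that renders (name -> type) pairs as a schema string.
class OutputSchemaSketch {

    // Field order follows the map's iteration order, so pass a LinkedHashMap.
    static String toSchema(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joiner.add("{'name': '" + field.getKey() + "','type': '" + field.getValue() + "'}");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Name", "STRING");
        fields.put("Number", "INTEGER");
        System.out.println(toSchema(fields));
    }
}
```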