1. Bigtop Working Group
Elance 6/27/2013
DC Absolute SW: Intro to BWG, intro to team
Roman Cloudera: Bigtop Creator
Marshall/Ryan Palomino Labs: BenchPress
2. Thank You Sponsors for the Donations!
● Elance, post your Hadoop Jobs here!! Meeting
Space/food
● Docusign/SF for meeting space/food
● Cloudera
● DataPipe/$500 credit, free time for people doing
POCs, Gary?
● Safari Online Books/ 30 day donation
● Amazon AWS, $100/credits
3. Poll
● How many are managing POCs?
● How many are looking to do a career change
into Hadoop*?
4. Intro
● Technical Architect @ABSW, 2 POCs, HBase &
Storm; Mongo doesn't count
● POC example, overly simplistic example:
– Write performance: Incoming data, save to disk
– Read performance: Read time, all table scans are
awful for browser interaction (reporting)
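A minimal sketch of the read-side point above, in plain Python rather than HBase. The in-memory table, the index, and all function names are assumptions for illustration and are not part of the original POC. It contrasts a table scan with a keyed lookup and gives a helper to time each one:

```python
import time

def write_rows(n):
    """Simulate the write path: store n incoming records.

    Returns the raw table (a list) and a keyed index (a dict)
    that is maintained at write time.
    """
    table = []
    index = {}
    for i in range(n):
        row = {"id": i, "value": i * 2}
        table.append(row)
        index[row["id"]] = row
    return table, index

def table_scan(table, row_id):
    """Read by scanning every row until the id matches (O(n))."""
    for row in table:
        if row["id"] == row_id:
            return row
    return None

def keyed_lookup(index, row_id):
    """Read through the index built at write time (O(1) on average)."""
    return index.get(row_id)

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    table, index = write_rows(200_000)
    _, scan_s = timed(table_scan, table, 199_999)
    _, key_s = timed(keyed_lookup, index, 199_999)
    print(f"scan: {scan_s:.6f}s  keyed: {key_s:.6f}s")
```

Both reads return the same row; only the access path differs, which is what a read-performance POC would measure against the latency a browser report can tolerate.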
5. POCs
● Proof of Concept, to verify scope, architecture
and cost
● A BigData Stack Implementation consists of:
1) DevOps
2) Application (e.g. Astyanax)
3) Internals: Cloudera/MapR/HW. We don't cover
internals. Take cs346 Please!!! Github/redbase
– We cover 1) and some of 2) For a POC?
6. Small vs. Large POCs
● GM >>$1M, $5-$10M hire Cloudera. World
experts who cover 1), 2) & 3)
● At $500k-$1M you get 1 year, and most fail
– A high level person ~200k/year who doesn't code
– You as a newly hired tech lead or architect
– 1-2+ programmers who know nothing about
Hadoop* but know the business processes
● What happens after this?
7. Scope creep; HLP adds components; defines effort
● Hadoop alone is not a fit; a VC; >1 year; fails or
becomes a zombie project. The HLP (high-level person)
extrapolates from downloading and running wc, and
learns from web posts and salespeople
[Diagram: HDFS/Hadoop, HBase, Storm]
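For context, the wc mentioned on this slide is word count, the standard first Hadoop example. A rough sketch of its map, shuffle, and reduce steps in plain Python (no Hadoop involved; all function names are hypothetical) suggests how little it exercises compared with a real application:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted counts by word."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(lines):
    """Run the three steps in sequence on an iterable of lines."""
    return reduce_phase(shuffle(map_phase(lines)))

if __name__ == "__main__":
    print(word_count(["big data big plans", "Big budget"]))
```

There is no schema design, write path, or failure handling here, which is why extrapolating project effort from a successful wc run is risky.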
8. HLP gets info from BigData vendors
● Argument between Cassandra/Hadoop centers on
SPOF, building an application is difficult!
– Cassandra vs. HBase; nobody talks about Astyanax
Lethal underspecification of 2).
● See this in job postings also. J2EE != scalable
distributed programming
● Go to Palomino Labs for 2). Have to understand
Zookeeper programming first! PL can do 1) and 3)
● Java Concurrency->Zookeeper->Scalable Dist Apps
9. HLP && Machine Learning
● BigData == Machine Learning. Find someone who
knows R/Mahout. The same job listing w/J2EE
● R & Mahout aren't used in production.
● For this to work you have to be a GOOD server
programmer first. Not someone who downloads Tomcat
and figures out how to stub out REST calls.
● Separate track TBD w/ Charles Nainen. Need sample
POC w/ sponsoring vendor!
10. What to do?
● Contribute to Bigtop. Why?
– Teaches you the internals of
Bigtop/HBase/Hadoop/Flume and gets you 1) and API
practice for 2)
– Add new components to Bigtop
● Hands on experience w/new components
● Contribute to BenchPress to get to 2) as a first
step. Gets you ZK. Still a long way to go
● We don't cover 3). Not on the road map
11. Logistics
● Max 20 ppl. Maple Tree Inn
● Charge $100->$200/month for the room rental.
● Meet every 2 weeks to do demos.
● Will cover Bigtop & BenchPress; Storm in a future
session
● First session is 3 meetings only. We reserve the
right to stop these if we run out of time
12. Not a class
● Cloudera/HW/MapR have classes, $800-$1k/day
for 3 days to 1 week
● They have to charge this to pay someone's
salary to create material for you.
● We don't replace this. I took these classes. Not
going to steal their material. You will have to
read and write test code.
● Same information as a new Cloudera employee
● This works if the consultants get new business
and we get open source code contributions.
13. Group POC?
● Please talk to Roman/Bruno/Ryan/Charles if you have funding
which gets them new business
● POC mentors
– Ryan/Marshall: Application on Hadoop*. 9 people. Any POC
– Ron: Chef/Bigtop group project; MongoDB in production
– Roman/Bruno: Hadoop* POC
– DC: Hadoop/Storm
– Charles Nainen: ML POC. Need data/problem description/$
● We can do a group POC. Talk to a POC mentor
14. Group Sign Up Sheets
● Group POC
● Form groups of 2+; $500k-$1M POCs are
good business; have to do it as a team.
● Skill shortage; not a budget issue
● Working Group Session Signup for Safari
subscription and AWS codes.
– June 30, July 14th, July 28th.