Disruptive Applications with Hadoop__HadoopSummit2010
- 2. IBM Software for a Smarter Planet
Emerging Technology - What Do We Do?
Innovation/collaborations in technologies
that we hope garner broad industry
adoption in a timeframe of 12-18 months
Our technology initiatives are refined based
on the marketplace & evolution of web
technologies
Voice of the Customer – early & direct
customer engagements (POCs) to iterate
on both the technology and the business
value
IBM Confidential Chart 2 © 2009 IBM Corporation
- 3. IBM Software for a Smarter Planet
Evolving Emerging Technology Focus Areas
Big Data Analytics for Business
Professionals - DIY Analytic Tool &
middleware - enabling massive amounts
of data to be analyzed for actionable
insights
Web Browser Application Platform -
pushing the envelope of next
generation RIA applications & tooling
delivered with web browser reach &
economics
Mobile - next generation Enterprise-
Consumer applications & architecture
- 6. IBM Software for a Smarter Planet
New Intelligence
DIY Analytics
Making Hadoop accessible
to the business professionals
- 7. IBM Software for a Smarter Planet
New Intelligence - New Class of Application On Horizon
Hear business users asking for the
ability to directly manipulate, analyze &
remix massive data sources & services
• LOB “… Google whetted my appetite...I
want more customizable analytics with
me in the driver's seat…”
Leveraging easy-to-use, rich data
manipulation metaphors like
spreadsheets, etc.
Rich visualizations to quickly identify
insights
[Diagram labels: Rich, Spectrum, DIY Analytic, Applications, Emerging]
- 8. IBM Software for a Smarter Planet
IBM Emerging Technology Project: BigSheets
What is it?
An insight engine for enabling ad-hoc business insights for
business users - at web scale
How does it work?
Discovery Process
1. Point BigSheets to data sources of interest
• unstructured web data, feeds, XML, etc.
2. transform data into a form that can be analyzed
• Unstructured data becomes semi-structured data
• Example: name: Rod Smith, employer: IBM, state: GA
• Apply analytics - enriching the data
3. “what if tooling” - browser-based visual front end - spreadsheet
metaphor to create worksheets for exploring/visualizing the big data
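Step 2 of the discovery process can be sketched as follows. This is a minimal illustration only, not how BigSheets performs the transform: the input sentence, the regular expression, and the function name extract_record are all assumptions made for the example. Only the output fields (name, employer, state) come from the slide.

```python
import re

# Hypothetical pattern for sentences shaped like "<Name> works at <Employer> in <ST>".
# Real unstructured web data would need far more robust extraction.
PATTERN = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+) works at "
    r"(?P<employer>[A-Z][A-Za-z]+) in (?P<state>[A-Z]{2})"
)

def extract_record(text):
    """Turn one unstructured sentence into a semi-structured record, or None."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None

# Invented sample input, chosen to reproduce the record shown on the slide.
record = extract_record("Speaker bio: Rod Smith works at IBM in GA.")
print(record)
```

Once data is in this key/value form, it can be loaded into a worksheet column by column, which is what makes the spreadsheet metaphor in step 3 workable.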
What’s different?
• Unlocking insights embedded in unstructured data
• Analyzing data previously unavailable to analyze
- 9. IBM Software for a Smarter Planet
BigSheets: Framework on Hadoop
Expanding upon the Hadoop stack
• Visual tooling builds extensively on Pig
BigSheets Architecture Characteristics:
• Extensible via UDFs
• REST API for customer choice of analytic service/engine
• REST API for choice of visualization packages
• Export content as feeds, XML, etc.
• ...more to come
- 10. IBM Software for a Smarter Planet
BigSheets in action
Crowd sourcing - Nikon: what are folks on
Twitter saying about our cameras - by model
Input
• Gather Daily Tweets for May
• 64 million tweets per day
• ~210 terabytes a month
Map
• Split data across cluster
• Emit tweets mentioning Nikon cameras (key=Nikon D90, …)
Reduce
• Aggregate tweets for each Nikon model
• D90: 300 tweets
• D3000: 68 tweets
Output
• Perform sentiment analysis
• “..Wow, Great, Incredible…”
• “..Lousy, sucks, ... “
• “..no RAW support...”
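The map, reduce, and output stages above can be sketched in a few lines of plain Python. This is a single-process illustration under stated assumptions, not the Hadoop or Pig job behind the demo: the sample tweets are invented, the function names are hypothetical, and the word-list scoring is a stand-in for whatever sentiment analytic was actually used. The model names and the positive/negative words are taken from the slide.

```python
import re
from collections import defaultdict

MODELS = ["D90", "D3000"]                   # models named on the slide
POSITIVE = {"wow", "great", "incredible"}   # words quoted on the slide
NEGATIVE = {"lousy", "sucks"}

def map_tweet(tweet):
    """Map: emit (key, tweet) for every Nikon model the tweet mentions."""
    lowered = tweet.lower()
    for model in MODELS:
        if f"nikon {model.lower()}" in lowered:
            yield (f"Nikon {model}", tweet)

def reduce_counts(pairs):
    """Reduce: aggregate the number of tweets per model key."""
    counts = defaultdict(int)
    for key, _tweet in pairs:
        counts[key] += 1
    return dict(counts)

def sentiment(tweet):
    """Output: naive word-list sentiment (an assumption, not the real analytic)."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Invented sample tweets for illustration only.
tweets = [
    "Wow, my Nikon D90 is incredible",
    "Thinking about a Nikon D3000",
    "Lousy weather today",
    "Great shots with the Nikon D90",
]
pairs = [pair for tweet in tweets for pair in map_tweet(tweet)]
counts = reduce_counts(pairs)
print(counts)
print([(key, sentiment(tweet)) for key, tweet in pairs])
```

On a cluster the map step would run in parallel over splits of the input and the framework would group pairs by key before the reduce step; here both happen in one list for readability.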
- 11. IBM Software for a Smarter Planet
A Demonstration of BigSheets in action
Crowd sourcing - What do people want to buy?
• Gather
• Created an analysis model, using IBM Content Analytics, looking for ‘buy signals’:
• Verb phrase indicating the desire to get something
• “I would really love a...”
• Buy Target (“I would really love to get myself a cool new phone”)
• Brand, Company, and opinion statements in the context of this buy statement
• Deployed the analysis model into BigSheets where it gets deployed across the Hadoop
cloud
★ In BigSheets each analysis model is considered a macro
• Visualize the results
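A "buy signal" of the kind described above can be approximated with a pattern match. The sketch below is a toy stand-in, not the IBM Content Analytics model: the regular expression and the function name find_buy_target are assumptions, and it covers only the one verb phrase quoted on the slide.

```python
import re

# Hypothetical pattern: the verb phrase from the slide, followed by a buy target.
BUY_SIGNAL = re.compile(
    r"\bI would really love (?:to get myself )?(?P<target>an? [^.!?]+)",
    re.IGNORECASE,
)

def find_buy_target(text):
    """Return the buy target if the text contains the buy signal, else None."""
    match = BUY_SIGNAL.search(text)
    return match.group("target").strip() if match else None

# The first sentence is the example quoted on the slide; the second is invented.
target = find_buy_target("I would really love to get myself a cool new phone")
no_target = find_buy_target("The phone arrived yesterday")
print(target, no_target)
```

A real analysis model would recognize many verb phrases and also pick up brand, company, and opinion statements near the match, as the slide notes.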
- 12. IBM Software for a Smarter Planet
Marketplace Application Example - British Library
The Goal
Can an ET technology project &
IBM’s Classification Module (ICM)
electronically classify & tag web
content & enable/create
visualizations?
Web Archive Opportunity
Libraries & archives are interested in
collecting & preserving the web data
• British Library has opened the UK Web Archive
portal for researchers & historians to explore
preserved web content
• Parliament nearing vote to give the British Library
the nod to archive all .uk domain data, spanning 4
million sites & ~128TB today.
• Today, web page classification for the 5000 British
Library web sites is performed by 30 folks
Web Content To Gather:
• British Library gathered 1.48 TB of data - 4
web archive files comprising ~400,000 web
pages from 300 archived websites
• 4 machines (dual core), 1 TB HD, 8 GB RAM
- 13. IBM Software for a Smarter Planet
Marketplace Application Example: AmEx or IBM
Project:
Improve IP Portfolio Analysis for
Mergers & Acquisitions
“...please collect all US Patent
filings… then let’s do…”
Business Questions
• Ongoing tracking of acquisitions and
associated IP
• Visualizations, e.g. corporate
genealogy
Knowledge of Interest:
• Corporate genealogies
• IP ownership roll-up
• Patents ranked by citation
• Augment analysis with items affecting IP
value, inventor affiliation, citation rank by
time
Web Content To Gather:
• SEC filings, e.g. annual and quarterly reports
• USPTO patents, assignments and trademarks
• Company press releases
• Other M&A, inventor information from
feeds, webpages
- 14. IBM Software for a Smarter Planet
Let’s Talk Customers: AmEx or IBM
American Express:
Evaluating IP with large amounts of public and private data
Gathered 1,400,000 U.S. Patents on record from
2002 - 2009
• The 1,400,000 cited/referenced another 6,100,000
U.S. & International patents
★ Odd fact: a few patents cited/referenced as many as
13,870 other patents
• ~216 are AMEX patents
★ 90 were cited/referenced
★ Of AMEX cited patents, 24 cited 1 time thru one cited 67 times
• 3600 cases from Court of Appeals, Federal Circuit,
1993 - 2007 (Georgetown Law)
★ 43 mentions of U.S. patents issued between 2002 -
2009; relies on exact “Patent No. 9,999,999” match
• Productivity improvement from weeks to hours
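The exact “Patent No. 9,999,999” match mentioned above can be illustrated with a regular expression. This is a sketch of the idea only: the pattern, the function name, and the sample text are assumptions, and the patent numbers are placeholders in the style of the slide, not real citations.

```python
import re

# Matches only the exact form "Patent No. d,ddd,ddd" named on the slide.
PATENT_REF = re.compile(r"Patent No\. (\d,\d{3},\d{3})")

def find_patent_mentions(text):
    """Return every patent number cited in the exact 'Patent No.' form."""
    return PATENT_REF.findall(text)

# Invented court-opinion fragment with placeholder numbers.
sample = (
    "The claim concerns Patent No. 9,999,999, "
    "while Pat. No. 9,999,998 is not at issue."
)
mentions = find_patent_mentions(sample)
print(mentions)
```

The abbreviated "Pat. No." reference is skipped, which shows why an exact-match rule undercounts: any count produced this way is a lower bound on the mentions actually present in the text.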
- 15. IBM Software for a Smarter Planet
Conclusion
In God we trust
...all others, bring data