Extracting Value
from Big Data in the Cloud -
    Michael Newberry
Big data in a Hybrid-Cloud world
   Dr Michael Newberry
   Windows Azure Lead, Microsoft UK
   Michael.Newberry@Microsoft.com
Doggerland: Simon Fitch, Vince Gaffney and Ken Thomson
Image Source: drowned-landscapes.tumblr.com
Royal Society's Summer Science Blog (http://summer-science.tumblr.com/)
Big Data.
VOLUME      VARIETY      VELOCITY
 (Size)    (Structure)    (Speed)


          Big Data.
Getting useful insights
from awkward data sets
using the most appropriate
computing platform at each
stage.
   Dr Michael Newberry
   Windows Azure Lead
   Microsoft UK
Machine Learning & Bayes theorem
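The slide names Bayes' theorem but does not state it: P(H | E) = P(E | H) P(H) / P(E). As a minimal sketch (all numbers and names here are hypothetical, not from the talk), this is how one piece of browsing evidence can update a purchase estimate:

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(H | E) for a yes/no hypothesis H, using Bayes' theorem.

    prior               -- P(H), belief before seeing the evidence
    likelihood          -- P(E | H), chance of the evidence if H is true
    false_positive_rate -- P(E | not H), chance of the evidence if H is false
    """
    # P(E) by the law of total probability over H and not-H.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical figures: 2% of visitors buy item X; 60% of buyers first
# viewed a related item Y, versus 10% of non-buyers.
p = posterior(prior=0.02, likelihood=0.60, false_positive_rate=0.10)
print(f"P(buys X | viewed Y) = {p:.3f}")  # 0.012 / 0.110, about 0.109
```

Viewing Y lifts the estimate from 2% to roughly 11%; the result is still a probability, not a certainty.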
….Amazon (AMZN) calls this homegrown math "item-to-item collaborative filtering," and it's used this algorithm to heavily
customize the browsing experience for returning customers…. Judging by Amazon's success, the recommendation
system works. The company reported a 29% sales increase to $12.83 billion during its second fiscal quarter, up from
$9.9 billion during the same time last year. A lot of that growth arguably has to do with the way Amazon has integrated
recommendations into nearly every part of the purchasing process from product discovery to checkout.
http://tech.fortune.cnn.com/2012/07/30/amazon-5/
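The quoted article names item-to-item collaborative filtering without describing it. The following is a toy sketch of the general idea, not Amazon's implementation; the purchase data, the cosine similarity measure and the function names are assumptions for illustration. Each item is represented by the set of customers who bought it, items are compared by the overlap of those sets, and a customer is offered the unseen items most similar to what they already own:

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "alice": {"book", "lamp"},
    "bob":   {"book", "pen"},
    "carol": {"lamp", "rug"},
    "dave":  {"book", "pen", "rug"},
    "eve":   {"pen", "rug"},
}


def item_similarities(purchases):
    """Cosine similarity between items, each item being the set of customers who bought it."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    sims = {}
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                sims[(a, b)] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(customer, purchases, sims, k=2):
    """Score each unseen item by summing its similarity to the items the customer owns."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in owned and b not in owned:
            scores[b] += s
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


sims = item_similarities(purchases)
print(recommend("bob", purchases, sims))  # ['rug', 'lamp']
```

One appeal of comparing items rather than customers is that the item-similarity table can be built offline and reused for every shopper.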
“In theory there is no difference between theory and practice;
                                          in practice, there is”.
                 Yogi Berra, cited in Nassim Taleb, Antifragile.
Big data techniques

NoSQL (à la MongoDB)   Map-Reduce (e.g. Hadoop)
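The speaker notes quote the classic Hadoop word-count description: each mapper emits (word, 1), each reducer sums the counts, and the reducer doubles as a combiner on map output. The following is a single-process Python simulation of that data flow, with no Hadoop involved; the function names and the sample input are hypothetical:

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reducer(word, counts):
    """Sum all counts for one word. Addition is associative, so this also works as a combiner."""
    return word, sum(counts)


def combine(pairs):
    """Apply the reducer locally to one mapper's output, shrinking what would cross the network."""
    grouped = defaultdict(list)
    for word, n in pairs:
        grouped[word].append(n)
    return [reducer(w, ns) for w, ns in grouped.items()]


def map_reduce(splits):
    # Map + combine per input split (in Hadoop each split would feed a separate map task).
    combined = [combine(chain.from_iterable(mapper(line) for line in split)) for split in splits]
    # Shuffle: group every (word, partial_count) pair by key across all splits.
    shuffled = defaultdict(list)
    for word, n in chain.from_iterable(combined):
        shuffled[word].append(n)
    # Reduce: one call per distinct word.
    return dict(reducer(w, ns) for w, ns in shuffled.items())


splits = [["the cat sat", "the cat ran"], ["the dog sat"]]
print(map_reduce(splits))
```

Without the combiner, the first split would send five ("the", 1)-style records through the shuffle for "the" and "cat" alone; with it, each split sends one record per distinct word.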
Embedded devices
Cloud OS

MICROSOFT Cloud OS
1 CONSISTENT PLATFORM
ON-PREMISES   SERVICE PROVIDER
MANAGE ANY DATA, ANY SIZE, ANYWHERE
POLYBASE: COMBINING RELATIONAL AND NON-RELATIONAL DATA
The future of query processing
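The idea behind this slide, one SQL query spanning a relational table and non-relational records, can be illustrated with a small sketch. This does not use PolyBase or SQL Server: SQLite stands in for the relational store, and a few hypothetical JSON log lines stand in for files in Hadoop. Unlike PolyBase, which the speaker notes describe as reading Hadoop data in place through external tables, this toy copies the parsed records into a temporary table before joining:

```python
import json
import sqlite3

# Relational side: a hypothetical customer table in an in-memory SQL database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, "Contoso", "UK"), (2, "Fabrikam", "US")])

# Non-relational side: hypothetical JSON click-log lines with no fixed schema.
raw_log = [
    '{"customer_id": 1, "page": "/pricing"}',
    '{"customer_id": 2, "page": "/docs", "referrer": "search"}',
    '{"customer_id": 1, "page": "/signup"}',
]

# Impose a schema at read time (extra keys such as "referrer" are ignored),
# then expose the records to SQL as a table.
db.execute("CREATE TEMP TABLE clicks (customer_id INTEGER, page TEXT)")
db.executemany("INSERT INTO clicks VALUES (?, ?)",
               [(rec["customer_id"], rec["page"]) for rec in map(json.loads, raw_log)])

# One standard SQL query joins the two sources.
rows = db.execute("""
    SELECT c.name, c.region, COUNT(*) AS n_clicks
    FROM customers AS c JOIN clicks AS k ON k.customer_id = c.id
    GROUP BY c.id
    ORDER BY n_clicks DESC
""").fetchall()
print(rows)  # [('Contoso', 'UK', 2), ('Fabrikam', 'US', 1)]
```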
Lock-In
Windows Virtual Machine
Windows Azure   Other Service Providers   Customer Data Center
DATA PLATFORM DELIVERY MODELS
Rationale for Usage
Location:   On-Premises   |   On-Premises or Service Provider   |   Microsoft Cloud or Service Provider
BALANCING ON-PREMISES & CLOUD
Snowline graph
Takeaways
1. “Big data” can do some amazing stuff.
2. Think less about “big data” and more about “data needing non-relational approaches”.
3. If your big data insights are probabilistic, which they often are, have a plan to deal with variance.
4. Pick the most appropriate platform. Think “and”, not “or”:
   - Balance public cloud AND on-premises,
   - Combine “big data” with RDBMS.
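Takeaway 3 asks for a plan to deal with variance. One common starting point, shown here as a sketch with hypothetical click data rather than as the speaker's method, is to report an interval instead of a single rate, for example with a percentile bootstrap:

```python
import random
import statistics


def bootstrap_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(outcomes)
    # Resample with replacement many times and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(outcomes, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical: 1 = visitor clicked a recommendation, 0 = did not.
outcomes = [1] * 30 + [0] * 170
point = statistics.fmean(outcomes)
lo, hi = bootstrap_ci(outcomes)
print(f"click-through {point:.3f}, 95% interval roughly {lo:.3f} to {hi:.3f}")
```

The bootstrap makes no normality assumption. A wide interval is a signal to gather more data before acting on the estimate.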
Editor's Notes

  1. A personal view from where I sit. Picking examples unlikely to be used by other speakers. Hype: Freeform Dynamics on The Register: http://www.freeformdynamics.com/fullarticle.asp?aid=1590
  2. Big Data. This is a picture down the center aisle of a shipping container from one of Microsoft’s datacenters. We put ~1,800 computers inside one of these containers. Some of us had the privilege of working on the data storage and computational platform that powers Bing. We used 22 of these containers, spanning 40,000 machines, where we stored over 100 PB of data. This was three years ago, and now these servers are almost obsolete. Big Data is in constant motion and growing at an incredible rate: 90% of the world’s data was generated in just the past two years. That's remarkable growth.
  3. Doug Laney http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
  4. Don’t forget: there are other kinds of machine learning. Bridge into
  5. The need for these tools is motivated by the data explosion: “Each mapper takes a line as input and breaks it into words. It then emits a key/value pair of the word and 1. Each reducer sums the counts for each word and emits a single key/value with the word and sum. As an optimization, the reducer is also used as a combiner on the map outputs. This reduces the amount of data sent across the network by combining each word into a single record.”
  6. Future of query processing. Pioneered in the Jim Gray Systems Labs by David DeWitt, PolyBase is a federated query processor in SQL Server 2012 Parallel Data Warehouse (PDW) and represents a break from traditional query processing: it joins structured data with unstructured data from Hadoop. Without manual intervention, the PolyBase query processor can accept a standard SQL query and combine tables from a relational source with tables from a Hadoop source directly, through external tables. In addition, it parallelizes import and export of data to and from Hadoop, giving PDW speed, simplicity, and responsiveness in addressing these new types of queries. 1) Ability to issue standard T-SQL that joins relational data with unstructured data in Hadoop. 2) PolyBase rapidly imports/exports data between Hadoop and PDW in parallel. 3) PolyBase can query data in Hadoop directly, without movement (with external tables). 4) Created in the “Gray Systems Labs” by David DeWitt. http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx
  7. http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000002102 As the game was prepared for release, however, 343 Industries was faced with an entirely new kind of challenge: to gain insight into player behavior and user preferences. To achieve this goal, Microsoft leadership asked 343 Industries to find a way to effectively mine user data. At the same time, the team was faced with another need: analyzing data during the five-week Halo 4 “Infinity Challenge” tournament and providing results each day to their tournament partner, Virgin Gaming. The Halo 4 Infinity Challenge, the largest free-to-enter online Halo tournament in the world, tracked a player’s personal score in the game’s multiplayer modes across a global leaderboard, giving players a chance to win more than 2,800 prizes. Virgin Gaming needed to use business intelligence (BI) data gathered during the event to update leaderboards on the tournament website. “…the average length of a game and the specific game features that players use the most. By getting these insights, the Halo 4 team can make frequent updates to the game.” “Based on the user preference data we’re getting from Hadoop, we’re able to update game maps and game modes on a week-to-week basis,” says Vayman. “And the suggestions we get in the forums often find their way into the next week’s update. We can actually use this feedback to make changes and see if we attract new players. Hadoop and the forums are great tuning mechanisms for us.”