Unlocking value in your (big) data

The presentation is an introduction to Big Data and analytics: how to go about enabling big data and analytics in our company, what the main differences are between big data analytics and traditional analytics, and how to get started.

This material was used at the SAS Big Data Analytics event held in Helsinki on 19th of April 2011.

The slides are copyright of Accenture.

  1. Unlocking Value in (Big) Data. Oscar Renalias, Accenture, oscar.renalias@accenture.com
  2. About the presenter: Oscar Renalias. Oscar is a Technology Architect and has been working at Accenture in the Helsinki office for the last 5 years. He holds a Bachelor's Degree in Computer Science from the Universitat Politècnica de Catalunya (UPC) in Barcelona. Oscar currently belongs to the global organization within Accenture responsible for driving technology innovation, working with clients on selected new and emerging technologies to generate business value. Hadoop/Big Data is one of those areas. Oscar.renalias@accenture.com, +358407725915. Copyright © 2012 Accenture. All rights reserved.
  3. Agenda: • Top 4 things about Big Data & Analytics • What is Big Data? • Big Data Analytics: what is it? • What does it contain? • How is it integrated? • How do we manage it? • What next?
  4. Top 4 things about Big Data Analytics: • Resistance is futile, you will be assimilated • Competitive advantage • It's different • Data wants to be open
  5. Data is growing: It's growing. Quickly. And it's everywhere. [Chart: data stored, in exabytes (10^18 bytes): 130 in 2005, 1,227 in 2010, 7,910 in 2015.] Source: IDC's Digital Universe Study (sponsored by EMC), June 2011
  6. New kinds of data: structured vs. unstructured data growth. [Chart: growth of complex, unstructured data and of relational data compared with our ability to analyze it; the difference is labeled "analysis gap".] Source: IDC White Paper sponsored by EMC, "As the Economy Contracts, the Digital Universe Expands", May 2009.
  7. Big Data Technologies: New technologies, new approaches. [Figure: word cloud of Big Data technologies.] Source: Wordle for Credit Suisse, "Does Size Matter Only?", September 2011
  8. Where do analysts see Big Data? [Figure: Gartner's Hype Cycle for Emerging Technologies, 2011]
  9. MapReduce and Hadoop: MapReduce revolutionized how we handle large amounts of data; Hadoop made it simple and affordable. MapReduce: • Originally designed and first developed at Google as part of its efforts to index the web more efficiently • Splits input data into smaller chunks that can be processed in parallel • Scales linearly with the number of nodes. Hadoop: • Yahoo's implementation of MapReduce • Open source, top-level project in the Apache Foundation • Designed to run on commodity software (Linux) and hardware (consumer-grade computers with directly attached storage) • Large ecosystem of additional components (both open source and commercial)
  10. Big Data Analytics: What is it? Big Data Analytics is a shift in mindset about how we think of analytics as an internal component of the organization. It focuses on productizing data in a way that drives meaningful insights rapidly, and on innovation that exploits missed opportunities in previously overlooked areas, providing a path to competitive advantage.
  11. Big Data Analytics vs. traditional analytics: where do they differ?
      Traditional Analytics
        • Technology: assumes condensed, structured and feature-rich datasets that can be modeled: relational databases, data warehouses, dashboards
        • Skills: basic knowledge of reporting and analysis tools; few specialized resources
        • Processes & Organization: "siloed" data organizations; only specific "views" of data visible across the enterprise
      Big Data Analytics
        • Technology: a stack of tools that enables an organization to build a framework for extracting useful features from a large dataset, to better understand how to model its data
        • Skills: advanced analytical, mathematical and statistical knowledge required to develop new models: the data scientist
        • Processes & Organization: data is productized and shared across the enterprise; dedicated data organizations with well-defined data management processes and ownership
  12. Everything will be analyzed: the three Vs. [Figure: technologies positioned along Velocity (batch to real-time), Volume and Variety (structured to unstructured); labels include relational, ETL, EDW, Hadoop, Hadoop + NoSQL, NoSQL, in-memory and real-time event processing.] Source: IDC
  13. Big Data and Analytics in the Enterprise: Many technology choices in a rapidly changing environment. Which one is right for you? • Distributed non-relational storage and processing • Big Data-enabled intelligence and analysis • Analytics-focused massively parallel processing (MPP) software platforms • Hardware-optimized MPP data warehouses • Distributed in-memory • Cloud
  14. Technology: Augmenting existing analytics with Big Data technologies. [Diagram labels: Emerging Data Technologies, Big Data Analytics, Traditional Tools.]
  15. SAS-Hadoop integration: an example of how traditional analytics tools are evolving to interoperate with Hadoop. SAS/Access Interface to Hadoop: • Enables SAS users to analyze data stored in Hadoop • Allows Hadoop data processing from SAS client software such as Data Integration Studio, Enterprise Guide and Enterprise Miner • The Access engine not only moves data into and out of Hadoop; it can also run data processing and have it "pushed down" into Hadoop. SAS Data Integration Studio Transformations for Hadoop: • New sets of Hadoop transformations that enable DI Studio users to load and unload data from Hadoop faster than Sqoop (can connect to Oracle) • Perform "ETL-like" processing with Hive and Pig • A Hadoop-specific scoring transform that enables models developed with Enterprise Miner to be deployed to Hadoop via DI Studio
  16. The impact of Big Data Analytics on our landscapes: hybrid landscapes, where old and new converge. [Diagram components: internal apps, customer-facing apps and mobile apps; analysis tools (SAS, SPSS, R, Tableau); data services (REST, WS); relational DBs; Pig, Hive, HBase, MapReduce, HDFS; enterprise DW; ETL; real-time analytics; time series, files, social, logs, web, ERP, CRM.]
  17. Data Science and the skill gap: Closing the loop: it's not just about technology skills. Data science: "The sexy job in the next 10 years will be statisticians" (Hal Varian, Chief Economist at Google). Data scientists are the next-generation analytics professionals, responsible for turning data into insight.
  18. Big Data Analytics Management: How does the Big Data Analytics management style differ? In big data analytics, resources generally have a hybrid skill set, a cross between software engineering and advanced statistics. This mix of skill sets creates a challenge for project methodology. [Diagram labels: Analytics Methodologies, Software Methodologies.]
  19. Wrapping up: Big Data is challenging current patterns of thought. Data "explosion": data everywhere (structured, unstructured, other); people's data, geolocation data. Cost-effective computing and storage: everything can be stored; cheap large-scale computing power readily available. Big Data and Analytics: resistance is futile; they are the path to competitive advantage and create value; compared to traditional analytics, they're different: adapt or become irrelevant; open your data.
  20. Wrapping up: How to get started • Identify business processes that you could perform more effectively with the help of big data and analytics • Start with well-funded but small trials and proofs of concept, then evolve towards a solid roadmap • Open up your data: transform towards a "data as a service" architecture • Acquire or grow the needed technology and analytical skills
  21. Accenture Technology Vision: Strong advice on data for 2012. http://bit.ly/accenturetechnologyvision2012
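The IDC figures on slide 5 (130, 1,227 and 7,910 exabytes stored in 2005, 2010 and 2015) can be turned into an annual growth rate with the compound annual growth rate formula, (end / start)^(1 / years) - 1. This is a minimal sketch that assumes the values were read correctly off the chart; the names `STORED_EB` and `cagr` are illustrative, not from the presentation.

```python
# Data stored, in exabytes, as read off the IDC chart on slide 5 (assumed values).
STORED_EB = {2005: 130, 2010: 1227, 2015: 7910}


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    years = sorted(STORED_EB)
    # Compare each consecutive pair of years on the chart.
    for y0, y1 in zip(years, years[1:]):
        growth = STORED_EB[y1] / STORED_EB[y0]
        rate = cagr(STORED_EB[y0], STORED_EB[y1], y1 - y0)
        print(f"{y0}-{y1}: {growth:.1f}x overall, about {rate:.0%} per year")
```

Under these assumptions, stored data grows by roughly 57% per year from 2005 to 2010 and roughly 45% per year from 2010 to 2015: the rate slows, but the absolute volume still multiplies about sixfold in five years.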
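Slide 9 describes MapReduce as splitting input data into smaller chunks that can be processed in parallel. The word-count sketch below shows the map, shuffle and reduce phases on a single machine. It is not Hadoop's API: all function names are hypothetical, and threads from Python's `concurrent.futures` stand in for cluster nodes, so it illustrates the structure of the computation rather than a real speed-up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def split_into_chunks(lines: list[str], n_chunks: int) -> list[list[str]]:
    """Split the input into roughly equal chunks (stand-ins for HDFS blocks)."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_chunk(chunk: list[str]) -> list[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped: Iterable[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines: list[str], workers: int = 4) -> dict[str, int]:
    chunks = split_into_chunks(lines, workers)
    # Each chunk is mapped independently, which is what lets MapReduce scale out.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return dict(reduce_group(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["Big data is big", "data wants to be open"]))
```

Because the map step only ever sees its own chunk and the reduce step only ever sees one key's values, adding nodes means adding chunks, which is the property behind the "scales linearly" claim on the slide.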
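Slide 15 mentions that SAS/Access can have data processing "pushed down" into Hadoop instead of moving the data to the client. The sketch below is a generic illustration of push-down, not the SAS/Access API: `ToyCluster`, `fetch_all`, `run_filter` and the sample rows are hypothetical, and the only thing it measures is how many rows leave the store under each approach.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record, e.g. {"region": "FI", "amount": 42}


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a table stored in Hadoop.

    It counts how many rows are shipped back to the client, which is the
    cost that push-down processing tries to avoid.
    """
    rows: list
    rows_transferred: int = 0

    def fetch_all(self) -> list:
        # Pull model: every row crosses the network; the client filters afterwards.
        self.rows_transferred += len(self.rows)
        return list(self.rows)

    def run_filter(self, predicate: Callable[[Row], bool]) -> list:
        # Push-down model: the predicate runs next to the data; only matches travel.
        matches = [row for row in self.rows if predicate(row)]
        self.rows_transferred += len(matches)
        return matches


def is_finnish(row: Row) -> bool:
    return row["region"] == "FI"


if __name__ == "__main__":
    data = [{"region": "FI" if i % 10 == 0 else "ES", "amount": i} for i in range(1000)]

    pulled = ToyCluster(data)
    client_side = [row for row in pulled.fetch_all() if is_finnish(row)]

    pushed = ToyCluster(data)
    server_side = pushed.run_filter(is_finnish)

    assert client_side == server_side  # same answer either way
    print(f"pull: {pulled.rows_transferred} rows moved, "
          f"push-down: {pushed.rows_transferred} rows moved")
```

Both approaches return the same 100 rows, but the pull model moves all 1,000 rows to the client while push-down moves only the 100 matches; at Hadoop scale that difference is what makes push-down worthwhile.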

Editor's Notes

  • We'll build on these during the presentation.
  • The bad news? It's not going to stop. Large amounts of data bring a whole set of new challenges: how should we go about them?
  • It's not just growing volumes of existing data, it's also: the recognition of value in previously throw-away data; new kinds of "data exhaust" (by-product data generated as part of other processes, currently ignored or thrown away); new kinds of "intentional" data; and the combination of previously separate data.
  • Big Data is not so much about the "big" but about finding new ways to handle and analyze data that were not possible before. There are a whole lot of new technologies that can be used to deal with big data. Are you familiar with all of them? Which one is most suitable for your case?
  • Source: http://www.gartner.com/it/page.jsp?id=1763814
  • Let's stop for a second to look at the key enabling technologies in Big Data. MapReduce: originally designed and first developed at Google as part of its efforts to index the web more efficiently; it splits input data into smaller chunks that can be processed in parallel and scales linearly with the number of nodes. Hadoop: an open-source implementation of MapReduce, based on Google's white paper; started at Yahoo, now a top-level project in the Apache Foundation; runs on commodity software (Linux) and hardware (consumer-grade computers with directly attached storage); rather straightforward to install and administer; large ecosystem of additional open-source components (Pig, Hive, Oozie, Flume); large ecosystem of commercial offerings (both closed and open source).
  • Big Data Analytics. Technology: multiple tools and technologies, sometimes for the same purpose (Hadoop, NoSQL databases, in-memory analytics). Time to information is critical to extract value from data sources that include mobile devices, RFID, the web and a growing list of automated sensory technologies; traditional data warehousing processes are too slow and limited in scalability; the ability to converge data from multiple sources, both structured and unstructured, decreases that time to information. Skills: there's only so much we can do with exploratory processes; effectively analyzing big data requires mathematical and statistical concepts with which more traditional analysts are not familiar. Business analysts used to be able to manage with Excel and basic SQL knowledge; now, with data that does not follow any particular model (it's unstructured, after all), we need analysts who are comfortable with statistical and mathematical concepts and who are able to devise their own models to find patterns and insights where there apparently were none. Processes & Organization: data must be open and shared across the enterprise, supported by organizations that "own" it. Data must be made available across the enterprise (i.e. we can't find trends in data that we do not have).
  • Source: "Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO", IDC 2012 (http://www.sas.com/resources/asset/BigDataAnalytics-FutureArchitectures-Skills-RomapsfortheCIO.pdf). The three Vs: Velocity, Volume and Variety. Everything will be analyzed, but how much do we have, how soon do we need it and how fast can we do it?
  • MapReduce and Hadoop are currently seen as a low-level paradigm on top of which higher-level tools must be built that are more intuitive and easier to use for non-programmers (business analysts, data scientists). Big Data technologies have not reached maturity yet and will continue to evolve over the coming years. IT decision makers must still be realistic about the limits of what can be achieved with these technologies, sometimes waiting for the next generation of data technologies instead. There is also a lot of start-up activity happening (Scalar, MapR). Also, "traditional" large vendors do not want to be left behind: Microsoft SQL Server 2012 will be able to read and write data from Hadoop and HDFS or run Hadoop on Microsoft's Azure PaaS, IBM has a version of InfoSphere BigInsights ready to run on its SmartCloud solution, and Oracle has recently introduced its own appliance, a combined software and hardware solution with Hadoop and in-memory capabilities for handling large amounts of data.
  • Big Data Analytics is an augmentation of existing analytical infrastructure that will allow us to scale and drive insights beyond "current capabilities". So the question becomes: how do we add these capabilities so that they interoperate with traditional tools?
  • The worlds of structured and unstructured data are rapidly converging. Architects and CIOs must find ways to manage this convergence and enable all forms of data management to coexist, sometimes using bridge technologies, such as using Hadoop to process and import data into traditional systems in ways that wouldn't be possible with the RDBMS approach alone. "Hybrid" landscapes are just that: landscapes where Hadoop is integrated with existing data warehouses, traditional relational databases and applications in a way that minimizes the impact on the enterprise. The reality is that the EDW is evolving into a virtualized cloud ecosystem in which all of these database architectures can and will coexist in a pluggable "Big Data" storage layer alongside HDFS, HBase (Hadoop's columnar database), Cassandra (a sibling Apache project that supports peer-to-peer persistence for complex event processing and other real-time applications), graph databases and other "NoSQL" platforms, behind an abstraction layer with MapReduce as its focus. Big Data is not necessarily about its "bigness." Very few organizations are going to need the type of scale that often makes the Big Data headlines. So, far from rendering the relational database obsolete, the new advances will be incorporated into traditional databases over time, extending their performance. Adding Hadoop to the enterprise provides a cost-effective place to store vast quantities of structured data from operational systems and combine it with both internally and externally sourced unstructured or semi-structured data. Advanced MapReduce analytical methods can also be used directly against that store, or more traditional BI tools can be used to analyze the data through Hive or HBase.
  • We've seen the tools, but who is going to build, run and maintain all this? Technology skills: the emergence of big data is based on new technologies that require either training or sourcing additional expertise. Data science: traditional analytical models do not generally scale well to typical "big data-like" volumes; new ways of thinking are needed, ways that help find what we wanted to find as well as what we did not know we could find. Data scientists are the next generation of business analysts, with strong statistical skills and the ability to think "outside the box" when looking for new analytical models.
  • Agile software development methodologies are one of the potential answers to this. A data strategy is required, but with an approach that is about modeling less and iterating more (just like agile).
  • Big Data and analytics require new tools and technology. Big Data doesn't always get it right, with or without analytics (wacky iTunes and Spotify recommendations, weird LinkedIn suggestions). They require new skills in your workforce. Resistance is futile: Big Data and analytics are inescapable. They create business value for the bottom line. They are the path to competitive advantage. Big Data is not only transforming IT, it is also transforming businesses and industries: retail recommendations, smart meter/grid analytics.
  • How do we get started with all this? Identify which business processes could benefit most from improved handling and processing of large amounts of data: what are the business decisions that we make each day and that we'd like to make more efficiently and more effectively? Productize data across the company, make it a "first-class citizen" and provide some kind of data service layer so that data is accessible throughout the enterprise. Identify the skill and technology gaps and decide whether to grow or acquire new talent and technology for the company (with or without the cloud). This clearly requires an investment; it is the path forward, but it requires that you, as decision-makers, commit to growing big data in your company.
  • Source: http://www.accenture.com/us-en/technology/technology-labs/Pages/insight-accenture-technology-vision-2012.aspx (http://bit.ly/accenturetechvision2012 and http://bit.ly/accenturetechnologyvision2012)
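The notes on hybrid landscapes mention using Hadoop as a bridge technology that processes data and imports it into traditional systems. The sketch below is a minimal, single-machine illustration of that pattern under stated assumptions: the log format, `RAW_LOGS`, `transform`, `load` and the `page_hits` table are all hypothetical, plain Python stands in for the Hadoop batch job, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from collections import Counter

# Hypothetical raw web-server log lines: "<ip> <timestamp> <path> <status>".
RAW_LOGS = [
    "10.0.0.1 2012-01-01T10:00:00 /products 200",
    "10.0.0.2 2012-01-01T10:00:05 /products 200",
    "10.0.0.3 2012-01-01T10:00:09 /checkout 500",
    "10.0.0.1 2012-01-01T10:01:00 /home 200",
    "truncated line",
]


def transform(lines: list[str]) -> list[tuple[str, int]]:
    """Batch step (the part a Hadoop job would do at scale): parse the
    semi-structured lines and count successful hits per path."""
    hits: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:  # skip malformed records instead of failing the job
            continue
        _ip, _timestamp, path, status = parts
        if status == "200":
            hits[path] += 1
    return sorted(hits.items())


def load(conn: sqlite3.Connection, rows: list[tuple[str, int]]) -> None:
    """Load step: write the condensed, structured result into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hits (path TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    conn.executemany("INSERT OR REPLACE INTO page_hits (path, hits) VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stands in for the EDW / relational database
    load(conn, transform(RAW_LOGS))
    for path, hits in conn.execute("SELECT path, hits FROM page_hits ORDER BY hits DESC"):
        print(path, hits)
```

The design choice mirrors the "hybrid" idea in the notes: the raw, messy data stays on the cheap storage side, and only a small structured summary is loaded into the relational system, where existing BI tools can query it as usual.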