Copyright © 2012 Splunk Inc.

Experiences in Streaming Analytics at Petabyte (or larger) Scale

Stephen Sorkin
VP Engineering, Splunk Inc.

Eddie Satterly
Chief Big Data Evangelist, Splunk Inc.
  
Big Data Comes from Machines

Volume | Velocity | Variety | Variability

Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data.

GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
  
What Does Machine Data Look Like?

Sources:
• Order Processing
• Middleware Error
• Care IVR
• Twitter
  
Machine Data Contains Critical Insights

Sources and the fields highlighted in each:
• Order Processing: Customer ID, Order ID, Product ID
• Middleware Error: Order ID, Customer ID
• Care IVR: Customer ID, Time Waiting On Hold
• Twitter: Twitter ID, Customer's Tweet, Company's Twitter ID
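This slide makes the point that a shared key such as Customer ID can stitch together events from unrelated sources. Below is a minimal Python sketch of that idea. It assumes events are already parsed into dictionaries. The field names (`customer_id`, `order_id`) and sample values are hypothetical, and the code is not how Splunk implements correlation internally.

```python
from collections import defaultdict

# Hypothetical parsed events; field names and values are invented for illustration.
events = [
    {"source": "order_processing", "customer_id": "C42", "order_id": "O7", "product_id": "P3"},
    {"source": "middleware_error", "customer_id": "C42", "order_id": "O7"},
    {"source": "care_ivr", "customer_id": "C42", "hold_seconds": 540},
    {"source": "order_processing", "customer_id": "C99", "order_id": "O8", "product_id": "P1"},
]


def correlate(events, key):
    """Group events from any source by the value of a shared key field."""
    groups = defaultdict(list)
    for event in events:
        if key in event:  # events without the key cannot be joined on it
            groups[event[key]].append(event)
    return dict(groups)


by_customer = correlate(events, "customer_id")
for customer, related in by_customer.items():
    print(customer, sorted(e["source"] for e in related))
```

The Twitter row is keyed on a Twitter ID rather than a Customer ID. Joining it would first need a mapping between the two, for example a lookup table.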
  
Big Data Technologies

[Figure: evolution of data technologies over time: Single RDBMS → Single Bigger RDBMS → RDBMS Sharding → SQL & Map/Reduce → NoSQL Map/Reduce. Products shown: Aster Data, Greenplum, Cassandra, Voldemort, Big Table, CouchDB, Hadoop. The data handled shifts from relational databases (highly structured), through key/value, tables or other (semi-structured), to temporal, unstructured, heterogeneous data.]
  
Splunk Turns Machine Data into Real-time Insights

Optimized for real-time, low latency and interactivity

[Diagram: Real-time Collection and Indexing → Splunk storage / Other Stores → Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform]
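Of the capabilities listed, "monitor and alert" is the easiest to make concrete. The sketch below assumes the simplest possible alert condition: more than N matching events within a sliding time window. The class and parameter names are hypothetical and do not correspond to Splunk's alerting configuration.

```python
from collections import deque


class WindowAlert:
    """Fire when more than `threshold` events arrive within `window` seconds.

    Hypothetical helper for illustration; not a Splunk API.
    """

    def __init__(self, window, threshold):
        self.window = window
        self.threshold = threshold
        self.times = deque()

    def observe(self, ts):
        """Record an event timestamp (non-decreasing seconds); return True if the alert fires."""
        self.times.append(ts)
        # Evict events that fall outside the window (now - window, now].
        while self.times and self.times[0] <= ts - self.window:
            self.times.popleft()
        return len(self.times) > self.threshold


alert = WindowAlert(window=60, threshold=3)
fired = [alert.observe(t) for t in [0, 10, 20, 30, 95, 100]]
print(fired)
```

A deque keeps each observation amortized O(1), because every timestamp is appended and evicted at most once.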
  
Splunk Collects and Indexes Any Machine Data

No upfront schema. No RDBMS. No custom connectors.

Customer Facing Data
• Click-stream data
• Shopping cart data
• Online transaction data

Outside the Datacenter
• Manufacturing, logistics…
• CDRs & IPDRs
• Power consumption
• RFID data
• GPS data

Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets

Windows
• Registry
• Event logs
• File system
• sysinternals

Linux/Unix
• Configurations
• syslog
• File system
• ps, iostat, top

Virtualization & Cloud
• Hypervisor
• Guest OS, Apps
• Cloud

Applications
• Web logs
• Log4J, JMS, JMX
• .NET events
• Code and scripts

Databases
• Configurations
• Audit/query logs
• Tables
• Schemas

Networking
• Configurations
• syslog
• SNMP
• netflow
  
New Approach to Analyzing Heterogeneous Data

Universal Indexing
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern "blindly"
• No attempt to "understand" up front

Late Structure Binding
• Knowledge applied at search time
• No brittle schema to work around
• Multiple views into the same data
• Find transactions, patterns and trends

Analysis and Visualization
• Normalization as it's needed
• Faster implementation
• Easy search language
• Multiple views into the same data

Rapid time-to-deploy: hours or days
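Universal indexing and late structure binding work together. At ingest, the system stores the raw text plus an index of every term. Fields are extracted only when a search runs. The sketch below shows that split in Python under simplifying assumptions: whitespace and `=` tokenization, a single key=value extraction rule, and invented sample events. Splunk's actual segmentation and field extraction are far more elaborate.

```python
import re
from collections import defaultdict

# Invented sample events, stored exactly as received.
raw_events = [
    "2012-06-13 10:01:02 level=ERROR user=alice action=checkout status=500",
    "2012-06-13 10:01:05 level=INFO user=bob action=search status=200",
    "2012-06-13 10:01:09 level=ERROR user=bob action=checkout status=503",
]

# Index time: split each event into terms without any schema and map
# every term to the ids of the events containing it (an inverted index).
TOKEN = re.compile(r"[^\s=]+")


def build_index(events):
    index = defaultdict(set)
    for i, event in enumerate(events):
        for term in TOKEN.findall(event.lower()):
            index[term].add(i)
    return index


# Search time: structure is applied only now, here as one key=value rule;
# another query could apply a different extraction to the same raw text.
KV = re.compile(r"(\w+)=(\S+)")


def search(index, events, *terms):
    ids = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [dict(KV.findall(events[i])) for i in sorted(ids)]


index = build_index(raw_events)
for fields in search(index, raw_events, "error", "checkout"):
    print(fields["user"], fields["status"])
```

Changing `KV` alters the fields every future search sees without reindexing anything. This is the practical payoff of binding structure late.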
  
Splunk Search Processing Language

Lots of random "hypothetical examples" from our mugs
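The Search Processing Language chains commands with pipes, and each command consumes the results of the previous one. As a rough analogy only, the pipeline `search error | stats count by host` can be modeled in Python as composed generator stages. The `search` and `stats_count_by` functions and the sample events are hypothetical. Only the `_raw` field name mirrors Splunk's convention for raw event text.

```python
from collections import Counter

# Hypothetical events; "_raw" holds the raw event text.
events = [
    {"host": "web01", "_raw": "GET /cart 500 error"},
    {"host": "web02", "_raw": "GET /home 200 ok"},
    {"host": "web01", "_raw": "POST /checkout 503 error"},
    {"host": "db01", "_raw": "slow query error"},
]


def search(stream, term):
    """Filter stage: keep events whose raw text contains the term (case-insensitive)."""
    return (e for e in stream if term.lower() in e["_raw"].lower())


def stats_count_by(stream, field):
    """Aggregation stage: count events per value of `field`."""
    return Counter(e[field] for e in stream)


# Roughly what `search error | stats count by host` expresses in SPL.
result = stats_count_by(search(events, "error"), "host")
print(dict(result))
```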

Operational Intelligence for IT and Business Users

IT Operations Management | Web Intelligence | Application Management | Business Analytics | Security & Compliance

Users:
• Customer Support
• LOB Owners/Executives
• Operations Teams
• Website/Business Analysts
• System Administrator
• IT Executives
• Development Teams
• Security Analysts
• Auditors
  
Scalability to Tens of TBs/Day on Commodity Servers

• Offload search load to Splunk Search Heads
• Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day
• Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols
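Here is a sketch of the "auto load-balanced forwarding" tier. It assumes simple round-robin rotation over a fixed list of indexers and skips any indexer marked unavailable. Splunk's own forwarder chooses and switches targets on its own schedule, so treat this only as an illustration of spreading load. The indexer names and the batch model are hypothetical.

```python
from itertools import cycle


class Forwarder:
    """Spread batches of events across indexers in round-robin order.

    Illustrative stand-in only; not Splunk's forwarder behaviour.
    """

    def __init__(self, indexers):
        self.indexers = list(indexers)
        self._ring = cycle(self.indexers)
        self.down = set()

    def next_target(self):
        # Try each indexer at most once per call, skipping ones marked down.
        for _ in range(len(self.indexers)):
            target = next(self._ring)
            if target not in self.down:
                return target
        raise RuntimeError("no indexer available")

    def send(self, batches):
        """Return a mapping of indexer name to the batches assigned to it."""
        assignment = {}
        for batch in batches:
            assignment.setdefault(self.next_target(), []).append(batch)
        return assignment


fwd = Forwarder(["idx1", "idx2", "idx3"])
fwd.down.add("idx2")
assignment = fwd.send(["b1", "b2", "b3", "b4"])
print(assignment)
```

Adding indexers to the list raises aggregate indexing capacity without changing the senders. This mirrors the slide's "as many Splunk Indexers as you need".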
  
Splunk Big Data Solution

Product-based Solution
• Easy to download and deploy
• Pre-integrated, end-to-end functionality
• Enterprise-grade features

Integrated and End-to-end
• Collects data from tens of thousands of sources
• Advanced real-time and historical analysis of data
• Fast, custom visualizations for IT and business users
• Developer APIs, SDKs

Performance at Scale
• Proven at multi-terabyte scale per day
• Upwards of PB under management
• 4,000+ customers
  
Accelerate Game Releases with Big Data Insight

Splunk Use:
• Over 10 TB/day from scaled-out cloud and physical infrastructure
• Data indexed includes web server and application logs for games
• Splunk for operational visibility, troubleshooting and monitoring
• Users include: game operations, developers, and corporate IT

Value Delivered:
• Faster game releases with real-time visibility into production issues
• Reduced fault resolution time from hours to minutes
• Scaled the ops team to manage and monitor a growing infrastructure

• Leading social gaming company globally
• 232 million monthly active users
• 60 million daily active users
  
• Launched in November 2008
• Over 33 million active customers (as of December 2011)
• More than 11,000 employees worldwide
• Active in 48 countries
• Running over 1,000 deals/day worldwide

Daily Uses of Splunk

Key Activities
• Guarantee API performance
• Monitor API data usage
• Early access to key business metrics (conversions, funnel, etc.)
• End-to-end testing
• Ad hoc troubleshooting

Splunk Use Cases
• All log data is available through Splunk
• Dashboards
• Notifications
• Near real-time

"Cannot have a server that is not sending data into Splunk"
  
Dashboards
  
Complementing BI and Hadoop

Collection & Operational Intelligence
• Daily, weekly, monthly metrics across promotions, offers and acceptance rates
• Application Performance Management (APM) and system availability

Hadoop Integration
• Machine data ETL: highly reliable data delivery to HDFS

Data Archival & Batch Data Science
• Long-term data warehousing and specialized, batch analytics
  
Turning Big Data Into Operational Insights at Expedia

Who Am I?
Eddie Satterly
Formerly Sr. Director, Architecture & Engineering, Expedia

Who Is Expedia?
• The World's Largest Travel Site
• First $1B Quarter in 2011
• 90 localized Expedia.com® and Hotels.com® sites
• Discount travel site Hotwire®
• 4,000+ Technology Workers
• Development Team of 1,800
• NASDAQ: (EXPE)
  
Where Splunk Comes In

• 12,000+ Servers
• 27,000+ Hosts
• 1,000+ Source Types
• 227,000 Sources
• 38 Indexers, 16 Search heads
• More than 6.5TB per day indexed
• 20+ different solutions for RCA (root cause analysis), all migrated to Splunk in 3 months
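The Expedia figures allow a rough per-indexer estimate. The arithmetic below assumes decimal units (1 TB = 1,000 GB) and an even spread across all 38 indexers. It uses the slide's lower bound of 6.5 TB/day. Real load is rarely that uniform.

```python
# Figures from the slide; decimal units and an even spread are assumptions.
indexers = 38
tb_per_day = 6.5  # the slide says "> 6.5TB per day", so this is a lower bound

gb_per_indexer_per_day = tb_per_day * 1_000 / indexers
aggregate_mb_per_s = tb_per_day * 1_000_000 / 86_400  # 86,400 seconds per day
per_indexer_mb_per_s = aggregate_mb_per_s / indexers

print(f"{gb_per_indexer_per_day:.0f} GB/day per indexer")
print(f"{aggregate_mb_per_s:.1f} MB/s aggregate, {per_indexer_mb_per_s:.1f} MB/s per indexer")
```

That works out to roughly 171 GB/day, or about 2 MB/s, per indexer. This is the kind of sustained rate commodity servers can absorb, which is the scaling argument of the deck.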
  
Why Splunk?

• SDK Integrations built for Cassandra data stores
• Archiving Data to Hadoop for batch analysis
• Speed of Deployment
• Splunkbase Apps Available for Download
• Scales via Commodity Hardware
• Developers Build Custom Apps and Dashboards
• Aggregation of Log Data from Any Device
• Simple UI for IT and Business Users
  
Splunk Adoption Over Ten Months

Phases: Initial Pilot; Viral Growth from Demonstrated Value; All Devices, All Data Centers; Big Data Integration

Use case: Business Unit
Data: 125GB/day
Systems: 1100
Deployment: Jan. 2011

Use case: Ecommerce Systems
Data: 1.8TB/day
Systems: 8700
Deployment: March 2011

Use case: All Devices
Data: ~4TB/day
Systems: ~21000
Deployment: Aug. 2011

Use case: App Transactions
Data: 3TB/day
Systems: 90TB Data Per Mo.
Deployment: 1Q12-2Q12
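The adoption figures support two quick checks, assuming a 30-day month and decimal units. First, the App Transactions rate of 3 TB/day is consistent with the 90 TB per month listed for that use case. Second, daily volume grew roughly 32-fold from the 125 GB/day pilot to the ~4 TB/day All Devices rollout; the ~4 TB figure is itself approximate.

```python
# Figures copied from the slide; a 30-day month and decimal units are assumptions.
pilot_gb_per_day = 125          # Business Unit pilot, Jan. 2011
all_devices_gb_per_day = 4_000  # All Devices, Aug. 2011 (slide says ~4TB/day)
app_tx_tb_per_day = 3           # App Transactions

growth_factor = all_devices_gb_per_day / pilot_gb_per_day
app_tx_tb_per_month = app_tx_tb_per_day * 30

print(f"Daily volume grew about {growth_factor:.0f}x")
print(f"App Transactions: {app_tx_tb_per_month} TB per 30-day month")
```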
  
Integrate External Data

Extend search with lookups to external data sources.

[Diagram: LDAP, AD; Watch Lists; CMDB; Message Stores; Reference Lookups]

Correlate across multiple data sources and data sets using indexes and keys
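Lookups enrich events at search time with fields from an external table, matched on a shared key. The sketch below assumes a small CSV export from a CMDB keyed on `host`. The table contents and the `load_lookup` and `enrich` helpers are hypothetical, and the code only approximates what Splunk's `lookup` command does.

```python
import csv
import io

# Hypothetical CMDB export; columns and values are invented for illustration.
CMDB_CSV = """host,owner,environment
web01,ecommerce-team,production
db01,dba-team,production
"""


def load_lookup(text, key):
    """Read a CSV lookup table into a dict keyed by one column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(events, table, key, fields):
    """Yield copies of events with the chosen fields added where the key matches."""
    for event in events:
        out = dict(event)
        match = table.get(event.get(key))
        if match:
            out.update({f: match[f] for f in fields})
        yield out


cmdb = load_lookup(CMDB_CSV, "host")
events = [{"host": "web01", "status": "500"}, {"host": "app09", "status": "200"}]
for e in enrich(events, cmdb, "host", ["owner", "environment"]):
    print(e)
```

Events with no matching row pass through unchanged rather than being dropped. This keeps an incomplete reference table from silently hiding data.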
  
Unique Characteristics of Splunk MapReduce

• Real-time temporal MapReduce
• Preview in-progress searches
• Searching works on any device
• Simplified Search Language
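"Real-time temporal MapReduce" and "preview in-progress searches" can be read together. Partial results from each time slice are merged as they arrive, and the running total can be shown before the search finishes. Below is a minimal sketch under that reading. It assumes events arrive already grouped into time slices. It is not Splunk's distributed implementation, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator


def map_chunk(chunk: list) -> Counter:
    """Map step: count events per status code within one time slice."""
    return Counter(event["status"] for event in chunk)


def temporal_mapreduce(chunks: Iterable[list]) -> Iterator[Counter]:
    """Merge partial counts slice by slice, yielding a preview after each.

    Earlier yields are in-progress previews; the last one is the final result.
    """
    total = Counter()
    for chunk in chunks:
        total.update(map_chunk(chunk))  # reduce: merge this slice's partial result
        yield Counter(total)            # snapshot so later merges don't alter previews


chunks = [
    [{"status": 200}, {"status": 500}],  # time slice 1
    [{"status": 200}],                   # time slice 2
    [{"status": 500}, {"status": 500}],  # time slice 3
]
previews = list(temporal_mapreduce(chunks))
for p in previews:
    print(dict(p))
```

This works because counting is associative: merging partial counts in any grouping gives the same final answer, so every preview is a correct result for the data seen so far.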
  
Splunk Impact / Top Takeaways

Splunk helped deliver Expedia an annual ROI of over $11 million

ROI = 5x original Business Case
• Tools Consolidation and Retirement
• 83% MTTR Reduction
• Outage Avoidance

Splunk usage is viral
• 50+ Apps Developed by Our Team
• Over 1,400 Users on a Regular Basis

More data = more benefits
• Adding more data to Splunk via weekly deployments
• Analyzing more data sets in Splunk UI from Hadoop & Cassandra
  
splunk.com/bigdata

Questions?

Sessions will resume at 11:25am
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Experiences in Streaming Analytics at Petabyte Scale

  • 1. Copyright © 2012 Splunk Inc. Experiences in Streaming Analytics at Petabyte (or larger) Scale. Stephen Sorkin, VP Engineering, Splunk Inc. Eddie Satterly, Chief Big Data Evangelist, Splunk Inc.
  • 2. Big Data Comes from Machines. Volume | Velocity | Variety | Variability. Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data. Sources: GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops.
  • 3. What Does Machine Data Look Like? Sources: Order Processing; Middleware Error; Care IVR; Twitter.
  • 4. Machine Data Contains Critical Insights. Sources: Order Processing; Middleware Error; Care IVR; Twitter. Fields highlighted across them: Customer ID, Order ID, Product ID, Time Waiting On Hold, Twitter ID, Customer's Tweet, Company's Twitter ID.
  • 5. Big Data Technologies. Chart of data technologies over time. Technologies: Aster Data, Cassandra, Greenplum, Voldemort, Big Table, CouchDB, Hadoop. Approaches: Single RDBMS, Bigger RDBMS, Sharding RDBMS, SQL & Map/Reduce, NoSQL, Map/Reduce. Data structure: Relational Database (highly structured); Key/Value, Tables or Other (semi-structured); Temporal, Unstructured, Heterogeneous. Horizontal axis: Time.
  • 6. Splunk Turns Machine Data into Real-time Insights. Optimized for real-time, low latency and interactivity. Real-time Collection and Indexing; Ad hoc search; Monitor and alert; Report and analyze; Custom dashboards; Developer Platform; Splunk storage; Other Stores.
  • 7. Splunk Collects and Indexes Any Machine Data. No upfront schema. No RDBMS. No custom connectors. Customer Facing Data: Click-stream data; Shopping cart data; Online transaction data. Outside the Datacenter: Manufacturing, logistics…; CDRs & IPDRs; Power consumption; RFID data; GPS data. Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets. Windows: Registry; Event logs; File system; sysinternals. Linux/Unix: Configurations; syslog; File system; ps, iostat, top. Virtualization & Cloud: Hypervisor; Guest OS, Apps; Cloud. Applications: Web logs; Log4J, JMS, JMX; .NET events; Code and scripts. Databases: Configurations; Audit/query logs; Tables; Schemas. Networking: Configurations; syslog; SNMP; netflow.
  • 8. New Approach to Analyzing Heterogeneous Data. Universal Indexing: No data normalization; Automatically handles timestamps; Parsers not required; Index every term & pattern "blindly"; No attempt to "understand" up front. Late Structure Binding: Knowledge applied at search-time; No brittle schema to work around; Multiple views into the same data. Analysis and Visualization: Normalization as it's needed; Faster implementation; Easy search language; Multiple views into the same data; Find transactions, patterns and trends. Rapid time-to-deploy: hours or days.
  • 9. Splunk Search Processing Language. Lots of random "hypothetical examples" from our Mugs.
  • 10. Operational Intelligence for IT and Business Users. IT Operations Management; Web Intelligence; Application Management; Business Analytics; Security & Compliance. Users: Customer Support; LOB Owners/Executives; Operations Teams; Website/Business Analysts; System Administrator; IT Executives; Development Teams; Security Analysts; Auditors.
  • 11. Scalability to Tens of TBs/Day on Commodity Servers. Offload search load to Splunk Search Heads. Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day. Send data from 1000s of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols.
  • 12. Splunk Big Data Solution. Product-based Solution: Easy to download and deploy; Pre-integrated, end-to-end functionality; Enterprise-grade features. Integrated and End-to-end: Collects data from tens of thousands of sources; Advanced real-time and historical analysis of data; Fast, custom visualizations for IT and business users; Developer APIs, SDKs. Performance at scale: Proven at multi-terabyte scale per day; Upwards of PB under management; 4,000+ customers.
  • 13. Accelerate Game Releases with Big Data Insight. Splunk Use: Over 10 TB/day from scaled-out cloud and physical infrastructure; Data indexed includes web server and application logs for games; Splunk for operational visibility, troubleshooting and monitoring; Users include game operations, developers, and corporate IT. Value Delivered: Faster game releases with real-time visibility into production issues; Reduced fault resolution time from hours to minutes; Scale ops team to manage and monitor growing infrastructure. Profile: Leading social gaming company globally; 232 million monthly active users; 60 million daily active users.
  • 14. Launched in November 2008; Over 33 million active customers (as of December 2011); More than 11,000 employees worldwide; Active in 48 countries; Running over 1,000 deals/day worldwide.
  • 15. Daily Uses of Splunk. Key Activities: Guarantee API performance; Monitor API data usage; Early access to key business metrics (conversions, funnel, etc.); End-to-end testing; Ad hoc troubleshooting. Splunk Use Cases: All log data is available through Splunk; Dashboards; Notifications; Near real-time. "Cannot have a server that is not sending data into Splunk."
  • 16. Dashboards
  • 17. Complementing BI and Hadoop. Collection & Operational Intelligence: Daily, weekly, monthly metrics across promotions, offers and acceptance rates; Application Performance Management (APM) and system availability. Hadoop Integration: Machine Data ETL, highly reliable data delivery to HDFS. Data Archival & Batch Data Science: Long-term data warehousing and specialized, batch analytics.
  • 18. Turning Big Data Into Operational Insights at Expedia
  • 19. Who Am I? Eddie Satterly, formerly Sr. Director, Architecture & Engineering, Expedia. Who Is Expedia? The World's Largest Travel Site; Discount travel site Hotwire®; First $1B Quarter in 2011; 4,000+ Technology Workers; Development Team of 1,800; 90 localized Expedia.com® and Hotels.com® sites; NASDAQ: (EXPE).
  • 20. Where Splunk Comes In. 12,000+ Servers; 27,000+ Hosts; 1,000+ Source Types; 227,000 Sources. 38 Indexers, 16 Search heads; > 6.5TB per day indexed. 20+ Different Solutions for RCA, All Migrated to Splunk in 3 Months.
  • 21. Why Splunk? SDK Integrations built for Cassandra data stores; Archiving Data to Hadoop for batch analysis; Speed of Deployment; Splunkbase Apps Available for Download; Scales via Commodity Hardware; Developers Build Custom Apps and Dashboards; Aggregation of Log Data from Any Device; Simple UI for IT and Business Users.
  • 22. Splunk Adoption Over Ten Months. Initial Pilot (use case: Business Unit; data: 125GB/day; systems: 1100; deployment: Jan. 2011). Viral Growth from Demonstrated Value (use case: Ecommerce Systems; data: 1.8TB/day; systems: 8700; deployment: March 2011). All Devices, All Data Centers (use case: All Devices; data: ~4TB/day; systems: ~21000; deployment: Aug. 2011). Big Data Integration (use case: App Transactions; data: 3TB/day, 90TB per month; deployment: 1Q12-2Q12).
  • 23. Integrate External Data. Extend search with lookups to external data sources: LDAP, AD; Watch Lists; CMDB; Message Stores; Reference Lookups. Correlate across multiple data sources and data sets using indexes and keys.
  • 24. Unique Characteristics of Splunk MapReduce: Real-time temporal MapReduce; Preview in-progress searches; Searching works on any device; Simplified Search Language.
  • 25. Splunk Impact / Top Takeaways. Splunk helped deliver Expedia an annual ROI of over $11 Million. ROI = 5x original Business Case: Tools Consolidation and Retirement; 83% MTTR Reduction; Outage Avoidance. Splunk usage is viral: 50+ Apps Developed by Our Team; Over 1,400 Users on a Regular Basis. More data = more benefits: Adding more data to Splunk via weekly deployments; Analyzing more data sets in Splunk UI from Hadoop & Cassandra.
  • 27. Sessions will resume at 11:25am
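Slide 4 rests on the observation that identifiers such as Customer ID and Order ID recur across otherwise unrelated sources, so events can be grouped on them. The Python sketch below illustrates that idea under stated assumptions: events are already parsed into dicts, and the sample events and field names (`source`, `customer_id`, `order_id`, `hold_seconds`, `tweet`) are hypothetical. It is not Splunk's correlation mechanism.

```python
from collections import defaultdict

# Hypothetical parsed events from the four sources named on slide 4.
# Assumes the Twitter handle was already mapped to a customer ID.
events = [
    {"source": "order_processing", "customer_id": "C42", "order_id": "O7", "product_id": "P1"},
    {"source": "middleware_error", "customer_id": "C42", "order_id": "O7"},
    {"source": "care_ivr", "customer_id": "C42", "hold_seconds": 540},
    {"source": "twitter", "customer_id": "C42", "tweet": "Still waiting on my order"},
    {"source": "order_processing", "customer_id": "C99", "order_id": "O8", "product_id": "P2"},
]


def correlate(events, key):
    """Group events from any source by the value of a shared field."""
    groups = defaultdict(list)
    for event in events:
        if key in event:  # events without the field are left out
            groups[event[key]].append(event)
    return dict(groups)


by_customer = correlate(events, "customer_id")
by_order = correlate(events, "order_id")
for customer, evs in by_customer.items():
    print(customer, [e["source"] for e in evs])
```

Grouping on `customer_id` pulls the order, the middleware error, the IVR hold and the tweet for `C42` into one timeline, which is the kind of cross-source view the slide describes.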
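Slide 8 contrasts indexing every term up front with applying structure only at search time. A minimal Python sketch of those two ideas, assuming whitespace/`=` tokenization and `key=value` extraction (both hypothetical choices, not Splunk's actual segmentation or field-extraction rules); the sample events are also hypothetical:

```python
import re
from collections import defaultdict

# Hypothetical raw events; no schema is declared up front.
raw_events = [
    "2012-06-13 10:01:02 host=web01 status=500 path=/checkout",
    "2012-06-13 10:01:05 host=web02 status=200 path=/home",
    "2012-06-13 10:01:09 host=web01 status=200 path=/home",
]


def build_index(events):
    """Universal indexing: record every token of every event, without parsing."""
    index = defaultdict(set)
    for event_id, text in enumerate(events):
        for token in re.split(r"[\s=]+", text):
            if token:
                index[token.lower()].add(event_id)
    return index


def search(index, events, *terms):
    """Return events containing all of the terms (AND semantics)."""
    ids = set(range(len(events)))
    for term in terms:
        ids &= index.get(term.lower(), set())
    return [events[i] for i in sorted(ids)]


def extract_fields(event):
    """Late binding: pull key=value fields out only when a search needs them."""
    return dict(re.findall(r"(\w+)=(\S+)", event))


index = build_index(raw_events)
hits = search(index, raw_events, "web01", "200")
print([extract_fields(e)["path"] for e in hits])  # ['/home']
```

Because `extract_fields` runs only on the hits, a different extraction could be applied to the same indexed events later, which is the sense in which structure is bound late.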
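Slide 9 names the Search Processing Language without showing it. SPL chains commands with `|`, each consuming the previous command's results, as in `error | stats count by host`. The Python sketch below mimics that pipeline shape over already-parsed events; the helper names (`search`, `stats_count_by`, `pipeline`) and the sample events are hypothetical and only approximate SPL semantics.

```python
from collections import Counter
from functools import reduce

# Hypothetical events, already parsed into fields.
events = [
    {"host": "web01", "level": "ERROR", "msg": "timeout"},
    {"host": "web02", "level": "INFO", "msg": "ok"},
    {"host": "web01", "level": "ERROR", "msg": "refused"},
    {"host": "web03", "level": "ERROR", "msg": "timeout"},
]


def search(term):
    """Stage: keep events whose field values mention the term (case-insensitive)."""
    def stage(rows):
        return [r for r in rows if any(term.lower() in str(v).lower() for v in r.values())]
    return stage


def stats_count_by(field):
    """Stage: count rows per distinct value of a field, sorted by that value."""
    def stage(rows):
        counts = Counter(r[field] for r in rows if field in r)
        return [{field: k, "count": n} for k, n in sorted(counts.items())]
    return stage


def pipeline(rows, *stages):
    """Feed each stage's output into the next, like commands joined by '|'."""
    return reduce(lambda acc, stage: stage(acc), stages, rows)


result = pipeline(events, search("error"), stats_count_by("host"))
print(result)  # [{'host': 'web01', 'count': 2}, {'host': 'web03', 'count': 1}]
```

Representing each command as a function from rows to rows keeps stages composable, so new commands can be added without changing the pipeline driver.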
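Slide 11 describes forwarders spreading data automatically across as many indexers as needed. A simple way to picture this is round-robin distribution that skips unavailable indexers. The Python sketch below assumes that policy and uses hypothetical indexer names (`idx1` to `idx3`); Splunk forwarders' actual load-balancing behavior may differ.

```python
from itertools import cycle


class Forwarder:
    """Round-robin forwarding across indexers, skipping ones marked down.

    Hypothetical model: 'sending' just appends the batch to a per-indexer list.
    """

    def __init__(self, indexers):
        self.indexers = list(indexers)
        self._ring = cycle(self.indexers)
        self.down = set()
        self.received = {name: [] for name in self.indexers}

    def send(self, batch):
        # Try each indexer at most once per batch before giving up.
        for _ in range(len(self.indexers)):
            target = next(self._ring)
            if target not in self.down:
                self.received[target].append(batch)
                return target
        raise RuntimeError("no indexer available")


fwd = Forwarder(["idx1", "idx2", "idx3"])
targets = [fwd.send(f"batch-{i}") for i in range(6)]
fwd.down.add("idx2")  # simulate one indexer going offline
targets += [fwd.send(f"batch-{i}") for i in range(6, 9)]
print(targets)
```

Adding capacity in this model is just a longer indexer list, which mirrors the slide's claim that indexers can be added as daily volume grows.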
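Slide 20 reports more than 6.5TB per day indexed on 38 indexers. Converting that figure into rates gives a rough lower bound on per-indexer load. The Python arithmetic below assumes decimal units (1 TB = 10^12 bytes) and a perfectly even spread across indexers, neither of which the slide states.

```python
SECONDS_PER_DAY = 86_400


def ingest_rates(tb_per_day, indexers):
    """Convert a daily volume into aggregate and per-indexer rates.

    Assumes decimal units (1 TB = 10**12 bytes) and an even spread across indexers.
    """
    bytes_per_day = tb_per_day * 10**12
    aggregate_mb_s = bytes_per_day / SECONDS_PER_DAY / 10**6
    per_indexer_gb_day = bytes_per_day / indexers / 10**9
    per_indexer_mb_s = aggregate_mb_s / indexers
    return aggregate_mb_s, per_indexer_gb_day, per_indexer_mb_s


agg, gb_day, mb_s = ingest_rates(6.5, 38)
print(f"{agg:.1f} MB/s total, {gb_day:.0f} GB/day and {mb_s:.2f} MB/s per indexer")
```

Under these assumptions it prints `75.2 MB/s total, 171 GB/day and 1.98 MB/s per indexer`; since the slide says "> 6.5TB", the real figures are at least this high.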
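Slide 23 describes extending search with lookups to external sources such as a CMDB or a watch list. Conceptually this is a left join keyed on an event field. A Python sketch, assuming a CMDB export keyed by host and a watch list of IP addresses; all table contents and field names are hypothetical:

```python
# Hypothetical lookup table, e.g. exported from a CMDB: host -> owner and tier.
cmdb = {
    "web01": {"owner": "ecommerce", "tier": "frontend"},
    "db01": {"owner": "dba", "tier": "storage"},
}

# Hypothetical watch list of IP addresses.
watch_list = {"203.0.113.7"}


def enrich(events, table, key):
    """Left join: copy each event and merge in table fields when its key matches."""
    out = []
    for event in events:
        merged = dict(event)  # copy so the original events are not modified
        merged.update(table.get(event.get(key), {}))
        out.append(merged)
    return out


events = [
    {"host": "web01", "src_ip": "198.51.100.4", "status": 500},
    {"host": "db01", "src_ip": "203.0.113.7", "status": 200},
    {"host": "web09", "src_ip": "192.0.2.1", "status": 404},
]

enriched = enrich(events, cmdb, "host")
flagged = [e["host"] for e in enriched if e["src_ip"] in watch_list]
print(enriched[0]["owner"], flagged)  # ecommerce ['db01']
```

Events with no matching key (here `web09`) pass through unchanged, so a missing lookup entry never drops data from the search results.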
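Slide 24 lists real-time temporal MapReduce and previews of in-progress searches. One way to picture the combination: map events into time buckets, reduce each arriving chunk, merge it into running totals, and expose the totals after every chunk. The Python sketch below assumes per-minute error counts over hypothetical `(epoch_seconds, level)` events; it illustrates the idea and is not Splunk's implementation.

```python
from collections import Counter


def map_event(event):
    """Map: emit (minute bucket, 1) for each error event, nothing otherwise."""
    ts, level = event
    if level == "ERROR":
        yield ts - ts % 60, 1


def streaming_mapreduce(chunks):
    """Reduce each chunk, merge it into running totals, and yield a preview."""
    totals = Counter()
    for chunk in chunks:
        partial = Counter()
        for event in chunk:
            for key, value in map_event(event):
                partial[key] += value
        totals.update(partial)  # merge this chunk's partial reduce into the running state
        yield dict(totals)  # in-progress preview; the last one is the final result


# Hypothetical (epoch seconds, level) events arriving in three chunks.
chunks = [
    [(0, "ERROR"), (30, "INFO"), (45, "ERROR")],
    [(61, "ERROR"), (70, "INFO")],
    [(125, "ERROR"), (130, "ERROR")],
]

for preview in streaming_mapreduce(chunks):
    print(preview)
```

The first preview is `{0: 2}` and the last is `{0: 2, 60: 1, 120: 2}`; a user watching the previews sees counts fill in while the search is still running.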