SlideShare uma empresa Scribd logo
1 de 46
Lily
A SMART DATA PLATFORM
MAKING BIG DATA APPS EASY



     IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
volume
                                                            data

                            need for
                           distributed
                           processing
                                                                        moore




                                                                             time




         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   2
need for
distributed
processing




    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   3
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   4
distributed
systems are
hard.
  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
data shuffling, data duplication



  database                     data warehouse                          analytics

      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org    6
“Top-performing
         organizations are twice
         as likely to apply
         analytics to activities.”

         (MIT Sloan Management
         Review, Winter 2011)



IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   7
Heavy Math?
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   10
                                                                             8
lots of
data?
   IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   11
                                                                               9
what drives insights?




      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   10
your audience does.




IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   11
data

                                                              audience data
recommendations



                                        insights
      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   12
LILY combines
scalable storage,
indexing and search
with real-time usage
metrics, insights and
recommendations.
   IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
...on top of CDH

       Cloudera’s Distribution including Apache Hadoop




  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
lily features

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
schema
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   18
                                                                            16
lily data model
» adds a high-level data model on top of HBase’s byte[ ]’s




        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   17
value types
» basic value types                                      » datetime

 » string                                                » blob

 » integer                                               » uri

 » long                                               » parametrized value types
 » double                                                » list
 » decimal                                               » path
 » boolean                                               » link
 » date                                                  » record

             IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   18
why a schema?

» contract for client apps
» validation
» application lifecycle mgmt
  » schema versioning
  » cfr. Avro
» content-based indexing



          IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   19
versioning
    IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   17
                                                                                20
record version                                            schema version




IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   21
Versioning scopes




  data gets overwritten               versioned data                version status data



          IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org       22
API
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   23
builder API
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   24
builder API
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   25
other apis



                                                               Java Builder
                                                                                      REST API
» REST (HTTP + json)                                               API

» Java (Avro)                                                              Java API




        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                26
indexing
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   27
                                                                            19
features

» indexing configuration
» denormalization + link dereferencing
» indexing of multiple versions
» incremental and batch (MR) index updates
» blob content extraction (Apache Tika)
» Solr index sharding (!)



         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   28
indexer configuration




     IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   29
search



  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   20
                                                                              30
simple:
 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   31
consistency


  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   21
                                                                              32
consistencY

» many things can go wrong
 » service failure: HBase, HDFS, Solr, Lily
 » network failure, time skew
 » node failure
» no Lily master node
 » services can pick up where died ones left
 » Zookeeper as service coordinator + lookup



         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   33
architecture

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
High-level data model / easy API                           indexes



    UI Framework                                  SDK
            (HUE)                               (HUE SDK)

                                                                         Search
                                                                                               Dev2Dev
  Workflow               Scheduling                    Metadata                               tutoring,
    (OOZIE)                  (oozie)                    (HIVE)                               integrated
                                                                                             deployment
                                                                                                 and
                         Languages /                                                         enterprise
    Data                  Compilers                       Fast       usage metrics,            support
Integration                (PIG, HIVE)                 Read/Write     analytics &
  (FLUME,                                                Access        recommen-
  SQOOP)                                                 (HBASE)        dations
                           (PIG, HIVE)



                                       Coordination
                                         (ZOOKEEPER)


                                                                            CDH
                 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                35
falling in love with Hbase
  HBase = Google BigTable, open source
» datamodel with column families and cell versioning: flexible, for sparse data
» ordered tables with range scans
» HDFS for blob storage
» Apache
» consistent!
» atomic single-row updates
» automatic scaling to large data sets
» fault-tolerance
» commodity hardware
» efficient random access M/R for index regeneration

            IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   36
Lily Architecture
(deployment)




           IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   37
Lily Architecture
                    (components)




                                   IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   38
HBase indexing & RowLog Library
» building and querying                                          » need for sync/async
 indexes, GAE-style                                                 operations
                                                                    » updating of secondary indexes
               rowkey            col          col
 content
                   A             val3         foo6                      (e.g. link tables)
   table
                   B             val2         foo7
                                                                    » feeding of Indexer
                                                                        (= indexes Lily-content into Solr)
                        rowkey          col                      » not: transactions
           order




   index
 table A                val2-B
                        val3-A                                   » need for distribution and
                                                                    durability

                        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org            39
WHERE?



                  www.lilyproject.org




IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   40
lily enterprise


» adds tools:
 » Redhat/Debian package repo
 » cluster deploy tools
   (based on Whirr)
 » Administration UI
» + enterprise support



         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   41
food for
(outer)thought


  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
from analytics to recommendations
» shorten distance between transactional and analytical aspects
» (near) real-time analytics > Insights
  » "people are buying this now"
» algorithmical feedback > data-backed feedback
  » based on Insights > recommendations
  » shorten feedback cycle
» store data + metadata + attention data in one data system
  » basis for analytical queries
  » single point of growth
  » incremental insights

           IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   43
challenges ahead
» solving real-time aspects
 » in a distributed environment
» terra-byte-sized snapshots & backups?
 » build resilience into data architecture
   (i.e. against operator malfunction)
» presenting bigdata insights: new UIs
 » less static / report-driven
 » more 'Minority Report'
 » from near-real-time data to real-time decision systems

         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   44
conclusion: 4 trends


» real-time is rapidly becoming really important
» agility with complex/dynamic information
» predictive analysis
» one store for operational data + analytics




        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   45
Thank you !
                               for your attention
                               for your questions

                               » stevenn@outerthought.org

                               »           @stevenn

  IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

Mais conteúdo relacionado

Destaque

Menasa technical brochure sea
Menasa technical brochure seaMenasa technical brochure sea
Menasa technical brochure seaKryzka De Guzman
 
Grow and Retain Users with Analytics and Push Notifications
Grow and Retain Users with Analytics and Push NotificationsGrow and Retain Users with Analytics and Push Notifications
Grow and Retain Users with Analytics and Push NotificationsAmazon Web Services
 
Get Smart About Rewarded Video
Get Smart About Rewarded VideoGet Smart About Rewarded Video
Get Smart About Rewarded VideoSmart AdServer
 
Notiplastic Febrero 2015
Notiplastic Febrero 2015Notiplastic Febrero 2015
Notiplastic Febrero 2015avipla
 
Plan de Drenage Urbano de Puerto Alegre Brasil del Dr. Carlos Tucci
Plan de Drenage Urbano de Puerto Alegre Brasil del Dr. Carlos TucciPlan de Drenage Urbano de Puerto Alegre Brasil del Dr. Carlos Tucci
Plan de Drenage Urbano de Puerto Alegre Brasil del Dr. Carlos TucciAtilio José Zaldívar Ramírez
 
Qué es un email
Qué es un emailQué es un email
Qué es un emailColegioFebe
 
Revista IAI. Presentación Riesgos
Revista IAI. Presentación RiesgosRevista IAI. Presentación Riesgos
Revista IAI. Presentación Riesgospaco74
 
Revista NBE- New Business Enterprises - Edición Mayo '14
Revista NBE- New Business Enterprises - Edición Mayo '14Revista NBE- New Business Enterprises - Edición Mayo '14
Revista NBE- New Business Enterprises - Edición Mayo '14Gabriel Carlos Mancisidor
 
Netcitizens vietnam report en
Netcitizens vietnam report enNetcitizens vietnam report en
Netcitizens vietnam report enDung Do
 
Oekonomie-Digitalisierung
Oekonomie-DigitalisierungOekonomie-Digitalisierung
Oekonomie-DigitalisierungFESD GKr
 
Metodo suzuki v. 1
Metodo suzuki v. 1Metodo suzuki v. 1
Metodo suzuki v. 1Lucas Gomes
 
Informe de resultados para concursos online
Informe de resultados para concursos online Informe de resultados para concursos online
Informe de resultados para concursos online Convierte Más
 
12 Impresionante-(www.menudospeques.net)
12 Impresionante-(www.menudospeques.net)12 Impresionante-(www.menudospeques.net)
12 Impresionante-(www.menudospeques.net)Menudos Peques
 

Destaque (19)

Quipo n° 1
Quipo n° 1Quipo n° 1
Quipo n° 1
 
Clases de videojuegos
Clases de videojuegosClases de videojuegos
Clases de videojuegos
 
The insight group
The insight groupThe insight group
The insight group
 
Menasa technical brochure sea
Menasa technical brochure seaMenasa technical brochure sea
Menasa technical brochure sea
 
Grow and Retain Users with Analytics and Push Notifications
Grow and Retain Users with Analytics and Push NotificationsGrow and Retain Users with Analytics and Push Notifications
Grow and Retain Users with Analytics and Push Notifications
 
Get Smart About Rewarded Video
Get Smart About Rewarded VideoGet Smart About Rewarded Video
Get Smart About Rewarded Video
 
Notiplastic Febrero 2015
Notiplastic Febrero 2015Notiplastic Febrero 2015
Notiplastic Febrero 2015
 
Felinos
FelinosFelinos
Felinos
 
Plan de Drenage Urbano de Puerto Alegre Brasil del Dr. Carlos Tucci
Plan de Drenage Urbano de Puerto Alegre Brasil del Dr. Carlos TucciPlan de Drenage Urbano de Puerto Alegre Brasil del Dr. Carlos Tucci
Plan de Drenage Urbano de Puerto Alegre Brasil del Dr. Carlos Tucci
 
Qué es un email
Qué es un emailQué es un email
Qué es un email
 
Revista IAI. Presentación Riesgos
Revista IAI. Presentación RiesgosRevista IAI. Presentación Riesgos
Revista IAI. Presentación Riesgos
 
Revista NBE- New Business Enterprises - Edición Mayo '14
Revista NBE- New Business Enterprises - Edición Mayo '14Revista NBE- New Business Enterprises - Edición Mayo '14
Revista NBE- New Business Enterprises - Edición Mayo '14
 
Netcitizens vietnam report en
Netcitizens vietnam report enNetcitizens vietnam report en
Netcitizens vietnam report en
 
Oekonomie-Digitalisierung
Oekonomie-DigitalisierungOekonomie-Digitalisierung
Oekonomie-Digitalisierung
 
Metodo suzuki v. 1
Metodo suzuki v. 1Metodo suzuki v. 1
Metodo suzuki v. 1
 
MR101 Combined
MR101 CombinedMR101 Combined
MR101 Combined
 
Informe de resultados para concursos online
Informe de resultados para concursos online Informe de resultados para concursos online
Informe de resultados para concursos online
 
12 Impresionante-(www.menudospeques.net)
12 Impresionante-(www.menudospeques.net)12 Impresionante-(www.menudospeques.net)
12 Impresionante-(www.menudospeques.net)
 
Informe OAC i Pla d'Atenció Ciutadana
Informe OAC i Pla d'Atenció CiutadanaInforme OAC i Pla d'Atenció Ciutadana
Informe OAC i Pla d'Atenció Ciutadana
 

Semelhante a Hadoop World 2011: Lily: Smart Data at Scale, Made Easy

Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionNGDATA
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily PartnershipsNGDATA
 
Lily @ Work Webinar
Lily @ Work WebinarLily @ Work Webinar
Lily @ Work WebinarNGDATA
 
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...Sirris
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNGDATA
 
N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseNGDATA
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaNGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyNGDATA
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)NGDATA
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog libraryNGDATA
 
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Ageクラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native AgeYoichi Kawasaki
 
From Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataFrom Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataNGDATA
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UKNGDATA
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...HostedbyConfluent
 
Flux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSFlux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSIvo Andreev
 
MongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB
 
Interop 2017 - Managing Containers in Production
Interop 2017 - Managing Containers in ProductionInterop 2017 - Managing Containers in Production
Interop 2017 - Managing Containers in ProductionBrian Gracely
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...dgarijo
 
Our Brave Modular Future
Our Brave Modular FutureOur Brave Modular Future
Our Brave Modular FutureOrchestrate
 

Semelhante a Hadoop World 2011: Lily: Smart Data at Scale, Made Easy (20)

Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC edition
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily Partnerships
 
Lily @ Work Webinar
Lily @ Work WebinarLily @ Work Webinar
Lily @ Work Webinar
 
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
Sirris innovate2011 - Lily, Smart Data at scale made easy, Steven Noels, Oute...
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG Luxembourg
 
N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the rise
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in Java
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog library
 
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Ageクラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
クラウドネイティブ時代の分散トレーシング - Distributed Tracing in a Cloud Native Age
 
From Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart DataFrom Content Storage to Scaling Smart Data
From Content Storage to Scaling Smart Data
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UK
 
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
Safer Commutes & Streaming Data | George Padavick, Ohio Department of Transpo...
 
Huguk lily
Huguk lilyHuguk lily
Huguk lily
 
Flux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSFlux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JS
 
MongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB and the Internet of Things
MongoDB and the Internet of Things
 
Interop 2017 - Managing Containers in Production
Interop 2017 - Managing Containers in ProductionInterop 2017 - Managing Containers in Production
Interop 2017 - Managing Containers in Production
 
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...Scientific Software Registry Collaboration Workshop: From Software Metadata r...
Scientific Software Registry Collaboration Workshop: From Software Metadata r...
 
Our Brave Modular Future
Our Brave Modular FutureOur Brave Modular Future
Our Brave Modular Future
 

Mais de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mais de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Hadoop World 2011: Lily: Smart Data at Scale, Made Easy

  • 1. Lily A SMART DATA PLATFORM MAKING BIG DATA APPS EASY IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 2. volume data need for distributed processing moore time IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 2
  • 3. need for distributed processing IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 3
  • 4. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 4
  • 5. distributed systems are hard. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 6. data shuffling, data duplication database data warehouse analytics IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 6
  • 7. “Top-performing organizations are twice as likely to apply analytics to activities.” (MIT Sloan Management Review, Winter 2011) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 7
  • 8. Heavy Math? IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 10 8
  • 9. lots of data? IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 11 9
  • 10. what drives insights? IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 10
  • 11. your audience does. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 11
  • 12. data audience data recommendations insights IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 12
  • 13. LILY combines scalable storage, indexing and search with real-time usage metrics, insights and recommendations. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 14. ...on top of CDH Cloudera’s Distribution including Apache Hadoop IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 15. lily features IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 16. schema IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 18 16
  • 17. lily data model » adds a high-level data model on top of HBase’s byte[ ]’s IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17
  • 18. value types » basic value types » datetime » string » blob » integer » uri » long » parametrized value types » double » list » decimal » path » boolean » link » date » record IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 18
  • 19. why a schema? » contract for client apps » validation » application lifecycle mgmt » schema versioning » cfr. Avro » content-based indexing IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 19
  • 20. versioning IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17 20
  • 21. record version schema version IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 21
  • 22. Versioning scopes data gets overwritten versioned data version status data IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 22
  • 23. API IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 23
  • 24. builder API IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 24
  • 25. builder API IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 25
  • 26. other apis Java Builder REST API » REST (HTTP + json) API » Java (Avro) Java API IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 26
  • 27. indexing IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 27 19
  • 28. features » indexing configuration » denormalization + link dereferencing » indexing of multiple versions » incremental and batch (MR) index updates » blob content extraction (Apache Tika) » Solr index sharding (!) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 28
  • 29. indexer configuration IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 29
  • 30. search IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 20 30
  • 31. simple: IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 31
  • 32. consistency IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 21 32
  • 33. consistencY » many things can go wrong » service failure: HBase, HDFS, Solr, Lily » network failure, time skew » node failure » no Lily master node » services can pick up where died ones left » Zookeeper as service coordinator + lookup IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 33
  • 34. architecture IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 35. High-level data model / easy API indexes UI Framework SDK (HUE) (HUE SDK) Search Dev2Dev Workflow Scheduling Metadata tutoring, (OOZIE) (oozie) (HIVE) integrated deployment and Languages / enterprise Data Compilers Fast usage metrics, support Integration (PIG, HIVE) Read/Write analytics & (FLUME, Access recommen- SQOOP) (HBASE) dations (PIG, HIVE) Coordination (ZOOKEEPER) CDH IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 35
  • 36. falling in love with Hbase HBase = Google BigTable, open source » datamodel with column families and cell versioning: flexible, for sparse data » ordered tables with range scans » HDFS for blob storage » Apache » consistent! » atomic single-row updates » automatic scaling to large data sets » fault-tolerance » commodity hardware » efficient random access M/R for index regeneration IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 36
  • 37. Lily Architecture (deployment) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 37
  • 38. Lily Architecture (components) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 38
  • 39. HBase indexing & RowLog Library » building and querying » need for sync/async indexes, GAE-style operations » updating of secondary indexes rowkey col col content A val3 foo6 (e.g. link tables) table B val2 foo7 » feeding of Indexer (= indexes Lily-content into Solr) rowkey col » not: transactions order index table A val2-B val3-A » need for distribution and durability IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 39
  • 40. WHERE? www.lilyproject.org IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 40
  • 41. lily enterprise » adds tools: » Redhat/Debian package repo » cluster deploy tools (based on Whirr) » Administration UI » + enterprise support IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 41
  • 42. food for (outer)thought IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
  • 43. from analytics to recommendations » shorten distance between transactional and analytical aspects » (near) real-time analytics > Insights » "people are buying this now" » algorithmical feedback > data-backed feedback » based on Insights > recommendations » shorten feedback cycle » store data + metadata + attention data in one data system » basis for analytical queries » single point of growth » incremental insights IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 43
  • 44. challenges ahead » solving real-time aspects » in a distributed environment » terra-byte-sized snapshots & backups? » build resilience into data architecture (i.e. against operator malfunction) » presenting bigdata insights: new UIs » less static / report-driven » more 'Minority Report' » from near-real-time data to real-time decision systems IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 44
  • 45. conclusion: 4 trends » real-time is rapidly becoming really important » agility with complex/dynamic information » predictive analysis » one store for operational data + analytics IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 45
  • 46. Thank you ! for your attention for your questions » stevenn@outerthought.org » @stevenn IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org