SlideShare uma empresa Scribd logo
1 de 26
Baixar para ler offline
Smart data,
                 Lily                                                                        at scale
                                                                                             madE easy




                      from content storage
                      to scaling smart data


                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

maandag 6 juni 2011
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   2

maandag 6 juni 2011
the pain

                                                                    data


                                             need for
                                            distributed
                                            processing
                                                                             moore




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   3

maandag 6 juni 2011
the pain

              » growth of data sets
              » smart businesses need
                to apply analytics to                                            Smart data,
                activities
                                                                                 at scale
              » doing business online
                means real-time
                                                                                 madE easy
              » talent shortage


                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   4

maandag 6 juni 2011
LILY


                  The Real-time Platform built for the Age of Data.

                  We manage, track and measure your data and users,
                  and do the mat(c)hmaking in-between:
                  » provide you with business intelligence and analytics
                  » harvest user profiles and learn their interests
                  » dynamically engage your users using quality recommendations



                         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   5

maandag 6 juni 2011
where would you use lily?
        » large collections of data                              » large groups of users
             » content repositories                                 » e-commerce / retail
             » library catalogs                                     » news / media
             » (media) asset management
             » product catalogs
             » ‘live’ archives

                                                                 » ... if you want to use big
                                                                    data, but you need easy.
                        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   6

maandag 6 juni 2011
ns
                                                                           pe
                                                                           ap
                                                                      gic h
                                                                   ma
                                                                  he
                                                                  t
                                                               re
                                                             he
                                                          sw
                                                          si
                                                     +
                                                       thi




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   7

maandag 6 juni 2011
beyond content management

                                                                    marketing
                            broadcast




                                                                     revenue

                                                               product / service


                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   8

maandag 6 juni 2011
beyond content management: data + analytics

                                                              recommendations
                         call to action



                      personalised

                                                                     revenue

                                                               product / service
                                                         audience data
                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   9

maandag 6 juni 2011
LILY 2.0: smart data
                                      SMARTER DATA                 data processing
                                                  s
                                          relation
                                                                     recommendations
                                                                     semantic augmentation
                                                                     Analytics




                                                          usage
                                                         metrics              domain
                                                                            knowledge
                                                                              patterns
                                                                              rules
                                                                              keywords
                                                                              lists
                                                                              ...




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   10

maandag 6 juni 2011
roadmap
              » now: highly-scalable data repository: store, index and search
              » next: with real-time usage stats gathering and analytics
              » later: and built-in context- and user-sensitive
                  recommendations

              » built on top of Google BigTable / HBase / Solr
                  » identical, robust technology in use at Facebook, Twitter,
                      StumbleUpon, Yahoo!
                  » scales widely over distributed (cloud) infrastructure

                           IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   11

maandag 6 juni 2011
Lily Repository Model




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   12

maandag 6 juni 2011
Sample Lily Schema (excerpt)
                                                                      

{
     namespaces:
{
                                                                      



name:
"b$name",
     



/*
Declaration
of
namespace
prefixes.
*/
                                                                      



valueType:
{
primitive:
"STRING"
},
     



"org.lilyproject.bookssample":
"b",
                                                                      



scope:
"versioned"
     



"org.lilyproject.vtag":
"vtag"
                                                                      

},
     

},
                                                                      

{
     fieldTypes:
[
                                                                      



name:
"b$bio",
     

{
                                                                      



valueType:
{
primitive:
"STRING"
},
     



name:
"b$title",
                                                                      



scope:
"versioned"
     



valueType:
{
primitive:
"STRING"
},
                                                                      

},
     



scope:
"versioned"
                                                                      

{
     

},
                                                                      



name:
"vtag$last",
     

{
                                                                      



valueType:
{
primitive:
"LONG"
},
     



name:
"b$pages",
                                                                      



scope:
"non_versioned"
     



valueType:
{
primitive:
"INTEGER"
},
                                                                      

}
     



scope:
"versioned"
                                                                      

],
     

},
                                                                      recordTypes:
[
     

{
                                                                      

{
     



name:
"b$language",
                                                                      



name:
"b$Book",
     



valueType:
{
primitive:
"STRING"
},
                                                                      



fields:
[
     



scope:
"versioned"
                                                                      





{name:
"b$title",
mandatory:
true
},
     

},
                                                                      





{name:
"b$pages",
mandatory:
false
},
     

{
                                                                      





{name:
"b$language",
mandatory:
false
},
     



name:
"b$authors",
                                                                      





{name:
"b$authors",
mandatory:
false
},
     



valueType:
{
primitive:
"LINK",
multiValue:
true
},
                                                                      





{name:
"vtag$last",
mandatory:
false
}
     



scope:
"versioned"
                                                                      



]
     

},
                                                                      

},

                                                                      ...


                         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                     13

maandag 6 juni 2011
Lily Architecture
             (deployment)




                        IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   14

maandag 6 juni 2011
Lily Architecture
                           (components)




                                          IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   15

maandag 6 juni 2011
HBase indexing & RowLog Library
        » building and querying                                             » need for sync/async
            indexes, GAE-style                                                 operations
                                                                               » updating of secondary indexes
                          rowkey            col          col
            content
                              A             val3         foo6                      (e.g. link tables)
              table
                              B             val2         foo7
                                                                               » feeding of Indexer
                                                                                   (= indexes Lily-content into Solr)
                                   rowkey          col                      » not: transactions
                      order




              index
            table A                val2-B
                                   val3-A                                   » need for distribution and
                                                                               durability

                                   IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org            16

maandag 6 juni 2011
The Lily Indexer

                                                                                                          sharding towards
                         indexing of multiple   incremental index                          blob content
       denormalization                                              batch index building                   multiple SOLR
                         versions of a record        updating                               extraction
                                                                                                              instances




                            IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                        17

maandag 6 juni 2011
status june 2011

              » Lily 1.0.1 released - developing since Q4/09
              » some customers - DIY retail / media / news
                  » e-commerce platform project
                  » Lily as the data (integration) tier
              » first contrib: FrogPond (annotated Java <> Lily mapper)
                  https://bitbucket.org/calmera/frogpond


                          IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   18

maandag 6 juni 2011
Next up: usage stats
              » sits in CRUD-path
              » tracks users ops against
                  records
                                                                                                interactions
                  » from both perspectives
                                                                             record                                  user
                  » arbitrary K/V properties: time,
                      location, ...




                                                                                rec
              » automatically builds user



                                                                                 om
                                                                                   me
                                                                                      nd
                                                                                      ati
                                                                                         o
                  profiles (as records)


                                                                                           ns
                                                                                                 indexes




                                                                                                                e
                                                                                                               tim
                  » tied to records ops
                  » indexed access
                  » time dimension: trending

                              IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org                     19

maandag 6 juni 2011
from usage stats to recommendations ‘light’

                                record                                                     user



              » grouping of users based on
                  » shared properties
                  » shared record access
              » grouping of records based on
                  » shared properties
                                                                                     {     connections


                  » shared user operations                                            recommendations

                         IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org       20

maandag 6 juni 2011
full-on recommendations

              » look at real-time-capable Mahout algorithms
              » pre-index or -calculate as much as possible
                  » save as secondary indexes
                  » present recommendations as part of record API
              » allow user to contribute ‘domain knowledge’ to
                  record processing pipeline
                  » pattern detection, keywords, ontologies, ...



                          IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   21

maandag 6 juni 2011
timeline


              » Lily + usage stats                                                                 10/2011
              » Lily + usage stats + light-weight analytics                                        12/2011
              » Lily + recommendations ‘light’                                                      3/2012
              » Lily 2.0 : full-on recommendations                                                  6/2012



                       IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org             22

maandag 6 juni 2011
lily enterprise


              » adds tools:
                  » yum/deb package repo
                  » cluster deploy scripts
                      (also EC2)
                  » Admin UI
              » + enterprise support



                            IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   23

maandag 6 juni 2011
demo (if time permits)


                       message                                                 part
                      ‣to                                             ‣content
                      ‣from                                           ‣mediaType
                      ‣parts                                          ‣message
                      ‣listId
                      ‣subject
                      ‣sender




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   24

maandag 6 juni 2011
WHERE?



                                        www.lilyproject.org




                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org   25

maandag 6 juni 2011
Thank you !
                                                   for your attention
                                                   for your questions

                                                   » stevenn@outerthought.org

                                                   »           @stevenn

                      IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org

maandag 6 juni 2011

Mais conteúdo relacionado

Semelhante a From Content Storage to Scaling Smart Data

Lily @ Work Webinar
Lily @ Work WebinarLily @ Work Webinar
Lily @ Work WebinarNGDATA
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily PartnershipsNGDATA
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UKNGDATA
 
Hadoop World 2011: Lily: Smart Data at Scale, Made Easy
Hadoop World 2011: Lily: Smart Data at Scale, Made EasyHadoop World 2011: Lily: Smart Data at Scale, Made Easy
Hadoop World 2011: Lily: Smart Data at Scale, Made EasyCloudera, Inc.
 
Welcome to the Age of Data
Welcome to the Age of DataWelcome to the Age of Data
Welcome to the Age of DataNGDATA
 
NoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNGDATA
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNGDATA
 
ODI Overview 2013-04-09
ODI Overview 2013-04-09ODI Overview 2013-04-09
ODI Overview 2013-04-09theODI
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionNGDATA
 
Gradiant - Technology Offer in Business Analytics
Gradiant - Technology Offer in Business AnalyticsGradiant - Technology Offer in Business Analytics
Gradiant - Technology Offer in Business AnalyticsMarcos Álvarez-Díaz
 
Learning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologiesLearning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologiesNGDATA
 
KVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database serversKVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database serversNGDATA
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)NGDATA
 
Predictive Analytics Innovation Summit
Predictive Analytics Innovation SummitPredictive Analytics Innovation Summit
Predictive Analytics Innovation SummitFractal Analytics
 
W-JAX Keynote - Big Data and Corporate Evolution
W-JAX Keynote - Big Data and Corporate EvolutionW-JAX Keynote - Big Data and Corporate Evolution
W-JAX Keynote - Big Data and Corporate Evolutionjstogdill
 
Nfa workshop introductions_wdonnelly
Nfa workshop introductions_wdonnellyNfa workshop introductions_wdonnelly
Nfa workshop introductions_wdonnellyShane Dempsey
 

Semelhante a From Content Storage to Scaling Smart Data (20)

Lily @ Work Webinar
Lily @ Work WebinarLily @ Work Webinar
Lily @ Work Webinar
 
Outerthought / Lily Partnerships
Outerthought / Lily PartnershipsOuterthought / Lily Partnerships
Outerthought / Lily Partnerships
 
Lily at HUG UK
Lily at HUG UKLily at HUG UK
Lily at HUG UK
 
Huguk lily
Huguk lilyHuguk lily
Huguk lily
 
Hadoop World 2011: Lily: Smart Data at Scale, Made Easy
Hadoop World 2011: Lily: Smart Data at Scale, Made EasyHadoop World 2011: Lily: Smart Data at Scale, Made Easy
Hadoop World 2011: Lily: Smart Data at Scale, Made Easy
 
Welcome to the Age of Data
Welcome to the Age of DataWelcome to the Age of Data
Welcome to the Age of Data
 
NoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNoSQL with Hadoop and HBase
NoSQL with Hadoop and HBase
 
NoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG LuxembourgNoSQL intro for YaJUG / NoSQL UG Luxembourg
NoSQL intro for YaJUG / NoSQL UG Luxembourg
 
ODI Overview 2013-04-09
ODI Overview 2013-04-09ODI Overview 2013-04-09
ODI Overview 2013-04-09
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC edition
 
Gradiant - Technology Offer in Business Analytics
Gradiant - Technology Offer in Business AnalyticsGradiant - Technology Offer in Business Analytics
Gradiant - Technology Offer in Business Analytics
 
Learning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologiesLearning Lessons: Building a CMS on top of NoSQL technologies
Learning Lessons: Building a CMS on top of NoSQL technologies
 
Analytics and Data Mining Industry Overview
Analytics and Data Mining Industry OverviewAnalytics and Data Mining Industry Overview
Analytics and Data Mining Industry Overview
 
KVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database serversKVIV / NoSQL : the new generation of database servers
KVIV / NoSQL : the new generation of database servers
 
Beyond Online PDFs
Beyond Online PDFs Beyond Online PDFs
Beyond Online PDFs
 
Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)Building a CMS on top of NoSQL (for ParisJUG)
Building a CMS on top of NoSQL (for ParisJUG)
 
Predictive Analytics Innovation Summit
Predictive Analytics Innovation SummitPredictive Analytics Innovation Summit
Predictive Analytics Innovation Summit
 
ODOO_klab
ODOO_klabODOO_klab
ODOO_klab
 
W-JAX Keynote - Big Data and Corporate Evolution
W-JAX Keynote - Big Data and Corporate EvolutionW-JAX Keynote - Big Data and Corporate Evolution
W-JAX Keynote - Big Data and Corporate Evolution
 
Nfa workshop introductions_wdonnelly
Nfa workshop introductions_wdonnellyNfa workshop introductions_wdonnelly
Nfa workshop introductions_wdonnelly
 

Mais de NGDATA

NGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog libraryNGDATA
 
20110514 appsforghent
20110514 appsforghent20110514 appsforghent
20110514 appsforghentNGDATA
 
Big Data
Big DataBig Data
Big DataNGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyNGDATA
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyNGDATA
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaNGDATA
 
N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseNGDATA
 
NoSQL BOF at Devoxx
NoSQL BOF at DevoxxNoSQL BOF at Devoxx
NoSQL BOF at DevoxxNGDATA
 
NoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at DevoxxNoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at DevoxxNGDATA
 

Mais de NGDATA (10)

NGDATA Corporate Presentation
NGDATA Corporate PresentationNGDATA Corporate Presentation
NGDATA Corporate Presentation
 
The Lily RowLog library
The Lily RowLog libraryThe Lily RowLog library
The Lily RowLog library
 
20110514 appsforghent
20110514 appsforghent20110514 appsforghent
20110514 appsforghent
 
Big Data
Big DataBig Data
Big Data
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
 
Devoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and LilyDevoxx 2010 | Tools In Action : Kauri and Lily
Devoxx 2010 | Tools In Action : Kauri and Lily
 
Devoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in JavaDevoxx 2010 | LAB : ReST in Java
Devoxx 2010 | LAB : ReST in Java
 
N-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the riseN-O-SQL, new database technologies on the rise
N-O-SQL, new database technologies on the rise
 
NoSQL BOF at Devoxx
NoSQL BOF at DevoxxNoSQL BOF at Devoxx
NoSQL BOF at Devoxx
 
NoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at DevoxxNoSQL "Tools in Action" talk at Devoxx
NoSQL "Tools in Action" talk at Devoxx
 

Último

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

From Content Storage to Scaling Smart Data

  • 1. Smart data, Lily at scale madE easy from content storage to scaling smart data IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org maandag 6 juni 2011
  • 2. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 2 maandag 6 juni 2011
  • 3. the pain data need for distributed processing moore IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 3 maandag 6 juni 2011
  • 4. the pain » growth of data sets » smart businesses need to apply analytics to Smart data, activities at scale » doing business online means real-time madE easy » talent shortage IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 4 maandag 6 juni 2011
  • 5. LILY The Real-time Platform built for the Age of Data. We manage, track and measure your data and users, and do the mat(c)hmaking in-between: » provide you with business intelligence and analytics » harvest user profiles and learn their interests » dynamically engage your users using quality recommendations IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 5 maandag 6 juni 2011
  • 6. where would you use lily? » large collections of data » large groups of users » content repositories » e-commerce / retail » library catalogs » news / media » (media) asset management » product catalogs » ‘live’ archives » ... if you want to use big data, but you need easy. IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 6 maandag 6 juni 2011
  • 7. ns pe ap gic h ma he t re he sw si + thi IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 7 maandag 6 juni 2011
  • 8. beyond content management marketing broadcast revenue product / service IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 8 maandag 6 juni 2011
  • 9. beyond content management: data + analytics recommendations call to action personalised revenue product / service audience data IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 9 maandag 6 juni 2011
  • 10. LILY 2.0: smart data SMARTER DATA data processing s relation recommendations semantic augmentation Analytics usage metrics domain knowledge patterns rules keywords lists ... IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 10 maandag 6 juni 2011
  • 11. roadmap » now: highly-scalable data repository: store, index and search » next: with real-time usage stats gathering and analytics » later: and built-in context- and user-sensitive recommendations » built on top of Google BigTable / HBase / Solr » identical, robust technology in use at Facebook, Twitter, StumbleUpon, Yahoo! » scales widely over distributed (cloud) infrastructure IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 11 maandag 6 juni 2011
  • 12. Lily Repository Model IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 12 maandag 6 juni 2011
  • 13. Sample Lily Schema (excerpt) 

{ namespaces:
{ 



name:
"b$name", 



/*
Declaration
of
namespace
prefixes.
*/ 



valueType:
{
primitive:
"STRING"
}, 



"org.lilyproject.bookssample":
"b", 



scope:
"versioned" 



"org.lilyproject.vtag":
"vtag" 

}, 

}, 

{ fieldTypes:
[ 



name:
"b$bio", 

{ 



valueType:
{
primitive:
"STRING"
}, 



name:
"b$title", 



scope:
"versioned" 



valueType:
{
primitive:
"STRING"
}, 

}, 



scope:
"versioned" 

{ 

}, 



name:
"vtag$last", 

{ 



valueType:
{
primitive:
"LONG"
}, 



name:
"b$pages", 



scope:
"non_versioned" 



valueType:
{
primitive:
"INTEGER"
}, 

} 



scope:
"versioned" 

], 

}, recordTypes:
[ 

{ 

{ 



name:
"b$language", 



name:
"b$Book", 



valueType:
{
primitive:
"STRING"
}, 



fields:
[ 



scope:
"versioned" 





{name:
"b$title",
mandatory:
true
}, 

}, 





{name:
"b$pages",
mandatory:
false
}, 

{ 





{name:
"b$language",
mandatory:
false
}, 



name:
"b$authors", 





{name:
"b$authors",
mandatory:
false
}, 



valueType:
{
primitive:
"LINK",
multiValue:
true
}, 





{name:
"vtag$last",
mandatory:
false
} 



scope:
"versioned" 



] 

}, 

}, ... IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 13 maandag 6 juni 2011
  • 14. Lily Architecture (deployment) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 14 maandag 6 juni 2011
  • 15. Lily Architecture (components) IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 15 maandag 6 juni 2011
  • 16. HBase indexing & RowLog Library » building and querying » need for sync/async indexes, GAE-style operations » updating of secondary indexes rowkey col col content A val3 foo6 (e.g. link tables) table B val2 foo7 » feeding of Indexer (= indexes Lily-content into Solr) rowkey col » not: transactions order index table A val2-B val3-A » need for distribution and durability IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 16 maandag 6 juni 2011
  • 17. The Lily Indexer sharding towards indexing of multiple incremental index blob content denormalization batch index building multiple SOLR versions of a record updating extraction instances IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 17 maandag 6 juni 2011
  • 18. status june 2011 » Lily 1.0.1 released - developing since Q4/09 » some customers - DIY retail / media / news » e-commerce platform project » Lily as the data (integration) tier » first contrib: FrogPond (annotated Java <> Lily mapper) https://bitbucket.org/calmera/frogpond IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 18 maandag 6 juni 2011
  • 19. Next up: usage stats » sits in CRUD-path » tracks users ops against records interactions » from both perspectives record user » arbitrary K/V properties: time, location, ... rec » automatically builds user om me nd ati o profiles (as records) ns indexes e tim » tied to records ops » indexed access » time dimension: trending IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 19 maandag 6 juni 2011
  • 20. from usage stats to recommendations ‘light’ record user » grouping of users based on » shared properties » shared record access » grouping of records based on » shared properties { connections » shared user operations recommendations IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 20 maandag 6 juni 2011
  • 21. full-on recommendations » look at real-time-capable Mahout algorithms » pre-index or -calculate as much as possible » save as secondary indexes » present recommendations as part of record API » allow user to contribute ‘domain knowledge’ to record processing pipeline » pattern detection, keywords, ontologies, ... IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 21 maandag 6 juni 2011
  • 22. timeline » Lily + usage stats 10/2011 » Lily + usage stats + light-weight analytics 12/2011 » Lily + recommendations ‘light’ 3/2012 » Lily 2.0 : full-on recommendations 6/2012 IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 22 maandag 6 juni 2011
  • 23. lily enterprise » adds tools: » yum/deb package repo » cluster deploy scripts (also EC2) » Admin UI » + enterprise support IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 23 maandag 6 juni 2011
  • 24. demo (if time permits) message part ‣to ‣content ‣from ‣mediaType ‣parts ‣message ‣listId ‣subject ‣sender IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 24 maandag 6 juni 2011
  • 25. WHERE? www.lilyproject.org IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org 25 maandag 6 juni 2011
  • 26. Thank you ! for your attention for your questions » stevenn@outerthought.org » @stevenn IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org maandag 6 juni 2011