SlideShare uma empresa Scribd logo
1 de 18
Baixar para ler offline
Modeling taste with Cassandra




Affinity is based on user tastes, preferences, and interests

                                                               1
What is a taste profile?

               Operational definition: the set of things you like and dislike

Stuff I like                                   Stuff I don’t like




     Challenge: how do you build a set of things you like and dislike
           Operational definition: the taste profile for someone?               2
Thesis: Likes are correlated
Inferring correlations
        D               1)   User A:
                              •   Democrat
                              •   Likes Arugula
                        2)   User B:
                    C
                              •   Republican
    E
            ?                 •   Dislikes Arugula
                        3)   User C indicates:
                              •   Democrat

                        What would we infer is User C’s affinity for
                        Arugula?

A
                        Answer: User C would like Arugula
                B




                                                                       4
Inferring correlations

               Like arugula
                                      User A


                                      <3, 2.5>

                     <1,1>
Dislike                           Like
Obama                             Obama
          User B


      <-2,-1.5>


    <-3,-3>
              Dislike arugula

                              User C           If someone’s affinity
                                               for Obama is 2.0,
                              <2,?>
                                               what is their affinity
                                               for arugula?

                                                                        5
Discovering latent factors
                                                            Obama
                                                                             Liberal
                                                     Arugula        <5, 5>
                                Like arugula
                                                                <4, 4>
                                                   User A

                                                     <3, 2>

                                      <1,1>
                 Dislike                            Like
                 Obama                              Obama
                           User B


                        <-2,-1.5>

            Iceberg
                       <-3,-3>
               <-4, -4>        Dislike arugula
    GOP


 <-5, -5>                                      User C         Predict 1.5 for how
                                                              much this person will
                                               <2,1.5>
Conservative                                                  like arugula.


                                                                                       6
Taste space = many latent factors

                                      <0.7, 4.4, -.1>
                        Liberal

                                  <0.5, 2.4, -.4>
                                     A
                                     Extroverted


Masculine                                          Feminine



                        <-0.5, -3.1, 0.1>
        Introverted
                       B
                      Conservative




                                                              7
What is a taste profile profile?

                 Operational definition: a coordinate in taste space

Stuff I like (close to me in taste space)   Stuff I don’t like (far away in taste space)




           Operational definition: the set of things you like and dislike
        Challenge: how do you calculate taste coordinates?                            8
Calculating taste coordinates
                       D                     Edge weight = dot product of nodes
? <x, y>
                                             to constrain similar items to be
                   2            <1, -1>
                                             close to each other.
                                     C       Assume edge weights of:
               E                                +2 = “love”
                                                -2 = “hate”
      2    <1, -0.5>
                                             Democratic node must solve:
                                               1*x -2*y = 2 (edge from A)
           2
                           -2                  1*x -1*y = 2 (edge from C)
      A
                                             Solution = <2, 0>
 <1, -2>                         B
                           <-1, 2>




                                                                             9
Updating taste coordinates

          User A purchases a camera...

<1, -1>
                                                          <1, -0.5>
                          2         <1, -1>
                                                                                      2         <1, -1>
                                         C
                                                                                                     C
                                              <-1, 0.5>
                                                                                                          <-1, 0.5>
                  <1, -0.5>
                                                                      2       <1, -0.5>

              2
                               -2                                             2
                                              2                                            -2
          A                                                                                               2
                                                                      A
   <1, -2>                           B
                                                               <0.75, -2.5>                      B
                              <-1, 2>
                                                                                          <-1, 2>

          Resulting in blue coordinates changing.
v1 System overview - Model updates

                                     1) Receive event
  Rec.              Updater          (eg, Purchase)
  Engine



    3) Write user             2a) Write Purchase edge
    and item                  2b) Read other edges
    coordinates               for this user and item




Reco. DB            Taste graph
User -> coord
Item -> coord
v1 System overview - Rec serving

                     1) Page load        Rec.          Updater
                     requests            Engine
                     recommendations


                               2) Rec. engine
                               finds other
                               cameras close
                               to user’s
3) Recommendations             coordinates
shown to user


                                       Reco. DB        Taste graph
                                       User -> coord
                                       Item -> coord
v1 Taste Graph data size


40 billion edges
2 billion item nodes
200 million user nodes

5TB of data, takes up 10TB with Replication Factor of 2

We expect this to quadruple next year as we get more events and add
  new types of edges




                                                                      13
v1 Taste Graph DB configuration


32 Linux machines
  128GB RAM
  1TB iSCSI SSD
  10 GigE NIC


Cassandra version 1.0.8

8GB JVM heap space

Size-tiered compaction strategy
v1 Taste Graph schema

User Edges
              (timestamp, edge_type, item_id)   …
   user_id               <empty>
Item Edges
              (timestamp, edge_type, user_id)   …
    item_id              <empty>
User Nodes
                   tastevector
    user_id   200 bytes (50 floats)
Item Nodes
                   tastevector
    item_id   200 bytes (50 floats)
v1 Real-time taste updates

Edges and nodes read per second
v1 Real-time taste updates

Edges and nodes written per second
Questions?


tp@hunch.com




                        18

Mais conteúdo relacionado

Mais procurados

AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나Amazon Web Services Korea
 
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...Amazon Web Services Korea
 
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...Amazon Web Services Korea
 
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Amazon Web Services
 
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon Web Services Korea
 
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기Amazon Web Services Korea
 
AWS Summit Seoul 2023 | 오픈소스 데이터베이스로 탈 오라클! Why not?
AWS Summit Seoul 2023 | 오픈소스 데이터베이스로 탈 오라클! Why not?AWS Summit Seoul 2023 | 오픈소스 데이터베이스로 탈 오라클! Why not?
AWS Summit Seoul 2023 | 오픈소스 데이터베이스로 탈 오라클! Why not?Amazon Web Services Korea
 
AWS Summit Seoul 2023 | 플로 AWS All-in 전략을 통해 음원서비스의 혁신을 이루다
AWS Summit Seoul 2023 | 플로 AWS All-in 전략을 통해 음원서비스의 혁신을 이루다AWS Summit Seoul 2023 | 플로 AWS All-in 전략을 통해 음원서비스의 혁신을 이루다
AWS Summit Seoul 2023 | 플로 AWS All-in 전략을 통해 음원서비스의 혁신을 이루다Amazon Web Services Korea
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to GraphNeo4j
 
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)Amazon Web Services Korea
 
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법Amazon Web Services Korea
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Amazon Route 53 - Webinar Presentation 9.16.2015
Amazon Route 53 - Webinar Presentation 9.16.2015Amazon Route 53 - Webinar Presentation 9.16.2015
Amazon Route 53 - Webinar Presentation 9.16.2015Amazon Web Services
 
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAmazon Web Services
 

Mais procurados (20)

Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
AWS Single Sign-On (SSO) 서비스 집중 탐구 - 윤석찬 :: AWS Unboxing 온라인 세미나
 
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
데브옵스 엔지니어를 위한 신규 운영 서비스 - 김필중, AWS 개발 전문 솔루션즈 아키텍트 / 김현민, 메가존클라우드 솔루션즈 아키텍트 :...
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
Amazon DynamoDB - Use Cases and Cost Optimization - 발표자: 이혁, DynamoDB Special...
 
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
 
Amazon S3 Masterclass
Amazon S3 MasterclassAmazon S3 Masterclass
Amazon S3 Masterclass
 
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
Amazon DocumentDB vs MongoDB 의 내부 아키텍쳐 와 장단점 비교
 
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
AWS Summit Seoul 2023 | 실시간 CDC 데이터 처리! Modern Transactional Data Lake 구축하기
 
AWS Summit Seoul 2023 | 오픈소스 데이터베이스로 탈 오라클! Why not?
AWS Summit Seoul 2023 | 오픈소스 데이터베이스로 탈 오라클! Why not?AWS Summit Seoul 2023 | 오픈소스 데이터베이스로 탈 오라클! Why not?
AWS Summit Seoul 2023 | 오픈소스 데이터베이스로 탈 오라클! Why not?
 
AWS Summit Seoul 2023 | 플로 AWS All-in 전략을 통해 음원서비스의 혁신을 이루다
AWS Summit Seoul 2023 | 플로 AWS All-in 전략을 통해 음원서비스의 혁신을 이루다AWS Summit Seoul 2023 | 플로 AWS All-in 전략을 통해 음원서비스의 혁신을 이루다
AWS Summit Seoul 2023 | 플로 AWS All-in 전략을 통해 음원서비스의 혁신을 이루다
 
RDBMS to Graph
RDBMS to GraphRDBMS to Graph
RDBMS to Graph
 
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)
10월 웨비나 - AWS에서 Active Directory 구축 및 연동 옵션 살펴보기 (김용우 솔루션즈 아키텍트)
 
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
AWS Backup을 이용한 데이터베이스의 백업 자동화와 편리한 복구방법
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
ElastiCache & Redis
ElastiCache & RedisElastiCache & Redis
ElastiCache & Redis
 
Amazon Route 53 - Webinar Presentation 9.16.2015
Amazon Route 53 - Webinar Presentation 9.16.2015Amazon Route 53 - Webinar Presentation 9.16.2015
Amazon Route 53 - Webinar Presentation 9.16.2015
 
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
 

Destaque

Neo4j - graph database for recommendations
Neo4j - graph database for recommendationsNeo4j - graph database for recommendations
Neo4j - graph database for recommendationsproksik
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and HowBigBlueHat
 
An Introduction to Graph Databases
An Introduction to Graph DatabasesAn Introduction to Graph Databases
An Introduction to Graph DatabasesInfiniteGraph
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Neo4j
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph DatabasesAntonio Maccioni
 
Graph Database, a little connected tour - Castano
Graph Database, a little connected tour - CastanoGraph Database, a little connected tour - Castano
Graph Database, a little connected tour - CastanoCodemotion
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - ImportNeo4j
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesCambridge Semantics
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jDebanjan Mahata
 
Introduction to graph databases GraphDays
Introduction to graph databases  GraphDaysIntroduction to graph databases  GraphDays
Introduction to graph databases GraphDaysNeo4j
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4jNeo4j
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisDataminingTools Inc
 

Destaque (17)

Neo4j - graph database for recommendations
Neo4j - graph database for recommendationsNeo4j - graph database for recommendations
Neo4j - graph database for recommendations
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and How
 
Lju Lazarevic
Lju LazarevicLju Lazarevic
Lju Lazarevic
 
An Introduction to Graph Databases
An Introduction to Graph DatabasesAn Introduction to Graph Databases
An Introduction to Graph Databases
 
Graph databases
Graph databasesGraph databases
Graph databases
 
Relational vs. Non-Relational
Relational vs. Non-RelationalRelational vs. Non-Relational
Relational vs. Non-Relational
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph Databases
 
Graph Database, a little connected tour - Castano
Graph Database, a little connected tour - CastanoGraph Database, a little connected tour - Castano
Graph Database, a little connected tour - Castano
 
Relational to Graph - Import
Relational to Graph - ImportRelational to Graph - Import
Relational to Graph - Import
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Semantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational DatabasesSemantic Graph Databases: The Evolution of Relational Databases
Semantic Graph Databases: The Evolution of Relational Databases
 
An Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4jAn Introduction to NOSQL, Graph Databases and Neo4j
An Introduction to NOSQL, Graph Databases and Neo4j
 
Introduction to graph databases GraphDays
Introduction to graph databases  GraphDaysIntroduction to graph databases  GraphDays
Introduction to graph databases GraphDays
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Data Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysisData Mining: Graph mining and social network analysis
Data Mining: Graph mining and social network analysis
 

Mais de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Mais de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Graph Based Recommendation Systems at eBay

  • 1. Modeling taste with Cassandra Affinity is based on user tastes, preferences, and interests 1
  • 2. What is a taste profile? Operational definition: the set of things you like and dislike Stuff I like Stuff I don’t like Challenge: how do you build a set of things you like and dislike Operational definition: the taste profile for someone? 2
  • 3. Thesis: Likes are correlated
  • 4. Inferring correlations D 1) User A: • Democrat • Likes Arugula 2) User B: C • Republican E ? • Dislikes Arugula 3) User C indicates: • Democrat What would we infer is User C’s affinity for Arugula? A Answer: User C would like Arugula B 4
  • 5. Inferring correlations Like arugula User A <3, 2.5> <1,1> Dislike Like Obama Obama User B <-2,-1.5> <-3,-3> Dislike arugula User C If someone’s affinity for Obama is 2.0, <2,?> what is their affinity for arugula? 5
  • 6. Discovering latent factors Obama Liberal Arugula <5, 5> Like arugula <4, 4> User A <3, 2> <1,1> Dislike Like Obama Obama User B <-2,-1.5> Iceberg <-3,-3> <-4, -4> Dislike arugula GOP <-5, -5> User C Predict 1.5 for how much this person will <2,1.5> Conservative like arugula. 6
  • 7. Taste space = many latent factors <0.7, 4.4, -.1> Liberal <0.5, 2.4, -.4> A Extroverted Masculine Feminine <-0.5, -3.1, 0.1> Introverted B Conservative 7
  • 8. What is a taste profile profile? Operational definition: a coordinate in taste space Stuff I like (close to me in taste space) Stuff I don’t like (far away in taste space) Operational definition: the set of things you like and dislike Challenge: how do you calculate taste coordinates? 8
  • 9. Calculating taste coordinates D Edge weight = dot product of nodes ? <x, y> to constrain similar items to be 2 <1, -1> close to each other. C Assume edge weights of: E +2 = “love” -2 = “hate” 2 <1, -0.5> Democratic node must solve: 1*x -2*y = 2 (edge from A) 2 -2 1*x -1*y = 2 (edge from C) A Solution = <2, 0> <1, -2> B <-1, 2> 9
  • 10. Updating taste coordinates User A purchases a camera... <1, -1> <1, -0.5> 2 <1, -1> 2 <1, -1> C C <-1, 0.5> <-1, 0.5> <1, -0.5> 2 <1, -0.5> 2 -2 2 2 -2 A 2 A <1, -2> B <0.75, -2.5> B <-1, 2> <-1, 2> Resulting in blue coordinates changing.
  • 11. v1 System overview - Model updates 1) Receive event Rec. Updater (eg, Purchase) Engine 3) Write user 2a) Write Purchase edge and item 2b) Read other edges coordinates for this user and item Reco. DB Taste graph User -> coord Item -> coord
  • 12. v1 System overview - Rec serving 1) Page load Rec. Updater requests Engine recommendations 2) Rec. engine finds other cameras close to user’s 3) Recommendations coordinates shown to user Reco. DB Taste graph User -> coord Item -> coord
  • 13. v1 Taste Graph data size 40 billion edges 2 billion item nodes 200 million user nodes 5TB of data, takes up 10TB with Replication Factor of 2 We expect this to quadruple next year as we get more events and add new types of edges 13
  • 14. v1 Taste Graph DB configuration 32 Linux machines 128GB RAM 1TB iSCSI SSD 10 GigE NIC Cassandra version 1.0.8 8GB JVM heap space Size-tiered compaction strategy
  • 15. v1 Taste Graph schema User Edges (timestamp, edge_type, item_id) … user_id <empty> Item Edges (timestamp, edge_type, user_id) … item_id <empty> User Nodes tastevector user_id 200 bytes (50 floats) Item Nodes tastevector item_id 200 bytes (50 floats)
  • 16. v1 Real-time taste updates Edges and nodes read per second
  • 17. v1 Real-time taste updates Edges and nodes written per second