SlideShare uma empresa Scribd logo
1 de 21
Solutions Engineer
@MarcSelwan
Marc Selwan
Enabling Search in your Cassandra Application with Datastax Enterprise
1
Why Search?
Confidential
Confidential
Confidential
The bright blue butterfly hangs on the breeze.
[the] [bright] [blue] [butterfly] [hangs] [on] [the] [breeze]
Terms
Confidential Credit: https://developer.apple.com/library/mac/documentation/userexperience/conceptual/SearchKitConcepts/searchKit_basics/searchKit_basics.html
What is Solr Missing?
Not a
Database
Doesn’t
Cluster
Not
transparently
sharded
Requires ETL
to injest
application data
Doesn’t
Reindex
Confidential
7
OLTP DB Search Cluster
Your Application
DB API Search API
Your
ETL
Transactional
Workloads
Search
Workloads
Open Source Search Reference Architecture
Confidential
+ =
DSE Search Reference Architecture
Confidential
9
Search
+
Cassandra
80
10
3050
70
60
40
20
Your Application
CQL
Easy CQL API
All the goodness of DataStax driver
Distributed, Replicated, Always On
Data locality and shared memory
• Automatic indexing on db insert
• Higher ingestion throughput
• Distributed query optimization
Compared to open source search
• No separate search cluster to manage
• Probably less total hardware required
• No “Split Brain” data inconsistencies
• No ETL or synch to build and maintain
• No app level data management code
Data stored in Cassandra
Indexes stored in Solr/Lucene
Disk
Memory
Solr Cassandra
Disk
Memory
Mem-
Table
Index
Segments
Ram Buffer
Index
Segments
Index
Segments
Mem-
Table
Mem-
table
Index
Segments
SSTables
Commit
Log
Coordinator
Index
Segments
Shard Router
UPDATE videos (videoid, tags)
SET tags = {‘cat tubes’, ‘Al Gore’s Internet’,
‘NoSQL Fairytales’}
WHERE voided = b3a76c6b-7c7f-4af6-964f-803a9283c401
OSS Solr
Disk
Memory
Index
Segments
Ram Buffer
Index
Segments
Index
Segments
Index
Segments
Index
Segments
Not Searchable
Searchable
DSE Search
Disk
Memory
Index
Segments
Ram Buffer
Index
Segments
Index
Segments
Index
Segments
Index
Segments
Searchable
Confidential
Let’s see this in action!
Search in Retail
Filter queries: These are awesome because the result set gets cached in memory.
SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "fq":"categories:Books", "sort":"title
asc"}' limit 10;
Faceting: Get counts of fields
SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "facet":{"field":"categories"}}' limit 10;
Geospatial Searches: Supports box and radius
SELECT * FROM amazon.clicks WHERE solr_query='{"q":"asin:*", "fq":"+{!geofilt pt="37.7484,-122.4156"
sfield=location d=1}"}' limit 10;
Joins: Not your relational joins. These queries 'borrow' indexes from other tables to add filter logic. These are
fast!
SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "fq":"{!join from=asin to=asin force=true
fromIndex=amazon.clicks}area_code:415"}' limit 5;
Fun all in one.
SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "facet":{"field":"categories"}, "fq":"{!join
from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5;
How do you get started??
Confidential
1) Spin up a new C* Cluster with search enabled using the DSE installer.
$ sudo service dse cassandra -s
2) Run your schema DDL to create the C* keyspace and tables.
3) Run dse_tool on the videos table*
$ dsetool create_core keyspace.table generateResources=true reindex=true
4) Write a CQL query with a Solr Search in it.
SELECT * FROM keyspace.table
WHERE solr_query=‘column:*’
*This will create lucene indexes on ALL the columns in your table.
Behind the scenes…
dse_tool
schema.xml
solrconfig.xml
CQL Query
$ dsetool create_core killrvideo.videos generateResources=true
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
<types>
…
<fields>
<field indexed="true" multiValued="false" name="added_date" stored="true" type="TrieDateField"/>
<field indexed="true" multiValued="false" name="location" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="preview_image_location" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="name" termVectors="true" stored="true" type="TextField"/>
<field indexed="true" multiValued="true" name="tags" termVectors="true" stored="true" type="TextField"/>
<field indexed="true" multiValued="false" name="userid" stored="true" type="UUIDField"/>
<field indexed="true" multiValued="false" name="videoid" stored="true" type="UUIDField"/>
<field indexed="true" multiValued="false" name="location_type" stored="true" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="description" termVectors="true" stored="true" type="TextField"/>
</fields>
<uniqueKey>videoid</uniqueKey>
</schema>
<!--
=======
Copyright DataStax, Inc.
Please see the included license file for details.
-->
<!--
For more details about configurations options that may appear in
this file, see http://wiki.apache.org/solr/SolrConfigXml.
-->
<config>
<!-- In all configuration below, a prefix of "solr." for class names
is an alias that causes solr to search appropriate packages,
including org.apache.solr.(search|update|request|core|analysis)
You may also specify a fully qualified Java classname if you
have your own custom plugins.
-->
…
SELECT * FROM killrvideo.videos
WHERE solr_query=‘name:*’
Thank you!
25

Mais conteúdo relacionado

Mais procurados

Az 104 session 6 azure networking part2
Az 104 session 6 azure networking part2Az 104 session 6 azure networking part2
Az 104 session 6 azure networking part2AzureEzy1
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionJames Serra
 
Lessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMLessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMDATAVERSITY
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLJordan Birdsell
 
MongoDB and Azure Databricks
MongoDB and Azure DatabricksMongoDB and Azure Databricks
MongoDB and Azure DatabricksMongoDB
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
ETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureMark Kromer
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Agile, User Stories, Domain Driven Design
Agile, User Stories, Domain Driven DesignAgile, User Stories, Domain Driven Design
Agile, User Stories, Domain Driven DesignAraf Karsh Hamid
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Metadata Extraction and Content Transformation
Metadata Extraction and Content TransformationMetadata Extraction and Content Transformation
Metadata Extraction and Content TransformationAlfresco Software
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera, Inc.
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Modern Data Flow
Modern Data FlowModern Data Flow
Modern Data Flowconfluent
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 

Mais procurados (20)

Az 104 session 6 azure networking part2
Az 104 session 6 azure networking part2Az 104 session 6 azure networking part2
Az 104 session 6 azure networking part2
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 
Lessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDMLessons in Data Modeling: Data Modeling & MDM
Lessons in Data Modeling: Data Modeling & MDM
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
MongoDB and Azure Databricks
MongoDB and Azure DatabricksMongoDB and Azure Databricks
MongoDB and Azure Databricks
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
ETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft Azure
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Agile, User Stories, Domain Driven Design
Agile, User Stories, Domain Driven DesignAgile, User Stories, Domain Driven Design
Agile, User Stories, Domain Driven Design
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Metadata Extraction and Content Transformation
Metadata Extraction and Content TransformationMetadata Extraction and Content Transformation
Metadata Extraction and Content Transformation
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for Analytics
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Modern Data Flow
Modern Data FlowModern Data Flow
Modern Data Flow
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 

Destaque

Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Using Event-Driven Architectures with Cassandra
Using Event-Driven Architectures with CassandraUsing Event-Driven Architectures with Cassandra
Using Event-Driven Architectures with CassandraDataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and DriversDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 

Destaque (20)

Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Using Event-Driven Architectures with Cassandra
Using Event-Driven Architectures with CassandraUsing Event-Driven Architectures with Cassandra
Using Event-Driven Architectures with Cassandra
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 

Semelhante a Enabling Search in your Cassandra Application with DataStax Enterprise

DataStax: Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax: Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax: Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax: Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
From 0 to hero adf cicd pass mdpug oslo feb 2020
From 0 to hero adf cicd pass mdpug oslo feb 2020From 0 to hero adf cicd pass mdpug oslo feb 2020
From 0 to hero adf cicd pass mdpug oslo feb 2020Halvar Trøyel Nerbø
 
GreenDao Introduction
GreenDao IntroductionGreenDao Introduction
GreenDao IntroductionBooch Lin
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...StampedeCon
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1Charles Givre
 
SQL KONFERENZ 2020 Azure Key Vault, Azure Dev Ops and Azure Data Factory how...
SQL KONFERENZ 2020  Azure Key Vault, Azure Dev Ops and Azure Data Factory how...SQL KONFERENZ 2020  Azure Key Vault, Azure Dev Ops and Azure Data Factory how...
SQL KONFERENZ 2020 Azure Key Vault, Azure Dev Ops and Azure Data Factory how...Erwin de Kreuk
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...DataStax Academy
 
Elasticsearch sur Azure : Make sense of your (BIG) data !
Elasticsearch sur Azure : Make sense of your (BIG) data !Elasticsearch sur Azure : Make sense of your (BIG) data !
Elasticsearch sur Azure : Make sense of your (BIG) data !Microsoft
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Lucidworks
 
Painless Persistence in a Disconnected World
Painless Persistence in a Disconnected WorldPainless Persistence in a Disconnected World
Painless Persistence in a Disconnected WorldChristian Melchior
 
MySQL Without the SQL -- Oh My! Longhorn PHP Conference
MySQL Without the SQL -- Oh My!  Longhorn PHP ConferenceMySQL Without the SQL -- Oh My!  Longhorn PHP Conference
MySQL Without the SQL -- Oh My! Longhorn PHP ConferenceDave Stokes
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016DataStax
 
Find Anything In Your APEX App - Fuzzy Search with Oracle Text
Find Anything In Your APEX App - Fuzzy Search with Oracle TextFind Anything In Your APEX App - Fuzzy Search with Oracle Text
Find Anything In Your APEX App - Fuzzy Search with Oracle TextCarsten Czarski
 
MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!Dave Stokes
 
Slick – the modern way to access your Data
Slick – the modern way to access your DataSlick – the modern way to access your Data
Slick – the modern way to access your DataJochen Huelss
 
DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho...
DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho...DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho...
DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho...Erwin de Kreuk
 

Semelhante a Enabling Search in your Cassandra Application with DataStax Enterprise (20)

DataStax: Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax: Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax: Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax: Enabling Search in your Cassandra Application with DataStax Enterprise
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
From 0 to hero adf cicd pass mdpug oslo feb 2020
From 0 to hero adf cicd pass mdpug oslo feb 2020From 0 to hero adf cicd pass mdpug oslo feb 2020
From 0 to hero adf cicd pass mdpug oslo feb 2020
 
GreenDao Introduction
GreenDao IntroductionGreenDao Introduction
GreenDao Introduction
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1
 
SQL KONFERENZ 2020 Azure Key Vault, Azure Dev Ops and Azure Data Factory how...
SQL KONFERENZ 2020  Azure Key Vault, Azure Dev Ops and Azure Data Factory how...SQL KONFERENZ 2020  Azure Key Vault, Azure Dev Ops and Azure Data Factory how...
SQL KONFERENZ 2020 Azure Key Vault, Azure Dev Ops and Azure Data Factory how...
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
 
Elasticsearch sur Azure : Make sense of your (BIG) data !
Elasticsearch sur Azure : Make sense of your (BIG) data !Elasticsearch sur Azure : Make sense of your (BIG) data !
Elasticsearch sur Azure : Make sense of your (BIG) data !
 
TIAD : Automating the modern datacenter
TIAD : Automating the modern datacenterTIAD : Automating the modern datacenter
TIAD : Automating the modern datacenter
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
Painless Persistence in a Disconnected World
Painless Persistence in a Disconnected WorldPainless Persistence in a Disconnected World
Painless Persistence in a Disconnected World
 
MySQL Without the SQL -- Oh My! Longhorn PHP Conference
MySQL Without the SQL -- Oh My!  Longhorn PHP ConferenceMySQL Without the SQL -- Oh My!  Longhorn PHP Conference
MySQL Without the SQL -- Oh My! Longhorn PHP Conference
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
 
Find Anything In Your APEX App - Fuzzy Search with Oracle Text
Find Anything In Your APEX App - Fuzzy Search with Oracle TextFind Anything In Your APEX App - Fuzzy Search with Oracle Text
Find Anything In Your APEX App - Fuzzy Search with Oracle Text
 
MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!MySQL Without the MySQL -- Oh My!
MySQL Without the MySQL -- Oh My!
 
Slick – the modern way to access your Data
Slick – the modern way to access your DataSlick – the modern way to access your Data
Slick – the modern way to access your Data
 
Iac d.damyanov 4.pptx
Iac d.damyanov 4.pptxIac d.damyanov 4.pptx
Iac d.damyanov 4.pptx
 
DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho...
DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho...DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho...
DatamindsConnect2019 Azure Key Vault, Azure Dev Ops and Azure Data Factory ho...
 
Content search api in sitecore 8.1
Content search api in sitecore 8.1Content search api in sitecore 8.1
Content search api in sitecore 8.1
 

Mais de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Getting Started with Graph Databases
Getting Started with Graph DatabasesGetting Started with Graph Databases
Getting Started with Graph DatabasesDataStax Academy
 
Cassandra Data Maintenance with Spark
Cassandra Data Maintenance with SparkCassandra Data Maintenance with Spark
Cassandra Data Maintenance with SparkDataStax Academy
 
Analytics with Spark and Cassandra
Analytics with Spark and CassandraAnalytics with Spark and Cassandra
Analytics with Spark and CassandraDataStax Academy
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talkDataStax Academy
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayDataStax Academy
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...DataStax Academy
 
Traveler's Guide to Cassandra
Traveler's Guide to CassandraTraveler's Guide to Cassandra
Traveler's Guide to CassandraDataStax Academy
 
Cassandra: One (is the loneliest number)
Cassandra: One (is the loneliest number)Cassandra: One (is the loneliest number)
Cassandra: One (is the loneliest number)DataStax Academy
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureDataStax Academy
 

Mais de DataStax Academy (11)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Getting Started with Graph Databases
Getting Started with Graph DatabasesGetting Started with Graph Databases
Getting Started with Graph Databases
 
Cassandra Data Maintenance with Spark
Cassandra Data Maintenance with SparkCassandra Data Maintenance with Spark
Cassandra Data Maintenance with Spark
 
Analytics with Spark and Cassandra
Analytics with Spark and CassandraAnalytics with Spark and Cassandra
Analytics with Spark and Cassandra
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talk
 
Client Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right WayClient Drivers and Cassandra, the Right Way
Client Drivers and Cassandra, the Right Way
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Traveler's Guide to Cassandra
Traveler's Guide to CassandraTraveler's Guide to Cassandra
Traveler's Guide to Cassandra
 
Cassandra: One (is the loneliest number)
Cassandra: One (is the loneliest number)Cassandra: One (is the loneliest number)
Cassandra: One (is the loneliest number)
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
 
New features in 3.0
New features in 3.0New features in 3.0
New features in 3.0
 

Último

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Enabling Search in your Cassandra Application with DataStax Enterprise

  • 1. Solutions Engineer @MarcSelwan Marc Selwan Enabling Search in your Cassandra Application with Datastax Enterprise 1
  • 4. Confidential The bright blue butterfly hangs on the breeze. [the] [bright] [blue] [butterfly] [hangs] [on] [the] [breeze] Terms
  • 6. What is Solr Missing? Not a Database Doesn’t Cluster Not transparently sharded Requires ETL to injest application data Doesn’t Reindex
  • 7. Confidential 7 OLTP DB Search Cluster Your Application DB API Search API Your ETL Transactional Workloads Search Workloads Open Source Search Reference Architecture
  • 9. DSE Search Reference Architecture Confidential 9 Search + Cassandra 80 10 3050 70 60 40 20 Your Application CQL Easy CQL API All the goodness of DataStax driver Distributed, Replicated, Always On Data locality and shared memory • Automatic indexing on db insert • Higher ingestion throughput • Distributed query optimization Compared to open source search • No separate search cluster to manage • Probably less total hardware required • No “Split Brain” data inconsistencies • No ETL or synch to build and maintain • No app level data management code
  • 10. Data stored in Cassandra Indexes stored in Solr/Lucene
  • 12. Disk Memory Mem- Table Index Segments Ram Buffer Index Segments Index Segments Mem- Table Mem- table Index Segments SSTables Commit Log Coordinator Index Segments Shard Router UPDATE videos (videoid, tags) SET tags = {‘cat tubes’, ‘Al Gore’s Internet’, ‘NoSQL Fairytales’} WHERE voided = b3a76c6b-7c7f-4af6-964f-803a9283c401
  • 17. Filter queries: These are awesome because the result set gets cached in memory. SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "fq":"categories:Books", "sort":"title asc"}' limit 10; Faceting: Get counts of fields SELECT * FROM amazon.metadata WHERE solr_query='{"q":"title:Noir~", "facet":{"field":"categories"}}' limit 10; Geospatial Searches: Supports box and radius SELECT * FROM amazon.clicks WHERE solr_query='{"q":"asin:*", "fq":"+{!geofilt pt="37.7484,-122.4156" sfield=location d=1}"}' limit 10; Joins: Not your relational joins. These queries 'borrow' indexes from other tables to add filter logic. These are fast! SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "fq":"{!join from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5; Fun all in one. SELECT * FROM amazon.metadata WHERE solr_query='{"q":"*:*", "facet":{"field":"categories"}, "fq":"{!join from=asin to=asin force=true fromIndex=amazon.clicks}area_code:415"}' limit 5;
  • 18. How do you get started??
  • 19. Confidential 1) Spin up a new C* Cluster with search enabled using the DSE installer. $ sudo service dse cassandra -s 2) Run your schema DDL to create the C* keyspace and tables. 3) Run dse_tool on the videos table* $ dsetool create_core keyspace.table generateResources=true reindex=true 4) Write a CQL query with a Solr Search in it. SELECT * FROM keyspace.table WHERE solr_query=‘column:*’ *This will create lucene indexes on ALL the columns in your table.
  • 20. Behind the scenes… dse_tool schema.xml solrconfig.xml CQL Query $ dsetool create_core killrvideo.videos generateResources=true <?xml version="1.0" encoding="UTF-8" standalone="no"?> <schema name="autoSolrSchema" version="1.5"> <types> … <fields> <field indexed="true" multiValued="false" name="added_date" stored="true" type="TrieDateField"/> <field indexed="true" multiValued="false" name="location" stored="true" type="TextField"/> <field indexed="true" multiValued="false" name="preview_image_location" stored="true" type="TextField"/> <field indexed="true" multiValued="false" name="name" termVectors="true" stored="true" type="TextField"/> <field indexed="true" multiValued="true" name="tags" termVectors="true" stored="true" type="TextField"/> <field indexed="true" multiValued="false" name="userid" stored="true" type="UUIDField"/> <field indexed="true" multiValued="false" name="videoid" stored="true" type="UUIDField"/> <field indexed="true" multiValued="false" name="location_type" stored="true" type="TrieIntField"/> <field indexed="true" multiValued="false" name="description" termVectors="true" stored="true" type="TextField"/> </fields> <uniqueKey>videoid</uniqueKey> </schema> <!-- ======= Copyright DataStax, Inc. Please see the included license file for details. --> <!-- For more details about configurations options that may appear in this file, see http://wiki.apache.org/solr/SolrConfigXml. --> <config> <!-- In all configuration below, a prefix of "solr." for class names is an alias that causes solr to search appropriate packages, including org.apache.solr.(search|update|request|core|analysis) You may also specify a fully qualified Java classname if you have your own custom plugins. --> … SELECT * FROM killrvideo.videos WHERE solr_query=‘name:*’

Notas do Editor

  1. Search is everywhere. Tell story about kids with search and recommendations. Properly written queries, on the other hand, are completely precise. A query will give you exactly the result you ask for. Exactly what you asked for, which may or not be exactly what you want. The hardest part of a query is asking the right question. Search, on the other hand, is not precise. A dumb search just gives you a randomly ordered list of items in which your search term occurs. A smart search will provide you a ranked list of items that are strongly related to the search terms you entered, even if they do not match the exactly. This means that even though some of the results will doubtless be unrelated, and sometimes absurd, there is a very good chance that there is data you can use pretty high up in those results, even if you didn’t ask quite the right question. And there may well be relevant data you didn’t even think to ask for. Being smart in this way is a key benefit of search. Queries cannot be smart. Queries must always give you exactly what you asked for. There can be no tolerance for serendipity in query results. Search can be smart, but query must be dumb and strictly obedient. “Find me all webpages that contain “Cassandra” and “Optimization”” “Recommend artists who are “like” “Taylor Swift”” “Highlight the keywords “Awesome,” “Good,” and “Amazing.”” Find me all shoes with: Size: 12 Price: $15 - $60 Brand: Nike
  2. Lucene is the base to nearly every popular search engine out there, including elastic search (which is more a of fried taco shell)Open source , Fast, high performance, scalable search/IR library and was Initially developed by Doug Cutting (Also author of Hadoop) Provides advanced Search options like synonyms, stopwords, based on similarity, proximity. Faceted search (search by price, size, manufacturer, etc.) Geospatial search (combining location information, filtering by distance, and more) Solr was Created by Yonik Seeley for CNET as a Enterprise Search platform for Apache Lucene Open source, Highly reliable, scalable, fault tolerant. Its Support distributed Indexing (SolrCloud),Replication, and load balanced querying. Solr “wraps” the Lucene open sourc information retrieval engine You customize Solr via configuration & plug-ins Free-form text search includes wildcards and phrases Query support includes filtering (ranges, geo-spatial) and sorting You can run Solr as a webapp, or in stand-alone (embedded) mode You make search requests using HTTP GET requests Documents are added and deleted via HTTP POST requests
  3. These are extremely efficient and fast structures that live on disk. Simple queries are typically fulfilled in < 1ms. More complex ones, < 10ms
  4. Not a database- A user has to create an interface to add a database, no built in security, no SQL like language Requires ETL- data needs to be moved from an DBMS or other system to be indexed, increasing operational complexity, risk and reducing time for the data to be available. no automatic indexing- no reindex after lucene upgrade, to reindiex, you need to reinvest the data must reindex data after upgrade Not transparently sharded- shared is only available in SolrCloud, and is not transparently sharded. Doesn’t cluster- no HA or failover builtin. no load rebalancing when need to scale out.
  5. Cassandra has a lot of the things that Solr is missing, so the great minds at Datastax engineering put them together to create all the good things in one, easy to use package
  6. What is cassandra? High level overview: OSS NoSQL database optimized for extreme OLTP workloads. I will proceed that you understand the basics of C* from here; but if you don’t, please check out some of the other great intro videos on C* available on DSA.
  7. You get a lot of the benefits from using Solr on top of C* - No ETL! Just load the data in cassandra and it’s automatically replicated to Solr! Data is automatically replicated from the Cassandra DC to the Solr DC No single point of failure No ETL
  8. Let’s look closer at the how these 2 technologies are integrated. C Cassandra stores the data, and solr will create the lucene indexes required to perform searches on that particular node’s data.
  9. Cassandra and Solr reside together on a single DSE Search node in the same JVM. Consequently, you will want to size these nodes a bit larger than a typical cassandra node, esp when it comes to memory. This also does cause a very busy garbage collector, so it is recommended that you pay close attention to your JVM and garbage collection settings
  10. Review C* write path. The solr write path in DSE search is very similar. Coordinator communicates with the shard router to determine which node(s) get the write. DSE figures out which shards to query Favors local shards over remote shards Reduces fan-out by using multiple shards on each node This assumes replication factor > 1 Cassandra Router implementation called ShardRouter DSE sends a distributed query to remote nodes Uses random (shuffle) to pick which nodes by default Other shard shuffle strategies available. Each query has an fq=id:[token range start TO token range end] This limits the request to a subset of the shards on a node The data is first written to the ram buffer. From the ram buffer, based on the soft commit threshold, that data is written to index segments that live on disk, but are not fsynced. At the time the C* memtables flush to SSTables on disk, the data in the in-memory index segments are fsyncd and “hard committed” to disk.
  11. In traditional OSS Solr, the ram buffer is not searchable so data is not available to be searched until it is soft commited. These soft commit are not durable, so if the JVM crashes before these are synced to disk, you may lose some of your indexes. Creates a delay in being able to read newly invested data and the Soft commit threshold needs to be tuned… this is typically 1 - 10second delay.
  12. Datastax has modified OSS Solr so that the Ram buffer is now searchable, a bit like Cassandra memtables. This means that the data is available to be searched very shortly after being injested by Cassandra… less than 1 second in some cases. Now your real time application data will be available to the application to be searched without having to wait for soft commit. This will not only improve your application’s performance but also allow more data to be indexed in a shorter period of time.
  13. “Find me all webpages that contain “Cassandra” and “Optimization”” “Recommend artists who are “like” “Taylor Swift”” “Highlight the keywords “Awesome,” “Good,” and “Amazing.”” Find me all shoes with: Size: 12 Price: $15 - $60 Brand: Nike
  14. This ONLY allows search on tags, and the “index” table needs to be maintained manually. This is fine if this is all you want to do, and is very common. Location_type, currently ith only 2 values, will work with a2i, name on the other hand, would not. So how do you search by name? Secondary indexes will allow you to search on the whole field, but those aren’t efficient for higher cardinality values, like email address Secondary index creates additional data structures on each node that hold table partitions Each ‘local index’ indexes values in rows stored locally Query on an indexed column requires accessing ‘local indexes’ on all nodes Expensive Query on a partition key and an indexed column requires accessing a ‘local index’ on one (or a few) nodes Efficient But say you want to search the text in description or title? What now?