2.
NoSQL Databases
• Wikipedia says:
  A NoSQL database provides a mechanism for storage and retrieval of data that
  use looser consistency models than traditional relational databases in order to
  achieve horizontal scaling and higher availability. Some authors refer to them as
  "Not only SQL" to emphasize that some NoSQL systems do allow SQL-like query
  language to be used.
• Non-traditional data stores
• Doesn't use / isn't designed around SQL
• May not give full ACID guarantees
• Offers other advantages, such as greater scalability, as a tradeoff
• Distributed, fault-tolerant architecture
4.
Solr Cloud
• Distributed indexing designed from the ground up to accommodate desired features
• CAP Theorem
  • Consistency, Availability, Partition Tolerance (the saying goes "choose 2")
  • Reality: must handle P; the real choice is a tradeoff between C and A
• Ended up with a CP system (roughly)
  • Value Consistency over Availability
  • Eventual consistency is incompatible with optimistic concurrency
  • Closest to MongoDB in architecture
• We still do well with Availability
  • All N replicas of a shard must go down before we lose writability for that shard
  • For a network partition, the "big" partition remains active (i.e. Availability isn't "on" or "off")
6.
Solr 4 at a glance
• Document Oriented NoSQL Search Server
• Data-format agnostic (JSON, XML, CSV, binary)
• Schema-less options (more coming soon)
• Distributed
• Multi-tenanted
• Fault Tolerant
• HA + No single points of failure
• Atomic Updates
• Optimistic Concurrency
• Near Real-time Search
• Full-Text search + Hit Highlighting
• Tons of specialized queries: faceted search, grouping, pseudo-join, spatial search, functions
Note: the desire for these features drove some of the "SolrCloud" architecture
7.
Quick Start
1. Unzip the binary distribution (.ZIP file)
   Note: no "installation" required
2. Start Solr
   $ cd example
   $ java -jar start.jar
3. Go!
   Browse to http://localhost:8983/solr for the new admin interface
9.
Add and Retrieve document
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
  { "id" : "book1",
    "title" : "American Gods",
    "author" : "Neil Gaiman"
  }
]'
$ curl 'http://localhost:8983/solr/get?id=book1'
{
  "doc": {
    "id" : "book1",
    "author": "Neil Gaiman",
    "title" : "American Gods",
    "_version_": 1410390803582287872
  }
}
Note: no type of "commit" is necessary to retrieve documents via /get (real-time get)
10.
Simplified JSON Delete Syntax
• Single delete-by-id
  {"delete":"book1"}
• Multiple delete-by-id
  {"delete":["book1","book2","book3"]}
• Delete with optimistic concurrency
  {"delete":{"id":"book1", "_version_":123456789}}
• Delete by Query
  {"delete":{"query":"tag:category1"}}
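The four request bodies above can also be generated programmatically. The following is a minimal Python sketch that assumes only the JSON shapes shown on this slide. The helper names (`delete_by_id`, `delete_with_version`, `delete_by_query`) are hypothetical, and nothing is sent to a server.

```python
import json


def delete_by_id(*ids):
    """Build a delete-by-id body: a bare string for one id, a list for several."""
    if not ids:
        raise ValueError("at least one id is required")
    return {"delete": ids[0] if len(ids) == 1 else list(ids)}


def delete_with_version(doc_id, version):
    """Build a delete body that carries a _version_ for optimistic concurrency."""
    return {"delete": {"id": doc_id, "_version_": version}}


def delete_by_query(query):
    """Build a delete-by-query body."""
    return {"delete": {"query": query}}


if __name__ == "__main__":
    # Print each body as the JSON text that would be posted to /update.
    for body in (
        delete_by_id("book1"),
        delete_by_id("book1", "book2", "book3"),
        delete_with_version("book1", 123456789),
        delete_by_query("tag:category1"),
    ):
        print(json.dumps(body))
```

Each printed line could be passed as the `-d` payload of a curl call to the /update endpoint used elsewhere in this deck.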
11.
Atomic Updates
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
  {"id"        : "book1",
   "pubyear_i" : { "add" : 2001 },
   "ISBN_s"    : { "add" : "0-380-97365-1"}
  }
]'
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
  {"id"       : "book1",
   "copies_i" : { "inc" : 1},
   "cat"      : { "add" : "fantasy"},
   "ISBN_s"   : { "set" : "0-380-97365-0"},
   "remove_s" : { "set" : null }
  }
]'
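To make the effect of the "add", "inc", and "set" modifiers concrete, here is a small Python sketch that applies them to an in-memory dictionary. It is a simplified model of the semantics implied by the two requests above, not Solr's implementation: the function name `apply_atomic_update` is hypothetical, and the treatment of "add" on a missing field (store the value as-is) is an assumption, since in Solr that depends on the schema.

```python
import copy


def apply_atomic_update(doc, update):
    """Return a copy of `doc` with the modifiers in `update` applied."""
    result = copy.deepcopy(doc)
    for field, change in update.items():
        if not isinstance(change, dict):
            # A plain value (such as "id") is treated as a normal assignment.
            result[field] = change
            continue
        for op, value in change.items():
            if op == "set":
                if value is None:
                    result.pop(field, None)  # "set" to null removes the field
                else:
                    result[field] = value
            elif op == "inc":
                result[field] = result.get(field, 0) + value
            elif op == "add":
                if field not in result:
                    result[field] = value  # assumption: first value stored as-is
                else:
                    existing = result[field]
                    if not isinstance(existing, list):
                        existing = [existing]
                    extra = value if isinstance(value, list) else [value]
                    result[field] = existing + extra
            else:
                raise ValueError("unsupported modifier: " + op)
    return result


if __name__ == "__main__":
    book = {"id": "book1", "title": "American Gods", "copies_i": 2, "remove_s": "x"}
    book = apply_atomic_update(book, {"id": "book1", "pubyear_i": {"add": 2001}})
    book = apply_atomic_update(
        book, {"id": "book1", "copies_i": {"inc": 1}, "remove_s": {"set": None}}
    )
    print(book)
```

The point of the model is that the client sends only the changed fields; the rest of the document is left as it was.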
12.
Optimistic Concurrency
• Conditional update based on document version
Client/Solr interaction:
1. /get document
2. Modify document, retaining _version_
3. /update resulting document
4. Go back to step #1 if the update fails with code=409
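The four-step loop can be sketched in Python as follows. To keep the example runnable without a server, it uses a hypothetical in-memory stand-in (`FakeSolr`) for the /get and /update endpoints, and a `VersionConflict` exception in place of an HTTP 409 response. Both names, as well as `update_with_retry`, are illustrative assumptions, and the stand-in only checks versions greater than 1.

```python
class VersionConflict(Exception):
    """Stands in for an HTTP 409 (Conflict) response."""


class FakeSolr:
    """In-memory stand-in for /get and /update; not a real Solr client."""

    def __init__(self):
        self._docs = {}
        self._next_version = 2  # keep stored versions greater than 1

    def get(self, doc_id):
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def update(self, doc):
        current = self._docs.get(doc["id"])
        supplied = doc.get("_version_", 0)
        # A supplied version > 1 must exactly match the stored version.
        if supplied > 1 and (current is None or current["_version_"] != supplied):
            raise VersionConflict(doc["id"])
        stored = dict(doc)
        stored["_version_"] = self._next_version
        self._next_version += 1
        self._docs[doc["id"]] = stored
        return stored["_version_"]


def update_with_retry(solr, doc_id, modify, max_attempts=5):
    """Get, modify, and update a document, retrying on version conflicts."""
    for _ in range(max_attempts):
        doc = solr.get(doc_id)           # step 1: /get document
        if doc is None:
            raise KeyError(doc_id)
        modify(doc)                      # step 2: modify, retaining _version_
        try:
            return solr.update(doc)      # step 3: /update resulting document
        except VersionConflict:
            continue                     # step 4: go back to step 1
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

Capping the number of attempts is a design choice of this sketch; a client under heavy contention may prefer a backoff between retries.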
13.
Version semantics
_version_   Update Semantics
> 1         Document version must exactly match supplied _version_
1           Document must exist
< 0         Document must not exist
0           Don't care (normal overwrite if exists)
• Specifying _version_ on any update invokes optimistic concurrency
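The table can be read as a single decision function. The sketch below encodes exactly the four rows above. The function name `check_version` and the convention of passing `None` for a missing document are assumptions made for illustration.

```python
def check_version(supplied, existing):
    """Return True if an update carrying `supplied` may proceed.

    `existing` is the stored document's _version_, or None if the
    document does not exist.
    """
    if supplied > 1:
        return existing == supplied   # must exactly match
    if supplied == 1:
        return existing is not None   # document must exist
    if supplied < 0:
        return existing is None       # document must not exist
    return True                       # 0: don't care, normal overwrite


if __name__ == "__main__":
    print(check_version(123456789, 123456789))  # matching version
    print(check_version(-1, None))              # create only if absent
```

A result of False corresponds to the conflict case, which the server reports as HTTP 409.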
14.
Optimistic Concurrency Example
Get the document:
$ curl 'http://localhost:8983/solr/get?id=book2'
{ "doc" : {
    "id":"book2",
    "title":["Neuromancer"],
    "author":"William Gibson",
    "copiesIn_i":7,
    "copiesOut_i":3,
    "_version_":123456789 }}
Modify and resubmit, using the same _version_:
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
  {
    "id":"book2",
    "title":["Neuromancer"],
    "author":"William Gibson",
    "copiesIn_i":6,
    "copiesOut_i":4,
    "_version_":123456789 }
]'
Alternately, specify the _version_ as a request parameter:
$ curl 'http://localhost:8983/solr/update?_version_=123456789' -H 'Content-type:application/json' -d […]
15.
Optimistic Concurrency Errors
• HTTP Code 409 (Conflict) returned on version mismatch
$ curl -i http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[{"id":"book1", "author":"Mr Bean", "_version_":12345}]'
HTTP/1.1 409 Conflict
Content-Type: text/plain;charset=UTF-8
Transfer-Encoding: chunked

{
  "responseHeader":{
    "status":409,
    "QTime":1},
  "error":{
    "msg":"version conflict for book1 expected=12345 actual=1408814192853516288",
    "code":409}}
17.
Schema REST API
• Restlet is now integrated with Solr
• Get a specific field
  curl http://localhost:8983/solr/schema/fields/price
  {"field":{
      "name":"price",
      "type":"float",
      "indexed":true,
      "stored":true }}
• Get all fields
  curl http://localhost:8983/solr/schema/fields
• Get Entire Schema!
  curl http://localhost:8983/solr/schema
18.
Dynamic Schema
• Add a new field (Solr 4.4)
  curl -XPUT http://localhost:8983/solr/schema/fields/strength -d '
  {"type":"float", "indexed":"true"}
  '
• Works in distributed (cloud) mode too!
• Schema must be managed & mutable (not currently the default)
  <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory>
19.
Schemaless
• "Schemaless" usually means that the client(s) have an implicit schema
• "No Schema" is impossible for anything based on Lucene
  • A field must be indexed the same way across documents
• Dynamic fields: convention over configuration
  • Only pre-define types of fields, not fields themselves
  • No guessing. Any field name ending in _i is an integer
• "Guessed Schema" or "Type Guessing"
  • For previously unknown fields, guess using JSON type as a hint
  • Coming soon (4.4?) based on the Dynamic Schema work
• Many disadvantages to guessing
  • Lose ability to catch field naming errors
  • Can't optimize based on types
  • Guessing incorrectly means having to start over
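The "convention over configuration" idea behind dynamic fields can be illustrated with a suffix lookup. In this Python sketch only the `_i` rule comes from the slide. The other suffixes in `DYNAMIC_FIELD_TYPES`, and the function name `resolve_type`, are assumptions chosen for illustration, so check them against the schema actually in use.

```python
# Suffix-to-type conventions. Only "_i" (integer) is stated in the slides;
# the remaining entries are illustrative assumptions.
DYNAMIC_FIELD_TYPES = {
    "_i": "int",
    "_s": "string",
    "_f": "float",
    "_b": "boolean",
}


def resolve_type(field_name, explicit_fields=None):
    """Return the type for a field: explicit definition first, then suffix rule."""
    explicit_fields = explicit_fields or {}
    if field_name in explicit_fields:
        return explicit_fields[field_name]
    for suffix, type_name in DYNAMIC_FIELD_TYPES.items():
        if field_name.endswith(suffix):
            return type_name
    # No guessing: an unknown name is an error, which catches typos early.
    raise KeyError("unknown field: " + field_name)


if __name__ == "__main__":
    print(resolve_type("pubyear_i"))
    print(resolve_type("price", {"price": "float"}))
```

Rejecting names that match no rule is what preserves the ability to catch field naming errors, in contrast to type guessing.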
22.
Distributed Indexing
[Diagram labels: shard1, shard2, http://.../solr/collection1/update]
• Update sent to any node
• Solr determines what shard the document is on, and forwards to shard leader
• Shard leader versions document and forwards to all other shard replicas
• HA for updates (if one leader fails, another takes its place)
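The routing step (decide which shard owns a document, then forward to that shard's leader) can be sketched as hash-range partitioning. This Python example is an illustration under stated assumptions: it uses CRC32 as a stand-in hash, which is not the hash Solr uses, it divides the 32-bit hash space into equal ranges, and the names `pick_shard` and `route_update` are hypothetical.

```python
import zlib


def pick_shard(doc_id, shard_names):
    """Map a document id to one shard by hashing it into equal hash ranges."""
    if not shard_names:
        raise ValueError("at least one shard is required")
    # CRC32 is a stand-in hash chosen because it ships with Python.
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    range_size = (1 << 32) // len(shard_names)
    # The last shard absorbs the remainder of the hash space.
    index = min(h // range_size, len(shard_names) - 1)
    return shard_names[index]


def route_update(doc, shard_names, leaders):
    """Return (shard, leader node) that an update for `doc` is forwarded to."""
    shard = pick_shard(doc["id"], shard_names)
    return shard, leaders[shard]


if __name__ == "__main__":
    shards = ["shard1", "shard2"]
    leaders = {"shard1": "nodeA", "shard2": "nodeB"}  # hypothetical node names
    print(route_update({"id": "book1"}, shards, leaders))
```

Because the mapping depends only on the id and the shard list, any node that receives the update computes the same destination.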
23.
Collections API
• Create a new document collection
  http://localhost:8983/solr/admin/collections?
    action=CREATE
    &name=mycollection
    &numShards=4
    &replicationFactor=3
• Delete a collection
  http://localhost:8983/solr/admin/collections?
    action=DELETE
    &name=mycollection
• Create an alias to a collection (or a group of collections)
  http://localhost:8983/solr/admin/collections?
    action=CREATEALIAS
    &name=tri_state
    &collections=NY,NJ,CT
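Since these calls are plain URLs with query parameters, they are easy to assemble in code. The following Python sketch builds the three URLs above with the standard library. The helper name `collections_url` is hypothetical, and the sketch only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/admin/collections"


def collections_url(action, **params):
    """Build a Collections API URL for `action` with the given parameters."""
    query = {"action": action}
    query.update(params)
    # Keep commas literal so lists such as NY,NJ,CT stay readable.
    return BASE + "?" + urlencode(query, safe=",")


if __name__ == "__main__":
    print(collections_url("CREATE", name="mycollection", numShards=4, replicationFactor=3))
    print(collections_url("DELETE", name="mycollection"))
    print(collections_url("CREATEALIAS", name="tri_state", collections="NY,NJ,CT"))
```

The resulting strings can be opened in a browser or passed to any HTTP client.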
25.
Distributed Query Requests
• Distributed query across all shards in the collection
  http://localhost:8983/solr/collection1/query?q=foo
• Explicitly specify node addresses to load-balance across
  shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr
  • Equivalent nodes in a list are separated by "|"
  • Different phases of the same distributed request use the same node
• Specify logical shards to search across
  shards=NY,NJ,CT
• Specify multiple collections to search across
  collection=collection1,collection2
• public CloudSolrServer(String zkHost)
  • ZK-aware SolrJ Java client that load-balances across all nodes in the cluster
  • Calculates where a document belongs and sends it directly to the shard leader (new)
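The `shards` parameter combines two separators: "," between shards and "|" between equivalent nodes of one shard. The Python sketch below builds and parses that format. The function names `shards_param` and `parse_shards_param` are hypothetical, and the sketch covers only the string format shown on this slide.

```python
def shards_param(groups):
    """Build a shards=... parameter.

    `groups` is a list of lists; each inner list holds the equivalent
    node addresses (or a single logical shard name) for one shard.
    """
    return "shards=" + ",".join("|".join(group) for group in groups)


def parse_shards_param(param):
    """Split a shards=... parameter back into a list of node groups."""
    prefix = "shards="
    if not param.startswith(prefix):
        raise ValueError("not a shards parameter: " + param)
    return [group.split("|") for group in param[len(prefix):].split(",")]


if __name__ == "__main__":
    print(shards_param([
        ["localhost:8983/solr", "localhost:8900/solr"],
        ["localhost:7574/solr", "localhost:7500/solr"],
    ]))
    print(shards_param([["NY"], ["NJ"], ["CT"]]))
```

A client doing its own load balancing would pick one address from each inner list per request.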
26.
Durable Writes
• Lucene flushes writes to disk on a "commit"
  • Uncommitted docs are lost on a crash (at Lucene level)
• Solr 4 maintains its own transaction log
  • Contains uncommitted documents
  • Services real-time get requests
  • Recovery (log replay on restart)
  • Supports distributed "peer sync"
• Writes forwarded to multiple shard replicas
  • A replica can go away forever w/o collection data loss
  • A replica can do a fast "peer sync" if it's only slightly out of date
  • A replica can do a full index replication (copy) from a peer
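The role of the transaction log can be shown with a toy model: an in-memory buffer that is lost on a crash, a durable log that survives it, and a replay step on restart. This Python sketch is a conceptual illustration only. The class `TinyIndex` and its methods are hypothetical, and emptying the log on commit is a simplification of this sketch.

```python
class TinyIndex:
    """Toy model of committed segments, an in-memory buffer, and a log."""

    def __init__(self):
        self.segments = {}  # committed documents; survive a crash
        self.buffer = {}    # uncommitted documents; lost on a crash
        self.tlog = []      # transaction log; assumed durable

    def add(self, doc):
        self.tlog.append(dict(doc))          # log first, then buffer
        self.buffer[doc["id"]] = dict(doc)

    def realtime_get(self, doc_id):
        # Newest log entry wins; fall back to committed data.
        for entry in reversed(self.tlog):
            if entry["id"] == doc_id:
                return dict(entry)
        doc = self.segments.get(doc_id)
        return dict(doc) if doc is not None else None

    def search(self, doc_id):
        # Searches only see committed documents in this model.
        doc = self.segments.get(doc_id)
        return dict(doc) if doc is not None else None

    def commit(self):
        self.segments.update(self.buffer)
        self.buffer.clear()
        self.tlog.clear()   # simplification: log emptied on commit

    def crash(self):
        self.buffer.clear()  # in-memory state is gone

    def recover(self):
        for entry in self.tlog:  # log replay on restart
            self.buffer[entry["id"]] = dict(entry)
```

In this model a document is retrievable by id immediately after `add`, and it survives `crash` followed by `recover` even though it was never committed.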
27.
Near Real Time (NRT) softCommit
• softCommit opens a new view of the index without flushing + fsyncing files to disk
  • Decouples update visibility from update durability
• commitWithin now implies a soft commit
• Current autoCommit defaults from solrconfig.xml:
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!--
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
  -->
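The decoupling of visibility from durability can be modelled by tracking the two properties separately. In this Python sketch, which is a conceptual toy and not how Lucene or Solr store data, `soft_commit` makes pending documents searchable, while `hard_commit` makes them durable and, as with `openSearcher=false`, does not change what searches see. The class name `NrtIndex` and its attributes are hypothetical.

```python
class NrtIndex:
    """Toy model that tracks visibility and durability independently."""

    def __init__(self):
        self.visible = {}      # what searches currently see
        self.durable = {}      # what has been flushed and fsynced
        self._unseen = {}      # accepted, not yet visible
        self._unflushed = {}   # accepted, not yet durable

    def add(self, doc):
        self._unseen[doc["id"]] = dict(doc)
        self._unflushed[doc["id"]] = dict(doc)

    def soft_commit(self):
        # Open a new view: pending documents become searchable, no flush.
        self.visible.update(self._unseen)
        self._unseen.clear()

    def hard_commit(self, open_searcher=False):
        # Flush pending documents; visibility changes only if requested.
        self.durable.update(self._unflushed)
        self._unflushed.clear()
        if open_searcher:
            self.soft_commit()


if __name__ == "__main__":
    index = NrtIndex()
    index.add({"id": "book1"})
    index.soft_commit()
    print("visible:", list(index.visible), "durable:", list(index.durable))
```

With the defaults shown above, this corresponds to a hard commit at most every 15 seconds for durability, with visibility controlled separately by soft commits.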
29.
Seamless Online Shard Splitting
[Diagram labels: Shard1, Shard2, Shard3 (each with a leader and a replica); Shard2_0, Shard2_1; update]
1. http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=Shard2
2. New sub-shards created in "construction" state
3. Leader starts forwarding applicable updates, which are buffered by the sub-shards
4. Leader index is split and installed on the sub-shards
5. Sub-shards apply buffered updates, then become "active" leaders, and the old shard becomes "inactive"
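Steps 2 to 5 can be walked through with a small simulation. This Python sketch assumes that a shard covers an inclusive range of integer hash values and that documents are keyed by their hash. Both are simplifications for illustration, and the names `SubShard` and `split_shard` are hypothetical.

```python
class SubShard:
    """A sub-shard covering the inclusive hash range [lo, hi]."""

    def __init__(self, name, lo, hi):
        self.name, self.lo, self.hi = name, lo, hi
        self.state = "construction"   # step 2: created in construction state
        self.docs = {}
        self.buffer = []

    def covers(self, h):
        return self.lo <= h <= self.hi


def split_shard(parent_docs, lo, hi, updates_during_split,
                names=("Shard2_0", "Shard2_1")):
    """Split a parent shard's range in half and return the two sub-shards."""
    mid = lo + (hi - lo) // 2
    subs = [SubShard(names[0], lo, mid), SubShard(names[1], mid + 1, hi)]

    # Step 3: updates arriving during the split are buffered by the
    # sub-shard whose range covers them.
    for h, doc in updates_during_split:
        for sub in subs:
            if sub.covers(h):
                sub.buffer.append((h, doc))

    # Step 4: the parent index is split and installed on the sub-shards.
    for h, doc in parent_docs.items():
        for sub in subs:
            if sub.covers(h):
                sub.docs[h] = doc

    # Step 5: buffered updates are applied, then the sub-shard goes active.
    for sub in subs:
        for h, doc in sub.buffer:
            sub.docs[h] = doc
        sub.buffer.clear()
        sub.state = "active"
    return subs
```

Applying the buffer after installing the split index means an update received during the split overrides the older copy from the parent, so no write is lost.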
30.
Stay in touch
https://twitter.com/LucidWorks
http://www.linkedin.com/company/lucidworks
http://plus.google.com/u/0/b/112313059186533721298/