Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Cassandra useful features
1. self-tunningRowCache
Readpath at top: row isupdate storedindifferentfiles.Updatesemailindatafile whileother data
storedinotherfile. Here we store onlymergedCache andupdate onlyfile where emailisupdate.No
cachingof fragmentsbut cachingwhole mergedrowssavedCPUand diskIO.
Anotheradvantage noneedtomerge rowsbefore presentingtouser.How muchmemoryforeachrow
caches nowautomatic.
Supportfor SSD/HDDin same cluster.
User session/activitystreamupdate andreadconstantlywrite once andmostrecent readlater.Each
table separate subdirectorylinkthat to ssdvolume andhddvolume pinthattable tomediawhichis
betterforits performance profile.
2. Lock free concurrencycontrol byrow level isolation.
ACID:Appendupdate tocommitlogevenif crash itwill be replayedfromcommitlog.
DistributedSystemconsistency:Givesmostrecentdataforthe row requested.
Concurentschemachanges: create table ordrop table thenpropagate acrossclusterbefore anotherone
but nowfor analytical workflowmultiple clientable toissue schemachangesable toreconcile and
merge those changesonfly.
JBOD:Justa bitof diskconfiguration: Whichhelpindetectinglostdiskunderreplicationearlieritwasleft
to RAID10 array and RAID10 deal withsingle diskfailure.
Virtual nodes:
3. Leftone node data splitinto3 virtual nodesonrightspreadon cluster.Addinganew node toclusterto
rebuildfailedclustereachof those virtual node canparticipate inrebuild.
Each NodeinringisassignedaTokenrebuildwithoutVirtual node.
Parallelize rebuildwithvirtual node eachone havingitsowntoken.
4. Casandara SQL limitation:
No joins, (Canbe done withHIVEsupport)
No subqueries,
No aggregationfunctionon groupby, (Canbe done withHIVE support)
limitedOrderBY.
Partitionblockbyblock_idsub-blockare orderedwithincluster.compoundprimarykeyfirstpartis
block_idispartitionkeyhence partitionbyblockidandguaranteesall sub-blockforablockremainlocal
to same machine andclusterdandorderedbysubblock.