When to NoSQL and When 
to Know SQL 
Simon Elliston Ball 
Head of Big Data 
@sireb 
#noSQLknowSQL 
http://nosqlknowsql.io
what is NoSQL? 
SQL 
Not only SQL 
Many many things 
NoSQL 
No, SQL
before SQL 
files 
multi-value 
er… hash maps?
after SQL 
everything is relational 
ORMs fill in the other data structures 
scale up rules 
data first design
and now NoSQL 
datastores that suit applications 
polyglot persistence: the right tools 
scale out rules 
APIs not EDWs
why should you care? 
data growth 
rapid development 
fewer migration headaches… maybe 
machine learning 
social
big bucks. 
O’Reilly 2013 Data Science Salary Survey
So many NoSQLs…
document databases
document databases 
rapid development 
JSON docs 
complex, variable models 
known access pattern
document databases 
learn a very new query language 
denormalize 
document 
form 
joins? JUST DON’T 
http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/
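A minimal Python sketch of what "denormalize" into "document form" can mean in practice. The `posts` and `comments` data and the `to_document` helper are hypothetical, invented for illustration; the point is only that the related rows are embedded up front, so a read needs no join.

```python
import json

# Hypothetical normalized rows, as they might sit in two relational tables.
posts = [{"id": 1, "title": "When to NoSQL", "author_id": 7}]
comments = [
    {"post_id": 1, "author": "ann", "text": "Nice talk"},
    {"post_id": 1, "author": "bob", "text": "What about joins?"},
]


def to_document(post, all_comments):
    """Embed a post's comments so one read returns everything the page needs."""
    doc = dict(post)
    doc["comments"] = [
        {"author": c["author"], "text": c["text"]}
        for c in all_comments
        if c["post_id"] == post["id"]
    ]
    return doc


document = to_document(posts[0], comments)
print(json.dumps(document, indent=2))
```

This works well when the access pattern is known (always "a post with its comments"); it is also why later questions that cut across documents become awkward.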
document vs SQL 
what can SQL do? 
query all the angles 
sure, you can use blobs… 
… but you can’t get into them
documents in SQL 
SQL xml fields 
mapping xquery paths is painful 
native JSON 
but still structured
query everything: search 
a class of database
full-text indexing
you know google, right… 
range query 
span query 
keyword query
you know the score 
scores
{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "NoSQL" }
      },
      "boost": 1,
      "functions": [
        { "gauss": { "timestamp": { "scale": "4w" } } },
        {
          "script_score": {
            "script": "_score * (doc['important_document'].value ? 2 : 1)"
          }
        }
      ],
      "score_mode": "sum"
    }
  }
}
SQL knows the score too 
scores 
declare @origin float = 0;
declare @delay_weeks float = 4;
SELECT TOP 10 * FROM (
    SELECT title,
        score *
        CASE
            WHEN p.important = 1 THEN 2.0
            WHEN p.important = 0 THEN 1.0
        END
        * exp(-power(timestamp - @origin, 2) / (2 * power(@delay_weeks * 7 * 24 * 3600, 2)))
        + 1
        AS score
    FROM posts p
    WHERE title LIKE '%NoSQL%'
) as found
ORDER BY score DESC
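The scoring idea on these two slides can be sketched in a few lines of Python: a base relevance score, multiplied by an importance factor and a Gaussian time decay, plus a constant. The function name and parameters are hypothetical, and treating the four-week scale directly as the Gaussian's standard deviation is a simplifying assumption (a search engine may derive its decay width differently).

```python
import math

WEEK_SECONDS = 7 * 24 * 3600


def decayed_score(base_score, important, timestamp, origin=0.0, scale_weeks=4.0, boost=1.0):
    """Base relevance x importance multiplier x Gaussian time decay, plus a constant boost."""
    sigma = scale_weeks * WEEK_SECONDS  # assumption: scale is used as the standard deviation
    decay = math.exp(-((timestamp - origin) ** 2) / (2 * sigma ** 2))
    multiplier = 2.0 if important else 1.0
    return base_score * multiplier * decay + boost


# A document at the origin keeps its full score; one twelve weeks away has decayed.
fresh = decayed_score(1.0, important=True, timestamp=0.0)
old = decayed_score(1.0, important=True, timestamp=12 * WEEK_SECONDS)
print(fresh, old)
```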
you know google, right… 
more like this: instant tf-idf 
{ 
"more_like_this" : { 
"fields" : ["name.first", "name.last"], 
"like_text" : "text like this one", 
"min_term_freq" : 1, 
"max_query_terms" : 12 
} 
}
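For readers who have not met tf-idf, here is a rough Python sketch of the weighting behind "more like this". The tiny corpus and the `tf_idf` helper are made up for illustration, and the formula used (term frequency times ln(N / document frequency)) is one common textbook variant, not necessarily what any particular search engine computes.

```python
import math
from collections import Counter

# Hypothetical three-document corpus.
docs = {
    "a": "nosql document store",
    "b": "sql relational store",
    "c": "nosql graph database",
}


def tf_idf(doc_id, corpus):
    """Return {term: tf * idf} for one document, with idf = ln(N / df)."""
    tokenized = {key: text.split() for key, text in corpus.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(term for words in tokenized.values() for term in set(words))
    counts = Counter(tokenized[doc_id])
    total = sum(counts.values())
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }


weights = tf_idf("a", docs)
# Terms that are rare across the corpus rank highest and make the best "like this" query terms.
top_terms = sorted(weights, key=weights.get, reverse=True)
print(top_terms)
```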
Facets
Facets 
SQL: 
x lots 
SELECT a.name, count(p.id) FROM 
people p 
JOIN industry a on a.id = p.industry_id 
JOIN people_keywords pk on pk.person_id = p.id 
JOIN keywords k on k.id = pk.keyword_id 
WHERE CONTAINS(p.description, 'NoSQL') 
OR k.name = 'NoSQL' 
... 
GROUP BY a.name 
SELECT a.name, count(p.id) FROM 
people p 
JOIN area a on a.id = p.area_id 
JOIN people_keywords pk on pk.person_id = p.id 
JOIN keywords k on k.id = pk.keyword_id 
WHERE CONTAINS(p.description, 'NoSQL') 
OR k.name = 'NoSQL' 
... 
GROUP BY a.name
Facets 
Elasticsearch:
{
  "query": {
    "query_string": {
      "default_field": "content",
      "query": "keywords"
    }
  },
  "facets": {
    "myTerms": {
      "terms": {
        "field": "lang",
        "all_terms": true
      }
    }
  }
}
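The contrast between the two facet slides is that SQL needs one GROUP BY query per facet, while a search engine returns every facet alongside the hits. A minimal Python sketch of that single-pass counting follows; the `hits` data and `facet_counts` helper are hypothetical.

```python
from collections import Counter

# Hypothetical search hits; each hit carries the fields we might facet on.
hits = [
    {"name": "Ann", "industry": "Finance", "area": "London"},
    {"name": "Bob", "industry": "Retail", "area": "London"},
    {"name": "Cat", "industry": "Finance", "area": "Leeds"},
]


def facet_counts(results, fields):
    """Count values for every facet field in one pass over the result set."""
    counts = {field: Counter() for field in fields}
    for hit in results:
        for field in fields:
            counts[field][hit[field]] += 1
    return counts


facets = facet_counts(hits, ["industry", "area"])
print(facets)
```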
logs 
untyped free-text documents 
timestamped 
semi-structured 
discovery 
aggregation and statistics
key: value 
close to your programming model 
distributed map | list | set 
keys can be objects
SQL extensions 
hash types 
hstore
SQL and polymorphism 
inheritance 
ORMs hide the horror
turning round the rows 
columnar databases 
physical layout matters
turning round the rows 
key  value  type
1    A      Home
2    B      Work
3    C      Work
4    D      Work
Row storage
00001 1 A Home 00002 2 B Work 00003 3 C Work …
Column storage
A B C D Home Work Work Work …
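A small Python sketch of why "physical layout matters", using the four rows from the slide. The variable names and the run-length encoder are illustrative assumptions; real columnar engines use far more elaborate encodings.

```python
rows = [
    (1, "A", "Home"),
    (2, "B", "Work"),
    (3, "C", "Work"),
    (4, "D", "Work"),
]

# Row storage: each record's fields sit next to each other.
row_layout = [field for row in rows for field in row]

# Column storage: each column's values sit next to each other.
keys, values, types = (list(col) for col in zip(*rows))
column_layout = keys + values + types


def run_length_encode(column):
    """Collapse runs of repeated values, one reason columns compress well."""
    encoded = []
    for item in column:
        if encoded and encoded[-1][0] == item:
            encoded[-1][1] += 1
        else:
            encoded.append([item, 1])
    return encoded


work_count = types.count("Work")  # an aggregate reads one column, not every row
print(run_length_encode(types), work_count)
```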
teaching an old SQL new tricks 
MySQL: Infobright
SQL Server: columnstore indexes
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_col
ON Orders (OrderDate, DueDate, ShipDate);
Great for your data warehouse, but no use for OLTP
column for hadoop and other animals 
ORC files 
Parquet 
http://parquet.io
column families 
wide column databases
millions of columns 
eventually consistent 
CQL 
http://cassandra.apache.org/ 
http://www.datastax.com/ 
set | list | map types
cell level security 
SQL: so many views, so much confusion 
accumulo https://accumulo.apache.org/
time 
retrieving time series and graphs 
window functions 
SELECT business_date, ticker, 
close, 
close / 
LAG(close,1) OVER (PARTITION BY ticker ORDER BY business_date ASC) 
- 1 AS ret 
FROM sp500
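To show what the LAG window function is doing, here is a rough pure-Python equivalent of the query above. The `prices` rows are invented stand-ins for the `sp500` table, and `daily_returns` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical (business_date, ticker, close) rows standing in for the sp500 table.
prices = [
    ("2014-01-02", "AAA", 100.0),
    ("2014-01-03", "AAA", 110.0),
    ("2014-01-02", "BBB", 50.0),
    ("2014-01-03", "BBB", 45.0),
]


def daily_returns(rows):
    """Mimic LAG(close, 1) OVER (PARTITION BY ticker ORDER BY business_date)."""
    by_ticker = defaultdict(list)
    for date, ticker, close in rows:
        by_ticker[ticker].append((date, close))
    result = []
    for ticker, series in by_ticker.items():
        previous = None  # LAG yields NULL for the first row of each partition
        for date, close in sorted(series):
            ret = None if previous is None else close / previous - 1
            result.append((date, ticker, ret))
            previous = close
    return result


returns = daily_returns(prices)
for row in returns:
    print(row)
```

The SQL version says the same thing in one expression, which is the slide's point: for this kind of time series question, window functions already do the job.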
time 
OpenTSDB
InfluxDB
key:sub-key:metric 
key:* 
key:*:metric
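The key patterns on this slide can be illustrated with glob-style matching in Python. This is only a sketch of the naming idea: the series names are hypothetical, and OpenTSDB and InfluxDB each have their own query syntax, which this does not reproduce.

```python
from fnmatch import fnmatchcase

# Hypothetical series names following the key:sub-key:metric convention.
series = {
    "web:host1:cpu": [0.4, 0.6],
    "web:host1:mem": [0.7, 0.7],
    "web:host2:cpu": [0.9, 0.8],
    "db:host3:cpu": [0.2, 0.3],
}


def select(pattern, store):
    """Return the series whose name matches a glob-style pattern."""
    return {name: points for name, points in store.items() if fnmatchcase(name, pattern)}


web_all = select("web:*", series)      # key:*
web_cpu = select("web:*:cpu", series)  # key:*:metric
print(sorted(web_all), sorted(web_cpu))
```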
Queues
queues in SQL 
CREATE PROCEDURE [dbo].[Dequeue]
AS
    SET NOCOUNT ON

    DECLARE @BatchSize int
    SET @BatchSize = 10
    DECLARE @Batch table (QueueID int, QueueDateTime datetime, Title nvarchar(255))

    BEGIN TRAN

    -- Claim the oldest pending items, holding update locks so no other reader takes them
    INSERT INTO @Batch
    SELECT TOP (@BatchSize) QueueID, QueueDateTime, Title
    FROM QueueMeta WITH (UPDLOCK, HOLDLOCK)
    WHERE Status = 0
    ORDER BY QueueDateTime ASC

    DECLARE @ItemsToUpdate int
    SET @ItemsToUpdate = @@ROWCOUNT

    -- Mark the claimed items as taken
    UPDATE QueueMeta
    SET Status = 1
    WHERE QueueID IN (SELECT QueueID FROM @Batch)
    AND Status = 0

    -- Commit only if every claimed item was updated; otherwise undo the claim
    IF @@ROWCOUNT = @ItemsToUpdate
    BEGIN
        COMMIT TRAN
        SELECT b.*, q.TextData
        FROM @Batch b
        INNER JOIN QueueData q ON q.QueueID = b.QueueID
        PRINT 'SUCCESS'
    END
    ELSE
    BEGIN
        ROLLBACK TRAN
        PRINT 'FAILED'
    END
queues in SQL 
index fragmentation is a problem 
but built-in logs of a sort
message queues 
specialised APIs
capabilities like fan-out 
routing 
acknowledgement
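A toy in-memory sketch of two of the capabilities listed here, fan-out and acknowledgement (routing is left out). The `FanoutExchange` class and its method names are hypothetical and do not correspond to any real broker's API; a real message queue adds persistence, networking and much more.

```python
from collections import deque


class FanoutExchange:
    """Toy in-memory broker: every bound queue gets its own copy of a message."""

    def __init__(self):
        self.queues = {}
        self.unacked = {}
        self.next_tag = 0

    def bind(self, queue_name):
        self.queues[queue_name] = deque()
        self.unacked[queue_name] = {}

    def publish(self, message):
        for queue in self.queues.values():
            queue.append(message)

    def get(self, queue_name):
        """Hand out the next message and remember it until it is acknowledged."""
        message = self.queues[queue_name].popleft()
        self.next_tag += 1
        self.unacked[queue_name][self.next_tag] = message
        return self.next_tag, message

    def ack(self, queue_name, tag):
        del self.unacked[queue_name][tag]

    def requeue_unacked(self, queue_name):
        """Put unacknowledged messages back, as a broker would after a consumer dies."""
        pending = self.unacked[queue_name]
        for tag in sorted(pending, reverse=True):
            self.queues[queue_name].appendleft(pending.pop(tag))


exchange = FanoutExchange()
exchange.bind("email")
exchange.bind("audit")
exchange.publish("order-42")

tag, msg = exchange.get("email")
exchange.ack("email", tag)         # the email consumer finished its work
exchange.get("audit")              # the audit consumer crashes before acking
exchange.requeue_unacked("audit")  # so its message becomes available again
```

Compare this with the stored procedure on the "queues in SQL" slide: the locking, status flags and rollback there exist to emulate what `get` and `ack` express directly.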
relationships count 
Graph databases
relationships count 
trees and hierarchies 
overloaded relationships 
fancy algorithms
hierarchies with SQL 
adjacency lists 
CONSTRAINT fk_parent_id
FOREIGN KEY (parent_id) REFERENCES
some_table (id)
materialised path
path = 1.2.23.55.786.33425
nested sets (MPTT)
Node  Left  Right  Depth
A     1     22     1
B     2     9      2
C     10    21     2
D     3     4      3
E     5     6      3
F     7     8      3
G     11    18     3
H     19    20     3
I     12    13     4
J     14    15     4
K     16    17     4
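A short Python sketch of how the nested-set numbers on this slide answer hierarchy questions without recursion: a node's descendants are exactly the nodes whose left/right interval falls inside its own. The node data is copied from the slide; the helper names are hypothetical.

```python
# (node, left, right, depth) rows copied from the nested-set table on the slide.
nodes = [
    ("A", 1, 22, 1), ("B", 2, 9, 2), ("C", 10, 21, 2), ("D", 3, 4, 3),
    ("E", 5, 6, 3), ("F", 7, 8, 3), ("G", 11, 18, 3), ("H", 19, 20, 3),
    ("I", 12, 13, 4), ("J", 14, 15, 4), ("K", 16, 17, 4),
]
bounds = {name: (left, right) for name, left, right, _ in nodes}


def descendants(name):
    """Everything whose interval lies strictly inside the node's interval."""
    left, right = bounds[name]
    return sorted(n for n, (l, r) in bounds.items() if left < l and r < right)


def ancestors(name):
    """Everything whose interval strictly contains the node's interval."""
    left, right = bounds[name]
    return sorted(n for n, (l, r) in bounds.items() if l < left and right < r)


def materialised_path(name):
    """Rebuild a dotted path (root first) from the nested-set intervals."""
    chain = sorted(ancestors(name) + [name], key=lambda n: bounds[n][0])
    return ".".join(chain)


print(descendants("C"), ancestors("J"), materialised_path("J"))
```

In SQL the same tests become simple range predicates on the left and right columns; the cost is that inserting a node means renumbering part of the tree.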
Consolidation 
OrientDB http://www.orientechnologies.com/
ArangoDB https://www.arangodb.org/
SQLs with NoSQL
Postgres
MySQL
Big Data
SQL on Hadoop
More than SQL 
Shark 
Drill 
Cascading 
Map Reduce
System issues, 
Speed issues, 
Soft issues
the ACID, BASE litmus 
Atomic 
Consistent 
Isolated 
Durable 
Basically Available 
Soft-state 
Eventually consistent 
what matters to you?
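As a concrete reminder of what the "Atomic" in ACID buys you, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table and `transfer` function are hypothetical; the example only shows that a failed multi-statement change leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("ann", 100), ("bob", 0)])
conn.commit()


def transfer(connection, source, target, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "ann", "bob", 250)  # would overdraw ann, so the whole transfer fails
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

A BASE-style store would typically accept the first write immediately and reconcile later, which is the trade-off this slide asks you to weigh.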
CAP it all 
Consistency
Availability
Partition tolerance
write fast, ask questions later 
SQL writes cost a lot 
mainly write workload: NoSQL 
low latency write workload: NoSQL
is it web scale? 
most NoSQL scales well 
but clusters still need management 
are you Facebook? one machine is easier than n
can ops handle it? app developers make bad admins
who is going to use it? 
analysts: they want SQL 
developers: they want applications 
data scientists: they want access
choose the 
right tool 
Photo: http://www.homespothq.com/
Thank you! 
Simon Elliston Ball 
simon@simonellistonball.com 
@sireb 
#noSQLknowSQL 
http://nosqlknowsql.io
Questions 
Simon Elliston Ball 
simon@simonellistonball.com 
@sireb 
http://nosqlknowsql.io

When to no sql and when to know sql javaone


Editor's Notes

  1. Some SQL implementations have added a degree of document-style support. SQL Server's XML type lets you dig into documents with XQuery, which is not much fun. Postgres also supports native JSON objects, so it makes a great solution for slinging JSON to a web server if you have multiple columns, or rather keys, that you want to use to get the data out. These do tend to be very bolt-on solutions.
  2. Range queries, span queries and boosting give you a probabilistic match.
  3. Search engines can also be thought of as sort engines: they order your results by a relevance score. This can be extremely flexible. For example, Elasticsearch lets you control the sort function in huge detail; here, a Gaussian decay of relevance for articles over time. Nesting queries, and adding boost factors to certain fields or certain classes of document based on custom expressions and scripts, can be very powerful, and much easier to express than it would be in SQL.
  4. So how would you do this in SQL? It doesn't seem so bad. However, here we've used a subquery, which requires the entire matching set to be evaluated. We're also using a simple scoring function, which can be just a CASE statement; if we wanted to involve multiple fields, the scoring could get quite obtuse in SQL.
  5. You CAN implement a queue in SQL. That doesn't mean you should.
  6. Big Data is a big topic. Hadoop is the leading platform, but it's a platform.
  7. Go through each one; say who made it and what it does. Spark: technically, Spark is the framework and the SQL part is called Shark, but Shark doesn't have a logo, so it can't go on the logo slide. Hive: aiming for SQL completeness; compiles to MapReduce; currently does most things, but not correlated subqueries.