SlideShare uma empresa Scribd logo
1 de 25
Roma – 7 Febbraio 2017
presenta Alberto Paro, Seacom
Elastic & Spark.
Building A Search Geo Locator
Alberto Paro
 Laureato in Ingegneria Informatica (POLIMI)
 Autore di 3 libri su ElasticSearch da 1 a 5.x + 6 Tech
review
 Lavoro principalmente in Scala e su tecnologie BD
(Akka, Spray.io, Playframework, Apache Spark) e NoSQL
(Accumulo, Cassandra, ElasticSearch e MongoDB)
 Evangelist linguaggio Scala e Scala.JS
Elasticseach 5.x - Cookbook
 Choose the best ElasticSearch cloud topology to deploy and power it up
with external plugins
 Develop tailored mapping to take full control of index steps
 Build complex queries through managing indices and documents
 Optimize search results through executing analytics aggregations
 Monitor the performance of the cluster and nodes
 Install Kibana to monitor cluster and extend Kibana for plugins.
 Integrate ElasticSearch in Java, Scala, Python and Big Data applications
Discount code for Ebook: ALPOEB50
Discount code for Print Book: ALPOPR15
Expiration Date: 21st Feb 2017
Obiettivi
 Architetture Big Data con ES
 Apache Spark
 GeoIngester
 Data Collection
 Ottimizzazione Indici
 Ingestion via Apache Spark
 Ricerca per un luogo
 Cenni di Big Data Tools
Architettura
Hadoop / Spark
Input
Iter 1
HDFS
Iter 2
HDFS
HDFS
Read
HDFS
Read
HDFS
Write
HDFS
Write
Input
Iter 1 Iter 2
Hadoop MapReduce
Apache Spark
Evoluzione del modello Map Reduce
Apache Spark
 Scritto in Scala con API in Java, Python e R
 Evoluzione del modello Map/Reduce
 Potenti moduli a corredo:
 Spark SQL
 Spark Streaming
 MLLib (Machine Learning)
 GraphX (graph)
Geoname
GeoNames è un database geografico, scaricabile gratuitamente sotto
licenza creative commons.
Contiene circa 10 millioni di nomi geografici e consiste di circa 9
milioni di feature uniqche di cui 2.8 milioni di posti popolati e 5.5
millioni di nomi alternativi.
Può essere facilmente scaricato da
http://download.geonames.org/export/dump come file CSV.
Il codice è disponibile all’indirizzo:
https://github.com/aparo/elasticsearch-geonames-locator
Geoname - Struttura
No. Attribute name Explanation
1 geonameid Unique ID for this geoname
2 name The name of the geoname
3 asciiname ASCII representation of the name
4 alternatenames Other forms of this name. Generally in several languages
5 latitude Latitude in decimal degrees of the Geoname
6 longitude Longitude in decimal degrees of the Geoname
7 fclass Feature class see http://www.geonames.org/export/codes.html
8 fcode Feature code see http://www.geonames.org/export/codes.html
9 country ISO-3166 2-letter country code
10 cc2 Alternate country codes, comma separated, ISO-3166 2-letter country code
11 admin1 Fipscode (subject to change to iso code
12 admin2 Code for the second administrative division, a county in the US
13 admin3 Code for third level administrative division
14 admin4 Code for fourth level administrative division
14 population The Population of Geoname
14 elevation The elevation in meters of Geoname
14 gtopo30 Digital elevation model
14 timezone The timezone of Geoname
14 moddate The date of last change of this Geoname
Ottimizzazione indici – 1/2
Necessario per:
 Rimuove campi non richiesti.
 Gestire campi Geo Point.
 Ottimizzare i campi stringa (text, keyword)
 Numeri shard corretto (11M records => 2 shards)
Vantaggi => performances/spazio/CPU
Ottimizzazione indici – 2/2
{
"mappings": {
"geoname": {
"properties": {
"admin1": {
"type": "keyword",
"ignore_above": 256
},
…
"alternatenames": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
…
…
"location": {
"type": "geo_point"
},
…
"longitude": {
"type": "float"
},
"moddate": {
"type": "date"
},
Ingestion via Spark – GeonameIngester – 1/7
Il nostro ingester eseguirà i seguenti steps:
 Inizializzazione Job Spark
 Parse del CSV
 Definizione della struttura di indicizzazione
 Popolamento delle classi
 Scrittura dati in Elasticsearch
 Esecuzione del Job Spark
Ingestion via Spark – GeonameIngester – 2/7
Inizializzazione di un Job Spark
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._
import org.elasticsearch.spark.rdd.EsSpark
import scala.util.Try
object GeonameIngester {
def main(args: Array[String]) {
val sparkSession = SparkSession.builder
.master("local")
.appName("GeonameIngester")
.getOrCreate()
Ingestion via Spark – GeonameIngester – 3/7
Parse del CSV
val geonameSchema = StructType(Array(
StructField("geonameid", IntegerType, false),
StructField("name", StringType, false),
StructField("asciiname", StringType, true),
StructField("alternatenames", StringType, true),
StructField("latitude", FloatType, true), ….
val GEONAME_PATH = "downloads/allCountries.txt"
val geonames = sparkSession.sqlContext.read
.option("header", false)
.option("quote", "")
.option("delimiter", "t").option("maxColumns", 22)
.schema(geonameSchema)
.csv(GEONAME_PATH)
.cache()
Ingestion via Spark – GeonameIngester – 4/7
Definizione delle nostre classi per l’Inidicizzazione
case class GeoPoint(lat: Double, lon: Double)
case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String],
latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String,
cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4:
Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate:
String)
implicit def emptyToOption(value: String): Option[String] = {
if (value == null) return None
val clean = value.trim
if (clean.isEmpty) { None } else { Some(clean)}
}
Ingestion via Spark – GeonameIngester – 5/7
Definizione delle nostre classi per l’Inidicizzazione
case class GeoPoint(lat: Double, lon: Double)
case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String],
latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String,
cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4:
Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate:
String)
implicit def emptyToOption(value: String): Option[String] = {
if (value == null) return None
val clean = value.trim
if (clean.isEmpty) { None } else { Some(clean)}
}
Ingestion via Spark – GeonameIngester – 6/7
Popolazione delle nostre classi
val records = geonames.map {
row =>
val id = row.getInt(0)
val lat = row.getFloat(4)
val lon = row.getFloat(5)
Geoname(id, row.getString(1), row.getString(2),
Option(row.getString(3)).map(_.split(",").map(_.trim).filterNot(_.isEmpty).toList).getOrElse(Nil),
lat, lon, GeoPoint(lat, lon),
row.getString(6), row.getString(7), row.getString(8), row.getString(9),
row.getString(10), row.getString(11), row.getString(12), row.getString(13),
row.getDouble(14), fixNullInt(row.get(15)), row.getInt(16), row.getString(17),
row.getDate(18).toString
)
}
Ingestion via Spark – GeonameIngester – 7/7
Scrittura in Elasticsearch
EsSpark.saveToEs(records.toJavaRDD, "geonames/geoname", Map("es.mapping.id" ->
"geonameid"))
Esecuzione di uno Spark Job
spark-submit --class GeonameIngester target/scala-2.11/elasticsearch-geonames-locator-
assembly-1.0.jar
(~20 minuti su singola macchina)
Ricerca di un luogo
curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{ "term": { "name": "moscow"}},
{ "term": { "alternatenames": "moscow"}},
{ "term": { "asciiname": "moscow" }}
],
"filter": [
{ "term": { "fclass": "P" }},
{ "range": { "population": {"gt": 0}}}
]
}
},
"sort": [ { "population": { "order": "desc"}}]
}'
NoSQL
Key-Value
 Redis
 Voldemort
 Dynomite
 Tokio*
BigTable Clones
 Accumulo
 Hbase
 Cassandra
Document
 CouchDB
 MongoDB
 ElasticSearch
GraphDB
 Neo4j
 OrientDB
 …Graph
Message Queue
 Kafka
 RabbitMQ
 ...MQ
NoSQL - Evolution
MicroServices
Linguaggio – Scala vs Java
public class User {
private String firstName;
private String lastName;
private String email;
private Password password;
public User(String firstName, String lastName,
String email, Password password) {
this.firstName = firstName;
this.lastName = lastName;
this.email = email;
this.password = password;
}
public String getFirstName() {return firstName; }
public void setFirstName(String firstName) { this.firstName = firstName; }
public String getLastName() { return lastName; }
public void setLastName(String lastName) { this.lastName = lastName; }
public String getEmail() { return email; }
public void setEmail(String email) { this.email = email; }
public Password getPassword() { return password; }
public void setPassword(Password password) { this.password = password; }
@Override public String toString() {
return "User [email=" + email + ", firstName=" + firstName + ", lastName=" + lastName + "]"; }
@Override public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((email == null) ? 0 : email.hashCode());
result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
result = prime * result + ((lastName == null) ? 0 : firstName.hashCode());
result = prime * result + ((password == null) ? 0 : password.hashCode());
return result; }
@Override public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
User other = (User) obj;
if (email == null) {
if (other.email != null)
return false;
} else if (!email.equals(other.email))
return false;
if (password == null) {
if (other.password != null)
return false;
} else if (!password.equals(other.password))
return false;
if (firstName == null) {
if (other.firstName != null)
return false;
} else if (!firstName.equals(other.firstName))
return false;
if (lastName == null) {
case class User(
var firstName:String,
var lastName:String,
var email:String,
var password:Password)
JAVASCALA
Grazie per
l’attenzione
Alberto Paro
Q&A

Mais conteúdo relacionado

Mais procurados

Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04Krishna Sankar
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Jonathan Katz
 
Scalding - Hadoop Word Count in LESS than 70 lines of code
Scalding - Hadoop Word Count in LESS than 70 lines of codeScalding - Hadoop Word Count in LESS than 70 lines of code
Scalding - Hadoop Word Count in LESS than 70 lines of codeKonrad Malawski
 
Scalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceScalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceLivePerson
 
Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? DataWorks Summit
 
Apache avro and overview hadoop tools
Apache avro and overview hadoop toolsApache avro and overview hadoop tools
Apache avro and overview hadoop toolsalireza alikhani
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineJason Terpko
 
Manifests of Future Past
Manifests of Future PastManifests of Future Past
Manifests of Future PastPuppet
 
JSDC 2014 - functional java script, why or why not
JSDC 2014 - functional java script, why or why notJSDC 2014 - functional java script, why or why not
JSDC 2014 - functional java script, why or why notChengHui Weng
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Realm: Building a mobile database
Realm: Building a mobile databaseRealm: Building a mobile database
Realm: Building a mobile databaseChristian Melchior
 
Going beyond Django ORM limitations with Postgres
Going beyond Django ORM limitations with PostgresGoing beyond Django ORM limitations with Postgres
Going beyond Django ORM limitations with PostgresCraig Kerstiens
 
The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212Mahmoud Samir Fayed
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...Altinity Ltd
 
GPars (Groovy Parallel Systems)
GPars (Groovy Parallel Systems)GPars (Groovy Parallel Systems)
GPars (Groovy Parallel Systems)Gagan Agrawal
 
Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!Michaël Figuière
 

Mais procurados (19)

Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04
 
Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)Accelerating Local Search with PostgreSQL (KNN-Search)
Accelerating Local Search with PostgreSQL (KNN-Search)
 
Scalding - Hadoop Word Count in LESS than 70 lines of code
Scalding - Hadoop Word Count in LESS than 70 lines of codeScalding - Hadoop Word Count in LESS than 70 lines of code
Scalding - Hadoop Word Count in LESS than 70 lines of code
 
Scalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduceScalding: Reaching Efficient MapReduce
Scalding: Reaching Efficient MapReduce
 
Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch? Should I Use Scalding or Scoobi or Scrunch?
Should I Use Scalding or Scoobi or Scrunch?
 
Apache avro and overview hadoop tools
Apache avro and overview hadoop toolsApache avro and overview hadoop tools
Apache avro and overview hadoop tools
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
Manifests of Future Past
Manifests of Future PastManifests of Future Past
Manifests of Future Past
 
JSDC 2014 - functional java script, why or why not
JSDC 2014 - functional java script, why or why notJSDC 2014 - functional java script, why or why not
JSDC 2014 - functional java script, why or why not
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Realm: Building a mobile database
Realm: Building a mobile databaseRealm: Building a mobile database
Realm: Building a mobile database
 
Going beyond Django ORM limitations with Postgres
Going beyond Django ORM limitations with PostgresGoing beyond Django ORM limitations with Postgres
Going beyond Django ORM limitations with Postgres
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Realm to Json & Royal
Realm to Json & RoyalRealm to Json & Royal
Realm to Json & Royal
 
The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
 
GPars (Groovy Parallel Systems)
GPars (Groovy Parallel Systems)GPars (Groovy Parallel Systems)
GPars (Groovy Parallel Systems)
 
Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!Cassandra summit 2013 - DataStax Java Driver Unleashed!
Cassandra summit 2013 - DataStax Java Driver Unleashed!
 
Efficient Indexes in MySQL
Efficient Indexes in MySQLEfficient Indexes in MySQL
Efficient Indexes in MySQL
 

Semelhante a 2017 02-07 - elastic & spark. building a search geo locator

Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Jesper Kamstrup Linnet
 
Best of build 2021 - C# 10 & .NET 6
Best of build 2021 -  C# 10 & .NET 6Best of build 2021 -  C# 10 & .NET 6
Best of build 2021 - C# 10 & .NET 6Moaid Hathot
 
Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Sunghyouk Bae
 
Functional programming using underscorejs
Functional programming using underscorejsFunctional programming using underscorejs
Functional programming using underscorejs偉格 高
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with ClojureDmitry Buzdin
 
Kotlin, smarter development for the jvm
Kotlin, smarter development for the jvmKotlin, smarter development for the jvm
Kotlin, smarter development for the jvmArnaud Giuliani
 
Scala @ TechMeetup Edinburgh
Scala @ TechMeetup EdinburghScala @ TechMeetup Edinburgh
Scala @ TechMeetup EdinburghStuart Roebuck
 
Kotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsKotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsBartosz Kosarzycki
 
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016STX Next
 
Scala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecScala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecLoïc Descotte
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopSages
 
Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Guillaume Laforge
 
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6Dmitry Soshnikov
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Flink Forward
 

Semelhante a 2017 02-07 - elastic & spark. building a search geo locator (20)

Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?Scala - en bedre og mere effektiv Java?
Scala - en bedre og mere effektiv Java?
 
Scala - en bedre Java?
Scala - en bedre Java?Scala - en bedre Java?
Scala - en bedre Java?
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
Best of build 2021 - C# 10 & .NET 6
Best of build 2021 -  C# 10 & .NET 6Best of build 2021 -  C# 10 & .NET 6
Best of build 2021 - C# 10 & .NET 6
 
Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017
 
3 database-jdbc(1)
3 database-jdbc(1)3 database-jdbc(1)
3 database-jdbc(1)
 
Functional programming using underscorejs
Functional programming using underscorejsFunctional programming using underscorejs
Functional programming using underscorejs
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Scala in Places API
Scala in Places APIScala in Places API
Scala in Places API
 
Kotlin, smarter development for the jvm
Kotlin, smarter development for the jvmKotlin, smarter development for the jvm
Kotlin, smarter development for the jvm
 
Scala @ TechMeetup Edinburgh
Scala @ TechMeetup EdinburghScala @ TechMeetup Edinburgh
Scala @ TechMeetup Edinburgh
 
Kotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projectsKotlin Developer Starter in Android projects
Kotlin Developer Starter in Android projects
 
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
Kotlin Developer Starter in Android - STX Next Lightning Talks - Feb 12, 2016
 
Scala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecScala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar Prokopec
 
Wprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache HadoopWprowadzenie do technologi Big Data i Apache Hadoop
Wprowadzenie do technologi Big Data i Apache Hadoop
 
Java
JavaJava
Java
 
Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008Groovy Introduction - JAX Germany - 2008
Groovy Introduction - JAX Germany - 2008
 
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
HelsinkiJS meet-up. Dmitry Soshnikov - ECMAScript 6
 
TechTalk - Dotnet
TechTalk - DotnetTechTalk - Dotnet
TechTalk - Dotnet
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 

Mais de Alberto Paro

LUISS - Deep Learning and data analyses - 09/01/19
LUISS - Deep Learning and data analyses - 09/01/19LUISS - Deep Learning and data analyses - 09/01/19
LUISS - Deep Learning and data analyses - 09/01/19Alberto Paro
 
2018 07-11 - kafka integration patterns
2018 07-11 - kafka integration patterns2018 07-11 - kafka integration patterns
2018 07-11 - kafka integration patternsAlberto Paro
 
Elasticsearch in architetture Big Data - EsInADay-2017
Elasticsearch in architetture Big Data - EsInADay-2017Elasticsearch in architetture Big Data - EsInADay-2017
Elasticsearch in architetture Big Data - EsInADay-2017Alberto Paro
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locatorAlberto Paro
 
2016 02-24 - Piattaforme per i Big Data
2016 02-24 - Piattaforme per i Big Data2016 02-24 - Piattaforme per i Big Data
2016 02-24 - Piattaforme per i Big DataAlberto Paro
 
What's Big Data? - Big Data Tech - 2015 - Firenze
What's Big Data? - Big Data Tech - 2015 - FirenzeWhat's Big Data? - Big Data Tech - 2015 - Firenze
What's Big Data? - Big Data Tech - 2015 - FirenzeAlberto Paro
 
ElasticSearch Meetup 30 - 10 - 2014
ElasticSearch Meetup 30 - 10 - 2014ElasticSearch Meetup 30 - 10 - 2014
ElasticSearch Meetup 30 - 10 - 2014Alberto Paro
 
Scala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJSScala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJSAlberto Paro
 

Mais de Alberto Paro (9)

Data streaming
Data streamingData streaming
Data streaming
 
LUISS - Deep Learning and data analyses - 09/01/19
LUISS - Deep Learning and data analyses - 09/01/19LUISS - Deep Learning and data analyses - 09/01/19
LUISS - Deep Learning and data analyses - 09/01/19
 
2018 07-11 - kafka integration patterns
2018 07-11 - kafka integration patterns2018 07-11 - kafka integration patterns
2018 07-11 - kafka integration patterns
 
Elasticsearch in architetture Big Data - EsInADay-2017
Elasticsearch in architetture Big Data - EsInADay-2017Elasticsearch in architetture Big Data - EsInADay-2017
Elasticsearch in architetture Big Data - EsInADay-2017
 
2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator2017 02-07 - elastic & spark. building a search geo locator
2017 02-07 - elastic & spark. building a search geo locator
 
2016 02-24 - Piattaforme per i Big Data
2016 02-24 - Piattaforme per i Big Data2016 02-24 - Piattaforme per i Big Data
2016 02-24 - Piattaforme per i Big Data
 
What's Big Data? - Big Data Tech - 2015 - Firenze
What's Big Data? - Big Data Tech - 2015 - FirenzeWhat's Big Data? - Big Data Tech - 2015 - Firenze
What's Big Data? - Big Data Tech - 2015 - Firenze
 
ElasticSearch Meetup 30 - 10 - 2014
ElasticSearch Meetup 30 - 10 - 2014ElasticSearch Meetup 30 - 10 - 2014
ElasticSearch Meetup 30 - 10 - 2014
 
Scala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJSScala Italy 2015 - Hands On ScalaJS
Scala Italy 2015 - Hands On ScalaJS
 

Último

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 

Último (20)

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 

2017 02-07 - elastic & spark. building a search geo locator

  • 1. Roma – 7 Febbraio 2017 presenta Alberto Paro, Seacom Elastic & Spark. Building A Search Geo Locator
  • 2. Alberto Paro  Laureato in Ingegneria Informatica (POLIMI)  Autore di 3 libri su ElasticSearch da 1 a 5.x + 6 Tech review  Lavoro principalmente in Scala e su tecnologie BD (Akka, Spray.io, Playframework, Apache Spark) e NoSQL (Accumulo, Cassandra, ElasticSearch e MongoDB)  Evangelist linguaggio Scala e Scala.JS
  • 3. Elasticseach 5.x - Cookbook  Choose the best ElasticSearch cloud topology to deploy and power it up with external plugins  Develop tailored mapping to take full control of index steps  Build complex queries through managing indices and documents  Optimize search results through executing analytics aggregations  Monitor the performance of the cluster and nodes  Install Kibana to monitor cluster and extend Kibana for plugins.  Integrate ElasticSearch in Java, Scala, Python and Big Data applications Discount code for Ebook: ALPOEB50 Discount code for Print Book: ALPOPR15 Expiration Date: 21st Feb 2017
  • 4. Obiettivi  Architetture Big Data con ES  Apache Spark  GeoIngester  Data Collection  Ottimizzazione Indici  Ingestion via Apache Spark  Ricerca per un luogo  Cenni di Big Data Tools
  • 6. Hadoop / Spark Input Iter 1 HDFS Iter 2 HDFS HDFS Read HDFS Read HDFS Write HDFS Write Input Iter 1 Iter 2 Hadoop MapReduce Apache Spark Evoluzione del modello Map Reduce
  • 7. Apache Spark  Scritto in Scala con API in Java, Python e R  Evoluzione del modello Map/Reduce  Potenti moduli a corredo:  Spark SQL  Spark Streaming  MLLib (Machine Learning)  GraphX (graph)
  • 8. Geoname GeoNames è un database geografico, scaricabile gratuitamente sotto licenza creative commons. Contiene circa 10 millioni di nomi geografici e consiste di circa 9 milioni di feature uniqche di cui 2.8 milioni di posti popolati e 5.5 millioni di nomi alternativi. Può essere facilmente scaricato da http://download.geonames.org/export/dump come file CSV. Il codice è disponibile all’indirizzo: https://github.com/aparo/elasticsearch-geonames-locator
  • 9. Geoname - Struttura No. Attribute name Explanation 1 geonameid Unique ID for this geoname 2 name The name of the geoname 3 asciiname ASCII representation of the name 4 alternatenames Other forms of this name. Generally in several languages 5 latitude Latitude in decimal degrees of the Geoname 6 longitude Longitude in decimal degrees of the Geoname 7 fclass Feature class see http://www.geonames.org/export/codes.html 8 fcode Feature code see http://www.geonames.org/export/codes.html 9 country ISO-3166 2-letter country code 10 cc2 Alternate country codes, comma separated, ISO-3166 2-letter country code 11 admin1 Fipscode (subject to change to iso code 12 admin2 Code for the second administrative division, a county in the US 13 admin3 Code for third level administrative division 14 admin4 Code for fourth level administrative division 14 population The Population of Geoname 14 elevation The elevation in meters of Geoname 14 gtopo30 Digital elevation model 14 timezone The timezone of Geoname 14 moddate The date of last change of this Geoname
  • 10. Ottimizzazione indici – 1/2 Necessario per:  Rimuove campi non richiesti.  Gestire campi Geo Point.  Ottimizzare i campi stringa (text, keyword)  Numeri shard corretto (11M records => 2 shards) Vantaggi => performances/spazio/CPU
  • 11. Ottimizzazione indici – 2/2 { "mappings": { "geoname": { "properties": { "admin1": { "type": "keyword", "ignore_above": 256 }, … "alternatenames": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, … … "location": { "type": "geo_point" }, … "longitude": { "type": "float" }, "moddate": { "type": "date" },
  • 12. Ingestion via Spark – GeonameIngester – 1/7 Il nostro ingester eseguirà i seguenti steps:  Inizializzazione Job Spark  Parse del CSV  Definizione della struttura di indicizzazione  Popolamento delle classi  Scrittura dati in Elasticsearch  Esecuzione del Job Spark
  • 13. Ingestion via Spark – GeonameIngester – 2/7 Inizializzazione di un Job Spark import org.apache.spark.sql.SparkSession import org.apache.spark.sql.types._ import org.elasticsearch.spark.rdd.EsSpark import scala.util.Try object GeonameIngester { def main(args: Array[String]) { val sparkSession = SparkSession.builder .master("local") .appName("GeonameIngester") .getOrCreate()
  • 14. Ingestion via Spark – GeonameIngester – 3/7 Parse del CSV val geonameSchema = StructType(Array( StructField("geonameid", IntegerType, false), StructField("name", StringType, false), StructField("asciiname", StringType, true), StructField("alternatenames", StringType, true), StructField("latitude", FloatType, true), …. val GEONAME_PATH = "downloads/allCountries.txt" val geonames = sparkSession.sqlContext.read .option("header", false) .option("quote", "") .option("delimiter", "t").option("maxColumns", 22) .schema(geonameSchema) .csv(GEONAME_PATH) .cache()
  • 15. Ingestion via Spark – GeonameIngester – 4/7 Definizione delle nostre classi per l’Inidicizzazione case class GeoPoint(lat: Double, lon: Double) case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String], latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String, cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4: Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate: String) implicit def emptyToOption(value: String): Option[String] = { if (value == null) return None val clean = value.trim if (clean.isEmpty) { None } else { Some(clean)} }
  • 16. Ingestion via Spark – GeonameIngester – 5/7 Definizione delle nostre classi per l’Inidicizzazione case class GeoPoint(lat: Double, lon: Double) case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String], latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String, cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4: Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate: String) implicit def emptyToOption(value: String): Option[String] = { if (value == null) return None val clean = value.trim if (clean.isEmpty) { None } else { Some(clean)} }
  • 17. Ingestion via Spark – GeonameIngester – 6/7 Popolazione delle nostre classi val records = geonames.map { row => val id = row.getInt(0) val lat = row.getFloat(4) val lon = row.getFloat(5) Geoname(id, row.getString(1), row.getString(2), Option(row.getString(3)).map(_.split(",").map(_.trim).filterNot(_.isEmpty).toList).getOrElse(Nil), lat, lon, GeoPoint(lat, lon), row.getString(6), row.getString(7), row.getString(8), row.getString(9), row.getString(10), row.getString(11), row.getString(12), row.getString(13), row.getDouble(14), fixNullInt(row.get(15)), row.getInt(16), row.getString(17), row.getDate(18).toString ) }
  • 18. Ingestion via Spark – GeonameIngester – 7/7 Scrittura in Elasticsearch EsSpark.saveToEs(records.toJavaRDD, "geonames/geoname", Map("es.mapping.id" -> "geonameid")) Esecuzione di uno Spark Job spark-submit --class GeonameIngester target/scala-2.11/elasticsearch-geonames-locator- assembly-1.0.jar (~20 minuti su singola macchina)
  • 19. Ricerca di un luogo curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{ "query": { "bool": { "minimum_should_match": 1, "should": [ { "term": { "name": "moscow"}}, { "term": { "alternatenames": "moscow"}}, { "term": { "asciiname": "moscow" }} ], "filter": [ { "term": { "fclass": "P" }}, { "range": { "population": {"gt": 0}}} ] } }, "sort": [ { "population": { "order": "desc"}}] }'
  • 20. NoSQL Key-Value  Redis  Voldemort  Dynomite  Tokio* BigTable Clones  Accumulo  Hbase  Cassandra Document  CouchDB  MongoDB  ElasticSearch GraphDB  Neo4j  OrientDB  …Graph Message Queue  Kafka  RabbitMQ  ...MQ
  • 23. Linguaggio – Scala vs Java public class User { private String firstName; private String lastName; private String email; private Password password; public User(String firstName, String lastName, String email, Password password) { this.firstName = firstName; this.lastName = lastName; this.email = email; this.password = password; } public String getFirstName() {return firstName; } public void setFirstName(String firstName) { this.firstName = firstName; } public String getLastName() { return lastName; } public void setLastName(String lastName) { this.lastName = lastName; } public String getEmail() { return email; } public void setEmail(String email) { this.email = email; } public Password getPassword() { return password; } public void setPassword(Password password) { this.password = password; } @Override public String toString() { return "User [email=" + email + ", firstName=" + firstName + ", lastName=" + lastName + "]"; } @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((email == null) ? 0 : email.hashCode()); result = prime * result + ((firstName == null) ? 0 : firstName.hashCode()); result = prime * result + ((lastName == null) ? 0 : firstName.hashCode()); result = prime * result + ((password == null) ? 0 : password.hashCode()); return result; } @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (getClass() != obj.getClass()) return false; User other = (User) obj; if (email == null) { if (other.email != null) return false; } else if (!email.equals(other.email)) return false; if (password == null) { if (other.password != null) return false; } else if (!password.equals(other.password)) return false; if (firstName == null) { if (other.firstName != null) return false; } else if (!firstName.equals(other.firstName)) return false; if (lastName == null) { case class User( var firstName:String, var lastName:String, var email:String, var password:Password) JAVASCALA
  • 25. Q&A

Notas do Editor

  1. Spark SQL (DB, Json, case class)
  2. Vantaggi: 100% opensource Svantaggi: Breaking modulli
  3. Spark SQL (DB, Json, case class)
  4. Spark SQL (DB, Json, case class)
  5. Spark SQL (DB, Json, case class)
  6. Spark SQL (DB, Json, case class)
  7. Spark SQL (DB, Json, case class)
  8. Spark SQL (DB, Json, case class)
  9. Spark SQL (DB, Json, case class)
  10. Spark SQL (DB, Json, case class)
  11. Spark SQL (DB, Json, case class)
  12. Spark SQL (DB, Json, case class)
  13. Spark SQL (DB, Json, case class)
  14. Spark SQL (DB, Json, case class)
  15. Spark SQL (DB, Json, case class)
  16. Key Value: Focus on scaling to huge amounts of data Designed to handle massive load Based on Amazon’s Dynamo paper Data model: (global) collection of Key-Value pairs Dynamo ring partitioning and replication Big Table Clones Like column oriented Relational Databases, but with a twist Tables similarly to RDBMS, but handles semi-structured ๏Based on Google’s BigTable paper Data model: ‣Columns → column families → ACL ‣Datums keyed by: row, column, time, index ‣Row-range → tablet → distribution Document Similar to Key-Value stores,but the DB knows what theValue is Inspired by Lotus Notes Data model: Collections of Key-Value collections Documents are often versioned GraphDB Focus on modeling the structure of data – interconnectivity Scales to the complexity of the data Inspired by mathematical Graph Theory ( G=(E,V) ) Data model: “Property Graph” ‣Nodes ‣Relationships/Edges between Nodes (first class) ‣Key-Value pairs on both ‣Possibly Edge Labels and/or Node/Edge Types