More expressive types for Spark with
Frameless
Miguel Pérez Pasalodos
@Kamugo
Raise your hand if...
● You use Spark in production
Raise your hand if...
● You use Spark in production
● You use Spark with Scala
Raise your hand if...
● You use Spark in production
● You use Spark with Scala
● You know what the typeclass pattern is
Raise your hand if...
● You use Spark in production
● You use Spark with Scala
● You know what the typeclass pattern is
● You know what generic programming or Shapeless is
Raise your hand if...
● You use Spark in production
● You use Spark with Scala
● You know what the typeclass pattern is
● You know what generic programming or Shapeless is
● You’ve used Spark with Frameless before
Spark API evolution
RDDs
trait Person { val name: String }
case class Teacher(id: Int, name: String, salary: Double) extends Person
case class Student(id: Int, name: String) extends Person
RDDs
trait Person { val name: String }
case class Teacher(id: Int, name: String, salary: Double) extends Person
case class Student(id: Int, name: String) extends Person
val people: RDD[Person] = sc.parallelize(List(
Teacher(1, "Emma", 60000),
Student(2, "Steve"),
Student(3, "Arnold")
))
Lambdas are (almost) type-safe
val names = people.map(person => person.name)
val names = people.map {
case Teacher(_, name, _) => s"Teacher $name"
case Student(_, name) => s"Student $name"
}
Lambdas are (almost) type-safe
val names = people.map(person => person.name)
val names = people.map {
case Teacher(_, name, _) => s"Teacher $name"
case Student(_, name) => s"Student $name"
}
Possible MatchError at runtime
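As an illustration (the Parent subtype below is hypothetical), adding a new Person implementation still compiles, but the non-exhaustive match only fails once the job actually runs:
case class Parent(id: Int, name: String) extends Person

val morePeople: RDD[Person] = sc.parallelize(List(Teacher(1, "Emma", 60000), Parent(4, "Laura")))
morePeople.map {
  case Teacher(_, name, _) => s"Teacher $name"
  case Student(_, name) => s"Student $name"
}.collect() // scala.MatchError: Parent(4,Laura)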
RDDs
● Basically, a lazy distributed immutable collection
● Compile-time type-safe
● Schema-less
● How-to transformations, which Spark cannot optimize
● Limited datasources
Our model from now on
case class Person(id: Int, name: String, age: Short)
DataFrames
val people: DataFrame = List(
Person(1, "Miguel", 26),
Person(2, "Sarah", 28),
Person(2, "John", 32)
).toDF()
Mandatory schema
scala> people.printSchema()
root
|-- id: integer (nullable = false)
|-- name: string (nullable = true)
|-- age: short (nullable = false)
scala> people.filter($"age" !== 26).filter($"age" !== 27).explain(true)
== Parsed Logical Plan ==
'Filter NOT ('age = 27)
+- Filter NOT (cast(age#133 as int) = 26)
+- LocalRelation [id#131, name#132, age#133]
== Optimized Logical Plan ==
Filter (NOT (cast(age#133 as int) = 26) && NOT (cast(age#133 as int) = 27))
+- LocalRelation [id#131, name#132, age#133]
Query optimization
They’re not type-safe :(
val names: DataFrame = people.select("namee")
They’re not type-safe :(
val names: DataFrame = people.select("namee")
Runtime error: AnalysisException: cannot resolve '`namee`'
given input columns: [id, name, age]
DataFrames
● Mandatory schema
● What-to (declarative) specification that Catalyst can optimize
● Compatible with SQL (see the sketch below)
● Not type-safe
● Extensible DataSource API
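As a rough sketch of the SQL interoperability, using the people DataFrame defined above:
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age = 26")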
Datasets
val people: Dataset[Person] = List(
Person(1, "Miguel", 26),
Person(2, "Sarah", 28),
Person(2, "John", 32)
).toDS()
Datasets
● Try to get the best of both worlds
● We can use lambdas as in RDDs!
○ What about performance?
● Full DataFrame API, since DataFrame = Dataset[Row]
● They seem type-safe
We can use the DataFrame API
val names: DataFrame = people.select("namee")
Still not type-safe :(
val names: DataFrame = people.select("namee")
Runtime error: AnalysisException: cannot resolve '`namee`'
given input columns: [id, name, age]
But… we can cast them!
val names: Dataset[Int] = people.select("name").as[Int]
But… we can cast them! ...and fail :(
val names: Dataset[Int] = people.select("name").as[Int]
Runtime error: AnalysisException: Cannot up cast `name` from
string to int as it may truncate
Lambdas...
val names: Dataset[String] = people.map(_.namee)
Lambdas… are type-safe!
val names: Dataset[String] = people.map(_.namee)
Compile-time error: value namee is not a member of Person
What about performance?
● 2²⁵ randomly generated people
● 20 parquet files
● 4 cores
people.filter(_.age == 26).count() VS people.filter($"age" === 26).count()
What about performance?
(benchmark chart comparing the runtimes of filter(_.age == 26) and filter($"age" === 26))
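A minimal sketch of how such a comparison can be reproduced; the parquet path and the use of spark.time are assumptions, not the exact harness behind the chart:
import spark.implicits._
val people = spark.read.parquet("/tmp/people").as[Person]

spark.time(people.filter(_.age == 26).count())   // lambda: every row is deserialized, the predicate cannot be pushed down
spark.time(people.filter($"age" === 26).count())  // Column expression: Catalyst pushes the filter into the Parquet scan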
Encoders?
class Car(name: String)
spark.createDataset(List(
new Car("Tesla Model S")
))
Encoders?
class Car(name: String)
spark.createDataset(List(
  new Car("Tesla Model S")
))
Compile-time error: Unable to find encoder for type stored in a Dataset
Encoders?
case class PersonCar(personId: Int, car: Car)
val cars: Dataset[PersonCar] = spark.createDataset(List(
PersonCar(1, new Car("Tesla Model S"))
))
Encoders?
case class PersonCar(personId: Int, car: Car)
val cars: Dataset[PersonCar] = spark.createDataset(List(
  PersonCar(1, new Car("Tesla Model S"))
))
Runtime error: UnsupportedOperationException: No Encoder found for Car
- field (class: "Car", name: "car")
- root class: "PersonCar"
Frameless to the rescue!
Frameless
● Wraps the Spark API
● Type-safe non-lambda methods
● No run-time performance differences
● Provides a way to define custom encoders
● Actions are also lazy
Typed Datasets
val peopleFL: TypedDataset[Person] = people.typed
val names: TypedDataset[String] = peopleFL.select(peopleFL('namee))
Typed Datasets
val peopleFL: TypedDataset[Person] = people.typed
val names: TypedDataset[String] = peopleFL.select(peopleFL('namee))
Compile-time error: No column Symbol with shapeless.tag.Tagged[String("namee")] of type A in Person
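For contrast, the same select with the real column name compiles, and the result is fully typed:
val names: TypedDataset[String] = peopleFL.select(peopleFL('name))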
Column operations are also supported
scala> val agesDivided = peopleFL.select(peopleFL('age)/2)
agesDivided: TypedDataset[Double]
Column operations are also supported
scala> val agesDivided = peopleFL.select(peopleFL('age)/2)
agesDivided: TypedDataset[Double]
val intToString = (x: Int) => x.toString
val udf = peopleFL.makeUDF(intToString)
scala> val result = peopleFL.select(udf(peopleFL('age)))
result: TypedDataset[String]
Aggregations
case class AvgAge(name: String, age: Double)
val ageByName: TypedDataset[AvgAge] = {
peopleFL.groupBy(peopleFL('name)).agg(avg(peopleFL('age)))
}.as[AvgAge]
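Showing the result is itself lazy in Frameless; a small sketch:
ageByName.show().run() // show() returns a Job[Unit]; nothing happens until run()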
Custom type encoders: Injection
sealed trait Gender
case object Female extends Gender
case object Male extends Gender
case object Other extends Gender
case class PersonGender(id: Int, gender: Gender)
TypedDataset.create(peopleGender)
Custom encoders: Injection
sealed trait Gender
case object Female extends Gender
case object Male extends Gender
case object Other extends Gender
case class PersonGender(id: Int, gender: Gender)
TypedDataset.create(peopleGender)
Compile-time error: Cannot find implicit value for value encoder
Custom encoders: Injection
implicit val genderToInt: Injection[Gender, Int] = Injection(
{
case Female => 1; case Male => 2; case Other => 3
},{
case 1 => Female; case 2 => Male; case 3 => Other
}
)
scala> TypedDataset.create(peopleGender)
res0: TypedDataset[PersonGender] = [id: int, gender: int]
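A rough round-trip sketch: the Injection is applied on write and inverted on read, so genders are stored as Int but come back as Gender values:
val ds = TypedDataset.create(Seq(PersonGender(1, Female), PersonGender(2, Other)))
ds.collect().run() // Seq(PersonGender(1,Female), PersonGender(2,Other))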
Lazy actions
val numPeopleJob: Job[Long] = people.count().withDescription("...")
val num: Long = numPeopleJob.run()
Lazy actions
val numPeopleJob: Job[Long] = people.count().withDescription("...")
val num: Long = numPeopleJob.run()
val sampleJob = for {
num <- people.count()
sample <- people.take((num/10).toInt)
} yield sample
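Nothing is executed until the composed job is run explicitly; for instance:
val sample: Seq[Person] = sampleJob.withDescription("10% sample").run()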
How?
Encoders are typeclasses
val peopleList = List(Person(1, "Miguel", 26))
val people = spark.createDataset(peopleList)
def createDataset[T : Encoder](data: Seq[T]): Dataset[T]
Encoders are typeclasses
val peopleList = List(Person(1, "Miguel", 26))
val people = spark.createDataset(peopleList)
def createDataset[T : Encoder](data: Seq[T]): Dataset[T]
// It’s the same as
def createDataset[T](data: Seq[T])(implicit encoder: Encoder[T])
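Because it is an ordinary implicit parameter, the encoder can also be passed by hand, which makes the typeclass nature explicit (a sketch using Spark's Encoders factory):
import org.apache.spark.sql.Encoders
val people2 = spark.createDataset(peopleList)(Encoders.product[Person])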
Encoders are typeclasses
● Instances provided by SQLImplicits class
● That’s why we need import spark.implicits._ everywhere!
implicit def newSequenceEncoder[T <: Seq[_] : TypeTag]: Encoder[T] =
ExpressionEncoder() // <- Reflection at runtime!
Reflection is not our friend
class Car(name: String)
val cars = Seq(new Car("Tesla"))
val ds: Dataset[Car] = spark.createDataset(cars)
Compile-time error: Unable to find encoder for type stored in a Dataset.
Reflection is not our friend
class Car(name: String)
val cars = Seq(new Car("Tesla"))

val ds: Dataset[Car] = spark.createDataset(cars)
Compile-time error: Unable to find encoder for type stored in a Dataset.

val ds2: Dataset[Seq[Car]] = spark.createDataset(Seq(cars))
Runtime error: No encoder found for Car
How different are the Frameless encoders?
def create[A](data: Seq[A])(
implicit
encoder: TypedEncoder[A],
sqlContext: SQLContext
): TypedDataset[A]
Recursive implicit resolution!
implicit def mapEncoder[A: NotCatalystNullable, B](
implicit
encodeA: TypedEncoder[A],
encodeB: TypedEncoder[B]
): TypedEncoder[Map[A, B]]
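As a sketch (assuming Frameless' standard instances are in scope), resolution keeps composing instances until it bottoms out on primitives:
implicitly[TypedEncoder[Map[String, Int]]]  // needs TypedEncoder[String] and TypedEncoder[Int]
implicitly[TypedEncoder[Option[Person]]]    // Option encoder built on top of the derived Person encoder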
How to know if our class has a column?
// We were calling people('name)
class TypedDataset[T] {
def apply[A](column: Witness.Lt[Symbol])(
implicit
exists: TypedColumn.Exists[T, column.T, A],
encoder: TypedEncoder[A]
): TypedColumn[T, A]
}
How to know if our class has a column?
trait Exists[T, K, V] // simplified from frameless.TypedColumn.Exists
object Exists {
implicit def deriveRecord[T, H <: HList, K, V](
implicit
lgen: LabelledGeneric.Aux[T, H],
selector: Selector.Aux[H, K, V]
): Exists[T, K, V] = new Exists[T, K, V] {}
}
Concepts we need to understand first
● Generic programming and HList
● Literal types
● Phantom types
● Type tagging
● Dependent types
Generic programming!
HList = HNil | ::[A, H <: HList]
Generic programming!
val genericMe = 1 :: "Miguel" :: (26: Short) :: HNil
scala> :type genericMe
::[Int, ::[String, ::[Short, HNil]]]
HList = HNil | ::[A, H <: HList]
Shapeless Generic typeclass
val genericPerson = Generic[Person]
val genericMe = 1 :: "Miguel" :: (26: Short) :: HNil
scala> val me = genericPerson.from(genericMe)
me: Person = Person(1,Miguel,26)
scala> val genericMeAgain = genericPerson.to(me)
genericMeAgain: genericPerson.Repr = 1 :: Miguel :: 26 :: HNil
Literal types
● A type for each value!
● Gives the compiler power to know about values
var three = 3.narrow
three: Int(3) = 3
Literal types
scala> three+three
res8: Int = 6
scala> three = 4
<console>:38: error: type mismatch;
found : Int(4)
required: Int(3)
trait Increasable
def inc(x: Int with Increasable) = x+1
inc(3.asInstanceOf[Int with Increasable]): Int = 4
inc(3)
error: type mismatch; found: Int(3); required: Int with Increasable
Phantom types and type tagging
● Phantom type: no runtime behaviour
● Type tagging: assign a phantom type to other types (see the sketch below)
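A small sketch of tagging with Shapeless (Meters is a made-up phantom type):
import shapeless.tag
import shapeless.tag.@@

trait Meters // phantom type: never instantiated
val height: Double @@ Meters = tag[Meters](1.81) // at runtime this is still just a Double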
All combined with Shapeless!
"name" ->> 1
res1: Int with KeyTag[String("name"),Int] = 1
All combined with Shapeless!
"name" ->> 1
res1: Int with KeyTag[String("name"),Int] = 1
val me = ("id" ->> 1) :: ("name" ->> "Miguel") :: ("age" ->> (26: Short)) :: HNil

Int with KeyTag[String("id"), Int] ::
String with KeyTag[String("name"), String] ::
Short with KeyTag[String("age"), Short] ::
HNil
LabelledGeneric
val genericPerson = LabelledGeneric[Person]
Int with KeyTag[Symbol with Tagged[String("id")], Int] ::
String with KeyTag[Symbol with Tagged[String("name")], String] ::
Short with KeyTag[Symbol with Tagged[String("age")], Short] ::
HNil
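A sketch of why the labelled Repr matters, reusing genericPerson from above together with Shapeless' record syntax: fields are selected by their Symbol key, so a typo fails at compile time.
import shapeless.record._

val reprMe = genericPerson.to(Person(1, "Miguel", 26))
reprMe('name) // "Miguel", resolved via a Selector; reprMe('namee) would not compile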
Dependent types
trait Generic[A] {
type Repr
def to(value: A): Repr
}
def getRepr[A](v: A)(gen: Generic[A]): gen.Repr = gen.to(v)
// Is it not the same as this?
def getRepr[A, R](v: A)(gen: Generic2[A, R]): R = ???
Shapeless Witness
trait Witness {
type T
val value: T
}
def getField[A,K,V](value: A with KeyTag[K,V])
(implicit witness: Witness.Aux[K]) = witness.value
// Aux[K] = Witness { type T = K }
scala> getField("name" ->> 1)
res0: String("name") = name
Shapeless Witness
Witness.Aux[A] = Witness { type T = A }
scala> val witness = Witness('name)
witness: Witness.Aux[Symbol with Tagged[String("name")]]
Witness.Lt[A] = Witness { type T <: A }
// Tagged Symbol is a subtype of Symbol. So previous line is also...
witness: Witness.Lt[Symbol]
Back to Frameless
// We were calling people('name)
class TypedDataset[T] {
def apply[A](column: Witness.Lt[Symbol])(
implicit
exists: TypedColumn.Exists[T, column.T, A],
encoder: TypedEncoder[A]
): TypedColumn[T, A]
}
Back to Frameless
trait Exists[T, K, V] // simplified from frameless.TypedColumn.Exists
object Exists {
implicit def deriveRecord[T, H <: HList, K, V](
implicit
lgen: LabelledGeneric.Aux[T, H],
selector: Selector.Aux[H, K, V]
): Exists[T, K, V] = new Exists[T, K, V] {}
}
To use it or not to use it
Pros:
● Type-safe with the same performance
● Injections for custom types
● Lazy jobs with descriptions
Cons:
● Slower compilation
● Not yet stable; no official Spark backward compatibility
More expressive types for Spark with
Frameless
Miguel Pérez Pasalodos
@Kamugo