SlideShare uma empresa Scribd logo
1 de 28
Baixar para ler offline
JoohyunKim
Sr.	
  Data	
  Scientist
MyFitnessPal – Under	
  Armour Connected	
  Fitness
Who	
  are	
  we?
#1	
  in	
  
Health	
  &	
  
Fitness
~7M	
  food	
  
database
14.5B+	
  
food	
  
logging	
  
data
85M+	
  
Users
Workout
e-­‐
Commerce
Retail
120M+	
  
Users
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 2
Under	
  Armour =	
  Apparel	
  Company?
• http://www.fool.com/investing/general/2015/06/07/how-­‐under-­‐armour-­‐is-­‐
becoming-­‐a-­‐tech-­‐company.aspx
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 3
Food	
  /	
  Nutrition
•14.5B+	
  logged	
  foods
•~7M	
  foods
•38M+	
  recipes
•From	
  85M+	
  users
Workout
•Time-­‐series	
  GPS	
  data
•Music	
  data	
  with	
  workout
•Run/Ride/Walk
•Active	
  time	
  of	
  the	
  day
•Sleep	
  pattern
•390B+	
  calories	
  burned	
  (MFP)
•1.2T+	
  minutes	
  exercise	
  (MFP)
Retail	
  /	
  e-­‐
Commerce
•Product	
  purchase	
  
transactions
•User	
  preferences	
  on	
  
clothes	
  /	
  shoes	
  /	
  wearable	
  
devices
Under	
  Armour =	
  Data	
  Company!
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 4
Importance	
  of	
  Data
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 5
Biggest	
  Concern	
  of	
  Life:	
  What	
  to	
  Eat?
Taste
Healthy
Preference
Dietary
Restriction
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 6
Recommender	
  System?
Word	
  of	
  
mouth
Reviews	
  /
Blogs
Expert	
  
Advice
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
Typical	
  ways	
  of	
  getting	
  recommendation
Limited Biased Lack	
  of	
  Source
7
John Mary Mike Jane
Banana 5 3 1 4
Blueberry 3 -­‐ -­‐ 2
Apple -­‐ -­‐ 2 -­‐ (?)
Melon 1 -­‐ -­‐ -­‐ (?)
Collaborative	
  Filtering
• Predict	
  how	
  a	
  user	
  may	
  like	
  a	
  new	
  item	
  based	
  on	
  
prior	
  user	
  behaviors	
  with	
  similar	
  preference
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
User	
  – Food	
  Logged	
  Counts	
  Table
8
Matrix	
  Factorization
• Filling	
  in	
  the	
  missing	
  entries	
  in	
  ratings	
  (user-­‐food	
  
logged	
  counts)	
  matrix
• Formulate	
  as	
  low-­‐rank	
  matrix	
  factorization
• Factorize	
  user-­‐item	
  matrix	
  to	
  user-­‐feature	
  and	
  feature-­‐
item	
  matrix	
  (#	
  features	
  	
  ≪ #	
  users	
  or	
  #	
  items)
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
R X
YT
Objective
Minimize	
  RMSE	
  (Root	
  
mean	
  squared	
  error)	
  
between	
   𝑅	
  and	
   𝑋	
  ×	
   𝑌'
≈ ×
Users Items
9
• Explicit	
  ratings	
  are	
  available	
  for	
  movies	
  /	
  songs
• Typically	
  1~5	
  stars	
  (ratings)	
  given
• For	
  MFP	
  food	
  logging	
  events,	
  there	
  are	
  only	
  “logged”
foods.	
  No	
  negative	
  feedback
• Can’t	
  assume	
  0	
  count	
  (no	
  entry)	
  as	
  negative
• Reference:	
  Hu,	
  Koren,	
  and	
  Volinsky,	
  Collaborative	
  Filtering	
  with	
  
Implicit	
  Feedback	
  Dataset,	
  ICDM	
  08
• Construct	
  “binary”	
  ratings	
  matrix	
  P,	
  and	
  factorize	
  P	
  
instead	
  of	
  R	
  (original	
  ratings	
  matrix)
“Implicit”	
  Matrix	
  Factorization
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
5 3 ? 3 ?
? 2 ? ? 4
1 ? ? ? ?
? ? 1 2 ?
? ? 2 ? 1
? 3 ? ? 1
1 1 ? 1 ?
? 1 ? ? 1
1 ? ? ? ?
? ? 1 1 ?
? ? 1 ? 1
? 1 ? ? 1
R P
⇒ ≈ ×
X Y
10
Alternating	
  Least	
  Squares
• Optimizing	
  X	
  (user)	
  and	
  Y	
  (item)	
  at	
  the	
  same	
  time	
  is	
  
hard
• Fix	
  X	
  or	
  Y	
  ⟹ Solve	
  for	
  the	
  other
• Solve	
  the	
  system	
  of	
  linear	
  equations
• Take	
  the	
  derivative	
  of	
  objective	
  function	
  w.r.t X	
  or	
  Y,	
  set	
  
0,	
  and	
  solve
• Starting	
  with	
  random	
  initialization	
  of	
  Y
• EM-­‐like	
  iterative	
  process
• Iterate	
  until	
  the	
  change	
  is	
  very	
  small	
  (or	
  stop	
  with	
  
fixed	
  iteration	
  number)
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 11
Scalability?	
  ⇒ Parallelization!
6/16/2015 MyFitnessPal -­‐ Under	
  Armour Connected	
  Fitness
• Rating	
  matrix:	
  85M	
  users	
  × 7M	
  foods	
  ≅ 595T	
  
entries
• Impossible	
  to	
  fit	
  in	
  a	
  single	
  machine
• Sparse	
  representation:	
  billions	
  of	
  entries
• ALS	
  can	
  be	
  easily	
  parallelized	
  with	
  map-­‐reduce	
  
framework
• Sharding users	
  and	
  items	
  vectors
• Mapper	
  on	
  individual	
  sub-­‐matrix
• Reducer	
  on	
  aggregation	
  over	
  users/items
• Spark	
  MLLib
• Parallelized	
  version	
  of	
  ALS	
  ready	
  to	
  use
• Fast	
  computation	
  with	
  DataFrame
val model  =  new  ALS()
.setRank(20)
.setImplicitPrefs(true)
.setAlpha(40)
.setRegParam(0.1)
.setMaxIter(10)
.fit(ratings)
12
Food	
  Recommendation	
  Pipeline
Logged	
  foods	
  data	
  (user	
  /	
  food)
Predict	
  food	
  preference	
  by	
  matrix	
  
factorization
Generate	
  top	
  K	
  food
recommendation
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 13
Generating	
  Top	
  K	
  Recommendations
• We	
  need	
  to	
  serve	
  top	
  recommended	
  foods	
  to	
  users
• With	
  the	
  trained	
  factorized	
  matrix	
  model,
• Predict	
  top	
  K	
  foods	
  for	
  each	
  user	
  (in	
  the	
  order	
  of	
  their	
  
own	
  preference)
• Seem	
  trivial,	
  but	
  the	
  computation	
  is	
  huge
• For	
  each	
  user,	
  retrieve	
  food	
  preference	
  by	
   𝑅 = 𝑋	
  ×	
   𝑌'
• Get	
  top	
  K	
  per	
  each	
  user:	
  min(O(𝐾𝑚𝑛),	
  O(𝑚𝑛 log 𝑛))
• m:	
  #	
  of	
  users,	
  n:	
  #	
  of	
  items
• Same	
  order	
  of	
  constructing	
  whole	
  ratings	
  matrix
• Major	
  bottleneck	
  of	
  the	
  entire	
  pipeline
• No	
  easy	
  way	
  to	
  get	
  around	
  the	
  computation
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 14
Some	
  Numbers
• Spark	
  cluster
• 72	
  nodes	
  (1	
  master	
  +	
  71	
  workers)
• 2TB	
  memory	
  ⟸ One	
  of	
  the	
  largest	
  clusters	
  in	
  production
• Dataset
• User	
  :	
  85M+
• Item	
  (food)	
  :	
  ~7M
• Rating	
  (food	
  log	
  counts):	
  6.5B+	
  (aggregated	
  per	
  user/food)
• Time
• ALS	
  model	
  training:	
  4	
  hours	
  
• Generating	
  top	
  K	
  food	
  recommendation	
  for	
  every	
  user:	
  48	
  
hours
• More	
  than	
  20x speed	
  improvement	
  over	
  Mahout	
  in	
  
conventional	
  Hadoop	
  cluster
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 15
Advantages	
  Using	
  Spark
• Faster	
  development	
  cycle
• MLLib
• Parallelization	
  provided	
  via	
  RDD	
  with	
  abstraction
• Easy	
  to	
  construct	
  data	
  pipeline	
  with	
  DataFrame
• Easy	
  to	
  load	
  /	
  export	
  data	
  in	
  and	
  out	
  of	
  S3	
  /	
  Redshift
• Faster	
  model	
  optimization
• In-­‐memory,	
  distributed	
  computation
• Faster	
  model	
  training	
  /	
  testing	
  
• Significant	
  reduction	
  in	
  parameter	
  tuning	
  /	
  optimization	
  on	
  
validation	
  dataset
• Easy	
  scalability
• By	
  launching	
  more	
  worker	
  instances
• Enables	
  frequent	
  model	
  updates	
  
• Reflect	
  user	
  preference	
  change	
  more	
  often
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 16
Sample	
  Food	
  Recommendation
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
Recommended	
  Foods
Cooked	
  White	
  Jasmine	
  Rice
Steamed	
  White	
  Rice	
  
(Unenriched)
Pho
Kimchi
Tofu	
  -­‐ Fried
Miso	
  Soup	
  With	
  Seaweed	
  and	
  
Tofu
Shrimp	
  Dumplings
Miso	
  Soup
Salmon	
  Nigiri
Sunny	
  Side	
  Up
Logged	
  Foods
Korean	
  soy	
  milk	
  with	
  high	
  
calcium
Coke	
  12oz
Fried	
  rice
Korean	
  mixed	
  grain	
  shake
Korean	
  Rice	
  Cake
Sweet	
  Soy	
  Milk
Blackberries	
  -­‐ Raw
Blueberries	
  -­‐ Raw
Korean	
  Melon	
  (Chameh /	
  참외)
Japchae (Korean	
  Stir-­‐Fried	
  
Sweet	
  Potato	
  Noodles)
17
Extension:	
  Recipe	
  Recommendation
• Recipes	
  are	
  not	
  “public”
• Currently,	
  only	
  foods	
  are	
  shared	
  across	
  
different	
  users
• Recipes	
  are	
  “private” to	
  individual	
  user	
  
when	
  created
• Cannot	
  construct	
  standard	
  user-­‐item	
  
ratings	
  matrix
• Solution:	
  Recommend	
  recipes	
  using	
  
similarity	
  with	
  foods
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
• Advantages	
  of	
  recommending	
  recipes
• Richer	
  metadata	
  (instructions,	
  ingredients,	
  cuisines,	
  …)
• Complete	
  food,	
  home-­‐cookable
• Customizable	
  with	
  personal	
  preference/restriction
18
Extension:	
  Recipe	
  Recommendation
Logged	
  foods	
  data	
  (user	
  /	
  food)
Predict	
  food	
  preference	
  by	
  matrix	
  factorization
Generate	
  top	
  K	
  food recommendation
Food	
  – Recipe	
  similarity	
  computation	
  (LSH,	
  
kNN,	
  by	
  nutrition)
Final	
  top	
  K	
  recipes	
  from food	
  recommendation	
  
rank	
  and	
  recipe	
  similarity	
  rank
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
Food	
  
Recom
menda
tion
Recipe
Recom
menda
tion
19
Nearest	
  Neighbor	
  with	
  Locality	
  Sensitive	
  Hashing
• Naïve	
  nearest	
  neighbor	
  computation:	
  O(n2)	
  
• Nearly	
  impossible	
  with	
  7M	
  foods	
  and	
  38M	
  recipes
• Locality	
  Sensitive	
  Hashing	
  (LSH)
• Hashes	
  similar	
  items	
  into	
  the	
  same	
  buckets	
  with	
  high	
  
probability
• Similarity	
  metric:	
  Euclidean	
  distance	
  on	
  nutrition	
  vector
• NN	
  computation	
  much	
  faster,	
  look	
  up	
  within	
  the	
  bucket:	
  
O(k2)	
  (k:	
  max	
  size	
  of	
  bucket)
• Spark	
  implementation
• https://github.com/mrsqueeze/spark-­‐hash
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 20
Top	
  K	
  Recipe	
  Recommendation
• Order	
  by	
  sum	
  of	
  food	
  recommendation	
  rank	
  and	
  recipe	
  similarity	
  rank
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
Rank Food
1 Banana
2 Blueberry
3 Apple
4 Pad	
  Thai
5 …
6 …
Rank Recipe
1 Banana	
  Pie
2 Banana	
  Smoothie
3 Banana	
  Ice	
  Cream
Rank Recipe
1 Blueberry Yogurt
2 Blueberry	
  Pie
3 Salad with	
  Blueberry
…
Combin
ed	
  Rank
Recipe
2 Banana	
  Pie
3
Banana	
  
Smoothie
3 Blueberry	
  Yogurt
4
Banana	
  Ice	
  
Cream
4 Blueberry	
  Pie
5 …
21
Sample	
  Recipe	
  Recommendations
Recommended	
  Recipes
Noodle	
  sauce
Stone	
  Ground	
   Dijon	
  Mustard	
  
Marinade
Citrus	
  Dijon	
  Miso	
  Dressing
Shirataki Noodle	
  Soup
dumpling	
   sauce
Dijon	
  Miracle	
  Whip
Paleo	
  Vanilla	
  Ice-­‐Crème
Mable's	
  Chili	
  Burrito
cake	
  batter	
  milkshake
Pita	
  pizza
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness
Recommended	
  Foods
Cooked	
  White	
  Jasmine	
  Rice
Steamed	
  White	
  Rice	
  
(Unenriched)
Pho
Kimchi
Tofu	
  -­‐ Fried
Miso	
  Soup	
  With	
  Seaweed	
  and	
  
Tofu
Shrimp	
  Dumplings
Miso	
  Soup
Salmon	
  Nigiri
Sunny	
  Side	
  Up
22
Integrating	
  with	
  Taste	
  Profiles
• Machine	
  learning	
  classifier	
  that	
  outputs	
  probability	
  
distribution	
  over	
  6	
  taste	
  categories
• Savory,	
  Sweet,	
  Sour,	
  Spicy,	
  Salty,	
  Bitter
• NN	
  classifier	
  performed	
  over	
  feature	
  vector	
  (semantic	
  
word	
  vector	
  +	
  numeric	
  nutritional	
  value	
  vector)
• With	
  small	
  number	
  (~1400)	
  of	
  labeled	
  foods
• 87%	
  accuracy	
  on	
  separately	
  labeled	
  test	
  set	
  (~1200)
• Works	
  as	
  additional	
  metadata	
  for	
  foods
• Recommendation	
  results	
  can	
  be	
  further	
  filtered	
  /	
  
reordered	
  by	
  personal	
  taste	
  preferences
• With	
  Spark,	
  data	
  integration	
  with	
  hash-­‐join	
  is	
  much	
  
faster
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 23
Problems
• Cold-­‐start	
  problem
• New	
  user	
  (important)
• New	
  food	
  item	
  (not	
  too	
  much,	
  since	
  they	
  may	
  not	
  be	
  
popular)
• Solution
• Hybrid	
  with	
  content-­‐based	
  recommendation
• Construct	
  basic	
  profile	
  for	
  new	
  users	
  to	
  get	
  baseline	
  
recommendation
• May	
  recommend	
  new	
  items	
  based	
  on	
  feature	
  similarity
• Runs	
  the	
  pipeline	
  more	
  often
• For	
  new	
  users,	
  recommend	
  top/popular	
  foods
• After	
  a	
  while,	
  these	
  users	
  will	
  get	
  recommendation
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 24
Possible	
  Extension
• Collaborative	
  filtering	
  over	
  
aggregated	
  food	
  clusters
• User-­‐generated	
  foods	
  contain	
  (near)	
  
duplicates
• Combine	
  with	
  verified	
  food	
  project	
  
• Spark-­‐based	
  data	
  deduplication	
  /	
  
processing	
  pipeline
• Construct	
  “verified”	
  foods	
  out	
  of	
  
thorough	
  clustering
• Recommend	
  users	
  with	
  high-­‐quality	
  
food
• Recommend	
  representative	
  food	
  from	
  
each	
  cluster,	
  instead	
  of	
  individual	
  
variation
• Reducing	
  the	
  computation	
  time	
  due	
  
to	
  reduced	
  dimension
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 25
Application
• Recommend	
  frequently	
  paired	
  
foods
• Pairing	
  foods	
  within	
  a	
  single	
  meal	
  
depends	
  on
• Individual	
  user’s	
  own	
  preference
• Cultural	
  difference	
  (region,	
  
country)
• Simple	
  way
• Suggest	
  popular	
  foods	
  based	
  on	
  
co-­‐occurrence	
  stats	
  per	
  individual	
  
user	
  /	
  overall	
  users
• Utilize	
  this	
  framework	
  to	
  capture	
  
better	
  personalized	
  preference
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 26
Summary
• Spark-­‐powered	
  machine	
  learning	
  pipeline	
  for	
  
food/recipe	
  recommendation	
  system
• Faster	
  computation	
  help	
  reduce	
  the	
  time	
  on	
  
development	
  cycle
• Help	
  data	
  scientists	
  focus	
  on	
  core	
  problems
• Easy	
  extension	
  by	
  attaching	
  additional	
  data	
  processing	
  
steps	
  with	
  scalability
• Only	
  scratched	
  a	
  surface
• Food	
  /	
  recipe	
  recommendation
• Extensions	
  with	
  other	
  data	
  sources
• Workout
• Music
• Retail
• e-­‐Commerce	
  
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 27
Questions?
6/16/2015 MyFitnessPal	
  -­‐ Under	
  Armour	
  Connected	
  Fitness 28

Mais conteúdo relacionado

Semelhante a iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyun Kim, MyFitnessPal, Under Armour-Connected Fitness)

Nutrition Strategies in Worksite Wellness by TDSHS
Nutrition Strategies in Worksite Wellness by TDSHSNutrition Strategies in Worksite Wellness by TDSHS
Nutrition Strategies in Worksite Wellness by TDSHSAtlantic Training, LLC.
 
Ovenbot the real kitchen of the future
Ovenbot the real kitchen of the futureOvenbot the real kitchen of the future
Ovenbot the real kitchen of the futureshawn212
 
MyPlate plan PowerPoint Presentation
MyPlate plan PowerPoint PresentationMyPlate plan PowerPoint Presentation
MyPlate plan PowerPoint PresentationEarlene McNair
 
Educate 2017: Today’s Special: Item versioning and dynamic content
Educate 2017: Today’s Special: Item versioning and dynamic contentEducate 2017: Today’s Special: Item versioning and dynamic content
Educate 2017: Today’s Special: Item versioning and dynamic contentLearnosity
 
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...James Hendler
 
Thai food recommendation
Thai food recommendationThai food recommendation
Thai food recommendationYadaLimsuwan
 
Circadian Rhythms of Food Intake: Are You Seeing The Whole Picture?
Circadian Rhythms of Food Intake: Are You Seeing The Whole Picture? Circadian Rhythms of Food Intake: Are You Seeing The Whole Picture?
Circadian Rhythms of Food Intake: Are You Seeing The Whole Picture? InsideScientific
 
UEMCON_2016_4
UEMCON_2016_4UEMCON_2016_4
UEMCON_2016_4Zsolt Ori
 
User modelling challenge ideatory 2014
User modelling challenge ideatory 2014User modelling challenge ideatory 2014
User modelling challenge ideatory 2014Parindsheel Dhillon
 
Filling the Exercise Prescription
Filling the Exercise PrescriptionFilling the Exercise Prescription
Filling the Exercise PrescriptionEsserHealth
 
Small Steps to Health and Wealth Presentation
Small Steps to Health and Wealth PresentationSmall Steps to Health and Wealth Presentation
Small Steps to Health and Wealth PresentationBarbara O'Neill
 
HCP Overview by Dr Bill Toth
HCP Overview by Dr Bill TothHCP Overview by Dr Bill Toth
HCP Overview by Dr Bill TothCreate Your Fate
 

Semelhante a iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyun Kim, MyFitnessPal, Under Armour-Connected Fitness) (20)

Evaluation of MyFitnessPal
Evaluation of MyFitnessPalEvaluation of MyFitnessPal
Evaluation of MyFitnessPal
 
Nutrition Strategies in Worksite Wellness by TDSHS
Nutrition Strategies in Worksite Wellness by TDSHSNutrition Strategies in Worksite Wellness by TDSHS
Nutrition Strategies in Worksite Wellness by TDSHS
 
Off seasonrts 11.13
Off seasonrts 11.13Off seasonrts 11.13
Off seasonrts 11.13
 
Ovenbot the real kitchen of the future
Ovenbot the real kitchen of the futureOvenbot the real kitchen of the future
Ovenbot the real kitchen of the future
 
Menu labeling risk mitigation
Menu labeling  risk mitigationMenu labeling  risk mitigation
Menu labeling risk mitigation
 
SEO Training May 2014
SEO Training May 2014SEO Training May 2014
SEO Training May 2014
 
MyPlate plan PowerPoint Presentation
MyPlate plan PowerPoint PresentationMyPlate plan PowerPoint Presentation
MyPlate plan PowerPoint Presentation
 
Educate 2017: Today’s Special: Item versioning and dynamic content
Educate 2017: Today’s Special: Item versioning and dynamic contentEducate 2017: Today’s Special: Item versioning and dynamic content
Educate 2017: Today’s Special: Item versioning and dynamic content
 
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...Enhancing Precision Wellness with  Knowledge Graphs and Semantic Analytics: O...
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: O...
 
Thai food recommendation
Thai food recommendationThai food recommendation
Thai food recommendation
 
Circadian Rhythms of Food Intake: Are You Seeing The Whole Picture?
Circadian Rhythms of Food Intake: Are You Seeing The Whole Picture? Circadian Rhythms of Food Intake: Are You Seeing The Whole Picture?
Circadian Rhythms of Food Intake: Are You Seeing The Whole Picture?
 
Herbalife 24 presentation
Herbalife 24  presentationHerbalife 24  presentation
Herbalife 24 presentation
 
UEMCON_2016_4
UEMCON_2016_4UEMCON_2016_4
UEMCON_2016_4
 
User modelling challenge ideatory 2014
User modelling challenge ideatory 2014User modelling challenge ideatory 2014
User modelling challenge ideatory 2014
 
Filling the Exercise Prescription
Filling the Exercise PrescriptionFilling the Exercise Prescription
Filling the Exercise Prescription
 
Small Steps to Health and Wealth Presentation
Small Steps to Health and Wealth PresentationSmall Steps to Health and Wealth Presentation
Small Steps to Health and Wealth Presentation
 
HCP Overview by Dr Bill Toth
HCP Overview by Dr Bill TothHCP Overview by Dr Bill Toth
HCP Overview by Dr Bill Toth
 
Chicago Child Care Center Standards Board of Health- March 2012
Chicago Child Care Center Standards  Board of Health- March 2012Chicago Child Care Center Standards  Board of Health- March 2012
Chicago Child Care Center Standards Board of Health- March 2012
 
13 a dietary analysis
13 a dietary analysis13 a dietary analysis
13 a dietary analysis
 
ProductPitch.pptx
ProductPitch.pptxProductPitch.pptx
ProductPitch.pptx
 

Mais de Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Mais de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Último

Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 

Último (20)

Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 

iRIS: A Large-Scale Food and Recipe Recommendation System Using Spark-(Joohyun Kim, MyFitnessPal, Under Armour-Connected Fitness)

  • 1. JoohyunKim Sr.  Data  Scientist MyFitnessPal – Under  Armour Connected  Fitness
  • 2. Who  are  we? #1  in   Health  &   Fitness ~7M  food   database 14.5B+   food   logging   data 85M+   Users Workout e-­‐ Commerce Retail 120M+   Users 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 2
  • 3. Under  Armour =  Apparel  Company? • http://www.fool.com/investing/general/2015/06/07/how-­‐under-­‐armour-­‐is-­‐ becoming-­‐a-­‐tech-­‐company.aspx 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 3
  • 4. Food  /  Nutrition •14.5B+  logged  foods •~7M  foods •38M+  recipes •From  85M+  users Workout •Time-­‐series  GPS  data •Music  data  with  workout •Run/Ride/Walk •Active  time  of  the  day •Sleep  pattern •390B+  calories  burned  (MFP) •1.2T+  minutes  exercise  (MFP) Retail  /  e-­‐ Commerce •Product  purchase   transactions •User  preferences  on   clothes  /  shoes  /  wearable   devices Under  Armour =  Data  Company! 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 4
  • 5. Importance  of  Data 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 5
  • 6. Biggest  Concern  of  Life:  What  to  Eat? Taste Healthy Preference Dietary Restriction 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 6
  • 7. Recommender  System? Word  of   mouth Reviews  / Blogs Expert   Advice 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness Typical  ways  of  getting  recommendation Limited Biased Lack  of  Source 7
  • 8. John Mary Mike Jane Banana 5 3 1 4 Blueberry 3 -­‐ -­‐ 2 Apple -­‐ -­‐ 2 -­‐ (?) Melon 1 -­‐ -­‐ -­‐ (?) Collaborative  Filtering • Predict  how  a  user  may  like  a  new  item  based  on   prior  user  behaviors  with  similar  preference 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness User  – Food  Logged  Counts  Table 8
  • 9. Matrix  Factorization • Filling  in  the  missing  entries  in  ratings  (user-­‐food   logged  counts)  matrix • Formulate  as  low-­‐rank  matrix  factorization • Factorize  user-­‐item  matrix  to  user-­‐feature  and  feature-­‐ item  matrix  (#  features    ≪ #  users  or  #  items) 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness R X YT Objective Minimize  RMSE  (Root   mean  squared  error)   between   𝑅  and   𝑋  ×   𝑌' ≈ × Users Items 9
  • 10. • Explicit  ratings  are  available  for  movies  /  songs • Typically  1~5  stars  (ratings)  given • For  MFP  food  logging  events,  there  are  only  “logged” foods.  No  negative  feedback • Can’t  assume  0  count  (no  entry)  as  negative • Reference:  Hu,  Koren,  and  Volinsky,  Collaborative  Filtering  with   Implicit  Feedback  Dataset,  ICDM  08 • Construct  “binary”  ratings  matrix  P,  and  factorize  P   instead  of  R  (original  ratings  matrix) “Implicit”  Matrix  Factorization 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 5 3 ? 3 ? ? 2 ? ? 4 1 ? ? ? ? ? ? 1 2 ? ? ? 2 ? 1 ? 3 ? ? 1 1 1 ? 1 ? ? 1 ? ? 1 1 ? ? ? ? ? ? 1 1 ? ? ? 1 ? 1 ? 1 ? ? 1 R P ⇒ ≈ × X Y 10
  • 11. Alternating  Least  Squares • Optimizing  X  (user)  and  Y  (item)  at  the  same  time  is   hard • Fix  X  or  Y  ⟹ Solve  for  the  other • Solve  the  system  of  linear  equations • Take  the  derivative  of  objective  function  w.r.t X  or  Y,  set   0,  and  solve • Starting  with  random  initialization  of  Y • EM-­‐like  iterative  process • Iterate  until  the  change  is  very  small  (or  stop  with   fixed  iteration  number) 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 11
  • 12. Scalability?  ⇒ Parallelization! 6/16/2015 MyFitnessPal -­‐ Under  Armour Connected  Fitness • Rating  matrix:  85M  users  × 7M  foods  ≅ 595T   entries • Impossible  to  fit  in  a  single  machine • Sparse  representation:  billions  of  entries • ALS  can  be  easily  parallelized  with  map-­‐reduce   framework • Sharding users  and  items  vectors • Mapper  on  individual  sub-­‐matrix • Reducer  on  aggregation  over  users/items • Spark  MLLib • Parallelized  version  of  ALS  ready  to  use • Fast  computation  with  DataFrame val model  =  new  ALS() .setRank(20) .setImplicitPrefs(true) .setAlpha(40) .setRegParam(0.1) .setMaxIter(10) .fit(ratings) 12
  • 13. Food  Recommendation  Pipeline Logged  foods  data  (user  /  food) Predict  food  preference  by  matrix   factorization Generate  top  K  food recommendation 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 13
  • 14. Generating  Top  K  Recommendations • We  need  to  serve  top  recommended  foods  to  users • With  the  trained  factorized  matrix  model, • Predict  top  K  foods  for  each  user  (in  the  order  of  their   own  preference) • Seem  trivial,  but  the  computation  is  huge • For  each  user,  retrieve  food  preference  by   𝑅 = 𝑋  ×   𝑌' • Get  top  K  per  each  user:  min(O(𝐾𝑚𝑛),  O(𝑚𝑛 log 𝑛)) • m:  #  of  users,  n:  #  of  items • Same  order  of  constructing  whole  ratings  matrix • Major  bottleneck  of  the  entire  pipeline • No  easy  way  to  get  around  the  computation 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 14
  • 15. Some  Numbers • Spark  cluster • 72  nodes  (1  master  +  71  workers) • 2TB  memory  ⟸ One  of  the  largest  clusters  in  production • Dataset • User  :  85M+ • Item  (food)  :  ~7M • Rating  (food  log  counts):  6.5B+  (aggregated  per  user/food) • Time • ALS  model  training:  4  hours   • Generating  top  K  food  recommendation  for  every  user:  48   hours • More  than  20x speed  improvement  over  Mahout  in   conventional  Hadoop  cluster 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 15
  • 16. Advantages  Using  Spark • Faster  development  cycle • MLLib • Parallelization  provided  via  RDD  with  abstraction • Easy  to  construct  data  pipeline  with  DataFrame • Easy  to  load  /  export  data  in  and  out  of  S3  /  Redshift • Faster  model  optimization • In-­‐memory,  distributed  computation • Faster  model  training  /  testing   • Significant  reduction  in  parameter  tuning  /  optimization  on   validation  dataset • Easy  scalability • By  launching  more  worker  instances • Enables  frequent  model  updates   • Reflect  user  preference  change  more  often 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 16
  • 17. Sample  Food  Recommendation 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness Recommended  Foods Cooked  White  Jasmine  Rice Steamed  White  Rice   (Unenriched) Pho Kimchi Tofu  -­‐ Fried Miso  Soup  With  Seaweed  and   Tofu Shrimp  Dumplings Miso  Soup Salmon  Nigiri Sunny  Side  Up Logged  Foods Korean  soy  milk  with  high   calcium Coke  12oz Fried  rice Korean  mixed  grain  shake Korean  Rice  Cake Sweet  Soy  Milk Blackberries  -­‐ Raw Blueberries  -­‐ Raw Korean  Melon  (Chameh /  참외) Japchae (Korean  Stir-­‐Fried   Sweet  Potato  Noodles) 17
  • 18. Extension:  Recipe  Recommendation • Recipes  are  not  “public” • Currently,  only  foods  are  shared  across   different  users • Recipes  are  “private” to  individual  user   when  created • Cannot  construct  standard  user-­‐item   ratings  matrix • Solution:  Recommend  recipes  using   similarity  with  foods 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness • Advantages  of  recommending  recipes • Richer  metadata  (instructions,  ingredients,  cuisines,  …) • Complete  food,  home-­‐cookable • Customizable  with  personal  preference/restriction 18
  • 19. Extension:  Recipe  Recommendation Logged  foods  data  (user  /  food) Predict  food  preference  by  matrix  factorization Generate  top  K  food recommendation Food  – Recipe  similarity  computation  (LSH,   kNN,  by  nutrition) Final  top  K  recipes  from food  recommendation   rank  and  recipe  similarity  rank 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness Food   Recom menda tion Recipe Recom menda tion 19
  • 20. Nearest  Neighbor  with  Locality  Sensitive  Hashing • Naïve  nearest  neighbor  computation:  O(n2)   • Nearly  impossible  with  7M  foods  and  38M  recipes • Locality  Sensitive  Hashing  (LSH) • Hashes  similar  items  into  the  same  buckets  with  high   probability • Similarity  metric:  Euclidean  distance  on  nutrition  vector • NN  computation  much  faster,  look  up  within  the  bucket:   O(k2)  (k:  max  size  of  bucket) • Spark  implementation • https://github.com/mrsqueeze/spark-­‐hash 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 20
  • 21. Top  K  Recipe  Recommendation • Order  by  sum  of  food  recommendation  rank  and  recipe  similarity  rank 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness Rank Food 1 Banana 2 Blueberry 3 Apple 4 Pad  Thai 5 … 6 … Rank Recipe 1 Banana  Pie 2 Banana  Smoothie 3 Banana  Ice  Cream Rank Recipe 1 Blueberry Yogurt 2 Blueberry  Pie 3 Salad with  Blueberry … Combin ed  Rank Recipe 2 Banana  Pie 3 Banana   Smoothie 3 Blueberry  Yogurt 4 Banana  Ice   Cream 4 Blueberry  Pie 5 … 21
  • 22. Sample  Recipe  Recommendations Recommended  Recipes Noodle  sauce Stone  Ground   Dijon  Mustard   Marinade Citrus  Dijon  Miso  Dressing Shirataki Noodle  Soup dumpling   sauce Dijon  Miracle  Whip Paleo  Vanilla  Ice-­‐Crème Mable's  Chili  Burrito cake  batter  milkshake Pita  pizza 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness Recommended  Foods Cooked  White  Jasmine  Rice Steamed  White  Rice   (Unenriched) Pho Kimchi Tofu  -­‐ Fried Miso  Soup  With  Seaweed  and   Tofu Shrimp  Dumplings Miso  Soup Salmon  Nigiri Sunny  Side  Up 22
  • 23. Integrating  with  Taste  Profiles • Machine  learning  classifier  that  outputs  probability   distribution  over  6  taste  categories • Savory,  Sweet,  Sour,  Spicy,  Salty,  Bitter • NN  classifier  performed  over  feature  vector  (semantic   word  vector  +  numeric  nutritional  value  vector) • With  small  number  (~1400)  of  labeled  foods • 87%  accuracy  on  separately  labeled  test  set  (~1200) • Works  as  additional  metadata  for  foods • Recommendation  results  can  be  further  filtered  /   reordered  by  personal  taste  preferences • With  Spark,  data  integration  with  hash-­‐join  is  much   faster 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 23
  • 24. Problems • Cold-­‐start  problem • New  user  (important) • New  food  item  (not  too  much,  since  they  may  not  be   popular) • Solution • Hybrid  with  content-­‐based  recommendation • Construct  basic  profile  for  new  users  to  get  baseline   recommendation • May  recommend  new  items  based  on  feature  similarity • Runs  the  pipeline  more  often • For  new  users,  recommend  top/popular  foods • After  a  while,  these  users  will  get  recommendation 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 24
  • 25. Possible  Extension • Collaborative  filtering  over   aggregated  food  clusters • User-­‐generated  foods  contain  (near)   duplicates • Combine  with  verified  food  project   • Spark-­‐based  data  deduplication  /   processing  pipeline • Construct  “verified”  foods  out  of   thorough  clustering • Recommend  users  with  high-­‐quality   food • Recommend  representative  food  from   each  cluster,  instead  of  individual   variation • Reducing  the  computation  time  due   to  reduced  dimension 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 25
  • 26. Application • Recommend  frequently  paired   foods • Pairing  foods  within  a  single  meal   depends  on • Individual  user’s  own  preference • Cultural  difference  (region,   country) • Simple  way • Suggest  popular  foods  based  on   co-­‐occurrence  stats  per  individual   user  /  overall  users • Utilize  this  framework  to  capture   better  personalized  preference 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 26
  • 27. Summary • Spark-­‐powered  machine  learning  pipeline  for   food/recipe  recommendation  system • Faster  computation  help  reduce  the  time  on   development  cycle • Help  data  scientists  focus  on  core  problems • Easy  extension  by  attaching  additional  data  processing   steps  with  scalability • Only  scratched  a  surface • Food  /  recipe  recommendation • Extensions  with  other  data  sources • Workout • Music • Retail • e-­‐Commerce   6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 27
  • 28. Questions? 6/16/2015 MyFitnessPal  -­‐ Under  Armour  Connected  Fitness 28