Scala Json Features and Performance

Compares the features and performance of 11 Json parsers that can be used from Scala.

Published in: Software

  1. SCALA JSON FEATURES AND PERFORMANCE
     John Nestor, 47 Degrees (nestor@persist.com)
     Dragos Manolescu (dam@micro-workflow.com)
     https://github.com/47deg/json-perf
     47deg.com
  2. DISCLAIMER
     • This is a best-effort attempt to measure performance and describe features.
     • Corrections are always appreciated.
     • Please also let us know about any Json parsers we missed.
  3. SCALA JSON
     • There are lots of Scala Json parsers.
     • You can also use Java Json parsers from Scala.
     • How to choose:
       • Performance
       • Features
       • API
       • Support (is the project likely to be maintained rather than abandoned?)
       • License (most are Apache 2)
  4. SCALA (2.11) JSON PARSERS
     Parser         Version   Language     URL
     Persist Json   1.1.0     Scala        https://github.com/nestorpersist/json
     Rojoma         3.3.0     Scala        https://github.com/rjmac/rojoma-json
     Jackson        2.5.3     Scala/Java   http://wiki.fasterxml.com/JacksonHome
     Spray Json     1.3.2     Scala        https://github.com/spray/spray-json
     Lift Json      2.6.2     Scala        https://github.com/lift/lift/tree/master/framework/lift-base/lift-json/
     Twitter Json   NA        Scala        https://github.com/stevej/scala-json
     Scala Library  1.0.4     Scala        https://github.com/scala/scala-parser-combinators
     Play Json      2.4.1     Scala/Java   https://www.playframework.com/documentation/2.0/ScalaJson
     Json Smart     2.1.0     Java         https://github.com/netplex/json-smart-v2
     Argonaut       6.0.4     Scala        http://argonaut.io/
     JAWN           0.8.0     Scala        https://github.com/non/jawn
  5. THE PARSERS (1 OF 4)
     • Scala Library. This parser is part of the standard Scala library, in package scala.util.parsing.json. It is implemented using parsing combinators.
     • Twitter Json. A cleaned-up version of the Json parser in Odersky's Scala book. It is implemented using parsing combinators. Written by Steve Jenson while at Twitter.
     • Persist Json. Developed as part of OStore, a new NoSQL database written in Scala. OStore started with the Twitter parser, but that turned out to be much too slow, so the parser was rewritten from scratch with an emphasis on speed while keeping mostly the same API. Developed by John Nestor (with the codec-based mapper by JR Dejardin).
  6. THE PARSERS (2 OF 4)
     • Play Json. Part of the Typesafe Play framework. Implemented using Jerkson, a Scala wrapper around Jackson.
     • Lift Json. Developed as part of Lift, a framework for building web apps.
     • Spray Json. Developed as part of Spray, a REST/HTTP toolkit for network IO.
  7. THE PARSERS (3 OF 4)
     • Argonaut. Purely functional Json in Scala. Uses Scalaz.
     • Rojoma. Another Scala parser that makes extensive use of Scala's functional features. Developed by Robert Macomber of Socrata.
     • Jawn. Designed to parse Json into an AST as quickly as possible.
  8. THE PARSERS (4 OF 4)
     • Jackson. Generally regarded as the best and fastest Java Json parser, with a very rich set of features. We test it with the DefaultScalaModule (by Chris Currie), which provides Scala support.
     • Json Smart. A newer Json parser, written in Java, that is faster than Jackson.
  9. TEST SETS FOR PERFORMANCE TESTING
     • Twitter. Tweets processed by the Yap.tv Guide (http://j.mp/15WL0p3), a service providing a personalized TV guide companion experience based on social content from Twitter and Facebook. This data set contains 100 tweets in Json (http://j.mp/13lKbU6).
     • Google. PlaceSearchResults returned by Google in response to place queries at 100 locations. The locations are CNN Money's 2012 list of the best places to live (http://j.mp/13NmVid). This data set contains 138 PlaceSearchResults in Json (http://j.mp/13NmCUC), using the keyword "brewery" and a radius of 2 miles.
     • Each file has one Json object per line.
  10. PRETTY SAMPLE TWITTER JSON
     {"contributors":null,
      "coordinates":null,
      "created_at":"Mon Jun 27 21:45:46 +0000 2011",
      "entities":
        {"hashtags":[],
         "urls":
           [{"display_url":"mercynotes.com",
             "expanded_url":"http://www.mercynotes.com/",
             "indices":[61,80],
             "url":"http://t.co/lKzLFOd"
            }
           ],
         "user_mentions":[]
        },
      "favorited":false,
      "geo":null,
      "id":85463859615379456,
      "id_str":"85463859615379456",
      "in_reply_to_screen_name":null,
      "in_reply_to_status_id":null,
      "in_reply_to_status_id_str":null,
      "in_reply_to_user_id":null,
      "in_reply_to_user_id_str":null,
      "place":null,
      "retweet_count":0,
      "retweeted":false,
      "source":"web",
      "text":"Been watching Wimbledon? Check out new post Love and Tennis: http://t.co/lKzLFOd",
      "truncated":false,
      "user":
        {"contributors_enabled":false,
         "created_at":"Mon May 30 16:35:44 +0000 2011",
         "default_profile":true,
         "default_profile_image":false,
         "description":"",
         "favourites_count":0,
         "follow_request_sent":null,
         "followers_count":6,
         "following":null,
         "friends_count":12,
         "geo_enabled":false,
         "id":307978890,
         "id_str":"307978890",
         "is_translator":false,
         "lang":"en",
         "listed_count":0,
         "location":"NC",
         "name":"Julie LaJoe",
         "notifications":null,
         "profile_background_color":"C0DEED",
         "profile_background_image_url":"http://a0.twimg.com/images/themes/theme1/bg.png",
         "profile_background_image_url_https":"https://si0.twimg.com/images/themes/theme1/bg.png",
         "profile_background_tile":false,
         "profile_image_url":"http://a0.twimg.com/profile_images/1375001769/JulieMNnew__2__normal.jpg",
         "profile_image_url_https":"https://si0.twimg.com/profile_images/1375001769/JulieMNnew__2__normal.jpg",
         "profile_link_color":"0084B4",
         "profile_sidebar_border_color":"C0DEED",
         "profile_sidebar_fill_color":"DDEEF6",
         "profile_text_color":"333333",
         "profile_use_background_image":true,
         "protected":false,
         "screen_name":"mercynotes",
         "show_all_inline_media":false,
         "statuses_count":13,
         "time_zone":"Quito",
         "url":"http://mercynotes.com",
         "utc_offset":-18000,
         "verified":false
        }
     }
  11. PRETTY SAMPLE GOOGLE JSON
     {"address_components":
        [{"long_name":"622", "short_name":"622", "types":["street_number"]},
         {"long_name":"South Rangeline Road", "short_name":"South Rangeline Road", "types":["route"]},
         {"long_name":"Carmel", "short_name":"Carmel", "types":["locality","political"]},
         {"long_name":"Hamilton", "short_name":"Hamilton", "types":["administrative_area_level_2","political"]},
         {"long_name":"Indiana", "short_name":"Indiana", "types":["administrative_area_level_1","political"]},
         {"long_name":"US", "short_name":"US", "types":["country","political"]},
         {"long_name":"46032", "short_name":"46032", "types":["postal_code"]}
        ],
      "formatted_address":"Suite Q, 622 South Rangeline Road, Carmel, Indiana, United States",
      "formatted_phone_number":"(317) 429-6345",
      "geometry":{"location":{"lat":39.971703,"lng":-86.129099}},
      "icon":"http://maps.gstatic.com/mapfiles/place_api/icons/generic_business-71.png",
      "id":"fcd83d32717980ec1fec2c7ec8719389b201a331",
      "international_phone_number":"+1 317-429-6345",
      "name":"Union Brewing Company",
      "opening_hours":
        {"open_now":true,
         "periods":
           [{"close":{"day":0,"time":"2000"}, "open":{"day":0,"time":"1200"}},
            {"close":{"day":2,"time":"2200"}, "open":{"day":2,"time":"1600"}},
            {"close":{"day":4,"time":"2200"}, "open":{"day":4,"time":"1600"}},
            {"close":{"day":6,"time":"0000"}, "open":{"day":5,"time":"1500"}},
            {"close":{"day":0,"time":"0000"}, "open":{"day":6,"time":"1200"}}
           ]
        },
      "photos":
        [{"height":1632,
          "html_attributions":["<a href=\"https://plus.google.com/117934275405882297051\">Greg Magnusson</a>"],
          "photo_reference":"CnRoAAAAvN9y_gkgZIGa13kUSyyBlqwholvjtH4NKo-BzvlklcX-Tt9Ysc6HRMXPxKl3PumZtiOnomHi-Nk83y-lxf8RX8nsWulwuCBpY2okAqaU9wohOhncStFPZlKr02t3WquA6pt8mfCYYO-NAdU2HwdM1hIQYJmus4wpQBaRtP7BFdYhzRoU4XvzfAAQQwkdJZluFJ-tDoUulIo",
          "width":1224
         }
        ],
      "reference":"CoQBcgAAAF3VKrWBUmLMv5tLs1Ru47j3Tbxa6lPxlIFj5BUvpsTyPt3bpui2vOTCcaHjKYuAjSulIPHpd0YFgm5CKLQH6P_19xU1UPeu6avWeIMWA0u4hxyx4TazCfFF9ESCwHaOEcKZfRyJSD2b5p2IJvT0eVkFFExeWbqAcWrH80jIQ-VrEhAvUSpbmH3rB4LEKn-cZtsYGhQxFpeco4U1rUtwe-ncAttqLBnSgQ",
      "reviews":
        [{"aspects":[{"rating":3,"type":"quality"}],
          "author_name":"Greg Magnusson",
          "author_url":"https://plus.google.com/117934275405882297051",
          "text":"Truly outstanding local craft brewing company. Indy's got some great local brewers, but these guys really get it right. Nice little location in Carmel, great beer and local guest taps... I'm so glad these guys moved into town. Love!",
          "time":1361059887
         }
        ],
      "types":["food","establishment"],
      "url":"https://plus.google.com/102928473191458623183/about?hl=en-US",
      "utc_offset":-300,
      "vicinity":"Suite Q, 622 South Rangeline Road, Carmel",
      "website":"http://www.unionbrewingco.com"
     }
  12. TESTING PROCESS
     • Timing is done with Java's System.nanoTime().
     • For each data set, every line is parsed.
     • This is repeated 25 times to warm up the JVM.
     • It is then repeated 200 times for measurement.
     • For example, the Google data set has 138 Json lines, so 25 × 138 = 3450 lines are parsed during warmup and 200 × 138 = 27600 lines during measurement.
     • For each parser, the nanoseconds summed over all measured parse steps (27600 for Google) are reported in milliseconds.
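The testing process above can be sketched as a small timing harness. This is a minimal sketch, not the json-perf project's actual code. `ParseTimer` and `timeMillis` are hypothetical names, and `parse` stands for whichever parser is under test.

```scala
// Hypothetical sketch of the warmup-then-measure loop; not the json-perf code.
object ParseTimer {
  // Written after each call so the JIT cannot discard the parse as dead code.
  @volatile private var sink: Any = null

  /** Total time in milliseconds, summed over every timed parse call. */
  def timeMillis(lines: Seq[String], parse: String => Any,
                 warmup: Int = 25, rounds: Int = 200): Double = {
    for (_ <- 0 until warmup; line <- lines) sink = parse(line) // untimed warmup
    var totalNanos = 0L
    for (_ <- 0 until rounds; line <- lines) {
      val start = System.nanoTime()
      val result = parse(line)
      totalNanos += System.nanoTime() - start
      sink = result
    }
    totalNanos / 1e6 // 1 ms = 1,000,000 ns
  }
}
```

With the Google set (138 lines) and the default settings, `parse` runs 138 × 25 = 3450 times untimed and 138 × 200 = 27600 times timed. Timing each call separately, rather than timing a whole round at once, matches the slide's "summed nanoseconds". It also adds the small overhead of two `nanoTime` calls per line.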
  13. TIMING SCALA/JAVA CODE
     • Timing is tricky! For example, see http://www.ibm.com/developerworks/library/j-benchmark1/
     • A few of the many issues:
       • Warmup (run several times to warm up the JVM)
       • Repeatability (report the average? but what about P99?)
       • Interference from other processes
       • Caches
       • Garbage collection
       • Choice of data set
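The sketch below makes the average-versus-P99 question concrete by summarizing per-iteration timings both ways. `LatencyStats` is a hypothetical helper. It uses the nearest-rank definition of a percentile; other definitions interpolate between samples.

```scala
// Hypothetical helper: mean vs. nearest-rank percentile of per-iteration timings.
object LatencyStats {
  def mean(samples: Seq[Long]): Double = samples.sum.toDouble / samples.size

  /** Nearest-rank percentile: the smallest sample with at least p% of samples <= it. */
  def percentile(samples: Seq[Long], p: Double): Long = {
    require(samples.nonEmpty && p > 0 && p <= 100, "need samples and 0 < p <= 100")
    val sorted = samples.sorted
    val rank = math.ceil(p * sorted.size / 100.0).toInt // 1-based rank
    sorted(rank - 1)
  }
}
```

Take 98 iterations of 100 ns and 2 slow ones (say, GC pauses) of 50,000 ns. The median is 100 ns, the mean is 1,098 ns, and P99 is 50,000 ns. The mean blends the outliers into a number that matches no actual iteration, while P99 reports them directly.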
  14. TESTING MACHINE
     • Times obviously depend on the speed of the machine used for testing.
     • The numbers here are for a MacBook Pro with:
       • two 2.9 GHz cores
       • 16 GB of main memory
     • You can run the tests on a machine of your choice!
  15. PARSING TIMES (MS)
     Parser          Twitter    Google   Ignore
     Persist Json        443       712
     Rojoma              540      1251
     Jackson             445       842
     Spray Json          603      1115
     Lift Json           469      1002
     Twitter Json      18179     42316   Too slow
     Scala Library    126006    329215   Way too slow
     Play Json           442      1027
     Json Smart          251       424
     Argonaut            784      1448
     JAWN                603       748
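The gap between the fast parsers and the two ignored ones is easier to see as ratios. The sketch below divides each Twitter time from the table by the fastest one. `Slowdown` is a hypothetical helper; the figures are the ones in the table.

```scala
// Hypothetical helper; the millisecond figures are the slide's Twitter column.
object Slowdown {
  val twitterMs: Map[String, Int] = Map(
    "Persist Json" -> 443, "Rojoma" -> 540, "Jackson" -> 445,
    "Spray Json" -> 603, "Lift Json" -> 469, "Twitter Json" -> 18179,
    "Scala Library" -> 126006, "Play Json" -> 442, "Json Smart" -> 251,
    "Argonaut" -> 784, "JAWN" -> 603)

  /** Each parser's time divided by the fastest parser's time. */
  def relativeToFastest(times: Map[String, Int]): Map[String, Double] = {
    val fastest = times.values.min.toDouble
    times.map { case (name, ms) => name -> ms / fastest }
  }
}
```

On the Twitter set, this puts the Scala Library parser at about 502 times the time of Json Smart and Twitter Json at about 72 times.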
  16. PARSING TIMES - TWITTER [chart]
  17. PARSING TIMES - GOOGLE [chart]
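  18. WHY IS TWITTER SLOW?
     • Parsing combinators: elegant but slow.
     • The grammar is interpreted at run time, with backtracking.

       import scala.util.parsing.combinator.JavaTokenParsers

       // Excerpt; string and number are defined here so that it compiles on its own.
       class JsonGrammar extends JavaTokenParsers {
         def string: Parser[String] = stringLiteral ^^ (s => s.substring(1, s.length - 1)) // escapes not decoded
         def number: Parser[Double] = floatingPointNumber ^^ (_.toDouble)
         // Alternatives are tried in order; when one fails, the parser backtracks.
         def value: Parser[Any] = obj | arr | string | number |
           "null" ^^ (x => null) | "true" ^^ (x => true) | "false" ^^ (x => false)
         def obj: Parser[Map[String, Any]] = "{" ~> repsep(member, ",") <~ "}" ^^ (_.toMap)
         def arr: Parser[List[Any]] = "[" ~> repsep(value, ",") <~ "]"
         def member: Parser[(String, Any)] = string ~ ":" ~ value ^^ {
           case name ~ ":" ~ value => (name, value)
         }
       }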
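A parser can also be written by hand, in contrast with the combinator grammar. There, one character of lookahead selects the rule, so no alternative is ever retried. The sketch below is illustrative only and is not the code of any parser in the table. `FastJsonParser` is a hypothetical name, numbers become `BigDecimal`, and number syntax is checked loosely.

```scala
// Hypothetical sketch: one character of lookahead picks the rule, so no
// alternative is ever retried. Numbers become BigDecimal (loosely checked).
final class FastJsonParser(s: String) {
  private var i = 0 // current offset into s

  def parse(): Any = {
    val v = value()
    skipWs()
    if (i != s.length) fail("trailing characters")
    v
  }
  private def fail(msg: String): Nothing = throw new IllegalArgumentException(s"$msg at offset $i")
  private def skipWs(): Unit = while (i < s.length && Character.isWhitespace(s.charAt(i))) i += 1
  private def peekIs(c: Char): Boolean = { skipWs(); i < s.length && s.charAt(i) == c }
  private def expect(c: Char): Unit = if (peekIs(c)) i += 1 else fail(s"expected '$c'")

  private def value(): Any = {
    skipWs()
    if (i >= s.length) fail("unexpected end of input")
    s.charAt(i) match {
      case '{' => obj()
      case '[' => arr()
      case '"' => str()
      case 't' => literal("true", true)
      case 'f' => literal("false", false)
      case 'n' => literal("null", null)
      case _   => num()
    }
  }
  private def literal(word: String, result: Any): Any =
    if (s.startsWith(word, i)) { i += word.length; result } else fail(s"expected $word")

  private def obj(): Map[String, Any] = {
    i += 1 // consume '{'
    val b = Map.newBuilder[String, Any]
    if (!peekIs('}')) {
      var more = true
      while (more) {
        if (!peekIs('"')) fail("expected a field name")
        val key = str()
        expect(':')
        b += (key -> value())
        more = peekIs(',')
        if (more) i += 1
      }
    }
    expect('}')
    b.result()
  }

  private def arr(): List[Any] = {
    i += 1 // consume '['
    val b = List.newBuilder[Any]
    if (!peekIs(']')) {
      var more = true
      while (more) {
        b += value()
        more = peekIs(',')
        if (more) i += 1
      }
    }
    expect(']')
    b.result()
  }

  private def str(): String = {
    i += 1 // consume the opening quote
    val sb = new StringBuilder
    while (i < s.length && s.charAt(i) != '"') {
      val c = s.charAt(i)
      if (c != '\\') { sb += c; i += 1 }
      else if (i + 5 < s.length && s.charAt(i + 1) == 'u') { // \uXXXX escape
        sb += Integer.parseInt(s.substring(i + 2, i + 6), 16).toChar
        i += 6
      } else { // two-character escape such as \n or \"
        val k = if (i + 1 < s.length) "\"\\/bfnrt".indexOf(s.charAt(i + 1).toInt) else -1
        if (k < 0) fail("bad escape")
        sb += "\"\\/\b\f\n\r\t".charAt(k)
        i += 2
      }
    }
    if (i >= s.length) fail("unterminated string")
    i += 1 // consume the closing quote
    sb.toString
  }

  private def num(): BigDecimal = {
    val start = i
    while (i < s.length && "+-0123456789.eE".indexOf(s.charAt(i).toInt) >= 0) i += 1
    if (start == i) fail("unexpected character")
    try BigDecimal(s.substring(start, i))
    catch { case _: NumberFormatException => fail("bad number") }
  }
}
```

For example, `new FastJsonParser("""{"a":[1,true,null]}""").parse()` yields a `Map` whose `"a"` entry is `List(BigDecimal(1), true, null)`. This is the unwrapped Map/List style of AST.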
  19. WHY IS THE SCALA LIBRARY EVEN SLOWER?
     • Like Twitter Json, it uses parsing combinators.
     • But why is it so much slower?
  20. WHY IS PLAY SO SLOW IF IT USES JACKSON?
     • Because it goes through Jerkson (which has been abandoned)?
     • Otherwise, still an open question.
  21. JSON LANGUAGE EXTENSIONS
     Parser         Comments    No quotes     Root type   Other
     Persist Json   //          field         any         raw strings
     Rojoma         //, /* */   field         any         keeps field order
     Jackson        //          field         object      can use ' (single quotes)
     Spray Json     //                        any
     Lift Json                                object      keeps field order
     Twitter Json                             any
     Scala Library                            any
     Play Json                                object
     Json Smart     #           field/value   object
     Argonaut                                 object      keeps field order
     JAWN                                     object
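Supporting the comment extensions is mostly a lexer change: comments are skipped wherever whitespace is allowed. The sketch below assumes a hypothetical `skipWsAndComments` helper, not any listed parser's API, that handles `//` and `/* */`. Json Smart's `#` comments would need one more case.

```scala
// Hypothetical helper: skip whitespace plus // line comments and /* */ block comments.
object LenientLexer {
  /** Index of the first character at or after `from` that is neither
    * whitespace nor inside a comment. */
  def skipWsAndComments(s: String, from: Int): Int = {
    var i = from
    var done = false
    while (!done) {
      if (i < s.length && Character.isWhitespace(s.charAt(i))) i += 1
      else if (s.startsWith("//", i)) {
        while (i < s.length && s.charAt(i) != '\n') i += 1 // skip to end of line
      } else if (s.startsWith("/*", i)) {
        val end = s.indexOf("*/", i + 2)
        if (end < 0) throw new IllegalArgumentException(s"unterminated comment at offset $i")
        i = end + 2
      } else done = true
    }
    i
  }
}
```

A parser would call this everywhere it would otherwise skip plain whitespace, for example between a field name and its colon.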
  22. PARSER RESULTS (ASTS)
     Parser         Object, Array               Wrapped in object   Immutable   Collections
     Persist Json   Map, List                   no                  yes         Scala
     Rojoma         LinkedHashMap, Vector       yes                 no          Scala
     Jackson        Map, List                   yes                 no          Java
     Spray Json     Map, Vector                 yes                 yes         Scala
     Lift Json      List[Field], List           yes                 yes         Scala
     Twitter Json   Map, List                   no                  yes         Scala
     Scala Library  Map, List                   yes                 yes         Scala
     Play Json      Map, List                   yes                 yes         Scala
     Json Smart     HashMap, List               yes                 no          Java
     Argonaut       scalaz.InsertionMap, List   yes                 yes         Scala
     JAWN           Map, Array                  yes                 no          Scala
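The difference between the wrapped and unwrapped AST styles in this table can be sketched as follows. `JNode` and its cases are hypothetical names, not the types of any listed parser. `wrap` converts a Persist- or Twitter-style AST of plain Scala values into wrapped nodes, and `kind` shows the exhaustive matching that a sealed hierarchy allows.

```scala
// Hypothetical wrapped-AST types, contrasted with an unwrapped AST typed as Any.
sealed trait JNode
final case class JObj(fields: Map[String, JNode]) extends JNode
final case class JArr(items: List[JNode]) extends JNode
final case class JStr(value: String) extends JNode
final case class JNum(value: BigDecimal) extends JNode
final case class JBool(value: Boolean) extends JNode
case object JNull extends JNode

object AstStyles {
  /** Converts an unwrapped AST (Map, Seq, String, number, Boolean, null). */
  def wrap(raw: Any): JNode = raw match {
    case null          => JNull
    case m: Map[_, _]  => JObj(m.iterator.map { case (k, v) => k.toString -> wrap(v) }.toMap)
    case s: Seq[_]     => JArr(s.iterator.map(wrap).toList)
    case s: String     => JStr(s)
    case n: BigDecimal => JNum(n)
    case n: Int        => JNum(BigDecimal(n))
    case n: Long       => JNum(BigDecimal(n))
    case n: Double     => JNum(BigDecimal(n))
    case b: Boolean    => JBool(b)
    case other         => throw new IllegalArgumentException(s"not a Json value: $other")
  }

  /** Over the sealed hierarchy, the compiler can check that this match is exhaustive. */
  def kind(node: JNode): String = node match {
    case JObj(_)  => "object"
    case JArr(_)  => "array"
    case JStr(_)  => "string"
    case JNum(_)  => "number"
    case JBool(_) => "boolean"
    case JNull    => "null"
  }
}
```

Code over the unwrapped style has to handle `Any` and can only fail at run time on unexpected values. Code over the wrapped style gets compiler help, at the cost of allocating a wrapper per node.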
  23. UNPARSING
     • The inverse of parsing (deserialization) is unparsing (serialization).
     • Unparsing takes the AST produced by parsing and converts it back to a string.
     • It is useful for debugging and logging.
     • Many parsers also include a pretty-printing unparser.
     • The timings here are for the simple, non-pretty form.
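A compact unparser for an unwrapped AST fits in a few dozen lines. This `Unparser` is hypothetical and assumes the AST holds only `Map`, `Seq`, `String`, `Int`/`Long`/`Double`/`BigDecimal`, `Boolean` and `null`. It does not pretty-print and does not reject NaN or infinite doubles.

```scala
// Hypothetical compact (non-pretty) unparser for an unwrapped AST.
object Unparser {
  def unparse(v: Any): String = {
    val sb = new StringBuilder
    write(v, sb)
    sb.toString
  }

  private def write(v: Any, sb: StringBuilder): Unit = v match {
    case null          => sb.append("null")
    case s: String     => quote(s, sb)
    case b: Boolean    => sb.append(b)
    case n: Int        => sb.append(n)
    case n: Long       => sb.append(n)
    case n: Double     => sb.append(n) // NaN/Infinity not handled
    case n: BigDecimal => sb.append(n.toString)
    case m: Map[_, _] =>
      sb.append('{')
      var first = true
      m.foreach { case (k, x) =>
        if (!first) sb.append(',')
        first = false
        quote(k.toString, sb); sb.append(':'); write(x, sb)
      }
      sb.append('}')
    case xs: Seq[_] =>
      sb.append('[')
      var first = true
      xs.foreach { x =>
        if (!first) sb.append(',')
        first = false
        write(x, sb)
      }
      sb.append(']')
    case other => throw new IllegalArgumentException(s"not a Json value: $other")
  }

  /** Writes a Json string literal, escaping quotes, backslashes and control characters. */
  private def quote(s: String, sb: StringBuilder): Unit = {
    sb.append('"')
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append('\\').append("u%04x".format(c.toInt))
      case c    => sb.append(c)
    }
    sb.append('"')
  }
}
```

For example, `Unparser.unparse(Map("a" -> List(1, true, null)))` produces `{"a":[1,true,null]}`.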
  24. UNPARSING TIMES (MS)
     Parser         Twitter   Google
     Persist Json       622     1172
     Rojoma             226      511
     Jackson             11       29
     Spray Json         232      676
     Lift Json         1125     3211
     Play Json          322      323
     Json Smart         349      934
     Argonaut          1005     2468
     JAWN               498     1161
  25. UNPARSING TIMES - TWITTER [chart]
  26. UNPARSING TIMES - GOOGLE [chart]
  27. WHY IS JACKSON SO INCREDIBLY FAST?
     • It uses a SegmentedStringBuilder (rather than a StringBuilder).
     • It uses a segmented internal buffer.
     • Buffers are recycled.
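A small sketch can illustrate the segmented-buffer and recycling ideas. It is not Jackson's implementation, and `SegmentedBuffer` is a hypothetical class. Text is written into fixed-size character segments, so growing the buffer never copies what was already written. When the result is taken, the segments go back to a per-thread pool for reuse.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration of segmented, recycled buffers; not Jackson's code.
final class SegmentedBuffer(segmentSize: Int, pool: ArrayBuffer[Array[Char]]) {
  require(segmentSize > 0, "segmentSize must be positive")
  private val full = ArrayBuffer.empty[Array[Char]] // segments already filled
  private var current = take()
  private var pos = 0       // next free slot in `current`
  private var fullChars = 0 // total chars held in `full`

  private def take(): Array[Char] =
    if (pool.nonEmpty) pool.remove(pool.length - 1) else new Array[Char](segmentSize)

  def append(s: String): this.type = {
    var i = 0
    while (i < s.length) {
      if (pos == current.length) { // segment full: keep it and start another, no copying
        full += current; fullChars += pos; current = take(); pos = 0
      }
      val n = math.min(s.length - i, current.length - pos)
      s.getChars(i, i + n, current, pos)
      i += n; pos += n
    }
    this
  }

  /** Copies the text out once, then returns every segment to the pool. */
  def result(): String = {
    val out = new Array[Char](fullChars + pos)
    var off = 0
    full.foreach { seg => System.arraycopy(seg, 0, out, off, seg.length); off += seg.length }
    System.arraycopy(current, 0, out, off, pos)
    pool ++= full; pool += current
    full.clear(); current = take(); pos = 0; fullChars = 0
    new String(out)
  }
}

object SegmentedBuffer {
  // One pool per thread, so recycling needs no locking.
  private val pools = new ThreadLocal[ArrayBuffer[Array[Char]]] {
    override protected def initialValue(): ArrayBuffer[Array[Char]] = ArrayBuffer.empty[Array[Char]]
  }
  def apply(segmentSize: Int = 1024): SegmentedBuffer = new SegmentedBuffer(segmentSize, pools.get())
}
```

A `java.lang.StringBuilder`, by contrast, grows by allocating a larger array and copying everything written so far.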
  28. WHY IS PERSIST SLOW?
     • Its AST uses raw Seq and Map values rather than values wrapped in custom classes.
     • So the unparser must pattern match on each value's type rather than dispatch to a virtual method on the node.
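The two dispatch styles can be compared side by side. In this sketch, `writeRaw` handles an unwrapped AST by testing each value's runtime type in turn. The hypothetical `Node` classes each implement `write`, so a single virtual call selects the right code. Only three node kinds are shown, and string escaping is omitted for brevity.

```scala
// Hypothetical Node classes (wrapped style) vs. a type-testing writer (unwrapped style).
sealed trait Node { def write(sb: StringBuilder): Unit }

final case class NStr(s: String) extends Node {
  def write(sb: StringBuilder): Unit = sb.append('"').append(s).append('"') // no escaping
}
final case class NNum(n: BigDecimal) extends Node {
  def write(sb: StringBuilder): Unit = sb.append(n.toString)
}
final case class NArr(items: List[Node]) extends Node {
  def write(sb: StringBuilder): Unit = {
    sb.append('[')
    items.zipWithIndex.foreach { case (x, i) => if (i > 0) sb.append(','); x.write(sb) }
    sb.append(']')
  }
}

object DispatchStyles {
  /** Same output, but each call re-tests the value's runtime type, case by case. */
  def writeRaw(v: Any, sb: StringBuilder): Unit = v match {
    case s: String     => sb.append('"').append(s).append('"')
    case n: BigDecimal => sb.append(n.toString)
    case xs: Seq[_] =>
      sb.append('[')
      xs.zipWithIndex.foreach { case (x, i) => if (i > 0) sb.append(','); writeRaw(x, sb) }
      sb.append(']')
    case other => throw new IllegalArgumentException(s"unsupported: $other")
  }
}
```

Both produce `[1,"a",["b"]]` for the same logical value. The type-test chain gets longer, and slower for the later cases, as more value kinds are supported.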
  29. MAPPERS
     • Parsers go from a string to an AST.
     • Mappers go to user-specified case classes.
     • Twitter Json, Scala Library, Json Smart, JAWN: no mapper.
     • Jackson, Argonaut, Rojoma: string => case classes.
     • Others: string => AST => case classes.
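The string => AST => case classes route can be sketched by hand for a small slice of the Twitter sample. `User`, `Tweet` and `TweetMapper` are hypothetical. The sketch assumes an unwrapped AST whose numbers are `BigDecimal`, as a mapper would receive from a parser.

```scala
// Hypothetical case classes and hand-written AST => case class mapping.
final case class User(screenName: String, followersCount: Int)
final case class Tweet(id: Long, text: String, user: User)

object TweetMapper {
  /** Looks up one field and converts it, or explains why it could not. */
  private def field[T](m: Map[String, Any], name: String)(
      convert: PartialFunction[Any, T]): Either[String, T] =
    m.get(name) match {
      case Some(v) if convert.isDefinedAt(v) => Right(convert(v))
      case Some(v) => Left(s"field '$name' has unexpected value $v")
      case None    => Left(s"missing field '$name'")
    }

  private def asObject(ast: Any): Either[String, Map[String, Any]] = ast match {
    case m: Map[_, _] => Right(m.asInstanceOf[Map[String, Any]]) // assumes String keys
    case other        => Left(s"expected a Json object, got $other")
  }

  def user(ast: Any): Either[String, User] = for {
    m         <- asObject(ast)
    name      <- field(m, "screen_name") { case s: String => s }
    followers <- field(m, "followers_count") { case n: BigDecimal => n.toInt }
  } yield User(name, followers)

  def tweet(ast: Any): Either[String, Tweet] = for {
    m    <- asObject(ast)
    id   <- field(m, "id") { case n: BigDecimal => n.toLong }
    text <- field(m, "text") { case s: String => s }
    u    <- field(m, "user") { case x => x }.flatMap(user)
  } yield Tweet(id, text, u)
}
```

This per-field conversion and checking is the code a mapper library writes or derives for you. The extra-code counts later in the deck measure how much of it each library still asks the user to write.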
  30. DYNAMIC VERSUS STATIC TYPING
     • Dynamic (AST): more flexible and agile. No additional code is needed for parsing, and it works on any valid Json data. But extra code is needed if more checking is required.
     • Static (user-specified case classes): the case classes must be defined before parsing can proceed. More checking. Behavior can be attached to the case classes.
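In the dynamic style, no case classes are needed: code walks the AST directly. Here is a minimal sketch using a hypothetical `lookup` helper. It follows field names and array indexes through an unwrapped AST and returns `None` when the path does not fit the data.

```scala
// Hypothetical helper for the dynamic style: walk an unwrapped AST by path.
object Dynamic {
  /** Follows `path` (field names and array indexes) through Map/Seq values. */
  def lookup(ast: Any, path: Any*): Option[Any] =
    path.foldLeft(Option(ast)) {
      case (Some(m: Map[_, _]), key: String) => m.asInstanceOf[Map[String, Any]].get(key)
      case (Some(xs: Seq[_]), i: Int) if i >= 0 && i < xs.length => Some(xs(i))
      case _ => None
    }
}
```

On the Twitter sample, `lookup(tweet, "entities", "urls", 0, "display_url")` would reach `"mercynotes.com"`. Checking that the result is actually a string is left to the caller, which is the "extra code" the slide mentions.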
  31. MAPPING TIMES (MS)
     Parser         Twitter   Google
     Persist Json       622     2238
     Rojoma            1117     2669
     Jackson            326     1150
     Spray Json         557     1675
     Lift Json          520     2060
     Play Json         1123     3768
     Argonaut           937     2550
  32. MAPPING TIMES - TWITTER [chart]
  33. MAPPING TIMES - GOOGLE [chart]
  34. MAPPERS
     Parser         Extra code lines   Why
     Persist Json                  0
     Rojoma                      135   case classes, Array=>Seq
     Jackson                       0
     Spray Json                   16   case classes
     Lift Json                     7   BigDecimal
     Play Json                    16   case classes
     Argonaut                    180   case classes, Array=>List, Seq=>List, BigDecimal=>Double
  35. AVOIDING EXTRA CODE
     • Finding the types of case class parameters: Java reflection works.
     • Finding the names of case class parameters: before Java 8 they are not available via Java reflection, but Scala reflection does work.
     • Reflection can be quite slow. Caching can help!
     • Persist: Shapeless.
     • Lift and Jackson: Paranamer, which gets the names by reading the symbol tables in Java byte code.
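On current Scala, the name problem has a lighter answer for the write direction. Assuming Scala 2.13 or later, every case class exposes its field names through `productElementNames`, so no reflection is needed. That method did not exist in the Scala 2.11 used for these benchmarks. `CaseClassAst` is a hypothetical helper. It only covers writing, since reading Json into a case class needs the names before any instance exists.

```scala
import scala.collection.immutable.ListMap

// Hypothetical helper; assumes Scala 2.13+ (productElementNames is not in 2.11).
object CaseClassAst {
  /** Field name -> value for a case class instance, in declaration order. */
  def toAst(p: Product): ListMap[String, Any] =
    ListMap.from(p.productElementNames.zip(p.productIterator))
}
```

Nested case classes are left as-is here. A fuller version would recurse into field values and pass the result to an unparser.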
  36. SUMMARY
     • Avoid: Scala Library, Twitter Json.
     • Fast parsing and no other features: Json Smart.
     • Good overall choices: Jackson, Persist, Spray.
     • Very fast unparsing: Jackson.
  37. QUESTIONS
  38. THANKS!
     To contact me or 47 Degrees:
     • Email: nestor@persist.com, hello@47deg.com
     • Twitter: @47deg
     • Web: 47deg.com
