Avro - More Than Just a Serialization Framework - CHUG - 20120416

View the accompanying video on vimeo: https://vimeo.com/40776630

Published in: Technology, Education

  1. Apache Avro: More Than Just A Serialization Framework
     http://avro.apache.org
     Jim Scott, Lead Engineer / Architect, Dotomi, a ValueClick Company
  2. Agenda
     • History / Overview
     • Serialization Framework
     • Supported Languages
     • Performance
     • Implementing Avro (Including Code Examples)
     • Avro with Maven
     • RPC (Including Code Examples)
     • Resources
     • Questions?
     Not to be distributed without prior consent. Confidential. Copyright © 2011, Dotomi a ValueClick Company. All rights reserved.
  3. History / Overview
  4. History / Overview
     Existing Serialization Frameworks
     • protobuf, thrift, avro, kryo, hessian, activemq-protobuf, scala, sbinary, google-gson, jackson/JSON, javolution, protostuff, woodstox, aalto, fast-infoset, xstream, java serialization, etc.
     Most popular frameworks
     • JAXB, Protocol Buffers, Thrift
     Avro
     • Created by Doug Cutting, the creator of Hadoop
     • Data is always accompanied by a schema:
       - Supports dynamic typing; code generation is not required
       - Supports schema evolution
       - The data is not tagged, resulting in smaller serialization size
  5. Serialization Framework
  6. Serialization Framework
     Avro Limitations
     • Map keys can only be Strings
     Avro Benefits
     • Interoperability
       - Can serialize into Avro/Binary or Avro/JSON
       - Supports reading and writing protobufs and thrift
     • Supports multiple languages
     • Rich data structures with a schema described via JSON
       - A compact, fast, binary data format
       - A container file to store persistent data (schema ALWAYS available)
       - Remote procedure call (RPC)
     • Simple integration with dynamic languages (via the generic type)
       - Unlike other frameworks, an unknown schema is supported at runtime
     • Compressible and splittable by Hadoop MapReduce
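To make "compact" and "not tagged" concrete, here is a minimal sketch of how the Avro binary encoding lays out a small record, written with the JDK only (no Avro dependency). It relies on two rules from the Avro specification: a long is written as a zig-zag variable-length integer, and a string is written as its byte length (a long) followed by its UTF-8 bytes. The class name `AvroBinarySketch` and the two-field record (a string, then a long) are hypothetical, chosen only for illustration; this is not how you would normally write Avro data, since the library's encoders do this for you.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the Avro library.
class AvroBinarySketch {

    /** Writes a long as a zig-zag varint: small magnitudes use few bytes. */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);            // zig-zag: 0,-1,1,-2 -> 0,1,2,3
        while ((z & ~0x7FL) != 0) {               // more than 7 bits remain
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);                       // final byte, continuation bit clear
    }

    /** Writes a string as its UTF-8 byte length followed by the UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /**
     * Encodes a hypothetical record with fields (string name, long count).
     * Field values are simply concatenated in schema order: no field names,
     * no tags. The reader needs the schema to know what comes next.
     */
    static byte[] encodeRecord(String name, long count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        writeLong(out, count);
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "foo" -> 06 66 6f 6f, 64 -> 80 01
        System.out.println(hex(encodeRecord("foo", 64))); // prints 06666f6f8001
    }
}
```

The whole record takes six bytes, and nothing in those bytes identifies the fields. That is why the schema must always travel with the data, as the container file and the RPC handshake ensure.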
  7. Supported Languages
     Implementation  Core  Data file  Codec            RPC
     C               yes   yes        deflate          yes
     C++             yes   yes        ?                yes
     C#              yes   no         n/a              no
     Java            yes   yes        deflate, snappy  yes
     Perl            yes   yes        deflate          no
     Python          yes   yes        deflate, snappy  yes
     Ruby            yes   yes        deflate          yes
     PHP             yes   yes        ?                no
     Core: Parse JSON schema, read / write binary schema
     Data file: Read / write Avro data files
     RPC: Over HTTP
     Source: https://cwiki.apache.org/confluence/display/AVRO/Supported+Languages
  8. Framework - Performance
     Comparison Metrics
     Time to Serialize / Deserialize
     • Avro is not the fastest, but is in the top half of all frameworks
     Object Creation
     • Avro falls to the bottom because it always uses UTF-8 for Strings. In normal use cases this is not a problem, because this test only compared object creation, not object reuse.
     Size of Serialized Objects (compressed with deflate, or uncompressed)
     • Avro is bested only by Kryo, by about 1 byte
     Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
  9. Framework - Performance
     Comparison Charts
     [Charts not reproduced in this transcript: "Size of serialized data" and "Total time to serialize data", with Avro highlighted]
     Source: http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
  10. Implementing Avro
  11. Framework - Types
      Generic
      • All Avro records are represented by a generic attribute/value data structure. This style is most useful for systems which dynamically process datasets based on user-provided scripts. For example, a program may be passed a data file whose schema it has not previously seen and be told to sort it by the field named "city".
      Specific
      • Each Avro record corresponds to a different kind of object in the programming language. For example, in Java, C and C++, a specific API would generate a distinct class or struct definition for each record definition. This style is used for programs written to process a specific schema. RPC systems typically use this.
      Reflect
      • Avro schemas are generated via reflection to correspond to existing programming language data structures. This may be useful when converting an existing codebase to use Avro with minimal modifications.
      Source: https://cwiki.apache.org/confluence/display/AVRO/Glossary
  12. Using Reflect Type
        // Classes come from org.apache.avro, org.apache.avro.file and org.apache.avro.reflect
        Class<SomeObject> type = SomeObject.class;
        Schema schema = ReflectData.AllowNull.get().getSchema(type);
        DataFileWriter<SomeObject> writer =
            new DataFileWriter<SomeObject>(new ReflectDatumWriter<SomeObject>(schema));
  13. Using Specific Type
        // Classes come from org.apache.avro, org.apache.avro.file and org.apache.avro.specific
        Class<SomeObject> type = SomeObject.class;
        Schema schema = SpecificData.get().getSchema(type);
        DataFileWriter<SomeObject> writer =
            new DataFileWriter<SomeObject>(new SpecificDatumWriter<SomeObject>(schema));
  14. Using the DataFileWriter
      Only one more thing to do, and that is to tell the writer where to write:
        writer.create(schema, outputStream);
      What if you want to append to an existing file instead of creating a new one?
        writer.appendTo(new File("Some File That exists"));
      Time to write:
        writer.append(object);   // object is of type T
      Close the writer when finished so that buffered data is flushed:
        writer.close();
  15. Don't Forget About Reading
      Build the schema and the datum reader with either the reflect or the specific classes:
        Class<SomeObject> type = SomeObject.class;
        // Reflect:
        Schema schema = ReflectData.AllowNull.get().getSchema(type);
        DatumReader<SomeObject> datumReader = new ReflectDatumReader<SomeObject>(schema);
        // Or specific:
        //   Schema schema = SpecificData.get().getSchema(type);
        //   DatumReader<SomeObject> datumReader = new SpecificDatumReader<SomeObject>(schema);
        DataFileStream<SomeObject> reader =
            new DataFileStream<SomeObject>(inputStream, datumReader);
        reader.iterator();
      Remember that compressed data? The reader decompresses it automatically!
  16. Defining a Specific Schema
      Create an Enum type: serverstate.avsc (name is arbitrary, extension is not)
        { "type":"enum",
          "namespace":"com.yourcompany.avro",
          "name":"ServerState",
          "symbols":[ "STARTING", "IDLE", "ACTIVE", "STOPPING", "STOPPED" ] }
      Create an Exception type: wrongstate.avsc
        { "type":"error",
          "namespace":"com.yourcompany.avro",
          "name":"WrongServerStateException",
          "fields":[ { "name":"message", "type":"string" } ] }
  17. Defining a Specific Schema
      Create a regular data object: historical.avsc
        { "type":"record",
          "namespace":"com.yourcompany.avro",
          "name":"NewHistoricalMessage",
          "aliases":[ "com.yourcompany.avro.datatypes.HistoricalMessage" ],
          "fields":[ { "name":"dataSource", "type":[ "null", "string" ] } ] }
      Aliases allow for schema evolution.
      All generated data objects are defined with simple JSON, and the documentation is very straightforward.
      Source: http://avro.apache.org/docs/current/spec.html
  18. Maven
  19. Avro With Maven
      Maven Plugins
      • This plugin assists with the Maven build lifecycle (may not be necessary in all use cases)
          <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>build-helper-maven-plugin</artifactId>
          </plugin>
      • Compiles *.avdl, *.avpr, *.avsc, and *.genavro (define the goals accordingly)
          <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
          </plugin>
      • Necessary for Avro to introspect generated RPC code (http://paranamer.codehaus.org/)
          <plugin>
            <groupId>com.thoughtworks.paranamer</groupId>
            <artifactId>paranamer-maven-plugin</artifactId>
          </plugin>
  20. RPC
  21. RPC
      How to utilize an Avro RPC Server
      • Define the protocol
      • Datatypes passed via RPC require use of specific types
      • Write an implementation of the interface generated by the protocol
      • Create and start an instance of an Avro RPC Server in Java
      • Create a client based on the interface generated by the protocol
  22. Define the Protocol
      • Create an AVDL file: historytracker.avdl (name is arbitrary, but the extension is not)
          @namespace("com.yourcompany.rpc")
          protocol HistoryTracker {
            import schema "historical.avsc";
            import schema "serverstate.avsc";
            import schema "wrongstate.avsc";
            void somethingHappened(com.yourcompany.avro.NewHistoricalMessage Item) oneway;
            /**
             * You can add comments
             */
            com.yourcompany.avro.ServerState getState()
                throws com.yourcompany.avro.WrongServerStateException;
          }
  23. Create an RPC Server
      Creating a server is fast and easy…
        InetSocketAddress address = new InetSocketAddress(hostname, port);
        // The responder needs an instance of the implementation, not the class name
        Responder responder =
            new SpecificResponder(HistoryTracker.class, new HistoryTrackerImpl());
        Server avroServer = new NettyServer(responder, address);
        avroServer.start();
      • HistoryTracker is the interface generated from the AVDL file
      • HistoryTrackerImpl is an implementation of HistoryTracker
      • There are other server implementations beyond Netty, e.g. HTTP
  24. Create an RPC Client
      Creating a client is easier than creating a server…
        InetSocketAddress address = new InetSocketAddress(hostname, port);
        Transceiver transceiver = new NettyTransceiver(address);
        // The client is typed as the generated interface
        HistoryTracker client =
            SpecificRequestor.getClient(HistoryTracker.class, transceiver);
      • HistoryTracker is the interface generated from the AVDL file
      • There are other transport implementations beyond Netty, e.g. HTTP
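It may help to see why a client can be created from nothing more than an interface class and a transport. To my understanding, `SpecificRequestor.getClient` returns a JDK dynamic proxy that implements the generated interface and turns each method call into an RPC request. The sketch below shows only that general technique using the JDK, with no Avro dependency. The nested `HistoryTracker` interface is a hand-written, simplified stand-in for the generated one, and the `sentCalls` list is a hypothetical substitute for a transceiver: it records method names instead of sending anything over the network.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the Avro library.
class ProxyClientSketch {

    /** Simplified stand-in for the interface generated from historytracker.avdl. */
    interface HistoryTracker {
        void somethingHappened(String item);
        String getState();
    }

    /**
     * Builds a client for any interface. Every call on the returned object is
     * routed to the handler, which is where a real requestor would serialize
     * the arguments, send them through the transceiver and decode the reply.
     */
    static <T> T getClient(Class<T> iface, List<String> sentCalls) {
        InvocationHandler handler = (proxy, method, args) -> {
            sentCalls.add(method.getName());  // stand-in for "send the request"
            // Canned response, for illustration only
            return method.getReturnType() == String.class ? "IDLE" : null;
        };
        Object client = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(client);
    }

    public static void main(String[] args) {
        List<String> sentCalls = new ArrayList<>();
        HistoryTracker client = getClient(HistoryTracker.class, sentCalls);
        client.somethingHappened("login");
        System.out.println(client.getState()); // prints IDLE
        System.out.println(sentCalls);         // prints [somethingHappened, getState]
    }
}
```

Because the proxy is typed as the interface, calling code looks like an ordinary local method call, which is what makes the slide's client snippet so short.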
  25. Resources
  26. Resources
      References
      • Apache Website and Wiki
        http://avro.apache.org
        https://cwiki.apache.org/confluence/display/AVRO/Index
      • Benchmarking Serialization Frameworks
        http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
      • An Introduction to Avro (Chris Cooper)
        http://files.meetup.com/1634302/CHUG-ApacheAvro.pdf
      Resources
      • Mailing List: user@avro.apache.org
  27. Thanks for Attending
      Questions?
      jscott@dotomi.com
