Cassandra / Kafka Support in EC2/AWS. Kafka Training, Kafka
Consulting
Avro
Apache Avro Data Serialization
Apache Avro
❖ Data serialization system
❖ Data structures
❖ Binary data format
❖ Container file format to store persistent data
❖ RPC capabilities
❖ Does not require code generation to use
Avro Schemas
❖ Supports schemas for defining data structures
❖ Serializing and deserializing data uses the schema
❖ File schema
❖ Avro files store data together with its schema
❖ RPC schema
❖ The RPC protocol exchanges schemas as part of the handshake
❖ Schemas written in JSON
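As a minimal sketch of how one JSON schema drives both serialization and deserialization, the Java class below (hypothetical name `SchemaRoundTrip`) encodes an Employee `GenericRecord` to Avro binary and decodes it back. It assumes the Apache Avro Java library (`org.apache.avro:avro`, 1.9 or later) is on the classpath and reuses the Employee schema from the speaker notes; the sample values are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaRoundTrip {
    // Employee schema from the speaker notes, written in JSON.
    static final String EMPLOYEE_SCHEMA =
        "{\"namespace\": \"com.cloudurable.phonebook\", \"type\": \"record\", \"name\": \"Employee\","
        + " \"fields\": ["
        + " {\"name\": \"firstName\", \"type\": \"string\"},"
        + " {\"name\": \"lastName\", \"type\": \"string\"},"
        + " {\"name\": \"age\", \"type\": \"int\"},"
        + " {\"name\": \"phoneNumber\", \"type\": \"string\"}]}";

    // Serializes a record to Avro binary using the schema.
    static byte[] serialize(Schema schema, GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush(); // the encoder buffers internally; flush before taking the bytes
        return out.toByteArray();
    }

    // Deserializes Avro binary back into a record; the same schema is required.
    static GenericRecord deserialize(Schema schema, byte[] bytes) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(EMPLOYEE_SCHEMA);
        GenericRecord bob = new GenericData.Record(schema);
        bob.put("firstName", "Bob");
        bob.put("lastName", "Smith");
        bob.put("age", 35);
        bob.put("phoneNumber", "555-1212");

        GenericRecord copy = deserialize(schema, serialize(schema, bob));
        // Strings come back as org.apache.avro.util.Utf8, so convert with toString().
        System.out.println(copy.get("firstName").toString() + " is " + copy.get("age")); // prints "Bob is 35"
    }
}
```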
Avro compared to…
❖ Similar to Thrift, Protocol Buffers, JSON, etc.
❖ Does not require code generation
❖ Avro needs less encoding in the data itself, since field names and types are stored in the schema
❖ Supports schema evolution
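Both points can be illustrated with a small sketch, assuming the Apache Avro Java library is on the classpath. The two Employee schema versions and the `department` field are hypothetical. Data written with version 1 contains only values (no field names), and reading it with version 2 lets Avro fill the new field from its default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaEvolution {
    // Writer schema (version 1): the schema the data was originally written with.
    static final String V1 = "{\"type\": \"record\", \"name\": \"Employee\", \"fields\": ["
        + " {\"name\": \"firstName\", \"type\": \"string\"},"
        + " {\"name\": \"lastName\", \"type\": \"string\"}]}";

    // Reader schema (version 2): adds a hypothetical "department" field with a default,
    // so data written with version 1 can still be read.
    static final String V2 = "{\"type\": \"record\", \"name\": \"Employee\", \"fields\": ["
        + " {\"name\": \"firstName\", \"type\": \"string\"},"
        + " {\"name\": \"lastName\", \"type\": \"string\"},"
        + " {\"name\": \"department\", \"type\": \"string\", \"default\": \"unassigned\"}]}";

    // Encodes a record with the V1 schema; only the field values are written.
    static byte[] writeV1(String firstName, String lastName) throws IOException {
        Schema v1 = new Schema.Parser().parse(V1);
        GenericRecord record = new GenericData.Record(v1);
        record.put("firstName", firstName);
        record.put("lastName", lastName);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(v1).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decodes V1 bytes into the V2 shape: Avro resolves the writer schema against
    // the reader schema and fills fields missing from the data with their defaults.
    static GenericRecord readAsV2(byte[] bytes) throws IOException {
        Schema writerSchema = new Schema.Parser().parse(V1);
        Schema readerSchema = new Schema.Parser().parse(V2);
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        return reader.read(null, DecoderFactory.get().binaryDecoder(bytes, null));
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeV1("Bob", "Smith");
        // Each string is a varint length followed by its UTF-8 bytes: 1 + 3 + 1 + 5 = 10.
        System.out.println("encoded size: " + bytes.length); // prints "encoded size: 10"
        GenericRecord upgraded = readAsV2(bytes);
        System.out.println(upgraded.get("department")); // prints "unassigned"
    }
}
```

Because no field names appear in the encoded bytes, a reader must know the writer's schema. Avro files carry it in the file header; in Kafka it is commonly looked up from a schema registry.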
Avro Schema
Avro schema files are stored in src/main/avro by default.
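A schema file is plain JSON. The following minimal sketch, assuming the Avro Java library is on the classpath, parses the Employee schema from the speaker notes and lists its fields. The class name and the `employee.avsc` file name mentioned in the comment are hypothetical.

```java
import org.apache.avro.Schema;

public class InspectSchema {
    // The Employee schema from the speaker notes. In a Gradle or Maven project this JSON
    // would typically live in a file under src/main/avro (for example a hypothetical
    // employee.avsc), which Schema.Parser can also read via parse(File).
    static final String EMPLOYEE_SCHEMA =
        "{\"namespace\": \"com.cloudurable.phonebook\", \"type\": \"record\", \"name\": \"Employee\","
        + " \"fields\": ["
        + " {\"name\": \"firstName\", \"type\": \"string\"},"
        + " {\"name\": \"lastName\", \"type\": \"string\"},"
        + " {\"name\": \"age\", \"type\": \"int\"},"
        + " {\"name\": \"phoneNumber\", \"type\": \"string\"}]}";

    static Schema parse() {
        return new Schema.Parser().parse(EMPLOYEE_SCHEMA);
    }

    public static void main(String[] args) {
        Schema schema = parse();
        System.out.println(schema.getFullName()); // prints "com.cloudurable.phonebook.Employee"
        for (Schema.Field field : schema.getFields()) {
            // Prints each field name together with its Avro type.
            System.out.println(field.name() + " -> " + field.schema().getType());
        }
    }
}
```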
Code Generation
Employee Code Generation
Using Generated Avro class
Writing Employees to an Avro File
Reading Employees from a File
Using GenericRecord
Writing Generic Records
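A minimal sketch of writing generic records to an Avro container file, assuming the Avro Java library is on the classpath. The class name, the `employees.avro` output path, and the two sample employees are hypothetical; `GenericDatumWriter` plays the role that `SpecificDatumWriter` plays for generated classes.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class WriteGenericRecords {
    // Employee schema from the speaker notes.
    static final String EMPLOYEE_SCHEMA =
        "{\"namespace\": \"com.cloudurable.phonebook\", \"type\": \"record\", \"name\": \"Employee\","
        + " \"fields\": ["
        + " {\"name\": \"firstName\", \"type\": \"string\"},"
        + " {\"name\": \"lastName\", \"type\": \"string\"},"
        + " {\"name\": \"age\", \"type\": \"int\"},"
        + " {\"name\": \"phoneNumber\", \"type\": \"string\"}]}";

    // Builds one Employee as a GenericRecord; no generated class is needed.
    static GenericRecord employee(Schema schema, String first, String last, int age, String phone) {
        GenericRecord record = new GenericData.Record(schema);
        record.put("firstName", first);
        record.put("lastName", last);
        record.put("age", age);
        record.put("phoneNumber", phone);
        return record;
    }

    // Writes sample employees to an Avro container file; create() stores the schema in the header.
    static void writeEmployees(Schema schema, File file) throws IOException {
        GenericDatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        try (DataFileWriter<GenericRecord> fileWriter = new DataFileWriter<>(datumWriter)) {
            fileWriter.create(schema, file);
            // Hypothetical sample data for illustration only.
            fileWriter.append(employee(schema, "Bob", "Smith", 35, "555-1212"));
            fileWriter.append(employee(schema, "Sara", "Jones", 42, "555-1313"));
        }
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(EMPLOYEE_SCHEMA);
        writeEmployees(schema, new File("employees.avro"));
    }
}
```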
Reading Using Generic Records
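Reading generic records back is the mirror image. This sketch assumes the Avro Java library and an Avro file written with the Employee schema; the class name, method name, and default `employees.avro` path are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadGenericRecords {
    // Reads every record from an Avro container file and returns "firstName lastName".
    // No schema is supplied: DataFileReader takes the writer schema from the file header.
    static List<String> readNames(File file) throws IOException {
        List<String> names = new ArrayList<>();
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
        try (DataFileReader<GenericRecord> fileReader = new DataFileReader<>(file, datumReader)) {
            GenericRecord record = null;
            while (fileReader.hasNext()) {
                // Passing the previous record lets Avro reuse the object instead of allocating.
                record = fileReader.next(record);
                names.add(record.get("firstName") + " " + record.get("lastName"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumes an Avro file written with the Employee schema exists at this path.
        String path = args.length > 0 ? args[0] : "employees.avro";
        readNames(new File(path)).forEach(System.out::println);
    }
}
```

Since `DataFileReader` is `Iterable`, a plain `for (GenericRecord r : fileReader)` loop also works; the `next(reuse)` form just avoids allocating a new record per read.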
Avro Schema Validation
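To see validation in action, here is a sketch assuming the Avro Java library: `GenericData.validate` checks a record's values against its schema, and `GenericRecordBuilder.build()` rejects a record whose required fields (those without defaults) were never set. The class name and sample values are hypothetical.

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class ValidateRecords {
    // Employee schema from the speaker notes.
    static final String EMPLOYEE_SCHEMA =
        "{\"namespace\": \"com.cloudurable.phonebook\", \"type\": \"record\", \"name\": \"Employee\","
        + " \"fields\": ["
        + " {\"name\": \"firstName\", \"type\": \"string\"},"
        + " {\"name\": \"lastName\", \"type\": \"string\"},"
        + " {\"name\": \"age\", \"type\": \"int\"},"
        + " {\"name\": \"phoneNumber\", \"type\": \"string\"}]}";

    // True if every field value matches the type the schema declares.
    static boolean isValid(Schema schema, GenericRecord record) {
        return GenericData.get().validate(schema, record);
    }

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(EMPLOYEE_SCHEMA);

        GenericRecord record = new GenericData.Record(schema);
        record.put("firstName", "Bob");
        record.put("lastName", "Smith");
        record.put("age", "thirty-five"); // wrong type: the schema declares age as int
        record.put("phoneNumber", "555-1212");
        System.out.println(isValid(schema, record)); // prints "false"

        record.put("age", 35);
        System.out.println(isValid(schema, record)); // prints "true"

        // Only firstName is set; lastName, age and phoneNumber have no defaults.
        try {
            new GenericRecordBuilder(schema).set("firstName", "Bob").build();
        } catch (AvroRuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```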
Avro supported types
❖ Records
❖ Arrays
❖ Enums
❖ Unions
❖ Maps
❖ Primitive types such as string, int, and boolean, plus logical types such as decimal, timestamp, and date
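Most of these types can also be declared programmatically with Avro's `SchemaBuilder` fluent API. The sketch below assumes the Avro Java library is on the classpath; the `Employee` fields, the `Role` enum, and the `com.example.hr` namespace are hypothetical, chosen only to touch each type once.

```java
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class SupportedTypes {
    // Builds a record schema that uses several Avro types; all names are hypothetical.
    static Schema buildSchema() {
        // Logical types annotate an underlying primitive type.
        Schema hireDate = LogicalTypes.date().addToSchema(Schema.create(Schema.Type.INT));
        Schema salary = LogicalTypes.decimal(10, 2).addToSchema(Schema.create(Schema.Type.BYTES));
        return SchemaBuilder.record("Employee").namespace("com.example.hr")
            .fields()
            .requiredString("firstName")                                          // string
            .requiredInt("age")                                                   // int
            .requiredBoolean("active")                                            // boolean
            .name("role").type().enumeration("Role")
                .symbols("ENGINEER", "MANAGER").noDefault()                       // enum
            .name("phoneNumbers").type().array().items().stringType().noDefault() // array
            .name("attributes").type().map().values().stringType().noDefault()    // map
            .optionalString("nickname")                          // union of null and string
            .name("hireDate").type(hireDate).noDefault()         // logical type: date
            .name("salary").type(salary).noDefault()             // logical type: decimal
            .endRecord();
    }

    public static void main(String[] args) {
        // Prints the equivalent JSON schema, pretty-printed.
        System.out.println(buildSchema().toString(true));
    }
}
```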
Fuller example Avro Schema
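The following is a hypothetical stand-in, not the author's schema, showing default values, an array, an enum, and a record nested inside a record. It assumes the Avro Java library is on the classpath; `GenericRecordBuilder` fills any field you do not set from the schema's default. The class name and sample values are made up.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class FullerSchema {
    // Hypothetical schema: defaults, an array, an enum, and a nested Address record.
    static final String SCHEMA_JSON = "{\"namespace\": \"com.cloudurable.phonebook\","
        + " \"type\": \"record\", \"name\": \"Employee\", \"fields\": ["
        + " {\"name\": \"firstName\", \"type\": \"string\"},"
        + " {\"name\": \"age\", \"type\": \"int\", \"default\": -1},"
        + " {\"name\": \"phoneNumbers\", \"type\": {\"type\": \"array\", \"items\": \"string\"}, \"default\": []},"
        + " {\"name\": \"status\", \"type\": {\"type\": \"enum\", \"name\": \"Status\","
        + "   \"symbols\": [\"ACTIVE\", \"RETIRED\"]}, \"default\": \"ACTIVE\"},"
        + " {\"name\": \"address\", \"type\": {\"type\": \"record\", \"name\": \"Address\", \"fields\": ["
        + "   {\"name\": \"city\", \"type\": \"string\"},"
        + "   {\"name\": \"zip\", \"type\": \"string\"}]}}"
        + "]}";

    static GenericRecord buildEmployee(Schema schema) {
        // The nested record has its own schema, taken from the parent field.
        Schema addressSchema = schema.getField("address").schema();
        GenericRecord address = new GenericRecordBuilder(addressSchema)
            .set("city", "Springfield") // hypothetical sample values
            .set("zip", "00000")
            .build();
        // Only firstName and address are set; age, phoneNumbers and status take their defaults.
        return new GenericRecordBuilder(schema)
            .set("firstName", "Bob")
            .set("address", address)
            .build();
    }

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord bob = buildEmployee(schema);
        System.out.println(bob.get("age"));    // prints "-1"
        System.out.println(bob.get("status")); // prints "ACTIVE"
        System.out.println(bob);
    }
}
```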
Avro
❖ Fast data serialization
❖ Supports data structures
❖ Supports Records, Maps, Arrays, and basic types
❖ You can use it directly (with generic records) or with code generation
❖ Read more
❖ Kafka Training
❖ Kafka Consulting

Avro Tutorial - Records with Schema for Kafka and Hadoop


Speaker Notes

  1. Apache Avro™ is a data serialization system. Avro provides data structures, a binary data format, a container file format for storing persistent data, and RPC capabilities. Avro does not require code generation to use. It integrates well with JavaScript, Python, Ruby, and Java.
  2. The Avro data format is defined by Avro schemas. Data is serialized according to a schema, and the same schema is used when deserializing. Because Avro files store the data together with its schema, Avro data plus schema is fully self-describing. Avro RPC is also schema-based: when Avro is used for RPC, the client and server exchange schemas in the connection handshake. Avro schemas are written in JSON.
  3. Avro is similar to Thrift, Protocol Buffers, JSON, etc., but it does not require code generation. Because field names and types are stored in the schema rather than in each record, Avro needs less encoding in the data itself. Avro also supports schema evolution.
  4. Example schema:
     {"namespace": "com.cloudurable.phonebook",
      "type": "record",
      "name": "Employee",
      "fields": [
        {"name": "firstName", "type": "string"},
        {"name": "lastName", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "phoneNumber", "type": "string"}
      ]
     }
     An Avro schema is just JSON.
  5. There are plugins for Maven and Gradle that generate code from Avro schemas. The gradle-avro-plugin is a Gradle plugin that uses Avro tools to generate Java code for Apache Avro. It supports Avro schema files (avsc) and Avro RPC IDL files (avdl); for Kafka, you only need avsc files. Notice that we did not generate setter methods, which makes the instances somewhat immutable.
  6. The plugin generates the files and puts them under build/generated-main-avro-java.
  7. The generated Employee class has a constructor and a builder.
  8. The slide above shows how to serialize a list of Employee objects to disk. In Kafka, we will not write to disk directly; we show this so you have a way to test Avro serialization, which helps when debugging schema incompatibilities. Note that we create a DatumWriter, which converts a Java instance into an in-memory serialized format. SpecificDatumWriter is used with generated classes like Employee. DataFileWriter writes the serialized records to the employee.avro file.
  9. The slide above deserializes employees from the employees.avro file. Deserializing is similar to serializing, but in reverse. We create a SpecificDatumReader to convert in-memory serialized items into instances of our generated Employee class. The DataFileReader reads records from the file by calling next(). Another way to read is to use forEach, as follows: final DataFileReader<Employee> dataFileReader = new DataFileReader<>(file, empReader); dataFileReader.forEach(employeeList::add);
  10. You can use a generic record instead of using generated code.
  11. You can write to Avro files using Generic records as well.
  12. You can read from Avro files using generic records as well.
  13. Avro will validate the data types when it serializes and deserializes the data.
  14. The document https://avro.apache.org/docs/current/spec.html#Protocol+Declaration describes all of the supported types.
  15. The example above shows default values, arrays, primitive types, records within records, enums, and more.