Hasura 2.0 is our biggest release since we launched.
This webinar goes over the new capabilities in our v2 release:
- Connect to multiple databases
- Generate REST APIs
- Enhanced Authorisation
- Hasura Cloud & AWS VPC Peering
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...DataWorks Summit
Google Cloud Dataflow is a fully managed service that allows users to build batch or streaming parallel data processing pipelines. It provides a unified programming model for batch and streaming workflows. Cloud Dataflow handles resource management and optimization to efficiently execute data processing jobs on Google Cloud Platform.
Grafana is an open source analytics and monitoring tool that allows users to visualize time series data from various databases in customizable dashboards. It supports advanced visualizations, alerting features, and reporting. Grafana works with time series databases like InfluxDB to collect, analyze, and visualize metrics data. Users can build dashboards to monitor servers and get alert notifications. Grafana is widely used across different domains for its flexibility and rich feature set.
This document discusses Redis, MongoDB, and Amazon DynamoDB. It begins with an overview of NoSQL databases and the differences between SQL and NoSQL databases. It then covers Redis data types like strings, hashes, lists, sets, sorted sets, and streams. Examples use cases for Redis are also provided like leaderboards, geospatial queries, and message queues. The document also discusses MongoDB design patterns like embedding data, embracing duplication, and relationships. Finally, it provides a high-level overview of DynamoDB concepts like tables, items, attributes, and primary keys.
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. With Amazon RDS, you can MySQL in minutes with cost-efficient and re-sizable hardware capacity. In this webinar, we'll discuss how to get the most out of the service, including techniques for migrating data in and out.
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
Apache Hudi is an open data lake platform, designed around the streaming data model. At its core, Hudi provides a transactions, upserts, deletes on data lake storage, while also enabling CDC capabilities. Hudi also provides a coherent set of table services, which can clean, compact, cluster and optimize storage layout for better query performance. Finally, Hudi's data services provide out-of-box support for streaming data from event systems into lake storage in near real-time.
In this talk, we will walk through an end-end use case for change data capture from a relational database, starting with capture changes using the Pulsar CDC connector and then demonstrate how you can use the Hudi deltastreamer tool to then apply these changes into a table on the data lake. We will discuss various tips to operationalizing and monitoring such pipelines. We will conclude with some guidance on future integrations between the two projects including a native Hudi/Pulsar connector and Hudi tiered storage.
By Tom Wilkie, delivered at London Microservices User Group on 2/12/15
The rise of microservice-based applications has had many knock-on effects, not least on the complexity of monitoring your application. Order-of-magnitude increase in the number of moving parts and rate of change of the application require us to reassess traditional monitoring techniques.
In this talk we will discuss some different approaches to monitoring, visualising and tracing containerised, microservices-based applications. We’ll present different techniques to some of the emergent problems, and try not to rant too much.
Hasura 2.0 is our biggest release since we launched.
This webinar goes over the new capabilities in our v2 release:
- Connect to multiple databases
- Generate REST APIs
- Enhanced Authorisation
- Hasura Cloud & AWS VPC Peering
Unify Stream and Batch Processing using Dataflow, a Portable Programmable Mod...DataWorks Summit
Google Cloud Dataflow is a fully managed service that allows users to build batch or streaming parallel data processing pipelines. It provides a unified programming model for batch and streaming workflows. Cloud Dataflow handles resource management and optimization to efficiently execute data processing jobs on Google Cloud Platform.
Grafana is an open source analytics and monitoring tool that allows users to visualize time series data from various databases in customizable dashboards. It supports advanced visualizations, alerting features, and reporting. Grafana works with time series databases like InfluxDB to collect, analyze, and visualize metrics data. Users can build dashboards to monitor servers and get alert notifications. Grafana is widely used across different domains for its flexibility and rich feature set.
This document discusses Redis, MongoDB, and Amazon DynamoDB. It begins with an overview of NoSQL databases and the differences between SQL and NoSQL databases. It then covers Redis data types like strings, hashes, lists, sets, sorted sets, and streams. Examples use cases for Redis are also provided like leaderboards, geospatial queries, and message queues. The document also discusses MongoDB design patterns like embedding data, embracing duplication, and relationships. Finally, it provides a high-level overview of DynamoDB concepts like tables, items, attributes, and primary keys.
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. With Amazon RDS, you can MySQL in minutes with cost-efficient and re-sizable hardware capacity. In this webinar, we'll discuss how to get the most out of the service, including techniques for migrating data in and out.
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
Apache Hudi is an open data lake platform, designed around the streaming data model. At its core, Hudi provides a transactions, upserts, deletes on data lake storage, while also enabling CDC capabilities. Hudi also provides a coherent set of table services, which can clean, compact, cluster and optimize storage layout for better query performance. Finally, Hudi's data services provide out-of-box support for streaming data from event systems into lake storage in near real-time.
In this talk, we will walk through an end-end use case for change data capture from a relational database, starting with capture changes using the Pulsar CDC connector and then demonstrate how you can use the Hudi deltastreamer tool to then apply these changes into a table on the data lake. We will discuss various tips to operationalizing and monitoring such pipelines. We will conclude with some guidance on future integrations between the two projects including a native Hudi/Pulsar connector and Hudi tiered storage.
By Tom Wilkie, delivered at London Microservices User Group on 2/12/15
The rise of microservice-based applications has had many knock-on effects, not least on the complexity of monitoring your application. Order-of-magnitude increase in the number of moving parts and rate of change of the application require us to reassess traditional monitoring techniques.
In this talk we will discuss some different approaches to monitoring, visualising and tracing containerised, microservices-based applications. We’ll present different techniques to some of the emergent problems, and try not to rant too much.
Event driven workloads on Kubernetes with KEDANilesh Gule
Slide deck of the presentation done at the Pune User Group on 27th February 2021. Demonstrate how Kubernetes based event driven autoscaling (KEDA) can be used with RabbitMQ as the event source.
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
Lakehouses are quickly growing in popularity as a new approach to Data Platform Architecture bringing some of the long-established benefits from OLTP world to OLAP, including transactions, record-level updates/deletes, and changes streaming. In this talk, we will discuss Apache Hudi and how it unlocks possibilities of building your own fully open-source Lakehouse featuring a rich set of integrations with existing technologies, including Apache Pulsar. In this session, we will present: - What Lakehouses are, and why they are needed. - What Apache Hudi is and how it works. - Provide a use-case and demo that applies Apache Hudi’s DeltaStreamer tool to ingest data from Apache Pulsar.
Big data real time architectures -
How do to big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What Advantages / Pitfalls they contain.
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
What is a Microservices architecture and how does it differ from a Service-Oriented Architecture? Should you use traditional REST APIs to bind services together? Or is it better to use a richer, more loosely-coupled protocol? This talk will start with quick recap of how we created systems over the past 20 years and how different architectures evolved from it. The talk will show how we piece services together in event driven systems, how we use a distributed log (event hub) to create a central, persistent history of events and what benefits we achieve from doing so.
Apache Kafka is a perfect match for building such an asynchronous, loosely-coupled event-driven backbone. Events trigger processing logic, which can be implemented in a more traditional as well as in a stream processing fashion. The talk will show the difference between a request-driven and event-driven communication and show when to use which. It highlights how the modern stream processing systems can be used to hold state both internally as well as in a database and how this state can be used to further increase independence of other services, the primary goal of a Microservices architecture.
The document compares the query execution plans produced by Apache Hive and PostgreSQL. It shows that Hive's old-style execution plans are overly verbose and difficult to understand, providing many low-level details across multiple stages. In contrast, PostgreSQL's plans are more concise and readable, showing the logical query plan in a top-down manner with actual table names and fewer lines of text. The document advocates for Hive to adopt a simpler execution plan format similar to PostgreSQL's.
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...HostedbyConfluent
Managing Apache Kafka sometimes could be cumbersome, and that's something that we would like to avoid, especially for developers and data engineers that need to build and develop data pipelines.
Luckily, Kubernetes and Kafka's combination helps us reduce everyday tasks tremendously by adding myriad capabilities to lessen the complexity of managing clusters.
Kafka Connect and KSQLDB are a fantastic combo to add to your streaming stack. These two soldiers can facilitate data acquisition and processing and also provide outstanding real-time ETL capabilities. But what if you need an OLAP datastore to answer complex queries with a low-latency response, that's where Apache Pinot comes to play.
At this session, you're going to learn:
- Effective Kafka deployment on Kubernetes
- How to properly configure Kafka Connect and KSQLDB
- Integrate Apache Pinot to answer OLAP queries
Apache Kafka is an open-source distributed event streaming platform used for building real-time data pipelines and streaming apps. It was developed by LinkedIn in 2011 to solve problems with data integration and processing. Kafka uses a publish-subscribe messaging model and is designed to be fast, scalable, and durable. It allows both streaming and storage of data and acts as a central data backbone for large organizations.
This document provides an overview of Apache Airflow, an open-source workflow management system. It describes Airflow's key features like workflow definition using directed acyclic graphs (DAGs), rich UI, scheduler, operators for tasks like databases and web services, and use of Jinja templating. The document also discusses Airflow's architecture with parallel execution, UI, command line operations like backfilling, and security features. Airflow is used by over 200 companies for workflows like ETL, analytics, and machine learning pipelines.
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
Apache Hudi is a data lake platform, that provides streaming primitives (upserts/deletes/change streams) on top of data lake storage. Hudi powers very large data lakes at Uber, Robinhood and other companies, while being pre-installed on four major cloud platforms.
Hudi supports exactly-once, near real-time data ingestion from Apache Kafka to cloud storage, which is typically used in-place of a S3/HDFS sink connector to gain transactions and mutability. While this approach is scalable and battle-tested, it can only ingest data in mini batches, leading to lower data freshness. In this talk, we introduce a Kafka Connect Sink Connector for Apache Hudi, which writes data straight into Hudi's log format, making the data immediately queryable, while Hudi's table services like indexing, compaction, clustering work behind the scenes, to further re-organize for better query performance.
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
What is a Microservices architecture and how does it differ from a Service-Oriented Architecture? Should you use traditional REST APIs to bind services together? Or is it better to use a richer, more loosely-coupled protocol? This talk will start with quick recap of how we created systems over the past 20 years and how different architectures evolved from it. The talk will show how we piece services together in event driven systems, how we use a distributed log (event hub) to create a central, persistent history of events and what benefits we achieve from doing so.
Apache Kafka is a perfect match for building such an asynchronous, loosely-coupled event-driven backbone. Events trigger processing logic, which can be implemented in a more traditional as well as in a stream processing fashion. The talk will show the difference between a request-driven and event-driven communication and show when to use which. It highlights how the modern stream processing systems can be used to hold state both internally as well as in a database and how this state can be used to further increase independence of other services, the primary goal of a Microservices architecture.
Real-time event processing monitors the incoming data stream and initiates action based on detected events like fraud, error or performance degradation. These events are often used to issue alerts and notifications, take responsive action, or to populate a monitoring dashboard. In this session, we will walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS services to be covered include: AWS Lambda and the Kinesis Client Library (KCL).
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help boost feelings of calmness, happiness and focus.
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...confluent
(Guido Schmutz, Trivadis) Kafka Summit SF 2018
Internet of Things use cases are a perfect match for processing with a streaming platform such as Kafka and the Confluent Platform. Some of the questions to be answered are: How do we feed the data from our devices into Kafka? Do we directly send data to Kafka? Is Kafka accessible from outside the organization over the internet? What if we want to use a more specific IoT protocol such as MQTT or CoAP in between? How would we integrate it with Kafka? How can we enrich IoT streaming data with static data sitting in a traditional system?
This session will provide answers to these and other questions using a fictitious use case of a trucking company. Trucks are constantly sending data about position and driving habits, which can be used to derive real-time information and actions. A large part of the presentation will be a live demo. The demo will show the implementation of the pipeline incrementally: starting with sending the truck movement events directly to Kafka, then adding MQTT to the sensor data ingestion, followed by using Kafka Streams and KSQL to apply stream processing on the information received. The final pipeline will demonstrate the application of Kafka Connect with MQTT and JDBC source connectors for data ingestion and event stream enrichment, and Kafka Streams and KSQL for stream processing. The key takeaway is the live demonstration of a working end-to-end IoT streaming data ingestion pipeline using Kafka technologies.
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder.
Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Samza) and in-house technologies have helped Uber scale.
This document discusses autoscaling in Kubernetes. It describes horizontal and vertical autoscaling, and how Kubernetes can autoscale nodes and pods. For nodes, it proposes using Google Compute Engine's managed instance groups and cloud autoscaler to automatically scale the number of nodes based on resource utilization. For pods, it discusses using an autoscaler controller to scale the replica counts of replication controllers based on metrics from cAdvisor or Google Cloud Monitoring. Issues addressed include rebalancing pods and handling autoscaling during rolling updates.
Shaping serverless architecture with domain driven design patternsAsher Sterkin
This document discusses using Domain-Driven Design (DDD) patterns to structure serverless applications. It introduces DDD concepts like bounded contexts, aggregates, repositories, and CQRS. Bounded contexts separate domains into cohesive models that are loosely coupled. Aggregates define transactional boundaries and ensure data integrity. Repositories provide storage and retrieval of aggregates. CQRS separates commands and queries using different data models. Applying these DDD patterns can help organize serverless applications as they grow in complexity.
It covers a brief introduction to Apache Kafka Connect, giving insights about its benefits,use cases, motivation behind building Kafka Connect.And also a short discussion on its architecture.
This document provides an overview of patterns for scalability, availability, and stability in distributed systems. It discusses general recommendations like immutability and referential transparency. It covers scalability trade-offs around performance vs scalability, latency vs throughput, and availability vs consistency. It then describes various patterns for scalability including managing state through partitioning, caching, sharding databases, and using distributed caching. It also covers patterns for managing behavior through event-driven architecture, compute grids, load balancing, and parallel computing. Availability patterns like fail-over, replication, and fault tolerance are discussed. The document provides examples of popular technologies that implement many of these patterns.
About VisualDNA Architecture @ Rubyslava 2014Michal Harish
Michal Hariš provides an overview of the evolution of VisualDNA's data architecture over the past 3 years. Originally, 10 people managed a single MySQL table holding 50M user profiles. They transitioned to using Cassandra and Hadoop to address scalability issues. Currently, they have a 120 person team using a lambda architecture with Java, Scala, Hadoop, Cassandra, Kafka, Redis, R and AngularJS. Real-time processing of 8.5k events/second is done alongside batch pipelines and machine learning. They have learned lessons around system design, testing, and remote collaboration while addressing challenges such as globally distributed APIs and bottlenecks in their data pipeline.
Event driven workloads on Kubernetes with KEDANilesh Gule
Slide deck of the presentation done at the Pune User Group on 27th February 2021. Demonstrate how Kubernetes based event driven autoscaling (KEDA) can be used with RabbitMQ as the event source.
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
Lakehouses are quickly growing in popularity as a new approach to Data Platform Architecture bringing some of the long-established benefits from OLTP world to OLAP, including transactions, record-level updates/deletes, and changes streaming. In this talk, we will discuss Apache Hudi and how it unlocks possibilities of building your own fully open-source Lakehouse featuring a rich set of integrations with existing technologies, including Apache Pulsar. In this session, we will present: - What Lakehouses are, and why they are needed. - What Apache Hudi is and how it works. - Provide a use-case and demo that applies Apache Hudi’s DeltaStreamer tool to ingest data from Apache Pulsar.
Big data real time architectures -
How do to big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What Advantages / Pitfalls they contain.
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
What is a Microservices architecture and how does it differ from a Service-Oriented Architecture? Should you use traditional REST APIs to bind services together? Or is it better to use a richer, more loosely-coupled protocol? This talk will start with quick recap of how we created systems over the past 20 years and how different architectures evolved from it. The talk will show how we piece services together in event driven systems, how we use a distributed log (event hub) to create a central, persistent history of events and what benefits we achieve from doing so.
Apache Kafka is a perfect match for building such an asynchronous, loosely-coupled event-driven backbone. Events trigger processing logic, which can be implemented in a more traditional as well as in a stream processing fashion. The talk will show the difference between a request-driven and event-driven communication and show when to use which. It highlights how the modern stream processing systems can be used to hold state both internally as well as in a database and how this state can be used to further increase independence of other services, the primary goal of a Microservices architecture.
The document compares the query execution plans produced by Apache Hive and PostgreSQL. It shows that Hive's old-style execution plans are overly verbose and difficult to understand, providing many low-level details across multiple stages. In contrast, PostgreSQL's plans are more concise and readable, showing the logical query plan in a top-down manner with actual table names and fewer lines of text. The document advocates for Hive to adopt a simpler execution plan format similar to PostgreSQL's.
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...HostedbyConfluent
Managing Apache Kafka sometimes could be cumbersome, and that's something that we would like to avoid, especially for developers and data engineers that need to build and develop data pipelines.
Luckily, Kubernetes and Kafka's combination helps us reduce everyday tasks tremendously by adding myriad capabilities to lessen the complexity of managing clusters.
Kafka Connect and KSQLDB are a fantastic combo to add to your streaming stack. These two soldiers can facilitate data acquisition and processing and also provide outstanding real-time ETL capabilities. But what if you need an OLAP datastore to answer complex queries with a low-latency response, that's where Apache Pinot comes to play.
At this session, you're going to learn:
- Effective Kafka deployment on Kubernetes
- How to properly configure Kafka Connect and KSQLDB
- Integrate Apache Pinot to answer OLAP queries
Apache Kafka is an open-source distributed event streaming platform used for building real-time data pipelines and streaming apps. It was developed by LinkedIn in 2011 to solve problems with data integration and processing. Kafka uses a publish-subscribe messaging model and is designed to be fast, scalable, and durable. It allows both streaming and storage of data and acts as a central data backbone for large organizations.
This document provides an overview of Apache Airflow, an open-source workflow management system. It describes Airflow's key features like workflow definition using directed acyclic graphs (DAGs), rich UI, scheduler, operators for tasks like databases and web services, and use of Jinja templating. The document also discusses Airflow's architecture with parallel execution, UI, command line operations like backfilling, and security features. Airflow is used by over 200 companies for workflows like ETL, analytics, and machine learning pipelines.
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
Apache Hudi is a data lake platform, that provides streaming primitives (upserts/deletes/change streams) on top of data lake storage. Hudi powers very large data lakes at Uber, Robinhood and other companies, while being pre-installed on four major cloud platforms.
Hudi supports exactly-once, near real-time data ingestion from Apache Kafka to cloud storage, which is typically used in-place of a S3/HDFS sink connector to gain transactions and mutability. While this approach is scalable and battle-tested, it can only ingest data in mini batches, leading to lower data freshness. In this talk, we introduce a Kafka Connect Sink Connector for Apache Hudi, which writes data straight into Hudi's log format, making the data immediately queryable, while Hudi's table services like indexing, compaction, clustering work behind the scenes, to further re-organize for better query performance.
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
What is a Microservices architecture and how does it differ from a Service-Oriented Architecture? Should you use traditional REST APIs to bind services together? Or is it better to use a richer, more loosely-coupled protocol? This talk will start with quick recap of how we created systems over the past 20 years and how different architectures evolved from it. The talk will show how we piece services together in event driven systems, how we use a distributed log (event hub) to create a central, persistent history of events and what benefits we achieve from doing so.
Apache Kafka is a perfect match for building such an asynchronous, loosely-coupled event-driven backbone. Events trigger processing logic, which can be implemented in a more traditional as well as in a stream processing fashion. The talk will show the difference between a request-driven and event-driven communication and show when to use which. It highlights how the modern stream processing systems can be used to hold state both internally as well as in a database and how this state can be used to further increase independence of other services, the primary goal of a Microservices architecture.
Real-time event processing monitors the incoming data stream and initiates action based on detected events like fraud, error or performance degradation. These events are often used to issue alerts and notifications, take responsive action, or to populate a monitoring dashboard. In this session, we will walk through different use cases for event processing and demonstrate how to build a scalable pipeline for tracking IoT device status. AWS services to be covered include: AWS Lambda and the Kinesis Client Library (KCL).
Kafka Streams is a new stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. As such it is the most convenient yet scalable option to analyze, transform, or otherwise process data that is backed by Kafka. We will provide the audience with an overview of Kafka Streams including its design and API, typical use cases, code examples, and an outlook of its upcoming roadmap. We will also compare Kafka Streams' light-weight library approach with heavier, framework-based tools such as Spark Streaming or Storm, which require you to understand and operate a whole different infrastructure for processing real-time data in Kafka.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help boost feelings of calmness, happiness and focus.
Ingesting and Processing IoT Data Using MQTT, Kafka Connect and Kafka Streams...confluent
(Guido Schmutz, Trivadis) Kafka Summit SF 2018
Internet of Things use cases are a perfect match for processing with a streaming platform such as Kafka and the Confluent Platform. Some of the questions to be answered are: How do we feed the data from our devices into Kafka? Do we directly send data to Kafka? Is Kafka accessible from outside the organization over the internet? What if we want to use a more specific IoT protocol such as MQTT or CoAP in between? How would we integrate it with Kafka? How can we enrich IoT streaming data with static data sitting in a traditional system?
This session will provide answers to these and other questions using a fictitious use case of a trucking company. Trucks are constantly sending data about position and driving habits, which can be used to derive real-time information and actions. A large part of the presentation will be a live demo. The demo will show the implementation of the pipeline incrementally: starting with sending the truck movement events directly to Kafka, then adding MQTT to the sensor data ingestion, followed by using Kafka Streams and KSQL to apply stream processing on the information received. The final pipeline will demonstrate the application of Kafka Connect with MQTT and JDBC source connectors for data ingestion and event stream enrichment, and Kafka Streams and KSQL for stream processing. The key takeaway is the live demonstration of a working end-to-end IoT streaming data ingestion pipeline using Kafka technologies.
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
Building data pipelines is pretty hard! Building a multi-datacenter active-active real time data pipeline for multiple classes of data with different durability, latency and availability guarantees is much harder.
Real time infrastructure powers critical pieces of Uber (think Surge) and in this talk we will discuss our architecture, technical challenges, learnings and how a blend of open source infrastructure (Apache Kafka and Samza) and in-house technologies have helped Uber scale.
This document discusses autoscaling in Kubernetes. It describes horizontal and vertical autoscaling, and how Kubernetes can autoscale nodes and pods. For nodes, it proposes using Google Compute Engine's managed instance groups and cloud autoscaler to automatically scale the number of nodes based on resource utilization. For pods, it discusses using an autoscaler controller to scale the replica counts of replication controllers based on metrics from cAdvisor or Google Cloud Monitoring. Issues addressed include rebalancing pods and handling autoscaling during rolling updates.
Shaping serverless architecture with domain driven design patternsAsher Sterkin
This document discusses using Domain-Driven Design (DDD) patterns to structure serverless applications. It introduces DDD concepts like bounded contexts, aggregates, repositories, and CQRS. Bounded contexts separate domains into cohesive models that are loosely coupled. Aggregates define transactional boundaries and ensure data integrity. Repositories provide storage and retrieval of aggregates. CQRS separates commands and queries using different data models. Applying these DDD patterns can help organize serverless applications as they grow in complexity.
It covers a brief introduction to Apache Kafka Connect, giving insights about its benefits,use cases, motivation behind building Kafka Connect.And also a short discussion on its architecture.
This document provides an overview of patterns for scalability, availability, and stability in distributed systems. It discusses general recommendations like immutability and referential transparency. It covers scalability trade-offs around performance vs scalability, latency vs throughput, and availability vs consistency. It then describes various patterns for scalability including managing state through partitioning, caching, sharding databases, and using distributed caching. It also covers patterns for managing behavior through event-driven architecture, compute grids, load balancing, and parallel computing. Availability patterns like fail-over, replication, and fault tolerance are discussed. The document provides examples of popular technologies that implement many of these patterns.
About VisualDNA Architecture @ Rubyslava 2014Michal Harish
Michal Hariš provides an overview of the evolution of VisualDNA's data architecture over the past 3 years. Originally, 10 people managed a single MySQL table holding 50M user profiles. They transitioned to using Cassandra and Hadoop to address scalability issues. Currently, they have a 120 person team using a lambda architecture with Java, Scala, Hadoop, Cassandra, Kafka, Redis, R and AngularJS. Real-time processing of 8.5k events/second is done alongside batch pipelines and machine learning. They have learned lessons around system design, testing, and remote collaboration while addressing challenges such as globally distributed APIs and bottlenecks in their data pipeline.
This document provides an overview of Hazelcast, an in-memory distributed computing platform. It discusses Hazelcast's capabilities for distributed caching, data grids, and real-time processing. The document also provides details on Hazelcast's commercial support offerings, recent 3.7 release features including modularity and performance improvements, and roadmap including new client libraries and cloud integration.
The document provides a roadmap for Oracle Coherence, including upcoming versions, new capabilities, and use cases. Key points include:
- Coherence 19 will support JDK 8 and 11 and include new development, runtime, and cloud capabilities.
- Common use cases for Coherence include as a web application platform, fast data platform, and for offloading backend/legacy systems.
- New features in recent/upcoming versions include persistence, federated caching, security improvements, and leveraging Java 8 features like lambdas and streams.
The document outlines Neo4j's product strategy and roadmap. It discusses trends like increasing cloud adoption and the blending of transactional and analytical use cases. The roadmap focuses on cloud-first capabilities, ease of use for developers, trusted fundamentals of the database, and enabling AI through graph algorithms and knowledge graphs. Key announcements include new graph algorithms, change data capture for integration, autonomous clustering for scalability, and innovations in graph embeddings and generative AI integration.
Cask Webinar
Date: 08/10/2016
Link to video recording: https://www.youtube.com/watch?v=XUkANr9iag0
In this webinar, Nitin Motgi, CTO of Cask, walks through the new capabilities of CDAP 3.5 and explains how your organization can benefit.
Some of the highlights include:
- Enterprise-grade security - Authentication, authorization, secure keystore for storing configurations. Plus integration with Apache Sentry and Apache Ranger.
- Preview mode - Ability to preview and debug data pipelines before deploying them.
- Joins in Cask Hydrator - Capabilities to join multiple data sources in data pipelines
- Real-time pipelines with Spark Streaming - Drag & drop real-time pipelines using Spark Streaming.
- Data usage analytics - Ability to report application usage of data sets.
- And much more!
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Sandipan Chakraborty, Director of Engineering (Rakuten)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
What’s New in Cloudera Enterprise 6.0: The Inside Scoop 6.14.18Cloudera, Inc.
Webinar on Cloudera Enterprise 6.0 where we will discuss how to build new applications on the modern platform for machine learning and analytics. This webinar will take a look at the latest software enhancements and how they’ll help you improve your productivity and innovate new analytics applications.
MyDBOPS Team has presented on Oracle MySQL user Camp ( 29-07-2016 ). This presentation is about Grafana and Prometheus for MySQL alerting and Dashboard setup.
The Road Most Traveled: A Kafka Story | Heikki Nousiainen, AivenHostedbyConfluent
When moving to a cloud native architecture Moogsoft knew they needed more scale than Rabbit could provide. Moogsoft moved into Kafka which is known for quick writing and driving heavy event driven workloads on top of niceties such as replayability. Choosing the tool was easy, finding a vendor that ticked all their boxes was not. They needed to ensure scalability, upgradability, builds via existing IAC pipelines, and observability via existing tools. When Moogsoft found Aiven, they were impressed with their offering and ability to scale on demand. During this presentation we will explore how Moogsoft used Aiven for Kafka to manage and scale their data in the cloud.
These are the slides of the second talk of the first Tech Talk@TransferWise Singapore, which happened on the 23rd of November 2017.
These slides share how TransferWise codebase is moving from a monolith architecture to a microservices architecture.
Analytics at the Speed of Thought: Actian Express Overview Actian Corporation
Deliver faster insight – reduce query response times to seconds
Analyze more data faster – explore billions of rows of data in seconds
More concurrent users – enable more concurrent BI users to explore more data
.NET Usergroup Oldenburg 26. März 2015 - von Winfried Klinker und Andre Hühn
Microsoft Azure gehört zu den Cloud-Diensten, die Microsoft anbietet. Es umfasst neben dem Hosting von virtuellen Maschinen insbesondere eine große Sammlung an Diensten (wie SQL Azure, Mobile Services, Machine Learning).
Wir geben einen ersten Überblick über die Features von Azure insbesondere für Entwickler. Dabei werden wir sowohl auf die Platform as a Service (PaaS) Angebote wie auch auf die Infrastructe as a Service (IaaS) eingehen. Außerdem geben wir einen Einblick in moderne Cloud Architektur und zeigen Best Practices bei der Cloud Entwicklung auf. Dabei werden Beispiele aus der Praxis zeigen, wie man eine Fehlertolerante und robuste Cloud Lösung erstellen kann.
Über die Sprecher:
Winfried Klinker ist als Software Architekt bei der Firma Sitrion in Oldenburg tätig. Er beschäftigt sich größtenteils mit Cloud Architekturen mit Microsoft Azure vor allem in Bezug auf Backends für mobile Anwendungen.
Andre Hühn ist Team Lead für Entwicklung mobiler Apps bei der Firma Sitrion in Oldenburg und beeinflusst damit die Richtung der Architektur für das Sitrion ONE Produkt.
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...confluent
(Bruno Simic, Solutions Engineer, Couchbase)
Breakout during Confluent’s streaming event in Munich. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
From Data to Services at the Speed of BusinessAli Hodroj
From Data to Services at the Speed of Business: Applying cloud-native paradigm to combine fast data analytics with microservices architecture for hybrid workloads.
The document discusses a Cloud OnAir event about database management and databases. It includes an agenda that covers overviews of Cloud SQL, Cloud Memorystore, Cloud Spanner, and Cloud Firestore updates. Several speakers are also listed that will provide presentations on choosing the right database, database migrations, database modernization, and specific database services like Cloud Spanner and Cloud Bigtable.
Lyftrondata enables enterprises to load data from 300+ connectors to Google Bigquery in minutes without any engineering requirements. Simply connect, organize, centralize and share your data on Bigquery with zero code data pipeline, ETL & ELT tool.
Azure provides cloud computing services including computing, analytics, networking, storage, and more. It offers virtual machines, databases, websites, and other services that can be accessed from anywhere and scaled up as needed. Azure aims to provide enterprise-grade services that are economical, scalable, and hybrid-ready to work with existing on-premises systems. It has data centers across the world and over 600,000 servers to provide its services globally at scale.
Most important New features of Oracle 23c for DBAs and Developers. You can get more idea from my youtube channel video from https://youtu.be/XvL5WtaC20A
E-commerce Development Services- Hornet DynamicsHornet Dynamics
For any business hoping to succeed in the digital age, having a strong online presence is crucial. We offer Ecommerce Development Services that are customized according to your business requirements and client preferences, enabling you to create a dynamic, safe, and user-friendly online store.
SMS API Integration in Saudi Arabia| Best SMS API ServiceYara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfVALiNTRY360
Salesforce Healthcare CRM, implemented by VALiNTRY360, revolutionizes patient management by enhancing patient engagement, streamlining administrative processes, and improving care coordination. Its advanced analytics, robust security, and seamless integration with telehealth services ensure that healthcare providers can deliver personalized, efficient, and secure patient care. By automating routine tasks and providing actionable insights, Salesforce Healthcare CRM enables healthcare providers to focus on delivering high-quality care, leading to better patient outcomes and higher satisfaction. VALiNTRY360's expertise ensures a tailored solution that meets the unique needs of any healthcare practice, from small clinics to large hospital systems.
For more info visit us https://valintry360.com/solutions/health-life-sciences
How Can Hiring A Mobile App Development Company Help Your Business Grow?ToXSL Technologies
ToXSL Technologies is an award-winning Mobile App Development Company in Dubai that helps businesses reshape their digital possibilities with custom app services. As a top app development company in Dubai, we offer highly engaging iOS & Android app solutions. https://rb.gy/necdnt
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
Artificia Intellicence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
Top 9 Trends in Cybersecurity for 2024.pptxdevvsandy
Security and risk management (SRM) leaders face disruptions on technological, organizational, and human fronts. Preparation and pragmatic execution are key for dealing with these disruptions and providing the right cybersecurity program.
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
2. Intro to Hasura
● globally distributed team, with offices in San Francisco
● make data access fast, secure & scalable
○ a world where data delivery is just another piece of infrastructure
● Written in Haskell
● Support PostgreSQL, MySQL, SQL server & soon BigQuery
3. Access data is a bottleneck for developers, Hasura solves this
5. Features
● Powerful Querying.
● Subscriptions and Live Queries.
● Dynamic Access Control.
● Write business logic backed by REST APIs.
● Remote Schemas.
● Event Triggers for serverless.
● Adding to an existing Postgres database.
● Admin UI & Schema migrations.
6. Benefits
Tangible impact on
● Time to market
○ Self-serve APIs (GraphQL + Declarative API engine)
● API Performance
○ 5x-100x faster vs DIY
● Uptime & Reliability
○ Distributed, HA, Fault-tolerant
● Security
○ Easy & Observable
Designed for
● New applications
● incremental modernization of existing applications
7. The community
● 20K+ Github starts
● 370+ Contributors
● Fastest growing graphQL project
● 100+ million downloads
8. My challenges & experience
● Meta data & config management
● Authorization (RBAC & ABAC combination)
● Caching