High Availability can be a curiously nebulous term, and most people probably don't care about it until they can't access their online banking service, or their plane crashes.
This presentation examines some of the considerations necessary when building highly available computer systems, then focuses on the HA infrastructure software currently available from the Corosync/OpenAIS, Linux-HA and Pacemaker projects.
Originally presented at Linux Users Victoria in April 2010 (http://luv.asn.au/2010/04/06)
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache FlinkFlink Forward
Over the past year, we've seen users build entire event-driven applications such as social networks on top of Apache Flink (Drivetribe.com), elevating the importance of state management in Flink to a whole new level. Users are placing more and more data as state in Flink, using it as a replacement to conventional databases. With such mission-critical data entrusted to Flink, we need to provide similar capabilities of a database to users. One of such capabilities is being flexible in how data is persisted and represented. Specifically, how can I change how my state is serialized and stored, or even the schema of my state, as business logic changes over time? In this talk, we'll provide details on the latest state management features in Flink that allows users to do exactly that. We'll talk about how Flink manages the state for you, how it provides flexibility to the user to adapt to evolving state serialization formats and schemas, and the best practices when working with it.
Performance Schema is a powerful diagnostic instrument for:
- Query performance
- Complicated locking issues
- Memory leaks
- Resource usage
- Problematic behavior, caused by inappropriate settings
- More
It comes with hundreds of options which allow precisely tuning what to instrument. More than 100 consumers store collected data.
In this tutorial, we will try all the important instruments out. We will provide a test environment and a few typical problems which could be hardly solved without Performance Schema. You will not only learn how to collect and use this information but have experience with it.
Tutorial at Percona Live Austin 2019
Windows Registry Forensics with Volatility FrameworkKapil Soni
Windows Registry Forensics is the most important part of Memory Forensics Investigations. With the help of Windows Registry Forensics we can reconstruct user activity as well find the evidence easily.
Windows Registry Forensics (WRF) is a one of most important part on malware analysis. The changes made due to malware on Windows that reflect on Registry.
If attacker tried to make changes on Windows OS so all the logs like opening, deleting, modifying folder or file as well if attacker executed a file like .exe , everything is stores in Windows Registry that helps investigator to catch cyber criminal.
Simple REST-APIs with Dropwizard and SwaggerLeanIX GmbH
During the VOXXED Days in Berlin on 29 January 2016 Bernd Schönbach from LeanIX demonstrated an easy way to create well documented and implemented REST-APIs using the Dropwizard Library for the implementation and Swagger for easy Documentation.
===
LeanIX offers an innovative software-as-a-service solution for Enterprise Architecture Management (EAM), based either in a public cloud or the client’s data center.
Companies like Adidas, Axel Springer, Helvetia, RWE, Trusted Shops and Zalando use LeanIX Enterprise Architecture Management tool.
Free Trial: http://bit.ly/LeanIXFreeTrial
The document discusses computer forensics in the context of investigating a Windows system. It outlines the process of gathering volatile data like memory contents and network connections using tools run from a trusted CD. Non-volatile data like the filesystem is acquired by imaging the entire disk. Timeline analysis uses data from files, registry keys and logs to determine when files and events occurred. The goal is to methodically identify and preserve digital evidence while following forensic standards.
High Availability can be a curiously nebulous term, and most people probably don't care about it until they can't access their online banking service, or their plane crashes.
This presentation examines some of the considerations necessary when building highly available computer systems, then focuses on the HA infrastructure software currently available from the Corosync/OpenAIS, Linux-HA and Pacemaker projects.
Originally presented at Linux Users Victoria in April 2010 (http://luv.asn.au/2010/04/06)
Flink Forward Berlin 2017: Tzu-Li (Gordon) Tai - Managing State in Apache FlinkFlink Forward
Over the past year, we've seen users build entire event-driven applications such as social networks on top of Apache Flink (Drivetribe.com), elevating the importance of state management in Flink to a whole new level. Users are placing more and more data as state in Flink, using it as a replacement to conventional databases. With such mission-critical data entrusted to Flink, we need to provide similar capabilities of a database to users. One of such capabilities is being flexible in how data is persisted and represented. Specifically, how can I change how my state is serialized and stored, or even the schema of my state, as business logic changes over time? In this talk, we'll provide details on the latest state management features in Flink that allows users to do exactly that. We'll talk about how Flink manages the state for you, how it provides flexibility to the user to adapt to evolving state serialization formats and schemas, and the best practices when working with it.
Performance Schema is a powerful diagnostic instrument for:
- Query performance
- Complicated locking issues
- Memory leaks
- Resource usage
- Problematic behavior, caused by inappropriate settings
- More
It comes with hundreds of options which allow precisely tuning what to instrument. More than 100 consumers store collected data.
In this tutorial, we will try all the important instruments out. We will provide a test environment and a few typical problems which could be hardly solved without Performance Schema. You will not only learn how to collect and use this information but have experience with it.
Tutorial at Percona Live Austin 2019
Windows Registry Forensics with Volatility FrameworkKapil Soni
Windows Registry Forensics is the most important part of Memory Forensics Investigations. With the help of Windows Registry Forensics we can reconstruct user activity as well find the evidence easily.
Windows Registry Forensics (WRF) is a one of most important part on malware analysis. The changes made due to malware on Windows that reflect on Registry.
If attacker tried to make changes on Windows OS so all the logs like opening, deleting, modifying folder or file as well if attacker executed a file like .exe , everything is stores in Windows Registry that helps investigator to catch cyber criminal.
Simple REST-APIs with Dropwizard and SwaggerLeanIX GmbH
During the VOXXED Days in Berlin on 29 January 2016 Bernd Schönbach from LeanIX demonstrated an easy way to create well documented and implemented REST-APIs using the Dropwizard Library for the implementation and Swagger for easy Documentation.
===
LeanIX offers an innovative software-as-a-service solution for Enterprise Architecture Management (EAM), based either in a public cloud or the client’s data center.
Companies like Adidas, Axel Springer, Helvetia, RWE, Trusted Shops and Zalando use LeanIX Enterprise Architecture Management tool.
Free Trial: http://bit.ly/LeanIXFreeTrial
The document discusses computer forensics in the context of investigating a Windows system. It outlines the process of gathering volatile data like memory contents and network connections using tools run from a trusted CD. Non-volatile data like the filesystem is acquired by imaging the entire disk. Timeline analysis uses data from files, registry keys and logs to determine when files and events occurred. The goal is to methodically identify and preserve digital evidence while following forensic standards.
SplunkLive! Presentation - Data Onboarding with SplunkSplunk
- The data onboarding process involves systematically bringing new data sources into Splunk to make the data instantly usable and valuable for users
- The process includes pre-boarding activities like identifying the data, mapping fields, and building index-time and search-time configurations
- It also involves deploying any necessary infrastructure, deploying the configurations, testing and validating the data, and getting user approval before the process is complete
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
Chartbeat measures and monetizes attention on the web. They were experiencing slow load times and TCP retransmissions due to default system settings. Tuning various TCP, NGINX and EC2 ELB settings like increasing buffers, disabling Nagle's algorithm, and enabling HTTP keep-alive resolved the issues and improved performance. These included tuning settings like net.ipv4.tcp_max_syn_backlog, net.core.somaxconn, and nginx listen backlog values.
This talk is from Distributed Data Summit SF 2018 - http://distributeddatasummit.com/2018-sf/sessions#netflix2
Operating C* can involve a lot of required manpower, complex automation, or both. Some of this complexity comes from operational/configuration activity of the underlying kernel and hardware but much of it is operation complexity stemming from C* itself. Some examples of this complexity are restarting the database in a safe way, reliability backing up and restoring snapshots, monitoring the health of the datastore, and even ensuring eventual consistency through repair. As a result of these complexities, C* operators end up with complicated operational setups, which are expensive to build, manage and monitor. As part of this talk, we will share lessons learned in managing such complexity via our Priam sidecar including recent innovations in how our sidecar ensures the highest possible uptime and correctness of Cassandra. We then use this to motivate building in the management sidecar directly as part of C* itself (CASSANDRA-14395).
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...InfluxData
InfluxDB IOx Tech Talks
This talk presents a design of a distributed database system that splits data to gain query performance. The talk will define four main properties of data splitting: sharding, partitioning, sorting, and encoding; and then delve into examples to show their impacts on query performance.
Scylla on Kubernetes: Introducing the Scylla OperatorScyllaDB
The document introduces the Scylla Operator for Kubernetes, which provides a management layer for Scylla on Kubernetes. It addresses some limitations of using StatefulSets alone to run Scylla, such as safe scale down operations and tracking member identity. The operator implements the controller pattern with custom resources to deploy and manage Scylla clusters on Kubernetes. It handles tasks like cluster creation and scale up/down while addressing issues like local storage failures.
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
This talk shares experiences from deploying and tuning Flink steam processing applications for very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job as particularly demanding, show how to configure and tune a large scale Flink job, and outline what the Flink community is working on to make the out-of-the-box for experience as smooth as possible. We will, for example, dive into - analyzing and tuning checkpointing - selecting and configuring state backends - understanding common bottlenecks - understanding and configuring network parameters
Este documento describe las funcionalidades del programa Nmap, una herramienta de escaneo de red open source que permite realizar mapeos de red, identificar puertos y servicios abiertos, determinar sistemas operativos y versiones de servicios. Explica cómo realizar escaneos básicos, opciones de paralelismo, visualización, salida de resultados, detección de sistemas operativos, scripts NSE y Zenmap como interfaz gráfica.
This document discusses using MySQL 5.1 partitions to boost performance with large datasets. It begins with an introduction of the author Giuseppe Maxia. It then discusses defining problems with too much data like not enough RAM. It explains what partitions are, how to create them, and when they should be used, such as for large tables or historical data. Hands-on examples are provided using the MySQL employees test database partitioned by year. Partitions provided faster queries and deleting of records compared to non-partitioned tables. Some pitfalls and best practices are also covered.
Deploying Flink on Kubernetes - David AndersonVerverica
Kubernetes has rapidly established itself as the de facto standard for orchestrating containerized infrastructures. And with the recent completion of the refactoring of Flink's deployment and process model known as FLIP-6, Kubernetes has become a natural choice for Flink deployments. In this talk we will walk through how to get Flink running on Kubernetes
This document discusses Go programming patterns and best practices presented by MegaEase, an enterprise cloud native architecture provider. It covers topics like slices, interfaces, performance optimization, and common Go mistakes. Examples are provided to demonstrate slice internals, deep comparison, interface patterns, and how to check interface compliance.
There are many ways to run high availability with PostgreSQL. Here, we present a template for you to create your own customized, high-availability solution using Python and for maximum accessibility, a distributed configuration store like ZooKeeper or etcd.
Writing and testing high frequency trading engines in javaPeter Lawrey
JavaOne presentation of Writing and Testing High Frequency Trading Engines in Java. Talk looks at low latency trading, thread affinity, lock free code, ultra low garbage and low latency persistence and IPC.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
Network Scanning Phases and Supporting ToolsJoseph Bugeja
This presentation focuses on the network penetration scanning phase. It introduces tools and techniques that professional pen-testers and ethical hackers need to master to find target machines, openings on those targets and vulnerabilities.
Trino (formerly known as PrestoSQL) is an open source distributed SQL query engine for running fast analytical queries against data sources of all sizes. Some key updates since being rebranded from PrestoSQL to Trino include new security features, language features like window functions and temporal types, performance improvements through dynamic filtering and partition pruning, and new connectors. Upcoming improvements include support for MERGE statements, MATCH_RECOGNIZE patterns, and materialized view enhancements.
User Behavior Analytics Using Machine LearningDNIF
In this presentation we talk about:
- Introduction to user behavior analytics.
- Classifying malicious IP using machine learning.
- User behavior analytics using machine learning.
You can watch the complete demonstration video here: https://youtu.be/HfpjLR6ZwIU?t=3550
SplunkLive! Presentation - Data Onboarding with SplunkSplunk
- The data onboarding process involves systematically bringing new data sources into Splunk to make the data instantly usable and valuable for users
- The process includes pre-boarding activities like identifying the data, mapping fields, and building index-time and search-time configurations
- It also involves deploying any necessary infrastructure, deploying the configurations, testing and validating the data, and getting user approval before the process is complete
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
Kubernetes has the concept of resource requests and limits. Pods get scheduled on the nodes based on their requests and optionally limited in how much of the resource they can consume. Understanding and optimizing resource requests/limits is crucial both for reducing resource "slack" and ensuring application performance/low-latency. This talk shows our approach to monitoring and optimizing Kubernetes resources for 80+ clusters to achieve cost-efficiency and reducing impact for latency-critical applications. All shown tools are Open Source and can be applied to most Kubernetes deployments.
Chartbeat measures and monetizes attention on the web. They were experiencing slow load times and TCP retransmissions due to default system settings. Tuning various TCP, NGINX and EC2 ELB settings like increasing buffers, disabling Nagle's algorithm, and enabling HTTP keep-alive resolved the issues and improved performance. These included tuning settings like net.ipv4.tcp_max_syn_backlog, net.core.somaxconn, and nginx listen backlog values.
This talk is from Distributed Data Summit SF 2018 - http://distributeddatasummit.com/2018-sf/sessions#netflix2
Operating C* can involve a lot of required manpower, complex automation, or both. Some of this complexity comes from operational/configuration activity of the underlying kernel and hardware but much of it is operation complexity stemming from C* itself. Some examples of this complexity are restarting the database in a safe way, reliability backing up and restoring snapshots, monitoring the health of the datastore, and even ensuring eventual consistency through repair. As a result of these complexities, C* operators end up with complicated operational setups, which are expensive to build, manage and monitor. As part of this talk, we will share lessons learned in managing such complexity via our Priam sidecar including recent innovations in how our sidecar ensures the highest possible uptime and correctness of Cassandra. We then use this to motivate building in the management sidecar directly as part of C* itself (CASSANDRA-14395).
Impacts of Sharding, Partitioning, Encoding, and Sorting on Distributed Query...InfluxData
InfluxDB IOx Tech Talks
This talk presents a design of a distributed database system that splits data to gain query performance. The talk will define four main properties of data splitting: sharding, partitioning, sorting, and encoding; and then delve into examples to show their impacts on query performance.
Scylla on Kubernetes: Introducing the Scylla OperatorScyllaDB
The document introduces the Scylla Operator for Kubernetes, which provides a management layer for Scylla on Kubernetes. It addresses some limitations of using StatefulSets alone to run Scylla, such as safe scale down operations and tracking member identity. The operator implements the controller pattern with custom resources to deploy and manage Scylla clusters on Kubernetes. It handles tasks like cluster creation and scale up/down while addressing issues like local storage failures.
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
This talk shares experiences from deploying and tuning Flink steam processing applications for very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job as particularly demanding, show how to configure and tune a large scale Flink job, and outline what the Flink community is working on to make the out-of-the-box for experience as smooth as possible. We will, for example, dive into - analyzing and tuning checkpointing - selecting and configuring state backends - understanding common bottlenecks - understanding and configuring network parameters
Este documento describe las funcionalidades del programa Nmap, una herramienta de escaneo de red open source que permite realizar mapeos de red, identificar puertos y servicios abiertos, determinar sistemas operativos y versiones de servicios. Explica cómo realizar escaneos básicos, opciones de paralelismo, visualización, salida de resultados, detección de sistemas operativos, scripts NSE y Zenmap como interfaz gráfica.
This document discusses using MySQL 5.1 partitions to boost performance with large datasets. It begins with an introduction of the author Giuseppe Maxia. It then discusses defining problems with too much data like not enough RAM. It explains what partitions are, how to create them, and when they should be used, such as for large tables or historical data. Hands-on examples are provided using the MySQL employees test database partitioned by year. Partitions provided faster queries and deleting of records compared to non-partitioned tables. Some pitfalls and best practices are also covered.
Deploying Flink on Kubernetes - David AndersonVerverica
Kubernetes has rapidly established itself as the de facto standard for orchestrating containerized infrastructures. And with the recent completion of the refactoring of Flink's deployment and process model known as FLIP-6, Kubernetes has become a natural choice for Flink deployments. In this talk we will walk through how to get Flink running on Kubernetes
This document discusses Go programming patterns and best practices presented by MegaEase, an enterprise cloud native architecture provider. It covers topics like slices, interfaces, performance optimization, and common Go mistakes. Examples are provided to demonstrate slice internals, deep comparison, interface patterns, and how to check interface compliance.
There are many ways to run high availability with PostgreSQL. Here, we present a template for you to create your own customized, high-availability solution using Python and for maximum accessibility, a distributed configuration store like ZooKeeper or etcd.
Writing and testing high frequency trading engines in javaPeter Lawrey
JavaOne presentation of Writing and Testing High Frequency Trading Engines in Java. Talk looks at low latency trading, thread affinity, lock free code, ultra low garbage and low latency persistence and IPC.
Understanding InfluxDB’s New Storage EngineInfluxData
Learn more about InfluxDB’s new storage engine! The team developed a cloud-native, real-time, columnar database optimized for time series data. We built it all in Rust and it sits on top of Apache Arrow and DataFusion. We chose Apache Parquet as the persistent format, which is an open source columnar data file format. This new storage engine provides InfluxDB Cloud users with new functionality, including the removal of cardinality limits, so developers can bring in massive amounts of time series data at scale.
In this webinar, Anais Dotis-Georgiou will dive into:
Requirements for rebuilding InfluxDB’s core
Key product features and timeline
How Apache Arrow’s ecosystem is used to meet those requirements
Stick around for a demo and live Q&A
Network Scanning Phases and Supporting ToolsJoseph Bugeja
This presentation focuses on the network penetration scanning phase. It introduces tools and techniques that professional pen-testers and ethical hackers need to master to find target machines, openings on those targets and vulnerabilities.
Trino (formerly known as PrestoSQL) is an open source distributed SQL query engine for running fast analytical queries against data sources of all sizes. Some key updates since being rebranded from PrestoSQL to Trino include new security features, language features like window functions and temporal types, performance improvements through dynamic filtering and partition pruning, and new connectors. Upcoming improvements include support for MERGE statements, MATCH_RECOGNIZE patterns, and materialized view enhancements.
User Behavior Analytics Using Machine LearningDNIF
In this presentation we talk about:
- Introduction to user behavior analytics.
- Classifying malicious IP using machine learning.
- User behavior analytics using machine learning.
You can watch the complete demonstration video here: https://youtu.be/HfpjLR6ZwIU?t=3550
Mesos-based Data Infrastructure @ DoubanZhong Bo Tian
How to build an elastic and efficient platform to support various Big Data and Machine Learning tasks is a challenge for a lot of corporations. In this presentation, Zhongbo Tian will give an overview of the Mesos-based core infrastructure of Douban, and demonstrate how to integrate the platform with state-of-art Big Data/ML technologies.
Build 1 trillion warehouse based on carbon databoxu42
Apache CarbonData & Spark Meetup
Build 1 trillion warehouse based on CarbonData
Huawei
Apache Spark™ is a unified analytics engine for large-scale data processing.
CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter lookup on detail record, streaming analytics, and so on. CarbonData has been deployed in many enterprise production environments, in one of the largest scenario it supports queries on single table with 3PB data (more than 5 trillion records) with response time less than 3 seconds!
22. How it works
•CK架构
Data Storage
MergeTree
ReplicatedMergeTree
ReplacingMergeTree
SummingMergeTree
Log
Memory
Buffer
Client
PHP JDBC
Python
Golang
Data Type SQL
ODBC
Cluster
Access Quotas
Kafka
Distributed
MaterializedView
Node.js
R
Scala
Julia
Rust
C++
Perl
Ruby .NET
HTTP
Functions
49. 回写MySQL
直接查MySQL数据
Limit 10 by date
直接查数据⽂文件Best practice
INSERT INTO FUNCTION
mysql('host:port', 'db', 'tb', 'user', 'passwd', 1)
SELECT xxx
CREATE TABLE xxxx
ENGINE = MergeTree ORDER BY id AS
SELECT *
FROM mysql('host:port', 'db', 'tb', 'user', 'password')
clickhouse-client
--query="SELECT partition, count() AS
number_of_parts,
formatReadableSize(sum(bytes)) AS
sum_size FROM xxx
WHERE xxxxx ;"
--external --file=test.sql --name=parts
--structure='partition UInt16,name
String,table String,engine String'
-h 127.0.0.1
GROUP BY
xxxx
ORDER BY
xxxx
LIMIT 10 BY date, city;