Indexes are one of the most crucial structures in any relational database. In this talk we'll explain how to use them efficiently, how to read query plans, and what they mean for us. We'll also cover a variety of indexing structures available in PostgreSQL and build up some intuition about which one to pick depending on the situation.
The document provides an overview of PostgreSQL indexes, including the different types: B-Tree, Hash, BRIN, Bloom, GiST, SP-GiST, GIN, and RUM indexes. It explains how each index type stores and organizes data, as well as when each type is best suited in terms of performance, size, and supported query types such as equality scans, range scans, and full-text search. The document also covers index-only scans, bitmap scans, and tuple identifiers to help explain how indexes are used during query execution.
The document summarizes external sorting techniques used in database management systems. It describes a two-phase sorting approach using limited buffer space in memory. The first phase creates runs by sorting each page individually. The second phase repeatedly merges runs by pairs until a single sorted run is produced, using three buffer pages - two for input runs and one for the output merged run. The process of merging two sorted runs by comparing elements and writing the smallest to the output page is also explained.
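The two-phase approach can be sketched in a few lines of Python (lists stand in for disk pages and buffer frames here; a real DBMS streams fixed-size pages through a three-frame buffer):

```python
def merge_runs(run_a, run_b):
    """Merge two sorted runs: compare heads, write the smaller to the output."""
    out, i, j = [], 0, 0
    while i < len(run_a) and j < len(run_b):
        if run_a[i] <= run_b[j]:
            out.append(run_a[i]); i += 1
        else:
            out.append(run_b[j]); j += 1
    out.extend(run_a[i:])
    out.extend(run_b[j:])
    return out

def external_sort(pages):
    # Phase 1: sort each page individually, producing the initial runs
    runs = [sorted(page) for page in pages]
    # Phase 2: merge runs pairwise until a single sorted run remains
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(merge_runs(runs[i], runs[i + 1]))
            else:
                merged.append(runs[i])  # odd run carried over to the next pass
        runs = merged
    return runs[0] if runs else []
```

With N initial runs, phase 2 takes about log2(N) passes over the data, which is why reducing the number of initial runs matters so much in practice.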
Indexes are data structures that improve retrieval speed for data in a database. They work by sorting field values and storing pointers to records, allowing for faster searching. Indexes should be used on fields involved in searches, joins, or with high cardinality. There are different types of indexes including clustered, non-clustered, unique, non-unique, bitmap and full text. Indexes are created using SQL commands and their information can be displayed and deleted as needed.
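A hedged sketch of those SQL commands, run here through Python's built-in sqlite3 module (the table and index names are invented for illustration; PostgreSQL's CREATE INDEX / DROP INDEX syntax is essentially the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
conn.executemany("INSERT INTO users(email, age) VALUES (?, ?)",
                 [(f"user{i}@example.com", i % 80) for i in range(1000)])

# Create a non-unique index on a field used in searches
conn.execute("CREATE INDEX idx_users_email ON users(email)")

# The planner now resolves equality searches through the index
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
                    ("user42@example.com",)).fetchone()

# Index information can be displayed, and the index deleted when no longer needed
index_names = [row[1] for row in conn.execute("PRAGMA index_list('users')")]
conn.execute("DROP INDEX idx_users_email")
```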
The document discusses various data structures and their classification. It begins by stating the objectives of understanding how data structures can be classified, basic data types and arrays, and problem-oriented data structures used to solve specific problems. It then defines key terms like data, information, and data structures. It provides examples of different data structure types like arrays, lists, stacks, queues and trees. It also discusses basic data types, array types, and various operations involved in searching, sorting and manipulating different data structures.
The document discusses data structures and abstract data types (ADTs). It describes lists, stacks, and queues. Lists can be implemented using arrays or linked lists. Linked lists allow for faster insertion and deletion compared to arrays. Stacks follow a last-in, first-out order, while queues follow a first-in, first-out order. Common operations on stacks and queues include push, pop, enqueue, and dequeue. The document provides examples of how stacks and queues can be used in applications such as bracket matching, calculators, and job queues.
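The bracket-matching and job-queue applications mentioned above reduce to a few lines; a sketch in Python, where a list serves as the stack and collections.deque as the queue:

```python
from collections import deque

def brackets_balanced(text):
    """Stack (LIFO): push openers, pop and match on closers."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)                           # push
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:  # pop and compare
                return False
    return not stack        # leftover openers mean imbalance

def run_jobs(jobs):
    """Queue (FIFO): jobs are served in arrival order."""
    queue = deque(jobs)                                # enqueue all
    served = []
    while queue:
        served.append(queue.popleft())                 # dequeue
    return served
```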
The document discusses various data structures and their classification. It begins by stating the objectives of understanding how data structures can be classified, basic data types and arrays, and problem-oriented data structures used to solve specific problems. It then defines key terms like data, information, and data structures. It describes primitive and non-primitive, linear and non-linear data structures. It also discusses basic and problem-oriented data structures like lists, stacks, queues, and trees. It provides examples and applications of different data structures.
The document summarizes a lecture on DBMS internals including hash-based indexing and external sorting. It discusses static hashing which uses a fixed number of buckets and can develop long overflow chains. Extendible hashing is then introduced which uses a directory of pointers to buckets and dynamically doubles the directory and splits buckets as needed when inserting entries. The key aspects are that it can gracefully handle insertions and deletions without performance degradation and requires fewer disk I/Os than static hashing.
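A minimal in-memory sketch of that directory-doubling scheme (a bucket capacity of 2 and Python's built-in hash are simplifications; a real DBMS splits disk pages, not dicts):

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth      # local depth: hash bits this bucket discriminates on
        self.items = {}

class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.global_depth = 1
        self.capacity = bucket_capacity
        self.dir = [Bucket(1), Bucket(1)]   # directory of pointers to buckets

    def _slot(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.dir[self._slot(key)].items.get(key)

    def insert(self, key, value):
        while True:
            b = self.dir[self._slot(key)]
            if key in b.items or len(b.items) < self.capacity:
                b.items[key] = value
                return
            self._split(b)      # overflow: split, possibly doubling the directory

    def _split(self, b):
        if b.depth == self.global_depth:    # directory must double first
            self.dir = self.dir + self.dir
            self.global_depth += 1
        b.depth += 1
        sibling = Bucket(b.depth)
        bit = 1 << (b.depth - 1)
        # redistribute entries on the newly significant hash bit
        old = b.items
        b.items = {}
        for k, v in old.items():
            (sibling if hash(k) & bit else b).items[k] = v
        # repoint directory slots whose index has that bit set
        for i, bucket in enumerate(self.dir):
            if bucket is b and i & bit:
                self.dir[i] = sibling
```

A lookup touches exactly one directory slot and one bucket, which is the "fewer disk I/Os than static hashing" property the summary refers to.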
Every time you choose how to store data in your database, a lot of things happen under the hood.
Making the best choice is even more important in applications that aim for high performance.
The purpose of the talk is to show how indexes work and how slightly changing their combinations can impact the performance of your application.
This document discusses data structures and discrete mathematics. It provides an overview of linked lists, stacks, and queues. Key points include:
- Linked lists, stacks, and queues are common data structures that can be implemented using arrays or linked nodes.
- Common operations on data structures include adding, removing, and searching for data.
- Abstract data types (ADTs) specify functionality without defining the implementation. This allows data structures to be reused.
- Stacks follow last-in, first-out behavior using push and pop operations. Queues follow first-in, first-out behavior using enqueue and dequeue operations.
- Both stacks and queues have many application areas, like expression evaluation,
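For instance, the expression-evaluation use of a stack can be sketched in Python as a postfix (RPN) evaluator:

```python
def eval_postfix(expression):
    """Evaluate a postfix (RPN) expression using a stack of operands."""
    stack = []
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    for token in expression.split():
        if token in ops:
            b = stack.pop()          # right operand is on top of the stack
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))
    return stack.pop()
```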
This document discusses performing data science on HBase using the WibiData platform. It introduces WibiData Language (WDL), which allows analyzing data stored in HBase columns in a concise and interactive way using Scala and Apache Crunch. The document demonstrates building a histogram of editor metrics by reading user data from an HBase table, filtering and binning average edit deltas, and visualizing the results. WDL aims to make HBase data exploration more accessible for data scientists compared to other frameworks like Hive and Pig.
The document provides answers to lab exercises on creating and manipulating tables in a database. It includes answers for creating tables, inserting data, updating records, running queries, and demonstrating relationships between tables. The lab covers topics like creating student, library, employee, insurance, course enrollment, and book dealer databases. Queries are demonstrated to retrieve, update and aggregate data from the tables. Primary keys, foreign keys and relationships between tables are also defined.
The document discusses B-tree indexes in PostgreSQL. It provides an overview of B-tree index internals including page layout, the meta page, Lehman & Yao algorithm adaptations, and new features like covering indexes, partial indexes, and HOT updates. It also outlines development challenges and future work needed like index compression, index-organized tables, and global partitioned indexes. The presenter aims to inspect B-tree index internals, present new features, clarify the development roadmap, and understand difficulties.
This document provides information on importing and working with different data types in R. It introduces packages for importing files like SPSS, Stata, SAS, Excel, databases, JSON, XML, and APIs. It also covers functions for reading and writing common file types like CSV, TSV, and RDS. Finally, it discusses parsing data and handling missing values when reading files.
Hive User Meeting March 2010 - Hive Team (Zheng Shao)
The document summarizes new features and API updates in the Hive data warehouse software. It discusses enhancements to JDBC/ODBC connectivity, the introduction of CREATE TABLE AS SELECT (CTAS) functionality, improvements to join strategies including map joins and bucketed map joins, and work on views, HBase integration, user-defined functions, serialization/deserialization (SerDe), and object inspectors. It also provides guidance on developing new SerDes for custom data formats and serialization needs.
The document discusses different data structures and their implementations and applications. It covers arrays, linked lists, stacks, queues, binary trees, and binary search. The key points are:
- Arrays allow fast access but have fixed size; linked lists can grow dynamically but access is slower.
- Binary trees allow fast (O(log n)) search, insertion, and deletion operations due to their hierarchical structure.
- Stacks and queues are useful for modeling LIFO and FIFO data access with applications like function calls and job scheduling.
- Binary search runs in O(log n) time by recursively dividing the search space for sorted data.
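The O(log n) binary search in the last point is a short function; a Python sketch over a sorted list:

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # halve the search space each step
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1            # discard the lower half
        else:
            hi = mid - 1            # discard the upper half
    return -1
```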
This document contains a data structures question paper from Anna University. It has two parts:
Part A contains 10 short answer questions covering topics like ADT, linked stacks, graph theory, algorithm analysis, binary search trees, and more.
Part B contains 5 long answer questions each worth 16 marks. Topics include algorithms for binary search, linear search, recursion, sorting, trees, graphs, files, and more. Students are required to write algorithms, analyze time complexity, and provide examples for each question.
This document provides an overview of a lecture on modern database systems. It discusses the following key points:
1) The lecture will review relational models, SQL, storage and indexing. Assignments will be posted online and are due on February 12th.
2) The relational model represents data in tables and supports intuitive querying with SQL. Common queries like selections, projections, joins and aggregations are demonstrated.
3) Database files can be organized using heap files, sorted files and indexes like B+ trees and hash indexes. These different structures allow for efficient retrieval of data based on queries.
Part B CS8391 Data Structures Part B questions, compiled from R2008 & R2013 to help students of affiliated colleges appearing for the Anna University examination.
How does the query planner in PostgreSQL work? Index access methods, join execution types, aggregation & pipelining. Optimizing queries with WHERE conditions, ORDER BY and GROUP BY. Composite indexes, partial and expression indexes. Exploiting assumptions about data, and denormalization.
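As a hedged illustration of partial and expression indexes (using Python's sqlite3 for a self-contained demo; the table and data are invented, and PostgreSQL accepts the same index definitions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT,"
             " total REAL, email TEXT)")
conn.executemany("INSERT INTO orders(status, total, email) VALUES (?, ?, ?)",
                 [("pending" if i % 10 == 0 else "done", i * 1.5, f"U{i}@X.COM")
                  for i in range(1000)])

# Partial index: covers only the small slice of rows queries care about
conn.execute("CREATE INDEX idx_pending ON orders(total) WHERE status = 'pending'")
# Expression index: matches searches on a computed value
conn.execute("CREATE INDEX idx_email_lower ON orders(lower(email))")

partial_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE status = 'pending' AND total > 100").fetchall()
expr_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE lower(email) = 'u42@x.com'").fetchall()
```

The planner only considers the partial index when the query's WHERE clause implies the index's predicate, which is the "exploiting assumptions about data" idea in miniature.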
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2 (Antonios Giannopoulos)
MongoDB 4.2 goes GA soon, delivering amazing new features in multiple areas. In this talk, we will focus on the new capabilities of the aggregation framework. We are going to cover the new operators and expressions. At the same time, we will explore how update commands can now use the aggregation framework operators. We are also going to present aggregation framework improvements focusing on on-demand materialized views. Finally, we are going to explore the wildcard indexes introduced in MongoDB 4.2 and how they change the way we design documents and build queries/aggregations. We will also touch on the new index build system.
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS (Cobus Bernard)
In this session, we will take you through setting up an Amazon Redshift cluster and look at the ways you can populate it with data. We will start by using AWS DMS to replicate the data as-is, as well as doing some ETL on it. This will be followed by AWS Glue, where you can do more advanced ETL operations. Lastly, we will look at how you can use Amazon Kinesis Firehose to stream events directly to the Redshift cluster.
The document discusses how MySQL chooses query execution plans and the importance of indexing for performance. It covers the MySQL optimizer, tools for analyzing queries like EXPLAIN and TRACE, and techniques like index condition pushdown that push conditions to the storage engine. The document uses examples and a quiz to illustrate indexing concepts and how the optimizer works in MySQL.
This document discusses and compares several R tools for visualizing Hi-C data, including HiTC, HiCBricks, DNA_Rchitect, GENOVA, GenomicInteractions, Sushi, HiCeekR, and adjclust. It summarizes the file formats each tool accepts, how the data are imported and stored, and the types of visualizations provided, such as heatmaps, arcs/networks, and quality control plots. While most tools can import basic text-based formats, support for newer formats like cool and hic files is still limited. Heatmaps are widely supported but tools vary in customization options. Networks/arcs are best for small regions due to readability issues. Further development is still
This document provides an overview of the Nutch and Lucene frameworks. It describes Nutch as an open-source search engine implemented in Java that uses Lucene for indexing and searching crawled data. Both Nutch and Lucene use a plugin framework and are customizable. The document outlines Nutch's crawling and indexing processes and how it incorporates MapReduce. It also summarizes Lucene's features such as field-based indexing and searching and its use of an inverted index.
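The inverted index at the heart of Lucene maps each term to the documents containing it; a toy Python sketch:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND semantics: return documents containing every query term."""
    sets = [index.get(term.lower(), set()) for term in query.split()]
    return set.intersection(*sets) if sets else set()
```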
IBM Informix Dynamic Server 11.10 Cheetah SQL Features (Keshav Murthy)
The document summarizes new features in Informix Dynamic Server (IDS) version 11.10. Key features include:
1) Full support for subqueries in the FROM clause of SQL statements and enhancements to distributed queries.
2) New data types like Node and Binary, and a basic text search index for full text search capabilities.
3) Performance improvements to the SQL optimizer including an index self-join access method and directives for ANSI joins.
4) Enhancements to stored procedures, functions, isolation levels and utilities like SYSDBOPEN and SYSDBCLOSE.
This document discusses managing schema objects in an Oracle database. It defines schema objects as tables, constraints, indexes, views, sequences and temporary tables. It provides instructions on how to create and modify tables, define constraints on tables, view table columns and data, create indexes, views, sequences, and temporary tables. It explains the purpose and use of each schema object type.
The document outlines the key concepts of linked lists including:
- Linked lists allow for dynamic resizing and efficient insertion/deletion unlike arrays.
- A linked list contains nodes that have a data field and a pointer to the next node.
- Common operations on linked lists include insertion, deletion, searching, and traversing the list.
- The time complexity of these operations depends on whether it's at the head, tail, or interior node.
- Linked lists can be implemented using classes for the node and linked list objects.
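A minimal Python sketch of those classes and operations (singly linked; insertion at the head is O(1), while deletion of an interior node requires an O(n) walk):

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data    # data field
        self.next = next    # pointer to the next node

class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, data):
        """O(1) insertion at the head."""
        self.head = Node(data, self.head)

    def delete(self, data):
        """O(n): walk to the node, then unlink it."""
        prev, cur = None, self.head
        while cur and cur.data != data:
            prev, cur = cur, cur.next
        if cur:                         # found it
            if prev:
                prev.next = cur.next    # interior or tail node
            else:
                self.head = cur.next    # head node

    def to_list(self):
        """Traverse the list front to back."""
        out, cur = [], self.head
        while cur:
            out.append(cur.data)
            cur = cur.next
        return out
```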
How do databases perform live backups and point-in-time recovery? (Bartosz Sypytkowski)
In this talk we'll discuss in detail how modern databases are able to perform backups without downtime, and how these backups can later be used to restore a database to any point in time.
While the talk describes a generally applicable approach, Litestream (an SQLite backup service) is used as the reference implementation.
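The core idea, a base snapshot plus a replayable log cut off at the target timestamp, can be sketched with a toy key-value model (real systems such as Litestream replay WAL frames rather than logical entries like these):

```python
def restore_to(snapshot, wal, target_time):
    """Rebuild database state as of target_time: start from the
    base snapshot, then replay log entries up to the cutoff."""
    state = dict(snapshot)
    for timestamp, key, value in wal:   # WAL entries are time-ordered
        if timestamp > target_time:
            break                       # stop at the requested point in time
        state[key] = value
    return state
```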
This presentation covers HyParView and Plumtree - protocols used to build highly scalable clusters of data capable of gossiping messages between thousands of clients.
More related content
Similar to Postgres indexes: how to make them work for your application
In this talk we'll discuss the technical foundations behind Conflict-free Replicated Data Types (CRDTs), which let us create collaborative client applications - systems where independence from central servers and offline-first capability are among the founding principles. We'll cover some of the challenges bound to this approach and how to address them. Finally, we'll present Yrs - a Rust library that allows us to build rich collaborative applications on desktop and in the browser.
The document discusses modern concurrency primitives like threads, thread pools, coroutines, and schedulers. It covers why asynchronous programming with async/await is preferred over traditional threading. It also discusses challenges like sharing data across threads and blocking on I/O calls. Some solutions covered include using thread pools with dedicated I/O threads, work stealing, and introducing interruption points in long-running tasks.
During this presentation we'll quickly cover the core principles of event-sourced systems and different approaches to scaling an event log to distributed workloads. We'll focus on peer-to-peer variants: what their advantages and disadvantages are, and how we can use them.
During this talk we'll cover the theory and practical implementation behind the most common patterns in modern multi-threaded programming, and how our everyday libraries and frameworks optimize the use of operating system resources for maximum efficiency. We'll also try to understand the differences between various approaches and what tradeoffs they incur. Finally, we'll take a look at how they are supported by various compilers and runtimes.
Strongly consistent databases dominate the world of software. However, with the increasing scale and global availability of our services, many developers often prefer to loosen their consistency constraints in favor of eventual consistency.
During this presentation we'll talk about Conflict-free Replicated Data Types (CRDT) - an eventually-consistent structures, that can be found in many modern day multi-master, geo-distributed databases such as CosmosDB, DynamoDB, Riak, Cassandra or Redis: how do they work and what makes them so interesting choice in highly available systems.
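A sketch of the simplest such structure, the grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the per-slot maximum, so merges are commutative, associative and idempotent:

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica id."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}        # replica id -> that replica's local count

    def increment(self, n=1):
        # a replica only ever bumps its own slot
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # per-slot max: commutative, associative, idempotent
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)
```

Because merge order doesn't matter, replicas can exchange state whenever connectivity allows and still converge to the same total.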
This is the presentation from WG.NET (May 2019), where I discuss different aspects of virtualization, mainly in the context of programming languages. We'll cover what stack-based vs. register-based virtual machines are, what interpreters and compilers are, and how to build our own bytecode interpreter for a toy programming language.
This document discusses timekeeping in distributed systems. It begins by explaining how different types of clocks work, from pendulum clocks to atomic clocks. It then discusses key concepts like UTC, leap seconds, and how time is represented in Unix. The document also covers challenges of keeping time across distributed systems and algorithms like NTP, vector clocks, and logical clocks that help order events in a distributed system.
This document provides an introduction to Akka.NET Streams and Reactive Streams. It discusses key concepts like observables, async enumerables, and reactive streams. It also demonstrates how to build workflows with Akka.NET streams, including examples of building a TCP server. The document introduces core Akka.NET streams concepts like sources, flows, and sinks, and how they compose together in a runnable graph. It also covers testing streams with probes and materialization.
This is presentiation for Lambda Days 2019, in which I describe details behind building collaborative text editing experience using Replicated Growable Array CRDTs. Later on we come to defining its issues and how to solve them.
1. The document discusses different database storage structures like B+ trees, LSM trees, and their pros and cons for storing structured data on disk.
2. B+ trees are optimized for read performance but require copy-on-write or write-ahead logging for updates. LSM trees prioritize write performance using an append-only structure but require background merging.
3. Bloom filters can help optimize look ups in LSM trees by quickly checking if an element is not present in a collection without accessing all files.
Slides from presentation, I've made on the BuildStuff LT 2018. Here I'm talking about issues, many people have found when using RESTful APIs and how GraphQL addresses them. Also I'm trying to cover the tradeoffs made by the standard, solutions proposed by different implementations and some ideas for the future.
Most important New features of Oracle 23c for DBAs and Developers. You can get more idea from my youtube channel video from https://youtu.be/XvL5WtaC20A
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Drona Infotech is a premier mobile app development company in Noida, providing cutting-edge solutions for businesses.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
SMS API Integration in Saudi Arabia| Best SMS API ServiceYara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
Hand Rolled Applicative User ValidationCode KataPhilip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather, to provide a small, rough-and ready exercise to reinforce your muscle-memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdfVALiNTRY360
Salesforce Healthcare CRM, implemented by VALiNTRY360, revolutionizes patient management by enhancing patient engagement, streamlining administrative processes, and improving care coordination. Its advanced analytics, robust security, and seamless integration with telehealth services ensure that healthcare providers can deliver personalized, efficient, and secure patient care. By automating routine tasks and providing actionable insights, Salesforce Healthcare CRM enables healthcare providers to focus on delivering high-quality care, leading to better patient outcomes and higher satisfaction. VALiNTRY360's expertise ensures a tailored solution that meets the unique needs of any healthcare practice, from small clinics to large hospital systems.
For more info visit us https://valintry360.com/solutions/health-life-sciences
What to do when you have a perfect model for your software but you are constrained by an imperfect business model?
This talk explores the challenges of bringing modelling rigour to the business and strategy levels, and talking to your non-technical counterparts in the process.
Artificia Intellicence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
E-commerce Development Services- Hornet DynamicsHornet Dynamics
For any business hoping to succeed in the digital age, having a strong online presence is crucial. We offer Ecommerce Development Services that are customized according to your business requirements and client preferences, enabling you to create a dynamic, safe, and user-friendly online store.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
Transform Your Communication with Cloud-Based IVR SolutionsTheSMSPoint
Discover the power of Cloud-Based IVR Solutions to streamline communication processes. Embrace scalability and cost-efficiency while enhancing customer experiences with features like automated call routing and voice recognition. Accessible from anywhere, these solutions integrate seamlessly with existing systems, providing real-time analytics for continuous improvement. Revolutionize your communication strategy today with Cloud-Based IVR Solutions. Learn more at: https://thesmspoint.com/channel/cloud-telephony
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...XfilesPro
Wondering how X-Sign gained popularity in a quick time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer for Salesforce users. Explore them now!
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
12. SEQ SCAN

[Diagram: index storage (pages M1, I1–I3) above the table heap (pages T1–T7)]

1. (Hopefully) sequential I/O
2. Scans all of the table's pages
3. Doesn't use index pages
13. create index on books(publication_date);

select publication_date
from books
where publication_date > '2020/01/01';

INDEX ONLY SCAN
14. INDEX ONLY SCAN

[Diagram: index storage (pages M1, I1–I3) above the table heap (pages T1–T7)]

1. Sequential I/O over index pages
2. Doesn't touch the table's pages
15. create index on books(publication_date);

select title, publication_date
from books
where publication_date > '2020/01/01';

INDEX SCAN
16–18. INDEX SCAN

[Diagram: index storage (pages M1, I1–I3) above the table heap (pages T1–T7)]

1. Uses the index to find the first matching page of the table…
2. Positions the read cursor on that page…
3. Sequential I/O over the table's pages until the condition no longer holds
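The index-scan steps above can be sketched as a toy model (illustrative Python, not PostgreSQL internals; table contents and names are made up): the index is a list of (key, TID) entries sorted by key, and the scan binary-searches for the first qualifying entry, then follows TIDs into the heap.

```python
from bisect import bisect_right

# Hypothetical miniature heap: TID -> (title, publication_date)
heap = {1: ("Dune", "2019-08-01"), 2: ("Solaris", "2021-03-15"),
        3: ("Hyperion", "2020-06-20"), 4: ("Ubik", "2018-01-10")}

# Index entries sorted by publication_date, each pointing at a heap TID
index = [("2018-01-10", 4), ("2019-08-01", 1),
         ("2020-06-20", 3), ("2021-03-15", 2)]

def index_scan(threshold):
    # 1. use the index to locate the first entry with key > threshold
    pos = bisect_right([key for key, _ in index], threshold)
    # 2-3. follow TIDs into the heap for every remaining entry
    return [heap[tid] for _, tid in index[pos:]]

print(index_scan("2020-01-01"))
```

In the real executor the heap fetches may turn into sequential page reads when the column is well correlated with the physical tuple order (see the `pg_stats.correlation` slide below); this sketch only shows the index-to-heap indirection.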
19. create index on books
using gist(description_lex);

select title, publication_date
from books
where description_lex @@ 'epic';

BITMAP SCAN
20–21. BITMAP SCAN

[Diagram: index storage (pages M1, I1–I3), a page bitmap, and the table heap (pages T1–T7)]

1. Using the index, build a bitmap of matching pages
2. Random I/O over the pages covered by the bitmap
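A toy sketch of the two steps (illustrative only, not PostgreSQL internals): the index yields TIDs as (page, offset) pairs; the bitmap collapses them to distinct pages, so each heap page is fetched exactly once and in physical order.

```python
# TIDs returned by the index for matching tuples: (page, offset)
matching_tids = [(7, 2), (1, 5), (7, 8), (3, 1), (1, 2)]

def pages_to_visit(tids, n_pages=8):
    bitmap = [False] * n_pages
    for page, _ in tids:          # step 1: mark matching pages in the bitmap
        bitmap[page] = True
    # step 2: visit pages covered by the bitmap, in page order, each once
    return [page for page, hit in enumerate(bitmap) if hit]

print(pages_to_visit(matching_tids))
```

Note how page 7 and page 1 each appear twice in the TID list but only once in the visit order; that deduplication is what makes a bitmap scan cheaper than a plain index scan when many tuples per page match.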
22. INCLUDE & PARTIAL INDEXES

create index ix_books_by_author
on books(author_id)
include (created_at)
where author_id is not null;

[Diagram: B-Tree pages — branch pages carry only keys (4, 25); leaf pages carry key, TID and the included column, so included columns are duplicated in the index storage]
30. COLUMN-TUPLE CORRELATION

select tablename, attname, correlation
from pg_stats
where tablename = 'film';

tablename | attname      | correlation
----------+--------------+------------
film      | film_id      | 0.9979791
film      | title        | 0.9979791
film      | description  | 0.04854498
film      | release_year | 1
film      | rating       | 0.1953281
film      | last_update  | 1
film      | fulltext     | <null>
31. BRIN INDEX

1. Imprecise
2. Very small in size
3. Good for immutable records and columns aligned with tuple insert order

create index ih_events_created_at on events
using brin(created_at) with (pages_per_range = 128);
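The BRIN idea can be sketched in a few lines (a toy model, not PostgreSQL internals; range boundaries are made up): per block range the index stores only the min/max of the column, and a scan discards ranges that cannot possibly match — which is why it is tiny but imprecise.

```python
# (range_id, min_created_at, max_created_at) summaries, one per block range
page_ranges = [
    (0, "2023-01-01", "2023-03-31"),
    (1, "2023-04-01", "2023-06-30"),
    (2, "2023-07-01", "2023-09-30"),
]

def candidate_ranges(lo, hi):
    # keep every range whose [min, max] overlaps [lo, hi];
    # imprecise: surviving ranges must still be rechecked tuple by tuple
    return [rid for rid, rmin, rmax in page_ranges
            if not (rmax < lo or rmin > hi)]

print(candidate_ranges("2023-05-15", "2023-08-01"))
```

If rows are appended in `created_at` order the ranges barely overlap and most of the table is skipped; with random insert order every range spans the whole domain and the index filters out nothing.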
32. BLOOM INDEX
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
37. BLOOM INDEX

1. Small in size
2. Good for exclusion/narrowing
3. False positive ratio calculator: hur.st/bloomfilter/

create extension bloom;
create index ix_active_codes
on active_codes using bloom(keycode)
with (length=80, col1=2);
-- length: number of bits per record
-- col1: number of hashes for the column
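The structure behind the index can be sketched as a plain bloom filter (a minimal toy, not the extension's actual on-disk format; the parameter names mirror the slide — `length` bits, `col1`-style hash count):

```python
import hashlib

class Bloom:
    def __init__(self, length=80, hashes=2):
        self.length, self.hashes, self.bits = length, hashes, 0

    def _positions(self, key):
        # derive `hashes` bit positions from independent salted hashes
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.length

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True may be a false positive
        return all(self.bits >> pos & 1 for pos in self._positions(key))

b = Bloom()
b.add("ABC-123")
print(b.might_contain("ABC-123"))  # True: added keys are never missed
print(b.might_contain("XYZ-999"))  # usually False, but may false-positive
```

That asymmetry is exactly why the slide says "good for exclusion/narrowing": a negative answer is certain, a positive one only means "go recheck the heap".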
40. GiST INDEX
TSVECTOR
-- gist cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;
create index idx_film_description_lex
on film using gist(description_lex);
select * from film where description_lex @@ 'epic';
Bitmap Heap Scan on film (cost=4.18..20.32 rows=5 width=416)
Recheck Cond: (description_lex @@ '''epic'''::tsquery)
-> Bitmap Index Scan on idx_film_description_lex (cost=0.00..4.18 rows=5 width=0)
Index Cond: (description_lex @@ '''epic'''::tsquery)
Query Plan
45. SP-GiST INDEX

-- spgist can be created on a text column, but not on varchar
create index idx_film_title on film using spgist(title);

select * from film
where title like 'A Fast-Paced% in New Orleans';

Bitmap Heap Scan on film (cost=8.66..79.03 rows=51 width=416)
Filter: (title ~~ 'A Fast-Paced%'::text)
-> Bitmap Index Scan on idx_film_title (cost=0.00..8.64 rows=50 width=0)
Index Cond: ((title ~>=~ 'A Fast-Paced'::text) AND (title ~<~ 'A Fast-Pacee'::text))
Query Plan
46. SP-GiST INDEX

1. Just like GiST, but faster for some ops…
2. … but unable to perform some others
3. Indexed space is partitioned into non-overlapping regions

create index ix_files_path
on files using spgist(path);
48. GIN INDEX

-- gin cannot be applied directly on text columns
alter table film add column
description_lex tsvector
generated always as (to_tsvector('english', description))
stored;

create index idx_film_description_lex
on film using gin(description_lex);

select * from film where description_lex @@ 'epic';

Bitmap Heap Scan on film (cost=8.04..24.18 rows=5 width=416)
Recheck Cond: (description_lex @@ '''epic'''::tsquery)
-> Bitmap Index Scan on idx_film_description_lex (cost=0.00..8.04 rows=5 width=0)
Index Cond: (description_lex @@ '''epic'''::tsquery)
Query Plan
50. GIN INDEX

create index ix_books_content
on books using gin(content_lex);

1. Reads are usually faster than GiST
2. Writes are usually slower than GiST
3. Index size is greater than GiST
53. RUM INDEX

-- similarity ranking
select description_lex <=> to_tsquery('epic') as similarity
from books;

-- find descriptions with 2 words located one after another
select * from books
where description_lex @@ to_tsquery('hello <-> world');
54. RUM INDEX

1. GIN on steroids (bigger but more capable)
2. Allows querying for terms and their relative positions in the text
3. Supports Index Scan and EXCLUDE

create extension rum;
create index ix_books_content
on books using rum(content_lex);
56. LIMITS OF FULL-TEXT SEARCH

1. Uses only lexical similarity and is language-sensitive
2. Misses the context (meaning)
3. Works only on text

-- these two queries mean nearly the same thing, yet lexical
-- full-text search treats 'castle' and 'fortress' as unrelated
select * from books
where description_lex @@
to_tsquery('white <-> castle');

select * from books
where description_lex @@
to_tsquery('white <-> fortress');
59. IVFFLAT VECTOR INDEX

create index ix_books_content on books
using ivfflat(embedding vector_l2_ops) with (lists = 1000);

select * from items where embedding <-> '[3,1,2]' < 5;
60. INVERTED FILE WITH FLAT COMPRESSION

[Diagram: points A–O in a 2-D embedding space, partitioned into lists L1–L3, each represented by a centroid C1–C3]

create index on items
using ivfflat(embedding vector_l2_ops)
with (lists=3);
64. INVERTED FILE WITH FLAT COMPRESSION

[Diagram: the query vector [1,2] is compared against centroids C1–C3; only the 2 probed lists are scanned exhaustively]

set ivfflat.probes = 2;

select * from items
where embedding <-> '[1,2]' < 5;

select * from items
order by embedding <-> '[1,2]'
limit 4;
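The probe mechanism above can be sketched as a toy model (illustrative Python with made-up centroids and vectors, not pgvector internals): at build time each vector joins the list of its nearest centroid; at query time only the `probes` closest lists are scanned exactly ("flat").

```python
def dist2(a, b):
    # squared L2 distance, matching the <-> operator's vector_l2_ops
    return sum((x - y) ** 2 for x, y in zip(a, b))

centroids = {"L1": (1.0, 1.0), "L2": (5.0, 5.0), "L3": (9.0, 1.0)}
lists = {"L1": [(0.5, 1.2), (1.5, 0.8)],
         "L2": [(4.8, 5.1), (5.5, 4.9)],
         "L3": [(9.2, 0.9), (8.7, 1.3)]}

def ivf_search(query, probes=2, k=2):
    # pick the `probes` nearest lists by centroid distance...
    probed = sorted(centroids, key=lambda c: dist2(query, centroids[c]))[:probes]
    # ...then do an exact scan inside those lists only
    candidates = [v for c in probed for v in lists[c]]
    return sorted(candidates, key=lambda v: dist2(query, v))[:k]

print(ivf_search((1, 2), probes=2, k=2))
```

Vectors living in unprobed lists are never considered, which is why the search is approximate: raising `probes` trades speed for recall.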
69. HNSW VECTOR INDEX
create index on items using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
70. HNSW VECTOR INDEX

[Diagram: points A–O in a 2-D embedding space, connected into a navigable small-world graph built layer by layer (Layer 1–3)]

create index on items
using hnsw(embedding vector_l2_ops)
with (m=2, ef_construction=3);
75. HNSW VECTOR INDEX

[Diagram: the search for the query vector [1,2] greedily descends through the graph layers toward its nearest neighbours]

set hnsw.ef_search = 2;

select * from items
order by embedding <-> '[1,2]'
limit 4;
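The descent on a single layer can be sketched as greedy graph search (a toy model with made-up points and edges; real HNSW keeps a candidate beam of size `ef_search` across multiple layers): starting from an entry point, keep moving to whichever neighbour is closest to the query until no neighbour improves.

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

points = {"A": (0, 0), "B": (2, 1), "C": (4, 2), "D": (5, 5), "E": (1, 2)}
graph = {"A": ["B", "E"], "B": ["A", "C", "E"],
         "C": ["B", "D"], "D": ["C"], "E": ["A", "B"]}

def greedy_search(query, entry="D"):
    current = entry
    while True:
        # pick the neighbour closest to the query...
        best = min(graph[current], key=lambda n: dist2(query, points[n]))
        # ...and stop at a local minimum of the distance
        if dist2(query, points[best]) >= dist2(query, points[current]):
            return current
        current = best

print(greedy_search((1, 2)))
```

Because only a handful of edges are followed, the search touches a tiny fraction of the vectors; the beam width (`ef_search`) and edge count (`m`) control how often this greedy walk gets stuck in a wrong local minimum.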
80. IVFFLAT INDEX vs HNSW INDEX

IVFFLAT INDEX:
1. Fast build time
2. Smaller size
3. Slower query performance
4. Bad for frequent index updates

HNSW INDEX:
1. Slow initial build time
2. Bigger index size
3. Faster query performance
4. Better recall after updates