5. What are Query Rewrites?
They come in two flavors
• Pre-Parse: string-to-string
– Low overhead
– No structure
• Post-Parse: parse tree
– Retains structure
– Requires re-parse (no destructive editing)
– Needs to traverse the parse tree
– Only SELECT statements
6. Program Agenda
What query rewrites are
The APIs
Declaring plugins
Writing plugins
(Plugin services)
Bonus: Writing a Post-Parse Plugin
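7. Query Rewrite Plugin API - Overview
[Diagram: inside the Server, the query arrives from the Network as text and goes to the Parser; the Parser hands a LEX (parse tree) to the Optimizer. The QR Plugin(s) are called through the Pre-parse QR API on the text and through the Post-parse QR API on the LEX.]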
10. Audit API
“Database auditing involves observing a database
so as to be aware of the actions of database users.
Database administrators and consultants often set
up auditing for security purposes, for example, to
ensure that those without the permission to access
information do not access it.”
Source: http://en.wikipedia.org/wiki/Database_audit
11. Audit API provides infrastructure for
● Locking
● Caching
● Gathering
● Event model
● Parameter passing
18. Skeleton of a Query Rewrite Plugin (Pre-Parse)
Doing the Pre-Parse Rewrite
static int do_the_rewrite(MYSQL_THD,
                          mysql_event_class_t event_class,
                          const void *event)
{
  // Safe without checking event_class: we only subscribe to parse events.
  const mysql_event_parse *event_parse=
    static_cast<const mysql_event_parse *>(event);
  if (event_parse->event_subclass == MYSQL_AUDIT_PARSE_PREPARSE)
  {
    size_t query_length= event_parse->query.length;
    // Allocate through the alloc service so the memory is instrumented.
    char *rewritten_query=
      static_cast<char *>(my_malloc(key_memory_rewrite_example,
                                    query_length + 1, MYF(0)));
    // Copy including the terminating '\0' (hence the + 1).
    for (size_t i= 0; i < query_length + 1; ++i)
      rewritten_query[i]= tolower(event_parse->query.str[i]);
    event_parse->rewritten_query->str= rewritten_query;
    event_parse->rewritten_query->length= query_length;
    // Without this flag the rewritten query won't be used.
    *reinterpret_cast<int *>(event_parse->flags)|=
      MYSQL_AUDIT_PARSE_REWRITE_PLUGIN_QUERY_REWRITTEN;
  }
  return 0;
}
20. What is A Plugin Service?
[Diagram: the Server's Audit API calls the Plugins; the Plugins in turn call Services provided by the Server.]
21. The Parser Service
This Service Lets a Plugin:
• Parse a string, get:
● Normalized query
● Query digest
• Traverse a parse tree:
● Find positions of literals
• Print literals
22. The Parser Service
In code (include/mysql/service_parser.h):
int mysql_parser_parse(MYSQL_THD thd, const MYSQL_LEX_STRING query,
                       unsigned char is_prepared,
                       sql_condition_handler_function handle_cond,
                       void *condition_handler_state);
MYSQL_LEX_STRING mysql_parser_get_normalized_query(MYSQL_THD thd);
int mysql_parser_get_statement_digest(MYSQL_THD thd, uchar *digest);
typedef
int (*parse_node_visit_function)(MYSQL_ITEM item, unsigned char* arg);
int mysql_parser_visit_tree(MYSQL_THD thd,
                            parse_node_visit_function processor,
                            unsigned char* arg);
MYSQL_LEX_STRING mysql_parser_item_string(MYSQL_ITEM item);
23. The Alloc Service
Lets a Plugin:
• Allocate
• Deallocate
• Instrument
Code (include/mysql/service_mysql_alloc.h)
• my_malloc
• my_realloc
• my_claim
• my_free
• my_memdup
• my_strdup
• my_strndup
26. Skeleton of a Post-Parse Query Rewrite Plugin
Catching a Literal
int catch_literal(MYSQL_ITEM item, unsigned char* arg)
{
  MYSQL_LEX_STRING *result_string_ptr= (MYSQL_LEX_STRING*)arg;
  // Only the first literal is stored; str is still NULL until then.
  if (result_string_ptr->str == NULL)
  {
    *result_string_ptr= mysql_parser_item_string(item);
    return 0;
  }
  return 1;
}
27. Skeleton of a Post-Parse Query Rewrite Plugin
Result
mysql> INSTALL PLUGIN post_parse_example SONAME 'post_parse_example.so';
Query OK, 0 rows affected (0,01 sec)
mysql> SELECT 'abc', 'def';
+-----+-----+
| abc | def |
+-----+-----+
| abc | def |
+-----+-----+
1 row in set, 1 warning (0,00 sec)
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1105
Message: Query 'SELECT 'abc', 'def'' rewritten to '/* First literal: 'abc' */
SELECT 'abc', 'def'' by a query rewrite plugin
1 row in set (0,00 sec)
28. Links
Blog posts:
• mysqlserverteam.com/write-yourself-a-query-rewrite-plugin-part-1/
• mysqlserverteam.com/the-query-rewrite-plugins/
In the Manual:
• dev.mysql.com/doc/refman/5.7/en/plugin-api.html
• dev.mysql.com/doc/refman/5.7/en/plugin-types.html
• dev.mysql.com/doc/refman/5.7/en/plugin-services.html
• dev.mysql.com/doc/refman/5.7/en/writing-audit-plugins.html
• dev.mysql.com/doc/refman/5.7/en/performance-schema-statement-digests.html
In the Code:
• include/mysql/service_parser.h
• include/mysql/services.h
Editor's Notes
Survey
Who has used plugins (installing, uninstalling)?
Who has written plugins?
Who has used QR plugins?
Who has written a QR plugin?
Who saw Sveta's presentation?
Who understood it?
Here is my agenda for today. Short on time, so more superficial than I would like. First time.
First we sort out what query rewrites are and what they aren't. I've had all sorts of questions.
Then we'll go over the QR API. Actually, APIs.
We will take a look at how to declare a QR plugin.
We will actually write a little plugin together.
If time allows, I'll give you an introduction to plugin services.
I'll throw in a little bonus for you: a post-parse plugin. Much more complex, and I don't know of anyone else who has dared to write one.
Inside the server, query rewrites could potentially get very complex because there are all sorts of rewrites happening at different stages. We like to keep it simple, as if the query was intercepted on the way. Once a query is rewritten, it is the query. To the user it should look like a query was rewritten: you get notified, but then the old query is never seen again. Something like this.
We talked about implementing it this way, as some sort of proxy, possibly even on a different machine. But then it'd be more complex, with an extra component running. Not to mention cruel to animals.
There's also some benefit to parsing the string first. And what do I mean by that?
What QR plugins do is intercept an incoming SQL command before it reaches the query optimizer and change it into a different command. What they can't do is stop a query from happening, i.e. filter out queries. And you can't turn one query into two queries, for instance.
A pre-parse QR plugin takes a string and returns a string. Very fast and efficient. The downside: no structure. Great for custom commands.
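To make the string-to-string idea concrete, here is a minimal standalone sketch in plain C++ with no server headers. The custom command `SHOW MY_STATS` and the query it expands to are invented for illustration; a real pre-parse plugin would do this kind of matching inside its notify function.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical custom command and its expansion (both made up for this sketch).
static const std::string kCustomCommand = "show my_stats";
static const std::string kExpansion = "SELECT * FROM stats_db.summary";

// Returns true and fills *rewritten if `query` is the custom command.
// The comparison ignores case, surrounding whitespace and a trailing ';'.
bool rewrite_custom_command(const std::string &query, std::string *rewritten)
{
  assert(rewritten != nullptr);
  const std::size_t begin = query.find_first_not_of(" \t\r\n");
  const std::size_t end = query.find_last_not_of(" \t\r\n;");
  if (begin == std::string::npos || end == std::string::npos || end < begin)
    return false;

  std::string trimmed = query.substr(begin, end - begin + 1);
  for (std::size_t i = 0; i < trimmed.size(); ++i)
    trimmed[i] = static_cast<char>(
        std::tolower(static_cast<unsigned char>(trimmed[i])));

  if (trimmed != kCustomCommand)
    return false;            // not our command: leave the query alone
  *rewritten = kExpansion;   // pure string in, string out; no parse tree
  return true;
}
```

Note that nothing here knows about SQL structure: the function would just as happily match the text inside a comment if it were written to search rather than compare, which is the downside mentioned above.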
A post-parse QR plugin operates on the parse tree instead. The good thing is that we can pick out the parts of the query that we're interested in. If we want to look for a certain literal ('abc', say), we won't get a false hit if it's inside a comment, because comments are removed.
If we rewrite a query this way, we have to build a new string and re-parse it; there's no destructive editing. And we have a parse tree to traverse. That may be expensive. Remember: this happens on every query. We can however use query digests as a quick reject test, which I'll get to later.
We currently only support SELECT statements.
...
Here's a schematic of the query rewrite APIs. The query first comes in from the 'network' and is then normally sent straight to the parser. With the QR API we can intercept the query here. We then call out to the plugins.
They can intercept it at one of two points: either before or right after parsing is done by the server.
Before the parser, the query is text. Between the parser and the optimizer, we pass a parse tree, usually called the LEX. This is what the post-parse API intercepts.
Let's take a moment to look at what query rewrites look like in practice. Here I use the example plugin that is included in the distribution. If you build from source you need to start the server with the --plugin-dir flag.
First, install the plugin, and it will start rewriting queries right away. This example plugin will simply rewrite all queries to lowercase. As you can see it treats every character the same way. It doesn't matter if it's a keyword or a quoted string.
The QR API raises an SQL Note saying that the query was rewritten and from that point on the old query is gone and the new one is the query.
You use the SHOW WARNINGS command to see the Note. What is a little strange at first is that the SHOW WARNINGS command itself is rewritten, and the warning is queued up before parsing, so by the time the command is executed, there is a warning about the show command itself being rewritten.
Let's look at the APIs.
The APIs are tiered, with the server at the bottom and the specific pre- and post-parse APIs at the top.
The specific APIs call the plugins and handle the result.
The second layer from the top is the general QR API. This is where the SQL Note 'warning' that a query is rewritten is produced.
The lowest layer above the server is the Audit API. Let's take a moment to look at it.
The audit API lets you use pluggable auditing. Here is the definition of database auditing from Wikipedia. Auditing is essentially a type of logging to detect illegal activities in the database, such as security breaches and suspicious activity. There are various laws that have different requirements on what should be logged.
Some laws require logging of which queries are performed.
Some laws require logging of who sees certain data.
Some laws require logging of modification of the data.
Some require logging of all meta-data changes.
An audit plugin registers itself to listen to various events:
Logging in
What SELECTs are performed
What UPDATEs and INSERTs are performed
Etc. All are events.
The audit API provides infrastructure for locking plugins (so they don't get uninstalled while logging). In order to make this locking scale to thousands of parse events per second we need some clever caching here.
The audit API has this nice feature where it collects the plugins according to event. So by the time an event happens, we already have all the plugins queued up that want to get notified of this event. That does wonders for performance.
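The gathering idea can be sketched outside the server as a small dispatcher. This is only an illustration of grouping subscribers per event class up front; the names `Dispatcher`, `Plugin` and the two event classes are stand-ins invented here, and the server's own data structures (with their locking and caching) are considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in event classes; the real audit API defines its own set.
enum EventClass { kConnection = 0, kParse = 1, kEventClassCount = 2 };

typedef int (*NotifyFn)(EventClass event_class, const void *event);

struct Plugin
{
  unsigned long class_mask;  // bit i set => plugin wants events of class i
  NotifyFn notify;
};

// Example notify function that does nothing and reports success.
int noop_notify(EventClass, const void *) { return 0; }

class Dispatcher
{
public:
  // Sort the plugin into one list per subscribed class at install time...
  void install(const Plugin *plugin)
  {
    assert(plugin != NULL);
    for (int c = 0; c < kEventClassCount; ++c)
      if (plugin->class_mask & (1UL << c))
        subscribers_[c].push_back(plugin);
  }

  // ...so firing an event only walks the plugins that asked for it.
  // Returns how many plugins were notified.
  std::size_t fire(EventClass event_class, const void *event) const
  {
    const std::vector<const Plugin *> &list = subscribers_[event_class];
    for (std::size_t i = 0; i < list.size(); ++i)
      list[i]->notify(event_class, event);
    return list.size();
  }

private:
  std::vector<const Plugin *> subscribers_[kEventClassCount];
};
```

The point of the design is that the per-event cost is proportional to the number of interested plugins, not to the number of installed plugins.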
So we discovered after some time that query rewrites are just a special case of auditing. We just added a 'pre-parse event' and a 'post-parse event'. The parse events fit nicely into the event model of the audit API.
We also have a working infrastructure for passing parameters to the plugin. It's a bit crude, based on void* pointers and bitmaps, but this way we know that it works on all platforms.
With that, I'm going to dive into actual code snippets.
These examples are more or less straight out of the manual. I'll give you the links at the end.
I'm going to start with the type-specific plugin descriptor. This is not really my area of expertise, so I'll give a shallow introduction.
For interface_version, the convention is that type-specific plugin descriptors use the interface version for the given plugin type. The actual versioning is done in the general descriptor on the next slide. For audit plugins, the value of the interface_version member is MYSQL_AUDIT_INTERFACE_VERSION.
The plugin may wish to be notified when the server dissociates it from the current session. Typically this happens when the session is closed.
There's the notify function, which is called when the event is fired. Which event it is comes in the next part.
And the class masks. In this case the class mask says that the plugin is both a pre- and a post-parse QR plugin.
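Since the descriptor slide itself is not reproduced here, the following is a rough standalone mock of the four members just described. Every type, constant and value in it (`AuditDescriptor`, `kInterfaceVersion`, the event classes and subclass bits) is a stand-in invented for this sketch; the real definitions are in the server's audit plugin header and differ in names and values.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins only; not the server's definitions.
typedef void *SessionHandle;                  // plays the role of MYSQL_THD
enum EventClass { kGeneralClass = 0, kConnectionClass = 1, kParseClass = 2,
                  kClassCount = 3 };
enum ParseSubclass { kPreParse = 1 << 0, kPostParse = 1 << 1 };
const int kInterfaceVersion = 0x0401;         // placeholder value

struct AuditDescriptor
{
  int interface_version;                      // version for this plugin type
  void (*release_thd)(SessionHandle);         // session-dissociation hook
  int (*event_notify)(SessionHandle, EventClass, const void *);
  unsigned long class_mask[kClassCount];      // subclass bits per event class
};

static int my_notify(SessionHandle, EventClass, const void *) { return 0; }

static AuditDescriptor my_descriptor =
{
  kInterfaceVersion,
  NULL,                                       // not interested in release
  my_notify,
  { 0, 0, kPreParse | kPostParse }            // both pre- and post-parse
};

// True if the descriptor subscribes to subclass `s` of event class `c`.
bool wants(const AuditDescriptor &d, EventClass c, ParseSubclass s)
{
  return (d.class_mask[c] & static_cast<unsigned long>(s)) != 0;
}
```

The class mask is what lets the server decide, before any event fires, which plugins need to be called for it.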
General plugin descriptor.
The first line identifies this as an audit plugin. The second line is the descriptor from the last slide.
The name, author and description are what you choose them to be. The name is used in the INSTALL PLUGIN statement.
We hope your plugin will be GPL'ed.
Init and deinit are called first and last, respectively. Typically you register PFS (performance schema) instrumentation in the init function. It gets torn down automatically when the plugin is uninstalled.
We have the versioning, where the plugin API checks that a plugin is compatible with the API and otherwise doesn't load it. Instead it raises an error.
Any plugin can define status and system variables. I won't go into details on that here because my floor time is limited. But they are visible in information_schema and in SHOW STATUS/SHOW VARIABLES while the plugin is installed. They work automatically.
10 min..
Now we're ready to write a QR plugin! Start with the #includes. You will want to include my_global first, or I guarantee you will be running into various problems and you will get pissed off and then you'll end up doing it anyway. So save yourself the aggravation and trust me on this.
Here's my init function. This is called when the plugin is installed, and if you don't return 0, you will get an error message that the plugin failed to install.
Here's the rewrite function. It takes an opaque session object. Opaque is a fancy way of saying void*.
Then we have the event class, and the event itself. The actual type of the event depends on the event class.
You can also "abort" a query, in which case you get a warning with the number that you return.
Let's look at the rewriting function.
Like I said, this plugin is both a pre- and a post-parse plugin. So there are two events.
Because we only subscribe to parse events, we can trust that the event is a parse event. If we subscribed to other events, we would've had to look at the event_class before this cast.
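For a plugin that subscribes to more than one event class, the check before the cast might look like the sketch below. The event structs and the `query_of` helper are stand-ins made up for this illustration, not the server's types; the point is only that a `const void *` must not be cast until the class tag says what it points to.

```cpp
#include <cassert>
#include <string>

// Stand-in event classes and payloads, invented for this sketch.
enum EventClass { kConnectionClass, kParseClass };
struct ConnectionEvent { const char *user; };
struct ParseEvent { const char *query; };

// Returns the query text if `event` is a parse event, or "" otherwise.
std::string query_of(EventClass event_class, const void *event)
{
  assert(event != nullptr);
  if (event_class != kParseClass)
    return std::string();  // some other event type: do not cast
  // The class tag tells us which struct is behind the void pointer.
  const ParseEvent *parse = static_cast<const ParseEvent *>(event);
  return parse->query;
}
```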
Now we need to know: are we before or after parsing? What we do next depends completely on this. Note that this is only an example to show as many details as possible. In real life you probably don't want to rewrite the query both before and after parsing.
When we're done, return 0.
Now let's take a look at how to rewrite a query string.
We start by using the alloc service to allocate a new query string. Then we just copy character by character, changing to lowercase in this case. Point the event to the new query. Set the flag that it's rewritten, or the rewritten query won't be used.
If you look closely at the my_malloc call, it's using a PFS "memory key" to do instrumentation. I'll tell you more about that in a few slides.
...
Let's talk about services. Plugins can't accomplish very much without some help from the server. This is especially true of post-parse plugins, because all they have is an opaque pointer to a parse tree with no primitives to work on it.
In services the flow of calls goes like in this slide. The API calls the plugin, which processes the input. In so doing it may call up functionality in the server. Functionality offered by the server to plugins is called a service. I will mention the two services that I deem the most important ones for writing QR plugins: the Parser service and the Alloc service.
First of all, the parser service lets you hand a string to the parser. You want to do this after your post-parse rewrite is complete. Of course, if you're writing a post-parse plugin you already have a parse tree set up by the time the notify function is called.
There is only one post-parse QR plugin that I know of, the one we wrote. It's called Rewriter. It works by using normalized queries and digests to pattern-match queries that should be rewritten.
A normalized query has all the literals anonymized. A digest is an MD5 of it. We did it this way to get reasonable performance, because query rewrites are potentially very expensive. With MD5s and a lookup in a hash table we get a few percent of overhead, which is reasonable.
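Here is a toy version of the quick reject test, to show the shape of the idea rather than the Rewriter plugin's actual code. Two loud assumptions: the `normalize` function below is a crude character-level stand-in for the server's normalizer, and `std::hash` is swapped in for MD5 to keep the sketch free of dependencies. All names are invented for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

static bool is_ident_char(char c)
{
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Crude anonymizer: single-quoted strings and bare integers become '?'.
std::string normalize(const std::string &query)
{
  std::string out;
  std::size_t i = 0;
  while (i < query.size())
  {
    const char c = query[i];
    if (c == '\'')
    {
      ++i;                                         // opening quote
      while (i < query.size() && query[i] != '\'') ++i;
      if (i < query.size()) ++i;                   // closing quote
      out += '?';
    }
    else if (std::isdigit(static_cast<unsigned char>(c)) &&
             (out.empty() || !is_ident_char(out.back())))
    {
      while (i < query.size() &&
             std::isdigit(static_cast<unsigned char>(query[i]))) ++i;
      out += '?';
    }
    else
      out += query[i++];                           // copy everything else
  }
  return out;
}

struct Rule { std::string normalized_pattern; std::string replacement; };
typedef std::unordered_map<std::size_t, Rule> RuleTable;

// Stand-in digest: std::hash instead of MD5.
std::size_t digest(const std::string &normalized)
{
  return std::hash<std::string>()(normalized);
}

void add_rule(RuleTable *rules, const std::string &pattern,
              const std::string &replacement)
{
  assert(rules != NULL);
  Rule rule = { normalize(pattern), replacement };
  (*rules)[digest(rule.normalized_pattern)] = rule;
}

// Most queries miss on the digest lookup and are rejected cheaply; a hit
// is confirmed by comparing the normalized text.
const Rule *find_rule(const RuleTable &rules, const std::string &query)
{
  const std::string normalized = normalize(query);
  RuleTable::const_iterator it = rules.find(digest(normalized));
  if (it == rules.end()) return NULL;
  return it->second.normalized_pattern == normalized ? &it->second : NULL;
}
```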
The way it's used is that we get the positions of literals that should be replaced. Then we build a new string and hand it to the parser service.
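Building the new string from literal positions could look roughly like this. The `Replacement` struct and `build_rewritten` function are hypothetical names for this sketch, and the offsets are plain byte offsets into the original query text; how the real plugin represents positions is not shown in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One literal to swap out: where it sits in the query and what replaces it.
struct Replacement
{
  std::size_t offset;   // byte offset of the literal in the original query
  std::size_t length;   // length of the literal, quotes included
  std::string text;     // text to put there instead
};

// Builds a new string; the original query is never edited in place.
// `edits` must be sorted by offset and must not overlap.
std::string build_rewritten(const std::string &query,
                            const std::vector<Replacement> &edits)
{
  std::string out;
  std::size_t cursor = 0;
  for (std::size_t i = 0; i < edits.size(); ++i)
  {
    const Replacement &e = edits[i];
    assert(e.offset >= cursor && e.offset + e.length <= query.size());
    out.append(query, cursor, e.offset - cursor);  // text before the literal
    out += e.text;                                 // the replacement
    cursor = e.offset + e.length;                  // skip the old literal
  }
  out.append(query, cursor, std::string::npos);    // remaining tail
  return out;
}
```

The resulting string is what would then be handed to the parser service for re-parsing.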
Covering the parser service exhaustively would be a full presentation of its own, so I will just have to skip most of the gritty details here.
And covering the Rewriter plugin would also be a full presentation of its own.
This is what actual code looks like. Just an excerpt. Literals are visited with a callback function parse_node_visit_function.
The alloc service lets you (you guessed it) allocate memory. So why would you want to do that instead of rolling your own memory allocation?
The main reason is that you get instrumentation through PFS. The good thing is that this data sits in PFS together with all the other instrumentation data from the rest of the server.
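To give a feel for what keyed instrumentation buys you, here is a toy allocator that tracks bytes in use per memory key. It is a sketch under stated assumptions, not the alloc service: `tracked_malloc`, `tracked_free`, `bytes_in_use` and the global map are all invented here, and the map merely stands in for the performance schema tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>

typedef unsigned int MemoryKey;

// Bytes currently allocated per key; stands in for performance schema data.
static std::map<MemoryKey, std::size_t> g_bytes_by_key;

// Bookkeeping stored just in front of each user block.
struct alignas(std::max_align_t) Header
{
  MemoryKey key;
  std::size_t size;
};

void *tracked_malloc(MemoryKey key, std::size_t size)
{
  Header *h = static_cast<Header *>(std::malloc(sizeof(Header) + size));
  if (h == NULL) return NULL;
  h->key = key;
  h->size = size;
  g_bytes_by_key[key] += size;   // account the allocation to its key
  return h + 1;                  // user memory starts after the header
}

void tracked_free(void *ptr)
{
  if (ptr == NULL) return;
  Header *h = static_cast<Header *>(ptr) - 1;
  g_bytes_by_key[h->key] -= h->size;
  std::free(h);
}

std::size_t bytes_in_use(MemoryKey key)
{
  std::map<MemoryKey, std::size_t>::const_iterator it =
      g_bytes_by_key.find(key);
  return it == g_bytes_by_key.end() ? 0 : it->second;
}
```

Because the key travels with the block, freeing needs no extra arguments, and usage can be reported per plugin or per purpose.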
...
Now let's take a look at a post-parse rewrite. This is a quick example that I cooked up. The post-parse API is kind of limited and heavily geared towards the existing Rewriter plugin. That plugin is way too complex to cover here.
This will simply pick out the first literal in the query and put it in a comment. Not very useful, but hopefully it can illustrate how it works.
We start by using the alloc service to allocate a new query string again.
So we call the parser service with a callback function catch_literal that catches the first literal.
Then we start building a new string.
Then parse it.
Set the flag.
Here's the callback function that catches the first literal.
It uses the parser service to print the item.
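The whole post-parse flow can be mimicked outside the server with a fake "parse tree". In this sketch the tree is just a list of literal items, `visit_literals` plays the role of the tree visitor, and a non-zero return from the callback is assumed to stop the walk. All names are stand-ins, and the exact separator between the comment and the query is a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a parse tree item: only the printed literal text.
struct Item { std::string text; };
typedef int (*VisitFn)(const Item *item, unsigned char *arg);

// Calls `visit` on each literal; a non-zero return stops the walk (assumed).
int visit_literals(const std::vector<Item> &tree, VisitFn visit,
                   unsigned char *arg)
{
  for (std::size_t i = 0; i < tree.size(); ++i)
    if (visit(&tree[i], arg) != 0)
      return 1;
  return 0;
}

// Same logic as catch_literal: store the first literal, stop at the next.
// An empty string marks "nothing caught yet"; printed literals include
// their quotes, so they are never empty.
int catch_first(const Item *item, unsigned char *arg)
{
  std::string *result = reinterpret_cast<std::string *>(arg);
  if (result->empty())
  {
    *result = item->text;
    return 0;
  }
  return 1;
}

// Builds the rewritten query: the first literal in a leading comment.
std::string rewrite_with_comment(const std::string &query,
                                 const std::vector<Item> &tree)
{
  std::string first;
  visit_literals(tree, catch_first,
                 reinterpret_cast<unsigned char *>(&first));
  if (first.empty())
    return query;                       // no literal: nothing to rewrite
  return "/* First literal: " + first + " */ " + query;
}
```

In the real plugin the new string would then be re-parsed through the parser service and the rewritten flag set, as in the steps above.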