Database Tools by Skype
Asko Oja
Stored Procedure API
   Skype has had from the beginning the requirement that all database access must be
    implemented through stored procedures.
   Function-based database access has further general benefits:
       It is good to have the SQL statements that operate on data near the tables. That
        makes the DBAs' life easier.
       It makes it possible to optimize and reorganize tables transparently to the
        application.
       It enables DBAs to move parts of a database into another database without
        changing the application interface.
       Security is easier to manage if you don't have to do it on table level. In most
        cases you need to control what a user can do and on which data, not on which
        tables.
       All transactions can be made in autocommit mode. That means the absolute
        minimal number of roundtrips (1) for each query, and each transaction also
        takes the shortest possible time on the server - remember that the various
        locks that transactions acquire are released on COMMIT.
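   As a sketch of the style (this particular function is illustrative, not Skype's actual
    code), the application calls a function like the one below and never touches the
    users table directly:

     CREATE FUNCTION get_user_email(i_username text)
     RETURNS text AS $$
         -- the only access path applications get to e-mail addresses;
         -- the users table itself is not granted to application roles
         SELECT email FROM users WHERE username = lower(i_username);
     $$ LANGUAGE sql SECURITY DEFINER;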
Keeping Online Databases Slim and Fit
   Rule 1: Minimize the number of connections. Channel all incoming queries through
    an optimal number of database connections using pgBouncer.
   Rule 2: Minimize the number of indexes and constraints that use up performance
    and resources. For example, back-office applications quite often need more and
    different indexes than are needed for online queries.
   Rule 3: Keep as little data as possible in the online database. Move all data that is
    not needed into second or third tier databases. Use remote calls to access it if
    needed.
   Rule 4: Keep transactions short. Using functions is the best way to do this, as you
    can call functions in autocommit mode and still do complex SQL operations.
   Rule 5: Keep external dependencies to a minimum; then you have fewer things that
    can break and affect the online database. For example, run batch jobs against
    second and third tier databases.
Overall picture

[Diagram: overall picture]
Online databases
● Proxy db's (proxydb instances), each fronted by a pgBouncer
● pgBouncers channeling traffic into the databases
● OLTP db's: onlinedb_p01, onlinedb_p02, shopdb (all running SkyTools)
● read only db's: shopdb_ro (running SkyTools)
Support databases
● Queue db's: queuedb (running SkyTools)
● Datamining, batchjobs, backoffice and Greenplum databases: batchdb, analysisdb,
  backofficedb
pgBouncer – PostgreSQL Connection pooler
   pgBouncer is a lightweight and robust connection pooler for PostgreSQL.
   Reduces thousands of incoming connections to only tens of connections in the
    database.
   A low number of connections is important because each connection uses computer
    resources, and each new connection is quite expensive as prepared plans have to
    be created from scratch each time.
   We are not using pgBouncer for load balancing.
   Can be used to redirect database calls (database aliases).

     [Diagram: FrontEnd and WebServer connections funneled through pgBouncer
      into userdb]
plProxy – Remote Call Language
        PL/Proxy is a compact language for remote calls between PostgreSQL
         databases.
        With PL/Proxy the user can create proxy functions that have the same
         signature as the remote functions to be called. The function body describes
         how the remote connection should be acquired.

         [Diagram: proxy function in userdb forwards the call to userdb_ro]

       CREATE FUNCTION get_user_email(username text) RETURNS text AS $$
           CONNECT 'dbname=userdb_ro';
        $$ LANGUAGE plproxy;
SkyTools: Batch Job Framework
   Written in Python; contains most everything we have found useful in our everyday
    work with databases and PostgreSQL.
   The framework provides database connectivity, logging, stats management,
    encoding, decoding etc. for batch jobs. Developers need only take care of the
    business logic in batch jobs; all the rest is handled by the framework.
   SkyTools contains tens of reusable generic scripts for doing various data related
    tasks.
   PgQ adds event queues to PostgreSQL.
   Londiste replication.
   Walmgr for WAL-based log shipping.

     [Diagram: a batch job working against several databases]
PgQ: PostgreSQL Queues
   PgQ is a PostgreSQL-based event processing system. It is part of the SkyTools
    package that contains several useful implementations on this engine.
   Event - atomic piece of data created by producers. In PgQ an event is one record in
    one of the tables that service that queue. PgQ guarantees that each event is seen at
    least once, but it is up to the consumer to make sure that an event is processed no
    more than once if that is needed.
   Batch - PgQ is designed for efficiency and high throughput, so events are grouped
    into batches for bulk processing.
   Queue - events are stored in queue tables, i.e. queues. Several producers can write
    into the same queue and several consumers can read from the queue. Events are
    kept in the queue until all the consumers have seen them. A queue can contain any
    number of event types; it is up to producer and consumer to agree on what types of
    events are passed and how they are encoded.
   Producer - application that pushes events into a queue. Producers can be written in
    any language that is able to run stored procedures in PostgreSQL.
   Consumer - application that reads events from a queue. Consumers can be written
    in any language that can interact with PostgreSQL.
PgQ: PostgreSQL Queues Illustration
   Database that keeps track of user status (online, offline, busy, etc.)
   The producer can be anybody who can call a stored procedure.
   The same goes for consumers.

     [Diagram: c++ and php presence_listeners call the set_user_presence function
      in PresenceDB, which feeds the user_status table and the user_status queue;
      consumers are java presence_cache1 and presence_cache2, a c++
      presence_poller and a python presence_updater]
PgQ: Features
   Transactional. Event insertion is committed or rolled back together with the other
    changes the transaction makes.
   Efficient. Database-based queue implementations are usually not efficient, but we
    managed to solve that because PostgreSQL makes transaction state visible to
    users.
   Fast. Events are processed in batches, which gives low per-event overhead.
   Flexible. Each queue can have several producers and consumers and any number
    of event types handled in it.
   Robust. PgQ guarantees that each consumer sees an event at least once. There
    are several methods for tracking processed events, depending on business needs.
   Ideally suited for all kinds of batch processing.
plProxy
plProxy: Installation
   Download PL/Proxy from http://pgfoundry.org/projects/plproxy and extract it.
   Build PL/Proxy by running make and make install inside the plproxy directory. If
    you're having problems, make sure that pg_config from the PostgreSQL bin
    directory is in your path.
   To install PL/Proxy in a database, execute the commands in the plproxy.sql file. For
    example: psql -f $SHAREDIR/contrib/plproxy.sql mydatabase
   Steps 1 and 2 can be skipped if you installed PL/Proxy from a packaging system
    such as RPM.
   Create a test function to validate that PL/Proxy is working as expected.
   Create a test function to validate that plProxy is working as expected.
   CREATE FUNCTION get_user_email(username text)
    RETURNS text AS $$
        CONNECT 'dbname=userdb';
    $$ LANGUAGE plproxy;
plProxy Language
   The language is similar to plpgsql - string quoting, comments, semicolon at the end
    of statements. It contains only 4 statements: CONNECT, CLUSTER, RUN and
    SELECT.
   Each function needs to have either CONNECT or a CLUSTER + RUN pair of
    statements to specify where to run the function.
   CONNECT 'libpq connstr'; -- Specifies the exact location where to connect and
    execute the query. If several functions have the same connstr, they will use the
    same connection.
   CLUSTER 'cluster_name'; -- Specifies the exact cluster name to run on. The cluster
    name will be passed to the plproxy.get_cluster_* functions.
   CLUSTER cluster_func(..); -- The cluster name can be decided dynamically from
    the proxy function arguments. cluster_func should return the final cluster name as
    text.
plProxy Language RUN ON ...
   RUN ON ALL; -- The query will be run on all partitions in the cluster in parallel.
   RUN ON ANY; -- The query will be run on a random partition.
   RUN ON <NR>; -- Run on partition number <NR>.
   RUN ON partition_func(..); -- Run partition_func(), which should return one or more
    hash values (int4). The query will be run on the tagged partitions. If more than one
    partition was tagged, the query will be sent to them in parallel.

     CREATE FUNCTION get_user_email(text)
     RETURNS text AS $$
        CLUSTER 'userdb';
        RUN ON get_hash($1);
     $$ LANGUAGE plproxy;

      Partition selection is done by taking the lower bits of the hash value:
      hash & (n-1), where n is the number of partitions (a power of 2).
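
      For example, with a 4-partition cluster (a quick illustrative check, not from the
      original slides):

      -- hashtext() output masked with n-1 picks one of the 4 partitions
      SELECT hashtext(lower('bob')) & (4 - 1) AS partition_nr;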
plProxy Configuration
   A schema named plproxy with three functions is needed by plProxy.
   plproxy.get_cluster_partitions(cluster_name text) – gives plProxy the
    connect strings to the remote databases.
   plproxy.get_cluster_version(cluster_name text) – used by plProxy to
    determine if the configuration has changed and should be read again. Should be as
    fast as possible, because it is called for every function call that goes through
    plProxy.
   plproxy.get_cluster_config(in cluster_name text, out key text,
    out val text) – can be used to change plProxy parameters like connection
    lifetime.
    CREATE FUNCTION plproxy.get_cluster_version(i_cluster text)
    RETURNS integer AS $$
        SELECT 1;
    $$ LANGUAGE sql;

    CREATE FUNCTION plproxy.get_cluster_config(cluster_name text,
                        OUT key text, OUT val text)
    RETURNS SETOF record AS $$
        SELECT 'connection_lifetime'::text as key, text( 30*60 ) as val;
    $$ LANGUAGE sql;
plProxy: Configuration Functions
CREATE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
RETURNS SETOF text AS $$
begin
    if cluster_name = 'userdb' then
        return next 'port=9000 dbname=userdb_p00 user=proxy';
        return next 'port=9000 dbname=userdb_p01 user=proxy';
        return;
    end if;
    raise exception 'no such cluster: %', cluster_name;
end; $$ LANGUAGE plpgsql SECURITY DEFINER;

CREATE FUNCTION plproxy.get_cluster_partitions(i_cluster_name text)
RETURNS SETOF text AS $$
declare     r record;
begin
    for r in
         select connect_string from plproxy.conf
         where cluster_name = i_cluster_name
         order by part_nr
    loop
         return next r.connect_string;
    end loop;
    if not found then
         raise exception 'no such cluster: %', i_cluster_name;
    end if;
    return;
end; $$ LANGUAGE plpgsql SECURITY DEFINER;
plProxy: Read Only Remote Calls
   We use remote calls mostly for read only queries, in cases where it is not
    reasonable to replicate the needed data to the calling database.
   For example, balance data changes very often, but whenever making decisions
    based on balance we must use the latest balance, so we use a remote call to get
    the user's balance.
   plProxy remote calls are not suitable for data modification because there is no
    guarantee that both transactions will be committed or rolled back (no two-phase
    commit). Instead of remote calls we try to use PgQ queues and various consumers
    on them, which give us the equivalent of two-phase commit.

     [Diagram: shopDb makes remote calls to userDB and balanceDB]
plProxy: Remote Calls to Archive
   Closed records from large online tables (invoices, payments, call detail records,
    etc.) are moved to an archive database on a monthly basis, after the month is
    closed and the data is locked against changes.
   Normally users work only on the online table, but in some cases they need to see
    data from the whole history.
   plProxy can be used to also query data from the archive database when the user
    requests data for a longer period than the online database holds.

     [Diagram: onlineDB makes remote calls to archiveDB]
plProxy: Remote Calls for Autonomous Transactions
   PostgreSQL does not have autonomous transactions, but plProxy calls can be used
    to mimic them.
   Autonomous transactions can be useful for logging failed procedure calls.
   If used extensively, then usage of pgBouncer should be considered, to reduce the
    number of additional connections needed and to reduce connection creation
    overhead.
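   A minimal sketch of the pattern (function, table and database names are
    hypothetical, and a PL/Proxy version supporting the TARGET statement is
    assumed): the proxy function opens a second connection to the same database,
    so the logging insert commits on its own even if the caller's transaction rolls back.

     -- proxy wrapper: runs the call over a separate connection
     CREATE FUNCTION log_failure(i_func text, i_msg text)
     RETURNS void AS $$
         CONNECT 'dbname=onlinedb';
         TARGET log_failure_impl; -- avoid calling the wrapper recursively
     $$ LANGUAGE plproxy;

     -- the real function that does the insert
     CREATE FUNCTION log_failure_impl(i_func text, i_msg text)
     RETURNS void AS $$
         INSERT INTO failure_log (func_name, message, log_time)
         VALUES (i_func, i_msg, now());
     $$ LANGUAGE sql;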
plProxy: Proxy Databases as Interface
   An additional layer between applications and databases.
   Keeps the applications' database connectivity simpler, giving DBAs and developers
    more flexibility for moving data and functionality around.
   It gets really useful when the number of databases gets bigger and connectivity
    management outside the database layer gets too complicated :)

     [Diagram: applications and the BackOffice Application connect to shopDb (proxy)
      and backofficeDb (proxy), which route calls to shopDb, userDB and internalDb]
plProxy: Proxy Databases for Security
   A security layer. By giving access only to a proxy database, DBAs can be sure that
    the user has no way of accessing tables by accident or by any other means, as only
    the functions published in the proxy database are visible to the user.
   Such proxy databases may be located on servers visible from the outside network,
    while the databases containing data usually are not.

     [Diagram: the BackOffice Application connects to manualfixDb (proxy) and
      ccinterfaceDb (proxy), which route calls to shopDb and creditcardDb]
plProxy: Run On All for Data
      Also usable when the exact partition where the data resides is not known. Then
       the function may be run on all partitions, and only the one that has the data does
       something.

     -- proxy function (runs on all partitions)
     CREATE FUNCTION get_balances(
         i_users text[],
         OUT username text,
         OUT currency text,
         OUT balance numeric)
     RETURNS SETOF record AS $$
         CLUSTER 'userdb'; RUN ON ALL;
     $$ LANGUAGE plproxy;

     -- partition side: returns rows only for users whose hash belongs here
     CREATE FUNCTION balance.get_balances(
         i_users text[],
         OUT username text,
         OUT currency text,
         OUT balance numeric)
     RETURNS SETOF record AS $$
     BEGIN
         FOR i IN COALESCE(array_lower(i_users,1),0) ..
                  COALESCE(array_upper(i_users,1),-1)
         LOOP
             IF partconf.valid_hash(i_users[i]) THEN
                 SELECT i_users[i], u.currency, u.balance
                   INTO username, currency, balance
                   FROM balances u
                  WHERE u.username = i_users[i];
                 RETURN NEXT;
             END IF;
         END LOOP;
         RETURN;
     END;
     $$ LANGUAGE plpgsql SECURITY DEFINER;

     [Diagram: the proxy function in userdb fans out to partitions
      userdb_p0 .. userdb_p3]
plProxy: Run On All for Stats
      Gather statistics from all the partitions and return summaries to the caller.

     -- aggregator (plain SQL function in the proxy database)
     CREATE FUNCTION stats.get_stats(
         OUT stat_name text,
         OUT stat_value bigint)
     RETURNS SETOF record AS $$
         select stat_name,
                (sum(stat_value))::bigint
           from stats._get_stats()
          group by stat_name
          order by stat_name;
     $$ LANGUAGE sql;

     -- proxy function (runs on all partitions)
     CREATE FUNCTION stats._get_stats(
         OUT stat_name text,
         OUT stat_value bigint)
     RETURNS SETOF record AS $$
         cluster 'userdb';
         run on all;
     $$ LANGUAGE plproxy;

     -- partition side: read last_value of every sequence in schema stats
     CREATE FUNCTION stats._get_stats(
         OUT stat_name text,
         OUT stat_value bigint)
     RETURNS SETOF record AS $$
     declare
         seq record;
     begin
         for seq in
             select c.relname,
                    'select last_value from stats.' || c.relname as sql
               from pg_class c, pg_namespace n
              where n.nspname = 'stats'
                and c.relnamespace = n.oid
                and c.relkind = 'S'
              order by 1
         loop
             stat_name := seq.relname;
             execute seq.sql into stat_value;
             return next;
         end loop;
         return;
     end;
     $$ LANGUAGE plpgsql;
plProxy: Geographical Partitioning
   plProxy can be used to split a database into partitions based on country code. The
    example database is split into 'ru' and 'row' (rest of the world).
   Each function call caused by online users has the country code as one of the
    parameters.
   All data is replicated into a backend database for use by backoffice applications
    and batch jobs. That also reduces the number of indexes needed in the online
    databases.

     CREATE FUNCTION get_cluster(
         i_key_cc text)
     RETURNS text AS $$
     BEGIN
         IF i_key_cc = 'ru' THEN
             RETURN 'oltp_ru';
         ELSE
             RETURN 'oltp_row';
         END IF;
     END;
     $$ LANGUAGE plpgsql;

     [Diagram: onlinedb (proxy) routes to onlinedb_RU and onlinedb_ROW, which
      replicate into backenddb]
plProxy: Application Driven Partitioning
   Sometimes the same function can be run on a read only replica and on the online
    database, but the right partition is known only at runtime, depending on context.
   In that situation get_cluster can be used to allow the calling application to choose
    where the function will be executed.

      CREATE FUNCTION get_cluster(i_refresh boolean)
      RETURNS text AS $$
      BEGIN
          IF i_refresh THEN
              RETURN 'shopdb';
          ELSE
              RETURN 'shopdb_ro';
          END IF;
      END;
      $$ LANGUAGE plpgsql;

      [Diagram: shopdb (proxy) routes to shopdb_RW and two shopdb_RO replicas]
plProxy: Horizontal Partitioning
      We have partitioned most of our databases by username, using the PostgreSQL
       hashtext function to get an equal distribution between partitions.
      As proxy databases are stateless, we can have several exact duplicates for load
       balancing and high availability.

     CREATE FUNCTION public.get_user_email(text)
     RETURNS text AS $$
         cluster 'userdb';
         run on get_hash($1);
     $$ LANGUAGE plproxy;

     CREATE FUNCTION public.get_hash(
         i_key_user text)
     RETURNS integer AS $$
     BEGIN
         return hashtext(lower(i_key_user));
     END;
     $$ LANGUAGE plpgsql;

     [Diagram: two duplicate userdb proxy databases route to partitions
      userdb_p0 .. userdb_p3]
plProxy: Partitioning Partition Functions
   check_hash is used to validate incoming calls.
   partconf functionality handles sequences.
   The validate_hash function is used in triggers to exclude data during splits.

    CREATE FUNCTION public.get_user_email( i_username text)
    RETURNS text AS $$
    DECLARE
        retval text;
    BEGIN
        PERFORM partconf.check_hash(i_username);

        SELECT email FROM users
        WHERE username = lower(i_username) INTO retval;

        RETURN retval;
    END;
    $$ LANGUAGE plpgsql;
plProxy: Horizontal Split
   When splitting databases we usually prepare new partitions on other servers and
    then switch all traffic at once, to keep our life simple.
   Steps to increase the partition count by 2x (see the sketch after this list):
       Create 2 partitions for each old one.
       Replicate partition data to the corresponding new partitions.
       Delete unnecessary data from the new partitions, then cluster and analyze the
        tables.
       Prepare the new plProxy configuration.
       Switch over:
           Take down the IP.
           Upgrade the plProxy configuration version.
           Wait for the replicas to catch up.
           Bring the IP back up.
   You can have several partitions on one server. Moving them out later will be easy.

     [Diagram: old partitions userdb_a0, userdb_a1 are split into new partitions
      userdb_b0 .. userdb_b3]
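
     As a sketch, the new plProxy configuration for the 4-way cluster would simply
     return the new partition list (connect strings are illustrative), after which
     plproxy.get_cluster_version() must return a higher number so that proxies
     reload it:

     CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
     RETURNS SETOF text AS $$
     begin
         if cluster_name = 'userdb' then
             -- partition count must stay a power of 2
             return next 'port=9000 dbname=userdb_b0 user=proxy';
             return next 'port=9000 dbname=userdb_b1 user=proxy';
             return next 'port=9000 dbname=userdb_b2 user=proxy';
             return next 'port=9000 dbname=userdb_b3 user=proxy';
             return;
         end if;
         raise exception 'no such cluster: %', cluster_name;
     end; $$ LANGUAGE plpgsql SECURITY DEFINER;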
plProxy: Partconf
   Partconf is useful when you have split a database into several partitions and there
    are also failover replicas of the same databases.
       Some mechanism for handling sequence IDs is needed.
       Some mechanism for protecting against misconfiguration makes life safer.
   CREATE FUNCTION partconf.set_conf(
       i_part_nr integer,
       i_max_part integer,
       i_db_code integer)
      Used when partconf is installed, to initialize it. db_code is supposed to be
         unique across all the databases.
   CREATE FUNCTION partconf.check_hash(key_user text) RETURNS boolean
      Used internally from other functions to validate that the function call was
         directed to the correct partition. Raises an error if this is not the right partition
         for that username.
   CREATE FUNCTION global_id() RETURNS bigint
      Used to get globally unique IDs (a sketch follows).
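   A minimal sketch of how global_id could work (an assumption, not necessarily the
    exact partconf implementation): combine a local sequence with the database's
    unique db_code so that IDs from different databases can never collide.

     CREATE SEQUENCE partconf.global_id_seq;

     CREATE FUNCTION global_id()
     RETURNS bigint AS $$
         -- low digits carry the db_code set via partconf.set_conf();
         -- assumes db_code < 1000 and a partconf.conf table holding it
         SELECT nextval('partconf.global_id_seq') * 1000
                + (SELECT db_code FROM partconf.conf);
     $$ LANGUAGE sql;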
plProxy: Summary
   plProxy adds very little overhead when used together with pgBouncer.
   On the other hand, plProxy adds complexity to development and
    maintenance, so it must be used with care - but that is true for most
    everything.
pgBouncer
pgBouncer
   pgBouncer is a lightweight and robust connection pooler for PostgreSQL.
   Low memory requirements (2 kB per connection by default). This is due to the fact
    that pgBouncer does not need to see the full packet at once.
   It is not tied to one backend server; the destination databases can reside on
    different hosts.
   Supports online reconfiguration for most settings.
   Supports online restart/upgrade without dropping client connections.
   Supports protocol V3 only, so the backend version must be >= 7.4.
   Does not parse SQL, so it is very fast and uses little CPU time.
pgBouncer Pooling Modes
   Session pooling - the most polite method. When a client connects, a server
    connection is assigned to it for the whole duration it stays connected. When the
    client disconnects, the server connection is put back into the pool. Should be used
    with legacy applications that won't work with the more efficient pooling modes.
   Transaction pooling - a server connection is assigned to the client only during a
    transaction. When pgBouncer notices that the transaction is over, the server
    connection is put back into the pool. This is a hack, as it breaks application
    expectations of the backend connection. You can use it only when the application
    cooperates with such usage by not using features that can break.
   Statement pooling - the most aggressive method. This is transaction pooling with a
    twist: multi-statement transactions are disallowed. It is meant to enforce
    "autocommit" mode on the client, and is mostly targeted at PL/Proxy.
pgBouncer configuration
[databases]
forcedb = host=127.0.0.1 port=300 user=baz password=foo
bardb = host=127.0.0.1 dbname=bazdb
foodb =

[pgbouncer]
logfile = pgbouncer.log
pidfile = pgbouncer.pid
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = any / trust / crypt / md5
auth_file = etc/userlist.txt
pool_mode = session / transaction / statement
max_client_conn = 100
default_pool_size = 20
admin_users = admin, postgres
stats_users = stats
pgBouncer console
   Connect to database 'pgbouncer':
    $ psql -d pgbouncer -U admin -p 6432
   SHOW HELP;
   SHOW CONFIG; SHOW DATABASES;
   SET default_pool_size = 50;
   RELOAD;
   SHOW POOLS; SHOW STATS;
   PAUSE; PAUSE <db>;
   RESUME; RESUME <db>;
pgBouncer Use-cases
 Decreasing the number of connections hitting the real databases
 Taking the logging-in load off the real databases
 A central redirection point for database connections. Functions like a
  service IP. Allows pausing and redirecting of users.
 Backend pooler for PL/Proxy
 Database aliases. You can have different external names connected to
  one DSN. Makes it possible to pause them separately and merge
  databases (see the sketch below).
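 For example, database aliases are just two entries pointing at the same DSN (a
  hypothetical pgbouncer.ini snippet); each alias can later be paused and repointed
  on its own:

  [databases]
  ; both names reach the same backend today; either can be moved later
  shopdb    = host=10.0.0.5 dbname=shop
  shopdb_ro = host=10.0.0.5 dbname=shop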
SkyTools/PgQ
SkyTools and PgQ
   We keep SkyTools, like pgBouncer and plProxy, in our standard database server
    image, so each new server has it installed from the start.
   The framework provides database connectivity, logging, stats management,
    encoding, decoding etc. for batch jobs. Developers need only take care of the
    business logic in batch jobs; all the rest is handled by the framework.
   SkyTools contains many reusable generic scripts.
PgQ: PostgreSQL Queuing
   Ideally suited for all kinds of batch processing.
   Efficient. Creating events adds very little overhead, and the queue has a beautiful
    interface that allows events to be created using standard SQL INSERT INTO
    syntax.
   Fast. Events are processed in batches, which usually means higher speed when
    processing.
   Flexible. Each queue can have several producers and consumers and any number
    of event types handled in it.
   Robust. PgQ guarantees that each consumer sees an event at least once. There
    are several methods for tracking processed events, depending on business needs.

     [Diagram: events flow from onlinedb through queuedb to batchdb, where a batch
      job processes them; all three run SkyTools]
PgQ - Setup
Basic PgQ setup and usage can be illustrated by the following steps:
1. create the database
2. edit a PgQ ticker configuration file, say ticker.ini
3. install PgQ internal tables
    $ pgqadm.py ticker.ini install
4. launch the PgQ ticker on database machine as daemon
    $ pgqadm.py -d ticker.ini ticker
5. create queue
    $ pgqadm.py ticker.ini create <queue>
6. register or run consumer to register it automatically
    $ pgqadm.py ticker.ini register <queue> <consumer>
7. start producing events
PgQ – Configuration

 [pgqadm]
 job_name = pgqadm_somedb

 db = dbname=somedb

 # how often to run maintenance
 maint_delay = 600

 # how often to check for activity
 loop_delay = 0.1

 logfile = ~/log/%(job_name)s.log
 pidfile = ~/pid/%(job_name)s.pid

   The ticker reads the event id sequence of each queue.
   If new events have appeared, it inserts a tick if:
       a configurable number of events have appeared - ticker_max_count (500), or
       a configurable amount of time has passed since the last tick -
        ticker_max_lag (3 sec).
   If there are no events in the queue, it creates a tick after some time has passed -
    ticker_idle_period (60 sec).
   Configuring from the command line:
       pgqadm.py ticker.ini config my_queue ticker_max_count=100
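   Once the ticker is running, queue and consumer state can be checked from the
    command line (an illustrative invocation):
       $ pgqadm.py ticker.ini status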
PgQ event structure
CREATE TABLE pgq.event (
     ev_id   int8 NOT NULL,
     ev_txid int8 NOT NULL DEFAULT txid_current(),
     ev_time timestamptz NOT NULL DEFAULT now(),
     -- rest are user fields --
     ev_type text,   -- what to expect from ev_data
     ev_data text,   -- main data, urlenc, xml, json
     ev_extra1 text, -- metadata
     ev_extra2 text, -- metadata
     ev_extra3 text, -- metadata
     ev_extra4 text -- metadata
);
CREATE INDEX txid_idx ON pgq.event (ev_txid);
PgQ: Producer - API Event Insertion
   Single event insertion:
       pgq.insert_event(queue, ev_type, ev_data): int8
   Bulk insertion, in single transaction:
       pgq.current_event_table(queue): text
   Inserting with triggers:
       pgq.sqltriga(queue, ...) - partial SQL format
       pgq.logutriga(queue, ...) - urlencoded format
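   A producer therefore needs nothing more than plain SQL; for example (queue
    name and payload are illustrative):

     SELECT pgq.insert_event('user_status', 'status_change',
                             'username=bob&status=online');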
PgQ API: insert complex event with pure SQL
   Create an interface table for the queue, with logutriga for encoding inserted events.
   Type safety, default values, sequences, constraints!
   Several tables can insert into the same queue.
   Encoding and decoding are totally hidden from developers; they just work with
    tables.
    CREATE TABLE queue.mailq (
        mk_template bigint NOT NULL,
        username text NOT NULL,
        email text,
        mk_language text,
        payload text
    );

    CREATE TRIGGER send_mailq_trg
        BEFORE INSERT ON queue.mailq
        FOR EACH ROW
        EXECUTE PROCEDURE pgq.logutriga('mail_events', 'SKIP');

    -- send e-mail
    INSERT INTO queue.mailq (mk_template, username, email)
    VALUES(73, _username, _email);
PgQ: Consumer - Reading Events
Registering
    pgq.register_consumer(queue, consumer)
    pgq.unregister_consumer(queue, consumer)
    or
    $ pgqadm.py <ini> register <queue> <consumer>
    $ pgqadm.py <ini> unregister <queue> <consumer>
Reading
    pgq.next_batch(queue, consumer): int8
    pgq.get_batch_events(batch_id): SETOF record
    pgq.finish_batch(batch_id)
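A full consumption cycle is just these calls in a loop (queue, consumer and batch id
are illustrative; pgq.next_batch() returns NULL when there is nothing to do):
    SELECT pgq.register_consumer('mail_events', 'mailer');
    SELECT pgq.next_batch('mail_events', 'mailer');   -- say it returns 5621
    SELECT * FROM pgq.get_batch_events(5621);
    -- ... process the events ...
    SELECT pgq.finish_batch(5621);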
PgQ: Tracking Events
   Per-event overhead.
   Need to avoid accumulating tracking rows.
   The pgq_ext solution:
       pgq_ext.is_event_done(consumer, batch_id, ev_id)
       pgq_ext.set_event_done(consumer, batch_id, ev_id)
   If the batch changes, old events are deleted.
   E.g. email sender, plProxy.
PgQ: Tracking Batches
   Minimal per-event overhead.
   Requires that the whole batch is processed in one transaction.
       pgq_ext.is_batch_done(consumer, batch_id)
       pgq_ext.set_batch_done(consumer, batch_id)
   E.g. replication, and most of the SkyTools partitioning scripts like queue_mover,
    queue_splitter, table_dispatcher, cube_dispatcher.
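   A sketch of the pattern, all inside the consumer's single transaction on the target
    database (consumer name and batch id are illustrative):

     SELECT pgq_ext.is_batch_done('my_consumer', 5621); -- if true, skip the batch
     -- ... otherwise apply all events of batch 5621 ...
     SELECT pgq_ext.set_batch_done('my_consumer', 5621);
     -- the COMMIT makes the data changes and the tracking info atomic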
PgQ: Queue Mover
   Moves data from a source queue in one database to another queue in another
    database.
   Used to move events from online databases to other databases for processing or
    just storage.
   Consolidates events if there are several producers, as in the case of partitioned
    databases.

     [queue_mover]
     job_name = eventlog_to_target_mover

     src_db = dbname=oltpdb
     dst_db = dbname=batchdb

     pgq_queue_name = batch_events
     dst_queue_name = batch_events

     pidfile = pid/%(job_name)s.pid
     logfile = log/%(job_name)s.log

     [Diagram: queue movers move events from userdb_p1 and userdb_p2 into logdb]
PgQ: Queue Mover Code
import sys
import pgq

class QueueMover(pgq.RemoteConsumer):
  def process_remote_batch(self, db, batch_id, ev_list, dst_db):
    # prepare data
    rows = []
    for ev in ev_list:
      rows.append([ev.type, ev.data, ev.time])
      ev.tag_done()

    # insert data
    fields = ['ev_type', 'ev_data', 'ev_time']
    curs = dst_db.cursor()
    dst_queue = self.cf.get('dst_queue_name')
    pgq.bulk_insert_events(curs, rows, fields, dst_queue)

script = QueueMover('queue_mover', 'src_db', 'dst_db', sys.argv[1:])
script.start()
PgQ: Queue Splitter
 Moves data from a source queue in one database to one or more
  queues in a target database, based on producer. It is another version
  of queue_mover, but it has its own benefits.
 Used to move events from online databases to queue databases.
 Reduces the number of dependencies of online databases.

     [Diagram: a queue splitter reads the batch_events queue in database userdb_pN
      (producers welcome_emails and password_emails) and splits it into the
      welcome_emails and password_emails queues in database mailqdb, which feed
      the promotional and transactional mailers]
PgQ: Simple Consumer
   The simplest consumer. The basic idea is to read url-encoded events from a queue
    and execute SQL using fields from those events as parameters.
   Very convenient when you have events generated in one database and want to get
    background processing, or to play them into another database.
   Does not guarantee that each event is processed only once, so it's up to the called
    function to guarantee that duplicate calls will be ignored.
    [simple_consumer]
    job_name       = job_to_be_done
    src_db         = dbname=sourcedb
    dst_db         = dbname=targetdb
    pgq_queue_name = event_queue
    dst_query      =
        SELECT * FROM function_to_be_run(%%(username)s, %%(email)s);

    logfile          = ~/log/%(job_name)s.log
    pidfile          = ~/pid/%(job_name)s.pid
PgQ: Simple Serial Consumer
     Similar to the simple consumer, but uses pgq_ext to guarantee that each event
      will be processed only once.

    import sys, pgq, skytools

    class SimpleSerialConsumer(pgq.SerialConsumer):
        def __init__(self, args):
            pgq.SerialConsumer.__init__(self,"simple_serial_consumer","src_db","dst_db", args)
            self.dst_query = self.cf.get("dst_query")

        def process_remote_batch(self, batch_id, ev_list, dst_db):
            curs = dst_db.cursor()
            for ev in ev_list:
                payload = skytools.db_urldecode(ev.payload)
                curs.execute(self.dst_query, payload)
                res = curs.dictfetchone()
                ev.tag_done()

    if __name__ == '__main__':
        script = SimpleSerialConsumer(sys.argv[1:])
        script.start()
PgQ: Table Dispatcher
   Has url-encoded events as its data source and writes them into a table in the target
    database.
   Used to partition data. For example, change logs that need to be kept online only
    briefly can be written to daily tables and then dropped as they become irrelevant.
   Also allows selecting which columns have to be written into the target database.
   Creates target tables as needed, according to the configuration file.

     [Diagram: table dispatchers read the call_records queue and write into
      partitioned tables history (history_2008_09, history_2008_10, ...) and cr
      (cr_2008_09_01, cr_2008_09_02, ...)]
PgQ: Cube Dispatcher
   Has url-encoded events as its data source and writes them into partitioned tables
    in the target database. Logutriga is used to create the events.
   Used to provide batches of data for business intelligence and data cubes.
   Only one instance of each record is stored. For example, if a record is created and
    then updated twice, only the latest version of the record stays in that day's table.
   Does not support deletes.

     [Diagram: producers invoices and payments write into the cube_events queue;
      the cube dispatcher writes into daily tables invoices_2008_09_01,
      invoices_2008_09_02, ... and payments_2008_09_01, payments_2008_09_02, ...]
Londiste
Londiste: Replication
   We use replication
       to transfer online data into internal databases,
       to create failover databases for online databases,
       to switch between PostgreSQL versions,
       to distribute internally produced data into online servers,
       to create read only replicas for load balancing.
   Londiste is implemented using PgQ as the transport layer.
   It has a DBA friendly command line interface.

     [Diagram: shopdb (online) replicates to analysisdb (internal), shopdb_ro
      (read only) and shopdb (failover)]
Londiste: Setup
1. create the subscriber database, with tables to replicate
2. edit a londiste configuration file, say conf.ini, and a PgQ ticker
  configuration file, say ticker.ini
3. install londiste on the provider and subscriber nodes. This step
  requires admin privileges on both provider and subscriber sides,
  and both install commands can be run remotely:
    $ londiste.py conf.ini provider install
    $ londiste.py conf.ini subscriber install
4. launch the PgQ ticker on the provider machine:
    $ pgqadm.py -d ticker.ini ticker
5. launch the londiste replay process:
    $ londiste.py -d conf.ini replay
6. add tables to replicate from the provider database:
    $ londiste.py conf.ini provider add table1 table2 ...
7. add tables to replicate to the subscriber database:
    $ londiste.py conf.ini subscriber add table1 table2 ...
Londiste: Configuration
 [londiste]
 job_name = londiste_providerdb_to_subscriberdb

 provider_db = dbname=provider port=5432 host=127.0.0.1
 subscriber_db = dbname=subscriber port=6432 host=127.0.0.1

 # it will be used as sql ident so no dots/spaces
 pgq_queue_name = londiste.replika

 logfile = /tmp/%(job_name)s.log
 pidfile = /tmp/%(job_name)s.pid
Londiste: Usage
   Tables can be added into the set one by one.
   The initial COPY for one table does not block event replay for the other tables.
   Can compare tables on both sides (see below).
   Supports sequences.
   There can be several subscribers listening to one provider queue.
   Each subscriber can listen to any number of the tables provided by the
    provider; all the other events are ignored.
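   For example, comparing table contents on both sides (an illustrative invocation;
    SkyTools also has a repair command for fixing differences it finds):
     $ londiste.py conf.ini compare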
That was not all
SkyTools: WalMgr
 Used to create standby servers inside the same colocation.
 Also provides point in time recovery.
 Requires more network bandwidth than Londiste but has very little
  management overhead.
 The most common usage is when a server starts showing signs of
  breaking down: then we move the database onto a fresh server until the
  other one gets fixed.
 Upgrading to better hardware is also quite common.
SODI Framework
   Rapid application development.
   Cheap changes.
   Most of the design is in metadata.
   Application design stays in sync with the application over time.
   Minimizes the number of database roundtrips.
   Can send multiple recordsets to one function call.
   Can receive several recordsets from a function call.

     [Diagram of the layers:
      Application layer - java / php ..., UI logic, mostly generated, SODI framework;
      AppServer layer - java / php ..., user authentication, roles and rights, no
      business logic;
      Database layer - business logic, plPython, PostgreSQL, SkyTools]
Multiple Recordsets in Function Resultset
 CREATE FUNCTION meta.get_iotype(i_context text, i_params text)
 RETURNS SETOF public.ret_multiset AS $$
     import dbservice2
     dbs = dbservice2.ServiceContext( i_context, GD )
     params = dbs.get_record( i_params )

     sql = """
         select id_iotype, mod_code, schema || '.' || iotype_name as iotype
                , comments, version
         from meta.iotype
         where id_iotype = {id_iotype}
     """
     dbs.return_next_sql( sql, params, 'iotype' )

     sql = """
          select id_iotype_attr, key_iotype, attr_name, mk_datatype
               , comments, label, version, order_by, br_codes
         from meta.iotype_attr
         where key_iotype = {id_iotype}
         order by order_by
     """
     dbs.return_next_sql( sql, params, 'ioattr')

     return dbs.retval()
 $$ LANGUAGE plpythonu;
Multiple Recordsets as Function Parameters
CREATE FUNCTION dba.set_role_grants(
    i_context text,
    i_params text,
    i_roles text,
    i_grants text)
RETURNS SETOF public.ret_multiset AS $$
    import dbservice2
    dbs = dbservice2.ServiceContext( i_context, GD )
    params = dbs.get_record( i_params )
    t_roles = dbs.get_record_list( i_roles )
    t_grants = dbs.get_record_list( i_grants )
...
    for r in t_roles:
        key_role = roles.get(r.role_name)
        r.id_host = params.id_host
        r.key_role = key_role

        if key_role is None:
            dbs.run_query( """
                insert into dba.role (
                    key_host, role_name, system_privs, role_options, password_hash
                  , connection_limit, can_login, granted_roles
                ) values (
                  {id_host},{role_name},{system_privs},{role_options},{password_hash}
                 , {connection_limit},{can_login},string_to_array({granted_roles},','))
            """, r)
            dbs.tell_user(dbs.INFO, 'dbs1213', 'Role added: %s' % r.role_name)
...
That Was All
   :)

Replication - Nick Carboni - ManageIQ Design Summit 2016
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scaling
 
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
 
Webinar | Building Apps with the Cassandra Python Driver
Webinar | Building Apps with the Cassandra Python DriverWebinar | Building Apps with the Cassandra Python Driver
Webinar | Building Apps with the Cassandra Python Driver
 
Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"Lunar Way and the Cloud Native "stack"
Lunar Way and the Cloud Native "stack"
 
Using R on High Performance Computers
Using R on High Performance ComputersUsing R on High Performance Computers
Using R on High Performance Computers
 
Skytools: PgQ Queues and applications
Skytools: PgQ Queues and applicationsSkytools: PgQ Queues and applications
Skytools: PgQ Queues and applications
 

Mais de elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Studyelliando dias
 

Mais de elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
Multi-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case StudyMulti-core Parallelization in Clojure - a Case Study
Multi-core Parallelization in Clojure - a Case Study
 

Último

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Último (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Database Tools by Skype

  • 6. plProxy – Remote Call Language
     PL/Proxy is a compact language for remote calls between PostgreSQL databases.
     With PL/Proxy the user can create proxy functions that have the same signature as the remote functions to be called. The function body describes how the remote connection should be acquired.
     (Diagram: a proxy function in userdb calling into userdb_ro.)
       CREATE FUNCTION get_user_email(username text)
       RETURNS text AS $$
           CONNECT 'dbname=userdb_ro';
       $$ LANGUAGE plproxy;
  • 7. SkyTools: Batch Job Framework
     Written in Python; contains most everything we have found useful in our everyday work with databases and PostgreSQL.
     The framework provides database connectivity, logging, stats management, encoding, decoding etc. for batch jobs. Developers need only take care of the business logic in batch jobs; all the rest is handled by the framework.
     SkyTools contains tens of reusable generic scripts for doing various data related tasks.
     PgQ, which adds event queues to PostgreSQL.
     Londiste replication.
     Walmgr for WAL-based log shipping.
     (Diagram: a batch job sitting between two databases.)
  • 8. PgQ: PostgreSQL Queues
     PgQ is a PostgreSQL-based event processing system. It is part of the SkyTools package, which contains several useful implementations on this engine.
     Event – atomic piece of data created by producers. In PgQ an event is one record in one of the tables that service that queue. PgQ guarantees that each event is seen at least once, but it is up to the consumer to make sure that an event is processed no more than once, if that is needed.
     Batch – PgQ is designed for efficiency and high throughput, so events are grouped into batches for bulk processing.
     Queue – events are stored in queue tables, i.e. queues. Several producers can write into the same queue and several consumers can read from the queue. Events are kept in the queue until all the consumers have seen them. A queue can contain any number of event types; it is up to producer and consumer to agree on what types of events are passed and how they are encoded.
     Producer – application that pushes events into the queue. Producers can be written in any language that is able to run stored procedures in PostgreSQL.
     Consumer – application that reads events from the queue. Consumers can be written in any language that can interact with PostgreSQL.
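     As a concrete illustration of these terms, producing is a single SQL call (the pgq.insert_event API appears later on the producer slide; the queue name and payload here are invented):
       -- a producer creating one 'status_change' event in queue 'user_status'
       SELECT pgq.insert_event('user_status', 'status_change', 'username=bob&status=online');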
  • 9. PgQ: PostgreSQL Queues Illustration
     A database that keeps track of user status (online, offline, busy, etc.).
     The producer can be anybody who can call a stored procedure.
     The same goes for consumers.
     (Diagram: PresenceDB with function set_user_presence and table user_status feeding queue user_status; consumers are java presence_cache1 and presence_cache2, c++ presence_listener and presence_poller, php presence_listener, and python presence_updater.)
  • 10. PgQ: Features
     Transactional. Event insertion is committed or rolled back together with the other things that the transaction changes.
     Efficient. Usually database-based queue implementations are not efficient, but we managed to solve this because PostgreSQL makes transaction state visible to users.
     Fast. Events are processed in batches, which gives low per-event overhead.
     Flexible. Each queue can have several producers and consumers and any number of event types handled in it.
     Robust. PgQ guarantees that each consumer sees an event at least once. There are several methods by which the processed events can be tracked, depending on business needs.
     Ideally suited for all kinds of batch processing.
  • 12. plProxy: Installation
     Download PL/Proxy from http://pgfoundry.org/projects/plproxy and extract.
     Build PL/Proxy by running make and make install inside the plproxy directory. If you're having problems, make sure that pg_config from the PostgreSQL bin directory is in your path.
     To install PL/Proxy in a database, execute the commands in the plproxy.sql file. For example:
       psql -f $SHAREDIR/contrib/plproxy.sql mydatabase
     Steps 1 and 2 can be skipped if you installed PL/Proxy from a packaging system such as RPM.
     Create a test function to validate that PL/Proxy is working as expected:
       CREATE FUNCTION get_user_email(username text)
       RETURNS text AS $$
           CONNECT 'dbname=userdb';
       $$ LANGUAGE plproxy;
  • 13. plProxy Language
     The language is similar to plpgsql – string quoting, comments, semicolon at the statement's end. It contains only 4 statements: CONNECT, CLUSTER, RUN and SELECT.
     Each function needs to have either CONNECT or a pair of CLUSTER + RUN statements to specify where to run the function.
     CONNECT 'libpq connstr'; -- Specifies the exact location where to connect and execute the query. If several functions have the same connstr, they will use the same connection.
     CLUSTER 'cluster_name'; -- Specifies the exact cluster name to be run on. The cluster name will be passed to the plproxy.get_cluster_* functions.
     CLUSTER cluster_func(..); -- The cluster name can be decided dynamically upon proxy function arguments. cluster_func should return a text value, the final cluster name.
  • 14. plProxy Language: RUN ON ...
     RUN ON ALL; -- Query will be run on all partitions in the cluster in parallel.
     RUN ON ANY; -- Query will be run on a random partition.
     RUN ON <NR>; -- Run on partition number <NR>.
     RUN ON partition_func(..); -- Run partition_func(), which should return one or more hash values (int4). The query will be run on the tagged partitions. If more than one partition was tagged, the query will be sent to them in parallel.
       CREATE FUNCTION get_user_email(text)
       RETURNS text AS $$
           CLUSTER 'userdb';
           RUN ON get_hash($1);
       $$ LANGUAGE plproxy;
     Partition selection is done by taking the lower bits of the hash value: hash & (n-1), where n is a power of 2.
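     A worked example of that partition selection, assuming a 4-partition cluster ('marko' is an arbitrary username):
       SELECT hashtext(lower('marko')) & 3;  -- n = 4, so mask with n-1 = 3; result 0..3 picks the partition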
  • 15. plProxy Configuration
     Schema plproxy and three functions are needed for plProxy:
     plproxy.get_cluster_partitions(cluster_name text) – initializes plProxy connect strings to remote databases.
     plproxy.get_cluster_version(cluster_name text) – used by plProxy to determine if the configuration has changed and should be read again. Should be as fast as possible because it is called for every function call that goes through plProxy.
     plproxy.get_cluster_config(in cluster_name text, out key text, out val text) – can be used to change plProxy parameters like connection lifetime.
       CREATE FUNCTION plproxy.get_cluster_version(i_cluster text)
       RETURNS integer AS $$
           SELECT 1;
       $$ LANGUAGE sql;

       CREATE FUNCTION plproxy.get_cluster_config(cluster_name text, OUT key text, OUT val text)
       RETURNS SETOF record AS $$
           SELECT 'connection_lifetime'::text as key, text( 30*60 ) as val;
       $$ LANGUAGE sql;
  • 16. plProxy: Configuration Functions
       CREATE FUNCTION plproxy.get_cluster_partitions(cluster_name text)
       RETURNS SETOF text AS $$
       begin
           if cluster_name = 'userdb' then
               return next 'port=9000 dbname=userdb_p00 user=proxy';
               return next 'port=9000 dbname=userdb_p01 user=proxy';
               return;
           end if;
           raise exception 'no such cluster: %', cluster_name;
       end;
       $$ LANGUAGE plpgsql SECURITY DEFINER;

       CREATE FUNCTION plproxy.get_cluster_partitions(i_cluster_name text)
       RETURNS SETOF text AS $$
       declare
           r record;
       begin
           for r in
               select connect_string from plproxy.conf
                where cluster_name = i_cluster_name
                order by part_nr
           loop
               return next r.connect_string;
           end loop;
           if not found then
               raise exception 'no such cluster: %', i_cluster_name;
           end if;
           return;
       end;
       $$ LANGUAGE plpgsql SECURITY DEFINER;
  • 17. plProxy: Read Only Remote Calls
     We use remote calls mostly for read-only queries, in cases where it is not reasonable to replicate the needed data to the calling database.
     For example, balance data changes very often, but whenever making decisions based on balance we must use the latest balance, so we use a remote call to get the user balance.
     plProxy remote calls are not suitable for data modification because there is no guarantee that both transactions will be committed or rolled back (no two-phase commit). Instead of remote calls we use PgQ queues and various consumers on them, which give us the equivalent of two-phase commit.
     (Diagram: userDB and shopDb calling balanceDB.)
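     A sketch of such a read-only remote call; the function name, return type and database name here are invented for illustration:
       -- proxy function fetching the live balance from the balance database
       CREATE FUNCTION get_user_balance(i_username text)
       RETURNS numeric AS $$
           CONNECT 'dbname=balancedb';
       $$ LANGUAGE plproxy;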
  • 18. plProxy: Remote Calls to Archive
     Closed records from large online tables (invoices, payments, call detail records, etc.) are moved to an archive database on a monthly basis, after the month is closed and the data is locked against changes.
     Normally users work only on the online table, but in some cases they need to see data from all the history.
     plProxy can be used to also query data from the archive database when a user requests data for a longer period than the online database holds.
     (Diagram: onlineDB calling archiveDB.)
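     One possible shape for that fallback, sketched with invented names: a plproxy stub forwards the call to the archive database, and a plpgsql wrapper decides whether the requested period needs it (assumes an invoices table with username and invoice_date columns, and PostgreSQL 8.3+ for RETURN QUERY):
       -- remote half: a function with the same signature exists in archivedb
       CREATE FUNCTION archive_get_invoices(i_username text, i_from date)
       RETURNS SETOF invoices AS $$
           CONNECT 'dbname=archivedb';
       $$ LANGUAGE plproxy;

       -- online half: read local rows, pull archive rows only for old periods
       CREATE FUNCTION get_invoices(i_username text, i_from date)
       RETURNS SETOF invoices AS $$
       BEGIN
           RETURN QUERY
               SELECT * FROM invoices
                WHERE username = i_username AND invoice_date >= i_from;
           IF i_from < date_trunc('month', now())::date THEN
               RETURN QUERY SELECT * FROM archive_get_invoices(i_username, i_from);
           END IF;
           RETURN;
       END;
       $$ LANGUAGE plpgsql;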
  • 19. plProxy: Remote Calls for Autonomous Transactions
     PostgreSQL does not have autonomous transactions, but plProxy calls can be used to mimic them.
     Autonomous transactions can be useful for logging failed procedure calls.
     If used extensively, usage of pgBouncer should be considered, to reduce the number of additional connections needed and the connection creation overhead.
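     A sketch of the trick, with invented names: the stub below ships the log row over its own plProxy connection, so the INSERT commits in a separate transaction even if the caller's transaction later rolls back. Here the writer lives in a small dedicated log database:
       -- in the online database: plproxy stub
       CREATE FUNCTION log_failure(i_func text, i_msg text)
       RETURNS void AS $$
           CONNECT 'dbname=logdb';
       $$ LANGUAGE plproxy;

       -- in logdb: same signature, does the actual write
       CREATE FUNCTION log_failure(i_func text, i_msg text)
       RETURNS void AS $$
           INSERT INTO error_log (func_name, message, logged_at)
           VALUES ($1, $2, now());
       $$ LANGUAGE sql;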
  • 20. plProxy: Proxy Databases as Interface
     An additional layer between application and databases.
     Keeps application database connectivity simpler, giving DBAs and developers more flexibility for moving data and functionality around.
     It gets really useful when the number of databases grows; then connectivity management outside the database layer gets too complicated :)
     (Diagram: BackOffice and Application connect to backofficeDb and shopDb proxies, which front shopDb, userDB and internalDb.)
  • 21. plProxy: Proxy Databases for Security
     A security layer. By giving access to a proxy database, DBAs can be sure that the user has no way of accessing tables by accident or by any other means, as only the functions published in the proxy database are visible to the user.
     Such proxy databases may be located on servers visible from the outside network, while the databases containing data usually are not.
     (Diagram: BackOffice and Application connect to manualfixDb and ccinterfaceDb proxies, which front shopDb and creditcardDb.)
  • 22. plProxy: Run On All for Data
     Also usable when the exact partition where the data resides is not known. Then the function may be run on all partitions, and only the one that has the data does something.
       -- proxy database:
       CREATE FUNCTION get_balances(
           i_users text[],
           OUT username text,
           OUT currency text,
           OUT balance numeric)
       RETURNS SETOF record AS $$
           CLUSTER 'userdb';
           RUN ON ALL;
       $$ LANGUAGE plproxy;

       -- each partition (userdb_p0 .. userdb_p3):
       CREATE FUNCTION balance.get_balances(
           i_users text[],
           OUT username text,
           OUT currency text,
           OUT balance numeric)
       RETURNS SETOF record AS $$
       BEGIN
           FOR i IN COALESCE(array_lower(i_users,1),0) .. COALESCE(array_upper(i_users,1),-1) LOOP
               IF partconf.valid_hash(i_users[i]) THEN
                   SELECT i_users[i], u.currency, u.balance
                     INTO username, currency, balance
                     FROM balances u
                    WHERE u.username = i_users[i];
                   RETURN NEXT;
               END IF;
           END LOOP;
           RETURN;
       END;
       $$ LANGUAGE plpgsql SECURITY DEFINER;
  • 23. plProxy: Run On All for Stats
     Gather statistics from all the partitions and return summaries to the caller.
       -- proxy database:
       CREATE FUNCTION stats._get_stats(
           OUT stat_name text,
           OUT stat_value bigint)
       RETURNS SETOF record AS $$
           cluster 'userdb';
           run on all;
       $$ LANGUAGE plproxy;

       CREATE FUNCTION stats.get_stats(
           OUT stat_name text,
           OUT stat_value bigint)
       RETURNS SETOF record AS $$
           select stat_name,
                  (sum(stat_value))::bigint
             from stats._get_stats()
            group by stat_name
            order by stat_name;
       $$ LANGUAGE sql;

       -- each partition:
       CREATE FUNCTION stats._get_stats(
           OUT stat_name text,
           OUT stat_value bigint)
       RETURNS SETOF record AS $$
       declare
           seq record;
       begin
           for seq in
               select c.relname,
                      'select last_value from stats.' || c.relname as sql
                 from pg_class c, pg_namespace n
                where n.nspname = 'stats'
                  and c.relnamespace = n.oid
                  and c.relkind = 'S'
                order by 1
           loop
               stat_name := seq.relname;
               execute seq.sql into stat_value;
               return next;
           end loop;
           return;
       end;
       $$ LANGUAGE plpgsql;
  • 24. plProxy: Geographical Partitioning
     plProxy can be used to split a database into partitions based on country code. The example database is split into 'ru' and 'row' (rest of the world).
     Each function call caused by online users has the country code as one of its parameters.
     All data is replicated into a backend database for use by backoffice applications and batch jobs. That also reduces the number of indexes needed in the online databases.
       CREATE FUNCTION get_cluster(i_key_cc text)
       RETURNS text AS $$
       BEGIN
           IF i_key_cc = 'ru' THEN
               RETURN 'oltp_ru';
           ELSE
               RETURN 'oltp_row';
           END IF;
       END;
       $$ LANGUAGE plpgsql;
     (Diagram: onlinedb proxy in front of onlinedb_RU, onlinedb_ROW and backenddb.)
  • 25. plProxy: Application Driven Partitioning
     Sometimes the same function can be run on a read-only replica and on the online database, but the right partition is known only at runtime, depending on context.
     In that situation get_cluster can be used to allow the calling application to choose where the function will be executed.
       CREATE FUNCTION get_cluster(i_refresh boolean)
       RETURNS text AS $$
       BEGIN
           IF i_refresh THEN
               return 'shopdb';
           ELSE
               return 'shopdb_ro';
           END IF;
       END;
       $$ LANGUAGE plpgsql;
     (Diagram: shopdb proxy in front of shopdb_RW and two shopdb_RO replicas.)
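     A proxy function using that get_cluster; the function name and signature are invented for illustration (CLUSTER with a function argument is the dynamic form from the language slide):
       CREATE FUNCTION get_product_name(i_refresh boolean, i_sku text)
       RETURNS text AS $$
           CLUSTER get_cluster($1);
           RUN ON ANY;
       $$ LANGUAGE plproxy;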
  • 26. plProxy: Horizontal Partitioning
     We have partitioned most of our databases by username, using the PostgreSQL hashtext function to get an equal distribution between partitions.
     As proxy databases are stateless, we can have several exact duplicates for load balancing and high availability.
       CREATE FUNCTION public.get_user_email(text)
       RETURNS text AS $$
           cluster 'userdb';
           run on get_hash($1);
       $$ LANGUAGE plproxy;

       CREATE FUNCTION public.get_hash(i_key_user text)
       RETURNS integer AS $$
       BEGIN
           return hashtext(lower(i_key_user));
       END;
       $$ LANGUAGE plpgsql;
     (Diagram: two userdb proxies in front of userdb_p0 .. userdb_p3.)
  • 27. plProxy: Partitioning Partition Functions
     check_hash is used to validate incoming calls.
     partconf functionality handles sequences.
     The validate_hash function is used in triggers to exclude data during a split.
       CREATE FUNCTION public.get_user_email(i_username text)
       RETURNS text AS $$
       DECLARE
           retval text;
       BEGIN
           PERFORM partconf.check_hash(i_username);
           SELECT email
             INTO retval
             FROM users
            WHERE username = lower(i_username);
           RETURN retval;
       END;
       $$ LANGUAGE plpgsql;
  • 28. plProxy: Horizontal Split
     When splitting databases we usually prepare new partitions on other servers and then switch all traffic at once, to keep our life simple.
     Steps to increase the partition count by 2x:
       1. Create 2 new partitions for each old one.
       2. Replicate partition data to the corresponding new partitions.
       3. Delete unnecessary data from the new partitions; cluster and analyze the tables.
       4. Prepare the new plProxy configuration.
       5. Switch over to them:
          - take down the service IP
          - bring up the new plProxy version
          - wait for the replicas to catch up
          - set the service IP up again
     You can have several partitions on one server. Moving them out later will be easy.
     (Diagram: userdb_a0 and userdb_a1 split into userdb_b0 .. userdb_b3.)
  • 29. plProxy: Partconf
     Partconf is useful when you have split a database into several partitions and there are also failover replicas of the same databases.
     Some mechanism for handling sequence IDs is needed.
     Some mechanism for protecting against misconfiguration makes life safer.
     CREATE FUNCTION partconf.set_conf(i_part_nr integer, i_max_part integer, i_db_code integer) – used when partconf is installed, to initialize it. db_code is supposed to be unique over all the databases.
     CREATE FUNCTION partconf.check_hash(key_user text) RETURNS boolean – used internally from other functions to validate that the function call was directed to the correct partition. Raises an error if this is not the right partition for that username.
     CREATE FUNCTION global_id() RETURNS bigint – used to get globally unique IDs.
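     A usage sketch built only from the signatures above; the argument values and the table are invented, and the exact semantics of i_max_part follow the partconf implementation:
       -- initialize this database as partition 0, db_code 7
       SELECT partconf.set_conf(0, 3, 7);

       -- sequences: use globally unique ids so partitions never collide
       CREATE TABLE invoice (
           invoice_id bigint PRIMARY KEY DEFAULT global_id(),
           username   text NOT NULL
       );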
  • 30. plProxy: Summary
     plProxy adds very little overhead when used together with pgBouncer.
     On the other hand, plProxy adds complexity to development and maintenance, so it must be used with care – but that is true for most everything.
  • 32. pgBouncer
     pgBouncer is a lightweight and robust connection pooler for Postgres.
     Low memory requirements (2 kB per connection by default). This is due to the fact that pgBouncer does not need to see a full packet at once.
     It is not tied to one backend server; the destination databases can reside on different hosts.
     Supports online reconfiguration for most of the settings.
     Supports online restart/upgrade without dropping client connections.
     Supports protocol V3 only, so the backend version must be >= 7.4.
     Does not parse SQL, so it is very fast and uses little CPU time.
  • 33. pgBouncer Pooling Modes
     Session pooling – the most polite method. When a client connects, a server connection is assigned to it for the whole duration it stays connected. When the client disconnects, the server connection is put back into the pool. Should be used with legacy applications that won't work with the more efficient pooling modes.
     Transaction pooling – a server connection is assigned to a client only during a transaction. When pgBouncer notices that the transaction is over, the server connection is put back into the pool. This is a hack, as it breaks application expectations about the backend connection. Use it only when the application cooperates by not using features that can break.
     Statement pooling – the most aggressive method. This is transaction pooling with a twist: multi-statement transactions are disallowed. It is meant to enforce "autocommit" mode on the client, mostly targeted at PL/Proxy.
  • 34. pgBouncer Configuration
       [databases]
       forcedb = host=127.0.0.1 port=300 user=baz password=foo
       bardb = host=127.0.0.1 dbname=bazdb
       foodb =

       [pgbouncer]
       logfile = pgbouncer.log
       pidfile = pgbouncer.pid
       listen_addr = 127.0.0.1
       listen_port = 6432
       auth_type = any / trust / crypt / md5
       auth_file = etc/userlist.txt
       pool_mode = session / transaction / statement
       max_client_conn = 100
       default_pool_size = 20
       admin_users = admin, postgres
       stats_users = stats
  • 35. pgBouncer Console
     Connect to database 'pgbouncer':
       $ psql -d pgbouncer -U admin -p 6432
     SHOW HELP;
     SHOW CONFIG; SHOW DATABASES;
     SET default_pool_size = 50;
     RELOAD;
     SHOW POOLS; SHOW STATS;
     PAUSE; PAUSE <db>;
     RESUME; RESUME <db>;
  • 36. pgBouncer Use Cases
     Decreasing the number of connections hitting the real databases.
     Taking the logging-in load off the real databases.
     Central redirection point for database connections. Functions like a ServiceIP. Allows pausing and redirecting of users.
     Backend pooler for PL/Proxy.
     Database aliases. You can have different external names connected to one DSN. Makes it possible to pause them separately and merge databases.
  • 38. SkyTools and PgQ
     We keep SkyTools, like pgBouncer and plProxy, in our standard database server image, so each new server has it installed from the start.
     The framework provides database connectivity, logging, stats management, encoding, decoding etc. for batch jobs. Developers need only take care of the business logic in batch jobs; all the rest is handled by the framework.
     SkyTools contains many reusable generic scripts.
  • 39. PgQ: PostgreSQL Queuing
     Ideally suited for all kinds of batch processing.
     Efficient. Creating events adds very little overhead, and the queue has a beautiful interface that allows events to be created using standard SQL INSERT INTO syntax.
     Fast. Events are processed in batches, which usually means higher processing speed.
     Flexible. Each queue can have several producers and consumers and any number of event types handled in it.
     Robust. PgQ guarantees that each consumer sees an event at least once. There are several methods by which the processed events can be tracked, depending on business needs.
     (Diagram: onlinedb and queuedb, both running SkyTools, feeding batchdb and a batch job.)
  • 40. PgQ – Setup
     Basic PgQ setup and usage can be illustrated by the following steps:
       1. create the database
       2. edit a PgQ ticker configuration file, say ticker.ini
       3. install PgQ internal tables
          $ pgqadm.py ticker.ini install
       4. launch the PgQ ticker on the database machine as a daemon
          $ pgqadm.py -d ticker.ini ticker
       5. create the queue
          $ pgqadm.py ticker.ini create <queue>
       6. register the consumer, or run it to register it automatically
          $ pgqadm.py ticker.ini register <queue> <consumer>
       7. start producing events
  • 41. PgQ – Configuration
     The ticker reads the event id sequence for each queue.
     If new events have appeared, it inserts a tick if:
        a configurable amount of events have appeared – ticker_max_count (500)
        a configurable amount of time has passed since the last tick – ticker_max_lag (3 sec)
     If there are no events in the queue, it creates a tick if some time has passed – ticker_idle_period (60 sec).
     Configuring from the command line:
       $ pgqadm.py ticker.ini config my_queue ticker_max_count=100
     Example ticker.ini:
       [pgqadm]
       job_name = pgqadm_somedb
       db = dbname=somedb
       # how often to run maintenance
       maint_delay = 600
       # how often to check for activity
       loop_delay = 0.1
       logfile = ~/log/%(job_name)s.log
       pidfile = ~/pid/%(job_name)s.pid
  • 42. PgQ Event Structure
       CREATE TABLE pgq.event (
           ev_id     int8 NOT NULL,
           ev_txid   int8 NOT NULL DEFAULT txid_current(),
           ev_time   timestamptz NOT NULL DEFAULT now(),
           -- rest are user fields --
           ev_type   text,  -- what to expect from ev_data
           ev_data   text,  -- main data, urlenc, xml, json
           ev_extra1 text,  -- metadata
           ev_extra2 text,  -- metadata
           ev_extra3 text,  -- metadata
           ev_extra4 text   -- metadata
       );
       CREATE INDEX txid_idx ON pgq.event (ev_txid);
  • 43. PgQ: Producer – API Event Insertion
     Single event insertion:
        pgq.insert_event(queue, ev_type, ev_data): int8
     Bulk insertion, in a single transaction:
        pgq.current_event_table(queue): text
     Inserting with triggers:
        pgq.sqltriga(queue, ...) – partial SQL format
        pgq.logutriga(queue, ...) – urlencoded format
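     For the bulk path, the active event table rotates, so its name must be fetched inside the same transaction that does the COPY/INSERT; a sketch (queue name invented):
       SELECT pgq.current_event_table('mail_events');
       -- then COPY or INSERT rows directly into the returned table, within this transaction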
  • 44. PgQ API: Insert Complex Event with Pure SQL
     Create an interface table for the queue, with logutriga for encoding inserted events.
     Type safety, default values, sequences, constraints!
     Several tables can insert into the same queue.
     Encoding and decoding are totally hidden from developers; they just work with tables.
       CREATE TABLE queue.mailq (
           mk_template bigint NOT NULL,
           username text NOT NULL,
           email text,
           mk_language text,
           payload text
       );

       CREATE TRIGGER send_mailq_trg
       BEFORE INSERT ON queue.mailq
       FOR EACH ROW
       EXECUTE PROCEDURE pgq.logutriga('mail_events', 'SKIP');

       -- send e-mail
       INSERT INTO queue.mailq (mk_template, username, email)
       VALUES (73, _username, _email);
  • 45. PgQ: Consumer – Reading Events
     Registering:
        pgq.register_consumer(queue, consumer)
        pgq.unregister_consumer(queue, consumer)
     or:
        $ pgqadm.py <ini> register <queue> <consumer>
        $ pgqadm.py <ini> unregister <queue> <consumer>
     Reading:
        pgq.next_batch(queue, consumer): int8
        pgq.get_batch_events(batch_id): SETOF record
        pgq.finish_batch(batch_id)
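     One processing cycle with these calls, with invented queue and consumer names; next_batch returns NULL when no batch is ready yet, and 42 stands in for the returned batch_id:
       SELECT pgq.register_consumer('user_status', 'presence_updater');  -- once
       SELECT pgq.next_batch('user_status', 'presence_updater');
       SELECT ev_id, ev_type, ev_data FROM pgq.get_batch_events(42);
       SELECT pgq.finish_batch(42);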
  • 46. PgQ: Tracking Events
     Per-event overhead.
     Need to avoid accumulating tracking data.
     pgq_ext solution:
        pgq_ext.is_event_done(consumer, batch_id, ev_id)
        pgq_ext.set_event_done(consumer, batch_id, ev_id)
     If the batch changes, old events are deleted.
     E.g. email sender, plProxy.
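     Inside a consumer this looks roughly like the following, reusing the batch_id/ev_id placeholders from the previous sketch:
       SELECT pgq_ext.is_event_done('mailer', 42, 1001);   -- true => already processed, skip it
       -- ... otherwise process event 1001 ...
       SELECT pgq_ext.set_event_done('mailer', 42, 1001);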
  • 47. PgQ: Tracking Batches
     Minimal per-event overhead.
     Requires that the whole batch is processed in one transaction.
        pgq_ext.is_batch_done(consumer, batch_id)
        pgq_ext.set_batch_done(consumer, batch_id)
     E.g. replication and most of the SkyTools partitioning scripts, like queue_mover, queue_splitter, table_dispatcher, cube_dispatcher.
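     Batch-level tracking, sketched the same way; everything for the batch runs in one transaction on the target side:
       BEGIN;
       SELECT pgq_ext.is_batch_done('replicator', 42);  -- true => this batch was already applied
       -- ... otherwise apply all events of batch 42 ...
       SELECT pgq_ext.set_batch_done('replicator', 42);
       COMMIT;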
  • 48. PgQ: Queue Mover
     Moves data from a source queue in one database to another queue in another database.
     Used to move events from online databases to other databases, for processing or just storage.
     Consolidates events if there are several producers, as in the case of partitioned databases.
       [queue_mover]
       job_name = eventlog_to_target_mover
       src_db = dbname=oltpdb
       dst_db = dbname=batchdb
       pgq_queue_name = batch_events
       dst_queue_name = batch_events
       pidfile = pid/%(job_name)s.pid
       logfile = log/%(job_name)s.log
     (Diagram: queue movers from userdb_p1 and userdb_p2 into logdb.)
  • 49. PgQ: Queue Mover Code
       import sys
       import pgq

       class QueueMover(pgq.RemoteConsumer):
           def process_remote_batch(self, db, batch_id, ev_list, dst_db):
               # prepare data
               rows = []
               for ev in ev_list:
                   rows.append([ev.type, ev.data, ev.time])
                   ev.tag_done()

               # insert data
               fields = ['ev_type', 'ev_data', 'ev_time']
               curs = dst_db.cursor()
               dst_queue = self.cf.get('dst_queue_name')
               pgq.bulk_insert_events(curs, rows, fields, dst_queue)

       script = QueueMover('queue_mover', 'src_db', 'dst_db', sys.argv[1:])
       script.start()
  • 50. PgQ: Queue Splitter
     Moves data from a source queue in one database to one or more queues in the target database, based on producer. It is another version of queue_mover, but it has its own benefits.
     Used to move events from online databases to queue databases.
     Reduces the number of dependencies of online databases.
     (Diagram: the promotional and transactional mailers in userdb_pN produce into queue batch_events; the queue splitter fans it out into mailqdb queues welcome_emails and password_emails.)
  • 51. PgQ: Simple Consumer
     The simplest consumer. The basic idea is to read url-encoded events from a queue and execute SQL, using fields from those events as parameters.
     Very convenient when you have events generated in one database and want background processing, or want to play them into another database.
     Does not guarantee that each event is processed only once, so it's up to the called function to guarantee that duplicate calls will be ignored.
       [simple_consumer]
       job_name = job_to_be_done
       src_db = dbname=sourcedb
       dst_db = dbname=targetdb
       pgq_queue_name = event_queue
       dst_query = SELECT * FROM function_to_be_run(%%(username)s, %%(email)s);
       logfile = ~/log/%(job_name)s.log
       pidfile = ~/pid/%(job_name)s.pid
  • 52. PgQ: Simple Serial Consumer
     Similar to the simple consumer, but uses pgq_ext to guarantee that each event will be processed only once.
       import sys, pgq, skytools

       class SimpleSerialConsumer(pgq.SerialConsumer):
           def __init__(self, args):
               pgq.SerialConsumer.__init__(self, "simple_serial_consumer", "src_db", "dst_db", args)
               self.dst_query = self.cf.get("dst_query")

           def process_remote_batch(self, batch_id, ev_list, dst_db):
               curs = dst_db.cursor()
               for ev in ev_list:
                   payload = skytools.db_urldecode(ev.payload)
                   curs.execute(self.dst_query, payload)
                   res = curs.dictfetchone()
                   ev.tag_done()

       if __name__ == '__main__':
           script = SimpleSerialConsumer(sys.argv[1:])
           script.start()
  • 53. PgQ: Table Dispatcher
     Takes url-encoded events as its data source and writes them into a table in the target database.
     Used to partition data. For example, change logs that need to be kept online only briefly can be written to daily tables and then dropped as they become irrelevant.
     Also allows selecting which columns have to be written into the target database.
     Creates target tables according to the configuration file, as needed.
     (Diagram: one table dispatcher writing table history into history_2008_09, history_2008_10, ...; another writing queue call_records into table cr partitions cr_2008_09_01, cr_2008_09_02, ...)
  • 54. PgQ: Cube Dispatcher
     Takes url-encoded events as its data source and writes them into partitioned tables in the target database. Logutriga is used to create the events.
     Used to provide batches of data for business intelligence and data cubes.
     Only one instance of each record is stored. For example, if a record is created and then updated twice, only the latest version of the record stays in that day's table.
     Does not support deletes.
     (Diagram: producers invoices and payments feed queue cube_events; the cube dispatcher writes tables invoices_2008_09_01, invoices_2008_09_02, ... and payments_2008_09_01, payments_2008_09_02, ...)
  • 56. Londiste: Replication
     We use replication:
        to transfer online data into internal databases
        to create failover databases for online databases
        to switch between PostgreSQL versions
        to distribute internally produced data into online servers
        to create read-only replicas for load balancing
     Londiste is implemented using PgQ as the transport layer.
     It has a DBA-friendly command line interface.
     (Diagram: shopdb (online) replicating to analysisdb (internal), shopdb_ro (read only) and shopdb (failover).)
  • 57. Londiste: Setup
       1. create the subscriber database, with tables to replicate
       2. edit a londiste configuration file, say conf.ini, and a PgQ ticker configuration file, say ticker.ini
       3. install londiste on the provider and subscriber nodes. This step requires admin privileges on both provider and subscriber sides, and both install commands can be run remotely:
          $ londiste.py conf.ini provider install
          $ londiste.py conf.ini subscriber install
       4. launch the PgQ ticker on the provider machine:
          $ pgqadm.py -d ticker.ini ticker
       5. launch the londiste replay process:
          $ londiste.py -d conf.ini replay
       6. add tables to replicate from the provider database:
          $ londiste.py conf.ini provider add table1 table2 ...
       7. add tables to replicate to the subscriber database:
          $ londiste.py conf.ini subscriber add table1 table2 ...
  • 58. Londiste: Configuration
       [londiste]
       job_name = londiste_providerdb_to_subscriberdb
       provider_db = dbname=provider port=5432 host=127.0.0.1
       subscriber_db = dbname=subscriber port=6432 host=127.0.0.1
       # it will be used as sql ident so no dots/spaces
       pgq_queue_name = londiste.replika
       logfile = /tmp/%(job_name)s.log
       pidfile = /tmp/%(job_name)s.pid
  • 59. Londiste: Usage
     Tables can be added into the set one by one.
     The initial COPY for one table does not block event replay for other tables.
     Can compare tables on both sides.
     Supports sequences.
     There can be several subscribers listening to one provider queue.
     Each subscriber can listen to any number of the tables provided by the provider; all the other events are ignored.
  • 61. SkyTools: WalMgr
     Used to create standby servers inside the same colocation.
     Also provides point-in-time recovery.
     Requires more network bandwidth than Londiste, but has very little management overhead.
     The most common usage is when a server starts showing signs of breaking down: then we move the database onto a fresh server until the other one gets fixed.
     Upgrades onto better hardware are also quite common.
  • 62. SODI Framework
     Rapid application development.
     Cheap changes.
     Most of the design is in metadata, mostly generated.
     Application design stays in sync with the application over time.
     Minimizes the number of database roundtrips.
     Can send multiple recordsets to one function call.
     Can receive several recordsets from a function call.
     (Diagram of layers:
        Application layer – java / php ..., UI logic;
        AppServer layer – java / php ..., user authentication, roles and rights, no business logic, SODI framework;
        Database layer – business logic, plPython, PostgreSQL, SkyTools.)
  • 63. Multiple Recordsets in Function Resultset
       CREATE FUNCTION meta.get_iotype(i_context text, i_params text)
       RETURNS SETOF public.ret_multiset AS $$
           import dbservice2
           dbs = dbservice2.ServiceContext( i_context, GD )
           params = dbs.get_record( i_params )
           sql = """
               select id_iotype, mod_code,
                      schema || '.' || iotype_name as iotype,
                      comments, version
                 from meta.iotype
                where id_iotype = {id_iotype}
           """
           dbs.return_next_sql( sql, params, 'iotype' )
           sql = """
               select id_iotype_attr, key_iotype, attr_name, mk_datatype,
                      comments, label, version, order_by, br_codes
                 from meta.iotype_attr
                where key_iotype = {id_iotype}
                order by order_by
           """
           dbs.return_next_sql( sql, params, 'ioattr' )
           return dbs.retval()
       $$ LANGUAGE plpythonu;
  • 64. Multiple Recordsets as Function Parameters
       CREATE FUNCTION dba.set_role_grants(i_context text, i_params text, i_roles text, i_grants text)
       RETURNS SETOF public.ret_multiset AS $$
           import dbservice2
           dbs = dbservice2.ServiceContext( i_context, GD )
           params = dbs.get_record( i_params )
           t_roles = dbs.get_record_list( i_roles )
           t_grants = dbs.get_record_list( i_grants )
           ...
           for r in t_roles:
               key_role = roles.get(r.role_name)
               r.id_host = params.id_host
               r.key_role = key_role
               if key_role is None:
                   dbs.run_query( """
                       insert into dba.role ( key_host, role_name, system_privs,
                              role_options, password_hash, connection_limit,
                              can_login, granted_roles )
                       values ( {id_host}, {role_name}, {system_privs}, {role_options},
                              {password_hash}, {connection_limit}, {can_login},
                              string_to_array({granted_roles},',') )
                   """, r)
                   dbs.tell_user(dbs.INFO, 'dbs1213', 'Role added: %s' % r.role_name)
           ...