O slideshow foi denunciado.
Seu SlideShare está sendo baixado. ×

Customize and Secure the Runtime and Dependencies of Your Procedural Languages Using PL/Container - Greenplum Summit 2018

Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Próximos SlideShares
LXC on Ganeti
LXC on Ganeti
Carregando em…3
×

Confira estes a seguir

1 de 42 Anúncio
Anúncio

Mais Conteúdo rRelacionado

Diapositivos para si (20)

Semelhante a Customize and Secure the Runtime and Dependencies of Your Procedural Languages Using PL/Container - Greenplum Summit 2018 (20)

Anúncio

Mais de VMware Tanzu (20)

Mais recentes (20)

Anúncio

Customize and Secure the Runtime and Dependencies of Your Procedural Languages Using PL/Container - Greenplum Summit 2018

  1. 1. © Copyright 2017 Pivotal Software, Inc. All rights Reserved. Version 1.0 Hubert Zhang (hzhang@pivotal.io) Jack WU (jwu@pivotal.io) PL/Container Introduction Customize and Secure the Runtime and Dependencies of Procedural Languages
  2. 2. Cover w/ Image Agenda ■ The Problem ■ What is PL/Container ■ How to use PL/Container ■ PL/Container Internals ■ Future Work ■ Q+A
  3. 3. The Problem PGConf 2018: PL/Container Introduction
  4. 4. We generate More and More Data
  5. 5. We generate More and More Data 1.2 ZB 2.8ZB 8.5ZB 40ZB2010 2012 2015 20201ZB=1,000,000,000,000,000,000,000 Byte
  6. 6. We want to analyze data for knowledge
  7. 7. We want to analyse data IN Database
  8. 8. But…
  9. 9. PL/Python and PL/R are UNTRUSTED Languages
  10. 10. Only Superuser can Create UDF in Untrusted Languages System(“rm -rf /data”)
  11. 11. The Problem: Triangle Dependency Data Scientist DBA UDF Review & Create Run UDF Greenplum Package UDFPackage 1. Greenplum 2. Operation System 3. Python / R 4. TensorFlow
  12. 12. Resolve The Problem: untrusted -> untrusted Data Scientist DBA UDF Review & Create Run UDF Create UDF
  13. 13. How to Make untrusted to untrusted? PL/Container PL/Container
  14. 14. What is PL/Container PGConf 2018: PL/Container Introduction
  15. 15. What is PL/Container? PL/Container is a customizable, secure runtime for Greenplum Database Procedural Languages. ● Greenplum Database Extension ● Stateless ● Based on Docker Container ● Customizable ● Secure ● Isolated PL/Container PL/Container
  16. 16. 6.Ping 7.Ping 8.Call 9.SPI 10.Result 11.Result PL/Container Architecture GPDB Segment host 1. Query plan 2. Parse container name from UDF body 3. Read configuration 4. Bring up container 5. Start Repeat p.8–p.11 12. Result Repeat p.12
  17. 17. How UDF run in PL/Container? PL/Container is a customizable, secure runtime for Greenplum Database Procedural Languages. a. PL/Container Extension starts a docker container (only in 1st call) b. Transfer UDF and data to docker container c. Run the UDF in docker container d. Contact the docker container to get the results PL/Container PL/Container
  18. 18. How to use PL/Container PGConf 2018: PL/Container Introduction
  19. 19. Install PL/Container on Greenplum Install from Source Code ● source $GPHOME/greenplum_path.sh ● make install Install from GPPKG ● gppkg –i plcontianer-1.1.0-rhel7-x86_64.gppkg ● no additional dependencies Platform Centos 6.6+ or 7.x Database Greenplum 5.2+ Docker Docker 17.05+ on Centos7 Docker 1.7+ on Centos6 Prerequisites
  20. 20. Build Custom Docker Image (optional) Minimum Requirement: ● Python or R environment ● Add location of libpython.so and libR.so to LD_LIBRARY_PATH Customize Your image: ● Install specific packages FROM continuumio/anaconda ENV LD_LIBRARY_PATH "/opt/conda/lib:$LD_LIBRARY_PATH” FROM continuumio/anaconda3 RUN conda install -c conda-forge -y tensorflow ENV LD_LIBRARY_PATH "/opt/conda/lib:/usr/local/lib:$LD_LIBRARY_PATH"
  21. 21. runtime <id> <image> <command> <shared directory> <setting memory_mb> <setting cpu_share> runtime add runtime delete runtime backup runtime restore runtime edit runtime show image add image delete image list Image RuntimeXML Configure PL/Container container cgroup node memory.memsw.limit_in_bytes cpu.shares
  22. 22. Run PL/Container Running a simple plpython UDF to calculate log10 postgres=# CREATE LANGUAGE plpythonu; plcontainer; postgres=# CREATE OR REPLACE FUNCTION pylog10(input double precision) RETURNS double precision AS $$ import math return math.log10(input) $$ LANGUAGE plpythonu; postgres=# SELECT pylog10(100); pylog10 -------------- 2 (1 row)
  23. 23. Run PL/Container Running a simple PL/Container UDF to calculate log10 postgres=# CREATE EXTENSION plcontainer; postgres=# CREATE OR REPLACE FUNCTION pylog10(Input double precision) RETURNS double precision AS $$ # container: plc_python_shared import math return math.log10(input) $$ LANGUAGE plcontainer; postgres=# SELECT pylog10(100); pylog10 -------------- 2 (1 row)
  24. 24. Demos How to use PLcontainer
  25. 25. PL/Container Internal PGConf 2018: PL/Container Introduction
  26. 26. Cover w/ Image PL/Container Internals ■ Message Protocol ■ SPI Support ■ Pluggable Backend ■ Resource Management ■ Error Handling ■ Performance
  27. 27. Message Protocol PL/Container use messages to communicate between QEs and containers. ● plcMsgPing ● plcMsgCallreq ● plcMsgResult ● plcMsgError ● plcMsgLog ● plcMsgSQL ● plcMsgSubtransaction ● plcMsgRaw Query Executor Container PING PONG SQLResultSQLError/LogCallReq Result
  28. 28. SPI support Server Programming Interface enable UDF to run SQL queries. Query ExecutorContainer plpy.execute(query) plan=plpy.prepare plpy.execute(plan) SPI_execute() create&cache plan SPI_execute_plan Where Query Generated Where Query Executed Problem: SPI is called inside container but executed at QE side.
  29. 29. Pluggable Backend Docker as a sandbox Container as a serviceSeparate Process GPDB Query Executor C CODE Python Executor Python Code
  30. 30. Resource Management Container level ● Memory: memory limit, minimum is 4M ● CpuShares: relative weight of CPU Extension level ● Integrated with GPDB resource group extension framework. OOM
  31. 31. Container level ● Memory: memory limit, minimum is 4M ● CpuShares: relative weight of CPU Extension level ● Integrated with GPDB resource group extension framework. create resource group plgroup (concurrency=0, cpu_rate_limit=10, memory_limit=30, memory_auditor=‘cgroup’) Resource Management OOM
  32. 32. Resource Management Container level ● Memory: memory limit, minimum is 4M ● CpuShares: relative weight of CPU Extension level ● Integrated with GPDB resource group extension framework. memory gpdb default_grou p OID plgroup oid aefs9err8e flkr345ere4 cgroup 38G 1G 2G create resource group plgroup (concurrency=0, cpu_rate_limit=10, memory_limit=30, memory_auditor=‘cgroup’)
  33. 33. Error Handling Container failure should not affect GPDB core ● Containers fail to create ● Containers fail to start ● Containers crash when running ● Cached containers crash Container cleanup ● Query Cancel ● QE error ● Cached QE quit when idle for a long time (By Cleanup process)
  34. 34. Performance Optimization ● Cached container (lifecycle same as QE) ● Unix domain socket ● Type conversion ● Resource management (CPU share) Best practices ● Array instead of multiple rows. ● Complex UDF instead of simple one QE Container start container tuple 1 tuple 2 query 1 query 2 tuple 1 tuple n tuple n . . . . .
  35. 35. Performance Test Environment ● Hardware: 6 virtual machines, each with 19G memory and 5 processors. (Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz) ● Software: Centos7, GPDB 5.2 with 30 segments. Workloads ● Long-running function ● Large input array function ● Large output array function
  36. 36. Performance Long-running function CREATE OR REPLACE FUNCTION pysleep(i int) RETURNS void AS $$ # contaziner: plc_python_shared import time time.sleep(i) $$ LANGUAGE plcontainer; SELECT count(pysleep(1)) FROM tbl; no performance downgrade
  37. 37. Performance Large input array function CREATE OR REPLACE FUNCTION pylargeint8in(a int8[]) RETURNS float8 AS $$ #container : plc_python_shared return sum(a)/float(len(a)) $$ LANGUAGE plcontainer; SELECT count(pylargeint8in(ARRAY(SELECT column1 FROM tbl1))) FROM tbl2; 2 times performance downgrade for Python 30% performance improvement for R
  38. 38. Performance Large output array function CREATE OR REPLACE FUNCTION pylargeoutfloat8(num int) RETURNS float8[] AS $$ # container: plc_python_shared return [x/3.0 for x in range(num)] $$ LANGUAGE plcontainer; SELECT count(pylargeoutfloat8(n)) FROM tbl; 4 times performance improvement
  39. 39. Future Work PGConf 2018: PL/Container Introduction
  40. 40. PL/Container Future Work (subject to change) Support More Languages Support More TechnologyContainer Orchestrater / Cloud
  41. 41. https://github.com/greenplum-db/plcontainer https://gpdb.docs.pivotal.io/570/ref_guide/extensions/pl_container.html
  42. 42. Transforming How The World Builds Software © Copyright 2017 Pivotal Software, Inc. All rights Reserved.

×