TDC2017 | POA Big Data Track - IBM BigSQL - High-performance data query engine for Hadoop

Published in: Education

  1. © 2016 IBM Corporation. Big SQL – An Overview. Julio Boehl, boehl@br.ibm.com
  2. Big SQL Master Class
     ▪ 25+ micro learning topics (short videos, 5-15 minutes): Use Cases, Install, Security, Performance, Federation, and more
     ▪ http://bit.ly/2tHYfw0
  3. Leaders in Technology with Common Goals
     Consumers get best-in-class technology with a solid roadmap.
     ▪ Data Science Platform ranked #1 by Gartner
     ▪ Leader in SQL technology for Hadoop
     ▪ Leader in on-premises and hybrid cloud data and analytics solutions
     ▪ Leader in open source Hadoop distribution
     ▪ 1000+ customers and 2100+ ecosystem partners
     ▪ Original architects, developers and operators of Hadoop
     Commitment to progressing advanced analytics through open source.
  4. IBM and Hortonworks Partnership History (timeline covering 2015, 2016 and 2017)
     ▪ IBM and Hortonworks co-found ODPi
     ▪ IBM IOP and HDP certify for ODPi V1
     ▪ IBM and Hortonworks Power partnership
     ▪ IBM IOP and HDP certify for ODPi V2
     ▪ Big SQL certified for IOP and HDP
     ▪ IBM and Hortonworks expand partnership
     ODPi = Open Data Platform initiative. For more information, visit odpi.org
  5. IBM and Hortonworks Advance Client's Analytics Journey
     ▪ Big Data Persistent Storage: Hortonworks Data Platform
     ▪ Big Data Access Layer: IBM Big SQL
     ▪ Data Science and Machine Learning IDE: IBM Data Science & Machine Learning
  6. 16+ SQL Engines for Hadoop (alphabetical ordering)
     Big SQL (IBM), Drill, HAWQ, Hive, Impala, InfiniDB, JethroData, MemSQL, Phoenix, Presto, Spark SQL, Splice Machine, Transwarp, Trafodion, Vertica on Hadoop (… and I'm sure we're missing a few …)
  7. Each SQL engine has its own advantage
     [Diagram: SQL engines positioned by workload]
     ▪ Workload labels: Fewer Users; Ad Hoc Queries & Discovery; Transactional; Fast Lookups; Operational Data Store; Ad Hoc Data Preparation; EL-T and Simpler Large Scale Queries; Complex SQL, Many Users, Warehousing
     ▪ Engine labels: Hive, Spark SQL, Drill, Phoenix + HBase, Splice-Machine, ???
  8. Hive is Really 3 Things… (open source SQL on Hadoop)
     ▪ SQL Execution Engine: Hive (open source), running on MapReduce or Tez
     ▪ Hive Storage Model (open source): CSV, Tab Delim., Parquet, ORC, Others…
     ▪ Hive Metastore (open source)
     Applications sit on top of the SQL execution engine.
  9. Big SQL Preserves Open Source Foundation
     ▪ Leverages the Hive metastore and storage formats.
     ▪ No lock-in. Data is part of Hadoop, not Big SQL.
     ▪ Fall back to the open source Hive engine at any time.
     [Diagram: Applications on top of the SQL execution engines, Big SQL (IBM) and Hive (open source); both share the Hive Storage Model (open source: CSV, Tab Delim., Parquet, ORC, Others…) and the Hive Metastore (open source)]
  10. IBM Big SQL: Making Big Data SQL Accessible
     ▪ Rich SQL: ANSI compliant SQL; IBM SQL PL compatibility; extensive analytic functions; Fluid Query for heterogeneous DB support
     ▪ Application Portability: SQL compatibility; standard ODBC and JDBC drivers; comprehensive file format support; data shared with the Hadoop ecosystem
     ▪ High Performance: modern MPP runtime; cost-based optimizer; powerful query rewrite; optimized for concurrent user throughput
     ▪ Enterprise Ready: advanced security and auditing; workload management; self-tuning memory management; comprehensive monitoring
  11. IBM Big SQL on Hadoop
     ▪ Comprehensive ANSI SQL on Hadoop
       – All standard SQL language
       – Stored procedures and user-defined functions
     ▪ Integration with RDBMS
     ▪ The Big SQL LOAD command can load data from a remote database or table
     ▪ Query heterogeneous databases, such as Oracle or Teradata, using the federation feature
     ▪ Optimization and performance
       – Replaces MapReduce layer
       – In-memory operations with ability to spill to disk
       – Cost-based query optimization
     ▪ Open Hadoop storage supported
       – Data persisted in HDFS, Hive, HBase
     [Diagram: SQL-based Application → Big SQL Engine (SQL MPP Run-time) → Data Storage (HDFS) on Hadoop]
  12. Boost Your Performance! Hive 2 LLAP is good and with Big SQL it's even better

     Concurrent Queries   Hive 2 + LLAP (24 x 2TB disks)   Big SQL (12 x 2TB disks)
     5                    7.76                             4.23
     25                   36.24                            4.42
     100                  102.89                           4.72

     Despite running with 50% fewer nodes, Big SQL was 22X faster at 100 concurrent users.

     Let's try 10x more data:

     Concurrent Queries   Hive 2 + LLAP @ 1 TB   Big SQL @ 1 TB   Big SQL @ 10 TB
     5                    7.76                   4.23             8.72
     25                   36.24                  4.42             36.39
     100                  102.89                 4.72             37.02

     Big SQL performs 275% faster at 100 concurrent users!
     [Chart: elapsed time for Hive 2 + LLAP and Big SQL 4.3 at 5, 25 and 100 concurrent queries]
  13. Data engineer: Big SQL is a synergetic SQL engine that offers SQL compatibility, portability and collaborative ability to get composite analysis on data
     ▪ Easy porting of enterprise applications
     ▪ Ability to work seamlessly with Business Intelligence tools like Cognos to gain insights
     ▪ Big SQL integrates with Information Governance Catalog by enabling easy shared imports to InfoSphere Metadata Asset Manager, which allows you to:
        Analyze assets
        Utilize assets in jobs
        Designate stewards for the assets
     [Diagram: Oracle SQL, DB2 SQL and Netezza SQL feed into Big SQL through SQL syntax tolerance (ANSI SQL compliant); Big SQL connects to Cognos Analytics and InfoSphere Metadata Asset Manager]
  14. Manhattan Associates: A World Leader for Warehouse Management Solutions
     ▪ #1 requirement: existing Cognos reports (on Oracle) must run against data archived to Hadoop.
     ▪ Before Big SQL: failed with Cloudera Impala, Hive + Tez, MapR DB.
     ▪ With Big SQL: successful with all reports, unmodified, in a 1-day PoC.
     "If you're using Cognos, Big SQL is the best option for Hadoop" - Vivek Srivastava, Sr. Director, Manhattan Associates Supply Chain Solutions
  15. Data engineer: Big SQL is a powerful analytical engine with leading performance metrics on high volumes of data and concurrent streams
     100TB Hadoop-DS at a glance:
     ▪ Performance: Big SQL 4.3 is 3.2x faster than Spark SQL 2.1 (4 concurrent streams)
     ▪ I/O (vs Spark): Big SQL reads 12x less data and writes 30x less data
     ▪ Compression: 60% space saved with Parquet
     ▪ Average CPU usage: 76.4%
     ▪ Max I/O throughput: read 4.4 GB/sec, write 2.8 GB/sec
     ▪ Working queries
  16. Data engineer: Big SQL is a SQL engine with self-tuning memory management that integrates with Spark 2.1
     ▪ Spark 2.1 is a powerful analytic co-processor that complements the rich SQL functionality of Big SQL
     ▪ Tight integration with Spark enables the Big SQL worker and the Spark executor to communicate in memory without writing to disk
     ▪ Bi-directional integration allows Spark jobs to be executed from Big SQL
     [Diagram: Big SQL worker and Spark executor share data in memory, on top of HDFS]
  17. Data engineer: Big SQL is the ultimate hybrid SQL engine that allows query federation by virtualizing data sources and pushes processing to where the data resides
     ▪ Big SQL transparently queries heterogeneous systems in a single query
        Join Hadoop to RDBMSs
        Query optimizer understands the capabilities of the external system, including available statistics
     ▪ Pre-bundled Progress DataDirect drivers offer easy connection setup
     [Diagram: Big SQL Fluid Query (federation) connected to Oracle, SQL Server, Teradata, DB2, Netezza (PDA), Informix, Microsoft SQL Server, Hive, HBase, HDFS, Object Store (Swift / S3), WebHDFS]
  18. Data engineer: Big SQL is the most secure analytical engine that offers row and column level access control (RCAC) among other security settings
     ▪ Role Based Access Control enables separation of duties / audit
     ▪ Row Level Security
     ▪ Row and Column Level Security
     [Diagram: roles BRANCH_A, BRANCH_B and FINANCE (security admin)]
  19. Leading Technology for Advanced Analytics: Big Data Storage, SQL Access, Data Science
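
The concurrency figures on slide 12 can be checked with a few lines of arithmetic. The sketch below is not part of the deck: it is a minimal Python illustration that copies the elapsed times from the slide's tables and computes the ratio of Hive 2 + LLAP elapsed time to Big SQL elapsed time. The variable and function names (`hive_llap_1tb`, `speedup_table`, and so on) are assumptions made for this example, and the slide does not state the time unit, so only ratios are computed.

```python
# Elapsed times copied from the benchmark tables on slide 12.
# Keys are the number of concurrent queries; the time unit is not
# stated on the slide, so only ratios are meaningful here.
hive_llap_1tb = {5: 7.76, 25: 36.24, 100: 102.89}
big_sql_1tb = {5: 4.23, 25: 4.42, 100: 4.72}
big_sql_10tb = {5: 8.72, 25: 36.39, 100: 37.02}


def speedup(baseline: float, candidate: float) -> float:
    """Ratio of baseline elapsed time to candidate elapsed time."""
    return baseline / candidate


def speedup_table(baseline: dict, candidate: dict) -> dict:
    """Speedup per concurrency level, rounded to two decimals."""
    return {
        users: round(speedup(baseline[users], candidate[users]), 2)
        for users in sorted(baseline)
    }


# Hive 2 + LLAP at 1 TB versus Big SQL at 1 TB.
same_data = speedup_table(hive_llap_1tb, big_sql_1tb)

# Hive 2 + LLAP at 1 TB versus Big SQL at 10 TB (10x more data).
ten_x_data = speedup_table(hive_llap_1tb, big_sql_10tb)

for users in same_data:
    print(
        f"{users:>3} concurrent queries: "
        f"{same_data[users]}x at 1 TB, "
        f"{ten_x_data[users]}x with Big SQL at 10 TB"
    )
```

At 100 concurrent queries the ratio is about 21.8 for the 1 TB comparison, which the slide rounds to 22X. For the 10 TB run the ratio is about 2.78, which appears to be the basis for the slide's "275% faster" figure. At 5 and 25 concurrent queries, Big SQL on 10x the data takes roughly as long as Hive 2 + LLAP on 1 TB (ratios of about 0.89 and 1.0).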
