An Eye on the Future: A Review of Data Virtualization Techniques to Improve Research Analytics
Presenter: Jack A Richter
Authors: Jack A Richter, Lela McFarland, Christine Bredfeldt, Ph.D.
Contributors: Rajesh V DevKaran (KPIT), Wayne Little (KPIT), Sriram Thiruvenkatachari (KPIT), Sean Mikha (Teradata), Steven C Werntz (KPGA)
MID-ATLANTIC PERMANENTE RESEARCH INSTITUTE
Change History
v7.1 - added links to slide 14 (vendor info) and corrected spelling errors; added comments to slides 2 and 13
Quote from the "BeyeNETWORK: Comprehensive resources for business intelligence and data warehousing professionals"
Web link: http://www.b-eye-network.com/view/14815

Rick F. van der Lans is an independent consultant, author and lecturer specializing in XML, data warehousing, application integration, and information modelling. He is Managing Director of R20/Consultancy, based in The Netherlands. Rick has advised many large companies worldwide on defining their data warehouse architectures. He is chairman of the Database Systems Show (organized annually in The Netherlands since 1984), and he is a columnist for two major newspapers in the Benelux, "Computable" and "DataNews". Additionally, he is an advisor for magazines such as Software Release Magazine and Database Magazine.

Leading DV vendor reports
- A top 5 global bank used DV to create an SOA data services layer across 200+ sources and 20+ applications. Reported results: 250% ROI in 3 months elapsed time, 2% revenue increase within the business unit, 50-60% reduction in integration design and development time for new applications and portals, and 25% increase in object reuse for downstream BI reporting projects.
- A top 3 global pharmaceutical firm used DV to quickly prototype, develop and deploy the new information solutions required to support strategic business decisions. Reported results: 90% reduction in time to create a new report, 10X faster time-to-business-value for new views and applications, 200% ROI in 3 months elapsed time, 100% increase in key business analyst productivity, and 5% improvement in R&D project on-time delivery.
- A cable and home internet service provider used DV to federate data from different sources (data, e-mail data, credential data, etc.) into one virtual layer. Reported results: average transaction time for processing a request reduced from 5 seconds to 1 second, manpower needed to manage the systems reduced by 20% (development costs reduced), and time-to-market for deploying new applications reduced by 25%.

** Notes & Slide Source: PPT Presentation - Data Virtualization .. Data Execution Strategy & Architecture Presentation 0/27/2011 by: Wayne Little & Raj Devkaran (KPIT)
Some Key Features of Data Virtualization
- Multiple data delivery methods to the consuming applications
    - SOA data services
    - ODBC/JDBC
    - REST
- Embedded metadata in the virtual platform
- Availability of re-usable data quality and data transform definitions
- Scheduling & pre-processing
- Data caching
- Data discovery
- Rapid prototyping

DV enables integration of structured (databases), semi-structured (spreadsheets) and unstructured (weblogs, PDF files) data. Data can be cached to pre-fetch static or large data sets. Cache jobs are usually scheduled, while DV jobs usually run in real time. Mobile devices can also "connect" to the virtual schema. Data warehouse extension and virtual data marts are some other uses.
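The caching note above can be illustrated with a minimal sketch. It is not how any particular DV product implements caching; the class name CachedSource, the fetch callable, and the ttl_seconds parameter are all hypothetical names chosen for illustration. The get method stands in for a real-time DV request that falls through to the source when no fresh copy exists, and prefetch stands in for what a scheduled cache job might call.

```python
import time
from typing import Any, Callable, Dict, Iterable, Tuple


class CachedSource:
    """Hypothetical wrapper that caches results from a slow data source."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # function that queries the real source
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable clock, useful for testing
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        """Return a fresh cached value, or query the source and cache it."""
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = self._fetch(key)
        self._cache[key] = (now, value)
        return value

    def prefetch(self, keys: Iterable[str]) -> None:
        """Load values ahead of time, as a scheduled cache job might."""
        for key in keys:
            self._cache[key] = (self._clock(), self._fetch(key))


if __name__ == "__main__":
    source = CachedSource(fetch=lambda k: f"rows for {k}", ttl_seconds=60)
    source.prefetch(["patient_summary"])   # scheduled job warms the cache
    print(source.get("patient_summary"))   # served from cache, no source query
```

The time-to-live is the main design choice here: a longer value reduces load on the source systems, at the cost of serving older data.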
Concerns
- Timing
- Costs (hardware, programming, support)
- Constant redesign of database and data movement (ETL) processes
- Data quality issues introduced within the process

Benefits
- Single source of data: a consistent view of the data for all users
Concerns
- Same issues as the EDW
- Data duplicated between data marts
- Inconsistent ETL processes or business rules may be applied
- If sourced from the EDW, it may not be possible to trace data back to the true data source

Benefits
- Allows for consistent interpretation of the data sets involved
- ETLs can be subject-area specific
- Smaller datasets improve performance
Concerns
- Usually requires all databases to be from the same vendor
- May require users to know the relationships of data between different databases
- Performance issues (network, source system, end user)

Benefits
- Allows access to data across databases without physical movement of the data
- No ETLs and minimal IT involvement
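As a rough illustration of cross-database access without data movement, the sketch below uses SQLite's ATTACH DATABASE from Python's standard sqlite3 module to join tables that live in two separate database files. The file names, the patient and claim tables, and all rows are made up for the example. It also shows two of the concerns listed above: both databases come from the same vendor, and the person writing the query has to know that pat_mrn relates the two.

```python
import os
import sqlite3
import tempfile


def make_db(path: str, script: str) -> None:
    """Create a database file and run the given setup script in it."""
    conn = sqlite3.connect(path)
    conn.executescript(script)
    conn.commit()
    conn.close()


def federated_query(clinical_path: str, claims_path: str) -> list:
    """Join across two database files without copying rows between them."""
    conn = sqlite3.connect(clinical_path)
    try:
        # Make the second database visible under the schema name "claims".
        conn.execute("ATTACH DATABASE ? AS claims", (claims_path,))
        return conn.execute(
            """
            SELECT p.pat_mrn, p.pat_name, SUM(c.amount) AS total_amount
            FROM main.patient AS p
            JOIN claims.claim AS c ON c.pat_mrn = p.pat_mrn
            GROUP BY p.pat_mrn, p.pat_name
            ORDER BY p.pat_mrn
            """
        ).fetchall()
    finally:
        conn.close()


def demo() -> list:
    """Build two small hypothetical databases and query across them."""
    with tempfile.TemporaryDirectory() as tmp:
        clinical = os.path.join(tmp, "clinical.db")
        claims = os.path.join(tmp, "claims.db")
        make_db(clinical, """
            CREATE TABLE patient (pat_mrn TEXT, pat_name TEXT);
            INSERT INTO patient VALUES ('1001', 'Ada Example'), ('1003', 'Bo Sample');
        """)
        make_db(claims, """
            CREATE TABLE claim (pat_mrn TEXT, amount INTEGER);
            INSERT INTO claim VALUES ('1001', 120), ('1001', 80), ('1003', 45);
        """)
        return federated_query(clinical, claims)


if __name__ == "__main__":
    for row in demo():
        print(row)
```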
Concerns
- Requires all data to be in the same database unless distributed database links or something similar is used
- Usually requires knowledge of the data, data relationships and rules
- May introduce performance issues (the view can be "materialized" to resolve this)
- May require movement of the view dataset

Benefits
- Insulates the user from physical database changes
- Allows for centralized, consistent application of business rules
- Allows for centralized, consistent creation of derived data elements
- Makes data queries easier to write, with fewer joins
- Fixes need to be applied only once, in the view

Simple Example:

    CREATE VIEW pat_names AS
    SELECT pat_mrn, pat_name
    FROM base.patient
    WHERE patient_type <> 'TEST';

Complex Example:

    CREATE VIEW VDW_DEATH AS
    SELECT smrn.PERSON_ID        AS MRN,
           rdd.DEATHDT           AS DeathDt,
           rdd.DTIMPUTE          AS DtImpute,
           rdd.COD_DX            AS UnderCOD,
           rdd.CODETYPE          AS Codetype,
           rdd.SOURCE            AS Source,
           rdd.SOURCE_CONFIDENCE AS Confidence,
           rdd.DEATH_ID          AS Local_Death_ID
    FROM RSRCH_DEATH_DETAILS AS rdd
    INNER JOIN RSRCH_ID_PAT_PERSON AS smrn
        ON smrn.VDW_EXCLUDED_FLAG = 'N'
       AND rdd.PAT_ID = smrn.PAT_ID
       AND rdd.LINE = 1;
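The claim that a view insulates users from physical database changes can be tried out with a small sketch, using an in-memory SQLite database through Python's sqlite3 module. The table contents, the patient_v2 table, and its column names are hypothetical. The view name pat_names and the rule excluding test patients follow the simple example above, without the base schema prefix.

```python
import sqlite3

USER_QUERY = "SELECT pat_mrn, pat_name FROM pat_names ORDER BY pat_mrn"


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with a patient table and the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (pat_mrn TEXT, pat_name TEXT, patient_type TEXT);
        INSERT INTO patient VALUES
            ('1001', 'Ada Example', 'REAL'),
            ('1002', 'Test Patient', 'TEST'),
            ('1003', 'Bo Sample', 'REAL');
        -- The business rule (exclude test patients) lives only in the view.
        CREATE VIEW pat_names AS
            SELECT pat_mrn, pat_name FROM patient WHERE patient_type <> 'TEST';
    """)
    return conn


def migrate_physical_table(conn: sqlite3.Connection) -> None:
    """Simulate a physical redesign: new table and column names, same view."""
    conn.executescript("""
        DROP VIEW pat_names;
        CREATE TABLE patient_v2 (mrn TEXT, full_name TEXT, patient_type TEXT);
        INSERT INTO patient_v2 SELECT pat_mrn, pat_name, patient_type FROM patient;
        DROP TABLE patient;
        -- Only the view definition changes; its name and columns stay the same.
        CREATE VIEW pat_names AS
            SELECT mrn AS pat_mrn, full_name AS pat_name
            FROM patient_v2 WHERE patient_type <> 'TEST';
    """)


if __name__ == "__main__":
    conn = build_demo()
    print(conn.execute(USER_QUERY).fetchall())
    migrate_physical_table(conn)
    # The same user query still works after the physical change.
    print(conn.execute(USER_QUERY).fetchall())
```

The user-facing query is identical before and after the migration. Only the view definition had to change, which is the "fixes need to be applied only once" benefit in miniature.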
Problems Addressed by Data Virtualization
- Data Challenges
    - Rapid data proliferation
    - Increasing complexity & volume
    - Data needs beyond structured
    - Data duplication & inconsistency
    - Siloed data approaches
- Dollars & Delivery Speed
    - Shrinking budgets
    - Need for personalized real-time health care data
    - Classic delivery methods too slow
    - Increasing regulations
    - Lack of agility to meet business needs in a timely way

Value Proposition for Data Virtualization
- Faster, More Agile Delivery
    - Quicker time to market: virtual rather than physical integration
    - Updating data views is easier and faster than making the corresponding physical changes for DB/DW/DM
- Single Unified Source of Consistent On-Demand Data Access
    - Re-use of consistent rules: data cleansing, business rules, security rules, etc.
    - Common "virtual schema" with unified metadata and definitions
    - Uniform data integration for multiple workflows and down-stream consumers:
        - Data Services to SOA Services (foundational data support for SOA strategy)
        - ETL
        - Reporting & Analytics
        - Web Portals
- Combine multiple data sources, types & formats into a complete and unified set
    - Structured to unstructured data
    - Internal to external data (including cloud)
    - Operational systems to DWs to Data Marts to Web Applications
Picture – free usage from http://freebigpictures.com/clouds-pictures/cloud-sea/
AN INTERNET BASED (i.e. Cloud) DATABASE HOSTING SERVICE - A service which hosts the users' database on a remote system that can be accessed via the internet. Database support, security and management are performed by the service vendor. It may or may not present a virtualized data view. It may require user software in addition to an internet browser to access the data.

Some Vendors:
- SimpleDB - NoSQL (Amazon)
- ClearDB (CloudDB)
- CouchDB - NoSQL (CouchOne)
- Xeround - Sql*Server (Amazon)
- AppEngine (Google)
- Database.com

AN INTERNET BASED (i.e. Cloud) HOSTING SERVICE - A database-independent service which presents the users' database(s) to internet users. Database support, security and management are performed by the customer. The tool may or may not present a virtualized data view. It may require user software in addition to an internet browser to access the data.
Cloud Database as a Generic Database Access Service
Under this definition, a Cloud database is one where a database access service connects users with databases hosted on the internet (i.e. Cloud-based databases). These databases may be hosted by the vendor running the access site or by others. Generally, the user does not need to know the database format or the method by which the database is implemented.

SaaS - Software as a Service
IaaS - Infrastructure as a Service or Information as a Service
PaaS - Platform as a Service
The Forrester Wave™: Data Virtualization, Q1 2012
Informatica, IBM, Composite Software, And Denodo Technologies Lead, With SAP, Microsoft, Oracle, Stone Bond, And Red Hat Close Behind
by Noel Yuhanna, Mike Gilpin with Adam Knoll
http://www.forrester.com/search?#/The+Forrester+Wave+Data+Virtualization+Q1+2012/quickscan/-/E-RES60746

Article Abstract: In Forrester's 53-criteria evaluation of data virtualization - also known as information-as-a-service (IaaS) - vendors, we found that Informatica, IBM, Composite Software, and Denodo Technologies lead the pack because of strong enterprise-class data virtualization features and functionality such as real-time integration, data quality, transformation, caching, and modeling. SAP, Microsoft, Oracle, Stone Bond Technologies, and Red Hat are Strong Performers; each offers a viable option to support particular use cases. Although Oracle no longer positions itself in the data virtualization market, Forrester evaluated its solution as a nonparticipating vendor. Red Hat continues as the most substantial vendor supporting an open source data virtualization solution. This market has sufficiently matured in that its Leaders include large well-established platform vendors, but it continues to exhibit important innovation from smaller players more exclusively focused on information virtualization and federation.