The Role of Data Virtualization in a World of Big Data

June 6, 2012

Mark Madsen
@markmadsen
www.ThirdNature.net




 Information Management Through Human History

                               New technology development
                                        (innovation)
                                       creates
                                  New methods to cope
                                       (maturation)
                                   creates
                     New information scale and availability
                                        (saturation)
                                        creates…

Copyright Third Nature, Inc.
Big Data




    You keep using that word.
    I do not think it means
    what you think it means.
What makes data “big”?

        Hierarchical structures
        Nested structures
        Encoded values
        Non-standard (for a database) types
        Deep structure
        Very large amounts
        Human authored text
  “big” is better off being defined as “complex” or “hard to manage”

You could store this data in the data warehouse but…




Old database technology has so many problems
“Big Data”




New technology has so many problems
Reality is multiple data stores and platforms
Separate, purpose-built databases and processing systems for
different types of data and query / computing workloads are the
norm for information delivery. Data flows between most of these
environments.
[Figure: BI, Reporting, Dashboards; Data Warehouse; Source Environments (Databases, Documents, Flat Files, XML, Queues, ERP, Applications)]




Example “big data”: Web tracking data
    USER_ID              301212631165031
    SESSION_ID           590387153892659
    VISIT_DATE           1/10/2010 0:00
    SESSION_START_DATE   1:41:44 AM
    PAGE_VIEW_DATE       1/10/2010 9:59
    DESTINATION_URL      https://www.phisherking.com/gifts/store/LogonForm?mmc=
                         link-src-email-_-m100109-_-44IOJ1-_-shop&langId=-
                         1&storeId=1055&URL=BECGiftListItemDisplay
    REFERRAL_NAME        Direct
    REFERRAL_URL         -
    PAGE_ID              PROD_24259_CARD
    REL_PRODUCTS         PROD_24654_CARD, PROD_3648_FLOWERS
    SITE_LOCATION_NAME   VALENTINE'S DAY MICROSITE
    SITE_LOCATION_ID     SHOP-BY-HOLIDAY VALENTINES DAY
    IP_ADDRESS           67.189.110.179
    BROWSER_OS_NAME      MOZILLA/4.0 (COMPATIBLE; MSIE 7.0; AOL 9.0; WINDOWS
                         NT 5.1; TRIDENT/4.0; GTB6; .NET CLR 1.1.4322)

    The event stream contains IDs, but no reference data…
    Reference data is also known as dimensions or master data. This isn’t an
    OLTP DB, so no reference data is available from the source.
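
A minimal sketch of what makes this record hard to use as-is: the destination URL packs several values into its query string, and the IDs mean nothing until they are joined to reference data held somewhere else. The field names and URL come from the sample record (hyphens written as plain ASCII). The `PRODUCT_NAMES` lookup, the `decode_event` helper, and the guess that `-_-` separates the parts of `mmc` are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

# A few fields from the sample event above (hyphens written as plain ASCII).
event = {
    "USER_ID": "301212631165031",
    "PAGE_ID": "PROD_24259_CARD",
    "REL_PRODUCTS": "PROD_24654_CARD, PROD_3648_FLOWERS",
    "DESTINATION_URL": (
        "https://www.phisherking.com/gifts/store/LogonForm"
        "?mmc=link-src-email-_-m100109-_-44IOJ1-_-shop"
        "&langId=-1&storeId=1055&URL=BECGiftListItemDisplay"
    ),
}

# Hypothetical reference data; the event stream itself carries none.
PRODUCT_NAMES = {
    "PROD_24259_CARD": "Greeting card (example name)",
}


def decode_event(ev):
    """Unpack the values encoded in the URL and in the product list."""
    query = parse_qs(urlsplit(ev["DESTINATION_URL"]).query)
    return {
        "user_id": ev["USER_ID"],
        "store_id": query["storeId"][0],
        # Assumption: "-_-" is the delimiter inside the mmc campaign code.
        "campaign": query["mmc"][0].split("-_-"),
        # IDs resolve to names only if reference data is available.
        "page_product": PRODUCT_NAMES.get(ev["PAGE_ID"], "unknown"),
        "related": [p.strip() for p in ev["REL_PRODUCTS"].split(",")],
    }


decoded = decode_event(event)
```

Without the lookup table, `page_product` falls back to "unknown", which is the position an analyst is in when only the event stream is available.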




                      “I need that data now.”
                      “It would be logical to keep all the data in one place.”
                      “It will take 6 months.”




                   The typical situation for analysts
There are two architectural approaches to facilitating analysis,
depending on where the analyst works in the environment:

   1. Back end integration: for analysts working within the big data
      (BD) environment, reaching out from that environment to get the
      other data needed to make sense of the information.

   2. Front end integration: for analysts working in a more conventional
      BI / analysis environment, reaching into the BD environment from
      other tools.




        Solution: copy the data into Hadoop?
Just load it from the DW, if it’s there. Otherwise, dump and load
the data from the sources.
Great for one-time analysis, but what if you need to do it again next
week, or need current values on a regular basis?
You can build custom extracts from each source (OLTP sources, data warehouse). But…
  • Poor tool support
  • Problem of on-demand / current values
  • Minimal data management possible in the Hadoop environment
  • The analyst waits
Alternative: data virtualization to enable access
A data virtualization layer can be used to make other sources 
(OLTP, the data warehouse) appear locally accessible to the 
analyst or Hadoop programmer. Then, two choices are 
possible:
  ▪ extract the data and load it into the local environment
  ▪ access it dynamically from within the environment 
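
The two choices above can be sketched roughly as follows. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for a remote source and the analyst's local environment, and the `VirtualTable` class, the `product` table, and its rows are hypothetical, not part of any data virtualization product.

```python
import sqlite3

# Stand-in for a remote source (OLTP or data warehouse); hypothetical schema.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE product (page_id TEXT PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO product VALUES ('PROD_24259_CARD', 'card v1')")

# Stand-in for the analyst's local environment.
local = sqlite3.connect(":memory:")


class VirtualTable:
    """Hypothetical virtualized view of one source table."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def rows(self):
        # Dynamic access: every call reads the current source values.
        sql = f"SELECT page_id, name FROM {self.table}"
        return self.conn.execute(sql).fetchall()

    def extract_to(self, target):
        # Extract and load: a point-in-time copy in the local environment.
        target.execute(f"CREATE TABLE {self.table} (page_id TEXT, name TEXT)")
        target.executemany(
            f"INSERT INTO {self.table} VALUES (?, ?)", self.rows()
        )


product = VirtualTable(source, "product")
product.extract_to(local)

# The source changes after the extract was taken.
source.execute("UPDATE product SET name = 'card v2'")

copied = local.execute("SELECT name FROM product").fetchall()
live = [name for _, name in product.rows()]
```

Here `copied` still holds the value from the time of the extract, while `live` reflects the update. That is the trade-off between the two choices: a local copy is simple to work with but goes stale, while dynamic access stays current at the cost of reaching back to the source.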

                         Data warehouse
          OLTP Sources




Alternative: data virtualization to bridge stores
A data virtualization layer can be used to bridge the database 
and big data environments, hiding the back end complexities.
This gives access to raw or processed data from Hadoop alongside
data from other environments, with some benefits: no reliance on
limited Hive connectors, no client-side data merging, no
difficult metadata layer integrations.
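
As a rough sketch of what "no client-side data merging" means, the example below lets a single query join data from two separate stores. It is an analogy only: one SQLite connection plays the virtualization layer, and two attached in-memory databases named `hadoop` and `warehouse` stand in for the real environments. The table names and rows are invented for illustration.

```python
import sqlite3

# One connection plays the virtualization layer; each attached database
# stands in for a separate store. Names and rows are illustrative only.
dv = sqlite3.connect(":memory:")
dv.execute("ATTACH DATABASE ':memory:' AS hadoop")
dv.execute("ATTACH DATABASE ':memory:' AS warehouse")

# Event data, as it might come out of the big data environment.
dv.execute("CREATE TABLE hadoop.page_views (user_id TEXT, page_id TEXT)")
dv.executemany(
    "INSERT INTO hadoop.page_views VALUES (?, ?)",
    [
        ("301212631165031", "PROD_24259_CARD"),
        ("301212631165031", "PROD_24259_CARD"),
        ("301212631165031", "PROD_3648_FLOWERS"),
    ],
)

# Reference data, as it might be held in the data warehouse.
dv.execute("CREATE TABLE warehouse.product (page_id TEXT, category TEXT)")
dv.executemany(
    "INSERT INTO warehouse.product VALUES (?, ?)",
    [("PROD_24259_CARD", "cards"), ("PROD_3648_FLOWERS", "flowers")],
)

# The join runs inside the layer; the client receives one merged result.
views_by_category = dv.execute(
    """
    SELECT p.category, COUNT(*) AS views
    FROM hadoop.page_views AS v
    JOIN warehouse.product AS p ON p.page_id = v.page_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
```

The caller writes one query against both stores and never pulls two result sets to merge by hand. A real virtualization layer would also have to handle connectors, query pushdown, and metadata, none of which this sketch attempts.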

                         Data warehouse
          OLTP Sources
Data virtualization can simplify access across the entire 
               data environment, “big” or not
 DV also enables shared metadata across environments, avoiding the cost
 of model integration and keeping the model from being buried in source code.
[Figure: BI, Reporting, Dashboards; Data virtualization layer (front end); Data Warehouse; DV layer (back end); Source Environments (Databases, Documents, Flat Files, XML, Queues, ERP, Applications)]




    Bridge the data environment to uses beyond BI




The use cases now include interactive applications, lower-latency
data and complex analytics, and they extend beyond read-only queries.
About the Presenter
                                       Mark Madsen is president of Third
                                       Nature, a technology research and
                                       consulting firm focused on business
                                       intelligence, analytics and
                                       information management. Mark is an
                                       award-winning author, architect and
                                       former CTO whose work has been
                                       featured in numerous industry
                                       publications. During his career Mark
                                       received awards from the American
                                       Productivity & Quality Center, TDWI,
                                       Computerworld and the Smithsonian
                                       Institute. He is an international
                                       speaker, contributing editor at
                                       Intelligent Enterprise, and manages
                                       the open source channel at the
                                       Business Intelligence Network. For
                                       more information or to contact Mark,
                                       visit http://ThirdNature.net.




            About Third Nature

Third Nature is a research and consulting firm focused on new and
emerging technology and practices in business intelligence, analytics and
performance management. If your question is related to BI, analytics,
information strategy or data, then you're in the right place.
Our goal is to help companies take advantage of information-driven
management practices and applications. We offer education, consulting
and research services to support business and IT organizations as well as
technology vendors.
We fill the gap between what the industry analyst firms cover and what IT
needs. We specialize in product and technology analysis, so we look at
emerging technologies and markets, evaluating technology and how it is
applied rather than vendor market positions.

Using Data Virtualization to Integrate With Big Data

  • 1. The Role of Data Virtualization in a World of Big Data. June 6, 2012. Mark Madsen, @markmadsen, www.ThirdNature.net. Information Management Through Human History: new technology development (innovation) creates new methods to cope (maturation), which creates new information scale and availability (saturation), which creates… Copyright Third Nature, Inc.
  • 2. Big Data You keep using that word. I do not think it means what you think it means.
  • 3. What makes data “big”? Hierarchical structures; nested structures; encoded values; non-standard (for a database) types; deep structure; very large amounts; human-authored text. “Big” is better off being defined as “complex” or “hard to manage”.
  • 6. Reality is multiple data stores and platforms. Separate, purpose-built databases and processing systems for different types of data and query / computing workloads are the norm for information delivery. Data flows between most of these environments.
    [Figure: diagram with layers labeled BI, Reporting, Dashboards; Data Warehouse; Source Environments (Databases, Documents, Flat Files, XML, Queues, ERP, Applications).]
    Example “big data”: Web tracking data
      USER_ID: 301212631165031
      SESSION_ID: 590387153892659
      VISIT_DATE: 1/10/2010 0:00
      SESSION_START_DATE: 1:41:44 AM
      PAGE_VIEW_DATE: 1/10/2010 9:59
      DESTINATION_URL: https://www.phisherking.com/gifts/store/LogonForm?mmc=link-src-email-_-m100109-_-44IOJ1-_-shop&langId=-1&storeId=1055&URL=BECGiftListItemDisplay
      REFERRAL_NAME: Direct
      REFERRAL_URL: -
      PAGE_ID: PROD_24259_CARD
      REL_PRODUCTS: PROD_24654_CARD, PROD_3648_FLOWERS
      SITE_LOCATION_NAME: VALENTINE'S DAY MICROSITE
      SITE_LOCATION_ID: SHOP-BY-HOLIDAY VALENTINES DAY
      IP_ADDRESS: 67.189.110.179
      BROWSER_OS_NAME: MOZILLA/4.0 (COMPATIBLE; MSIE 7.0; AOL 9.0; WINDOWS NT 5.1; TRIDENT/4.0; GTB6; .NET CLR 1.1.4322)
  • 7. Example “big data”: Web tracking data (the same record as in slide 6, annotated). The event stream contains IDs, but no reference data… Reference data, aka dimensions, master data: this isn’t an OLTP DB, so there is no reference data available from the source. The typical situation for analysts: “I need that data now.” “It would be logical to keep all the data in one place.” “It will take 6 months.”
  • 8. There are two architectural approaches to facilitating analysis, depending on where the analyst works in the environment:
    1. Back end integration: for analysts working within the big data (BD) environment, reaching out from the environment to get other data that's needed to make sense of information.
    2. Front end integration: for analysts working in a more conventional BI / analysis environment, reaching in to the BD environment from other tools.
    Solution: copy the data into Hadoop? Just load it from the DW. If it’s there. Otherwise, dump and load the data from the sources. This is great for one-time analysis, but what if you need to do it again next week, or need current values on a regular basis? You can build custom extracts from each source (data warehouse, OLTP sources). But…
    • Poor tool support
    • Problem of on-demand / current values
    • Minimal data management possible in the Hadoop environment
    • The analyst waits
  • 9. Alternative: data virtualization to enable access. A data virtualization layer can be used to make other sources (OLTP, the data warehouse) appear locally accessible to the analyst or Hadoop programmer. Then, two choices are possible:
    ▪ extract the data and load it into the local environment
    ▪ access it dynamically from within the environment
    Alternative: data virtualization to bridge stores. A data virtualization layer can be used to bridge the database and big data environments, hiding the back end complexities. It allows one to access raw or processed data from Hadoop alongside data from other environments, with some benefits: no limited Hive connectors, no client-side data merging, no difficult metadata layer integrations.
    [Diagram labels on both slides: Data warehouse, OLTP Sources.]
  • 10. Data virtualization can simplify access across the entire data environment, “big” or not. DV also enables shared metadata across environments, avoiding the costs of model integration and of burying it in source code.
    [Figure: diagram with layers labeled BI, Reporting, Dashboards; Data virtualization layer (front end); Data Warehouse; DV layer (back end); Source Environments (Databases, Documents, Flat Files, XML, Queues, ERP, Applications).]
    Bridge the data environment to uses beyond BI. The use cases now include interactive applications, lower-latency data, and complex analytics, and they extend beyond read-only queries.
  • 11. About the Presenter. Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, analytics and information management. Mark is an award-winning author, architect and former CTO whose work has been featured in numerous industry publications. During his career Mark received awards from the American Productivity & Quality Center, TDWI, Computerworld and the Smithsonian Institute. He is an international speaker, contributing editor at Intelligent Enterprise, and manages the open source channel at the Business Intelligence Network. For more information or to contact Mark, visit http://ThirdNature.net. About Third Nature. Third Nature is a research and consulting firm focused on new and emerging technology and practices in business intelligence, analytics and performance management. If your question is related to BI, analytics, information strategy and data, then you're at the right place. Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
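Illustration for slide 9: the two choices a data virtualization layer offers (extract and load, or access dynamically) can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any DV product works: the `VirtualLayer` class, its method names and the `employees` table are hypothetical, two in-memory SQLite databases stand in for the data warehouse and the analyst's local environment, and the sample rows are modeled on the slide 10 diagram.

```python
import sqlite3


class VirtualLayer:
    """Hypothetical toy DV layer: one entry point over several named sources."""

    def __init__(self):
        self.sources = {}

    def register(self, name, connection):
        self.sources[name] = connection

    def query(self, source, sql, params=()):
        # Choice 2: access dynamically. Rows come from the source at query time.
        return self.sources[source].execute(sql, params).fetchall()

    def extract(self, source, table, local):
        # Choice 1: extract the data and load a copy into the local environment.
        cursor = self.sources[source].execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cursor.description]
        rows = cursor.fetchall()
        local.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        local.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        local.commit()
        return len(rows)


# Stand-in for the data warehouse (assumption: a real one would be remote).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER, title TEXT)"
)
warehouse.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Marge Inovera", 150000, "Statistician"),
        (2, "Anita Bath", 120000, "Sewer inspector"),
        (3, "Ivan Awfulitch", 160000, "Dermatologist"),
        (4, "Nadia Geddit", 36000, "DBA"),
    ],
)

# Stand-in for the analyst's or Hadoop programmer's local environment.
local = sqlite3.connect(":memory:")

dv = VirtualLayer()
dv.register("warehouse", warehouse)

# Choice 1: copy the table locally, then query the copy.
copied = dv.extract("warehouse", "employees", local)

# The source changes after the extract.
warehouse.execute("UPDATE employees SET salary = 40000 WHERE id = 4")

# Choice 2: dynamic access sees the change; the local copy does not.
live_salary = dv.query(
    "warehouse", "SELECT salary FROM employees WHERE id = ?", (4,)
)[0][0]
local_salary = local.execute(
    "SELECT salary FROM employees WHERE id = 4"
).fetchone()[0]

print(copied, live_salary, local_salary)
```

The sketch shows the trade-off between the two choices: the extracted copy is self-contained but goes stale once the source changes, while dynamic access always reflects the source at the cost of depending on it at query time.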