Building a Big Data solution
“Building an Effective Data Warehouse Architecture
with Hadoop, the cloud, and MPP”
James Serra
Big Data Evangelist
Microsoft
JamesSerra3@gmail.com
Other Presentations
• Building an Effective Data Warehouse Architecture
Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)
• Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP)
Explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in
• Finding business value in Big Data (What exactly is Big Data and why should I care?)
Very similar to “Building a Big Data Solution” but target audience is business users/CxO instead of architects
• How does Microsoft solve Big Data?
Covers the Microsoft products that can be used to create a Big Data solution
• Modern Data Warehousing with the Microsoft Analytics Platform System
The next step in data warehouse performance is APS, an MPP appliance
• Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.
Deep dives into the various Microsoft Big Data related products
About Me
• Business Intelligence Consultant, in IT for 28 years
• Microsoft, Big Data Evangelist
• Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer
• Been perm, contractor, consultant, business owner
• Presenter at PASS Business Analytics Conference and PASS Summit
• MCSE for SQL Server 2012: Data Platform and BI
• Blog at JamesSerra.com
• SQL Server MVP
• Author of book “Reporting with Microsoft SQL Server 2012”
I tried building a Big Data solution…
And ended up passed-out drunk in a Denny’s
parking lot
Let’s prevent that from happening…
Agenda
• Review of Building an Effective Data Warehouse Architecture
• Overview of Big Data and Analytics
• Use cases
• Data Lake
• Hadoop and its role
• IoT and real-time data
• Modern data warehouse
• Federated querying
• DW and the cloud
• Symmetric Multiprocessing (SMP) vs. Massively Parallel Processing (MPP)
Review of Building an Effective Data Warehouse Architecture
What is a Data Warehouse and why use one?
A data warehouse is where you store data from multiple data sources to be used for historical and trend
analysis reporting. It acts as a central repository for many subject areas and contains the "single version of
truth". It is NOT to be used for OLTP applications.
Reasons for a data warehouse:
• Reduce stress on production system
• Optimized for read access, sequential disk scans
• Integrate many sources of data
• Keep historical records (no need to save hardcopy reports)
• Restructure/rename tables and fields, model data
• Protect against source system upgrades
• Use Master Data Management, including hierarchies
• No IT involvement needed for users to create reports
• Improve data quality and plug holes in source systems
• One version of the truth
• Easy to create BI solutions on top of it (e.g., SSAS cubes)
Previous presentation “Building an Effective Data Warehouse Architecture”:
http://pragmaticworks.com/Training/FreeTraining/ViewWebinar/WebinarID/532
http://www.slideshare.net/jamserra/data-warehouse-architecture-16065902
Why use a Data Warehouse?
Legacy applications + databases = chaos
Production Control, MRP, Inventory Control, Parts Management, Logistics, Shipping, Raw Goods, Order Control, Purchasing, Marketing, Finance, Sales, Accounting, Management Reporting, Engineering, Actuarial, Human Resources
Continuity, Consolidation, Control, Compliance, Collaboration
Enterprise data warehouse = order
Single version of the truth
Enterprise Data Warehouse
Every question = decision
Two purposes of a data warehouse: 1) save time building reports; 2) slice and dice in ways you could not do before
Data Warehouse Hybrid Model
Advice: Use SQL Server Views to interface between each level in the model
In the DW Bus Architecture, each data mart could be a schema (broken out by business process subject areas), all in one database.
Another option is to have each data mart in its own database with all databases on one server or spread among multiple servers.
Also, the staging areas, CIF, and DW Bus can all be on the same powerful server (MPP)
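A minimal sketch of the "views as the interface between levels" advice. In-memory SQLite stands in for SQL Server so the example is runnable anywhere, and every table, view, and column name here (stg_sales, v_sales, and so on) is a hypothetical illustration, not something from the deck.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_sales (sale_id INTEGER, cust_nm TEXT, amt_usd REAL);
    INSERT INTO stg_sales VALUES (1, 'Acme', 120.0), (2, 'Acme', 80.0), (3, 'Zenith', 50.0);

    -- The view is the contract between the staging level and the next level:
    -- it renames columns so downstream code never reads the staging table directly.
    CREATE VIEW v_sales AS
        SELECT sale_id, cust_nm AS customer_name, amt_usd AS amount
        FROM stg_sales;
""")

def sales_by_customer(connection):
    """Query the view, not the underlying table."""
    rows = connection.execute(
        "SELECT customer_name, SUM(amount) FROM v_sales "
        "GROUP BY customer_name ORDER BY customer_name"
    ).fetchall()
    return dict(rows)

totals = sales_by_customer(conn)
print(totals)
```

If the staging table is later restructured, only the view definition needs to change; queries written against v_sales keep working.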
Data Warehouse Architecture
How does “Big Data” change this architecture?
Overview of Big Data and Analytics
What differentiates today’s thriving organizations? Data.
What is Big Data, really?
Data in all forms & sizes is being generated faster than ever before
Capture & combine it for new insights & better, faster decisions
Harness the growing and changing nature of data
Collect any data: structured, unstructured, streaming
Challenge is combining transactional data stored in relational databases with less structured data
Big Data = All Data
Get the right information to the right people at the right time in the right format
An illustration of the velocity of data created
Kalakota, R. (2012, October 22). Sizing “Mobile + Social” Big Data Stats. Retrieved from http://practicalanalytics.wordpress.com/
The three V’s
Technology innovation accelerates value
[Figure: value plotted against innovation. Labels: Transactional systems, ETL, Operational reporting, Enterprise data warehouse, Spreadmarts, Siloed data, Complex implementations, OLAP, Dashboards, Ad hoc analysis, Hadoop, Machine learning, Any data, In-memory, Internet of Things, Discover and connect, Answering new questions]
Put data to work for everyone
in your organization
Inspire innovation
Accelerate decision-making
Learn from & share insights
[Figure: sample dashboard chart, “Units Sold, Discounts, and Profit before Tax”]
Embrace Big Data across your business
[Figure: sample dashboards, “Revenue and Target by Region”, “Departments Headcount”, “XT2000 Status List”]
Sales: Improve revenue performance
HR: Maximize employee engagement
Marketing: Build deeper customer relationships
Finance: Impact your company’s bottom line
The Data Divide
80% of data stored
70% of data generated by customers
<0.5% being operationalized
0.5% being analyzed
3% prepared for analysis
Major Fail
Gartner: “Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation”
Paradigm4: 76% of those who have used Hadoop or Apache Spark complained of significant limitations
Analytics Solution
Capture and integrate data from multiple internal and external sources
Derive insight from data with rich, interactive dashboards and reports using the tools you know
Put insight into action to increase efficiency and constituent satisfaction
Advanced Analytics Defined
The end result of Big Data - Icing on the cake
Use Cases
Let’s set off light bulbs in your head
Data Analytics is needed everywhere
• Recommendation engines
• Smart meter monitoring
• Equipment monitoring
• Advertising analysis
• Life sciences research
• Fraud detection
• Healthcare outcomes
• Weather forecasting for business planning
• Oil & Gas exploration
• Social network analysis
• Churn analysis
• Traffic flow optimization
• IT infrastructure & Web App optimization
• Legal discovery and document archiving
• Intelligence Gathering
• Location-based tracking & services
• Pricing Analysis
• Personalized Insurance
Personalized Insurance
Personalized policies can reduce costs & better meet customer needs
Insurance companies can help (and some have already started helping) their customers with truly personalized insurance plans tailored to their needs and risks
Insurance companies can collect real-time data from in-car sensors and combine it with geolocation and in-house systems. With information such as distance and speed, they can provide personalized insurance offers based on driving amount, risk, and other factors, for a truly personalized plan that may often save drivers money.
$1,600/yr.: US national avg. car insurance premium
Recommendation Engines
Significantly improve up-sell and cross-sell opportunities
Retailers can use customer purchase & rating information to serve recommendations to current customers, based on similarities across many dimensions
The vast amount of current and ever-growing customer purchase, rating and click data can all be collected and managed with a Hadoop-based solution to pinpoint preferences based on purchase history and demographics, and to serve useful and compelling cross-sell and up-sell recommendations.
158 items sold/second by Amazon.com on 11/29/2010 (Cyber Monday)
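As a rough illustration of serving recommendations from purchase history, the sketch below ranks items by how often they were bought together with items a customer already owns. This is one simple technique (item co-occurrence) chosen for illustration, not the method the deck describes, and the customers and items are made-up sample data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "ann": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove"},
    "cat": {"tent", "lantern"},
    "dan": {"stove", "kettle"},
}

def co_occurrence(baskets):
    """Count how often each ordered pair of items appears in the same basket."""
    counts = Counter()
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(customer, baskets, top_n=2):
    """Rank items the customer does not own by co-occurrence with items they do own."""
    counts = co_occurrence(baskets)
    owned = baskets[customer]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

print(recommend("bob", purchases))
```

A production engine would add ratings, click data, and demographics as further similarity dimensions; the structure (count, score, rank) stays the same.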
Pricing Analysis
Significantly improve sales and customer satisfaction
Retailers can use customers’ past purchase, preference, and demographic information to serve real-time custom pricing and instant discounts when customers are near the store.
Retailers, whether large, small, online or in-store, can improve margins with more detailed pricing analysis. When a customer is in range of a transaction (either in the store, online or perhaps passing by), they can present personalized offers, real-time price quotes, or other frequent-buyer perks to help bring more customers to the store and improve repeat business.
Up to 30%: additional price Mac users accepted for travel from Orbitz
Using Big data to complete the picture
Data Lake
What is a data lake?
A storage repository, usually Hadoop, that holds a vast amount of raw data in its native
format until it is needed.
• A place to store unlimited amounts of data in any format inexpensively
• Allows collection of data that you may or may not use later: “just in case”
• A way to describe any large data pool in which the schema and data requirements are not
defined until the data is queried: “just in time” or “schema on read”
• Complements EDW and can be seen as a data source for the EDW – capturing all data but
only passing relevant data to the EDW
• Frees up expensive EDW resources (storage and processing), especially for data refinement
• Allows for data exploration to be performed without waiting for the EDW team to model
and load the data
• Some processing is better done on Hadoop than in ETL tools like SSIS
• Also called bit bucket, staging area, landing zone or enterprise data hub (Cloudera)
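A small sketch of "schema on read", assuming raw events are kept as JSON lines in their native format. The records, field names, and the usage_schema mapping are hypothetical; the point is only that the reader, not the loader, decides which fields matter and how to type them.

```python
import json

# Raw events kept in their native format (JSON lines); the records are hypothetical.
raw_lines = [
    '{"device": "meter-1", "kwh": "3.5", "ts": "2015-06-01T00:00:00"}',
    '{"device": "meter-2", "kwh": "1.25", "note": "estimated"}',
    '{"device": "meter-1", "kwh": "4.0", "ts": "2015-06-01T01:00:00"}',
]

# The schema is defined by the reader at query time: field name -> converter.
usage_schema = {"device": str, "kwh": float}

def read_with_schema(lines, schema):
    """Parse raw lines and project/convert only the fields this query needs."""
    for line in lines:
        record = json.loads(line)
        yield {field: cast(record[field]) for field, cast in schema.items()}

def total_kwh_by_device(lines):
    """Aggregate usage per device from the raw lines."""
    totals = {}
    for row in read_with_schema(lines, usage_schema):
        totals[row["device"]] = totals.get(row["device"], 0.0) + row["kwh"]
    return totals

print(total_kwh_by_device(raw_lines))
```

Fields the query does not ask for (ts, note) are stored but ignored, which is the "just in case" property: a later query can apply a different schema to the same raw lines.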
Current state of a data warehouse
Traditional Approaches
DATA SOURCES: CRM, ERP, OLTP, LOB
ETL
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Emailed, centrally stored Excel reports and dashboards
Well manicured, often relational sources
Known and expected data volume and formats
Little to no change
Complex, rigid transformations
Required extensive monitoring
Transformed historical data into read structures
Flat, canned or multi-dimensional access to historical data
Many reports, multiple versions of the truth
24 to 48h delay
MONITORING AND TELEMETRY
Current state of a data warehouse
Traditional Approaches
DATA SOURCES: CRM, ERP, OLTP, LOB
ETL
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Emailed, centrally stored Excel reports and dashboards
Increase in variety of data sources
Increase in data volume
Increase in types of data
Pressure on the ingestion engine
Complex, rigid transformations can no longer keep pace
Monitoring is abandoned
Delay in data, inability to transform volumes, or react to new sources
Repair, adjust and redesign ETL
Reports become invalid or unusable
Delay in preserved reports increases
Users begin to “innovate” to relieve starvation
MONITORING AND TELEMETRY
INCREASING DATA VOLUME, NON-RELATIONAL DATA, INCREASE IN TIME, STALE REPORTING
Data Lake Transformation (ELT not ETL)
New Approaches
DATA SOURCES: CRM, ERP, OLTP, LOB; NON-RELATIONAL DATA; FUTURE DATA SOURCES
EXTRACT AND LOAD
DATA LAKE
DATA REFINERY PROCESS (TRANSFORM ON READ): Transform relevant data into data sets
OTHER REFINERY PROCESSES
DATA WAREHOUSE: Star schemas, views, other read-optimized structures
BI AND ANALYTICS: Discover and consume predictive analytics, data sets and other reports
All data sources are considered
Leverages the power of on-prem technologies and the cloud for storage and capture
Native formats, streaming data, big data
Extract and load, no/minimal transform
Storage of data in near-native format
Orchestration becomes possible
Streaming data accommodation becomes possible
Refineries transform data on read
Produce curated data sets to integrate with traditional warehouses
Users discover published data sets/services using familiar tools
Hadoop and its role
What is Hadoop?
• Distributed, scalable system on commodity HW
• Composed of a few parts:
  • HDFS – Distributed file system
  • MapReduce – Programming model
  • Other tools: Hive, Pig, SQOOP, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm
• Main players are Hortonworks, Cloudera, MapR
• WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead)
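To make the MapReduce programming model concrete, here is a single-process sketch of its three phases (map, shuffle, reduce) using the classic word-count example. It is plain Python, not the Hadoop API; on a real cluster the map and reduce functions would run in parallel across nodes, with the framework performing the shuffle.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map over every document, shuffle by word, then reduce each group."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data is big", "data lake"])
print(counts)
```

Because each map call sees one record and each reduce call sees one key, both phases can be spread over many machines, which is what makes the model a fit for batch processing of large volumes.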
Core Services
[Diagram: Hadoop core services. Groups: Operational Services, Data Services, Load & Extract. Components: HDFS, YARN, MapReduce, Hive & HCatalog, Pig, HBase, Falcon, Sqoop, Flume, NFS, WebHDFS, Oozie, Ambari]
Hadoop Cluster
[Diagram: many compute & storage nodes]
Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
Hortonworks Data Platform 2.3
Simply put, Hortonworks ties all the open source products together (22)
The real cost of Hadoop
http://www.wintercorp.com/tcod-report/
Use cases using Hadoop and a DW in combination
Bringing islands of Hadoop data together
Archiving data warehouse data to Hadoop (move)
(Hadoop as cold storage)
Exporting relational data to Hadoop (copy)
(Hadoop as backup/DR, analysis, cloud use)
Importing Hadoop data into data warehouse (copy)
(Hadoop as staging area, sandbox, Data Lake)
IoT and real-time data
What is the Internet of Things?
Things, Connectivity, Data, Analytics
IoT = sensor-acquired data
What is the Internet of Things (IoT)?
Internet-connected devices that can perceive the environment in some way, share their data, and communicate with
you. IoT is just a catch-all term for ways of using machine-generated data to create something useful.
- Has a processor and sensor to collect information
- Examples: heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors, field operation devices that assist firefighters in search and rescue
- Excludes computers, tablets, and smart phones
- But it’s in the sphere of business intelligence that IoT will really make a difference.
Cool possibilities
- When a milk carton is almost empty it will ping you when you are near a store
- An alarm clock that signals your coffee maker to start brewing when you wake up
- An embedded chip that monitors your vital signs and notifies a medical provider if they exceed a limit
Gartner: 10 billion devices connected to the internet today, 26B by 2020
At some point in the future, nearly every manmade object will contain a device that transmits data!
Modern Data Warehouse
Modern Data Warehouse
Think about future needs:
• Increasing data volumes
• Real-time performance
• New data sources and types
• Cloud-born data
• Multi-platform solution
• Hybrid architecture
Modern Data Warehouse Defined
Modern Data Warehouse
The Dream
The Reality
Federated Querying
Federated Querying
Other names: Data virtualization, logical data warehouse, data
federation, virtual database, and decentralized data warehouse.
A model that allows a single query to retrieve and combine data as it sits
from multiple data sources, so as to not need to use ETL or learn more
than one retrieval technology
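A toy sketch of the federated idea: one function answers a question by reading a relational source and a non-relational source where they sit, without first copying either into a warehouse. In-memory SQLite stands in for the relational database and a list of dictionaries stands in for a document store; the tables, fields, and function name are all hypothetical, and a real data virtualization product would expose this through a single SQL interface rather than hand-written code.

```python
import sqlite3

# Source 1: a relational store (in-memory SQLite stands in for any RDBMS).
orders_db = sqlite3.connect(":memory:")
orders_db.executescript("""
    CREATE TABLE orders (customer_id INTEGER, total REAL);
    INSERT INTO orders VALUES (1, 40.0), (1, 60.0), (2, 15.0);
""")

# Source 2: a non-relational store, represented as a list of documents.
profile_docs = [
    {"customer_id": 1, "segment": "gold"},
    {"customer_id": 2, "segment": "silver"},
]

def federated_spend_by_segment(db, docs):
    """Answer one question by reading both sources where they sit (no ETL copy)."""
    segment_of = {d["customer_id"]: d["segment"] for d in docs}
    spend = {}
    query = "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
    for customer_id, total in db.execute(query):
        segment = segment_of.get(customer_id, "unknown")
        spend[segment] = spend.get(segment, 0.0) + total
    return spend

print(federated_spend_by_segment(orders_db, profile_docs))
```

Note that the aggregation is pushed down to the relational source and only the small result is joined with the documents, which is the usual way federated engines limit data movement.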
Select… Result set
Federated Querying
[Diagram: a single query model over relational and non-relational data]
Sources shown: DB2, Oracle, MongoDB, SQL Server, Cloudera CDH (Linux), Hortonworks HDP, Windows Azure HDInsight
DW and the Cloud
Can I use the cloud with my DW?
• Public and private cloud
• Cloud-born data vs on-prem born data
• Transfer cost from/to cloud and on-prem
• Sensitive data on-prem, non-sensitive in cloud
• Look at hybrid solutions
TDWI Best Practices Report (2015)
SMP vs MPP
SMP vs MPP
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
SMP - Symmetric Multiprocessing
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN
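The shared-nothing idea can be sketched in a few lines: rows are hash-distributed so each node owns a disjoint slice, every node aggregates only its own slice, and the partial results are merged at the end. Threads stand in for separate nodes here, and NODE_COUNT, the function names, and the sample rows are assumptions made for illustration; a real MPP appliance does this across machines with their own memory and disk.

```python
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 4  # hypothetical number of nodes

def distribute(rows, node_count):
    """Hash-distribute rows so each node owns a disjoint slice of the data."""
    nodes = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[hash(key) % node_count].append((key, amount))
    return nodes

def local_aggregate(node_rows):
    """Each node aggregates only its own rows (shared nothing)."""
    partial = {}
    for key, amount in node_rows:
        partial[key] = partial.get(key, 0) + amount
    return partial

def mpp_sum(rows, node_count=NODE_COUNT):
    """Sum amounts per key by aggregating on every node in parallel."""
    nodes = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    # Rows were distributed by key, so no key lives on two nodes and the
    # final step is a plain merge of the partial results.
    result = {}
    for partial in partials:
        result.update(partial)
    return result

sales = [(101, 10), (102, 5), (101, 7), (103, 1), (102, 2)]
print(mpp_sum(sales))
```

Adding nodes shrinks each slice (scale-out), whereas an SMP system can only make the single shared machine bigger (scale-up).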
DW SCALABILITY SPIDER CHART
[Figure: spider chart comparing MPP (multidimensional scalability) with SMP (tunable in one dimension at the cost of other dimensions). Axes: “Query Freedom”, “Query Complexity”, “Data Freshness”, “Query Data Volume”, “Query Concurrency”, “Mixed Workload”, “Schema Sophistication”, “Data Volume”]
The spiderweb depicts important attributes to consider when evaluating Data Warehousing options. Big Data support is the newest dimension.
When do you need an MPP solution?
• We need at least 3x query performance improvement
• We are near disk capacity and see a lot of growth in the upcoming years
• We need to support queries during our maintenance window
• We need to load data outside of our maintenance window
• We will spend a lot of money on FusionIO cards, SSDs, more SAN space, more memory, faster CPUs
Summary
• We live in an increasingly data-intensive world
• Much of the data stored online and analyzed today is more varied than the data stored in recent years
• More of our data arrives in near-real time
This presents a large business opportunity. Are you ready for it?
Resources
• The Modern Data Warehouse: http://bit.ly/1xuX4Py
• Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6
• Should you move your data to the cloud? http://bit.ly/1xuXbKU
• Presentation slides for Modern Data Warehousing: http://bit.ly/1xuXcP5
• Presentation slides for Building an Effective Data Warehouse Architecture: http://bit.ly/1xuXeX4
• Hadoop and Data Warehouses: http://bit.ly/1xuXfu9
• What is the Microsoft Analytics Platform System (APS)? http://bit.ly/1xuXipO
• Parallel Data Warehouse (PDW) benefits made simple: http://bit.ly/1xuXlSy
• What is Advanced Analytics? http://bit.ly/1LDklkB
Q & A ?
James Serra, Big Data Evangelist
Email me at: JamesSerra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com (where this slide deck will be posted)

Choosing technologies for a big data solution in the cloud
 
Big Data Analytics and Machine Learning Document.docx
Big Data Analytics and Machine Learning Document.docxBig Data Analytics and Machine Learning Document.docx
Big Data Analytics and Machine Learning Document.docx
 
Bringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceBringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to Salesforce
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reduction
 
KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16
 

Mais de James Serra

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric IntroductionJames Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernanceJames Serra
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI OverviewJames Serra
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AIJames Serra
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...James Serra
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsJames Serra
 
How to build your career
How to build your careerHow to build your career
How to build your careerJames Serra
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed InstanceJames Serra
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017James Serra
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at itJames Serra
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016James Serra
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB James Serra
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBaseJames Serra
 

Mais de James Serra (20)

Microsoft Fabric Introduction
Microsoft Fabric IntroductionMicrosoft Fabric Introduction
Microsoft Fabric Introduction
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Power BI Overview, Deployment and Governance
Power BI Overview, Deployment and GovernancePower BI Overview, Deployment and Governance
Power BI Overview, Deployment and Governance
 
Power BI Overview
Power BI OverviewPower BI Overview
Power BI Overview
 
Machine Learning and AI
Machine Learning and AIMachine Learning and AI
Machine Learning and AI
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag...
 
Power BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data SolutionsPower BI for Big Data and the New Look of Big Data Solutions
Power BI for Big Data and the New Look of Big Data Solutions
 
How to build your career
How to build your careerHow to build your career
How to build your career
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Azure SQL Database Managed Instance
Azure SQL Database Managed InstanceAzure SQL Database Managed Instance
Azure SQL Database Managed Instance
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Learning to present and becoming good at it
Learning to present and becoming good at itLearning to present and becoming good at it
Learning to present and becoming good at it
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
What's new in SQL Server 2016
What's new in SQL Server 2016What's new in SQL Server 2016
What's new in SQL Server 2016
 
Introducing DocumentDB
Introducing DocumentDB Introducing DocumentDB
Introducing DocumentDB
 
Introduction to PolyBase
Introduction to PolyBaseIntroduction to PolyBase
Introduction to PolyBase
 

Último

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Último (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Building a Big Data Solution

  • 1. Building a Big Data solution “Building an Effective Data Warehouse Architecture with Hadoop, the cloud, and MPP” James Serra Big Data Evangelist Microsoft JamesSerra3@gmail.com
  • 2. Other Presentations
      - Building an Effective Data Warehouse Architecture: Reasons for building a DW and the various approaches and DW concepts (Kimball vs Inmon)
      - Building a Big Data Solution (Building an Effective Data Warehouse Architecture with Hadoop, the cloud and MPP): Explains what Big Data is, its benefits including use cases, and how Hadoop, the cloud, and MPP fit in
      - Finding business value in Big Data (What exactly is Big Data and why should I care?): Very similar to “Building a Big Data Solution” but the target audience is business users/CxOs instead of architects
      - How does Microsoft solve Big Data? Covers the Microsoft products that can be used to create a Big Data solution
      - Modern Data Warehousing with the Microsoft Analytics Platform System: The next step in data warehouse performance is APS, an MPP appliance
      - Power BI, Azure ML, Azure HDInsight, Azure Data Factory, etc.: Deep dives into the various Microsoft Big Data related products
  • 3. About Me
      - Business Intelligence Consultant, in IT for 28 years
      - Microsoft, Big Data Evangelist
      - Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW developer
      - Been perm, contractor, consultant, business owner
      - Presenter at PASS Business Analytics Conference and PASS Summit
      - MCSE for SQL Server 2012: Data Platform and BI
      - Blog at JamesSerra.com
      - SQL Server MVP
      - Author of book “Reporting with Microsoft SQL Server 2012”
  • 4. I tried building a Big Data solution… And ended up passed-out drunk in a Denny’s parking lot Let’s prevent that from happening…
  • 5. Agenda
      - Review of Building an Effective Data Warehouse Architecture
      - Overview of Big Data and Analytics
      - Use cases
      - Data Lake
      - Hadoop and its role
      - IoT and real-time data
      - Modern data warehouse
      - Federated querying
      - DW and the cloud
      - Symmetric Multiprocessing (SMP) vs. Massively Parallel Processing (MPP)
  • 6. Review of Building an Effective Data Warehouse Architecture
  • 7. What is a Data Warehouse and why use one? A data warehouse is where you store data from multiple data sources to be used for historical and trend analysis reporting. It acts as a central repository for many subject areas and contains the "single version of truth". It is NOT to be used for OLTP applications. Reasons for a data warehouse:
      - Reduce stress on production system
      - Optimized for read access, sequential disk scans
      - Integrate many sources of data
      - Keep historical records (no need to save hardcopy reports)
      - Restructure/rename tables and fields, model data
      - Protect against source system upgrades
      - Use Master Data Management, including hierarchies
      - No IT involvement needed for users to create reports
      - Improve data quality and plug holes in source systems
      - One version of the truth
      - Easy to create BI solutions on top of it (e.g. SSAS cubes)
    Previous presentation “Building an Effective Data Warehouse Architecture”: http://pragmaticworks.com/Training/FreeTraining/ViewWebinar/WebinarID/532 http://www.slideshare.net/jamserra/data-warehouse-architecture-16065902
  • 8. Why use a Data Warehouse?
      - Legacy applications + databases = chaos: Production Control, MRP, Inventory Control, Parts Management, Logistics, Shipping, Raw Goods, Order Control, Purchasing, Marketing, Finance, Sales, Accounting, Management Reporting, Engineering, Actuarial, Human Resources
      - Continuity, Consolidation, Control, Compliance, Collaboration
      - Enterprise data warehouse = order: single version of the truth; every question = decision
      - Two purposes of a data warehouse: 1) save time building reports; 2) slice and dice in ways you could not do before
  • 9. Data Warehouse Hybrid Model. Advice: Use SQL Server views to interface between each level in the model. In the DW Bus Architecture, each data mart could be a schema (broken out by business process subject areas), all in one database. Another option is to have each data mart in its own database, with all databases on one server or spread among multiple servers. Also, the staging areas, CIF, and DW Bus can all be on the same powerful server (MPP).
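The advice about views acting as the interface between levels can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table and view names (stg_sales, dw_sales_v, mart_sales_by_region_v) are hypothetical and do not come from the deck.

```python
import sqlite3

def build_layers(conn):
    """Create a staging table plus views that serve as the interface to each higher level."""
    conn.executescript("""
        CREATE TABLE stg_sales (region TEXT, amount_cents INTEGER, sold_on TEXT);
        -- The DW level reads staging only through this view (renames and rescales columns).
        CREATE VIEW dw_sales_v AS
            SELECT region AS region_name,
                   amount_cents / 100.0 AS amount,
                   sold_on AS sale_date
            FROM stg_sales;
        -- The data mart level reads the DW level only through its view.
        CREATE VIEW mart_sales_by_region_v AS
            SELECT region_name, SUM(amount) AS total_amount
            FROM dw_sales_v
            GROUP BY region_name;
    """)

def load_sample(conn):
    """Insert a few made-up staging rows."""
    conn.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", [
        ("North", 1050, "2015-01-01"),
        ("North", 950, "2015-01-02"),
        ("South", 500, "2015-01-01"),
    ])

def sales_by_region(conn):
    """Query the mart-level view and return {region: total}."""
    rows = conn.execute(
        "SELECT region_name, total_amount FROM mart_sales_by_region_v ORDER BY region_name"
    )
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
build_layers(conn)
load_sample(conn)
print(sales_by_region(conn))
```

Because each level only queries the view below it, a staging table could be restructured and only the view definition would need to change.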
  • 10. Data Warehouse Architecture How does “Big Data” change this architecture?
  • 11. Overview of Big Data and Analytics
  • 13. What is Big Data, really? Data in all forms & sizes is being generated faster than ever before. Capture & combine it for new insights & better, faster decisions.
  • 14. Harness the growing and changing nature of data. Collect any data: structured, unstructured, streaming. Challenge is combining transactional data stored in relational databases with less structured data. Big Data = All Data. Get the right information to the right people at the right time in the right format.
  • 15. An illustration of the velocity of data created Kalakota, R. (2012, October 22). Sizing “Mobile + Social” Big Data Stats. Retrieved from http://practicalanalytics.wordpress.com/
  • 17. Technology innovation accelerates value. Figure labels (value vs. innovation): Transactional systems, ETL, Operational reporting, Enterprise data warehouse, Spreadmarts, Siloed data, Complex implementations, OLAP, Dashboards, Ad hoc analysis, In-memory, Hadoop, Any data, Machine learning, Internet of Things
  • 18. Discover and connect Answering new questions Value
  • 19. Put data to work for everyone in your organization: Inspire innovation; Accelerate decision-making; Learn from & share insights
  • 20. Embrace Big Data across your business. Sales: Improve revenue performance. HR: Maximize employee engagement. Marketing: Build deeper customer relationships. Finance: Impact your company’s bottom line. (Dashboard figure panels: Units Sold, Discounts, and Profit before Tax; Revenue and Target by Region; Departments Headcount; XT2000 Status List)
  • 21. The Data Divide: 80% of data stored; 70% of data generated by customers; <0.5% being operationalized; 0.5% being analyzed; 3% prepared for analysis
  • 22. Major Fail Gartner: “Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation” Paradigm4: 76% of those who have used Hadoop or Apache Spark complained of significant limitations
  • 23. Analytics Solution Capture and integrate data from multiple internal and external sources Derive insight from data with rich, interactive dashboards and reports using the tools you know Put insight into action to increase efficiency and constituent satisfaction
  • 25. The end result of Big Data - Icing on the cake
  • 27. Let’s set off light bulbs in your head
  • 28. Data Analytics is needed everywhere: Recommendation engines; Smart meter monitoring; Equipment monitoring; Advertising analysis; Life sciences research; Fraud detection; Healthcare outcomes; Weather forecasting for business planning; Oil & Gas exploration; Social network analysis; Churn analysis; Traffic flow optimization; IT infrastructure & Web App optimization; Legal discovery and document archiving; Intelligence Gathering; Location-based tracking & services; Pricing Analysis; Personalized Insurance
  • 29. Personalized Insurance: Personalized policies can reduce costs & better meet customer needs. Insurance companies can help (and some have already started helping) their customers with truly personalized insurance plans tailored to their needs and risks. Insurance companies can collect real-time data from in-car sensors and combine it with geolocation and in-house systems. With information such as distance and speed, they can provide personalized insurance offers based on driving amount, risk, and other factors, for a truly personalized plan that may often save drivers money. $1,600/yr.: US national avg. car insurance premium
  • 30. Recommendation Engines: Significantly improve up-sell and cross-sell opportunities. The vast amount of current and ever-growing customer purchase, rating and click data can all be collected and managed with a Hadoop-based solution, to pinpoint preferences based on purchase history and demographics, and to serve useful and compelling cross-sell and up-sell recommendations. Retailers can use customer purchase & rating information to serve recommendations to current customers, based on similarities across many dimensions. 158: Items sold/second by Amazon.com on 11/29/2010 (Cyber Monday)
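As a rough illustration of serving recommendations from purchase history, the sketch below counts how often items are bought together and suggests the most frequent companions of what a customer already owns. It is a single-machine toy in plain Python, not a Hadoop job, and the basket data and function names are made up for the example.

```python
from collections import defaultdict
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how many baskets contain each pair of items."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(baskets, owned, top_n=2):
    """Rank items the customer does not own by co-purchase frequency with owned items."""
    counts = co_purchase_counts(baskets)
    scores = defaultdict(int)
    for item in owned:
        for other, n in counts[item].items():
            if other not in owned:
                scores[other] += n
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

baskets = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
]
print(recommend(baskets, owned={"tent"}))
```

A production recommender would typically also weigh ratings, clicks, and demographics, which the slide mentions but this sketch leaves out.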
  • 31. Pricing Analysis: Significantly improve sales and customer satisfaction. Retailers (whether large, small, online or in-store) can improve margins with more detailed pricing analysis. When a customer is in range of a transaction (either in the store, online or perhaps passing by), offer personalized offers, real-time price quotes, or other frequent-buyer perks to help bring more customers to the store and improve repeat business. Retailers can use customer past purchase, preference, and demographic information to serve real-time custom pricing and instant discounts when near the store. Up to 30%: Additional price Mac users accepted for travel from Orbitz
  • 32. Using Big data to complete the picture
  • 34. What is a data lake? A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed.
      - A place to store unlimited amounts of data in any format inexpensively
      - Allows collection of data that you may or may not use later: “just in case”
      - A way to describe any large data pool in which the schema and data requirements are not defined until the data is queried: “just in time” or “schema on read”
      - Complements EDW and can be seen as a data source for the EDW, capturing all data but only passing relevant data to the EDW
      - Frees up expensive EDW resources (storage and processing), especially for data refinement
      - Allows for data exploration to be performed without waiting for the EDW team to model and load the data
      - Some processing is better done on Hadoop than with ETL tools like SSIS
      - Also called bit bucket, staging area, landing zone or enterprise data hub (Cloudera)
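The “schema on read” idea can be sketched in a few lines: raw records are kept exactly as they arrived, and a schema is only applied when someone queries them. The example below is a minimal illustration in plain Python; the sample events, field names, and the read_with_schema helper are assumptions made up for this sketch.

```python
import json

# Raw events stored as they arrived; no common structure is enforced at write time.
RAW_EVENTS = [
    '{"user": "a1", "action": "click", "ts": "2015-06-01T10:00:00"}',
    '{"user": "b2", "action": "purchase", "amount": "19.99"}',
    '{"device": "meter-7", "reading": 42}',
]

def read_with_schema(raw_lines, schema):
    """Apply a schema at query time: keep records that have every field, and cast the values."""
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}

# Two different "queries" impose two different schemas on the same raw data.
purchases = list(read_with_schema(RAW_EVENTS, {"user": str, "amount": float}))
readings = list(read_with_schema(RAW_EVENTS, {"device": str, "reading": int}))
print(purchases)
print(readings)
```

The same raw store answers both questions without either schema having been decided when the data was collected.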
  • 35. Current state of a data warehouse: Traditional Approaches
      - Pipeline: DATA SOURCES (CRM, ERP, OLTP, LOB) -> ETL -> DATA WAREHOUSE (star schemas, views, other read-optimized structures) -> BI AND ANALYTICS (emailed, centrally stored Excel reports and dashboards), with MONITORING AND TELEMETRY
      - Well manicured, often relational sources
      - Known and expected data volume and formats
      - Little to no change
      - Complex, rigid transformations
      - Required extensive monitoring
      - Transformed historical into read structures
      - Flat, canned or multi-dimensional access to historical data
      - Many reports, multiple versions of the truth
      - 24 to 48h delay
  • 36. Current state of a data warehouse: Traditional Approaches
      - Pipeline: DATA SOURCES (CRM, ERP, OLTP, LOB) -> ETL -> DATA WAREHOUSE (star schemas, views, other read-optimized structures) -> BI AND ANALYTICS (emailed, centrally stored Excel reports and dashboards), with MONITORING AND TELEMETRY
      - Pressures: INCREASING DATA VOLUME, NON-RELATIONAL DATA, INCREASE IN TIME, STALE REPORTING
      - Increase in variety of data sources
      - Increase in data volume
      - Increase in types of data
      - Pressure on the ingestion engine
      - Complex, rigid transformations can no longer keep pace
      - Monitoring is abandoned
      - Delay in data, inability to transform volumes, or react to new sources
      - Repair, adjust and redesign ETL
      - Reports become invalid or unusable
      - Delay in preserved reports increases
      - Users begin to “innovate” to relieve starvation
  • 37. Data Lake Transformation (ELT not ETL): New Approaches
      - Pipeline: DATA SOURCES (CRM, ERP, OLTP, LOB), NON-RELATIONAL DATA, FUTURE DATA SOURCES -> EXTRACT AND LOAD -> DATA LAKE -> DATA REFINERY PROCESS (TRANSFORM ON READ: transform relevant data into data sets) and OTHER REFINERY PROCESSES -> DATA WAREHOUSE (star schemas, views, other read-optimized structures) -> BI AND ANALYTICS (discover and consume predictive analytics, data sets and other reports)
      - All data sources are considered
      - Leverages the power of on-prem technologies and the cloud for storage and capture
      - Native formats, streaming data, big data
      - Extract and load, no/minimal transform
      - Storage of data in near-native format
      - Orchestration becomes possible
      - Streaming data accommodation becomes possible
      - Refineries transform data on read
      - Produce curated data sets to integrate with traditional warehouses
      - Users discover published data sets/services using familiar tools
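The extract-and-load, then refine flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: a temporary directory stands in for the data lake, an in-memory SQLite database stands in for the data warehouse, and the file names, field names, and functions are all hypothetical.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path

def extract_and_load(lake_dir, source_name, payload):
    """Land the payload in the lake exactly as received (no transform)."""
    path = Path(lake_dir) / source_name
    path.write_text(payload, encoding="utf-8")
    return path

def refine_orders(lake_dir):
    """Refinery step: read raw files in their native formats and emit curated rows."""
    rows = []
    crm_text = (Path(lake_dir) / "crm_orders.csv").read_text(encoding="utf-8")
    for rec in csv.DictReader(crm_text.splitlines()):
        rows.append((rec["customer"], float(rec["total"])))
    web_text = (Path(lake_dir) / "web_orders.jsonl").read_text(encoding="utf-8")
    for line in web_text.splitlines():
        rec = json.loads(line)
        rows.append((rec["cust"], float(rec["amount"])))
    return rows

def publish_to_warehouse(conn, rows):
    """Load the curated data set into a read-optimized warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_orders (customer TEXT, total REAL)")
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)

def run_pipeline():
    """Run extract-and-load, refine, and publish, then return totals per customer."""
    with tempfile.TemporaryDirectory() as lake:
        extract_and_load(lake, "crm_orders.csv", "customer,total\nacme,100.50\n")
        extract_and_load(
            lake,
            "web_orders.jsonl",
            '{"cust": "acme", "amount": 20}\n{"cust": "zen", "amount": 5}\n',
        )
        conn = sqlite3.connect(":memory:")
        publish_to_warehouse(conn, refine_orders(lake))
        return conn.execute(
            "SELECT customer, SUM(total) FROM fact_orders GROUP BY customer ORDER BY customer"
        ).fetchall()

print(run_pipeline())
```

The point of the ordering is that nothing is reshaped on the way into the lake; only the refinery step knows about column names and types.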
  • 39. What is Hadoop?  Distributed, scalable system on commodity HW  Composed of a few parts:  HDFS – Distributed file system  MapReduce – Programming model  Other tools: Hive, Pig, SQOOP, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm  Main players are Hortonworks, Cloudera, MapR  WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead). Diagram: core, operational, and data services (HDFS, SQOOP, FLUME, NFS, WebHDFS, OOZIE, AMBARI, YARN, MAP REDUCE, HIVE & HCATALOG, PIG, HBASE, FALCON) running on a Hadoop cluster of compute & storage nodes. Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
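To make "MapReduce – Programming model" concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases using word count. It is plain Python, not the Hadoop API: the function names and the sample input splits are hypothetical, and a real cluster would run the map and reduce steps on many nodes against data in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every split, then shuffle, then reduce per key."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


splits = ["big data big cluster", "data lake"]  # hypothetical input splits
counts = word_count(splits)
print(counts)
```

Because each map call sees only its own split and each reduce call sees only one key, the work can in principle be spread across machines, which is why the model suits batch processing of large volumes.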
  • 40. Hortonworks Data Platform 2.3 Simply put, Hortonworks ties all the open source products together (22)
  • 41. The real cost of Hadoop http://www.wintercorp.com/tcod-report/
  • 42. Use cases using Hadoop and a DW in combination: Bringing islands of Hadoop data together; Archiving data warehouse data to Hadoop (move) (Hadoop as cold storage); Exporting relational data to Hadoop (copy) (Hadoop as backup/DR, analysis, cloud use); Importing Hadoop data into data warehouse (copy) (Hadoop as staging area, sandbox, Data Lake)
  • 44. What is the Internet of Things? Connectivity, Data, Analytics, Things. IoT = sensor-acquired data
  • 45. What is the Internet of Things (IoT)? Internet-connected devices that can perceive the environment in some way, share their data, and communicate with you. IoT is just a catch-all term for ways of using machine-generated data to create something useful. - Has a processor and sensor to collect information - Examples: heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors, field operation devices that assist firefighters in search and rescue - Excludes computers, tablets, and smart phones - But it’s in the sphere of business intelligence that IoT will really make a difference. Cool possibilities - When a milk carton is almost empty it will ping you when you are near a store - An alarm clock that signals your coffee maker to start brewing when you wake up - An embedded chip that monitors your vital signs and notifies a medical provider if they exceed a limit. Gartner: 10 billion devices connected to the internet today, 26B by 2020. At some point in the future, nearly every manmade object will contain a device that transmits data!
  • 47. Modern Data Warehouse Think about future needs: • Increasing data volumes • Real-time performance • New data sources and types • Cloud-born data • Multi-platform solution • Hybrid architecture
  • 52. Federated Querying Other names: Data virtualization, logical data warehouse, data federation, virtual database, and decentralized data warehouse. A model that allows a single query to retrieve and combine data where it sits across multiple data sources, removing the need for ETL or for learning more than one retrieval technology
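The idea of answering one question from several sources without first copying the data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for a relational source, a list of dictionaries stands in for a non-relational store, and the table, field, and function names are hypothetical. A real federated engine would push work down to each source rather than combine rows in application code.

```python
import sqlite3

# Stand-in relational source (hypothetical customers table).
relational = sqlite3.connect(":memory:")
relational.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
relational.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Contoso"), (2, "Fabrikam")]
)

# Stand-in non-relational source (hypothetical clickstream documents).
documents = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 1, "page": "/docs"},
    {"customer_id": 2, "page": "/pricing"},
]


def page_views_by_customer(db, docs):
    """Answer one question from both sources, leaving the data where it sits."""
    views = {}
    for doc in docs:
        views[doc["customer_id"]] = views.get(doc["customer_id"], 0) + 1
    rows = db.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [(name, views.get(customer_id, 0)) for customer_id, name in rows]


result = page_views_by_customer(relational, documents)
print(result)
```

The caller sees a single function and a single result set; neither source was loaded into the other beforehand.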
  • 53. Federated Querying. Diagram: a single query (Select…) goes through the Query Model and returns one Result set. Data sources shown, spanning relational and non-relational data: DB2, Oracle, MongoDB, SQL Server, Cloudera CDH (Linux), Hortonworks HDP (Windows), Azure HDInsight
  • 54. DW and the Cloud
  • 55. Can I use the cloud with my DW? • Public and private cloud • Cloud-born data vs on-prem born data • Transfer cost from/to cloud and on-prem • Sensitive data on-prem, non-sensitive in cloud • Look at hybrid solutions
  • 56. TDWI Best Practices Report (2015)
  • 58. SMP vs MPP. MPP - Massively Parallel Processing: • Uses many separate CPUs running in parallel to execute a single program • Shared Nothing: Each CPU has its own memory and disk (scale-out) • Segments communicate using high-speed network between nodes. SMP - Symmetric Multiprocessing: • Multiple CPUs used to complete individual processes simultaneously • All CPUs share the same memory, disks, and network controllers (scale-up) • All SQL Server implementations up until now have been SMP • Mostly, the solution is housed on a shared SAN
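The shared-nothing idea behind MPP can be sketched as follows: rows are hash-distributed so that each node owns a disjoint slice, each node aggregates only its own slice, and a coordinating step combines the partial results. This is a toy illustration, not how any particular MPP product works. The node count, the sample sales rows, and the function names are hypothetical, and threads in one process merely stand in for separate machines with their own memory and disk.

```python
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 4  # hypothetical number of compute nodes


def distribute(rows, node_count):
    """Hash-distribute rows so each node owns a disjoint slice of the data."""
    nodes = [[] for _ in range(node_count)]
    for customer_id, amount in rows:
        nodes[customer_id % node_count].append((customer_id, amount))
    return nodes


def local_sum(node_rows):
    """Each node aggregates only its own rows (no shared memory or disk)."""
    return sum(amount for _, amount in node_rows)


def parallel_total(rows, node_count=NODE_COUNT):
    """Run the per-node aggregation in parallel, then combine the partials."""
    nodes = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_sum, nodes))
    return sum(partials)  # the coordinating step combines partial results


sales = [(customer_id, customer_id * 10) for customer_id in range(1, 11)]
total = parallel_total(sales)
print(total)
```

Adding nodes shrinks each node's slice, which is the scale-out property the slide contrasts with SMP's scale-up on shared memory and disk.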
  • 59. DW SCALABILITY SPIDER CHART. Axes: “Data Volume”, “Query Data Volume”, “Query Concurrency”, “Query complexity”, “Query Freedom”, “Data Freshness”, “Mixed Workload”, “Schema Sophistication”. MPP – Multidimensional Scalability. SMP – Tunable in one dimension at the cost of other dimensions. The spiderweb depicts important attributes to consider when evaluating Data Warehousing options. Big Data support is the newest dimension.
  • 60. When do you need an MPP solution? • We need at least 3x query performance improvement • We are near disk capacity and see a lot of growth in the upcoming years • We need to support queries during our maintenance window • We need to load data outside of our maintenance window • We will spend a lot of money on FusionIO cards, SSDs, more SAN space, more memory, faster CPUs
  • 61. Summary • We live in an increasingly data-intensive world • Much of the data stored online and analyzed today is more varied than the data stored in recent years • More of our data arrives in near-real time. This presents a large business opportunity. Are you ready for it?
  • 62. Resources  The Modern Data Warehouse: http://bit.ly/1xuX4Py  Fast Track Data Warehouse Reference Architecture for SQL Server 2014: http://bit.ly/1xuX9m6  Should you move your data to the cloud? http://bit.ly/1xuXbKU  Presentation slides for Modern Data Warehousing: http://bit.ly/1xuXcP5  Presentation slides for Building an Effective Data Warehouse Architecture: http://bit.ly/1xuXeX4  Hadoop and Data Warehouses: http://bit.ly/1xuXfu9  What is the Microsoft Analytics Platform System (APS)? http://bit.ly/1xuXipO  Parallel Data Warehouse (PDW) benefits made simple: http://bit.ly/1xuXlSy  What is Advanced Analytics? http://bit.ly/1LDklkB
  • 63. Q & A ? James Serra, Big Data Evangelist Email me at: JamesSerra3@gmail.com Follow me at: @JamesSerra Link to me at: www.linkedin.com/in/JamesSerra Visit my blog at: JamesSerra.com (where this slide deck will be posted)