1. Delivering trusted data for analyst autonomy and operational agility with a unified big data fabric
Vishal Bamba, VP Strategy & Architecture, Transamerica
Murthy Mathiprakasam, Product Marketing, Informatica
2. Informatica Overview
Cloud Data Integration, Enterprise Data Integration, Data Quality, Master Data Management
Over 20 years in data management
500+ partners, including 5 Hadoop vendors
5000+ customers globally
3. Transamerica’s Business
• Investments & Retirement
• Retirement and Benefit Plan services to employers/employees
• Mutual funds and variable annuities
• With a mission to help people save and invest wisely to secure their retirement dreams, the I&R business unit serves more than 3 million retirement plan participants across the entire spectrum of defined benefit and defined contribution plans.
• Life & Protection
• Term and permanent insurance products
• Medicare supplement, long-term care, accidental death, final expense
• Mission to protect what you’ve built, secure what’s next.
4. Data Architecture - Complexity
Hard to ingest?
Slow to ingest?
Security?
Governance?
Access?
5. Data Architecture - Objectives
Discover and mine data relationships in a trusted fashion
Leverage a 360-degree view of data and its relationships to develop a 360-degree view of consumers
Create highly targeted and personalized multi-channel marketing programs
Set up a 30-node Hadoop cluster ingesting approx. 30 TB of structured & semi-structured data
800 million rows of data from 1,200+ input files
Ingest new types of data about consumers
Ingest data into a collated data lake and make it available quickly
Mask sensitive information (e.g., SSN) and trace lineage through the pipeline
Offer both “unmanaged” and “managed” datasets for different purposes
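The objective of masking sensitive values while tracing lineage can be illustrated with a minimal Python sketch. It is not how Informatica implements masking; it assumes SSNs appear as 123-45-6789 or 123456789, and the `Record`, `mask_ssn`, and `mask_record` names are hypothetical. Each masking step appends an entry to the record's lineage so the pipeline can show what was done to the data.

```python
import re
from dataclasses import dataclass, field

# Assumed SSN formats: 123-45-6789 or 123456789 (hypothetical; real rules vary).
SSN_PATTERN = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")


@dataclass
class Record:
    fields: dict
    # Ordered list of steps applied to this record, oldest first.
    lineage: list = field(default_factory=list)


def mask_ssn(value: str) -> str:
    """Replace all but the last four digits of any SSN in `value`."""
    return SSN_PATTERN.sub(lambda m: f"XXX-XX-{m.group(3)}", value)


def mask_record(record: Record, sensitive: set) -> Record:
    """Return a new record with sensitive string fields masked and lineage extended."""
    masked = {
        k: mask_ssn(v) if k in sensitive and isinstance(v, str) else v
        for k, v in record.fields.items()
    }
    step = f"mask_ssn({', '.join(sorted(sensitive))})"
    # The input record is left untouched; the output carries the extra lineage step.
    return Record(masked, record.lineage + [step])
```

Returning a new record rather than mutating the input keeps the unmasked original available to a governed, need-to-know process while downstream consumers see only the masked copy.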
6. Seven Habits of Highly Successful Big Data Projects
#1 Design to use cases and execute using small, flexible teams with rapid, iterative, agile development
#2 Centralize data management & automate with high-performance integration
#3 Use a data lake for different levels of management and fit-for-purpose confidence
#4 Establish tools, taxonomies, & processes for collaborative validation, stewardship, traceability, and masking
#5 Identify and socialize data issues earlier with data scorecarding and data quality
#6 Establish a 360 view of data & relationships for a 360 view of customers, products, services, and risk
#7 Partner with leading vendors to accelerate development and ensure flexible deployment
7. Habit #1: Design to Use Cases & Be Agile
Tie development to established business use cases
Build a small, flexible team to iterate on quick wins
Staff with naturally innovative individuals
Look for people who can wear multiple hats
Use technology to drive agile processes
Partner with the business
Socialize the platform & the vision
Data management requires evangelism
8. Habit #2: Create a Big Data Integration Machine
Leverage a centralized team to manage and deliver trusted data assets
Cross-functional team (BizDevOps)
Core set of Informatica developers in IT to create mappings
Data jobs are operationalized and monitored
Leverage technology for heavy lifting of big data integration
Use prebuilt connectors for RDBMS, OLAP, Salesforce, social media, etc.
Use Natural Language Processing (NLP) to mine semi-structured & unstructured data (emails, Twitter feeds, Facebook posts, etc.)
9. Habit #3: Manage Fit-for-purpose Data Assets
Autonomously managed: data fit for exploration
• IT-assisted onboarding process: setting up directories, reviewing data flows, reviewing privileges
• Access is tracked and included in auditing reports/events
• Business analysts are empowered to get timely access for holistic exploration
• Data need not be at the highest level of quality for data scientist use
vs.
Centrally managed: data fit for reporting
• Access to data is provisioned upon request through a secure, governed process
• Access is tracked and included in auditing reports/events
• Business analysts are empowered to get trusted access to the best data
• Data is held to a higher level of quality for reporting and business intelligence
10. Habit #4: Establish Collaborative Governance
[Diagram: Apply Data Governance cycle (Define, Discover, Apply, Measure and Monitor), shared by IT and Business]
Define Terms, Policies, and Rules
business glossary
technical metadata
data taxonomy
access policies
data retention rules
Apply and Execute Processes
stewardship processes
provisioning processes
Measure and Monitor Continuously
lineage validation
security views
11. Habit #5: Drive Early Detection of Data Quality
Ingest and Land Data
Profile and Discover Data
Define Data Quality Scorecards
Define DI Mappings and IR Rules
Execute Workflows
Monitor Data Quality Scorecards
Manage Record Exceptions
Generate Audit Reports
Implement Security Views
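The "Define Data Quality Scorecards", "Monitor Data Quality Scorecards", and "Manage Record Exceptions" steps can be sketched in a few lines of Python. This is a minimal illustration, not Informatica's scorecard feature: each rule is a hypothetical predicate over a record, the scorecard is the pass rate per rule, failing records become exceptions, and rules below an assumed threshold are flagged so issues surface early.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]


def valid_zip(rec: dict) -> bool:
    """Hypothetical rule: ZIP code must be exactly five digits."""
    z = str(rec.get("zip", ""))
    return len(z) == 5 and z.isdigit()


# Hypothetical rule set; real rules would come from the business glossary.
RULES: Dict[str, Rule] = {
    "ssn_present": lambda rec: bool(rec.get("ssn")),
    "zip_valid": valid_zip,
}


def scorecard(records: List[dict], rules: Dict[str, Rule]):
    """Return (pass rate per rule, list of (record index, failed rule) exceptions)."""
    scores: Dict[str, float] = {}
    exceptions: List[Tuple[int, str]] = []
    for name, rule in rules.items():
        passed = 0
        for i, rec in enumerate(records):
            if rule(rec):
                passed += 1
            else:
                exceptions.append((i, name))  # routed to exception management
        scores[name] = passed / len(records) if records else 1.0
    return scores, exceptions


def failing_rules(scores: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Rules whose pass rate falls below the (assumed) alerting threshold."""
    return [name for name, score in scores.items() if score < threshold]
```

Rerunning `scorecard` after each workflow execution and comparing against the threshold is one simple way to monitor quality continuously rather than discovering problems at reporting time.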
12. Habit #6: 360 View of Data = 360 Insights
Establish a 360 view of data & relationships for a 360 view of customers/intermediaries, products, services, and risk
360 View of Data for Security
Protect sensitive data holistically
Provide access on a ‘need to know’ basis
Alert and audit on events that occur
Utilize exception reporting for fast action notification
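Need-to-know access with auditing can be illustrated by a minimal "security view" in Python. The policy table, role names, and `secure_view` function are hypothetical; a production deployment would enforce this in the platform's authorization layer rather than in application code. The sketch projects each row onto the columns a role may see, audits every request, and raises on a denied request so it can feed exception reporting.

```python
from datetime import datetime, timezone

# Hypothetical need-to-know policy: the columns each role is allowed to see.
VIEW_POLICY = {
    "analyst": {"party_id", "state", "product"},
    "steward": {"party_id", "state", "product", "ssn"},
}


def secure_view(rows, role, user, audit_log):
    """Project rows onto the columns `role` may see, recording every request."""
    allowed = VIEW_POLICY.get(role)
    audit_log.append({
        "user": user,
        "role": role,
        "granted": allowed is not None,
        "at": datetime.now(timezone.utc).isoformat(),
    })  # every access attempt is audited, whether granted or denied
    if allowed is None:
        # Denied requests raise so callers can route them to exception reporting.
        raise PermissionError(f"role {role!r} has no view on this dataset")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]
```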
360 View of Relationships for Matching
Match very large volumes of weblog visitor data and contact center data to Salesforce leads and to policyholders to generate a sales pipeline
Time-series analysis
Apply standardization, aggregation, and de-duping processes to generate a master record of party/household identification
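The standardization, aggregation, and de-duping step can be sketched in Python to show how a master record emerges from several source records. This is a simplified stand-in, not Informatica's matching engine: it assumes hypothetical fields (`first`, `last`, `dob`, `address`, `zip`, `updated`), uses exact matching on standardized keys where a real MDM tool would use fuzzy matching, and keeps the most recently updated non-empty value for each field.

```python
import re
from collections import defaultdict

# Hypothetical address abbreviations used during standardization.
ABBREV = {"street": "st", "avenue": "ave", "road": "rd"}


def norm(s):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", s or "").lower().split())


def norm_address(s):
    return " ".join(ABBREV.get(tok, tok) for tok in norm(s).split())


def standardize(r):
    s = dict(r)
    s["first"], s["last"] = norm(r["first"]), norm(r["last"])
    s["address"] = norm_address(r.get("address", ""))
    return s


def master_records(records):
    """De-dupe on (first, last, dob); most recent non-empty value wins."""
    groups = defaultdict(list)
    for r in map(standardize, records):
        groups[(r["first"], r["last"], r["dob"])].append(r)
    masters = []
    for recs in groups.values():
        golden = {}
        for r in sorted(recs, key=lambda r: r["updated"]):  # oldest first
            golden.update({k: v for k, v in r.items() if v not in (None, "")})
        golden["source_count"] = len(recs)  # aggregation: how many sources merged
        # Household key: same surname at the same standardized address.
        golden["household_id"] = (golden["last"], golden["address"], golden.get("zip", ""))
        masters.append(golden)
    return masters
```

Standardizing before matching is what lets "O'Neil, 12 Main Street" and "ONeil, 12 Main St." collapse into one party, and the shared household key then links different parties living at the same address.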
13. Habit #7: Why Informatica and Cloudera?
Strong commitment to a community-driven, open source platform
Support for security & governance with authentication, authorization, auditing, etc.
Strong presence in the Financial Services industry
Increased developer productivity with visual development and deployment abstraction
Fast deployment with prebuilt connectors
Natural Language Processing (NLP)
Strong big data security and governance with data profiling & data quality on Hadoop
Thanks for joining us today.
My name is Murthy Mathiprakasam, and I work in product marketing at Informatica.
I’m very fortunate to share the stage with a big data industry expert, Vishal Bamba, VP of Strategy & Architecture at Transamerica. Thanks for joining me today for this discussion on delivering trusted data with a unified big data fabric.
For some background, let me give a quick overview of Informatica for those of you who haven’t heard of us.
Question: So Vishal, can you maybe share a quick overview of Transamerica’s business?
Question: So Vishal, needless to say, part of the motivation for your big data initiatives was the complexity of your data architecture. A lot of people in the audience can probably relate to this. Can you share a little bit about the challenges you faced with your data architecture, particularly around 5 areas: how easy was it to ingest new types of data, how quickly could you ingest new data, how easy was it to secure the data, how easy was it to govern it, how easy was it to offer different types of access to the data?
Question: And Vishal, given the challenges you faced with managing the data in your environment, what were the goals and objectives of the new big data project you embarked on?
Comment: Well, in an age when so many big data projects stall or even fail, you’ve been uniquely successful in meeting your goals and objectives. What I find most fascinating is that you seem to have established a set of habits, or lessons from your experience, as key drivers of your success. So maybe we could quickly go over some of your key learnings and success factors.
Question: How was being agile a differentiator? Can you talk about some of the benefits you achieved here, such as the ability to fail fast?
Question:
Question: Can you expand on your approach here. It seems like you took the middle road. What were some of the benefits you realized doing this?
Question: Talk about your governance process. Why was this important?
Question: Talk about your data challenges. How did Informatica support your data quality challenges?
Security as a first-class citizen: talk about managed views, where access can be controlled on a need-to-know basis.
Question to Vishal: So Vishal, given everything we’ve discussed today, could you summarize how this end-to-end approach you’ve taken to building a sort of “big data fabric” for data management has simplified big data integration and supported your big data security and governance objectives? Ultimately, how has this design empowered your business analysts and driven a more 360 degree view of your customers?