Ten tools for ten big data areas 01 informatica
- 2. Ten Tools for Ten Big Data Areas – Overview
© Sparkera. Confidential. All Rights Reserved
[Figure: 10 tools mapped to 10 areas.
Area labels: Programming; Search and Index; Data storing platform; Data computing platform; SQL & Metadata.
Tool taglines: First ETL fully on YARN; Visualize with just a few clicks; Powerful as Java, simple as Python; Real-time streaming made easier; Yours Google; Lightning-fast cluster computing; Real-time distributed data store; High-throughput distributed messaging.]
- 3. Agenda
1 About data integration
2 About Informatica company and its approach
3 Informatica architecture, client, server components, developer tool overview
4 Informatica why and why not
5 Informatica job trend
- 4. Little About DI – Data Integration
• DI involves combining data residing in different sources and
providing users with a unified view of these data.
• DI process is also called Enterprise Information Integration (EII).
• DI usually means ETL - data extract, transform, and load.
• 80% of enterprise data projects' efforts are spent on DI work.
• Data cleansing, audit, and master data management are usually
considered together with DI.
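The ETL idea above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two sources (`CRM_CSV`, `ORDERS`), the table name `customer_summary`, and the function names are hypothetical and not taken from the slides or from any Informatica product. It only shows the shape of extract, transform, load: two sources are combined into one unified view.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a CRM system.
CRM_CSV = "customer_id,name\n1,Alice\n2,Bob\n"

# Hypothetical source 2: order records from an application.
ORDERS = [
    {"customer_id": 1, "amount": "120.50"},
    {"customer_id": 1, "amount": "30.00"},
    {"customer_id": 2, "amount": "75.25"},
]


def extract():
    """Extract: read raw records from both sources."""
    customers = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return customers, ORDERS


def transform(customers, orders):
    """Transform: normalize types and combine the sources per customer."""
    totals = {}
    for order in orders:
        cid = int(order["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(order["amount"])
    return [
        (int(c["customer_id"]), c["name"], totals.get(int(c["customer_id"]), 0.0))
        for c in customers
    ]


def load(rows, conn):
    """Load: write the unified rows into a target table."""
    conn.execute(
        "CREATE TABLE customer_summary (customer_id INTEGER, name TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO customer_summary VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the three steps against an in-memory SQLite target."""
    conn = sqlite3.connect(":memory:")
    customers, orders = extract()
    load(transform(customers, orders), conn)
    return conn.execute(
        "SELECT customer_id, name, total FROM customer_summary ORDER BY customer_id"
    ).fetchall()
```

Calling `run_etl()` returns one row per customer with the order total attached, which is the "unified view" the first bullet describes.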
- 5. About Informatica Company
• Founded in 1993
• 2014 revenue – US$1.05 billion
• Average growth rate 17% per year
• Employees – 5500+
• Customers – 5000
• Valued customers include up to 70% of the global top 500 companies
• Partners – 500+
• Covers various businesses, industries, and government organizations,
including telecommunications, health care, and financial and insurance
services.
• A company dedicated to data integration and management
• Bought out and taken private in August 2015.
- 6. The Traditional Approach
Data sources: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured
• 87% of enterprises use hand-coding for data integration
• 75% of enterprises reported increased maintenance costs
Projects: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging
- 7. The Informatica Approach
Data sources: Application, Database, Partner Data (SWIFT, NACHA, HIPAA, …), Cloud Computing, Unstructured
Projects: Data Warehouse, Data Migration, Test Data Management & Archiving, Master Data Management, Data Synchronization, B2B Data Exchange, Data Consolidation, Complex Event Processing, Ultra Messaging
- 8. Informatica Latest Products v9.6
• Data Integration
  – PowerCenter
  – PowerExchange
• Master Data Management
• Cloud Integration
• Big Data
  – BDE – Informatica Developer
  – Big data parser
- 9. Informatica PowerCenter Overview
• An ETL (Extract, Transform, and Load) tool
• Its main advantages over other ETL tools are robustness, cross-OS
support, and high performance.
• It can read from a variety of different sources and write to as many
targets, while transforming data in between.
• The architecture follows SOA concepts for better extensibility and
high availability
• Single sign-on access, built-in version control, GUI development, and
built-in scheduling and monitoring
- 11. Informatica PowerCenter Client Components
• Repository Manager – metadata management
• Designer – tool to build mappings for ETL logic
• Workflow Manager – tool to build and run sessions and workflows
• Workflow Monitor – tool to monitor running jobs
• Administration Console (browser based) – administration
- 13. Designer
Create and debug mappings and mapplets, including the sources, targets,
and transformations that make up the core ETL logic.
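As a rough conceptual analogue only, the relationship between a source, a chain of transformations, a target, and a reusable group of transformations can be sketched in Python. This is not Informatica's API: Designer is a GUI tool, and every name below (`filter_active`, `uppercase_name`, `compose`, `run_mapping`) is hypothetical.

```python
from typing import Callable, Iterable, List

Row = dict
Transformation = Callable[[Iterable[Row]], Iterable[Row]]


def filter_active(rows: Iterable[Row]) -> Iterable[Row]:
    """Filter-style transformation: keep only active records."""
    return (r for r in rows if r["active"])


def uppercase_name(rows: Iterable[Row]) -> Iterable[Row]:
    """Expression-style transformation: clean up one column."""
    return ({**r, "name": r["name"].strip().upper()} for r in rows)


def compose(*steps: Transformation) -> Transformation:
    """Chain transformations into one reusable unit."""
    def pipeline(rows: Iterable[Row]) -> Iterable[Row]:
        for step in steps:
            rows = step(rows)
        return rows
    return pipeline


# A reusable group of transformations, loosely comparable to a mapplet.
clean_customers = compose(filter_active, uppercase_name)


def run_mapping(source: Iterable[Row], logic: Transformation, target: List[Row]) -> List[Row]:
    """A mapping in miniature: read the source, apply the logic, fill the target."""
    target.extend(logic(source))
    return target
```

For example, `run_mapping([{"name": " alice ", "active": True}, {"name": "bob", "active": False}], clean_customers, [])` keeps only the active row and normalizes its name. The same `clean_customers` unit can be reused in other mappings, which is the point of grouping transformations.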
- 17. Informatica PowerCenter Server Components
• Repository service: The Repository service manages the repository.
It retrieves, inserts, and updates metadata in the repository
database tables.
• Integration service: The Integration service runs sessions and
workflows.
• Web services hub: The Web services hub receives requests from
web service clients and exposes PowerCenter workflows as services.
• Informatica service: Overall service management and coordination
- 18. Informatica Big Data Edition Overview
Extract, load, and transform with the big data ecosystem.
- 19. Informatica BDE Component - Developer
BDE is an all-in-one tool and can push job execution fully to Hadoop
Developer component
• Mapping – tool to build mappings for ETL logic
• Mapplet – reusable mapping logic
• Workflow – tool to build workflows
• Application – tool to deploy mappings and workflows
Others
• Monitoring Console (browser based) – job monitoring
• Administration Console (browser based) - administration
- 20. Why Informatica Product
• Proven technology leadership
• A track record of continuous innovation
• The most neutral trusted partner – very focused
• Long history of customer success
• More than 5000 industry leaders rely on Informatica
• Major banks and telecom, insurance, energy, health, and research
companies in Toronto use Informatica
• Popular and easy to use
• Can push jobs down to Hadoop
• Connectors for many kinds of sources
• Performance and reliability
- 21. Drawbacks - When It May Not Fit
• High price: 150K+ to start
• Faces a challenge from ELT, which leverages the database for
transformation. Informatica needs investment in an ETL server, and its
pushdown optimization to the database has limitations.
• Scheduling, monitoring, and version control functions are limited
• BDE is relatively new, although the concept is great
• Alternatives - MS SSIS, Talend Studio, Pentaho Data Integration
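The ETL versus ELT trade-off mentioned above can be illustrated with a small Python sketch using the standard `sqlite3` module. The table names (`sales`, `sales_by_region`) and function names are hypothetical, and SQLite merely stands in for a real warehouse. The ETL variant pulls rows out and aggregates them in application code, as a separate ETL server would. The ELT variant lets the database do the aggregation.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a small hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("west", 5.0), ("east", 2.5)],
    )
    conn.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
    return conn


def etl_style(conn):
    """ETL: pull rows out, aggregate outside the database, write results back."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    conn.executemany(
        "INSERT INTO sales_by_region VALUES (?, ?)", list(totals.items())
    )


def elt_style(conn):
    """ELT: the database performs the aggregation; no rows leave it."""
    conn.execute(
        "INSERT INTO sales_by_region "
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )


def result(conn):
    """Read back the aggregated table in a stable order."""
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()
```

Both variants produce the same totals. The difference is where the work happens: the ETL variant moves every row through the client, while the ELT variant sends a single SQL statement and relies on the database engine.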
- 22. Informatica Job Trends
Level: Junior Level (20%) | Middle Level (40%) | Expert Level (40%)
Position: ETL developer, Informatica dev., DW developer, Sr. ETL developer, Data Specialist, ETL specialist, ETL designer, ETL Admin, Big data ETL dev., BDE developer, Informatica architect, Informatica consultant
Tool: PowerCenter | Informatica Developer | Other
Usage Percentage: 80% | 10% | 10%
- 23. www.sparkera.ca
BIG DATA is not only about data,
but the understanding of the data
and how people use data actively to improve their life.