DATA SCIENCE
DATA ENGINEERS
DATA SOLUTIONS
Think Big, Start Smart, Scale Fast
Eliano Marques, Senior Data Scientist
Martin Oberhuber, Senior Data Scientist
CONFIDENTIAL | © 2015 Think Big, a Teradata Company

Think Big History
1st SI Solution Provider with 100% focus on open source and the Big Data Hadoop ecosystem
• 100+ Successful Programs
• 70+ Clients
• Global Delivery Capabilities
• We are hiring

Think Big Clients
Trusted Analytics Services Provider to the Fortune 1000
• eCommerce: 2 of Global Top 5
• Internet Transaction Security: Global #1
• Retail: 2 of Global Top 5
• Brokerage & Mutual Funds: 2 of Global Top 5
• Social Networking: Global #1
• Asset Management: Global #1
• Credit Issuer: 2 of Global Top 5
• Semiconductor: 2 of Global Top 5
• Banking: 4 of Global Top 10
• Data Storage Devices: 3 of Global Top 5
• Financial Data Services: 2 of Global Top 5
• Disk Manufacturing: Global #1
• Financial Exchanges: Global #2
• Telecommunications: 2 of Global Top 5
• Media & Advertising: 2 of Global Top 5

Think Big VELOCITY Methodology
• Big Data Strategy
• Think Big Academy
• Big Data Program Mgt
• Business Analytics
• Managed Services
• Data Engineering
• Big Data Lab
Think Big engages with its clients' business, technical, analyst and support teams in an agile-inspired VELOCITY Methodology to continuously develop Big Data solutions.

What is Apache Spark?
• Open source Apache project
− Parallel middleware for server clusters
− spark.apache.org (2014)
• Developed by UC Berkeley's AMPLab
− Supported by Databricks
• Top use cases
− SQL-on-Hadoop
− Machine learning
− Streaming data mini-batches
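
The "streaming data mini-batches" use case can be illustrated without a cluster. The sketch below is plain Python, not Spark's API: it only shows the idea of cutting an ordered event stream into fixed-length time windows and processing each window as a small batch. The function name, the event format and the sample data are assumptions made for illustration.

```python
def mini_batches(events, interval):
    """Yield lists of values, one list per time window of length `interval`.

    Assumes `events` is an iterable of (timestamp, value) pairs that
    arrive in timestamp order. Windows that contain no events are skipped.
    """
    batch, window = [], None
    for ts, value in events:
        w = int(ts // interval)          # index of the window this event falls in
        if window is not None and w != window:
            yield batch                  # the previous window has closed
            batch = []
        window = w
        batch.append(value)
    if batch:
        yield batch                      # flush the last, still-open window


# Hypothetical event stream: (seconds since start, measured value)
events = [(0.2, 5), (0.9, 3), (1.1, 7), (2.5, 1), (2.7, 4)]

# Process each one-second mini-batch with an ordinary batch operation (a sum).
batch_sums = [sum(batch) for batch in mini_batches(events, interval=1.0)]
print(batch_sums)
```

Each batch is handled with the same code one would write for static data, which is the main appeal of the mini-batch model.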

What is Apache Spark?
Apache Spark Core Engine
• Spark SQL
• Spark Streaming
• MLlib (Machine learning)
• GraphX (Graph)
Scala, R (SparkR), Python (PySpark)

Data Science Approaches
Single Workstation
- Small data sets
- No distributed analytics across multiple nodes
- Powerful tools are R or Python
- Data scientist can focus on the business problem
Mixed: Single Workstation + Cluster
- Small or large data sets
- Data wrangling and feature engineering are performed on the cluster
- Predictive analysis and modeling can be performed on a single workstation
- Powerful tools are Hadoop Streaming and Spark combined with R and Python
- Data scientists now have to worry about parallelisation of some data mining tasks (usually the ones that are embarrassingly parallel)
Cluster
- Large data sets
- Both data wrangling and modeling are performed on the cluster
- Spark is one of the few tools that support efficient parallel machine learning
- Parallelising machine learning algorithms is challenging
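
An "embarrassingly parallel" task is one whose pieces do not depend on each other, for example scoring several candidate model settings independently. The sketch below illustrates that pattern with a thread pool from the Python standard library standing in for a cluster. The toy one-parameter ridge model, the function names and the data are all assumptions for illustration, and the score is computed on the training data only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_and_score(alpha, xs, ys):
    """Fit y = w * x with ridge penalty `alpha`; return (alpha, mean squared error)."""
    w = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)
    mse = sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return alpha, mse


def grid_search(alphas, xs, ys, workers=4):
    """Evaluate every candidate independently, then keep the lowest error."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each call depends only on its own alpha, so the calls can run
        # in any order and on any worker.
        results = list(pool.map(lambda a: fit_and_score(a, xs, ys), alphas))
    return min(results, key=lambda r: r[1])


# Hypothetical toy data following y = 2x exactly
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

best_alpha, best_mse = grid_search([0.0, 0.1, 1.0, 10.0], xs, ys)
print(best_alpha, best_mse)
```

Only the final `min` needs all results together. Algorithms whose steps need each other's intermediate results do not split this cleanly, which is why parallelising them is the harder case.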

Productionising Analytics
• Integration of R and Python with Hadoop and Spark
• Leveraging computing power of Hadoop cluster for distributed analytics
• Plug & play model deployment tools for easy and robust productionising of analytics models
Diagram labels:
- Data Sources (Operations, Sales, Marketing, etc.)
- Ingestion
- Data Lake (HDFS)
- Core Data Science
- Production: Dashboards, R Shiny Apps, Predictive model scoring
- Plug & play model deployment
- Data
- Real-time Data
- Real-time Optimization with Multi-armed Bandit
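
The slide names "Real-time Optimization with Multi-armed Bandit" without saying which bandit algorithm is meant. As one possible reading, the sketch below shows epsilon-greedy, a simple and widely used bandit strategy: most of the time it plays the option with the best observed mean reward, and with a small probability it tries a random option. The function name, the click-through rates and the parameter values are hypothetical.

```python
import random


def epsilon_greedy(reward_fns, rounds, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit; return (pull counts, mean reward estimates)."""
    rng = random.Random(seed)
    counts = [0] * len(reward_fns)
    means = [0.0] * len(reward_fns)
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(reward_fns))                   # explore
        else:
            arm = max(range(len(means)), key=means.__getitem__)    # exploit
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]          # running mean
    return counts, means


# Hypothetical click-through rates for three page variants
rates = [0.02, 0.05, 0.10]
arms = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in rates]

counts, means = epsilon_greedy(arms, rounds=5000)
print(counts)
print([round(m, 3) for m in means])
```

Because the estimates are updated after every single reward, the same loop can run against live traffic instead of the simulated `arms` used here.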

Data Science and Analytics Overview
Data Science Project
• Project Kick-off
• Data Profiling and Exploratory Analysis
• Analytics Modeling
• Model Validation
• Model Publishing
• Reporting
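
The slide does not say how the Model Validation step is carried out. One common option is k-fold cross-validation, sketched below in plain Python under that assumption: the rows are split into k folds, and each fold serves once as the test set while the remaining rows are used for training. The function name and the fold layout (contiguous, unshuffled) are illustrative choices.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n rows."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

In practice the rows are usually shuffled before splitting, and a model is fitted on each `train` set and scored on the matching `test` set.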

We leverage our expertise across industries
• Dynamic Pricing
• Fraud Detection
• Customer Segmentation
• Recommendation Engine
• Predictive Asset Maintenance
• Proactive Customer Support
• Credit Default Prediction
• Churn Modeling
• Scenario Simulation
• A/B Testing
• Display Targeting Optimisation
• Demand Forecast
• Cluster Analysis & Segmentation
• Device Analytics
• Risk Analytics
• Customer Analytics

Thank you

Today’s reality Hadoop with Spark- How to select the best Data Science approach when using Big Data Platforms and Technologies?
