© 2014 MapR Technologies 1
A typical encounter with a potential Mahout user
Which leads us to the Mahout 1.0 vision
Example: Cooccurrence Analysis
How often do items co-occur?
// load distributed matrix
val A = drmFromHDFS(...)
// compute co-occurrences
val C = A.t %*% A
Under the covers:
• Optimizer rewrites the matrix multiplication and transpose operations to a TransposeSelf operator
• Optimizer chooses from two physical operators for TransposeSelf
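To make the A.t %*% A step concrete, here is a minimal plain-Scala sketch on a tiny in-memory matrix. It is an illustration only, not Mahout's implementation: the names CooccurrenceSketch, viaTranspose and viaRowOuterProducts are hypothetical, and A is assumed to be a non-empty 0/1 user-by-item matrix, so that C(i)(j) counts the users who interacted with both item i and item j. The second function shows one reason a fused operator can pay off: the same result can be accumulated row by row, without ever forming the transpose.

```scala
object CooccurrenceSketch {
  type Matrix = Vector[Vector[Int]]

  // Direct definition: C(i)(j) = sum over users u of A(u)(i) * A(u)(j).
  def viaTranspose(a: Matrix): Matrix = {
    val at = a.transpose // one vector per item (column of A)
    at.map(colI => at.map(colJ => colI.zip(colJ).map { case (x, y) => x * y }.sum))
  }

  // Same result without forming the transpose: add up the outer
  // product of every row of A with itself.
  def viaRowOuterProducts(a: Matrix): Matrix = {
    val n = a.head.length // assumes a is non-empty and rectangular
    val c = Array.fill(n, n)(0)
    for (row <- a; i <- 0 until n if row(i) != 0; j <- 0 until n)
      c(i)(j) += row(i) * row(j)
    c.map(_.toVector).toVector
  }
}
```

With three users and three items, for example Vector(Vector(1, 1, 0), Vector(1, 0, 1), Vector(1, 1, 0)), both functions return the same symmetric matrix. Its diagonal holds the per-item interaction counts.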
Which items co-occur anomalously?
// compute & broadcast number of interactions per item
val numInteractions = drmBroadcast(A.colSums)

// create indicator matrix
val I = C.mapBlock() {
  case (keys, block) =>
    // allocate sparse block of indicator matrix
    val indicatorBlock = sparse(block.nrow, block.ncol)
    // compute indicators with log-likelihood ratio test
    for (row <- block)
      indicatorBlock(row.index, ::) = computeLLR(row, numInteractions)
    keys -> indicatorBlock
}
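The slide does not show the body of computeLLR. As a rough sketch of what such a test can compute for a single item pair, the plain-Scala code below evaluates the log-likelihood ratio of a 2x2 contingency table using the entropy formulation. The object LlrSketch and the helper llrForPair are hypothetical names, and the mapping from co-occurrence counts to the table assumes a 0/1 interaction matrix, so that column sums are per-item interaction counts and numUsers is the number of rows of A.

```scala
object LlrSketch {
  // x * ln(x), with the usual convention that 0 * ln(0) = 0.
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized Shannon entropy of a list of counts.
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  // Log-likelihood ratio for the 2x2 table
  //   k11 = both events, k12 = first only, k21 = second only, k22 = neither.
  def llr(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val colEntropy = entropy(k11 + k21, k12 + k22)
    val matEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to absorb floating-point round-off.
    math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
  }

  // Assumed mapping from co-occurrence counts to the 2x2 table for items i and j.
  def llrForPair(cooccur: Long, interactionsI: Long, interactionsJ: Long, numUsers: Long): Double =
    llr(cooccur,
        interactionsI - cooccur,
        interactionsJ - cooccur,
        numUsers - interactionsI - interactionsJ + cooccur)
}
```

A score near zero means the two items co-occur about as often as chance would predict. Large scores flag the anomalous pairs that would be kept in the indicator matrix, usually after applying a threshold or keeping the top few entries per row.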
Runtime
• prototype on Apache Spark
  – fast and expressive cluster computing system
  – general computation graphs, in-memory primitives, rich API, interactive shell
• future: add Stratosphere
  – project proposed to Apache Incubator recently
  – similar to Apache Spark, adds data flow optimization and efficient out-of-core execution
How Does This Apply?
How Can I Start?
Q&A
@ted_dunning @mapr maprtech
tdunning@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies

Possible Visions for Mahout 1.0


Editor's Notes

  1. I just have 5 minutes for this talk. Given the short time I thought I’d share with you some of the more interesting things you can do with Hadoop in 5 minutes or less…
  2. In 1 minute you can perform 4.73 million concurrent authentications in the largest biometric database in the world. In India, there is no social security card. It’s difficult for the average citizen to set up a bank account, access benefit programs, and enjoy economic mobility. It’s difficult for the government as well, with over $1B of government aid classified as leakage, the result of fraud and corruption. The Aadhaar program is poised to change all that by leveraging the unique IDs that all people are born with. The program aims to get fingerprints and retina scans for all 1.2 billion citizens. The scale of this project required an integrated in-Hadoop database that was capable of 200 millisecond response times while supporting millions of concurrent look-ups.