Fast Machine Learning
with
by Fujio Turner
@FujioTurner
Current & Future Problems
Churn Prediction Truth and Veracity
Recommendations Online Advertisement
News Aggregation
Scalability
Content Discovery/Search
Intelligent Learning Machine Learning for Medicine
Source: Abhishek Shivkumar
Who is LexisNexis?
LexisNexis is a provider of legal, tax, regulatory, news, and business
information and analysis to legal, corporate, government,
accounting and academic markets.
LexisNexis has been in business since 1977 with over
30,000 employees worldwide.
What is HPCC Systems?
LexisNexis Risk is the division of LexisNexis that focuses
on data, Big Data processing, linking and vertical expertise,
and supports HPCC Systems as an open source project
under the Apache 2.0 License.
http://hpccsystems.com/
Problems
Data from 10,000+
Different Sources
Different Needs
for the Data
Different Levels
of Proficiency
Lots of Data
Different Needs
for the Data
Different Levels
of Proficiency
A Lot of Data
Normalized / Denormalized
Structured / Unstructured
Data from 10,000+
Different Sources
DEDUP, JOIN, INDEX,
COUNT, REGEX, K-Means,
BETWEEN, GROUP, CASE, Custom
1 Easy Language (ECL)
or
SQL, R, Java, Python, C++, SAS
Reliable Data Distribution & Processing
System that scales to exabytes+
Solutions
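To make a few of the operations named on this slide concrete, here is a minimal sketch in plain Python of what DEDUP, GROUP and COUNT do to a set of records. It is an analogy only, not ECL and not how HPCC Systems implements them; the sample records, field names and the helper functions dedup and count_by are hypothetical.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "name": "Ann", "country": "US"},
    {"id": 1, "name": "Ann", "country": "US"},  # duplicate of the first row
    {"id": 2, "name": "Bob", "country": "US"},
    {"id": 3, "name": "Cho", "country": "JP"},
]


def dedup(rows, key):
    """Keep the first row seen for each distinct value of `key`."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique_rows.append(row)
    return unique_rows


def count_by(rows, field):
    """Group rows by `field` and count the rows in each group."""
    return Counter(row[field] for row in rows)


unique = dedup(records, "id")
per_country = count_by(unique, "country")
print(len(unique), dict(per_country))
```

On a cluster the same steps would run in parallel over data spread across many nodes; the sketch only shows the logic on a single machine.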
Machine Learning Built-in
Regression
Linear Regression
Classification
Naive Bayes
Perceptron
Decision Trees
Logistic Regression
Clustering
K-Means
KD Trees
Agglomerative/Hierarchical
Association Analysis
AprioriN
EclatN
Rules
http://hpccsystems.com/ml
Michael Payne, of Clemson University,
on high-speed machine learning with
PB-BLAS in HPCC Systems.
http://youtu.be/s_HWlMwi6iI
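K-Means appears in the clustering list above. As a rough illustration of what that algorithm does, here is a small single-machine sketch of the standard Lloyd iteration in Python. It is not the HPCC Systems ML library; the function name kmeans, the choice to seed centroids with the first k points, and the sample points are all assumptions made for the example.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

A distributed implementation would split the assignment step across nodes and combine partial sums for the update step; this sketch leaves that out.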
“I’m sub-second
fast.”
“I can query all
or part of your
data.”
Thor Roxie
Single Threaded
Hard Disk
Index(optional)
Multi-Threaded
Hard Disk
Index(optional)
In-memory
SSD
Either/Both
Cluster Architecture
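One way to picture the split between the two engines is a batch step that scans all the data once to build an index, and a serving step that answers queries from that index without rescanning. The Python sketch below shows that idea only; it is not HPCC Systems code, and build_index, lookup and the sample rows are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def build_index(rows, field):
    """Batch step: scan every row once and group row positions by field value."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[field]].append(position)
    return dict(index)


def lookup(rows, index, value):
    """Serving step: answer from the prebuilt index without a full scan."""
    return [rows[i] for i in index.get(value, [])]


# Hypothetical rows; field names are illustrative only.
rows = [
    {"user": "a", "country": "US"},
    {"user": "b", "country": "JP"},
    {"user": "c", "country": "US"},
]
country_index = build_index(rows, "country")
us_rows = lookup(rows, country_index, "US")
print(us_rows)
```

Building the index costs one pass over the data, after which each lookup touches only the matching rows, which is the general reason indexed queries can respond quickly.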
Sort
Count
Group
Classification
(ROXIE) 0.27 seconds to (THOR) few hours
Country = ‘US’
Join
Index of
~/facebook_2013
Query is Completed in a Single Job
Asynchronously
~/facebook_2013
Country = ‘US’
~/twitter_2013
SORT
GROUP
DEDUP
JOIN
MERGE
BETWEEN
LENGTH
REGEX
ROUND
SUM
COUNT
TRIM
WHEN
AVE
CASE
NORMALIZE
DENORMALIZE
K-MEANS
more ….
+
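The query pictured on this slide filters on Country = 'US' and joins two datasets. As a hedged sketch of that logic in plain Python (not ECL), the example below filters each dataset and then performs a hash join. The join key user_id, the other field names and the sample rows are assumptions; the slide does not say which fields the real datasets contain.

```python
def filter_rows(rows, **conditions):
    """Keep rows whose fields equal every given condition."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


def inner_join(left, right, key):
    """Hash join: index the right side by key, then probe with each left row."""
    by_key = {}
    for right_row in right:
        by_key.setdefault(right_row[key], []).append(right_row)
    joined = []
    for left_row in left:
        for right_row in by_key.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


# Hypothetical stand-ins for the two datasets named on the slide.
facebook_2013 = [
    {"user_id": 1, "country": "US", "likes": 10},
    {"user_id": 2, "country": "JP", "likes": 5},
    {"user_id": 3, "country": "US", "likes": 7},
]
twitter_2013 = [
    {"user_id": 1, "country": "US", "tweets": 42},
    {"user_id": 3, "country": "CA", "tweets": 8},
]

us_facebook = filter_rows(facebook_2013, country="US")
us_twitter = filter_rows(twitter_2013, country="US")
result = inner_join(us_facebook, us_twitter, "user_id")
print(result)
```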
http://www.youtube.com/watch?v=8SV43DCUqJg
Watch how to install
HPCC Systems
in 5 Minutes
Download HPCC Systems
Open Source
Community Edition
or
Source Code
https://github.com/hpcc-systems
http://hpccsystems.com/download/
+
Common Big Data Setup
What is Couchbase?
Open Source
Memcached Built-In w/ Replicas
Flexible Schema (JSON)
Key/Value & Distributed
Cross Data Center Replication
SQL++ (N1QL)
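To illustrate what "key/value, distributed, with replicas" and "flexible schema (JSON)" can mean in practice, here is a toy Python sketch. It hashes each key to a partition, stores the JSON document on an active node plus one replica, and reads from the replica when the active node is marked as failed. This is a deliberately simplified model and not Couchbase's actual partition mapping or API; NUM_PARTITIONS, the node names and the ToyCluster class are assumptions for the example.

```python
import json
import zlib

NUM_PARTITIONS = 1024  # assumed partition count for this toy model
NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def partition_for(key):
    """Hash the key to a partition number."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def nodes_for(key, replicas=1):
    """Return the active node followed by `replicas` replica nodes."""
    start = partition_for(key) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replicas + 1)]


class ToyCluster:
    def __init__(self):
        self.storage = {node: {} for node in NODES}

    def set(self, key, document, replicas=1):
        """Store the document as JSON on the active node and its replicas."""
        payload = json.dumps(document)
        for node in nodes_for(key, replicas):
            self.storage[node][key] = payload

    def get(self, key, failed=()):
        """Read from the first healthy node that holds the key."""
        for node in nodes_for(key, replicas=len(NODES) - 1):
            if node not in failed and key in self.storage[node]:
                return json.loads(self.storage[node][key])
        return None


cluster = ToyCluster()
# Two documents with different fields: no fixed schema is required.
cluster.set("user::1", {"name": "Ann", "tags": ["a", "b"]})
cluster.set("user::2", {"name": "Bob", "age": 30})

active_node = nodes_for("user::1")[0]
print(cluster.get("user::1", failed=(active_node,)))
```

The last line reads the document even though its active node is treated as down, because the write also went to a replica.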
+
Sub-Millisecond
SQL++(N1QL)
JSON
Distributed & Reliable
Distributed & Reliable
1 Language
Flexible Data Types
Ready for the Future
XDCR
Couchbase Mobile
Embedded JSON NoSQL Database
+ Sync Data Online / Offline
Embedded JSON NoSQL Database
+ Sync & Channel Data Peer-To-Peer
+ Sync Data Peer-To-Peer (directly)
Couchbase Mobile
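The online/offline sync idea can be sketched as a local store that accepts writes at any time and pushes the queued changes once a connection is available. The Python below is only a conceptual model under that assumption; it is not the Couchbase Lite or Sync Gateway API, and LocalStore, its methods and the plain dict standing in for the server are hypothetical.

```python
class LocalStore:
    """Toy embedded store that queues writes until a server is reachable."""

    def __init__(self):
        self.docs = {}
        self.pending = []  # ids of documents changed since the last sync

    def save(self, doc_id, doc):
        """Write locally; works whether or not the device is online."""
        self.docs[doc_id] = doc
        if doc_id not in self.pending:
            self.pending.append(doc_id)

    def sync(self, server, online):
        """Push queued documents to the server; return how many were sent."""
        if not online:
            return 0
        pushed = 0
        for doc_id in self.pending:
            server[doc_id] = self.docs[doc_id]
            pushed += 1
        self.pending.clear()
        return pushed


server = {}  # stand-in for the remote database
store = LocalStore()
store.save("note::1", {"text": "written offline"})
store.save("note::2", {"text": "also offline"})

sent_offline = store.sync(server, online=False)
sent_online = store.sync(server, online=True)
print(sent_offline, sent_online, sorted(server))
```

A real sync protocol also has to pull remote changes and resolve conflicts, which this sketch leaves out.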
Couchbase Mobile + HPCC Systems
Process & Store Data to Scale
INSTALL in 5 Minutes
Download
Source Code
Learning More - Couchbase Server & Lite
http://couchbase.com/download
https://github.com/couchbase
Mountain View, CA
San Francisco, CA
https://www.youtube.com/user/CouchbaseVideo

Big Data - Fast Machine Learning at Scale + Couchbase
