Machine learning at scale is full of challenges. Many data scientists are finding that HPCC Systems is the right fit for their needs, with machine learning already "built-in".
2. Current & Future Problems
Churn Prediction Truth and Veracity
Recommendations Online Advertisement
News Aggregation
Scalability
Content Discovery/Search
Intelligent Learning Machine Learning for Medicine
Source: Abhishek Shivkumar
3. LexisNexis is a provider of legal,
tax, regulatory, news, business
information, and analysis to
legal, corporate, government,
accounting and academic
markets.
LexisNexis has been in
business since 1977 with over
30,000 employees worldwide.
What is HPCC Systems? Who is ?
LexisNexis Risk is the division
of LexisNexis that focuses
on data, Big Data processing,
linking and vertical expertise,
and supports HPCC Systems
as an open source project
under the Apache 2.0 License.
http://hpccsystems.com/
5. Different Needs
for the Data
Different Levels
of Proficiency
A Lot of Data
Normalized / Denormalized
Structured / Unstructured
Data from 10,000+
Different Sources
DEDUP, JOIN, INDEX,
COUNT, REGEX, K-Means,
BETWEEN, GROUP, CASE, Custom
1 Easy Language (ECL)
or
SQL, R, Java, Python, C++, SAS
Reliable Data Distribution & Processing
System that scales to exabytes+
Solutions
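To make operations such as DEDUP and COUNT concrete, here is a minimal Python sketch of what they do to a small record set. It is only an illustration of the idea, not ECL and not how HPCC Systems implements these operations; the records, field names, and the `dedup` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative, not from the slides.
records = [
    {"id": 1, "name": "Ann", "country": "US"},
    {"id": 1, "name": "Ann", "country": "US"},  # duplicate of the first row
    {"id": 2, "name": "Bob", "country": "UK"},
    {"id": 3, "name": "Cy", "country": "US"},
]

def dedup(rows, key):
    """Keep the first row seen for each key value (loosely analogous to DEDUP)."""
    seen = set()
    kept = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            kept.append(row)
    return kept

# Remove duplicates by id, then count the remaining rows per country.
unique = dedup(records, "id")
per_country = Counter(row["country"] for row in unique)
```

On a cluster the same steps would run in parallel across partitions of the data; this sketch runs on a single in-memory list.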
6. Machine Learning Built-in
Regression
Linear Regression
Classification
Naive Bayes
Perceptron
Decision Trees
Logistic Regression
Clustering
K-Means
KD Trees
Agglomerative/Hierarchical
Association Analysis
AprioriN
EclatN
Rules
http://hpccsystems.com/ml
Michael Payne, of Clemson University,
on high speed machine learning with
PB-BLAS in HPCC Systems.
http://youtu.be/s_HWlMwi6iI
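As an illustration of one algorithm from the clustering list, the following is a minimal single-machine K-Means sketch in plain Python. It is not the HPCC Systems ML library code; the function name, the choice of the first `k` points as initial centroids, and the sample points are assumptions made for the example.

```python
def kmeans(points, k, iterations=10):
    """Plain K-Means: alternate between assigning points and updating centroids."""
    # Assumption: seed the centroids with the first k points (a simplification).
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Hypothetical 2-D sample: two points near (1, 1) and two near (5, 5).
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8)]
centroids, assignments = kmeans(points, k=2)
```

K-Means is sensitive to its starting centroids, so a production implementation would normally use a more careful initialization than the one assumed here.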
7. “I’m sub-second
fast.”
“I can query all
or part of your
data.”
Thor Roxie
Single Threaded
Hard Disk
Index (optional)
Multi-Threaded
Hard Disk
Index (optional)
In-memory
SSD
Either/Both
Cluster Architecture
8. Sort
Count
Group
Classification
(ROXIE) 0.27 seconds to (THOR) a few hours
Country = ‘US’
Join
Index of
~/facebook_2013
Query is Completed in a Single Job
Asynchronously
~/facebook_2013
Country = ‘US’
~/twitter_2013
SORT
GROUP
DEDUP
JOIN
MERGE
BETWEEN
LENGTH
REGEX
ROUND
SUM
COUNT
TRIM
WHEN
AVE
CASE
NORMALIZE
DENORMALIZE
K-MEANS
more…
+
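One reading of the diagram on this slide is a filter (Country = 'US') on one dataset followed by a join against another. The sketch below shows that pattern in plain Python. The dataset names come from the slide, but the rows, the field names, and the choice of `user` as the join key are hypothetical, and this is not ECL.

```python
# Hypothetical rows standing in for the two datasets named on the slide.
facebook_2013 = [
    {"user": "ann", "country": "US"},
    {"user": "bob", "country": "UK"},
    {"user": "cy", "country": "US"},
]
twitter_2013 = [
    {"user": "ann", "handle": "@ann"},
    {"user": "cy", "handle": "@cy"},
    {"user": "dee", "handle": "@dee"},
]

# Filter: keep only the rows where country is 'US'.
us_only = [row for row in facebook_2013 if row["country"] == "US"]

# Build a lookup on the join key, playing the role of an index.
by_user = {row["user"]: row for row in twitter_2013}

# Inner join: merge each filtered row with its match, dropping unmatched rows.
joined = [
    {**fb_row, **by_user[fb_row["user"]]}
    for fb_row in us_only
    if fb_row["user"] in by_user
]
```

Filtering before the join keeps the intermediate result small, which matters more as the datasets grow.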
9. http://www.youtube.com/watch?v=8SV43DCUqJg
Watch how to install
HPCC Systems
in 5 Minutes
Download HPCC Systems
Open Source
Community Edition
or
Source Code
https://github.com/hpcc-systems
http://hpccsystems.com/download/
15. Memcached Built-In
Key/Value & Distributed
Flexible Schema (JSON)
Cross Data Center Replication
w/ Replicas
What is Couchbase ?
Open Source
16. Memcached Built-In
Flexible Schema (JSON)
SQL++ (N1QL)
w/ Replicas
What is Couchbase ?
Key/Value & Distributed
Cross Data Center Replication
Open Source
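The ideas of a distributed key/value store holding flexible-schema JSON documents with replicas can be sketched with a toy in-memory model. This is not the Couchbase SDK or its internal design; the `ToyCluster` class, its methods, and the key names are hypothetical, and the placement rule (a CRC32 hash modulo the node count, with replicas on the following nodes) is an assumption chosen for simplicity.

```python
import json
import zlib

class ToyCluster:
    """Toy key/value store: each key lives on a primary node plus replicas."""

    def __init__(self, node_count, replicas=1):
        # Each "node" is just a dict; node_count should exceed replicas.
        self.nodes = [dict() for _ in range(node_count)]
        self.replicas = replicas

    def _owners(self, key):
        # Assumption: hash the key to pick a primary, then use the next nodes.
        primary = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [(primary + i) % len(self.nodes) for i in range(self.replicas + 1)]

    def set(self, key, document):
        # Store the document as JSON text on the primary and every replica.
        payload = json.dumps(document)
        for node_index in self._owners(key):
            self.nodes[node_index][key] = payload

    def get(self, key):
        # Read from the first owner that still holds the key.
        for node_index in self._owners(key):
            if key in self.nodes[node_index]:
                return json.loads(self.nodes[node_index][key])
        return None

cluster = ToyCluster(node_count=3, replicas=1)
# Flexible schema: the two documents have different fields.
cluster.set("user::ann", {"name": "Ann", "city": "Mountain View"})
cluster.set("user::bob", {"name": "Bob", "tags": ["sql", "json"]})
```

Because every write lands on two nodes here, a read can still succeed if the copy on the primary node is lost.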
21. INSTALL in 5 Minutes
Download
Source Code
Learning More - Couchbase Server & Lite
http://couchbase.com/download
https://github.com/couchbase
Mountain View, CA
San Francisco, CA
https://www.youtube.com/
user/CouchbaseVideo