HPCC Systems 
See Through Patterns, Hidden Relationships and 
Networks to Find Opportunities in Big Data. 
Open Source, Big Data Processing and Analytics
LexisNexis
LexisNexis leverages data about people, businesses, and assets to assess risk and 
opportunity associated with industry-specific problems 
Data:
• Contributory Data
• Public Records
• Credit Headers
• Phone Listings
Industries:
• Insurance: Assess underwriting risk and verify applicant data; prevent/investigate fraudulent claims; optimize policy administration
• Financial Services: Prevent/investigate money laundering and comply with laws; prevent/investigate identity fraud
• HPCC Systems technology: Open Source and commercial offerings of our leading Big Data platform
• Receivables Management: Assist collections by locating delinquent debtors; assess collectability of debts and prioritize collection efforts
• Legal: Locate/vet witnesses, assess assets/associations of parties in legal actions; perform diligence on prospective clients (KYC)
• Government: Locate missing children/suspects; research/document cases; reduce entitlement fraud; accelerate revenue collection
• Health Care: Verify patient identity, eligibility, and ability to pay; validate provider credentials; prevent/investigate fraud
LexisNexis Risk Solutions works with Fortune 1000 and mid-market clients across 
all industries and both federal and state governments. 
• Customers in over 139 countries
• 6 of the world’s top 10 banks
• 100% of the top 50 U.S. banks
• 80% of the Fortune 500 companies
• 100% of U.S. P&C insurance carriers
• All 50 U.S. states
• 70% of local governments
• 80% of U.S. Federal agencies
• 97 of AM Law 100 firms
HPCC Systems
LexisNexis Big Data Value Chain
Collection – Structured, unstructured and semi-structured data from multiple sources
Ingestion – Loading vast amounts of data onto a single data store
Discovery & Cleansing – Understanding format and content; clean-up and formatting
Integration – Linking, entity extraction, entity resolution, indexing and data fusion
Analysis – Intelligence, statistics, predictive and text analytics, machine learning
Delivery – Querying, visualization, real-time delivery with enterprise-class availability
Sample Use Cases for HPCC Systems
Vertical: Example
Automate Identification: Automate identity identification and disambiguation for people, businesses, assets.
Complex Queries: Run complex queries on transaction data (months, years, decades) to see past and present behavior.
Predict Behavior: Predict behavior and patterns leveraging current and historical data from a variety of data sources.
Financial Services: Analyze current and historical consumer transaction data from various sources to see fraud patterns or opportunities during the life cycle of a customer.
Government: Manage and analyze the flow of people, businesses and assets from various data sources to monitor transactions and fraud rings.
Health Care/Insurance: Manage customer/patient transaction data on electronic medical records/claims including medical information, medical procedures, prescriptions, etc.
Internet: Monitor social media outlets for trends and patterns. Monitor all the data on your network to guard against cyber security attacks.
Retail: Capture and analyze information from all branches of stores, locations and departments to manage pricing, inventory and distribution.
Telecommunications: Process customer data such as billions of call records, texts, streaming media and GPS history. Analyze customer churn, usage behavior patterns and failures.
HPCC Systems is an Open Source Platform for Big Data Processing
• High Performance Computing Cluster (HPCC Systems) enables data integration on a scale not previously available and real-time answers to millions of users. Built for Big Data and proven for 10 years with enterprise customers.
• ECL Parallel Programming Language optimized for business-differentiating, data-intensive applications
• Single architecture offers two data platforms (query and refinery) and a consistent data-intensive programming language (ECL)
HPCC Systems Architecture 
How HPCC Systems Helps Address the Talent Crunch
HPCC Systems Roadmap Solves the Talent & Work Crunch Problem
Types of Users: Developers, Linkers, Modelers, Data Scientists, BI Analysts, Business Managers
Developers
• What they do: Ingest data; write queries; create products
• Talent & Work Crunch Issue: Developers want a faster, better way to solve problems. Hadoop requires many developers, which is expensive for firms.
• Low-level tools: ECL
Linkers
• What they do: Link datasets
• Talent & Work Crunch Issue: Hard for companies to find and hire. They need a better, faster way to understand how to link data for analysis.
• Low-level tools: N/A
Modelers
• What they do: Discover attributes; create models & scoring logic
• Talent & Work Crunch Issue: Hard for companies to find and hire. They need a better, faster way to model data so that they are spending their time on analysis, not programming models.
• Low-level tools: N/A
Data Scientists
• What they do: Discover patterns; write algorithms
• Talent & Work Crunch Issue: Hard and expensive for companies to find and hire. Need more time on analysis, not low-level data manipulation tasks.
• Low-level tools: KEL, SALT, ECL
BI Analysts
• What they do: Understand patterns; discover anomalies
• Talent & Work Crunch Issue: Hard and expensive for companies to find and hire. Need more time on analysis, not low-level data manipulation tasks.
• High-level tools: Scored Search, Dashboard
• Low-level tools: N/A
Business Managers
• What they do: Monitor business performance and customer needs, dynamics
• Talent & Work Crunch Issue: They struggle with IT and data solutions to get the data and insight they need to make business decisions. Most IT and data solutions are not connected.
• Low-level tools: N/A
High-level tools across the other user types: Smart View, SALT, KEL; HPCC Flow, KEL; HPCC Flow, EDA; HPCC Flow, Circuits, RAMPS; Dashboard
Deep Dive on HPCC Systems Components
Thor Logical Architecture
Roxie Logical Architecture
Thor Physical Architecture
Data Refinery Process
Programming Language is called Enterprise Control Language (ECL)
Declarative programming language: Describe what needs to be done and not how to do it
Powerful: Unlike Java, high-level primitives such as JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available. Higher-level code means fewer programmers and a shorter time to delivery
Extensible: As new attributes are defined, they become primitives that other programmers can use
Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with it
Maintainable: A high-level programming language, no side effects and attribute encapsulation provide for more succinct, reliable and easier-to-troubleshoot code
Complete: ECL provides for a complete data programming paradigm
Homogeneous: One language to express data algorithms across the entire HPCC platform, including data ETL and high-speed data delivery
• ECL is a declarative, data-centric programming language which can be expressed concisely, parallelizes naturally, is free from side effects, and results in highly optimized executable code.
• ECL is designed for a specific problem domain (data-intensive computing), which makes resulting programs clearer, more compact, and more expressive. ECL provides a more natural way to think about data processing problems for large distributed datasets.
• Since ECL is declarative, execution is not determined by the order of the language statements, but by the sequence of dataflows and transformations represented by the language statements. The ECL compiler determines the optimum execution strategy and graph.
• ECL incorporates transparent and implicit parallelism regardless of the size of the computing cluster and reduces the complexity of parallel programming, increasing the productivity of application developers.
• The ECL compiler generates highly optimized C++ for execution.
• ECL provides a comprehensive IDE and programming tools including an Eclipse plugin.
• ECL is provided with a large library of efficient modules to handle common data manipulation tasks.
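The point that execution follows the dataflow rather than the statement order can be illustrated with a small sketch. This is Python, not ECL, and it does not reflect how the ECL compiler works internally; the definition names and the sample rows are hypothetical. Definitions are registered in a deliberately scrambled order and a resolver derives the execution order from their dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def sum_by_customer(rows):
    """Aggregate (customer, amount) rows into a total per customer."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals

# Hypothetical definitions, deliberately written "out of order".
# name -> (names it depends on, function computing it from those values)
definitions = {
    "top_two": (("totals",), lambda t: sorted(t, key=t.get, reverse=True)[:2]),
    "totals": (("cleaned",), sum_by_customer),
    "cleaned": (("raw",), lambda rows: [r for r in rows if r[1] > 0]),
    "raw": ((), lambda: [("ann", 30), ("bob", -5), ("ann", 20), ("cy", 40), ("bob", 10)]),
}

def evaluate(defs):
    """Run each definition once, in an order derived from its dependencies."""
    graph = {name: set(deps) for name, (deps, _) in defs.items()}
    order = list(TopologicalSorter(graph).static_order())
    values = {}
    for name in order:
        deps, compute = defs[name]
        values[name] = compute(*(values[d] for d in deps))
    return order, values

order, values = evaluate(definitions)
print(order)
print(values["top_two"])
```

Reordering the entries of `definitions` does not change the result, which is the property the slide attributes to declarative code.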
Scalable Automated Linking Technology (SALT) and LexID
SALT - Algorithms for Data Analysis and Linking
SALT (Scalable Automated Linking Technology)
• Data Ingest: Transform a base file into a rolling historic record of incoming records.
• Data Profiling: Generate statistics for field lengths, word counts, population rate, etc.
• Data Hygiene: Format fields correctly at the character level (e.g., capitals, punctuation, spelling errors, etc.)
• Entity Resolution: Create 360° profiles on individuals, organizations, etc. through iteratively linking data from multiple sources
• Relationship Extraction: Uncover hidden patterns within massive datasets to determine how entities relate to one another
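The data profiling step (field lengths, word counts, population rate) can be sketched in a few lines. This is a minimal Python illustration under assumed inputs, not SALT itself; the `profile` function, the field names and the sample records are hypothetical.

```python
def profile(records, fields):
    """Return per-field population rate, length range and mean word count."""
    stats = {}
    for field in fields:
        # Treat missing, None and whitespace-only values as unpopulated.
        values = [(r.get(field) or "").strip() for r in records]
        filled = [v for v in values if v]
        stats[field] = {
            "population_rate": len(filled) / len(records) if records else 0.0,
            "min_len": min((len(v) for v in filled), default=0),
            "max_len": max((len(v) for v in filled), default=0),
            "mean_words": sum(len(v.split()) for v in filled) / len(filled) if filled else 0.0,
        }
    return stats

# Hypothetical sample records.
sample = [
    {"name": "Ann Lee", "city": "Atlanta"},
    {"name": "Bob", "city": ""},
    {"name": "Carla De Souza", "city": "Boca Raton"},
    {"name": "", "city": None},
]
report = profile(sample, ["name", "city"])
for field, field_stats in report.items():
    print(field, field_stats)
```

A low population rate or an unexpected length range is the kind of signal that would feed the data hygiene step.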
LexisNexis linking is statistically-based (not rules-based), using the LexisNexis master databases for reference
Two examples where linkage strictly based on rules doesn’t work:
1. LN Linking sufficiently explores all credible matches
INPUT: Record 1: Flavio Villanustre, Atlanta; Record 2: Javio Villanustre, Atlanta
RULES: NO MATCH, because the rules determine that “Flavio” and “Javio” are not the same (Error)
LN Linking: MATCH, because the system has learnt that “Villanustre” is specific because the frequency of occurrence is small and there is only one present in Atlanta
2. LN Linking effectively minimizes false matches
INPUT: Record 1: John Smith, Atlanta; Record 2: John Smith, Atlanta
RULES: MATCH, because the rules determine that “John Smith” and the city match for both records (Error)
LN Linking: NO MATCH, because the system has learnt that “John Smith” is not specific because the frequency of occurrence is large and there are many present in Atlanta
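The two examples can be reproduced with a toy model. The sketch below is not the LexisNexis linking algorithm; it is a simplified Python illustration that assumes a small hypothetical reference population (the counts are invented) and links two records only when the fields they agree on single out one person.

```python
from collections import Counter

# Hypothetical reference data: (first name, last name, city) for known people.
REFERENCE = (
    [("Flavio", "Villanustre", "Atlanta")]
    + [("John", "Smith", "Atlanta")] * 40
    + [("Mary", "Smith", "Atlanta")] * 25
    + [("John", "Smith", "Boston")] * 30
)
LAST_CITY_COUNTS = Counter((last, city) for _, last, city in REFERENCE)
FULL_COUNTS = Counter(REFERENCE)

def rules_match(a, b):
    """Rules-based linking: every field must be identical."""
    return a == b

def statistical_match(a, b, max_candidates=1):
    """Link only if the agreeing fields are specific enough to single out one person."""
    if a[1:] != b[1:]:  # last name or city disagree
        return False
    if a[0] == b[0]:
        candidates = FULL_COUNTS[a]
    else:  # tolerate a first-name variation when the surname is rare locally
        candidates = LAST_CITY_COUNTS[a[1:]]
    return 0 < candidates <= max_candidates

rare_a = ("Flavio", "Villanustre", "Atlanta")
rare_b = ("Javio", "Villanustre", "Atlanta")
common = ("John", "Smith", "Atlanta")

print(rules_match(rare_a, rare_b), statistical_match(rare_a, rare_b))
print(rules_match(common, common), statistical_match(common, common))
```

A production system would weigh many more fields and use graded rather than exact comparisons; the sketch only shows why frequency information changes both outcomes.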
The secret sauce in our portfolio is LexID℠
The fastest linking technology platform available with results that help you make intelligent information connections.
LexID℠ is the ingredient behind our products that turns disparate information into meaningful insights. This technology enables customers using our products to identify, link and organize information quickly with a high degree of accuracy.
Get a more complete picture: Make intelligent information connections beyond the obvious by drawing insights from both traditional and new sources of data.
Better results, faster: Use the fastest technology for processing large amounts of data to help you solve cases more quickly and confidently.
Protect private information: Keep customer SSNs and FEINs secure and enjoy peace of mind knowing you are taking steps to observe the highest levels of privacy and compliance.
Case Studies
Case Study: Fraud in Medicaid
Scenario
Proof of concept for the Office of the Medicaid Inspector General (OMIG) of a large Northeastern state. Social groups game the Medicaid system, which results in fraud and improper payments.
Task
Given a large list of names and addresses, identify social clusters of Medicaid recipients living in expensive houses and driving expensive cars.
Result
Interesting recipients were identified using asset variables, revealing hundreds of high-end automobiles and properties.
Leveraging the Public Data Social Graph, large social groups of interesting recipients were identified along with links to provider networks.
The analysis identified key individuals not in the data supplied, along with connections to suspicious volumes of “property flipping” potentially indicative of mortgage fraud and money laundering.
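One way to picture the clustering task is as connected components over a social graph. The sketch below is a Python illustration under assumed inputs, not the method used in the proof of concept; the identifiers, the links and the rule of keeping clusters with at least two flagged members are all hypothetical.

```python
def find(parent, x):
    """Return the representative of x's group, compressing the path as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def social_clusters(links, flagged, min_flagged=2):
    """Group linked people and keep clusters containing enough flagged recipients."""
    parent = {}
    for a, b in links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
    clusters = {}
    for person in parent:
        clusters.setdefault(find(parent, person), set()).add(person)
    return [c for c in clusters.values() if len(c & flagged) >= min_flagged]

# Hypothetical data: r* are recipients, p1 is a person absent from the supplied list.
links = [("r1", "r2"), ("r2", "p1"), ("r3", "r4"), ("r5", "p1")]
flagged = {"r1", "r3", "r5"}  # recipients with high-end assets
clusters = social_clusters(links, flagged)
print(clusters)
```

In this toy data the surviving cluster includes `p1`, who connects two flagged recipients without being a recipient, which mirrors the finding about key individuals not in the data supplied.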
Case Study: Fraud in Prescription Drugs
Scenario
Healthcare insurers need better analytics to identify drug-seeking behavior and schemes that recruit members to use their membership fraudulently.
Groups of people collude to source scheduled drugs through multiple members to avoid being detected by rules-based systems.
Providers recruit members to provide and escalate services that are not rendered.
Task
Given a large set of prescriptions, calculate the normal social distribution of each brand and detect where there is an unusual socialization of prescriptions and services.
Result
The analysis detected social groups that are sourcing Vicodin and other scheduled drugs. It identifies the prescribers and pharmacies involved to help the insurer focus investigations and intervene strategically to mitigate risk.
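The task can be read as comparing, per drug, how often its patients are socially linked to one another. The sketch below is one possible Python interpretation and not the analysis from the case study; the rate definition, the median baseline, the `factor` threshold and all data are assumptions made for illustration.

```python
from statistics import median

def socialization_rates(prescriptions, links):
    """For each drug, the share of its patients linked to another patient on the same drug."""
    linked = {frozenset(pair) for pair in links}
    patients_by_drug = {}
    for patient, drug in prescriptions:
        patients_by_drug.setdefault(drug, set()).add(patient)
    rates = {}
    for drug, patients in patients_by_drug.items():
        social = {p for p in patients for q in patients
                  if p != q and frozenset((p, q)) in linked}
        rates[drug] = len(social) / len(patients)
    return rates

def unusually_social(rates, factor=1.5):
    """Drugs whose rate exceeds the median rate by the given (assumed) factor."""
    baseline = median(rates.values())
    return sorted(drug for drug, rate in rates.items() if rate > factor * baseline)

# Hypothetical prescriptions (patient, drug) and social links between patients.
prescriptions = [
    ("p1", "drug_a"), ("p2", "drug_a"), ("p3", "drug_a"), ("p4", "drug_a"),
    ("p5", "drug_b"), ("p6", "drug_b"), ("p7", "drug_b"), ("p8", "drug_b"),
    ("p9", "drug_c"), ("p10", "drug_c"),
]
links = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p5", "p6")]
rates = socialization_rates(prescriptions, links)
flagged = unusually_social(rates)
print(rates, flagged)
```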
Case Study: Insurance Score From 100 Days To 30 Minutes
Scenario
One of the top 3 insurance providers was using Oracle analytics products on multiple statistical model platforms with disparate technologies
– Insurer issues request to re-run all past reports for a customer: 11.5M reports since 2004
– Using their current technology infrastructure, it would take 100 days to run these 11.5M reports
Task
Migration of 75 different models, 74,000 lines of code and approximately 700 definitions to HPCC
– Models were migrated in 1 man-month
– Using a small development system (and only one developer), we ran 11.5 million reports in 66 minutes
– Performance on a production-size system: 30 minutes
– Testing demonstrated our ability to work in batch or online
Result
– Reduced manual work, increased reliability, and created the capability to do new scores
– Decreased development time from 1 year to several weeks; decreased run time from 100 days to 30 minutes
– One HPCC (one infrastructure) translates into fewer people maintaining systems
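The figures quoted in this case study imply the following speedups and throughput. The short Python calculation below uses only the numbers stated above; the variable names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60
legacy_minutes = 100 * MINUTES_PER_DAY   # 100 days on the existing infrastructure
reports = 11_500_000                     # reports to re-run

speedup_production = legacy_minutes / 30  # production-size system: 30 minutes
speedup_dev = legacy_minutes / 66         # small development system: 66 minutes
reports_per_second_dev = reports / (66 * 60)
reports_per_second_prod = reports / (30 * 60)

print(f"production speedup: {speedup_production:,.0f}x")
print(f"development speedup: {speedup_dev:,.0f}x")
print(f"development throughput: {reports_per_second_dev:,.0f} reports/s")
print(f"production throughput: {reports_per_second_prod:,.0f} reports/s")
```

Going from 100 days to 30 minutes is a factor of 4,800, and even the small development system is roughly 2,182 times faster than the original estimate.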
Case Study: Fraud & Collusion in Auto Insurance Claims
Scenario
This view of carrier data shows seven known fraud claims and an additional linked claim.
The insurance company data only found a connection between two of the seven claims, and only identified one other claim as being weakly connected.
Task
After adding the LexID to the carrier data, LexisNexis HPCC technology then added 2 additional degrees of associations.
Result
The results showed two family groups (Family 1 and Family 2) interconnected on all of these seven claims.
The links were much stronger than the carrier data previously supported.
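Adding degrees of association can be pictured as a bounded breadth-first search over an association graph. The Python sketch below is illustrative only and is not the LexID or HPCC implementation; the graph, the claim identifiers and the people are hypothetical.

```python
from collections import deque

def within_degrees(graph, start, max_degree):
    """Return {person: degree} for everyone reachable from start in <= max_degree hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_degree:
            continue  # do not expand beyond the allowed number of hops
        for neighbour in graph.get(person, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[person] + 1
                queue.append(neighbour)
    return seen

def linked_claims(claims, graph, max_degree=2):
    """Pairs of claims whose claimants are connected within max_degree associations."""
    pairs = set()
    for claim_a, person_a in claims.items():
        reach = within_degrees(graph, person_a, max_degree)
        for claim_b, person_b in claims.items():
            if claim_a < claim_b and person_b in reach:
                pairs.add((claim_a, claim_b))
    return pairs

# Hypothetical undirected association graph (relatives, shared addresses, and so on).
graph = {
    "a": ["x"], "x": ["a", "b"], "b": ["x"],
    "c": ["y"], "y": ["c", "z"], "z": ["y", "d"], "d": ["z"],
}
claims = {"C1": "a", "C2": "b", "C3": "c", "C4": "d"}  # claim -> claimant
two_degree_links = linked_claims(claims, graph, max_degree=2)
three_degree_links = linked_claims(claims, graph, max_degree=3)
print(two_degree_links, three_degree_links)
```

Claims that look unrelated when only the claimants are compared become connected once intermediaries such as `x` are brought in, which is the effect described for the two family groups.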
Case Study: Network Traffic Analysis in Seconds
Scenario
Conventional network sensor and monitoring solutions are constrained by their inability to quickly ingest massive data volumes for analysis
- 15 minutes of network traffic can generate 4 terabytes of data, which can take 6 hours to process
- 90 days of network traffic can add up to 300+ terabytes
Task
Drill into all the data to see if any US government systems have communicated with any suspect systems of foreign organizations in the last 6 months
- In this scenario, we look specifically for traffic occurring at unusual hours of the day
Result
In seconds, HPCC Systems sorted through months of network traffic to identify patterns and suspicious behavior
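The query in this scenario amounts to a filter over flow records. The Python sketch below shows the logic on a handful of hypothetical records and says nothing about how HPCC Systems executes it at scale; the host names, the 180-day approximation of six months and the choice of midnight to 5 a.m. as "unusual hours" are assumptions.

```python
from datetime import datetime, timedelta

def suspicious_flows(flows, gov_hosts, suspect_hosts, now, days=180, quiet_hours=range(0, 5)):
    """Flows between government and suspect hosts inside the window, at unusual hours."""
    cutoff = now - timedelta(days=days)
    hits = []
    for ts, src, dst in flows:
        endpoints = {src, dst}
        if (ts >= cutoff
                and endpoints & gov_hosts
                and endpoints & suspect_hosts
                and ts.hour in quiet_hours):
            hits.append((ts, src, dst))
    return hits

# Hypothetical flow records: (timestamp, source host, destination host).
flows = [
    (datetime(2014, 6, 1, 3, 15), "gov-1", "bad-9"),   # in window, unusual hour
    (datetime(2014, 6, 1, 14, 0), "gov-1", "bad-9"),   # in window, normal hour
    (datetime(2013, 11, 1, 2, 0), "bad-9", "gov-2"),   # outside the window
    (datetime(2014, 5, 5, 1, 30), "bad-9", "gov-2"),   # in window, unusual hour
    (datetime(2014, 5, 5, 1, 45), "gov-1", "gov-2"),   # no suspect host involved
]
hits = suspicious_flows(
    flows,
    gov_hosts={"gov-1", "gov-2"},
    suspect_hosts={"bad-9"},
    now=datetime(2014, 6, 30),
)
for hit in hits:
    print(hit)
```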
Contact Us
info@hpccsystems.com
US: 1.877.316.9669
Intl: 1.678.694.2200

Mais conteĂșdo relacionado

Mais procurados

The Implacable advance of the data
The Implacable advance of the dataThe Implacable advance of the data
The Implacable advance of the data
DataWorks Summit
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
Dataconomy Media
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
DataWorks Summit
 
Achieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturingAchieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturing
DataWorks Summit
 

Mais procurados (20)

The Implacable advance of the data
The Implacable advance of the dataThe Implacable advance of the data
The Implacable advance of the data
 
Ultralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC EdgeUltralight Data Movement for IoT with SDC Edge
Ultralight Data Movement for IoT with SDC Edge
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
Paris FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant PresentationParis FOD Meetup #5 Cognizant Presentation
Paris FOD Meetup #5 Cognizant Presentation
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Paris FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks PresentationParis FOD Meetup #5 Hortonworks Presentation
Paris FOD Meetup #5 Hortonworks Presentation
 
FOD Paris Meetup - Global Data Management with DataPlane Services (DPS)
FOD Paris Meetup -  Global Data Management with DataPlane Services (DPS)FOD Paris Meetup -  Global Data Management with DataPlane Services (DPS)
FOD Paris Meetup - Global Data Management with DataPlane Services (DPS)
 
Tools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudTools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloud
 
Privacy-Preserving AI Network - PlatON 2.0
Privacy-Preserving AI Network - PlatON 2.0 Privacy-Preserving AI Network - PlatON 2.0
Privacy-Preserving AI Network - PlatON 2.0
 
Shaping a Digital Vision
Shaping a Digital VisionShaping a Digital Vision
Shaping a Digital Vision
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Apache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJApache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJ
 
Risk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growthRisk listening: monitoring for profitable growth
Risk listening: monitoring for profitable growth
 
Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...Balancing data democratization with comprehensive information governance: bui...
Balancing data democratization with comprehensive information governance: bui...
 
Achieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturingAchieving a 360 degree view of manufacturing
Achieving a 360 degree view of manufacturing
 
Intro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJIntro to Spark & Zeppelin - Crash Course - HS16SJ
Intro to Spark & Zeppelin - Crash Course - HS16SJ
 

Destaque

Sp apps for the end user 2013-08-15
Sp apps for the end user 2013-08-15Sp apps for the end user 2013-08-15
Sp apps for the end user 2013-08-15
RBC Wealth Management
 
Bridge the gap alerts
Bridge the gap   alertsBridge the gap   alerts
Bridge the gap alerts
aoconno2
 
BEST HARDWARE AND NETWORKING TRAINING
BEST HARDWARE AND NETWORKING TRAININGBEST HARDWARE AND NETWORKING TRAINING
BEST HARDWARE AND NETWORKING TRAINING
CMS Computer
 
My E-mail appears as spam - troubleshooting - domain name and E-mail content ...
My E-mail appears as spam - troubleshooting - domain name and E-mail content ...My E-mail appears as spam - troubleshooting - domain name and E-mail content ...
My E-mail appears as spam - troubleshooting - domain name and E-mail content ...
Eyal Doron
 

Destaque (20)

Data Analytics Governance and Ethics
Data Analytics Governance and EthicsData Analytics Governance and Ethics
Data Analytics Governance and Ethics
 
HPCC Systems Presentation to TDWI Chicago Chapter
HPCC Systems Presentation to TDWI Chicago ChapterHPCC Systems Presentation to TDWI Chicago Chapter
HPCC Systems Presentation to TDWI Chicago Chapter
 
HPCC Systems - Using Big Data to Help Feed the World
HPCC Systems - Using Big Data to Help Feed the WorldHPCC Systems - Using Big Data to Help Feed the World
HPCC Systems - Using Big Data to Help Feed the World
 
HPCC Presentation
HPCC PresentationHPCC Presentation
HPCC Presentation
 
Inox thuan thanh
Inox thuan thanhInox thuan thanh
Inox thuan thanh
 
Sp apps for the end user 2013-08-15
Sp apps for the end user 2013-08-15Sp apps for the end user 2013-08-15
Sp apps for the end user 2013-08-15
 
ORGANIZATIONS CHART
ORGANIZATIONS CHARTORGANIZATIONS CHART
ORGANIZATIONS CHART
 
How to simulate spoof e mail attack and bypass spf sender verification - 2#2
How to simulate spoof e mail attack and bypass spf sender verification - 2#2How to simulate spoof e mail attack and bypass spf sender verification - 2#2
How to simulate spoof e mail attack and bypass spf sender verification - 2#2
 
Matriz de datos
Matriz de datosMatriz de datos
Matriz de datos
 
Looking at INSPIRE from an Open Source obsessed SME
Looking at INSPIRE from an Open Source obsessed SMELooking at INSPIRE from an Open Source obsessed SME
Looking at INSPIRE from an Open Source obsessed SME
 
Bridge the gap alerts
Bridge the gap   alertsBridge the gap   alerts
Bridge the gap alerts
 
BEST HARDWARE AND NETWORKING TRAINING
BEST HARDWARE AND NETWORKING TRAININGBEST HARDWARE AND NETWORKING TRAINING
BEST HARDWARE AND NETWORKING TRAINING
 
Si spersonalizzante
Si spersonalizzanteSi spersonalizzante
Si spersonalizzante
 
My E-mail appears as spam - troubleshooting - domain name and E-mail content ...
My E-mail appears as spam - troubleshooting - domain name and E-mail content ...My E-mail appears as spam - troubleshooting - domain name and E-mail content ...
My E-mail appears as spam - troubleshooting - domain name and E-mail content ...
 
Hackathon - Mapping da National Core a INSPIRE (Hydrography)
Hackathon - Mapping da National Core a INSPIRE (Hydrography)Hackathon - Mapping da National Core a INSPIRE (Hydrography)
Hackathon - Mapping da National Core a INSPIRE (Hydrography)
 
10 Things To Look For In A GP
10 Things To Look For In A GP10 Things To Look For In A GP
10 Things To Look For In A GP
 
Connettivi per contraddire
Connettivi per contraddireConnettivi per contraddire
Connettivi per contraddire
 
Unleash the Mathematical Genius in You - Problem Solving Mastery
Unleash the Mathematical Genius in You - Problem Solving MasteryUnleash the Mathematical Genius in You - Problem Solving Mastery
Unleash the Mathematical Genius in You - Problem Solving Mastery
 
E Commerce
E CommerceE Commerce
E Commerce
 
Uu ite
Uu iteUu ite
Uu ite
 

Semelhante a HPCC Systems - Open source, Big Data Processing & Analytics

BCBS -By Ontology2
BCBS -By Ontology2BCBS -By Ontology2
BCBS -By Ontology2
bfreeman1987
 
Flow-ABriefExplanation
Flow-ABriefExplanationFlow-ABriefExplanation
Flow-ABriefExplanation
Anthony Parziale
 
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big HaystackBig Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Precisely
 
Square Pegs In Round Holes: Rethinking Data Availability in the Age of Automa...
Square Pegs In Round Holes: Rethinking Data Availability in the Age of Automa...Square Pegs In Round Holes: Rethinking Data Availability in the Age of Automa...
Square Pegs In Round Holes: Rethinking Data Availability in the Age of Automa...
Denodo
 
How to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudHow to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the Cloud
VMware Tanzu
 

Semelhante a HPCC Systems - Open source, Big Data Processing & Analytics (20)

Semantic business applications - case examples - Ontology Summit 2011
Semantic business applications - case examples - Ontology Summit 2011Semantic business applications - case examples - Ontology Summit 2011
Semantic business applications - case examples - Ontology Summit 2011
 
BCBS -By Ontology2
BCBS -By Ontology2BCBS -By Ontology2
BCBS -By Ontology2
 
Big Data Bootcamp 2017 - Atlanta - Flavio Villanustre
Big Data Bootcamp 2017 - Atlanta - Flavio VillanustreBig Data Bootcamp 2017 - Atlanta - Flavio Villanustre
Big Data Bootcamp 2017 - Atlanta - Flavio Villanustre
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_services
 
Semantic Applications for Financial Services
Semantic Applications for Financial ServicesSemantic Applications for Financial Services
Semantic Applications for Financial Services
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data Lake
 
Reasons Why Health Data is Poorly Integrated Today and What We Can Do About It
Reasons Why Health Data is Poorly Integrated Today and What We Can Do About ItReasons Why Health Data is Poorly Integrated Today and What We Can Do About It
Reasons Why Health Data is Poorly Integrated Today and What We Can Do About It
 
Streaming analytics
Streaming analyticsStreaming analytics
Streaming analytics
 
Stream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White PaperStream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White Paper
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Flow-ABriefExplanation
Flow-ABriefExplanationFlow-ABriefExplanation
Flow-ABriefExplanation
 
Semantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data LakeSemantic 'Radar' Steers Users to Insights in the Data Lake
Semantic 'Radar' Steers Users to Insights in the Data Lake
 
Taming the data beast
Taming the data beastTaming the data beast
Taming the data beast
 
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big HaystackBig Data Matching - How to Find Two Similar Needles in a Really Big Haystack
Big Data Matching - How to Find Two Similar Needles in a Really Big Haystack
 
Square Pegs In Round Holes: Rethinking Data Availability in the Age of Automa...
Square Pegs In Round Holes: Rethinking Data Availability in the Age of Automa...Square Pegs In Round Holes: Rethinking Data Availability in the Age of Automa...
Square Pegs In Round Holes: Rethinking Data Availability in the Age of Automa...
 
How to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudHow to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the Cloud
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)
 
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...Neoaug 2013 critical success factors for data quality management-chain-sys-co...
Neoaug 2013 critical success factors for data quality management-chain-sys-co...
 

Mais de HPCC Systems

Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
HPCC Systems
 

Mais de HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

HPCC Systems - Open source, Big Data Processing & Analytics

  • 1. HPCC Systems: See Through Patterns, Hidden Relationships and Networks to Find Opportunities in Big Data. Open Source, Big Data Processing and Analytics.
  • 3. LexisNexis leverages data about people, businesses, and assets to assess risk and opportunity associated with industry-specific problems.
      Data sources: Contributory Data, Public Records, Credit Headers, Phone Listings
      • Insurance: Assess underwriting risk and verify applicant data; prevent/investigate fraudulent claims; optimize policy administration
      • Financial Services: Prevent/investigate money laundering and comply with laws; prevent/investigate identity fraud
      • HPCC Systems technology: Open Source and commercial offerings of our leading Big Data platform
      • Receivables Management: Assist collections by locating delinquent debtors; assess collectability of debts and prioritize collection efforts
      • Legal: Locate/vet witnesses, assess assets/associations of parties in legal actions; perform diligence on prospective clients (KYC)
      • Government: Locate missing children/suspects; research/document cases; reduce entitlement fraud; accelerate revenue collection
      • Health Care: Verify patient identity, eligibility, and ability to pay; validate provider credentials; prevent/investigate fraud
  • 4. LexisNexis Risk Solutions works with Fortune 1000 and mid-market clients across all industries and both federal and state governments.
      • Customers in over 139 countries
      • 6 of the world’s top 10 banks
      • 100% of the top 50 U.S. banks
      • 80% of the Fortune 500 companies
      • 100% of U.S. P&C insurance carriers
      • All 50 U.S. states
      • 70% of local governments
      • 80% of U.S. Federal agencies
      • 97 of AM Law 100 firms
  • 6. LexisNexis Big Data Value Chain
      • Collection: structured, unstructured and semi-structured data from multiple sources
      • Ingestion: loading vast amounts of data onto a single data store
      • Discovery & Cleansing: understanding format and content; cleanup and formatting
      • Integration: linking, entity extraction, entity resolution, indexing and data fusion
      • Analysis: intelligence, statistics, predictive and text analytics, machine learning
      • Delivery: querying, visualization, real-time delivery with enterprise-class availability
  • 7. Sample Use Cases for HPCC Systems
      • Automate Identification: Automate identity identification and disambiguation for people, businesses, and assets.
      • Complex Queries: Run complex queries on transaction data (months, years, decades) to see past and present behavior.
      • Predict Behavior: Predict behavior and patterns leveraging current and historical data from a variety of data sources.
      Examples by vertical:
      • Financial Services: Analyze current and historical consumer transaction data from various sources to see fraud patterns or opportunities over a customer's life cycle.
      • Government: Manage and analyze the flow of people, businesses and assets from various data sources to monitor transactions and fraud rings.
      • Health Care/Insurance: Manage customer/patient transaction data on electronic medical records and claims, including medical information, medical procedures, prescriptions, etc.
      • Internet: Monitor social media outlets for trends and patterns. Monitor all the data on your network to guard against cyber security attacks.
      • Retail: Capture and analyze information from all branches of stores, locations, and departments to manage pricing, inventory, and distribution.
      • Telecommunications: Process customer data such as billions of call records, texts, streaming media and GPS history. Analyze customer churn, usage behavior patterns, and failures.
  • 8. HPCC Systems is an Open Source Platform for Big Data Processing
      • High Performance Computing Cluster (HPCC Systems) enables data integration on a scale not previously available and real-time answers to millions of users. Built for Big Data and proven for 10 years with enterprise customers.
      • ECL Parallel Programming Language, optimized for business-differentiating, data-intensive applications
      • Single architecture offers two data platforms (query and refinery) and a consistent data-intensive programming language (ECL)
  • 10. HPCC Systems Roadmap Solves the Talent & Work Crunch Problem: How HPCC Systems Helps Address the Talent Crunch
      Types of users, what they do, and the talent & work crunch issue for each:
      • Developers: Ingest data, write queries, create products. Developers want a faster, better way to solve problems. Hadoop requires many developers, which is expensive for firms.
      • Linkers: Link datasets. Hard for companies to find and hire. They need a better, faster way to understand how to link data for analysis.
      • Modelers: Discover attributes, create models and scoring logic. Hard for companies to find and hire. They need a better, faster way to model data so that they are spending their time on analysis, not programming models.
      • Data Scientists: Discover patterns, write algorithms. Hard and expensive for companies to find and hire. Need more time on analysis, not low-level data manipulation tasks.
      • Business Managers: Monitor business performance and customer needs and dynamics. They struggle with IT and data solutions to get the data and insight they need to make business decisions. Most IT and data solutions are not connected.
      • BI Analysts: Understand patterns, discover anomalies. Hard and expensive for companies to find and hire. Need more time on analysis, not low-level data manipulation tasks.
      High-level tools named on the slide: Smart View, SALT, KEL; HPCC Flow, KEL; HPCC Flow, EDA; HPCC Flow, Circuits, RAMPS; Dashboard; Scored Search, Dashboard
      Low-level tools named on the slide: ECL; KEL, SALT, ECL; N/A for the remaining user types
  • 11. Deep Dive on HPCC Systems Components
  • 16. Programming Language is called Enterprise Control Language (ECL)
      • Declarative programming language: Describe what needs to be done and not how to do it
      • Powerful: Unlike Java, high-level primitives such as JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available. Higher-level code means fewer programmers and shorter time to delivery
      • Extensible: As new attributes are defined, they become primitives that other programmers can use
      • Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with it
      • Maintainable: A high-level programming language, no side effects, and attribute encapsulation provide for more succinct, reliable and easier-to-troubleshoot code
      • Complete: ECL provides for a complete data programming paradigm
      • Homogeneous: One language to express data algorithms across the entire HPCC platform, including data ETL and high-speed data delivery
  • 17. Programming Language is called Enterprise Control Language (ECL)
      • ECL is a declarative, data-centric programming language which can be expressed concisely, parallelizes naturally, is free from side effects, and results in highly optimized executable code.
      • ECL is designed for a specific problem domain (data-intensive computing), which makes resulting programs clearer, more compact, and more expressive. ECL provides a more natural way to think about data processing problems for large distributed datasets.
      • Since ECL is declarative, execution is not determined by the order of the language statements, but by the sequence of dataflows and transformations represented by the language statements. The ECL compiler determines the optimum execution strategy and graph.
      • ECL incorporates transparent and implicit parallelism regardless of the size of the computing cluster and reduces the complexity of parallel programming, increasing the productivity of application developers.
      • The ECL compiler generates highly optimized C++ for execution.
      • ECL provides a comprehensive IDE and programming tools, including an Eclipse plugin.
      • ECL is provided with a large library of efficient modules to handle common data manipulation tasks.
  • 18. Scalable Automated Linking Technology (SALT) and LexID
  • 19. SALT (Scalable Automated Linking Technology) - Algorithms for Data Analysis and Linking
      • Data Ingest: Transform a base file into a rolling historic record of incoming records.
      • Data Profiling: Generate statistics for field lengths, word counts, population rate, etc.
      • Data Hygiene: Format fields correctly at the character level (i.e., capitals, punctuation, spelling errors, etc.)
      • Entity Resolution: Create 360° profiles on individuals, organizations, etc. through iteratively linking data from multiple sources
      • Relationship Extraction: Uncover hidden patterns within massive datasets to determine how entities relate to one another
  • 20. LexisNexis linking is statistically based (not rules-based), using the LexisNexis master databases for reference. Two examples where linkage based strictly on rules doesn’t work:
      1. LN Linking sufficiently explores all credible matches.
         Input: Record 1 is "Flavio Villanustre, Atlanta"; Record 2 is "Javio Villanustre, Atlanta".
         LN Linking: MATCH, because the system has learnt that “Villanustre” is specific: the frequency of occurrence is small and there is only one present in Atlanta.
         Rules (error): NO MATCH, because the rules determine that “Flavio” and “Javio” are not the same.
      2. LN Linking effectively minimizes false matches.
         Input: Record 1 is "John Smith, Atlanta"; Record 2 is "John Smith, Atlanta".
         LN Linking: NO MATCH, because the system has learnt that “John Smith” is not specific: the frequency of occurrence is large and there are many present in Atlanta.
         Rules (error): MATCH, because the rules determine that “John Smith” and the city match for both records.
  • 21. The secret sauce in our portfolio is LexID℠
      The fastest linking technology platform available, with results that help you make intelligent information connections. LexID℠ is the ingredient behind our products that turns disparate information into meaningful insights. This technology enables customers using our products to identify, link and organize information quickly with a high degree of accuracy.
      • Get a more complete picture: Make intelligent information connections beyond the obvious by drawing insights from both traditional and new sources of data.
      • Better results, faster: Use the fastest technology for processing large amounts of data to help you solve cases more quickly and confidently.
      • Protect private information: Keep customer SSNs and FEINs secure and enjoy peace of mind knowing you are taking steps to observe the highest levels of privacy and compliance.
  • 23. Case Study: Fraud in Medicaid
      Scenario: Proof of concept for the Office of the Medicaid Inspector General (OMIG) of a large Northeastern state. Social groups game the Medicaid system, which results in fraud and improper payments.
      Task: Given a large list of names and addresses, identify social clusters of Medicaid recipients living in expensive houses and driving expensive cars.
      Result: Interesting recipients were identified using asset variables, revealing hundreds of high-end automobiles and properties. Leveraging the Public Data Social Graph, large social groups of interesting recipients were identified along with links to provider networks. The analysis identified key individuals not in the data supplied, along with connections to suspicious volumes of “property flipping” potentially indicative of mortgage fraud and money laundering.
  • 24. Case Study: Fraud in Prescription Drugs
      Scenario: Healthcare insurers need better analytics to identify drug-seeking behavior and schemes that recruit members to use their membership fraudulently. Groups of people collude to source schedule drugs through multiple members to avoid being detected by rules-based systems. Providers recruit members to provide and escalate services that are not rendered.
      Task: Given a large set of prescriptions, calculate normal social distributions of each brand and detect where there is an unusual socialization of prescriptions and services.
      Result: The analysis detected social groups that are sourcing Vicodin and other schedule drugs. It identifies the prescribers and pharmacies involved to help the insurer focus investigations and intervene strategically to mitigate risk.
  • 25. Case Study: Insurance Score From 100 Days To 30 Minutes
      Scenario: One of the top 3 insurance providers was using Oracle analytics products on multiple statistical model platforms with disparate technologies.
      • The insurer issued a request to re-run all past reports for a customer: 11.5M reports since 2004
      • Using their current technology infrastructure, it would take 100 days to run these 11.5M reports
      Task: Migration of 75 different models, 74,000 lines of code and approximately 700 definitions to the HPCC.
      • Models were migrated in 1 man-month
      • Using a small development system (and only one developer), we ran 11.5 million reports in 66 minutes
      • Performance on a production-size system: 30 minutes
      • Testing demonstrated our ability to work in batch or online
      Result:
      • Reduced manual work, increased reliability, and created the capability to do new scores
      • Decreased development time from 1 year to several weeks; decreased run time from 100 days to 30 minutes
      • One HPCC (one infrastructure) translates into fewer people maintaining systems
  • 26. Case Study: Fraud & Collusion in Auto Insurance Claims
      Scenario: This view of carrier data shows seven known fraud claims and an additional linked claim. The insurance company data only finds a connection between two of the seven claims, and only identified one other claim as being weakly connected.
  • 27. Case Study: Fraud & Collusion in Auto Insurance Claims
      Task: After adding the LexID to the carrier data, LexisNexis HPCC technology then added 2 additional degrees of associations.
      Result: The results showed two family groups (labeled Family 1 and Family 2 in the diagram) interconnected on all of these seven claims. The links were much stronger than the carrier data previously supported.
  • 28. Case Study: Network Traffic Analysis in Seconds
      Scenario: Conventional network sensor and monitoring solutions are constrained by their inability to quickly ingest massive data volumes for analysis.
      • 15 minutes of network traffic can generate 4 terabytes of data, which can take 6 hours to process
      • 90 days of network traffic can add up to 300+ terabytes
      Task: Drill into all the data to see if any US government systems have communicated with any suspect systems of foreign organizations in the last 6 months.
      • In this scenario, we look specifically for traffic occurring at unusual hours of the day
      Result: In seconds, HPCC Systems sorted through months of network traffic to identify patterns and suspicious behavior.
  • 29. Contact Us: info@hpccsystems.com; US: 1.877.316.9669; Intl: 1.678.694.2200
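
The contrast drawn on slide 20 between rules-based and statistically based linking can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LexisNexis or SALT algorithm: the `Record` type, the two count tables, their numbers, and the `max_candidates` threshold are all hypothetical, invented to mirror the two examples on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    first: str
    last: str
    city: str


# Hypothetical reference counts (invented for illustration): how many distinct
# people in a city carry a given surname or a given full name.
SURNAME_COUNTS = {("villanustre", "atlanta"): 1, ("smith", "atlanta"): 9000}
FULL_NAME_COUNTS = {("john", "smith", "atlanta"): 350}


def rules_match(a: Record, b: Record) -> bool:
    """Rules-based linking: every field must be exactly equal."""
    return a == b


def shared_candidates(a: Record, b: Record) -> Optional[int]:
    """Count reference people consistent with the fields both records share.

    Returns None when surname or city disagree, because this sketch only
    compares records that agree on both. Names missing from the tables are
    assumed to be rare (count of 1), which is itself an assumption.
    """
    last, city = a.last.lower(), a.city.lower()
    if (last, city) != (b.last.lower(), b.city.lower()):
        return None
    if a.first.lower() == b.first.lower():
        return FULL_NAME_COUNTS.get((a.first.lower(), last, city), 1)
    return SURNAME_COUNTS.get((last, city), 1)


def statistical_match(a: Record, b: Record, max_candidates: int = 1) -> bool:
    """Link two records only if their shared fields point to a single person."""
    count = shared_candidates(a, b)
    return count is not None and count <= max_candidates


flavio = Record("Flavio", "Villanustre", "Atlanta")
javio = Record("Javio", "Villanustre", "Atlanta")
smith_1 = Record("John", "Smith", "Atlanta")
smith_2 = Record("John", "Smith", "Atlanta")

print(rules_match(flavio, javio), statistical_match(flavio, javio))
print(rules_match(smith_1, smith_2), statistical_match(smith_1, smith_2))
```

With these made-up counts, the rare surname links the two Villanustre records even though the first names differ, while the two identical but common "John Smith" records are left unlinked. A production system would weigh many more fields and use learned frequencies rather than a hard threshold.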

Editor's Notes

  1. Expanded text:
      • 11.6 billion consumer records
      • 6.5 billion unique name and address combinations and 30+ years of address history
      • 2.7 billion motor vehicle registrations
      • 2.7 billion property records
      • 302 million unique businesses
      • 227 million auto and home claim records; 99% of all U.S. auto claims
      • 12 million background checks and 3 million drug screen tests per year