Text-Mining:
Big Data Analytics for Unstructured Data
Prof dr ir Jan C. Scholtes
https://textmining.nu
Exploratory Search
Text Mining
Text Mining: The next step in Search Technology
Finding without knowing exactly what you're looking for, or finding what apparently isn't there (or those who do not want to be found …).
Search
• Inverted file index
• Relevance ranking
• Relevance feedback
• Faceted search
• Incomplete matching
• Index compression
• Precision & Recall
Information Extraction
• Entity extraction
• Fact, Event & Concept extraction
• Negations, co-reference resolution
• Grammars
• Statistical methods: Hidden Markov Models, Maximum Entropy Models, Conditional Random Fields, …
• Data normalization (Ontology matching)
Link Analysis & Data Visualization
• Social network analysis
• Community Detection
• Different types of visualization for temporal, geographical, semantic or relational mappings
• Anomaly Detection
Machine Learning
• Decision Tree
• Bayes Classifiers
• Rocchio
• k-NN
• Support Vector Machines
• Clustering
• CNN
• LSTM
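The Search techniques on this slide include an inverted file index and Precision & Recall. Below is a minimal Python sketch of both, assuming naive tokenization and plain Boolean AND matching (no relevance ranking, feedback, faceting or compression). The function names are illustrative and not taken from any product.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and treat every non-alphanumeric character as a separator.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # The "inverted file": map each term to the ids of the documents containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def boolean_and(index: dict[str, set[str]], query: str) -> set[str]:
    # Intersect the posting sets of all query terms; an unknown term yields no hits.
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    # Precision: share of retrieved documents that are relevant.
    # Recall: share of relevant documents that were retrieved.
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if only documents d1 and d2 mention kickbacks, a query for "kickbacks" retrieves {d1, d2}; if the truly relevant set is {d1, d3}, precision and recall are both 0.5.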
Language_Name: English
CITY: New Brunswick, WASHINGTON
COMPANY: J&J, Johnson & Johnson
COUNTRY: Greece, Poland, Romania, United Kingdom
CURRENCY: .02 USD, 21400000 USD, 48600000 USD, 59.47 USD, 70000000 USD
DATE: 04-08
DAY: Fri, Friday
NOUN_GROUP: biotech drugs, bribery case, denying guilt, final growth frontier, foreign countries, giving gifts, holding corporations, intense revenue pressure, meaningful credit, medical device kickbacks, medical devices, multiple businesses, next several days, non-U.S. markets, only way, orthopedic hips, other countries, over-the-counter medicines, paid kickbacks, past year, paying kickbacks, same time, several new positions, similar violations, travel gifts
ORGANIZATION: Department of Justice, Justice Department, SEC, Securities and Exchange Commission, University of Michigan
PEOPLES: Iraqi
PERSON: Erik Gordon, Mythili Raman, William Weldon
PLACE_REGION: Europe
PRODUCT: Benadryl, Tylenol
PROP_MISC: Band-Aids, Food Program, Foreign Corrupt Practices Act, United Nations Oil
STATE: N.J.
TIME: 1:32 pm ET
TIME_PERIOD: 13 years, five years, six months, three years
YEAR: 2007
Problem
• "We went to the government to report improper payments and have taken full responsibility for these actions," said William Weldon, Chairman and CEO of J&J.
• Last month federal health regulators took legal control of the plant where millions of bottles of defective medication were produced.
• The charges against J&J were brought under the Foreign Corrupt Practices Act, which bars publicly traded companies from bribing officials in other countries to get or retain business.
• The company will pay $21.4 million in criminal penalties for improper payments and return $48.6 million in illegal profits, according to the government.
• The SEC says J&J agents used fake contracts and sham companies to deliver the bribes.
Sentiment
• giving meaningful credit to companies that self-report
• We are committed to holding corporations accountable for bribing foreign officials
• what is honest
Request
• make sure it complies with anti-bribery laws across its businesses
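In the extraction output above, an amount such as "$21.4 million" in the article appears as "21400000 USD", and each entity type lists its values deduplicated and sorted. A minimal Python sketch of both steps follows, assuming a simple regular expression for dollar amounts and (type, text) mentions produced by some upstream extractor. The pattern and function names are hypothetical; this is not how the tool on the slide works internally.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Hypothetical pattern for dollar amounts such as "$21.4 million", "$59.47" or "$.02".
AMOUNT = re.compile(r"\$(\d+(?:\.\d+)?|\.\d+)(?:\s+(million|billion))?", re.IGNORECASE)
SCALE = {"million": 10**6, "billion": 10**9}


def normalize_currency(text: str) -> list[str]:
    # Rewrite every dollar amount in the text as a canonical "<number> USD" string.
    results = []
    for number, unit in AMOUNT.findall(text):
        # Decimal avoids binary floating-point rounding when scaling "21.4" by a million.
        value = Decimal(number) * SCALE.get(unit.lower(), 1)
        if value == value.to_integral_value():
            results.append(f"{int(value)} USD")  # whole amount: drop the fraction
        else:
            results.append(f"{value} USD")  # keep cents, e.g. "59.47 USD"
    return results


def group_entities(mentions: list[tuple[str, str]]) -> dict[str, list[str]]:
    # Collect mentions per entity type, drop duplicates, and sort the values.
    grouped: dict[str, set[str]] = defaultdict(set)
    for entity_type, surface in mentions:
        grouped[entity_type].add(surface)
    return {etype: sorted(values) for etype, values in sorted(grouped.items())}
```

Applied to the sentence about the $21.4 million penalty and the $48.6 million in illegal profits, normalize_currency returns the two normalized amounts in the order they occur.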
WHAT happened?
WHO
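The WHO slide shows only its title. One common way to answer "who is involved" is to build a communication graph from sender/recipient pairs and rank people by the number of distinct contacts (degree centrality). This is assumed here for illustration. The sketch below uses made-up names and plain dictionaries instead of a graph library.

```python
from collections import defaultdict


def build_graph(messages: list[tuple[str, list[str]]]) -> dict[str, set[str]]:
    # Undirected graph: each message links its sender to every recipient.
    graph: dict[str, set[str]] = defaultdict(set)
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:
                graph[sender].add(recipient)
                graph[recipient].add(sender)
    return dict(graph)


def degree_ranking(graph: dict[str, set[str]]) -> list[tuple[str, int]]:
    # Rank people by number of distinct contacts; ties are broken alphabetically.
    return sorted(((person, len(contacts)) for person, contacts in graph.items()),
                  key=lambda item: (-item[1], item[0]))
```

Community detection would go one step further and look for densely connected groups within this graph; degree ranking only surfaces the most connected individuals.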
WHAT-WHEN: Topic Rivers
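A topic river plots how strongly each topic is present in each time period. The sketch below computes the underlying per-month counts. It assumes each topic is approximated by a hand-picked keyword set (a real system would more likely use a topic model) and that documents are bucketed by calendar month. Topic names and keywords are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def topic_stream(docs: list[tuple[date, str]],
                 topics: dict[str, set[str]]) -> dict[str, Counter]:
    # For every month ("YYYY-MM"), count how many documents mention each topic,
    # where a topic counts as mentioned if any of its keywords occurs in the text.
    stream: dict[str, Counter] = defaultdict(Counter)
    for day, text in docs:
        words = set(text.lower().split())
        bucket = stream[day.strftime("%Y-%m")]  # create the month even if no topic matches
        for topic, keywords in topics.items():
            if words & keywords:
                bucket[topic] += 1
    # "YYYY-MM" strings sort chronologically; each Counter is one slice of the river.
    return dict(sorted(stream.items()))
```

Stacking these counts per month, one band per topic, gives the river shape.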
WHY & WHO: Emotion Detection
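The slide does not say how emotions are detected. A minimal lexicon-based sketch is shown below, assuming a tiny hand-made word list. The entries are hypothetical; production systems use curated lexicons or trained classifiers. The function counts emotion-bearing words in a sentence and returns the dominant emotion, or None when no word matches.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical mini-lexicon mapping words to emotions.
LEXICON = {
    "angry": "anger", "furious": "anger", "outraged": "anger",
    "afraid": "fear", "worried": "fear", "scared": "fear",
    "happy": "joy", "pleased": "joy", "relieved": "joy",
}


def detect_emotion(sentence: str) -> Optional[str]:
    # Count lexicon hits per emotion and return the most frequent one.
    words = re.findall(r"[a-z']+", sentence.lower())
    counts = Counter(LEXICON[w] for w in words if w in LEXICON)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Attaching the result to the sentence's speaker or author (the WHO part) would require the entity extraction shown earlier.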
Anomaly Detection
Σ(Φ)
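The slide does not specify the anomaly-detection method. A simple baseline, assumed here for illustration, is a z-score on a per-period count, for example the number of messages a custodian sends per day. Periods whose value lies more than a threshold number of standard deviations from the mean are flagged.

```python
from statistics import mean, pstdev


def zscore_anomalies(values: list[float], threshold: float = 3.0) -> list[int]:
    # Return indices whose value lies more than `threshold` population
    # standard deviations away from the mean of the series.
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

With only n values, a population z-score can never exceed √(n−1). For a seven-day series such as [10, 12, 11, 9, 10, 11, 95], the default threshold of 3 therefore flags nothing, and a lower threshold such as 2 is needed to flag the last day.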
Text Mining the Lord of the Rings
• Automatic identification of key players (custodians)
• Automatic identification of locations
• Automatic identification of travel patterns of key players
• Visualize in time
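The travel-pattern idea can be sketched with a few assumptions. Person and place names are already known from a hypothetical gazetteer, and appearing in the same chapter links a person to a place. The chapter snippets below are invented for illustration and are not quotes from the book. Matching uses plain substrings, which a real system would replace with entity extraction and co-reference resolution.

```python
def travel_patterns(chapters: list[str], people: set[str],
                    places: set[str]) -> dict[str, list[str]]:
    # For each person, list the places mentioned in the chapters where that
    # person appears, in reading order, skipping immediate repeats.
    routes: dict[str, list[str]] = {person: [] for person in sorted(people)}
    for text in chapters:
        present = [p for p in sorted(people) if p in text]
        # Order the places by their first mention within the chapter.
        mentioned = sorted((text.index(place), place) for place in places if place in text)
        for person in present:
            for _, place in mentioned:
                if not routes[person] or routes[person][-1] != place:
                    routes[person].append(place)
    return routes
```

Plotting each route against chapter number gives the "visualize in time" view.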
Big Data Analytics and the Law
• Memory Consistency
• 24/7
• Speed & Scalability
• Search
• M&A and Restructuring
• Data Collection
• Analytics
• eDiscovery, Regulatory Requests, Investigations, Fact-Finding Missions
• Reporting
• Archiving
• Knowledge Management
• Production
ZyLAB is used as the e-Discovery & e-Disclosure standard for all United Nations-backed War Crimes Tribunals and ongoing UN courts.
eDiscovery
• FOIA (WOB)
• Audits & Internal Investigations
• Litigation
• Arbitration
• Answering Regulatory Requests
• Subject Access Requests
• Right to be Forgotten
• 3x more relevant documents found than with Boolean search
• No complex queries: just review documents
• Reviewing 2x the total number of relevant documents is all that is needed
• Accurately estimate, at the end, the percentage of all relevant documents found
Teach the computer what to look for …
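The slide describes learning from reviewer decisions instead of writing Boolean queries. The sketch below assumes bag-of-words vectors and a Rocchio-style relevance profile. Rocchio is one of the techniques on the overview slide, but the slide does not state which model the product uses. The sketch also includes a point estimate of recall from a random sample of the unreviewed documents. All function names, weights and data are illustrative.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    # Bag-of-words term counts; no stemming or stop-word removal.
    return Counter(text.lower().split())


def rocchio_profile(relevant: list[str], non_relevant: list[str],
                    beta: float = 0.75, gamma: float = 0.15) -> dict[str, float]:
    # Rocchio feedback without an initial query:
    # beta * centroid(relevant) - gamma * centroid(non_relevant).
    profile: dict[str, float] = {}
    for docs, weight in ((relevant, beta), (non_relevant, -gamma)):
        for text in docs:
            for term, count in vectorize(text).items():
                profile[term] = profile.get(term, 0.0) + weight * count / len(docs)
    return profile


def cosine(profile: dict[str, float], vec: Counter) -> float:
    # Cosine similarity between the profile and a document's term-count vector.
    dot = sum(weight * vec[term] for term, weight in profile.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_v = math.sqrt(sum(c * c for c in vec.values()))
    return dot / (norm_p * norm_v) if norm_p and norm_v else 0.0


def rank(profile: dict[str, float], unreviewed: list[str]) -> list[str]:
    # Order unreviewed documents so the most likely relevant ones are reviewed first.
    return sorted(unreviewed, key=lambda text: cosine(profile, vectorize(text)), reverse=True)


def estimate_recall(found_relevant: int, unreviewed_count: int,
                    sample_relevant: int, sample_size: int) -> float:
    # Extrapolate the relevant documents left in the unreviewed set from a random
    # sample of it, then recall = found / (found + estimated missed).
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    missed = unreviewed_count * sample_relevant / sample_size
    total = found_relevant + missed
    return found_relevant / total if total else 0.0
```

For example, if review has found 900 relevant documents, 10,000 remain unreviewed, and 5 of a 500-document random sample from them are relevant, about 100 relevant documents were missed and estimated recall is 0.9. A production workflow would also report a confidence interval around this point estimate.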
CCPA
GDPR & AVG: redaction, anonymization, …
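Redaction and pseudonymization can be sketched as pattern-based replacement. This minimal sketch assumes a known list of person names and a simple e-mail regular expression. Real anonymization needs entity extraction and human review. Each name receives a consistent placeholder so that relationships between documents remain analyzable, and e-mail addresses are blacked out. The placeholder format is hypothetical.

```python
import re

# Simple e-mail pattern; it will miss unusual but valid addresses.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def anonymize(text: str, names: list[str]) -> str:
    # Replace e-mail addresses with a fixed redaction marker.
    text = EMAIL.sub("[REDACTED]", text)
    # Replace each known name with a stable pseudonym (PERSON_1, PERSON_2, ...).
    # Longer names go first so "Erik Gordon" is handled before "Erik".
    for i, name in enumerate(sorted(set(names), key=lambda n: (-len(n), n)), start=1):
        text = re.sub(rf"\b{re.escape(name)}\b", f"PERSON_{i}", text)
    return text
```

Keeping a separate, access-controlled mapping from placeholders back to names would make this pseudonymization rather than irreversible anonymization.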
How does that work?
• Search
• Pattern Recognition
• Text-Mining
Thank you!
Time for Q&A
Prof dr ir Jan C. Scholtes
https://www.linkedin.com/in/jscholtes/
https://textmining.nu

Text mining (Scholtes), Big Data Congress Utrecht 2019
