My public presentation as delivered to the Public Interest Declassification Board (PIDB), which is trying to determine the best way to declassify and release over 400M classified documents.
15. Counting Is Difficult -- File 1: Mark Smith, 6/12/1978, 443-43-0000. File 2: Mark R Smith, (707) 433-0000, DL: 00001234.
16. The Rise and Fall of a Population -- Observations, Unique Identities, True Population
17. Data Triangulation -- File 1: Mark Smith, 6/12/1978, 443-43-0000. File 2: Mark R Smith, (707) 433-0000, DL: 00001234. New Record: Mark Randy Smith, 443-43-0000, DL: 00001234.
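The triangulation slide can be sketched in a few lines: File 1 and File 2 share no identifier, so they look like two different people, but a new record that shares the SSN with one file and the driver's license with the other ties all three together. This is an illustrative sketch only; the function name, field keys, and matching rule are assumptions, not any real entity-resolution product's API.

```python
# Hypothetical sketch of the "data triangulation" idea on the slide:
# a new record sharing one identifier with each of two existing records
# can tie them together into a single identity. All field names and
# values are illustrative, taken from the slide's example records.

def shared_identifiers(a, b):
    """Return the identifier fields (other than name) that two records share."""
    keys = ("ssn", "phone", "dl")
    return {k for k in keys if k in a and k in b and a[k] == b[k]}

file1 = {"name": "Mark Smith", "dob": "6/12/1978", "ssn": "443-43-0000"}
file2 = {"name": "Mark R Smith", "phone": "(707) 433-0000", "dl": "00001234"}

# File 1 and File 2 alone share nothing, so they count as two identities.
assert not shared_identifiers(file1, file2)

# The new record shares the SSN with File 1 and the DL with File 2,
# triangulating all three records into one identity.
new = {"name": "Mark Randy Smith", "ssn": "443-43-0000", "dl": "00001234"}
assert shared_identifiers(new, file1) == {"ssn"}
assert shared_identifiers(new, file2) == {"dl"}
```

The point of the example is the transitivity: neither pairwise comparison between the original files would ever merge them, so the observed count stays wrong until the third record arrives.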
39. Mass Declassification: What If? -- Jeff Jonas, IBM Distinguished Engineer; Chief Scientist, IBM Entity Analytics; [email_address]; September 23, 2010
Editor's Notes
Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet up. Remember, natural language is ambiguous, polysemous, and tacit, and its meaning is often highly contextual. Bottom line -- the computer needs to consider many possible meanings, attempting to find the inference paths that are most confidently supported by the data.

The primary computational principle supported by the DeepQA architecture is to assume and maintain multiple interpretations of the question, to generate many plausible answers or hypotheses, and to collect and process many different evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question means, what the content means, what the answer might be, or why it might be correct.

DeepQA is implemented as an extensible architecture and was designed from the outset to support interoperability across independently developed analytics. For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community and now an Apache project (http://uima.apache.org). Over 100 different algorithms, implemented as UIMA components, were developed, advanced, and integrated into this architecture to build Watson.

In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical or syntactic components. Other algorithms identify and tag specific semantic entities like names, places, or dates. In particular, the type of thing being asked for, if it is indicated at all, will be identified. We call this the LAT, or Lexical Answer Type -- like this "FISH", this "CHARACTER", or "COUNTRY". In Query Decomposition, different assumptions are made about if and how the question might be decomposed into sub-questions.
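The LAT idea described above can be illustrated with a deliberately naive sketch: pull out the noun that follows a determiner like "this" or "which" as the type being asked for. Watson's actual question analysis uses full parsing and many algorithms; the regex, function name, and example clues below are assumptions for illustration only.

```python
# Toy illustration of Lexical Answer Type (LAT) detection: find the
# type word the clue asks for ("this FISH", "which COUNTRY", ...).
# This single regex stands in for Watson's far richer parsing stack.
import re

def lexical_answer_type(question):
    """Return the noun following 'this'/'which'/'what' as the LAT, if any."""
    m = re.search(r"\b(?:this|which|what)\s+([a-z]+)", question.lower())
    return m.group(1).upper() if m else None

print(lexical_answer_type("This fish was thought extinct until 1938"))  # FISH
print(lexical_answer_type("Which country hosted the 1936 Olympics?"))   # COUNTRY
print(lexical_answer_type("Name the 16th U.S. president"))              # None
```

The third clue shows the case the notes mention: sometimes the answer type is not indicated at all, and the system must carry that ambiguity forward.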
The original question and each identified sub-part follow parallel paths through the system. In Hypothesis Generation, DeepQA performs a variety of very broad searches for each of several interpretations of the question. These searches are performed over a combination of unstructured data (natural language documents) and structured data (available knowledge bases). The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is not a lot of confidence in these possible answers, since little intelligence has been applied to understanding the content that might relate to the question. The focus is on generating a broad set of hypotheses -- or, for this application, what we call "Candidate Answers". To implement this step for Watson we used multiple open-source text and KB search components.

DeepQA acknowledges that resources are ultimately limited, and some parameterized judgment about which candidate answers are worth pursuing further must be made given constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and latency, DeepQA uses soft filtering -- lightweight algorithms judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, with a hard filter, candidates falling below the threshold would be eliminated from consideration entirely at this point.

In Hypothesis & Evidence Scoring, the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. These may, for example, include typing algorithms: algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step -- for example Country, Agent, Character, City, Slogan, Book, etc.
Many of these algorithms may fire, using different resources and techniques to come up with a score. What is the likelihood that "Washington", for example, refers to a "General", a "Capital", a "State", a "Mountain", a "Father", or a "Founder"? Evidence -- in this case, more documents, passages, and more structured facts -- is collected for the many candidate answers. Each piece of evidence is subjected to many independently developed algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer.

In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire, with varying levels of certainty. They apply methods for inferring a coherent final answer from the constituent elements derived from the question's sub-parts.

Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence, and each of these scored by many algorithms to produce hundreds of feature scores -- all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores. These models are trained with machine learning methods to predict, based on past performance, how best to combine all these scores to produce a final, single confidence number for each candidate answer and a final ranking of all candidates. The answer with the strongest confidence would be Watson's final answer, and Watson would try to buzz in provided that top answer's confidence was above a certain threshold.

-----------------------

The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broad contextual evidence and more credible inferences to support the most likely candidate answers.
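The merge-and-rank step can be sketched as a trained weighted combination of per-candidate feature scores squashed into a single confidence, with a buzz-in threshold on the winner. Everything concrete below -- the logistic form, the weights, the bias, the feature names, and the threshold -- is an assumption standing in for Watson's actual learned models.

```python
# Sketch of Final Merging and Ranking: combine many feature scores per
# candidate into one confidence via a (here, hand-set) logistic model,
# rank candidates, and buzz in only if the top confidence clears a bar.
import math

WEIGHTS = {"type_match": 2.0, "passage_support": 1.5, "popularity": 0.5}
BIAS = -2.0
BUZZ_THRESHOLD = 0.5  # Watson's threshold is learned; this is illustrative

def confidence(features):
    """Logistic combination of feature scores -> confidence in [0, 1]."""
    z = BIAS + sum(WEIGHTS[f] * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

candidates = {
    "Washington": {"type_match": 0.9, "passage_support": 0.8, "popularity": 0.7},
    "Lincoln":    {"type_match": 0.4, "passage_support": 0.3, "popularity": 0.9},
}
ranked = sorted(candidates, key=lambda a: confidence(candidates[a]), reverse=True)
top = ranked[0]
print(top, round(confidence(candidates[top]), 2))  # Washington 0.79
print("buzz in" if confidence(candidates[top]) > BUZZ_THRESHOLD else "stay silent")
```

In the real system the weights come from machine learning over past performance, and there are hundreds of features rather than three, but the shape of the computation -- weighted evidence, one confidence per candidate, threshold on the top answer -- is the same.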
All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence, and score evidence are loosely coupled but work holistically by virtue of DeepQA's pervasive machine-learning infrastructure. No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, and they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, a month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained, evaluated, and retrained.

DeepQA is a complex system architecture designed to extend incrementally, in both data and algorithms, to deal with the challenges of natural language processing applications and to adapt to new domains of knowledge. The Jeopardy! Challenge greatly inspired its design and implementation for the Watson system.

-David A. Ferrucci