knowAAN final presentation
1. Project group knowAAN
Final presentation
Adrian Wilke
info[REMOVE]@adrianwilke.de
Computer Science Education Group
University of Paderborn
October 20th 2011
2. Overview
Introduction
System components & Workflow
Demonstration
Development process
Summary & Outlook
Time for further, more detailed questions
PG knowAAN 2
3. Overview
Overview: First part
Goals
Extraction & Storage (of data)
Exploration (of data)
System components & Workflow
Analysis & Visualization (of data)
4. Goals
Explore research networks
Based on: Artifacts (scientific publications) and metadata
Combination and analysis of data
Computation of similarities of full texts
Support for conference management system Ginkgo
Data visualization
Recommendations
(Source: PG knowAAN project description)
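The goal "Computation of similarities of full texts" can be illustrated with a minimal sketch. The slides do not state which similarity measure the project uses, so the example below assumes one common choice: cosine similarity over bag-of-words term counts. The function name `cosine_similarity` and the whitespace tokenization are illustrative assumptions, not part of the knowAAN system.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts (assumed measure, not confirmed by the slides)."""
    # Bag-of-words term counts; whitespace tokenization is a simplification.
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product over the terms that occur in both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity("research networks of authors",
                            "networks of scientific authors"))
```

Identical texts score close to 1, texts without shared terms score 0. A real pipeline would probably compare lemmatized nouns rather than raw tokens.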
5. Goals
Imagine you are interested in a conference.
You have downloaded its papers from the last 2 or 3 years.
Now you have nearly 100 publications.
How do you explore them?
100 publications. Do you know any tools for this?
14. Sequence diagram (calls and return values)
Participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB
Sequence a
a / 1) .addPDF
a / 2) .writeToFS -> Path
a / 3) .createThread, .submitThread
Sequence b
b / 1) .run
b / 2) .getText -> Text
b / 3) .ParseFullText -> ParscitXML
b / 4) .extractBodyAndAstract -> BodyAndAbstract
b / 5) .getLanguage -> LanguageString
b / 6) .lemmatize -> LemmatizedText
b / 7) .extractNouns -> NounsList
b / 8) .lemmatizeNounslist -> LemmatizedNouns
b / 9) .ReduceToTopNouns -> TopNouns
b / 10) .writeToFiles -> Paths
b / 11) .addTexts -> Solrid
b / 12) .addPublication
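Two of the steps above can be sketched in code: handing work to a background thread (a / 3) and reducing the noun list to the top nouns (b / 9). This is a minimal sketch under assumptions. The slides do not say how top nouns are selected, so frequency ranking is assumed here, and the names `reduce_to_top_nouns`, `run`, `add_pdf` and the `limit` parameter are hypothetical stand-ins, not the project's actual API. All other stages (text extraction, ParsCit, language detection, lemmatization, Solr, DB) are omitted.

```python
from collections import Counter
from concurrent.futures import Future, ThreadPoolExecutor


def reduce_to_top_nouns(lemmatized_nouns: list[str], limit: int = 3) -> list[str]:
    """Keep the `limit` most frequent nouns (assumed meaning of step b / 9)."""
    return [noun for noun, _ in Counter(lemmatized_nouns).most_common(limit)]


def run(lemmatized_nouns: list[str]) -> list[str]:
    """Stand-in for the worker of step b / 1; earlier stages are omitted."""
    return reduce_to_top_nouns(lemmatized_nouns)


def add_pdf(executor: ThreadPoolExecutor, lemmatized_nouns: list[str]) -> Future:
    """Stand-in for step a / 3: submit the work and return immediately."""
    return executor.submit(run, lemmatized_nouns)


if __name__ == "__main__":
    nouns = ["network", "paper", "network", "author", "paper", "network"]
    with ThreadPoolExecutor(max_workers=2) as executor:
        future = add_pdf(executor, nouns)
        print(future.result())
```

Submitting the work to an executor lets the caller return right after the PDF is stored, while the slower processing steps run in the background.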
22. Development process
Methods of agile software development
Weekly meetings
Sit together (as much as possible)
Automated building system
Continuous integration
Issue tracking
23. Summary and Outlook
Summary and future work
Summary
Integrated processing of scientific papers
Aggregated visualization of authors, publications and events
Computation of various analyses over the data
Cleaning functionality for automatically processed data
Future work
Parallelized Clustering
Additional graphical visualization
Improve extraction of metadata from PDF files