Visualizing Relationships: Journalistic Problems in a Digital Age
2. Summary
1. Introduction
2. The Problem we are solving
3. Involved issues
4. Problems we found
5. The Challenge
© Copyright 2014. 3Pillar | All rights reserved Strictly Confidential
3. WHO ARE WE?
• Mariano Blejman is a technology editor and youth editor at the Argentine newspaper Página/12 and a co-founder of Hacks/Hackers Buenos Aires. @blejmanevel
• Marcos Vanetta is a biomedical engineer, a software developer at 3Pillar Global, and a hacker at Hacks/Hackers Buenos Aires. @malev
5. THE PROBLEM
• 1976: A dictatorship began in Argentina.
• 30,000 people were kidnapped and disappeared.
• 1985: The first trials took place in Argentina. They judged the bad guys, but the trials had to stop.
• 2003: The courts started judging the bad guys again.
• 2012: A large number of judicial documents. No one can read all of them.
6. INVOLVED ISSUES
• Semantic Analytics
• Ontology
• Data Mining
• Social Network Analysis
• Visualizations
Who was dealing with documents?
DocumentCloud, Overview, OpenCalais, NLTK, GATE
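To make the social network analysis item concrete, here is a minimal sketch of one common starting point: counting how often two names appear in the same document. The input hash, the placeholder names, and the co_occurrences helper are assumptions made for illustration; they are not taken from the project.

```ruby
# Hypothetical input: document id => names mentioned in that document.
mentions = {
  'doc-1' => ['Person A', 'Person B', 'Person C'],
  'doc-2' => ['Person A', 'Person B'],
  'doc-3' => ['Person C']
}

# Count, for every unordered pair of names, the documents that mention both.
# Each pair is stored in sorted order so (A, B) and (B, A) share one key.
def co_occurrences(mentions)
  counts = Hash.new(0)
  mentions.each_value do |names|
    names.uniq.sort.combination(2) { |pair| counts[pair] += 1 }
  end
  counts
end

edges = co_occurrences(mentions)

# Print the strongest links first; ties are broken alphabetically.
edges.sort_by { |pair, weight| [-weight, pair] }.each do |(a, b), weight|
  puts "#{a} -- #{b}: #{weight}"
end
```

The resulting weighted pairs can be read as the edges of a graph, which is the form most visualization tools expect.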
7. FIRST APPROACH
Read all the documents
Software solution based on regular expressions
Ruby, Padrino and MySQL database.
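require 'tmpdir'
require 'docsplit'

# Extract the plain text of a PDF with Docsplit (no OCR) and clean it.
def self.extract_plain_text(path)
  # File name without its last extension, e.g. "a.b.pdf" -> "a.b"
  basename = File.basename(path, '.*')
  tmp_dir = Dir.tmpdir
  Docsplit.extract_text(path, :output => tmp_dir, :ocr => false)
  text = File.read(File.join(tmp_dir, "#{basename}.txt"))
  clean_text(text)
end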
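Once the plain text is available, a regular-expression pass of the kind this slide mentions might look like the sketch below. The two patterns and the helper names are assumptions for illustration, not the project's actual rules; real judicial documents would need a much richer rule set.

```ruby
require 'date'

# Hypothetical pattern: two or more capitalized words in a row.
# \p{Lu} and \p{Ll} also match accented letters in UTF-8 text.
NAME_PATTERN = /(?<!\p{L})\p{Lu}\p{Ll}+(?:\s\p{Lu}\p{Ll}+)+/

# Hypothetical pattern: numeric dates written as day/month/year.
DATE_PATTERN = %r{\b(\d{1,2})/(\d{1,2})/(\d{4})\b}

# Return the distinct candidate names found in the text.
def extract_names(text)
  text.scan(NAME_PATTERN).uniq
end

# Return the calendar-valid dates found in the text as Date objects.
def extract_dates(text)
  text.scan(DATE_PATTERN)
      .map { |day, month, year| [year.to_i, month.to_i, day.to_i] }
      .select { |parts| Date.valid_date?(*parts) }
      .map { |parts| Date.new(*parts) }
end

sample = 'El 24/03/1976 Juan Perez fue visto con Maria Elena Gomez.'
puts extract_names(sample).inspect
puts extract_dates(sample).map(&:to_s).inspect
```

A capitalization rule like this over-matches (any run of capitalized words counts as a name), which is one reason the entity extraction and co-reference problems come up later in the deck.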
8. THE PROBLEMS WE FOUND
• Extracting text from PDF files
• Extracting entities from documents
• Parsing dates and addresses
• Co-reference resolution for names
• How to store relations
• Contextual information about documents
• Confidence in data from a crowdsourcing platform
Visualizing relationships over time
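Several of the problems above (storing relations, keeping the source document as context, tracking confidence, and looking at relationships over time) meet in the shape of the stored record. The following is a minimal in-memory sketch under assumed names: the Relation fields, the RelationStore class, and the placeholder data are all hypothetical and do not describe the project's actual schema.

```ruby
require 'date'

# Hypothetical record: two entities, a relation label, the period in which
# the relation held, the source document, and a confidence score in 0.0..1.0.
Relation = Struct.new(:subject, :predicate, :object, :from, :to, :source, :confidence)

class RelationStore
  def initialize
    @relations = []
  end

  # Add a relation and return the store so calls can be chained.
  def add(relation)
    @relations << relation
    self
  end

  # Relations that held on the given date and meet the confidence threshold.
  def active_on(date, min_confidence: 0.0)
    @relations.select do |r|
      r.from <= date && date <= r.to && r.confidence >= min_confidence
    end
  end
end

# Placeholder data, invented purely to exercise the sketch.
store = RelationStore.new
store.add(Relation.new('Person A', 'detained_at', 'Place X',
                       Date.new(1976, 5, 1), Date.new(1976, 9, 30), 'doc-001.pdf', 0.9))
store.add(Relation.new('Person B', 'worked_at', 'Place X',
                       Date.new(1977, 1, 1), Date.new(1978, 12, 31), 'doc-002.pdf', 0.4))

store.active_on(Date.new(1976, 6, 15)).each do |r|
  puts "#{r.subject} #{r.predicate} #{r.object} (#{r.source}, confidence #{r.confidence})"
end
```

Querying by date gives one slice of the network per point in time, which is the kind of input a timeline visualization needs; the confidence threshold lets low-trust crowdsourced entries be filtered out.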
9. WHAT DO WE HAVE NOW?
Prototype for a single (and local) use case: mapa76
Platform for different use cases: analice.me
14. THE #MOZFEST CHALLENGE
Find a big journalistic issue that involves:
• Lots of documents with unstructured data
• Lots of data to find inside them
• Relationships you want to find