The objective of this research project is to develop a software toolset that mines a large set of unstructured text archives of human rights abuses. The software tool is designed to discover stories of hidden human rights victims and unidentified perpetrators. These stories do not exist in one document but as fragments of text embedded across multiple documents. Thus, these stories can be identified only when reading across a large number of related documents. The current approach of manually reading to identify such stories is incredibly tedious, time-consuming, unsystematic, and error-prone. Human readers find it difficult to correlate the identity of victims, perpetrators, and details of abuse that reside across multiple documents. Thus, the success of this project has significant implications for the human rights community, as currently there is a lack of adequate tool support for automatically reading and identifying stories from large-scale unstructured text document sets.
Digging into Human Rights Documents
By Karthikeyan Umapathy, Associate Professor, School of Computing, University of North Florida.
Digging into Human Rights Violations
Human rights corpora contain reports in which survivors
and witnesses of human rights violations describe
and relate information about a traumatic event.
This is problematic because it creates multiple accounts of a
single event: each person recalls the
incidents and details of the event differently,
resulting in narrations that do not always line up
with one another.
Despite these variations, scholars need to create a
narrative history that makes up a collective memory
and shapes cultural identity regarding the traumatic
event.
Currently, humanities scholars perform co-reference
analysis manually using traditional methods such as
qualitative coding and string matching.
Humanities scholars need analytical tools to dig into
text corpora and extract relevant information.
Unidentified victims, perpetrators, and
other details of human rights violations are
camouflaged by the scale of archival
records of witness reports.
Human Rights Violations are acts typically
deemed as crimes against humanity.
Examples of such acts include genocide,
torture, slavery, rape, enforced sterilization
or medical experimentation, and deliberate
starvation.
The most severe violations are committed by
state (or non-state) actors when they abuse,
ignore, or deny basic human rights through
wars of aggression, war crimes, and crimes
against humanity.
Human Rights Violation Data
Lord’s Resistance Army NGO reports and statements (~3,000 reports and documents)
Extraordinary Chambers in the Courts of Cambodia (2,672 court documents and transcripts)
South African Truth and Reconciliation Commission (2,004 court documents and transcripts)
World Trade Center Task Force Interviews (503 interviews)
Bosnian Historical Memories (84 life stories)
into Human Rights Documents
Project 1:
Context
The objective of this research is to develop a software toolset that mines a large set of unstructured text
archives of human rights abuses. The software tool is designed to discover stories of hidden human rights
victims and unidentified perpetrators. These stories do not exist in one document, but as fragments of text
embedded across multiple documents. We developed a framework to mine human rights corpora and
construct narratives of abuses by identifying cross-document co-reference of violation event patterns.
Problem
Team: This project was an international effort by
academic and industry team members: Ben Miller (Georgia State
University, US), Karthikeyan Umapathy (University of North
Florida, US), and Lu Xiao (Western University, Canada).
Solution
[Figure: processing pipeline diagram. Components shown, in extraction order: Human Rights Data Corpus; Preprocessing; Entity Recognizer and Anaphora Resolution; Time Tagger; Event Extractor; Person, Location, Time, Sequence of Events; Story Timeline; Fuzzy Cluster; Anaphora Resolution; Co-reference Logic Based on Event Types; Database; Similarity Scores; Visualizations; Constructed Storylines]
Team: Joshua Joiner (Master's Thesis Student) and
Karthikeyan Umapathy.
Developed a Natural Language Processing system that
facilitates cross-document co-reference of traumatic
events from a corpus containing a collection of witness
statements and interviews.
Reconstructed stories of hidden victims and unidentified
perpetrators from text fragments scattered across a
large collection of related documents.
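
As a rough illustration of how cross-document co-reference over extracted events might work, the sketch below compares (person, location, time) records taken from different documents and reports pairs whose similarity score passes a threshold. It is not the project's implementation: the `Event` record, the character-level similarity from Python's `difflib`, the unweighted average, the 0.8 threshold, and the sample records are all assumptions made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Event:
    """One extracted event mention (hypothetical record layout)."""
    doc_id: str
    person: str
    location: str
    time: str


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def event_similarity(x: Event, y: Event) -> float:
    """Unweighted mean of person, location, and time similarity (an assumption)."""
    scores = [
        field_similarity(x.person, y.person),
        field_similarity(x.location, y.location),
        field_similarity(x.time, y.time),
    ]
    return sum(scores) / len(scores)


def coreferent_pairs(events, threshold=0.8):
    """Return pairs of events from different documents that score at or above threshold."""
    pairs = []
    for i, x in enumerate(events):
        for y in events[i + 1:]:
            if x.doc_id != y.doc_id and event_similarity(x, y) >= threshold:
                pairs.append((x, y))
    return pairs


# Made-up sample records, not drawn from any of the corpora listed on this poster.
events = [
    Event("doc_a", "John Smith", "Village A", "1995-07-11"),
    Event("doc_b", "Jon Smith", "village A", "1995-07-11"),
    Event("doc_c", "Mary Jones", "Town B", "1992-03-02"),
]
pairs = coreferent_pairs(events)
for x, y in pairs:
    print(x.doc_id, "<->", y.doc_id, round(event_similarity(x, y), 2))
```

Here the two spellings of the same name in `doc_a` and `doc_b` are linked, while the unrelated record in `doc_c` is not. A real system would compare normalized dates and resolved entities rather than raw strings, and would cluster events instead of listing pairs.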
Project 2: Automating CIRI Ratings of Human Rights Reports Using GATE
This project involves parsing human rights reports produced by the U.S. Government and rating the
human rights practices of various countries. The U.S. Human Rights Reports are annual reports that cover
internationally recognized human rights practices with regard to individual, civil, political, and worker rights.
F-measure scores in the above table show how accurately
the automated system assigned ratings.
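
For readers unfamiliar with the metric, the sketch below computes a per-rating F-measure by comparing automated ratings against manually coded ones. The poster does not state exactly how its scores were calculated, so the standard precision/recall definition is assumed here; the function name `f_measure` and the sample ratings are made up for illustration.

```python
def f_measure(gold, predicted, label, beta=1.0):
    """F-measure for one rating value, treating `gold` as the manual coding."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0

    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical ratings for six country reports (values are illustrative only).
manual_ratings = [2, 2, 1, 0, 1, 2]
automated_ratings = [2, 1, 1, 0, 1, 2]

for rating in sorted(set(manual_ratings)):
    score = f_measure(manual_ratings, automated_ratings, rating)
    print(f"rating {rating}: F1 = {score:.2f}")
```

With `beta=1.0` precision and recall are weighted equally; a larger `beta` favors recall.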
Project Objective
CIRI coders rely on a manual process of reading
through the Human Rights Reports and then
applying ratings to each human rights practice
for each country.
The objective of this project is to automate
the process of scouring the human rights
country reports.
Generating CIRI Ratings Using the GATE Text Mining Tool
GATE is an open source text mining platform used
for developing custom text processing solutions.
CIRI Ratings Comparison
Denmark Empowerment Rights
The CIRI (Cingranelli-Richards) Human Rights Data
Project rates the human rights practices of countries
based on the U.S. Human Rights country reports. Students,
scholars, policymakers, and analysts use the CIRI
ratings for practical and research purposes.