Data Science for Social Good: #LegalNLP #AlgorithmicBias
https://www.linkedin.com/in/ponguru/
March 23 - 24, 2023
IIIT Una
Ponnurangam Kumaraguru (“PK”)
#ProfGiri CS IIIT Hyderabad
ACM Distinguished Member
TEDx Speaker
https://www.instagram.com/pk.profgiri/
Legal AI for Indian Context
District courts are usually the first
point of contact between the people
and the judiciary.
Lower courts in India are burdened
with a backlog of cases (~40 million
as of 2021).
Documents filed in district courts in India are written in local languages.
Fig: Hierarchy of the Indian judiciary (Supreme Court, High Courts, District Courts)
Legal AI / NLP - Data
We collected ~900k district court case documents from Uttar Pradesh.
All documents are in Hindi, written in the Devanagari script.
Legal corpora exist for the European Court of Justice and for Chinese courts, but none for Indian district courts.
Legal AI / NLP - Data
There are around 300 different case types; the table shows the prominent ones.
The majority of case documents correspond to bail applications.
Table: Case types in HLDC
Fig: Variation in the number of case documents per district
Legal AI / NLP - Bail Documents
Fig: District-wise ratio of bail applications to total cases
Legal AI / NLP - Bail Prediction Model
In general, performance is lower in district-wise settings, possibly due to large variation across districts.
Overall, summarization-based models perform better than Doc2Vec and simpler Transformer-based models.
Legal AI / NLP for Indian Context
HLDC: Hindi Legal Documents Corpus
Legal AI / NLP for Indian Context - Takeaways
Indian legal documents are a rich source of domain-specific Indic-language corpora, readily available online.
Multiple tasks still need attention, especially in Indian settings:
Legal summarization
Case recommendation
Citation prediction / citation networks
'Sleeping beauty' detection (cases that lie dormant and attract citations long after they are decided)
Bias
Objective
We highlight the propagation of learnt algorithmic biases in the bail prediction task for models trained on Hindi legal documents.
Motivation
Recent LegalNLP research targets judgement prediction and summarization.
Deploying these models without evaluating bias can lead to unwarranted outcomes and perpetuate unfair decision-making.
An evaluation and investigation of encoded biases helps to
understand historical social disparities
mitigate potential harms in the future
Data Preparation
Sample 10,000 cases from HLDC
36% bail granted, 63% bail denied
Fig: HLDC Snippet
Data Preparation
Use two fields from each case document:
facts-and-arguments
decision
Basic pre-processing: stop-word removal, cleaning using regex (a minimal sketch follows this list)
Each case is represented by 7 features:
5 keyword features of the case
2 crime-category features of the case
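A minimal pre-processing sketch in Python, assuming raw Devanagari case text; the tiny stop-word set below is a hypothetical placeholder for a full Hindi stop-word list, and the regex simply keeps Devanagari characters.

import re

# Hypothetical placeholder: a real pipeline would use a full Hindi stop-word list.
HINDI_STOPWORDS = {"और", "का", "की", "के", "में", "से", "है", "को"}

def clean(text):
    # Keep Devanagari characters and whitespace; drop digits, punctuation, Latin noise.
    text = re.sub(r"[^\u0900-\u097F\s]", " ", text)
    return [tok for tok in text.split() if tok not in HINDI_STOPWORDS]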
Data Preparation
Represent a case using keywords obtained via LDA (topic modelling); see the sketch after this list
Each case is assigned its top two topics
10 keywords represent each topic
3 keywords are taken from the dominant topic
2 keywords from the second-dominant topic
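A sketch of the keyword step using gensim's LDA, under stated assumptions: raw_documents is a hypothetical list of case texts, clean() is the pre-processing sketch above, and num_topics=20 is an illustrative choice not taken from the slides.

from gensim import corpora
from gensim.models import LdaModel

docs = [clean(d) for d in raw_documents]   # tokenized case texts
dictionary = corpora.Dictionary(docs)
bows = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus=bows, id2word=dictionary, num_topics=20, random_state=0)

def case_keywords(bow):
    # Two most probable topics for this case, dominant topic first
    # (assumes the model returns at least two topics for the document).
    top2 = sorted(lda.get_document_topics(bow), key=lambda t: -t[1])[:2]
    (t1, _), (t2, _) = top2
    # 3 keywords from the dominant topic + 2 from the second-dominant topic.
    return ([w for w, _ in lda.show_topic(t1, topn=3)]
            + [w for w, _ in lda.show_topic(t2, topn=2)])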
Model Training
Identify a subset of cases from the dataset using the theme
Sample cases having either a Hindu or a Muslim proper noun
Train a Decision Tree classifier (a minimal sketch follows)
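A minimal training sketch with scikit-learn; the column names, label encoding, and train/test split are illustrative assumptions, with the 7 categorical features (5 keywords + 2 crime categories) one-hot encoded before the tree.

from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# df: assumed DataFrame with one row per sampled case, columns kw1..kw5
# (keywords), crime1, crime2 (crime categories) and bail_granted
# (1 = granted, 0 = dismissed).
X = df[["kw1", "kw2", "kw3", "kw4", "kw5", "crime1", "crime2"]]
y = df["bail_granted"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                    DecisionTreeClassifier(random_state=0))
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))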
Model Training
For every case, we identify (see the counting sketch after this list):
the true label
the model's predicted label
the number of times the model's prediction changes when the proper noun is replaced with another Hindu proper noun
the number of times the model's prediction changes when the proper noun is replaced with another Muslim proper noun
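A counting sketch for the name-swap test; predict_case (which re-runs the keyword pipeline and classifier on a document) and the name lists are hypothetical stand-ins, not from the slides.

def count_flips(doc, original_name, replacement_names, predict_case):
    # How often the predicted bail label flips when the proper noun is swapped.
    base = predict_case(doc)                        # model's label on the original text
    flips = 0
    for name in replacement_names:
        swapped = doc.replace(original_name, name)  # replaces every occurrence
        if predict_case(swapped) != base:
            flips += 1
    return base, flips

# For a case mentioning one proper noun:
#   _, hindu_flips  = count_flips(doc, name, hindu_names,  predict_case)
#   _, muslim_flips = count_flips(doc, name, muslim_names, predict_case)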
Model Training
If the model changes its predictions from 0 (bail dismissed) to 1 (bail granted) more often when Muslim nouns are replaced by Hindu nouns than when Hindu nouns are replaced by Muslim nouns, then there exists a bias against Muslims (a worked sketch follows).
This bias may be due to inherent characteristics of the dataset.
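A worked sketch of this criterion; the flip counts below are illustrative numbers only, not results from the study.

# dismissed -> granted flips aggregated over the sampled cases (hypothetical):
flips_muslim_to_hindu = 120   # Muslim noun replaced by a Hindu noun
flips_hindu_to_muslim = 45    # Hindu noun replaced by a Muslim noun

asymmetry = flips_muslim_to_hindu - flips_hindu_to_muslim
if asymmetry > 0:
    print(f"{asymmetry} more favourable flips toward Hindu nouns: bias against Muslims")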
Ethical considerations
Results in no way indicate a bias in the judicial system of India (small dataset; many open-ended questions remain)
HLDC covers data from Uttar Pradesh only
Identifying de-biasing methods remains future work
Conclusions
Initial investigation into bias and fairness for Indian legal data
Highlights preferentially encoded stereotypes that models might pick up in downstream tasks like bail prediction
Need for algorithmic approaches to mitigate the bias learned by these models