Data Science for Social Good: #LegalNLP #AlgorithmicBias

IIIT Hyderabad
25 Mar 2023
  1. Data Science for Social Good: #LegalNLP #AlgorithmicBias. March 23 - 24, 2023, IIIT Una. Ponnurangam Kumaraguru ("PK"), #ProfGiri, CS, IIIT Hyderabad; ACM Distinguished Member; TEDx Speaker. https://www.linkedin.com/in/ponguru/ https://www.instagram.com/pk.profgiri/
  6. What is Social Computing? 6 https://en.wikipedia.org/wiki/Social_computing
  8. Legal AI for the Indian Context: District courts are usually the first point of contact between the people and the judiciary. Lower courts in India are burdened with a backlog of cases (~40 million as of 2021). Documents filed in district courts in India are in local languages. 8 Fig: court hierarchy (Supreme Court → High Courts → District Courts)
  9. Legal AI / NLP - Data: We collected ~900k district court case documents from Uttar Pradesh. All documents are in Hindi, written in Devanagari. Legal corpora exist for the European Court of Justice and for Chinese courts, but none for Indian district courts. 9
  10. Legal AI / NLP - Data: There are around 300 different case types; the table shows the prominent ones. The majority of case documents correspond to bail applications. 10 Fig: variation in the number of case documents per district; case types in HLDC
  11. Legal AI / NLP - Bail Documents 11 Fig: district-wise ratio of the number of bail applications to total cases
  12. Legal AI / NLP - Bail Prediction Model 12 In general, performance is lower in district-wise settings, possibly due to the large variation across districts. Overall, summarization-based models perform better than Doc2Vec and simpler Transformer-based models.
  13. Legal AI / NLP for Indian Context 13 HLDC: Hindi Legal Documents Corpus
  14. Legal AI / NLP for the Indian Context - Takeaways: Indian legal documents are a rich source of domain-specific Indic-language corpora, readily available online. Multiple tasks still need attention, especially in Indian settings: legal summarization, case recommendation, citation prediction / citation networks, sleeping-beauty analysis, and bias. 14
  15. Are Models Trained on Indian Legal Data Fair?
  16. An initial investigation of fairness from the Indian perspective in the legal domain 1 Overview
  17. We highlight the propagation of learnt algorithmic biases in the bail prediction task for models trained on Hindi legal documents. 2 Objective
  19. Recent LegalNLP research targets judgement prediction and summarization. Deployment without an evaluation of bias can lead to unwarranted outcomes and perpetuate unfair decision-making. An evaluation and investigation of encoded biases helps us understand historical social disparities and mitigate potential harms in the future. Motivation 3
  20. Sample 10,000 cases from HLDC: 36% bail granted, 63% bail denied. Data Preparation 5 Fig: HLDC snippet
  21. Use two features: facts-and-arguments and decision. Basic pre-processing: stop-word removal and cleaning using regex. Each case is represented by 7 features: 5 keywords of the case and 2 for the category of crime of the case. Data Preparation 6
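The pre-processing step on this slide can be sketched as follows; this is a minimal illustration, and the stop-word list and regex here are assumptions, since the slides do not specify which were used.

```python
import re

# Hypothetical, tiny Hindi stop-word list for illustration only;
# the actual list used for HLDC is not given in the slides.
STOP_WORDS = {"और", "के", "का", "की", "है", "में", "से", "को"}

def preprocess(text: str) -> list[str]:
    """Clean a case document: keep Devanagari text, drop stop words."""
    # Keep Devanagari characters (U+0900-U+097F) and whitespace;
    # digits, Latin letters, and punctuation are stripped by the regex.
    text = re.sub(r"[^\u0900-\u097F\s]", " ", text)
    return [t for t in text.split() if t not in STOP_WORDS]

print(preprocess("जमानत की अर्जी 12345 में दायर"))  # ['जमानत', 'अर्जी', 'दायर']
```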
  22. Represent a case using keywords via LDA (topic modelling). All cases are assigned their (top) two topics; each topic is represented by 10 keywords, of which 3 keywords are taken from the dominant topic and 2 from the second-dominant topic. Data Preparation 11
  23. Identify a subset of cases from the dataset using the theme; sample cases having either a Hindu or a Muslim proper noun. Train a Decision Tree classifier. Model Training 14
  24. For every case, we identify: the true label; the model's predicted label; the number of times the model's prediction changes when the proper noun is replaced with another Hindu proper noun; and the number of times it changes when the proper noun is replaced with another Muslim proper noun. Model Training 15
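The name-substitution probe described here can be sketched as a small counting routine. The classifier, name tokens, and case text below are stand-ins, not the actual trained model or HLDC names.

```python
def count_flips(predict, case_tokens, noun, replacement_nouns):
    """Count how often the model's prediction changes when `noun`
    is swapped for each name in `replacement_nouns`."""
    base = predict(case_tokens)
    flips = 0
    for alt in replacement_nouns:
        swapped = [alt if t == noun else t for t in case_tokens]
        if predict(swapped) != base:
            flips += 1
    return flips

# Toy model that denies bail (0) whenever a particular name appears;
# a real run would call the trained Decision Tree's predict() instead.
toy = lambda toks: 0 if "name_a" in toks else 1
print(count_flips(toy, ["name_a", "theft", "fir"], "name_a",
                  ["name_b", "name_c"]))  # 2: both swaps flip 0 -> 1
```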
  25. If the model changes its prediction from 0 (bail dismissed) to 1 (bail granted) more often when Muslim nouns are replaced by Hindu nouns than when Hindu nouns are replaced by Muslim nouns, then there exists a bias against Muslims. This bias may be due to inherent characteristics of the dataset. Model Training 17
  27. Evaluating Fairness 18 Demographic parity: the outcome of a classifier should be independent of a protected attribute. Fairness gap: the deviation of a trained classifier away from ideal demographic parity.
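The two definitions above can be made concrete with a short sketch, assuming binary predictions and a binary protected attribute; the sample numbers are illustrative, not results from the slides.

```python
import numpy as np

def fairness_gap(y_pred, group):
    """Absolute difference in positive-prediction rate between two groups;
    0 under perfect demographic parity."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == 0].mean()   # P(y_hat = 1 | group A)
    rate_b = y_pred[group == 1].mean()   # P(y_hat = 1 | group B)
    return abs(rate_a - rate_b)

# Illustrative numbers: group A is predicted "bail granted" 75% of the
# time, group B only 25%, so the fairness gap is 0.5.
preds  = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(fairness_gap(preds, groups))  # 0.5
```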
  28. Evaluating Fairness 20 Fig: Fairness Gap on Denial of Bail
  30. Changes in Predictions for Theme: Hatya (Murder) Results 22
  31. Changes in Predictions for Theme: Dahej (Dowry) Results 23
  32. Ethical considerations: the results in no way indicate a bias in the judicial system of India (small dataset, many more open-ended questions). HLDC contains data only from Uttar Pradesh. Identifying de-biasing methods remains open work. 32
  33. An initial investigation into bias and fairness for Indian legal data. We highlight preferentially encoded stereotypes that models might pick up in downstream tasks like bail prediction, and the need for algorithmic approaches to mitigate the bias learned by these models. Conclusions 25
  34. 34 https://precog.iiit.ac.in/pages/publications.html
  35. 35 https://precog.iiit.ac.in/
  36. Group pic & selfie :) 36
  37. 37 Thanks! Questions? pk.guru@iiit.ac.in http://precog.iiit.ac.in/ @ponguru pk.profgiri linkedin/in/ponguru