By Audrey Im (Berkeley CS ’23 grad and Paper Prisons research associate) and Shreya Shankar (Berkeley CS PhD candidate)
The California Racial Justice Act (RJA) provides a mechanism for defendants (and people already convicted) to challenge a charge, conviction, or sentence if it was sought or obtained in a racially disparate manner. Paper Prisons’ RJA data tool draws on data from the California Department of Justice to help evaluate whether an individual’s prosecution fits into a larger pattern of disparity among similarly situated individuals, a pattern that could be actionable under CPC 745(a)(3) or (a)(4). In this post we highlight another powerful tool, DocETL, which can be used to explore claims of racially biased or discriminatory language under CPC 745(a)(1) or (a)(2). DocETL was developed by UC Berkeley doctoral student Shreya Shankar in collaboration with her advisor, Aditya Parameswaran, and their colleagues at the EPIC lab.
DocETL is a powerful declarative system for building data processing pipelines powered by large language models (LLMs). In the RJA context, it can help prosecutors, public defenders, advocates, and researchers quickly answer questions about the contents of large batches of PDFs (e.g., online legal documentation), including whether they contain potentially actionable language. Because DocETL can process large numbers of files, run optical character recognition (OCR), and then apply LLMs to advanced tasks such as identifying racially coded language, it can augment the work of document review.
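To make "declarative" more concrete: with DocETL you describe a pipeline (the input dataset, the LLM operations with their prompts and output schemas, and the order of steps) in a configuration file instead of writing the processing code yourself. The sketch below builds such a specification as a Python dict and serializes it. The key layout reflects our understanding of DocETL's YAML pipeline format and should be checked against its documentation; the file names, model name, operation name, and prompt wording are hypothetical placeholders, and PDF/OCR parsing options are deliberately left out.

```python
import json

# Sketch of a DocETL-style pipeline spec, built as a plain Python dict.
# ASSUMPTIONS: key layout follows our reading of DocETL's YAML format;
# file names, model, operation name, and prompt text are hypothetical.
pipeline_spec = {
    "default_model": "gpt-4o-mini",  # hypothetical model choice
    "datasets": {
        "case_documents": {
            "type": "file",
            # hypothetical JSON file: a list of {"doc_id": ..., "text": ...}
            "path": "case_documents.json",
        }
    },
    "operations": [
        {
            "name": "flag_coded_language",  # hypothetical operation name
            "type": "map",  # apply the prompt to each document separately
            # Jinja-style placeholder; each document's "text" field is inserted here
            "prompt": (
                "Read the following document text:\n"
                "{{ input.text }}\n"
                "Quote any passages that may contain racially biased or "
                "coded language, and briefly explain each one."
            ),
            # Fields the LLM is asked to return for every document
            "output": {
                "schema": {
                    "flagged_passages": "list[str]",
                    "explanation": "str",
                }
            },
        }
    ],
    "pipeline": {
        "steps": [
            {
                "name": "review_documents",
                "input": "case_documents",
                "operations": ["flag_coded_language"],
            }
        ],
        # hypothetical output location for the per-document results
        "output": {"type": "file", "path": "flagged_output.json"},
    },
}

# JSON is a subset of YAML 1.2, so this text could be saved as a .yaml file.
config_text = json.dumps(pipeline_spec, indent=2)
print(config_text)
```

The design point is that the spec says *what* to extract from each document, not *how* to batch, call, or retry the model; DocETL's runtime handles those details. Per DocETL's documentation, a saved pipeline file is run from the command line (e.g., `docetl run pipeline.yaml`), though you should confirm the exact command for your installed version.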
This guide walks you through how to get started. Below is a video showing how to use DocETL.