6.2. Prompt Engineering for Legal Judgement Prediction

By: Dietrich Trautmann, Alina Petrova, Frank Schilder

Available at: https://arxiv.org/pdf/2212.02199.pdf

Introduction

Researchers from Thomson Reuters led by Dietrich Trautmann introduce the concept of Legal Prompt Engineering (LPE), a process designed to guide LLMs toward performing various legal tasks. Their research focuses on Legal Judgement Prediction (LJP): predicting the outcome of a legal case based on the given facts, evidence, precedents, and other relevant information.

Hypothesis

Can LLMs be used to automate the prediction of court decisions? More specifically, can legal prompt engineering guide LLMs to effectively perform the LJP task in a zero-shot manner? As a reminder, zero-shot prompting is the most basic form of prompting, and sadly the most common. A zero-shot prompt simply provides a task (i.e., asks a question) to the model, nothing more. Earlier in this guide we discussed several prompting strategies, like few-shot prompting, which significantly improve the results of ChatGPT and other LLMs with relatively little effort.
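
For illustration, here is a minimal sketch of a zero-shot prompt next to a few-shot variant; the case text, question wording, and example answer are placeholders rather than prompts from the paper:

```python
# A minimal zero-shot prompt for a yes/no legal judgement question.
# The wording and case text are placeholders, not the paper's prompts.
case_text = "<facts of the case>"

zero_shot_prompt = (
    f"{case_text}\n\n"
    "Was there a violation of the Convention? Answer yes or no."
)

# A few-shot variant (not used in this study) prepends solved examples
# so the model can imitate the expected answer format.
few_shot_prompt = (
    "Case: <facts of an earlier case>\n"
    "Was there a violation of the Convention? Answer yes or no.\n"
    "Answer: Yes\n\n"
    f"Case: {case_text}\n"
    "Was there a violation of the Convention? Answer yes or no.\n"
    "Answer:"
)
```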

Why did the authors only use zero-shot prompting?

To learn whether or not the implicit (general) knowledge of an LLM translates into a foundational understanding of law. The only additional context provided within the tested prompts consisted of the case texts from the European Court of Human Rights and the Federal Supreme Court of Switzerland. The prompt components used throughout the experiment are outlined in the Method of Evaluation section below.

Zero-shot prompting is the most direct way to test whether a generic LLM, one that has not been further trained or fine-tuned on legal data, is able to perform legal reasoning. More advanced prompting techniques would greatly skew the results and defeat the purpose of this study.

Method of Evaluation

The authors used discrete, manual legal prompt engineering: they hand-crafted and evaluated human-readable prompts that frame legal judgement prediction as a binary (yes/no) classification task.

Here's a summary of the process they used:

  1. First, they tried using a long legal document as the only input for the language model. The language model tried to continue the document, but the results were not useful for producing a guilty or not-guilty prediction.

  2. Then, they added a question after the document that reformulated the task. This improved the model's output, but it was still not effective in many cases. Instead of giving a yes/no answer, the model continued with a list of other questions.

  3. To improve the model's output, they added the indicators "Question:" and "Answer:". However, the model still gave "free-form" responses, which were difficult to classify into "yes" or "no".

  4. They then included answer options "A, Yes" and "B, No" to guide the model's responses.

  5. Finally, they used a special indicator to separate the document from the prompt, as shown in the sketch after this list.
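
Taken together, these steps describe incremental refinements of a single prompt template. The sketch below reconstructs that template and a simple answer parser in Python; the question wording, the "###" separator, and the parsing rules are illustrative assumptions, not the authors' exact strings.

```python
# Reconstruction of the prompt template implied by steps 1-5 above.
# Question wording, answer options, separator, and parsing rules are
# illustrative assumptions, not the paper's exact strings.

def build_prompt(case_text: str) -> str:
    question = "Is there a violation of the Convention in this case?"
    return (
        f"{case_text}\n"
        "###\n"                      # step 5: separator between document and prompt
        f"Question: {question}\n"    # steps 2-3: task restated with an explicit indicator
        "A, Yes\n"                   # step 4: answer options to constrain
        "B, No\n"                    #         the otherwise free-form output
        "Answer:"                    # step 3: cue for a short, classifiable answer
    )


def parse_answer(completion: str) -> str | None:
    """Map the model's completion to a binary label, if possible."""
    text = completion.strip().lower()
    if text.startswith(("a", "yes")):
        return "yes"
    if text.startswith(("b", "no")):
        return "no"
    return None  # still free-form; cannot be classified


prompt = build_prompt("<facts of the case>")
label = parse_answer("A, Yes")  # -> "yes"
```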

Results

The results reveal that zero-shot LPE performs better than baseline approaches, demonstrating that transfer to the legal domain is possible for general-purpose LLMs. However, it still falls short of current state-of-the-art supervised approaches. Despite these limitations, the study underscores the potential of LPE in the legal field and its applicability in a multilingual context.

Why is this research important?

AI models can help legal professionals in their decision-making, facilitate legal research, and potentially improve the efficiency of legal proceedings. The LJP task is challenging due to the complexity of legal language, the need for logical reasoning, and the often extensive length of legal documents. A better understanding of how LLMs handle complex tasks within specific domains provides valuable information for developing strategies to deploy AI effectively across entire industries, not just law.
