2.1.3. Extraction

LLMs can also effectively extract information from large volumes of text. Common use cases involving extraction prompts include, but are not limited to:

  • Information Retrieval: Extracting specific information from large datasets or complex documents, like identifying key terms, phrases, or sections in legal documents.

  • Named Entity Recognition (NER): Extracting named entities such as people, organizations, locations, dates, etc. from a text.

  • Event Extraction: Identifying and extracting key events, participants, etc. from a given text.

Example: Corporate Organizational Charts

The blockchain industry has been shaken up by a string of tumultuous events recently. Arguably, the most infamous was the collapse of FTX, the once high-flying cryptocurrency exchange and hedge fund that filed for bankruptcy in November 2022. To be more specific, FTX and over 100 of its subsidiaries filed for bankruptcy. Imagine the effort required from bankruptcy lawyers to untangle the relationships between all these corporate entities and turn them into something that can be easily understood... Enter ChatGPT 🙂

The example prompt below demonstrates how we can use NER to compile an organizational chart.

Identify all corporate entities (businesses), along with their parent companies, 
subsidiaries, and affiliates in the text below. Output in dot language.

Plaintiff Securities and Exchange Commission (the “SEC” or the “Commission”) for 
its Complaint against Defendants Coinbase, Inc. (“Coinbase”) and Coinbase Global, Inc. 
(“CGI”) (collectively, “Defendants”). CGI—Coinbase’s parent company to which 
Coinbase’s revenues flow—is a control person of Coinbase and thus violated the same 
Exchange Act provisions as Coinbase.

This is a contrived example, but it shows that we can direct ChatGPT to extract specific types of information along with additional metadata. Here, we're not simply asking for an exhaustive list of all corporate entities mentioned in the text; we also want to understand the relationships between these entities. You can see our interaction with ChatGPT using this prompt here.

Something that we haven't seen yet, but that will be discussed at length in chapter 4, is "Output Parsers." In our example prompt above, we give ChatGPT a specific instruction: "Output in dot language." ChatGPT keeps things simple: it receives text as input and returns text as output. One thing that many users don't realize is that we can ask ChatGPT to return different types of text, such as text that has specific syntactical meaning or text that represents a computer programming language.

In this example, we're interested in generating a visual representation of an organizational chart, so we ask ChatGPT to provide its response in Dot language (a language used to generate visual diagrams, covered in chapter 4). We can then take the Dot language response provided by ChatGPT and render it as an image of the organizational chart.
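To make the idea of "text with syntactical meaning" more concrete, here is a minimal Python sketch of how a Dot language response could be read back into structured data. The `dot_response` string is written by hand for illustration and is not actual ChatGPT output; the graph name, the `parent of` label, and the `parse_edges` helper are all assumptions made for this example. The regular expression only handles simple quoted edges and is not a full Dot parser.

```python
import re

# Hypothetical response text, written by hand for illustration.
# It is NOT actual ChatGPT output; a real response may be shaped differently.
dot_response = '''
digraph corporate_entities {
    "Coinbase Global, Inc." -> "Coinbase, Inc." [label="parent of"];
}
'''

# Matches simple edges of the form "A" -> "B" with an optional [label="..."].
EDGE_PATTERN = re.compile(
    r'"([^"]+)"\s*->\s*"([^"]+)"(?:\s*\[label="([^"]*)"\])?'
)


def parse_edges(dot_text):
    """Return a (source, target, label) tuple for each quoted edge found."""
    return [
        (match.group(1), match.group(2), match.group(3) or "")
        for match in EDGE_PATTERN.finditer(dot_text)
    ]


# Print each relationship found in the hypothetical response.
for source, target, label in parse_edges(dot_response):
    print(f"{source} --[{label}]--> {target}")
```

If Graphviz is installed, saving the same Dot text to a file such as `chart.dot` and running `dot -Tpng chart.dot -o chart.png` should produce an image of the chart.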

Extraction prompts can be used heavily by law firms to solve a variety of different problems. In the next section, we'll present an example that uses extraction with few-example prompting to automate a common workflow dreaded by countless attorneys and paralegals.
