This is a short introduction to event extraction in Natural Language Processing (NLP). I highlight some use cases in and out of the legal sphere. I also introduce Timeline Tailor, a sparse event extraction tool and chronology builder.
Event extraction in NLP
Temporal Information Extraction (aka Event Extraction) is the identification and extraction of ‘events’ from unstructured text.
An ‘event’ is a thing that happens at or over a period of time. The goal of event extraction is to identify event mentions within text and retrieve information about them for further processing.
Take the sentence “Alice and Bob had dated for years, eventually marrying on 5 June 2018”. The sentence contains a “married” event and certain other information such as the date (“5 June 2018”) and the entities involved (“Alice”, “Bob”). The image below shows a structured representation of the event which may be the output of an event extraction algorithm.
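As a rough illustration, the structured output for this sentence could be sketched in Python as below. The class and field names (Event, trigger, event_type, when, participants) are my own assumptions for illustration; they are not the schema of any particular event extraction library.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List


@dataclass
class Event:
    """One extracted event mention (all field names are illustrative)."""
    trigger: str       # the word in the text that signals the event
    event_type: str    # a coarse label for the kind of event
    when: date         # the date attached to the event
    participants: List[str] = field(default_factory=list)  # entities involved


# The "married" event from the example sentence, filled in by hand.
marriage = Event(
    trigger="marrying",
    event_type="marriage",
    when=date(2018, 6, 5),
    participants=["Alice", "Bob"],
)

print(asdict(marriage))
```

An extraction algorithm would aim to fill in a record like this automatically for every event mention it finds.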

Temporal event extraction is part of a larger NLP problem set known as Information Extraction. It has many use cases, including global crisis monitoring, traffic and weather monitoring, finance, and the identification of medications and adverse reactions in biomedical text (see Liu et al., 2021).
In the legal sphere it could be used for case fact analysis and case comparison, case summaries, research or litigation preparation.
Accurate and complete event extraction has proved technically challenging. Whilst humans are able to intuitively construct temporal representations of text from a young age, even the simplest of sentences contains surprisingly dense event representations.
Take the example below, from the much adapted TimeBank-Dense corpus paper (Cassidy et al., 2014). The single sentence contains six temporal relationships. Longer passages become more complex:

Event extraction research continues to evolve. The arrival of large language models such as ChatGPT holds the promise of substantial progress in the field.
Timeline Tailor
I introduce TimelineTailor.com.
Timeline Tailor is a sparse event extraction tool and chronology builder. It takes free text as an input and returns a structured webpage listing each date mention and a brief description in chronological order. The tool was built with Python, Docker and PostgreSQL, on AWS services with a serverless GPU for text processing.
Timeline Tailor uses a pipeline of two common NLP tools. The first is ‘entity recognition’ from spaCy. This identifies a ‘date’ entity within the text, such as “4 May 2010” or “early-June”. The second is a sequence-to-sequence transformer from the Hugging Face library, which is trained to convert the text surrounding those dates into an event description. Each date string is then parsed, and the event summaries are ordered chronologically for the final output.
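The shape of that pipeline can be sketched with the Python standard library alone. This is not the Timeline Tailor code: a regular expression stands in for the spaCy ‘date’ entity step, and a placeholder function stands in for the sequence-to-sequence model. The names DATE_PATTERN, describe and build_chronology are hypothetical, and the sketch only handles dates written like “4 May 2010” (with English month names).

```python
import re
from datetime import datetime
from typing import List, Tuple

# Stand-in for the entity recognition step: matches dates like "4 May 2010".
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE_PATTERN = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")


def describe(sentence: str) -> str:
    """Stand-in for the sequence-to-sequence model: returns the sentence as-is."""
    return sentence.strip()


def build_chronology(text: str) -> List[Tuple[str, str]]:
    """Return (ISO date, description) pairs in chronological order."""
    events = []
    # Naive sentence split; a real pipeline would use a proper segmenter.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in DATE_PATTERN.finditer(sentence):
            # Parse the date string so events can be ordered reliably.
            parsed = datetime.strptime(match.group(), "%d %B %Y").date()
            events.append((parsed.isoformat(), describe(sentence)))
    # ISO date strings sort in chronological order.
    return sorted(events)


sample = ("The claim was issued on 12 March 2011. "
          "The contract had been signed on 4 May 2010. "
          "The parties later settled.")

chronology = build_chronology(sample)
for when, description in chronology:
    print(when, description)
```

The two dated sentences come back with the 2010 contract first, even though it appears second in the text. The final sentence carries no date, so it does not appear in the output at all.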
The tool can be considered a ‘sparse’ event extractor, because it does not identify and extract every temporal relation in the text. The pipeline will only extract events which have a ‘date’ explicitly associated with them in the text.
To illustrate this, consider again Alice and Bob’s marriage, but now include a mention of their honeymoon: “Alice and Bob had dated for years, eventually marrying on 5 June 2018. Their honeymoon was amazing.”
We readers intuitively know that the honeymoon was an event which involved Alice and Bob and likely occurred shortly after the wedding on 5 June 2018 for some fixed duration. However, because the date of the honeymoon is not explicit in the text, the tool will not identify it as an event or be able to order it within a chronology.
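A minimal sketch of this behaviour, again using a regular expression as a stand-in for the date entity recogniser (the pattern and variable names are assumptions for illustration, not the tool's implementation):

```python
import re

# Assumption: an explicit date is written like "5 June 2018".
EXPLICIT_DATE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{4}\b"
)

text = ("Alice and Bob had dated for years, eventually marrying on 5 June 2018. "
        "Their honeymoon was amazing.")

# Naive sentence split, then keep only sentences that contain an explicit date.
sentences = re.split(r"(?<=[.!?])\s+", text)
extracted = [s for s in sentences if EXPLICIT_DATE.search(s)]
skipped = [s for s in sentences if not EXPLICIT_DATE.search(s)]

print("Extracted:", extracted)
print("Skipped:", skipped)
```

The honeymoon sentence lands in the skipped list: nothing in it anchors the event to a date, so a date-driven extractor has nothing to work with.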
This is just one example of the overarching challenge of event extraction in NLP – namely, that humans have an intuitive and seamless grasp of narrative text, but automating a framework which defines and extracts those narratives is very challenging. There is much more progress to be made – perhaps version 2.0 will make some advances. Watch this space.
Cover Image: Pexels, Andrey Grushnikov https://www.pexels.com/photo/black-and-white-photo-of-clocks-707676/