Entity Recognition case study, with Ripjar
Chrissie Cormack Wood
Ripjar is a data intelligence platform company whose mission is to provide corporates and institutions with the most advanced data and analytics solutions to protect themselves in real time from evolving risks that threaten their growth, prosperity and value.
Founded by former members of the UK’s Government Communications Headquarters (GCHQ), Ripjar develops software products that combine automation, artificial intelligence, and data visualisation to help companies solve the most complex risk and security management problems at scale.
Ripjar were developing a next-generation financial crime capability that uses machine learning to identify client risk across all data sources automatically and in real time. Part of this process is to identify how entities of interest are associated with suspicious activity within unstructured content. For instance, an article describing terrorism might mention lawyers, judges and politicians alongside those implicated in the terrorist activity. Distinguishing the people implicated from those merely mentioned was a crucial part of the solution they were building.
In order to train their algorithms, Ripjar needed a feed of accurately tagged data where humans had identified the entities in the text associated with the suspicious activity.
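To make the idea of "accurately tagged data" concrete, here is a minimal sketch of what one annotated record could look like. It is only an illustration: the `EntityTag` structure, the `implicated` and `incidental` role labels, the helper functions and the example sentence with its names are all assumptions made for this sketch, not Ripjar's or Hivemind's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityTag:
    """One human annotation: a character span plus a (hypothetical) role label."""
    start: int
    end: int
    role: str  # assumed labels: "implicated" or "incidental"


def tag(text: str, mention: str, role: str) -> EntityTag:
    """Build a tag for the first occurrence of `mention` in `text`."""
    start = text.index(mention)
    return EntityTag(start, start + len(mention), role)


def implicated_entities(text: str, tags: list[EntityTag]) -> list[str]:
    """Return the mentions that annotators linked to the suspicious activity."""
    return [text[t.start:t.end] for t in tags if t.role == "implicated"]


# Invented example sentence; the names are placeholders, not real people.
text = (
    "Judge Alice Smith sentenced John Doe for financing the attack; "
    "lawyer Bob Lee declined to comment."
)
tags = [
    tag(text, "Alice Smith", "incidental"),
    tag(text, "John Doe", "implicated"),
    tag(text, "Bob Lee", "incidental"),
]
result = implicated_entities(text, tags)
```

In this sketch, only the span tagged as implicated is returned, while the judge and the lawyer are kept in the record but marked as incidental, which is the distinction a model would need to learn from the labelled data.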
How Ripjar used Hivemind
Ripjar created a pipeline combining their cutting-edge natural language processing capabilities with human annotation provided by Hivemind to maximise throughput whilst maintaining quality. Within the first year of using Hivemind, Ripjar had created a carefully crafted training dataset covering over 200,000 documents, in multiple languages, for their client risk systems.
Dean Jones, Head of NLP at Ripjar, says:
"Hivemind’s platform provides flexibility to vary the interface presented to human workers so that we can maximise worker efficiency for different use cases. Together with mechanisms for ensuring quality of work and insights into the status of ongoing projects, it allows us to actively manage the balance between quality, cost and time."