Self-supervised learning: a paradigm shift to assist lawyers searching for critical information

We keep hearing that the legal market needs to adopt more advanced technologies, and AI in particular. But AI has been around for a long time, at least within large law firms that have invested vast amounts of money. This is particularly true in e-discovery, where many solutions have been available for years, all using some flavor of AI to leverage the huge potential of Natural Language Processing (NLP). Most of these solutions rely on supervised learning, the most widespread methodology. Supervised learning does the job, but it takes many hours of human work to build and fine-tune and is therefore quite expensive.

Now, thanks to recent major breakthroughs in the field of NLP, increased computing power, the scalability offered by the cloud and the availability of raw data, it has become possible to make sense of human language without having to create and train models to identify specific concepts in textual data sets. We call this self-supervised learning (sometimes loosely referred to as unsupervised learning). It provides significant benefits to the legal sector. Let’s discover them and understand why EisphorIA has gone that way.

The constraints of Supervised Learning


“I work for the tool instead of the tool working for me”


The main drawback of supervised learning is that it often relies on large data sets that need to be cleaned and labelled by humans in order to train the models. Very often, this implies pre-work from the customers before they can actually use the tool. In e-discovery, it is not rare to see entire legal teams spend days labeling data so that the models can achieve a specific task. Worse, new tasks may surface during the dispute resolution process, and each one requires repeating the same tedious process. This has caused some frustration, and one customer summed it up very well: “I work for the tool instead of the tool working for me”. Moreover, each person labeling the data introduces their own biases into the final outcome, so the work of the less experienced people who are usually involved in this process forms the basis of the final result.


Once you translate the above into cost, you quickly understand why such technology has mostly been used by law firms with deep pockets. This also means there is a tremendous opportunity to come up with a different approach.

Self-supervised learning removes the expensive constraints and gives more law firms access to advanced technology


“In many document reviews, having the different patterns and concepts automatically identified by the algorithms is highly valuable”


Self-supervised learning is well suited to exploring unknown data. In human education, we are generally not taught to solve one particular task for our whole life with a predefined algorithm; we learn how to use our knowledge and adapt our skills to unfamiliar situations. The same holds for textual information: humans learn to read generic text and then to extract facts from it. Given specific documentation, such as legal documents, we can learn by ourselves how it is organised and where to look for particular information. Why should the machine behave differently? Let’s learn directly from the texts, which are available and contain the knowledge we are interested in.

In many document reviews, having the different patterns and concepts automatically identified by the algorithms is highly valuable. This means you can start searching your data right away through contextual search. The algorithms capture the meaning of the concept developed in your search request, together with its context, and look for all documents in the data set that develop the same idea, even if they use different wording. For instance, you can put any counterparty conclusions into perspective by copy-pasting them into your contextual search and seeing all the related documents.
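
As a rough illustration of the contextual search idea, the Python sketch below ranks documents by their similarity to a free-text query. It is not EisphorIA's implementation: the function names (`build_idf`, `embed`, `contextual_search`) are hypothetical, and a simple TF-IDF word weighting stands in for the learned language representation that a self-supervised model would provide. TF-IDF only matches shared words, so unlike the approach described here it cannot recognise the same idea expressed in different wording. What it does have in common is that everything is derived from the raw documents, with no human labeling step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(documents):
    """Compute smoothed inverse document frequencies from raw, unlabeled documents."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def embed(text, idf):
    """Stand-in encoder (assumption): sparse TF-IDF vector as a dict.

    Terms that never occur in the corpus are ignored.
    """
    counts = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def contextual_search(query, documents, top_k=3):
    """Return up to top_k (document_index, score) pairs, most similar first."""
    idf = build_idf(documents)
    query_vec = embed(query, idf)
    scored = [(cosine(query_vec, embed(doc, idf)), i)
              for i, doc in enumerate(documents)]
    # Highest score first; ties are broken by original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]
```

The query can be a whole paragraph, for example a passage pasted from the counterparty's conclusions, rather than a handful of keywords. In a realistic system the `embed` function would be replaced by an encoder trained on raw text, while the ranking step could stay much the same.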


At EisphorIA, we compared the two methodologies (self-supervised learning and supervised learning) through multiple simulations and analyzed the outcomes. While we expected the supervised models to perform better on their dedicated tasks, we were surprised to observe the reverse in terms of relevance, both for clustering the information and for searching the data set. The self-supervised learning model provided much more robust and stable results.


This is exciting because it means you don’t need to invest as much as you would with supervised learning. For a larger number of law firms, it represents a great opportunity not only to catch up but to make a big jump in technological capabilities and improve their competitiveness.


Self-supervised learning + Anthill architecture + Modern interface = winning combination

At EisphorIA, we are betting on self-supervised learning, and we truly believe it can completely change the lawyer's experience of being assisted by technology when searching for critical information.


Thanks to this modern methodology, the right architecture and an ultra-modern UX and design, we can guarantee the following:

  • Operational in a few hours, not weeks or months. Because the models do not need all the pre-work to perform as expected, we can have the platform up and running in a matter of hours. Our goal is to make you operational as soon as possible: answering the preliminary questions from your customers, reacting swiftly to any conclusions submitted by the counterparty and finding the critical information you need to build your case.
  • Relevant results for your search. We provide a contextual search, meaning that the more context you provide (not just keywords but sentences, paragraphs or even full texts), the better the results. Forget the era when a search returned 10 pages of barely relevant results that you had to read in full! Our algorithms give you the documents that are actually relevant. And our heat map goes even further, showing you where to find the most relevant paragraphs within the document itself.
  • Ultra-fast processing and a fluid experience, even when you are searching hundreds of thousands of documents. We provide this thanks to our “anthill architecture”, which mimics the way ants work (distributing repetitive tasks) for greater efficiency.
  • Easy-to-use and intuitive interface for quick onboarding and broad adoption. You understand the tool and its functionalities in minutes. No need for long training and no fear of forgetting how things work. The interface also assists you while you perform a contextual search, giving you hints to optimize the search with the right parameters.
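
To make two of the ideas in this list more concrete, here is a minimal Python sketch of a paragraph-level heat map whose per-document work is handed out to a pool of workers. This article does not describe how the heat map or the anthill architecture are actually built, so everything here is an assumption made for illustration: the names `paragraph_heat` and `heat_maps` are hypothetical, paragraphs are assumed to be separated by blank lines, and a simple word-overlap (Jaccard) score stands in for whatever relevance measure the real platform uses.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def tokenize(text):
    """Return the set of lowercase word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def paragraph_heat(query, document):
    """Score each paragraph of one document against the query.

    The score is the Jaccard overlap of word sets, between 0.0 and 1.0
    (an assumed stand-in for a real relevance measure).
    """
    query_terms = tokenize(query)
    heats = []
    for paragraph in document.split("\n\n"):
        terms = tokenize(paragraph)
        union = query_terms | terms
        heats.append(len(query_terms & terms) / len(union) if union else 0.0)
    return heats


def heat_maps(query, documents, max_workers=4):
    """Distribute the repetitive per-document scoring across a pool of workers.

    Results are returned in the same order as the input documents.
    """
    score_one = partial(paragraph_heat, query)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_one, documents))
```

A thread pool keeps the sketch short. For CPU-heavy scoring over hundreds of thousands of documents, the same pattern would more plausibly be spread over separate processes or machines, each handling its own share of small, identical tasks.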

Curious to discover our solution in more detail? Please contact us.