Open data is not useful without powerful search engines.

Access to open data is important, and hopefully we can expect authorities to release more data in the near future. But without the right tooling to explore and navigate that data meaningfully, much of the benefit is lost.

Regulated at European level by Directive 2003/98/EC on the re-use of public sector information (see here for further details), Open Data is a topic that has occupied legal experts for some time now, as evidenced by the numerous publications on the subject.

Very few people dispute its importance. But it is one thing to open the data (if necessary, after having overcome the pitfall of its anonymization/pseudonymization; on this topic, see our blog here) and quite another to guarantee, in practice, efficient access to it (i.e. to allow users to quickly find relevant information).

And here, in the legal field, experience unfortunately shows that the search engines accompanying released datasets are often (very) limited in power, preventing users from meaningfully exploring and navigating those datasets.

This is essentially the result of a highly ‘supervised’ approach which, by nature, implies severe limitations. Indeed, a ‘supervised’ approach involves tagging/labelling a few strictly defined concepts in a text, which:

Ignores most of the actual content of the documents (as if that content were of little or no interest and documents could be summed up in a handful of keywords);
Prevents any contextual analysis (as we mentioned in a previous blog here, the concepts are important, but so is their context, the story they tell through their association);
Rules out any flexibility in the way searches are conducted (documents cannot be retrieved unless the strictly defined concepts are used, and more complex searches, e.g. on the basis of quotes included in documents, are impossible).
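
To make the last limitation concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the two "documents", their tags and the function names do not come from any real dataset or product. It compares a search over predefined tags with a plain search over the full text. Note that the full-text search is only a baseline to show what tag-only search misses; it is not the self-supervised approach discussed further down.

```python
import re

# Hypothetical mini-corpus: two invented case summaries.
DOCUMENTS = {
    "case-001": "The court held that the employer must inform the employee "
                "before processing personal data.",
    "case-002": "The tenant argued that the notice period was not respected "
                "by the landlord.",
}

# Hypothetical tags, as an annotator might have assigned them.
TAGS = {
    "case-001": {"data protection", "employment"},
    "case-002": {"lease"},
}


def search_by_tag(query):
    """Return the documents whose tag set contains exactly the query."""
    wanted = query.strip().lower()
    return sorted(doc_id for doc_id, tags in TAGS.items() if wanted in tags)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search_by_phrase(phrase):
    """Return the documents whose full text contains the phrase as a
    consecutive run of words (case-insensitive)."""
    wanted = tokenize(phrase)
    n = len(wanted)
    hits = []
    for doc_id, text in DOCUMENTS.items():
        tokens = tokenize(text)
        if n and any(tokens[i:i + n] == wanted
                     for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    print(search_by_tag("lease"))
    print(search_by_tag("notice period"))
    print(search_by_phrase("notice period was not respected"))
```

In this toy setup, a user who happens to know the tag "lease" finds the second document, but a user who searches for "notice period", or for a quote taken from the decision, gets nothing from the tag index even though the text is there.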

We therefore end up in a situation where massive legal datasets are being opened here and there in Europe, but without effective search engines.

And, unfortunately, that robs open data of its substance and defeats its objectives. Lawyers are left to navigate the dataset "haphazardly", and the results become random: sometimes you will find the relevant information, other times you will miss the information that is critical for your case, and you will always fear that your search was not exhaustive.

But, fortunately, there is good news coming from technology.

The good news is that a new generation of natural language processing models is emerging and bringing about a paradigm shift: the self-supervised model (on this topic, see our blog here). This model is the ‘perfect partner’ when releasing (legal) datasets because, by its self-supervised nature, it not only removes the need for tagging/labelling but also achieves a higher level of relevance and flexibility than supervised approaches.

Hence, if they want to improve access to open data in the future, public authorities would be well advised to consider this new trend carefully.

EisphorIA will very soon (on January 6th) announce the launch of a major initiative relating to legal Open Data.

Of course, we could wait for that.

But at EisphorIA, building on the technical fundamentals of our 'new generation' solution (among other things, a very easy, efficient and autonomous deployment, and very fast and relevant processing of the user's requests, whether simple or complex), we prefer to get ahead.

Curious to know more about it? Follow our next blog posts.

Until then, our best wishes for 2021!