Hello Legal world! Have you ever been confronted with pseudonymisation tasks?

Privacy and confidentiality are raising the needs to perform pseudonymisation tasks which are very often resource intensive and time consuming . If the second topic has always been a major concern, the first one has seen its importance raising significantly over the last few years with the implementation of GDPR among others. In this new regulatory environment, pseudonymisation has been a hot topic and it will probably become even more important in the next months to come, with an expected rise of controls and fines.

Here, adequate technology – combining the requirements of accuracy, utility and security – can definitely help organizations to save huge amounts of precious time and drastically reduce mistakes. Beyond, it surely secures all stakeholders on the fact that privacy (and confidentiality) is addressed seriously by organisations.

At EisphorIA, we have developed a product – CleanMyDoc – that supports very efficiently the pseudonymisation process with a high accuracy rate and security level. It can be particularly helpful in due diligence processes or other similar situations.

Now, pseudonymisation is also a regulatory concern
If pseudonymisation (or, but not treated here, anonymisation) has always been top of mind for confidentiality purposes, its importance has significantly increased with the emergence/reinforcement of the regulatory landscape on privacy. As we all know, failure in the pseudonymisation process now leads to compliance breaches, which can severely impact an organization both from a financial and reputation point of view.

This explains, as raised by ENISA in its November 2019 Pseudonymisation techniques and best practices Paper, that “in the light of GDPR, the challenge of proper application of pseudonymisation to personal data is gradually becoming a highly debated topic in many different communities”.

Pseudonymisation process should hence be carefully undertaken, satisfying a 3-levels requirements: (1) degree of robustness/accuracy necessary for answering adequately to privacy, (2) degree of utility necessary for processing and understanding the data and (3) degree of security necessary for limiting risks of re-identification. Such a process may obviously differ from a context to another (the specific importance spectrum of each requirement listed above varies in function of circumstances).

Technology as a true helper
Without surprise, adequate technology – i.e. solutions that efficiently responds to the 3-levels requirements of accuracy, utility and security – is a true helper here.

Indeed, it is hardly imaginable that pseudonymisation is still undertaken manually, not solely because of the massive amount of documents to be processed but also because of the high risk of errors such manual review would incur. In other words, in addition to time and energy costs, a manual pseudonymisation does not give sufficient guarantees as regards to the 3-levels requirements.

Technology is hence an essential relay. And the good news is that the algorithms developed are more and more powerful in terms of accuracy, usability and security. State-of-the art algorithms imply (1) an automated recognition of name, location, register numbers, organizations, emails, IP addresses, etc…, (2) coherence in the pseudonymisation processing both in specific document and in the whole data set and (3) capacity of adaptation as regards to clients’ preferred pseudonymisation approaches and policies.

To quote the ENISA Paper, the new-gen algorithms respond adequately to the recommendation that the implementation of pseudonymisation process follows “a risk-based approach, taking into account the purpose and overall context of the personal data processing, as well as the utility and scalability levels they wish to achieve”.


At EisphorIA, we have developed a pseudonymisation solution relying on state-of-the art algorithms and reinforcing them via our in-house R&D.

This solution permits to pseudonymise in a very short time frame, large datasets, showing:

  • Usability. The dataset can be uploaded on a dedicated / personalized platform and, from there, be immediately processed by our algorithms (note that the solution is able to deal with more than 100 formats and includes OCRisation).
  • Accuracy. Our in-house algorithms have achieved very significant accuracy results on testing datasets both positively and negatively.
  • Flexibility. Our in-house algorithms not only address privacy but they can also respond to specific requests (also, here, the combination with our contextual search algorithms highly facilitates some customized black lining).
  • Utility. Our in-house algorithms ensure coherence and consistency in documents or in the whole dataset, facilitating the review of the documentation.
  • Security. Our in-house algorithms are naturally agile and adaptable to follow guidelines in terms of techniques and policies.

Check to the right an example of outcome processed by our algorithms (processing here limited to persons, organizations and locations) and if you want to discover it in more detail, please don’t hesitate to contact us.