Anonymised data, namely information which does not relate to an identified or identifiable person, as well as personal data that has undergone a process through which an individual is no longer identifiable, falls in principle outside the scope of the GDPR (recital 26 of said Regulation) and of most of the world's data protection laws. This means that such datasets can be freely used, disclosed and/or sold to others without restriction.
Threatened by heavy fines, companies sitting on large piles of personal data have been keen to claim that they only use aggregated and incomplete data, protecting people's privacy in full respect of GDPR principles by applying anonymisation techniques that make it impossible to single out a person from the crowd.
Yet is anonymised data really anonymous? Recent research from the Université catholique de Louvain (UCLouvain) and Imperial College London shows that current methods for anonymising data leave individuals at risk of being re-identified using machine learning, even when incomplete datasets are used or shared.
According to their findings, combining only 15 demographic attributes (including age, gender and marital status) would render virtually 100% of the people in Massachusetts unique. These findings directly challenge the modern anonymisation standards established by the GDPR by exposing the limitations of the techniques currently envisaged. As the Commission plans to take stock of the application of the GDPR this spring, such pitfalls should be taken into account with a view to upgrading the rules.
While there might be a lot of people who are in their thirties, male, and living in New York City, far fewer of them are also born on 5 January, drive a red sports car, and live with two kids (both girls) and one dog.
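The intuition above can be sketched in a few lines of code: the more quasi-identifying attributes you combine, the larger the fraction of records that become unique in a dataset, and a unique record is one that can be singled out. The toy records and attribute names below are invented purely for illustration and are not taken from the study.

```python
from collections import Counter

# Hypothetical toy dataset: each dict is one person described by a
# handful of quasi-identifiers (all values are made up).
records = [
    {"age": 34, "gender": "M", "city": "NYC",    "dob": "05-01", "car": "red"},
    {"age": 34, "gender": "M", "city": "NYC",    "dob": "11-23", "car": "blue"},
    {"age": 34, "gender": "F", "city": "NYC",    "dob": "05-01", "car": "blue"},
    {"age": 52, "gender": "M", "city": "Boston", "dob": "07-14", "car": "red"},
    {"age": 34, "gender": "M", "city": "NYC",    "dob": "02-02", "car": "blue"},
]

def unique_fraction(records, attrs):
    """Fraction of records whose combination of `attrs` values appears
    exactly once in the dataset, i.e. records that are singled out."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    singles = sum(1 for r in records
                  if counts[tuple(r[a] for a in attrs)] == 1)
    return singles / len(records)

# Three coarse attributes leave most people hidden in a crowd...
print(unique_fraction(records, ("age", "gender", "city")))
# ...but adding birthday and car colour makes every record unique.
print(unique_fraction(records, ("age", "gender", "city", "dob", "car")))
```

On this five-record toy dataset, age, gender and city alone single out only 2 of 5 people (0.4), while all five attributes together single out everyone (1.0). The study's point is that with real population data and around 15 such attributes, the "everyone is unique" outcome is the norm, not the exception.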