8 Fundamental Data Anonymization Mistakes That Could Put Your Business At Risk

Data Anonymization
Posted November 27, 2019
5 min read

Under the GDPR, data about individuals in the EU only counts as anonymous if it is no longer possible to ‘single out an individual, link records relating to an individual or infer information concerning an individual’, to use the wording of EU regulators’ guidance on anonymization. However, there are many situations where you need to work with real-world data but don’t have the necessary permission to use it in identifiable form, for example to facilitate software development. That’s why data anonymization is crucial.

Data anonymization turns your sensitive data into usable data sets by stripping identifiable information and making it anonymous. 

There are different techniques for anonymizing data, such as masking, noise addition and randomization. However, without sound practices in place, these processes can become confusing and, in turn, lead to mistakes that put your data at risk.
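To make the three techniques named above more concrete, here is a minimal Python sketch of each. The function names, the sample values and the choice of uniform noise are our own assumptions for illustration; they are not taken from CloverDX or any particular tool, and real projects would tune each technique to the data at hand.

```python
import random

def mask_email(email: str) -> str:
    """Masking: keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain

def add_noise(value: float, spread: float, rng: random.Random) -> float:
    """Noise addition: shift a number by a random amount in [-spread, spread]."""
    return value + rng.uniform(-spread, spread)

def shuffle_column(values: list, rng: random.Random) -> list:
    """Randomization: permute a column so values no longer line up with their rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical sample values, seeded so the run is repeatable.
rng = random.Random(0)
print(mask_email("jane.doe@example.com"))
print(add_noise(52000.0, 1000.0, rng))
print(shuffle_column(["Prague", "London", "Boston"], rng))
```

Each function protects a single field. On its own, none of them guarantees that a whole record is anonymous, which is where the mistakes below come in.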

We want to shed some light on eight common mistakes (and myths) we hear regularly at CloverDX: 

1. You only need to change the obvious Personally Identifiable Information (PII)

Anonymization might seem like an easy task. If you delete the names of individuals in your dataset, it’s done, right? Unfortunately, that’s not the case. 

Variables that are not ‘identifiers’ can still supply context which may lead to identification. For example, when Netflix released a data set of movie ratings, they removed usernames and randomized ID numbers. However, researchers at the University of Texas at Austin were able to match these anonymized records to IMDb users via similar ratings posted on that site. The data was deanonymized without using any PII.
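The kind of matching described above can be sketched in a few lines of Python. This is a toy illustration with made-up ratings and a hypothetical best_match helper, not the method the researchers actually used: it simply counts how many ratings in an "anonymized" record agree with each public profile.

```python
def best_match(anon_ratings, public_profiles, min_overlap=3):
    """Return the public profile that shares the most identical ratings
    with the anonymized record, or None if the overlap is too small."""
    best_name, best_score = None, 0
    for name, ratings in public_profiles.items():
        score = sum(
            1 for movie, stars in anon_ratings.items()
            if ratings.get(movie) == stars
        )
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_overlap else None

# Hypothetical data: no names or IDs in the released record, only ratings.
anonymized_record = {"movie_a": 5, "movie_b": 1, "movie_c": 4, "movie_d": 2}
public_profiles = {
    "alice_public": {"movie_a": 5, "movie_b": 1, "movie_c": 4},
    "bob_public": {"movie_a": 5, "movie_d": 3},
}

print(best_match(anonymized_record, public_profiles))
```

The record contains no direct identifier, yet the pattern of ratings is distinctive enough to point at one profile.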


2. The difference between synthetic and anonymized data

Synthetic data is artificially generated data that resembles your original dataset but contains completely fake information. Because its values are valid, it works well for certain tasks such as software testing, but it still has its limitations: the original data is mixed up so thoroughly that it becomes difficult to draw useful information from it.

Unlike synthetic data, anonymized data holds on to some of those important attributes which allow the data to be analyzed for business intelligence purposes. For example, in an anonymized data set you can still ask questions like “What is the most common first name?”, which wouldn’t be possible in a synthetic data set. 

In practice, the two techniques are often combined, and how you combine them depends on the case. You can anonymize data whose key characteristics you need to retain, and synthesize data that is high risk or of no analytical use. Used together, they give you data that is both secure and usable.


3. Confusing anonymization with pseudonymization 

Pseudonymization and anonymization are not the same, according to the GDPR. 

Why? Because pseudonymization is reversible if the original data is accessible. 

For example, say you process a transactions database by removing all personal details and putting a “customer number” there instead. Somewhere else, there is another database that matches each customer number to the customer’s details. If you give out just the transactions database, no one can tell who anyone is. But the process is reversible if the third party gains access to your customer database. The data then becomes identifiable, which makes this pseudonymization rather than anonymization.

With true anonymization, on the other hand, no such key exists, so the values that remain cannot be traced back to the individuals they came from.
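The difference can be shown with a small Python sketch. The field names (name, amount, customer_no) and the helper functions are hypothetical, chosen only to mirror the transactions example; the anonymize step here simply drops the identifier, which is the crudest possible approach and, as mistake 1 shows, is not always enough.

```python
import itertools

def pseudonymize(transactions):
    """Replace each name with a customer number and keep a lookup table."""
    counter = itertools.count(1)
    lookup = {}  # name -> customer number; this is the key that allows reversal
    released = []
    for t in transactions:
        if t["name"] not in lookup:
            lookup[t["name"]] = next(counter)
        released.append({"customer_no": lookup[t["name"]], "amount": t["amount"]})
    return released, lookup

def reidentify(released, lookup):
    """Anyone holding the lookup table can restore the original records."""
    reverse = {number: name for name, number in lookup.items()}
    return [{"name": reverse[r["customer_no"]], "amount": r["amount"]} for r in released]

def anonymize(transactions):
    """Drop the identifier entirely; no lookup table is ever created."""
    return [{"amount": t["amount"]} for t in transactions]

# Hypothetical sample data.
transactions = [
    {"name": "Jane Doe", "amount": 120.0},
    {"name": "John Roe", "amount": 35.5},
    {"name": "Jane Doe", "amount": 12.0},
]

released, lookup = pseudonymize(transactions)
print(reidentify(released, lookup) == transactions)
print(anonymize(transactions))
```

The pseudonymized output is only as safe as the lookup table behind it, while the anonymized output has nothing to reverse.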

4. Anonymizing data destroys the quality

There’s always a tradeoff between the risk that your original data set can be reconstructed and the risk of losing its value. If you anonymize your data correctly, it loses its link to the original dataset, unlike pseudonymized data, where that link is still present and can enable identification. Done well, anonymized data is secure, stays as close to the real thing as possible and still provides value to your business.

Depending on what level of anonymization you require, the data will still provide relevant relationships and properties that you need to make well-informed decisions, all while keeping your data subjects safe. For example, you can still get meaningful website traffic analysis using anonymized data.


5. Keeping original data after anonymization 

If the source data is kept after anonymization takes place, the result is effectively pseudonymized data, which is still considered personal and therefore ‘identifiable’.

Holding on to the original data, or to incorrectly processed copies of it, puts your business at risk. This data can only legally be processed in accordance with the relevant data protection legislation, including GDPR. This is closely related to the next mistake you should be aware of:

6. Anonymizing only one data set or occurrence

So, you’ve successfully deidentified a data set and prepped it for a third party for analysis. 

However, it’s likely that some of this information overlaps with data stored elsewhere in your business, so the two can be linked. Once they are linked, the data becomes identifiable again.

To be safe, you can anonymize all occurrences of the data, reducing the risk of information being linked to individuals. However, you don’t necessarily need to anonymize absolutely everything. Just make sure that all the data sets available to a specific party don’t pose a threat when they are combined.
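One way to check whether combined data sets pose a threat is to count how many records share each combination of attributes, before and after joining. The Python sketch below assumes two hypothetical releases that share a record_id column; the column names, the data and the helper functions are illustrative only. A smallest group of 1 means at least one record is unique and can be singled out.

```python
from collections import Counter

def smallest_group(rows, attributes):
    """Size of the smallest group of rows sharing the same attribute values."""
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    return min(counts.values())

def join_on(left, right, key):
    """Combine two releases row by row using a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Hypothetical releases handed to the same third party.
release_a = [
    {"record_id": 1, "age_band": "30-39"},
    {"record_id": 2, "age_band": "30-39"},
    {"record_id": 3, "age_band": "40-49"},
    {"record_id": 4, "age_band": "40-49"},
]
release_b = [
    {"record_id": 1, "postcode_area": "N1"},
    {"record_id": 2, "postcode_area": "E2"},
    {"record_id": 3, "postcode_area": "N1"},
    {"record_id": 4, "postcode_area": "E2"},
]

combined = join_on(release_a, release_b, "record_id")
print(smallest_group(release_a, ["age_band"]))
print(smallest_group(release_b, ["postcode_area"]))
print(smallest_group(combined, ["age_band", "postcode_area"]))
```

In this toy data, every record hides in a group of two within each release, but once the releases are joined every record becomes unique.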

7. All cases of anonymization are the same 

The context of your data’s purpose determines the type of anonymization that needs to occur. 

You can use individual anonymization techniques or combinations of them, depending on the size and sensitivity of your data. You may also pair these with other privacy best practices.

8. ‘Differential Privacy’ vs Anonymization

Differential privacy is another trending data protection technique that companies such as Apple, Google and Uber use.

Put simply, differential privacy is one of many types of privacy protection. It is a mathematical definition of privacy in the context of statistical and machine learning analysis. It’s a useful, albeit complicated, approach that lets you measure how much privacy an analysis of a database preserves, rather than anonymizing the underlying records themselves. Another problem with differential privacy is that it struggles to provide useful results for smaller samples in the way anonymization can.
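The small-sample problem is easier to see with numbers. The Python sketch below uses the Laplace mechanism, one common way to achieve differential privacy for count queries; the epsilon value, the counts and the function names are our own assumptions for illustration. The noise added to a count has the same typical size whether the true count is 5 or 5,000, so it swamps the small one.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Noisy count. Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

def mean_relative_error(true_count, epsilon, trials, rng):
    """Average size of the noise relative to the true count."""
    total = sum(
        abs(private_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials / true_count

# Hypothetical settings: same privacy level, very different sample sizes.
epsilon = 0.1
small = mean_relative_error(5, epsilon, 2000, random.Random(1))
large = mean_relative_error(5000, epsilon, 2000, random.Random(1))
print(f"relative error for a count of 5:    {small:.2f}")
print(f"relative error for a count of 5000: {large:.4f}")
```

A smaller epsilon means stronger privacy but more noise, so the usable sample size is part of the tradeoff you choose.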

Lessons learned

Mistakes and misconceptions are common when it comes to anonymizing data. We’ve covered eight of the biggest in this blog post.

Put simply, gaining control of your data and complying with data protection laws, all while being able to draw value from your data, is of the utmost importance for your business. 

Anonymization lets you benefit from your data safely by removing sensitive details while keeping the data as close to the original source as possible.

So, are you falling into any of these anonymization traps? Perhaps it’s time to re-evaluate your techniques.
