Saving Face - Exploring Methods for Image Anonymisation
In this second blog in our data privacy dialogue series, we take a closer look at how we are using face detection to safeguard privacy in our iMCD (integrated Multimedia City Data) project.
Anonymisation for research
As mentioned in our last blog on data privacy, our iMCD project participants wore Autographer lifelogging devices that captured images as they went about their daily lives. Although they were willing to have their activities and interactions recorded, people they encountered would not have been aware they were being photographed.
External researchers will, after gaining approval, be able to use our iMCD data sets via UBDC's third-party safe haven provider, so it is important that we anonymise any elements of the data that could raise privacy issues before they are released.
There isn't a consensus on what constitutes a violation of privacy in images taken in public spaces, but many people seem to agree that Google Street View's policy of blurring human faces and vehicle number plates is a suitable approach. We therefore decided to develop a method for blurring human faces in the hundreds of thousands of iMCD images.
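Once a face has been located, the blurring step itself is simple. The sketch below is a hypothetical illustration, not the iMCD code: it uses pixelation (replacing each small tile with its average value) as a stand-in for blurring, works on a grayscale image stored as a list of rows, and the function name `pixelate_region` and the default tile size are our own assumptions. A Gaussian blur, as offered by image libraries such as OpenCV, would be the closer match to Street View's approach.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of 0-255 intensities


def pixelate_region(image: Image, x: int, y: int, w: int, h: int, block: int = 8) -> Image:
    """Return a copy of `image` with the box (x, y, w, h) pixelated.

    Every block x block tile inside the box is replaced by its mean
    intensity, removing the fine detail needed to recognise a face.
    """
    out = [row[:] for row in image]  # never modify the original image
    rows, cols = len(image), len(image[0])
    y0, x0 = max(y, 0), max(x, 0)
    y1, x1 = min(y + h, rows), min(x + w, cols)  # clip the box to the image
    for ty in range(y0, y1, block):
        for tx in range(x0, x1, block):
            tile = [(r, c)
                    for r in range(ty, min(ty + block, y1))
                    for c in range(tx, min(tx + block, x1))]
            mean = sum(image[r][c] for r, c in tile) // len(tile)
            for r, c in tile:
                out[r][c] = mean
    return out


# Usage: pixelate the top-left 2x2 corner of a tiny test image.
img = [[0, 10, 0, 10],
       [10, 0, 10, 0],
       [5, 5, 5, 5],
       [5, 5, 5, 5]]
print(pixelate_region(img, 0, 0, 2, 2, block=2)[0])  # [5, 5, 0, 10]
```

Larger tiles destroy more detail; the right size depends on image resolution and how large the faces appear.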
Facing the challenges of detection
To blur the faces in the iMCD images, we first needed a machine learning process to detect them. Computers don't see faces as we do: face detection works by identifying the regular visual pattern of facial features (eyes, nose and mouth) within a given image region.
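One way to capture such a pattern numerically is with Haar-like features, which compare the total brightness of neighbouring rectangles. For example, the band across the eyes is usually darker than the cheeks below it. The sketch below is a hand-written illustration of the idea, not OpenCV's implementation. It builds an integral image so that any rectangle's sum costs only four lookups, then evaluates one vertical two-rectangle feature. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def integral_image(image: Image) -> List[List[int]]:
    """ii[r][c] = sum of image[0..r-1][0..c-1], padded with a zero row and column."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        running = 0  # sum of row r so far
        for c in range(cols):
            running += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + running
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left corner is (x, y), in constant time."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Bottom half minus top half: large when a dark band sits above a bright one."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top


img = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]  # dark band above a bright band
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 4))          # 1680
print(two_rect_feature(ii, 0, 0, 4, 4))  # 1520
```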
To anonymise our images, we decided to use a method based on Haar cascade classifiers, as implemented in the open-source Open Source Computer Vision library (OpenCV). A trained cascade uses around 6,000 features to decide whether a given image region contains a face. The 'cascade' part describes a process of elimination: the features are grouped into a sequence of stages, and as soon as a region fails a stage it is discarded and the remaining stages are skipped, rather than pointlessly continuing the process. Most of the computation is therefore spent on regions where a face is more likely, which makes this a fast method compared with many others.
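The early-rejection logic of a cascade can be sketched in a few lines. This toy is hypothetical: real OpenCV cascades are trained and stored in XML files with many more stages and features, whereas the stage layout, thresholds and weights below are invented, and each window is summarised by precomputed feature responses for simplicity. The second value returned counts how many weak classifiers were evaluated, which shows why background regions are cheap to reject.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Stage:
    # Weak classifiers as (feature_index, feature_threshold, vote_weight).
    weak: List[Tuple[int, float, float]]
    # The window survives this stage only if its total vote reaches this value.
    stage_threshold: float


def evaluate_cascade(stages: List[Stage], features: Sequence[float]) -> Tuple[bool, int]:
    """Run a window's feature responses through the cascade.

    Returns (is_face, weak_classifiers_evaluated). Evaluation stops at the
    first stage that rejects the window.
    """
    evaluated = 0
    for stage in stages:
        score = 0.0
        for idx, threshold, weight in stage.weak:
            evaluated += 1
            if features[idx] > threshold:
                score += weight
        if score < stage.stage_threshold:
            return False, evaluated  # rejected: later stages never run
    return True, evaluated  # passed every stage: report a face


# Hypothetical two-stage cascade: a cheap 1-feature stage, then a 2-feature stage.
toy = [Stage([(0, 0.5, 1.0)], 1.0),
       Stage([(1, 0.5, 1.0), (2, 0.5, 1.0)], 2.0)]
print(evaluate_cascade(toy, [0.9, 0.9, 0.9]))  # (True, 3)
print(evaluate_cascade(toy, [0.1, 0.9, 0.9]))  # (False, 1)
```

Putting the cheapest, most discriminating stages first is what lets most of an image be dismissed after very little work.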
This is a powerful method for face detection, but it is not without challenges, the main one being that faces aren't uniformly positioned in images. They may face the camera directly, appear in profile, or be tilted. Faces also vary in size: those closer to the camera appear larger than those in the background. To account for these variables, our researchers developed a strategy that rotates each image and analyses it at multiple angles to catch tilted faces, and searches at several scale factors to catch faces of different sizes.
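The scanning strategy can be pictured as a grid of passes: one per combination of rotation angle and detection-window size. In OpenCV, the multi-scale part is handled by `detectMultiScale`, whose `scaleFactor` and `minSize` parameters control how much the search window grows between sizes and where it starts. The sketch below only enumerates the passes and maps a detection found in a rotated image back to the original. The angles, sizes and function names are illustrative assumptions, not the values used for iMCD.

```python
import math
from typing import List, Tuple


def window_sizes(min_size: int, max_size: int, scale_factor: float = 1.1) -> List[int]:
    """Square window sizes from min_size up to max_size, each scale_factor times the last."""
    sizes, size = [], float(min_size)
    while size <= max_size:
        sizes.append(int(round(size)))
        size *= scale_factor
    return sizes


def scan_passes(angles: List[float], sizes: List[int]) -> List[Tuple[float, int]]:
    """Every (rotation angle, window size) combination scanned for one image."""
    return [(a, s) for a in angles for s in sizes]


def to_original_coords(x: float, y: float, angle_deg: float,
                       cx: float, cy: float) -> Tuple[float, float]:
    """Map a point found in an image rotated by angle_deg about (cx, cy) back.

    Assumes the forward rotation applied the standard rotation matrix with
    +angle_deg; check the sign convention of the rotation routine you use.
    """
    t = math.radians(-angle_deg)  # undo the rotation
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(t) - dy * math.sin(t),
            cy + dx * math.sin(t) + dy * math.cos(t))


sizes = window_sizes(24, 60, scale_factor=1.5)  # [24, 36, 54]
passes = scan_passes([-20.0, 0.0, 20.0], sizes)  # 3 angles x 3 sizes = 9 passes
print(sizes, len(passes))
print(to_original_coords(11, 10, 90, cx=10, cy=10))  # approximately (10.0, 9.0)
```

Each extra angle multiplies the number of passes, so the choice of angles trades coverage of tilted faces against processing time.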
As a result, every image is analysed multiple times, which raises the likelihood of false positives (areas of an image that are blurred because a face has been wrongly detected) and therefore of image data being lost.
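A back-of-the-envelope calculation shows why repeated analysis inflates false positives. Assume, purely for illustration, that each pass over an image has an independent probability $p$ of producing at least one false detection, and that an image receives $n$ passes. The values $p = 0.01$ and $n = 9$ (for example, 3 angles times 3 scales) are hypothetical.

```latex
P(\text{at least one false positive}) = 1 - (1 - p)^{n}

% Hypothetical example: p = 0.01, n = 9
1 - (1 - 0.01)^{9} = 1 - 0.99^{9} \approx 1 - 0.9135 = 0.0865
```

A 1% chance per pass thus becomes roughly an 8.6% chance per image. Passes over the same image are not truly independent, so treat this as an illustration of the trend rather than a prediction.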
Given the sensitivity of the data, we are adopting a conservative approach and have decided that some false positives (and the data lost with them) are preferable to undetected faces in our iMCD images. Google Street View detects 89% of faces but also has its fair share of false positives, such as the popular story of it protecting a cow's identity. We are aiming to detect 95% of faces and will accept a visual loss of up to 10% from false positives if this ensures privacy. This process will need reviewing: we will inspect the archive of anonymised images to see whether the false positives can be used to refine the methodology.
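Targets like these can be checked with two standard measures: recall (the share of real faces that were blurred, the measure behind Street View's 89% figure) and precision (the share of blurred regions that really are faces). A conservative strategy pushes recall up at the expense of precision. The sketch below computes both from hand-labelled counts, plus the fraction of an image's area lost to false-positive boxes. All counts and boxes are invented, and the function names are hypothetical; the pixel-set approach is chosen for clarity, not speed.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h)


def detection_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Recall = share of real faces blurred; precision = share of blurred boxes that are faces."""
    return {
        "recall": tp / (tp + fn) if tp + fn else 1.0,
        "precision": tp / (tp + fp) if tp + fp else 1.0,
    }


def false_positive_area_fraction(fp_boxes: List[Box], width: int, height: int) -> float:
    """Fraction of image pixels covered by false-positive boxes (overlaps counted once)."""
    covered = set()
    for x, y, w, h in fp_boxes:
        for r in range(max(y, 0), min(y + h, height)):
            for c in range(max(x, 0), min(x + w, width)):
                covered.add((r, c))
    return len(covered) / (width * height)


m = detection_metrics(tp=95, fp=20, fn=5)  # invented counts
area = false_positive_area_fraction([(0, 0, 10, 10), (5, 5, 10, 10)], 100, 100)
print(m["recall"], area)  # 0.95 0.0175
print(m["recall"] >= 0.95 and area <= 0.10)  # True
```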
Potential public benefit from anonymised personal images
This blog is a brief summary of a large and complex piece of ongoing work here at UBDC to anonymise this data. We want to ensure that we are doing as much as possible to protect people's privacy and, as mentioned above, we will always err on the side of caution when it comes to the images we provide to researchers – even if this results in false positives in face detection.
However, it is also important to note the potentially valuable research that can be done with these images once they are anonymised. For example, the images could be used to study at what point pedestrians react to oncoming vehicles when trying to cross the road. There has also been interest from qualitative researchers in viewing people's daily routines to determine how much time they spend alone compared with time spent interacting with others. As always at UBDC, we want to ensure that private data is used for public benefit.
We'd like to hear your thoughts on image anonymisation. Do you agree with our approach? Do you think that face detection should be used more often, such as for images uploaded to social media channels? Do you have any ideas for research that could be undertaken using anonymised images such as these? Please use the comments form below or contact us via our social media channels to let us know what you think.