Computers, Privacy & the Constitution

Do users have a right to control data about themselves?

-- By LeonHuang - 05 Mar 2017 (Revised - 17 May 2017)

Service providers typically keep anonymized records of how users use their services, and user agreements typically require users to consent to this practice.

Why do you say these records are "anonymized"? That's the one thing they are pretty sure not to be.

Users who care about both convenience and privacy may be motivated to argue that they should have a say over data about themselves. Is this position well-justified?

What does "well-justified" mean? Legally, ethically, logically?

Generally speaking, you would want to begin an essay with an idea, not an open-ended question. The reader needs to know why she should read the essay. Obviously, if the reader is a philosopher who doesn't care about time, whether or not to have an idea may be precisely the stage at which she likes to begin reading. But for the remaining fraction of humanity, you haven't provided a reason to read the essay, and that means it will most likely not be read.

Review of the Argument

The position appeals to our intuition in two ways. First, the content of the records is produced by the users. Users may regard themselves as the authors of such data and hence as its owners. Second, users expect to have control over their privacy. The records of users’ behavior contain private information that users do not expect others to know, and users would therefore think that they should have some say in how the records are used.

Once again the unclarity about what sort of argument is being rehearsed makes reading difficult. One sentence seems to be discussing a legal concept, ownership, while the next appears to be stating a political principle, though whether the principle is that things other people don't know we should have some say in controlling, or whether we have some stake in controlling things others know too is left vague.

There are counterarguments too. The keeping of the records is performed by the service providers. Although the content of the records depends largely on users’ behavior, the service providers are nevertheless the authors of the records, in the same way that biographers, rather than their subjects, are the authors of biographies. In addition, anonymization alleviates the privacy concerns. After identity-sensitive information is stripped from the dataset, the record of each individual’s behavior becomes less likely to violate that individual’s privacy. The data is no longer about each of the individual users but rather about a set of users sharing some demographic characteristics. Users cannot be justified in arguing for more when the data about them is perfectly anonymized.

Cannot, why? What is "perfectly anonymized" data? Why is data always either about one person or about nobody? If someone is keeping track of all Jews, all Muslims, or all Tutsi, that's not about privacy because privacy is only about one person at a time? Perhaps the definition of "privacy" is at fault. But we don't know what that definition is because you haven't given one.

But perfect anonymization is a high standard. Stripping away identifiable information is not always enough. For example, the TLC Trip Record Data provides publicly available information on the dates, times, and locations of all taxi pick-ups and drop-offs in New York City in a given year. Although the data does not include the identity of the passengers, it nevertheless increases the risk of privacy violations when it is used in conjunction with other publicly available information. Celebrity cab rides become easier to identify. Anyone can find a photo of a celebrity getting into a cab and use the date, time, and location to find out where the celebrity was going, and in turn where the celebrity lives. The average Joe faces the same risks. An acquaintance may easily find out where you live after seeing you leave in a cab after work. This tension between privacy and highly accurate geolocation data has led to a proposal that the TLC should reveal only census tracts instead of exact coordinates.1
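The linkage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trip records, the field names, and the date are invented and do not reflect the actual TLC schema, and rounding coordinates to two decimal places is a crude stand-in for census tracts, not the method in the proposal. The sketch matches an outside observation (for example, a photo with a known time and place) against name-free trip records, first with exact coordinates and then with coarsened ones.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trip:
    """One name-free trip record (hypothetical fields, not the TLC schema)."""
    pickup_time: datetime
    pickup_lat: float
    pickup_lon: float
    dropoff_lat: float
    dropoff_lon: float


# Invented records standing in for a published trip dataset.
TRIPS = [
    Trip(datetime(2013, 7, 8, 22, 14), 40.7614, -73.9776, 40.7736, -73.9566),
    Trip(datetime(2013, 7, 8, 22, 15), 40.7618, -73.9771, 40.7033, -74.0170),
    Trip(datetime(2013, 7, 8, 22, 16), 40.7480, -73.9857, 40.7306, -73.9866),
    Trip(datetime(2013, 7, 8, 23, 40), 40.7615, -73.9777, 40.8075, -73.9626),
]


def candidates(trips, seen_at, lat, lon, window_minutes=5, tolerance_deg=0.0003):
    """Return trips whose pickup is close in time and space to an observation."""
    window = timedelta(minutes=window_minutes)
    return [
        t for t in trips
        if abs(t.pickup_time - seen_at) <= window
        and abs(t.pickup_lat - lat) <= tolerance_deg
        and abs(t.pickup_lon - lon) <= tolerance_deg
    ]


def coarsen(trip, decimals=2):
    """Round all coordinates to a coarse grid (assumed proxy for census tracts)."""
    return Trip(
        trip.pickup_time,
        round(trip.pickup_lat, decimals),
        round(trip.pickup_lon, decimals),
        round(trip.dropoff_lat, decimals),
        round(trip.dropoff_lon, decimals),
    )


# The outside observation: where and when the rider was seen entering a cab.
seen_at = datetime(2013, 7, 8, 22, 14)
seen_lat, seen_lon = 40.7614, -73.9776

exact_matches = candidates(TRIPS, seen_at, seen_lat, seen_lon)
coarse_matches = candidates(
    [coarsen(t) for t in TRIPS], seen_at, round(seen_lat, 2), round(seen_lon, 2)
)

print("matches with exact coordinates:", len(exact_matches))
print("matches with coarse coordinates:", len(coarse_matches))
```

In this invented data, exact coordinates single out one trip, and its drop-off point, while the coarse grid leaves two candidates with different destinations. Coarsening reduces the risk without removing it, which is the sense in which perfect anonymization is a high standard.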

Anonymized data is still capable of revealing key information about users’ identity when it is used in combination with data from other sources. Two datasets can be linked when they overlap significantly, and such overlap is easier to find than one would expect. In the case of the TLC Trip Record Data, date, time, and location provide enough overlap to link a particular cab ride to the rider who was photographed. In the case of purportedly anonymized records collected by cloud service providers, there have already been efforts to combine such data from multiple sources to construct complete profiles.2 The combination of datasets dilutes the responsibility of each service provider from whom the datasets are obtained. The more service providers are complicit, the more diluted the individual responsibility of each.

I don't understand the idea of diluted responsibility. Is that like when three polluters each put one poison into a river?

A Way Out?

Upon review, I realized that it may not be productive to argue for the right to control data about ourselves in order to seek changes in the practices of service providers. I cannot avoid making a choice between convenience and privacy. The calculus partly depends on exactly how much convenience I would have to sacrifice. In March I set out to install a personal cloud in order to obtain a rough frame of reference.

After two hours of preliminary research, I learned what kind of hardware I needed to purchase: a single-board computer such as a Raspberry Pi plus accessories such as an SD card for storage, for a total of $86.92. Once I obtained the equipment, I spent another two hours learning how to install the operating system image and how to communicate with the equipment from my laptop. Then I spent four hours installing personal cloud software on the equipment, following a guide that I found online. Finally, in order to make my cloud accessible from outside my local network, I spent another two hours setting up port forwarding and dynamic DNS. In sum, I spent 10 hours of my time and $86.92 of my dime to set up a workable personal cloud with 32GB of storage. In return, I gained the freedom to use cloud storage without being forced to have my data collected by someone else.

It turns out the tradeoff is not limited to the initial setup cost. My personal cloud is excruciatingly slow compared to the established cloud services. It sometimes stops responding until I reboot the equipment, making it rather unreliable as a service intended for remote access. And I still cannot trust its security, because I know it is set up and maintained by an amateur with little knowledge of network security and little time even to keep its operating software up to date.

I cannot think of any files that I need to access remotely from time to time, for which I am willing to tolerate the quirkiness of my personal cloud in order to prevent service providers from harvesting data on me, and that are not so sensitive that I would worry about targeted hacking. In the end, I did not know what to do with my personal cloud. I pulled the plug at the end of April.

This anecdote doesn't seem to me related in any way to the previous analysis. I do think that it would probably have been a good idea to consult somebody else rather than trying to do your engineering by the light of nature. By spending $150 instead of $90, you could have had a single-board computer much faster than a Raspberry Pi, and if you had installed FreedomBox software your personal cloud would have done a great deal more for you than it appears you could figure out yourself how to do, which isn't surprising considering that dozens of experts have already been working on it for seven years and you took a couple of hours.

But how did this help us to determine whether something or other was "justified"? And what was the idea the essay wanted to communicate to the reader? I think it was "privacy, whatever that is, isn't very important even though we intuit at first that it might be. But mostly it's just a lot of unnecessary trouble." Did I get that right?



r3 - 27 Sep 2017 - 15:38:33 - EbenMoglen
All material on this collaboration platform is the property of the contributing authors.
All material marked as authored by Eben Moglen is available under the license terms CC-BY-SA version 4.