Privacy in the social world – the Google+ way

In the online social world, the current buzz is the launch of Google+, by Google of course. Given the popularity of Facebook and Twitter, it is not difficult to see that Google+ has to offer something significantly different to be a real challenge to the main social networks. So, first, a little background on what I think Google+ is. Well, it is a social network with the characteristics of Facebook, i.e. you can add friends, photos etc. In addition, it has the characteristics of Twitter – you can post messages, only longer than Twitter’s 140 characters, and you can follow (and be followed by) people.

Facebook has faced criticism with regard to privacy issues. But how is Google+ attempting to address that? Well, users can put friends in ‘circles’. These circles are more like partitions than access groups. Default circles include ‘family’, ‘acquaintances’, and ‘friends’. You can share posts, pictures etc. with only selected circles.
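To make the sharing model concrete, here is a minimal sketch in Python. It is not how Google+ is implemented: the `circles` mapping, the `Post` class and `can_see` are hypothetical names for illustration, and I am assuming that a post is visible to anyone who sits in at least one of the circles it was shared with.

```python
from dataclasses import dataclass, field

# Hypothetical model: each circle is simply a named set of friends.
circles = {
    "family": {"alice", "bob"},
    "friends": {"carol", "dave"},
    "acquaintances": {"erin"},
}

@dataclass
class Post:
    text: str
    shared_with: set = field(default_factory=set)  # names of circles

def can_see(post, viewer, circles):
    """Return True if the viewer belongs to any circle the post was shared with."""
    return any(viewer in circles.get(name, set()) for name in post.shared_with)

# A post shared with the 'family' circle only.
holiday = Post("Holiday photos", shared_with={"family"})
```

Under these assumptions, `can_see(holiday, "alice", circles)` is true and `can_see(holiday, "carol", circles)` is false. Anything shared outside this check (search indexing, say) is beyond what the sketch models.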

From my personal experience in the last few days that I have played with Google+, it seems straightforward enough to drag and drop friends into different circles and create new circles if necessary. If it works the way I understand it is supposed to work, then I think many of the Facebook privacy issues will be addressed. This, of course, is ignoring that Google+ is integrated with Gmail and that Google search uses Gmail and Google+ data for its own competitive advantage.

What doesn’t surprise me, though, is that in the 26 days that Google+ has been operational the majority of the 18+ million users who have adopted it (well, most likely just trying it out) are tech users. Of course there may be a number of reasons for this, but in my view tech users understand issues around privacy and have some idea of how they can achieve privacy in their various social interactions (please note how loosely I use the term ‘privacy’ in this article). When a product promises and attempts to address privacy concerns, tech users are likely to adopt it early on and experiment with it.

But if my assumption here is correct, how long will it take for the almost 1 billion Facebook users to realise and jump ship? Oh, wrong question. Are Facebook users actually concerned about their privacy? If they are, will they know when a better product comes on the market (not that Google+ is that product)? And most importantly, will they jump ship once the product all privacy campaigners are yearning for arrives? In attempting to answer these questions, let’s ignore the effect of externalities. I would argue that since Facebook users DO NOT understand or care about privacy (I would like to see a study that nullifies this claim), a robust and privacy-enhanced social networking tool is unlikely to take off, at least among the general population.

local news

I’m delighted to say that Professor Sadie Creese will be joining the Department of Computer Science – hopefully in October, but perhaps later – to become Professor of Cyber Security and bring leadership to our activity in that area.

Prof. Creese studied for her DPhil in Oxford, with Bill Roscoe as supervisor.  She then worked at QinetiQ before moving to Warwick University.  Coming with her will be Professor Michael Goldsmith and about eight other research staff.  The objective of this move is to create a large centre of expertise in Oxford, able to take an internationally-leading role in research around cyber security, information assurance, and related fields.  This is of course a major step forward in the vision I have been touting for some time (my ‘world domination plan’ as Ivan put it), and has every prospect of making Oxford an even more attractive partner for funders and other projects.  We will be looking for ways to enhance cross-disciplinary working in order that we can make genuine steps forward in this area.


disk erasure

A recent pointer to Peter Gutmann’s new book on security engineering (looks good, by the way) reminds me that Gutmann’s name is associated with the woeful tale of disk erasure norms.

The argument goes this way: ‘normal’ file erases (from your Windows shell, say) merely change pointers in tables, and do not remove any data from the disk.  A fairly unsophisticated process will recover your ‘deleted’ files.  Wiser people ensure that the file is deleted from the media itself – by writing zeros over the sectors on the disk that formerly contained the file.  Gutmann’s argument was that because of minor variations in disk head alignment with the platters, this is insufficient to ensure the complete removal of the residual magnetic field from the former data.  There is a possibility that, with the right equipment, someone could recover the formerly-present files.  So he has an algorithm involving, I think, 35 passes, writing various patterns, calculated to destroy any remaining underlying data.
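For concreteness, the single zeroing pass can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zero_fill` and `zero_and_delete` are names I have made up, the code does one pass rather than the 35-pass scheme, and it assumes the file system writes the zeros over the same sectors the file already occupies. That assumption may not hold on journaling or copy-on-write file systems, or on flash storage with wear levelling.

```python
import os

def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite a file's contents with zeros, in place, without changing its length."""
    remaining = os.path.getsize(path)
    with open(path, "r+b") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the zeros out to the device

def zero_and_delete(path):
    """Zero the file first, and only then remove its directory entry."""
    zero_fill(path)
    os.remove(path)
```

A plain delete corresponds to the `os.remove` call alone, which is exactly the step that leaves the data on the disk.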

Now, the problem appears/appeared real enough: various government standards have, for decades now, ruled that magnetic media which has held classified material cannot be declassified but must be destroyed before leaving secure custody.  Whether anyone has ever managed to recover a non-trivial amount of data from a once-zeroed disk is much less clear: as far as I know, there’s not a lot in the open literature to suggest it’s possible, and none of the companies specializing in data recovery will offer it as a service.  Furthermore, since Gutmann did his original work, disk design has evolved (and the ‘size’ of the bits on the disk has become so small that any residual effect is going to be truly minimal), and disk manufacturers have built a ‘secure erase’ into their controllers for quite a few years now.  Even better, the new generation of self-encrypting drives can be rendered harmless by the deletion of just one key (don’t do this by accident!).

Yet, the perception persists that the simple solutions are insufficient. Let us leave aside Government security standards and think simply of commercial risk.  Multi-pass erasure is downright time-consuming.  You can buy a disk-shredding service –  but this attracts quite a fee.  So it is not uncommon simply to archive used disks in a warehouse somewhere (with or without a single zeroing pass, I suppose).  How long you would keep those disks for is unclear: until their data ceases to be valuable, I suppose.  But without a detailed inventory of their contents, that cannot easily be determined.  So perhaps you have to keep them forever.

My simple question is: which attracts the lower risk (and/or the lower total predicted cost)? (a) Zeroing a disk and putting it in a skip, or (b) Warehousing it until the end of the lifetime of the data it holds?  You can postulate whatever adversary model you wish.  The answer is not obvious to me.  And if we can’t make a simple determination about risk in this case (because, frankly, the parameters are all lost in the noise), what possible chance do we have of using risk calculations to make decisions in the design of more complex systems?
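The comparison can at least be written down. A minimal sketch, assuming the simplest possible model (expected cost = up-front cost + probability of compromise × loss if compromised); the function name is mine, and every number below is a made-up placeholder rather than an estimate of anything.

```python
def expected_cost(fixed_cost, p_compromise, loss_if_compromised):
    """Expected total cost: what you pay up front plus the probability-weighted loss."""
    return fixed_cost + p_compromise * loss_if_compromised

# Placeholder inputs only: none of these figures comes from real data.
zero_and_skip = expected_cost(fixed_cost=5.0, p_compromise=1e-6,
                              loss_if_compromised=1_000_000)
warehouse = expected_cost(fixed_cost=50.0, p_compromise=1e-5,
                          loss_if_compromised=1_000_000)
```

With these placeholders the skip looks cheaper, but raise the first probability guess by two orders of magnitude and the ranking reverses. Since nobody can pin those probabilities down to anything like that precision, the arithmetic is easy and the answer is still not obvious.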

Webinos versus Meego

One of the systems security projects we’re working on in Oxford is webinos – a secure, cross-device web application environment.   Webinos will provide a set of standard APIs so that developers who want to use particular device capabilities – such as location services, or media playback – don’t need to customise their mobile web app to work on every platform.  This should help prevent the fragmentation of the web application market and is an opportunity to introduce a common security model for access control to device APIs.  Webinos is aimed at mobile phones, cars, smart TVs and PCs, and will probably be implemented initially as a heavy-weight web browser plugin on Android and other platforms.

By a staggering coincidence, the Meego project has a similar idea and a similarly broad range of devices it intends to work on.  However, Meego is aimed at native applications, and is built around the Qt framework.  Meego is also a complete platform rather than a browser plugin, containing a Linux kernel.  Meego requires that all applications are signed, and can enforce mandatory access controls through the SMACK Linux Security Module.

In terms of security, these two projects have some important differences.  Meego can take advantage of all kinds of interesting trusted infrastructure concepts, including Trusted Execution Environments and Trusted Platform Modules, as it can instrument the operating system to support hardware security features.  Meego can claim complete control of the whole platform, and mediate all attempts to run applications, checking that only those with trusted certificates are allowed (whitelisting).  Webinos has neither of these luxuries.  It can’t insist on a certain operating system (in fact, we would rather it didn’t) and can only control access to web applications, not other user-space programs.  This greatly limits the number of security guarantees we can make, as our root of trust is the webinos software itself rather than an operating system kernel or tamper-proof hardware.
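The whitelisting decision itself is simple; what matters is where it runs. Here is a rough sketch in Python of an allowlist keyed on certificate fingerprints. It is not Meego’s or webinos’s actual mechanism: `fingerprint`, `TRUSTED` and `may_run` are hypothetical names, the certificate bytes are dummies, and real signature verification is left out entirely.

```python
import hashlib

def fingerprint(cert_bytes):
    """SHA-256 fingerprint of a certificate, as a hex string."""
    return hashlib.sha256(cert_bytes).hexdigest()

# Hypothetical allowlist; in practice this would be provisioned by the platform owner.
TRUSTED = {fingerprint(b"example signing certificate A")}

def may_run(app_cert_bytes, trusted=TRUSTED):
    """Allow an application only if its signing certificate is on the allowlist."""
    return fingerprint(app_cert_bytes) in trusted
```

A platform that owns the kernel can call a check like this before every program launch. A browser plugin can only call it for the web applications it mediates, and anything else in user space never passes through it.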

This raises an interesting question.  If I am the developer of a system such as webinos, can I provide security to users – who may entrust my system with private and valuable data – without having full control of the complete software stack?  Is the inclusion of a hardened operating system necessary for me to create a secure application?  Is it reasonable for me to offload this concern to the user and the user’s system administrator (who are likely to be the same person)?

While it seems impractical for developers to ship an entire operating system environment with every application they create, isn’t this exactly what is happening with the rise of virtualization?


email retention

Someone in the group suggested a blog post on email retention.  It’s a good topic, because it tracks the co-evolution of technology and process.

The Evolution

Back in the day, storage was expensive – relative to the cost of sending an email, anyway.   To save space, people would routinely delete emails when they were no longer relevant.

Then storage got cheap, and even though attachments got bigger and bigger, storing email ceased to be a big deal.  By the late 1990s, one of my colleagues put it to me that the time you might spend deciding whether or not to delete something already cost more than the cost of storing it forever.  I have an archive copy of every email I’ve sent or received – in professional or personal capacities – since about 1996 (even most or all of the spam).

Happily, one other technology helped with making email retention worthwhile: indexing.  This, too, is predicated on having enough storage to store the index, and having enough CPU power to build the index.  All of this we now have.

However, a third force enters the fray: lawyers started discovering the value of emails as evidence (even if email is rubbish as evidence: it is trivially forged, so it needs massive amounts of corroboration). And many people – including some senior civil servants, it seems – failed to spot this in time, and were very indiscreet in their subsequently-subpoenaed communications.

As a result, another kind of lawyer – the corporate lawyer – issued edicts which first required, and then forced, employees of certain companies to delete any email more than six months old.  That way, the emails cannot be ‘discovered’ in an adverse legal case, because they have already been erased.

Never mind that many people’s entire working lives are mediated by email today: the email is an accurate and (if permitted) complete record of decisions taken and the process by which that happened.  Never mind that emails have effectively replaced the hardbound notebooks that many engineers and scientists would use to retain their every thought and discussion (though in many places good lab practice retains such notebooks).  Never mind that although it is creaking under the present strain of ‘spam’ and ‘nearly spam’ (the stuff that I didn’t want, but was sent by coworkers, not random strangers), we simply haven’t got anything better.

The state of the art

So now, in those companies, there is no email stored for more than six months, yes?  Well, no.  Of course not.  There are lots of emails which are just too valuable to delete.  And so people extract them from the mail system and store them elsewhere.  Or forward them to private email accounts.  There are, and always will be, many ways to defeat the corporate mail reaper.  The difference is that the copies are not filed systematically, are not subject to easy search by the organisation, and will probably not be disclosed to regulators or in other legal discovery processes.  This is the state of the art in every such organisation I’ve encountered (names omitted to protect the … innocent?).
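Part of the trouble is that the reaper itself is trivial, and it only ever sees the mail store. A minimal sketch in Python, assuming “six months” is approximated as 183 days and that messages are plain `(message_id, received_at)` pairs; both of those, and the function name, are my own simplifications rather than any real mail system’s interface.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=183)  # assumption: "six months" taken as 183 days

def split_by_retention(messages, now):
    """Partition (message_id, received_at) pairs into kept and expired lists."""
    cutoff = now - RETENTION
    kept = [m for m in messages if m[1] >= cutoff]
    expired = [m for m in messages if m[1] < cutoff]
    return kept, expired
```

Everything in `expired` gets deleted from the server. The copy somebody saved to a USB stick or forwarded to a private account last week is simply not in `messages`, so the policy never touches it.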

Sooner or later, a quick-witted external lawyer is going to guess that this kind of informal archiving might help their case, and is going to manage to dig deeper into the adversary’s filestores and processes.  When they find some morsels of beef secreted in unlikely places, there will be a rush of corporate panic.

The solution

It’s easy to spot the problem.  It’s much harder to know what to do about it.  Threatening the employees isn’t very productive – especially if the ‘security’ rule is at odds with the goal of getting their work done.  Making it a sacking offence, say, to save an email outside the corporate mail  system is just going to make people more creative about what they save, and how they save it.  Unchecked retention, on the other hand, will certainly leave the organisation ‘remembering’ things it would much rather it had ‘forgotten’.

At least it would be better if the ‘punishment’ matched the crime: restricting the retention of email places the control in the wrong place.   It would be much better to reserve the stiff penalties for those engaged in libel, corporate espionage, anticompetitive behaviour, and the rest.  Undoubtedly, that would remain a corporate risk, and a place where prevention seems better than cure: the cost to the organisation may be disproportionately greater than any cost that can be imposed upon the individual.  But surely it’s the place to look, because in the other direction lies madness.

Spectacular Fail!

Step 1.  Install “Windows XP Mode” using Microsoft Virtual PC on Microsoft Windows 7.

Step 2. Windows XP warns that there is no anti-virus program installed.  Use the supplied Internet Explorer 6 to try to download Microsoft Security Essentials.  Browsing to the page fails.   I happen to know that this is a manifestation of a browser incompatibility.

Step 3. Use the Bing search box on the default home page of Internet Explorer 6 to search for “Internet Explorer 9”.   You have to scroll down a long way before finding a real Microsoft link: who knows how much malware the earlier links would serve up?!

Words fail me, really.

Outsourcing undermined

The current headlong rush towards cloud services – outsourcing, in other words – leads to increasingly complex questions about what the service provider is doing with your data.  In classical outsourcing, you’d usually be able to drive to the provider’s data centre, and touch the disks and tapes holding your precious bytes (if you paid enough, anyway).  In a service-oriented world with global IT firms using data centres which follow the cheapest electricity, sometimes maybe themselves buying services from third parties, that becomes a more difficult task.

A while ago, I was at a meeting where someone posed the question “What happens when the EU’s Safe Harbour Provisions meet the Patriot Act?”.  The former is the loophole by which personal data (which normally cannot leave the EU) is allowed to be exported to data processors in third countries, provided they demonstrably meet standards equivalent to those imposed on data processors within the EU.  The latter is a far-reaching piece of legislation allowing US law enforcement agencies powers of interception and seizure of data.  The consensus at the meeting was that of course the Patriot Act would win, the conclusion being that Safe Harbour is of limited value.  Incidentally, this neatly illustrates the way that information assurance is about far more than just some crypto (or even cloud) technology.

Today, ZDNet reports that the data doesn’t even have to leave the EU for it to be within the reach of the Patriot Act:  Microsoft launched their ‘Office 365’ product, and admitted in answer to a question that data belonging to (relating to) someone in the EU, residing on Microsoft’s servers within the EU, would be surrendered by Microsoft – a US company – to US law enforcement upon a Patriot Act-compliant request.  Surely, then, any multinational (at least, those with offices? headquarters? in the US) is in the same position.  Where the subject of such a request includes personal information, that presents them with a potential tension: they either break US law or they break EU law.  I suppose they just have to ask themselves which carries the stiffer penalties.

Now, is this a real problem or just a theoretical one? Is it a general problem with trusting the cloud, or a special case that need not delay us too long?   On one level, it’s a fairly unusual degree of legal conflict, from two pieces of legislation that were rather deliberately made to be high-minded and far-reaching in their own domains.  But, in general, cloud-type activity is bound to raise jurisdictional conflicts: the data owner, the data processor, and the cloud service provider(s) may all be in different, or multiple, countries, and any particular legal remedy will be pursued in whichever country gives the best chance of success.

Can technology help with this?  Not as much as we might wish, I think.  The best we can hope for is an elaborate overlay of policy information and metadata so that the data owner can make rational risk-based decisions.  But that’s a big, big piece of standards work, and making it comprehensible and usable will be challenging. And it looks as if there could be at least a niche market for service providers who make a virtue of not being present in multiple jurisdictions.  In terms of trusted computing, and deciding whether the service metadata is accurate, perhaps we will need a new root of trust for location…

Experiences at TaPP’11

On Monday and Tuesday this week I attended the third “Theory and Practice of Provenance” workshop in Crete. The event was a great success: lively discussion from people presenting interesting and practical work.   For those who don’t know about Provenance, here’s a snappy definition:

‘Provenance’ or ‘lineage’ generally refers to information that ‘helps determine the derivation history of a data product, starting from its original sources’. In other words, a record of where data came from and how it has been processed.
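As a rough illustration of what such a record might look like, here is a minimal sketch in Python. It is not the Open Provenance Model or any workshop participant’s system: `ProvenanceRecord` and `original_sources` are hypothetical names, and I am assuming that any item with no record of its own counts as an original source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvenanceRecord:
    output_id: str    # the data product this record describes
    process: str      # how it was produced
    input_ids: tuple  # identifiers of the data it was derived from

def original_sources(output_id, records):
    """Walk the derivation history back to items that have no record of their own."""
    by_output = {r.output_id: r for r in records}
    sources, pending, seen = set(), [output_id], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        record = by_output.get(current)
        if record is None:
            sources.add(current)  # nothing recorded: treat as an original source
        else:
            pending.extend(record.input_ids)
    return sources
```

For example, if a chart was plotted from a cleaned table, and the cleaned table was filtered from two raw files, then asking for the chart’s original sources returns the two raw files.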

Provenance applies to many different domains, and at the TaPP’11 workshop there were researchers working on theoretical database provenance, scientific workflows, practical implementation issues, systems provenance (who want to collect provenance at the operating system level) as well as a few security people. I presented a short paper on collecting provenance in clouds, which got some useful feedback.

The event ended with a debate on “how much provenance should we store?”, with most people sitting somewhere between two extremes: either we store just the things we think are most important to our queries, or we store everything that could possibly affect what we are doing. The arguments on both sides were good: there was a desire to avoid collecting too much useless data, as this slows down search and has an attached cost in terms of storage and processing. On the other hand, the point was made that we don’t actually know how much provenance is enough, and that if we don’t collect all of it, we could come back and find we missed something. Considering the cheapness of storage and processing power, some believed that the overhead was unimportant. As a security researcher interested in trusted provenance, the “collect everything” approach seemed like my cup of tea. If the collecting agent was trusted and could attest to its proper behaviour, provenance information could be made much more tamper-resistant.

However, from the perspective of someone involved in privacy and looking at storage of context (which is a part of provenance), the preservation of privacy seemed to be an excellent reason not to collect everything. For example, I suspect that academic researchers don’t want to store all their data sources: what if you browsed Wikipedia for an overview of a subject area, and that was forever linked with your research paper? Similarly, full provenance during computation might reveal all the other programs you were using, many of which you might not want to share with your peers. Clearly some provenance information has to stay secret.

The rebuttal to this point was that this was an argument for controlled disclosure rather than controlled collection. I think this argument can occur quite often. From a logical perspective (considering only confidentiality) it might be enough to apply access controls and limit some of your provenance collection. However, this adds some interesting requirements. It is now necessary for users to specify policies on what they do and don’t want to reveal. This has been shown to be difficult in practice. Furthermore, the storage of confidential data requires better security than the storage of public (if high integrity) data. The problem quickly turns into digital rights management, which is easier said than implemented. I believe that controlled disclosure and controlled collection are fundamentally different approaches, and the conscientious privacy researcher must choose the latter.
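The difference between the two approaches is easy to see in code. A minimal sketch in Python, with hypothetical names throughout: `SENSITIVE` is an invented set of event kinds, events are plain dictionaries, and the disclosure “policy” is reduced to a single boolean, which is of course far simpler than anything a real user would have to specify.

```python
SENSITIVE = {"browser_history", "other_programs"}  # hypothetical event kinds

def controlled_collection(events, sensitive=SENSITIVE):
    """Never store sensitive events in the first place."""
    return [e for e in events if e["kind"] not in sensitive]

def controlled_disclosure(store, requester_may_see_sensitive, sensitive=SENSITIVE):
    """Store everything; filter at query time according to policy."""
    if requester_may_see_sensitive:
        return list(store)
    return [e for e in store if e["kind"] not in sensitive]
```

To an unprivileged requester the two give the same answer. The difference is what sits on disk: with controlled disclosure the sensitive events are still in `store`, so the store has to be protected and the policy has to be right, forever.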

I still believe that provenance can learn quite a lot from Trusted Computing, and vice-versa. In particular, the concept of a “root of trust” – the point at which your trust in a computing system starts and the element which you may have no ability to assure – is relevant. Provenance data also must start somewhere – the first element in the history of a data item, and the trusted agent used to record it. Furthermore, the different types of root of trust are relevant: provenance is reported just like attestations report platform state. In trusted computing we have a “root of trust for reporting” and perhaps we also need one in provenance. The same is true for measurement of provenance data, and storage. Andrew Martin and I wrote about some of this in our paper at TaPP last year but there is much more to do. Could TCG attestation conform with the Open Provenance Model? Can we persuade those working in operating system provenance that the rest of the trusted computing base – the BIOS, bootloader, option ROMs, SMM, etc – also needs to be recorded as provenance? Can the provenance community show us how to query our attested data, or make sense of a Trusted Network Connect MAP database?

Finally, one of the most interesting short talks was by Devan Donaldson, who studied whether or not provenance information actually made data more trustworthy. He performed a short study of various academic researchers, using structured interviews, and found (perhaps unsurprisingly) that yes, some provenance information really does improve the perception of trustworthiness in scientific data. He also found that a key factor in addition to provenance was the ability to use and query the new data. While these results are what we might expect, they do confirm the theory that provenance can be used to enhance perceived trustworthiness, at least in an academic setting. Whether it works outside academia is a good question: could provenance of the Climategate data have reassured the press and the public?

on an unfortunate tension

It’s frustrating when you’re not allowed to use electronic devices during the first and last fifteen minutes of a flight – sometimes much longer. I rather resent having to carry paper reading material, or to stare at the wall in those periods. On today’s flight, they even told us to switch off e-book readers.

E-book readers! Don’t these people realise that the whole point of epaper is that you don’t turn it off? It consumes a minimal amount of power, so that the Kindle can survive a month on a single charge. It has no ‘off’ switch per se: its slide switch simply invokes the “screen saver” mode. This doesn’t change the power consumption by much: it just replaces the on-screen text with pictures, and disables the push buttons.

And the answer is that of course they don’t know this stuff. Why would they? Indeed, it would be absurd to expect a busy cabin attendant to be able to distinguish, say, an ebook reader from a tablet device. If we accept for a moment the shaky premise that electronic devices might interfere with flight navigation systems, then we must accept that the airlines need to ensure that as many as possible of these are switched off – even those with no off switch to speak of, whose electromagnetic emissions would be difficult to detect at a distance of millimetres.

Of course, this is a safety argument, but much the same applies to security. Even the best of us would struggle to look at a device, look at an interface, and decide whether it is trustworthy. This, it seems to me, is a profound problem. I’m sure evolutionary psychologists could tell us in some detail about the kind of risks we are adapted to evaluate. Although we augment those talents through nurture and education, cyber threats look different every day. Children who have grown up in a digital age will have developed much keener senses for evaluating cyber-goodness than those coming to these things later in life, but we should not delude ourselves into thinking this is purely a generational thing.

People have studied the development of trust, at some length. Although the clues for trusting people seem to be quite well established, we seem to be all over the place in deciding whether to trust an electronic interface – and will tend to do so on the basis of scant evidence. (insert citations here). That doesn’t really bode well for trying to improve the situation. In many ways, the air stewardess’s cautionary approach has much to commend it, but the adoption of computing technology always seems to have been led by a ‘try it and see’ curiosity, and we destroy that at our peril.

Explaining the new rules on cookies

The European Union recently tightened the e-Privacy Directive (pdf of the full legislation), requiring user consent for the storage of cookies on websites. You could be forgiven for thinking that this is a good thing: long-lived cookies can be something of a menace, as they allow your behaviour to be tracked by websites. This kind of tracking is used for “good” things such as personalization and session management, as well as “bad” things like analytics and personalised marketing, which often involve sharing user details with a third party.

However, what this legislation is certainly not going to do is stop these cookies from existing. It seems very difficult to enforce, and many websites are likely to operate an opt-out rather than opt-in consent model, no matter what the directive says.  Instead, I suspect it’s going to force conscientious (aka public sector) websites to require explicit user consent for perfectly reasonable requests to accept cookies. This well-meaning (but probably futile) legislation therefore raises the practical question: how does one ask a user for permission to store cookies?

One approach which I’m prepared to bet won’t work is that taken by the UK Information Commissioner’s Office. Here’s what they display to users at the top of each screen:

[Image: the Information Commissioner’s Office cookie consent form]

In text:

“On 26 May 2011, the rules about cookies on websites changed. This site uses cookies. One of the cookies we use is essential for parts of the site to operate and has already been set. You may delete and block all cookies from this site, but parts of the site will not work. To find out more about cookies on this website and how to delete cookies, see our privacy notice.”

Before going further, I think it’s important to say that this is not really a criticism of the ICO website. Indeed, this is a logical approach to take when looking for user consent. The reason for the box is shown and the notice is fairly clear and concise. However, I have the following problems with it, to name just a few:

  • Cookies are not well understood by users, and probably not even by the target audience of the ICO website.  Can they provide informed consent without understanding what a cookie is?
  • Why does this site use cookies?  All that this box says is that “parts of the site will not work” if cookies are blocked.  Is any user likely to want to block these cookies with this warning?  If not, why bother with the warning at all?
  • The site operates both an opt-in and an opt-out policy.  I find this surprising and a little bit confusing.  If it was considered reasonable to not warn users about the first cookie, why are the others different?
  • To really understand the question, I am expected to read the full privacy policy.  As far as privacy policies go, this is a fairly good one, but I’m still not going to read all 1900 words of it.  I’m at the website for other reasons (to read about Privacy Impact Assessments, as it happens).

If this is the best that the Information Commissioner’s Office can do, what chance do the rest of us have?  More to the point, how does anyone obtain informed user consent for cookies without falling into the same traps?  Without a viable solution, I fear this EU legislation is going to have no impact whatsoever on those websites which do violate user privacy expectations and worse, it will punish law-abiding websites with usability problems.
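The mechanics of opt-in, at least, are not the hard part. A minimal sketch in Python using the standard library’s `http.cookies` module; the `ESSENTIAL` set, the cookie names and the `cookies_to_set` function are hypothetical, and I am assuming the same split the ICO notice describes, where one essential cookie is set regardless and everything else waits for consent.

```python
from http.cookies import SimpleCookie

ESSENTIAL = {"session"}  # hypothetical: cookies the site cannot work without

def cookies_to_set(requested, consent_given, essential=ESSENTIAL):
    """Return a SimpleCookie holding only the cookies we are allowed to set."""
    jar = SimpleCookie()
    for name, value in requested.items():
        if name in essential or consent_given:
            jar[name] = value
    return jar

# Without consent, only the essential cookie survives.
jar = cookies_to_set({"session": "abc123", "analytics": "xyz"}, consent_given=False)
```

Calling `jar.output()` gives the `Set-Cookie` header lines to send. None of this touches the real problem, which is whether the user who clicked “yes” understood the question.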