# Where is all the nodejs malware?

We’re using nodejs extensively in our current research project – webinos – and I have personally enjoyed programming with it and the myriad of useful 3rd-party modules available online.

However, I’ve always been concerned about the ease with which new modules are made available and may be integrated into bigger systems.  If I want to create a website that supports QR code generation, for example, as a developer I might do the following:

1. Visit Google and search for “nodejs qrcode”.  The first result that comes up is this module - https://github.com/soldair/node-qrcode .  From a brief look at the GitHub page, it seems to do exactly what I want.
2. Download the module locally using ‘npm install qrcode’.  This fetches the module from the npmjs registry and then installs it using the process defined in the package.json file bundled with this module.
3. Test the module to see how it works, probably using the test cases included in the module download.
4. Integrate the module into webinos and then add it to webinos’ package.json file.
5. When the changes make their way into the main project, everyone who downloads and installs webinos will also download and install the qrcode module.
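Step 4 amounts to a one-line change. A sketch of the relevant fragment of a package.json (the version range shown here is hypothetical):

```json
{
  "dependencies": {
    "qrcode": "~0.2.0"
  }
}
```

From then on, `npm install` run by anyone who checks out the project will silently fetch and install the module, along with whatever it depends on in turn.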

I’m going to go out on a limb and suggest that this is a common way of behaving.  So what risk am I (and anyone developing with nodejs modules) taking?

If you’re using nodejs in production, you’re putting it on a web server.  By necessity, you are also giving it access to your domain certificates and private keys.  You may also be running nodejs as root (even though this is a bad idea).  As such, that nodejs module (which has full access to your operating system) can steal those keys, access your user database, install further malware and take complete control of your webserver.  It can also take control of the PCs you and your fellow developers use every day.
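The install step deserves particular scrutiny: npm runs any lifecycle scripts a module declares in its package.json, with the installing user’s privileges.  A hypothetical malicious module needs nothing more than this (the `install` script hook is real npm behaviour; the module name and payload are invented for illustration):

```json
{
  "name": "innocent-looking-module",
  "version": "1.0.0",
  "scripts": {
    "install": "node ./exfiltrate-keys.js"
  }
}
```

At the time of writing, nothing about `npm install` warns you that such a script is about to run.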

The targets are juicy and the protection is minimal.

And yet, so far, I have encountered no malware (or at least none that I know about).  Some modules have been reported, apparently, but not many.   Why is that?

It could partly be because the npmjs repository offers a way for malware to be reported and efficiently removed.  Except that it doesn’t.  It may do so informally, but it’s not obvious how one might report malware, and there’s no automatic revocation mechanism or update system for already-deployed modules.

It could be that the source code for most modules is open and therefore malware authors dare not submit malicious modules for fear of being exposed, and those that do are rapidly exposed.  Indeed, in the case of the qrcode module (and most nodejs modules) I can inspect the source code to my heart’s content.  However, the “many eyes” theory of open source security is known to be unreliable and it is unreasonable to suppose that this would provide any level of protection for anything but the most simple of modules.

I can only assume, therefore, that there is little known nodejs malware because the nodejs community are all well-intentioned people.  It may also be because developers who use nodejs modules form a relationship with the developer of the module and therefore establish enough trust to rely on their software.

However, another way of putting it is: nobody has written any yet.

The problem isn’t unique – any third party software could be malicious, not just nodejs modules – but the growing popularity of nodejs makes it a particularly interesting case.  The ease with which modules can be downloaded and used, in combination with their intended target being highly privileged, is cause for concern.

Disagree?  Think that I’ve missed something?  Send me an email – details here.

# Do Garfinkel’s design patterns apply to the web?

A widely cited publication in usable security research is Simson L. Garfinkel’s thesis: “Design Principles and Patterns for Computer Systems That Are Simultaneously Secure and Usable”.  In Chapter 10 he describes six principles and about twenty patterns which can be followed in order to align security and usability in system design.

We’ve been referring to these patterns throughout the webinos project when designing the system and security architecture.  However, it’s interesting to note that the web (and web applications) actually directly contradict many of them.  Does this make the web insecure?  Or does it suggest that the patterns and principles are inadequate?  Either way, in this blog post I’m going to explore the relationship between some of these principles and the web.

# Turing’s Cathedral book jacket

The punched card code on the cover of the new book Turing’s Cathedral doesn’t make sense as a zone punch.

# Lost Treasures

Some say computer science rediscovers old ideas every twenty years or so. Justin mentioned it last week in the context of explicit vs implicit information flows. I was reminded again today when I saw a call for papers from IEEE Security & Privacy titled ‘Lost Treasures of Computer Security & Privacy’ [http://www.computer.org/portal/web/computingnow/spcfp6] for a special issue next year. The list of topics the editors seek makes for fascinating reading, but I wish to note a different, practical reason.

When tracking down a reference a few months ago, I ran into an example of what librarians call a ‘black hole’ or ‘dark age’: periods of history inaccessible due to changing technology. The document I was looking for contains hearings before the U.S. Senate Select Committee on Small Business, 85th Congress, in 1957. But when I went to that room in the regional depository library, all I found were pieces of shelving on the floor. The microform collection is being digitised and decades of microfilm are ‘temporarily unavailable’, where ‘temporary’ may mean a year or more.

What other instances of forgotten lore have you personally encountered?

# How not to look like a spearphishing attack

This is not a protip on how to make your spearphishing attacks more effective.

Today I received an email on my work account, which happens to be at a large defence contractor – and that’s relevant, because spear-phishing attacks are a primary threat in my environment, and they look just like this:

From: [redacted] SPAWARSYSCEN-ATLANTIC, 987654 [[redacted]@navy.mil]
To:
Cc:
Attachment: Newsletter_1.docx

All,

Attached is our latest Newsletter.  Please review.

r/
[name redacted]
ASSO/Security Specialist
SSC-Atlantic SSO
[telephone redacted]
[fax redacted]
https://iweb.spawar.navy.mil/depts/[redacted]
For Official Use Only - Privacy Sensitive - Any misuse or unauthorized disclosure may result in both civil or criminal penalties.

A few things stand out in that email: the empty To: and Cc: fields, the extremely generic filename of the attachment, the fact that the attachment is, or at least appears to be, a Word document; and in the body of the message, the odd capitalisation of ‘Newsletter’, the imperative phrase ‘Please review.’

I took the precaution of examining the mail headers in detail.  Thanks, Microsoft, for making that difficult to do.  The Received: header chain looked reassuring; it came from the expected place.  Interestingly, I only now noticed that the email was digitally signed.  The icon is so tiny I overlooked it.  Thank you, Microsoft, again for hiding that piece of important information from me.

I was still wary about the attachment, though.  After a suitable period of contemplation, I clicked on it.  The expected warning message from the OS appeared: “you should not open files received from unknown senders”.  Why show me that warning message when it knows that the message is digitally signed?  Instead of saying it’s from an unknown sender, why not show me the certificate path of the digital signature?  My future career prospects flashing before my eyes, I hesitated.  Instead of opening the attachment at once, I decided to try scanning it first with my computer’s anti-virus programme.

And promptly received a demand for the Administrator password—which I don’t have—because apparently that’s not something users are allowed to do.

So my question for the community is, how can this problem be solved?  Crippling suspicion can’t be good for the efficiency of organisations.

P.S. It was not a spear-phishing attack.  I had a nice conversation with the sender later and we commiserated over the state of trust on the internet.

# Privacy in the social world – the Google+ way

Facebook has faced criticism with regard to privacy issues. But how is Google+ attempting to address that? Well, users can put friends in ‘circles’. These circles are partitions rather than access groups. Default circles include ‘family’, ‘acquaintances’, and ‘friends’. You can share posts, pictures and so on with only selected circles.

From my personal experience in the last few days that I have played with Google+, it seems straightforward enough to drag and drop friends into different circles and create new circles if necessary. If it works the way I understand it is supposed to, then I think many of the Facebook privacy issues will be addressed. This, of course, ignores that Google+ is integrated with Gmail and that Google search uses Gmail and Google+ data for its own competitive advantage.

What doesn’t surprise me, though, is that in the 26 days that Google+ has been operational, the majority of the 18+ million users who have adopted it (well, most likely just tried it out) are tech users. There may be a number of reasons for this, but in my view tech users understand issues around privacy and have some idea of how they can achieve privacy in their various social interactions (please note how loosely I use the term ‘privacy’ in this article). When a product promises and attempts to address privacy concerns, tech users are likely to adopt it early on and experiment with it.

But if my assumption here is correct, how long will it take for the almost 1 billion Facebook users to realise and jump ship? Oh, wrong question. Are Facebook users actually concerned about their privacy? If they are, will they know when a better product comes on the market (not that Google+ is that product)? And most importantly, will they jump ship once the product all privacy campaigners are yearning for arrives? In attempting to answer these questions, let’s ignore the effect of externalities. I would argue that since Facebook users DO NOT understand or care about privacy (I would like to see a study that nullifies this claim), a robust and privacy-enhanced social networking tool is unlikely to take off, at least among the general population.

# disk erasure

A recent pointer to Peter Gutmann’s new book on security engineering (looks good, by the way) reminds me that Gutmann’s name is associated with the woeful tale of disk erasure norms.

The argument goes this way: ‘normal’ file erases (from your windows shell, say) merely change pointers in tables, and do not remove any data from the disk.  A fairly unsophisticated process will recover your ‘deleted’ files.  Wiser people ensure that the file is deleted from the media itself – by writing zeros over the sectors on the disk that formerly contained the file.  Gutmann’s argument was that because of minor variations in disk head alignment with the platters, this is insufficient to ensure the complete removal of the residual magnetic field from the former data.  There is a possibility that, with the right equipment, someone could recover the formerly-present files.  So he has an algorithm involving, I think, 35 passes, writing various patterns, calculated to destroy any remaining underlying data.

Now, the problem appears/appeared real enough: various government standards have, for decades now, ruled that magnetic media which has held classified material cannot be declassified but must be destroyed before leaving secure custody.  Whether anyone has ever managed to recover a non-trivial amount of data from a once-zeroed disk is much less clear: as far as I know, there’s not a lot in the open literature to suggest it’s possible, and none of the companies specializing in data recovery will offer it as a service.  Furthermore, since Gutmann did his original work, disk design has evolved (and the ‘size’ of the bits on the disk has become so small that any residual effect is going to be truly minimal), and disk manufacturers have built a ‘secure erase’ into their controllers for quite a few years now.  Even better, the new generation of self-encrypting drives can be rendered harmless by the deletion of just one key (don’t do this by accident!).

Yet, the perception persists that the simple solutions are insufficient. Let us leave aside Government security standards and think simply of commercial risk.  Multi-pass erasure is downright time-consuming.  You can buy a disk-shredding service –  but this attracts quite a fee.  So it is not uncommon simply to archive used disks in a warehouse somewhere (with or without a single zeroing pass, I suppose).  How long you would keep those disks for is unclear: until their data ceases to be valuable, I suppose.  But without a detailed inventory of their contents, that cannot easily be determined.  So perhaps you have to keep them forever.

My simple question is: which attracts the lower risk (and/or the lower total predicted cost)? (a) Zeroing a disk and putting it in a skip, or (b) Warehousing it until the end of the lifetime of the data it holds?  You can postulate whatever adversary model you wish.  The answer is not obvious to me.  And if we can’t make a simple determination about risk in this case (because, frankly, the parameters are all lost in the noise), what possible chance do we have of using risk calculations to make decisions in the design of more complex systems?

# email retention

Someone in the group suggested a blog post on email retention.  It’s a good topic, because it tracks the co-evolution of technology and process.

The Evolution

Back in the day, storage was expensive – relative to the cost of sending an email, anyway.   To save space, people would routinely delete emails when they were no longer relevant.

Then storage got cheap, and even though attachments got bigger and bigger, storing email ceased to be a big deal.  By the late 1990s, one of my colleagues put it to me that the time you might spend deciding whether or not to delete something already cost more than the cost of storing it forever.  I have an archive copy of every email I’ve sent or received – in professional or personal capacities – since about 1996 (even most or all of the spam).

Happily, one other technology helped with making email retention worthwhile: indexing.  This, too, is predicated on having enough storage to store the index, and having enough CPU power to build the index.  All of this we now have.

However, a third force enters the fray: lawyers started discovering the value of emails as evidence (even if it’s rubbish as evidence, needing massive amounts of corroboration if it is not to be forged trivially). And many people – including some senior civil servants, it seems – failed to spot this in time, and were very indiscreet in their subsequently-subpoenaed communications.

As a result, another kind of lawyer – the corporate lawyer – issued edicts which first required, and then forced, employees of certain companies to delete any email more than six months old.  That way, they cannot be ‘discovered’ in an adverse legal case, because they have already been erased.

Never mind that many people’s entire working lives are mediated by email today: the email is an accurate and (if permitted) complete record of decisions taken and the process by which that happened.  Never mind that emails have effectively replaced the hardbound notebooks that many engineers and scientists would use to retain their every thought and discussion (though in many places good lab practice retains such notebooks).  Never mind that although it is creaking under the present strain of ‘spam’ and ‘nearly spam’ (the stuff that I didn’t want, but was sent by coworkers, not random strangers), we simply haven’t got anything better.

The state of the art

So now, in those companies, there is no email stored for more than six months, yes?  Well, no.  Of course not.  There are lots of emails which are just too valuable to delete.  And so people extract them from the mail system and store them elsewhere.  Or forward them to private email accounts.  There are, and always will be, many ways to defeat the corporate mail reaper.  The difference is that the copies are not filed systematically, are not subject to easy search by the organisation, and will probably not be disclosed to regulators or in other legal discovery processes.  This is the state of the art in every such organisation I’ve encountered (names omitted to protect the … innocent?).

Sooner or later, a quick-witted external lawyer is going to guess that this kind of informal archiving might help their case, and is going to manage to dig deeper into the adversary’s filestores and processes.  When they find some morsels of beef secreted in unlikely places, there will be a rush of corporate panic.

The solution

It’s easy to spot the problem.  It’s much harder to know what to do about it.  Threatening the employees isn’t very productive – especially if the ‘security’ rule is at odds with the goal of getting their work done.  Making it a sacking offence, say, to save an email outside the corporate mail  system is just going to make people more creative about what they save, and how they save it.  Unchecked retention, on the other hand, will certainly leave the organisation ‘remembering’ things it would much rather it had ‘forgotten’.

At least it would be better if the ‘punishment’ matched the crime: restricting the retention of email places the control in the wrong place.   It would be much better to reserve the stiff penalties for those engaged in libel, corporate espionage, anticompetitive behaviour, and the rest.  Undoubtedly, that would remain a corporate risk, and a place where prevention seems better than cure: the cost to the organisation may be disproportionately greater than any cost that can be imposed upon the individual.  But surely it’s the place to look, because in the other direction lies madness.

# Spectacular Fail!

Step 1.  Install “Windows XP Mode” using Microsoft Virtual PC on Microsoft Windows 7.

Step 2. Windows XP warns that there is no anti-virus program installed.  Use the supplied Internet Explorer 6 to try to download Microsoft Security Essentials.  Browsing to the page fails.   I happen to know that this is a manifestation of a browser incompatibility.

Step 3. Use the Bing search box on the default home page of Internet Explorer 6 to search for “Internet Explorer 9”.   You have to scroll down a long way before finding a real Microsoft link: who knows how much malware the earlier links would serve up?!

Words fail me, really.

# Experiences at TaPP’11

On Monday and Tuesday this week I attended the third “Theory and Practice of Provenance” workshop in Crete. The event was a great success: lively discussion from people presenting interesting and practical work.   For those who don’t know about Provenance, here’s a snappy definition:

‘Provenance’ or ‘lineage’ generally refers to information that ‘helps determine the derivation history of a data product, starting from its original sources’. In other words, a record of where data came from and how it has been processed.

Provenance applies to many different domains, and at the TaPP’11 workshop there were researchers working on theoretical database provenance, scientific workflows, practical implementation issues, and systems provenance (those who want to collect provenance at the operating system level), as well as a few security people. I presented a short paper on collecting provenance in clouds, which got some useful feedback.

The event ended with a debate on “how much provenance should we store” – with most people sitting somewhere between two extremes: either we should store just the things we think are most important to our queries, or we store everything that could possibly impact what we are doing. The arguments on both sides were good: there was a desire to avoid collecting too much useless data, as this slows down search and has an attached cost in terms of storage and processing. On the other hand, the point was made that we don’t actually know how much provenance is enough, and that if we don’t collect all of it, we could come back and find we missed something. Considering the cheapness of storage and processing power, some believed the overhead was unimportant. As a security researcher interested in trusted provenance, the “collect everything” approach seemed like my cup of tea. If the collecting agent were trusted and could attest to its proper behaviour, provenance information could be made much more tamper-resistant.

However, from the perspective of someone involved in privacy and looking at storage of context (which is a part of provenance), the preservation of privacy seemed to be an excellent reason not to collect everything. For example, I suspect that academic researchers don’t want to store all their data sources: what if you browsed Wikipedia for an overview of a subject area, and that was forever linked with your research paper? Similarly, full provenance during computation might reveal all the other programs you were using, many of which you might not want to share with your peers. Clearly some provenance information has to stay secret.

The rebuttal to this point was that this was an argument for controlled disclosure rather than controlled collection. I think this argument can occur quite often. From a logical perspective (considering only confidentiality) it might be enough to apply access controls and limit some of your provenance collection. However, this adds some interesting requirements. It is now necessary for users to specify policies on what they do and don’t want to reveal. This has been shown to be difficult in practice. Furthermore, the storage of confidential data requires better security than the storage of public (if high integrity) data. The problem quickly turns into digital rights management, which is easier said than implemented. I believe that controlled disclosure and controlled collection are fundamentally different approaches, and the conscientious privacy researcher must choose the latter.

I still believe that provenance can learn quite a lot from Trusted Computing, and vice-versa. In particular, the concept of a “root of trust” – the point at which your trust in a computing system started and the element which you may have no ability to assure – is relevant. Provenance data also must start somewhere – the first element in the history of a data item, and the trusted agent used to record it. Furthermore, the different types of root of trust are relevant: provenance is reported just like attestations report platform state. In trusted computing we have a “root of trust for reporting” and perhaps we also need one in provenance. The same is true for measurement of provenance data, and storage. Andrew Martin and I wrote about some of this in our paper at TaPP last year but there is much more to do. Could TCG attestation conform with the Open Provenance Model? Can we persuade those working in operating system provenance that the rest of the trusted computing base – the BIOS, bootloader, option ROMs, SMM, etc – also needs to be recorded as provenance? Can the provenance community show us how to query our attested data, or make sense of a Trusted Network Connect MAP database?

Finally, one of the most interesting short talks was by Devan Donaldson, who studied whether or not provenance information actually makes data more trustworthy. He performed a short study of various academic researchers, using structured interviews, and found (perhaps unsurprisingly) that yes, some provenance information really does improve the perception of trustworthiness in scientific data. He also found that a key factor in addition to provenance was the ability to use and query the new data. While these results are what we might expect, they do confirm the theory that provenance can be used to enhance perceived trustworthiness, at least in an academic setting. Whether it works outside academia is a good question: could provenance of the ‘climategate’ data have reassured the press and the public?