Where is all the nodejs malware?

We’re using nodejs extensively in our current research project – webinos – and I have personally enjoyed programming with it and the myriad of useful 3rd-party modules available online.

However, I’ve always been concerned about the ease at which new modules are made available and may be integrated into bigger systems.  If I want to create a website that supports QR code generation, for example, as a developer I might do the following:

  1. Visit google and search for “nodejs qrcode”.  The first result that comes up is this module – https://github.com/soldair/node-qrcode .  From a brief look at the github page, it seems to do exactly what I want.
  2. Download the module locally using ‘npm install qrcode’.  This fetches the module from the npmjs registry and then installs it using the process defined in the package.json file bundled with this module.
  3. Test the module to see how it works, probably using the test cases included in the module download.
  4. Integrate the module into webinos and then add it to webinos’ package.json file.
  5. When the changes make their way into the main project, everyone who downloads and installs webinos will also download and install the qrcode module.

I’m going to go out on a limb and suggest that this is a common way of behaving.  So what risk am I (and anyone developing with nodejs modules) taking?

If you’re using nodejs in production, you’re putting it on a web server.  By necessity, you are also also giving it access to your domain certificates and private keys.  You may also be running nodejs as root (even though this is a bad idea).  As such, that nodejs module (which has full access to your operating system) can steal those keys, access your user database, install further malware and take complete control of your webserver.  It can also take control of the PCs you and your fellow developers use every day.

The targets are juicy and the protection is minimal.

And yet, so far, I have encountered no malware (or at least none that I know about).  Some modules have been reported, apparently, but not many.   Why is that?

It could partly be because the npmjs repository offers a way for malware to be reported and efficiently removed.  Except that it doesn’t.  It may do so informally, but it’s not obvious how one might report malware, and there’s no automatic revocation mechanism or update system for already-deployed modules.

It could be that the source code for most modules is open and therefore malware authors dare not submit malicious modules for fear of being exposed, and those that do rapidly are.  Indeed, in the case of the qrcode module (and most nodejs modules) I can inspect the source code to my heart’s content.  However, the “many eyes” theory of open source security is known to be unreliable and it is unreasonable to suppose that this would provide any level of protection for anything but the most simple of modules.

I can only assume, therefore, that there is little known nodejs malware because the nodejs community are all well-intentioned people.  It may also be because developers who use nodejs modules form a relationship with the developer of the module and therefore establish enough trust to rely on their software.

However, another way of putting it is: nobody has written any yet.

The problem isn’t unique – any third party software could be malicious, not just nodejs modules – but the growing popularity of nodejs makes it a particularly interesting case.  The ease at which modules can be downloaded and used, in combination with their intended target being highly privileged, is cause for concern.

Disagree?  Think that I’ve missed something?  Send me an email – details here.

Update – I’ve blogged again about this subject over at webinos.org

Do Garfinkel’s design patterns apply to the web?

A widely cited publication in usable security research is Simson L. Garfinkel’s thesis: “Design Principles and Patterns for Computer Systems That Are Simultaneously Secure and Usable”.  In Chapter 10 he describes six principles and about twenty patterns which can be followed in order to align security and usability in system design.

We’ve been referring to these patterns throughout the webinos project when designing the system and security architecture.  However, it’s interesting to note that the web (and web applications) actually directly contradict many of them.  Does this make the web insecure?  Or does it suggest that the patterns and principles are inadequate?  Either way, in this blog post I’m going to explore the relationship between some of these principles and the web.

Continue reading

webinos secure storage: a cross-platform dilemma.

Encrypted storage for sensitive data and credentials is an obvious requirement for any system with pretences towards being secure. As part of the webinos project, we have been thinking about solutions to this problem which work across multiple platforms.

As a brief recap: the webinos project aims to design and deliver an open source web runtime for four types of device: smartphones, media centres, PCs and in-car computers.  It will provide a set of standard JavaScript APIs for accessing device features from web applications, as well as synchronising data and providing a “seamless” end user experience.  We’re working on it with over 20 other companies and are primarily researching the security and privacy aspects of the system.  More details are available on our website: http://www.cs.ox.ac.uk/projects/webinos/

In webinos we think we have (at least) the following data to protect:

Continue reading

Webinos versus Meego

One of the systems security projects we’re working on in Oxford is webinos – a secure, cross-device web application environment.   Webinos will provide a set of standard APIs so that developers who want to use particular device capabilities – such as location services, or media playback – don’t need to customise their mobile web app to work on every platform.  This should help prevent the fragmentation of the web application market and is an opportunity to introduce a common security model for access control to device APIs.  Webinos is aimed at mobile phones, cars, smart TVs and PCs, and will probably be implemented initially as a heavy-weight web browser plugin on Android and other platforms.

By a staggering coincidence, the Meego project has a similar idea and a similarly broad ranges of devices it intends to work on.  However, Meego is aimed at native applications, and is built around the Qt framework.  Meego is also a complete platform rather than a browser plugin, containing a Linux kernel.  Meego requires that all applications are signed, and can enforce mandatory access controls through the SMACK Linux Security Module.

In terms of security, these two projects have some important differences.  Meego can take advantage of all kinds of interesting trusted infrastructure concepts, including Trusted Execution Environments and Trusted Platform Modules, as it can instrument the operating system to support hardware security features.  Meego can claim complete control of the whole platform, and mediate all attempts to run applications, checking that only those with trusted certificates are allowed (whitelisting).  Webinos has neither of these luxuries.  It can’t insist on a certain operating system (in fact, we would rather it didn’t) and can only control access to web applications, not other user-space programs.  This greatly limits the number of security guarantees we can make, as our root of trust is the webinos software itself rather than an operating system kernel or tamper-proof hardware.

This raises an interesting question.  If I am the developer of a system such as webinos, can I provide security to users – who may entrust my system with private and valuable data – without having full control of the complete software stack?  Is the inclusion of a hardened operating system necessary for me to create a secure application?  Is it reasonable for me to offload this concern to the user and the user’s system administrator (who are likely to be the same person?)

While it seems impractical for developers to ship an entire operating system environment with every application they create, isn’t this exactly what is happening with the rise of virtualization?


Experiences at TaPP’11

On Monday and Tuesday this week I attended the third “Theory and Practice of Provenance” workshop in Crete. The event was a great success: lively discussion from people presenting interesting and practical work.   For those who don’t know about Provenance, here’s a snappy definition:

‘Provenance’ or ‘lineage’ generally refers to information that ‘helps determine the derivation history of a data product, starting from its original sources’ . In other words, a record of where data came from and how it has been processed.

Provenance applies to many different domains, and at the TaPP’11 workshop there were researchers working on theoretical database provenance, scientific workflows, practical implementation issues, systems provenance (who want to collect provenance at the operating system level) as well as a few security people. I presented a short paper on collecting provenance in clouds, which got some useful feedback.

At the end of the event we ended with a debate on “how much provenance should we store” – with most people sitting somewhere between two extremes: either we should store just the things we think are most important to our queries, or we store everything that could possible impact what we are doing. The arguments on both side were good: there was a desire to avoid collecting too much useless data, as this slows down search and has an attached cost in terms of storage and processing. On the other hand, the point was made that we didn’t actually know how much provenance was enough, and that if we don’t collect all of it, we could come back and find we missed something. Considering the cheapness of storage and processing power, some believe that the overhead was unimportant. As a security researcher interested in trusted provenance, the “collect everything” approach seemed like my cup of tea. If the collecting agent was trusted and could attest to its proper behaviour, provenance information could be made much more tamper-resistant.

However, from the perspective of someone involved in privacy and looking at storage of context (which is a part of provenance), the preservation of privacy seemed to be an excellent reason not to collect everything. For example, I suspect that academic researchers don’t want to store all their data sources: what if you browsed Wikipedia for an overview of a subject area, and that was forever linked with your research paper? Similarly, full provenance during computation might reveal all the other programs you were using, many of which you might not want to share with your peers. Clearly some provenance information has to stay secret.

The rebuttal to this point was that this was an argument for controlled disclosure rather than controlled collection. I think this argument can occur quite often. From a logical perspective (considering only confidentiality) it might be enough to apply access controls and limit some of your provenance collection. However, this adds some interesting requirements. It is now necessary for users to specify policies on what they do and don’t want to reveal. This has shown to be difficult in practice. Furthermore, the storage of confidential data requires better security than the storage of public (if high integrity) data. The problem quickly turns into digital right management, which is easier said than implemented. I believe that controlled disclosure and controlled collection are fundamentally different approaches, and the conscientious privacy research must choose the latter.

I still believe that provenance can learn quite a lot from Trusted Computing, and vice-versa. In particular, the concept of a “root of trust” – the point at which your trust in a computing system started and the element which you may have no ability to assure – is relevent. Provenance data also must start somewhere – the first element in the history of a data item, and the trusted agent used to record it. Furthermore, the different types of root of trust are relevent: provenance is reported just like attestations report platform state. In trusted computing we have a “root of trust for reporting” and perhaps we also need one in provenance. The same is true for measurement of provenance data, and storage. Andrew Martin and I wrote about some of this in our paper at TaPP last year but there is much more to do. Could TCG attestation conform with the Open Provenance Model? Can we persuade those working in operating system provenance that the rest of the trusted computing base – the BIOS, bootloader, option roms, SMM, etc – also need to be recorded as provenance? Can the provenance community show us how to query our attested data, or make sense of a Trusted Network Connect MAP database?

Finally, one of the most interesting short talks was by Devan Donaldson, who studied whether or not provenance information actually made data more trustworthy. He performed a short study of various academic researchers, using structured interviews, and found (perhaps unsurprisingly) that yes, some provenance information really does improve the perception of trustworthiness in scientific data. He also found that a key factor in addition to provenance was the ability to use and query the new data. While these results are what we might expect, they do confirm the theory that provenance can be used to enhance perceived trustworthiness, at least in an academic setting. Whether it works outside academia is a good question: could provenance of the climategate data has reassured the press and the public?

Explaining the new rules on cookies

The European Union recently tightened the e-Privacy Directive (pdf of the full legislation), requiring user consent for the storage of cookies on websites. You could be forgiven for thinking that this is a good thing: long-lived cookies can be something of a menace, as they allow your behaviour to be tracked by websites. This kind of tracking is used for “good” things such as personalization and sessions management, as well as “bad” things like analytics and personalised marketing, which often involve sharing user details with a third-party.

However, what this legislation is certainly not going to do is stop these cookies from existing. It seems very difficult to enforce, and many websites are likely to operate an opt-out rather than opt-in consent model, no matter what the directive says.  Instead, I suspect it’s going to force conscientious (aka public sector) websites to require explicit user consent for perfectly reasonable requests to accept cookies. This well-meaning (but probably futile) legislation therefore raises the practical question: how does one ask a user for permission to store cookies?

One approach which I’m prepared to bet wont work is that taken by the UK Information Commissioner’s Office. Here’s what they display to users at the top of each screen:

The Information Commissioner's Office cookie consent form

In text:

“On 26 May 2011, the rules about cookies on websites changed. This site uses cookies. One of the cookies we use is essential for parts of the site to operate and has already been set. You may delete and block all cookies from this site, but parts of the site will not work. To find out more about cookies on this website and how to delete cookies, see our privacy notice.”

Before going further, I think it’s important to say that this is not really a criticism of the ICO website. Indeed, this is a logical approach to take when looking for user consent. The reason for the box is shown and the notice is fairly clear and concise. However, I have the following problems with it, to name just a few:

  • Cookies are not well understood by users, and probably not even the target audience of the ICO website.  Can they provide informed consent without understanding what a cookie is?
  • Why does this site use cookies?  All that this box says is that “parts of the site will not work” if cookies are blocked.  Is any user likely to want to block these cookies with this warning?  If not, why bother with the warning at all?
  • The site operates both an opt-in and an opt-out policy.  I find this surprising and a little bit confusing.  If it was considered reasonable to not warn users about the first cookie, why are the others different?
  • To really understand the question, I am expected to read the full privacy policy.  As far as privacy policies goes, this is a fairly good one, but I’m still not going to read all 1900 words of it.  I’m at the website for other reasons (to read about Privacy Impact Assessments, as it happens).
If this is the best that the Information Commissioner’s Office can do, what chance do the rest of us have?  More to the point, how does anyone obtain informed user consent for cookies without falling into the same traps?  Without a viable solution, I fear this EU legislation is going to have no impact whatsoever on those websites which do violate user privacy expectations and worse, it will punish law-abiding websites with usability problems.