Analytics for characterising and measuring the naturalness of online personae
Jason R.C. Nurse, Arnau Erola, Thomas Gibson-Robinson, Michael Goldsmith and Sadie Creese
Introduction
Currently, 40% of the world's population, around 3 billion users, are online, using cyberspace for everything from work to pleasure. While there are numerous benefits accompanying this medium, the Internet is not without its perils. In this case study article, we focus specifically on the challenge of fake (or unnatural) online identities, such as those used to defraud people and organisations, with the aim of exploring an approach to detect them.

Case description
In particular, through our method and case study we outline and experiment with novel analytics for characterising and measuring the naturalness of an online persona or identity; this naturalness is defined as the extent to which that persona has features similar to those expected for comparable personae online. Our case scenario involves a participant set of two types of individuals, and our aim at this stage is to use our approach to correctly characterise, and then distinguish between, these two types.

Discussion and evaluation
In brief, our case study results showed that our method to conceptualise an individual's complete online presence was very successful. This was undoubtedly linked to its detailed consideration of how cyberspace is typically used, while also building on our existing model of identity, which has been used to aid law enforcement in identification tasks. In terms of developing effective analytics for naturalness, however, improvements in our approach (e.g., features selected and nuanced metrics) are required. Moreover, the study would benefit from a larger sample size to better identify common aspects among natural personae.

Conclusions
Overall, the case study allowed us to explore a novel technique to characterise naturalness and to examine its utility in detecting unnatural personae. Our goal now is to build on the study's findings in several key ways. Specifically, we aim to conduct further assessments of the criteria through which naturalness is defined, and to refine our analytics and combinatorics to measure a persona's naturalness. We will also explore clustering approaches based on complete online personae, as a means to complement our identification of naturally occurring personae types in large datasets.