Data-driven modelling and probabilistic analysis of software usage
Software developers cannot always anticipate how users will actually use their software: usage styles vary from user to user, and even from use to use for an individual user. The research question we address is: how can we model and understand the ways in which users interact with software? We propose a new approach based on probabilistic, data-driven models, inferred from sets of logged user traces, and analysis using probabilistic temporal logic properties. The first problem we address is how to abstract models from sets of user traces over different time periods. This is a nuanced problem: models must encapsulate the temporal and stochastic aspects of usage, the heterogeneous and dynamic nature of users, and the temporal aspects of the interval over which the data was collected (e.g. one day or one month). Fundamental to our approach are activity patterns, which are discrete-time Markov chains. We define two new parametrised, admixture, discrete-time Markov models that include hidden and observed states and depend on a finite number of activity patterns. The second problem we address is analysis, for which we define classes of temporal logic properties that encapsulate indicative behaviours within an activity pattern and between activity patterns. The result is a set of combinations of models and properties, over different time intervals, that affords a rich set of techniques for understanding software usage. We demonstrate our results by applying them to user traces from a mobile app that has been used by tens of thousands of users worldwide.
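To make the idea of a discrete-time Markov chain inferred from logged traces concrete, the following is a minimal sketch and not the admixture models defined in this work. It assumes a single first-order chain estimated by simple transition counting (maximum likelihood), and it evaluates one step-bounded reachability probability, in the spirit of a probabilistic temporal logic property. The state names (`Main`, `Settings`, `Stats`), the example traces, and the function names `estimate_dtmc` and `bounded_reach` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def estimate_dtmc(traces):
    """Estimate transition probabilities by counting observed transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        # Each consecutive pair of logged states is one observed transition.
        for src, dst in zip(trace, trace[1:]):
            counts[src][dst] += 1
    dtmc = {}
    for src, row in counts.items():
        total = sum(row.values())
        dtmc[src] = {dst: n / total for dst, n in row.items()}
    return dtmc


def bounded_reach(dtmc, start, target, steps):
    """Probability of reaching `target` from `start` within `steps` transitions."""
    states = set(dtmc) | {d for row in dtmc.values() for d in row}
    # With zero steps remaining, only the target itself satisfies the property.
    prob = {s: 1.0 if s == target else 0.0 for s in states}
    for _ in range(steps):
        prob = {
            s: 1.0 if s == target
            else sum(p * prob[d] for d, p in dtmc.get(s, {}).items())
            for s in states
        }
    return prob[start]


# Hypothetical logged traces: each list is one user session.
traces = [
    ["Main", "Settings", "Main", "Stats"],
    ["Main", "Stats", "Main"],
    ["Main", "Settings", "Stats"],
]
dtmc = estimate_dtmc(traces)
print(dtmc["Main"])
print(bounded_reach(dtmc, "Main", "Stats", 2))
```

In this toy data set, half of the transitions out of `Main` lead to `Stats`, so the two-step reachability probability combines the direct route with the route through `Settings`. An admixture model would instead combine several such chains, one per activity pattern, with weights that can differ per user trace.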