Prediction of Monetary Penalties for Data Protection Cases in Multiple Languages

Ceross, Aaron

Aaron Ceross and Tingting Zhu

Abstract

As the use of personal data becomes further entrenched in societal interaction, the regulation of such data continues to grow as an important area of law. However, data protection authorities have limited resources to address an increasing number of investigations. Appropriate data-driven models, coupled with the automation of decision making, have the potential to help in such circumstances. In this paper, we evaluate machine learning models from the literature for natural language processing, namely Support Vector Machine (SVM), Random Forest, and Multinomial Naive Bayes (MNB) classifiers, in order to predict whether a monetary penalty was levied based on a description of case facts. We tested these models on a novel data set collected from the data protection authority of Macao in three languages (Chinese, English, and Portuguese). Our experimental results show that the machine learning models provide the predictive performance needed to automate the evaluation of data protection cases. In particular, SVM performs consistently across the three languages, achieving an AUROC of 0.725, 0.762, and 0.748 for Chinese, English, and Portuguese, respectively. We further evaluated the interpretability of the results independently for each language and found that the salient texts identified are shared across the three languages.

Address

New York, NY, USA

Book Title

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law

Location

São Paulo, Brazil

Pages

185–189

Publisher

Association for Computing Machinery

Series

ICAIL '21

Year

2021
