2023
In the age of digital transformation, enterprises are increasingly recognizing the value of external data, which originates beyond their four walls. Despite the growing number of datasets and their potential value, external data is sourced in an ad-hoc manner without clear guidelines. This leads to inconsistent sourcing decisions, characterized by a lack of clarity on the object of sourcing and the underlying data sourcing practices. Furthermore, the field of data sourcing lacks extensive research, necessitating urgent action within the Information Systems (IS) research community to bridge this gap. Considering the abovementioned research opportunities, this thesis – through three interrelated research streams – provides foundations for, analyzes, and improves data sourcing practices in the enterprise context. The contributions of the first research stream are an external data sourcing taxonomy (Essay 1), which informs sourcing decisions in an enterprise context, and a reference process to source and manage external data, which is accompanied by explicit prescriptions in the form of design principles (Essay 2). The second research stream proposes a use case-driven assessment of open corporate registers (Essay 3) and, building on the subsequent findings, a method to screen, assess, and prepare open data for use in support of companies’ open data activities (Essay 4). Finally, the third research stream reveals and elaborates on three data sourcing practices developed by companies in response to institutional pressures in the sustainability context (Essay 5). Thus, the outcomes of this thesis enable the transition from ad-hoc acquisition to well-informed, professional data sourcing approaches in the enterprise context.
Wearable devices, such as wearable activity trackers (WATs), are increasing in popularity. Although they can help improve a person's quality of life, they also raise serious privacy issues. The security aspects of WATs have been widely studied (e.g., Bluetooth security, inference of passwords or biometrics), as have privacy-related aspects such as users' attitudes and concerns, yet we still lack knowledge about the privacy of WAT users. Indeed, the security aspects studied in prior work are not enough to build a realistic adversary model, as these studies focus mostly on communication protocols and not on large-scale data collection. Furthermore, previous work on data inference from WATs focuses on functionality rather than on privacy (e.g., better monitoring of activity or health to improve user experience). Moreover, these studies focus only on the inference of behavioral patterns (e.g., activities, consumption) or conditions (e.g., diseases); none of them investigates the inference of users’ personal attributes (e.g., personality, religion, political views). In this thesis, composed of three research papers and a literature review, we contribute to the WAT security & privacy research field by analyzing how the data of WAT users can be accessed at a large scale by many potential adversaries, by evaluating how such data can be used to infer users' personal attributes and, finally, by proposing privacy-enhancing technologies (PETs) to protect their privacy. Concretely, after analyzing the current literature on WAT security & privacy, we conduct a user-survey study to better understand WAT users' behavior towards data sharing, especially with respect to third-party applications (TPAs), which can easily be used by adversaries to collect data.
We then use a rigorous machine-learning approach to evaluate to what extent users’ psychological profiles (Big 5) can be inferred from WAT data, and we discuss the consequences for users’ privacy and for society as a whole. Finally, to propose effective and likely-to-be-adopted protection mechanisms, we conduct a user-centered design study using a participatory design methodology, and we then analyze and evaluate the proposed designs.
The growing use of complex Machine Learning (ML) models, especially in critical domains such as healthcare and finance, raises concerns about security, ethics, and comprehensibility. The field of Interpretable Machine Learning (IML) emerged to provide insights into the workings of these models. In this thesis, we cover interpretability methods for both classical ML and deep neural networks (DNNs). For classical ML, our main contribution is MoDE, a novel technique that generates interpretable, low-dimensional visualizations of large-scale, high-dimensional datasets. MoDE stands out by preserving not only the distances between data points but also correlations and ordinal scores, which supports a better understanding of the data. We also evaluate various explanation methods for DNNs and find that the best method depends on the specific task. Moreover, we analyze the robustness of these methods against adversarial attacks, highlighting vulnerabilities and proposing novel attack strategies based on sparse perturbations. These findings have implications for improving the robustness of explanation methods.
Furthermore, we apply IML to recommender systems, using graph neural networks to achieve superior recommendation accuracy and interpretability. By leveraging user interaction subgraphs, we are able to provide in-depth interpretations of recommendations. This dual emphasis on accuracy and interpretability holds promise for improving recommender systems in practical business scenarios.