Privacy-preserving multi-party PCA computation on horizontally and vertically partitioned data based on outsourced QR decomposition
Abstract
Data mining has found many applications in diverse areas such as banking, marketing, healthcare and fraud detection. One of its valuable tools is principal component analysis (PCA). Computing PCA over data belonging to several data owners while preserving their privacy is a need in many industries, such as healthcare. Here, we propose a privacy-preserving multi-party protocol that computes PCA over horizontally and vertically partitioned data using QR matrix decomposition and homomorphic encryption. Our protocol is the first privacy-preserving PCA scheme that applies to both horizontally and vertically partitioned data and finds all of the principal components. It is secure against collusion of the data owners in the semi-honest security model. In the performance analysis, we show that in the horizontal setting, increasing the number of data owners decreases the computation overhead of each data owner but increases the communication and computation overhead of the server. We also show that running our scheme on the Australian data set (690 × 14), distributed horizontally among 50 data owners, takes 4.38 s, and on the Ionosphere data set (351 × 34), distributed horizontally among 10 data owners, takes 31.8 s. In the vertical setting, running our scheme on the Gait data set (48 × 321), distributed among 7 data owners, and on the Gastrointestinal Lesions data set (76 × 698), distributed among 10 data owners, takes 4.4 h and 15.7 h, respectively. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
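To illustrate why QR decomposition is a natural building block for PCA, here is a minimal plaintext sketch in Python. It is not the authors' protocol: there is no encryption, no outsourcing and no privacy. It assumes horizontally partitioned data in which each owner shares its column sums and row count (so global means can be formed), computes the R factor of its own centered rows, and contributes only that factor. A second QR of the stacked factors (the tall-skinny QR, or TSQR, combining idea) yields an R with RᵀR = XᵀX for the centered pooled data X, so the principal components are the eigenvectors of RᵀR. The helper names (`column_means`, `r_factor`, `combined_r`, `gram`, `leading_component`) and the toy data are hypothetical; R is computed by modified Gram-Schmidt, and only the first component is found (by power iteration), whereas the abstract states that the protocol finds all of them.

```python
import math


def column_means(blocks):
    """Global column means from per-owner column sums and row counts (plaintext)."""
    n_cols = len(blocks[0][0])
    total_rows = sum(len(block) for block in blocks)
    sums = [0.0] * n_cols
    for block in blocks:
        for row in block:
            for j, value in enumerate(row):
                sums[j] += value
    return [s / total_rows for s in sums]


def r_factor(a, tol=1e-12):
    """R factor of a thin QR of `a` via modified Gram-Schmidt; Q is discarded."""
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # working copies of the columns
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        norm = math.sqrt(sum(x * x for x in cols[j]))
        if norm < tol:
            continue  # (numerically) dependent column: row j of R stays zero
        r[j][j] = norm
        q = [x / norm for x in cols[j]]
        for k in range(j + 1, n):
            r[j][k] = sum(qi * ck for qi, ck in zip(q, cols[k]))
            cols[k] = [ck - r[j][k] * qi for qi, ck in zip(q, cols[k])]
    return r


def combined_r(blocks):
    """Stack each owner's local R_i and re-factor (TSQR idea): R^T R = X^T X for centered X."""
    means = column_means(blocks)
    stacked = []
    for block in blocks:
        centered = [[v - mu for v, mu in zip(row, means)] for row in block]
        stacked.extend(r_factor(centered))  # each owner contributes only an n x n factor
    return r_factor(stacked)


def gram(r):
    """R^T R, i.e. the scatter matrix (unnormalized covariance) of the pooled data."""
    n = len(r)
    return [[sum(r[k][i] * r[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def leading_component(r, iters=500):
    """Power iteration on R^T R; its top eigenvector is the first principal component."""
    n = len(r)
    v = [1.0] * n
    for _ in range(iters):
        rv = [sum(r[i][k] * v[k] for k in range(n)) for i in range(n)]  # R v
        w = [sum(r[i][j] * rv[i] for i in range(n)) for j in range(n)]  # R^T (R v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Two hypothetical data owners holding different rows of the same two features.
blocks = [
    [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2]],
    [[4.0, 3.9], [5.0, 5.1]],
]
r = combined_r(blocks)
print(gram(r))               # approximately [[10.0, 10.0], [10.0, 10.072]]
print(leading_component(r))  # roughly [0.706, 0.708]
```

Each owner here sends only an n × n factor, however many rows it holds. Note, though, that R_iᵀR_i = X_iᵀX_i, so in this plaintext form the factor still reveals the owner's local scatter matrix; hiding such intermediate values is what a privacy-preserving protocol, such as the homomorphic-encryption-based one described in the abstract, has to add.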