soft c-means with fixed set of memberships to be applicable over homomorphically-encrypted data

What is it about?

In order to preserve the privacy of client data on cloud, the client data should be encrypted. But the data owner usually need to do computations on the encrypted data. If the data were homomorphically-encrypted then it is possible to do only addition and multiplication on the encrypted. however, in machine learning tasks such as clustering using possibilistic c-means, the task of computing memberships and centers involves division and exponentiation. In this paper, instead of allowing the membership to be any value in [0,1], the user input set of values in [0,1] to be the set of allowed memberships. At each iteration, distances to the new clusters’ centers are determined, then the distances are compared to the initially computed and dynamically updated range of values, that divide the entire range of distances associated with each cluster center into intervals (bins), to assign appropriate soft memberships to objects. The required number of comparisons is O(log the number of discretization levels). Thus, the computation of centers and memberships is greatly simplified during execution. Also, the use of discrete values for memberships allows soft modification (increment or decrement) of the soft memberships of identified outliers and core objects instead of rough modification (setting to zero or one) in related algorithms. Experimental results on synthetic and standard test data sets verified the efficiency and effectiveness of the proposed algorithm.

Why is it important?

When building a privacy-preserving approach for a data mining algorithm, the appropriate scenario and the balance of privacy preservation, accuracy, data utility, and efficiency are both major problems. Several exiting approaches are based on approximating the formulas of centers and weights and memberships as three polynomial functions according to multivariate Taylor formula. However, they usually suffer an increase in complexity and a slight drop in accuracy. To help tackle these problems, this paper proposes a semi possibilistic clustering algorithm based on the possibilistic paradigm. The main contributions of this work can be summarized as follows: - SPCM reduces the complexity of possibilistic c-means (PCM) - SPCM avoids the division and exponentiation operations that makes SPCM applicable over homomorphically-encrypted data. - The algorithm avoids the effect of early wrong rough decisions that may be taken in [28].

The following have contributed to this page:

Mohamed Mahfouz