Abstract:
The volume of data retained by an organization may grow rapidly over time. To reduce costs, the organization may turn to a third-party storage provider, which creates a risk of data leakage when the provider cannot be trusted. In another scenario, a dealer collects all transaction data and provides it to a data analysis company for marketing purposes. For these reasons and beyond, preserving privacy in databases has become an important issue. This paper concerns the prediction of disclosure risk in numerical databases. It presents an efficient noise generation scheme that relies on the Huffman coding algorithm and builds a noise matrix that can add noise intuitively to the original values. Moreover, we apply a clustering technique before generating noise. The results show that noise generation runs faster with the clustering scheme than with the non-clustering scheme.