Secure Outsourced Association Rule Mining using Homomorphic Encryption
Abstract
Frequent itemset mining and association rule mining are among the most popular techniques in data analysis. The ‘Data Mining as a Service’ (DMaaS) paradigm is motivated by data owners who cannot perform mining tasks internally and therefore have to outsource the work to a trusted third party. Multiple data owners can also mine collaboratively by combining their databases. In such cases, the privacy of the outsourced data is a major issue. The setting also requires ‘corporate privacy’: not only the data but also the mining results must satisfy the privacy requirements. The proposed system encrypts the data items with the Advanced Encryption Standard (AES) before outsourcing, which addresses the existing system's vulnerability to ‘Known Plaintext’ attacks. Fictitious transactions are inserted into the databases using the k-anonymity method to counter frequency analysis attacks. A symmetric homomorphic encryption scheme is applied to the databases so that the mining can be performed securely. The experiments show that, although the running time of the proposed solution is slightly higher than that of the existing system, it provides better security for the data items. Because the computation is performed by the third-party server, resource consumption on the data owners' side is very low.
Introduction
Data mining is, in essence, the process of evaluating data from different angles and summarizing it into useful information [14]. Reviewing data or mined information can be very useful to a business: the extracted information can be used to raise revenue, reduce costs, or both. The process is also called data or knowledge discovery. The term data mining refers to extracting information from massive quantities of data. Data mining software is one of several analytical tools for analyzing data. It categorizes the data analyzed from different angles or dimensions and summarizes the relationships identified. Data mining identifies interrelations or patterns in large relational databases. Because data mining is particularly vulnerable to misuse, protecting privacy is a major concern.
In data mining, the question of privacy [6] arises in the search for valuable information. Enormous amounts of personal data are collected, such as criminal records, purchase details, and health records. Each individual has the right to control their personal information. When that control is lost, the major issues that can arise are misuse of private information, handling of misinformation, and granular access to personal information.
Datasets can be analyzed with frequent itemset mining and association rule mining [3]. The former finds data items or itemsets that frequently co-occur; the latter identifies interesting associations between data items in large transaction databases. Association rules capture the relationships between products in large transaction databases, and an event that involves one or more products (items) in the trade or domain is called a transaction. For example, a customer's purchase of items in a supermarket is a transaction. A set of items is called an “itemset”, and an itemset with k items is called a “k-itemset”.
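The terms above can be illustrated with a small plaintext sketch. The toy transactions, the function names, and the thresholds are hypothetical and chosen only for illustration; the support count and confidence used here are the standard definitions from the association rule mining literature, and the brute-force enumeration is not the algorithm of the proposed system.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is the set of items in one purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, k, min_count):
    """All k-itemsets whose support count is at least `min_count` (brute force)."""
    items = sorted(set().union(*db))
    result = {}
    for combo in combinations(items, k):
        count = support_count(frozenset(combo), db)
        if count >= min_count:
            result[frozenset(combo)] = count
    return result

def confidence(antecedent, consequent, db):
    """Confidence of the rule antecedent -> consequent."""
    return support_count(antecedent | consequent, db) / support_count(antecedent, db)

# Frequent 2-itemsets with an (assumed) minimum support count of 2.
pairs = frequent_itemsets(transactions, k=2, min_count=2)

# Confidence of the rule {bread} -> {milk}.
rule_conf = confidence(frozenset({"bread"}), frozenset({"milk"}), transactions)
```

In this toy database every pair of items occurs in two of the four transactions, so all three 2-itemsets are frequent at the chosen threshold.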
Conclusion
This work addresses the problem of corporate privacy when databases are outsourced to a third party for mining. Outsourcing is needed because data owners may lack the expertise or resources to perform the mining tasks internally. In this system, multiple data owners can mine collaboratively, and the data-mining-as-a-service paradigm helps them do so. In the pre-processing stage, before outsourcing, the data items are encrypted with AES and fictitious transactions are inserted with the help of the k-anonymity method to counter frequency analysis attacks. A ciphertext tag, encrypted using homomorphic encryption, is used to distinguish genuine from fictitious transactions. The server that receives the outsourced databases from the different data owners combines them and performs the mining task using the Eclat algorithm. Homomorphic operations are the key to computing the support values of itemsets. The frequent itemsets are found securely, and association rules can be generated from those itemsets. The experiments show that, although the encryption time increases somewhat compared with the existing system, the strength of the encryption has increased from $O(n)$ to $2^{128}$.
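The role of the tags in the support computation can be sketched in plaintext as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the vertical database, the tag values (1 for genuine, 0 for fictitious), and the names `genuine_support` and `eclat` are hypothetical, and an ordinary integer sum stands in for the homomorphic additions that would be applied to encrypted tags. How the threshold comparison is carried out on ciphertexts is not covered here.

```python
# Hypothetical outsourced database in vertical layout: item -> set of transaction ids.
# Transactions 4 and 5 are fictitious padding; tag 1 marks genuine, 0 fictitious.
tags = {0: 1, 1: 1, 2: 1, 3: 1, 4: 0, 5: 0}
tidsets = {
    "a": {0, 1, 3, 4},
    "b": {0, 1, 2, 5},
    "c": {1, 2, 3, 4, 5},
}

def genuine_support(tids):
    # Plaintext stand-in: in the described system the tags would be ciphertexts
    # and this sum would be carried out with homomorphic additions.
    return sum(tags[t] for t in tids)

def eclat(prefix, candidates, min_support, out):
    """Depth-first Eclat: extend `prefix` by intersecting transaction-id sets."""
    for i, (item, tids) in enumerate(candidates):
        if genuine_support(tids) < min_support:
            continue
        itemset = prefix + (item,)
        out[itemset] = genuine_support(tids)
        # Next-level candidates: later items, restricted to the current tidset.
        extensions = [
            (other, tids & other_tids) for other, other_tids in candidates[i + 1:]
        ]
        eclat(itemset, extensions, min_support, out)
    return out

# Assumed minimum support of 2 genuine transactions.
frequent = eclat((), sorted(tidsets.items()), min_support=2, out={})
```

In this toy data the itemset `("a", "c")` appears in three stored transactions, but one of them is fictitious, so its tag-weighted support is 2. Summing tags rather than counting rows is what keeps the padding from distorting the mined support values.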