Peer-Reviewed

Efficient Anonymization Algorithms to Prevent Generalized Losses and Membership Disclosure in Microdata

Received: 11 January 2017     Accepted: 25 January 2017     Published: 22 February 2017
Abstract

Data and knowledge extracted by data mining techniques are now a key asset driving research, innovation, and policy-making. Many agencies and organizations have recognized the need to accelerate these trends and are therefore willing to release the data they have collected to other parties, for purposes such as research and the formulation of public policy. However, publishing data remains very difficult. Data often contains personally identifiable information, so releasing it may result in privacy breaches; this is the case for microdata such as census data and medical data. This work studies how microdata can be published and shared in a privacy-preserving manner. It presents an extensive study of this problem along three dimensions: designing a simple, intuitive, and robust privacy model; designing an effective anonymization technique that works on sparse and high-dimensional data; and developing a methodology for evaluating privacy and utility tradeoffs. We present a novel technique called slicing, which partitions the data both horizontally and vertically. Slicing preserves better data utility than generalization and is more effective than bucketization with respect to the sensitive attribute.
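
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of slicing, not the authors' implementation. It assumes the column groups and the bucket size are fixed in advance and that buckets are formed from consecutive rows; the function name `slice_table` and the sample records are hypothetical.

```python
import random


def slice_table(rows, column_groups, bucket_size, seed=0):
    """Toy slicing of a table given as a list of dicts.

    column_groups: list of tuples of attribute names (vertical partition).
    bucket_size:   number of consecutive rows per bucket (horizontal partition).
    Both are assumptions of this sketch; a real method would choose them
    from attribute correlations and a privacy requirement.
    """
    rng = random.Random(seed)
    sliced = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        pieces = []
        for group in column_groups:
            # Project the bucket onto one column group.
            values = [tuple(row[attr] for attr in group) for row in bucket]
            # Permute within the bucket to break the link between groups.
            rng.shuffle(values)
            pieces.append(values)
        # Re-assemble one output row per original row in the bucket.
        sliced.extend(zip(*pieces))
    return sliced


# Hypothetical sample records, for illustration only.
rows = [
    {"age": 22, "sex": "M", "zip": "47906", "disease": "dyspepsia"},
    {"age": 22, "sex": "F", "zip": "47906", "disease": "flu"},
    {"age": 52, "sex": "F", "zip": "47905", "disease": "bronchitis"},
    {"age": 60, "sex": "M", "zip": "47302", "disease": "gastritis"},
]
column_groups = [("age", "sex"), ("zip", "disease")]
sliced = slice_table(rows, column_groups, bucket_size=2)

for record in sliced:
    print(record)
```

Values inside a column group stay together, which is where the utility comes from, while the association across column groups is only known up to the bucket.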

Published in American Journal of Data Mining and Knowledge Discovery (Volume 2, Issue 2)
DOI 10.11648/j.ajdmkd.20170202.13
Page(s) 54-61
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2017. Published by Science Publishing Group

Keywords

Data Anonymization, Micro Data, PPDP, Slicing

Cite This Article
  • APA Style

    Shivani Rohilla, Manish Bhardwaj. (2017). Efficient Anonymization Algorithms to Prevent Generalized Losses and Membership Disclosure in Microdata. American Journal of Data Mining and Knowledge Discovery, 2(2), 54-61. https://doi.org/10.11648/j.ajdmkd.20170202.13

    ACS Style

    Shivani Rohilla; Manish Bhardwaj. Efficient Anonymization Algorithms to Prevent Generalized Losses and Membership Disclosure in Microdata. Am. J. Data Min. Knowl. Discov. 2017, 2(2), 54-61. doi: 10.11648/j.ajdmkd.20170202.13

    AMA Style

    Shivani Rohilla, Manish Bhardwaj. Efficient Anonymization Algorithms to Prevent Generalized Losses and Membership Disclosure in Microdata. Am J Data Min Knowl Discov. 2017;2(2):54-61. doi: 10.11648/j.ajdmkd.20170202.13

  • BibTeX

    @article{10.11648/j.ajdmkd.20170202.13,
      author = {Shivani Rohilla and Manish Bhardwaj},
      title = {Efficient Anonymization Algorithms to Prevent Generalized Losses and Membership Disclosure in Microdata},
      journal = {American Journal of Data Mining and Knowledge Discovery},
      volume = {2},
      number = {2},
      pages = {54-61},
      doi = {10.11648/j.ajdmkd.20170202.13},
      url = {https://doi.org/10.11648/j.ajdmkd.20170202.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajdmkd.20170202.13},
     year = {2017}
    }
    

  • RIS

    TY  - JOUR
    T1  - Efficient Anonymization Algorithms to Prevent Generalized Losses and Membership Disclosure in Microdata
    AU  - Shivani Rohilla
    AU  - Manish Bhardwaj
    Y1  - 2017/02/22
    PY  - 2017
    N1  - https://doi.org/10.11648/j.ajdmkd.20170202.13
    DO  - 10.11648/j.ajdmkd.20170202.13
    T2  - American Journal of Data Mining and Knowledge Discovery
    JF  - American Journal of Data Mining and Knowledge Discovery
    JO  - American Journal of Data Mining and Knowledge Discovery
    SP  - 54
    EP  - 61
    PB  - Science Publishing Group
    SN  - 2578-7837
    UR  - https://doi.org/10.11648/j.ajdmkd.20170202.13
    VL  - 2
    IS  - 2
    ER  - 

Author Information
  • Department of Computer Science and Engineering, HRIT, Ghaziabad, India

  • Department of Computer Science and Engineering, SRM University, Modinagar, India