Contents and references of the thesis "Consensus clustering on heterogeneous distributed data"
Contents:
Abstract
Chapter 1: Introduction
1-1- Introduction
1-2- Data mining
1-3- Data mining methods
1-4- Clustering
1-5- Consensus clustering
1-6- The research conducted in the thesis
1-7- Results obtained
1-8- The structure of the thesis
Chapter 2: An overview of related work
2-1- Introduction
2-2- Clustering methods
2-2-1- Partitioning methods
2-2-2- Hierarchical methods
2-2-3- K-Means clustering algorithm
2-3- Consensus clustering
2-3-1- Motivations for using consensus clustering
2-3-2- The consensus clustering problem: an example
2-3-3- An overview of consensus clustering methods
2-3-4- Categorizing consensus clustering methods
2-3-5- Similarity-based methods
Pairwise similarity (correlation matrix)
Graph-based methods
2-3-6- Consensus methods based on mutual information
2-3-7- Consensus methods based on a mixture model
2-3-8- Voting-based consensus methods
2-4- Methods for generating a clustering ensemble
2-5- Chapter summary
Chapter 3: The proposed solution: consensus clustering on heterogeneous distributed data
3-1- Introduction
3-2- Proposed solution
3-2-1- Identifying the similarity of clusters
3-2-2- Weighted clustering
3-2-3- Consensus clustering on heterogeneous distributed data
3-3- Generating the clustering ensemble
3-4- Chapter summary
Chapter 4: Implementation of the proposed solution and evaluation results
4-1- Introduction
4-2- Evaluation criteria
4-2-1- Accuracy criterion
4-2-2- Davies-Bouldin index
4-2-3- Rand index
4-2-4- Average normalized mutual information (ANMI)
4-3- Implementation
4-4- Datasets
4-5- Evaluation results
4-5-1- Accuracy criterion
4-5-2- Davies-Bouldin index
4-5-3- Rand index
4-5-4- Average normalized mutual information (ANMI)
4-6- Chapter summary
Chapter 5: Conclusion and future work
5-1- Introduction
5-2- Conclusion
5-3- Future work
References
Appendix A: List of abbreviations
Appendix B: English-Persian glossary
Appendix C: Persian-English glossary
References:
[1]
Agarwal, P. K., Har-Peled, S. and Yu, H. 2013. Embeddings of surfaces, curves, and moving points in Euclidean space. SIAM Journal on Computing. 42(2):442-458.
[2]
Alam, S., Dobbie, G., Koh, Y. S. and Riddle, P. 2013. Clustering heterogeneous web usage data using hierarchical particle swarm optimization. In 2013 IEEE Symposium on Swarm Intelligence (SIS). IEEE. p:147-154.
[3]
Al-Zoubi, M. B., Hudaib, A., Huneiti, A. and Hammo, B. 2008. New Efficient Strategy to Accelerate k-Means Clustering Algorithm. American Journal of Applied Sciences. 5:1247-1250.
[4]
Amigó, E., Gonzalo, J., Artiles, J. and Verdejo, F. 2008. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Journal of Information Retrieval. Springer.
[5]
Arthur, D. and Vassilvitskii, S. 2007. k-means++: the advantages of careful seeding. Proceedings of the 18th annual ACM-SIAM symposium on Discrete algorithms. p:1027-1035.
[6]
Ayad, H. G. 2008. Voting-Based Consensus of Data Partitions. PhD thesis. University of Waterloo.
[7]
Ayad, H. G. and Kamel, M. S. 2005. Cluster-based cumulative ensembles. In Multiple Classifier Systems: Sixth International Workshop, MCS 2005. Seaside, CA, USA. p:236–245.
[8]
Belghini, N., Zarghili, A., Kharroubi, J. and Majda, A. 2011. Sparse Random Projection and Dimensionality Reduction Applied on Face Recognition. In Proceedings of the International Conference on Intelligent Systems & Data Processing. p:78-82.
[9]
Berkhin, P. 2006. Survey on Clustering Data Mining Techniques. Grouping Multidimensional Data. Springer. p:25-71.
[10]
Boulis, C. and Ostendorf, M. 2004. Combining multiple clustering systems. In The 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), LNAI 3202. p:63–74.
[11]
Chunsheng, H., Qian, C., Haiyuan, W. and Wada, T. 2008. RK-Means Clustering: K-Means with Reliability. IEICE Transactions on Information and Systems. 91(1):96-104.
[12]
Gondek, D. and Hofmann, T. 2005. Non-redundant clustering with conditional ensembles. The 11th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. p:70-77.
[13]
Dimitriadou, E., Weingessel, A. and Hornik, K. 2002. A combination scheme for fuzzy clustering. International Journal of Pattern Recognition and Artificial Intelligence. 16:901–912.
[14]
Domeniconi, C. and Al-Razgan, M. 2007. Weighted Cluster Ensembles: Methods and Analysis. Technical Report ISE-TR-07-06.
[15]
Domininique, V., Abdi, H., Williams, L. J. and Bennani-Dosse, M. 2012. STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. Wiley Interdisciplinary Reviews: Computational Statistics. 4(2):124-167.
[16]
Duda, R. O., Hart, P. E. and Stork, D. G. 2012. Pattern Classification. John Wiley & Sons.
[17]
Dudoit, S. and Fridlyand, J. 2003. Bagging to improve the accuracy of a clustering procedure. Bioinformatics. 19(9):1090-1099.
[18]
Elkan, C. 2003. Using the triangle inequality to accelerate k-means. Proceedings of the 20th International Conference on Machine Learning (ICML-2003).
[19]
Fischer, B. and Buhmann, J. M. 2003. Bagging for path-based clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence. 25:1411–1415.
[20]
Fred, A. 2001. Finding consistent clusters in data partitions. The Second International Workshop on Multiple Classifier Systems. Springer-Verlag. p:309-318.
[21]
Fred, A. and Jain, A. K. 2002. Evidence Accumulation Clustering Based on the K-Means Algorithm. The Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition. Springer-Verlag. p:442-451.
[22]
Gasieniec, L., Jansson, J. and Lingas, A. 2004. Approximation algorithms for Hamming clustering problems. Journal of Discrete Algorithms. Elsevier. 2:289-301.
[23]
Gionis, A., Mannila, H. and Tsaparas, P. 2005. Clustering Aggregation. In Proceedings of the Twenty-first International Conference on Data Engineering (ICDE). p:341-352.
[24]
Guillaume, R. and Mouaddib, N. 2002. SAINTETIQ: a fuzzy set-based approach to database summarization. Fuzzy Sets and Systems. 129(2):137-162.
[25]
Gondek, D. and Hofmann, T. 2004. Non-redundant data clustering. In Proceedings of the Fourth IEEE International Conference on Data Mining. p:75–82.
[26]
Gordon, A. D. and Vichi, M. 2001. Fuzzy partition models for fitting a set of partitions. Psychometrika. 66:229–248.
[27]
Greene, D., Tsymbal, A., Bolshakova, N. and Cunningham, P. 2004. Ensemble Clustering in Medical Diagnostics. Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems. p:576-581.
[28]
Gupta, M. and Han, J. 2011. Heterogeneous network-based trust analysis: a survey. ACM SIGKDD Explorations Newsletter. 13(1):54-71.
[29]
Halkidi, M., Batistakis, Y. and Vazirgiannis, M. 2002. Clustering validity checking methods: part II. ACM SIGMOD Record. 31:19-27.
[30]
Han, J.