Investigating the random forest method to improve urban land cover classification using satellite images

Number of pages: 117 File Format: word File Code: 31437
Year: 2011 University Degree: Master's degree Category: Civil Engineering
  • Summary of Investigating the random forest method to improve urban land cover classification using satellite images

    Master's thesis

    in the field of civil engineering - surveying

    Specialization: remote sensing

    Abstract

    Urban land cover classification has long been important because it helps relate human activity to the physical environment and supports better management of resources. The need for up-to-date, accurate, and detailed urban land cover information derived from remote sensing data is increasingly felt in many societies, and recent advances in remote sensing data, technology, and theory have made that need even greater. New sensors show high potential for urban classification. Nevertheless, the performance of conventional classification methods is limited by the complexity of image interpretation, so newer methods that overcome these limitations need to be studied. Meanwhile, random forest (RF), a relatively new machine learning algorithm, has attracted much attention in image classification and pattern recognition. Several studies have shown the benefits of RF in land use classification, but few of them have focused on the urban context or on the use of new satellite imagery and lidar. This research proposes and investigates a new method for classifying an urban scene that combines the object-oriented approach with RF classification. For comparison, conventional classification methods were also applied.

    In pixel-based classification with different algorithms, RF produced the highest overall accuracy, about 82%. In object-based classification, SVM produced the highest overall accuracy, about 79%, and RF reached 77%. When object-based features were classified for objects obtained from segmentation at a single level, the proposed method improved the overall accuracy of RF from 75% to 76%, of the support vector machine (SVM) from 75% to 78%, of the neural network from 70% to 75%, and of the maximum likelihood algorithm from 44% to 77%. The results therefore show that the proposed method can improve classification performance in terms of both accuracy and speed.

    Keywords: urban land cover classification, image segmentation, object-oriented features, random forest algorithm, feature selection, support vector machine, neural networks, remote sensing.

    1 Introduction

    1-1 Preface

    Having up-to-date information is a great advantage today: it leads to better decisions and a better life in human societies. Among the most important examples are up-to-date land cover maps, which urban managers need in order to make informed decisions, manage, and plan. Remote sensing is a rich source of spatial and environmental information, and land cover maps are among its most fundamental products. Land cover information is used to produce land use maps, to study environmental change, and to relate human factors to the physical variables of the environment.

    To produce land cover maps, this information must first be extracted from satellite images and other data. Visual interpretation and machine learning algorithms are the two common ways of doing so, and each has advantages and disadvantages. In some cases, information extracted from satellite and aerial images by human interpreters is better than the output of automatic or semi-automatic methods. But manual and traditional methods no longer meet current needs, and automatic methods that do not depend on human intervention have to be developed. Newer learning algorithms are continually being developed for this purpose.

    When information is extracted from remote sensing images in the traditional way, three issues must be considered: 1- the large volume and rapid growth of remote sensing data and images, 2- the time-consuming nature of information extraction by humans, and 3- the complexity of the features, which can cause errors in visual interpretation and in some cases makes extraction by eye impossible. The solution is to use machine learning algorithms, whose ultimate goal is to extract information without human intervention.

    The most important task of machine learning algorithms in remote sensing is to classify data into information classes. Common algorithms such as maximum likelihood classification (MLC), support vector machines (SVM), and artificial neural networks (ANN) have problems such as 1- the need for large and error-free training data, 2- the need to determine the initial parameters correctly and optimally, 3- heavy computation, and 4- low accuracy in information extraction. Random forest (RF) is a newer machine learning algorithm that produces satisfactory classification results by combining tree classifiers, and it can solve some of the problems of the earlier algorithms.
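
    The idea of combining tree classifiers can be illustrated with a minimal sketch. It is not the implementation used in the thesis, nor the full random forest algorithm: it assumes one-split trees (stumps), each trained on a bootstrap sample and on one randomly chosen band, and combined by majority vote. The function names, the two-band toy pixels, and the class labels are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples, labels, feature):
    """Fit a one-split tree on one feature, choosing the threshold with the fewest errors."""
    values = sorted({s[feature] for s in samples})
    # Candidate thresholds are midpoints between neighbouring distinct values;
    # a sample with a single distinct value yields a constant stump.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for threshold in candidates:
        left = [y for s, y in zip(samples, labels) if s[feature] <= threshold]
        right = [y for s, y in zip(samples, labels) if s[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return (feature,) + best[1:]


def train_forest(samples, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n = len(samples)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        feature = rng.randrange(len(samples[0]))    # random feature for this tree
        forest.append(
            train_stump([samples[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, sample):
    """Classify one sample by majority vote over all stumps."""
    votes = Counter()
    for feature, threshold, left_label, right_label in forest:
        votes[left_label if sample[feature] <= threshold else right_label] += 1
    return votes.most_common(1)[0][0]


# Hypothetical training pixels: (red reflectance, near-infrared reflectance).
pixels = [(0.1, 0.8), (0.2, 0.7), (0.15, 0.9), (0.7, 0.2), (0.8, 0.3), (0.9, 0.1)]
classes = ["vegetation"] * 3 + ["roof"] * 3
forest = train_forest(pixels, classes)
```

    Calling predict(forest, (0.12, 0.85)) then returns the class that most stumps vote for. A full random forest grows deeper trees and draws a random feature subset at every split, which this sketch leaves out for brevity.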

    A picture is worth more than a thousand words, and remote sensing provides images that carry a wide range of information about the environment. As mentioned, this information can be obtained by classifying the images. In most cases, pixel-based methods are used, which classify image pixels according to their numerical values. But the features we are usually looking for in an image are not single pixels; they are collections of pixels, or objects. Because the aim of this research is land cover classification and the desired final features are not single pixels, the image is first segmented to produce image objects, and these objects are then classified according to their characteristics to obtain the land cover classes.
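
    The segment-then-classify workflow described above can be sketched as follows. The sketch assumes a segmentation already exists as a grid of object ids and uses per-band means as the only object feature; the function names, the toy image, and the threshold rule standing in for a trained classifier are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def object_mean_features(image, segments):
    """Return {object_id: tuple of per-band means}.

    image:    rows of pixels, each pixel a tuple of band values
    segments: rows of integer object ids with the same shape as image
    """
    sums = {}
    counts = defaultdict(int)
    for img_row, seg_row in zip(image, segments):
        for pixel, obj in zip(img_row, seg_row):
            if obj not in sums:
                sums[obj] = [0.0] * len(pixel)
            for band, value in enumerate(pixel):
                sums[obj][band] += value
            counts[obj] += 1
    return {obj: tuple(s / counts[obj] for s in band_sums)
            for obj, band_sums in sums.items()}


def label_map(segments, object_labels):
    """Paint each object's class label back onto the pixel grid."""
    return [[object_labels[obj] for obj in row] for row in segments]


# Hypothetical 2x2 image with (red, near-infrared) bands and two objects.
image = [[(0.1, 0.8), (0.3, 0.6)],
         [(0.8, 0.2), (0.6, 0.4)]]
segments = [[1, 1],
            [2, 2]]

features = object_mean_features(image, segments)

# Placeholder rule in place of a trained classifier such as RF or SVM.
object_labels = {obj: "vegetation" if nir > red else "built-up"
                 for obj, (red, nir) in features.items()}
classified = label_map(segments, object_labels)
```

    The point of the design is that the classifier sees one feature vector per object rather than one per pixel, so every pixel in an object receives the same class.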

    In this research, classification is carried out with several pixel-based and object-based methods, and the results of each are discussed, so that a suitable method for classifying urban land cover from hyperspectral images can be identified among those investigated. Because urban land cover is more complex and more important than natural land cover, an image of an urban scene containing a variety of features is examined so that the different classification methods can be evaluated more accurately.
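
    The thesis evaluates the classifiers by overall accuracy and the Kappa coefficient (section 4-4-1 in the contents). As a hedged illustration of how these two measures are commonly computed from a confusion matrix, the sketch below uses hypothetical function names and a made-up set of four test samples; it does not reproduce any result of the thesis.

```python
def confusion_matrix(reference, predicted, classes):
    """Rows are reference classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def kappa(matrix):
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    # Chance agreement from the products of row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Made-up reference and predicted labels for four test samples.
reference = ["roof", "roof", "tree", "tree"]
predicted = ["roof", "tree", "tree", "tree"]
matrix = confusion_matrix(reference, predicted, ["roof", "tree"])
```

    For this toy case three of four samples are correct, so the overall accuracy is 0.75, while chance agreement is 0.5 and Kappa is therefore 0.5.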

    1-2 Necessities, motivations and characteristics of the research

    Previous studies of land cover classification have used many methods and data types (Lu and Weng, 2007). Most of them rely on advanced and at the same time complex methods such as neural networks, support vector machines, RFM, or combinations of these methods with optimization and fuzzification techniques. For general users of remote sensing, understanding many of these methods in depth, solving the problems that arise in using them, and setting their parameters require a great deal of study and time. For this reason, the methods may not be applied properly in some remote sensing applications.

    With the advancement of remote sensing sensors, it is now possible to use high-resolution spectral and spatial information simultaneously. In addition, lidar sensors can provide accurate height information about the environment (Hodgson et al., 2003). Integrating these two types of data can greatly improve classification accuracy and the preparation of urban land cover maps. Many studies have classified and integrated these data to produce land cover maps, and most of them have increased the classification accuracy of hyperspectral and lidar images by relying on advanced and complex methods. The question that arises is whether such complex methods, which are often computationally expensive, are always necessary to increase the classification accuracy of hyperspectral images, or whether the same accuracy can be achieved with simpler methods.

    One of the newer classification methods is RF, which uses a very simple algorithm that combines several simple base classifiers and whose parameters are very easy to set (Joelsson et al., 2010). Previous studies of RF have demonstrated the practical capabilities of this method. Its reported advantages and its simplicity are the main motivation for using it to classify hyperspectral images in this research. Some researchers have also shown that image segmentation and object-oriented classification can increase classification accuracy (Kettig and Landgrebe, 1976; Geneletti and Gorte, 2003; Benz et al., 2004; Walter, 2004; Blaschke, 2010).

  • Contents & References of Investigating the random forest method to improve urban land cover classification using satellite images

    List:

    Chapter 1 Introduction

    1-1 Preface

    1-2 Necessities, motivations and characteristics of the research

    1-3 Objectives and research questions

    1-4 Research method

    1-5 Brief introduction of other chapters

    Chapter 2 Review of previous research

    2-1 Introduction

    2-2 An overview of land cover classification methods

    2-2-1 Object-oriented classification techniques

    2-2-2 Unsupervised pixel-based classification techniques

    2-2-3 Supervised pixel-based classification techniques

    2-3 Overview of new classification methods in remote sensing

    2-3-1 Classification with artificial neural networks

    2-3-2 Classification with decision trees

    2-3-3 Classification with support vector machine based methods

    2-3-4 Knowledge-based classification techniques

    2-3-5 Classification with combined algorithms

    2-4 Methods of selecting and reducing feature space

    2-5 Summary of the chapter

    Chapter 3 Concepts and methods

    3-1 Introduction

    3-2 Basic concepts

    3-3 Common learning algorithms

    3-3-1 Linear discriminant analysis

    3-3-2 Decision trees

    3-3-3 Neural networks

    3-3-4 Naive Bayes classifier

    3-3-5 Methods based on support vector machines and kernels

    3-4 Ensemble methods

    3-5 Boosting

    3-6 Bagging method

    3-6-1 Two group patterns

    3-6-2 Bagging algorithm

    3-6-3 Random forest

    3-6-4 Feature selection with the help of RF feature importance index

    3-7 Image segmentation

    3-7-1 Segmentation by multi-resolution method

    3-7-2 Method of estimating the appropriate scale for image segmentation

    3-8 Estimation of classification accuracy

    3-8-1 Confusion matrix

    3-9 Summary

    Chapter 4 Research method and results

    4-1 Introduction

    4-2 Data and study area

    4-3 Proposed research method

    4-3-1 Band selection with the help of RF feature importance index

    4-3-2 Hyperspectral image segmentation

    4-3-3 Feature groups

    4-3-4 Classification

    4-4 Evaluation

    4-4-1 Evaluation results of overall accuracy and Kappa coefficient

    4-4-2 Time evaluation of classification methods

    4-4-3 Classification results by classes

    4-4-4 Visual evaluation

    4-5 Summary of the contents of the chapter

    Chapter 5 Conclusions and suggestions

    5-1 Introduction

    5-2 Summary of the research

    5-3 Research achievements

    5-4 Suggestions

    References

     

    References:

     

    Alexander C, Tansey K, Kaduk J, Holland D, and Tate NJ (2011) An approach to classification of airborne laser scanning point cloud data in an urban environment. International Journal of Remote Sensing 32, 9151-69.

    Baatz M and Schäpe A (2000) Multiresolution segmentation: an optimization approach for high quality multi-scale image segmentation. Angewandte Geographische Informationsverarbeitung 12, 12-23.

    Ball GH and Hall DJ (1965) ISODATA, a novel method of data analysis and pattern classification. DTIC Document.

    Bartels M and Wei H (2006) Maximum Likelihood Classification of LIDAR Data incorporating multiple co-registered Bands. In Proceedings of 4th International Workshop on Pattern Recognition in Remote Sensing/18th International Conference on Pattern Recognition, Vol. 20, pp. 17-20. Citeseer.

    Bekkari A, Idbraim S, Elhassouny A, Danielle Ducrot DM, Yassa ME, and Ducrot D (2012) Spectral and Spatial Classification of High Resolution Urban Satellites Images using Haralick features and SVM with SAM and EMD distance metrics. International Journal of Computer Applications 46,

    Benz UC, Hofmann P, Willhauck G, Lingenfelder I, and Heynen M (2004) Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS Journal of Photogrammetry and Remote Sensing 58, 239-58.

    Bilgin G, Erturk S, and Yildirim T (2011) Segmentation of hyperspectral images via subtractive clustering and cluster validation using one-class support vector machines. Geoscience and Remote Sensing, IEEE Transactions on 49, 2936-44.

    Blaschke T (2010) Object based image analysis for remote sensing. ISPRS journal of photogrammetry and remote sensing 65, 2-16.

    Bonissone P, Cadenas J, Garrido M, and Díaz-Valladares R (2008a) A fuzzy random forest: Fundamental for design and construction. In Proceedings of the 12th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'08), pp. 1231-8.

    Bonissone PP, Cadenas JM, Garrido M, and Diaz-Valladares RA (2008b) Combination methods in a fuzzy random forest. In Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on, pp. 1794-9. IEEE.

    Bosch A, Zisserman A, and Muñoz X (2007) Image classification using random forests and ferns. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pp. 1-8. IEEE.

    Boulesteix AL, Janitza S, Kruppa J, and König IR (2012) Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2, 493-507.

    Breiman L (1996) Bagging predictors. Machine learning 24, 123-40.

    Breiman L (2003) Manual for Setting Up, Using, and Understanding Random Forest V4.0.

    Breiman L (2001) Random forests. Machine learning 45, 5-32.

    Bruzzone L, Marconcini M, Wegmuller U, and Wiesmann A (2004) An advanced system for the automatic classification of multitemporal SAR images. Geoscience and Remote Sensing, IEEE Transactions on 42, 1321-34.

    Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery 2, 121-67.

    Camps-Valls G and Bruzzone L (2005) Kernel-based methods for hyperspectral image classification. Geoscience and Remote Sensing, IEEE Transactions on 43, 1351-62.

    Chan JC-W and Canters F (2007) Ensemble classifiers for hyperspectral classification. In Proceedings 5th EARSeL Workshop on Imaging Spectroscopy. Bruges, Vol. 1.

    Chapelle O and Keerthi SS (2008) Multi-class feature selection with support vector machines. In Proceedings of the American Statistical Association.

    Chehata N, Guo L, and Mallet C (2009) Airborne lidar feature selection for urban classification using random forests. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 39, 207-12.

    Chen Y-W and Lin C-J (2006) Combining SVMs with various feature selection strategies. In Feature Extraction, pp. 315-24. Springer.

    Cohen J (1960) A coefficient of agreement for nominal scales. Educational and psychological measurement 20, 37-46.

    Cortes C and Vapnik V (1995) Support-vector networks. Machine learning 20, 273-97.

    Crawford MM, Ham J, Chen Y, and Ghosh J (2004) Random forests of binary hierarchical classifiers for analysis of hyperspectral data. In Advances in Techniques for Analysis of Remotely Sensed Data, 2003 IEEE Workshop on.
