WWW.ABSTRACT.XLIBX.INFO
FREE ELECTRONIC LIBRARY - Abstract, dissertation, book
 
<< HOME
CONTACTS



Pages:   || 2 |

«arch J. Senthilnatha1, Vipul Dasb2, S.N. Omkara3, V. Mania4 a Department of Aerospace Engineering, Indian Institute of Science, Bangalore, India b ...»

-- [ Page 1 ] --

Clustering using Levy Flight Cuckoo Search

J. Senthilnatha1, Vipul Dasb2, S.N. Omkara3, V. Mania4

a

Department of Aerospace Engineering, Indian Institute of Science, Bangalore, India

b

Department of Information technology, National Institute of Technology, Karnataka, India

{ snrj@aero.iisc.ernet.in; 2vipulramdas@gmail.com; 3omkar@aero.iisc.ernet.in;

mani@aero.iisc.ernet.in}

Abstract. In this paper, a comparative study is carried using three nature-inspired

algorithms namely Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Cuckoo Search (CS) on clustering problem. Cuckoo search is used with levy flight. The heavy-tail property of levy flight is exploited here. These algorithms are used on three standard benchmark datasets and one real-time multi-spectral satellite dataset. The results are tabulated and analysed using various techniques. Finally we conclude that under the given set of parameters, cuckoo search works efficiently for majority of the dataset and levy flight plays an important role.

Keywords: Genetic algorithm, Particle swarm optimization, Cuckoo search, Levy flight, Clustering.

1 Introduction Clustering is an unsupervised learning method where objects with closer resemblance are grouped together to form a cluster based on a similarity measure. The objective of clustering is to minimize intra-cluster distance while inter-cluster distance is maximized [1]. Clustering has various applications which include data analysis, machine learning, image analysis and other engineering applications.

Clustering can be classified into two types: hierarchical and partition. In hierarchical clustering, objects belong to more than one cluster forming a hierarchical pattern. Hierarchical clustering is carried out by splitting and merging the dataset.

In splitting the number of cluster centres generated would be greater than the number of classes while merging is to group the dataset to exact number of classes. In partition clustering, objects are clustered into disjoint groups without forming a hierarchy. In both methods, similarity measure is used to generate cluster centres.

Previously, the most popularly used and tested partition based algorithm is kmeans clustering. The main disadvantage of k-means clustering is convergence to J. C. Bansal et al. (eds.), Proceedings of Seventh International Conference on Bio-Inspired 65 Computing: Theories and Applications (BIC-TA 2012), Advances in Intelligent Systems and Computing 202, DOI: 10.1007/978-81-322-1041-2_6, Ó Springer India 2013 66 J. Senthilnatha et al.

the local minima [2]. In literature, nature inspired algorithms are used effectively in clustering problem as it converges to global minima [2, 3]. These algorithms are based on the exploration and exploitation behaviour observed in nature and is effectively used in optimization problems.

In this paper, a comparative performance study is carried out based on the results obtained using three nature inspired algorithms namely genetic algorithm (GA), particle swarm optimization (PSO) and cuckoo search algorithm (CS) on clustering problem. The standard benchmark clustering data used in our study are the same that is available in the (UCI machine learning repository) literature [4] and a real-time multi-spectral satellite image for crop type classification. Xin-She et.al [5] has implemented and analyzed CS algorithm by comparing with GA and PSO using standard benchmark functions. In their study, CS algorithm is used with levy flight and is found to be performing better compared to the other two methods. In literature, CS has been used without levy distribution for clustering problem on satellite image [3]. In our study, we use CS with levy flight as used in [5], on clustering data set by comparing with GA and PSO. The important property of levy flight is it makes sure that the whole search space is covered, which is due to the heavy-tailed property of levy distribution [6-10]. In our study, we split the data into training and testing samples. The cluster centres are determined using the algorithms on the training dataset and the testing dataset is used to determine the classification error percentage (CEP).

The remaining sections are in the following order: in section 2 the problem formulation for clustering is discussed, in section 3 a brief discussion of the algorithms is presented, in section 4 and section 5 we discuss analysis of the results obtained and conclusion respectively..

2 Problem Formulation

–  –  –

3 Methodology This section gives brief introduction about the algorithms used in our study, the way it has been applied for clustering problem and also the pseudo-code for the algorithms are discussed.

3.1 Genetic algorithm This algorithm is based on the natural selection process seen in nature [11, 12].

The best fit organism of the current generation carries on the genes to the next generation. The concept of genetic operators (cross-over and mutation) is included in the algorithm wherein a change in the gene structure is introduced that produces an entirely different trait. The main idea behind genetic algorithm is the operators used namely reproduction, crossover and mutation.

This algorithm takes a predetermined number of random solutions (population) in the search space called chromosomes. Here the convergence criterion is used to terminate the algorithm. At each iteration the chromosomes are made to crossover using single point crossover and the fitness of each chromosomes is calculated using fi = f(xi) i=1,2,...,n 3 where f(xi) is the fitness function given by Eq. 1 considering the clusters individually and n is the population size.





The fittest chromosomes (solutions) among the entire population are considered for the next generation (iteration). At any random point the chromosomes undergo mutation based on the mutation rate. The fitness is calculated and the best solutions carryon till termination criteria is reached. Thus the cluster centres are generated using the training data set.

Pseudo-code

1. Initialize population of n chromosomes

2. Repeat till stopping criteria

a) Calculate fitness using Eq. 3

b) Apply elitism by sorting the fitness value of the population

c) Retain the best fit solutions (reproduction)

d) Crossover the adjacent chromosomes at a random position using single point crossover

e) Mutate randomly selected point within a chromosome

3. Cluster centre will be the best fit solution from the population 68 J. Senthilnatha et al.

3.2 Particle Swarm Optimization

This is a population based method which iteratively improves the solution by moving the solutions closer to the optimal solution. Here each particle moves towards the optimal solution with a velocity vi at each iteration. Eventually all particles converge to an optimal position [13].

Initially n particles are created and randomly distributed in the search space. The fitness of each particle is evaluated using Eq.3 and Eq.1, considering the classes individually. All the particles are made to move one step towards the fittest particle (global best solution) as well as towards its personal best position with a velocity vi given by vi(t+1)=w*vi(t)+bp*rand*(pi-ci)+bg*rand*(g-ci) 4 where pi is the personal best position of the particle, ci is the current position of the particle, g is the global best of the entire particle, w is the inertial constant, bp is the personal best constant and bg is the global best constant, i=1, 2,..., n. Each particle moves using ci(t+1)=ci(t)+vi 5 The fitness of each particle is calculated and the personal best position and the global best are determined. This process is repeated until stopping criteria is met.

The global best position will be the cluster centre to the given data set.

Pseudo-code

1. Initialize n particles

2. Repeat till stopping criteria met

a) Calculate fitness of each particle using Eq.3

b) global best position is the best fit particle

c) move all the particles towards the global best position using Eq.4 and Eq.5

d) for each particle if (fitness of current position fitness of personal best) then personalbest = current position

e) update personal best position for each particle

f) global best fitness value is retained

3. Cluster centre is the global best position

3.3 Cuckoo Search

This algorithm is based on the breeding pattern of parasitic cuckoos [3, 5, 14].

Some species of cuckoo namely ani and Guira lay their eggs in the nest of other birds. The possibility of occurrence of such act leads to i) the host birds’ eggs being destroyed by the cuckoo itself or the cuckoo chick upon hatching; ii) the host birds may realise the presence of a foreign egg in its nest and may throw away these eggs or abandon the nest altogether and build a new nest elsewhere [5].

These are the processes in nature that this algorithm inculcates. The basic assumptions made are: 1) At a time each cuckoo lays one egg and dumps it into ranClustering using Levy Flight Cuckoo Search 69 domly chosen nest; 2) The best nest with high quality eggs will carry over to the next generation; 3) Each nest contains only one egg and the number of host nests are fixed and; 4) The probability that the host bird discovers the cuckoo egg is pa..

This implies that the fraction pa of n nests is replaced by new nests (with new random solutions) [5].

Each nest represents a solution and a cuckoo egg represents a new solution. The aim is to use the new and potentially better solutions (cuckoo eggs). An initial population of host nest is generated randomly. The algorithm runs till the convergence is reached. At each iteration a cuckoo is selected at random using levy flight as given [5] xi(t+1)=xi(t) + α*L 6 where α is the step-size, L is a value from the Levy distribution, i=1,2,...,n, n is the number of nests considered. The fitness of the cuckoo is calculated using Eq.3 and Eq.1, considering the classes individually.

Choose a random nest from the given population of nests and evaluate its fitness from Eq.6. If the fitness of the new solution is better than the older one then replace the older one with the new one. A fraction pa of the total number of nests is replaced by new nests with new random solution. The best nests with the fittest egg (solution) are carried-on to the next generation.

This is continued till the termination criteria is reached and the best nest with fittest egg is taken as the optimal value. Thus the cluster centres can be generated using this optimal value.

Pseudo-code

1. Initialise n nests

2. Repeat till stopping criteria is met

a) Randomly select a cuckoo using levy flight using Eq.6

b) Calculate its fitness using Eq.3 (Fc)

c) Randomly select a nest

d) Calculate its fitness using Eq.3 (Fn)

e) If (Fc Fn) then Replace the nest with the cuckoo

f) A fraction pa of nest are replaced by new nests

g) Calculate fitness and keep best nests

h) Store the best nest as optimal fitness value

3. Cluster centre will be the best nest position 4 Results and discussion

–  –  –

The performance measures used in this paper are classification error percentage [2], Statistical significance test [4], Receiver operating characteristic [15, 16] and time complexity analyses.

4.1. Classification error percentage

–  –  –

The algorithms are run five times and the average of the results is as shown in Table 2. The values are obtained using the testing dataset. The parameters such as the maximum generation and the number of initial random solution are kept the same for all the algorithms. Each algorithm is run till it converged to a point with a tolerance of 0.01. In GA, the best 40% of the parent generation and the best 60% of the offspring generation are carried on to the next generation. In PSO, the inertial constant (w), the personal best constant (bp) and the global best constant (bg) are all set to 1. In CS algorithm, the probability factor pa is set to 0.25.

Clustering using Levy Flight Cuckoo Search 71

4.2. Statistical significance test

Statistical significance is done to ascertain that the results are obtained consistently. No matter where the initial random solutions are picked up from, they would always converge to the global optimum position (cluster centre). This would imply that an algorithm which performed better than the other algorithms will always perform better when run under similar initial conditions. In this study a binomial test is conducted [3] between CS and GA and also CS and PSO based on the result obtained on image segmentation dataset.

Assume the test is carried between CS and GA. Here the total number of testruns is N, i.e., the result of CS and GA differ in N places. Let S (success) is the number of times CS gave correct result and F (failure) is the number times GA gave correct result. Now, calculating the p-value (probability of S successes out of N trials) using the binomial distribution as N

–  –  –

In our case, we analyse on the image segmentation dataset. We give an example of the analyses using cuckoo search. The classification matrix obtained after applying cuckoo search algorithm on image segmentation data is given in Table 5.

–  –  –

In the above representation, row indicates the class the element belongs to and the column indicates the class the elements are classified into after using the cluster centre based on the CS algorithm. The principal diagonal elements represent correctly classified elements. Consider class 1, from Table 4, we have TP=126, FN= 174, FP=168, TN=1632. From this data, we calculate the true posiClustering using Levy Flight Cuckoo Search 73 tive rate, false positive rate and the accuracy of the given algorithm on class 1 of the given clustering dataset. From Eq. 9, Eq. 10 and Eq. 11, TPR is 0.4200, FPR is

0.0933 and ACC is 0.8371. This implies that 42% of what actually belonged to class 1 was correctly classified and 9% of the data which did not belong to class 1 were added to class 1. The overall efficiency of the algorithm with respect to class 1 is 83%. Similarly the ROC analyses for all the classes of image segmentation dataset for the above three algorithms are given in Table 6.

–  –  –



Pages:   || 2 |


Similar works:

«TECHNICAL REPORT AND RESOURCE ESTIMATE ON THE GREAT BURNT COPPER PROPERTY, CENTRAL NEWFOUNDLAND FOR PAVEY ARK MINERALS INC. LATITUDE 48o 20’ 28” N LONGITUDE 56o 09’ 06” W UTM WGS84 Zone 21U 562869 mE 5354553 mN; NTS 12A/08 NI-43-101 & 43-101F1 TECHNICAL REPORT Eugene Puritch, P.Eng. Jarita Barry, P.Geo. P&E Mining Consultants Inc., Report 297 Effective Date: January 12, 2015 Signing Date: February 18, 2015 TABLE OF CONTENTS 1.0 SUMMARY 2.0 INTRODUCTION AND TERMS OF REFERENCE 2.1 TERMS...»

«Surface Web Semantics for Structured Natural Language Processing Mohit Bansal Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2015-20 http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-20.html May 1, 2015 Copyright © 2015, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or...»

«ACAR, KELECİOĞLU / Comparison of Differential Item Functioning Determination Techniques:. • 1343 Predictiveness of Identity Status, Main Internet Use Purposes and Gender on University Students’ the Problematic Internet Use Esra CEYHAN* Abstract This study aims at revealing the relationships between the problematic Internet use of university students and their identity status, main Internet use purposes, and gender. A total of 464 university students participated in the study, and the...»

«          UNITED NATIONS INSTITUTE EUROPEAN UNION INDONESIA FOR DISARMAMENT RESEARCH ail.unog.     Supporting the Arms Trade Treaty Negotiations  through Regional Discussions and Expertise Sharing              Regional Seminar for Countries   in Eastern Asia and the Pacific      6–8 June 2011  Bali, Indonesia    SUMMARY REPORT    Table of contents         Introduction Seminar proceedings Findings and recommendations Calls for strong national...»

«TECHNISCHE UNIVERSITÄT MÜNCHEN Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt Lehrstuhl für Experimentelle Genetik Physiology and molecular characterization of metabolism related mouse models for bone disease Shen Chi Vollständiger Abdruck der von der Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften...»

«Admission test 5EC – English Admission test 5EC – English Date: Surname: Mark: Name: Students’ level of English will be evaluated through a 3 hour test (CEF level B2) and is composed of the following parts: 1. A listening comprehension (20 m) 2. A reading comprehension (20 m) 3. A written essay in academic English (+/250 words) (40 m) 4. An oral examination of 10-15 minutes (40 m) 1|Page Admission test 5EC – English English admissions test for 5IA at the Lycée Technique Michel Lucius...»

«Institut für Informatik der Technischen Universität München Augmented Chemical Reactions – Research on 3D Selection and Confirmation Methods Patrick Julian Ludwig Maier Institut für Informatik der Technischen Universität München Fachgebiet Augmented Reality Augmented Chemical Reactions – Research on 3D Selection and Confirmation Methods Patrick Julian Ludwig Maier Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des...»

«СОБЫТИЙНЫЙ МАРКЕТИНГ ВУЗА НА ПРИМЕРЕ ПРАЗДНОВАНИЯ «80 ЛЕТ ДГТУ» Астахова И.В. Донской государственный технический университет Ростов-на-Дону, Россия UNIVERSITY EVENT MARKETING BY EXAMPLE OF CELEBRATING “THE 80TH ANNIVERSARY OF DSTU” Astahova I.V. Don State Technical University Rostov-on-Don, Russia В условиях современного рынка, когда...»

«RISK MANAGEMENT AND CONTROL GUIDANCE FOR SECURITIES FIRMS AND THEIR SUPERVISORS A Report by the Technical Committee of the International Organization of Securities Commissions May 1998 TABLE OF CONTENTS I. Introduction II. The Role of Risk Management and Controls III. Firm and Supervisory Considerations IV. Elements of a Risk Management and Control System. 13 Appendix A Significant Risk Management and Control Papers. 19 Appendix B Risk Management and Control Self-Assessment Grid. 27 -1I....»

«Schlussbericht zu dem IGF-Vorhaben Beherrschung stark korrelierter Logistikund Produktions-Prozesse (AutokorrelierteAuftragsstroeme) der Forschungsstelle(n) (1) Technische Universität Dresden, ITLA, Professur für Technische Logistik (3) Universität der Bundeswehr München, Institut für Technische Informatik Das IGF-Vorhaben 17344BR der Forschungsvereinigung Logistik wurde über die im Rahmen des Programms zur Förderung der Industriellen Gemeinschaftsforschung (IGF) vom aufgrund eines...»

«Univac Tubecrafts pvt. Ltd. Univac Group India is one of the leading manufacturer Exporter in the hair extension Industry, established in 1997 with an objective of offering the finest quality virgin Indian Remy Hair to the hair loving people across the world. A multi dimensional company have wide experience of souring virgin Indian Remy hair directly from it’s source ‘the Indian Temples’ where Hindu women save off their heads to honor the God on fulfillment of their wishes, the religious...»

«Global Value Chains and the Great Recession: Evidence from Italian and German Firms Antonio Accetturo, Banca d’Italia* Anna Giunta, Università Roma Tre Abstract During the last two decades, profound changes in the international division of labour among firms have occurred, with impressive growth in outsourcing, off-shoring of some stages of production and the globalization of intermediates goods markets. This new model of the international division of labour has both initiated an increasing...»





 
<<  HOME   |    CONTACTS
2016 www.abstract.xlibx.info - Free e-library - Abstract, dissertation, book

Materials of this site are available for review, all rights belong to their respective owners.
If you do not agree with the fact that your material is placed on this site, please, email us, we will within 1-2 business days delete him.