Literature review cluster analysis
Received: October 16, ; Accepted: February 23, ; Published:

The challenging aspect is to organize the documents in a way that results in better search without introducing much extra cost and complexity. The Cluster Hypothesis is fundamental to the issue of improved effectiveness. It states that relevant documents tend to be more similar to each other than to non-relevant documents and therefore tend to appear in the same clusters (Jardine and van Rijsbergen). If the cluster hypothesis holds for a particular document collection, then relevant documents will be well separated from non-relevant ones.
A relevant document may be ranked low in a best-match search because it may lack some of the query terms. In a clustered collection, this relevant document may be clustered together with other relevant items that do have the required query terms and could therefore be retrieved through a cluster search (Croft). In most IR systems, if a document does not contain any of the query terms then its similarity to the query will be zero and this document will not be retrieved in response to the query.
Document clustering offers an alternative file organization to that of best-match retrieval and it has the potential to address this issue, thereby increasing the effectiveness of an IR system. Document clustering has traditionally been applied statically to an entire document collection before querying (static clustering). An alternative application is to cluster only the documents that have been retrieved by an IR system in response to a query (post-retrieval clustering) (Preece). Under post-retrieval clustering the resulting groups of documents are likely to be different for different queries.
Document clustering typically operates based on the notion of inter-document similarity. The set of terms shared between a pair of documents is typically used as an indication of the similarity of the pair. Document clustering has been investigated for use in different areas of text mining and IR. Initially, document clustering was investigated for improving the precision or recall in IR systems (Rijsbergen et al.).
Clustering has also been proposed for use in browsing a collection of documents (Cutting et al.).

VECTOR-SPACE MODEL OF DOCUMENTS

In IR, a document refers generically to the unit of text indexed in the system and available for retrieval. A collection refers to a set of documents being used to satisfy user requests. A term refers to a lexical item that occurs in a collection, but it may also include phrases. The representation of a set of documents as vectors in a common vector space is known as the vector space model.
In the vector space model of IR, documents are represented as vectors of features representing the terms that occur within the collection (Salton). The value of each feature is called the term weight and is usually a function of the term's frequency (or tf-idf) in the document, along with other factors. The vector space model procedure can be divided into three stages. The first stage is document indexing, where content-bearing terms are extracted from the document text.
The second stage is the weighting of the indexed terms to enhance retrieval of documents relevant to the user. The final stage is identifying the similarities between the document and the centroid of the cluster. More generally, a vector for a document Di is represented as:

Di = (w1,i, w2,i, ..., wNd,i)

That is, a document vector contains a weight feature for each of the Nd terms that occur in the collection as a whole; w1,i refers to the weight that term 1 has in document i.
It is useful to view the features used to represent documents as a multi-dimensional space, where the feature weights serve to locate documents in that space. The set of documents in a collection then turns into a vector space, with one axis for each term. Premalatha and Natarajan introduced concept-based indexing for dimensionality reduction in documents. In this method a concept hierarchy is created and the documents are indexed based on concepts rather than keywords.
The term weights are set as the simple frequency counts of the terms in the documents. This method is used to assign term weights in the document. The term frequency, in its simplest form, is the raw frequency of a term within a document. This reflects the intuition that terms occurring frequently within a document may reflect its meaning more strongly than terms that occur less frequently and should thus have higher weights.
The second factor is used to give a higher weight to words that occur in only a few documents. The Inverse Document Frequency or IDF term weight (Sparck Jones) is one way of assigning higher weights to these more discriminative words. The fewer documents a term occurs in, the higher this weight. The lowest weight of 1 is assigned to terms that occur in all the documents. Due to the large number of documents in many collections, this measure is usually squashed with a log function. The resulting definition of IDF is thus:

IDF = log(N / df)

where N is the number of documents in the collection and df is the number of documents containing the term. The tf-idf weight thus prefers words which are frequent in the current document j but rare overall in the collection.
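As an illustration, the tf and idf factors above can be combined in a few lines of code. This is a hypothetical sketch over a toy collection (whitespace tokenization, raw term counts, log-squashed IDF); a real system would first apply the tokenization, stop-listing and stemming discussed later in this section.

```python
import math

def tfidf_weights(docs):
    """Compute tf-idf weights for each document in a small collection.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    N = len(docs)
    # Document frequency: number of documents each term occurs in.
    df = {}
    for doc in docs:
        for term in set(doc.split()):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        tf = {}
        for term in doc.split():
            tf[term] = tf.get(term, 0) + 1
        weights.append({t: c * math.log(N / df[t]) for t, c in tf.items()})
    return weights

docs = ["cluster analysis of documents",
        "retrieval of clustered documents",
        "analysis of retrieval systems"]
w = tfidf_weights(docs)
# "of" occurs in every document, so its weight is log(1) = 0;
# "cluster" occurs in only one document and gets the highest IDF.
```

Note how a term occurring in all documents receives weight zero under the log formulation, matching the intuition that such a term has no discriminative power.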
The characterization of documents as vectors of term weights allows us to view the document collection as a whole as a matrix of weights, where wi,j represents the weight of term i in document j. In vector-based information retrieval, the similarity between two documents is measured by the cosine of the angle between their vectors.
When two documents are identical they will receive a cosine of one; when they are orthogonal (that is, they share no common terms) they will receive a cosine of zero. The equation for cosine is:

cos(Di, Dj) = (Di · Dj) / (|Di| |Dj|)

That is, cosine is the dot product between the two vectors divided by the lengths of each of the two vectors: the numerator of the cosine is the dot product and the denominator contains terms for the lengths of the two vectors.
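A minimal sketch of this cosine computation, with documents represented as sparse term-to-weight dictionaries (an assumed representation, not prescribed by the text):

```python
import math

def cosine(a, b):
    """Cosine of the angle between two term-weight vectors (dicts: term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())   # numerator: dot product
    na = math.sqrt(sum(w * w for w in a.values()))       # |a|
    nb = math.sqrt(sum(w * w for w in b.values()))       # |b|
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

d1 = {"cluster": 2.0, "analysis": 1.0}
d2 = {"cluster": 2.0, "analysis": 1.0}
d3 = {"retrieval": 3.0}
# Identical documents give cosine 1.0; documents sharing no terms give 0.0.
```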
The similarity between the document Di and centroid Oj of the cluster Cj is measured as shown in Eq. The words that occur in a collection are used to represent the documents in the collection. The common variations on this assumption involve tokenization, a stop list and the use of stemming to identify words in the documents. For grammatical reasons, documents are going to use different forms of a word, such as organize, organizes and organizing.
Additionally, there are families of derivationally related words with similar meanings, such as democracy, democratic and democratization. The goal of stemming is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. Removing suffixes by automatic means is an operation which is especially useful in the field of IR. The suffix stripping process reduces the total number of terms in the IR system and hence reduces the size and complexity of the data in the system.
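The idea of suffix stripping can be illustrated with a deliberately crude sketch. This is not the Porter algorithm (which applies a sequence of context-sensitive rewrite rules in several phases); it only strips a handful of common suffixes to show how inflected forms collapse to a shared base form:

```python
def crude_stem(word):
    """Toy suffix stripper illustrating the idea behind stemming.

    NOT the Porter algorithm: it simply removes the first matching suffix
    from a short, hand-picked list, provided a minimal stem length remains.
    """
    for suffix in ("izations", "ization", "izing", "izes", "ize",
                   "ings", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

# organize, organizes and organizing all map to the same base string.
stems = {crude_stem(w) for w in ("organize", "organizes", "organizing")}
```

Even this crude reduction shrinks the term vocabulary, which is exactly the benefit claimed above for suffix stripping in an IR system.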
The Porter stemmer (Porter) is frequently used for retrieval from collections of English documents. The proposed work uses the Porter Stemmer algorithm for stemming. This section also discusses some other important clustering algorithms.

Hierarchical agglomerative clustering: Hierarchical clustering techniques proceed by either a series of successive merges or a series of successive divisions. For both methods, the number of clusters is needed to select a clustering from the hierarchy. However, the difference between the levels of the hierarchy may be an indication of the correct number of clusters.
Hierarchical clustering generates a hierarchical tree of clusters. This tree is also called a dendrogram (Berkhin). Hierarchical methods can be further classified into agglomerative methods and divisive methods. In an agglomerative method, initially, each object forms a cluster.
Then the two most similar clusters are merged iteratively until some termination criterion is satisfied. Figure 1 shows an example of the hierarchical agglomerative method. This figure depicts four patterns labeled A1, A2, A3 and A4. Initially the agglomerative method places each object into a cluster of its own. The clusters are merged step-by-step according to some criterion. In the given example A2 and A3 are merged first because they have the minimum Euclidean distance. The cluster merging process repeats until all of the objects are eventually merged to form one cluster.
A hierarchical algorithm yields a dendrogram representing the nested grouping of patterns and the similarity levels at which groupings change. The dendrogram can be broken at different levels to yield different clusterings of the data. The following are the steps in an agglomerative hierarchical clustering algorithm for grouping N objects:

Step 1: Begin with N clusters, each containing one object
Step 2: Calculate the distance between each pair of clusters. These distances are usually stored in a symmetric distance matrix
Step 3: Merge the two clusters with the minimum distance
Step 4: Repeat steps 2 and 3 until a single cluster remains

Agglomerative algorithms are usually classified according to the inter-cluster similarity measure they use.
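The four steps above can be sketched directly, here using the single-link (minimum) inter-cluster distance on one-dimensional points, and stopping at a chosen number of clusters k instead of merging all the way to one. This is an illustrative toy implementation, not an efficient one; the naive pairwise search repeated at every merge is precisely what gives agglomerative methods their high complexity.

```python
def single_link_agglomerative(points, k=1):
    """Agglomerative clustering of 1-D points following the four steps above."""
    clusters = [[p] for p in points]              # Step 1: N singleton clusters
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):            # Step 2: pairwise distances
            for j in range(i + 1, len(clusters)):
                # Single-link distance: closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best                            # Step 3: merge closest pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                           # Step 4: repeat until done
    return clusters

clusters = single_link_agglomerative([1.0, 1.2, 5.0, 5.1], k=2)
# The two tight groups {1.0, 1.2} and {5.0, 5.1} are recovered.
```

Swapping the `min` in the distance computation for `max` turns this into complete-link clustering, and an average over all member pairs gives group average clustering, the three variants discussed next.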
The most popular of these are single-link, complete-link and group average. Common to all agglomerative methods is high computational complexity, often quadratic or worse. Single-link clustering algorithms join the two clusters containing the two closest documents that are not yet in the same cluster. Complete-link clustering algorithms join the two clusters with the minimum most-distant pair of documents.
In this way, clusters are kept small and compact, since all documents within a cluster have a maximum distance to each other (Frakes and Baeza-Yates). Group average clustering algorithms join the two clusters with the minimum average document distance. In a divisive method, starting from a cluster which consists of all the objects, one cluster is selected and split into smaller clusters recursively until some termination criterion is satisfied. Top-down clustering requires a method for splitting a cluster and proceeds by splitting clusters recursively until individual documents are reached.
Top-down clustering is conceptually more complex than bottom-up clustering. There is evidence that divisive algorithms produce more accurate hierarchies than bottom-up algorithms in some circumstances (Steinbach et al.). Bottom-up methods make clustering decisions based on local patterns without initially taking into account the global distribution; these early decisions cannot be undone. Top-down clustering benefits from complete information about the global distribution when making top-level partitioning decisions. Partitional clustering divides data into several subsets.
Because checking all possible subset systems is computationally infeasible, certain greedy heuristics are used in the form of iterative optimization. Specifically, this means different relocation schemes that iteratively reassign points between the K clusters. Unlike hierarchical methods, in which clusters are not revisited after being constructed, relocation algorithms gradually improve clusters. With appropriate data, this results in high quality clusters. A problem accompanying the use of a partitional algorithm is the choice of the number of desired output clusters.
Partitional algorithms usually produce clusters by optimizing a criterion function defined either locally or globally. The algorithm is typically run multiple times with different starting states and the best configuration obtained from all of the runs is used as the output clustering. The K-means algorithm is by far the most popular clustering tool used in scientific and industrial applications.
A centroid almost never corresponds to an actual data point. The sum of differences between a point and its centroid, expressed through an appropriate distance, is used as the objective function. This objective E(C), the sum of the squares of errors between the points and the corresponding centroids, is equal to the total intra-cluster variance. This measure is used to show how well the centroids represent the members of their clusters. The algorithm then moves the cluster seed centers around in space in order to minimize E(C). This is done iteratively by repeating two steps until a stopping criterion is met; several termination conditions have been proposed. An introduction to data mining clustering techniques can be found in Han et al.
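The two alternating steps can be sketched as follows: assign each point to its nearest centroid, then recompute each centroid as the mean of its members, stopping when the centroids no longer move. This is a minimal illustrative K-means on 2-D points with random initial seeds, not a production implementation.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-means sketch for points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)           # initial seed centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                        # step 1: nearest-centroid assignment
            i = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                            + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Step 2: recompute each centroid as the mean of its cluster members.
        new = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:                    # stopping criterion: no change
            break
        centroids = new
    return centroids, clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = kmeans(pts, k=2)
# The two well-separated groups are recovered, with centroids at their means.
```

Each iteration can only decrease (or leave unchanged) the objective E(C) described above, which is why the procedure converges, although possibly to a local minimum that depends on the initial seeds.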
There is a close relationship between clustering techniques and many other disciplines. Clustering has been used in statistics (Arabie and Hubert) and science (Massart and Kaufman). An introduction in the pattern recognition framework is given in Duda and Hart. Machine learning clustering algorithms were applied to image segmentation and computer vision (Jain and Flynn). Statistical approaches to pattern recognition are found in Dempster et al.
Clustering is widely used for data compression in image processing (Gersho and Gray). Clustering in data mining was brought to life by intense developments in information retrieval and text mining (Dhillon et al.). The papers published in the document clustering field cover a number of diverse areas, such as the development of efficient algorithms for document clustering (Larsen and Aone) and the visualization of clustered document spaces (Allan et al.).
Document clustering has also been used to automatically generate hierarchical clusters of documents (Koller and Sahami). A somewhat different approach is taken by Aggarwal et al. In this work the clustering algorithm used falls under the partitional category. Chakrabarti also discusses various types of clustering methods and categorizes them into partitioning, geometric embedding and probabilistic approaches.
Text mining research in general relies on a vector space model, first proposed by Salton, to model text documents as vectors in the feature space. Features are considered to be the words in the document collection and feature values come from different term weighting schemes, the most popular of which is the Term Frequency-Inverse Document Frequency (tf-idf) term weighting scheme. This model is simple but assumes independence between words in a document, which is not a major problem for statistical methods but poses difficulty in phrase-based analysis. Many clustering techniques have been applied to clustering documents.
Willett provided a survey on applying hierarchical clustering algorithms to clustering documents. Two of the techniques are Buckshot and Fractionation. Buckshot selects a small sample of documents, pre-clusters them using a standard clustering algorithm and assigns the rest of the documents to the clusters formed. In Fractionation, each of the resulting clusters is treated as if it were an individual document and the whole process is repeated until there are only K clusters.
A relatively new technique was proposed by Zamir and Etzioni, who introduced the notion of phrase-based document clustering.
They used a generalized suffix-tree to obtain information about the phrases and used them to cluster the documents. As far as its application to IR is concerned, cluster analysis has been used both for term or keyword clustering and for document clustering. Term clustering (Wulfekuhler and Punch) is performed on the basis of the documents in which terms co-occur and it allows each term in a document, or query, to be replaced by the representation describing the cluster to which this term belongs. Application areas for term clustering include query expansion (Rijsbergen et al.).
Peat and Willett have raised questions regarding the effectiveness of the use of keyword co-occurrence data. Early experimentation showed that the effectiveness of searches based on document partitions is significantly inferior to that based on searches of the unclustered file (Salton). Most recent applications of partitioning methods to IR (Zamir and Etzioni) have also focused on efficiency aspects for on-line browsing tasks, rather than on the effectiveness of the methods. Huang introduced k-modes, an extension of the well-known K-Means algorithm (which clusters numerical data) to categorical data.
By defining the mode notion for categorical clusters and introducing an incremental update rule for cluster modes, the algorithm preserved the scaling properties of K-Means. Naturally, it also inherited its disadvantages, such as dependence on the initial clusters and the inability to automatically detect the number of clusters. Another approach, based on dynamical systems, iterated multiple instances of weighted graphs over the attribute values, using a user-defined combination operator, to eventually converge to a fixed point. The authors argued that upon reaching this fixed point, the weights of the basins can be used to partition the data points, yielding the final clusters.
The dynamical systems approach underlying this method was problematic with regard to the type of detected clusters; the separation of attribute values by their weights was non-intuitive. Moreover, the number of basins required to achieve a sufficiently large probability of convergence can be significant.
The number of links intuitively captures the number of records that two records are both sufficiently similar to. This approach yields satisfactory results when comparing attribute values that never co-occur in a single tuple. It heuristically optimizes a cluster quality function with respect to the number of links in an agglomerative hierarchical fashion.
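The link computation itself is simple to sketch. In the following illustrative fragment (the Jaccard similarity and the threshold value are assumptions chosen for the example, not taken from the text), two records are neighbors when their similarity reaches a threshold theta, and link(i, j) counts their common neighbors:

```python
def jaccard(a, b):
    """Similarity of two categorical records viewed as sets of attribute values."""
    return len(a & b) / len(a | b)

def links(records, theta):
    """Number of common neighbors for every record pair.

    Two records are neighbors when their Jaccard similarity is at least theta;
    link(i, j) counts records that are neighbors of both i and j. Sketch only:
    the full link-based algorithm then agglomeratively maximizes a
    link-based cluster quality function.
    """
    n = len(records)
    neighbors = [{j for j in range(n)
                  if j != i and jaccard(records[i], records[j]) >= theta}
                 for i in range(n)]
    return {(i, j): len(neighbors[i] & neighbors[j])
            for i in range(n) for j in range(i + 1, n)}

recs = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"x", "y", "z"}]
lk = links(recs, theta=0.4)
# Records 0-2 share common neighbors; the disjoint record 3 links to nothing.
```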
The base algorithm exhibits cubic complexity in the number of records, which makes it unsuitable for large datasets. Unlike earlier algorithms, a later summary-based approach characterized the detected categorical clusters; it relied on inter- and intra-attribute summaries that are assumed to fit into main memory for most categorical datasets.
It first computed cluster projections onto the individual attributes. To reduce the complexity of this step, the authors assumed the existence of a distinguishing number K that represents the minimum size of the distinguishing sets, which are attribute value sets that occur uniquely within only one cluster.
The distinguishing sets are then extended to cluster projections. Finally, cluster projections could be combined into cluster candidates over multiple attributes, which are validated against the original dataset. The distinguishing sets rely on the assumption that clusters are uniquely identified by a core set of attribute values that occur in no other cluster. While this assumption may hold true for many real world datasets, it is unnatural and unnecessary for the clustering process. Moreover, it is desirable to choose K as low as computationally feasible in order to detect all clusters.
A small K, however, entails a large number of candidate cluster projections on the individual attributes, which leads to a combinatorial explosion in the number of final clusters. The cluster projections on single attributes that are generated are used in the extension phase to form cluster candidates of higher dimensionality, which are then validated on the actual dataset. To this end, the approach selects as the initial one-dimensional candidates C1 all cluster projections c1 on the first attribute. The candidates are validated by scanning the original dataset and counting the support of each candidate.
However, for both methods, the combination operator, as well as the local modification operations, are left to the user to define depending on the concrete data. Finally, the post-processing required to generate the actual clusters from the basin weights upon reaching the fixed point is non-trivial and impacts the detected clusters. The clusters identified were shown to be incomplete in cases of overlapping cluster projections.
The algorithm introduced by Barbara et al. first bootstraps itself using a sample of maximally dissimilar points from the dataset to create initial clusters. The remaining points are then added incrementally. This approach is highly dependent on the order of selection. To mitigate this dependency, the authors propose removing the worst fitting points at defined times during the execution and re-clustering them.
Cristofor and Simovici presented another approach based on cluster entropy measures for categorical attributes. Starting from a seed clustering, it uses GAs with crossover and mutation operators to heuristically improve the purity of the generated clusters. The quality of the resulting clusters depends on a prior knowledge of the importance of the individual attributes toward the natural clustering. There are earlier works that apply GA and evolutionary programming to clustering.
Some of them deal with clustering a set of objects by assuming that the appropriate value of K is known (Mertz and Zell). Another approach is based on the well known K-Means algorithm; it uses two objective functions that are minimized simultaneously: the K value and the best grouping of the objects into K clusters. Cui and Potok proposed a PSO based hybrid document clustering algorithm. The PSO clustering algorithm performs a globalized search in the entire solution space.
The results illustrated that the hybrid PSO algorithm can generate more compact clustering results than the K-Means algorithm.
Fun and Chen introduced an evolutionary PSO learning-based method to optimally cluster N data points into K clusters. The hybrid of PSO and K-Means, with a novel alternative metric, is called the Alternative KPSO-clustering method. It is developed to automatically detect the cluster number of geometrical-structure data sets.
In this algorithm, the special alternative metric is considered to improve the traditional K-Means clustering algorithm to deal with various sets of data. Csorba and Vajk presented a novel topic-based document clustering technique where there is no need to assign all the documents to clusters.
Under such conditions the clustering system can provide a much cleaner result by rejecting the classification of documents with ambiguous topic. This is achieved by applying a confidence measurement for every classification result and by discarding documents with a confidence value less than a predefined lower limit. This means that the system returns the classification for a document only if it feels sure about it.
If not, the document is marked as unsure. Beside this ability, the confidence measurement allows the use of much stronger term filtering, performed by a novel, supervised term cluster creation and term filtering algorithm, which is presented in their paper as well. In high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space.
For example, in text clustering, clusters of documents of different topics are categorized by different subsets of terms or keywords. The keywords for one cluster may not occur in the documents of other clusters. This is a data sparsity problem faced in clustering high-dimensional data. In the new algorithm, the proposed k-means clustering calculates a weight for each dimension in each cluster and uses the weight values to identify the subsets of important dimensions that categorize different clusters. This is achieved by including the weight entropy in the objective function that is minimized in the k-means clustering process.
An additional step is added to the k-means clustering process to automatically compute the weights of all dimensions in each cluster.
They used the cluster overlapping phenomenon to design cluster merging criteria. The system computes the overlap rate, in order to improve time efficiency and veracity, using the line passed through the two clusters' centers instead of the ridge curve. Based on the hierarchical clustering method, it used the Expectation-Maximization (EM) algorithm in the Gaussian mixture model to estimate the parameters and combined the two sub-clusters when their overlap is the largest.
Traditional keyword-based document clustering techniques have limitations due to simple treatment of words and hard separation of clusters. First, the traditional keyword-based vector space model is adapted with vectors defined over spaces of entity names, types, name-type pairs and identifiers, instead of keywords. Then, hierarchical fuzzy document clustering can be performed using a similarity measure of the vectors representing documents.
For evaluating fuzzy clustering quality, they proposed a fuzzy information variation measure to compare two fuzzy partitions. Premalatha and Natarajan introduced a hybrid method using PSO and GA for document clustering. In this method, both PSO and GA are run in parallel.
If the gbest particle stagnates, it can be replaced by a new particle. The new particle is generated by performing a crossover operation on a chromosome with the gbest particle. This increases the diversity of the population. Premalatha and Natarajan proposed Binary PSO with Local Search for document clustering and also presented a procreant PSO algorithm for document clustering. This algorithm is a hybrid of Particle Swarm Optimization and Genetic Algorithm, a population-based heuristic search technique which can be used to solve combinatorial optimization problems, modeled on the concepts of cultural and social rules derived from the analysis of swarm intelligence (PSO) and on crossover and evolution (GA).
In standard PSO, the non-oscillatory route can quickly cause a particle to stagnate; it may also prematurely converge on solutions that are not even guaranteed to be locally optimal. They proposed a modification strategy for the PSO algorithm and applied it to the document corpus. The strategy adds reproduction, using crossover when stagnation in the movement of the particle is identified, and carries out a local search to improve the goodness of fit.
Reproduction has the capability to achieve faster convergence and a better solution. Premalatha and Natarajan proposed a document clustering method based on Genetic Algorithm with simultaneous and ranked mutation. In this study, more than one mutation operator is applied. The ratio of the mutation operators is based on the rank of the offspring generated by the mutation operator. Muflikhah and Baharudin proposed a method that integrates the information retrieval method and document clustering as a concept space approach.
It used the Latent Semantic Indexing (LSI) approach, based on Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). The LSI method is used to reduce the matrix dimension by finding the patterns in the document collection with reference to the co-occurrence of terms. Each method is applied to weight the term-document matrix in the vector space model (VSM) for document clustering using the fuzzy c-means algorithm.
Document clustering finds overall similarity among groups of documents. It is often applied to a huge document corpus in order to make a partition based on similarity. It can lead to more effective retrieval than linear search, which ignores the relationships that exist between documents. It is used to improve precision and recall for queries.
Document clustering is very useful in information retrieval applications in order to reduce retrieval time and obtain high precision and recall.

REFERENCES

On the merits of building categorization systems by supervised clustering. Proceedings of the 5th Conference on ACM Special Interest Group on Knowledge Discovery and Datamining, August, San Diego, California, United States, pp:
Evaluating combinations of ranked lists and visualizations of inter-document similarity.
Automatic linking of thesauri. Proceedings of the 19th Annual ACM Special Interest Group on Information Retrieval Conference, Aug.
An Overview of Combinatorial Data Analysis. In: Clustering and Classification, Arabie, P. (Ed.), World Scientific Publishing Co.
An entropy-based algorithm for categorical clustering. Proceedings of the 11th International Conference on Information and Knowledge Management, Nov.
Clustering gene expression patterns. Proceedings of the 3rd Annual International Conference on Computational Molecular Biology, Lyon, France, pp:
A survey on page rank computing.
Optimizations of inverted vector searches. Proceedings of the ACM Special Interest Group on Information Retrieval Conference, June, Montreal, Quebec, Canada, pp:
Probabilistic modeling of transaction data with applications to profiling, visualization and prediction. Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August, San Francisco, California, pp:
Fuzzy named entity-based document clustering. Proceedings of the IEEE International Conference on Fuzzy Systems, June, Hong Kong, pp:
Document clustering into an unknown number of clusters using a genetic algorithm.
Discovering Knowledge from Hypertext Data. Morgan Kaufmann Publishers, California, ISBN:
Data preparation for mining world wide web browsing.
An information-theoretical approach to clustering categorical databases using genetic algorithms.
Organizing and searching large files of document descriptions. Thesis, Churchill College, University of Cambridge.
Csorba and Vajk. Term clustering and confidence measurement in document clustering. Proceedings of the IEEE International Conference on Computational Cybernetics, Aug.
A cluster-based approach to browsing large document collections. Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, June, Copenhagen, Denmark, pp:
Maximum likelihood from incomplete data via the EM algorithm.
Efficient Clustering of Very Large Document Collections. In: Data Mining for Scientific and Engineering Applications, Grossman, R. (Ed.), Kluwer Academic Publishers, USA.
Pattern Classification and Scene Analysis. John Wiley and Sons, New York.
An analysis of recent work on clustering algorithms. Technical Report, University of Washington.
Information Retrieval: Data Structures and Algorithms.
Introduction to Statistical Pattern Recognition. Academic Press, San Diego, CA.
Clustering categorical data using summaries. Proceedings of the International Conference on ACM Special Interest Group on Knowledge Discovery and Data Mining, Aug.
Vector Quantization and Signal Compression. Kluwer Academic Publishers, Boston, ISBN:
Scalable Clustering Methods for Data Mining. In: Handbook of Data Mining, Nong, Y. (Ed.).
Clustering categorical data: An approach based on dynamical systems. Proceedings of the 24th International Conference on Very Large Databases, Aug.
A robust clustering algorithm for categorical attributes. Proceedings of the 15th International Conference on Data Engineering, March, Sydney, Australia, pp:
Spatial Clustering Methods in Data Mining. In: Geographic Data Mining and Knowledge Discovery, Miller, H. (Ed.).
Taylor and Francis, USA.
Re-examining the cluster hypothesis: Scatter/gather on retrieval results. Proceedings of the 19th Annual Conference on ACM Special Interest Group on Information Retrieval, Aug.
A fast clustering algorithm to cluster very large categorical data sets in data mining. Proceedings of the ACM Workshop on Research Issues on Data Mining and Knowledge Discovery, December, Tucson, AZ.
Image Segmentation Using Clustering.
Algorithms for Clustering Data.