STATLOG: COMPARISON OF CLASSIFICATION ALGORITHMS ON LARGE REAL-WORLD PROBLEMS 

Authors: R. D. King (a), C. Feng (b), A. Sutherland (a)
Affiliations: (a) Department of Statistics, Strathclyde University, Glasgow, UK; (b) The Turing Institute Ltd, Glasgow, UK
Correspondence: Dr. Ross D. King, Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, P. O. Box 123, 44 Lincoln's Inn Fields, London WC2A 3PX, UK. E-mail: rd_king@icrf.ac.uk
Present addresses: C. Feng, Computer Science Department, Ottawa University, Ottawa, Ontario, Canada (e-mail: cfeng@csi.uottawa.ca); A. Sutherland, Hitachi Dublin Laboratory, Dublin, Ireland.
DOI: 10.1080/08839519508945477
Publication Frequency: 10 issues per year
Published in: Applied Artificial Intelligence, Volume 9, Issue 3, May 1995, pages 289-333


Abstract

This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from symbolic learning (CART, C4.5, NewID, AC2, ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (backpropagation, radial basis functions). Twelve data sets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that which algorithm performed best depended critically on the data set investigated. We therefore developed a set of data set descriptors to help decide which algorithms are suited to particular data sets. For example, data sets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (>38%) tend to favor symbolic learning algorithms. We suggest how classification algorithms can be extended in a number of directions.
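
The descriptor rule in the abstract can be illustrated with a short sketch. The thresholds (skew > 1, kurtosis > 7, more than 38% binary/categorical attributes) come from the abstract; everything else is an assumption made for illustration, not the paper's procedure. In particular, the sketch assumes that skew and kurtosis are population-moment estimates averaged over the numeric attributes, that skew is taken in absolute value, and that kurtosis is the non-excess form (3 for a normal distribution). The function names `moments`, `describe`, and `favors_symbolic` are hypothetical.

```python
from statistics import fmean


def moments(values):
    """Return (skewness, kurtosis) from population moments.

    Assumption: kurtosis is the non-excess form (a normal distribution gives 3).
    A constant column has no spread, so it is reported as (0.0, 0.0).
    """
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    if m2 == 0:
        return 0.0, 0.0
    m3 = fmean([(v - mean) ** 3 for v in values])
    m4 = fmean([(v - mean) ** 4 for v in values])
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def describe(columns, categorical):
    """Summarise a data set given as {attribute name: list of values}.

    `categorical` is the set of attribute names treated as binary/categorical.
    Assumption: skew and kurtosis are averaged over the numeric attributes only.
    """
    numeric = [name for name in columns if name not in categorical]
    stats = [moments(columns[name]) for name in numeric]
    return {
        "skew": fmean([abs(s) for s, _ in stats]) if stats else 0.0,
        "kurtosis": fmean([k for _, k in stats]) if stats else 0.0,
        "categorical_fraction": len(categorical & set(columns)) / len(columns),
    }


def favors_symbolic(descriptors):
    """Apply the thresholds quoted in the abstract as a simple conjunction."""
    return (
        descriptors["skew"] > 1
        and descriptors["kurtosis"] > 7
        and descriptors["categorical_fraction"] > 0.38
    )


# Toy data, invented purely for illustration: one heavily skewed numeric
# attribute and two categorical attributes.
toy = {
    "x": [0] * 9 + [10],
    "colour": ["r", "g", "r", "g", "r", "g", "r", "g", "r", "g"],
    "flag": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}
toy_descriptors = describe(toy, {"colour", "flag"})
print(toy_descriptors, favors_symbolic(toy_descriptors))
```

A rule like this is at most a screening heuristic: the abstract says such data sets "tend to favor" symbolic learners, so a positive result suggests which family to try first rather than predicting the best algorithm.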
© 2009 Informa plc