+Advanced Search

Performance and Choice of Chinese Text Classification Models in Different Situations
Author:
Affiliation:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
    Abstract:

    N-Gram, Nave Bayes, K nearest neighbors and TF-IDF are classical text classification models with a wide range of applications. People are often puzzled about which classification model should be used in a certain Chinese text classification task. This paper collected more than ten thousand Chinese news texts, and designed a series of experiments to analyze the performance of these models in varied situations from classification parameters, training data scale, text length and skewed data sets. The characteristics of these models were summarized, which provides a practical guide for the model selection in Chinese text classification tasks.

    Reference
    Related
    Cited by
Article Metrics
  • PDF:
  • HTML:
  • Abstract:
  • Cited by:
Get Citation
History
  • Received:
  • Revised:
  • Adopted:
  • Online: April 26,2016
  • Published: