A Partial Comparative Mixture Model for Multi-collections Documents

Volume 40, Issue 11, 2013, pp. 101-107


Author:

TAN Wen-tang, WANG Zhen-wen, YIN Feng-jing, GE Bing, XIAO Wei-dong
(Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China)



Abstract:

State-of-the-art cross-collection topic models suffer from a major flaw: they can only analyze the topics common to all document collections. We introduce PCCMix (Partial Comparative Cross Collections Mixture), a mixture model for multi-collection comparative text mining (CTM) that detects both common topics and collection-specific topics. PCCMix separates the two types of topics by estimating a probability distribution from the whole dataset in advance, and then trains the model with the Expectation-Maximization (EM) algorithm. Experimental results show that PCCMix can analyze both the topics common to the collections and the collection-specific topics, and that it models the document collections more precisely than the two main CTM models.
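The abstract does not give the PCCMix equations, so the following is only a minimal sketch of the general pattern it describes: a distribution estimated in advance from the whole dataset, followed by EM training of a topic mixture. It is a generic background-plus-topics unigram mixture, not the authors' model. The function names, the fixed background weight `lam`, the number of topics `k`, and the smoothing constant `eps` are all assumptions made for illustration.

```python
import random
from collections import Counter


def background_distribution(docs):
    """Relative word frequencies over the whole dataset (estimated once, in advance)."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def em_topic_mixture(docs, k=2, lam=0.5, iters=50, seed=0, eps=1e-12):
    """EM for a mixture of a fixed background distribution and k topics.

    Assumed (hypothetical) setup: each word is drawn from the background
    with probability lam, otherwise from one of k topics chosen with
    per-document weights pi[d].
    """
    rng = random.Random(seed)
    p_bg = background_distribution(docs)
    vocab = sorted(p_bg)
    doc_counts = [Counter(doc) for doc in docs]

    # Random, normalized initial topic-word distributions.
    theta = []
    for _ in range(k):
        raw = {w: rng.random() + 0.1 for w in vocab}
        s = sum(raw.values())
        theta.append({w: v / s for w, v in raw.items()})
    # Uniform initial per-document topic weights.
    pi = [[1.0 / k] * k for _ in docs]

    for _ in range(iters):
        new_theta = [dict.fromkeys(vocab, 0.0) for _ in range(k)]
        new_pi = [[0.0] * k for _ in docs]

        # E-step: split each word count between background and topics.
        for d, counts in enumerate(doc_counts):
            for w, c in counts.items():
                terms = [pi[d][j] * theta[j][w] for j in range(k)]
                topic_mass = sum(terms)
                if topic_mass <= 0.0:
                    continue  # guard against numerical underflow
                bg = lam * p_bg[w]
                p_topic = (1 - lam) * topic_mass / (bg + (1 - lam) * topic_mass)
                for j in range(k):
                    r = c * p_topic * terms[j] / topic_mass
                    new_theta[j][w] += r
                    new_pi[d][j] += r

        # M-step: renormalize expected counts (eps keeps entries positive).
        for j in range(k):
            s = sum(new_theta[j].values()) + eps * len(vocab)
            theta[j] = {w: (v + eps) / s for w, v in new_theta[j].items()}
        for d in range(len(docs)):
            s = sum(new_pi[d]) + eps * k
            pi[d] = [(v + eps) / s for v in new_pi[d]]

    return p_bg, theta, pi


if __name__ == "__main__":
    toy_docs = [["war", "army", "war"], ["market", "price", "market"],
                ["war", "market", "the"]]
    p_bg, theta, pi = em_topic_mixture(toy_docs, k=2, iters=20)
    print(pi)
```

In this sketch the background distribution stays fixed during training, so words that are frequent across the whole dataset are absorbed by it and the `k` topics are left to explain the remainder. How PCCMix actually uses its pre-estimated distribution to separate common from collection-specific topics is described in the full paper, not here.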

Key words: probability distributions; comparative text mining; partial comparative; PCCMix (Partial Comparative Cross Collections Mixture) model; mixture model
