TAN Wen-tang, WANG Zhen-wen, YIN Feng-jing, GE Bing, XIAO Wei-dong
(Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China)
State-of-the-art cross-collection topic models share a major flaw: they can only analyze the topics that are common to all document collections. We introduce PCCMix (Partial Comparative Cross Collections Mixture), a mixture model for multi-collection CTM that detects both common topics and collection-specific topics. PCCMix separates the two types of topics by estimating a probability distribution from the whole dataset in advance, and then trains the model with the Expectation-Maximization (EM) algorithm. Experimental results show that PCCMix can analyze both the topics common among collections and the collection-specific topics, and that it models the document collections more precisely than the two main CTM models.
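The abstract does not give the PCCMix equations, so the following is not the authors' model. It is a minimal sketch of the general idea only, assuming a two-component unigram mixture: a word distribution is estimated from the whole dataset in advance and held fixed, and EM then learns a collection-specific word distribution alongside it. The function names, the fixed mixing weight `lam`, and the toy data are all hypothetical.

```python
from collections import Counter


def fit_background(collections):
    """Estimate a word distribution from the whole dataset in advance."""
    counts = Counter(w for docs in collections for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def em_specific_topic(docs, background, lam=0.5, iters=50):
    """EM for a two-component unigram mixture (illustrative, not PCCMix).

    `background` is fixed; `lam` is an assumed, fixed weight for it.
    Only the collection-specific word distribution is learned.
    """
    counts = Counter(w for doc in docs for w in doc)
    vocab = list(counts)
    topic = {w: 1.0 / len(vocab) for w in vocab}  # uniform start
    for _ in range(iters):
        # E-step: probability that a word token came from the learned topic
        resp = {}
        for w in vocab:
            p_topic = (1 - lam) * topic[w]
            p_bg = lam * background[w]
            resp[w] = p_topic / (p_topic + p_bg)
        # M-step: re-estimate the topic from responsibility-weighted counts
        weighted = {w: counts[w] * resp[w] for w in vocab}
        norm = sum(weighted.values())
        topic = {w: v / norm for w, v in weighted.items()}
    return topic


# Toy data (made up): two collections, each a list of tokenized documents.
collections = [
    [["network", "topic", "model", "laptop"],
     ["topic", "model", "battery", "laptop"]],
    [["topic", "model", "camera", "lens"],
     ["model", "topic", "lens", "camera"]],
]
background = fit_background(collections)
specific = em_specific_topic(collections[0], background)
```

In this toy run, words shared across both collections ("topic", "model") are partly explained by the pre-estimated distribution, so the learned distribution shifts weight toward words that occur mainly in the first collection, such as "laptop".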