Research on Identifying Potential Temporal Intentions of Academic Literature Based on Multi-label Classification
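    Abstract (translated from Chinese):

    To improve the temporal relevance of retrieval results, text feature extraction and multi-label classification algorithms are applied to the classification of potential temporal intentions in literature retrieval. Approaching the problem as the classification of potential retrieval temporal intentions, the paper proposes an algorithm that automatically classifies the potential temporal intentions of literature, based on text temporal information extraction and Labeled LDA (a labeled topic model). First, using the temporal information obtained from the literature, the potential temporal intentions of literature retrieval are mapped onto specific temporal categories. Second, to reduce the effect of temporal information sparsity on the learning of classification features, the distribution of temporal phrases across disciplines is used to optimize the label selection process of the Labeled LDA classification model. Finally, the proposed algorithm is compared experimentally with other multi-label classification algorithms, and the accuracy of automatic classification of potential temporal intentions in literature retrieval is analyzed and evaluated. The results show that the proposed algorithm achieves an AUC of 79.6%, about 10.9% higher than the comparable baseline ECC (Ensembles of Classifier Chains), and performs well across different disciplines, making it an effective method for learning the potential temporal intentions of literature retrieval.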
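The Labeled LDA step can be illustrated with a minimal sketch. This is a generic collapsed Gibbs sampler for the standard Labeled LDA formulation (Ramage et al., 2009), with one topic per label and each document's tokens restricted to that document's own label set; it is not the authors' implementation and omits their optimization of label selection using cross-disciplinary temporal-phrase distributions. The function names `train_labeled_lda` and `score_labels`, the hyperparameters `alpha` and `beta`, the fixed-topic "fold-in" scoring of unseen documents, and the toy label ids are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def train_labeled_lda(docs, labels, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for Labeled LDA with one topic per label.

    docs   -- list of token lists
    labels -- list of sets of label ids in range(num_topics); each token of
              document d may only be assigned a topic from labels[d]
    Returns topic-word counts, tokens per topic and the vocabulary size.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                  # tokens per topic
    n_dk = [defaultdict(int) for _ in docs]                 # doc-topic counts
    z = []                                                  # topic of each token

    # Random initialisation restricted to each document's label set.
    for d, (doc, labs) in enumerate(zip(docs, labels)):
        allowed = sorted(labs)
        z.append([])
        for w in doc:
            k = rng.choice(allowed)
            z[d].append(k)
            n_kw[k][w] += 1
            n_k[k] += 1
            n_dk[d][k] += 1

    for _ in range(iters):
        for d, (doc, labs) in enumerate(zip(docs, labels)):
            allowed = sorted(labs)
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment from all counts.
                n_kw[k][w] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                # Unnormalised p(z = t | rest), only for t in the document's labels.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                           for t in allowed]
                k = rng.choices(allowed, weights=weights)[0]
                z[d][i] = k
                n_kw[k][w] += 1
                n_k[k] += 1
                n_dk[d][k] += 1
    return n_kw, n_k, V


def score_labels(doc, n_kw, n_k, V, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Fold in an unseen document over all labels, keeping the trained
    topic-word distributions fixed; return its label (topic) proportions."""
    rng = random.Random(seed)
    topics = list(range(num_topics))

    def phi(t, w):
        # Smoothed topic-word probability; unseen words fall back to beta.
        return (n_kw[t].get(w, 0) + beta) / (n_k[t] + V * beta)

    z = [rng.choice(topics) for _ in doc]
    n_dk = [0] * num_topics
    for k in z:
        n_dk[k] += 1
    for _ in range(iters):
        for i, w in enumerate(doc):
            n_dk[z[i]] -= 1
            weights = [(n_dk[t] + alpha) * phi(t, w) for t in topics]
            z[i] = rng.choices(topics, weights=weights)[0]
            n_dk[z[i]] += 1
    total = len(doc) + num_topics * alpha
    return [(n_dk[t] + alpha) / total for t in topics]


if __name__ == "__main__":
    # Toy corpus with hypothetical label ids 0 and 1.
    docs = [["history", "archive", "century"], ["century", "history", "archive"],
            ["latest", "trend", "forecast"], ["forecast", "latest", "trend"],
            ["history", "latest"]]
    labels = [{0}, {0}, {1}, {1}, {0, 1}]
    n_kw, n_k, V = train_labeled_lda(docs, labels, num_topics=2)
    print(score_labels(["history", "century"], n_kw, n_k, V, num_topics=2))
```

The returned proportions are per-label scores, not a label set; turning them into a multi-label prediction would need a threshold or top-k rule, which this sketch leaves open because the abstract does not specify one.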

    Abstract (English version):

    In order to enhance the temporal relevance of retrieval results, text feature extraction and multi-label classification algorithms were applied to the classification of potential temporal intentions in literature retrieval. From the perspective of classifying potential retrieval temporal intentions, an algorithm based on text temporal information extraction and Labeled LDA was proposed to automatically classify the potential temporal intentions of literature. First, based on the temporal information extracted from the literature, the potential temporal intentions of literature retrieval were mapped onto specific temporal categories. Second, to reduce the impact of temporal information sparsity on the learning of classification features, the distribution features of temporal phrases across disciplines were used to optimize the label selection process of the Labeled LDA classification model. Finally, the proposed algorithm was compared experimentally with other multi-label classification algorithms, and the accuracy of automatic classification of potential temporal intentions in literature retrieval was analyzed and evaluated. The results show that the AUC of the proposed algorithm reaches 94.3%, an improvement of approximately 4.3% over ECC (Ensembles of Classifier Chains). In addition, the proposed algorithm produces favorable classification results across different disciplines. It is thus an effective method for learning the potential temporal intentions of literature retrieval.
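Both abstracts report results as AUC for a multi-label task, but the averaging scheme is not stated. The sketch below assumes macro-averaged per-label ROC AUC, computed with the pairwise (Mann-Whitney) formulation in which ties count as half a win, and skips labels whose ground truth contains only one class. The function names `roc_auc` and `macro_auc` are hypothetical, and this is not presented as the evaluation code used in the paper.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count 0.5), i.e. the area under the ROC curve for one label."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(Y_true, Y_score):
    """Average the per-label AUC over labels that have both classes present.

    Y_true  -- list of 0/1 rows, one column per label
    Y_score -- list of score rows with the same shape
    """
    aucs = []
    for j in range(len(Y_true[0])):
        col_true = [row[j] for row in Y_true]
        if 0 < sum(col_true) < len(col_true):
            aucs.append(roc_auc(col_true, [row[j] for row in Y_score]))
    if not aucs:
        raise ValueError("no label has both positive and negative examples")
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
    Y_score = [[0.9, 0.3], [0.2, 0.8], [0.7, 0.2], [0.1, 0.4]]
    print(macro_auc(Y_true, Y_score))  # label 0 gives 1.0, label 1 gives 0.5
```

The pairwise loop is quadratic in the number of examples, which keeps the definition visible; a rank-based computation gives the same value more efficiently on large test sets.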

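Cite this article

SHEN Si, WU Xiyu. Research on identifying potential temporal intentions of academic literature based on multi-label classification[J]. Journal of Hunan University (Natural Sciences), 2017, 44(10): 158-165.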

History
  • Published online: 2017-10-30