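I solemnly declare that the dissertation I have submitted is the result of research I conducted independently under the guidance of my supervisor. Except for content explicitly cited in the text, this dissertation contains no work that has been published or written by any other individual or group, nor any material previously used to obtain a degree or certificate from Anhui University of Finance and Economics or any other educational institution. All individuals and groups who made important contributions to this research have been identified and acknowledged in the text.
I bear the legal consequences of this declaration.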
Author (signature):
May 17, 2022
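The author fully understands the university's regulations on retaining and using dissertations: the university has the right to retain the dissertation, to submit photocopies and electronic versions to the relevant national departments or institutions, and to allow the dissertation to be consulted and borrowed. I authorize Anhui University of Finance and Economics to include all or part of this dissertation in the university's databases, and I authorize the university's Graduate Office to sign inclusion agreements with CNKI (China National Knowledge Infrastructure) and Wanfang Data, with the author enjoying and bearing the corresponding rights and obligations. The dissertation may also be preserved or compiled by photocopying, reduced-size printing, scanning, or other means of reproduction.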
Note: for a confidential dissertation, this authorization applies after declassification.
Author (signature):
May 17, 2022
ABSTRACT
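In 2020, after state-owned enterprises such as Peking University Founder defaulted on credit bonds and extended their perpetual bonds, the shock spread quickly: malicious credit defaults rippled out to dozens of state-owned enterprises. As the credit market faced a severe test, financial distress prediction became a key research focus for decision-makers and scholars. To assess the financial condition of listed companies more comprehensively, scholars have looked beyond financial data to the texts that listed companies disclose, most notably the "Management Discussion and Analysis" (MD&A). Most existing studies quantify MD&A text through text disclosure indicators or feature word frequencies, and they show that incorporating text information improves the accuracy of financial distress prediction models. However, both text disclosure indicators and feature word frequencies discard the useful semantic information in the text, so the gain that text information brings to financial distress prediction remains relatively limited.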
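The "feature word frequency" representation used by the baseline methods can be illustrated with a minimal sketch, assuming the MD&A text has already been segmented into words. The lexicon `FEATURE_WORDS` (English stand-ins for Chinese feature words) and the function `feature_word_vector` are hypothetical; the actual feature words used in the thesis are not listed in this section. The example also shows how such vectors can lose meaning: two phrases with opposite sense map to the same vector.

```python
from collections import Counter

# Hypothetical feature lexicon (illustrative only; not the thesis's word list).
FEATURE_WORDS = ["loss", "risk", "growth", "litigation"]


def feature_word_vector(tokens, lexicon=FEATURE_WORDS):
    """Map a word-segmented text to a vector of lexicon word counts.

    Word order and context are discarded; only how often each
    feature word occurs survives.
    """
    counts = Counter(tokens)
    return [counts[word] for word in lexicon]


tokens = "risk of loss rose while growth slowed and risk increased".split()
print(feature_word_vector(tokens))  # [1, 2, 1, 0]

# Opposite meanings, identical count vectors.
a = "loss not growth".split()
b = "growth not loss".split()
print(feature_word_vector(a) == feature_word_vector(b))  # True
```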
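To further improve the accuracy of financial distress prediction, this paper proposes a BERT+HAN model that quantitatively analyzes the semantic information in MD&A and combines it with financial data to predict financial distress among listed companies. The paper first uses a fine-tuned pre-trained BERT model to embed the original text character by character, obtaining character vectors rich in semantic information. It then designs a neural network with a hierarchical self-attention mechanism (HAN), which extracts the useful information in the text first at the "character" level and then at the "sentence" level, progressively converting the character vectors of the full text into a set of sentence vectors and finally a single text vector. A fully connected layer then combines the text vector with financial indicators to predict financial distress. Compared with text information represented by text disclosure indicators or feature word frequencies, the text vector produced by the pre-trained model retains much of the semantic information of the original text. At the same time, the hierarchical self-attention mechanism extracts the key semantic information step by step, so the text vector in the BERT+HAN model contains only the information most relevant to financial distress prediction and predicts financial distress more effectively.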
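The two-level pooling described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: random numbers stand in for the fine-tuned BERT character vectors and for all learned parameters (`char_context`, `sent_context`, the fully connected weights); the learned projection inside the attention score is simplified to the identity; and any encoder layers between BERT and the attention step are omitted. All function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, context):
    """Score each vector against a context vector, softmax the scores,
    and return (weighted-sum vector, attention weights).

    Simplification: the score is context . tanh(h), i.e. the usual
    learned projection W h + b is replaced by the identity.
    """
    scores = [dot([math.tanh(x) for x in v], context) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def han_text_vector(doc, char_context, sent_context):
    """doc: list of sentences, each a list of character vectors.

    Level 1 pools characters into sentence vectors; level 2 pools
    sentence vectors into one text vector.
    """
    sentence_vectors, char_weights = [], []
    for sentence in doc:
        vec, w = attention_pool(sentence, char_context)
        sentence_vectors.append(vec)
        char_weights.append(w)
    text_vector, sent_weights = attention_pool(sentence_vectors, sent_context)
    return text_vector, char_weights, sent_weights


def predict_distress(text_vector, ratios, fc_weights, fc_bias):
    """One fully connected layer on [text vector; financial ratios], sigmoid output."""
    z = dot(text_vector + ratios, fc_weights) + fc_bias
    return 1.0 / (1.0 + math.exp(-z))


def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]


# Toy run: random values stand in for BERT outputs and learned parameters.
random.seed(0)
DIM, N_RATIOS = 4, 3
doc = [[rand_vec(DIM) for _ in range(n_chars)] for n_chars in (5, 3, 4)]
text_vec, char_w, sent_w = han_text_vector(doc, rand_vec(DIM), rand_vec(DIM))
prob = predict_distress(text_vec, rand_vec(N_RATIOS), rand_vec(DIM + N_RATIOS), 0.0)
print(len(text_vec), [round(w, 3) for w in sent_w], round(prob, 3))
```

The per-character and per-sentence weights returned here are the quantities the attention layers expose, which is what makes the weight-based annotation of key sentences possible.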
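This paper takes A-share companies listed on the Shanghai and Shenzhen stock exchanges in China from 2012 to 2020 as its research sample, and treats companies designated "ST" by the exchanges within the following one to two years as financial distress samples. Financial indicators are combined with text information represented, respectively, by text disclosure indicators, feature word frequencies, and semantic text vectors to predict financial distress. The experiments compare the predictive performance of baseline text mining methods paired with classic machine learning models against that of the BERT+HAN model. The results show that, compared with the best-performing baseline (feature word frequencies + eXtreme Gradient Boosting, XGBoost), the BERT+HAN model, which incorporates textual semantic information, raises the AUC of financial distress prediction for listed companies by 1.03% and the recall on financial distress samples by 2.39%. In addition, the hierarchical self-attention mechanism designed in this paper quantifies, through its weight matrices, how important each word and sentence in the text is to financial distress prediction, which enables intelligent annotation of the key information in the text. Decision-makers can therefore refer directly to the key information disclosed in MD&A.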
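The labeling rule and the two evaluation metrics mentioned above can be sketched as follows. The exact labeling window is an assumption: here a firm-year `t` is labeled as distressed if the firm is designated ST in year `t+1` or `t+2`, which is one reading of "within the following one to two years". The function names and the toy scores are hypothetical. AUC is computed in its rank form (the probability that a randomly chosen distress sample scores higher than a randomly chosen healthy one), and recall is the share of true distress samples the classifier flags.

```python
def distress_label(year, st_years, horizon=2):
    """Return 1 if the firm is designated ST in any year from year+1 to
    year+horizon (assumed reading of the labeling window), else 0."""
    return int(any(year < y <= year + horizon for y in st_years))


def recall(y_true, y_pred):
    """Share of true distress samples (label 1) that are predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return tp / positives if positives else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random distress sample outscores a
    random non-distress sample; ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical firm designated ST in 2017.
print([distress_label(t, [2017]) for t in (2014, 2015, 2016, 2017)])  # [0, 1, 1, 0]

# Toy predicted probabilities for five firm-years.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1]
y_pred = [int(s >= 0.5) for s in scores]
print(auc(y_true, scores), recall(y_true, y_pred))  # 0.8333333333333334 0.5
```

AUC does not depend on a decision threshold, whereas recall does: at the 0.5 cut-off used here, the second distress sample (score 0.4) is missed, which is why the two metrics are reported separately.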
Keywords: financial distress; deep learning; text classification; attention mechanism
ABSTRACT
In 2020, state-owned enterprises such as Peking University Founder defaulted on credit
bonds or extended their perpetual bonds. The shock rippled outward, and malicious credit
defaults quickly spread to dozens of state-owned enterprises. As the credit market faces
severe challenges, financial distress prediction has become a key research focus for
policymakers and researchers. To gain a more comprehensive understanding of the financial
condition of listed companies, scholars have looked beyond structured financial indicators to
the disclosure text in annual reports, such as "Management Discussion and Analysis" (MD&A).
Most existing studies quantify disclosure text at one of two levels, text disclosure indicators
or word count vectors, and show that integrating quantified text information improves the
accuracy of financial distress prediction models. However, both text disclosure indicators and
word count vectors discard much of the useful semantic information in the original text, so the
improvement in the model's predictive ability is limited.
To retain as much of the useful semantic information in MD&A as possible and further
improve the accuracy of financial distress prediction, this paper proposes a BERT+HAN
model that quantitatively analyzes MD&A and combines it with financial data to predict
financial distress. The paper first employs a fine-tuned BERT language model to embed the
original text character by character, obtaining character vectors that carry rich semantic
information. It then designs a neural network with a hierarchical self-attention mechanism
(HAN) that operates successively at the "character" and "sentence" levels. HAN extracts the
useful information in the text, gradually aggregating the character vectors into a set of
sentence vectors and a final text vector; a fully connected layer then combines the text vector
with financial ratios to predict financial distress. Compared with text information represented
by a few indicators or by word count vectors, the text vector retains as much of the original
text's semantic information as the task requires. At the same time, HAN focuses only on the
key words and sentences in the text, so the final text vector contains only information relevant
to financial distress prediction. As a result, BERT+HAN predicts financial distress more
effectively.
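Because HAN assigns each sentence an attention weight, the sentences it focuses on can be surfaced directly for readers. The sketch below assumes the sentence-level weights have already been produced by a trained model; the sentences and weights are made up for illustration, and `highlight_sentences` is a hypothetical helper, not part of the thesis.

```python
def highlight_sentences(sentences, weights, top_k=2):
    """Return the top_k sentences by attention weight, in original document order."""
    if len(sentences) != len(weights):
        raise ValueError("need exactly one weight per sentence")
    top = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical MD&A sentences and made-up sentence-level attention weights.
sentences = [
    "Revenue grew steadily during the year.",
    "Overdue borrowings rose sharply.",
    "The board met four times.",
    "Auditors noted doubts about going concern.",
]
weights = [0.10, 0.45, 0.05, 0.40]

for s in highlight_sentences(sentences, weights):
    print(s)
# Overdue borrowings rose sharply.
# Auditors noted doubts about going concern.
```

Returning the selected sentences in document order, rather than by weight, keeps the highlighted passages readable in context.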
This paper takes A-share companies listed on the Shanghai and Shenzhen stock exchanges in
China from 2012 to 2020 as its research sample. Companies that are designated "ST" by the
Shanghai or Shenzhen stock exchange within the following one to two years are labeled as
financial distress samples. Based on the text information represented by the
... (remainder omitted)