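Text description
Thesis title: Research on a Rise-and-Fall Prediction Scheme for the New Energy Concept Index
Based on CNN-LSTM Multi-Feature Data Fusion
Thesis type: Scheme design
Specialization: Financial data analysis
Abstract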
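The stock market is an important part of the capital market and a major focus for investors
and scholars. Predicting stock prices and index movements is a mainstream problem that fits
the finance discipline closely. Existing research on stock index prediction, in China and
abroad, focuses mostly on the macro level; research on concept-sector indices at the meso
level is comparatively scarce. In recent years, however, speculation on hot concepts has
become increasingly frequent in the stock market, so how to design a rise-and-fall prediction
scheme for concept indices, and how to earn excess returns from its predictions, has become a
focus of investor attention.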
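The factors that affect the stock market are complex and prediction is difficult, so in
recent years computational techniques have increasingly been applied to forecasting financial
time series. For stock index prediction, the LSTM algorithm is the best known and has shown
notable results. This thesis takes LSTM as its basis, exploits the strength of CNN in feature
selection, and adds an Attention mechanism to focus on the key feature information. Combining
these, it builds a CNN-LSTM model for predicting the rise and fall of a stock index, which
further improves prediction accuracy and classification performance. Concept sectors are
easily affected by news and policy, and investor attention and sentiment toward them change
more sharply than for other indices, which in turn makes the drawdowns and rallies of
concept-sector indices larger. The model therefore includes both investor attention and
investor sentiment as influencing factors, which makes it well suited to studying concept
indices. New energy has been one of the most active concepts in the stock market in recent
years; it covers constituent stocks of related concepts such as "photovoltaics", "new energy
vehicles", and "clean energy". This thesis therefore takes the new energy concept-sector
index as its research object and conducts an empirical study to test the effectiveness of the
proposed scheme.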
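The abstract does not say which form of Attention the model uses. As a minimal sketch only, assuming a dot-product score between each time step's hidden vector and a single query vector (the names `attention_pool` and `query` are hypothetical and do not come from the thesis), attention pooling over a sequence of LSTM outputs could look like this:

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each time step by its dot-product score against `query`.

    hidden_states: list of T vectors (for example, LSTM outputs), each of length d
    query: vector of length d (assumed here to be a learned parameter)
    Returns (context_vector, weights), where the weights sum to 1.
    """
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(dim)
    ]
    return context, weights
```

Time steps whose hidden vectors align with the query receive larger weights, so the pooled context vector is dominated by the steps the model treats as most informative.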
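This thesis uses daily data on the new energy concept-sector index from January 1, 2012 to
November 1, 2021 for the empirical study. It predicts the direction of the concept index from
four kinds of data with different sources: stock market trading data, technical indicators,
investor sentiment, and investor attention. The investor attention indicator is built by
crawling the Baidu Index search volume for the keywords "新能源" (new energy) and "000941".
The investor sentiment indicator is built by crawling news headlines from ten mainstream
financial websites, including Eastmoney (东方财富网) and 中华财经网. Using the Harbin
Institute of Technology financial and securities sentiment lexicon together with manual
labeling, a BERT model classifies each headline as positive, negative, or neutral, and the
scores of each day's headlines are summed. The thesis further studies how multi-feature fused
data differs from single-feature data in predictive performance, and verifies the importance
of investor sentiment and attention for predicting the new energy concept index. In addition,
grid search is used to tune the hyperparameters and optimize the model's predictive
performance.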
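The daily summation step for the sentiment indicator can be sketched as follows. This is an illustration under stated assumptions: the abstract does not give the score assigned to each class, so the +1 / 0 / -1 mapping in `LABEL_SCORE` is hypothetical, the labels are taken as already produced by a classifier, and the sample rows are made up for the example.

```python
from collections import defaultdict

# Assumed label-to-score mapping; the source does not state the actual scores.
LABEL_SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def daily_sentiment(labeled_headlines):
    """Sum headline scores per day.

    labeled_headlines: iterable of (date_string, label) pairs, where label is
    a key of LABEL_SCORE (for example, the output of a BERT classifier).
    Returns a dict mapping date_string to the summed score, ordered by date.
    """
    totals = defaultdict(int)
    for day, label in labeled_headlines:
        totals[day] += LABEL_SCORE[label]
    return dict(sorted(totals.items()))


# Made-up sample rows, for illustration only.
sample = [
    ("2021-10-29", "positive"),
    ("2021-10-29", "positive"),
    ("2021-10-29", "negative"),
    ("2021-11-01", "neutral"),
    ("2021-11-01", "negative"),
]
index = daily_sentiment(sample)
```

A plain sum lets days with many headlines carry more weight than quiet days, which matches the "scores are summed" description; averaging per day would be a different design choice.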
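The results show that, comparing the LSTM and CNN-LSTM models on both risk and return, the
CNN-LSTM model predicts better than the LSTM model. Multi-feature fused data also contributes
substantially to model accuracy and further raises the strategy's return: the strategy
achieves a total return of 109.9%, a maximum drawdown of 28.3%, and a Sharpe ratio of 2.7,
far exceeding the 61.28% benchmark return of the new energy concept index itself. This
indicates that the scheme designed in this thesis, the CNN-LSTM model with multi-feature data
fusion, can earn excess returns.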
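The three performance figures quoted here (total return, maximum drawdown, Sharpe ratio) can be computed from a series of daily strategy returns. The sketch below uses common textbook definitions and is not taken from the thesis: it assumes simple daily returns, 252 trading days per year, and a zero risk-free rate by default, none of which the abstract confirms.

```python
import math


def total_return(daily_returns):
    """Compound simple daily returns into a total return."""
    equity = 1.0
    for r in daily_returns:
        equity *= 1.0 + r
    return equity - 1.0


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def sharpe_ratio(daily_returns, periods_per_year=252, risk_free_daily=0.0):
    """Annualized Sharpe ratio using the sample standard deviation.

    periods_per_year and risk_free_daily are assumptions, not source values.
    """
    excess = [r - risk_free_daily for r in daily_returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```

With the figures reported in the abstract, the excess over the benchmark is 109.9% - 61.28% = 48.62 percentage points.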
Keywords: CNN-LSTM; new energy concept index; multi-feature data; rise-and-fall prediction
Abstract
The stock market is an important part of the capital market and a focus of investors and
scholars. Predicting the trend of stock prices and stock indices is a mainstream problem
well suited to the finance discipline. Research on stock index prediction, at home and
abroad, mostly focuses on the macro level, while research on meso-level concept-sector
indices is relatively scarce. In recent years, however, speculation on hot concepts in the
stock market has become more and more frequent, so how to design a rise-and-fall prediction
scheme for concept indices, and how to obtain excess returns from its predictions, has
become a focus for investors.
The factors affecting the stock market are complex and hard to predict, so in recent years
computational techniques have increasingly been applied to the prediction of financial time
series; for stock index prediction, the LSTM algorithm is the best known and has shown
remarkable results. Based on the LSTM algorithm, this paper exploits the strength of the CNN
algorithm in feature selection, adds an Attention mechanism to focus on key feature
information, and combines them to build a CNN-LSTM model for predicting the rise and fall of
a stock index, which further improves prediction accuracy and classification performance.
Because concept sectors are easily affected by news and policy, investor attention and
sentiment toward them change more than for other indices, which makes the drawdowns and
rallies of concept-sector indices larger. The model in this paper therefore takes both
investor attention and investor sentiment as influencing factors, which suits the study of
concept indices. In recent years, new energy has been a very active concept in the stock
market, covering constituent stocks of related concepts such as "photovoltaics", "new energy
vehicles", and "clean energy", so this paper takes the new energy concept index as its
research object and conducts an empirical study to test the effectiveness of the model
scheme.
This paper selects daily data on the new energy concept index from January 1, 2012 to
November 1, 2021 for empirical research, and predicts the trend of the concept index from
four kinds of data with different source characteristics: stock market trading data,
technical indicator data, investor sentiment, and investor attention. The investor attention
indicator is built by crawling the Baidu Index search volume for the keywords "new energy"
and "000941". By crawling news headlines from ten mainstream financial websites, including
Eastmoney and 中华财经网, the
... (remainder omitted)