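✅ About the author: a MATLAB simulation developer who loves research and works on personal discipline and technical skill in step; message privately for MATLAB project collaboration.
🍎 Home page: Matlab科研工作室
🍊 Personal motto: investigate things to extend knowledge (格物致知).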
⛄ Introduction
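A multi-label industry classification method and apparatus based on a long short-term memory (LSTM) model. The method comprises: collecting company names, company descriptions, and business-scope data; splitting off a test set by class and preprocessing the collected data, including word segmentation; using LSTM models to build multiple binary classifiers that are trained on the preprocessed data, with the true labels of the training data as the optimization target, to obtain a multi-label industry classification model; using precision, recall, and F1 score as evaluation metrics to evaluate the test set automatically, and sampling a small amount of new company data for manual evaluation, so as to arrive at a more accurate multi-label industry classification model; and using the LSTM-based multi-label industry classification model to predict multiple industry labels for each company to be classified. The invention can greatly reduce manual labeling cost and improve classification accuracy, and it matches the fact that most companies belong to several industries rather than a single one.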
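The evaluation step described above (precision, recall, and F1 for a set of binary classifiers, one per industry label) can be sketched as follows. This is a minimal illustration in base MATLAB, not the patented implementation: the label matrices `Y_true` and `Y_pred` are hypothetical toy data, and macro-averaging the F1 score across labels is an assumption, since the description does not say how per-label scores are combined.

```matlab
% Hypothetical toy data: rows are companies, columns are industry labels.
Y_true = logical([1 0 1;
                  0 1 0;
                  1 1 0;
                  0 0 1]);
Y_pred = logical([1 0 0;
                  0 1 0;
                  1 0 0;
                  0 1 1]);

% Per-label counts, treating each column as an independent binary task.
tp = sum(Y_true & Y_pred, 1);     % true positives
fp = sum(~Y_true & Y_pred, 1);    % false positives
fn = sum(Y_true & ~Y_pred, 1);    % false negatives

% max(..., 1) guards against division by zero for labels with no positives.
precision = tp ./ max(tp + fp, 1);
recall    = tp ./ max(tp + fn, 1);
f1        = 2 * precision .* recall ./ max(precision + recall, eps);

% Assumption: macro-average, i.e. the unweighted mean over labels.
macro_f1 = mean(f1);
```

Treating each label column separately mirrors the "multiple binary classifiers" design: a company can score a hit on one industry label and a miss on another within the same row.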
⛄ Partial Code
% @=============================================================================
%
% Reference: Identifying Neuroimaging Biomarkers of Major Depressive Disorder
% from Cortical Hemodynamic Responses
% Using Machine Learning Approaches
%
% =============================================================================@
%
% This function is developed by Optical Imaging Lab, BME, NUS
% https://wiki.nus.edu.sg/display/OIL/NUS+Optical+Bioimaging+Laboratory
% The data is provided by fNIRS imaging team, iHealthTech, NUS
% https://ihealthtech.nus.edu.sg/
%
% @=============================================================================
function main()
clc; clear; close all;
% Add current path and all subfolders to the path.
addpath(genpath(pwd));
% -----------------------------------------------------------------
% Pre-processed NIRS signals by: linear fitting, moving average, and
% removing artifact channels.
% The generated ∆HbO dataset:
% samples_52ch_HbO.mat
% -----------------------------------------------------------------
% Three steps of fNIRS signal analysis for differentiating MDDs from HCs
step = 3;
if step == 1
Method1_RankingFeatures();
elseif step == 2
Method2_GASelection();
elseif step == 3
Validation();
end
end
%% ===================================================================
% Feature Selection Method I: Ranking Features by Statistical Test
%
function Method1_RankingFeatures()
% (1) A data matrix of 52 channels × 16 variables was extracted.
% (2) Statistical test was applied to find the significantly different
% channels on each variable, and subsequently generate feature channels
% as predictors for a classifier.
stat_result = statistics_test_feature();
% Results:
% Supplementary Figure 1 - Color Map
% Supplementary Figure 2 - count_sigdiff_channels
% Supplementary Table 3 - hc_cf_mstd, mdd_cf_mstd, pvalues_cluster
% % generate feature set with significant difference
% generate_sigdiff_feature(stat_result.feature_names, ...
% stat_result.diff_feature_cluster, ...
% stat_result.count_sigdiff_channels);
% (3) Five supervised models were implemented to learn patterns
% from the feature channels
% (6)(10) Performances were evaluated by five-fold cross-validation and
% prediction accuracy
data_type = 'sigdiff_feature_topsigch'; % 'sigdiff_feature_topcluster'
feature_type = 'feature_channel';
feature_fname = 'featureset_sigdiff_5ch.mat';
integral_type = '';
centroid_type = '';
model_type = 'funcfit_nb';
[pred_train, pred_test] = test_feature_performance(data_type, ...
feature_type, feature_fname, ...
integral_type, centroid_type, ...
model_type);
% Result: Supplementary Table 6
% (10) performances were estimated by nested cross-validation
[result_inner_cv, result_outer_train, result_outer_test] = ...
nested_crossvalidation(data_type, ...
feature_type, feature_fname, ...
'', '', model_type);
% Result: Supplementary Table 6
end
%% ===================================================================
% Feature Selection Method II: Two-phase Feature Selection by Genetic Algorithm
%
function Method2_GASelection()
% -------------- Phase-One --------------
% The input is the candidate channels from one of the 10 significant variables,
% while the output is a channel subset of the specific variable.
% The optimization of channel selection was performed over all 10 variables.
data_type = 'integral';
feature_type = 'feature_channel';
feature_fname = '';
integral_type = 'integral_stim';
centroid_type = '';
model_type = 'funcfit_svm';
func_pop = @func_population_rand;
binary_ga(data_type, feature_type, feature_fname, ...
integral_type, centroid_type, model_type, func_pop);
% Result: ga_ma50_integral_stim__svm_0.7316_0.7363.mat, etc.
% -------------- Phase-Two --------------
% The selected channel subsets from 10 significant variables were then
% combined into a feature set, i.e., fusion features.
generate_fusion_feature('svm');
% Result: fusion_10variants_svm.mat, etc.
% GA learned which feature channels contributed best to the accuracy
% of a supervised model.
data_type = 'fusion_feature';
feature_type = 'feature_channel';
feature_fname = 'fusion_10variants_svm.mat';
model_type = 'funcfit_svm';
func_pop = @func_population_optm;
binary_ga(data_type, feature_type, feature_fname, ...
'', '', model_type, func_pop);
% Results:
% ga_fusion_features_svm_0.8053_0.7802.mat, etc.
% Supplementary Figure 3
end
%% ===================================================================
% Validate the Performance of Optimal Features
%
function Validation()
% -----------------------------------------------------------------
% Classification performance was reported by
% 5-fold cross-validation on the training set and
% prediction accuracy on the test set.
data_type = 'ga_optimal_feature';
feature_type = 'feature_channel';
feature_fname = 'ga_fusion_features_svm_0.8053_0.7802.mat';
model_type = 'funcfit_svm';
[pred_train, pred_test] = test_feature_performance(data_type, ...
feature_type, feature_fname, ...
'', '', model_type);
% Result: TABLE 1 of main text
% ----------------------------------------------------------------------
% Classification performances were estimated by nested cross-validation
[result_inner_cv, result_outer_train, result_outer_test] = ...
nested_crossvalidation(data_type, ...
feature_type, feature_fname, ...
'', '', model_type);
% Result: TABLE 1 of main text
% -------------- Characteristics of optimal feature --------------
p1_fus_feature_fname = 'fusion_10variants_svm.mat';
p2_opt_feature_fname = 'ga_fusion_features_svm_0.8053_0.7802.mat';
[pvalues_optfeatures, pvalues_roifeatures, ...
hc_roi, mdd_roi, count_common_channels] = ...
test_optimalfeature(p1_fus_feature_fname, p2_opt_feature_fname);
% Results:
% Figure 2 of main text -- hc_roi, mdd_roi
% Figure 3 of main text -- count_common_channels
% Supplementary Table 4 -- hc_roi, mdd_roi, pvalues_roifeatures
% -------------- Common features between different models --------------
% test_common_features();
end
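The listing calls `test_feature_performance` and `nested_crossvalidation`, but their bodies are not part of this excerpt. As a rough illustration of the five-fold cross-validation the comments refer to, here is a minimal sketch in base MATLAB. Everything in it is an assumption made for the example: the data are synthetic, the variable names are hypothetical, and a simple nearest-centroid rule stands in for the SVM (`funcfit_svm`) used in the original code.

```matlab
rng(0);                                   % reproducible toy data
n_per_class = 50;  n_feat = 5;  k = 5;

% Hypothetical features: class 0 (e.g. HC) and a shifted class 1 (e.g. MDD).
X0 = randn(n_per_class, n_feat);
X1 = randn(n_per_class, n_feat) + 1.5;
X  = [X0; X1];
y  = [zeros(n_per_class, 1); ones(n_per_class, 1)];

% Randomly assign every sample to exactly one of k equally sized folds.
n = numel(y);
fold_id = zeros(n, 1);
fold_id(randperm(n)) = mod(0:n-1, k) + 1;

acc = zeros(k, 1);
for f = 1:k
    test_idx  = (fold_id == f);
    train_idx = ~test_idx;

    % Fit on the training folds only: one centroid per class.
    c0 = mean(X(train_idx & y == 0, :), 1);
    c1 = mean(X(train_idx & y == 1, :), 1);

    % Predict the held-out fold by the nearer centroid (squared distance).
    d0 = sum(bsxfun(@minus, X(test_idx, :), c0).^2, 2);
    d1 = sum(bsxfun(@minus, X(test_idx, :), c1).^2, 2);
    y_hat = double(d1 < d0);

    acc(f) = mean(y_hat == y(test_idx));
end

cv_accuracy = mean(acc);                  % mean accuracy over the k folds
```

A nested variant would wrap this loop in an outer loop of the same shape, running the inner folds only on each outer training split so that feature or hyperparameter selection never sees the outer test fold.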
⛄ Results
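⛄ References
[1] 易思宇. Research on a heart-rate abnormality classification method based on LSTM and ballistocardiogram signals.
[2] 彭燕虹, 潘嵘, 周赖靖竞, et al. A multi-label industry classification method and apparatus based on a long short-term memory (LSTM) model: CN106777335A[P]. 2017.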