✅ About the author: a MATLAB simulation developer passionate about research, cultivating the mind and refining technique in step; feel free to send a private message for MATLAB project collaboration.

🍎 Homepage: Matlab科研工作室

🍊 Personal motto: to investigate things and attain knowledge (格物致知).

⛄ Introduction

A multi-label industry classification method and apparatus based on the long short-term memory (LSTM) model. The method works as follows: collect company names, company descriptions, and company business-scope data; split off a test set by class and apply preprocessing such as word segmentation to the collected data; use the LSTM model to build multiple binary classifiers and train them on the preprocessed data, with the true labels of the training data as the optimization target, yielding a multi-label industry classification model; use precision, recall, and F1 score as evaluation metrics to evaluate the test set automatically, and additionally sample a small amount of new company data for manual evaluation, so that a multi-label industry classification model with higher accuracy is ultimately trained; finally, use the LSTM-based multi-label classification model to automatically predict multiple industry labels for the companies to be classified. Implementing this invention greatly reduces manual annotation cost, improves classification accuracy, and matches the reality that most companies are not confined to a single industry but carry several industry attributes.
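
The abstract above describes the pipeline only in prose. As a rough illustration of the "one LSTM binary classifier per industry label" idea, the minimal sketch below uses MATLAB's Deep Learning Toolbox and Text Analytics Toolbox; the variable names (companyText, labelMatrix, XSeqTest), the embedding size, and the sequence length are assumptions made for illustration and are not taken from the patent.

% Minimal sketch, assuming Deep Learning Toolbox + Text Analytics Toolbox.
% companyText : N-by-1 string array (name + description + business scope)
% labelMatrix : N-by-K logical matrix, column k = membership in industry k
% (both are hypothetical placeholders)

documents = tokenizedDocument(companyText);                 % word segmentation
enc       = wordEncoding(documents);                        % vocabulary for the embedding
XSeq      = doc2sequence(enc, documents, 'Length', 100);    % pad/truncate to 100 tokens

numLabels = size(labelMatrix, 2);
nets      = cell(numLabels, 1);

layers = [ ...
    sequenceInputLayer(1)                                   % sequences of word indices
    wordEmbeddingLayer(64, enc.NumWords)                    % 64-dimensional learned embedding
    lstmLayer(100, 'OutputMode', 'last')                    % keep only the last hidden state
    fullyConnectedLayer(2)                                  % binary decision: label vs. not
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs', 10, 'MiniBatchSize', 64, ...
    'Shuffle', 'every-epoch', 'Verbose', false);

% One binary classifier per industry label; the true training labels
% drive the optimization, as in the abstract.
for k = 1:numLabels
    Yk      = categorical(labelMatrix(:, k));
    nets{k} = trainNetwork(XSeq, Yk, layers, options);
end

% Multi-label prediction on a held-out test set (XSeqTest): a company
% receives every label whose classifier votes positive. Precision, recall
% and F1 per label then follow from the resulting confusion counts.
predLabels = false(numel(XSeqTest), numLabels);
for k = 1:numLabels
    predLabels(:, k) = classify(nets{k}, XSeqTest) == 'true';
end

An alternative design would be a single network with K independent sigmoid outputs, but the per-label binary-classifier formulation is the one the abstract describes.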

⛄ Partial Code

% @=============================================================================

%

% Reference: Identifying Neuroimaging Biomarkers of Major Depressive Disorder

%            from Cortical Hemodynamic Responses 

%            Using Machine Learning Approaches

% =============================================================================@

%

% This function is developed by Optical Imaging Lab, BME, NUS

% https://wiki.nus.edu.sg/display/OIL/NUS+Optical+Bioimaging+Laboratory

% The data is provided by fNIRS imaging team, iHealthTech, NUS

% https://ihealthtech.nus.edu.sg/

%

% @=============================================================================

function main()

    clc; clear; close all;

    

    % Add current path and all subfolders to the path.

    addpath(genpath(pwd));

    

    % -----------------------------------------------------------------

    % Pre-processed NIRS signals by: linear fitting, moving average, and

    %   removing artifact channels. 

    % The generated ∆HbO dataset: 

    %   samples_52ch_HbO.mat

    

    

    % -----------------------------------------------------------------

    % Three steps of fNIRS signal analysis for differentiating MDDs from HCs

    step = 3;

    

    if step == 1

        Method1_RankingFeatures();

        

    elseif step == 2

        Method2_GASelection();

        

    elseif step == 3

        Validation();

        

    end

    

end




%% ===================================================================

% Feature Selection Method I: Ranking Features by Statistical Test

function Method1_RankingFeatures()

    %   (1) A data matrix consisting of 52 channels × 16 variables was extracted.

    %   (2) A statistical test was applied to find the significantly different 

    %       channels for each variable, and subsequently to generate feature 

    %       channels as predictors for a classifier. 

    stat_result = statistics_test_feature();

    % Results: 

    %           Supplementary Figure 1 - Color Map

    %           Supplementary Figure 2 - count_sigdiff_channels

    %           Supplementary Table 3 - hc_cf_mstd, mdd_cf_mstd, pvalues_cluster

    

    

%     % generate feature set with significant difference

%     generate_sigdiff_feature(stat_result.feature_names, ...

%                               stat_result.diff_feature_cluster, ...

%                               stat_result.count_sigdiff_channels);

    


    %   (3) Five supervised models were implemented to learn patterns 

    %       from the feature channels

    %   (6)(10) Performances were evaluated by five-fold cross-validation and 

    %       prediction accuracy

    data_type       = 'sigdiff_feature_topsigch';   % 'sigdiff_feature_topcluster'

    feature_type    = 'feature_channel';

    feature_fname   = 'featureset_sigdiff_5ch.mat';

    integral_type   = '';

    centroid_type   = '';

    model_type      = 'funcfit_nb';

    [pred_train, pred_test] = test_feature_performance(data_type, ...

                                feature_type, feature_fname, ...

                                integral_type, centroid_type, ...

                                model_type);

    % Result: Supplementary Table 6

    

    %   (10) performances were estimated by nested cross-validation 

    [result_inner_cv, result_outer_train, result_outer_test] =      ...

                              nested_crossvalidation(data_type,     ...

                                feature_type, feature_fname,        ...

                                '', '', model_type);

    % Result: Supplementary Table 6

end



%% ===================================================================

% Feature Selection Method II: Two-phase Feature Selection by Genetic Algorithm

%    

function Method2_GASelection()

    % -------------- Phase-One --------------

    % The input is the candidate channels from one of the 10 significant variables, 

    % while the output is a channel subset of the specific variable.

    % The optimization of channel selection was performed over all 10 variables.

    data_type       = 'integral';

    feature_type    = 'feature_channel';

    feature_fname   = '';

    integral_type   = 'integral_stim';

    centroid_type   = '';

    model_type      = 'funcfit_svm';

    func_pop        = @func_population_rand;

    binary_ga(data_type, feature_type, feature_fname, ...

                integral_type, centroid_type, model_type, func_pop);

    % Result: ga_ma50_integral_stim__svm_0.7316_0.7363.mat, etc.

    

    % -------------- Phase-Two --------------

    % The selected channel subsets from 10 significant variables were then 

    % combined into a feature set, i.e., fusion features.

    generate_fusion_feature('svm');

    % Result: fusion_10variants_svm.mat, etc.

    

    % GA learned which feature channels contributed best to the accuracy 

    % of a supervised model.

    data_type       = 'fusion_feature';

    feature_type    = 'feature_channel';

    feature_fname   = 'fusion_10variants_svm.mat';

    model_type      = 'funcfit_svm';

    func_pop        = @func_population_optm;

    binary_ga(data_type, feature_type, feature_fname, ...

                 '', '', model_type, func_pop);

    % Results:   

    %           ga_fusion_features_svm_0.8053_0.7802.mat, etc.

    %           Supplementary Figure 3

end



%% ===================================================================

% Validate the Performance of Optimal Features

function Validation()

    % -----------------------------------------------------------------

    % Classification performance was reported as 5-fold cross-validation 

    % accuracy on the training set and prediction accuracy on the test set.

    data_type       = 'ga_optimal_feature';

    feature_type    = 'feature_channel';

    feature_fname   = 'ga_fusion_features_svm_0.8053_0.7802.mat';

    model_type      = 'funcfit_svm';

    [pred_train, pred_test] = test_feature_performance(data_type,   ...

                                feature_type, feature_fname,        ...

                                '', '', model_type);

    % Result: TABLE 1 of main text

    

    % ----------------------------------------------------------------------

    % Classification performances were estimated by nested cross-validation 

    [result_inner_cv, result_outer_train, result_outer_test] =      ...

                              nested_crossvalidation(data_type,     ...

                                feature_type, feature_fname,        ...

                                '', '', model_type);

    % Result: TABLE 1 of main text

    

    

    % -------------- Characteristics of optimal feature --------------

    p1_fus_feature_fname = 'fusion_10variants_svm.mat';

    p2_opt_feature_fname = 'ga_fusion_features_svm_0.8053_0.7802.mat';

    [pvalues_optfeatures, pvalues_roifeatures, ...

            hc_roi, mdd_roi, count_common_channels] = ...

        test_optimalfeature(p1_fus_feature_fname, p2_opt_feature_fname);

    % Results: 

    %           Figure 2 of main text -- hc_roi, mdd_roi

    %           Figure 3 of main text -- count_common_channels

    %           Supplementary Table 4 -- hc_roi, mdd_roi, pvalues_roifeatures

    

    

    % -------------- Common features between different models --------------

%     test_common_features();

end
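
The two-phase channel selection in Method2_GASelection above relies on the author's binary_ga helper, whose source is not shown here. As a generic sketch of the same idea (a binary-encoded genetic algorithm searching for a channel subset that minimizes cross-validated classification error), the following uses the Global Optimization Toolbox's ga together with fitcsvm; X, y, and all GA settings are illustrative assumptions, not the post's actual implementation.

% Generic sketch: binary GA feature (channel) selection, assuming
% Global Optimization Toolbox + Statistics and Machine Learning Toolbox.
% X : samples-by-channels feature matrix, y : class labels (both hypothetical).
function bestMask = ga_channel_selection(X, y)
    nChannels = size(X, 2);
    fitness   = @(mask) cv_loss(logical(mask), X, y);      % minimize CV error

    opts   = optimoptions('ga', 'PopulationSize', 50, ...
                          'MaxGenerations', 50, 'Display', 'off');
    lb     = zeros(1, nChannels);
    ub     = ones(1, nChannels);
    intcon = 1:nChannels;                                   % 0/1 genes -> channel on/off
    bestMask = ga(fitness, nChannels, [], [], [], [], lb, ub, [], intcon, opts);
end

function err = cv_loss(mask, X, y)
    if ~any(mask), err = 1; return; end                     % penalize empty subsets
    mdl = fitcsvm(X(:, mask), y, 'KernelFunction', 'linear');
    err = kfoldLoss(crossval(mdl, 'KFold', 5));             % 5-fold cross-validation error
end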

⛄ Results

[Figure: Classification prediction — MATLAB LSTM (long short-term memory) network, multi-feature classification prediction, preprocessing stage]

⛄ References

[1] 易思宇. Research on a heart-rate anomaly classification method based on LSTM and ballistocardiogram signals.

[2] 彭燕虹, 潘嵘, 周赖靖竞, et al. A multi-label industry classification method and apparatus based on the long short-term memory (LSTM) model: CN106777335A [P]. 2017.

⛄ Full Code

❤️ Some of the theory cited here comes from online literature; please contact the author for removal in case of infringement.
❤️ Follow me to receive a wealth of MATLAB e-books and mathematical modeling materials.