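It's been almost ten years since I last touched programming. Our company recently swapped out its servers, and while shuffling machines around I got the itch to write a bit of code again. So I spent two days with a .NET textbook and wrote this as practice. It works in my testing, haha. I'm posting it so the experts can take a look and point out my mistakes!
The open-source PanGu word segmentation component can be downloaded directly from http:///. Put Pangu.dll and Pangu.xml into the bin directory under wwwroot, and don't forget to put the dictionaries into the Dictionaries directory under bin, hehe. Also make sure pangu.xml points to the location of the Dictionaries directory.
I wrote three files in total for the aspx page. The first:
File name: Default.aspx, directory: wwwroot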


Code
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>
<html>
<head runat="server">
    <title>WWW.RCSKY.NET</title>
</head>
<body>
    <form id="form1" runat="server">
        <div class="align-center">
            <p>Original text: <asp:Label ID="fc_content" runat="server" Text="Text to segment"></asp:Label></p>
            <p>Segmentation result:<br /><asp:Label ID="fc_result" runat="server" Text="Segmentation result"></asp:Label></p>
            <asp:DataGrid id="Orign" runat="server" HeaderStyle-BackColor="#aaaadd" AlternatingItemStyle-BackColor="#eeeeee" />
            <br />
        </div>
    </form>
</body>
</html>

The second: Default.aspx.cs, directory: wwwroot


Code
using System;
using System.Data;
using System.Data.OleDb;
using Rcsky.GetKeyword;

public partial class _Default : System.Web.UI.Page
{
    private void Page_Load(Object src, EventArgs e)
    {
        string MyConnString = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=" + Server.MapPath("DatabaseDir/data.mdb");
        // The id is taken from the query string as-is, with no validation.
        string strSel = "select * from db_table where id=" + Request.QueryString["id"];
        DataSet ds = new DataSet();

        // The using blocks release the connection and adapter even if Fill throws.
        using (OleDbConnection MyConn = new OleDbConnection(MyConnString))
        using (OleDbDataAdapter MyAdapter = new OleDbDataAdapter(strSel, MyConn))
        {
            MyAdapter.Fill(ds, "TB_content");
        }

        // Show the raw row in the grid.
        Orign.DataSource = ds;
        Orign.DataMember = "TB_content";
        Orign.DataBind();

        if (ds.Tables[0].Rows.Count > 0)
        {
            DataRow dr = ds.Tables[0].Rows[0];
            fc_content.Text = dr["description"].ToString(); // segment the description column of db_table
            fc_result.Text = Segment.DoSegment(fc_content.Text);
        }
    }
}
The third program: keyword.cs, directory: wwwroot/App_Code


Code
using System;
using System.Collections;
using System.Collections.Generic;

namespace Rcsky.GetKeyword
{
    public class Segment
    {
        public static string DoSegment(string keyWord)
        {
            return DoSegment(keyWord, "<br>"); // separator placed between the segmented words in the output
        }

        public static string DoSegment(string keyWord, string separator)
        {
            PanGu.Segment.Init();
            PanGu.Segment segment = new PanGu.Segment();
            ICollection<PanGu.WordInfo> words = segment.DoSegment(keyWord);

            List<string> items = new List<string>();
            foreach (PanGu.WordInfo wordInfo in words)
            {
                // word ^ weight ^ frequency ^ word type ^ part of speech
                items.Add(wordInfo.Word + "^" + wordInfo.Rank + "^" + wordInfo.Frequency + "^" + wordInfo.WordType + "^" + wordInfo.Pos);
            }

            // Join the items with the separator (none before the first item).
            return string.Join(separator, items.ToArray());
        }
    }
}
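Run default.aspx and you get output like this, hehe: word^weight^frequency^word type^part of speech. My program does no validation or checking at all, so you will have to add that yourself; otherwise a missing id, or a NULL description in the table, will probably cause errors.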
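For readers who want those checks, here is a minimal sketch of what they might look like. `RequestGuards`, `TryParseId` and `TextOrEmpty` are hypothetical names that are not part of the original program, and the sketch assumes that id is always a positive integer.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers (not part of the original program) covering the two
// checks that Default.aspx.cs leaves out.
public static class RequestGuards
{
    // Returns true only when raw consists of digits that form a positive int,
    // e.g. the value of Request.QueryString["id"]. Null, empty strings and
    // input such as "1 or 1=1" are rejected.
    public static bool TryParseId(string raw, out int id)
    {
        return int.TryParse(raw, NumberStyles.None, CultureInfo.InvariantCulture, out id)
            && id > 0;
    }

    // Maps null and DBNull.Value (a NULL column) to an empty string, so the
    // caller can decide explicitly whether there is anything to segment.
    public static string TextOrEmpty(object value)
    {
        if (value == null || value == DBNull.Value)
        {
            return "";
        }
        return value.ToString();
    }
}
```

In Page_Load, one option is to call `RequestGuards.TryParseId(Request.QueryString["id"], out id)` before building the query and to stop with a message when it returns false. The parsed id can then be passed as a parameter, with `?` as the placeholder in the SQL text and `Parameters.AddWithValue` on the adapter's SelectCommand, rather than being concatenated into the string.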
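That's not the main point, though, hehe. Once you have the "word^weight^frequency^word type^part of speech" result, the rest is easy: split it up, then filter it or sort it by weight or frequency as you like, so I won't go into detail. The application I have in mind is automatically extracting an article's keywords and writing them into the keywords column of db_table. That is handy both for search and for putting into pages for SEO, hehe.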
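The split, filter and sort step could be sketched roughly as follows. `KeywordPicker` and `TopKeywords` are hypothetical names, not something from the original program or from PanGu. The sketch assumes .NET 3.5 or later (for `HashSet`), that each item has exactly the five `^`-separated fields produced by keyword.cs, and that weight and frequency were written with `.` as the decimal point. Items that do not fit are skipped, which includes any word that itself contains `^`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the original program: parses the text
// returned by Segment.DoSegment and picks the highest-ranked words.
public static class KeywordPicker
{
    private sealed class Candidate
    {
        public string Word;
        public int Weight;
        public double Frequency;
    }

    // segmented: items shaped like word^weight^frequency^wordType^pos, joined
    // by separator ("<br>" is the default used in keyword.cs).
    // Returns up to maxCount distinct words of at least minLength characters,
    // highest weight first, then highest frequency, joined by commas.
    public static string TopKeywords(string segmented, string separator, int maxCount, int minLength)
    {
        if (string.IsNullOrEmpty(segmented))
        {
            return "";
        }

        List<Candidate> candidates = new List<Candidate>();
        HashSet<string> seen = new HashSet<string>();
        string[] items = segmented.Split(new string[] { separator }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string item in items)
        {
            string[] fields = item.Split('^');
            int weight;
            double frequency;
            if (fields.Length != 5) continue; // malformed item
            if (!int.TryParse(fields[1], NumberStyles.Integer, CultureInfo.InvariantCulture, out weight)) continue;
            if (!double.TryParse(fields[2], NumberStyles.Float, CultureInfo.InvariantCulture, out frequency)) continue;
            if (fields[0].Length < minLength) continue; // drop very short words
            if (!seen.Add(fields[0])) continue;         // keep the first occurrence only

            Candidate c = new Candidate();
            c.Word = fields[0];
            c.Weight = weight;
            c.Frequency = frequency;
            candidates.Add(c);
        }

        // Descending by weight, then descending by frequency.
        candidates.Sort(delegate(Candidate a, Candidate b)
        {
            int byWeight = b.Weight.CompareTo(a.Weight);
            return byWeight != 0 ? byWeight : b.Frequency.CompareTo(a.Frequency);
        });

        List<string> top = new List<string>();
        for (int i = 0; i < candidates.Count && i < maxCount; i++)
        {
            top.Add(candidates[i].Word);
        }
        return string.Join(",", top.ToArray());
    }
}
```

A call such as `KeywordPicker.TopKeywords(Segment.DoSegment(text), "<br>", 5, 2)` would then give a comma-separated string that could be written to the keywords column. The limit of five words and the minimum length of two characters are arbitrary choices for illustration.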
















