本文介绍情感分析领域最常见的一些数据集。

目录

  • TOC
    {:toc}

SemEval

SemEval-2014 Task 4: Aspect Based Sentiment Analysis


这是SemEval-2014语义评测任务的第4个任务,它又包含4个子任务。

子任务1:Aspect term extraction

给定针对某个entity(比如餐馆)的一些句子,识别其中的aspect term。比如句子"The food was nothing much, but I loved the staff",我们需要识别"food"和"staff"这两个aspect term。一个句子里可能会出现多个(或者零个)aspect term。另外aspect term可能包含多个词,比如"The hard disk is very noisy",这里的aspect term是"hard disk"。

子任务2:Aspect term的极性分类

给定一个句子和这个句子里的所有aspect term,判定每一个term的情感极性。可能的极性包括正面(positive)、负面(negative)、中性(neutral)和冲突(conflict)。比如:

“I loved their fajitas” → {fajitas: positive}
“I hated their fajitas, but their salads were great” → {fajitas: negative, salads: positive}
“The fajitas are their first plate” → {fajitas: neutral}
“The fajitas were great to taste, but not to see” → {fajitas: conflict}

冲突的意思是在这个句子里既有正面的评价也有负面的评价,比如上面的第四个句子。

子任务3:Aspect类别(category)识别

因为很多不同的aspect term都可以归为一类,比如fajitas和salads都是餐馆的菜品,我们希望把它们都归类到food。这个任务定义了几个类别,比如餐馆(restaurant)的数据集上定义里food, service, price, ambience, anecdotes/miscellaneous等5个类别。这个任务为:给定一个句子,识别出其中的类别(注意一个句子可能包含多个类别)。比如:

“The restaurant was too expensive” → {price}
“The restaurant was expensive, but the menu was great” → {price, food}

有的读者可能回想,如果能识别aspect term,然后再判断aspect term是哪个category。这可能有一个问题,对于隐式的aspect,可能只有形容词而没有名词,比如第一个句子没有price这样的aspect term,我们需要根据形容词expensive来推测类别为price。

子任务4:Aspect类别的情感分类

给定一个句子以及句子里的一个或者多个aspect类别,输出每个类别的情感分类。和前面的term分类一样,这里的分类也是正面(positive)、负面(negative)、中性(neutral)和冲突(conflict)。比如:

“The restaurant was too expensive” → {price: negative}
“The restaurant was expensive, but the menu was great” → {price: negative, food: positive}

对于上面的第一个例子,输入是句子和negative与food两个类别,输出是这两个类别的极性。

示例数据

全部数据可以在这里下载,它包括餐馆和笔记本电脑两个数据集,其中餐馆数据集包含上面的4个子任务的标注,而笔记本电脑的数据只有前两个任务的标注数据(没有类别的标注)。

下面是餐馆的一个示例数据:

<sentence id="813">
          <text>All the appetizers and salads were fabulous, the steak was mouth watering and the pasta was delicious!!!</text>
          <aspectTerms>
                    <aspectTerm term="appetizers" polarity="positive" from="8" to="18"/>
                    <aspectTerm term="salads" polarity="positive" from="23" to="29"/>
                    <aspectTerm term="steak" polarity="positive" from="49" to="54"/>
                    <aspectTerm term="pasta" polarity="positive" from="82" to="87"/>
          </aspectTerms>
          <aspectCategories>
                    <aspectCategory category="food" polarity="positive"/>
          </aspectCategories>
</sentence>

下面是笔记本电脑的示例:

<sentence id="353">
          <text>From the build quality to the performance, everything about it has been sub-par from what I would have expected from Apple.</text>
          <aspectTerms>
                    <aspectTerm term="build quality" polarity="negative" from="9" to="22"/>
                    <aspectTerm term="performance" polarity="negative" from="30" to="41"/>
          </aspectTerms>
</sentence>

SemEval-2015 Task 12: Aspect Based Sentiment Analysis

任务的介绍主要参考了SemEval-2015 Task 12: Aspect Based Sentiment Analysis,官方网站为SemEval-2015 Task 12

这是SemEval-2014任务的任务假定评论的都是给定实体(餐馆或者笔记本电脑)的某个属性,但是我们也可能点评这个实体的部件,比如笔记本电脑的鼠标。前面介绍过,aspect更加通用的表示方法是一棵树。不过这里的任务还是简化里一些,认为这棵树最多两层,树根是餐馆或者笔记本电脑,我们可以点评电脑的属性(比如价格),也可以点评部件鼠标的属性(比如鼠标的灵敏度)。此外,有一些点评aspect的句子并不见得会出现对应的名词,比如下面的文字:

They sent it back with a huge crack in it and it still didn't work; and that was the fourth time I’ve sent it to them to get fixed

它点评的实体是餐馆的服务(service),属性是服务的质量(quality),但是文字中没有任何service或者quality相关的文字。这和前面的expensive的句子类似的。因此2015年的任务预定义里所有的Entity和属性,然后让我们识别文本中出现里哪些实体和属性的组合,也就是E#A。比如上面的句子,输出就是service#quality。另外这个任务的输入不是一个一个的句子,而是整段评论,这样我们可以利用上下文信息。当然标注和识别的粒度还是句子,只不过我们的算法可以(但大部分算法都没有)利用上下文的信息。

任务1:In-domain任务

给定一个完整的评论,我们需要完成如下3个子任务。

Aspect类别识别

识别评论里所有的实体(E)和属性(A)对。E和A都是预定义集合中的某一个值,比如餐馆数据集,E包含laptop, keyboard, operating system, restaurant, food, drinks等实体和performance, design, price, quality等属性。

更具体的,对于笔记本电脑数据集来说,E共用22个实体类别(比如LAPTOP, DISPLAY, CPU, MOTHERBOARD, HARD DISC, MEMORY, BATTERY等)和9个属性标签(比如GENERAL, PRICE, QUALITY, OPERATION_PERFORMANCE等)。完整的实体列表和属性标签列表可以参考这里,下面是一些示例:

(1)  It fires up in the morning in less than 30 seconds and I have never had any issues with it freezing. → {LAPTOP#OPERATION_PERFORMANCE}
(2)  Sometimes you will be moving your finger and the pointer will not even move.  → {MOUSE#OPERATION_PERFORMANCE}
(3)  The backlit keys are wonderful when you are working in the dark.  → {KEYBOARD#DESIGN_FEATURES}
(4)  I dislike the quality and the placement of the speakers. {MULTIMEDIA DEVICES#QUALITY}, {MULTIMEDIA DEVICES#DESIGN_FEATURES}
(5)  The applications are also very easy to find and maneuver.  → {SOFTWARE#USABILITY}
(6)  I took it to the shop and they said it would cost too much to repair it.  → {SUPPORT#PRICE}
(7)  It is extremely portable and easily connects to WIFI at the library and elsewhere. → {LAPTOP#PORTABILITY}, {LAPTOP#CONNECTIVITY}

比如第一个句子是说笔记本的操作响应很快,而第二个是说鼠标的操作很不灵敏。

对于餐馆数据集来说,E有6个实体类别(RESTAURANT, FOOD, DRINKS, SERVICE, AMBIENCE, LOCATION)和5个属性标签(GENERAL, PRICES, QUALITY, STYLE_OPTIONS, MISCELLANEOUS),详细信息可以参考这里,下面是一些示例:

(1) Great for a romantic evening, but over-priced. → {AMBIENCE#GENERAL}, {RESTAURANT#PRICES}
(2) The fajitas were delicious, but expensive. → {FOOD#QUALITY}, {FOOD# PRICES}
(3)The exotic food is beautifully presented and is a delight in delicious combinations. → {FOOD#STYLE_OPTIONS}, {FOOD#QUALITY}
(4) The atmosphere isn't the greatest , but I suppose that's how they keep the prices down. → {AMBIENCE#GENERAL}, {RESTAURANT# PRICES}
(5) The staff is incredibly helpful and attentive. → {SERVICE# GENERAL}
Opinion Target Expression(OTE)识别

这个任务只有餐馆数据集上有标注数据。OTE任务的输入是所有的E#A对,需要识别E#A对里实体E对应的字符串。当隐式的表达实体时用特殊的"NULL"表示,比如代词"它"这样的代词,有的文本甚至根本找不到和E相关的字符串。下面是一些例子:

(1) Great for a romantic evening, but over-priced. → {AMBIENCE#GENERAL, “NULL”}, {RESTAURANT# PRICES, “NULL”}
(2) The fajitas were delicious, but expensive. → {FOOD#QUALITY, “fajitas”}, {FOOD# PRICES, “fajitas”}
(3) The exotic food is beautifully presented and is a delight in delicious combinations. → {FOOD#STYLE_OPTIONS, “exotic food”}, {FOOD# QUALITY, “exotic food”}
(4) The atmosphere isn't the greatest , but I suppose that's how they keep the prices down. → {AMBIENCE#GENERAL, “atmosphere”}, {RESTAURANT# PRICES, “NULL”}
(5) The staff is incredibly helpful and attentive. → {SERVICE# GENERAL, “staff”}

比如在第4个句子里,they指代的是餐馆,但是它不是OTE。

情感分类

给定一个句子(有上下文)和所有的E#A对,判断其情感分类,可能的分类为正面、负面和中性。这和2014年的任务有所不同。下面是一些示例:

(1) The applications are also very easy to find and maneuver. → {SOFTWARE#USABILITY,  positive}
(2) The fajitas were great to taste, but not to see”→ {FOOD#QUALITY, “fajitas”, positive},  {FOOD#STYLE_OPTIONS, “fajitas”, negative }
(3) We were planning to get dessert, but the waitress basically through the bill at us before we had a chance to order.  → {SERVICE# GENERAL, “waitress”, negative}
(4) It does run a little warm but that is a negligible concern.  → {LAPTOP#QUALITY neutral}
(5) The fajitas are nothing out of the ordinary” → {FOOD#GENERAL, “fajitas”,  neutral}
(6) I bought this laptop yesterday. → {}
(7) The fajitas are their first plate  → {}
任务2:Out-of-domain任务

增加里一个酒店的测试数据集(没有训练数据),然后考察模型则不同领域的泛化能力。它的输入是E#A对和句子,要求我们输出这个E#A对的情感极性。

下面是一个完整的评论的标注数据:

Review id:"1004293"

 Judging from previous posts this used to be a good place, but not any longer.
 {target:"NULL" category:"RESTAURANT#GENERAL" polarity:"negative" from:"-" to="-"}

 We, there were four of us, arrived at noon - the place was empty - and the staff acted 
 like we were imposing on them and they were very rude.
 {target:"staff" category:"SERVICE#GENERAL" polarity:"negative" from:"75" to:"80"}

 They never brought us complimentary noodles, ignored repeated requests for sugar, 
 and threw our dishes on the table.
 {target:"NULL" category:"SERVICE#GENERAL" polarity:"negative" from:"-" to:"-"}

 The food was lousy - too sweet or too salty and the portions tiny.
 {target:"food" category="FOOD#QUALITY" polarity="negative" from:"4" to:"8"}
 {target:"portions" category:"FOOD#STYLE_OPTIONS" polarity:"negative" from:"52" to:"60"}

 After all that, they complained to me about the small tip.
 {target:"NULL" category:"SERVICE#GENERAL" polarity:"negative" from:"-" to:"-"}
			
 Avoid this place!
 {target:"place" category:"RESTAURANT#GENERAL" polarity:"negative" from:"11" to:"16"}

其中from和to表示OTE字符串在句子开始和结束的下标。

SemEval-2016 Task 5: Aspect Based Sentiment Analysis

任务的介绍主要参考了SemEval-2016 Task 5: Aspect Based Sentiment Analysis,官方网站为SemEval-2016 Task 5

2016年的任务延续里2015年的任务,为它增加了新的测试数据(15年的训练数据和测试数据都变成16年的训练数据),此外它还首次加入了英语之外的多种语言,包括中文。它包括如下几个子任务:

句子级别的ABSA(Aspect-Based Sentiment Analysis)

给定某个实体(笔记本电脑、餐馆或者酒店)的一篇评论的一个句子,需要确定所有观点三元组的如下内容(slot):

Aspect Category Detection

这个任务是确定文本里所有出现的E#A对,其中E来自预定义的实体类列表,A来自预定义的属性标签列表。

Opinion Target Expression (OTE)

和上年的任务一样,需要确定每个E#A对里实体对应的字符串的开始和结束下标,如果找不到则输出"NULL"。只有餐馆的数据有这个子任务。

情感极性

判断每一个E#A对的情感分类,类别包括正面、负面和中性。

上面的任务在人工标注时每次处理一个句子,但是会参考它的前后上下文的其它句子。下面是一个笔记本的评论的标注示例:

S1:The So called laptop Runs to Slow and I hate it! →
{LAPTOP#OPERATION_PERFORMANCE, negative}, {LAPTOP#GENERAL, negative}
S2:Do not buy it! → {LAPTOP#GENERAL, negative}
S3:It is the worst laptop ever. → {LAPTOP#GENERAL, negative}

下面是餐馆的数据:

S1:I was very disappointed with this restaurant. →
{RESTAURANT#GENERAL, “restaurant”, negative, from="34" to="44"}
S2:I’ve asked a cart attendant for a lotus leaf wrapped rice and she replied back rice and just walked away. →{SERVICE#GENERAL, “cart attendant”, negative, from="12" to="26"}
S3:I had to ask her three times before she finally came back with the dish I’ve requested. →
{SERVICE#GENERAL, “NULL”, negative}
S4:Food was okay, nothing great. →
{FOOD#QUALITY, “Food”, neutral, from="0" to="4"}
S5:Chow fun was dry; pork shu mai was more than usually greasy and had to share a table with loud and rude family. →
{FOOD#QUALITY, “Chow fun”, negative, from="0" to="8"},
{FOOD#QUALITY, “pork shu mai”, negative, from="18" to="30"},
{AMBIENCE#GENERAL, “NULL”, negative}
S6:I/we will never go back to this place again. →
{RESTAURANT#GENERAL, “place”, negative, from="32" to="37"}
文本级别的ABSA

上面的句子级别的问题是模型不能参考上下文(人工标注是参考了的),因此还有一个文本级别的ABSA任务。它的任务和前面是一样的,只不过输入是整个评论文本,下面是一些示例。

下面是整个评论文本:

Review id:LPT1 (Laptop)
"The So called laptop Runs to Slow and I hate it! Do not buy it! It is the worst laptop ever."

期望的输出(标注)为:

{LAPTOP#OPERATION_PERFORMANCE, negative}
{LAPTOP#GENERAL, negative}

但是它并不能简单的把文本分成句子,然后把所有句子的结果合并起来,因为一个段落里可能有多个句子都在说同一个E#A对,如果是这样的话需要判断最主要的情感倾向,比如下面的例子:

Review id:RST1 (Restaurant)
"I was very disappointed with this restaurant. I’ve asked a cart attendant for a lotus leaf wrapped rice and she replied back rice and just walked away. I had to ask her three times before she finally came back with the dish I’ve requested. Food was okay, nothing great. Chow fun was dry; pork shu mai was more than usually greasy and had to share a table with loud and rude family. I/we will never go back to this place again."

它的输出为:

{RESTAURANT#GENERAL, negative}
{SERVICE#GENERAL, negative}
{FOOD#QUALITY, negative}
{AMBIENCE#GENERAL, negative}

它就是前面句子级别的同一段文本,关于FOOD#QUALITY有一个中性两个负面的,因此总的情感倾向是负面的。如果多个句子的情感倾向是冲突的,比如一个正面一个负面,则需要识别为冲突(conflict)。比如下面的例子:

Review id: RST2 (Restaurant)
“This little place has a cute interior decor and affordable city prices. The pad seew chicken was delicious, however the pad thai was far too oily. I would just ask for no oil next time.”

它的输出为:

{AMBIENCE#GENERAL, positive}
{RESTAURANT#PRICES, positive}
{FOOD#QUALITY, conflict}
{RESTAURANT#GENERAL, positive}

FOOD#QUALITY既有正面的又有负面的,因此标注为冲突。

Out-of-domain ABSA

这个任务的测试数据的领域没有训练数据,它考察的是模型在不同领域的泛化能力。

IMDB

电影的评论数据,二分类任务,包括25,000个训练数据和25,000个测试数据。可以在这里下载。