sklearn.feature_extraction.DictVectorizer(sparse=True, ...)

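  • DictVectorizer.fit_transform(X): X is a dict or an iterable of dicts. Returns a sparse matrix (or a NumPy array when sparse=False).
  • DictVectorizer.inverse_transform(X): X is an array or a sparse matrix. Returns a list of dicts mapping feature names to values; one-hot encoded categories come back under their encoded names (for example 'city=北京': 1.0), not as the original key and value.
  • DictVectorizer.get_feature_names_out(): returns the feature names. It replaces get_feature_names(), which was removed in scikit-learn 1.2.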
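
The round trip through fit_transform and inverse_transform listed above can be sketched as follows. This is a minimal illustration, not part of the original notes: the sample records and the names `records`, `vec`, and `restored` are hypothetical.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical sample records, chosen only for illustration.
records = [{"city": "Paris", "temp": 12.0},
           {"city": "Oslo", "temp": 3.0}]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)  # columns: city=Oslo, city=Paris, temp

# inverse_transform maps each row back to a dict of its nonzero features.
# Categorical values keep their encoded names such as 'city=Paris'.
restored = vec.inverse_transform(X)
print(restored)
```

Note that only nonzero entries are restored, so a numeric feature whose value is 0 would not appear in the returned dict.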
'''Dictionary feature extraction'''
from sklearn.feature_extraction import DictVectorizer
# 1. Data: a dict or an iterable of dicts
data = [{"city": "北京", "housing_price": 250},
        {"city": "上海", "housing_price": 260},
        {"city": "广州", "housing_price": 200}]  # a list of dicts
# 2. Instantiate a transformer
transfer = DictVectorizer(sparse=True)
# 3. Call fit_transform()
data_new = transfer.fit_transform(data)
print(data_new)  # (row, column) coordinates and values of the nonzero entries
'''
  (0, 1)  1.0
  (0, 3)  250.0
  (1, 0)  1.0
  (1, 3)  260.0
  (2, 2)  1.0
  (2, 3)  200.0
'''
print(transfer.get_feature_names_out())  # feature names; use get_feature_names() on scikit-learn older than 1.0
# Repeat steps 2 and 3 with sparse=False to get a dense array instead
transfer = DictVectorizer(sparse=False)
data_new = transfer.fit_transform(data)
print(data_new)  # 2-D NumPy array; each categorical value is one-hot encoded
'''
[[  0.   1.   0.  250.]
 [  1.   0.   0.  260.]
 [  0.   0.   1.  200.]]
'''
print(transfer.get_feature_names_out())  # feature names, in column order
# ['city=上海' 'city=北京' 'city=广州' 'housing_price']
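
The two runs above hold the same values in different containers. As a quick check, assuming the same `data` as in the example, the sparse result can be expanded with `toarray()` and compared with the dense one. The names `sparse_result` and `dense_result` are introduced here only for illustration.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer

data = [{"city": "北京", "housing_price": 250},
        {"city": "上海", "housing_price": 260},
        {"city": "广州", "housing_price": 200}]

sparse_result = DictVectorizer(sparse=True).fit_transform(data)
dense_result = DictVectorizer(sparse=False).fit_transform(data)

# toarray() expands the SciPy sparse matrix into a regular NumPy array.
print(sparse_result.shape)                                    # (3, 4)
print(np.array_equal(sparse_result.toarray(), dense_result))  # True
```

The sparse form stores only the nonzero entries, which tends to matter once a categorical feature has many distinct values and most one-hot columns are zero.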