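The Paintings Dataset is an open-source, painting-themed dataset released by the Visual Geometry Group. Each painting can have a single subject or a mix of several subjects, so using it for image recognition is a fairly classic multi-task (more precisely, multi-label) learning problem. I covered this in my earlier post on cloud-type recognition, so I won't repeat it here.
First, here is the official introduction to the dataset (screenshot below):
The dataset statistics are as follows:
Unlike datasets such as VOC, the official release is not distributed as image files but as a list of links, so you have to download the images yourself. Three yearly versions are provided, as shown below:
I use the 2021 version, which is the most recent one. A sample of the downloaded file looks like this: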
Image URL,Web page URL,Subset,Labels
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NID/QUB/NID_QUB_QUB_264-001.jpg,https://artuk.org/discover/artworks/and-the-cow-jumped-over-the-moon-168957,'test',' cow'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/GMIII/MOSI/GMIII_MOSI_A1978_72_3-001.jpg,https://artuk.org/discover/artworks/0-6-00-6-0-garratt-locomotive-203965,'train',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1979_7964-001.jpg,https://artuk.org/discover/artworks/044t-locomotive-no-1431-passing-mosley-siding-signal-box-9593,'train',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/CHE/CRHC/CHE_CRHC_PCF40-001.jpg,https://artuk.org/discover/artworks/080-locomotive-on-freight-duty-103049,'test',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NOT/NTMAG/NOT_NTMAG_1997_31-001.jpg,https://artuk.org/discover/artworks/17th-and-21st-lancers-46478,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/STF/STRM/STF_STRM_832-001.jpg,https://artuk.org/discover/artworks/1st-south-staffords-on-the-march-in-burma-1944-19642,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1986_9418-001.jpg,https://artuk.org/discover/artworks/222-locomotive-built-by-george-forrester-8695,'test',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_2004_7349-001.jpg,https://artuk.org/discover/artworks/222-locomotive-jenny-lind-9616,'test',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1986_9421-001.jpg,https://artuk.org/discover/artworks/222-locomotive-patentee-robert-stephensons-patent-locomotive-9409,'train',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1996_7374-001.jpg,https://artuk.org/discover/artworks/264t-locomotive-alice-9530,'test',' train'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/LLR/RLRH/LLR_RLRH_L_H38_1988_3_0-001.jpg,https://artuk.org/discover/artworks/2nd-battalion-the-leicestershire-regiment-as-chindits-during-operations-against-the-japanese-at-indaw-lake-burma-1944-80060,'train',' aeroplane horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/IWM/IWM/IWM_IWM_LD_5509-001.jpg,https://artuk.org/discover/artworks/43-repair-group-air-frame-repair-service-lincoln-repairing-liberator-aircraft-7481,'test',' aeroplane'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1977_5834-001.jpg,https://artuk.org/discover/artworks/460-locomotive-no-1306-mayflower-next-to-unit-m77165-in-the-paint-shop-at-horwich-works-1975-9765,'train',' train'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/CW/MTE/CW_MTE_45-001.jpg,https://artuk.org/discover/artworks/6th-earl-and-countess-of-mount-edgcumbe-in-coronation-robes-14840,'validation',' chair'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/NY/YAM/NY_YAM_260367-001.jpg,https://artuk.org/discover/artworks/87-squadron-gladiators-tied-together-k7967-k8027-k7972-10402,'test',' aeroplane'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/YAG/NY_YAG_YORAG_326-001.jpg,https://artuk.org/discover/artworks/a-seventeenth-century-dutch-interior-with-a-seated-lady-8374,'test',' chair diningtable dog'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/LNE/RAFM/LNE_RAFM_FA03538-001.jpg,https://artuk.org/discover/artworks/a-20-havoc-light-bomber-136117,'test',' aeroplane'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/DUR/DBM/DUR_DBM_770-001.jpg,https://artuk.org/discover/artworks/a-basket-of-flowers-with-a-dog-chasing-a-bird-44680,'train',' bird dog'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/DUR/DBM/DUR_DBM_769-001.jpg,https://artuk.org/discover/artworks/a-basket-of-flowers-with-birds-44726,'train',' bird'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/YAG/NY_YAG_YORAG_66-001.jpg,https://artuk.org/discover/artworks/a-bather-7936,'test',' chair'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/LW/NARM/LW_NARM_131900-001.jpg,https://artuk.org/discover/artworks/a-battery-of-the-royal-horse-artillery-galloping-to-a-fresh-position-182844,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/SYO/CG/SYO_CG_CP_TR_156-001.jpg,https://artuk.org/discover/artworks/a-bavarian-lake-with-fishing-boats-68811,'test',' boat'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_262-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-4097,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/KT/MMB/KT_MMB_06_029-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-76966,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w800h800/collection/NG/NG/NG_NG_NG983-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-a-cow-a-goat-and-three-sheep-near-a-building-113901,'test',' cow horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/TRR/NTI_TRR_337062-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-and-a-pony-in-a-landscape-101086,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/CAL/NTIII_CAL_290260-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-called-fleacatcher-169526,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/PNC/NTI_PNC_1420370-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-chance-and-a-jockey-102326,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_258-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-3778,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_590-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-4061,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIV/UPP/NTIV_UPP_138312-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-with-his-groom-and-two-hounds-220307,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/CAL/NTIII_CAL_290484-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-169372,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIV/KHA/NTIV_KHA_445331-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-220614,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_184-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-3599,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/SFK/NHM/SFK_NHM_1986_004-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-wooded-landscape-11244,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_455-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-near-a-building-3946,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/FEL/NTIII_FEL_1401221-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-pony-bloodhound-and-dachshund-outside-felbrigg-hall-171257,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTII/BKH/NTII_SKH_1196043-001.jpg,https://artuk.org/discover/artworks/a-bay-hunter-and-a-pug-dog-in-a-landscape-132043,'test',' dog'
https://d3d00swyhr67nd.cloudfront.net/w800h800/collection/NG/NG/NG_NG_NG818-001.jpg,https://artuk.org/discover/artworks/a-beach-scene-with-fishermen-113903,'test',' boat'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/STF/WAG/STF_WAG_OP536-001.jpg,https://artuk.org/discover/artworks/a-belgian-school-19183,'test',' chair diningtable'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/VA/PC/VA_PC_2007BP2389-001.jpg,https://artuk.org/discover/artworks/a-bird-32686,'train',' bird'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/LAN/TURT/LAN_TURT_PCF21-001.jpg,https://artuk.org/discover/artworks/a-bird-in-a-tree-152732,'validation',' bird'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/CDN/LBCN/CDN_LBCN_426-001.jpg,https://artuk.org/discover/artworks/a-bird-in-half-123669,'test',' bird'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/LDK/NTI_LDK_884912-001.jpg,https://artuk.org/discover/artworks/a-black-dog-100725,'test',' dog'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/SYO/BHA/SYO_BHA_90009742-001.jpg,https://artuk.org/discover/artworks/a-black-dog-68634,'train',' dog'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_443-001.jpg,https://artuk.org/discover/artworks/a-black-horse-at-newmarket-3596,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTII/ATT/NTII_ATT_609052-001.jpg,https://artuk.org/discover/artworks/a-black-horse-called-bishop-with-his-groom-in-a-landscape-131049,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/ERH/NTIII_ERH_201463-001.jpg,https://artuk.org/discover/artworks/a-black-horse-in-a-courtyard-167313,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/TATE/TATE/TATE_TATE_T00888_10-001.jpg,https://artuk.org/discover/artworks/a-black-horse-with-two-dogs-201729,'test',' dog horse'
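As you can see:
The file has four columns. The first is the image URL; the second is the URL of the web page (not very useful here); the third is the subset the image belongs to (train / validation / test); the fourth is the set of subjects that appear in the image.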
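To make the column layout concrete, here is a minimal sketch of parsing such rows with the standard `csv` module. The two rows in `SAMPLE` use made-up `example.com` URLs and the helper name `parse_rows` is hypothetical; only the column names and the single-quoted, space-padded field style are taken from the sample above.

```python
import csv
import io

# Two illustrative rows in the same format as the official CSV.
# The URLs are placeholders, not real dataset entries.
SAMPLE = """Image URL,Web page URL,Subset,Labels
https://example.com/a.jpg,https://example.com/a,'train',' aeroplane horse'
https://example.com/b.jpg,https://example.com/b,'validation',' chair'
"""


def parse_rows(text):
    """Yield (image_url, subset, labels) with quotes and padding removed."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # Subset and Labels are wrapped in single quotes, which the csv
        # module does not strip by default, so remove them by hand.
        subset = row["Subset"].replace("'", "").strip()
        labels = row["Labels"].replace("'", "").split()
        yield row["Image URL"].strip(), subset, labels


rows = list(parse_rows(SAMPLE))
for url, subset, labels in rows:
    print(subset, labels)
```

Splitting the label field on whitespace gives one list entry per subject, which is the form the multi-label encoding later on needs.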
With the meaning of the raw data clear, the first step is to download the images behind the links. The core implementation is shown below:
import os

import pandas as pd


def loadUrls2Img(data="painting_dataset_2021.csv", resDir="data/"):
    """
    Read the image URLs from the CSV and download each image to local disk.
    """
    # URLs already downloaded in a previous run (the file may not exist yet)
    img_urls = set()
    if os.path.exists("img_url.txt"):
        with open("img_url.txt") as f:
            img_urls = {one.strip() for one in f if one.strip()}
    print("img_urls_length: ", len(img_urls))
    df = pd.read_csv(data)
    print(df.head(10))
    data_list = df.values.tolist()
    print("data_list_length: ", len(data_list))
    left_length = len(data_list) - len(img_urls)
    print("left_length: ", left_length)
    # read_csv already consumed the header row, so iterate over every row
    for one_list in data_list:
        try:
            img_url, web_page_url, Subset, Labels = one_list
            # Fields are wrapped in single quotes in the CSV, e.g. 'train'
            Subset = Subset.replace("'", "").strip()
            Labels = Labels.replace("'", "").strip()
            # One directory per subset and label combination
            oneDir = resDir + Subset + "/" + Labels + "/"
            os.makedirs(oneDir, exist_ok=True)
            img_url = img_url.strip()
            if img_url and img_url not in img_urls:
                print("img_url: ", img_url)
                one_path = oneDir + str(len(os.listdir(oneDir)) + 1) + ".jpg"
                downloadSingleImg(img_url, save_path=one_path)
                # Record the URL only after the download succeeded
                with open("img_url.txt", "a") as f:
                    f.write(img_url + "\n")
                img_urls.add(img_url)
        except Exception as e:
            print("Exception: ", e)
I wrapped the download step in its own function so that it is easy to swap in another method:
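import requests


def downloadSingleImg(img_url, save_path="data/a.jpg"):
    """
    Download a single image and write it to save_path.
    """
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
    }
    r = requests.get(img_url, headers=headers, timeout=30)
    # Raise on HTTP errors instead of saving an error page as a .jpg
    r.raise_for_status()
    # Write the image bytes to disk
    with open(save_path, "wb") as f:
        f.write(r.content)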
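Because the download function is called one URL at a time, a thread pool is one possible way to speed things up. The following is only a sketch under stated assumptions: `download_many` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for `downloadSingleImg` so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_many(tasks, fetch, max_workers=8):
    """
    Run fetch(url, save_path) for every (url, save_path) pair in tasks
    using a thread pool. Returns (succeeded_urls, failed), where failed
    maps each failing url to the exception it raised.
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, path): url for url, path in tasks}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()  # re-raises any exception from fetch
                succeeded.append(url)
            except Exception as e:
                failed[url] = e
    return succeeded, failed


# Stand-in for downloadSingleImg (assumption: same (url, save_path) signature)
def fake_fetch(url, save_path):
    if "bad" in url:
        raise IOError("simulated failure")


tasks = [
    ("http://example.com/1.jpg", "1.jpg"),
    ("http://example.com/bad.jpg", "2.jpg"),
]
ok, bad = download_many(tasks, fake_fetch)
print(sorted(ok), list(bad))
```

If this were wired into the real pipeline, the save paths would need to be assigned before submitting the tasks, since numbering files by the current directory size is not safe when several threads write at once.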
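During the download, the URL of every fetched image is appended to a local file, so an interrupted download can be resumed from where it stopped. A screenshot of the record file is shown below:
The download takes quite a while because nothing is done to speed it up, so just wait patiently. When it finishes, it looks like this:
Because the label categories obtained under each subset were not the same, I merged them. The result is shown below:
Each subdirectory name is the combination of subjects shared by all images in that directory, which makes the data easy to handle.
First, determine the number of distinct single subjects: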
import json
import os

# Each directory name is a space-separated combination of subjects
classes_list = os.listdir("data/train/")
print("classes_list: ", classes_list)
print("classes_list_length: ", len(classes_list))
# Split every combination into single subjects and deduplicate them
labels_list = []
for one_class in classes_list:
    labels_list += one_class.split()
labels_list = sorted(set(labels_list))
print("labels_list: ", labels_list)
print("labels_list_length: ", len(labels_list))
# Save the label order so predictions can be decoded later
with open("labels_list.json", "w") as f:
    json.dump(labels_list, f)
numbers = len(labels_list)
print("numbers: ", numbers)
Next, generate the label vector for each sample:
def generateLabel(one):
    """
    Build the multi-hot label vector for one directory name,
    e.g. "chair diningtable dog".
    """
    one_label_list = one.split()
    # 1 where the subject is present, 0 otherwise, in labels_list order
    one_y_list = [1 if label in one_label_list else 0 for label in labels_list]
    assert len(one_y_list) == numbers
    return one_y_list
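As a worked example of the encoding above, here is a self-contained sketch. The function name `multi_hot` is hypothetical, and the label list contains only the nine subjects that appear in the sample rows shown earlier, so it is an assumption that may not match the full label set.

```python
def multi_hot(dir_name, labels_list):
    """Return a 0/1 vector marking which labels occur in dir_name."""
    present = set(dir_name.split())
    return [1 if label in present else 0 for label in labels_list]


# Subjects seen in the sample rows, sorted alphabetically (assumed subset)
labels_list = [
    "aeroplane", "bird", "boat", "chair", "cow",
    "diningtable", "dog", "horse", "train",
]

print(multi_hot("chair diningtable dog", labels_list))
# -> [0, 0, 0, 1, 0, 1, 1, 0, 0]
print(multi_hot("aeroplane horse", labels_list))
# -> [1, 0, 0, 0, 0, 0, 0, 1, 0]
```

Each position corresponds to one subject, so a painting with several subjects simply has several ones in its vector.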
Build the model; adjust the architecture to your own needs:
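# Imports assume standalone Keras; with TensorFlow, import from tensorflow.keras
from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.optimizers import SGD

model = Sequential()
# h, w, way: image height, width and channel count, defined elsewhere
input_shape = (h, w, way)
model.add(Conv2D(64, (3, 3), input_shape=input_shape))
model.add(Activation("relu"))
model.add(Dropout(0.3))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation("relu"))
model.add(Dropout(0.3))
# One sigmoid output per subject: each label is predicted independently
model.add(Dense(numbers))
model.add(Activation("sigmoid"))
lrate = 0.01
decay = lrate / 100
# Newer Keras releases name the first argument learning_rate instead of lr
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
model.summary()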
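The sigmoid output layer treats every subject as an independent yes/no decision, which is why it is paired with binary cross-entropy rather than softmax. Below is a minimal plain-Python sketch of how such outputs could be turned back into subject names. The three-label list, the logits, and the 0.5 threshold are illustrative assumptions, not values from the trained model.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def decode(probs, labels_list, threshold=0.5):
    """Return the labels whose probability reaches the threshold."""
    return [label for label, p in zip(labels_list, probs) if p >= threshold]


labels_list = ["bird", "dog", "horse"]  # illustrative subset
logits = [2.0, -1.0, 0.5]               # made-up pre-activation scores
probs = [sigmoid(z) for z in logits]

print([round(p, 3) for p in probs])
print(decode(probs, labels_list))  # -> ['bird', 'horse']
```

Unlike softmax outputs, these probabilities do not have to sum to 1, so several subjects can be predicted for the same painting.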
Fit the model:
import matplotlib.pyplot as plt
from keras.callbacks import ModelCheckpoint

# Fit the model, keeping only the checkpoint with the lowest validation loss
checkpoint = ModelCheckpoint(
    filepath=saveDir + "best.h5",
    monitor="val_loss",
    verbose=1,
    mode="auto",
    save_best_only=True,
    period=1,
)
history = model.fit(
    X_train,
    y_train,
    validation_data=(X_test, y_test),
    callbacks=[checkpoint],
    epochs=nepochs,
    batch_size=32,
)
print(history.history.keys())
# Visualization
# The history key is "acc" in older Keras releases and "accuracy" in newer ones
acc_key = "acc" if "acc" in history.history else "accuracy"
plt.clf()
plt.plot(history.history[acc_key])
plt.plot(history.history["val_" + acc_key])
plt.title("model accuracy")
plt.ylabel("accuracy")
plt.xlabel("epochs")
plt.legend(["train", "test"], loc="upper left")
plt.savefig(saveDir + "train_validation_acc.png")
plt.clf()
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epochs")
plt.legend(["train", "test"], loc="upper left")
plt.savefig(saveDir + "train_validation_loss.png")
# scores[0] is the loss, scores[1] is the accuracy metric
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
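One caveat when reading the accuracy figure: with a sigmoid output and binary cross-entropy, Keras typically resolves "accuracy" to a per-label binary accuracy, which can look high even when many paintings are not fully correct. The sketch below, in plain Python with made-up label vectors and hypothetical function names, contrasts that number with exact-match accuracy, where a sample counts only if every label is right.

```python
def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label slots predicted correctly."""
    total = sum(len(row) for row in y_true)
    correct = sum(
        t == p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return correct / total


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is correct."""
    matches = sum(row_t == row_p for row_t, row_p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up ground truth and thresholded predictions for four samples
y_true = [[0, 0, 1], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_pred = [[0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 0, 1]]

print("per-label:", round(per_label_accuracy(y_true, y_pred), 3))
print("exact match:", exact_match_accuracy(y_true, y_pred))
```

Here ten of the twelve label slots are correct, but only two of the four samples are fully correct, so the two metrics tell quite different stories.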
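The accuracy and loss curves are shown below:
As before, I also built a simple GUI to visualize the predictions; here is a quick look at the result: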