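The Paintings Dataset is a painting-themed dataset open-sourced by the Visual Geometry Group. Each painting has its own subject matter, which can be a single subject or a mix of several, so using this dataset for image recognition is a fairly classic multi-task learning problem, or more precisely a multi-label one. I covered this in my earlier post on cloud-type recognition, so I won't go over it again here.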

First, here is the official description of the dataset:

[Screenshot: official dataset description]

The detailed statistics of the data are as follows:

[Screenshot: dataset statistics]

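Unlike datasets such as VOC, this one is not distributed as image files. It is a list of links, and you have to download the images yourself. Three yearly versions of the dataset are officially available:

[Screenshot: the three yearly versions of the dataset]

I use the 2021 version, which is the latest. A sample of the downloaded file looks like this: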

Image URL,Web page URL,Subset,Labels
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NID/QUB/NID_QUB_QUB_264-001.jpg,https://artuk.org/discover/artworks/and-the-cow-jumped-over-the-moon-168957,'test',' cow'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/GMIII/MOSI/GMIII_MOSI_A1978_72_3-001.jpg,https://artuk.org/discover/artworks/0-6-00-6-0-garratt-locomotive-203965,'train',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1979_7964-001.jpg,https://artuk.org/discover/artworks/044t-locomotive-no-1431-passing-mosley-siding-signal-box-9593,'train',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/CHE/CRHC/CHE_CRHC_PCF40-001.jpg,https://artuk.org/discover/artworks/080-locomotive-on-freight-duty-103049,'test',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NOT/NTMAG/NOT_NTMAG_1997_31-001.jpg,https://artuk.org/discover/artworks/17th-and-21st-lancers-46478,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/STF/STRM/STF_STRM_832-001.jpg,https://artuk.org/discover/artworks/1st-south-staffords-on-the-march-in-burma-1944-19642,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1986_9418-001.jpg,https://artuk.org/discover/artworks/222-locomotive-built-by-george-forrester-8695,'test',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_2004_7349-001.jpg,https://artuk.org/discover/artworks/222-locomotive-jenny-lind-9616,'test',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1986_9421-001.jpg,https://artuk.org/discover/artworks/222-locomotive-patentee-robert-stephensons-patent-locomotive-9409,'train',' train'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1996_7374-001.jpg,https://artuk.org/discover/artworks/264t-locomotive-alice-9530,'test',' train'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/LLR/RLRH/LLR_RLRH_L_H38_1988_3_0-001.jpg,https://artuk.org/discover/artworks/2nd-battalion-the-leicestershire-regiment-as-chindits-during-operations-against-the-japanese-at-indaw-lake-burma-1944-80060,'train',' aeroplane horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/IWM/IWM/IWM_IWM_LD_5509-001.jpg,https://artuk.org/discover/artworks/43-repair-group-air-frame-repair-service-lincoln-repairing-liberator-aircraft-7481,'test',' aeroplane'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/NRM/NY_NRM_1977_5834-001.jpg,https://artuk.org/discover/artworks/460-locomotive-no-1306-mayflower-next-to-unit-m77165-in-the-paint-shop-at-horwich-works-1975-9765,'train',' train'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/CW/MTE/CW_MTE_45-001.jpg,https://artuk.org/discover/artworks/6th-earl-and-countess-of-mount-edgcumbe-in-coronation-robes-14840,'validation',' chair'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/NY/YAM/NY_YAM_260367-001.jpg,https://artuk.org/discover/artworks/87-squadron-gladiators-tied-together-k7967-k8027-k7972-10402,'test',' aeroplane'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/YAG/NY_YAG_YORAG_326-001.jpg,https://artuk.org/discover/artworks/a-seventeenth-century-dutch-interior-with-a-seated-lady-8374,'test',' chair diningtable dog'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/LNE/RAFM/LNE_RAFM_FA03538-001.jpg,https://artuk.org/discover/artworks/a-20-havoc-light-bomber-136117,'test',' aeroplane'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/DUR/DBM/DUR_DBM_770-001.jpg,https://artuk.org/discover/artworks/a-basket-of-flowers-with-a-dog-chasing-a-bird-44680,'train',' bird dog'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/DUR/DBM/DUR_DBM_769-001.jpg,https://artuk.org/discover/artworks/a-basket-of-flowers-with-birds-44726,'train',' bird'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/YAG/NY_YAG_YORAG_66-001.jpg,https://artuk.org/discover/artworks/a-bather-7936,'test',' chair'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/LW/NARM/LW_NARM_131900-001.jpg,https://artuk.org/discover/artworks/a-battery-of-the-royal-horse-artillery-galloping-to-a-fresh-position-182844,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/SYO/CG/SYO_CG_CP_TR_156-001.jpg,https://artuk.org/discover/artworks/a-bavarian-lake-with-fishing-boats-68811,'test',' boat'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_262-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-4097,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/KT/MMB/KT_MMB_06_029-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-76966,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w800h800/collection/NG/NG/NG_NG_NG983-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-a-cow-a-goat-and-three-sheep-near-a-building-113901,'test',' cow horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/TRR/NTI_TRR_337062-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-and-a-pony-in-a-landscape-101086,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/CAL/NTIII_CAL_290260-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-called-fleacatcher-169526,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/PNC/NTI_PNC_1420370-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-chance-and-a-jockey-102326,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_258-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-3778,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_590-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-4061,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIV/UPP/NTIV_UPP_138312-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-landscape-with-his-groom-and-two-hounds-220307,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/CAL/NTIII_CAL_290484-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-169372,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIV/KHA/NTIV_KHA_445331-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-220614,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_184-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-stable-3599,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/SFK/NHM/SFK_NHM_1986_004-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-in-a-wooded-landscape-11244,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_455-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-near-a-building-3946,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/FEL/NTIII_FEL_1401221-001.jpg,https://artuk.org/discover/artworks/a-bay-horse-pony-bloodhound-and-dachshund-outside-felbrigg-hall-171257,'train',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTII/BKH/NTII_SKH_1196043-001.jpg,https://artuk.org/discover/artworks/a-bay-hunter-and-a-pug-dog-in-a-landscape-132043,'test',' dog'
https://d3d00swyhr67nd.cloudfront.net/w800h800/collection/NG/NG/NG_NG_NG818-001.jpg,https://artuk.org/discover/artworks/a-beach-scene-with-fishermen-113903,'test',' boat'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/STF/WAG/STF_WAG_OP536-001.jpg,https://artuk.org/discover/artworks/a-belgian-school-19183,'test',' chair diningtable'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/VA/PC/VA_PC_2007BP2389-001.jpg,https://artuk.org/discover/artworks/a-bird-32686,'train',' bird'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/LAN/TURT/LAN_TURT_PCF21-001.jpg,https://artuk.org/discover/artworks/a-bird-in-a-tree-152732,'validation',' bird'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/CDN/LBCN/CDN_LBCN_426-001.jpg,https://artuk.org/discover/artworks/a-bird-in-half-123669,'test',' bird'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTI/LDK/NTI_LDK_884912-001.jpg,https://artuk.org/discover/artworks/a-black-dog-100725,'test',' dog'
https://d3d00swyhr67nd.cloudfront.net/w944h944/collection/SYO/BHA/SYO_BHA_90009742-001.jpg,https://artuk.org/discover/artworks/a-black-dog-68634,'train',' dog'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/ESX/AM/ESX_AM_443-001.jpg,https://artuk.org/discover/artworks/a-black-horse-at-newmarket-3596,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTII/ATT/NTII_ATT_609052-001.jpg,https://artuk.org/discover/artworks/a-black-horse-called-bishop-with-his-groom-in-a-landscape-131049,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NTIII/ERH/NTIII_ERH_201463-001.jpg,https://artuk.org/discover/artworks/a-black-horse-in-a-courtyard-167313,'test',' horse'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/TATE/TATE/TATE_TATE_T00888_10-001.jpg,https://artuk.org/discover/artworks/a-black-horse-with-two-dogs-201729,'test',' dog horse'

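As you can see:

The file has four columns. The first is the image URL, the second is the URL of the web page (not much use here), the third is the subset the image belongs to (train/test/validation), and the fourth is the set of subjects that appear in the image.

Now that the meaning of the raw data is clear, the first step is to download the image behind each link. The core implementation is as follows: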
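
As a quick illustration of the four-column layout, the sketch below parses two rows copied from the sample with Python's `csv` module. The helper name `parse_row` is my own, hypothetical choice, and the code assumes that the Subset and Labels fields are always wrapped in single quotes, as they are in the sample.

```python
import csv
import io

# Header line plus two rows copied from the sample above.
SAMPLE = """Image URL,Web page URL,Subset,Labels
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NID/QUB/NID_QUB_QUB_264-001.jpg,https://artuk.org/discover/artworks/and-the-cow-jumped-over-the-moon-168957,'test',' cow'
https://d3d00swyhr67nd.cloudfront.net/w1200h1200/collection/NY/YAG/NY_YAG_YORAG_326-001.jpg,https://artuk.org/discover/artworks/a-seventeenth-century-dutch-interior-with-a-seated-lady-8374,'test',' chair diningtable dog'
"""


def parse_row(row):
    """Return (image_url, subset, labels) for one CSV row (hypothetical helper)."""
    img_url, _web_page_url, subset, labels = row
    # The single quotes are part of the field text, so strip them by hand.
    subset = subset.replace("'", "").strip()
    labels = labels.replace("'", "").split()
    return img_url.strip(), subset, labels


reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header line
parsed = [parse_row(row) for row in reader]
for img_url, subset, labels in parsed:
    print(subset, labels)
```

The second row shows why this is a multi-label problem: a single image carries the three labels chair, diningtable and dog.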

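import os

import pandas as pd


def loadUrls2Img(data="painting_dataset_2021.csv", resDir="data/"):
    """
    Read the image links from the CSV, download each image and store it locally.
    """
    # URLs downloaded in earlier runs; the record file does not exist on the first run
    img_urls = set()
    if os.path.exists("img_url.txt"):
        with open("img_url.txt") as f:
            img_urls = {one.strip() for one in f if one.strip()}
    print("img_urls_length: ", len(img_urls))
    df = pd.read_csv(data)
    print(df.head(10))
    data_list = df.values.tolist()
    print("data_list_length: ", len(data_list))
    left_length = len(data_list) - len(img_urls)
    print("left_length: ", left_length)
    # read_csv has already consumed the header line, so iterate over every row
    for one_list in data_list:
        try:
            img_url, web_page_url, Subset, Labels = one_list
            img_url = img_url.strip()
            # Subset and Labels are wrapped in single quotes in the CSV
            Subset = Subset.replace("'", "").strip()
            Labels = Labels.replace("'", "").strip()
            # one directory per subset and label combination, e.g. data/train/bird dog/
            oneDir = os.path.join(resDir, Subset, Labels)
            os.makedirs(oneDir, exist_ok=True)
            if img_url and img_url not in img_urls:
                print("img_url: ", img_url)
                one_path = os.path.join(oneDir, str(len(os.listdir(oneDir)) + 1) + ".jpg")
                downloadSingleImg(img_url, save_path=one_path)
                # record the URL only after the download call returns, so failures are retried
                with open("img_url.txt", "a") as f:
                    f.write(img_url + "\n")
                img_urls.add(img_url)
        except Exception as e:
            print("Exception: ", e)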

I wrapped the download step in its own function so that it is easy to swap in a different method:

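import requests


def downloadSingleImg(img_url, save_path="data/a.jpg"):
    """
    Download a single image and write it to save_path.
    """
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
    }
    # the 30-second timeout is an arbitrary choice that keeps a stalled request from hanging the loop
    r = requests.get(img_url, headers=headers, timeout=30)
    # raise on HTTP errors so that an error page is never saved as a .jpg
    r.raise_for_status()
    with open(save_path, "wb") as f:
        f.write(r.content)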

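During the download, the URL of every image fetched is written to a local file as it goes, so that an interrupted run can pick up where it left off. The record file looks like this:

[Screenshot: the record file of downloaded image URLs]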

The download takes quite a while because I did nothing to speed it up, so just wait it out. Once it finishes, the result looks like this:

[Screenshot: download finished]
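
If the wait is too long, one common way to speed things up is to run the downloads on a thread pool. The sketch below is not what I used above; it is a minimal illustration using the standard library's `concurrent.futures`. The names `download_all` and `fake_fetch` and the example.com URLs are hypothetical, and `fake_fetch` stands in for `downloadSingleImg` so that the snippet runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(tasks, fetch, max_workers=8):
    """Run fetch(url, path) for every (url, path) pair on a thread pool.

    Returns the URLs whose fetch call finished without raising.
    """
    done = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, path): url for url, path in tasks}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                done.append(url)
            except Exception as e:
                print("Exception: ", url, e)
    return done


def fake_fetch(url, path):
    """Stand-in for downloadSingleImg; fails for URLs containing 'bad'."""
    if "bad" in url:
        raise ValueError("simulated failure")


tasks = [
    ("http://example.com/a.jpg", "a.jpg"),
    ("http://example.com/bad.jpg", "b.jpg"),
]
finished = download_all(tasks, fake_fetch)
print(finished)
```

In a real run, `fetch` would be `downloadSingleImg`, and the target file names would have to be assigned before the tasks are submitted, because naming files by the current number of entries in a directory is not safe when several threads write to that directory at once.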

Because the subsets did not end up with the same set of label-combination classes, I merged them. The result is as follows:

[Screenshot: directory layout after merging]

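The name of each subdirectory is the list of subjects shared by all images inside it, which makes the rest easy to handle.

First, determine how many distinct single subjects there are: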

import json
import os

# each directory name under data/train/ is a space-separated label combination
classes_list = os.listdir("data/train/")
print("classes_list: ", classes_list)
print("classes_list_length: ", len(classes_list))
labels_list = []
for one_class in classes_list:
    labels_list += one_class.split()
# deduplicate and sort so that every label gets a stable index
labels_list = sorted(set(labels_list))
print("labels_list: ", labels_list)
print("labels_list_length: ", len(labels_list))
with open("labels_list.json", "w") as f:
    f.write(json.dumps(labels_list))
# number of distinct labels, which is also the size of the model's output layer
numbers = len(labels_list)
print("numbers: ", numbers)

Next, generate the label vectors:

def generateLabel(one):
    """
    Build the multi-hot label vector for one directory name such as "bird dog".
    Relies on the globals labels_list and numbers computed above.
    """
    one_label_list = one.split()
    print("one_label_list: ", one_label_list)
    # 1 where the label is present in this combination, 0 otherwise
    one_y_list = [1 if label in one_label_list else 0 for label in labels_list]
    print("one_y_list: ", one_y_list)
    print("one_y_list_length: ", len(one_y_list))
    assert len(one_y_list) == numbers
    return one_y_list
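
To make the encoding concrete, here is a small standalone sketch of the same idea. The vocabulary is an illustrative one built only from labels that appear in the sample rows earlier in this post; the real one comes from the directory names. `multi_hot` is a hypothetical name for a function equivalent to `generateLabel`, minus the globals and the prints.

```python
# Illustrative vocabulary taken from the sample rows; assumed, not the full label set.
vocabulary = sorted(
    ["aeroplane", "bird", "boat", "chair", "cow", "diningtable", "dog", "horse", "train"]
)


def multi_hot(dir_name, vocabulary):
    """Return a 0/1 vector with a 1 for each label named in dir_name."""
    present = set(dir_name.split())
    return [1 if label in present else 0 for label in vocabulary]


vec = multi_hot("chair diningtable dog", vocabulary)
print(vec)  # [0, 0, 0, 1, 0, 1, 1, 0, 0]
```

Each position in the vector corresponds to one label, so a painting with several subjects simply has several ones.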

Build the model. Feel free to change it to suit your own needs:

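from keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.optimizers import SGD

model = Sequential()
# h, w and way are the image height, width and channel count, defined elsewhere
input_shape = (h, w, way)
model.add(Conv2D(64, (3, 3), input_shape=input_shape))
model.add(Activation("relu"))
model.add(Dropout(0.3))
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(1024))
model.add(Activation("relu"))
model.add(Dropout(0.3))
# one sigmoid unit per label, so each label is predicted independently
model.add(Dense(numbers))
model.add(Activation("sigmoid"))
lrate = 0.01
decay = lrate / 100
# written for an older Keras; newer releases spell lr as learning_rate and may reject decay
sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
model.compile(loss="binary_crossentropy", optimizer=sgd, metrics=["accuracy"])
model.summary()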

Fit the model:

import matplotlib.pyplot as plt
from keras.callbacks import ModelCheckpoint

# Fit the model, keeping only the weights with the best validation loss
checkpoint = ModelCheckpoint(
    filepath=saveDir + "best.h5",
    monitor="val_loss",
    verbose=1,
    mode="auto",
    save_best_only=True,
)
history = model.fit(
    X_train,
    y_train,
    validation_data=(X_test, y_test),
    callbacks=[checkpoint],
    epochs=nepochs,
    batch_size=32,
)
print(history.history.keys())
# Visualization; the accuracy key is "acc" in older Keras and "accuracy" in newer ones
acc_key = "acc" if "acc" in history.history else "accuracy"
plt.clf()
plt.plot(history.history[acc_key])
plt.plot(history.history["val_" + acc_key])
plt.title("model accuracy")
plt.ylabel("accuracy")
plt.xlabel("epochs")
plt.legend(["train", "test"], loc="upper left")
plt.savefig(saveDir + "train_validation_acc.png")
plt.clf()
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epochs")
plt.legend(["train", "test"], loc="upper left")
plt.savefig(saveDir + "train_validation_loss.png")
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
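
As far as I understand, the accuracy Keras reports for sigmoid outputs trained with binary cross-entropy is computed label by label, so it can look high even when few images have all of their labels right. The sketch below uses made-up probabilities in place of `model.predict(X_test)` to show how the outputs can be thresholded back into label names, and how per-label accuracy differs from exact-match accuracy. The three-label vocabulary, the numbers and the 0.5 threshold are all assumptions for illustration.

```python
# Hypothetical three-label vocabulary and made-up sigmoid outputs for three images.
labels_list = ["bird", "dog", "horse"]
y_prob = [[0.9, 0.8, 0.1], [0.2, 0.3, 0.7], [0.6, 0.1, 0.4]]
y_true = [[1, 1, 0], [0, 0, 1], [1, 0, 1]]


def decode(probs, vocabulary, threshold=0.5):
    """Return the names of the labels whose probability reaches the threshold."""
    return [label for p, label in zip(probs, vocabulary) if p >= threshold]


# Threshold every probability to get 0/1 predictions.
y_pred = [[1 if p >= 0.5 else 0 for p in row] for row in y_prob]
decoded = [decode(row, labels_list) for row in y_prob]

# Per-label accuracy: fraction of individual 0/1 entries that match.
total = sum(len(row) for row in y_true)
per_label_acc = sum(
    p == t for pr, tr in zip(y_pred, y_true) for p, t in zip(pr, tr)
) / total

# Exact-match accuracy: fraction of images whose whole vector matches.
exact_match = sum(pr == tr for pr, tr in zip(y_pred, y_true)) / len(y_true)

print(decoded)
print(per_label_acc, exact_match)
```

Here the third image misses its horse label, so eight of the nine individual entries are correct but only two of the three images are fully correct.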

The accuracy and loss curves are shown below:

[Figure: training and validation accuracy and loss curves]

As before, I also built a simple interface to visualize the predictions. Here is a quick look at the results:

[Screenshots: prediction results shown in the visualization interface]