Extracting Specific Categories from the COCO Dataset with Python for AI Model Training


When preparing training data for an AI detection model, it is often faster to pull what you need from a public dataset than to collect and label everything yourself. This article shows how to extract data for specific categories from the COCO dataset for training YOLO-family models.

Data Preparation

1. Download the COCO 2017 dataset

You can download the COCO 2017 training and validation image sets from the links below; a small download sketch follows the list.

  • Training set (train2017): http://images.cocodataset.org/zips/train2017.zip
  • Validation set (val2017): http://images.cocodataset.org/zips/val2017.zip
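
If you prefer to script the download, here is a minimal sketch that fetches and unpacks both archives. The target directory coco/images/ is an assumption based on the layout the extraction script expects later; adjust it to your setup.

<code># download_coco_images.py -- minimal sketch for fetching the COCO 2017 image zips
import urllib.request
import zipfile
import os

urls = [
    "http://images.cocodataset.org/zips/train2017.zip",
    "http://images.cocodataset.org/zips/val2017.zip",
]
dest = "coco/images"  # assumed layout: coco/images/train2017, coco/images/val2017
os.makedirs(dest, exist_ok=True)

for url in urls:
    zip_name = url.split("/")[-1]
    if not os.path.exists(zip_name):
        # train2017.zip is roughly 19 GB, val2017.zip about 1 GB -- this takes a while
        urllib.request.urlretrieve(url, zip_name)
    with zipfile.ZipFile(zip_name) as zf:
        zf.extractall(dest)  # creates dest/train2017/ and dest/val2017/</code>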

2. Download the matching YOLO-format label files

Since we will be training YOLO-family models, we also need the labels in YOLO format, which can be downloaded here:

https://github.com/ultralytics/yolov5/releases/download/v1.0/coco2017labels.zip
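
After unzipping, arrange everything so the source root has the images/, labels/, and annotations/ subfolders the extraction script expects (the instances_*.json files come from the official COCO annotations if the label zip does not already include them). A quick sanity check, with the root path as an assumption:

<code># check_layout.py -- sanity check of the expected source layout (root path assumed)
import os

root = "coco"
expected = [
    "images/train2017", "images/val2017",
    "labels/train2017", "labels/val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
]
for p in expected:
    full = os.path.join(root, p)
    print(("OK  " if os.path.exists(full) else "MISS") + " " + full)</code>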

Python Code

Once the data is in place, we can start the extraction. First, create a YAML file that defines the categories to extract.

<code># Classes to extract (classes.yaml)
path: /home/dataset/coco
train: train2017.txt
val: val2017.txt
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus</code>
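
One detail worth noting before the script: the ids in this YAML are 0-based YOLO class ids, while instances_*.json uses 1-based COCO category ids, so the script simply adds 1. COCO category ids are not fully contiguous further up the list, so this shortcut works here mainly because only low-numbered classes are selected. A rough check of the mapping (assumes the YAML above is saved as classes.yaml):

<code># peek_classes.py -- show how the YOLO class ids map to COCO category ids (+1 rule)
import yaml

names = yaml.safe_load(open("classes.yaml").read())["names"]
coco_ids = [k + 1 for k in names.keys()]  # COCO category ids are 1-based
print(names)     # {0: 'person', 1: 'bicycle', ...}
print(coco_ids)  # [1, 2, 3, 4, 5, 6]</code>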

Next, we write the extraction script:

<code># create_sub_coco_dataset.py
import json
import yaml
import sys
import os
import shutil
import tqdm


# MakeDirs was omitted in the original listing; a minimal version:
def MakeDirs(path):
    if not os.path.exists(path):
        os.makedirs(path)


def create_sub_coco_dataset(data_yaml, src_root_dir, dst_root_dir, folder):
    MakeDirs(dst_root_dir + "/annotations/")
    MakeDirs(dst_root_dir + "/images/" + folder)
    MakeDirs(dst_root_dir + "/labels/" + folder)

    print(yaml.safe_load(open(data_yaml).read())['names'])
    # YOLO class ids are 0-based; COCO category ids are 1-based, hence the +1
    keep_names = [x + 1 for x in yaml.safe_load(open(data_yaml).read())['names'].keys()]

    # Load the full COCO annotation file and keep only the requested categories
    all_annotations = json.loads(open(src_root_dir + "/annotations/instances_{}.json".format(folder)).read())
    keep_categories = [x for x in all_annotations["categories"] if x["id"] in keep_names]
    keep_annotations = [x for x in all_annotations['annotations'] if x['category_id'] in keep_names]
    all_annotations['annotations'] = keep_annotations
    all_annotations["categories"] = keep_categories
    if not os.path.exists(dst_root_dir + "/annotations/instances_{}.json".format(folder)):
        with open(dst_root_dir + "/annotations/instances_{}.json".format(folder), "w") as f:
            json.dump(all_annotations, f)

    filelist = set()
    for i in tqdm.tqdm(keep_annotations):
        img_src_path = "/images/{}/{:012d}.jpg".format(folder, i["image_id"])
        label_src_path = "/labels/{}/{:012d}.txt".format(folder, i["image_id"])
        # Copy each image once, even if it has several matching annotations
        if not os.path.exists(dst_root_dir + img_src_path):
            shutil.copy(src_root_dir + img_src_path, dst_root_dir + img_src_path)
        # Keep only the label lines whose class id belongs to the requested categories
        if not os.path.exists(dst_root_dir + label_src_path):
            keep_records = [x for x in open(src_root_dir + label_src_path, "r").readlines()
                            if (int(x.strip().split(" ")[0]) + 1) in keep_names]
            with open(dst_root_dir + label_src_path, "w") as f:
                for r in keep_records:
                    f.write(r)
        filelist.add("./images/{}/{:012d}.jpg\n".format(folder, i["image_id"]))

    # Write the image list (train2017.txt / val2017.txt) referenced by the data YAML
    with open(dst_root_dir + "/{}.txt".format(folder), "w") as f:
        for r in filelist:
            f.write(r)

    # Write a new data YAML whose path points at the extracted dataset
    new_data_yaml = yaml.safe_load(open(data_yaml).read())
    new_data_yaml["path"] = dst_root_dir
    with open(dst_root_dir + "/coco.yaml", 'w') as f:
        f.write(yaml.dump(new_data_yaml, allow_unicode=True))


create_sub_coco_dataset(sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4])</code>

Example usage:

<code>python create_sub_coco_dataset.py car.yaml ../../Datasets/coco_extract/coco2017/coco car train2017
python create_sub_coco_dataset.py person.yaml ../../Datasets/coco_extract/coco2017/coco person val2017</code>

This produces the extracted dataset directory, which is now ready for training YOLO models.
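
As an illustration, if you train with YOLOv5 you can point its training script at the generated coco.yaml in the destination directory. The path and flags below are illustrative; adjust them to your setup.

<code>python train.py --data car/coco.yaml --weights yolov5s.pt --img 640</code>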


With these steps, you can quickly extract data for specific categories from the COCO dataset and use it to train an AI detection model.
