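JSON is a common data-interchange format, widely used in machine learning and data analysis. In ICDAR annotation tasks (ICDAR is the International Conference on Document Analysis and Recognition, which runs text detection and recognition competitions), JSON is often used to store the labels for training data. The following typical example shows the structure of an image annotation: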

    {
      "data_root": "datasets/data/train",
      "data_list": [
        {
          "img_name": "X00016469670.jpg",
          "annotations": [
            {
              "polygon": [
                [98.0, 26.0],
                [321.0, 26.0],
                [321.0, 66.0],
                [98.0, 66.0]
              ],
              "text": "TAN CHAY YEE",
              "illegibility": false,
              "language": "Latin",
              "chars": [
                {
                  "polygon": [],
                  "char": "",
                  "illegibility": false,
                  "language": "Latin"
                }
              ]
            },
            # ... other annotations
          ]
        }
      ]
    }

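
To make the nesting of the structure above concrete, here is a minimal sketch that walks `data_list` and `annotations` and flattens each polygon. The helper name `iter_annotations` and the inline `SAMPLE` string are illustrative assumptions, not part of any ICDAR tool; the sample also drops the `# ...` elision line, which is not valid JSON.

```python
import json

# Inline copy of the example above, trimmed to a single annotation.
SAMPLE = """
{
  "data_root": "datasets/data/train",
  "data_list": [
    {
      "img_name": "X00016469670.jpg",
      "annotations": [
        {
          "polygon": [[98.0, 26.0], [321.0, 26.0], [321.0, 66.0], [98.0, 66.0]],
          "text": "TAN CHAY YEE",
          "illegibility": false,
          "language": "Latin",
          "chars": []
        }
      ]
    }
  ]
}
"""


def iter_annotations(data):
    """Yield (image name, flat coordinate list, text) for every annotation."""
    for item in data["data_list"]:
        for ann in item["annotations"]:
            # Flatten [[x1, y1], [x2, y2], ...] into [x1, y1, x2, y2, ...].
            coords = [value for point in ann["polygon"] for value in point]
            yield item["img_name"], coords, ann["text"]


records = list(iter_annotations(json.loads(SAMPLE)))
for img_name, coords, text in records:
    print(img_name, coords, text)
```

Each record pairs an image name with one text region, which is usually the unit a detection or recognition pipeline consumes.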
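Below is an example Python script that converts JSON annotation files into ICDAR-style text label files. Note that the script reads a `shapes` list whose entries carry `points` (the layout written by labeling tools such as LabelMe), not the `data_list` structure shown above: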

    # -*- coding: utf-8 -*-
    import glob
    import json
    import os
    import shutil

    INPUT_PATH = "E:/card_data/card_autolabel/20200116"
    OUTPUT_PATH = "E:/datasets/icdar"
    IMAGE_EXTENSIONS = ['jpg', 'JPG', 'jpeg', 'JPEG']
    LABEL_EXTENSION = 'json'


    def jsonTotxt(jsonfile):
        # Name the output after the JSON file: foo.json -> foo.txt
        filename = os.path.splitext(os.path.basename(jsonfile))[0]
        savefile = os.path.join(OUTPUT_PATH, filename + '.txt')

        # Read the JSON file
        with open(jsonfile, 'r', encoding='utf-8') as f:
            data = json.load(f)

        # Write one line per shape: the point coordinates, then '###'
        with open(savefile, "a", encoding='utf-8') as f:
            for coordict in data['shapes']:
                for point in coordict['points']:
                    f.write(str(point).replace('[', '').replace(']', '') + ',')
                f.write('###\n')


    # Batch processing: start from an empty output directory
    shutil.rmtree(OUTPUT_PATH, ignore_errors=True)
    os.makedirs(OUTPUT_PATH)

    sub_dirs = [x[0] for x in os.walk(INPUT_PATH)]
    # Skip the first entry, which is INPUT_PATH itself
    for sub_dir in sub_dirs[1:]:
        images_list = []
        for extension in IMAGE_EXTENSIONS:
            # Search the current sub-directory for images
            images_list.extend(glob.glob(os.path.join(sub_dir, '*.' + extension)))
        if not images_list:
            continue
        # On case-insensitive filesystems the patterns overlap, so deduplicate
        for image in set(images_list):
            shutil.copy(image, OUTPUT_PATH)
        for json_file in glob.glob(os.path.join(sub_dir, '*.' + LABEL_EXTENSION)):
            jsonTotxt(json_file)

### What the script does
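The script is meant for batch conversion of annotations over large image sets. It copies the JPEG images into `E:/datasets/icdar` and writes one `.txt` file per JSON annotation file; each line of that file holds the comma-separated point coordinates of one shape, followed by `###` in the transcription position.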
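
The per-line output format can be sketched in isolation as a small pure function. The name `shape_to_line`, the rounding to integers, and the compact separator without spaces are assumptions made for illustration; the script above writes the coordinate values as they appear in the JSON.

```python
def shape_to_line(points, transcription="###"):
    """Format one polygon as 'x1,y1,x2,y2,...,transcription'.

    Rounding to integers is an assumption of this sketch, not something
    the script above does.
    """
    coords = [str(int(round(value))) for point in points for value in point]
    return ",".join(coords + [transcription])


line = shape_to_line([[98.0, 26.0], [321.0, 26.0], [321.0, 66.0], [98.0, 66.0]])
print(line)  # 98,26,321,26,321,66,98,66,###
```

Keeping the formatting in a function like this makes it possible to check the label layout without touching the filesystem.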
Reposted from: http://ptqiz.baihongyu.com/