JSON Data Format - 白红宇

Published: 2019-03-09

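JSON is a common data interchange format, widely used in machine learning, data analysis, and related fields. In text detection and recognition tasks such as those of ICDAR (the International Conference on Document Analysis and Recognition), JSON is often used to store the annotations of training data. Below is a typical example showing the structure of an image annotation: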

{
  "data_root": "datasets/data/train",
  "data_list": [
    {
      "img_name": "X00016469670.jpg",
      "annotations": [
        {
          "polygon": [
            [98.0, 26.0],
            [321.0, 26.0],
            [321.0, 66.0],
            [98.0, 66.0]
          ],
          "text": "TAN CHAY YEE",
          "illegibility": false,
          "language": "Latin",
          "chars": [
            {
              "polygon": [],
              "char": "",
              "illegibility": false,
              "language": "Latin"
            }
          ]
        },
        # ... other annotations
      ]
    }
  ]
}

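JSON Structure Explained

data_root: the base path of the dataset.

data_list: information about every image to be processed. Each image entry has the following fields:

img_name: the image file name.

annotations: the annotations for the image. Each annotation contains:
- polygon: the vertex coordinates of the polygon that bounds the annotated region of the image.
- text: the text content of the annotated region.
- illegibility: whether the annotated text is illegible.
- language: the script of the annotated text (for example, Latin for the Latin alphabet).
- chars: per-character annotations, each with its own polygon, character, illegibility flag, and language.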
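To make the field descriptions above concrete, here is a minimal sketch that parses a document with this structure and walks its annotations. The sample string reuses the values from the example (with chars left empty for brevity), and the helper name iter_annotations is a hypothetical one chosen for illustration, not part of any dataset tooling.

```python
import json

# A small sample in the structure described above (chars left empty for brevity).
SAMPLE = """
{
  "data_root": "datasets/data/train",
  "data_list": [
    {
      "img_name": "X00016469670.jpg",
      "annotations": [
        {
          "polygon": [[98.0, 26.0], [321.0, 26.0], [321.0, 66.0], [98.0, 66.0]],
          "text": "TAN CHAY YEE",
          "illegibility": false,
          "language": "Latin",
          "chars": []
        }
      ]
    }
  ]
}
"""


def iter_annotations(dataset):
    """Yield (image_path, annotation) pairs; hypothetical helper for illustration."""
    root = dataset["data_root"]
    for item in dataset["data_list"]:
        # data_root uses forward slashes, so join with "/"
        image_path = root + "/" + item["img_name"]
        for annotation in item["annotations"]:
            yield image_path, annotation


dataset = json.loads(SAMPLE)
for image_path, ann in iter_annotations(dataset):
    xs = [x for x, _ in ann["polygon"]]
    ys = [y for _, y in ann["polygon"]]
    # Axis-aligned bounding box of the polygon: (x_min, y_min, x_max, y_max)
    bbox = (min(xs), min(ys), max(xs), max(ys))
    print(image_path, ann["text"], ann["language"], bbox)
```

Computing the bounding box is only one possible use of the polygon; the same loop could feed a cropping or visualization step instead.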

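JSON to ICDAR Annotation Conversion Script

Below is an example Python script that converts JSON annotation files into ICDAR-style text annotation files. It reads the shapes and points keys (the layout written by annotation tools such as LabelMe), not the data_list structure shown above: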

# -*- coding: utf-8 -*-
import glob
import json
import os
import shutil

INPUT_PATH = "E:/card_data/card_autolabel/20200116"
OUTPUT_PATH = "E:/datasets/icdar"


def jsonTotxt(jsonfile):
    """Convert one JSON annotation file into a .txt file with the same base name."""
    filename = os.path.splitext(os.path.basename(jsonfile))[0]
    savefile = os.path.join(OUTPUT_PATH, filename + '.txt')

    # Read the JSON file
    with open(jsonfile, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Write one line per shape: x1,y1,x2,y2,...,###
    with open(savefile, 'w', encoding='utf-8') as f:
        for coordict in data['shapes']:
            coords = [str(v) for point in coordict['points'] for v in point]
            f.write(','.join(coords) + ',###\n')


# Batch processing: clear the output directory, then recreate it
# (copying into a directory that no longer exists would fail)
shutil.rmtree(OUTPUT_PATH, ignore_errors=True)
os.makedirs(OUTPUT_PATH, exist_ok=True)

extensions_images = ['jpg', 'JPG', 'jpeg', 'JPEG']
extensions_labels = 'json'

# Walk INPUT_PATH and every subdirectory beneath it
for sub_dir, _, _ in os.walk(INPUT_PATH):
    images_list = []
    for extension in extensions_images:
        images_glob = os.path.join(sub_dir, '*.' + extension)
        images_list.extend(glob.glob(images_glob))
    if not images_list:
        continue

    # Copy the images (de-duplicated, since glob ignores case on Windows)
    for image in sorted(set(images_list)):
        shutil.copy(image, OUTPUT_PATH)

    # Convert every JSON label file in the same directory
    json_glob = os.path.join(sub_dir, '*.' + extensions_labels)
    for json_file in glob.glob(json_glob):
        jsonTotxt(json_file)

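### What the Script Does

File cleanup: deletes the target directory together with all files and subdirectories already in it.

Image collection and copying: finds the image files under the input path by file extension (jpg, JPG, jpeg, JPEG) and copies them to the target directory.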
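The image collection step can also be sketched without listing each spelling of an extension. The following is an illustrative alternative, not the script's own approach: it walks the directory tree and compares lowercased extensions. The function name find_images and the demo file names are assumptions made for this example.

```python
import os
import tempfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg"}


def find_images(root, extensions=IMAGE_EXTENSIONS):
    """Return sorted paths of files under root whose extension matches, ignoring case."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "batch1"))
    for name in ("a.jpg", "b.JPEG", "notes.txt", os.path.join("batch1", "c.JPG")):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.relpath(p, tmp) for p in find_images(tmp)]
    print(found)
```

Lowercasing the extension avoids collecting the same file twice on case-insensitive file systems.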

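JSON to TXT: reads each JSON file and writes a text file with the same base name that records the image's annotations.

Annotation processing: parses the shapes in the JSON and writes them in a fixed format, one line per shape: the point coordinates separated by commas, followed by ###.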
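The line format described above can be sketched as a small pure function. The name shape_to_line is hypothetical, and the reading of ### is an assumption: in ICDAR-style ground truth the last field is normally the transcription, and ### is conventionally used for regions whose text should be ignored. This sketch emits comma-separated values with no embedded spaces.

```python
def shape_to_line(points, transcription="###"):
    """Flatten [[x, y], ...] into 'x1,y1,x2,y2,...,transcription'."""
    coords = [str(value) for point in points for value in point]
    return ",".join(coords + [transcription])


points = [[98.0, 26.0], [321.0, 26.0], [321.0, 66.0], [98.0, 66.0]]
print(shape_to_line(points))
# -> 98.0,26.0,321.0,26.0,321.0,66.0,98.0,66.0,###
```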

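The script is intended for converting the annotations of large image collections: it processes a whole directory of images and label files in a single run.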
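The script reads shapes and points, so it does not handle the data_list structure from the first section directly. A minimal sketch of how one image's annotations list could be turned into ICDAR-style lines follows. The function name, the rounding of coordinates to integers, and the choice to write ### for illegible regions are all assumptions made for illustration; the second sample annotation is invented to show the illegible case.

```python
def annotations_to_icdar_lines(annotations):
    """Turn the 'annotations' list of one image into ICDAR-style text lines."""
    lines = []
    for ann in annotations:
        # Assumption: coordinates are written as rounded integers.
        coords = [str(int(round(v))) for point in ann["polygon"] for v in point]
        # Assumption: illegible regions get the '###' placeholder instead of their text.
        text = "###" if ann["illegibility"] else ann["text"]
        lines.append(",".join(coords + [text]))
    return lines


sample = [
    {"polygon": [[98.0, 26.0], [321.0, 26.0], [321.0, 66.0], [98.0, 66.0]],
     "text": "TAN CHAY YEE", "illegibility": False},
    # Hypothetical second region, marked illegible
    {"polygon": [[10.0, 80.0], [50.0, 80.0], [50.0, 95.0], [10.0, 95.0]],
     "text": "", "illegibility": True},
]

lines = annotations_to_icdar_lines(sample)
for line in lines:
    print(line)
```

If the text itself can contain commas, a reader of these lines would need to split on the first eight commas only; that detail is left out of the sketch.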

Reposted from: http://ptqiz.baihongyu.com/
