Exporting a Dataset
You can export datasets in Data Labeling in several text and image formats, as well as snapshot JSONL files.
You can export datasets in Data Labeling to any Object Storage location in your tenancy. That lets you keep versions of the dataset or use it elsewhere, for example as input for Machine Learning model development. The export panel shows the location of the output file. After the export, the destination is available in the associated work request. The destination is also shown on the Dataset Details page, but only while the work request exists.
For documents, you can export to JSONL files.
For images, you can export to the following formats:
- JSONL
- YOLO V5
- COCO
- PASCAL VOC
For text, you can export to the following formats:
- JSONL
- JSONL Compact Plus Content
- spaCy
- CoNLL V2003
Note
If you export text in CoNLL format, recursive and overlapping entities are ignored.
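Because the CoNLL export drops recursive and overlapping entities, it can help to check beforehand which annotations would be affected. The following is a minimal sketch, not part of Data Labeling itself: it assumes entities carry a `textSpan` with `offset` and `length` as in the record examples on this page, and it treats two entities as overlapping when their character ranges intersect (which also covers one entity nested inside another). The function names and the sample entities are hypothetical.

```python
from itertools import combinations


def span_bounds(entity):
    """Return the half-open character range [start, end) of an entity's textSpan."""
    span = entity["textSpan"]
    return span["offset"], span["offset"] + span["length"]


def find_overlapping_entities(entities):
    """Return index pairs of entities whose text spans intersect (nesting included)."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(entities), 2):
        a_start, a_end = span_bounds(a)
        b_start, b_end = span_bounds(b)
        # Two half-open ranges intersect when each starts before the other ends.
        if a_start < b_end and b_start < a_end:
            pairs.append((i, j))
    return pairs


# Hypothetical entities: the second one is nested inside the first.
entities = [
    {"entityType": "TEXTSELECTION", "labels": [{"label_name": "Place"}],
     "textSpan": {"offset": 10, "length": 20}},
    {"entityType": "TEXTSELECTION", "labels": [{"label_name": "Organization"}],
     "textSpan": {"offset": 15, "length": 5}},
    {"entityType": "TEXTSELECTION", "labels": [{"label_name": "Person"}],
     "textSpan": {"offset": 40, "length": 8}},
]
print(find_overlapping_entities(entities))  # [(0, 1)]
```

Any pair reported here is a candidate for being ignored in a CoNLL export; the exact rule the service applies is not documented on this page.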
For CSV, the only option is to export to JSONL.
Examples of Exported Text, Image, and Document Datasets
Examples of the JSON files created when you export a dataset in Data Labeling.
Example of an exported consolidated JSON file.
{
"id": "ocid1.datalabelingdatasetdev.oc1.iad.amaaaaaazaehrjyag7jcbu3xnpw4dcn3tmniarzorpxbtegnipsw5oleeauq",
"compartmentId": "ocid1.compartment.oc1..aaaaaaaaihdqc5z4zq4sqt7t4c7vbwc6lbf5dr6mky2phcpvdlh7c3p5mtuq",
"displayName": "test-check",
"description": "test check",
"labelsSet": [{
"name": "location"
}, {
"name": "university"
}],
"annotationFormat": "ENTITY_EXTRACTION",
"datasetSourceDetails": {
"namespace": "idrcdhfxwqwa",
"bucket": "test-sachin-cucket"
},
"datasetFormatDetails": {
"formatType": "TEXT"
}
} {
"id": "ocid1.datalabelingrecord.oc1.iad.amaaaaaazaehrjyahykmu6hvdksayw64a3wmur7mk2366hgitlypk6u2soea",
"timeCreated": "2021-10-12 12:09:37",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "sample-text.txt"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyat64zcfbjviu3pttykthabv5jiuicva3dkv6oikstzd7q",
"timeCreated": "2021-10-12 12:16:51",
"createdBy": "ocid1.user.oc1..aaaaaaaaktqgvx2skco6bfyziwjzfjaxensoewscqbk7p44sjqyrxmz4qozq",
"entities": [{
"entityType": "TEXTSELECTION",
"labels": [{
"label_name": "university"
}],
"textSpan": {
"offset": 60,
"length": 11
}
}]
}]
}
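The consolidated example above places the dataset metadata object and the record objects back to back in one file, so a single `json.loads` call would fail on it. A minimal sketch of one way to read such a file with Python's standard library follows. It assumes, based only on the example, that the first object is the dataset metadata and every later object is a record; `iter_json_objects` and the shortened sample string are hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened, hypothetical stand-in for a consolidated export.
sample = (
    '{"displayName": "test-check", "annotationFormat": "ENTITY_EXTRACTION"} '
    '{"sourceDetails": {"path": "sample-text.txt"}, "annotations": []}'
)
dataset, *records = iter_json_objects(sample)
print(dataset["displayName"], len(records))  # test-check 1
```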
Example of an exported document dataset JSON file.
{
"id":"ocid1.datalabelingdatasetint.oc1.iad.amaaaaaaniob46iafkiyw6a4uwgrnpy4lfxjoslocap7elaj257mxh4fzuwq",
"compartmentId":"ocid1.compartment.oc1..aaaaaaaajqiw27knoagxurhzjlihw7ijnoshsu4zi2uawdn5gfexdqwvu4vq",
"displayName":"Sep6_PDF",
"labelsSet":[
{
"name":"L1"
},
{
"name":"L"
},
{
"name":"23423"
}
],
"annotationFormat":"MULTI_LABEL",
"datasetSourceDetails":{
"namespace":"idgszs0xipmn",
"bucket":"Demo-bucket"
},
"datasetFormatDetails":{"formatType":"DOCUMENT"},
"recordFiles":[
{
"namespace":"idgszs0xipmn",
"bucket":"COVID_Dataset",
"path":"Snapshotsrecords_1632479104889.jsonl"
}
]
}
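The `recordFiles` entries in the dataset file point to the record files by namespace, bucket, and path. As an illustration only, the sketch below turns each entry into a single location string. The `oci://bucket@namespace/path` form is an assumption here (it is a convention used by some OCI tooling, not something the export itself emits), and `record_file_uris` is a hypothetical helper.

```python
def record_file_uris(dataset):
    """Build one oci:// style URI per entry in the dataset's recordFiles list."""
    return [
        f"oci://{entry['bucket']}@{entry['namespace']}/{entry['path']}"
        for entry in dataset.get("recordFiles", [])
    ]


# Values copied from the document dataset example above.
dataset = {
    "displayName": "Sep6_PDF",
    "recordFiles": [
        {
            "namespace": "idgszs0xipmn",
            "bucket": "COVID_Dataset",
            "path": "Snapshotsrecords_1632479104889.jsonl",
        }
    ],
}
print(record_file_uris(dataset))
```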
Example of an exported image dataset JSON file.
{
"id": "ocid1...",
"compartmentId": "",
"timeCreated":2020-12-15...,
"displayName":...,
"description":...,
"labelsSet": [
{"name":"germanshepherd"},
{"name":"americanshepherd"},
{"name":"australianshepherd"},
{"name":"irishwolfhound"}
],
"annotationFormat": "IMAGE_OBJECT_SELECTION",
"datasetSourceDetails": {
"sourceType": "OBJECT_STORAGE",
"namespace": "i235o3idk",
"bucket": "mytrainingdata",
"prefix": "puppyproject/"
},
"datasetFormatDetails": {
"formatType": "IMAGE" # image requires less metadata than delimited for example
},
"recordFiles": [
{
"namespace": "i235o3idk",
"bucket": "mylabels",
"path": "puppyproject/records1.json"
}
],
"definedTags": {},
"freeformTags": {}
}
Example of an exported text dataset JSON file.
{
"id":"ocid1.datalabelingdatasetdev.oc1.iad.amaaaaaazaehrjyamqjx733dhxd25zxcro2nftrewq7ltj34ua2cfapzsmjq",
"compartmentId":"ocid1.compartment.oc1..aaaaaaaagzh2kii2frktoc7bcvfydpzkxr7dbn6nf6jcyrxwgzen4pi5y4zq",
"displayName":"NER DEMO DATASET UNLABELLED",
"description":"NER DEMO DATASET UNLABELLED",
"labelsSet":[
{
"name":"Person"
},
{
"name":"Organization"
},
{
"name":"Event"
},
{
"name":"Place"
}
],
"annotationFormat":"ENTITY_EXTRACTION",
"datasetSourceDetails":{
"namespace":"idrcdhfxwqwa",
"bucket":"news-articles"
},
"datasetFormatDetails":{
},
"recordFiles":[
{
"namespace":"idrcdhfxwqwa",
"bucket":"snapshots",
"path":"forReview/records_1621847577526.jsonl"
}
]
}
Example of an exported document record JSON file.
{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaqgpzhscdpdcgohg5ocp3obwmjjgju6m73bmyrt4aovhq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 98.pdf"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaaniob46iatjg3p6hlszxrgmsj4y76b5tndddaedm6ardkoxbtt6mq",
"timeCreated":"2021-09-06 03:42:43",
"createdBy":"ocid1.user.oc1..aaaaaaaa6ynps4htdea6fqoerfhkedp3lih2ktureqhw3hmfojde6ukf3mpa",
"entities":[
{
"entityType":"GENERIC","labels":[
{
"label_name":"23423"
}
]
}
]
}
]
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iasb5klulgaj4djn3acsgsd3cekx3ix46ftxjdip4tu23a",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 99.pdf"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaaniob46iav45mlpcleqjt7cnmhyogopszi2rfnilwjhd4xyxa7irq",
"timeCreated":"2021-09-06 03:42:47",
"createdBy":"ocid1.user.oc1..aaaaaaaa6ynps4htdea6fqoerfhkedp3lih2ktureqhw3hmfojde6ukf3mpa",
"entities":[
{
"entityType":"GENERIC","labels":[
{
"label_name":"L1"
}
]
}
]
}
]
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaxhixolkqryomyu6i4jrrmzwcckw2tmgva47suylu5rzq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 97.pdf"
}
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iagymrjuem42kvzilxjd5hdrr3djznrl7aajvvcr6zc6sq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 96.pdf"
}
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaclpccpxn5hgmplesv3mt3g6hxkfaepzv6fuy7b6he3ca",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 2.pdf"
}
}
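In the document record example above, some records carry an `annotations` list and others have none. A short sketch of how such records might be summarized once parsed is shown below. It assumes the field names seen in the example (`annotations`, `entities`, `labels`, `label_name`); `summarize_records` and the sample paths are hypothetical.

```python
from collections import Counter


def summarize_records(records):
    """Count label usage and the number of records that carry no annotations."""
    label_counts = Counter()
    unlabeled = 0
    for record in records:
        annotations = record.get("annotations", [])
        if not annotations:
            unlabeled += 1
        for annotation in annotations:
            for entity in annotation.get("entities", []):
                for label in entity.get("labels", []):
                    label_counts[label["label_name"]] += 1
    return label_counts, unlabeled


# Hypothetical, shortened records in the shape of the example above.
records = [
    {"sourceDetails": {"path": "a.pdf"},
     "annotations": [{"entities": [{"entityType": "GENERIC",
                                    "labels": [{"label_name": "L1"}]}]}]},
    {"sourceDetails": {"path": "b.pdf"},
     "annotations": [{"entities": [{"entityType": "GENERIC",
                                    "labels": [{"label_name": "23423"}]}]}]},
    {"sourceDetails": {"path": "c.pdf"}},
]
counts, unlabeled = summarize_records(records)
print(dict(counts), unlabeled)  # {'L1': 1, '23423': 1} 1
```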
Example of an exported image record JSON file.
{
"id": "ocid1...",
"timeCreated": 2020-12-15...,
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "filename2.jpg"
},
"annotations": [
{
"id": "ocid1....",
"timeCreated": ...,
"createdBy": ...,
"entities": [
{
"entityType": "IMAGEOBJECTSELECTION",
"labels": [
{"name": "germanshepherd"}
],
"boundingPolygon": {
"normalizedVertices": [
{"x":0.2, "y":0.2},
{"x":0.3, "y":0.2},
{"x":0.3, "y":0.3},
{"x":0.2, "y":0.3}
]
}
},
{
"entityType": "BOUNDING_BOX",
"labels": [
{"name": "irishwolfhound"}
],
"boundingPolygon": {
"normalizedVertices": [
{"x":0.4, "y":0.4},
{"x":0.5, "y":0.4},
{"x":0.5, "y":0.5},
{"x":0.4, "y":0.5}
]
}
}
]
}
],
"freeformTags": {
"set": "validation" # optional, user defined convention used for reproducibility
}
}
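The `normalizedVertices` values in the image record above lie between 0 and 1. The sketch below shows one way to turn such a polygon into a pixel bounding box. It assumes that `x` and `y` are fractions of the image width and height measured from the same corner, which this page does not state explicitly; `polygon_to_pixel_box` and the image size are hypothetical.

```python
def polygon_to_pixel_box(normalized_vertices, image_width, image_height):
    """Return (x_min, y_min, x_max, y_max) in pixels for a normalized polygon."""
    xs = [vertex["x"] * image_width for vertex in normalized_vertices]
    ys = [vertex["y"] * image_height for vertex in normalized_vertices]
    return round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys))


# Vertices copied from the first entity in the example above.
vertices = [
    {"x": 0.2, "y": 0.2},
    {"x": 0.3, "y": 0.2},
    {"x": 0.3, "y": 0.3},
    {"x": 0.2, "y": 0.3},
]
# Assumed image size of 1000 x 500 pixels.
print(polygon_to_pixel_box(vertices, 1000, 500))  # (200, 100, 300, 150)
```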
Example of an exported text record JSON file.
{
"id":"ocid1.record.oc1.iad.UxxfPBMZVYfwZHZnjCPUGkhMwpWoTPMOnxDnrgXbBxwLKkrdeGwewdViOoUJ",
"timeCreated":"2021-06-21 09:06:01",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"article_3.txt"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyadghacojq3nmo2mtcbcmlo4rgslmpzxeboujhduft5nta",
"timeCreated":"2021-46-21 09:46:45",
"createdBy":"ocid1.user.oc1..aaaaaaaazjupiis2cu54smlzemiujpqxriz6i4wp3euuqrzffdugib73epbq",
"entities":[
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Event"
}
],
"textSpan":{
"offset":141,
"length":12
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":204,
"length":20
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":254,
"length":15
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":402,
"length":3
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Place"
}
],
"textSpan":{
"offset":638,
"length":11
}
}
]
}
]
}{
"id":"ocid1.record.oc1.iad.AakCoDHvJpnZofzIYfRCfpZnFUqNmfiWNIuNysbXCSRZeTVqdwKGvYjJpMvh",
"timeCreated":"2021-06-21 09:06:01",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"article_1.txt"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyafoed6oimxqxeyey6osjo3jp52vsyd75i5zspfvcfdz3q",
"timeCreated":"2021-30-21 03:30:10",
"createdBy":"ocid1.user.oc1..aaaaaaaazjupiis2cu54smlzemiujpqxriz6i4wp3euuqrzffdugib73epbq",
"entities":[
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":36,
"length":8
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":147,
"length":23
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":196,
"length":3
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Event"
}
],
"textSpan":{
"offset":311,
"length":22
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Place"
}
],
"textSpan":{
"offset":512,
"length":49
}
}
]
}
]
}
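Each `TEXTSELECTION` entity in the text records above identifies its text only by `offset` and `length`, so the labeled string has to be recovered from the source file. A minimal sketch follows, assuming the offsets count characters from the start of the source text (the page does not say whether they are characters or bytes). `extract_entities` and the sample sentence are hypothetical.

```python
def extract_entities(text, record):
    """Return (selected_text, label_names) for every text span in a record."""
    results = []
    for annotation in record.get("annotations", []):
        for entity in annotation.get("entities", []):
            span = entity.get("textSpan")
            if span is None:
                continue  # entities without a span, e.g. GENERIC, are skipped
            start = span["offset"]
            end = start + span["length"]
            labels = [label["label_name"] for label in entity.get("labels", [])]
            results.append((text[start:end], labels))
    return results


# Hypothetical source text and record in the shape of the example above.
text = "Maria studies at Example University in Lisbon."
record = {
    "annotations": [{
        "entities": [
            {"entityType": "TEXTSELECTION",
             "labels": [{"label_name": "Person"}],
             "textSpan": {"offset": 0, "length": 5}},
            {"entityType": "TEXTSELECTION",
             "labels": [{"label_name": "Place"}],
             "textSpan": {"offset": 39, "length": 6}},
        ]
    }]
}
print(extract_entities(text, record))
# [('Maria', ['Person']), ('Lisbon', ['Place'])]
```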
Example of an exported CSV (text) dataset JSON file.
{
"id": "ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46iaxarhafiu42tbdm2d2nkxlkxwhnc76ohnwvpsdfccqw5q",
"compartmentId": "ocid1.compartment.oc1..aaaaaaaaundh4v2w4spnyt4hgy367qf54jonakpz6gh573bspmgzfoj2auga",
"displayName": "Text Classification CSV dataset",
"labelsSet": [{
"name": "positive"
}, {
"name": "neutral"
}, {
"name": "negative"
}],
"annotationFormat": "SINGLE_LABEL",
"datasetSourceDetails": {
"namespace": "idgszs0xipmn",
"bucket": "TEST",
"prefix": "languageteam/Text_Classification_Context_Oracle_advt.csv"
},
"datasetFormatDetails": {
"formatType": "TEXT",
"textFileTypeMetadata": {
"formatType": "DELIMITED",
"delimitedFileTypeMetaData": {
"columnIndex": 5,
"columnName": "CONTENT",
"columnDelimiter": ","
}
}
}
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iajx42mojwkktind744i3t2q3di6tdhwysw2wy4d42tseq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/546"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iadsu6zpch4lvozx7ci3as5st23jqxjpjdcryp4jworala",
"timeCreated": "2022-06-05 05:40:48",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia7otgs2rb3kuh464sisfbjxxbbkb65sbg2icst3gquw3q",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/303"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iatfuceqzjb5nnh7quk5wupvwe74bfpn5oka57cz6gqv4a",
"timeCreated": "2022-06-05 05:41:30",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iab55fqcxlfb3xszlpp7qnpsthjdhzzb7nki65xqdvgceq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/547"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iamosgunt72lci3g3mzyyx2sskjdje4e5zspts7mbnsl5q",
"timeCreated": "2022-06-05 05:41:36",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia45ave4zhtisvu2k7d6tbciskcge4ecm2imb6bvdqe4da",
"timeCreated": "2022-06-05 04:39:21",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/564"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iauqo6tlqil7vijetsayt6vsmpohxum5vmj6cde3wbfxua",
"timeCreated": "2022-06-05 05:40:44",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iasymkpbstgjwmae7ar5ikgp5mtth2izcaaaruatpl45ma",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/545"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iatu6k7afdwirdtvv6bofrquc65m4ruet4hlfmhgzhqjxa",
"timeCreated": "2022-06-05 05:41:02",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia6n4whohdhn257pmot7zlncawockthadosdhrp5so2nna",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/304"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iaslgb6s6h5ffce5mcgeidndp3vydcxzjya7yrbaj6pw5a",
"timeCreated": "2022-06-05 05:40:57",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "negative"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iamgsncrjarzujr6duaedmsjyrp67yi7dpe2uoi6h54c5a",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/548"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iabt3hwyc7mkaanez7q24k7vlfds3lisa6hdu53hntq2qq",
"timeCreated": "2022-06-05 05:42:55",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iactsl4j7v633d2y2t67lkxawv2nyemz7wwarppjpxeofq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/305"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46ia7xxg4ukky3ur56zzwaodvwrks4vqgvoug2z2moif274a",
"timeCreated": "2022-06-05 05:41:44",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "negative"
}]
}]