Exporting a Dataset
You can export datasets from Data Labeling in several text and image formats, as well as snapshot JSONL files.
You can export Data Labeling datasets to any Object Storage location in the tenancy. That way, you can maintain versions or use the dataset elsewhere, for example as input for developing machine learning models. The output file location is shown in the export panel. After the export, the destination is available in the associated work request. The destination is also shown on the dataset's Details page, but only while the work request exists.
For documents, you can export to JSONL files.
For images, you can export to these formats:
- JSONL
- YOLO V5
- COCO
- PASCAL VOC
For text, you can export to these formats:
- JSONL
- JSONL Compact Plus Content
- spaCy
- CoNLL V2003
Note: If you export text to CoNLL format, recursive and overlapping entities are ignored.
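Because the CoNLL export ignores recursive and overlapping entities, it can be useful to check an annotation for such spans before exporting. The following is a minimal sketch, assuming entities carry a `textSpan` object with `offset` and `length` as in the exported text record examples. The function name `find_overlaps` is hypothetical and not part of the service.

```python
from typing import Dict, List, Tuple


def find_overlaps(entities: List[Dict]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of entities whose text spans overlap.

    A nested (recursive) span is reported too, since it overlaps its parent.
    """
    spans = []
    for idx, entity in enumerate(entities):
        span = entity.get("textSpan")
        if span is None:
            continue  # entity has no text span (for example, a GENERIC label)
        start = span["offset"]
        spans.append((start, start + span["length"], idx))

    spans.sort()
    overlaps = []
    for a in range(len(spans)):
        for b in range(a + 1, len(spans)):
            if spans[b][0] >= spans[a][1]:
                break  # sorted by start, so no later span can overlap span a
            overlaps.append(tuple(sorted((spans[a][2], spans[b][2]))))
    return overlaps


entities = [
    {"textSpan": {"offset": 0, "length": 10}},
    {"textSpan": {"offset": 5, "length": 10}},  # overlaps the first entity
    {"textSpan": {"offset": 20, "length": 3}},  # stands alone
]
print(find_overlaps(entities))
```

An empty result suggests that no entity would be dropped for overlap reasons, under the assumptions above.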
For CSV, the only option is to export to JSONL.
This task isn't available in the CLI or the API.
Examples of Exported Document, Image, and Text Datasets
Examples of the JSON files created when you export a dataset in Data Labeling.
Example of an exported consolidated JSON file.
{
"id": "ocid1.datalabelingdatasetdev.oc1.iad.amaaaaaazaehrjyag7jcbu3xnpw4dcn3tmniarzorpxbtegnipsw5oleeauq",
"compartmentId": "ocid1.compartment.oc1..aaaaaaaaihdqc5z4zq4sqt7t4c7vbwc6lbf5dr6mky2phcpvdlh7c3p5mtuq",
"displayName": "test-check",
"description": "test check",
"labelsSet": [{
"name": "location"
}, {
"name": "university"
}],
"annotationFormat": "ENTITY_EXTRACTION",
"datasetSourceDetails": {
"namespace": "idrcdhfxwqwa",
"bucket": "test-sachin-cucket"
},
"datasetFormatDetails": {
"formatType": "TEXT"
}
}
{
"id": "ocid1.datalabelingrecord.oc1.iad.amaaaaaazaehrjyahykmu6hvdksayw64a3wmur7mk2366hgitlypk6u2soea",
"timeCreated": "2021-10-12 12:09:37",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "sample-text.txt"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyat64zcfbjviu3pttykthabv5jiuicva3dkv6oikstzd7q",
"timeCreated": "2021-10-12 12:16:51",
"createdBy": "ocid1.user.oc1..aaaaaaaaktqgvx2skco6bfyziwjzfjaxensoewscqbk7p44sjqyrxmz4qozq",
"entities": [{
"entityType": "TEXTSELECTION",
"labels": [{
"label_name": "university"
}],
"textSpan": {
"offset": 60,
"length": 11
}
}]
}]
}
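The consolidated example above holds a dataset object followed by record objects. A minimal sketch for reading such a file with the standard library is shown here. It relies on `json.JSONDecoder.raw_decode` so that it works whether the objects are written one per line or pretty-printed back to back. The function names are hypothetical, and the "first object is the dataset, the rest are records" layout is an assumption taken from the example.

```python
import json
from typing import Dict, Iterator, List, Tuple


def iter_json_objects(text: str) -> Iterator[Dict]:
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def split_consolidated(text: str) -> Tuple[Dict, List[Dict]]:
    """Split a consolidated export into (dataset metadata, records)."""
    objects = list(iter_json_objects(text))
    if not objects:
        raise ValueError("no JSON objects found")
    return objects[0], objects[1:]


sample = '''
{"displayName": "test-check", "labelsSet": [{"name": "location"}, {"name": "university"}]}
{"sourceDetails": {"path": "sample-text.txt"}, "annotations": []}
'''
dataset, records = split_consolidated(sample)
print(dataset["displayName"], len(records))
```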
Example of an exported document dataset JSON file.
{
"id":"ocid1.datalabelingdatasetint.oc1.iad.amaaaaaaniob46iafkiyw6a4uwgrnpy4lfxjoslocap7elaj257mxh4fzuwq",
"compartmentId":"ocid1.compartment.oc1..aaaaaaaajqiw27knoagxurhzjlihw7ijnoshsu4zi2uawdn5gfexdqwvu4vq",
"displayName":"Sep6_PDF",
"labelsSet":[
{
"name":"L1"
},
{
"name":"L"
},
{
"name":"23423"
}
],
"annotationFormat":"MULTI_LABEL",
"datasetSourceDetails":{
"namespace":"idgszs0xipmn",
"bucket":"Demo-bucket"
},
"datasetFormatDetails":{"formatType":"DOCUMENT"},
"recordFiles":[
{
"namespace":"idgszs0xipmn",
"bucket":"COVID_Dataset",
"path":"Snapshotsrecords_1632479104889.jsonl"
}
]
}
Example of an exported image dataset JSON file.
{
"id": "ocid1...",
"compartmentId": "",
"timeCreated":2020-12-15...,
"displayName":...,
"description":...,
"labelsSet": [
{"name":"germanshepherd"},
{"name":"americanshepherd"},
{"name":"australianshepherd"},
{"name":"irishwolfhound"}
],
"annotationFormat": "IMAGE_OBJECT_SELECTION",
"datasetSourceDetails": {
"sourceType": "OBJECT_STORAGE",
"namespace": "i235o3idk",
"bucket": "mytrainingdata",
"prefix": "puppyproject/"
},
"datasetFormatDetails": {
"formatType": "IMAGE" # image requires less metadata than delimited for example
},
"recordFiles": [
{
"namespace": "i235o3idk",
"bucket": "mylabels",
"path": "puppyproject/records1.json"
}
],
"definedTags": {},
"freeformTags": {}
}
Example of an exported text dataset JSON file.
{
"id":"ocid1.datalabelingdatasetdev.oc1.iad.amaaaaaazaehrjyamqjx733dhxd25zxcro2nftrewq7ltj34ua2cfapzsmjq",
"compartmentId":"ocid1.compartment.oc1..aaaaaaaagzh2kii2frktoc7bcvfydpzkxr7dbn6nf6jcyrxwgzen4pi5y4zq",
"displayName":"NER DEMO DATASET UNLABELLED",
"description":"NER DEMO DATASET UNLABELLED",
"labelsSet":[
{
"name":"Person"
},
{
"name":"Organization"
},
{
"name":"Event"
},
{
"name":"Place"
}
],
"annotationFormat":"ENTITY_EXTRACTION",
"datasetSourceDetails":{
"namespace":"idrcdhfxwqwa",
"bucket":"news-articles"
},
"datasetFormatDetails":{
},
"recordFiles":[
{
"namespace":"idrcdhfxwqwa",
"bucket":"snapshots",
"path":"forReview/records_1621847577526.jsonl"
}
]
}
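A dataset file like the one above lists its labels in `labelsSet` and points to the record files through `recordFiles`. The sketch below pulls out both, using only field names that appear in the example. The function name `summarize_dataset` is hypothetical.

```python
import json
from typing import Dict, List, Tuple


def summarize_dataset(dataset: Dict) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Return the label names and (namespace, bucket, path) of each record file."""
    labels = [label["name"] for label in dataset.get("labelsSet", [])]
    record_files = [
        (entry["namespace"], entry["bucket"], entry["path"])
        for entry in dataset.get("recordFiles", [])
    ]
    return labels, record_files


dataset = json.loads('''
{
  "displayName": "NER DEMO DATASET UNLABELLED",
  "labelsSet": [{"name": "Person"}, {"name": "Organization"},
                {"name": "Event"}, {"name": "Place"}],
  "annotationFormat": "ENTITY_EXTRACTION",
  "recordFiles": [
    {"namespace": "idrcdhfxwqwa", "bucket": "snapshots",
     "path": "forReview/records_1621847577526.jsonl"}
  ]
}
''')
labels, record_files = summarize_dataset(dataset)
print(labels)
print(record_files)
```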
Example of an exported document record JSON file.
{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaqgpzhscdpdcgohg5ocp3obwmjjgju6m73bmyrt4aovhq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 98.pdf"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaaniob46iatjg3p6hlszxrgmsj4y76b5tndddaedm6ardkoxbtt6mq",
"timeCreated":"2021-09-06 03:42:43",
"createdBy":"ocid1.user.oc1..aaaaaaaa6ynps4htdea6fqoerfhkedp3lih2ktureqhw3hmfojde6ukf3mpa",
"entities":[
{
"entityType":"GENERIC","labels":[
{
"label_name":"23423"
}
]
}
]
}
]
}
{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iasb5klulgaj4djn3acsgsd3cekx3ix46ftxjdip4tu23a",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 99.pdf"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaaniob46iav45mlpcleqjt7cnmhyogopszi2rfnilwjhd4xyxa7irq",
"timeCreated":"2021-09-06 03:42:47",
"createdBy":"ocid1.user.oc1..aaaaaaaa6ynps4htdea6fqoerfhkedp3lih2ktureqhw3hmfojde6ukf3mpa",
"entities":[
{
"entityType":"GENERIC","labels":[
{
"label_name":"L1"
}
]
}
]
}
]
}
{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaxhixolkqryomyu6i4jrrmzwcckw2tmgva47suylu5rzq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 97.pdf"
}
}
{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iagymrjuem42kvzilxjd5hdrr3djznrl7aajvvcr6zc6sq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 96.pdf"
}
}
{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaclpccpxn5hgmplesv3mt3g6hxkfaepzv6fuy7b6he3ca",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 2.pdf"
}
}
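In the record file above, labeled records carry an `annotations` array while unlabeled records omit it. A minimal sketch that tallies labels and collects the paths of unlabeled records follows. It assumes the field names shown in the example (`annotations`, `entities`, `labels`, `label_name`), and the function name `tally_labels` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tally_labels(records: Iterable[Dict]) -> Tuple[Counter, List[str]]:
    """Count label occurrences and list the source paths of unlabeled records."""
    counts: Counter = Counter()
    unlabeled: List[str] = []
    for record in records:
        annotations = record.get("annotations") or []
        if not annotations:
            unlabeled.append(record["sourceDetails"]["path"])
            continue
        for annotation in annotations:
            for entity in annotation.get("entities", []):
                for label in entity.get("labels", []):
                    counts[label["label_name"]] += 1
    return counts, unlabeled


records = [
    {"sourceDetails": {"path": "SampleDocs-sample-pdf-file copy 98.pdf"},
     "annotations": [{"entities": [{"entityType": "GENERIC",
                                    "labels": [{"label_name": "23423"}]}]}]},
    {"sourceDetails": {"path": "SampleDocs-sample-pdf-file copy 99.pdf"},
     "annotations": [{"entities": [{"entityType": "GENERIC",
                                    "labels": [{"label_name": "L1"}]}]}]},
    {"sourceDetails": {"path": "SampleDocs-sample-pdf-file copy 97.pdf"}},
]
counts, unlabeled = tally_labels(records)
print(dict(counts))
print(unlabeled)
```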
Example of an exported image record JSON file.
{
"id": "ocid1...",
"timeCreated": 2020-12-15...,
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "filename2.jpg"
},
"annotations": [
{
"id": "ocid1....",
"timeCreated": ...,
"createdBy": ...,
"entities": [
{
"entityType": "IMAGEOBJECTSELECTION",
"labels": [
{"name": "germanshepherd"}
],
"boundingPolygon": {
"normalizedVertices": [
{"x":0.2, "y":0.2},
{"x":0.3, "y":0.2},
{"x":0.3, "y":0.3},
{"x":0.2, "y":0.3}
]
}
},
{
"entityType": "BOUNDING_BOX",
"labels": [
{"name": "irishwolfhound"}
],
"boundingPolygon": {
"normalizedVertices": [
{"x":0.4, "y":0.4},
{"x":0.5, "y":0.4},
{"x":0.5, "y":0.5},
{"x":0.4, "y":0.5}
]
}
}
]
}
],
"freeformTags": {
"set": "validation" # optional, user defined convention used for reproducibility
}
}
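The `normalizedVertices` in the record above can be turned into a pixel bounding box once the image size is known. The sketch below assumes the coordinates are fractions of the image width and height in the range 0 to 1, which the field name suggests but the example does not state. The image dimensions are supplied by the caller, and the function name `to_pixel_box` is hypothetical.

```python
from typing import Dict, List, Tuple


def to_pixel_box(
    vertices: List[Dict[str, float]], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) in pixels for a normalized polygon."""
    xs = [vertex["x"] * width for vertex in vertices]
    ys = [vertex["y"] * height for vertex in vertices]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


vertices = [
    {"x": 0.2, "y": 0.2},
    {"x": 0.3, "y": 0.2},
    {"x": 0.3, "y": 0.3},
    {"x": 0.2, "y": 0.3},
]
print(to_pixel_box(vertices, width=640, height=480))  # (128, 96, 192, 144)
```

Taking the minimum and maximum of the vertices gives the enclosing box, which is the usual input for formats that expect axis-aligned rectangles.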
Example of an exported text record JSON file.
{
"id":"ocid1.record.oc1.iad.UxxfPBMZVYfwZHZnjCPUGkhMwpWoTPMOnxDnrgXbBxwLKkrdeGwewdViOoUJ",
"timeCreated":"2021-06-21 09:06:01",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"article_3.txt"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyadghacojq3nmo2mtcbcmlo4rgslmpzxeboujhduft5nta",
"timeCreated":"2021-46-21 09:46:45",
"createdBy":"ocid1.user.oc1..aaaaaaaazjupiis2cu54smlzemiujpqxriz6i4wp3euuqrzffdugib73epbq",
"entities":[
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Event"
}
],
"textSpan":{
"offset":141,
"length":12
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":204,
"length":20
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":254,
"length":15
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":402,
"length":3
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Place"
}
],
"textSpan":{
"offset":638,
"length":11
}
}
]
}
]
}
{
"id":"ocid1.record.oc1.iad.AakCoDHvJpnZofzIYfRCfpZnFUqNmfiWNIuNysbXCSRZeTVqdwKGvYjJpMvh",
"timeCreated":"2021-06-21 09:06:01",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"article_1.txt"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyafoed6oimxqxeyey6osjo3jp52vsyd75i5zspfvcfdz3q",
"timeCreated":"2021-30-21 03:30:10",
"createdBy":"ocid1.user.oc1..aaaaaaaazjupiis2cu54smlzemiujpqxriz6i4wp3euuqrzffdugib73epbq",
"entities":[
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":36,
"length":8
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":147,
"length":23
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":196,
"length":3
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Event"
}
],
"textSpan":{
"offset":311,
"length":22
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Place"
}
],
"textSpan":{
"offset":512,
"length":49
}
}
]
}
]
}
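Each `TEXTSELECTION` entity above locates its labeled text through `textSpan.offset` and `textSpan.length`. A minimal sketch for recovering the labeled substrings from the source text is shown here. It assumes that offsets count characters from the start of the file, which the example does not confirm. The sample sentence and the function name `labeled_spans` are made up for illustration.

```python
from typing import Dict, List, Tuple


def labeled_spans(text: str, record: Dict) -> List[Tuple[str, str]]:
    """Return (label_name, substring) pairs for every text span in a record."""
    results = []
    for annotation in record.get("annotations", []):
        for entity in annotation.get("entities", []):
            span = entity.get("textSpan")
            if span is None:
                continue  # not a text selection
            start = span["offset"]
            snippet = text[start:start + span["length"]]
            for label in entity.get("labels", []):
                results.append((label["label_name"], snippet))
    return results


text = "Ada Lovelace visited London."
record = {
    "annotations": [{
        "entities": [
            {"entityType": "TEXTSELECTION",
             "labels": [{"label_name": "Person"}],
             "textSpan": {"offset": 0, "length": 12}},
            {"entityType": "TEXTSELECTION",
             "labels": [{"label_name": "Place"}],
             "textSpan": {"offset": 21, "length": 6}},
        ]
    }]
}
print(labeled_spans(text, record))
```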
Example of an exported CSV (text) dataset JSON file.
{
"id": "ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46iaxarhafiu42tbdm2d2nkxlkxwhnc76ohnwvpsdfccqw5q",
"compartmentId": "ocid1.compartment.oc1..aaaaaaaaundh4v2w4spnyt4hgy367qf54jonakpz6gh573bspmgzfoj2auga",
"displayName": "Text Classification CSV dataset",
"labelsSet": [{
"name": "positive"
}, {
"name": "neutral"
}, {
"name": "negative"
}],
"annotationFormat": "SINGLE_LABEL",
"datasetSourceDetails": {
"namespace": "idgszs0xipmn",
"bucket": "TEST",
"prefix": "languageteam/Text_Classification_Context_Oracle_advt.csv"
},
"datasetFormatDetails": {
"formatType": "TEXT",
"textFileTypeMetadata": {
"formatType": "DELIMITED",
"delimitedFileTypeMetaData": {
"columnIndex": 5,
"columnName": "CONTENT",
"columnDelimiter": ","
}
}
}
}
{
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iajx42mojwkktind744i3t2q3di6tdhwysw2wy4d42tseq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/546"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iadsu6zpch4lvozx7ci3as5st23jqxjpjdcryp4jworala",
"timeCreated": "2022-06-05 05:40:48",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
}
{
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia7otgs2rb3kuh464sisfbjxxbbkb65sbg2icst3gquw3q",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/303"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iatfuceqzjb5nnh7quk5wupvwe74bfpn5oka57cz6gqv4a",
"timeCreated": "2022-06-05 05:41:30",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
}
{
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iab55fqcxlfb3xszlpp7qnpsthjdhzzb7nki65xqdvgceq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/547"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iamosgunt72lci3g3mzyyx2sskjdje4e5zspts7mbnsl5q",
"timeCreated": "2022-06-05 05:41:36",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
}
{
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia45ave4zhtisvu2k7d6tbciskcge4ecm2imb6bvdqe4da",
"timeCreated": "2022-06-05 04:39:21",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/564"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iauqo6tlqil7vijetsayt6vsmpohxum5vmj6cde3wbfxua",
"timeCreated": "2022-06-05 05:40:44",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
}
{
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iasymkpbstgjwmae7ar5ikgp5mtth2izcaaaruatpl45ma",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/545"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iatu6k7afdwirdtvv6bofrquc65m4ruet4hlfmhgzhqjxa",
"timeCreated": "2022-06-05 05:41:02",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
}
{
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia6n4whohdhn257pmot7zlncawockthadosdhrp5so2nna",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/304"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iaslgb6s6h5ffce5mcgeidndp3vydcxzjya7yrbaj6pw5a",
"timeCreated": "2022-06-05 05:40:57",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "negative"
}]
}]
}]
}
{
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iamgsncrjarzujr6duaedmsjyrp67yi7dpe2uoi6h54c5a",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/548"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iabt3hwyc7mkaanez7q24k7vlfds3lisa6hdu53hntq2qq",
"timeCreated": "2022-06-05 05:42:55",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
}
{
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iactsl4j7v633d2y2t67lkxawv2nyemz7wwarppjpxeofq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/305"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46ia7xxg4ukky3ur56zzwaodvwrks4vqgvoug2z2moif274a",
"timeCreated": "2022-06-05 05:41:44",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "negative"
}]
}]