Exporting Datasets
You can export Data Labeling datasets in several text and image formats, and as snapshot JSONL files.
You can export Data Labeling datasets to any Object Storage location in the tenancy. This lets you manage versions or use the dataset elsewhere, for example as input for developing machine learning models. The location of the output file is shown in the export panel. After the export, the destination is available in the associated work request. The destination is also displayed on the Dataset details page, but only for as long as the work request exists.
For documents, you can export to JSONL files.
For images, you can export to the following formats:
- JSONL
- YOLO V5
- COCO
- PASCAL VOC
For text, you can export to the following formats:
- JSONL
- JSONL Compact Plus Content
- spaCy
- CoNLL V2003
Note
When you export text in CoNLL format, recursive and overlapping entities are ignored.
For CSV, the only option is to export to JSONL.
This task isn't available in the CLI or the API.
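Because the CoNLL export ignores recursive (nested) and overlapping entities, it can be useful to look for them before choosing that format. The following is a minimal sketch, assuming each entity has been reduced to an (offset, length) pair like the textSpan values in the exported text records. The function name find_overlaps is hypothetical and not part of Data Labeling.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (offset, length), an assumed simplification of "textSpan"


def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return every pair of spans that overlap or are nested in one another."""
    ordered = sorted(spans)
    overlaps = []
    for i, (start_a, len_a) in enumerate(ordered):
        end_a = start_a + len_a
        for start_b, len_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # spans are sorted by offset, so later ones cannot overlap
            overlaps.append(((start_a, len_a), (start_b, len_b)))
    return overlaps


# Two separate spans, then one span nested inside another.
print(find_overlaps([(60, 11), (141, 12)]))
print(find_overlaps([(0, 20), (5, 3)]))
```

An empty result suggests that no entity would be dropped for this reason. Any pair that is returned marks entities the CoNLL export would ignore.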
Examples of Exported Document, Image, and Text Datasets
Examples of the JSON files that are created when a dataset is exported from Data Labeling.
An example of an exported consolidated JSON file.
{
"id": "ocid1.datalabelingdatasetdev.oc1.iad.amaaaaaazaehrjyag7jcbu3xnpw4dcn3tmniarzorpxbtegnipsw5oleeauq",
"compartmentId": "ocid1.compartment.oc1..aaaaaaaaihdqc5z4zq4sqt7t4c7vbwc6lbf5dr6mky2phcpvdlh7c3p5mtuq",
"displayName": "test-check",
"description": "test check",
"labelsSet": [{
"name": "location"
}, {
"name": "university"
}],
"annotationFormat": "ENTITY_EXTRACTION",
"datasetSourceDetails": {
"namespace": "idrcdhfxwqwa",
"bucket": "test-sachin-cucket"
},
"datasetFormatDetails": {
"formatType": "TEXT"
}
} {
"id": "ocid1.datalabelingrecord.oc1.iad.amaaaaaazaehrjyahykmu6hvdksayw64a3wmur7mk2366hgitlypk6u2soea",
"timeCreated": "2021-10-12 12:09:37",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "sample-text.txt"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyat64zcfbjviu3pttykthabv5jiuicva3dkv6oikstzd7q",
"timeCreated": "2021-10-12 12:16:51",
"createdBy": "ocid1.user.oc1..aaaaaaaaktqgvx2skco6bfyziwjzfjaxensoewscqbk7p44sjqyrxmz4qozq",
"entities": [{
"entityType": "TEXTSELECTION",
"labels": [{
"label_name": "university"
}],
"textSpan": {
"offset": 60,
"length": 11
}
}]
}]
}
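The consolidated file above holds the dataset metadata object followed by the record objects. A minimal Python sketch for reading such a sequence of JSON objects is shown here. It assumes the objects are only separated by whitespace or line breaks. The helper name iter_json_objects and the shortened sample are illustrative, not part of the service.

```python
import json
from typing import Any, Dict, Iterator


def iter_json_objects(text: str) -> Iterator[Dict[str, Any]]:
    """Yield each top-level JSON object found in a string, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace and line breaks between objects.
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened stand-in for a consolidated export (the IDs are placeholders).
sample = '''
{"id": "dataset-1", "annotationFormat": "ENTITY_EXTRACTION"}
{"id": "record-1", "sourceDetails": {"path": "sample-text.txt"}}
'''

# By assumption, the first object is the dataset and the rest are records.
dataset, *records = iter_json_objects(sample)
print(dataset["annotationFormat"], [r["sourceDetails"]["path"] for r in records])
```

Using raw_decode instead of splitting on line breaks means the same loop handles one object per line as well as pretty-printed objects like the ones shown in this topic.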
An example of a JSON file for an exported document dataset.
{
"id":"ocid1.datalabelingdatasetint.oc1.iad.amaaaaaaniob46iafkiyw6a4uwgrnpy4lfxjoslocap7elaj257mxh4fzuwq",
"compartmentId":"ocid1.compartment.oc1..aaaaaaaajqiw27knoagxurhzjlihw7ijnoshsu4zi2uawdn5gfexdqwvu4vq",
"displayName":"Sep6_PDF",
"labelsSet":[
{
"name":"L1"
},
{
"name":"L"
},
{
"name":"23423"
}
],
"annotationFormat":"MULTI_LABEL",
"datasetSourceDetails":{
"namespace":"idgszs0xipmn",
"bucket":"Demo-bucket"
},
"datasetFormatDetails":{"formatType":"DOCUMENT"},
"recordFiles":[
{
"namespace":"idgszs0xipmn",
"bucket":"COVID_Dataset",
"path":"Snapshotsrecords_1632479104889.jsonl"
}
]
}
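The recordFiles array in the dataset file points to the JSONL files that hold the records. As a rough sketch, the following Python lists those locations from the metadata. The RecordFile type and the record_files function are hypothetical names, and the sample is shortened from the example above.

```python
import json
from typing import List, NamedTuple


class RecordFile(NamedTuple):
    namespace: str
    bucket: str
    path: str


def record_files(dataset_json: str) -> List[RecordFile]:
    """Return the location of every record file listed in the dataset metadata."""
    dataset = json.loads(dataset_json)
    return [
        RecordFile(entry["namespace"], entry["bucket"], entry["path"])
        for entry in dataset.get("recordFiles", [])
    ]


# Shortened metadata modeled on the document dataset example above.
metadata = '''
{
  "displayName": "Sep6_PDF",
  "recordFiles": [
    {"namespace": "idgszs0xipmn", "bucket": "COVID_Dataset",
     "path": "Snapshotsrecords_1632479104889.jsonl"}
  ]
}
'''
for location in record_files(metadata):
    print(location.bucket, location.path)
```

Each returned entry names the namespace, bucket, and object path that a later step would read from Object Storage.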
An example of a JSON file for an exported image dataset.
{
"id": "ocid1...",
"compartmentId": "",
"timeCreated": 2020-12-15...,
"displayName": ...,
"description": ...,
"labelsSet": [
{"name":"germanshepherd"},
{"name":"americanshepherd"},
{"name":"australianshepherd"},
{"name":"irishwolfhound"}
],
"annotationFormat": "IMAGE_OBJECT_SELECTION",
"datasetSourceDetails": {
"sourceType": "OBJECT_STORAGE",
"namespace": "i235o3idk",
"bucket": "mytrainingdata",
"prefix": "puppyproject/"
},
"datasetFormatDetails": {
"formatType": "IMAGE" # image requires less metadata than delimited for example
},
"recordFiles": [
{
"namespace": "i235o3idk",
"bucket": "mylabels",
"path": "puppyproject/records1.json"
}
],
"definedTags": {},
"freeformTags": {}
}
An example of a JSON file for an exported text dataset.
{
"id":"ocid1.datalabelingdatasetdev.oc1.iad.amaaaaaazaehrjyamqjx733dhxd25zxcro2nftrewq7ltj34ua2cfapzsmjq",
"compartmentId":"ocid1.compartment.oc1..aaaaaaaagzh2kii2frktoc7bcvfydpzkxr7dbn6nf6jcyrxwgzen4pi5y4zq",
"displayName":"NER DEMO DATASET UNLABELLED",
"description":"NER DEMO DATASET UNLABELLED",
"labelsSet":[
{
"name":"Person"
},
{
"name":"Organization"
},
{
"name":"Event"
},
{
"name":"Place"
}
],
"annotationFormat":"ENTITY_EXTRACTION",
"datasetSourceDetails":{
"namespace":"idrcdhfxwqwa",
"bucket":"news-articles"
},
"datasetFormatDetails":{
},
"recordFiles":[
{
"namespace":"idrcdhfxwqwa",
"bucket":"snapshots",
"path":"forReview/records_1621847577526.jsonl"
}
]
}
An example of a JSON file for an exported document record.
{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaqgpzhscdpdcgohg5ocp3obwmjjgju6m73bmyrt4aovhq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 98.pdf"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaaniob46iatjg3p6hlszxrgmsj4y76b5tndddaedm6ardkoxbtt6mq",
"timeCreated":"2021-09-06 03:42:43",
"createdBy":"ocid1.user.oc1..aaaaaaaa6ynps4htdea6fqoerfhkedp3lih2ktureqhw3hmfojde6ukf3mpa",
"entities":[
{
"entityType":"GENERIC","labels":[
{
"label_name":"23423"
}
]
}
]
}
]
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iasb5klulgaj4djn3acsgsd3cekx3ix46ftxjdip4tu23a",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 99.pdf"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaaniob46iav45mlpcleqjt7cnmhyogopszi2rfnilwjhd4xyxa7irq",
"timeCreated":"2021-09-06 03:42:47",
"createdBy":"ocid1.user.oc1..aaaaaaaa6ynps4htdea6fqoerfhkedp3lih2ktureqhw3hmfojde6ukf3mpa",
"entities":[
{
"entityType":"GENERIC","labels":[
{
"label_name":"L1"
}
]
}
]
}
]
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaxhixolkqryomyu6i4jrrmzwcckw2tmgva47suylu5rzq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 97.pdf"
}
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iagymrjuem42kvzilxjd5hdrr3djznrl7aajvvcr6zc6sq",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 96.pdf"
}
}{
"id":"ocid1.datalabelingrecord.oc1.iad.amaaaaaaniob46iaclpccpxn5hgmplesv3mt3g6hxkfaepzv6fuy7b6he3ca",
"timeCreated":"2021-09-06 03:40:02",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"SampleDocs-sample-pdf-file copy 2.pdf"
}
}
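In the document records above, the last three records have no annotations key, which suggests they were not labeled yet. The following is a minimal Python sketch that counts labels and unlabeled records, assuming the records are already parsed into dictionaries. The function name summarize is hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def summarize(records: Iterable[Dict[str, Any]]) -> Tuple[Counter, int]:
    """Count how often each label is used and how many records have no annotations."""
    label_counts: Counter = Counter()
    unlabeled = 0
    for record in records:
        annotations = record.get("annotations", [])
        if not annotations:
            unlabeled += 1
        for annotation in annotations:
            for entity in annotation.get("entities", []):
                for label in entity.get("labels", []):
                    label_counts[label["label_name"]] += 1
    return label_counts, unlabeled


# Records reduced to the fields the function reads (paths shortened).
records = [
    {"sourceDetails": {"path": "copy 98.pdf"},
     "annotations": [{"entities": [{"entityType": "GENERIC",
                                    "labels": [{"label_name": "23423"}]}]}]},
    {"sourceDetails": {"path": "copy 99.pdf"},
     "annotations": [{"entities": [{"entityType": "GENERIC",
                                    "labels": [{"label_name": "L1"}]}]}]},
    {"sourceDetails": {"path": "copy 97.pdf"}},
]
counts, unlabeled = summarize(records)
print(dict(counts), unlabeled)
```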
An example of the JSON file for an exported image record.
{
"id": "ocid1...",
"timeCreated": 2020-12-15...,
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "filename2.jpg"
},
"annotations": [
{
"id": "ocid1....",
"timeCreated": ...,
"createdBy": ...,
"entities": [
{
"entityType": "IMAGEOBJECTSELECTION",
"labels": [
{"name": "germanshepherd"}
],
"boundingPolygon": {
"normalizedVertices": [
{"x":0.2, "y":0.2},
{"x":0.3, "y":0.2},
{"x":0.3, "y":0.3},
{"x":0.2, "y":0.3}
]
}
},
{
"entityType": "BOUNDING_BOX",
"labels": [
{"name": "irishwolfhound"}
],
"boundingPolygon": {
"normalizedVertices": [
{"x":0.4, "y":0.4},
{"x":0.5, "y":0.4},
{"x":0.5, "y":0.5},
{"x":0.4, "y":0.5}
]
}
}
]
}
],
"freeformTags": {
"set": "validation" # optional, user defined convention used for reproducibility
}
}
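The normalizedVertices values in the image record lie between 0 and 1. A plausible reading, stated here as an assumption, is that x is a fraction of the image width and y a fraction of the image height. Under that assumption, the following Python sketch converts a polygon to a pixel bounding box. The function name to_pixel_box and the image size are illustrative.

```python
from typing import Dict, List, Tuple


def to_pixel_box(
    vertices: List[Dict[str, float]], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Convert normalized vertices to (x_min, y_min, x_max, y_max) in pixels.

    Assumes x and y are fractions of the image width and height.
    """
    xs = [vertex["x"] * width for vertex in vertices]
    ys = [vertex["y"] * height for vertex in vertices]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# Vertices of the first entity above, applied to a hypothetical 1000 x 500 image.
polygon = [
    {"x": 0.2, "y": 0.2},
    {"x": 0.3, "y": 0.2},
    {"x": 0.3, "y": 0.3},
    {"x": 0.2, "y": 0.3},
]
print(to_pixel_box(polygon, width=1000, height=500))
```

Taking the minimum and maximum of the vertices keeps the sketch usable for polygons with more than four points, at the cost of returning only the enclosing rectangle.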
An example of a JSON file for an exported text record.
{
"id":"ocid1.record.oc1.iad.UxxfPBMZVYfwZHZnjCPUGkhMwpWoTPMOnxDnrgXbBxwLKkrdeGwewdViOoUJ",
"timeCreated":"2021-06-21 09:06:01",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"article_3.txt"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyadghacojq3nmo2mtcbcmlo4rgslmpzxeboujhduft5nta",
"timeCreated":"2021-46-21 09:46:45",
"createdBy":"ocid1.user.oc1..aaaaaaaazjupiis2cu54smlzemiujpqxriz6i4wp3euuqrzffdugib73epbq",
"entities":[
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Event"
}
],
"textSpan":{
"offset":141,
"length":12
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":204,
"length":20
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":254,
"length":15
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":402,
"length":3
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Place"
}
],
"textSpan":{
"offset":638,
"length":11
}
}
]
}
]
}{
"id":"ocid1.record.oc1.iad.AakCoDHvJpnZofzIYfRCfpZnFUqNmfiWNIuNysbXCSRZeTVqdwKGvYjJpMvh",
"timeCreated":"2021-06-21 09:06:01",
"sourceDetails":{
"sourceType":"OBJECT_STORAGE",
"path":"article_1.txt"
},
"annotations":[
{
"id":"ocid1.datalabelingannotation.oc1.iad.amaaaaaazaehrjyafoed6oimxqxeyey6osjo3jp52vsyd75i5zspfvcfdz3q",
"timeCreated":"2021-30-21 03:30:10",
"createdBy":"ocid1.user.oc1..aaaaaaaazjupiis2cu54smlzemiujpqxriz6i4wp3euuqrzffdugib73epbq",
"entities":[
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":36,
"length":8
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Person"
}
],
"textSpan":{
"offset":147,
"length":23
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Organization"
}
],
"textSpan":{
"offset":196,
"length":3
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Event"
}
],
"textSpan":{
"offset":311,
"length":22
}
},
{
"entityType":"TEXTSELECTION",
"labels":[
{
"label_name":"Place"
}
],
"textSpan":{
"offset":512,
"length":49
}
}
]
}
]
}
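Each text entity above carries a textSpan with an offset and a length. The following is a minimal Python sketch for turning those spans back into labeled strings. It assumes the offset counts characters from the start of the source file and that the file content is already loaded as a string. The function name extract_entities and the sample sentence are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def extract_entities(text: str, record: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (label, text snippet) pairs for every entity that has a textSpan."""
    found = []
    for annotation in record.get("annotations", []):
        for entity in annotation.get("entities", []):
            span = entity.get("textSpan")
            if span is None:
                continue  # for example, GENERIC entities carry no span
            start = span["offset"]
            snippet = text[start:start + span["length"]]
            for label in entity.get("labels", []):
                found.append((label["label_name"], snippet))
    return found


# Made-up source text and a record reduced to the fields the function reads.
text = "Alice studies at Oxford University in England."
record = {
    "annotations": [{
        "entities": [{
            "entityType": "TEXTSELECTION",
            "labels": [{"label_name": "Organization"}],
            "textSpan": {"offset": text.index("Oxford"), "length": 17},
        }]
    }]
}
print(extract_entities(text, record))
```

If the service counts offsets in bytes or in another unit, the slicing step would need to be adapted accordingly.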
An example of a JSON file for an exported CSV (text) dataset.
{
"id": "ocid1.datalabelingdatasetint.oc1.phx.amaaaaaaniob46iaxarhafiu42tbdm2d2nkxlkxwhnc76ohnwvpsdfccqw5q",
"compartmentId": "ocid1.compartment.oc1..aaaaaaaaundh4v2w4spnyt4hgy367qf54jonakpz6gh573bspmgzfoj2auga",
"displayName": "Text Classification CSV dataset",
"labelsSet": [{
"name": "positive"
}, {
"name": "neutral"
}, {
"name": "negative"
}],
"annotationFormat": "SINGLE_LABEL",
"datasetSourceDetails": {
"namespace": "idgszs0xipmn",
"bucket": "TEST",
"prefix": "languageteam/Text_Classification_Context_Oracle_advt.csv"
},
"datasetFormatDetails": {
"formatType": "TEXT",
"textFileTypeMetadata": {
"formatType": "DELIMITED",
"delimitedFileTypeMetaData": {
"columnIndex": 5,
"columnName": "CONTENT",
"columnDelimiter": ","
}
}
}
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iajx42mojwkktind744i3t2q3di6tdhwysw2wy4d42tseq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/546"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iadsu6zpch4lvozx7ci3as5st23jqxjpjdcryp4jworala",
"timeCreated": "2022-06-05 05:40:48",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia7otgs2rb3kuh464sisfbjxxbbkb65sbg2icst3gquw3q",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/303"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iatfuceqzjb5nnh7quk5wupvwe74bfpn5oka57cz6gqv4a",
"timeCreated": "2022-06-05 05:41:30",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iab55fqcxlfb3xszlpp7qnpsthjdhzzb7nki65xqdvgceq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/547"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iamosgunt72lci3g3mzyyx2sskjdje4e5zspts7mbnsl5q",
"timeCreated": "2022-06-05 05:41:36",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "neutral"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia45ave4zhtisvu2k7d6tbciskcge4ecm2imb6bvdqe4da",
"timeCreated": "2022-06-05 04:39:21",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/564"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iauqo6tlqil7vijetsayt6vsmpohxum5vmj6cde3wbfxua",
"timeCreated": "2022-06-05 05:40:44",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iasymkpbstgjwmae7ar5ikgp5mtth2izcaaaruatpl45ma",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/545"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iatu6k7afdwirdtvv6bofrquc65m4ruet4hlfmhgzhqjxa",
"timeCreated": "2022-06-05 05:41:02",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46ia6n4whohdhn257pmot7zlncawockthadosdhrp5so2nna",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/304"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iaslgb6s6h5ffce5mcgeidndp3vydcxzjya7yrbaj6pw5a",
"timeCreated": "2022-06-05 05:40:57",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "negative"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iamgsncrjarzujr6duaedmsjyrp67yi7dpe2uoi6h54c5a",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/548"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46iabt3hwyc7mkaanez7q24k7vlfds3lisa6hdu53hntq2qq",
"timeCreated": "2022-06-05 05:42:55",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "positive"
}]
}]
}]
} {
"id": "ocid1.datalabelingrecord.oc1.phx.amaaaaaaniob46iactsl4j7v633d2y2t67lkxawv2nyemz7wwarppjpxeofq",
"timeCreated": "2022-06-05 04:39:18",
"sourceDetails": {
"sourceType": "OBJECT_STORAGE",
"path": "/305"
},
"annotations": [{
"id": "ocid1.datalabelingannotation.oc1.phx.amaaaaaaniob46ia7xxg4ukky3ur56zzwaodvwrks4vqgvoug2z2moif274a",
"timeCreated": "2022-06-05 05:41:44",
"createdBy": "ocid1.user.oc1..aaaaaaaaavjgmgh67ndbznlhnuxhzswfbwcpd5tlvugskeeqt7noudcu7xha",
"entities": [{
"entityType": "GENERIC",
"labels": [{
"label_name": "negative"
}]
}]