Run NVIDIA NeMo Framework Training Jobs
NVIDIA NeMo Framework Launcher is a cloud-native tool for launching end-to-end NeMo Framework
training jobs across thousands of GPUs for large-scale LLM training. In this example we use
the NeMo Framework Launcher to run the data preparation and training stages for the gpt3_5b
large language model.
See the NVIDIA documentation for more details on both NeMo and the NeMo Framework Launcher.
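As a sketch of what launching the data-preparation stage looks like, the launcher exposes a Hydra-based CLI driven by `main.py`. The config name (`gpt3/download_gpt3_pile`), paths, and override keys below are assumptions for illustration; check your launcher checkout and the NVIDIA documentation for the exact values.

```shell
# Illustrative sketch only: run the data_preparation stage of the
# NeMo Framework Launcher. Paths and config names are placeholders.
python3 main.py \
    stages=[data_preparation] \
    data_preparation=gpt3/download_gpt3_pile \
    launcher_scripts_path=/path/to/launcher_scripts \
    data_dir=/path/to/data \
    data_preparation.file_numbers='0-29' \
    data_preparation.rm_downloaded=True \
    data_preparation.rm_extracted=True
```

The `file_numbers` override selects which of the dataset's shards to process, which is what allows the work to be spread across multiple nodes.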
Data Preparation
The data preparation stage performs three tasks: downloading the uncopyrighted Pile dataset, extracting (uncompressing) the data, and preprocessing the data.
On a two-node cluster the steps took:
- 90 minutes to download
- 46 minutes to extract
- 5 hours 45 minutes to preprocess

Run times would be substantially lower on a larger cluster, where the data shards can be parallelized across up to 30 nodes.
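The timings above can be turned into a rough scaling estimate. This is a back-of-envelope sketch assuming the 30 shards parallelize near-linearly (an idealization; real runs hit network and storage limits), not a measured result:

```python
# Back-of-envelope estimate of data-prep wall-clock time under ideal
# linear scaling of shard-parallel work. Assumes a 2-node baseline and
# at most 30 nodes (one per data shard); this is an illustration, not
# a measurement.
def estimate_minutes(two_node_minutes: float, nodes: int, max_nodes: int = 30) -> float:
    """Ideal linear scaling of shard-parallel work from a 2-node baseline."""
    speedup = min(nodes, max_nodes) / 2  # relative to the 2-node run
    return two_node_minutes / speedup

# Observed two-node timings from the run above, in minutes.
two_node = {"download": 90, "extract": 46, "preprocess": 5 * 60 + 45}
total = sum(two_node.values())           # 481 minutes end to end on 2 nodes
ideal_30 = estimate_minutes(total, 30)   # ~32 minutes under ideal scaling
print(total, round(ideal_30, 1))
```

In practice the download step is bounded by the upstream server and network, so only the extract and preprocess steps approach this ideal.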