Clean an External Repository Before Pushing It to a Project

If you’ve been using a Git repository on another platform, such as GitHub or Bitbucket, and are ready to have VB Studio manage it, it's a good idea to clean up the repository before you push it to VB Studio. In particular, deal with large files (more than 50 MB): either remove them altogether or store them with Git Large File Storage (LFS). If a Git repository contains too many large files, operations such as cloning, fetching, and pulling can become extremely slow over time.

Let's look at the process.

Remove Large Files from an External Git Repository

Use Git commands to remove large files from an external Git repository before pushing it to VB Studio.

A file is considered large if it exceeds 50 MB, which is the maximum object size for a VB Studio repository. If you push an external Git repository to VB Studio for the first time and any commit in its history contains a file larger than 50 MB, you'll get an error. Your organization may choose an even lower threshold, such as 25 MB, to manage repository size more stringently.
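To check up front whether a repository's history contains any blob over a size limit, you can filter the output of the same rev-list/cat-file pipeline used in Step 4 by size. The following is a minimal sketch, not a VB Studio tool: the function name list_large_blobs is hypothetical, and its default of 52428800 bytes assumes that 50 MB means 50 × 1024 × 1024 bytes. Pass a different byte count if your threshold differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every blob reachable from any ref whose size is
# above a byte threshold, as "<size-in-bytes> <path>", smallest first.
# Assumption: the default of 52428800 bytes treats 50 MB as 50 * 1024 * 1024.
list_large_blobs() {
  local threshold="${1:-52428800}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v limit="$threshold" '
      # Keep only blobs over the limit. The path starts after the type,
      # object name, and size fields and their three separating spaces.
      $1 == "blob" && ($3 + 0) > (limit + 0) {
        print $3, substr($0, length($1) + length($2) + length($3) + 4)
      }' |
    sort -n
}
```

For example, running list_large_blobs from the root of a clone prints one line per oversized blob; an empty result means nothing in the history exceeds the limit.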
Before you begin, tell relevant users that you’re cleaning up the external repository and ask them to check in any pending changes.
Here’s how to identify and remove large files in an external Git repository:
  1. Open a Git client (such as the Git CLI) and use --mirror to clone an exact copy of your external repository, preserving all references, branches, and tags. By default, this creates a bare repository in a directory named after the repository, with a .git suffix:
    git clone --mirror <repository-url>
  2. Create a backup of the mirror clone:
    cp -r <repository-root-directory> <backup-directory>
    This ensures you can restore your repository if something goes wrong. You'll also have a copy of the large files.
  3. Go to the root directory of the clone you just created:
    cd <repository-root-directory>
  4. To identify large files:
    1. In the root directory of the clone, create a text file named files-to-remove.txt. You'll use this file to record the large files you want to get rid of.
    2. Run:
      git rev-list --objects --all | 
      git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | 
      sed -n 's/^blob //p' | 
      sort --numeric-sort --key=2 | 
      cut -c 1-12,41-
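    3. The output lists each blob's abbreviated object ID, its size in bytes, and its path, sorted from smallest to largest. Identify every file over your chosen threshold (here, 50 MB) and copy its path to files-to-remove.txt, one path per line.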
  5. Delete the large files you've identified:
    1. Create a temporary directory with write permissions, /tmp/a/scratch. You'll need this directory in the next step.
      mkdir -p /tmp/a/scratch
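    2. For each large file you identified (see files-to-remove.txt for the list), run the following command. The -f option is needed because git filter-branch refuses to start when its temporary directory already exists (as /tmp/a/scratch does after the previous substep) or when backup refs from an earlier run remain under refs/original/:
      git filter-branch -f --prune-empty -d /tmp/a/scratch \
        --index-filter "git rm --cached -f --ignore-unmatch <path-to-file-to-remove>" \
        --tag-name-filter cat -- --all
    3. Delete the backup refs that git filter-branch saves under refs/original/. Until you do, they keep the large files reachable, so the files still show up in the check in Step 6 and can't be pruned in Step 7:
      git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin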
  6. Verify all the large files you want removed are gone.
    1. Run the same command you used in Step 4:
      git rev-list --objects --all |
      git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | 
      sed -n 's/^blob //p' | 
      sort --numeric-sort --key=2 | 
      cut -c 1-12,41-
    2. In the output, look for files larger than your chosen threshold (here, 50 MB).
      If you don't see any, the purge was successful. If you do, repeat Step 5 for those files.
  7. Now that you've removed large files, clean up any remaining references to free up space:
    1. To expire all reflog entries, which can still reference the deleted files:
      git reflog expire --expire=now --all
    2. To permanently remove unreferenced objects (deleted files) and free up space:
      git gc --aggressive --prune=now
  8. In VB Studio, create an empty Git repository. Then, push the clean external Git repository to the empty project Git repository you just created.
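Now you can tell all external repository users that the cleanup is complete. Because the cleanup rewrote the repository's history, users should start from a fresh clone of the VB Studio repository rather than reuse their existing clones. If you're going to use Git LFS to add large files back, ask users to re-clone only after these files are re-added.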
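The removal steps above can be condensed into a single script. This is a minimal sketch under several assumptions, not an official procedure: the function name purge_large_files and the SCRATCH_DIR override are hypothetical; it runs from the root of the mirror clone; files-to-remove.txt holds one path per line; and, because Step 8 doesn't name a push command, it assumes git push --mirror to the empty VB Studio repository URL. Instead of one git filter-branch run per file, it removes every listed path in a single pass. It passes -f, since git filter-branch refuses to start when its temporary directory or an earlier refs/original/ backup already exists, and it deletes those refs/original/ backups before pruning, since they would otherwise keep the large blobs reachable.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Steps 5 through 8 as one function, purge_large_files.
# Assumptions: run from the root of the mirror clone; the list file has one
# path per line; the optional second argument is the empty VB Studio
# repository URL, pushed with `git push --mirror`.
purge_large_files() {
  local list="$1" remote_url="$2"
  local scratch="${SCRATCH_DIR:-/tmp/a/scratch}"

  # filter-branch evaluates the index filter from its scratch directory,
  # so pass the list as an exported, absolute path.
  PURGE_LIST="$(cd "$(dirname "$list")" && pwd)/$(basename "$list")"
  export PURGE_LIST
  # Skip filter-branch's deprecation notice and pause (Git 2.24 and later).
  export FILTER_BRANCH_SQUELCH_WARNING=1

  # One pass over all refs: untrack every listed path in every commit.
  # -f lets filter-branch start even if the scratch directory or an earlier
  # backup under refs/original/ already exists.
  git filter-branch -f --prune-empty -d "$scratch" \
    --index-filter 'while IFS= read -r p || [ -n "$p" ]; do
        if [ -n "$p" ]; then git rm --cached -q -f --ignore-unmatch -- "$p"; fi
      done < "$PURGE_LIST"' \
    --tag-name-filter cat -- --all || return 1

  # Delete the refs/original/ backups; they would keep the old blobs reachable.
  git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin

  # Step 7: expire reflogs, then prune the unreferenced objects.
  git reflog expire --expire=now --all
  git gc --aggressive --prune=now --quiet

  # Step 8: push all refs (branches and tags) to the empty repository.
  if [ -n "$remote_url" ]; then
    git push --mirror "$remote_url"
  fi
}
```

For example, from the mirror clone's root: purge_large_files files-to-remove.txt <vbs-repository-url>. Git's own documentation recommends alternative tools such as git filter-repo over git filter-branch for rewriting history; the sketch keeps filter-branch only to match the steps above.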

Use Git LFS to Add Large Files Back to a Git Repository

Once the initial push to VB Studio is complete, you can use Git Large File Storage (LFS) to add the large files you removed back to the repository. Git LFS replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server. When you use Git LFS with VB Studio, the file contents are stored in an OCI Object Storage bucket.

Before you begin, verify that your VB Studio instance is authorized to access OCI Object Storage. This matters because Git LFS needs an OCI Object Storage bucket to hold the large files. Here's how to check:
  1. In the VB Studio left navigator, click Organization > OCI Account, and look for this message: VBS instance configured in OCI console.
    • If the message appears, your instance has access to OCI Object Storage. You can go ahead and use Git LFS to add large files back to the repository.
    • If the message doesn't appear, your instance doesn't have access to OCI Object Storage. Before proceeding, you or an administrator will need to enable this access for your VB Studio instance.

Here’s how to use Git LFS to add files back to a repository:

  1. Copy the Git repository URL: on the Git page, select the Git repository from the Repositories drop-down list, then, from the Clone drop-down list, click the Copy icon (Copy to clipboard).
  2. Open a Git client (such as the Git CLI) and navigate to the directory where you want to clone the remote Git repository.
    If the directory into which you want to clone the repository isn't empty, you'll need to create a new subdirectory and clone the repository into it. You can only perform a cloning operation into an empty directory.
  3. Clone the repository, go to its root directory, and check out the branch you want to use:
    1. git clone <repository-url>
    2. cd <repository-root-directory>
    3. git checkout <branch-name>
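  4. Copy the large files you previously deleted into the branch you checked out, placing each one at the appropriate path in the repository.
    You can copy the files from the backup you created. Because the backup is a copy of a mirror (bare) clone, it holds the files as Git objects rather than as ordinary files. One way to extract a file from it:
      git --git-dir=<backup-directory> cat-file blob <branch-name>:<path-to-large-file> > <path-to-large-file>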
  5. Install Git LFS in the local repository:
    git lfs install
  6. Tell Git LFS which large files to manage. You can get the file names from files-to-remove.txt, but you'll need to adjust the file paths to match your local VB Studio repository's file structure. For each file, run:
    git lfs track <path-to-large-file>
    You'll see a new file, .gitattributes, in the root directory.
  7. Stage the files to be committed:
    git add .
    git add .gitattributes
    To verify that Git LFS is tracking the correct files, run git lfs ls-files and make sure the files it lists match those in files-to-remove.txt (allowing for any path adjustments you made in Step 6).
  8. Commit your changes locally.
    git commit -m "Initial commit using LFS"
  9. Check out the main branch and push your changes to the remote repository:
    git checkout main
    git push -u --all
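As a rough sketch, Steps 4 through 8 above could be scripted as follows. It's an illustration under stated assumptions, not a VB Studio tool: the function name readd_with_lfs is hypothetical; it assumes you run it in the clone from Step 3 with the branch checked out, that the list file holds one path per line and each path is the same in the backup and in this repository, and that the backup is the mirror-clone copy from the cleanup procedure, so files are read out of its object database with git cat-file rather than copied. It stages only the listed files and .gitattributes instead of running git add ., and it stops before the push in Step 9.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: restore the listed files from a bare backup, track them
# with Git LFS, and commit. Arguments: list file, backup directory, and the
# branch or commit in the backup that still contains the files.
readd_with_lfs() {
  local list="$1" backup="$2" rev="$3" p
  if ! command -v git-lfs >/dev/null 2>&1; then
    echo "git-lfs is not installed" >&2
    return 1
  fi

  # Step 5: install the LFS hooks and filters. --local keeps the filter
  # settings in this repository's config; plain `git lfs install` sets them
  # for your user account instead.
  git lfs install --local || return 1

  while IFS= read -r p || [ -n "$p" ]; do
    if [ -z "$p" ]; then continue; fi
    mkdir -p "$(dirname "$p")"
    # Step 4: restore the file's contents from the backup's object database.
    git --git-dir="$backup" cat-file blob "$rev:$p" > "$p" || return 1
    # Step 6: record the path in .gitattributes so the LFS filter applies.
    git lfs track "$p" >/dev/null || return 1
    # Step 7: stage the file; the LFS clean filter stores a pointer in Git.
    git add -- "$p" || return 1
  done < "$list"

  git add .gitattributes
  # Step 8: commit locally; pushing (Step 9) is left to you.
  git commit -q -m "Initial commit using LFS"
}
```

For example: readd_with_lfs files-to-remove.txt <backup-directory> <branch-name>, followed by the push in Step 9.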
Now, in the remote repository, verify that large files are tracked by Git LFS:
  1. In VB Studio, go to the Git page, and select your repository and branch from the Repositories drop-down list.
  2. Click any large file you tracked in Step 6. If Git LFS is working, you’ll see a pointer instead of the file.
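You can run a similar check from your local clone. A Git LFS pointer file begins with the line version https://git-lfs.github.com/spec/v1, so the hypothetical helper is_lfs_pointer below reads the blob that Git stores for a path at a given revision and compares its first line. It's a minimal sketch: it doesn't validate the rest of the pointer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if the blob committed at
# REV:PATH is a Git LFS pointer rather than the file's real contents.
# A pointer file's first line names the LFS pointer spec version.
is_lfs_pointer() {
  local rev="$1" path="$2" first_line
  # Read only the first line of the blob Git stores for this path;
  # a missing path yields an empty string.
  first_line="$(git cat-file blob "$rev:$path" 2>/dev/null | head -n 1)"
  [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}
```

For example, after pushing, is_lfs_pointer origin/main <path-to-large-file> && echo "stored as an LFS pointer" checks the commit you pushed.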