Best Practices for Storing Large Files and Binaries

If one or more of your project's repositories contain large files or many binaries, consider storing those files in Oracle Cloud storage, a Maven repository, or Git LFS instead of versioning them directly in Git.

Here are a few reasons why you shouldn't store binary files in a Git repository:
  • Git can't produce a meaningful diff of binary files. By default, it only reports that two versions differ.
  • Large files grow your repository's history every time they're updated, because Git has to keep every version of every binary file, and binary content usually doesn't compress well as deltas. Large binaries quickly become the largest items in the repository.
  • As the repository grows, Git operations such as cloning, fetching, and pulling become much slower.
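
To find files that might be better stored outside Git, you can scan a working tree for files that are large or look binary. The following Python code is a minimal sketch, not a definitive tool. The function names and the 5 MB default threshold are illustrative assumptions, and the NUL-byte check only approximates the kind of heuristic Git uses to classify content as binary.

```python
import os

def looks_binary(data: bytes) -> bool:
    """Heuristic: treat content as binary if its first 8000 bytes contain a NUL byte."""
    return b"\x00" in data[:8000]

def find_candidates(root: str, size_limit: int = 5 * 1024 * 1024):
    """Yield (path, size, is_binary) for files that are large or look binary."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # for example, a broken symbolic link
            size = os.path.getsize(path)
            with open(path, "rb") as handle:
                is_binary = looks_binary(handle.read(8000))
            if is_binary or size > size_limit:
                yield path, size, is_binary

# Example use (the path "." is an assumption; point it at your working tree):
#   for path, size, is_binary in find_candidates("."):
#       print(size, "binary" if is_binary else "large text", path)
```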

In general, how you store a binary file should depend on its type and how it's used:
  • If the binary file is a build artifact, store it in a Maven repository directly, as part of the build that produces it.
  • If the binary file is a binary document that is saved by an application, such as Microsoft Excel or Word, it should be stored in either:
    • Oracle Cloud storage
    • Git LFS, if the binary documents are somehow associated with source files stored in the same Git repository

When files or file types are marked as LFS files, Git LFS stores small pointer files in the repository instead of the physical files. When you pull a Git LFS file to your local repository, a filter replaces the pointer with the actual file content. The physical files are stored on the remote server, and the files you pull are kept in a cache on your local machine. As a result, your local repository is smaller than the remote repository, where every version of every file is still physically stored. See Enable and Use Git LFS to Version Large Files for more information about using Git LFS.
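
To make the idea of a pointer concrete, here is a small Python sketch that builds the text of a pointer for some file content. It assumes the pointer layout described in the Git LFS specification (a version line, a SHA-256 object ID, and the size in bytes). The helper names make_pointer and parse_pointer are hypothetical and aren't part of Git LFS itself.

```python
import hashlib

LFS_SPEC_LINE = "version https://git-lfs.github.com/spec/v1"

def make_pointer(content: bytes) -> str:
    """Build the text of an LFS-style pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return f"{LFS_SPEC_LINE}\noid sha256:{oid}\nsize {len(content)}\n"

def parse_pointer(text: str) -> dict:
    """Split a pointer into its key/value fields, with the size as an integer."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    fields["size"] = int(fields["size"])
    return fields

# Stand-in for a large binary; real content would be read from a file on disk.
content = bytes(1024 * 1024)
pointer = make_pointer(content)
print(pointer)
print(len(pointer), "bytes in the pointer versus", len(content), "bytes of content")
```

Only the short pointer text is versioned by Git, which is why the local repository stays small even when the file itself is large.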

Consider these potential drawbacks as you decide whether to use Git LFS:

  • The local Git LFS cache won't be cleaned up automatically. Just as you have to prune remote branches on a regular basis, you also have to prune your Git LFS content with the git lfs prune command.
  • All developers need to have Git LFS installed. If someone who doesn't have Git LFS installed commits a file that should be associated with Git LFS, the file goes into the repository as a regular Git object instead of a pointer, and other developers can run into confusing errors when they work with it. These problems can be fixed, but it's better to prevent them from happening.
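
One way to catch files that were committed without Git LFS is to check whether content matching an LFS pattern is actually a pointer. The Python sketch below illustrates the idea under simplifying assumptions. The function names and the committed dictionary (path mapped to committed content, however you obtain it) are hypothetical. Patterns are matched against the file name only, which is much simpler than Git's real attribute matching.

```python
from fnmatch import fnmatch
import posixpath

POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1\n"

def lfs_patterns(gitattributes_text: str) -> list:
    """Return the path patterns that .gitattributes assigns to the LFS filter."""
    patterns = []
    for line in gitattributes_text.splitlines():
        parts = line.split()
        if parts and not parts[0].startswith("#") and "filter=lfs" in parts[1:]:
            patterns.append(parts[0])
    return patterns

def unconverted_files(committed: dict, patterns: list) -> list:
    """List paths that match an LFS pattern but whose committed content is not a pointer."""
    return sorted(
        path for path, blob in committed.items()
        if any(fnmatch(posixpath.basename(path), p) for p in patterns)
        and not blob.startswith(POINTER_PREFIX)
    )

# Sample data for illustration only.
attributes = "*.psd filter=lfs diff=lfs merge=lfs -text\n"
committed = {
    "art/logo.psd": POINTER_PREFIX + b"oid sha256:" + b"0" * 64 + b"\nsize 10\n",
    "art/banner.psd": b"8BPS\x00\x01 raw image bytes",  # committed without Git LFS
    "README.md": b"# docs\n",
}
print(unconverted_files(committed, lfs_patterns(attributes)))  # ['art/banner.psd']
```

A check like this could run in a build or review step so that the problem is caught before other developers pull the affected commit.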

To summarize, the best approach is to store large files and binaries in Oracle Cloud storage or a Maven repository whenever possible. If you absolutely need to version these types of files, consider using Git LFS, but recognize that it isn't a panacea: it has its own costs, especially for maintenance.