I think it’s generally agreed that large files that change often do not belong in a Git repo, while small files that never change are fine. But there’s still a lot of middle ground where the answer is not so clear to me.
So what’s your stance on this? Where do you draw the line?
The main downside is that Git downloads all history by default, so any large file will forever bloat the download for anyone cloning your repo. It isn’t about binary vs text. It’s just the size that matters.
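To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes the worst case, where every revision of a file is stored more or less whole because it neither delta-compresses nor shrinks under zlib. The 50 MiB asset and the 40 revisions are made-up figures, and `worst_case_history_bytes` is a hypothetical helper, not anything Git provides.

```python
MIB = 1024 * 1024


def worst_case_history_bytes(file_size_bytes: int, revisions: int) -> int:
    """Upper bound on history cost when no revision compresses against another."""
    return file_size_bytes * revisions


# Hypothetical example: a 50 MiB asset that has been committed 40 times.
total = worst_case_history_bytes(50 * MIB, 40)
print(f"{total / MIB:.0f} MiB of history in the worst case")  # 2000 MiB
```

Every full clone pays for that total, even if only the latest revision is ever used.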
If it’s a build artifact, put it in a registry. If it’s resource-type files, Git LFS can be used, as long as there isn’t an absolute ton of them.
This. If the file can be generated from the repository, it should not be put inside it, but if you need it to build the project, it should (unless it is an easy-to-install external dependency, which should be declared in a Readme file).
FYI, there’s a fun project called git-annex, designed for syncing large files, that uses Git under the hood. Fun fact, it’s written in Haskell as well.
I don’t like it, but if they’re part of the project files, then they belong in version control. I do worry about the challenges of combining the difficult-to-merge nature of binaries with the distributed workflows that Git encourages. While data doesn’t get lost, the inability to merge them may mean that someone needs to spend extra time re-performing their changes if they “lose” the push/merge race.
Game engines have been doing a better job of transitioning away from large monolithic binaries by either serializing them in somewhat mergeable text files or at least splitting them into large numbers of smaller binaries to reduce file contention.
Git LFS does offer the ability to off-load them from the repository, which reduces download and checkout times, as well as the ability to lock files (which does introduce centralization…). But it doesn’t seem to be as ubiquitous, and it can be more expensive to use, depending on the team’s options for Git repo providers.
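For reference, Git LFS tracking and locking are both configured through `.gitattributes`. Below is a small Python sketch that builds the kind of lines `git lfs track` normally writes. `lfs_attribute_line` is a hypothetical helper, the patterns are only examples, and it assumes the patterns contain no spaces (those would need escaping).

```python
def lfs_attribute_line(pattern: str, lockable: bool = False) -> str:
    """Build one .gitattributes line that routes `pattern` through Git LFS."""
    attrs = ["filter=lfs", "diff=lfs", "merge=lfs", "-text"]
    if lockable:
        # Marks matching files as lockable, so they are read-only until locked.
        attrs.append("lockable")
    return f"{pattern} {' '.join(attrs)}"


# Example patterns (assumptions, pick whatever your project actually uses).
lines = [
    lfs_attribute_line("*.psd", lockable=True),
    lfs_attribute_line("*.wav"),
]
print("\n".join(lines))
```

In practice you would just run `git lfs track`, which edits `.gitattributes` for you. The sketch only shows what ends up in the file.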
Note: I assume you mean binaries as in “non-text files”, not build artifacts, which definitely don’t belong in version control at all.
I’ll go to quite a bit of effort to avoid them. Arguably too much effort, but I often find that the path that avoids them is also useful in other ways.
For example, for a personal project, I automated rendering a PNG fallback icon from an SVG, so now I can have as many different resolutions as I want and don’t need to update them manually if I want to tweak the icon.
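A minimal sketch of what such automation could look like, in Python. The post doesn’t say which tool was used, so this assumes `rsvg-convert` (from librsvg) is installed. The file names, the `icon-<size>.png` naming scheme, and the helpers `render_commands` and `render_all` are all hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def render_commands(svg: str, out_dir: str, sizes: list[int]) -> list[list[str]]:
    """Build one rsvg-convert command per square output size."""
    return [
        ["rsvg-convert", "-w", str(size), "-h", str(size), svg,
         "-o", str(Path(out_dir) / f"icon-{size}.png")]
        for size in sizes
    ]


def render_all(svg: str, out_dir: str, sizes: list[int]) -> None:
    """Run the commands; fails early if rsvg-convert is not on PATH."""
    if shutil.which("rsvg-convert") is None:
        raise RuntimeError("rsvg-convert not found (assumed renderer)")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in render_commands(svg, out_dir, sizes):
        subprocess.run(cmd, check=True)


# Only print the commands here; call render_all() from a build step instead.
for cmd in render_commands("icon.svg", "build/icons", [16, 32, 128]):
    print(" ".join(cmd))
```

With this, only the SVG lives in the repo and the PNGs are regenerated by the build.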
I’d also like to publish a screenshot of the project. The simple solution is to check a PNG into the repo and link it in the README.md. But what would be a lot nicer is to set up a project webpage, which with Codeberg Pages isn’t even that much effort, though I would have less motivation to do it otherwise.
I think the only binaries I have are tiny samples used by a couple of tests in that repo. I generally try to avoid them altogether.
Never do this.
Git is all about tracking changes over time, which is meaningless with binary files. They are bloat for your repo, slowing down operations. Depending on the repo, they are likely to change from CI with every commit. That last one means that every commit turns into 2 commits, btw. They can ruin diffs. I could go on for a long time here.
There are basically 0 upsides. Use an artifact repository instead!
“Git is all about tracking changes over time which is meaningless with binary files.”
Utter codswallop. You can see the changes to a PNG over time. Lots of different UIs will even show you diffs for images.
Git can track changes to binary files perfectly well. It might not be great at dealing with conflicts in them but that’s another matter.
The only issue is that binary files tend to be large, and often don’t compress very well with Git’s delta compression. It’s large files that are the issue, not binary files. If you have a 20 kB binary file it’s going to be absolutely fine in Git. Likewise a 10 GB CSV file is not going to be such a good idea.
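If size is the real criterion, a check can ignore file type entirely. Here is a rough Python sketch of a scan you might run before committing. The helper name `oversized_files` and the 50 kB limit are assumptions for illustration, not a recommended threshold.

```python
import os
from pathlib import Path


def oversized_files(root: str, limit_bytes: int) -> list[tuple[Path, int]]:
    """Return (path, size) for every file under root larger than limit_bytes."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own object store; prune in place so os.walk honours it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            size = path.lstat().st_size  # lstat: do not follow symlinks
            if size > limit_bytes:
                hits.append((path, size))
    # Largest first, so the worst offenders are at the top.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical usage: flag anything over 50 kB in the current directory.
for path, size in oversized_files(".", 50_000):
    print(f"{size:>12,d}  {path}")
```

A 20 kB binary passes and a huge CSV gets flagged, which matches the point above: the extension never enters into it.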
I think assets like app icons are ok. They rarely change, and are often quite small. It’s convenient to have those kinds of things bundled together with the code.