If you have ever tried to embed part of the Git commit hash in your software version, you may have noticed that it is not easy (or even possible). For example, imagine you want to version your software as “0.0.0-aabbccdd”, where “aabbccdd” is the hash prefix of the very commit that defines this version.
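It turns out that a Git commit hash is the SHA-1 of the commit object, which combines several pieces of information: the repository tree hash, the parent commit hash, the author and committer names and emails, Unix timestamps, timezone offsets, and the commit message. Writing the hash into a tracked file changes the tree hash, which in turn changes the commit hash. You can inspect this information with git cat-file: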
$ git cat-file -p 97dbf1d711ef1c76779735b486546a0bdf75dc13
tree 8352f2c439b16c85e2c8e9fe2af8db9f303933ab
parent 9e32c286db179de34c5a4f0de8596dd8f4c65f08
author Author Name <author@email.com> 1616706238 -0300
committer Committer Name <author@email.com> 1616706238 -0300

Commit message
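To make the relationship between these fields and the hash concrete, here is a minimal Python sketch of how Git derives an object ID: it hashes a short header (the object type, the body length in bytes, and a NUL byte) followed by the body. The function name git_object_hash is my own, and the names and emails in the body are the placeholders from the output above, so the printed hash is not expected to match 97dbf1d7.

```python
import hashlib


def git_object_hash(kind: bytes, body: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to an object of this kind."""
    # Git prepends "<type> <size>\0" to the raw body before hashing.
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Body of a commit object: header lines, a blank line, then the message.
# The values are the placeholder fields shown in the cat-file output.
commit_body = (
    b"tree 8352f2c439b16c85e2c8e9fe2af8db9f303933ab\n"
    b"parent 9e32c286db179de34c5a4f0de8596dd8f4c65f08\n"
    b"author Author Name <author@email.com> 1616706238 -0300\n"
    b"committer Committer Name <author@email.com> 1616706238 -0300\n"
    b"\n"
    b"Commit message\n"
)

print(git_object_hash(b"commit", commit_body))
```

Changing a single byte in any of these fields produces a completely different hash, which is what makes the search described next possible.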
Many of those fields can be manipulated to search for a given commit hash. You could change a file’s content to influence the tree hash, alter the author information or the commit time, or even add some random data to the commit message. Finding a full SHA-1 collision is possible, but not cheap. The first published collision (SHAttered) used the equivalent of 6,500 years of single-CPU computation and 110 years of single-GPU computation. Since that is not an investment we want to make for tagging software versions, we’ll stick to short prefixes.
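A short prefix is cheap because each hexadecimal digit has 16 possible values: matching n digits takes about 16^n attempts on average, so roughly 4,096 for three digits and about 4.3 billion for eight. The following Python sketch illustrates the idea by appending a counter line to the commit message until the hash starts with the wanted prefix. It is only an illustration under my own assumptions (the "nonce:" line, the function names, and the placeholder commit fields are hypothetical) and not necessarily how git-wormhole performs its search.

```python
import hashlib


def commit_hash(body: bytes) -> str:
    """SHA-1 of a commit object: 'commit <size>\\0' header plus the body."""
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()


def find_prefix(base: bytes, prefix: str, max_tries: int = 1_000_000):
    """Try counter values in the message until the hash starts with prefix.

    Returns (nonce, digest) on success, or None if max_tries is exhausted.
    """
    for nonce in range(max_tries):
        # Hypothetical convention: the counter goes in its own message paragraph.
        body = base + b"\nnonce: %d\n" % nonce
        digest = commit_hash(body)
        if digest.startswith(prefix):
            return nonce, digest
    return None


# Placeholder commit fields, taken from the example output earlier in the post.
base_body = (
    b"tree 8352f2c439b16c85e2c8e9fe2af8db9f303933ab\n"
    b"parent 9e32c286db179de34c5a4f0de8596dd8f4c65f08\n"
    b"author Author Name <author@email.com> 1616706238 -0300\n"
    b"committer Committer Name <author@email.com> 1616706238 -0300\n"
    b"\n"
    b"Commit message\n"
)

result = find_prefix(base_body, "abc")
print(result)
```

To turn a match into a real commit, the winning body would still have to be written into the repository as a commit object. The sketch stops at the search itself.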
As a proof of concept, I implemented git-wormhole to create files containing the commit hash prefix. In this repository you can see the README with the commit hash inside. You can find more information about generating your own collisions here.