Wednesday, July 29, 2015

git vs svn - pulling in external repositories

I have had the pleasure of using both Subversion (svn) and git. I come from a background of extensive Subversion use, having administered repositories both personally and professionally for 8+ years. During that time I participated in the Subversion Users mailing list, both seeking and providing advice, and upgraded many repositories from one version to another. I would consider myself an expert in Subversion, at least versions 1.2 through 1.7.

In 2013 I started a new position, and the teams I have worked with since have used git. I have been using git extensively since then, and recently started implementing submodules in some newer git repositories. This post reflects my comparison of Subversion's svn:externals system (as it was 2 years ago at least, which I doubt has changed much since) and git's submodule system.

The end result of the two systems is the same - pulling one or more other repositories into a repository so that it may use them and rely on them. This is a favored model of mine when it comes to having public and private interfaces. I create a repository that contains the public interfaces, and all the various repositories implementing those interfaces pull the public interfaces repository in as a dependency. Interfaces internal to the library are kept in a separate section (for example, the include directory in the repository). The beauty of this model is that it allows you to create a consistent set of APIs that can be released; the implementing libraries can change their internals as long as they maintain the public interface. Further, it allows things like file formats to be abstracted easily, since the detailed information is hidden inside a project, not in the public interface - which can be as abstract as needed.

That's the use-case, but what's the difference between these two very good version control systems in providing the requisite functionality?

First, Subversion.

Subversion provides a series of textual content properties in a repository that are, like everything else, versioned as part of the repository. Change a property, and it creates a new revision in the repository. To support the functionality discussed above, Subversion provides a property called "svn:externals". The "svn:externals" property consists of multiple lines; each line describing a repository and where to store it.

Prior to version 1.5, the "svn:externals" property used a format that was specialized to the use case. In 1.5 and later, the format was revised to match that of the command-line "svn" interface. Furthermore, this change provided additional versioning capabilities. The original format had to specify a complete URL, just like one would use to access a repository; the new format can continue to do so, or it can use a relative URL - for instance, one pointing at a different portion of the tree in the same repository.
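To make the two formats concrete, here is a small sketch that writes out example property values for each. The example.com URLs, the directory names, and the revision number are all made up for illustration; only the ordering of the fields reflects the actual syntax.

```shell
# Old (pre-1.5) syntax: target directory first, then an optional -r revision,
# then an absolute URL. All URLs and paths here are hypothetical.
cat > externals-old.txt <<'EOF'
include/public-api http://svn.example.com/repos/public-api/trunk
vendor/codec -r148 http://svn.example.com/repos/codec/trunk
EOF

# New (1.5+) syntax: optional -r revision, then the URL, then the target
# directory. "^/" is a relative URL resolved against the repository root.
cat > externals-new.txt <<'EOF'
http://svn.example.com/repos/public-api/trunk include/public-api
-r148 http://svn.example.com/repos/codec/trunk vendor/codec
^/shared/formats include/formats
EOF

# Either file could then be applied to a working copy directory with:
#   svn propset svn:externals -F externals-new.txt .
```

Note that the old format puts the local directory first while the new one puts it last, which is the part that is easiest to get wrong when migrating.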

When one checked out a repository with additional repositories listed in the "svn:externals" property, Subversion would also automatically pull all of those repositories and place them in the specified places. Once the checkout was done, you were ready to use the contents of the repository - build the software, etc. No additional steps necessary.

Now, git.

git provides the functionality through the "submodule" sub-command. The "submodule" sub-command has a series of its own sub-commands which perform most of the various tasks on the external sources. git itself controls the entire internal format, though the submodule paths and URLs are stored in a text file called .gitmodules. git tracks the external source just like any other object - through its commit hash.
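For reference, here is roughly what a .gitmodules entry looks like. This sketch writes it to a file named sample.gitmodules (a name I picked so it cannot clobber a real .gitmodules), using the same placeholder URL and folder name as the "git submodule add" example later in this post.

```shell
# Write an illustrative .gitmodules entry to a scratch file.
# The URL and folder name are placeholders, not a real repository.
cat > sample.gitmodules <<'EOF'
[submodule "foldername"]
	path = foldername
	url = git://repository/url.git
EOF

# Show the result. Note that the pinned commit hash is NOT stored in this
# file; git records it in the main repository's tree at the submodule path.
cat sample.gitmodules
```

The file only says where the submodule lives and where to fetch it from; which commit to use comes from the main repository's own history.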

However, unlike Subversion, git does not automatically pull the "submodule" repositories when the main repository itself is cloned - this requires an extra step.

The solution? Projects using git create scripts that perform several of the git tasks automatically so that they don't have to remember them every time a repository is cloned.
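A minimal sketch of such a script might look like the following. The function name bootstrap_submodules and its messages are my own invention, not something git ships; the only real git piece is "git submodule update --init --recursive", which initializes and fetches submodules (including nested ones) in one go.

```shell
#!/bin/sh
# Hypothetical post-clone helper: fetch all submodules of a repository.
# Usage: bootstrap_submodules [repo-dir]   (defaults to the current directory)
bootstrap_submodules() {
    repo_dir="${1:-.}"

    # No .gitmodules file means the repository declares no submodules.
    if [ ! -f "$repo_dir/.gitmodules" ]; then
        echo "No .gitmodules in $repo_dir; nothing to do."
        return 0
    fi

    # --init registers any submodules not yet initialized;
    # --recursive also handles submodules nested inside submodules.
    git -C "$repo_dir" submodule update --init --recursive
}
```

After cloning, one would source the script and run "bootstrap_submodules" from the top of the repository, rather than remembering the individual commands.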

The difference for git is a fall-out of its design. Since a clone of a repository records all the information and looks nearly identical to a working copy, and a hosting server does not want to check out every submodule, fetching the submodules cannot be automatic.

In many respects, I feel the functionality in Subversion to be superior in this area as it's a pain to figure out what to do and how to interact with the git submodules. Everyone talks about how the commands work, but no one really goes through the complete steps of using it.

So, to clear the air a little on git submodules:


1. To add a submodule:

myrepo $ git submodule add git://repository/url.git foldername

2. After checking out a repository containing submodules:

myrepo $ git submodule init
myrepo $ git submodule update

3. To update a submodule:

- the submodule can be managed just like any other git repository
- when the submodule is in the desired state, just add it like any other git resource and commit it with the rest of the changes.
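
Since this is the step I see skipped most often, here is one way the whole update could be sketched as a helper. The name pin_submodule and the commit message are hypothetical; the git commands themselves are ordinary fetch, checkout, add, and commit. It assumes it is run from the top of the main repository.

```shell
#!/bin/sh
# Hypothetical helper: move a submodule to a given ref and record that in
# the main repository. Run from the top level of the main repository.
# Usage: pin_submodule <submodule-path> <ref>
pin_submodule() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pin_submodule <submodule-path> <ref>" >&2
        return 2
    fi
    sub_path="$1"
    ref="$2"

    # Work inside the submodule as with any other git repository...
    git -C "$sub_path" fetch &&
    git -C "$sub_path" checkout "$ref" &&
    # ...then record the submodule's new commit in the main repository.
    # Note: the commit also includes anything else already staged.
    git add "$sub_path" &&
    git commit -m "Update $sub_path to $ref"
}
```

For example, "pin_submodule foldername v1.2.0" would move the submodule to a (made-up) tag v1.2.0 and commit the new pointer.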

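git does help in that it provides a special "submodule" subcommand to run a shell command inside every checked-out submodule - the "git submodule foreach" command. It cannot replace the init and update steps above, since it only visits submodules that have already been initialized, but it is handy for acting on all of them at once:

myrepo $ git submodule foreach git status
myrepo $ git submodule foreach git fetch

The two commands in step 2 can also be combined into one:

myrepo $ git submodule update --init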

Well, hope this helps.
