Wednesday, February 01, 2017

Why I will never Squash or Rebase in Git...

I'm a versioning purist. I'll admit that. I love being able to access version history in the VCS systems I work with. Maybe it's just my Subversion (SVN) background, but I like being able to read through the history easily and find the logic for what changed and why in the log messages associated with commits. (And if the log messages don't contain that information, then they're not *good* log messages.) So I am heavily against squashing and rebasing in Git.

GitHub recently introduced the ability to squash commits on merge, and some of my team members decided to give it a try. However, it was immediately apparent that squashing is evil. Why? Because it breaks the ability to track changes and to keep a clean working copy locally.

My general local development works something like this (assuming the working copy has already been cloned and upstream set up):
$ git checkout master
$ git fetch upstream
$ git merge upstream/master
$ git checkout -b my_working_branch
# ...do the work...
$ git commit
$ git push origin my_working_branch
# ...get it merged remotely...
$ git checkout master
$ git fetch upstream
$ git merge upstream/master
$ git branch --merged
# ...look for my_working_branch in the output...
$ git branch -d my_working_branch
Squashing creates several issues:

1. Detection of merges breaks

For example, if you squash a branch on merge, the last couple of steps above won't work. Git can't tell that the branch was merged, because the squashed commit has a new hash that matches none of the commits on the branch. This means that you now have to do:

$ git branch -D my_working_branch
This means it now becomes extremely easy to delete the wrong branch, since -D skips Git's merged-branch safety check entirely.
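One mitigation is to at least eyeball what you are about to throw away before forcing the delete. A small sketch, using the branch name from the workflow above:

```shell
# show any commits on the branch not reachable from master; with a
# squashed merge this is NOT empty even though the work is in master,
# so review the list by eye before forcing the delete
git log --oneline master..my_working_branch
git branch -D my_working_branch
```

It's manual, which is exactly the point: squashing took away Git's ability to do this check for you.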

This is also true of specific commits if you squash before pushing and are cherry-picking between branches, etc. - so it's not isolated to just branch merges.

2. Removes valuable history and insight

History contains details. Logs contain details. This is very important information when trying to determine why someone did something the way they did - e.g. when trying to find and fix a bug.

Git has an awesome feature called git bisect that allows you to find the exact commit in which a bug was introduced. Squashing means you can only find the total group of commits that introduced the bug, not the commit itself. You get the entire group on a take-it-or-leave-it basis. You also lose any contextual information regarding the specific commit and why it may have happened.
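To make that concrete, a typical bisect session looks something like this (the v1.0 tag and run_tests.sh script are hypothetical):

```shell
# v1.0 is the last release known to be good; run_tests.sh exits 0
# when the bug is absent and non-zero when it is present
git bisect start
git bisect bad HEAD
git bisect good v1.0
# let git drive the binary search through the history automatically
git bisect run ./run_tests.sh
git bisect reset
```

With full history, the "first bad commit" this reports is one small, explained change. With squashing, it's the whole squashed lump.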

3. It foobars anyone tracking what you're doing
When using a VCS, code is meant for sharing, and once you share it others (e.g. upstream maintainers, co-maintainers, etc.) may check out your branches to monitor progress if they are interested in what you are doing. You do not necessarily know who these people are, either. However, squashing and rewriting history will screw up their ability to cleanly track your work.

You also make it problematic for yourself, especially if you work on the same codebase from multiple systems (e.g. laptop, desktop, server). If you squash after pushing, you will have to do a force push (git push origin my_working_branch --force), which means you'll hit the same issues as everyone else when keeping your other machines in sync. You may even lose your own work: if, for example, you push up one change set from your laptop and then another, without merging, from your desktop, what the first system (the laptop) force-pushed will be lost when the second system (the desktop) pushes up its changes.
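If you do find yourself cornered into rewriting pushed history anyway, there is at least a less destructive variant of the force push (a sketch, same hypothetical branch name as above):

```shell
# refuses the push if the remote branch has moved past what you last
# fetched, instead of silently overwriting it the way --force does
git push --force-with-lease origin my_working_branch
```

It doesn't fix any of the problems above, but it does stop the laptop/desktop scenario from silently destroying commits.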

Git rebase runs into many of these same issues, even exacerbating some of them (#3). Rebasing also runs into the following issues:

1. Your own branch history makes less sense.

That is to say, you lose the context of the changes in your branch by moving about the commits. The reason why you did something in a commit has very much to do with what the code looked like prior to that commit. Rearranging the history so newer commits appear after merges removes that context.

2. Sharing branches becomes that much harder.

This is a really big amplification of #3 above, regarding squashing breaking shared branches with others and even with yourself - only it happens at the merge level instead of the push level.

Now, this is not to say that there are no uses for these features. There are, but they should be used with extreme caution and extreme rarity. The smaller the project, the less likely they should be used.

For example, I can certainly understand why the Linux Kernel maintainers may use these features - with dozens of people sharing code and consolidating it down as it moves upstream. However, that is a project with numerous layers, where upper layers don't need to care as much about the details of the lowest layers, so squashing and rebasing can happen at controlled points between the layers, and everyone - tracking at their own layer - can follow what's going on more easily. Bugs are tracked and fixed at the various layers. Most projects have neither that size (millions of lines of code, contributed by tens of thousands of people), nor that complexity, nor such a large hierarchy of contributors (where each subsystem and release has a person dedicated to its maintenance).

In the end, you really need to care more about history than most people do - especially in small projects, even more so in projects that may have high turnover among their contributors, and still more so when turnover leaves little (if any) time for transfer of knowledge.

History is always important, and you may never know how important, because it may be the person several times removed from you - long after you have moved on to better things - maintaining the code and trying to figure out why you did something, who needs the history and details. Always write code and use history for that person, not yourself. They likely won't be as smart as you, either.

The above is, for now, my current list of well-known reasons not to use rebasing and squashing. I'll add more as they come up.

UPDATE: If you ever have to force push, then you did something wrong.

Thursday, April 14, 2016

Docker and Network Security

Docker is great. Containers are awesome. But we still have to beware of security with them.

I have been getting more and more into Docker and Linux Containers of late. They make the old schroot functionality extremely easy to use (though the same caveats apply), but also make distributing that functionality extremely easy, and building it very reproducible.

Docker Compose takes it a step further, enabling multiple containers to be built and interlinked via the Docker Network. Just don't forget about your firewall.

On my dev boxes, I have a firewall that by default rejects all traffic and then allows SSH so I can work on them. I've been using Docker containers on one of them lately, and noticed that some of the containers were receiving requests from outside sources. That shouldn't have happened - I hadn't configured the firewall to allow it. So I checked iptables, and sure enough, there it was:

root@dev:~/project# iptables --list DOCKER
Chain DOCKER (1 references)
 target     prot opt source               destination
 ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:6379

The problem is the source column. Since it is set to "anywhere", traffic coming from any IP or interface can access the container. That's not what I wanted.

After asking around, I found there's an "--iptables=false" flag that can be passed to the Docker service. Using it prevents the iptables rule from being entered at all. But then the container can't be accessed; it's isolated unless I write the rules myself - something I also don't want to do, since I'm more likely to get them wrong than Docker is.

From a security perspective, the listing above should instead look something like the following (using Docker's default bridge subnet, 172.17.0.0/16):

root@dev:~/project# iptables --list DOCKER
Chain DOCKER (1 references)
 target     prot opt source               destination
 ACCEPT     tcp  --  localhost            anywhere             tcp dpt:6379
 ACCEPT     tcp  --  172.17.0.0/16        anywhere             tcp dpt:6379

This limits all traffic to the containers to (a) anything from localhost, and (b) anything from within the Docker network. Alternatively, it could be resolved by matching on the Docker bridge network device (e.g. docker0) and the loopback interface (lo), so that anything bound to them would work. Either way, it would be a dramatic security improvement over the current situation.
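For anyone who does go the manage-it-yourself route, a sketch of what the hand-written rules might look like - run as root, and assuming Docker's default bridge subnet (172.17.0.0/16) and the Redis port (6379) from the listings above:

```shell
# allow localhost and the Docker bridge subnet, drop everyone else
iptables -A DOCKER -s 127.0.0.0/8   -p tcp --dport 6379 -j ACCEPT
iptables -A DOCKER -s 172.17.0.0/16 -p tcp --dport 6379 -j ACCEPT
iptables -A DOCKER                  -p tcp --dport 6379 -j DROP
```

This is only an illustration of the intent; maintaining rules like these by hand for every container is exactly the burden I'd rather Docker took on by default.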

So here's an example.

You have an application that requires a database and provides a RESTful API, and you want to use a tool like nginx to terminate SSL connections. In the normal case, only the SSL port would be exposed to the public; the ports for the database and the RESTful API should be hidden inside the container network, exposed only to each other so that the containers can communicate. You dockerize all of this. Then you check the firewall and see that all three ports are exposed to the public network.
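In docker-compose terms, the intent would look something like the sketch below (service and image names are hypothetical). Only nginx publishes a port, and binding it to loopback here is just to illustrate the "nothing public by default" intent - the firewall behavior described above is what undermines this at the host level:

```yaml
services:
  nginx:
    image: nginx
    ports:
      - "127.0.0.1:443:443"   # published, but only on loopback
  api:
    image: example/api        # hypothetical image
    expose:
      - "8000"                # reachable from other containers only
  db:
    image: redis
```

The database and API services use expose rather than ports, so they should never be reachable from outside the container network at all.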

This issue cuts several ways:

1. It's an issue for devs, because they may be doing this on systems on random networks (if using a laptop) or on publicly available systems (if using a cloud server). Nefarious actors can then target the devs and possibly learn about things that will eventually be in production - things you don't want them to know.

2. It's an issue for deployments if you're not careful. The only ways to resolve it are (a) disable firewall modifications by Docker and manage it all yourself, or (b) put the entire system into a private network. This also assumes you actually have the control to do that, instead of using a service that just takes a specification (e.g. docker-compose.yml), builds things out, and hosts the site for you.

I've filed a Bug/Feature-request against Docker on the issue. Hopefully we can get some attention and help to get this fixed and enable everyone to use Docker more securely - preferably by default, but even a non-default option would be an improvement.

Just to be clear - does this mean you shouldn't use containers or Docker? Absolutely NOT. Just be careful when doing so, and take precautions when using it for development and especially for production deployments.

Friday, February 19, 2016

Releasing Python Packages with PBR...

So it's been a while since I've had to release one of my Python-based projects and publish it to the PyPi distribution network. Publishing packages is generally really easy:

$ python setup.py sdist
Writing myproj-x.y.z.tar.gz
$ twine upload -r pypi dist/myproj-x.y.z.tar.gz

However, I also use OpenStack's PBR (Python Build Reasonableness), as it makes versioning and related functionality very easy. Unfortunately, it also complicates the above...

$ python setup.py sdist
Writing myproj-x.y.z.devNNN.tar.gz

What to do?

If you look closely at the documentation for PBR, you can find some notes for packagers. Among these notes is a statement about the environment variable PBR_VERSION - which is easy to overlook, given the non-obvious link to the package you're trying to release.

In the end, you just have to use PBR_VERSION to bypass any version calculations PBR itself does, like so:

$ export PBR_VERSION=x.y.z
$ python setup.py sdist
$ twine upload -r pypi dist/myproj-x.y.z.tar.gz

And voilà, it's the correct package for the version, and now it's up on PyPi.

Wednesday, July 29, 2015

git vs svn - pulling in external repositories

I have had the pleasure of using both Subversion (svn) and git. I came to git out of extensive use of Subversion, having administered repositories both personally and professionally for 8+ years. During that time I participated on the Subversion Users mailing lists, both seeking and providing advice, and upgraded many repositories from one version to another. Needless to say, I would call myself an expert in Subversion, at least for versions 1.2 through 1.7.

In 2013 I started a new position, and the teams I have worked with since use git. I have been using git extensively since then, and recently started implementing submodules in some newer git repositories. This post is a reflection on the comparison between Subversion's svn:externals system (as it was 2 years ago, at least, and I doubt it has changed much since) and git's submodule system.

The end result of the two systems is the same: pulling one or more other repositories into a repository so that it may use and rely on them. This is a favored model of mine when it comes to having public and private interfaces. I create a repository that contains the public interfaces, and all the various repositories implementing those interfaces pull the public-interfaces repository in as a dependency. Interfaces internal to a library are kept in a separate section (e.g. the include directory in the repository). The beauty of this model is that it allows you to create a consistent set of APIs that can be released; the implementing libraries can change their internals as long as they maintain the public interface. Further, it allows things like file formats to be abstracted easily, since the detailed information is hidden inside a project, not in the public interface - which can be as abstract as needed.

That's the use-case, but what's the difference between these two very good version control systems in providing the requisite functionality?

First, Subversion.

Subversion provides textual properties on files and directories that are, like everything else, versioned as part of the repository. Change a property, and it creates a new revision in the repository. To support the functionality discussed above, Subversion provides a property called "svn:externals". The "svn:externals" property consists of multiple lines, each line describing a repository and where to store it.

Prior to version 1.5, the "svn:externals" property used one format that was specialized to the use case. In 1.5 and later, the format was revised to match that of the command-line "svn" interface. Furthermore, this change provided additional versioning capabilities. The original format had to specify a complete URL, just as one would to access a repository; the new format can continue to do so, or one can use a relative URL format - meaning it is in the same repository, just under a different portion of the tree.

When one checks out a repository with additional repositories listed in the "svn:externals" property, Subversion automatically pulls all the listed repositories and places them in the specified places. Once the checkout is done, you are ready to use the contents of the repository - build the software, etc. No additional steps necessary.
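As an illustration, an "svn:externals" property in the post-1.5 format might look like this (URLs, revisions, and paths are hypothetical); the first line uses the relative, pegged form, the second a complete URL:

```
^/common/interfaces@1234 interfaces
https://svn.example.org/repos/libfoo/trunk libfoo
```

Each line names the external source and the directory in the working copy where Subversion will place it on checkout.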

Now, git.

git provides the functionality through the "submodule" sub-command, which has a series of its own sub-commands that perform the various tasks on the external sources. git controls the internal format itself, though the data is stored in a text file called .gitmodules. git tracks each external source just like any other object - through its commit hash.
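For reference, after adding a single submodule the .gitmodules file looks roughly like this (the path and URL are hypothetical, matching the example later in this post):

```
[submodule "foldername"]
	path = foldername
	url = git://repository/url.git
```

The file records where each submodule lives and where it came from; the pinned commit hash itself is stored in the tree, not in this file.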

However, unlike Subversion, git does not automatically pull the "submodule" repositories when the main repository itself is cloned - this requires an extra step.

The solution? Projects using git create scripts that perform several of the git tasks automatically so that they don't have to remember them every time a repository is cloned.

The difference for git is a fall-out of its design: since a clone of a repository records all the information and looks nearly identical to a working copy, and a hosting server does not want to check out every submodule, the checkout cannot be automatic.

In many respects, I find the functionality in Subversion superior in this area, as it's a pain to figure out what to do and how to interact with git submodules. Everyone talks about how the commands work, but no one really goes through the complete steps of using them.

So, to clear the air a little on git submodules:

1. To add a submodule:

myrepo $ git submodule add git://repository/url.git foldername

2. After checking out a repository containing submodules:

myrepo $ git submodule init
myrepo $ git submodule update

3. To update a submodule:

- the submodule can be managed just like any other git repository
- when the submodule is in the desired state, just add it like any other git resource and commit it with the rest of the changes.
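So updating a submodule to a new upstream state is just a normal commit in the superproject; a sketch (the submodule name and tag are hypothetical):

```shell
# inside the superproject; "libfoo" and the tag are hypothetical
cd libfoo
git checkout v1.2.0      # move the submodule to the desired commit
cd ..
git add libfoo           # stage the updated submodule pointer
git commit -m "Bump libfoo to v1.2.0"
```

The commit in the superproject records only the new submodule hash, so collaborators pick up the change with an ordinary "git submodule update".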

git does help in that the "submodule" sub-command can run an arbitrary command in every submodule - the "git submodule foreach" command. For example:

myrepo $ git submodule foreach git pull

And the init and update steps above can be combined into a single command:

myrepo $ git submodule update --init

Well, hope this helps.

Tuesday, October 21, 2014

Home Owners Associations - America's Hypocrisy

I am an American. I value my freedom. And yet so many Americans give it up to non-governmental organizations every day when they purchase a home - a house, a town house - that is part of a Home Owners Association (HOA). HOAs typically have fees that range anywhere from a few tens of dollars to hundreds or thousands of dollars a year. The cost is one thing; but the freedom that is given up is in the rules.

The rules are set by your neighbors - by those who were part of the HOA, its politics, etc. prior to you buying the home. You may or may not be able to change those rules, and they're not typically subject to the courts. That is, unless you want to refuse to pay the fines they issue against you, accrue some bad credit, have a lien placed against the property (so you can't sell it), and then try to fight it in the courts - if you can even get it there. All in all, you're at the mercy of the HOA.

So why are there HOAs?

Well, some will argue that you have to have them to protect your property value. Huh?  Oh, they want to make sure the neighborhood continues to look nice. So they want to exert control over their neighbors to try to make the whole neighborhood look like what they think it should be.

So what's the problem if it's all about property value?

Well, it's not. It's about control, and control over other people whether they admit it or not.

How's that?

Well, suppose you own a boat. You are legally allowed to keep it on your property. But your neighbor thinks it is unsightly. They don't want to see a boat. So they get the HOA to pass a rule saying that boats have to be in the garage, behind a fence, behind the house, etc. - just so they don't have to see it. They've now impeded your rights in order to satisfy their thirst for power.

But it doesn't stop there.

Some places go so far as to control how many plants you can have in your front yard. Or how many cars you can have in the driveway.

One HOA I ran across had some vandalism of the pool that was tracked to some underage kids. They then passed an HOA rule that any minor (i.e. under 18) out on common property of the HOA (e.g. walking on the sidewalk) after 10PM would be arrested for trespassing. Absolutely the wrong response, but one allowed under HOA rules and enforced by contract law.

Now don't get me wrong - HOAs can have a purpose - taking care of common property that doesn't belong to any single home owner. But that should be all that HOAs are allowed to do. They should not be allowed to control what goes on on your property. That should only fall under the laws governed by the voters.

But aren't HOAs governed by "voters"?

Not like your county, municipal, or state lawmakers are. Nor are they governed by any politics beyond what little happens inside your small community. They're not answerable to the normal legislative processes, and chances are most of the community knows even less about what is going on in the HOA than they do about municipal or county politics (which, sadly, is little enough as it is). Moreover, their meetings are typically private - open only to other HOA members, not to journalists - and therefore closed to the normal public scrutiny that every other legislative body has.

Moreover, you can't get out of them unless you sell your home.

Moreover, since many towns don't want to take on the burden of an expanded population, they won't allow contractors to have newly built communities annexed to them. So the contractor sets up an HOA, which can't be dissolved unless either a new town is set up or an existing town agrees to absorb the community (which, again, they are reluctant to do).

All in all, it's getting harder and harder to buy a home without an HOA unless you can buy a chunk of land and build it yourself - and even then you have to make sure it's not part of a community being built out by a contractor that is sub-parceling the land you're buying. Even then, not every State has laws requiring that you, as the buyer, be informed about the HOA prior to sale - which has landed many in the position of having an HOA rep knock on their door demanding dues and fines long after they purchased the home.

Still think they're a good thing?

Still think they're out to save your property values?

Sorry, but in my opinion an HOA only DEVALUES your home because it restricts your rights.

HOAs are NOT American. They're an Anti-American entity, existing only to steal your rights so that one of your neighbors can illegally exert control over you.

Time to take back America.
Time to dissolve HOAs.

Tuesday, April 30, 2013

VMware Workstation 8 and Linux Kernel 3.8...

So I recently upgraded to Kubuntu 13.04, which also means upgrading to Linux Kernel 3.8. However, as with most kernel upgrades, my VMware install failed to build its modules afterward. Most of the information out there is for VMware Workstation 9 (W9), but I'm running Workstation 8 (W8). Fortunately the fix for W9 is just as valid, but the line numbers are a little different.

Here's what you need to do:

1. Linux changed where the "version.h" header file is in the source tree. The fix is easy - a simple symlink:

# ln -s /usr/src/linux-headers-`uname -r`/include/generated/uapi/linux/version.h /usr/src/linux-headers-`uname -r`/include/linux/version.h

Now, that is specifically for Debian-derived distros - your distro might put the headers somewhere else. And of course, you might be trying to support a kernel other than your running kernel - so adjust as necessary.

This will allow the build tool for VMware's modules to actually run.

2. Workstation's VMCI module fails to build.

The Workstation 9 patch is available online. The Workstation 8 version is below:
--- vmci-only/linux/driver.c    2013-03-01 02:46:05.000000000 -0500
+++ vmci-only.fixed/linux/driver.c  2013-04-30 11:05:25.923550628 -0400
@@ -124,7 +124,7 @@
    .name     = "vmci",
    .id_table = vmci_ids,
    .probe = vmci_probe_device,
-   .remove = __devexit_p(vmci_remove_device),
+   .remove = vmci_remove_device,

@@ -1741,7 +1741,7 @@

-static int __devinit
+static int
 vmci_probe_device(struct pci_dev *pdev,           // IN: vmci PCI device
                   const struct pci_device_id *id) // IN: matching device ID
@@ -1969,7 +1969,7 @@

-static void __devexit
+static void
 vmci_remove_device(struct pci_dev* pdev)
    struct vmci_device *dev = pci_get_drvdata(pdev);

Friday, April 12, 2013

The Post-PC Era

Several years ago I started talking about how the Motorola Atrix and its laptop-dock would change the world as more manufacturers picked up on the concept and integrated it with Android and other devices. Sadly, the laptop-dock for Motorola's many phones that supported it was far too expensive - nearly $500 USD - so it just didn't make sense for people to buy. So why am I writing about this now?

Well, now we have tablets: bigger than the phones I wrote about, but just as functional, if not more so. In fact, they can cut down the price of that laptop-dock by removing the screen - as indeed ASUS has done with its dock for the ASUS Transformer line - or by removing the requirement to dock at all, as many have done by simply adding a Bluetooth mouse and keyboard, e.g. Logitech's Bluetooth Mouse for Android, a.k.a. the V470. So the day I wrote about years ago is now coming to pass - I am writing this from my ASUS Transformer Infinity using its dock-keyboard.

And, as I said then, Microsoft is not doing well in this kind of mobile world. Win8/WinRT is quite the spectacular failure. While historically Microsoft tried to force everything to the Desktop, they have at least tried to do mobile. However, Win8 is a hybrid between the two worlds - a hybrid for a world where there is no hybrid. The two worlds of computing really are vastly different; each needs to be taken on its own terms, exploiting its own nature. In the end, it means that Microsoft's stronghold on the end-user computing market is at its end. And given Microsoft's own everything-must-be-Microsoft nature, it's not a world in which they will survive.

So we all owe a big thanks to Google and Apple for making it happen.