Git status not showing changes or new files?

April 5, 2012

In a Git repository created using the --separate-git-dir option, executing “git status” stopped showing any changes. This had been working correctly for weeks.

Details
I created a new file in the top-level folder, but it didn’t show as untracked. Tried “git status -u -v”. Nothing. Removed the .gitignore file. Nothing. Strange.

The separate directory is on a shared drive in Windows with a connected drive “k:”.
“git fsck” shows no issue, etc.

Workaround
Anyway, here is what I did. I deleted the .git file, then reinitialized:

git init --separate-git-dir K:/where/it/is/at.git

Now it works.
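
Before deleting the .git file, it may be worth looking at where it points. The sketch below assumes the pointer file has the usual single-line "gitdir: <path>" form; the helper name gitfile_target is made up for illustration and is not a Git command.

```shell
# Print the path stored in a ".git" pointer file (the "gitdir: <path>" line).
# gitfile_target is a hypothetical helper name, not part of Git.
gitfile_target() {
    # $1: path to the .git file; prints nothing if no gitdir line is found.
    # Carriage returns are dropped in case the file has Windows line endings.
    tr -d '\r' < "$1" | sed -n 's/^gitdir: *//p' | head -n 1
}

# Usage (commented out so the snippet has no side effects):
# target=$(gitfile_target .git)
# [ -d "$target" ] || echo "gitdir target is missing or unreachable: $target"
```

If the printed path is on a mapped drive that is not currently connected, that alone could explain odd behavior, though I did not confirm that this was the cause here.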

Environment
OS: Windows 7, 64bit.
Git: 1.7.9.msysgit.0


Current Git repo branch name using Groovy

October 14, 2011

As some light research I was looking at how to determine a Git repository’s current branch name in a Groovy script.

In the Git CLI this would be shown with:

C:\temp\gitSdWf\project>git branch
  d12345-validation-fails
  d54321-login-box
  f52123-error-report
* master

The current branch is shown with a leading asterisk.

Invoke Git
One method is to just invoke the Git executable:

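// Path to the Git executable; adjust for your installation.
gitExe = "\\servers\\Git\\bin\\git.exe"
currentBranch = ''

// Run "git branch" in the current directory and scan its output.
process = "${gitExe} branch".execute()
process.in.eachLine { line ->
     println line
     // The current branch is the line that starts with an asterisk.
     m = line =~ /^\*\s+(.*)/
     if (m) {
          currentBranch = m[0][1].trim()
     }
}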

println "current: '$currentBranch'"
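
The same parsing can be sketched in a shell. The helper name current_branch_from is hypothetical; it reads "git branch" output on standard input, so it can be tried without a repository at hand.

```shell
# Read "git branch" output on stdin and print the branch marked with "*".
# current_branch_from is a made-up helper name.
current_branch_from() {
    # First strip trailing whitespace, then print only the starred line
    # with its "* " prefix removed.
    sed -n -e 's/[[:space:]]*$//' -e 's/^\* *//p' | head -n 1
}

# Usage:
#   git branch | current_branch_from
# Git can also print the name directly:
#   git rev-parse --abbrev-ref HEAD
```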

Use JGit library
Another is to use a Java-based Git library like JGit:

@GrabResolver(
     name='jgit-repository', 
     root='http://download.eclipse.org/jgit/maven'
)
@Grab(
     group='org.eclipse.jgit',
     module='org.eclipse.jgit',
     version='1.1.0.201109151100-r'
)

import org.eclipse.jgit.*
import org.eclipse.jgit.lib.*
import org.eclipse.jgit.util.*

directory = new File(args[0])

Repository repository =
  RepositoryCache.open(
       RepositoryCache.FileKey.lenient(directory,FS.DETECTED), 
       true)
	
println(repository.getFullBranch())	

This would be run as:
groovy testGit.groovy \temp\gitSdWf\depot\project.git
refs/heads/master

One problem with the JGit approach is that, afaik, this won’t work if the repo is represented by a ‘pointer’ file, i.e., created with the --separate-git-dir option. Also, unlike the Git branch command, you must open the target Git repo at its root level, not at a nested subdirectory of the repo.
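
One way around the root-level restriction is to locate the work-tree root first and hand that to the script. This is only a sketch: find_git_root is a hypothetical helper, and it treats any ".git" entry (directory or pointer file) as the marker for the root.

```shell
# Walk up from a starting directory until a ".git" entry (directory or
# pointer file) is found, then print that directory.
# find_git_root is a made-up helper name.
find_git_root() {
    dir=$(cd "${1:-.}" && pwd) || return 1
    while :; do
        if [ -e "$dir/.git" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Stop at the filesystem root without finding anything.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}

# Usage:
#   root=$(find_git_root some/nested/dir) && echo "work tree root: $root"
```

For a repo created with --separate-git-dir, the root found this way holds the pointer file, so the real repository path still has to be read from its "gitdir:" line.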

Environment
– Windows 7 64bit Professional
– git version 1.7.6.msysgit.0
– Groovy Version: 1.8.2 JVM: 1.6.0_25


Single Developer Git Workflow

October 7, 2011

I’m finding Git to be very useful. Since I use it in conjunction with the “official” VCS at my job, I classify this as a SDGWF situation. Below I give an example of the simple workflow I’m currently experimenting with.

[Image: graphical log in the Git GUI]

Creating the two repositories:

cd \temp\gitSdWf
C:\temp\gitSdWf>mkdir project
C:\temp\gitSdWf>mkdir depot
C:\temp\gitSdWf>cd depot

C:\temp\gitSdWf\depot>git init --bare project.git
Initialized empty Git repository in C:/temp/gitSdWf/depot/project.git/

C:\temp\gitSdWf\depot>cd ..\project
C:\temp\gitSdWf\project>git init --separate-git-dir ..\depot\project.git
Reinitialized existing Git repository in C:/temp/gitSdWf/depot/project.git/
C:\temp\gitSdWf\project>echo hello world! > hello.txt
C:\temp\gitSdWf\project>git add .
C:\temp\gitSdWf\project>git commit -m "initial commit"
[master (root-commit) e935e02] initial commit
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 hello.txt

C:\temp\gitSdWf\project>git log
commit e935e0220ab90945f6160db00eee6bfd13173b6d
Author: josef betancourt 
Date:   Mon Oct 3 19:07:29 2011 -0400
    initial commit


Note that the first repository created is “bare”, i.e., it does not have any work files. It is not created in the project work folder. At the project folder, a new Git repository is created, but with the --separate-git-dir option the first repository is used as the actual project repository.

Instead of a .git subdirectory, a .git file is created at the project folder, and this file points to the actual Git repo. Why use this setup? One reason is to allow tools to transparently work with the sanctioned project folder. For example, with the ‘normal’ repo setup, zipping the project would require filtering out the .git subfolder (which could get large).

Next day we are assigned some tasks at work. For this example, we’ll make up a naming approach:
defect: dNNNNN-name
feature: fNNNNN-name

Also, for this example, we’ll just have two defects and one feature:
d12345-validation-fails
d54321-login-box
f52123-error-report
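
A small check can catch a mistyped branch name before it is passed to Git. This sketch assumes the made-up convention above means one letter (d or f), five digits, a hyphen, and a lowercase hyphenated name; that exact character set is my assumption, and is_task_branch is a hypothetical helper.

```shell
# Succeed if the name follows the dNNNNN-name / fNNNNN-name convention.
# The allowed characters for "name" are an assumption for this example.
is_task_branch() {
    printf '%s\n' "$1" | grep -Eq '^[df][0-9]{5}-[a-z0-9]+(-[a-z0-9]+)*$'
}

# Usage:
#   is_task_branch d12345-validation-fails && git branch d12345-validation-fails
```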

Rather than waiting to create each branch when it is needed, we just create them all now:

git branch d12345-validation-fails
git branch d54321-login-box
git branch f52123-error-report

Our repository now has four branches:

C:\temp\gitSdWf\project>git branch | sort
  d12345-validation-fails
  d54321-login-box
  f52123-error-report
* master

Let’s work on the first defect.

git checkout d12345-validation-fails

C:\temp\gitSdWf\project>git branch | sort
  d54321-login-box
  f52123-error-report
  master
* d12345-validation-fails

Make sure this branch is up to date with the master branch:

C:\temp\gitSdWf\project>git merge master
Already up-to-date.
C:\temp\gitSdWf\project>git commit -a -m "worked on defect"
[d12345-validation-fails 0aa5930] worked on defect
 1 files changed, 1 insertions(+), 1 deletions(-)

C:\temp\gitSdWf\project>git diff master

diff --git a/hello.txt b/hello.txt
index d4f3cf2..9f2b679 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1 +1 @@
-hello world!
+initial hello ^M

Work on the other defect:

C:\temp\gitSdWf\project>git checkout d54321-login-box
Switched to branch 'd54321-login-box'

C:\temp\gitSdWf\project>git merge master
Already up-to-date.

C:\temp\gitSdWf\project>echo Have to go. >> hello.txt

C:\temp\gitSdWf\project>git commit -a -m "made some fixes"
[d54321-login-box db17a9e] made some fixes
 1 files changed, 1 insertions(+), 0 deletions(-)

C:\temp\gitSdWf\project>git diff master

diff --git a/hello.txt b/hello.txt
index d4f3cf2..7fe577f 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1 +1,2 @@
 hello world!
+Have to go. ^M

Now let’s switch to the feature branch …


C:\temp\gitSdWf\project>git checkout f52123-error-report
Switched to branch 'f52123-error-report'

C:\temp\gitSdWf\project>git merge master
Already up-to-date.

C:\temp\gitSdWf\project>echo Created new feature >> hello.txt

C:\temp\gitSdWf\project>git diff master
diff --git a/hello.txt b/hello.txt
index d4f3cf2..17244a7 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1 +1,2 @@
 hello world!
+Created new feature ^M

C:\temp\gitSdWf\project>git commit -a -m "finished the feature"
[f52123-error-report aef7f49] finished the feature
 1 files changed, 1 insertions(+), 0 deletions(-)


C:\temp\gitSdWf\project>git log --graph
* commit e935e0220ab90945f6160db00eee6bfd13173b6d
  Author: josef betancourt 
  Date:   Mon Oct 3 19:07:29 2011 -0400

      initial commit

Let’s log all the branches …


C:\temp\gitSdWf\project>git log --graph --branches
* commit aef7f4990b63dbb8da4f81558ac492609afa3522
| Author: josef betancourt 
| Date:   Mon Oct 3 19:33:09 2011 -0400
|
|     finished the feature
|
| * commit db17a9eb9e0672b0e6d2b9a9bef645a5d3a6e1de
|/  Author: josef betancourt 
|   Date:   Mon Oct 3 19:29:41 2011 -0400
|
|       made some fixes
|
| * commit 0aa5930c88d619b58ce14908ba8795736988d46c
|/  Author: josef betancourt 
|   Date:   Mon Oct 3 19:26:05 2011 -0400
|
|       worked on defect
|
* commit e935e0220ab90945f6160db00eee6bfd13173b6d
  Author: josef betancourt 
  Date:   Mon Oct 3 19:07:29 2011 -0400

      initial commit

Show the detailed branch structure …

C:\temp\gitSdWf\project>git show-branch
! [d12345-validation-fails] worked on defect
 ! [d54321-login-box] made some fixes
  ! [f52123-error-report] finished the feature
   * [master] initial commit
----
  +  [f52123-error-report] finished the feature
 +   [d54321-login-box] made some fixes
+    [d12345-validation-fails] worked on defect
+++* [master] initial commit

Now check out master branch and merge the first defect …

git checkout master
C:\temp\gitSdWf\project>git merge --no-ff d12345-validation-fails
Merge made by recursive.
 hello.txt |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

C:\temp\gitSdWf\project>git merge --no-ff d54321-login-box
Auto-merging hello.txt
CONFLICT (content): Merge conflict in hello.txt
Automatic merge failed; fix conflicts and then commit the result.

Yuck, a merge conflict. What was the conflict?


C:\temp\gitSdWf\project>type hello.txt
<<<<<<< HEAD
initial hello
=======
hello world!
Have to go.
>>>>>>> d54321-login-box

C:\temp\gitSdWf\project>npp hello.txt

C:\temp\gitSdWf\project>"C:\Program Files (x86)\Notepad++\notepad++.exe" hello.txt

Edit the file and accept the changes, then commit the merge …


C:\temp\gitSdWf\project>type hello.txt
initial hello
hello world!
Have to go.

C:\temp\gitSdWf\project>git status
# On branch master
# Unmerged paths:
#   (use "git add/rm ..." as appropriate to mark resolution)
#
#       both modified:      hello.txt
#
no changes added to commit (use "git add" and/or "git commit -a")

C:\temp\gitSdWf\project>git commit -a -m "merged in 2nd defect"
[master cc7ebf6] merged in 2nd defect
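
Since "git commit -a" will happily commit a file that still contains conflict markers, a quick check before committing can help. A minimal sketch; has_conflict_markers is a hypothetical helper that only looks for lines starting with the three standard marker strings.

```shell
# Succeed (exit 0) if the file still contains conflict marker lines.
# has_conflict_markers is a made-up helper name.
has_conflict_markers() {
    grep -Eq '^(<<<<<<<|=======|>>>>>>>)' "$1"
}

# Usage:
#   if has_conflict_markers hello.txt; then
#       echo "hello.txt still has unresolved conflicts"
#   fi
# Git has a related built-in check: git diff --check
```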

Now merge the feature branch into master …


C:\temp\gitSdWf\project>git merge --no-ff f52123-error-report
Auto-merging hello.txt
CONFLICT (content): Merge conflict in hello.txt
Automatic merge failed; fix conflicts and then commit the result.

C:\temp\gitSdWf\project>type hello.txt
initial hello
hello world!
<<<<<<< HEAD
Have to go.
=======
Created new feature
>>>>>>> f52123-error-report

Edit the file, then commit the merge …


C:\temp\gitSdWf\project>npp hello.txt

C:\temp\gitSdWf\project>"C:\Program Files (x86)\Notepad++\notepad++.exe" hello.txt

C:\temp\gitSdWf\project>type hello.txt
initial hello
hello world!
Have to go.
Created new feature
C:\temp\gitSdWf\project>git commit -a -m "merged in feature"
[master dc58815] merged in feature

Show the final results using a log with graphics ….


*   commit dc58815ae250ee089e9e83db3b6e301a5a4552a9
|\  Merge: cc7ebf6 aef7f49
| | Author: josef betancourt 
| | Date:   Mon Oct 3 19:57:14 2011 -0400
| |
| |     merged in feature
| |
| * commit aef7f4990b63dbb8da4f81558ac492609afa3522
| | Author: josef betancourt 
| | Date:   Mon Oct 3 19:33:09 2011 -0400
| |
| |     finished the feature
| |
* |   commit cc7ebf6c79fc0abb82b2b0d1fd5e53bcbf4fb94c
|\ \  Merge: 8e47975 db17a9e
| | | Author: josef betancourt 
| | | Date:   Mon Oct 3 19:55:25 2011 -0400
| | |
| | |     merged in 2nd defect
| | |
| * | commit db17a9eb9e0672b0e6d2b9a9bef645a5d3a6e1de
| |/  Author: josef betancourt 
| |   Date:   Mon Oct 3 19:29:41 2011 -0400
| |
| |       made some fixes
| |
* |   commit 8e479756f7db2b926259c9774d5fd8cb38d14935
|\ \  Merge: e935e02 0aa5930
| |/  Author: josef betancourt 
|/|   Date:   Mon Oct 3 19:51:30 2011 -0400
| |
| |       Merge branch 'd12345-validation-fails'
| |
| * commit 0aa5930c88d619b58ce14908ba8795736988d46c
|/  Author: josef betancourt 
|   Date:   Mon Oct 3 19:26:05 2011 -0400
|
|       worked on defect
|
* commit e935e0220ab90945f6160db00eee6bfd13173b6d
  Author: josef betancourt 
  Date:   Mon Oct 3 19:07:29 2011 -0400

      initial commit

That’s a simple example. I’m sure I missed some steps during the text captures.

Currently I’m just using the “master” branch as the development branch. I only use branches for short term coding. I may go back to using more branches as I get better with Git.


Debugging: copying .git folder hangs Windows Explorer

September 26, 2011

Windows Explorer started hanging when using copy & paste of folders. I noticed that it only involved folders that had a “.git” subfolder.

Symptoms

  • Copy a folder and explorer does not refresh; you have to hit F5.
  • After the manual refresh, click on a folder and the explorer gets an hour glass.
  • The explorer.exe CPU usage reaches 50% and higher.
  • Only happens when a subfolder was named “.git”.
  • .git folder could even be empty.

Easy approach unsuccessful
I uninstalled Git and TortoiseGit then restarted the system. Still happening. Hmmm, that should have removed everything. Nope, I looked in the Registry and TortoiseGit was still hanging around. I removed all traces and then even used the ShellExView tool and could not find any shell extension that could be doing this. Using that tool I disabled all non-Windows shell extensions. Still happening.

More tools to help
This happened a month ago so I don’t remember all the steps I took. I used some of the SysInternals tools but they didn’t help much; you need a lot of internal Windows details to really use some of them.

However, I think it was using Process Monitor and looking only at file activity that finally gave me a clue.

Progress

5:23:14.4666723 PM	explorer.exe	5096	ReadFile	C:\Documents and Settings\t16205\Start Menu\Local Disk (C)\target.lnk	SUCCESS	Offset: 0, Length: 299
5:23:14.4668257 PM	explorer.exe	5096	QueryInformationVolume	C:\Documents and Settings\t16205\Start Menu\Local Disk (C)\target.lnk	SUCCESS	VolumeCreationTime: 9/15/2009 8:47:39 AM, VolumeSerialNumber: 64B6-6315, SupportsObjects: True, VolumeLabel:

It seemed to be hanging at that point!

Solution
I looked in the Windows Start Menu list and saw that I had placed a link of the C: drive there. Not a junction or anything fancy, just a drag and drop of a shortcut (I think). I removed that link and the problem was solved.

Back to Git
Reinstalled Git and all is fine, knock on wood.

No explanation yet
Not really debugging, just floundering with a purpose. But why would this only affect folders named “.git”, even when Git is not installed? Strange.

OS
Windows XP Professional

Git
TortoiseGit
msysGit

Tools
WhatIsHang v 1.10
ShellExView v 1.66
Process Explorer v15.04
Process Monitor v 2.96
brain


Sneakernet with Git?

September 12, 2011

When a remote connection is not possible, you can put a Git repo on a stick. But how do you merge without getting so many conflicts?

Why would you get so many conflicts? Well, since they are the same files, and you are changing them in local and remote, you will probably change the same lines. Or my present Git understanding is missing something, like a force option?

If you don’t care about branching and all that, just use file synchronization tools.

Here is an example session where the USB stick contains a Git repo:


git fetch  path-to-repo-on-stick
git merge -s recursive -Xtheirs --no-commit FETCH_HEAD

<now inspect, test, etc.>

git commit

Explanation
1. Git fetch will get the new stuff from the “remote” repo that is on the USB stick.

2. Now we want to merge this into the local repo.
   -s recursive is the strategy.
   -X gives the option to the strategy; “theirs” resolves conflicting hunks in favor of the fetched side.
   --no-commit: we want to inspect and test before we actually commit the merge.
   FETCH_HEAD refers to the tip of what was just fetched.

3. We visually inspect and run tests here. Nothing broke?

4. Commit the changes in the working set that was created at step 2.

Warning
Posted by a Git newbie, so “be careful out there”.

Updates
The git-bundle command is the standard way of doing this. Does using bundling remove the excess conflicts?
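
A bundle-based version of the session could look like the sketch below. The function names and the bundle file name are made up, and "master" is assumed to be the branch being carried around. A bundle only changes how the commits travel; the merge afterwards is the same as before.

```shell
# Write every branch and tag of the current repo into one bundle file.
# make_bundle and fetch_bundle are hypothetical helper names.
make_bundle() {
    # $1: destination file, e.g. a path on the USB stick
    git bundle create "$1" --all
}

# Check the bundle, then fetch its master branch into FETCH_HEAD.
fetch_bundle() {
    # $1: bundle file; run this inside the receiving repository
    git bundle verify "$1" && git fetch "$1" master
}

# Usage:
#   (in the source repo)     make_bundle /path/to/stick/project.bundle
#   (in the receiving repo)  fetch_bundle /path/to/stick/project.bundle
#                            git merge --no-commit FETCH_HEAD
```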


Remove CVS folders from Git version control

September 4, 2011

Open a bash shell and just do:
$ find . -type d -name CVS -exec git rm -r --cached '{}' \;

Is there an easier or more direct method just using Git commands?

Note the --cached option (there are two dashes there). This will ensure we don’t remove the CVS folders in the work area, only those in the index.

Note:
If you’re on Windows, the msysGit or Cygwin installs include a Bash shell. Git, though usable on Windoze, still has a lot of *nixisms.

But
before doing the above, make a backup or a backup clone of the project. Maybe even do a: git fsck

Check
if everything was done correctly. Do a git status and it should print a bunch of:
# deleted: some/path/to/CVS/cvs track file

and, a reminder to add the .gitignore and .cvsignore files that you finally created, right?

Then to make sure the folders and their content are still in the work area do:
$ find . -type d -name CVS -print

In a Windows shell this would be: dir /S CVS
but, I didn’t test that.

Update ignores
The .gitignore file should contain a line: CVS/
Make sure there is no trailing space at the end of the line.
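
Adding the line by script avoids the stray trailing space. A minimal sketch; ensure_ignored is a hypothetical helper, and it only appends the pattern when the exact line is not already there.

```shell
# Append a pattern to an ignore file unless that exact line already exists.
# ensure_ignored is a made-up helper name.
ensure_ignored() {
    pattern=$1
    file=${2:-.gitignore}
    touch "$file" || return 1
    # Nothing to do if the exact line is already present.
    grep -Fxq -e "$pattern" "$file" && return 0
    # Terminate an unfinished last line before appending.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s\n' "$pattern" >> "$file"
}

# Usage:
#   ensure_ignored CVS/                      # writes to ./.gitignore
#   ensure_ignored CVS/ .git/info/exclude    # or the per-repo exclude file
```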

I did a: git add -n . | grep CVS
to make sure nothing was going to be added.

Or instead, add this line to the .git/info/exclude file.

Don’t forget to modify the .cvsignore file to ignore the .git folder.

Now do
git commit

But wait …

WARNING:
CVS keeps file tracking version information in CVS subfolders (Subversion uses .svn). These are in each subdirectory of your working files.

It is very important that only CVS manage these. Thus, when using Git (or Mercurial, btw) alongside it, you should have valid ignore files set up.

Just the existence of these subfolders is a good reason to abandon these systems. They make the use of external tools, even search, even more complicated. You have to use filters for everything. To be fair, good tools already have these filters in place.

Why
would someone want to do all this? Well, if you forget to create a .gitignore file before you put all your CVS officially managed project files into Git. Of course, no one forgets to create the ignore file first.

Why
would you ever use Git parallel with CVS? Well, a centralized repository discourages checking in of code. You have to have perfect code before you commit stuff. Don’t even think about branching off a branch (twigs).

An aspect of a centralized VCS is:

When you check new code in, everybody else gets it.

Since all new code that you write has bugs, you have a choice.

You can check in buggy code and drive everyone else crazy, or
You can avoid checking it in until it’s fully debugged.

Subversion always gives you this horrible dilemma. Either the repository is full of bugs because it includes new code that was just written, or new code that was just written is not in the repository.

As Subversion users, we are so used to this dilemma that it’s hard to imagine it not existing.

— from HgInit

So, if you can’t use the VCS to check in your work what do you do? Make manual backups? If you say use the IDE to manage it, you can’t. For example, an IDE (in my experience) cannot do “undo” and “redo” across separate source files, thus the ‘undo’ can only get you so far. For small changes this is not really an issue. But, if you’re working on a subsystem or module, or in a small team, this can be an issue.

An alternative is
to just add a DVCS to the Integrated Development Environment (IDE) to do true local versioning (maybe embedded Git?). That way I can just indicate to the IDE that I want to mark my current work as a milestone, for example, and it will know what to do. Now I can proceed and do all the fancy little versioning stuff, like branch, log, etc. It is just local.

I don’t even need to know about the arcana of the embedded VCS, it is just the IDE doing its thing. Further, higher-order functions like tasks, bug tracking, could also relate to the IDE’s repo, not only the centralized repository.

When it is finally debugged brilliant code, as usual, I just push to the ‘real’ local version control system, which could just be the same DVCS or a centralized one like SVN or CVS, and then commit as usual.

I bet some IDE already does this. Oh well.

Updates
5Sep11: Took a look at the IntelliJ site. Their IDE does allow directory history and the ability to set version labels. Nice.

System
Git: git version 1.7.3.1.msysgit.0
OS: Windows 7 64bit Professional
PC: AMD quad-core, 8GB ram.


Ad Hoc Version Control With Git

April 1, 2010

In my blog post “How to do Ad Hoc Version Control” I suggested Mercurial as the DVCS to use.  However, I read that Git is a very popular and powerful alternative to Mercurial.  Over at the VIM Tips Blog, this post shows how easy it is to create a local folder repository using Git.  Very similar to Mercurial’s steps.

Linked on the Git site we also have this:

Create your own repository anywhere

If you want to get some version control on a simple local project (i.e., it doesn’t have a big remote repository or anything), then you can simply use git init to create your own standalone local repository. For example, if you’re working on some design concepts for a new application, then you could do something like the following:

mkdir design_concepts
git init

Now you can add files, commit, branch, and so on, just like a “real” remote Git repository. If you want to push and pull, you’ll need to set up a remote repository.

git remote add
git pull master

Which is better Git or Mercurial?  
For ad hoc use, it probably doesn’t matter.  What matters is how easy it is to install, learn, and create repositories on demand.   For large systems with many version control requirements, one must evaluate each alternative carefully.  I’m sure advanced use will be ‘difficult’ in any VCS.

A nice recent blog post by C. Beust, “Git for the nervous developer”, provides a nice perspective on using Git, though perhaps not entirely relevant to the stated Ad Hoc usage scenarios.  That post has a nice list of Git references.

What is an optimal ad hoc workflow?
In the original article I gave context for an ad hoc situation. For example, fixing some server configuration files. In that use-case, Git-Flow would be overkill. Perhaps even GitHub Flow too.

A simpler approach is having two “standard” branches: master and dev. The ‘master’ branch is the live or solution branch. The ‘dev’ branch is where you will make any changes or experiments. Rarely would you create a new branch off of dev. If the ad hoc situation is that complicated, you have other issues! In a “lone developer” workflow however, branching off of dev would be more common. Another ingredient is a separate local Git repo to serve as a backup.

The GitHub Flow is still very applicable, of course, and may be better than the above. The GitHub Flow reminds me of “The Flying Fish Approach” that was documented in the cvsbook.

Once one of these attempts on the dev branch (or GitHub Flow feature branch) produces a positive result (the dev branch is live while it is the current branch), you merge it into the master branch and then tag the master branch. Each success is tagged; that way, when the success turns out to be bogus or has unintended consequences, you go back to the last ‘good’ tag. If you don’t use tags, you can just use the commit SHA1 id. Remember the master is always the final live environment of the ad hoc situation.

Now how to use Git to do the above workflow? I don’t know. Still learning Git. :)
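
One possible answer, as a rough sketch only: the two helpers below are hypothetical names, and they assume the live branch is called master, the working branch is called dev, and each success gets a tag name of your choosing.

```shell
# Start (or resume) work on the dev branch.
# start_dev and promote_dev are made-up helper names.
start_dev() {
    if git show-ref --verify --quiet refs/heads/dev; then
        git checkout dev
    else
        git checkout -b dev
    fi
}

# Merge a successful dev attempt into master and tag the result.
promote_dev() {
    # $1: tag name for this known-good state, e.g. good-1
    git checkout master &&
    git merge --no-ff -m "merge dev: $1" dev &&
    git tag "$1"
}

# Usage:
#   start_dev            # edit files, git commit -a -m "attempt"
#   promote_dev good-1   # master now holds the tagged, working state
# To look at an earlier good state:  git checkout good-1
# To move the current branch back:   git reset --hard good-1
#   (this discards later changes on that branch)
```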

Further Reading

  1. Using git for Backup is Asking for Pain
  2. Git
  3. Git Cheatsheet. This one is interactive.
  4. Git Ready
  5. A successful Git branching model
  6. Using Git for Document/Software Version Control
  7. Version control on a 2GB USB drive
  8. Git for designers
  9. Git for the nervous developer
  10. GitHub Flow
  11. The Flying Fish Approach – A Simpler Way To Do It
  12. Using Git as a “Poor Man’s” Time Machine
  13. Using Git as a “Poor Man’s” Time Machine – Part Two
  14. What tools exist for simple one-off sharing of a git repo?
  15. http://chakrit.com/post/5047604766/ad-hoc-git-server-on-any-oses-for-small-teams
  16. For home projects, can Mercurial or Git (or other DVCS) provide more advantages over Subversion?

How to do Ad Hoc Version Control

March 28, 2010

By Josef Betancourt,  Created 12/27/09    Posted 28 Mar 2010

Scenario

You will modify a group of files to accomplish something.  For example, you’re trying to get a server or network operational again after some security requirements.   During this process you also need to keep track of changes so that you can back out of dead ends or combine different approaches.  Or you just simply need to temporarily keep revisions of a group of files for a task based project.

Keywords

Version Control, Revision Control (RCS), Configuration Management (CM), Version Control System (VCS), Distributed Version Control System (DVCS), Mercurial, Git, Agile, Subversion

Lightweight Solution

A lightweight Ad-Hoc Version Control (AHVC) approach may be desirable.  Note that even when there are other solutions in place, a lightweight approach may still be desirable.  What are the requirements of a lightweight and workable solution?

  • Automated:  Through human error a file or setting may not get versioned, or may even be lost.  Thus, all changes must be tracked.
  • Small:  A large sprawling system that could not even fit on a thumb drive is too big.
  • Multiplatform:  It should be able to run on the major operating systems.
  • Non-intrusive:   Use of this system should not change the target systems in any way.  Ideally should run from a thumb drive or CD.  And, if there is a change, backing it out should be foolproof.
  • Simple:  Anything that requires training or complexity will not be used or adopted.  This reduces collaborative adoption and improvements in tools and process.
  • Fast:   Should be fast and optimized for local use.
  • Distributed:   Since issues can span “boxes”, it should be able to work with network resources.  This reduces the applicability of GUI-based solutions.
  • Scripting:  Should be easy to optimize patterns of use by creating high-level scripts.
  • Small load:  Of course, we don’t want to grab excessive CPU and memory resources from the target system.
  • Non-Admin:  Even in support situations, full admin access may not be available.
  • Transactional:  Especially in server configuration, changes should be consistent.  Failure to save or revert even one file could be disastrous.
  • Agile:  Not about tools but results.

DVCS

At home when I create a folder to work on some files, like documents or programming projects, I will usually create a version control system right in the folder.  This has saved me a few times.  I also tried to do this at a prior professional assignment and was partially successful (will be discussed later).

I used a Distributed Version Control System (DVCS).  Since it does not require a centralized server or complicated setup to use, a DVCS meets most of the lightweight requirements.  Though a VCS is usually used for collaborative management of changing source files, it may be ideal here.  One popular use case is managing one’s /etc folder in Linux with a VCS.

Mercurial

A good example of a DVCS is:

Mercurial:

“(n) a fast, lightweight Source Control Management system designed for efficient handling of very large distributed projects.” – http://mercurial.selenic.com/

Note:  I use Mercurial as the suggested system simply because I started with it.  Git and others are just as applicable.

To create a repository in a folder, one simply executes three commands init, add, and commit.  The “init” creates a subfolder that serves as the folder history or true repository.  The “add” is recursive, adding all the files to version control, and the “commit”, makes these changes permanent.  Of course, one can ‘add’ a subset of files and create directives for files to skip and so forth.

Before:

\GIZMO
+---client
\---server

In a command shell

cd \GIZMO
hg init
hg add
hg commit -m "initial commit of project"

After:

\GIZMO
+---.hg
+---client
\---server

The terminology may be a little confusing.  What happened is that now the GIZMO folder has a Mercurial repository which consists of a new .hg folder, and the other existing files and folders comprise the working directory (see Mercurial docs for a more accurate description).  There are no other changes!

That’s all it takes to create a repository.  No puzzling about storage, unique names, hierarchy, and all the details that go with central servers.  The Mercurial docs show how to do the other required tasks, like go back to a previous changeset or retrieve file versions and so forth.  Here is how to view the list of files in a particular changeset:

c:\Users\jbetancourt\…cts\adhocVersioning>hg -v log -r 0

changeset:   0:f29a0b0ad03c
user:        Josef Betancourt <josef.betancourt>
date:        Sat Jun 21 10:53:11 2008 -0400
files:       AdhocVersioning.doc
description:
first commit

And, here is a log output using the optional graph log extension (http://mercurial.selenic.com/wiki/GraphlogExtension)

c:\Users\jbetancourt\...adhocVersioning>hg glog -l 2
@  changeset:   9:25f4c55e4860
|  tag:         tip
|  user:        Josef <josef.betancourt>
|  date:        Fri Mar 26 22:43:56 2010 -0400
|  summary:     removed repo.bat
|
o  changeset:   8:43a33533c992
|  user:        Josef <josef.betancourt>
|  date:        Thu Mar 25 22:08:35 2010 -0400
|  summary:     removed old files
|

For the lone individual using ad hoc versioning, a sample workflow is given in Learning Mercurial in Workflows.

Ad Hoc Sharing

A DVCS, true to its name, shines in how it allows distributed sharing of these local repositories.  Thus, when a team is working on a technical issue (ad hoc) it is very easy to share each other’s work. Mercurial includes an embedded web server that can be used for this.

Mercurial’s hg serve command is wonderfully suited to small, tight-knit, and fast-paced group environments.  It also provides a great way to get a feel for using Mercurial commands over a network.

This is illustrated with the coffee shop scenario, see manual.

A sprint or a hacking session in a coffee shop are the perfect places to use the hg serve command, since hg serve does not require any fancy server infrastructure … Then simply tell the person next to you that you’re running a server, send the URL to them in an instant message, and you immediately have a quick-turnaround way to work together. They can type your URL into their web browser and quickly review your changes; or they can pull a bugfix from you and verify it; or they can clone a branch containing a new feature and try it out.

Of course, this would not scale and is for “on-site” use between task focused group members.

A great workflow image by Leon Bambridge for team sharing.

Another simple scenario is taking a few documents from one location to another with a flash drive (in lieu of using a Cloud storage service). Instead of doing a copy or cp one can simply create a DVCS repository at the work directory, then clone it on the flash drive. Then at home one pulls to the DVCS repository at home. When finished editing the files, one then pushes to the flash repo, and does the reverse process at the work site. Not only are you not missing any files, you are also keeping track of prior versions. Note, for security reasons, not everyone has unfettered web access, nor should they.

Revisiting the flash drive scenario above: if you plan to use a flash drive for transport multiple times and the group of files is large, the “bundle/unbundle” hg commands are a good tool; see Communicating Changes on the Mercurial site.

Security
Every connection must be secure and every file must be encrypted, especially if on flash drives. The security policies of the employer come first. Even if only for your own personal ad-hoc use, you should be careful with exposing your data.

Advantages

  • Easy to use.  The commands needed to perform normal tracking of changes are few and simple.  The conceptual model is also simple, especially if one is not fixated on use of a centralized Version Control System.
  • Some file changes may be dependent on or result in other file changes.  In a DVCS, commits or check-ins create a “changeset” in the local repository.  This naturally keeps track of related changes.
  • You may need to work on different operating systems.  Mercurial runs on many systems including Windows.
  • You don’t want to change the existing system, low intrusion.  Mercurial can be deployed to a single folder, and the repositories it creates do not pollute the target folders.  For example, in the Subversion VCS, “.svn” folders are created in each subfolder in the target.  Not a drawback but complicates things down the line, such as when using file utilities and filters.

Issues

Unfortunately, the use of a DVCS is not perfect and has its own complexities.  For Mercurial, in the context of this post, these include handling binary files, versioning non-nested folders, non-admin installation, ignore-file syntax, and, probably for any VCS, the semantic gap between the project task based view and the versioning mindset.

1. Binary Files

Mercurial is really meant for tracking non-binary files, since that is where the advantages of versioning are realized.  Diffs and merges are not normally applied to binary files.  Further, binary files hurt performance and storage once they reach a certain size.  Yet, for ad hoc use, binary files must be easy to track.  Binary files could be images, libraries, jars, zips, documents, or data.

Large binaries are a problem with all version control systems.  One author discussed a technique that lets Git handle them, in lieu of his continued use of Unison.  He suggests using Git’s “--shared” option:  git clone --shared /mnt/fileserver/stuff.git stuff

Note that Mercurial extensions exist to handle binary files.  One of these is the BigFiles extension.  In essence, BigFiles and similar approaches handle large binaries using versioned references to the actual binaries, which are stored elsewhere.

Update Oct 29, 2011: Looks like Mercurial 2.0 will have a built-in extension for handling binary files, LargeFiles extension.
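To make the “versioned references” idea concrete, here is a minimal Python sketch. It is not how BigFiles or LargeFiles actually store things; the “.ref” suffix, the store layout, and the function names are my own assumptions. The binary goes into a separate store under its SHA-1, and only a tiny text file holding that hash would be committed to the repository.

```python
import hashlib
import shutil
from pathlib import Path


def store_binary(binary: Path, store: Path) -> Path:
    """Copy `binary` into `store` under its SHA-1 and write a small reference file."""
    digest = hashlib.sha1(binary.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():
        # Identical content is stored only once, whatever its file name.
        shutil.copy2(binary, target)
    # The reference file is what would be versioned instead of the binary.
    ref = binary.with_name(binary.name + ".ref")
    ref.write_text(digest + "\n")
    return ref


def fetch_binary(ref: Path, store: Path) -> bytes:
    """Read the hash from a reference file and return the stored bytes."""
    digest = ref.read_text().strip()
    return (store / digest).read_bytes()
```

Since the reference file is plain text, diffs on it at least show that the binary changed, even though they say nothing about what changed inside it.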

Another issue is that binary files may not be diffable within the DVCS tool set.  In a DVCS one can set an external merge agent.  If one is not available, using the app that created the binary to diff and merge is cumbersome.  For example, a Word doc is in effect binary (even though internally it could be all XML).  Thus, a diff would not reveal a usable view.  One must ‘checkout’ particular revisions and then use Word to do a diff, or just manually eyeball it.  Same thing with a zip, jar, image, etc.

Update 02-02-2012: Some tools allow direct use of external tools to diff “binary” files. I think TortoiseSVN does this, allowing Microsoft Word, for example, to diff.

2. Non-nested target folders.

A scenario may involve the manipulation of folders that are not nested. For example, a business system employs two servers and changes must be made to both for a certain service to work. Further, physically moving these folders or creating links is not possible or allowed. Mercurial, at this time, works on a single folder tree, and AFAIK there is no way to use symlinks or junctions to create a network folder graph, at least in my testing.  Neither the ForestExtension nor the experimental subrepositories feature in Mercurial 1.3 qualifies, since they only enable the handling of a folder tree as multiple repositories.

Sure, each folder tree in the graph can be managed, but if a particular change affects files in each tree, there is no easy way to transactionally version them into one changeset, though there are ways to share history between repositories (as in the ShareExtension).

A possible solution is to allow the use of indirect folders.  In Mercurial, work files and the actual repository, the .hg folder, are colocated.  Instead the repository can point to the target folders (containing the work files) to be versioned.  In this way multiple non-nested folders can be managed.  Note that this is not a retreat to the centralized VCS since the repository is still local and distributed using DVCS operations.   Below, the user has created a new Mercurial repository in folder “project”.  This creates the actual repo subdirectory “.hg”, and the indirect actual folders to be versioned are pointed to in a “repos” directive file or using actual symlinks.

project
  \.hg
  repos ----> src_folder1
        \---> src_folder2
        \---> src_folderN

Whether this is useful, possible, or already planned for is unknown.

I mentioned this “limitation” on the Mercurial mailing list and was told that this is not a use case for a DVCS. There are many good reasons why all (?) VCS are focused on the single folder tree.

Update, 2011-08-31 T 11:37
Just learned that Git does have an interesting capability:

It is also possible to have a working tree where .git is a plain ASCII file containing gitdir: <path>, i.e. the path to the real git repository

Though this doesn’t fulfill the non-nested project folders scenario, it does help Git be more applicable in an ad-hoc solution. For example, the repo could be located in a different storage location when the target folder is in a constrained unit.
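A small Python sketch of what that indirection amounts to: given a working tree, find where the real repository lives. The function name is mine, and resolving a relative path against the working tree is my assumption about the sensible behavior, not something taken from the Git documentation quoted above.

```python
from pathlib import Path


def resolve_git_dir(work_tree: Path) -> Path:
    """Return the repository directory for a working tree.

    Handles both the usual .git directory and a .git file
    whose content is a single "gitdir: <path>" line.
    """
    dot_git = work_tree / ".git"
    if dot_git.is_dir():
        return dot_git
    text = dot_git.read_text().strip()
    prefix = "gitdir:"
    if not text.startswith(prefix):
        raise ValueError(f"{dot_git} is not a gitdir file")
    target = Path(text[len(prefix):].strip())
    # Assumption: a relative path is taken relative to the working tree.
    return target if target.is_absolute() else work_tree / target
```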

3. Non-admin install

Updated 25 Aug 2010: In the requirements, non-admin install of the VCS was mentioned. This is where Mercurial fails, somewhat. The default install using the binary, at least on Windows, requires admin privileges. I got around this by first installing on another Windows system, then copying the install target folder to the PC I need to work on. This worked even when I installed on a Windows 7 Pro, and then copied to a Windows XP Pro. No problems yet. The Fossil DVCS does not have this problem.

4. Ignore Files

This is, perhaps, a minor issue. Mercurial, as most VCS do, allows one to filter the files that are versioned in the repo.
In Mercurial one creates an .hgignore file and, within it, uses glob or regular expression syntax to specify the files to ignore. Well, this can be tricky. See the newsgroup discussion that was started by this post. IMHO, having another syntax declaration that allows specification of directories and files explicitly is the way to go. How do other systems do this? Ant patternsets seem to be pretty usable.
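To show what I mean by explicit directories and files, here is a toy matcher in Python. It is neither Mercurial’s nor Ant’s implementation; the function and parameter names are invented, and it uses Python’s fnmatch for the file patterns. Directories are named outright, and glob patterns apply only to the file name itself.

```python
from fnmatch import fnmatch


def is_ignored(path, ignore_dirs=(), ignore_files=()):
    """Decide whether a forward-slash relative path should be ignored.

    ignore_dirs:  exact directory names, matched at any depth.
    ignore_files: glob patterns, matched against the file name only.
    """
    parts = path.split("/")
    # Any ancestor directory listed explicitly excludes the whole subtree.
    if any(directory in ignore_dirs for directory in parts[:-1]):
        return True
    # Otherwise only the last component is checked against the globs.
    return any(fnmatch(parts[-1], pattern) for pattern in ignore_files)
```

Keeping the two lists separate avoids the usual surprise where a pattern meant for a file also swallows a directory of the same name.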

5. Semantic Gap

There is a semantic gap when working on a maintenance problem and switching to the versioning viewpoint.   When versioning systems are routinely used, as in Software Development, this is not an issue, just part of the Software Development Lifecycle or best practice (amazing that some shops don’t use version control).   But, when one uses VC only occasionally as a last resort it’s another story.  QA, Support, and Project Managers, may not be comfortable with repositories, branches, tags, labels, pull, push, and so forth.

When I first tried to use Mercurial for Ad hoc service professionally it quickly lost some of its advantages as the task (fixing a system) reached panic levels (usually the case with customer support and deployment schedules) and simply creating and looking at commit messages failed to follow the workflow.  Manually tracking which tag or branch related to which event of system testing was cumbersome.  Further use would have eventually revealed the patterns of use that would have worked better, but that was a onetime experiment.

A partial solution, other than just getting more expert with the DVCS and better work patterns, is to implement a higher level Domain Specific Language (DSL) that hides the low level DVCS command line and repository centric view.  This could even have a GUI counterpart.  This is not the same as a GUI interface to the DVCS such as TortoiseHg or the Eclipse HG plugin.  What should that DSL  be and is it even possible or useful?

Workflow Updates

June 26, 2011: git-flow is an example of providing high-level operations to enable a specific workflow or model. Perhaps such an approach would be applicable to these AHVC requirements.

Sept 17, 2011: The Mercurial Flow-Extension implements the git-flow branching model.

Alternatives

Naive Approach

The usual approach is to just make copies of the affected folder or files that you will be changing.  You can use suffixes to distinguish them, such as gizmo.1.conf.   It’s very common to see (even in production!) files or folders with people’s initials, gizmo.jb.19.conf.

This gets out of hand very quickly, especially if you are multitasking or working as part of a team and may forget after a good lunch what file “gizmo.24.conf” solved.   This problem is compounded when you need to change multiple files, so for example, gizmo.jb.19.conf may depend on changes to “widget.22.conf”.   This also gets very chaotic when the files to change and track are in different folder trees or even storage systems.  Most importantly, this will not withstand the “throw clay at the wall and see what sticks” school of real world maintenance.

One method I’ve seen and used myself is to just clone each folder tree to be modified.  This gives an easy way to back out any changes.  This, alas, is also error prone, resource intensive, and may not be possible on large file sets or very constrained systems.

Client-Server VCS

A traditional client-server VCS like Subversion can, of course, be used for Ad Hoc Versioning.   With Subversion one can run svnserve, its lightweight server, in daemon mode.  Then create a task based repository:

svnadmin create /var/svn/adHoc

And, import your tree of files:

svn import c:\Users\jbetancourt\Documents\projects\adhocVersioning file:///var/svn/adHoc/project

Plus, Subversion supports offline use.  I think.  Have not used Subversion in a while.

Another effective Subversion feature is the use of local repositories via the “file://” protocol.

Management consoles

Many systems are managed by various forms of management consoles, graphical user interfaces.  These are client or web based and may be part of the system or a third-party solution.  This is a big advantage from a hands-on administrative point of view.  However, from an automation and scripting viewpoint this is not optimal.  Thus, there is hopefully an API or tool based method of accessing the actual configuration files or databases of the system.  So in this sense, these systems are within the scope of this discussion.

This is not always the case.  One application server comes to mind that was (is?) so complex that there was no way to script it.  Thus, no way to automate the build and release process and versioning of the system.  Consequently, there was also no way to automate the various QA tests that were always panic driven and manually applied.

Managed Approach

The correct or more tractable method is to use a managed approach.  This is a software configuration and distribution system that is usually under the control of the IT staff, for example Microsoft’s Systems Management Server (SMS) or System Center Essentials (SCE) for SMB.  Non-Microsoft solutions are of course available, such as those from IBM’s Tivoli product lineup.

Why is this not always the best approach?  There may be situations where a subset of a managed resource must be modified.  For example, you are a field service engineer and must travel or remotely connect to a client’s system to handle a service call.   This process may also entail making changes to other hosted apps and system configurations, such as network configurations.  Trying to get the IT department to collaborate or change the configuration or schedule of the managed solution may not be possible or timely.  In fact, this would be discouraged (rightly so) since it can wreak havoc on a production system.  Thus, even changing some resource may entail admin of multiple systems, not just a push of a few files and run of some batch files.  It could require interactive set and test.  Picture the Mars Rovers and how the OS reset problem was finally solved.

Closely related to the managed approach is to use a centralized version control system (VCS) or backup support.  Fortunately many operating systems have versioning capabilities built in or readily available.  For example, in the Windows platform one can make System Restore points or use the supported backup subsystems (as found in Windows 7 Professional).  Many *nix’s also have built-in versioning support in the form of installable Version Control Systems or differential backup.  In high-end virtualized systems there are facilities for backup or making snapshots and even transport of live systems.

While these work, there is a certain amount of complexity involved. Also there are issues using the same approach on multiple operating systems.  Another important drawback is that one cannot always modify the target system and, for example, install a VCS, even temporarily.  The common factor in these approaches is that there is a central “server” and associated repository for revision storage.  This is fine when available but not very conducive to ad hoc use.

Versioning File System

A versioning file system (VFS) could be of some use.  As far as I know there are no popular file systems that support versioning (as used here).  Digital Equipment’s VAX systems had single-file versioning, as OpenVMS does now.  Microsoft’s Windows was supposed to get this in the future winfiles, but that is no longer in the plan(?), though Windows 7 and current servers can allow access to previous file versions as part of the system protection feature.  Plan 9 has a snapshot feature.  ZFS has advanced stuff too, and I would not be surprised if one can set a folder to be ‘versioned’.

However, a VFS would not help in task based versioning since as discussed previously, there may be a need to change multiple subsystems and track these changes as “change sets”.  Thus, a VFS is not a Revision Control System.

Of course, using a scripted solution (discussed next) in conjunction with a file change notification system (inotify), one could cobble together a folder based VCS.  However, this is outside of our lightweight requirements.

Scripted Solution

Of course, it should be possible, especially in *nix systems, to use the utilities available and construct a tool chain for lightweight versioning support.  The rich scripting and excellent tools like rsync make this doable.  Some languages such as Perl or Python are ideal for gluing this together.

Yet, this is not optimal since these same tools will not work on all operating systems or will require duplication.  For various reasons, for example, it is not always possible to install cygwin on Windows and make use of the excellent *nix utilities and scripting approach.  Likewise, it is not possible to use the outstanding Windows PowerShell in Linux.  This is only a problem, of course, if we are referring to empowering staff to work seamlessly on different OS or resources.  Having the same tools and workflow is valuable.

Another thing about this alternative is that a custom solution will eventually become or have functions of a version control system like Git, so why not just start with one?

Snapshot

One approach made possible by the aforementioned scripted solution is to create a snapshot system.  The DVCS gives us fine grained control of file revisions.  But do we really need to diff and find out that a command in one batch file used the ‘-R’ option, or do we just want to get the file with the desired option?  We would know which file we want using task based snapshots.  Before a task is begun, we initiate a snapshot.  This is analogous to the OS type of restore points, except we do this for a specific target set.
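A bare-bones Python sketch of such a task based snapshot, just to show the shape of it. The function name, the naming scheme for snapshot folders, and the index prefix are all my own choices. It copies whole folder trees, so it is only reasonable for small target sets.

```python
import shutil
import time
from pathlib import Path


def take_snapshot(targets, snapshot_root, task):
    """Copy each target folder into a new snapshot folder named for the task.

    targets may be non-nested folders; each copy gets an index prefix so
    folders with the same name in different trees do not collide.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(snapshot_root) / f"{task}-{stamp}"
    for index, target in enumerate(targets):
        target = Path(target)
        # copytree creates dest and any missing parents on first use.
        shutil.copytree(target, dest / f"{index}-{target.name}")
    return dest
```

Restoring is then just copying the wanted file or folder back from the snapshot taken before the task in question. Two snapshots for the same task within the same second would collide and fail, which is fine for a sketch.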

NoSQL Database

Finally, there have been alternatives to the Relational Database Management System (RDBMS) for many years.  Most recently, this is the NoSQL group of projects such as CouchDB.    CouchDB claims that it is: “Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.”   Those features sound like something an ad hoc version control system should have.  Yet CouchDB, and perhaps all of these, are document centric.  Still, worth pondering.

Conclusion

Presented were a few thoughts on an approach to ad hoc versioning.  A DVCS was proposed as a lightweight solution and some issues were mentioned.  Alternatives were looked at.  More research is required to evaluate the proposal and determine best practices for the discussed scenarios.

Updates

7/15/10:  Changed “maintain” to “accomplish” in Scenario as per feedback from K. Grover.

7/23/10:  Forgot that I visited Ben Tsai’s blog where he discusses using Mercurial within an existing VCS such as Subversion, which I’ve also done, but not really the topic I discussed.

Further Reading

“HgInit: Ground up Mercurial”, http://hginit.com/01.html

Setting up for a Team

“Easy Automated Snapshot-Style Backups with Linux and Rsync” http://www.mikerubel.org/computers/rsync_snapshots/

Using Mercurial as ad-hoc local version control

“Intro to Distributed Version Control (Illustrated)”
http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/

Version Control, infrastructures.org
http://www.infrastructures.org/bootstrap/version.shtml

“The Risks of Distributed Version Control”,
http://blog.red-bean.com/sussman/?p=20

Subversion Re-education

“Subverting your homedir, or keeping your life in svn”
http://kitenet.net/~joey/svnhome/ (He now uses Git)
http://microseeds.com/blog/?p=95

Home directory version control notification
http://kristian-domagala.blogspot.com/2008/10/home-direcotry-version-control.html

“Managing your web site with Mercurial”, Tim Post, http://echoreply.us/tuto/mercurial_site_management.html

SingleDeveloperMultipleComputers
http://mercurial.selenic.com/wiki/SingleDeveloperMultipleComputers

Mercurial by example
http://www.jemander.se/MercurialByExample.pdf

Mercurial (hg) with Dropbox
http://www.dzone.com/links/r/mercurial_hg_with_dropbox.html

Mercurial for Git users
http://mercurial.selenic.com/wiki/GitConcepts

Versioning File System
http://en.wikipedia.org/wiki/Versioning_file_system#Linux

Agile Operations in the Enterprise
Michael Nygard, http://www.infoq.com/articles/agile-operations

git-sync
http://code.google.com/p/git-sync/

git-flow
https://github.com/nvie/gitflow

Microsoft System Center Essentials
http://www.microsoft.com/systemcenter/essentials/en/us/default.aspx

A utility that keeps track of changes to the etc configuration folder:
http://kitenet.net/~joey/code/etckeeper/

Version Control for Multiple Agile Teams
http://www.infoq.com/articles/agile-version-control#q22

DVCS
http://en.wikipedia.org/wiki/Distributed_Version_Control_System

DVCS vs Subversion smackdown, round 3

“Using Mercurial as ad-hoc local version control”; Tsai, Ben;
http://bentsai.wordpress.com/2008/05/30/using-mercurial-as-ad-hoc-local-version-control/#comment-24

Tracking /etc etc

Subversion
http://subversion.apache.org/

For a more detailed exposition, see the Mercurial tutorial:
http://www.serpentine.com/mercurial/index.cgi?Tutorial

The Hg manpage is available at:  http://www.selenic.com/mercurial/hg.1.html

There’s also a very useful FAQ that explains the terminology:
http://www.selenic.com/mercurial/FAQ.html

There’s also a good README:  http://www.selenic.com/mercurial/README

HG behind the scenes:
http://hgbook.red-bean.com/read/behind-the-scenes.html

Mercurial
http://en.wikipedia.org/wiki/Mercurial%28software%29

Mercurial Basic workflows
http://mercurial.selenic.com/guide/#basic_workflow

Mercurial BigFiles Extension
http://mercurial.selenic.com/wiki/BigfilesExtension

Mercurial LargeFiles Extension
LargeFiles

Mercurial Subrepos: A past example revisited with a new technique
http://playcontrol.net/ewing/jibberjabber/mercurial_subrepos_a_past_e.html

Mercurial(hg) Cheatsheet for Xen  http://xen.org/files/hg-cheatsheet.txt

A Guide to Branching in Mercurial
http://stevelosh.com/blog/2009/08/a-guide-to-branching-in-mercurial/

Subrepositories
http://mercurial.selenic.com/wiki/subrepos

Nested Repositories
http://mercurial.selenic.com/wiki/NestedRepositories

hgdeps
http://ratatanek.cz/hg/hgdeps/file/ab2935095cb9/deps.py

Tracking 3rd-party sources
http://www.selenic.com/pipermail/mercurial/2007-April/013002.html

TortoiseHg
http://tortoisehg.bitbucket.org/

Git
http://en.wikipedia.org/wiki/Git_%28software%29

Git as an alternative to unison
http://kitenet.net/~joey/blog/entry/gitless/

