
VersionControlForGrammarDevelopment

MichaelGoodman edited this page Nov 8, 2011 · 5 revisions

Version Control for Grammar Development

This page describes how to use several version control systems (VCSs) with grammars. Version control (also called "revision control") is essentially a more sophisticated way of backing up files. Users "check out" files from a repository, modify them, and "commit" their changes back into the repository. These systems can usually merge changes from multiple users, which facilitates collaboration, and users can retrieve previous versions of files from the history, for example to revert damaging changes. The repository is usually stored somewhere other than the development machines for safety. More information is available at http://en.wikipedia.org/wiki/Revision_control.

Centralized vs. Distributed

Traditionally, version control has been done with centralized systems like CVS or Subversion (SVN), but distributed systems, such as Git, Bazaar, or Mercurial, have become popular due to a number of benefits they provide.

Centralized systems, true to their name, need a central host for the repository. Users work in an "instance" of the repository (what Subversion calls a "working copy"). The instance may be moved around in a file system, but the repository must stay in the same location. All users check out from and commit to the same central repository, so changes are available to everyone else as soon as they are committed.

Distributed systems are "headless" in that they do not need a central host. There are no "instances" of the repository, because the repository (including all the history) is stored inside the directory of the files it manages, for example in a .git or .hg subdirectory. This allows one to move the repository around and still be able to commit and view changes. One backs up the repository by simply copying the directory somewhere else. There is usually an official branch located on a publicly accessible server. Users with permission commit changes by merging their own branch with the official version; others make their version public so the maintainer of the official branch can merge the changes.

Other than the infrastructural differences, interactions with both centralized and distributed systems are largely the same.
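
The claim that a distributed repository carries its full history inside the managed directory can be checked with a small experiment. The following is a minimal sketch using Git; the scratch directory, the one-line finnish.tdl file, and the user identity are placeholders invented for the illustration, not part of any real grammar.

```shell
# Sketch: a distributed repository travels with its directory.
# All paths, file contents, and the identity below are placeholders.
set -e

work=$(mktemp -d)                 # scratch area standing in for ~/grammars
mkdir "$work/fin"
cd "$work/fin"

# Create a repository with a single commit.
git init -q
printf ';;; Finnish grammar\n' > finnish.tdl
git add finnish.tdl
git -c user.name="user" -c user.email="user@host" commit -q -m "Initial commit."

# "Back up" the repository by copying the directory, history included.
cp -R "$work/fin" "$work/fin-backup"

# The copy is a complete repository: its log contains the same commit.
cd "$work/fin-backup"
copies=$(git log --oneline | wc -l | tr -d ' ')
echo "commits visible in the copy: $copies"
```

No server is involved at any point, which is why the same approach would not work for a Subversion instance: copying it copies the files, but the history stays in the central repository.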

Subversion

Subversion has traditionally been the VCS of choice for grammars, as there is usually an official version of the grammar hosted by a university. The following example puts a Finnish grammar (located at ~/grammars/fin) into an SVN repository.

Note: The example below initializes the repository in /tmp on the same machine as development. For real use, initialize it on a separate machine.

Initializing the repository:

user@host:~$ # svnadmin create REPOS_PATH
user@host:~$ svnadmin create /tmp/finnish

Importing the grammar directory. The -m option supplies a commit message; if you omit it, you will be prompted for a message in a text editor. If PATH is not specified, the current directory is used. Importing does not make the current directory an instance of the repository, so we check out an instance somewhere else:

user@host:~$ cd ~/grammars/fin
user@host:~$ # svn import [PATH] REPOS_URL -m MSG
user@host:~/grammars/fin$ svn import file:///tmp/finnish -m "Initial import."
Adding  licence.txt
Adding  finnish.tdl
 ...
Adding  choices

Committed revision 1.
user@host:~/grammars/fin$ cd ~/grammars
user@host:~/grammars$ # svn co REPOS_URL [PATH]
user@host:~/grammars$ svn co file:///tmp/finnish fin2
Checked out revision 1.

Alternatively, we could have checked out an instance of the still-empty repository into the grammar directory, added all the files, and then committed. Do not both import and add; do one or the other:

user@host:~/grammars/fin$ # svn co REPOS_URL [PATH]
user@host:~/grammars/fin$ svn co file:///tmp/finnish ./
Checked out revision 0.
user@host:~/grammars/fin$ # svn add FILES
user@host:~/grammars/fin$ svn add ./*
A       licence.txt
A       finnish.tdl
 ...
A       choices
user@host:~/grammars/fin$ svn ci -m "Initial commit."
Adding  licence.txt
Adding  finnish.tdl
 ...
Adding  choices
Transmitting file data ..........................
Committed revision 1.

Checking status and committing changes:

user@host:~/grammars/fin$ emacs finnish.tdl
user@host:~/grammars/fin$ # (edit and save file)
user@host:~/grammars/fin$ svn status
M       finnish.tdl
user@host:~/grammars/fin$ svn diff finnish.tdl
Index: finnish.tdl
===================================================================
--- finnish.tdl (revision 1)
+++ finnish.tdl (working copy)
@@ -4,8 +4,8 @@
 ;;;     Wed Jul 14 13:59:14 UTC 2010
 ;;; based on Matrix customization system version of:
 ;;;     unknown time
-;;;
-;;; 
+;;; author:
+;;;     user
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;
user@host:~/grammars/fin$ # svn ci [FILES] -m MSG
user@host:~/grammars/fin$ svn ci finnish.tdl -m "Added author comment."
Sending        finnish.tdl
Transmitting file data .
Committed revision 2.

Git and Mercurial

Both Git and Mercurial are distributed version control systems, so the process is slightly different.

The Grammar Matrix Customization System lets you download a grammar already initialized with Git or Mercurial, in which case the next two commands may be unnecessary. If your grammar has no version control configured, the following commands initialize an empty repository and add the files to it:

In Git:

user@host:~/grammars/fin$ git init
user@host:~/grammars/fin$ git add .
# no output

In Mercurial:

user@host:~/grammars/fin$ hg init
user@host:~/grammars/fin$ hg add
adding LICENSE
adding METADATA
...
adding tsdb/skeletons/Relations

You can check the status of the repositories with the following commands:

In Git:

user@host:~/grammars/fin$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   README
#       new file:   Version.lsp
...
#       new file:   test_sentences
#

In Mercurial:

user@host:~/grammars/fin$ hg st
A LICENSE
A METADATA
...
A tsdb/skeletons/Relations

If the changes are acceptable, you can commit them as follows:

In Git (Git may also print a message about configuring your user name and email):

user@host:~/grammars/fin$ git commit -m "Initial commit."
[master (root-commit) abdb9b7] Initial commit.
 Committer: user <user@host>

 27 files changed, 6507 insertions(+), 0 deletions(-)
 create mode 100644 README
 create mode 100644 Version.lsp
 ...
 create mode 100644 test_sentences

In Mercurial:

user@host:~/grammars/fin$ hg ci -m "Initial commit."
# no output
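
The edit, inspect, and commit cycle shown for Subversion earlier looks much the same in Git. The following is a minimal sketch, assuming a scratch repository created just for the demonstration; the two-line finnish.tdl and the user identity are placeholders.

```shell
# Sketch: edit, check status, view the diff, commit, and read back history.
# The repository, file contents, and identity are placeholders.
set -e

work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "user"            # identity stored only in this repository
git config user.email "user@host"

# First version of the file.
printf ';;; author:\n;;;\n' > finnish.tdl
git add finnish.tdl
git commit -q -m "Initial commit."

# Edit the file, then inspect the change before committing it.
printf ';;; author:\n;;;     user\n' > finnish.tdl
status=$(git status --short)           # "M" marks a modified tracked file
echo "$status"
git --no-pager diff finnish.tdl        # unified diff against the last commit

# Commit the change; -a stages every modified tracked file first.
git commit -q -a -m "Added author comment."

# Both versions are now in the history; the older one can still be read.
commits=$(git log --oneline | wc -l | tr -d ' ')
previous=$(git show HEAD~1:finnish.tdl)
echo "commits: $commits"
```

In Mercurial, hg status, hg diff, hg commit, and hg log play the corresponding roles.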

The difference between SVN and Git or Mercurial is that a commit goes to the repository initialized in the grammar directory, not to a central repository elsewhere. One major benefit is that you can commit and view changes without an internet connection. If you have an official branch for your grammar, you can "push" your local change history to the official location. The syntax of the push command depends on the kind of location you push to, so look up the specific command for your situation. Similarly, to retrieve changes from a remote branch, you "pull" them and merge them into your local repository.
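
Pushing and pulling can be tried out without any server. The following is a minimal sketch in Git in which a bare repository on the local disk stands in for the official location; the directory names, the "collaborator" identity, and the file contents are assumptions made for the example. With a real official branch, the path would be replaced by whatever URL your host provides.

```shell
# Sketch: push to and pull from an "official" repository.
# A local bare repository stands in for the remote host; all names are placeholders.
set -e

work=$(mktemp -d)

# Stand-in for the official location: a bare repository (history only, no working files).
git init -q --bare "$work/official.git"

# A developer's local repository with one commit.
mkdir "$work/fin"
cd "$work/fin"
git init -q
git config user.name "user"
git config user.email "user@host"
printf ';;; Finnish grammar\n' > finnish.tdl
git add finnish.tdl
git commit -q -m "Initial commit."
branch=$(git symbolic-ref --short HEAD)   # current branch name, e.g. "master"

# Register the official location under the conventional name "origin" and push.
git remote add origin "$work/official.git"
git push -q origin "$branch"

# A collaborator clones the official repository, commits, and pushes back.
git clone -q "$work/official.git" "$work/fin-collab"
cd "$work/fin-collab"
git config user.name "collaborator"
git config user.email "collaborator@host"
printf ';;; author: collaborator\n' >> finnish.tdl
git commit -q -a -m "Added author comment."
git push -q origin "$branch"

# Back in the first repository, pull retrieves and merges the collaborator's change.
cd "$work/fin"
git pull -q origin "$branch"
commits=$(git log --oneline | wc -l | tr -d ' ')
echo "commits after pull: $commits"
```

Every commit in this sketch is made locally first; only the push and pull steps touch the shared location.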
