2. Git fundamentals

Introduction

Git is a distributed revision control system. It was initially designed by Linus Torvalds for Linux kernel development. Git's current software maintenance is overseen by Junio Hamano.

It was developed as a set of programs designed specifically for use in scripts, so specialized revision control systems or user interfaces can be built on top of it. For instance, Cogito is a frontend to Git repositories, and StGit uses Git for patch management.

Git supports rapid branching and merging of versions and includes tools for visualizing and navigating a nonlinear development history. Like Darcs, BitKeeper, Mercurial, SVK, Bazaar and Monotone, Git gives each developer a local copy of the development history; all changes are copied from one repository to another.

Remote access to Git repositories is provided by git-daemon, gitosis, SSH or HTTP. The TCP service git-daemon is distributed with Git and is, along with SSH, one of the most common access methods. Note that git-daemon itself performs no authentication or encryption, so SSH is the method to choose when access must be secured. As for HTTP, it is very popular in controlled networks despite its limitations, because it works with the existing configurations of network filters.

Working with a remote repository: The Basics

git clone — cloning a (remote) repository

To start working with the central repository, you will have to get a local copy of the original project with all its history.

Clone the repository, using the http: protocol:

git clone http://user@somehost:port/~user/repository/project.git

Clone the repository from the same machine to the directory myrepo:

git clone /home/username/project myrepo

Clone the repository, using the secure ssh protocol:

git clone ssh://user@somehost:port/~user/repository

Git has its own protocol, too:

git clone git://somehost:port/~user/repository/project.git/

Import the svn repository, using http:

git svn clone -s http://repo/location

-s – recognize standard SVN directories (trunk, branches, tags)

git fetch and git pull — fetching revisions from the central repository

To synchronize the current branch with the repository, git fetch and git pull are used.

git fetch with no arguments fetches new revisions from the default repository, that is, the one that was used for cloning. Only the remote-tracking branches are updated; use git merge afterwards to merge them into the local branch.

git fetch /home/username/project — fetching revisions from the specified repository

You may also use synonyms for paths created by the git remote command:

git remote add username-project /home/username/project

git fetch username-project — fetch revisions at the path the synonym points to.

Once you have studied the differences (for instance, with git diff), you will usually want to merge the fetched revisions into your main branch:

git merge username-project/master

git pull both fetches revisions and merges them with the active branch.

Pull from the repository where the remote branches were created by default:

git pull

Pull revisions and tags from the specified repository:

git pull username-project --tags

In practice, git pull is usually used right away instead of separate git fetch and git merge steps.
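The fetch-then-merge sequence described above can be tried without any server. The following is a minimal sketch, assuming a scratch directory and two hypothetical repositories named project and myrepo; the identity "Demo User" is made up for the example.

```shell
set -eu
work=$(mktemp -d)                  # scratch area; every name below is hypothetical
cd "$work"

# A stand-in for another developer's repository.
git init -q project
git -C project symbolic-ref HEAD refs/heads/master
git -C project config user.name "Demo User"
git -C project config user.email "demo@example.com"
echo "first" > project/README
git -C project add README
git -C project commit -q -m "initial commit"

# Clone it, then let the original move ahead by one commit.
git clone -q project myrepo
echo "second" >> project/README
git -C project commit -q -a -m "update README"

cd myrepo
git remote add username-project "$work/project"
git fetch -q username-project                    # updates username-project/master only
git diff --stat master username-project/master   # study the differences first
git merge -q username-project/master             # then bring them into the local branch
git log --pretty=oneline
```

Until the merge, the local master branch is untouched; only the remote-tracking branch moves.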

git push — making changes in the remote repository

After working in the experimental branch, you will have to update the remote repository (remote branch) before merging. To do so, the git push command is used.

Send your revisions to the remote branch that was cloned by default:

git push

Send revisions from the master branch to the experimental branch of the remote repository:

git push ssh://yourserver.com/~you/proj.git master:experimental

In the origin remote repository, delete the experimental branch:

git push origin :experimental

Push the local master branch to the remote master branch of the origin repository (synonym for the default repository):

git push origin master:master

Push the master branch, together with all tags, to the origin repository:

git push origin master --tags
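The refspec forms shown above (source:destination, and an empty source for deletion) can be checked against a local bare repository. This is only a sketch: server.git and local are hypothetical names, and the bare repository merely plays the role of the remote server.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

git init -q --bare server.git                # plays the role of the remote repository
git clone -q server.git local 2>/dev/null    # cloning an empty repository prints a warning
cd local
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"

git push -q origin master:master             # local master -> remote master
git push -q origin master:experimental       # local master -> remote experimental
git ls-remote --heads origin                 # lists both remote branches
git push -q origin :experimental             # empty source: delete remote experimental
git ls-remote --heads origin                 # only master remains
```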

Working with a local repository

Basic commands

git init — creating a repository

The git init command creates an empty repository in the current directory: a directory named .git, where all information about the commit history, the tags and the project's development in general will be stored:

mkdir project-dir
cd project-dir
git init

git add and git rm — indexing changes

The next thing you need to know is the git add command. It adds the changes that will later become part of the commit to the index, a temporary staging area. Here are some examples:

indexing a modified file, or telling Git about a newly created file:

git add EDITEDFILE

adding all changes to the index, including the new files:

git add .

To remove a file both from the index and the project tree, use git rm:

removing separate files:

git rm FILE1 FILE2

an example that removes all .txt files from the Documentation directory (the backslash keeps the shell from expanding the asterisk, so that Git matches the pattern itself):

git rm Documentation/\*.txt

removing every file from the index while leaving the working tree untouched (the files become untracked):

git rm -r --cached .

To clear the index or to remove just the revisions of one file from it, use git reset:

clearing the index:

git reset

removing one particular file from the index:

git reset -- EDITEDFILE

git reset can be used for other purposes too and will be described later in much more detail.
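To see how git add, git rm and git reset act on the index, here is a small sketch in a throwaway repository. The file names FILE1 and FILE2 follow the examples above; everything else (the scratch directory, the identity) is an assumption made for the demonstration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > FILE1
echo "two" > FILE2
git add .                          # index both new files
git commit -q -m "initial commit"

echo "edited" >> FILE1
git add FILE1                      # index the modification
git rm -q FILE2                    # delete FILE2 and index the deletion
git diff --cached --name-status    # FILE1 as modified (M), FILE2 as deleted (D)

git reset -q -- FILE1              # take FILE1 out of the index; the edit stays on disk
git diff --cached --name-status    # only the deletion of FILE2 is still indexed
```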

git status — the project status, files that have been modified but not added, indexed files

git status is perhaps one of the most common Git commands, along with committing and indexing. It displays all differences between the project directories and the latest commit of the working branch; files that have been indexed and those that have not are listed separately. This command is very easy to use:

git status

Besides, git status lists files with unresolved merge conflicts and, with the --ignored option, files ignored by Git.

git commit — making a commit

A commit is a basic concept in all revision control systems; as such, it has to be easy and as fast as possible. The simplest way is to run, after indexing:

git commit

If the index is not empty, a commit will be made from it; you will then be prompted to describe your changes in a text editor. Save and quit, and here we are! The commit is ready.

There are several parameters, which make working with git commit even easier:

git commit -a
The commit will be made with automatic indexing of the project's modified files. Note that new files will not be indexed, while all removals will be recorded.
git commit -m "commit comment"
The commit message is taken from the command line instead of the text editor.
git commit FILENAME
The changes to the named file will be indexed and the commit made from the changes to that file only.

git reset — returning to a commit, making a soft or a hard reset

Apart from working with the index (see above), git reset allows you to return to a commit in the development history. Two resetting modes are described here: soft reset and hard reset.

The soft reset (with the --soft parameter) leaves your index and your file tree as they were and only moves the branch back to the commit you specified. In other words, if you happen to notice an error in the commit you have just made or in its message, you can easily mend the situation:

  1. git commit (the wrong commit)
  2. git reset --soft HEAD^ (step back to the previous commit, while the project's state and the indexed files remain untouched)
  3. edit WRONGFILE
  4. edit ANOTHERWRONGFILE
  5. git add .
  6. git commit -c ORIG_HEAD (commit again, starting from the message of the original commit; you will be prompted to edit it). If you do not want to edit the message, use the upper-case parameter instead of -c:
    git commit -C ORIG_HEAD

Note that HEAD^ means "the parent of the latest commit". We will give a more detailed description of this relative addressing later, in Hashes, tags, relative addressing. HEAD itself points to the latest commit, and after the soft reset the reference ORIG_HEAD points to the original commit.
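The numbered steps above can be replayed in a throwaway repository. This is a sketch under assumptions: the file contents, the commit messages and the sed call that stands in for "edit WRONGFILE" are all made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > WRONGFILE
git add .
git commit -q -m "initial commit"

echo "tpyo" >> WRONGFILE
git commit -q -a -m "add a line"     # the wrong commit
git reset -q --soft HEAD^            # step back; files and index stay as they were
sed 's/tpyo/typo/' WRONGFILE > WRONGFILE.tmp && mv WRONGFILE.tmp WRONGFILE   # "edit WRONGFILE"
git add .
git commit -q -C ORIG_HEAD           # reuse the original message without editing it
git log --pretty=oneline
```

The history again holds two commits, and the second one carries the original message with the corrected file.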

Of course, you can get even deeper down your development history, if you need to.

The hard reset (with the --hard parameter) is a command that you should use with caution. git reset --hard returns the project tree and the index to the state of the commit you specified, and the changes brought in by the subsequent commits will be lost:

git add .
git commit -m "destined to death"
git reset --hard HEAD~1    # no one is ever going to see this disgraceful commit again!
git reset --hard HEAD~3    # ...or, rather, these three last commits. No one, remember. Lost for posterity.

Commits that are still reachable from another branch (for instance, when the reset goes back past the point where that branch forked off) are not deleted.
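A short sketch of a hard reset in a scratch repository, assuming three trivial commits and a hypothetical second branch named keep. It shows both effects: the current branch and the files go back, while a commit that another branch still contains survives.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
for n in 1 2 3; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done

git branch keep                    # a second branch that also contains commit 3
git reset -q --hard HEAD~1         # current branch, index and files return to commit 2
cat file                           # prints 2
git log --pretty=format:%s keep    # commit 3 is still reachable through "keep"
```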

You will see more examples on merging or on pulling the latest revisions from the remote repository in the respective sections.

git revert — reverting the revisions made with a commit

You may want to undo the revisions brought in by some commit. Git revert will create a new commit, thus reverting the changes.

Revert a tagged commit:

git revert config-modify-tag

Revert the commit using its hash:

git revert f292ef5

To run this command, the working tree must be clean: the project's state has to match the state recorded by the latest commit.
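The following sketch shows that git revert adds a commit rather than removing one. The file name config and the commit messages are assumptions; the tag name is borrowed from the example above, and --no-edit is used so that no editor is opened.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "good" > config
git add config
git commit -q -m "add config"
echo "bad" >> config
git commit -q -a -m "break config"
git tag config-modify-tag

git revert --no-edit config-modify-tag   # new commit that undoes the tagged one
git log --pretty=format:%s               # three commits; the newest is the revert
cat config                               # prints good
```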

git log — miscellaneous on commits

Sometimes you will want information about the history of commits, about the commits that modified a particular file, about the commits made in a given time interval, and so on. All of this can be done with git log.

If you do not specify any options, the program will return a summary on all commits in the currently active branch (to learn more about branching and merging, see the Branching section):

git log

To get detailed info on each of them in the form of patches for the commit files, add -p as a parameter (or -u):

git log -p

To see the revision statistics, such as the number of modified files, or the number of lines that were added, use the --stat parameter:

git log --stat

If you append --summary, information on file creations, renames and access-right changes will appear:

git log --summary

To see the history of a particular file, specify its name as a parameter:

git log README

If the name could be mistaken for a branch or a tag, separate it from the options with "--":

git log -- README

The examples below use the shorter form. It is possible to specify the time elapsed since some moment (in "weeks", "days", "hours", "seconds" and so on):

git log --since="1 day 2 hours" README
git log --since="2 hours" README

for revisions belonging to one particular directory:
git log --since="2 hours" dir/

Tags may also be a reference point.

To view information on all commits made after tag v1:

git log v1..

For all commits after tag v1 that include revisions of the README file:
git log v1.. README

For all commits that include revisions of the README file, beginning with tag v1 and ending with tag v2:
git log v1..v2 README

The --pretty parameter has interesting features.

For instance, the following command will display a line for every commit, including the hashed value (the term "hash" applies here to the commit's unique identifier; see more details later):

git log --pretty=oneline

For a summary of commits (only the author and the comment will be displayed), execute:
git log --pretty=short

For more verbose info on commits, use full (author and committer) or fuller (author, committer and the dates of both):
git log --pretty=full
git log --pretty=fuller

You can actually define the return format manually:

git log --pretty=format:'FORMAT'

To define the format correctly, please refer to the "git log" section of the Git Community Book or to the manual. You can display a nice ASCII-graph of the commits with --graph.
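As an illustration of a manually defined format, the sketch below uses three placeholders from the git log manual: %h (abbreviated hash), %an (author name) and %s (subject line). The repository, the identity and the commit messages are made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > README
git add README
git commit -q -m "first change"
echo "two" >> README
git commit -q -a -m "second change"

git log --pretty=format:'%h %an %s'   # one line per commit: hash, author, subject
echo
git log --graph --pretty=oneline      # the same history drawn as an ASCII graph
```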

git diff — viewing differences between project trees, commits, etc.

The git diff command is closely related to git log. It shows the differences between objects within the project, that is, between trees (files and directories).

Show non-indexed revisions:

git diff

Show indexed revisions:
git diff --cached

Show differences in the project, compared to the latest commit:
git diff HEAD

...compared to the last but one commit:
git diff HEAD^

You can compare the branches' heads:
git diff master..experimental

...or the active branch with any other branch:
git diff experimental

git show — displaying revisions brought in by one commit

To view the changes made by any commit from the history, use the git show command:

git show COMMIT_TAG

git blame and git annotate — tracking the revisions of files

When working in a team, you often need to know who did what. To see, for each line of a file, the latest commit that changed it together with its author and hash, use git blame:

git blame README

You can also specify the lines you are interested in:

git blame -L 2,+3 README

displays info on three lines, beginning with the second line.

git annotate works similarly: it shows the lines and the info about the commits they relate to:

git annotate README

git grep — searching by word through the project or the history of project's statuses

git grep, in fact, just duplicates the functionality of the well-known Unix command. However, it can look for words and groups of words in the project's history, and this can prove to be handy.

Look for "tst" in the project:

git grep tst

Count the number of entries for "tst" in the project:

git grep -c tst

Search through an older version of the project:

git grep tst v1

This command supports boolean AND/OR.

Look for lines containing both the first and the second word:

git grep -e 'first' --and -e 'another'

Look for lines containing at least one of the words specified, but only in files that contain a match for each of them:

git grep --all-match -e 'first' -e 'second'
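The difference between --and, plain multiple -e patterns and --all-match is easiest to see on two tiny files. The sketch below assumes hypothetical files a.txt and b.txt; git grep searches tracked files, so they are added to the index first.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
printf 'first only\nfirst and another\n' > a.txt
printf 'another only\n' > b.txt
git add .

git grep -e 'first' --and -e 'another'         # a.txt:first and another
git grep -e 'first' -e 'another'               # all three lines (OR)
git grep --all-match -e 'first' -e 'another'   # both lines of a.txt; b.txt never matches 'first'
git grep -c 'first'                            # a.txt:2
```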

Branching

git branch — creating, listing and removing branches

Working with branches is very easy in Git, since all the necessary tools are called with one command:

Just list the existing branches and mark the active one:

git branch

Create a new branch, named "new-branch":
git branch new-branch

Remove the branch, provided it has already been fully merged:
git branch -d new-branch

Remove the branch unconditionally, merged or not:
git branch -D new-branch

Rename the current branch:
git branch -m new-name-branch

Show only the branches that contain the specified commit, that is, have it as an ancestor:
git branch --contains v1.2

git checkout — switching between branches, extracting files

The git checkout command switches between the latest commits of the branches (this is a simplified description):

git checkout some-other-branch

Create a new branch and switch to it:
git checkout -b some-other-new-branch

If the working tree has uncommitted changes that the switch would overwrite, the program will refuse to switch, so that the work is not lost. To switch anyway and discard those changes, use the -f parameter:

git checkout -f some-other-branch

If you want to keep the changes, use the -m parameter. The program will then try to merge your local changes into the target branch while switching; any conflicts are left in the working tree for you to resolve:

git checkout -m some-other-branch

To bring back a file (maybe just from the previous commit), do as shown below:

Return "somefile" to its last indexed state (which is the state of the latest commit, if nothing has been indexed since):

git checkout somefile

Return "somefile" to its state two commits back on the branch:

git checkout HEAD~2 somefile

git merge — merging branches (resolving conflicts)

Unlike in centralized systems, merging in Git is an everyday practice with an easy-to-use interface:

Try to merge the current branch with the "new-feature" branch:

git merge new-feature

Should any conflicts occur, there will be no commit; the problem files will get conflict markers, as in SVN, and will be recorded in the index as "unmerged". As long as the conflicts persist, you will not be able to make a commit.

Suppose there is a conflict in the file TROUBLE, as you can see with git status:

The merge attempt fails:

git merge experiment

Looking for problems:

git status

Resolving problems:

edit TROUBLE

Indexing our changes, thus marking the conflicts as resolved:

git add .

Now you can commit the merge:

git commit

And here we are: nothing complicated about it. If you change your mind while the problems are being resolved, just type the following (to bring both branches to their initial respective states):

git reset --hard HEAD

If, however, the merge commit was made, use the following command:

git reset --hard ORIG_HEAD
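The whole conflict scenario can be reproduced in a scratch repository. In this sketch the branch names master and experiment and the file TROUBLE follow the text above; the file contents and commit messages are assumptions, and the "edit TROUBLE" step is simulated by overwriting the file.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > TROUBLE
git add TROUBLE
git commit -q -m "base"

git checkout -q -b experiment
echo "experiment" > TROUBLE
git commit -q -a -m "experiment change"
git checkout -q master
echo "master" > TROUBLE
git commit -q -a -m "master change"

git merge experiment || true       # stops: both branches changed the same line
git status --short                 # UU TROUBLE
echo "merged by hand" > TROUBLE    # resolve: replace the conflict markers
git add TROUBLE
git commit -q -m "merge experiment"
git log --graph --pretty=oneline
```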

git rebase — creating a straight line of commits

Suppose that the developer began an additional branch for working on one particular feature and made several commits in it. At the same time, commits were also made in the main branch, whatever the reason: for instance, revisions were merged from a remote server, or the developer committed to it as well.

In principle, you could use the ordinary git merge in this situation. But this would complicate the development history, and you would not want this in big projects that employ many developers.

Suppose we have two branches, "master" and "topic", and that commits were made in each of them since the branch point. The git rebase command will take the commits from the "topic" branch and replay them on top of the latest commit in the "master" branch:

to specify explicitly both the target and the branch being rebased:

git rebase master topic

to rebase the current branch onto the master branch:

git rebase master

After this, the development history will become linear. If a conflict occurs while the commits are being applied one by one, the command will stop and the problem lines will get conflict markers. After editing (that is, when the conflicts are resolved), you should index the files with git add and continue replaying the remaining commits with git rebase --continue. You could also execute git rebase --skip (skipping the commit) or git rebase --abort (stopping the command and undoing all the changes made).

If used with the -i (or --interactive) parameter, the command will run in interactive mode: the list of commits to be replayed is opened in the editor, and the user can change their order, drop or squash some of them, or stop at a commit to amend it.
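A minimal sketch of a conflict-free rebase, assuming a scratch repository where master and topic each gained one commit after the branch point. File names and messages are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b topic
echo "t" > topic.txt
git add topic.txt
git commit -q -m "topic work"
git checkout -q master
echo "m" > master.txt
git add master.txt
git commit -q -m "master work"

git rebase -q master topic        # switch to topic and replay it on top of master
git log --pretty=format:%s        # topic work, master work, base: a straight line
```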

git cherry-pick — applying the revisions brought in by one particular commit to the project tree

If the development history is quite complex and branched, you may need to apply the changes made by a commit in one branch to the tree of another branch (the one currently active):

the changes brought in by the specified commit will be applied to the working tree, automatically indexed and will become a commit in the active branch:

git cherry-pick BUG_FIX_TAG

The -n parameter tells the program to apply the changes to the project tree and the index without creating a commit.

git cherry-pick BUG_FIX_TAG -n
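The sketch below picks a single tagged commit from a hypothetical feature branch onto master, leaving the later commit of that branch behind. The tag name follows the example above; the branch, files and messages are assumptions.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "base"

git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "bug fix"
git tag BUG_FIX_TAG
echo "extra" > extra.txt
git add extra.txt
git commit -q -m "unrelated work"

git checkout -q master
git cherry-pick BUG_FIX_TAG       # only the tagged commit is copied to master
git log --pretty=format:%s        # bug fix, base
```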

Other commands and useful features

Hash — unique identification of objects

To identify objects, Git uses hashes of 40 hexadecimal characters each, unique with a very high probability and calculated by a hash function (SHA-1) from the object's contents. Objects can be anything: commits, files, tags, trees. Since a hash is unique for given contents, it is very easy to compare objects: you need only compare two 40-character strings.

For us, the most important thing is that hashes identify commits. A hash is, so to say, an advanced analog of Subversion revisions. See below some examples of using hashes as a method of addressing:

Finding differences between the current state of the project and commit number… please read its number yourself, will you?

git diff f292ef5d2b2f6312bc45ae49c2dc14588eef8da2

The same, but using the first seven characters only. Git will work out which commit is meant, as long as no other object has a hash beginning with the same characters:

git diff f292ef5

Even four characters may suffice:

git diff f292

Reading the log for the commits from X to Y:

git log febc32..f292
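Both properties of hashes mentioned above, that they depend only on the contents and that a unique prefix is enough, can be checked with git hash-object and git rev-parse. This is a sketch in a scratch repository with the default SHA-1 object format; the variable names are arbitrary.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"

# Identical contents always hash to the same name.
a=$(printf 'same contents\n' | git hash-object --stdin)
b=$(printf 'same contents\n' | git hash-object --stdin)
echo "$a"
echo "$b"

full=$(git rev-parse HEAD)             # full name of the latest commit
short=$(git rev-parse --short HEAD)    # a prefix Git considers unambiguous
echo "$short -> $(git rev-parse --verify "$short")"
```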

You must have noticed by now that working with hashes is not easy: we are not machines, we are human beings. That is why tags were invented.

git tag — tags as a way of marking a unique commit

A tag is an object related to the commit; it stores a link to the commit itself, the author, its own name and some comments. Besides, developers can sign tags with their digital signature.

Git also supports so-called "lightweight tags", which consist only of a name and a link pointing to the commit. These are usually used to facilitate navigation; they can be created very easily:

create a lightweight tag for the latest commit. If a tag with this name already exists, the command fails and nothing is changed:

git tag stable-1

tag a commit:

git tag stable-2 f292ef5

delete the tag:

git tag -d stable-2

list the tags

git tag -l

create a tag for the latest commit, replacing the existing tag of the same name, if there was any:

git tag -f stable-1.1

Once the tag is created, its name can be used instead of the hashed value in any commands like git diff, git log, etc.:

git diff stable-1.1...stable-1

Annotated tags are handy for attaching some information to the commit, such as the version number or a comment. That is, while the commit itself is commented with "X fixed", something like "production version" goes into the comment of the tag v1.0:

create an annotated tag for the latest commit; a text editor will be opened for the comment:

git tag -a stable

create an annotated tag, with the comment given as an argument:

git tag -a stable -m "production version"

The listing, deleting and overriding commands for annotated tags are precisely the same as for the lightweight tags.
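The difference between the two kinds of tags shows up in the type of object each name refers to. A sketch, assuming a scratch repository with one commit; the tag names are taken from the examples above.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "X fixed"

git tag stable-1                            # lightweight: just a name for the commit
git tag -a stable -m "production version"   # tag object with its own comment
git cat-file -t stable-1                    # prints commit
git cat-file -t stable                      # prints tag
git tag -l                                  # lists both names
```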

Relative addressing

Apart from hashes and tags, you can also use relative addressing to name a commit. For instance, you could address the parent of the latest commit in the master branch directly:

git diff master^

If you append a digit to ^, it selects one of the several parents of a merge commit:

See the differences with the second parent of the latest commit. HEAD here points to the latest commit of the active branch:

git diff HEAD^2

You can use a tilde to tell the program how deep into the history it should look.

See the differences with the commit two steps back from the latest one:

git diff master^^

another notation, same result:

git diff master~2

You can combine these notations to get to the commit you are interested in; both commands below address the same commit:

git diff master~3^~2
git diff master~6
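That the notations resolve to the same commits can be verified with git rev-parse. The sketch assumes a scratch repository with seven linear commits on master, named "commit 1" to "commit 7" for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
for n in 1 2 3 4 5 6 7; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done

git rev-parse 'master^^' 'master~2'         # the same hash twice
git rev-parse 'master~3^~2' 'master~6'      # again the same hash twice
git log -1 --pretty=format:%s 'master~6'    # commit 1
```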

The .gitignore file: Telling Git which files must be ignored

The project's directories may contain files that you would probably not like to see constantly in the git status report, such as auxiliary files generated by text editors, temporary files and other garbage.

You can make git status ignore them by creating a .gitignore file in the root directory or deeper down the tree (if the restrictions should apply to some directories only). It lists patterns for the files that should stay untracked.

Hence your .gitignore file may contain something like this:

#comments on .gitignore
#ignore .gitignore itself
.gitignore
#all html files...
*.html
#...except for one particular file
!special.html
#object files and archives will not be tracked
*.[ao]

There are other ways to specify the files to ignore; to learn more, open the help page by typing git help gitignore.
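To check which files the patterns above really hide, you can list the untracked files that are not ignored. A sketch, assuming a scratch repository and a handful of made-up file names; it also assumes that no global ignore file interferes.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
cat > .gitignore <<'EOF'
# ignore .gitignore itself
.gitignore
# all html files...
*.html
# ...except for one particular file
!special.html
# object files and archives will not be tracked
*.[ao]
EOF
touch index.html special.html main.o lib.a notes.txt

git ls-files --others --exclude-standard   # untracked files that are not ignored
```

Only notes.txt and special.html are reported; the other files, including .gitignore itself, match an ignore pattern.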

Server commands for the repository

git update-server-info: creates auxiliary files for the dumb server in $GIT_DIR/info and $GIT_OBJECT_DIRECTORY/info, so that clients can discover the references and the packs available on the server.

git count-objects: counts the unpacked objects and the disk space they take up, which helps to decide whether the repository is worth repacking.
git gc: cleans up and repacks the local repository.

Techniques

Create an empty repository on the server

repo="repo.git"
mkdir "$repo"
cd "$repo"
git init --bare
chown -R git: ./    # hand the repository over to the "git" user and its login group
cd ../

Import a SVN repository on the Git server

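repo="repo.svn"
svnserver="http://svn.calculate.ru"
git svn clone -s "$svnserver/$repo" "$repo"
# turn the imported SVN tags into ordinary Git tags
mv "$repo/.git/refs/remotes/tags" "$repo/.git/refs/tags"
# drop the remaining SVN remote references and the git-svn metadata
rm -rf "$repo/.git/refs/remotes"
rm -rf "$repo/.git/svn"
# keep only the repository data and discard the working tree
mv "$repo/.git" "$repo.git"
rm -rf "$repo"
cd "$repo.git"
chown -R git: ./    # hand the repository over to the "git" user and its login group
cd ../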
