These days, if you want to work as a professional software developer, you’re going to need to learn how to use Git. Version control systems (VCSs) are nothing new. Over the years, we’ve had a number of them, including CVS, Subversion, Mercurial, Microsoft TFS, and Perforce. In recent years, Git has become the version control system of choice for most companies. Knowing how to use it well will make you more employable and will also help you when you land a job as a developer.
In this article I just want to introduce the basics so you understand how Git “thinks” about things and how to perform some of the most common operations. It’s a hands-on introduction, so open up a terminal window, and jump on in!
What is a version control system?
A VCS allows you to keep track of the changes you’ve made to your work over time. It’s a little like “track changes” in Google Docs, but the difference is that you can save changes across a set of files, not just within an individual file.
Imagine that you’re adding a new “About Us” page to a website. You might need to create a new HTML page, add some new rules to your CSS to make it display right, and upload a couple of images for the page. With a VCS, you can “check in” all of the changes to those different files with a single commit message “Add about us page.” When someone looks back through the history of the commits, they’ll be able to easily find when the change was made and what files were impacted. They’ll even be able to “revert” the commit if they want to get rid of the About Us page!
Most version control systems also support “branching.” With branches, you can have different versions of your code being developed at the same time, so one team can update your ticketing functionality while another changes how your email sending works. While there’s now debate amongst high-performing teams about whether they should continue to use branches, learning how to use them will help you to work more successfully in most engineering organizations.
Let’s start off by defining a few key concepts that will help when we’re talking about Git:
A repository - This is Git’s name for a project. It includes all of the files in the project along with all of the information about how they have changed over time. If you have a full copy of a repository (often referred to as a “repo”), you can view the current state, and any previous states, of the project.
A commit - In Git, history is made up of a series of commits, which are stored in the repository. Every time you make a meaningful set of changes to your project, you should commit them so that you can always get back to the project in that state in the future.
The Staging Area - This is like a shopping basket for version control. It’s where you load up the sets of changes that you’d like to put in your next commit. So if you have edited three files, but want to make one commit with two of them and another commit with the third, you just “stage” the first two using the git add command, commit them with an appropriate message, and then add and commit the last file separately.
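The three-file scenario described above can be sketched as a short shell script. This is only an illustration under stated assumptions: the file names, the commit messages, and the "Demo User" identity are placeholders, and the script works in a throwaway directory so it does not touch any real project.

```shell
# Throwaway repository so nothing here touches a real project.
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Demo User"          # placeholder identity
git -C "$repo" config user.email "demo@example.com"  # placeholder identity

# Three edited files (hypothetical names).
touch "$repo/about.html" "$repo/about.css" "$repo/notes.txt"

# First commit: stage only two of the three files.
git -C "$repo" add about.html about.css
git -C "$repo" commit --quiet -m "Add about us page"

# Second commit: the remaining file on its own.
git -C "$repo" add notes.txt
git -C "$repo" commit --quiet -m "Add project notes"

# List the files recorded in each commit, newest first.
git -C "$repo" log --name-only --format="commit: %s"
```

The `-C` option simply tells Git which directory to run in, which keeps the sketch independent of wherever your terminal happens to be.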
Getting started with Git
If you don’t have Git installed, you’re going to want to start by installing it. Once you’ve done that, let’s open a terminal window and see what Git is and how to use it.
If you’re on Windows, open the “Git Bash” program. If you’re running Mac or a flavor of Linux, just open up a terminal window. It’s important not to just open up PowerShell or the default terminal on a Windows machine - some of the Unix commands used in this article, such as touch, won’t work there.
Go to a directory somewhere within your home directory (so you have write permissions to create files). Let’s make sure you are not already in a directory that is part of a Git repository (unlikely, but it happens):
> git status
fatal: not a git repository (or any of the parent directories): .git
Good. We asked Git for the status of the repository we were in, and it let us know we’re not in a Git repo, which is what we want. Creating one Git repo inside of another will confuse both you and Git!
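If you would rather have a script make this check than read an error message, one possible approach is to ask Git directly with `git rev-parse --is-inside-work-tree`. The sketch below is illustrative: the `in_git_repo` helper name and the scratch directory are assumptions, and `git init` is used only to create a repository to compare against.

```shell
# Hypothetical helper: succeeds only if the given directory is inside a Git work tree.
in_git_repo() {
    git -C "$1" rev-parse --is-inside-work-tree >/dev/null 2>&1
}

# A fresh scratch directory should not be part of any repository.
scratch=$(mktemp -d)
if in_git_repo "$scratch"; then before=yes; else before=no; fi

# For contrast, create a repository below it and check again.
git init --quiet "$scratch/demo"
if in_git_repo "$scratch/demo"; then after=yes; else after=no; fi

echo "inside a repo before init: $before, after init: $after"
```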
Now let’s create a new directory and a Git repo inside it in one step:
> git init my_first_repo
Initialized empty Git repository in /Users/peterbell/Dropbox/code/my_first_repo/.git/
Perfect. It created a repository under the directory I was in. Let’s use the Unix “change directory (cd)” command to go there:
> cd my_first_repo
my_first_repo git:(master)
OK, so my terminal tells me when I’m in a Git repo by showing the git:(master) message. Don’t worry if your terminal isn’t set up to do that; it’s not required! Let’s see the status of the project:
> git status
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
Cool. Don’t worry if you see slightly different messages - they vary by operating system and version of Git. The bottom line is that Git is telling us that we don’t have any commits yet, we’re on the “master” branch (the main branch) and there aren’t any files here to save into version control.
Let’s just check that you have the basic configuration for Git so that when you save files it knows your name and email address.
> git config --global user.name
Peter Bell
With the command above, we’re accessing the configuration settings for Git on your computer. The --global means we’re looking at the configuration settings that will apply to all of the projects you work on logged in as this user on this machine. The uncommon --system accesses settings that are shared across all users on your machine, and --local accesses settings that are specific to a single project and only works if you’re within a Git repo when you run the command.
When you pass a key to git config without a value (in this case, the user.name key), it returns the existing value. If you also pass a value, it sets that value.
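To see the get and set behavior without touching your real settings, here is a minimal sketch that uses the --local scope inside a throwaway repository. The "Demo User" name and the temporary directory are placeholders, and the `--show-origin` option assumes a reasonably recent Git (2.8 or later).

```shell
# Throwaway repository, so only its own config file is modified.
repo=$(mktemp -d)
git init --quiet "$repo"

# Key plus value: sets the value (here, for this one repository only).
git -C "$repo" config --local user.name "Demo User"   # placeholder name

# Key only: prints the stored value.
name=$(git -C "$repo" config --local user.name)
echo "local user.name: $name"

# List every setting Git can see, along with the file it came from.
git -C "$repo" config --list --show-origin
```

In the final listing, entries from the repository's own .git/config are the --local ones, while entries from a file in your home directory are the --global ones.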
Now depending on your setup you might have seen your name, nothing, a message that Git hasn’t been set up properly, or even an error message that a file could not be found. If you see anything other than your name, set your name like this:
> git config --global user.name 'Your Name'
> git config --global user.name
Your Name
And you should now see your name.
Let’s do the same for your email address:
> git config --global user.email
email@example.com
If it doesn’t show the value you want, set it. No quotation marks are required, because an email address contains no spaces:
> git config --global user.email firstname.lastname@example.org
> git config --global user.email
firstname.lastname@example.org
There are a lot of other settings, but now Git knows what name and email address to record with your commits.
Adding some files
The easiest way to create a test file is to use the Unix command “touch.” If the file exists, it’ll just update the timestamp. If it doesn’t, it’ll create a blank file that we can add into version control.
So let’s create a couple of files. They won’t have any content, but we’ll give them names that we might use when working on a real software development project. Let’s use the example of building a simple HTML website.
> touch index.html
> touch index.css
> git status
On branch master
No commits yet
Untracked files:
  (use "git add <file>..." to include in what will be committed)
    index.html
    index.css
nothing added to commit but untracked files present (use "git add" to track)
OK, so we’re still on the master branch (we’re not going to mess with branching in this article). We haven’t committed (saved into permanent history in Git) yet, and the two files are “untracked” — Git isn’t really paying much attention to them until we “add” them.
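The same tracked-versus-untracked information is available in a more compact form through `git status --short`. The sketch below is an illustration in a throwaway directory, not part of the walkthrough itself. It shows the "??" marker Git uses for untracked files and how the marker changes once a file is added.

```shell
# Throwaway repository with the same two empty files.
repo=$(mktemp -d)
git init --quiet "$repo"
touch "$repo/index.html" "$repo/index.css"

# "??" marks files Git can see on disk but is not tracking yet.
untracked=$(git -C "$repo" status --short)
echo "$untracked"

# After "git add", the marker becomes "A" (added to the next commit).
git -C "$repo" add index.html
staged=$(git -C "$repo" status --short)
echo "$staged"
```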
Now imagine we want to make an initial commit for the home page (the index.html) and then another commit for the CSS to make it look better.
> git add index.html
> git status
On branch master
No commits yet
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
    new file: index.html
Untracked files:
  (use "git add <file>..." to include in what will be committed)
    index.css
So this is telling us that when we do make a commit now, the index.html file is the one that’ll get saved. Let’s do that:
> git commit -m 'Create home page'
[master (root-commit) 734ca15] Create home page
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 index.html
OK, so what’s going on here? Firstly, I told Git to make a commit and included the -m flag to pass a message for the commit, followed by the message itself in single or double quotes. A good message makes it easy for other people to understand what the commit does.
It’s important to know that, by default, every commit requires two things: a commit message and at least one added, modified, renamed, or deleted file. If you don’t pass a commit message with -m, Git will throw you into whatever text editor you use with Git (look out - it might be something a little cryptic like vi) to write one, and it will abort the commit if you leave the message empty.
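A quick way to convince yourself of the "at least one change" rule is to try committing with nothing staged. The following sketch runs in a throwaway repository with a placeholder identity. It records whether Git accepted the commit rather than assuming any particular error text.

```shell
# Throwaway repository with a placeholder identity.
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"

# Nothing is staged yet, so Git refuses to create a commit.
if git -C "$repo" commit -m "Empty attempt" >/dev/null 2>&1; then
    empty_commit=created
else
    empty_commit=refused
fi
echo "commit with nothing staged: $empty_commit"

# Once a file is staged, the same kind of command succeeds.
touch "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" commit --quiet -m "Create home page"
commits=$(git -C "$repo" rev-list --count HEAD)
echo "commits in history: $commits"
```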
And what does the response mean? Well, it’s telling us we’re still on master and that we have just made the root (very first) commit. It’s giving us the first 7 characters of the hexadecimal SHA-1 hash which is the unique identifier for every commit in a Git repository, and it’s sharing my commit message and how many files were changed. In this case we added 1 file, but didn’t add or remove any lines of content because the file was empty. It also shows me the file within the commit (index.html), and it says “create mode 100644” which you can pretty much ignore.
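If you are curious about the full identifier behind the abbreviated one, `git rev-parse` can print both. This is a small sketch in a throwaway repository with a placeholder identity. The hash you get will differ from the one in this article.

```shell
# Throwaway repository containing a single commit.
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"
touch "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" commit --quiet -m "Create home page"

# The full identifier of the latest commit, and its abbreviated form.
full=$(git -C "$repo" rev-parse HEAD)
short=$(git -C "$repo" rev-parse --short HEAD)
echo "full:  $full"
echo "short: $short"
```

Git accepts either form wherever a commit is expected, as long as the abbreviation is unambiguous within the repository.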
Cool. And what’s our current Git status now?
> git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
    index.css
nothing added to commit but untracked files present (use "git add" to track)
Perfect. So it sees that we still have an untracked file. Let’s add and commit it.
> git add .
There are a lot of ways of adding files to the “staging area” in Git. You can name them one at a time (git add index.css test.css). You can match a set of files using a fileglob pattern (git add *.css), or you can just add all of the new and changed files in the current directory and below (git add .).
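The three ways of adding files can be compared side by side in a throwaway repository. In this sketch the extra file names (test.css, print.css) are made up for illustration, and the glob is quoted so that Git, rather than the shell, expands it, because the script is not running from inside the repository directory. `git ls-files` lists what is currently in the staging area.

```shell
# Throwaway repository with a few empty files (hypothetical names).
repo=$(mktemp -d)
git init --quiet "$repo"
touch "$repo/index.css" "$repo/test.css" "$repo/print.css" "$repo/index.html"

# 1. Name files one at a time.
git -C "$repo" add index.css test.css
by_name=$(git -C "$repo" ls-files)

# 2. Match a set of files with a glob pattern.
git -C "$repo" add '*.css'
by_glob=$(git -C "$repo" ls-files)

# 3. Add everything in the directory and below.
git -C "$repo" add .
everything=$(git -C "$repo" ls-files)

echo "after naming files: $(echo "$by_name" | tr '\n' ' ')"
echo "after the glob:     $(echo "$by_glob" | tr '\n' ' ')"
echo "after 'git add .':  $(echo "$everything" | tr '\n' ' ')"
```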
Whichever approach you take, that adds the other file to the staging area.
> git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)
    new file: index.css
So all we have to do is commit it:
> git commit -m 'Style home page'
[master 435c6b5] Style home page
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 index.css
Great - it’s made a new commit on master (435c6b5 in my case - for you it’ll be different, as the hash is based in part on the name, email address, and timestamps recorded in this and previous commits) and added a new file (but no lines of text, because in this simple tutorial example it was a blank file).
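To illustrate why your hash will differ from mine, the sketch below creates the same one-file commit several times while pinning the dates with Git's standard GIT_AUTHOR_* and GIT_COMMITTER_* environment variables. The `commit_hash_for` helper, the names, and the fixed timestamp are all assumptions made up for this example.

```shell
# Hypothetical helper: build a one-commit repository for the given author
# name and print the commit hash. The dates are pinned so that the name is
# the only input that varies between calls.
commit_hash_for() {
    repo=$(mktemp -d)
    git init --quiet "$repo"
    touch "$repo/index.html"
    git -C "$repo" add index.html
    GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="demo@example.com" \
    GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="demo@example.com" \
    GIT_AUTHOR_DATE="1577836800 +0000" \
    GIT_COMMITTER_DATE="1577836800 +0000" \
        git -C "$repo" commit --quiet -m "Create home page"
    git -C "$repo" rev-parse HEAD
}

first=$(commit_hash_for "Alice Example")
again=$(commit_hash_for "Alice Example")
other=$(commit_hash_for "Bob Example")

echo "Alice:       $first"
echo "Alice again: $again"
echo "Bob:         $other"
```

With identical content, message, identity, and dates, the two "Alice" hashes should match, while changing only the name produces a different hash.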
Congratulations! You just created a new Git repo, and staged and committed some files.
What’s with the Staging Area?
Around this time, when I used to teach this in enterprise classes, someone usually asked the quite reasonable question: “Why do we have to run two separate commands, git add and then git commit, just to save our work?”
Firstly, it’s not something you’ll have to do all the time. As a software engineer, you’ll spend most of your time modifying files. When you’re modifying files that Git already tracks, Git gives you a shortcut: git commit -am "your message here" will both add the modified files and commit them in a single line, so most of the time you only have to type a single command. Brand-new files still need a git add first.
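Here is a small sketch of that shortcut in a throwaway repository, with a placeholder identity and made-up file contents. It modifies a file Git already tracks, creates a brand-new file alongside it, and then runs a single `git commit -am`.

```shell
# Throwaway repository with one committed file.
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"
echo "<h1>Home</h1>" > "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" commit --quiet -m "Create home page"

# Modify the tracked file and create a brand-new, untracked one.
echo "<p>Welcome</p>" >> "$repo/index.html"
touch "$repo/contact.html"

# -a stages modifications to tracked files; -m supplies the message.
git -C "$repo" commit --quiet -am "Add welcome text"

# index.html is committed; contact.html is still untracked ("??").
remaining=$(git -C "$repo" status --short)
echo "$remaining"
```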
But the real power of the staging area is the ability to go back and sort multiple changes into separate commits.
Again, a hand usually pops up at this point: “Why bother having a bunch of different commits?” This is a particularly common question from software developers who have used older version control systems like Subversion where committing is a slower process and it’s common for devs to just code all day and save their changes with a message along the lines of “stuff I did on Monday!”
The reason it’s important to create meaningful commit messages with one commit for each kind of change made (“add about us page,” “style customers page,” etc.) is so that it’s easy to understand how you got to the current state of the app, and to find and perhaps even revert (undo) anything that is problematic. It’s the same reason you don’t name all your variables “a”, “b” and “c” - the computer wouldn’t mind, but it will not make your life easier the next time you pick up the code and try to figure out what it’s all about!
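One way to see the payoff of small, well-named commits is to read the history back with `git log --oneline`, which prints one line per commit, newest first. The sketch below rebuilds the two commits from this article in a throwaway repository. The identity is a placeholder and your hashes will differ.

```shell
# Throwaway repository that repeats the two commits from this article.
repo=$(mktemp -d)
git init --quiet "$repo"
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"

touch "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" commit --quiet -m "Create home page"

touch "$repo/index.css"
git -C "$repo" add index.css
git -C "$repo" commit --quiet -m "Style home page"

# One line per commit: abbreviated hash followed by the message.
git -C "$repo" log --oneline

# Just the messages, newest first.
subjects=$(git -C "$repo" log --format=%s)
```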
There’s a lot to learn about Git. We haven’t covered branching, pushing and pulling from a remote server, undoing your changes, more advanced configuration settings, or how to check out previous commits, but once you understand the basic principles of the staging area, you’ll be ahead of a lot of people who have been using Git for a while. And keep an eye out for more articles in this series over the upcoming weeks!
Head of Data Science
Peter is a veteran technologist, CTO, entrepreneur, and longtime educator, having taught digital literacy at Columbia and authored numerous programming books.