A Hacker’s Guide to Git

A Hacker’s Guide to Git is now available as an e-book. You can purchase it on LeanPub.

Introduction

Git is currently the most widely used version control system in the world, mostly thanks to GitHub. By that measure, I’d argue that it’s also the most misunderstood version control system in the world.

This statement probably doesn’t ring true straight away because on the surface, Git is pretty simple. It’s really easy to pick up if you’ve come from another VCS like Subversion or Mercurial. It’s even relatively easy to pick up if you’ve never used a VCS before. Everybody understands adding, committing, pushing and pulling; but this is about as far as Git’s simplicity goes. Past this point, Git is shrouded by fear, uncertainty and doubt.

Once you start talking about branching, merging, rebasing, multiple remotes, remote-tracking branches, detached HEAD states… Git becomes less of an easily-understood tool and more of a feared deity. Anybody who talks about no-fast-forward merges is regarded with quiet superstition, and even veteran hackers would rather stay away from rebasing “just to be safe”.

I think a big part of this is due to many people coming to Git from a conceptually simpler VCS — probably Subversion — and trying to apply their past knowledge to Git. It’s easy to understand why people want to do this. Subversion is simple, right? It’s just files and folders. Commits are numbered sequentially. Even branching and tagging is simple — it’s just like taking a backup of a folder.

Basically, Subversion fits in nicely with our existing computing paradigms. Everybody understands files and folders. Everybody knows that revision #10 was the one after #9 and before #11. But these paradigms break down when you try to apply them to Git’s advanced features.

That’s why trying to understand Git in this way is wrong. Git doesn’t work like Subversion at all. Which is pretty confusing, right? You can add and remove files. You can commit your changes. You can generate diffs and patches which look just like Subversion’s. How can something which appears so similar really be so different?

Complex systems like Git become much easier to understand once you figure out how they really work. The goal of this guide is to shed some light on how Git works under the hood. We’re going to take a look at some of Git’s core concepts including its basic object storage, how commits work, how branches and tags work, and we’ll look at the different kinds of merging in Git including the much-feared rebase. Hopefully at the end of it all, you’ll have a solid understanding of these concepts and will be able to use some of Git’s more advanced features with confidence.

It’s worth noting at this point that this guide is not intended to be a beginner’s introduction to Git. This guide was written for people who already use Git, but would like to better understand it by taking a peek under the hood, and learn a few neat tricks along the way. With that said, let’s begin.

(more…)

Understanding JavaScript: Inheritance and the prototype chain

This is the first post in a series on JavaScript. In this post I’m going to explain how JavaScript’s prototype chain works, and how you can use it to achieve inheritance.

First, it’s important to understand that while JavaScript is an object-oriented language, it is prototype-based and does not implement a traditional class system. Keep in mind that when I mention a class in this post, I am simply referring to JavaScript objects and the prototype chain – more on this in a bit.

Almost everything in JavaScript is an object, which you can think of as sort of like associative arrays – objects contain named properties which can be accessed with obj.propName or obj['propName']. Each object has an internal property called prototype, which links to another object. The prototype object has a prototype object of its own, and so on – this is referred to as the prototype chain. If you follow an object’s prototype chain, you will eventually reach the core Object prototype whose prototype is null, signalling the end of the chain.

So what is the prototype chain used for? When you request a property which the object does not contain, JavaScript will look down the prototype chain until it either finds the requested property, or until it reaches the end of the chain. This behaviour is what allows us to create “classes”, and implement inheritance. (more…)

JavaScript Performance: Variable Initialization

Initializing variables properly in JavaScript can have significant performance benefits. This can be shown with a simple synthetic benchmark.

notype.js

var x = null;

for (var i = 0; i < 1e8; i++) {
    x = 1 + x;
}

withtype.js

var x = 0;

for (var i = 0; i < 1e8; i++) {
    x = 1 + x;
}

Benchmark Results

$ time node notype.js
node notype.js  0.30s user 0.01s system 100% cpu 0.301 total

$ time node withtype.js
node withtype.js  0.10s user 0.00s system 99% cpu 0.109 total

This particular benchmark may be trivial, but it demonstrates an important point. In notype.js, x is initialized as null. This makes it impossible for V8 to optimize the arithmetic within the loop, since the type of x must be inferred during the first arithmetic operation. By contrast, the compiler can optimize withtype.js because x is known to be a number.

Running these scripts again with V8′s profiler enabled, we can gain some additional insight into what’s going on under the hood.

notype.js

 [JavaScript]:
   ticks  total  nonlib   name
    181   63.3%   71.8%  LazyCompile: * /home/joseph/dev/jsperf/var_init_value/notype.js:1
     68   23.8%   27.0%  Stub: BinaryOpStub_ADD_Alloc_SMI
      1    0.3%    0.4%  LazyCompile: ~PropertyDescriptor native v8natives.js:482
      1    0.3%    0.4%  KeyedLoadIC: A keyed load IC from the snapshot
      1    0.3%    0.4%  CallInitialize: args_count: 3

withtype.js

 [JavaScript]:
   ticks  total  nonlib   name
     72   66.7%   98.6%  LazyCompile: * /home/joseph/dev/jsperf/var_init_value/withtype.js:1
      1    0.9%    1.4%  LazyCompile: RegExpConstructor native regexp.js:86

The profiler doesn’t give us the full picture here, but we can see that notype.js is spending a fair amount of time in BinaryOpStub_ADD_Alloc_SMI, which V8 uses to create SMI (small integer) values.

It’s possible to dig into this even further by having V8 output the Hydrogen code (the intermediate language which V8 uses to represent the JavaScript code’s abstract syntax tree), or even further by having V8 output the final assembly code. However, both of these things are outside the scope of this post. (If you’re really interested, I’ve posted the output on GitHub.)

If this sort of thing interests you, you might enjoy reading Thorsten Lorenz’s collection of V8 performance resources or Petka Antonov’s V8 optimization killers.

Defining readable code

Code readability is something that I often bring up during code reviews, but I often have trouble explaining why I find a piece of code to be easy or difficult to read.

When you ask programmers how to make code easier to read, many of them will mention things like coding standards, descriptive naming, and decomposition. These things actually aid in making code easier to comprehend rather than easier to read. For me, readability is at a lower level, somewhere between legibility and comprehension.

 

Legibility - Readability - Comprehension

At the lowest level is legibility. This is how easily individual characters can be distinguished from each other, and can usually be boiled down to the choice of font, as well as the foreground & background colours.

At the highest level is comprehension, which is the ease in which a block of code can be fully understood. Decomposition, naming conventions and comments are just a few of the many ways to improve comprehension.

Readability sits between these two. This level is a little harder to define, but I believe it comes down to two main factors: structure and line density. (more…)

HTTP status as a service

Using Node.js* you can run a simple “HTTP status as a service” server. This can be useful for quickly checking whether your application handles various status codes.

var http = require('http');

http.createServer(function (request, response) {
  var status = request.url.substr(1);

  if ( ! http.STATUS_CODES[status]) {
    status = '404';
  }

  response.writeHead(status, { 'Content-Type': 'text/plain' });
  response.end(http.STATUS_CODES[status]);
}).listen(process.env.PORT || 5000);

This will create a server on port 5000, or any port that you specify in the PORT environment variable. It will respond to /{CODE} and return the HTTP status that corresponds to {CODE}. Here’s a couple of examples:

$ curl -i http://127.0.0.1:5000/500
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain
Date: Mon, 30 Sep 2013 14:10:10 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Internal Server Error%
$ curl -i http://127.0.0.1:5000/404
HTTP/1.1 404 Not Found
Content-Type: text/plain
Date: Mon, 30 Sep 2013 14:10:32 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Not Found%

This is a really simple example, and could easily be extended to let you specify a Location header value for 30X responses.

*Well, you could use anything really. I’m just using Node.js since JavaScript is my language of choice.

Converting Bootswatch themes to SASS/SCSS

There’s a fairly quick way to convert Bootswatch themes to Sass (which you might want to do if you use something like sass-bootstrap).

Simply download the theme’s variables.less and run the following find/replace patterns against it:

Variables

Find (regex): @([a-zA-Z0-9_-]+)
Replace: \$$1

Mixins

Find: spin(
Replace: adjust-hue(

This is all I’ve found in the themes that I’ve tried.

Getting Internet Sharing to work on OSX 10.8

I noticed that the Internet Sharing functionality didn’t work on my Macbook Air (running OSX 10.8 – Mountain Lion). This is because the Air’s DNS server (BIND) isn’t configured correctly.

For me, the fix was pretty simple. Edit /etc/com.apple.named.proxy.conf by running sudo nano /etc/com.apple.named.proxy.conf in a terminal, and change

forward first;

to

forward only;

Then turn Internet Sharing off and on again.

The annoying thing is that OSX seems to restore the BIND config the next time you turn Internet Sharing off, so you need to remember to change it each time.

Force Bower to clone from https:// instead of git://

Most Bower packages will be fetched using a git:// URL, which connects on port 9418. This can be problematic if you’re behind a firewall which blocks this port.

You can get around this quite easily by telling Git to always use https:// instead of git://:

git config --global url.https://.insteadOf git://

Dishonest comments

One of my favourite Ruby Rogues episodes (What Makes Beautiful Code) has a short section where the Rogues talk about the concept of dishonest code. David Brady wrote a really good piece on this, which I highly recommend reading.

What I want to talk about is a more specific variant of dishonest code: dishonest comments.

Take this code, for example:

$('a').click(function(e) {
    e.stopPropagation();
    e.preventDefault();
});

If you’re not familiar with JavaScript events, e.stopPropagation() will stop this event from bubbling up to other event handlers. Now, what if somebody decides that the event should bubble up? They might do something like this:

--- a/example.js
+++ b/example.js
@@ -1,4 +1,4 @@
 $('a').click(function(e) {
+    // Let the event bubble up to the next handler
-    e.stopPropagation();
     e.preventDefault();
 });

This is pretty common practice; a developer will leave a comment so that the next person understands why the e.stopPropagation() is gone. (more…)

Don’t use Git’s autocorrect feature

Quite often I’ve accidentally typed “git” twice. Usually this is fine, and Git just does something like this:

$ git git diff
git: 'git' is not a git command. See 'git --help'.

Did you mean this?
    init

But I recently turned on Git’s autocorrect feature, to see what it was like (git config --global help.autocorrect 1). The results were… interesting:

$ git git diff
WARNING: You called a Git command named 'git', which does not exist.
Continuing under the assumption that you meant 'init' in 0.1 seconds automatically...
fatal: internal error: work tree has already been set
Current worktree: /nfs/personaldev/vagrant/mobileweb-v2
New worktree: /nfs/personaldev/vagrant/mobileweb-v2/diff

This is really bizarre behaviour. The fact that it wants to autocorrect it to git init is sort-of okay. But rather than giving me the option to confirm that this is what I want, Git gives me a whole 0.1 seconds to hit Ctrl+C before it automatically runs the command for me.

Thankfully, git init isn’t a very destructive command. I was lucky that the only side effect of this was that Git created a new directory called diff. I can’t help but wonder what would’ve happened if Git decided to autocorrect to a more destructive command like reset, checkout, or gc.

The lesson here? Don’t use Git’s autocorrect. It really sucks.

Update: m_bright pointed out that the value of help.autocorrect is actually how many tenths of a second Git will wait before automatically executing the command. So something like git config --global help.autocorrect 10 would give you 1 second before the command is executed, which is probably slow enough to let you cancel any mistakes, and quick enough to still be useful.