Printing git commit histogram

For a while, I had been wondering how many lines of code (LOC) I write at my workplace. We use Git to maintain the source code, so I wrote a shell script that collects the raw data: the number of lines added to and deleted from the repo per day over a given period.

The script I used is:

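# On OS X, coreutils installs GNU date as gdate; remove this alias on GNU/Linux.
alias date="gdate"
start_date=$1

# The end date is optional and defaults to today.
if [ -z "$2" ]; then
    end_date=$(date +%F)
else
    end_date=$2
fi

# YYYY-MM-DD strings sort chronologically, so a string comparison is enough.
while [[ ! "$start_date" > "$end_date" ]]
do
    # Sum the lines added and deleted by the author on $start_date.
    git -C ~/devel/amber \
        log --since="$(date --date "$start_date" +%F)" \
        --until="$(date --date "$start_date 1 day" +%F)" \
        --author='abhay' --pretty=tformat: --numstat \
        | grep -v '^$' \
        | awk -v the_date="$start_date" \
            '{plus += $1; minus += $2;} END {print the_date, plus, minus;}'

    # Move on to the next day.
    start_date=$(date --date "$start_date 1 day" +%F)
done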

The script can be run as:

sh repostat.sh 2014-10-01 2014-10-16

The first argument is the start date and the second is the end date. The second date can be omitted; in that case, the script prints output up to today.

The output for the period from 2014-11-01 to 2014-11-16 was:

2014-11-01
2014-11-02 549 72
2014-11-03 126 54
2014-11-04 133 37
2014-11-05 283 87
2014-11-06 97 16
2014-11-07
2014-11-08
2014-11-09
2014-11-10
2014-11-11 153 12
2014-11-12 33 10
2014-11-13 165 153
2014-11-14
2014-11-15
2014-11-16

In the output, the columns are: date, number of lines added, and number of lines deleted. Lines where the second and third columns are missing are the dates on which I didn’t commit anything.

Now, let’s discuss what the shell script does.

The first statement makes date an alias for gdate. The --date option is not available for the date command on OS X, so I installed coreutils with Homebrew, which installs the GNU date utility as gdate to avoid a clash with the built-in date command. The date that ships with GNU/Linux doesn’t have this problem, so the alias can be removed there.
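As a variation on the alias, a script could pick whichever GNU date is present at run time. The sketch below is only an illustration: date_cmd is a name introduced here, and it assumes that any gdate found on the PATH is GNU date.

```shell
# Prefer gdate (GNU date from Homebrew coreutils on OS X) when it exists,
# otherwise fall back to the system date (GNU date on GNU/Linux).
# date_cmd is a hypothetical variable name, not part of the original script.
if command -v gdate >/dev/null 2>&1; then
    date_cmd=gdate
else
    date_cmd=date
fi

# Print tomorrow's date in YYYY-MM-DD format using the chosen command.
"$date_cmd" --date "1 day" +%F
```

With this approach the same file can run unchanged on both systems, at the cost of writing "$date_cmd" wherever the script calls date.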

Then, the if-else block makes the second argument to the script optional. The +%F format passed to the date command prints the date in YYYY-MM-DD format.
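The same optional-argument behaviour can also be spelled with shell parameter expansion. This is just an alternative sketch, wrapped in a hypothetical function named resolve_end_date so that it can be tried on its own:

```shell
# resolve_end_date is a hypothetical helper: it echoes its first argument,
# or today's date (YYYY-MM-DD) when the argument is missing or empty.
resolve_end_date() {
    echo "${1:-$(date +%F)}"
}

resolve_end_date 2014-10-16   # prints 2014-10-16
resolve_end_date              # prints today's date
```

Inside the script itself, this would reduce the if-else block to a single line such as end_date=${2:-$(date +%F)}.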

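The while loop iterates over the period between the two given dates, from the first to the second (inclusive). The comparison is a plain string comparison, which works because dates in YYYY-MM-DD format sort in chronological order. Note that [[ ... ]] is a bash construct: it works under sh on OS X because sh there is bash, but on systems where sh is a different shell, the script should be run with bash instead.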
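To see the loop in isolation, here is a minimal sketch that only prints the dates it visits. list_days is a hypothetical name, and the sketch assumes bash (for [[ ... ]]) and GNU date (for --date):

```shell
# list_days is a hypothetical helper that prints every date from $1 to $2,
# inclusive. It assumes GNU date (for --date) and bash (for [[ ... ]]).
list_days() {
    local day=$1 last=$2
    # Inside [[ ]], > compares strings; YYYY-MM-DD strings sort chronologically.
    while [[ ! "$day" > "$last" ]]; do
        echo "$day"
        day=$(date --date "$day 1 day" +%F)
    done
}

list_days 2014-10-30 2014-11-02
```

The call above prints the four dates from 2014-10-30 through 2014-11-02, one per line, crossing the month boundary without any special handling.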

Next, the -C option for git specifies the path to the Git repository. This option must be given before the subcommand (like log, commit, etc.), because it’s an option of git itself, and not of any particular subcommand.
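A small sketch of the -C placement, using a hypothetical helper named last_subjects that lists recent commit subjects of a repository from any working directory:

```shell
# last_subjects is a hypothetical helper: it prints the subjects of the last
# $2 commits of the repository at path $1, without cd-ing into it.
# -C comes before the subcommand (log) because it is an option of git itself.
last_subjects() {
    git -C "$1" log -n "$2" --pretty=format:%s
}

# Usage (the path is the one from the script above):
#   last_subjects ~/devel/amber 3
```

Writing the -C after log would not select a repository; there it would be parsed as an option of the log subcommand instead.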

Then, the log command with --numstat prints three columns (number of additions, number of deletions, and file name) for the commits between --since and --until by the given author (that’s me 🙂 ). The empty --pretty=tformat: suppresses the commit headers, and grep drops the blank lines left between commits. Finally, I aggregate the first two columns using awk.
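The aggregation step can be tried without a repository by feeding it a few numstat-style lines. In this sketch, sum_numstat is a hypothetical helper and the input lines are made up for illustration:

```shell
# sum_numstat is a hypothetical helper: it reads `git log --numstat` lines
# (added<TAB>deleted<TAB>file) on stdin and prints "<date> <added> <deleted>".
sum_numstat() {
    grep -v '^$' \
        | awk -v the_date="$1" \
            '{plus += $1; minus += $2} END {print the_date, plus, minus}'
}

# Made-up sample input; binary files show "-" in both count columns,
# which awk treats as 0 when adding.
printf '10\t2\tsrc/main.c\n\n5\t1\tREADME\n-\t-\tlogo.png\n' \
    | sum_numstat 2014-11-02
```

For this sample, the command prints 2014-11-02 15 3, the same three-column shape as the lines in the output above.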

Then, I redirected the output to the log.txt file using:

sh repostat.sh 2014-10-01 2014-10-16 > log.txt

I used gnuplot to plot a histogram of the collected data. The following is the gnuplot script, saved as hist.gnu:

set terminal png
set output "graph.png"
set xtics rotate out
set style data histogram
set style fill solid border
plot "log.txt" using 2:xtic(1) title "Additions" lt rgb "green", \
     "" using 3 title "Deletions" lt rgb "red"

The script is run as:

gnuplot hist.gnu

The gnuplot command produces a histogram showing the count of lines added and deleted in the repo. The histogram is saved in PNG format as graph.png.

The set xtics rotate command shows the dates vertically; by default, the x-axis labels are horizontal. Since the dates would have overlapped each other, I used the rotate option.

Finally, the gnuplot histogram is as below:

[Image: Commit Histogram]

The histogram omits the dates on which I didn’t commit anything.
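If those gaps are not wanted, one possible tweak is to have awk print explicit zeros for days without commits, so that every line has three columns and gnuplot should have a value to plot for each date. This is an assumption of mine rather than something the script above does, and sum_numstat_zero is a hypothetical name:

```shell
# sum_numstat_zero is a hypothetical variant of the awk step: adding 0 forces
# a numeric result, so a day with no input prints "0 0" instead of blanks.
sum_numstat_zero() {
    awk -v the_date="$1" \
        '{plus += $1; minus += $2} END {print the_date, plus + 0, minus + 0}'
}

printf '' | sum_numstat_zero 2014-11-01                 # prints: 2014-11-01 0 0
printf '7\t3\tfile.c\n' | sum_numstat_zero 2014-11-02   # prints: 2014-11-02 7 3
```

In the script, the only change needed would be the plus + 0 and minus + 0 in the END block.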

The shell script currently takes commits from the currently checked-out branch and reports them per day. Perhaps some options could be added to choose the granularity: days, weeks, or months…
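One way to get a coarser granularity without touching the script is to post-process its daily output. The sketch below totals the per-day lines by month; by_month is a hypothetical helper, and the sample input is made up in the same format as log.txt:

```shell
# by_month is a hypothetical helper: it reads "date added deleted" lines (the
# format of log.txt) on stdin and prints one total line per month.
by_month() {
    awk '{
        month = substr($1, 1, 7)      # "2014-11-02" -> "2014-11"
        plus[month] += $2             # missing columns count as 0
        minus[month] += $3
    }
    END {
        for (m in plus) print m, plus[m], minus[m]
    }' | sort
}

printf '2014-10-31 5 1\n2014-11-01\n2014-11-02 549 72\n2014-11-03 126 54\n' | by_month
```

For this sample it prints 2014-10 5 1 followed by 2014-11 675 126. On real data it could be run as by_month < log.txt, and the result plotted with the same gnuplot script.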
