Printing git commit histogram

For some days I had been wondering how many lines of code (LOC) I write at my workplace. We use Git to maintain the source code, so I came up with a shell script that collects raw data on the lines added to and deleted from the repo per day over a given period.

The script I used is:

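# OS X only: GNU date is installed by coreutils as gdate.
alias date="gdate"

start_date=$1

# The end date is optional and defaults to today.
if [ -z "$2" ]; then
    end_date=$(date +%F)
else
    end_date=$2
fi

# Dates in YYYY-MM-DD form sort as strings, so a string comparison is enough.
while [[ ! "$start_date" > "$end_date" ]]; do
    git -C ~/devel/amber \
        log --since="$(date --date "$start_date" +%F)" \
        --until="$(date --date "$start_date 1 day" +%F)" \
        --author='abhay' --pretty=tformat: --numstat \
        | grep -v '^$' \
        | awk -v the_date="$start_date" \
        "{plus += \$1; minus += \$2;} END {print the_date, plus, minus;}"

    start_date=$(date --date "$start_date 1 day" +%F)
done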

The script can be run as follows (SCRIPT stands for whatever file the script is saved in):

sh SCRIPT 2014-10-01 2014-10-16

The first argument is the start date and the second is the end date. The second date can be omitted; in that case, the script prints output up to today.

The output of the script looks like this:

2014-11-02 549 72
2014-11-03 126 54
2014-11-04 133 37
2014-11-05 283 87
2014-11-06 97 16
2014-11-11 153 12
2014-11-12 33 10
2014-11-13 165 153

In the output, the columns are: date, number of lines added, and number of lines deleted from the source code. Lines where the second and third columns are missing correspond to dates on which I didn’t commit anything.

Now, let’s discuss what the shell script does.

The first statement makes date an alias for gdate. The date command that ships with OS X does not support the --date option, so I installed coreutils with Homebrew, which provides GNU date as gdate to avoid a clash with the built-in date command. The date that ships with GNU/Linux doesn’t have this problem, so there the alias can be removed.

Then, the if-else block makes the second argument to the script optional. The +%F format passed to the date command prints the date in YYYY-MM-DD format.

The while loop iterates over the period between the two given dates, from the first to the second (inclusive). The plain string comparison works because dates in YYYY-MM-DD form sort in chronological order.

Next, the -C option tells git the path to the Git repository. It must be given before the subcommand (log, commit, etc.), because it is an option of git itself and not of any particular subcommand.

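Then, the log command with --numstat prints three columns per changed file: number of additions, number of deletions, and file name. The --since and --until options limit it to one day, and --author limits it to a given author (that’s me 🙂 ). The empty --pretty=tformat: suppresses the commit headers, and grep -v '^$' drops blank lines. Finally, awk adds up the first two columns.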
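The aggregation step can also be sketched in Ruby, which may be easier to extend than the awk one-liner. This is only an illustration under assumptions: the method name sum_numstat and the sample lines are made up, and the input is assumed to be the tab-separated output of git log --numstat.

```ruby
# Hypothetical helper mirroring the awk step: sums the first two
# columns (additions, deletions) of `git log --numstat` output.
def sum_numstat(numstat_text)
  added = 0
  deleted = 0
  numstat_text.each_line do |line|
    next if line.strip.empty? # same role as grep -v '^$'
    plus, minus, _path = line.split("\t", 3)
    # Binary files are reported as "-"; String#to_i turns that into 0.
    added += plus.to_i
    deleted += minus.to_i
  end
  [added, deleted]
end

# Sample numstat lines invented for illustration.
sample = "10\t2\tlib/a.rb\n\n-\t-\tlogo.png\n3\t7\tREADME\n"
added, deleted = sum_numstat(sample)
puts "#{added} #{deleted}"
```

Unlike the awk version, which prints empty columns for a day without commits, this sketch reports 0 for both totals.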

Then, I redirected the output to the log.txt file (SCRIPT again stands for the file the script is saved in):

sh SCRIPT 2014-10-01 2014-10-16 > log.txt

I used gnuplot to plot a histogram of the collected data. The following gnuplot script is saved as hist.gnu:

set terminal png
set output "graph.png"
set xtics rotate out
set style data histogram
set style fill solid border
plot "log.txt" using 2:xtic(1) title "Additions" lt rgb "green", \
     "" using 3 title "Deletions" lt rgb "red"

The script is run as:

gnuplot hist.gnu

The gnuplot command produces a histogram of the lines added to and deleted from the repo per day, and saves it in PNG format as graph.png.

The set xtics rotate command prints the dates vertically. By default, the x-axis labels are horizontal, and the dates would have overlapped each other.

Finally, the resulting histogram is shown below:

[Image: Commit Histogram]

The histogram omits the dates on which I haven’t committed anything.

The shell script currently counts only commits on the checked-out branch, and reports totals per day. Options could be added to choose the granularity: days, weeks, or months.
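One possible way to get coarser granularity without touching the shell script is to roll up its per-day output afterwards. The following is a rough sketch, not part of the original setup: the method name roll_up and its by: parameter are hypothetical, and the input is assumed to be lines in the "date added deleted" format shown earlier.

```ruby
require "date"

# Hypothetical helper: rolls per-day lines ("date added deleted")
# up into per-month or per-ISO-week totals.
def roll_up(lines, by: :month)
  totals = Hash.new { |hash, bucket| hash[bucket] = [0, 0] }
  lines.each do |line|
    day, added, deleted = line.split
    next if day.nil? # skip blank lines
    date = Date.iso8601(day)
    bucket = by == :week ? date.strftime("%G-W%V") : date.strftime("%Y-%m")
    # nil.to_i is 0, so days without commits add nothing.
    totals[bucket][0] += added.to_i
    totals[bucket][1] += deleted.to_i
  end
  totals.map { |bucket, (added, deleted)| "#{bucket} #{added} #{deleted}" }
end

# The per-day output from earlier in this post.
daily = <<~LOG
  2014-11-02 549 72
  2014-11-03 126 54
  2014-11-04 133 37
  2014-11-05 283 87
  2014-11-06 97 16
  2014-11-11 153 12
  2014-11-12 33 10
  2014-11-13 165 153
LOG

puts roll_up(daily.lines, by: :week)
puts roll_up(daily.lines, by: :month)
```

The result keeps the same three-column layout, so it could be fed to the same gnuplot script in place of the daily log.txt.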


Ruby Programming

I have been doing some Ruby programming lately, and I came across this problem, which involves traversing an array:

Problem Text:

Write a program which will accept two rows of integers separated by commas. The program should count the occurrence of every digit from first row in the second row. The program should print the corresponding digit along with count of how many times it appears in second row separated by hyphen (one output per line).

For example, let us suppose the following two string inputs are supplied to the program:

Then, the output of the program should be:

1-1 (as 1 only appears once)
2-1 (as 2 only appears once)
3-2 (as 3 appears twice)

I hacked an easy solution which has some imperative programming ancestry:

keys = gets
input = gets

# Collect every matching value from the second row under its key.
dict = {}
keys.split(",").map { |s| s.to_i }.each do |key|
  input.split(",").map { |s| s.to_i }.each do |val|
    if key == val
      dict[key] ||= []
      dict[key] << val
    end
  end
end

# Print the count for each key, or 0 if it never appeared.
keys.split(",").map { |s| s.to_i }.each do |key|
  unless dict.assoc(key).nil?
    puts "#{key}-#{dict.assoc(key).last.size}"
  else
    puts "#{key}-0"
  end
end

But then I had some time to think it over, and came up with a tidier solution:

keys = gets
input = gets

keys.split(",").map{|a| a.to_i}.each do |key|
  puts "#{key}-#{input.split(",").map{|a| a.to_i}.count(key)}"
end

Then I tried to be smarter again, and used a Hash to avoid rescanning the second row for every key:

keys = gets
input = gets

hash = input.split(",").map{|a| a.to_i}
  .inject(Hash.new(0)) {|hash, key| hash[key] += 1; hash}
keys.split(",").map{|a| a.to_i}.each {|key| puts "#{key}-#{hash[key]}"}

Yet I still did something that was not required: I used map to convert each token to an integer. That step is unnecessary, because we are only interested in the tokens themselves and do no arithmetic on them. So, I could have eliminated the map{|a| a.to_i} step altogether, provided the line ending read by gets is removed first (for example with chomp).
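A minimal sketch of that string-only variant follows. The method name count_occurrences and the sample rows are mine, not from the problem statement. One caveat: without to_i, surrounding whitespace and the trailing newline from gets are no longer discarded implicitly, so the sketch strips each token, and values such as "01" and "1" count as different keys.

```ruby
# Hypothetical helper: counts how often each key from the first row
# appears in the second row, comparing tokens as strings.
def count_occurrences(keys_line, input_line)
  # Hash.new(0) returns 0 for tokens never seen in the second row.
  counts = Hash.new(0)
  input_line.split(",").each { |token| counts[token.strip] += 1 }

  keys_line.split(",").map do |key|
    key = key.strip
    "#{key}-#{counts[key]}"
  end
end

# Sample rows invented for illustration, as gets would return them.
puts count_occurrences("1,2,3\n", "3,1,3,2,5\n")
```

With real input, the two arguments would simply be the two lines read by gets.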
