For some days, I have been wondering how many lines of code (LOC) I write at my workplace. We use Git to maintain our source code, so I came up with a shell script that extracts raw data on the number of lines added to and deleted from the repository per day over a given period.
The script I used is:
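```bash
#!/bin/bash
# OS X's BSD date has no --date option; use GNU date from coreutils (installed as gdate).
# Non-interactive bash ignores aliases unless expand_aliases is set.
shopt -s expand_aliases
alias date="gdate"

start_date=$1
if [ -z "$2" ]; then
    end_date=$(date +%F)
else
    end_date=$2
fi

# Loop from start_date to end_date, inclusive.
while [[ ! "$start_date" > "$end_date" ]]
do
    # Per-file added/deleted counts of my commits on that day, summed by awk.
    git -C ~/devel/amber \
        log --since=$(date --date "$start_date" +%F) \
        --until=$(date --date "$start_date 1 day" +%F) \
        --author='abhay' --pretty=tformat: --numstat \
        | grep -v '^$' \
        | awk -v the_date="$start_date" \
            "{plus += \$1; minus += \$2;} END {print the_date, plus, minus;}"
    start_date=$(date --date "$start_date 1 day" +%F)
done
```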
The script can be run as:

```bash
sh repostat.sh 2014-10-01 2014-10-16
```
The first argument is the start date and the second is the end date, both in `YYYY-MM-DD` format. The second date can be omitted; in that case, the script reports everything up to today.
Here is sample output for the period from 2014-11-01 to 2014-11-16:

```
2014-11-01
2014-11-02 549 72
2014-11-03 126 54
2014-11-04 133 37
2014-11-05 283 87
2014-11-06 97 16
2014-11-07
2014-11-08
2014-11-09
2014-11-10
2014-11-11 153 12
2014-11-12 33 10
2014-11-13 165 153
2014-11-14
2014-11-15
2014-11-16
```
The columns are: the date, the number of lines added, and the number of lines deleted. Rows that contain only a date are days on which I didn’t commit anything.
Now, let’s discuss what the shell script does.
The `alias` statement makes `date` run `gdate`. On OS X, the `date` command does not support the `--date` option, so I installed GNU `coreutils` with Homebrew, which installs its `date` utility as `gdate` to avoid a clash with the built-in `date` command. The `date` that ships with GNU/Linux already supports `--date`, so there the alias can be removed.
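If the same script has to run on both OS X and GNU/Linux, the GNU `date` can be selected at runtime instead of being hard-coded. The sketch below assumes a hypothetical helper named `gnu_date`; it uses a function rather than an alias, since functions need no extra shell option in non-interactive scripts.

```bash
#!/bin/bash
# Hypothetical helper: call GNU date as "gdate" where Homebrew's coreutils
# provides it (OS X), and as plain "date" elsewhere (GNU/Linux).
if command -v gdate >/dev/null 2>&1; then
    gnu_date() { gdate "$@"; }
else
    gnu_date() { date "$@"; }
fi

# The day after 2014-11-01, printed as YYYY-MM-DD (2014-11-02).
gnu_date --date "2014-11-01 1 day" +%F
```

Inside `repostat.sh`, every `date` call would then become `gnu_date`.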
Then, the `if`-`else` block makes the second argument to the script optional, defaulting to today’s date. The `+%F` format passed to `date` prints the date as `YYYY-MM-DD`.
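The same default can also be expressed with POSIX default-value parameter expansion, `${2:-$(date +%F)}`, which substitutes the default when the argument is unset or empty. A minimal sketch, wrapped in a hypothetical function `end_date_or_today` so that it can be tried on its own:

```bash
#!/bin/bash
# Hypothetical helper: ${1:-default} expands to the default when $1 is
# unset or empty, matching the effect of the if-else block in repostat.sh.
end_date_or_today() {
    local end_date=${1:-$(date +%F)}
    echo "$end_date"
}

end_date_or_today 2014-11-16   # prints 2014-11-16
end_date_or_today              # prints today's date as YYYY-MM-DD
```

In the script itself, the whole `if`-`else` block would shrink to `end_date=${2:-$(date +%F)}`.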
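The `while` loop iterates over each day between the two given dates, from the first to the second date (inclusive). Because dates in `YYYY-MM-DD` form sort lexicographically in chronological order, the string comparison `[[ ! "$start_date" > "$end_date" ]]` works as a date comparison.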
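Stripped of the Git part, the loop is just a date iterator. Here it is as a hypothetical function `each_day`, a sketch that assumes GNU `date`:

```bash
#!/bin/bash
# Print every date from $1 to $2, inclusive (assumes GNU date).
# [[ a > b ]] compares strings; for YYYY-MM-DD dates that matches
# chronological order, so the loop stops right after the end date.
each_day() {
    local day=$1 end=$2
    while [[ ! "$day" > "$end" ]]; do
        echo "$day"
        day=$(date --date "$day 1 day" +%F)
    done
}

each_day 2014-10-30 2014-11-02
```

This prints four dates and crosses the month boundary correctly; if the start date is after the end date, it prints nothing.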
Next, the `-C` option of `git` specifies the path to the Git repository. It must be given before the subcommand (such as `log` or `commit`), because it is an option of `git` itself, not of any particular subcommand.
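Then, `git log --numstat` prints three columns for each changed file: the number of lines added, the number of lines deleted, and the file name. It is restricted to commits between the `--since` and `--until` dates by the given `--author` (that’s me 🙂). The empty `--pretty=tformat:` format suppresses the commit headers and leaves blank lines in the output, which `grep -v '^$'` removes. Finally, `awk` sums the first two columns and prints the totals next to the date.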
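The aggregation step can be checked without a repository by feeding it hand-written `--numstat`-style lines. In this sketch, the function name `sum_numstat` and the sample input are made up for illustration. The `awk` program is in single quotes here, so `$1` and `$2` need no `\$` escaping, unlike the double-quoted version in the script.

```bash
#!/bin/bash
# Hypothetical helper: the grep/awk stage of repostat.sh as a function.
sum_numstat() {
    grep -v '^$' \
        | awk -v the_date="$1" '{ plus += $1; minus += $2 } END { print the_date, plus, minus }'
}

# Made-up --numstat lines (added, deleted, path; tab-separated), including a
# blank separator line. Binary files show "-" for both counts; awk treats "-" as 0.
printf '10\t2\tsrc/a.c\n\n5\t1\tsrc/b.c\n-\t-\tlogo.png\n' | sum_numstat 2014-11-02
# prints: 2014-11-02 15 3
```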
Then, I redirected the output to the file `log.txt`:

```bash
sh repostat.sh 2014-10-01 2014-10-16 > log.txt
```
I used gnuplot to draw a histogram of the collected data. The following `gnuplot` script is saved as `hist.gnu`:

```gnuplot
set terminal png
set output "graph.png"
set xtics rotate out
set style data histogram
set style fill solid border
plot "log.txt" using 2:xtic(1) title "Additions" lt rgb "green", \
     "" using 3 title "Deletions" lt rgb "red"
```
The script is run as:

```bash
gnuplot hist.gnu
```
The `gnuplot` command draws a histogram of the number of lines added and deleted per day and saves it in PNG format as `graph.png`.
The `set xtics rotate out` command draws the dates vertically (`out` places the tick marks outside the plot border). By default, x-axis labels are drawn horizontally, and the dates would overlap one another, so I rotated them with `xtics rotate`.
Finally, the resulting `gnuplot` histogram:

[Figure: histogram of lines added (green) and deleted (red) per day]
The histogram omits the dates on which I didn’t commit anything.
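If those days should still appear on the x-axis, one option is to fill in zeros before plotting. The following is a sketch under that assumption; the helper name `fill_zeros` is hypothetical, and the histogram is expected (not verified here) to then show zero-height bars for those dates.

```bash
#!/bin/bash
# Hypothetical helper: turn date-only rows (days without commits) into
# "date 0 0"; rows that already have counts pass through unchanged.
fill_zeros() {
    awk '{ print $1, $2 + 0, $3 + 0 }' "$@"
}

printf '2014-11-01\n2014-11-02 549 72\n' | fill_zeros
# prints:
# 2014-11-01 0 0
# 2014-11-02 549 72
```

Used as `fill_zeros log.txt > log-filled.txt`, with `hist.gnu` pointed at the new file.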
The script currently counts only commits reachable from the checked-out branch, and it reports totals per day. Options could be added to choose the granularity: days, weeks, or months…
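Monthly granularity, for instance, does not require changing `repostat.sh`. The daily rows in `log.txt` can be rolled up afterwards. A minimal sketch with a hypothetical `monthly_totals` helper that groups rows by the `YYYY-MM` prefix of the date. Weekly totals would additionally need each date mapped to its week, for example with GNU `date`’s `%G-W%V` ISO-week format.

```bash
#!/bin/bash
# Hypothetical helper: roll the daily "date added deleted" rows of log.txt
# up into per-month totals, keyed on the YYYY-MM prefix of each date.
# awk's "for (m in ...)" order is unspecified, hence the final sort.
monthly_totals() {
    awk '{ month = substr($1, 1, 7); plus[month] += $2; minus[month] += $3 }
         END { for (m in plus) print m, plus[m], minus[m] }' "$@" | sort
}

printf '2014-10-31 5 1\n2014-11-01\n2014-11-02 549 72\n2014-11-03 126 54\n' | monthly_totals
# prints:
# 2014-10 5 1
# 2014-11 675 126
```

Run it as `monthly_totals log.txt`. Date-only rows contribute zero to their month.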