git github-linguist

github-linguist including files with linguist-vendored attribute in language statistics


I have a GitHub repository that is predominantly C++ but contains a lot of vendor-generated C code (drivers for a microcontroller), which completely throws off the language statistics. I have read this page and created a .gitattributes file in my repository that should mark all of these driver files as linguist-vendored and keep them out of the statistics. Although git check-attr reports the linguist-vendored attribute as set, the github-linguist command-line tool still counts these files. What am I doing wrong?

$ cat .gitattributes
STM32[[:space:]]Code/*/** linguist-vendored
STM32[[:space:]]Code/*/Core/Src/** -linguist-vendored
STM32[[:space:]]Code/*/Core/Inc/** -linguist-vendored

$ git add .gitattributes
$ git commit --amend --no-edit
[master 017861e] fix github language metrics
 Date: Sat Sep 25 16:09:00 2021 -0700
 1 file changed, 3 insertions(+)
 create mode 100644 .gitattributes

$ git check-attr -a "STM32 Code/BLDC/Drivers/STM32F3xx_HAL_Driver/Src/stm32f3xx_hal.c"
STM32 Code/BLDC/Drivers/STM32F3xx_HAL_Driver/Src/stm32f3xx_hal.c: linguist-vendored: set

$ github-linguist --breakdown
94.75%  C
2.92%   C++
2.09%   Makefile
0.23%   Assembly
0.01%   Shell

...
C:
STM32 Code/BLDC/Drivers/STM32F3xx_HAL_Driver/Src/stm32f3xx_hal.c
...

I have also tried changing the .gitattributes file to just

STM32[[:space:]]Code/** linguist-vendored

and it still doesn't ignore the files inside.


Solution

  • I suspect you are using an old version of Linguist and/or rugged (the library that does the checking of the gitattributes), possibly the versions shipped with your OS. I can't reproduce your behaviour using the latest version, which is the one GitHub is using:

    $ gem install github-linguist
    Fetching github-linguist-7.16.1.gem
    Building native extensions. This could take a while...
    Successfully installed github-linguist-7.16.1
    1 gem installed
    $ git init foo
    Initialized empty Git repository in /Users/lildude/Downloads/trash/foo/.git/
    $ mkdir -p "foo/STM32 Code/BLDC/Drivers/STM32F3xx_HAL_Driver/Src/"
    $ echo "foo" > "foo/STM32 Code/BLDC/Drivers/STM32F3xx_HAL_Driver/Src/stm32f3xx_hal.c"
    $ cd foo
    $ git add .
    $ git commit -m 'Initial commit'
    $ github-linguist . --breakdown
    100.00% 4          C
    
    C:
    STM32 Code/BLDC/Drivers/STM32F3xx_HAL_Driver/Src/stm32f3xx_hal.c
    
    $ echo "STM32[[:space:]]Code/*/** linguist-vendored" > .gitattributes
    $ git add .gitattributes
    $ git commit -m 'overwrite'
    [main 15e0a4e] overwrite
     1 file changed, 1 insertion(+)
     create mode 100644 .gitattributes
    $ github-linguist . --breakdown
    
    
    $
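
To check whether an older copy is the one being picked up, it can help to see which github-linguist executable is first on your PATH and which versions of the two gems are installed. This is only a rough sketch: the helper names linguist_path and gem_versions are made up for illustration, and it assumes the gems were installed through RubyGems (an OS package may not show up in gem list at all).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Linguist.

# Print the github-linguist executable that is first on PATH, so an
# OS-packaged copy can be told apart from one installed with `gem install`.
linguist_path() {
  command -v github-linguist || echo "github-linguist not found on PATH"
}

# List the installed versions of one gem. `gem list` treats its argument
# as a regular expression, so anchor it to match the exact name only.
gem_versions() {
  if command -v gem >/dev/null 2>&1; then
    gem list "^$1\$"
  else
    echo "gem not found on PATH"
  fi
}

linguist_path
gem_versions github-linguist
gem_versions rugged
```

If the versions listed are older than the one in the transcript above, reinstalling with gem install github-linguist and re-running the breakdown is probably the quickest way to confirm or rule out this explanation.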