bash shell sed grep text-processing

Picking out file names from grep results where file name contains numbers and hyphens


I have a script that runs a grep command and formats the results nicely for me, asking if I want to open any of the resulting files in an editor etc.

The core of my script is a command like this:

grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

It runs grep, printing the file name on every line, and then uses sed to move each file name onto its own line, separate from the matching text.

> grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
3-ten eleven twelve

I have more processing that pretties up the results further and gives me a list of the matching files, asking me which I want to open up in an editor.

I am having trouble with files whose names or paths include hyphens and numbers (e.g. "/tmp/searchTest/numbers/222-222-222/222-222-222.txt"): my sed command can no longer tell the file name apart from the hyphen- or colon-delimited line numbers.

Here is a script that sets up a test case showing this:

#!/bin/bash

rm -rf /tmp/searchTest 2> /dev/null
mkdir -p /tmp/searchTest/numbers/111-111-111
mkdir -p /tmp/searchTest/numbers/222-222-222
mkdir -p /tmp/searchTest/letters/aaa-aaa-aaa
mkdir -p /tmp/searchTest/letters/bbb-bbb-bbb

cat << EOF > /tmp/searchTest/numbers/111-111-111/111-111-111.txt
one two three
four five six
seven eight nine
EOF

cat << EOF > /tmp/searchTest/numbers/222-222-222/222-222-222.txt
four five six
seven eight nine
ten eleven twelve
EOF

cat << EOF > /tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
one two three
four five six
seven eight nine
EOF

cat << EOF > /tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
four five six
seven eight nine
ten eleven twelve
EOF

echo "Contents of /tmp/searchTest"
tree /tmp/searchTest

echo -e "\nFirst search, looking for \"eight\".\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters

echo -e "\nExtending first search, looking for \"eight\" and extracting file names.\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/letters  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

echo -e "\nSecond search, looking for \"eight\".\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/numbers

echo -e "\nExtending second search, looking for \"eight\" and extracting file names - but fails.\n---"
grep -HinER -B 0 -A 2 --include "*.txt" "eight" /tmp/searchTest/numbers  | sed -re "s/:([0-9]+:)/\n\1/" -e "s/-([0-9]+-)/\n\1/" -e "s/^[.]/\\n./"

The results for the second search show how the file names break the sed command.

First search, looking for "eight".
---
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt:3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt:2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt-3-ten eleven twelve

Extending first search, looking for "eight" and extracting file names.
---
/tmp/searchTest/letters/aaa-aaa-aaa/aaa-aaa-aaa.txt
3:seven eight nine
--
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
2:seven eight nine
/tmp/searchTest/letters/bbb-bbb-bbb/bbb-bbb-bbb.txt
3-ten eleven twelve

Second search, looking for "eight".
---
/tmp/searchTest/numbers/111-111-111/111-111-111.txt:3:seven eight nine
--
/tmp/searchTest/numbers/222-222-222/222-222-222.txt:2:seven eight nine
/tmp/searchTest/numbers/222-222-222/222-222-222.txt-3-ten eleven twelve

Extending second search, looking for "eight" and extracting file names - but fails.
---
/tmp/searchTest/numbers/111
111-111/111-111-111.txt
3:seven eight nine
--
/tmp/searchTest/numbers/222
222-222/222-222-222.txt
2:seven eight nine
/tmp/searchTest/numbers/222
222-222/222-222-222.txt-3-ten eleven twelve

Is there a better way to pick out the file names? This is a general-purpose script, so there is no set pattern I can rely on for file names: spaces, digits, letters, no extension, etc. are all possible.

It seems like the only way to do this reliably would be to run grep twice, with the first being a grep -l just to get the file names alone, which I can then map to the results. But this is pretty extreme, especially for a big search space.


Update: Thursday 20 March 2025, 06:00:22 pm

Adding more detail on actual use in response to a comment from @Yokai.

Here is an example of how I use this script already. This works quite well for me, showing me search results and asking what files I want to open in a text editor.

> search.sh -d /Users/rob.bram/DirTechTips -y e -t "junit temporary" -A2
Search for pattern "junit temporary" in dir /Users/rob.bram/DirTechTips through file pattern "*.*"

====
./Java/cheat_Java-Junit.md
17:- [JUnit Temporary Files](#junit-temporary-files)
18-    - [Listing files in temp dir during debugging](#listing-files-in-temp-dir-during-debugging)
19-- [Parallel Test Execution for JUnit 5](#parallel-test-execution-for-junit-5)
--
445:## JUnit Temporary Files
446-
447:This section: [JUnit Temporary Files](cheat_Java-Junit.md#junit-temporary-files) | [Back to top](#top)
448-
449-From:  [Working and unit testing with temporary files in Java](https://blogs.oracle.com/javamagazine/working-and-unit-testing-with-temporary-files-in-java).
--
614:- Added section `JUnit Temporary Files`.
615-
616-Wednesday, 27th of October 2021, 10
46:26 AM
--

====
./Java/cheat_Java-File-System.md
43:1. Temp files in JUnit. See [JUnit Temporary Files](cheat_Java-Junit.md#junit-temporary-files).
44-2. Create temp file or directory with Java via `java.nio.file.Files` (Java 7).
45-


Do you want to view any of the matching files?
============
File  0: ./Java/cheat_Java-Junit.md
File  1: ./Java/cheat_Java-File-System.md
----
Specify files to open. [A]ll, [N]one or [x y z] space separated indexes.
Can also override editor choice. EDITOR can be one of favourite [t]ext editor (VS Code), [e]clipse, [l]ess, n[o]tepad, [v]im, co[n]sole or c[y]gstart.

This ends up running the following core grep command: grep -HE --text -i -B 0 -A 2 -n -H "junit temporary"


Solution

  • Use -Z to have the filenames followed by a \0 byte rather than a : or - character, then instruct sed to look for that \0 byte:

    # have sed replace the first 0 byte with \n
    grep -Z -HinER ... | sed -e 's/\x00/\n/'
    

    This should remove the ambiguity, since a file name can contain digits, colons and hyphens but never a \0 byte.
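
To try the -Z approach on the kind of path that broke the original sed command, here is a minimal sketch. It assumes GNU grep and GNU sed (the \x00 escape is a GNU sed extension). The split_grep_output function name and the temporary directory are hypothetical, added only for illustration, and the test file mirrors the one from the question's setup script.

```shell
# Hypothetical helper: put the file name on its own line by replacing
# the first NUL byte (written by grep -Z after the file name) with a newline.
split_grep_output() {
    sed -e 's/\x00/\n/'
}

# Throwaway test directory (assumption: mktemp is available).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/222-222-222"
printf 'four five six\nseven eight nine\nten eleven twelve\n' \
    > "$demo_dir/222-222-222/222-222-222.txt"

# Same grep flags as in the question, plus -Z.
grep -Z -HinER -B 0 -A 2 --include "*.txt" "eight" "$demo_dir" | split_grep_output

# Clean up the throwaway directory.
rm -rf "$demo_dir"
```

Each match or context line should come out as the full path on one line, followed by the line number and text on the next. The "--" group separators contain no NUL byte, so sed passes them through unchanged.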