Counting the number of fields stored in a variable

I'm working on a basic file carver and I'm currently stuck on calculating the byte position of the file.

I've worked out that I need a piece of code to perform the following steps:

Locate the $searchQuery in the variable
Remove the rest of the string after the $searchQuery is found
Count the number of fields that now exist within the variable
Subtract 2 from this count to account for the hex offset and the $searchQuery itself
Multiply the result by two to get the correct byte count

An example of this would be:

Locate "ffd8" within "00052a0: b4f1 559c ffd8 ffe0 0010 4a46 4946 0001"
Variable is updated to "00052a0: b4f1 559c ffd8"
$fieldCount is assigned the value of "4"
fieldCount=$((fieldCount-2))
byteCount=$((fieldCount*2))

I have a basic idea of how to do everything but count the number of fields in the variable. For example, how would I count how many fields there are in the variable until the $searchQuery is found? And similarly, how do I count the number of fields once I've removed the unnecessary part of the string?

After locating the $searchString with grep I have no idea how to proceed. My current code looks like this:

#!/bin/bash
#***************************************************************
#Name:          fileCarver.sh
#Purpose:       Extracts files hidden within other files
#Author:        
#Date Written:      12/01/2013
#Last Updated:      12/01/2013
#***************************************************************

clear

#Request user input
printf "Please enter the input file name: "
read -r inputFile
printf "Please enter the search string: "
read -r searchString

#Search for the required string
searchFunction()
{
    #Search for required string and remove unnecessary characters
    startHexOffset=$(xxd "$1" | grep "$2" | cut -d":" -f 1)
    #Convert the Hex Offset to Decimal
    startDecOffset=$(echo "ibase=16;${startHexOffset^^}" | bc)
}

searchFunction "$inputFile" "$searchString"


exit 0

Thanks for the help!

Solution

You might find this easier if you convert the file to hex in a simpler format. For example, you can use the command

hexdump -v -e '/1 "%02x "' "$FILE"

to print the file with every byte converted to exactly three characters: two hex digits and a space.

You could find all instances of ffd8 prefixed with their byte offset:

hexdump -v -e '/1 "%02x "' "$FILE" | grep -Fbo 'ff d8 '

(The byte offsets need to be divided by 3, because each byte of the file occupies three characters of the output.)
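As a rough sketch of that division, the helper below prints the 0-based byte offset of every match. The function name find_offsets is made up for illustration, and it assumes hexdump and GNU grep (for -b) are installed.

```shell
#!/bin/bash
# Sketch: list the 0-based byte offset of every occurrence of a hex pattern.
# find_offsets is a hypothetical helper; it assumes hexdump and GNU grep.
find_offsets() {
    local file=$1 pattern=$2   # pattern in "xx xx " form, e.g. 'ff d8 '

    hexdump -v -e '/1 "%02x "' "$file" |
        grep -Fbo "$pattern" |
        while IFS=: read -r charOffset _; do
            # Each byte is printed as three characters, so divide by 3.
            echo $(( charOffset / 3 ))
        done
}
```

Calling find_offsets "$FILE" 'ff d8 ' prints one offset per line. The pattern must keep the trailing space so that it can only match on a byte boundary.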

So you could stream the entire file from the first instance of ffd8 using:

tail -c+$((
  $(hexdump -v -e '/1 "%02x "' "$FILE" | grep -Fbo 'ff d8 ' | head -n1 | cut -f1 -d:)
  / 3 + 1)) "$FILE"

(That assumes that whatever you use to display the file knows enough to stop when it hits the end of the image. But you could similarly find the last end marker.)
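For instance, assuming the image ends at the last ff d9 in the file (the usual JPEG end marker, which the question does not mention), the same offsets can be used to cut out just that range. This is only a sketch: carve_jpeg is a hypothetical name, and head -c is available in GNU and BSD head but is not required by POSIX.

```shell
#!/bin/bash
# Sketch: copy the bytes from the first ff d8 through the last ff d9.
# carve_jpeg is a hypothetical helper; assumes hexdump, GNU grep and head -c.
carve_jpeg() {
    local file=$1 out=$2 start end

    # Character offsets of the first start marker and the last end marker.
    start=$(hexdump -v -e '/1 "%02x "' "$file" | grep -Fbo 'ff d8 ' | head -n1 | cut -d: -f1)
    end=$(hexdump -v -e '/1 "%02x "' "$file" | grep -Fbo 'ff d9 ' | tail -n1 | cut -d: -f1)
    [ -n "$start" ] && [ -n "$end" ] || return 1

    start=$(( start / 3 ))      # 0-based offset of the ff in ff d8
    end=$(( end / 3 + 2 ))      # 0-based offset just past ff d9
    [ "$end" -gt "$start" ] || return 1

    # tail counts from 1, so add 1; head keeps exactly end - start bytes.
    tail -c +"$(( start + 1 ))" "$file" | head -c "$(( end - start ))" > "$out"
}
```

Usage would be along the lines of carve_jpeg "$FILE" carved.jpg. Taking the last end marker is a simplification; a file holding several images would need the markers paired up instead.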

This depends on GNU grep; standard POSIX grep lacks the -b option. However, it can be done with awk:

tail -c+$(
    hexdump -v -e '/1 "%02x\n"' "$FILE" |
    awk '/d8/&&p=="ff"{print NR-1;exit}{p=$1}'
  ) "$FILE"

Explanation of options:

tail    -c+N    output the file starting at byte number N (first byte is number 1)

hexdump -v      do not compress repeated lines on output
        -e 'FORMAT'  use indicated format for output:
            /1       each format consumes 1 byte
            "%02x "  output two hex digits, including leading 0, using lower case,
                     followed by a space.

grep    -F      pattern is just plain characters, not a regular expression
        -b      print the (0-based) byte offset of the... 
        -o      ... match instead of the line containing the match

cut     -f1     output the first field of each line
        -d:     fields are separated by :
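
If you would rather keep the field-counting steps from the question, they can be done in plain bash with parameter expansion and an array. This is a minimal sketch that assumes the line is in xxd's offset-plus-groups layout from the question; bytes_before_match is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: count the bytes that precede a match on one line of xxd output.
# bytes_before_match is a hypothetical helper name, not part of any tool.
bytes_before_match() {
    local line=$1 query=$2 prefix fields

    # Bail out if the query does not occur on this line at all.
    [[ $line == *"$query"* ]] || return 1

    # Drop everything after the first occurrence of the query.
    prefix="${line%%"$query"*}$query"

    # Split on whitespace and count the resulting fields.
    read -ra fields <<< "$prefix"

    # Ignore the offset field and the query itself; each field is 2 bytes.
    echo $(( (${#fields[@]} - 2) * 2 ))
}

line="00052a0: b4f1 559c ffd8 ffe0 0010 4a46 4946 0001"
lineOffset=$(( 16#${line%%:*} ))             # hex line offset -> decimal
byteCount=$(bytes_before_match "$line" "ffd8")
echo "$(( lineOffset + byteCount ))"         # absolute offset of the match
```

For the sample line this gives 4 fields, a byte count of 4, and an absolute offset of 21156 (0x52a0 + 4). Matching on the text can also hit the offset column or xxd's ASCII column, and it misses a marker that straddles two groups (for example "9cff d8ff"), which is why the one-byte-per-field format above is more robust.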