javascript regex hex hexdump

How to split a continuous hex string into space-delimited hex blocks of 32 tuplets each?


I have a fairly long hex string (the result of Buffer.toString("hex")) that I want to print to a log file in lines of 32 tuplets each.

So basically going from e01102020809020300800202020809020208095f520c8066054445472b44739621e0d003040401d21044454946583532463447444a4d010000d3104445472b445333374f53474b32010000d4104445472b44533337474b563033010000d503040401d6104445472b444342324354473031010000d7104445472b44504450535f5f5f5f0106009000

to

e0 11 02 02 08 09 02 03 00 80 02 02 02 08 09 02 02 08 09 5f 52 0c 80 66 05 44 45 47 2b 44 73 96
21 e0 d0 03 04 04 01 d2 10 44 45 49 46 58 35 32 46 34 47 44 4a 4d 01 00 00 d3 10 44 45 47 2b 44
53 33 37 4f 53 47 4b 32 01 00 00 d4 10 44 45 47 2b 44 53 33 37 47 4b 56 30 33 01 00 00 d5 03 04
04 01 d6 10 44 45 47 2b 44 43 42 32 43 54 47 30 31 01 00 00 d7 10 44 45 47 2b 44 50 44 50 53 5f
5f 5f 5f 01 06 00 90 00

I have tried hex.replace(/(.{64})/g, "$1\n") to wrap the hex string after 32 tuplets, but I don't know how to add the spaces between the tuplets. I would prefer to do this in one regex, if possible.


Solution

  • You can't really do this in one regex in JavaScript because it doesn't support features like variable-length lookbehind (in general), conditional replacement, or the \G meta sequence. Instead, you can split the string into chunks of up to 64 characters, split each chunk into pieces of up to 2 characters, and then join the pieces with spaces and the chunks with newlines:

    const hex = 'e01102020809020300800202020809020208095f520c8066054445472b44739621e0d003040401d21044454946583532463447444a4d010000d3104445472b445333374f53474b32010000d4104445472b44533337474b563033010000d503040401d6104445472b444342324354473031010000d7104445472b44504450535f5f5f5f0106009000'
    
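    // Split into lines of up to 64 hex characters (32 pairs)...
    const out = hex
        .match(/.{1,64}/g)
        // ...then split each line into pairs and join them with spaces
        .map(s => s.match(/.{1,2}/g).join(' '))
        .join('\n')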
    
    console.log(out)

    Note that I've used {1,2} in the second regex in case your data has an odd number of characters. If it never does, you can use .{2} instead (or simplify to just ..).
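
    If the line width needs to vary, the same match/join approach can be wrapped in a small function. This is only a sketch: formatHex and its bytesPerLine parameter are hypothetical names, not part of any library, and the empty-string fallback is an added assumption about how you want that case handled.

```javascript
// Hypothetical helper (not from the original answer): formats a hex string
// as lines of `bytesPerLine` space-separated pairs.
function formatHex(hex, bytesPerLine = 32) {
  // Build the line-splitting regex from the requested width (2 hex chars per byte).
  const lineRe = new RegExp(`.{1,${bytesPerLine * 2}}`, 'g')
  // match() returns null for an empty string, so fall back to an empty array.
  return (hex.match(lineRe) || [])
    .map(line => line.match(/.{1,2}/g).join(' '))
    .join('\n')
}

console.log(formatHex('e01102020809', 4))
// e0 11 02 02
// 08 09

// An odd-length input leaves a single trailing character, thanks to {1,2}.
console.log(formatHex('abc', 4))
// ab c
```

    Building the regex with new RegExp keeps the width configurable without changing the two-step split.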

    You could also define a replacer function to split the string using one regex into variable line lengths:

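    // Returns a replace() callback that appends a space after each match,
    // or a newline after every n-th match. Note that the final match also
    // gets a trailing separator.
    const replacer = n => {
      let cnt = 0
      return (m) => m + ((++cnt % n) ? ' ' : '\n')
    }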
    
    const hex = 'e01102020809020300800202020809020208095f520c8066054445472b44739621e0d003040401d21044454946583532463447444a4d010000d3104445472b445333374f53474b32010000d4104445472b44533337474b563033010000d503040401d6104445472b444342324354473031010000d7104445472b44504450535f5f5f5f0106009000'
    
    let out = hex.replace(/../g, replacer(32))
    
    console.log(out)
    
    out = hex.replace(/../g, replacer(16))
    
    console.log(out)
    
    out = hex.replace(/../g, replacer(8))
    
    console.log(out)
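
    One detail of the replacer above is that it appends a separator after the last pair as well, so the output ends in a space or newline. A possible variant, sketched here under the assumption that the input has an even number of characters, uses the offset and string arguments that replace() passes to its callback instead of a counter. The name offsetReplacer is hypothetical.

```javascript
// Hypothetical variant of the replacer: derives the pair index from the
// match offset, so it keeps no state between calls.
const offsetReplacer = n => (m, offset, str) => {
  // No separator after the final pair.
  if (offset + m.length >= str.length) return m
  // offset / 2 is the zero-based index of this pair (assumes even-length input).
  return m + ((offset / 2 + 1) % n ? ' ' : '\n')
}

const sample = 'e0110202080902030080'
console.log(sample.replace(/../g, offsetReplacer(4)))
// e0 11 02 02
// 08 09 02 03
// 00 80
```

    Alternatively, calling trimEnd() on the result of the original replacer removes the trailing separator.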

    Out of interest, I ran 1000 iterations of each answer on my computer with a 3000-character input string. The results (in milliseconds) were:

    Algorithm            Time (ms)
    Nick (match/join)    1208
    blhsing              1300
    Oussama BASRY        1362
    Pointy               2092
    Nick (replacer)      2142

    So although replacer functions give great flexibility, that flexibility comes at a performance cost because of all the function calls: in this test the replacer version took nearly 1.8 times as long as match/join.
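
    A comparison like this can be reproduced with a simple timing loop. The following is a rough sketch rather than the harness actually used for the numbers above: timeIt is a hypothetical helper, the input is arbitrary repeated data, and it assumes a runtime where performance.now() is available globally (modern browsers and recent Node.js versions). Absolute timings will differ from machine to machine.

```javascript
// Hypothetical timing helper: runs fn `iterations` times and reports the
// elapsed wall-clock time in milliseconds.
function timeIt(label, fn, iterations = 1000) {
  const start = performance.now()
  for (let i = 0; i < iterations; i++) fn()
  const elapsed = performance.now() - start
  console.log(`${label}: ${elapsed.toFixed(1)} ms`)
  return elapsed
}

// 3000 hex characters of arbitrary test data (8 characters x 375).
const input = 'e0110203'.repeat(375)

// The match/join approach.
const matchJoin = () => input
  .match(/.{1,64}/g)
  .map(s => s.match(/.{1,2}/g).join(' '))
  .join('\n')

// The replacer approach; a fresh counter is created for every run.
const makeReplacer = n => {
  let cnt = 0
  return m => m + ((++cnt % n) ? ' ' : '\n')
}
const viaReplacer = () => input.replace(/../g, makeReplacer(32))

const tMatch = timeIt('match/join', matchJoin)
const tReplace = timeIt('replacer', viaReplacer)
```

    Single runs of a loop like this are noisy, so it is worth repeating the measurement a few times before drawing conclusions.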