shell unix awk scripting

Extracting text between two strings using awk


I am trying to extract the -D port entries from a number of Java processes for monitoring purposes. I ran ps -ef, grepped for the Java processes and, after some massaging, am left with a text file containing a list of strings in the format shown below. I want to extract specific values from each string and put them into variables, as sketched here, so that I can convert them to JSON. I prefer awk for readability over a chain of grep and sed commands. What would be the easiest way to do that?

port=awk -F " " {extract the text after /-Dserver.port/, ending at the next space}
MinMem=awk -F " " {extract the text after /-Xms/, up to the space following it}
MaxMem=awk -F " " {.....}

I want to implement something like the above.

Thanks in advance!

12.23.34.45 /usr/bin/java -javaagent:/home/appuser/jars/somethig.jar -Dotel.resource.attributes=application=Spring_boot_DC -Dotel.service.name=some-service-name -Dotel.metrics.exporter=none -Dotel.exporter.otlp.endpoint=http://123.123.123.123:4317 -Xms256m -Xmx512m -Dserver.port=5519 -jar -Dspring.profiles.active=prod SNAPSHOT.jar

Solution

  • extract text between /-Dserver.port/ and ending with space.

    I would use GNU AWK for this task in the following way. Let the content of file.txt be

    12.23.34.45 /usr/bin/java -javaagent:/home/appuser/jars/somethig.jar -Dotel.resource.attributes=application=Spring_boot_DC -Dotel.service.name=some-service-name -Dotel.metrics.exporter=none -Dotel.exporter.otlp.endpoint=http://123.123.123.123:4317 -Xms256m -Xmx512m -Dserver.port=5519 -jar -Dspring.profiles.active=prod SNAPSHOT.jar
    

    then

    awk 'match($0,/-Dserver\.port=([^ ]*)/,arr){print arr[1]}' file.txt
    

    gives output

    5519
    

    Explanation: the match string function can be used in its three-argument form (a GNU AWK extension) to extract the desired substring, which is denoted by a capturing group enclosed in parentheses. Note that I used match as the pattern, so if it fails to find the given regular expression, no action is taken. Also note that, because awk does not support non-greedy regular expressions, I used "any character that is not a space, repeated zero or more times".

    (tested in GNU Awk 5.3.1)
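
    For an awk that lacks the three-argument form of match, roughly the same extraction can be sketched with the two-argument form, which sets RSTART and RLENGTH. This is only a sketch: the variable names line, port and prefix are chosen for illustration, and the sample line is inlined so the snippet runs on its own.

```shell
# Sample line from the question; "line" is a name chosen for this sketch.
line='12.23.34.45 /usr/bin/java -javaagent:/home/appuser/jars/somethig.jar -Dotel.resource.attributes=application=Spring_boot_DC -Dotel.service.name=some-service-name -Dotel.metrics.exporter=none -Dotel.exporter.otlp.endpoint=http://123.123.123.123:4317 -Xms256m -Xmx512m -Dserver.port=5519 -jar -Dspring.profiles.active=prod SNAPSHOT.jar'

# Two-argument match() sets RSTART (where the match begins) and RLENGTH
# (how long it is). The "-Dserver.port=" prefix is skipped by its length,
# so only the value is printed.
port=$(printf '%s\n' "$line" | awk '
  match($0, /-Dserver\.port=[^ ]*/) {
    prefix = length("-Dserver.port=")
    print substr($0, RSTART + prefix, RLENGTH - prefix)
  }')

echo "$port"   # prints 5519
```

    As in the GNU AWK version, match is used as the pattern, so lines without -Dserver.port= print nothing.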

    That being said, your data looks like an options string, so I suggest considering an options parser for working with it.
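
    Short of a real options parser, one way to treat the line as a list of options is to walk awk's whitespace-separated fields and test each one. The sketch below makes several assumptions: no option value contains a space, the values need no JSON escaping, and the key names port, MinMem and MaxMem are taken from the question (with -Xms as the initial heap size and -Xmx as the maximum). The variable names line and json are illustrative.

```shell
# Sample line from the question; "line" is a name chosen for this sketch.
line='12.23.34.45 /usr/bin/java -javaagent:/home/appuser/jars/somethig.jar -Dotel.resource.attributes=application=Spring_boot_DC -Dotel.service.name=some-service-name -Dotel.metrics.exporter=none -Dotel.exporter.otlp.endpoint=http://123.123.123.123:4317 -Xms256m -Xmx512m -Dserver.port=5519 -jar -Dspring.profiles.active=prod SNAPSHOT.jar'

json=$(printf '%s\n' "$line" | awk '
  {
    # Reset per input line so values do not leak between records.
    port = minmem = maxmem = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^-Dserver\.port=/)
        port = substr($i, length("-Dserver.port=") + 1)
      else if ($i ~ /^-Xms/)
        minmem = substr($i, 5)   # text after "-Xms"
      else if ($i ~ /^-Xmx/)
        maxmem = substr($i, 5)   # text after "-Xmx"
    }
    # Assumption: values contain no quotes or backslashes, so no escaping is done.
    printf "{\"port\": \"%s\", \"MinMem\": \"%s\", \"MaxMem\": \"%s\"}\n", port, minmem, maxmem
  }')

echo "$json"   # prints {"port": "5519", "MinMem": "256m", "MaxMem": "512m"}
```

    This uses only POSIX awk features and emits one JSON object per input line, so it can be pointed at the whole text file instead of a single string.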