I am on Windows and I am using the "Git for Windows" tools in batch files. My extracted code from the HTML site looks like this:
<a xmlns="http://www.w3.org/2000/svg" class="ZLl54 Dysyo" href="./g/git-for-windows/c/jgZ6P7bo7Fo"><div class="t17a0d"><span class="o1DPKc">[ANNOUNCE] Git for Windows 2.41.0</span></div><div class="WzoK">Dear Git users, I hereby announce that Git for Windows 2.41.0 is available from: https://</div></a>
and I want to extract /g/git-for-windows/c/jgZ6P7bo7Fo with sed or awk. The first part, /g/git-for-windows/c/, is always the same, but the ending of the URL differs.
What I did:
sed 's/^.*\("./g/".*"><div\").*$/\1/' text.txt | tee text2.txt
but it doesn't work.
What I want: I want to extract the uppermost (always the latest) link to a new release of "Git for Windows" from the website https://groups.google.com/g/git-for-windows. The description shows "Announce". Here are my steps:
xidel https://groups.google.com/g/git-for-windows --printed-node-format html -e "//'Links:',//a" | tee text.txt
to get the website as text.
Then I used cat text.txt | grep -F "announce" | head -1 | tee text1.txt.
The result is the extracted code I posted above.
My questions: How do I use sed or awk correctly to extract the link /g/git-for-windows/c/jgZ6P7bo7Fo from the code? Or how can I use xidel in a better way to get results in the text file that are easier to extract?
Thank you for your help.
@ECHO OFF
SETLOCAL
rem The following settings for the directory and filename are names
rem that I use for testing and deliberately include spaces to make sure
rem that the process works using such names. These will need to be changed to suit your situation.
SET "sourcedir=u:\your files"
SET "filename1=%sourcedir%\q76495893.txt"
SET "extracted="
FOR /f "usebackqdelims=" %%e IN ("%filename1%") DO (
FOR %%o IN (%%e) DO (
IF DEFINED extracted FOR /f "delims=<>" %%y IN ("%%o") DO SET "extracted=%%~y"&GOTO gotit
IF "%%~o"=="href" SET "extracted=x"
)
)
ECHO NOT found
GOTO :eof
:gotit
SET "extracted=%extracted:~1%"
ECHO extracted=%extracted%
GOTO :EOF
Since you tagged the post "batch", this is a pure batch approach.
Read the data from the file into %%e. Use standard list-processing of %%e to set %%o to each token in turn (tokens are separated by spaces and also by =, so href and its quoted value arrive as consecutive tokens). When the href token is found, set extracted for use as a flag. When the next token arrives, tokenise it on the redirectors < and > to grab the quoted string, assign that, minus the quotes, to extracted, and done.
Well, almost. The first character still needs to be removed, since you want the string without the leading . (dot).
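Since the question also asks about sed, here is a minimal sketch of that route. The sed attempt in the question uses / as the delimiter of the s command while the pattern itself contains slashes, so sed stops reading the pattern at the first one. Choosing another delimiter such as | avoids that. The sketch assumes it is run from the Git for Windows bash shell (quoting at the cmd prompt differs), and the variable names line and link are only illustrative. The sample line is shortened from the one posted in the question.

```shell
# Sample line, shortened from the one posted in the question
line='<a xmlns="http://www.w3.org/2000/svg" class="ZLl54 Dysyo" href="./g/git-for-windows/c/jgZ6P7bo7Fo"><div class="t17a0d"><span class="o1DPKc">[ANNOUNCE] Git for Windows 2.41.0</span></div></a>'

# -n together with the p flag prints only lines where the substitution matched.
# '|' is the delimiter of the s command, so the slashes in the path need no escaping.
# \. matches the literal leading dot, which is left outside the capture group.
link=$(printf '%s\n' "$line" | sed -n 's|.*href="\.\(/g/git-for-windows/c/[^"]*\)".*|\1|p')

echo "$link"
```

Applied to the file from the question, the same sed expression could presumably take the place of the final step, for example by reading text1.txt instead of the sample variable.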