I have a bash function that runs Python and prints every match of a regex found on stdin:
function find-all() {
python -c "import re
import sys
print('\n'.join(re.findall('$1', sys.stdin.read())))"
}
When I call it with this regex: find-all 'href="([^"]*)"' < index.html
it should return the first capture group of the regex (the value of each href attribute in index.html).
How can I write this in sed or awk?
I suggest you use grep -o.
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
E.g.:
$ cat > foo
test test test
test
bar
baz test
$ grep -o test foo
test
test
test
test
test
Update
If you are extracting href attributes from HTML files with a command like:
$ grep -o -E 'href="([^"]*)"' /usr/share/vlc/http/index.html
href="style.css"
href="iehacks.css"
href="old/"
You could extract the values by using cut and sed like this:
$ grep -o -E 'href="([^"]*)"' /usr/share/vlc/http/index.html| cut -f2 -d'=' | sed -e 's/"//g'
style.css
iehacks.css
old/
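As a variation on the pipeline above, the cut and sed steps can probably be collapsed into a single sed substitution that keeps only the capture group. This is a minimal sketch, not a definitive solution: the function name extract_hrefs is hypothetical, and it assumes double-quoted attribute values that do not span lines.

```shell
# Hypothetical helper: print the value of every double-quoted href
# attribute found on stdin, one per line.
extract_hrefs() {
    # grep -o emits each match (href="...") on its own line;
    # sed then keeps only the text between the quotes.
    grep -o 'href="[^"]*"' | sed 's/^href="\(.*\)"$/\1/'
}
```

Usage would be extract_hrefs < index.html. Unlike cut -f2 -d'=', this keeps values that contain an = sign (for example page.php?id=1) intact.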
But you'd be better off using an HTML/XML parser for reliability.
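To illustrate the parser route, here is a rough sketch in the same style as the function in the question: a shell function wrapping Python. It assumes python3 is on the PATH and uses only the standard-library html.parser module; the names parse_hrefs and HrefPrinter are made up for this example.

```shell
# Hypothetical helper: print every href attribute value in the HTML on stdin.
# Assumes python3 is available; html.parser ships with the standard library.
parse_hrefs() {
    python3 -c '
import sys
from html.parser import HTMLParser

class HrefPrinter(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        # and value is None for attributes written without a value.
        for name, value in attrs:
            if name == "href" and value is not None:
                print(value)

parser = HrefPrinter()
parser.feed(sys.stdin.read())
parser.close()
'
}
```

Usage would be parse_hrefs < index.html. Because the parser tokenizes the markup, it should also cope with single-quoted values and uppercase attribute names, which the regex approach misses.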