My static site generator has a /pages directory with a bunch of source files. The names of those source files need to be prepended with my website URL and then concatenated into a file (either .txt or .xml).
Here's what I have so far:
find ./pages -name '*.js' \( -exec echo "$FILE"/{} \; -o -print \)
This command prints the names of the files with the extra pages directory up front like this:
/pages/index.js
/pages/articles/article-title.js
/pages/about/index.js
/pages/about/team.js
...
I'm not fantastic with bash. How do I edit each line to put https://www.example.com in front of it, removing /pages?
Also, I'll need to remove the word index anywhere it appears, along with the .js extension. For example, /pages/about/index.js should become https://www.example.com/about, and /pages/about/team.js should become https://www.example.com/about/team.
Bonus
A list of URLs in a .txt file is an acceptable sitemap and I'm happy with that, but if we want to go beyond, we can produce an XML file that has last modified dates.
date -r pages/about.js +"%Y-%m-%d" | tee test.xml
This command writes the correct modified date, but I'd have to get it in this final format:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://www.example.com</loc>
<lastmod>2023-03-01</lastmod>
</url>
<url>
<loc>https://www.example.com/about</loc>
<lastmod>2023-03-12</lastmod>
</url>
</urlset>
> find ./pages -name '*.js' \( -exec echo "$FILE"/{} \; -o -print \)
> This command prints the names of the files with the extra pages directory up front like this:
> /pages/index.js
> /pages/articles/article-title.js
> /pages/about/index.js
> /pages/about/team.js
> ...
No, it doesn't. Provided that the variable FILE has not been set, that command will produce output lines of this form:
/./pages/index.js
If FILE had been set to a non-empty string, then the output would differ even more from what you say. To produce output in the form you show, I suppose that you were actually running this similar command:
find pages -name '*.js' \( -exec echo "$FILE"/{} \; -o -print \)
And given that the $FILE part is doing nothing, and the -o -print is relevant only in the unlikely event that echo returns a failure status, a simpler way to achieve the same thing would be
find pages -name '*.js' -exec echo /{} \;
Since you want to modify the beginnings of the output lines, however, it's not particularly useful to prepend a slash, and since -print is the default, I would start with just
find pages -name '*.js'
Then, sed is one of the typical tools for modifying lines of a file. It looks like you want something along these lines to replace the leading pages of each result line with the first part of the corresponding URL:
find pages -name '*.js' | sed 's|^pages|https://www.example.com|'
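That command replaces only the leading pages. The question also asks for the .js extension and a trailing index to be dropped, which could be handled with two more sed expressions. The following is a minimal sketch, not a tested solution for every layout: the function name to_urls is made up for illustration, and it assumes that every path begins with pages and that no filename contains a newline.

```shell
# Hypothetical helper: read "pages/..." paths on stdin, write site URLs.
to_urls() {
  # 1st expression: swap the leading "pages" for the site root.
  # 2nd expression: drop the ".js" extension.
  # 3rd expression: drop a trailing "/index" so index pages map to their directory.
  sed -e 's|^pages|https://www.example.com|' \
      -e 's|\.js$||' \
      -e 's|/index$||'
}

# Usage (writes one URL per line):
#   find pages -name '*.js' | to_urls > sitemap.txt
```

The order of the expressions matters: the extension has to be removed before /index can be matched at the end of the line. Only a final path component named index is removed, so a page such as pages/index-of-terms.js keeps its name.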
There's no one-liner for producing the XML format you describe using common shell utilities. It would be possible to write a script to generate that, but I leave it as an exercise.
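As a starting point for that exercise, here is a rough sketch of such a script. It is built on assumptions: BASE_URL and make_sitemap are names invented for illustration, date -r FILE is taken to print the file's modification time as it does in the question, and filenames are assumed to contain no newlines and no characters that need XML escaping (such as &).

```shell
#!/bin/sh
# Sketch: print an XML sitemap for every *.js file under a directory.
# BASE_URL and make_sitemap are illustrative names, not part of any tool.
BASE_URL='https://www.example.com'

make_sitemap() {
  # $1: directory to scan, e.g. pages
  printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
  printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  find "$1" -name '*.js' | sort | while IFS= read -r file; do
    rel=${file#"$1"}     # strip the leading directory name
    rel=${rel%.js}       # strip the extension
    rel=${rel%/index}    # strip a trailing /index
    # Modification date of the source file, as in the question's date command.
    lastmod=$(date -r "$file" +%Y-%m-%d)
    printf '<url>\n<loc>%s</loc>\n<lastmod>%s</lastmod>\n</url>\n' \
      "$BASE_URL$rel" "$lastmod"
  done
  printf '%s\n' '</urlset>'
}

# Usage:
#   make_sitemap pages > sitemap.xml
```

The path rewriting uses shell parameter expansion instead of sed so that each file needs only one extra process (date). If your date does not accept a filename after -r, that line would need a different way of reading the modification time.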