I have an HTML menu file, extracted by a CHM decoder, that contains a list of HTML pages:
(7,0,"Icons Used in This Book","final/pref04.html");
(8,0,"Command Syntax Conventions","final/pref05.html");
(9,0,"Introduction","final/pref06.html");
(10,0,"Part I: Introduction and Overview of Service","final/part01.html");
(11,10,"Chapter 1. Overview","final/ch01.html");
(12,11,"Technology Motivation","final/ch01lev1sec1.html");
From this I want to create a 'table of contents' file for Calibre (an HTML file that contains links to all the other files in the desired order). The final file should look like this:
<a href="final/pref04.html">Icons Used in This Book</a><br/>
<a href="final/pref05.html">Command Syntax Conventions</a><br/>
.
.
.
So first I need to remove the numeric prefixes with a regular expression, then wrap each entry in an <a href> tag to make a hyperlink, and swap the positions of the URL and the title. Can anyone show how to do this with Notepad++?
I think this would do it for you. I'm Mac-based, so I don't have Notepad++, but this works in Dreamweaver. It assumes each entry is on its own line.
Find:
\(.*?"(.*?)","(.*?)".*
Replace:
<a href="$2">$1</a><br/>
File:
(7,0,"Icons Used in This Book","final/pref04.html");
(8,0,"Command Syntax Conventions","final/pref05.html");
(9,0,"Introduction","final/pref06.html");
(10,0,"Part I: Introduction and Overview of Service","final/part01.html");
(11,10,"Chapter 1. Overview","final/ch01.html");
(12,11,"Technology Motivation","final/ch01lev1sec1.html");
After Replace All:
<a href="final/pref04.html">Icons Used in This Book</a><br/>
<a href="final/pref05.html">Command Syntax Conventions</a><br/>
<a href="final/pref06.html">Introduction</a><br/>
<a href="final/part01.html">Part I: Introduction and Overview of Service</a><br/>
<a href="final/ch01.html">Chapter 1. Overview</a><br/>
<a href="final/ch01lev1sec1.html">Technology Motivation</a><br/>
If the file isn't laid out one entry per line, change the trailing `.*` to `.*?\n`. That should make the match stop after each newline. For readability, you may also want to add a newline to the replacement.
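For anyone without Notepad++ or Dreamweaver at hand, the same find/replace can be sketched in Python. This is only a minimal sketch, assuming one entry per line as above; the names `menu_to_toc`, `PATTERN` and `REPLACEMENT` are made up for illustration, and Python's `re.sub` writes the group references as `\1` and `\2` where the editor uses `$1` and `$2`.

```python
import re

# Same pattern as in the Find field above.
PATTERN = re.compile(r'\(.*?"(.*?)","(.*?)".*')

# Python spells the group references \1 and \2 instead of $1 and $2.
REPLACEMENT = r'<a href="\2">\1</a><br/>'


def menu_to_toc(text: str) -> str:
    """Turn each menu entry into an anchor line (hypothetical helper)."""
    # '.' does not match a newline by default, so each match stays on one line
    # and the line breaks between entries are left untouched.
    return PATTERN.sub(REPLACEMENT, text)


sample = (
    '(7,0,"Icons Used in This Book","final/pref04.html");\n'
    '(8,0,"Command Syntax Conventions","final/pref05.html");'
)

print(menu_to_toc(sample))
```

To run it over a real file, read the file into a string, pass it to `menu_to_toc`, and write the result out as the table of contents.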
I should probably explain the regex as well, in case you want to modify it. The first `\` escapes the `(` so the regex looks for the literal character rather than treating it as a grouping parenthesis. The `.*?` that follows says to match every character up to the first `"` (`.` is any single character, `*` is zero or more occurrences of the preceding character, and `?` tells it to stop at the first occurrence of the next character, `"`). The last `.*` says to keep going and consume the rest of the line. The `(` and `)` around each `.*?` capture the matched value into `$1` and `$2`. The number corresponds to the order in which the group appears in the regex.
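To see the groups and the effect of the `?` in isolation, here is a small Python sketch, offered as an illustration rather than as part of the Notepad++ workflow. The short string `a"b"c` is invented for the demonstration; the menu line is taken from the sample file.

```python
import re

line = '(11,10,"Chapter 1. Overview","final/ch01.html");'

# The two parenthesised parts are captured in order: group 1, then group 2.
match = re.match(r'\(.*?"(.*?)","(.*?)".*', line)
title, url = match.groups()
print(title)  # Chapter 1. Overview
print(url)    # final/ch01.html

# With the '?', '.*?' stops at the first '"' it can; without it, '.*' runs on
# to the last '"' in the string.
lazy = re.match(r'.*?"', 'a"b"c').group()
greedy = re.match(r'.*"', 'a"b"c').group()
print(lazy)    # a"
print(greedy)  # a"b"
```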