I have a directory D:\DMS with lots of subfolders. Within these subfolders there are plenty of ".url" files which contain URLs like
http://mmtst399:8080/dms/objekt?page=index&mode=browser&oid=K60081800
http://zrtpwvap877/dms/download?doc=N59748000
In order to replace some of these URLs later on, I would like to search for them with PowerShell, e.g. find all URLs that start with http://mmtst399:8080/, or find all URLs that contain K60081800, or find all URLs that start with http://zrtpwvap877/dms/.
Searching for such URLs within .url files seems to be difficult. I have already tried many different PowerShell sample scripts using -like and so on, but in the end they often report "No results, No files found", even though I know that there are .url files in the subfolders which contain such URLs. Of course it will be difficult to replace URLs in such files if PowerShell cannot even find those .url files.
I would like to search for such URLs in .url files and get a txt log with the paths of the matching files. I guess searching for such URLs is difficult because some contain =, ? and other characters.
Use Select-String with a regex to match the URL parts of interest:
# The literal URL parts to find, either prefixes or substrings.
$urlParts = 'http://mmtst399:8080/', 'K60081800', 'http://zrtpwvap877/dms/'
# Formulate a regex that matches any of the above URL parts.
# The URL lines inside *.url files start with "URL="
$regex = '^URL=({0})' -f (
  $urlParts.ForEach({
    $escaped = [regex]::Escape($_) # Escape for literal matching
    if ($escaped -match '^https?:') { $escaped }
    else { '.*' + $escaped } # Match anywhere in the URL
  }) -join '|'
)
# Search all *.url files in the subtree of D:\DMS for the URL parts
# and output the full paths of matching files.
# -Force ensures inclusion of *hidden* files too.
# Outputs to the screen; append e.g. > log.txt to save to a file.
Get-ChildItem -Force -Recurse D:\DMS -Filter *.url |
  Select-String -Pattern $regex -List |
  ForEach-Object Path
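Since the question also asks for a text log, the same search can be wrapped in a small function whose output is then written to a file. This is only a sketch: the function name Find-UrlFile, its parameters, and the log file name in the commented example call are assumptions made for illustration, not part of the answer's code. It builds the regex the same way as above and also returns the matching URL= line, which may be handy when planning replacements later.

```powershell
# Hypothetical helper (name and parameters are assumptions): returns one object
# per matching *.url file, holding the file path and the matching URL= line.
function Find-UrlFile {
    param(
        [Parameter(Mandatory)] [string]   $Root,      # e.g. D:\DMS
        [Parameter(Mandatory)] [string[]] $UrlParts   # literal prefixes or substrings
    )

    # Same regex construction as in the answer: full prefixes are anchored
    # right after "URL=", anything else may occur anywhere in the URL.
    $alternatives = foreach ($part in $UrlParts) {
        $escaped = [regex]::Escape($part)
        if ($escaped -match '^https?:') { $escaped } else { '.*' + $escaped }
    }
    $regex = '^URL=({0})' -f ($alternatives -join '|')

    # -List stops at the first matching line per file, so each file is reported once.
    Get-ChildItem -LiteralPath $Root -Force -Recurse -File -Filter *.url |
        Select-String -Pattern $regex -List |
        Select-Object Path, Line
}

# Example call (the log file name is an assumption):
# Find-UrlFile -Root D:\DMS -UrlParts 'http://mmtst399:8080/', 'K60081800' |
#     ForEach-Object Path | Set-Content D:\url-matches.txt
```

Writing only the Path property keeps the log to one file path per line; piping the objects to Export-Csv instead would keep the matching line next to each path.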
Note: The URL= entry of a .url file is permitted to contain URLs without a protocol specifier (e.g. example.org instead of https://example.org), which is why the regex for partial matching starts with .* (meaning any run of characters, including possibly none).
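To see how the regex treats prefixes, substrings, and characters such as ? and =, it can be tried against a few lines without touching any files. The sample lines below are made up for illustration (only the first two use URLs from the question); the regex is built exactly as in the code above.

```powershell
$urlParts = 'http://mmtst399:8080/', 'K60081800', 'http://zrtpwvap877/dms/'
$regex = '^URL=({0})' -f (
    $urlParts.ForEach({
        $escaped = [regex]::Escape($_)   # makes ?, . and similar characters literal
        if ($escaped -match '^https?:') { $escaped } else { '.*' + $escaped }
    }) -join '|'
)

# Made-up sample lines, as they might appear inside *.url files.
$sampleLines = @(
    'URL=http://mmtst399:8080/dms/objekt?page=index&mode=browser&oid=K60081800'
    'URL=http://zrtpwvap877/dms/download?doc=N59748000'
    'URL=zrtpwvap877/dms/objekt?oid=K60081800'             # no protocol specifier
    'URL=http://example.org/?next=http://mmtst399:8080/'   # prefix not at the start
    'IconFile=http://mmtst399:8080/favicon.ico'            # not a URL= line
)

# Pair each line with the result of the (case-insensitive) -match test.
$results = $sampleLines | ForEach-Object {
    [pscustomobject]@{ IsMatch = ($_ -match $regex); Line = $_ }
}
$results | Format-Table -AutoSize
```

The first three lines match: two via their prefixes, the third via the substring K60081800 even though it has no protocol specifier. The last two do not, because a prefix must follow URL= directly and the line itself must start with URL=.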