I need to find the URL https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip on the page https://www.windwardstudios.com/version/version-downloads using PowerShell.
In general, I need any URL of the form https://<anything>/JavaRESTfulEngine<anything>.zip
To start off, I tried $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/\d{2}\.X/\d+\.\d+\.\d+/JavaRESTfulEngine-.*?\.zip'
which works and gives me the desired URL.
To generalize further, I tried $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
but now it does not match anything.
Below is my PowerShell script.
# URL of the website to scrape (must be a quoted string)
$websiteUrl = 'https://www.windwardstudios.com/version/version-downloads'
# Use Invoke-WebRequest to fetch the web page content
$response = Invoke-WebRequest -Uri $websiteUrl
# Check if the request was successful
if ($response.StatusCode -eq 200) {
    # Parse the HTML content to find the zip file URL using a regular expression
    $htmlContent = $response.Content
    $regexPattern = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'
    $zipFileUrls = [regex]::Matches($htmlContent, $regexPattern) | ForEach-Object { $_.Value }
    if ($zipFileUrls.Count -gt 0) {
        Write-Host "Found zip file URLs:"
        $zipFileUrls | ForEach-Object { Write-Host $_ }
    } else {
        Write-Host "Zip file URLs not found on the page."
    }
} else {
    Write-Host "Failed to fetch the web page. Status code: $($response.StatusCode)"
}
Output:
Zip file URLs not found on the page.
Desired output:
https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip
Can you please suggest a fix?
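Your generalized pattern fails because [^/]+ cannot match a slash, while the part between Archive/ and /JavaRESTfulEngine- spans two path segments (23.X/23.3.0). You can use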
https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip
See the regex demo.
Details:
https://cdn\.windwardstudios\.com/Archive/ - a literal https://cdn.windwardstudios.com/Archive/ string
(\S+?) - Group 1: one or more non-whitespace chars, as few as possible
/JavaRESTfulEngine- - a literal /JavaRESTfulEngine- string
.*? - any zero or more chars other than line break chars, as few as possible
\.zip - a .zip string.
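To see the difference between the two patterns without fetching the page, here is a minimal sketch that runs both against a sample string. The $sample markup is a hypothetical stand-in for the real page content, and the variable names are only illustrative; the URL and both patterns are the ones from the question and this answer.

```powershell
# Hypothetical HTML fragment containing the URL from the question.
$sample = '<a href="https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip">Download</a>'

# Pattern from the question: [^/]+ cannot cross a slash, so it covers one path segment only.
$singleSegment = 'https://cdn\.windwardstudios\.com/Archive/([^/]+)/JavaRESTfulEngine-.*?\.zip'

# Pattern from this answer: \S+? may include slashes, so it can span several segments.
$anySegments = 'https://cdn\.windwardstudios\.com/Archive/(\S+?)/JavaRESTfulEngine-.*?\.zip'

$failed = [regex]::Matches($sample, $singleSegment)
$found  = [regex]::Matches($sample, $anySegments)

Write-Host "Single-segment pattern matches: $($failed.Count)"
Write-Host "Any-segment pattern matches: $($found.Count)"

# Print each matched URL together with what Group 1 captured.
$found | ForEach-Object {
    Write-Host "URL:     $($_.Value)"
    Write-Host "Group 1: $($_.Groups[1].Value)"
}
```

One caveat: \S+? stops only at whitespace, so on a page where several links sit on one line with no spaces between them it could run from one URL into the next. If that turns out to be a problem, replacing \S with a class that also excludes quotes, such as [^\s"'], is one possible tightening.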