I am trying to scrape pages of the website https://www.enghindi.com/. The URLs are saved in a CSV file, for example:

| URL | Hindi meaning |
|---|---|
| URL1 | Hindi meaning |
| URL2 | Hindi meaning |

Every time I run the following script, it only shows the results for URL1, and those results are spread over multiple cells. I want all results for URL1 to be in one cell (in the Hindi meaning column), and the same for URL2.

- URL1: https://www.enghindi.com/index.php?q=close
- URL2: https://www.enghindi.com/index.php?q=compose

```powershell
$URLs = import-csv -path C:\Scripts\PS\urls.csv | select -expandproperty urls
foreach ($url in $urls)
{
    $web = Invoke-WebRequest $url
    $data = $web.AllElements | Where{$_.TagName -eq "BIG"} | Select-Object -Expand InnerText
    $datafinal = $data.where({$_ -like "*which*"},'until')
}
foreach ($item in $datafinal) {
    [pscustomobject]@{ Url = $url; Data = $item } | Export-Csv -Path C:\Scripts\PS\output.csv -NoTypeInformation -Encoding unicode -Append
}
```
Are there other ways to get English-to-Hindi word meanings via web scraping, instead of copying and pasting? I would prefer Google Translate, but I think that is difficult, which is why I am trying enghindi.com.

Thanks a lot!
Web scraping, due to its inherent unreliability, should only be a last resort. You can make it work in Windows PowerShell, but note that the HTML DOM parsing that `Invoke-WebRequest` performs there (which provides the `.AllElements` property used here) is no longer available in PowerShell (Core) 7+.
Your code has two basic problems:

- It operates on `$datafinal` *after* the `foreach` loop, at which point you only see the results of the *last* `Invoke-WebRequest` call.
- You loop over each element of array `$datafinal` and create an output object for each, instead of creating one output object per input URL.
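The first problem can be reproduced without any web requests. A variable assigned inside a loop body is overwritten on every pass, so code that runs after the loop sees only the final value. In this sketch, the strings are made-up stand-ins for the per-URL results:

```powershell
# Hypothetical stand-ins for the per-URL results; no web requests involved.
$results = 'result for URL1', 'result for URL2'

foreach ($r in $results) {
  # Like $datafinal in the question, this variable is reassigned on every pass.
  $datafinal = $r
}

# Code placed after the loop only sees the value from the final iteration.
$datafinal   # -> result for URL2
```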
The following reformulation fixes these problems:
```powershell
# Sample input URLs
$URLs = @(
  'https://www.enghindi.com/index.php?q=close',
  'https://www.enghindi.com/index.php?q=compose'
)

$URLs |
  ForEach-Object {
    $web = Invoke-WebRequest $_
    $data = $web.AllElements | Where { $_.TagName -eq "BIG" } | Select-Object -Expand InnerText
    $datafinal = $data.where({ $_ -like "*which*" }, 'until')
    # Create the output object for the URL at hand and implicitly output it.
    # Join the $datafinal elements with newlines to form a single value.
    [pscustomobject] @{
      Url = $_
      Hindi = $datafinal -join "`n"
    }
  } |
  ConvertTo-Csv -NoTypeInformation
```
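Two operations in the loop body do the real work. The intrinsic `.Where()` method in `'Until'` mode returns the elements *before* the first one matching the filter, and `-join` collapses what remains into a single string. The sketch below shows both in isolation; the input strings are made up and only stand in for the scraped `<big>` texts:

```powershell
# Made-up stand-ins for the InnerText values of the page's <big> elements.
$data = 'meaning 1', 'meaning 2', 'Sentences which use the word', 'footer text'

# 'Until' mode returns the elements *before* the first one that matches,
# so the '*which*' entry and everything after it are dropped.
$datafinal = $data.Where({ $_ -like '*which*' }, 'Until')
$datafinal.Count   # -> 2

# -join collapses the remaining elements into one newline-separated string,
# which becomes a single value (one CSV cell) per URL.
$cell = $datafinal -join "`n"
```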
Note that, for demonstration purposes, `ConvertTo-Csv` is used in lieu of `Export-Csv`, which allows you to see the results instantly.
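When the output goes to a file instead, `Export-Csv` encloses each field in double quotes. CSV permits line breaks inside quoted fields, so each multi-line Hindi value stays in a single cell. The sketch below uses made-up result objects in place of live requests and writes to a hypothetical temp-file path:

```powershell
# Hypothetical output path; a temp file keeps the demo self-contained.
$outFile = Join-Path ([IO.Path]::GetTempPath()) 'output.csv'

# Stand-ins for the objects the pipeline above emits, one per URL (values are made up).
$results = @(
  [pscustomobject] @{ Url = 'https://www.enghindi.com/index.php?q=close';   Hindi = "meaning 1`nmeaning 2" }
  [pscustomobject] @{ Url = 'https://www.enghindi.com/index.php?q=compose'; Hindi = 'meaning 3' }
)

# Fields are quoted, so the embedded newline stays inside one cell;
# a Unicode-capable encoding keeps Devanagari text intact.
$results | Export-Csv -Path $outFile -NoTypeInformation -Encoding utf8

# Reading the file back yields one row per URL.
$roundTrip = Import-Csv -Path $outFile
$roundTrip.Count   # -> 2

# Clean up the demo file.
Remove-Item -Path $outFile
```

To feed the reformulated pipeline from the CSV file in the question, `$URLs = Import-Csv -Path C:\Scripts\PS\urls.csv | Select-Object -ExpandProperty URL` should work, assuming the column header is `URL` as in the sample table. The question's code expands a property named `urls`, which only works if the file's header actually reads `urls`. Then replace the final `ConvertTo-Csv -NoTypeInformation` with `Export-Csv -Path C:\Scripts\PS\output.csv -NoTypeInformation -Encoding utf8`.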