Tags: powershell, web-scraping, web

Web scraping using PowerShell


I am trying to scrape pages of the website https://www.enghindi.com/. The URLs are saved in a CSV file, for example:

URL     Hindi meaning
url1    hindi meaning
url2    hindi meaning

Every time I run the following script, it shows the result for url1 only, and that result is spread across multiple cells. I want all results for url1 to be in one cell (in the Hindi meaning column), and similarly for url2.

url1: https://www.enghindi.com/index.php?q=close
url2: https://www.enghindi.com/index.php?q=compose


$URLs = import-csv -path C:\Scripts\PS\urls.csv | select -expandproperty urls

  
foreach ($url in $urls)

{
$web = Invoke-WebRequest $url
$data = $web.AllElements | Where{$_.TagName -eq "BIG"} | Select-Object -Expand InnerText 
$datafinal = $data.where({$_ -like "*which*"},'until')
}

foreach ($item in $datafinal) {
[ pscustomobject]@{ Url = $url; Data = $item  } | Export-Csv -Path C:\Scripts\PS\output.csv -NoTypeInformation -Encoding unicode -Append 
 
     }

Are there other ways to get English-to-Hindi word meanings by web scraping instead of copying and pasting? I would prefer Google Translate, but I think that is difficult, which is why I am trying enghindi.com.

Thanks a lot.


Solution

  • Web scraping, due to its inherent unreliability, should only be a last resort. You can make it work in Windows PowerShell, but note that HTML DOM parsing is no longer available in PowerShell (Core) 7+.
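
    If you need to run in PowerShell (Core) 7+, where the .AllElements property used below is not available, one possible fallback is to extract the <big> elements from the raw HTML text with a regex. The following is only a minimal sketch: the $html string is a hypothetical stand-in for (Invoke-WebRequest $url).Content, the page structure is an assumption, and regex-based HTML parsing is brittle in general.

    ```powershell
    # Hypothetical sample markup standing in for (Invoke-WebRequest $url).Content;
    # the real page's structure is an assumption.
    $html = '<p>intro</p><big>first meaning</big><br><BIG> second meaning </BIG><big>Words which rhyme</big>'

    # Match each <big>...</big> element: (?i) ignores case, (?s) lets . span newlines.
    $bigTexts = [regex]::Matches($html, '(?is)<big\b[^>]*>(.*?)</big>') |
      ForEach-Object {
        # Strip any nested tags, decode HTML entities, and trim whitespace.
        $inner = $_.Groups[1].Value -replace '<[^>]+>', ''
        [System.Net.WebUtility]::HtmlDecode($inner).Trim()
      }

    # Same cut-off as in the question: keep the entries before the first one matching *which*.
    $datafinal = $bigTexts.Where({ $_ -like '*which*' }, 'Until')

    # Join the remaining entries with newlines to form a single value.
    $datafinal -join "`n"
    ```

    In a real script, $html would come from the .Content property of the response, which is a plain string in both PowerShell editions.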

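    Your code has two basic problems:

      • The export loop runs only after the first foreach loop has finished, so it only ever sees the $datafinal value (and $url) left over from the last URL processed.

      • It creates one output object, and therefore one CSV row, per element of $datafinal, instead of a single object per URL.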

    The following reformulation fixes these problems:

    # Sample input URLs
    $URLs = @(
      'https://www.enghindi.com/index.php?q=close', 
      'https://www.enghindi.com/index.php?q=compose'
    )
    
    $URLs | 
      ForEach-Object {
        $web = Invoke-WebRequest $_
        $data = $web.AllElements | Where { $_.TagName -eq "BIG" } | Select-Object -Expand InnerText 
        $datafinal = $data.where({ $_ -like "*which*" }, 'until')
        # Create the output object for the URL at hand and implicitly output it.
        # Join the $datafinal elements with newlines to form a single value.
        [pscustomobject] @{
          Url = $_
          Hindi = $datafinal -join "`n"
        }
      } | 
      ConvertTo-Csv -NoTypeInformation
    

    Note that, for demonstration purposes, ConvertTo-Csv is used in lieu of Export-Csv, which allows you to see the results instantly.
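
    To write a file instead, ConvertTo-Csv can be swapped for Export-Csv. The sketch below uses hypothetical placeholder objects in place of the scraped results, and a temp-file path that is only an assumption (substitute your own, such as C:\Scripts\PS\output.csv). It illustrates that a newline-joined value stays inside a single CSV field.

    ```powershell
    # Hypothetical stand-ins for the objects the pipeline above outputs.
    $results = @(
      [pscustomobject] @{ Url = 'https://www.enghindi.com/index.php?q=close';   Hindi = "meaning 1`nmeaning 2" }
      [pscustomobject] @{ Url = 'https://www.enghindi.com/index.php?q=compose'; Hindi = "meaning 3`nmeaning 4" }
    )

    # Assumed output location for this sketch; replace with your own path.
    $outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'output.csv'

    # Export-Csv double-quotes the values, so embedded newlines stay inside one field.
    $results | Export-Csv -Path $outFile -NoTypeInformation -Encoding Unicode

    # Reading the file back yields one row per URL, with the multi-line value intact.
    $roundTrip = @(Import-Csv -Path $outFile -Encoding Unicode)
    $roundTrip | Format-List
    ```

    Because all objects are piped to a single Export-Csv call, -Append is not needed, and the file is rewritten from scratch on each run.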