powershellexcel-formulaexecute

Convert multiple Excel workbooks each with multiple worksheets to CSV: How do I extract a url that is constructed by a formula?


I'm writing a PowerShell script to convert a collection of multi-worksheet Excel xlsx workbooks into a single csv file. One of the things I want to grab is the calculated text of a hyperlink created by a HYPERLINK formula. For example a cell contains =HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")

I'm able to grab the cell with $currentCell = $sheet.Cells.Item($r, $c). I can grab the link text Click Here using $currentCell.Text I can detect the cell has a formula by testing $currentCell.HasFormula. I can grab the formula using $currentCell.Formula and parse it with a regex to detect it contains a HYPERLINK formula. But, what I want is to obtain the results of executing the formula. I can execute the formula using $currentCell.Calculate(), but I can't figure out how to get a hold of the results (When I assign the results of $currentCell.Calculate() to a variable, the variable ends up being a System.DBNull).

How do I programmatically obtain the results of an cell's Calculate method?

Update

After thinking about Benoît Mayer's answer, I realized that I didn't understand the basis of my own question. I was trying to generalize the processing of cells that contain formulas, but that won't work. The cell's formula IS being calculated, i.e, as I extract the cell's text (the cell with the HYPERLINK and CONCATENATE formulas), I'm getting Click Here which is the result of executing the formula (e.g.=HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")). I need to detect and parse the HYPERLINK and CONCATENATE formulas and use the approach that Benoît describes.

Here's my code. It converts multiple Excel workbooks, each with multiple sheets, and extracts the results of the specific formulas in the sheets I need to process. See the code following lines 136 and 145.

**Code. Updated on 5/7 with bug fixes and code to detect and extract data from specific formulas **

cls

#region Functions

Function Remove-WhiteSpaceFromNonQuoted($inString)
{

    $quoted = $false
    $newString = ""

    for ($i = 0; $i -lt $inString.Length; $i++)
    {
        if ($inString[$i] -eq "`"")
        {
            $quoted = $quoted -xor $true
        }

        if (($inString[$i] -match "\S" -and !$quoted) -or ($quoted))
        {
            $newString = $newString + $inString[$i] 
        }
    }

    return $newString
}

#endregion

$sortedFieldNameList = New-Object -TypeName System.Collections.SortedList

$fqBookNames = New-Object -TypeName System.Collections.SortedList

$fqBookNames.Add("C:\foo\bar1.xlsx", "")
$fqBookNames.Add("C:\foo\bar2.xlsx", "")
$fqBookNames.Add("C:\foo\barN.xlsx", "")

$global:workBook = $null
$global:excel =  $null

try
{   

    $global:excel = New-Object -Com Excel.Application
    $global:excel.Visible = $false

    write-host ("Scan for column names")

    #Scan all sheets in all books and create an object with all the column names encountered 
    foreach ($fqBookName in $fqBookNames.Keys)
    {
        $global:workBook = $global:excel.Workbooks.Open($fqBookName)

        foreach ($sheet in $global:workBook.Sheets)
        {
            $columnIndexMax = $sheet.UsedRange.Column + $sheet.UsedRange.Columns.Count - 1
            write-host ("Workbook=" + $global:workBook.Name + ". Sheet=" + $sheet.Name)
            $rowOne = $sheet.Rows(1)

            for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
            {
                $columnName = $rowOne.Cells($columnIndex).Text.Trim().ToUpper()

                if ($columnName.Length -gt 0)
                {
                    if (!$sortedFieldNameList.ContainsKey($columnName)) 
                    {
                        $sortedFieldNameList.Add($columnName, "")
                    }
                }
                else
                {
                    break
                }
            }
        }

        $global:workBook.Close($false)
        Clear-Variable workBook
    }

    #Create a class that represents the worst-case collection of columns that will be output to, e.g., a grid or CSV file
    #https://stackoverflow.com/questions/49117127/create-a-class-with-dynamic-property-names-in-powershell
    Invoke-Expression @"
    Class ClsExportCsv {
    $(($sortedFieldNameList.Keys).ForEach({"[string] `${$($_)}`n "}))
    }
"@

    #create array to hold list of rows that will be output to, e.g., a grid or CSV file
    $itemList = New-Object System.Collections.ArrayList
    $itemList.clear()

    write-host ("Scan for data")

    foreach ($fqBookName in $fqBookNames.Keys)
    {
        $global:workBook = $global:excel.Workbooks.Open($fqBookName)

        foreach ($sheet in $global:workBook.Sheets)
        {
            write-host -NoNewline ("Workbook=" + $global:workBook.Name + ". Sheet=" + $sheet.Name + ". Rows=")

            $columnNameLookup = @{}
            $columnNameLookup.Clear()

            $columnIndexMax = $sheet.UsedRange.Column + $sheet.UsedRange.Columns.Count - 1
            $rowOne = $sheet.Rows(1)

            #create column name index lookup table for this sheet
            for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
            {
                $columnNameLookup.Add($columnIndex, $rowOne.Cells($columnIndex).Text.Trim().ToUpper())
            }

            for ($rowIndex = 2; $rowIndex -le $sheet.Cells.EntireRow.Count; $rowIndex++)
            {
                $rowCurrent = $sheet.Rows($rowIndex)

                if (($rowCurrent.Cells(1).Text).Length -gt 0)
                {

                    $listRow = New-Object -TypeName ClsExportCsv

                    for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
                    {
                        if (($columnNameLookup.$columnIndex).Length -gt 0)
                        {

                            $cellObject = $rowCurrent.Cells($columnIndex)
                            $textFromFormula = ""

                            if ($cellObject.HasFormula)
                            {
                                $formulaNoWhiteSpace = Remove-WhiteSpaceFromNonQuoted -inString $cellObject.Formula

                                #detect and parse cells with =HYPERLINK(CONCATENATE("http://xxxx.aspx?pid=",A2),"Click Here")
                                if ($formulaNoWhiteSpace -match '^(?:\=HYPERLINK\(CONCATENATE\(\")(?<URL>.*)(?:\"\,)(?<A1>.*)(?:\)\,.*)$')
                                {
                                    if (($Matches["URL"] -ne $null) -and ($Matches["A1"] -ne $null))
                                    {
                                        $textFromFormula = ($Matches["URL"] + $sheet.Range($Matches["A1"]).Text) 
                                    }
                                }

                                #detect and parse cells with =HYPERLINK("http://xxxx","Click Here")
                                if ($formulaNoWhiteSpace -match '^(?:\=HYPERLINK\(\")(?<URL>.*)(?:\"\,\".*\"\))$')
                                {
                                    if ($Matches["URL"] -ne $null)
                                    {
                                        $textFromFormula = $Matches["URL"] 
                                    }
                                }
                            }

                            if ($textFromFormula.Length -eq 0)
                            {
                                $listRow.($columnNameLookup.$columnIndex) = $rowCurrent.Cells($columnIndex).Text.Trim()
                            }
                            else
                            {
                                $listRow.($columnNameLookup.$columnIndex) = $textFromFormula
                            }

                        } # if (($columnNameLookup.$columnIndex).Length -gt 0)

                    } # for ($columnIndex = 1; ...

                    $itemList.Add($listRow) | out-null
                }
                else
                {
                    write-host ($rowIndex - 2).ToString()
                    break
                }

            } # for ($rowIndex = 2; .....

        } # foreach ($sheet in $global:workBook.Sheets)

        $global:workBook.Close($false)
        Clear-Variable workBook
    }

    $global:excel.Quit()
    Clear-Variable excel

    $itemList | Export-CSV -LiteralPath "C:\Users\foo\combined.csv" -NoTypeInformation -Encoding UTF8 -Delimiter ',' $itemList | Out-GridView -Title "Rows"

}
finally
{

    if ($global:excel -ne $null)
    {
        if ($global:workBook -ne $null)
        {
            $global:workBook.Close($false)
        }

        $global:excel.Quit()
        Clear-Variable excel
    }
}

Solution

  • It does not seem possible to directly get the adress generated by the Concatenate function, see for example Extract URL From Excel Hyperlink Formula.

    Why a solution using regex such as the one below would not be appropriate?

    $split = $currentCell.Formula -split 'CONCATENATE' | Select -Last 1 | %{$_ -replace `
    '[" ()]','' -split ','}
    $calculatedResult = $split[0] + $sheet.Range("$($split[1])").Text