I'm writing a PowerShell
script to convert a collection of multi-worksheet Excel
xlsx
workbooks into a single csv
file. One of the things I want to grab is the calculated text of a hyperlink created by a HYPERLINK formula. For example a cell contains =HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")
I'm able to grab the cell with $currentCell = $sheet.Cells.Item($r, $c)
. I can grab the link text Click Here
using $currentCell.Text
I can detect the cell has a formula by testing $currentCell.HasFormula
. I can grab the formula using $currentCell.Formula
and parse it with a regex to detect it contains a HYPERLINK
formula. But, what I want is to obtain the results of executing the formula. I can execute the formula using $currentCell.Calculate()
, but I can't figure out how to get a hold of the results (When I assign the results of $currentCell.Calculate() to a variable, the variable ends up being a System.DBNull
).
How do I programmatically obtain the results of an cell's Calculate
method?
Update
After thinking about Benoît Mayer's answer, I realized that I didn't understand the basis of my own question. I was trying to generalize the processing of cells that contain formulas, but that won't work. The cell's formula IS being calculated, i.e, as I extract the cell's text (the cell with the HYPERLINK and CONCATENATE formulas), I'm getting Click Here
which is the result of executing the formula (e.g.=HYPERLINK(CONCATENATE("http://foo/bar.aspx?pid=",A2),"Click Here")
). I need to detect and parse the HYPERLINK and CONCATENATE formulas and use the approach that Benoît describes.
Here's my code. It converts multiple Excel workbooks, each with multiple sheets, and extracts the results of the specific formulas in the sheets I need to process. See the code following lines 136 and 145.
**Code. Updated on 5/7 with bug fixes and code to detect and extract data from specific formulas **
cls
#region Functions
Function Remove-WhiteSpaceFromNonQuoted($inString)
{
$quoted = $false
$newString = ""
for ($i = 0; $i -lt $inString.Length; $i++)
{
if ($inString[$i] -eq "`"")
{
$quoted = $quoted -xor $true
}
if (($inString[$i] -match "\S" -and !$quoted) -or ($quoted))
{
$newString = $newString + $inString[$i]
}
}
return $newString
}
#endregion
$sortedFieldNameList = New-Object -TypeName System.Collections.SortedList
$fqBookNames = New-Object -TypeName System.Collections.SortedList
$fqBookNames.Add("C:\foo\bar1.xlsx", "")
$fqBookNames.Add("C:\foo\bar2.xlsx", "")
$fqBookNames.Add("C:\foo\barN.xlsx", "")
$global:workBook = $null
$global:excel = $null
try
{
$global:excel = New-Object -Com Excel.Application
$global:excel.Visible = $false
write-host ("Scan for column names")
#Scan all sheets in all books and create an object with all the column names encountered
foreach ($fqBookName in $fqBookNames.Keys)
{
$global:workBook = $global:excel.Workbooks.Open($fqBookName)
foreach ($sheet in $global:workBook.Sheets)
{
$columnIndexMax = $sheet.UsedRange.Column + $sheet.UsedRange.Columns.Count - 1
write-host ("Workbook=" + $global:workBook.Name + ". Sheet=" + $sheet.Name)
$rowOne = $sheet.Rows(1)
for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
{
$columnName = $rowOne.Cells($columnIndex).Text.Trim().ToUpper()
if ($columnName.Length -gt 0)
{
if (!$sortedFieldNameList.ContainsKey($columnName))
{
$sortedFieldNameList.Add($columnName, "")
}
}
else
{
break
}
}
}
$global:workBook.Close($false)
Clear-Variable workBook
}
#Create a class that represents the worst-case collection of columns that will be output to, e.g., a grid or CSV file
#https://stackoverflow.com/questions/49117127/create-a-class-with-dynamic-property-names-in-powershell
Invoke-Expression @"
Class ClsExportCsv {
$(($sortedFieldNameList.Keys).ForEach({"[string] `${$($_)}`n "}))
}
"@
#create array to hold list of rows that will be output to, e.g., a grid or CSV file
$itemList = New-Object System.Collections.ArrayList
$itemList.clear()
write-host ("Scan for data")
foreach ($fqBookName in $fqBookNames.Keys)
{
$global:workBook = $global:excel.Workbooks.Open($fqBookName)
foreach ($sheet in $global:workBook.Sheets)
{
write-host -NoNewline ("Workbook=" + $global:workBook.Name + ". Sheet=" + $sheet.Name + ". Rows=")
$columnNameLookup = @{}
$columnNameLookup.Clear()
$columnIndexMax = $sheet.UsedRange.Column + $sheet.UsedRange.Columns.Count - 1
$rowOne = $sheet.Rows(1)
#create column name index lookup table for this sheet
for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
{
$columnNameLookup.Add($columnIndex, $rowOne.Cells($columnIndex).Text.Trim().ToUpper())
}
for ($rowIndex = 2; $rowIndex -le $sheet.Cells.EntireRow.Count; $rowIndex++)
{
$rowCurrent = $sheet.Rows($rowIndex)
if (($rowCurrent.Cells(1).Text).Length -gt 0)
{
$listRow = New-Object -TypeName ClsExportCsv
for ($columnIndex = 1; $columnIndex -le $columnIndexMax; $columnIndex++)
{
if (($columnNameLookup.$columnIndex).Length -gt 0)
{
$cellObject = $rowCurrent.Cells($columnIndex)
$textFromFormula = ""
if ($cellObject.HasFormula)
{
$formulaNoWhiteSpace = Remove-WhiteSpaceFromNonQuoted -inString $cellObject.Formula
#detect and parse cells with =HYPERLINK(CONCATENATE("http://xxxx.aspx?pid=",A2),"Click Here")
if ($formulaNoWhiteSpace -match '^(?:\=HYPERLINK\(CONCATENATE\(\")(?<URL>.*)(?:\"\,)(?<A1>.*)(?:\)\,.*)$')
{
if (($Matches["URL"] -ne $null) -and ($Matches["A1"] -ne $null))
{
$textFromFormula = ($Matches["URL"] + $sheet.Range($Matches["A1"]).Text)
}
}
#detect and parse cells with =HYPERLINK("http://xxxx","Click Here")
if ($formulaNoWhiteSpace -match '^(?:\=HYPERLINK\(\")(?<URL>.*)(?:\"\,\".*\"\))$')
{
if ($Matches["URL"] -ne $null)
{
$textFromFormula = $Matches["URL"]
}
}
}
if ($textFromFormula.Length -eq 0)
{
$listRow.($columnNameLookup.$columnIndex) = $rowCurrent.Cells($columnIndex).Text.Trim()
}
else
{
$listRow.($columnNameLookup.$columnIndex) = $textFromFormula
}
} # if (($columnNameLookup.$columnIndex).Length -gt 0)
} # for ($columnIndex = 1; ...
$itemList.Add($listRow) | out-null
}
else
{
write-host ($rowIndex - 2).ToString()
break
}
} # for ($rowIndex = 2; .....
} # foreach ($sheet in $global:workBook.Sheets)
$global:workBook.Close($false)
Clear-Variable workBook
}
$global:excel.Quit()
Clear-Variable excel
$itemList | Export-CSV -LiteralPath "C:\Users\foo\combined.csv" -NoTypeInformation -Encoding UTF8 -Delimiter ',' $itemList | Out-GridView -Title "Rows"
}
finally
{
if ($global:excel -ne $null)
{
if ($global:workBook -ne $null)
{
$global:workBook.Close($false)
}
$global:excel.Quit()
Clear-Variable excel
}
}
It does not seem possible to directly get the adress generated by the Concatenate function, see for example Extract URL From Excel Hyperlink Formula.
Why a solution using regex such as the one below would not be appropriate?
$split = $currentCell.Formula -split 'CONCATENATE' | Select -Last 1 | %{$_ -replace `
'[" ()]','' -split ','}
$calculatedResult = $split[0] + $sheet.Range("$($split[1])").Text