I have an program that uses the JExcel library to read in excel sheets and do some processing on them.
Once disturbing thing I've noticed (after being notified by one of our users) is that the JExcel seems to be force converting cells formated to be currency cells to use the $ symbol. I've done a lot of digging but I can't see where to go next. Skip to the bottom to see the crux of the issue.
In essence, we have this method:
import java.io.File;
import java.io.IOException;
import java.util.Locale;
import jxl.Cell;
import jxl.Sheet;
import jxl.Workbook;
import jxl.WorkbookSettings;
import jxl.read.biff.BiffException;
public class Launcher {
/**
* @param args
*/
public static void main(String[] args) {
nothingSpecifc("D:\\Documents and Settings\\castone\\My Documents\\JExcelCurrencyExample.xls");
}
public static void nothingSpecifc(String excelFilePath){
try {
File myFile = new File(excelFilePath);
Locale myLocal = Locale.UK;
WorkbookSettings wbkSettings = new WorkbookSettings();
wbkSettings.setLocale(myLocal);
wbkSettings.setEncoding("UTF-8");
wbkSettings.setExcelDisplayLanguage("UK");
Workbook workbook = Workbook.getWorkbook(myFile,wbkSettings);
Sheet mySheet = workbook.getSheet(0);
Cell[] myRow = mySheet.getRow(0);
String myCellA1 = myRow[0].getContents();
System.out.println(myCellA1);
} catch (BiffException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
My example .xls file has this value in A1:
But after running, I get this value out!
Now I can't fathom why this would happen. The disturbing thing is, I'm worried that any fix I put in place would force the conversion of other currency symbols into the £ sign.
Looking into Workbook.java, I'm calling this method:
/**
* A factory method which takes in an excel file and reads in the contents.
*
* @exception IOException
* @exception BiffException
* @param file the excel 97 spreadsheet to parse
* @param ws the settings for the workbook
* @return a workbook instance
*/
public static Workbook getWorkbook(java.io.File file, WorkbookSettings ws)
throws IOException, BiffException
{
FileInputStream fis = new FileInputStream(file);
// Always close down the input stream, regardless of whether or not the
// file can be parsed. Thanks to Steve Hahn for this
File dataFile = null;
try
{
dataFile = new File(fis, ws);
}
catch (IOException e)
{
fis.close();
throw e;
}
catch (BiffException e)
{
fis.close();
throw e;
}
fis.close();
Workbook workbook = new WorkbookParser(dataFile, ws);
workbook.parse();
return workbook;
}
So in essence, this class is just a clever wrapper/interface between file I/O handling and excel. I'm looking into the Default constructor for WorkbookSettings, and WorkbookParser.
First, WorkbookSettings:
/**
* Default constructor
*/
public WorkbookSettings()
{
initialFileSize = DEFAULT_INITIAL_FILE_SIZE;
arrayGrowSize = DEFAULT_ARRAY_GROW_SIZE;
localeFunctionNames = new HashMap();
excelDisplayLanguage = CountryCode.USA.getCode();
excelRegionalSettings = CountryCode.UK.getCode();
refreshAll = false;
template = false;
excel9file = false;
windowProtected = false;
hideobj = HIDEOBJ_SHOW_ALL;
// Initialize other properties from the system properties
try
{
boolean suppressWarnings = Boolean.getBoolean("jxl.nowarnings");
setSuppressWarnings(suppressWarnings);
drawingsDisabled = Boolean.getBoolean("jxl.nodrawings");
namesDisabled = Boolean.getBoolean("jxl.nonames");
gcDisabled = Boolean.getBoolean("jxl.nogc");
rationalizationDisabled = Boolean.getBoolean("jxl.norat");
mergedCellCheckingDisabled =
Boolean.getBoolean("jxl.nomergedcellchecks");
formulaReferenceAdjustDisabled =
Boolean.getBoolean("jxl.noformulaadjust");
propertySetsDisabled = Boolean.getBoolean("jxl.nopropertysets");
ignoreBlankCells = Boolean.getBoolean("jxl.ignoreblanks");
cellValidationDisabled = Boolean.getBoolean("jxl.nocellvalidation");
autoFilterDisabled = !Boolean.getBoolean("jxl.autofilter");
// autofilter currently disabled by default
useTemporaryFileDuringWrite =
Boolean.getBoolean("jxl.usetemporaryfileduringwrite");
String tempdir =
System.getProperty("jxl.temporaryfileduringwritedirectory");
if (tempdir != null)
{
temporaryFileDuringWriteDirectory = new File(tempdir);
}
encoding = System.getProperty("file.encoding");
}
catch (SecurityException e)
{
logger.warn("Error accessing system properties.", e);
}
// Initialize the locale to the system locale
try
{
if (System.getProperty("jxl.lang") == null ||
System.getProperty("jxl.country") == null)
{
locale = Locale.getDefault();
}
else
{
locale = new Locale(System.getProperty("jxl.lang"),
System.getProperty("jxl.country"));
}
if (System.getProperty("jxl.encoding") != null)
{
encoding = System.getProperty("jxl.encoding");
}
}
catch (SecurityException e)
{
logger.warn("Error accessing system properties.", e);
locale = Locale.getDefault();
}
}
Nothing here seems off, I've set the Encoding, Local and DisplayLanguage to UK specific/compatible ones. Regional settings are already UK by default.
Now for the WorkbookParser:
public class WorkbookParser extends Workbook
implements ExternalSheet, WorkbookMethods
{
/**
* The logger
*/
private static Logger logger = Logger.getLogger(WorkbookParser.class);
/**
* The excel file
*/
private File excelFile;
/**
* The number of open bofs
*/
private int bofs;
/**
* Indicates whether or not the dates are based around the 1904 date system
*/
private boolean nineteenFour;
/**
* The shared string table
*/
private SSTRecord sharedStrings;
/**
* The names of all the worksheets
*/
private ArrayList boundsheets;
/**
* The xf records
*/
private FormattingRecords formattingRecords;
/**
* The fonts used by this workbook
*/
private Fonts fonts;
/**
* The sheets contained in this workbook
*/
private ArrayList sheets;
/**
* The last sheet accessed
*/
private SheetImpl lastSheet;
/**
* The index of the last sheet retrieved
*/
private int lastSheetIndex;
/**
* The named records found in this workbook
*/
private HashMap namedRecords;
/**
* The list of named records
*/
private ArrayList nameTable;
/**
* The list of add in functions
*/
private ArrayList addInFunctions;
/**
* The external sheet record. Used by formulas, and names
*/
private ExternalSheetRecord externSheet;
/**
* The list of supporting workbooks - used by formulas
*/
private ArrayList supbooks;
/**
* The bof record for this workbook
*/
private BOFRecord workbookBof;
/**
* The Mso Drawing Group record for this workbook
*/
private MsoDrawingGroupRecord msoDrawingGroup;
/**
* The property set record associated with this workbook
*/
private ButtonPropertySetRecord buttonPropertySet;
/**
* Workbook protected flag
*/
private boolean wbProtected;
/**
* Contains macros flag
*/
private boolean containsMacros;
/**
* The workbook settings
*/
private WorkbookSettings settings;
/**
* The drawings contained in this workbook
*/
private DrawingGroup drawingGroup;
/**
* The country record (containing the language and regional settings)
* for this workbook
*/
private CountryRecord countryRecord;
private ArrayList xctRecords;
/**
* Constructs this object from the raw excel data
*
* @param f the excel 97 biff file
* @param s the workbook settings
*/
public WorkbookParser(File f, WorkbookSettings s)
{
super();
excelFile = f;
boundsheets = new ArrayList(10);
fonts = new Fonts();
formattingRecords = new FormattingRecords(fonts);
sheets = new ArrayList(10);
supbooks = new ArrayList(10);
namedRecords = new HashMap();
lastSheetIndex = -1;
wbProtected = false;
containsMacros = false;
settings = s;
xctRecords = new ArrayList(10);
}
I can't see anything here that would affect the creation/access of the workbook.
Looking further down, through the sheets and cells, I found two interesting things:
positivePrefix|"$" (id=709)
So my question is, when the getContents()
method is run on a cell, why does it not return with the currency (giving, I suppose £$#####, as it still has the postivePrefix) and why would the positive prefix of this be set to $
anyway?
I can't see where cells are created in the src document, or where the getContents() method is implemented (I found it's declaration) so I can't dig any further, but I was hoping someone else knew the cause of this 'issue', or at least a work around?.
I can't figure out why this is happening. But if you can't fix the system, hack it!
I'm getting returns in one of the following forms, for the cell 'value' of 100:
100
- Plain number[£$ -42] 100
- Prefix Currency not recognised100 [£$ -42]
- Postfix Currency not recognised"?" 100
- Prefix Symbol not renderable100 "?"
- Postfix Symbol not renderableSo I decided I should regex the number out of what cell.getContent()
returns, once I determine I have a number.
public static boolean isNumberFormat(Cell cell){
CellType myCellType = cell.getType();
return myCellType.equals(CellType.NUMBER);
}
I got the 'style' of regex form text2re, for each of the above, and then played with debuggex until I had combined them. This resulted in the following regex
expression:
(?:\"..*?\"|\[.*?\])*?\s*(\d+,?\d*?\.\d*)\s*(?:\[..*?]|".*?")*?
Which looks like a nightmare, but I'll break it down, as in my code:
String digits="(\\d+,?\\d?\\.\\d*)";
One or more digit, with commas, and decimal places.
String sbraceGroup="\\[.*?\\]";
The group that matches zero or more characters enclosed in square braces
String quotationGroup="\"..*?\"";
The group that matches zero or more characters enclosed in quotation marks
String junk = "(?:"+sbraceGroup+"|"+quotationGroup+")*?";
A string for the regex stuff we don't want, using (?:.....)
to enclose it as a non-matching group, and *?
for an ungreedy zero->many match.
I can now compile my pattern:
Pattern p = Pattern.compile(junk+"\\s*"+digits+"\\s*"+junk,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Where the \s*
is zero->many whitespace characters (tabs, spaces etc.).
Now all together:
String myCellA1 = myRow[rowCounter].getContents();
String digits="(\\d+,?\\d?\\.\\d*)"; // Non-greedy match on digits
String sbraceGroup="\\[.*?\\]"; // non-greedy match on square Braces 1
String quotationGroup="\"..*?\""; // non-greedy match on things in quotes
String junk = "(?:"+sbraceGroup+"|"+quotationGroup+")*?";
Pattern p = Pattern.compile(junk+"\\s*"+digits+"\\s*"+junk,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(myCellA1);
String match = "";
if (m.find())
{
match=m.group(1).toString();
}else
{
match = myCellA1;
}
And now I have solved my problem(s)