I have an XLSX file with this content:
I downloaded tika-app for testing and ran:
java -jar tika-app-2.9.2.jar --metadata test.xlsx
Content-Length: 9217
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
X-TIKA:Parsed-By: org.apache.tika.parser.DefaultParser
X-TIKA:Parsed-By: org.apache.tika.parser.microsoft.ooxml.OOXMLParser
X-TIKA:origResourceName: C:\Users\users\Documents\
dc:creator: daniele grillo
dc:publisher:
dcterms:created: 2024-04-17T07:44:01Z
dcterms:modified: 2024-04-17T13:58:35Z
extended-properties:AppVersion: 16.0300
extended-properties:Application: Microsoft Excel
extended-properties:Company:
extended-properties:DocSecurityString: None
meta:last-author: daniele grillo
protected: false
resourceName: test.xlsx
Then I ran the command:
java -jar tika-app-2.9.2.jar --text test.xlsx
and got this output:
Foglio1
date name
2/9/72 one
2/10/98 two
1/3/09 three
1/1/00 four
4/11/00 five
I have read that it is possible to pass a tika-config.xml to configure the parser, like this:
java -jar /tika-app-2.9.2.jar --text test.xlsx --config=tika-config.xml
I would like the dates in the output to be formatted as dd/mm/yyyy, as they are shown in the .xlsx file.
Is this possible? If so, how?
I tried the following tika-config.xml, but the output is the same:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</mime>
      <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
    </parser>
  </parsers>
  <dateFormats>
    <dateFormat>dd/MM/yyyy</dateFormat>
  </dateFormats>
</properties>
OOXMLParser has the setDateFormatOverride(String) method, inherited from AbstractOfficeParser. This parameter can be set inside the <params> element of that parser in tika-config.xml:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
      <params>
        <param name="dateFormatOverride" type="string">dd/mm/yyyy</param>
      </params>
    </parser>
  </parsers>
</properties>
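If you prefer to state explicitly that DefaultParser should not also claim .xlsx files, a variant of the same configuration can combine the <parser-exclude> element from the question with the configured OOXMLParser. This is a sketch, assuming <parser-exclude> behaves here as it does elsewhere in tika-config.xml; it has not been tested against 2.9.2, and the date pattern is kept identical to the one above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- DefaultParser keeps handling every other format, but OOXMLParser is excluded from it -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
    </parser>
    <!-- A separately configured OOXMLParser then handles .xlsx (and other OOXML) files -->
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
      <params>
        <param name="dateFormatOverride" type="string">dd/mm/yyyy</param>
      </params>
    </parser>
  </parsers>
</properties>
```

It is used exactly like the configuration above, with the --config option placed before --text.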
Note: the --config option should be specified before the --text option:
java -jar tika-app-2.9.2.jar --config=tika-config.xml --text test.xlsx