I have a file-system data source, and I have created a data config for it so I can run the DataImportHandler (DIH). The data config is:
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="pdf" processor="FileListEntityProcessor" baseDir="/path/to/my/pdf" fileName=".*pdf" newerThan="'NOW-3DAYS'" recursive="true" rootEntity="false" dataSource="pdf">
    </entity>
  </document>
</dataConfig>
When I run the DIH, it reports:
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 0, Fetched: 35924, Skipped: 0, Processed: 0
Any idea why it didn't process any documents?
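Thanks, that solved it. FileListEntityProcessor only lists the files (hence "Fetched: 35924"); with rootEntity="false" and no child entity, none of those rows ever becomes a document. The fix is a nested TikaEntityProcessor entity, backed by a BinFileDataSource, that actually reads each PDF. Below is the working data config: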
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="pdf" processor="FileListEntityProcessor" baseDir="/path/to/my/pdf" fileName=".*pdf" newerThan="'NOW-3DAYS'" recursive="true" rootEntity="false" dataSource="null">
      <field column="fileAbsolutePath" name="id" />
      <entity name="documentImport" processor="TikaEntityProcessor" url="${pdf.fileAbsolutePath}" format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
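The working data config also depends on a few settings outside of it. TikaEntityProcessor is shipped in the DIH extras jar and needs the Tika jars from the extraction contrib, and the `id` and `text` target fields must exist in the schema. The fragment below is a minimal sketch of those pieces, not a drop-in file. It assumes the data config is saved as `data-config.xml` (a hypothetical name), that the `lib` paths follow the layout of the sample solrconfig.xml shipped with Solr (yours may differ), and that the schema has a `text_general` field type, as Solr's example schemas do. If your schema already defines `id` or `text`, keep the existing definitions.

```xml
<!-- solrconfig.xml: load the DIH jars (including the extras jar that contains
     TikaEntityProcessor) and the Tika jars from the extraction contrib.
     The paths are assumptions based on Solr's sample configs. -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />

<!-- solrconfig.xml: register the handler and point it at the data config
     ("data-config.xml" is a hypothetical file name). -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<!-- schema: target fields used by the data config
     (fileAbsolutePath is mapped to "id", Tika's extracted text to "text").
     text_general is assumed to exist, as in Solr's example schemas. -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="text" type="text_general" indexed="true" stored="true" />
```

With these in place, an import can be started with a request such as `http://localhost:8983/solr/<core>/dataimport?command=full-import`. The host, port and core name depend on your setup.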