batch-processing filenet-p8 filenet-content-engine

FileNet Bulk Search and Update


I have a requirement to update the metadata of millions of documents in an object store, so I wrote a simple stand-alone Java program using the approach below.

SearchSQL documentSearchSQL = new SearchSQL();
String selectQuery = "Id ";
String classSymbolicName="Document_Class_Name";
String myAlias1 = "r";
String whereClause="r.Document_Type_Code='DIRMKTGDOC' and VersionStatus=1";
boolean subClassesToo=false;
documentSearchSQL.setSelectList(selectQuery);
documentSearchSQL.setFromClauseInitialValue(classSymbolicName, myAlias1, subClassesToo);
documentSearchSQL.setWhereClause(whereClause);

UpdatingBatch updatingBatch =null;
SearchScope searchScope = new SearchScope(p8ObjectStore);
RepositoryRowSet rowSet = searchScope.fetchRows(documentSearchSQL, Integer.valueOf(10000), null, Boolean.TRUE);
PageIterator pageIterator = rowSet.pageIterator();
RepositoryRow row;
Document document = null;

while(pageIterator.nextPage()){
Object[] rowArray = pageIterator.getCurrentPage();
updatingBatch = UpdatingBatch.createUpdatingBatchInstance(p8ObjectStore.get_Domain(),RefreshMode.NO_REFRESH); 
for (int i = 0; i < rowArray.length; i++) {
row= (RepositoryRow)rowArray[i];
Properties documentProps = row.getProperties();
document = Factory.Document.fetchInstance(p8ObjectStore, documentProps.getIdValue("Id"), null);
// I have the metadata symbolic name and its values within HashMap. So iterating Map to set the values
for(Map.Entry<String, ArrayList<String>> documentMetadata : documentMetadataValues.entrySet()){
document.getProperties().putObjectValue(documentMetadata.getKey(), documentMetadata.getValue().get(1));
}
updatingBatch.add(document, null);
}
updatingBatch.updateBatch();
}

When I ran a query against DocVersion, I found about 700K documents matching the criteria and expected all of them to be updated. The program updated about 390K documents and then failed with this error:

com.filenet.api.exception.EngineRuntimeException: FNRCA0031E: API_UNABLE_TO_USE_CONNECTION: The URI for server communication cannot be determined from the connection object http://server:port/wsi/FNCEWS40MTOM. Message was: Connection refused: connect

Is there a better way to achieve this? Also, I will be using a component queue to run this tool in production.


Solution

  • You have two better options for this: script-based bulk actions or sweeps.


    Bulk Actions

    You can apply bulk actions to the search results of a query. The application of these actions occurs either while the query runs or after the query runs.

    1. Open the object store search in the administration console.
    2. On the SQL view tab, enter an appropriate query.
    3. On the Bulk Actions tab, select Enable.
    4. In the Script section, select Run script.
    5. Enter your JavaScript code into the Script field. For more information, see an example below.
    6. Click Run. The administration console runs the query and the JavaScript action.

    importClass(Packages.com.filenet.api.property.Properties);
    importClass(Packages.com.filenet.api.constants.RefreshMode);
    
    function OnCustomProcess(CEObject) {
      CEObject.refresh();
      CEObject.getProperties().putValue("DocumentTitle", "Test1");
      CEObject.save(RefreshMode.REFRESH);
    }

    For more on this you can check the knowledge center here


    Custom Sweep Job

    Alternatively, you can use a custom sweep job. A sweep is an instance of a background service that you configure to process objects in a database table. If an object meets the configured criteria, the sweep performs an action on it. A sweep consists of a sweep action and a sweep job.
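    To make the idea concrete, here is a minimal plain-Java model of what a sweep does: walk a table of objects, apply an action to those that meet the criteria, and record an outcome per object so that one failure does not stop the run. It deliberately does not use the FileNet API; `SweepSketch`, `Item`, `Outcome`, and `sweep` are hypothetical names made up for this illustration, and the real server-side behaviour may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical, FileNet-free model of "criteria + action + per-item outcome".
final class SweepSketch {

    enum Outcome { PROCESSED, FAILED }

    /** One candidate object plus the outcome recorded for it. */
    static final class Item {
        final String id;
        String title;
        Outcome outcome;   // stays null when the item did not meet the criteria
        String message;    // failure detail, if any

        Item(String id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    /** Applies the action to every matching item and returns how many succeeded. */
    static int sweep(List<Item> table, Predicate<Item> criteria, Consumer<Item> action) {
        int processed = 0;
        for (Item item : table) {
            if (!criteria.test(item)) {
                continue; // not selected by the sweep criteria
            }
            try {
                action.accept(item);
                item.outcome = Outcome.PROCESSED;
                processed++;
            } catch (RuntimeException e) {
                // Record the failure and keep going with the next item.
                item.outcome = Outcome.FAILED;
                item.message = e.getMessage();
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Item> table = new ArrayList<>();
        table.add(new Item("A", "old title"));
        table.add(new Item("B", "old title"));
        table.add(new Item("C", "unrelated"));

        int processed = sweep(
            table,
            item -> item.title.startsWith("old"),
            item -> {
                if (item.id.equals("B")) {
                    throw new IllegalStateException("simulated update failure");
                }
                item.title = "Test1";
            });

        System.out.println("processed = " + processed);
        for (Item item : table) {
            System.out.println(item.id + " -> " + item.outcome + " (" + item.title + ")");
        }
    }
}
```

    The point of the design is the per-item try/catch: a failed update is marked on that item only, which is the same pattern the sweep handler script further down follows with `setOutcome`.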

    1. In the domain navigation pane, click the object store. In the object store navigation pane, right-click the Sweep Management > Sweep Actions folder and click New Sweep Action.
    2. Select the action type. For this example, select JavaScript (an example script is listed below), then finish the wizard.
    3. In the domain navigation pane, select the object store. In the object store navigation pane, select the Sweep Management > Job Sweeps > Custom Jobs folder and click New. Reference the action you just created and finish the wizard. You are now all set to run the sweep job.

    importPackage(Packages.com.filenet.api.core);
    importPackage(Packages.com.filenet.api.constants);
    importPackage(Packages.com.filenet.api.exception);
    importPackage(Packages.com.filenet.api.sweep);
    importPackage(Packages.com.filenet.api.engine);
    
    // Implement for custom job and queue sweeps.
    function onSweep(sweepObject, sweepItems) {
      var hcc = HandlerCallContext.getInstance();
      hcc.traceDetail("Entering CustomSweepHandler.onSweep");
      hcc.traceDetail("sweepObject = " +
        sweepObject.getProperties().getIdValue(PropertyNames.ID) +
        ", sweepItems.length = " + sweepItems.length);
    
      // Iterate the sweepItems and update each target document.
      for (var idx = 0; idx < sweepItems.length; idx++) {
        // At the top of your loop, always check to make sure 
        // that the server is not shutting down. 
        // If it is, clean up and return control to the server.
        if (hcc != null && hcc.isShuttingDown()) {
          throw new EngineRuntimeException(ExceptionCode.E_BACKGROUND_TASK_TERMINATED,
            this.constructor.name + " is terminating prematurely because the server is shutting down");
        }
    
        var item = sweepItems[idx].getTarget();
        // "String msg" is Java syntax; in the script, declare the variable with var.
        var msg = "sweepItems[" + idx + "]= " + item.getProperties().getIdValue(PropertyNames.ID);
        hcc.traceDetail(msg);
    
        try {
          var CEObject = Document(item);
          CEObject.getProperties().putValue("DocumentTitle", "Test1");
          CEObject.save(RefreshMode.NO_REFRESH);
    
          // Set outcome to PROCESSED if item processed successfully.
          sweepItems[idx].setOutcome(SweepItemOutcome.PROCESSED,
            "item processed by " + this.constructor.name);
        }
        // Set failure status on objects that fail to process.
        catch (ioe) {
          sweepItems[idx].setOutcome(SweepItemOutcome.FAILED, "CustomSweepHandler: " +
            ioe.rhinoException.getMessage());
        }
      }
      hcc.traceDetail("Exiting CustomSweepHandler.onSweep");
    }
    
    /* 
     * Called automatically when the handler is invoked by a custom sweep job 
     * or sweep policy. Specify properties required by the handler, if any.
     * If you return an empty array, then all properties are fetched.
     */
    function getRequiredProperties() {
      var pnames = ['Id', 'DocumentTitle'];
      return pnames.toString();
    }

    For more on the sweep jobs, please check the link here