
OpenCC Not Accepting Any Segmentation Type


I've been trying to use OpenCC on macOS Sequoia. The default conversion works without any issues. However, when I try to use a custom glossary (I've successfully created my .json and .ocd2 files), I keep running into the following error:

$ opencc -c "/Users/f/Downloads/zh-test/custom-config.json" -i "/Users/f/Downloads/zh-test/source-cn.txt" -o "/Users/f/Downloads/zh-test/target-tw.txt"

Invalid format: Unknown segmentation type: default

I've tried simple, text, and none, and the outcome is the same.

My custom-config.json file:

{
  "name": "custom",
  "description": "Custom CN to TW glossary",
  "segmentation": {
    "type": "default"
  },
  "conversion_chain": [
    {
      "dict": {
        "type": "ocd2",
        "file": "custom.ocd2"
      }
    }
  ]
}

I tried removing the segmentation section altogether, but then it complains about missing segmentation.

I also tried uninstalling the Homebrew package and building OpenCC from source instead, with the same result. The latest build is clean and up to date.

What am I doing wrong, and how can I fix it?


Solution

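  • Thanks to the OpenCC community, I found the solution: the segmentation type that works is "mmseg", and the segmentation section needs its own dict entry. Here it points at the same custom.ocd2 that the conversion chain uses: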

    {
      "name": "custom",
      "segmentation": {
        "type": "mmseg",
        "dict": {
          "type": "ocd2",
          "file": "custom.ocd2"
        }
      },
      "conversion_chain": [
        {
          "dict": {
            "type": "ocd2",
            "file": "custom.ocd2"
          }
        }
      ]
    }
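
If you generate these configs from a script, a small pre-flight check can catch this mistake before opencc does. The following is only a sketch: `build_config` and `check_config` are hypothetical helper names, not part of OpenCC, and the check assumes that "mmseg" is the only segmentation type your build accepts, which is what the working config above suggests.

```python
import json

# Assumption: "mmseg" is the only segmentation type this OpenCC build accepts,
# based on the working config above.
SUPPORTED_SEGMENTATION_TYPES = {"mmseg"}


def build_config(name, dict_file):
    """Return a config that uses one .ocd2 dictionary for both
    segmentation and conversion, mirroring the working config above."""
    dict_entry = {"type": "ocd2", "file": dict_file}
    return {
        "name": name,
        "segmentation": {"type": "mmseg", "dict": dict(dict_entry)},
        "conversion_chain": [{"dict": dict(dict_entry)}],
    }


def check_config(config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    segmentation = config.get("segmentation")
    if segmentation is None:
        problems.append("missing 'segmentation' section")
    else:
        seg_type = segmentation.get("type")
        if seg_type not in SUPPORTED_SEGMENTATION_TYPES:
            problems.append(f"unsupported segmentation type: {seg_type!r}")
        if "dict" not in segmentation:
            problems.append("segmentation has no 'dict'")
    if not config.get("conversion_chain"):
        problems.append("missing or empty 'conversion_chain'")
    return problems


if __name__ == "__main__":
    config = build_config("custom", "custom.ocd2")
    # Print the JSON so it can be redirected into custom-config.json.
    print(json.dumps(config, indent=2))
```

Running `check_config` on the config from the question reports both the unsupported "default" type and the missing segmentation dict. This only inspects the JSON structure; it does not verify that the .ocd2 file exists or is valid.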