pythonapache-stormapache-storm-topology

apache-storm mixed topology with python - ModuleNotFoundError: No module named 'storm'


I'm trying to create a mixed storm topology, which is using Java based spout and python based bolt.

For a python based bolt, I wrote a Java wrapper:

class PythonBolt extends ShellBolt implements IRichBolt {

    public PythonBolt() {
        super("python", "C:\\somepath\\sample.py");
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

This is how my sample.py looks like:

import storm

class SplitSentenceBolt(storm.BasicBolt):
    def process(self, tup):
        print("Python rocks!")
        words = tup.values[0].split(" ")
        print(tup.values[0])

SplitSentenceBolt().run()

Then I put it all together and try to run via the following snippet:

public class SampleBolt {

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("Hello", new RawDataLevelSpout(), 12);
        builder.setBolt("World", new PythonBolt(), 12);

        Config config = new Config();
        config.setDebug(true);


        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("Hello-World-BaiJian", config, builder.createTopology());
        Utils.sleep(100000);
        cluster.killTopology("Hello-World-BaiJian");
        cluster.shutdown();
    }
}

It all boots-up correctly, however, I get the following exception:

import storm
ModuleNotFoundError: No module named 'storm'

    at org.apache.storm.utils.ShellProcess.launch(ShellProcess.java:94) ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.apache.storm.task.ShellBolt.prepare(ShellBolt.java:154) ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.apache.storm.executor.bolt.BoltExecutor.init(BoltExecutor.java:84) ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.apache.storm.executor.bolt.BoltExecutor.call(BoltExecutor.java:93) ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.apache.storm.executor.bolt.BoltExecutor.call(BoltExecutor.java:45) ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
    at org.apache.storm.utils.Utils$2.run(Utils.java:329) ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]

Any hints on how to overcome this? How do I install that python storm package? Is it possible to install it through Anaconda (I failed to find the package)?


Solution

  • Just for the future folks - the message was very precise. I simply missed the storm.py in the same folder where my sample.py was located. Simply adding the following file: https://github.com/apache/storm/blob/v1.2.1/storm-multilang/python/src/main/resources/resources/storm.py into that folder resolved the issue! Is is also possible to run this code through a LocalCluster.