apache-zookeeper distributed-filesystem

Use ZooKeeper to distribute files over a cluster


I have an API that creates a file based on user input. I need to distribute these files over a cluster such that each file is stored on exactly one node. Can I use ZooKeeper to achieve this, and if so, how?

A user may also want to delete a file, which means ZooKeeper needs to delete the file from its node when asked to.

I've read through the ZooKeeper wiki, but it is difficult to understand how and when to use it.


Solution

  • Yes, you can do this with Apache Curator recipes. Apache Curator is a higher-level client library for Apache ZooKeeper.

    You can combine ZooKeeper's watch API with Curator's distributed lock recipe to achieve what you want.

    1. First, when a file is created, create a znode (optionally with the file's content as its data, if the file is small; znodes are limited to about 1 MB by default) under a parent znode that every node in the cluster is watching. All nodes in the cluster are then notified that the file was created.
    2. The nodes then compete to acquire a distributed lock, and the node that acquires it downloads the file. (You can delete the znode created in step 1 once the file is taken, or introduce another mechanism to keep track of taken files.)

    Hope this gives you an idea.
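
To make the two steps above more concrete, here is a minimal sketch of the "notify everyone, exactly one node wins" idea. It does not talk to ZooKeeper at all: `InMemoryRegistry` is a hypothetical in-memory stand-in for the znode tree, and `tryClaim`, `release`, and `announce` are names made up for this illustration. In a real cluster, the atomic claim would be done either with Curator's `InterProcessMutex` lock recipe or by creating a znode per file (creation fails if the znode already exists), and the notification would come from a watch on the parent znode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the znode tree; NOT a ZooKeeper client. */
class InMemoryRegistry {
    // file name -> id of the node that owns it (plays the role of one znode per file)
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /** Atomic create-if-absent: returns true only for the first caller per file. */
    boolean tryClaim(String fileName, String nodeId) {
        return owners.putIfAbsent(fileName, nodeId) == null;
    }

    /** Returns the current owner of the file, if any. */
    Optional<String> ownerOf(String fileName) {
        return Optional.ofNullable(owners.get(fileName));
    }

    /** Removes the entry and returns the former owner, which should then delete its local copy. */
    Optional<String> release(String fileName) {
        return Optional.ofNullable(owners.remove(fileName));
    }
}

class FileAssignmentSketch {
    /**
     * Simulates all cluster nodes reacting to a "file created" notification at once.
     * Returns the ids of the nodes whose claim succeeded (expected: exactly one).
     */
    static List<String> announce(InMemoryRegistry registry, List<String> nodeIds, String fileName) {
        List<String> winners = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (String nodeId : nodeIds) {
            Thread t = new Thread(() -> {
                if (registry.tryClaim(fileName, nodeId)) {
                    // Only the winner gets here; real code would download and store the file.
                    winners.add(nodeId);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for nodes", e);
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        List<String> nodes = List.of("node-a", "node-b", "node-c");

        List<String> winners = announce(registry, nodes, "report.csv");
        System.out.println("owner after creation: " + winners);

        // User asks to delete the file: look up the owner and release the claim.
        Optional<String> formerOwner = registry.release("report.csv");
        System.out.println("asked to delete local copy: " + formerOwner.orElse("nobody"));
    }
}
```

Which node wins is not deterministic; the only guarantee the sketch demonstrates is that a single node ends up owning each file, and that deleting the entry tells you which node has to remove its copy.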