hadoop falcon

Falcon, hadoop core concepts


I'm mostly a frontend developer, so some backend concepts are hard for me to understand, especially outside the JS world (I know Node and Express on the backend).

I have to develop a frontend for Falcon on Hadoop. The UI should allow creating a new feed. Users shall be allowed to define a process workflow. Users must define or create at least one cluster before creating a feed or process.

Those are some of the requirements I received.

My questions are:

Does the Feed entity behave like a document or object, say, similar to a JSON object?

Are clusters just different places where different tasks are run (in the Hadoop/Falcon sense)?

Is the process entity just the lifecycle of tasks to perform on a feed entity?

And is a cluster entity just a separate group of tasks?

I know there is a REST API to communicate with that backend. Will that be enough to manage feeds, clusters, etc., or are there limitations?

EDIT

To add to ysr's answer: as time went on, I gained a more precise understanding.

Entities (feed, process, cluster) are defined, submitted, and retrieved in XML format. The entity specification is here: http://falcon.apache.org/EntitySpecification.html

The REST API is documented at http://falcon.apache.org/restapi/ResourceList.html, and you can manage your entities' lifecycle using it.

From the frontend perspective I didn't need to know much more than that.
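For anyone building a similar UI, here is a minimal sketch of how the frontend might describe entity lifecycle calls. It assumes the `api/entities/<action>/<type>[/<name>]` path layout and the `user.name` query parameter as I understand them from the resource list linked above, so verify both against your Falcon version. The helper name `buildFalconRequest`, the `FalconRequest` shape, and the host and port in the usage note are hypothetical, not part of Falcon.

```typescript
// Entity types from the question; the actions are a small subset of the REST API.
type EntityType = "cluster" | "feed" | "process";
type EntityAction = "submit" | "schedule" | "definition" | "delete" | "list";

// Hypothetical request descriptor; pass it to whatever HTTP client you use.
interface FalconRequest {
  method: "GET" | "POST" | "DELETE";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Assumed HTTP verb per action; check the resource list for your version.
const METHODS: Record<EntityAction, FalconRequest["method"]> = {
  submit: "POST",
  schedule: "POST",
  definition: "GET",
  delete: "DELETE",
  list: "GET",
};

function buildFalconRequest(
  baseUrl: string,
  action: EntityAction,
  entityType: EntityType,
  options: { entityName?: string; xml?: string; user?: string } = {}
): FalconRequest {
  const { entityName, xml, user } = options;
  // submit and list address an entity type; the other actions address one entity.
  const needsName = action !== "submit" && action !== "list";
  if (needsName && !entityName) {
    throw new Error(`${action} requires an entity name`);
  }
  if (action === "submit" && !xml) {
    throw new Error("submit requires the entity definition as XML");
  }

  let path = `api/entities/${action}/${entityType}`;
  if (needsName) {
    path += `/${encodeURIComponent(entityName as string)}`;
  }
  const query = user ? `?user.name=${encodeURIComponent(user)}` : "";
  const url = `${baseUrl.replace(/\/+$/, "")}/${path}${query}`;

  const request: FalconRequest = { method: METHODS[action], url, headers: {} };
  if (action === "submit") {
    // Entities are submitted as XML, as noted above.
    request.headers["Content-Type"] = "text/xml";
    request.body = xml as string;
  }
  return request;
}
```

For example, `buildFalconRequest("http://localhost:15000", "schedule", "feed", { entityName: "clicks", user: "ui" })` produces a `POST` to `http://localhost:15000/api/entities/schedule/feed/clicks?user.name=ui`. Keeping URL construction separate from the actual network call makes it easy to unit test without a running Falcon server.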


Solution

  • Falcon uses three types of entities:

    1. Cluster 2. Feed 3. Process

    Cluster - contains system-wide properties such as the HDFS endpoint, JobTracker endpoint, YARN endpoint (if you are using YARN), Oozie endpoint, and ActiveMQ endpoint.

    Feed - relates to data. A feed definition contains information such as the data path, how frequently the data becomes available, and retention and replication details.

    Process - relates to a job that runs at a particular frequency. A process consumes one or more feeds and generates another feed. A process definition contains information such as the frequency at which the job runs, the range of inputs it consumes, the output it generates, and the workflow definition path, among others.

    Falcon also provides sufficient REST APIs to communicate with the server. There are currently no limitations as such. If you find any, we (falcon-dev) would be more than happy to incorporate your changes.
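From a frontend point of view, the three entity types and the rule that a cluster must exist before a feed or process can be modelled roughly as below. This is only a sketch: the interfaces are a simplified, hypothetical view model whose field names (`interfaces`, `path`, `workflowPath`, and so on) are illustrative and do not mirror Falcon's XML schema, and `findMissingDependencies` is an invented helper for client-side validation before anything is submitted.

```typescript
// Simplified, hypothetical view models; the real definitions are XML documents.
interface Cluster {
  name: string;
  // Endpoint label (e.g. "hdfs", "oozie") mapped to its URL.
  interfaces: Record<string, string>;
}

interface Feed {
  name: string;
  frequency: string;   // how often the data becomes available
  path: string;        // where the data lives
  clusters: string[];  // names of clusters that hold this feed
}

interface Process {
  name: string;
  frequency: string;   // how often the job runs
  clusters: string[];  // names of clusters the job runs on
  inputs: string[];    // names of feeds consumed
  outputs: string[];   // names of feeds produced
  workflowPath: string;
}

// Returns one message per reference to a cluster or feed that is not defined.
function findMissingDependencies(
  clusters: Cluster[],
  feeds: Feed[],
  processes: Process[]
): string[] {
  const clusterNames = new Set(clusters.map((c) => c.name));
  const feedNames = new Set(feeds.map((f) => f.name));
  const problems: string[] = [];

  for (const feed of feeds) {
    for (const cluster of feed.clusters) {
      if (!clusterNames.has(cluster)) {
        problems.push(`feed "${feed.name}" references unknown cluster "${cluster}"`);
      }
    }
  }
  for (const proc of processes) {
    for (const cluster of proc.clusters) {
      if (!clusterNames.has(cluster)) {
        problems.push(`process "${proc.name}" references unknown cluster "${cluster}"`);
      }
    }
    for (const feed of [...proc.inputs, ...proc.outputs]) {
      if (!feedNames.has(feed)) {
        problems.push(`process "${proc.name}" references unknown feed "${feed}"`);
      }
    }
  }
  return problems;
}
```

A UI could run a check like this when the user saves a form and show the messages inline, which gives faster feedback than waiting for the server to reject a submission.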