django hdfs livy django-cron

I need help using django_cron


I am currently working with HDFS, Apache Livy and Django. The goal is to send a request that runs code stored in HDFS by calling Livy to create batches. So far everything works: I have a basic wordcount script stored in HDFS along with a .txt file, and an HTML page with a single button that launches the whole process.

The wordcount result is created successfully. My next step is to get information from Livy, for instance the IDs of the sessions (or batches) that are currently starting/running/dead/success, as some sort of callback. I need it to refresh itself so that I always know what state every session is in. To do so, I thought I could use django-cron, but I can't manage to set it up correctly. I get no errors, but nothing happens. What am I missing?

I am working on CentOS 7 in a Conda environment with Python 3.6, using the latest releases of Django, Livy and HDFS.

Here are my current files:

livy.html

{% load static %}

<html>
<body>
<div id="div1">

{{result.sessions}}

</div>

<form action="#" method="get">
 <input type="text" name="mytextbox" />
 <input type="submit" class="btn" value="Click" name="mybtn">
</form>

</body>
</html>

views.py

from django.shortcuts import render
from django.http import HttpResponse
from django_cron import CronJobBase, Schedule
import wordcount, livy

# Create your views here.

class CheckIdCronJob(CronJobBase):
    RUN_EVERY_MINS = 1 # every minute

    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'button.CheckIdCronJob'    # a unique code

    def index(request):
        if(request.GET.get('mybtn')):
            r = livy.send(request.GET.get('mytextbox')) #(/test/LICENSE.txt)
            return render(request,'button/livy.html', {'result':r})
        return render(request,'button/livy.html')

livy.py

import json, pprint, requests, textwrap

def send(inputText):
    host = 'http://localhost:8998'
    data = {"file":"/myapp/wordcount.py", "args":[inputText,"2"]}
    headers = {'Content-Type': 'application/json'}
    r = requests.post(host + '/batches', data=json.dumps(data), headers=headers)
    r = requests.get(host + '/batches' + '', data=json.dumps(data), headers=headers)
    return r.json()

Solution

  • What django-cron does is just make it easy to write jobs and specify how often/when these jobs should run. You end up with one management command, ./manage.py runcrons, that will check all your jobs and run those that are due.

    What it doesn't do is call runcrons continuously, which is what you actually need if you want to make sure your jobs run at the right moment. Basically, you want runcrons to run every minute (or every 10 minutes if the timing is not that critical), so you still need some system daemon to do that.

    crontab is available on CentOS and can be used for just that purpose. The django-cron installation instructions show an example of a crontab entry that runs runcrons every 5 minutes:

    crontab -e
    */5 * * * * source /home/ubuntu/.bashrc && source /home/ubuntu/work/your-project/bin/activate && python /home/ubuntu/work/your-project/src/manage.py runcrons > /home/ubuntu/cronjob.log
    

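    You have to adapt that to fit your use case: the example uses Ubuntu home-directory paths and a virtualenv activate script, so change the paths and the environment activation to match your setup.

    To check your crontab, run crontab -l (for the currently logged in user) or crontab -l -u user for a different user.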
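Once runcrons is being triggered, the job itself still has to do the polling. Here is a minimal sketch of the part that turns the Livy response into "which batch is in which state". It assumes the parsed JSON from GET /batches has a "sessions" list whose entries carry "id" and "state", which is what the question's template reads via result.sessions. The function name group_batches_by_state and the sample payload are made up for illustration.

```python
from collections import defaultdict


def group_batches_by_state(payload):
    """Group Livy batch ids by their state.

    `payload` is assumed to be the parsed JSON body of GET /batches,
    i.e. a dict with a "sessions" list of {"id": ..., "state": ...}.
    """
    grouped = defaultdict(list)
    for batch in payload.get("sessions", []):
        # Fall back to "unknown" if an entry carries no state field.
        grouped[batch.get("state", "unknown")].append(batch.get("id"))
    return dict(grouped)


# Hypothetical sample shaped like a GET /batches response (made-up values).
sample = {
    "from": 0,
    "total": 3,
    "sessions": [
        {"id": 0, "state": "success"},
        {"id": 1, "state": "running"},
        {"id": 2, "state": "dead"},
    ],
}

print(group_batches_by_state(sample))
```

As far as I know, django-cron calls the do() method of each job class listed in the CRON_CLASSES setting, so a function like this would be called from CheckIdCronJob.do() (for example on requests.get(host + '/batches').json()) rather than from a view, and the result stored somewhere the index view can read it.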