I am currently working with HDFS, Apache Livy and Django. The goal is to send a request that runs code stored in HDFS, which in turn calls Livy to create batches. So far everything works: I have a basic wordcount script stored in HDFS along with a .txt file, and an HTML page with a simple button that launches the whole process.
The wordcount result is created successfully. My next step is to get information from Livy, for instance the IDs of the sessions (or batches) that are currently starting/running/dead/success, as some sort of callback. I need this information to refresh itself so that I always know which state every session is in. To do so, I thought I could use django-cron, but I can't manage to set it up correctly. I get no errors, but nothing happens. What am I missing?
I'm working on CentOS 7 in a Conda environment with Python 3.6 and the latest releases of Django, Livy and HDFS.
Here are my current files:
{% load static %}
<div id="div1">
    <form action="#" method="get">
        <input type="text" name="mytextbox" />
        <input type="submit" class="btn" value="Click" name="mybtn">
    </form>
</div>
# views.py
from django.shortcuts import render
from django.http import HttpResponse
from django_cron import CronJobBase, Schedule
import wordcount, livy

# Create your views here.
class CheckIdCronJob(CronJobBase):
    RUN_EVERY_MINS = 1  # every minute

    schedule = Schedule(run_every_mins=RUN_EVERY_MINS)
    code = 'button.CheckIdCronJob'  # a unique code

    def do(self):
        # django-cron calls do() on each run; the job's work goes here.
        pass

def index(request):
    # Only call Livy when the form's submit button ("mybtn") was clicked.
    if request.GET.get('mybtn'):
        r = livy.send(request.GET.get('mytextbox'))  # e.g. /test/LICENSE.txt
        return render(request, 'button/livy.html', {'result': r})
    return render(request, 'button/livy.html')
# livy.py
import json
import requests

def send(inputText):
    host = 'http://localhost:8998'
    data = {"file": "/myapp/wordcount.py", "args": [inputText, "2"]}
    headers = {'Content-Type': 'application/json'}
    # Submit the batch, then list all batches known to Livy.
    requests.post(host + '/batches', data=json.dumps(data), headers=headers)
    r = requests.get(host + '/batches', headers=headers)
    return r.json()
What django-cron does is just make it easy to write management commands that run a job and to specify how often/when these jobs should run. You end up with one management command, ./manage.py runcrons, that will check all your jobs and run them if needed.
What it doesn't do is continuously run runcrons, which is what you actually need if you want to make sure your jobs run at the right moment. Basically, you want runcrons to run every minute (or every 10 minutes if the timing is not that critical), so you still need some system daemon that will do that.
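To illustrate why something has to trigger the check repeatedly, here is a minimal sketch of a "run the jobs that are due" pass. It is not django-cron's actual implementation; the `Job` class and `run_pending` function are hypothetical names made up for this example. The point is only that a job with a one-minute interval still runs just once per pass, however much time has elapsed.

```python
from datetime import datetime, timedelta


class Job:
    """Hypothetical stand-in for a scheduled job (not a django-cron class)."""

    def __init__(self, name, run_every_mins):
        self.name = name
        self.interval = timedelta(minutes=run_every_mins)
        self.last_run = None  # the job has never run yet

    def is_due(self, now):
        # Due if it never ran, or if at least one interval has passed.
        return self.last_run is None or now - self.last_run >= self.interval


def run_pending(jobs, now):
    """One pass over all jobs: mark the due ones as run, return their names."""
    ran = []
    for job in jobs:
        if job.is_due(now):
            job.last_run = now
            ran.append(job.name)
    return ran


jobs = [Job("check_ids", run_every_mins=1)]
start = datetime(2020, 1, 1, 12, 0)

first = run_pending(jobs, start)                           # never ran, so it is due
second = run_pending(jobs, start + timedelta(seconds=30))  # too soon, nothing runs
third = run_pending(jobs, start + timedelta(minutes=5))    # due again, runs once
print(first, second, third)
```

Nothing happens between passes: if nobody calls the pass, the job never runs, no matter what interval it declares.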
cron is available on CentOS and can be used for just that purpose. The django-cron installation instructions show an example of how to create a crontab that will run runcrons every 5 minutes:
crontab -e
*/5 * * * * source /home/ubuntu/.bashrc && source /home/ubuntu/work/your-project/bin/activate && python /home/ubuntu/work/your-project/src/manage.py runcrons > /home/ubuntu/cronjob.log
You have to adapt that to fit your use case:
- If you just do crontab -e ... the job will run as the user you're currently logged in as. That might not be the right user to run the manage.py command, since that user needs to have the correct permissions to run your project. Use -u user to make the crontab for a different user.
  This is actually the complicated thing when running in production: getting user permissions correct and getting the right user to run the various tasks. Normally you'd have a www-data or apache user that's running your server (and hence your Django app) and you want that same user to run the manage.py command. It should not be root running apache, as that opens up security risks (your web server would have full access to the entire system).
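- The paths and the source commands in the example are specific to that setup. Change this appropriately.
- Make sure Django can find your settings, either through the environment (set up by the source earlier) or by passing the --settings path.to.settings option to manage.py.
- Add 2>&1 at the end so that cron errors (stderr) are also directed to that same log.
To check your crontab, run crontab -l (for the currently logged in user) or crontab -l -u user for a different user.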
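Once runcrons is triggered regularly, the job itself still has to ask Livy for the batch states. The following is a rough sketch of that part, assuming Livy's GET /batches response carries a "sessions" list whose entries have "id" and "state" fields. The helper names `fetch_batches` and `batch_states` are made up for this example, and it uses the standard library's urllib instead of requests only to stay dependency-free.

```python
import json
from urllib.request import urlopen

LIVY_HOST = "http://localhost:8998"  # same host as in the question's livy.py


def fetch_batches(host=LIVY_HOST):
    """GET /batches from Livy and return the decoded JSON body."""
    with urlopen(host + "/batches") as response:
        return json.loads(response.read().decode("utf-8"))


def batch_states(payload):
    """Map each batch id to its state from a /batches response body."""
    return {batch["id"]: batch["state"] for batch in payload.get("sessions", [])}


# Made-up payload in the assumed response shape; real data would come
# from fetch_batches() against a running Livy server.
sample = {
    "from": 0,
    "total": 2,
    "sessions": [
        {"id": 0, "state": "success"},
        {"id": 1, "state": "running"},
    ],
}
print(batch_states(sample))
```

Inside CheckIdCronJob.do() you could then call batch_states(fetch_batches()) and log or store the result. Also check that django_cron is in INSTALLED_APPS and that the job class is listed in CRON_CLASSES in settings.py (probably as "button.views.CheckIdCronJob", assuming the app is called button), otherwise runcrons has nothing to run.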