Problem understanding Python package structure and how to use it to trigger a Python wheel task in Databricks
Either I misunderstand something fundamental about Python packages/modules, or this is specific to Databricks. I have tried multiple options but none work.
I would like to call the triggerjob function in createtables.py using
package_name: dbxdemo and entry_point: jobs.createexternaltables.createtables.triggerjob
I have also tried
package_name: dbxdemo.jobs.createexternaltables.createtables and entry_point: triggerjob
My package structure is
dbxdemo
|--jobs
|  |--createexttables
|  |  |--__init__.py
|  |  |--createtables.py
|  |--sample
|  |  |--__init__.py
|  |  |--entrypoint.py
|  |--__init__.py
|--common.py
|--__init__.py
Then I updated the __init__.py files in the various subfolders as follows
# dbxdemo/__init__.py
from . import jobs
__all__=['jobs']
__version__ = "0.0.1"
# dbxdemo/jobs/__init__.py
from . import createexternaltables
from . import sample
__all__=['createexternaltables', 'sample']
# dbxdemo/jobs/createexternaltables/__init__.py
from .createtables import *
The createtables.py file has this sample code
import logging
#import dbxdemo.common
from dbxdemo.common import Job

class CreateExternalTable(Job):
    def launch(self):
        try:
            pass  # do something
        except Exception as e:
            pass  # do logging

# Created this outside the class to see if that helps, but no
# (ideally would want this to be part of the class)
def triggerjob():
    job = CreateExternalTable()
    job.launch()
When I create a Databricks Python wheel task with the package name
dbxdemo
and the entry_point
jobs.createexternaltables.createtables.triggerjob
I keep getting the error
module 'dbxdemo' has no attribute 'jobs'
I have also gone through other Stack Overflow posts and tried various combinations.
I have also tried setting the package name to dbxdemo.jobs.createexternaltables.createtables
and the entry_point to triggerjob,
but that does not work either.
In addition, I have also tried changing setup.py (see the comment):
from setuptools import find_packages, setup
from dbxdemo import __version__

setup(
    name="dbxdemo.jobs.createexternaltables.createtables",  # earlier also tried with dbxdemo
    packages=find_packages(exclude=["tests", "tests.*"]),
    setup_requires=["wheel"],
    version=__version__,
    description="",
    author=""
)
P.S.: If the problem is Databricks-specific, I have been following the dbx documentation.
I have a feeling this is probably Databricks-related, as I can install the library manually and call this successfully:
import dbxdemo
dbxdemo.jobs.createexternaltables.createtables.triggerjob()
Found the answer. The fix was to set the package name to dbxdemo and register the function under entry_points in setup.py:
from setuptools import find_packages, setup
from dbxdemo import __version__

setup(
    name="dbxdemo",
    packages=find_packages(exclude=["tests", "tests.*"]),
    setup_requires=["wheel"],
    version=__version__,
    description="",
    author="",
    entry_points={
        'console_scripts': [
            'triggerjob = dbxdemo.jobs.createexternaltables.createtables:triggerjob',
        ],
    }
)
Once this was done, the package_name can be set to dbxdemo and the entry_point to triggerjob when creating a Databricks Python wheel task.
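To make the 'name = module:function' string above less opaque, here is a minimal sketch of how such an entry point is resolved with the standard library's importlib.metadata. This illustrates the general packaging mechanism only; it is an assumption that Databricks does something equivalent internally. The helper resolve and the stand-in target json:dumps are made up for the demo so that it runs without the wheel installed.

```python
from importlib.metadata import EntryPoint


def resolve(name: str, target: str):
    """Load the object that a 'name = module:attr' entry point refers to."""
    ep = EntryPoint(name=name, value=target, group="console_scripts")
    # load() imports the module part and then looks up the attribute part.
    return ep.load()


# Stand-in target from the standard library so the sketch runs anywhere.
# With the wheel installed, the target would instead be
# "dbxdemo.jobs.createexternaltables.createtables:triggerjob".
func = resolve("demo", "json:dumps")
print(func({"ok": True}))
```

The point is that the short name (triggerjob) is only a label stored in the wheel's metadata; the dotted module path and function name live on the right-hand side of the '=' in setup.py.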
P.S.: For anyone interested in doing this through dbx, your deployment.yaml should contain:
- name: "dbxdemowhl"
  <<:
    - *basic-static-cluster
  python_wheel_task:
    package_name: "dbxdemo"
    entry_point: triggerjob
    parameters: [] # This must be passed even if empty as dbx execute would error out
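On the wish in the question to keep triggerjob inside the class: the entry points specification allows dotted object references after the colon, so something like 'triggerjob = dbxdemo.jobs.createexternaltables.createtables:CreateExternalTable.triggerjob' should be a valid target in setup.py. I have not verified this on Databricks, so treat the following as a sketch only. The Job class here is a hypothetical stand-in for dbxdemo.common.Job, and fake_module only imitates the attribute lookup that happens after the module is imported.

```python
import functools
import types


class Job:
    """Hypothetical stand-in for dbxdemo.common.Job."""

    def launch(self):
        raise NotImplementedError


class CreateExternalTable(Job):
    def launch(self):
        return "launched"  # placeholder for the real work

    @classmethod
    def triggerjob(cls):
        # A classmethod needs no instance, so it can be called with no arguments.
        return cls().launch()


# Imitate resolving the part after the colon, "CreateExternalTable.triggerjob",
# by walking the attributes one at a time starting from the module object.
fake_module = types.SimpleNamespace(CreateExternalTable=CreateExternalTable)
target = functools.reduce(
    getattr, "CreateExternalTable.triggerjob".split("."), fake_module
)
result = target()
print(result)
```

A classmethod (or staticmethod) is the important part: a plain instance method would need an instance passed in, which an entry point call does not provide.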