I am writing a Python config script that creates an array of input files in a domain-specific language (DSL), so my use case is a bit unusual. In this scenario, we want middle-level users to be able to edit the RandomRequest class, or create other classes following similar patterns, which the input file generator will then use. This way, the middle-level user does not need to edit the core part of the input file generation, even when writing new models in the DSL; they just need to create the Python objects describing the DSL objects they defined, and the core files translate these accordingly.
The MWE I have for the file that the middle-level users will have to edit to match their use case is as follows:
from enum import Enum
import typing as tp
import dataclasses as dc
import random

class State(Enum):
    INC = "incoming"
    OUT = "outgoing"
    SLEEP = "sleeping"

def random_generator_from_enum[T: Enum](E: type[T]) -> tp.Callable[[], T]:
    """
    Returns a function that, when called, returns a random state of the Enum E.
    """
    return lambda: random.choice(list(E))

avg_delays = {
    State.INC: 10,
    State.OUT: 20,
    State.SLEEP: 100
}

def delays_from_state(state: State) -> tp.Callable[[], int]:
    """
    Returns a function that, when called, returns a random delay in seconds
    centered around avg_delays[state].
    """
    return lambda: int(random.gauss(avg_delays[state]))

# not working
@dc.dataclass
class RandomRequest:
    state: State = dc.field(default_factory=random_generator_from_enum(State))
    delay: int = dc.field(default_factory=delays_from_state(state))

if __name__ == '__main__':
    # The core generator will create and handle many `RandomRequest` instances.
    print(RandomRequest())
This is what I would like to do. Of course, it doesn't work because in RandomRequest, I try to use the state variable that is not defined yet. Same issue obviously arises if I try to use self.state, cls.state, or workarounds based on default values instead of default factories. The usual way to handle this is to use __post_init__:
@dc.dataclass
class RandomRequest:
    state: State = dc.field(default_factory=random_generator_from_enum(State))
    delay: int = dc.field(init=False)

    def __post_init__(self):
        self.delay = delays_from_state(self.state)()
However, the class must be edited and maintained by middle-level users, and the number of request properties and possible factory functions for each property can grow arbitrarily large. Using __post_init__ would therefore make the class quite tedious for these users to read and maintain, whereas a syntax similar to the one I wanted to use above keeps it simpler, with all the lines to edit in the same place and one line per custom property. Using __post_init__ in my use case quickly makes the result look like this, which is very error-prone (and I didn't even use multiple Enums or default_factories):
@dc.dataclass
class RandomRequest:
    state_ini: State = dc.field(default_factory=random_generator_from_enum(State))
    state_aim: State = dc.field(default_factory=random_generator_from_enum(State))
    state_req: State = dc.field(default_factory=random_generator_from_enum(State))
    delay_ini: int = dc.field(init=False)
    delay_aim: int = dc.field(init=False)
    delay_req: int = dc.field(init=False)
    delay_ini2: int = dc.field(init=False)
    delay_aim2: int = dc.field(init=False)
    delay_req2: int = dc.field(init=False)
    delay_ini3: int = dc.field(init=False)
    delay_aim3: int = dc.field(init=False)
    delay_req3: int = dc.field(init=False)

    def __post_init__(self):
        self.delay_ini = delays_from_state(self.state_ini)()
        self.delay_aim = delays_from_state(self.state_aim)()
        self.delay_req = delays_from_state(self.state_req)()
        self.delay_ini2 = delays_from_state(self.state_ini)()
        self.delay_aim2 = delays_from_state(self.state_aim)()
        self.delay_req2 = delays_from_state(self.state_req)()
        self.delay_ini3 = delays_from_state(self.state_ini)()
        self.delay_aim3 = delays_from_state(self.state_aim)()
        self.delay_req3 = delays_from_state(self.state_req)()
instead of the alternative I would like to be able to use:
# not working
@dc.dataclass
class RandomRequest:
    state_ini: State = dc.field(default_factory=random_generator_from_enum(State))
    state_aim: State = dc.field(default_factory=random_generator_from_enum(State))
    state_req: State = dc.field(default_factory=random_generator_from_enum(State))
    delay_ini: int = dc.field(default_factory=delays_from_state(state_ini))
    delay_aim: int = dc.field(default_factory=delays_from_state(state_aim))
    delay_req: int = dc.field(default_factory=delays_from_state(state_req))
    delay_ini2: int = dc.field(default_factory=delays_from_state(state_ini))
    delay_aim2: int = dc.field(default_factory=delays_from_state(state_aim))
    delay_req2: int = dc.field(default_factory=delays_from_state(state_req))
    delay_ini3: int = dc.field(default_factory=delays_from_state(state_ini))
    delay_aim3: int = dc.field(default_factory=delays_from_state(state_aim))
    delay_req3: int = dc.field(default_factory=delays_from_state(state_req))
I'm looking for possible workarounds that get closer to the desired syntax. I would like to keep a class structure for my requests instead of a function that returns a request, as they also have useful inherited methods that help validate the provided config file by generating the appropriate test suite. However, the more I think about it, the less I believe I will be able to use dataclasses for this, even though their simplicity was well suited to my goals.
Is there still any way to make this work with dataclasses, or even regular classes, or do I need to completely change the way I intended to make this work?
The standard case for a dataclass is where the caller is able to explicitly pass in all of the field values, and those field values are all public. A typical call for the object you show might look like
req = RandomRequest(state=State.INC, delay=5)
With this call pattern, the default_factory will never be invoked.
If you expect that the normal call path will be to pass no parameters, then a data class probably isn't right. An ordinary class will have less boilerplate to set up.
# not a dataclass
class RandomRequest:
    def __init__(self):
        self.state = random.choice(list(State.__members__.values()))
        self.delay = int(random.gauss(avg_delays[self.state]))
If the normal call path will pass all of the parameters, but you also need a way to create one with random values, then using a class method as a secondary constructor could also be a clean approach.
from dataclasses import dataclass
from typing import Self

@dataclass
class RandomRequest:
    state: State
    delay: int

    @classmethod
    def random(cls) -> Self:
        state = random.choice(list(State.__members__.values()))
        delay = int(random.gauss(avg_delays[state]))
        return cls(state=state, delay=delay)
If you wanted to make this random construction generic across a range of dataclasses, you could use dataclasses.fields() to introspect a dataclass class and find its fields. You probably wouldn't normally need this, unless you're building some sort of automatic fuzzing tool (and even then I'd look for a prebuilt solution first).
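The introspection idea could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the `random_instance` helper and the `"random"` metadata key are hypothetical names, and each field is assumed to carry a callable in its metadata that receives the values generated so far. The standard deviation passed to `random.gauss` is also made explicit here.

```python
import dataclasses as dc
import random
from enum import Enum


class State(Enum):
    INC = "incoming"
    OUT = "outgoing"
    SLEEP = "sleeping"


AVG_DELAYS = {State.INC: 10, State.OUT: 20, State.SLEEP: 100}


def random_instance(cls):
    """Build an instance of the dataclass `cls` with a random value per field.

    Hypothetical convention: every field stores a callable under
    metadata["random"]. The callable receives the dict of values generated
    so far, so a later field can depend on an earlier one.
    """
    values = {}
    for f in dc.fields(cls):  # fields come back in definition order
        values[f.name] = f.metadata["random"](values)
    return cls(**values)


@dc.dataclass
class RandomRequest:
    state: State = dc.field(
        metadata={"random": lambda v: random.choice(list(State))}
    )
    delay: int = dc.field(
        # explicit sigma of 1.0 is an assumption for this sketch
        metadata={"random": lambda v: int(random.gauss(AVG_DELAYS[v["state"]], 1.0))}
    )


req = random_instance(RandomRequest)
print(req)
```

The class can still be constructed normally with explicit values, since the metadata is ignored by the generated `__init__`.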
The last example you show seems like it could benefit from refactoring. Using a non-dataclass would certainly work here: you'd only have to write out the properties once in the __init__() method, and you could use a super() call to inherit the base class's initializers. The classmethod approach would be a little harder to adapt, but it'd be possible (maybe passing along a **kwargs of additional parameters).
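As a minimal sketch of the non-dataclass route with inheritance, something like the following might work. The split into `BaseRequest` and `RandomRequest` and the `random_delay` helper are hypothetical names chosen for illustration, and the explicit standard deviation of 1.0 is an assumption.

```python
import random
from enum import Enum


class State(Enum):
    INC = "incoming"
    OUT = "outgoing"
    SLEEP = "sleeping"


AVG_DELAYS = {State.INC: 10, State.OUT: 20, State.SLEEP: 100}


def random_delay(state: State) -> int:
    """Random delay in seconds centered on the average for `state`."""
    return int(random.gauss(AVG_DELAYS[state], 1.0))


class BaseRequest:
    """Hypothetical base class: owns the three random states."""

    def __init__(self):
        self.state_ini = random.choice(list(State))
        self.state_aim = random.choice(list(State))
        self.state_req = random.choice(list(State))


class RandomRequest(BaseRequest):
    """Adds delays derived from the states set up by the base class."""

    def __init__(self):
        super().__init__()  # states exist before the delays are computed
        self.delay_ini = random_delay(self.state_ini)
        self.delay_aim = random_delay(self.state_aim)
        self.delay_req = random_delay(self.state_req)


request = RandomRequest()
print(request.state_ini, request.delay_ini)
```

Each property appears exactly once, on the line that computes it, which keeps the part a user has to edit in one place.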
The three groups of identical fields with numbered suffixes suggest some reorganization might be appropriate. It seems like you have a group of states, and then multiple corresponding groups of delays. So if you have a way to compute the individual random values, you could split the groups into separate objects:
import random
from enum import StrEnum
from typing import Self

class State(StrEnum):
    INC = "incoming"
    OUT = "outgoing"
    SLEEP = "sleeping"

    @classmethod
    def random(cls) -> Self:
        # Inside the method body, `random` is the module, not this method.
        return random.choice(list(cls.__members__.values()))

    def random_delay(self) -> int:
        avg_delays = {
            State.INC: 10,
            State.OUT: 20,
            State.SLEEP: 100
        }
        # `self` is the enum member itself, so it is the lookup key.
        return int(random.gauss(avg_delays[self]))

class States:
    def __init__(self):
        self.ini = State.random()
        self.aim = State.random()
        self.req = State.random()
        self.delays: list[Delays] = [Delays(self) for _ in range(3)]

class Delays:
    def __init__(self, states: States):
        self.ini = states.ini.random_delay()
        self.aim = states.aim.random_delay()
        self.req = states.req.random_delay()

states = States()
print(states.delays[0].ini)
Other refactorings are possible too; maybe it's more natural for your setup to pair the state and delay together, for example. It doesn't need to all be in one object, though, and in the last setup I think the very flat data structure is making it more complex than it needs to be.
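Pairing the state and its delay could look something like this. It is only a sketch: `TimedState` and its `sample` classmethod are hypothetical names, the field names follow the `ini`/`aim`/`req` pattern from the question, and the explicit standard deviation of 1.0 is an assumption.

```python
import random
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    INC = "incoming"
    OUT = "outgoing"
    SLEEP = "sleeping"


AVG_DELAYS = {State.INC: 10, State.OUT: 20, State.SLEEP: 100}


@dataclass(frozen=True)
class TimedState:
    """Hypothetical pairing of a state with a delay drawn for that state."""

    state: State
    delay: int

    @classmethod
    def sample(cls) -> "TimedState":
        # Draw the state first, then a delay that depends on it.
        state = random.choice(list(State))
        return cls(state=state, delay=int(random.gauss(AVG_DELAYS[state], 1.0)))


@dataclass
class RandomRequest:
    # One line per property; each default_factory needs no arguments
    # because the dependency lives inside TimedState.sample().
    ini: TimedState = field(default_factory=TimedState.sample)
    aim: TimedState = field(default_factory=TimedState.sample)
    req: TimedState = field(default_factory=TimedState.sample)


request = RandomRequest()
print(request.ini.state, request.ini.delay)
```

Because the dependency between state and delay is resolved inside the paired object, the outer class stays a plain dataclass with one line per property and no `__post_init__`.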