python oop scientific-computing

Object-oriented scientific data processing: how to cleverly fit data, analysis, and visualization into objects?


As a biology undergrad, I often write Python software to do data analysis. The general structure is always the same:

There is some data to load, analyze (statistics, clustering...), and then visualize.

Sometimes, for the same experiment, the data can come in different formats, there can be different ways to analyze them, and different visualizations are possible, which may or may not depend on the analysis performed.

I'm struggling to find a generic, "pythonic", object-oriented way to make this clear and easily extensible. It should be easy to add new types of actions or slight variations of existing ones, so I'm quite convinced that I should do this with OOP.

I've already written a Data class with methods to load the experimental data. If I have multiple data sources, I plan to create subclasses that override the load function.

After that... I'm not sure. Should I write an abstract Analysis class with a child class for each type of analysis (using their attributes to store the results), do the same for Visualization, and have a general Experiment object hold the Data instance and the multiple Analysis and Visualization instances? Or should the visualizations be functions that take an Analysis and/or Data object as parameters in order to construct the plots? Is there a more efficient way? Am I missing something?


Solution

  • Your general idea would work; here are some more details that will hopefully help you proceed:

    I do have a warning: Python is abstract, powerful, and high-level enough that you don't generally need to create your own OO design. It is usually possible to do what you want with minimal code using numpy, scipy, and matplotlib, so before you start doing the extra coding, be sure you need it :)
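
    To make the design from the question concrete, here is a minimal sketch of how it could look. It is only one possible layout, not a recommended final design: every class and function name below (Data, CsvData, JsonData, Analysis, MeanAnalysis, describe) is hypothetical, the "visualization" is a plain function returning text as a stand-in for a real plot, and only the standard library is used so the example runs anywhere.

```python
from abc import ABC, abstractmethod
import csv
import io
import json
import statistics


class Data(ABC):
    """Holds measurements as a list of floats, whatever the source format."""

    def __init__(self, source):
        # `source` is assumed to be an open text stream (file or StringIO).
        self.values = self.load(source)

    @abstractmethod
    def load(self, source):
        """Return a list of floats parsed from the stream."""


class CsvData(Data):
    def load(self, source):
        # One value per row, first column only; blank rows are skipped.
        return [float(row[0]) for row in csv.reader(source) if row]


class JsonData(Data):
    def load(self, source):
        # Assumes the stream contains a flat JSON array of numbers.
        return [float(v) for v in json.load(source)]


class Analysis(ABC):
    """Each subclass computes one result and stores it in `self.result`."""

    def __init__(self, data):
        self.data = data
        self.result = None

    @abstractmethod
    def run(self):
        """Compute the result, store it, and return self for chaining."""


class MeanAnalysis(Analysis):
    def run(self):
        self.result = statistics.mean(self.data.values)
        return self


def describe(analysis):
    """Visualization written as a plain function that takes an Analysis."""
    return f"{type(analysis).__name__}: {analysis.result:.2f}"


# Two input formats, one analysis, one "visualization".
csv_data = CsvData(io.StringIO("1.0\n2.0\n6.0\n"))
json_data = JsonData(io.StringIO("[1.0, 2.0, 6.0]"))
print(describe(MeanAnalysis(csv_data).run()))
```

    In this sketch, adding a new input format means adding one Data subclass, and adding a new analysis means adding one Analysis subclass; the visualization stays a function so that it does not need its own class hierarchy until it carries state of its own.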