python pandas jupyter-notebook

How to use a specific version of methylprep in Jupyter notebook


I am trying to use a package named methylprep. It calls the DataFrame "append" method, which was removed in pandas 2.0.

Now the version of pandas installed in my pc is 2.2.2. I am using Jupyter Notebook to run my scripts. Is there a way to use a specific older version of pandas (maybe a 1.x release) in the script I am currently testing in the notebook?

Thank you very much !!

import methylprep
from pathlib import Path
filepath = Path('test/')

data_containers = methylprep.run_pipeline(filepath, array_type=None, export=True, manifest_filepath=None, sample_sheet_filepath='test/MethylationEPIC_Sample_Sheet_B.csv')
INFO:methylprep.processing.pipeline:Running pipeline in: test
Reading IDATs: 100%|█████████████████████████████████████████████████████████████████████| 1/1 [00:41<00:00, 41.74s/it]
INFO:methylprep.files.manifests:Reading manifest file: HumanMethylationEPIC_manifest_v2.csv
Processing samples:   0%|                                                                        | 0/1 [00:01<?, ?it/s]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_9268\2686656903.py in ?()
----> 1 data_containers = methylprep.run_pipeline(filepath, array_type=None, export=True, manifest_filepath=None, sample_sheet_filepath='test/MethylationEPIC_Sample_Sheet_B.csv')

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\methylprep\processing\pipeline.py in ?(data_dir, array_type, export, manifest_filepath, sample_sheet_filepath, sample_name, betas, m_value, make_sample_sheet, batch_size, save_uncorrected, save_control, meta_data_frame, bit, poobah, export_poobah, poobah_decimals, poobah_sig, low_memory, sesame, quality_mask, pneg_ecdf, file_format, **kwargs)
    327 
    328         batch_data_containers = []
    329         export_paths = set() # inform CLI user where to look
    330         for idat_dataset_pair in tqdm(idat_datasets, total=len(idat_datasets), desc="Processing samples"):
--> 331             data_container = SampleDataContainer(
    332                 idat_dataset_pair=idat_dataset_pair,
    333                 manifest=manifest,
    334                 retain_uncorrected_probe_intensities=save_uncorrected,

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\methylprep\processing\pipeline.py in ?(self, idat_dataset_pair, manifest, retain_uncorrected_probe_intensities, bit, pval, poobah_decimals, poobah_sig, do_noob, quality_mask, switch_probes, do_nonlinear_dye_bias, debug, sesame, pneg_ecdf, file_format)
    586         self.manifest = manifest # used by inter_channel_switch only.
    587         if self.switch_probes:
    588             # apply inter_channel_switch here; uses raw_dataset and manifest only; then updates self.raw_dataset
    589             # these are read from idats directly, not SigSet, so need to be modified at source.
--> 590             infer_type_I_probes(self, debug=self.debug)
    591 
    592         super().__init__(self.sample, self.green_idat, self.red_idat, self.manifest, self.debug)
    593         # SigSet defines all probe-subsets, then SampleDataContainer adds them with super(); no need to re-define below.

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\methylprep\processing\infer_channel_switch.py in ?(container, debug)
     15     -- runs in SampleDataContainer.__init__ this BEFORE qualityMask step, so NaNs are not present
     16     -- changes raw_data idat probe_means
     17     -- runs on raw_dataset, before meth-dataset is created, so @IR property doesn't exist yet; but get_infer has this"""
     18     # this first step combines all I-red and I-green channel intensities, so IG+oobG and IR+oobR.
---> 19     channels = get_infer_channel_probes(container.manifest, container.green_idat, container.red_idat, debug=debug)
     20     green_I_channel = channels['green']
     21     red_I_channel = channels['red']
     22     ## NAN probes occurs when manifest is not complete

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\methylprep\processing\infer_channel_switch.py in ?(manifest, green_idat, red_idat, debug)
    167     red_in_band['meth'] = oobG_unmeth
    168     green_in_band['unmeth'] = oobR_meth
    169 
    170     # next, add the green-in-band to oobG and red-in-band to oobR
--> 171     oobG_IG = oobG.append(green_in_band).sort_index()
    172     oobR_IR = oobR.append(red_in_band).sort_index()
    173 
    174     # channel swap requires a way to update idats with illumina_ids

~\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py in ?(self, name)
   6295             and name not in self._accessors
   6296             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297         ):
   6298             return self[name]
-> 6299         return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'append'

Solution
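
For context, the traceback ends at `oobG.append(green_in_band)`, and `DataFrame.append` no longer exists in pandas 2.x. The usual replacement is `pd.concat`. The snippet below is only a minimal sketch of that substitution on toy data: the values are made up, and it is not taken from any actual methylprep patch.

```python
import pandas as pd

# Toy stand-ins for the two frames joined in the traceback.
# The values are hypothetical; the real frames hold probe intensities.
oobG = pd.DataFrame({"meth": [10, 30], "unmeth": [1, 3]}, index=[1, 3])
green_in_band = pd.DataFrame({"meth": [20], "unmeth": [2]}, index=[2])

# Old spelling, which raises AttributeError on pandas 2.x:
#   oobG_IG = oobG.append(green_in_band).sort_index()
# Equivalent spelling that works on both pandas 1.x and 2.x:
oobG_IG = pd.concat([oobG, green_in_band]).sort_index()

print(oobG_IG)
```

Because `pd.concat` takes a list of frames, the same pattern covers the `oobR.append(red_in_band)` call on the next line of the traceback.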

  • "Now the version of pandas installed in my pc is 2.2.2."

    Getting methylprep to run with that version of pandas is the more productive way forward. Pinning an old pandas would leave you fighting against the current of development.
    Luckily, someone already did the conversion here and filed a related pull request at the original FoxoTech 'methylprep' source repo. You can use that fork for now, until the fix is integrated into the source software.

    To install that particular version of methylprep with pip, run the following in a terminal, making sure the active environment is the one Jupyter will be using:

    pip install git+https://github.com/gilgameshjw/methylprep/@pandas2.0_0
    

    Or, more conveniently, run the following in a notebook cell. The %pip magic ensures that an installation started from inside a running notebook goes into the environment the kernel is actually using. (See more about the modern %pip magic install command here.)

    %pip install git+https://github.com/gilgameshjw/methylprep/@pandas2.0_0
    

    The install appears to succeed. I then tested it with pandas 2.2.2, since I noticed the repo includes example data; see the test run of methylprep working with pandas 2 here.

    (In the test notebook linked above, I didn't run the install command inside the notebook, because I wasn't planning to test a run until I saw there was example data and thought I'd try. I put the command in a code cell as a %pip magic install command, though, so anyone curious can step through the notebook itself, or run the commands I show in a test notebook in a fresh session launched from where I indicate at the top of that notebook, all without touching or installing anything on the local system. This lets you check that it works before you change anything on your local machine.)
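
After installing, it can help to confirm which interpreter the kernel is running and which package versions it sees. The following is a small sketch using only the standard library; `installed_version` is a hypothetical helper name introduced here for illustration, not part of methylprep or pandas.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str:
    """Return the installed version of `package`, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"


# The interpreter the running kernel uses; %pip installs into this one.
print("kernel python:", sys.executable)

# Versions as seen from inside the kernel's environment.
for name in ("pandas", "methylprep"):
    print(name, installed_version(name))
```

If a package was already imported before the install, restarting the kernel is usually needed before the new version is picked up.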