
Installing cmdstanpy fast on Google Colab


After finding out that I would need to reinstall certain Python packages in Google Colab every time the runtime resets, I quickly lost interest in using Colab to run Stan code. In particular, the last step in installing cmdstanpy:

!pip install cmdstanpy
import cmdstanpy
cmdstanpy.install_cmdstan()

takes about 10 or so minutes!

However, I noticed that this page provides a clever solution that would only require me to run the slow installation once. It saves all of the Stan and C++ files, compresses them into a .tar.gz, and then reads that archive back in at the beginning of each session.

Unfortunately, some of the source code appears to be outdated. The code in that notebook throws an error after complaining that a binary executable isn't getting the right command-line parameters.

My question: how can I regenerate this .tar.gz every so often? How can I save all the files created by calling cmdstanpy.install_cmdstan() to be read back in later?

This is my current attempt, but I don't think it's grabbing everything. When I re-upload the archive and decompress it, it complains that it can't find everything.

!pip install cmdstanpy
import cmdstanpy
cmdstanpy.install_cmdstan() # this takes a while

# write everything out to disk
import os
import shutil
cmdstan_dir = cmdstanpy.__path__
tar_filename = '/content/cmdstan_files.tar.gz' 
shutil.make_archive(tar_filename.replace('.tar.gz', ''), 'gztar', cmdstan_dir)

Solution

  • The approach from that notebook works. I just had to update the version to the newest release: https://github.com/stan-dev/cmdstan/releases/

    # Load packages used in this notebook
    import os
    import json
    import shutil
    import urllib.request
    import pandas as pd
    
    # Install package CmdStanPy
    !pip install --upgrade cmdstanpy
    
    # Install pre-built CmdStan binary
    # (faster than compiling from source via install_cmdstan() function)
    tgz_file = 'colab-cmdstan-2.36.0.tar.gz'
    tgz_url = 'https://github.com/stan-dev/cmdstan/releases/download/v2.36.0/colab-cmdstan-2.36.0.tgz'
    if not os.path.exists(tgz_file):
        urllib.request.urlretrieve(tgz_url, tgz_file)
        shutil.unpack_archive(tgz_file)
    
    # Specify CmdStan location via environment variable
    os.environ['CMDSTAN'] = './cmdstan-2.36.0'
    # Check CmdStan path
    from cmdstanpy import CmdStanModel, cmdstan_path
    cmdstan_path()
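
For the part of the question about regenerating the archive yourself, here is a minimal sketch of the save-and-restore round trip using only the standard library. The helper names `archive_dir` and `restore_dir` are made up for illustration and are not part of cmdstanpy. Note that `cmdstanpy.__path__` points at the Python package, not at the CmdStan installation (which `install_cmdstan()` places under `~/.cmdstan` by default), which is probably why the attempt in the question misses files.

```python
import os
import shutil
import tarfile


def archive_dir(src_dir, archive_base):
    """Pack src_dir, keeping its top-level folder name, into archive_base + '.tar.gz'.

    Hypothetical helper: returns the path of the archive that was written.
    """
    src_dir = os.path.abspath(src_dir)
    parent, name = os.path.split(src_dir)
    # root_dir/base_dir make the archive contain 'name/...' rather than loose files
    return shutil.make_archive(archive_base, 'gztar', root_dir=parent, base_dir=name)


def restore_dir(archive_path, dest_parent):
    """Unpack the archive under dest_parent and return the restored folder's path.

    Hypothetical helper: assumes the archive has a single top-level folder,
    as produced by archive_dir above.
    """
    with tarfile.open(archive_path) as tar:
        top_level = tar.getnames()[0].split('/')[0]
    shutil.unpack_archive(archive_path, dest_parent)
    return os.path.join(dest_parent, top_level)
```

In a Colab session this could be used, after the slow install, as `archive_dir(cmdstanpy.cmdstan_path(), '/content/cmdstan_files')`, and in a later session as `cmdstanpy.set_cmdstan_path(restore_dir('/content/cmdstan_files.tar.gz', '/content'))`. I have not verified that the compiled binaries keep working when restored to a different location, so treat this as a starting point rather than a guaranteed fix.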