python-3.xstatisticsjupyter-notebookmultivariate-testing

How to apply Henze-Zirkler's Multivariate Normality Test in Jupyter notebook with rpy2


I am interested in Applying Henze-Zirkler's Multivariate Normality Test in python 3x and I was wondering if I may do so in python in Jupyter notebook.

I have fitted a VAR model with my data and the then I would like to test whether the residuals from this fitted VAR model are normally distributed.

How may I do so in Jupyter notebook using python?


Solution

  • This is another answer since I discover this method later. If you do not want to import the library of R into Python. One may call the output of R to python. i.e. one is capable of activating R function through python as follow:

    import rpy2.robjects as robjects
    from rpy2.robjects import r
    from rpy2.robjects.numpy2ri import numpy2ri
    from rpy2.robjects.packages import importr
    import numpy as np
    

    suppose that resi is a Dataframe in python say

    # Create data
    resi = pd.DataFrame(np.random.random((108, 2)), columns=['Number1','Number2'])
    

    Then the code is as follow

    #Converting the dataframe from python to R
    
    # firt take the values of the dataframe to numpy
    resi1=np.array(resi, dtype=float)
    
    # Taking the variable from Python to R
    r_resi = numpy2ri(resi1)
    
    # Creating this variable in R (from python)
    r.assign("resi", r_resi)
    
    # Calling libraries in R 
    r('library("MVN")')
    
    # Calling a function in R (from python)
    r("res <- hzTest(resi, qqplot = F)")
    
    # Retrieving information from R to Python
    r_result = r("res")
    
    # Printing the output in python
    print(r_result)
    

    This will generate the output:

     Henze-Zirkler's Multivariate Normality Test 
    
    --------------------------------------------- 
    
      data : resi 
    
    
    
      HZ      : 2.841424 
    
      p-value : 1.032563e-06 
    
    
    
      Result  : Data are not multivariate normal. 
    
    ---------------------------------------------
    

    Update per 2021-08-25 There has been some API changes both to the MVN package and ro rpy2. The following works with MVN version 5.9 and rpy2 version 3.4.

    """Interface file to access the R MVN package"""
    
    import numpy as np
    import rpy2.robjects.packages as rpackages
    from rpy2.robjects import numpy2ri
    from rpy2.robjects.packages import importr
    from rpy2.robjects.vectors import StrVector
    
    
    # Install packages, if they are not already installed
    packages_to_install_if_needed = ("MVN",)
    utils = rpackages.importr("utils")
    utils.chooseCRANmirror(ind=1)  # select the first mirror in the list
    names_to_install = [x for x in packages_to_install_if_needed if not rpackages.isinstalled(x)]
    if len(names_to_install) > 0:
        utils.install_packages(StrVector(names_to_install))
    
    # load the package
    mvn = importr("MVN")
    
    # Generate data
    np_arr = np.random.multivariate_normal(np.ones(2), np.eye(2), size=100)
    
    # activate automatic conversion from numpy to rpy2 interface objects
    numpy2ri.activate()
    
    # perform the work
    res = mvn.mvn(np_arr)
    print(res)
    

    outputting

    $multivariateNormality
               Test        HZ   p value MVN
    1 Henze-Zirkler 0.3885607 0.8343017 YES
    
    $univariateNormality
                  Test  Variable Statistic   p value Normality
    1 Anderson-Darling  Column1     0.2443    0.7569    YES
    2 Anderson-Darling  Column2     0.3935    0.3692    YES
    
    $Descriptives
        n      Mean   Std.Dev    Median       Min      Max      25th     75th
    1 100 0.9619135 1.0353688 1.0222279 -1.994833 3.679615 0.2696537 1.758255
    2 100 0.7664778 0.9134449 0.8121996 -1.568635 2.648268 0.2068718 1.418113
            Skew    Kurtosis
    1 -0.2123274 -0.16171832
    2 -0.3718904 -0.05279222