matplotlib linear-regression power-law

Linear regression to fit a power-law


I have two data sets, index_list and freq_list, which I plot on a log-log plot with plt.loglog(index_list, freq_list). Now I'm trying to fit a power law a*x^(-b) with linear regression. I expected the fitted curve to follow the original curve closely, but the following code outputs a similar curve that appears mirrored on the y-axis. I suspect I am using curve_fit incorrectly.

Why is the fitted curve mirrored, and how can I get it to properly fit my initial curve?

Using this data

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

f = open ("input.txt", "r")
index_list = []
freq_list = []
index = 0
for line in f:
    split_line = line.split()
    freq_list.append(int(split_line[1]))
    index_list.append(index)
    index += 1

plt.loglog(index_list, freq_list)
def power_law(x, a, b):
    return a * np.power(x, -b)

popt, pcov = curve_fit(power_law, index_list, freq_list)
plt.plot(index_list,  power_law(freq_list, *popt))
plt.show()

Solution

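  • The code below makes the following changes:

      • index_list and freq_list are converted to NumPy float arrays.
      • index_list starts at 1 instead of 0, because 0^(-b) is not finite for b > 0 and 0 cannot be shown on a logarithmic axis.
      • curve_fit is given an initial guess p0 and bounds on a and b.
      • The fitted curve is evaluated at the x values, power_law(index_list, *popt). Evaluating it at freq_list, as the question's code does, plots the model at the wrong inputs.

    Note that calling plt.loglog() sets both axes of the plot to a logarithmic scale. All subsequent plots on the same axes continue to use that scale, so the plain plt.plot() call for the fitted curve is also drawn log-log.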

    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    import pandas as pd
    import numpy as np
    
    def power_law(x, a, b):
        return a * np.power(x, -b)
    
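    # Whitespace-separated file with no header row: column 0 is the word, column 1 its count.
    # sep=r"\s+" replaces delim_whitespace=True, which is deprecated since pandas 2.2.
    df = pd.read_csv("https://norvig.com/google-books-common-words.txt", sep=r"\s+", header=None)
    
    # Use 1-based ranks as x values so that x**(-b) stays finite.
    index_list = df.index.to_numpy(dtype=float) + 1
    freq_list = df[1].to_numpy(dtype=float)
    
    plt.loglog(index_list, freq_list, label='given data')
    
    # Fit a and b, starting from p0 and keeping both parameters positive.
    popt, pcov = curve_fit(power_law, index_list, freq_list, p0=[1, 1], bounds=[[1e-3, 1e-3], [1e20, 50]])
    
    # Evaluate the fitted model at the x values (index_list), not at freq_list.
    plt.plot(index_list, power_law(index_list, *popt), label='power law')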
    plt.legend()
    plt.show()
    

    [Figure: example plot]
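
Since the question asks about linear regression specifically, here is a minimal sketch of that alternative: taking logarithms turns y = a*x^(-b) into log y = log a - b*log x, which is a straight line that np.polyfit can fit. The helper name fit_power_law_loglog and the synthetic demo data are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

def fit_power_law_loglog(x, y):
    """Fit y = a * x**(-b) by ordinary least squares on (log x, log y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Logarithms need strictly positive values, so drop everything else first.
    mask = (x > 0) & (y > 0)
    slope, intercept = np.polyfit(np.log(x[mask]), np.log(y[mask]), 1)
    # log y = log a - b * log x  =>  a = exp(intercept), b = -slope
    return np.exp(intercept), -slope

# Synthetic, noise-free data (made up for illustration): a = 5, b = 1.5
x_demo = np.arange(1, 101, dtype=float)
y_demo = 5.0 * np.power(x_demo, -1.5)
a_hat, b_hat = fit_power_law_loglog(x_demo, y_demo)
print(a_hat, b_hat)
```

With real data, the fit in log space will generally not give the same parameters as curve_fit on the raw values: the log-space regression treats relative errors equally across all ranks, while the raw-value fit is dominated by the largest frequencies. One possible use is to pass the log-space estimates to curve_fit as p0.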