
How do I apply scikit-learn's LogisticRegression for some decimal data?


I have a training data set like this:

0.00479616 |  0.0119904 |  0.00483092 |  0.0120773 | 1
0.51213136 |  0.0113404 |  0.02383092 |  -0.012073 | 0
0.10479096 |  -0.011704 |  -0.0453692 |  0.0350773 | 0

The first four columns are the features of one sample, and the last column is its output.

I use scikit-learn this way:

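  import numpy as np
  from sklearn import linear_model

  data = np.array(data)
  lr = linear_model.LogisticRegression(C=10)

  X = data[:, :-1]  # first four columns: features
  Y = data[:, -1]   # last column: class label
  lr.fit(X, Y)

  print(lr)
  # The output is always 1 or 0, not a probability number.
  # X[:1] keeps the sample 2-D, which current scikit-learn requires.
  print(lr.predict(X[:1]))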

I thought logistic regression should always give a probability between 0 and 1.


Solution

  • Use the predict_proba method to get probabilities. predict gives class labels.

    >>> import numpy as np
    >>> from sklearn.linear_model import LogisticRegression
    >>> lr = LogisticRegression()
    >>> X = np.random.randn(3, 4)
    >>> y = [1, 0, 0]
    >>> lr.fit(X, y)
    LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
              intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)
    >>> lr.predict_proba(X[:1])  # 2-D input: one row, four features
    array([[ 0.49197272,  0.50802728]])
    

    (Both methods are described in the LogisticRegression documentation.)
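
Applied to the three rows from the question, the difference between the two methods might look like the sketch below. The names `col`, `p_class1` and `p_first` are only illustrative, and three samples are far too few for the probabilities to mean much. The columns of `predict_proba` follow the order of `lr.classes_`, so the sketch looks up the column for class 1 instead of assuming its position.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The three training rows from the question: four features, then the label.
data = np.array([
    [0.00479616,  0.0119904,  0.00483092,  0.0120773, 1],
    [0.51213136,  0.0113404,  0.02383092, -0.012073,  0],
    [0.10479096, -0.011704,  -0.0453692,   0.0350773, 0],
])
X, y = data[:, :-1], data[:, -1].astype(int)

lr = LogisticRegression(C=10).fit(X, y)

labels = lr.predict(X)       # hard class labels, shape (3,)
proba = lr.predict_proba(X)  # one row per sample, one column per class

# Columns are ordered like lr.classes_; find the column for class 1.
col = list(lr.classes_).index(1)
p_class1 = proba[:, col]

# For a single sample, keep the input 2-D: X[:1] has shape (1, 4).
p_first = lr.predict_proba(X[:1])[0, col]

print(labels)
print(p_class1)
```

Each row of `proba` sums to 1, and `predict` returns the class whose column has the larger value.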