Getting p-values from a GLM model in Python

Currently there is no direct method to get p-values from a fitted GLM model in Python, although this works in R.

>>> import h2o
>>> import numpy as np
>>> h2o.init()
>>> df1 = h2o.H2OFrame.from_python(np.random.randn(100,4).tolist(), column_names=list('ABCD'))

Now try the following:

>>> from h2o.estimators.glm import H2OGeneralizedLinearEstimator
>>> glmfitter3 = H2OGeneralizedLinearEstimator(family="gaussian", solver="IRLSM", alpha=0, lambda_=0,
...                                            compute_p_values=True)
>>> glmfitter3.train(x=['A','B'], y="C", training_frame=df1)
glm Model Build progress: |██ | 100%

Now let's get the model details:

>>> print(glmfitter3)

Model Details
=============
H2OGeneralizedLinearEstimator : Generalized Linear Modeling
Model Key: GLM_model_python_1473895693010_1
GLM Model: summary
    family    link      regularization    number_of_predictors_total    number_of_active_predictors    number_of_iterations    training_frame
--  --------  --------  ----------------  ----------------------------  -----------------------------  ----------------------  ------------------------------------------------------
    gaussian  identity  None              2                             2                              0                       Key_Frame__upload_bc7ed024599c9c807ffee0ab12f8457a.hex
ModelMetricsRegressionGLM: glm
** Reported on train data. **
MSE: 0.965343913566
RMSE: 0.982519167022
MAE: 0.76322016906
RMSLE: NaN
R^2: 0.00763659012861
Mean Residual Deviance: 0.965343913566
Null degrees of freedom: 99
Residual degrees of freedom: 97
Null deviance: 97.2772579041
Residual deviance: 96.5343913566
AIC: 288.260621239
Scoring History:
    timestamp            duration    iteration    negative_log_likelihood    objective
--  -------------------  ----------  -----------  -------------------------  -----------
    2016-09-14 16:32:24  0.000 sec   0            97.2773                    0.972773

 

Now we can print the coefficients as below:

>>> print(glmfitter3.coef())

{u'A': -0.08288242123163249, u'B': 0.027858667912495073, u'Intercept': 0.012225954789000987}

Parsing the model JSON does yield the p-values, although it would be great to have a function that returns them directly.

There is no direct method to get p-values from a GLM; however, you can read them from the model JSON as shown below.

Once you have your fitted GLM model, i.e. glmfitter3, you can get the model JSON as below:

>>> glmfitter3._model_json

Within the model JSON you can look for ‘output’ values as below:

>>> glmfitter3._model_json['output']

Now if you look for ‘coefficients_table’ you will get your p-values as below:

>>> glmfitter3._model_json['output']['coefficients_table']

Coefficients: glm coefficients
names      coefficients    std_error    z_value    p_value    standardized_coefficients
---------  --------------  -----------  ---------  ---------  ---------------------------
Intercept  0.012226        0.0820524    0.149002   0.881862   0.0148644
A          -0.0828824      0.0970972    -0.853603  0.395428   -0.0868894
B          0.0278587       0.0999033    0.278856   0.780949   0.0283852
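
If you want the p-values as a plain Python dictionary rather than a printed table, a small helper like the one below can do the lookup. This is only a sketch: p_values_by_name is a made-up name, not part of the H2O API, and the sample rows are copied from the table printed above so that the snippet runs without an H2O cluster. It assumes the table gives you a list of column names and a list of row values, which the H2O table object is expected to expose as col_header and cell_values; please check this against your H2O version.

```python
def p_values_by_name(col_header, cell_values):
    """Map each coefficient name to its p-value.

    col_header  : list of column names, must contain "names" and "p_value"
    cell_values : list of rows, each row ordered like col_header
    (hypothetical helper, not an H2O function)
    """
    name_idx = col_header.index("names")
    p_idx = col_header.index("p_value")
    return {row[name_idx]: row[p_idx] for row in cell_values}


# Sample data copied from the coefficients table printed above.
header = ["names", "coefficients", "std_error", "z_value",
          "p_value", "standardized_coefficients"]
rows = [
    ["Intercept", 0.012226, 0.0820524, 0.149002, 0.881862, 0.0148644],
    ["A", -0.0828824, 0.0970972, -0.853603, 0.395428, -0.0868894],
    ["B", 0.0278587, 0.0999033, 0.278856, 0.780949, 0.0283852],
]

pvals = p_values_by_name(header, rows)
print(pvals["A"])
```

With a live model you would pass in the two pieces of the table itself, for example table = glmfitter3._model_json['output']['coefficients_table'] followed by p_values_by_name(table.col_header, table.cell_values), again assuming those attributes are available in your version.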
