Sometime you may want to emulate hierarchical modeling to achieve your objective you can use beta_constraints as below:
iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
bc = h2o.H2OFrame([("Intercept",-1000,1000,3,30)], column_names=["names","lower_bounds","upper_bounds","beta_given","rho"])
glm = H2OGeneralizedLinearEstimator(family = "gaussian",
The output will look like as below:
There’s more information in the GLM booklet linked below, but the short version is to create a new constraints frame with the columns: names, lower_bounds, upper_bounds, beta_given, & rho, and have a row entry per feature you want to constrain. You can use “Intercept” as a keyword to constraint the intercept.
names: (mandatory) coefficient names
lower bounds: (optional) coefficient lower bounds , must be less thanor equal to upper bounds
upper bounds: (optional) coefficient upper bounds , must be greaterthan or equal to lower bounds
beta given: (optional) specifies the given solution in proximal operatorinterface
rho (mandatory if beta given is specified, otherwise ignored): specifiesper-column L2 penalties on the distance from the given solution
If you want to go deeper to learn how these L1/L2 parameters works, here are more details:
What’s happening is an L2 penalty is being applied between the coeffecient & given. The proximal penalty is computed: Σ(x-x’)*rho. You can confirm this by setting rho to be whatever lambda may be, and set let lambda to 0. This will give the same result as having set lambda to that value.
You can use beta constraints to assign per-feature regularization strength
but only for l2 penalty. The math is explained here:
sum_i rho[i] * L2norm2(beta[i]-betagiven[i])
So if you set beta given to zero, and say all rho except for the intercept to 1e-5
then it is equivalent to running without BC, just with alpha = 0, lambda = 1e-5
Thats it, enjoy!!