
How to use multiple multivariate regression with Lasso in scikit-learn

Folks,

When I use Lasso() instead of LinearRegression() in the benchmark code, I get the following error:

predictions = lasso.predict(test_tfidf)

ValueError: operands could not be broadcast together with shapes (3,6772) (3,)

Is there a workaround that lets me generate predictions for the test data?

Looks like maybe Lasso isn’t working with the sparse matrix produced by tf-idf? I bet that todense() will fix the error that you’re seeing:

predictions = lasso.predict(test_tfidf.todense())
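
A small aside on the conversion itself: todense() returns a numpy.matrix, while toarray() returns a plain ndarray. As far as I know, recent scikit-learn releases reject numpy.matrix input, so toarray() may be the safer call there. A minimal sketch, with a made-up sparse matrix standing in for test_tfidf:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for test_tfidf: a tiny sparse matrix.
test_tfidf = sparse.csr_matrix(np.array([[0.0, 0.5, 0.0],
                                         [0.3, 0.0, 0.7]]))

as_matrix = test_tfidf.todense()  # numpy.matrix
as_array = test_tfidf.toarray()   # plain numpy.ndarray

print(type(as_matrix).__name__, type(as_array).__name__)
```

Both hold the same values, so lasso.predict(test_tfidf.toarray()) should behave like the todense() call wherever the latter is accepted.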

Peter

I tried using

lasso.predict(test_tfidf.todense())

but the output looks wrong: the * column is 4 and the ** and *** columns are zero throughout the file:

id,date,restaurant_id,*,**,***
14916,2014-12-17,nkOvZOBW,4,0,0
29767,2013-12-30,WwOaAnOB,4,0,0
29338,2011-04-08,we39kvOk,4,0,0
7219,2013-12-30,dj3d5Xo9,4,0,0
20979,2008-03-31,XJ3rBW3R,4,0,0
5599,2014-08-07,lnORVGON,4,0,0
32994,2013-10-31,XJ3r0YOR,4,0,0
23804,2013-07-02,dj3dP739,4,0,0
1416,2012-01-24,JGoNpdEL,4,0,0
27518,2007-07-03,6Wo2YN39,4,0,0
19961,2007-11-28,njoZ4YEr,4,0,0
19494,2012-04-17,0ZEDZ4oD,4,0,0
14177,2007-10-12,6Wo2AN39,4,0,0

Is this because scikit-learn doesn’t support multiple multivariate regression for linear models other than LinearRegression()?

If you look at lasso.coef_ you’ll see they are all zero. If you look at lasso.intercept_ you’ll see that the intercept for the first model is ~4. The other two intercepts are less than 1 so they were rounded down to zero by predictions.astype(int).

This means that when you fit the model, all of the coefficients were set to zero. You’ll have to play with the parameter alpha to find a value that keeps enough of the coefficients. alpha=0 will keep them all. The default is alpha=1.0, which in this case led to all zeros.
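
To make that diagnosis concrete, here is a minimal sketch on synthetic data. The data, its shapes and its target means are assumptions chosen to mimic the symptom, not the competition data:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption): 50 weak features and three targets
# whose means are about 4.35, 0.65 and 0.45.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
W = rng.random((50, 3)) * 0.02
y = np.array([4.1, 0.4, 0.2]) + X @ W + rng.normal(scale=0.05, size=(200, 3))

lasso = Lasso(alpha=1.0).fit(X, y)  # default penalty strength

print(np.count_nonzero(lasso.coef_))  # number of surviving coefficients
print(lasso.intercept_)               # one intercept per target

# With every coefficient at zero, each prediction is just the intercepts,
# and astype(int) truncates them toward zero.
predictions = lasso.predict(X)
print(predictions[:3].astype(int))
```

On this data the penalty removes every coefficient, so each row of predictions is the per-target mean, and the integer cast turns the two small targets into zeros.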

Try something between 1 and 0, for example:

lasso = linear_model.Lasso(alpha=1e-4)

This should generate some predictions; however, you probably want to use LassoCV to find the best value of alpha.
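
A rough sketch of both suggestions, again on made-up synthetic data. As far as I know, LassoCV expects a single target column, so this sketch fits one LassoCV per target; MultiTaskLassoCV is the multi-output variant, but it applies a joint penalty across targets, which is a different model:

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

# Synthetic data (an assumption, not the competition data).
rng = np.random.default_rng(0)
X = rng.random((200, 50))
W = rng.random((50, 3)) * 0.02
y = np.array([4.1, 0.4, 0.2]) + X @ W + rng.normal(scale=0.05, size=(200, 3))

# A much weaker penalty lets some coefficients survive.
lasso = Lasso(alpha=1e-4).fit(X, y)
print(np.count_nonzero(lasso.coef_), "of", lasso.coef_.size, "coefficients kept")

# One cross-validated alpha per target column.
best_alphas = [LassoCV(cv=5).fit(X, y[:, k]).alpha_ for k in range(y.shape[1])]
print(best_alphas)
```

Each target can end up with a different alpha this way, which a single Lasso fit on all three columns does not allow.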

Here are the sklearn docs for reference:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html#sklearn.linear_model.LassoCV


Thanks @bull. It worked! I now get non-zero values for ** and *** with alpha=1e-4. However, Lasso() doesn’t seem to give results as good as LinearRegression(). Maybe I should tune the parameters further to get better results.
