My experiment building and using LightGBM (Microsoft) from scratch on OSX

LightGBM is a fast, distributed, high-performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms. It is used for ranking, classification and many other machine learning tasks, and it is part of Microsoft's DMTK project (http://github.com/microsoft/dmtk).

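Prerequisites:

  • cmake
  • gcc (the build below uses GCC 6, invoked as gcc-6 and g++-6; the stock gcc on this machine is Apple's clang, as the Test Environment output shows)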

Test Environment:

$ cmake -version
cmake version 3.6.2
$ gcc --version 
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
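
Before building, it can help to confirm that the tools used in this post are actually on the PATH. The following is only a minimal sketch and not part of the original setup: the tool names cmake, gcc-6 and g++-6 are taken from the commands used later in this post, and the helper name find_tools is made up for illustration.

```python
import shutil


def find_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Tool names as used in the cmake command later in this post.
for tool, path in find_tools(["cmake", "gcc-6", "g++-6"]).items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```

If gcc-6 or g++-6 is reported as missing, the cmake command in the Configuration step will not find the compilers it is told to use.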

Get the source:

$ git clone --recursive https://github.com/Microsoft/LightGBM.git

Preparation:

$ cd LightGBM
$ mkdir build
$ cd build

Configuration:

$ cmake -DCMAKE_CXX_COMPILER=g++-6 -DCMAKE_C_COMPILER=gcc-6 .. 
-- The C compiler identification is GNU 6.2.0
-- The CXX compiler identification is GNU 6.2.0
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc-6
-- Check for working C compiler: /usr/local/bin/gcc-6 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-6
-- Check for working CXX compiler: /usr/local/bin/g++-6 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp
-- Configuring done
CMake Warning (dev):
 Policy CMP0042 is not set: MACOSX_RPATH is enabled by default. Run "cmake
 --help-policy CMP0042" for policy details. Use the cmake_policy command to
 set the policy and suppress this warning.

MACOSX_RPATH is not specified for the following targets:

_lightgbm

This warning is for project developers. Use -Wno-dev to suppress it.

-- Generating done
-- Build files have been written to: /Users/avkashchauhan/src/github.com/microsoft/LightGBM/build

Build now:

$ make -j 
Scanning dependencies of target lightgbm
Scanning dependencies of target _lightgbm
[ 6%] Building CXX object CMakeFiles/lightgbm.dir/src/application/application.cpp.o
....
.....
....


[ 97%] Linking CXX shared library ../lib_lightgbm.so
[100%] Linking CXX executable ../lightgbm
[100%] Built target _lightgbm
[100%] Built target lightgbm

Install the command-line binary, shared library and headers:

$ make install 
[ 50%] Built target _lightgbm
[100%] Built target lightgbm
Install the project...
-- Install configuration: ""
-- Installing: /usr/local/bin/lightgbm
-- Installing: /usr/local/lib/lib_lightgbm.so
-- Installing: /usr/local/include/LightGBM
-- Installing: /usr/local/include/LightGBM/application.h
-- Installing: /usr/local/include/LightGBM/bin.h
-- Installing: /usr/local/include/LightGBM/boosting.h
-- Installing: /usr/local/include/LightGBM/c_api.h
-- Installing: /usr/local/include/LightGBM/config.h
-- Installing: /usr/local/include/LightGBM/dataset.h
-- Installing: /usr/local/include/LightGBM/dataset_loader.h
-- Installing: /usr/local/include/LightGBM/export.h
-- Installing: /usr/local/include/LightGBM/feature.h
-- Installing: /usr/local/include/LightGBM/meta.h
-- Installing: /usr/local/include/LightGBM/metric.h
-- Installing: /usr/local/include/LightGBM/network.h
-- Installing: /usr/local/include/LightGBM/objective_function.h
-- Installing: /usr/local/include/LightGBM/tree.h
-- Installing: /usr/local/include/LightGBM/tree_learner.h
-- Installing: /usr/local/include/LightGBM/utils
-- Installing: /usr/local/include/LightGBM/utils/array_args.h
-- Installing: /usr/local/include/LightGBM/utils/common.h
-- Installing: /usr/local/include/LightGBM/utils/log.h
-- Installing: /usr/local/include/LightGBM/utils/openmp_wrapper.h
-- Installing: /usr/local/include/LightGBM/utils/pipeline_reader.h
-- Installing: /usr/local/include/LightGBM/utils/random.h
-- Installing: /usr/local/include/LightGBM/utils/text_reader.h
-- Installing: /usr/local/include/LightGBM/utils/threading.h

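Test the Python package (this assumes the Python bindings from the repository's python-package directory are installed for the Python interpreter you run):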

$ python -c 'import lightgbm as lg;print(lg.__version__)'
0.1
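
If the one-liner above fails with an ImportError, a slightly more forgiving check can tell you whether the package is visible to the interpreter at all. This is a sketch under the assumption that the package is importable as lightgbm, as in the command above; the helper name module_available is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if module_available("lightgbm"):
    import lightgbm as lgb
    print("lightgbm version:", lgb.__version__)
else:
    print("lightgbm is not importable from this Python interpreter")
```

A "not importable" result usually means the package was installed for a different Python than the one being run.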

Sample code (Jupyter notebook):

# In[1]:
import json
import lightgbm as lgb
import pandas as pd
from sklearn.metrics import mean_squared_error

# In[2]:
# load or create your dataset
print('Load data...')
df_train = pd.read_csv('~/src/github.com/microsoft/LightGBM/examples/regression/regression.train', header=None, sep='\t')
df_test = pd.read_csv('~/src/github.com/microsoft/LightGBM/examples/regression/regression.test', header=None, sep='\t')

# In[4]:
df_train.shape

# In[5]:
df_test.shape

# In[6]:
y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)
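
The cell above relies on the file layout: tab-separated values with the label in column 0 and the features in the remaining columns. As a rough illustration of that split without pandas, here is a standard-library sketch; the function name and the two sample rows are made up and are not taken from the LightGBM example files.

```python
import csv
import io


def split_label_and_features(tsv_text):
    """Split tab-separated rows into labels (column 0) and feature rows (the rest)."""
    labels, features = [], []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:  # skip empty lines
            continue
        values = [float(v) for v in row]
        labels.append(values[0])
        features.append(values[1:])
    return labels, features


# Hypothetical two-row sample in the same layout: label, then features.
sample = "1\t0.5\t2.0\n0\t1.5\t3.0\n"
y, X = split_label_and_features(sample)
print(y)  # [1.0, 0.0]
print(X)  # [[0.5, 2.0], [1.5, 3.0]]
```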

# In[8]:
y_train.shape

# In[10]:
X_train.shape

# In[11]:
X_test.shape

# In[12]:
# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# In[13]:
# specify your configurations as a dict
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': {'l2', 'auc'},
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

# In[14]:
print('Start training...')
# train
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=20,
                valid_sets=lgb_eval,
                early_stopping_rounds=5)
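
The early_stopping_rounds=5 argument stops training once the validation metric has not improved for 5 rounds in a row. The sketch below illustrates that idea in plain Python; it is not LightGBM's implementation, the function name is hypothetical, and the per-round scores are invented for the example.

```python
def best_round_with_early_stopping(scores, patience):
    """Return (best_round, rounds_run) for per-round validation scores.

    Lower scores are better. Training stops once `patience` consecutive
    rounds pass without a new best score. Rounds are 1-indexed.
    """
    best_score = float("inf")
    best_round = 0
    for round_no, score in enumerate(scores, start=1):
        if score < best_score:
            best_score, best_round = score, round_no
        elif round_no - best_round >= patience:
            return best_round, round_no
    return best_round, len(scores)


# Invented validation l2 values, one per boosting round.
l2_per_round = [0.30, 0.25, 0.22, 0.23, 0.24, 0.22, 0.25, 0.26, 0.20]
print(best_round_with_early_stopping(l2_per_round, patience=5))  # (3, 8)
```

In this made-up run the best score is reached at round 3 and training stops at round 8, so the better value at round 9 is never seen. That is the trade-off of a small patience value.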

# In[15]:
print('Start predicting...')
# predict
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
# eval
print('The rmse of prediction is:', mean_squared_error(y_test, y_pred) ** 0.5)
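
For reference, mean_squared_error(y_test, y_pred) ** 0.5 is the root mean squared error: the square root of the average squared difference between labels and predictions. A standard-library sketch of the same calculation, with a hypothetical helper name and made-up numbers:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Only the last prediction is off (by 4), so the MSE is 16 / 4 = 4.
print(rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0]))  # 2.0
```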


# In[16]:
print('Dump model to JSON as: lightgbm_model.json')
# dump model to json (and save to file)
model_json = gbm.dump_model()

with open('lightgbm_model.json', 'w') as f:
    json.dump(model_json, f, indent=4)

print('lightgbm_model.json is saved in the current working directory, usually where the Jupyter notebook was started')

# In[17]:
print('Feature Importance Results:')
print('Feature names:', gbm.feature_name())
print('Calculate feature importances...')
# feature importances
print('Feature importances:', list(gbm.feature_importance()))
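
The two lists printed above are easier to read when paired up and sorted. A small sketch of that step follows; the helper name rank_features and the sample names and scores are hypothetical, not output from the model trained in this post.

```python
def rank_features(names, importances, top_n=None):
    """Pair feature names with importances, sorted from most to least important."""
    pairs = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return pairs if top_n is None else pairs[:top_n]


# Made-up names and importance scores for illustration.
names = ["Column_0", "Column_1", "Column_2", "Column_3"]
importances = [12, 0, 30, 7]
for name, score in rank_features(names, importances, top_n=3):
    print(name, score)
```

In the notebook, the same helper could be called as rank_features(gbm.feature_name(), gbm.feature_importance()).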

# In[18]:
print('Save model...')
# save model to file
gbm.save_model('lightgbm_model.txt')
print('lightgbm_model.txt is saved in the current working directory, usually where the Jupyter notebook was started')

 

 
