Developers Guide¶
Thank you for taking the time to read this page and contribute. Any and all help is appreciated!
Setting up the Project From Source¶
- General Setup
Clone the project down to your computer
$ git clone https://github.com/georgianpartners/foreshadow.git
$ cd foreshadow
$ git checkout development
Install and set up pyenv and pyenv-virtualenv
Follow the instructions on their pages, or use Homebrew if you have a Mac.
$ brew install pyenv
$ brew install pyenv-virtualenv
Make sure to add the following lines to your .bash_profile:
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
if command -v pyenv 1>/dev/null 2>&1; then
  eval "$(pyenv init -)"
fi
eval "$(pyenv virtualenv-init -)"
Restart your shell session for the changes to take effect, then perform the following setup in the root directory of the project. This sets up a virtualenv that activates automatically whenever you are in the project root. (Note: there is a known error with pyenv on macOS, which the first command below works around by installing the SDK headers. You may need to change the file path depending on your macOS version, or you may not need that step at all.)
$ open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg
$ pyenv install 3.6.8
$ pyenv global 3.6.8
$ pyenv virtualenv -p python3.6 3.6.8 venv
$ pyenv local venv 3.6.8
Install the poetry package manager
(venv) $ pyenv shell system
$ curl -sSL https://raw.githubusercontent.com/sdispater/poetry/master/get-poetry.py | python
$ pyenv shell --unset
- Prepare for Autosklearn install
Autosklearn is set up as an optional dependency because it can sometimes be difficult to install due to its xgboost requirement. However, autosklearn is required for a development environment that passes all tests.
Install swig
Use your package manager to install swig
(venv) $ brew install swig # (or apt-get)
Install gcc (MacOS only)
Use your package manager to install gcc (necessary for xgboost)
(venv) $ brew install gcc@5 # (or apt-get)
- Install all the packages and commit hooks
When the project is installed through poetry, both the project requirements and the development requirements are installed. Install the commit hooks using the pre-commit utility.
(venv) $ poetry install -v
(venv) $ export CC=gcc-5; export CXX=g++-5;
(venv) $ poetry install -E dev
(venv) $ poetry run pre-commit install
Configure PlantUML
(venv) $ brew install plantuml # MacOS (requires brew cask install adoptopenjdk)
(venv) $ sudo apt install plantuml # Linux
- Making sure everything works
Run pytest to make sure you’re good to go
(venv) $ poetry run pytest
Run tox to run in supported python versions (optional)
(venv) $ poetry run tox -r # supply the -r flag if you changed the dependencies
Run make html in foreshadow/doc to build the documentation (optional)
(venv) $ poetry run make html
If all the tests pass you’re all set up!
Note
Our platform also includes integration tests that assess the overall performance of our framework using the default settings on a few standard ML datasets. These tests are not executed by default. To run them, set an environment variable called FORESHADOW_TESTS to ALL.
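As a rough illustration of how an environment variable can gate slow tests, the sketch below skips a test unless FORESHADOW_TESTS is set to ALL. The helper name integration_tests_enabled and the placeholder test are hypothetical. Foreshadow's own test suite may implement the check differently.

```python
import os

import pytest


def integration_tests_enabled(environ=None):
    """Return True only when FORESHADOW_TESTS is set to ALL (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return environ.get("FORESHADOW_TESTS") == "ALL"


# Hypothetical integration test: pytest reports it as skipped unless the
# variable was set to ALL when the test module was imported.
@pytest.mark.skipif(
    not integration_tests_enabled(),
    reason="set FORESHADOW_TESTS=ALL to run integration tests",
)
def test_integration_placeholder():
    assert True
```

With a guard like this, prefixing the usual command with the variable (FORESHADOW_TESTS=ALL poetry run pytest) would enable the gated tests for that single invocation.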
- Suggested development workflow
Create a branch off of development to contain your change
(venv) $ git checkout development
(venv) $ git checkout -b {your_feature}
Run pytest and pre-commit while developing
This helps ensure nothing has broken while adding a feature. Pre-commit will lint the code before each commit.
$ poetry run pytest
$ poetry run pre-commit run --all-files
Run tox to test your changes across versions
Make sure to add test cases for your change in the appropriate folder in foreshadow/tests, then run tox to test your project across Python 3.5 and 3.6.
$ poetry run tox
Submit a pull request
This can be tricky if you cloned the project instead of forking it, but the fix is simple. First, go to the project page and fork it there. Then do the following:
(venv) $ git remote add upstream https://github.com/georgianpartners/foreshadow.git
(venv) $ git remote set-url origin https://github.com/{YOUR_USERNAME}/foreshadow.git
(venv) $ git push origin {your_feature}
Now you can go to your fork of the project on GitHub and submit a pull request to the main project.
Note
Make sure to submit the pull request against the development branch.
Adding Transformers¶
Adding transformers is quite simple. Write a class with the fit, transform, and inverse_transform methods that extends sklearn.base.BaseEstimator and sklearn.base.TransformerMixin. Take a look at the structure below and modify it to suit your needs. We recommend taking a look at the sklearn.preprocessing.RobustScaler source code for a good example.
from foreshadow.base import TransformerMixin, BaseEstimator
from sklearn.utils import check_array


class CustomTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        X = check_array(X)
        return self

    def transform(self, X, y=None):
        X = check_array(X, copy=True)
        # modify input based on fit here
        return X

    def inverse_transform(self, X):
        X = check_array(X, copy=True)
        # if applicable, write inverse transform here
        return X
After writing your transformer, make sure to place it in the internals folder in its own file, with the associated tests in the mirrored test directory, and you are all set. If you want to add an external transformer that is not already supported by foreshadow, submit a pull request with the appropriate modification to the externals.py file in transformers.
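To make the structure concrete, here is a minimal sketch of a transformer together with the kind of round-trip test that could accompany it. The class name MeanCenterer and the test name are hypothetical. The sketch imports from sklearn.base directly so that it runs without foreshadow installed, which is an assumption made for illustration. Inside the project you would import from foreshadow.base as in the structure above.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.utils import check_array


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that subtracts each column's mean."""

    def fit(self, X, y=None):
        X = check_array(X)
        # Learn one mean per column; the trailing underscore follows the
        # scikit-learn convention for attributes set during fit.
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X, y=None):
        X = check_array(X, copy=True, dtype=np.float64)
        return X - self.mean_

    def inverse_transform(self, X):
        X = check_array(X, copy=True, dtype=np.float64)
        return X + self.mean_


def test_mean_centerer_round_trip():
    """Sketch of a test that could live in the mirrored test directory."""
    X = np.array([[1.0, 10.0], [3.0, 30.0]])
    transformer = MeanCenterer().fit(X)
    centered = transformer.transform(X)
    # Every column should have zero mean after centering.
    assert np.allclose(centered.mean(axis=0), 0.0)
    # inverse_transform should undo transform.
    assert np.allclose(transformer.inverse_transform(centered), X)
```

A round-trip check like this is a cheap way to confirm that inverse_transform really undoes transform for any transformer that claims to be invertible.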
Future Architecture Roadmap¶
In progress