API Reference

Foreshadow

Core end-to-end pipeline, foreshadow.

class Foreshadow(problem_type, random_state=None, n_jobs=1, estimator=None, allowed_seconds=300, auto_estimator_kwargs=None)

An end-to-end pipeline to preprocess and tune a machine learning model.

Example

>>> shadow = Foreshadow(problem_type=ProblemType.CLASSIFICATION)
Parameters:
  • X_preparer (Preprocessor, optional) – Preprocessor instance that will apply to X data. Passing False prevents the automatic generation of an instance.
  • y_preparer (Preprocessor, optional) – Preprocessor instance that will apply to y data. Passing False prevents the automatic generation of an instance.
  • estimator (sklearn.base.BaseEstimator, optional) – Estimator instance to fit on processed data
  • optimizer (sklearn.grid_search.BaseSearchCV, optional) – Optimizer class to optimize feature engineering and model hyperparameters
X_preparer

Preprocessor object for performing feature engineering on X data.

Getter: Returns the Preprocessor object
Setter: Verifies the Preprocessor object; if None, creates a default Preprocessor
Type: Preprocessor
Returns: the X_preparer object
y_preparer

Preprocessor object for performing scaling and encoding on Y data.

Getter: Returns the Preprocessor object
Setter: Verifies the Preprocessor object; if None, creates a default Preprocessor
Type: Preprocessor
Returns: the y_preparer object
estimator

Estimator object for fitting preprocessed data.

Getter: Returns the Estimator object
Setter: Verifies the Estimator object. If None, an AutoEstimator object is created in place.
Type: sklearn.base.BaseEstimator
Returns: the estimator object
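
The getter/setter behaviour described for X_preparer, y_preparer and estimator follows a common Python property pattern. The sketch below is a stand-in written for illustration, not Foreshadow's own code: PipelineSketch, DefaultEstimator and the "has a fit method" check are all hypothetical.

```python
class DefaultEstimator:
    """Hypothetical stand-in for an automatically created estimator."""

    def fit(self, X, y):
        return self


class PipelineSketch:
    """Hypothetical class showing a verifying setter with a default."""

    def __init__(self, estimator=None):
        self.estimator = estimator  # routed through the setter below

    @property
    def estimator(self):
        return self._estimator

    @estimator.setter
    def estimator(self, value):
        if value is None:
            # No estimator supplied: create a default one in place.
            value = DefaultEstimator()
        elif not hasattr(value, "fit"):
            # Assumed verification rule, chosen only for this sketch.
            raise ValueError("estimator must provide a fit method")
        self._estimator = value
```

Routing the constructor argument through the setter keeps the validation in one place, so assigning to the attribute later is checked the same way.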
fit(data_df, y_df)

Fit the Foreshadow instance using the provided input data.

Parameters:
  • data_df (DataFrame) – The input feature(s)
  • y_df (DataFrame) – The response feature(s)
Returns:

The fitted instance.

Return type:

Foreshadow

predict(data_df)

Use the trained estimator to predict the response variable.

Parameters: data_df (DataFrame) – The input feature(s)
Returns: The response feature(s) (transformed if necessary)
Return type: DataFrame
predict_proba(data_df)

Use the trained estimator to predict the response variable.

Uses the predicted confidences instead of binary predictions.

Parameters: data_df (DataFrame) – The input feature(s)
Returns: The probability associated with each response feature
Return type: DataFrame
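
To illustrate how predicted confidences relate to hard predictions, the sketch below picks the most probable class for each row. It assumes the usual classifier convention of one probability per class per sample; labels_from_probabilities and the class names are hypothetical and not part of Foreshadow.

```python
def labels_from_probabilities(probabilities, classes):
    """Pick the most probable class for each row of probabilities."""
    labels = []
    for row in probabilities:
        # Index of the largest probability in this row.
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best_index])
    return labels


# Two samples, two classes; each row sums to 1.
probabilities = [[0.9, 0.1], [0.3, 0.7]]
predicted = labels_from_probabilities(probabilities, classes=["no", "yes"])
```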
score(data_df, y_df=None, sample_weight=None)

Use the trained estimator to compute the evaluation score.

The scoring method is defined by the selected estimator.

Parameters:
  • data_df (DataFrame) – The input feature(s)
  • y_df (DataFrame, optional) – The response feature(s)
  • sample_weight (numpy.ndarray, optional) – The weights to be used when scoring each sample
Returns:

A computed prediction fitness score

Return type:

float
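As one possible illustration of how sample_weight can affect a score, the sketch below computes a weighted accuracy. This is an assumption made for the example: the actual metric depends on the selected estimator, and weighted_accuracy is a hypothetical helper, not a Foreshadow function.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Share of total weight carried by correctly predicted samples."""
    if sample_weight is None:
        # Unweighted case: every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    correct = sum(
        weight
        for truth, prediction, weight in zip(y_true, y_pred, sample_weight)
        if truth == prediction
    )
    return correct / sum(sample_weight)
```

With weights, a misclassified sample lowers the score in proportion to its weight rather than by a fixed 1/n.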

get_params(deep=True)

Get params for this object. See the superclass method.

Parameters: deep – True to recursively call get_params, False to not.
Returns: params for this object.
set_params(**params)

Set params for this object. See the superclass method.

Parameters: **params – params to set.
Returns: See the superclass method.
get_intent(column_name: str) → Optional[str]

Retrieve the intent of a column.

Parameters: column_name – the column name
Returns: the intent of the column
Return type: Optional[str]
list_intent(column_names: List[str]) → List[str]

Retrieve the intent of a list of columns.

Parameters: column_names – a list of columns
Returns: The list of intents
override_intent(column_name: str, intent: str) → NoReturn

Override the intent of a particular column.

Parameters:
  • column_name – the column to override
  • intent – the user supplied intent
Raises:

ValueError – Invalid column to override.
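
The three intent methods above can be pictured as operations on a column-to-intent mapping. The sketch below is a hypothetical stand-in, not Foreshadow's implementation: IntentRegistrySketch, the placeholder intent strings, and the choice to return None for an unknown column are all assumptions.

```python
from typing import Dict, List, Optional


class IntentRegistrySketch:
    """Hypothetical stand-in that maps column names to intent names."""

    def __init__(self, intents: Dict[str, str]):
        self._intents = dict(intents)

    def get_intent(self, column_name: str) -> Optional[str]:
        # Assumed behaviour: None when the column has no recorded intent.
        return self._intents.get(column_name)

    def list_intent(self, column_names: List[str]) -> List[Optional[str]]:
        return [self.get_intent(name) for name in column_names]

    def override_intent(self, column_name: str, intent: str) -> None:
        if column_name not in self._intents:
            raise ValueError(f"Invalid column to override: {column_name}")
        self._intents[column_name] = intent
```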

configure_multiprocessing(n_job: int = 1) → NoReturn

Configure the multiprocessing option.

Parameters: n_job – the number of processes to run the job.
set_processed_data_export_path(data_path: str, is_train: bool) → NoReturn

Set path to export data before feeding the data to the estimator.

Parameters:
  • data_path – the data path string
  • is_train – whether this is for training data
pickle_fitted_pipeline(path: str) → NoReturn

Pickle the foreshadow object with the best pipeline estimator.

Parameters: path – the pickle file path
Raises: ValueError – pipeline not fitted.
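
The "raise if not fitted, otherwise pickle" contract can be sketched with the standard library alone. FittedSketch and its best_pipeline attribute are hypothetical; a plain dict stands in for the fitted pipeline, so this shows only the shape of the behaviour, not Foreshadow's internals.

```python
import os
import pickle
import tempfile


class FittedSketch:
    """Hypothetical object that can only be pickled after fitting."""

    def __init__(self):
        self.best_pipeline = None  # set by fit()

    def fit(self):
        # A plain dict stands in for the best fitted pipeline.
        self.best_pipeline = {"steps": ["preprocess", "estimate"]}
        return self

    def pickle_fitted_pipeline(self, path):
        if self.best_pipeline is None:
            raise ValueError("pipeline not fitted")
        with open(path, "wb") as handle:
            pickle.dump(self.best_pipeline, handle)


# Round trip: write the pickle, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pipeline.pkl")
    FittedSketch().fit().pickle_fitted_pipeline(path)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.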
configure_sampling(enable_sampling=True, sampling_fraction: float = 0.2, replace: bool = False) → NoReturn

Configure the sampling criteria.

Parameters:
  • enable_sampling – whether to enable sampling in data cleaning and intent resolving
  • sampling_fraction – the sampling fraction
  • replace – whether to use replacement during sampling
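
As a rough illustration of what the sampling options mean, the sketch below draws a fraction of rows with or without replacement, treating sampling_fraction as the share of rows to draw and replace as the replacement flag, in line with their float and bool types in the signature. sample_rows and its seed argument are hypothetical; how Foreshadow samples internally is not shown here.

```python
import random


def sample_rows(rows, sampling_fraction=0.2, replace=False, seed=0):
    """Draw a fraction of rows, optionally with replacement."""
    rng = random.Random(seed)
    sample_size = max(1, round(len(rows) * sampling_fraction))
    if replace:
        # With replacement: the same row may be drawn more than once.
        return rng.choices(rows, k=sample_size)
    # Without replacement: every drawn row is distinct.
    return rng.sample(rows, sample_size)


rows = list(range(100))
without_replacement = sample_rows(rows)
with_replacement = sample_rows(rows, replace=True)
```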

register_customized_data_cleaner(data_cleaners: List) → NoReturn

EXPERIMENTAL Allow user to register a customized data cleaner.

Parameters: data_cleaners – customized data cleaners
Raises: ValueError – data cleaner must be a child class of the base cleaner.
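
The subclass requirement behind the ValueError can be sketched as follows. BaseCleanerSketch, DropDollarSign and register_cleaners are hypothetical names; the real base cleaner class and its import path are not given in this reference, so none is assumed here.

```python
class BaseCleanerSketch:
    """Hypothetical stand-in for the library's base cleaner class."""


class DropDollarSign(BaseCleanerSketch):
    """Hypothetical customized cleaner."""


def register_cleaners(registry, data_cleaners):
    """Append cleaners to registry after checking each is a subclass."""
    for cleaner in data_cleaners:
        is_subclass = isinstance(cleaner, type) and issubclass(
            cleaner, BaseCleanerSketch
        )
        if not is_subclass:
            raise ValueError(
                "data cleaner must be a child class of the base cleaner"
            )
    # Validate everything first so a bad entry registers nothing.
    registry.extend(data_cleaners)
```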
get_data_summary() → pandas.core.frame.DataFrame

Get summary statistics and identified intent for training dataset.

Returns: a DataFrame, in which each column represents the summary of a column
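
The layout of the returned summary, one summary column per source column, can be pictured with plain dictionaries. The statistics chosen below and the summarize helper are assumptions for illustration; the actual rows of the DataFrame (including how the intent is reported) are not specified here.

```python
import statistics


def summarize(columns):
    """Return {column_name: {statistic: value}}, one entry per column."""
    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


summary = summarize({"age": [20, 30, 40], "income": [1.0, 3.0]})
```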