API Reference

Foreshadow

Core end-to-end pipeline, foreshadow.

class Foreshadow(problem_type, random_state=None, n_jobs=1, estimator=None, allowed_seconds=300, auto_estimator_kwargs=None)

An end-to-end pipeline to preprocess and tune a machine learning model.

Example

>>> shadow = Foreshadow(problem_type=ProblemType.CLASSIFICATION)

Parameters:

X_preparer (Preprocessor, optional) – Preprocessor instance that will apply to X data. Passing False prevents the automatic generation of an instance.
y_preparer (Preprocessor, optional) – Preprocessor instance that will apply to y data. Passing False prevents the automatic generation of an instance.
estimator (sklearn.base.BaseEstimator, optional) – Estimator instance to fit on processed data
optimizer (sklearn.grid_search.BaseSearchCV, optional) – Optimizer class to optimize feature engineering and model hyperparameters

X_preparer

Preprocessor object for performing feature engineering on X data.

Getter:	Returns Preprocessor object
Setter:	Verifies the Preprocessor object; if None, creates a default Preprocessor
Type:	`Preprocessor`
Returns:	the X_preparer object

y_preparer

Preprocessor object for performing scaling and encoding on Y data.

Getter:	Returns Preprocessor object
Setter:	Verifies the Preprocessor object; if None, creates a default Preprocessor
Type:	`Preprocessor`
Returns:	the y_preparer object

estimator

Estimator object for fitting preprocessed data.

Getter:	Returns Estimator object
Setter:	Verifies Estimator object. If None, an `AutoEstimator` object is created in place.
Type:	`sklearn.base.BaseEstimator`
Returns:	the estimator object
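
The getter and setter behavior described for these properties can be sketched as follows. This is only an illustration of the pattern: `Preprocessor` here is an empty stand-in class, `PipelineSketch` is a hypothetical holder, and both the choice to store None when False is passed and the choice to raise `ValueError` on a failed check are assumptions, not confirmed library behavior.

```python
class Preprocessor:
    """Empty stand-in for the Preprocessor type named above (hypothetical)."""


class PipelineSketch:
    """Hypothetical holder that mimics the documented getter/setter contract."""

    def __init__(self, X_preparer=None):
        # Assignment goes through the property setter below.
        self.X_preparer = X_preparer

    @property
    def X_preparer(self):
        # Getter: return the stored Preprocessor object.
        return self._X_preparer

    @X_preparer.setter
    def X_preparer(self, value):
        if value is None:
            # None: create a default Preprocessor.
            value = Preprocessor()
        elif value is False:
            # False: skip automatic generation (storing None is an assumption).
            value = None
        elif not isinstance(value, Preprocessor):
            # The exception type is an assumption made for this sketch.
            raise ValueError("X_preparer must be a Preprocessor instance")
        self._X_preparer = value


default_case = PipelineSketch()
disabled_case = PipelineSketch(X_preparer=False)
custom = Preprocessor()
custom_case = PipelineSketch(X_preparer=custom)
```

The same shape would apply to `y_preparer` and, with a different default object, to `estimator`.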

fit(data_df, y_df)

Fit the Foreshadow instance using the provided input data.

Parameters:

data_df (`DataFrame`) – The input feature(s)
y_df (`DataFrame`) – The response feature(s)
Returns:	The fitted instance.
Return type:	`Foreshadow`

predict(data_df)

Use the trained estimator to predict the response variable.

Parameters:	data_df (`DataFrame`) – The input feature(s)
Returns:	The response feature(s) (transformed if necessary)
Return type:	`DataFrame`

predict_proba(data_df)

Use the trained estimator to predict the response variable.

Uses the predicted confidences instead of binary predictions.

Parameters:	data_df (`DataFrame`) – The input feature(s)
Returns:	The probability associated with each response feature
Return type:	`DataFrame`
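
To make the difference between confidences and hard predictions concrete, here is a small sketch in plain Python. It assumes the common convention that a classifier's hard prediction is the class with the highest predicted probability; whether a given estimator follows that convention depends on the estimator. The helper `hard_labels` and the class names are hypothetical.

```python
def hard_labels(probabilities, classes):
    """Pick the most probable class for each row of probabilities."""
    labels = []
    for row in probabilities:
        # Index of the largest probability; ties go to the first class.
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[best])
    return labels


# One row per sample, one column per class (made-up numbers).
proba = [[0.9, 0.1], [0.35, 0.65], [0.5, 0.5]]
labels = hard_labels(proba, classes=["no", "yes"])
```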

score(data_df, y_df=None, sample_weight=None)

Use the trained estimator to compute the evaluation score.

The scoring method is defined by the selected estimator.

Parameters:

data_df (`DataFrame`) – The input feature(s)
y_df (`DataFrame`, optional) – The response feature(s)
sample_weight (`numpy.ndarray`, optional) – The weights to be used when scoring each sample
Returns:	A computed prediction fitness score
Return type:	float
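
As an illustration of how `sample_weight` can influence a score, the sketch below computes a weighted accuracy in plain Python. Accuracy is only an assumed example metric, since the actual metric is whatever the selected estimator defines, and `weighted_accuracy` is a hypothetical helper.

```python
def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of total weight that falls on correctly predicted samples."""
    if sample_weight is None:
        # Without weights every sample counts equally.
        sample_weight = [1.0] * len(y_true)
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
plain = weighted_accuracy(y_true, y_pred)
# Doubling the weight of the misclassified third sample lowers the score.
weighted = weighted_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```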

get_params(deep=True)

Get params for this object. See the superclass implementation.

Parameters:	deep – True to recursively call get_params, False to not.
Returns:	params for this object.

set_params(**params)

Set params for this object. See the superclass implementation.

Parameters:	**params – params to set.
Returns:	See super.
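
The `deep` flag and keyword-based setting follow a convention familiar from scikit-learn estimators, where nested parameters are addressed as `<component>__<parameter>`. The sketch below assumes that convention applies here. `Component` is a hypothetical class written only for this example and does not reproduce the library's code.

```python
class Component:
    """Hypothetical parameter holder following the scikit-learn convention."""

    def __init__(self, alpha=1.0, inner=None):
        self.alpha = alpha
        self.inner = inner

    def get_params(self, deep=True):
        params = {"alpha": self.alpha, "inner": self.inner}
        if deep and hasattr(self.inner, "get_params"):
            # Recurse into the nested component and prefix its keys.
            for key, value in self.inner.get_params(deep=True).items():
                params["inner__" + key] = value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, rest = key.partition("__")
            if name not in ("alpha", "inner"):
                raise ValueError(f"Invalid parameter {key!r}")
            if sep:
                # "inner__alpha" is forwarded to the nested component.
                getattr(self, name).set_params(**{rest: value})
            else:
                setattr(self, name, value)
        return self


outer = Component(alpha=0.5, inner=Component(alpha=2.0))
shallow = outer.get_params(deep=False)
deep = outer.get_params(deep=True)
outer.set_params(inner__alpha=3.0)
```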

get_intent(column_name: str) → Optional[str]

Retrieve the intent of a column.

Parameters:	column_name – the column name
Returns:	the intent of the column
Return type:	Optional[str]

list_intent(column_names: List[str]) → List[str]

Retrieve the intent of a list of columns.

Parameters:	column_names – a list of columns
Returns:	The list of intents

override_intent(column_name: str, intent: str) → NoReturn

Override the intent of a particular column.

Parameters:

column_name – the column to override
intent – the user supplied intent
Raises:	`ValueError` – Invalid column to override.
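
The three intent methods can be pictured as operations on a mapping from column name to intent. The sketch below is a hypothetical stand-in, not the library's internals: `IntentRegistry`, the intent strings, and the choice to return None for an unknown column in `get_intent` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class IntentRegistry:
    """Hypothetical column-to-intent mapping used only for this sketch."""

    def __init__(self, intents: Dict[str, str]):
        self._intents = dict(intents)

    def get_intent(self, column_name: str) -> Optional[str]:
        # Assumption: unknown columns yield None rather than an error.
        return self._intents.get(column_name)

    def list_intent(self, column_names: List[str]) -> List[Optional[str]]:
        return [self.get_intent(name) for name in column_names]

    def override_intent(self, column_name: str, intent: str) -> None:
        if column_name not in self._intents:
            raise ValueError(f"Invalid column to override: {column_name!r}")
        self._intents[column_name] = intent


# The intent names below are placeholders.
registry = IntentRegistry({"age": "Numeric", "city": "Categorical"})
registry.override_intent("age", "Categorical")
intents = registry.list_intent(["age", "city", "missing"])

try:
    registry.override_intent("missing", "Numeric")
    override_failed = False
except ValueError:
    override_failed = True
```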

configure_multiprocessing(n_job: int = 1) → NoReturn

Configure the multiprocessing option.

Parameters:	n_job – the number of processes to run the job.

set_processed_data_export_path(data_path: str, is_train: bool) → NoReturn

Set path to export data before feeding the data to the estimator.

Parameters:

data_path – the data path string
is_train – whether this is for training data

pickle_fitted_pipeline(path: str) → NoReturn

Pickle the foreshadow object with the best pipeline estimator.

Parameters:	path – the pickle file path
Raises:	`ValueError` – pipeline not fitted.
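
The guard-then-serialize behavior described here can be sketched with the standard `pickle` module. The helper `pickle_if_fitted` and the dictionary used as pipeline state are hypothetical stand-ins, and the sketch says nothing about what the library actually stores in the file.

```python
import os
import pickle
import tempfile


def pickle_if_fitted(state, path):
    """Write `state` to `path`, refusing unfitted state (hypothetical helper)."""
    if not state.get("fitted", False):
        raise ValueError("pipeline not fitted.")
    with open(path, "wb") as handle:
        pickle.dump(state, handle)


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "pipeline.pkl")
    pickle_if_fitted({"fitted": True, "steps": ["clean", "model"]}, pickle_path)
    with open(pickle_path, "rb") as handle:
        restored = pickle.load(handle)

# An unfitted state is rejected before any file is opened.
try:
    pickle_if_fitted({"fitted": False}, "unused.pkl")
    raised = False
except ValueError:
    raised = True
```

As with any pickle file, only load files from sources you trust.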

configure_sampling(enable_sampling=True, sampling_fraction: float = 0.2, replace: bool = False) → NoReturn

Configure the sampling criteria.

Parameters:

enable_sampling – whether to enable sampling in data cleaning and intent resolving
sampling_fraction – the sampling fraction
replace – whether to use replacement during sampling
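
To illustrate the two sampling options, the sketch below treats `sampling_fraction` as the share of rows kept and `replace` as whether a row may be drawn more than once, which is the usual reading of these names. The helper `sample_rows` and its `seed` argument are hypothetical, and the library's own sampling code may differ.

```python
import random


def sample_rows(rows, sampling_fraction=0.2, replace=False, seed=0):
    """Draw a fraction of `rows`, with or without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * sampling_fraction))
    if replace:
        # With replacement: the same row can appear several times.
        return rng.choices(rows, k=k)
    # Without replacement: every drawn row is distinct.
    return rng.sample(rows, k)


rows = list(range(100))
subset = sample_rows(rows)
with_replacement = sample_rows(rows, sampling_fraction=0.5, replace=True)
```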

register_customized_data_cleaner(data_cleaners: List) → NoReturn

EXPERIMENTAL: Allow users to register customized data cleaners.

Parameters:	data_cleaners – customized data cleaners
Raises:	`ValueError` – data cleaner must be a child class of the base cleaner.
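
The subclass check behind this `ValueError` can be sketched as follows. `BaseCleaner`, `StripWhitespaceCleaner`, and `register_cleaners` are hypothetical names, and the sketch assumes that cleaners are passed as classes rather than instances, which the wording above suggests but does not confirm.

```python
class BaseCleaner:
    """Stand-in for the library's base cleaner class (hypothetical)."""


class StripWhitespaceCleaner(BaseCleaner):
    """Example custom cleaner (hypothetical)."""


def register_cleaners(data_cleaners, registry):
    """Append cleaner classes to `registry` after validating all of them."""
    for cleaner in data_cleaners:
        if not (isinstance(cleaner, type) and issubclass(cleaner, BaseCleaner)):
            raise ValueError(
                "data cleaner must be a child class of the base cleaner."
            )
    # Only reached when every entry passed the check.
    registry.extend(data_cleaners)


cleaner_registry = []
register_cleaners([StripWhitespaceCleaner], cleaner_registry)

try:
    register_cleaners([dict], cleaner_registry)
    rejected = False
except ValueError:
    rejected = True
```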

get_data_summary() → pandas.core.frame.DataFrame

Get summary statistics and identified intent for training dataset.

Returns:	a DataFrame in which each column represents the summary of a column in the training dataset