API Reference
Foreshadow
Core end-to-end pipeline, foreshadow.
-
class Foreshadow(problem_type, random_state=None, n_jobs=1, estimator=None, allowed_seconds=300, auto_estimator_kwargs=None)
An end-to-end pipeline to preprocess and tune a machine learning model.
Example
>>> shadow = Foreshadow(problem_type=ProblemType.CLASSIFICATION)
Parameters:
- X_preparer (Preprocessor, optional) – Preprocessor instance applied to the X data. Passing False prevents the automatic generation of an instance.
- y_preparer (Preprocessor, optional) – Preprocessor instance applied to the y data. Passing False prevents the automatic generation of an instance.
- estimator (sklearn.base.BaseEstimator, optional) – Estimator instance to fit on the processed data.
- optimizer (sklearn.grid_search.BaseSearchCV, optional) – Optimizer class used to optimize feature engineering and model hyperparameters.
-
X_preparer
Preprocessor object for performing feature engineering on X data.
Getter: Returns the Preprocessor object.
Setter: Verifies the Preprocessor object; if None, creates a default Preprocessor.
Type: Preprocessor
Returns: the X_preparer object
-
y_preparer
Preprocessor object for performing scaling and encoding on Y data.
Getter: Returns the Preprocessor object.
Setter: Verifies the Preprocessor object; if None, creates a default Preprocessor.
Type: Preprocessor
Returns: the y_preparer object
-
estimator
Estimator object for fitting preprocessed data.
Getter: Returns the Estimator object.
Setter: Verifies the Estimator object. If None, an AutoEstimator object is created in place.
Type: sklearn.base.BaseEstimator
Returns: the estimator object
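The getter/setter behavior described for X_preparer, y_preparer and estimator can be illustrated with a plain Python property. This is only a sketch of the pattern, not the library's code: the Pipeline and DefaultEstimator classes are hypothetical, and the check for a fit() method is an assumed form of "verification", since the reference does not say what is actually verified.

```python
class DefaultEstimator:
    """Hypothetical stand-in for an automatically created estimator."""

    def fit(self, X, y):
        return self


class Pipeline:
    """Hypothetical class showing a validating property; not Foreshadow."""

    def __init__(self, estimator=None):
        self.estimator = estimator  # routed through the setter below

    @property
    def estimator(self):
        return self._estimator

    @estimator.setter
    def estimator(self, value):
        if value is None:
            # Nothing supplied: create a default object in place.
            value = DefaultEstimator()
        elif not hasattr(value, "fit"):
            # Assumed verification rule, chosen for illustration only.
            raise ValueError("estimator must provide a fit() method")
        self._estimator = value


pipeline = Pipeline()
default_created = isinstance(pipeline.estimator, DefaultEstimator)
```

Routing the constructor argument through the setter keeps the validation in one place, so later assignments are checked the same way as the initial one.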
-
fit(data_df, y_df)
Fit the Foreshadow instance using the provided input data.
Parameters:
- data_df (DataFrame) – The input feature(s)
- y_df (DataFrame) – The response feature(s)
Returns: The fitted instance.
Return type: Foreshadow
-
predict(data_df)
Use the trained estimator to predict the response variable.
Parameters: data_df (DataFrame) – The input feature(s)
Returns: The response feature(s) (transformed if necessary)
Return type: DataFrame
-
predict_proba(data_df)
Use the trained estimator to predict the response variable.
Uses the predicted confidences instead of binary predictions.
Parameters: data_df (DataFrame) – The input feature(s)
Returns: The probability associated with each response feature
Return type: DataFrame
-
score(data_df, y_df=None, sample_weight=None)
Use the trained estimator to compute the evaluation score.
The scoring method is defined by the selected estimator.
Parameters:
- data_df (DataFrame) – The input feature(s)
- y_df (DataFrame, optional) – The response feature(s)
- sample_weight (numpy.ndarray, optional) – The weights to be used when scoring each sample
Returns: A computed prediction fitness score
-
get_params(deep=True)
Get params for this object. See super.
Parameters: deep – True to recursively call get_params, False to not.
Returns: params for this object.
-
set_params(**params)
Set params for this object. See super.
Parameters: **params – params to set.
Returns: See super.
-
get_intent(column_name: str) → Optional[str]
Retrieve the intent of a column.
Parameters: column_name – the column name
Returns: the intent of the column
Return type: str
-
list_intent(column_names: List[str]) → List[str]
Retrieve the intent of a list of columns.
Parameters: column_names – a list of columns
Returns: The list of intents
-
override_intent(column_name: str, intent: str) → NoReturn
Override the intent of a particular column.
Parameters:
- column_name – the column to override
- intent – the user supplied intent
Raises: ValueError – Invalid column to override.
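The contract of get_intent, list_intent and override_intent can be sketched with a small dictionary-backed registry. Everything here is illustrative: IntentRegistry is a hypothetical class, the intent names "Numeric" and "Categorical" are example values, and returning None for an unknown column is an assumption based only on the Optional[str] annotation.

```python
from typing import Dict, List, Optional


class IntentRegistry:
    """Hypothetical column-to-intent map; not the library's implementation."""

    def __init__(self, intents: Dict[str, str]) -> None:
        self._intents = dict(intents)

    def get_intent(self, column_name: str) -> Optional[str]:
        # Assumed behavior: unknown columns yield None.
        return self._intents.get(column_name)

    def list_intent(self, column_names: List[str]) -> List[Optional[str]]:
        return [self.get_intent(name) for name in column_names]

    def override_intent(self, column_name: str, intent: str) -> None:
        if column_name not in self._intents:
            raise ValueError(f"Invalid column to override: {column_name!r}")
        self._intents[column_name] = intent


registry = IntentRegistry({"age": "Numeric", "city": "Categorical"})
registry.override_intent("age", "Categorical")
intents_after_override = registry.list_intent(["age", "city"])
```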
-
configure_multiprocessing(n_job: int = 1) → NoReturn
Configure the multiprocessing option.
Parameters: n_job – the number of processes to run the job.
-
set_processed_data_export_path(data_path: str, is_train: bool) → NoReturn
Set path to export data before feeding the data to the estimator.
Parameters:
- data_path – the data path string
- is_train – whether this is for training data
-
pickle_fitted_pipeline(path: str) → NoReturn
Pickle the foreshadow object with the best pipeline estimator.
Parameters: path – the pickle file path
Raises: ValueError – pipeline not fitted.
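The "raise ValueError unless fitted, otherwise pickle" contract can be sketched with the standard pickle module. This is a minimal illustration under stated assumptions: FittablePipeline and its best_params attribute are hypothetical, and the sketch serializes a plain dictionary of state rather than the whole object, which is a simplification of what the reference describes.

```python
import os
import pickle
import tempfile


class FittablePipeline:
    """Hypothetical pipeline; stands in for the documented class."""

    def __init__(self):
        self.best_params = None  # populated by fit()

    def fit(self):
        # Placeholder for real training; records illustrative parameters.
        self.best_params = {"max_depth": 3}
        return self

    def pickle_fitted_pipeline(self, path: str) -> None:
        if self.best_params is None:
            raise ValueError("pipeline not fitted.")
        # Simplification: store plain state instead of the object itself.
        with open(path, "wb") as handle:
            pickle.dump({"best_params": self.best_params}, handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "pipeline.p")
    FittablePipeline().fit().pickle_fitted_pipeline(pickle_path)
    with open(pickle_path, "rb") as handle:
        restored = pickle.load(handle)
```

Checking the fitted state before opening the file means a failed call leaves no partial file behind.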
-
configure_sampling(enable_sampling=True, sampling_fraction: float = 0.2, replace: bool = False) → NoReturn
Configure the sampling criteria.
Parameters:
- enable_sampling – whether to enable sampling in data cleaning and intent resolving
- sampling_fraction – the sampling fraction
- replace – whether to use replacement during sampling
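How the three sampling options interact can be shown with the standard random module. This is a sketch of fractional sampling with and without replacement, not the library's sampling code: sample_rows and its seed parameter are hypothetical, and rounding the sample size down with a minimum of one row is an assumption.

```python
import random
from typing import List, Sequence


def sample_rows(rows: Sequence, enable_sampling: bool = True,
                sampling_fraction: float = 0.2, replace: bool = False,
                seed: int = 0) -> List:
    """Hypothetical helper: return a fraction of rows, or all of them."""
    if not enable_sampling or len(rows) == 0:
        return list(rows)
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    # Assumed rounding rule: truncate, but keep at least one row.
    n_samples = max(1, int(len(rows) * sampling_fraction))
    rng = random.Random(seed)
    if replace:
        # With replacement: the same row may be drawn more than once.
        return rng.choices(rows, k=n_samples)
    # Without replacement: every drawn row is distinct.
    return rng.sample(rows, n_samples)


rows = list(range(100))
subset = sample_rows(rows, sampling_fraction=0.2, replace=False)
```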
-
register_customized_data_cleaner(data_cleaners: List) → NoReturn
EXPERIMENTAL: Allow user to register a customized data cleaner.
Parameters: data_cleaners – customized data cleaners
Raises: ValueError – data cleaner must be a child class of the base cleaner.
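The child-class requirement can be sketched with issubclass. The names below (BaseCleaner, StripWhitespaceCleaner, register_cleaners) are hypothetical stand-ins, and treating the entries of data_cleaners as classes rather than instances is an assumption; the reference only states that each cleaner must be a child class of the base cleaner.

```python
from typing import List


class BaseCleaner:
    """Hypothetical base class for data cleaners."""

    def clean(self, value: str) -> str:
        raise NotImplementedError


class StripWhitespaceCleaner(BaseCleaner):
    """Hypothetical custom cleaner that trims surrounding whitespace."""

    def clean(self, value: str) -> str:
        return value.strip()


def register_cleaners(registry: List[type], data_cleaners: List) -> None:
    # Validate every entry first so a bad one leaves the registry unchanged.
    for cleaner in data_cleaners:
        if not (isinstance(cleaner, type) and issubclass(cleaner, BaseCleaner)):
            raise ValueError(
                "data cleaner must be a child class of the base cleaner."
            )
    registry.extend(data_cleaners)


registered: List[type] = []
register_cleaners(registered, [StripWhitespaceCleaner])
cleaned = registered[0]().clean("  42  ")
```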