API Reference¶
Foreshadow¶
Core end-to-end pipeline, foreshadow.
-
class
Foreshadow
(X_preparer=None, y_preparer=None, estimator=None, optimizer=None, optimizer_kwargs=None)[source]¶ An end-to-end pipeline to preprocess and tune a machine learning model.
Example
>>> shadow = Foreshadow()
Parameters: - X_preparer (Preprocessor, optional) – Preprocessor instance that will apply to X data. Passing False prevents the automatic generation of an instance.
- y_preparer (Preprocessor, optional) – Preprocessor instance that will apply to y data. Passing False prevents the automatic generation of an instance.
- estimator (sklearn.base.BaseEstimator, optional) – Estimator instance to fit on processed data
- optimizer (sklearn.grid_search.BaseSearchCV, optional) – Optimizer class to optimize feature engineering and model hyperparameters
-
X_preparer
¶ Preprocessor object for performing feature engineering on X data.
Getter: Returns Preprocessor object Setter: Verifies Preprocessor object, if None, creates a default Preprocessor Type: Preprocessor
-
y_preparer
¶ Preprocessor object for performing scaling and encoding on Y data.
Getter: Returns Preprocessor object Setter: Verifies Preprocessor object, if None, creates a default Preprocessor Type: Preprocessor
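The getter/setter contract described for X_preparer and y_preparer can be illustrated with a plain Python property. This is only a sketch of the pattern, not Foreshadow's code: the Preprocessor and Pipeline classes below are hypothetical stand-ins, and storing None when False is passed is an assumption about how "prevents the automatic generation" might be represented.

```python
class Preprocessor:
    """Hypothetical stand-in for foreshadow's Preprocessor."""


class Pipeline:
    """Toy holder (hypothetical) showing the verify-or-default setter pattern."""

    def __init__(self, X_preparer=None):
        # Assignment is routed through the property setter below.
        self.X_preparer = X_preparer

    @property
    def X_preparer(self):
        return self._X_preparer

    @X_preparer.setter
    def X_preparer(self, value):
        if value is None:
            # None means "create a default Preprocessor".
            self._X_preparer = Preprocessor()
        elif value is False:
            # Assumption: False disables generation and leaves the slot empty.
            self._X_preparer = None
        elif isinstance(value, Preprocessor):
            self._X_preparer = value
        else:
            raise ValueError("X_preparer must be a Preprocessor, None, or False")


default = Pipeline()
disabled = Pipeline(X_preparer=False)
```

Routing the constructor argument through the setter keeps validation in one place, so later assignments are checked the same way as the initial one.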
-
estimator
¶ Estimator object for fitting preprocessed data.
Getter: Returns Estimator object Setter: Verifies Estimator object. If None, an AutoEstimator object is created in place. Type: sklearn.base.BaseEstimator
-
optimizer
¶ Optimizer class that will fit the model.
Performs a grid or random search algorithm on the parameter space from the preprocessors and estimators in the pipeline
Getter: Returns optimizer class Setter: Verifies Optimizer class, defaults to None
-
fit
(data_df, y_df)[source]¶ Fit the Foreshadow instance using the provided input data.
Parameters: Returns: The fitted instance.
Return type:
-
predict
(data_df)[source]¶ Use the trained estimator to predict the response variable.
Parameters: data_df (DataFrame) – The input feature(s) Returns: The response feature(s) (transformed if necessary) Return type: DataFrame
-
predict_proba
(data_df)[source]¶ Use the trained estimator to predict the response variable.
Uses the predicted confidences instead of binary predictions.
Parameters: data_df (DataFrame) – The input feature(s) Returns: The probability associated with each response feature Return type: DataFrame
-
score
(data_df, y_df=None, sample_weight=None)[source]¶ Use the trained estimator to compute the evaluation score.
The scoring method is defined by the selected estimator.
Parameters: - data_df (DataFrame) – The input feature(s)
- y_df (DataFrame, optional) – The response feature(s)
- sample_weight (numpy.ndarray, optional) – The weights to be used when scoring each sample
Returns: A computed prediction fitness score
Return type:
-
dict_serialize
(deep=False)[source]¶ Serialize the init parameters of the foreshadow object.
Parameters: deep (bool) – If True, will return the parameters for this estimator recursively Returns: The initialization parameters of the foreshadow object. Return type: dict
-
classmethod
dict_deserialize
(data)[source]¶ Deserialize the dictionary form of a foreshadow object.
Parameters: data – The dictionary to parse when constructing the foreshadow object. Returns: A re-constructed foreshadow object. Return type: object
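The relationship between dict_serialize and dict_deserialize is a round trip over init parameters. The sketch below shows that pattern on a hypothetical Model class; it is not Foreshadow's implementation, and the parameter names alpha and fit_intercept are invented for illustration.

```python
class Model:
    """Hypothetical object that is described entirely by its init parameters."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def dict_serialize(self, deep=False):
        # Only init parameters are captured, not fitted state.
        # `deep` is accepted for signature parity; this toy has no nested objects.
        return {"alpha": self.alpha, "fit_intercept": self.fit_intercept}

    @classmethod
    def dict_deserialize(cls, data):
        # Rebuild a fresh instance from the stored init parameters.
        return cls(**data)


original = Model(alpha=0.5)
restored = Model.dict_deserialize(original.dict_serialize())
```

Because only init parameters are stored, the restored object is a new, unfitted instance with the same configuration.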
dp¶
Intents¶
Intents package used by IntentMapper PreparerStep.
-
class
Categoric
[source]¶ Defines a categoric column type.
-
confidence_computation
= {<class 'foreshadow.metrics.MetricWrapper' with function 'num_valid' object at 140236109800504>: 0.25, <class 'foreshadow.metrics.MetricWrapper' with function 'unique_heur' object at 140236109800560>: 0.65, <class 'foreshadow.metrics.MetricWrapper' with function 'is_numeric' object at 140236109800616>: 0.1}¶
-
fit
(X, y=None, **fit_params)[source]¶ Empty fit.
Parameters: - X – The input data
- y – The response variable
- **fit_params – Additional parameters for the fit
Returns: self
-
-
class
Numeric
[source]¶ Defines a numeric column type.
-
confidence_computation
= {<class 'foreshadow.metrics.MetricWrapper' with function 'num_valid' object at 140236109801400>: 0.3, <class 'foreshadow.metrics.MetricWrapper' with function 'unique_heur' object at 140236109801456>: 0.2, <class 'foreshadow.metrics.MetricWrapper' with function 'is_numeric' object at 140236109801512>: 0.4, <class 'foreshadow.metrics.MetricWrapper' with function 'is_string' object at 140236109801568>: 0.1}¶
-
fit
(X, y=None, **fit_params)[source]¶ Empty fit.
Parameters: - X – The input data
- y – The response variable
- **fit_params – Additional parameters for the fit
Returns: self
-
-
class
Text
[source]¶ Defines a text column type.
-
confidence_computation
= {<class 'foreshadow.metrics.MetricWrapper' with function 'num_valid' object at 140236109802184>: 0.2, <class 'foreshadow.metrics.MetricWrapper' with function 'unique_heur' object at 140236109802240>: 0.2, <class 'foreshadow.metrics.MetricWrapper' with function 'is_numeric' object at 140236109802296>: 0.2, <class 'foreshadow.metrics.MetricWrapper' with function 'is_string' object at 140236109802352>: 0.2, <class 'foreshadow.metrics.MetricWrapper' with function 'has_long_text' object at 140236109802408>: 0.2}¶
-
fit
(X, y=None, **fit_params)[source]¶ Empty fit.
Parameters: - X – The input data
- y – The response variable
- **fit_params – Additional parameters for the fit
Returns: self
-
-
class
BaseIntent
[source]¶ Base for all intent definitions.
For each intent subclass a class attribute called confidence_computation must be defined which is of the form:
{ metric_def: weight }
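One plausible reading of the { metric_def: weight } form is a weighted sum of metric scores, each in the range 0 to 1. The source does not state how Foreshadow combines the metrics, so the sketch below is an assumption; the metric functions are simplified toys that only share their names with the ones listed above, and the weights are hypothetical.

```python
def num_valid(values):
    """Toy metric: fraction of entries that are not None."""
    return sum(v is not None for v in values) / len(values)


def is_numeric(values):
    """Toy metric: fraction of entries that parse as a float."""
    def parses(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False
    return sum(parses(v) for v in values) / len(values)


# Hypothetical weights in the { metric_def: weight } form; they sum to 1.
confidence_computation = {num_valid: 0.3, is_numeric: 0.7}


def confidence(values, computation):
    """Assumed combination rule: weighted sum of the metric scores."""
    return sum(weight * metric(values) for metric, weight in computation.items())


column = ["1", "2.5", None, "abc"]
score = confidence(column, confidence_computation)
```

With weights that sum to 1 and metrics bounded by 0 and 1, the resulting confidence also stays between 0 and 1, which makes scores comparable across intents.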
Estimators¶
Estimators provided by foreshadow.
-
class
AutoEstimator
(problem_type=None, auto=None, include_preprocessors=False, estimator_kwargs=None)[source]¶ A wrapped estimator that selects the solution for a given problem.
By default each automatic machine learning solution runs for 1 minute but that can be changed through passed kwargs. Autosklearn is not required for this to work but if installed it can be used alongside TPOT.
Parameters: - problem_type (str) – The problem type, ‘regression’ or ‘classification’
- auto (str) – The automatic estimator, ‘tpot’ or ‘autosklearn’
- include_preprocessors (bool) – Whether to include preprocessors in AutoML pipelines
- estimator_kwargs (dict) – A dictionary of args to pass to the specified auto estimator (both problem_type and auto must be specified)
-
problem_type
¶ Type of machine learning problem.
Either regression or classification.
Returns: self._problem_type
-
auto
¶ Type of automl package.
Either tpot or autosklearn.
Returns: self._auto, the type of automl package
-
estimator_kwargs
¶ Get dictionary of kwargs to pass to AutoML package.
Returns: estimator kwargs
-
configure_estimator
(y)[source]¶ Construct and return the auto estimator instance.
Parameters: y – input labels Returns: autoestimator instance
-
fit
(X, y)[source]¶ Fit the AutoEstimator instance.
Uses the selected AutoML estimator.
Parameters: - X (pandas.DataFrame or numpy.ndarray or list) – The input feature(s)
- y (pandas.DataFrame or numpy.ndarray or list) – The response feature(s)
Returns: The selected estimator
-
predict
(X)[source]¶ Use the trained estimator to predict the response.
Parameters: X (pandas.DataFrame or numpy.ndarray or list) – The input feature(s) Returns: The response feature(s) Return type: pandas.DataFrame
-
predict_proba
(X)[source]¶ Use the trained estimator to predict the response probabilities.
Parameters: X (pandas.DataFrame or numpy.ndarray or list) – The input feature(s) Returns: The probability associated with each response feature Return type: pandas.DataFrame
-
score
(X, y, sample_weight=None)[source]¶ Use the trained estimator to compute the evaluation score.
Note: sample weights are not supported
Parameters: - X (pandas.DataFrame or numpy.ndarray or list) – The input feature(s)
- y (pandas.DataFrame or numpy.ndarray or list) – The response feature(s)
- sample_weight – sample weighting. Not implemented.
Returns: A computed prediction fitness score
Return type:
-
class
MetaEstimator
(estimator, preprocessor)[source]¶ Wrapper that allows data preprocessing on the response variable(s).
Parameters: - estimator – An instance of a subclass of sklearn.base.BaseEstimator
- preprocessor – An instance of foreshadow.preprocessor.Preprocessor
-
dict_serialize
(deep=False)[source]¶ Serialize the init parameters (dictionary form) of a transformer.
Parameters: deep (bool) – If True, will return the parameters for this estimator recursively Returns: The initialization parameters of the transformer. Return type: dict
-
fit
(X, y=None)[source]¶ Fit the AutoEstimator instance using a selected AutoML estimator.
Parameters: - X (pandas.DataFrame or numpy.ndarray or list) – The input feature(s)
- y (pandas.DataFrame or numpy.ndarray or list) – The response feature(s)
Returns: self
-
predict
(X)[source]¶ Use the trained estimator to predict the response.
Parameters: X (pandas.DataFrame or numpy.ndarray or list) – The input feature(s) Returns: The response feature(s) (transformed) Return type: pandas.DataFrame
-
predict_proba
(X)[source]¶ Use the trained estimator to predict the response probabilities.
Parameters: X (pandas.DataFrame or numpy.ndarray or list) – The input feature(s) Returns: The probability associated with each feature Return type: pandas.DataFrame
-
score
(X, y)[source]¶ Use the trained estimator to compute the evaluation score.
Note: sample weights are not supported
Parameters: - X (pandas.DataFrame or numpy.ndarray or list) – The input feature(s)
- y (pandas.DataFrame or numpy.ndarray or list) – The response feature(s)
Returns: A computed prediction fitness score
Return type:
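The idea behind a wrapper that preprocesses the response variable can be sketched in plain Python: transform y before fitting, then invert the transform on predictions. The classes below (HalfScaler, MeanEstimator, ResponseWrapper) are hypothetical toys written for this illustration and do not reproduce MetaEstimator's actual code.

```python
class HalfScaler:
    """Toy response preprocessor: halves values going in, doubles them coming out."""

    def fit_transform(self, y):
        return [v / 2 for v in y]

    def inverse_transform(self, y):
        return [v * 2 for v in y]


class MeanEstimator:
    """Toy estimator that always predicts the mean of the training response."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ResponseWrapper:
    """Hypothetical wrapper: the estimator only ever sees the transformed response."""

    def __init__(self, estimator, preprocessor):
        self.estimator = estimator
        self.preprocessor = preprocessor

    def fit(self, X, y):
        self.estimator.fit(X, self.preprocessor.fit_transform(y))
        return self

    def predict(self, X):
        # Map predictions back to the original response scale.
        return self.preprocessor.inverse_transform(self.estimator.predict(X))


model = ResponseWrapper(MeanEstimator(), HalfScaler()).fit([[0], [1]], [2.0, 4.0])
predictions = model.predict([[5]])
```

The inner estimator learns a mean of 1.5 on the halved response, while callers of predict receive values on the original scale.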
Optimizers¶
Foreshadow optimizers.
-
class
ParamSpec
(fs_pipeline=None, X_df=None, y_df=None)[source]¶ Holds the specification of the parameter search space.
A search space is a dict or list of dicts. This search space should be viewed as one run of optimization on the foreshadow object. The algorithm for optimization is determined by the optimizer that is chosen. Hence, this specification is agnostic of the optimizer chosen.
A dict represents the set of parameters to be applied in a single run.
A list represents a set of choices that the algorithm (again, agnostic at this point) can pick from.
For example, imagine s as our top level object, of structure:
- s (object)
- .transformer (object)
- .attr
s has an attribute that may be optimized and in turn, that object has parameters that may be optimized. Below, we try two different transformers and try 2 different parameter specifications for each. Note that these parameters are specific to the type of transformer (StandardScaler does not have the parameter feature_range and vice versa).
[
    {
        "s__transformer": "StandardScaler",
        "s__transformer__with_mean": [False, True],
    },
    {
        "s__transformer": "MinMaxScaler",
        "s__transformer__feature_range": [(0, 1), (0, 0.5)],
    },
]
Here, the dicts tell the optimizer which parameters to set. The lists enumerate the possible values for each.
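To make the dict-versus-list semantics concrete, the sketch below enumerates every concrete configuration implied by the two-transformer example, the way a grid-style optimizer might. The expand helper is hypothetical and is not part of ParamSpec; it assumes that a list value is a set of choices and any other value is fixed.

```python
from itertools import product

search_space = [
    {
        "s__transformer": "StandardScaler",
        "s__transformer__with_mean": [False, True],
    },
    {
        "s__transformer": "MinMaxScaler",
        "s__transformer__feature_range": [(0, 1), (0, 0.5)],
    },
]


def expand(space):
    """Yield one flat parameter dict per concrete combination (hypothetical helper)."""
    for spec in space:
        keys = list(spec)
        # A list is a set of choices; any other value is treated as fixed.
        options = [v if isinstance(v, list) else [v] for v in spec.values()]
        for combo in product(*options):
            yield dict(zip(keys, combo))


candidates = list(expand(search_space))
```

Each dict contributes two candidates here, so the example describes four runs in total. A random-search optimizer would sample from the same space instead of enumerating it.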
-
convert
(key, replace_val=<function hp_choice>)[source]¶ Convert internal self.param_distributions to valid distribution.
Uses _replace_list to replace all lists with replace_val
Parameters: - key – key to use for top level hp.choice name
- replace_val – value to replace lists with.
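The list-replacement step can be pictured as a recursive walk that swaps each list for a distribution object. The sketch below is an assumption about the general shape of such a walk, not the body of convert or _replace_list. The choice function is a stand-in that mimics the (label, options) calling convention of hyperopt's hp.choice without requiring hyperopt, and the label-naming scheme is invented.

```python
def choice(label, options):
    """Stand-in for a distribution constructor such as hyperopt's hp.choice."""
    return ("choice", label, tuple(options))


def replace_lists(node, key, replace_val=choice):
    """Recursively swap every list in `node` for replace_val(label, items)."""
    if isinstance(node, list):
        items = [
            replace_lists(item, f"{key}__{i}", replace_val)
            for i, item in enumerate(node)
        ]
        return replace_val(key, items)
    if isinstance(node, dict):
        return {
            k: replace_lists(v, f"{key}__{k}", replace_val)
            for k, v in node.items()
        }
    # Scalars and tuples are fixed values and pass through unchanged.
    return node


spec = [{"with_mean": [False, True]}]
converted = replace_lists(spec, "space")
```

Passing the constructor in as replace_val keeps the walk independent of any particular optimization library.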
-
class
Tuner
(pipeline=None, params=None, optimizer=None, optimizer_kwargs={})[source]¶ Tunes the Foreshadow object using a ParamSpec and Optimizer.
-
class
RandomSearchCV
(estimator, param_distributions, n_iter=10, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score='raise', return_train_score='warn', max_tries=100)[source]¶ Optimize Foreshadow.pipeline and/or its sub-objects.
Utils¶
Common Foreshadow utilities.
-
get_cache_path
()[source]¶ Get the cache path which is in the config directory.
Note
This function also makes the directory if it does not already exist.
Returns: The path to the cache directory. Return type: str
-
get_config_path
()[source]¶ Get the default config path.
Note
This function also makes the directory if it does not already exist.
Returns: The path to the config directory. Return type: str
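The "create the directory if it does not already exist" behaviour noted for both path helpers is commonly written with os.makedirs and exist_ok=True. The sketch below assumes a layout in which the cache lives inside the config directory, as the get_cache_path description suggests. The function names, the base_dir argument, and the directory names are hypothetical, and a temporary directory is used so nothing is written to a real home directory.

```python
import os
import tempfile


def config_path(base_dir):
    """Return the config directory under base_dir, creating it if needed."""
    path = os.path.join(base_dir, ".foreshadow")  # hypothetical directory name
    os.makedirs(path, exist_ok=True)
    return path


def cache_path(base_dir):
    """Return the cache directory inside the config directory, creating it if needed."""
    path = os.path.join(config_path(base_dir), "cache")  # hypothetical name
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    cache = cache_path(tmp)
    created = os.path.isdir(cache)
    nested = os.path.dirname(cache) == config_path(tmp)
    # Calling again is safe because exist_ok=True tolerates existing directories.
    repeatable = cache_path(tmp) == cache
```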
-
get_transformer
(class_name, source_lib=None)[source]¶ Get the transformer class from its name.
Note
In case of a name conflict, the internal transformer is preferred over the external transformer import. This should only be used in internal unit tests; get_transformer from serialization should be preferred in all other cases. This was written to decouple registration from unit testing.
Parameters: Returns: Imported class
Raises: TransformerNotFound
– If class_name could not be found in internal or external transformer library pathways.
-
check_df
(input_data, ignore_none=False, single_column=False, single_or_empty=False)[source]¶ Convert non dataframe inputs into dataframes.
Parameters: - input_data (pandas.DataFrame, numpy.ndarray, list) – input to convert
- ignore_none (bool) – allow None to pass through check_df
- single_column (bool) – check if frame is of a single column and return series
- single_or_empty (bool) – check if the frame is a single column or an empty DF.
Returns: Converted and validated input dataframes
Return type:
Raises: - ValueError – Invalid input type
- ValueError – Input dataframe must only have one column
-
check_series
(input_data)[source]¶ Convert non series inputs into series.
This function is to be used in situations where a series is expected but cannot be guaranteed to exist. For example, this function is used in the metrics package to perform computations on a column using functions that only work with series.
Note
This is not to be used in transformers as it will break the standard that enforces only DataFrames as input and output for those objects.
Parameters: input_data (iterable) – The input data
Returns: pandas.Series
Raises: - ValueError – If the data could not be processed
- ValueError – If the input is a DataFrame and has more than one column
-
check_module_installed
(name)[source]¶ Check whether a module is available for import.
Parameters: name (str) – module name Returns: Whether the module can be imported Return type: bool
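A check of this kind can be written with the standard library alone. The sketch below uses importlib.util.find_spec, which locates a module without importing it. The function name module_available is hypothetical, and this is one possible approach rather than the code behind check_module_installed.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or an empty module name.
        return False


has_json = module_available("json")
has_missing = module_available("no_such_module_xyz_123")
```

This is useful for optional dependencies: a caller can test for a package first and fall back to another backend when it is absent.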
-
check_transformer_imports
(printout=True)[source]¶ Determine which transformers were automatically imported.
Parameters: printout (bool, optional) – Whether to output to stdout Returns: A tuple of the internal transformers and the external transformers Return type: tuple(list)
-
is_transformer
(value, method='isinstance')[source]¶ Check if the class is a transformer class.
Parameters: - value – Class or instance
- method (str) – Method of checking. Options are ‘issubclass’ or ‘isinstance’
Returns: True if transformer, False if not.
Raises: ValueError – if method is neither issubclass nor isinstance
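The two checking modes map onto Python's isinstance and issubclass builtins. The sketch below shows that dispatch against a hypothetical TransformerBase class. What Foreshadow actually counts as a transformer is not stated here, so the base class and the function name are assumptions.

```python
class TransformerBase:
    """Hypothetical stand-in for whatever base class marks a transformer."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class Scaler(TransformerBase):
    """Toy transformer used to exercise the check."""


def is_transformer_sketch(value, method="isinstance"):
    """Check an instance or a class, depending on `method`."""
    if method == "isinstance":
        return isinstance(value, TransformerBase)
    if method == "issubclass":
        # Guard with isinstance(value, type) so instances return False
        # instead of making issubclass raise TypeError.
        return isinstance(value, type) and issubclass(value, TransformerBase)
    raise ValueError("method must be 'issubclass' or 'isinstance'")


instance_ok = is_transformer_sketch(Scaler())
class_ok = is_transformer_sketch(Scaler, method="issubclass")
```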
-
is_wrapped
(transformer)[source]¶ Check if a transformer is wrapped.
Parameters: transformer – A transformer instance Returns: True if transformer is wrapped, otherwise False. Return type: bool
-
dynamic_import
(attribute, module_path)[source]¶ Import attribute from module found at module_path at runtime.
Parameters: - attribute – the attribute of the module to import (class, function, …)
- module_path – the path to the module.
Returns: attribute from module_path.
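Runtime imports of this kind are usually built from importlib.import_module plus getattr. The sketch below assumes module_path is a dotted module name such as "math"; whether dynamic_import also accepts filesystem paths is not stated above. The function name import_attribute is hypothetical.

```python
import importlib


def import_attribute(attribute, module_path):
    """Import `module_path` at runtime and return its `attribute`."""
    module = importlib.import_module(module_path)
    # getattr raises AttributeError if the module lacks the attribute.
    return getattr(module, attribute)


sqrt = import_attribute("sqrt", "math")
root = sqrt(16.0)
```

Deferring the import to call time lets the module name come from configuration or serialized data rather than a static import statement.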
Mixin that configures the column sharer.
Configure the column sharer attribute if it exists.
Parameters: column_sharer – a column sharer instance