class model ( )

The class that fits the stage-one regression models, predicts the stage-one results, and fits the stage-two regression models


Methods

  • __init__(self)

  • st1_fit()

  • st1_predict()

  • st2_fit()

  • choose_estimator_by_meteo_line()

  • stage2_output_report()


Attributes

(init method of model class)

direc str

Directory of the class


st1_model_results_dic dict

A dictionary consisting of st1 model results


st1_varname_list list

List of the names of independent variables in st1


st1_model_month_list list

List of desired months to model in st1


used_feature_list list

List of all used features (strings) in st1


cls_list list

A list of preprocess class objects that we wish to model in st2


all_pred pandas DataFrame

A DataFrame of all st1 predictions


predictions_monthly_list list

List of DataFrames of stage-one predictions, one per month


st2_model_month_list list

List of desired months to model in st2. The indicated months have to exist in st1_model_month_list.


dic_second_stage_names dict

Helps in generating model_var_dict in st2_fit


st2_model_results_dic dict

A dictionary consisting of st2 model results


dependent_model_selection boolean

Whether to select the best model based on a meteorological line. Only useful if there is a linear reference line (e.g., isotopes).


meteo_coef float

Coefficient of the line. Used if dependent_model_selection=True and selection_method="global_line".


meteo_intercept float

Intercept of the line. Used if dependent_model_selection=True and selection_method="global_line".


selection_method str

Used if dependent_model_selection=True. One of: "independent", "local_line", "global_line", "point_to_point".


thresh_meteoline_high_scores None type or float

Only models with scores higher than this threshold are considered. If None, it equals mean of scores + std of scores/3.


model_selection_report boolean

Whether to generate a report on the model selection method


model.st1_fit ( )

model.st1_fit(self, var_cls_list, direc, st1_model_month_list="all", args_dic={"feature_selection": "auto", "vif_threshold": 5, "vif_selection_pairs": [], "correlation_threshold": 0.87, "vif_corr": True, "p_val": 0.05})

The method to fit regression models to identified preprocess class objects in stage one


Parameters

var_cls_list list

A list of preprocess class objects to fit regression models to. Regression models will be fitted to each element of the list (a preprocess class object).


direc str

Directory of the class


st1_model_month_list str or list of integers default="all"

List of desired months to model in st1


args_dic dict default={"feature_selection":"auto","vif_threshold":5, "vif_selection_pairs":[],"correlation_threshold":0.87,"vif_corr":True,"p_val":0.05}

A dictionary of parameters that controls the behaviour of feature selection prior to regression:

  • args_dic["feature_selection"] = "manual": Statistical information will be shown to the user, and the desired features will be chosen by the user

  • args_dic["feature_selection"] = "auto": Feature selection will be done automatically

  • args_dic["vif_threshold"] = None: VIF (Variance Inflation Factor) will not be considered as a factor in feature selection

  • args_dic["vif_threshold"] = float type: A threshold to identify high VIF values

  • args_dic["vif_corr"] = True: If True, use correlation coefficient values to identify multicollinearity in features with high VIF values

  • args_dic["correlation_threshold"] = 0.87: A threshold to identify high correlation coefficient values

  • args_dic["vif_selection_pairs"] = empty list or list of list(s): If empty, feature elimination based on VIF will be automatic. If args_dic["vif_selection_pairs"] = [["a","b"]] and both "a" and "b" have high VIF values and high correlations, "b" will be eliminated
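As a rough illustration of the checks these options describe, the sketch below computes VIF values with NumPy, flags feature pairs whose VIF values and pairwise correlation are both high, and applies a vif_selection_pairs rule. It is not the class's own implementation: the helper names (vif, flag_pairs, features_to_drop) are hypothetical, the data are synthetic, and dropping the feature with the larger VIF when no pair is supplied is an assumption about what "automatic" could mean.

```python
import numpy as np

def vif(X):
    """Variance inflation factor, 1 / (1 - R^2), for each column of X."""
    X = np.asarray(X, dtype=float)
    values = []
    for j in range(X.shape[1]):
        y = X[:, j]
        # Regress column j on all other columns plus an intercept.
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        values.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(values)

def flag_pairs(X, names, vif_threshold=5, correlation_threshold=0.87):
    """Pairs whose VIF values and absolute correlation both exceed the thresholds."""
    v = vif(X)
    corr = np.corrcoef(X, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if min(v[i], v[j]) > vif_threshold and abs(corr[i, j]) > correlation_threshold:
                flagged.append((names[i], names[j]))
    return v, flagged

def features_to_drop(flagged, v, names, vif_selection_pairs=()):
    """Drop the second member of a user-given pair; otherwise (assumption) the larger VIF."""
    user_pairs = [list(p) for p in vif_selection_pairs]
    drops = set()
    for first, second in flagged:
        if [first, second] in user_pairs:
            drops.add(second)
        elif [second, first] in user_pairs:
            drops.add(first)
        else:
            i, j = names.index(first), names.index(second)
            drops.add(first if v[i] > v[j] else second)
    return drops

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)   # nearly a copy of "a"
c = rng.normal(size=200)              # unrelated feature
names = ["a", "b", "c"]
vif_values, flagged = flag_pairs(np.column_stack([a, b, c]), names)
dropped = features_to_drop(flagged, vif_values, names, vif_selection_pairs=[["a", "b"]])
print(flagged, dropped)
```

With the pair given as ["a","b"], the second member is the one removed, which mirrors the rule described for vif_selection_pairs.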


Attributes

direc str

Directory of the class


st1_model_results_dic dict

A dictionary consisting of st1 model results


st1_varname_list list

List of the names of independent variables in st1


st1_model_month_list list

List of desired months to model in st1


model.st2_fit ( )

model.st2_fit (self,model_var_dict=None, output_report=True, dependent_model_selection=False, dependent_model_selection_list=None, meteo_coef=8, meteo_intercept=10, selection_method="point_to_point", thresh_meteoline_high_scores=None, model_selection_report=True, args_dic={"feature_selection":"auto", "vif_threshold":5, "vif_selection_pairs":[],  "correlation_threshold":0.87, "vif_corr":True,"p_val":0.05}):

The method to fit regression models to identified preprocess class objects in stage two


Parameters

model_var_dict None type or dict default=None

A dictionary that determines dependent (key - string) and independent (value) features of the second stage regression models. Independent features (value) have to be a list of feature names (string).

If None, all features (independent st1 features and dependent st1 features) will be considered as independent features of second stage models.

  • EXAMPLE:

model_var_dict = {"is1":["CooZ","hmd"],"is2":["prc","hmd"],}
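Before calling st2_fit, it can help to confirm that a model_var_dict has the documented shape: string keys, each mapped to a list of feature-name strings. The helper below is a hypothetical convenience and not part of the model class; the set of available feature names is made up for the example.

```python
def check_model_var_dict(model_var_dict, available_features):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for dependent, independents in model_var_dict.items():
        if not isinstance(dependent, str):
            problems.append(f"key {dependent!r} is not a string")
        if not isinstance(independents, list) or not all(
            isinstance(name, str) for name in independents
        ):
            problems.append(f"value for {dependent!r} is not a list of strings")
            continue
        unknown = [name for name in independents if name not in available_features]
        if unknown:
            problems.append(f"{dependent!r} uses unknown features: {unknown}")
    return problems

model_var_dict = {"is1": ["CooZ", "hmd"], "is2": ["prc", "hmd"]}
available = {"CooZ", "hmd", "prc"}   # hypothetical feature names
ok = check_model_var_dict(model_var_dict, available)
bad = check_model_var_dict({"is1": "hmd", "is2": ["tmp"]}, available)
print(ok, bad)
```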

output_report boolean default=True

Whether to generate output reports


Parameters used in choose_estimator_by_meteo_line

dependent_model_selection boolean default=False

Whether to select the best model based on a (meteorological) line. Only useful if there is a linear reference line (e.g., isotopes).


dependent_model_selection_list default=None

Used if dependent_model_selection=True. List of two features that have to be used in dependent_model_selection


meteo_coef default=8

Used if dependent_model_selection=True and selection_method="global_line". Coefficient of the line


meteo_intercept default=10

Used if dependent_model_selection=True and selection_method="global_line". Intercept of the line


selection_method default="point_to_point"

Used if dependent_model_selection=True. selection_method could be:

  • independent

  • local_line: coef and intercept derived from a linear regression of observed data

  • global_line

  • point_to_point: finds the pair of models with the shortest average distance between observed and predicted data
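The difference between local_line and global_line can be sketched as follows: the first fits the reference line to the observed data, while the second takes meteo_coef and meteo_intercept as given. The function names are hypothetical and the data are invented; using the mean perpendicular distance to measure how close predictions lie to the line is an assumption, not necessarily what choose_estimator_by_meteo_line computes.

```python
import numpy as np

def reference_line(selection_method, x_obs=None, y_obs=None,
                   meteo_coef=8, meteo_intercept=10):
    """Return (coef, intercept) of the reference line for the given method."""
    if selection_method == "global_line":
        # Line supplied by the user through meteo_coef and meteo_intercept.
        return float(meteo_coef), float(meteo_intercept)
    if selection_method == "local_line":
        # Line derived from a linear regression of the observed data.
        coef, intercept = np.polyfit(x_obs, y_obs, deg=1)
        return float(coef), float(intercept)
    raise ValueError(f"no reference line for method {selection_method!r}")

def mean_distance_to_line(x_pred, y_pred, coef, intercept):
    """Mean perpendicular distance of points from y = coef * x + intercept."""
    x_pred = np.asarray(x_pred, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(coef * x_pred - y_pred + intercept) / np.hypot(coef, 1.0)))

x_obs = np.array([-10.0, -8.0, -6.0, -4.0])
y_obs = 7.5 * x_obs + 5.0                      # synthetic observations

local = reference_line("local_line", x_obs, y_obs)
global_ = reference_line("global_line")
dist_local = mean_distance_to_line(x_obs, y_obs, *local)
dist_global = mean_distance_to_line(x_obs, y_obs, *global_)
print(local, global_, dist_local, dist_global)
```

Candidate models whose predictions give a smaller distance would rank higher under this assumed measure.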


thresh_meteoline_high_scores None type or float default=None

Only models with scores higher than this threshold are considered. If None, it equals mean of scores + std of scores/3.
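A minimal sketch of the default threshold, reading the expression as mean(scores) + std(scores)/3. Both that operator precedence and the use of the population standard deviation (NumPy's default, ddof=0) are assumptions; the function name and the scores are made up for the example.

```python
import numpy as np

def default_score_threshold(scores):
    """Assumed default: mean of the scores plus one third of their standard deviation."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean() + scores.std() / 3)

scores = [0.2, 0.4, 0.6, 0.8]
threshold = default_score_threshold(scores)
# Only models scoring above the threshold stay in the comparison.
kept = [s for s in scores if s > threshold]
print(threshold, kept)
```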


model_selection_report boolean default=True

Whether to generate a report on the model selection method


args_dic dict default={"feature_selection":"auto","vif_threshold":5, "vif_selection_pairs":[], "correlation_threshold": 0.87, "vif_corr": True, "p_val":0.05}

A dictionary of parameters that controls the behaviour of feature selection prior to regression:

  • args_dic["feature_selection"] = "manual": Statistical information will be shown to the user, and the desired features will be chosen by the user

  • args_dic["feature_selection"] = "auto": Feature selection will be done automatically

  • args_dic["vif_threshold"] = None: VIF (Variance Inflation Factor) will not be considered as a factor in feature selection

  • args_dic["vif_threshold"] = float type: A threshold to identify high VIF values

  • args_dic["vif_corr"] = True: If True, use correlation coefficient values to identify multicollinearity in features with high VIF values

  • args_dic["correlation_threshold"] = 0.87: A threshold to identify high correlation coefficient values

  • args_dic["vif_selection_pairs"] = empty list or list of list(s): If empty, feature elimination based on VIF will be automatic. If args_dic["vif_selection_pairs"] = [["a","b"]] and both "a" and "b" have high VIF values and high correlations, "b" will be eliminated


Attributes

st2_model_results_dic dict

A dictionary consisting of st2 model results


Attributes used in choose_estimator_by_meteo_line

dependent_model_selection boolean

Whether to select the best model based on a meteorological line. Only useful if there is a linear reference line (e.g., isotopes).


meteo_coef float

Coefficient of the line. Used if dependent_model_selection=True and selection_method="global_line".


meteo_intercept float

Intercept of the line. Used if dependent_model_selection=True and selection_method="global_line".


selection_method str

Used if dependent_model_selection=True. One of: "independent", "local_line", "global_line", "point_to_point".


thresh_meteoline_high_scores None type or float

Only models with scores higher than this threshold are considered. If None, it equals mean of scores + std of scores/3.


model_selection_report boolean

Whether to generate a report on the model selection method


model.choose_estimator_by_meteo_line ( )

model.choose_estimator_by_meteo_line( self, dependent_model_selection_list, selection_method="point_to_point", model_selection_report=True, thresh_meteoline_high_scores=None, meteo_coef=8, meteo_intercept=10 ):

The method to select the best model based on a (meteorological) line. Only useful if there is a linear reference line (e.g., isotopes). This method can be called automatically in st2_fit if dependent_model_selection=True, or it can be called after st2_fit has run to see how the best regression models change under different criteria.

IMPORTANT NOTE: Executing this method will update the st2_model_results_dic to match the latest chosen selection_method. st2_model_results_dic stores the second stage results.


Parameters

dependent_model_selection_list list

List of two features that have to be used in dependent model selection


meteo_coef default=8

Used if dependent_model_selection=True and selection_method="global_line". Coefficient of the line


meteo_intercept default=10

Used if dependent_model_selection=True and selection_method="global_line". Intercept of the line


selection_method default="point_to_point"

Used if dependent_model_selection=True. selection_method could be:

  • "independent"

  • "local_line": coef and intercept derived from a linear regression of observed data

  • "global_line"

  • "point_to_point": finds the pair of models with the shortest average distance between observed and predicted data
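The point_to_point idea can be illustrated with a small sketch: for every combination of one candidate model per dependent feature, compute the average distance between observed and predicted points and keep the closest pair. The function name, the model labels and the data are hypothetical, and measuring distance as the Euclidean distance in the plane of the two features is an assumption about how the method works.

```python
import itertools
import numpy as np

def pick_pair_point_to_point(obs_x, obs_y, preds_x, preds_y):
    """Return (model_x, model_y, mean_distance) for the closest pair of models."""
    obs_x = np.asarray(obs_x, dtype=float)
    obs_y = np.asarray(obs_y, dtype=float)
    best = None
    for (name_x, px), (name_y, py) in itertools.product(preds_x.items(), preds_y.items()):
        dx = np.asarray(px, dtype=float) - obs_x
        dy = np.asarray(py, dtype=float) - obs_y
        # Average point-to-point distance between observed and predicted data.
        dist = float(np.mean(np.hypot(dx, dy)))
        if best is None or dist < best[2]:
            best = (name_x, name_y, dist)
    return best

obs_x, obs_y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
preds_x = {"m1": [1.1, 2.1, 3.1], "m2": [1.5, 2.5, 3.5]}   # candidates for the first feature
preds_y = {"m3": [2.0, 4.0, 6.0], "m4": [3.0, 5.0, 7.0]}   # candidates for the second feature
best = pick_pair_point_to_point(obs_x, obs_y, preds_x, preds_y)
print(best)
```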


thresh_meteoline_high_scores None type or float default=None

Only models with scores higher than this threshold are considered. If None, it equals mean of scores + std of scores/3.


model_selection_report boolean default=True

Whether to generate a report on the model selection method


Attributes

st2_model_results_dic dict

Updated dictionary of st2 model results


dependent_model_selection boolean

Whether to select the best model based on a meteorological line. Only useful if there is a linear reference line (e.g., isotopes).


meteo_coef float

Coefficient of the line. Used if dependent_model_selection=True and selection_method="global_line".


meteo_intercept float

Intercept of the line. Used if dependent_model_selection=True and selection_method="global_line".


selection_method str

Used if dependent_model_selection=True. One of: "independent", "local_line", "global_line", "point_to_point".


thresh_meteoline_high_scores None type or float

Only models with scores higher than this threshold are considered. If None, it equals mean of scores + std of scores/3.


model_selection_report boolean

Whether to generate a report on the model selection method


model.stage2_output_report ( )

model.stage2_output_report(self,direc=None):

This method updates the st2_fit output files in case the results have changed (normally this happens when the choose_estimator_by_meteo_line method is executed).


Parameters

direc str default=None

Directory of the output