# class model ( )

Class that fits the stage-one regression models, predicts the stage-one results, and fits the stage-two regression models.

---


## **Methods**

* \_\_init__(self)

* st1_fit()

* st1_predict()

* st2_fit()

* choose_estimator_by_meteo_line()

* stage2_output_report()

---


## **Attributes**

(\_\_init\_\_ method of the model class)


**direc** str

Directory of the class

---
**st1_model_results_dic** dict

A dictionary consisting of the st1 model results

---
**st1_varname_list** list

List of the names of independent variables in st1

---
**st1_model_month_list** list

List of desired months to model in st1

---
**used_feature_list** list

List of all used features (strings) in st1

---
**cls_list** list

A list of preprocess class objects that we wish to model in st2

---
**all_pred** pandas DataFrame

A dataframe of all st1 predictions

---
**predictions_monthly_list** list

DataFrames of the stage-one predictions, one list element per month

---
**st2_model_month_list** list

List of desired months to model in st2. The indicated months have to exist in st1_model_month_list

---
**dic_second_stage_names** dict

Helps in generating model_var_dict in st2_fit

---
**st2_model_results_dic** dict

A dictionary consisting of the st2 model results

---
**dependent_model_selection** boolean

Whether to select the best model based on a meteorological line. Only useful if there is a linear reference line (e.g. isotopes)

---
**meteo_coef** float

Coefficient of the line. Used if dependent_model_selection=True and selection_method="global_line"

---
**meteo_intercept** float

Intercept of the line. Used if dependent_model_selection=True and selection_method="global_line"

---
**selection_method** str

Used if dependent_model_selection=True. One of: independent, local_line, global_line, point_to_point

---
**thresh_meteoline_high_scores** None type or float

A threshold so that only models with scores higher than this value are considered. If None, it is set to mean of scores + std of scores/3

---
**model_selection_report** boolean

True or False. Whether to generate the model selection report

---


## model.st1_fit ( )

model.st1_fit (`self,var_cls_list,direc,st1_model_month_list="all",args_dic= { "feature_selection" : "auto" , "vif_threshold" : 5, "vif_selection_pairs":[],"correlation_threshold":0.87,"vif_corr":True,"p_val":0.05}`)

The method to fit regression models to identified preprocess class objects in stage one 

---


### **Parameters**

**var_cls_list** list

A list of preprocess class objects to fit regression models to. Regression models will be fitted to each element of the list (a preprocess class object).

---
**direc** str

Directory of the class

---
**st1_model_month_list** str or list of integers default=`"all"`

List of desired months to model in st1
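
As a minimal sketch of how a value such as `"all"` or a list of integers could be turned into an explicit month list: the helper name `resolve_month_list` is hypothetical and not part of the class, and the numbering of months from 1 to 12 is an assumption.

```python
def resolve_month_list(st1_model_month_list="all"):
    """Return an explicit, sorted list of month numbers (hypothetical helper)."""
    if isinstance(st1_model_month_list, str):
        # Assumption: "all" is the only accepted string value.
        if st1_model_month_list != "all":
            raise ValueError('the only accepted string is "all"')
        return list(range(1, 13))
    # Remove duplicates and sort the requested months.
    months = sorted(set(int(m) for m in st1_model_month_list))
    # Assumption: months are numbered 1 (January) to 12 (December).
    invalid = [m for m in months if not 1 <= m <= 12]
    if invalid:
        raise ValueError(f"invalid month numbers: {invalid}")
    return months


winter_months = resolve_month_list([12, 1, 2])
```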

---
**args_dic** dict default={`"feature_selection":"auto","vif_threshold":5, "vif_selection_pairs":[],"correlation_threshold":0.87,"vif_corr":True,"p_val":0.05}`

A dictionary of parameters that controls the behaviour of feature selection prior to the regressions:
    
* args_dic[`"feature_selection"`] ="manual": Statistical information will be shown to the user, and the desired features will be
chosen by the user

* args_dic[`"feature_selection"`] ="auto": Feature selection will be done automatically

* args_dic[`"vif_threshold"`] =None: VIF (Variance Inflation Factor) will not be considered as a factor in feature selection

* args_dic[`"vif_threshold"`] = float type: A threshold to identify high VIF values

* args_dic[`"vif_corr"`] = True: If True, use correlation coefficient values to identify multicollinearity in features with high VIF values

* args_dic[`"correlation_threshold"`] = 0.87: A threshold to identify high correlation coefficient values

* args_dic[`"vif_selection_pairs"`] = empty list or list of list(s): If empty, feature elimination based on VIF will be automatic.

  If args_dic[`"vif_selection_pairs"`] = `[["a", "b"]]` and both `"a"` and `"b"` have high VIF values and a high correlation, feature `"b"` will be eliminated.
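
The VIF and correlation thresholds described above can be illustrated with a small NumPy sketch. This is not the class's own implementation: the function names `vif_values` and `flag_collinear_pairs` are hypothetical, and the rule used here (flag a pair when both features exceed `vif_threshold` and their absolute correlation exceeds `correlation_threshold`) is an assumed reading of the options.

```python
import numpy as np


def vif_values(X):
    """Return the variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    vifs = np.empty(n_features)
    for j in range(n_features):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on all other columns plus an intercept.
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # VIF = 1 / (1 - R^2); a perfect fit gives an infinite VIF.
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def flag_collinear_pairs(X, names, vif_threshold=5, correlation_threshold=0.87):
    """List feature pairs that both have a high VIF and are highly correlated."""
    vifs = vif_values(X)
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            high_vif = vifs[i] > vif_threshold and vifs[j] > vif_threshold
            if high_vif and abs(corr[i, j]) > correlation_threshold:
                pairs.append((names[i], names[j]))
    return pairs


# Synthetic data: "b" is almost a copy of "a", "c" is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)
c = rng.normal(size=200)
flagged = flag_collinear_pairs(np.column_stack([a, b, c]), ["a", "b", "c"])
```

In this synthetic case only the pair `("a", "b")` is flagged, which is the situation where an entry such as `["a", "b"]` in `vif_selection_pairs` would decide which of the two features is dropped.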

---


### **Attributes**

**direc** str

Directory of the class

---
**st1_model_results_dic** dict

A dictionary consisting of the st1 model results

---
**st1_varname_list** list

List of the names of independent variables in st1

---
**st1_model_month_list** list

List of desired months to model in st1

---


## model.st2_fit ( )

model.st2_fit (`self,model_var_dict=None, output_report=True, dependent_model_selection=False, dependent_model_selection_list=None, meteo_coef=8, meteo_intercept=10, selection_method="point_to_point", thresh_meteoline_high_scores=None, model_selection_report=True, args_dic={"feature_selection":"auto", "vif_threshold":5, "vif_selection_pairs":[], 
"correlation_threshold":0.87, "vif_corr":True,"p_val":0.05}`):
  
The method to fit the second-stage regression models

---


### **Parameters**

**model_var_dict** None type or dict  default=`None`

A dictionary that determines dependent (key - string) and independent (value) features of the second stage regression models.
Independent features (value) have to be a list of feature names (string).

If `None`, all features (independent st1 features and dependent st1 features) will be
considered as independent features of second stage models.

* *EXAMPLE:*
```python
model_var_dict = {"is1":["CooZ","hmd"],"is2":["prc","hmd"],}
```
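
Before passing such a dictionary to `st2_fit`, its shape can be checked with a few lines of plain Python. The helper `check_model_var_dict` and the `available_features` list below are hypothetical and only illustrate the documented structure (string keys, lists of feature-name strings as values); the class itself may validate its input differently.

```python
def check_model_var_dict(model_var_dict, available_features):
    """Raise ValueError unless the mapping is {str: [str, ...]} over known features."""
    for dependent, independents in model_var_dict.items():
        if not isinstance(dependent, str):
            raise ValueError(f"dependent feature name must be a string: {dependent!r}")
        if not isinstance(independents, list) or not all(
            isinstance(name, str) for name in independents
        ):
            raise ValueError(f"independent features of {dependent!r} must be a list of strings")
        # Assumption: every independent feature must be one of the known names.
        unknown = [name for name in independents if name not in available_features]
        if unknown:
            raise ValueError(f"unknown features for {dependent!r}: {unknown}")
    return True


model_var_dict = {"is1": ["CooZ", "hmd"], "is2": ["prc", "hmd"]}
is_valid = check_model_var_dict(model_var_dict, ["CooZ", "hmd", "prc"])
```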
---
**output_report** boolean default=`True`

To generate output reports

---
*Parameters used in choose_estimator_by_meteo_line* 

**dependent_model_selection** boolean default=`False`

Whether to select the best model based on a (meteorological) line. Only useful if there is a linear reference line (e.g. isotopes)

---
**dependent_model_selection_list** default=`None`

Used if dependent_model_selection=`True`. List of two features that have to be used in `dependent_model_selection`

---
**meteo_coef** default=`8`

Used if dependent_model_selection=`True` and selection_method=`"global_line"`. Coefficient of the line

---
**meteo_intercept** default=`10`

Used if dependent_model_selection=`True` and selection_method=`"global_line"`. Intercept of the line

---
**selection_method** default=`"point_to_point"`

Used if `dependent_model_selection`=`True`. `selection_method` can be:

* `"independent"`
* `"local_line"`: coefficient and intercept derived from a linear regression of the observed data
* `"global_line"`: coefficient and intercept given by `meteo_coef` and `meteo_intercept`
* `"point_to_point"`: find the pair of models with the shortest average distance between observed and predicted data
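
The difference between a local and a global reference line can be sketched as follows. The function names are hypothetical, and measuring closeness as the mean perpendicular distance of the predicted points to the line is an assumption; the class may score candidate models differently.

```python
import numpy as np


def local_line(x_obs, y_obs):
    """Fit y = coef * x + intercept to the observed data by least squares."""
    coef, intercept = np.polyfit(x_obs, y_obs, deg=1)
    return float(coef), float(intercept)


def mean_distance_to_line(x, y, coef, intercept):
    """Mean perpendicular distance of the points (x, y) to y = coef * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.abs(coef * x - y + intercept) / np.sqrt(coef**2 + 1.0)))


# Observed values lying exactly on y = 8x + 10 (the documented defaults).
x_obs = np.array([-10.0, -5.0, 0.0])
y_obs = 8.0 * x_obs + 10.0

coef_local, intercept_local = local_line(x_obs, y_obs)               # local line
dist_global = mean_distance_to_line(x_obs, y_obs, coef=8, intercept=10)  # global line
```

With a local line the coefficients come from the observations themselves, whereas a global line uses the fixed `meteo_coef` and `meteo_intercept` values.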

---
**thresh_meteoline_high_scores** None type or float default=`None`

A threshold so that only models with scores higher than this value are considered. If `None`, it is set to mean of scores + std of scores/3
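
A small sketch of the default threshold, under two assumptions that the text does not settle: the expression is read as mean + (std / 3), and the standard deviation is NumPy's default population value (`ddof=0`). The function name is hypothetical.

```python
import numpy as np


def default_score_threshold(scores):
    """Assumed default: mean of the scores plus one third of their standard deviation."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean() + scores.std() / 3)


scores = [0.5, 0.7, 0.9]
threshold = default_score_threshold(scores)
# Keep only the models whose score is above the threshold.
kept_scores = [s for s in scores if s > threshold]
```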

---
**model_selection_report** boolean default =`True`

Whether to generate the model selection report

---
**args_dic** dict default=`{"feature_selection":"auto","vif_threshold":5, "vif_selection_pairs":[], "correlation_threshold": 0.87, "vif_corr": True, "p_val":0.05}`

A dictionary of parameters that controls the behaviour of feature selection prior to the regressions:
    
* args_dic[`"feature_selection"`] =`"manual"`: Statistical information will be shown to the user, and the desired features will be
chosen by the user

* args_dic[`"feature_selection"`] =`"auto"`: Feature selection will be done automatically

* args_dic[`"vif_threshold"`] =None: VIF (Variance Inflation Factor) will not be considered as a factor in feature selection

* args_dic[`"vif_threshold"`] = float type: A threshold to identify high VIF values

* args_dic[`"vif_corr"`] = `True`: If True, use correlation coefficient values to identify multicollinearity in features with high VIF values

* args_dic[`"correlation_threshold"`] = `0.87`: A threshold to identify high correlation coefficient values

* args_dic[`"vif_selection_pairs"`] = empty list or list of list(s): If empty, feature elimination based on VIF will be automatic.

  If args_dic[`"vif_selection_pairs"`] = `[["a", "b"]]` and both `"a"` and `"b"` have high VIF values and a high correlation, feature `"b"` will be eliminated.
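
Because `args_dic` is passed as one dictionary, changing a single option means supplying the other keys as well. Whether the class fills in missing keys is not stated here, so the sketch below builds a complete dictionary from the documented defaults; the helper `build_args_dic` is hypothetical and not part of the class.

```python
import copy

# Documented default values of args_dic.
DEFAULT_ARGS_DIC = {
    "feature_selection": "auto",
    "vif_threshold": 5,
    "vif_selection_pairs": [],
    "correlation_threshold": 0.87,
    "vif_corr": True,
    "p_val": 0.05,
}


def build_args_dic(**overrides):
    """Return a complete args_dic: the documented defaults plus the given overrides."""
    unknown = set(overrides) - set(DEFAULT_ARGS_DIC)
    if unknown:
        raise KeyError(f"unknown args_dic keys: {sorted(unknown)}")
    # Deep copy so the mutable default list is never shared between calls.
    args_dic = copy.deepcopy(DEFAULT_ARGS_DIC)
    args_dic.update(overrides)
    return args_dic


# Disable the VIF check and relax the correlation threshold.
args_dic = build_args_dic(vif_threshold=None, correlation_threshold=0.9)
```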

---


### **Attributes**

**st2_model_results_dic** dict

A dictionary consisting of the st2 model results

---
*Attributes used in choose_estimator_by_meteo_line* 

**dependent_model_selection** boolean

Whether to select the best model based on a meteorological line. Only useful if there is a linear reference line (e.g. isotopes)

---
**meteo_coef** float

Coefficient of the line. Used if dependent_model_selection=`True` and selection_method=`"global_line"`

---
**meteo_intercept** float

Intercept of the line. Used if dependent_model_selection=`True` and selection_method=`"global_line"`

---
**selection_method** str

Used if dependent_model_selection=`True`. One of `"independent"`, `"local_line"`, `"global_line"`, `"point_to_point"`

---
**thresh_meteoline_high_scores** None type or float

A threshold so that only models with scores higher than this value are considered. If `None`, it is set to mean of scores + std of scores/3

---
**model_selection_report** boolean

Whether to generate the model selection report

---


## model.choose_estimator_by_meteo_line ( )


model.choose_estimator_by_meteo_line( `self, dependent_model_selection_list, selection_method="point_to_point", model_selection_report=True, thresh_meteoline_high_scores=None, meteo_coef=8, meteo_intercept=10` ):

The method to select the best model based on a (meteorological) line. Only useful if there is a linear reference line (e.g. isotopes).
This method can be called automatically in st2_fit if dependent_model_selection=True, or it can be called after st2_fit has been executed
to see how the best regression models change under different criteria.

*IMPORTANT NOTE:*  Executing this method will update the st2_model_results_dic to match the latest chosen selection_method. st2_model_results_dic stores the second stage results.

---


### **Parameters**

**dependent_model_selection_list** default=`None`

Used if dependent_model_selection=True. List of two features that have to be used in dependent_model_selection

---
**meteo_coef** default=8

Used if dependent_model_selection=`True` and selection_method=`"global_line"`. Coefficient of the line

---
**meteo_intercept** default=10

Used if dependent_model_selection=`True` and selection_method=`"global_line"`. Intercept of the line

---
**selection_method** default=`"point_to_point"`

Used if dependent_model_selection=`True`. `selection_method` can be:

* `"independent"`
* `"local_line"`: coef and intercept derived from a linear regression of observed data
* `"global_line"`
* `"point_to_point"`: find the pair of models with the shortest average distance between observed and predicted data
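
The `"point_to_point"` idea can be sketched as a search over candidate model pairs. This is an illustration, not the class's code: the function name, the dictionaries of candidate predictions, and the use of the Euclidean distance between each observed and predicted point are all assumptions.

```python
import itertools

import numpy as np


def best_pair_point_to_point(obs_a, obs_b, preds_a, preds_b):
    """Pick the model pair whose predicted points lie closest to the observed ones.

    preds_a and preds_b map a model name to its predictions for the first
    and second dependent feature, respectively.
    """
    obs_a = np.asarray(obs_a, dtype=float)
    obs_b = np.asarray(obs_b, dtype=float)
    best = None
    for (name_a, pred_a), (name_b, pred_b) in itertools.product(
        preds_a.items(), preds_b.items()
    ):
        # Average Euclidean distance between observed and predicted points.
        dist = float(
            np.mean(
                np.hypot(
                    np.asarray(pred_a, dtype=float) - obs_a,
                    np.asarray(pred_b, dtype=float) - obs_b,
                )
            )
        )
        if best is None or dist < best[2]:
            best = (name_a, name_b, dist)
    return best


# Made-up observations and candidate predictions for two dependent features.
obs_a, obs_b = [1, 2, 3], [10, 20, 30]
preds_a = {"lin": [1, 2, 3], "rf": [2, 3, 4]}
preds_b = {"lin": [10, 20, 31], "rf": [10, 20, 30]}
best_pair = best_pair_point_to_point(obs_a, obs_b, preds_a, preds_b)
```

Here the best pair combines the `"lin"` model for the first feature with the `"rf"` model for the second, since together they reproduce the observed points exactly.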

---
**thresh_meteoline_high_scores** None type or float default=None

A threshold so that only models with scores higher than this value are considered. If `None`, it is set to mean of scores + std of scores/3

---
**model_selection_report** boolean default =`True`

`True` or `False`. Whether to generate the model selection report

---


### **Attributes**

**st2_model_results_dic** dict

Updated dictionary of st2 model results

---
**dependent_model_selection** boolean

Whether to select the best model based on a meteorological line. Only useful if there is a linear reference line (e.g. isotopes)

---
**meteo_coef** float

Coefficient of the line. Used if dependent_model_selection=`True` and selection_method=`"global_line"`

---
**meteo_intercept** float

Intercept of the line. Used if dependent_model_selection=`True` and selection_method=`"global_line"`

---
**selection_method** str

Used if dependent_model_selection=`True`. One of `"independent"`, `"local_line"`, `"global_line"`, `"point_to_point"`

---
**thresh_meteoline_high_scores** None type or float

A threshold so that only models with scores higher than this value are considered. If `None`, it is set to mean of scores + std of scores/3

---
**model_selection_report** boolean

`True` or `False`. Whether to generate the model selection report

---


## model.stage2_output_report ( )

model.stage2_output_report(self,direc=None):

This method updates the st2_fit output files when the second-stage results have changed.
(Normally this happens when the choose_estimator_by_meteo_line method has been executed)

---


### **Parameters**

**direc** str default=`None`

Directory of the output