R functions
-
calculate_ginni_index(outcome, data_frame, sample=None)
Calculates Gini impurity indices with respect to an outcome variable (see
http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity). Uses the R randomForest package.
Parameters: - outcome (str) – Name of the outcome variable
- data_frame (pandas.DataFrame) – DataFrame with variable indices as its index; this object will be modified
- sample (list) – Restrict the analysis to subjects in this sample
Returns: The input data_frame with an additional column called Ginni that contains the requested indices. The value will be numpy.inf for the outcome variable, and 0 for variables where the index couldn’t be calculated.
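As a reminder of the quantity behind these indices, the Gini impurity of a set of labels is 1 minus the sum of the squared class proportions. The sketch below only illustrates that definition; it is not the random-forest computation that calculate_ginni_index delegates to R, and the helper name gini_impurity is hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k ** 2) over the class proportions p_k of labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


pure = gini_impurity(["a", "a", "a", "a"])   # a single class: no impurity
mixed = gini_impurity(["a", "a", "b", "b"])  # two balanced classes
print(pure, mixed)
```

A split on a variable is useful to the extent that it lowers this impurity in the resulting groups, which is the idea random-forest importance measures build on.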
-
calculate_anova(outcome, regressors_data_frame, interactions_dict, sample)
Calculates an ANOVA regression. Uses the R car package with type 3 sums of squares.
Parameters: - outcome (str) – Name of outcome variable
- regressors_data_frame (pandas.DataFrame) – A DataFrame with three columns: regressor name, degrees of freedom, and interaction. The last column should hold 0 for single-variable regressors and 1 for interaction terms. See braviz.interaction.qt_models.AnovaRegressorsModel.get_data_frame()
- interactions_dict (dict) – Dictionary mapping the index of each interaction term (in the previous DataFrame) to the indices of its factors.
- sample (list) – List of subject ids considered during the calculation
Returns: A tuple (output_df, residuals, intercept, fitted). output_df contains the factor name, sum of squares, degrees of freedom, F statistic and P value of each term, including an (intercept) term. residuals, intercept and fitted are parameters of the regression. See braviz.interaction.qt_models.AnovaResultsModel
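The relationship between regressors_data_frame and interactions_dict can be easier to see with concrete values. The sketch below uses plain Python records in place of DataFrame rows; the variable names, the key names (name, dof, interaction) and the helper term_label are hypothetical, and the R-style term labels are only an illustration, not necessarily what calculate_anova builds internally.

```python
# Rows stand in for regressors_data_frame, keyed by row index.
# Key names and variable names here are illustrative assumptions.
regressors = {
    0: {"name": "age", "dof": 1, "interaction": 0},
    1: {"name": "gender", "dof": 1, "interaction": 0},
    2: {"name": "age*gender", "dof": 1, "interaction": 1},
}

# Maps the index of each interaction row to the indices of its factors.
interactions_dict = {2: [0, 1]}


def term_label(index):
    """Return an R-style label for the regressor at the given row index."""
    row = regressors[index]
    if row["interaction"] == 0:
        return row["name"]
    # Interaction terms are described through their factors.
    return ":".join(regressors[i]["name"] for i in interactions_dict[index])


terms = [term_label(i) for i in sorted(regressors)]
print(terms)
```

Every row flagged with interaction equal to 1 needs a matching key in interactions_dict, and the listed factor indices should point at single-variable rows.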
-
calculate_normalized_linear_regression(outcome, regressors_data_frame, interactions_dict, sample)
Calculates a linear regression after normalizing variables. It uses the standardize function from the R arm package.
Parameters: - outcome (str) – Name of outcome variable
- regressors_data_frame (pandas.DataFrame) – A DataFrame with three columns: regressor name, degrees of freedom, and interaction. The last column should hold 0 for single-variable regressors and 1 for interaction terms. See braviz.interaction.qt_models.AnovaRegressorsModel.get_data_frame()
- interactions_dict (dict) – Dictionary mapping the index of each interaction term (in the previous DataFrame) to the indices of its factors.
- sample (list) – List of subject ids considered during the calculation
Returns: A dictionary with the following fields:
- coefficients_df: A data frame with 7 columns: slope, T statistic value, P value, standard error, 95% confidence interval, r_name (variable alias used inside R) and components (names of the variables that make up the term)
- residuals: Vector of regression residuals
- fitted: Vector of fitted values
- adj_r2: Adjusted R squared
- f_pval: Regression fit P value
- f_stats_val: Value of the regression F statistic
- f_stat_df: Degrees of freedom of the F statistic (numerator, denominator)
- data_points: Subject ids used in the calculation (after dropping NaNs)
- standardized_model: DataFrame of standardized data
- data: DataFrame used in the calculation (after dropping NaNs)
- mean_sigma: Mean and standard deviation of the outcome
- var_types: Dictionary with the type of each variable; options are “r” for real, “b” for binary, and “n” for nominal variables with more than two levels
- dummy_levels: The text labels for each level of dummy variables (except the base level)
See Fitting & Interpreting Linear Models in R for more details on the interpretation of the output.
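For context on the normalization step: by default, arm's standardize rescales numeric inputs that take more than two values by subtracting the mean and dividing by two standard deviations, and centers binary inputs. The sketch below reproduces only that rescaling for a single numeric variable in plain Python; the helper name rescale_two_sd and the sample values are hypothetical, and the options this function actually passes to R are not documented here.

```python
import statistics


def rescale_two_sd(values):
    """Center values and divide by two sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1), as in R's sd()
    return [(v - mean) / (2 * sd) for v in values]


ages = [8.0, 9.5, 10.0, 11.5, 13.0]  # made-up values for illustration
scaled = rescale_two_sd(ages)
print(statistics.mean(scaled), statistics.stdev(scaled))
```

After this rescaling the variable has mean 0 and standard deviation 0.5, so its slope describes the change in the outcome for a two-standard-deviation change in the input, which makes it roughly comparable to the slope of a centered binary variable.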