R functions

calculate_ginni_index(outcome, data_frame, sample=None)

Calculates Gini impurity indices with respect to an outcome variable.

Uses the R randomForest package. See http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity

Parameters:

outcome (str) – Name of the outcome variable
data_frame (pandas.DataFrame) – Data frame with variable indices as index; this object will be modified
sample (list) – Restrict the analysis to subjects in this sample

Returns:

The input data_frame with an additional column called `Ginni` that contains the requested indices. The value will be `numpy.inf` for the outcome variable, and 0 for variables where the index couldn’t be calculated.
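
As a reminder of the definition behind these indices, the Gini impurity of a set of class labels is 1 - sum(p_k^2), where p_k is the fraction of labels in class k; randomForest's Gini importance accumulates the impurity decreases produced by splits on each variable. The stdlib-only sketch below computes the impurity and the weighted decrease for a single split. The helpers `gini_impurity` and `gini_decrease` are hypothetical illustrations of the formula, not the code that braviz or randomForest run.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(labels: Sequence[Hashable], mask: Sequence[bool]) -> float:
    """Impurity decrease obtained by splitting `labels` with a boolean `mask`.

    Each child's impurity is weighted by its share of the parent's items.
    """
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    children = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - children
```

For example, splitting `["a", "a", "b", "b"]` with the mask `[True, True, False, False]` yields two pure children, so `gini_decrease` returns 0.5, the full impurity of the parent.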

calculate_anova(outcome, regressors_data_frame, interactions_dict, sample)

Calculates an ANOVA regression

Uses the R car package with type III sums of squares
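
With a single categorical factor, type III sums of squares for that factor coincide with the ordinary one-way ANOVA decomposition, which makes the per-term quantities easy to illustrate without R. The sketch below assumes exactly one factor, given as a mapping from level to observations; it reports the sum of squares, degrees of freedom and F statistic, and omits the P value because that needs the F distribution's CDF. With several factors or interactions, car's type III tests adjust each term for all other terms in the model, which this sketch does not attempt. `one_way_anova` is a hypothetical name.

```python
from statistics import fmean
from typing import Dict, List


def one_way_anova(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum of squares, degrees of freedom and F statistic for one factor."""
    values = [v for g in groups.values() for v in g]
    grand_mean = fmean(values)
    means = {level: fmean(g) for level, g in groups.items()}
    # Factor (between-group) sum of squares
    ss_factor = sum(len(g) * (means[level] - grand_mean) ** 2 for level, g in groups.items())
    # Residual (within-group) sum of squares
    ss_resid = sum((v - means[level]) ** 2 for level, g in groups.items() for v in g)
    df_factor = len(groups) - 1
    df_resid = len(values) - len(groups)
    f_stat = (ss_factor / df_factor) / (ss_resid / df_resid)
    return {
        "ss_factor": ss_factor,
        "df_factor": df_factor,
        "ss_resid": ss_resid,
        "df_resid": df_resid,
        "F": f_stat,
    }
```

For groups `{"a": [1, 2, 3], "b": [4, 5, 6]}` it gives a factor sum of squares of 13.5 on 1 degree of freedom, a residual sum of squares of 4 on 4 degrees of freedom, and F = 13.5.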

Parameters:

outcome (str) – Name of outcome variable
regressors_data_frame (pandas.DataFrame) – A DataFrame with three columns: regressor name, degrees of freedom, and interaction. The last column should be 0 for single-variable regressors and 1 for interaction terms. See braviz.interaction.qt_models.AnovaRegressorsModel.get_data_frame()
interactions_dict (dict) – Dictionary mapping indices of interaction terms (in the previous DataFrame) to the indices of their factors.
sample (list) – List of subject ids considered during the calculation

Returns:

A tuple (output_df, residuals, intercept, fitted). output_df contains the factor name, sum of squares, degrees of freedom, F statistic and P value for each term, including an (intercept) term; residuals, intercept and fitted describe the fitted regression. See braviz.interaction.qt_models.AnovaResultsModel
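
The two inputs can be made concrete with a small sketch. It assumes, hypothetically, that each row of regressors_data_frame is a (name, degrees of freedom, interaction flag) tuple indexed by its position, and that interactions_dict maps an interaction row index to the row indices of its factors. The hypothetical `build_formula` helper turns them into an R-style model formula; it illustrates the data layout only and is not braviz's own code.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical row layout: (regressor name, degrees of freedom, interaction flag)
Row = Tuple[str, int, int]


def build_formula(outcome: str, rows: Sequence[Row],
                  interactions: Dict[int, List[int]]) -> str:
    """Builds an R-style model formula from a regressors table (hypothetical helper)."""
    terms = []
    for idx, (name, _df, is_interaction) in enumerate(rows):
        if is_interaction:
            # An interaction term is written as factor1:factor2:...
            factors = [rows[i][0] for i in interactions[idx]]
            terms.append(":".join(factors))
        else:
            terms.append(name)
    return f"{outcome} ~ " + " + ".join(terms)
```

With rows `[("a", 1, 0), ("b", 2, 0), ("a*b", 2, 1)]` and `{2: [0, 1]}`, the helper returns `"y ~ a + b + a:b"`.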

calculate_normalized_linear_regression(outcome, regressors_data_frame, interactions_dict, sample)

Calculates a linear regression after normalizing variables

It uses the standardize function from the R arm package.
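
By default, arm's standardize() rescales continuous predictors by subtracting the mean and dividing by two standard deviations (Gelman's convention), while binary inputs are only centered. The stdlib sketch below applies that rule to a single column. Whether braviz passes non-default options (for example to also standardize the outcome, which the mean_sigma field suggests) is not stated here, so treat this as an illustration under that assumption; `standardize_column` is a hypothetical name.

```python
from statistics import fmean, stdev
from typing import List, Sequence


def standardize_column(values: Sequence[float]) -> List[float]:
    """Centers a column and, unless it is binary, divides it by 2 standard deviations."""
    mean = fmean(values)
    if len(set(values)) <= 2:
        # Binary (or constant) inputs: center only, assumed to mirror arm's default
        return [v - mean for v in values]
    scale = 2 * stdev(values)  # sample SD with n - 1, like R's sd()
    return [(v - mean) / scale for v in values]
```

For `[1, 2, 3]` the mean is 2 and the sample standard deviation is 1, so the result is `[-0.5, 0.0, 0.5]`; a 0/1 column such as `[0, 1, 1, 0]` is only centered.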

Parameters:

outcome (str) – Name of outcome variable
regressors_data_frame (pandas.DataFrame) – A DataFrame with three columns: regressor name, degrees of freedom, and interaction. The last column should be 0 for single-variable regressors and 1 for interaction terms. See braviz.interaction.qt_models.AnovaRegressorsModel.get_data_frame()
interactions_dict (dict) – Dictionary mapping indices of interaction terms (in the previous DataFrame) to the indices of their factors.
sample (list) – List of subject ids considered during the calculation

Returns:

A dictionary with the following fields

coefficients_df : A data frame with 7 columns: Slope, T statistic value, P value, standard error, 95% confidence interval, r_name (variable alias used inside R) and components (names of the variables that make up the term)

residuals : vector with regression residuals

fitted : vector of fitted values

adj_r2 : Adjusted R squared

f_pval : P value of the regression F test

f_stats_val : Value of the regression F statistic

f_stat_df : Degrees of freedom of the F statistic (numerator, denominator)

data_points : Subject ids used in the calculation (after dropping NaNs)

standardized_model : DataFrame of standardized data

data : DataFrame used in the calculation (after dropping NaNs)

mean_sigma : Mean and standard deviation of the outcome

var_types : Dictionary with the type of each variable, options are “r” for real, “b” for binary, and “n” for nominal variables with more levels

dummy_levels : The text labels for each level of dummy variables (except the base level)

See Fitting & Interpreting Linear Models in R for more details on interpreting the output.
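
The scalar fields above follow the standard least-squares definitions: adjusted R squared is 1 - (1 - R^2)(n - 1)/(n - p - 1), and the F statistic compares explained to residual variance with (p, n - p - 1) degrees of freedom, where p is the number of predictors. The stdlib sketch below computes them for a single, unstandardized predictor. The dictionary keys reuse the names listed above for readability, but `simple_ols` is a hypothetical helper, not braviz's implementation, which delegates the fit to R; the P value is omitted because it requires the F distribution.

```python
from statistics import fmean
from typing import Dict, Sequence


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Dict[str, object]:
    """Least-squares fit y = intercept + slope * x with summary statistics."""
    n = len(x)
    p = 1  # number of predictors, excluding the intercept
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    df_num, df_den = p, n - p - 1
    return {
        "slope": slope,
        "intercept": intercept,
        "fitted": fitted,
        "residuals": residuals,
        "adj_r2": 1 - (1 - r2) * (n - 1) / df_den,
        "f_stats_val": ((ss_tot - ss_res) / df_num) / (ss_res / df_den),
        "f_stat_df": (df_num, df_den),
    }
```

For x = [1, 2, 3, 4] and y = [2, 4, 5, 8] the slope is 1.9, the intercept is 0, the adjusted R squared is 0.944 and f_stat_df is (1, 2).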

Utilities

import_or_error(lib_name)

Tries to import an R package. If it is not found, prints a message asking the user to install it and raises an exception.

Parameters:	lib_name (str) – Name of an R package
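
The behaviour described above is a try, report, re-raise pattern. The sketch below shows it with Python's importlib standing in for the R package loader that braviz uses through rpy2; that substitution is made only so the example runs without R, and the message wording is illustrative. For an actual R package, the remedy offered to the user would be install.packages() inside R.

```python
import importlib
from types import ModuleType


def import_or_error(lib_name: str) -> ModuleType:
    """Imports a module, or asks the user to install it and re-raises the error."""
    try:
        return importlib.import_module(lib_name)
    except ImportError:
        # Tell the user what is missing before propagating the failure
        print(f"Couldn't find the package '{lib_name}'; please install it and try again.")
        raise
```

Re-raising keeps the original exception type, so callers can still catch ImportError while the user sees an actionable message.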
