R functions

calculate_ginni_index(outcome, data_frame, sample=None)

Calculates Gini impurity indices with respect to an outcome variable

See http://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity for background. Uses the R randomForest package.

Parameters:
  • outcome (str) – Name of the outcome variable
  • data_frame (pandas.DataFrame) – DataFrame with variable indices as its index; this object is modified in place
  • sample (list) – Restrict the analysis to subjects in this sample
Returns:

The input data_frame with an additional column called Ginni that contains the requested indices. The value will be numpy.inf for the outcome variable, and 0 for variables where the index couldn’t be calculated.
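As background for the description above, here is a minimal Python sketch of the Gini impurity formula from the linked article, together with the documented conventions for the Ginni column (infinity for the outcome, 0 where no index is available). It does not reproduce how the index is derived from the random forest; the helper names and the plain-dict representation are assumptions made for illustration only.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a sequence of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def apply_column_conventions(raw_scores, outcome, variables):
    """Hypothetical helper: fill a 'Ginni'-like column following the documented rules.

    raw_scores maps variable -> score; variables missing from it are treated
    as "could not be calculated".
    """
    column = {}
    for var in variables:
        if var == outcome:
            column[var] = math.inf  # documented as numpy.inf for the outcome
        else:
            column[var] = raw_scores.get(var, 0.0)  # 0 when unavailable
    return column
```

A balanced two-class sample has impurity 0.5 and a pure sample has impurity 0, which is the quantity tree-based methods try to reduce at each split.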

calculate_anova(outcome, regressors_data_frame, interactions_dict, sample)

Calculates an ANOVA regression

Uses the R car package with type III sums of squares

Parameters:
  • outcome (str) – Name of outcome variable
  • regressors_data_frame (pandas.DataFrame) – A DataFrame with three columns: regressor name, degrees of freedom, and interaction. The last column should be 0 for single-variable regressors and 1 for interaction terms. See braviz.interaction.qt_models.AnovaRegressorsModel.get_data_frame()
  • interactions_dict (dict) – Dictionary mapping indices of interaction terms (in the previous DataFrame) to the indices of its factors.
  • sample (list) – List of subject ids considered during the calculation
Returns:

(output_df, residuals, intercept, fitted) – output_df lists, for each term, the factor name, sum of squares, degrees of freedom, F statistic and P value; it includes an (intercept) term. residuals, intercept and fitted are parameters of the regression. See braviz.interaction.qt_models.AnovaResultsModel
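To make the relationship between regressors_data_frame and interactions_dict concrete, the following sketch models the three documented columns as plain tuples and resolves each interaction term to the names of its factors. The variable names, the positional indices and the helper function are hypothetical; the real input is a pandas.DataFrame whose index values may differ.

```python
# Hypothetical rows mirroring the three documented columns:
# (regressor name, degrees of freedom, interaction flag)
regressors = [
    ("age", 1, 0),         # index 0: single-variable regressor
    ("gender", 1, 0),      # index 1: single-variable regressor
    ("age*gender", 1, 1),  # index 2: interaction term
]

# Maps the index of each interaction term to the indices of its factors.
interactions_dict = {2: [0, 1]}


def interaction_factor_names(regressors, interactions_dict):
    """Return {interaction term name: [factor names]} for every interaction."""
    names = {}
    for term_index, factor_indices in interactions_dict.items():
        name, _dof, is_interaction = regressors[term_index]
        if not is_interaction:
            raise ValueError(f"row {term_index} is not flagged as an interaction")
        names[name] = [regressors[i][0] for i in factor_indices]
    return names
```

Here interaction_factor_names(regressors, interactions_dict) resolves the single interaction row to its two factors, "age" and "gender".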

calculate_normalized_linear_regression(outcome, regressors_data_frame, interactions_dict, sample)

Calculates a linear regression after normalizing variables

It uses the standardize function from the R arm package

Parameters:
  • outcome (str) – Name of outcome variable
  • regressors_data_frame (pandas.DataFrame) – A DataFrame with three columns: regressor name, degrees of freedom, and interaction. The last column should be 0 for single-variable regressors and 1 for interaction terms. See braviz.interaction.qt_models.AnovaRegressorsModel.get_data_frame()
  • interactions_dict (dict) – Dictionary mapping indices of interaction terms (in the previous DataFrame) to the indices of its factors.
  • sample (list) – List of subject ids considered during the calculation
Returns:

A dictionary with the following fields

  • coefficients_df : A data frame with 7 columns: slope, T statistic value, P value, standard error, 95% confidence interval, r_name (variable alias used inside R) and components (names of the variables that make up the term)
  • residuals : vector with regression residuals
  • fitted : vector of fitted values
  • adj_r2 : Adjusted R squared
  • f_pval : Regression fit p value
  • f_stats_val : Value of the regression F statistic
  • f_stat_df : Degrees of freedom of the F statistic (numerator, denominator)
  • data_points : Subject ids used in the calculation (after dropping nans)
  • standardized_model : DataFrame of standardized data
  • data : DataFrame used in the calculation (after dropping nans)
  • mean_sigma : Mean and standard deviation of the outcome
  • var_types : Dictionary with the type of each variable; options are “r” for real, “b” for binary, and “n” for nominal variables with more than two levels
  • dummy_levels : The text labels for each level of dummy variables (except the base level)

See Fitting & Interpreting Linear Models in R for more details on the interpretation of the output.
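The dummy_levels field lists a label for every level of a nominal variable except the base level. As an illustration of that idea, here is a small sketch of treatment (dummy) coding in plain Python. It assumes the first listed level is the base, which is an assumption for this example and not something the documentation states; the function name is hypothetical.

```python
def dummy_code(values, levels):
    """Treatment coding: one 0/1 indicator per level, except the base level.

    levels[0] is taken as the base level (an assumption of this sketch),
    so it gets no indicator of its own.
    """
    _base, *others = levels
    return {
        label: [1 if value == label else 0 for value in values]
        for label in others
    }
```

For a variable with levels "low", "mid" and "high", this produces indicators for "mid" and "high" only; an observation at the base level "low" is represented by zeros in both.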

Utilities

import_or_error(lib_name)

Tries to import an R package. If the package is not found, prints a message asking the user to install it and raises an exception.

Parameters:
  • lib_name (str) – Name of an R package
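The try, report, re-raise pattern described above could look roughly like the sketch below. It is not the actual implementation: the function name, the extra importer argument, the message text and the choice of ImportError are all assumptions. In a setup based on rpy2, the importer might be rpy2.robjects.packages.importr, but any callable that raises on a missing package works for the sketch.

```python
def import_or_error_sketch(lib_name, importer):
    """Try to load an R package through `importer`; report and raise on failure.

    `importer` is a hypothetical injection point: a callable that takes the
    package name and raises an exception when the package is unavailable.
    """
    try:
        return importer(lib_name)
    except Exception as exc:
        # Ask the user to install the package, then signal the failure.
        print(f"R package '{lib_name}' was not found. Please install it, "
              f"for example with install.packages('{lib_name}') in R.")
        raise ImportError(f"missing R package: {lib_name}") from exc
```

Passing the importer as an argument keeps the sketch runnable without an R installation and makes the failure path easy to exercise with a stub.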