# Marking Statistically Significant Values using Pandas

Wed 21 May 2014 by Eoin Travers

While writing a results section, I had some data collated using the Pandas library in Python. I wanted to display the mean for a number of groups and show whether each mean differed significantly from chance (.5).

Calculating the means and running the binomial test is simple.
I'll demonstrate with a data set from UCLA,
the details of which aren't important, but I'm going to look at average
`admit`, grouped by `rank`.

```
import numpy as np
import pandas as pd
from scipy import stats

# Data courtesy of http://www.ats.ucla.edu/stat/r/dae/logit.htm
data = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
data_grouped = data.groupby('rank')  # Grouped values
data_means = data_grouped.mean()  # Mean values
# Number of values in the first group (assuming all groups to be equal)
N = data_grouped.count().admit.iloc[0]
# Run a binomial test for each group
# m*N = Mean accuracy * Number of trials = Total accuracy
# .5 = Chance.
# stats.binomtest (SciPy 1.7+) replaces the older stats.binom_test and needs
# an integer count, so m*N is truncated to a whole number of successes.
data_means['p'] = [stats.binomtest(int(m * N), int(N), .5).pvalue
                   for m in data_means.admit]
print(np.round(data_means[['admit', 'p']], 3))
```

```
      admit      p
rank
1     0.541  0.609
2     0.358  0.020
3     0.231  0.000
4     0.179  0.000

[4 rows x 2 columns]
```
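The code above uses the size of the first group as `N` for every rank. If the groups differ in size, each test can use its own count instead. Here is a minimal sketch of that idea, assuming SciPy 1.7 or later (for `stats.binomtest`) and a small made-up data frame rather than the UCLA file; `toy` and `group_binom_p` are hypothetical names:

```python
import pandas as pd
from scipy import stats

# Made-up binary outcomes for two groups of different sizes (not the UCLA data)
toy = pd.DataFrame({
    'rank':  [1] * 10 + [2] * 20,
    'admit': [1] * 8 + [0] * 2 + [1] * 10 + [0] * 10,
})

def group_binom_p(outcomes, chance=0.5):
    """Two-sided binomial test of a 0/1 series against `chance`."""
    k = int(outcomes.sum())    # number of successes in this group
    n = int(outcomes.count())  # number of trials in this group
    return stats.binomtest(k, n, chance).pvalue

# Mean and group size per rank, then one p value per rank
summary = toy.groupby('rank')['admit'].agg(['mean', 'count'])
summary['p'] = toy.groupby('rank')['admit'].apply(group_binom_p)
print(summary.round(3))
```

Because the successes and trials are counted within each group, nothing has to be assumed about the groups being the same size.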

To output this the way you would expect in a publication, I used the following
function, which takes a Pandas `dataframe`, a list of value column names, a
list of p value column names, and the number of decimal places to round the output to.
The inputs are lists, rather than single values, so you can enter several
columns of each.

```
def mark_sig(df, val_cols, p_cols, round_to=3):
    df = df.copy()  # Don't modify the original data
    mapper = {1: '', .1: ' .', .05: ' *', .01: ' **', .001: ' ***', .0001: ' ***'}
    possible_p = [.0001, .001, .01, .05, .1, 1]
    for val_col, p_col in zip(val_cols, p_cols):
        # For each value/p value pairing...
        marked = []
        for val, p_val in zip(df[val_col], df[p_col]):
            # For every row...
            text = str(np.round(val, round_to))
            for p in possible_p:
                # Check if the p value is below any of those on the list
                if p_val < p:
                    # If so, add the appropriate asterisks
                    text += mapper[p]
                    break
            marked.append(text)
        # Replace the whole column at once, which avoids chained assignment
        df[val_col] = marked
    print(df[val_cols])  # Only print the value columns

mark_sig(data_means, ['admit'], ['p'])
```

```
          admit
rank
1         0.541
2       0.358 *
3     0.231 ***
4     0.179 ***
```
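The row-by-row loop in `mark_sig` can also be written without explicit iteration. This is a rough sketch, assuming `pd.cut` with right-open bins reproduces the same cut-offs as the `<` comparisons in the function; `sig_stars` is a hypothetical helper name, and the p values in `toy` are made up for illustration:

```python
import pandas as pd

def sig_stars(p_values):
    """Map a Series of p values to the markers used by mark_sig."""
    # Right-open bins: [0, .001) -> ***, [.001, .01) -> **, and so on
    bins = [0, .001, .01, .05, .1, float('inf')]
    labels = [' ***', ' **', ' *', ' .', '']
    return pd.cut(p_values, bins=bins, labels=labels, right=False).astype(str)

# Illustrative values only; the p column is invented
toy = pd.DataFrame(
    {'admit': [0.541, 0.358, 0.231, 0.179],
     'p': [0.609, 0.020, 0.0004, 0.00001]},
    index=pd.Index([1, 2, 3, 4], name='rank'),
)

# Round the values, turn them into text, and append the marker for each row
marked = toy['admit'].round(3).astype(str) + sig_stars(toy['p'])
print(marked)
```

Missing p values are not handled here, so they would need filtering out or filling in first.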

Feel free to use and modify this as you wish, although I'm sure there are nicer ways of doing this built into some R and Python packages that give this kind of output.

PS: Analysing the example data set in this way doesn't make any sense: it's used purely for illustrative purposes.