Descriptive Statistics - using Python, MS Excel

The example below illustrates how to compute statistical averages and variances of a numerical column in excel.

By statistical averages and variances we mean to compute the below:

  • Median
  • Average
  • Mean
  • Variance
  • Standard Devation
The program assumes that the 2nd column of the "details" worksheet in "marks.xls" workbook has marks obtained in Math by the students of a class.

The descriptive Statistics can also be generated using the "Data Analysis" option in MS Excel. To be able to view this option under Data Menu in Excel, ensure to check the Analysis ToolPak option under Tools --> Add-ins

Note: NaN = Not a Number
-------

from openpyxl import load_workbook
import numpy as np

wb = load_workbook(filename="marks.xlsx")
sheet = wb["details"]

rows = sheet.max_row
columns = sheet.max_column

mathmarks = np.ndarray(rows-1)
for i in range(2,rows+1):
    mathmarks[i-2] = sheet.cell(row=i, column=2).value

print("median:")
print(np.median(mathmarks))
print("median - ignoring NaNs:")
print(np.nanmedian(mathmarks))

print("weighted average:")
print(np.average(mathmarks))

print("arithmetic mean:")
print(np.mean(mathmarks))
print("arithmetic mean - ignoring NaNs:")
print(np.nanmean(mathmarks))

print("variance:")
print(np.var(mathmarks))
print("variance - ignoring NaNs:")
print(np.nanvar(mathmarks))

print("standard deviation:")
print(np.std(mathmarks))
print("standard deviation - ignoring NaNs:")
print(np.nanstd(mathmarks))

Comments

Popular posts from this blog

Santa Clara (CA) DMV - Driving Test Practice Routes

Installing Python Packages on Mac OS X using Conda

Trip to Coorg from Bangalore