Descriptive Statistics - using Python, MS Excel
The example below illustrates how to compute statistical averages and variances of a numerical column in excel.
By statistical averages and variances we mean to compute the below:
By statistical averages and variances we mean to compute the below:
- Median
- Average
- Mean
- Variance
- Standard Devation
The program assumes that the 2nd column of the "details" worksheet in "marks.xls" workbook has marks obtained in Math by the students of a class.
The descriptive Statistics can also be generated using the "Data Analysis" option in MS Excel. To be able to view this option under Data Menu in Excel, ensure to check the Analysis ToolPak option under Tools --> Add-ins
Note: NaN = Not a Number
-------
The descriptive Statistics can also be generated using the "Data Analysis" option in MS Excel. To be able to view this option under Data Menu in Excel, ensure to check the Analysis ToolPak option under Tools --> Add-ins
Note: NaN = Not a Number
-------
from openpyxl import load_workbook import numpy as np wb = load_workbook(filename="marks.xlsx") sheet = wb["details"] rows = sheet.max_row columns = sheet.max_column mathmarks = np.ndarray(rows-1) for i in range(2,rows+1): mathmarks[i-2] = sheet.cell(row=i, column=2).value print("median:") print(np.median(mathmarks)) print("median - ignoring NaNs:") print(np.nanmedian(mathmarks)) print("weighted average:") print(np.average(mathmarks)) print("arithmetic mean:") print(np.mean(mathmarks)) print("arithmetic mean - ignoring NaNs:") print(np.nanmean(mathmarks)) print("variance:") print(np.var(mathmarks)) print("variance - ignoring NaNs:") print(np.nanvar(mathmarks)) print("standard deviation:") print(np.std(mathmarks)) print("standard deviation - ignoring NaNs:") print(np.nanstd(mathmarks))
Comments
Post a Comment