Saturday, February 18, 2012

Meaning and derivation of the arithmetic mean

Hello again!

My apologies for not writing for so long. For the last 18 days I've been in a military camp, with no access to the internet or the real world. If you're wondering why, it's because Singapore has a conscription policy for all able-bodied men, and I happen to fall into that category. As for my army experience so far, I shall leave that to another post. I have missed thinking about mathematics, so this post will be mathematical.

In the study of statistics, central tendency is a critical concept. Central tendency is basically a value around which quantitative data tend to cluster. It can also be thought of as a number, or a set of numbers, that conveniently and accurately describes a set of data. There are many measures of central tendency. One simple and commonly used measure is the arithmetic mean, and I'm sure we are all familiar with the formula for it. For a set of data (numbers) $a_1, a_2, \ldots, a_n$, the arithmetic mean $\bar{a}$ (read "a bar") is given by the following formula:

$\large \bar{a} =\frac{a_1+a_2+...+a_n}{n}=\frac{\sum_{i=1}^{n}a_i}{n}$
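As a small illustration of the formula, here is a minimal Python sketch. The function name `arithmetic_mean` and the sample numbers are my own choices for illustration, not part of any library.

```python
def arithmetic_mean(data):
    """Add up every value and divide by the number of values."""
    if len(data) == 0:
        raise ValueError("the arithmetic mean of an empty data set is undefined")
    return sum(data) / len(data)


sample = [3, 7, 8, 10, 12]        # hypothetical data, chosen for illustration
print(arithmetic_mean(sample))    # 40 / 5, prints 8.0
```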

I am going to explain how the arithmetic mean links with the idea of central tendency and, using that idea, derive the formula above. To start off, assume there exists some arbitrary sample of data $a_1, a_2, \ldots, a_n$.

Now, assume a central value for this sample exists, call it $\bar{a}$, positioned so that the sum of the positive deviations from it and the sum of the negative deviations from it (taken in modulus, that is, as absolute values) are exactly the same. To put this more mathematically, the arbitrary data set needs some re-arrangement. Relabel the data so that the values greater than or equal to $\bar{a}$ come first and the values less than $\bar{a}$ come after, in the following manner:

\large \begin{align*} &a_1,a_2,a_3...a_s\geq \bar{a} \\ &a_{s+1},a_{s+2},a_{s+3}...a_n< \bar{a} \end{align*}

Now for the mathematical statement regarding the arithmetic mean, namely that the positive deviations minus the magnitudes of the negative deviations come to zero:

\large \begin{align*} &(a_1-\bar{a})+(a_2-\bar{a})+(a_3-\bar{a})+...+(a_s-\bar{a}) \\ &-(\bar{a}-a_{s+1})-(\bar{a}-a_{s+2})-(\bar{a}-a_{s+3})-...-(\bar{a}-a_n)=0 \end{align*}

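Re-arranging the terms, and noting that $-(\bar{a}-a_k)=a_k-\bar{a}$, so that each of the $n$ data values contributes exactly one $-\bar{a}$: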

\large \begin{align*} &a_1+a_2+...+a_n-n\bar{a}=0 \\\\ &n\bar{a}=a_1+a_2+...+a_n \\\\ &\bar{a} = \frac{a_1+a_2+...+a_n}{n} \\\\ &\textup{QED} \end{align*}
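
As a quick worked instance of the steps above, take the hypothetical sample 9, 2, 4 (numbers I have chosen purely for illustration). Assuming only 9 lies at or above the balance point, so that $s=1$ (which the result confirms), the balance statement and its re-arrangement read:

\large \begin{align*} &(9-\bar{a})-(\bar{a}-2)-(\bar{a}-4)=0 \\\\ &9+2+4-3\bar{a}=0 \\\\ &\bar{a}=\frac{15}{3}=5 \end{align*}

The single positive deviation, 9 - 5 = 4, matches the total of the negative deviations, (5 - 2) + (5 - 4) = 4.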

I hope you now see how the formula was derived and how the arithmetic mean actually represents the central value of a data set.
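
The balancing property can also be checked numerically. The following is a minimal Python sketch, assuming integer data so that `Fraction` keeps the arithmetic exact. The function name `deviation_sums` and the sample are hypothetical. It takes the familiar formula for the mean as given, splits the data into values at or above the mean and values below it, and compares the two sums of deviations.

```python
from fractions import Fraction


def deviation_sums(data):
    """Return (mean, sum of positive deviations, sum of negative deviations in modulus)."""
    # Assumes a non-empty list of integers so that Fraction keeps every step exact.
    mean = Fraction(sum(data), len(data))
    at_or_above = [x for x in data if x >= mean]   # a_1 ... a_s in the post's notation
    below = [x for x in data if x < mean]          # a_(s+1) ... a_n
    positive = sum(x - mean for x in at_or_above)
    negative = sum(mean - x for x in below)
    return mean, positive, negative


mean, positive, negative = deviation_sums([3, 7, 8, 10, 12])
print(mean, positive, negative)   # prints: 8 6 6
```

The two sums agree for this sample, as the derivation says they must for any sample.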

LaTeX source: http://www.codecogs.com/latex/eqneditor.php