Moments, Mass, and Motion #1

The first revelation I ever had about statistics was the universal need for statistics. From physics (keep a pin in that) to government and economics, statistics sheds some light onto otherwise murky, complex processes. Statistics empowers us to extract empirical insight from the world through data that would otherwise go unexplored. The second revelation I ever had about statistics – one of the paradoxical sort – was that statistics itself is often a murky, subjective mess. Riddled with awkward proofs and shaky techniques, statistics regularly swaps outdated concepts for newer, shinier ones in hopes of better interpreting data. I would go as far as to say that for many, statistics feels like a giant black box: some data goes into the Python code amalgamation and some result gets spit out – magic. I would even go as far as to say that staples of statistics, the very measures of mean and standard deviation, are subjective (bold, I know). Let me introduce to you The Moment.

Before we go any further, let me introduce a few terms as they will be used in this post:

Probability Distribution Equation

A PDE is a function that describes the probability of randomly picking a specific value or a set of specific values from a population.

For example, a PDE can tell you how likely you are to draw a king from a standard deck of playing cards (4/52).

Measure

When analysing data from a population or sample, a measure is a value produced by that statistical analysis.

For example, means, standard deviations, PDEs, or DKIs (keep reading and I'll explain).

While Pafnuty Chebyshev, the father of Russian mathematics, was the first to systematically approach moments, in Western Europe our journey starts with Karl Pearson and his goal of using sample data to guess a population's probability distribution equation. He reasoned that he could find the same types of measures in both the sample and the PDE. If these measures aligned between the sample and a proposed PDE, then the proposed PDE more accurately approximated the population's PDE. In the late 19th century he adopted Chebyshev's measures not only to describe the accuracy of a proposed PDE but to hold other important information. And so moments were popularized.

Karl Pearson
Pafnuty Chebyshev

Moment generator for samples, $$E(x^{n})=\frac{1}{k}\sum_{i=1}^{k}x_{i}^{n} = m_{n}$$

Moment generator for PDEs, $$E(x^{n})= \int_{-\infty }^{\infty }x^{n}f(x)dx = \mu_{n}$$

A moment, written as $\mu_{n}$, is given by the expectation $E(X^{n})$, where big $X$ is the entire population of values. The first equation works for samples of the population, and the second – for PDEs. You may have noticed that as written, a moment has a number $n$ neatly hidden. That $n$ ranges from $0$ to $\infty$, and $E(X^{n})$ gives the $n$th moment $\mu_{n}$.
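To make the sample formula concrete, here's a minimal Python sketch (the helper name `raw_moment` is mine, not standard vocabulary):

```python
import numpy as np

def raw_moment(x, n):
    """nth raw sample moment: (1/k) * sum of x_i ** n."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** n)

sample = [1.0, 2.0, 3.0, 4.0]
print(raw_moment(sample, 0))  # 1.0 -- every x_i ** 0 is 1
print(raw_moment(sample, 1))  # 2.5 -- the sample mean
```

Note that the zeroth raw moment of any sample is 1, and the first is just the familiar average.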

For deeper analysis, most moments can be adjusted in two common ways: centering and standardizing. A centered moment, as the name suggests, is a moment whose population data was centered around the mean. The centered moment is found using $E((X-\mu)^{n})$ where $\mu$ is the mean (Oh? Isn't that an interesting coincidence?). A standardized moment is a moment whose values are divided by the standard deviation, found using $E\left(\left(\frac{X}{\sigma}\right)^{n}\right)$ where $\sigma$ is the standard deviation.

Raw $$\mu_{1}=E\left(X\right)$$ $$\mu_{2}=E\left(X^{2}\right)$$ $$\mu_{3}=E\left(X^{3}\right)$$ $$\mu_{4}=E\left(X^{4}\right)$$

Centered $$\mu_{2}=E\left(\left(X-\mu\right)^{2}\right)$$ $$\mu_{3}=E\left(\left(X-\mu\right)^{3}\right)$$ $$\mu_{4}=E\left(\left(X-\mu\right)^{4}\right)$$

Centered & Standardized $$\mu_{3}=E\left(\left(\frac{X-\mu}{\sigma}\right)^{3}\right)$$ $$\mu_{4}=E\left(\left(\frac{X-\mu}{\sigma}\right)^{4}\right)$$
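The formulas above translate directly to code for a sample; a sketch, with helper names of my own choosing:

```python
import numpy as np

def central_moment(x, n):
    """nth centered sample moment: E((X - mu)^n)."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** n)

def standardized_moment(x, n):
    """nth centered and standardized sample moment: E(((X - mu)/sigma)^n)."""
    x = np.asarray(x, dtype=float)
    return np.mean(((x - x.mean()) / x.std()) ** n)

sample = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
print(central_moment(sample, 2))       # the (population) variance
print(standardized_moment(sample, 3))  # the skewness
```

One quick self-check: the second centered-and-standardized moment is always 1, since dividing by $\sigma$ scales the variance down to unity.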

Let's take a look at the first few moments…

Example PDE (this distribution comes from the Maxwell–Boltzmann distribution)

The zeroth moment does not have much use in statistics. $E(X^{0})$ is just the full area under the PDE curve. Consequently, the zeroth moment will always yield one, since a PDE's probabilities must sum (or integrate) to one by definition.
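As a sanity check, we can integrate $x^{0}f(x)$ numerically. I'll use the standard normal density as the example $f(x)$ rather than the Maxwell–Boltzmann curve from the figure; any valid PDE should give the same answer:

```python
import numpy as np

# Standard normal density as an example f(x)
def f(x):
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

x = np.linspace(-10.0, 10.0, 200001)  # wide enough to capture nearly all the mass
dx = x[1] - x[0]
zeroth = np.sum(x ** 0 * f(x)) * dx   # Riemann sum of x^0 * f(x) dx
print(zeroth)  # ~1.0
```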

0th Moment: 1
Mean: 1.596

The first moment is the mean! When I found out that the first moment could describe the mean, it made sense given the formula, but that elegant coincidence (I'm certain it was intentional on Chebyshev's part) blew my mind!

The centered second moment is the variance, the square of the standard deviation! At this point I gained a new appreciation for moments; sure, moments helped approximate a PDE, but more importantly, moments provided innate details about the population.

Variance: 0.454
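Both facts are quick to verify on a toy sample; a sketch checking the first raw moment against the mean and the centered second moment against the (population) variance:

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

first_moment = np.mean(sample ** 1)                       # E(X)
centered_second = np.mean((sample - sample.mean()) ** 2)  # E((X - mu)^2)

print(first_moment)     # 5.0, the mean
print(centered_second)  # 4.0, the (population) variance
```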

If you took a stats class, I can sense your boredom. You already know about the mean and the standard deviation. What if I told you, though, that there are a few lesser-known moments that have an agreed-upon interpretation?

The third centered and standardized moment, skewness, generally flies under the radar, which is quite unfortunate given its useful interpretation. As the name suggests, skewness represents how skewed the data is to one side of the mean. A negative skewness means that the probability for values to the left of the mean is more spread out than for the right, while a positive skewness means the exact opposite. A skewness of 0 means the distribution is symmetric.
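A sketch of that third moment on two small samples; the right-tailed one should come out positive and the symmetric one essentially zero:

```python
import numpy as np

def skewness(x):
    """Third centered and standardized moment: E(((X - mu)/sigma)^3)."""
    x = np.asarray(x, dtype=float)
    return np.mean(((x - x.mean()) / x.std()) ** 3)

symmetric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
right_tailed = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

print(skewness(symmetric))     # ~0: symmetric about the mean
print(skewness(right_tailed))  # positive: long tail to the right
```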

Skewness: 0.486
Kurtosis: 3.108
DTI vs. DKI

Finally, the centered and standardized fourth moment, kurtosis, measures the heaviness of the tails. Kurtosis is an interesting one because of its more common modified version called excess kurtosis. Excess kurtosis is just kurtosis minus three. Because the normal distribution has a kurtosis of three, any PDE whose kurtosis differs from the normal distribution's has a nonzero excess kurtosis. If the kurtosis is three – that is, the same as the normal distribution – the distribution is called mesokurtic. A kurtosis above three means fatter tails and is called leptokurtic, while a kurtosis below three is called platykurtic (really laying it on thick with the nonsense names, Ivanna). Interestingly, kurtosis can be used to improve an MRI technique called DTI (diffusion tensor imaging). Basically, DTI measures the flow of water molecules within brain tissue to map out the brain. This is all well and good except that DTI analysis assumes that the brain tissue has a mesokurtic distribution. Kurtosis adds to DTI analysis through DKI (diffusion kurtosis imaging). This technique improves DTI by finding the kurtosis at each point of the brain and then modifying the DTI analysis of the MRI data.
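A sketch of kurtosis and excess kurtosis; a flat, uniform-like grid of values is platykurtic (its kurtosis approaches 9/5), well below the normal distribution's 3:

```python
import numpy as np

def kurtosis(x):
    """Fourth centered and standardized moment: E(((X - mu)/sigma)^4)."""
    x = np.asarray(x, dtype=float)
    return np.mean(((x - x.mean()) / x.std()) ** 4)

flat = np.linspace(0.0, 1.0, 100001)  # approximates a uniform distribution
print(kurtosis(flat))        # ~1.8 (9/5): platykurtic
print(kurtosis(flat) - 3.0)  # excess kurtosis, ~ -1.2
```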

I have to admit something though: as is statistical tradition, moments are outdated. In fact, a substitute set of measures for approximating PDEs, cumulants, has become widely accepted as the better PDE approximator. So yes, even the mean and standard deviation aren't safe from statistical fuzz. I think that's exactly why statistics is so special; that eye-squinting dubiousness very well might be the reason statistics is so effective at extracting meaning from data.
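For the curious, the first few cumulants can be computed straight from the moments; this sketch uses the standard relations (the first cumulant is the mean, the second is the variance, and the third equals the third centered moment):

```python
import numpy as np

def first_three_cumulants(x):
    """kappa_1 = mean, kappa_2 = variance, kappa_3 = third centered moment."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    k1 = mu
    k2 = np.mean((x - mu) ** 2)
    k3 = np.mean((x - mu) ** 3)
    return k1, k2, k3

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
k1, k2, k3 = first_three_cumulants(sample)
print(k1, k2, k3)  # 5.0 4.0 5.25
```

Beyond the third, cumulants and moments start to diverge, which is where their different behavior becomes useful.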

I hope y’all enjoyed this little blog post!! I want to spend a moment (haha) to thank people who read this. From jobs, to school, to daily necessities, the world is full of responsibilities and distractions. It’s exhausting. So thank you for taking some time out of your day to learn and enjoy the ramblings of one silly high schooler. Thank you.

 I know I haven’t posted in a while but with senior year I’ve been quite busy trying to make sure I get into college (still searching). This is actually the first of two posts concerning moments. I hope you kept a pin in it…

 

As always, feel free to converse, correct, and comment! I am a high school student after all. Have a good day and thank you!!

 
