Thank Guinness for the t-test - The Oxford Scientist

Turns out there’s more to Guinness than simply splitting the G. Photo credit: Jessica Johnston via Unsplash

Your favourite stout is responsible for extending one of the more dreaded, dry branches of mathematics known to scientists: statistics. Guinness has long been associated with a precise pour, and its research into the standardisation of its beer in the early 1900s gifted humankind the t-test, an important test of statistical significance. Beer drinking is and was a serious business; as global expansion burgeoned, so did the impetus for reliable quality control. In 1900, under the renowned chemist Horace Brown, the Guinness Research Laboratory was born. The brewery instigated a policy of hiring Oxbridge men with first-class degrees in the sciences and mathematics, a tradition of scientific brewing that would eventually bring us the likes of the characteristic “surge and settle” pour, the result of the crafty nitrogenation of the beer. An aspirational incentive to get a First, evidently.

Hop flowers, a vital component in the typical bitterness and complexity of a beer, were one such challenge. Their quality and relative sweetness were measured by proxy, through their soft resin content, but how was the Guinness research laboratory meant to assess the quality of an entire field of hops? Measuring all would be excessively laborious; measuring only a few would be unreliable. They were thus presented with the dilemma of a small sample size, an issue all too common for scientists. For example, medical scientists often grapple with large populations and small sample sizes: they must be confident that a drug tested on hundreds of people engenders improvement in millions.

Any sample of data has an associated mean, standard deviation (spread), and probability distribution. Customarily, this is assumed to be a normal distribution: the familiar bell curve.

For a small sample of a larger population, there is added uncertainty in the mean: is it skewed because, by chance, you happened to select unusually low or high values? Say the ideal soft resin content in hops is 8%, and you assume that there is a normal distribution around that mean value. You measure the mean of a new small sample, and find it is 6%. Does that reflect an actual decrease in soft resin content, or did you randomly choose more hops with lower-than-average soft resin content? In other words, is your 6% mean a statistically significant deviation from 8%? Typically, a p-value is used: the probability of obtaining a sample mean at least this far from 8% purely by chance, if the true mean really were 8%. The deviation is deemed statistically significant when that probability is very low (commonly, less than 5%). For a normal distribution where you have collected a large amount of data this is straightforward, as the spread is well established and one can compute the area under the curve beyond the new mean (6%) to see how unlikely it is given the original mean (8%). Crucially, however, you have only collected data for a small sample, and consequently you have only a shaky estimate of how “spread out” the soft resin content actually is. Is your deviation due to a large variance (naturally occurring big fluctuations around the mean), or is it because this specific field of flowers is genuinely unsuitable?
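To make the “easy” case concrete, here is a minimal Python sketch of the area-under-the-curve calculation when the spread is taken as known. The 8% target and 6% sample mean come from the example above; the standard deviation of 2 percentage points and the sample size of 5 are purely illustrative assumptions, as is the function name.

```python
import math
from statistics import NormalDist


def z_test_p_value(sample_mean, target_mean, known_sd, n):
    """Two-sided p-value for a sample mean when the population spread is known."""
    # The mean of n measurements fluctuates less than a single measurement does.
    standard_error = known_sd / math.sqrt(n)
    z = (sample_mean - target_mean) / standard_error
    # Area under the standard normal curve in both tails beyond |z|.
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p


# Assumed figures: spread of 2 percentage points, 5 hops measured.
z, p = z_test_p_value(sample_mean=6.0, target_mean=8.0, known_sd=2.0, n=5)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The catch is the `known_sd` argument: with only a handful of hops, that number has to be estimated from the very same sample, which is where the calculation stops being trustworthy.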

Enter William Sealy Gosset, an Oxford-educated mathematician and chemist. He was hired by Guinness right after completing his degree in Chemistry in 1899. The idea of a significance test preceded him, but he tackled the problem for small sample sizes by working out new distributions, by hand. These distributions were flatter and broader than the normal curve, with heavier tails; they raised the threshold for a deviation to be considered significant. They also depended on the exact size of the sample, approaching the normal curve as the sample grew. With these new distributions, Guinness were able to more confidently predict which samples of hops afforded better soft resin content and standardise their beer. Gosset’s statistical musings lie behind each sip and have the potential to revitalise your perception of statistics or ruin your next pint.
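For the curious, the effect of those broader distributions can be sketched in a few lines of Python. This is an illustration of a one-sample t-test under stated assumptions, not a reconstruction of Gosset's own working: the five resin readings are invented (chosen so that their mean is 6%), the function names are hypothetical, and the tail area is found by simple numerical integration rather than from his hand-computed tables.

```python
import math
from statistics import NormalDist, mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """Area in both tails beyond |t|, using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central_half


def one_sample_t(sample, target):
    """t statistic and two-sided p-value, with the spread estimated from the sample."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    t = (mean(sample) - target) / standard_error
    return t, t_two_sided_p(t, df=n - 1)


resin = [6.1, 4.2, 7.9, 5.0, 6.8]  # invented soft resin readings (%), mean 6.0
t_stat, p_t = one_sample_t(resin, target=8.0)

# The same statistic judged (wrongly) against the normal curve.
p_normal = 2 * NormalDist().cdf(-abs(t_stat))

print(f"t = {t_stat:.2f} with {len(resin) - 1} degrees of freedom")
print(f"p from the t-distribution: {p_t:.3f}")
print(f"p if the normal curve were used: {p_normal:.4f}")
```

With only five measurements, the t-distribution gives a noticeably larger p-value than the normal curve would for the same statistic. That is the higher bar for significance described above, and the gap shrinks as the sample grows.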

https://www.scientificamerican.com/article/how-the-guinness-brewery-invented-the-most-important-statistical-method-in

Gosset was an intriguing man. As was customary for industry at the time, he published his results under the pseudonym ‘Student’, so his test has come to be known as the somewhat infantilising ‘Student’s t-test’. Guinness had a policy barring its scientists from publishing papers altogether, but Gosset was able to convince them of the relative irrelevance of his statistical brainchild to the quality of competitors’ stouts. Luckily (or unluckily, depending on your disposition), Guinness management relented. Often, solutions to unique problems in industry have far-reaching, general applications to society: the Haber-Bosch process of ammonia production, used to make artificial fertilisers, was scaled up during the German war effort and continues to feed us today. Gosset remained with Guinness for his entire career, and worked on many statistical problems of correlation and sampling distributions, all in service of brewing better beer. When war erupted in 1914, his poor eyesight prevented him from serving, but he did express his desire to help with what he believed was important: ‘My own war work is obviously to brew Guinness stout in such a way as to waste as little labour and material as possible, and I am hoping to help to do something fairly creditable in that way.’ (Letter to Karl Pearson, his colleague, September 1914.)

It is a quirky parallel that a company historically obsessed with rigour and craftsmanship has inspired a drinking challenge requiring equivalent precision. Perhaps there is even today an Oxford scientist labouring over the statistics for the ideal ‘rate of chug’ required for foolproof success in ‘splitting the G’, but that is far beyond the remit of this humble article. Happy chopping!
