Wednesday, April 30, 2014

Keep it simple!

Some time ago, I figured out that the most knowledgeable among us are those who can explain the most complex stuff in the plainest terms.

To quote the renowned French poet and critic Nicolas Boileau-Despréaux: "Ce que l'on conçoit bien s'énonce clairement, Et les mots pour le dire arrivent aisément."

What is conceived well is expressed clearly, and the words to say it come easily.

There is a famous Russian anecdote (yes, we do tell anecdotes all the time). Two university professors meet, and one asks the other: "How's the exam going?" "Terrible," the other answers. "What's wrong?" inquires the first. "Well, they just can't get it straight. I've explained it to them once, explained it twice, I have even understood it myself already, but they still don't get it!"

So, some people complicate things in order to seem more knowledgeable, to get chicks or to rock a job interview, but does this really work out?

It doesn't.

Daniel Kahneman, the forefather of behavioral finance, wraps it up as follows in his amazing book "Thinking, Fast and Slow":

"If you care about being thought credible and intelligent, do not use complex language where simpler language will do."

This person has a Nobel prize, so he kind of knows what he is talking about. Kahneman refers to the findings of his friend and colleague Daniel Oppenheimer from UCLA (formerly at Princeton University). In his paper (Ig Nobel-winning, actually) "Consequences of Erudite Vernacular Utilized Irrespective of Necessity: Problems with Using Long Words Needlessly", Oppenheimer provides broad, statistically backed evidence of why people should keep it simple.

One of the experiments he describes uses an excerpt from a classic work, "Meditation IV" by René Descartes. The first paragraph of this text was translated into English by two different translators, and the translations were then presented to a group of Stanford University undergraduates. Half of the participants read the more complex, 98-word translation, whereas the other half read the simpler, 82-word version. On top of that, half of the participants were told that the text had been written by Descartes, while the other half were told that it came from an anonymous author. The students were asked to rate how difficult the text was to understand and how intelligent its author was, both on a 7-point scale.

The results show that, among those who knew the text belonged to Descartes, readers of the "simple" translation rated Descartes as more intelligent than readers of the "complex" one did. The same was observed among those who attributed the text to an anonymous author.

In this experiment, the complexity of the text was negatively correlated with the perceived intelligence of the author and positively correlated with the difficulty of comprehension. The difficulty of comprehension was then used as a mediator in an analysis aimed at establishing the link between the complexity of the text and the perceived intelligence of its author. To do so, Sobel's mediation test was employed.

The results of the analysis are summarized graphically in the picture taken from Oppenheimer's paper:


Sobel's test is one of the most common tests for intervening variable effects, which are also referred to as tests for mediation (in psychology) or tests for surrogate or intermediate endpoint effects (in epidemiology). If you are interested in these, please take a look at this article.

In plain terms, we use mediation tests when we want to see whether the relationship between one variable and another runs through some intervening factor. These tests are well implemented in R and SAS; however, I failed to find a Python implementation of them.
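That said, the Sobel statistic itself is simple enough to sketch in plain Python. The snippet below is a minimal illustration, not a full mediation analysis: it assumes you have already fitted two regressions elsewhere and obtained the coefficient `a` (predictor to mediator) and `b` (mediator to outcome, controlling for the predictor) together with their standard errors. The function name `sobel_test` and the numbers fed into it are hypothetical; they are not taken from Oppenheimer's paper.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Return the Sobel z statistic and a two-sided p-value.

    a, se_a: coefficient and standard error for predictor -> mediator
    b, se_b: coefficient and standard error for mediator -> outcome
             (from a model that also includes the predictor)
    """
    indirect = a * b  # estimated indirect (mediated) effect
    # First-order (Sobel) standard error of the product a*b
    se_indirect = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = indirect / se_indirect
    # Two-sided p-value under the standard normal approximation
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical regression output, for illustration only
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"Sobel z = {z:.3f}, two-sided p = {p:.4f}")
```

The normal approximation behind the p-value is known to be rough in small samples, so for real work a bootstrap of the indirect effect is often preferred.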

One of the limitations of the Descartes study that Oppenheimer points out is that it was conducted on smart people. So, if you expect to deal with someone whose knowledge you question, whether they will be mesmerized by you spelling out magic complex words is anyone's guess.

Tuesday, April 15, 2014

Why abstaining from drinking may be a bad idea when applying for a job in finance

One of the major motivations of statistics is to attempt to figure out whether there is a link or a lack thereof between something and something else.

The "somethings" tend to be described in data format, and therefore mathematical procedures come in very handy to make an inference and to support or reject what common sense is suggesting.

For numeric data the methods are quite straightforward, as they stand on the shoulders of giants and draw on knowledge from, say, physics, geometry, calculus, differential equations, etc. When one has to deal with categorical data, however, things get somewhat trickier.

There is an ample bunch of tests of dependence/independence for categorical data, and there are well-written books on the matter, like this one or this one. Nonetheless, the suggestion to use this or that particular test is often driven by empirical knowledge and looks more like a technical-analysis omen than thorough, sigma-algebra-based, strict mathematics.

Whatever works.

While searching for yet another test to evaluate the existence of a link between an important health outcome and a common exposure factor, I bumped into a classic study conducted by the great mind behind it all, Karl Pearson. In 1909, he evaluated whether there is a connection between criminal behavior and the consumption of alcoholic beverages.

Pearson studied 1426 criminals, and his null hypothesis was that there was no association between the type of crime and alcohol consumption.

Below is a descriptive table for this study. It is taken from the book by A. Elliott and W. Woodward, Statistical Analysis Quick Reference Guidebook: With SPSS Examples.


Just by a simple 'look-see' method, one can firmly reject the null hypothesis: drinkers are clearly more prone to criminal activity.

Except for one. Fraud. Indeed, fraud should require some solid intellectual input, and therefore one must be really clear-headed when doing something fraudulent.

As actions of this kind are most frequently associated with the financial industry, HR departments of the relevant institutions could take a closer look at this interesting deviation from the common pattern. Also, prospective candidates for financial jobs might want to abstain from proudly proclaiming themselves devoted non-drinkers during interviews.

Not only is this rude, because, you know, in this industry people do not drink alcohol, but the admirers of Pearson's contribution to statistics might also find these life choices not so zero cool.

I'm kidding, of course.

The value of the chi-square statistic for the test of independence is 49.731, and the p-value is below 0.001 (statistical software reports it as 0.000).
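For the curious, here is a rough sketch of how such a test of independence can be computed by hand in Python, using only the standard library. One loud assumption: the counts below are the figures commonly quoted for Pearson's 1909 data (six crime types, drinkers versus abstainers), entered by me because the table itself is not reproduced in the text. They do add up to the 1426 criminals mentioned above, but please check them against the guidebook before relying on them. The helper names `chi_square_independence` and `chi2_sf` are my own.

```python
import math

# ASSUMPTION: counts as commonly quoted for Pearson's 1909 study;
# verify against the Elliott and Woodward table before reuse.
crimes = ["Arson", "Rape", "Violence", "Stealing", "Coining", "Fraud"]
drinkers = [50, 88, 155, 379, 18, 63]
abstainers = [43, 62, 110, 300, 14, 144]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    tail = math.erfc(math.sqrt(x / 2))
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


stat, df = chi_square_independence([drinkers, abstainers])
p_value = chi2_sf(stat, df)
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.2e}")
```

With these counts, most of the statistic comes from the Fraud column, which is exactly the odd one out described above.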