# blueollie

## Statistical Illiteracy

One of the things that amuses me most is approval rating polls. If you follow the social media of Trump haters (I do not approve of him as president), you’ll hear that his ratings are dropping… always dropping…

But Trump supporters will crow about his ratings… going up, and you’ll even hear stuff like this from Trump himself:

Here is what they are linking to:

Yep, he popped above Obama… in THIS POLL. Note how the TownHall screenshot managed to put that box right over the recent polls.

But what about those other polls?

Real Clear Politics polling average (41.8 as of this writing)

FiveThirtyEight.com (40.6 approval as of this writing)

Trump vs. Obama in the Fivethirtyeight average

What is going on:

Even an average of polls shows some “fluctuation”: the numbers go up and down with time, even when the underlying trend is steady. This is due to the randomness of sampling and perhaps some systematic error in the samples themselves. So, if Trump is at 38 in one poll and 40 in the next, his supporters say “Trump is gaining in the polls”. But then if he goes to 37 in the next one, Trump opponents cry “he is dropping like a rock!”
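To see how much movement pure sampling noise produces, here is a minimal Python sketch of my own (not any pollster’s method). It assumes a true approval fixed at 41 percent and hypothetical polls of 1,000 respondents each, so every rise or fall in the output is noise.

```python
import random

def simulate_polls(true_approval: float, sample_size: int, num_polls: int, seed: int = 0) -> list[float]:
    """Simulate independent polls of a population whose true approval never changes."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_polls):
        # Each respondent approves with probability `true_approval`.
        approvals = sum(1 for _ in range(sample_size) if rng.random() < true_approval)
        results.append(100 * approvals / sample_size)  # percent approving in this poll
    return results

# Hypothetical setup: true approval fixed at 41 percent, 1,000 respondents per poll.
polls = simulate_polls(true_approval=0.41, sample_size=1000, num_polls=10)
for number, value in enumerate(polls, start=1):
    print(f"poll {number}: {value:.1f}%")
```

With different seeds you get a different set of “gains” and “drops” of a few points, none of which reflects any change in opinion.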

If you hang around Trump opponents, you hear only about the drops, and if you hang around supporters, you hear only about the gains.

Sam Wang got it right:

Reality: Trump is at about 41 percent approval (low for an economy in this shape) and there will be a few minor fluctuations in either direction that don’t mean much, if anything.

Workout notes: sore shoulder special swim: 250 free, 250 fins, 250 pull, 250 free, 250 free/back, 250 breast/free, 100 fins drill/swim, 100 free, 100 drill/swim, 100 swim, 100 pull, 100 swim, 50 side, 50 swim. Just got it in; protected the shoulder.

treadmill run: 5 minute froggy to get to 44:50 for 4 miles, then walking (17 mpm, 16 mpm, 14 mpm, 13:30 mpm) to get to 59:40 for 5 miles. Foot did ok until the faster walking. Still ok.

April 3, 2018

## Confusing the individual with the aggregate

One of the things that fascinated me was radioactive decay. If you are given a certain amount of a radioactive isotope, you can deduce how much of it will be left (undecayed) after a given amount of time. In fact, you can do this so accurately that you can base a precision clock on it.

However, it is impossible to determine WHICH atom will decay, no matter how much information you have about it. I don’t mean that it is practically impossible but rather that it is literally impossible. And the individual atoms will decay at different times.

In short, you have information about the aggregate but not about the individual. Of course, in this example, we are in the range of quantum phenomena.
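A small Python simulation makes the aggregate/individual distinction concrete. It is a sketch, assuming a hypothetical isotope with a half-life of 5 time units and atoms that decay independently: the half-life formula predicts the number of survivors almost exactly, while the individual decay times (drawn from an exponential distribution) are scattered.

```python
import math
import random

def simulate_decay(num_atoms: int, half_life: float, elapsed: float, seed: int = 1) -> tuple[int, float]:
    """Return (simulated survivors, predicted survivors) after `elapsed` time units."""
    rng = random.Random(seed)
    survive_prob = 0.5 ** (elapsed / half_life)  # chance that any one atom is still intact
    survivors = sum(1 for _ in range(num_atoms) if rng.random() < survive_prob)
    return survivors, num_atoms * survive_prob

def individual_decay_times(num_atoms: int, half_life: float, seed: int = 2) -> list[float]:
    """Draw a decay time for each atom from the exponential distribution."""
    rng = random.Random(seed)
    rate = math.log(2) / half_life  # decay constant: ln 2 divided by the half-life
    return [rng.expovariate(rate) for _ in range(num_atoms)]

# The aggregate is very predictable...
survivors, predicted = simulate_decay(num_atoms=100_000, half_life=5.0, elapsed=10.0)
print(survivors, predicted)
# ...but individual atoms decay at wildly different times.
print([round(t, 1) for t in individual_decay_times(5, half_life=5.0)])
```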

But this principle (the aggregate vs. the individual) also applies when one attempts to make inferences about what will happen with a population that has a high level of variance within it, and people often get confused.

Example: suppose you have two groups of students who are, say, starting a program of study in engineering. One group is the group of students whose math ACT scores are 22, and the other group has math ACT scores of 30. The harsh reality is that the group of students with a score of 22 will have very little success; there may well be a few individuals who make it, but the vast majority won’t. And yes, the group with a score of 30 will have some failures, but they will have many more successes.

So, the ACT score matters and has predictive value. But if you bring this up, someone will remember the person with a 30 who flunked out and the one with a 22 who made it, and claim that this means the “ACT is meaningless”. Psst: that isn’t true.

So yes, there are smokers who live a long time, there are those who drive while texting who don’t get into accidents, etc. But smoking does harm longevity and driving while texting increases one’s risk of having an accident.
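The same idea applies to the ACT example above. The Python sketch below uses made-up success probabilities (15 percent for a 22, 65 percent for a 30); these are hypothetical numbers for illustration, not real ACT data. The group rates differ sharply, yet each group still contains both successes and failures.

```python
import random

# HYPOTHETICAL success probabilities, invented for illustration only (not real ACT data).
SUCCESS_PROB = {22: 0.15, 30: 0.65}

def simulate_cohort(act_score: int, num_students: int, rng: random.Random) -> list[bool]:
    """Return one True/False 'completed the program' outcome per student."""
    p = SUCCESS_PROB[act_score]
    return [rng.random() < p for _ in range(num_students)]

rng = random.Random(42)
cohorts = {score: simulate_cohort(score, 200, rng) for score in SUCCESS_PROB}
for score, outcomes in cohorts.items():
    successes = sum(outcomes)
    print(f"ACT {score}: {successes} of {len(outcomes)} succeeded, {len(outcomes) - successes} did not")
```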

Application to Illinois football: Illinois is starting MANY true freshmen and, well, the record so far is grim: 2 wins over weaker non-conference opposition, followed by 5 straight losses against “power 5” caliber opposition (USF isn’t “power 5”, but they are an undefeated, ranked team). And prospects for another win this season are dim, with 2 Top 10 teams (Wisconsin, Ohio State) and 3 improved teams (Indiana, Purdue, Northwestern) left to play.

So the PR department is playing this “the future is bright” angle:

And yes, the team is playing a lot of freshmen.

But: how good is that class? I went on ESPN and looked at how the Big Ten 2017 recruiting classes were ranked:

- Top 10: Ohio State, Michigan
- 10-25: Penn State, Maryland, Nebraska
- 26-39: Michigan State
- 40-49: Wisconsin, Iowa, Northwestern
- 50-59: Rutgers, Illinois, Indiana, Minnesota, Purdue

So, based on talent, we *might* be able to hang with Rutgers, Indiana, Minnesota and Purdue, youth or no youth.

Now yes, measuring recruiting is tough to do, and there is always that individual “lightly regarded” recruit who blossoms into an NFL player. It does happen… individually. But a team composed of lightly regarded recruits is rarely, if ever, successful.

Workout notes: yesterday, a wet 10K walk (untimed). Today: weights. Pull ups were a struggle, so I did a couple of 5-5 sets, then 2 sets of 10 and one of 7-3 (50 total; switched grip). Usual PT; incline presses: 10 x 135, 4 x 160, 6 x 150; military (dumbbell: 10 x 50, 10 x 45), 10 x 180 machine (90 each arm); rows: 3 sets of 10 x 110. Then a chilly 5K walk outside.

October 25, 2017

## Statistical inference and the morning weight room

I know that this is far from perfect. But for a couple of years, the university had some smaller-than-average classes, and yes, the gym was emptier at 6 am.

Today there were more people than usual in the gym at 6 am (the start of classes). But that isn’t the only factor: our university is also tearing down buildings and replacing them with updated ones (yes, badly needed upgrades). That reduces the number of available classrooms, hence we have more afternoon/late afternoon classes than before.

So more students plus “being in class in the afternoon” means “more people in the gym” in the morning. Nevertheless, I got through the routine (weights only) in 42 minutes, then added 20 minutes of skips and legs, then walked 4 miles outside.

Social media is interesting. In one case, somebody thought he was “calling me out” when, in fact, I was arguing about language and not the concept. In another case, a Trump supporter refused to read anything from the mainstream media because… well, the election projections were wrong.

Note: the polls did pretty well with the national popular vote; even the state polls in the battleground states were not that far off… it is just that several were off by a little bit IN THE SAME DIRECTION (which Nate Silver said was a real possibility). The polls weren’t bad, but some (not all) of the inferences drawn from them were. But try explaining that to someone whose mind is already made up.

I’ve learned to say “ok, I’ll leave your company for others to enjoy”.

And yes, I’ve had to do that with people who vote the same way that I do. Statements like “no, Bernie Sanders would not have won” or “Hillary Clinton really isn’t that good of a campaigner; Bill Clinton and Barack Obama were a lot better” or “yes, the Russians did spread disinformation but there is no evidence that they hacked the voting machines themselves” have earned me both ire and blocks on Twitter.

No big loss though.

Workout notes:

- hip hikes, toe raises, rotator cuff
- pull ups: 5 sets of 10: ok.
- incline: 10 x 135, 8 x 150, 4 x 160 (decent hip placement)
- military (standing, with dumbbells): 10 x 50, 10 x 45, 20 x 40
- rows (Hammer): 3 sets of 10 x 200

The above took 42 minutes.

- rope skips: 34, 50, 50 (last two sets: ended at 50 voluntarily). I am getting better.
- goblet squats: 5 x 50 (window sill), 10 x 50, 10 x 53 (kettle), 5 x 70 (20 inch box)
- 4 mile walk in Bradley Park; kind of sluggish. Very good weather though; nice and cool.

August 23, 2017

## A bit of statistics

Ok, how can we draw statistical inferences when we cannot run controlled experiments? After all, correlation and causation are not the same. This is a useful guide to the how and when. Basically: is the correlation strong, and is there some “plausible reason” for such a correlation? This paper lists 7 points.

Simpson’s paradox: you can see a discussion here.

Think of it this way: say 1000 women and 1000 men apply for admission to graduate school. 656 men get admitted, whereas only 260 women get admitted. Does this mean that things are biased against women?

But then we see that there are two very different graduate programs. The very selective program admitted 8 percent of its male applicants but 10 percent of its female applicants. The other program, the “easy to get into” one, admitted 90 percent of its female applicants and 80 percent of its male applicants. So the women outdid the men in both programs. Yet 800 women applied to the “difficult to get into” program and only 200 men did; on the other hand, 800 men applied to the easy program but only 200 women did.

Check it out: women: 800*.1 =80 admits to the hard program, 200*.9 = 180 admits to the easy program, so 260 total admits. Men: 200*.08 = 16 admits to the hard program, 800*.8 = 640 admits to the easy program, or 656 total admits.
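The arithmetic above can be checked with a few lines of Python, using exactly the hypothetical numbers in the example: within each program women are admitted at the higher rate, but the totals reverse because of who applied where.

```python
# The hypothetical admissions numbers from the example above.
APPLICANTS = {
    ("women", "hard"): 800, ("women", "easy"): 200,
    ("men", "hard"): 200, ("men", "easy"): 800,
}
ADMIT_RATE = {
    ("women", "hard"): 0.10, ("women", "easy"): 0.90,
    ("men", "hard"): 0.08, ("men", "easy"): 0.80,
}
PROGRAMS = ("hard", "easy")

def admits(group: str, program: str) -> int:
    """Number admitted from one group to one program."""
    return round(APPLICANTS[(group, program)] * ADMIT_RATE[(group, program)])

def total_admits(group: str) -> int:
    return sum(admits(group, program) for program in PROGRAMS)

for program in PROGRAMS:
    # Within each program, women are admitted at the higher rate...
    print(program, "women", ADMIT_RATE[("women", program)], "men", ADMIT_RATE[("men", program)])
for group in ("women", "men"):
    # ...yet the totals favor men, because most women applied to the hard program.
    print(group, total_admits(group), "admitted out of 1000")
```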

This isn’t just some “trick” either. When social scientists analysed the “stand your ground” defense law in Florida, they found that whites were more likely to be convicted than non-whites. BUT this was because whites were more likely to be accused of assaulting a white victim; it turns out that the probability of prosecution was higher if the victim was white than if the victim was non-white. You can see the details here.

workout notes: 4 mile walk after weights: rotator cuff; 5 sets of 10 pull ups; bench press: 10 x 135, 5 x 185 (strong), 10 x 170; incline: 10 x 135 (very easy); military: 10 x 50 standing, 20 x 50 seated supported, 10 x 200 machine; rows: 3 sets of 10 x 50 single arm; headstand; 2 sets each of 10 yoga leg lifts and 12 twist crunches.

November 29, 2016

## Pre-election Sunday….

Ok, the time for spinning is over and what do the numbers say? Here are the betting lines:

They range from 3/10 to 1/5 for Clinton, with most at 1/4. This is a slight change from last night, but not much of a change.
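For readers who don’t speak betting odds, here is a Python sketch converting these fractional odds into implied probabilities. It assumes the usual fractional-odds convention (a 1/4 bet wins 1 unit for every 4 staked) and ignores the bookmaker’s margin, which inflates raw implied probabilities somewhat.

```python
from fractions import Fraction

def implied_probability(winnings: int, stake: int) -> Fraction:
    """Implied win probability for fractional odds written 'winnings/stake'.

    A bet at 1/4 wins 1 unit for every 4 staked, so the break-even
    probability is 4 / (1 + 4). The bookmaker's margin is ignored.
    """
    return Fraction(stake, winnings + stake)

for winnings, stake in [(3, 10), (1, 4), (1, 5)]:
    p = implied_probability(winnings, stake)
    print(f"{winnings}/{stake}: {float(p):.1%}")
```

Under those assumptions, 3/10 works out to about 77 percent, 1/4 to 80 percent and 1/5 to about 83 percent for Clinton.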

Here is Upshot’s list of models:

And here are several prediction maps. This is the list, from most favorable to Trump to most favorable to Clinton (projected electoral votes for Clinton):

- Election Projection: 284
- Fivethirtyeight (Nate Silver): 293
- Electoral vote: 317
- Benchmark Politics: 322
- Predictionwise: 323
- Princeton (Sam Wang): 323
- Upshot (New York Times): 326

I’ve put together the maps, and labeled the source in green.

Some notes: Benchmark uses more data than just polling (e.g., economic indicators, history), and Predictionwise factors in betting lines for each state. And of course, each model weights the various polls a bit differently (e.g., how does one weight older polls? What track record does that polling outfit have? Is it a “likely voter” model or a “registered voter” model?)

But if you notice, the projected Electoral College count doesn’t vary that much; much of the dispute is in the “confidence interval”. Nate Silver’s model has a wider confidence interval (which can vary from a narrow Trump win to a Clinton landslide) and Sam Wang’s has a narrower confidence interval; I talk about this a bit more here.

November 6, 2016

## Election predictions: why the models differ

I see quite a bit of angst over the predictions of the upcoming general election. So I hope to explain the basic difference in philosophies of the competing models.

First, here is the obligatory map; this time I used Predictionwise, which uses a blend of betting markets, polls and other data to assign a “probability percentage” of winning the individual states. The map shows in blue the states where Hillary Clinton has a 62 percent (or higher) probability of winning (by this model); I then explain what happens if one wants a higher threshold (say 80 percent, then 90 percent).

Now there are other models out there; fivethirtyeight gives Trump the highest probability of winning; Princeton gives him the lowest.

Why the difference? If you want full details, read Nate Silver’s explanation of the difference in models and his explanation of why, though Clinton and Obama were in similar positions with regard to the popular vote, Obama was in a better position with regard to the Electoral College.

First, look at this chart, taken from Upshot (I cut out many of the “safely Democratic” and “safely Republican” states, and attached the header so you can see which model the estimates came from):

Note the 127 electoral votes in “close” states that Trump has to win.

Now consider two “extreme models” (both Nate Silver and Sam Wang are too competent to use either of these, but these extremes can explain the difference in confidence):

Extreme model 1: the vote percentages in the states move in lockstep with the national average. What that means: say Clinton’s national average is 45 percent and in, say, Wisconsin, she is 3 points above that. Then Wisconsin is labeled “D + 3”, meaning she’ll get 3 points more than her national average. Now if there is a shift in the national polls, or if the national polls are just a bit off, that shift will be reflected in each state. For example, say the polls shift 4 points in the margin toward Trump (2 points of vote share moving from Clinton to Trump), so Clinton’s average is 43 percent nationally. Then in this model, “D + 3” becomes 46, down from 48. And that happens IN EVERY STATE.

Therefore a 2-point lead in each swing state becomes a 2-point deficit in each swing state, which means that Trump has a reasonable chance to win all of those close states, given a national surge or the polls being off by a bit. Hence the uncertainty.

Of course, this works in the other direction as well; if the polls shift toward Clinton, she could win by a landslide. That explains the relevance of this remark by Nate Silver.
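The lockstep arithmetic of extreme model 1 can be sketched in Python. This assumes, as the numbers in the example imply, that the 4-point shift is in the margin between the candidates, so 2 points of vote share move from Clinton to Trump and her national share falls from 45 to 43. The swing states are placeholders.

```python
def state_share(national_share: float, state_lean: float) -> float:
    """Lockstep model: a state's Clinton share is the national share plus a fixed lean."""
    return national_share + state_lean

def after_margin_shift(national_share: float, margin_shift: float) -> float:
    """Assumes votes move directly between the two candidates: a margin shift of m points lowers her share by m / 2."""
    return national_share - margin_shift / 2

wisconsin_before = state_share(45.0, 3.0)                          # "D + 3" -> 48
wisconsin_after = state_share(after_margin_shift(45.0, 4.0), 3.0)  # 4-point margin swing -> 46

# Hypothetical swing states, each with a 2-point Clinton lead in the margin.
swing_leads = {"State A": 2.0, "State B": 2.0, "State C": 2.0}
leads_after = {state: lead - 4.0 for state, lead in swing_leads.items()}
print(wisconsin_before, wisconsin_after, leads_after)
```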

Now one could use the other extreme model: that the swing states are independent. That is, an increase in Trump support in, say, New Hampshire is not correlated with an increase in Trump support in, say, Nevada. By that model, Trump is cooked; his chance of winning ALL of those tightly contested 127 electoral votes is basically zero, hence Sam Wang’s statement:

Now Wang is way too competent to make the simplistic assumption that the state results are independent of one another. But one has to remember that Clinton is using a sophisticated voter targeting operation in key states (her “firewall states”) and Trump has contempt for such operations. So a small Trump surge nationally might not help him close the gap in those states. Obama’s campaign manager Jim Messina explains that there.

Again, neither Silver nor Wang uses these extreme models; they are way too competent to do so. But their models weight uncertainty, polling error and the statistical independence of the states differently, hence the difference in probability.
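To see why the correlation assumption matters so much, here is a Monte Carlo sketch in Python of the two extreme models. It is not Silver’s or Wang’s model; it assumes six hypothetical swing states, each with a 2-point Clinton lead, and a polling error with a standard deviation of 3 points that is either shared by all states or drawn independently for each.

```python
import random

def prob_trump_sweeps(leads: list[float], error_sd: float, correlated: bool,
                      trials: int = 100_000, seed: int = 7) -> float:
    """Estimate the chance that Trump carries every listed swing state.

    leads: Clinton's polling lead (points) in each hypothetical swing state.
    correlated=True:  one shared polling error hits every state (extreme model 1).
    correlated=False: each state gets its own independent error (extreme model 2).
    """
    rng = random.Random(seed)
    sweeps = 0
    for _ in range(trials):
        if correlated:
            errors = [rng.gauss(0.0, error_sd)] * len(leads)
        else:
            errors = [rng.gauss(0.0, error_sd) for _ in leads]
        # Trump wins a state when the polling error more than erases Clinton's lead.
        if all(lead + error < 0 for lead, error in zip(leads, errors)):
            sweeps += 1
    return sweeps / trials

leads = [2.0] * 6  # six hypothetical states, each Clinton +2
print("lockstep:   ", prob_trump_sweeps(leads, error_sd=3.0, correlated=True))
print("independent:", prob_trump_sweeps(leads, error_sd=3.0, correlated=False))
```

With these made-up inputs, the lockstep version gives Trump roughly a one-in-four chance of sweeping all six, while the independent version gives him well under 1 percent. Real models sit somewhere in between, which is why their probabilities differ.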

In a nutshell: Silver’s model has a wider “confidence interval” for the number of Electoral Votes (hence, higher probability of a Trump win or a Clinton landslide) and Wang’s confidence interval is smaller (centered around a modest but solid Clinton win in the Electoral College).

November 4, 2016

## Mathy post: women’s legs, polls, pigeons and expectations …

Workout notes: it was 75 F and yes, 100 percent humidity again. THAT, plus my 3 in a row this past weekend (15 mile run Saturday, 13 mile walk Sunday, 4 mile race on Monday) left me tired. So I did a slow, untimed 6.3 (10K) run/walk. Today, it was enough.

Women’s legs and running

Ok, the youngest woman in the photo (calves!) is in her late 50s. The one with the blonde ponytail is in her early 60s; the other two are over 70. Yes, all frequently win awards at running races.

So the question: do these ladies get their legs from all of that running, or were they attracted to running because they had the genetic potential to have good legs for running? The answer isn’t that clear, is it?

Yes, I’ll put this as a bonus question on an exam at the appropriate time.

Speaking of statistics: Nate Silver gives a rundown of the current state of the election. Clinton is about a 70 percent favorite in many models (including the betting lines) and about a 90 percent favorite in the Princeton model (see the NYT model and other models here). If you look at what we are seeing in the polls right now (Trump with a narrow lead in a few of them; the rest showing Clinton with up to a 6-7 point lead), a 3-4 point Clinton lead best explains what we are seeing.

Presidential elections in years with no incumbent running tend to be close (3 times in my lifetime, the popular vote spread was less than 1 point: Kennedy-Nixon, Nixon-Humphrey, Gore-Bush; twice it was 7-8 points: Obama-McCain, Dukakis-Bush).

And this leads me to another topic: conditional probability. This shows up in the famous Monty Hall problem.

Imagine a game: you are shown 3 doors; the prize is behind one door and the other two doors have nothing. Here is the rule: you pick one door. Then the person running the contest *always* opens one of the two doors you did not pick, one that does NOT have the prize. Always… and you know that the person running the show WILL do that.

So, should you switch to the remaining unopened door, the one you did not pick?

Answer: YES. And pigeons are actually better able to figure it out than humans!

Here is the math behind this:

You pick one door: your probability of success is 1/3. Switching is, in effect, taking both of the doors that you did NOT pick, since the host has already shown you which of those two is empty; that means if you switch, your probability of success climbs to 2/3. Remember: switching only fails if you were right the first time.

Think of it this way: imagine there were 100 doors. You pick. Then you are shown 98 doors where the prize is NOT. Would you switch? Remember your probability of being right on the first choice was 1/100.
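A quick Python simulation backs this up. It is a sketch, assuming the host always opens every door you did not pick except one and never opens the prize door; it covers both the 3-door and the 100-door versions.

```python
import random

def play_round(switch: bool, num_doors: int, rng: random.Random) -> bool:
    """Play one round; return True if the contestant ends up with the prize."""
    prize = rng.randrange(num_doors)
    pick = rng.randrange(num_doors)
    # The host opens every unpicked door except one, and never opens the prize door.
    if pick == prize:
        # First guess was right: the host leaves one random empty door closed.
        other = rng.choice([door for door in range(num_doors) if door != pick])
    else:
        # First guess was wrong: the only unpicked door the host can leave closed is the prize.
        other = prize
    final_choice = other if switch else pick
    return final_choice == prize

def win_rate(switch: bool, num_doors: int = 3, trials: int = 100_000, seed: int = 3) -> float:
    rng = random.Random(seed)
    return sum(play_round(switch, num_doors, rng) for _ in range(trials)) / trials

print("3 doors, stay:    ", win_rate(switch=False))
print("3 doors, switch:  ", win_rate(switch=True))
print("100 doors, switch:", win_rate(switch=True, num_doors=100))
```

Staying wins about 1/3 of the time, switching about 2/3, and with 100 doors switching wins about 99 percent of the time.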

Here is where “conditional” comes in: label the doors I, II, III. You pick I. You are shown that it is NOT II.
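Carrying that calculation through with Bayes’ rule gives the same 2/3. This is a sketch assuming the standard rules: the prize is equally likely to be behind each door, and when the prize is behind door I, the host opens II or III with equal probability. The three terms in the first line correspond to the prize being behind I, II and III.

$$
\begin{aligned}
P(\text{host opens II}) &= \tfrac{1}{3}\cdot\tfrac{1}{2} + \tfrac{1}{3}\cdot 0 + \tfrac{1}{3}\cdot 1 = \tfrac{1}{2} \\
P(\text{prize behind I} \mid \text{host opens II}) &= \frac{\tfrac{1}{3}\cdot\tfrac{1}{2}}{\tfrac{1}{2}} = \tfrac{1}{3} \\
P(\text{prize behind III} \mid \text{host opens II}) &= \frac{\tfrac{1}{3}\cdot 1}{\tfrac{1}{2}} = \tfrac{2}{3}
\end{aligned}
$$

So sticking with door I wins 1/3 of the time, and switching to door III wins 2/3 of the time.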

September 8, 2016

## Jeb Bush: Trump Supporters Aren’t ‘A Bunch Of Idiots’ (he is right)

Jeb Bush said the following:

Former Florida Gov. Jeb Bush (R) said Saturday that supporters for GOP presumptive nominee Donald Trump aren’t “a bunch of idiots” and should be respected, CNN reported.

“What I fear is that people, kind of looking down their nose, will say the people that are supporting Donald Trump are a bunch of idiots. They’re not. They’re legitimately scared. They’re fearful,” Bush reportedly said at an event in Amsterdam. “They’re not as optimistic for legitimate reasons and there should be respect for that. And on the other side, a similar respect needs to be shown.”

Now of course, this statement (which I think should be obvious) has met with ridicule. Yes, I know, I know, we’ve all seen the cherry picked photos of Trump supporters and of Trump rallies:

So, yes, there are some dumb people supporting Donald Trump. And yes, there are some evil ones too.

But when we are talking about a national candidate with millions of supporters, a tiny selection of supporters tells you very little about the whole.

Here is an example of what I mean: think of 2008, when I was a proud Obama supporter. Well, some of then-Senator Obama’s support came from the… well, less-than-informed people

and some came from morally questionable people too.

Again, this is just statistics in action: the larger the group of supporters, the more closely it resembles the population at large.

So, what can we say about Trump supporters, “in general”?

For one thing, on average, they tend to have a higher household income than either Sanders supporters or Clinton supporters (the data I report measures median household income; “median” means “the income in the middle of the range of supporters: half of incomes are above it, half are below”; this is done to mitigate the effects of a few very large incomes):

72K per year, as compared to 61K per year for both Clinton and Sanders supporters. Now this isn’t true in every state: in New Hampshire, Vermont, Connecticut and Virginia the median household income of a Clinton supporter exceeds that of a Trump supporter. Trump supporters earn more than Sanders supporters in all of the surveyed states.
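Why the median? A tiny Python example with invented incomes (in thousands of dollars; not from the survey) shows how a single very large income drags the mean upward while barely affecting the median.

```python
from statistics import mean, median

# HYPOTHETICAL household incomes in thousands of dollars; one very large income at the end.
incomes = [48, 55, 61, 66, 72, 80, 2_000]

print("mean:  ", round(mean(incomes), 1))  # dragged far upward by the single outlier
print("median:", median(incomes))          # still the middle household
```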

Secondly, there is a positive correlation between income and IQ; on average, those with higher IQs tend to earn more money than those with lower ones. NOTE: the New Scientist article I linked to also deals with wealth, and there isn’t much of a correlation between IQ and household wealth (example: those with higher incomes might well spend more):

The work reveals that while exceptionally smart individuals typically earn more, they are also more likely to spend to their credit card limit, compared with people of average intelligence.

Jay Zagorsky at Ohio State University in Columbus, US, analysed personal financial information collected from 7500 people between the ages of 33 to 41. Subjects provided details about their cash flow – including wages, welfare payments, alimony, and stock dividends – and their overall net worth. They also answered questions about whether they had “maxed out” any of their credit cards, missed bill payments or filed for bankruptcy.

[…]

On the surface, Zagorsky’s analysis confirms the findings of previous studies linking higher intelligence with higher income. “Each point increase in IQ test scores is associated with \$202 to \$616 more income per year,” he says. For example, a person with a score of 130 (in the top 2%, in terms of IQ) might earn about \$12,000 more per year than someone with an average IQ score of about 100. On the surface, people with higher intelligence scores also had greater wealth. The median net worth for people with an IQ of 120 was almost \$128,000 compared with \$58,000 for those with an IQ of 100.

But when Zagorsky controlled for other factors – such as divorce, years spent in school, type of work and inheritance – he found no link between IQ and net worth. In fact, people with a slightly above-average IQ of 105 had an average net worth higher than those who were just a bit smarter, with a score of 110.

Again, there is the correlation between INCOME (not net worth) and IQ.

So, if anything, the data might suggest that Trump supporters might be somewhat brighter than the Sanders and Clinton supporters, on the average. I say “might” because I don’t know the “n” for these income samples. It might be that the Clinton and Sanders groups are larger groups, and therefore subject to “regression to the mean” effects whereas the early Trump supporters might be a more selective sample of people (fewer people).

But I think that there is no evidence that Trump supporters are dumber than either Sanders or Clinton supporters.

May 22, 2016

## Rant: recognizing the limits of what one knows

I’ll admit that I am an expert in a very narrow slice of mathematics. But I am at least an AU from being an international or even a national caliber expert in that narrow field of mathematics.

And yes, I often read about topics that are not in my area; I enjoy popular books and articles on topics from the various branches of science, economics and the like.

Nevertheless, I also realize that when I read such a book or article, or when I attend a public lecture, I am getting a watered down, simplified treatment of the subject. I lack the context and the prerequisite knowledge to appreciate a presentation aimed at the experts.

And therein lies one of my biggest frustrations when it comes to talking to people, either on the internet or in person. There are so many who really can’t tell the difference between expert knowledge and what they read (and perhaps half-digested… if that much) in a popular book. It is THAT level of “lack of humility” that makes some people unpleasant conversation companions; I am ok with ignorance. After all, I am ignorant of the vast majority of human knowledge. I think that all of us are.

And, sadly, I see this lack of intellectual humility in political or social issues discussion, especially from the “losing side”. It appears to me that being on the losing side of an election (and I’ve been there, many, many times) brings out the worst in people in several ways.

Example: I had someone try to tell me that Hillary Clinton’s popular vote lead is “within the margin of error” when one factors in the caucus states.

Of course, that is a dumb statement for a number of reasons.

1. There is a difference between a vote count and a poll, even though both have a margin of error (remember Florida in the 2000 general election). The margin of error in a vote count is much smaller than it is for a poll.

2. The margin of error for a poll is $1.96 \cdot \frac{0.5}{\sqrt{n}}$ (assuming a 95 percent confidence level and a relatively close election). This comes from the normal approximation to the distribution of the sample proportion: its standard error is $\sqrt{p(1-p)/n}$, which is largest, $\frac{0.5}{\sqrt{n}}$, when $p = 0.5$. So as $n$ increases, the confidence interval, and therefore the margin of error, decreases. Note: for more on polls, read this wonderful little article written by a physics professor.

3. Hillary Clinton leads by about 3 million votes, even when one counts the caucus votes. The caucus votes don’t add much, as there are fewer caucus states and these tend to be smaller states. Anyhow, she leads about 57-43.

4. The person making the claim appeared not to understand that winning a small state by a very large percentage didn’t make up for losing a bigger state by a smaller margin.
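Items 2 and 4 can both be checked with a short Python sketch. The sample sizes and the two states below are hypothetical; p = 0.5 is the worst case that the formula in item 2 assumes.

```python
import math

def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    """Margin of error for a sample proportion (normal approximation).

    With p = 0.5, sqrt(p * (1 - p)) = 0.5, which reproduces 1.96 * 0.5 / sqrt(n).
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 1000, 2500):
    print(f"n = {n}: +/- {margin_of_error(n):.1%}")

def net_votes(total_votes: int, winner_share: float) -> int:
    """Net vote margin for the winner of a two-candidate race."""
    return round(total_votes * (2 * winner_share - 1))

# HYPOTHETICAL states: a small one won 80-20 and a big one lost 45-55.
small_state_gain = net_votes(100_000, 0.80)   # +60,000 net
big_state_loss = net_votes(2_000_000, 0.55)   # 200,000 net for the opponent
print("overall net:", small_state_gain - big_state_loss)
```

The margin of error shrinks from about 4.9 percent at n = 400 to about 2 percent at n = 2,500, and the small-state blowout still leaves a net deficit of 140,000 votes.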

Yes, knowing that Sanders won a lot of caucus states and that there IS such a thing as margin of error puts this individual into the “above average” category. But this person was clearly ignorant of their own ignorance.

There is another factor in play: I really think that desperation makes one dumber. When one really likes a candidate or a person, or even a sports team, it is tough to accept an unpleasant reality. I’ve become acquainted with the latter as an Illinois football fan (“yeah, we have a shot at beating Wisconsin!” Sure.)

Desperation can lead to an abandonment of one’s values. Check out the Republican Chairman’s take on Donald Trump

Oh sure, few would be surprised at Donald Trump’s behavior, and I doubt that a certain type of Republican really cares that much (“hey, what do you expect with Trump anyway?”)

May 16, 2016

## West Virginia votes today and… an uncomfortable right wing cartoon

The cartoon:

Yes, liberals tend to reflexively take the side of the underdog and, all too often, liberals conflate complaints about the more regressive practices of Islam (example) with justifications of anti-Muslim bigotry (which I openly oppose).

I’ll make it clear: saying that Islam (on the whole) enables many regressive practices is NOT the same as opposing the building of mosques, backing noxious anti-Muslim immigration policies, etc.

West Virginia votes today: this should be a rather easy victory for Sanders. This would cut Clinton’s lead in pledged delegates from 285 to 280 or so. However, this shouldn’t be like the 2008 blowout, where Clinton crushed Obama by about 40 points (and still trailed by 100 delegates or so); the link is to an old Daily Show (with Jon Stewart) episode which had a funny take on it. Of course, I can put West Virginia in the Republican column right now, though it wasn’t always that way.

National Election

Donald Trump is now turning to the Republican Party for funds. So maybe this election will be more conventional than previously thought.

And yes, you’ll hear that Hillary Clinton is trailing in this battleground state or that one. Reality: she has a good-sized lead right now and it will take something special to change it.

And about the election coverage: Gin and Tacos, while giving Nate Silver proper credit, seems annoyed that many don’t realize that what he does is really, at least by academic standards, well, sort of basic. (and yes, Ed admits some jealousy, but what about me? I don’t even have the best blog on the 4th floor of my building! 🙂 )

I’ll tell you what I like about Nate Silver: he got his stuff out there, and in 2012, it was a very useful counter to all of the garbage that places like NPR were putting out. My friends who followed the election on NPR were scared to death, even though I told them that the election wasn’t close and showed them the battleground state polls:

Romney led in only a few of these, and always at “margin of error” levels. There was no hope for him here, though the media constantly reported a “close race”. Silver was the public face against such nonsense; I call the 2012 election a “victory for the nerds.”

May 10, 2016