A bit of statistics

Ok, how can we draw statistical inference when we cannot run a controlled experiment? After all, correlation and causation are not the same. This is a useful guide as to the how and when. Basically: is the correlation strong, and is there some “plausible reason” for such a correlation? This paper lists 7 points.

Simpson’s paradox You can see a discussion here.

Think of it this way: say 1000 women and 1000 men apply for admission to graduate school. 656 men get admitted, whereas only 260 women get admitted. Does this mean that things are biased against women?

But then we see that there are two very different graduate programs. The very selective graduate program admitted 8 percent of all male applicants but 10 percent of all female applicants. The other graduate program, the “easy to get into” program, admitted 90 percent of female applicants and 80 percent of all male applicants. So: we see that the women outdid the men in both programs. Yet, we also see that 800 women applied to the “difficult to get into” program and only 200 men did. On the other hand, 800 men applied to the easy program but only 200 women did.

Check it out: women: 800*.1 =80 admits to the hard program, 200*.9 = 180 admits to the easy program, so 260 total admits. Men: 200*.08 = 16 admits to the hard program, 800*.8 = 640 admits to the easy program, or 656 total admits.
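If you want to check the arithmetic yourself, here is a minimal Python sketch. The numbers are the ones from the example above; the names (`applicants`, `admit_rate`, `total_admits`, `overall_rate`) are just my own labels for illustration.

```python
# Applicants and admission rates taken from the example above.
applicants = {
    "women": {"hard": 800, "easy": 200},
    "men": {"hard": 200, "easy": 800},
}
admit_rate = {
    "women": {"hard": 0.10, "easy": 0.90},
    "men": {"hard": 0.08, "easy": 0.80},
}


def total_admits(group):
    """Total admits for a group, summed over both programs."""
    return sum(
        round(applicants[group][program] * admit_rate[group][program])
        for program in applicants[group]
    )


def overall_rate(group):
    """Overall admission rate for a group, ignoring which program was applied to."""
    return total_admits(group) / sum(applicants[group].values())


for group in applicants:
    print(group, total_admits(group), f"{overall_rate(group):.1%}")
```

Women have the higher rate within each program, yet the lower rate overall, because most of them applied to the program that rejects almost everyone.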

This isn’t just some “trick” either. When social scientists analysed the “stand your ground” defense law in Florida, they found that whites were more likely to be convicted than non-whites. BUT this was because whites were more likely to be accused of assaulting a white victim; it turns out that the probability of prosecution was higher if the victim was white than if the victim was non-white. You can see the details here.

workout notes: 4 mile walk after weights: rotator cuff, 5 sets of 10 pull ups, bench press: 10 x 135, 5 x 185 (strong), 10 x 170, incline: 10 x 135 (very easy), military: 10 x 50 standing, 20 x 50 seated supported, 10 x 200 machine, rows: 3 sets of 10 x 50 single arm. head stand, 2 sets each of 10 yoga leg lifts, 12 twist crunch.

November 29, 2016 Posted by | science, social/political, statistics, walking, weight training | Leave a comment

Pre-election Sunday….

Ok, the time for spinning is over and what do the numbers say? Here are the betting lines:


They range from 3/10 to 1/5 for Clinton, with most at 1/4. This is a slight change from last night, but not much of a change.
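To translate those lines into rough probabilities, here is a small Python sketch. I am assuming the lines are quoted as fractional odds (so "1/4" means you win 1 for every 4 staked), and I am ignoring the bookmaker's built-in margin, so treat the output as a ballpark only. The function name `implied_probability` is my own.

```python
from fractions import Fraction


def implied_probability(odds: str) -> float:
    """Convert fractional odds 'a/b' (win a for every b staked) to an implied probability."""
    profit, stake = (int(part) for part in odds.split("/"))
    # A fair bet at a/b odds breaks even when the win probability is b / (a + b).
    return float(Fraction(stake, profit + stake))


for odds in ("3/10", "1/4", "1/5"):
    print(odds, f"{implied_probability(odds):.1%}")
```

Under those assumptions, the lines correspond to Clinton win probabilities of roughly 77 to 83 percent.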

Here is Upshot’s list of models:


And here are several prediction maps (I’ve put the source in green lettering). This is the list (from most favorable to Trump to most favorable to Clinton)

Election Projection: 284
Fivethirtyeight (Nate Silver): 293
Electoral vote: 317
Benchmark Politics: 322
Predictionwise: 323
Princeton (Sam Wang): 323
Upshot (New York Times): 326

I’ve put together the maps, and labeled the source in green.

Some notes: Benchmark uses more data than just polling (e. g. economic indicators, history) and Predictionwise factors in betting lines for each state. And of course, each model factors the various polls a bit differently (e. g., how does one weight older polls? What track record does that polling outfit have? Is it a “likely voter” model or a “registered voter” model?)

But if you notice, the projected Electoral College count doesn’t vary that much; much of the dispute is in the “confidence interval”. Nate Silver’s model has a wider confidence interval (which can vary from a narrow Trump win to a Clinton landslide) and Sam Wang’s has a narrower confidence interval; I talk about this a bit more here.


November 6, 2016 Posted by | political/social, politics, politics/social, statistics | Leave a comment

Election predictions: why the models differ

I see quite a bit of angst over the predictions of the upcoming general election. So I hope to explain the basic difference in philosophies of the competing models.

First, here is the obligatory map; this time I used Predictionwise, which uses a blend of betting markets, polls and other data to assign a “probability percentage” of winning the individual states. The map I present shows the blue states as ones where Hillary Clinton has a 62 percent probability (or higher) of winning (by this model); I then explain what happens if one wants a higher threshold (say 80 percent, then 90 percent).


Now there are other models out there; fivethirtyeight gives Trump the highest probability of winning; Princeton gives him the lowest.

Why the difference? If you want full details, read Nate Silver’s explanation of the difference in models and his explanation as to why, though Clinton and Obama were in similar positions with regards to the popular vote, Obama was in better position with regards to the Electoral College.

First, look at this chart, taken from Upshot: (I cut out many of the “safely Democratic” and “safely Republican” states, and attached the header so you can see which model the estimates came from)


Note the 127 electoral votes in “close” states that Trump has to win.

Now consider two “extreme models” (both Nate Silver and Sam Wang are too competent to use either of these, but these extremes can explain the difference in confidence):

Extreme model 1: the vote percentage in the states is in lock step with the national averages. What that means: say Clinton’s average is 45 percent and in, say, Wisconsin, she is 3 points above that. Then Wisconsin is labeled as “D + 3” meaning she’ll get 3 points more than the national average. Now if there is a shift in the national polls, or if the national polls are just a bit off, that shift will be reflected in each state. For example, say the margin shifts 4 points in Trump’s direction (Clinton loses 2 points and Trump gains 2), so Clinton’s average is 43 percent nationally. Then in this model, Wisconsin’s “D + 3” now means 46 percent, down from 48. And that happens IN EVERY STATE.

Therefore a 2 point lead in each swing state becomes a 2 point deficit in each swing state, which indicates that Trump has a reasonable chance to win all of those close states, given a national surge or, say, the polls being off by a bit. Hence the uncertainty.

Of course, this works in the other direction as well; if the polls shift toward Clinton, she could win by a landslide. That explains the relevance of this remark by Nate Silver.


Now one could use the other extreme model: that the swing states are independent. That is, say, an increase in Trump support in New Hampshire is not correlated with an increase in Trump support in, say, Nevada. Now by that model, Trump is cooked; his chances of winning ALL of those tightly contested 127 electoral votes are basically zero, hence Sam Wang’s statement:


Now Wang is way too competent to make the simplistic assumption that the state results are independent of one another. But one has to remember that Clinton is using a sophisticated voter targeting operation in key states (her “firewall states”) and Trump has contempt for such operations. So a small Trump surge nationally might not help him close the gap in those states. Obama’s campaign manager Jim Messina explains that there.

Again, neither Silver nor Wang uses these extreme models; they are way too competent to do so. But their models weight uncertainty and polling error and the statistical independence of the states differently, hence the difference in probability.
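To see how much the independence assumption matters, here is a toy Python calculation of the two extreme models. To be clear, this is NOT either forecaster's actual model. The 2 point lead comes from the example above; the polling error standard deviation (3 points), the number of close states (8), and the assumption of normally distributed error are all numbers I made up for illustration.

```python
from statistics import NormalDist

LEAD = 2.0      # Clinton's lead in each close state, in points (from the example above)
ERROR_SD = 3.0  # ASSUMED standard deviation of the polling error, in points
N_STATES = 8    # ASSUMED number of close states Trump must sweep

# Probability that the error in a single state is large enough to erase the lead.
p_flip_one = 1 - NormalDist(mu=0, sigma=ERROR_SD).cdf(LEAD)

# Extreme model 1: one shared national error moves every state together,
# so flipping one close state means flipping all of them.
p_sweep_lockstep = p_flip_one

# Extreme model 2: every state has its own independent error,
# so Trump needs N_STATES separate lucky breaks.
p_sweep_independent = p_flip_one ** N_STATES

print(f"lock step:   {p_sweep_lockstep:.3f}")
print(f"independent: {p_sweep_independent:.6f}")
```

With these made-up inputs, the lock step model gives Trump about a 1 in 4 chance of sweeping the close states, while the independent model gives him almost no chance. Real models sit somewhere in between, which is why their probabilities differ.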

In a nutshell: Silver’s model has a wider “confidence interval” for the number of Electoral Votes (hence, higher probability of a Trump win or a Clinton landslide) and Wang’s confidence interval is smaller (centered around a modest but solid Clinton win in the Electoral College).

November 4, 2016 Posted by | political/social, politics, politics/social, statistics | 2 Comments

Mathy post: women’s legs, polls, pigeons and expectations …

Workout notes: it was 75 F and yes, 100 percent humidity again. THAT, plus my 3 in a row this past weekend (15 mile run Saturday, 13 mile walk Sunday, 4 mile race on Monday) left me tired. So I did a slow, untimed 6.3 (10K) run/walk. Today, it was enough.

Women’s legs and running

Ok, the youngest woman in the photo (calves!) is in her late 50’s. The one with the blonde ponytail is in her early 60’s; the other two are over 70. Yes, all frequently win awards at running races.

So the question: do these ladies get their legs from all of that running, or were these ladies attracted to running because they had the genetic potential to have good legs for running? The answer isn’t that clear, is it?

Bad Math Pun

Yes, I’ll put this as a bonus question on an exam at the appropriate time.

Speaking of statistics: Nate Silver gives a run down of the current state of the election. Clinton is about a 70 percent favorite in many models (including the betting lines) and about 90 percent in the Princeton model (see the NYT model and other models here). If you look at what we are seeing in the polls right now (Trump with a narrow lead in a few of them; the rest showing Clinton with up to a 6-7 point lead), we see that a 3-4 point Clinton lead best explains what we are seeing.

Presidential elections in “no incumbent” running years tend to be close (3 times in my lifetime, the popular vote spread was less than 1 point: Kennedy-Nixon, Nixon-Humphrey, Gore-Bush, twice it was 7-8 points: Obama-McCain, Dukakis-Bush).

And this leads me to another topic: conditional probability. This shows up in the famous Monty Hall problem.

Imagine a game: you are shown 3 doors; the prize is behind one door and the other two doors have nothing. Here is the rule: you pick one door. Then the person running the contest *always* shows you a door that does NOT have the prize. Always…and you know that the person running the show WILL do that.

So, should you switch to the door that you did not pick that remained unopened?

Answer: YES. And pigeons are actually better able to figure it out than humans!

Here is the math behind this:

You pick one door: 1/3 is your probability of success. Then you are given the option to choose from the two doors that you did NOT pick…that means if you switch, your probability of success climbs up to 2/3. Remember you only fail if you were right the first time.

Think of it this way: imagine there were 100 doors. You pick. Then you are shown 98 doors where the prize is NOT. Would you switch? Remember your probability of being right on the first choice was 1/100.
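If you don't trust the argument, you can simulate the game. Here is a minimal Python sketch; the function names (`play`, `win_rate`) and the trial count are my own choices, and the host follows the rule stated above (he never reveals the prize and leaves exactly one other door closed).

```python
import random


def play(switch: bool, n_doors: int = 3, rng=random) -> bool:
    """Play one game; return True if the contestant ends up with the prize."""
    prize = rng.randrange(n_doors)
    pick = rng.randrange(n_doors)
    # The host opens every door except your pick and one other door,
    # and never opens the door hiding the prize.
    if pick == prize:
        # Any other door can be the one left closed.
        other = rng.choice([door for door in range(n_doors) if door != pick])
    else:
        # The host is forced to leave the prize door closed.
        other = prize
    final = other if switch else pick
    return final == prize


def win_rate(switch: bool, n_doors: int = 3, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability of winning over many simulated games."""
    rng = random.Random(seed)
    return sum(play(switch, n_doors, rng) for _ in range(trials)) / trials


print("3 doors, stay:    ", win_rate(switch=False))
print("3 doors, switch:  ", win_rate(switch=True))
print("100 doors, switch:", win_rate(switch=True, n_doors=100))
```

Switching should win about 2/3 of the time with 3 doors and about 99 percent of the time with 100 doors.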

Here is where “conditional” comes in: label the doors I, II, III. You pick I. You are shown that it is NOT II.

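P(III | host opens II) can be calculated with Bayes’ Law. (Note that the condition is that the host OPENED door II, not merely that the prize is not behind II.)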
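Here is a sketch of that Bayes calculation, where the event being conditioned on is the host opening door II after you picked door I. The notation is mine, and I am assuming that when the prize really is behind door I, the host opens II or III with equal probability.

```latex
% Notation (mine, not standard): C_k = "prize is behind door k",
% H = "host opens door II", given that you picked door I.
% Assumption: if the prize is behind door I, the host opens II or III
% with probability 1/2 each.
\begin{align*}
P(H \mid C_{I}) &= \tfrac{1}{2}, \qquad P(H \mid C_{II}) = 0, \qquad P(H \mid C_{III}) = 1 \\
P(C_{III} \mid H)
  &= \frac{P(H \mid C_{III})\,P(C_{III})}
          {P(H \mid C_{I})\,P(C_{I}) + P(H \mid C_{II})\,P(C_{II}) + P(H \mid C_{III})\,P(C_{III})} \\
  &= \frac{1 \cdot \tfrac{1}{3}}
          {\tfrac{1}{2} \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{1}{3}}
   = \frac{1/3}{1/2} = \frac{2}{3}
\end{align*}
```

So the unopened door you did not pick has probability 2/3, and your original door stays at 1/3.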

September 8, 2016 Posted by | politics, running, statistics | Leave a comment

Jeb Bush: Trump Supporters Aren’t ‘A Bunch Of Idiots’ (he is right)

Jeb Bush said the following:

Former Florida Gov. Jeb Bush (R) said Saturday that supporters for GOP presumptive nominee Donald Trump aren’t “a bunch of idiots” and should be respected, CNN reported.

“What I fear is that people, kind of looking down their nose, will say the people that are supporting Donald Trump are a bunch of idiots. They’re not. They’re legitimately scared. They’re fearful,” Bush reportedly said at an event in Amsterdam. “They’re not as optimistic for legitimate reasons and there should be respect for that. And on the other side, a similar respect needs to be shown.”

Now of course, this statement (which I think should be obvious) has met with ridicule. Yes, I know, I know, we’ve all seen the cherry picked photos of Trump supporters and of Trump rallies:



So, yes, there are some dumb people supporting Donald Trump. And yes, there are some evil ones too.

But when we are talking about a national candidate with millions of supporters, a tiny selection of supporters tells you very little about the whole.

Here is an example of what I mean: think of 2008, when I was a proud Obama supporter. Well, some of then-Senator Obama’s support came from the…well, less than informed people

and some came from morally questionable people too.

Again, this is just statistics in action; the larger the group of supporters, the more that group resembles the population as a whole.

So, what can we say about Trump supporters, “in general”?

For one thing, on the average, they tend to have a higher household income than either Sanders supporters or Clinton supporters. (The data I report measures median household income; “median” means “the income in the middle of the supporters’ incomes: half of incomes are above, half are below”; this is done to mitigate the effects of a few very large incomes.)

72K per year as compared to 61k per year for both Clinton and Sanders supporters. Now this isn’t true in every state: in New Hampshire, Vermont, Connecticut and Virginia the median household income of a Clinton supporter exceeds that of a Trump supporter. Trump supporters earn more than Sanders supporters in all of the surveyed states.

Secondly, there is a positive correlation between income and IQ; on the average those with higher IQs tend to earn more money than those with lower ones. NOTE: the New Scientist article I linked to also deals with wealth, and there isn’t much of a correlation between IQ and household wealth (example: those with higher incomes might well spend more):

The work reveals that while exceptionally smart individuals typically earn more, they are also more likely to spend to their credit card limit, compared with people of average intelligence.

Jay Zagorsky at Ohio State University in Columbus, US, analysed personal financial information collected from 7500 people between the ages of 33 to 41. Subjects provided details about their cash flow – including wages, welfare payments, alimony, and stock dividends – and their overall net worth. They also answered questions about whether they had “maxed out” any of their credit cards, missed bill payments or filed for bankruptcy.


On the surface, Zagorsky’s analysis confirms the findings of previous studies linking higher intelligence with higher income. “Each point increase in IQ test scores is associated with $202 to $616 more income per year,” he says. For example, a person with a score of 130 (in the top 2%, in terms of IQ) might earn about $12,000 more per year than someone with an average IQ score of about 100.

On the surface, people with higher intelligence scores also had greater wealth. The median net worth for people with an IQ of 120 was almost $128,000 compared with $58,000 for those with an IQ of 100.

But when Zagorsky controlled for other factors – such as divorce, years spent in school, type of work and inheritance – he found no link between IQ and net worth. In fact, people with a slightly above-average IQ of 105 , had an average net worth higher than those who were just a bit smarter, with a score of 110.

Again, there is the correlation between INCOME (not net worth) and IQ.

So, if anything, the data might suggest that Trump supporters might be somewhat brighter than the Sanders and Clinton supporters, on the average. I say “might” because I don’t know the “n” for these income samples. It might be that the Clinton and Sanders groups are larger groups, and therefore subject to “regression to the mean” effects whereas the early Trump supporters might be a more selective sample of people (fewer people).

But I think that there is no evidence that Trump supporters are dumber than either Sanders or Clinton supporters.

May 22, 2016 Posted by | 2008 Election, 2016, politics, politics/social, social/political, statistics | Leave a comment

Rant: recognizing the limits of what one knows

I’ll admit that I am an expert in a very narrow slice of mathematics. But I am at least an AU from being an international or even a national caliber expert in that narrow field of mathematics.
And yes, I often read about topics that are not in my area; I enjoy popular books and articles on topics from the various branches of science, economics and the like.

Nevertheless, I also realize that when I read such a book or article, or when I attend a public lecture, I am getting a watered down, simplified treatment of the subject. I lack the context and the prerequisite knowledge to appreciate a presentation aimed at the experts.

And there lies one of my biggest frustrations when it comes to talking to people, either on the internet or in person. There are so many who really can’t detect the difference between expert knowledge and what they read (and perhaps half-digested …if that much) from a popular book. It is THAT level of “lack of humility” that makes some unpleasant conversation companions; I am ok with ignorance. After all, I am ignorant of the vast majority of human knowledge. I think that all of us are.

And, sadly, I see this lack of intellectual humility in political or social issues discussion, especially from the “losing side”. It appears to me that being on the losing side of an election (and I’ve been there, many, many times) brings out the worst in people in several ways.

Example: I had someone try to tell me that Hillary Clinton’s popular vote is “within the margin of error”, when one factors in the caucus states.

Of course, that is a dumb statement for a number of reasons.

1. There is a difference between a vote count and a poll count, even though both have a margin of error (remember Florida in the 2000 general election). The margin of error in a vote count is much smaller than it is for a poll.

2. The margin of error for a poll is 1.96 * \frac{.5}{\sqrt{n}}, where n is the number of people polled (assuming a 95 percent confidence interval and a relatively close election; this comes from the normal approximation to the distribution of a sample proportion). So as n increases, the width of the confidence interval, and therefore the margin of error, decreases. Note: for more on polls, read this wonderful little article written by a physics professor.

3. Hillary Clinton leads by about 3 million votes, even when one counts the caucus votes. The latter doesn’t add much as there are fewer caucus states, and these tend to be smaller states. Anyhow, she leads about 57-43.

4. The person making the claim appeared to not understand that winning a small state by a very large percentage didn’t make up for winning a bigger state by a smaller margin.
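To get a feel for how quickly the poll margin of error in point 2 shrinks as n grows, here is a minimal Python sketch of that formula. The function name `margin_of_error` and the sample sizes are mine; p = 0.5 corresponds to the "relatively close election" assumption and z = 1.96 to the 95 percent level.

```python
from math import sqrt


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion."""
    # With p = 0.5 this reduces to 1.96 * 0.5 / sqrt(n), the formula in point 2.
    return z * sqrt(p * (1 - p) / n)


for n in (400, 1_000, 10_000, 1_000_000):
    print(f"n = {n:>9,}: +/- {margin_of_error(n):.2%}")
```

A typical poll of 1,000 people is good to about 3 points either way; by the time n is in the millions, sampling error of this kind is under a tenth of a point, nowhere near a 57-43 split.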

Yes, knowing that Sanders won a lot of caucus states and that there IS such a thing as margin of error puts this individual into the “above average” category. But this person was clearly ignorant of their own ignorance.

There is another factor in play: I really think that desperation makes one dumber. When one really likes a candidate or a person, or even a sports team, it is tough to accept an unpleasant reality. I’ve become acquainted with the latter as an Illinois football fan (“yeah, we have a shot at being Wisconsin!” Sure.)

Desperation can lead to an abandonment of one’s values. Check out the Republican Chairman’s take on Donald Trump

Oh sure, few would be surprised at Donald Trump’s behavior, and I doubt that a certain type of Republican really cares that much (“hey, what do you expect with Trump anyway?”)

May 16, 2016 Posted by | Personal Issues, political/social, politics, poll, ranting, statistics | 1 Comment

West Virginia votes today and…an uncomfortable right wing cartoon

The cartoon:


Yes, liberals tend to reflexively take the side of the underdog and, all too often, liberals conflate complaints about the more regressive practices of Islam (example) with justifications of anti-Muslim bigotry (which I openly oppose).

I’ll make it clear: saying that Islam (on the whole) enables many regressive practices is NOT the same as opposing the building of mosques, backing noxious anti-Muslim immigration policies, etc.

West Virginia votes today This should be a rather easy victory for Sanders. This would cut Clinton’s lead in pledged delegates from 285 to 280 or so. However, this shouldn’t be like the 2008 blowout where Clinton crushed Obama by about 40 points (and still trailed by 100 delegates or so); the link is to an old Daily Show (with Jon Stewart) episode which had a funny take on it. Of course, I can put West Virginia in the Republican column right now, though it wasn’t always that way.

National Election

Donald Trump is now turning to the Republican Party for funds. So maybe this election will be more conventional than previously thought.

And yes, you’ll hear that Hillary Clinton is trailing in this battle ground state or that one. Reality: she has a good sized lead right now and it will take something special to change it.

And about the election coverage: Gin and Tacos, while giving Nate Silver proper credit, seems annoyed that many don’t realize that what he does is really, at least by academic standards, well, sort of basic. (and yes, Ed admits some jealousy, but what about me? I don’t even have the best blog on the 4th floor of my building! 🙂 )

I’ll tell you what I like about Nate Silver: he got his stuff out there, and in 2012, it was a very useful counter to all of the garbage that places like NPR were putting out. My friends who followed the election on NPR were scared to death, even though I told them that the election wasn’t close and showed them the battle ground state polls:

Screen shot 2012-11-06 at 4.38.49 AM

Romney only led in a few of these and always at “margin of error” levels. There was no hope for him here, though the media constantly reported a “close race”. Silver was the public face against such nonsense; I call the 2012 election a “victory for the nerds.”

May 10, 2016 Posted by | political/social, politics, politics/social, religion, statistics, Uncategorized | Leave a comment

Bad political writing: an example

I saw the following article on a FB friend’s wall. The article contained the following passage:

“In short, the Clinton campaign is in the midst of an historic collapse — much of it due to the unraveling of support for Clinton among nonwhite voters — and the national media has yet to take any notice.
Clinton’s 48-point lead in New York less than two weeks ago is now just a 12-point lead, according to the latest Quinnipiac Poll. That poll shows Sanders with approximately 300 percent more support among African-American voters in New York than he had in Mississippi earlier this month.”

Wow…that looks pretty bad for Clinton, right? Well, let’s look at a series of New York Democratic Primary polls (chronological order, 2016 polls only, with the bottom being the most recent):

Clinton + 21
Clinton + 21
Clinton + 48
Clinton + 12
Clinton + 10

Do you see what is going on? Clearly, the “Clinton by 48 points” is an outlier poll. But because using that poll as a baseline fit the narrative of the article writer, that is the baseline he used!
Climate change deniers do something very similar when they take an unusually hot year (say, 1998) as a baseline and then start arguing that subsequent years are cooler, when in fact those subsequent years are still warmer than the years preceding the unusually hot year.
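Here is a small Python sketch of how much the choice of baseline changes the story, using the five poll margins listed above. Using the median of the earlier polls as a "typical" baseline is my own choice for illustration; a proper polling average would weight the polls by things like sample size and recency.

```python
from statistics import median

# Clinton's lead in the five 2016 New York polls listed above, oldest first.
leads = [21, 21, 48, 12, 10]

latest = leads[-1]
earlier = leads[:-1]

# Baseline = the most favorable earlier poll (the outlier the article leaned on).
drop_from_peak = max(earlier) - latest

# Baseline = a typical earlier poll (here, the median of the earlier polls).
drop_from_median = median(earlier) - latest

print("drop measured from the outlier:", drop_from_peak)
print("drop measured from the median: ", drop_from_median)
```

Measured from the outlier the "collapse" looks more than three times as large as it does when measured from a typical poll.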

But back to the political article: yes, there HAS been movement toward Sanders, but hardly the outrageous amount that is claimed by the article.

While we are on the subject of “being misleading”, take a look at an official Trump campaign meme:


Yes, it doesn’t say anything false and it does cherry pick the most favorable poll…which still shows Trump trailing (albeit within the margin of error).

April 3, 2016 Posted by | politics, politics/social, statistics | Leave a comment

Games, free speech, terrorism, etc.

Workout notes: 10 K “run” on the track: 9:59, 9:44, 9:33, 9:32, 9:27, 9:44 then 3:10 walk/jog inner lane 2 laps (58:03 at 6, 1:01:13 for 10K). It was mostly an empty track.
Gads. Though this was not a race effort by any means, IT WAS WORK. Sigh…

Posts: It is the start of Thanksgiving break and so I played hooky and went to a daytime game (no classes). The Bradley women got creamed 72-59 by Western Michigan; WMU led by 16 before freely substituting.

But hey, it was a game to watch. 🙂

Statistics Yes, I know the technical definition of p-value and what “it means”. But attempts to “make it intelligible” to non-experts often fail:

What I learned by asking all these very smart people to explain p-values is that I was on a fool’s errand. Try to distill the p-value down to an intuitive concept and it loses all its nuances and complexity, said science journalist Regina Nuzzo, a statistics professor at Gallaudet University. “Then people get it wrong, and this is why statisticians are upset and scientists are confused.” You can get it right, or you can make it intuitive, but it’s all but impossible to do both.

No fly zones: Turkey shot down a Russian fighter. Ugh. Last I heard, Turkey claimed that the fighter was over Turkish airspace and Russia denies that.

Free speech A survey came out about whether it is a good thing to censor speech that “is offensive to minorities”. Not surprisingly, Democrats were more approving of censorship than Republicans (though NOT the majority of Democrats) and the youngest generation (millennials) were strongest in favor of censorship. The good news is that the more educated the person, the less likely that they would approve of censorship. That is good news, given some of the nonsense one hears coming from college campuses these-a-days.

Republicans and Donald Trump

Sure it is still early and most people haven’t started to pay attention to the election. Nevertheless, Donald Trump really is doing well and it should not be that surprising:

Indeed. You have a party whose domestic policy agenda consists of shouting “death panels!”, whose foreign policy agenda consists of shouting “Benghazi!”, and which now expects its base to realize that Trump isn’t serious. Or to put it a bit differently, the definition of a GOP establishment candidate these days is someone who is in on the con, and knows that his colleagues have been talking nonsense. Primary voters are expected to respect that?

And it isn’t a surprise that the terror attacks in Paris helped him:

Conventional wisdom on the politics of terror seems to be faring just as badly as conventional wisdom on the politics of everything. Donald Trump went up, not down, in the polls after Paris — Republican voters somehow didn’t decide to rally around “serious” candidates. And as Greg Sargent notes, polls suggest that the public trusts Hillary Clinton as much if not more than Republicans to fight terror.

May I suggest that these are related?

After all, where did the notion that Republicans are effective on terror come from? Mainly from a rally-around-the-flag effect after 9/11. But if you think about it, Bush became America’s champion against terror because, um, the nation suffered from a big terrorist attack on his watch. It never made much sense.

What Bush did do was talk tough, boasting that he would get Osama bin Laden dead or alive. But, you know, he didn’t. And guess who did?

So people who trust Republicans on terror — which presumably includes the GOP base — are going to be the kind of people who value big talk and bluster over actual evidence of effectiveness. Why on earth would you expect such people to turn against Trump after an attack?

Hey, Fox News and Rush Limbaugh created Donald Trump’s candidacy.

November 24, 2015 Posted by | civil liberties, politics, republicans, republicans politics, running, statistics | Leave a comment

A timely Kathleen Parker column

Kathleen Parker wrote a column which started out about Donald Trump:

Exhibit A: Donald Trump, who can’t stop talking about how rich he is.

My father used to say, “People who have it (money) don’t talk about it.” No one told his mother, whose tropes included, “If you got it, honey, flaunt it.” They didn’t get along.

“Be slow to know” was another of my father’s favorite refrains. As in, be a little mysterious, don’t give away everything, keep yourself to yourself. When I was a child, the most humiliating reprimand from a parent was, “Don’t be a showoff.”

To be a showoff was to signal to the world that you were so lacking in character or talent that you had to attract attention some other way. Enter Trump, though he does apparently have a talent for making money. It helps if your father leaves you millions, as Trump’s did.

Whereas humility was once the universally acknowledged virtue to which one aspired, today we “humble-brag.” As in: “I looked like a wet mop the day I got the Pulitzer.” Something like that. […]

But she goes on to say something else:

Cox and Emanuel hugged. She tweeted. I marveled. I should have tweeted that they hugged, but I’ve just written it so all those readers — did I mention 80 million? (#braggingisfun) — now know about it. Which is meaningless. What matters is that Cox has 1.3 million followers and I (@kathleenparker) have something well south of that.

I’m told this is embarrassing.

Really? I’m embarrassed when I forget that the word “media” is a plural noun and should be followed by “are” not “is.” I’m embarrassed when I put a comma before “but” when it follows a negative predicate. As in: Having few Twitter followers isn’t only embarrassing(,) but is also career-limiting, as the following anecdote illustrates.

I kind of chuckled about that.

But on my Facebook feed, I saw a friend share this meme:

millennial fail

When I went to the source of the meme, I saw that the creator was thrilled that so many were sharing it. That she misspelled “millennials” or that she made a massive logical error mattered not at all.

If you didn’t catch the logical error: there are 75.3 million millennials in the US (using 18-34 as the guide) to 74.9 million baby boomers. So even if both groups vote at the same rate (and they don’t), a majority of millennials voting for one candidate won’t ensure a “landslide”; in fact it won’t ensure a mere victory! Example: suppose 51 percent of the millennials vote for candidate X and 55 percent of boomers vote for Y…if both groups vote at the same rate, Y wins easily.
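Here is that example as a quick Python sketch. The population figures and vote shares are the ones from the paragraph above; I am assuming only these two groups vote, that everyone picks either X or Y, and (by default) that both groups turn out at the same rate. The function name `vote_share_x` is mine.

```python
MILLENNIALS = 75.3  # millions, from the figures above
BOOMERS = 74.9      # millions, from the figures above


def vote_share_x(mill_share_x, boomer_share_x, mill_turnout=1.0, boomer_turnout=1.0):
    """Candidate X's share of the combined vote of the two groups."""
    mill_votes = MILLENNIALS * mill_turnout
    boomer_votes = BOOMERS * boomer_turnout
    x_votes = mill_votes * mill_share_x + boomer_votes * boomer_share_x
    return x_votes / (mill_votes + boomer_votes)


# 51 percent of millennials vote for X; 55 percent of boomers vote for Y,
# so only 45 percent of boomers vote for X.
x_share = vote_share_x(0.51, 0.45)
print(f"X: {x_share:.1%}   Y: {1 - x_share:.1%}")
```

X carries a majority of the millennials and still loses, roughly 48 to 52. Lowering `mill_turnout` relative to `boomer_turnout` (which is closer to reality) only makes it worse for X.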

But hey, they are AWESOME. 🙂

July 11, 2015 Posted by | politics, politics/social, social/political, statistics | Leave a comment