
Talk:Margin of error

From Wikipedia, the free encyclopedia
Margin of error is a former featured article. Please see the links under Article milestones below for its original nomination page (for older articles, check the nomination archive) and why it was removed.
This article appeared on Wikipedia's Main Page as Today's featured article on October 19, 2004.

Article milestones
Date | Process | Result
October 13, 2004 | Featured article candidate | Promoted
March 3, 2007 | Featured article review | Demoted

Current status: Former featured article

Concerns

There are some concerns about this article's depiction of the margin of error as a statistic used specifically for polling. There is a discussion below about whether the margin of error is synonymous with the confidence interval and whether, as such, it should be more broadly defined. Fadethree 8:36, 19 Oct 2004 (UTC)

TeX (moved to Wikibooks)

Near the end of the section on comparing percentages we see this:

This seems to be a dangling phrase, not part of any sentence; it needs to be clarified. I'm guessing it means the cumulative distribution function of a specified normal distribution; if so, there are clearer ways of saying that. Michael Hardy 00:22, 4 Oct 2004 (UTC)

  • Hi, Michael. Thank you for the valuable suggestions. I should have known better about the capitalization and spacing issues; thanks for the corrections. Regarding the do-it-yourself section, my intention was to allow the user to simply copy and paste the line of code into a Microsoft Excel spreadsheet so that they could calculate the probability that Kerry (or anybody) is leading given the information from a poll. I reverted it back for two reasons: 1) to allow the cut and paste, and 2) to match the look and feel of Microsoft Excel, which uses A1 B2 C3 notation without subscripts. I changed some things around to make it clear that I was working in the context of Excel. Let me know what you think; I am happy to revert it back given different intentions. Best, Andrew (Fadethree) 05:35, 4 Oct 2004 (UTC).

In section "Comparing percentages: the probability of leading", is there any chance we could change " use a program like Microsoft Excel " to " use a spreadsheet program "? In my humble opinion, this would improve the Neutral Point of View... Charm 18:17, Oct 21, 2004 (UTC)
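For readers without Excel, the cut-and-paste calculation discussed above can be sketched in any language. Below is a hypothetical Python translation (function name and example numbers are illustrative, not from the article); it uses the standard-error-of-the-difference expression discussed elsewhere on this page, which assumes perfect negative correlation between the two options:

```python
import math

def probability_of_leading(p: float, q: float, n: int) -> float:
    """Approximate probability that the option polling at p truly leads q.

    p, q: the two options' poll shares (as fractions); n: sample size.
    """
    # Standard error of the difference, assuming perfect negative correlation.
    se_diff = math.sqrt((p + q - (p - q) ** 2) / n)
    z = (p - q) / se_diff
    # Standard normal CDF via the error function (Excel's NORMSDIST).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative example: 47% vs. 45% in a poll of 1,013 respondents.
print(probability_of_leading(0.47, 0.45, 1013))
```

A spreadsheet-neutral sketch like this sidesteps the neutral-point-of-view concern about naming a specific vendor.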

"poll"

If I'm not going completely mad, isn't use of the word poll a little inaccurate? Shouldn't it be a sample or sample survey? Or at least it is according to my textbook! Please advise... I'd hate to incorrectly edit a front-page article.

Also, I would have said, for example, 5% significance level rather than 95% confidence level. Does this equate to the same thing, or is it a difference in terminology across the Big Pond? If so, should both definitions be included?

How is poll inaccurate? We've been having a discussion below about assumptions that the margin of error is synonymous with the confidence interval. Is this your assumption? If so, note that this article defines the margin of error as a quite specific confidence interval; they are not equivalent. As to your second point, I am fairly certain that 95% confidence level is synonymous with 5% significance level. I would think that confidence is not as apt to be misconstrued as "significance." This talk section has been largely concerned with semantics, and while semantics are nontrivial, I would argue that the dominant audience of this article is people who are trying to understand the margin of error without a statistics background. I would like to err on the side of parsimony. Your thoughts? Fadethree 21:27, 19 Oct 2004 (UTC)
Wow! I will admit that I'm being a little pedantic here, and certainly didn't mean to rub anyone up the wrong way, but I found during my lessons in stats that terminology was very important. Point taken on both counts - the pedant in me is perhaps more needed on the statistics page! Cheers --Mark Lewis 14:34, 20 Oct 2004 (UTC)

Picture idea

If this article becomes featured, a picture should be added; in fact, one probably should be anyway (:. How about some sort of graphic depicting the margin of error in the provided Kerry/Bush example? siroχo 09:14, Oct 4, 2004 (UTC)

Re: Level of "Confidence"

(This comment copied from Fadethree's talk page:)

Regarding my comment on the FAC page. I appreciate the fine work you've done to get the article to this point. I appreciate your explanation, and I noticed you gave the alternate forms indirectly. I guess the pedant in me simply cringes at leading with an expression which is not fully qualified. Perhaps if in introducing the equation you could mention the 99% confidence interval (though I admit this could be ugly too). I would be happier if the equation was generalized with the proper variable in the numerator and a notation that "for 99% confidence, (numerator) is 1.27".

Also, several references I've seen (an online one is the iSix Sigma site) claim that a 95% confidence interval is more common, which matches my experience also. This may be a more serious issue than the form of the equation. Jgm 13:19, 6 Oct 2004 (UTC)

First of all, let me say that I think that you're right that this should be made clear from the outset, and after this comment I'll try to rearrange the definition to get the "disclaimer" moved higher up than the last line. It might not be as pretty, but I do think it's worth it for now. I may give up and just expand the last line and bold-face some issues I talk about below, but I'll try to move it up first.
To reply to the substantive issues, you're right that 95% confidence is by far and away the standard for "significance" in most of the sciences (though it is usually expressed as p<.05 or some such). In recent media polls, however, the standard seems to be, by far and away, a 99% margin of error. (edit: counterexample the AP poll ([link]).) This can easily be checked in any poll that reports both the sample size and the margin of error (if 1000 is 4% and 1500ish is 3% say). There are a few points to be made here about reporting widely misunderstood topics.
  1. First, no one says, "99% margin of error" or 95%, or anything like it. In other words, there seems to be an illusory consensus here about what the confidence should be. Wikipedia is bound to report what this consensus is, but it is also bound to educate people if this consensus is misleading or damaging. This is why I agree with you that we should make this more clear.
  2. Second, this is an interesting interpretation of the "mission" of Wikipedia. It is bound to report "facts," but what if the way that people understand a topic is fundamentally flawed and potentially misleading? Should it also be bound, then, to educate?
  3. Third, who is the intended audience here? Is it curious members of the voting public who read a poll and want to know what the margin of error is all about? Or is it people who are knowledgeable consumers of statistics and want to use the margin of error for their own social science (or other) work? The opening paragraph should try to aim for as broad a target audience as possible, but the distribution of prior knowledge here seems vast. These are just some thoughts. Fadethree 14:41, 6 Oct 2004 (UTC)

Re: Should a real or hypothetical example be used?

(This comment copied from the FAC page)

We should use a made up and neutral example, so we need not use disclaimers like "It should be clear that the choice of poll and who is leading is irrelevant to the presentation of the concepts." These disclaimers are ugly in the flow of a well-written article. ✏ Sverdrup 10:14, 6 Oct 2004 (UTC)

I guess we can frame this as an issue of accessibility. It is a great point that Wikipedia articles should be as general and neutral as possible. This suggests stripping the running example of as much context as possible. I have spent some time teaching, and a successful strategy I've used is meeting the audience (students or web surfers) at the level of their experience. It is much easier to learn a topic when you can relate to it, and this is why I've included an example that I think most people who hear "margin of error" can relate to. If this article does become featured, I think that this would be a great "hook." However, there is also a larger point that, well, this election will be over soon, and this article will live on well past Bush vs. Kerry. A neutral example would probably be applicable for a longer time (though of course it could be updated). I'd like to solicit some other people's opinions (including Sverdup's response; thanks for your comment!). Fadethree 14:41, 6 Oct 2004 (UTC)

Personally, I would say made-up examples tend to lose people too easily, and current examples tend to be contentious and require disclaimers; maybe a real but slightly older example might work as a compromise to avoid either issue?

I think you're right. Creating hypothetical names that are easy to distinguish would probably be an easy way to keep the situation relevant. We could use "Larry Leader" and "Pete Pariah" for the candidates.

questions

1) The last section of the article provides a formula for the standard error of the difference between two perfectly negatively correlated survey options, one chosen by p percent of the respondents, and one chosen by q percent of the respondents. The article notes that the assumption of perfect negative correlation "may not be a tenable assumption when there are more than two possible poll responses." The article then describes how the formula would be applied to the results of a 2004 Newsweek poll that reported voter support for John Kerry and George W. Bush.

The choice between Kerry and Bush probably wasn't perfectly negatively correlated, but the formula is applied to the Kerry-Bush example anyway. Is the formula, then, a reasonable approximation of the probability that one option is truly preferred if the two options are almost perfectly negatively correlated? (My question is sincere. I have a math background, but limited experience with statistics.)

2) Referring again to the formula for the standard error of the difference: The response to the questions immediately below on this "Talk" page states that "perfect negative correlation does not necessitate q=1-p," and that "q+p=1 ... is rarely true in most polls." Further below, however, under the discussion headed "Comparing percentages: the probability of leading," there appears to be agreement that the formula itself is based on the assumption that p+q=1.
    a) If the assumption p+q=1 was used in deriving the formula, should that assumption be added to the text of the article?
    b) Does there need to be an edit or correction to the statement "perfect negative correlation does not necessitate q=1-p"? (I realize that the statement could easily be true even if p+q=1 was assumed. I just want to ensure a correction wasn't overlooked, given that there seems to have been at least momentary confusion about the use of the assumption p+q=1 in deriving the formula.)
    c) If the assumption p+q=1 was used to derive the formula, can't the formula for the standard error of the difference then be simplified, as the questioner below suggests, to twice the (approximate) standard error for a survey percentage p? And if the answer is yes, might it make sense to go ahead and show the simplified formula in the article?

3) The article states that "the difference between two percentage estimates may not be statistically significant even when they differ by more than the reported margin of error." This may well be true, but I suspect many people are assuming that the difference is statistically meaningful when the two percentages differ not by more than the reported margin of error, but by more than twice the reported margin of error. The thinking would run this way: Candidate A is favored by 51%; Candidate B is favored by 46%; and I can be 95% confident that Candidate A is truly ahead since the margin of error is ±2 percentage points, meaning that Candidate A has at least 49%, while Candidate B has at most 48%.
    a) Would the statement still hold for twice the margin of error — that is, would the statement still be true if it said that "the difference between two percentage estimates may not be statistically significant even when they differ by more than twice the reported margin of error"? If so, an edit might be worthwhile, since it would probably address the more common assumption about the margin of error.
    b) Does "statistically significant" in the sentence above refer to the significance measured using a t-test for a single population? If so, might there be value in mentioning this parenthetically or in a footnote in the article for people trying to connect the various statistical measures of confidence? I do recognize the article is meant primarily for the average reader trying to make sense of a phrase commonly used in public opinion surveys, but I imagine there are some who, like me, are trying to gain greater clarity on just how statisticians themselves view the survey findings and the context they would judge those findings against. That curiosity is not entirely unnatural, given that the sentence itself includes a hyperlink to the Wikipedia entry on statistical significance.
    c) Can the statement above also be taken to mean that if the two percentages differ by less than the margin of error — or alternatively, by less than twice the margin of error — the difference is definitely not statistically significant?
    d) If the answer to my question 2c above is yes, could the statement I'm referring to in question 3 be made more specific, given the mathematical relationship between the (approximate) standard error of the point estimate p and the margin of error for the survey?
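The arithmetic in question 3 can be checked numerically. The sketch below uses illustrative numbers only (51% vs. 46%, with a sample size chosen to give roughly a 2-point margin of error at 95% confidence, not any real poll) and compares the naive "shift each candidate by the margin of error" test against the standard error of the difference discussed in the article:

```python
import math

p, q = 0.51, 0.46
n = 2400                                # gives ~2% MOE at 95% confidence
moe_95 = 1.96 * math.sqrt(0.25 / n)     # reported MOE, computed at p = 0.5

# Naive check described in question 3: do p±MOE and q±MOE overlap?
intervals_separated = (p - moe_95) > (q + moe_95)

# Direct check: standard error of the difference (perfect negative
# correlation assumed, as in the article's formula).
se_diff = math.sqrt((p + q - (p - q) ** 2) / n)
significant_at_95 = (p - q) / se_diff > 1.96

print(moe_95, intervals_separated, significant_at_95)
```

For shares summing near 1, the naive threshold (difference greater than twice the MOE) and the direct threshold (1.96 times the SE of the difference) come out nearly identical, which is why the rule of thumb in question 3 usually works.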

This has proved to be a useful article, by the way. Thanks to all the contributors. Nahodha (talk) 17:57, 21 March 2016 (UTC)[reply]


You say to assume a 99% level if unspecified, to be conservative. First, my impression is that 95% is typically the default. Second, in that case the 99% assumption understates the variance and thus is actually unconservative, or am I missing something here?

Separately, why use the constant 1.29 in the first example? That is correct for a point estimate of 1/2; for any other point estimate, the constant is wrong. (I do realize you give the correct formula later.)

Your probability-of-leading formula assumes perfect negative correlation. I assume here that you mean q = 1-p; why then not make the substitution p - q = p - (1-p) = 2p - 1, so that the variance = 4*p*(1-p)/N? That reads a lot simpler than the complicated formula you provide.
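The proposed simplification can be verified numerically: when q = 1 - p, the variance expression (p + q - (p - q)²)/N reduces algebraically to 4p(1 - p)/N, since 1 - (2p - 1)² = 4p(1 - p). A quick check at a few arbitrary test points:

```python
import math

def var_full(p: float, q: float, n: int) -> float:
    """Variance of the difference per the unsimplified formula."""
    return (p + q - (p - q) ** 2) / n

def var_simplified(p: float, n: int) -> float:
    """The same variance after substituting q = 1 - p."""
    return 4.0 * p * (1.0 - p) / n

# The two expressions agree whenever p + q = 1.
for p in (0.3, 0.47, 0.5, 0.62):
    assert math.isclose(var_full(p, 1.0 - p, 1000), var_simplified(p, 1000))
print("simplification holds when p + q = 1")
```

As the reply below notes, the two expressions diverge once p + q ≠ 1, which is why the article keeps the longer form.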

I generally like this article, and agree that it is a subject woefully misunderstood by the public and press. Often people think anything within the margin of error is equally likely, which of course is not true. It might be a nice example to show how two independent polls can be combined to give a new estimate with a smaller margin of error. Wolfman 03:50, 7 Oct 2004 (UTC)

  • These are very illuminating questions. Thanks for checking in.
  1. As I note to Jgm above, 95% is standard for most applications in the sciences, but 99% appears to be the standard for most polls reported by the media. As I note there, you can confirm this because polls that have a sample around 1,000 say that their margin of error is 4% (it would be 3% if a 95% confidence interval were used). As I also said to Jgm, this ambiguity (no one specifies what confidence level the margin of error should be) adds to misinterpretations about the statistic. (edit: though it looks like the AP is using the 95% ([link]). I think, though, that as long as some respected pollsters are using 99%, all of them should unless they report the level of confidence.)
  2. I think that you're missing something, but it's not an uncommon slip. Remember that in order to have higher "confidence," the confidence band must be larger. To be sure that you have a confidence band of 99% for a reported 50%, therefore, the margin of error must be larger. This is more conservative because it suggests that the poll has more potential error.
  3. You hit the nail on the head with this question. As you say, for any other point estimate, the constant is wrong. Yet this is the definition of the margin of error. The key thing to remember is that the margin of error is not the confidence interval of any percentage, it is the confidence interval placed at a particular reference point for the purpose of comparing polls.
  4. Another great question. You've just proven that what could be called the margin of error of the difference is twice the margin of error. It is a useful result for approximation. However, perfect negative correlation does not necessitate q=1-p, just that any person who switches votes switches to the other candidate and no other. Also, if we assume that q=1-p, we assume that q+p=1, and this is rarely true in most polls. Note that if it were true, we wouldn't need the standard error of the difference at all, we could just calculate the probability that p>50%.
  5. I have reservations about combining polls to reduce the margin of error because this requires an assumption that they are both random samples of the same population. This is a tenuous assumption for one poll let alone two, and often the discrepancies between polls tell us not about the population but about the respective sampling procedures of the pollsters. Still, it might be a valuable mathematical exercise, and I would be happy to have you add that section should you summarize these kinds of reservations. Thanks again for the insightful questions! Best, Fadethree 18:43, 7 Oct 2004.
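The pooling idea discussed in point 5 can be sketched as inverse-variance weighting. This is a hypothetical illustration only, made under the strong assumption (questioned above) that both polls are independent random samples of the same population; the function name and the two polls' numbers are invented:

```python
import math

def pooled_estimate(polls):
    """polls: list of (p, n) pairs. Returns (pooled p, pooled standard error).

    Weights each poll by the inverse of its estimated variance p(1-p)/n.
    """
    weights = [n / (p * (1 - p)) for p, n in polls]
    total = sum(weights)
    p_pooled = sum(w * p for w, (p, n) in zip(weights, polls)) / total
    return p_pooled, math.sqrt(1.0 / total)

# Two hypothetical polls of the same race.
p_hat, se = pooled_estimate([(0.48, 1000), (0.51, 1500)])
print(p_hat, 1.96 * se)  # pooled estimate and its ~95% margin of error
```

The pooled standard error is smaller than either poll's alone, which is the mathematical point; whether the underlying assumption holds for real polls is exactly the reservation raised above.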

Is 95% or 99% more common/conservative?

A [Zogby] poll came out today that used the 95% confidence interval, so I'd like to revise my earlier claim that "99% is by far and away the standard..." I should have looked at more polls before making that claim. However, I do not think that this refutes the point that I made in the opening paragraph, "...if the report does not state the confidence level... the 99% equation should be used." As long as some polls are using the 99% equation, they are using a higher standard (their margin of error will be larger for the same sample size). Therefore, the burden of fair reporting falls more squarely on the shoulders of polls using a lower standard. Overall, it is clear that the confidence level should always be reported, because the margin of error is not uniquely defined. This is my opinion as of now; let me know your thoughts. Fadethree 16:57, 11 Oct 2004.

Suppose a poll says the margin of error is 3% without specifying the confidence level (which is really 95%). Suppose I am personally interested in knowing the 95% confidence level. If I assume that the reported level is 99%, then I would infer that the true 95% level is maybe 2%. In that case, I am overly confident in the poll's accuracy, so assuming the 99% level is not conservative.
On the other hand, suppose 3% is actually the 99% margin of error, but I wrongly assume it is the 95% level. If I then want to know the 99% level, I would guess maybe 4%. That is an overstatement of the true margin of error, and thus conservative in that it understates the accuracy.
So, unless I have just got my head turned around wrong on this, it seems to me that the 95% assumption is the conservative one. And thus the safer level to use if the poll does not report the correct confidence level. Wolfman 04:03, 12 Oct 2004 (UTC)
I will try to reframe your argument in a way that is hopefully accurate and hopefully illuminating. There are three variables regarding the confidence level. A) What you assume it is, B) What it actually is, and C) What you want to calculate. Your argument reduces to the following premises: If A < B then C will be overestimated and conservative. If A > B then C will be underestimated and not conservative. You concluded correctly that it is conservative to assume the confidence level is less than it is ("...it seems to me that the 95% assumption is the conservative one."). But this tells us what is more conservative to assume, not what is more conservative to report. As you can see, if we want to be as conservative as possible, we want to minimize A and maximize B. As I noted in the introduction, the 99% equation should be used (i.e. reported) so that no matter what the readers assume, they will be conservative.
Of course, ideally, A = B so that C will always be right. This is one of the reasons why I advocate reporting the confidence level. Even if the confidence level is not reported, the formula for the margin of error can be inverted and solved for the numerator, which we can match to the confidence level.
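The inversion described above can be sketched as follows. This is an illustrative implementation (function name and test values are invented), assuming the margin of error was computed at the p = 0.5 reference point, so that moe = z * 0.5 / sqrt(n) can be solved for z:

```python
import math

def implied_confidence(moe: float, n: int) -> float:
    """Recover the confidence level implied by a reported MOE and sample size."""
    z = moe * math.sqrt(n) / 0.5          # invert moe = z * 0.5 / sqrt(n)
    # Two-sided confidence level P(|Z| < z) from the standard normal CDF.
    return math.erf(z / math.sqrt(2.0))

# A 4% margin of error on n = 1000 implies roughly 99% confidence,
# while 3% on the same sample implies roughly 94-95%.
print(implied_confidence(0.04, 1000))
print(implied_confidence(0.03, 1000))
```

This matches the rule of thumb used earlier in this discussion for checking which convention a pollster followed.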
Now, you could make an argument that this article suggests that the 99% confidence level should be assumed, and though I don't think it does, this would be a bad thing. Any tips that you suggest to make this more clear would be welcome. Thanks again for your question! Fadethree 6:23, 12 Oct 2004 (UTC)
From your reply, I think we agree on the basic principles. This is the line of the article that throws me: "If a report does not state the confidence level of the margin of error, the 99 percent equation should be used to prevent readers from underestimating the potential variance of poll results." I now think what you mean by this sentence is: if an author chooses not to report the level, he should use the 99% figure. I previously read it as: if an author has not reported the level, the reader should assume the reported level is the 99% figure. It's the passive voice "should be used" that creates the ambiguity. Wolfman 06:41, 12 Oct 2004 (UTC)
Gotcha. I see how that sentence is vague. I have changed it to, "If an article does not state the confidence level of the margin of error, it should report the margin of error at 99 percent confidence to prevent readers from underestimating the potential variance of poll results." I am still wondering if I should be more clear, but I'll sleep on this version for now. This was helpful; thank you. Fadethree 6:50, 12 Oct 2004 (UTC)
I find your revised wording clear. And, kudos on a fine article. Wolfman 07:01, 12 Oct 2004 (UTC)

Revisions

I have done a survey of polls, and I have concluded that 95% is certainly common enough to be designated "common." I have reorganized the opening to be more concise and take note of the different standards that are out there, and I have added formulas for the 95% (and 90%) confidence levels as well. I think this makes for a much more well-rounded article, and I'd like to thank Wolfman and Jgm for highlighting this issue for me. Best, Fadethree 21:48, 13 Oct 2004 (UTC)

Levels of confidence

It says:

The margin of error can be calculated directly from the sample size (the number of poll respondents) and may be reported at three different levels of confidence.

Surely it can be reported at far more than just three levels. Nothing stops a statistician from reporting a 98% or 97% margin of error. Maybe what is meant is that those three levels are frequently mentioned? If so, maybe this should be rephrased. Michael Hardy 01:14, 19 Oct 2004 (UTC)

Absolutely right, Michael. I have clarified this as follows, "...and is commonly reported at one of three different levels of confidence. The 99 percent level is the most conservative, the 95 percent level is the most widespread, and the 90 percent level is rarely used." I wonder if I should be more explicit... feel free to try another wording. Thanks also for your helpful edits to the opening. Best, Fadethree 01:47, 19 Oct 2004 (UTC)
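The three commonly reported levels can be illustrated with a short sketch. The z-values 1.645, 1.96, and 2.576 are the standard normal-approximation constants, and the p = 0.5 reference point follows the article's convention; the sample size of 1,000 is illustrative:

```python
import math

# Standard normal critical values for the three common confidence levels.
Z = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}

def margin_of_error(n: int, level: float = 0.95) -> float:
    """MOE at the p = 0.5 reference point for a given confidence level."""
    return Z[level] * 0.5 / math.sqrt(n)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: ±{margin_of_error(1000, level):.1%}")
```

For n = 1000 this reproduces the figures cited throughout this discussion: roughly 3% at 95% confidence and roughly 4% at 99%.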

Margin of Error not the only possible poll error

Should this article have a section about the need for the poll to have a truly random sample for the result to be valid? I.e., the idea that doing the Bush/Kerry poll by sampling everybody who works at the White House would not give an accurate result. I know this isn't strictly "margin of error", but some people won't realise this and (maybe) it should be pointed out. Steven jones 01:41, 19 Oct 2004 (UTC)

Thanks, Steven. There is a note about this under the caveats section, "...due to its unfortunate name (it neither establishes a "margin" nor is the whole of "error")..." and under the arguments against... section, "Perhaps most importantly, there are many different sources of error in polling..." In my first drafts for the article, I placed the misconceptions and overinterpretations at the center of the article, but many people noted, correctly, I think, that the article should center first on what the margin of error is and second on the overinterpretations. Your thoughts? Fadethree 02:00, 19 Oct 2004 (UTC)

Last paragraph of the Explanation section ("If the exact confidence intervals are used, then the margin of error takes into account both sampling error and non-sampling error") seems clearly inconsistent with Basic Concept section ("However, the margin of error only accounts for random sampling error, so it is blind to systematic errors..."). The Basic Concept section is correct, so I suggest deleting most of the last paragraph of Explanation section so that it reads: "The margin of error does not represent other potential sources of error or bias such as a non-representative sample-design, poorly phrased questions, people lying or refusing to respond, the exclusion of people who could not be contacted, or miscounts and miscalculations." Charlie nz 10:28, 23 August 2007 (UTC)[reply]

The emphasis placed on use of margin of error in polls suggests to me that there needs to be much more explanation of other sources of error in polls beyond what MOE estimates. The existence of other errors is mentioned more than once, but no information is presented that puts these other errors in perspective. As the media uses MOE extensively, the public is led to believe that polls are at least as accurate as the MOE. But this is an incorrect conclusion stemming from incomplete information provided by the media and repeated in the current article. Perhaps an article is needed on the accuracy of polls, and that article could link to one such source of error, MOE, as described in the current article. Considering the user interest in understanding how believable a poll is, Wiki needs such an entry. 64.27.230.232 (talk) 18:33, 14 June 2009 (UTC)[reply]

population size

It's a common intuitive assumption that the ratio of (sample size) / (population size) impacts the margin of error at a given confidence level: that is, that a sample size of 1000 out of a population of 10,000,000 is more accurate than 1000 out of 100,000,000. This isn't the case for "large enough" populations, but it isn't discussed in this article. I don't know the details well enough to discuss it precisely, but it's a frequent misconception, so it should probably be mentioned. --Delirium 02:31, Oct 19, 2004 (UTC)
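The point can be illustrated numerically with the textbook finite population correction. This is a sketch, not something from the article: the FPC factor sqrt((N - n)/(N - 1)) is the standard form, and the sample size and populations below are chosen only to show that the population size barely matters once it is large:

```python
import math

def moe_with_fpc(n: int, population: int, z: float = 1.96) -> float:
    """MOE at p = 0.5 with the finite population correction applied."""
    fpc = math.sqrt((population - n) / (population - 1))
    return z * 0.5 / math.sqrt(n) * fpc

# A sample of 1,000 from populations of very different sizes.
for pop in (10_000, 10_000_000, 100_000_000):
    print(f"N={pop:>11,}: ±{moe_with_fpc(1000, pop):.2%}")
```

Only the smallest population shifts the result noticeably; for 10 million versus 100 million the margins of error are essentially identical, which is exactly the counterintuitive fact described above.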

Yes, that is a very common misconception. I think it may be more appropriate for the sampling (statistics) page, however. The margin of error is, of course, a statistic derived from sampling theory; however, the misconception applies to all sampling theory, not just the margin of error. I'd be happy to see it added at that page, of course. Does this make sense? Fadethree 03:00, Oct 19, 2004 (UTC)
I took the liberty of adding a section discussing this (in somewhat informal terms), at the margin-of-error page, but perhaps it should eventually be moved to the sampling page; I'm agnostic about this. Terry 18:15, 21 Oct 2004 (UTC)
I appreciate the contribution, Terry. I do think that the topic is an imperfect fit here and that it makes more sense at the sampling page. Still, if this is the one-stop place for people who want to understand the topic, your addition will certainly be appreciated here. Wherever it ends up, I think it will be very helpful. Fadethree 10:26, 23 Oct 2004 (UTC)

Other uses of margins of error

I don't know if there should be more information appended to the main article or setup as separate pages, but margins of error are used in other fields for related reasons. For example:

  1. In many fields that use digital computation, the margin of error is the sum of all the possible sources of error, including the quantization error, sampling error, etc., giving an area which a given measured value could represent. As such, this shares a similar meaning for margin of error, but is applied to a completely different problem.
  2. In some fields of engineering, a margin of error is also known as a safety margin and is used as an index indicating the amount beyond what is strictly necessary. For example, the margin of error used when building a bridge may be the amount of additional strength provided beyond what is strictly necessary based on the mathematical calculations; this is used as a precaution to ensure that the structure doesn't fail under non-ideal conditions.

Fanboy 06:00, Oct 19, 2004 (UTC)

This is very interesting, Fanboy. I had never heard of these uses of the term. In some senses, the phrase's many meanings speak to the ambiguity of the phrase "margin of error" and make the common misinterpretations of the term less surprising. I think that these usages are sufficiently independent to warrant separate pages. I'm not sure if we can call this current page Margin of error (statistics), since your descriptions are also statistical in nature. Perhaps Margin of error (polling). I wouldn't want to make that move until other pages about the Margin of error were up. Thoughts? Fadethree 06:28, 19 Oct 2004 (UTC)
The common thread between all variants of the phrase "Margin of Error" is that the phrase is used to indicate a given range that may be attributed to errors in a calculation or measurement. In popular culture (particularly around the time of political events) this phrase becomes part of the common vernacular because of polls - which is why this page probably was formulated the way it is and why I'm not sure about adding another page. I actually think that the way Trollminator had changed things (which was rolled back) might be a good thing to put back in place, as it provides a slightly more accurate definition of Margin of Error - but it still uses polls as the primary example of margin of error, since this is the primary usage by the general public. Fanboy 06:46, Oct 19, 2004 (UTC)
There are some interesting semantic points here. You probably noted my comment at User talk:Trollminator, and as I note there and in the article, the margin of error (that the article discusses) is a manipulation of the standard error of measurement that is used to compare polls. However, I do not believe that we should confuse the margin of error with the standard error of measurement, which is what I think Trollminator and your definitions are beginning to do. I think that the definition that you note, "the phrase is used to indicate a given range that may be attributed to errors in a calculation or measurement" is already well defined as a confidence interval and should be left at that.
In a sense, defining the "margin of error" as a statistic for polling is a method of triage. The term is simply misleading as an expression of error; as you know, the notion of a "margin" for sampling error is illusory; 95% or 99% are arbitrary standards that have little to do with the complex decisions that these statistics purport to inform. So, I would argue, conflating the "margin of error" with the "standard error" or standard error-based statistics is to perpetuate confusion and misinterpretation for potentially high-stakes decisions (as examples in the article demonstrate).
I think this is an important conversation to have, and I hope I am making my points clear. Please let me know if I am making any sense. Best, Fadethree 07:19, 19 Oct 2004 (UTC)
The problem with this topic is that because "margin of error" in statistics is a measure of the uncertainty in an estimate of a parameter, it gets used in different ways by different people who are using the same statistical base.
The errors in polling almost exclusively relate to uncertainty in getting a grasp of the topic at hand - due to the complexity of the situation, or to modelling the entire population on a smaller, potentially non-representative group. The "margin of error" is used to indicate that although you can measure responses perfectly (since in a poll usually everything is a discrete event: yes/no, blue candidate/red candidate/green candidate), there is still some uncertainty as to whether the parameter you are estimating is properly serviced by the question posed.
The "margin of error" definition used in my first point above is a calculated range based on the knowledge that when measuring something digitally you're not sure exactly what you are measuring. This is sometimes used in designing a system as a tolerance of what may be allowed or what may be measured. Try doing a Google search for "margin of error GPS" and you will see another common use of the phrase peppered throughout literature related to mapping coordinates using traditional and GPS systems. In my second point, the margin of error is the maximum allowed uncertainty in the estimate of a parameter for a design for which the design will still not fail. A similar Google search for "margin of error civil engineering" will provide some samples of margin of error used in civil engineering related discussions, although "margin of safety" (to reflect that the margin of error is being used for safety purposes) or the shortened "margin" are also commonly used in civil engineering for this parameter. Fanboy 13:49, Oct 19, 2004 (UTC)

Margin of error is a commonly used statistical term, at least in the UK. Since polling is a very popular use of statistics, it is not surprising that the only place many people have seen it is in polling, but the statistics of polling are the same as those of any other sampling measurement. This page should have a slight re-write to reflect that - it's fine to have most of the stuff as polling examples, but it should show that it is not just a polling topic. Mark Richards 15:33, 19 Oct 2004 (UTC)

It is used virtually as a synonym for confidence interval in all branches of statistics. I made edits earlier but they were changed back by Fadethree - I still think this is not just a polling term. [1] here is an example of a non-polling use. Trollminator 15:45, 19 Oct 2004 (UTC)
Yes, a quick search pulls up many non-polling examples: general statistics, medical statistics, general statistics, medical salaries data, etc. etc. Intrigue 16:00, 19 Oct 2004 (UTC)

I will try to respond to all of the above points here. It is perfectly clear that "margin of error" is a term used well outside the "field" of polling, and I understand that it is often used as a synonym of "confidence interval." I was perhaps overly simplistic at the top of this page, and I didn't want to make my argument there and claim authority by page placement. Let me try to make my point succinct. The issue isn't whether the margin of error is used outside of polling, it is whether it should be. I know that this comment may come off as initially offensive to people who have a grounded and perhaps lifelong attachment to the phrase, but give me a brief chance to make an argument. This argument will not be made from authority, so I don't believe any web page with any number of uses of "margin of error" will necessarily refute it. This argument is for "cutting our losses" with the margin of error at the level of polling and preventing the term from spreading unnecessarily into the vernacular of other disciplines.

I liken "margin of error" to any number of phrases in the language which sound catchy, pique interest, marshal authority, roll off the tongue, and, unfortunately, are inappropriate and misleading. It is no coincidence that an example of these is "statistical dead heat," a bloated and meaningless phrase that stems directly from misinterpretations of the "margin of error." The argument boils down to two points.

  1. We already have established and explicitly defined terms that render the use of the margin of error unnecessary. Confidence intervals and standard errors do the job just fine.
  2. The margin of error, when used as a synonym for "confidence interval," places the illusion of a hard, deterministic border on the interval (look up "margin") which simply does not exist. This encourages mischief and has led to widespread misinformation (see article).

A potential counterargument might be that Wikipedia has a responsibility to report all definitions of the term. I would rephrase that as: Wikipedia has a responsibility to report all valid definitions of the term, and has an even greater responsibility to address common misconceptions, resolve ambiguities, and establish appropriate use.

I will leave it at this for now, but I hope that the structure of this article and my motivations for reining in the definition are becoming clear. Let me add that this has become enough of an issue to make some note about it in the article, and I would appreciate edits and/or suggestions now or once this discussion tapers off. I also appreciate the civil nature of the discussion thus far. Best, Fadethree 17:57, 19 Oct 2004 (UTC)

Thank you - I, too, appreciate civil debate. I am not sure that I agree that the term is vernacular in other fields - it has a long standing technical definition among professional statisticians outside of polling. I hope that you will agree with the explanation I am adding at the top. Let me know if you do not. Trollminator 18:19, 19 Oct 2004 (UTC)
Thanks, Trollminator. I just took out the reference to confidence intervals because I worry that this will lead to a common misconception in polling, that the margin of error is the confidence interval for every percentage. I'd also like to hold off on other fields until we can list them or integrate "margin of error" into the confidence interval page in a way that makes a clear distinction between uses in polling and uses elsewhere. I think that by noting that this is a term used in polling, it is implicit that it is used elsewhere. I also think that the margin of error in polling is probably the most widely held association. Is this okay? Fadethree 19:08, 19 Oct 2004
I guess I've got 2 more cents to add to this discussion, and I may later have time to attempt a proper update to this page if someone else is not able to do so first. On the question of whether margin of error should be used outside of polling, I would have to say yes. The reasons for this are two-fold.
  1. The term Margin of Error is a statistical term that came into use for polling, because of the statistics that polls are based on. Other things that use this term for the same reason should be able to do so.
  2. Other uses of Margin of Error often more appropriately fit the literal definition of the term. Going back to my engineering example from above, margin of error there fits the dictionary definition of a margin (from Merriam-Webster: "3.a: a spare amount or measure or degree allowed or given for contingencies or special situations") better than polls do. So the margin of error is the spare measure provided in case of errors in calculations/measurements, to allow for contingencies and non-ideal situations.
When this page is updated, however, I do think that the comments by Mark Richards should be noted, as polling is the one use of the term that most people can be assumed to understand, having seen it in the literature, and hence it can be used as a strong example in explaining the statistics. Fanboy 5:30, Oct 20, 2004 (UTC)
I look forward to your edits, Fanboy. I still do not think that your points address my first point, that we already have well-established and less misleading terms than "margin of error," but I apparently underestimated the extent to which the term has taken root in other fields. Thank you for your helpful comments, and let's just hope that this article's time on the main page has cleared some of the misconceptions held by some of the laypersons and media folk without oversimplifying the nuanced arguments that we're having here. Best, Fadethree 06:14, 20 Oct 2004 (UTC)
Election polling is hardly the only use of the expression margin of error, and by no means the only use most people see made of it. Concocting another term for those other usages so as to leave this usage as the exclusive definition is improper, as an encyclopedic entry should be encyclopedic and reflect the actual spectrum of correct usage. Otherwise Wikipedia is in danger of becoming fanciful, misleading, and of limited use as a reference source.

Bayesian confidence intervals


Margins of error should be compared with Bayesian credible intervals, whose interpretation is much more natural (i.e. they give you the probability that a parameter falls within a given range). There doesn't appear to be much material on this in Wikipedia, however. I'll write something if I have time.

Why aren't those formulas in MathML?

Is that a major MediaWiki issue, or truly its worst limitation?

We use TeX and html for math here. Maybe someday MathML will be supported, but at this time it isn't, because there's not really a need. There are other issues that are more urgent. siroχo 11:42, Oct 19, 2004 (UTC)

Comparing percentages: the probability of leading


For the example where the probability of a candidate leading is computed, it says, "These calculations suggest that the probability that Kerry is "truly" leading is 74.66 percent." Technically, the 74.66% is a (one-sided) p-value, not a probability. That is, it is the probability that we would see a lead of 2% or more for Kerry given that Kerry and Bush are exactly tied. If, for example, we knew it was 47/45 the other way, we could still by random chance draw a sample showing a 2% Kerry lead, even though there is probability 0 he would be actually leading at that moment. Or is this getting too technical for what is supposed to be a relatively nontechnical illustration?

Also, the previous comment "questions" mentions that we don't need to assume p+q=1 for the results, but I think the formula for the standard error of the difference used that assumption. Otherwise, the covariance term would be 2*sqrt(p*q*(1-p)*(1-q))/N since covariance is the product of correlation and the standard deviations.
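As a numerical check of the covariance point above, here is a short Python sketch. The figures p = 0.47, q = 0.45 and n = 1013 are assumed for illustration (chosen so the standard error of the difference lands near the .03 used in the example), not taken from the poll itself:

```python
from math import sqrt

# Hypothetical poll figures (assumed for illustration):
p, q, n = 0.47, 0.45, 1013

# Treating the two proportions as independent (no covariance term):
se_indep = sqrt((p*(1-p) + q*(1-q)) / n)

# Multinomial sampling: Cov(p_hat, q_hat) = -p*q/n, so the -2*Cov
# term adds 2*p*q/n to the variance of the difference:
se_multinomial = sqrt((p*(1-p) + q*(1-q) + 2*p*q) / n)

# Perfect negative correlation, as in the comment above:
# -2*Cov = 2*sqrt(p*(1-p)*q*(1-q))/n
se_neg_corr = sqrt((p*(1-p) + q*(1-q) + 2*sqrt(p*(1-p)*q*(1-q))) / n)

print(round(se_indep, 4), round(se_multinomial, 4), round(se_neg_corr, 4))
```

With these figures the multinomial and perfect-negative-correlation versions nearly coincide (about .030 and .031), while ignoring the covariance entirely understates the standard error considerably.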

I took the Bayesian route because I think it is easier for people to understand. The Bayesian vs. frequentist debates over interpretation are fascinating in the ivory tower, but I don't want to risk alienating the lay user by getting into philosophy when it won't make too much of a consequential difference. You are welcome to make an edit, of course. About the p+q=1 bit, you are absolutely right; I was mistaken. Best, Fadethree 19:14, 19 Oct 2004

Probability of leading


If we're assuming perfect negative correlation, does this not imply that p = 1-q and q=1-p? If so, the formula for the spread considerably simplifies to this:

Also, in this limited situation, assuming p is the larger frequency, p-q=p-(1-p)=2p-1, so we can rewrite the above as

I can imagine adding a table to the article which shows, for various values of p-q and N, the approximate probability of leading. I will construct such a table, unless someone spots an error in my reasoning. Deco 02:13, 3 Nov 2004 (UTC)

I just realized, since the margin of error is defined in terms of N, we can use this formula to calculate the spread instead, where r is the margin of error and C depends on its confidence:
Now we can construct a table of probabilities of leading with margin of error on one axis and difference in percentage points on the other. Since both of these only get up to about 10 in situations where it matters, this table can effectively be complete (maybe have one for 95% confidence margins and one for 99%). This seems like an especially useful table to have. Deco 05:45, 3 Nov 2004 (UTC)
Added those tables. They agree with the text, which is nice. I calculated them using the above formulas, and I believe they're correct. Deco 06:37, 3 Nov 2004 (UTC)
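Since the TeX formulas were moved off this page, here is a sketch of the table-building idea in Python. It rests on my reconstruction (not necessarily Deco's exact formula) of the two-candidate assumption p + q = 1, under which the standard error of the lead is 2r/C, where r is the reported margin of error and C its critical value (1.96 for 95% confidence):

```python
from statistics import NormalDist

def prob_of_leading(d, r, crit=1.96):
    """Approximate probability the poll leader is truly ahead, given
    an observed lead d and margin of error r (both in percentage
    points), assuming a two-candidate race so the standard error of
    the lead is 2*r/crit."""
    return NormalDist().cdf(crit * d / (2 * r))

# A small version of the proposed table: rows are the observed lead,
# columns the 95% margin of error.
for d in (2, 4, 6):
    print(d, [round(100 * prob_of_leading(d, r), 1) for r in (2, 4, 6)])
```

One familiar landmark: when the lead exactly equals the margin of error, the probability of truly leading comes out near 84 percent.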
The tables, as stated in the article, apply to two-candidate contests. How applicable are they to multi-candidate races? Canadians, for example, may come to this page from the 39th Canadian General Election page, and our polls will often show five possible responses, at least three of which will have non-negligible shares of the pie. Seems like a good idea to address polls with more than two answers, since that'd cover many situations outside of politics, too. --142.206.2.9 19:28, 19 December 2005 (UTC)

A recent edit to the equation (17th August 2012) added the factor of 2, but there are two instances of the formula and the 2 is missing from the second. In addition, from my limited understanding, the 2 × standard deviation corresponds to a confidence level of 95%, so would it be better to write it as "Standard error = <formula>, to a 95% degree of confidence"? Kevfquinn (talk) 12:33, 24 August 2012 (UTC)

On a related note, the second expression of the formula in "Comparing Differences" for "Standard error of difference" is not equivalent to the first; the signs are incorrect. The first formula makes sense; I don't see the value in expanding the equation, but if it's useful the expansion should be:

Kevfquinn (talk) 12:40, 24 August 2012 (UTC)

unclear sentences


"Given the actual percentage difference p − q (2 percent or .02) and the standard error of the difference, above (=.03), use a program like Microsoft Excel to calculate the probability that .02 is greater than zero given a normal distribution with mean 0 and standard deviation .03."

I am trying to wrap my mind around how .02 can be less than zero...
Should this read: "Given the actual percentage difference p − q (2 percent or .02) and the standard error of the difference calculated above (.03), use a program like Microsoft Excel to calculate the probability that a sample from a normal distribution with mean .02 and standard deviation .03 is greater than 0."?
I just changed it to the above because I think it's clearer.
Can someone fix this bit, as I think it's confusing and incomplete. I am not a statistician, although I have a good grounding in maths, but this is where I get lost in the article.
(a) please show an example of the Excel formula used to calculate the probability. Is it NORMDIST? How can the probability (by implication) be independent of the confidence level?
(b) the standard error (deviation?) of the lead is unconnected to the standard error or margin of error in the table. I don't see how using the standard error of the lead can confirm any probability in the table... RodCrosby 19:45, 7 December 2006 (UTC)
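On point (a): yes, it is the cumulative normal function; in Excel the formula would be =1-NORMDIST(0, 0.02, 0.03, TRUE). The probability is independent of the confidence level because it is built from the standard error directly; the confidence level only enters if you have to convert a published margin of error back into a standard error. A Python sketch of the same calculation, using the article's .02 lead and .03 standard error:

```python
from statistics import NormalDist

lead, se = 0.02, 0.03  # observed lead and standard error of the lead

# P(true lead > 0), treating the observed lead as the centre of a
# normal distribution -- the equivalent of Excel's
#   =1-NORMDIST(0, 0.02, 0.03, TRUE)
prob_leading = 1 - NormalDist(mu=lead, sigma=se).cdf(0)
print(round(prob_leading, 4))  # about 0.7475
```

This gives roughly 74.7 percent, in line with the 74.66 percent quoted in the article (the small discrepancy presumably reflects a slightly different standard error used there).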

"Formally, if the level of confidence is 99 percent, one is 99 percent certain that the "true" percentage in a population is within a margin of error of a poll's reported percentage for a reported percentage of 50 percent. Equivalently, the margin of error is the radius of the 99 percent confidence interval for a reported percentage of 50 percent."

These sentences are quite unclear. I don't even understand what the 50% is about, so I can't fix it yet. --MarSch 18:17, 24 Jun 2005 (UTC)

The wording is a bit odd. What they're saying, for example, is that if I take a poll with a margin of error of 5% and a confidence level of 99%, and 50% of people polled choose A, then we can conclude that in reality there is a 99% chance that between 45% and 55% of all people would choose A. If more or fewer than 50% of people polled chose A, it's a little bit more complicated - for example, if 99% choose A, then in reality there is a better than 99% chance that the actual percentage is between 94% and 100%. It's a little technical - maybe we can gloss over it all somehow without being totally wrong. Deco 01:08, 25 Jun 2005 (UTC)
Aha, I've got it. Fixing. I made it just talk about "at least 99%" probability that it's within one margin of error, which is accurate and hopefully gets the idea across. Deco 01:10, 25 Jun 2005 (UTC)
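The "at least 99%" wording can be seen directly from the formula: the half-width z·sqrt(p(1−p)/n) is largest at p = 0.5, so an interval built from the p = 0.5 maximum over-covers for any other reported percentage. A quick sketch (the sample size n = 1000 is hypothetical):

```python
from math import sqrt

def half_width(p, z=2.576, n=1000):
    """Half-width of a confidence interval for one proportion;
    z = 2.576 is the 99% critical value and n = 1000 a hypothetical
    sample size."""
    return z * sqrt(p * (1 - p) / n)

# Widest at p = 0.5 and much narrower near the extremes, so quoting
# the p = 0.5 maximum for every reported percentage is conservative:
print(round(half_width(0.50), 4))  # the maximum margin of error
print(round(half_width(0.99), 4))  # far narrower
```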

Margin of error and "statistical dead heat"


I think the article makes some good points on the misuse of margin of error but perhaps overstates the point. Certainly it's true that the margin of error quoted for the individual percentages is not equal to the margin of error for the "lead"; however, if one knows the margin of error for the lead, this does tell one the confidence one may have that the person leading in the poll is actually leading. If the lead does not differ from zero at a certain confidence level, then one may rationally say that we do not have confidence that the party ostensibly leading is actually leading. Describing this as a "statistical dead heat" is reasonable. The fallacy is to believe that we then have no idea who is leading. If the best estimate for candidate A is leading the best estimate for candidate B, then we still conclude that it's more likely than not that A is leading; it's just that we cannot be confident of it if the difference does not fall outside the appropriate confidence interval. The point is that the correct margin of error (for the lead) does tell us something about how seriously to take the reported lead, and the reader of this article may be misled to think this is not the case.

I agree. As a biologist, I'm always reluctant to draw any conclusions from results which fall outside the 5% confidence interval. Don't get me wrong, I'm not saying it is not more likely that A is leading B, clearly it is, but in biology anyway, the generally accepted idea is that you just do not draw conclusions without data supporting at least a 5% confidence interval. The difference is simply not statistically significant. This is not to say there is NO difference between the two, simply that the data does NOT allow us to draw any conclusions with the degree of confidence necessary for us to pay any heed to the said conclusion. Or put it this way: our data suggests A is more likely to be leading B, but our data does not have enough evidence to suggest this theory is probably correct.

Title


Shouldn't the title use the word "sampling"? Maurreen 6 July 2005 12:55 (UTC)

I agree that "Margin of sampling error" would be more correct. But the more common usage is "Margin of error", and we should probably follow that in our title. -- Avenue 01:16, 2 May 2006 (UTC)

Meaning of MOE


I'm mathematically inclined, but not a statistician, and currently I'm a little uncertain of the meaning of MOE. Is the following correct:

Given this information:

Level of confidence: 95 %
Margin of error: 4%
Results:
 A:50%
 B:30%
 C:20%

There is a 95 % chance that A is within 4% of correct.
There is a larger than 95% chance that B is within 4% of correct.
There is an even larger percent chance that C is within 4% of correct.

Is that all correct? Overall this seems to be relatively useless. It seems polls would do better to leave off the MOE but give the sample size, allowing people who care to calculate the MOE themselves for whatever confidence level they desire. 71.0.197.230 03:07, 24 May 2006 (UTC)

Roughly speaking, that's all correct. And if it was a simple random sample from a large population, you could just give the sample size and let people calculate their own MOEs. But usually most of the audience for survey results wouldn't know how, or be able to calculate it even if you showed them how. More importantly, many surveys use more complex survey designs, meaning that it becomes much more difficult to calculate MOEs. People would need access to the survey data, details of the sample design, specialised software and statistical expertise to calculate their own margins of error. -- Avenue 04:37, 24 May 2006 (UTC)
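For the simple-random-sample case described above, the do-it-yourself calculation is short. A sketch, assuming simple random sampling (the design-effect caveat for complex surveys still applies):

```python
from math import sqrt
from statistics import NormalDist

def srs_moe(n, p=0.5, confidence=0.95):
    """Margin of error for a simple random sample of size n,
    defaulting to the conservative maximum at p = 0.5."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

# Roughly 600 respondents give the 4% maximum MOE in the question:
print(round(100 * srs_moe(600), 1))  # 4.0
```

Quadrupling the sample size halves the margin of error, which is why precision gets expensive quickly.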

I believe your understanding is incorrect. Please see my comment about a common misunderstanding in the discussion on "Serious problems with the article". When you say "There is a 95% chance that A is within 4% of correct", what do you mean statistically by 95% chance? The true A is a constant with no statistical distribution. From exactly the same population and using exactly the same method, but a different size, another sample can generate A: 48% with MOE = 3%, or A: 52% with MOE = 8%, but the true A remains exactly the same. If using the same sample size, one may have A = 49%, 52%, 53% with the same MOE (4%). Again, the true A remains the same unknown constant. If a polling organization uses the same confidence level (95%), we can say that among all its polling reports, they capture the true statistic (e.g. A) 95% of the time within the reported MOEs, which may vary from one report to another. Zipswich (talk) 22:09, 17 March 2009 (UTC)

The anonymous editor who posted that query almost three years ago is probably long gone, but FWIW, you seem to be misinterpreting their question. The problem is that "margin of error" is a vague term; it is sometimes used to mean "the half-width of a confidence interval for a specific statistic" (which is how you've interpreted it here), but often means "the maximum confidence interval half-width for any statistic" (across percentages based on the whole sample for a particular survey, say). The second meaning seems to be what the original query was about, since they quoted one margin of error that applied to three different results (A, B and C). -- Avenue (talk) 23:13, 17 March 2009 (UTC)
Actually, I did interpret it as the conservative maximum confidence interval assuming p = 0.5. Otherwise I would not have stated "If using the same sample size, one may have A = 49%, 52%, 53% with the same MOE (4%)", because the MOE would change according to p even with the same sample size unless the maximum interval is assumed. No matter what confidence interval is used for the MOE, the original statements such as "There is a 95% chance that A is within 4% of correct" bear the typical misinterpretation of a confidence interval. I can see that a Bayesian could argue it is the same as saying "There is a 95% probability that the interval of 50% +/- 4% contains A". Probably I am getting into the type of debate between Bayesians and frequentists that can never be resolved and will last forever. I personally believe it is better for the public to think the pollsters capture the true value within the margins of error 95% of the time and miss it 5% of the time than any other interpretation. Zipswich (talk) 15:37, 19 March 2009 (UTC)
Sorry, I now see I was mistaken about which of those two MOE definitions you were using. However, I still think you missed the point of the original question. I believe they were surprised to learn that the coverage probabilities for the intervals derived from a single maximum MOE for a given sample size will vary depending on the value of the estimate, and couldn't see the point of reporting a maximum MOE if it had that property. While I agree with your point that they did not exhibit a correct frequentist understanding of probability in their question, I think that would have been far from the most germane response, especially as the question did make sense under a Bayesian interpretation. -- Avenue (talk) 08:59, 20 March 2009 (UTC)
As far as I can see, I'd say that when you take samples of a population, the samples can give a range of answers for what is a single figure when calculated for the whole population. Let's consider the mean. The population has one unambiguous population mean. Different samples can produce different sample means. In other words, sampling introduces uncertainty in estimates of the population mean. The larger the size of the samples, the less uncertainty there is. The margin of error as a percentage of a sample mean (at 95% confidence, for example) says something like: 95% of the time, the true population mean will lie within the margin of error either side of the sample mean. If the sample mean is 40 and the MOE (at 95% confidence) is 3% (of the mean), then 95% of the time the true population mean will be in the range 40 +/- 1.2, i.e. 38.8 to 41.2. MOEs can be calculated with any other level of confidence. Have I got it right? Acorrector (talk) 18:35, 8 November 2021 (UTC)

Serious problems with the article


I've been working on and off over the last month to correct various problems with this article. Some of the most glaring problems are now fixed; in particular the complete omission of any reference to the maximum margin of error (when that was what all the figures referred to) or to the assumption of a simple random sample made by all the formulae. Also the limited focus on polls, when margins of error apply to most sample surveys, and the misuse of N instead of n to denote the sample size. There are still some major problems remaining, which I've started to work through: 1) the technically inaccurate descriptions of what a margin of error or confidence interval means; 2) some unstated assumptions about the sample designs assumed by the various formulae and calculations shown; and 3) the fact that most statements in the article apply to only the very few surveys that use simple random sampling with replacement.

Each of these remaining problems on its own would mean the article fails to meet some of the featured article criteria (factual accuracy for problems 1 and 2, and comprehensiveness for problem 3). The third point will be the hardest to address properly, but is probably also the most important. Even if 1 and 2 are fixed, so that our article would not be actively misleading, I believe it would still be of limited use in most practical situations until 3 is dealt with. -- Avenue 15:28, 27 May 2006 (UTC)

It also has only a single reference, which appears to be of very limited scope. --MarSch 10:44, 29 May 2006 (UTC)

Another problem: the Newsweek poll was weighted, meaning that the standard formulae (which assume an unweighted simple random sample) shouldn't be applied, but we apply them to this poll anyway. The descriptions of the poll in the external links are very vague and, in particular, they don't say whether a simple random sample was used. -- Avenue 15:06, 31 May 2006 (UTC)

Description


The challenge of accurately describing what a margin of error or confidence interval means is one that not many people appreciate. I see people who use confidence intervals all the time without a true understanding of their accurate meaning. Let me use an example to give it a shot. If one uses a method called M to generate a margin of error of 4% at confidence level 95% on a certain test sample, it means that if this method M is used many times on different samples to generate MOEs which vary from sample to sample (e.g. 3.11% for one sample, 5.02% for another sample, 3.66% for another, etc.), 95% of the time the true statistic (e.g. the percentage voting for one candidate) is captured within the margins of error. You can see the confidence level is a probability about the method, not directly associated with the statistic. One common misunderstanding of the MOE or confidence interval is to regard the confidence level (e.g. 95%) as the probability that the statistic falls in the interval formed by the MOEs (e.g. 47% - 4% to 47% + 4%). The statistic is one unknown constant that does not have a probability distribution. The confidence level is all about the method we use to chase this unknown constant. I think the sentence in this article, "This level is the probability that a margin of error around the reported percentage would include the 'true' percentage", is a good one. Zipswich (talk) 21:39, 17 March 2009 (UTC)
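The repeated-sampling description above is easy to demonstrate by simulation. A sketch with hypothetical values (true proportion 0.47, polls of 1,000 people): over many simulated polls, about 95% of the intervals p_hat ± 1.96·SE capture the true value, even though each poll's interval is different.

```python
import random
from math import sqrt

random.seed(0)
true_p, n, z = 0.47, 1000, 1.96  # hypothetical poll settings

trials, hits = 1000, 0
for _ in range(trials):
    # One simulated poll and its 95% interval.
    p_hat = sum(random.random() < true_p for _ in range(n)) / n
    moe = z * sqrt(p_hat * (1 - p_hat) / n)
    hits += abs(p_hat - true_p) <= moe

coverage = hits / trials
print(coverage)  # close to 0.95
```

The 95% is a property of the interval-building method across repeated polls, not of any single reported interval, which is exactly the distinction drawn above.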

Your post seems completely unrelated to the three-year-old discussion just above, so I have taken the liberty of giving it a fresh section heading. What you say seems correct, at least for the more restricted meaning of margin of error that refers to just a single statistic (see my response to your other post above), and under the traditional frequentist interpretation. Do you see something in the article that you think should therefore be changed? -- Avenue (talk) 23:26, 17 March 2009 (UTC)
I was trying to address one of the problems you mentioned: 1) the technically inaccurate descriptions of what a margin of error or confidence interval means. Zipswich (talk) 12:59, 18 March 2009 (UTC)
Well, I made that comment about the article back in May 2006, when it looked like this. I think the descriptions here have been made more technically accurate since then, although there is always room for improvement. In particular, the article currently flips back and forth between frequentist confidence intervals and (at least implicitly) Bayesian credible intervals, without being very clear about the distinction. While this might not bother a naive reader, I think it could confuse readers who have some understanding of one method, but not the other. The sentence you picked out about the confidence level is carefully worded to make sense under each paradigm; I'm glad you like it. -- Avenue (talk) 01:41, 19 March 2009 (UTC)

Running Example issues


I recommend changing the main running example to a 2-person contest. The Bush/Kerry/Nader race is confusing because it isn't a straightforward enough example. The rest of the discussion never refers again to the Nader statistic--that's a problem, and I imagine many readers will wonder about it, making the discussion of the example seem incomplete. In reality, no presidential race is truly 2-person, but something where the 3rd, 4th, etc. candidates have values minimal enough to be discarded is better. Perhaps with the disclaimer that there is a slight variation from reality.

Nader is a particular problem because of how well-known he is, contra many 3rd-party candidates. However, since many situations do involve more than 2 options, a US presidential race of more than 2 that would work well is 1992, when Ross Perot got something like 19%. That works better given how small Nader's figure was for 2004, and that his figure isn't high enough to include the whole range of MOE values--his "bottoms out." I don't know how US-centric this article is meant to be, but obviously races in other Western nations mostly involve more than 2 candidates. Perhaps the 3-person example could be segregated to a separate section to keep the 2-person example clearer, rather than being intermingled with a 3-person example.

Another issue is that the sections introducing the example do not state the MOE right up front. That should be stated immediately anyway, and all the more so given that it's an MOE article. I kept waiting for the shoe to drop, so to speak, but it didn't come until (too much) later.

I recommend that the example(s) be spelled out in greater detail to demonstrate that the MOE applies to each subject individually, not to the difference between the candidates. I see journalists frequently misstate the MOE calculation as applying to the difference rather than to each candidate. As an example: Candidate A is polled at 53% with B at 47%, MOE of 4. The ranges should be spelled out: for A, 49 to 57%; for B, 43 to 51%. That gives a concrete example, and can be used to better illustrate the (correct) distinction and the importance of applying the MOE to candidates A and B individually rather than to the difference; in that misreading, the difference of 6% would be (inaccurately) reported as outside the MOE of 4.

A brief commentary is made in the dead-heat discussion about the difference and the concomitant likelihood, but it would help to return to the running example and spell out, colloquially speaking, the significance of a result for A within the MOE range of 49 to 57% versus the reported 53%--how likely is it that 53% is correct versus 57%, 49%, or other percentages?

As with many of the math/statistics entries, the issue of very technical explanation vs. man-on-the-street accessibility arises. This entry shifts from a heuristically useful real-world running example to hardcore calculations. I'm not sure how to resolve this, but the more technical section on calculating the equations could be illustrated better by carrying over the figures from the running example, rather than leaving it hypothetical and generally abstract—consistent with the use of a real-world example in the rest of the section. JackWikiSTP (talk) 17:40, 6 May 2008 (UTC)[reply]

Proposed merge with Engineering tolerance

[edit]

Margin of error is a very different topic from Engineering tolerance. One is the actual amount of random variation in estimates from a sample survey; the other is the allowable variation in physical measurements or properties of engineering components. I can't see any benefit in trying to merge the two articles. Does anyone object to the merge tag being removed? -- Avenue (talk) 14:04, 10 May 2008 (UTC)[reply]

Tag now removed. --Salix alba (talk) 17:04, 10 May 2008 (UTC)[reply]

Calculating Max / Min from Standard Deviation

[edit]

Hi, I'm researching using the standard deviation to calculate maximums and minimums, and while this site is getting me close to an answer, it doesn't quite clearly explain what I need to know. I'm adding my example so that maybe you can make a clear explanation for other users.

I am a geologist and have some rock powder (a standard sample) that I have bought from a laboratory, which has tested it thousands of times to the point where the laboratory claims the amount of gold in the rock is 6ppm with a standard deviation of 0.2. Now I'm going to put this rock powder in with some other rock samples that I'm sending to another laboratory, which will test them for gold, so that I can check the accuracy of this other laboratory. I need to create minimum and maximum values from the standard deviation for the amount of gold, so that I can mark the value that comes back from the laboratory as a pass or a fail. Usually a 95% confidence level is used. Now I believe that 95% confidence is 2 standard deviations, therefore the max-min range for 95% confidence will be from 5.6ppm to 6.4ppm.

Is this correct? —Preceding unsigned comment added by 125.160.164.105 (talk) 08:17, 3 November 2008 (UTC)[reply]
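The arithmetic in the question can be checked with a short sketch (assuming a normal distribution and the common two-sigma approximation for 95%; a more exact multiplier would be 1.96):

```python
# Two-sigma acceptance range for the certified standard: 6 ppm, sd = 0.2 ppm.
mean, sd = 6.0, 0.2
k = 2  # roughly 95% of a normal distribution lies within 2 standard deviations
low, high = mean - k * sd, mean + k * sd
print(f"Accept lab results between {low:.1f} and {high:.1f} ppm")  # 5.6 to 6.4
```

Note this treats the second lab's result as a point measurement with no error of its own, one of the unstated assumptions Avenue raises below the question.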

Your question seems somewhat off-topic here; you might find more useful information at confidence interval, or our Reference Desk might be able to help. There seem to be a few unstated assumptions in your description of the problem; e.g. that the second lab will give you only a point estimate, not a range, and that getting a lab to carry out one measurement will give you enough information to "pass or fail" them. -- Avenue (talk) 23:39, 17 March 2009 (UTC)[reply]

Formulae

[edit]

This is an article about the margin of error, yet no formulas are given to help readers calculate it. Some of the basic formulas would be nice: MOE for proportions, for means, for standard deviations, etc. — Preceding unsigned comment added by 71.89.54.202 (talk) 22:00, 22 December 2012 (UTC)[reply]
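For reference, the standard large-sample formulas the comment asks for look like this (a sketch using the normal approximation and a 95% z-value; my own illustration, not drawn from the article):

```python
import math

def moe_proportion(p_hat, n, z=1.96):
    """95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def moe_mean(s, n, z=1.96):
    """95% margin of error for a sample mean; s is the sample standard deviation."""
    return z * s / math.sqrt(n)

# Example: 1,000 respondents, 50% support -> about +/- 3.1 percentage points
print(round(100 * moe_proportion(0.5, 1000), 1))  # 3.1
```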

Figure not directly relevant

[edit]

The figure in this article shows and describes Bayesian credible intervals, assuming a uniform prior over true values of the parameter—not confidence intervals/margins of error. The caption gets the MOE interpretation wrong: as discussed elsewhere on this talk page, a margin of error at an X% confidence level does not tell you that the true value of the parameter probably falls within that distance of the sample estimate. It tells you that if you repeated the survey/experiment many times, X% of the margins of error you calculated would contain the true value.

I'm loath to remove the figure altogether, since it is a nice diagram, but it is misleading as an illustration of the article's topic generally. If a new section on "comparison with credible intervals" were added, that would be a good place for it. — Preceding unsigned comment added by 192.133.61.10 (talk) 19:11, 21 March 2013 (UTC)[reply]

Are the Formulas Wrong?

[edit]

In "Calculations assuming random sampling", the formulas given do not produce the results shown. There seems to be a term missing from them—possibly a ^.5 (square root), based on my reading. — Preceding unsigned comment added by 167.206.48.221 (talk) 06:01, 31 March 2013 (UTC)[reply]
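The commenter's guess is plausible: the standard error formula involves a square root, and dropping it changes the result by orders of magnitude. A quick check (my own sketch of the usual proportion formula, not the article's exact expression):

```python
import math

n, p = 1000, 0.5
z = 1.96

with_sqrt = z * math.sqrt(p * (1 - p) / n)   # correct: about 0.031
without_sqrt = z * (p * (1 - p) / n)         # missing ^.5: about 0.0005
print(with_sqrt, without_sqrt)

# For p = 0.5, the 95% MOE reduces to the familiar shortcut 0.98 / sqrt(n)
print(abs(with_sqrt - 0.98 / math.sqrt(n)) < 1e-12)  # True
```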

Questions/Comments

[edit]

Couple of points:

- I don't see direct references to multiplying the standard error by a confidence-level multiplier, as in this article: http://www.dummies.com/how-to/content/how-sample-size-affects-the-margin-of-error.html. It seems like a formula including a z-value (or t-value) should be included.
- Overall, that article does a better job of explaining the concept, whereas this page is overly complex.
- How are 0% and 100% values handled? — Preceding unsigned comment added by Bpatton (talkcontribs) 14:54, 24 April 2015 (UTC)[reply]
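On the first and third points: the z multiplier can be computed from the standard normal quantile function, and the naive normal-approximation formula does degenerate at 0% and 100% (a sketch; handling the extremes with something like a Wilson score interval is my suggestion, not anything stated in the article):

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided z multiplier, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def moe(p_hat, n, confidence=0.95):
    """Margin of error for a sample proportion (normal approximation)."""
    return z_value(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)

print(round(z_value(0.95), 2))  # 1.96
print(moe(0.5, 1000))           # about 0.031
print(moe(0.0, 1000))           # 0.0 -- the naive formula breaks down at 0%/100%
```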

Sources modified on Margin of error

[edit]

Hello fellow Wikipedians,

I have just attempted to maintain the sources on Margin of error. I managed to add archive links to 1 source, out of the total 1 I modified, while tagging 0 as dead.

Please take a moment to review my changes to verify that the change is accurate and correct. If it isn't, please modify it accordingly and if necessary tag that source with {{cbignore}} to keep Cyberbot from modifying it any further. Alternatively, you can also add {{nobots|deny=InternetArchiveBot}} to keep me off the page's sources altogether. Let other users know that you have reviewed my edit by leaving a comment on this post.

Cheers.—cyberbot IITalk to my owner:Online 17:30, 28 June 2015 (UTC)[reply]

External links modified

[edit]

Hello fellow Wikipedians,

I have just modified 2 external links on Margin of error. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 11:41, 20 May 2017 (UTC)[reply]

External links modified

[edit]

Hello fellow Wikipedians,

I have just modified one external link on Margin of error. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 04:31, 26 July 2017 (UTC)[reply]

Small gripe on the quality of (the start of) the 'Basic Concept' subsection

[edit]

Two things. First, I feel this section is written slightly too informally, starting with 'Polls **basically** ...', and including the crude calculation of the probability that the entire sample votes for one candidate; that material does not feel at home in this article, but rather in Random sample, if anywhere.

Second, the blasé claim that 'pollsters take smaller samples that are intended to be representative, that is, a random sample of the population' ignores the fact that random sampling is far from the most common polling method. See the recommendations section of the Report of the Inquiry into the 2015 British general election opinion polls for a fairly recent example of this.

--Hu5k3rDu (talk) 14:01, 17 July 2018 (UTC)[reply]

Explanation of confidence interval in image

[edit]

"for each sample size, one is 95% confident that the "true" percentage is in the region indicated by the corresponding segment. The larger the sample is, the smaller the margin of error is."

How does something like this slip into an article? Confidence intervals are mentioned, so one has to go with the frequentist interpretation—something like: "if we repeated the poll N times, we would expect the true population value to fall within 95% of the computed intervals."
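The frequentist reading proposed above can be demonstrated by simulation (a sketch under assumed parameters: true support 50%, polls of 1,000 respondents each):

```python
import math
import random

random.seed(0)

TRUE_P, N, Z = 0.5, 1000, 1.96
TRIALS = 2000

covered = 0
for _ in range(TRIALS):
    # Simulate one poll of N respondents and its 95% interval
    p_hat = sum(random.random() < TRUE_P for _ in range(N)) / N
    moe = Z * math.sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - moe <= TRUE_P <= p_hat + moe:
        covered += 1

# Close to 0.95: roughly 95% of the computed intervals contain the true value
print(covered / TRIALS)
```

It is the collection of intervals that has 95% coverage, not any single interval having a 95% chance of being "right"—which is the distinction the caption blurs.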

The Definition of Margin of Error and the Bowler Example in the Second Paragraph

[edit]

I came to this article to better understand the relationship between the terms Confidence Level and Margin of Error. I got as far as the bowler example in the second paragraph and found myself becoming confused over layman usage and terminology before getting into mathematical and technical definitions in the subsequent paragraphs.

First, it appears from the very first sentence of the first paragraph that the presumption is that the term Margin of Error is pretty much a technical term that applies to sampling and statistics. However, in the second paragraph there is also an acknowledgement that there is a usage that relates to observational error, and also a colloquial usage that relates to accuracy (although the writer used the term "precision" as opposed to accuracy—likely a topic for another discussion). So, with regard to this colloquial usage, it appears, at least to me, that there can be two significant interpretations: one expressed as a double-sided tolerance and another expressed as a combined tolerance. For example, if one is referring to an exact target or an exact number, then the margin of error or the accepted tolerance would typically be described as a margin to either side of that target or number. In contrast, if one is carrying a 35-inch-wide refrigerator through a 36-inch-wide doorway, one might describe the margin as being one inch rather than plus or minus one-half inch. This makes sense in the two examples because in the first the objective might be to hit the target dead center, whereas in the second the objective is likely just to get the refrigerator safely through the doorway. In the case of the bowler example in the second paragraph, the objective is more like the first example, that is, to strike the pin with the ball, not to get the ball through a window or between two pins. Relating this concept to the rest of the article, I would suspect that in most cases, at least in my experience, the Margin of Error is normally given as a double-sided value, though possibly not always symmetrically equal.

Second, given my understanding as described above, I simply do not understand the calculation by which a 4.75-inch-wide bowling pin and an 8.5-inch-wide bowling ball can be said to have a margin of error of 21.75 inches with respect to the ball striking the pin. If we assume that the ideal strike, or the ideal aiming point, is a direct center-to-center strike, then the tolerance (or margin of error) on that center-to-center strike would be 6.625 inches ((8.5 + 4.75) / 2) on each side. Accordingly, if the center of the ball is more than 6.625 inches from the center of the pin, the ball will not strike the pin. Furthermore, even if the bowler's objective were simply to nick the pin on one side (possibly to project the pin sideways), the margin of error for accomplishing that objective might be less than an inch on one side, and at most 13.25 inches on the opposite side for merely striking the pin without accomplishing the objective. Accordingly, I fail to understand the meaning of saying that "a bowler has a 21.75 inch margin of error when trying to hit a specific pin to earn a spare." BillinSanDiego (talk) 23:48, 4 December 2020 (UTC)[reply]
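The geometry in the comment above checks out (a quick verification of its figures, nothing more):

```python
# Widths from the comment: ball 8.5 in, pin 4.75 in.
ball, pin = 8.5, 4.75

# Contact occurs while the centers are within (ball + pin) / 2 of each other,
# i.e. 6.625 in to either side of a dead-center line.
per_side = (ball + pin) / 2
total_window = ball + pin  # 13.25 in from one grazing contact to the other
print(per_side, total_window)  # 6.625 13.25

# Neither figure is 21.75 in, supporting the objection to the old example.
```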

Poorly Written

[edit]

With due regard for the technical knowledge of whoever may have contributed to this article, what is required here is an "editor" in the sense of a person familiar with the English language, grammar, the popular lexicon, and intelligibility. The previous editor of the talk page raises a very important point, which is commonly ignored on Wikipedia. Alas, Wikipedia has a popularity that is undeserved. The bowling pin example should be removed; it is indiscernible to everyone but the American Bowling Association and whoever wrote it! Second, the definition of the confidence interval suggests that it is tied to a 95% probability. It is not. Third, the article starts right in on formulas without introducing any basic concepts. Was Wikipedia not the "people's encyclopedia"? It is nearly impossible to come to a Wikipedia page without finding glaring errors, disparate statements, and irreconcilable contradictions. Maybe y'all can correct this page! 70.231.83.189 (talk) 16:56, 18 September 2021 (UTC)[reply]

Broader concept

[edit]
Illustration of margin for error to cover single-pin spare in ten-pin bowling

There doesn't seem to be a home on Wikipedia for the more colloquial, less statistical/scientific, use of the term margin for error. The present article relates to sampling error, and the engineering concept of Margin of safety#Factor of safety does not even involve the word "error" per se. Is there an overarching concept I'm not thinking of? Please enter your thoughts below. —RCraig09 (talk) 22:19, 11 February 2022 (UTC)[reply]
P.S. The now-deleted recitation of a 21.75-inch margin for error, discussed above, was factually incorrect. —RCraig09 (talk) 22:22, 11 February 2022 (UTC)[reply]