Wikipedia:Spotting possible copyright violations

This is a guide to spotting violations of the Wikipedia copyright policy that are simple copy-and-pastes from other websites. Please remember to assume good faith when doing the important work of keeping Wikipedia compliant with CC BY-SA and, where co-licensed, GFDL. Keep in mind, too, that what appears to be copied content is not always a copyright issue: for example, Wikipedia may have had the content first, or the content may be public domain or compatibly licensed.

Signs that an article might be copy-and-pasted

There are a number of signs that an article might be copy-and-pasted. None of these are conclusive evidence, but more than one of these signs tends to be apparent in a copy-and-pasted article.

Indicative, but by no means conclusive, signs:

  • The text is not wikified, or is over-wikified, with every occurrence of a word or phrase made into a wiki link (as if search-and-replace had been used to insert the links).
  • The text was added all at once by one person, in finished form, with no spelling or other errors.
  • The writing style is "too good to be true".
  • The text has a strange tone of voice, such as an overly informal tone or a very slanted marketing voice with weasel words.
  • Use of first-person pronouns ("we/our/us...").
  • Use of non-standard characters such as Microsoft "smart quotes" (note that these may have been created offline in Microsoft Word or other word-processing software).
  • Inline footnote markers such as "[1]", especially when no footnotes are given.

Strong signs of copy and pasting:

  • Out-of-context phrases like "this site/page/book/whitepaper"
  • Isolated or out-of-context words or phrases such as "top", "go to top", "next page", and "click here" that were originally part of the navigation structure of the source website
  • Use of trademark symbols (™,®) and similar typical signs of commercial text
  • A writing style that rarely occurs outside of a specific, invariably copyrighted, use, such as an advertisement or press release
  • A contribution from a user who has a history of violating copyright

Irrefutable evidence:

  • Pages which exhibit the above characteristics, and include the original site's copyright notice, copied intact!
  • A copy of the page source, including links to other pages on the same server which would not occur on Wikipedia or a wiki (e.g., a link to /home/news/latest.html)
  • A URL, labeled as "reference" or "source", that links to a page on a copyrighted website containing the exact (or almost exact) same text, where the Wayback Machine verifies that the site had the text first.[1]
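
Several of the textual signs listed above can be checked mechanically before a human takes a closer look. The following is a minimal sketch, not an established Wikipedia tool: the function name `find_signs`, the `SIGN_PATTERNS` table, and the regular expressions themselves are illustrative assumptions. The patterns are deliberately crude, so expect false positives (for example, "US" matching the pronoun "us"); as with the lists above, a match is a prompt to investigate, not proof.

```python
import re

# Hypothetical pattern table: each entry maps one sign from the lists above
# to a rough regular expression. Illustrative only, not exhaustive.
SIGN_PATTERNS = {
    "first-person pronoun": re.compile(r"\b(?:we|our|us)\b", re.IGNORECASE),
    "smart quotes": re.compile("[\u2018\u2019\u201c\u201d]"),
    "inline footnote marker": re.compile(r"\[\d+\]"),
    "self-reference": re.compile(
        r"\bthis (?:site|page|book|whitepaper)\b", re.IGNORECASE
    ),
    "navigation residue": re.compile(
        r"\b(?:go to top|next page|click here)\b", re.IGNORECASE
    ),
    "trademark symbol": re.compile("[\u2122\u00ae]"),
    "copyright notice": re.compile(
        r"(?:\u00a9|\bcopyright\b)\s*(?:\(c\)\s*)?\d{4}", re.IGNORECASE
    ),
    # Links relative to the source server's root, as in copied page source.
    "server-relative link": re.compile(r"""(?:href|src)=["']/[^"'/]"""),
}


def find_signs(text):
    """Return the sorted names of all signs whose pattern occurs in text."""
    return sorted(
        name for name, pattern in SIGN_PATTERNS.items() if pattern.search(text)
    )


if __name__ == "__main__":
    sample = "We are proud of our product\u2122. Click here for more. \u00a9 2009 Acme"
    for sign in find_signs(sample):
        print(sign)
```

Because no single sign is conclusive, a sketch like this is best used to rank pages for manual review rather than to tag them automatically.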

Checking it out

Once alerted by one or more of these suspicious signs, you can check the article by selecting a sentence or non-trivial sentence fragment that is unlikely to occur by chance in many documents, then copying and pasting it into a search engine. Check the matching pages, if any, for further correspondence to the submitted article. Be aware that many sites "mirror" content from Wikipedia, so a search engine may find several sites with the exact content. Those sites should list Wikipedia as the source of the article, but do not always do so. Wikipedia:Mirrors and forks can help you identify known mirror sites, if you suspect that's what you've encountered. The Wayback Machine can also help confirm copying, but it has limitations in ruling copying out: it does not store every site or every page within a site, and it may lag by six months or so even on pages that it does store. For extra thoroughness, you may also want to check the "groups" or "books" options in Google.
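
Choosing the fragment to search for can be sketched in code. The example below is only an illustration under stated assumptions: the function name `pick_search_phrase` and its word limits are invented here, and sentence length is used as a rough stand-in for "unlikely to be found by chance", which a human reviewer will usually judge better. The fragment is wrapped in double quotes because most search engines treat a quoted string as an exact-phrase query.

```python
import re


def pick_search_phrase(text, min_words=8, max_words=16):
    """Pick the longest sentence and return it as a quoted exact-phrase query.

    Returns None if no sentence has at least min_words words.
    Length is only a crude proxy for distinctiveness (an assumption).
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = [s.split() for s in sentences if len(s.split()) >= min_words]
    if not candidates:
        return None
    # Keep the search string to a manageable length.
    words = max(candidates, key=len)[:max_words]
    phrase = " ".join(words).strip('.,;:!?"')
    return f'"{phrase}"'


if __name__ == "__main__":
    article = (
        "Short one. The quick brown fox jumps over the lazy dog "
        "near the old stone bridge. Another short."
    )
    print(pick_search_phrase(article))
```

If the first phrase returns only mirrors of Wikipedia, try a second fragment from a different paragraph before drawing any conclusion.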

An image from another website is often uploaded here under the same name. If you suspect an image of being a copyright violation, try searching Google Images for the image's filename to see whether other websites host the same image. Even if the image was uploaded under a different name, a Google Images search for relevant terms might help find the original. TinEye and other reverse image search engines can also be useful.

To find the date when suspected copyrighted text was inserted into an article, you can use the WikiBlame tool. There is a link to WikiBlame (as well as to an alternative tool) on the 'View History' tab of every article. Look for the line beginning "External tools: Find addition/removal" towards the top of that page. This lets you determine when specified text was inserted and compare that date against the date of the other source (assuming one was given). Sometimes we find that old article text has been taken from Wikipedia and used without attribution on more recent blogs or websites. Understanding who has copied from whom is extremely helpful, and avoids the embarrassment of making flawed accusations of WP:COPYVIO against good-faith editors. Where currently active editors appear to be violating copyright, it is appropriate to warn them and to request WP:REVDEL of all the subsequent edits containing that text. This can sometimes span a number of years.
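
The date comparison described above can be sketched as follows. This is an illustration, not how WikiBlame itself is implemented: the revision history is assumed to be available as a list of hypothetical `(date, text)` pairs, the function names are invented, and the search is a plain linear scan. The `lag_days` default of 183 is an assumption reflecting the Wayback Machine's possible lag of six months or so: a first capture only shows that the other page existed by that date, not when it was created.

```python
from datetime import date, timedelta


def first_revision_with(revisions, phrase):
    """Return the date of the earliest revision containing phrase, or None.

    revisions is a list of (date, text) pairs (a hypothetical format).
    """
    for revision_date, text in sorted(revisions):
        if phrase in text:
            return revision_date
    return None


def who_had_it_first(wiki_first, earliest_capture, lag_days=183):
    """Compare the Wikipedia insertion date with the other site's first capture.

    The result is a hint for a human reviewer, never a verdict.
    """
    if wiki_first is None or earliest_capture is None:
        return "undetermined"
    if earliest_capture < wiki_first:
        # The other site demonstrably had the text before Wikipedia did.
        return "external source first"
    if earliest_capture - wiki_first > timedelta(days=lag_days):
        # The gap exceeds the assumed archive lag.
        return "Wikipedia first"
    return "undetermined"


if __name__ == "__main__":
    history = [
        (date(2008, 5, 1), "Stub."),
        (date(2009, 3, 1), "Stub. The mill closed in 1932 after a fire."),
        (date(2010, 1, 1), "Stub. The mill closed in 1932 after a fire. More."),
    ]
    inserted = first_revision_with(history, "The mill closed in 1932")
    print(inserted, who_had_it_first(inserted, date(2007, 6, 1)))
```

Even a "Wikipedia first" result should be read cautiously, since the archive does not store every site or every page.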

If you suspect a copyright violation, you should at the very least raise the issue on the affected page's talk page. Others can then examine the situation and take action if needed. The most helpful piece of information you can provide is a URL or other reference to what you believe may be the source of the text. If the talk page is not watched, however, your note may go unseen. If you suspect this will be the case, please consider also using {{Copypaste}} or opening a section listing your concern at WT:CP.

  • Remember: please don't bite the newbies. Many copy-and-paste contributors may not understand that what they are doing is wrong, and some may turn into valuable contributors if educated rather than punished. You can use the user's talk page to discuss your concerns with them. The {{Uw-copyright-new}} template may be useful for this.
  • Some cases will be false alarms. For example:
    • If the contributor is in fact the author of text that is published elsewhere under different terms, that does not affect their right to post it here under CC BY-SA and GFDL. Point them to Wikipedia:Donating copyrighted materials.
    • Material from public domain resources is sometimes republished with unclear or misleading copyright notices that obscure its origin. Please see Wikipedia:Plagiarism for how this is corrected.
    • Content might be copied from one Wikipedia article to another, or an article from another language's Wikipedia might be translated and published here (bringing with it seemingly suspicious anomalies, particularly if the contributor's understanding of English and/or wikification is limited). As long as attribution is supplied to meet licensing requirements, this is not a copyright violation. Please see Wikipedia:Copying within Wikipedia.
    • Sometimes you will find text elsewhere on the Web that was copied from Wikipedia. In these cases, it is a good idea to leave a note on the talk page to prevent such false alarms in the future. See {{backwardscopyvio}}.
  • Please see the Wikipedia copyright policy document for what to do in difficult cases, such as where a user continues to post copyrighted material in spite of warnings.

Notes

  1. In very rare cases, official sites have been found to have updated their pages to include Wikipedia's content after earlier versions had been used as sources for the Wikipedia page. If the Wayback Machine doesn't demonstrate that earlier versions of the suspected origin page differed from the current text, the content on Wikipedia should only be retained if there is strong evidence of natural evolution, that is, if the content evolved gradually on Wikipedia over time, especially with contributions from more than one person.
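
One rough way to look for "natural evolution" is to see which revision first introduced each sentence of the current text. The sketch below is an assumed heuristic, not a Wikipedia procedure: the revision history is a hypothetical, chronologically ordered list of `(user, text)` pairs, the function names are invented, and the `max_share` threshold of 0.5 is arbitrary. It reports gradual growth only when no single revision introduced more than that share of the surviving sentences and more than one contributor was involved.

```python
import re
from collections import Counter


def sentence_origins(revisions):
    """Map each sentence of the latest revision to the index of the
    revision that first contained it.

    revisions is a chronological list of (user, text) pairs (hypothetical).
    """
    final_text = revisions[-1][1].strip()
    final_sentences = [s for s in re.split(r"(?<=[.!?])\s+", final_text) if s]
    origins = {}
    for sentence in final_sentences:
        for index, (_user, text) in enumerate(revisions):
            if sentence in text:
                origins[sentence] = index
                break
    return origins


def looks_gradual(revisions, max_share=0.5):
    """Return True if the final text appears to have grown step by step."""
    origins = sentence_origins(revisions)
    if not origins:
        return False
    counts = Counter(origins.values())
    largest_share = max(counts.values()) / len(origins)
    contributors = {revisions[index][0] for index in counts}
    return largest_share <= max_share and len(contributors) > 1


if __name__ == "__main__":
    history = [
        ("Alice", "The mill opened in 1890."),
        ("Bob", "The mill opened in 1890. It employed 200 people."),
        ("Carol", "The mill opened in 1890. It employed 200 people. "
                  "It closed in 1932. The site is now a park."),
    ]
    print(looks_gradual(history))
```

A result of True supports, but does not establish, that the text originated on Wikipedia; sentences that were reworded along the way will be attributed to the revision that produced their final wording.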
