Wikipedia talk:Reliable sources

From Wikipedia, the free encyclopedia

AI-written citations?

I was adding an event to an article (Special:Diff/1220193358) when I noticed that the article I was reading as a source, and planning to cite, was tagged as being written by AI on the news company's website. I've looked around a bit, skimmed Wikipedia:Using neural network language models on Wikipedia, WP:LLM, WP:AI, WP:RS and this Wikimedia post, but couldn't find anything directly addressing whether it's ok to cite articles written by AI. The closest I could find is here on WP:RS tentatively saying "ML generation in itself does not necessarily disqualify a source that is properly checked by the person using it" and here on WP:LLM, which clearly states "LLMs do not follow Wikipedia's policies on verifiability and reliable sourcing.", but in a slightly different context, so I'm getting mixed signals. I also asked Copilot and GPT3.5, which both said AI-written citations are neither explicitly banned nor permitted, with varying levels of vagueness.

For my specific example, I submitted it but put "(AI)" after the name, but I wanted to raise this more broadly because I'm not sure what to do. My proposal is what I did, use them but tag them as AI in the link, but I'm curious to hear other suggestions.

I've put this on the talk pages in Wikipedia:Using neural network language models on Wikipedia and Wikipedia:Reliable sources. SqueakSquawk4 (talk) 11:36, 22 April 2024 (UTC)[reply]

For me it comes down to a case by case basis. If AI is being used as part of the process, but ultimately the article is from a real person and editor then it's probably fine. The issue comes from articles completely written by AI with little or no oversight.
The site has an AI disclaimer[1] where they say they only use AI in the first way, not the latter. So on that point I would think it should be ok. -- LCU ActivelyDisinterested «@» °∆t° 13:02, 22 April 2024 (UTC)[reply]
@SqueakSquawk4, do you absolutely need that source? If you can find a better one, then I suggest using the better one instead. WhatamIdoing (talk) 02:03, 24 April 2024 (UTC)[reply]
A) I kinda do, it's the only citation I found with everything in the same place. If I took it out I'd have to put in 2 or 3 separate citations to not leave something uncited.
B) I was trying to ask more generally, with the one I found as just an example rather than really the focus of what I was asking.
C) @ActivelyDisinterested Thanks, didn't spot that. SqueakSquawk4 (talk) 12:32, 25 April 2024 (UTC)[reply]
  • AI = NO Considering the "hallucination" issue that LLMs have, and, in fact, considering how they are constructed at a base logic level, I would categorically treat any "AI" source as intrinsically non-reliable. If a news agency is found to be using "AI" constructed articles on a regular basis then that source should be deprecated. Simonm223 (talk) 12:42, 25 April 2024 (UTC)[reply]
    Simon, I think black-and-white rules are easy to understand, but hallucination is only an issue when it appears. AI sometimes generates false claims. If it's writing something you know to be true and non-hallucinated (e.g., because you've read the same claim in other sources, or because it's the kind of general, non-controversial knowledge that the Wikipedia:No original research says doesn't require a citation, like "The capital of France is Paris"), then that problem is irrelevant.
    @SqueakSquawk4, editors might accept this source, especially in light of what AD says. However, if the content is important to you, you might consider using the three other sources instead of (or in addition to) this one, to make it harder for someone to remove it on simplistic "all AI is wrong and bad" grounds.
    As a tangent, we've never defined reliable sources. Unlike an article, which would doubtless begin with a sentence like "A reliable source is...", this guideline begins with "Wikipedia articles should be based on reliable, published sources". I suggest that the actual definition, in practice, is "A reliable source is a published source that experienced Wikipedia editors accept as supporting the material it is cited for". Some editors strongly oppose AI-generated sources, and we can usually expect that some editors won't take time to understand the nuances behind using AI as a convenience vs using AI unsupervised to generate content wholesale.[*] Therefore, I'm uncertain whether it would be considered reliable if it were ever seriously disputed.
    [*] This is happening in the real world, with a student accused of plagiarism without any evidence except Turnitin thinking it was AI-generated,[2] so it'll happen on wiki, too. WhatamIdoing (talk) 17:10, 25 April 2024 (UTC)[reply]
    I read on some AI-test tool I tried a caveat, something like "don't use this to punish students." Gråbergs Gråa Sång (talk) 18:22, 25 April 2024 (UTC)[reply]
    IF you have double checked the AI generated source, and it a) actually exists, b) is reliable and c) directly supports the information in the article… then it doesn’t really matter how the source was “generated”. The key is that a human has checked it. Blueboar (talk) 12:40, 3 July 2024 (UTC)[reply]
The general standard applied to trusted news organizations is that it is assumed that they have a process to ensure that their articles are sufficiently reliable, regardless of which specific writer wrote the article. We do not say: You can trust NYT articles if they are written by Mary, but not if they are written by Bob. In theory, there is no difference in this regard between articles written by humans or AI. If they do not fact-check articles written by AI, then it is likely that they also don't fact-check articles written by human writers. And it is certainly possible (in theory) that a news organization only publishes AI articles that are thoroughly fact-checked and corrected, although the use of AI is a red flag that suggests that they are cutting corners.
But that is all generic theory. I would argue that Hoodline is not a good source in general, since they don't even have their own Wikipedia page. Also, Wikipedia states this about Nextdoor, the company that owns them:

In 2019, Nextdoor acquired local news site Hoodline. Later that year, HuffPost and Wired reported that Nextdoor paid a firm to improve its reputation by lobbying for changes to the Wikipedia articles on Nextdoor, NBC, and several other corporations

If they do this, then I have no faith in the quality of Hoodline's reporting. This may just be an AI-generated platform to place ads on, with no journalistic standards. This is something that has cropped up in recent years and will probably become a bigger issue as AI improves, becomes cheaper, becomes easier to use, etc. Aapjes (talk) 11:59, 30 August 2024 (UTC)[reply]

Clarification on reliability and sponsored / promotional content

WP:SPONSORED notes that such sources are "generally unacceptable", but could they still be used for statements of basic fact? For instance, could a promotional piece about an individual be used to verify their date and place of birth, and other such non-promotional details? The same goes for sources that use superlatives: for instance, could "They were the greatest artist of all time" be used to verify that they were an artist? I've instinctively thought of the answer to these questions as being 'yes', but I'm unsure whether that necessarily matches policy/guidance. -- LCU ActivelyDisinterested «@» °∆t° 10:58, 20 July 2024 (UTC)[reply]

In the first instance I'd say maybe, depending on the specific case. WP:ABOUTSELF can be argued to apply, but would it be good enough for WP:DOB? IMO, not necessarily. In the second instance I'd say no. If a sponsored source is the best source calling someone an artist, that's not a good enough reason for WP to do so. Gråbergs Gråa Sång (talk) 11:23, 20 July 2024 (UTC)[reply]
In the second instance what if the source wasn't sponsored but did contain very promotional language? -- LCU ActivelyDisinterested «@» °∆t° 11:30, 20 July 2024 (UTC)[reply]
Well for example, while looking for sources for a draft yesterday, a review in The Stage said of my subject "some of the fastest comic juggling you are ever likely to see." I'd absolutely take that as a RS that he's a juggler, if I needed it. Gråbergs Gråa Sång (talk) 11:38, 20 July 2024 (UTC)[reply]
Yes, that's the sort of source and issue I meant. -- LCU ActivelyDisinterested «@» °∆t° 11:51, 20 July 2024 (UTC)[reply]
I'm not sure that our actual rules are clear in that section. For example, the first sentence says: "Sponsored content is generally unacceptable as a source, because it is paid for by advertisers and bypasses the publication's editorial process."
The problem identified (i.e., bypasses editorial processes) is real, but our rule ought to be closer to "Treat it like an advertisement...because that's what it is". It'd be silly to say that you could support a claim to the subject's self-published and non-independent Twitter account, or to a full-page ad placed by the subject in a magazine, but if the subject pays for an ad in the form of sponsored content, then that's suddenly beyond the pale. WhatamIdoing (talk) 23:40, 27 July 2024 (UTC)[reply]
You've expressed the issue far better than I could. -- LCU ActivelyDisinterested «@» °∆t° 00:07, 28 July 2024 (UTC)[reply]
Let's change the first sentence in that section, then.
  • Current: Sponsored content is generally unacceptable as a source, because it is paid for by advertisers and bypasses the publication's editorial process.
  • Idea #1: Sponsored content is a type of paid advertisement and should be treated like any other paid advertisement.
  • Idea #2: Sponsored content, like other paid advertisements, are paid for by advertisers and are not subject to the publication's editorial process. Advertisements can be cited, but they are non-independent and should be treated as self-published and primary sources in articles.
  • Idea #3: Sponsored content is a paid advertisement that is formatted to look like an article, but whose content and decision to publish is controlled by the sponsor, rather than the publication's editors.
Do you have some ideas? WhatamIdoing (talk) 01:10, 28 July 2024 (UTC)[reply]
I like #3: it's direct, defines what sponsored content is, and explains why that's an issue. But #2 is more explicit, which will likely help stop editors quibbling over it at a later date.
The main thing I like about #3 is "is a paid advertisement that is formatted to look like an article", it closes off silly questions of why sponsored content is an advert (e.g. "How is it an advert when it's a review of the product?").
The second half of #2 ("Advertisements can be...") is also good as it clarifies how they can be used and directs readers to the relevant policies.
So a combination of #3, with the second half of #2 maybe. -- LCU ActivelyDisinterested «@» °∆t° 12:09, 28 July 2024 (UTC)[reply]
The thing that worries me about "whose content and decision to publish is controlled by the sponsor, rather than the publication's editors" is that someone's going to say "But the influencer promises that the content is her own honest opinion, so that's not controlled by the sponsor!"
  • Idea #2+#3: "Sponsored content is a paid advertisement that is formatted to look like an article or other piece of typical content for that outlet. Advertisements can be cited, but they are non-independent and should be treated as self-published and primary sources in articles."
  • Idea #2+#4: "Sponsored content is a paid advertisement that is formatted to look like an article or other piece of typical content for that outlet. The content may be directly controlled by the sponsor, or the advertiser may pay an author to create the content (e.g., influencer marketing). Advertisements can be cited, but they are non-independent and should be treated as self-published and primary sources in articles."
WhatamIdoing (talk) 19:07, 28 July 2024 (UTC)[reply]
Yes #2+#4 is an improvement, and you're right about the influencer issue. -- LCU ActivelyDisinterested «@» °∆t° 22:26, 28 July 2024 (UTC)[reply]
I think that would be an improvement. Let's wait until tomorrow, just in case anyone has any objections? WhatamIdoing (talk) 23:50, 28 July 2024 (UTC)[reply]
Fine with me. -- LCU ActivelyDisinterested «@» °∆t° 13:27, 29 July 2024 (UTC)[reply]
Done. WhatamIdoing (talk) 19:20, 30 July 2024 (UTC)[reply]

Where is the list of consensus of which websites are reliable?

Where is the list of consensus of which websites are reliable? Personally I find it to be extremely hard to find. Please make it easier to find. NamelessLameless (talk) 06:01, 5 August 2024 (UTC)[reply]

Are you perhaps asking about WP:Reliable sources/Perennial sources (shortcut WP:RSP) - which lists those sources we have discussed multiple times? Blueboar (talk) 10:27, 5 August 2024 (UTC)[reply]
Yeah, that's what I was trying to find. NamelessLameless (talk) 22:50, 9 August 2024 (UTC)[reply]
Note that we don't have (and can't have) either an exhaustive list of 'reliable' sources, or of 'unreliable' ones. Instead, we have policy describing the types of sources that are likely to be considered reliable, and mechanisms for discussing whether a particular source should be considered reliable for particular content. WP:RSNP consists of a list of repeatedly discussed sources only. Generally speaking, these tend to be edge cases of one sort or another. AndyTheGrump (talk) 10:54, 5 August 2024 (UTC)[reply]
The list itself is at Sources Mcljlm (talk) 14:53, 5 August 2024 (UTC)[reply]
That is only a list of sources that have been discussed regularly at RSN; it isn't close to being a full list of consensus on which sources are reliable. As well as discussions on RSN that don't appear on the list, many projects maintain lists related to their areas. -- LCU ActivelyDisinterested «@» °∆t° 16:44, 5 August 2024 (UTC)[reply]
How can the various lists be found? Mcljlm (talk) 06:36, 6 August 2024 (UTC)[reply]
Prior discussions on RSN can be found by searching the archives, there's a search block in the RSN header. I don't know of any easy way of finding all the project lists. NPP maintain a quite big list, Wikipedia:New page patrol source guide, but it still won't be a complete list and they have their own reasons for maintaining it. Ultimately the reason there isn't a single list is that editors should be looking to the relevant policy and guideline, and using their own good judgement. The consensus lists are meant to help editors when disagreement exists about verification of article content, so the same discussions don't have to happen repeatedly. -- LCU ActivelyDisinterested «@» °∆t° 10:10, 6 August 2024 (UTC)[reply]
@NamelessLameless and @Mcljlm, I am curious why you expect a list to exist. Did another editor perhaps claim that a source you wanted to use wasn't on an approved list?
There are somewhere around 1,500,000,000 websites. If an editor spent just one minute assessing each of them, it would take roughly 3,000 years of round-the-clock work – 24 hours a day, 365 days a year, for 40 lifetimes – to make such a list. Also, because websites spring up and then get removed, the list would be seriously out of date even after a few years. It is impossible. There is no list, and there never will be any such list. WhatamIdoing (talk) 21:23, 7 August 2024 (UTC)[reply]
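
The estimate above can be checked with a few lines of Python. This is a minimal sketch and not part of the original comment: the function name `years_to_review` and the constant name are hypothetical, and the only inputs are the figures quoted above (about 1.5 billion websites, one minute each, round-the-clock work).

```python
# Minutes in a 365-day year of round-the-clock work (assumption: no leap days).
MINUTES_PER_YEAR = 60 * 24 * 365


def years_to_review(site_count, minutes_per_site=1):
    """Return the years one editor would need to assess every site nonstop."""
    return site_count * minutes_per_site / MINUTES_PER_YEAR


years = years_to_review(1_500_000_000)
lifetime_years = years / 40  # the comment spreads the work over 40 lifetimes

print(f"{years:.0f} years in total, about {lifetime_years:.0f} years per lifetime")
```

Under these assumptions the total comes out a little under 3,000 years, consistent with the rounded figure in the comment, and each of the 40 lifetimes would be roughly 71 years long.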

WP:RS : One more essay, or updating present ones?

As you may know, I keep taking constructive initiatives here at WP to fill gaps in information and knowledge. I have also started doing a little content-level mentoring. For the last couple of months I have been contemplating an initiative to get a couple of essays written by other experienced users.

One essay I would like to initiate would give a glimpse of the meticulous selection and application of academic scholarly sources that have a better chance of standing up at GA, FA, CTOP, and during any intense content negotiation. It would be an essay similar to WP:TIERS, but with more practical examples and guidance, perhaps like WP:RSVETTING.

I don't know where to begin or whom to ask. I know that a good number of essays already exist, but I still think there is scope for reviewing the present essays, finding and discussing gap areas, and promoting one more essay as described above.

Requesting inputs. Bookku (talk) 10:16, 8 August 2024 (UTC)[reply]

Do reliable sources have to be informed of their use in a Wikipedia article?

Do reliable sources have to be informed of their use in a Wikipedia article? Howie Marx (talk) 07:01, 21 August 2024 (UTC)[reply]

Why would/should they be? Headbomb {t · c · p · b} 07:04, 21 August 2024 (UTC)[reply]
I didn't know if we needed their permission or not to include them as a reliable source in an article... I can't think of any reason why they would object to this... surely it's good for them too? (as long as it's factual)
Do you think they should be informed Headbomb? Howie Marx (talk) 07:27, 21 August 2024 (UTC)[reply]
No they don't need to be informed, and their objections wouldn't matter if they had any. -- LCU ActivelyDisinterested «@» °∆t° 10:03, 21 August 2024 (UTC)[reply]

Reliability and Time

How much does time factor into the reliability of sources and the accuracy of information? For example, say an article has multiple sources - enough to pass WP:N and WP:V. However, although the sources are regarded as reliable - as in Generally or Marginally - the info from them may be outdated. Perhaps the article is on an older topic that was notable but wasn't created until long afterwards. Yet, the only evidence that proves such info is out of date comes from primary and/or potentially unreliable sources. Since the site requires that articles be based on secondary sources, what should an editor do in this situation? Is it better to leave the article intact until a new reliable secondary source is found? Or should the article be updated with current information, even if that info is from dubious sourcing (or even none at all)? I was under the impression that WP:VNT and WP:NTEMP apply, but is it concerning to leave it even if no other sources ever emerge? Thanks, PantheonRadiance (talk) 22:25, 22 August 2024 (UTC)[reply]

To me, the distinction between primary and secondary sources is prescient: a source becomes primary when the context in which it was written has been sufficiently diverged from by "our own", that it no longer suffices to transparently verify claims and communicate information to the reader. That is, if the reader attempted to make deductions while intuitively applying the understanding of the world presently around them onto the text, they would get crucial facts wrong. A primary source requires an additional layer of interpretation and expertise between it and transparent, verifiable claims we can cite. Remsense ‥  22:43, 22 August 2024 (UTC)[reply]
To directly answer your question: I guess we'll have to find out when it happens? It seems like it would depend entirely on what the article is specifically about. Remsense ‥  22:43, 22 August 2024 (UTC)[reply]
Thanks for your reply! I think I get your point, but just to clarify: unless the info is inherently obvious, most of the time the info from primary sources needs analysis and expertise before we can use it on Wikipedia? And such analysis can generally only be done through a reliable secondary source? That's my line of thinking too. I believe that when it comes to the ease of spreading misinformation online, secondary reliable sources should definitely be used to combat that. PantheonRadiance (talk) 23:15, 22 August 2024 (UTC)[reply]
Also for a specific example, my impetus to discuss this was based on Smosh Games. The article became a redirect ten years ago due to a lack of notability. Recently I uncovered multiple sources found since then that proved notability per WP:WEB. However, there was recent contention regarding the info in the article, namely whether much of it was inaccurate because it was outdated - due to the ten years since the AfD. While I believed in sustained notability, another editor claimed its inaccuracy, and continuously added info attempting to update it, without verifying that the info came from secondary sources. I objected to that due to failing WP:V and WP:OR among other MOS guidelines. Needless to say it's a messy debate. PantheonRadiance (talk) 23:20, 22 August 2024 (UTC)[reply]
@PantheonRadiance, inaccuracy gets solved with the [Edit] button, not the delete button. If an old source says "500 members" or "revenue of $2 million", and that's alleged to be inaccurate due to being out of date, then copyedit it to say "500 members in 1965" or "revenue of $2 million in the 2015–2016 fiscal year". WhatamIdoing (talk) 15:53, 30 August 2024 (UTC)[reply]

Dubious

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
Please discuss at Wikipedia:Village pump (miscellaneous) WhatamIdoing (talk) 15:57, 30 August 2024 (UTC)[reply]

What should be done to overhaul the {{dubious}} template? Literally every time I've seen it on an article, there is zero discussion on the talk page about what may be dubious in the article. I discussed this on the talk page a while back, but the discussion just went around in circles and fizzled out. Should a drive be done to remove drive-by instances of this tag where no discernible discussion exists? Ten Pound Hammer(What did I screw up now?) 19:41, 28 August 2024 (UTC)[reply]

Also asked at WP:VPM and at WT:V… Please don’t ask the same question at three venues. Consolidate the discussions. Blueboar (talk) 20:40, 28 August 2024 (UTC)[reply]
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Bold edit on WP:AGEMATTERS

In the passage "With regard to historical events, older reports (closer to the event, but not too close such that they are prone to the errors of breaking news) are less likely to have errors introduced by repeated copying and summarizing", I have changed "are" to "can be".

The main reason for this is ancient primary sources (Plutarch, Livy, Sallust, Cicero, Polybius, Thucydides, etc.). They are in fact older and closer to historical events. They are not, however, necessarily more reliable. The transmission chains for these sources are complicated both in terms of how they were written (see e.g. Quellenforschung) and how they were copied to the present (e.g. emendation). As a counterexample, it is now relatively common to question descriptions given in, say, Livy on the basis of alternate versions in Dio, even though Dio is later than Livy; similar issues pop up in emendation, where the "earliest" version of a manuscript is not necessarily the one which is accepted. A E Housman, in a rather old review that is very fun to read in a base way, a few times aimed his (extremely sharp) skills of invective at that exact assumption.

I noticed this while doing some edits to an essay of mine (User:Ifly6/Primary sources in classics) which also explains why ancient primary sources are problematic. Anyway, I thought the statement rather broad and weakened it. Ifly6 (talk) 17:36, 29 August 2024 (UTC)[reply]