Talk:UTF-8
This is the talk page for discussing improvements to the UTF-8 article. This is not a forum for general discussion of the article's subject.
This level-5 vital article is rated B-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: Computing, Computer science, and Typography.
This page has archives. Sections older than 90 days may be automatically archived by Lowercase sigmabot III.
Table should not only use color to encode information (but formatting like bold and underline)
As noted in a previous comment (https://en.wikipedia.org/wiki/Talk:UTF-8/Archive_1#Colour_in_example_table?), this has been done before, and it is *better*, so that everyone can clearly see the different parts of the code. Relying on color alone is not good, due to color vision deficiencies and varying color rendition on devices.
Microsoft script dead link
and Microsoft has a script for Windows 10, to enable it by default for its program Microsoft Notepad
"Script How to set default encoding to UTF-8 for notepad by PowerShell". gallery.technet.microsoft.com. Retrieved 2018-01-30.
https://gallery.technet.microsoft.com/scriptcenter/How-to-set-default-2d9669ae?ranMID=24542&ranEAID=TnL5HPStwNw&ranSiteID=TnL5HPStwNw-1ayuyj6iLWwQHN_gI6Np_w&tduid=(1f29517b2ebdfe80772bf649d4c144b1)(256380)(2459594)(TnL5HPStwNw-1ayuyj6iLWwQHN_gI6Np_w)()
This link is dead. How to fix it? — Preceding unsigned comment added by Un1Gfn (talk • contribs) 02:58, 5 April 2021 (UTC)
- That text, and that link, appears to have been removed, so there's no longer anything to fix. Guy Harris (talk) 23:43, 21 December 2023 (UTC)
utf8 octal conversion
I think this section should be rewritten. It makes no sense to talk about bytes if you have triplets of octal digits, which make 9 bits in total, not 8. The grouping shown in the section is ambiguous (and wrong). --84.167.187.209 (talk) 02:24, 29 May 2021 (UTC)
- The table is correct: the results are 1 to 4 bytes, each displayed as 3 octal digits, and the left-most digit cannot be greater than 3. If the bytes were somehow appended into a single octal number, then you would first have an endianness question and, more importantly, it would remove the alignment between the output octal digits and the input octal digits.
- I have made this modification of the octal table. Do you understand it? x,y,z and w are octal digits.--BIL (talk) 22:11, 29 May 2021 (UTC)
- Yes, that is a lot clearer. Spitzak (talk) 23:44, 29 May 2021 (UTC)
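For illustration only, here is a minimal Python sketch (not taken from the article; the function name `utf8_octal` is hypothetical) that encodes a code point with Python's built-in UTF-8 codec and prints each resulting byte as three octal digits. It shows why the left-most digit never exceeds 3 (a byte is at most octal 377) and how the digits of the bytes line up with the octal digits of the code point.

```python
def utf8_octal(code_point: int) -> list[str]:
    """Encode one code point as UTF-8 and render each byte as 3 octal digits.

    Hypothetical helper for illustration; surrogates (U+D800..U+DFFF)
    cannot be encoded and raise UnicodeEncodeError.
    """
    data = chr(code_point).encode("utf-8")
    # A byte is at most 0o377, so three octal digits always suffice
    # and the leading digit is never greater than 3.
    return [format(b, "03o") for b in data]


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}  code point {cp:o} (octal) -> {' '.join(utf8_octal(cp))}")
```

For example, U+20AC is 20254 in octal and encodes as the bytes 342 202 254: the last digit of the lead byte (2) and the last two digits of each continuation byte (02, 54) reproduce the code point's octal digits, which is the alignment that would be lost if the bytes were run together into one octal number.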
I do agree there is a huge amount of bloat in this article, conversion from/to UTF-8 is actually really simple and I would love to see the majority of this text spew deleted.Spitzak (talk) 20:24, 29 May 2021 (UTC)
- Suggest/recommend throwing out the whole UTF-8#Octal section. I’m sure the intellectual exercise must have been “neat” or “kind of cool” to whoever took the time and effort to type it up and add it to the article, but IMHO it’s cruft like this that explains how this article got to be so long and bloated. I haven’t seen this appear in _any_ of the Unicode standards documents, and even the single reference cited admits that the API library just compares the binary, even if it might conceivably, theoretically be more convenient for a human with a scientific calculator converting hexadecimal to octal to compare bits manually. This article would IMHO be much more concise and more “encyclopedic” if the half of it comprising personal commentary/observations such as this section (which might be more appropriate, say, as a post on a personal blog, for example) were trimmed. —PowerPCG5 (talk) 08:35, 10 November 2021 (UTC)
- Excellent idea. This section does not add useful information. −Woodstone (talk) 13:52, 10 November 2021 (UTC)
- Absolutely agree. About 3/4 of this article is bloated with trivial observations and/or redundant rewording of the same information over and over again. I did edit this table last, not because I liked it, but it was even larger and more intrusive before (they put it in as more columns in the other tables), and attempts to just remove it got reverted...Spitzak (talk) 15:10, 10 November 2021 (UTC)
- Such is the unfortunate nature of a community-built wiki: editors contribute to their own niches and hobbies. See Criticism of Wikipedia#Systemic bias in coverage.
- Criticism of Wikipedia#Quality of writing is funny too. Wqwt (talk) 07:21, 4 September 2022 (UTC)
Unfortunately there is an error in it. If the first code point is 0200, then the last code point cannot be 3777, for example. Please consider the description in the main article of how the encoding process works; you will then find that the first Unicode code point is always 0. Apparently that is how it is really done. — Preceding unsigned comment added by SiwardDeGroot (talk • contribs) 14:48, 21 July 2023 (UTC)
- You are talking about "overlong encodings". The code point 0 should be done by the one-byte entry in the first line of the table. Encoding code point 0 using the second line of the table is an error. Spitzak (talk) 15:45, 21 July 2023 (UTC)
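A minimal Python sketch of the point about overlong encodings, assuming a hand-written decoder for the two-byte form only (the name `decode_two_byte` is hypothetical, not from the article or any library). The two-byte pattern 110xxxxx 10yyyyyy covers U+0080..U+07FF (octal 0200..3777); any smaller value, such as code point 0 written as C0 80, must be rejected because it belongs in the one-byte form.

```python
def decode_two_byte(b0: int, b1: int) -> int:
    """Decode a 2-byte UTF-8 sequence 110xxxxx 10yyyyyy, rejecting overlong forms.

    Hypothetical helper for illustration; it handles only the two-byte case.
    """
    if (b0 & 0xE0) != 0xC0 or (b1 & 0xC0) != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    cp = ((b0 & 0x1F) << 6) | (b1 & 0x3F)
    # The two-byte form is only valid for U+0080..U+07FF (octal 0200..3777).
    # Smaller values must use the one-byte form, so this is an overlong encoding.
    if cp < 0x80:
        raise ValueError(f"overlong encoding of U+{cp:04X}")
    return cp


print(hex(decode_two_byte(0xC3, 0xA9)))  # the two-byte encoding of U+00E9

# Python's built-in strict UTF-8 decoder also refuses the overlong C0 80.
try:
    b"\xc0\x80".decode("utf-8")
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

Checking the decoded value against the lower bound of each form, rather than trusting the lead byte, is what keeps every code point to exactly one valid byte sequence.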
US-ASCII
@Comp.arch: With respect to Special:Diff/1105781113, it's better to use just "ASCII" unless it could be misinterpreted as some other variant of ISO 646 instead of ANSI X3.4-1986. It is not the case here, but I think current usage in the article is okay. IANA preference for "US-ASCII" only matters for use in the charset parameter or similar where "ASCII" is not even a valid label at all. Please don't link it halfway as US-ASCII because that makes absolutely no sense in any context and looks like a formatting mistake. Link the whole US-ASCII, piping it if you don't like the redirect. – MwGamera (talk) 12:49, 22 August 2022 (UTC)
The article contains "{{efn", which looks like a mistake.
I would've fixed it myself but I don't know how to transform the remaining sentence to make sense. 2A01:C23:8D8D:BF00:C070:85C1:B1B8:4094 (talk) 16:17, 2 April 2024 (UTC)
- I fixed it, I think. I'm not 100% sure it's how the previous editors intended. I invite them to review and confirm. Indefatigable (talk) 19:03, 2 April 2024 (UTC)
Should "The Manifesto" be mentioned somewhere?
More specifically, this one: https://utf8everywhere.org -- Preceding unsigned comment added by Rudxain (talk • contribs) 21:52, 12 July 2024 (UTC)
- Only if it's got significant coverage in reliable sources. Remsense 22:10, 12 July 2024 (UTC)
- It's kind of ahistorical, since the Microsoft decisions that they deplore were made while developing Windows NT 3.1, and UTF-8 wasn't even a standard until Windows NT 3.1 was close to being released. There was more money to be made from East Asian customized computer systems than Unicode computer systems in 1993, so Unicode was probably not their main focus at that time... AnonMoos (talk) 20:30, 15 July 2024 (UTC)
Categories: B-Class vital articles · Wikipedia level-5 vital articles · Wikipedia vital articles in Technology · B-Class level-5 vital articles · Wikipedia level-5 vital articles in Technology · B-Class vital articles in Technology · B-Class Computing articles · Mid-importance Computing articles · All Computing articles · B-Class Computer science articles · Mid-importance Computer science articles · WikiProject Computer science articles · B-Class Typography articles · Mid-importance Typography articles