45

(Cross-posted as a bug report on MSE, which lists all affected languages found so far. Please check there to see whether a language you've seen labeled incorrectly is already in the table.)

I just loaded a question and saw what I assume is a new feature in code blocks where the site tells us at a glance what it thinks the language is in said code block:

Screenshot of HTML code in a question tagged 'HTML' showing 'xml' as the type of code being displayed, which is incorrect.

This question is tagged [html], and the code fences explicitly state that the code is HTML as well:

Screenshot of the markup showing HTML explicitly listed with the code fences

This might just be specific to HTML/XML, since other questions tagged, e.g. seem to 'guess' correctly. The question where I saw this is: Name attribute on details element

Can this please be fixed so that, at the very least, it does not guess when the language is explicitly stated in the code fences, and preferably also not when the question has a single language tag?

(Or better yet, if you think you detect a language that is not tagged in the question, prompt the user to confirm whether that language is correct before allowing them to post the question, and if not, change the guess or auto-add the language tag. That would cut down on a fair amount of "you can't close this question as a duplicate with your gold badge because the question didn't have the correct language tag beforehand".)

Another example: An / question that autodetects different code blocks as Haskell, Sass, and Rust: Find locations with the same travel time to a destination: Heatmap/Contours based on transportation time (Reverse Isochronic Contours):

18
  • 3
    Should the code-fence-language be just html? Maybe lang-html isn't a known language and then it guesses. Commented Nov 4 at 18:24
  • 1
    @mkrieger1 I haven't tested but that could be a problem, although either way it still shouldn't guess a totally different language, just maybe not get that it is HTML. OP may be mixing the old HTML comment style of <!-- language: lang-html --> with the new code fences style. Also, annoyingly, the language guesser doesn't show up in post previews, only the main view once the edit is saved, so I can't test without... actually changing the content, which I want to avoid for this bug report, at least. Commented Nov 4 at 18:27
  • 4
    Runnable snippets never looked better Commented Nov 4 at 18:51
  • @VLAZ oy vey🤦‍♂️-- might be nice if we could get a hybrid view option for Stack Snippets where we hide/collapse all the code but still get the 'run' buttons. Commented Nov 4 at 18:54
  • 1
    It thinks my DRL is Java. Gross. Commented Nov 4 at 19:36
  • @RoddyoftheFrozenPeas That one might be fixable by removing the Java tag on the question... is it even needed there? Would be a good test to see whether/how quickly it updates upon tag change. Commented Nov 4 at 19:43
  • 3
    In theory, Drools is technically a Java library. It's not hurting anything by being there, and theoretically whatever issue the user is having might be caused by the Java side of things going funky. (In this case, it turned out to be irrelevant to the issue at hand.) But just because the post is tagged Java shouldn't mean every other code block in the answers should inherit that. Commented Nov 4 at 19:49
  • 6
    "But just because the post is tagged Java shouldn't mean every other code block in the answers should inherit that." Maybe. Yet, the syntax highlighter tries to use whatever tag has a syntax hint defined. The [java] tag does - scroll to the very bottom of the wiki where it says "Code Language (used for syntax highlighting): lang-java". The [drools] tag does not. Which might be for the best - if two tags have a syntax hint and the hints differ, the highlighter disregards all hints and guesses the language. Commented Nov 4 at 19:57
  • 4
    I'm seeing snippets on PHP-tagged pages being labelled as "kotlin" snippets. example Commented Nov 5 at 2:11
  • 1
    @mickmackusa the question you linked to has the [php] tag and the [mysql] tag. Both define different syntax hints. As I said in my previous comment that makes the highlighter simply guess the language in the code block. Commented Nov 5 at 5:08
  • 4
    @RoddyoftheFrozenPeas it hurts in the way that people who are watching the Java tag now get that Drools question shoved to them when they weren't asking for it. It's not a Java language question, it's a tool question only for the eyes of people interested in Drools. The Java tag is an illegal fishing practice there. Commented Nov 5 at 9:25
  • 4
    And VBA code blocks are tagged "vbnet", adding to existing confusion ... Commented Nov 5 at 21:13
  • 1
    I'm noticing that C++ code blocks are being reported as cpp, even though [c++] is the real tag and [cpp] is just a synonym of [c++]. Commented Nov 6 at 19:59
  • 3
    I don’t understand why even label the language at all. The context around the code block usually makes the language unambiguous – and if it does not, that surely means the post should be improved. Commented Nov 7 at 16:37
  • 1
    @user692942 Yep that one is already included in the MSE crosspost list Commented Nov 18 at 22:05
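The tag-hint behavior described in the comments above can be sketched as follows. This is a minimal illustration in Python, not Stack Overflow's actual implementation: the `TAG_HINTS` table and the `resolve_hint` function are hypothetical names, and only the `lang-java` hint for [java] comes from the comments. The hint values for [php] and [mysql] are assumptions chosen only to show two tags with differing hints.

```python
from typing import Iterable, Optional

# Hypothetical tag -> syntax hint table.
# "lang-java" is quoted in the comments above; the other two values are assumed.
TAG_HINTS = {
    "java": "lang-java",
    "php": "lang-php",    # assumed value
    "mysql": "lang-sql",  # assumed value
}


def resolve_hint(tags: Iterable[str]) -> Optional[str]:
    """Return the one hint the tags agree on, or None to fall back to guessing."""
    # Tags without a defined hint (e.g. "drools") contribute nothing.
    hints = {TAG_HINTS[tag] for tag in tags if tag in TAG_HINTS}
    if len(hints) == 1:
        return next(iter(hints))
    # No hint at all, or conflicting hints: disregard them and guess.
    return None


print(resolve_hint(["java", "drools"]))  # lang-java
print(resolve_hint(["php", "mysql"]))    # None
```

Under these assumptions, the Drools question inherits the Java hint because [drools] defines none, while the PHP/MySQL question falls through to guessing, which matches the two cases reported above.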

2 Answers

11

As an alternative to the newly introduced copy button for code blocks on Stack Overflow, I've updated my userscript (see the StackApps post for more info) to offer a superior (imo), cross-site experience.

The advantages of having this script installed:

  • Works on all Stack Exchange websites (not just Stack Overflow)

  • Guaranteed plain-text attribution. The official SO feature often incorrectly guesses the programming language and comments out the attribution with the wrong syntax, creating extra work. This script provides the attribution as plain text, letting you decide how to comment it out

  • Gives the right attribution (i.e. link to the post, author's name, date of retrieval, and license info) for all posts (regular Q&A's, community wikis, posts by deleted users, etc.)

  • The copy button only appears on hover (unlike SO's feature), saving screen space. It cleanly overrides (removes) the default Stack Overflow copy button

Click to install the updated version with a script manager. Note: I have only tested this with Tampermonkey on Firefox.
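To illustrate the plain-text attribution idea from the list above, here is a minimal Python sketch. It is not the userscript's code: the function name `plain_text_attribution`, the field labels, and the sample values are all hypothetical. It only shows the four pieces of information mentioned (link, author, retrieval date, license) emitted without any comment syntax.

```python
from datetime import date


def plain_text_attribution(url: str, author: str, retrieved: date, license_name: str) -> str:
    """Build an attribution block as plain text, with no comment markers."""
    lines = [
        f"Source: {url}",
        f"Author: {author}",
        f"Retrieved: {retrieved.isoformat()}",
        f"License: {license_name}",
    ]
    return "\n".join(lines)


# Sample values are placeholders, not taken from any real post.
print(plain_text_attribution(
    "https://example.com/a/1",
    "Example Author",
    date(2025, 11, 6),
    "CC BY-SA 4.0",
))
```

Because no comment prefix is added, the person pasting the text can wrap it in whatever comment syntax the target file actually uses.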

3
  • 9
    This is how the official feature should have looked - nice and clean and not wasting space on every single code block. But wasting space seems to be a common theme with new SO "features" such as the comments redesign... pretty sad state of affairs. Commented Nov 6 at 9:08
  • 3
    And not causing page jumps. Commented Nov 8 at 6:59
  • @dumbass it also works on the revision page (although it doesn't get the license right if a revision has a different license than the current version of the post). Anyway, the list goes on, I just thought I had made my case already xD Commented Nov 8 at 7:41
6

This is now fixed for HTML/XML, the language used in the post where I first saw the issue, thanks to a new hardcoded exception, per Kristina Lustig on MSE:

thanks for the report - we didn't change anything about how language highlighting works, this was an issue with language aliases that never came up because we weren't surfacing language names before. we implemented a fix that should be live soon (probably tomorrow) that special cases html/xml. – kristinalustig 2025-11-05 21:17:06Z

As evidenced by the same post now showing HTML correctly:

Screenshot of the same post, with the code block now labeled as HTML.

However, it's not fixed for the broader issue of multiple languages being tagged.
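For illustration, a special case for a language alias could look roughly like the Python sketch below. This is an assumption about the general shape of such a fix, not the code Stack Overflow deployed: `SPECIAL_CASES` and `display_label` are hypothetical names, and the idea that HTML is highlighted through a grammar named `xml` is inferred from the label shown in the original screenshot.

```python
from typing import Optional

# Hypothetical table: declared alias -> grammar name the highlighter reports.
# When both match, show the alias instead of the grammar name.
SPECIAL_CASES = {"html": "xml"}


def display_label(declared: Optional[str], grammar: str) -> str:
    """Pick the label to show above a code block."""
    if declared:
        name = declared.lower()
        # Accept both "html" and the older "lang-html" hint style.
        if name.startswith("lang-"):
            name = name[len("lang-"):]
        if SPECIAL_CASES.get(name) == grammar:
            return name
    # No declared language, or no special case: show the grammar name.
    return grammar


print(display_label("lang-html", "xml"))  # html
print(display_label(None, "xml"))         # xml
```

A table like this only covers the aliases listed in it, which is consistent with the broader multi-tag case remaining unfixed.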

0
