• Fubarberry
    6 points · 2 months ago

    I asked mistral/brave AI and got this response:

    How Many Rs in Strawberry

    The word “strawberry” contains three "r"s. This simple question has highlighted a limitation in large language models (LLMs), such as GPT-4 and Claude, which often incorrectly count the number of "r"s as two. The error stems from the way these models process text through a process called tokenization, where text is broken down into smaller units called tokens. These tokens do not always correspond directly to individual letters, leading to errors in counting specific letters within words.
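
    To make the tokenization point concrete, here is a minimal sketch, assuming Python with the tiktoken library and OpenAI’s cl100k_base encoding (other models use different tokenizers, so the exact split will vary). The point is that the model receives integer token IDs for sub-word chunks, not individual letters:

        # Minimal sketch: how a tokenizer splits "strawberry" into sub-word chunks.
        # Assumes the tiktoken library (pip install tiktoken); other tokenizers differ.
        import tiktoken

        enc = tiktoken.get_encoding("cl100k_base")

        token_ids = enc.encode("strawberry")
        chunks = [enc.decode([t]) for t in token_ids]

        print(token_ids)  # integer IDs -- this is what the model actually sees
        print(chunks)     # the sub-word pieces those IDs stand for, not single letters

    Because the model only ever sees those IDs, “how many r’s” asks about structure it never directly observes.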

    • @jj4211@lemmy.world
      2 points · 2 months ago

      Yes, at some point the meme becomes the training data and the LLM doesn’t need to answer because it sees the answer all over the damn place.

  • @otp@sh.itjust.works
    1 point · 2 months ago

    From a linguistic perspective, this is why I am impressed by (or at least, astonished by) LLMs!

  • @Grabthar@lemmy.world
    16 points · 2 months ago

    Doc: That’s an interesting name, Mr…

    Fletch: Babar.

    Doc: Is that with one B or two?

    Fletch: One. B-A-B-A-R.

    Doc: That’s two.

    Fletch: Yeah, but not right next to each other, that’s what I thought you meant.

    Doc: Isn’t there a children’s book about an elephant named Babar?

    Fletch: Ha, ha, ha. I wouldn’t know. I don’t have any.

    Doc: No children?

    Fletch: No elephant books.

  • @daniskarma@lemmy.dbzer0.com
    21 points · edited · 2 months ago

    That happens when you don’t understand what an LLM is, or what its use cases are.

    This is like not being impressed by a calculator because it cannot give a word synonym.

    • @Strykker@programming.dev
      5 points · 2 months ago

      But everyone selling LLMs sells them as being able to solve any problem, making it hard to know when they’re going to fail and give you junk.

    • xigoi
      5 points · 2 months ago

      Sure, maybe it’s not capable of producing the correct answer, which is fine. But it should say “As an LLM, I cannot answer questions like this” instead of just making up an answer.

      • @daniskarma@lemmy.dbzer0.com
        6 points · 2 months ago

        I have thought a lot about it. The LLM per se would not know whether a question is answerable, as it doesn’t know whether its own output is good or bad.

        So there are various approaches to this issue:

        1. The classic approach, and the one used for censoring: keywords. When the LLM receives a certain keyword (or can derive one by digesting the text input), it gives back a hard-coded answer. Problem: censoring cases are limited, but hard-to-answer questions are unlimited, so it’s hard to hard-code them all.

        2. Self-checking answers. For every question, the LLM could process it 10 times with different seeds, then analyze the results and see whether they are equivalent. If they are not, it would just answer that it’s unsure about the answer (a minimal sketch of this idea follows below). Problem: multiplied resource usage. And for some questions, like the one in the post, it’s possible that the multiple randomized answers give equivalent results anyway, so it would still have a decent failure rate.
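
        A minimal sketch of approach 2, assuming a hypothetical ask_llm(prompt, seed) helper that wraps whatever model API is in use; the helper name, the normalization step, and the 80% agreement threshold are illustrative choices, not part of any particular library:

            # Self-check sketch: ask the same question several times and only answer
            # when the sampled responses agree; otherwise admit uncertainty.
            from collections import Counter

            def ask_llm(prompt: str, seed: int) -> str:
                """Hypothetical wrapper around an LLM API call; returns the answer text."""
                raise NotImplementedError("plug in the model/API of your choice")

            def self_checked_answer(prompt: str, n_samples: int = 10) -> str:
                # Sample the model several times with different seeds (or temperatures).
                answers = [ask_llm(prompt, seed=i) for i in range(n_samples)]

                # Crude equivalence check: normalize and look for a clear majority.
                counts = Counter(a.strip().lower() for a in answers)
                best, votes = counts.most_common(1)[0]

                if votes >= 0.8 * n_samples:  # arbitrary agreement threshold
                    return best
                return "I'm not sure about the answer."

        As noted above, the obvious cost is roughly n_samples times the compute per question, and consistently wrong answers (like the tokenization failure in the post) would still pass the check.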

        • xigoi
          -1 point · 2 months ago

          Why would it not know? It certainly “knows” that it’s an LLM and it presumably “knows” how LLMs work, so it could piece this together if it was capable of self-reflection.

  • @dan1101@lemm.ee
    13 points · 2 months ago

    It’s like someone who has no formal education but has a high level of confidence and eavesdrops on a lot of random conversations.

  • @humorlessrepost@lemmy.world
    8 points · edited · 2 months ago

    Works fine for me in o3-mini-high:

    Counting letters in “strawberry”

    Alright, I’m checking: the word “strawberry” is spelled S T R A W B E R R Y. Let me count the letters: S (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). There are three R’s: in positions 3, 8, and 9. So, the answer is 3. Even if we ignore case, the count still holds. Therefore, there are 3 r’s in “strawberry.”
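
    The enumeration above is easy to verify mechanically outside the model; a one-off check in Python (the variable names are just illustrative):

        # Count the r's in "strawberry" and report their 1-based positions.
        word = "strawberry"
        positions = [i + 1 for i, ch in enumerate(word) if ch.lower() == "r"]
        print(len(positions), positions)  # 3 [3, 8, 9]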

  • @Tgo_up@lemm.ee
    15 points · 2 months ago

    This is a bad example… If I ask a friend “is strawberry spelled with one or two r’s?” they would think I’m asking about the last part of the word.

    The question seems to be specifically made to trip up LLMs. I’ve never heard anyone ask how many of a certain letter are in a word. I’ve heard people ask how you spell a word and whether it’s with one or two of a specific letter, though.

    If you think of LLMs as something with actual intelligence you’re going to be very unimpressed… It’s just a model to predict the next word.

    • @Grandwolf319@sh.itjust.works
      4 points · 2 months ago

      If you think of LLMs as something with actual intelligence you’re going to be very unimpressed

      Artificial sugar is still sugar.

      Artificial intelligence implies there is intelligence in some shape or form.

      • @Scubus@sh.itjust.works
        0 points · 2 months ago

        That’s because it wasn’t originally called AI. It was called an LLM. Tech bros trying to sell it and articles wanting to fan the flames started calling it AI, and eventually it became common parlance. No one in the field seriously calls it AI; they generally reserve that term for general AI, or at least narrow AI, of which an LLM is neither.

      • @corsicanguppy@lemmy.ca
        3 points · 2 months ago

        Artificial sugar is still sugar.

        Because it contains sucrose, fructose or glucose? Because it metabolises the same and matches the glycemic index of sugar?

        Because those are all wrong. What’s your criteria?

        • @Grandwolf319@sh.itjust.works
          2 points · 2 months ago

          In this example a sugar is something that is sweet.

          Another example is artificial flavours still being a flavour.

          Or like artificial light being in fact light.

      • JohnEdwa
        3 points · edited · 2 months ago

        Something that pretends to be or looks like intelligence, but actually isn’t, is a perfectly valid interpretation of the word artificial: fake intelligence.

      • @Tgo_up@lemm.ee
        1 point · 2 months ago

        Exactly. The naming of the technology would make you assume it’s intelligent. It’s not.

    • @renegadespork@lemmy.jelliefrontier.net
      26 points · 2 months ago

      If you think of LLMs as something with actual intelligence you’re going to be very unimpressed… It’s just a model to predict the next word.

      This is exactly the problem, though. They don’t have “intelligence” or any actual reasoning, yet they are constantly being used in situations that require reasoning.

      • @sugar_in_your_tea@sh.itjust.works
        5 points · 2 months ago

        Maybe if you focus on pro- or anti-AI sources, but if you talk to actual professionals or hobbyists solving actual problems, you’ll see very different applications. If you go into it looking for problems, you’ll find them; likewise, if you go into it looking for use cases, you’ll find them.

        • @renegadespork@lemmy.jelliefrontier.net
          1 point · 2 months ago

          Personally I have yet to find a use case. Every single time I try to use an LLM for a task (even ones they are supposedly good at), I find the results so lacking that I spend more time fixing its mistakes than I would have just doing it myself.

          • @Scubus@sh.itjust.works
            2 points · 2 months ago

            So you’ve never used it as a starting point to learn about a new topic? You’ve never used it to look up a song when you can only remember a small section of the lyrics? What about when you want to write a block of code that is simple but monotonous to write yourself? Or to suggest plans for how to create simple structures/inventions?

            Anything with a verifiable answer that you’d ask on a forum can generally be answered by an LLM, because they’re largely trained on forums, and there’s a decent chance the training data included someone asking the question you are currently asking.

            Hell, ask ChatGPT what use cases it would recommend for itself; I’m sure it’ll have something interesting.

      • @Tgo_up@lemm.ee
        1 point · 2 months ago

        What situations are you thinking of that require reasoning?

        I’ve used LLMs to create software I needed but couldn’t find online.

        • @renegadespork@lemmy.jelliefrontier.net
          1 point · 2 months ago

          Creating software is a great example, actually. Coding absolutely requires reasoning. I’ve tried using code-focused LLMs to write blocks of code, or even some basic YAML files, but the output is often unusable.

          It rarely makes syntax errors, but it will do things like reference libraries that haven’t been imported or hallucinate functions that don’t exist. It also constantly misunderstands the assignment and creates something that technically works but doesn’t accomplish the intended task.

          • @Tgo_up@lemm.ee
            1 point · 2 months ago

            I think coding is one of the areas where LLMs are most useful for private individuals at this point in time.

            It’s not yet at the point where you just give it a prompt and it spits out flawless code.

            For someone like me who is decent with computers but has little to no coding experience, it’s an absolutely amazing tool/teacher.

  • @HoofHearted@lemmy.world
    0 points · 2 months ago

    The terrifying thing is that everyone is criticising the LLM as being poor; however, it excelled at the task.

    The question asked was how many R’s are in “strawbery”, and it answered: 2.

    It also detected the typo and offered the correct spelling.

    What’s the issue I’m missing?

    • Tywèle [she|her]
      20 points · 2 months ago

      The issue that you are missing is that the AI answered that there is 1 ‘r’ in ‘strawbery’ even though there are 2 'r’s in the misspelled word. And the AI corrected the user with the correct spelling of the word ‘strawberry’ only to tell the user that there are 2 'r’s in that word even though there are 3.

      • TomAwsm
        -1 point · 2 months ago

        Sure, but for what purpose would you ever ask about the total number of a specific letter in a word? This isn’t the gotcha that so many think it is. The LLM answers like it does because it makes perfect sense for someone to ask if a word is spelled with a single or double “r”.

        • snooggums
          1 point · 2 months ago

          It makes perfect sense if you do mental acrobatics to explain why a wrong answer is actually correct.

          • TomAwsm
            0 points · 2 months ago

            Not mental acrobatics, just common sense.

        • @jj4211@lemmy.world
          1 point · 2 months ago

          Except many many experts have said this is not why it happens. It cannot count letters in the incoming words. It doesn’t even know what “words” are. It has abstracted tokens by the time it’s being run through the model.

          It’s more like you don’t know the word strawberry, and instead you see: How many 'r’s in 🍓?

          And you respond with nonsense, because the relation between ‘r’ and 🍓 is nonsensical.

    • Fubarberry
      3 points · 2 months ago

      There’s also an “r” in the first half of the word, “straw”, so it was completely skipping over that r and just focusing on the r’s in “berry”.

      • @jj4211@lemmy.world
        2 points · 2 months ago

        It doesn’t see “strawberry” or “straw” or “berry”. It’s closer to the truth to think of it as seeing 🍓, an abstract token representing the same concept that the training data associated with the word.

      • @catloaf@lemm.ee
        3 points · 2 months ago

        It wasn’t focusing on anything. It was generating text per its training data. There’s no logical thought process whatsoever.

    • @TeamAssimilation@infosec.pub
      10 points · 2 months ago

      Still, it’s kinda insane how two years ago we didn’t imagine we would be instructing programs like “be helpful but avoid sensitive topics”.

      That was definitely a big step in AI.

  • @Lazycog@sopuli.xyz
    3 points · 2 months ago

    I can already see it…

    Ad: CAN YOU SOLVE THIS IMPOSSIBLE RIDDLE THAT AI CAN’T SOLVE?!

    With OP’s image. And then it will have the following once you solve it: “congratz, send us your personal details and you’ll be added to the hall of fame at CERN Headquarters”

  • Aatube
    4 points · 2 months ago

    I mean, that’s how I would think about it…

      • Aatube
        3 points · 2 months ago

        The typo in “strawbery” leads to a conversation like “hey you spelt this wrong there’s two r’s (after the e) not one”

        • @shrodes@lemmy.world
          1 point · edited · 2 months ago

          It happens even if you ask how many “r”s are in “strawberry”. It’s a well-known AI gotcha that happens on most if not all current models. The typo in the original post is a little misleading and not that relevant.