cross-posted from: https://lemmy.ca/post/37011397

!opensource@programming.dev

The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages.

  • @Nalivai@lemmy.world
    12
    3 months ago

    The technology is nowhere near good, though. On synthetic tests, on the data it was trained and tweaked on, maybe, I don’t know.
    I co-run an event where we invite speakers from all over the world, and we’ve tried every way of generating subtitles; all of them perform at about the level of YouTube’s autogenerated ones. It’s better than nothing, but you can’t really rely on it.

    • @Petter1@lemm.ee
      -1
      3 months ago

      You haven’t even been able to test it yet, and you’re calling it nowhere near good 🤦🏻

      Like, how would you know?!

      • @Nalivai@lemmy.world
        2
        edit-2
        3 months ago

        Relax, they didn’t invent a new kind of magic, they integrated an existing solution from the market.
        I don’t know what the new BMW they’re introducing this year is capable of, but I know for a fact it can’t fly.

    • @lukewarm_ozone@lemmy.today
      2
      edit-2
      3 months ago

      Really? This is the opposite of my experience with (distil-)whisper - I use it to generate subtitles for stuff like podcasts and was stunned at first by how high-quality the results are. I typically use distil-whisper/distil-large-v3, locally. Was it among the models you tried?
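
      For anyone wanting to try the same thing locally, here is a minimal sketch of that workflow via the Hugging Face transformers ASR pipeline. The model name comes from the comment above; the file names and the SRT-writing helper are illustrative assumptions, not anything VLC or distil-whisper ships:

```python
# Sketch: local subtitle generation with distil-whisper.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v3",
    chunk_length_s=25,       # chunked long-form inference
    return_timestamps=True,  # needed to get start/end times for each cue
)

result = asr("episode.mp3")  # hypothetical input file

def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

# Write the timestamped chunks out as a standard .srt file.
with open("episode.srt", "w", encoding="utf-8") as f:
    for i, chunk in enumerate(result["chunks"], start=1):
        start, end = chunk["timestamp"]
        end = end if end is not None else start  # final chunk may lack an end time
        f.write(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n")
        f.write(chunk["text"].strip() + "\n\n")
```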

      • @Nalivai@lemmy.world
        1
        3 months ago

        I unfortunately don’t know the specific names of the models; I’ll comment again if I remember to ask the people who spun up the models themselves.
        The difference might be live vs. recorded audio, I don’t know.

    • @Scrollone@feddit.it
      6
      3 months ago

      No, but I think it would be super helpful for synchronizing subtitles that aren’t aligned with the video.
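
      The constant-offset case doesn’t even need AI: shifting every cue in an .srt file is enough, as in this stdlib-only sketch (the file names and the 2.5-second offset are illustrative assumptions). Where speech recognition could genuinely help is in finding that offset automatically by matching recognized speech against the subtitle text.

```python
# Sketch: shift every timestamp in an .srt file by a fixed offset.
import re

OFFSET_SECONDS = 2.5  # illustrative: subtitles currently appear 2.5 s too early

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift(match: re.Match) -> str:
    """Re-emit one HH:MM:SS,mmm timestamp, moved by OFFSET_SECONDS."""
    h, m, s, ms = (int(g) for g in match.groups())
    total_ms = ((h * 60 + m) * 60 + s) * 1000 + ms
    total_ms = max(0, total_ms + int(OFFSET_SECONDS * 1000))  # clamp at 0
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

with open("movie.srt", encoding="utf-8") as f:
    text = f.read()

with open("movie_shifted.srt", "w", encoding="utf-8") as f:
    f.write(TIMESTAMP.sub(shift, text))
```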

    • @TriflingToad@sh.itjust.works
      4
      edit-2
      3 months ago

      Is your goal to rely on it, or to have it as a backup?
      For my purpose of having a backup, nearly anything is better than nothing.

      • @Nalivai@lemmy.world
        2
        3 months ago

        When you do live streaming there’s no time for a backup; it either works or it doesn’t. Better than nothing, that’s for sure, but maybe only marginally better than what we had 10 years ago.

  • m-p{3}
    71
    edit-2
    3 months ago

    Now I want some AR glasses that display subtitles above someone’s head when they talk, à la Cyberpunk, and auto-translate as well. Of course, it has to be done entirely locally.

    • @Obi@sopuli.xyz
      20
      3 months ago

      I guess we have most of the ingredients to make this happen. Software-wise we’re there; hardware-wise I’m still waiting for AR glasses that can replace my normal glasses (which I wear 24/7 except for sleep). I’d accept having to carry a spare in a charging case and swap them out once a day, but beyond that I want them close enough to my regular glasses in weight and comfort, just giving me AR: overlaid GPS, notifications, and so on. And indeed, instant translation with subtitles is a function I could honestly see having a massive impact on civilization.

      • @vvv@programming.dev
        4
        3 months ago

        I think we’re closer with hardware than software. The Xreal/Rokid category of HMDs is comfortable enough to wear all day, and I don’t mind a cable running from behind my ear, under a layer of clothes, to a phone or mini PC in my pocket. Unfortunately you still need to bring your own cameras to get the overlays appearing at the correct points in space, but cameras are cheap; I suspect these glasses will grow some in the next couple of iterations.

      • Midnight Wolf
        2
        3 months ago

        soon

        Breaking news: “WW3 starts over an insult due to a mistranslated phrase at the G7 summit. We will be nuked in 37 seconds. Fuck like rabbits, it’s all we can do. Now over to Robert with traffic.”

      • @AlligatorBlizzard@sh.itjust.works
        3
        3 months ago

        It’d be incredible for deaf people: being able to read captions for spoken conversations, and having the other person’s glasses translate from ASL to English.

        Honestly, I’d be a bit shocked if AI ASL -> English doesn’t exist already; there’s so much training data available, and the Deaf community loves video for obvious reasons.

      • m-p{3}
        8
        3 months ago

        I believe you can put prescription lenses in most AR glasses out there, but I suppose the battery is a concern…

        I’m in the same boat, I gotta wear my glasses 24/7.

  • @squid_slime@lemm.ee
    2
    3 months ago

    When are we getting AMD’s FSR upscaling and frame gen? Also, wouldn’t it make more sense for subtitles to use the Jellyfin approach?

    • @Naz@sh.itjust.works
      1
      3 months ago

      I have an AMD card; add VLC as a game in the drivers, and you can turn on AFMF (frame gen).

      If that doesn’t work, you can just turn it on system-wide in the display settings of the Adrenalin software (gear icon in the upper right corner, then Display/Gaming).

      I think it requires at least a 6000-series GPU, however.

      If you have a Samsung TV or another modern smart TV connected to a laptop, you can also turn on frame gen using Auto Motion Plus, set to Custom.

      Judder Reduction at 10 doubles the frames, so 24 FPS -> 48.

    • @WalnutLum@lemmy.ml
      1
      3 months ago

      This is already implemented on Windows:

      Tools > Preferences > Show settings: All > Video > Subtitles/OSD > Text rendering module: [Speech synthesis for Windows]

  • @Doorbook@lemmy.world
    23
    3 months ago

    The nice thing is that now, at least, this can be used with live TV from other countries and in other languages.

    Imagine watching Japanese TV or Korean channels without bothering with downloading, searching for, and syncing subtitles.

  • Clot
    19
    3 months ago

    Will it be possible to export these AI subs?

  • @renzev@lemmy.world
    148
    3 months ago

    This sounds like a great thing for deaf people and just in general, but I don’t think AI will ever replace anime fansub makers who have no problem throwing a wall of text on screen for a split second just to explain an obscure untranslatable pun.

    • @cley_faye@lemmy.world
      9
      3 months ago

      It’s unlikely to even replace good subtitles, fan-made or not. It’s just a nice thing to have for a lot of content, though.

      • @boonhet@lemm.ee
        11
        edit-2
        3 months ago

        I have family members who can’t really understand spoken English because it’s a bit fast, and can’t keep up with English subtitles either, because, again, they’re too fast for them.

        Sometimes you download a movie and all the Estonian subtitles are for an older release, so they desynchronize. Sometimes you can barely even find synchronized English subtitles, so even that doesn’t work.

        This seems like a godsend, honestly.

        Funnily enough, of all the streaming services, I’m again going to have to commend Apple TV+ here. Their shit has Estonian subtitles. Netflix, Prime, etc. do not. Meaning if I’m watching with a family member who doesn’t understand English well, I’ll watch Apple TV+ with a subscription, and everything else gets pirated for the subtitles. So I don’t bother subscribing anymore. We’re a tiny country, but for some reason Apple of all companies has chosen to acknowledge us. Meanwhile, when I was setting up an Xbox for someone a few years ago, Estonia just… straight up didn’t exist. I’m not talking about language support - you literally couldn’t pick it as your LOCATION.

    • @FordBeeblebrox@lemmy.world
      22
      3 months ago

      They’re like the * footnotes in any Terry Pratchett (GNU) novel: sometimes a funny joke can have a little more spice added to make it even funnier.

  • @Thistlewick@lemmynsfw.com
    19
    3 months ago

    Amazing. I can finally find out exactly what that nurse is yelling about while she gets railed by the local basketball team.

  • @VerPoilu@sopuli.xyz
    26
    3 months ago

    I hope Mozilla can benefit from a good local translation engine that could come out of this as well.

        • @viking@infosec.pub
          2
          3 months ago

          And it takes forever. I’m using the TWP plugin for Firefox (which uses external resources, configurable between Google, Bing, and Yandex Translate), and it’s near-instantaneous. The local one from Mozilla often takes 30 seconds, and sometimes hangs until I refresh the page.

  • TheRealKuni
    27
    3 months ago

    And yet they turned down having thumbnails for seeking because it would be too resource-intensive. 😐

    • @DreamlandLividity@lemmy.world
      15
      3 months ago

      I mean, it would be. For example, Jellyfin implements it, but it does so by extracting the pictures ahead of time and saving them. It takes days to do this for my library.

        • @DreamlandLividity@lemmy.world
          2
          3 months ago

          I get what you’re saying, but I don’t think there’s any standardized format for these trickplay images. The same images from Plex would likely not be usable in Jellyfin without converting the metadata (e.g., which time in the video an image belongs to). So VLC probably has no good way to understand trickplay images it didn’t make itself.
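
          For illustration only — neither structure below is the real Plex or Jellyfin format, and the interval and tile layout are invented numbers — the core of any such conversion would be remapping the index-to-timestamp metadata, something like:

```python
# Purely hypothetical sketch of converting trickplay metadata between
# two players with different layouts.
INTERVAL_MS = 10_000   # assume player A stores one thumbnail per 10 s
TILES_PER_ROW = 10     # assume player B packs tiles into 10x10 sprite sheets
TILES_PER_SHEET = TILES_PER_ROW * TILES_PER_ROW

def a_index_to_b_location(image_index: int) -> dict:
    """Map player A's Nth thumbnail to a timestamp and player B's sheet/tile."""
    sheet, offset = divmod(image_index, TILES_PER_SHEET)
    row, col = divmod(offset, TILES_PER_ROW)
    return {
        "timestamp_ms": image_index * INTERVAL_MS,
        "sheet": sheet,
        "row": row,
        "col": col,
    }
```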

    • @cley_faye@lemmy.world
      10
      3 months ago

      Video decoding is resource-intensive. We’re used to it, and we have hardware acceleration for some of it, but pushing around 52 million pixels every second (1920 × 1080 at 25 fps ≈ 52 million) from a highly compressed data source is not cheap. I’m not sure how the two compare, but small LLM models are not that costly to run if you don’t factor in their creation.

      • TheRealKuni
        1
        3 months ago

        All they’d need to do is generate thumbnails at a set interval when the video loads (see the sketch below), and make that interval adjustable. It might take a few extra seconds to load a video; make it off by default if they’re worried about the performance hit.

        There are other desktop video players that make this work.
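
        A rough sketch of what that could look like under the hood, shelling out to ffmpeg — the interval, thumbnail size, and paths are illustrative assumptions, not how any particular player does it:

```python
# Sketch: extract one seek thumbnail every N seconds with ffmpeg.
import os
import subprocess

VIDEO = "movie.mkv"     # illustrative input file
INTERVAL_SECONDS = 10   # one thumbnail per 10 s of video

os.makedirs("thumbs", exist_ok=True)
subprocess.run(
    [
        "ffmpeg",
        "-i", VIDEO,
        "-vf", f"fps=1/{INTERVAL_SECONDS},scale=160:-1",  # sample, then shrink
        "-q:v", "5",              # modest JPEG quality keeps the files small
        "thumbs/thumb_%04d.jpg",  # thumb_0001.jpg covers the first interval, etc.
    ],
    check=True,
)
```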

  • ZeroOne
    39
    3 months ago

    As long as the models are open source, I have no complaints.

  • billwashere
    48
    3 months ago

    This might be one of the few times I’ve seen AI being useful and not just slapped on something for marketing purposes.

  • Phoenixz
    49
    edit-2
    3 months ago

    As VLC is open source, can we expect this technology to also be available for, say, Jellyfin, so that I can once and for all have subtitles done right?

    Edit: I think it’s great that VLC has this, but it sounds like something many other apps could benefit from.