• nieceandtows@programming.dev
    12 days ago

    How are subtitles created usually? Are they provided by the source material team, some professional third party that manually transcribes the video, or just fans doing it for free?

    • megopie@beehaw.org
      12 days ago

      See, that’s the kicker: for the longest time it was basically all fan-translated subtitles, and only recently has paid translation become the norm.

      So it’s really quite pathetic for them to try to save a few bucks by replacing a proper translator with an LLM, given that there are still plenty of passionate fans who would have done it for free. Especially given that translating between Japanese and English in a situation heavy on cultural context is something these LLMs are really bad at.

    • sabreW4K3@lazysoci.alOP
      12 days ago

      In terms of anime fansubs, it’s normally just great folks in the community. Some got hired by studios. But the studio is meant to provide the subs.

    • handsoffmydata@lemmy.zip
      12 days ago

      I maintain my own media library and I ensure every file has English and German subtitles. There are a variety of ways to source SRT files, but when all else fails, a machine with enough compute can transcribe video files using the open-source Whisper model. After I generate an English SRT file from the video, I send it to OpenAI to create the German translation.
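For anyone curious what the "generate an SRT file" step boils down to, here is a minimal sketch. It assumes the transcription step hands back a list of segments, each with a `start` and `end` time in seconds plus a `text` field; the function names and the demo segments are made up for illustration, not taken from any particular tool.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts carrying 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each cue is: running number, timing line, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Hypothetical segments standing in for real transcription output.
demo_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hey Yuji, let's go to the beach."},
    {"start": 2.5, "end": 5.0, "text": " Sounds good."},
]
srt_text = segments_to_srt(demo_segments)
print(srt_text)
```

Writing `srt_text` to a `.srt` file next to the video is usually enough for a media player to pick it up.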

      • dubyakay@lemmy.ca
        12 days ago

        Is there something similar for manga? Something that can overlay Japanese text on images, similar to what we have on smartphones but for the PC?

        • handsoffmydata@lemmy.zip
          11 days ago

          If your video file is in Japanese, use a Whisper model optimized for Japanese. Once it produces the Japanese SRT, you can get translations from OpenAI. Use HandBrake to add the SRT to the file and you’re done. Good luck!
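The translation step can be sketched roughly like this: keep the cue numbers and timings untouched and only swap out the text. The `translate` callable is a placeholder for whatever service you use (its name and the `fake_translate` stand-in are assumptions for the demo, not a real API).

```python
import re
from typing import Callable, List, Tuple

Cue = Tuple[str, str, str]  # (cue number, timing line, text)


def parse_srt(srt_text: str) -> List[Cue]:
    """Split SRT text into (number, timing, text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Blocks without number, timing and at least one text line are skipped.
        if len(lines) >= 3:
            cues.append((lines[0].strip(), lines[1].strip(), "\n".join(lines[2:])))
    return cues


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate only the text of each cue; numbering and timing are kept."""
    out = []
    for number, timing, text in parse_srt(srt_text):
        out.append(f"{number}\n{timing}\n{translate(text)}\n")
    return "\n".join(out)


def fake_translate(text: str) -> str:
    """Stand-in translator for the demo; a real one would call a service."""
    return f"[EN] {text}"


japanese_srt = "1\n00:00:01,000 --> 00:00:03,000\nおい、ゆうじ君、海行こうぜ\n"
english_srt = translate_srt(japanese_srt, fake_translate)
print(english_srt)
```

Passing the translator in as a function keeps the timing logic separate from whichever model does the actual translating, so a human-reviewed translation can be dropped in the same way.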

  • SpectralPineapple@beehaw.org
    12 days ago

    Although it seems likely that Crunchyroll uses an LLM for translation in some way, I wouldn’t call that “confirmed” since that might be the result of an individual translator using it.

    • t3rmit3@beehaw.org
      11 days ago

      The actions of an employee, when reviewed and released by a company, are the actions of that company. A company is just the sum of its employees’ actions.

      • faercol@lemmy.blahaj.zone
        10 days ago

        Also, LLMs have been around for a while, so there are a few possible situations:

        • LLM use is authorized or even encouraged. In this case it’s on the company.
        • LLM use is controlled, and this falls into one of the authorized cases. Same thing, really. Also, their authorized use cases need review.
        • LLM use is forbidden or restricted, and this is not an authorized use. In this case it falls on the company to review what’s being done. It’s their responsibility.

        So yeah, whatever the situation, it’s on Crunchyroll.

  • Geodad@beehaw.org
    12 days ago

    As someone who is able to speak Japanese, I’d notice the drop in quality of translation almost instantly.

    I never turn on subs anyway when I watch my anime though.

    • t3rmit3@beehaw.org
      11 days ago

      I have to since my partner doesn’t speak Japanese, but half the time I end up having to correct lines for them once or twice, to make things make sense. The non-egregious stuff I don’t even bother with. It’s crazy how amateurish some of the mistakes are, or even what are clearly choices to omit entire sentences, for no reason.

      おい、ゆうじ君、海行こうぜ

      “Hi Yuji!”

      • MaggiWuerze@feddit.org
        11 days ago

        As someone who is learning Japanese: is that a kanji for an honorific? Probably kun? ゆうじ is the name, although it’s weird that it is written in hiragana, I guess… But I fail at this one: 海行こうぜ

        The first kanji has the one for mother as part of it, I think… And the second one is pronounced ‘i’, so …iikouze? Let’s go somewhere?

        • t3rmit3@beehaw.org
          10 days ago

          Yes, 君 is ‘kun’ when used as an honorific.

          海 is ‘umi’, or sea/ocean. You are correct that the right half of the kanji contains the character for mother (母); strictly speaking, that half is 毎 (“every”), whose lower part is a form of 母 and is filed under the radical ⽏. The left-hand radical, ⺡, means water/liquid, so as a mnemonic you can sort of read it as “water mother” = ocean. Not all kanji work out this nicely with their radical structure, though.

          Last part is spot on: ikou (行こう) is the plain (casual) volitional form of iku, ‘to go’, which expresses a suggestion to do something, i.e. “let’s (go)”.

          • MaggiWuerze@feddit.org
            9 days ago

            Thanks for the feedback, seems my efforts weren’t entirely wasted :D Interesting that the kanji for water itself does not contain that radical (unless you squint heavily). What’s the difference from ikimashou? Isn’t that the suggestive form, as in ‘we should go’?

            • t3rmit3@beehaw.org
              8 days ago

              The radical for water is actually derived from the standalone kanji. It’s basically an extremely short-stroke version of the kanji.

              Ikimashou is just the ‘formal’, full-length version. No difference in meaning. Just as “iku” is the casual version of “ikimasu”.

              Ikimasu -> iku

              Ikimashou -> ikou

              • MaggiWuerze@feddit.org
                8 days ago

                Fascinating. That explains the similarity. Since watching that episode of Witch Watch I definitely feel bad about my formal “Duolingo” Japanese :D

                By the way, is there a rule to how these short forms are formed?

                • t3rmit3@beehaw.org
                  1 day ago

                  By the way, is there a rule to how these short forms are formed?

                  Yep! In the polite -masu form, most Japanese verbs (with a few exceptions like ‘shimasu’ becoming suru) have one of the ‘i’-row kana (‘i’, ‘ki’, ‘ni’, ‘mi’, ‘ri’, and so on) after the kanji, right before -masu.

                  Yakimasu (to burn/ cook), shirimasu (to know), arukimasu (to walk), arimasu (to be), shinimasu (to die), yomimasu (to read).

                  Ki will become ku in the shortened version, ri will become ru, ni -> nu, etc:

                  yaku, shiru, aruku, aru, shinu, yomu

                  I believe the verbs that don’t end in one of those, like tabemasu (to eat), will default to ‘ru’ (taberu), but I don’t know if that’s a rule off the top of my head, or if I just can’t think of any others right now.

                  Where the kana before -masu is voiced (written with dakuten), such as in oyogimasu (to swim), the end kana stays voiced in the short form, e.g. oyogu. Ki -> ku, gi -> gu.
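The pattern above can be sketched as a tiny romaji converter. This is only an illustration under a few assumptions: input is Hepburn-style romaji, the caller says whether the verb is a tabemasu-type (‘ru’) verb because that can’t be read off the romaji alone, and only the bare irregulars shimasu and kimasu (to come) are special-cased. The name `masu_to_plain` is made up for this example.

```python
# Bare irregular verbs; compounds such as "benkyou shimasu" are not handled.
IRREGULAR = {"shimasu": "suru", "kimasu": "kuru"}


def masu_to_plain(polite: str, ichidan: bool = False) -> str:
    """Convert a romaji -masu verb to its short (dictionary) form."""
    if polite in IRREGULAR:
        return IRREGULAR[polite]
    if not polite.endswith("masu"):
        raise ValueError(f"not a -masu form: {polite!r}")
    stem = polite[: -len("masu")]
    if ichidan:
        return stem + "ru"        # tabe-masu -> tabe-ru
    if stem.endswith("shi"):
        return stem[:-3] + "su"   # hanashi-masu -> hanasu
    if stem.endswith("chi"):
        return stem[:-3] + "tsu"  # machi-masu -> matsu
    if stem.endswith("i"):
        return stem[:-1] + "u"    # yaki -> yaku, oyogi -> oyogu
    raise ValueError(f"stem {stem!r} does not end in an i-row kana; ichidan verb?")


for verb in ["yakimasu", "shirimasu", "arukimasu", "arimasu",
             "shinimasu", "yomimasu", "oyogimasu"]:
    print(verb, "->", masu_to_plain(verb))
print("tabemasu ->", masu_to_plain("tabemasu", ichidan=True))
```

The flag matters because some ‘ru’-type verbs also have an ‘i’ before -masu (mimasu, to see, becomes miru rather than mu), so a lookup of known verbs would be needed to do this without the caller’s help.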

    • James R Kirk@startrek.website
      12 days ago

      For YouTube tutorial videos I have no issue with relying on GPT, but I think it’s important to recognize that the translation of art is art. I don’t feel good about the idea of something without a soul or perspective interpreting a work of art from one culture and language into another that might be wildly different from where it started.

      All that said, I think Crunchyroll and anyone else using AI absolutely should disclose it.

      • null@slrpnk.net
        11 days ago

        I feel like what makes the most sense and is likely what’s happening is that ChatGPT is being used to do the initial translation, and then a human is auditing that translation and making adjustments. So just a faster way to get the scaffolding and grunt-work out of the way.

        • megopie@beehaw.org
          11 days ago

          They appear to be copying direct translations from ChatGPT into the subtitles, judging by the fact that one of the subtitles said “Chat GPT says:” and then the line in German. People who speak German also noticed that the grammar and sentence structure for many of these shows has been awful and nonsensical at times.

          If anyone is doing any sort of oversight, they don’t appear to speak German themselves and are just betting that the output will be accurate and pasting it in.

          Someone who spoke German and Japanese fluently enough to do competent oversight could probably translate faster than they could edit and rephrase the work of an LLM. LLMs are notoriously bad at translating languages in a high-context situation like dialog in an animated show. They are also generally very bad with high-context languages like Japanese, and even worse at translating between them and low-context languages like German.

  • luciole (he/him)@beehaw.org
    12 days ago

    Both translation and subtitling have highly efficient tooling when in the hands of a professional. Translators nowadays use a mix of tools and will build up a dynamic database as they go through a corpus that needs coherence. What’s bad in this instance is not the usage of some AI, but of a badly adapted AI, and ultimately the mediocre results, which give an amateurish impression.