There are already extensions that somehow skip sponsorship sections, so it won’t even take that long.
That’s “crowdsourced”, i.e. done manually by volunteers on a per-video basis.
I see a good use case for AI here, and its results can also be crowdsourced.
It’s illegal to not identify an ad as an ad (unless you’re a movie maker, but that’s a different topic). All ad blockers need to do is read that indicator. That might not be super simple, but I have faith in the abilities of the brilliant people behind many ad-blocking technologies.
That’s actually hurt by this, because it relies on timestamps supplied by users. Those timestamps are now off because the ads are of variable length. We can only hope that YouTube keeps the ability to link to a specific timestamp, because then it has to calculate the difference itself, and that can be used by SponsorBlock and ad blockers alike.
But then those ads need to be marked as skippable or not skippable with some kind of metadata, and injected scripts can use that metadata against them.
The problem is that those blocking extensions are based on timestamps. The timestamps are added by users; it’s a crowdsourced thing. But the ads one user sees differ from the ads another user sees. The length of the ads is likely to differ too, which makes the whole timestamp approach a no-go.
Along with the timestamp, there needs to be a way to detect where the actual video begins. That way at least an offset can be applied and timestamps maintained, but it would introduce a certain level of error.
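To make the offset idea concrete, here is a minimal sketch in Python. It assumes the ads only play before the video and that the point where the actual video begins has somehow been detected already. The names `Segment`, `shift_segments`, `skip_target`, and `content_start` are made up for illustration and are not taken from any real extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A crowdsourced segment, in seconds relative to the original video."""
    start: float
    end: float


def shift_segments(segments, content_start):
    """Move segments from original-video time to player time.

    content_start is the (hypothetical) detected position, in seconds of
    player time, at which the actual video begins after the ads.
    """
    if content_start < 0:
        raise ValueError("content_start must be non-negative")
    return [Segment(s.start + content_start, s.end + content_start)
            for s in segments]


def skip_target(position, shifted):
    """Return the time to jump to if position is inside a segment, else None."""
    for seg in shifted:
        if seg.start <= position < seg.end:
            return seg.end
    return None
```

Any error in detecting `content_start` carries straight over into every shifted timestamp, which is the “certain level of error” mentioned above. Ads in the middle of the video would need one offset per ad break rather than a single value.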
The next issue would be to then advance the video to the place where the actual video begins. This can be very hard, as it would need to include some way of recognizing the right frame in the buffer. That has three requirements: the starting frame is actually in the buffer (with ads longer than a few seconds, this isn’t guaranteed); the add-on has access to this buffer (depending on the platform, this isn’t guaranteed); and there’s a reliable way to recognize the right frame, given the different encoding and quality setups.
And this needs to be done cheaply, so with as little infrastructure as possible. A database of timestamps is very small, and crowdsourcing those timestamps is relatively easy. But recognizing frames requires more data to be stored, and crowdsourcing the right frame is a lot harder than crowdsourcing a timestamp. If the infrastructure ends up being complex and big, someone needs to pay for that. I don’t know if donations alone would cut it. So you would need to show ads, which is exactly what you’re trying to avoid.
I’m sure the very smart and creative people working on these things will find a way. But it won’t be easy, so I don’t expect a solution very soon.
You need more data to recognize frames, but not a lot more data. A hash for each quality setting would be sufficient as long as they don’t start fuzzing the videos, which would be very expensive on their part.
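As a rough sketch of what “a hash for each quality setting” could look like, in Python: it assumes the add-on can read frames from the buffer as raw bytes and that those bytes are identical for every viewer at a given quality, which is exactly what fuzzing would break. `frame_hash`, `find_content_start`, and the quality labels are hypothetical names for illustration only.

```python
import hashlib


def frame_hash(frame_bytes: bytes) -> str:
    """Exact hash of a frame's bytes; it only matches bit-identical frames."""
    return hashlib.sha256(frame_bytes).hexdigest()


def find_content_start(buffered_frames, known_hashes, quality):
    """Find the player time at which the actual video begins.

    buffered_frames: iterable of (timestamp_seconds, frame_bytes) pairs
    known_hashes: dict mapping a quality label to the hash of the first
                  frame of the actual video at that quality
    Returns the timestamp of the matching frame, or None if there is no
    hash for this quality or the frame is not in the buffer.
    """
    target = known_hashes.get(quality)
    if target is None:
        return None
    for timestamp, data in buffered_frames:
        if frame_hash(data) == target:
            return timestamp
    return None
```

The stored data per video is then one short hash per quality setting, so the database stays small. The catch is the one already named: the first frame has to be in the buffer, and the bytes have to match exactly.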