The problem with (good) hashes is that when you change the input even slightly (for example, because a different compression algorithm is used), the hash changes drastically.
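To illustrate the point, here is a minimal sketch using SHA-256 from Python's standard `hashlib`. The helper name `differing_bits` and the sample bytes are made up for the example; the input stands in for a chunk of encoded video, and flipping a single bit stands in for a re-encode.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Stand-in for a chunk of encoded video (arbitrary sample bytes).
original = bytes(range(256)) * 4

# Flip a single bit, as a stand-in for the tiniest possible re-encoding change.
altered = bytearray(original)
altered[0] ^= 1
altered = bytes(altered)

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(altered).digest()

# For a good hash, roughly half of the 256 output bits are expected to change.
print(differing_bits(d1, d2), "of 256 digest bits differ")
```

So an exact-match hash would only work if the bytes being hashed are identical on both sides.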
Yes, that’s why I’m proposing it, as opposed to using just one pixel to differentiate between ad and video. YouTube videos are already split into sections, so you could just add some metadata with a hash to each one.
I think that downsizing the scene to something like 8x8 pixels (so basically taking the average color of multiple sections of the scene) would mostly work. To go undetected, the ad would have to match (or at least be close to) the average color of each section, which would be difficult in my opinion: you would need to alter each ad for each video timestamp individually.
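A rough sketch of that downsizing idea, in plain Python so it runs without extra libraries. Everything here is an assumption for illustration: the function names, the frame format (a list of rows of `(r, g, b)` tuples), the mean-absolute-difference comparison, and the tolerance of 8.0 are all arbitrary choices, not anything an actual player does.

```python
def block_averages(frame, grid=8):
    """Downsize a frame to grid x grid by averaging the RGB values of each block.

    Assumes the frame is at least grid pixels wide and high.
    """
    height, width = len(frame), len(frame[0])
    signature = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            count = (y1 - y0) * (x1 - x0)
            sums = [0, 0, 0]
            for y in range(y0, y1):
                for x in range(x0, x1):
                    for c in range(3):
                        sums[c] += frame[y][x][c]
            signature.append(tuple(s / count for s in sums))
    return signature

def distance(sig_a, sig_b):
    """Mean absolute per-channel difference between two signatures."""
    total = sum(abs(a - b) for pa, pb in zip(sig_a, sig_b) for a, b in zip(pa, pb))
    return total / (3 * len(sig_a))

def matches(sig_a, sig_b, tolerance=8.0):
    """True if the signatures are close; the tolerance is an arbitrary guess."""
    return distance(sig_a, sig_b) <= tolerance

# Synthetic 64x64 frames: a gradient "video" frame, a slightly perturbed copy
# (standing in for a re-encode), and a solid-color "ad" frame.
video = [[(x * 4, y * 4, 128) for x in range(64)] for y in range(64)]
reencoded = [
    [tuple(min(255, max(0, c + (x + y) % 3 - 1)) for c in px) for x, px in enumerate(row)]
    for y, row in enumerate(video)
]
ad = [[(200, 30, 30)] * 64 for _ in range(64)]

print(matches(block_averages(video), block_averages(reencoded)))
print(matches(block_averages(video), block_averages(ad)))
```

Unlike an exact hash, small pixel-level changes barely move the block averages, while a different scene has to match all 64 blocks at once to slip through.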
Yes, that could be an alternative to computing hashes. I don’t know which option would be less resource-intensive.