That is prone to error; a single pixel can be too small a sample. I would prefer something with hashes, such as a sha1sum of the current frame every 5 seconds. It can be computed while the video is buffering, and then you wait until the ad is over to splice the correct region.
The problem with (good) hashes is that when you change the input even slightly (for example, because a different compression algorithm is used), the hash changes drastically.
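To illustrate the point, here is a minimal sketch in Python using only the standard `hashlib` module. The `frame` bytes are a made-up stand-in for raw frame data, not anything YouTube actually serves, and `differing_bits` is a helper invented for this example. It flips a single bit in the input and compares the two SHA-1 digests.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a 40-character hex string."""
    return hashlib.sha1(data).hexdigest()


def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count how many of the 160 digest bits differ between two hex digests."""
    return bin(int(a_hex, 16) ^ int(b_hex, 16)).count("1")


# Hypothetical stand-in for the raw bytes of one frame.
frame = bytes(range(256)) * 4

# Same frame with a single bit flipped, as a slightly different
# encoding of the same picture might produce.
tweaked = bytearray(frame)
tweaked[0] ^= 1

h1 = sha1_hex(frame)
h2 = sha1_hex(bytes(tweaked))

print(h1)
print(h2)
print(differing_bits(h1, h2), "of 160 bits differ")
```

With a cryptographic hash you would expect roughly half of the bits to differ, so the two digests tell you nothing about how similar the frames are.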
Yes, that’s why I’m proposing it as opposed to just one pixel to differentiate between ad and video. YouTube videos are already separated into sections; just add some metadata with a hash to each one.
I think that downsizing the scene to something like 8x8 pixels (so basically taking the average color of multiple sections of the scene) would mostly work. To go undetected, the ad would have to match (or at least be close to) the average color of each section, which would be difficult in my opinion: you would need to alter each ad for each video timestamp individually.
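A rough sketch of that idea in plain Python, assuming a frame is already decoded into a grid of grayscale values from 0 to 255. The names `downsize` and `looks_same` and the tolerance of 10 are made up for illustration; a real version would work per color channel and need a tuned threshold.

```python
from typing import List

Frame = List[List[int]]  # rows of grayscale pixel values, 0-255 (assumed format)


def downsize(frame: Frame, size: int = 8) -> List[List[float]]:
    """Shrink a frame to size x size by averaging each rectangular section.

    Assumes the frame is at least size pixels wide and tall.
    """
    height, width = len(frame), len(frame[0])
    result = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def looks_same(a: Frame, b: Frame, tolerance: float = 10.0, size: int = 8) -> bool:
    """True if every section average of a is within tolerance of b's."""
    small_a, small_b = downsize(a, size), downsize(b, size)
    return all(
        abs(pa - pb) <= tolerance
        for row_a, row_b in zip(small_a, small_b)
        for pa, pb in zip(row_a, row_b)
    )


# Hypothetical 64x64 frames: a gradient, a slightly noisy copy, and a flat "ad".
video = [[x * 4 for x in range(64)] for _ in range(64)]
noisy = [[min(255, p + 2) if (x + y) % 2 == 0 else p
          for x, p in enumerate(row)] for y, row in enumerate(video)]
ad = [[128] * 64 for _ in range(64)]

print(looks_same(video, noisy))  # small per-pixel changes survive the averaging
print(looks_same(video, ad))     # a different picture does not match
```

Unlike sha1sum, small per-pixel differences mostly cancel out in the averages, so a re-encoded frame can still match while a different picture does not.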
Yes, that could be an alternative to computing hashes. I don’t know which option would be less resource-intensive.