I've made some progress, and here is what I have so far. Thank you all for your advice and tips, they really helped me!
Desktop graphics sonification using ManyCam:
The patch uses ManyCam (or similar software; thanks again to @Johnny Mauser for the tip above!) to feed the desktop into GEM. The mouse position is extracted and used to sonify the pixel under the mouse pointer.
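For anyone who wants to reproduce the idea outside Pd, here is a minimal Python sketch of the mapping. It assumes the captured desktop frame is a 2D list of 8-bit (r, g, b) tuples and that the mouse position arrives in window pixels. The frequencies, the function names and the `flip_y` option are all hypothetical and are not taken from the patch. Whether a vertical flip is needed depends on where the window and the frame put their origin. That may or may not be related to the Y-position problem listed below.

```python
import math

# Hypothetical oscillator frequencies (Hz) for the R, G and B voices.
FREQS = (220.0, 330.0, 440.0)

def mouse_to_pixel(mx, my, win_w, win_h, img_w, img_h, flip_y=False):
    """Map a mouse position in window pixels to (column, row) in the frame."""
    col = int(mx * img_w / win_w)
    row = int(my * img_h / win_h)
    # Clamp so a pointer on the window edge still hits a valid pixel.
    col = min(max(col, 0), img_w - 1)
    row = min(max(row, 0), img_h - 1)
    if flip_y:
        # Assumption: use this when the frame counts rows from the bottom
        # while the mouse Y is measured from the top (or vice versa).
        row = img_h - 1 - row
    return col, row

def rgb_to_amplitudes(rgb):
    """Scale 8-bit channel values to amplitudes in [0, 1]."""
    return tuple(c / 255.0 for c in rgb)

def synth_sample(t, amps, freqs=FREQS):
    """One output sample at time t (seconds): one sine per color channel."""
    return sum(a * math.sin(2 * math.pi * f * t) for a, f in zip(amps, freqs))

# Usage: a 2x2 frame, pointer in the lower-right quarter of a 200x200 window.
frame = [[(0, 0, 0), (255, 0, 0)],
         [(0, 255, 0), (255, 255, 255)]]
col, row = mouse_to_pixel(150, 150, 200, 200, 2, 2)
amps = rgb_to_amplitudes(frame[row][col])  # white pixel -> (1.0, 1.0, 1.0)
```

With a white pixel, all three amplitudes are 1.0, so the summed signal can go well above the 1.0 full-scale level. That is the same clipping problem described in the list below.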
The patch still needs a lot of work, especially tuning the waveforms for the individual colors (R, G and B). A few other things, such as output clipping, also need to be fixed. I have to move on with my research project right now, but I'll post an updated version as soon as it's ready. Known issues so far:
The Y position is not mapped correctly to the GEM window (X is fine). I still need to find out whether this is a gemmouse issue or something else.
Sound clipping: not much time to fix this right now.
Lots of other small things.
White pixels are too loud and clip the output, because all three color synths play at maximum amplitude. This probably needs to be adjusted or limited, maybe with an expression that reduces the output when all amplitudes are at maximum.
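Here is a minimal sketch of the limiting idea from the last item. It is written in Python rather than Pd and assumes three sine voices whose amplitudes come from the R, G and B values scaled to [0, 1]. The function names and frequencies are hypothetical. Because the absolute value of the sine mix can never exceed the sum of the amplitudes, dividing by that sum whenever it passes 1.0 keeps the output within full scale. In the patch itself, the same formula could presumably be computed in an expression object that drives a gain stage.

```python
import math

def limiter_gain(amps):
    """Gain that scales the mix down only when the channel amplitudes sum past 1.0."""
    total = sum(amps)
    # Dark or quiet pixels (total <= 1.0, including all-zero) are left unchanged.
    return 1.0 / total if total > 1.0 else 1.0

def mixed_sample(t, amps, freqs=(220.0, 330.0, 440.0)):
    """Sum one sine per color channel, apply the gain, then hard-clip as a safety net."""
    gain = limiter_gain(amps)
    s = gain * sum(a * math.sin(2 * math.pi * f * t) for a, f in zip(amps, freqs))
    # Given the gain above this should not trigger; it only guards against rounding.
    return max(-1.0, min(1.0, s))

# Usage: one second of a white pixel at 44.1 kHz stays within [-1, 1].
white = (1.0, 1.0, 1.0)
peak = max(abs(mixed_sample(n / 44100, white)) for n in range(44100))
```

Dividing by the sum also means a white pixel is no louder than a saturated red one. A fixed gain of 1/3 on every voice is a simpler alternative if brightness should still be heard as loudness.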