I don't think there's a turnkey, high-performance solution available for this (yet).
It's certainly doable, though. You'd need some code that continuously captures what is displayed on screen, reads the gaze coordinates from the Eye Tribe API, and draws them as an overlay on the game.
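To give a rough idea of the gaze-overlay part, here is a minimal Python sketch. It assumes the tracker delivers each gaze sample as a JSON message with the smoothed on-screen coordinates under values.frame.avg; that layout is my recollection of the Eye Tribe protocol, so check it against the API docs. The function names parse_gaze and to_frame_coords are made up for illustration, and the screen capture and actual drawing are left out.

```python
import json


def parse_gaze(message):
    """Return the smoothed (x, y) gaze point in screen pixels, or None.

    Assumes a JSON message shaped like {"values": {"frame": {"avg": {...}}}};
    this layout is an assumption, not a verified description of the API.
    """
    data = json.loads(message)
    frame = data.get("values", {}).get("frame")
    if not frame or "avg" not in frame:
        return None
    avg = frame["avg"]
    return float(avg["x"]), float(avg["y"])


def to_frame_coords(gaze, screen_size, frame_size):
    """Scale a screen-space gaze point into a (possibly downscaled) captured frame."""
    x = int(round(gaze[0] * frame_size[0] / screen_size[0]))
    y = int(round(gaze[1] * frame_size[1] / screen_size[1]))
    # Clamp so the marker always lands inside the captured frame.
    x = min(max(x, 0), frame_size[0] - 1)
    y = min(max(y, 0), frame_size[1] - 1)
    return x, y


# Hypothetical sample message: gaze at the centre of a 1080p screen.
sample = '{"category": "tracker", "statuscode": 200, "values": {"frame": {"avg": {"x": 960.0, "y": 540.0}}}}'
gaze = parse_gaze(sample)
marker = to_frame_coords(gaze, (1920, 1080), (1280, 720))
print(marker)
```

The returned point is where you would draw the gaze marker on each captured frame before sending the frame on to the stream.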
Fairly straightforward for a professional coder to do if you can settle for low resolution and/or a low frame rate (it gets tricky above 20 fps at 1080p). I guess in this case you'd want to stream the video feed close to real time as well. Maybe that's possible with the Twitch API and a semi-transparent overlay?
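As a back-of-the-envelope check on why resolution and frame rate matter, this small Python sketch computes the uncompressed data rate of a screen capture. It assumes 4 bytes per pixel (for example BGRA) and no compression; raw_rate_mb_per_s is just an illustrative helper, and real capture pipelines will differ.

```python
def raw_rate_mb_per_s(width, height, fps, bytes_per_pixel=4):
    """Uncompressed capture data rate in megabytes (10^6 bytes) per second."""
    return width * height * bytes_per_pixel * fps / 1_000_000


# Full HD vs. 720p, both at 20 frames per second.
full_hd = raw_rate_mb_per_s(1920, 1080, 20)
hd_720 = raw_rate_mb_per_s(1280, 720, 20)
print(full_hd, hd_720)
```

Under those assumptions, 1080p at 20 fps is roughly 166 MB/s of raw pixels to grab, composite, and encode, while 720p at the same frame rate is under half of that.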