X11 has an extension called XComposite, which is often used by compositors, but its features can also be used in regular programs to get direct access to the OpenGL texture associated with an X11 window. This is done using the XCompositeRedirectWindow, XCompositeNameWindowPixmap, glXCreatePixmap and glXBindTexImageEXT functions. See window-texture for a code example or for simple functions to use in your program.

I’ve used this feature to make the fastest fully GPU-accelerated screen (window) recorder on Linux. It is similar to Nvidia ShadowPlay in performance and, unlike ShadowPlay, it’s a userspace program and it can be changed to work with AMD and Intel as well.

A common question I get is how this screen recorder is different from using OBS Studio or FFmpeg with NVENC. I looked through the source of both OBS and FFmpeg, and they both use the XComposite functions in the same way I do, but they copy the pixels from the OpenGL texture to the CPU and then send the pixel data back to the GPU for encoding. So the data goes from GPU -> CPU -> GPU. Copying the pixel data to the CPU is unnecessary and can be very slow on some hardware. To keep the copy on the GPU, so that the OpenGL texture data never passes through the CPU, you have to use CUDA (on Nvidia). This is done using the CUDA functions cuGraphicsGLRegisterImage and cuMemcpy2D (among others).

The difference in performance is huge. I tested OBS against my screen recorder in CEMU, playing The Legend of Zelda: Breath of the Wild at 4k resolution with the frame rate locked to 30 fps. When using OBS, my fps dropped from 30 to 7, while with my screen recorder the fps remained at 30. Here is an example recording made with my screen recorder, recording at 4k 30 fps:

Hardware used to record the video:

  • Intel i5 4690k
  • Nvidia Geforce GTX 1080
  • 16 GB RAM

Other projects of mine that use these XComposite functions are a VR video player (for stereoscopic and regular video) and a VR window manager.