Accessing X11 window texture without copying
X11 has an extension called XComposite, which is often used by compositors,
but its features can also be used in regular programs to get direct access
to the OpenGL texture associated with an X11 window.
This is done using the XCompositeRedirectWindow, XCompositeNameWindowPixmap, glXCreatePixmap and glXBindTexImageEXT functions.
See window-texture for a code example
or for simple functions to use in your program.
I’ve used this feature to make the fastest fully GPU accelerated screen (window) recorder on Linux. It is similar to Nvidia ShadowPlay in performance and, unlike ShadowPlay, it’s a userspace program and it can be changed to work with AMD and Intel as well.
A common question I get is how this screen recorder is different from using OBS Studio
or FFmpeg with NVENC. I looked through the source of both OBS and FFmpeg, and they both
use the XComposite functions in the same way I do, but they copy the pixels from the OpenGL texture
to the CPU and then send the pixel data back to the GPU for encoding. So the data goes from GPU -> CPU -> GPU.
Copying the pixel data to the CPU is unnecessary and can be very slow on some hardware.
To copy the OpenGL texture data from one place in GPU memory to another, without going through the CPU, you have to use CUDA (on Nvidia). This is done using the CUDA functions cuGraphicsGLRegisterImage and cuMemcpy2D (among others).
Note: using cuGraphicsGLRegisterImage within an EGL context is broken if the environment variable __GL_THREADED_OPTIMIZATIONS is set to 1.
A workaround for this issue is to programmatically set __GL_THREADED_OPTIMIZATIONS to 0 in the program with setenv.
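The workaround can be sketched in a few lines of C. The function name `disable_gl_threaded_optimizations` is made up for this example. Calling it at the very start of the program, before any OpenGL or EGL context is created, is an assumption on my part, based on the idea that the driver reads the variable when it initializes.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper for this sketch: overrides __GL_THREADED_OPTIMIZATIONS
   for the current process. Returns 1 on success, 0 on failure. */
static int disable_gl_threaded_optimizations(void) {
    /* The last argument (1) overwrites any value inherited from the environment. */
    return setenv("__GL_THREADED_OPTIMIZATIONS", "0", 1) == 0;
}
```

Because `setenv` only changes the environment of the calling process, the user's shell and other programs are not affected.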
The difference in performance is huge. I tested OBS against my screen recorder in CEMU, playing The Legend of Zelda: Breath of the Wild at 4k resolution with the frame rate locked to 30 fps. When using OBS, my fps dropped from 30 to 7, while with my screen recorder the fps remained at 30. Here is an example recording made with my screen recorder at 4k 30 fps:
Hardware used to record the video:
- Intel i5 4690k
- Nvidia Geforce GTX 1080
- 16 GB RAM
Other projects of mine that use these XComposite functions are a VR video player (for stereoscopic and regular video) and a VR window manager.