Passing a list of values to fragment shader

I want to send a list of values into a fragment shader. It is a possibly large (couple of thousand items long) list of single precision floats. The fragment shader needs random access to this list and I want to refresh the values from the CPU on each frame.

I'm considering my options on how this could be done:

  1. As a uniform variable of array type ("uniform float x[10];"). But there seem to be limits here: on my GPU, sending more than a few hundred values is very slow, and I'd also have to hard-code the upper limit in the shader, when I'd rather change it at runtime.

  2. As a texture with height 1 and width of my list, then refresh the data using glCopyTexSubImage2D.

  3. Other methods? I haven't kept up with all the changes in the GL-specification lately, perhaps there is some other method that is specifically designed for this purpose?


One way would be to use uniform arrays like you mention. Another way is to use a 1D "texture". Look for GL_TEXTURE_1D and glTexImage1D. I personally prefer this way, as you don't need to hard-code the size of the array in the shader code (as you mentioned), and OpenGL already has built-in functions for uploading/accessing 1D data on the GPU.
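A minimal sketch of that setup (variable names are mine; this assumes an existing GL context and a float array `values` of length `count`). glTexSubImage1D is the cheap way to refresh the data each frame without reallocating:

```c
/* One-time setup: a 1D float texture holding `count` values. */
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_1D, tex);
glTexImage1D(GL_TEXTURE_1D, 0, GL_R32F, count, 0, GL_RED, GL_FLOAT, values);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

/* Per frame: re-upload the values into the existing storage. */
glBindTexture(GL_TEXTURE_1D, tex);
glTexSubImage1D(GL_TEXTURE_1D, 0, 0, count, GL_RED, GL_FLOAT, values);
```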

I'd say probably not number 1: you have a limited number of registers for shader uniforms, which varies by card. You can query GL_MAX_FRAGMENT_UNIFORM_COMPONENTS to find out your limit. On newer cards it runs into the thousands; a Quadro FX 5500 apparently has 2048 (http://www.nvnews.net/vbulletin/showthread.php?t=85925). It depends on what hardware you want it to run on, and on what other uniforms you might want to send to the shader too.

Number 2 could be made to work, depending on your requirements. Sorry for the vagueness here; hopefully someone else can give you a more precise answer, but on older shader-model cards you must be explicit about how many texture reads you make. It also depends on how many texture reads you want to do per fragment: you probably wouldn't want to read thousands of elements per fragment, again depending on your shader model and performance requirements. You could pack values into the RGBA channels of a texture, giving you four values per texture fetch, but with random access as a requirement this might not help you.

I'm not sure about number 3, but I'd suggest looking at UAVs (unordered access views), although I think those are DirectX-only, with no decent OpenGL equivalent. I believe there's an NVIDIA extension for OpenGL, but then you restrict yourself to a pretty strict minimum spec.

It's unlikely that passing thousands of items of data to your fragment shader is the best solution to your problem. Perhaps if you gave more details on what you're trying to achieve, you would get alternative suggestions?

This sounds like a nice use case for texture buffer objects. These don't have much to do with regular textures; they basically let you access a buffer object's memory in a shader as a simple linear array. They are similar to 1D textures, but are not filtered and are accessed only by an integer index, which sounds like what you want, given that you call it a list of values. They also support much larger sizes than 1D textures. For updating, you can then use the standard buffer object methods (glBufferData, glMapBuffer, ...).

But on the other hand, they require GL3/DX10 hardware and were made core in OpenGL 3.1. If your hardware/driver doesn't support them, your 2nd solution would be the method of choice, though rather with a 1D texture than a width x 1 2D texture. In that case you can also use a non-flat 2D texture and some index magic to support lists larger than the maximum texture size.

But texture buffers are the perfect match for your problem, I think. For more exact insight you might also look into the corresponding extension specification.

EDIT: In response to Nicol's comment about uniform buffer objects, you can also look here for a little comparison of the two. I still tend toward TBOs, but cannot really reason why, only because I see them as a better fit conceptually. But maybe Nicol can provide an answer with some more insight into the matter.

There are currently 4 ways to do this: standard 1D textures, buffer textures, uniform buffers, and shader storage buffers.

1D Textures

With this method, you use glTex(Sub)Image1D to fill a 1D texture with your data. Since your data is just an array of floats, your image format should be GL_R32F. You then access it in the shader with a simple texelFetch call. texelFetch takes texel coordinates (hence the name), and it shuts off all filtering. So you get exactly one texel.

Note: texelFetch requires GLSL 1.30 (OpenGL 3.0) or later. If you want to target prior GL versions, you will need to pass the size to the shader and normalize the texture coordinate manually.
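In GLSL, the two variants might look like this (sampler and uniform names are mine):

```glsl
uniform sampler1D myDataArray;  // GL_R32F texture, filtering set to GL_NEAREST
uniform int listSize;           // only needed for the pre-3.0 fallback

float fetchValue(int i)
{
    return texelFetch(myDataArray, i, 0).r;  // GLSL 1.30+: exact texel, no filtering
    // Pre-3.0 fallback: normalize the coordinate yourself:
    // return texture1D(myDataArray, (float(i) + 0.5) / float(listSize)).r;
}
```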

The main advantages here are compatibility and compactness. This will work on GL 2.1 hardware (via the corresponding extensions). And you don't have to use GL_R32F formats; you could use GL_R16F half-floats, or GL_R8 if your data is reasonable for a normalized byte. Size can mean a lot for overall performance.

The main disadvantage is the size limitation. You are limited to having a 1D texture of the max texture size. On GL 3.x-class hardware, this will be around 8,192, but is guaranteed to be no less than 4,096.

Uniform Buffer Objects

The way this works is that you declare a uniform block in your shader:

layout(std140) uniform MyBlock
{
    float myDataArray[size];
};

You then access that data in the shader just like an array.

Back in C/C++/etc code, you create a buffer object and fill it with floating-point data. Then, you can associate that buffer object with the MyBlock uniform block. More details can be found here.

The principal advantages of this technique are speed and semantics. Speed is due to how implementations treat uniform buffers compared to textures. Texture fetches are global memory accesses. Uniform buffer accesses generally are not; the uniform buffer data is usually loaded into the shader when the shader is initialized upon its use in rendering. From there, it is a local access, which is much faster.

Semantically, this is better because it isn't just a flat array. For your specific needs, if all you need is a float[], that doesn't matter. But if you have a more complex data structure, the semantics can be important. For example, consider an array of lights. Lights have a position and a color. If you use a texture, your code to get the position and color for a particular light looks like this:

vec4 position = texelFetch(myDataArray, 2*index);
vec4 color = texelFetch(myDataArray, 2*index + 1);

With uniform buffers, it looks just like any other uniform access. You have named members that can be called position and color. So all the semantic information is there; it's easier to understand what's going on.
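For instance, the same light data as a uniform block might be declared like this (the names and the MAX_LIGHTS constant are illustrative):

```glsl
struct Light
{
    vec4 position;
    vec4 color;
};

layout(std140) uniform LightBlock
{
    Light lights[MAX_LIGHTS];
};

// later in the shader: lights[index].position, lights[index].color
```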

There are size limitations for this as well. OpenGL requires that implementations provide at least 16,384 bytes for the maximum size of uniform blocks. Which means, for float arrays, you get only 4,096 elements. Note again that this is the minimum required from implementations; some hardware can offer much larger buffers. AMD provides 65,536 on their DX10-class hardware, for example.

Buffer Textures

These are kind of a "super 1D texture". They effectively allow you to access a buffer object from a texture unit. Though they are one-dimensional, they are not 1D textures.

You can only use them from GL 3.0 or above. And you can only access them via the texelFetch function.

The main advantage here is size. Buffer textures can generally be pretty gigantic. While the spec is generally conservative, mandating at least 65,536 bytes for buffer textures, most GL implementations allow them to range in the megabytes in size. Indeed, usually the maximum size is limited by the GPU memory available, not hardware limits.

Also, buffer textures are stored in buffer objects, not the more opaque texture objects like 1D textures. This means you can use some buffer object streaming techniques to update them.
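A sketch of what that looks like (names and binding are mine; assumes a GL 3.x context and a float array `values` of length `count`):

```c
/* C side: the data lives in an ordinary buffer object... */
GLuint buf, tex;
glGenBuffers(1, &buf);
glBindBuffer(GL_TEXTURE_BUFFER, buf);
glBufferData(GL_TEXTURE_BUFFER, count * sizeof(float), values, GL_DYNAMIC_DRAW);

/* ...which a buffer texture exposes to the shader as GL_R32F texels. */
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_BUFFER, tex);
glTexBuffer(GL_TEXTURE_BUFFER, GL_R32F, buf);
```

```glsl
// GLSL side: no lod argument, no filtering, just an indexed fetch.
uniform samplerBuffer myDataArray;
float value = texelFetch(myDataArray, index).r;
```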

The main disadvantage here is performance, just like with 1D textures. Buffer textures probably won't be any slower than 1D textures, but they won't be as fast as UBOs either. If you're just pulling one float from them, it shouldn't be a concern. But if you're pulling lots of data from them, consider using a UBO instead.

Shader Storage Buffer Objects

OpenGL 4.3 provides another way to handle this: shader storage buffers. They're a lot like uniform buffers; you specify them using syntax almost identical to that of uniform blocks. The principal difference is that you can write to them. Obviously that's not useful for your needs, but there are other differences.

Shader storage buffers are, conceptually speaking, an alternate form of buffer texture. Thus, the size limits for shader storage buffers are a lot larger than for uniform buffers. The OpenGL minimum for the max UBO size is 16KB. The OpenGL minimum for the max SSBO size is 16MB. So if you have the hardware, they're an interesting alternative to UBOs.

Just be sure to declare them as readonly, since you're not writing to them.
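The declaration might look like this (block and variable names are mine). Note that SSBOs can use std430, which, unlike std140, packs a float array tightly, and that the array may be left unsized:

```glsl
layout(std430, binding = 0) readonly buffer MyDataBlock
{
    float myDataArray[];   // size comes from the buffer bound at binding point 0
};
```

On the C side you'd then bind the buffer with glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, buf).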

The potential disadvantage here is performance again, relative to UBOs. SSBOs work like an image load/store operation through buffer textures. Basically, it's (very nice) syntactic sugar around an imageBuffer image type. As such, reads from these will likely perform at the speed of reads from a readonly imageBuffer.

Whether reading via image load/store through buffer images is faster or slower than buffer textures is unclear at this point.

Another potential issue is that you must abide by the rules for incoherent memory access. These are complex and can very easily trip you up.