An Introduction to shader derivative functions (translation)

Partial derivative functions (ddx and ddy in HLSL[a], dFdx and dFdy in GLSL[b]; in the rest of this article I will use either pair of names depending on the language of the code example) are fragment shader instructions which can be used to compute the rate of variation of any value with respect to the screen-space coordinates.

Derivatives computation

During triangle rasterization, GPUs run many instances of a fragment shader at a time, organizing them in blocks of 2×2 pixels. Derivatives are calculated by taking differences between the pixel values in a block: dFdx subtracts the values of the pixels on the left side of the block from the values on the right side, and dFdy subtracts the values of the bottom pixels from the top ones. See the image below, where the grid represents the rendered screen pixels and the dFdx and dFdy expressions are given for a generic value p evaluated by the fragment shader instance at screen coordinates (x, y), which belongs to the 2×2 block highlighted in red.

Derivatives can be evaluated for every variable in a fragment shader. For vector and matrix types, derivatives are computed element-wise.
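The block-wise differences described above can be mimicked on the CPU. The following is only a minimal Python sketch, not how any particular GPU implements it: the function name `quad_derivatives` is hypothetical, blocks are assumed to start at even pixel coordinates, and the row index is assumed to grow upward as in OpenGL window coordinates.

```python
def quad_derivatives(values):
    """Return per-pixel (dFdx, dFdy) grids computed from 2x2 block differences.

    values[y][x] is the quantity evaluated by the shader at pixel (x, y).
    Assumptions: width and height are even, blocks start at even
    coordinates, and y grows upward.
    """
    height, width = len(values), len(values[0])
    dfdx = [[0.0] * width for _ in range(height)]
    dfdy = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            x0 = x - (x % 2)  # left column of the block containing (x, y)
            y0 = y - (y % 2)  # bottom row of the block containing (x, y)
            # right minus left, taken on the pixel's own row
            dfdx[y][x] = values[y][x0 + 1] - values[y][x0]
            # top minus bottom, taken on the pixel's own column
            dfdy[y][x] = values[y0 + 1][x] - values[y0][x]
    return dfdx, dfdy


# Example: p(x, y) = x*x + 10*y on a 4x4 grid
grid = [[x * x + 10 * y for x in range(4)] for y in range(4)]
ddx_grid, ddy_grid = quad_derivatives(grid)
```

Both pixels of a block on the same row share one horizontal difference, so the result is a per-block approximation of the true derivative of p rather than a per-pixel one.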

Derivative functions are fundamental to the implementation of texture mipmaps and are very useful in a number of algorithms and effects, in particular when there is some kind of dependence on screen-space coordinates (for example, when rendering wireframe edges with a uniform thickness in screen pixels).

Derivatives and mipmaps

Mipmaps are pre-computed sequences of images obtained by filtering a texture down into smaller sizes (each mipmap level has half the width and half the height of the previous one). They are used to avoid aliasing artifacts when minifying a texture.

Mipmapping is also important for texture cache coherence, since it enforces a near-one texel to pixel ratio: when traversing a triangle, each new pixel represents a step in texture space of one texel at most. Mipmapping is one of the few cases in rendering where a technique improves both visuals and performance.

Derivatives are used during texture sampling to select the best mipmap level. The rate of variation of the texture coordinates with respect to the screen coordinates is used to choose a mipmap: the larger the derivatives, the higher the mipmap level (and the smaller the mipmap image).
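To make the link between derivatives and mipmap level concrete, here is a rough Python sketch of an isotropic level-of-detail computation. It follows the general idea of taking the log2 of the largest texel-space step per pixel. The function name `mip_level` and its parameters are hypothetical, and real hardware is free to use different approximations, biases and anisotropic filtering.

```python
import math


def mip_level(duv_dx, duv_dy, texture_size, num_levels):
    """Estimate a mipmap level from screen-space derivatives of the UVs.

    duv_dx, duv_dy: (du, dv) change of the texture coordinates per pixel
                    along screen x and screen y.
    texture_size:   (width, height) of mipmap level 0 in texels.
    num_levels:     number of mipmap levels available.
    """
    width, height = texture_size
    # Convert the UV derivatives to texel units
    step_x = math.hypot(duv_dx[0] * width, duv_dx[1] * height)
    step_y = math.hypot(duv_dy[0] * width, duv_dy[1] * height)
    rho = max(step_x, step_y)  # largest texel step per screen pixel
    if rho <= 1.0:
        return 0.0  # magnification: the base level is enough
    return min(math.log2(rho), float(num_levels - 1))


# A 256x256 texture where moving one pixel moves 4 texels in texture space
level = mip_level((1 / 64, 0.0), (0.0, 1 / 64), (256, 256), 9)
```

With a step of 4 texels per pixel the sketch picks level 2, whose texels are 4 times larger than those of level 0, which brings the texel-to-pixel ratio back to roughly one.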

Derivatives can be used to compute the current triangle’s face normal in a fragment shader. The horizontal and vertical derivatives of the current fragment’s world position are two vectors lying on the triangle’s surface. Their cross product is a vector orthogonal to the surface, and normalizing it gives the triangle’s normal vector (see the 3D model below). Particular attention must be paid to the order of the operands in the cross product: since the OpenGL coordinate system is left-handed (at least in window space, which is the context where the fragment shader works), and since the horizontal derivative vector is always oriented to the right and the vertical one down, the order that yields a normal vector oriented toward the camera is horizontal × vertical (more about cross products and basis orientations in this article). The interactive model below shows the link between screen pixels and fragments over a triangle surface being rasterized, the derivative vectors on the surface (in red and green), and the normal vector (in blue) obtained from their cross product.

Here is a GLSL code line to compute a flat normal given the fragment position pos in camera space:

normalize( cross(dFdx(pos), dFdy(pos)) );
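The same computation can be checked by hand outside a shader. The Python sketch below takes the two derivative vectors as plain tuples (the helper names `cross`, `normalize` and `flat_normal` are mine, not part of any API) and assumes they are already expressed in camera space, where the camera looks down the negative z axis.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / length, v[1] / length, v[2] / length)


def flat_normal(dpos_dx, dpos_dy):
    """Face normal from the horizontal and vertical position derivatives."""
    return normalize(cross(dpos_dx, dpos_dy))


# A surface parallel to the screen: the position changes only along x and y
facing = flat_normal((0.5, 0.0, 0.0), (0.0, 0.5, 0.0))

# A surface tilted around the y axis: moving right also moves away from the camera
tilted = flat_normal((1.0, 0.0, -1.0), (0.0, 1.0, 0.0))
```

Swapping the two arguments flips the sign of the result, which is why the order of the operands matters.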


Below is a complete pocket.gl demo with a vertex and a fragment shader at work on a Utah teapot. You can toggle the flat shader using the Flat shaded checkbox.

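// position, normal and the matrices are assumed to be provided by the host (pocket.gl)
varying vec3 normalInterp;  // interpolated vertex normal
varying vec3 pos;           // vertex position in camera space

void main(){
    gl_Position = projectionMatrix * modelViewMatrix * vec4(position, 1.0);
    vec4 pos4 = modelViewMatrix * vec4(position, 1.0);

    normalInterp = normalMatrix * normal;
    pos = vec3(pos4) / pos4.w;
}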


// Note: in WebGL 1, dFdx/dFdy require the OES_standard_derivatives extension
precision mediump float;

varying vec3 pos;
varying vec3 normalInterp;

uniform float bFlat;  // 1.0 when the Flat shaded checkbox is on, 0.0 otherwise

const vec3 lightPos     = vec3(200,60,100);
const vec3 ambientColor = vec3(0.2, 0.0, 0.0);
const vec3 diffuseColor = vec3(0.5, 0.0, 0.0);
const vec3 specColor    = vec3(1.0, 1.0, 1.0);

void main() {
    // Blend between the interpolated normal and the per-face normal from derivatives
    vec3 normal = mix(normalize(normalInterp),
        normalize(cross(dFdx(pos), dFdy(pos))), bFlat);
    vec3 lightDir = normalize(lightPos - pos);

    float lambertian = max(dot(lightDir,normal), 0.0);
    float specular = 0.0;

    if(lambertian > 0.0) {
        // Blinn-Phong specular term; the camera sits at the origin of camera space
        vec3 viewDir = normalize(-pos);
        vec3 halfDir = normalize(lightDir + viewDir);
        float specAngle = max(dot(halfDir, normal), 0.0);
        specular = pow(specAngle, 16.0);
    }

    gl_FragColor = vec4(ambientColor +
        lambertian * diffuseColor + specular * specColor, 1.0);
}


Derivatives and branches

Derivative computation relies on the GPU hardware executing multiple instances of a shader in parallel. Scalar operations are executed with a SIMD (Single Instruction Multiple Data) architecture on registers containing a vector of 4 values for a block of 2×2 pixels. This means that at every step of execution the shader instances belonging to each 2×2 block are synchronized, which makes derivative computation fast and easy to implement in hardware: it is a simple subtraction of values contained in the same register.

But what happens in the case of a conditional branch? If not all of the threads in a core take the same branch, code execution diverges. The image below shows an example of divergence: a conditional branch executed in a GPU core with 8 shader instances. Three instances take the first branch (yellow). While the yellow branch executes, the other 5 instances are inactive (an execution bitmask is used to activate and deactivate execution). After the yellow branch, the execution mask is inverted and the blue branch is executed by the remaining 5 instances.
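The masked execution just described can be sketched in a few lines of Python. This is only an illustration of the idea, with hypothetical names (`run_divergent_branch`, `then_fn`, `else_fn`); real GPUs handle the mask in hardware and differ in many details.

```python
def run_divergent_branch(inputs, condition, then_fn, else_fn):
    """Simulate one core executing an if/else over several shader instances.

    Returns the per-lane results and the number of branch passes executed.
    """
    mask = [condition(v) for v in inputs]  # execution bitmask, one bit per lane
    results = list(inputs)

    # First pass: only the lanes whose mask bit is set execute the 'then' branch
    if any(mask):
        for lane, active in enumerate(mask):
            if active:
                results[lane] = then_fn(inputs[lane])

    # Second pass: the mask is inverted and the remaining lanes run the 'else' branch
    if not all(mask):
        for lane, active in enumerate(mask):
            if not active:
                results[lane] = else_fn(inputs[lane])

    passes = int(any(mask)) + int(not all(mask))
    return results, passes


# 8 shader instances, 3 of which take the first branch
lanes = [1, 9, 2, 8, 3, 0, 7, 1]
out, passes = run_divergent_branch(lanes, lambda v: v > 6,
                                   lambda v: v * 10, lambda v: -v)
```

When every lane agrees on the condition, only one pass runs. When they disagree, the core pays for both branches while part of its lanes sit idle.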

In addition to the performance loss caused by the branch, divergence breaks the synchronization between the pixels in a block, making derivative operations undefined. This is a problem for texture sampling, which needs derivatives for mipmap level selection, anisotropic filtering, etc. When facing such a problem, a shader compiler could flatten the branch (thus avoiding it) or try to rearrange the code by moving texture reads outside of the branch control flow. The problem can also be avoided by passing explicit derivatives or an explicit mipmap level when sampling a texture.

Below you can see an HLSL branching experiment written in UE4 using a custom expression node.

Here is the shader code I’m using in the previous example:

float tmp = 10000;
float3 color;

[branch]
if(xpos > side)
{
    tmp = xpos * xpos;
    float dx = ddx(tmp);
    color = float3(dx, 0, 0);
}
else
{
    tmp = xpos * xpos;
    float dx = ddx(tmp);
    color = float3(0, dx, 0);
}

return color * 100;


The purpose of this experiment is to see what happens when derivatives are used inside a divergent block. Suppose that the code above is executed on a GPU core. When only a subset of the pixels in a block enters the first branch, the value of tmp for the inactive pixels, which are waiting for the second branch to execute, should still be 10000. So the ddx function should produce a spike for some pixels in divergent blocks. Note the [branch] attribute before the if, which forces branching with control flow instructions.
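The expected spike can be reproduced with a small Python simulation of one horizontal pixel pair inside the first branch. This is a thought experiment under the assumption stated above (inactive lanes keep tmp at 10000), and the function name `ddx_inside_first_branch` is hypothetical. It does not describe what real hardware or a real compiler does with this code.

```python
def ddx_inside_first_branch(xpos_left, xpos_right, side):
    """Value ddx(tmp) would take in the first branch for one pixel pair.

    Assumption: lanes that do not enter the branch keep tmp at its
    initial value of 10000 while the active lanes update it.
    """
    tmp = [10000.0, 10000.0]  # one register slot per lane: left, right
    for lane, xpos in enumerate((xpos_left, xpos_right)):
        if xpos > side:  # only the active lanes execute the assignment
            tmp[lane] = xpos * xpos
    return tmp[1] - tmp[0]  # right minus left, as ddx does


# Both pixels take the first branch: a sensible derivative of xpos * xpos
coherent = ddx_inside_first_branch(4.0, 5.0, side=1.0)

# Only the right pixel takes the first branch: the stale 10000 leaks in
divergent = ddx_inside_first_branch(4.0, 5.0, side=4.5)
```

In the coherent case the difference is 25 - 16 = 9, while in the divergent case it is 25 - 10000, a value far outside the range of its neighbours.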

As you can see in the picture above, the compiler rejects that piece of code with the error “cannot have divergent gradient operations inside flow control”. When the [branch] attribute is removed, the code compiles fine, but no spikes are visible during rendering, meaning that the branch has been flattened.

Revealing the block alignment of derivatives

Here is a simple experiment that reveals the inner block alignment of shader derivatives. Look at the following pocket.gl sandbox.

uniform vec2 resolution;

uniform float odd_step;
uniform float show_derivative;

void main() {
    // center_x is at center x snapped to the nearest even position
    float center_x = floor(resolution.x / 4.0) * 2.0;

    // snap center_x to an odd number if odd_step is 1
    center_x += odd_step;

    // Step function is 0 when gl_FragCoord.x < center_x, 1 when gl_FragCoord.x > center_x
    float step = ceil(clamp((gl_FragCoord.x - center_x) / resolution.x, 0.0, 1.0));

    // The alpha variable is used to select one of two colors
    float alpha = show_derivative == 1.0 ? dFdx(step) : step;

    vec3 color = mix(vec3(0.96, 0.96, 0.68), vec3(0.68, 0.1, 0.1), alpha);

    gl_FragColor = vec4(color, 1.0);
}


The above shader implements a step function over the x axis. We want to compute its derivative. The derivative of a step function would be a Dirac delta function in the continuous domain, but in the shader’s discrete domain the delta function will be equal to 1 when the step jumps from 0 to 1, and 0 elsewhere. Select the Show Derivative checkbox and toggle the Step on odd pix checkbox to snap the Step position to an even (unchecked) or an odd (checked) pixel at the center of the viewport; you’ll see how dFdx(step) changes when moving the transition point from an even to an odd pixel.

Because the derivative computation is performed over blocks of 2×2 pixels, we should expect two different results depending on where the step transition occurs:

• Case 1. The step transition falls in the middle of a 2×2 block of pixels. In this case we’ll see a vertical line 2 pixels thick (the derivative is equal to 1 for every pixel in the 2×2 block, hence the 2-pixel thickness). This happens when the step falls on an odd pixel.
• Case 2. The step transition falls on the boundary between two neighbouring 2×2 blocks of pixels. In this case we won’t see any vertical line, because both blocks will compute a derivative equal to 0. This happens when the step falls on an even pixel.

As an exercise, try to modify the shader code of the above sandbox in order to show a horizontal step function and a horizontal derivative line.

These aliasing artifacts are caused by the subsampling due to the hardware per-block computation of derivatives; horizontal derivatives have full vertical and half horizontal resolution, vertical derivatives have full horizontal and half vertical resolution.
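The two cases of the step experiment can be reproduced on the CPU with a one-dimensional Python sketch of a single pixel row. The function name `step_derivative_row` is hypothetical, and the sketch assumes that blocks start at even pixel coordinates and that gl_FragCoord.x is the pixel centre (x + 0.5).

```python
def step_derivative_row(width, center_x):
    """Return (step, dfdx) for one row of pixels.

    step mirrors the shader: 1 where gl_FragCoord.x > center_x, else 0.
    dfdx is the block-wise horizontal difference (right minus left).
    Assumptions: width is even and blocks start at even x.
    """
    step = [1.0 if (x + 0.5) > center_x else 0.0 for x in range(width)]
    dfdx = []
    for x in range(width):
        x0 = x - (x % 2)  # left pixel of the block containing x
        dfdx.append(step[x0 + 1] - step[x0])
    return step, dfdx


# Step on an even pixel: the transition sits between two blocks
_, even_case = step_derivative_row(8, center_x=4.0)

# Step on an odd pixel: the transition sits inside one block
_, odd_case = step_derivative_row(8, center_x=5.0)
```

In the even case every block sees two equal values, so the derivative is 0 everywhere and the line disappears. In the odd case the block straddling the transition reports 1 for both of its pixels, which gives the 2-pixel-wide line.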