<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.3">Jekyll</generator><link href="https://willpgfx.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://willpgfx.com/" rel="alternate" type="text/html" /><updated>2024-09-22T23:11:05-04:00</updated><id>https://willpgfx.com/feed.xml</id><title type="html">WillP GFX</title><subtitle>Code, rendering, other</subtitle><author><name>Will Pearce</name></author><entry><title type="html">GPU Ray Tracing in an Afternoon</title><link href="https://willpgfx.com/2019/10/gpu-ray-tracing-in-an-afternoon/" rel="alternate" type="text/html" title="GPU Ray Tracing in an Afternoon" /><published>2019-10-20T00:00:00-04:00</published><updated>2019-10-20T00:00:00-04:00</updated><id>https://willpgfx.com/2019/10/gpu-ray-tracing-in-an-afternoon</id><content type="html" xml:base="https://willpgfx.com/2019/10/gpu-ray-tracing-in-an-afternoon/"><![CDATA[<aside class="sidebar__right">
<nav class="toc">
    <header><h4 class="nav__title"><i class="fas fa-file-alt"></i> On This Page</h4></header>
<ul class="toc__menu" id="markdown-toc">
  <li><a href="#motivation" id="markdown-toc-motivation">Motivation</a></li>
  <li><a href="#in-an-afternoon" id="markdown-toc-in-an-afternoon">“In an Afternoon”</a></li>
  <li><a href="#setup" id="markdown-toc-setup">Setup</a></li>
  <li><a href="#adaptations" id="markdown-toc-adaptations">Adaptations</a>    <ul>
      <li><a href="#inheritance-and-polymorphism" id="markdown-toc-inheritance-and-polymorphism">Inheritance and Polymorphism</a></li>
      <li><a href="#random-numbers" id="markdown-toc-random-numbers">Random Numbers</a></li>
      <li><a href="#anti-aliasing" id="markdown-toc-anti-aliasing">Anti-aliasing</a>        <ul>
          <li><a href="#progressive-path-tracing" id="markdown-toc-progressive-path-tracing">Progressive Path Tracing</a></li>
        </ul>
      </li>
      <li><a href="#recursion" id="markdown-toc-recursion">Recursion</a></li>
      <li><a href="#scene-representation" id="markdown-toc-scene-representation">Scene Representation</a></li>
    </ul>
  </li>
  <li><a href="#the-code" id="markdown-toc-the-code">The Code</a>    <ul>
      <li><a href="#obtaining" id="markdown-toc-obtaining">Obtaining</a></li>
      <li><a href="#running" id="markdown-toc-running">Running</a>        <ul>
          <li><a href="#shadertoy" id="markdown-toc-shadertoy">ShaderToy</a></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="#results" id="markdown-toc-results">Results</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

  </nav>
</aside>

<p>It seems to have become something of a rite of passage nowadays for those interested in graphics programming to work their way through Peter Shirley’s excellent <em>Ray Tracing in One Weekend</em>, along the way experiencing firsthand the “aha” moments and the “it’s <strong><em>that</em></strong> simple!?” realizations. With its companions <em>Ray Tracing: The Next Week</em> and <em>Ray Tracing: The Rest of Your Life</em>, the book walks the reader through building a straightforward ray tracing implementation from scratch.</p>

<p>While the book presents all of its code in C++, countless others have translated its content to other languages. Indeed, I am far from the first to implement the book on the GPU, but I enjoyed the undertaking and decided to share my experience.</p>

<p>The books are available for a very reasonable $2.99 each on Amazon, and are freely (and legally) available <a href="https://raytracing.github.io/">here</a>.</p>

<h1 id="motivation">Motivation</h1>

<p>Enjoyable challenge aside, the main motivation for moving this task to the GPU is speed. Ray tracing falls into a category of problems sometimes referred to as “<a href="https://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallel</a>”: each pixel can be rendered independently, without knowledge of the other pixels being rendered alongside it. This is exactly the kind of work a GPU is built for and will happily churn through. A render that might take several minutes, if not hours, to reach a passable image at low-to-medium resolution on a CPU can produce a higher-quality result in a fraction of the time on a GPU.</p>

<h1 id="in-an-afternoon">“In an Afternoon”</h1>

<p>The goal of this post is to summarize the changes involved in moving the CPU-based implementation to run entirely on the GPU. It assumes the reader is familiar with Shirley’s book and has followed along in implementing the CPU-based ray tracer. The content here should therefore be largely review; it does not aim to reintroduce concepts the book already covers.</p>

<p>It is recommended that the reader first work their way through the book’s implementation in order to gain a firm conceptual grasp before tackling a GPU-based implementation.</p>

<p>From start to finish, it took me only a few hours to implement the version of the book presented below, including searching for things like random number functions that aren’t immediately available GPU-side.</p>

<h1 id="setup">Setup</h1>

<p>I used Visual Studio Code with the following plugins while implementing the ray tracer:</p>

<p><a href="https://marketplace.visualstudio.com/items?itemName=stevensona.shader-toy">Shader Toy by Adam Stevenson</a></p>

<p><a href="https://marketplace.visualstudio.com/items?itemName=slevesque.shader">Shader languages support for VS Code by slevesque</a></p>

<p>The ray tracer here is written in GLSL in order to take advantage of the Shader Toy plugin and get immediate feedback while working through the individual chapter implementations.</p>

<h1 id="adaptations">Adaptations</h1>

<h2 id="inheritance-and-polymorphism">Inheritance and Polymorphism</h2>

<p>In the book’s implementation, Shirley uses C++ inheritance and virtual functions to present a clean, simple interface for hittable objects and materials to implement. This simplifies much of the code, since the caller can rely on the correct method being executed based on the underlying type.</p>

<p>For example:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">hittable</span>
<span class="p">{</span>
<span class="nl">public:</span>
    <span class="k">virtual</span> <span class="kt">bool</span> <span class="n">hit</span><span class="p">(</span><span class="k">const</span> <span class="n">ray</span><span class="o">&amp;</span> <span class="n">r</span><span class="p">,</span> <span class="kt">float</span> <span class="n">tmin</span><span class="p">,</span> <span class="kt">float</span> <span class="n">tmax</span><span class="p">,</span> <span class="n">hit_record</span><span class="o">&amp;</span> <span class="n">rec</span><span class="p">)</span> <span class="k">const</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">class</span> <span class="nc">sphere</span> <span class="o">:</span> <span class="k">public</span> <span class="n">hittable</span>
<span class="p">{</span>
<span class="nl">public:</span>
    <span class="kt">bool</span> <span class="n">hit</span><span class="p">(</span><span class="k">const</span> <span class="n">ray</span><span class="o">&amp;</span> <span class="n">r</span><span class="p">,</span> <span class="kt">float</span> <span class="n">tmin</span><span class="p">,</span> <span class="kt">float</span> <span class="n">tmax</span><span class="p">,</span> <span class="n">hit_record</span><span class="o">&amp;</span> <span class="n">rec</span><span class="p">)</span> <span class="k">const</span> <span class="k">override</span>
    <span class="p">{</span>
        <span class="c1">// sphere-specific hit implementation</span>
    <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>

<p>Since GLSL offers neither inheritance nor virtual functions, we can instead employ type identifiers wherever an action needs to behave polymorphically.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// MT_ material type</span>
<span class="cp">#define MT_DIFFUSE 0
#define MT_METAL 1
#define MT_DIALECTRIC 2
</span>
<span class="k">struct</span> <span class="n">Material</span>
<span class="p">{</span>
    <span class="kt">int</span> <span class="n">type</span><span class="p">;</span>
    <span class="kt">vec3</span> <span class="n">albedo</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">roughness</span><span class="p">;</span>    <span class="c1">// controls roughness for metals</span>
    <span class="kt">float</span> <span class="n">refIdx</span><span class="p">;</span>       <span class="c1">// index of refraction for dielectric</span>
<span class="p">};</span>

<span class="kt">bool</span> <span class="nf">scatter</span><span class="p">(</span><span class="n">Ray</span> <span class="n">rIn</span><span class="p">,</span> <span class="n">HitRecord</span> <span class="n">rec</span><span class="p">,</span> <span class="k">out</span> <span class="kt">vec3</span> <span class="n">atten</span><span class="p">,</span> <span class="k">out</span> <span class="n">Ray</span> <span class="n">rScattered</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span><span class="p">(</span><span class="n">rec</span><span class="p">.</span><span class="n">material</span><span class="p">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MT_DIFFUSE</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="c1">// ... diffuse scatter response</span>
        <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span><span class="p">(</span><span class="n">rec</span><span class="p">.</span><span class="n">material</span><span class="p">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MT_METAL</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="c1">// ... metal scatter response</span>
        <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">if</span><span class="p">(</span><span class="n">rec</span><span class="p">.</span><span class="n">material</span><span class="p">.</span><span class="n">type</span> <span class="o">==</span> <span class="n">MT_DIALECTRIC</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="c1">// ... dielectric scatter response</span>
        <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="random-numbers">Random Numbers</h2>

<p>C’s and C++’s random number utilities make random number generation CPU-side rather straightforward (we won’t discuss the merits of how “good” the results are here).</p>

<p>C++ provides the <code class="language-plaintext highlighter-rouge">&lt;random&gt;</code> header, and the types and functions therein can be used to easily generate random numbers in a given range. For our purposes, we’re specifically interested in generating numbers between 0 and 1.</p>
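<p>As a small illustration of the CPU side, here is a minimal sketch using <code class="language-plaintext highlighter-rouge">&lt;random&gt;</code>. The helper name <code class="language-plaintext highlighter-rouge">random_float</code> and the fixed seed are my own choices for this example, not the book’s code: a single engine is seeded once and reused, and a <code class="language-plaintext highlighter-rouge">std::uniform_real_distribution</code> maps its output to the range [0, 1).</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;random&gt;

// Minimal sketch (not the book's code): one engine, seeded once and reused,
// with a distribution mapping its output to floats nominally in [0, 1).
float random_float()
{
    static std::mt19937 engine{42u}; // fixed seed so runs are reproducible
    static std::uniform_real_distribution&lt;float&gt; dist{0.0f, 1.0f};
    return dist(engine);
}
</code></pre></div></div>

<p>Each call simply returns the next value from the shared engine. A fragment shader has no such convenient shared, stateful engine to draw from, which is why the GPU version relies on hashing instead.</p>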

<p>While searching for an easy and “good enough” implementation of GPU-based pseudo-random numbers (GLSL has no built-in random number function), I stumbled upon a ShaderToy implementation by <a href="https://www.shadertoy.com/view/XlycWh">Reinder Nijhoff</a>, who adapted the hash functions by <a href="https://www.shadertoy.com/view/Xt3cDn">nimitz here</a>. I also adopted the <code class="language-plaintext highlighter-rouge">random_in_unit_disk</code> and <code class="language-plaintext highlighter-rouge">random_in_unit_sphere</code> functions provided by Nijhoff. As mentioned in the introduction, I’m hardly the first to undertake this task.</p>
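<p>Hash-based generators like these derive pseudo-random values by hashing a seed that is advanced on every call. The sketch below is a CPU-side C++ transcription of that idea, meant only to show the mechanics. The integer hash is modeled on the one in the shaders linked above, but treat its constants and structure as my assumption and consult the originals before relying on them matching. <code class="language-plaintext highlighter-rouge">hash1</code> is a hypothetical one-value counterpart to the post’s <code class="language-plaintext highlighter-rouge">hash2</code>, and the bodies of <code class="language-plaintext highlighter-rouge">random_in_unit_disk</code> and <code class="language-plaintext highlighter-rouge">random_in_unit_sphere</code> are my own reconstruction. They map uniform values directly onto the disk and sphere (a square root or cube root on the radius keeps points uniform by area or volume) instead of using the book’s rejection loops.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cmath&gt;
#include &lt;cstdint&gt;
#include &lt;cstring&gt;

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Reinterpret a float's bits as an unsigned integer (like GLSL floatBitsToUint).
static std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&amp;u, &amp;f, sizeof u);
    return u;
}

// Integer hash mixing two 32-bit inputs; 1103515245 is the classic LCG multiplier.
static std::uint32_t base_hash(std::uint32_t px, std::uint32_t py)
{
    std::uint32_t qx = 1103515245u * ((px &gt;&gt; 1u) ^ py);
    std::uint32_t qy = 1103515245u * ((py &gt;&gt; 1u) ^ px);
    std::uint32_t h = 1103515245u * (qx ^ (qy &gt;&gt; 3u));
    return h ^ (h &gt;&gt; 16u);
}

// Advances the seed (as a shader's `inout float seed` would) and returns a value in [0, 1].
float hash1(float&amp; seed)
{
    seed += 0.1f;
    const float a = seed;
    seed += 0.1f;
    const float b = seed;
    return float(base_hash(float_bits(a), float_bits(b))) / float(0xffffffffu);
}

// Uniform point in the unit disk: sqrt on the radius compensates for area growing as r^2.
Vec2 random_in_unit_disk(float&amp; seed)
{
    const float twoPi = 6.28318530718f;
    const float r = std::sqrt(hash1(seed));
    const float phi = twoPi * hash1(seed);
    return {r * std::sin(phi), r * std::cos(phi)};
}

// Uniform point in the unit sphere: uniform z and azimuth give a uniform direction,
// and a cube root on the radius compensates for volume growing as r^3.
Vec3 random_in_unit_sphere(float&amp; seed)
{
    const float twoPi = 6.28318530718f;
    const float z = 2.0f * hash1(seed) - 1.0f;
    const float phi = twoPi * hash1(seed);
    const float r = std::cbrt(hash1(seed));
    const float s = std::sqrt(1.0f - z * z); // radius of the circle at height z
    return {r * s * std::sin(phi), r * s * std::cos(phi), r * z};
}
</code></pre></div></div>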

<h2 id="anti-aliasing">Anti-aliasing</h2>

<p>Chapter 7 of the book adds anti-aliasing to the ray tracer. At its most basic, this means taking multiple subsamples within a pixel and averaging the results together. For my implementation of this chapter, I went with an absurdly simple box filter: with <code class="language-plaintext highlighter-rouge">numSamples</code> set to 4, each pixel is sampled on a regular 4×4 grid, for 16 rays per pixel.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">vec2</span> <span class="n">rcpRes</span> <span class="o">=</span> <span class="kt">vec2</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span> <span class="o">/</span> <span class="n">iResolution</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
<span class="kt">vec3</span> <span class="n">col</span> <span class="o">=</span> <span class="kt">vec3</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">numSamples</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">rcpNumSamples</span> <span class="o">=</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span> <span class="o">/</span> <span class="kt">float</span><span class="p">(</span><span class="n">numSamples</span><span class="p">);</span>
<span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">x</span> <span class="o">&lt;</span> <span class="n">numSamples</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">y</span> <span class="o">&lt;</span> <span class="n">numSamples</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="kt">vec2</span> <span class="n">adj</span> <span class="o">=</span> <span class="kt">vec2</span><span class="p">(</span><span class="kt">float</span><span class="p">(</span><span class="n">x</span><span class="p">),</span> <span class="kt">float</span><span class="p">(</span><span class="n">y</span><span class="p">));</span>
        <span class="kt">vec2</span> <span class="n">uv</span> <span class="o">=</span> <span class="p">(</span><span class="nb">gl_FragCoord</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="n">adj</span> <span class="o">*</span> <span class="n">rcpNumSamples</span><span class="p">)</span> <span class="o">*</span> <span class="n">rcpRes</span><span class="p">;</span>
        <span class="n">col</span> <span class="o">+=</span> <span class="n">color</span><span class="p">(</span><span class="n">getRay</span><span class="p">(</span><span class="n">cam</span><span class="p">,</span> <span class="n">uv</span><span class="p">));</span>
    <span class="p">}</span>
<span class="p">}</span>
<span class="n">col</span> <span class="o">/=</span> <span class="kt">float</span><span class="p">(</span><span class="n">numSamples</span> <span class="o">*</span> <span class="n">numSamples</span><span class="p">);</span>
</code></pre></div></div>

<p>This works well enough to confirm that multiple subsamples averaged together make a cleaner image, but it’s limited in a number of ways, not the least of which is that the highly regular sampling pattern yields diminishing returns from additional samples.</p>
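<p>One common way to break up the regular pattern, which my box filter does not use, is stratified (“jittered”) sampling: the pixel is still divided into an n×n grid, but each subsample lands at a random position inside its own cell rather than at the cell’s corner. A minimal C++ sketch of generating those offsets follows. <code class="language-plaintext highlighter-rouge">jittered_offsets</code> is a hypothetical helper, and <code class="language-plaintext highlighter-rouge">rand01</code> stands in for any source of values in [0, 1], such as a hash function.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;vector&gt;

struct Offset { float x, y; };

// Hypothetical helper: one random offset per cell of an n-by-n grid over the
// pixel, each offset expressed in pixel units within [0, 1].
template &lt;typename Rand&gt;
std::vector&lt;Offset&gt; jittered_offsets(int n, Rand rand01)
{
    std::vector&lt;Offset&gt; offsets;
    offsets.reserve(n * n);
    const float cell = 1.0f / float(n);
    for (int x = 0; x &lt; n; ++x)
        for (int y = 0; y &lt; n; ++y)
            offsets.push_back({(float(x) + rand01()) * cell,   // random point inside cell (x, y)
                               (float(y) + rand01()) * cell});
    return offsets;
}
</code></pre></div></div>

<p>Each offset would take the place of <code class="language-plaintext highlighter-rouge">adj * rcpNumSamples</code> in the loop above, giving <code class="language-plaintext highlighter-rouge">uv = (gl_FragCoord.xy + offset) * rcpRes</code>.</p>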

<h3 id="progressive-path-tracing">Progressive Path Tracing</h3>

<p>Starting with chapter 8, which introduces diffuse materials and kicks off what most would deem the more visually interesting portion of the book, I decided that rather than taking multiple samples per frame, I would create a feedback loop in which the accumulated result of all previous frames is fed into the current frame. Each frame uses a random offset within the current pixel footprint, via the hash functions mentioned earlier, and folds its result into a running average of all previous frames.</p>

<p>The new shader entry point looks similar to the following and is used in every subsequent chapter’s implementation.</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// near the top of the shader - this is described in the Shader Toy plugin's documentation</span>
<span class="c1">// sets the previous frame's result as a texture input to the current frame</span>
<span class="cp">#iChannel0 "self"
</span></code></pre></div></div>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">vec2</span> <span class="n">uv</span> <span class="o">=</span> <span class="nb">gl_FragCoord</span><span class="p">.</span><span class="n">xy</span> <span class="o">/</span> <span class="n">iResolution</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
<span class="kt">vec4</span> <span class="n">prev</span> <span class="o">=</span> <span class="n">texture</span><span class="p">(</span><span class="n">iChannel0</span><span class="p">,</span> <span class="n">uv</span><span class="p">);</span>
<span class="kt">vec3</span> <span class="n">prevLinear</span> <span class="o">=</span> <span class="n">toLinear</span><span class="p">(</span><span class="n">prev</span><span class="p">.</span><span class="n">xyz</span><span class="p">);</span>
<span class="n">prevLinear</span> <span class="o">*=</span> <span class="n">prev</span><span class="p">.</span><span class="n">w</span><span class="p">;</span>

<span class="n">uv</span> <span class="o">=</span> <span class="p">(</span><span class="nb">gl_FragCoord</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="n">hash2</span><span class="p">(</span><span class="n">gSeed</span><span class="p">))</span> <span class="o">/</span> <span class="n">iResolution</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
<span class="kt">vec3</span> <span class="n">col</span> <span class="o">=</span> <span class="n">color</span><span class="p">(</span><span class="n">getRay</span><span class="p">(</span><span class="n">cam</span><span class="p">,</span> <span class="n">uv</span><span class="p">));</span>

<span class="k">if</span><span class="p">(</span><span class="n">iMouseButton</span><span class="p">.</span><span class="n">x</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span> <span class="o">||</span> <span class="n">iMouseButton</span><span class="p">.</span><span class="n">y</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">col</span> <span class="o">=</span> <span class="n">toGamma</span><span class="p">(</span><span class="n">col</span><span class="p">);</span>
    <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">col</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span><span class="p">(</span><span class="n">prev</span><span class="p">.</span><span class="n">w</span> <span class="o">&gt;</span> <span class="mi">5000</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
    <span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="n">prev</span><span class="p">;</span>
    <span class="k">return</span><span class="p">;</span>
<span class="p">}</span>

<span class="n">col</span> <span class="o">=</span> <span class="p">(</span><span class="n">col</span> <span class="o">+</span> <span class="n">prevLinear</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">w</span> <span class="o">=</span> <span class="n">prev</span><span class="p">.</span><span class="n">w</span> <span class="o">+</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">;</span>
<span class="n">col</span> <span class="o">/=</span> <span class="n">w</span><span class="p">;</span>
<span class="n">col</span> <span class="o">=</span> <span class="n">toGamma</span><span class="p">(</span><span class="n">col</span><span class="p">);</span>
<span class="nb">gl_FragColor</span> <span class="o">=</span> <span class="kt">vec4</span><span class="p">(</span><span class="n">col</span><span class="p">,</span> <span class="n">w</span><span class="p">);</span>
</code></pre></div></div>

<p>Since the number of samples grows without bound, this approach has the potential to run into floating-point issues once the sample count in the running average becomes large. If you remove the <code class="language-plaintext highlighter-rouge">if(prev.w &gt; 5000.0)</code> block and let the ray tracer run long enough, you’re likely to see little black dots show up in the image. These are pixels whose running average has ended up as <code class="language-plaintext highlighter-rouge">nan</code> or <code class="language-plaintext highlighter-rouge">inf</code>, and because every frame is built on the previous one, a single invalid value persists in that pixel from then on. Capping the number of samples allows for a high-quality render while also avoiding these issues. The cap can be adjusted up or down depending on scene and preference. There are almost certainly more robust ways to solve this issue, but in the spirit of simplicity, they are outside the scope of this post.</p>
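<p>For reference, the shader’s update, <code class="language-plaintext highlighter-rouge">(col + prevLinear * prev.w) / (prev.w + 1.0)</code>, is algebraically the same as nudging the stored average toward the new sample by 1/(w + 1). The C++ sketch below shows that incremental form for a single channel, plus one possible hardening under the assumption that the black dots come from invalid samples: non-finite values are skipped so they never enter the average. <code class="language-plaintext highlighter-rouge">Accum</code> and <code class="language-plaintext highlighter-rouge">accumulate</code> are hypothetical names, and the sketch ignores the gamma conversion the shader performs.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cmath&gt;

// Hypothetical single-channel accumulator (the shader keeps the average in
// the color channels and the count in alpha).
struct Accum
{
    float average = 0.0f; // running mean of accepted samples
    float count = 0.0f;   // number of samples accepted so far
};

// Equivalent to average = (average * count + sample) / (count + 1),
// but without forming the sum; non-finite samples are dropped.
void accumulate(Accum&amp; a, float sample)
{
    if (!std::isfinite(sample))
        return; // a nan or inf would otherwise stay in the average for good
    a.count += 1.0f;
    a.average += (sample - a.average) / a.count;
}
</code></pre></div></div>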

<p>Because the GPU renders quickly enough to be interactive, I’ve added simple camera controls to the scenes. Whenever the camera moves, the running average is reset to prevent successive views from smearing across each other. This is what the <code class="language-plaintext highlighter-rouge">if(iMouseButton...)</code> check does: whenever <code class="language-plaintext highlighter-rouge">iMouseButton</code> is non-zero, it writes out only the current frame’s sample with a weight of 1.0, which restarts the accumulation.</p>

<p>Below are two images of the same scene, one taken moments after the render began, the other taken after a few seconds of accumulation.</p>

<figure class="half ">
  
    
      <a href="/assets/images/gpurtioa_post/0_diffuse_progressive_early.png" title="Short Render Time">
          <img src="/assets/images/gpurtioa_post/0_diffuse_progressive_early.png" alt="Short Render Time" />
      </a>
    
  
    
      <a href="/assets/images/gpurtioa_post/1_diffuse_progressive_late.png" title="Longer Render Time">
          <img src="/assets/images/gpurtioa_post/1_diffuse_progressive_late.png" alt="Longer Render Time" />
      </a>
    
  
  
</figure>

<h2 id="recursion">Recursion</h2>

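<p>GLSL does not support recursive functions. This limitation is simple enough to overcome: because the book’s <code class="language-plaintext highlighter-rouge">color</code> function does nothing with the result of its recursive call except multiply it by the attenuation, the recursion can be replaced by a loop with a capped number of steps that carries a running product of attenuations.</p>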

<p>The CPU implementation below, taken from my own initial follow-along with the book,</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Vector3f</span> <span class="nf">color</span><span class="p">(</span><span class="k">const</span> <span class="n">Ray</span><span class="o">&amp;</span> <span class="n">ray</span><span class="p">,</span> <span class="k">const</span> <span class="n">Hitable</span><span class="o">*</span> <span class="n">world</span><span class="p">,</span> <span class="k">const</span> <span class="n">int32</span> <span class="n">depth</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">HitRecord</span> <span class="n">rec</span><span class="p">;</span>
    <span class="k">if</span><span class="p">(</span><span class="n">world</span><span class="o">-&gt;</span><span class="n">hit</span><span class="p">(</span><span class="n">ray</span><span class="p">,</span> <span class="mf">0.001</span><span class="n">f</span><span class="p">,</span> <span class="n">FLT_MAX</span><span class="p">,</span> <span class="n">rec</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="n">Ray</span> <span class="n">scattered</span><span class="p">;</span>
        <span class="n">Vector3f</span> <span class="n">attenuation</span><span class="p">;</span>

        <span class="k">if</span><span class="p">(</span><span class="n">depth</span> <span class="o">&lt;</span> <span class="mi">50</span> <span class="o">&amp;&amp;</span> <span class="n">rec</span><span class="p">.</span><span class="n">material</span><span class="o">-&gt;</span><span class="n">scatter</span><span class="p">(</span><span class="n">ray</span><span class="p">,</span> <span class="n">rec</span><span class="p">,</span> <span class="n">attenuation</span><span class="p">,</span> <span class="n">scattered</span><span class="p">))</span>
        <span class="p">{</span>
            <span class="k">return</span> <span class="n">attenuation</span> <span class="o">*</span> <span class="n">color</span><span class="p">(</span><span class="n">scattered</span><span class="p">,</span> <span class="n">world</span><span class="p">,</span> <span class="n">depth</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
        <span class="p">}</span>
        <span class="k">else</span>
        <span class="p">{</span>
            <span class="k">return</span> <span class="n">Vector3f</span><span class="p">{</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">};</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">const</span> <span class="n">Vector3f</span> <span class="n">unitDirection</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">ray</span><span class="p">.</span><span class="n">direction</span><span class="p">());</span>
    <span class="k">const</span> <span class="n">float32</span> <span class="n">t</span> <span class="o">=</span> <span class="mf">0.5</span><span class="n">f</span> <span class="o">*</span> <span class="p">(</span><span class="n">unitDirection</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">Vector3f</span><span class="p">{</span><span class="mf">1.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">}</span> <span class="o">*</span> <span class="p">(</span><span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">t</span><span class="p">)</span> <span class="o">+</span> <span class="n">Vector3f</span><span class="p">{</span><span class="mf">0.5</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.7</span><span class="n">f</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">}</span> <span class="o">*</span> <span class="n">t</span><span class="p">;</span>
<span class="p">}</span>

</code></pre></div></div>

<p>now becomes</p>

<div class="language-glsl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">vec3</span> <span class="nf">color</span><span class="p">(</span><span class="n">Ray</span> <span class="n">r</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">HitRecord</span> <span class="n">rec</span><span class="p">;</span>
    <span class="kt">vec3</span> <span class="n">col</span> <span class="o">=</span> <span class="kt">vec3</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">MAX_DEPTH</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">if</span><span class="p">(</span><span class="n">hit_world</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mo">001</span><span class="p">,</span> <span class="mi">10000</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="n">rec</span><span class="p">))</span>
        <span class="p">{</span>
            <span class="n">Ray</span> <span class="n">scatterRay</span><span class="p">;</span>
            <span class="kt">vec3</span> <span class="n">atten</span><span class="p">;</span>
            <span class="k">if</span><span class="p">(</span><span class="n">scatter</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">rec</span><span class="p">,</span> <span class="n">atten</span><span class="p">,</span> <span class="n">scatterRay</span><span class="p">))</span>
            <span class="p">{</span>
                <span class="n">col</span> <span class="o">*=</span> <span class="n">atten</span><span class="p">;</span>
                <span class="n">r</span> <span class="o">=</span> <span class="n">scatterRay</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="k">else</span>
            <span class="p">{</span>
                <span class="k">return</span> <span class="kt">vec3</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>
        <span class="k">else</span>
        <span class="p">{</span>
            <span class="c1">// normalize as the CPU version does: scattered directions need not be unit length</span>
            <span class="kt">float</span> <span class="n">t</span> <span class="o">=</span> <span class="mi">0</span><span class="p">.</span><span class="mi">5</span> <span class="o">*</span> <span class="p">(</span><span class="n">normalize</span><span class="p">(</span><span class="n">r</span><span class="p">.</span><span class="n">d</span><span class="p">).</span><span class="n">y</span> <span class="o">+</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
            <span class="n">col</span> <span class="o">*=</span> <span class="n">mix</span><span class="p">(</span><span class="kt">vec3</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">),</span> <span class="kt">vec3</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="p">,</span> <span class="mi">0</span><span class="p">.</span><span class="mi">7</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">),</span> <span class="n">t</span><span class="p">);</span>
            <span class="k">return</span> <span class="n">col</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Before the next loop iteration, the current ray is simply overwritten with the ray produced by the scattering event.</p>
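<p>The following sketch shows both formulations side by side. It is written in C++ rather than GLSL so the two can be compiled and compared on the CPU. It relies on toy assumptions that do not come from the repository. A “ray” is reduced to a single float. The hypothetical <code class="language-plaintext highlighter-rouge">toyScatter</code> always attenuates by 0.5 and nudges the ray upward. Any ray at or above 1.0 counts as a miss and returns a flat sky value of 1.0.</p>

<p>The recursive version multiplies the attenuations together as the calls return. The iterative version keeps the same running product in <code class="language-plaintext highlighter-rouge">col</code> and overwrites <code class="language-plaintext highlighter-rouge">r</code> with the scattered ray instead of making a new call. Both versions follow the book’s convention of returning black when the bounce budget runs out.</p>

```cpp
#include <cassert>

// Toy stand-ins (hypothetical, not from the repository): a ray is a single
// value, and the "scene" is hit whenever that value is below 1.0.
struct ToyRay { float d; };

// Hypothetical scatter: attenuate by 0.5 and bend the ray upward.
bool toyScatter(const ToyRay& in, float& atten, ToyRay& out)
{
    atten = 0.5f;
    out.d = in.d + 0.25f;
    return true;
}

bool toyHit(const ToyRay& r) { return r.d < 1.0f; }

const float kSky = 1.0f;  // flat "sky" color returned on a miss
const int kMaxDepth = 50; // bounce budget

// Recursive formulation: color = atten * color(scattered ray).
float colorRecursive(const ToyRay& r, int depth)
{
    if (depth >= kMaxDepth) return 0.0f; // out of bounces: no light
    if (!toyHit(r)) return kSky;
    float atten;
    ToyRay scattered;
    if (!toyScatter(r, atten, scattered)) return 0.0f; // absorbed
    return atten * colorRecursive(scattered, depth + 1);
}

// Iterative formulation: carry a running product and overwrite the ray.
float colorIterative(ToyRay r)
{
    float col = 1.0f;
    for (int depth = 0; depth < kMaxDepth; ++depth)
    {
        if (toyHit(r))
        {
            float atten;
            ToyRay scattered;
            if (!toyScatter(r, atten, scattered)) return 0.0f; // absorbed
            col *= atten;
            r = scattered; // the scattered ray replaces the current one
        }
        else
        {
            return col * kSky; // escaped: apply the accumulated attenuation
        }
    }
    return 0.0f; // out of bounces: no light
}
```

<p>A toy ray starting at 0.0 scatters four times before it escapes. Both functions therefore return 0.5<sup>4</sup> = 0.0625.</p>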

<h2 id="scene-representation">Scene Representation</h2>

<p>The use of polymorphic types allows certain niceties as described in the section above. One of these niceties is having a <code class="language-plaintext highlighter-rouge">HittableList</code> type that can itself include any number of <code class="language-plaintext highlighter-rouge">Hittable</code> implementers and be traversed quite easily.</p>

<p>Instead of creating a pseudo-polymorphic hittable type as described above for materials, I’ve opted to build the scene on each invocation of a <code class="language-plaintext highlighter-rouge">hit_world</code> function. That function can be implemented afresh in each shader, depending on the desired content. There is perhaps a trade-off here between memory usage and execution speed that could be worth exploring in the future. With a scene representation built once on shader entry and used throughout, the memory requirement would increase, but the cost of re-creating those types and materials during traversal might decrease.</p>

<p>Of course, there’s nothing stopping you from implementing a more flexible <code class="language-plaintext highlighter-rouge">Hittable</code> type, similar to how materials are handled, and having a type identifier decipher which intersection function should be used. For the purposes of this exercise, I found the above approach to be plenty.</p>
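<p>Here is one possible shape for that alternative. It is sketched in C++ rather than GLSL so it can be compiled and checked on the CPU; a GLSL port would use the same <code class="language-plaintext highlighter-rouge">switch</code> over an <code class="language-plaintext highlighter-rouge">int</code> field. The names are hypothetical and not taken from the repository: <code class="language-plaintext highlighter-rouge">Hittable</code>, <code class="language-plaintext highlighter-rouge">HIT_SPHERE</code>, <code class="language-plaintext highlighter-rouge">HIT_PLANE</code>, <code class="language-plaintext highlighter-rouge">hitWorld</code>, and the field layout.</p>

<p>The sketch works as follows:</p>
<ul>
  <li>A single flat struct holds the data for every shape.</li>
  <li>The type identifier selects which intersection routine runs.</li>
  <li><code class="language-plaintext highlighter-rouge">hitWorld</code> keeps the closest hit by shrinking the upper bound of the search interval as it goes, much like traversing a <code class="language-plaintext highlighter-rouge">HittableList</code>.</li>
</ul>

```cpp
#include <cassert>
#include <cmath>

// Minimal vector and ray types for the sketch.
struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
struct Ray { Vec3 o, d; };

// Hypothetical type identifiers, in the spirit of the material type ids.
enum HittableType { HIT_SPHERE = 0, HIT_PLANE = 1 };

// One flat struct for every shape; each type reads only the fields it needs.
struct Hittable
{
    int type;
    Vec3 center;  // sphere center, or any point on the plane
    float radius; // sphere only
    Vec3 normal;  // plane only, assumed normalized
};

// On a hit inside (tMin, tMax), writes the distance to t and returns true.
bool hitSphere(const Hittable& s, const Ray& r, float tMin, float tMax, float& t)
{
    Vec3 oc = r.o - s.center;
    float a = dot(r.d, r.d);
    float halfB = dot(oc, r.d);
    float c = dot(oc, oc) - s.radius * s.radius;
    float disc = halfB * halfB - a * c;
    if (disc < 0.0f) return false;
    float sq = std::sqrt(disc);
    float root = (-halfB - sq) / a; // try the nearer root first
    if (root <= tMin || root >= tMax)
    {
        root = (-halfB + sq) / a;
        if (root <= tMin || root >= tMax) return false;
    }
    t = root;
    return true;
}

bool hitPlane(const Hittable& p, const Ray& r, float tMin, float tMax, float& t)
{
    float denom = dot(p.normal, r.d);
    if (std::fabs(denom) < 1e-6f) return false; // ray is parallel to the plane
    float root = dot(p.center - r.o, p.normal) / denom;
    if (root <= tMin || root >= tMax) return false;
    t = root;
    return true;
}

// The type identifier decides which intersection function runs.
bool hitHittable(const Hittable& h, const Ray& r, float tMin, float tMax, float& t)
{
    switch (h.type)
    {
    case HIT_SPHERE: return hitSphere(h, r, tMin, tMax, t);
    case HIT_PLANE:  return hitPlane(h, r, tMin, tMax, t);
    default:         return false; // unknown type: treat as a miss
    }
}

// Traverse a flat array, shrinking the search interval to keep the closest hit.
bool hitWorld(const Hittable* objs, int count, const Ray& r, float tMin, float tMax, float& tClosest)
{
    bool hitAnything = false;
    tClosest = tMax;
    for (int i = 0; i < count; ++i)
    {
        float t;
        if (hitHittable(objs[i], r, tMin, tClosest, t))
        {
            hitAnything = true;
            tClosest = t;
        }
    }
    return hitAnything;
}
```

<p>Adding a new shape means adding an identifier, an intersection function, and a <code class="language-plaintext highlighter-rouge">case</code>. The traversal loop stays untouched.</p>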

<h1 id="the-code">The Code</h1>

<h2 id="obtaining">Obtaining</h2>

<p>All the code associated with this post can be found in the GitLab repository linked below. There is one .glsl file for each chapter of the book that produces output. Files follow the naming convention <code class="language-plaintext highlighter-rouge">b#_ch#.glsl</code>, where the first number is the book the file comes from and the second is the chapter within that book. All of the first book in the series is represented. The first chapter of the second book is included too, since it took minimal effort to add a new <code class="language-plaintext highlighter-rouge">MovingSphere</code> type and implement motion blur.</p>

<p>The <code class="language-plaintext highlighter-rouge">common.glsl</code> file contains types, intersection functions, the scattering function, helper functions for creating types with default values, and the all-important noise functions.</p>

<p><a href="https://gitlab.com/willp-public/gpu-ray-tracing-post">GitLab Repository</a></p>

<h2 id="running">Running</h2>

<p>The simplest way to see the code in action is to follow these steps:</p>
<ul>
  <li>clone it from the repository</li>
  <li>open a Visual Studio Code workspace at the root of the folder containing the code</li>
  <li>open one of the chapter <code class="language-plaintext highlighter-rouge">.glsl</code> files</li>
  <li>select <code class="language-plaintext highlighter-rouge">Shader Toy: Show GLSL Preview</code> from the VS Code command palette (ctrl+shift+p).</li>
</ul>

<h3 id="shadertoy">ShaderToy</h3>

<p>With a little coaxing, the code can be adapted to run on <a href="http://www.shadertoy.com/">ShaderToy</a>. The changes are mostly around setting up the feedback loop through a buffer, detecting mouse movement, and renaming <code class="language-plaintext highlighter-rouge">main</code> to <code class="language-plaintext highlighter-rouge">mainImage</code>. I’ve created an example <a href="https://www.shadertoy.com/view/tddSz4">here</a>.</p>

<iframe src="https://www.shadertoy.com/embed/tddSz4?gui=true&amp;t=10&amp;paused=true" width="100%" height="400" frameborder="0" allowfullscreen="allowfullscreen"></iframe>

<p>In order to see the results in action, be sure to press the play button on the embedded viewer. Click and move the mouse to change the camera location.</p>

<h1 id="results">Results</h1>

<figure class=" ">
  
    
      <a href="/assets/images/gpurtioa_post/3_results_0.png" title="Results">
          <img src="/assets/images/gpurtioa_post/3_results_0.png" alt="Results" />
      </a>
    
  
  
</figure>

<figure class=" ">
  
    
      <a href="/assets/images/gpurtioa_post/2_motion_blur.png" title="Motion Blur">
          <img src="/assets/images/gpurtioa_post/2_motion_blur.png" alt="Motion Blur" />
      </a>
    
  
  
</figure>

<h1 id="references">References</h1>

<p><a href="https://raytracing.github.io/books/RayTracingInOneWeekend.html">Ray Tracing in One Weekend by Peter Shirley</a></p>

<p><a href="https://www.shadertoy.com/view/XlycWh">RIOW 1.12: Where Next? by Reinder Nijhoff</a></p>

<p><a href="https://www.shadertoy.com/view/Xt3cDn">Quality hashes collection WebGL2 by nimitz</a></p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[Here we take the work previously presented in Peter Shirley's excellent book, and move it to run entirely on the GPU.]]></summary></entry><entry><title type="html">Vulkan Textures Unbound</title><link href="https://willpgfx.com/2019/06/vulkan-textures-unbound/" rel="alternate" type="text/html" title="Vulkan Textures Unbound" /><published>2019-06-22T00:00:00-04:00</published><updated>2019-06-22T00:00:00-04:00</updated><id>https://willpgfx.com/2019/06/vulkan-textures-unbound</id><content type="html" xml:base="https://willpgfx.com/2019/06/vulkan-textures-unbound/"><![CDATA[<aside class="sidebar__right">
<nav class="toc">
    <header><h4 class="nav__title"><i class="fas fa-file-alt"></i> On This Page</h4></header>
<ul class="toc__menu" id="markdown-toc">
  <li><a href="#problem-statement" id="markdown-toc-problem-statement">Problem Statement</a></li>
  <li><a href="#error-messages" id="markdown-toc-error-messages">Error Messages</a>    <ul>
      <li><a href="#quick-and-dirty-way-to-address-error-2" id="markdown-toc-quick-and-dirty-way-to-address-error-2">Quick and Dirty Way to Address Error 2</a></li>
    </ul>
  </li>
  <li><a href="#a-cleaner-approach" id="markdown-toc-a-cleaner-approach">A Cleaner Approach</a></li>
  <li><a href="#final-thoughts" id="markdown-toc-final-thoughts">Final Thoughts</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

  </nav>
</aside>

<h1 id="problem-statement">Problem Statement</h1>

<p>I have recently been working on implementing a Vulkan 1.1 backend for my engine. Since the project started with a Direct3D 12 graphics backend, all of the shaders are written in HLSL. While <a href="https://github.com/Microsoft/DirectXShaderCompiler/">DXC</a> has done wonders to reduce the amount of rework needed for many of the project’s shaders, one feature that did not work directly out of the box when cross-compiling to SPIR-V for Vulkan was unbounded arrays of textures. There seems to be limited information available online about how to support these, so that will be the focus of this post.</p>

<p>To be clear, when I say “array of textures” I am talking about the following:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Texture2D</span> <span class="n">mytextures</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">);</span> <span class="c1">// array of textures &lt;- this post is about this</span>

<span class="n">Texture2DArray</span> <span class="n">othertexture</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">);</span> <span class="c1">// texture array &lt;- not talking about this</span>
</code></pre></div></div>

<p><a href="https://docs.microsoft.com/en-us/windows/desktop/direct3d12/resource-binding-in-hlsl">HLSL and Direct3D 12</a> allow truly unbounded arrays of textures. The shader author does not need to know the upper limit of the array. On the application side, the implementer only needs to ensure the shader never indexes outside the valid range of bound descriptors. A well-formed application stays within the bounds of its arrays in the first place, so this seems a reasonable requirement given the flexibility it introduces.</p>

<p>To be a bit more specific, here’s what we are trying to support in a cross-platform way:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// instead of this</span>
<span class="n">Texture2D</span> <span class="n">materialTextures</span><span class="p">[</span><span class="mi">1024</span><span class="p">]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">);</span> <span class="c1">// array of textures with upper limit</span>

<span class="c1">// we want to write this</span>
<span class="n">Texture2D</span> <span class="n">materialTextures</span><span class="p">[]</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">);</span> <span class="c1">// unbounded array of textures</span>
</code></pre></div></div>

<h1 id="error-messages">Error Messages</h1>

<p>Consider a Vulkan build of the application where a shader uses an unbounded array of textures, but none of the required extended features are enabled. When we first start it, the first error we stumble upon looks something like the following. Let’s call this <strong>Error 1</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Shader requires VkPhysicalDeviceDescriptorIndexingFeaturesEXT::runtimeDescriptorArray but is not enabled on the device.
</code></pre></div></div>

<p>Easy enough: the message tells us directly which feature we need to enable for this syntax to be allowed. I’ll show later in the post how to query for support and enable that feature, but for now let’s move on. With that feature enabled, the initial error goes away, but now we have a new validation error. We’ll refer to it as <strong>Error 2</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Descriptor set 0x4c6 bound as set #1 encountered the following validation error at vkCmdDrawIndexed() time: Descriptor in binding #1 index 18 is being used in draw but has not been updated.
</code></pre></div></div>

<p>Basically, this error describes the following situation:</p>
<ul>
  <li>your descriptor set layout declares some number of descriptors,</li>
  <li>you have only written the first few of them into the descriptor set, and</li>
  <li>the shader could index past what you have written.</li>
</ul>
<p>If all three hold, you are in error. We updated our shader and enabled the <code class="language-plaintext highlighter-rouge">runtimeDescriptorArray</code> feature, so the validation layer now assumes the shader could access the entire descriptor set. It validates against the entire contents, including locations that may never have been written.</p>

<p>For example, say our descriptor set layout declares 1000 descriptors, but so far we have only needed to load the first 100. The validation layer sees the uninitialized contents after those first 100 textures. It assumes we may index into them in the shader, and reports an error. A fairly straightforward workaround, shown next, avoids enabling another feature. Still, if the <code class="language-plaintext highlighter-rouge">descriptorBindingPartiallyBound</code> feature is available, I recommend enabling it, as it provides true support for partial binding.</p>

<h2 id="quick-and-dirty-way-to-address-error-2">Quick and Dirty Way to Address Error 2</h2>

<p>It all boils down to this. First, write the valid entries into the <code class="language-plaintext highlighter-rouge">VkDescriptorImageInfo</code> array that the <code class="language-plaintext highlighter-rouge">VkWriteDescriptorSet</code> will point to. Then fill every remaining entry with a copy of the first descriptor. The only assumption made here is that at least the first descriptor is valid. I feel this is a safe enough assumption when combined with some well-placed assertions; otherwise, why make a call to update the set in the first place? This approach technically works, insofar as draws take place as expected with no validation layer warnings.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// after cycling through all of the "real" descriptors desired to be set, find out if any descriptor slots are left</span>
<span class="k">if</span><span class="p">(</span><span class="n">descriptorCount</span> <span class="o">&lt;</span> <span class="n">descriptorRange</span><span class="p">.</span><span class="n">numDescriptors</span><span class="p">)</span>
<span class="p">{</span>
	<span class="c1">// for each unwritten descriptor in the layout, copy the first descriptor into those locations</span>
	<span class="k">for</span><span class="p">(</span><span class="n">uint32</span> <span class="n">i</span> <span class="o">=</span> <span class="n">descriptorCount</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">descriptorRange</span><span class="p">.</span><span class="n">numDescriptors</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
	<span class="p">{</span>
		<span class="c1">// the first descriptor is assumed to be valid</span>
		<span class="n">imageInfos</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">imageInfos</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
		<span class="o">++</span><span class="n">descriptorCount</span><span class="p">;</span>
	<span class="p">}</span>
<span class="p">}</span>

<span class="n">VkWriteDescriptorSet</span> <span class="n">writeDescriptorSet</span><span class="p">;</span>
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET</span><span class="p">;</span>
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">pNext</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">dstSet</span> <span class="o">=</span> <span class="n">descriptorSet</span><span class="p">;</span>
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">dstBinding</span> <span class="o">=</span> <span class="n">binding</span><span class="p">;</span>
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">dstArrayElement</span> <span class="o">=</span> <span class="mi">0u</span><span class="p">;</span>
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">descriptorCount</span> <span class="o">=</span> <span class="n">descriptorCount</span><span class="p">;</span>
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">descriptorType</span> <span class="o">=</span> <span class="n">VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE</span><span class="p">;</span>
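<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">pImageInfo</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">imageInfos</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span> <span class="c1">// the padded array filled above</span>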
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">pBufferInfo</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">writeDescriptorSet</span><span class="p">.</span><span class="n">pTexelBufferView</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>

<span class="n">vkUpdateDescriptorSets</span><span class="p">(...);</span> <span class="c1">// as usual</span>
</code></pre></div></div>

<p>Even with the extended feature enabled, filling in the remaining descriptors with a known error texture would be a good way of detecting out-of-bounds accesses at runtime. Thanks to <a href="https://twitter.com/longbool">Alex Tardif</a> for bringing that to my attention.</p>
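<p>The padding loop is the same whether the fill-in descriptor is the first one or a dedicated error texture. Below is a minimal sketch that factors it into a reusable template, assuming the image infos live in a <code class="language-plaintext highlighter-rouge">std::vector</code>. The name <code class="language-plaintext highlighter-rouge">padDescriptorInfos</code> is hypothetical. The checks use a stand-in struct instead of <code class="language-plaintext highlighter-rouge">VkDescriptorImageInfo</code> so they compile without the Vulkan headers. The fallback is taken by value on purpose: if it were passed as <code class="language-plaintext highlighter-rouge">infos[0]</code> by reference, it would dangle whenever <code class="language-plaintext highlighter-rouge">resize</code> reallocates.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: after the first `writtenCount` entries hold real
// descriptors, grow the array to `layoutCount` entries and fill the tail with
// `fallback` (a copy of the first descriptor, or an error texture's info).
template <typename Info>
void padDescriptorInfos(std::vector<Info>& infos, std::size_t writtenCount,
                        std::size_t layoutCount, Info fallback)
{
    // At least one real descriptor, all of them present, and the layout big enough.
    assert(writtenCount > 0);
    assert(writtenCount <= infos.size());
    assert(writtenCount <= layoutCount);

    infos.resize(layoutCount);
    for (std::size_t i = writtenCount; i < layoutCount; ++i)
    {
        infos[i] = fallback;
    }
}
```

<p>After padding, <code class="language-plaintext highlighter-rouge">infos.size()</code> equals the layout’s descriptor count. That size is the value that would go into <code class="language-plaintext highlighter-rouge">VkWriteDescriptorSet::descriptorCount</code>.</p>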

<h1 id="a-cleaner-approach">A Cleaner Approach</h1>

<p>The following approach solves both of the issues presented above. It is also simpler and cleaner than copying a single descriptor all over the place.</p>

<p>To support truly unbounded arrays of textures, we want to query for and enable two extended features. As luck would have it, both come from the same structure, namely <code class="language-plaintext highlighter-rouge">VkPhysicalDeviceDescriptorIndexingFeaturesEXT</code>. The features we want enabled are:</p>

<ul>
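  <li><code class="language-plaintext highlighter-rouge">runtimeDescriptorArray</code> - for <strong>Error 1</strong></li>
  <li><code class="language-plaintext highlighter-rouge">descriptorBindingPartiallyBound</code> - for <strong>Error 2</strong></li>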
</ul>
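<p>One step is easy to miss on Vulkan 1.1. <code class="language-plaintext highlighter-rouge">VkPhysicalDeviceDescriptorIndexingFeaturesEXT</code> belongs to the <code class="language-plaintext highlighter-rouge">VK_EXT_descriptor_indexing</code> device extension, which was only promoted to core in Vulkan 1.2. That extension should therefore also be reported by <code class="language-plaintext highlighter-rouge">vkEnumerateDeviceExtensionProperties</code> and listed in <code class="language-plaintext highlighter-rouge">VkDeviceCreateInfo::ppEnabledExtensionNames</code>.</p>

<p>The sketch below assumes the available names have already been copied out of the returned <code class="language-plaintext highlighter-rouge">VkExtensionProperties::extensionName</code> fields into strings. <code class="language-plaintext highlighter-rouge">hasAllExtensions</code> is a hypothetical helper. The string literal is the value of <code class="language-plaintext highlighter-rouge">VK_EXT_DESCRIPTOR_INDEXING_EXTENSION_NAME</code>.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: true if every required extension name appears in the
// list of extension names reported by the physical device.
bool hasAllExtensions(const std::vector<std::string>& available,
                      const std::vector<std::string>& required)
{
    return std::all_of(required.begin(), required.end(),
        [&available](const std::string& name)
        {
            return std::find(available.begin(), available.end(), name) != available.end();
        });
}

// Device extension needed for unbounded texture arrays on Vulkan 1.1.
const std::vector<std::string> kDescriptorIndexingExtensions = {
    "VK_EXT_descriptor_indexing", // VK_EXT_DESCRIPTOR_INDEXING_EXTENSION_NAME
};
```

<p>If the check passes, the name would be appended to the extension list passed through <code class="language-plaintext highlighter-rouge">ppEnabledExtensionNames</code>, alongside any extensions the application already requests, such as the swapchain extension.</p>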

<p>This code block shows how to query for support for these features for a given <code class="language-plaintext highlighter-rouge">VkPhysicalDevice</code>:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">VkPhysicalDeviceDescriptorIndexingFeaturesEXT</span> <span class="n">indexingFeatures</span><span class="p">{};</span>
<span class="n">indexingFeatures</span><span class="p">.</span><span class="n">sType</span>	<span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT</span><span class="p">;</span>
<span class="n">indexingFeatures</span><span class="p">.</span><span class="n">pNext</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>

<span class="n">VkPhysicalDeviceFeatures2</span> <span class="n">deviceFeatures</span><span class="p">{};</span>
<span class="n">deviceFeatures</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2</span><span class="p">;</span>
<span class="n">deviceFeatures</span><span class="p">.</span><span class="n">pNext</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">indexingFeatures</span><span class="p">;</span>
<span class="n">vkGetPhysicalDeviceFeatures2</span><span class="p">(</span><span class="n">physicalDevice</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">deviceFeatures</span><span class="p">);</span>

<span class="k">if</span><span class="p">(</span><span class="n">indexingFeatures</span><span class="p">.</span><span class="n">descriptorBindingPartiallyBound</span> <span class="o">&amp;&amp;</span> <span class="n">indexingFeatures</span><span class="p">.</span><span class="n">runtimeDescriptorArray</span><span class="p">)</span>
<span class="p">{</span>
	<span class="c1">// all set to use unbound arrays of textures</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Next, enable those two features when creating the logical device (<code class="language-plaintext highlighter-rouge">VkDevice</code>):</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">VkPhysicalDeviceDescriptorIndexingFeaturesEXT</span> <span class="n">indexingFeatures</span><span class="p">{};</span>
<span class="n">indexingFeatures</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT</span><span class="p">;</span>
<span class="n">indexingFeatures</span><span class="p">.</span><span class="n">pNext</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">indexingFeatures</span><span class="p">.</span><span class="n">descriptorBindingPartiallyBound</span> <span class="o">=</span> <span class="n">VK_TRUE</span><span class="p">;</span>
<span class="n">indexingFeatures</span><span class="p">.</span><span class="n">runtimeDescriptorArray</span> <span class="o">=</span> <span class="n">VK_TRUE</span><span class="p">;</span>

<span class="n">VkDeviceCreateInfo</span> <span class="n">createInfo</span><span class="p">{};</span>
<span class="n">createInfo</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO</span><span class="p">;</span>
<span class="n">createInfo</span><span class="p">.</span><span class="n">pNext</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">indexingFeatures</span><span class="p">;</span>
<span class="c1">// the rest of the createInfo is filled out as normal</span>
</code></pre></div></div>

<p>The last adjustment we need to make is to the descriptor set layout. Layouts that require an unbounded array of textures need the <code class="language-plaintext highlighter-rouge">VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT</code> flag on that binding. To set it, we chain a <code class="language-plaintext highlighter-rouge">VkDescriptorSetLayoutBindingFlagsCreateInfoEXT</code> struct onto the layout create info. Its <code class="language-plaintext highlighter-rouge">pBindingFlags</code> array holds one <code class="language-plaintext highlighter-rouge">VkDescriptorBindingFlagsEXT</code> value per binding. This lets the validation layer ease up and trust that the implementer is not going to index outside the valid range of bound descriptors for the given set.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// update these values to be useful for your specific use case</span>
<span class="n">VkDescriptorSetLayoutBinding</span> <span class="n">layoutBinding</span><span class="p">{};</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">0u</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">descriptorType</span> <span class="o">=</span> <span class="n">VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">descriptorCount</span> <span class="o">=</span> <span class="mi">10000u</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">stageFlags</span> <span class="o">=</span> <span class="n">VK_SHADER_STAGE_FRAGMENT_BIT</span><span class="p">;</span>

<span class="n">VkDescriptorBindingFlagsEXT</span> <span class="n">bindFlag</span> <span class="o">=</span> <span class="n">VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT</span><span class="p">;</span>

<span class="n">VkDescriptorSetLayoutBindingFlagsCreateInfoEXT</span> <span class="n">extendedInfo</span><span class="p">{};</span>
<span class="n">extendedInfo</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_BINDING_FLAGS_CREATE_INFO_EXT</span><span class="p">;</span>
<span class="n">extendedInfo</span><span class="p">.</span><span class="n">pNext</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">extendedInfo</span><span class="p">.</span><span class="n">bindingCount</span> <span class="o">=</span> <span class="mi">1u</span><span class="p">;</span>
<span class="n">extendedInfo</span><span class="p">.</span><span class="n">pBindingFlags</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">bindFlag</span><span class="p">;</span>

<span class="n">VkDescriptorSetLayoutCreateInfo</span> <span class="n">layoutInfo</span><span class="p">{};</span>
<span class="n">layoutInfo</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO</span><span class="p">;</span>
<span class="n">layoutInfo</span><span class="p">.</span><span class="n">pNext</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">extendedInfo</span><span class="p">;</span>
<span class="n">layoutInfo</span><span class="p">.</span><span class="n">flags</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">layoutInfo</span><span class="p">.</span><span class="n">bindingCount</span> <span class="o">=</span> <span class="mi">1u</span><span class="p">;</span>
<span class="n">layoutInfo</span><span class="p">.</span><span class="n">pBindings</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">layoutBinding</span><span class="p">;</span>

<span class="n">vkCreateDescriptorSetLayout</span><span class="p">(...);</span> <span class="c1">// as usual</span>
</code></pre></div></div>

<h1 id="final-thoughts">Final Thoughts</h1>

<p>There you have it. The second approach requires a little more initialization code at startup to query for and enable the required features. That seems a small price to pay for a cleaner implementation and the ability to bind partially filled descriptor sets.</p>

<p>That said, the second approach can still be combined with copying a known texture descriptor (error or otherwise) into unused slots, as mentioned earlier, if desired. Use cases vary across and within projects, and implementers should choose the best fit for the problem they’re solving.</p>

<p>As a sole developer, I don’t have access to the swaths of hardware configurations available to larger studios. I can, however, state that the above features are available on my GTX 980 Ti, which is over four years old at the time of writing. I imagine most desktop GPUs have had these features for some time now. Mobile-centric GPUs may not be as accommodating, though mobile is not currently one of my target ecosystems. Support for specific devices can be looked up on <a href="https://twitter.com/SaschaWillems2">Sascha Willems’</a> site <a href="http://vulkan.gpuinfo.org/">here</a>.</p>

<h1 id="references">References</h1>

<p><a href="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/man/html/VkPhysicalDeviceDescriptorIndexingFeaturesEXT.html">https://www.khronos.org/registry/vulkan/specs/1.1-extensions/man/html/VkPhysicalDeviceDescriptorIndexingFeaturesEXT.html</a></p>

<p><a href="https://khronos.org/registry/vulkan/specs/1.1-extensions/man/html/VkDescriptorSetLayoutBindingFlagsCreateInfoEXT.html">https://khronos.org/registry/vulkan/specs/1.1-extensions/man/html/VkDescriptorSetLayoutBindingFlagsCreateInfoEXT.html</a></p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[This post shows an example of how to use unbounded arrays of textures in Vulkan.]]></summary></entry><entry><title type="html">A Platform-Independent Thread Pool Using C++14</title><link href="https://willpgfx.com/2016/01/a-platform-independent-thread-pool-using-c14/" rel="alternate" type="text/html" title="A Platform-Independent Thread Pool Using C++14" /><published>2016-01-03T00:00:00-05:00</published><updated>2016-01-03T00:00:00-05:00</updated><id>https://willpgfx.com/2016/01/a-platform-independent-thread-pool-using-c14</id><content type="html" xml:base="https://willpgfx.com/2016/01/a-platform-independent-thread-pool-using-c14/"><![CDATA[<aside class="sidebar__right">
<nav class="toc">
    <header><h4 class="nav__title"><i class="fas fa-file-alt"></i> On This Page</h4></header>
<ul class="toc__menu" id="markdown-toc">
  <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
  <li><a href="#implementation" id="markdown-toc-implementation">Implementation</a>    <ul>
      <li><a href="#a-thread-safe-queue" id="markdown-toc-a-thread-safe-queue">A Thread-Safe Queue</a></li>
      <li><a href="#the-thread-pool" id="markdown-toc-the-thread-pool">The Thread Pool</a></li>
      <li><a href="#submitting-work-to-the-thread-pool" id="markdown-toc-submitting-work-to-the-thread-pool">Submitting Work to the Thread Pool</a></li>
    </ul>
  </li>
  <li><a href="#does-it-work" id="markdown-toc-does-it-work">Does It Work?</a></li>
  <li><a href="#about-the-number-of-pooled-threads" id="markdown-toc-about-the-number-of-pooled-threads">About the Number of Pooled Threads</a></li>
  <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a></li>
  <li><a href="#acknowledgements" id="markdown-toc-acknowledgements">Acknowledgements</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

  </nav>
</aside>

<h1 id="introduction">Introduction</h1>

<p>One of the major benefits of the new generation of graphics APIs is much better support for multithreaded command list generation and submission.  It’s not uncommon for computers nowadays to have processors with 2, 4, 8, or even 16 cores.  The goal of the solution in this post is to let us use the power our CPU provides, not just for generating graphics command lists, but for any task that can be easily parallelized.</p>

<p>At its simplest, a thread pool is a collection of threads that run continuously waiting to take on a task to complete.  If there’s no task available, they yield or sleep for some amount of time, wake back up, and check again.  When a task is available, one of the waiting threads claims it, runs it, and returns to the waiting state.</p>
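<p>The polling behaviour described above fits in a few lines of standard C++. This sketch illustrates only that idea. It is not the pool built in the rest of this post; the <code class="language-plaintext highlighter-rouge">ThreadSafeQueue</code> below, for instance, also offers a blocking pop. The <code class="language-plaintext highlighter-rouge">PollingPool</code> name and its members are hypothetical. Workers that find the queue empty call <code class="language-plaintext highlighter-rouge">std::this_thread::yield()</code> and check again.</p>

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool: workers poll a shared queue and yield when it is empty.
class PollingPool
{
public:
    explicit PollingPool(unsigned threadCount)
    {
        assert(threadCount > 0);
        for (unsigned i = 0; i < threadCount; ++i)
        {
            m_threads.emplace_back([this] { workerLoop(); });
        }
    }

    // Signal shutdown and wait for the workers; tasks still queued are discarded.
    ~PollingPool()
    {
        m_done = true;
        for (auto& t : m_threads) t.join();
    }

    void submit(std::function<void()> task)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        m_tasks.push(std::move(task));
    }

private:
    void workerLoop()
    {
        while (!m_done)
        {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock{m_mutex};
                if (!m_tasks.empty())
                {
                    task = std::move(m_tasks.front()); // claim the task
                    m_tasks.pop();
                }
            }
            if (task)
                task();                    // run it outside the lock
            else
                std::this_thread::yield(); // nothing to do: give up the time slice
        }
    }

    std::atomic<bool> m_done{false};
    std::mutex m_mutex;
    std::queue<std::function<void()>> m_tasks;
    std::vector<std::thread> m_threads;
};
```

<p>Yielding keeps the sketch simple, but idle workers still spin and burn CPU time. That cost is one reason to prefer a blocking wait on a condition variable.</p>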

<p>Why use a thread pool instead of creating a new thread for every task we want to run in parallel?  It saves the time it would otherwise take to construct a thread, submit work to it, and destroy it when it’s done running.  With a small collection of threads continuously running and waiting on tasks, we’re left with only the middle step: work submission.</p>

<h1 id="implementation">Implementation</h1>

<p>The thread pool presented here is based on the implementation provided in [1].  It has been extended to accept variadic arguments for added flexibility.</p>

<h2 id="a-thread-safe-queue">A Thread-Safe Queue</h2>

<p>Before we build the pool itself, we need a means of submitting work in a thread-safe manner.  Jobs should be picked up in the same order they are submitted to the pool, which means a queue is a good candidate.  Jobs are pushed to the back of the queue, and popped from the front.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * The ThreadSafeQueue class.
 * Provides a wrapper around a basic queue to provide thread safety.
 */</span>
<span class="cp">#pragma once
</span>
<span class="cp">#ifndef THREADSAFEQUEUE_HPP
#define THREADSAFEQUEUE_HPP
</span>
<span class="cp">#include</span> <span class="cpf">&lt;atomic&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;condition_variable&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;mutex&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;queue&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;utility&gt;</span><span class="cp">
</span>
<span class="k">namespace</span> <span class="n">MyNamespace</span>
<span class="p">{</span>
    <span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
    <span class="k">class</span> <span class="nc">ThreadSafeQueue</span>
    <span class="p">{</span>
    <span class="nl">public:</span>
        <span class="cm">/**
         * Destructor.
         */</span>
        <span class="o">~</span><span class="n">ThreadSafeQueue</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">invalidate</span><span class="p">();</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Attempt to get the first value in the queue.
         * Returns true if a value was successfully written to the out parameter, false otherwise.
         */</span>
        <span class="kt">bool</span> <span class="nf">tryPop</span><span class="p">(</span><span class="n">T</span><span class="o">&amp;</span> <span class="n">out</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&gt;</span> <span class="n">lock</span><span class="p">{</span><span class="n">m_mutex</span><span class="p">};</span>
            <span class="k">if</span><span class="p">(</span><span class="n">m_queue</span><span class="p">.</span><span class="n">empty</span><span class="p">()</span> <span class="o">||</span> <span class="o">!</span><span class="n">m_valid</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="n">out</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">m_queue</span><span class="p">.</span><span class="n">front</span><span class="p">());</span>
            <span class="n">m_queue</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span>
            <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Get the first value in the queue.
         * Will block until a value is available or invalidate is called.
         * Returns true if a value was successfully written to the out parameter, false otherwise.
         */</span>
        <span class="kt">bool</span> <span class="nf">waitPop</span><span class="p">(</span><span class="n">T</span><span class="o">&amp;</span> <span class="n">out</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">std</span><span class="o">::</span><span class="n">unique_lock</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&gt;</span> <span class="n">lock</span><span class="p">{</span><span class="n">m_mutex</span><span class="p">};</span>
            <span class="n">m_condition</span><span class="p">.</span><span class="n">wait</span><span class="p">(</span><span class="n">lock</span><span class="p">,</span> <span class="p">[</span><span class="k">this</span><span class="p">]()</span>
            <span class="p">{</span>
                <span class="k">return</span> <span class="o">!</span><span class="n">m_queue</span><span class="p">.</span><span class="n">empty</span><span class="p">()</span> <span class="o">||</span> <span class="o">!</span><span class="n">m_valid</span><span class="p">;</span>
            <span class="p">});</span>
            <span class="cm">/*
             * Using the condition in the predicate ensures that spurious wakeups with a valid
             * but empty queue will not proceed, so only need to check for validity before proceeding.
             */</span>
            <span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">m_valid</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="n">out</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">m_queue</span><span class="p">.</span><span class="n">front</span><span class="p">());</span>
            <span class="n">m_queue</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span>
            <span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Push a new value onto the queue.
         */</span>
        <span class="kt">void</span> <span class="nf">push</span><span class="p">(</span><span class="n">T</span> <span class="n">value</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&gt;</span> <span class="n">lock</span><span class="p">{</span><span class="n">m_mutex</span><span class="p">};</span>
            <span class="n">m_queue</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">value</span><span class="p">));</span>
            <span class="n">m_condition</span><span class="p">.</span><span class="n">notify_one</span><span class="p">();</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Check whether or not the queue is empty.
         */</span>
        <span class="kt">bool</span> <span class="n">empty</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="k">const</span>
        <span class="p">{</span>
            <span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&gt;</span> <span class="n">lock</span><span class="p">{</span><span class="n">m_mutex</span><span class="p">};</span>
            <span class="k">return</span> <span class="n">m_queue</span><span class="p">.</span><span class="n">empty</span><span class="p">();</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Clear all items from the queue.
         */</span>
        <span class="kt">void</span> <span class="nf">clear</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&gt;</span> <span class="n">lock</span><span class="p">{</span><span class="n">m_mutex</span><span class="p">};</span>
            <span class="k">while</span><span class="p">(</span><span class="o">!</span><span class="n">m_queue</span><span class="p">.</span><span class="n">empty</span><span class="p">())</span>
            <span class="p">{</span>
                <span class="n">m_queue</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span>
            <span class="p">}</span>
            <span class="n">m_condition</span><span class="p">.</span><span class="n">notify_all</span><span class="p">();</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Invalidate the queue.
         * Used to ensure no conditions are being waited on in waitPop when
         * a thread or the application is trying to exit.
         * The queue is invalid after calling this method and it is an error
         * to continue using a queue after this method has been called.
         */</span>
        <span class="kt">void</span> <span class="nf">invalidate</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&gt;</span> <span class="n">lock</span><span class="p">{</span><span class="n">m_mutex</span><span class="p">};</span>
            <span class="n">m_valid</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
            <span class="n">m_condition</span><span class="p">.</span><span class="n">notify_all</span><span class="p">();</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Returns whether or not this queue is valid.
         */</span>
        <span class="kt">bool</span> <span class="n">isValid</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="k">const</span>
        <span class="p">{</span>
            <span class="n">std</span><span class="o">::</span><span class="n">lock_guard</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">mutex</span><span class="o">&gt;</span> <span class="n">lock</span><span class="p">{</span><span class="n">m_mutex</span><span class="p">};</span>
            <span class="k">return</span> <span class="n">m_valid</span><span class="p">;</span>
        <span class="p">}</span>

    <span class="k">private</span><span class="o">:</span>
        <span class="n">std</span><span class="o">::</span><span class="n">atomic_bool</span> <span class="n">m_valid</span><span class="p">{</span><span class="nb">true</span><span class="p">};</span>
        <span class="k">mutable</span> <span class="n">std</span><span class="o">::</span><span class="n">mutex</span> <span class="n">m_mutex</span><span class="p">;</span>
        <span class="n">std</span><span class="o">::</span><span class="n">queue</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">m_queue</span><span class="p">;</span>
        <span class="n">std</span><span class="o">::</span><span class="n">condition_variable</span> <span class="n">m_condition</span><span class="p">;</span>
    <span class="p">};</span>
<span class="p">}</span>

<span class="cp">#endif
</span></code></pre></div></div>

<p>Most of this is standard fare for a thread-safe class.  We lock a mutex any time we read or write the underlying data, and we expose a simplified interface over a <code class="language-plaintext highlighter-rouge">std::queue</code> in which every pop first checks that a value is available and that the queue is still valid.  This is why <code class="language-plaintext highlighter-rouge">tryPop</code> and <code class="language-plaintext highlighter-rouge">waitPop</code> return a <code class="language-plaintext highlighter-rouge">bool</code> indicating success and only write to the provided out parameter when they succeed.</p>
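<p>One benefit of the bool-plus-out-parameter design is that checking for a value and removing it happen under a single lock.  A separate <code class="language-plaintext highlighter-rouge">empty()</code> check followed by a pop would leave a window in which another thread could take the item.  The sketch below uses a hypothetical cut-down <code class="language-plaintext highlighter-rouge">LockedQueue</code> (only <code class="language-plaintext highlighter-rouge">push</code> and <code class="language-plaintext highlighter-rouge">tryPop</code>) and a hypothetical <code class="language-plaintext highlighter-rouge">drain</code> function to show the resulting non-blocking loop.</p>

```cpp
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical cut-down queue exposing only push and tryPop, to show the
// bool-plus-out-parameter convention in isolation.
template <typename T>
class LockedQueue
{
public:
    void push(T value)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        m_queue.push(std::move(value));
    }

    // Returns false, leaving out untouched, when there is nothing to pop.
    bool tryPop(T& out)
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        if(m_queue.empty())
        {
            return false;
        }
        out = std::move(m_queue.front());
        m_queue.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::queue<T> m_queue;
};

// Pops until tryPop reports failure. Each call tests and retrieves under one
// lock, so no other thread can take the item between the check and the pop.
// Assumes T is default-constructible.
template <typename T>
std::vector<T> drain(LockedQueue<T>& queue)
{
    std::vector<T> items;
    T item{};
    while(queue.tryPop(item))
    {
        items.push_back(std::move(item));
    }
    return items;
}
```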

<p>Each time <code class="language-plaintext highlighter-rouge">push</code> is called with a new task, it calls <code class="language-plaintext highlighter-rouge">notify_one()</code> on the condition variable, which wakes one thread blocked in <code class="language-plaintext highlighter-rouge">waitPop</code>.  That thread reacquires the mutex and re-checks the predicate, and if the queue is non-empty and still valid, it moves the front task into the out parameter and pops it from the queue.</p>
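<p>The same push/notify handshake can be sketched with bare primitives: a consumer thread blocks on the condition variable, and the producer pushes under the lock and calls <code class="language-plaintext highlighter-rouge">notify_one()</code>.  The function <code class="language-plaintext highlighter-rouge">sumFromProducer</code> is a hypothetical test harness, not part of the pool.</p>

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical harness: one consumer waits for `count` values pushed by the
// calling thread and sums them, mirroring ThreadSafeQueue::push / waitPop.
int sumFromProducer(int count)
{
    std::mutex mutex;
    std::condition_variable condition;
    std::queue<int> queue;
    int sum = 0; // only touched by the consumer; read after join()

    std::thread consumer([&]()
    {
        for(int received = 0; received < count; ++received)
        {
            std::unique_lock<std::mutex> lock{mutex};
            // On wake-up the mutex is reacquired and the predicate re-checked.
            condition.wait(lock, [&]() { return !queue.empty(); });
            sum += queue.front();
            queue.pop();
        }
    });

    for(int value = 1; value <= count; ++value)
    {
        std::lock_guard<std::mutex> lock{mutex};
        queue.push(value);
        condition.notify_one(); // wake a single waiter, as push() does
    }

    consumer.join();
    return sum; // 1 + 2 + ... + count
}
```

<p>Because the predicate is evaluated under the lock, no notification is lost even if the producer pushes several values before the consumer first reaches <code class="language-plaintext highlighter-rouge">wait</code>.</p>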

<p>Because <code class="language-plaintext highlighter-rouge">waitPop</code> blocks until the condition variable is notified, the queue also needs a way to release any threads still waiting on it when it is about to be destroyed.  This is the job of the <code class="language-plaintext highlighter-rouge">invalidate()</code> method, which sets the <code class="language-plaintext highlighter-rouge">m_valid</code> member to false and then calls <code class="language-plaintext highlighter-rouge">notify_all()</code> on the condition variable.  Every thread blocked on the condition wakes up, and <code class="language-plaintext highlighter-rouge">waitPop</code> returns <code class="language-plaintext highlighter-rouge">false</code>, indicating to the call site that no work is being returned.</p>
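<p>The shutdown path can be exercised on its own.  The sketch below uses a hypothetical <code class="language-plaintext highlighter-rouge">ShutdownQueue</code> containing only <code class="language-plaintext highlighter-rouge">waitPop</code> and <code class="language-plaintext highlighter-rouge">invalidate</code> (no <code class="language-plaintext highlighter-rouge">push</code>, so the queue stays empty).  It parks several threads in <code class="language-plaintext highlighter-rouge">waitPop</code>, invalidates the queue, and counts how many calls returned <code class="language-plaintext highlighter-rouge">false</code>.</p>

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal queue showing how a validity flag plus notify_all
// releases every blocked waiter at shutdown. push() is omitted for brevity.
class ShutdownQueue
{
public:
    bool waitPop(int& out)
    {
        std::unique_lock<std::mutex> lock{m_mutex};
        m_condition.wait(lock, [this]() { return !m_queue.empty() || !m_valid; });
        if(!m_valid)
        {
            return false; // woken by invalidate(), not by new work
        }
        out = m_queue.front();
        m_queue.pop();
        return true;
    }

    void invalidate()
    {
        std::lock_guard<std::mutex> lock{m_mutex};
        m_valid = false;
        m_condition.notify_all(); // wake every waiter, not just one
    }

private:
    bool m_valid{true}; // guarded by m_mutex
    std::mutex m_mutex;
    std::queue<int> m_queue;
    std::condition_variable m_condition;
};

// Starts numWaiters threads blocked on an empty queue, invalidates it, and
// returns how many waitPop calls reported failure.
int countReleasedWaiters(int numWaiters)
{
    ShutdownQueue queue;
    std::atomic<int> released{0};
    std::vector<std::thread> waiters;

    for(int i = 0; i < numWaiters; ++i)
    {
        waiters.emplace_back([&]()
        {
            int value = 0;
            if(!queue.waitPop(value))
            {
                ++released;
            }
        });
    }

    // Threads that reach waitPop after this see !m_valid and return at once.
    queue.invalidate();
    for(auto& waiter : waiters)
    {
        waiter.join(); // the queue outlives every thread that uses it
    }
    return released.load();
}
```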

<p>Passing a predicate to <code class="language-plaintext highlighter-rouge">wait</code> also protects us from spurious wakeups [3].  If a thread wakes up without the predicate being satisfied, it simply goes back to waiting.</p>
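<p>The predicate overload of <code class="language-plaintext highlighter-rouge">std::condition_variable::wait</code> is specified to behave like a loop around the plain <code class="language-plaintext highlighter-rouge">wait</code>, which is what makes it robust to spurious wakeups.  The sketch below writes that loop out explicitly.  The names <code class="language-plaintext highlighter-rouge">waitWithPredicate</code> and <code class="language-plaintext highlighter-rouge">waitForFlag</code> are hypothetical.</p>

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Spelled-out equivalent of condition.wait(lock, pred): a spurious wakeup
// just re-tests the predicate and goes back to sleep.
template <typename Predicate>
void waitWithPredicate(std::condition_variable& condition,
                       std::unique_lock<std::mutex>& lock, Predicate pred)
{
    while(!pred())
    {
        condition.wait(lock); // may return spuriously; the loop re-checks pred
    }
}

// Hypothetical harness: another thread sets `ready` under the mutex and
// notifies; the waiter returns only once it observes ready == true.
bool waitForFlag()
{
    std::mutex mutex;
    std::condition_variable condition;
    bool ready = false; // guarded by mutex

    std::thread setter([&]()
    {
        std::lock_guard<std::mutex> lock{mutex};
        ready = true;
        condition.notify_one();
    });

    std::unique_lock<std::mutex> lock{mutex};
    waitWithPredicate(condition, lock, [&]() { return ready; });
    const bool observed = ready;
    lock.unlock();
    setter.join();
    return observed;
}
```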

<h2 id="the-thread-pool">The Thread Pool</h2>

<p>The implementation of the thread pool is shown below.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * The ThreadPool class.
 * Keeps a set of threads constantly waiting to execute incoming jobs.
 */</span>
<span class="cp">#pragma once
</span>
<span class="cp">#ifndef THREADPOOL_HPP
#define THREADPOOL_HPP
</span>
<span class="cp">#include</span> <span class="cpf">"ThreadSafeQueue.hpp"</span><span class="cp">
</span>
<span class="cp">#include</span> <span class="cpf">&lt;algorithm&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;atomic&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;cstdint&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;functional&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;future&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;memory&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;thread&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;type_traits&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;utility&gt;</span><span class="cp">
#include</span> <span class="cpf">&lt;vector&gt;</span><span class="cp">
</span>
<span class="k">namespace</span> <span class="n">MyNamespace</span>
<span class="p">{</span>
    <span class="k">class</span> <span class="nc">ThreadPool</span>
    <span class="p">{</span>
    <span class="nl">private:</span>
        <span class="k">class</span> <span class="nc">IThreadTask</span>
        <span class="p">{</span>
        <span class="nl">public:</span>
            <span class="n">IThreadTask</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
            <span class="k">virtual</span> <span class="o">~</span><span class="n">IThreadTask</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
            <span class="n">IThreadTask</span><span class="p">(</span><span class="k">const</span> <span class="n">IThreadTask</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
            <span class="n">IThreadTask</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="k">const</span> <span class="n">IThreadTask</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
            <span class="n">IThreadTask</span><span class="p">(</span><span class="n">IThreadTask</span><span class="o">&amp;&amp;</span> <span class="n">other</span><span class="p">)</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
            <span class="n">IThreadTask</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="n">IThreadTask</span><span class="o">&amp;&amp;</span> <span class="n">other</span><span class="p">)</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>

            <span class="cm">/**
             * Run the task.
             */</span>
            <span class="k">virtual</span> <span class="kt">void</span> <span class="n">execute</span><span class="p">()</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
        <span class="p">};</span>

        <span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">Func</span><span class="p">&gt;</span>
        <span class="k">class</span> <span class="nc">ThreadTask</span><span class="o">:</span> <span class="k">public</span> <span class="n">IThreadTask</span>
        <span class="p">{</span>
        <span class="nl">public:</span>
            <span class="n">ThreadTask</span><span class="p">(</span><span class="n">Func</span><span class="o">&amp;&amp;</span> <span class="n">func</span><span class="p">)</span>
                <span class="o">:</span><span class="n">m_func</span><span class="p">{</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">func</span><span class="p">)}</span>
            <span class="p">{</span>
            <span class="p">}</span>

            <span class="o">~</span><span class="n">ThreadTask</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="k">override</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
            <span class="n">ThreadTask</span><span class="p">(</span><span class="k">const</span> <span class="n">ThreadTask</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
            <span class="n">ThreadTask</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="k">const</span> <span class="n">ThreadTask</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
            <span class="n">ThreadTask</span><span class="p">(</span><span class="n">ThreadTask</span><span class="o">&amp;&amp;</span> <span class="n">other</span><span class="p">)</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
            <span class="n">ThreadTask</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="n">ThreadTask</span><span class="o">&amp;&amp;</span> <span class="n">other</span><span class="p">)</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>

            <span class="cm">/**
             * Run the task.
             */</span>
            <span class="kt">void</span> <span class="nf">execute</span><span class="p">()</span> <span class="k">override</span>
            <span class="p">{</span>
                <span class="n">m_func</span><span class="p">();</span>
            <span class="p">}</span>

        <span class="k">private</span><span class="o">:</span>
            <span class="n">Func</span> <span class="n">m_func</span><span class="p">;</span>
        <span class="p">};</span>

    <span class="k">public</span><span class="o">:</span>
        <span class="cm">/**
         * A wrapper around a std::future that adds the behavior of futures returned from std::async.
         * Specifically, this object will block and wait for execution to finish before going out of scope.
         */</span>
        <span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">T</span><span class="p">&gt;</span>
        <span class="k">class</span> <span class="nc">TaskFuture</span>
        <span class="p">{</span>
        <span class="nl">public:</span>
            <span class="n">TaskFuture</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">future</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&amp;&amp;</span> <span class="n">future</span><span class="p">)</span>
                <span class="o">:</span><span class="n">m_future</span><span class="p">{</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">future</span><span class="p">)}</span>
            <span class="p">{</span>
            <span class="p">}</span>

            <span class="n">TaskFuture</span><span class="p">(</span><span class="k">const</span> <span class="n">TaskFuture</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
            <span class="n">TaskFuture</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="k">const</span> <span class="n">TaskFuture</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>
            <span class="n">TaskFuture</span><span class="p">(</span><span class="n">TaskFuture</span><span class="o">&amp;&amp;</span> <span class="n">other</span><span class="p">)</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
            <span class="n">TaskFuture</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="n">TaskFuture</span><span class="o">&amp;&amp;</span> <span class="n">other</span><span class="p">)</span> <span class="o">=</span> <span class="k">default</span><span class="p">;</span>
            <span class="o">~</span><span class="n">TaskFuture</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="k">if</span><span class="p">(</span><span class="n">m_future</span><span class="p">.</span><span class="n">valid</span><span class="p">())</span>
                <span class="p">{</span>
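                    <span class="c1">// wait(), not get(): get() would rethrow a task's exception from this destructor, which is noexcept, and call std::terminate.</span>
                    <span class="n">m_future</span><span class="p">.</span><span class="n">wait</span><span class="p">();</span>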
                <span class="p">}</span>
            <span class="p">}</span>

            <span class="k">auto</span> <span class="nf">get</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="k">return</span> <span class="n">m_future</span><span class="p">.</span><span class="n">get</span><span class="p">();</span>
            <span class="p">}</span>


        <span class="k">private</span><span class="o">:</span>
            <span class="n">std</span><span class="o">::</span><span class="n">future</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span> <span class="n">m_future</span><span class="p">;</span>
        <span class="p">};</span>

    <span class="k">public</span><span class="o">:</span>
        <span class="cm">/**
         * Constructor.
         */</span>
        <span class="n">ThreadPool</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
            <span class="o">:</span><span class="n">ThreadPool</span><span class="p">{</span><span class="n">std</span><span class="o">::</span><span class="n">max</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="kr">thread</span><span class="o">::</span><span class="n">hardware_concurrency</span><span class="p">(),</span> <span class="mi">2u</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1u</span><span class="p">}</span>
        <span class="p">{</span>
            <span class="cm">/*
             * Always create at least one thread.  If hardware_concurrency() returns 0,
              * subtracting one would wrap around to UINT_MAX, so take the maximum of
             * hardware_concurrency() and 2 before subtracting 1.
             */</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Constructor.
         */</span>
        <span class="k">explicit</span> <span class="n">ThreadPool</span><span class="p">(</span><span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="kt">uint32_t</span> <span class="n">numThreads</span><span class="p">)</span>
            <span class="o">:</span><span class="n">m_done</span><span class="p">{</span><span class="nb">false</span><span class="p">},</span>
            <span class="n">m_workQueue</span><span class="p">{},</span>
            <span class="n">m_threads</span><span class="p">{}</span>
        <span class="p">{</span>
            <span class="k">try</span>
            <span class="p">{</span>
                <span class="k">for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0u</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">numThreads</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
                <span class="p">{</span>
                    <span class="n">m_threads</span><span class="p">.</span><span class="n">emplace_back</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ThreadPool</span><span class="o">::</span><span class="n">worker</span><span class="p">,</span> <span class="k">this</span><span class="p">);</span>
                <span class="p">}</span>
            <span class="p">}</span>
            <span class="k">catch</span><span class="p">(...)</span>
            <span class="p">{</span>
                <span class="n">destroy</span><span class="p">();</span>
                <span class="k">throw</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Non-copyable.
         */</span>
        <span class="n">ThreadPool</span><span class="p">(</span><span class="k">const</span> <span class="n">ThreadPool</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>

        <span class="cm">/**
         * Non-assignable.
         */</span>
        <span class="n">ThreadPool</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">=</span><span class="p">(</span><span class="k">const</span> <span class="n">ThreadPool</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">)</span> <span class="o">=</span> <span class="k">delete</span><span class="p">;</span>

        <span class="cm">/**
         * Destructor.
         */</span>
        <span class="o">~</span><span class="n">ThreadPool</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">destroy</span><span class="p">();</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Submit a job to be run by the thread pool.
         */</span>
        <span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">Func</span><span class="p">,</span> <span class="k">typename</span><span class="o">...</span> <span class="nc">Args</span><span class="p">&gt;</span>
        <span class="k">auto</span> <span class="nf">submit</span><span class="p">(</span><span class="n">Func</span><span class="o">&amp;&amp;</span> <span class="n">func</span><span class="p">,</span> <span class="n">Args</span><span class="o">&amp;&amp;</span><span class="p">...</span> <span class="n">args</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="k">auto</span> <span class="n">boundTask</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">bind</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">Func</span><span class="o">&gt;</span><span class="p">(</span><span class="n">func</span><span class="p">),</span> <span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">Args</span><span class="o">&gt;</span><span class="p">(</span><span class="n">args</span><span class="p">)...);</span>
            <span class="c1">// std::result_of_t is deprecated in C++17 and removed in C++20; there, use std::invoke_result_t&lt;decltype(boundTask)&gt; instead.</span>
            <span class="k">using</span> <span class="n">ResultType</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">result_of_t</span><span class="o">&lt;</span><span class="k">decltype</span><span class="p">(</span><span class="n">boundTask</span><span class="p">)()</span><span class="o">&gt;</span><span class="p">;</span>
            <span class="k">using</span> <span class="n">PackagedTask</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">packaged_task</span><span class="o">&lt;</span><span class="n">ResultType</span><span class="p">()</span><span class="o">&gt;</span><span class="p">;</span>
            <span class="k">using</span> <span class="n">TaskType</span> <span class="o">=</span> <span class="n">ThreadTask</span><span class="o">&lt;</span><span class="n">PackagedTask</span><span class="o">&gt;</span><span class="p">;</span>
            
            <span class="n">PackagedTask</span> <span class="n">task</span><span class="p">{</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">boundTask</span><span class="p">)};</span>
            <span class="n">TaskFuture</span><span class="o">&lt;</span><span class="n">ResultType</span><span class="o">&gt;</span> <span class="n">result</span><span class="p">{</span><span class="n">task</span><span class="p">.</span><span class="n">get_future</span><span class="p">()};</span>
            <span class="n">m_workQueue</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">make_unique</span><span class="o">&lt;</span><span class="n">TaskType</span><span class="o">&gt;</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">move</span><span class="p">(</span><span class="n">task</span><span class="p">)));</span>
            <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
        <span class="p">}</span>

    <span class="k">private</span><span class="o">:</span>
        <span class="cm">/**
         * Constantly running function each thread uses to acquire work items from the queue.
         */</span>
        <span class="kt">void</span> <span class="nf">worker</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="k">while</span><span class="p">(</span><span class="o">!</span><span class="n">m_done</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o">&lt;</span><span class="n">IThreadTask</span><span class="o">&gt;</span> <span class="n">pTask</span><span class="p">{</span><span class="nb">nullptr</span><span class="p">};</span>
                <span class="k">if</span><span class="p">(</span><span class="n">m_workQueue</span><span class="p">.</span><span class="n">waitPop</span><span class="p">(</span><span class="n">pTask</span><span class="p">))</span>
                <span class="p">{</span>
                    <span class="n">pTask</span><span class="o">-&gt;</span><span class="n">execute</span><span class="p">();</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Invalidates the queue and joins all running threads.
         */</span>
        <span class="kt">void</span> <span class="nf">destroy</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">m_done</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
            <span class="n">m_workQueue</span><span class="p">.</span><span class="n">invalidate</span><span class="p">();</span>
            <span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="o">&amp;</span> <span class="kr">thread</span> <span class="o">:</span> <span class="n">m_threads</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="k">if</span><span class="p">(</span><span class="kr">thread</span><span class="p">.</span><span class="n">joinable</span><span class="p">())</span>
                <span class="p">{</span>
                    <span class="kr">thread</span><span class="p">.</span><span class="n">join</span><span class="p">();</span>
                <span class="p">}</span>
            <span class="p">}</span>
        <span class="p">}</span>

    <span class="k">private</span><span class="o">:</span>
        <span class="n">std</span><span class="o">::</span><span class="n">atomic_bool</span> <span class="n">m_done</span><span class="p">;</span>
        <span class="n">ThreadSafeQueue</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">unique_ptr</span><span class="o">&lt;</span><span class="n">IThreadTask</span><span class="o">&gt;&gt;</span> <span class="n">m_workQueue</span><span class="p">;</span>
        <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="kr">thread</span><span class="o">&gt;</span> <span class="n">m_threads</span><span class="p">;</span>
    <span class="p">};</span>

    <span class="k">namespace</span> <span class="n">DefaultThreadPool</span>
    <span class="p">{</span>
        <span class="cm">/**
         * Get the default thread pool for the application.
         * This pool is created with std::thread::hardware_concurrency() - 1 threads.
         */</span>
        <span class="kr">inline</span> <span class="n">ThreadPool</span><span class="o">&amp;</span> <span class="n">getThreadPool</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="k">static</span> <span class="n">ThreadPool</span> <span class="n">defaultPool</span><span class="p">;</span>
            <span class="k">return</span> <span class="n">defaultPool</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="cm">/**
         * Submit a job to the default thread pool.
         */</span>
        <span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">Func</span><span class="p">,</span> <span class="k">typename</span><span class="o">...</span> <span class="nc">Args</span><span class="p">&gt;</span>
        <span class="kr">inline</span> <span class="k">auto</span> <span class="nf">submitJob</span><span class="p">(</span><span class="n">Func</span><span class="o">&amp;&amp;</span> <span class="n">func</span><span class="p">,</span> <span class="n">Args</span><span class="o">&amp;&amp;</span><span class="p">...</span> <span class="n">args</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="k">return</span> <span class="n">getThreadPool</span><span class="p">().</span><span class="n">submit</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">Func</span><span class="o">&gt;</span><span class="p">(</span><span class="n">func</span><span class="p">),</span> <span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o">&lt;</span><span class="n">Args</span><span class="o">&gt;</span><span class="p">(</span><span class="n">args</span><span class="p">)...);</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="cp">#endif
</span></code></pre></div></div>

<p>There are a few pieces to touch on here.  First, we have an <code class="language-plaintext highlighter-rouge">IThreadTask</code> interface that defines an <code class="language-plaintext highlighter-rouge">execute()</code> pure virtual function.  The reason for this interface is simply so we can maintain a collection of them in one container type (the <code class="language-plaintext highlighter-rouge">ThreadSafeQueue&lt;T&gt;</code>).  <code class="language-plaintext highlighter-rouge">ThreadTask&lt;T&gt;</code> implements <code class="language-plaintext highlighter-rouge">IThreadTask</code> and takes a callable type <code class="language-plaintext highlighter-rouge">T</code> for its template parameter.</p>

<p>When constructing the thread pool, we attempt to read the number of hardware threads available to the system by using <code class="language-plaintext highlighter-rouge">std::thread::hardware_concurrency()</code>.  We always ensure the pool is started with at least one thread running, and ideally started with <code class="language-plaintext highlighter-rouge">hardware_concurrency - 1</code> threads running.  The reason for the minus one will be discussed later.  For each thread available, we construct a <code class="language-plaintext highlighter-rouge">std::thread</code> object that runs the private member function <code class="language-plaintext highlighter-rouge">worker()</code>.</p>
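<p>The thread-count rule above can be shown with a small helper. The name <code class="language-plaintext highlighter-rouge">workerThreadCount</code> is hypothetical and is not the pool's own code. One detail deserves a guard: <code class="language-plaintext highlighter-rouge">std::thread::hardware_concurrency()</code> may return 0 when the value cannot be determined, and a naive unsigned <code class="language-plaintext highlighter-rouge">- 1</code> would then wrap around.</p>

```cpp
#include <thread>

// Hypothetical helper (not part of the original ThreadPool): computes how many
// worker threads to start from the reported hardware thread count.
// hardware_concurrency() may return 0 if the value is not computable, so the
// subtraction is guarded against unsigned wrap-around and at least one worker
// is always started.
unsigned workerThreadCount(unsigned hardwareThreads)
{
    return hardwareThreads > 1u ? hardwareThreads - 1u : 1u;
}

// Convenience overload that queries the current machine.
unsigned workerThreadCount()
{
    return workerThreadCount(std::thread::hardware_concurrency());
}
```

<p>On an eight-thread machine this yields seven workers. On a single-core machine, or one that reports 0, it still yields one.</p>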

<p>The worker function’s only job is to repeatedly check the queue for work and execute any task it finds, until the pool is shut down.  Because we designed the queue to be thread-safe, no additional synchronization is needed here.  The thread enters the loop, reaches <code class="language-plaintext highlighter-rouge">waitPop</code>, and then either pops and executes a queued task or blocks until the <code class="language-plaintext highlighter-rouge">submit</code> function pushes one.  If <code class="language-plaintext highlighter-rouge">waitPop</code> returns true, we know <code class="language-plaintext highlighter-rouge">pTask</code> has been written to and can execute it immediately.  If it returns false, the queue has most likely been invalidated.</p>

<p>The <code class="language-plaintext highlighter-rouge">submit</code> function is the public-facing interface of the thread pool.  It starts by creating a few handy type definitions that make the actual implementation easier to follow.  First, the provided function and its arguments are bound into a callable object with no parameters using <code class="language-plaintext highlighter-rouge">std::bind</code>.  This lets <code class="language-plaintext highlighter-rouge">ThreadTask&lt;T&gt;</code> invoke its functor in <code class="language-plaintext highlighter-rouge">execute</code> without knowing the arguments that came with the original function.  We then create a <code class="language-plaintext highlighter-rouge">std::packaged_task</code> from the bound task and extract its <code class="language-plaintext highlighter-rouge">std::future</code> before pushing the task onto the queue.  Once again, the queue's thread-safe implementation means no additional synchronization is needed.  You’ll notice the <code class="language-plaintext highlighter-rouge">std::future</code> returned from the <code class="language-plaintext highlighter-rouge">std::packaged_task</code> is wrapped in a class called <code class="language-plaintext highlighter-rouge">TaskFuture&lt;T&gt;</code>.  This was a deliberate design decision based on how I intend to use the pool in my specific application.  I wanted the futures to mimic the way <code class="language-plaintext highlighter-rouge">std::async</code> futures work; specifically, I wanted them to block in their destructor until their work is complete.  <code class="language-plaintext highlighter-rouge">std::packaged_task</code> futures don’t do this out of the box, so we give them a simple wrapper to emulate the behavior [2].  
Like <code class="language-plaintext highlighter-rouge">std::future</code>, <code class="language-plaintext highlighter-rouge">TaskFuture</code> is move-only.  This means the synchronization does not have to happen in the function that submitted the task, as long as the <code class="language-plaintext highlighter-rouge">TaskFuture</code> is moved out of that function (for example, by returning it to the caller).</p>
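<p>The blocking-on-destruction behavior can be sketched with a small wrapper around <code class="language-plaintext highlighter-rouge">std::future</code>. The class name <code class="language-plaintext highlighter-rouge">BlockingFuture</code> is hypothetical. It follows the idea described above but does not reproduce <code class="language-plaintext highlighter-rouge">TaskFuture&lt;T&gt;</code> exactly.</p>

```cpp
#include <future>
#include <utility>

// Minimal sketch of a future wrapper that, like futures returned by std::async,
// waits for its task to finish when it is destroyed. Hypothetical name; it
// illustrates the idea behind TaskFuture<T> rather than its exact code.
template <typename T>
class BlockingFuture
{
public:
    explicit BlockingFuture(std::future<T>&& future) : m_future{std::move(future)} {}

    // Move-only, like std::future.
    BlockingFuture(const BlockingFuture&) = delete;
    BlockingFuture& operator=(const BlockingFuture&) = delete;
    BlockingFuture(BlockingFuture&&) = default;

    BlockingFuture& operator=(BlockingFuture&& other)
    {
        if (this != &other)
        {
            // Finish waiting on the task we currently own before replacing it.
            if (m_future.valid())
            {
                m_future.wait();
            }
            m_future = std::move(other.m_future);
        }
        return *this;
    }

    ~BlockingFuture()
    {
        // A moved-from (or already consumed) std::future is not valid(),
        // so only the current owner of the shared state waits here.
        if (m_future.valid())
        {
            m_future.wait();
        }
    }

    T get() { return m_future.get(); }

private:
    std::future<T> m_future;
};
```

<p>A moved-from <code class="language-plaintext highlighter-rouge">std::future</code> is no longer <code class="language-plaintext highlighter-rouge">valid()</code>. As a result, only the wrapper that currently owns the shared state waits, and the future can safely be returned from the function that submitted the task.</p>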

<p>The queue’s <code class="language-plaintext highlighter-rouge">invalidate</code> method is called from the thread pool’s <code class="language-plaintext highlighter-rouge">destroy()</code> method.  That method runs from the destructor, and also from the constructor if an exception is thrown while the threads are being created.  <code class="language-plaintext highlighter-rouge">invalidate</code> is called <strong>after</strong> the thread pool’s done flag is set to true and <strong>before</strong> the threads are joined.  This order ensures that the threads know to exit their worker functions instead of trying to obtain more work from the invalidated queue.  Because of how the predicate on the queue’s condition variable is set up, re-entering <code class="language-plaintext highlighter-rouge">waitPop</code> on an invalidated queue is not an error, since it simply returns false.  It is, however, a waste of time.</p>

<p>An optional nicety I decided to throw in is the <code class="language-plaintext highlighter-rouge">DefaultThreadPool</code> namespace.  It creates a thread pool with <code class="language-plaintext highlighter-rouge">std::thread::hardware_concurrency() - 1</code> threads (but at least one), as discussed previously.  That pool is accessible from anywhere in the application that includes the thread pool header.  I prefer this to having each subsystem own its own thread pool, but there’s nothing wrong with creating thread pool instances through the constructors, either.</p>

<h2 id="submitting-work-to-the-thread-pool">Submitting Work to the Thread Pool</h2>

<p>With the above in place, submitting work is as simple as including the thread pool’s header file and calling <code class="language-plaintext highlighter-rouge">DefaultThreadPool::submitJob</code> (or a pool instance’s <code class="language-plaintext highlighter-rouge">submit</code> function) with a callable object and, optionally, arguments to be passed to it.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">auto</span> <span class="n">taskFuture</span> <span class="o">=</span> <span class="n">MyNamespace</span><span class="o">::</span><span class="n">DefaultThreadPool</span><span class="o">::</span><span class="n">submitJob</span><span class="p">([]()</span>
<span class="p">{</span>
    <span class="n">lengthyProcess</span><span class="p">();</span>
<span class="p">});</span>

<span class="k">auto</span> <span class="n">taskFuture2</span> <span class="o">=</span> <span class="n">MyNamespace</span><span class="o">::</span><span class="n">DefaultThreadPool</span><span class="o">::</span><span class="n">submitJob</span><span class="p">([](</span><span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="kt">float</span> <span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">lengthyProcessWithArguments</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">);</span>
<span class="p">},</span> <span class="mi">5</span><span class="p">,</span> <span class="mf">10.0</span><span class="n">f</span><span class="p">);</span>
</code></pre></div></div>

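<p>If an argument should be passed by reference, remember to wrap it with <code class="language-plaintext highlighter-rouge">std::ref</code> or <code class="language-plaintext highlighter-rouge">std::cref</code>.  Because <code class="language-plaintext highlighter-rouge">submit</code> binds the arguments with <code class="language-plaintext highlighter-rouge">std::bind</code>, which stores copies of them, the task would otherwise operate on a copy.  The referenced object must also outlive the task.</p>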

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">MyObject</span> <span class="n">obj</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">taskFuture</span> <span class="o">=</span> <span class="n">MyNamespace</span><span class="o">::</span><span class="n">DefaultThreadPool</span><span class="o">::</span><span class="n">submitJob</span><span class="p">([](</span><span class="k">const</span> <span class="n">MyObject</span><span class="o">&amp;</span> <span class="n">object</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">lengthyProcessThatNeedsToReadObject</span><span class="p">(</span><span class="n">object</span><span class="p">);</span>
<span class="p">},</span> <span class="n">std</span><span class="o">::</span><span class="n">cref</span><span class="p">(</span><span class="n">obj</span><span class="p">));</span>
</code></pre></div></div>
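<p>The reason for the <code class="language-plaintext highlighter-rouge">std::ref</code> rule can be shown in isolation. This standalone demo does not use the pool; the function names are hypothetical. It shows that <code class="language-plaintext highlighter-rouge">std::bind</code> stores a copy of a plain argument but a reference to one wrapped in <code class="language-plaintext highlighter-rouge">std::ref</code>.</p>

```cpp
#include <functional>

// Standalone illustration (not pool code): std::bind stores *copies* of its
// bound arguments unless they are wrapped in a std::reference_wrapper.
void increment(int& value) { ++value; }

// Binds without std::ref: the bound object holds its own copy of 'counter'.
int callBoundByCopy(int counter)
{
    auto bound = std::bind(increment, counter);
    bound();            // increments the copy stored inside 'bound'
    return counter;     // unchanged
}

// Binds with std::ref: the bound object refers to the caller's variable.
int callBoundByRef(int counter)
{
    auto bound = std::bind(increment, std::ref(counter));
    bound();            // increments 'counter' itself
    return counter;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">callBoundByCopy(5)</code> returns 5, while <code class="language-plaintext highlighter-rouge">callBoundByRef(5)</code> returns 6. A job submitted without <code class="language-plaintext highlighter-rouge">std::ref</code> behaves the same way: any change it makes lands on its private copy.</p>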

<h1 id="does-it-work">Does It Work?</h1>

<p>To ensure the thread pool and backing queue work not only in ideal cases, but also when work is submitted faster than the threads can take it on, we can write a little program that submits a bunch of jobs that sleep for a while and then synchronizes on them.  My machine reports eight as the result of <code class="language-plaintext highlighter-rouge">std::thread::hardware_concurrency()</code>, so the default pool is created with seven threads.  Each task simply puts the executing thread to sleep for one second and then finishes.  I’ll submit twenty-one of these jobs to the pool.  Executed serially, the jobs would take about twenty-one seconds.  With seven pooled threads, and everything working well, they should all complete in about three seconds.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="k">namespace</span> <span class="n">MyNamespace</span><span class="p">;</span>
<span class="n">Timer</span> <span class="n">saturationTimer</span><span class="p">;</span>
<span class="k">const</span> <span class="k">auto</span> <span class="n">startTime</span> <span class="o">=</span> <span class="n">saturationTimer</span><span class="p">.</span><span class="n">tick</span><span class="p">();</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">ThreadPool</span><span class="o">::</span><span class="n">TaskFuture</span><span class="o">&lt;</span><span class="kt">void</span><span class="o">&gt;&gt;</span> <span class="n">v</span><span class="p">;</span>
<span class="k">for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0u</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">21u</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">v</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">DefaultThreadPool</span><span class="o">::</span><span class="n">submitJob</span><span class="p">([]()</span>
    <span class="p">{</span>
        <span class="n">std</span><span class="o">::</span><span class="n">this_thread</span><span class="o">::</span><span class="n">sleep_for</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">seconds</span><span class="p">(</span><span class="mi">1</span><span class="p">));</span>
    <span class="p">}));</span>
<span class="p">}</span>
<span class="k">for</span><span class="p">(</span><span class="k">auto</span><span class="o">&amp;</span> <span class="n">item</span><span class="o">:</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">item</span><span class="p">.</span><span class="n">get</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">const</span> <span class="k">auto</span> <span class="n">dt</span> <span class="o">=</span> <span class="n">saturationTimer</span><span class="p">.</span><span class="n">tick</span><span class="p">()</span> <span class="o">-</span> <span class="n">startTime</span><span class="p">;</span>
</code></pre></div></div>

<p>Running the above code on my machine, the result is just about what would be expected, averaging around 3.005 seconds over a dozen runs.</p>

<h1 id="about-the-number-of-pooled-threads">About the Number of Pooled Threads</h1>

<p>Earlier I mentioned that I start the thread pool with <code class="language-plaintext highlighter-rouge">std::thread::hardware_concurrency() - 1</code> threads.  The reason is simple: the thread calling into the thread pool is a perfectly valid thread to do work on while it waits for the results of submitted tasks.  The example in the Does It Work? section notwithstanding, submitting a batch of jobs and then idly waiting for them to complete is hardly optimal.  It makes more sense to have the thread pool execute up to NumThreads - 1 jobs while the main thread does whatever work it can in the meantime.  With a task-based setup like this, the best approach is usually to split the workload evenly across all available threads, including the calling thread.</p>
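<p>The idea of the calling thread doing its share can be sketched as follows. To keep the example self-contained, it uses <code class="language-plaintext highlighter-rouge">std::async</code> with <code class="language-plaintext highlighter-rouge">std::launch::async</code> as a stand-in for <code class="language-plaintext highlighter-rouge">DefaultThreadPool::submitJob</code>; this substitution is an assumption, and the pool's own futures would be used in practice. <code class="language-plaintext highlighter-rouge">parallelSum</code> is a hypothetical name.</p>

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sketch: split the range into numWorkers + 1 chunks, hand numWorkers of them
// off, and let the calling thread process the last chunk itself before waiting.
// std::async stands in for the thread pool here (an assumption).
long long parallelSum(const std::vector<int>& data, std::size_t numWorkers)
{
    const std::size_t numChunks = numWorkers + 1;
    const std::size_t chunkSize = data.size() / numChunks;

    auto sumRange = [&data](std::size_t begin, std::size_t end)
    {
        return std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                               data.begin() + static_cast<std::ptrdiff_t>(end), 0LL);
    };

    std::vector<std::future<long long>> futures;
    for (std::size_t i = 0; i < numWorkers; ++i)
    {
        futures.push_back(std::async(std::launch::async, sumRange,
                                     i * chunkSize, (i + 1) * chunkSize));
    }

    // The calling thread takes the final chunk, including any remainder.
    long long total = sumRange(numWorkers * chunkSize, data.size());

    // Only now does the caller block on the handed-off work.
    for (auto& f : futures)
    {
        total += f.get();
    }
    return total;
}
```

<p>With seven workers, the range is split into eight chunks. The caller sums its own chunk instead of sitting idle and only then collects the other seven results.</p>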

<h1 id="conclusion">Conclusion</h1>

<p>This post has discussed what a thread pool is, why thread pools are useful, and how to get started implementing one.  The provided thread pool could very likely be made faster by specializing it to avoid memory allocations on job submission.  In my use cases, however, I typically make sure the submitted jobs are large enough that the time gained by running them in parallel outweighs the cost of allocating and deallocating memory.  Your mileage may vary, but at the very least you should have a solid starting point for customizing a thread pool to fit your exact needs.</p>

<h1 id="acknowledgements">Acknowledgements</h1>

<p>Thank you to the members of <a href="https://www.reddit.com/r/cpp/">/r/cpp</a> who helped with code review and provided feedback.</p>

<h1 id="references">References</h1>

<p>[1] Williams, Anthony.  <em>C++ Concurrency in Action:  Practical Multithreading</em>.  ISBN:  9781933988771</p>

<p>[2] <a href="http://scottmeyers.blogspot.com/2013/03/stdfutures-from-stdasync-arent-special.html">http://scottmeyers.blogspot.com/2013/03/stdfutures-from-stdasync-arent-special.html</a></p>

<p>[3] <a href="http://en.cppreference.com/w/cpp/thread/condition_variable/wait">http://en.cppreference.com/w/cpp/thread/condition_variable/wait</a></p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[One of the major benefits provided by the new generation of graphics APIs is much better support for multithreaded command list generation and submission.  It's not uncommon for computers nowadays to contain 2, 4, 8, or even 16 core processors.  The goal of the solution in this post is to ensure we can use the power our CPU provides, not just for generating graphics command lists, but for any task that can be easily parallelized.]]></summary></entry><entry><title type="html">Screen Space Glossy Reflections Demo Available</title><link href="https://willpgfx.com/2015/12/screen-space-glossy-reflections-demo-available/" rel="alternate" type="text/html" title="Screen Space Glossy Reflections Demo Available" /><published>2015-12-16T00:00:00-05:00</published><updated>2015-12-16T00:00:00-05:00</updated><id>https://willpgfx.com/2015/12/screen-space-glossy-reflections-demo-available</id><content type="html" xml:base="https://willpgfx.com/2015/12/screen-space-glossy-reflections-demo-available/"><![CDATA[<h1 id="its-been-a-while">It’s Been a While</h1>

<p>Earlier this year I wrote a post discussing an implementation of <a href="/2015/07/screen-space-glossy-reflections/">real-time screen space glossy reflections</a>.  The post has received a lot of positive feedback.  Since it went up, I’ve had some very interesting conversations with various individuals about theory, details, shortcomings, and everything in between.  The response has been great, and I appreciate the community’s interest.  One request I’ve received a few times is for a working demo that users could play with to get a better feel for the effect in action.  I had originally hoped to finish updating the engine to support DirectX 12 before releasing anything.  That update is probably about 90% done, but some areas still need work and my time lately has been limited.</p>

<p>Thankfully, it’s the year 2015 (for a little while longer) and we have this magical thing called source control.  I’ve taken a tag I created right before the DX 12 update began and modified it to provide a small demo for anyone who has been waiting on it.  The good news is that it’s entirely DirectX 11-based, so hardware support will be much broader than for a DX 12 solution.  The downside is that the demo does not include a few improvements I’ve made since then, especially around blending missed ray hits with fallback solutions.  I should get a chance to release a demo with the new features once things settle down a bit, and all will be right with the world.</p>

<h1 id="demo-controls">Demo Controls</h1>

<p>Once the scene loads, anyone familiar with first-person applications should feel more or less at home with the basics.  <code class="language-plaintext highlighter-rouge">A</code>, <code class="language-plaintext highlighter-rouge">W</code>, <code class="language-plaintext highlighter-rouge">S</code>, and <code class="language-plaintext highlighter-rouge">D</code> control movement: <code class="language-plaintext highlighter-rouge">W</code> and <code class="language-plaintext highlighter-rouge">S</code> move the camera forwards and backwards, and <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">D</code> strafe it left and right.  The mouse controls where the camera looks.  The camera is not glued to the ground, so moving forward follows whatever direction the camera is facing.  <code class="language-plaintext highlighter-rouge">J</code> and <code class="language-plaintext highlighter-rouge">K</code> control the floor roughness value, with <code class="language-plaintext highlighter-rouge">J</code> making the floor smoother and <code class="language-plaintext highlighter-rouge">K</code> making it rougher.  A uniform roughness texture is applied over the entire floor.  A real-world application would use an artist-authored texture instead, which would make the results much more convincing.  <code class="language-plaintext highlighter-rouge">Q</code> and <code class="language-plaintext highlighter-rouge">E</code> change the time of day.</p>

<p>The <code class="language-plaintext highlighter-rouge">Esc</code> key is used to quit the application.  To restart the scene without exiting, use the left, right, or up arrow keys.</p>

<h1 id="some-ugliness">Some Ugliness</h1>

<p>The fallback environment maps are set up exactly as they were in the original post.  This means that a large area of the scene only has the global, undetailed environment map to fall back to.  The effect is quite noticeable underneath the characters in the scene's starting area.  If you move directly forward from the starting point, you’ll pass through a few walls and end up in an enclosed hallway-type structure.  This area does have localized environment maps to fall back to when rays miss, and the results are cleaner.  As stated in the first section, the demo does not include the more recent blending improvements.</p>

<p>Besides the shortcomings of the screen space approach discussed in the original blog post, the stack of boxes in the scene still uses the engine’s old physics and collision system.  In the latest version, all of this has been <a href="/2015/05/stacks-on-stacks/">updated to use the Bullet Physics implementation</a>.  If you choose to knock the stack down here (clicking the left mouse button throws a ball), be aware that you’re likely to see quite a bit of oddness.  That being said - go for it, it’s always fun to knock things over!</p>

<p>Also, ambient light is handled by sampling from environment maps placed throughout the scene.  To keep their maintenance from becoming a bottleneck, only one is ever updated per frame, and they’re only updated when the lighting changes.  In practice, this means the environment maps get rebuilt as the time of day changes.  If the time of day changes slowly enough, as it would in a real-world application, these updates would be mostly unnoticeable.  However, since the user controls the time of day here, the overall lighting can change faster than the environment maps can keep up.  If the user holds down one of the time-of-day keys, they’ll see stale lighting data applied to most parts of the scene.  Once the key is released, the environment map renderer will catch up and the lighting will become coherent again.</p>
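<p>The update policy described above can be sketched as a round-robin scheduler: probes are refreshed only when lighting changes, and at most one is refreshed per frame. This is a hypothetical illustration, not the demo engine's code, and the class and method names are made up for the example.</p>

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical scheduler: environment map probes are only marked dirty when
// the lighting changes, and at most one dirty probe is re-rendered per frame,
// visited in round-robin order. Names are illustrative only.
class ProbeUpdateScheduler
{
public:
    explicit ProbeUpdateScheduler(std::size_t probeCount)
        : m_dirty(probeCount, false) {}

    // Called when the lighting changes (e.g. the time of day advances).
    void markAllDirty()
    {
        m_dirty.assign(m_dirty.size(), true);
    }

    // Returns the index of the probe to re-render this frame, if any,
    // and clears its dirty flag.
    std::optional<std::size_t> nextProbeToUpdate()
    {
        for (std::size_t step = 0; step < m_dirty.size(); ++step)
        {
            const std::size_t index = (m_next + step) % m_dirty.size();
            if (m_dirty[index])
            {
                m_dirty[index] = false;
                m_next = (index + 1) % m_dirty.size();
                return index;
            }
        }
        return std::nullopt; // nothing to update: lighting is already coherent
    }

private:
    std::vector<bool> m_dirty;
    std::size_t m_next = 0;
};
```

<p>Suppose the time of day changes every frame. <code class="language-plaintext highlighter-rouge">markAllDirty()</code> then keeps re-flagging probes while only one is refreshed per frame, so most probes hold stale lighting until the changes stop.</p>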

<h1 id="the-demo">The Demo</h1>

<p>Below is a link to download the demo.  Please feel free to keep commenting, asking questions, and offering constructive criticism.</p>

<p><a href="/assets/demos/RedBudDemox64Release.zip">Download the demo here</a>.</p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[Earlier this year I wrote a post discussing an implementation of real-time screen space glossy reflections.  The post has received a lot of positive feedback, and I've had some very interesting conversations with various individuals since it went up discussing theory, details, shortcomings, and everything in between.  The response has been great, and I appreciate the community's interest.  One request I've received a few times is for a working demo that users could play with to get a better feel for the effect in action.]]></summary></entry><entry><title type="html">Screen Space Glossy Reflections</title><link href="https://willpgfx.com/2015/07/screen-space-glossy-reflections/" rel="alternate" type="text/html" title="Screen Space Glossy Reflections" /><published>2015-07-10T00:00:00-04:00</published><updated>2015-07-10T00:00:00-04:00</updated><id>https://willpgfx.com/2015/07/screen-space-glossy-reflections</id><content type="html" xml:base="https://willpgfx.com/2015/07/screen-space-glossy-reflections/"><![CDATA[<aside class="sidebar__right">
<nav class="toc">
    <header><h4 class="nav__title"><i class="fas fa-file-alt"></i> On This Page</h4></header>
<ul class="toc__menu" id="markdown-toc">
  <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
  <li><a href="#glossy-ray-traced-reflections" id="markdown-toc-glossy-ray-traced-reflections">Glossy Ray Traced Reflections</a></li>
  <li><a href="#setting-up" id="markdown-toc-setting-up">Setting Up</a></li>
  <li><a href="#ray-tracing-in-screen-space" id="markdown-toc-ray-tracing-in-screen-space">Ray Tracing in Screen Space</a></li>
  <li><a href="#blurring-the-light-buffer" id="markdown-toc-blurring-the-light-buffer">Blurring the Light Buffer</a></li>
  <li><a href="#cone-tracing" id="markdown-toc-cone-tracing">Cone Tracing</a></li>
  <li><a href="#bringing-it-all-together" id="markdown-toc-bringing-it-all-together">Bringing It All Together</a></li>
  <li><a href="#areas-of-improvement" id="markdown-toc-areas-of-improvement">Areas of Improvement</a></li>
  <li><a href="#results" id="markdown-toc-results">Results</a></li>
  <li><a href="#conclusion" id="markdown-toc-conclusion">Conclusion</a></li>
  <li><a href="#acknowledgements" id="markdown-toc-acknowledgements">Acknowledgements</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

  </nav>
</aside>

<h1 id="introduction">Introduction</h1>

<p>Reflections are an important effect in any technique that attempts to approximate global illumination.  They give the viewer spatial information about an object’s location and provide a key visual indicator of the surface properties of certain materials.</p>

<p>For several years now, engineers and researchers in real-time graphics have worked towards improving reflections in their applications.  Simple implementations like cube maps used as reflection probes have been around for decades, while much newer techniques build upon their predecessors, such as parallax-corrected environment maps [4].</p>

<p>More recently, screen space ray tracing has become a widely used supplement to previously established methods of applying reflections to scenes.  The idea is simple enough - a view ray is reflected about a surface normal, then the reflected ray is marched along until it intersects the depth buffer.  Once that location is found, the light buffer is sampled there and the value is added to the final lighting result.  The image below compares a scene looking at a stack of boxes without and with screen space ray tracing enabled.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_off_on_comparison.png" title="SSLR Off/On Comparison">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_off_on_comparison.png" alt="SSLR Off/On Comparison" />
      </a>
    
  
  
</figure>
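<p>The marching step can be made concrete with a deliberately naive, CPU-side C++ sketch. It rests on several simplifying assumptions: view space has +z pointing away from the camera, the depth buffer stores linear view-space depth, and projection uses a simple pinhole model. The implementation in this post instead runs in HLSL against a non-linear depth buffer and builds on a screen-space tracer rather than fixed view-space steps. A ray that leaves the screen or exhausts its step budget is reported as a miss. All names here are hypothetical.</p>

```cpp
#include <cmath>
#include <optional>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Reflects incident direction d about unit normal n: r = d - 2 (d . n) n.
Vec3 reflect(Vec3 d, Vec3 n) { return d + n * (-2.0f * dot(d, n)); }

// Hypothetical setup: linear view-space depth per pixel (+z away from the
// camera) and a simple pinhole projection with the focal length in pixels.
struct DepthBuffer
{
    int width, height;
    float focalLength;
    std::vector<float> depth; // row-major

    // Projects a view-space point to pixel coordinates; false if off-screen.
    bool project(Vec3 p, int& px, int& py) const
    {
        if (p.z <= 0.0f) return false; // behind the camera
        px = static_cast<int>(std::floor(p.x / p.z * focalLength + width * 0.5f));
        py = static_cast<int>(std::floor(-p.y / p.z * focalLength + height * 0.5f));
        return px >= 0 && px < width && py >= 0 && py < height;
    }
};

struct Hit { int px, py; };

// Naive fixed-step march along the reflected ray. Returns the first pixel where
// the ray passes behind the stored depth, or nothing if it leaves the screen or
// runs out of steps.
std::optional<Hit> traceReflection(const DepthBuffer& db, Vec3 origin, Vec3 viewDir,
                                   Vec3 normal, float stepSize, int maxSteps)
{
    const Vec3 dir = reflect(viewDir, normal);
    Vec3 p = origin;
    for (int i = 0; i < maxSteps; ++i)
    {
        p = p + dir * stepSize;
        int px, py;
        if (!db.project(p, px, py)) return std::nullopt;    // off-screen: no data
        const float sceneZ = db.depth[py * db.width + px];
        if (p.z >= sceneZ) return Hit{px, py};               // crossed the depth buffer
    }
    return std::nullopt;
}
```

<p>The returned pixel is where the light buffer would be sampled. Real tracers commonly also compare against a thickness value, so that rays passing far behind a foreground object are not reported as hits; that refinement is omitted here.</p>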

<p>In practice, this approach has more than a few pitfalls that need special care to avoid.  The most obvious shortcoming of this and any other screen space effect is the limited information available.  If a ray doesn’t hit something before leaving the screen bounds, it will not return a result, even if its would-be collider is just barely off-screen.</p>

<p>This effect also tends to have a lot of trouble with rays facing back towards the viewer.  On closer inspection, it makes sense that this would present an issue.  For one, if the ray reflects directly back at the viewer, it will never intersect the depth buffer, which reduces to the already-discussed case of rays traveling off-screen.  The other issue is similar, but maybe not as obvious.  If a ray is traveling back in the general direction of the viewer and it does intersect the depth buffer, it’s likely to do so on a face of an object that faces away from the viewer.  This means that even if an intersection is reported, an incorrect result will be sampled from the light buffer at that position.  This can lead to ugly artifacts such as streaks across surfaces.  The figure below shows a top-down view of a ray being cast from the viewer, hitting a mirrored surface, and finally making contact with the back of a box.  The back of the box is not visible from the viewer’s perspective, so using that result produces erroneous lighting.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/reflect_back_of_box.png" title="Reflect Back of Box">
          <img src="/assets/images/ss_glossy_reflections_post/reflect_back_of_box.png" alt="Reflect Back of Box" />
      </a>
    
  
  
</figure>

<p>There are ways to mitigate many of these artifacts, including fallback methods and fading that will be addressed later on.</p>
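<p>Fading can be roughly illustrated by giving each reflected ray a confidence weight based on how far it turns back toward the viewer. This is not necessarily the scheme used later in this post. The sketch below assumes unit-length vectors and a view direction pointing from the camera to the shaded surface. The function name and default thresholds are hypothetical tuning choices.</p>

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical confidence weight for a reflected ray. viewDir points from the
// camera to the surface. Rays continuing away from the camera (dot >= fadeStart)
// keep full weight; the weight falls linearly to zero as the ray turns back
// toward the viewer (dot <= fadeEnd), where hits tend to land on back faces
// that were never written to the light buffer. Requires fadeStart != fadeEnd.
float backFacingRayFade(Vec3 reflectedDir, Vec3 viewDir,
                        float fadeStart = 0.0f, float fadeEnd = -0.5f)
{
    const float alignment = dot(reflectedDir, viewDir);
    const float t = (alignment - fadeEnd) / (fadeStart - fadeEnd);
    return std::clamp(t, 0.0f, 1.0f);
}
```

<p>Such a weight would typically scale the ray traced contribution and blend toward a fallback, such as an environment map, as it approaches zero.</p>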

<h1 id="glossy-ray-traced-reflections">Glossy Ray Traced Reflections</h1>

<p>One further challenge with the generic approach described above is that if the result is used directly, only perfectly mirror-like reflections can be generated.  In the real world, most surfaces do not reflect light perfectly, but instead scatter, absorb, and reflect it in varying proportions due to microfacets [9].  To account for this, the technique needs to consider not only where the ray intersects the depth buffer, but also the roughness of the reflecting surface and the distance the ray has traveled.  The following image compares mirror-like and glossy reflections.  Notice on the right half of the image that the farther a ray has to travel before making contact, the blurrier the reflection becomes.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_mirror_glossy_comparison.png" title="SSLR Mirror/Glossy Comparison">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_mirror_glossy_comparison.png" alt="SSLR Mirror/Glossy Comparison" />
      </a>
    
  
  
</figure>
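<p>One way to see why distance matters is to treat a glossy reflection as a cone around the reflected ray: the cone’s footprint grows linearly with the distance traveled, so a blurrier (higher) mip of the light buffer can be sampled at the hit. The sketch below assumes the tangent of the cone’s half-angle is already known (mapping roughness to a cone angle is a separate choice) and uses the common heuristic of picking the mip whose texel size matches the footprint diameter. The function names are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Diameter, in pixels, of the cone's cross-section after the ray has traveled
// distancePixels in screen space. tanHalfAngle is the tangent of the cone's
// half-angle (hypothetical input; rougher surfaces would use wider cones).
float coneFootprintPixels(float distancePixels, float tanHalfAngle)
{
    assert(distancePixels >= 0.0f && tanHalfAngle >= 0.0f);
    return 2.0f * distancePixels * tanHalfAngle;
}

// Mip level whose texels roughly cover the footprint: a footprint of 2^m
// pixels maps to mip m. Clamped to the available chain (cf. cb_numMips).
float footprintToMip(float footprintPixels, float numMips)
{
    float mip = std::log2(std::max(footprintPixels, 1.0f));
    return std::min(mip, numMips - 1.0f);
}
```

<p>With a half-angle tangent of 0.25, a ray that travels 16 pixels lands on mip 3, while one that travels 64 pixels lands on mip 5, which matches the increasing blur seen in the comparison image. A mirror-like surface (zero cone angle) always stays on mip 0.</p>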

<p>The rest of this post revisits some of these issues while presenting a full implementation of ray tracing in screen space and of glossy reflections created via cone tracing.</p>

<h1 id="setting-up">Setting Up</h1>

<p>The effects described in this post are implemented using DirectX 11 and HLSL, though neither is required to follow along. In fact, the ray tracing shader used below is a translation of one written in GLSL, which targets OpenGL.</p>

<p>This implementation was designed as part of a deferred shading pipeline. The effect runs after geometry buffer generation and lighting have completed. The ray tracing step needs access to the depth buffer and a buffer containing the normals of the geometry in view. The blurring step needs access to the light buffer. The cone tracing step needs access to all of the aforementioned buffers, plus the resulting ray traced buffer, the blurred light buffer, and a buffer containing the specular values of the materials in view. It is also beneficial to include a fallback buffer containing indirect specular contributions derived from methods such as parallax-corrected cube maps. Each of these is addressed as it is used in the implementation.</p>

<p>Therefore, the final list of buffers needed before starting the effect becomes:</p>

<ul>
  <li>Depth buffer - the implementation uses a non-linear depth buffer due to its ready availability after the geometry buffer is generated.  McGuire’s initial implementation [1] uses a linear depth buffer and may be more efficient.</li>
  <li>Normal buffer - the geometry buffer used in this implementation stores all values in view space.  If the implementer stores their values in world space, they will need to be cognizant of the differences and prepared to apply appropriate transforms when necessary.</li>
  <li>Light buffer - this is a buffer containing all lighting to be applied to the scene.  The exact values stored in this buffer will be refined further during implementation discussion.</li>
  <li>Specular buffer - stored linearly as Fresnel reflectance at normal incidence (F(0°)) [5]. Some engines, such as Unreal Engine 4, use different workflows where this value may be hard-coded to around 0.04 for dielectrics and stored in the base color for metals. The engine used for this project is custom and stores the value directly.</li>
  <li>Roughness buffer - this engine stores the roughness value in the w-component of the specular buffer, and is thus readily available when the previous buffer is bound.</li>
  <li>Fallback indirect specular buffer - this buffer contains specular lighting values calculated before the ray tracing step using less precise techniques such as parallax-corrected cube maps and environment probes to help alleviate jarring discontinuities between ray hits and misses.</li>
</ul>
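<p>For engines that use the metallic workflow mentioned in the specular buffer entry, F(0°) is not stored directly but derived from the base color and a metalness value. A minimal sketch, assuming the commonly used 0.04 constant for dielectrics and a linear blend by metalness, plus reading roughness from the w-component as this post’s layout does. The <code>RGB</code> and <code>SpecularTexel</code> types and the function names are hypothetical.</p>

```cpp
#include <cassert>

struct RGB { float r, g, b; };

// Linear interpolation, equivalent to HLSL lerp().
float lerpScalar(float a, float b, float t) { return a + (b - a) * t; }

// Metallic-workflow F0: dielectrics get a constant ~4% reflectance,
// metals take their specular color from the base color.
RGB specularF0FromMetallic(const RGB& baseColor, float metallic)
{
    assert(metallic >= 0.0f && metallic <= 1.0f);
    const float dielectricF0 = 0.04f;
    return { lerpScalar(dielectricF0, baseColor.r, metallic),
             lerpScalar(dielectricF0, baseColor.g, metallic),
             lerpScalar(dielectricF0, baseColor.b, metallic) };
}

// Specular buffer texel as laid out in this post: (F0.rgb, roughness).
struct SpecularTexel { float x, y, z, w; };

float roughnessFromSpecular(const SpecularTexel& t) { return t.w; }
```

<p>An engine that stores F(0°) directly, like the one in this post, skips the blend and reads the value as-is.</p>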

<p>The depth buffer used in this implementation stores 32-bit depth values. All buffers containing lighting data store 16-bit floating-point values per channel.</p>
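<p>These formats fix the per-pixel cost of each target: 4 bytes for 32-bit depth and, assuming four channels, 8 bytes for a 16-bit floating-point lighting buffer. At an assumed 1920×1080 resolution (not stated in this post), that works out to about 8.3 MB for the depth buffer and about 16.6 MB per lighting buffer, with a full mip chain adding roughly a third more. A small sketch of the arithmetic, with hypothetical names:</p>

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kDepthBytes = 4;   // 32-bit depth
constexpr std::uint32_t kRgba16fBytes = 8; // 4 channels x 16-bit float (assumed RGBA)

// Bytes for one level of a render target.
std::uint64_t targetBytes(std::uint32_t width, std::uint32_t height, std::uint32_t bytesPerPixel)
{
    return std::uint64_t(width) * height * bytesPerPixel;
}

// Total bytes for `levels` mips, halving each dimension per level
// (rounding down, never below 1), as Direct3D computes mip sizes.
std::uint64_t mipChainBytes(std::uint32_t width, std::uint32_t height,
                            std::uint32_t bytesPerPixel, std::uint32_t levels)
{
    assert(width > 0 && height > 0 && levels > 0);
    std::uint64_t total = 0;
    for (std::uint32_t i = 0; i < levels; ++i)
    {
        total += targetBytes(width, height, bytesPerPixel);
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```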

<p>The effect also needs a constant buffer of effect-specific values. The initial GLSL implementation passed these as uniforms; in HLSL, a constant buffer is set up like so:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * The SSLRConstantBuffer.
 * Defines constants used to implement SSLR cone traced screen-space reflections.
 */</span>
<span class="cp">#ifndef CBSSLR_HLSLI
#define CBSSLR_HLSLI
</span>
<span class="n">cbuffer</span> <span class="n">cbSSLR</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">b0</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">float2</span> <span class="n">cb_depthBufferSize</span><span class="p">;</span> <span class="c1">// dimensions of the z-buffer</span>
    <span class="kt">float</span> <span class="n">cb_zThickness</span><span class="p">;</span> <span class="c1">// thickness to ascribe to each pixel in the depth buffer</span>
    <span class="kt">float</span> <span class="n">cb_nearPlaneZ</span><span class="p">;</span> <span class="c1">// the camera's near z plane</span>

    <span class="kt">float</span> <span class="n">cb_stride</span><span class="p">;</span> <span class="c1">// Step in horizontal or vertical pixels between samples. This is a float</span>
    <span class="c1">// because integer math is slow on GPUs, but should be set to an integer &gt;= 1.</span>
    <span class="kt">float</span> <span class="n">cb_maxSteps</span><span class="p">;</span> <span class="c1">// Maximum number of iterations. Higher gives better images but may be slow.</span>
    <span class="kt">float</span> <span class="n">cb_maxDistance</span><span class="p">;</span> <span class="c1">// Maximum camera-space distance to trace before returning a miss.</span>
    <span class="kt">float</span> <span class="n">cb_strideZCutoff</span><span class="p">;</span> <span class="c1">// More distant pixels are smaller in screen space. This value tells at what point to</span>
    <span class="c1">// start relaxing the stride to give higher quality reflections for objects far from</span>
    <span class="c1">// the camera.</span>

    <span class="kt">float</span> <span class="n">cb_numMips</span><span class="p">;</span> <span class="c1">// the number of mip levels in the convolved color buffer</span>
    <span class="kt">float</span> <span class="n">cb_fadeStart</span><span class="p">;</span> <span class="c1">// determines where to start screen edge fading of effect</span>
    <span class="kt">float</span> <span class="n">cb_fadeEnd</span><span class="p">;</span> <span class="c1">// determines where to end screen edge fading of effect</span>
    <span class="kt">float</span> <span class="n">cb_sslr_padding0</span><span class="p">;</span> <span class="c1">// padding for alignment</span>
<span class="p">};</span>

<span class="cp">#endif
</span></code></pre></div></div>

<p>This constant buffer is contained in its own .hlsli file and included by the various steps where needed. Most of the values map directly to uniforms in the GLSL implementation; the others will be discussed as they become pertinent.</p>
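<p>On the application side, the structure uploaded to <code>cbSSLR</code> has to match this layout byte for byte. HLSL packs constant buffer members into 16-byte registers without letting a member straddle a register boundary, and Direct3D 11 requires a constant buffer’s size to be a multiple of 16 bytes, which is what <code>cb_sslr_padding0</code> takes care of. A minimal sketch of a matching C++ structure, with hypothetical member names, and compile-time checks on its layout:</p>

```cpp
#include <cstddef>

// CPU-side mirror of cbuffer cbSSLR (member names are hypothetical).
// Each group of four floats fills one 16-byte constant register.
struct SSLRConstants
{
    // register 0
    float depthBufferSize[2]; // float2 cb_depthBufferSize
    float zThickness;
    float nearPlaneZ;
    // register 1
    float stride;
    float maxSteps;
    float maxDistance;
    float strideZCutoff;
    // register 2
    float numMips;
    float fadeStart;
    float fadeEnd;
    float padding0;           // keeps the size a multiple of 16 bytes
};

static_assert(sizeof(SSLRConstants) % 16 == 0, "constant buffer size must be a multiple of 16 bytes");
static_assert(offsetof(SSLRConstants, stride) == 16, "stride must start register 1");
static_assert(offsetof(SSLRConstants, numMips) == 32, "numMips must start register 2");
```

<p>Checks like these catch a reordered or added member at compile time instead of as silently wrong values on the GPU.</p>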

<h1 id="ray-tracing-in-screen-space">Ray Tracing in Screen Space</h1>

<p>The ray tracing portion of this technique is directly derived from Morgan McGuire and Mike Mara’s open source implementation, which uses the Digital Differential Analyzer (DDA) line algorithm to distribute ray traced samples evenly in screen space [1]. Their method handles perspective-correct interpolation of a 3D ray projected into screen space and helps avoid the over- and under-sampling issues present in traditional ray marches. As a result, the limited number of samples that can be afforded per frame is spread more evenly across the ray, instead of skipping large portions at the start of the ray and bunching up towards the end.</p>
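<p>The property that makes perspective-correct stepping work is that quantities divided by clip-space w (in McGuire and Mara’s code, k = 1/w and Q = point · k) vary linearly in screen space, while camera-space depth does not. The sketch below assumes a projection where w equals camera-space z, as with a standard Direct3D perspective matrix, and recovers depth at a screen-space fraction t by dividing the interpolated Q.z by the interpolated k, mirroring what the shader does per step. The function names are hypothetical.</p>

```cpp
#include <cassert>

float lerpScalar(float a, float b, float t) { return a + (b - a) * t; }

// Camera-space depth at screen-space fraction t along a segment whose
// endpoints have camera-space depths z0 and z1 (both > 0). Assumes w == z,
// so k = 1 / z and Q.z = z * k; both of these are linear in screen space.
float perspectiveCorrectDepth(float z0, float z1, float t)
{
    assert(z0 > 0.0f && z1 > 0.0f);
    float k0 = 1.0f / z0;
    float k1 = 1.0f / z1;
    float Qz = lerpScalar(z0 * k0, z1 * k1, t); // interpolated homogeneous z
    float k = lerpScalar(k0, k1, t);            // interpolated 1 / w
    return Qz / k;
}

// What a naive screen-space march would assume instead.
float naiveDepth(float z0, float z1, float t) { return lerpScalar(z0, z1, t); }
```

<p>Halfway across the screen-space segment between depths 1 and 3, the correct depth is 1.5, not the 2.0 that naive interpolation gives: the nearer half of the ray covers more pixels than the farther half.</p>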

<p>McGuire and Mara’s initial implementation was presented in GLSL and assumed negative one (-1) to be the far plane Z value.  Below, the implementation has been converted to HLSL and uses positive one for the far plane.  The initial implementation also uses a linear depth buffer, though their accompanying paper provides source code for running the effect with a non-linear depth buffer.  The provided implementation assumes non-linear depth, and reconstructs linear Z values as they are sampled from the depth buffer using the methods described in [6].</p>
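<p>The linearization helper itself is defined in the included <code>DepthUtils.hlsli</code> rather than in the listing. As a rough guide, for a standard Direct3D perspective projection (left-handed, depth mapped to [0, 1], no reversed Z) with near plane n and far plane f, a stored depth d inverts to view-space z = n·f / (f - d·(f - n)). A minimal sketch of that inversion alongside the forward mapping used to check it; the names are hypothetical and this is not necessarily how the post’s own <code>linearizeDepth</code> is written.</p>

```cpp
#include <cassert>

// Forward mapping of a standard D3D perspective projection (LH, [0,1] depth):
// view-space z in [n, f] -> stored depth d in [0, 1].
float depthFromViewZ(float z, float n, float f)
{
    return (f / (f - n)) - (n * f) / ((f - n) * z);
}

// Inverse mapping: stored depth d -> view-space z.
float linearizeDepthSketch(float d, float n, float f)
{
    assert(n > 0.0f && f > n);
    return (n * f) / (f - d * (f - n));
}
```

<p>With n = 0.1 and f = 100, a point only 5 units away already stores d ≈ 0.98, which shows how much of the non-linear range is spent close to the camera.</p>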

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// By Morgan McGuire and Michael Mara at Williams College 2014</span>
<span class="c1">// Released as open source under the BSD 2-Clause License</span>
<span class="c1">// http://opensource.org/licenses/BSD-2-Clause</span>
<span class="c1">//</span>
<span class="c1">// Copyright (c) 2014, Morgan McGuire and Michael Mara</span>
<span class="c1">// All rights reserved.</span>
<span class="c1">//</span>
<span class="c1">// From McGuire and Mara, Efficient GPU Screen-Space Ray Tracing,</span>
<span class="c1">// Journal of Computer Graphics Techniques, 2014</span>
<span class="c1">//</span>
<span class="c1">// This software is open source under the "BSD 2-clause license":</span>
<span class="c1">//</span>
<span class="c1">// Redistribution and use in source and binary forms, with or</span>
<span class="c1">// without modification, are permitted provided that the following</span>
<span class="c1">// conditions are met:</span>
<span class="c1">//</span>
<span class="c1">// 1. Redistributions of source code must retain the above</span>
<span class="c1">// copyright notice, this list of conditions and the following</span>
<span class="c1">// disclaimer.</span>
<span class="c1">//</span>
<span class="c1">// 2. Redistributions in binary form must reproduce the above</span>
<span class="c1">// copyright notice, this list of conditions and the following</span>
<span class="c1">// disclaimer in the documentation and/or other materials provided</span>
<span class="c1">// with the distribution.</span>
<span class="c1">//</span>
<span class="c1">// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND</span>
<span class="c1">// CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,</span>
<span class="c1">// INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF</span>
<span class="c1">// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
<span class="c1">// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR</span>
<span class="c1">// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,</span>
<span class="c1">// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT</span>
<span class="c1">// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF</span>
<span class="c1">// USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED</span>
<span class="c1">// AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT</span>
<span class="c1">// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING</span>
<span class="c1">// IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF</span>
<span class="c1">// THE POSSIBILITY OF SUCH DAMAGE.</span>
<span class="cm">/**
 * The ray tracing step of the SSLR implementation.
 * Modified version of the work stated above.
 */</span>
<span class="cp">#include</span> <span class="cpf">"SSLRConstantBuffer.hlsli"</span><span class="cp">
#include</span> <span class="cpf">"../../ConstantBuffers/PerFrame.hlsli"</span><span class="cp">
#include</span> <span class="cpf">"../../Utils/DepthUtils.hlsli"</span><span class="cp">
</span>
<span class="n">Texture2D</span> <span class="n">depthBuffer</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">);</span>
<span class="n">Texture2D</span> <span class="n">normalBuffer</span><span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t1</span><span class="p">);</span>

<span class="k">struct</span> <span class="nc">VertexOut</span>
<span class="p">{</span>
    <span class="n">float4</span> <span class="n">posH</span> <span class="o">:</span> <span class="n">SV_POSITION</span><span class="p">;</span>
    <span class="n">float3</span> <span class="n">viewRay</span> <span class="o">:</span> <span class="n">VIEWRAY</span><span class="p">;</span>
    <span class="n">float2</span> <span class="n">tex</span> <span class="o">:</span> <span class="n">TEXCOORD</span><span class="p">;</span>
<span class="p">};</span>

<span class="kt">float</span> <span class="nf">distanceSquared</span><span class="p">(</span><span class="n">float2</span> <span class="n">a</span><span class="p">,</span> <span class="n">float2</span> <span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">a</span> <span class="o">-=</span> <span class="n">b</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">a</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">bool</span> <span class="nf">intersectsDepthBuffer</span><span class="p">(</span><span class="kt">float</span> <span class="n">z</span><span class="p">,</span> <span class="kt">float</span> <span class="n">minZ</span><span class="p">,</span> <span class="kt">float</span> <span class="n">maxZ</span><span class="p">)</span>
<span class="p">{</span>
    <span class="cm">/*
     * Based on how far away from the camera the depth is,
     * adding a bit of extra thickness can help improve some
     * artifacts. Driving this value up too high can cause
     * artifacts of its own.
     */</span>
    <span class="kt">float</span> <span class="n">depthScale</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mf">1.0</span><span class="n">f</span><span class="p">,</span> <span class="n">z</span> <span class="o">*</span> <span class="n">cb_strideZCutoff</span><span class="p">);</span>
    <span class="n">z</span> <span class="o">+=</span> <span class="n">cb_zThickness</span> <span class="o">+</span> <span class="n">lerp</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">2.0</span><span class="n">f</span><span class="p">,</span> <span class="n">depthScale</span><span class="p">);</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">maxZ</span> <span class="o">&gt;=</span> <span class="n">z</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">minZ</span> <span class="o">-</span> <span class="n">cb_zThickness</span> <span class="o">&lt;=</span> <span class="n">z</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">void</span> <span class="nf">swap</span><span class="p">(</span><span class="n">inout</span> <span class="kt">float</span> <span class="n">a</span><span class="p">,</span> <span class="n">inout</span> <span class="kt">float</span> <span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">float</span> <span class="n">t</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">t</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">float</span> <span class="nf">linearDepthTexelFetch</span><span class="p">(</span><span class="n">int2</span> <span class="n">hitPixel</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Load returns 0 for any value accessed out of bounds</span>
    <span class="k">return</span> <span class="n">linearizeDepth</span><span class="p">(</span><span class="n">depthBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">int3</span><span class="p">(</span><span class="n">hitPixel</span><span class="p">,</span> <span class="mi">0</span><span class="p">)).</span><span class="n">r</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// Returns true if the ray hit something</span>
<span class="kt">bool</span> <span class="nf">traceScreenSpaceRay</span><span class="p">(</span>
    <span class="c1">// Camera-space ray origin, which must be within the view volume</span>
    <span class="n">float3</span> <span class="n">csOrig</span><span class="p">,</span>
    <span class="c1">// Unit length camera-space ray direction</span>
    <span class="n">float3</span> <span class="n">csDir</span><span class="p">,</span>
    <span class="c1">// Number between 0 and 1 for how far to bump the ray in stride units</span>
    <span class="c1">// to conceal banding artifacts. Not needed if stride == 1.</span>
    <span class="kt">float</span> <span class="n">jitter</span><span class="p">,</span>
    <span class="c1">// Pixel coordinates of the first intersection with the scene</span>
    <span class="n">out</span> <span class="n">float2</span> <span class="n">hitPixel</span><span class="p">,</span>
    <span class="c1">// Camera space location of the ray hit</span>
    <span class="n">out</span> <span class="n">float3</span> <span class="n">hitPoint</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Clip to the near plane</span>
    <span class="kt">float</span> <span class="n">rayLength</span> <span class="o">=</span> <span class="p">((</span><span class="n">csOrig</span><span class="p">.</span><span class="n">z</span> <span class="o">+</span> <span class="n">csDir</span><span class="p">.</span><span class="n">z</span> <span class="o">*</span> <span class="n">cb_maxDistance</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">cb_nearPlaneZ</span><span class="p">)</span> <span class="o">?</span>
    <span class="p">(</span><span class="n">cb_nearPlaneZ</span> <span class="o">-</span> <span class="n">csOrig</span><span class="p">.</span><span class="n">z</span><span class="p">)</span> <span class="o">/</span> <span class="n">csDir</span><span class="p">.</span><span class="n">z</span> <span class="o">:</span> <span class="n">cb_maxDistance</span><span class="p">;</span>
    <span class="n">float3</span> <span class="n">csEndPoint</span> <span class="o">=</span> <span class="n">csOrig</span> <span class="o">+</span> <span class="n">csDir</span> <span class="o">*</span> <span class="n">rayLength</span><span class="p">;</span>

    <span class="c1">// Project into homogeneous clip space</span>
    <span class="n">float4</span> <span class="n">H0</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">float4</span><span class="p">(</span><span class="n">csOrig</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">),</span> <span class="n">viewToTextureSpaceMatrix</span><span class="p">);</span>
    <span class="n">H0</span><span class="p">.</span><span class="n">xy</span> <span class="o">*=</span> <span class="n">cb_depthBufferSize</span><span class="p">;</span>
    <span class="n">float4</span> <span class="n">H1</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">float4</span><span class="p">(</span><span class="n">csEndPoint</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">),</span> <span class="n">viewToTextureSpaceMatrix</span><span class="p">);</span>
    <span class="n">H1</span><span class="p">.</span><span class="n">xy</span> <span class="o">*=</span> <span class="n">cb_depthBufferSize</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">k0</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">/</span> <span class="n">H0</span><span class="p">.</span><span class="n">w</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">k1</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">/</span> <span class="n">H1</span><span class="p">.</span><span class="n">w</span><span class="p">;</span>

    <span class="c1">// The interpolated homogeneous version of the camera-space points</span>
    <span class="n">float3</span> <span class="n">Q0</span> <span class="o">=</span> <span class="n">csOrig</span> <span class="o">*</span> <span class="n">k0</span><span class="p">;</span>
    <span class="n">float3</span> <span class="n">Q1</span> <span class="o">=</span> <span class="n">csEndPoint</span> <span class="o">*</span> <span class="n">k1</span><span class="p">;</span>

    <span class="c1">// Screen-space endpoints</span>
    <span class="n">float2</span> <span class="n">P0</span> <span class="o">=</span> <span class="n">H0</span><span class="p">.</span><span class="n">xy</span> <span class="o">*</span> <span class="n">k0</span><span class="p">;</span>
    <span class="n">float2</span> <span class="n">P1</span> <span class="o">=</span> <span class="n">H1</span><span class="p">.</span><span class="n">xy</span> <span class="o">*</span> <span class="n">k1</span><span class="p">;</span>

    <span class="c1">// If the line is degenerate, make it cover at least one pixel</span>
    <span class="c1">// to avoid handling zero-pixel extent as a special case later</span>
    <span class="n">P1</span> <span class="o">+=</span> <span class="p">(</span><span class="n">distanceSquared</span><span class="p">(</span><span class="n">P0</span><span class="p">,</span> <span class="n">P1</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">0.0001</span><span class="n">f</span><span class="p">)</span> <span class="o">?</span> <span class="n">float2</span><span class="p">(</span><span class="mf">0.01</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.01</span><span class="n">f</span><span class="p">)</span> <span class="o">:</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">;</span>
    <span class="n">float2</span> <span class="n">delta</span> <span class="o">=</span> <span class="n">P1</span> <span class="o">-</span> <span class="n">P0</span><span class="p">;</span>

    <span class="c1">// Permute so that the primary iteration is in x to collapse</span>
    <span class="c1">// all quadrant-specific DDA cases later</span>
    <span class="kt">bool</span> <span class="n">permute</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="k">if</span><span class="p">(</span><span class="n">abs</span><span class="p">(</span><span class="n">delta</span><span class="p">.</span><span class="n">x</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">abs</span><span class="p">(</span><span class="n">delta</span><span class="p">.</span><span class="n">y</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="c1">// This is a more-vertical line</span>
        <span class="n">permute</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
        <span class="n">delta</span> <span class="o">=</span> <span class="n">delta</span><span class="p">.</span><span class="n">yx</span><span class="p">;</span>
        <span class="n">P0</span> <span class="o">=</span> <span class="n">P0</span><span class="p">.</span><span class="n">yx</span><span class="p">;</span>
        <span class="n">P1</span> <span class="o">=</span> <span class="n">P1</span><span class="p">.</span><span class="n">yx</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kt">float</span> <span class="n">stepDir</span> <span class="o">=</span> <span class="n">sign</span><span class="p">(</span><span class="n">delta</span><span class="p">.</span><span class="n">x</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">invdx</span> <span class="o">=</span> <span class="n">stepDir</span> <span class="o">/</span> <span class="n">delta</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>

    <span class="c1">// Track the derivatives of Q and k</span>
    <span class="n">float3</span> <span class="n">dQ</span> <span class="o">=</span> <span class="p">(</span><span class="n">Q1</span> <span class="o">-</span> <span class="n">Q0</span><span class="p">)</span> <span class="o">*</span> <span class="n">invdx</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">dk</span> <span class="o">=</span> <span class="p">(</span><span class="n">k1</span> <span class="o">-</span> <span class="n">k0</span><span class="p">)</span> <span class="o">*</span> <span class="n">invdx</span><span class="p">;</span>
    <span class="n">float2</span> <span class="n">dP</span> <span class="o">=</span> <span class="n">float2</span><span class="p">(</span><span class="n">stepDir</span><span class="p">,</span> <span class="n">delta</span><span class="p">.</span><span class="n">y</span> <span class="o">*</span> <span class="n">invdx</span><span class="p">);</span>

    <span class="c1">// Scale derivatives by the desired pixel stride and then</span>
    <span class="c1">// offset the starting values by the jitter fraction</span>
    <span class="kt">float</span> <span class="n">strideScale</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">min</span><span class="p">(</span><span class="mf">1.0</span><span class="n">f</span><span class="p">,</span> <span class="n">csOrig</span><span class="p">.</span><span class="n">z</span> <span class="o">*</span> <span class="n">cb_strideZCutoff</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">stride</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">+</span> <span class="n">strideScale</span> <span class="o">*</span> <span class="n">cb_stride</span><span class="p">;</span>
    <span class="n">dP</span> <span class="o">*=</span> <span class="n">stride</span><span class="p">;</span>
    <span class="n">dQ</span> <span class="o">*=</span> <span class="n">stride</span><span class="p">;</span>
    <span class="n">dk</span> <span class="o">*=</span> <span class="n">stride</span><span class="p">;</span>

    <span class="n">P0</span> <span class="o">+=</span> <span class="n">dP</span> <span class="o">*</span> <span class="n">jitter</span><span class="p">;</span>
    <span class="n">Q0</span> <span class="o">+=</span> <span class="n">dQ</span> <span class="o">*</span> <span class="n">jitter</span><span class="p">;</span>
    <span class="n">k0</span> <span class="o">+=</span> <span class="n">dk</span> <span class="o">*</span> <span class="n">jitter</span><span class="p">;</span>

    <span class="c1">// Slide P from P0 to P1, (now-homogeneous) Q from Q0 to Q1, k from k0 to k1</span>
    <span class="n">float4</span> <span class="n">PQk</span> <span class="o">=</span> <span class="n">float4</span><span class="p">(</span><span class="n">P0</span><span class="p">,</span> <span class="n">Q0</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="n">k0</span><span class="p">);</span>
    <span class="n">float4</span> <span class="n">dPQk</span> <span class="o">=</span> <span class="n">float4</span><span class="p">(</span><span class="n">dP</span><span class="p">,</span> <span class="n">dQ</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="n">dk</span><span class="p">);</span>
    <span class="n">float3</span> <span class="n">Q</span> <span class="o">=</span> <span class="n">Q0</span><span class="p">;</span> 

    <span class="c1">// Adjust end condition for iteration direction</span>
    <span class="kt">float</span> <span class="n">end</span> <span class="o">=</span> <span class="n">P1</span><span class="p">.</span><span class="n">x</span> <span class="o">*</span> <span class="n">stepDir</span><span class="p">;</span>

    <span class="kt">float</span> <span class="n">stepCount</span> <span class="o">=</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">prevZMaxEstimate</span> <span class="o">=</span> <span class="n">csOrig</span><span class="p">.</span><span class="n">z</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">rayZMin</span> <span class="o">=</span> <span class="n">prevZMaxEstimate</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">rayZMax</span> <span class="o">=</span> <span class="n">prevZMaxEstimate</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">sceneZMax</span> <span class="o">=</span> <span class="n">rayZMax</span> <span class="o">+</span> <span class="mf">100.0</span><span class="n">f</span><span class="p">;</span>
    <span class="k">for</span><span class="p">(;</span>
        <span class="p">((</span><span class="n">PQk</span><span class="p">.</span><span class="n">x</span> <span class="o">*</span> <span class="n">stepDir</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="n">end</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="p">(</span><span class="n">stepCount</span> <span class="o">&lt;</span> <span class="n">cb_maxSteps</span><span class="p">)</span> <span class="o">&amp;&amp;</span>
        <span class="o">!</span><span class="n">intersectsDepthBuffer</span><span class="p">(</span><span class="n">sceneZMax</span><span class="p">,</span> <span class="n">rayZMin</span><span class="p">,</span> <span class="n">rayZMax</span><span class="p">)</span> <span class="o">&amp;&amp;</span>
        <span class="p">(</span><span class="n">sceneZMax</span> <span class="o">!=</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">);</span>
        <span class="o">++</span><span class="n">stepCount</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">rayZMin</span> <span class="o">=</span> <span class="n">prevZMaxEstimate</span><span class="p">;</span>
        <span class="n">rayZMax</span> <span class="o">=</span> <span class="p">(</span><span class="n">dPQk</span><span class="p">.</span><span class="n">z</span> <span class="o">*</span> <span class="mf">0.5</span><span class="n">f</span> <span class="o">+</span> <span class="n">PQk</span><span class="p">.</span><span class="n">z</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">dPQk</span><span class="p">.</span><span class="n">w</span> <span class="o">*</span> <span class="mf">0.5</span><span class="n">f</span> <span class="o">+</span> <span class="n">PQk</span><span class="p">.</span><span class="n">w</span><span class="p">);</span>
        <span class="n">prevZMaxEstimate</span> <span class="o">=</span> <span class="n">rayZMax</span><span class="p">;</span>
        <span class="k">if</span><span class="p">(</span><span class="n">rayZMin</span> <span class="o">&gt;</span> <span class="n">rayZMax</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">swap</span><span class="p">(</span><span class="n">rayZMin</span><span class="p">,</span> <span class="n">rayZMax</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="n">hitPixel</span> <span class="o">=</span> <span class="n">permute</span> <span class="o">?</span> <span class="n">PQk</span><span class="p">.</span><span class="n">yx</span> <span class="o">:</span> <span class="n">PQk</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
        <span class="c1">// You may need hitPixel.y = depthBufferSize.y - hitPixel.y; here if your vertical axis</span>
        <span class="c1">// is different than ours in screen space</span>
        <span class="n">sceneZMax</span> <span class="o">=</span> <span class="n">linearDepthTexelFetch</span><span class="p">(</span><span class="n">int2</span><span class="p">(</span><span class="n">hitPixel</span><span class="p">));</span>

        <span class="n">PQk</span> <span class="o">+=</span> <span class="n">dPQk</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="c1">// Advance Q based on the number of steps</span>
    <span class="n">Q</span><span class="p">.</span><span class="n">xy</span> <span class="o">+=</span> <span class="n">dQ</span><span class="p">.</span><span class="n">xy</span> <span class="o">*</span> <span class="n">stepCount</span><span class="p">;</span>
    <span class="n">hitPoint</span> <span class="o">=</span> <span class="n">Q</span> <span class="o">*</span> <span class="p">(</span><span class="mf">1.0</span><span class="n">f</span> <span class="o">/</span> <span class="n">PQk</span><span class="p">.</span><span class="n">w</span><span class="p">);</span>
    <span class="k">return</span> <span class="nf">intersectsDepthBuffer</span><span class="p">(</span><span class="n">sceneZMax</span><span class="p">,</span> <span class="n">rayZMin</span><span class="p">,</span> <span class="n">rayZMax</span><span class="p">);</span>
<span class="p">}</span>

<span class="n">float4</span> <span class="nf">main</span><span class="p">(</span><span class="n">VertexOut</span> <span class="n">pIn</span><span class="p">)</span> <span class="o">:</span> <span class="n">SV_TARGET</span>
<span class="p">{</span>
    <span class="n">int3</span> <span class="n">loadIndices</span> <span class="o">=</span> <span class="n">int3</span><span class="p">(</span><span class="n">pIn</span><span class="p">.</span><span class="n">posH</span><span class="p">.</span><span class="n">xy</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="n">float3</span> <span class="n">normalVS</span> <span class="o">=</span> <span class="n">normalBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">loadIndices</span><span class="p">).</span><span class="n">xyz</span><span class="p">;</span>
    <span class="k">if</span><span class="p">(</span><span class="o">!</span><span class="n">any</span><span class="p">(</span><span class="n">normalVS</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="k">return</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="kt">float</span> <span class="n">depth</span> <span class="o">=</span> <span class="n">depthBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">loadIndices</span><span class="p">).</span><span class="n">r</span><span class="p">;</span>
    <span class="n">float3</span> <span class="n">rayOriginVS</span> <span class="o">=</span> <span class="n">pIn</span><span class="p">.</span><span class="n">viewRay</span> <span class="o">*</span> <span class="nf">linearizeDepth</span><span class="p">(</span><span class="n">depth</span><span class="p">);</span>

    <span class="cm">/*
     * Since position is reconstructed in view space, just normalize it to get the
     * vector from the eye to the position and then reflect that around the normal to
     * get the ray direction to trace.
     */</span>
    <span class="n">float3</span> <span class="n">toPositionVS</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">rayOriginVS</span><span class="p">);</span>
    <span class="n">float3</span> <span class="n">rayDirectionVS</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">reflect</span><span class="p">(</span><span class="n">toPositionVS</span><span class="p">,</span> <span class="n">normalVS</span><span class="p">));</span>

    <span class="c1">// output rDotV to the alpha channel for use in determining how much to fade the ray</span>
    <span class="kt">float</span> <span class="n">rDotV</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">rayDirectionVS</span><span class="p">,</span> <span class="n">toPositionVS</span><span class="p">);</span>

    <span class="c1">// out parameters</span>
    <span class="n">float2</span> <span class="n">hitPixel</span> <span class="o">=</span> <span class="n">float2</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">);</span>
    <span class="n">float3</span> <span class="n">hitPoint</span> <span class="o">=</span> <span class="n">float3</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">);</span>

    <span class="kt">float</span> <span class="n">jitter</span> <span class="o">=</span> <span class="n">cb_stride</span> <span class="o">&gt;</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">?</span> <span class="kt">float</span><span class="p">(</span><span class="kt">int</span><span class="p">(</span><span class="n">pIn</span><span class="p">.</span><span class="n">posH</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">pIn</span><span class="p">.</span><span class="n">posH</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="mf">0.5</span><span class="n">f</span> <span class="o">:</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">;</span>

    <span class="c1">// perform ray tracing - true if hit found, false otherwise</span>
    <span class="kt">bool</span> <span class="n">intersection</span> <span class="o">=</span> <span class="n">traceScreenSpaceRay</span><span class="p">(</span><span class="n">rayOriginVS</span><span class="p">,</span> <span class="n">rayDirectionVS</span><span class="p">,</span> <span class="n">jitter</span><span class="p">,</span> <span class="n">hitPixel</span><span class="p">,</span> <span class="n">hitPoint</span><span class="p">);</span>

    <span class="n">depth</span> <span class="o">=</span> <span class="n">depthBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">int3</span><span class="p">(</span><span class="n">hitPixel</span><span class="p">,</span> <span class="mi">0</span><span class="p">)).</span><span class="n">r</span><span class="p">;</span>

    <span class="c1">// move hit pixel from pixel position to UVs</span>
    <span class="n">hitPixel</span> <span class="o">*=</span> <span class="n">float2</span><span class="p">(</span><span class="n">texelWidth</span><span class="p">,</span> <span class="n">texelHeight</span><span class="p">);</span>
    <span class="k">if</span><span class="p">(</span><span class="n">hitPixel</span><span class="p">.</span><span class="n">x</span> <span class="o">&gt;</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">||</span> <span class="n">hitPixel</span><span class="p">.</span><span class="n">x</span> <span class="o">&lt;</span> <span class="mf">0.0</span><span class="n">f</span> <span class="o">||</span> <span class="n">hitPixel</span><span class="p">.</span><span class="n">y</span> <span class="o">&gt;</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">||</span> <span class="n">hitPixel</span><span class="p">.</span><span class="n">y</span> <span class="o">&lt;</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">intersection</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="nf">float4</span><span class="p">(</span><span class="n">hitPixel</span><span class="p">,</span> <span class="n">depth</span><span class="p">,</span> <span class="n">rDotV</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">intersection</span> <span class="o">?</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">:</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The DepthUtils.hlsli header contains the linearizeDepth function, which converts a perspective-z depth into a linear value.  The PerFrame.hlsli header contains several values that are set at the start of a frame and remain constant throughout it.  Of particular interest are texelWidth and texelHeight, which contain the texel size for the client (1 / dimension).  We use these values to convert pixel positions from the trace result into UV coordinates for easy lookup in subsequent steps.</p>
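<p>The body of linearizeDepth isn’t listed here.  As a hedged C++ sketch, assuming a standard (non-reversed) Direct3D-style perspective projection that stores depth 0 at the near plane and 1 at the far plane, the conversion and the pixel-to-UV step could look like this.  The parameter names nearZ and farZ and the UV struct are illustrative, not taken from PerFrame.hlsli or DepthUtils.hlsli.</p>

```cpp
// Hypothetical sketch: assumes a standard (non-reversed) D3D-style perspective
// projection that maps view-space z in [nearZ, farZ] to stored depth in [0, 1].
float linearizeDepth(float depth, float nearZ, float farZ)
{
    // Inverts depth = farZ / (farZ - nearZ) - nearZ * farZ / ((farZ - nearZ) * z).
    return nearZ * farZ / (farZ - depth * (farZ - nearZ));
}

struct UV
{
    float u;
    float v;
};

// texelWidth = 1 / width and texelHeight = 1 / height, as described for PerFrame.hlsli.
UV pixelToUV(float pixelX, float pixelY, float texelWidth, float texelHeight)
{
    return UV{pixelX * texelWidth, pixelY * texelHeight};
}
```

<p>A reversed-Z or OpenGL-style projection needs a different inverse, so the formula should be matched to whatever projection matrix the engine actually builds.</p>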

<p>An idea borrowed from Ben Hopkins (<a href="https://twitter.com/kode80">@kode80</a>), who also open sourced his implementation of ray tracing based on McGuire’s initial work, is to use a cutoff value for the stride based on Z distance [2].  Because perspective projection makes objects smaller in screen space as they move further from the viewer, the stride can be shortened for distant points and still likely find its contact point.  This helps distant locations produce higher quality reflections than they would with the same large stride used for closer locations.  In the above implementation, this idea was extended to add extra thickness to objects as their distance from the viewer increases.  This resulted in fewer artifacts at shallow angles, where the rayZMin and rayZMax values would grow such that the sampled sceneZMax narrowly failed the intersection test and was rejected.</p>
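<p>A minimal C++ sketch of the depth-based stride cutoff, assuming a linear falloff from the full stride near the camera down to a one-pixel stride at an assumed cutoff distance.  The thickness ramp is likewise an assumption: the post doesn’t give the exact growth function, so a linear increase with a hypothetical rate parameter stands in for it.</p>

```cpp
#include <algorithm>

// Hypothetical sketch of a depth-based stride cutoff. viewZ is the linear
// view-space distance of the ray origin; strideZCutoff is an assumed tuning
// constant at and beyond which the stride collapses to one pixel.
float strideForDepth(float viewZ, float baseStride, float strideZCutoff)
{
    // 1 at the camera, falling linearly to 0 at strideZCutoff and beyond.
    float strideScale = 1.0f - std::min(1.0f, viewZ / strideZCutoff);
    return 1.0f + strideScale * baseStride;
}

// Hypothetical thickness growth with distance: the linear ramp and its rate
// (thicknessPerUnitZ) are assumptions, not values from the post.
float thicknessForDepth(float viewZ, float baseThickness, float thicknessPerUnitZ)
{
    return baseThickness + viewZ * thicknessPerUnitZ;
}
```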

<p>Another interesting idea from Hopkins’ implementation was to store the values and their per-step derivatives in float4 types.  The goal is to encourage the HLSL compiler to take advantage of SIMD operations, since all four components go through identical operations at the same time.  In practice, the output from the Visual Studio 2013 Graphics Debugger showed nearly identical bytecode for the McGuire and Hopkins implementations, but the packing was kept because it was a cool idea.</p>
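<p>To illustrate the packed layout, here is a small C++ stand-in for HLSL’s float4 with P in x and y, Q.z in z, and k in w, matching how the listing above steps PQk by dPQk and estimates rayZMax half a step ahead.  Float4 and rayZMaxEstimate are illustrative names; whether the compiler actually emits a single vector add is exactly what the bytecode comparison above calls into question.</p>

```cpp
// Minimal CPU stand-in for HLSL's float4, showing the packed layout assumed
// here: x, y = P (screen position), z = Q.z, w = k (the interpolated 1 / w).
struct Float4
{
    float x, y, z, w;

    // One packed add advances all four interpolants together, the pattern the
    // packing hopes the compiler turns into a single 4-wide add.
    Float4& operator+=(const Float4& d)
    {
        x += d.x;
        y += d.y;
        z += d.z;
        w += d.w;
        return *this;
    }
};

// Half-step-ahead depth estimate, the same expression as the rayZMax line in
// the listing above: (Q.z + dQ.z / 2) / (k + dk / 2).
float rayZMaxEstimate(const Float4& PQk, const Float4& dPQk)
{
    return (dPQk.z * 0.5f + PQk.z) / (dPQk.w * 0.5f + PQk.w);
}
```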

<p>The image below shows the results of the ray tracing step.  The buffer values include the UV coordinates of the ray hit in the x and y components, the depth in the z component, and the dot product of the view ray and the reflection ray in the w component.  The value stored in the w-component is used in the cone tracing step to fade rays facing towards the camera.  Black pixels mark areas where no intersection occurred.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_ray_traced_buffer.png" title="SSLR Ray Traced Buffer">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_ray_traced_buffer.png" alt="SSLR Ray Traced Buffer" />
      </a>
    
  
  
</figure>
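<p>To make the buffer layout concrete, the C++ sketch below mirrors the end of the ray tracing main(): it reflects the view ray with the same formula as HLSL’s reflect, and packs one texel as (u, v, depth, rDotV), writing zeros for a miss or an off-screen hit.  Vec3, encodeRayHit, and the other names are illustrative.  Note that a surface facing the camera head-on reflects the view ray straight back and yields rDotV = -1, which the cone tracing pass treats as “no data” through its raySS.w &lt;= 0 check.</p>

```cpp
#include <array>

struct Vec3
{
    float x, y, z;
};

float dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Same definition as HLSL reflect(i, n): i - 2 * dot(i, n) * n.
Vec3 reflect(Vec3 i, Vec3 n)
{
    float d = 2.0f * dot(i, n);
    return Vec3{i.x - d * n.x, i.y - d * n.y, i.z - d * n.z};
}

// Assembles one texel of the ray-traced buffer the way the ray tracing main()
// does: x, y = hit UV, z = depth, w = rDotV, or all zeros when there is no hit.
std::array<float, 4> encodeRayHit(float hitPixelX, float hitPixelY,
                                  float texelWidth, float texelHeight,
                                  float depth, float rDotV, bool intersection)
{
    float u = hitPixelX * texelWidth;
    float v = hitPixelY * texelHeight;
    if (u > 1.0f || u < 0.0f || v > 1.0f || v < 0.0f)
    {
        intersection = false;  // the hit landed off screen
    }
    if (!intersection)
    {
        return {0.0f, 0.0f, 0.0f, 0.0f};
    }
    return {u, v, depth, rDotV};
}
```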

<h1 id="blurring-the-light-buffer">Blurring the Light Buffer</h1>

<p>The next step toward glossy reflections is to blur the light buffer.  Specifically, the light buffer is copied to the top-most mip level of a texture supporting a full mip chain, and from there the result is blurred into its lower mip levels.  A separable 1-dimensional Gaussian blur is used.  The implementation below uses a 7-tap kernel, but implementers should experiment to find a size that suits their particular needs.  First the blur is applied vertically into a temporary buffer, then it is applied horizontally into the next level down in the mip chain.  The following code listing shows a simple blur shader.  Note that to use it, two additional shaders are needed: each one defines one of the pre-processor symbols specifying the blur direction and then includes the file below.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * The Convolution shader body.
 * Requires either CONVOLVE_VERTICAL or CONVOLVE_HORIZONTAL
 * to be defined.
 */</span>
<span class="cp">#ifndef CONVOLUTIONPS_HLSLI
#define CONVOLUTIONPS_HLSLI
</span>
<span class="cp">#include</span> <span class="cpf">"SSLRConstantBuffer.hlsli"</span><span class="cp">
</span>
<span class="k">struct</span> <span class="nc">VertexOut</span>
<span class="p">{</span>
    <span class="n">float4</span> <span class="n">posH</span> <span class="o">:</span> <span class="n">SV_POSITION</span><span class="p">;</span>
    <span class="n">float2</span> <span class="n">tex</span> <span class="o">:</span> <span class="n">TEXCOORD</span><span class="p">;</span>
<span class="p">};</span>

<span class="n">Texture2D</span> <span class="n">colorBuffer</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t1</span><span class="p">);</span>

<span class="cp">#if defined(CONVOLVE_HORIZONTAL)
</span><span class="k">static</span> <span class="k">const</span> <span class="n">int2</span> <span class="n">offsets</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="p">{{</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="mi">3</span><span class="p">,</span> <span class="mi">0</span><span class="p">}};</span>
<span class="cp">#elif defined(CONVOLVE_VERTICAL)
</span><span class="k">static</span> <span class="k">const</span> <span class="n">int2</span> <span class="n">offsets</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="p">{{</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span><span class="p">},</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">},</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">},</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">},</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">},</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">},</span> <span class="p">{</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">}};</span>
<span class="cp">#endif
</span><span class="k">static</span> <span class="k">const</span> <span class="kt">float</span> <span class="n">weights</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mf">0.001</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.028</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.233</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.474</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.233</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.028</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.001</span><span class="n">f</span><span class="p">};</span>

<span class="n">float4</span> <span class="n">main</span><span class="p">(</span><span class="n">VertexOut</span> <span class="n">pIn</span><span class="p">)</span><span class="o">:</span> <span class="n">SV_Target0</span>
<span class="p">{</span>
    <span class="n">float2</span> <span class="n">uvs</span> <span class="o">=</span> <span class="n">pIn</span><span class="p">.</span><span class="n">tex</span> <span class="o">*</span> <span class="n">cb_depthBufferSize</span><span class="p">;</span> <span class="c1">// make sure to send in the SRV's dimensions for cb_depthBufferSize</span>
    <span class="c1">// sample level zero since only one mip level is available with the bound SRV</span>
    <span class="n">int3</span> <span class="n">loadPos</span> <span class="o">=</span> <span class="n">int3</span><span class="p">(</span><span class="n">uvs</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>

    <span class="n">float4</span> <span class="n">color</span> <span class="o">=</span> <span class="n">float4</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">);</span>
    <span class="p">[</span><span class="n">unroll</span><span class="p">]</span>
    <span class="k">for</span><span class="p">(</span><span class="n">uint</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0u</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">7u</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">color</span> <span class="o">+=</span> <span class="n">colorBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">loadPos</span><span class="p">,</span> <span class="n">offsets</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">*</span> <span class="n">weights</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="nf">float4</span><span class="p">(</span><span class="n">color</span><span class="p">.</span><span class="n">rgb</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span>

<span class="cp">#endif
</span></code></pre></div></div>
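<p>For checking the kernel on the CPU, the sketch below reproduces the separable blur in C++ with the same seven weights.  It assumes a single-channel, row-major image and treats out-of-range taps as zero, which is what Direct3D returns for an out-of-bounds Load.  The weights sum to 0.998 rather than exactly 1, so each pass darkens the image very slightly.  The function names are illustrative; unlike the shader, the sketch keeps the horizontal pass at the same resolution instead of writing it into the next mip level.</p>

```cpp
#include <vector>

// The 7-tap weights from the convolution shader.
static const float kWeights[7] = {0.001f, 0.028f, 0.233f, 0.474f, 0.233f, 0.028f, 0.001f};

// One 1-dimensional pass over a single-channel, row-major width x height image.
// Taps that fall outside the image contribute zero.
std::vector<float> blurPass(const std::vector<float>& image, int width, int height, bool horizontal)
{
    std::vector<float> out(image.size(), 0.0f);
    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            float sum = 0.0f;
            for (int i = 0; i < 7; ++i)
            {
                int offset = i - 3;
                int sx = horizontal ? x + offset : x;
                int sy = horizontal ? y : y + offset;
                if (sx >= 0 && sx < width && sy >= 0 && sy < height)
                {
                    sum += image[sy * width + sx] * kWeights[i];
                }
            }
            out[y * width + x] = sum;
        }
    }
    return out;
}

// Vertical pass into a temporary, then a horizontal pass over the temporary.
std::vector<float> separableBlur(const std::vector<float>& image, int width, int height)
{
    std::vector<float> temp = blurPass(image, width, height, false);
    return blurPass(temp, width, height, true);
}
```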

<p>During the blur passes, the constant buffer value that stores the depth buffer size for the rest of the effect is re-purposed to hold the dimensions of the bound texture, so that the load positions for fetches from that texture can be recovered.  Once all blur passes are finished, this value should be reset to the correct depth buffer dimensions before proceeding.</p>
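<p>The host-side code isn’t shown in the post, so the following C++ sketch is only one possible way to schedule the passes and the per-pass value written into cb_depthBufferSize.  It assumes that the temporary buffer for each level is rendered at the size of the level being read; BlurPass and planBlurPasses are hypothetical names.</p>

```cpp
#include <algorithm>
#include <vector>

// Hypothetical description of one blur draw. For mip level m, the vertical
// pass reads level m into a temporary texture and the horizontal pass reads
// that temporary and writes level m + 1.
struct BlurPass
{
    int sourceMip;    // mip level being read (the temporary is assumed to match its size)
    bool horizontal;  // false = vertical pass, true = horizontal pass
    int srvWidth;     // value written to cb_depthBufferSize.x for this pass
    int srvHeight;    // value written to cb_depthBufferSize.y for this pass
};

std::vector<BlurPass> planBlurPasses(int width, int height, int mipCount)
{
    std::vector<BlurPass> passes;
    for (int m = 0; m + 1 < mipCount; ++m)
    {
        // Dimensions of the SRV bound for this level, clamped to at least 1.
        int w = std::max(1, width >> m);
        int h = std::max(1, height >> m);
        passes.push_back({m, false, w, h});
        passes.push_back({m, true, w, h});
    }
    return passes;
}
```

<p>After the last pass, cb_depthBufferSize goes back to the full-resolution depth buffer dimensions before cone tracing.</p>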

<h1 id="cone-tracing">Cone Tracing</h1>

<p>At this point in the effect, the ray traced buffer is complete and the full mip chain of the light buffer has been generated.  The idea in this section comes from Yasin Uludag’s article in GPU Pro 5 [3].</p>

<p>It was mentioned earlier in the post that for glossy reflections to be represented, both the surface roughness and the distance traveled from the reflecting point to its point of contact needed to be accounted for.  Whereas a perfect mirror would cast a straight line outwards from the origin point, a rougher surface would cast a cone shape.  The figure below shows a representation of this phenomenon (albeit a bit crudely).</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_ray_vs_cone_comparison.png" title="SSLR Ray vs Cone Comparison">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_ray_vs_cone_comparison.png" alt="SSLR Ray vs Cone Comparison" />
      </a>
    
  
  
</figure>

<p>With these observations made, note further that in screen space a cone (3-dimensional) projects into an isosceles triangle (2-dimensional).  Knowing the ray’s starting point and end point tells us how far the ray has traveled in screen space.  With the roughness value for the current surface available by sampling the appropriate texture, everything needed to move forward is on hand.</p>

<p>The steps for cone tracing are as follows.</p>

<ol>
  <li>The adjacent length of the isosceles triangle is the magnitude of the vector from the origin position to the ray hit position.</li>
  <li>The sampled roughness is converted into a specular power.</li>
  <li>The specular power is then used to calculate the cone angle (theta) for the isosceles triangle.</li>
  <li>The opposite length of the triangle is found by dividing the cone angle in half and finding the opposite side of the resulting right triangle using basic trigonometry: tan(theta / 2) = oppositeLength / adjacentLength, which is equivalently oppositeLength = tan(theta / 2) * adjacentLength.</li>
  <li>The result is then doubled to recover the full length.</li>
  <li>The radius of the circle inscribed in the triangle is found using the formula for isosceles triangles from [7].  This radius determines the sample position and the mip level to sample from.</li>
  <li>The color is sampled and weighted based on surface roughness.</li>
  <li>Steps 4-7 are repeated until the accumulated alpha reaches 1 or the loop hits its iteration limit.  During each iteration, the triangle’s adjacent length is shortened by the diameter of the previously calculated incircle, then the opposite length, inradius, sample position, and mip level are recomputed for the new triangle.  The specular power and cone angle from steps 2 and 3 depend only on the surface, so they stay the same.</li>
</ol>
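<p>The geometry of these steps can be sketched in C++ as follows.  The helpers are ports of the shader functions in the cone tracing listing; maxSpecularExp stands in for CNST_MAX_SPECULAR_EXP, whose value isn’t given here.  Two choices are assumptions for illustration: each sample sits at the incircle’s centre (adjacent length minus the inradius along the ray), and the mip level is log2 of the inradius in pixels, clamped to the available levels.  The specular power is taken as an input rather than converted from roughness, and texture sampling and alpha accumulation are left out.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One step of the cone walk. Lengths are in UV units.
struct ConeSample
{
    float distance;  // assumed: distance along the ray to the incircle centre
    float radius;    // incircle radius
    float mip;       // assumed: log2 of the radius in pixels, clamped
};

// Port of the shader's Phong-based mapping from specular power to cone angle.
float specularPowerToConeAngle(float specularPower, float maxSpecularExp)
{
    if (specularPower >= std::exp2(maxSpecularExp))
    {
        return 0.0f;  // treat very high powers as a perfect mirror
    }
    const float xi = 0.244f;
    float exponent = 1.0f / (specularPower + 1.0f);
    return std::acos(std::pow(xi, exponent));
}

// Steps 4 and 5: full base length from the half angle.
float isoscelesTriangleOpposite(float adjacentLength, float halfTheta)
{
    return 2.0f * std::tan(halfTheta) * adjacentLength;
}

// Step 6: inradius of an isosceles triangle with base a and height h.
float isoscelesTriangleInRadius(float a, float h)
{
    return a * (std::sqrt(a * a + 4.0f * h * h) - a) / (4.0f * h);
}

std::vector<ConeSample> traceConeGeometry(float adjacentLength, float specularPower,
                                          float maxSpecularExp, float maxScreenDim,
                                          float maxMip, int maxSteps)
{
    float halfTheta = 0.5f * specularPowerToConeAngle(specularPower, maxSpecularExp);
    std::vector<ConeSample> samples;
    for (int i = 0; i < maxSteps; ++i)
    {
        float opposite = isoscelesTriangleOpposite(adjacentLength, halfTheta);
        float radius = isoscelesTriangleInRadius(opposite, adjacentLength);
        float mip = std::clamp(std::log2(radius * maxScreenDim), 0.0f, maxMip);
        samples.push_back({adjacentLength - radius, radius, mip});
        // Step 8: shrink the triangle by the incircle's diameter.
        adjacentLength -= 2.0f * radius;
    }
    return samples;
}
```

<p>Because the inradius is always less than half the triangle’s height, the adjacent length stays positive and shrinks geometrically, so later samples move toward the reflecting point and read from progressively finer mip levels.</p>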

<p>Step 7 in particular differs from Uludag’s implementation, in which he builds an entire visibility buffer that is used to diminish contributions from sampled pixels that should not be part of the result.  In most cases, the results of this simplified approach are good enough, and the cost saved by not creating the visibility buffer and the hierarchical z-buffer from Uludag’s article can be re-assigned to further refinements or other effects.</p>

<p>The formula for the inradius of an isosceles triangle is displayed below.  In the formula, <em>a</em> represents the opposite length (the base) of the triangle and <em>h</em> represents the adjacent length (its height).  The following image was obtained from [7].</p>

<p><img src="/assets/images/ss_glossy_reflections_post/isosceles_triangle_inradius_formula.gif" alt="isosceles triangle inradius formula" class="align-center" /></p>
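<p>For reference, the formula follows from r = A / s (area over semi-perimeter), with the leg length b given by Pythagoras, and it is the same expression isoscelesTriangleInRadius evaluates in the cone tracing shader.  For example, a = 6 and h = 4 give legs of 5 and r = 1.5.</p>

```latex
b = \sqrt{h^2 + \tfrac{a^2}{4}}, \qquad A = \tfrac{1}{2} a h, \qquad s = \tfrac{a + 2b}{2}

r = \frac{A}{s}
  = \frac{a h}{a + \sqrt{a^2 + 4h^2}}
  = \frac{a h \left(\sqrt{a^2 + 4h^2} - a\right)}{\left(a^2 + 4h^2\right) - a^2}
  = \frac{a \left(\sqrt{a^2 + 4h^2} - a\right)}{4h}
```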

<p>Once the cone traced color is found, it’s modulated by a Fresnel term calculated from the values in the specular buffer, a normalized vector pointing from the surface location back towards the viewer, and the surface normal.  Finally, several fading steps are applied to make areas where the ray tracing step failed to find an intersection less pronounced.  The results of this step are added back to the original light buffer and the process is complete.</p>
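<p>The post doesn’t say which Fresnel approximation LightUtils.hlsli uses.  As a sketch only, the C++ below assumes Schlick’s approximation and treats the specular buffer’s rgb as the reflectance at normal incidence (F0); the shader comment labels that channel “ior”, so an engine storing an actual index of refraction would need to convert it first.  Float3, fresnelSchlick, and applyFresnel are hypothetical names.</p>

```cpp
#include <algorithm>
#include <cmath>

struct Float3
{
    float r, g, b;
};

// Schlick's approximation, assumed here as the Fresnel term:
// F = F0 + (1 - F0) * (1 - cos(theta))^5, with cos(theta) = dot(N, V).
Float3 fresnelSchlick(Float3 f0, float nDotV)
{
    float c = std::clamp(nDotV, 0.0f, 1.0f);
    float m = std::pow(1.0f - c, 5.0f);
    return Float3{f0.r + (1.0f - f0.r) * m,
                  f0.g + (1.0f - f0.g) * m,
                  f0.b + (1.0f - f0.b) * m};
}

// Modulates the cone-traced color by the Fresnel term before it is added back
// onto the light buffer (the fading steps are not modelled here).
Float3 applyFresnel(Float3 coneColor, Float3 f0, float nDotV)
{
    Float3 f = fresnelSchlick(f0, nDotV);
    return Float3{coneColor.r * f.r, coneColor.g * f.g, coneColor.b * f.b};
}
```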

<p>The below shader code demonstrates this process.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"SSLRConstantBuffer.hlsli"</span><span class="cp">
#include</span> <span class="cpf">"../../LightingModel/PBL/LightUtils.hlsli"</span><span class="cp">
#include</span> <span class="cpf">"../../ConstantBuffers/PerFrame.hlsli"</span><span class="cp">
#include</span> <span class="cpf">"../../Utils/DepthUtils.hlsli"</span><span class="cp">
#include</span> <span class="cpf">"../../ShaderConstants.hlsli"</span><span class="cp">
</span>
<span class="k">struct</span> <span class="nc">VertexOut</span>
<span class="p">{</span>
    <span class="n">float4</span> <span class="n">posH</span> <span class="o">:</span> <span class="n">SV_POSITION</span><span class="p">;</span>
    <span class="n">float3</span> <span class="n">viewRay</span> <span class="o">:</span> <span class="n">VIEWRAY</span><span class="p">;</span>
    <span class="n">float2</span> <span class="n">tex</span> <span class="o">:</span> <span class="n">TEXCOORD</span><span class="p">;</span>
<span class="p">};</span>

<span class="n">SamplerState</span> <span class="n">sampTrilinearClamp</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">s1</span><span class="p">);</span>

<span class="n">Texture2D</span> <span class="n">depthBuffer</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t0</span><span class="p">);</span> <span class="c1">// scene depth buffer used in ray tracing step</span>
<span class="n">Texture2D</span> <span class="n">colorBuffer</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t1</span><span class="p">);</span> <span class="c1">// convolved color buffer - all mip levels</span>
<span class="n">Texture2D</span> <span class="n">rayTracingBuffer</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t2</span><span class="p">);</span> <span class="c1">// ray-tracing buffer</span>
<span class="n">Texture2D</span> <span class="n">normalBuffer</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t3</span><span class="p">);</span> <span class="c1">// normal buffer - from g-buffer</span>
<span class="n">Texture2D</span> <span class="n">specularBuffer</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t4</span><span class="p">);</span> <span class="c1">// specular buffer - from g-buffer (rgb = ior, a = roughness)</span>
<span class="n">Texture2D</span> <span class="n">indirectSpecularBuffer</span> <span class="o">:</span> <span class="k">register</span><span class="p">(</span><span class="n">t5</span><span class="p">);</span> <span class="c1">// indirect specular light buffer used for fallback</span>

<span class="c1">///////////////////////////////////////////////////////////////////////////////////////</span>
<span class="c1">// Cone tracing methods</span>
<span class="c1">///////////////////////////////////////////////////////////////////////////////////////</span>

<span class="kt">float</span> <span class="nf">specularPowerToConeAngle</span><span class="p">(</span><span class="kt">float</span> <span class="n">specularPower</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// based on phong distribution model</span>
    <span class="k">if</span><span class="p">(</span><span class="n">specularPower</span> <span class="o">&gt;=</span> <span class="n">exp2</span><span class="p">(</span><span class="n">CNST_MAX_SPECULAR_EXP</span><span class="p">))</span>
    <span class="p">{</span>
        <span class="k">return</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">const</span> <span class="kt">float</span> <span class="n">xi</span> <span class="o">=</span> <span class="mf">0.244</span><span class="n">f</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">exponent</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">/</span> <span class="p">(</span><span class="n">specularPower</span> <span class="o">+</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">);</span>
    <span class="k">return</span> <span class="nf">acos</span><span class="p">(</span><span class="n">pow</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">exponent</span><span class="p">));</span>
<span class="p">}</span>

<span class="kt">float</span> <span class="nf">isoscelesTriangleOpposite</span><span class="p">(</span><span class="kt">float</span> <span class="n">adjacentLength</span><span class="p">,</span> <span class="kt">float</span> <span class="n">coneTheta</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// simple trig and algebra - soh, cah, toa - tan(theta) = opp/adj, opp = tan(theta) * adj, then multiply * 2.0f for isosceles triangle base</span>
    <span class="k">return</span> <span class="mf">2.0</span><span class="n">f</span> <span class="o">*</span> <span class="n">tan</span><span class="p">(</span><span class="n">coneTheta</span><span class="p">)</span> <span class="o">*</span> <span class="n">adjacentLength</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">float</span> <span class="nf">isoscelesTriangleInRadius</span><span class="p">(</span><span class="kt">float</span> <span class="n">a</span><span class="p">,</span> <span class="kt">float</span> <span class="n">h</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">float</span> <span class="n">a2</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="n">a</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">fh2</span> <span class="o">=</span> <span class="mf">4.0</span><span class="n">f</span> <span class="o">*</span> <span class="n">h</span> <span class="o">*</span> <span class="n">h</span><span class="p">;</span>
    <span class="k">return</span> <span class="p">(</span><span class="n">a</span> <span class="o">*</span> <span class="p">(</span><span class="n">sqrt</span><span class="p">(</span><span class="n">a2</span> <span class="o">+</span> <span class="n">fh2</span><span class="p">)</span> <span class="o">-</span> <span class="n">a</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mf">4.0</span><span class="n">f</span> <span class="o">*</span> <span class="n">h</span><span class="p">);</span>
<span class="p">}</span>

<span class="n">float4</span> <span class="nf">coneSampleWeightedColor</span><span class="p">(</span><span class="n">float2</span> <span class="n">samplePos</span><span class="p">,</span> <span class="kt">float</span> <span class="n">mipChannel</span><span class="p">,</span> <span class="kt">float</span> <span class="n">gloss</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">float3</span> <span class="n">sampleColor</span> <span class="o">=</span> <span class="n">colorBuffer</span><span class="p">.</span><span class="n">SampleLevel</span><span class="p">(</span><span class="n">sampTrilinearClamp</span><span class="p">,</span> <span class="n">samplePos</span><span class="p">,</span> <span class="n">mipChannel</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">float4</span><span class="p">(</span><span class="n">sampleColor</span> <span class="o">*</span> <span class="n">gloss</span><span class="p">,</span> <span class="n">gloss</span><span class="p">);</span>
<span class="p">}</span>

<span class="kt">float</span> <span class="nf">isoscelesTriangleNextAdjacent</span><span class="p">(</span><span class="kt">float</span> <span class="n">adjacentLength</span><span class="p">,</span> <span class="kt">float</span> <span class="n">incircleRadius</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// subtract the diameter of the incircle to get the adjacent side of the next level on the cone</span>
    <span class="k">return</span> <span class="n">adjacentLength</span> <span class="o">-</span> <span class="p">(</span><span class="n">incircleRadius</span> <span class="o">*</span> <span class="mf">2.0</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">///////////////////////////////////////////////////////////////////////////////////////</span>

<span class="n">float4</span> <span class="nf">main</span><span class="p">(</span><span class="n">VertexOut</span> <span class="n">pIn</span><span class="p">)</span> <span class="o">:</span> <span class="n">SV_TARGET</span>
<span class="p">{</span>
    <span class="n">int3</span> <span class="n">loadIndices</span> <span class="o">=</span> <span class="n">int3</span><span class="p">(</span><span class="n">pIn</span><span class="p">.</span><span class="n">posH</span><span class="p">.</span><span class="n">xy</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="c1">// get screen-space ray intersection point</span>
    <span class="n">float4</span> <span class="n">raySS</span> <span class="o">=</span> <span class="n">rayTracingBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">loadIndices</span><span class="p">).</span><span class="n">xyzw</span><span class="p">;</span>
    <span class="n">float3</span> <span class="n">fallbackColor</span> <span class="o">=</span> <span class="n">indirectSpecularBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">loadIndices</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>
    <span class="k">if</span><span class="p">(</span><span class="n">raySS</span><span class="p">.</span><span class="n">w</span> <span class="o">&lt;=</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">)</span> <span class="c1">// either means no hit or the ray faces back towards the camera</span>
    <span class="p">{</span>
        <span class="c1">// no data for this point - a fallback like localized environment maps should be used</span>
        <span class="k">return</span> <span class="n">float4</span><span class="p">(</span><span class="n">fallbackColor</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="kt">float</span> <span class="n">depth</span> <span class="o">=</span> <span class="n">depthBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">loadIndices</span><span class="p">).</span><span class="n">r</span><span class="p">;</span>
    <span class="n">float3</span> <span class="n">positionSS</span> <span class="o">=</span> <span class="n">float3</span><span class="p">(</span><span class="n">pIn</span><span class="p">.</span><span class="n">tex</span><span class="p">,</span> <span class="n">depth</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">linearDepth</span> <span class="o">=</span> <span class="n">linearizeDepth</span><span class="p">(</span><span class="n">depth</span><span class="p">);</span>
    <span class="n">float3</span> <span class="n">positionVS</span> <span class="o">=</span> <span class="n">pIn</span><span class="p">.</span><span class="n">viewRay</span> <span class="o">*</span> <span class="n">linearDepth</span><span class="p">;</span>
    <span class="c1">// since calculations are in view-space, we can just normalize the position to point at it</span>
    <span class="n">float3</span> <span class="n">toPositionVS</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">positionVS</span><span class="p">);</span>
    <span class="n">float3</span> <span class="n">normalVS</span> <span class="o">=</span> <span class="n">normalBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">loadIndices</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>

    <span class="c1">// get specular power from roughness</span>
    <span class="n">float4</span> <span class="n">specularAll</span> <span class="o">=</span> <span class="n">specularBuffer</span><span class="p">.</span><span class="n">Load</span><span class="p">(</span><span class="n">loadIndices</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">gloss</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">specularAll</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">specularPower</span> <span class="o">=</span> <span class="n">roughnessToSpecularPower</span><span class="p">(</span><span class="n">specularAll</span><span class="p">.</span><span class="n">a</span><span class="p">);</span>

    <span class="c1">// convert to cone angle (maximum extent of the specular lobe aperture)</span>
    <span class="c1">// only want half the full cone angle since we're slicing the isosceles triangle in half to get a right triangle</span>
    <span class="kt">float</span> <span class="n">coneTheta</span> <span class="o">=</span> <span class="n">specularPowerToConeAngle</span><span class="p">(</span><span class="n">specularPower</span><span class="p">)</span> <span class="o">*</span> <span class="mf">0.5</span><span class="n">f</span><span class="p">;</span>

    <span class="c1">// P1 = positionSS, P2 = raySS, adjacent length = ||P2 - P1||</span>
    <span class="n">float2</span> <span class="n">deltaP</span> <span class="o">=</span> <span class="n">raySS</span><span class="p">.</span><span class="n">xy</span> <span class="o">-</span> <span class="n">positionSS</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">adjacentLength</span> <span class="o">=</span> <span class="n">length</span><span class="p">(</span><span class="n">deltaP</span><span class="p">);</span>
    <span class="n">float2</span> <span class="n">adjacentUnit</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">deltaP</span><span class="p">);</span>

    <span class="n">float4</span> <span class="n">totalColor</span> <span class="o">=</span> <span class="n">float4</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">remainingAlpha</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">maxMipLevel</span> <span class="o">=</span> <span class="p">(</span><span class="kt">float</span><span class="p">)</span><span class="n">cb_numMips</span> <span class="o">-</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">;</span>
    <span class="kt">float</span> <span class="n">glossMult</span> <span class="o">=</span> <span class="n">gloss</span><span class="p">;</span>
    <span class="c1">// cone-tracing using an isosceles triangle to approximate a cone in screen space</span>
    <span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="mi">14</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="c1">// intersection length is the adjacent side, get the opposite side using trig</span>
        <span class="kt">float</span> <span class="n">oppositeLength</span> <span class="o">=</span> <span class="n">isoscelesTriangleOpposite</span><span class="p">(</span><span class="n">adjacentLength</span><span class="p">,</span> <span class="n">coneTheta</span><span class="p">);</span>

        <span class="c1">// calculate in-radius of the isosceles triangle</span>
        <span class="kt">float</span> <span class="n">incircleSize</span> <span class="o">=</span> <span class="n">isoscelesTriangleInRadius</span><span class="p">(</span><span class="n">oppositeLength</span><span class="p">,</span> <span class="n">adjacentLength</span><span class="p">);</span>

        <span class="c1">// get the sample position in screen space</span>
        <span class="n">float2</span> <span class="n">samplePos</span> <span class="o">=</span> <span class="n">positionSS</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="n">adjacentUnit</span> <span class="o">*</span> <span class="p">(</span><span class="n">adjacentLength</span> <span class="o">-</span> <span class="n">incircleSize</span><span class="p">);</span>

        <span class="c1">// convert the in-radius into screen size then check what power N to raise 2 to reach it - that power N becomes mip level to sample from</span>
        <span class="kt">float</span> <span class="n">mipChannel</span> <span class="o">=</span> <span class="n">clamp</span><span class="p">(</span><span class="n">log2</span><span class="p">(</span><span class="n">incircleSize</span> <span class="o">*</span> <span class="n">max</span><span class="p">(</span><span class="n">cb_depthBufferSize</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">cb_depthBufferSize</span><span class="p">.</span><span class="n">y</span><span class="p">)),</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="n">maxMipLevel</span><span class="p">);</span>

        <span class="cm">/*
         * Read color and accumulate it using trilinear filtering and weight it.
         * Uses pre-convolved image (color buffer) and glossiness to weigh color contributions.
         * Visibility is accumulated in the alpha channel. Break if visibility is 100% or greater (&gt;= 1.0f).
         */</span>
        <span class="n">float4</span> <span class="n">newColor</span> <span class="o">=</span> <span class="n">coneSampleWeightedColor</span><span class="p">(</span><span class="n">samplePos</span><span class="p">,</span> <span class="n">mipChannel</span><span class="p">,</span> <span class="n">glossMult</span><span class="p">);</span>

        <span class="n">remainingAlpha</span> <span class="o">-=</span> <span class="n">newColor</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
        <span class="k">if</span><span class="p">(</span><span class="n">remainingAlpha</span> <span class="o">&lt;</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">newColor</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*=</span> <span class="p">(</span><span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">abs</span><span class="p">(</span><span class="n">remainingAlpha</span><span class="p">));</span>
        <span class="p">}</span>
        <span class="n">totalColor</span> <span class="o">+=</span> <span class="n">newColor</span><span class="p">;</span>

        <span class="k">if</span><span class="p">(</span><span class="n">totalColor</span><span class="p">.</span><span class="n">a</span> <span class="o">&gt;=</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="k">break</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="n">adjacentLength</span> <span class="o">=</span> <span class="n">isoscelesTriangleNextAdjacent</span><span class="p">(</span><span class="n">adjacentLength</span><span class="p">,</span> <span class="n">incircleSize</span><span class="p">);</span>
        <span class="n">glossMult</span> <span class="o">*=</span> <span class="n">gloss</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="n">float3</span> <span class="n">toEye</span> <span class="o">=</span> <span class="o">-</span><span class="n">toPositionVS</span><span class="p">;</span>
    <span class="n">float3</span> <span class="n">specular</span> <span class="o">=</span> <span class="n">calculateFresnelTerm</span><span class="p">(</span><span class="n">specularAll</span><span class="p">.</span><span class="n">rgb</span><span class="p">,</span> <span class="n">abs</span><span class="p">(</span><span class="n">dot</span><span class="p">(</span><span class="n">normalVS</span><span class="p">,</span> <span class="n">toEye</span><span class="p">)))</span> <span class="o">*</span> <span class="n">CNST_1DIVPI</span><span class="p">;</span>

    <span class="c1">// fade rays close to screen edge</span>
    <span class="n">float2</span> <span class="n">boundary</span> <span class="o">=</span> <span class="n">abs</span><span class="p">(</span><span class="n">raySS</span><span class="p">.</span><span class="n">xy</span> <span class="o">-</span> <span class="n">float2</span><span class="p">(</span><span class="mf">0.5</span><span class="n">f</span><span class="p">,</span> <span class="mf">0.5</span><span class="n">f</span><span class="p">))</span> <span class="o">*</span> <span class="mf">2.0</span><span class="n">f</span><span class="p">;</span>
    <span class="k">const</span> <span class="kt">float</span> <span class="n">fadeDiffRcp</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">/</span> <span class="p">(</span><span class="n">cb_fadeEnd</span> <span class="o">-</span> <span class="n">cb_fadeStart</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">fadeOnBorder</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">saturate</span><span class="p">((</span><span class="n">boundary</span><span class="p">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">cb_fadeStart</span><span class="p">)</span> <span class="o">*</span> <span class="n">fadeDiffRcp</span><span class="p">);</span>
    <span class="n">fadeOnBorder</span> <span class="o">*=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">saturate</span><span class="p">((</span><span class="n">boundary</span><span class="p">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">cb_fadeStart</span><span class="p">)</span> <span class="o">*</span> <span class="n">fadeDiffRcp</span><span class="p">);</span>
    <span class="n">fadeOnBorder</span> <span class="o">=</span> <span class="n">smoothstep</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">,</span> <span class="n">fadeOnBorder</span><span class="p">);</span>
    <span class="n">float3</span> <span class="n">rayHitPositionVS</span> <span class="o">=</span> <span class="n">viewSpacePositionFromDepth</span><span class="p">(</span><span class="n">raySS</span><span class="p">.</span><span class="n">xy</span><span class="p">,</span> <span class="n">raySS</span><span class="p">.</span><span class="n">z</span><span class="p">);</span>
    <span class="kt">float</span> <span class="n">fadeOnDistance</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">saturate</span><span class="p">(</span><span class="n">distance</span><span class="p">(</span><span class="n">rayHitPositionVS</span><span class="p">,</span> <span class="n">positionVS</span><span class="p">)</span> <span class="o">/</span> <span class="n">cb_maxDistance</span><span class="p">);</span>
    <span class="c1">// the ray tracing step stores rdotv in the w component - always &gt; 0 due to the check at the start of this method</span>
    <span class="kt">float</span> <span class="n">fadeOnPerpendicular</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="n">lerp</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">,</span> <span class="n">saturate</span><span class="p">(</span><span class="n">raySS</span><span class="p">.</span><span class="n">w</span> <span class="o">*</span> <span class="mf">4.0</span><span class="n">f</span><span class="p">)));</span>
    <span class="kt">float</span> <span class="n">fadeOnRoughness</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="n">lerp</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">,</span> <span class="n">gloss</span> <span class="o">*</span> <span class="mf">4.0</span><span class="n">f</span><span class="p">));</span>
    <span class="kt">float</span> <span class="n">totalFade</span> <span class="o">=</span> <span class="n">fadeOnBorder</span> <span class="o">*</span> <span class="n">fadeOnDistance</span> <span class="o">*</span> <span class="n">fadeOnPerpendicular</span> <span class="o">*</span> <span class="n">fadeOnRoughness</span> <span class="o">*</span> <span class="p">(</span><span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">saturate</span><span class="p">(</span><span class="n">remainingAlpha</span><span class="p">));</span>

    <span class="k">return</span> <span class="nf">float4</span><span class="p">(</span><span class="n">lerp</span><span class="p">(</span><span class="n">fallbackColor</span><span class="p">,</span> <span class="n">totalColor</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="n">specular</span><span class="p">,</span> <span class="n">totalFade</span><span class="p">),</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The following image roughly illustrates the process.  From top to bottom, the floor of the image starts off perfectly mirror-like and gradually becomes rougher.  The red lines indicate the cones.  The circles inscribed in them show how the radii are used for mip selection (i.e., the larger the circle, the further down the mip chain), and the center of each circle is where the sample would be taken.  Notice that for a perfectly mirror-like surface, the cone diminishes to a straight line.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_cone_width_comparison.png" title="SSLR Cone Width Comparison">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_cone_width_comparison.png" alt="SSLR Cone Width Comparison" />
      </a>
    
  
  
</figure>
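<p>To make the geometry in the image concrete, the following C++ sketch mirrors the cone march from the shader above on the CPU and leaves out the color accumulation.  It assumes the opposite side is the full base of the isosceles triangle, 2·tan(θ)·adjacent, because <code>isoscelesTriangleOpposite</code> is not shown in this section.  The names <code>marchCone</code> and <code>ConeStep</code> are illustrative only.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct ConeStep {
    float sampleDistance; // distance from P1 to the incircle center (normalized screen units)
    float incircleRadius; // in-radius of the current isosceles triangle
    float mipLevel;       // mip level selected from the in-radius
};

// Assumption: full base of the triangle, not half of it.
float isoscelesOpposite(float adjacent, float halfAngle) {
    return 2.0f * std::tan(halfAngle) * adjacent;
}

// In-radius of an isosceles triangle with base a and height h:
// r = a * (sqrt(a^2 + 4h^2) - a) / (4h), the same form as isoscelesTriangleInRadius.
float isoscelesInRadius(float a, float h) {
    return (a * (std::sqrt(a * a + 4.0f * h * h) - a)) / (4.0f * h);
}

std::vector<ConeStep> marchCone(float adjacent, float halfAngle,
                                float maxScreenDim, float maxMip, int maxSteps) {
    std::vector<ConeStep> steps;
    for (int i = 0; i < maxSteps && adjacent > 0.0f; ++i) {
        float opposite = isoscelesOpposite(adjacent, halfAngle);
        float r = isoscelesInRadius(opposite, adjacent);
        // radius in pixels -> log2 picks the mip; log2(0) is -inf and clamps to mip 0
        float mip = std::clamp(std::log2(r * maxScreenDim), 0.0f, maxMip);
        // sample at the incircle center, which sits r before the end of the adjacent side
        steps.push_back({adjacent - r, r, mip});
        // shrink by the incircle diameter, as isoscelesTriangleNextAdjacent does
        adjacent -= 2.0f * r;
    }
    return steps;
}
```

<p>With a half-angle of zero the radius stays at zero, so every step samples the same point at mip 0, which is the straight line in the top row of the image.  In the shader, a gloss of 1.0 fills the alpha on the first sample and breaks out of the loop in that case.</p>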

<h1 id="bringing-it-all-together">Bringing It All Together</h1>

<p>As mentioned earlier, a fallback technique is useful for any screen space reflection implementation.  This implementation uses parallax-corrected cube maps based on Lagarde’s post [4], which in turn fall back to generic, non-corrected cube maps as a last resort.  These values are all computed before the screen space reflection pass starts and are accessed in the cone tracing step above through the “indirectSpecularBuffer” resource.  While fallback methods won’t be as exact as ray traced results, properly placed cube maps can help alleviate jarring artifacts.  The image below compares two sections of the same scene.  The left half of the image does not have good cube map placement, and the missing reflection data is quite noticeable under the sphere.  The right half includes blended parallax-corrected cube maps, and missed rays incur a much less severe penalty.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_fallback_bad_good_comparison.png" title="SSLR Fallback Bad/Good Comparison">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_fallback_bad_good_comparison.png" alt="SSLR Fallback Bad/Good Comparison" />
      </a>
    
  
  
</figure>
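<p>For readers unfamiliar with this fallback, the core of a parallax-corrected lookup in the spirit of Lagarde [4] is to intersect the reflection ray with a box proxy of the surrounding geometry.  The cube map is then sampled towards that hit point rather than along the raw reflection vector.  The C++ sketch below is a minimal version under two assumptions: the shaded point lies inside an axis-aligned box, and all inputs are in world space.  The function and parameter names are hypothetical, and the blending between multiple probes used in this implementation is not shown.</p>

```cpp
#include <algorithm>
#include <limits>

struct Vec3 { float x, y, z; };

static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Distance along a ray (origin p, direction d) to the slab plane it is heading towards.
static float slabExit(float p, float d, float lo, float hi) {
    if (d > 0.0f) return (hi - p) / d;
    if (d < 0.0f) return (lo - p) / d;
    return std::numeric_limits<float>::infinity(); // parallel: never exits through this slab
}

// Box-projected cube map lookup direction (hypothetical helper).
// Assumes positionWS lies inside [boxMin, boxMax] and reflDirWS is non-zero.
Vec3 parallaxCorrect(Vec3 positionWS, Vec3 reflDirWS, Vec3 boxMin, Vec3 boxMax, Vec3 probePosWS) {
    float t = std::min({slabExit(positionWS.x, reflDirWS.x, boxMin.x, boxMax.x),
                        slabExit(positionWS.y, reflDirWS.y, boxMin.y, boxMax.y),
                        slabExit(positionWS.z, reflDirWS.z, boxMin.z, boxMax.z)});
    Vec3 hitWS = positionWS + reflDirWS * t; // where the ray leaves the box proxy
    return hitWS - probePosWS;               // direction from the capture point to the hit
}
```

<p>The non-corrected fallback simply samples the cube map with <code>reflDirWS</code> itself, which is only exact for infinitely distant surroundings.</p>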

<p>Another artifact of inadequate fallback techniques can also be seen in the left image above.  As the traced ray approaches the edge of the screen, its result is faded out; the code for this is near the bottom of the cone tracing shader.  Without a decent fallback technique in place, the difference between the center of the screen and the edges can be quite drastic.  The right half of the image shows such fading only to a very minor degree, most noticeably on the left edge of the picture.</p>
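<p>The border fade reduces to a few lines.  The C++ sketch below mirrors the <code>boundary</code>, <code>fadeOnBorder</code> and <code>smoothstep</code> lines from the end of the cone tracing shader.  It assumes the hit position is given in [0, 1] texture coordinates and that <code>fadeStart</code>/<code>fadeEnd</code> play the role of <code>cb_fadeStart</code>/<code>cb_fadeEnd</code>, measured from 0 at the screen center to 1 at the edge.</p>

```cpp
#include <algorithm>
#include <cmath>

float saturate(float v) { return std::clamp(v, 0.0f, 1.0f); }

// HLSL-style smoothstep(0, 1, x)
float smoothstep01(float x) {
    x = saturate(x);
    return x * x * (3.0f - 2.0f * x);
}

// CPU mirror of the screen-edge fade; hitU/hitV are the ray hit in [0, 1] texture space.
float borderFade(float hitU, float hitV, float fadeStart, float fadeEnd) {
    float bx = std::fabs(hitU - 0.5f) * 2.0f; // 0 at the center, 1 at the left/right edge
    float by = std::fabs(hitV - 0.5f) * 2.0f; // 0 at the center, 1 at the top/bottom edge
    float fadeDiffRcp = 1.0f / (fadeEnd - fadeStart);
    float fade = (1.0f - saturate((bx - fadeStart) * fadeDiffRcp)) *
                 (1.0f - saturate((by - fadeStart) * fadeDiffRcp));
    return smoothstep01(fade);
}
```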

<p>Due to the numerous issues mentioned near the start of the post, rays facing back towards the viewer are disallowed entirely.  This is an implementation choice and by no means a requirement.  Implementers should experiment with their own scenes and determine whether backwards-traversing rays give acceptable results for their specific use cases.  In the implementation above, ray results start to fade as the reflection direction approaches perpendicular to the view direction, so as not to cause a sharp cutoff at any one point.</p>
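<p>The back-facing rejection and the perpendicular fade can be sketched together.  This hypothetical C++ helper assumes r·v is computed between the normalized view-space reflection direction and <code>toPositionVS</code>, so positive values point away from the camera.  That matches the shader treating non-positive <code>raySS.w</code> values as unusable, and the 4.0 ramp matches <code>fadeOnPerpendicular</code>.</p>

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical helper: weight for a reflection ray given the normalized view-space
// reflection direction and toPositionVS (camera towards the shaded point).
float facingFade(Vec3 reflDirVS, Vec3 toPositionVS) {
    float rdotv = dot3(reflDirVS, toPositionVS);
    if (rdotv <= 0.0f) {
        return 0.0f; // heading back towards the camera (or exactly sideways): rejected
    }
    // ramp from 0 at perpendicular to 1 once r.v reaches 0.25, like saturate(raySS.w * 4.0f)
    return std::clamp(rdotv * 4.0f, 0.0f, 1.0f);
}
```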

<p>A final nicety added to this implementation is that the indirect specular buffer is included in the light buffer during the initial convolution and is subtracted back out before the cone tracing pass is applied.  This allows metals to be reflected more accurately in the cone traced step.  In the image below, the left half does not take these steps into consideration, and the metal’s reflection is black.  The specular highlight still shows up in the reflection because it comes from direct lighting (the sun, in this case), but none of the indirect light is included.  In the right half of the image, these effects are enabled and the sky is visible in the reflected sphere.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_metal_reflections_comparison.png" title="SSLR Metal Reflections Comparison">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_metal_reflections_comparison.png" alt="SSLR Metal Reflections Comparison" />
      </a>
    
  
  
</figure>

<p>The U-shape on the bottom of each sphere is due to not having good fallback techniques in this area of the scene, and can be alleviated as discussed previously.</p>

<h1 id="areas-of-improvement">Areas of Improvement</h1>

<p>The biggest area needing improvement in this technique’s current state is the blur.  The current separable Gaussian blur, while fast, can spread reflections onto parts of the scene where they don’t belong.  A feature-aware blur, such as a bilateral blur, is likely a better candidate here.  Specifically, the blur will likely need to account for large depth discrepancies and reject samples that do not fall within a specified threshold.  It should be noted that combating these types of artifacts is a potential strength of Uludag’s proposed visibility buffer.</p>
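<p>As a rough illustration of the depth-rejection idea, the C++ sketch below runs one horizontal pass of a separable blur over a single scanline.  It skips taps whose linear depth differs from the center texel by more than a threshold and renormalizes the remaining weights.  It is a hypothetical sketch, not part of the current implementation: colors are reduced to a single float channel, edge texels are clamped, and the threshold would need tuning per scene.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical depth-aware pass over one scanline (single-channel color for brevity).
// weights[0] is the center tap, weights[k] the tap k texels away on either side.
std::vector<float> depthAwareBlurRow(const std::vector<float>& color,
                                     const std::vector<float>& linearDepth,
                                     const std::vector<float>& weights,
                                     float depthThreshold) {
    const int n = static_cast<int>(color.size());
    const int radius = static_cast<int>(weights.size()) - 1;
    std::vector<float> out(color.size());
    for (int x = 0; x < n; ++x) {
        float sum = 0.0f;
        float weightSum = 0.0f;
        for (int k = -radius; k <= radius; ++k) {
            int s = std::clamp(x + k, 0, n - 1); // clamp taps to the row
            if (std::fabs(linearDepth[s] - linearDepth[x]) > depthThreshold) {
                continue; // large depth discrepancy: likely a different surface, reject
            }
            float w = weights[std::abs(k)];
            sum += color[s] * w;
            weightSum += w;
        }
        out[x] = sum / weightSum; // the center tap always passes, so weightSum > 0
    }
    return out;
}
```

<p>With a very large threshold the function degenerates to the plain Gaussian pass, which is where the bleeding described above comes from.</p>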

<p>The blur can also be sped up while producing the same results by using the linear-sampling approach described in [10], which relies on hardware bilinear filtering to fetch two texels per tap.  This is slated as future work for the current effect.</p>
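<p>The core of the approach in [10] is merging each pair of adjacent discrete taps into one bilinear fetch placed between them.  The C++ sketch below computes those merged offsets and weights from one side of a discrete kernel.  The function name is hypothetical, and the fetches themselves would still be done in the blur shader.</p>

```cpp
#include <cstddef>
#include <vector>

struct LinearTap {
    float offset; // texel offset from the center (fractional offsets land between texels)
    float weight; // combined weight of the merged texels
};

// Hypothetical helper: merges one side of a discrete kernel (w[0] is the center)
// into bilinear taps as described in [10].
std::vector<LinearTap> toLinearTaps(const std::vector<float>& w) {
    std::vector<LinearTap> taps;
    taps.push_back({0.0f, w[0]}); // the center stays a single point sample
    for (std::size_t i = 1; i < w.size(); i += 2) {
        float w1 = w[i];
        float w2 = (i + 1 < w.size()) ? w[i + 1] : 0.0f; // an odd leftover tap pairs with zero
        float sum = w1 + w2;
        // weighted average of the two texel offsets: bilinear filtering reproduces both weights
        float offset = (static_cast<float>(i) * w1 + static_cast<float>(i + 1) * w2) / sum;
        taps.push_back({offset, sum});
    }
    return taps;
}
```

<p>For the 9-tap example kernel in [10], this reduces nine fetches to five: the center plus taps at roughly 1.385 and 3.231 texels on each side.</p>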

<p>While testing storage options for the blurred results, a Texture2DArray was also tried.  Although it improved the overall perceived smoothness of the blur across varying roughness values, the extra memory and the increased time to run the blur several times over full-size textures were not worth the small improvement.  The mip-chained texture provides decent results and blends adequately with trilinear sampling.  While testing various kernel sizes and sigmas, the calculator at [11] was extremely helpful for quick iteration.</p>
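<p>When iterating on kernel sizes and sigmas in code rather than with a calculator, a point-sampled kernel is easy to generate.  The C++ helper below is a hypothetical sketch that returns one side of a normalized 1D Gaussian.  The calculator at [11] may integrate over each texel instead, so its weights can differ slightly.</p>

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: one side of a normalized, point-sampled 1D Gaussian kernel.
// Index 0 is the center tap; index i is used for offsets +i and -i.
std::vector<float> gaussianKernelHalf(int radius, float sigma) {
    std::vector<float> w(static_cast<std::size_t>(radius) + 1);
    float total = 0.0f;
    for (int i = 0; i <= radius; ++i) {
        w[i] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        total += (i == 0) ? w[i] : 2.0f * w[i]; // off-center taps are used on both sides
    }
    for (float& v : w) {
        v /= total; // normalize so the full kernel sums to 1
    }
    return w;
}
```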

<p>One further improvement to the blurred result in the current implementation is to sample several points within the inscribed circle instead of just the center and blend the results together.  Sampling multiple points this way trades performance for quality.  This technique is demonstrated in [8] on page 3 of the thread.</p>
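<p>One way to pick those points is a golden-angle (Vogel) spiral that spreads a chosen number of samples with roughly uniform density over the incircle.  The C++ sketch below generates such a pattern; the fetched colors would then be averaged.  The pattern, the names and the equal weighting are assumptions for illustration and not necessarily what is discussed in [8], and each extra sample adds a texture fetch per cone step.</p>

```cpp
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical sample pattern: `count` points spread over the incircle (center, radius)
// with a golden-angle spiral; the caller would average the colors fetched at each point.
std::vector<Vec2> incircleSamples(Vec2 center, float radius, int count) {
    const float goldenAngle = 2.39996323f; // pi * (3 - sqrt(5)) radians
    std::vector<Vec2> pts;
    for (int i = 0; i < count; ++i) {
        // sqrt keeps the area density roughly uniform instead of clustering at the center
        float r = radius * std::sqrt((i + 0.5f) / count);
        float a = i * goldenAngle;
        pts.push_back({center.x + r * std::cos(a), center.y + r * std::sin(a)});
    }
    return pts;
}
```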

<p>Another area of improvement would be to update the reflection model to better match the lighting model used in the rest of the engine’s rendering pipeline.  As mentioned previously, the cone-tracing step above is based on Uludag’s explanation in [3].  In its current state, the effect uses an approximation of the Phong model, while the rest of the pipeline uses GGX for its specular distribution term.  Uludag does offer suggestions in his article on how to adapt to other reflection models, and this may be the topic of a future post once implemented.</p>
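<p>As a sketch of where a Phong-to-GGX change would show up, the C++ helpers below compute the half-angle of a cone around the lobe axis that contains a chosen fraction of the lobe.  One version handles a cos<sup>n</sup> Phong-style lobe; the other handles GGX, using the inverse CDF from standard GGX importance sampling.  Neither is claimed to be Uludag’s formulation: the energy fraction is a free parameter, and the GGX version bounds the half-vector lobe, so the reflected lobe is roughly twice as wide.</p>

```cpp
#include <cmath>

// Phong-style lobe with density proportional to cos^n(theta) over the hemisphere:
// the fraction of the lobe inside theta is 1 - cos^(n+1)(theta).
float phongConeHalfAngle(float specularPower, float fraction) {
    return std::acos(std::pow(1.0f - fraction, 1.0f / (specularPower + 1.0f)));
}

// GGX with roughness alpha: inverting the CDF of D(h) * cos(theta_h) gives
// cos^2(theta) = (1 - f) / (1 + (alpha^2 - 1) * f).
float ggxConeHalfAngle(float alpha, float fraction) {
    float cos2 = (1.0f - fraction) / (1.0f + (alpha * alpha - 1.0f) * fraction);
    return std::acos(std::sqrt(cos2));
}
```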

<p>Furthermore, using more efficiently packed buffers for lighting data could improve this technique’s performance.  As mentioned above, all buffers containing lighting data are 64-bit floating point buffers, with a 16-bit float in each of their four channels.  Future experimentation with a more compact 32-bit floating point format such as DirectX’s <code class="language-plaintext highlighter-rouge">DXGI_FORMAT_R11G11B10_FLOAT</code> should be considered.</p>
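<p>The potential saving is easy to estimate.  The C++ sketch below totals the bytes of a texture with a full mip chain for a given bytes-per-pixel value: 8 for the current 64-bit buffers and 4 for <code class="language-plaintext highlighter-rouge">DXGI_FORMAT_R11G11B10_FLOAT</code>.  The function name is hypothetical.  The byte count does not capture the smaller format’s trade-offs: it has no alpha channel and no sign bit, and its mantissas are 6 bits (red, green) or 5 bits (blue) versus 10 bits for a 16-bit float.</p>

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: total bytes for a 2D texture with a full mip chain.
// Assumes width and height are at least 1.
std::uint64_t mipChainBytes(std::uint32_t width, std::uint32_t height, std::uint32_t bytesPerPixel) {
    std::uint64_t total = 0;
    while (true) {
        total += static_cast<std::uint64_t>(width) * height * bytesPerPixel;
        if (width == 1 && height == 1) {
            break; // the last level is 1x1
        }
        width = std::max<std::uint32_t>(1, width / 2);
        height = std::max<std::uint32_t>(1, height / 2);
    }
    return total;
}
```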

<h1 id="results">Results</h1>

<p>This section contains images generated using the techniques described above.  Each image is composed of a few smaller images showing increasing roughness in the floor material.</p>

<p>The first image shows the effect working on a large scale in an area of the scene spanning over 100 meters.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_result_1.png" title="SSLR Results 1">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_result_1.png" alt="SSLR Results 1" />
      </a>
    
  
  
</figure>

<p>The second image shows the effect in a more localized setting at ground level, similar to how a user would perceive the world in a first-person game or application.  The area uses parallax-corrected cube maps as a fallback technique, and missed ray intersections, such as those likely to occur around concave objects (the soldier in this case), are blended very well.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_result_2.png" title="SSLR Results 2">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_result_2.png" alt="SSLR Results 2" />
      </a>
    
  
  
</figure>

<p>The third image again shows the effect in a localized setting.  The later time of day creates a stronger contrast between shadowed and unshadowed areas, which makes the effect more pronounced and better shows how a rougher surface blurs the reflection and even starts to pull it vertically.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_result_3.png" title="SSLR Results 3">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_result_3.png" alt="SSLR Results 3" />
      </a>
    
  
  
</figure>

<p>The fourth image again uses high-contrast lighting to help demonstrate how the effect changes as the floor material goes from very smooth to very rough.</p>

<figure class=" ">
  
    
      <a href="/assets/images/ss_glossy_reflections_post/sslr_result_4.png" title="SSLR Results 4">
          <img src="/assets/images/ss_glossy_reflections_post/sslr_result_4.png" alt="SSLR Results 4" />
      </a>
    
  
  
</figure>

<p>The following videos show the effect running in a real-time interactive application.  For best viewing, it is recommended to either run the videos in full-screen with high-definition enabled, or visit their respective YouTube pages by following these links:</p>

<p><a href="https://www.youtube.com/watch?v=J2nju5q_2iw">Video 1</a></p>

<p><a href="https://www.youtube.com/watch?v=nh9r22OwHEw">Video 2</a></p>

<!-- Courtesy of embedresponsively.com //-->

<div class="responsive-video-container">
    <iframe src="https://www.youtube-nocookie.com/embed/J2nju5q_2iw" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe>
  </div>

<!-- Courtesy of embedresponsively.com //-->

<div class="responsive-video-container">
    <iframe src="https://www.youtube-nocookie.com/embed/nh9r22OwHEw" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe>
  </div>

<h1 id="conclusion">Conclusion</h1>

<p>This post has presented a full implementation of a solution for glossy screen space reflections.  While the abundance of programmer art and MS Paint images may not be quite as fantastical as those rendered using a proper studio’s asset collection, the contributions of the effect to the final result should be clear.  Even with a basic reflection model, the technique serves to add more realism to a scene and provides a means for believable real-time reflections for rough surfaces.</p>

<h1 id="acknowledgements">Acknowledgements</h1>

<p>I first came into contact with Bruce Wilkie about a year ago when he posted a topic on gamedev.net.  We were both working on implementing Yasin Uludag’s article from GPU Pro 5 [3].  We spoke a few times on the subject, and it became abundantly clear that he was much more knowledgeable on the matter than I was.  He was critical in helping me understand Uludag’s use of the hierarchical Z-buffer for ray tracing and in working the kinks out of my initial attempts at implementing it [8].  Bruce kindly offered to keep in touch and to answer questions about issues I might run into while implementing different features in my engine, which I work on as a hobby in my spare time.  I’ve certainly taken advantage of that offer over the course of the year, and he’s offered advice on almost everything graphics-related that’s been posted to this blog to date.  He has shown a great deal of patience in helping clarify certain concepts to me, and has a knack for explaining how to arrive at a solution without simply giving the answer away, an extremely valuable teaching technique.  He also brought the more efficient blur from [10] to my attention as a solid alternative to the standard approach used above, and offered a few more suggestions for improving the first draft of this post.</p>

<p>Thank you, Bruce.</p>

<p>I would also like to thank Morgan McGuire (<a href="https://twitter.com/morgan3d">@morgan3d</a>) and Mike Mara for open-sourcing and generously licensing their DDA-based ray tracing code.  Thanks also go to Ben Hopkins (<a href="https://twitter.com/kode80">@kode80</a>) for doing the same with his implementation.</p>

<h1 id="references">References</h1>

<p>[1] Morgan McGuire and Mike Mara.  <a href="http://casual-effects.blogspot.com/2014/08/screen-space-ray-tracing.html">http://casual-effects.blogspot.com/2014/08/screen-space-ray-tracing.html</a></p>

<p>[2] Ben Hopkins.  <a href="http://www.kode80.com/blog/2015/03/11/screen-space-reflections-in-unity-5/">http://www.kode80.com/blog/2015/03/11/screen-space-reflections-in-unity-5/</a></p>

<p>[3] Yasin Uludag.  GPU Pro 5.  <em>Hi-Z Screen-Space Cone-Traced Reflections.</em></p>

<p>[4] Sébastien Lagarde.  <a href="https://seblagarde.wordpress.com/2012/09/29/image-based-lighting-approaches-and-parallax-corrected-cubemap/">https://seblagarde.wordpress.com/2012/09/29/image-based-lighting-approaches-and-parallax-corrected-cubemap/</a></p>

<p>[5] Sébastien Lagarde.  <a href="https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/">https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/</a></p>

<p>[6] Matt Pettineo.  <a href="https://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/">https://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/</a></p>

<p>[7] Eric W. Weisstein.  “Inradius.”  From MathWorld–A Wolfram Web Resource.  <a href="http://mathworld.wolfram.com/Inradius.html">http://mathworld.wolfram.com/Inradius.html</a></p>

<p>[8] <a href="https://www.gamedev.net/topic/658702-help-with-gpu-pro-5-hi-z-screen-space-reflections/">https://www.gamedev.net/topic/658702-help-with-gpu-pro-5-hi-z-screen-space-reflections/</a></p>

<p>[9] <a href="https://en.wikipedia.org/wiki/Specular_highlight">https://en.wikipedia.org/wiki/Specular_highlight</a></p>

<p>[10] <a href="http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/">http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/</a></p>

<p>[11] <a href="http://dev.theomader.com/gaussian-kernel-calculator/">http://dev.theomader.com/gaussian-kernel-calculator/</a></p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[Reflections are an important effect present in any routine attempting to approximate global illumination.  They give the user important spatial information about an object's location, as well as provide an important visual indicator of the surface properties of certain materials.]]></summary></entry><entry><title type="html">Dealing with Shadow Map Artifacts</title><link href="https://willpgfx.com/2015/05/dealing-with-shadow-map-artifacts/" rel="alternate" type="text/html" title="Dealing with Shadow Map Artifacts" /><published>2015-05-10T00:00:00-04:00</published><updated>2015-05-10T00:00:00-04:00</updated><id>https://willpgfx.com/2015/05/dealing-with-shadow-map-artifacts</id><content type="html" xml:base="https://willpgfx.com/2015/05/dealing-with-shadow-map-artifacts/"><![CDATA[<aside class="sidebar__right">
<nav class="toc">
    <header><h4 class="nav__title"><i class="fas fa-file-alt"></i> On This Page</h4></header>
<ul class="toc__menu" id="markdown-toc">
  <li><a href="#perspective-aliasing" id="markdown-toc-perspective-aliasing">Perspective Aliasing</a></li>
  <li><a href="#shadow-acne" id="markdown-toc-shadow-acne">Shadow Acne</a></li>
  <li><a href="#peter-panning" id="markdown-toc-peter-panning">Peter Panning</a></li>
  <li><a href="#working-in-the-shader" id="markdown-toc-working-in-the-shader">Working in the Shader</a></li>
  <li><a href="#references" id="markdown-toc-references">References</a></li>
</ul>

  </nav>
</aside>

<p>In a <a href="/2015/05/stacks-on-stacks/">previous post</a> on stack stabilization, the linked video showed a few major issues with shadow mapping.  These issues have plagued the technique since its inception, and while many methods help alleviate them, it’s still very difficult to get rid of them completely.  Here we’ll review some common artifacts and discuss potential ways to squash them.</p>

<h1 id="perspective-aliasing">Perspective Aliasing</h1>

<p>These types of artifacts are perhaps the simplest to alleviate.  Stair-like artifacts outlining the projected shadows are generally caused by the resolution of the shadow map being too low.  Compare the halves in the image below.  The top half shows a scene using a shadow map resolution of 256x256, while the bottom shows the same scene using a resolution of 2048x2048.</p>

<figure class=" ">
  
    
      <a href="/assets/images/shadowmap_artifacts_post/0_sm_resolution_comparison.png" title="Shadow Map Resolution Comparison">
          <img src="/assets/images/shadowmap_artifacts_post/0_sm_resolution_comparison.png" alt="Shadow Map Resolution Comparison" />
      </a>
    
  
  
</figure>
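<p>To put rough numbers on that comparison: for an orthographic shadow projection (the kind a directional light typically uses), each texel covers about the projection’s width divided by the map’s resolution.  The C++ sketch below assumes a square projection of a hypothetical <code>extent</code> in world units, and it ignores the extra stretching that occurs when a receiver is viewed at a grazing angle to the light.</p>

```cpp
#include <cassert>

// World-space width of one shadow map texel for a square orthographic
// shadow projection covering `extent` world units at `resolution` texels.
double texelWorldSize(double extent, int resolution)
{
    assert(resolution > 0);
    return extent / static_cast<double>(resolution);
}

// How many times wider each texel (and so each shadow "stair step") is at
// resolution `low` than at resolution `high`, for the same projection.
double stairStepRatio(int low, int high)
{
    return texelWorldSize(1.0, low) / texelWorldSize(1.0, high);
}
```

<p>With a hypothetical 64-unit-wide projection, a 256x256 map gives 0.25 world units per texel and a 2048x2048 map gives 0.03125, so, assuming both halves of the image share the same projection, each stair step in the top half is about eight times wider.</p>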

<p>Unfortunately, increasing the resolution will only get us so far.  Even at high resolutions, if the viewer is close enough to the receiving surface, tiny stair-like artifacts will still be noticeable along the edges of projected shadows.  A common remedy is a technique called percentage closer filtering (PCF).  Instead of performing a single depth comparison at one location, PCF performs the comparison at several points around that location and averages the pass/fail results, so texels along a shadow’s edge end up partially shadowed and the edge appears soft.  The image below shows an up-close view of a shadow map with 2048x2048 resolution, first without and then with PCF enabled.</p>

<figure class=" ">
  
    
      <a href="/assets/images/shadowmap_artifacts_post/1_pcf_comparison.png" title="PCF Comparison">
          <img src="/assets/images/shadowmap_artifacts_post/1_pcf_comparison.png" alt="PCF Comparison" />
      </a>
    
  
  
</figure>

<p>There are several different sampling patterns that can be used for the PCF algorithm.  Currently, I’m using a simple box filter around the center location.  Other sampling patterns, such as a rotated Poisson disc, are also popular and produce varying results.</p>
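<p>The box-filter variant can be modeled on the CPU to show what each PCF tap does.  This C++ sketch is illustrative rather than the shader used here: the shadow map is a hypothetical row-major array of stored depths, out-of-range taps are clamped to the edge, a tap counts as lit when the receiver depth is less than or equal to the stored depth, and a 3x3 kernel (radius 1) is the default.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side shadow map: row-major stored depths.
struct ShadowMap
{
    int width;
    int height;
    std::vector<float> depth;

    float at(int x, int y) const
    {
        // Clamp to the edge, like a CLAMP texture address mode.
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return depth[static_cast<std::size_t>(y) * width + x];
    }
};

// Percentage closer filtering with a (2r+1)x(2r+1) box kernel centered on
// texel (cx, cy). Each tap does its own depth comparison, and the pass/fail
// results (not the depths) are averaged into a lit fraction in [0, 1].
float pcfBox(const ShadowMap& sm, int cx, int cy, float receiverDepth, int r = 1)
{
    assert(sm.width > 0 && sm.height > 0);
    assert(sm.depth.size() == static_cast<std::size_t>(sm.width) * sm.height);
    int lit = 0;
    int taps = 0;
    for (int dy = -r; dy <= r; ++dy)
    {
        for (int dx = -r; dx <= r; ++dx)
        {
            if (receiverDepth <= sm.at(cx + dx, cy + dy))
                ++lit;
            ++taps;
        }
    }
    return static_cast<float>(lit) / static_cast<float>(taps);
}
```

<p>On a GPU, the comparison itself is usually done by the hardware through a comparison sampler (in HLSL, a <code>SamplerComparisonState</code> used with <code>SampleCmp</code> or <code>SampleCmpLevelZero</code>), which with linear filtering also blends the results of neighboring texels for each tap.</p>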

<h1 id="shadow-acne">Shadow Acne</h1>

<p>Another common artifact found in shadow mapping is shadow acne, or erroneous self-shadowing.  This generally occurs when the depth stored in the shadow map and the depth of the receiving point, transformed into light space, are so close that limited depth precision, combined with the fact that each shadow map texel stores a single depth for a whole patch of a sloped surface, causes the depth test to fail incorrectly.  The image below shows an example of these artifacts present (top) and addressed (bottom).</p>

<figure class=" ">
  
    
      <a href="/assets/images/shadowmap_artifacts_post/2_shadowacne_comparison.png" title="Shadow Acne">
          <img src="/assets/images/shadowmap_artifacts_post/2_shadowacne_comparison.png" alt="Shadow Acne" />
      </a>
    
  
  
</figure>

<p>There are a few ways to address this issue.  It’s so prevalent that most graphics APIs let you create a rasterizer state that includes both a constant depth bias and a slope-scaled depth bias.  During shadow map creation, these two values are combined to offset each stored depth, pushing it out of the range where precision errors would cause incorrect comparisons.  One must be careful when setting these bias values.  Too high a value can cause the next issue to be discussed, peter panning, while too low a value will still let acne artifacts creep back into the final image.</p>
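<p>For reference, Direct3D 10/11 document how the rasterizer combines these values (the <code>DepthBias</code>, <code>SlopeScaledDepthBias</code>, and <code>DepthBiasClamp</code> fields of the rasterizer description).  The C++ sketch below reproduces that documented formula for a UNORM depth buffer only, taking <code>r</code> as the smallest nonzero value an n-bit UNORM can store, 1/(2^n − 1); floating-point depth buffers derive <code>r</code> differently and are not covered.  <code>maxDepthSlope</code> stands for the larger of |dz/dx| and |dz/dy| across the primitive, and the function names are hypothetical.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Depth bias as documented for Direct3D 10/11 with a UNORM depth buffer:
//   bias = depthBias * r + slopeScaledDepthBias * maxDepthSlope
// followed by an optional clamp. r = 1 / (2^n - 1) for an n-bit UNORM buffer.
double unormDepthBias(int depthBias, double slopeScaledDepthBias,
                      double depthBiasClamp, double maxDepthSlope, int depthBits)
{
    assert(depthBits > 0 && depthBits <= 32);
    const double r = 1.0 / (std::ldexp(1.0, depthBits) - 1.0);
    double bias = depthBias * r + slopeScaledDepthBias * maxDepthSlope;
    // A clamp of 0 disables clamping; its sign selects min or max.
    if (depthBiasClamp > 0.0)
        bias = std::min(depthBiasClamp, bias);
    else if (depthBiasClamp < 0.0)
        bias = std::max(depthBiasClamp, bias);
    return bias;
}

// The slope term: the steeper of the primitive's screen-space depth gradients.
double maxDepthSlope(double dzdx, double dzdy)
{
    return std::max(std::abs(dzdx), std::abs(dzdy));
}
```

<p>This makes the trade-off concrete: the constant term moves every fragment by a fixed number of depth steps, while the slope term grows with the primitive’s depth gradient, so surfaces seen nearly edge-on by the light receive the largest offsets.</p>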

<h1 id="peter-panning">Peter Panning</h1>

<p>It’s frustrating when introducing a fix for one thing breaks something else.  That’s exactly what we can potentially end up with when we use depth biases for shadow maps.  Peter Panning is caused by offsetting the depth values in light space too much.  The result is that the shadow becomes detached from the object casting it.  The image below displays this phenomenon.  In both halves of the image, the blocks are resting on the ground, but in the top half the depth bias is so large that it pushes the shadows away from the casters, making the blocks appear to float.  The bottom half uses a more appropriate depth bias, and the shadows appear properly attached.</p>

<figure class=" ">
  
    
      <a href="/assets/images/shadowmap_artifacts_post/3_peter_panning_comparison.png" title="Peter Panning">
          <img src="/assets/images/shadowmap_artifacts_post/3_peter_panning_comparison.png" alt="Peter Panning" />
      </a>
    
  
  
</figure>

<p>Bangarang!</p>

<h1 id="working-in-the-shader">Working in the Shader</h1>

<p>Using hardware depth biasing in the rasterizer is nice in that it’s fast and easy to set up.  Sometimes, however, we have different needs for our shadow maps and want to delay these correction steps until later in the pipeline.  Though I’ve since reverted to a more basic approach, when first implementing <a href="/2015/04/separable-subsurface-scattering/">transmittance through thin materials</a> I switched my shadow map vertex shaders to output linear depth values to make the implementation a bit more straightforward.  Had I used the rasterizer state offsets described above, I would have had to somehow track and undo those offsets before the values could be used in my transmittance calculations, or else suffer major artifacts from the depth discrepancies.  Fortunately, there are several excellent resources that describe alternative methods for getting rid of shadow artifacts (see references), and with a combination of ideas borrowed from all of them, I’ve been able to get a fairly decent implementation working.  Below is some example code, in client-side C++ and HLSL.</p>

<p>Storing linear depth values in the shadow map:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// client code</span>
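<span class="n">Matrix4x4f</span> <span class="n">linearProjectionMtx</span> <span class="o">=</span> <span class="n">createPerspectiveFOVLHMatrix4x4f</span><span class="p">(</span><span class="n">fovy</span><span class="p">,</span> <span class="n">aspect</span><span class="p">,</span> <span class="n">nearPlane</span><span class="p">,</span> <span class="n">farPlane</span><span class="p">);</span>
<span class="c1">// divide the two terms that produce clip-space z (rc33 and rc34 in this matrix layout) by the far plane;</span>
<span class="c1">// combined with posH.z *= posH.w in the vertex shader, the stored depth becomes (z - near) / (far - near)</span>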
<span class="n">linearProjectionMtx</span><span class="p">.</span><span class="n">rc33</span> <span class="o">/=</span> <span class="n">farPlane</span><span class="p">;</span>
<span class="n">linearProjectionMtx</span><span class="p">.</span><span class="n">rc34</span> <span class="o">/=</span> <span class="n">farPlane</span><span class="p">;</span>

<span class="c1">// shadow map vertex shader</span>
<span class="n">float4</span> <span class="nf">main</span><span class="p">(</span><span class="n">VertexIn</span> <span class="n">vIn</span><span class="p">)</span> <span class="o">:</span> <span class="n">SV_POSITION</span>
<span class="p">{</span>
    <span class="c1">// transform to homogeneous clip space</span>
    <span class="n">float4</span> <span class="n">posH</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">float4</span><span class="p">(</span><span class="n">vIn</span><span class="p">.</span><span class="n">posL</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">),</span> <span class="n">worldViewProjectionMatrix</span><span class="p">);</span>
    <span class="c1">// store linear depth to shadow map - there is no change to the value stored for orthographic projections since w == 1</span>
    <span class="n">posH</span><span class="p">.</span><span class="n">z</span> <span class="o">*=</span> <span class="n">posH</span><span class="p">.</span><span class="n">w</span><span class="p">;</span>
    <span class="k">return</span> <span class="n">posH</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
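<p>To see why dividing by the far plane and multiplying <code>z</code> by <code>w</code> yields linear depth, the sketch below assumes a standard Direct3D-style left-handed perspective projection, in which view-space depth z maps to clip-space z′ = z·Q − Q·n with Q = f/(f − n), and w′ = z.  Dividing both z terms by f gives Q(z − n)/f = (z − n)/(f − n), and multiplying by w before the hardware divides by w leaves exactly that value: 0 at the near plane, 1 at the far plane, and linear in between.  The function names are hypothetical.</p>

```cpp
#include <cassert>

// Depth written by a conventional D3D-style perspective projection:
// z_clip = z * Q - Q * n, w_clip = z, stored value = z_clip / w_clip.
double standardDepth(double z, double n, double f)
{
    assert(z > 0.0 && n > 0.0 && f > n);
    const double q = f / (f - n);
    const double zClip = z * q - q * n;
    const double wClip = z;
    return zClip / wClip; // of the form a + b / z, i.e. hyperbolic in z
}

// Depth written with the linear-Z trick: both z terms of the projection are
// divided by the far plane on the client, and the vertex shader multiplies
// z_clip by w_clip so the hardware's divide by w cancels out.
double linearTrickDepth(double z, double n, double f)
{
    assert(z > 0.0 && n > 0.0 && f > n);
    const double q = f / (f - n);
    double zClip = (z * q) / f - (q * n) / f; // z terms scaled by 1 / far
    const double wClip = z;
    zClip *= wClip;                           // posH.z *= posH.w
    return zClip / wClip;                     // the rasterizer's perspective divide
}
```

<p>For example, with n = 1 and f = 101, a point at z = 51, halfway between the planes, stores about 0.990 with the conventional projection but 0.5 with the trick.</p>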

<p>Next, the light shader offsets the point along its surface normal, scaled by the angle to the light and the shadow map texel size, before the point is transformed by the shadow transform matrix.  Because I use a deferred shading pipeline that stores G-Buffer data in view space, the offset position first has to be transformed back to world space by the inverse of the camera view matrix:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#if DIRECTIONALLIGHT
</span>    <span class="n">float3</span> <span class="n">toLightV</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="o">-</span><span class="n">light</span><span class="p">.</span><span class="n">direction</span><span class="p">);</span>
<span class="cp">#else
</span>    <span class="n">float3</span> <span class="n">toLightV</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">light</span><span class="p">.</span><span class="n">position</span> <span class="o">-</span> <span class="n">position</span><span class="p">);</span>
<span class="cp">#endif
</span>    <span class="c1">// despite its name, cosAngle holds saturate(1 - N.L): 0 when the surface faces the light, 1 at grazing angles or when facing away</span>
    <span class="kt">float</span> <span class="n">cosAngle</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="mf">1.0</span><span class="n">f</span> <span class="o">-</span> <span class="n">dot</span><span class="p">(</span><span class="n">toLightV</span><span class="p">,</span> <span class="n">normal</span><span class="p">));</span>
    <span class="c1">// push the point off the surface along its normal, scaled by the angle term and the shadow map texel size</span>
    <span class="n">float3</span> <span class="n">scaledNormalOffset</span> <span class="o">=</span> <span class="n">normal</span> <span class="o">*</span> <span class="p">(</span><span class="n">cb_normalOffset</span> <span class="o">*</span> <span class="n">cosAngle</span> <span class="o">*</span> <span class="n">smTexelDimensions</span><span class="p">);</span>
    <span class="c1">// G-Buffer positions are in view space, so move the offset point to world space</span>
    <span class="n">float4</span> <span class="n">shadowPosW</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">float4</span><span class="p">(</span><span class="n">position</span> <span class="o">+</span> <span class="n">scaledNormalOffset</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">),</span> <span class="n">inverseViewMatrix</span><span class="p">);</span>
    <span class="c1">// shadowPosW is then transformed by the shadow matrix to produce shadowPosH, used below</span>
</code></pre></div></div>
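<p>The scale factor in this snippet can be checked in isolation.  The following C++ port is a sketch under assumptions: a minimal hypothetical <code>Vec3</code> type stands in for HLSL’s <code>float3</code>, the inputs are already normalized, and <code>normalOffset</code> and <code>texelSize</code> play the roles of <code>cb_normalOffset</code> and <code>smTexelDimensions</code>.  Because the factor is <code>saturate(1 - N·L)</code>, the offset vanishes for surfaces facing the light and reaches full strength at grazing angles, which is where acne tends to be worst.</p>

```cpp
#include <algorithm>
#include <cassert>

// Minimal stand-in for HLSL's float3.
struct Vec3
{
    float x, y, z;
};

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 operator*(const Vec3& v, float s) { return {v.x * s, v.y * s, v.z * s}; }
Vec3 operator+(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

// HLSL-style saturate: clamp to [0, 1].
float saturate(float v) { return std::clamp(v, 0.0f, 1.0f); }

// Offset a receiver position along its unit normal before the shadow lookup.
// The scale is saturate(1 - N.L): zero when the surface faces the light,
// full strength when the light grazes the surface or the surface faces away.
Vec3 applyNormalOffset(const Vec3& position, const Vec3& normal, const Vec3& toLight,
                       float normalOffset, float texelSize)
{
    const float scale = saturate(1.0f - dot(toLight, normal));
    return position + normal * (normalOffset * scale * texelSize);
}
```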

<p>Once the point has been transformed by the shadow matrix, finish projecting it and apply a depth offset:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// complete projection by doing division by w</span>
<span class="n">shadowPosH</span><span class="p">.</span><span class="n">xyz</span> <span class="o">/=</span> <span class="n">shadowPosH</span><span class="p">.</span><span class="n">w</span><span class="p">;</span>
<span class="n">shadowPosH</span><span class="p">.</span><span class="n">z</span> <span class="o">-=</span> <span class="n">cb_depthBias</span> <span class="o">*</span> <span class="n">smTexelDimensions</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">depth</span> <span class="o">=</span> <span class="n">shadowPosH</span><span class="p">.</span><span class="n">z</span><span class="p">;</span> <span class="c1">// depth to use for PCF comparison</span>
</code></pre></div></div>
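<p>The snippet stops at the biased depth.  Before the PCF lookup, the projected x and y also have to be mapped from normalized device coordinates to texture coordinates.  The C++ sketch below assumes Direct3D conventions (NDC x and y in [-1, 1], texture v increasing downward, so y is flipped); <code>Float4</code>, <code>ShadowLookup</code>, and <code>toShadowLookup</code> are hypothetical names, and <code>depthBias</code> and <code>texelSize</code> mirror <code>cb_depthBias</code> and <code>smTexelDimensions</code>.</p>

```cpp
#include <cassert>

struct Float4 { float x, y, z, w; };

// Everything the PCF comparison needs: where to sample and what depth to compare.
struct ShadowLookup
{
    float u, v;  // shadow map texture coordinates
    float depth; // biased receiver depth
};

ShadowLookup toShadowLookup(Float4 shadowPosH, float depthBias, float texelSize)
{
    assert(shadowPosH.w != 0.0f);
    // Complete the projection (a no-op for orthographic projections, where w == 1).
    const float x = shadowPosH.x / shadowPosH.w;
    const float y = shadowPosH.y / shadowPosH.w;
    float z = shadowPosH.z / shadowPosH.w;
    // Pull the receiver depth toward the light, as the HLSL snippet does.
    z -= depthBias * texelSize;
    // NDC [-1, 1] to texture space [0, 1], flipping y for D3D-style texture coordinates.
    return {0.5f * x + 0.5f, -0.5f * y + 0.5f, z};
}
```

<p>With an OpenGL-style convention, the y flip would be dropped and the depth range would differ, so these mappings should be adjusted to match the API in use.</p>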

<p>And that’s it.  The values for depth bias and normal offset have to be adjusted per light and depend on various factors, such as the light range, the shadow projection matrix, and to some extent the resolution of the shadow map, but when properly set the results can be quite nice and artifacts are almost entirely mitigated.</p>

<h1 id="references">References</h1>

<p><a href="http://www.dissidentlogic.com/old/images/NormalOffsetShadows/GDC_Poster_NormalOffset.png">http://www.dissidentlogic.com/old/images/NormalOffsetShadows/GDC_Poster_NormalOffset.png</a></p>

<p><a href="http://c0de517e.blogspot.co.at/2011/05/shadowmap-bias-notes.html">http://c0de517e.blogspot.co.at/2011/05/shadowmap-bias-notes.html</a></p>

<p><a href="http://www.digitalrune.com/Support/Blog/tabid/719/EntryId/218/Shadow-Acne.aspx">http://www.digitalrune.com/Support/Blog/tabid/719/EntryId/218/Shadow-Acne.aspx</a></p>

<p><a href="https://msdn.microsoft.com/en-us/library/windows/desktop/ee416324%28v=vs.85%29.aspx">https://msdn.microsoft.com/en-us/library/windows/desktop/ee416324%28v=vs.85%29.aspx</a></p>

<p><a href="https://www.mvps.org/directx/articles/linear_z/linearz.htm">https://www.mvps.org/directx/articles/linear_z/linearz.htm</a></p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[In a previous post on stack stabilization, the linked video showed a few major issues with shadow mapping.  These issues have plagued the technique since its inception, and while there are many methods that assist in alleviating them, it's still very difficult to completely get rid of them.  Here we'll review some common artifacts and discuss potential ways to squash them.]]></summary></entry><entry><title type="html">Stacks on Stacks</title><link href="https://willpgfx.com/2015/05/stacks-on-stacks/" rel="alternate" type="text/html" title="Stacks on Stacks" /><published>2015-05-08T00:00:00-04:00</published><updated>2015-05-08T00:00:00-04:00</updated><id>https://willpgfx.com/2015/05/stacks-on-stacks</id><content type="html" xml:base="https://willpgfx.com/2015/05/stacks-on-stacks/"><![CDATA[<p>A long while back, I realized my scenes would be better served and more interesting if there was a more dynamic component to them.  Outside of the very basics, implementing a proper physics engine with accurate collision detection and response was quite foreign to me.  Therefore, I picked up Ian Millington’s book <em>Game Physics Engine Development</em> and got to work.  I enjoyed the author’s approachable writing style and well-explained information on both particle and rigid body dynamics.  Within about a week, I was able to integrate a fairly robust adaptation of the engine presented in the book into my own engine’s architecture.</p>

<p>While the information presented on physical body simulation is quite good, the book’s main shortcoming is in collision detection and resolution.  In fairness, the author calls this out and tries to realistically set the reader’s expectations, but much is left to be desired when two boxes can’t reliably be stacked on top of one another due to non-converging solutions for contact generation and impulse resolution.  Regardless, this is the approach that lived in my engine for well over a year, and it still remains in the code base, although I consider it deprecated for anything beyond very simple simulations.</p>

<p>After a lot of research and a short back and forth email exchange with <a href="http://www.randygaul.net/">Randy Gaul</a>, I tried my hand at implementing a more complex collision detection routine.  The new routine generated an entire contact manifold, as opposed to the old one, which only ever recorded one contact between two objects for any given point in time.  The contact manifold contained up to 4 points per collision pair.  This data, combined with a few other tricks I picked up here and there, finally allowed a small stack of boxes to sit on top of each other without shaking and falling over.</p>
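<p>For readers unfamiliar with the term, a contact manifold is the set of contact points shared by a colliding pair.  The C++ sketch below shows one possible container capped at four points.  The <code>Contact</code> and <code>ContactManifold</code> names are hypothetical, and the replacement rule (when full, a deeper new point replaces the shallowest stored one) is a deliberately simple assumption, not the routine described in this post; production engines typically use reduction heuristics that also try to keep the retained points spread out to cover as much area as possible.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One point of contact between two colliding bodies.
struct Contact
{
    float position[3]; // world-space contact point
    float normal[3];   // contact normal, from body A toward body B
    float penetration; // overlap depth along the normal (positive when overlapping)
};

// Hypothetical manifold holding up to four contacts for one colliding pair,
// enough to describe a box face resting flat on another box.
class ContactManifold
{
public:
    static constexpr std::size_t kMaxContacts = 4;

    void add(const Contact& c)
    {
        if (count_ < kMaxContacts)
        {
            contacts_[count_++] = c;
            return;
        }
        // Full: find the shallowest stored contact...
        std::size_t shallowest = 0;
        for (std::size_t i = 1; i < count_; ++i)
        {
            if (contacts_[i].penetration < contacts_[shallowest].penetration)
                shallowest = i;
        }
        // ...and replace it only if the new contact is deeper.
        if (c.penetration > contacts_[shallowest].penetration)
            contacts_[shallowest] = c;
    }

    std::size_t size() const { return count_; }

    const Contact& operator[](std::size_t i) const
    {
        assert(i < count_);
        return contacts_[i];
    }

private:
    std::array<Contact, kMaxContacts> contacts_{};
    std::size_t count_ = 0;
};
```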

<p>Eventually, I decided I wanted an overall more robust solution for both physics simulation and collision detection and resolution, so I spent a weekend integrating the <a href="http://bulletphysics.org/wordpress/">Bullet Physics</a> library into my engine.  Bullet’s API has proven to be reasonably straightforward, and I was able to get a stable stack of boxes set up in a very short amount of time.</p>

<p>The video below shows the dramatic difference between the old collision resolution method and the newly implemented engine backed by Bullet.</p>

<!-- Courtesy of embedresponsively.com //-->

<div class="responsive-video-container">
    <iframe src="https://www.youtube-nocookie.com/embed/lkPJjizpLII" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe>
  </div>

<p>With the old setup, I would place objects in the world with a sleep state and a tiny amount of space between each to give the appearance of a stack, but as soon as I interacted with anything in the stack, all bets were off.  With the new implementation, I can safely let objects fall into place and rest on top of each other at the start of the simulation without worrying too much about the whole thing going haywire.</p>

<p>(Regarding the ugly shadow artifacts in the video, those will be addressed in a <a href="/2015/05/dealing-with-shadow-map-artifacts/">follow-up post</a> specific to the topic.)</p>

<h1 id="references">References</h1>

<p><a href="http://www.randygaul.net/">http://www.randygaul.net/</a></p>

<p><a href="http://allenchou.net/">http://allenchou.net/</a></p>

<p><a href="https://code.google.com/p/box2d/downloads/list">https://code.google.com/p/box2d/downloads/list</a></p>

<p><a href="https://github.com/bulletphysics/bullet3/releases">https://github.com/bulletphysics/bullet3/releases</a></p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[A long while back, I realized my scenes would be better served and more interesting if there was a more dynamic component to them.  Outside of the very basics, implementing  a proper physics engine with accurate collision detection and response was quite foreign to me.  Therefore, I picked up Ian Millington’s book Game Physics Engine Development and got to work.  I enjoyed the author’s approachable writing style and well-explained information on both particle and rigid body dynamics.  Within about a week or so, I was able to integrate a fairly robust adaptation of the engine presented in the book into my own engine’s architecture.]]></summary></entry><entry><title type="html">Bachelor Thesis Acknowledgement</title><link href="https://willpgfx.com/2015/05/bachelor-thesis-acknowledgement/" rel="alternate" type="text/html" title="Bachelor Thesis Acknowledgement" /><published>2015-05-04T00:00:00-04:00</published><updated>2015-05-04T00:00:00-04:00</updated><id>https://willpgfx.com/2015/05/bachelor-thesis-acknowledgement</id><content type="html" xml:base="https://willpgfx.com/2015/05/bachelor-thesis-acknowledgement/"><![CDATA[<p>I recently received an acknowledgement in Lukas Hermanns’ bachelor’s thesis entitled <em>Screen Space Cone Tracing for Glossy Reflections</em>, which I thought was really cool of him.  He’s produced some great results, and I’m happy to have lent a hand in the excellent work he’s done.</p>

<p>The full thesis can be found here:  <a href="http://publica.fraunhofer.de/documents/N-336466.html">http://publica.fraunhofer.de/documents/N-336466.html</a></p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[I recently received an acknowledgement in Lukas Hermanns’ bachelor’s thesis entitled Screen Space Cone Tracing for Glossy Reflections, which I thought was really cool of him.  He’s produced some great results, and I’m happy to have lent a hand in the excellent work he’s done.]]></summary></entry><entry><title type="html">Separable Subsurface Scattering</title><link href="https://willpgfx.com/2015/04/separable-subsurface-scattering/" rel="alternate" type="text/html" title="Separable Subsurface Scattering" /><published>2015-04-18T00:00:00-04:00</published><updated>2015-04-18T00:00:00-04:00</updated><id>https://willpgfx.com/2015/04/separable-subsurface-scattering</id><content type="html" xml:base="https://willpgfx.com/2015/04/separable-subsurface-scattering/"><![CDATA[<p>I’ve recently implemented Screen-Space Separable Subsurface Scattering into my rendering engine.  This implementation is based off the incredible work that’s been done over the past several years, and documented <a href="http://www.iryoku.com/separable-sss-released">here</a> and <a href="http://cg.tuwien.ac.at/~zsolnai/gfx/separable-subsurface-scattering-with-activision-blizzard/">here</a>.  I’m quite pleased with the results I’m getting from the effect and so am posting a few screenshots of it in action.</p>

<p>The first screenshots show the effect in daylight.  Hopefully it’s quite obvious which head in the picture has the new technique applied and which is being lit with the engine’s standard lighting model.</p>

<figure class="half ">
  
    
      <a href="/assets/images/sssss_post/sssss_off.png" title="Effect Off">
          <img src="/assets/images/sssss_post/sssss_off.png" alt="Effect Off" />
      </a>
    
  
    
      <a href="/assets/images/sssss_post/sssss_on.png" title="Effect On">
          <img src="/assets/images/sssss_post/sssss_on.png" alt="Effect On" />
      </a>
    
  
  
</figure>

<p>The next screenshot shows another part of the overall effect: the transmittance of light through very thin slabs of material, such as ears.</p>

<figure class=" ">
  
    
      <a href="/assets/images/sssss_post/transmittance_and_sssss_1.png" title="Transmittance 1">
          <img src="/assets/images/sssss_post/transmittance_and_sssss_1.png" alt="Transmittance 1" />
      </a>
    
  
  
</figure>
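<p>Transmittance depends on how much material the light has to pass through.  A common approach estimates that thickness from the shadow map: the distance between the depth the light recorded (the first surface it hit) and the depth of the shaded point, both measured from the light.  The C++ sketch below shows that estimate assuming linear depths in the same units; the exponential falloff and its hypothetical <code>extinction</code> parameter are a simplified stand-in for illustration, not the skin diffusion profile the referenced separable SSS work uses.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Thickness of material between the light's entry point and the shaded point.
// Both depths are linear distances from the light: shadowMapDepth is what the
// shadow map stored (the first surface the light hit), receiverDepth is the
// shaded point's own depth from the light.
float estimateThickness(float shadowMapDepth, float receiverDepth)
{
    return std::max(0.0f, receiverDepth - shadowMapDepth);
}

// Simplified stand-in falloff: the transmitted fraction decays exponentially
// with thickness. `extinction` is a hypothetical per-material tuning value.
float transmittance(float thickness, float extinction)
{
    assert(thickness >= 0.0f && extinction >= 0.0f);
    return std::exp(-extinction * thickness);
}
```

<p>Thin features such as ears produce small thickness values and therefore transmit more light, which is why they show the effect most strongly.</p>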

<p>The next screenshot better shows both subsurface scattering and transmittance working together.  In particular, notice how the light behaves along the ridge of the nose.</p>

<figure class=" ">
  
    
      <a href="/assets/images/sssss_post/transmittance_and_sssss_2.png" title="Transmittance 2">
          <img src="/assets/images/sssss_post/transmittance_and_sssss_2.png" alt="Transmittance 2" />
      </a>
    
  
  
</figure>

<p>Finally, I cobbled together a quick setup showing how this technique could be used to create a nice effect for candles.  To create it, I added a wax-like kernel and aimed a bright spotlight straight down at a cylinder.</p>

<figure class=" ">
  
    
      <a href="/assets/images/sssss_post/sssss_wax_candle.png" title="Wax Kernel">
          <img src="/assets/images/sssss_post/sssss_wax_candle.png" alt="Wax Kernel" />
      </a>
    
  
  
</figure>

<h1 id="references">References</h1>

<p><a href="http://www.iryoku.com/separable-sss-released">http://www.iryoku.com/separable-sss-released</a></p>

<p><a href="http://cg.tuwien.ac.at/~zsolnai/gfx/separable-subsurface-scattering-with-activision-blizzard/">http://cg.tuwien.ac.at/~zsolnai/gfx/separable-subsurface-scattering-with-activision-blizzard/</a></p>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[I’ve recently implemented Screen-Space Separable Subsurface Scattering into my rendering engine.  This implementation is based off the incredible work that’s been done over the past several years, and documented here and here.  I’m quite pleased with the results I’m getting from the effect and so am posting a few screenshots of it in action.]]></summary></entry><entry><title type="html">Hello, World!</title><link href="https://willpgfx.com/2011/07/hello-world/" rel="alternate" type="text/html" title="Hello, World!" /><published>2011-07-19T00:00:00-04:00</published><updated>2011-07-19T00:00:00-04:00</updated><id>https://willpgfx.com/2011/07/hello-world</id><content type="html" xml:base="https://willpgfx.com/2011/07/hello-world/"><![CDATA[<p>This is a test post with a little sample code to verify syntax highlighting is working correctly.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">&lt;iostream&gt;</span><span class="cp">
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="s">"Hello, world!"</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name>Will Pearce</name></author><summary type="html"><![CDATA[This is a test post with a little sample code to verify syntax highlighting is working correctly.]]></summary></entry></feed>