Thursday, August 20, 2009

Ye Olde Pixels - Silverlight 3 Old Movie Pixel Shader


In my latest WriteableBitmap Performance post I mentioned that the Silverlight 3 pixel shaders run really fast, although they are executed on the CPU and not on the GPU and that nice real-time effects can be done with it. A Silverlight 3 pixel shader could also be applied to any UIElement, thus the MediaElement too. The Silverlight MediaElement is mainly used to play music or video and with an attached shader many cool effects could be realized. As you might know I like nice effects and due to this passion I have coded another pixel shader for Silverlight. This time I have implemented a shader that simulates scratches, noise and other effects you might have seen in old movies.

Live

The intensity of the scratches and the noise is controlled with the Sliders. The shader could be disabled with the "Bypass" checkbox and it's also possible to load a WMV clip with the "Load" Button.
William Moore made some short video clips with After Effects for me and I have uploaded them to my Dropbox. Just copy a URL to the TextBox and press "Load" to try them with the shader:
  • Grass.wmv
  • MooreFamily.wmv

How it works
Here's the pixel shader written in HLSL:
// Parameters
float ScratchAmount : register(C0);
float NoiseAmount : register(C1);
float2 RandomCoord1 : register(C2);
float2 RandomCoord2 : register(C3);
float Frame : register(C4);

// Static
static float ScratchAmountInv = 1.0 / ScratchAmount;

// Sampler
sampler2D TexSampler : register(S0);
sampler2D NoiseSampler : register(S1);

// Shader
float4 main(float2 uv : TEXCOORD) : COLOR
{
    // Sample texture
    float4 color = tex2D(TexSampler, uv);

    // Add Scratch
    float2 sc = Frame * float2(0.001f, 0.4f);
    sc.x = frac(uv.x + sc.x);
    float scratch = tex2D(NoiseSampler, sc).r;
    scratch = 2 * scratch * ScratchAmountInv;
    scratch = 1 - abs(1 - scratch);
    scratch = max(0, scratch);
    color.rgb += scratch.rrr; 

    // Calculate random coord + sample
    float2 rCoord = (uv + RandomCoord1 + RandomCoord2) * 0.33;
    float3 rand = tex2D(NoiseSampler, rCoord);
    // Add noise
    if(NoiseAmount > rand.r)
    {
        color.rgb = 0.1 + rand.b * 0.4;
    }

    // Convert to gray + desaturated Sepia
    float gray = dot(color, float4(0.3, 0.59, 0.11, 0)); 
    color = float4(gray * float3(0.9, 0.8, 0.6) , 1);

    // Calc distance to center
    float2 dist = 0.5 - uv;   
    // Random light fluctuation
    float fluc = RandomCoord2.x * 0.04 - 0.02;
    // Vignette effect
    color.rgb *= (0.4 + fluc - dot(dist, dist))  * 2.8;

    return color;
}

The 1st effect adds some random scratches, which are moving slowly each frame.
Unfortunately the Shader Model 2 doesn't support any random function and the noise() intrinsic could not be used for real-time effects. That's a problem if you want to make pseudorandom-based shaders, but one key element of restricted shader development is pre calculation and texture lookup. The randomness, which is used in the shader, comes from a WriteableBitmap, that is filled with random color in the initialization step of the application. The shader in turn samples random values from that texture.


The 2nd effect adds a bit random noise to the image. If the texture coordinates with the frame counter would only be used, a static, moving pattern would be seen and it wouldn't look noisy at all. So actually a new random noise texture is needed every frame to achieve the desired effect. Generating a random texture with a WriteableBitmap in every frame would be too expensive. Instead of this, a little trick is used and only four random numbers are generated every frame and passed to the shader as parameters. These random seeds are then used as texture coordinates for a random lookup in the random colored texture. Therefore each pixel in every frame samples a different random value. Although it is not highly random, the values are sufficient enough to produce changing noise.


The 3rd step converts the pixel to grayscale and colors it afterwards with a desaturated Sepia tone.










The 4th and last effect computes the distance to the center and multiplies it with the color. This generates the vignette effect as a fadeout to black towards the edge. A random light flickering of an old projector is also simulated.








That's it and it's all about cheating.

Source code
Feel free to download the Visual Studio 2008 solution including the pixel shader from here.

Update 12-29-2009
I refactored the code behind shader class and separated it completly from the calling code (MainPage.xaml.cs). Now the shader class generates a default noise texture and random coordinates itself.
The linked source code was updated.

Update 12-31-2009
See this blog post if you want to try the shader in real time with your own webcam.

Wednesday, August 5, 2009

Silverlight 3 WriteableBitmap Performance Follow-Up

Two weeks ago I wrote the Silverlight 3 WriteableBitmap Performance blog post and got some good response on it. One of it came from the author of Quakelight, Julien Frelat. He contacted and asked me if I had tested his PNG wrapper technique, which is used in the Silverlight port of the famous game Quake. I thought Quakelight uses the WriteableBitmap and not a custom Stream hack, so I haven’t considered to check it for the Speedtest. Actually Quakelight uses the Silverlight 3 WriteableBitmap, but the source code includes a custom Stream for Silverlight 2 too.
For better performance the Quakelight PngWrapper and BitmapData classes use a 8 bit color palette instead of full 32 bit ARGB colors. This fact makes it not directly comparable to the other competitors, which all support 32 bit ARGB colors, but for certain problems 256 colors could be sufficient. That’s why I wrote this follow-up and integrated Quakelight’s Silverlight 2 Stream implementation.
The Speedtest generates an interference image and writes each pixel to a buffer, which is then used as the source of an Image. This effect could also be realized with a pixel shader and as I was working on the Speedtest v2, I implemented the effect as a pixel shader too.
Make sure to check the Update section at the bottom of this post.

The Competitors
  1. Silverlight 3 WriteableBitmap.
  2. RawPngBufferStream from the open source GameEngine Balder.
  3. Nikola's PngEncoder, which is an improved version of Joe Stegman's work.
  4. Ian Griffiths' SlDynamicBitmap library.
  5. NEW: Quakelight’s 8 bit BitmapData.
  6. NEW: A pixel shader.

Live

The application measures the time, which every implementation needs to draw the "Maximum Frames" and calculates the mean frames per seconds (fps). The third text column shows the relative performance compared to the WriteableBitmap.
If the tests complete very fast, you should increase the "Maximum Frames" to get better results.


How it works
The image is still 512 x 512 pixels in size and the mean frames per second are measured, but I changed the CalculateColor(int x, int y) effect a bit and removed the lookup table for the sine movement of the circles. The effect looks almost the same, but the code is much better to understand. At least I hope so.

private void CalculateColor(int x, int y)
{
   // Normalize coordinates
   double xn = x * TexSizeInv;
   double yn = y * TexSizeInv;

   // Overlayed sine circle rings
   // I use member variables for argb, 
   // so I don't have to allocate and return a byte array in each call
   // Ugly, but much faster than return new byte[]{ }

   // Red
   double d = (xn - C1.X) * (xn - C1.X) + (yn - C1.Y) * (yn - C1.Y);
   r = (byte)((Math.Sin(d * Frequency)) > 0 ? 0 : 255);

   // Blue
   d = (xn - C2.X) * (xn - C2.X) + (yn - C2.Y) * (yn - C2.Y);
   b = (byte)((Math.Sin(d * Frequency)) > 0 ? 0 : 255);

   // Green fills the gaps
   g = (byte)~(r | b);
}

The calculation uses normalized texture coordinates and the circle centers are computed each frame and stored in the Points C1 and C2:

private void CalculateCenters()
{
   // Nice sine circle movement
   // Use normalized coordinates
   C1.X = Math.Sin(framesCount * 0.02) * Half + Quarter;
   C1.Y = Math.Sin(framesCount * 0.08) * Half + Quarter;
   C2.X = Math.Sin(framesCount * 0.10) * Half + Quarter;
   C2.Y = Math.Sin(framesCount * 0.04) * Half + Quarter;
}

The normalization of the center coordinates makes it easier to use the same values directly as parameters for the pixel shader:

// Parameters
float2 C1 : register(C0);
float2 C2 : register(C1);
float Frequency : register(C2);

// Shader
float4 main(float2 p : TEXCOORD) : COLOR
{
   // Overlayed sine circle rings
   float4 color = 1;

   // Red
   float2 dist = uv - C1; 
   color.r = sin(dot(dist, dist) * Frequency) > 0 ? 0 : 1;

   // Blue
   dist = uv - C2; 
   color.b = sin(dot(dist, dist) * Frequency) > 0 ? 0 : 1;

   // Green fills the gaps
   color.g = color.r + color.b > 0 ? 0 : 1;
   return color;
}

Results

The results differ a bit from the first article, which is caused by the other algorithm that is used here. There are other color distributions and thus results in a slightly different drawing. The Silverlight rendering has a great impact on the performance and unfortunately Silverlight doesn't ship with the .Net Stopwatch class and the only way to get suitable data, is the measuring of larger code blocks, including the drawing, with the imprecise DateTime struct.
The Quakelight BitmapData is almost as fast as the Silverlight 3 WriteableBitmap, but at the cost of the 256 color limitation. Although the pixel shader is not executed on the GPU, it still runs ultra-fast compared to the other competitors.
If you want to implement a procedural image generation technique I recommend to try a pixel shader, but keep in mind that Silverlight 3 only supports the limited Shader Model 2. I noticed that the Silverlight pixel shaders seem to use SEE and are automatically executed in parallel if they run on a multi-core processor. The framerate on a dual-core machine was twice as high as on a single-core CPU. This parallel software shader implementaion in Silverlight is actually the only right way to implement them and nothing special. Shaders are designed for parallel execution on the GPU.

Source code
Download the C# code and the pixel shader from here.

Update 08-10-2009
Justin Harrell used my Speedtest source code and added tests for the Silverlight 3 MediaSource API. He contacted my via Email and attached the source code:

Hello

I have been looking into Silverlight Graphics performance myself and was interested in the new SL3 features. So I read your blog entry for WriteableBitmap performance and was also interested in the new MediaSource managed codec abilities, which looked like another way to get pixels to the screen, but also audio as well which could be interesting for games etc.
So I took your source for the test sample and added two MediaSource tests, one using single threaded frame generation, the second using a background thread and a double buffer based on work from Pete Brown at on his Commodore 64 emulator in Silverlight.
I also reworked the UI with checkboxes to turn on/off tests as well as adding 3 new types of tests beyond the Circle interference to test performance of the rendering method vs the pixel generation. These include a random noise, simple scrolling line, and a no op. I also refactored some to make it easier to add new test types. I did not implement shader based versions of my tests, so if you run the shader test on anything but circle interference its basically a no op for now J.
I thought it might be an interesting addition to the sample as yet another way to generate dynamic bitmaps, the mediasource performs very well, although still behind WriteableBitmap.
Note for the non-double buffered version of MediaSource there is a padTime that can be tuned, it is currently set to 10ms and can be lowered to improve frame rate, but below a certain time it will get too short based on how faster your computer is and cause the media player to think it is losing frames and start skipping badly. I haven’t figured out a good way to compensate for this automatically yet, it has to do with how long the video render takes in addition to the pixel generation. The double buffer does not have this issue as it is a fixed frame rate set by frameTime.
I have attached the sample to this email, let me know what you think, and thanks for the great blog posts.

Justin Harrell

Thanks Justin for sharing your additions. That's why I <3 open source.