
Nvidia Geforce 6 Series Manual


Page 11

30.3 GPU Features
This section covers both the fixed-function features and the Shader Model 3.0 support
(described in detail later) in GeForce 6 Series GPUs. As we describe the various pieces,
we focus on the many new features that are meant to make applications shine, in terms
of both visual quality and performance, on GeForce 6 Series GPUs.
30.3.1 Fixed-Function Features
Geometry Instancing
Shader Model 3.0 adds the ability to send multiple batches of geometry with a single
Direct3D call,...
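The rest of this paragraph is cut off, but the core idea of geometry instancing can be sketched on the CPU in Python: one call walks a shared per-vertex stream once for each entry in a per-instance stream, instead of issuing one call per object. This is an illustration only; `draw_instanced`, `draw_naive`, and `DrawStats` are hypothetical names, not Direct3D functions, and a simple positional offset stands in for whatever per-instance data an application would actually supply.

```python
from dataclasses import dataclass

# Hypothetical CPU-side model of instancing; these are NOT Direct3D calls.
Vec3 = tuple[float, float, float]


@dataclass
class DrawStats:
    api_calls: int           # how many "draw calls" were issued
    vertices_processed: int  # total transformed vertices


def draw_instanced(mesh: list[Vec3], offsets: list[Vec3]) -> tuple[list[Vec3], DrawStats]:
    """One call: transform every mesh vertex once per instance."""
    out: list[Vec3] = []
    for dx, dy, dz in offsets:   # per-instance stream (here, a world offset)
        for x, y, z in mesh:     # per-vertex stream shared by all instances
            out.append((x + dx, y + dy, z + dz))
    return out, DrawStats(api_calls=1, vertices_processed=len(out))


def draw_naive(mesh: list[Vec3], offsets: list[Vec3]) -> tuple[list[Vec3], DrawStats]:
    """One call per instance, for comparison."""
    out: list[Vec3] = []
    for off in offsets:
        verts, _ = draw_instanced(mesh, [off])
        out.extend(verts)
    return out, DrawStats(api_calls=len(offsets), vertices_processed=len(out))
```

Both functions produce the same vertices; the difference the sketch highlights is the number of calls, which is the overhead instancing is meant to cut.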

Page 12

Z-Cull
NVIDIA GPUs since the GeForce3 have included a technology, called z-cull, that performs
hidden surface removal at speeds much faster than conventional rendering. The GeForce 6
Series z-cull unit is the third generation of this technology and is efficient across a
wider range of cases. Also, when stencil is not being updated, early stencil reject can
remove rendering work early when the stencil test (based on an equals comparison) fails.
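The text does not describe how the z-cull unit works internally. As an illustration only, the following Python sketch assumes a common conservative scheme: each screen tile keeps the farthest depth stored in it, and with a less-than depth test a primitive whose nearest depth is not closer than that value can be rejected for the whole tile. Early stencil reject is modeled per fragment: with stencil writes disabled and an equals test, a fragment whose stored stencil value differs from the reference is dropped before shading. `Tile`, `zcull_rejects`, and `shade_fragments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    # Farthest depth currently stored anywhere in this screen tile.
    # 1.0 means "cleared to the far plane". Keeping this value up to date
    # as geometry is drawn is not modeled here.
    max_z: float = 1.0


def zcull_rejects(tile: Tile, prim_min_z: float) -> bool:
    """With a LESS depth test, a primitive whose nearest depth in the tile is
    not closer than the farthest stored depth cannot pass at any pixel."""
    return prim_min_z >= tile.max_z


def shade_fragments(frags, stencil, ref):
    """Early stencil reject: with stencil writes off and an EQUAL stencil test,
    fragments whose stored stencil value differs from ref never reach the shader."""
    shaded = []
    for x, y in frags:
        if stencil[y][x] != ref:
            continue               # rejected before shading
        shaded.append((x, y))      # stand-in for running the pixel shader
    return shaded
```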
Occlusion Query
Occlusion query is the ability...

Page 13

distance passes the test, it's in light; if not, it's in shadow. NVIDIA GPUs have dedicated
transistors to perform four z-compares per pixel (on four neighboring z-values) per clock,
and to perform bilinear filtering of the pass/fail data. This more advanced variation of
percentage-closer filtering saves many shader instructions compared to GPUs that lack
direct shadow-buffer support.
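The filtering described above, four depth compares on neighboring texels followed by bilinear filtering of the pass/fail results, can be written out in Python as a sketch of the work the hardware saves the shader from doing. Assumptions: the shadow map is a list of rows of stored depths, texel centers sit at half-integer coordinates, addressing clamps at the edges, and a fragment is lit when its depth is less than or equal to the stored depth. `shadow_pcf` is a hypothetical name.

```python
import math


def shadow_pcf(shadow_map, u, v, depth):
    """Return the lit fraction in [0, 1] for a fragment at texel coords (u, v)."""
    h, w = len(shadow_map), len(shadow_map[0])
    # Texel centers are at integer + 0.5; locate the 2x2 neighborhood.
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def lit(ix, iy):
        ix = min(max(ix, 0), w - 1)  # clamp-to-edge addressing
        iy = min(max(iy, 0), h - 1)
        return 1.0 if depth <= shadow_map[iy][ix] else 0.0  # one z-compare

    # Four compares on neighboring z-values...
    s00, s10 = lit(x0, y0), lit(x0 + 1, y0)
    s01, s11 = lit(x0, y0 + 1), lit(x0 + 1, y0 + 1)
    # ...then bilinear filtering of the pass/fail results (not of the depths).
    top = s00 * (1 - fx) + s10 * fx
    bottom = s01 * (1 - fx) + s11 * fx
    return top * (1 - fy) + bottom * fy
```

Filtering the comparison results rather than the stored depths is what produces soft, fractional shadow edges.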
High-Dynamic-Range Blending Using fp16 Surfaces, Texture Filtering, and Blending
GeForce 6 Series GPUs allow for...
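This paragraph is truncated, so the following Python sketch shows only the general motivation for fp16 surfaces: when several light contributions are blended additively, an fp16 target keeps values above 1.0, whereas an 8-bit fixed-point target clamps them. It uses the standard library's `struct` half-precision (`'e'`) format to round to fp16; the helper names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (fp16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_unorm8(x: float) -> float:
    """Store x in an 8-bit unsigned fixed-point channel: clamp to [0, 1], quantize to 1/255."""
    return round(min(max(x, 0.0), 1.0) * 255) / 255


def accumulate(contributions, store):
    """Additive blending (dst = dst + src), rounding to the target format after each pass."""
    dst = 0.0
    for src in contributions:
        dst = store(dst + src)
    return dst


lights = [0.6, 0.6, 0.6, 0.6]
print(accumulate(lights, to_fp16))    # about 2.4
print(accumulate(lights, to_unorm8))  # 1.0 (clamped)
```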

Page 14

●Dynamic flow control. Branching and looping are now part of the shader model. On
the GeForce 6 Series vertex engine, branching and looping have a minimal overhead of
just two cycles. Also, each vertex can take its own branches without being grouped in
the way pixel shader branches are, so as branches diverge, the GeForce 6 Series vertex
processor still operates efficiently.
●Vertex texturing. Textures can now be fetched in a vertex program, although only
nearest-neighbor filtering is supported in...
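Vertex texturing enables techniques such as displacement mapping. The Python sketch below is an illustration under stated assumptions rather than GPU code: it fetches a height value with nearest-neighbor filtering (the text notes that only nearest-neighbor filtering is supported, though the sentence is cut off before saying where) and uses a per-vertex branch to skip undisplaced vertices, echoing the point that each vertex takes its own branches. Normalized coordinates in [0, 1] with clamped addressing are assumed, and `fetch_nearest` and `displace_vertex` are hypothetical names.

```python
def fetch_nearest(tex, u, v):
    """Nearest-neighbor fetch; u, v in [0, 1], clamped at the edges."""
    h, w = len(tex), len(tex[0])
    x = min(max(int(u * w), 0), w - 1)
    y = min(max(int(v * h), 0), h - 1)
    return tex[y][x]


def displace_vertex(pos, normal, uv, height_map, scale):
    """Move a vertex along its normal by a height read in the vertex program."""
    height = fetch_nearest(height_map, *uv)
    if height == 0.0:  # per-vertex branch: each vertex decides independently
        return pos
    return tuple(p + n * height * scale for p, n in zip(pos, normal))
```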

Page 15

separate textures. So, for example, the surface normal and the diffuse and specular
material properties could be written to textures, and the textures could all be used in
subsequent passes when lighting the scene with multiple lights. This is illustrated in
Figure 30-8.
●Dynamic flow control (branching). Shader Model 3.0 supports conditional branching
and looping, allowing for more flexible shader programs.
●Indexing of attributes. With Shader Model 3.0, an index register can be used to
select which...
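The multiple-render-target example above (normal, diffuse, and specular properties written to separate textures, then reused while lighting the scene with multiple lights) can be sketched on the CPU in Python. Each list in `GBuffer` stands in for one render target, and each call to `lighting_pass` stands in for one later pass per light. The Lambert diffuse term, the Blinn-style specular term with a fixed exponent of 32, the viewer at +z, and scalar material values are all assumptions for illustration; the names are hypothetical.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vec3) -> Vec3:
    length = dot(v, v) ** 0.5
    return tuple(x / length for x in v)


@dataclass
class GBuffer:
    # One list per "render target"; all three are written in the same pass.
    normal: list[Vec3]
    diffuse: list[float]
    specular: list[float]


def geometry_pass(surfaces: list[tuple[Vec3, float, float]]) -> GBuffer:
    """Pass 1: write normal, diffuse, and specular for every pixel at once."""
    normals, diffuse, specular = zip(*surfaces)
    return GBuffer(list(normals), list(diffuse), list(specular))


def lighting_pass(g: GBuffer, light_dir: Vec3, out: list[float]) -> None:
    """One later pass per light: read the stored attributes and accumulate."""
    l = normalize(light_dir)
    # Half vector with the viewer assumed at +z (undefined if the light points exactly along -z).
    h = normalize((l[0], l[1], l[2] + 1.0))
    for i, n in enumerate(g.normal):
        n_dot_l = max(dot(n, l), 0.0)
        spec = max(dot(n, h), 0.0) ** 32 if n_dot_l > 0.0 else 0.0
        out[i] += g.diffuse[i] * n_dot_l + g.specular[i] * spec
```

The geometry is processed once, while the per-light cost is a pass over stored attributes, which is the payoff of writing several textures in one pass.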

Page 16

●3:1 and 2:2 coissue. Each four-component-wide vector unit is capable of executing
two independent instructions in parallel, as shown in Figure 30-9: either one three-wide
operation on RGB and a separate operation on alpha, or one two-wide operation on
red-green and a separate two-wide operation on blue-alpha. This gives the compiler
more opportunities to pack scalar computations into vectors, thereby doing more work
in less time.
●Dual issue. Dual issue is similar to coissue, except that the...
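The two coissue splits described above can be expressed as a simple test on the components each instruction writes. This Python sketch assumes instructions are described only by write masks over `r`, `g`, `b`, `a` and that they are already independent (neither reads the other's result); a real compiler would also have to check that data dependence. `can_coissue` and `COISSUE_SPLITS` are hypothetical names.

```python
# The two splits of a four-wide vector unit described in the text.
COISSUE_SPLITS = [
    (frozenset("rgb"), frozenset("a")),  # 3:1
    (frozenset("rg"), frozenset("ba")),  # 2:2
]


def can_coissue(mask1: str, mask2: str) -> bool:
    """True if two (assumed independent) instructions writing these
    component masks fit one of the 3:1 or 2:2 splits."""
    m1, m2 = frozenset(mask1), frozenset(mask2)
    if m1 & m2:
        return False  # both write the same component
    for left, right in COISSUE_SPLITS:
        for a, b in ((m1, m2), (m2, m1)):  # either instruction may take either side
            if a <= left and b <= right:
                return True
    return False
```

For example, an RGB multiply paired with a scalar alpha operation fits the 3:1 split, while masks such as `rb` and `ga` do not fit either split.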

Page 17

Fragment Processor Performance
The GeForce 6 Series fragment processor architecture has the following performance
characteristics:
●Each pipeline is capable of performing a four-wide, coissue-able multiply-add (MAD)
or four-term dot product (DP4), plus a four-wide, coissue-able and dual-issuable
multiply instruction per clock in series, as shown in Figure 30-11. In addition, a
multifunction unit that performs complex operations can replace the alpha-channel
MAD operation. Operations are performed at...
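As a rough worked count of what the description above implies per pipeline per clock, assuming a multiply-add counts as two floating-point operations (a common convention, not something the text states) and ignoring the multifunction unit:

$$
\underbrace{4 \times 2}_{\text{four-wide MAD}} \;+\; \underbrace{4 \times 1}_{\text{four-wide MUL}} \;=\; 12 \ \text{flops per pipeline per clock}
$$

A chip-wide figure would multiply this by the number of fragment pipelines and the clock rate, neither of which appears in this excerpt.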

Page 18

Table 30-2. Overhead Incurred When Executing Flow-Control Operations in Fragment Programs
Instruction      Cost (Cycles)
If/endif         4
If/else/endif    6
Call             2
Ret              2
Loop/endloop     4
Furthermore, branching in the fragment processor is affected by the level of divergence
of the branches. Because the fragment processor operates on hundreds of pixels per
instruction, if a branch is taken by some fragments and not others, all fragments execute
both branches, but only write to the registers on the branches each...
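Table 30-2 and the divergence rule can be combined into a rough cost model. The Python sketch below assumes that a group of fragments processed together skips a branch side only when no fragment in the group takes it, and it adds the Table 30-2 overhead once per if/else/endif. The group model and the names `OVERHEAD` and `if_else_cost` are assumptions for illustration, not a published NVIDIA model.

```python
# Flow-control overheads from Table 30-2, in cycles.
OVERHEAD = {"if_endif": 4, "if_else_endif": 6, "call": 2, "ret": 2, "loop_endloop": 4}


def if_else_cost(conditions, then_cost, else_cost):
    """Estimate cycles for one if/else/endif over a group of fragments that
    execute together; conditions holds each fragment's branch outcome."""
    cost = OVERHEAD["if_else_endif"]
    if any(conditions):      # at least one fragment takes the 'then' side
        cost += then_cost
    if not all(conditions):  # at least one fragment takes the 'else' side
        cost += else_cost
    return cost
```

With a 10-cycle `then` body and a 20-cycle `else` body, a coherent group pays 16 or 26 cycles, while a divergent group pays 36, because every fragment runs both sides.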

Page 19

30.4 Performance
Table 30-3. Data Storage Formats Supported by GeForce 6 Series GPUs
Format      Description of Data in Memory                                Vertex Texture Support  Fragment Texture Support  Render Target Support
B8          One 8-bit fixed-point number                                 ✗                       ✓                         ✓
A1R5G5B5    A 1-bit value and three 5-bit unsigned fixed-point numbers   ✗                       ✓                         ✓
A4R4G4B4    Four 4-bit unsigned fixed-point numbers                      ✗                       ✓                         ✗
R5G6B5      5-bit, 6-bit, and 5-bit fixed-point numbers                  ✗                       ✓                         ✓
A8R8G8B8    Four 8-bit fixed-point numbers                               ✗                       ✓                         ✓
DXT1        Compressed 4×4 pixels into 8 bytes                           ✗                       ✓                         ✗
DXT2,3,4,5  Compressed...
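The visible rows of Table 30-3 can be encoded as a lookup, for example to check which formats can be bound as a render target, along with the storage arithmetic for DXT1 (4×4 pixels in 8 bytes, i.e., 4 bits per pixel versus 32 for A8R8G8B8). The Python names are hypothetical, and the truncated DXT2,3,4,5 row is omitted.

```python
from typing import NamedTuple


class FormatSupport(NamedTuple):
    vertex_texture: bool
    fragment_texture: bool
    render_target: bool


# Rows transcribed from the visible part of Table 30-3.
FORMATS = {
    "B8":       FormatSupport(False, True, True),
    "A1R5G5B5": FormatSupport(False, True, True),
    "A4R4G4B4": FormatSupport(False, True, False),
    "R5G6B5":   FormatSupport(False, True, True),
    "A8R8G8B8": FormatSupport(False, True, True),
    "DXT1":     FormatSupport(False, True, False),
}


def render_target_formats() -> list[str]:
    """Names of the listed formats usable as render targets."""
    return sorted(name for name, s in FORMATS.items() if s.render_target)


def dxt1_size_bytes(width: int, height: int) -> int:
    """DXT1 stores each 4x4 block of pixels in 8 bytes (partial blocks round up)."""
    return ((width + 3) // 4) * ((height + 3) // 4) * 8
```

A 256×256 DXT1 texture takes 32,768 bytes, one-eighth of the 262,144 bytes the same image needs as A8R8G8B8.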

Page 20

30.5 Achieving Optimal Performance
While graphics hardware is becoming more and more programmable, there are still some
tricks to ensure that you exploit the hardware fully and get the most performance. This
section lists some common techniques that you may find helpful. A more detailed
discussion of performance advice is available in the NVIDIA GPU Programming Guide,
which is freely available in several languages from the NVIDIA Developer Web site...