
Nvidia Geforce 6 Series Manual



Page 1

The GeForce 6 Series GPU
Architecture
Emmett Kilgariff
NVIDIA Corporation
Randima Fernando
NVIDIA Corporation
Chapter 30
The previous chapter described how GPU architecture has changed as a result of computational and communications trends in microprocessing. This chapter describes the architecture of the GeForce 6 Series GPUs from NVIDIA, which owe their formidable computational power to their ability to take advantage of these trends. Most notably, we focus on the GeForce 6800 (NVIDIA’s...

Page 2

bandwidth, until finally the PCI Express standard was introduced in 2004, with a maximum theoretical bandwidth of 4 GB/sec in each direction, available simultaneously to and from the GPU. (Your mileage may vary; currently available motherboard chipsets fall somewhat below this limit, at around 3.2 GB/sec or less.)
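The 4 GB/sec figure can be reproduced from the link parameters. The sketch below assumes first-generation PCI Express values that this chapter does not state: 2.5 GT/sec per lane in each direction, 8b/10b line coding, and a x16 graphics slot.

```python
# Assumed PCI Express 1.x parameters (not given in this chapter).
TRANSFERS_PER_SEC_PER_LANE = 2.5e9  # raw line rate per lane, each direction
ENCODING_EFFICIENCY = 8 / 10        # 8b/10b coding: 8 data bits per 10 line bits
GRAPHICS_SLOT_LANES = 16            # assumed x16 link


def pcie_bytes_per_sec(lanes: int = GRAPHICS_SLOT_LANES) -> float:
    """Theoretical payload bandwidth in ONE direction, in bytes per second."""
    data_bits_per_sec = TRANSFERS_PER_SEC_PER_LANE * ENCODING_EFFICIENCY * lanes
    return data_bits_per_sec / 8


per_direction = pcie_bytes_per_sec()
# Upstream and downstream lanes are separate wires, so both directions run at once.
print(f"to the GPU:   {per_direction / 1e9:.1f} GB/sec")
print(f"from the GPU: {per_direction / 1e9:.1f} GB/sec")
```

Packet headers and flow-control traffic consume part of this budget, which is one reason measured rates land below the theoretical number.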
It is important to note the vast differences between the GPU’s memory interface bandwidth and bandwidth in other parts of the system, as shown in Table 30-1....

Page 3

Table 30-1 reiterates some of the points made in the preceding chapter: there is a vast
amount of bandwidth available internally on the GPU. Algorithms that run on the
GPU can therefore take advantage of this bandwidth to achieve dramatic performance
improvements. 
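To make the gap concrete, here is a rough estimate of how long one buffer takes to move at each rate. It uses only figures quoted in this chapter excerpt (about 35 GB/sec for local memory, 4 GB/sec theoretical and about 3.2 GB/sec practical for PCI Express); the 1024 x 1024 RGBA fp32 texture is a hypothetical workload, and latency and protocol overheads are ignored.

```python
# Bandwidth figures quoted in this chapter excerpt, in bytes per second.
BANDWIDTHS = {
    "GPU local memory (peak)": 35e9,
    "PCI Express x16, theoretical per direction": 4e9,
    "PCI Express, typical current chipsets": 3.2e9,
}


def transfer_ms(num_bytes: int, bytes_per_sec: float) -> float:
    """Idealized transfer time in milliseconds at a sustained rate (no latency)."""
    return num_bytes / bytes_per_sec * 1e3


# Hypothetical workload: one 1024 x 1024 RGBA fp32 texture, 16 bytes per texel.
TEXTURE_BYTES = 1024 * 1024 * 16

for name, rate in BANDWIDTHS.items():
    print(f"{name:45s} {transfer_ms(TEXTURE_BYTES, rate):6.2f} ms")
```

The ratio, not the absolute times, is the point: data that stays in local memory can be touched several times for the cost of one trip across the bus.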
30.2 Overall System Architecture
The next two subsections go into detail about the architecture of the GeForce 6 Series
GPUs. Section 30.2.1 describes the architecture in terms of its graphics capabilities.
Section 30.2.2 describes the...

Page 4

First, commands, textures, and vertex data are received from the host CPU through
shared buffers in system memory or local frame-buffer memory. The CPU writes a command stream that initializes and modifies state, sends rendering commands, and references the texture and vertex data. Commands are parsed, and a vertex fetch unit is
used to read the vertices referenced by the rendering commands. The commands, vertices,
and state changes flow downstream, where they are used by subsequent...
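The front end described above can be modeled as a loop over a command stream. This is a hypothetical sketch: `SetState`, `DrawIndexed`, and `front_end` are illustrative names, not NVIDIA's command format, and real hardware consumes the stream asynchronously rather than building Python lists.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Vertex = Tuple[float, float, float]


@dataclass
class SetState:
    """Hypothetical command that initializes or modifies one piece of state."""
    key: str
    value: object


@dataclass
class DrawIndexed:
    """Hypothetical rendering command that references vertices by index."""
    indices: List[int]


Command = Union[SetState, DrawIndexed]


def vertex_fetch(vertex_buffer: List[Vertex], indices: List[int]) -> List[Vertex]:
    """Read only the vertices that a draw command references from the shared buffer."""
    return [vertex_buffer[i] for i in indices]


def front_end(commands: List[Command], vertex_buffer: List[Vertex]):
    """Parse commands in order; each draw sends a snapshot of current state downstream."""
    state: Dict[str, object] = {}
    downstream = []
    for cmd in commands:
        if isinstance(cmd, SetState):
            state[cmd.key] = cmd.value
        elif isinstance(cmd, DrawIndexed):
            downstream.append((dict(state), vertex_fetch(vertex_buffer, cmd.indices)))
        else:
            raise TypeError(f"unknown command: {cmd!r}")
    return downstream


vertex_buffer = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
commands = [
    SetState("cull", "back"),
    DrawIndexed([0, 1, 2]),
    SetState("cull", "none"),
    DrawIndexed([2, 1, 3]),
]
batches = front_end(commands, vertex_buffer)
```

Copying the state per draw keeps each batch self-describing, mirroring the way state changes flow downstream alongside the vertices they affect.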

Page 5

GPU, the GeForce 6 Series, allows vertex programs to fetch texture data. All operations are done in 32-bit floating-point (fp32) precision per component. The GeForce 6
Series architecture supports scalable vertex-processing horsepower, allowing the same
architecture to service multiple price/performance points. In other words, high-end
models may have six vertex units, while low-end models may have two.
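As an illustration of both points, the sketch below runs a hypothetical displacement-style vertex program, which reads a height texture, over a configurable number of vertex units. The round-robin distribution, the nearest-texel lookup, and the use of x and y as texture coordinates are assumptions for illustration; Python floats are 64-bit, so outputs are rounded to fp32 with `struct` to mimic the per-component precision described above.

```python
import struct
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fetch_nearest(texture: Sequence[Sequence[float]], u: float, v: float) -> float:
    """Point-sample a 2D texture at normalized (u, v), clamping to the edges."""
    height, width = len(texture), len(texture[0])
    x = min(max(int(u * width), 0), width - 1)
    y = min(max(int(v * height), 0), height - 1)
    return texture[y][x]


def displace(vertex: Vertex, heightmap, scale: float) -> Vertex:
    """Hypothetical vertex program: raise z by a height read from a texture."""
    x, y, z = vertex
    h = fetch_nearest(heightmap, x, y)  # assumption: x, y double as texture coordinates
    return (to_fp32(x), to_fp32(y), to_fp32(z + scale * h))


def run_vertex_units(vertices: List[Vertex], num_units: int, heightmap, scale: float = 1.0):
    """Hand vertices to units round-robin; return outputs in input order and per-unit load."""
    outputs: List[Vertex] = []
    load = [0] * num_units
    for i, vertex in enumerate(vertices):
        load[i % num_units] += 1  # the unit that would process this vertex
        outputs.append(displace(vertex, heightmap, scale))
    return outputs, load
```

A six-unit part and a two-unit part produce identical vertices; only the per-unit load, and therefore the achievable throughput, differs.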
Because vertex processors can perform texture accesses, the vertex engines are connected
to the...

Page 6

The rasterization block calculates which pixels (or samples, if multisampling is enabled)
are covered by each primitive, and it uses the z-cull block to quickly discard pixels (or
samples) that are occluded by objects with a nearer depth value. Think of a fragment as
a “candidate pixel”: that is, it will pass through the fragment processor and several tests,
and if it gets through all of them, it will end up carrying depth and color information
to a pixel on the frame buffer (or render target)....
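Coverage can be illustrated with the classic edge-function test. This is a textbook technique chosen for illustration; the chapter does not say how the GeForce 6 rasterizer computes coverage. The sketch samples only pixel centers (no multisampling) and counts centers lying exactly on an edge as covered, whereas production rasterizers apply a tie-breaking rule so that shared edges are not drawn twice.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of triangle (a, b, p); the sign tells which side of ab p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covered_pixels(v0: Point, v1: Point, v2: Point, width: int, height: int) -> List[Tuple[int, int]]:
    """Return the pixels whose centers lie inside (or on an edge of) the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # a degenerate triangle covers nothing
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Inside when all three edge values share the sign of the area,
            # which handles both clockwise and counterclockwise winding.
            if all(w * area >= 0 for w in (w0, w1, w2)):
                covered.append((x, y))
    return covered
```

Each covered pixel becomes a fragment; in the pipeline described here, the z-cull step would then get a chance to discard it before any shading happens.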

Page 7

The fragment processor uses the texture unit to fetch data from memory, optionally filtering the data before returning it to the fragment processor. The texture unit supports many
source data formats (see Section 30.3.3, “Supported Data Storage Formats”). Data can be
filtered using bilinear, trilinear, or anisotropic filtering. All data is returned to the fragment
processor in fp32 or fp16 format. A texture can be viewed as a 2D or 3D array of data that
can be read by the texture unit at arbitrary...
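The sketch below shows what bilinear and trilinear filtering compute on a single-channel texture stored as a list of rows. The texel-center convention and clamp-to-edge addressing are assumptions, the arithmetic uses Python's 64-bit floats rather than fp32 or fp16, and anisotropic filtering is omitted.

```python
import math
from typing import List, Sequence

Texture = Sequence[Sequence[float]]  # rows of single-channel texels


def texel(tex: Texture, x: int, y: int) -> float:
    """Fetch one texel with clamp-to-edge addressing."""
    height, width = len(tex), len(tex[0])
    return tex[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def bilinear(tex: Texture, u: float, v: float) -> float:
    """Blend the 2x2 texels around normalized (u, v); texel i is centered at (i + 0.5) / size."""
    height, width = len(tex), len(tex[0])
    x, y = u * width - 0.5, v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = texel(tex, x0, y0) * (1 - fx) + texel(tex, x0 + 1, y0) * fx
    bottom = texel(tex, x0, y0 + 1) * (1 - fx) + texel(tex, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def trilinear(mips: List[Texture], u: float, v: float, lod: float) -> float:
    """Bilinear samples from the two nearest mipmap levels, blended by the fractional level of detail."""
    lod = min(max(lod, 0.0), len(mips) - 1)
    level0 = math.floor(lod)
    level1 = min(level0 + 1, len(mips) - 1)
    frac = lod - level0
    a = bilinear(mips[level0], u, v)
    b = bilinear(mips[level1], u, v)
    return a * (1 - frac) + b * frac
```

Sampling exactly at a texel center returns that texel unchanged; sampling halfway between four texels returns their average.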

Page 8

independent memory partitions give the GPU a wide (256-bit), flexible memory subsystem that can stream relatively small (32-byte) memory accesses at close to the 35 GB/sec physical limit.
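The 35 GB/sec figure follows from the bus width and the memory clock. The clock below is an assumption not stated in this excerpt (550 MHz double-data-rate memory, as on a GeForce 6800 Ultra), and so is the split into four 64-bit partitions; `partition_for` is a hypothetical address interleave showing how consecutive 32-byte accesses could keep every partition busy, not NVIDIA's actual mapping.

```python
BUS_WIDTH_BITS = 256      # from the text
ACCESS_BYTES = 32         # access granularity from the text
MEMORY_CLOCK_HZ = 550e6   # assumption: 550 MHz memory clock
TRANSFERS_PER_CLOCK = 2   # assumption: double data rate
NUM_PARTITIONS = 4        # assumption: four 64-bit partitions (4 x 64 = 256 bits)


def peak_bandwidth(bus_bits: int = BUS_WIDTH_BITS,
                   clock_hz: float = MEMORY_CLOCK_HZ,
                   transfers: int = TRANSFERS_PER_CLOCK) -> float:
    """Bytes per second when every bus transfer carries useful data."""
    return bus_bits / 8 * clock_hz * transfers


def partition_for(address: int) -> int:
    """Hypothetical interleave: consecutive 32-byte blocks rotate across partitions."""
    return (address // ACCESS_BYTES) % NUM_PARTITIONS


print(f"peak: {peak_bandwidth() / 1e9:.1f} GB/sec")
print([partition_for(a) for a in range(0, 8 * ACCESS_BYTES, ACCESS_BYTES)])
```

Under this assumed mapping, a long run of contiguous 32-byte requests visits every partition in turn, so no single partition becomes the bottleneck.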
30.2.2 Functional Block Diagram for Non-Graphics Operations
As graphics hardware becomes more and more programmable, applications unrelated to
the standard polygon pipeline (as described in the preceding section) are starting to
present themselves as candidates for execution on GPUs.
Figure 30-6 shows a...

Page 9

The vertex processor operates on data and passes it either directly to the fragment processor or through the rasterizer, which expands the data into interpolated values. At this point, each
triangle (or point) from the vertex processor has become one or more fragments.
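For non-graphics work, programmers typically draw a rectangle that covers the output array, so the rasterizer produces one fragment per output element, each with interpolated coordinates. The sketch below emulates that pattern on the CPU; `draw_rectangle` and the element-wise kernel are hypothetical, and on the GPU the kernel would be a fragment program reading its inputs from textures.

```python
from typing import Callable, List

# kernel(x, y, u, v) -> value for output element (x, y)
Kernel = Callable[[int, int, float, float], float]


def draw_rectangle(width: int, height: int, kernel: Kernel) -> List[List[float]]:
    """Emulate rasterizing a screen-aligned quad that covers width x height pixels.

    Each pixel gets exactly one fragment; (u, v) are texture coordinates
    interpolated across the quad and evaluated at the pixel center.
    """
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u = (x + 0.5) / width
            v = (y + 0.5) / height
            output[y][x] = kernel(x, y, u, v)  # the "fragment program" runs once per pixel
    return output


# Hypothetical element-wise computation: result = alpha * a + b.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[10.0, 20.0], [30.0, 40.0]]
alpha = 2.0
result = draw_rectangle(2, 2, lambda x, y, u, v: alpha * a[y][x] + b[y][x])
```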
Before a fragment reaches the fragment processor, the z-cull unit compares the pixel’s
depth with the values that already exist in the depth buffer. If the pixel’s depth is
greater, the pixel will not be visible, and there is no point in shading that...
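The saving can be sketched by counting fragment-program invocations. This model is an assumption: it compares each fragment against a per-pixel depth buffer in which smaller values are nearer, while the internal granularity of the hardware z-cull unit is not described here; it also leaves depth updates to the later depth test.

```python
from typing import Callable, Dict, List, Tuple

Fragment = Tuple[int, int, float]  # x, y, depth


def shade_with_zcull(fragments: List[Fragment],
                     depth_buffer: List[List[float]],
                     shader: Callable[[int, int, float], float]):
    """Run `shader` only on fragments not already hidden by the stored depth."""
    shaded: Dict[Tuple[int, int], float] = {}
    invocations = 0
    for x, y, z in fragments:
        if z > depth_buffer[y][x]:
            continue  # farther than what is already stored: invisible, skip shading
        invocations += 1
        shaded[(x, y)] = shader(x, y, z)
    return shaded, invocations
```

With heavily overlapping geometry drawn front to back, most fragments fall into the `continue` branch, which is where z-cull pays off.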

Page 10

during z-cull. This avoids all fragment processor work on scissored (rejected) pixels. Scissoring is rarely useful for general-purpose computation because general-purpose programmers typically draw rectangles to perform computations in the first place.
Next, the fragment’s depth is compared with the depth in the frame buffer. If the depth
test passes, the fragment moves on in the pipeline. Optionally, the depth value in the
frame buffer can be replaced at this stage.
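A minimal sketch of that stage, assuming a less-than comparison (nearer fragments have smaller depth) and a hypothetical `depth_write` flag standing in for the optional replacement of the stored value:

```python
from typing import List


def depth_test(x: int, y: int, z: float,
               depth_buffer: List[List[float]],
               depth_write: bool = True) -> bool:
    """Return True if the fragment passes; optionally store its depth."""
    if z < depth_buffer[y][x]:
        if depth_write:
            depth_buffer[y][x] = z  # replace the frame-buffer depth
        return True
    return False  # the fragment is discarded
```

Leaving writes disabled lets a pass be masked by existing depth without altering it, as is common when drawing transparent surfaces.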
After this, the fragment can...
