A texture is usually stored as a bitmap. A bitmap contains a finite number of discrete samples of some original image. During scan conversion of a textured polygon, pixel coordinates are mapped into texture coordinates (u, v) and the texture is re-sampled at these coordinates. In particular, if the polygon is shrunk by perspective, it covers only a few pixels, so only a few sampling points are spread across the whole texture bitmap.
If the texture contains a lot of fine detail, these few re-sampling points might not be representative of the areas of the bitmap they stand in for.
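The re-sampling step can be sketched as follows. This is only a minimal illustration, not a real scan converter: it assumes a plain linear mapping from pixel centres to (u, v) with no perspective correction, and the function name resample_nearest and the 10x10 test texture are made up for the example.

```python
def resample_nearest(texture, out_w, out_h):
    """Point-sample a texture (a list of rows) into an out_w x out_h image."""
    tex_h = len(texture)
    tex_w = len(texture[0])
    out = []
    for y in range(out_h):
        # Map the pixel centre to a normalized texture coordinate v in [0, 1).
        v = (y + 0.5) / out_h
        row = []
        for x in range(out_w):
            u = (x + 0.5) / out_w
            # Pick the single texel that contains (u, v).
            tx = min(int(u * tex_w), tex_w - 1)
            ty = min(int(v * tex_h), tex_h - 1)
            row.append(texture[ty][tx])
        out.append(row)
    return out


# A 10x10 texture whose texel values encode their own position (10*y + x).
texture = [[10 * y + x for x in range(10)] for y in range(10)]

# The polygon covers only 2x2 pixels, so 4 samples stand in for 100 texels.
small = resample_nearest(texture, 2, 2)
print(small)  # [[22, 27], [72, 77]]
```

Each of the four output pixels takes its color from exactly one texel; the other 96 texels have no influence on the result.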
Below, the picture on the right is the result of picking every 5th pixel from the picture on the left. Notice the random white pixels where a sampling point happened to fall on one of the white tracks. The tracks correspond to a high-frequency component of the picture: large variations of color concentrated in small areas. Since the tracks are less than 5 pixels wide, the samples often miss them completely and at other times exaggerate their width. Sampling an image that has high-frequency components with a grid of lower frequency results in aliasing artifacts.
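The same effect can be reproduced on a single scanline. The sketch below is an illustration with made-up data (a dark row crossed by four white tracks, each 2 pixels wide); subsample is a hypothetical helper that keeps every step-th pixel.

```python
def subsample(image, step):
    """Keep every `step`-th pixel in both directions (point sampling)."""
    return [row[::step] for row in image[::step]]


# One dark scanline (0) crossed by white tracks (255), each 2 pixels wide.
row = [0] * 30
for start in (3, 10, 19, 26):
    row[start] = 255
    row[start + 1] = 255

# Sample positions are 0, 5, 10, 15, 20 and 25.
samples = subsample([row], 5)[0]
print(samples)  # [0, 0, 255, 0, 255, 0]
```

The tracks at pixels 3-4 and 26-27 fall between sampling points and vanish. The tracks at 10-11 and 19-20 are hit, and because each sample now stands for 5 original pixels, they appear wider than they really are.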
On the left is the same picture after a 5x5 box filter has been applied. Resampling it (again, every 5th pixel) gives the much smoother picture on the right. Filtering has smoothed out the high-frequency components. For instance, the tracks are much less sharp than in the previous picture.
Applying a 5x5 box filter means that each pixel of the resulting picture is the average of a square of 5x5 pixels of the original.
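A box filter can be sketched in a few lines. The version below is a straightforward, unoptimized illustration that makes two assumptions not stated above: the 5x5 square is centred on the pixel being computed, and squares that stick out past the border are clipped to the image. The test image is made up: five identical dark rows crossed by white tracks 2 pixels wide.

```python
def box_filter(image, size):
    """Replace each pixel by the mean of the size x size square centred on it.

    Squares that extend past the border are clipped to the image
    (an assumption of this sketch).
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            total = 0
            count = 0
            for yy in range(max(0, y - r), min(h, y + r + 1)):
                for xx in range(max(0, x - r), min(w, x + r + 1)):
                    total += image[yy][xx]
                    count += 1
            out_row.append(total / count)
        out.append(out_row)
    return out


# Five identical dark rows (0) crossed by white tracks (255), 2 pixels wide.
row = [0] * 30
for start in (3, 10, 19, 26):
    row[start] = 255
    row[start + 1] = 255
image = [list(row) for _ in range(5)]

# Filter first, then pick every 5th pixel of the top row.
filtered = box_filter(image, 5)
samples = [filtered[0][x] for x in range(0, 30, 5)]
print(samples)  # [0.0, 102.0, 102.0, 0.0, 102.0, 102.0]
```

After filtering, every track contributes to a nearby sample as a medium gray (2 white pixels out of 5 gives 102) instead of showing up as full white or being missed altogether.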