Think about it !
Head stage: selects the first N records from each partition of an input data set and copies the selected records to an output data set.
Tail stage: selects the last N records from each partition of an input data set and copies the selected records to an output data set.
Sample stage: samples an input data set. Operates in two modes: Percent mode, extracts rows, selecting them by means of a random number generator, and writes a given percentage of these to each output data set; Period mode, extracts every Nth row from each partition, where N is the period which you supply.
Filter stage: transfers, unmodified, the records of the input data set which satisfy the specified requirements and filters out all other records. You can specify different requirements to route rows down different output links.
External Filter stage: allows you to specify a UNIX command that acts as a filter on the data you are processing. An example would be to use the stage to grep a data set for a certain string, or pattern, and discard records which did not contain a match
Sequential File stage: FILTER OPTION - use this to specify that the data is passed through a filter program before being written to a file or files on output or before being placed in a dataset on input.