Sequence similarity in 6 regions of the HIV genome

These histograms are based on pairwise similarity scores from alignments of a selection of sequences in the HIV database. All sequences are from different individuals. These numbers should be read as an upper bound on the similarities that can be considered 'reasonable', because:

  1. the sequences were gap-stripped (alignment columns containing gaps were removed) before the calculations, which reduces the observed differences;
  2. many were derived from cultured virus ('lab strains'); and
  3. most of them are from old isolates (1984-85).
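The effect of gap-stripping on pairwise similarity can be illustrated with a minimal sketch. It assumes that similarity is the fraction of identical positions between two sequences after every column containing a gap in any sequence has been dropped; the functions `gapstrip` and `pairwise_similarities` are hypothetical names, and the database calculations may have used a different similarity measure.

```python
from itertools import combinations


def gapstrip(alignment, gap="-"):
    """Drop every alignment column in which any sequence has a gap character.

    Assumption: `alignment` is a list of already-aligned, equal-length strings.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")
    keep = [i for i in range(length) if all(s[i] != gap for s in alignment)]
    return ["".join(s[i] for i in keep) for s in alignment]


def similarity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if not a:
        raise ValueError("no columns left after gap-stripping")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def pairwise_similarities(alignment, gap="-"):
    """Similarity score for every unordered pair of sequences, after gap-stripping."""
    stripped = gapstrip(alignment, gap)
    return [similarity(a, b) for a, b in combinations(stripped, 2)]


# Toy alignment (hypothetical): columns 2 and 4 contain gaps and are removed.
toy = ["ACGT-A", "ACCTTA", "AG-TTC"]
print(gapstrip(toy))               # ['ACTA', 'ACTA', 'AGTC']
print(pairwise_similarities(toy))  # [1.0, 0.5, 0.5]
```

In the toy example the first two sequences differ at both gapped columns (G vs C, gap vs T), yet they score 1.0 once those columns are stripped. This is why gap-stripping pushes the scores upward.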

Similarities greater than 0.99 were never observed, not even in the protease alignment. Similarities above 0.985 (98.5%) occurred with the following frequencies: gp120 0%, gp41 0.05%, p17 0.05%, p24 0.5%, RT 2%, protease 7%.
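Per-region figures of this kind can be summarized with a short sketch. It assumes each region's pairwise similarity scores are already available as a list of fractions between 0 and 1, and it reports the maximum score and the percentage of pairs strictly above a threshold (0.985 by default). The function names and the toy scores below are hypothetical and are not the database's data.

```python
def percent_above(scores, threshold=0.985):
    """Percentage of pairwise similarity scores strictly greater than `threshold`."""
    if not scores:
        raise ValueError("no scores given")
    return 100.0 * sum(s > threshold for s in scores) / len(scores)


def summarize(scores_by_region, threshold=0.985):
    """Map each region name to (maximum score, percent of scores above threshold)."""
    return {
        region: (max(scores), percent_above(scores, threshold))
        for region, scores in scores_by_region.items()
    }


# Hypothetical toy scores, for illustration only.
toy = {
    "protease": [0.97, 0.986, 0.99, 0.95],
    "gp120": [0.80, 0.85, 0.90, 0.88],
}
print(summarize(toy))  # {'protease': (0.99, 50.0), 'gp120': (0.9, 0.0)}
```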

