and vertebrate huntingtins as HEAT proteins
a, The consensus cDNA sequences of the Drosophila HD gene predicted a 3584-amino-acid polypeptide with four regions of greatest sequence identity with human huntingtin (marked by horizontal lines, with amino acid residues and % identity). The polyglutamine and polyproline stretches present in the amino-terminal region of human huntingtin (shown as a red bar) are absent from Drosophila huntingtin. The positions of 28 HEAT-like sequences are shown as vertical lines in Drosophila, with the first and last numbered D1 and D28, respectively. The positions of 36 HEAT-like sequences in human huntingtin are similarly designated, from H1 to H36, with corresponding locations in other vertebrate huntingtins shown below. These sequences are referred to here as HEAT-like because they were not defined by the same homology considerations originally used to define HEAT repeats and do not always precisely match the reported start and end points of those repeats. Vertebrate huntingtin HEAT-like sequences were identified by iterative MAST searches of the nr protein database, beginning with a MEME motif of 38 amino acids trained on the 10 published human huntingtin HEAT repeats (corresponding to the regions of H3-5, H9-12, H18-19, and H21). All matches in vertebrate huntingtins with position p values < 10⁻⁴ were used to create 6 species-specific MEME huntingtin motifs, along with one combined cross-species MEME motif, each of which was used in the next round. Shuffling the sequences in the training sets, or attempting the iterative search with random segments of proteins not reported to contain HEAT motifs, produced either no motif or no significant additional matches. MEME motifs were also created using 436 HEAT repeats from a wide variety of proteins [36, 37], as well as from subsets of these representing the importin (HEAT_IMB), adaptin (HEAT_ADB) and PP2A (HEAT_AAA) families.
The vertebrate huntingtin HEAT-like regions detected (in one or more species) by these motifs were: HEAT: 2–6, 9–12, 16, 19, 30, 36; HEAT_IMB: 2–4, 6, 12, 17; HEAT_ADB: none; HEAT_AAA: 2–5, 9, 10, 12, 16, 17, 28, 30, 34–36. Drosophila huntingtin HEAT-like sequences were identified by similar iterative searches, seeding the initial species-specific MEME motif with 4 Drosophila huntingtin segments (HEAT-like sequences 1, 10, 13, 19) that showed significant matches with the HEAT_IMB importin MEME motif. During the iterative searching, additional MEME motifs were also generated from the combination of Drosophila and fish HEAT-like sequences. Individual MEME motifs created from each group of 6 vertebrate HEAT-like sequences revealed a direct correspondence (noted in green) between vertebrate segments 2, 12, 16 and 35 and Drosophila segments 1, 13, 16 and 28, respectively. Drosophila huntingtin likely contains additional undetected HEAT-like sequences, because our search could not draw on comparison with more closely related species, as was possible among the vertebrates. b, Consensus secondary structures for both human and Drosophila HEAT-like sequences (probability of helical structure, pH_sec, for amino acids 1–38) were predicted using PhD without alignment and revealed a pair of helical regions separated by a non-helical region.
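The iterative search described in panel a can be summarized as a train-scan-retrain loop. The sketch below is only an illustration under stated assumptions: `train` and `scan` are hypothetical callables standing in for the motif-building (MEME) and motif-scanning (MAST) steps, not the real interfaces of those programs, and `shuffle_sequence` is a hypothetical helper for the shuffled-training-set control. The 10⁻⁴ cutoff is the position p value threshold stated in the legend; the stopping rule and `max_rounds` are assumptions.

```python
import random
from typing import Callable, Hashable, Iterable, Set, Tuple

Hit = Hashable  # e.g. a (sequence id, start position) tuple; hypothetical


def iterative_motif_search(
    seeds: Iterable[Hit],
    train: Callable[[Set[Hit]], object],
    scan: Callable[[object], Iterable[Tuple[Hit, float]]],
    p_cutoff: float = 1e-4,
    max_rounds: int = 20,
) -> Set[Hit]:
    """Retrain a motif on all accepted hits until a round adds nothing new.

    `train` and `scan` are placeholders for external tools (assumption).
    """
    hits = set(seeds)
    for _ in range(max_rounds):
        motif = train(hits)
        # Keep only matches whose position p value is below the cutoff.
        accepted = {hit for hit, p in scan(motif) if p < p_cutoff}
        if accepted <= hits:
            break  # no additional significant matches this round
        hits |= accepted
    return hits


def shuffle_sequence(seq: str, seed: int = 0) -> str:
    """Control: permute residues, preserving amino acid composition."""
    residues = list(seq)
    random.Random(seed).shuffle(residues)
    return "".join(residues)
```

In this framing, the negative control amounts to running the same loop on seeds passed through `shuffle_sequence` and checking that no motif or no additional significant hits emerge.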