A new way to rapidly create functional, fluorescent fusion proteins: random insertion of GFP with an in vitro transposition reaction

Background: The jellyfish green fluorescent protein (GFP) can be inserted into the middle of another protein to produce a functional, fluorescent fusion protein. Finding permissive sites for insertion, however, can be difficult. Here we describe a transposon-based approach for rapidly creating libraries of GFP fusion proteins.
Results: We tested our approach on the glutamate receptor subunit GluR1 and the G protein subunit αs. All of the in-frame GFP insertions produced a fluorescent protein, consistent with the idea that GFP will fold and form a fluorophore when inserted into virtually any domain of another protein. Some of the fusion proteins retained their signaling function, and the random nature of the transposition process revealed permissive insertion sites that would not have been predicted from structural or functional models of how these proteins work.
Conclusion: This technique should greatly speed the discovery of functional fusion proteins, genetically encodable sensors, and optimized fluorescence resonance energy transfer (FRET) pairs.


Background
The discovery that the jellyfish green fluorescent protein (GFP) can form a functional fluorophore without other gene products or co-factors [1] was rapidly followed by reports that GFP can be used to create fluorescent fusion proteins [e.g. [2,3]]. For the first time, it became possible to create a wide variety of genetically encodable fluorescent fusion proteins that could be followed in living systems [reviewed in: [4]]. Most GFP fusion proteins have been built by placing GFP at either the N- or C-terminus of the host protein. This can, however, destroy the function of some host proteins. The alternative is to insert GFP into the middle of the host protein [5-8]. Unfortunately, finding a permissive location for inserting GFP can be problematic and time-consuming.
One way of speeding the process is to randomly generate libraries of GFP fusion proteins and then screen for clones that encode functional, fluorescent proteins. One group used a combination of nick translation and nuclease S1 treatment to randomly insert GFP into a cAMP-dependent protein kinase regulatory subunit from Dictyostelium [6]. A surprisingly large number of the resulting fusion proteins were fluorescent and retained cAMP binding, demonstrating that this can be a powerful approach. A weakness of this strategy, however, is that it can produce deletions in the host sequence. Another approach is to use the random behavior of a transposon to insert GFP into many different places in a target protein. Two synthetic transposons have been reported that can produce GFP fusion proteins [9,10]. The design of these transposons, however, included additional protein domains or linkers between the GFP and the target protein, and little is known about how many of the resulting proteins continued to function.
Figure 1. (A) … 19 bp inverted repeats, the MEs. The EGFP coding region is positioned such that when <EGFP-V> inserts between the codons of a target gene, a fusion protein will be produced. <EGFP-V> also carries KanR. There is a stop codon at the 5' end of the KanR cassette in the same frame as the EGFP coding sequence, so if the transposon lands in an open reading frame, in the correct orientation and frame, a truncated, EGFP-tagged protein will initially be produced. Removal of the KanR cassette by SrfI digestion and re-ligation produces a reading frame that extends across the entire transposon. (B) The target plasmid, αsEE in pcDNA1/Amp, encodes an epitope-tagged version of the G protein subunit αs. (C) Transposed plasmids carry AmpR and KanR. <EGFP-V> insertions within the target gene produce a PCR product when <EGFP-V> is inserted in the correct orientation, and the size of the PCR product reveals which <EGFP-V> insertions are in the coding sequence.
We reasoned that a Tn5 transposon [11,12] could be designed to generate GFP fusion proteins with relatively short linkers (~7 amino acids) between GFP and the host protein, analogous to GFP fusion proteins that have already been shown to function [6,8]. To test this approach, we targeted the G protein subunit αs and the glutamate receptor subunit GluR1.

Results
Changes have been made to the Tn5 transposon and its transposase that, in concert, produce a hyperactive transposon capable of a 1% insertion frequency in an in vitro reaction [reviewed in: [12]]. In this system, a transposon is defined as any sequence flanked by the 19 base pair inverted repeats known as mosaic ends (MEs). The recombinant Tn5 transposase binds these ME sequences and, in the presence of Mg²⁺, catalyzes the random insertion of the transposon into target DNA in a complex process that involves generating a 9 base pair staggered nick in the target. This staggered nick is subsequently repaired to produce a 9 base pair duplication of the target sequence that flanks the inserted transposon. Two possible reading frames extend through the MEs of the Tn5 transposon. Our initial GFP transposon, <EGFP-V>, was created by placing the sequence encoding enhanced green fluorescent protein (EGFP) in one of these frames such that if the transposon landed in another coding sequence, in the correct orientation and frame, it would produce a GFP fusion protein (figure 1A). Because transposition is a low-probability event in an in vitro reaction, an antibiotic resistance marker was needed: KanR was added to the transposon, flanked by SrfI restriction sites that can be used to remove it later.
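The insertion mechanism described above can be sketched as a toy model in Python; it is an illustration, not the authors' software. `insert_transposon` reproduces the 9 bp target-site duplication on both sides of the insert, and `duplicated_residues` shows why an in-frame insertion duplicates exactly three codons, which is where construct names such as αs-GFP(18-20) come from. The frame convention used here (in frame when the duplicated 9 bp begin on a codon boundary) is an assumption for illustration; the real condition depends on which ME reading frame a given transposon uses.

```python
"""Toy model of a Tn5 insertion with its 9 bp target-site duplication."""
from typing import Optional, Tuple

DUPLICATION_BP = 9  # length of the staggered nick, duplicated on repair


def insert_transposon(target: str, transposon: str, site: int) -> str:
    """Insert `transposon` at 0-based `site`, the first base of the nicked 9 bp.

    After repair, target[site:site + 9] appears on both sides of the insert.
    """
    if not 0 <= site <= len(target) - DUPLICATION_BP:
        raise ValueError("site must leave room for the 9 bp duplication")
    return target[: site + DUPLICATION_BP] + transposon + target[site:]


def duplicated_residues(cds_offset: int) -> Optional[Tuple[int, int]]:
    """Return 1-based (first, last) residues duplicated by an in-frame insertion.

    `cds_offset` is the 0-based offset of the first duplicated base from the
    start codon. ASSUMPTION: the insertion counts as in frame only when the
    duplication starts on a codon boundary; otherwise None is returned.
    """
    if cds_offset < 0 or cds_offset % 3 != 0:
        return None
    first = cds_offset // 3 + 1
    return first, first + 2  # 9 bp is exactly three codons


if __name__ == "__main__":
    print(insert_transposon("AAACCCGGGTTTAAA", "xxx", 3))
    # Hypothetical offset 51 maps to residues 18-20, i.e. a name like as-GFP(18-20).
    print(duplicated_residues(51))
```

Because the duplication is 9 bp, a multiple of three, it does not shift the downstream frame by itself; whether the fusion reads through is decided by how the transposon's own reading frame joins the target's.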
An epitope-tagged version of the G protein subunit αs (αsEE) was chosen as the first target (figure 1B). Previous studies have shown that the N- and C-termini of αs are important for its interactions with receptors, the G protein β and γ subunits, and the plasma membrane [13,14], so placing GFP within internal regions of αs is more likely to generate a functional, fluorescent subunit [8]. Moreover, the structure of αs has been solved [15], making it possible to interpret the results in the context of its three-dimensional structure. After transposition and transformation, colonies with dual antibiotic resistance were screened by PCR to identify clones in which <EGFP-V> had landed in the correct orientation within the coding region (figure 1C). Assuming that Tn5 insertion is random, the probability that <EGFP-V> lands in the αsEE coding sequence should equal the ratio of the coding sequence length to the total plasmid size (18.5%). However, transpositions that disrupt critical elements of the plasmid (the plasmid origin or the AmpR gene) should not be recovered after transformation, so the predicted probability of recovered transpositions falling within the αsEE coding sequence increases to 23.8%, with half of these (11.9%) in the correct orientation. PCR screening of 384 AmpR KanR colonies identified 44 clones with <EGFP-V> insertions within the αsEE coding region in the correct orientation (11.4%).
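The expected-frequency argument above reduces to two ratios. The sketch below assumes uniform insertion and that the non-recoverable elements (origin, AmpR) do not overlap the coding sequence. The plasmid lengths in the example are hypothetical, since the text reports only the resulting percentages. `implied_essential_fraction` works backwards from the reported 18.5% and 23.8% to the fraction of the plasmid that those two numbers imply is off limits.

```python
def expected_fractions(cds_bp: int, plasmid_bp: int, essential_bp: int):
    """Return (raw, recoverable, recoverable_and_correctly_oriented) fractions."""
    raw = cds_bp / plasmid_bp  # any insertion, anywhere in the plasmid
    recoverable = cds_bp / (plasmid_bp - essential_bp)  # lethal sites excluded
    return raw, recoverable, recoverable / 2  # only one orientation gives a PCR product


def implied_essential_fraction(raw: float, recoverable: float) -> float:
    """Fraction of the plasmid that must be excluded to turn `raw` into `recoverable`.

    raw / recoverable = (P - E) / P, so E / P = 1 - raw / recoverable.
    """
    return 1 - raw / recoverable


if __name__ == "__main__":
    print(expected_fractions(1000, 5000, 1000))  # hypothetical sizes in bp
    print(round(implied_essential_fraction(0.185, 0.238), 3))
    observed = 44 / 384
    print(f"observed {observed:.3f} vs expected 0.119; expected count {384 * 0.119:.1f}")
```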
Each clone containing an in-frame insertion should encode a truncated αs protein with GFP at the carboxy-terminus, because of the stop codon in the KanR cassette. Thirty-five of the PCR-positive clones were transiently expressed in HEK 293 cells, and 13 were fluorescent. Sequencing confirmed that the 13 fluorescent constructs were truncated αs-GFP fusion proteins (12 of them unique insertions) and that the remaining 22 <EGFP-V> insertions were out of frame (figure 2A). The 12 clones encoding unique αs-GFP fusion proteins were digested with SrfI and re-ligated to create full-length fusion proteins (figure 2B). Transient expression of each of the 12 αs-GFP fusion proteins in HEK 293 cells produced a fluorescent signal. This is surprising because several insertions appear to lie within internal and/or rigid secondary structures of the protein (figure 3). It appears that GFP folding and fluorophore formation are thermodynamically favorable at most insertion sites.
To determine what effect the GFP insertions had on αs localization, the full-length fusion proteins were transiently co-expressed in HEK 293 cells with the G protein subunits β1 and γ7, which have been shown to mediate signaling between the β-adrenergic receptor and Gs [16]. Amino- and carboxy-terminal GFP fusions, αs-GFP(N) and αs-GFP(C), respectively, were also co-expressed with β1 and γ7 for comparison. The end-labeled GFP fusions and two of the transposon insertions, αs-GFP(18-20) and αs-GFP(92-94), showed clear localization to the plasma membrane (figure 4A). The remaining 10 fusion proteins displayed uniform fluorescence throughout the cytoplasm (figure 4B).
The fusion proteins were tested for function by assaying their ability to stimulate adenylyl cyclase in response to receptor stimulation. They were co-expressed with the luteinizing hormone (LH) receptor in HEK 293 cells, and cAMP accumulation was measured in the presence and absence of the LH receptor agonist human chorionic gonadotropin (hCG). Basal and stimulated cAMP accumulation in cells expressing αs-GFP(18-20), αs-GFP(92-94), αs-GFP(N), or αs-GFP(C) was higher than in cells expressing vector alone (figure 5A). However, these differences were statistically significant (p < 0.05) only in cells expressing αs-GFP(92-94). The basal and stimulated activities of αs-GFP(92-94) were lower than those of αs, although these differences were not statistically significant at the p < 0.05 level. The remaining 10 of the 12 fusion proteins exhibited no detectable activity. One possible explanation for the decreased activities of the αs-GFP fusion proteins relative to αsEE would be a lower protein expression level. Cell fractionation and immunoblotting with an anti-EE monoclonal antibody showed that both αs-GFP(92-94) and αs-GFP(18-20) were expressed at lower levels than αsEE, in contrast to αs-GFP(N) and αs-GFP(C) (figure 5B).
Interpreting these results in the context of the structure of αs leads to a surprising conclusion. A rational approach to designing a fluorescent, functional αs-GFP fusion protein would most likely have targeted the exposed loops [e.g. [8]], yet these insertions were not functional. The most functional protein was produced by inserting GFP into an α-helix, a site that a rational design would have avoided (figure 3).
The finding that all of the in-frame insertions in αs produced truncated fluorescent fusion proteins suggested that we could identify in-frame insertions by transiently expressing all of the transposed clones and visually screening them for fluorescence. This alternative screening strategy could be particularly useful for large coding regions, where a PCR-based screen might fail. To reduce the number of transient transfections required, a second transposon was created with enhanced cyan fluorescent protein (ECFP). Two separate transpositions with the different colored transposons, followed by co-transfections in the visual screen (one potential green clone and one potential cyan clone per well), can identify twice as many in-frame insertions in a given number of transfections. This approach could be extended to many different fluorophores.
In the experiments with αs, several clones were recovered with identical transposon insertions. This is consistent with previous reports of Tn5 preferentially inserting into particular locations in the target sequence [17,18]. Since these "hotspots" could limit the number of unique insertions recovered within a target sequence, the second reading frame through the Tn5 MEs was used for the ECFP transposon. This doubles the number of potentially useful insertion sites within a given target sequence.
Figure 3. … (92-94). The GFP insertions into αs can be interpreted in the context of the structures of GFP (PDB file: 1EMA) and αs-GTPγS (PDB file: 1AZT). In this image, the structure of GFP [42] is green, while the helical domain of the αs subunit [15] is pink and the GTPase domain is blue. GTPγS is yellow. The GFP insertion αs-GFP(92-94), which produced a functional G protein subunit, is illustrated by the short linkers (encoded by the Tn5 MEs) between GFP and αs (dark blue). The other sites of <EGFP-V> insertion are shown as green spheres. The numbers on the spheres indicate the second of the three duplicated residues that flank each transposon insertion (numbering is based on the long form of αs).
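Using the second reading frame through the MEs for the ECFP transposon doubles the number of usable insertion offsets, which can be counted directly. In this sketch each transposon is labeled by the residue (offset mod 3) of the insertion positions it can fuse in frame; these frame labels are hypothetical stand-ins, and the model ignores everything except frame. Because only two reading frames extend through the MEs, at most two of the three offsets can be used.

```python
from typing import Set


def in_frame_sites(cds_bp: int, frames: Set[int]) -> int:
    """Count insertion offsets 0..cds_bp-1 that at least one transposon fuses in frame.

    `frames` holds hypothetical labels (0, 1 or 2) for the offset mod 3 that
    each transposon's ME reading frame accepts.
    """
    return sum(1 for offset in range(cds_bp) if offset % 3 in frames)


if __name__ == "__main__":
    cds_bp = 1200  # hypothetical coding-sequence length
    print(in_frame_sites(cds_bp, {0}), in_frame_sites(cds_bp, {0, 1}))
    hotspot = 52  # hypothetical hotspot offset: out of frame for frame 0, usable by frame 1
    print(hotspot % 3 in {0}, hotspot % 3 in {0, 1})
```

A hotspot that is out of frame for one transposon can still yield a fusion with the other, which is the practical benefit when insertions cluster.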
The glutamate receptor subunit GluR1 [19] was used to test the new transposons and the visual screening process. Independent transpositions of the GluR1 plasmid were performed with the EGFP and ECFP transposons (<TgPT-0> and <TcPT-1>, respectively). In 288 co-transfections, there were 20 wells with EGFP fluorescence only, 21 wells with ECFP fluorescence only, and 2 wells with both EGFP and ECFP fluorescence. Sequencing revealed 35 unique insertions (17 <TgPT-0> and 18 <TcPT-1>) and 10 repeated insertions (figure 6A). The recovery of 45 fluorescent clones from 576 colonies (7.8%) agrees with the predicted frequency of transpositions resulting in GluR1-EGFP/ECFP fusions (7.7%), consistent with the interpretation that all in-frame insertions produce a fluorescent protein. Clones representing unique fluorescent fusion proteins were digested with SrfI to remove the KanR selection cassette and re-ligated to generate full-length GluR1-EGFP/ECFP fusions. These fusion proteins were screened for glutamate-gated ion channel function in transiently transfected HEK 293 cells. Of the 29 unique tribrid fusion constructs tested, all produced detectable fluorescence and 6 were functional (figure 6B).
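The well counts above can be converted into clone counts. The sketch assumes each well received one green and one cyan candidate, and reads the 20 EGFP and 21 ECFP wells as single-color wells, so a two-color well contributes two fluorescent clones; under that reading the totals reproduce the 45 fluorescent clones out of 576 screened. `tally_cotransfections` is a hypothetical helper, not part of any published tool.

```python
from typing import NamedTuple


class ScreenTally(NamedTuple):
    fluorescent_clones: int
    clones_screened: int
    recovery: float


def tally_cotransfections(
    green_only: int, cyan_only: int, both: int, wells: int, clones_per_well: int = 2
) -> ScreenTally:
    """Convert per-well fluorescence calls into per-clone counts."""
    screened = wells * clones_per_well  # one green and one cyan candidate per well
    fluorescent = green_only + cyan_only + 2 * both  # two-color wells hold two hits
    return ScreenTally(fluorescent, screened, fluorescent / screened)


if __name__ == "__main__":
    tally = tally_cotransfections(green_only=20, cyan_only=21, both=2, wells=288)
    print(tally, f"{tally.recovery:.1%}")
```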

Discussion
Creating functional, fluorescent fusion proteins involves finding a permissive site for the insertion of GFP, a process that in most cases still involves some guesswork. The results of both the αs and GluR1 transpositions illustrate this point. Based on previous studies with the G protein subunit αq [8], we anticipated that an insertion within an exterior flexible loop of αs would be most likely to produce a functional fusion protein. Surprisingly, the most functional αs fusion protein, αs-GFP(92-94), resulted from an insertion into an α-helix (figure 3), while the insertions in exposed loops, αs-GFP(67-69) and αs-GFP(188-190), were not functional. Similarly, in the case of GluR1, one of the insertions that produced a functional channel, GluR1-GFP(526-528), was within the hydrophobic region thought to be the first transmembrane domain (see Additional File: Figure 7). Additionally, within a given region of GluR1, one insertion can produce a functional channel while a nearby insertion does not (e.g. in the intracellular carboxy-terminal region, or in the amino-terminal region between amino acids 210 and 330). The reasons for these discrepancies are not obvious.
The finding that GFP still folds and forms a fluorophore when placed virtually anywhere in another coding region suggests that the limiting step is whether the target protein it is inserted into folds and functions correctly. Indeed, GFP fusion constructs have been used to assay, and improve, the folding of a variety of proteins in a bacterial expression system [20]. The relatively random nature of the transposition events we recovered suggests that it might be possible to insert GFP at nearly every position in a given protein, but there are two potential limits. First, probability predicts rapidly diminishing returns in the search for unique Tn5 insertions, because each additional clone is increasingly likely to repeat an insertion that has already been recovered. Second, the behavior of the Tn5 transposon is not entirely random. Goryshin and colleagues [18] have shown that there is a weak consensus site for Tn5 insertion, which is consistent with our results. It appears that the practical resolution limit will be, on average, one insertion every three amino acids in a target protein.
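The diminishing-returns argument is the classic coupon-collector effect. Under the simplifying assumption that each of N possible insertion sites is equally likely (which, as noted above, Tn5 does not strictly satisfy), the expected number of distinct sites after n independent clones is N(1 - (1 - 1/N)^n), and the expected gain from the next clone is (1 - 1/N)^n. The sketch below evaluates both; N and n are free parameters, not values from this study.

```python
def expected_unique_sites(sites: int, clones: int) -> float:
    """Expected number of distinct sites after `clones` uniform random insertions."""
    if sites <= 0:
        raise ValueError("sites must be positive")
    return sites * (1 - (1 - 1 / sites) ** clones)


def marginal_gain(sites: int, clones: int) -> float:
    """Expected number of new sites contributed by the next clone."""
    return expected_unique_sites(sites, clones + 1) - expected_unique_sites(sites, clones)


if __name__ == "__main__":
    n_sites = 300  # hypothetical number of equally likely in-frame sites
    for n in (100, 300, 600, 1200):
        print(n, round(expected_unique_sites(n_sites, n), 1), round(marginal_gain(n_sites, n), 3))
```

A non-uniform insertion preference only makes this worse: hotspots are drawn repeatedly while rare sites take even longer to appear.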
Inserting a reporter domain such as GFP into another protein always has the potential to perturb the target and destroy its ability to function. In this study, 16% of the tribrid fusion proteins were still functional. One explanation for why GFP can be used for internal insertion is that the N- and C-termini of GFP exit the structure quite close to one another and are therefore unlikely to displace the surrounding domains of the target protein a great deal. This is analogous to the use of bovine pancreatic trypsin inhibitor for internal insertions [21]. The transposons described here could potentially be improved by optimizing the length and flexibility of the linkers between the target and the GFP. Another potential improvement would be to use bacterial expression to screen for transposon insertions that produce a fluorescent protein. This could, however, be problematic for proteins from the mammalian nervous system, such as ion channels, that are difficult to express in bacteria.
The approach described here should speed the discovery of genetically encodable fluorescent sensors. The pioneering work of Siegel and Isacoff showed that GFP placed within a portion of the Shaker K⁺ channel C-terminus produced a fluorophore that responded to changes in membrane voltage [5], but they built a number of different constructs before finding one that worked. Similarly, Ataka and Pieribone created an EGFP-Na⁺ channel fusion protein whose fluorescence changes in response to membrane depolarization on a time scale fast enough to image action potentials. This discovery, however, required designing, building, and testing eight different tribrid fusion proteins [22]. Little is known about the mechanism by which changes in channel conformation are converted into changes in the fluorophore, so it remains to be determined whether GFP can report conformational changes in other kinds of proteins. Nevertheless, the transposons described here should shift the work from building constructs to devising high-throughput assays for function.
Finally, random GFP tagging will facilitate the creation of potential fluorescence resonance energy transfer (FRET) reagents for studying protein interactions in living systems. To date, a few studies have demonstrated the potential power of GFP-FRET by labeling different proteins [6,23-26] or by fusing two different fluorophores to the same protein [27-32]. Creating efficient donor and acceptor fusion proteins is difficult, however, because FRET occurs only when the two fluorophores are attached to surfaces that are very close to one another. The approach described here makes it possible to rapidly generate libraries of potential donor and acceptor tribrid fusion proteins that can be screened, in pairwise combinations, for function and FRET signals.
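Why FRET demands close proximity follows from the Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor distance and R0 is the distance at which transfer is 50% efficient (on the order of a few nanometres for fluorescent-protein pairs). The sketch below evaluates E; R0 is left as a parameter rather than quoted for any specific pair, and the value in the usage lines is hypothetical.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Förster transfer efficiency for donor-acceptor distance r (same units as r0)."""
    if r < 0 or r0 <= 0:
        raise ValueError("r must be non-negative and r0 positive")
    return 1.0 / (1.0 + (r / r0) ** 6)


if __name__ == "__main__":
    r0 = 5.0  # hypothetical Förster radius in nm
    for r in (2.5, 5.0, 7.5, 10.0):
        print(r, round(fret_efficiency(r, r0), 3))
```

Moving from r = R0 to r = 2R0 drops E from 0.5 to 1/65 (about 1.5%), which is why only fusions that bring the two fluorophores within roughly R0 of each other give a usable signal.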

Conclusions
The transposons described here make it possible to rapidly generate large numbers of different GFP fusion proteins. The results show that GFP can be inserted into a wide variety of other protein domains and will continue to fold and form a fluorophore. The rapid and random nature of the transposition process makes it possible to generate and screen many different fusion constructs to identify those that continue to function. For the two proteins tested here, roughly 1 in 6 of the fusion proteins retained their signaling function, and the random nature of the transposition process revealed permissive sites for insertion that would not have been predicted from structural or functional models of how these proteins work. This simple tool should speed the search for a wide variety of new biological probes for the study of the nervous system.

Materials and Methods
PCR and standard subcloning procedures were used to create the initial transposon, <EGFP-V> (full sequence at: [http://momotion.med.yale.edu]). The Tn5 MEs were added to the 5' and 3' ends of an EGFP coding sequence, with an SrfI restriction site at its 3' end, such that one continuous reading frame extended through both MEs and EGFP (figure 2). To add antibiotic selection, the KanR gene from pUniV5-His-TOPO™ (Invitrogen, Carlsbad, CA) was flanked with SrfI sites and inserted into the transposon. The improved transposons, <TgPT-0> and <TcPT-1>, were created in the same way as <EGFP-V>, but AscI sites were added to facilitate changing the fluorescent protein at a later date (supplemental material). In addition, the two different reading frames present in the MEs were used to create the two different transposons, and ECFP was used in place of EGFP in <TcPT-1>. A primer complementary to the 19 bp Tn5 ME (5'-CTGTCTCTTATACACATCT-3') was used to amplify the transposons with Pfu polymerase (Stratagene, La Jolla, CA) (1 cycle of 95°C for 3 min 30 s; 24 cycles of 95°C for 30 s, 47°C for 30 s, and 72°C for 1 min; 1 cycle of 72°C for 5 min). The PCR product was purified and concentrated with the Geneclean II kit (Bio101 Inc., Vista, CA) and eluted in 1X TE buffer. Transposon (0.2 fmol) was incubated with 5.0 µL of EZ::TN™ transposase (Epicentre Technologies) in 25% glycerol at 25°C for 30 min.
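Because the two MEs are inverted repeats, a single primer can prime from both ends of the transposon. The sketch below checks that property on a hypothetical element built as ME + insert + reverse complement of ME, and computes a rough Wallace-rule melting temperature (2 °C per A/T, 4 °C per G/C) for the 19-nt ME primer. The Wallace rule is a crude approximation chosen here for illustration; the text does not say how the 47 °C annealing temperature was chosen.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

ME_PRIMER = "CTGTCTCTTATACACATCT"  # Tn5 mosaic-end primer given in the text


def reverse_complement(seq: str) -> str:
    """Reverse complement; bases other than A/C/G/T (e.g. N) pass through unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough Tm in °C: 2 per A/T plus 4 per G/C (only meaningful for short oligos)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primes_both_ends(element: str, primer: str) -> bool:
    """True if `primer` matches the top strand at the 5' end and the bottom strand at the 3' end."""
    e = element.upper()
    return e.startswith(primer.upper()) and e.endswith(reverse_complement(primer).upper())


if __name__ == "__main__":
    toy_transposon = ME_PRIMER + "NNNNNNNNNN" + reverse_complement(ME_PRIMER)  # hypothetical insert
    print(reverse_complement(ME_PRIMER))
    print(wallace_tm(ME_PRIMER), primes_both_ends(toy_transposon, ME_PRIMER))
```

With 12 A/T and 7 G/C the estimate is 52 °C, so a 47 °C annealing step sits about 5 °C below it.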
Molar equivalents of transposon and target plasmid (0.4 fmol each) were incubated in reaction buffer (50 mM Tris-acetate (pH 7.5), 150 mM potassium acetate, 10 mM magnesium acetate, and 4 mM spermidine) at 37°C for 2 h in a 10 µL reaction. Transposition was stopped by adding 1 µL of 1% SDS and incubating at 70°C for 10 min. Top 10 F' E. coli (Stratagene) were transformed with 1 µL of the transposition reaction and plated on LB agar containing either ampicillin (100 µg/mL) and kanamycin (50 µg/mL), to recover transposed clones, or ampicillin (100 µg/mL) alone, to establish the transposition efficiency.
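Plating on ampicillin alone versus ampicillin plus kanamycin gives the transposition efficiency as a ratio of colony densities. The sketch below normalizes each count by the volume plated and any dilution applied. The colony counts, volumes, and dilution factor are hypothetical, chosen only to land near the ~1% frequency cited for hyperactive Tn5.

```python
def transposition_efficiency(
    amp_kan_colonies: int,
    amp_colonies: int,
    amp_kan_plated_ul: float,
    amp_plated_ul: float,
    amp_dilution: float = 1.0,
) -> float:
    """Fraction of transformed plasmids that carry a transposon insertion.

    AmpR alone counts every transformed plasmid; AmpR + KanR counts only
    plasmids that acquired the KanR-bearing transposon.
    """
    transposed_per_ul = amp_kan_colonies / amp_kan_plated_ul
    total_per_ul = amp_colonies * amp_dilution / amp_plated_ul  # undo the dilution
    return transposed_per_ul / total_per_ul


if __name__ == "__main__":
    # Hypothetical: 50 double-resistant colonies from 100 µL undiluted,
    # 500 Amp-only colonies from 100 µL of a 1:10 dilution.
    print(transposition_efficiency(50, 500, 100, 100, amp_dilution=10))
```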
The cDNA encoding the rat αs [33], modified to carry the EE epitope [34], was in pcDNA1/Amp (Invitrogen). GFP was added to the N- or C-terminus of αs to create end-labeled constructs for comparison with the transposed GFP tribrid fusion proteins. The amino-labeled clone, GFP-[GGGPSGGGGS]-αsEE, and carboxy-labeled clone, αsEE-[SGGGGSGQH]-GFP, were generated via overlap extension [35]. Linker sequences are in brackets. The flip variant of rat GluR1 was in the CMV expression plasmid pRK5 (a generous gift from Derek Bowie, Emory University, Atlanta, GA).
PCR screening for <EGFP-V> insertions within the αsEE coding region was performed using a protocol described by Cease et al. [36], with an upper primer complementary to the 5' UTR (5'-GCTCCCGCGGCTCCTGCTCTGCTC-3') and a lower primer complementary to EGFP (5'-GCCGTCGCCGATGGGGGTGTTCTG-3'). The clones that produced clear PCR products within the expected size range were then miniprepped (QIAgen, Germantown, MD).
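Because the upper primer sits at a fixed point upstream of the αsEE coding region and the lower primer at a fixed point inside EGFP, the PCR product length grows with the distance of the insertion from the upstream primer, and a product forms only when <EGFP-V> is in the correct orientation. The sketch below inverts that relationship. All coordinates and the junction-to-primer distance are hypothetical placeholders, and gel sizing is only approximate, which is why insertion sites were subsequently confirmed by sequencing.

```python
def estimate_insertion_site(
    product_bp: int, upper_primer_pos: int, junction_to_lower_primer_bp: int
) -> int:
    """Approximate plasmid coordinate of an insertion from its PCR product size.

    Model: product_bp ≈ (site - upper_primer_pos) + junction_to_lower_primer_bp,
    where the last term covers the 9 bp duplication, the ME and the EGFP
    sequence up to the lower primer (a hypothetical, construct-specific value).
    """
    return upper_primer_pos + product_bp - junction_to_lower_primer_bp


def in_coding_region(site: int, cds_start: int, cds_end: int) -> bool:
    """True if the estimated site falls inside [cds_start, cds_end)."""
    return cds_start <= site < cds_end


if __name__ == "__main__":
    cds = (100, 1300)  # hypothetical CDS coordinates relative to the upper primer
    for size in (750, 2000):
        site = estimate_insertion_site(size, upper_primer_pos=0, junction_to_lower_primer_bp=250)
        print(size, site, in_coding_region(site, *cds))
```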
Insertion sites were identified for all PCR-positive <EGFP-V> transposed clones and all fluorescent <TgPT-0>/<TcPT-1> transposed clones by sequencing out of the transposon with a primer complementary to the EGFP/ECFP coding region (5'-tggccgtttacgtcgccgtcca-3'). SrfI digestion was then used to remove the KanR cassette from the clones carrying in-frame insertions, thereby creating a sequence encoding a full-length fusion protein. After digestion and re-ligation, Top 10 F' E. coli were transformed with 1 µL of the ligation reaction and plated on LB agar containing ampicillin. The next day, colonies were re-plated on ampicillin and kanamycin to verify loss of the KanR cassette.
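The SrfI step can be modeled as cutting at both sites flanking the KanR cassette and rejoining the blunt ends. SrfI recognizes GCCCGGGC and cuts between the central C and G, leaving blunt ends, so one intact site remains after re-ligation. The sketch below uses toy sequences and hypothetical helper names: it performs that excision and then scans a chosen reading frame for stop codons, the property the re-ligated transposon needs in order to read through into the downstream target.

```python
from typing import Optional

SRFI_SITE = "GCCCGGGC"
SRFI_CUT = 4  # GCCC^GGGC: blunt cut in the middle of the site
STOP_CODONS = {"TAA", "TAG", "TGA"}


def excise_between_srfi(seq: str) -> str:
    """Remove everything between the first two SrfI cuts and re-join the blunt ends."""
    first = seq.find(SRFI_SITE)
    if first < 0:
        raise ValueError("no SrfI site found")
    second = seq.find(SRFI_SITE, first + len(SRFI_SITE))
    if second < 0:
        raise ValueError("only one SrfI site found")
    return seq[: first + SRFI_CUT] + seq[second + SRFI_CUT :]


def first_stop(seq: str, frame: int = 0) -> Optional[int]:
    """0-based index of the first stop codon in the given frame, or None."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i : i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    # Toy construct: upstream codons, SrfI, a stand-in cassette with an in-frame stop, SrfI, tail.
    toy = "ATGAAC" + SRFI_SITE + "ATAAGG" + SRFI_SITE + "AAGCTT"
    fused = excise_between_srfi(toy)
    print(first_stop(toy), fused, first_stop(fused))
```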
The fusion proteins were transiently expressed in HEK 293 cells [37] using Lipofectamine 2000 (Gibco BRL). Images were collected from live cells 20-48 h later on an inverted Zeiss microscope fitted with computer-controlled (IPLabs, Scanalytics) filter wheels (Ludl Electronics) on the excitation and emission paths. EGFP was imaged with an FITC filter set, while ECFP was distinguished from EGFP in co-expression experiments by changing both the excitation and emission filter sets (exciters: 440AF21 and 500AF25; dichroic: cat# XF 2063; emitters: 480AF and 545AF35; Omega, Brattleboro, VT).
αs-GFP fusion proteins were assayed for their ability to stimulate adenylyl cyclase in response to luteinizing hormone (LH) receptor stimulation [38]. HEK 293 cells (10⁶ per 60 mm dish) were co-transfected with 2 µg of plasmid DNA encoding the αs-GFP fusion protein and 0.2 µg of plasmid DNA encoding the rat LH receptor in pCIS [39], using 10 µL of Lipofectamine 2000. [³H]adenine-labeled cells were assayed for cAMP accumulation after incubation at 37°C for 40 min in the presence of 1 mM 3-isobutyl-1-methylxanthine (IBMX), a phosphodiesterase inhibitor, and in the presence or absence of 7.5 ng/mL human chorionic gonadotropin (hCG), as described previously. Conversion of ATP to cAMP was expressed as: …
For immunoblotting, 12 × 10⁶ HEK 293 cells were transfected using DEAE-dextran [40] with 25 µg of plasmid DNA. Forty-eight hours after transfection, cells were lysed, and membrane and supernatant fractions were harvested as described previously [8]. Membrane proteins (10 µg) and normalized volumes of the supernatants were resolved by SDS-polyacrylamide gel electrophoresis (10%), transferred to nitrocellulose, and probed with a monoclonal antibody to the EE epitope [34]. Antigen-antibody complexes were visualized with ECL chemiluminescence (Amersham Biosciences, Piscataway, NJ).
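The exact expression used for ATP-to-cAMP conversion in the adenylyl cyclase assay is not given in this text. A widely used convention for [³H]adenine-labeled cells, assumed in this sketch rather than taken from the source, reports [³H]cAMP as a percentage of total [³H]ATP + [³H]cAMP counts. The count values in the usage lines are hypothetical.

```python
def percent_conversion(camp_cpm: float, atp_cpm: float) -> float:
    """[3H]cAMP as a percentage of total [3H]ATP + [3H]cAMP (assumed convention)."""
    total = camp_cpm + atp_cpm
    if total <= 0:
        raise ValueError("total counts must be positive")
    return 100.0 * camp_cpm / total


def fold_over_vector(sample_pct: float, vector_pct: float) -> float:
    """Activity relative to cells transfected with empty vector."""
    return sample_pct / vector_pct


if __name__ == "__main__":
    vector = percent_conversion(camp_cpm=150, atp_cpm=14850)  # hypothetical counts
    sample = percent_conversion(camp_cpm=450, atp_cpm=14550)  # hypothetical counts
    print(vector, sample, fold_over_vector(sample, vector))
```

Normalizing to total labeled adenine nucleotide corrects for well-to-well differences in cell number and label uptake.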
Whole-cell patch-clamp recording was used to test the GluR1 fusion proteins for function in transiently transfected HEK 293 cells, as previously described [41]. The external solution contained (in mM): 150 NaCl, 3 KCl, 2 CaCl₂, 1 MgCl₂, 5 glucose, 0.002 glycine, and 10 HEPES (pH 7.4). Patch pipettes were filled with a solution containing (in mM): 120 CsF, 33 KOH, 2 MgCl₂, 1 CaCl₂, 0.1 spermine, 10 HEPES, and 11 EGTA (pH 7.4). Cyclothiazide was prepared as a 20 mM stock solution in DMSO and diluted to 100 µM in external solution. All chemicals were purchased from Sigma. Drugs were applied with a rapid superfusion system made from a pulled theta capillary. Open-tip responses obtained with this system had 10-90% rise times of 150-300 µs.
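The 10-90% rise time quoted above is the interval between the times an open-tip response reaches 10% and 90% of its amplitude. A minimal sketch, assuming a positive-going trace whose first sample is the baseline and whose maximum is the plateau, with linear interpolation between samples; the function names are illustrative only.

```python
from typing import Sequence


def crossing_time(times: Sequence[float], values: Sequence[float], level: float) -> float:
    """First time the trace reaches `level`, linearly interpolated between samples."""
    for i, v in enumerate(values):
        if v >= level:
            if i == 0:
                return times[0]
            t0, t1 = times[i - 1], times[i]
            v0 = values[i - 1]  # strictly below `level`, so v - v0 > 0
            return t0 + (level - v0) / (v - v0) * (t1 - t0)
    raise ValueError("trace never reaches the requested level")


def rise_time_10_90(times: Sequence[float], values: Sequence[float]) -> float:
    """10-90% rise time, taking the first sample as baseline and the maximum as plateau."""
    baseline, peak = values[0], max(values)
    amplitude = peak - baseline
    t10 = crossing_time(times, values, baseline + 0.1 * amplitude)
    t90 = crossing_time(times, values, baseline + 0.9 * amplitude)
    return t90 - t10


if __name__ == "__main__":
    t_us = [0.0, 100.0, 200.0, 300.0, 400.0]  # hypothetical sample times in µs
    trace = [0.0, 0.0, 0.5, 1.0, 1.0]  # hypothetical normalized open-tip response
    print(rise_time_10_90(t_us, trace))
```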