By Amrita Iyer
Mentor: Marta Gaglia, Biology; Funding Source: Provost’s Office
Influenza A virus, which causes the flu, infects millions of people each year. Many of us can remember getting a flu shot every year, but vaccination is only preventative. Treatment options are currently limited once someone contracts the virus, so studying the virus can help us develop better treatments.
My research is with the Gaglia Lab, which studies PA-X, a ribonuclease found in Influenza A virus that degrades host RNAs and blocks the expression of many host genes, including genes involved in the cellular immune response (Jagger et al., 2012; Gaucherand et al., 2019). By analyzing a few RNAs, the Gaglia Lab has found evidence that PA-X activity is tied to RNA splicing, a processing step in which parts of the RNA are cut out (introns) and the remaining parts are retained (exons) (Gaucherand et al., 2019). In addition, after mapping the PA-X cut site in three separate RNAs, the lab found that a particular “GCTG” sequence may also define cut site specificity (Gaucherand, unpublished). Because “GCTG” is only four bases long, it is ubiquitous in RNAs, so it may be only part of the criteria for a PA-X cut site. Hence, my research this summer focused on determining whether PA-X is sequence specific.
The first part of my research focused on analyzing RNA sequences to determine whether there was a relationship between the frequency of a sequence and the degradation levels of RNAs containing that sequence. Using Python, I wrote code to analyze RNA sequences and determine the frequency of different “k-mers” (short sequences of k consecutive nucleotide bases). The Gaglia Lab had already measured total RNA levels in cells that express PA-X versus control cells and found that cells that express PA-X have lower levels of RNA overall (Gaucherand et al., 2019). For each RNA, the lower the ratio of its level in PA-X-expressing cells to its level in control cells, the more the RNA is broken down, indicating PA-X activity. I looked at all RNAs to see if there was a negative correlation between the frequency of a sequence in the RNA and its degradation level (as measured by the ratio value). Because there were so many different sequences, I summarized each relationship with a Spearman statistic (rho value). The rho value represents the strength of the correlation, with -1 being the strongest negative correlation. The rho values for all RNAs differed from those for RNAs with a specific number of exons, supporting the idea that splicing influences PA-X activity. In addition, the sequences with the most negative correlations resembled the “GCTG” sequence observed earlier, which supports the idea that this sequence may be involved in PA-X activity (Finding 1).
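The frequency-versus-degradation analysis described above could be sketched roughly as follows. This is a minimal illustration, not the lab's actual code: the function names, the choice to normalize counts by sequence length, the hand-written Spearman calculation (Pearson correlation of average ranks), and the toy sequences and ratio values are all assumptions made for the example.

```python
from statistics import mean

def kmer_frequency(seq, kmer):
    """Overlapping occurrences of kmer in seq, divided by sequence length.
    Normalizing by length is an assumption of this sketch."""
    k = len(kmer)
    if len(seq) < k:
        return 0.0
    count = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)
    return count / len(seq)

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # rho is undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data: (sequence, PA-X/control ratio value).
toy_rnas = [("GCTGAAGCTGTT", 0.30), ("AAGCTGAATTCC", 0.55), ("AATTCCGGAATT", 0.90)]
freqs = [kmer_frequency(seq, "GCTG") for seq, _ in toy_rnas]
ratios = [ratio for _, ratio in toy_rnas]
rho = spearman_rho(freqs, ratios)
print(rho)
```

A strongly negative rho here would mean that RNAs containing the k-mer more often tend to have lower ratio values, that is, more degradation. In practice the same calculation would be repeated for every k-mer of interest.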
Next, I set out to confirm that my results on PA-X activity were influenced by splicing. Since the dataset of all RNAs was much larger than the dataset of RNAs with a specific number of exons (8480 vs. 632), I needed to compare a random subset of RNAs of a similar size to control for size effects and see if this dataset yielded similar results. This analysis showed that when we set the number of exons, the rho values are similar, indicating that the rho values obtained are significant and not random (Finding 2).
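One way to set up a size-matched random control like the one described above is to draw repeated random subsets from the full dataset and recompute the statistic on each. The sketch below is only an assumption about how this could be done: `subsample_statistic`, its parameters, and the toy records are hypothetical, and a simple mean stands in for the rho calculation so the example stays self-contained.

```python
import random

def subsample_statistic(records, subset_size, statistic, n_draws=100, seed=0):
    """Apply `statistic` to n_draws random subsets (drawn without replacement
    within each subset) of `records`, each of size subset_size."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [statistic(rng.sample(records, subset_size)) for _ in range(n_draws)]

def mean_ratio(subset):
    """Placeholder statistic: mean ratio value of a subset of records."""
    return sum(ratio for _, ratio in subset) / len(subset)

# Hypothetical records: (k-mer frequency, ratio value) per RNA.
# In the study the sizes were 8480 (all RNAs) and 632 (fixed exon count).
toy_records = [(i / 100, 1 - i / 100) for i in range(100)]
null_values = subsample_statistic(toy_records, 10, mean_ratio, n_draws=20, seed=1)
print(min(null_values), max(null_values))
```

Passing a function that computes rho from each subset would give a spread of rho values for random subsets, which can then be compared against the rho obtained for the exon-matched set.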
The last part of my research focused on the most negative correlations from my frequency versus degradation graphs. Taking the “k-mer” sequences that yielded the most negative correlations, Prof. Gaglia, Lea, and I proposed the 10-base recognition sequence CTGCTGGGCA as a potential cut site sequence for PA-X. To see if PA-X acts on this sequence, I searched all the RNAs with 6 exons and found RNAs with perfect or near-perfect matches (0, 1, or 2 bases different) to this sequence. Using the ratio values, I plotted the cumulative probabilities of these RNAs to see if there was a difference in ratio values between RNAs that have perfect or near-perfect matches and RNAs with no matches. Based on the graph, it appears that RNAs containing perfect or near-perfect matches to the proposed recognition sequence are more degraded, supporting the idea that PA-X might act on this sequence (Finding 3).
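The motif search and cumulative probability comparison described above might look something like the sketch below. It assumes that "bases different" means substitutions only (Hamming distance, no insertions or deletions), and the function names and toy sequences and ratio values are hypothetical; only the motif CTGCTGGGCA and the cutoff of 2 mismatches come from the text.

```python
MOTIF = "CTGCTGGGCA"  # proposed 10-base recognition sequence from the text

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def best_mismatch(seq, motif=MOTIF):
    """Fewest mismatches between motif and any same-length window of seq.
    Returns None when seq is shorter than the motif."""
    k = len(motif)
    if len(seq) < k:
        return None
    return min(hamming(seq[i:i + k], motif) for i in range(len(seq) - k + 1))

def empirical_cdf(values):
    """Sorted values paired with their cumulative probabilities."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Hypothetical toy data: (sequence, ratio value) per RNA.
toy_rnas = [
    ("AAACTGCTGGGCATT", 0.25),  # perfect match
    ("AAACTGCTGGGCTTT", 0.40),  # one base different
    ("AAAAAAAAAAAAAAA", 0.85),  # no close match
]
matched, unmatched = [], []
for seq, ratio in toy_rnas:
    d = best_mismatch(seq)
    (matched if d is not None and d <= 2 else unmatched).append(ratio)

print(empirical_cdf(matched))
print(empirical_cdf(unmatched))
```

If the cumulative curve for the matched group sits to the left of the curve for the unmatched group, the matched RNAs tend to have lower ratio values, which is the pattern expected if PA-X acts on the motif.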
Overall, my research is far from complete. The next step is to further analyze the cumulative probability graphs by plotting a random subset of RNAs to see if the difference in ratio values is significant, and then to refine the proposed recognition sequence according to the results.