Evidence on the role of long non‐coding (lnc) RNAs has been accumulating for decades, but only recently have advances in sequencing technologies allowed the field to fully appreciate their abundance and diversity. Despite this, only a handful of lncRNAs have been studied phenotypically or mechanistically. Moreover, novel lncRNAs and new classes of RNAs are being discovered at a growing pace, suggesting that this class of molecules may have functions as diverse as those of protein‐coding genes. Interestingly, the brain is the organ where lncRNAs have the most distinctive features, including the highest number of expressed lncRNAs, the highest proportion of tissue‐specific lncRNAs and the strongest signals of evolutionary conservation. In this work, we critically review the current knowledge on the steps that led to the identification of the non‐coding transcriptome, including the general features of lncRNAs in different contexts in terms of their genomic organisation, evolutionary origin, expression patterns and function in the developing and adult mammalian brain.
Proteins have long been regarded as the only functional products of the genetic information. Yet, many genes that perform their function at the RNA level have gradually been described over more than half a century (Eddy, 2001; Morris & Mattick, 2014). These non‐coding RNAs carry out biological functions as diverse as those of proteins, both as housekeeping and as regulatory molecules (Pauli et al, 2011), challenging the view of proteins as the sole executors of all genetic programmes.
Housekeeping non‐coding RNAs are involved in basic cellular functions and include rRNAs, tRNAs, snoRNAs and snRNAs (see Table 1 for abbreviations and functions). Regulatory non‐coding RNAs are usually classified into small and long non‐coding RNAs based on a length threshold of 200 nucleotides. Small non‐coding RNAs, ranging from 18 to 32 nucleotides in size, include, among others, miRNAs, esiRNAs and piRNAs (Table 1), which are mainly involved in transcriptional and post‐transcriptional regulation of gene expression through RNA interference (Pauli et al, 2011). Long non‐coding (lnc) RNAs are, in turn, defined as RNAs longer than 200 nucleotides (a threshold based on technical limitations of RNAseq library preparation) that lack coding potential as assessed by a number of bioinformatic tools (Rinn & Chang, 2012; Ilott & Ponting, 2013; Ulitsky & Bartel, 2013). Although a few lncRNAs, including Xist, H19 and Air, have been known and studied for decades (Rinn & Chang, 2012; Morris & Mattick, 2014), lncRNAs as a whole are among the most recently described classes of non‐coding RNAs and, to date, the least understood.
In the early 2000s, several studies aiming to characterise the full coding capacity of the mammalian genome initially reported that there might be as many non‐coding as coding genes (Rinn & Chang, 2012; Morris & Mattick, 2014). These studies included the FANTOM project, based on full‐length cDNA cloning and Sanger sequencing (Okazaki et al, 2002; Carninci et al, 2005), and other transcriptome studies based on tiling microarrays (Kapranov et al, 2002; Rinn et al, 2003). Other approaches, such as chromatin state maps and RNA polymerase II occupancy, were also used to identify transcribed genomic regions (Guttman et al, 2009; De Santa et al, 2010). However, it was only with the advent of massively parallel sequencing and the development of RNAseq that the lncRNA component of several transcriptomes was thoroughly assessed and thousands of novel lncRNAs were identified (Rinn & Chang, 2012; Ilott & Ponting, 2013; Ulitsky & Bartel, 2013). To date, lncRNAs have been studied in several organisms, tissues and cell types, both during development and in adulthood, revealing thousands of lncRNAs with exquisite cell type, tissue and developmental stage specificity, from which several general characteristics of lncRNAs have emerged (Guttman & Rinn, 2012; Ilott & Ponting, 2013). Intriguingly, the organ where lncRNAs show the most distinctive characteristics is the brain.
The brain is the organ in which the largest number of lncRNAs is expressed, and it harbours the highest proportion of tissue‐specific lncRNAs (Derrien et al, 2012; Kaushik et al, 2013; Francescatto et al, 2014; Washietl et al, 2014). Moreover, brain‐specific lncRNAs show the strongest signals of evolutionary conservation relative to those expressed in other tissues (Ponjavic et al, 2009; He et al, 2014). Given these characteristics, lncRNAs have been proposed to play important roles in the genetic programmes regulating brain development and function (van Leeuwen & Mikkers, 2010; Qureshi et al, 2010).
In this review, we first describe the general features of lncRNAs that are most likely to be informative about their function, and then focus on the specific roles of a number of lncRNAs whose molecular function has been described during development and adulthood of the central nervous system (CNS). In doing so, we aim to highlight the key findings that led to the emergence of a new field in the molecular cell biology of the mammalian brain, a field that is expected to expand significantly in the near future.
General characteristics of lncRNAs
At the molecular level, lncRNAs are in general similar to mRNAs. As they are transcribed by RNA polymerase II (Pol II), most lncRNAs are polyadenylated, capped and frequently spliced (Ulitsky & Bartel, 2013). Only a small fraction of lncRNAs is not polyadenylated (Ilott & Ponting, 2013), including circular RNAs (circRNAs) (Salzman et al, 2012), lncRNAs flanked by snoRNAs (Yin et al, 2012) and lncRNAs with a triple helical structure at their 3′ end (Wilusz et al, 2012). Other general characteristics of vertebrate lncRNAs include fewer exons (2–3 on average) and shorter transcripts than protein‐coding genes (Ulitsky & Bartel, 2013). Their chromatin modification patterns, transcriptional regulation and splicing signals do not seem to differ from those of coding genes, although splicing seems to occur less efficiently (Ulitsky & Bartel, 2013).
Yet, some important differences exist between lncRNAs and mRNAs, including lower sequence conservation (Guttman et al, 2009; Ørom et al, 2010; Derrien et al, 2012) and tenfold lower median expression levels (Ulitsky & Bartel, 2013). These differences have been used to argue against a functional role of lncRNAs, with the proposal that they result from nonspecific Pol II activity producing lowly expressed, unstable transcripts that lack signs of sequence conservation (Wang et al, 2004; Struhl, 2007; Graur et al, 2013). However, not only do lncRNAs show clear signs of evolutionary conservation (Ponjavic et al, 2007; Chodroff et al, 2010) (further discussed below), but their low expression levels also neither necessarily reflect low stability nor argue for a lack of function. First, lncRNAs do not seem to be particularly unstable as a group, as their transcript half‐lives span a wide range similar to that of mRNAs (Clark et al, 2012; Tani et al, 2012). Second, low expression levels do not necessarily reflect uniformly low expression in all cells; they could instead be a consequence of cell specificity, with higher expression in a restricted cell population that is averaged down when groups of cells, whole tissues or organs are analysed (Djebali et al, 2012; Ilott & Ponting, 2013). Finally, even lncRNAs with low expression levels could still be functional, since some regulatory mechanisms do not require a high concentration of effector molecules, for example when lncRNAs act at their site of transcription or when transcription itself is the regulatory mechanism, as we discuss below.
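The averaging argument can be illustrated with a simple mixture calculation. As a sketch, assume (hypothetically) that a fraction f of the cells in a tissue express a lncRNA at x_on copies per cell, while the remaining cells express x_off copies; a bulk measurement then reports the cell‐weighted mean. The numbers below are illustrative, not measured values:

```latex
\bar{x}_{\mathrm{bulk}} = f\,x_{\mathrm{on}} + (1 - f)\,x_{\mathrm{off}},
\qquad
f = 0.05,\; x_{\mathrm{on}} = 20,\; x_{\mathrm{off}} = 0
\;\Longrightarrow\;
\bar{x}_{\mathrm{bulk}} = 0.05 \times 20 = 1 .
```

Under these assumptions, a transcript present at 20 copies per cell in the cells that express it would appear at only one copy per cell in whole‐tissue data, that is, twentyfold less abundant than in its cells of origin.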
From genomic organisation and expression to function
Several characteristics of lncRNA loci and transcripts can be indicative of their function. These include their specific expression patterns, their genomic organisation in close proximity to protein‐coding genes (developmental regulators in particular) and their overlap with enhancers and transposable elements.
Expression patterns of lncRNAs
Studies in a variety of cell lines and organisms, including maize, fly, zebrafish, mouse and human, have found that lncRNAs show cell‐, tissue‐ and/or developmental stage‐specific expression patterns, to an even higher degree than protein‐coding genes (Ravasi et al, 2006; Dinger et al, 2008a; Guttman et al, 2010; Cabili et al, 2011; Ulitsky et al, 2011; Derrien et al, 2012; Djebali et al, 2012; Pauli et al, 2012; Wamstad et al, 2012; Young et al, 2012; Li et al, 2014). These specific expression patterns are reminiscent of genes with regulatory functions and have been considered one indication that lncRNAs have roles in development and cell identity (Mercer et al, 2008; Mattick & Dinger, 2013). However, expression patterns alone are not sufficient to establish function, since lncRNAs may theoretically result from nonspecific transcription from cryptic promoters or from intergenic sequences that happen to have high affinity for the transcription machinery (Khaitovich et al, 2006; Ravasi et al, 2006; Struhl, 2007; Ulitsky & Bartel, 2013). In this scenario, the specific expression patterns of lncRNAs would simply reflect cell type‐, tissue‐ or stage‐dependent changes in the chromatin accessibility of the corresponding loci and/or transcriptional regulatory activity in the vicinity of these loci (Khaitovich et al, 2006; Ravasi et al, 2006; Struhl, 2007).
Yet, it remains difficult to explain how such “transcriptional noise” would produce expression patterns even more specific than those of protein‐coding genes (Guttman et al, 2010; Cabili et al, 2011; Derrien et al, 2012; Djebali et al, 2012; Pauli et al, 2012). Equally unexpected for non‐functional transcripts are their tissue‐specific splicing patterns (Ravasi et al, 2006; Aprea et al, 2015) and the conservation of their promoters and transcription factor‐binding sites at a level comparable to that of protein‐coding genes (Carninci et al, 2005; Ponjavic et al, 2007; Guttman et al, 2009; Necsulea et al, 2014) (further discussed below). This suggests that the specific expression patterns of lncRNAs result from a highly regulated transcriptional programme rather than from “noise” and, thus, that at least a portion of lncRNAs are likely involved in biological functions, including the regulation of key developmental programmes and cell identity (Mattick, 2011; Necsulea et al, 2014).
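How specific an expression pattern is can be summarised with a single number. One widely used option, shown here as an illustration and not necessarily the measure used in the studies cited above, is the tissue‐specificity index τ: each tissue's expression is divided by the gene's maximum, and τ = Σ(1 − x_i / max x) / (n − 1) over n tissues, so that τ = 0 for uniform expression and τ = 1 for expression restricted to one tissue. A minimal Python sketch, assuming a hypothetical list of per‐tissue expression values for a single gene:

```python
from typing import Sequence


def tau_specificity(expression: Sequence[float]) -> float:
    """Tissue-specificity index tau for one gene.

    expression: non-negative expression values, one per tissue
    (hypothetical input; values are often log-transformed first).
    Returns 0.0 for uniform expression and 1.0 for expression
    restricted to a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Normalise each tissue to the maximum, then average the shortfalls.
    return sum(1.0 - x / peak for x in expression) / (n - 1)


# Hypothetical profiles across five tissues.
broad = [10.0, 9.0, 11.0, 10.0, 10.0]
restricted = [12.0, 0.0, 0.0, 0.0, 0.0]
print(round(tau_specificity(broad), 3))  # 0.114
print(tau_specificity(restricted))       # 1.0
```

In practice, expression values are often log‐transformed and lowly expressed genes filtered out before computing τ; both choices are left to the caller in this sketch.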
Genomic proximity to developmental regulators
Another interesting feature of lncRNAs supporting their role in development, and perhaps related to their specific expression patterns, is their preferential genomic localisation in proximity to, or overlapping, developmental regulators and transcription factors (Fig 1) (Dinger et al, 2008a; Mercer et al, 2008; Guttman et al, 2009; Ponjavic et al, 2009; Cabili et al, 2011; Ulitsky et al, 2011; Pauli et al, 2012; Wamstad et al, 2012; Young et al, 2012; Lepoivre et al, 2013). According to several studies, the expression patterns of most lncRNAs correlate positively with those of their neighbouring or overlapping coding genes (Dinger et al, 2008a; Ponjavic et al, 2009; Guttman et al, 2011; Derrien et al, 2012; Lepoivre et al, 2013; Sigova et al, 2013; Aprea et al, 2015). Consequently, it has been suggested that lncRNAs may regulate gene expression in cis, that is, control nearby genes on the same chromosome in an allele‐specific manner (Ponjavic et al, 2009; Wamstad et al, 2012). Some examples of cis‐acting lncRNAs have been described, such as HOTTIP, a lncRNA transcribed from the 5′ end of the HOXA locus (Wang et al, 2011). HOTTIP binds to the MLL/Trithorax complex and targets it in cis to HOXA genes spread across 40 Kb, which are brought into proximity through chromosomal looping. Recruitment of the MLL/Trithorax complex leads to histone H3K4 trimethylation and activation of gene expression (Wang et al, 2011) (additional examples of cis‐acting lncRNAs are described below).
The tight correlation between the expression patterns of coding and non‐coding transcripts should not, however, be taken as evidence for a general role of non‐coding RNAs in cis‐regulation. In fact, besides cis‐regulatory activity, several other mechanisms can explain the similar expression patterns of neighbouring genes. These include: (i) common regulatory sequences controlling the expression of both the coding and the non‐coding gene; (ii) chromatin modifications spreading along the chromosome and affecting gene transcription within a chromosomal domain; and/or (iii) relocalisation of the genomic region to a transcriptionally active nuclear compartment, such as transcription factories (Hurst et al, 2004; Katayama et al, 2005; Arnone et al, 2012; Bickmore, 2013). Moreover, several studies have found that the correlation between neighbouring coding and non‐coding gene pairs is comparable to that found between coding–coding gene pairs (Cabili et al, 2011; Ulitsky et al, 2011). This is not surprising, as mammalian genomes contain an excess of gene pairs located within 1 Kb of each other, regardless of their coding capacity. These include bidirectional and overlapping antisense transcripts, which constitute at least 10% (Adachi & Lieber, 2002; Takai & Jones, 2004; Trinklein et al, 2004; Engström et al, 2006; Li et al, 2006) and 25% (Katayama et al, 2005; Engström et al, 2006), respectively, of the genes in the human and mouse genomes. These gene pairs, whose proximity and orientation are conserved in vertebrates when both orthologs exist (Trinklein et al, 2004; Chen et al, 2005; Engström et al, 2006; Li et al, 2006), tend to be coexpressed, most often with positive and, much less frequently, with negative correlation (Trinklein et al, 2004; Chen et al, 2005; Katayama et al, 2005; Engström et al, 2006; Li et al, 2006; Arnone et al, 2012).
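The kind of analysis behind these coexpression observations can be sketched in a few lines: find gene pairs within a distance cutoff and correlate their expression profiles across tissues. In the sketch below, the Locus record, the loci and the four‐tissue expression values are hypothetical, the 1 Kb cutoff follows the text, and the Pearson coefficient is one choice among several (rank correlations are also common). As discussed above, a high correlation from such a scan does not by itself distinguish cis‐regulation from shared regulatory elements or chromatin domains.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence, Tuple


@dataclass
class Locus:
    name: str
    chrom: str
    start: int   # 0-based start coordinate
    end: int     # end coordinate (exclusive)
    strand: str  # "+" or "-"


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def gap(a: Locus, b: Locus) -> int:
    """Distance in bp between two loci on one chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def neighbour_pairs(
    loci: List[Locus],
    expr: Dict[str, Sequence[float]],
    max_gap: int = 1000,
) -> List[Tuple[str, str, str, float]]:
    """Return (name_a, name_b, strand relation, correlation) for close pairs.

    Compares all pairs for clarity; a genome-wide scan would sort loci
    by position and only compare nearby entries.
    """
    pairs = []
    for i, a in enumerate(loci):
        for b in loci[i + 1:]:
            if a.chrom != b.chrom or gap(a, b) > max_gap:
                continue
            relation = "same strand" if a.strand == b.strand else "opposite strands"
            pairs.append((a.name, b.name, relation, pearson(expr[a.name], expr[b.name])))
    return pairs


# Hypothetical loci and expression values across four tissues.
loci = [
    Locus("TF1", "chr1", 5_000, 15_000, "+"),
    Locus("lnc-TF1", "chr1", 2_000, 4_500, "-"),  # 500 bp away, head-to-head
    Locus("GENE2", "chr1", 50_000, 60_000, "+"),  # too far to be paired
]
expr = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "lnc-TF1": [2.0, 4.0, 6.0, 8.0],
    "GENE2": [4.0, 1.0, 3.0, 2.0],
}
for name_a, name_b, relation, r in neighbour_pairs(loci, expr):
    print(name_a, name_b, relation, round(r, 3))  # TF1 lnc-TF1 opposite strands 1.0
```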
In addition, a common and evolutionarily conserved feature of eukaryotic genomes is the presence of chromosomal domains of genes with similar or coordinated expression patterns (Spellman & Rubin, 2002; Fukuoka et al, 2004; Hurst et al, 2004; Sémon & Duret, 2006; Woo et al, 2010). These domains range from a few Kb in yeast to 100 Kb in Drosophila and up to several Mb in mammals (Spellman & Rubin, 2002; Hurst et al, 2004). Domains even longer than 10 Mb have been found, which can be explained by the three‐dimensional structure of chromosomes in the nucleus (Woo et al, 2010).
Altogether, the proximity and coexpression of lncRNAs with protein‐coding genes probably reflect an evolutionarily conserved genomic organisation, with an abundance of bidirectional promoters resulting in an excess of head‐to‐head gene pairs, and with gene domains showing coordinated expression. As several mechanisms can explain these coexpression patterns, coexpression alone cannot be considered evidence for gene regulation in cis. Yet, this information is still relevant for understanding lncRNA function.
Eukaryotic genomes are often organised in functional domains and/or gene pairs in which genes involved in the same biological pathway are clustered (Lee & Sonnhammer, 2003; Fukuoka et al, 2004; Li et al, 2006; Al‐Shahrour et al, 2010; Arnone et al, 2012). Interestingly, genes involved in the same biological pathway show a higher degree of expression correlation when they lie in the same genomic domain than when they are further apart (Al‐Shahrour et al, 2010). Thus, the expression correlation of gene pairs supports the involvement of lncRNAs in the same biological pathways as their neighbouring protein‐coding genes, independently of any cis‐regulatory mechanism. One such example is HOTAIR, another lncRNA that in humans regulates the expression of HOX genes, transcription factors involved in establishing the embryonic body plan and in cell specification. HOTAIR is transcribed from the HOXC locus, antisense to the HOXC genes, yet it represses the HOXD locus on another chromosome. HOTAIR recruits the polycomb repressive complex 2 (PRC2) through direct interaction with its SUZ12 subunit, leading to histone H3K27 trimethylation and repression of the HOXD locus (Rinn et al, 2007). Thus, although it is transcribed from the HOXC locus, this lncRNA does not regulate HOXC genes in cis but participates in the same biological process as the HOXC genes, patterning of the embryonic body plan, by controlling HOXD expression.
Overlap with enhancers
Another feature of lncRNA loci is their frequent overlap with enhancers and transposable elements. Active enhancers have been shown to be transcribed bidirectionally, producing short, unspliced, non‐polyadenylated and unstable (exosome‐sensitive) eRNAs (Table 1), whose transcription precedes the activation of the genes controlled by the enhancer (Kim et al, 2010; Koch et al, 2011; Andersson et al, 2014; Arner et al, 2015). In addition, some enhancers are transcribed in one direction into longer, spliced, polyadenylated transcripts with low coding capacity, that is, lncRNAs (Koch et al, 2011; Kowalczyk et al, 2012). Similarly, more than half of the lncRNAs expressed in blood cells were found to originate from transcription sites overlapping enhancers, and their expression correlates with that of the neighbouring protein‐coding gene even more strongly than that of lncRNAs not overlapping enhancers (De Santa et al, 2010; Marques et al, 2013). As Pol II can be recruited to the enhancer before being transferred to the promoter, the promoter–enhancer juxtaposition that stimulates transcription from the promoter could also lead to transcription from the enhancer as a by‐product (Koch et al, 2011; Kowalczyk et al, 2012). Alternatively, the enhancer could regulate both the expression of the overlapping lncRNA and that of the proximal protein‐coding gene, without either transcript directly affecting the expression of its neighbour.
Nonetheless, it is also possible that lncRNAs overlapping enhancers, or the act of transcription per se, are important for enhancer function. The passage of Pol II during enhancer transcription could remodel chromatin, changing the accessibility of the enhancer to transcription factors (De Santa et al, 2010; Koch et al, 2011). For example, transcription is necessary for histone acetylation at enhancers upstream of Ccl5 (De Santa et al, 2010). In other cases, the enhancer could instead act through, or together with, the lncRNA. For example, the lncRNA Evf2 (or Dlx6os1), transcribed from the intergenic region between Dlx5 and Dlx6, overlaps one of the enhancers in this region and regulates the binding of the transcription factor DLX2 to this enhancer (Feng et al, 2006). Moreover, lncRNAs can themselves act as enhancers (Ørom et al, 2010), for example by mediating long‐range transcriptional activation in cis through interaction with the Mediator complex (Lai et al, 2013) or by targeting the WDR5/MLL histone methyltransferase complex in cis, leading to activating chromatin modifications (Wang et al, 2011). Thus, the expression of a lncRNA that overlaps an active enhancer could provide information about its possible function as an eRNA or an enhancer coregulator.
Overlap with transposable elements
For lncRNAs that overlap, contain or derive from transposable elements (TEs), the difference from protein‐coding genes is striking. Whereas 5% of coding gene loci, and only 0.3% of coding sequences, are derived from TEs, the majority of human and mouse lncRNAs overlap at least one TE, and more than 30% of their sequences are derived from TEs (Kelley & Rinn, 2012; Kapusta et al, 2013). This percentage could be even higher, as standard RNA extraction procedures have been shown to reduce the recovered fraction of TE‐derived sequences, a large fraction of the non‐coding RNAs associated with euchromatin being composed of TEs (Hall et al, 2014). The higher proportion of TE sequences in lncRNAs probably reflects their different sequence constraints: without the need to conserve codons or a reading frame, lncRNAs can accept TE insertions more readily (Kapusta et al, 2013; Kapusta & Feschotte, 2014). Still, TE‐derived sequences are less frequent in lncRNAs than in the genome as a whole (ca. 50% in human and mouse), possibly indicating that such insertions could disrupt the structure and function of some lncRNAs (Kapusta et al, 2013). Moreover, the TE composition of lncRNAs differs from the genomic background: in human and mouse, lncRNAs are enriched in long terminal repeats of endogenous retroviruses and depleted of both long and short interspersed elements (Kelley & Rinn, 2012; Kapusta et al, 2013).
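The two measures quoted above differ: whether a lncRNA locus overlaps any TE, and what fraction of its exonic sequence is TE‐derived. A minimal sketch of the second computation, assuming that exon and TE annotations (for example, from a RepeatMasker‐style repeat track) are already available as half‐open [start, end) intervals on the same chromosome; all coordinates are hypothetical:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into sorted, disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(targets: List[Interval], features: List[Interval]) -> int:
    """Count bases in `targets` that fall inside at least one feature."""
    feats = merge(features)  # merging avoids double-counting nested TEs
    total = 0
    for t_start, t_end in merge(targets):
        for f_start, f_end in feats:
            total += max(0, min(t_end, f_end) - max(t_start, f_start))
    return total


def te_fraction(exons: List[Interval], tes: List[Interval]) -> float:
    """Fraction of exonic bases that are derived from TEs."""
    exonic = sum(end - start for start, end in merge(exons))
    return covered_bases(exons, tes) / exonic


# Hypothetical two-exon lncRNA and three TE annotations.
exons = [(0, 300), (1_000, 1_200)]            # 500 exonic bases
tes = [(250, 400), (1_100, 1_150), (1_120, 1_180)]
print(covered_bases(exons, tes) > 0)          # overlaps at least one TE: True
print(te_fraction(exons, tes))                # (50 + 80) / 500 = 0.26
```

The nested loop is kept for readability; for genome‐scale annotations, a sweep over sorted intervals or an interval tree would be the usual choice.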
Sequences of lncRNAs derived from TEs may play important roles by providing functional domains for protein interaction or base pairing (RNA–RNA or RNA–DNA). In particular, protein interaction domains in lncRNAs can be a direct consequence of TE insertions, because these domains are already present in TEs, where they mediate the assembly of the ribonucleoprotein complexes necessary for the TE life cycle (Johnson & Guigó, 2014). These insertions can thus provide domains for interactions with proteins encoded within the TE or elsewhere in the genome, including transcription factors and chromatin modifiers (Johnson & Guigó, 2014). One example of a TE‐derived domain involved in RNA–protein interaction is found in Xist, a lncRNA essential for dosage compensation that acts in cis on the X chromosome. The 5′ region of this transcript contains several tandem repeats that are likely derived from TEs (Elisaphenko et al, 2008). In particular, repeat A is derived from ERVB5, an endogenous retrovirus (Elisaphenko et al, 2008), and forms two hairpins that mediate the targeting of PRC2 to the inactive X chromosome, leading to histone H3K27 trimethylation and repression of gene expression (Zhao et al, 2008).
Additionally, TEs can provide DNA or RNA interaction domains to lncRNAs. As TEs exist as multiple copies in the genome, and some of these copies form part of other transcripts in complementary orientation, each TE domain is likely capable of interacting with DNA or RNA sequences derived from the same TE family (Johnson & Guigó, 2014). A lncRNA carrying such a TE could thus regulate a whole family of transcripts or genomic regions. ANRIL is one such example. This lncRNA, encoded in a locus associated with coronary disease, acts in part by interacting with PRC1 and PRC2 and binding in trans to the promoters of its target genes, a targeting mediated by Alu elements (primate‐specific short interspersed nuclear elements) present both in the ANRIL transcript and in the promoters of ANRIL‐regulated genes (Holdt et al, 2013). Another example involving Alu elements, in this case through RNA–RNA interaction, is Staufen 1 (STAU1)‐mediated mRNA decay. LncRNAs containing Alu elements can base‐pair with an Alu element in the 3′ UTR of a group of mRNAs, targeting them for degradation: the resulting double‐stranded RNA recruits STAU1 and triggers STAU1‐mediated decay (Gong & Maquat, 2011). Furthermore, as proposed by the repeat insertion domains of lncRNAs hypothesis, different combinations of TE‐derived functional domains would give rise to different lncRNA functions (Johnson & Guigó, 2014).
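The complementarity argument can be sketched computationally: when two RNAs carry copies of the same repeat in opposite orientations, one contains a long stretch matching the reverse complement of the other, which can form a double‐stranded region. The sketch below finds the longest such stretch by dynamic programming (longest common substring against the reverse complement). It is illustrative only: it considers perfect Watson–Crick pairing, ignores G·U pairs, mismatches and folding energetics, and the sequences are hypothetical rather than real Alu sequences.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest string occurring in both x and y (O(len(x) * len(y)))."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the match that ended one position earlier in both strings.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def longest_duplex(rna_a: str, rna_b: str) -> int:
    """Longest stretch of rna_a that can pair antiparallel (Watson-Crick only) with rna_b."""
    return longest_common_substring(rna_a, reverse_complement(rna_b))


# Hypothetical repeat-derived segment carried by a lncRNA in sense orientation
# and by an mRNA 3' UTR in antisense orientation.
repeat = "GGCUCACGCCUG"
lnc = "AAAA" + repeat + "CCCC"
utr = "UUU" + reverse_complement(repeat) + "GGG"
print(longest_duplex(lnc, utr))  # 12: the full repeat copy can pair
```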
In the future, new bioinformatic tools could help identify TE‐derived functional domains in lncRNAs. Since lncRNAs have already been shown to cluster into families defined by the type of TE insertion (Derrien et al, 2012), such tools might also allow lncRNAs to be classified into families with related functional domains and functions (Johnson & Guigó, 2014).
Evolutionary origin and conservation of lncRNAs
Although failure to detect evolutionary conservation in lncRNAs does not necessarily imply a lack of function, detecting conservation usually indicates that a sequence is under evolutionary constraint, because mutations that impair function are eliminated by purifying selection (Cooper & Brown, 2008). Since evolutionary conservation can predict function, major efforts have been invested in assessing the evolutionary origin and conservation of lncRNAs.
Evolutionary origin of lncRNAs
One hypothesis proposed for the evolutionary origin of lncRNAs involves the acquisition of a promoter, either inserted by a TE or derived from a pre‐existing promoter or enhancer (Kapusta & Feschotte, 2014). This hypothesis could explain some of the features of lncRNAs mentioned above, including their expression patterns, their proximity to coding genes and their overlap with enhancers and TEs.
The positions of TEs in lncRNAs are biased towards the transcription start site, suggesting a role for TEs in the origin of these transcripts (Kelley & Rinn, 2012). Promoters and enhancers have an inherent bidirectional transcriptional activity, generating short, usually rapidly degraded, transcripts in both directions (even if transcriptional elongation occurs in only one orientation) (Kim et al, 2010; Koch et al, 2011; Wei et al, 2011). During evolution, some of these transcripts, or those driven by TE‐derived promoters, could have evolved into stable transcripts by acquiring polyadenylation and splicing signals, likely also derived from TE insertions. In fact, TE insertions can not only provide these signals but could also be adapted into domains involved in protein or nucleic acid recognition, as explained above (Kapusta et al, 2013; Johnson & Guigó, 2014), thereby providing functional domains to non‐coding RNAs evolving into functional transcripts. In essence, bidirectional promoters, enhancers and TEs may have served as a fertile playground for the evolution and natural selection of lncRNAs.
According to this hypothesis, the expression patterns of lncRNAs would be related to those of the regulatory elements that gave rise to them, as is the case for lncRNAs with TEs at their transcription start sites, which are characteristically more cell type‐specific (Djebali et al, 2012). For example, the TE HERVH is present in the promoter region of more than 100 lncRNAs, where it provides transcriptional regulatory signals that lead to a stem cell‐specific expression pattern (Kelley & Rinn, 2012). Furthermore, lncRNAs originating from divergent promoters or pre‐existing enhancers would be expected to have expression patterns influenced by those of the coding genes originally regulated by these promoters and enhancers, although additional regulatory features could also have evolved to control their expression. This, together with the shared influence of the same genomic domain on both genes, could explain the expression correlation between lncRNAs and their proximal coding genes.
Although the features of lncRNAs can be explained by their origin and genomic context, evolution would inevitably lead to the accumulation of mutations that would, on average, reduce transcription. That is, sequence changes could inactivate transcription unless a transcript has acquired a function that keeps its transcription under some level of selection (Brosius, 2005). Given the number of lncRNAs and their cell‐ and tissue‐specific expression patterns, it is not surprising that evolution has used at least some of these transcripts for important biological functions (Kowalczyk et al, 2012), particularly within the same developmental programme or regulatory network as their proximal gene. However, it still remains to be clarified why lncRNAs appear to be preferentially located close to developmental regulators and transcription factors in the genome. This could be due to several reasons, such as: (i) gene deserts flanking transcription factor genes being enriched in regulatory elements, frequently corresponding to enhancers (Ulitsky & Bartel, 2013); (ii) promoters and enhancers involved in the regulation of these genes being more prone to generate stable transcripts; or perhaps (iii) transcripts originating from these promoters and enhancers being more likely to acquire a function because of either their higher abundance or their differential expression. As a result, these lncRNAs would be preferentially selected during evolution.
Evolutionary conservation of lncRNAs
The proportion of the genome and transcriptome that is functional has been a matter of debate. On the one hand, it is argued that the lack of sequence conservation precludes function (Wang et al, 2004; Graur et al, 2013). On the other hand, besides concerns about the sequences used as controls in these conservation analyses, it is argued that different functional elements evolve at different rates and that some of them, such as species‐specific elements, are difficult to detect (Hayashizaki, 2004; Mattick & Dinger, 2013). Thus, the evolutionary conservation of lncRNAs should be considered not only at the sequence level but also at the level of RNA structure and expression patterns (Kapusta & Feschotte, 2014).
Although considerably less conserved than the exonic sequences of protein‐coding genes, lncRNAs show clear signals of sequence conservation. Their sequences are distinctly more conserved than ancient repeats, with the strongest conservation signals found mainly in splice sites, exons and small domains surrounded by less constrained sequences (Ponjavic et al, 2007; Guttman et al, 2009, 2010; Chodroff et al, 2010; Ørom et al, 2010; Derrien et al, 2012). This lower overall conservation probably reflects the different evolutionary constraints acting on RNA structures compared with coding sequences, in which substitutions are largely restricted to synonymous or conservative non‐synonymous changes that preserve the protein sequence and the stability of the protein structure (Guttman et al, 2009; Ilott & Ponting, 2013).
Moreover, many lncRNAs are of recent evolutionary origin, including primate‐ and rodent‐specific lncRNAs (Derrien et al, 2012). In a study in which homologous lncRNA families were reconstructed based on sequence similarity, only 3% of the families were found to have originated more than 300 million years ago, whereas most lncRNA families were primate‐specific (Necsulea et al, 2014). These young, primate‐specific lncRNAs show weak exonic sequence constraints, even weaker than those of random intergenic regions, whereas ancestral lncRNAs, for example those expressed in all amniotes or tetrapods, show sequence constraints similar to or even stronger than those of coding sequences (Necsulea et al, 2014; Washietl et al, 2014). The presence of recent negative selection at the human population level is consistent with either a recent origin of primate‐specific lncRNAs or their acquisition of novel functions (Necsulea et al, 2014).
Unlike that of mRNAs, the function of lncRNAs likely depends on three‐dimensional domains, which can tolerate mutations as long as intramolecular folding or intermolecular interactions are maintained. Sequence conservation therefore probably lacks the sensitivity to detect the evolutionary constraints acting on lncRNAs (although exceptions can be found for domains involved in RNA–RNA or RNA–DNA base pairing) (Johnson & Guigó, 2014; Kapusta & Feschotte, 2014). Conservation studies at the level of RNA secondary structure are thus likely to be more informative than those based on sequence alone. Methods for detecting conserved RNA secondary structures are still noisy and prone to false positives (Kapusta & Feschotte, 2014). Nevertheless, human lncRNAs have been shown to be enriched in evolutionarily conserved RNA structures, suggesting that these motifs are relevant to lncRNA function (Smith et al, 2013; Kapusta & Feschotte, 2014). Interestingly, most of these structures do not overlap any sequence‐constrained element (Smith et al, 2013).
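Why sequence identity can underestimate structural conservation can be shown with a toy example. The sketch below, a simplified illustration rather than a structure‐conservation method, reads a secondary structure in dot‐bracket notation and checks whether every base pair remains a Watson–Crick or G·U pair in a mutated sequence. The hairpin and sequences are hypothetical, and folding energetics, which real tools take into account, are ignored.

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def structure_preserved(sequence: str, structure: str) -> bool:
    """True if every base pair in `structure` is canonical in `sequence`."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return all((sequence[i], sequence[j]) in CANONICAL
               for i, j in pairs_from_dot_bracket(structure))


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


hairpin = "(((....)))"
reference = "GGCAAAAGCC"
compensated = "CGCAAAAGCG"  # outermost G-C pair flipped to C-G (two changes)
single = "CGCAAAAGCC"       # only one side mutated: C-C mismatch
print(identity(reference, compensated), structure_preserved(compensated, hairpin))  # 0.8 True
print(identity(reference, single), structure_preserved(single, hairpin))            # 0.9 False
```

The compensatory double substitution lowers sequence identity more than the single substitution does, yet it is the only variant that keeps all of the hairpin's pairs intact; covariation of this kind is what structure‐based conservation analyses look for.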
A clear example of a functional lncRNA with poor sequence conservation is Xist, whose 5′ region is weakly conserved but contains tandem repeats proposed to form secondary structures essential for its function (Zhao et al, 2008). In addition, the sequences involved in localising Xist to the chromatin of the inactive X chromosome are scattered throughout the transcript and show no clear signal of sequence conservation (Wutz et al, 2002). Conversely, stretches with relatively high sequence conservation do not seem to be required for function, as their deletion did not lead to discernible phenotypes (Brockdorff, 2002).
Independently of sequence similarity, evolutionary conservation can be investigated at the level of transcription by assessing whether orthologous loci exist and whether they show similar transcriptional regulation and expression patterns across species. In this respect, too, lncRNAs have evolved rapidly (Khaitovich et al, 2006): the proportion of human lncRNAs with orthologous transcripts increases from one‐third among placental mammals to 63–72% in primates and 80–92% in hominids. This is much lower than the 90–98% observed for primate protein‐coding genes but still higher than expected by chance (Necsulea et al, 2014; Washietl et al, 2014). This rapid evolution seems to be a consequence of considerable gain and loss of lncRNAs during speciation (Washietl et al, 2014). On the other hand, selective constraints seem to be higher at the level of transcriptional regulation. Promoters of lncRNAs show conservation levels identical to those of protein‐coding genes, even for young lncRNAs (Carninci et al, 2005; Ponjavic et al, 2007; Guttman et al, 2009; Chodroff et al, 2010; Necsulea et al, 2014). Moreover, sequence conservation of transcription factor‐binding sites is even stronger than in protein‐coding gene promoters, in particular for ancient lncRNAs, which points to active regulation of lncRNA transcription (Necsulea et al, 2014). Consistently, the tissue‐specific expression patterns of lncRNAs, one of their most prominent features, seem to be remarkably conserved among primates (Necsulea et al, 2014) and across mammals (Washietl et al, 2014), with between‐species expression correlations largely similar to those of mRNAs (Necsulea et al, 2014; Washietl et al, 2014). Furthermore, lncRNA expression profiles cluster primarily by tissue rather than by species (Washietl et al, 2014).
Altogether, the function of a lncRNA might not depend directly on its sequence, and its evolutionary constraints might not be strong enough to be easily detectable. Thus, when studying the evolution of lncRNAs, it is important to also consider the conservation of their secondary structure and expression patterns. While the tools to address the former are rather limited, a growing number of sequencing studies now provide a solid platform to achieve the latter.
Coding and non‐coding RNAs: a thin line
LncRNAs are usually identified through a series of bioinformatic tools that filter out transcripts showing the protein‐coding features typical of mRNAs (Ulitsky & Bartel, 2013). These include the following: (i) selection by open reading frame (ORF) conservation, as protein‐coding sequences favour synonymous base changes and non‐random codon usage, whereas lncRNAs tend to accumulate more random substitutions (Ilott & Ponting, 2013); (ii) selection by sequence homology, as ORFs in mRNAs tend to share sequence similarity with other proteins or to contain known protein domains, whereas those in lncRNAs do not (Dinger et al, 2008b; Ulitsky & Bartel, 2013); and (iii) selection by ORF length, as ORFs in coding transcripts tend to be longer than expected by chance, with a cutoff of 100 amino acids being commonly used (Dinger et al, 2008b; Ulitsky & Bartel, 2013).
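To make criterion (iii) concrete, the following Python sketch computes the longest ATG‐initiated ORF of a transcript and applies a length cutoff. It is an illustration under stated assumptions, not the implementation of any cited tool: it scans only the three forward reading frames (appropriate for a transcript of known orientation), ignores ORFs that lack a stop codon, uses the 100‐codon cutoff mentioned above, and does not address criteria (i) or (ii). The function names are hypothetical.

```python
# Hypothetical sketch of ORF-length filtering (criterion iii); not a published tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(transcript: str) -> int:
    """Return the length, in codons, of the longest ATG-initiated ORF.

    Assumptions: only the three forward frames are scanned, the start codon
    is counted but the stop codon is not, and ORFs that run off the 3' end
    without a stop codon are ignored.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def passes_orf_length_filter(transcript: str, cutoff_codons: int = 100) -> bool:
    """True if the longest ORF is shorter than the cutoff, i.e. the transcript
    would be kept as a putative lncRNA under this single criterion."""
    return longest_orf_codons(transcript) < cutoff_codons


if __name__ == "__main__":
    short_orf = "AUG" + "AAA" * 5 + "UAG"   # 6 codons before the stop
    long_orf = "ATG" + "GCT" * 150 + "TGA"  # 151 codons before the stop
    print(longest_orf_codons(short_orf), passes_orf_length_filter(short_orf))
    print(longest_orf_codons(long_orf), passes_orf_length_filter(long_orf))
```

As the ribosome profiling results discussed next show, a transcript passing such a filter can still encode a functional short peptide, so a length cutoff alone is only a first pass.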
Although these bioinformatic criteria are valid for an overall identification of lncRNAs, ribosome profiling and mass spectrometry studies have shown that lncRNA catalogues included up to 10% of transcripts coding for peptides from newly evolved short ORFs, and that a few conserved lncRNAs had been misannotated as coding transcripts (Bánfai et al, 2012; Chew et al, 2013; Bazzini et al, 2014). Therefore, when studying a putative lncRNA, one should still investigate whether it may code for a protein or short peptide(s) and, even more importantly, whether a putative biological function depends on the transcript, the peptide or both. In fact, some transcripts initially defined as lncRNAs were later found to code for functional short peptides (Andrews & Rothnagel, 2014). For example, the fly gene tal, which is essential for embryonic development and epithelial architecture, was originally classified as a lncRNA but was later found to be a polycistronic RNA encoding four evolutionarily conserved short peptides (Kondo et al, 2007). These peptides control epidermal differentiation by inducing the cleavage of the transcription factor Svb, which converts it from a repressor into an activator (Kondo et al, 2010). Conversely, two‐thirds of human genes that until recently were predicted to encode hypothetical proteins were later found to lack coding potential (Jia et al, 2010).
In any case, one should consider the possibility that a gene functions both at the RNA and at the protein level. For example, the steroid receptor RNA activator Sra was originally identified as an RNA gene that co‐activates the MYOD transcription factor and several nuclear receptors. Subsequently, a novel Sra splicing variant was identified that codes for the protein SRAP (Lanz et al, 1999; Chooniedass‐Kothari et al, 2004). This protein inhibits Sra‐dependent co‐activation through direct Sra–SRAP interaction (Hubé et al, 2011). So, even if a gene codes for a peptide, this does not preclude a function at the RNA level. This raises the intriguing possibility that some mRNAs studied thus far only in the context of their protein‐coding potential could also function as lncRNAs. Several examples support this possibility: (i) p53 mRNA, which binds the E3 ubiquitin ligase Mdm2, preventing polyubiquitination and degradation of its own protein product (Candeias et al, 2008); (ii) LXRA, an alternative splicing variant of which acts as a coactivator of the LXR nuclear receptor protein (Hashimoto et al, 2009); and (iii) mbl, which encodes the splicing factor MBL as well as a non‐coding circular splicing variant that regulates its own splicing by binding MBL (Ashwal‐Fluss et al, 2014). Notably, in all these cases the RNA and the protein of a gene were found to coregulate each other's expression or activity.
All in all, defining a gene as coding or non‐coding may be more difficult than typically assumed and several possibilities exist for dual functions at the interface between these two categories.
The organ where lncRNA function appears to be particularly relevant is the brain, a complex organ composed of multiple cell types organised into layers and nuclei, which are spatially arranged into areas with specialised roles and interconnections that regulate most bodily functions, above all cognition. Accordingly, a complex developmental programme regulates the different types of progenitors that coexist in the developing neural tube, giving rise to different neuronal and glial subtypes in a temporally and spatially coordinated manner (Rowitch & Kriegstein, 2010; Martynoga et al, 2012; Lodato et al, 2015).
LncRNAs seem to act as novel regulators in the temporal and spatial control of developmental programmes, cell fate and function (Chodroff et al, 2010; van Leeuwen & Mikkers, 2010; Qureshi et al, 2010). Several features of lncRNAs expressed during brain development support this hypothesis. First, lncRNAs expressed in the brain show a higher degree of evolutionary conservation. In particular, in rodents and primates these lncRNAs have been shown to be under higher sequence constraint than those expressed in other tissues (Ponjavic et al, 2009; He et al, 2014), with levels of sequence conservation correlating with levels of gyrification (Johnson et al, 2015). Furthermore, brain‐expressed lncRNAs have been found to be enriched in predicted, conserved RNA structures (Mercer et al, 2008; Ponjavic et al, 2009; Seemann et al, 2012) and are thus more likely to have conserved functions.
Second, brain‐expressed lncRNAs display remarkably specific expression patterns. As mentioned above, several studies that analysed lncRNA expression across multiple tissues identified the brain as the organ expressing the largest number of lncRNAs and the highest proportion of tissue‐specific lncRNAs, in species ranging from flies to humans (Inagaki et al, 2005; Ulitsky et al, 2011; Derrien et al, 2012; Kaushik et al, 2013; Necsulea et al, 2014; Washietl et al, 2014). Furthermore, this is not the result of promiscuous expression in brain tissue, as this tissue specificity is remarkably conserved across species (He et al, 2014; Washietl et al, 2014), with expression profiles of lncRNAs in the human and macaque developing prefrontal cortex showing levels of evolutionary conservation comparable to those of coding genes (He et al, 2014). Another tissue with widespread lncRNA expression and conserved tissue specificity is the testis (Necsulea et al, 2014; Washietl et al, 2014). Testis‐specific lncRNAs are enriched in young lncRNAs (Necsulea et al, 2014) and show greater variability among species than those expressed in other tissues (Khaitovich et al, 2006). These results are indicative of positive selection (Khaitovich et al, 2006) and consistent with the permissive transcriptional environment of the testis facilitating the origination of new genes (Necsulea et al, 2014). In contrast, lncRNAs expressed in the brain show lower variability between species than those expressed in other tissues, pointing to negative selection and conserved functions of brain‐specific lncRNAs (Khaitovich et al, 2006).
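Comparisons of "tissue specificity" rest on some quantitative measure of how restricted a transcript's expression is. The sketch below implements one widely used measure, the tau index, purely as an illustration; whether the studies cited above used tau or a different metric is not stated here, and the tissue values in the example are hypothetical. Tau is 0 for a transcript expressed equally in all tissues and 1 for one expressed in a single tissue.

```python
# Tau tissue-specificity index; one common metric, assumed here for illustration.
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tau index for one transcript.

    `expression` holds one non-negative value per tissue (for example,
    log-scaled expression). Returns 0.0 for identical expression in all
    tissues and 1.0 for expression restricted to a single tissue.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    if min(expression) < 0:
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        raise ValueError("transcript is not expressed in any tissue")
    # Sum of (1 - x_i / x_max) over all tissues, normalised by (n - 1).
    return sum(1 - x / peak for x in expression) / (n - 1)


if __name__ == "__main__":
    # Hypothetical values for brain, liver, heart and testis.
    brain_restricted = [8.0, 0.0, 0.0, 0.0]
    broadly_expressed = [5.0, 5.0, 4.0, 5.0]
    print(round(tau(brain_restricted), 3), round(tau(broadly_expressed), 3))
```

A threshold on tau (for instance, calling transcripts above some cutoff "tissue specific") would be a further, study‐specific choice.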
Third, within the brain, lncRNAs show high temporal and spatial specificity. Transcriptome analyses of brain regions such as the cortex, cerebellum and hippocampus during development and adulthood have identified numerous lncRNAs differentially expressed over time and/or across brain areas (Mercer et al, 2008; Belgard et al, 2011; Pal et al, 2011; Ramos et al, 2013; Lipovich et al, 2014; Kadakkuzha et al, 2015). These expression patterns showed a specificity even greater than that of protein‐coding genes (Ramos et al, 2013), suggesting an involvement of lncRNAs in brain development and function. Moreover, transcriptome analyses of specific cell types coexisting in the developing cortex have identified lncRNAs selectively expressed in neural stem cells as opposed to neurogenic progenitors or newborn neurons (Aprea et al, 2013, 2015), as well as in different subpopulations of pyramidal neurons (Molyneaux et al, 2015). Furthermore, a higher proportion of lncRNAs than of protein‐coding genes has been found to be specifically enriched in a subtype of human neural stem cells common to gyrencephalic brains (outer radial glia), which has a crucial role in brain expansion during evolution (Johnson et al, 2015). Overall, these cell type‐specific expression patterns point to a contribution of lncRNAs to cell fate, lineage specification and maintenance of cell identity during development and evolution of the mammalian brain.
LncRNAs regulating neighbouring transcription factors
Among the hundreds of lncRNAs expressed in the CNS, few have been characterised at either the phenotypic or the mechanistic level. As already discussed for lncRNAs in general, many of these functionally validated lncRNAs either overlap active enhancers or lie within 50 Kb of genes encoding transcription factors centrally involved in neurogenesis (Ponjavic et al, 2009; Aprea et al, 2013; Lv et al, 2013). This may reflect a bias, in that researchers are more inclined to study a lncRNA when it lies next to their transcription factor of interest. Nevertheless, this seems to be a general feature of lncRNAs expressed in the brain. Several groups, including ours, have recently found that during cell fate specification in the developing cortex, lncRNA expression showed an almost perfect linear correlation with that of neighbouring genes, even up to 1 Mb away, and that these coding genes were enriched in transcription factors involved in neurogenesis (Ponjavic et al, 2009; Lv et al, 2013; Aprea et al, 2015). In addition, the examples validated so far have shown that these features can be important for lncRNA function in brain development, which in many cases consists of regulating the expression of the neighbouring gene and/or participating in the same biological process.
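The neighbour‐correlation analysis described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the cited studies: genes are represented by a hypothetical `Gene` record with a single transcription start site (TSS), a neighbour is any coding gene on the same chromosome whose TSS lies within 1 Mb, and co‐expression is measured as Pearson's r across samples (for example, sorted cell populations or time points). All names and values are invented for the example.

```python
# Hypothetical sketch: correlate each lncRNA with coding genes within a window.
import math
from typing import List, NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int                       # transcription start site (bp)
    expression: Tuple[float, ...]  # one value per sample, same order for all genes


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either profile is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def neighbour_correlations(
    lncrnas: Sequence[Gene],
    coding_genes: Sequence[Gene],
    window: int = 1_000_000,
) -> List[Tuple[str, str, int, float]]:
    """Return (lncRNA, coding gene, TSS distance, Pearson r) for every coding
    gene on the same chromosome whose TSS lies within `window` bp."""
    pairs = []
    for lnc in lncrnas:
        for gene in coding_genes:
            distance = abs(gene.tss - lnc.tss)
            if gene.chrom == lnc.chrom and distance <= window:
                r = pearson(lnc.expression, gene.expression)
                pairs.append((lnc.name, gene.name, distance, r))
    # Group by lncRNA, closest neighbours first.
    pairs.sort(key=lambda p: (p[0], p[2]))
    return pairs


if __name__ == "__main__":
    lnc = Gene("lncX", "chr1", 100_000, (1.0, 2.0, 3.0, 4.0))
    coding = [
        Gene("tfA", "chr1", 150_000, (2.0, 4.0, 6.0, 8.0)),
        Gene("geneB", "chr1", 3_000_000, (1.0, 1.0, 2.0, 2.0)),  # outside window
        Gene("geneC", "chr2", 100_000, (4.0, 3.0, 2.0, 1.0)),    # other chromosome
    ]
    for row in neighbour_correlations([lnc], coding):
        print(row)
```

Because nearby genes are tested far more often than distant ones in real genomes, a genuine analysis would also compare the observed correlations against a background of randomly paired genes.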
One example of such lncRNAs is Paupar, which is located 8.5 Kb upstream of Pax6, encompasses an enhancer conserved in vertebrates and is encoded within an intron of the Pax6 opposite‐strand transcript (Fig 3A) (Vance et al, 2014). Paupar expression in the CNS correlates with that of Pax6, a homeodomain transcription factor involved in the regulation of proliferation and specification of neural progenitors (Martynoga et al, 2012; Vance et al, 2014). Paupar knockdown altered the cell cycle of N2A cells and induced neuronal differentiation (Fig 2). Paupar appears to control transcription of Pax6, as well as the activity of PAX6 and other transcription factor proteins, by regulating their association with transcriptional cofactors through a predicted stem‐loop secondary structure (Fig 3A) (Vance et al, 2014). Thus, Paupar regulates transcription in cis and protein activity in trans.
Similarly, EMX2 is another homeodomain transcription factor that regulates proliferation and differentiation in the CNS, where it is essential for cortical lamination and arealisation (Martynoga et al, 2012). Emx2 expression is regulated by the lncRNA Emx2OS, whose transcription start site overlaps the first exon of Emx2 (Spigoni et al, 2010). Emx2 and Emx2OS have similar temporal expression and colocalise in defined areas of the neural tube, in particular in the ventricular and subventricular zones. In contrast, their expression in neurons is mutually exclusive, with high levels of Emx2OS in cortical plate neurons and of Emx2 in marginal zone Cajal–Retzius neurons (Spigoni et al, 2010). The mechanism underlying this pattern is unknown, but Emx2 and Emx2OS appear to promote each other's expression in progenitors, whereas in neurons Emx2OS downregulates Emx2 through Dicer‐dependent degradation of the double‐stranded RNA formed between the two transcripts, possibly owing to upregulation of the RNA degradation machinery upon neuronal differentiation (Fig 3B) (Spigoni et al, 2010).
utNgn1 is a lncRNA required for proper transcriptional control of Neurog1, a bHLH transcription factor that promotes neuronal differentiation and fate specification in the neural tube. This lncRNA is transcribed 7 Kb upstream of Neurog1, encompasses an enhancer conserved in vertebrates and is expressed together with Neurog1 upon neuronal differentiation (Onoguchi et al, 2012). In particular, utNgn1 seems to be required to enhance Neurog1 expression, because its knockdown prevents Neurog1 upregulation. Conversely, knockdown of Neurog1 does not inhibit the upregulation of utNgn1, excluding the possibility that this lncRNA merely results from transcriptional noise upon Neurog1 expression (Onoguchi et al, 2012). In addition, in a BAC reporter mouse in which Neurog1 was replaced by GFP and utNgn1 was removed, GFP showed an unchanged tissue expression pattern but at substantially reduced levels (Quiñones et al, 2010). Finally, utNgn1 knockdown not only downregulated Neurog1 but also reduced the expression of Tbr2 and Neurod1, consistent with a role of utNgn1 in neuronal commitment (Onoguchi et al, 2012).
During cortical development, different classes of pyramidal neurons are generated in a defined pattern (Lodato et al, 2015). This requires the timely regulation of several transcription factors, including two members of the class III POU family, Brn1 and Brn2, which regulate the proliferation of late precursors and are essential for upper‐layer neuronal fate specification and migration (McEvilly et al, 2002; Sugitani et al, 2002; Dominguez et al, 2013). Interestingly, three lncRNAs, linc‐Brn1b, Dali and Pnky, are located near and coexpressed with Brn1 or Brn2, and influence the neuronal commitment of neural progenitors and neuronal differentiation.
Linc‐Brn1b and Dali are two conserved lncRNAs located 10 and 50 Kb downstream of Brn1, respectively, and are coexpressed with it in the telencephalic proliferative zones during mid‐corticogenesis and in upper‐layer cortical neurons by the end of neurogenesis (Sauvageau et al, 2013; Chalei et al, 2014). Both lncRNAs have been suggested to control the expression of Brn1 in cis and of other targets in trans, ultimately influencing proliferation of progenitors, upper‐layer neuronal fate (Linc‐Brn1b) and dendritogenesis (Dali) (Figs 2 and 3C) (Sauvageau et al, 2013; Chalei et al, 2014). In particular, Dali seems to compete with DNMT1 for binding to either a transcriptional cofactor or DNA, hence reducing DNMT1‐mediated CpG methylation at loci where DNMT1 is targeted by BRN1 or other transcription factors (Fig 3C) (Chalei et al, 2014). Finally, Pnky is a lncRNA encoded on the opposite strand 2 Kb upstream of Brn2, with two regions conserved in vertebrates (Ramos et al, 2015). This lncRNA is expressed in the developing cortex and in the adult subventricular zone, where it is enriched in NSCs and downregulated in committed progenitors and neurons (Ramos et al, 2015). Pnky knockdown increased NSC commitment towards the neurogenic lineage and led to an expansion of committed progenitors in the adult subventricular zone (Fig 2) (Ramos et al, 2015). Pnky has been found to interact with PTBP1 (Ramos et al, 2015), a splicing factor expressed in NSCs that represses the inclusion of neural exons in non‐neural cells (Grabowski, 2011). Knockdown of Pnky or PTBP1 led to similar phenotypes and to similar alterations in the expression and splicing of transcripts involved in cell adhesion, neurogenesis and synaptogenesis (Ramos et al, 2015).
Interestingly, Brn2 and Pnky are expressed in complementary patterns and overexpression of the former phenocopies knockdown of the latter leading to progenitor commitment towards generation of upper‐layer neurons (Sugitani et al, 2002; Dominguez et al, 2013; Ramos et al, 2013, 2015).
DLX5 and DLX6 are homeodomain transcription factors essential for interneuron migration and specification in the ventral telencephalon (Wang et al, 2010). They are expressed as a bigene cluster from the same locus, which also encodes two lncRNAs. One of these is Evf2 (or Dlx6os), a vertebrate‐conserved lncRNA antisense to Dlx6 that encompasses one of the two intergenic ultraconserved enhancers found in this locus (Fig 3D). Like Dlx5/6, this transcript is regulated by Shh signalling and expressed in the ventral telencephalon, in committed progenitors of the subventricular zone and in immature neurons (Feng et al, 2006). Evf2 acts as a transcriptional regulator of its own locus both in cis and in trans, and its absence leads to defects in interneuron generation (Fig 2) (Bond et al, 2009; Kohtz, 2014). On the one hand, Evf2 recruits the transcription factor DLX2 to Dlx5/6 enhancers, stabilising this interaction to activate transcription (Fig 3D) (Feng et al, 2006). On the other hand, Evf2 represses both Dlx5 and Dlx6 by different mechanisms: in the former case, by recruiting the methyl‐CpG‐binding protein MECP2, which competes with DLX2 for the same binding site, and in the latter, through antisense transcriptional inhibition (Fig 3D) (Bond et al, 2009; Berghoff et al, 2013). Moreover, Evf2 inhibits site‐specific CpG methylation of one of the ultraconserved enhancers in trans (Berghoff et al, 2013). This example shows how a lncRNA can regulate the genes in its own locus both in cis and in trans, allowing differential regulation of genes with shared regulatory elements (Berghoff et al, 2013).
Dlx1as is a lncRNA in the locus of two members of the distal‐less gene family, Dlx1 and Dlx2. Its transcription start site lies between the two genes of this bigene cluster, and its exon 2 overlaps Dlx1 on the opposite strand (Fig 3E) (Kraus et al, 2013). Dlx1as appears to be involved in the neuronal versus glial fate decision in progenitors of the ventral telencephalon (Fig 2), and its truncation in mouse increased Dlx1 expression in the ventral telencephalon and adult hippocampus, also affecting Mash1 expression (Kraus et al, 2013). It is unclear whether this is a direct effect or an indirect one mediated by increased DLX1 levels since, just like Dlx1 overexpression (Stühmer et al, 2002), truncation of Dlx1as does not change interneuron output (Kraus et al, 2013).
Nkx2.2 and Six3 are homeodomain transcription factors expressed in the ventral neural tube and regulated by the lncRNAs Nkx2.2AS and Six3OS, respectively. These lncRNAs are transcribed from the opposite strand of their neighbouring gene, with which they share expression patterns (Geng et al, 2007; Tochitani & Hayashizaki, 2008; Rapicavoli et al, 2011). Overexpression of Nkx2.2AS promotes oligodendrogenesis (Fig 2), perhaps in part through Nkx2.2 upregulation (Fig 3F) (Tochitani & Hayashizaki, 2008). In addition, Six3OS has been found to regulate cell fate specification in the developing retina and in the neurogenic niche of the adult subventricular zone (Fig 2), likely by regulating SIX3 activity (Rapicavoli et al, 2011; Ramos et al, 2013). Specifically, Six3OS RNA has been found to interact with the transcriptional coregulator EYA and with subunits of histone‐modifying complexes, suggesting a role as a scaffold RNA that mediates the interaction of histone‐modifying enzymes with the SIX3–EYA complex (Fig 3G) (Rapicavoli et al, 2011). Six3OS likely also regulates cell fate specification independently of SIX3, perhaps by regulating the activity of other transcription factors that interact with EYA (Rapicavoli et al, 2011).
LncRNAs regulating morphogens
In addition to transcription factors, morphogens involved in brain development and function can also be regulated by their proximally encoded lncRNAs. One of these factors is BDNF, a neurotrophin involved in survival of peripheral neurons, neuron size and arborisation, which is regulated by Bdnf‐AS (Pruunsild et al, 2007; Modarresi et al, 2012; Ceni et al, 2014). This lncRNA is transcribed antisense to BDNF, is partially conserved between human and mouse and is coexpressed with BDNF in many tissues (Pruunsild et al, 2007; Modarresi et al, 2012). Bdnf‐AS knockdown has been shown to increase proliferation in the subgranular zone of the adult hippocampus and to increase neurogenesis, neurite outgrowth and maturation (Fig 2) (Modarresi et al, 2012). These effects are thought to be mediated mainly by upregulation of Bdnf, as Bdnf‐AS knockdown increased the neurotrophin's mRNA and protein levels by reducing PRC2 occupancy and the levels of the silencing mark histone H3K27 trimethylation at the Bdnf locus (Fig 3H) (Modarresi et al, 2012).
FGF2 is another interesting example of a morphogen controlled by lncRNAs. FGF2 is a growth factor involved in a range of physiological processes, including the maintenance of proliferation of neural progenitors at the onset of cortical neurogenesis (Tiberi et al, 2012). The locus encoding Fgf2 is conserved in vertebrates (MacFarlane et al, 2010) and also encodes, on the opposite strand, the protein NUDT6, which controls cell proliferation independently of FGF2 (Li & Murphy, 2000; Asa et al, 2001). The RNA encoding NUDT6, Fgf2‐AS, partially overlaps the 3′ UTR of the Fgf2 mRNA, with which it shows a reciprocal expression pattern (Knee et al, 1997; MacFarlane et al, 2010). Fgf2‐AS overexpression, even in the absence of translation, inhibited proliferation (Fig 2) by reducing Fgf2 mRNA stability and translation efficiency, probably through base pairing between Fgf2‐AS and the Fgf2 3′ UTR in an Ago2‐dependent mechanism (Li & Murphy, 2000; MacFarlane et al, 2010). Hence, Fgf2‐AS has the characteristics of an RNA acting as both a lncRNA and an mRNA.
LncRNAs in trans‐regulation of neurogenesis
Besides the many examples of lncRNAs neighbouring genes involved in neurogenesis, several lncRNAs in the nervous system act exclusively in trans and are involved not only in the regulation of transcription but also in other essential cellular processes such as splicing and translation.
One example is Rmst, a brain‐restricted lncRNA conserved from frog to human (Ng et al, 2012). Overexpression of Rmst promoted neuronal differentiation, while its knockdown inhibited it by promoting the glial fate (Fig 2) (Ng et al, 2013). Rmst acts as a transcriptional coregulator of SOX2, a transcription factor involved in the maintenance of stemness and required for neural differentiation. In the absence of Rmst, SOX2 activates genes involved in NSC identity, whereas during neuronal differentiation, Rmst upregulation is required for SOX2 binding at loci of proneural transcription factors and other genes involved in neuronal function (Ng et al, 2013).
Sox2 expression is also regulated by another lncRNA, called Tuna or megamind. Tuna is a lncRNA evolutionarily conserved in vertebrates (Ulitsky et al, 2011; Lin et al, 2014), with expression restricted to the CNS (in addition to the human testis), that has been shown to be required for neuronal differentiation of mESCs and in the developing zebrafish brain (Ulitsky et al, 2011; Lin et al, 2014). Knockdown of Tuna in mESCs and developing zebrafish downregulated genes involved in neurogenesis and cell proliferation, resulting in reduced differentiation (Fig 2) (Ulitsky et al, 2011; Lin et al, 2014). Mechanistically, Tuna forms an RNA–multiprotein complex with three RNA‐binding proteins, PTBP1, hnRNP‐K and NCL, through a conserved region and targets the complex to the promoters of pluripotency and neural differentiation genes, including Sox2 (Fig 3I) (Lin et al, 2014). SOX2 and Tuna regulate a common set of genes, and Sox2 overexpression is sufficient to partially rescue Tuna knockdown (Lin et al, 2014).
Malat1 is an intergenic lncRNA highly expressed in neurons, where it localises to nuclear speckles, and was originally found to regulate dendritic growth and synapse formation in cultured hippocampal neurons (Fig 2) (Bernard et al, 2010). Although Malat1 is known to regulate splicing in vitro (Bernard et al, 2010; Tripathi et al, 2010), its mechanism of action remains unclear, as mouse knockout models showed only a limited number of altered alternative splicing events and dysregulation of genes located near the Malat1 locus (Zhang et al, 2012). Moreover, three independently generated Malat1 knockout mouse models did not display any apparent phenotype, at least under standard housing conditions (Eismann et al, 2012; Nakagawa et al, 2012; Zhang et al, 2012).
Last but not least, Miat is a lncRNA with orthologs in Xenopus, chicken, mouse and human (Rapicavoli et al, 2010). Miat is among the most highly expressed transcripts in the developing mouse CNS, and its expression is maintained in several areas during adulthood (Sone et al, 2007; Mercer et al, 2008; Aprea et al, 2013; Sunkin et al, 2013). Miat localises exclusively to the nucleus, in a dotted pattern that overlaps areas of weaker DNA signal but not any known nuclear domain (Blackshaw et al, 2004; Sone et al, 2007). In the retina, Miat knockdown induced progenitor differentiation towards amacrine and Müller glia lineages (Rapicavoli et al, 2010), whereas in the developing cortex, it induced the generation of committed progenitors while simultaneously shifting their fate from differentiation to proliferation, apparently owing to altered alternative exon usage of cell fate determinants regulated by Miat (Fig 2) (Aprea et al, 2013). In fact, this lncRNA has been found to interact with the splicing factors SF1, QKI and SRSF1 (Fig 3J) (Rapicavoli et al, 2010; Tsuiji et al, 2011; Barry et al, 2014), supporting its role in alternative splicing (Tsuiji et al, 2011).
Several other lncRNAs have been shown to affect neurogenesis, although their detailed phenotypes and mechanisms of action await further study. These include: (i) Gm17566, a lncRNA antisense to Pax6 whose overexpression in the developing cortex disrupted neurogenesis (Aprea et al, 2013); (ii) cyrano, a lncRNA specifically expressed in the nervous system and conserved in tetrapods, whose knockdown reduced neurogenesis in the developing zebrafish (Ulitsky et al, 2011); and (iii) Peril, a lncRNA expressed in the developing brain whose knockout resulted in neonatal lethality and altered expression of genes involved in cell cycle regulation, metabolism, immune response and mRNA and protein processing (Sauvageau et al, 2013). Finally, deletion of 13 uncharacterised lncRNAs with brain‐specific expression patterns led, in general, to differential expression of gene sets involved in cell cycle regulation, cell fate commitment and neuronal differentiation (Goff et al, 2015).
Beyond brain development, lncRNAs are also important for brain function and are involved in disease. One example of a lncRNA important for brain function is Bc1, a dendritic transcript that regulates local protein synthesis in synapto‐dendritic microdomains, thereby influencing growth and synaptic plasticity. Bc1 acts as a translational repressor by blocking the activity of the initiation factor eIF4A, thus inhibiting formation of the preinitiation complex (Wang et al, 2002; Lin et al, 2008). In addition, lncRNAs seem to be important for neuronal activity, as many lncRNAs and circRNAs are differentially expressed upon neuronal activation (Kim et al, 2010; Lipovich et al, 2012; Barry et al, 2014; You et al, 2015).
Moreover, mutations in lncRNA loci or dysregulation of lncRNA expression have been implicated in several neurological disorders (van de Vondervoort et al, 2013; Szafranski et al, 2015), such as Huntington's disease (Johnson et al, 2010; Chung et al, 2011; Johnson, 2012), Alzheimer's disease (Mus et al, 2007; Faghihi et al, 2008; Lukiw, 2013) and stroke (Dharap et al, 2012). Additional examples include: (i) Miat, which was found to be downregulated in the brains of individuals with schizophrenia and is apparently involved in this pathology by favouring the expression of alternative splicing variants of DISC1 and ERBB4 (Barry et al, 2014); (ii) Ube3a‐ATS, a lncRNA antisense to the gene encoding the ubiquitin E3 ligase UBE3A, which is expressed from the Prader‐Willi/Angelman syndrome locus and involved in the imprinting of Ube3a, whose perturbation results in neurodevelopmental disorders (Meng et al, 2012); and (iii) Kcna2AS, a lncRNA that regulates the expression of its antisense gene, which encodes the voltage‐dependent potassium channel KCNA2 expressed in dorsal root ganglion afferent neurons (Zhao et al, 2013). Peripheral nerve injury has been found to increase Kcna2AS levels, which downregulates KCNA2 and decreases voltage‐gated potassium currents, resulting in increased excitability of dorsal root ganglion neurons and neuropathic pain (Zhao et al, 2013).
Overall, although most lncRNAs remain completely uncharacterised, the few studied so far have revealed key roles in signalling, transcription, translation, splicing and coregulation of protein activity in several organs, most notably the brain. With many more functional studies expected in the near future, these findings reinforce the notion that lncRNAs represent a major novel regulatory dimension of CNS formation and function.
A new member to the club: circular RNAs
Considering the major efforts devoted to detecting and annotating new transcripts over decades, it is very surprising that an entirely new class of RNAs has been appreciated only in the last few years. In fact, early reports of circRNAs (Capel et al, 1993) were dismissed as singularities, noise or even artefacts, and it was only with the advent of deep sequencing and the development of novel bioinformatic tools that thousands of members of this new class of RNAs came to light (Salzman et al, 2012; Hentze & Preiss, 2013).
CircRNAs are derived from head‐to‐tail splicing (back‐splicing) of pre‐mRNAs. Canonical splice signals and the spliceosome are involved in this circularisation, which is promoted by mechanisms that bring the 3′ and 5′ ends to be joined closer together, including complementary sequences or binding sites for splicing factors such as MBL or QKI (Ashwal‐Fluss et al, 2014; Conn et al, 2015) in the introns flanking the circularised exons (Ebbesen et al, 2015). Similarly to linear lncRNAs, circRNAs are expressed specifically in different developmental stages or cell types (Memczak et al, 2013; Salzman et al, 2013). Interestingly, they are also enriched in the nervous system of both mammals and invertebrates (Westholm et al, 2014; Rybak‐Wolf et al, 2015; You et al, 2015). The reasons for this enrichment seem to be twofold, as circRNAs are derived mainly from linear mRNAs expressed in the nervous system, and genes with broader expression patterns are more likely to present a circular variant in the brain (Ashwal‐Fluss et al, 2014; Westholm et al, 2014; Rybak‐Wolf et al, 2015; You et al, 2015). For some of these genes, the circular variant is even the predominant isoform in the brain (Rybak‐Wolf et al, 2015). CircRNAs also display interesting features in terms of evolutionary conservation, as exons found in circular variants are more conserved than flanking exons (Rybak‐Wolf et al, 2015) and splice sites involved in circularisation are more conserved than those involved in standard splicing (You et al, 2015). Furthermore, genes expressed as circRNAs in the human brain are often also detected as circular in the mouse brain or even in the fly (Rybak‐Wolf et al, 2015).
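The head‐to‐tail joining described above leaves a recognisable signature in sequencing reads: a read crossing the back‐splice junction aligns as two pieces in the "wrong" genomic order. The sketch below classifies such a split alignment, as a simplified illustration of the idea behind circRNA detection from RNA‐seq rather than the method of any cited study. It assumes both segments are already aligned to the plus strand of one chromosome and given in read order; the `Segment` record and function name are hypothetical, and minus‐strand reads, mismatches and multi‐mapping are ignored.

```python
# Hypothetical sketch: classify a split read alignment as linear or back-spliced.
from typing import NamedTuple


class Segment(NamedTuple):
    start: int  # leftmost genomic coordinate of the aligned part (inclusive)
    end: int    # rightmost genomic coordinate (inclusive)


def classify_junction(first: Segment, second: Segment) -> str:
    """Classify a read split into two segments, given in read order, both
    aligned to the plus strand of the same chromosome.

    - "linear": the second segment maps downstream of the first, as for a
      canonical splice joining an upstream exon to a downstream one.
    - "back-splice": the second segment maps upstream of the first, i.e. the
      3' end of a downstream exon is joined to the 5' start of an upstream
      exon (head-to-tail), the arrangement expected for a circRNA.
    - "ambiguous": the two segments overlap on the genome.
    """
    if second.start > first.end:
        return "linear"
    if second.end < first.start:
        return "back-splice"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical exons at 1000-1100 and 5000-5100 on the plus strand.
    print(classify_junction(Segment(1050, 1100), Segment(5000, 5050)))
    print(classify_junction(Segment(5050, 5100), Segment(1000, 1050)))
```

A back‐splice‐like arrangement can also arise from other causes, such as genomic rearrangements or trans‐splicing, so practical pipelines add further filters (for example, canonical splice signals at both ends of the junction).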
Brain‐expressed circRNAs are differentially expressed across brain regions (Rybak‐Wolf et al, 2015) and during mouse development (You et al, 2015), showing an overall upregulation during neuronal differentiation (Rybak‐Wolf et al, 2015). Surprisingly, they are preferentially derived from coding and 5′ UTR exons, in particular from host genes involved in synaptic function (Ashwal‐Fluss et al, 2014; Rybak‐Wolf et al, 2015; You et al, 2015). Moreover, circRNAs appear enriched in synaptic compartments (Rybak‐Wolf et al, 2015; You et al, 2015) and are clearly upregulated during development at the onset of synaptogenesis (You et al, 2015). Thus, circRNAs appear to be particularly relevant for synaptogenesis and synaptic function. However, previous studies focussed primarily on cell lines or whole brain regions. It would therefore hardly be surprising if additional functions of circRNAs remain to be resolved. Moreover, analyses of circRNAs in single cell types have not yet been reported. Given the novelty of this field, more functions of circRNAs are expected to be proposed and revisited in the near future.
Unfortunately, only one circRNA has been functionally manipulated in the brain so far, namely ciRS‐7 (or CDR1as). This circRNA is transcribed antisense to the protein‐coding gene CDR1, is conserved in eutherians and is highly expressed in the mouse and human CNS, even more so than the sense transcript (Hansen et al, 2011, 2013). It seems to be involved in midbrain development by regulating miR‐7 levels (Memczak et al, 2013). ciRS‐7 levels are themselves regulated through miR‐671‐directed, AGO2‐dependent endonucleolytic cleavage (Hansen et al, 2011). At the same time, ciRS‐7 regulates miR‐7 levels by acting as a microRNA sponge (Hansen et al, 2013; Memczak et al, 2013). This circRNA contains dozens of conserved miR‐7 seed target sites with some 3′ base pairing, but central mismatches prevent microRNA‐dependent endonucleolytic cleavage. Because it is circular and lacks a polyA tail, ciRS‐7 is not susceptible to deadenylation and exonucleolytic degradation and can therefore stably bind miR‐7 (Hansen et al, 2013; Memczak et al, 2013). This miR‐7 regulatory mechanism seems to be brain specific. Although miR‐7 is expressed in other tissues, it is coexpressed with ciRS‐7 only in the brain (Hansen et al, 2013; Memczak et al, 2013). Zebrafish express miR‐7 but have lost the CDR1 locus, and expression of ciRS‐7 in zebrafish leads to reduced midbrain size (Memczak et al, 2013). Interestingly, ciRS‐7 has more recently been implicated in the function of pancreatic beta cells (Xu et al, 2015).
The study of ciRS‐7 promoted the view that circRNAs may act as miRNA sponges. However, it is unclear whether this particular circRNA, with 70 miRNA target sites, is the exception rather than the rule. Several other roles for this new class of molecules have been suggested that await validation (Ashwal‐Fluss et al, 2014). Nevertheless, experiments on ciRS‐7 have provided the first evidence for a functional role of circRNAs. Considering the specific temporal and spatial expression of several thousand circRNAs, it is not unreasonable to conclude that they may represent a class of RNAs with functions as diverse as those of lncRNAs or even mRNAs.
For most lncRNAs, a biological function has yet to be shown, and a proportion of them will most likely prove to be non‐functional or transcriptional noise (Ilott & Ponting, 2013; Ulitsky & Bartel, 2013). Nevertheless, the list of lncRNAs with proven functions in several biological processes has grown steadily in the last few years. It is expected to keep growing as more researchers realise the importance of this large group of understudied genes and as new tools for deciphering their roles become available. For example, evidence for the evolutionary conservation of lncRNAs has been accumulating, and sophisticated new approaches are being developed to elucidate their conservation at the level of 3D structure. Taken together with the abundance of lncRNAs and their many distinctive features, this makes it reasonable to argue that a large group of functional lncRNAs remains to be discovered.
Yet, predicting the function of lncRNAs still relies on resources that are limited compared with those available for studying protein‐coding genes. In particular, most investigations of lncRNA function are based on single studies, and some lncRNAs have been studied only in vitro. These results should be interpreted with caution, as some lncRNAs have been shown to have different effects in vitro and in vivo (Kohtz, 2014). Moreover, other lncRNAs have been studied only in mouse models with genomic deletions. In such models, it is difficult to distinguish effects caused by the absence of the lncRNA transcript from general effects caused by the loss of the genomic region itself (Bassett et al, 2014; Kohtz, 2014). This is particularly troublesome when the lncRNA acts in cis and/or the genomic locus contains intrinsic regulatory elements. Hence, it is unclear to what extent current studies effectively capture lncRNA involvement in biological processes. Better tools are needed that allow either the selective targeting of lncRNAs, for example by RNAi or antisense oligonucleotides, or more subtle manipulations, for example by CRISPR/Cas9 approaches (Bassett et al, 2014).
Indeed, these limitations explain the growing number of laboratory techniques and bioinformatics approaches being developed. These include c‐KLAN (Chakraborty et al, 2012), dChIRP (Quinn et al, 2014) and “guilt by association” analyses (Rinn & Chang, 2012). The search for functional domains, as proposed by the modularity hypothesis (Guttman & Rinn, 2012) and the RIDL hypothesis (Johnson & Guigó, 2014), is currently limited but may in the future improve the identification of lncRNA function. Altogether, these and future tools hold promise for finding lncRNAs involved in processes currently under study, as well as in others that are as yet unforeseen.
More resources are likely to become available in the near future to help identify the molecular and cellular pathways underlying the function of any given lncRNA, both for basic research and for medicine. In particular, lncRNAs have been investigated mostly in the brain, where they have proven particularly relevant for development and function. It now seems important to extend this knowledge to other cell types and organs, which may serve as fertile ground for exciting new discoveries. The field is also taking its first steps from the study of lncRNAs towards the newly appreciated circRNAs, and perhaps towards further RNAs yet to be discovered. Given the many possibilities arising from the current state of this field, the future can be expected to bring a substantial increase in our insight into the genetic programmes and molecular regulation underlying cell identity and function, or even a complete change in how we view them.
Conflict of interest
The authors declare that they have no conflict of interest.
We are grateful to Matías García for excellent support on the preparation of the figures. JA and FC were supported by the CRTD, the TUD, the DFG Priority Program SPP1738 and SFB655.
- Cryptic promoter
- Promoter‐like sequences located within open reading frames (ORFs) that are usually not accessible to the transcriptional machinery. Perturbations of chromatin structure can expose these sequences and lead to aberrant transcription from within ORFs (Smolle & Workman, 2013).
- Enhancer
- Cis‐acting DNA sequence that can increase transcription from distal promoters (even up to 1 Mb away). Enhancers interact with their target promoters through DNA loops, recruiting transcription factors and the transcriptional machinery. They were initially identified genome wide as highly conserved non‐coding DNA sequences that drive tissue‐specific expression when linked to minimal promoters. They are currently identified through specific chromatin features, such as histone modifications and the binding of a transcriptional coactivator (Zhou et al, 2011; Pennacchio et al, 2013).
- Homolog
- A gene related to a second gene by descent from a common ancestral DNA sequence, through either speciation (ortholog) or gene duplication (paralog).
- Ortholog
- Genes in different species that evolved from a common ancestral gene by speciation. Orthologs normally retain similar functions in the course of evolution, allowing reliable prediction of gene function in newly sequenced genomes.
- Paralog
- Genes related by duplication within the genome of a single species. Paralogs typically evolve new functions, even if these are related to the original one.
- Promoter
- DNA sequence proximal to the transcription start site (usually approximated as the 2 kb upstream sequence) that integrates regulatory input into transcription initiation. It contains binding sites for the transcriptional machinery, transcription factors and cofactors (Zhou et al, 2011; Lenhard et al, 2012).
- Transposable elements (TEs)
- Genomic sequences that can move to another location or change their copy number in the genome. Class I TEs move through a reverse‐transcribed RNA intermediate. Based on their reverse transcriptase and mechanistic features, they include long terminal repeat (LTR)/endogenous retrovirus (ERV) elements and long and short interspersed nuclear elements (LINEs and SINEs). Class II TEs do not depend on an RNA intermediate and include subclass 1, which moves through a “cut‐and‐paste” mechanism, and subclass 2, which duplicates without double‐strand cleavage (Wicker et al, 2007; Rebollo et al, 2012).
- © 2015 The Authors