Genomic Regions and Transcription
Given that a primary function of DNA is to guide the production of proteins, it might be assumed that genomic DNA is dominated by the sequences of genes that encode proteins. Somewhat counterintuitively, this is not the case. In fact, it is estimated that only about 1 to 2% of the human genome is part of a "coding region", meaning a region that contains information used to create proteins. The function of the remainder, known as the "non-coding regions" of the genome, is an area of active research.
When a gene is transcribed from DNA to RNA, a mature mRNA ("mature messenger RNA", often shortened to "mRNA" with the "mature" implied) is typically the desired final product. During the first step of transcription, the gene's DNA sequence, exons and introns alike, is copied into a single-stranded, immature version of RNA (often called pre-mRNA). This immature RNA is not used directly to encode proteins. Instead, it is cut into pieces and spliced together to produce the working recipe, the mature mRNA. The sections of the gene that are included in the final recipe are called exons, while the sections that are spliced out of the recipe are called introns. The figure below depicts how introns and exons relate to the final mRNA recipe molecule.
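The cut-and-join step can be sketched in a few lines of Python. This is a toy illustration only: the sequence, the exon coordinates, and the function name `splice` are invented for the example, and real splicing depends on sequence signals and cellular machinery that are not modeled here.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join the exon slices of pre_mrna in order, dropping everything else.

    Each exon is a (start, end) pair in 0-based, end-exclusive coordinates.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical immature RNA: uppercase marks exons and lowercase marks
# introns (a display convention for this example, not a biological rule).
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAUUAC" + "guaccccag" + "UGGUAA"

# Made-up exon coordinates matching the uppercase stretches above.
exons = [(0, 6), (18, 24), (33, 39)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGGCCGAUUACUGGUAA
```

Everything outside the listed exon coordinates (the introns) is absent from the result, which mirrors how the mature mRNA keeps only the exons.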
Continuing to build on our bakery analogy, each recipe (gene) has a number of variations or "flavors" that can be used to tweak the final cake that is made. At one point in time, the cell may make a version of the mRNA that includes all four exons shown in the figure above. At some later point, the cell might make a version that includes only exons 1, 3, and 4, yielding a protein with a slightly different effect. These events are called alternative splicing events and are an area of active research (you can learn more about alternative splicing in the relevant advanced topics guide).
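As a rough sketch of the four-exon example, the snippet below assembles two versions of an mRNA from the same set of exons. The exon sequences and the helper name `assemble_isoform` are hypothetical and chosen only to make the exon-skipping idea concrete.

```python
from typing import Dict, Iterable

# Hypothetical four-exon gene; the sequences are invented for illustration.
EXONS: Dict[int, str] = {
    1: "AUGGCC",
    2: "GAUUAC",
    3: "CCAGGA",
    4: "UGGUAA",
}


def assemble_isoform(exons: Dict[int, str], included: Iterable[int]) -> str:
    """Concatenate the chosen exons, keeping them in their genomic order."""
    return "".join(exons[number] for number in sorted(included))


# Version 1: all four exons are included.
full_length = assemble_isoform(EXONS, [1, 2, 3, 4])

# Version 2: exon 2 is skipped, as in the "exons 1, 3, and 4" case.
exon2_skipped = assemble_isoform(EXONS, [1, 3, 4])

print(full_length)    # AUGGCCGAUUACCCAGGAUGGUAA
print(exon2_skipped)  # AUGGCCCCAGGAUGGUAA
```

Both outputs come from the same gene; only the choice of exons differs, which is why one gene can give rise to more than one protein variant.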
Variations in the exonic regions of the genome can significantly change which parts are used to assemble a protein, thereby changing its function in a cell, much like swapping sugar for salt in a recipe. Variations in non-coding regions outside exonic sequences can still be important (examples of important non-coding genomic sequences include transcription factor binding sites and super-enhancers). The effect of variations in non-coding DNA is not as easily interpreted as that of exonic variants, but such variations are frequently significant in disease and are discussed in more detail in the Genomic Variation section of this guide.