Scientific Director's Corner: Genes are the Blueprints for Making Proteins

Summary

  • Genes are the blueprints of life, passing down instructions from one generation to the next. They're made of DNA and act like master architects, determining traits and cell functions.

  • Stored within the nucleus on chromosomes, genes govern the workings of our individual bodies with their unique sequences.

  • Through processes like transcription and translation, genes are decoded into proteins.

  • Each protein serves a specific function within the cell, like cogs in a machine.


What is a gene?

Genes are pieces of information, passed down from parents to children, that affect the development and function of our cells and body. People have approximately 20,000 – 25,000 genes. The information carried by genes is very specific; each gene can be thought of as a construction manual for putting together a protein. Proteins make up much of the physical structure of cells and tissues, and they help those cells and tissues perform their functions. How genes are used to make proteins is the subject of this post.

What are genes made of?

Genes are made of a chemical molecule called deoxyribonucleic acid, or DNA. Each molecule of DNA consists of a long chain of smaller chemical subunits called nucleotides (see Figure 1). A nucleotide is composed of three parts: a 5-carbon sugar in a ring structure, a nitrogen-containing ring structure known as a base, and a phosphate group. The 5-carbon sugar can take two forms, ribose or deoxyribose, as shown in Figure 1. Note that an oxygen atom is attached to the #2 (2 prime, or 2’) carbon of ribose but is missing in deoxyribose (hence the name, de-oxy). In DNA, the nucleotide subunits contain deoxyribose. If the nucleotide subunits contain ribose instead, the molecule formed is called ribonucleic acid, or RNA, which we will discuss a bit later.

Figure 1. The structures of a nucleotide and 5-carbon sugars found in nucleic acids.


DNA is almost always made up of two interlinking strands (double-stranded), as illustrated in Figure 2. The grey dashed box outlines the structure of one nucleotide. Four different bases are used to make nucleotides: adenine, thymine, cytosine, and guanine, abbreviated A, T, C, and G. The nucleotides are attached to each other to form a chain: the phosphate group connects the 3’ carbon of one deoxyribose to the 5’ carbon of the next deoxyribose. Because of this arrangement, a strand of nucleotides has a 5’ end (with an unattached phosphate) and a 3’ end (with an unattached oxygen, part of a hydroxyl group on the 3’ carbon). The strand of nucleotides on the left is attached to a separate strand of nucleotides on the right by weak chemical bonds (hydrogen bonds) between the bases (dotted lines). These attachments are specific: A pairs with T, and C pairs with G. Note that the deoxyribose and phosphate groups on the right appear ‘upside down’; this is because the two strands run in opposite directions, one 5’-3’ and the other 3’-5’. Because of the 3-dimensional shape of the nucleotides, the two strands wrap around one another in a double-helix formation.

Figure 2. The double-stranded structure of DNA, showing all four different nucleotides.
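The A:T / C:G pairing rule and the opposite orientation of the two strands can be sketched in a few lines of Python. This is a minimal illustration, assuming sequences are plain strings of the letters A, T, C and G; the function names `complement` and `reverse_complement` are hypothetical, chosen for this example.

```python
# Base-pairing rules for DNA: A pairs with T, C pairs with G.
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for each position, read in the same direction.

    If `strand` is written 5'->3', the result is the partner strand written 3'->5'.
    """
    return "".join(DNA_PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in the conventional 5'->3' direction.

    Because the two strands run in opposite directions, the partner strand
    has to be read backwards to be written 5'->3'.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "CCATAT"  # one strand, written 5'->3'
    print(complement(top))          # GGTATA  (partner strand, written 3'->5')
    print(reverse_complement(top))  # ATATGG  (partner strand, written 5'->3')
```

Applying `reverse_complement` twice returns the original strand, which reflects the fact that each strand fully determines the other.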


Genes are made up of hundreds, thousands, or even tens of thousands of nucleotides strung together. The specific order of nucleotides, i.e. the specific arrangement of A’s, T’s, C’s, and G’s, is called a DNA sequence, genetic sequence, or genetic code. Each person’s DNA sequence is slightly different, and that is part of what makes each person unique. A person’s entire set of genes is known as their genome, and their specific DNA sequences are known as their genotype. A person’s genotype helps determine what they look like and how their body functions; these observable traits are known as their phenotype. For example, blue eyes are a phenotype, but the specific DNA sequence that makes them blue is a genotype.
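One simple way to picture "each person's DNA sequence is slightly different" is to line up the same stretch of a gene from two people and list where the letters differ. The sketch below assumes the two sequences are already aligned and of equal length (real comparisons must also handle inserted or deleted bases); the sequences and the function name `differing_positions` are hypothetical.

```python
def differing_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """List (position, base_in_a, base_in_b) wherever two aligned sequences differ.

    Positions are counted from 1, starting at the first base.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (already aligned)")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


if __name__ == "__main__":
    person_1 = "ATGGCCTTAGCA"  # hypothetical stretch of a gene
    person_2 = "ATGGCTTTAGCA"  # same stretch with a single-letter difference
    print(differing_positions(person_1, person_2))  # [(6, 'C', 'T')]
```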

Where are genes found?

Virtually every cell in the body contains a copy of every single gene. These copies are stored in a part of the cell called the nucleus. The long strands of DNA that make up the genes are coiled together to form structures called chromosomes (see Figure 3). Humans have 23 pairs of chromosomes: 22 pairs numbered 1-22 (roughly by size, biggest to smallest) plus one pair of sex chromosomes (typically XX in females and XY in males). So there are two #1 chromosomes, two #2 chromosomes, and so on. A person inherits one #1 chromosome from their mother and one #1 chromosome from their father. Since each chromosome contains genes, a person therefore has two copies of most genes, one inherited from their mother and one from their father.

Figure 3. Chromosome structure and how it relates to DNA.

How do genes make proteins?

The process of taking the information contained in a gene and using it to construct a protein is illustrated in Figure 4. Since the DNA that makes up chromosomes contains the genes, it is referred to as genomic DNA. Even though DNA is made up of two strands of nucleotides, for any given gene only one strand carries the protein-making, or ‘coding’, sequence. When a gene’s sequence is written out, it is generally this strand’s sequence that is written, so in Figure 4A the sequence that starts CCATAT… is the protein-making sequence. The protein-making strand is also referred to as the sense strand or coding strand. DNA sequences are almost always written in the 5’-3’ direction, regardless of which DNA strand is the coding strand.

Figure 4. Step-by-step process of how the information contained in a gene is used to make a protein.

For most genes, the DNA sequence is broken up into sections called exons and introns; in the example in Figure 4A, the gene has 3 exons and 2 introns. Only the exons are used to make the protein.

Genomic DNA is located in the nucleus of a cell, but proteins are made in the cytoplasm. To carry the information encoded by the genomic DNA from the nucleus to the cytoplasm, the information is first copied, or transcribed, into an RNA sequence (much as a monk would transcribe the contents of a scroll into a book), Figure 4B.

RNA is very similar to DNA, with three important differences. First, RNA is made up of only one strand (single-stranded). Second, ribose, not deoxyribose, is the 5-carbon sugar that makes up its backbone. Third, instead of the base thymine, RNA uses the base uracil. When RNA pairs with itself or with DNA, uracil attaches to adenine, as shown in Figure 5.

Figure 5. Structure of uracil and how it binds to adenine.


Transcription is carried out by a complex structure known as RNA polymerase. RNA polymerase recognizes a specific sequence of genomic DNA called a promoter, which lies ‘upstream’ of, or in front of, a gene. The promoter signals the RNA polymerase to bind to the genomic DNA and start copying it. The RNA polymerase first unwinds the DNA from its double-helix formation and separates the two strands. Since the purpose of transcription is to make a copy of the coding strand, the RNA polymerase uses the non-coding strand, or antisense strand, as a template. It builds the RNA in the 5’ to 3’ direction, reading the template strand in the opposite, 3’ to 5’, direction. It can make an accurate copy because of the specific A:T and C:G pairing between nucleotides: when it ‘reads’ a C (cytosine) on the template strand, for example, it knows to add a G (guanine) to the growing RNA sequence. Because RNA uses uracil instead of thymine, when it reads an A (adenine) it adds a U (uracil). As the RNA polymerase continues along the genomic DNA, the DNA behind it rewinds into a double helix.
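The pairing logic of transcription can be sketched as a string transformation. This is a simplified model, assuming the template (antisense) strand is given in the usual 5’-3’ notation and ignoring promoters, unwinding and introns; the name `transcribe` is hypothetical. Because the enzyme reads the template 3’ to 5’ while building RNA 5’ to 3’, the sketch walks the template backwards.

```python
# Template DNA base -> RNA base added by RNA polymerase.
# RNA uses uracil (U) instead of thymine, so a template A is matched with U.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_5_to_3: str) -> str:
    """Build an RNA copy (written 5'->3') from a template strand written 5'->3'.

    Walking the template in reverse mimics reading it 3'->5'.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


if __name__ == "__main__":
    coding = "ATGGCC"    # coding (sense) strand, 5'->3'
    template = "GGCCAT"  # template (antisense) strand, 5'->3'
    rna = transcribe(template)
    print(rna)  # AUGGCC
    # The RNA matches the coding strand with every T replaced by U.
    print(rna == coding.replace("T", "U"))  # True
```

The final check illustrates why the coding strand is the one usually written out: the mRNA reads the same, apart from U in place of T.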

The RNA sequence made is called messenger RNA, or mRNA, because it acts like a mailman, delivering the gene’s protein-making instructions from the nucleus to the cytoplasm. When the mRNA is first made, it is a copy of both the exon and intron portions of the gene, so it is further described as precursor mRNA, or pre-mRNA. However, just as a good mailman would throw out your junk mail before delivering your regular mail, the junk intron portions of the pre-mRNA have to be removed before the useful exon portions can be delivered to the cytoplasm. A process known as splicing cuts out the intron sequences and stitches the exons together, as shown in Figure 4C. The result is a mature mRNA, Figure 4D. In addition, a long run of adenosine nucleotides, known as a polyA tail, is added to the 3’ end of the mRNA. The polyA tail helps transport the mature mRNA out of the nucleus into the cytoplasm and protects the mRNA from being broken down once it is in the cytoplasm.
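Splicing and tail addition can be sketched as cutting and joining strings. In this minimal sketch the exon boundaries are simply given as input; in the cell they are recognized from signals in the RNA itself, which is not modeled here. The pre-mRNA, the exon positions, the function names and the short default tail length of 20 are all hypothetical (real polyA tails are typically much longer).

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon segments and join them in order.

    `exons` holds (start, end) positions, counted from 0 with `end` excluded,
    listed from the 5' end to the 3' end. Everything between them is intron.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


def add_polya_tail(mrna: str, length: int = 20) -> str:
    """Append a run of adenosines (the polyA tail) to the 3' end."""
    return mrna + "A" * length


if __name__ == "__main__":
    # exon 1 | intron 1 | exon 2 | intron 2 | exon 3 (hypothetical)
    pre = "AUGGCU" + "GUAAGUAG" + "UGG" + "GUCCAG" + "UAA"
    joined = splice(pre, [(0, 6), (14, 17), (23, 26)])
    print(joined)  # AUGGCUUGGUAA
    print(add_polya_tail(joined, length=10))
```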

Once the mature mRNA is delivered into the cytoplasm, it attaches to a structure called a ribosome, which translates the language of nucleic acids into the language of amino acids, much like someone might translate Egyptian hieroglyphics into English, Figure 4E. There are 4 different nucleotides that make up RNA but 20 amino acids that make up proteins, so translation cannot simply be 1:1. Even pairs of nucleotides would give only 4 × 4 = 16 combinations, which is too few. Instead, 3 nucleotides (or bases) in the mRNA code for one amino acid, giving 4 × 4 × 4 = 64 possible combinations; these sets of 3 bases are known as codons.
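The counting argument for 3-base codons, and the idea of reading an mRNA three letters at a time, can be checked with a short sketch. The function names are hypothetical, and `split_into_codons` assumes reading begins at the first base and simply drops any leftover bases at the end.

```python
from itertools import product

RNA_BASES = "ACGU"  # the four RNA bases


def words_possible(word_length: int) -> int:
    """Number of distinct base sequences of a given length (4 ** length)."""
    return len(RNA_BASES) ** word_length


def split_into_codons(mrna: str) -> list[str]:
    """Cut an mRNA into consecutive 3-base codons, dropping leftover bases."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        count = words_possible(n)
        verdict = "enough" if count >= 20 else "too few"
        print(f"{n}-base words: {count} ({verdict} for 20 amino acids)")
    all_codons = ["".join(c) for c in product(RNA_BASES, repeat=3)]
    print(len(all_codons))  # 64
    print(split_into_codons("AUGGCUUGG"))  # ['AUG', 'GCU', 'UGG']
```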

One codon is especially important. The ribosome reads the mRNA starting at the 5’ end, and when it reaches an A-U-G sequence (usually the first one it encounters) it begins to assemble the protein. The A-U-G sequence is known as the start codon because it tells the ribosome to start making an amino acid chain. The section of mRNA before the A-U-G is called the 5’ untranslated region, or 5’ UTR, because it is not translated into the language of amino acids.
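Locating the start codon and the 5’ UTR can be sketched as a search from the 5’ end. This sketch assumes the simple rule that the first A-U-G is the start codon, which is the usual case but not a universal one; the mRNA and the name `split_at_start` are hypothetical.

```python
START_CODON = "AUG"


def split_at_start(mrna: str) -> tuple[str, str]:
    """Split an mRNA (written 5'->3') into (5' UTR, part beginning with A-U-G).

    Scans from the 5' end and uses the first A-U-G found.
    Raises ValueError if there is no A-U-G at all.
    """
    start = mrna.find(START_CODON)
    if start == -1:
        raise ValueError("no A-U-G start codon found")
    return mrna[:start], mrna[start:]


if __name__ == "__main__":
    utr, coding_part = split_at_start("GCCACCAUGGCUUGA")
    print((utr, coding_part))  # ('GCCACC', 'AUGGCUUGA')
```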

The ribosome assembles the polypeptide (the amino acid chain) with the help of a special type of RNA called transfer RNA, or tRNA. Each tRNA is a folded structure with an anticodon on one end and an amino acid on the other, and each anticodon is specific for only one amino acid. For example, the anticodon that pairs with the A-U-G start codon is U-A-C, written here in the same left-to-right order as the codon (A matches U, and C matches G); every tRNA that contains a U-A-C anticodon has a methionine (Met) amino acid attached at the other end. As the ribosome moves along the mRNA, it reads each 3-base codon and selects the tRNA with the matching anticodon. The amino acid attached to that tRNA is removed and added to the growing amino acid chain.
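The codon-to-anticodon matching can be sketched as a lookup. This is a simplified model with one anticodon per codon (real tRNAs tolerate some mismatch at the third position), and the three-entry `TRNA_POOL` is a hypothetical sample rather than a complete set; anticodons are written aligned with their codons, as in the U-A-C example.

```python
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}

# Hypothetical sample of tRNAs, keyed by anticodon (written aligned with the
# codon it pairs with). Only three of the many real tRNAs are shown.
TRNA_POOL = {
    "UAC": "Met",  # pairs with codon AUG
    "CGA": "Ala",  # pairs with codon GCU
    "ACC": "Trp",  # pairs with codon UGG
}


def anticodon_for(codon: str) -> str:
    """Return the anticodon that base-pairs with `codon`, position by position."""
    return "".join(RNA_PAIR[base] for base in codon)


def select_trna(codon: str) -> str:
    """Find the tRNA whose anticodon matches `codon` and return its amino acid."""
    return TRNA_POOL[anticodon_for(codon)]


if __name__ == "__main__":
    chain = [select_trna(c) for c in ["AUG", "GCU", "UGG"]]
    print("-".join(chain))  # Met-Ala-Trp
```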

Just as there is a sequence that tells the ribosome to start making a protein, there are sequences in the mRNA that tell the ribosome to stop. The sequences U-G-A, U-A-A, and U-A-G do not code for any amino acid and are known as stop codons or termination codons. Once the ribosome reaches one of these sequences, it falls off the mRNA and the protein is released, Figure 4F. For the majority of proteins, the initial methionine is then removed, as it served only as the signal to start making the amino acid chain.
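Putting the translation steps together, the sketch below reads a mature mRNA from its first A-U-G to the first stop codon. It uses the standard genetic code (not listed in this post), packed as one letter per codon: M is methionine, A alanine, W tryptophan, and `*` marks the stop codons U-A-A, U-A-G and U-G-A. The function name `translate` and the test mRNA are hypothetical, and the sketch assumes the first A-U-G is the start codon and does not model removal of the initial methionine.

```python
from itertools import product

# Standard genetic code, one letter per codon, in the order
# UUU, UUC, UUA, UUG, UCU, ... GGG (each position ordered U, C, A, G).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA from its first A-U-G up to the first stop codon.

    Returns one-letter amino acid codes. Translation also ends if the
    message runs out before a stop codon is reached.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # 5' UTR "GCC", then AUG (Met), GCU (Ala), UGG (Trp), UAA (stop)
    print(translate("GCCAUGGCUUGGUAAGG"))  # MAW
```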
