In this special article, Dr Gary Yates takes a closer look at the cannabis genome. He takes a deep dive to unravel some of the mysteries of the cannabis plant at the DNA level.
The cannabis plant has long been revered for its ability to provide humans with a wealth of resources. From clothing and food to medicine and wellbeing, the number of uses grows as technology advances. Do the genes and the DNA of the cannabis plant provide any insight as to why this plant has so many benefits? Are there large-scale observations at the genomic level that account for its unique array of products? Understanding the basic structure of the genome and comparing this with other organisms reveals some interesting details.
The term ‘cannabis genome’ refers to the complete set of genetic material:
DNA = Genes + Non-coding regions; these together make up the genome
DNA (Deoxyribonucleic Acid): The molecule that contains the genetic instructions for life, made of 4 bases known as A, C, G and T. The order and length of these bases can create different functions and regions, including genes.
Genes: Segments of DNA that code for proteins, usually with a specific function.
Non-coding Regions: Parts of DNA that do not code for proteins but could have regulatory roles, structural functions, or no known function.
Genome: The complete set of DNA, including all genes and non-coding regions, for an organism.
Here are a few other definitions that may help:
Genotype: The genotype is the genetic makeup of an organism, that is, the particular set of gene variants (alleles) it carries.
Phenotype: The phenotype is the physical manifestation or traits of the genes and their interaction with the environment. The genotype and the environment determine the phenotype.
Chemotype: A chemotype, also known as a chemical phenotype, is based on the chemical compounds the plant produces. It reflects the organism's biochemistry, which can be influenced by its genetic makeup (genotype) and environmental factors. It can also be considered a part of the phenotype.
Cultivar: Cultivar stands for "cultivated variety" and is a term used to describe plants that have been selectively bred for certain desired traits and can be consistently reproduced. It implies human intervention in selecting specific characteristics, such as color, size, yield, taste, or cannabinoid profile.
Locus: The term "locus" (plural "loci") refers to the specific physical location of a gene or other significant DNA sequence on a chromosome.
Eukaryotes: Eukaryotes are organisms with cells that have a nucleus and other organelles enclosed within membranes. Examples include plants, animals, fungi, and protists.


Genome Complexity and Size
Genome size, measured in base pairs, has significant implications for the scope and scale of genomic sequencing efforts. Consider the human genome, which has approximately 3 billion base pairs and represents a large and complex template for analyzing DNA. The comprehensive mapping of this vast collection of genetic information was a monumental task that initially took years and cost billions of dollars during the Human Genome Project. Since then, technological advances have drastically reduced both the cost and the time required to read the DNA, but the process still requires substantial resources due to the genome's size.
The cannabis genome is smaller, containing about 830 million base pairs. This reduction in size translates to fewer reactions and less data to process and analyze; sequencing the cannabis genome is therefore inherently less time-consuming and less expensive than sequencing the human genome. Smaller genomes require fewer reagents, fewer computational resources for genome assembly and analysis, and shorter periods for data processing. This is particularly significant for industries relying on genomic information for breeding, biotechnological applications, and genetic research, as it allows for more rapid progress and innovation.
The smaller size doesn't just affect the initial genome assembly. It also benefits ongoing research, because with fewer base pairs to search, identifying specific genetic traits, variations, or mutations within the genome can be accomplished faster and with less funding. This has practical implications for areas such as variety improvement, disease resistance and understanding the genetic basis of cannabinoid production, which are central to both the agricultural and pharmaceutical aspects of cannabis use.
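The size comparison above can be put into numbers with a few lines of Python. This is only a rough sketch: the genome sizes are the approximate figures quoted in the text, and the 30x coverage used in the example is an illustrative assumption, not a value from the article.

```python
# Approximate genome sizes in base pairs, as quoted in the text.
HUMAN_BP = 3_000_000_000
CANNABIS_BP = 830_000_000


def relative_size(genome_bp: int, reference_bp: int) -> float:
    """Return genome_bp as a fraction of reference_bp."""
    return genome_bp / reference_bp


def bases_to_sequence(genome_bp: int, coverage: int) -> int:
    """Total bases to read so each position is covered `coverage` times on average."""
    return genome_bp * coverage


ratio = relative_size(CANNABIS_BP, HUMAN_BP)
print(f"Cannabis genome is about {ratio:.0%} the size of the human genome")

# Assumed coverage of 30x, chosen purely for illustration.
ASSUMED_COVERAGE = 30
print(bases_to_sequence(CANNABIS_BP, ASSUMED_COVERAGE))
print(bases_to_sequence(HUMAN_BP, ASSUMED_COVERAGE))
```

At the same coverage, the amount of raw sequence needed scales directly with genome size, which is where the savings in reagents and computation come from.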


The table references ploidy number, indicating the number of chromosome sets: diploids have two sets of chromosomes, while octoploids have eight sets. Each species has a characteristic number of chromosomes per set, and therefore a characteristic total. For instance, cannabis has 20 chromosomes (10 pairs), humans have 46 (23 pairs), and rice has 24 (12 pairs). Although all three are diploid, they differ in their total number of chromosomes. This total number is less about the ploidy and more about the specific evolutionary history and genetic makeup of each species.
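The relationship between ploidy and total chromosome count can be checked with a small calculation. The sketch below uses only the three species and counts given above; the dictionary layout and function name are illustrative choices.

```python
# Ploidy and total chromosome counts as quoted in the text.
SPECIES = {
    "cannabis": {"ploidy": 2, "total_chromosomes": 20},
    "human": {"ploidy": 2, "total_chromosomes": 46},
    "rice": {"ploidy": 2, "total_chromosomes": 24},
}


def chromosomes_per_set(ploidy: int, total_chromosomes: int) -> int:
    """Number of chromosomes in one set: the total divided by the number of sets."""
    if total_chromosomes % ploidy != 0:
        raise ValueError("total is not a whole multiple of the ploidy")
    return total_chromosomes // ploidy


for name, info in SPECIES.items():
    per_set = chromosomes_per_set(info["ploidy"], info["total_chromosomes"])
    print(f"{name}: {info['ploidy']} sets of {per_set} chromosomes")
```

All three species share the same ploidy but differ in the size of each set, which is why their totals differ.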
What is Bioinformatics?
Bioinformatics is the use of computational and mathematical methods to process, handle and understand complex data sets, such as a whole genome. This methodology reveals what the DNA is telling us. Bioinformatics plays a crucial role in genomic work with cannabis, as it does with other organisms, by providing the computational tools and methods required to manage and analyse the vast amounts of data generated by sequencing. When sequencing the Cannabis sativa genome, or any genome, scientists are faced with millions of short DNA reads that must be assembled into the proper order to create a coherent picture of the entire genome. Bioinformatics offers the algorithms and software to piece these sequences together, identify genes, and predict their functions, usually based on homology (similarity to known genes in other organisms). It also helps in comparing cannabis DNA with that of other species, providing insights into evolutionary relationships and the functions of specific genes.
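To give a feel for what "assembling short reads" means, here is a toy sketch of a greedy overlap approach in Python. It is a deliberate simplification under stated assumptions: the reads are error-free, come from one strand, and are only a few bases long. Real assemblers use far more sophisticated methods, and the function names and example reads here are invented for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for size in range(max_len, min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences with the largest overlap.

    Returns the remaining contigs; a single contig means all reads joined up.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge; contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up reads given out of order.
reads = ["GCGTACA", "ACAGGTT", "ATGGCGT"]
print(greedy_assemble(reads))  # ['ATGGCGTACAGGTT']
```

The same idea, scaled up to millions of reads and made robust to sequencing errors and repeated regions, is what dedicated assembly software has to handle.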
Furthermore, bioinformatics is indispensable in the study of genetic variation within cannabis. Single Nucleotide Polymorphisms (SNPs, pronounced ‘snips’) and other genetic markers are identified using bioinformatic tools, enabling researchers to link changes in the DNA with traits of interest, such as cannabinoid profiles, disease resistance, or yield. This is particularly important for marker-assisted selection in breeding programs, where the goal is to develop strains with specific characteristics. In addition to genomic sequencing, bioinformatics methods are used to understand gene expression patterns.


Transcriptomic analyses, where RNA sequences are used to deduce which genes are active under certain conditions or in certain parts of the plant, rely heavily on bioinformatics for data analysis and visualization. As the regulatory environment for cannabis research evolves and the cost of DNA analysis continues to decrease, we may see an explosion of genomic data available for cannabis. Bioinformatics will be essential to process this data efficiently, providing insights that will help realize this plant's medical and economic potential.
The complexity of a genome isn't just about its length or number of genes; it also includes the types of regulatory elements, mutation rates, and the genetic diversity within a species. In cannabis, each cultivar has a distinct chemotype, usually with an underlying genetic component. High-quality variants of a cultivar, often with a different chemotype, can appear almost identical at the DNA level, with only the odd SNP indicating a difference.
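As a minimal illustration of how two nearly identical sequences can be told apart by a single SNP, the Python sketch below compares a reference with a sample base by base. It assumes the two sequences are already aligned and of equal length, which real variant-calling pipelines cannot take for granted; the sequences and function name are made up for the example.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences must already be aligned.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Two invented sequences that differ at a single base.
reference = "ATGGCGTACAGGTT"
sample = "ATGGCGTACTGGTT"
print(find_snps(reference, sample))  # [(10, 'A', 'T')]
```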
Genetic Diversity
Genetic diversity refers to the total number of genetic characteristics in the genetic makeup of a species. In cannabis, this covers all 3 subspecies of Cannabis sativa L.: indica, ruderalis, and sativa. It is the variation in genetic composition among the individuals within a population. This diversity helps populations adapt to changing environments, resist diseases, and maintain overall health.
Another way to describe it is through the gene pool, the complete set of unique alleles in a species or population. An allele is a version of a gene. In humans, for example, there are different alleles for eye colour: the gene influences eye colour, but the colour itself depends on which version of the gene (which allele) a person carries, and there are a few different versions.
The distinction between two different genes and two variants of the same gene largely depends on their chromosomal position, known as the locus for a single location or loci for multiple locations. Alleles are different versions of a gene occupying the same locus on homologous chromosomes, whereas different genes sit at different loci.
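One simple way to summarize a gene pool at a single locus is to count how often each allele appears. The Python sketch below does this for a handful of diploid plants; the genotypes and the allele labels "A" and "a" are invented purely for illustration.

```python
from collections import Counter


def allele_frequencies(genotypes: list[tuple[str, ...]]) -> dict[str, float]:
    """Return the fraction of all allele copies contributed by each allele."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotypes of five diploid plants at one locus.
plants = [("A", "A"), ("A", "a"), ("A", "a"), ("a", "a"), ("A", "A")]
print(allele_frequencies(plants))  # {'A': 0.6, 'a': 0.4}
```

A population in which one allele makes up nearly all copies at most loci has low genetic diversity; one with several alleles at appreciable frequencies has more.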
Gene Content
The cannabis genome, which has been sampled and sequenced, comprises approximately 30,000 genes. Cannabis, like many other plants, has a significant amount of genetic diversity, which is why there are so many varieties with different properties. This resembles the genetic variation found among dog breeds, which likewise all belong to a single species.
Making THC, CBD and Other Compounds
Cannabis sativa has unique genes related to the production of cannabinoids, the compounds responsible for the medicinal and psychoactive properties of the plant. These include THC (tetrahydrocannabinol) and CBD (cannabidiol). The biosynthetic pathways are complex and involve several key enzymes and genetic components that combine to convert basic building blocks into active compounds. Cannabinoid biosynthesis begins with the precursor molecule olivetolic acid, which combines with geranyl pyrophosphate to produce cannabigerolic acid (CBGA), the "mother" cannabinoid that is the precursor to THC, CBD and CBG. The synthesis of CBGA is primarily catalyzed by the enzyme geranyl pyrophosphate:olivetolate geranyltransferase (GOT). From CBGA, specific enzymes called synthases convert this molecule into the major cannabinoid acids:
- Tetrahydrocannabinolic acid (THCA, which leads to THC).
- Cannabidiolic acid (CBDA, which leads to CBD).
- Cannabichromenic acid (CBCA).
THCA synthase, CBDA synthase, and CBCA synthase are responsible for these transformations. These acids are then decarboxylated through heat to yield their active forms. The genes encoding these synthases have been isolated and characterized, and they are specifically expressed in the trichomes of the cannabis plant, where cannabinoid synthesis is localized. The expression of these genes is influenced by the plant's genetic makeup and environment, explaining the variation in cannabinoid content among different strains and even individual plants.
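The steps described above can be summarized as a small lookup structure. The Python sketch below is only a schematic of the text, not a biochemical model: it ignores enzyme kinetics and expression levels, and the entry CBCA to CBC (cannabichromene) is added here for completeness although the article does not name it.

```python
# Enzyme -> (substrate, product), following the pathway described in the text.
SYNTHASES = {
    "THCA synthase": ("CBGA", "THCA"),
    "CBDA synthase": ("CBGA", "CBDA"),
    "CBCA synthase": ("CBGA", "CBCA"),
}

# Acid form -> neutral form produced by heat-driven decarboxylation.
# The CBCA -> CBC entry is an addition not spelled out in the article.
DECARBOXYLATION = {
    "CBGA": "CBG",
    "THCA": "THC",
    "CBDA": "CBD",
    "CBCA": "CBC",
}


def heated_products(expressed_synthases: list[str]) -> list[str]:
    """Neutral cannabinoids expected after heating, given which synthases act on CBGA."""
    acids = [SYNTHASES[name][1] for name in expressed_synthases]
    return [DECARBOXYLATION[acid] for acid in acids]


print(heated_products(["THCA synthase", "CBDA synthase"]))  # ['THC', 'CBD']
```

Framing the pathway this way makes the role of the synthase genes concrete: which synthases a plant expresses determines which acids, and ultimately which active compounds, it can produce.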
In addition to the primary cannabinoids, a suite of other enzymes contributes to forming the minor compounds, flavonoids, sterols, and terpenoids that give cannabis its wide-ranging profiles of effects and aromas.


The genetic regulation of these pathways is a subject of intense research, with implications for both the therapeutic use of cannabis and the legal cannabis industry. Advances in genomic and biotechnological methods are enhancing our understanding of these pathways, allowing for the development of cannabis strains with specific cannabinoid profiles tailored to medical needs or consumer preferences.
In comparison, humans have genes that code for the endocannabinoid system, which interacts with both endocannabinoids (produced in the body) and plant-based cannabinoids. Other crops have their own unique genes responsible for their distinctive features, such as caffeine synthesis in coffee plants.
Conclusion
Cannabis and humans share a common ancestor if you go back far enough in the evolutionary tree of life, as do all eukaryotes. When comparing cannabis to other crops, it's important to consider their evolutionary paths and breeding histories. Crops like maize have undergone significant selective breeding to enhance specific traits like yield and disease resistance. Cannabis breeding has historically been more focused on traits like cannabinoid profiles and growth characteristics suitable for various climates and purposes.
While cannabis has a smaller genome than humans and differs in size and gene content from other important crops, it possesses unique genetic attributes that have been extensively shaped by both natural variation and human cultivation. Understanding its genome helps in the development of specific strains for medical or recreational use and provides insights into plant biology and evolution.


