How many genes are contained in each human cell? What are genes and the human genome

What is the human genome? How long has this term been used in science, and why does the concept have such great value nowadays?

The human genome is the totality of hereditary material contained in a cell. It consists of 23 pairs of chromosomes.

Genes are individual pieces of DNA. Each of them is responsible for some characteristic or part of the body: height, eye color, etc.

When scientists manage to completely “decipher” the information recorded on DNA, people will be able to fight diseases that are inherited. Moreover, perhaps then it will be possible to solve the problem of aging.

It was previously believed that our body has more than a hundred thousand genes. However, international studies have recently confirmed that there are approximately 28,000 genes in our body. To date, only a few thousand of them have been studied.

Genes are unevenly distributed across chromosomes. Why this is so, scientists do not yet know.

The cells of the body constantly read the information that is written in DNA. Each of them does its job: distributes oxygen throughout the body, destroys viruses, etc.

But there are also special cells - reproductive cells. In men these are sperm, and in women they are eggs. They contain not 46 chromosomes, but exactly half - 23.

When sex cells fuse, the new organism receives a complete set of chromosomes: half from the father and half from the mother.

This is why children are in some ways similar to each of their parents.

Several genes are usually responsible for the same trait. For example, our height depends on 16 units of DNA. At the same time, some genes affect several traits at once (for example, those with red hair have a light skin tone and freckles).

A person's eye color is determined by two genes, and the one responsible for brown eyes is dominant. This means that it is more likely to manifest itself when it “meets” another gene.

Therefore, a brown-eyed father and a blue-eyed mother will most likely have a brown-eyed baby. Dark hair, thick eyebrows, and dimples on the cheeks and chin are also dominant traits.

But the gene responsible for blue eyes is recessive. Such genes manifest themselves much less frequently: the trait appears only if the child receives the recessive gene from both parents.
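The brown-eyed and blue-eyed example can be sketched as a single-gene cross (a Punnett square). This is only an illustration under a strong simplifying assumption: one gene with a dominant allele "B" (brown) and a recessive allele "b" (blue). Real eye color depends on more than one gene, and the function names here are hypothetical.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes for a one-gene cross (Punnett square).

    Each parent is a two-letter genotype such as "Bb"; a child receives
    one allele from each parent, and all combinations are equally likely.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort the pair so that "bB" and "Bb" count as the same genotype.
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


def phenotype(genotype: str) -> str:
    # In this toy model one dominant allele "B" is enough for brown eyes.
    return "brown" if "B" in genotype else "blue"


def phenotype_odds(parent1: str, parent2: str) -> dict:
    """Return the share of each phenotype among the possible offspring."""
    counts = cross(parent1, parent2)
    total = sum(counts.values())
    odds = Counter()
    for genotype, n in counts.items():
        odds[phenotype(genotype)] += n / total
    return dict(odds)


print(phenotype_odds("BB", "bb"))  # brown-eyed parent with two dominant alleles
print(phenotype_odds("Bb", "bb"))  # brown-eyed parent carrying a hidden "b"
print(phenotype_odds("Bb", "Bb"))  # two brown-eyed carriers
```

In this model a blue-eyed child is possible only when both parents carry at least one "b" allele, which is why the recessive trait shows up less often.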

We hope that now you know what the human genome is. Of course, in the near future science may surprise us with new discoveries in this area. But this is a matter for the future.

Article for the “bio/mol/text” competition: This is an interesting question, and the answer to it was supposed to come from the Human Genome Project, completed in 2003. After scientists obtained basic information about the human genome, they tried to determine the number of genes, but this task turned out to be far from simple. The purpose of this article is to summarize and analyze the scientific data on compiling a catalog of human genes.

How little is known about genes! I first felt this acutely during an internship in the laboratory of medical genetics at Harbin Medical University. The research group where I interned was studying the Sei-1 oncogene, which induces the formation of double minute chromosomes (DMs) and thereby contributes to oncogenesis. However, the mechanism of formation of the Sei-1 oncogene remains unknown to this day. Moreover, various gene mutations cause other dangerous human diseases besides cancer. So, in this article we outline some thoughts on why we still know so little about genes, and also formulate our opinion about how many genes a person has.

Human Genome Project and complete list of genes

Identifying the full list of genes is necessary to elucidate the molecular mechanisms behind the onset and development of cancer, schizophrenia, dementia and many other human diseases. Sequencing DNA isolated from patient tissues makes it possible to identify mutations, such as nucleotide substitutions, deletions and insertions, that are responsible for these diseases.

Actually, this is why the Human Genome Project (HGP) was started; it lasted from 1990 to 2003. Its main task was to determine the nucleotide sequence of human DNA and the location of the 100,000 human genes (as was then believed). In parallel, it was planned to study the DNA of a set of model organisms in order to obtain the comparative information needed to understand how the human genome functions. The information obtained by the HGP was intended to become a reference book for biomedical science in the 21st century. The goals of these studies were to learn about the causes of a range of diseases and, ultimately, to develop treatments for the more than 4,000 genetic diseases that affect humanity, including multifactorial diseases in which genetic susceptibility plays an important role. It was believed that the results of genome sequencing would make it possible to determine the location of each gene and their total number. However, subsequent events have shown otherwise: today there are several gene databases that differ significantly from each other. The number of protein-coding genes in them roughly coincides, but the numbers of genes of other types diverge.

Human Proteome Project

In 2010, at the initiative of the Human Proteome Organization (HUPO), the Human Proteome Project (HPP) was started; it aims to create a complete list of the proteins of the species Homo sapiens. To do this, first, it is planned to identify and characterize at least one protein product for each protein-coding gene, along with their single-nucleotide polymorphisms, splicing variants and types of post-translational modification. Second, the proteomics data obtained through the HPP complement genomic data in solving various biomedical problems and in creating new annotated knowledge bases, such as neXtProt.

Currently, neXtProt contains information about 17,487 proteins whose existence has been experimentally confirmed, 1,728 proteins confirmed at the transcript level, 515 identified on the basis of homology, 76 predicted and 571 of uncertain nature. Of particular interest are proteins whose existence has not been experimentally proven, although there is evidence that they are encoded by the genome. These are the so-called “missing” proteins, which constitute approximately 18% of all encoded proteins. A resource called MissingProteinPedia has been created to identify and characterize such proteins.

The Human Proteome Project is a continuation of the Human Genome Project. It is expected that the proteome project will reveal the exact number of protein-coding genes, which will subsequently help us understand how many genes a person has.

A little about RNA

The Human Genome Project has shown that RNA molecules are as important to life as DNA. There are many kinds of RNA inside cells (Figure 2). First of all, RNAs are divided into non-coding RNAs (ncRNAs), which are not translated into proteins, and coding RNAs (mRNAs), which serve as templates for the synthesis of polypeptide chains. Non-coding RNAs have a more complex classification: they are either infrastructural or regulatory. Infrastructural RNAs are represented by ribosomal RNA (rRNA) and transfer RNA (tRNA). rRNA molecules are synthesized in the nucleolus and, together with the proteins of the ribosomal subunits, form the basis of the ribosome. Once the subunits are assembled, they move into the cytoplasm, where, as key participants in translation, they take part in reading the mRNA code. Each sequence of three nitrogenous bases in mRNA specifies the inclusion of a particular amino acid in the protein sequence. tRNA molecules bring these amino acids to the ribosomes, where the protein is synthesized.
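The triplet reading of mRNA can be illustrated with a short sketch. It uses only a handful of codons from the standard genetic code (the full table has 64), and the function name and the example sequence are made up for illustration.

```python
# A partial codon table: a few entries of the standard genetic code.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala",
    "AAA": "Lys", "GAA": "Glu", "UGG": "Trp",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna: str) -> list[str]:
    """Read an mRNA three bases at a time, starting at the first AUG."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing to translate
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        # Codons missing from the partial table are marked as "???".
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "???")
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


# Made-up sequence: two leading bases, then AUG UUU GGC AAA UAA.
print(translate("GGAUGUUUGGCAAAUAAGC"))
```

Each three-letter codon selects one amino acid, and a stop codon ends the chain; in the cell, the lookup performed here by a dictionary is carried out by tRNA molecules at the ribosome.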

Read more about RNA in the “Biomolecules” articles “About all the RNAs in the world, big and small”, “Coding non-coding RNAs” and “Power of the Rings: Almighty Circular RNAs”.

Figure 2. RNA species

Regulatory ncRNAs are very widely represented in the body, are classified by size and perform a number of important functions (Table 1).

Table 1. Non-coding regulatory RNAs
Name | Designation | Length | Functions
Long non-coding RNAs | lncRNA | more than 200 nucleotides | 1. Regulate selective DNA methylation by directing DNA methyltransferase; 2. Direct the selective recruitment of Polycomb repressor complexes
Small RNAs:
Small nuclear RNAs | snRNA | 150 nucleotides | 1. Participate in splicing; 2. Regulate the activity of transcription factors; 3. Maintain telomere integrity
Small nucleolar RNAs | snoRNA | 60–300 nucleotides | 1. Participate in the chemical modification of rRNA, tRNA and snRNA; 2. Possibly involved in stabilizing the structure of rRNA and protecting it against the action of hydrolases
Small interfering RNAs | siRNA | 21–22 nucleotides | 1. Provide antiviral immune protection; 2. Suppress the activity of the cell's own genes
MicroRNAs | miRNA | 18–25 nucleotides | Suppress translation by RNA interference
Antisense RNAs | asRNA | Short: less than 200 nucleotides; long: more than 200 nucleotides | Block translation by forming hybrids with mRNA
RNAs associated with Piwi proteins | piRNA | 26–32 nucleotides | Also called “genome guardians”; suppress the activity of mobile genetic elements during embryogenesis

Terminology problem

Before answering the question “How many genes do we have?”, we need to understand what a gene is.

The main focus of the HGP was on protein-coding genes. However, as stated in the original HGP report in 2001, "thousands of human genes produce non-coding RNAs (ncRNAs), which are their end products", although at that time only about 706 ncRNA genes were known. In a recent article published in the journal BMC Biology, Steven L. Salzberg gives the following definition of a gene:

A gene is any portion of chromosomal DNA that is transcribed into a functional RNA molecule or is first transcribed into RNA and then translated into a functional protein.

This definition covers both non-coding RNA genes and protein-coding genes, and allows all alternative splicing variants at a single locus to be treated as variants of the same gene. It also excludes pseudogenes: non-functional remnants of structural genes that have lost the ability to encode proteins.
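The definition can be read as a simple yes/no rule, sketched below. The class, its fields and the example loci are hypothetical and exist only to show how the rule separates genes from pseudogenes and untranscribed DNA; deciding whether a real transcript is "functional" is the hard part and is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str                 # hypothetical label, not a real gene symbol
    transcribed: bool         # is this stretch of DNA copied into RNA?
    functional_rna: bool      # does the RNA itself do a job (ncRNA gene)?
    functional_protein: bool  # is the RNA translated into a working protein?


def is_gene(locus: Locus) -> bool:
    """Transcribed, and functional either as RNA or as protein."""
    return locus.transcribed and (locus.functional_rna or locus.functional_protein)


examples = [
    Locus("protein_coding_locus", True, False, True),
    Locus("lncRNA_locus", True, True, False),
    Locus("pseudogene_locus", True, False, False),  # transcribed, but no function
    Locus("untranscribed_region", False, False, False),
]

for locus in examples:
    print(f"{locus.name}: {'gene' if is_gene(locus) else 'not a gene'}")
```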

The first two studies indicated the presence of 31,000 and 26,588 protein-coding genes in humans. In 2004 the complete sequence of the human genome appeared, and its authors estimated that the complete catalog contains 24,000 protein-coding genes. The Ensembl catalog of human genes includes 22,287 protein-coding genes and 34,214 transcripts.

Next Generation Sequencing (NGS)

The emergence of high-throughput parallel sequencing methods (in which millions of DNA fragments from a single sample are sequenced simultaneously), also called next-generation sequencing (NGS), made it possible to significantly speed up the search for functional regions of the genome. Biotechnology companies have developed and commercialized various NGS platforms that can sequence from 1 million to tens of billions of short sequences (reads), each 50–600 nucleotides long. The most popular platforms include Illumina and IonTorrent, which use DNA amplification by PCR, as well as single-molecule sequencing platforms such as Helicos Biosciences HeliScope, Pacific Biosciences SMRT (single molecule real-time sequencing) and Oxford Nanopore nanopore sequencing, which perform real-time sequencing and produce significantly longer reads, up to 10–60 thousand nucleotides. In addition, the invention in 2008 of RNA sequencing (RNA-seq), which was created to quantify gene expression, contributed to the discovery of transcribed sequences, both coding and non-coding.

Thanks to NGS, databases of lncRNAs and other RNA genes (such as microRNAs) have grown dramatically over the decade, and current human gene catalogs now contain more RNA-coding genes than protein-coding genes (Table 2).

Table 2. Numbers of genes of different types in the Gencode, Ensembl, RefSeq and CHESS databases
Type of genes | Gencode | Ensembl | RefSeq | CHESS
Protein-coding genes | 19,901 | 20,376 | 20,345 | 21,306
Long non-coding RNA genes | 15,779 | 14,720 | 17,712 | 18,484
Antisense RNA | 5,501 | - | 28 | 2,694
Other non-coding RNAs | 2,213 | 2,222 | 13,899 | 4,347
Pseudogenes | 14,723 | 1,740 | 15,952 | -
Total number of transcripts | 203,835 | 203,903 | 154,484 | 323,827
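The figures in Table 2 can be compared with a few lines of arithmetic. The sketch below copies the counts from the table; the dictionary layout and function name are illustrative, and a missing value ("-" in the table) is treated as "not reported" and counted as zero, which is an assumption.

```python
# Gene counts copied from Table 2; None marks a value the table does not report.
CATALOGS = {
    "Gencode": {"protein_coding": 19901, "lncRNA": 15779, "antisense": 5501, "other_ncRNA": 2213},
    "Ensembl": {"protein_coding": 20376, "lncRNA": 14720, "antisense": None, "other_ncRNA": 2222},
    "RefSeq": {"protein_coding": 20345, "lncRNA": 17712, "antisense": 28, "other_ncRNA": 13899},
    "CHESS": {"protein_coding": 21306, "lncRNA": 18484, "antisense": 2694, "other_ncRNA": 4347},
}

RNA_GENE_TYPES = ("lncRNA", "antisense", "other_ncRNA")


def rna_gene_total(row: dict) -> int:
    """Sum the non-coding RNA gene classes, counting unreported values as 0."""
    return sum(row[kind] or 0 for kind in RNA_GENE_TYPES)


for name, row in CATALOGS.items():
    rna = rna_gene_total(row)
    relation = "more" if rna > row["protein_coding"] else "fewer"
    print(f"{name}: {row['protein_coding']} protein-coding, {rna} RNA genes ({relation} RNA genes)")

protein_counts = [row["protein_coding"] for row in CATALOGS.values()]
print("Spread of protein-coding counts:", max(protein_counts) - min(protein_counts))
```

With these numbers the protein-coding counts differ between catalogs by fewer than 1,500 genes, while the non-coding totals differ by many thousands; under the zero-for-missing assumption, Ensembl is the only catalog in the table with fewer RNA genes than protein-coding genes.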

RNA sequencing has revealed that alternative splicing, alternative transcription initiation and alternative transcription termination occur much more frequently than previously thought, affecting up to 95% of human genes. Therefore, even if the location of all genes is known, it is still necessary to identify all isoforms of those genes and to determine whether these isoforms have any function or simply represent splicing errors.

Human Gene Databases

The task of compiling a catalog of all genes is still not solved. The problem is that over the last 15 years only two research groups have compiled the dominant gene catalogs: RefSeq, which is maintained by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH), and Ensembl/Gencode, which is maintained by the European Molecular Biology Laboratory (EMBL). Despite great progress, the numbers of protein-coding genes, long non-coding RNA genes and pseudogenes still differ between the catalogs, as do the numbers of antisense RNAs and other non-coding RNAs (Table 2). The catalogs are still being refined: in the past year, for example, hundreds of protein-coding genes were added to or removed from the Gencode list. These disagreements show how difficult it is to create a complete catalog of human genes.

In 2017, a new human gene database, CHESS, was created. Notably, it includes all the protein-coding genes of both Gencode and RefSeq, so CHESS users do not need to decide which database they prefer. More genes may mean more errors, but the creators believe that a larger set will be useful in the study of human diseases that are not yet classified as genetic. The CHESS gene set, currently at version 2.0, is not yet final, and its creators are working on improving it.

The cells of the body have 46 chromosomes. The carriers of the units of heredity are structures of the cell nucleus: the chromosomes.
Chromosomes can be easily observed in dividing cells. The cells of the body contain a diploid set of chromosomes: each chromosome has a homologous partner similar to itself. Sex cells contain a haploid set of chromosomes.
There are two types of cell division - mitosis and meiosis. The first is characteristic of the division of somatic cells, the second occurs during the formation of germ cells.
During mitosis, chromosomes are duplicated and then distributed to the daughter cells. As a result, two cells are formed that are genetically identical to the parent cell.
In meiosis, chromosomes are duplicated once, but this is followed by two rounds of cell division. During the first division, homologous chromosomes are randomly distributed into different cells. The second division of meiosis resembles mitosis. As a result of meiosis, four daughter cells with a haploid set of chromosomes are formed.
The process of chromosome recombination during reduction division corresponds to the recombination of Mendelian units of heredity.
The units of heredity are called genes and are arranged linearly on chromosomes. Genes located on the same chromosome are called linked.
Linked genes can recombine due to the process of crossing over, in which regions are exchanged between homologous chromosomes.
The processes of recombination that occur in meiosis underlie genetic variation and lead to the genetic uniqueness of individuals.
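The random distribution of homologous chromosomes in the first meiotic division already produces a very large number of possible sex cells. The sketch below counts them for the 23 human chromosome pairs; it deliberately ignores crossing over, which raises the real diversity much further, and the function names are illustrative.

```python
HAPLOID_NUMBER = 23                   # chromosomes in a human sex cell
DIPLOID_NUMBER = 2 * HAPLOID_NUMBER   # chromosomes in a body cell


def gamete_combinations(pairs: int) -> int:
    """Each pair contributes either its maternal or its paternal chromosome,
    so there are 2 choices per pair (crossing over is ignored here)."""
    return 2 ** pairs


def zygote_combinations(pairs: int) -> int:
    """A zygote combines one gamete from each parent."""
    return gamete_combinations(pairs) ** 2


print("Chromosomes in a body cell:", DIPLOID_NUMBER)
print("Possible gametes per parent:", gamete_combinations(HAPLOID_NUMBER))
print("Possible chromosome sets in a child:", zygote_combinations(HAPLOID_NUMBER))
```

Even without crossing over, one couple could in principle produce about 70 trillion different chromosome combinations, which is part of what makes individuals genetically unique.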

Scientists from the Wellcome Trust Sanger Institute in Cambridge have deciphered another human chromosome, the largest mapped so far. Chromosome 20 became the third human chromosome to be deciphered. It contains information related to a range of conditions, from obesity and eczema to dementia and cataracts.

The chromosome contains 727 genes, 32 of which are associated with the development of genetic diseases, including Creutzfeldt-Jakob disease, severe disorders of the immune system, heart disease and diabetes. The sixty million nucleotides that make up the chromosome account for about two percent of the total human genetic code.
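The figures quoted for chromosome 20 can be checked with simple arithmetic. The chromosome length, gene count and disease-gene count come from the text; the total genome length of roughly 3.1 billion nucleotides is an outside approximation and is labeled as an assumption in the code.

```python
CHR20_LENGTH = 60_000_000        # nucleotides, from the text
CHR20_GENES = 727                # genes on chromosome 20, from the text
CHR20_DISEASE_GENES = 32         # disease-associated genes, from the text
GENOME_LENGTH = 3_100_000_000    # assumption: approximate haploid human genome size

genome_share = CHR20_LENGTH / GENOME_LENGTH * 100
disease_share = CHR20_DISEASE_GENES / CHR20_GENES * 100
genes_per_megabase = CHR20_GENES / (CHR20_LENGTH / 1_000_000)

print(f"Share of the genome: {genome_share:.1f}%")
print(f"Share of disease-associated genes: {disease_share:.1f}%")
print(f"Gene density: {genes_per_megabase:.1f} genes per million nucleotides")
```

Under this assumption the chromosome makes up just under 2% of the genome, which agrees with the "about two percent" quoted above.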

Dr. Panos Deloukas, who led the team, noted that the chromosome contains an additional piece of DNA with at least one gene. A similar region is found in 37 percent of people of European descent. Scientists do not know whether this gene functions in humans or what it is responsible for.

Scientists have also discovered more than 30 thousand variations in the nucleotide sequence of chromosome 20, which provide diversity in the structure of DNA. Knowing these variations, scientists say, could help explain, for example, why some people are predisposed to developing cancer or diabetes.

Each human chromosome consists of a DNA molecule made of two helical strands held together by paired nucleotides. DNA contains four types of nucleotides, carrying the bases adenine, thymine, guanine and cytosine. The sequence of nucleotides in DNA molecules determines the genetic code of an organism.
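Because the two strands are paired base to base (adenine with thymine, guanine with cytosine, the standard pairing rule), the sequence of one strand fully determines the other. A minimal sketch, with illustrative function names and a made-up example sequence:

```python
# Standard base-pairing rule: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the base that pairs with each position of the strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand runs in the opposite direction, so it is
    conventionally written as the reversed complement."""
    return complement_strand(strand)[::-1]


sequence = "ATGCGT"  # made-up example
print(complement_strand(sequence))
print(reverse_complement(sequence))
```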

In all people, 99.9 percent of the genetic sequence is the same, and it is the difference in the remaining 0.1 percent that makes people unique.

The Human Genome Project is an international research project whose main goal was to determine the sequence of nucleotides that make up DNA and to identify the 20–25 thousand genes in the human genome. The project was the culmination of several years of work supported by the US Department of Energy, in particular workshops held in 1984 and 1986, and the subsequent actions of the Department of Energy. The 1987 report clearly states: “The ultimate goal of this endeavor is to understand the human genome” and “knowledge of the human genome is as essential to the progress of medicine and other health sciences as knowledge of anatomy was to its present state.” The search for technologies suitable for solving the proposed problem began in the second half of the 1980s. In 1998, the American researcher Craig Venter and his firm Celera Genomics launched a similar privately funded study. In the early 1990s, when

A comparison of tens of thousands of human genomes has shown that there are 3,230 absolutely essential genes.

In biology, there is the concept of a minimal genome - a minimal set of genes without which an organism cannot survive. Of course, there are a lot of questions about this concept. For example, what kind of organism are we talking about? You can take a single-celled bacterium, or you can take a very, very multicellular person - they are so different in their lifestyle that the set of necessary genes in them will obviously also be different.

Human X chromosome under an electron microscope. (Photo by Dr. Gopal Murti/Visuals Unlimited/Corbis)

Human chromosomes at the moment of cell division. (Photo by Lester V. Bergman/CORBIS.)

Again, there is a “lifestyle” point. Under what conditions will a minimal genome be sufficient? The same bacterium can enter an exceptionally favorable nutrient medium, with ideal indicators of temperature, salt content, nutrients, etc., or maybe, on the contrary, go on a starvation ration, and even experience an increase in salinity or acidity. And the set of genes necessary for survival will be different in both cases. Therefore, when discussing the minimal genome, it is often stipulated that we are talking specifically about favorable living conditions.

In general, the idea that some genes are more necessary than others arose relatively long ago: for example, back in 1996, Arcady Mushegian and Eugene Koonin estimated the minimal genome necessary for a bacterial cell at 256 genes; in 2004, other researchers proposed a set of 204 genes. That minimal genome was derived from a comparative analysis of several bacterial genomes; if we talk about a specific organism, we inevitably come to the bacterium Mycoplasma genitalium, a pathogen of the human genitourinary system: it has only 517 genes, of which 482 encode proteins, and 382 of them are vital. The mycoplasma genome was considered the smallest for some time, until the DNA of several more microorganisms was read that can exist only as symbionts inside host cells. So far the champion here is the bacterium Carsonella, which lives in the cells of psyllids: its genome contains only 182 protein-coding genes.

Bacteria are bacteria, but what if we try to estimate the minimum number of genes in a human? This is exactly what a research team led by Daniel MacArthur from the Broad Institute attempted to do. Important genes can be separated from unimportant ones if we assume that important genes will be completely or almost completely identical in different people. It is known that genes can carry small sequence changes that distinguish one individual from another; such changes may not affect the functioning of the protein encoded by the gene at all, or may have only a slight effect. But in the case of important genes, modifications are very likely to harm the organism, which is then unlikely to survive. Unimportant genes, by contrast, can under certain conditions afford to work less well without endangering our lives.

And so the researchers undertook to compare the genes of 60 thousand people with each other (it is worth clarifying that they compared only exons, that is, the sections of genes that carry information about the sequence of amino acids in proteins). In total, they found 10 million differences.

In parallel, for each gene the researchers estimated the theoretical number of variants it would accumulate if they arose randomly and were all retained. This theoretical estimate was compared with what was observed in the comparative analysis of real DNA sequences (taken, remember, from 60 thousand people). As expected, some genes easily tolerated variations in their own sequence, while others, on the contrary, tended to get rid of them. Having counted the genes in which there were no or almost no changes, the authors arrived at a figure of 3,230: this is how many human genes cannot afford even the slightest change in functioning. That is, we can say that these 3,230 genes are the vital genetic set of a person. (Recall that in total the human genome contains, according to different estimates, from 20 to 25 thousand genes.)
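The comparison of expected and observed variants can be sketched as a simple ratio. This is not the statistical model the researchers used; it is a simplified stand-in, and the gene names, counts and cutoff below are all hypothetical.

```python
def observed_to_expected(observed: int, expected: float) -> float:
    """Ratio of variants actually seen to variants expected by chance."""
    if expected <= 0:
        raise ValueError("expected number of variants must be positive")
    return observed / expected


# Hypothetical genes: (observed variants, expected variants).
toy_genes = {
    "GENE_A": (2, 40.0),   # far fewer variants than expected
    "GENE_B": (38, 41.0),  # about as many as expected
    "GENE_C": (0, 25.0),   # no variants at all
}

CUTOFF = 0.1  # hypothetical threshold for "almost no changes"


def intolerant_genes(genes: dict, cutoff: float = CUTOFF) -> list[str]:
    """Genes whose observed variation is strongly depleted."""
    return sorted(
        name
        for name, (observed, expected) in genes.items()
        if observed_to_expected(observed, expected) <= cutoff
    )


print(intolerant_genes(toy_genes))
```

A gene with a ratio near 1 tolerates variation, while a ratio near 0 suggests that changes in the gene are removed by selection; genes of the second kind are the candidates for the vital set described above.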

Obviously, modifications in the sequences of such genes lead to severe disorders that cause death either during embryonic development, so that the person is never born, or after birth, in childhood or early youth (that is, before the person has children). Indeed, 20% of the 3,230 genes described are known to be associated with various diseases, but the function of most of the remaining genes remains to be determined. The results can be used for medical purposes: the search for the genetic causes of certain diseases is best started with this “minimum genetic set”.

The new data currently exist only as a preprint; no peer-reviewed article has been published yet. It is possible that by the time of official publication, after all the reviewers’ comments, the number of genes will change somewhat. It may also change for another reason: who knows, if an even larger set of sequences is taken for analysis, the list of necessary genes may grow. Let's not forget that our genome, like any other, consists not only of coding sequences (that is, those that directly carry information about proteins): DNA also contains many regulatory regions, such as promoters, enhancers, insulators and regions encoding regulatory RNAs, and among them, of course, there are vital ones.

Incidentally, one of the goals of determining the minimal genome is to create an organism literally from scratch. In other words, knowing the gene set of a minimal genome, can we create a living bacterial cell, even if it requires exceptionally favorable conditions? Attempts to do this with bacteria are already under way; someday, perhaps, it will be the turn of humans.

With the development of the natural sciences at the beginning of the 20th century, it became possible to identify the principles of heredity. During the same period, new terms emerged to describe what genes and the human genome are. A gene is a unit of hereditary information responsible for the formation of a particular property in the organism that carries it. In living nature, the transfer of this information is the basis of the entire process of reproduction. The term “gene” was first used by the botanist Wilhelm Johannsen in 1909.

Gene structure

Today it has been established that genes are individual sections of DNA - deoxyribonucleic acid. Each gene is responsible for transmitting data in the human body about the structure of RNA (ribonucleic acid) or protein. As a rule, a gene contains several sections of DNA. The structures that undertake the transmission of hereditary information are called coding sequences. But at the same time, there are structures in DNA that affect the expression of the gene. These regions are called regulatory regions. That is, genes include coding and regulatory sequences that are located separately from each other in DNA.

Human genome

In 1920, Hans Winkler introduced the concept of the genome. At first, the term denoted the set of genes in a single, unpaired (haploid) set of chromosomes characteristic of a biological species. It was thought that the genome fully accounts for all the properties of an organism of a given species. But the meaning of the term later changed somewhat, since studies showed that this definition does not entirely correspond to the truth.

Genetic information

It has been established what genes are, and that the DNA of many organisms contains sequences that do not code for anything. In addition, part of the genetic information is contained in DNA located outside the cell nucleus. Some genes responsible for encoding the same trait may differ significantly in structure. That is, a genome is the collective set of genes contained in the chromosomes and outside them. It characterizes the properties of a certain population of individuals, but the genetic set of each individual organism differs significantly from this genome.

What is the basis of heredity

Many different studies have been conducted in an attempt to determine what genes are, yet it is still impossible to answer this question unambiguously. According to the biological definition of the term, a gene is a DNA sequence containing information about a specific protein. Until recently, this explanation was quite sufficient. But it has now been established that the sequence in which a protein is encoded is not always continuous: it can be interrupted by sections interspersed within it that do not carry any information.
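An interrupted coding sequence can be pictured as follows. In this toy sketch the coding sections (exons) are written in upper case and the interrupting section (intron) in lower case; the sequence, the coordinates and the function name are invented for illustration.

```python
def splice(sequence: str, exons: list[tuple[int, int]]) -> str:
    """Join the coding sections, given as (start, end) positions with the
    end excluded, and drop everything between them."""
    return "".join(sequence[start:end] for start, end in exons)


EXON_1 = "ATGGCC"
INTRON = "gtaagtcag"   # interrupting section that carries no protein information
EXON_2 = "AAATGA"

interrupted_gene = EXON_1 + INTRON + EXON_2
exon_positions = [(0, 6), (15, 21)]

coding_sequence = splice(interrupted_gene, exon_positions)
print(interrupted_gene)
print(coding_sequence)
```

The joined result reads as one continuous coding sequence, even though in the DNA its parts are separated.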

Gene identification

A gene can be identified by a group of mutations, each of which prevents the creation of the corresponding protein. This statement can also be considered correct for interrupted genes, although the properties of their mutation clusters turn out to be much more complex. Still, the statement is quite controversial, since many interrupted genes are found in situations where a thorough genetic analysis is impossible. It was also believed that the genome is quite constant and that changes in its overall structure occur only in extreme cases, specifically only on an extended evolutionary time scale. But this view contradicts recently obtained data showing that certain rearrangements periodically occur in DNA and that the genome has relatively variable components.

Properties of genes identified in Mendel's work

Mendel's work, namely his first and second laws, precisely formulated what genes are and what their properties are. The first law examines the characteristics of an individual gene. The body contains two copies of each gene, that is, in modern terms, it is diploid. One of the two copies of the gene passes to the descendant from the parent through gametes, that is, it is inherited. The gametes combine to form a fertilized egg (zygote), which carries one copy from each parent. Therefore, the body receives one maternal copy of the gene and one paternal copy.

Two-faced aging gene

As is known, human aging is explained not only by the accumulation of problems in the body, but also by the work of certain genes that carry information about aging. The question immediately arises as to why such a gene was preserved in the course of evolution. Why is it needed in the body and what role does it play? Research on this topic was based on breeding a strain of mice lacking the p66Shc protein. Individuals that lacked this protein were not prone to accumulating body fat, aged more slowly, and suffered less from metabolic changes, cardiovascular diseases and diabetes. It turns out that the gene encoding this protein accelerates the aging process. But such results were obtained only in the laboratory. When the animals were then transferred to natural habitats, the population of mutant individuals began to decline. Further research confirmed that the “aging gene” is of great importance in the body's adaptation processes and is responsible for natural energy metabolism in animals.

Richard Dawkins - evolutionary biologist and his "Selfish Gene"

The Selfish Gene, written by Richard Dawkins, is one of the most popular books on evolution. The book takes an unusual point of view: it shows that evolution, or rather natural selection, occurs primarily at the level of genes. Of course, today this is no longer in doubt, but in 1976 such a statement was quite innovative. We are created by our genes. All living things exist to preserve genes. The world of the selfish gene is a world of ruthless exploitation, fierce competition and deception.