Bioinformatics Issues and Challenges: Presentation at S. P. College

Dr N A Ganai
Professor
Centre of Animal Biotechnology
SKUAST-Kashmir

Contents
 Introduction to Bioinformatics
 Complexity of life
 Size of genome
 Exponential growth in information generation
 Why and how to handle this information
 Definition of Bioinformatics
 Databases
 Tools
 Scope of Bioinformatics
 Anticipated benefits
 Ethical, Legal, and Social Issues

DNA is not merely a molecule with a pattern;
it is a code, a language, and an information
storage mechanism

Size of Human Genome
 Each cell carries: 3.2 billion base pairs per haploid genome (about 6 billion in a diploid cell)
 A code you need to write in 500 books, each book of 500 pages
 Length of DNA in an adult human:
 The total length of DNA present in one adult human is calculated as:
 (length of 1 bp)(number of bp per cell)(number of cells in the body)
 = (0.34 × 10⁻⁹ m)(6 × 10⁹)(10¹³)
 ≈ 2.0 × 10¹³ meters
 That is the equivalent of nearly 70 trips from the earth to the sun and back.
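
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the three factors are the slide's own figures, while the Earth-Sun distance (about 1.496 × 10¹¹ m) is an outside value assumed here to check the "nearly 70 trips" comparison.

```python
# Rough estimate of the total DNA length in one adult, using the slide's figures.
BP_LENGTH_M = 0.34e-9    # length of one base pair, in metres (from the slide)
BP_PER_CELL = 6e9        # base pairs per diploid cell (from the slide)
CELLS_IN_BODY = 1e13     # number of cells in the body (from the slide)
EARTH_SUN_M = 1.496e11   # mean Earth-Sun distance in metres (assumption, not on the slide)


def total_dna_length_m(bp_length=BP_LENGTH_M, bp_per_cell=BP_PER_CELL, cells=CELLS_IN_BODY):
    """Multiply the three factors exactly as the slide does."""
    return bp_length * bp_per_cell * cells


length_m = total_dna_length_m()
round_trips = length_m / (2 * EARTH_SUN_M)  # one trip = there and back
print(f"Total DNA length: {length_m:.2e} m")
print(f"Earth-Sun round trips: {round_trips:.0f}")
```

Changing the assumed cell count by a factor of a few changes the answer by the same factor, so the result is an order-of-magnitude estimate rather than a measurement.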

Human Genome Project
• HGP: International research effort
• Began 1990, completed 2003
• Biggest ever project in the life sciences
• About 20 labs participated worldwide
• Next steps for ~30,000 genes
– Function and regulation of all genes
– Significance of variations between people
– Cures, therapies, “genomic healthcare”

From DNA to Cell Function
DNA sequence (split into genes)
→ codes for → Amino acid sequence
→ folds into → Protein
→ has → 3D structure
→ dictates → Protein function
→ determines → Cell activity
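
The first step in this chain, DNA sequence "codes for" amino acid sequence, can be illustrated with a short sketch. It assumes the standard genetic code and a made-up coding sequence read from its first base; the function name `translate` and the example sequence are illustrative only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base sequence: ATG GCC TTT TAA
print(translate("ATGGCCTTTTAA"))
```

The later steps in the chain (folding, function, cell activity) have no comparably simple lookup, which is part of why structure and function prediction remain hard problems.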

Genomics
Transcriptomics
Proteomics
Metabolomics

Year Base Pairs Sequences
1982 680,338 606
1983 2,274,029 2,427
1984 3,368,765 4,175
1985 5,204,420 5,700
1986 9,615,371 9,978
1987 15,514,776 14,584
1988 23,800,000 20,579
1989 34,762,585 28,791
1990 49,179,285 39,533
1991 71,947,426 55,627
1992 101,008,486 78,608
1993 157,152,442 143,492
1994 217,102,462 215,273
1995 384,939,485 555,694
1996 651,972,984 1,021,211
1997 1,160,300,687 1,765,847
1998 2,008,761,784 2,837,897
1999 3,841,163,011 4,864,570
2000 11,101,066,288 10,106,023
2001 15,849,921,438 14,976,310
2002 28,507,990,166 22,318,883
2003 36,553,368,485 30,968,418
2004 44,575,745,176 40,604,319
2005 56,037,734,462 52,016,762
2006 69,019,290,705 64,893,747
2007 83,874,179,730 80,388,382
2008 99,116,431,942 98,868,465
Av. growth in data generation: base pairs increased about 146,000-fold between 1982 and 2008, roughly 1.6 times per year
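
A compound-growth reading of the table can be recomputed from its first and last rows. The sketch below uses only the 1982 and 2008 base-pair counts from the table; treating growth as a constant yearly factor is a simplifying assumption.

```python
import math

# First and last rows of the table (year, base pairs).
first_year, first_bp = 1982, 680_338
last_year, last_bp = 2008, 99_116_431_942

years = last_year - first_year
fold_increase = last_bp / first_bp                # total growth over the period
annual_factor = fold_increase ** (1 / years)      # constant yearly multiplier (assumption)
doubling_time = math.log(2) / math.log(annual_factor)

print(f"Fold increase over {years} years: {fold_increase:,.0f}")
print(f"Average yearly growth factor: {annual_factor:.2f}")
print(f"Doubling time: {doubling_time:.1f} years")
```

A doubling time of about a year and a half is what makes the growth exponential rather than merely fast.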

Exponential Growth in Biological Databases:
High-Throughput Technologies
 PCR: invented in 1983 by Kary Mullis, then an employee of Cetus Corporation, a biotechnology firm in California
 Awarded the Nobel Prize in Chemistry in 1993 for the invention of PCR

 Microarray Technology
 Real-Time PCR
 DNA Chips

Sequencing
 Sanger method : 1977
 Chain Termination Method
 Maxam Gilbert : 1977
 Chemical Modification Method
 Next Generation: mid-2000s onward
 High Throughput
 Parallel sequencing
 Entire genome can be sequenced in a matter of weeks

History of DNA Sequencing
1870 Miescher: Discovers DNA
1940 Avery: Proposes DNA as ‘Genetic Material’
1953 Watson & Crick: Double Helix Structure of DNA
1965 Holley: Sequences Yeast tRNA-Ala
1970 Wu: Sequences Cohesive End DNA
1977 Sanger: Dideoxy Chain Termination; Gilbert: Chemical Degradation
1980 Messing: M13 Cloning
1986 Hood et al.: Partial Automation
1990 Cycle Sequencing; Improved Sequencing Enzymes; Improved Fluorescent Detection Schemes
2002 Next Generation Sequencing; Improved enzymes and chemistry; Improved image processing
Efficiency (bp/person/year), rising over this period: 1; 15; 150; 1,500; 15,000; 25,000; 50,000; 200,000; 50,000,000; 100,000,000,000 (2008)
Adapted from Eric Green, NIH; Adapted from Messing & Llaca, PNAS (1998)

The Genome Sequence is at hand… so?
“The good news is that we have the human genome. The bad news is it’s just a parts list”

What Next???
We need to know every part, its function and application
• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Correlation of SNPs (single-base DNA variations among individuals) with health and disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology including microbial consortia useful for environmental restoration
• Developmental genetics, genomics

What is Bioinformatics?
 One of the newest and fastest-growing specialties in the life sciences, integrating biotechnology and computer science.
 Computers are used to collect, analyze, and interpret biological information at the molecular level.
 Bioinformatics encompasses a set of software tools that aid in:
 molecular sequence analysis
 structural analysis
 functional analysis
of genes and genomes and their corresponding products

Goal of Bioinformatics
 Understand a living cell and how it functions at the molecular level
 Develop databases and computational tools
 Use these tools to mine (analyze) databases and generate knowledge to better understand living systems

Biological Databases: Why
 Why?
 Store all the data (information) related to genomics, transcriptomics, proteomics, and metabolomics in databases
 Make biological data available to scientists
 Make biological data available in computer-readable form
 Types of Databases
 Primary Databases: Store raw DNA/RNA and protein data submitted by scientists
 GenBank: by NCBI, USA www.ncbi.nlm.nih.gov/genbank/
 EMBL: European www.ebi.ac.uk/embl/
 DDBJ: Japan www.ddbj.nig.ac.jp/
 PDB: Protein Data Bank http://www.rcsb.org/pdb/home/home.do
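
To make "computer-readable form" concrete, here is a minimal sketch that reads sequence records in FASTA format, a widely used plain-text sequence format. The slides do not name a file format, so the choice of FASTA, the function names, and the two example records are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Return {record_id: sequence} from FASTA-formatted text."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the identifier is the first word after '>'.
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(sequence):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Two made-up records, not real database entries.
example = """>seq1 hypothetical example record
ATGGCC
TTTTAA
>seq2 another made-up record
GGGCCC
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence), f"GC={gc_content(sequence):.2f}")
```

Because every record follows the same simple layout, the same few lines work whether a file holds two sequences or millions, which is the practical meaning of storing data in a computer-readable form.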

Databases … cont.
Secondary databases: Contain computationally processed or manually curated information based on primary databases.
 Swiss-Prot: Curated protein database www.ebi.ac.uk/swissprot
 TrEMBL: Translated nucleic acid sequences in EMBL
 PIR: Annotated protein sequences
 UniProt: Combined database of Swiss-Prot, TrEMBL, PIR
 PROSITE
 PRINTS
 BLOCKS
 Pfam
Specialized databases: Cater to a particular research interest
 FlyBase
 HIV Sequence Database
 Ribosome database
 OMIM
 Microarray gene expression database
 ExPASy, etc.

We need Bioinformatics Tools…
To mine (analyze) databases to generate knowledge to better understand living systems
 Search/compare databases
 Sequence Analysis
 Genomics
 Phylogenetics
 Structure Prediction
 Molecular Modelling
 Microarrays
 Packages, Misc Apps, Graphics, Scripts

Examples of Bioinformatics Tools
 Database interfaces (Search Tools)
 GenBank/EMBL/DDBJ, Medline, Swiss-Prot, PDB, …
 Sequence alignment
 BLAST, FASTA (Fast All)
 Multiple sequence alignment
 Clustal, MultAlin, DiAlign
 Gene finding
 Genscan, GenomeScan, GeneMark, GRAIL
 Protein Domain analysis and identification
 Pfam, BLOCKS, ProDom
 Pattern Identification/Characterization
 Gibbs Sampler, AlignACE, MEME
 Protein Folding prediction
 PredictProtein, SWISS-MODEL
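
As an illustration of what the sequence alignment tools in this list compute, the sketch below scores the best local alignment between two sequences with the Smith-Waterman dynamic programming recurrence. This is not how BLAST or FASTA are implemented: both use faster heuristics that approximate this kind of search. The scoring values (match +2, mismatch -1, gap -2) and the function name are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current_row[j] = max(
                0,                                  # start a new local alignment
                previous_row[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous_row[j] + gap,               # gap in b
                current_row[j - 1] + gap,            # gap in a
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best


# Made-up sequences sharing the short motif "ACG".
print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

The full recurrence takes time proportional to the product of the two sequence lengths, which is why database-scale searches rely on heuristic tools rather than exhaustive comparison.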

Five websites that all biologists should bookmark
 NCBI (The National Center for Biotechnology Information)
 http://www.ncbi.nlm.nih.gov/
 EBI (The European Bioinformatics Institute)
 http://www.ebi.ac.uk/
 The Canadian Bioinformatics Resource
 http://www.cbr.nrc.ca/
 SwissProt/ExPASy (Swiss Bioinformatics Resource)
 http://expasy.cbr.nrc.ca/sprot/
 PDB (The Protein Databank)
 http://www.rcsb.org/PDB/

Anticipated Benefits of Genome Research & Bioinformatics
Molecular Medicine: Gene Testing, Pharmacogenomics, Gene Therapy
 improve diagnosis of disease
 detect genetic predispositions to disease
 create drugs based on molecular information
 use gene therapy and control systems as drugs
 design “custom drugs” (pharmacogenomics) based on individual genetic profiles
Microbial Genomics
 rapidly detect and treat pathogens in clinical practice
 develop new energy sources (biofuels)
 monitor environments to detect pollutants
 protect citizenry from biological and chemical warfare
 clean up toxic waste safely and efficiently

Benefits (continued)
DNA Identification (Forensics)
 identify potential suspects whose DNA may match evidence left at crime scenes
 exonerate persons wrongly accused of crimes
 establish paternity and other family relationships
 identify endangered and protected species as an aid to wildlife officials (could be
 detect bacteria and other organisms that may pollute air, water, soil, and food
 match organ donors with recipients in transplant programs
 determine pedigree for seed or livestock breeds

Benefits (continued)
Agriculture, Livestock Breeding, and Bioprocessing
 grow disease-, insect-, and drought-resistant crops
 breed healthier, more productive, disease-resistant farm animals
 grow more nutritious produce
 develop biopesticides
 incorporate edible vaccines into food products
 develop new environmental cleanup uses for plants like tobacco

ELSI: Ethical, Legal, and Social Issues
• Privacy and confidentiality of genetic information.
• Fairness in the use of genetic information by insurers, employers, courts, schools, adoption agencies, and the military, among others.
• Psychological impact, stigmatization, and discrimination due to an individual’s genetic differences.
• Reproductive issues including adequate and informed consent and use of genetic information in reproductive decision making.
• Clinical issues including the education of doctors and other health-service providers, people identified with genetic conditions, and the general public about capabilities, limitations, and social risks; and implementation of standards and quality-control measures.
• Health and environmental issues concerning genetically modified (GM) foods and microbes.
• Commercialization of products including property rights (patents, copyrights, and trade secrets) and accessibility of data and materials.

Common Questions of a Student of Biology
