Introduction to Bioinformatics
Dr. Huxley Makonde,
Pure & Applied Sciences,
Technical University of Mombasa
Purpose:
• To evaluate, use and apply algorithms and computational tools to solve biological
problems.
Course Learning Outcomes:
By the end of the course unit, the student should be able to:
• Use bioinformatics tools to process biological data.
• Apply basic computational tools to extract data from biological databases.
• Use various bioinformatics software/tools to conduct basic analyses on sequence
data.
Course description:
Overview of Bioinformatics. Nature of biological data. Major Bioinformatics
Resources. Literature databases. Nucleic acid sequence databases. Protein
sequence databases. Database searches. Query engines. Exploring EMBOSS and
OMIM. Sequence comparisons & alignment concepts. Sequence alignment
methods. Scoring matrices. Overview of BLAST. File formats. Alignment algorithms.
Applications of multiple sequence alignment (MSA). Molecular phylogeny. Clustering
techniques. Bootstrapping. Derived data and derived databases. Exploring various
databases at InterPro. Analysis of macromolecular sequences. Applications of
various tools for protein sequence analysis.
Teaching methodology:
• Lectures/tutorials, discussions, assignments, demonstrations and practicals.
Instructional Materials/Equipment:
• Chalk/whiteboard, charts, handouts, slide and overhead/LCD projectors,
duster, computer, laboratory chemicals, equipment, glassware and biological
material.
Course Assessment:
• Continuous Assessments and End of Semester examination.
• Continuous Assessments 30% and final Written Examination 70%
Course Textbooks:
• Lesk, A.M., 2014. Introduction to Bioinformatics. 4th ed. Oxford University Press.
ISBN-10: 0199651566, ISBN-13: 978-0199651566.
• Yang, Z., 2014. Molecular Evolution: A Statistical Approach. Oxford University
Press. ISBN-10: 0199602611, ISBN-13: 978-0199602612.
• Choudhuri, S., 2014. Bioinformatics for Beginners: Genes, Genomes, Molecular
Evolution, Databases and Analytical Tools. 1st ed. Academic Press. ISBN-10:
0124104711, ISBN-13: 978-0124104716.
Course Journals:
• BMC Bioinformatics; ISSN 1471-2105
• Journal of Bioinformatics and Computational Biology; ISSN 0219-7200
• Journal of Proteomics & Bioinformatics; ISSN 0974-276X
Course Textbooks for further reading:
• Compeau, P., Pevzner, P., 2015. Bioinformatics Algorithms: An Active
Learning Approach. 2nd ed. Active Learning Publishers. ISBN-10:
0990374629, ISBN-13: 978-0990374626.
• Lesk, A.M., 2012. Introduction to Genomics. 2nd ed. Oxford University
Press. ISBN-10: 0199564353, ISBN-13: 978-0199564354.
• Brown, S.M., 2013. Next-Generation DNA Sequencing Informatics. 1st ed.
Cold Spring Harbor Laboratory Press. ISBN-10: 1936113872, ISBN-13:
978-1936113873.
Course Journals for further reading:
• Evolutionary Bioinformatics; ISSN 1176-9343
• The Open Bioinformatics Journal; ISSN 1875-0362
• Journal of Integrative Bioinformatics; ISSN 1613-4516
What is Bioinformatics?
• Bioinformatics is the use of computers to collect, store, distribute,
retrieve, manage and analyze large sets of biological data in order to
formulate hypotheses, build models of the underlying biological
processes and answer biological questions.
• It involves bulk data storage, data mining and analysis.
• As computational biology, it involves the development of algorithms and
statistical models to analyze biological data.
NB: Highly repetitive and mathematically complex tasks are better
handled by computers.
• It helps us to understand the code of the genetic material (DNA).
Massive amounts of DNA sequence have been generated by different
sequencing platforms, and this information needs to be used for the
benefit of mankind.
• DNA is the basic molecule of life that directly controls the
fundamental biology of living things. It carries the genes, which code for
the proteins that determine the biological make-up of every living organism.
Variations (errors) that arise in genomic DNA over the course of evolution
influence an individual's likelihood of developing diseases or of being
resistant to them.
• The ultimate goal of bioinformatics is to uncover the wealth of
biological information hidden in the mass of sequences, structures,
literature and other biological data, to gain clear insight into the
fundamental biology of organisms, and to use this knowledge to improve
the quality of life of mankind.
• Bioinformatics is used now, and will be in the foreseeable future, in
molecular medicine to develop better and more personalized treatments to
prevent or cure disease.
• It also has beneficial applications in other fields such as the environment,
energy, biotechnology and agriculture.
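The DNA-to-protein relationship described above can be made concrete with a short translation sketch. The Python code below is a minimal illustration, assuming the standard genetic code, a coding sequence read from its first base, and only the unambiguous bases A, C, G and T (an ambiguous base such as N raises a KeyError). The function name `translate` and the example sequence are illustrative choices; it ignores introns and alternative genetic codes, which libraries such as Biopython handle.

```python
# Standard genetic code. Codons are enumerated with the bases in the order
# T, C, A, G at each position (TTT, TTC, TTA, TTG, TCT, ...); "*" = stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate a DNA coding sequence, read from its first base, into a protein."""
    dna = dna.upper()
    protein = []
    # Read one codon (three bases) at a time; an incomplete final codon is ignored.
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]  # KeyError for ambiguous bases such as N
        if amino_acid == "*" and stop_at_stop:
            break  # translation ends at the first stop codon
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # prints MAIVMGR
```

Passing stop_at_stop=False translates through stop codons (marked "*"), which is useful for seeing how a single changed codon alters the whole protein product.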
Bioinformatics is an interdisciplinary subject
Scope of bioinformatics
• Storage and retrieval of biological data
• Molecular structure manipulation: visualization and analysis,
classification, prediction
• Sequence analysis: sequence alignments, database searches,
domain and motif detection
• Genomics: mapping, annotation, comparative genomics
• Phylogeny
• Functional genomics: transcriptome, proteome and interactome analysis
• Analysis of biochemical networks: metabolic networks, regulatory
networks
• Systems biology: modeling and simulation of dynamical systems
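Storage and retrieval of sequence data usually start from standard file formats. As an illustration, here is a minimal Python sketch that reads sequences stored in the widely used FASTA format (a ">" header line followed by one or more sequence lines) and computes one simple property, GC content. The function names and example records are hypothetical; established libraries such as Biopython's SeqIO module provide more robust parsers.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {record_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close the previous record
            # The ID is the first word after '>'; the rest of the header is a description.
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            # A sequence may be wrapped over several lines; lines before any header are ignored.
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of bases in a nucleotide sequence that are G or C."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


example = """>seq1 hypothetical example record
ATGCGC
GCTA
>seq2
ATATAT
"""

for record_id, seq in parse_fasta(example).items():
    print(record_id, len(seq), round(gc_content(seq), 2))
```

Storing records in a dictionary keyed by ID makes retrieval of any single sequence a direct lookup, which mirrors (on a tiny scale) how sequence databases are queried by accession.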
Areas of Application of bioinformatics
• Research in biology: understanding the molecular organization of the
cell/organism, development and the mechanisms of evolution
• Medicine: diagnosis of cancers, detection of genes involved in cancer
• Pharmaceutical research: mechanisms of drug action, drug target
identification
• Biotechnology: gene therapy, bioengineering
• Agriculture: improvement of food crops, insect and herbicide
resistance, improved nutritional quality
• Comparative studies: comparison of the genomes of different species
• Forensic sciences: DNA sequencing may be used along with DNA
profiling methods for forensic identification and paternity testing
Biological information
Dry-lab vs. Wet-lab Experiments
Key areas of Bioinformatics
Some Sequencing technologies
The risks of inference
• Any analysis of massive data will unavoidably generate a certain rate
of errors (false positives and false negatives).
• Good research and development includes an evaluation of the
error rates.
• Good methods minimize the error rate.
• There is a trade-off between sensitivity and specificity: making a method
more sensitive (fewer false negatives) usually makes it less specific (more
false positives), and vice versa.
Bioinformatics is a science of inference.
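The sensitivity/specificity trade-off can be illustrated with a small Python sketch. It assumes a hypothetical list of prediction scores whose true status is already known (for example, a benchmark set of experimentally validated cases) and counts true/false positives and negatives at several score cut-offs, using sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The scores and labels are invented for illustration only.

```python
from typing import List, Tuple


def confusion_counts(scores: List[float], labels: List[bool], threshold: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) when every score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for score, is_true in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_true:
            tp += 1
        elif predicted and not is_true:
            fp += 1
        elif not predicted and not is_true:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(scores: List[float], labels: List[bool], threshold: float) -> Tuple[float, float]:
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    sensitivity = tp / (tp + fn)  # fraction of real positives that are detected
    specificity = tn / (tn + fp)  # fraction of real negatives that are rejected
    return sensitivity, specificity


# Hypothetical prediction scores with their true status (True = real positive).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [True, True, False, True, True, False, True, False, False, False]

for threshold in (0.25, 0.5, 0.75):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
# threshold=0.25  sensitivity=1.00  specificity=0.40
# threshold=0.50  sensitivity=0.80  specificity=0.60
# threshold=0.75  sensitivity=0.40  specificity=0.80
```

Raising the cut-off removes false positives (higher specificity) at the price of missing real positives (lower sensitivity), which is exactly the trade-off the slide describes.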
Why bioinformatics then?
• In most cases, wet-lab biology will be required afterwards to validate the
predictions.
• Bioinformatics can:
  - reduce the data to a small set of testable predictions;
  - assign a degree of confidence to each prediction.
• The biologist will often have to choose the appropriate degree of
confidence, depending on the trade-off between:
  - the cost of validating predictions, and
  - the benefit expected from correct predictions.
• Bioinformatics as in silico biology:
  - It allows the exploration of domains that cannot be addressed experimentally,
e.g. the study of past evolutionary events.
  - Phylogenetic inference and comparative genomics give us insights into the
mechanisms of evolution and into past evolutionary events.
  - The time scale of these events is, however, so large (billions of years) that
the inferred events cannot conceivably be reproduced by experimental methods.
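The biologist's choice of confidence level can be formalized, under simplifying assumptions, as an expected-value rule: if a prediction has an estimated probability p of being correct, validating it costs c, and a confirmed prediction is worth b, then validation pays off on average when p * b > c, i.e. when p > c / b. The Python sketch below applies that rule to a hypothetical list of predictions; the gene names, probabilities, cost and benefit values are invented, and a real project would also weigh budget limits and the cost of missed predictions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    name: str          # hypothetical identifier of a predicted gene or site
    confidence: float  # estimated probability that the prediction is correct


def worth_validating(predictions: List[Prediction], cost: float, benefit: float) -> List[Prediction]:
    """Keep predictions whose expected benefit (confidence * benefit) exceeds the validation cost."""
    cutoff = cost / benefit  # minimum confidence at which validation pays off on average
    selected = [p for p in predictions if p.confidence > cutoff]
    # Validate the most confident predictions first.
    return sorted(selected, key=lambda p: p.confidence, reverse=True)


candidates = [
    Prediction("gene_A", 0.92),
    Prediction("gene_B", 0.15),
    Prediction("gene_C", 0.48),
    Prediction("gene_D", 0.71),
]

# Assumed values: each wet-lab validation costs 200 units; a confirmed hit is worth 500.
for p in worth_validating(candidates, cost=200, benefit=500):
    print(p.name, p.confidence)
```

With these assumed numbers the cut-off is 0.4, so gene_B is not worth testing; a more expensive experiment raises the cut-off and shortens the list.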
What bioinformatics can do
• Development of computational tools, i.e. writing software and
programs
• Application of these tools to generate biological knowledge (data
analysis and inference)
• Creation of databases
• Molecular sequence analysis, i.e. performing the following analyses:
  - alignment, database searching, motif and pattern discovery, gene and
promoter finding, genome assembly and comparison.
• Molecular structural analysis, such as protein and nucleic acid
structure analysis, comparison, classification and prediction.
• Molecular functional analysis, i.e.:
  - gene expression profiling, protein-protein interaction prediction and
protein sub-cellular localization prediction.
• NB: These aspects interact to produce integrated results.
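Sequence alignment, the first analysis in the list above, is usually computed by dynamic programming. Below is a minimal Python sketch of global alignment in the style of the Needleman-Wunsch algorithm, returning only the optimal score (no traceback). The scoring scheme (match +1, mismatch -1, linear gap penalty -2) is an arbitrary assumption for illustration; practical tools use substitution matrices such as BLOSUM or PAM for proteins, and usually affine gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b (Needleman-Wunsch, linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATCA"))  # prints 1 (5 matches, 2 gaps)
```

Filling the table costs time proportional to len(a) * len(b), which is why heuristic tools such as BLAST are used for searching large databases instead of exhaustive dynamic programming.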
Bioinformatics limitations
• Relying completely on the information is dangerous if the information
is inaccurate.
• The quality of a bioinformatics prediction depends on (i) the quality of the
data and (ii) the sophistication of the algorithms.
• Bioinformatics and experimental biology are complementary; thus,
bioinformatics results need to be consistent with experimental biology.
• Sequence data contain errors.
• Downstream interpretation of sequence data will be wrong if the
sequence or its annotation is wrong.
• Many algorithms lack the capability and sophistication to truly reflect
reality.
• The outcome of a computation also depends on the available computing power.
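The point that sequence data contain errors can be made quantitative with Phred quality scores: a base call with quality Q has an estimated error probability P = 10^(-Q/10), so Q = 20 means a 1 in 100 chance that the base is wrong. The Python sketch below decodes a FASTQ quality string, assuming the Phred+33 encoding used by Sanger and Illumina 1.8+ data, and estimates the expected number of wrong bases in a read. The function names and the quality string are hypothetical.

```python
from typing import List


def phred_to_error_prob(q: int) -> float:
    """Estimated probability that a base call with Phred quality q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 encoding by default)."""
    return [ord(char) - offset for char in quality_string]


def expected_errors(quality_string: str, offset: int = 33) -> float:
    """Expected number of wrong base calls in a read: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in decode_qualities(quality_string, offset))


# Hypothetical quality line: 'I' encodes Q40, '5' encodes Q20 and '+' encodes Q10.
quals = "IIIII55555++"
print(decode_qualities(quals))
print(round(expected_errors(quals), 4))
```

Here the two low-quality bases (Q10) contribute most of the expected 0.25 errors, which is why reads are often trimmed or filtered on quality before downstream interpretation.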

Introduction to Bioinformatics_BTMB_2018.ppt
