CIS 455: Bioinformatics (Fall 2005)
Term Project
You are required to do a term project in this class. Your project shall be an implementation of an algorithm in bioinformatics. The topics are not confined to the course syllabus. You should create and type in your own code; any copying (electronic or otherwise) of another person's code or code fragments is a violation of the Academic Ethical Standards. You need to write documentation for your project. The documentation should have the following components:
- a cover page stating the title of the project, your name, the course name, the semester, the instructor's name, and the due date;
- a description of your project that includes an introduction, major steps of the algorithm, and what is achieved;
- a printout of your code or scripts that should be properly documented and indented.
You need to turn in one copy of this documentation as well as copies of the primary journal/conference articles that you used. You are also required to show a demo of your implementation to me before/when you turn in the above materials. Matlab is the recommended programming language for your project. If you want to use another language, it must be approved by the instructor.
Topic Proposal
As the class moves on, I will put a few potential project topics on this page. You can also select a different topic that interests you. However, all topics must be approved by the instructor. Whether you select a topic from this page or choose your own, you are required to turn in a topic proposal that includes a title, a short abstract, and a bibliography. The bibliography must include at least the papers that describe the algorithm you want to implement.
If you are interested in any of the following topics, be sure to see me for a discussion and I will have more detailed information for you.
- Potential topics
- Multifactor Dimensionality Reduction (MDR): MDR is a nonparametric and genetic model-free alternative to logistic regression for detecting and characterizing nonlinear interactions among discrete genetic and environmental attributes. The MDR method combines attribute selection, attribute construction, and classification with cross-validation and permutation testing to provide a comprehensive and powerful data mining approach to detecting nonlinear interactions. See more details about MDR at http://www.epistasis.org/mdr.html
- Design a project in which you use MDR to solve some bioinformatics problem. (Ref1, Ref2)
- A real project from Prof. Eli Stahl in the Department of Biology: The project is to design and populate a database that integrates diverse biological data related to plant drought genetics (gene annotations, DNA and amino acid sequences, genetic markers, plant strains, geographical and climatological data, microarray gene expression data, results of other bioinformatic analyses), to design query tools that generate integrated datasets, and to begin analyses that integrate them statistically and/or visually. This project can be divided into three sub-projects as follows, and each student can work on one of them.
- One is to cross-reference a large number of marker locations with gene coordinates (CDS, exon/intron positions).
- Another is to analyze microarray data, to cluster genes with similar expression patterns, and then to look for over-represented motifs in their promoters as well as patterns in the gene descriptions.
- A third extends the analysis: beyond the microarray data Prof. Stahl has on hand, more data are available. The motifs/patterns discovered can also be searched for in different gene sets from genetic candidate regions. Further analyses would integrate motif/pattern data with other data in non-parametric multivariate correlational analyses.
Note: Part of the project is to develop cgi-bin scripts that do these things through a web interface. If you want to do one of these projects, you will need to meet with Prof. Stahl for discussion several times during the semester.
- Microarray data analysis: Gene expression profiles from microarray data can be used to research the function of cells, compare the differences between healthy and diseased tissue, and observe changes with the application of drugs.
- Phylogenetic analysis: Phylogenetic analysis is the process you use to determine the evolutionary relationships between organisms. The results of an analysis can be drawn in a hierarchical diagram called a cladogram or phylogram (phylogenetic tree). The branches in a tree are based on the hypothesized evolutionary relationships (phylogeny) between organisms. Each member in a branch, also known as a monophyletic group, is assumed to be descended from a common ancestor. Originally, phylogenetic trees were created using morphology, but now, determining evolutionary relationships includes matching patterns in nucleic acid and protein sequences.
- Sequence analysis: Sequence analysis is the process you use to find information about a nucleotide or amino acid sequence using computational methods. Common tasks in sequence analysis are identifying genes, determining the similarity of two genes, determining the protein coded by a gene, and determining the function of a gene by finding a similar gene in another organism with a known function.
- The project can also be studying a single paper, a book chapter or a web resource in depth and implementing/extending a related algorithm. The following includes several examples.
- Potential resources to find your own topic
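To give a feel for the scale of a suitable project, the attribute-construction step of MDR (the first potential topic above) can be sketched as follows. This is only an illustrative sketch, not the reference MDR implementation: it is written in Python for brevity (the course recommends Matlab, and another language needs instructor approval), the function names `mdr_construct` and `training_error` and the toy data are hypothetical, and the default threshold (the overall case:control ratio) is an assumption. Cross-validation and permutation testing, which a full MDR analysis also requires, are left out.

```python
from collections import defaultdict


def mdr_construct(genotypes, labels, threshold=None):
    """Collapse multilocus genotypes into one high-/low-risk attribute.

    genotypes: list of tuples, one tuple of discrete attribute values per subject
    labels:    list of 0/1 values (0 = control, 1 = case), same length
    threshold: case:control ratio above which a genotype cell is high risk;
               defaults to the overall case:control ratio (an assumption here)
    Returns a dict mapping each observed cell to 1 (high risk) or 0 (low risk).
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if threshold is None:
        # Assumes the data set contains at least one control.
        threshold = n_cases / n_controls

    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for cell, y in zip(genotypes, labels):
        counts[cell][y] += 1

    # Compare cases against threshold * controls instead of dividing,
    # so cells with zero controls do not cause a division by zero.
    return {
        cell: int(cases > threshold * controls)
        for cell, (controls, cases) in counts.items()
    }


def training_error(genotypes, labels, risk):
    """Fraction of subjects whose cell's risk label disagrees with their status."""
    wrong = sum(risk[cell] != y for cell, y in zip(genotypes, labels))
    return wrong / len(labels)


# Toy data: two binary attributes whose effect is purely interactive (XOR),
# so neither attribute alone separates cases from controls.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
labels = [0, 0, 1, 1, 1, 1, 0, 0]

risk = mdr_construct(genotypes, labels)
print(risk)
print(training_error(genotypes, labels, risk))
```

In a full project, this step would be repeated for every candidate combination of attributes, with the classification error estimated on held-out folds rather than on the training data.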
Presentation
You need to do a presentation on your term project. One goal is to introduce your work to other students in this class. Please see here for a set of hints for giving a good talk. Follow these rules when you do your presentation.
Grading
My evaluation of your project will, to some extent, be subjective. However, there are certain rules for the grading. You will receive a numerical score for your presentation and project. The maximum scores for them are as follows:
- presentation: 10 points total
- term project: 10 points total
You are required to use a spell checker to make sure that the spelling, punctuation, and grammar in your final term project documentation are correct. You should also check the overall readability of your writeup.
Note that plagiarism will not be tolerated; if you feel the need to include portions of a textbook or article in your paper, remember to attribute them properly.
There will be a 10% penalty for each day that your topic proposal or final paper/project is late.