Several of our Computer Science students participated in the Student Research Symposium held March 9th and 10th. Three students presented research posters that were reviewed by a judging committee, and one student gave a research presentation before a judging panel. We are excited that our computer science students are getting involved in campus-wide activities and commend each of them for a job well done. Their research abstracts are below.
Host Prediction for Viral Metagenomes Using Oligonucleotide Profiles
Michiyo Wellington-Oguri, Computer Science
Robert Edwards, Computer Science
Metagenomic sequencing of virus-like particles has identified large numbers of viral sequences with no known homologs. The role of these viruses remains elusive, especially in environments where many potential hosts co-occur. Here, we use a Random Forest to classify viral sequence fragments by host, based on the frequencies with which oligonucleotides occur in their genomes. Our approach classifies viral sequences with 11% to 91% precision, depending on sequence length, sequencing error rate, and the required confidence level. We use this tool to analyze viral gut metagenomes and compare the predicted host distributions with the corresponding bacterial metagenomes.
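For readers curious what an "oligonucleotide profile" looks like, here is a minimal sketch of how a sequence fragment can be turned into a fixed-length frequency vector that a classifier such as a Random Forest can consume. The oligonucleotide length `k = 4` and the function name `kmer_profile` are illustrative assumptions. The abstract does not say which lengths were used, and this is not the authors' code.

```python
from itertools import product

def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Return normalized frequencies of all 4**k DNA k-mers in seq (lexicographic order).

    Hypothetical helper for illustration; k=4 is an assumed default.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    # Slide a window of length k along the sequence and count each k-mer
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other ambiguous bases
            counts[j] += 1
    total = sum(counts)
    # Normalize so fragments of different lengths are comparable
    return [c / total for c in counts] if total else [0.0] * len(kmers)
```

Every fragment maps to a vector of the same length (256 for k = 4). Vectors from viruses with known hosts could then train a classifier, for example scikit-learn's `RandomForestClassifier`, with the host as the label. This sketch keeps forward and reverse-complement k-mers separate. Merging them is a common alternative when strand orientation is unknown.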
Ocean Surveying Software
Brian Hudson, Computer Science
Robert Edwards, Computer Science
Real-time data collection from remote sensing equipment is a continual challenge for field biologists. The volume of data, and the number of instruments recording simultaneously at the field site, demand careful computational coordination to ensure data integrity and accuracy. The ATRIS system developed by the US Geological Survey continuously records photographic images of the seabed, three-dimensional bathymetric data, GPS position, and motion data from the host vehicle. Here we present a new application that integrates the data streams from these remote sensors and displays the information to the user. The data streams are coordinated and stored for future use. All of the software was developed in C++ and will be released under an open source license.
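Coordinating several sensor streams usually comes down to ordering readings on a shared clock and pairing each record with the nearest reading from another sensor. The sketch below illustrates that idea. It assumes all sensors are timestamped in seconds on a common clock, and it uses hypothetical names (`Reading`, `merge_streams`, `nearest_fix`). It is written in Python for brevity and does not reproduce the C++ application described above.

```python
import heapq
from bisect import bisect_left
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    t: float        # seconds on an assumed shared survey clock
    sensor: str     # e.g. "camera", "sonar", "gps", "motion"
    value: object   # payload: image reference, depth, position, etc.

def merge_streams(*streams):
    """Merge per-sensor streams (each already sorted by time) into one time-ordered list."""
    return list(heapq.merge(*streams, key=lambda r: r.t))

def nearest_fix(gps, t):
    """Return the GPS reading closest in time to t; gps must be non-empty and sorted by t."""
    times = [r.t for r in gps]
    i = bisect_left(times, t)
    # The closest fix is either just before or just after position i
    candidates = gps[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda r: abs(r.t - t))
```

For example, each camera frame could be tagged with a position via `nearest_fix(gps_stream, frame.t)` before it is stored. `heapq.merge` is used because each sensor already emits readings in time order, so the merge is a single linear pass and needs no full re-sort.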
Database Structure and Visualization Software for the Viral Dark Matter Project
Nicholas Turner, Computer Science
Robert Edwards, Computer Science
Viruses in the environment play a large role in community nutrient cycles through complex interactions with their hosts, either causing host death or modifying host metabolism. These interactions, however, are poorly understood, because on average only about 10% to 30% of viral proteins have similarity to any known protein. To address this, we are functionally characterizing these proteins by how they affect host physiology. We use two tools that address different aspects of metabolism: metabolomics and phenotype microarrays. Metabolomics gives a picture of how the expressed protein affects the cell's overall metabolism, while phenotype microarrays show how the protein affects the cell's ability to use specific compounds. These datasets are information-rich and, by nature, extremely large. To parse, organize, compare, and visualize the data, we used computational methods in the form of parsing software, database input/output, and visualization software. The phenotype microarray data are exported from the plate reader and uploaded to a web-based application. The application first parses the data and annotates each measurement with relevant information, such as its measurement time. The data are then loaded into a database together with extensive supplementary information. Most importantly, the database maps the key relationships among the data and links the information needed by various visualization methods. This pipeline forms the basis on which subsequent metabolic modeling software can predict protein function.
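The parse-annotate-store step described above can be illustrated with Python's built-in `csv` and `sqlite3` modules. This is a minimal sketch, not the project's actual schema. It assumes a hypothetical plate-reader export with a `Time` column in minutes followed by one column per well, and the table and function names (`reads`, `load_plate_reads`, `growth_curve`) are invented for illustration.

```python
import csv
import io
import sqlite3

def load_plate_reads(conn, plate_id, text):
    """Parse a plate-reader export and store one row per (well, time) measurement.

    Assumes a hypothetical CSV layout: 'Time' (minutes) plus one column per well.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reads (
               plate TEXT, well TEXT, time_min REAL, value REAL,
               PRIMARY KEY (plate, well, time_min))"""
    )
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        t = float(rec.pop("Time"))          # measurement time attached to every reading
        for well, value in rec.items():
            rows.append((plate_id, well, t, float(value)))
    conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def growth_curve(conn, plate_id, well):
    """Return (time, value) pairs for one well, ordered by time, ready for plotting."""
    cur = conn.execute(
        "SELECT time_min, value FROM reads WHERE plate = ? AND well = ? ORDER BY time_min",
        (plate_id, well),
    )
    return cur.fetchall()
```

With an in-memory database, `load_plate_reads(sqlite3.connect(":memory:"), "plate-1", export)` stores one row per well per time point. `growth_curve` then retrieves a single well's time series for visualization. Storing measurements in this "long" form makes it easy to join them later against tables describing the strain, expressed protein, or compound in each well.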
Application of Singular Value Decomposition in Feature Extraction Using Mammogram Recognition
Saifuddin Tariwala, Computer Science
Roman Swiniarski, Computer Science
This paper presents an application of singular value decomposition (SVD) to feature extraction and selection in mammogram recognition systems. Mammographic images were classified into two categories: normal and cancerous. Feature extraction methods based on SVD were investigated. The feature patterns were reduced and selected using principal component analysis (PCA) and rough sets, with rough set methods applied to the final selection of pattern features. Mammogram recognition is one of the most difficult pattern recognition tasks, and several techniques have been developed for it. One of the most prominent is based on applying SVD to extract features from mammographic images. Feature extraction, pattern forming, and classifier design for a given data set are among the most difficult problems in data mining. The goal of feature extraction and selection is to find the smallest, most representative pattern for processing and classification. For example, the one- and two-dimensional Fourier transforms can represent a pattern in a space that is better suited to classification.
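A common way to use SVD for image features is to take an image's largest singular values as its feature vector, then reduce a set of such vectors with PCA. The sketch below shows that generic approach with NumPy. The number of singular values `k`, the sum-to-one scaling, and the function names are assumptions for illustration rather than details from the abstract. The rough set selection stage is not shown.

```python
import numpy as np

def svd_features(image: np.ndarray, k: int) -> np.ndarray:
    """Return the k largest singular values of a grayscale image, scaled to sum to 1."""
    # compute_uv=False returns only the singular values, in descending order
    s = np.linalg.svd(image.astype(float), compute_uv=False)
    top = s[:k]
    total = top.sum()
    return top / total if total > 0 else top

def pca_reduce(features: np.ndarray, m: int) -> np.ndarray:
    """Project row-wise feature vectors onto their first m principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:m].T

# Usage with random arrays standing in for mammogram regions of interest
rng = np.random.default_rng(0)
X = np.vstack([svd_features(rng.random((64, 64)), k=20) for _ in range(10)])
Z = pca_reduce(X, m=5)
print(X.shape, Z.shape)  # (10, 20) (10, 5)
```

Singular values are unchanged when an image is rotated by an orthogonal transform, which is one reason they are attractive as compact descriptors. The reduced vectors `Z` would then go to a selection step (rough sets, in the work above) and finally to a normal/cancerous classifier.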