Several of our Computer Science students participated in the Student Research Symposium held March 4th and 5th. Three of them gave oral research presentations before a judging panel, and one presented a research poster that was reviewed by a judging committee. We are excited that our computer science students are getting involved in campus-wide activities and research opportunities, and we commend each of them for a job well done. Their research abstracts are below. We would also like to give a special shout-out to Yun Trinh, who received one of the ten President's Awards for Oral Presentations. He was awarded $500 and invited to represent SDSU at the annual California State University Research Competition.

 

Automatic Detection and Classification of Toothed Whale Echolocation Clicks in Diverse Long-Term Recordings

Scott Lindeneau, Computer Science (M)

Marie Roch, Computer Science

When extracting features for machine learning algorithms, great attention must be paid to the quality of the features extracted. This work demonstrates the effectiveness and applicability of several ambiguity-reducing techniques in the selection and extraction of features for toothed whale echolocation clicks from long-term data collections across a variety of recording environments in the Southern California Bight. The ability to automatically detect and then classify many species in many environments shows promise for automating the time-intensive process of manual annotation. We present classification results for a variety of species on a 4 TB dataset of recordings from multiple locations and conditions in the Southern California Bight, drawn from the dataset of the 2015 International Workshop on Detection, Classification, Localization, and Density Estimation. Echolocation clicks are identified by a multi-stage detection process that flags potential clicks by their Teager energy and validates the detections using spectral content and timing characteristics. Click spectra are normalized for background noise via spectral means subtraction, a process that has been shown to be effective in mitigating site and instrument variability in the classification of echolocation clicks. To prevent weak echolocation clicks from contaminating the noise estimate, candidate noise regions are screened with a low-threshold click detector.
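For readers curious about the detection front end, here is a minimal Python sketch of three pieces: the discrete Teager-Kaiser energy operator, a low-threshold check on candidate noise segments, and spectral means subtraction. It is not the authors' implementation. The function names, the fixed thresholds, and the assumption that spectra arrive as lists of per-bin dB values are all illustrative choices. The real multi-stage detector also validates candidates using spectral content and timing, and that step is omitted here.

```python
def teager_energy(x):
    """Discrete Teager-Kaiser energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def candidate_clicks(x, threshold):
    """Sample indices of x whose Teager energy exceeds a (hypothetical) fixed threshold."""
    # teager_energy(x)[k] corresponds to sample k + 1 of x.
    return [k + 1 for k, e in enumerate(teager_energy(x)) if e > threshold]


def is_clean_noise(segment, low_threshold):
    """Accept a noise segment only if a low-threshold detector finds no weak clicks in it."""
    return not candidate_clicks(segment, low_threshold)


def spectral_mean_subtraction(click_spectra_db, noise_spectra_db):
    """Subtract the per-bin mean noise spectrum (dB) from every click spectrum (dB)."""
    n_noise = len(noise_spectra_db)
    noise_mean = [sum(col) / n_noise for col in zip(*noise_spectra_db)]
    return [[c - m for c, m in zip(spec, noise_mean)] for spec in click_spectra_db]


# Usage: a short burst in an otherwise silent signal, and two made-up spectra.
signal = [0, 0, 0, 5, -5, 0, 0, 0]
print(candidate_clicks(signal, threshold=1.0))  # [3, 4]
print(spectral_mean_subtraction([[10, 20], [12, 22]], [[2, 4], [4, 6]]))  # [[7.0, 15.0], [9.0, 17.0]]
```

Working in dB is a deliberate choice in this sketch. Subtracting a mean in the log domain removes a fixed per-bin gain, which is one common way to model site and instrument coloration.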

Cepstral features are extracted from the click spectra; they compact the spectral information into a small number of coefficients, reducing the dimensionality of the feature space. These features are classified with a Gaussian mixture model. We also introduce techniques to clean the input data, such as reducing the number of false detections triggered by anthropogenic sources. Our work demonstrates that toothed whale encounters can be detected for most species in a wide variety of environments. When classification is limited to encounters that human analysts can reliably identify, the system achieves 89% accuracy, but rejecting encounters that analysts cannot reliably classify to species remains an open challenge (76% accuracy) for future investigation.
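The classification stage can be illustrated with the sketch below, which is again not the authors' code. It computes cepstral coefficients as a truncated DCT-II of a log-magnitude spectrum. It then scores an encounter's clicks against per-species Gaussian mixture models with diagonal covariances. The model parameters are made up, and training them (typically with expectation maximization) is left out. Scoring a whole encounter by its summed log-likelihood is an assumption about how per-click scores might be combined.

```python
import math


def cepstral_features(log_spectrum, n_coeffs):
    """Unnormalized DCT-II of a log-magnitude spectrum, keeping the first n_coeffs terms."""
    n = len(log_spectrum)
    return [
        sum(s * math.cos(math.pi * q * (k + 0.5) / n) for k, s in enumerate(log_spectrum))
        for q in range(n_coeffs)
    ]


def gmm_loglik(x, gmm):
    """Log-likelihood of vector x under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples; log-sum-exp keeps it stable.
    """
    terms = []
    for w, means, variances in gmm:
        ll = -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                        for xi, m, v in zip(x, means, variances))
        terms.append(math.log(w) + ll)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify(click_features, models):
    """Pick the label whose GMM best explains all clicks (summed log-likelihood)."""
    return max(models, key=lambda label: sum(gmm_loglik(f, models[label]) for f in click_features))


# Usage with two made-up one-dimensional species models.
models = {
    "species_a": [(1.0, [0.0], [1.0])],
    "species_b": [(0.5, [5.0], [1.0]), (0.5, [7.0], [1.0])],
}
print(classify([[0.1], [-0.2]], models))  # species_a
```

Keeping only the first few DCT coefficients is what provides the compaction: low-order coefficients capture the smooth spectral envelope, and the fine detail in the discarded terms is dropped.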

 

Unsupervised Identification of Toothed Whales from Echolocation Clicks

Yun Trinh, Computer Science (M)

Marie Roch, Computer Science

There are many regions of the ocean where little is known about toothed whale species assemblages and where many species are not acoustically well understood. Automatically identifying the toothed whale species potentially present in a geographic region can provide spatiotemporal information that permits effective allocation of resources for further investigation and/or mitigation of human activities. Examples include directing visual survey effort in space and time, and restricting anthropogenic activities such as seismic exploration to the times and locations least likely to affect marine mammals.

In contrast to well-studied areas, where echolocation clicks have been recorded with visual confirmation of species identity, understudied areas frequently lack even basic population distribution information. As a result, standard machine learning techniques that learn from labeled examples are not applicable there. In this study, we rely on unsupervised machine learning techniques that can discern structure in unlabeled data.

While echolocation clicks are closely related to toothed whale morphology, several factors complicate their use for species identification: clicks are highly directional, are subject to uneven frequency attenuation, and may be affected by an animal's behavioral state. Consequently, rather than modeling individual clicks, we model the distribution of clicks from each toothed whale encounter using a Gaussian mixture model, with the number of mixture components determined by a model selection criterion. Similarity between toothed whale encounters is measured by the symmetric Kullback-Leibler distance, and average-link clustering is used to construct a dendrogram. Preliminary results are presented on data collected between 2009 and 2013 at seven recording sites throughout the Southern California Bight. The data contain echolocation clicks from five known species (Baird's and Cuvier's beaked whales, sperm whales, Pacific white-sided dolphins, and Risso's dolphins), as well as from at least two species that cannot yet be reliably separated (common and bottlenose dolphins). Results show that the technique places the majority of encounters from the same species close to one another within the dendrogram.
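A minimal sketch of the encounter-similarity and clustering steps follows. It assumes each encounter's GMM has already been fit, and it omits choosing the number of components (for example with a criterion such as BIC). The KL divergence between two Gaussian mixtures has no closed form, so this version estimates it by Monte Carlo sampling and sums both directions to symmetrize it. The sample count, seed, and function names are hypothetical, and the authors may have used a different approximation.

```python
import math
import random


def gmm_logpdf(x, gmm):
    """Log density of x under a diagonal-covariance GMM given as (weight, means, variances)."""
    terms = []
    for w, means, variances in gmm:
        ll = -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                        for xi, m, v in zip(x, means, variances))
        terms.append(math.log(w) + ll)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def gmm_sample(gmm, rng):
    """Draw one vector: pick a component by weight, then sample each dimension."""
    _, means, variances = rng.choices(gmm, weights=[w for w, _, _ in gmm])[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)]


def kl_monte_carlo(p, q, n_samples, rng):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) over samples x ~ p."""
    xs = [gmm_sample(p, rng) for _ in range(n_samples)]
    return sum(gmm_logpdf(x, p) - gmm_logpdf(x, q) for x in xs) / n_samples


def symmetric_kl(p, q, n_samples=2000, seed=0):
    """Symmetrized divergence KL(p || q) + KL(q || p), estimated with a seeded RNG."""
    rng = random.Random(seed)
    return kl_monte_carlo(p, q, n_samples, rng) + kl_monte_carlo(q, p, n_samples, rng)


def average_link(dist):
    """Agglomerative clustering with average linkage on a symmetric distance matrix.

    Returns the merges in order as (members_a, members_b, height) tuples.
    """
    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [clusters[a] + clusters[b]]
    return merges


# Usage: three made-up one-dimensional encounter models.
encounters = [
    [(1.0, [0.0], [1.0])],
    [(0.5, [0.2], [1.0]), (0.5, [0.4], [1.5])],
    [(1.0, [4.0], [1.0])],
]
n = len(encounters)
dist = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        dist[i][j] = dist[j][i] = symmetric_kl(encounters[i], encounters[j])
for left, right, height in average_link(dist):
    print(left, right, round(height, 2))
```

The merge list is the dendrogram in tabular form. Each entry records which clusters were joined and at what average distance. Filling only the upper triangle and mirroring it keeps the Monte Carlo distance matrix exactly symmetric.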

 

Underwater Probes

Jeffrey Sadural, Computer Science (M)

Robert Edwards, Computer Science

Underwater probes are used to measure various water conditions in harsh environments. Such information is valuable for studying the ecology of our oceans and can support studies of the metagenomic content of the environment.

The cost of underwater probes can be prohibitive for hobbyists, students, or professors who want to use them for small-scale experiments. With commercial-grade probes starting in the thousands of dollars, other ways of obtaining the data needed for study and experiments are constantly sought.

With the advent of microcontrollers and open-source projects such as OpenROV, we now have the opportunity to build our own underwater probes at a fraction of the cost. Using materials bought at local hardware stores along with specialized hobby circuits and probes, we constructed our own underwater probes with the goal of reducing the overall cost and furthering our studies.

Modeled after various projects in the open-source community, we fabricated an underwater probe for use in both freshwater and saltwater. The probes can be built either to float on the water's surface or to remain submerged indefinitely.

The biggest hurdle that we currently face is ensuring the accuracy of the probes.

We are developing improved software and calibrations to deliver a product that anyone can build and use for their own studies and experiments at a fraction of the cost of commercial probes.

 

A Genome Sequence Search Engine for Papers

Heqiao Liu, Computer Science (M)

Robert Edwards, Computer Science

The number of published papers on gene and genome sequences has grown rapidly in recent years. This growth results from more efficient sequencing technologies, a larger community of researchers in the field, and the popularity of online publication. It creates demand both for identifying the sequence(s) discussed in such papers and, conversely, for a search engine that finds all papers in a collection that discuss a target sequence.

The challenge for such a system is to provide results with a sufficient level of accuracy. In most papers, a sequence is referred to by one of a few types of alphanumeric identifier, and the same character-number string may also refer to unrelated subjects in other fields. Thus, a strategy is needed to improve the correctness of the system's results. The current method estimates the likelihood that a paper relates to bioinformatics or genetic studies and uses that weight to rank and filter the results.
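The abstract does not say how identifiers are matched or how the relevance weight is computed, so the sketch below shows only one simple way such a filter might work. The RefSeq-style regular expression, the keyword list, the keyword-overlap score (standing in for the probability estimate), and the 0.2 cutoff are all hypothetical. A trained text classifier would be a more realistic source of the weight.

```python
import re

# Hypothetical identifier pattern: RefSeq-style accessions such as NC_000913.3 or NM_001256799.
ACCESSION = re.compile(r"\b[A-Z]{2}_\d{6,9}(?:\.\d+)?\b")

# Hypothetical domain vocabulary used to estimate whether a paper is about genetics.
DOMAIN_TERMS = {"genome", "sequence", "gene", "sequencing", "bioinformatics", "plasmid", "protein"}


def domain_weight(text):
    """Fraction of the domain vocabulary that appears in the text (a crude relevance proxy)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & DOMAIN_TERMS) / len(DOMAIN_TERMS)


def search(papers, target, min_weight=0.2):
    """Titles of papers that mention target, filtered by weight and ranked highest first."""
    hits = []
    for title, text in papers.items():
        if target in ACCESSION.findall(text):
            weight = domain_weight(text)
            if weight >= min_weight:
                hits.append((weight, title))
    return [title for weight, title in sorted(hits, reverse=True)]


# Usage with two made-up documents that share the same identifier string.
papers = {
    "E. coli genome study": "We sequenced the genome and discuss gene content of NC_000913.3 in this bioinformatics study.",
    "HVAC parts manual": "Replace filter part NC_000913.3 every six months.",
}
print(search(papers, "NC_000913.3"))  # ['E. coli genome study']
```

Filtering happens after exact identifier matching, so the weight only decides among papers that already contain the target string. This mirrors the ambiguity the abstract describes.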