FSDM 2017

Invited Speaker---Prof. Kilho Shin

Graduate School of Applied Informatics, University of Hyogo, Japan
Information Networking Institute, Carnegie Mellon University, USA

Biography: He received an M.S. in Mathematics from the University of Tokyo and a Ph.D. in Computer Science from the Research Center for Advanced Science and Technology of the University of Tokyo. He is currently a professor of Graduate School of Applied Informatics of University of Hyogo and an adjunct faculty at Carnegie Mellon Information Network Institute. He worked at Fuji Xerox corporate research laboratory as a principal researcher, where he studied tree automaton theory, cryptography and information security. His current research interests include devising technology for untraceable authentication and machine learning theory, with a focus on feature selection, kernels and similarity.

Speech Title: Multiple Alignments for Data Objects and a Generalization of the Center Star Algorithm
Abstract: Multiple alignments of strings have been extensively studied as an effective tool to study string-type data such as DNA. In this speech, we generalize the notion of multiple alignments of strings and introduce M-alignments. M-alignments can be defined for arbitrary data objects that consist of a finite number of components. Such objects can be ordered and unordered trees, rooted and unrooted trees, directed and undirected graphs, partially and totally ordered finite sets and so on. Furthermore, we study how the structure of objects is inherited to M-alignments. For example, we show a sufficient condition for M-alignments of trees to be trees. On the other hand, it is known that the problem to find optimal multiple alignments of more than two strings is NP-hard and the same holds for M-alignments. For the legacy multiple alignments of strings, the center star algorithm is given as an efficient and bounded-error approximation algorithm. In this regard, we see that the algorithm can be extended to M-alignments under a certain reasonable assumption, and moreover, the extended algorithm has the same error bound as the original algorithm for strings. We also apply the generalized center star algorithm to real datasets that describe relation between glycans and particular diseases. Since the structure of glycans is understood as ordered trees, we extracted patterns from M-alignments of glycan structures and see that our method can extract patterns of sub-trees that are effective to characterize the diseases.