【2017.7】Big-data Analysis, IoT and Bioinformatics

GSB Workshop (organized as part of GI-CoRE GSQ, GSB & IGM Joint Symposium)

BASIC INFORMATION
Workshop Title: Big-data Analysis, IoT and Bioinformatics
Date: July 11, 2017
Venue: Conference Room 11-17, IST, Hokkaido University
Program Committee:
Hiroshi HIRATA (IST, Hokkaido University)
Yoshikazu MIYANAGA (IST, Hokkaido University)
Hidemi WATANABE (IST, Hokkaido University)
Naoki OSADA (IST, Hokkaido University)

Program

Session : Big-data Analysis / Chair : Hideyuki Imai, Naoki Osada
13:30–13:35 HIROKI ARIMURA (IST/GSB)
Opening Remarks
13:35–13:48 KOUSUKE FUKUI and Akihisa Tomita (Lab of Optical Processing and Networking, IST)
Analog quantum error correction on encoded qubits for large scale quantum computers
13:48–14:01 KENTA ISHIHARA, Takahiro Ogawa and Miki Haseyama (Lab of Media Dynamics, IST)
Detection of Gastric Cancer Risk from X-ray Images based on Machine Learning
14:01–14:14 KEISUKE MAEDA, Sho Takahashi, Takahiro Ogawa and Miki Haseyama (Lab of Media Dynamics, IST)
Deterioration Level Estimation on Transmission Towers based on Machine Learning
14:14–14:27 REN TOGO, Kenta Ishihara, Takahiro Ogawa and Miki Haseyama (Lab of Media Dynamics, IST)
Estimation of regions related to Helicobacter pylori infection from gastric X-ray images
14:27–14:40 NAMO PODEE, Yoshinori Dobashi and Tsuyoshi Yamamoto (Lab of Information Media Environment, IST)
GPU Adaptive Path Tracing without Atomic Instruction
14:40–14:53 HONGJIE ZHAI and Makoto Haraguchi (Knowledge-Base Lab, IST)
Guessing Associated Features by Non-negative Tri-Factorization
14:53–15:03 Coffee
Session : IoT / Chair : Hiroshi Hirata
15:03–15:16 MYAT HSU AUNG, Hiroshi Tsutsui and Yoshikazu Miyanaga (Lab of Information Communication Networks, IST)
An Implementation of WiFi Based Indoor Positioning System Using Estimated Reference Locations
15:16–15:29 ITARU HIDA,1 Shinya Takamaeda-Yamazaki,2 Masayuki Ikebe,1 Masato Motomura2 and Tetsuya Asai1 (1Lab for Integrated NanoSystem; 2Lab for Integrated Digital System Architecture, IST)
A Versatile and Energy-Efficient Reconfigurable Accelerator for Embedded Microprocessors
15:29–15:42 KODAI UEYOSHI,1 Masayuki Ikebe,2 Tetsuya Asai,2 Shinya Takamaeda-Yamazaki1 and Masato Motomura1 (1Lab for Integrated Digital System Architecture; 2Lab for Integrated NanoSystem, IST)
Hardware Accelerator Design for Convolutional Neural Networks with low bit precision
15:42–15:55 XIAOXIONG XING, Yoshinori Dobashi and Tsuyoshi Yamamoto (Lab of Information Media Environment, IST)
Learning interior design using convolutional neural networks
15:55–16:08 KASHO YAMAMOTO,1 Shinya Takamaeda-Yamazaki,1 Masayuki Ikebe,2 Tetsuya Asai2 and Masato Motomura1 (1Lab for Integrated Digital System Architecture; 2Lab for Integrated NanoSystem, IST)
Time-Division Multiplexing Ising Machine on FPGAs
16:08–16:18 Coffee
Session : Bioinformatics / Chair : Hidemi Watanabe
16:18–16:31 KEITO AOKI, Kanako Koyanagi and Hidemi Watanabe (Lab of Genome Sciences, IST)
Error detection and classifying mixed genomes methods for next generation sequencing based on the characteristics of reads orientation
16:31–16:44 SANGEETHA RATNAYAKE, Toshinori Endo and Naoki Osada (Information Biology Lab, IST)
Amino Acid Exchangeability and Disease-causing Ability in Human Beta Globin Gene
16:44–16:57 DAI WATABE,1 Naoki Osada,1 Toshinori Endo1 and Hiroshi Yuasa2 (1Information Biology Lab, IST, Hokkaido University; 2The Research Institute of Evolutionary Biology)
American Traditional Bottle Gourds Possessed Hybrid DNA in the Nucleus and Chloroplasts: Alternative Scenario for Ancient Propagation of Lagenaria siceraria
16:57–17:02 HIROKI ARIMURA (IST/GSB)
Concluding Remarks

Abstracts


13:35-13:48

Analog quantum error correction on encoded qubits for large scale quantum computers

KOUSUKE FUKUI and Akihisa Tomita
Laboratory of Optical Processing and Networking, IST, Hokkaido University

Abstract

Today, big data analytics is valuable in text analytics, machine learning, predictive analytics, data mining, statistics, and business. Quantum computation (QC) is an attractive tool for faster processing in big data analytics, because QC has been shown to efficiently solve some problems that are hard for conventional computers. So far, small-scale QC has been demonstrated on various quantum systems. However, practical quantum computation remains a significant experimental challenge because of error accumulation. In this work, we propose a method that can alleviate the requirements on error correction for encoded qubits. This novel method improves the tolerance against errors and will pave the way for constructing practical quantum computers.


13:48-14:01

Detection of Gastric Cancer Risk from X-ray Images based on Machine Learning

KENTA ISHIHARA, Takahiro Ogawa and Miki Haseyama
Laboratory of Media Dynamics, IST, Hokkaido University

Abstract

This paper presents an automatic detection method for gastric cancer risk from X-ray images. Helicobacter pylori (H. pylori) infection causes the development of gastric cancer, and the risk is decreased by H. pylori eradication therapy. Mass screening by image diagnosis is therefore becoming more and more important. However, as the number of patients grows, the workload of doctors becomes heavier. An automatic detection method for gastric cancer risk would make image diagnosis more efficient. In the proposed method, we input each image patch into a convolutional neural network and integrate the resulting probabilities by soft voting. With this approach, we can take the confidence of the result for each patch into account when determining the final detection result.
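
The soft-voting step can be illustrated with a minimal sketch. The abstract does not give the exact integration rule, so this assumes the common form of soft voting, averaging the per-patch class probabilities; the function name `soft_vote` and the example probabilities are hypothetical and are not taken from the authors' work.

```python
def soft_vote(patch_probs):
    """Average per-patch class probabilities (assumed form of soft voting).

    patch_probs: list of per-patch probability lists, one entry per class.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if not patch_probs:
        raise ValueError("need at least one patch")
    n_classes = len(patch_probs[0])
    n_patches = len(patch_probs)
    # Mean probability of each class over all patches.
    mean = [sum(p[c] for p in patch_probs) / n_patches for c in range(n_classes)]
    # The final decision is the class with the highest mean probability.
    label = max(range(n_classes), key=lambda c: mean[c])
    return label, mean


# Hypothetical CNN outputs for three patches, classes = [low risk, high risk].
patch_probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
label, mean = soft_vote(patch_probs)
print(label, mean)
```

In this made-up example a majority (hard) vote would pick class 1, because two of three patches favor it, whereas soft voting picks class 0 because the single confident patch outweighs the two uncertain ones.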


14:01-14:14

Deterioration Level Estimation on Transmission Towers based on Machine Learning

KEISUKE MAEDA, Sho Takahashi, Takahiro Ogawa and Miki Haseyama
Laboratory of Media Dynamics, IST, Hokkaido University

Abstract

Maintenance inspection of transmission towers is important, and deterioration level estimation is one of its most indispensable tasks. Since these levels have mostly been estimated through visual inspection by inspectors, more efficient inspection methods are required. This paper therefore presents automatic deterioration level estimation based on machine learning. Specifically, the proposed method uses an extreme learning machine, which is one of the neural network-based machine learning methods. The proposed method thereby supports maintenance inspection.


14:14-14:27

Estimation of regions related to Helicobacter pylori infection from gastric X-ray images

REN TOGO, Kenta Ishihara, Takahiro Ogawa and Miki Haseyama
Laboratory of Media Dynamics, IST, Hokkaido University

Abstract

Helicobacter pylori (H. pylori) infection and H. pylori-induced gastritis are key factors in gastric cancer. Since a high level of expertise is necessary to diagnose H. pylori infection from gastric X-ray images, computer-aided diagnosis systems are desirable for realizing effective gastric cancer mass screening. This paper presents a new method that estimates regions related to H. pylori infection based on machine learning techniques. Our method makes visual support for clinicians feasible.


14:27-14:40

GPU Adaptive Path Tracing without Atomic Instruction

NAMO PODEE, Yoshinori Dobashi and Tsuyoshi Yamamoto
Laboratory of Information Media Environment, IST, Hokkaido University

Abstract

We present an adaptive technique for path tracing on a GPU without the use of atomic instructions. The technique improves the efficiency of current state-of-the-art parallel path tracing methods. Our method uses a stream compaction algorithm to generate, in parallel, a list of pixels to be traced, also called a sample stream, which may contain multiple samples for each pixel. To accelerate convergence, we choose the pixels to be traced by predicting the squared error reduction rate, which is computed by comparing the past path tracing result with a version of it filtered by a bilateral filter. We then apply traditional stream compaction path tracing to the generated sample stream and accumulate the result iteratively, in parallel. We show that our method is up to 2.6 times faster than previous parallel path tracing techniques for equal-quality rendering. We also analyze how much improvement is achieved in different scenes and discuss the limitations of our method.
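
As an illustration of how a sample stream can be built without atomic instructions, here is a minimal sequential sketch. It assumes the usual prefix-sum (scan) formulation of stream compaction, which the abstract does not spell out; the function name `build_sample_stream` and the per-pixel sample counts are hypothetical.

```python
from itertools import accumulate


def build_sample_stream(samples_per_pixel):
    """Expand per-pixel sample counts into a flat list of pixel indices.

    An exclusive prefix sum gives every pixel the start of its own output
    range. The ranges are disjoint, so on a GPU each pixel's thread could
    write its entries without atomic operations. This Python loop only
    simulates that scatter step sequentially.
    """
    # Exclusive prefix sum: offsets[i] = sum of counts of pixels 0..i-1.
    offsets = [0] + list(accumulate(samples_per_pixel))[:-1]
    stream = [None] * sum(samples_per_pixel)
    for pixel, (start, count) in enumerate(zip(offsets, samples_per_pixel)):
        for k in range(count):
            stream[start + k] = pixel
    return stream


# Pixel 0 gets 2 samples, pixel 1 none, pixel 2 three, pixel 3 one.
print(build_sample_stream([2, 0, 3, 1]))
```

Pixels with a count of zero simply occupy an empty range, which is how converged pixels would drop out of the stream in this sketch.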


14:40-14:53

Guessing Associated Features by Non-negative Tri-Factorization

HONGJIE ZHAI and Makoto Haraguchi
Knowledge-Base Laboratory, IST, Hokkaido University

Abstract

In this paper, we try to guess feature associations from limited knowledge. That is, given two object sets with their own feature sets, our task is to guess associations between features when only a small part of the associations is given. We call the known associations “hints”. To achieve this goal, we build common clusters across all the features, so that features known to be associated are placed in a common cluster. The other features are clustered based on their similarities to the hints. Technically speaking, we use non-negative tri-factorization to cluster all the features. A Laplacian constraint is proposed to guarantee that associated features are put in the same cluster. We experimentally show that the proposed method can guess many meaningful associations.


15:03-15:16

An Implementation of WiFi Based Indoor Positioning System Using Estimated Reference Locations

MYAT HSU AUNG, Hiroshi Tsutsui and Yoshikazu Miyanaga
Laboratory of Information Communication Networks, IST, Hokkaido University

Abstract

In this paper, we present an implementation of a WiFi-based indoor positioning system using estimated reference locations. In general WiFi-based indoor positioning systems, the database of WiFi access points is constructed by gathering pairs of MAC address and received signal strength indicator (RSSI) value at each known reference location. However, this task is costly, since the administrator must know the actual position of each reference location. In the proposed approach, the database is constructed by gathering MAC-RSSI pairs using a reference device moving at a constant speed in a simple direction. Assuming a constant speed, the location of each reference point can be estimated from the velocity. Estimation accuracy evaluation results show that users’ locations can be roughly estimated.
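
The constant-speed assumption can be sketched as follows. This is only an illustration of estimating each reference location as start position plus velocity times elapsed time; the function name, the 2-D coordinates, and the sample values are hypothetical, and the authors' actual implementation may differ.

```python
def estimate_reference_locations(start, velocity, timestamps):
    """Estimate where each MAC-RSSI measurement was taken.

    start:      (x, y) position at the first timestamp, in meters (assumed).
    velocity:   (vx, vy) constant velocity of the reference device, in m/s.
    timestamps: measurement times in seconds, in increasing order.
    Returns one estimated (x, y) position per timestamp.
    """
    if not timestamps:
        return []
    t0 = timestamps[0]
    return [
        (start[0] + velocity[0] * (t - t0), start[1] + velocity[1] * (t - t0))
        for t in timestamps
    ]


# Device walks along the x axis at 1 m/s; scans are logged at 0 s, 2 s, and 5 s.
print(estimate_reference_locations((0.0, 0.0), (1.0, 0.0), [0.0, 2.0, 5.0]))
```

Each estimated position would then be stored in the database together with the MAC-RSSI pairs observed at that time.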


15:16-15:29

A Versatile and Energy-Efficient Reconfigurable Accelerator for Embedded Microprocessors

ITARU HIDA,1 Shinya Takamaeda-Yamazaki,2 Masayuki Ikebe,1 Masato Motomura2 and Tetsuya Asai1
1Laboratory for Integrated NanoSystem; 2Laboratory for Integrated Digital System Architecture, IST, Hokkaido University

Abstract

Conventional processors are energy-inefficient in that they fail to exploit the fact that most of their time and energy is spent on small code segments that are executed heavily and recursively. The DYNaSTA accelerator, which we have proposed and implemented, is an architectural solution to this problem. Besides exhibiting around an order-of-magnitude improvement in energy efficiency, the architecture can also exploit the full potential of low-power circuit techniques such as DVFS and power gating.


15:29-15:42

Hardware Accelerator Design for Convolutional Neural Networks with low bit precision

KODAI UEYOSHI,1 Masayuki Ikebe,2 Tetsuya Asai,2 Shinya Takamaeda-Yamazaki1 and Masato Motomura1
1Laboratory for Integrated Digital System Architecture; 2Laboratory for Integrated NanoSystem, IST, Hokkaido University

Abstract

Deep learning, especially the convolutional neural network (CNN), is a state-of-the-art model that can achieve very high accuracy in many machine learning tasks. Recently, efficient hardware platforms for accelerating CNNs have been thoroughly studied. A binarized neural network has been reported to minimize the multipliers, which consume a large amount of resources, with a minimal decrease in accuracy. In this study, we analyzed the optimal performance of a CNN implemented on a field-programmable gate array (FPGA), considering its logic resources and memory bandwidth, using multiple types of parallelism such as kernels, pixels, and channels in both conventional and binarized CNNs. As a result, it became clear that all of these parallelisms are required for the binarized neural network to obtain the best performance.


15:42-15:55

Learning interior design using convolutional neural networks

XIAOXIONG XING, Yoshinori Dobashi and Tsuyoshi Yamamoto
Laboratory of Information Media Environment, IST, Hokkaido University

Abstract

Previous works on interior design have used optimization applied to hand-crafted cost functions. Some works design their cost functions by following interior design guidelines or through experience, and others start by building statistical models that reflect furniture-to-furniture spatial relationships and then sample from those models. Neural networks, on the other hand, excel at finding the intrinsic relationships among furniture in a design sample. We therefore propose to apply convolutional neural networks to learning end-to-end interior design.


15:55-16:08

Time-Division Multiplexing Ising Machine on FPGAs

KASHO YAMAMOTO,1 Shinya Takamaeda-Yamazaki,1 Masayuki Ikebe,2 Tetsuya Asai2 and Masato Motomura1
1Laboratory for Integrated Digital System Architecture; 2Laboratory for Integrated NanoSystem, IST, Hokkaido University

Abstract

Annealing machines based on the Ising model, which can solve combinatorial optimization problems, are an emerging solution for overcoming the performance limit of the von Neumann architecture. When an Ising processor solves a problem, the problem must first be converted so that it can be embedded on the Ising processor. However, this conversion decreases the solution accuracy and the convergence speed. In this research, we propose a time-division multiplexing architecture to solve the conversion problem.


16:18-16:31

Error detection and classifying mixed genomes methods for next generation sequencing based on the characteristics of reads orientation

KEITO AOKI, Kanako Koyanagi and Hidemi Watanabe
Laboratory of Genome Sciences, IST, Hokkaido University

Abstract

Next generation sequencing (NGS) produces a large number of reads (DNA fragments) and thereby enables inexpensive DNA sequencing. The error rates for NGS are known to be higher (10⁻² to 4×10⁻²) than those for traditional Sanger sequencing (10⁻² to 10⁻²). Thus, when mixed DNA samples such as metagenomes or diploid genomes are analyzed by NGS, it is difficult to distinguish errors from substitutions or polymorphic sites of the mixed samples. In general, errors are avoided by majority vote at higher coverage, but this wastes the performance of NGS and excludes genuine low-frequency sequences. Here we surveyed the statistical characteristics of NGS errors and developed an algorithm for detecting errors based on those characteristics. Our algorithm can distinguish errors from substitutions or polymorphic sites, even if the frequency of a mixed sample is low, by means of a statistical test followed by a read classification step. In the statistical test, we compare the distribution of base states between read orientations at each site. We then classify the reads according to the test result. We will discuss the results of analyzing raw data in which two adenovirus genomes were mixed.


16:31-16:44

Amino Acid Exchangeability and Disease-causing Ability in Human Beta Globin Gene

SANGEETHA RATNAYAKE, Toshinori Endo and Naoki Osada
Information Biology Laboratory, IST, Hokkaido University

Abstract

Amino acid exchangeability of proteins is considered important in therapeutic medical investigations. The effect of amino acid changes has become a subject of debate because a change can be either a benign or a disease-causing mutation. Many studies have used information about evolutionary conservation, protein stability, or physicochemical properties to understand the relationship between a mutation and its effect.

In order to understand these consequences, we focus on the human beta globin gene (HBB). Beta globin is an important subunit that, together with the alpha subunit, composes the hemoglobin protein, which plays a vital role in humans as it transports oxygen and other gases throughout the whole body. Disorders in HBB are among the most frequently observed genetic diseases in humans, and mutations have been reported at almost every amino acid site.

We investigated the mutation pattern in human HBB, focusing on several aspects such as evolutionary conservation, structural stability, and physicochemical properties of amino acid mutations, in order to understand what determines disease causality. We applied a logistic regression model with many relevant explanatory variables, such as the distance between the amino acid and the ion-molecule, the free energy change caused by mutations, residue depth from the protein surface, entropy of the amino acid in vertebrates, and a physicochemical classification of amino acid mutations.

The logistic regression analysis revealed that the physicochemical properties of amino acids have a statistically significant effect, whereas the remaining variables showed significant effects only when interacting with another variable. However, we found exceptional amino acid exchanges that result in both disease and non-disease phenotypes in HBB under the physicochemical classification of amino acids, suggesting that there are still hidden factors determining the disease causality of amino acid mutations.


16:44-16:57

American Traditional Bottle Gourds Possessed Hybrid DNA in the Nucleus and Chloroplasts: Alternative Scenario for Ancient Propagation of Lagenaria siceraria

DAI WATABE,1 Naoki Osada,1 Toshinori Endo1 and Hiroshi Yuasa2
1Information Biology Laboratory, IST, Hokkaido University; 2The Research Institute of Evolutionary Biology

Abstract

The bottle gourd Lagenaria siceraria is one of the earliest domesticated plants. This is suggested by archaeological records dating back around 10,000 years before the present (B.P.) from sites worldwide, including Mexico and Japan. Even older records have been reported at sites in the American continents. Two major scenarios have been proposed for the arrival of the bottle gourd in the American continents: human carriage from Asia, and direct drift on ocean currents from Africa across the Atlantic Ocean. To provide stronger evidence, we here report an analysis of nuclear and chloroplast DNA from 60 seed specimens collected over three decades, including those grown and handed down by local tribes. For this purpose, we examined two characteristic insertions/deletions (INDELs) of chloroplast DNA. Interestingly, the INDELs separated the American samples into Asian and African subtypes, suggesting that their origins were heterogeneous. Nuclear DNA analysis, however, suggested that all American specimens were hybrids, except for one Guatemalan specimen of pure Asian type, and no pure African subtype was found. Because the specimens derive from plants grown within tribes and no wild species have been found in America, our results suggest that the ancient transmission to America occurred by human carriage from Asia rather than by direct floating from Africa.


Other Information

Proceedings (pdf, 2.4 MB)

Participants: 18 Researchers (2 from foreign institution); 29 Students