【2017 Workshop】Jul 11

Big-data Analysis, IoT and Bioinformatics

July 11, 2017
Conference Room 11-17, IST, Hokkaido University

Organized by:
Hiroshi Hirata (IST, Hokkaido University)
Yoshikazu Miyanaga (IST, Hokkaido University)
Hidemi Watanabe (IST/GSB, Hokkaido University)
Naoki Osada (IST/GSB, Hokkaido University)


Session 1: Big-data Analysis
Chairs: Hideyuki Imai (IST, Hokkaido University), Naoki Osada (IST/GSB, Hokkaido University)

13:30 – 13:35
Hiroki Arimura (IST/GSB, Hokkaido University)
Opening Remarks
13:35 – 13:48
Kousuke Fukui and Akihisa Tomita (Laboratory of Optical Processing and Networking, IST, Hokkaido University)
Analog quantum error correction on encoded qubits for large scale quantum computers
13:48 – 14:01
Kenta Ishihara, Takahiro Ogawa and Miki Haseyama (Laboratory of Media Dynamics, IST, Hokkaido University)
Detection of Gastric Cancer Risk from X-ray Images based on Machine Learning
14:01 – 14:14
Keisuke Maeda, Sho Takahashi, Takahiro Ogawa and Miki Haseyama (Laboratory of Media Dynamics, IST, Hokkaido University)
Deterioration Level Estimation on Transmission Towers based on Machine Learning
14:14 – 14:27
Ren Togo, Kenta Ishihara, Takahiro Ogawa and Miki Haseyama (Laboratory of Media Dynamics, IST, Hokkaido University)
Estimation of regions related to Helicobacter pylori infection from gastric X-ray images
14:27 – 14:40
Namo Podee, Yoshinori Dobashi and Tsuyoshi Yamamoto (Laboratory of Information Media Environment, IST, Hokkaido University)
GPU Adaptive Path Tracing without Atomic Instruction
14:40 – 14:53
Hongjie Zhai and Makoto Haraguchi (Knowledge-Base Laboratory, IST, Hokkaido University)
Guessing Associated Features by Non-negative Tri-Factorization
14:53 – 15:03

Session 2: IoT
Chair: Hiroshi Hirata (IST, Hokkaido University)

15:03 – 15:16
Myat Hsu Aung, Hiroshi Tsutsui and Yoshikazu Miyanaga (Laboratory of Information Communication Networks, IST, Hokkaido University)
An Implementation of WiFi Based Indoor Positioning System Using Estimated Reference Locations
15:16 – 15:29
Itaru Hida, Masayuki Ikebe and Tetsuya Asai (Laboratory for Integrated NanoSystem, IST, Hokkaido University)
Shinya Takamaeda-Yamazaki and Masato Motomura (Laboratory for Integrated Digital System Architecture, IST, Hokkaido University)
A Versatile and Energy-Efficient Reconfigurable Accelerator for Embedded Microprocessors
15:29 – 15:42
Kodai Ueyoshi, Shinya Takamaeda-Yamazaki and Masato Motomura (Laboratory for Integrated Digital System Architecture, IST, Hokkaido University)
Masayuki Ikebe and Tetsuya Asai (Laboratory for Integrated NanoSystem, IST, Hokkaido University)
Hardware Accelerator Design for Convolutional Neural Networks with low bit precision
15:42 – 15:55
Xiaoxiong Xing, Yoshinori Dobashi and Tsuyoshi Yamamoto (Laboratory of Information Media Environment, IST, Hokkaido University)
Learning interior design using convolutional neural networks
15:55 – 16:08
Kasho Yamamoto, Shinya Takamaeda-Yamazaki and Masato Motomura (Laboratory for Integrated Digital System Architecture, IST, Hokkaido University)
Masayuki Ikebe and Tetsuya Asai (Laboratory for Integrated NanoSystem, IST, Hokkaido University)
Time-Division Multiplexing Ising Machine on FPGAs
16:08 – 16:18

Session 3: Bioinformatics
Chair: Hidemi Watanabe (IST/GSB, Hokkaido University)

16:18 – 16:31
Keito Aoki, Kanako Koyanagi and Hidemi Watanabe (Laboratory of Genome Sciences, IST, Hokkaido University)
Error detection and classifying mixed genomes methods for next generation sequencing based on the characteristics of reads orientation
16:31 – 16:44
Sangeetha Ratnayake, Toshinori Endo and Naoki Osada (Information Biology Laboratory, IST, Hokkaido University)
Amino Acid Exchangeability and Disease-causing Ability in Human Beta Globin Gene
16:44 – 16:57
Dai Watabe, Naoki Osada and Toshinori Endo (Information Biology Laboratory, IST, Hokkaido University)
Hiroshi Yuasa (The Research Institute of Evolutionary Biology)
American Traditional Bottle Gourds Possessed Hybrid DNA in the Nucleus and Chloroplasts: Alternative Scenario for Ancient Propagation of Lagenaria siceraria
16:57 – 17:02
Hiroki Arimura (IST/GSB, Hokkaido University)
Concluding Remarks


[BD-1] Fukui et al.
Today, big data analytics is valuable for text analytics, machine learning, predictive analytics, data mining, statistics, and business. Quantum computation (QC) is an attractive tool for speeding up big data analytics, because QC has been shown to efficiently solve some problems that are hard for conventional computers. Small-scale QC has been demonstrated with various quantum systems. However, practical quantum computation remains a significant experimental challenge because of error accumulation. In this work, we propose a method that can relax the requirements on error correction for encoded qubits. This novel method improves the tolerance against errors and will pave the way for constructing practical quantum computers.
[BD-2] Ishihara et al.
This paper presents an automatic detection method for gastric cancer risk from X-ray images. Helicobacter pylori (H. pylori) infection causes the development of gastric cancer, and the risk is decreased by H. pylori eradication therapy. Therefore, mass screening by image diagnosis is becoming more and more important. However, as the number of patients grows, the workload of doctors becomes heavier. An automatic detection method for gastric cancer risk makes more efficient image diagnosis feasible. In the proposed method, we integrate, by soft voting, the probabilities of multiple results calculated by inputting each patch into a convolutional neural network. With this approach, we can take the confidence of the result for each patch into account when determining the final detection result.
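The soft-voting step mentioned in this abstract can be sketched as follows. This is only an illustration: the function name `soft_vote` and the example probabilities are hypothetical. It assumes that each patch yields a probability vector over classes and that the vectors are combined by simple averaging, which is one common form of soft voting and not necessarily the authors' exact rule.

```python
def soft_vote(patch_probs):
    """Average per-patch class probabilities; return (winning class, mean probabilities)."""
    if not patch_probs:
        raise ValueError("need at least one patch")
    n_classes = len(patch_probs[0])
    # Mean probability of each class over all patches.
    mean = [
        sum(p[c] for p in patch_probs) / len(patch_probs)
        for c in range(n_classes)
    ]
    # The class with the highest averaged probability wins.
    label = max(range(n_classes), key=lambda c: mean[c])
    return label, mean


# Hypothetical CNN outputs for three patches: [P(class 0), P(class 1)].
patches = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
label, mean = soft_vote(patches)
print(label, mean)
```

In this toy input, two of the three patches favor class 1, so a hard majority vote would pick class 1. Averaging picks class 0 because the first patch is far more confident, which is the sense in which soft voting takes per-patch confidence into account.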
[BD-3] Maeda et al.
Maintenance inspection of transmission towers is important, and deterioration level estimation is one of its most indispensable tasks. Since these levels have mostly been estimated through visual inspection by inspectors, more efficient inspection methods are required. Therefore, this paper presents automatic deterioration level estimation based on machine learning. Specifically, the proposed method uses an extreme learning machine, which is one of the neural network-based machine learning methods. The proposed method thereby supports maintenance inspection.
[BD-4] Togo et al.
Helicobacter pylori (H. pylori) infection and H. pylori-induced gastritis are key factors in gastric cancer. Since a high level of expertise is necessary to diagnose H. pylori infection from gastric X-ray images, computer-aided diagnosis systems are desirable for realizing effective gastric cancer mass screening. This paper presents a new method that estimates regions related to H. pylori infection based on machine learning techniques. Our method makes visual support for clinicians feasible.
[BD-5] Podee et al.
We present an adaptive technique for path tracing on a GPU without the use of atomic instructions. The technique improves the efficiency of current state-of-the-art parallel path tracing methods. Our method uses a stream compaction algorithm to generate, in parallel, a list of pixels to be traced, also called a sample stream, which may contain multiple samples for each pixel. To accelerate convergence, we choose the pixels to be traced by predicting the squared error reduction rate, which is computed by comparing the past path tracing result with a version of it filtered by a bilateral filter. Then, we use traditional stream compaction path tracing for the generated sample stream and accumulate the result iteratively, in parallel. We show that our method is up to 2.6 times faster than previous parallel path tracing techniques for equal-quality rendering. We also analyze how much improvement is achieved in different scenes and discuss the limitations of our method.
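The idea of generating a sample stream without atomic instructions can be illustrated with a sequential sketch. The function name `build_sample_stream` and the input counts are hypothetical. The sketch assumes the stream is built with an exclusive prefix sum, which is the usual basis of stream compaction; whether the authors use exactly this formulation is an assumption.

```python
from itertools import accumulate


def build_sample_stream(samples_per_pixel):
    """Return a list of pixel indices with one entry per requested sample."""
    # Exclusive prefix sum: offsets[i] is the first output slot owned by pixel i.
    offsets = [0] + list(accumulate(samples_per_pixel))[:-1]
    stream = [None] * sum(samples_per_pixel)
    # On a GPU each pixel would be handled by its own thread. Because the
    # output ranges are disjoint, no atomic counter is needed for the writes.
    for pixel, (start, count) in enumerate(zip(offsets, samples_per_pixel)):
        for k in range(count):
            stream[start + k] = pixel
    return stream


# Hypothetical per-pixel sample counts chosen by the adaptive step.
print(build_sample_stream([0, 2, 1, 0, 3]))  # [1, 1, 2, 4, 4, 4]
```

Pixels that request zero samples simply occupy no slots, so the resulting stream is dense and can be traced with one thread per entry.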
[BD-6] Zhai et al.
In this paper, we try to guess feature associations from limited knowledge. That is, given two object sets with their own feature sets, our task is to guess associations between features when only a small part of the associations is given. We call the known associations “hints”. To achieve this goal, we build common clusters across all the features, so that features known to be associated are placed in a common cluster. Other features are clustered based on their similarities to the hints. Technically, we use non-negative tri-factorization to cluster all features. A Laplacian constraint is proposed to guarantee that associated features are put in the same cluster. We experimentally show that the proposed method can guess many meaningful associations.
[IoT-1] Aung et al.
In this paper, we present an implementation of a WiFi-based indoor positioning system using estimated reference locations. In general WiFi-based indoor positioning systems, the database of WiFi access points is constructed by gathering pairs of MAC address and received signal strength indicator (RSSI) value at each known reference location. However, this task is costly because the administrator must know the actual position of each reference location. In the proposed approach, the database is constructed by gathering MAC-RSSI pairs with a reference device moving at a constant speed in a simple direction. Assuming a constant speed, the location of each reference point can be estimated from the velocity. Accuracy evaluation results show that users’ locations can be roughly estimated.
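Estimating reference locations from a constant velocity amounts to linear interpolation along the walked path. The following is a minimal sketch under stated assumptions: the device moves in a straight line between two known end points, and each RSSI measurement carries a timestamp. The function name, coordinates, and timestamps are hypothetical and do not come from the authors' system.

```python
def estimate_reference_locations(start, end, t_start, t_end, sample_times):
    """Estimate (x, y) at each sample time for a device moving at constant velocity."""
    duration = t_end - t_start
    if duration <= 0:
        raise ValueError("t_end must be later than t_start")
    # Constant velocity components derived from the two known end points.
    vx = (end[0] - start[0]) / duration
    vy = (end[1] - start[1]) / duration
    return [
        (start[0] + vx * (t - t_start), start[1] + vy * (t - t_start))
        for t in sample_times
    ]


# Hypothetical walk along a 10 m corridor in 20 s, with scans at 0, 5, 10, 20 s.
points = estimate_reference_locations((0.0, 0.0), (10.0, 0.0), 0.0, 20.0, [0, 5, 10, 20])
print(points)
```

Each estimated point would then be stored together with the MAC-RSSI pairs observed at that time, in place of a manually surveyed reference location.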
[IoT-2] Hida et al.
Conventional processors are energy-inefficient in that they fail to exploit the fact that most of their time and energy is spent on small code segments that are executed heavily and repeatedly. The DYNaSTA accelerator, which we propose and implement, is an architectural solution to this problem. Besides exhibiting around an order of magnitude improvement in energy efficiency, the architecture can also exploit the full potential of low-power circuit techniques such as DVFS and power gating.
[IoT-3] Ueyoshi et al.
Deep learning, especially the convolutional neural network (CNN), is a state-of-the-art model that achieves high accuracy in many machine learning tasks. Recently, efficient hardware platforms for accelerating CNNs have been thoroughly studied. A binarized neural network has been reported to minimize the multipliers, which consume a large amount of resources, with a minimal decrease in accuracy. In this study, we analyzed the optimal performance of CNNs implemented on a field-programmable gate array (FPGA), considering its logic resources and memory bandwidth, using multiple types of parallelism across kernels, pixels, and channels in both conventional and binarized CNNs. The analysis showed that all of these types of parallelism are required for the binarized neural network to obtain the best performance.
[IoT-4] Xing et al.
Previous works on interior design have applied optimization to hand-crafted cost functions. Some works design their cost functions by following interior design guidelines or through experience, while others start by building statistical models of the spatial relationships between pieces of furniture and then sample from those models. Neural networks, on the other hand, excel at finding the intrinsic relationships among furniture in a design sample. We therefore propose applying convolutional neural networks to learn interior design end to end.
[IoT-5] Yamamoto et al.
Annealing machines based on the Ising model, which can solve combinatorial optimization problems, are an emerging solution to overcome the performance limits of the von Neumann architecture. When an Ising processor solves a problem, the problem must be converted so that it can be embedded on the processor. However, the conversion decreases solution accuracy and convergence speed. In this research, we propose a time-division multiplexing architecture to solve the conversion problem.
[Bio-1] Aoki et al.
Next-generation sequencing (NGS) produces a large number of reads (DNA fragments), enabling inexpensive DNA sequencing. The error rates for NGS are known to be higher (10^-2 – 4×10^-2) than those for traditional Sanger sequencing (10^-2 – 10^-2). Thus, when mixed DNA samples such as metagenomes or diploid genomes are analyzed by NGS, it is difficult to distinguish errors from substitutions or polymorphic sites of the mixed samples. In general, a majority vote with higher coverage is used to avoid errors, but this wastes the performance of NGS and excludes genuine low-frequency sequences. Here we surveyed the statistical characteristics of NGS errors and developed an algorithm for detecting errors based on those characteristics. Our algorithm can distinguish errors from substitutions or polymorphic sites even if the frequency of a mixed sample is low, by means of a statistical test followed by a read classification step. The statistical test compares the distribution of base states between read orientations at each site. We then classify reads according to the test result. We will discuss the results of analyzing raw data in which two adenovirus genomes were mixed.
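Comparing the distribution of bases between read orientations at a site can be illustrated with a 2x2 contingency test. The abstract does not name the test it uses, so the sketch below substitutes a two-sided Fisher exact test as an assumption. The function name and all read counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts at one site. Rows: forward and reverse reads.
# Columns: reads showing the majority base and reads showing the minority base.
p_biased = fisher_exact_two_sided(20, 0, 10, 10)   # minority base only on reverse reads
p_balanced = fisher_exact_two_sided(15, 5, 15, 5)  # same proportion on both orientations
print(p_biased, p_balanced)
```

A small p-value indicates that the base distribution differs between the two orientations at that site. How such a result maps onto "error" versus "true variant" is part of the authors' method and is not specified by this sketch.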
[Bio-2] Ratnayake et al.
Amino acid exchangeability in proteins is considered important in therapeutic medical investigations. The effects of amino acid changes have become a topic of discussion because a change can be either a benign or a disease-causing mutation. Many studies have used information about evolutionary conservation, protein stability, or physicochemical properties to understand the relationship between a mutation and its effect.

In order to understand these consequences, we focus on the human beta globin gene (HBB). Beta globin is an important subunit that, along with the alpha subunit, composes the hemoglobin protein, which plays a vital role in humans as it transports oxygen and other gases throughout the whole body. Disorders in HBB are among the most frequently observed genetic diseases in humans, and mutations have been reported at almost every amino acid site.

We investigated the mutation pattern in human HBB, focusing on aspects such as evolutionary conservation, structural stability, and physicochemical properties of amino acid mutations, in order to understand the consequences for disease causality. We applied a logistic regression model with many relevant explanatory variables, such as the distance between the amino acid and the ion-molecule, the free energy change caused by the mutation, residue depth from the protein surface, entropy of the amino acid in vertebrates, and a physicochemical classification of amino acid mutations.

The logistic regression analysis revealed that the physicochemical properties of amino acids have a statistically significant effect, whereas the rest of the variables showed significant effects only when interacting with another variable. However, based on the physicochemical classification of amino acids, we found exceptional amino acid exchanges that result in both disease and non-disease phenotypes in HBB, suggesting that there are still some hidden factors determining the disease causality of amino acid mutations.

[Bio-3] Watabe et al.
The bottle gourd, Lagenaria siceraria, is one of the earliest domesticated plants. This is suggested by archaeological records dating back around 10,000 years before the present (B.P.) from sites worldwide, including Mexico and Japan. Even older records have been reported at sites in the American continents. Two major scenarios have been proposed for the arrival of the bottle gourd in the American continents: human carriage from Asia, and direct drifting on ocean currents from Africa across the Atlantic Ocean. To provide stronger evidence, we report here the analysis of nuclear and chloroplast DNA from 60 seed specimens collected over three decades, including those grown by local tribes over generations. For this purpose, we examined two characteristic insertions/deletions (INDELs) in chloroplast DNA. Interestingly, the INDELs separated the American samples into Asian and African subtypes, suggesting that their origins were heterogeneous. Nuclear DNA analysis, however, suggested that all American specimens were hybrids, except for one Guatemalan specimen that was of pure Asian type; no pure African subtype was found. Because the specimens were derived from plants grown within tribes and no wild species were found in America, our results suggest that the ancient transmission to America occurred by human carriage from Asia rather than by direct floating from Africa.

Number of Participants

19 researchers (2 from foreign institutions)
29 students



Graduate School of Information Science and Technology
Global Institution for Collaborative Research and Education
Global Station for Big Data and Cybersecurity