## Neural and Evolutionary Computing

### CompNet: Neural networks growing via the compact network morphism

Jun Lu , Wei Ma , Boi Faltings **Subjects** : Neural and Evolutionary Computing (cs.NE)

It is often the case that the performance of a neural network can be improved

by adding layers. In real-world practices, we always train dozens of neural

network architectures in parallel which is a wasteful process. We explored

(CompNet), in which case we morph a well-trained neural network to a deeper one

where network function can be preserved and the added layer is compact. The

work of the paper makes two contributions: a). The modified network can

converge fast and keep the same functionality so that we do not need to train

from scratch again; b). The layer size of the added layer in the neural network

is controlled by removing the redundant parameters with sparse optimization.

This differs from previous network morphism approaches which tend to add more

neurons or channels beyond the actual requirements and result in redundance of

the model. The method is illustrated using several neural network structures on

different data sets including MNIST and CIFAR10.

### Universal approximations of invariant maps by neural networks

Comments: 64 pages, 3 figures

**Subjects**

:

Neural and Evolutionary Computing (cs.NE)

We describe generalizations of the universal approximation theorem for neural

networks to maps invariant or equivariant with respect to linear

representations of groups. Our goal is to establish network-like computational

models that are both invariant/equivariant and provably complete in the sense

of their ability to approximate any continuous invariant/equivariant map. Our

contribution is three-fold. First, in the general case of compact groups we

propose a construction of a complete invariant/equivariant network using an

intermediate polynomial layer. We invoke classical theorems of Hilbert and Weyl

to justify and simplify this construction; in particular, we describe an

explicit complete ansatz for approximation of permutation-invariant maps.

Second, we consider groups of translations and prove several versions of the

universal approximation theorem for convolutional networks in the limit of

continuous signals on euclidean spaces. Finally, we consider 2D signal

transformations equivariant with respect to the group SE(2) of rigid euclidean

motions. In this case we introduce the “charge–conserving convnet” — a

convnet-like computational model based on the decomposition of the feature

space into isotypic representations of SO(2). We prove this model to be a

universal approximator for continuous SE(2)–equivariant signal

transformations.

### Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

Feiwen Zhu ,

Jeff Pool ,

Michael Andersch ,

Jeremy Appleyard ,

Fung Xie

Comments: Published as a conference paper at ICLR 2018

**Subjects**

:

Neural and Evolutionary Computing (cs.NE)

; Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Recurrent Neural Networks (RNNs) are powerful tools for solving

sequence-based problems, but their efficacy and execution time are dependent on

the size of the network. Following recent work in simplifying these networks

with model pruning and a novel mapping of work onto GPUs, we design an

efficient implementation for sparse RNNs. We investigate several optimizations

and tradeoffs: Lamport timestamps, wide memory loads, and a bank-aware weight

layout. With these optimizations, we achieve speedups of over 6x over the next

best algorithm for a hidden layer of size 2304, batch size of 4, and a density

of 30%. Further, our technique allows for models of over 5x the size to fit on

a GPU for a speedup of 2x, enabling larger networks to help advance the

state-of-the-art. We perform case studies on NMT and speech recognition tasks

in the appendix, accelerating their recurrent layers by up to 3x.

### Stacked U-Nets: A No-Frills Approach to Natural Image Segmentation

Sohil Shah ,

Pallabi Ghosh ,

Larry S Davis ,

Tom Goldstein

Comments: The code is available at this https URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Neural and Evolutionary Computing (cs.NE)

Many imaging tasks require global information about all pixels in an image.

Conventional bottom-up classification networks globalize information by

decreasing resolution; features are pooled and downsampled into a single

output. But for semantic segmentation and object detection tasks, a network

must provide higher-resolution pixel-level outputs. To globalize information

while preserving resolution, many researchers propose the inclusion of

sophisticated auxiliary blocks, but these come at the cost of a considerable

increase in network size and computational cost. This paper proposes stacked

u-nets (SUNets), which iteratively combine features from different resolution

scales while maintaining resolution. SUNets leverage the information

globalization power of u-nets in a deeper network architectures that is capable

of handling the complexity of natural images. SUNets perform extremely well on

semantic segmentation tasks using a small number of parameters.

### IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification

Sam Leroux ,

Pavlo Molchanov ,

Pieter Simoens ,

Bart Dhoedt ,

Thomas Breuel ,

Jan Kautz

Comments: ICLR 2018 Workshop track

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Neural and Evolutionary Computing (cs.NE)

Deep residual networks (ResNets) made a recent breakthrough in deep learning.

The core idea of ResNets is to have shortcut connections between layers that

allow the network to be much deeper while still being easy to optimize avoiding

vanishing gradients. These shortcut connections have interesting side-effects

that make ResNets behave differently from other typical network architectures.

In this work we use these properties to design a network based on a ResNet but

with parameter sharing and with adaptive computation time. The resulting

network is much smaller than the original network and can adapt the

computational cost to the complexity of the input image.

## Computer Vision and Pattern Recognition

### A matrix-free approach to parallel and memory-efficient deformable image registration

Lars König ,

Jan Rühaak ,

Alexander Derksen ,

Jan Lellmann

Comments: Accepted for publication in SIAM Journal on Scientific Computing (SISC)

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

We present a novel computational approach to fast and memory-efficient

deformable image registration. In the variational registration model, the

computation of the objective function derivatives is the computationally most

expensive operation, both in terms of runtime and memory requirements. In order

to target this bottleneck, we analyze the matrix structure of gradient and

Hessian computations for the case of the normalized gradient fields distance

measure and curvature regularization. Based on this analysis, we derive

equivalent matrix-free closed-form expressions for derivative computations,

eliminating the need for storing intermediate results and the costs of sparse

matrix arithmetic. This has further benefits: (1) matrix computations can be

fully parallelized, (2) memory complexity for derivative computation is reduced

from linear to constant, and (3) overall computation times are substantially

reduced.

In comparison with an optimized matrix-based reference implementation, the

CPU implementation achieves speedup factors between 3.1 and 9.7, and we are

able to handle substantially higher resolutions. Using a GPU implementation, we

achieve an additional speedup factor of up to 9.2.

Furthermore, we evaluated the approach on real-world medical datasets. On ten

publicly available lung CT images from the DIR-Lab 4DCT dataset, we achieve the

best mean landmark error of 0.93 mm compared to other submissions on the

DIR-Lab website, with an average runtime of only 9.23 s. Complete non-rigid

registration of full-size 3D thorax-abdomen CT volumes from oncological

follow-up is achieved in 12.6 s. The experimental results show that the

proposed matrix-free algorithm enables the use of variational registration

models also in applications which were previously impractical due to memory or

runtime restrictions.

### Crossbar-Net: A Novel Convolutional Network for Kidney Tumor Segmentation in CT Images

Qian Yu ,

Yinhuan Shi ,

Jinquan Sun ,

Yang Gao ,

Yakang Dai ,

Jianbing Zhu

Comments: This paper includes 7 pages and 13 figures

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Due to the irregular motion, similar appearance and diverse shape, accurate

segmentation of kidney tumor in CT images is a difficult and challenging task.

To this end, we present a novel automatic segmentation method, termed as

Crossbar-Net, with the goal of accurate segmenting the kidney tumors. Firstly,

considering that the traditional learning-based segmentation methods normally

employ either whole images or squared patches as the training samples, we

innovatively sample the orthogonal non-squared patches (namely crossbar

patches), to fully cover the whole kidney tumors in either horizontal or

vertical directions. These sampled crossbar patches could not only represent

the detailed local information of kidney tumor as the traditional patches, but

also describe the global appearance from either horizontal or vertical

direction using contextual information. Secondly, with the obtained crossbar

patches, we trained a convolutional neural network with two sub-models (i.e.,

horizontal sub-model and vertical sub-model) in a cascaded manner, to integrate

the segmentation results from two directions (i.e., horizontal and vertical).

This cascaded training strategy could effectively guarantee the consistency

between sub-models, by feeding each other with the most difficult samples, for

a better segmentation. In the experiment, we evaluate our method on a real CT

kidney tumor dataset, collected from 94 different patients including 3,500

images. Compared with the state-of-the-art segmentation methods, the results

demonstrate the superior results of our method on dice ratio score, true

positive fraction, centroid distance and Hausdorff distance. Moreover, we have

extended our crossbar-net to a different task: cardiac segmentation, showing

the promising results for the better generalization.

Interactive Medical Image Segmentation via Point-Based Interaction and Sequential Patch Learning

Jinquan Sun , Yinghuan Shi , Yang Gao , Lei Wang , Luping Zhou , Wanqi Yang , Dinggang Shen **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Due to low tissue contrast, irregular object appearance, and unpredictable

location variation, segmenting the objects from different medical imaging

modalities (e.g., CT, MR) is considered as an important yet challenging task.

In this paper, we present a novel method for interactive medical image

segmentation with the following merits. (1) Our design is fundamentally

different from previous pure patch-based and image-based segmentation methods.

We observe that during delineation, the physician repeatedly check the

inside-outside intensity changing to determine the boundary, which indicates

that comparison in an inside-outside manner is extremely important. Thus, we

innovatively model our segmentation task as learning the representation of the

bi-directional sequential patches, starting from (or ending in) the given

central point of the object. This can be realized by our proposed ConvRNN

network embedded with a gated memory propagation unit. (2) Unlike previous

interactive methods (requiring bounding box or seed points), we only ask the

physician to merely click on the rough central point of the object before

segmentation, which could simultaneously enhance the performance and reduce the

segmentation time. (3) We utilize our method in a multi-level framework for

better performance. We systematically evaluate our method in three different

segmentation tasks including CT kidney tumor, MR prostate, and PROMISE12

challenge, showing promising results compared with state-of-the-art methods.

The code is available here:

href{ this https URL }{Sequential-patch-based-segmentation}.

### Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders

Ananya Harsh Jha , Saket Anand , Maneesh Singh , V. S. R. Veeravasarapu **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Generative models that learn disentangled representations for different

factors of variation in an image can be very useful for targeted data

augmentation. By sampling from the disentangled latent subspace of interest, we

can efficiently generate new data necessary for a particular task. Learning

disentangled representations is a challenging problem, especially when certain

factors of variation are difficult to label. In this paper, we introduce a

novel architecture that disentangles the latent space into two complementary

subspaces by using only weak supervision in form of pairwise similarity labels.

Inspired by the recent success of cycle-consistent adversarial architectures,

we use cycle-consistency in a variational auto-encoder framework. Our

non-adversarial approach is in contrast with the recent works that combine

adversarial training with auto-encoders to disentangle representations. We show

compelling results of disentangled latent subspaces on three datasets and

compare with recent works that leverage adversarial training.

### A generalizable approach for multi-view 3D human pose regression

Abdolrahim Kadkhodamohammadi ,

Nicolas Padoy

Comments: The supplementary video is available at this https URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Despite the significant improvement in the performance of monocular pose

estimation approaches and their ability to generalize to unseen environments,

multi-view (MV) approaches are often lagging behind in terms of accuracy and

are specific to certain datasets. This is mainly due to the fact that (1)

contrary to real world single-view (SV) datasets, MV datasets are often

captured in controlled environments to collect precise 3D annotations, which do

not cover all real world challenges, and (2) the model parameters are learned

for specific camera setups. To alleviate these problems, we propose a two-stage

approach to detect and estimate 3D human poses, which separates SV pose

detection from MV 3D pose estimation. This separation enables us to utilize

each dataset for the right task, i.e. SV datasets for constructing robust pose

detection models and MV datasets for constructing precise MV 3D regression

models. In addition, our 3D regression approach only requires 3D pose data and

its projections to the views for building the model, hence removing the need

for collecting annotated data from the test setup. Our approach can therefore

be easily generalized to a new environment by simply projecting 3D poses into

2D during training according to the camera setup used at test time. As 2D poses

are collected at test time using a SV pose detector, which might generate

inaccurate detections, we model its characteristics and incorporate this

information during training. We demonstrate that incorporating the detector’s

characteristics is important to build a robust 3D regression model and that the

resulting regression model generalizes well to new MV environments. Our

evaluation results show that our approach achieves competitive results on the

Human3.6M dataset and significantly improves results on a MV clinical dataset

that is the first MV dataset generated from live surgery recordings.

### Bound and Conquer: Improving Triangulation by Enforcing Consistency

Adam Scholefield ,

Alireza Ghasemi ,

Martin Vetterli

Comments: 8 pages, 4 figures, Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

We study the accuracy of triangulation in multi-camera systems with respect

to the number of cameras. We show that, under certain conditions, the optimal

achievable reconstruction error decays quadratically as more cameras are added

to the system. Furthermore, we analyse the error decay-rate of major

state-of-the-art algorithms with respect to the number of cameras. To this end,

we introduce the notion of consistency for triangulation, and show that

consistent reconstruction algorithms achieve the optimal quadratic decay, which

is asymptotically faster than some other methods. Finally, we present

simulations results supporting our findings. Our simulations have been

implemented in MATLAB and the resulting code is available in the supplementary

material.

### Localized Traffic Sign Detection with Multi-scale Deconvolution Networks

Songwen Pei , Fuwu Tang , Yanfei Ji , Jing Fan , Zhong Ning **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Autonomous driving is becoming a future practical lifestyle greatly driven by

deep learning. Specifically, an effective traffic sign detection by deep

learning plays a critical role for it. To address the issues of taking amount

of time to compute complicate algorithm and low ratio of detecting blurred and

sub-pixel images of traffic sign, we propose Multi-scale Deconvolution

Networks(MDN) by flexibly combining multi-scale convolutional neural network

with deconvolution sub-network. It is certified that the MDN is effective by

comparing with classical algorithms on the benchmarks of the localized traffic

sign, such as Chinese Traffic Sign Dataset(CTSD), and the German Traffic Sign

Benchmarks(GTSRB).

### Open Set Domain Adaptation by Backpropagation

Kuniaki Saito , Shohei Yamamoto , Yoshitaka Ushiku , Tatsuya Harada **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Numerous algorithms have been proposed for transferring knowledge from a

label-rich domain (source) to a label-scarce domain (target). Almost all of

them are proposed for a closed-set scenario, where the source and the target

domain completely share the class of their samples. We call the shared class

the doublequote{known class.} However, in practice, when samples in target

domain are not labeled, we cannot know whether the domains share the class. A

target domain can contain samples of classes that are not shared by the source

domain. We call such classes the doublequote{unknown class} and algorithms

that work well in the open set situation are very practical. However, most

existing distribution matching methods for domain adaptation do not work well

in this setting because unknown target samples should not be aligned with the

source.

In this paper, we propose a method for an open set domain adaptation scenario

which utilizes adversarial training. A classifier is trained to make a boundary

between the source and the target samples whereas a generator is trained to

make target samples far from the boundary. Thus, we assign two options to the

feature generator: aligning them with source known samples or rejecting them as

unknown target samples. This approach allows extracting features that separate

unknown target samples from known target samples. Our method was extensively

evaluated in domain adaptation setting and outperformed other methods with a

large margin in most settings.

### Automatic classification of trees using a UAV onboard camera and deep learning

Comments: 9 pages, 3 figures, 5 tables

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Machine Learning (stat.ML)

Automatic classification of trees using remotely sensed data has been a dream

of many scientists and land use managers. Recently, Unmanned aerial vehicles

(UAV) has been expected to be an easy-to-use, cost-effective tool for remote

sensing of forests, and deep learning has attracted attention for its ability

concerning machine vision. In this study, using a commercially available UAV

and a publicly available package for deep learning, we constructed a machine

vision system for the automatic classification of trees. In our method, we

segmented a UAV photography image of forest into individual tree crowns and

carried out object-based deep learning. As a result, the system was able to

classify 7 tree types at 89.0% accuracy. This performance is notable because we

only used basic RGB images from a standard UAV. In contrast, most of previous

studies used expensive hardware such as multispectral imagers to improve the

performance. This result means that our method has the potential to classify

individual trees in a cost-effective manner. This can be a usable tool for many

forest researchers and managements.

### dhSegment: A generic deep-learning approach for document segmentation

Sofia Ares Oliveira ,

Benoit Seguin ,

Frederic Kaplan (Digital Humanities Laboratory, EPFL, Switzerland)

Comments: (*) Equal contribution

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

In recent years there have been multiple successful attempts tackling

document processing problems separately by designing task specific hand-tuned

strategies. We argue that the diversity of historical document processing tasks

prohibits to solve them one at a time and shows a need for designing generic

approaches in order to handle the variability of historical series. In this

paper, we address multiple tasks simultaneously such as page extraction,

baseline extraction, layout analysis or multiple typologies of illustrations

and photograph extraction. We propose an open-source implementation of a

CNN-based pixel-wise predictor coupled with task dependent post-processing

blocks. We show that a single CNN-architecture can be used across tasks with

competitive results. Moreover most of the task-specific post-precessing steps

can be decomposed in a small number of simple and standard reusable operations,

adding to the flexibility of our approach.

### Online Convolutional Sparse Coding with Sample-Dependent Dictionary

Yaqing Wang , Quanming Yao , James T. Kwok , Lionel M. Ni **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Convolutional sparse coding (CSC) has been popularly used for the learning of

shift-invariant dictionaries in image and signal processing. However, existing

methods have limited scalability. In this paper, instead of convolving with a

dictionary shared by all samples, we propose the use of a sample-dependent

dictionary in which filters are obtained as linear combinations of a small set

of base filters learned from the data. This added flexibility allows a large

number of sample-dependent patterns to be captured, while the resultant model

can still be efficiently learned by online learning. Extensive experimental

results show that the proposed method outperforms existing CSC algorithms with

significantly reduced time and space requirements.

### An Element Sensitive Saliency Model with Position Prior Learning for Web Pages

Jie Chang ,

Ya Zhang ,

Yanfeng Wang

Comments: 15 pages, 9 figures, 2 tabels

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

Understanding human visual attention is important for multimedia

applications. Many studies have attempted to learn from eye-tracking data and

build computational saliency prediction models. However, limited efforts have

been devoted to saliency prediction for Web pages, which are characterized by

more diverse content elements and spatial layouts. In this paper, we propose a

novel end-to-end deep generative saliency model for Web pages. To capture

position biases introduced by page layouts, a Position Prior Learning

sub-network is proposed, which models position biases as multivariate Gaussian

distribution using variational auto-encoder. To model different elements of a

Web page, a Multi Discriminative Region Detection (MDRD) branch and a Text

Region Detection(TRD) branch are introduced, which target to extract

discriminative localizations and “prominent” text regions likely to correspond

to human attention, respectively. We validate the proposed model with FiWI, a

public Web-page dataset, and shows that the proposed model outperforms the

state-of-art models for Web-page saliency prediction.

### Stacked U-Nets: A No-Frills Approach to Natural Image Segmentation

Sohil Shah ,

Pallabi Ghosh ,

Larry S Davis ,

Tom Goldstein

Comments: The code is available at this https URL

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Neural and Evolutionary Computing (cs.NE)

Many imaging tasks require global information about all pixels in an image.

Conventional bottom-up classification networks globalize information by

decreasing resolution; features are pooled and downsampled into a single

output. But for semantic segmentation and object detection tasks, a network

must provide higher-resolution pixel-level outputs. To globalize information

while preserving resolution, many researchers propose the inclusion of

sophisticated auxiliary blocks, but these come at the cost of a considerable

increase in network size and computational cost. This paper proposes stacked

u-nets (SUNets), which iteratively combine features from different resolution

scales while maintaining resolution. SUNets leverage the information

globalization power of u-nets in a deeper network architectures that is capable

of handling the complexity of natural images. SUNets perform extremely well on

semantic segmentation tasks using a small number of parameters.

### Latent Fingerprint Recognition: Role of Texture Template

Kai Cao , Anil K. Jain **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

We propose a texture template approach, consisting of a set of virtual

minutiae, to improve the overall latent fingerprint recognition accuracy. To

compensate for the lack of sufficient number of minutiae in poor quality latent

prints, we generate a set of virtual minutiae. However, due to a large number

of these regularly placed virtual minutiae, texture based template matching has

a large computational requirement compared to matching true minutiae templates.

To improve both the accuracy and efficiency of the texture template matching,

we investigate: i) both original and enhanced fingerprint patches for training

convolutional neural networks (ConvNets) to improve the distinctiveness of

descriptors associated with each virtual minutiae, ii) smaller patches around

virtual minutiae and a fast ConvNet architecture to speed up descriptor

extraction, iii) reduce the descriptor length, iv) a modified hierarchical

graph matching strategy to improve the matching speed, and v) extraction of

multiple texture templates to boost the performance. Experiments on NIST SD27

latent database show that the above strategies can improve the matching speed

from 11 ms (24 threads) per comparison (between a latent and a reference print)

to only 7.7 ms (single thread) per comparison while improving the rank-1

accuracy by 8.9% against 10K gallery.

### Adversarial Training of Variational Auto-encoders for High Fidelity Image Generation

Salman H. Khan , Munawar Hayat , Nick Barnes **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Variational auto-encoders (VAEs) provide an attractive solution to image

generation problem. However, they tend to produce blurred and over-smoothed

images due to their dependence on pixel-wise reconstruction loss. This paper

introduces a new approach to alleviate this problem in the VAE based generative

models. Our model simultaneously learns to match the data, reconstruction loss

and the latent distributions of real and fake images to improve the quality of

generated samples. To compute the loss distributions, we introduce an

auto-encoder based discriminator model which allows an adversarial learning

procedure. The discriminator in our model also provides perceptual guidance to

the VAE by matching the learned similarity metric of the real and fake samples

in the latent space. To stabilize the overall training process, our model uses

an error feedback approach to maintain the equilibrium between competing

networks in the model. Our experiments show that the generated samples from our

proposed model exhibit a diverse set of attributes and facial expressions and

scale up to high-resolution images very well.

Pushing the Limits of Unconstrained Face Detection: a Challenge Dataset and Baseline Results

Hajime Nada , Vishwanath A. Sindagi , He Zhang , Vishal M. Patel **Subjects** : Computer Vision and Pattern Recognition (cs.CV)

Face detection has witnessed immense progress in the last few years, with new

milestones being surpassed every year. While many challenges such as large

variations in scale, pose, appearance are successfully addressed, there still

exist several issues which are not specifically captured by existing methods or

datasets. In this work, we identify the next set of challenges that requires

attention from the research community and collect a new dataset of face images

that involve these issues such as weather-based degradations, motion blur,

focus blur and several others. We demonstrate that there is a considerable gap

in the performance of state-of-the-art detectors and real-world requirements.

Hence, in an attempt to fuel further research in unconstrained face detection,

we present a new annotated Unconstrained Face Detection Dataset (UFDD) with

several challenges and benchmark recent methods. Additionally, we provide an

in-depth analysis of the results and failure cases of these methods. The

dataset as well as baseline results will be made publicly available in due

time.

### Variational Regularization of Inverse Problems for Manifold-Valued Data

Martin Storath , Andreas Weinmann **Subjects** : Numerical Analysis (math.NA) ; Computer Vision and Pattern Recognition (cs.CV); Differential Geometry (math.DG)

In this paper, we consider the variational regularization of manifold-valued

data in the inverse problems setting. In particular, we consider TV and TGV

regularization for manifold-valued data with indirect measurement operators. We

provide results on the well-posedness and present algorithms for a numerical

realization of these models in the manifold setup. Further, we provide

experimental results for synthetic and real data to show the potential of the

proposed schemes for applications.

### Network Transplanting

Quanshi Zhang , Yu Yang , Ying Nian Wu , Song-Chun Zhu **Subjects** : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

This paper focuses on a novel problem, i.e., transplanting a

category-and-task-specific neural network to a generic, distributed network

without strong supervision. Like playing LEGO blocks, incrementally

constructing a generic network by asynchronously merging specific neural

networks is a crucial bottleneck for deep learning. Suppose that the

pre-trained specific network contains a module (f) to extract features of the

target category, and the generic network has a module (g) for a target task,

which is trained using other categories except for the target category. Instead

of using numerous training samples to teach the generic network a new category,

we aim to learn a small adapter module to connect (f) and (g) to accomplish the

task on a target category in a weakly-supervised manner. The core challenge is

to efficiently learn feature projections between the two connected modules. We

propose a new distillation algorithm, which exhibited superior performance. Our

method without training samples even significantly outperformed the baseline

with 100 training samples.

Anis Davoudi , Kumar Rohit Malhotra , Benjamin Shickel , Scott Siegel , Seth Williams , Matthew Ruppert , Emel Bihorac , Tezcan Ozrazgat-Baslanti , Patrick J. Tighe , Azra Bihorac , Parisa Rashidi **Subjects** : Human-Computer Interaction (cs.HC) ; Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP)

Currently, many critical care indices are repetitively assessed and recorded

by overburdened nurses, e.g. physical function or facial pain expressions of

nonverbal patients. In addition, many essential information on patients and

their environment are not captured at all, or are captured in a non-granular

manner, e.g. sleep disturbance factors such as bright light, loud background

noise, or excessive visitations. In this pilot study, we examined the

feasibility of using pervasive sensing technology and artificial intelligence

for autonomous and granular monitoring of critically ill patients and their

environment in the Intensive Care Unit (ICU). As an exemplar prevalent

condition, we also characterized delirious and non-delirious patients and their

environment. We used wearable sensors, light and sound sensors, and a

high-resolution camera to collected data on patients and their environment. We

analyzed collected data using deep learning and statistical analysis. Our

system performed face detection, face recognition, facial action unit

detection, head pose detection, facial expression recognition, posture

recognition, actigraphy analysis, sound pressure and light level detection, and

visitation frequency detection. We were able to detect patient’s face (Mean

average precision (mAP)=0.94), recognize patient’s face (mAP=0.80), and their

postures (F1=0.94). We also found that all facial expressions, 11 activity

features, visitation frequency during the day, visitation frequency during the

night, light levels, and sound pressure levels during the night were

significantly different between delirious and non-delirious patients

(p-value<0.05). In summary, we showed that granular and autonomous monitoring

of critically ill patients and their environment is feasible and can be used

for characterizing critical care conditions and related environment factors.

## Artificial Intelligence

Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-sum Objectives

Krishnendu Chatterjee ,

Adrián Elgyütt ,

Petr Novotný ,

Owen Rouillé

Comments: Preliminary version

**Subjects**

:

Artificial Intelligence (cs.AI)

Partially-observable Markov decision processes (POMDPs) with discounted-sum

payoff are a standard framework to model a wide range of problems related to

decision making under uncertainty. Traditionally, the goal has been to obtain

policies that optimize the expectation of the discounted-sum payoff. A key

drawback of the expectation measure is that even low probability events with

extreme payoff can significantly affect the expectation, and thus the obtained

policies are not necessarily risk-averse. An alternate approach is to optimize

the probability that the payoff is above a certain threshold, which allows

obtaining risk-averse policies, but ignores optimization of the expectation. We

consider the expectation optimization with probabilistic guarantee (EOPG)

problem, where the goal is to optimize the expectation ensuring that the payoff

is above a given threshold with at least a specified probability. We present

several results on the EOPG problem, including the first algorithm to solve it.

### Routing Driverless Transport Vehicles in Car Assembly with Answer Set Programming

Martin Gebser ,

Philipp Obermeier ,

Michel Ratsch-Heitmann ,

Mario Runge ,

Torsten Schaub

Comments: Paper presented at the 34nd International Conference on Logic Programming (ICLP 2018), Oxford, UK, July 14 to July 17, 2018; 15 pages, LaTeX, 3 figures

**Subjects**

:

Artificial Intelligence (cs.AI)

Automated storage and retrieval systems are principal components of modern

production and warehouse facilities. In particular, automated guided vehicles

nowadays substitute human-operated pallet trucks in transporting production

materials between storage locations and assembly stations. While low-level

control systems take care of navigating such driverless vehicles along

programmed routes and avoid collisions even under unforeseen circumstances, in

the common case of multiple vehicles sharing the same operation area, the

problem remains how to set up routes such that a collection of transport tasks

is accomplished most effectively. We address this prevalent problem in the

context of car assembly at Mercedes-Benz Ludwigsfelde GmbH, a large-scale

producer of commercial vehicles, where routes for automated guided vehicles

used in the production process have traditionally been hand-coded by human

engineers. Such ad-hoc methods may suffice as long as a running production

process remains in place, while any change in the factory layout or production

targets necessitates tedious manual reconfiguration, not to mention the missing

portability between different production plants. Unlike this, we propose a

declarative approach based on Answer Set Programming to optimize the routes

taken by automated guided vehicles for accomplishing transport tasks. The

advantages include a transparent and executable problem formalization, provable

optimality of routes relative to objective criteria, as well as elaboration

tolerance towards particular factory layouts and production targets. Moreover,

we demonstrate that our approach is efficient enough to deal with the transport

tasks evolving in realistic production processes at the car factory of

Mercedes-Benz Ludwigsfelde GmbH.

### Experimenting with robotic intra-logistics domains

Martin Gebser ,

Philipp Obermeier ,

Thomas Otto ,

Torsten Schaub ,

Orkunt Sabuncu ,

Van Nguyen ,

Tran Cao Son

Comments: Paper presented at the 34nd International Conference on Logic Programming (ICLP 2018), Oxford, UK, July 14 to July 17, 2018 18 pages, LaTeX, 8 PDF figures (arXiv:YYMM.NNNNN)

**Subjects**

:

Artificial Intelligence (cs.AI)

We introduce the asprilo [1] framework to facilitate experimental studies of

approaches addressing complex dynamic applications. For this purpose, we have

chosen the domain of robotic intra-logistics. This domain is not only highly

relevant in the context of today’s fourth industrial revolution but it moreover

combines a multitude of challenging issues within a single uniform framework.

This includes multi-agent planning, reasoning about action, change, resources,

strategies, etc. In return, asprilo allows users to study alternative solutions

as regards effectiveness and scalability. Although asprilo relies on Answer Set

Programming and Python, it is readily usable by any system complying with its

fact-oriented interface format. This makes it attractive for benchmarking and

teaching well beyond logic programming. More precisely, asprilo consists of a

versatile benchmark generator, solution checker and visualizer as well as a

bunch of reference encodings featuring various ASP techniques. Importantly, the

visualizer’s animation capabilities are indispensable for complex scenarios

like intra-logistics in order to inspect valid as well as invalid solution

candidates. Also, it allows for graphically editing benchmark layouts that can

be used as a basis for generating benchmark suites.

[1] asprilo stands for Answer Set Programming for robotic intra-logistics

### Temporal Answer Set Programming on Finite Traces

Pedro Cabalar ,

Roland Kaminski ,

Torsten Schaub ,

Anna Schuhmann

Comments: Paper presented at the 34nd International Conference on Logic Programming (ICLP 2018), Oxford, UK, July 14 to July 17, 2018 15 pages, LaTeX, 0 PDF figures (arXiv:YYMM.NNNNN)

**Subjects**

:

Artificial Intelligence (cs.AI)

In this paper, we introduce an alternative approach to Temporal Answer Set

Programming that relies on a variation of Temporal Equilibrium Logic (TEL) for

finite traces. This approach allows us to even out the expressiveness of TEL

over infinite traces with the computational capacity of (incremental) Answer

Set Programming (ASP). Also, we argue that finite traces are more natural when

reasoning about action and change. As a result, our approach is readily

implementable via multi-shot ASP systems and benefits from an extension of

ASP’s full-fledged input language with temporal operators. This includes future

as well as past operators whose combination offers a rich temporal modeling

language. For computation, we identify the class of temporal logic programs and

prove that it constitutes a normal form for our approach. Finally, we outline

two implementations, a generic one and an extension of clingo.

### An improvement of the convergence proof of the ADAM-Optimizer

Sebastian Bock , Josef Goppold , Martin Weiß **Subjects** : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

A common way to train neural networks is the Backpropagation. This algorithm

includes a gradient descent method, which needs an adaptive step size. In the

area of neural networks, the ADAM-Optimizer is one of the most popular adaptive

step size methods. It was invented in cite{Kingma.2015} by Kingma and Ba. The

(5865) citations in only three years shows additionally the importance of the

given paper. We discovered that the given convergence proof of the optimizer

contains some mistakes, so that the proof will be wrong. In this paper we give

an improvement to the convergence proof of the ADAM-Optimizer.

### Persistent Monitoring of Stochastic Spatio-temporal Phenomena with a Small Team of Robots

Comments: Robotics Science and Systems, 2014 (RSS-14)

**Subjects**

:

Robotics (cs.RO)

; Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

This paper presents a solution for persistent monitoring of real-world

stochastic phenomena, where the underlying covariance structure changes sharply

across time, using a small number of mobile robot sensors. We propose an

adaptive solution for the problem where stochastic real-world dynamics are

modeled as a Gaussian Process (GP). The belief on the underlying covariance

structure is learned from recently observed dynamics as a Gaussian Mixture (GM)

in the low-dimensional hyper-parameters space of the GP and adapted across time

using Sequential Monte Carlo methods. Each robot samples a belief point from

the GM and locally optimizes a set of informative regions by greedy

maximization of the submodular entropy function. The key contributions of this

paper are threefold: adapting the belief on the covariance using Markov Chain

Monte Carlo (MCMC) sampling such that particles survive even under sharp

covariance changes across time; exploiting the belief to transform the problem

of entropy maximization into a decentralized one; and developing an

approximation algorithm to maximize entropy on a set of informative regions in

the continuous space. We illustrate the application of the proposed solution

through extensive simulations using an artificial dataset and multiple real

datasets from fixed sensor deployments, and compare it to three competing

state-of-the-art approaches.

### Interaction-Aware Probabilistic Behavior Prediction in Urban Environments

Jens Schulz ,

Constantin Hubmann ,

Julian Löchner ,

Darius Burschka

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

**Subjects**

:

Robotics (cs.RO)

; Artificial Intelligence (cs.AI)

Planning for autonomous driving in complex, urban scenarios requires accurate

trajectory prediction of the surrounding drivers. Their future behavior depends

on their route intentions, the road-geometry, traffic rules and mutual

interaction, resulting in interdependencies between their trajectories. We

present a probabilistic prediction framework based on a dynamic Bayesian

network, which represents the state of the complete scene including all agents

and respects the aforementioned dependencies. We propose Markovian,

context-dependent motion models to define the interaction-aware behavior of

drivers. At first, the state of the dynamic Bayesian network is estimated over

time by tracking the single agents via sequential Monte Carlo inference.

Secondly, we perform a probabilistic forward simulation of the network’s

estimated belief state to generate the different combinatorial scene

developments. This provides the corresponding trajectories for the set of

possible, future scenes. Our framework can handle various road layouts and

number of traffic participants. We evaluate the approach in online simulations

and real-world scenarios. It is shown that our interaction-aware prediction

outperforms interaction-unaware physics- and map-based approaches.

### Generalized Logical Operations among Conditional Events

Angelo Gilio , Giuseppe Sanfilippo **Subjects** : Probability (math.PR) ; Artificial Intelligence (cs.AI); Logic (math.LO)

We generalize, by a progressive procedure, the notions of conjunction and

disjunction of two conditional events to the case of (n) conditional events. In

our coherence-based approach, conjunctions and disjunctions are suitable

conditional random quantities. We define the notion of negation, by verifying

De Morgan’s Laws. We also show that conjunction and disjunction satisfy the

associative and commutative properties, and a monotonicity property. Then, we

give some results on coherence of prevision assessments for some families of

compounded conditionals; in particular we examine the Fr’echet-Hoeffding

bounds. Moreover, we study the reverse probabilistic inference from the

conjunction (mathcal{C}_{n+1}) of (n+1) conditional events to the family

({mathcal{C}_{n},E_{n+1}|H_{n+1}}). We consider the relation with the notion

of quasi-conjunction and we examine in detail the coherence of the prevision

assessments related with the conjunction of three conditional events. Based on

conjunction, we also give a characterization of p-consistency and of

p-entailment, with applications to several inference rules in probabilistic

nonmonotonic reasoning. Finally, we examine some non p-valid inference rules;

then, we illustrate by an example two methods which allow to suitably modify

non p-valid inference rules in order to get inferences which are p-valid.

Shabnam Sadeghi Esfahlani ,

Silvia Cirstea ,

Alireza Sanaei ,

George Wilson

Comments: 8 pages, 7 figures, 17096690

Journal-ref: 19-21 June 2017; 978-1-5090-1412-5; 978-1-5090-1413-2; 2163-5145

**Subjects**

:

Human-Computer Interaction (cs.HC)

; Artificial Intelligence (cs.AI); Robotics (cs.RO)

Rehabilitation robotics combined with video game technology provides a means

of assisting in the rehabilitation of patients with neuromuscular disorders by

performing various facilitation movements. The current work presents ReHabGame,

a serious game using a fusion of implemented technologies that can be easily

used by patients and therapists to assess and enhance sensorimotor performance

and also increase the activities in the daily lives of patients. The game

allows a player to control avatar movements through a Kinect Xbox, Myo armband

and rudder foot pedal, and involves a series of reach-grasp-collect tasks whose

difficulty levels are learnt by a fuzzy interface. The orientation, angular

velocity, head and spine tilts and other data generated by the player are

monitored and saved, whilst the task completion is calculated by solving an

inverse kinematics algorithm which orientates the upper limb joints of the

avatar. The different values in upper body quantities of movement provide fuzzy

input from which crisp output is determined and used to generate an appropriate

subsequent rehabilitation game level. The system can thus provide personalised,

autonomously-learnt rehabilitation programmes for patients with neuromuscular

disorders with superior predictions to guide the development of improved

clinical protocols compared to traditional theraputic activities.

### Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

Jie Tan ,

Tingnan Zhang ,

Erwin Coumans ,

Atil Iscen ,

Yunfei Bai ,

Danijar Hafner ,

Steven Bohez ,

Vincent Vanhoucke

Comments: Accompanying video: this https URL

**Subjects**

:

Robotics (cs.RO)

; Artificial Intelligence (cs.AI)

Designing agile locomotion for quadruped robots often requires extensive

expertise and tedious manual tuning. In this paper, we present a system to

automate this process by leveraging deep reinforcement learning techniques. Our

system can learn quadruped locomotion from scratch using simple reward signals.

In addition, users can provide an open loop reference to guide the learning

process when more control over the learned gait is needed. The control policies

are learned in a physics simulator and then deployed on real robots. In

robotics, policies trained in simulation often do not transfer to the real

world. We narrow this reality gap by improving the physics simulator and

learning robust policies. We improve the simulation using system

identification, developing an accurate actuator model and simulating latency.

We learn robust controllers by randomizing the physical environments, adding

perturbations and designing a compact observation space. We evaluate our system

on two agile locomotion gaits: trotting and galloping. After learning in

simulation, a quadruped robot can successfully perform both gaits in the real

world.

### Sounding Board: A User-Centric and Content-Driven Social Chatbot

Hao Fang ,

Hao Cheng ,

Maarten Sap ,

Elizabeth Clark ,

Ari Holtzman ,

Yejin Choi ,

Noah A. Smith ,

Mari Ostendorf

Comments: 5 pages, 3 figures, NAACL 2018

**Subjects**

:

Human-Computer Interaction (cs.HC)

; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We present Sounding Board, a social chatbot that won the 2017 Amazon Alexa

Prize. The system architecture consists of several components including spoken

language processing, dialogue management, language generation, and content

management, with emphasis on user-centric and content-driven design. We also

share insights gained from large-scale online logs based on 160,000

conversations with real-world users.

## Information Retrieval

Nikolaos Polatidis , Elias Pimenidis , Michalis Pavlidis , Spyridon Papastergiou , Haralambos Mouratidis **Subjects** : Information Retrieval (cs.IR)

Modern information society depends on reliable functionality of information

systems infrastructure, while at the same time the number of cyber-attacks has

been increasing over the years and damages have been caused. Furthermore,

graphs can be used to show paths than can be exploited by attackers to intrude

into systems and gain unauthorized access through vulnerability exploitation.

This paper presents a method that builds attack graphs using data supplied from

the maritime supply chain infrastructure. The method delivers all possible

paths that can be exploited to gain access. Then, a recommendation system is

utilized to make predictions about future attack steps within the network. We

show that recommender systems can be used in cyber defense by predicting

attacks. The goal of this paper is to identify attack paths and show how a

recommendation method can be used to classify future cyber-attacks in terms of

risk management. The proposed method has been experimentally evaluated and

validated, with the results showing that it is both practical and effective.

## Computation and Language

Tianze Shi ,

Carlos Gómez-Rodríguez ,

Lillian Lee

Comments: Proceedings of NAACL-HLT 2018. 6 pages

Journal-ref: Proceedings of NAACL-HLT 2018

**Subjects**

:

Computation and Language (cs.CL)

We generalize Cohen, G’omez-Rodr’iguez, and Satta’s (2011) parser to a

family of non-projective transition-based dependency parsers allowing

polynomial-time exact inference. This includes novel parsers with better

coverage than Cohen et al. (2011), and even a variant that reduces time

complexity to (O(n^6)), improving over the known bounds in exact inference for

non-projective transition-based parsing. We hope that this piece of theoretical

work inspires design of novel transition systems with better coverage and

better run-time guarantees.

Code available at this https URL

### Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading

Martin Raison , Pierre-Emmanuel Mazaré , Rajarshi Das , Antoine Bordes **Subjects** : Computation and Language (cs.CL)

This paper aims at improving how machines can answer questions directly from

text, with the focus of having models that can answer correctly multiple types

of questions and from various types of texts, documents or even from large

collections of them. To that end, we introduce the Weaver model that uses a new

way to relate a question to a textual context by weaving layers of recurrent

networks, with the goal of making as few assumptions as possible as to how the

information from both question and context should be combined to form the

answer. We show empirically on six datasets that Weaver performs well in

multiple conditions. For instance, it produces solid results on the very

popular SQuAD dataset (Rajpurkar et al., 2016), solves almost all bAbI tasks

(Weston et al., 2015) and greatly outperforms state-of-the-art methods for open

domain question answering from text (Chen et al., 2017).

### Extracting Parallel Paragraphs from Common Crawl

Jakub Kúdela ,

Irena Holubová ,

Ondřej Bojar

Comments: Accepted to the Prague Bulletin of Mathematical Linguistics 107, April 2017

Journal-ref: The Prague Bulletin of Mathematical Linguistics, Volume 107, Issue

1, Pages 39-56, ISSN (Online) 1804-0462 (2017)

**Subjects**

:

Computation and Language (cs.CL)

Most of the current methods for mining parallel texts from the web assume

that web pages of web sites share same structure across languages. We believe

that there still exists a non-negligible amount of parallel data spread across

sources not satisfying this assumption. We propose an approach based on a

combination of bivec (a bilingual extension of word2vec) and locality-sensitive

hashing which allows us to efficiently identify pairs of parallel segments

located anywhere on pages of a given web domain, regardless their structure. We

validate our method on realigning segments from a large parallel corpus.

Another experiment with real-world data provided by Common Crawl Foundation

confirms that our solution scales to hundreds of terabytes large set of

web-crawled data.

### End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction

Zhong-Qiu Wang ,

Jonathan Le Roux ,

DeLiang Wang ,

John R. Hershey

Comments: Submitted to Interspeech 2018

**Subjects**

:

Sound (cs.SD)

; Computation and Language (cs.CL); Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

This paper proposes an end-to-end approach for single-channel

speaker-independent multi-speaker speech separation, where time-frequency (T-F)

masking, the short-time Fourier transform (STFT), and its inverse are

represented as layers within a deep network. Previous approaches, rather than

computing a loss on the reconstructed signal, used a surrogate loss based on

the target STFT magnitudes. This ignores reconstruction error introduced by

phase inconsistency. In our approach, the loss function is directly defined on

the reconstructed signals, which are optimized for best separation. In

addition, we train through unfolded iterations of a phase reconstruction

algorithm, represented as a series of STFT and inverse STFT layers. While mask

values are typically limited to lie between zero and one for approaches using

the mixture phase for reconstruction, this limitation is less relevant if the

estimated magnitudes are to be used together with phase reconstruction. We thus

propose several novel activation functions for the output layer of the T-F

masking, to allow mask values beyond one. On the publicly-available wsj0-2mix

dataset, our approach achieves state-of-the-art 12.6 dB scale-invariant

signal-to-distortion ratio (SI-SDR) and 13.1 dB SDR, revealing new

possibilities for deep learning based phase reconstruction and representing a

fundamental progress towards solving the notoriously-hard cocktail party

problem.

### Sounding Board: A User-Centric and Content-Driven Social Chatbot

Hao Fang ,

Hao Cheng ,

Maarten Sap ,

Elizabeth Clark ,

Ari Holtzman ,

Yejin Choi ,

Noah A. Smith ,

Mari Ostendorf

Comments: 5 pages, 3 figures, NAACL 2018

**Subjects**

:

Human-Computer Interaction (cs.HC)

; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

We present Sounding Board, a social chatbot that won the 2017 Amazon Alexa

Prize. The system architecture consists of several components including spoken

language processing, dialogue management, language generation, and content

management, with emphasis on user-centric and content-driven design. We also

share insights gained from large-scale online logs based on 160,000

conversations with real-world users.

## Distributed, Parallel, and Cluster Computing

### Recoverable Consensus in Shared Memory

Wojciech Golab **Subjects** : Distributed, Parallel, and Cluster Computing (cs.DC)

Herlihy’s consensus hierarchy is one of the most widely cited results in

distributed computing theory. It ranks the power of various synchronization

primitives for solving consensus in a model where asynchronous processes

communicate through shared memory and fail by halting. This paper revisits the

consensus hierarchy in a model with crash-recovery failures, where the

specification of consensus, called emph{recoverable consensus} in this paper,

is weakened by allowing non-terminating executions when a process fails

infinitely often. Two variations of this model are considered: independent

failures, and simultaneous (i.e., system-wide) failures. Universal primitives

such as Compare-And-Swap solve consensus easily in both models, and so the

contributions of the paper focus on lower levels of the hierarchy. We make

three contributions in that regard: (i) We prove that any primitive at level

two of Herlihy’s hierarchy remains at level two if simultaneous crash-recovery

failures are introduced. This is accomplished by transforming (one instance of)

any 2-process conventional consensus algorithm to a 2-process recoverable

consensus algorithm. (ii) For any (n > 1) and (f > 0), we show how to use (f+1)

instances of any conventional (n)-process consensus algorithm and (Theta(f +

n)) read/write registers to solve (n)-process recoverable consensus when

crash-recovery failures are independent, assuming that every execution contains

at most (f) such failures. (iii) Lastly, we prove for any (f > 0) that any

2-process recoverable consensus algorithm that uses TAS and read/writer

registers requires at least (f+1) TAS objects, assuming that crash-recovery

failures are independent and every execution contains at most (f) such failures

per process (or at most (2f) failures total).

### Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication

Ankur Mallick , Malhar Chaudhari , Gauri Joshi **Subjects** : Distributed, Parallel, and Cluster Computing (cs.DC) ; Information Theory (cs.IT)

Large-scale machine learning and data mining applications require computer

systems to perform massive computations that need to be parallelized across

multiple nodes, for example, massive matrix-vector and matrix-matrix

multiplication. The presence of straggling nodes — computing nodes that

unpredictably slowdown or fail — is a major bottleneck in such distributed

computations. We propose a emph{rateless fountain coding} strategy to

alleviate the problem of stragglers in distributed matrix-vector

multiplication. Our algorithm creates a stream of linear combinations of the

(m) rows of the matrix, and assigns them to different worker nodes, which then

perform row-vector products with the encoded rows. The original matrix-vector

product can be decoded as soon as slightly more than (m) row-vector products

are collectively finished by the nodes. This strategy enables fast nodes to

steal work from slow nodes, without requiring the master to perform any dynamic

load-balancing. Compared to recently proposed fixed-rate erasure coding

strategies which ignore partial work done by straggling nodes, rateless coding

achieves significantly lower overall delay, as well as small computational and

decoding overhead.

### History-Preserving Bisimulations on Reversible Calculus of Communicating Systems

Clément Aubert , Ioana Cristescu (TAMIS) **Subjects** : Logic in Computer Science (cs.LO) ; Distributed, Parallel, and Cluster Computing (cs.DC); Formal Languages and Automata Theory (cs.FL)

History-and hereditary history-preserving bisimulation (HPB and HHPB) are

equivalences relations for denotational models of concurrency. Finding their

counterpart in process algebras is an open problem, with some partial

successes: there exists in calculus of communicating systems (CCS) an

equivalence based on causal trees that corresponds to HPB. In Reversible CSS

(RCCS), there is a bisimulation that corresponds to HHPB, but it considers only

processes without auto-concurrency. We propose equivalences on CCS with

auto-concurrency that correspond to HPB and HHPB, and their so-called “weak”

variants. The equivalences exploit not only reversibility but also the memory

mechanism of RCCS.

### Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

Feiwen Zhu ,

Jeff Pool ,

Michael Andersch ,

Jeremy Appleyard ,

Fung Xie

Comments: Published as a conference paper at ICLR 2018

**Subjects**

:

Neural and Evolutionary Computing (cs.NE)

; Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Recurrent Neural Networks (RNNs) are powerful tools for solving

sequence-based problems, but their efficacy and execution time are dependent on

the size of the network. Following recent work in simplifying these networks

with model pruning and a novel mapping of work onto GPUs, we design an

efficient implementation for sparse RNNs. We investigate several optimizations

and tradeoffs: Lamport timestamps, wide memory loads, and a bank-aware weight

layout. With these optimizations, we achieve speedups of over 6x over the next

best algorithm for a hidden layer of size 2304, batch size of 4, and a density

of 30%. Further, our technique allows for models of over 5x the size to fit on

a GPU for a speedup of 2x, enabling larger networks to help advance the

state-of-the-art. We perform case studies on NMT and speech recognition tasks

in the appendix, accelerating their recurrent layers by up to 3x.

## Learning

### An improvement of the convergence proof of the ADAM-Optimizer

Sebastian Bock , Josef Goppold , Martin Weiß **Subjects** : Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

A common way to train neural networks is the Backpropagation. This algorithm

includes a gradient descent method, which needs an adaptive step size. In the

area of neural networks, the ADAM-Optimizer is one of the most popular adaptive

step size methods. It was invented in cite{Kingma.2015} by Kingma and Ba. The

(5865) citations in only three years shows additionally the importance of the

given paper. We discovered that the given convergence proof of the optimizer

contains some mistakes, so that the proof will be wrong. In this paper we give

an improvement to the convergence proof of the ADAM-Optimizer.

### Decoupled Parallel Backpropagation with Convergence Guarantee

Zhouyuan Huo , Bin Gu , Qian Yang , Heng Huang **Subjects** : Learning (cs.LG) ; Machine Learning (stat.ML)

Backpropagation algorithm is indispensable for the training of feedforward

neural networks. It requires propagating error gradients sequentially from the

output layer all the way back to the input layer. The backward locking in

backpropagation algorithm constrains us from updating network layers in

parallel and fully leveraging the computing resources. Recently, several

algorithms have been proposed for breaking the backward locking. However, their

performances degrade seriously when networks are deep. In this paper, we

propose decoupled parallel backpropagation algorithm for deep learning

optimization with convergence guarantee. Firstly, we decouple the

backpropagation algorithm using delayed gradients, and show that the backward

locking is removed when we split the networks into multiple modules. Then, we

utilize decoupled parallel backpropagation in two stochastic methods and prove

that our method guarantees convergence to critical points for the non-convex

problem. Finally, we perform experiments for training deep convolutional neural

networks on benchmark datasets. The experimental results not only confirm our

theoretical analysis, but also demonstrate that the proposed method can achieve

significant speedup without loss of accuracy.

### Learning Non-Stationary Space-Time Models for Environmental Monitoring

Sahil Garg ,

Amarjeet Singh ,

Fabio Ramos

Comments: AAAI-12

**Subjects**

:

Learning (cs.LG)

; Machine Learning (stat.ML)

One of the primary aspects of sustainable development involves accurate

understanding and modeling of environmental phenomena. Many of these phenomena

exhibit variations in both space and time and it is imperative to develop a

deeper understanding of techniques that can model space-time dynamics

accurately. In this paper we propose NOSTILL-GP – NOn-stationary Space TIme

variable Latent Length scale GP, a generic non-stationary, spatio-temporal

Gaussian Process (GP) model. We present several strategies, for efficient

training of our model, necessary for real-world applicability. Extensive

empirical validation is performed using three real-world environmental

monitoring datasets, with diverse dynamics across space and time. Results from

the experiments clearly demonstrate general applicability and effectiveness of

our approach for applications in environmental monitoring.

### Offline Evaluation of Ranking Policies with Click Models

Shuai Li , Yasin Abbasi-Yadkori , Branislav Kveton , S. Muthukrishnan , Vishwa Vinay , Zheng Wen **Subjects** : Learning (cs.LG) ; Machine Learning (stat.ML)

Many web systems rank and present a list of items to users, from recommender

systems to search and advertising. An important problem in practice is to

evaluate new ranking policies offline and optimize them before they are

deployed. We address this problem by proposing new evaluation algorithms for

estimating the expected number of clicks on ranked lists from stored logs of

past results. The existing algorithms are not guaranteed to be statistically

efficient in our problem because the number of recommended lists can grow

exponentially with their length. To overcome this challenge, we use models of

user interaction with the list of items, the so-called click models, to

construct estimators that learn statistically efficiently. We analyze our

estimators and prove that they are more efficient than the estimators that do

not use the structure of the click model, under the assumption that the click

model holds. We evaluate our estimators in a series of experiments on a

real-world dataset and show that they consistently outperform prior estimators.

### Scalable Bilinear (π) Learning Using State and Action Features

Yichen Chen , Lihong Li , Mengdi Wang **Subjects** : Learning (cs.LG) ; Optimization and Control (math.OC); Machine Learning (stat.ML)

Approximate linear programming (ALP) represents one of the major algorithmic

families to solve large-scale Markov decision processes (MDP). In this work, we

study a primal-dual formulation of the ALP, and develop a scalable, model-free

algorithm called bilinear (pi) learning for reinforcement learning when a

sampling oracle is provided. This algorithm enjoys a number of advantages.

First, it adopts (bi)linear models to represent the high-dimensional value

function and state-action distributions, using given state and action features.

Its run-time complexity depends on the number of features, not the size of the

underlying MDPs. Second, it operates in a fully online fashion without having

to store any sample, thus having minimal memory footprint. Third, we prove that

it is sample-efficient, solving for the optimal policy to high precision with a

sample complexity linear in the dimension of the parameter space.

### Efficiently Learning Nonstationary Gaussian Processes

Comments: This report was prepared in 2013

**Subjects**

:

Learning (cs.LG)

; Machine Learning (stat.ML)

Most real world phenomena such as sunlight distribution under a forest

canopy, minerals concentration, stock valuation, exhibit nonstationary dynamics

i.e. phenomenon variation changes depending on the locality. Nonstationary

dynamics pose both theoretical and practical challenges to statistical machine

learning algorithms that aim to accurately capture the complexities governing

the evolution of such processes. Typically the nonstationary dynamics are

modeled using nonstationary Gaussian Process models (NGPS) that employ local

latent dynamics parameterization to correspondingly model the nonstationary

real observable dynamics. Recently, an approach based on most likely induced

latent dynamics representation attracted research community’s attention for a

while. The approach could not be employed for large scale real world

applications because learning a most likely latent dynamics representation

involves maximization of marginal likelihood of the observed real dynamics that

becomes intractable as the number of induced latent points grows with problem

size. We have established a direct relationship between informativeness of the

induced latent dynamics and the marginal likelihood of the observed real

dynamics. This opens up the possibility of maximizing marginal likelihood of

observed real dynamics indirectly by near optimally maximizing entropy or

mutual information gain on the induced latent dynamics using greedy algorithms.

Therefore, for an efficient yet accurate inference, we propose to build an

induced latent dynamics representation using a novel algorithm LISAL that

adaptively maximizes entropy or mutual information on the induced latent

dynamics and marginal likelihood of observed real dynamics in an iterative

manner. The relevance of LISAL is validated using real world sensing datasets.

### Adaptive Sensing for Learning Nonstationary Environment Models

Sahil Garg ,

Amarjeet Singh ,

Fabio Ramos

Comments: ArXiv version of the paper written in 2013

**Subjects**

:

Learning (cs.LG)

; Machine Learning (stat.ML)

Most environmental phenomena, such as wind profiles, ozone concentration and

sunlight distribution under a forest canopy, exhibit nonstationary dynamics

i.e. phenomenon variation change depending on the location and time of

occurrence. Non-stationary dynamics pose both theoretical and practical

challenges to statistical machine learning algorithms aiming to accurately

capture the complexities governing the evolution of such processes. In this

paper, we address the sampling aspects of the problem of learning nonstationary

spatio-temporal models, and propose an efficient yet simple algorithm – LISAL.

The core idea in LISAL is to learn two models using Gaussian processes (GPs)

wherein the first is a nonstationary GP directly modeling the phenomenon. The

second model uses a stationary GP representing a latent space corresponding to

changes in dynamics, or the nonstationarity characteristics of the first model.

LISAL involves adaptively sampling the latent space dynamics using information

theory quantities to reduce the computational cost during the learning phase.

The relevance of LISAL is extensively validated using multiple real world

datasets.

### Network Transplanting

Quanshi Zhang , Yu Yang , Ying Nian Wu , Song-Chun Zhu **Subjects** : Learning (cs.LG) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

This paper focuses on a novel problem, i.e., transplanting a

category-and-task-specific neural network to a generic, distributed network

without strong supervision. Like playing LEGO blocks, incrementally

constructing a generic network by asynchronously merging specific neural

networks is a crucial bottleneck for deep learning. Suppose that the

pre-trained specific network contains a module (f) to extract features of the

target category, and the generic network has a module (g) for a target task,

which is trained using other categories except for the target category. Instead

of using numerous training samples to teach the generic network a new category,

we aim to learn a small adapter module to connect (f) and (g) to accomplish the

task on a target category in a weakly-supervised manner. The core challenge is

to efficiently learn feature projections between the two connected modules. We

propose a new distillation algorithm, which exhibited superior performance. Our

method without training samples even significantly outperformed the baseline

with 100 training samples.

### Persistent Monitoring of Stochastic Spatio-temporal Phenomena with a Small Team of Robots

Comments: Robotics Science and Systems, 2014 (RSS-14)

**Subjects**

:

Robotics (cs.RO)

; Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

This paper presents a solution for persistent monitoring of real-world

stochastic phenomena, where the underlying covariance structure changes sharply

across time, using a small number of mobile robot sensors. We propose an

adaptive solution for the problem where stochastic real-world dynamics are

modeled as a Gaussian Process (GP). The belief on the underlying covariance

structure is learned from recently observed dynamics as a Gaussian Mixture (GM)

in the low-dimensional hyper-parameters space of the GP and adapted across time

using Sequential Monte Carlo methods. Each robot samples a belief point from

the GM and locally optimizes a set of informative regions by greedy

maximization of the submodular entropy function. The key contributions of this

paper are threefold: adapting the belief on the covariance using Markov Chain

Monte Carlo (MCMC) sampling such that particles survive even under sharp

covariance changes across time; exploiting the belief to transform the problem

of entropy maximization into a decentralized one; and developing an

approximation algorithm to maximize entropy on a set of informative regions in

the continuous space. We illustrate the application of the proposed solution

through extensive simulations using an artificial dataset and multiple real

datasets from fixed sensor deployments, and compare it to three competing

state-of-the-art approaches.

### Using Machine Learning to Improve Cylindrical Algebraic Decomposition

Zongyan Huang ,

Matthew England ,

David Wilson ,

James H. Davenport ,

Lawrence C. Paulson

Comments: arXiv admin note: text overlap with arXiv:1608.04219 , arXiv:1404.6369

**Subjects**

:

Symbolic Computation (cs.SC)

; Learning (cs.LG)

Cylindrical Algebraic Decomposition (CAD) is a key tool in computational

algebraic geometry, best known as a procedure to enable Quantifier Elimination

over real-closed fields. However, it has a worst case complexity doubly

exponential in the size of the input, which is often encountered in practice.

It has been observed that for many problems a change in algorithm settings or

problem formulation can cause huge differences in runtime costs, changing

problem instances from intractable to easy. A number of heuristics have been

developed to help with such choices, but the complicated nature of the

geometric relationships involved means these are imperfect and can sometimes

make poor choices. We investigate the use of machine learning (specifically

support vector machines) to make such choices instead.

Machine learning is the process of fitting a computer model to a complex

function based on properties learned from measured data. In this paper we apply

it in two case studies: the first to select between heuristics for choosing a

CAD variable ordering; the second to identify when a CAD problem instance would

benefit from Groebner Basis preconditioning. These appear to be the first such

applications of machine learning to Symbolic Computation. We demonstrate in

both cases that the machine learned choice outperforms human developed

heuristics.

Xi Chen ,

Ali Ghadirzadeh ,

John Folkesson ,

Patric Jensfelt

Comments: Submitted to IROS 2018

**Subjects**

:

Robotics (cs.RO)

; Learning (cs.LG); Machine Learning (stat.ML)

Mobile robot navigation in complex and dynamic environments is a challenging

but important problem. Reinforcement learning approaches fail to solve these

tasks efficiently due to reward sparsities, temporal complexities and

high-dimensionality of sensorimotor spaces which are inherent in such problems.

We present a novel approach to train action policies to acquire navigation

skills for wheel-legged robots using deep reinforcement learning. The policy

maps height-map image observations to motor commands to navigate to a target

position while avoiding obstacles. We propose to acquire the multifaceted

navigation skill by learning and exploiting a number of manageable navigation

behaviors. We also introduce a domain randomization technique to improve the

versatility of the training samples. We demonstrate experimentally a

significant improvement in terms of data-efficiency, success rate, robustness

against irrelevant sensory data, and also the quality of the maneuver skills.

### Method to assess the functional role of noisy brain signals by mining envelope dynamics

Andreas Meinel , Henrich Kolkhorst , Michael Tangermann **Subjects** : Signal Processing (eess.SP) ; Learning (cs.LG); Neurons and Cognition (q-bio.NC); Applications (stat.AP); Machine Learning (stat.ML)

Data-driven spatial filtering approaches are commonly used to assess rhythmic

brain activity from multichannel recordings such as electroencephalography

(EEG). As spatial filter estimation is prone to noise, non-stationarity effects

and limited data, a high model variability induced by slight changes of, e.g.,

involved hyperparameters is generally encountered. These aspects challenge the

assessment of functionally relevant features which are of special importance in

closed-loop applications as, e.g., in the field of rehabilitation. We propose a

data-driven method to identify groups of reliable and functionally relevant

oscillatory components computed by a spatial filtering approach. Therefore, we

initially embrace the variability of decoding models in a large configuration

space before condensing information by density-based clustering of components’

functional signatures. Exemplified for a hand force task with rich within-trial

structure, the approach was evaluated on EEG data of 18 healthy subjects. We

found that functional characteristics of single components are revealed by

distinct temporal dynamics of their event-related power changes. Based on a

within-subject analysis, our clustering revealed seven groups of homogeneous

envelope dynamics on average. To support introspection by practitioners, we

provide a set of metrics to characterize and validate single clusterings. We

show that identified clusters contain components of strictly confined frequency

ranges, dominated by the alpha and beta band. Our method is applicable to any

spatial filtering algorithm. Despite high model variability, it allows

capturing and monitoring relevant oscillatory features. We foresee its

application in closed-loop applications such as brain-computer interface based

protocols in stroke rehabilitation.

### Automatic classification of trees using a UAV onboard camera and deep learning

Comments: 9 pages, 3 figures, 5 tables

**Subjects**

:

Computer Vision and Pattern Recognition (cs.CV)

; Learning (cs.LG); Machine Learning (stat.ML)

Automatic classification of trees using remotely sensed data has been a dream

of many scientists and land use managers. Recently, Unmanned aerial vehicles

(UAV) has been expected to be an easy-to-use, cost-effective tool for remote

sensing of forests, and deep learning has attracted attention for its ability

concerning machine vision. In this study, using a commercially available UAV

and a publicly available package for deep learning, we constructed a machine

vision system for the automatic classification of trees. In our method, we

segmented a UAV photography image of forest into individual tree crowns and

carried out object-based deep learning. As a result, the system was able to

classify 7 tree types at 89.0% accuracy. This performance is notable because we

only used basic RGB images from a standard UAV. In contrast, most of previous

studies used expensive hardware such as multispectral imagers to improve the

performance. This result means that our method has the potential to classify

individual trees in a cost-effective manner. This can be a usable tool for many

forest researchers and managements.

### Event Forecasting with Pattern Markov Chains

Elias Alevizos , Alexander Artikis , Georgios Paliouras **Subjects** : Databases (cs.DB) ; Learning (cs.LG)

We present a system for online probabilistic event forecasting. We assume

that a user is interested in detecting and forecasting event patterns, given in

the form of regular expressions. Our system can consume streams of events and

forecast when the pattern is expected to be fully matched. As more events are

consumed, the system revises its forecasts to reflect possible changes in the

state of the pattern. The framework of Pattern Markov Chains is used in order

to learn a probabilistic model for the pattern, with which forecasts with

guaranteed precision may be produced, in the form of intervals within which a

full match is expected. Experimental results from real-world datasets are shown

and the quality of the produced forecasts is explored, using both precision

scores and two other metrics: spread, which refers to the “focusing resolution”

of a forecast (interval length), and distance, which captures how early a

forecast is reported.

Marc-Antoine Moinnereau , Thomas Brienne , Simon Brodeur , Jean Rouat , Kevin Whittingstall , Eric Plourde **Subjects** : Signal Processing (eess.SP) ; Learning (cs.LG)

The use of electroencephalogram (EEG) as the main input signal in

brain-machine interfaces has been widely proposed due to the non-invasive

nature of the EEG. Here we are specifically interested in interfaces that

extract information from the auditory system and more specifically in the task

of classifying heard speech from EEGs. To do so, we propose to limit the

preprocessing of the EEGs and use machine learning approaches to automatically

extract their meaningful characteristics. More specifically, we use a regulated

recurrent neural network (RNN) reservoir, which has been shown to outperform

classic machine learning approaches when applied to several different

bio-signals, and we compare it with a deep neural network approach. Moreover,

we also investigate the classification performance as a function of the number

of EEG electrodes. A set of 8 subjects were presented randomly with 3 different

auditory stimuli (English vowels a, i and u). We obtained an excellent

classification rate of 83.2% with the RNN when considering all 64 electrodes. A

rate of 81.7% was achieved with only 10 electrodes.

### Distributed Differentially-Private Algorithms for Matrix and Tensor Factorization

Hafiz Imtiaz ,

Anand D. Sarwate

Comments: 39 pages, in review for publication

**Subjects**

:

Machine Learning (stat.ML)

; Learning (cs.LG)

In many signal processing and machine learning applications, datasets

containing private information are held at different locations, requiring the

development of distributed privacy-preserving algorithms. Tensor and matrix

factorizations are key components of many processing pipelines. In the

distributed setting, differentially private algorithms suffer because they

introduce noise to guarantee privacy. This paper designs new and improved

distributed and differentially private algorithms for two popular matrix and

tensor factorization methods: principal component analysis (PCA) and orthogonal

tensor decomposition (OTD). The new algorithms employ a correlated noise design

scheme to alleviate the effects of noise and can achieve the same noise level

as the centralized scenario. Experiments on synthetic and real data illustrate

the regimes in which the correlated noise allows performance matching with the

centralized setting, outperforming previous methods and demonstrating that

meaningful utility is possible while guaranteeing differential privacy.

BISTA: a Bregmanian proximal gradient method without the global Lipschitz continuity assumption

Daniel Reem , Simeon Reich , Alvaro De Pierro **Subjects** : Optimization and Control (math.OC) ; Learning (cs.LG); Functional Analysis (math.FA); Numerical Analysis (math.NA)

The problem of minimization of a separable convex objective function has

various theoretical and real-world applications. One of the popular methods for

solving this problem is the proximal gradient method (proximal forward-backward

algorithm). A very common assumption in the use of this method is that the

gradient of the smooth term in the objective function is globally Lipschitz

continuous. However, this assumption is not always satisfied in practice, thus

casting a limitation on the method. In this paper we discuss, in a wide class

of finite and infinite-dimensional spaces, a new variant (BISTA) of the

proximal gradient method which does not impose the above-mentioned global

Lipschitz continuity assumption. A key contribution of the method is the

dependence of the iterative steps on a certain decomposition of the objective

set into subsets. Moreover, we use a Bregman divergence in the proximal

forward-backward operation. Under certain practical conditions, a

non-asymptotic rate of convergence (that is, in the function values) is

established, as well as the weak convergence of the whole sequence to a

minimizer. We also obtain a few auxiliary results of independent interest,

among them a general and useful stability principle which, roughly speaking,

says that given a uniformly continuous function defined on an arbitrary metric

space, if we slightly change the objective set over which the optimal (extreme)

values are computed, then these values vary slightly. This principle suggests a

general scheme for tackling a wide class of non-convex and non-smooth

optimization problems.

### Tensor Methods for Nonlinear Matrix Completion

Greg Ongie , Laura Balzano , Daniel Pimentel-Alarcón , Rebecca Willett , Robert D. Nowak **Subjects** : Machine Learning (stat.ML) ; Learning (cs.LG)

In the low rank matrix completion (LRMC) problem, the low rank assumption

means that the columns (or rows) of the matrix to be completed are points on a

low-dimensional linear algebraic variety. This paper extends this thinking to

cases where the columns are points on a low-dimensional nonlinear algebraic

variety, a problem we call Low Algebraic Dimension Matrix Completion (LADMC).

Matrices whose columns belong to a union of subspaces (UoS) are an important

special case. We propose a LADMC algorithm that leverages existing LRMC methods

on a tensorized representation of the data. For example, a second-order

tensorization representation is formed by taking the outer product of each

column with itself, and we consider higher order tensorizations as well. This

approach will succeed in many cases where traditional LRMC is guaranteed to

fail because the data are low-rank in the tensorized representation but not in

the original representation. We also provide a formal mathematical

justification for the success of our method. In particular, we show bounds of

the rank of these data in the tensorized representation, and we prove sampling

requirements to guarantee uniqueness of the solution. Interestingly, the

sampling requirements of our LADMC algorithm nearly match the information

theoretic lower bounds for matrix completion under a UoS model. We also provide

experimental results showing that the new approach significantly outperforms

existing state-of-the-art methods for matrix completion in many situations.

### From Principal Subspaces to Principal Components with Linear Autoencoders

Elad Plaut **Subjects** : Machine Learning (stat.ML) ; Learning (cs.LG)

The autoencoder is an effective unsupervised learning model which is widely

used in deep learning. It is well known that an autoencoder with a single

fully-connected hidden layer, a linear activation function and a squared error

cost function trains weights that span the same subspace as the one spanned by

the principal component loading vectors, but that they are not identical to the

loading vectors. In this paper, we show how to recover the loading vectors from

the autoencoder weights.

### Sparse Persistent RNNs: Squeezing Large Recurrent Networks On-Chip

Feiwen Zhu ,

Jeff Pool ,

Michael Andersch ,

Jeremy Appleyard ,

Fung Xie

Comments: Published as a conference paper at ICLR 2018

**Subjects**

:

Neural and Evolutionary Computing (cs.NE)

; Distributed, Parallel, and Cluster Computing (cs.DC); Learning (cs.LG)

Recurrent Neural Networks (RNNs) are powerful tools for solving

sequence-based problems, but their efficacy and execution time are dependent on

the size of the network. Following recent work in simplifying these networks

with model pruning and a novel mapping of work onto GPUs, we design an

efficient implementation for sparse RNNs. We investigate several optimizations

and tradeoffs: Lamport timestamps, wide memory loads, and a bank-aware weight

layout. With these optimizations, we achieve speedups of over 6x over the next

best algorithm for a hidden layer of size 2304, batch size of 4, and a density

of 30%. Further, our technique allows for models of over 5x the size to fit on

a GPU for a speedup of 2x, enabling larger networks to help advance the

state-of-the-art. We perform case studies on NMT and speech recognition tasks

in the appendix, accelerating their recurrent layers by up to 3x.

### End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction

Zhong-Qiu Wang ,

Jonathan Le Roux ,

DeLiang Wang ,

John R. Hershey

Comments: Submitted to Interspeech 2018

**Subjects**

:

Sound (cs.SD)

; Computation and Language (cs.CL); Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)

This paper proposes an end-to-end approach for single-channel

speaker-independent multi-speaker speech separation, where time-frequency (T-F)

masking, the short-time Fourier transform (STFT), and its inverse are

represented as layers within a deep network. Previous approaches, rather than

computing a loss on the reconstructed signal, used a surrogate loss based on

the target STFT magnitudes. This ignores reconstruction error introduced by

phase inconsistency. In our approach, the loss function is directly defined on

the reconstructed signals, which are optimized for best separation. In

addition, we train through unfolded iterations of a phase reconstruction

algorithm, represented as a series of STFT and inverse STFT layers. While mask

values are typically limited to lie between zero and one for approaches using

the mixture phase for reconstruction, this limitation is less relevant if the

estimated magnitudes are to be used together with phase reconstruction. We thus

propose several novel activation functions for the output layer of the T-F

masking, to allow mask values beyond one. On the publicly-available wsj0-2mix

dataset, our approach achieves state-of-the-art 12.6 dB scale-invariant

signal-to-distortion ratio (SI-SDR) and 13.1 dB SDR, revealing new

possibilities for deep learning based phase reconstruction and representing a

fundamental progress towards solving the notoriously-hard cocktail party

problem.

### Hierarchical Density Order Embeddings

Ben Athiwaratkun ,

Andrew Gordon Wilson

Comments: Published at ICLR 2018

**Subjects**

:

Computation and Language (cs.CL)

; Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)

By representing words with probability densities rather than point vectors,

probabilistic word embeddings can capture rich and interpretable semantic

information and uncertainty. The uncertainty information can be particularly

meaningful in capturing entailment relationships — whereby general words such

as “entity” correspond to broad distributions that encompass more specific

words such as “animal” or “instrument”. We introduce density order embeddings,

which learn hierarchical representations through encapsulation of probability

densities. In particular, we propose simple yet effective loss functions and

distance metrics, as well as graph-based schemes to select negative samples to

better learn hierarchical density representations. Our approach provides

state-of-the-art performance on the WordNet hypernym relationship prediction

task and the challenging HyperLex lexical entailment dataset — while retaining

a rich and interpretable density representation.

Finding Efficient Swimming Strategies in a Three Dimensional Chaotic Flow by Reinforcement Learning

K. Gustavsson ,

L. Biferale ,

A. Celani ,

S. Colabrese

Comments: Published on Eur. Phys. J. E (December 14, 2017)

Journal-ref: Eur. Phys. J. E 40, 110 (2017)

**Subjects**

:

Fluid Dynamics (physics.flu-dyn)

; Statistical Mechanics (cond-mat.stat-mech); Learning (cs.LG); Chaotic Dynamics (nlin.CD)

We apply a reinforcement learning algorithm to show how smart particles can

learn approximately optimal strategies to navigate in complex flows. In this

paper we consider microswimmers in a paradigmatic three-dimensional case given

by a stationary superposition of two Arnold-Beltrami-Childress flows with

chaotic advection along streamlines. In such a flow, we study the evolution of

point-like particles which can decide in which direction to swim, while keeping

the velocity amplitude constant. We show that it is sufficient to endow the

swimmers with a very restricted set of actions (six fixed swimming directions

in our case) to have enough freedom to find efficient strategies to move upward

and escape local fluid traps. The key ingredient is the

learning-from-experience structure of the algorithm, which assigns positive or

negative rewards depending on whether the taken action is, or is not,

profitable for the predetermined goal in the long term horizon. This is another

example supporting the efficiency of the reinforcement learning approach to

learn how to accomplish difficult tasks in complex fluid environments.

## Information Theory

### Rate-Splitting Multiple Access for Cooperative Multi-Cell Networks

Yijie Mao ,

Bruno Clerckx ,

Victor O.K. Li

Comments: 5 pages, 6 sigures

**Subjects**

:

Information Theory (cs.IT)

As a promising downlink multiple access scheme, Rate-Splitting Multiple

Access (RSMA) has been shown to achieve superior spectral and energy

efficiencies compared with Space-Division Multiple Access (SDMA) and

Non-Orthogonal Multiple Access (NOMA) in downlink single-cell systems. By

relying on linearly precoded rate-splitting at the transmitter and successive

interference cancellation at the receivers, RSMA has the capability of

partially decoding the interference and partially treating the interference as

noise, and therefore copes with a wide range of user deployments and network

loads. In this work, we further investigate RSMA in cooperative multi-cell

networks. Specifically, we study the optimal beamformer design to maximize the

Weighted Sum-Rate (WSR) of all the users subject to individual Quality of

Service (QoS) rate constraints and per base station power constraints.

Numerical results show that, in a fully cooperative multi-cell network, RSMA

achieves significant WSR improvement over SDMA and NOMA in a wide range of

inter-user and inter-cell channel strength disparities. Specifically, SDMA

(resp. NOMA) is more suited to deployments with little (resp. large) inter-user

channel strength disparity and large (resp. little) inter-cell channel

disparity, while RSMA is suited to any deployment. We conclude that RSMA

provides rate, robustness and QoS enhancements over SDMA and NOMA in

cooperative multi-cell networks.

### Linear ((2,p,p))-AONTs do Exist

Comments: arXiv admin note: text overlap with arXiv:1702.06612 by other authors

**Subjects**

:

Information Theory (cs.IT)

; Cryptography and Security (cs.CR)

A ((t,s,v))-all-or-nothing transform (AONT) is a bijective mapping defined on

(s)-tuples over an alphabet of size (v), which satisfies that if any (s-t) of

the (s) outputs are given, then the values of any (t) inputs are completely

undetermined. When (t) and (v) are fixed, to determine the maximum integer (s)

such that a ((t,s,v))-AONT exists is the main research objective. In this

paper, we solve three open problems proposed in [IEEE Trans. Inform. Theory 64

(2018), 3136-3143.] and show that there do exist linear ((2,p,p))-AONTs. Then

for the size of the alphabet being a prime power, we give the first infinite

class of linear AONTs which is better than the linear AONTs defined by Cauchy

matrices. Besides, we also present a recursive construction for general AONTs

and a new relationship between AONTs and orthogonal arrays.

Amogh Rajanna ,

Carl P. Dettmann

Comments: 30 pages, 10 figures, part of it at IEEE ICC 2018

**Subjects**

:

Information Theory (cs.IT)

Adaptive transmission schemes are a crucial aspect of the radio design for 5G

wireless channels. The paper studies the performance of two classes of adaptive

transmission schemes in cellular downlink. One class is based on physical layer

rateless codes with constant transmit power and the other uses fixed-rate codes

in conjunction with power adaptation. Using a simple stochastic geometry model

for cellular downlink, the focus is to investigate the necessity of power

control in a rateless coded adaptive transmission scheme. The performance of

both rateless and fixed-rate coded adaptive transmission schemes are compared

by evaluating the typical user success probability and rate achievable with the

two schemes. Based on both theoretical analysis and simulation results, it is

clearly shown that physical layer rateless coding simplifies the role of power

control in an adaptive transmission scheme.

### Communication, Computing and Caching for Mobile VR Delivery: Modeling and Trade-off

Yaping Sun ,

Zhiyong Chen ,

Meixia Tao ,

Hui Liu

Comments: to appear in IEEE ICC 2018

**Subjects**

:

Information Theory (cs.IT)

Mobile virtual reality (VR) delivery is gaining increasing attention from

both industry and academia due to its ability to provide an immersive

experience. However, achieving mobile VR delivery requires ultra-high

transmission rate, deemed as a first killer application for 5G wireless

networks. In this paper, in order to alleviate the traffic burden over wireless

networks, we develop an implementation framework for mobile VR delivery by

utilizing caching and computing capabilities of mobile VR device. We then

jointly optimize the caching and computation offloading policy for minimizing

the required average transmission rate under the latency and local average

energy consumption constraints. In a symmetric scenario, we obtain the optimal

joint policy and the closed-form expression of the minimum average transmission

rate. Accordingly, we analyze the tradeoff among communication, computing and

caching, and then reveal analytically the fact that the communication overhead

can be traded by the computing and caching capabilities of mobile VR device,

and also what conditions must be met for it to happen. Finally, we discuss the

optimization problem in a heterogeneous scenario, and propose an efficient

suboptimal algorithm with low computation complexity, which is shown to achieve

good performance in the numerical results.

### Deep Learning Coordinated Beamforming for Highly-Mobile Millimeter Wave Systems

Ahmed Alkhateeb ,

Sam Alex ,

Paul Varkey ,

Ying Li ,

Qi Qu ,

Djordje Tujkovic

Comments: 43 pages, 14 figures, submitted to IEEE Access

**Subjects**

:

Information Theory (cs.IT)

Supporting high mobility in millimeter wave (mmWave) systems enables a wide

range of important applications such as vehicular communications and wireless

virtual/augmented reality. Realizing this in practice, though, requires

overcoming several challenges. First, the use of narrow beams and the

sensitivity of mmWave signals to blockage greatly impact the coverage and

reliability of highly-mobile links. Second, highly-mobile users in dense mmWave

deployments need to frequently hand-off between base stations (BSs), which is

associated with critical control and latency overhead. Further, identifying the

optimal beamforming vectors in large antenna array mmWave systems requires

considerable training overhead, which significantly affects the efficiency of

these mobile systems. In this paper, a novel integrated machine learning and

coordinated beamforming solution is developed to overcome these challenges and

enable highly-mobile mmWave applications. In the proposed solution, a number of

distributed yet coordinating BSs simultaneously serve a mobile user. This user

ideally needs to transmit only one uplink training pilot sequence that will be

jointly received at the coordinating BSs using omni or quasi-omni beam

patterns. These received signals draw a defining signature not only for the

user location, but also for its interaction with the surrounding environment.

The developed solution then leverages a deep learning model that learns how to

use these signatures to predict the beamforming vectors at the BSs. This

renders a comprehensive solution that supports highly-mobile mmWave

applications with reliable coverage, low latency, and negligible training

overhead. Simulation results show that the proposed deep-learning coordinated

beamforming strategy approaches the achievable rate of the genie-aided solution

that knows the optimal beamforming vectors with no training overhead.

### Decoding Reed-Muller Codes Using Minimum-Weight Parity Checks

Elia Santi ,

Christian Häger ,

Henry D. Pfister

Comments: 5 pages, 3 figures, accepted for ISIT 2018

**Subjects**

:

Information Theory (cs.IT)

Reed-Muller (RM) codes exhibit good performance under maximum-likelihood (ML)

decoding due to their highly-symmetric structure. In this paper, we explore the

question of whether the code symmetry of RM codes can also be exploited to

achieve near-ML performance in practice. The main idea is to apply iterative

decoding to a highly-redundant parity-check (PC) matrix that contains only the

minimum-weight dual codewords as rows. As examples, we consider the peeling

decoder for the binary erasure channel, linear-programming and belief

propagation (BP) decoding for the binary-input additive white Gaussian noise

channel, and bit-flipping and BP decoding for the binary symmetric channel. For

short block lengths, it is shown that near-ML performance can indeed be

achieved in many cases. We also propose a method to tailor the PC matrix to the

received observation by selecting only a small fraction of useful

minimum-weight PCs before decoding begins. This allows one to both improve

performance and significantly reduce complexity compared to using the full set

of minimum-weight PCs.

### Success Probability and Area Spectral Efficiency of a VANET Modeled as a Cox Process

Vishnu Vardhan Chetlur ,

Harpreet S. Dhillon

Comments: To appear in IEEE Wireless Communications Letters

**Subjects**

:

Information Theory (cs.IT)

; Networking and Internet Architecture (cs.NI)

This paper analyzes the performance of a vehicular ad hoc network (VANET)

modeled as a Cox process, where the spatial layout of the roads is modeled by a

Poisson line process (PLP) and the locations of nodes on each line are modeled

as a 1D Poisson point process (PPP). For this setup, we characterize the

success probability of a typical link and the area spectral efficiency (ASE) of

the network assuming slotted ALOHA as the channel access scheme. We then

concretely establish that the success probability of a typical link in a VANET

modeled using a Cox process converges to that of a 1D and 2D PPP for some

extreme values of the line and node densities. We also study the trends in

success probability as a function of the system parameters and show that the

optimum transmission probability that maximizes the ASE for this Cox process

model differs significantly from those of the relatively-simpler 1D and 2D PPP

models used commonly in the literature to model vehicular networks.

### Communication over an Arbitrarily Varying Channel under a State-Myopic Encoder

Amitalok J. Budkuley ,

Sidharth Jaggi

Comments: 16 pages

**Subjects**

:

Information Theory (cs.IT)

We study the problem of communication over a discrete arbitrarily varying

channel (AVC) when a noisy version of the state is known non-causally at the

encoder. The state is chosen by an adversary which knows the coding scheme. A

state-myopic encoder observes this state non-causally, though imperfectly,

through a noisy discrete memoryless channel (DMC). We first characterize the

capacity of this state-dependent channel when the encoder-decoder share

randomness unknown to the adversary, i.e., the randomized coding capacity.

Next, we show that when only the encoder is allowed to randomize, the capacity

remains unchanged when positive. Interesting and well-known special cases of

the state-myopic encoder model are also presented.

### Average Case Analysis of Leaf-Centric Binary Tree Sources

Louisa Seelbach Benkner , Markus Lohrey **Subjects** : Discrete Mathematics (cs.DM) ; Information Theory (cs.IT); Combinatorics (math.CO)

We study the average size of the minimal directed acyclic graph (DAG) with

respect to so-called leaf-centric binary tree sources as studied by Zhang,

Yang, and Kieffer. A leaf-centric binary tree source induces for every (n geq

2) a probability distribution on all binary trees with (n) leaves. We

generalize a result shown by Flajolet, Gourdon, Martinez and Devroye according

to which the average size of the minimal DAG of a binary tree that is produced

by the binary search tree model is (Theta(n / log n)).

### Rateless Codes for Near-Perfect Load Balancing in Distributed Matrix-Vector Multiplication

Ankur Mallick , Malhar Chaudhari , Gauri Joshi **Subjects** : Distributed, Parallel, and Cluster Computing (cs.DC) ; Information Theory (cs.IT)

Large-scale machine learning and data mining applications require computer

systems to perform massive computations that need to be parallelized across

multiple nodes, for example, massive matrix-vector and matrix-matrix

multiplication. The presence of straggling nodes — computing nodes that

unpredictably slowdown or fail — is a major bottleneck in such distributed

computations. We propose a emph{rateless fountain coding} strategy to

alleviate the problem of stragglers in distributed matrix-vector

multiplication. Our algorithm creates a stream of linear combinations of the

(m) rows of the matrix, and assigns them to different worker nodes, which then

perform row-vector products with the encoded rows. The original matrix-vector

product can be decoded as soon as slightly more than (m) row-vector products

are collectively finished by the nodes. This strategy enables fast nodes to

steal work from slow nodes, without requiring the master to perform any dynamic

load-balancing. Compared to recently proposed fixed-rate erasure coding

strategies which ignore partial work done by straggling nodes, rateless coding

achieves significantly lower overall delay, as well as small computational and

decoding overhead.

#### 欢迎加入我爱机器学习QQ14群：336582044

微信扫一扫，关注我爱机器学习公众号

微博：我爱机器学习

CoLaBug.com遵循[CC BY-SA 4.0]分享并保持客观立场，本站不承担此类作品侵权行为的直接责任及连带责任。您有版权、意见、投诉等问题，请通过[eMail]联系我们处理，如需商业授权请联系原作者/原网站。