Julia vs R vs Python: simple optimization


In this post, I will compare and contrast Julia, R, and Python via a simple maximum likelihood optimization problem. The motivation comes from the credit risk domain and is discussed in more detail in this post.

Side note - the author's experience level at the time of writing

| Language | Level of experience |
|----------|---------------------|
| R        | 9 years             |
| Julia    | 6 months            |
| Python   | beginner            |

For such a simple optimization problem, R, Julia, and Python/SciPy will all do a competent job, so there is no clear winner. However, as noted by Julia Discourse member ChrisRackauckas:

if you wish to solve only a single optimization problem and that problem takes <10 seconds, then Julia’s long initial compilation is something you want to avoid. For long enough problems, or for solving multiple optimization problems, the compilation time is not noticeable.
| Language | Pros | Cons |
|----------|------|------|
| Julia    | Easy to use; wonderful support for truncated distributions; easy-to-understand output | Long wait for the first run due to compilation |
| R        | Easy to use | Model output not as readable as Julia's |
| Python   | Easy to use | Model output not as readable as Julia's |

The optimization problem

Given observations $Q_1, Q_2, \ldots, Q_n$, we aim to find parameters $\mu$ and $\sigma$ that optimize this likelihood function

$$L = \prod_i \left(\frac{\phi(Q_i, \mu, \sigma)}{\Phi(\max Q_t, \mu, \sigma)}\right)$$

where $\phi$ is the normal density function and $\Phi$ is the normal cumulative distribution function.

Often we optimize the log-likelihood instead

$$\log L = l = \left(\sum_i \log\phi(Q_i, \mu, \sigma)\right) - n\log\Phi(\max Q_t, \mu, \sigma)$$
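To see where this comes from: each observation follows a normal distribution truncated above at $\max Q_t$, whose density is the normal density rescaled to integrate to one over the truncated support,

$$f(q) = \frac{\phi(q, \mu, \sigma)}{\Phi(\max Q_t, \mu, \sigma)}, \qquad q \le \max Q_t.$$

Taking the log of the product of $n$ such densities turns the product into a sum and pulls the common $\log\Phi$ term out $n$ times, which gives the expression for $l$ above.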

Side note - a more efficient numerical solution exists

This Ergashev et al., 2016 paper (behind a paywall) derives a necessary condition for a solution to exist, together with an equation in one parameter which, when solved (using simple uniroot techniques), recovers both $\mu$ and $\sigma$. From my testing, applying Ergashev's formula yields roughly a 50x speed-up over the R solution. However, in this blog post I aim to compare and contrast the optimization functions in Julia vs. R vs. Python, and hence I have chosen not to implement Ergashev's method.
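Since the paper is paywalled I will not reproduce Ergashev's equation here, but to illustrate the uniroot pattern itself: in Julia, a one-parameter condition can be bracketed and solved with Roots.jl. The function g below is a hypothetical placeholder, not Ergashev's actual condition.

using Roots

# hypothetical stand-in for Ergashev's one-parameter condition g(σ) = 0;
# once σ is recovered, μ would follow from the remaining relationship
g(σ) = σ^2 - 0.25

# bracketing interval assumed to contain the root
σ_hat = find_zero(g, (0.01, 10.0))  # 0.5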

Julia solution

Below is my Julia implementation using Optim.jl. In Julia, one can use Unicode symbols in variable names, so I have used μσ as the name of the parameter vector. Julia also has a popular package called JuMP.jl for optimization problems. However, the Optim.jl package is more than adequate for such a simple problem, and I will only look at JuMP.jl in the future when dealing with more advanced optimization problems.

I have noted that the first run of the optimization problem takes 7.5 seconds. This is considerably slower than both R and Python. Regarding this, Julia Discourse member ChrisRackauckas has pointed out:

If you want to solve a problem that takes 3000 seconds to solve, the first time in a Julia session will make it take 3007 seconds, making the total runtime more relevant to the total time than compilation. If you want to solve 100 10-second optimization problems, the first will take 17 seconds, and subsequent calls will not have the compilation and will take roughly 10 seconds, making the total runtime 1007 seconds, and thus the individual problems are more relevant than the compilation time. If you want to solve a single 5-second optimization problem, it’ll be 12 seconds. Thus the compilation time issue can be annoying for interactive use when you want to solve a single problem, but doesn’t affect performance-sensitive uses.

He also notes that

Julia v0.7 does have an interpreter though which promises to reduce compilation time in exchange for less optimizations and this may be an interesting middle ground for the specific interactive use case. Additionally, tools like PackageCompiler.jl may become more common, limiting the “startup time” of package code that is noticed here.

Below, I have hard-coded the values of Q_t that I would like to use in the MLE estimation.

using Distributions, Optim

# hard-coded data (observations)
odr=[0.10,0.20,0.15,0.22,0.15,0.10,0.08,0.09,0.12]
Q_t = quantile.(Normal(0,1), odr)

# return a function that accepts `[mu, sigma]` as parameter
function neglik_tn(Q_t)
    maxx = maximum(Q_t)
    f(μσ) = -sum(logpdf.(Truncated(Normal(μσ[1],μσ[2]), -Inf, maxx), Q_t))
    f
end

neglikfn = neglik_tn(Q_t)

# optimize!
# start searching 
@time res = optimize(neglikfn, [mean(Q_t), std(Q_t)]) # 7.5 seconds
@time res = optimize(neglikfn, [mean(Q_t), std(Q_t)]) # 0.000137 seconds

# the mu and sigma estimates
Optim.minimizer(res) # [-1.0733250637041452,0.2537450497038758] # or

# use `fieldnames(res)` to see the list of field names that can be referenced via . (dot)
res.minimizer # [-1.0733250637041452,0.2537450497038758]

which gave me the following output, by far the most descriptive out of the Julia/R/Python-verse.

Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [-1.1300664159893685,0.22269345618402703]
 * Minimizer: [-1.0733250637041452,0.2537450497038758]
 * Minimum: -1.893080e+00
 * Iterations: 28
 * Convergence: true
   *  √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 59
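The printed summary aside, the individual results can also be pulled out programmatically. A small sketch using Optim.jl's accessor functions, to go with the Optim.minimizer call shown above:

# programmatic access to the optimization results
Optim.minimum(res)     # attained objective value, -1.893080e+00
Optim.converged(res)   # whether the convergence criterion was met, true
Optim.iterations(res)  # number of iterations performed, 28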

What was nice about Julia?

- Truncated(DN, lower, upper) is a wonderfully simple way to define a truncated distribution (see the sketch below)
- A logpdf function that works for any distribution
- Very clear and informative output
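To show how general those two building blocks are, here is a small sketch applying them to a non-normal distribution (the Gamma example is mine, not from the analysis above):

using Distributions

# the same constructor truncates any univariate distribution,
# here a Gamma(2, 3) restricted to the interval [0, 5]
d = Truncated(Gamma(2.0, 3.0), 0.0, 5.0)

# logpdf works on the truncated distribution directly,
# broadcasting over a vector of evaluation points
logpdf.(d, [0.5, 1.0, 2.5])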

What was not so nice about Julia?

- If you only have simple optimization problems that take under 10 seconds to run, and you don't need to run them thousands of times, then the 7.5 seconds of compilation time is super annoying!

R solution

R has a truncnorm package for dealing with truncated normals.

odr=c(0.10,0.20,0.15,0.22,0.15,0.10,0.08,0.09,0.12)
x = qnorm(odr)

library(truncnorm)
# return a function that accepts c(mu, sigma) as its parameter,
# closed over the data `x`
neglik_tn = function(x) {
  maxx = max(x)
  resfn = function(musigma) {
    -sum(log(dtruncnorm(x, a = -Inf, b = maxx, mean = musigma[1], sd = musigma[2])))
  }
  resfn
}

neglikfn = neglik_tn(x)

# optimize, starting from the sample mean and standard deviation
system.time(res <- optim(c(mean(x), sd(x)), neglikfn))
res

which gave the output

$par
[1] -1.4294776  0.3555692

$value
[1] -3.020876

$counts
function gradient 
      57       NA 

$convergence
[1] 0

$message
NULL

What was nice about R?

- Has a package for the truncated normal
- Gives me results right away; this wouldn't be such a plus if it weren't for Julia's slow compilation

What was not so nice about R?

- A couple of minor things: no log-density function for the truncated normal; no easy way to define truncated versions of arbitrary distributions; sparse printed output

Python solution

Even though I have no experience with Python, simple Google searches allowed me to come up with this solution. I used the Anaconda distribution, which saved me a lot of hassle in installing packages, as all the packages I needed came pre-installed.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# hard-coded data (observations)
odr = [0.10, 0.20, 0.15, 0.22, 0.15, 0.10, 0.08, 0.09, 0.12]
Q_t = norm.ppf(odr)

# define a function that returns the function to optimize, closed over the input data
def neglik_trunc_tn(Q_t):
    maxQ_t = max(Q_t)
    def neglik_trunc_fn(musigma):
        return -sum(norm.logpdf(Q_t, musigma[0], musigma[1])) + len(Q_t)*norm.logcdf(maxQ_t, musigma[0], musigma[1])
    return neglik_trunc_fn

# the likelihood function to optimize
neglik = neglik_trunc_tn(Q_t)

# optimize!
res = minimize(neglik, [np.mean(Q_t), np.std(Q_t)])
res

and these are the results, which agree with Julia's estimates

fun: -1.8930804441641282
 hess_inv: array([[ 0.01759589,  0.00818596],
       [ 0.00818596,  0.00937868]])
      jac: array([ -3.87430191e-07,   3.33786011e-06])
  message: 'Optimization terminated successfully.'
     nfev: 40
      nit: 6
     njev: 10
   status: 0
  success: True
        x: array([-1.07334252,  0.25373624])

What was nice about Python?

- Easy to Google solutions, even for a beginner
- Gives me results right away

What was not so nice about Python?

- The output could be nicer, but that is a minor point

Conclusion

For such a simple optimization problem, you can't go wrong choosing any of the three languages/ecosystems. All three allow you to define closures, which I find to be a good way to generate functions that embed the data they need, leaving you with a function whose only arguments are the distribution parameters, as in the sketch below.
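A minimal sketch of that closure pattern in Julia, with the statistics stripped away (make_objective and the toy objective are illustrative, not from the code above):

# the outer function captures the data once; the returned objective
# depends only on the parameter vector θ
function make_objective(data)
    θ -> sum((data .- θ[1]).^2) / θ[2]^2  # toy objective for illustration
end

obj = make_objective([1.0, 2.0, 3.0])
obj([2.0, 1.5])  # evaluates the objective at θ = [2.0, 1.5]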

I would not choose Julia if I only needed to run simple (sub-10-second) optimizations once or twice, due to the long compilation time. If your problem is fairly standard, you can get around this by using PackageCompiler.jl to precompile the functions, but that feels like an over-complication for such a simple task that both R and Python/SciPy can handle easily.
