Introduction to Julia

Authors
Affiliations

Felix Cremer

Max Planck Institute for Biogeochemistry

Lazaro Alonso

Max Planck Institute for Biogeochemistry

Anshul Singhvi

JuliaHub

Fabian Gans

Max Planck Institute for Biogeochemistry

1 Why Julia?

2 Basics of Julia programming

# Simple math
x = 5
y = 2
x + y
7
# Vector math
x = [1,2,3,4]
y = [2.5,3.5,1.0,2.9]
x + y
4-element Vector{Float64}:
 3.5
 5.5
 4.0
 6.9
# Functions look like math
f(x) = 2x^2 + sin(x)
f(5)
49.04107572533686
# Apply a function element-wise
f.(x)
4-element Vector{Float64}:
  2.8414709848078967
  8.909297426825681
 18.14112000805987
 31.243197504692073
# Always be explicit when applying function vectorized
sin.([1.0,2.0,3.0])
3-element Vector{Float64}:
 0.8414709848078965
 0.9092974268256817
 0.1411200080598672
# Matrix math
m = rand(4,3)
n = ones(3,5)
m*n
4×5 Matrix{Float64}:
 0.93134  0.93134  0.93134  0.93134  0.93134
 2.5254   2.5254   2.5254   2.5254   2.5254
 1.82363  1.82363  1.82363  1.82363  1.82363
 2.01026  2.01026  2.01026  2.01026  2.01026

2.0.1 So what is special about Julia?

function mysum(x)
    s = zero(eltype(x))
    for ix in x
        s = s+ix
    end
    return s
end
vec = rand(1000000)
@time mysum(vec)
@time mysum(vec)
  0.009618 seconds (3.86 k allocations: 183.891 KiB, 91.81% compilation time)
  0.000739 seconds (1 allocation: 16 bytes)
499515.0814489908

2.1 Variables

# Variables are names (tags, stickers) for Julia objects
x = 2
y = x
x = 3
x, y 
(3, 2)

2.2 Data types

2.2.1 Built-in Data types

#Numeric data types
for T in (UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, 
          Float16, Float32, Float64, ComplexF16, ComplexF32, ComplexF64)
    @show T(5)
end
T(5) = 0x05
T(5) = 0x0005
T(5) = 0x00000005
T(5) = 0x0000000000000005
T(5) = 5
T(5) = 5
T(5) = 5
T(5) = 5
T(5) = Float16(5.0)
T(5) = 5.0f0
T(5) = 5.0
T(5) = Float16(5.0) + Float16(0.0)im
T(5) = 5.0f0 + 0.0f0im
T(5) = 5.0 + 0.0im
@show typeof([1,2,3])
@show typeof([1.0,2.0,3.0])
typeof([1, 2, 3]) = Vector{Int64}
typeof([1.0, 2.0, 3.0]) = Vector{Float64}
Vector{Float64} (alias for Array{Float64, 1})

2.2.2 Custom data types

structs are basic types composing several fields into a single object.

struct PointF
x::Float64
y::Float64
end
p = PointF(2.0,3.0)
PointF(2.0, 3.0)

The fields of a struct may or may not be typed

struct PointUntyped
x
y
end
p = PointUntyped(2.f0, 3)
PointUntyped(2.0f0, 3)
PointUntyped("Hallo", sin)
PointUntyped("Hallo", sin)

Parametric types can be used to generic specialized code for a variety of field types.

struct Point{T<:Number}
x::T
y::T
end
p = Point(3.f0,2.f0)
Point{Float32}(3.0f0, 2.0f0)

2.3 Loops

Loops are written using the for keyword and process any object implementing the iteration interface

for i in 1:3
    println(i)
end

for letter in "hello"
    println(letter)
end
1
2
3
h
e
l
l
o

2.4 Functions

There are 3 ways to define functions in Julia:

Long form:

function f1(x, y)
    return x+y
end
f1 (generic function with 1 method)

Note that the return keyword is optional. If it is missing, a function always returns the result of the last statement.

Short form:

f2(x, y) = x + y
f2 (generic function with 1 method)

Very useful for writing short one-liners.

Anonymous functions (similar to lambdas in python), will be important in the Functional Programming section:

f3 = (x,y) -> x + y
#11 (generic function with 1 method)

They all define the same function:

f1(1,2) == f2(1,2) == f3(1,2) == 3
true

2.5 Multiple Dispatch

In object-oriented programming languages methods (behavior) are part of the class namespace itself and can be used to implement generic behavior.

class Point():
    def __init__(self,x,y):
        self.x = x
        self.y = y

    def abs(self):
        return self.x*self.x + self.y*self.y

p = Point(3,2)
p.abs()

In Julia, functions are first-class objects and can have multiple methods for different combinations of argument types.

absolute(p::Point) = p.x^2 + p.y^2
absolute (generic function with 1 method)

This defines a new function absolute with a single method

methods(absolute)
# 1 method for generic function absolute from Main:
  • absolute(p::Point) in Main at In[21]:1
@show absolute(Point(2,3))
@show absolute(Point(2.0,3.0))
absolute(Point(2, 3)) = 13
absolute(Point(2.0, 3.0)) = 13.0
13.0

In addition to defining new functions, existing functions can be extended to work for our custom data types:

import Base: +, -, *, /, zero, one, oneunit
+(p1::Point, p2::Point) = Point(p1.x+p2.x, p1.y+p2.y)
-(p1::Point, p2::Point) = Point(p1.x-p2.x, p1.y-p2.y)
*(x::Number, p::Point) = Point(x*p.x, x*p.y)
/(p::Point,x::Number) = Point(p.x/x,p.y/x)
-(p::Point) = Point(-p.x,-p.y)
zero(x::Point{T}) where T = zero(typeof(x))
zero(::Type{Point{T}}) where T = Point(zero(T),zero(T))
one(x::Point{T}) where T = one(typeof(x))
one(::Type{Point{T}}) where T = Point(one(T),one(T))
Point{T}(p::Point) where T = Point{T}(T(p.x),T(p.y))

Now that we have defined some basic math around the Point type we can use a lot of generic behavior:

#Create zero matrix of Points
zeros(Point{Float64},3,2)
3×2 Matrix{Point{Float64}}:
 Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)
 Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)
 Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)
#Diagonal matrices
pointvec = (1:3) .* ones(Point{Float64})
3-element Vector{Point{Float64}}:
 Point{Float64}(1.0, 1.0)
 Point{Float64}(2.0, 2.0)
 Point{Float64}(3.0, 3.0)
using LinearAlgebra
diagm(0=>pointvec)
3×3 Matrix{Point{Float64}}:
 Point{Float64}(1.0, 1.0)  Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)
 Point{Float64}(0.0, 0.0)  Point{Float64}(2.0, 2.0)  Point{Float64}(0.0, 0.0)
 Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)  Point{Float64}(3.0, 3.0)
rand(4,5) * ones(Point{Float64},5,2)
4×2 Matrix{Point{Float64}}:
 Point{Float64}(3.6525, 3.6525)    Point{Float64}(3.6525, 3.6525)
 Point{Float64}(1.78177, 1.78177)  Point{Float64}(1.78177, 1.78177)
 Point{Float64}(3.1139, 3.1139)    Point{Float64}(3.1139, 3.1139)
 Point{Float64}(2.26338, 2.26338)  Point{Float64}(2.26338, 2.26338)
range(Point(-1.0,-2.0),Point(3.0,4.0),length=10)
10-element LinRange{Point{Float64}, Int64}:
 Point{Float64}(-1.0, -2.0), …, Point{Float64}(3.0, 4.0)

3 Package management

4 Functional Programming

Transform an array by applying the same function to every element. We can do this using a loop:

a = rand(100)
out = similar(a)
for i in eachindex(a)
    out[i] = sqrt(a[i])
end
out
100-element Vector{Float64}:
 0.40569790116115334
 0.6541065977681604
 0.977605082277798
 0.5780201251783924
 0.7491830123639724
 0.936679733945372
 0.7260907590361468
 0.8372184513833079
 0.6677264931748974
 0.9229086178823654
 0.34001492053987126
 0.4325306586981604
 0.6679040937533969
 ⋮
 0.6950224864425166
 0.4773870180884778
 0.4909335780146303
 0.8143634874384189
 0.49203816634950226
 0.7683110927305918
 0.8420592223664212
 0.5496051330899939
 0.7053019796229103
 0.6023275539163602
 0.614756998653948
 0.9303507239941956

In the end we “map” a function over an array, so the following does the same as our loop defined above.

map(sqrt,a)
100-element Vector{Float64}:
 0.40569790116115334
 0.6541065977681604
 0.977605082277798
 0.5780201251783924
 0.7491830123639724
 0.936679733945372
 0.7260907590361468
 0.8372184513833079
 0.6677264931748974
 0.9229086178823654
 0.34001492053987126
 0.4325306586981604
 0.6679040937533969
 ⋮
 0.6950224864425166
 0.4773870180884778
 0.4909335780146303
 0.8143634874384189
 0.49203816634950226
 0.7683110927305918
 0.8420592223664212
 0.5496051330899939
 0.7053019796229103
 0.6023275539163602
 0.614756998653948
 0.9303507239941956

There is also the very similar broadcast function:

broadcast(sqrt,a)
100-element Vector{Float64}:
 0.40569790116115334
 0.6541065977681604
 0.977605082277798
 0.5780201251783924
 0.7491830123639724
 0.936679733945372
 0.7260907590361468
 0.8372184513833079
 0.6677264931748974
 0.9229086178823654
 0.34001492053987126
 0.4325306586981604
 0.6679040937533969
 ⋮
 0.6950224864425166
 0.4773870180884778
 0.4909335780146303
 0.8143634874384189
 0.49203816634950226
 0.7683110927305918
 0.8420592223664212
 0.5496051330899939
 0.7053019796229103
 0.6023275539163602
 0.614756998653948
 0.9303507239941956

Instead of calling the broadcast function explicitly, most Julia programmers would use the shorthand dot-syntax:

sqrt.(a)
100-element Vector{Float64}:
 0.40569790116115334
 0.6541065977681604
 0.977605082277798
 0.5780201251783924
 0.7491830123639724
 0.936679733945372
 0.7260907590361468
 0.8372184513833079
 0.6677264931748974
 0.9229086178823654
 0.34001492053987126
 0.4325306586981604
 0.6679040937533969
 ⋮
 0.6950224864425166
 0.4773870180884778
 0.4909335780146303
 0.8143634874384189
 0.49203816634950226
 0.7683110927305918
 0.8420592223664212
 0.5496051330899939
 0.7053019796229103
 0.6023275539163602
 0.614756998653948
 0.9303507239941956

which gets translated to the former expression when lowering the code.

4.1 Differences between broadcast and map

For single-argument functions there is no difference between map and broadcast. However, the functions differe in behavior when mutiple arguments are passed:

a = [0.1, 0.2, 0.3]
b = [1.0 2.0 3.0]
@show size(a)
@show size(b)
@show map(+,a,b)
@show broadcast(+,a,b)
@show a .+ b
nothing
size(a) = (3,)
size(b) = (1, 3)
map(+, a, b) = [1.1, 2.2, 3.3]
broadcast(+, a, b) = [1.1 2.1 3.1; 1.2 2.2 3.2; 1.3 2.3 3.3]
a .+ b = [1.1 2.1 3.1; 1.2 2.2 3.2; 1.3 2.3 3.3]

map - iterates over all arguments separately, and passing them one by one to the applied function - agnostic of array shapes

broadcast - is dimension-aware - matches lengths of arrays along each array dimension - expanding dimensions of length 1 or non-existing dimensions at the end

In most cases one uses broadcast because it is easier to type using the dot-notation.

4.1.1 reduce and foldl

reduce(+,1:10)
55

What is happening behind the scenes?

function myplus(x,y)
    @show x,y
    return x+y
end
reduce(myplus,1:10)
(x, y) = (1, 2)
(x, y) = (3, 3)
(x, y) = (6, 4)
(x, y) = (10, 5)
(x, y) = (15, 6)
(x, y) = (21, 7)
(x, y) = (28, 8)
(x, y) = (36, 9)
(x, y) = (45, 10)
55

foldl is very similar to reduce, but with left-associativty guaranteed (all elements of the array will be processed strictly in order), makes parallelization impossible.

Example task: find the longest streak of true values in a Bool array.

function streak(oldstate,newvalue)
    maxstreak, currentstreak = oldstate
    if newvalue #We extend the streak by 1
        currentstreak += 1
        maxstreak = max(currentstreak, maxstreak)
    else
        currentstreak = 0
    end
    return (maxstreak,currentstreak)
end
x = rand(Bool,1000)
foldl(streak,x,init=(0,0))
(9, 0)

mapreduce and mapfoldl combine both map and reduce. For example, to compute the sum of squares of a vector one can do:

r = rand(1000)
mapreduce(x->x*x,+,r)
342.00530573304724

To compute the longest streak of random numbers larger than 0.9 we could do:

mapfoldl(>(0.1),streak,r,init=(0,0))
(51, 10)

5 Allocations

Last exercise: In a vector of numbers, count how often a consecutive value is larger than its predecessor.