Introduction to Julia

Authors

Affiliations

Felix Cremer

Max Planck Institute for Biogeochemistry

Lazaro Alonso

Max Planck Institute for Biogeochemistry

Anshul Singhvi

JuliaHub

Fabian Gans

Max Planck Institute for Biogeochemistry

1 Why Julia?

Published around 2012 initially developed at MIT
Just in Time compiled
Optionally typed
Gives you the best of C, Matlab and Python
Function oriented not object oriented
solves the two language problem

2 Basics of Julia programming

Variables
struct
Boolean Operators and Numeric comparisons
Functions
Multiple Dispatch
Keyword arguments
Anonymous Functions
Conditionals

# Simple math
x = 5
y = 2
x + y

# Vector math
x = [1,2,3,4]
y = [2.5,3.5,1.0,2.9]
x + y

4-element Vector{Float64}:
 3.5
 5.5
 4.0
 6.9

# Functions look like math
f(x) = 2x^2 + sin(x)
f(5)

49.04107572533686

# Apply a function element-wise
f.(x)

4-element Vector{Float64}:
  2.8414709848078967
  8.909297426825681
 18.14112000805987
 31.243197504692073

# Always be explicit when applying function vectorized
sin.([1.0,2.0,3.0])

3-element Vector{Float64}:
 0.8414709848078965
 0.9092974268256817
 0.1411200080598672

# Matrix math
m = rand(4,3)
n = ones(3,5)
m*n

4×5 Matrix{Float64}:
 2.10274   2.10274   2.10274   2.10274   2.10274
 0.461203  0.461203  0.461203  0.461203  0.461203
 0.984686  0.984686  0.984686  0.984686  0.984686
 1.66137   1.66137   1.66137   1.66137   1.66137

2.0.1 So what is special about Julia?

function mysum(x)
    s = zero(eltype(x))
    for ix in x
        s = s+ix
    end
    return s
end
vec = rand(1000000)
@time mysum(vec)
@time mysum(vec)

  0.006841 seconds (3.86 k allocations: 184.781 KiB, 80.55% compilation time)
  0.000944 seconds (1 allocation: 16 bytes)

499992.28060113866

2.1 Variables

# Variables are names (tags, stickers) for Julia objects
x = 2
y = x
x = 3
x, y

(3, 2)

2.2 Data types

2.2.1 Built-in Data types

#Numeric data types
for T in (UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64, 
          Float16, Float32, Float64, ComplexF16, ComplexF32, ComplexF64)
    @show T(5)
end

T(5) = 0x05
T(5) = 0x0005
T(5) = 0x00000005
T(5) = 0x0000000000000005
T(5) = 5
T(5) = 5
T(5) = 5
T(5) = 5
T(5) = Float16(5.0)
T(5) = 5.0f0
T(5) = 5.0
T(5) = Float16(5.0) + Float16(0.0)im
T(5) = 5.0f0 + 0.0f0im
T(5) = 5.0 + 0.0im

@show typeof([1,2,3])
@show typeof([1.0,2.0,3.0])

typeof([1, 2, 3]) = Vector{Int64}
typeof([1.0, 2.0, 3.0]) = Vector{Float64}

Vector{Float64} (alias for Array{Float64, 1})

2.2.2 Custom data types

structs are basic types composing several fields into a single object.

struct PointF
    x::Float64
    y::Float64
end
p = PointF(2.0,3.0)

PointF(2.0, 3.0)

The fields of a struct may or may not be typed

struct PointUntyped
    x
    y
end
p = PointUntyped(2.f0, 3)

PointUntyped(2.0f0, 3)

PointUntyped("Hallo", sin)

PointUntyped("Hallo", sin)

Parametric types can be used to generic specialized code for a variety of field types.

struct Point{T<:Number}
    x::T
    y::T
end
p = Point(3.f0,2.f0)

Point{Float32}(3.0f0, 2.0f0)

2.3 Loops

Loops are written using the for keyword and process any object implementing the iteration interface

for i in 1:3
    println(i)
end

for letter in "hello"
    println(letter)
end

1
2
3
h
e
l
l
o

2.4 Functions

There are 3 ways to define functions in Julia:

Long form:

function f1(x, y)
    return x+y
end

f1 (generic function with 1 method)

Note that the return keyword is optional. If it is missing, a function always returns the result of the last statement.

Short form:

f2(x, y) = x + y

f2 (generic function with 1 method)

Very useful for writing short one-liners.

Anonymous functions (similar to lambdas in python), will be important in the Functional Programming section:

f3 = (x,y) -> x + y

#1 (generic function with 1 method)

They all define the same function:

f1(1,2) == f2(1,2) == f3(1,2) == 3

true

2.5 Multiple Dispatch

In object-oriented programming languages methods (behavior) are part of the class namespace itself and can be used to implement generic behavior.

using PythonCall

class Point():
    def __init__(self,x,y):
        self.x = x
        self.y = y

    def abs(self):
        return self.x*self.x + self.y*self.y

p = Point(3,2)
p.abs()

Python: 13

In Julia, functions are first-class objects and can have multiple methods for different combinations of argument types.

absolute(p::Point) = p.x^2 + p.y^2

absolute (generic function with 1 method)

This defines a new function absolute with a single method

methods(absolute)

# 1 method for generic function absolute from [36mMain.Notebook[39m:

absolute(p::Main.Notebook.Point) in Main.Notebook at /Users/lalonso/Documents/BiDS2025_JuliaTutorial/intro-julia.qmd:218

@show absolute(Point(2,3))
@show absolute(Point(2.0,3.0))

absolute(Point(2, 3)) = 13
absolute(Point(2.0, 3.0)) = 13.0

13.0

In addition to defining new functions, existing functions can be extended to work for our custom data types:

import Base: +, -, *, /, zero, one, oneunit
+(p1::Point, p2::Point) = Point(p1.x+p2.x, p1.y+p2.y)
-(p1::Point, p2::Point) = Point(p1.x-p2.x, p1.y-p2.y)
*(x::Number, p::Point) = Point(x*p.x, x*p.y)
/(p::Point,x::Number) = Point(p.x/x,p.y/x)
-(p::Point) = Point(-p.x,-p.y)
zero(x::Point{T}) where T = zero(typeof(x))
zero(::Type{Point{T}}) where T = Point(zero(T),zero(T))
one(x::Point{T}) where T = one(typeof(x))
one(::Type{Point{T}}) where T = Point(one(T),one(T))
Point{T}(p::Point) where T = Point{T}(T(p.x),T(p.y))

Now that we have defined some basic math around the Point type we can use a lot of generic behavior:

#Create zero matrix of Points
zeros(Point{Float64},3,2)

3×2 Matrix{Point{Float64}}:
 Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)
 Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)
 Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)

#Diagonal matrices
pointvec = (1:3) .* ones(Point{Float64})

3-element Vector{Point{Float64}}:
 Point{Float64}(1.0, 1.0)
 Point{Float64}(2.0, 2.0)
 Point{Float64}(3.0, 3.0)

using LinearAlgebra
diagm(0=>pointvec)

3×3 Matrix{Point{Float64}}:
 Point{Float64}(1.0, 1.0)  Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)
 Point{Float64}(0.0, 0.0)  Point{Float64}(2.0, 2.0)  Point{Float64}(0.0, 0.0)
 Point{Float64}(0.0, 0.0)  Point{Float64}(0.0, 0.0)  Point{Float64}(3.0, 3.0)

rand(4,5) * ones(Point{Float64},5,2)

4×2 Matrix{Point{Float64}}:
 Point{Float64}(2.77515, 2.77515)  Point{Float64}(2.77515, 2.77515)
 Point{Float64}(3.42832, 3.42832)  Point{Float64}(3.42832, 3.42832)
 Point{Float64}(2.55391, 2.55391)  Point{Float64}(2.55391, 2.55391)
 Point{Float64}(1.60603, 1.60603)  Point{Float64}(1.60603, 1.60603)

range(Point(-1.0,-2.0),Point(3.0,4.0),length=10)

10-element LinRange{Point{Float64}, Int64}:
 Point{Float64}(-1.0, -2.0), …, Point{Float64}(3.0, 4.0)

3 Package management

4 Functional Programming

Transform an array by applying the same function to every element. We can do this using a loop:

a = rand(100)
out = similar(a)
for i in eachindex(a)
    out[i] = sqrt(a[i])
end
out

100-element Vector{Float64}:
 0.9600521482297211
 0.6526420528851079
 0.6917260533942913
 0.9409432647567283
 0.9442237519837718
 0.8808711985915347
 0.8231046779286021
 0.5391330544108317
 0.651259492779308
 0.1929081604917568
 ⋮
 0.230333791329993
 0.5099496064482936
 0.8657590303781428
 0.8506088285947234
 0.6936973349217656
 0.9629776168316614
 0.7255792468365476
 0.5129811381500429
 0.726308818744019

In the end we “map” a function over an array, so the following does the same as our loop defined above.

map(sqrt,a)

100-element Vector{Float64}:
 0.9600521482297211
 0.6526420528851079
 0.6917260533942913
 0.9409432647567283
 0.9442237519837718
 0.8808711985915347
 0.8231046779286021
 0.5391330544108317
 0.651259492779308
 0.1929081604917568
 ⋮
 0.230333791329993
 0.5099496064482936
 0.8657590303781428
 0.8506088285947234
 0.6936973349217656
 0.9629776168316614
 0.7255792468365476
 0.5129811381500429
 0.726308818744019

There is also the very similar broadcast function:

broadcast(sqrt,a)

100-element Vector{Float64}:
 0.9600521482297211
 0.6526420528851079
 0.6917260533942913
 0.9409432647567283
 0.9442237519837718
 0.8808711985915347
 0.8231046779286021
 0.5391330544108317
 0.651259492779308
 0.1929081604917568
 ⋮
 0.230333791329993
 0.5099496064482936
 0.8657590303781428
 0.8506088285947234
 0.6936973349217656
 0.9629776168316614
 0.7255792468365476
 0.5129811381500429
 0.726308818744019

Instead of calling the broadcast function explicitly, most Julia programmers would use the shorthand dot-syntax:

sqrt.(a)

100-element Vector{Float64}:
 0.9600521482297211
 0.6526420528851079
 0.6917260533942913
 0.9409432647567283
 0.9442237519837718
 0.8808711985915347
 0.8231046779286021
 0.5391330544108317
 0.651259492779308
 0.1929081604917568
 ⋮
 0.230333791329993
 0.5099496064482936
 0.8657590303781428
 0.8506088285947234
 0.6936973349217656
 0.9629776168316614
 0.7255792468365476
 0.5129811381500429
 0.726308818744019

which gets translated to the former expression when lowering the code.

4.1 Differences between broadcast and map

For single-argument functions there is no difference between map and broadcast. However, the functions differe in behavior when mutiple arguments are passed:

a = [0.1, 0.2, 0.3]
b = [1.0 2.0 3.0]
@show size(a)
@show size(b)
@show map(+,a,b)
@show broadcast(+,a,b)
@show a .+ b
nothing

size(a) = (3,)
size(b) = (1, 3)
map(+, a, b) = [1.1, 2.2, 3.3]
broadcast(+, a, b) = [1.1 2.1 3.1; 1.2 2.2 3.2; 1.3 2.3 3.3]
a .+ b = [1.1 2.1 3.1; 1.2 2.2 3.2; 1.3 2.3 3.3]

map - iterates over all arguments separately, and passing them one by one to the applied function - agnostic of array shapes

broadcast - is dimension-aware - matches lengths of arrays along each array dimension - expanding dimensions of length 1 or non-existing dimensions at the end

In most cases one uses broadcast because it is easier to type using the dot-notation.

4.1.1 `reduce` and `foldl`

reduce(+,1:10)

What is happening behind the scenes?

function myplus(x,y)
    @show x,y
    return x+y
end
reduce(myplus,1:10)

(x, y) = (1, 2)
(x, y) = (3, 3)
(x, y) = (6, 4)
(x, y) = (10, 5)
(x, y) = (15, 6)
(x, y) = (21, 7)
(x, y) = (28, 8)
(x, y) = (36, 9)
(x, y) = (45, 10)

foldl is very similar to reduce, but with left-associativty guaranteed (all elements of the array will be processed strictly in order), makes parallelization impossible.

Example task: find the longest streak of true values in a Bool array.

function streak(oldstate,newvalue)
    maxstreak, currentstreak = oldstate
    if newvalue #We extend the streak by 1
        currentstreak += 1
        maxstreak = max(currentstreak, maxstreak)
    else
        currentstreak = 0
    end
    return (maxstreak,currentstreak)
end
x = rand(Bool,1000)
foldl(streak,x,init=(0,0))

(8, 0)

mapreduce and mapfoldl combine both map and reduce. For example, to compute the sum of squares of a vector one can do:

r = rand(1000)
mapreduce(x->x*x,+,r)

340.7118723795464

To compute the longest streak of random numbers larger than 0.9 we could do:

mapfoldl(>(0.1),streak,r,init=(0,0))

(64, 8)

5 Allocations

Last exercise: In a vector of numbers, count how often a consecutive value is larger than its predecessor.

1 Why Julia?

2 Basics of Julia programming

2.0.1 So what is special about Julia?

2.1 Variables

2.2 Data types

2.2.1 Built-in Data types

2.2.2 Custom data types

2.3 Loops

2.4 Functions

2.5 Multiple Dispatch

3 Package management

4 Functional Programming

4.1 Differences between broadcast and map

4.1.1 reduce and foldl

5 Allocations

4.1.1 `reduce` and `foldl`