Some things you need to know
Jongsu Kim
Fortran
Fortran….
• Still using Fortran 77, 90, or 95?
• Fortran 2003 and 2008 are already here, and Fortran 2015 is on its way.
• Some older features will be deleted or declared obsolescent.
• We are using Fortran the wrong way.
What you shouldn’t use
Labeled Do Loops
do 100 ii=istart,ilast,istep
  isum = isum + ii
100 continue
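EQUIVALENCE
Specifies that two or more objects in a scoping unit
share the same storage units
character (len=3) :: C(2)
character (len=4) :: A,B
equivalence (A,C(1)), (B,C(2))
[Figure: storage units 1 to 7. C(1) occupies units 1-3 and C(2) units 4-6; A occupies units 1-4 and B units 4-7, so A and B overlap in unit 4.]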
COMMON
Blocks of physical storage accessed by any of
the scoping units in a program
COMMON /BLOCKA/ A,B,C(10,30)
COMMON I, J, K
ENTRY
Additional entry points (subroutine-like things) inside a subroutine
FIXED FORM SOURCE
Fortran 77 style (80-column lines, with statements restricted to columns 7-72)
CHARACTER* form
replaced with CHARACTER(LEN=?)
NON-BLOCK DO CONSTRUCT
the DO range doesn't end in a CONTINUE or
END DO
What you shouldn’t use
Labeled Do Loops
Labels are unnecessary, and it is hard to remember
what each number means. Moreover, we
have the END DO and CYCLE statements.
EQUIVALENCE
EQUIVALENCE is also error-prone. It is hard to
keep track of every location that these variables
share.
Since COMMON and EQUIVALENCE are discouraged,
BLOCK DATA is discouraged as well.
COMMON
Sharing many variables across the whole program is
dangerous and error-prone.
ENTRY
It complicates the program and is unnecessary, because we have
modules and subroutines.
NON-BLOCK DO CONSTRUCT
It is hard to see where the DO loop ends.
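The replacements named above can be sketched in a few lines. This is only an illustration, not code from the slides: the module name shared_data, the program name, and the loop bounds are hypothetical. It shows a module standing in for a COMMON block, CHARACTER(LEN=...) instead of CHARACTER*, and a block DO construct without a label.

```fortran
module shared_data
  implicit none
  ! Replaces a COMMON block: variables are typed, named, and imported explicitly.
  real :: a = 0.0, b = 0.0
  real :: c(10, 30) = 0.0
end module shared_data

program modern_style
  use shared_data, only: a, b
  implicit none
  integer, parameter :: istart = 1, ilast = 10, istep = 2  ! hypothetical bounds
  integer :: ii, isum
  character(len=4) :: name   ! instead of CHARACTER*4

  ! Block DO construct: no statement label, explicit END DO.
  isum = 0
  do ii = istart, ilast, istep
    isum = isum + ii
  end do

  a = real(isum)
  b = 2.0 * a
  name = 'done'
  print *, name, isum, a, b
end program modern_style
```

Because the program imports only a and b, a reader can see at a glance which shared variables it touches, which a COMMON block does not show.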
What you might want to use – CYCLE , EXIT
• Avoid the GOTO statement
• Use the CYCLE or EXIT statement instead
• CYCLE : skip the rest of the current iteration and continue with the next one
• EXIT : leave the loop
do i=1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) exit
  z = cos(x)
enddo
The first 19 iterations run completely; in the
20th iteration y = sin(x) is executed,
then the loop exits.
do i=1, 100
  x = real(i)
  y = sin(x)
  if (i == 20) cycle
  z = cos(x)
enddo
All 100 iterations run, but at i=20 z = cos(x)
is not executed.
What you might want to use – CYCLE , EXIT
• Avoid GOTO statement
• Use CYCLE or EXIT statement with nested loop
• Constructs (DO, IF, CASE, etc.) may have names
outer: do j=1, 100
  inner: do i=1, 100
    x = real(i)
    y = sin(x)
    if (i > 20) exit outer
    z = cos(x)
  enddo inner
enddo outer
Exits both loops at i=21, during the first outer iteration.
outer: do j=1, 100
  inner: do i=1, 100
    x = real(i)
    y = sin(x)
    if (i > 20) cycle outer
    z = cos(x)
  enddo inner
enddo outer
At i=21, skips z = cos(x) and the rest of the inner loop, then continues with the next outer iteration.
What you might want to use – WHERE
real, dimension(4) :: &
x = [ -1, 0, 1, 2 ], &
a = [ 5, 6, 7, 8 ]
...
where (x < 0)
  a = -1.
end where
a : {-1.0, 6.0, 7.0, 8.0}
where (x /= 0)
  a = 1. / a
elsewhere
  a = 0.
end where
a : {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
What you might want to use – ANY
integer, parameter :: n = 100
real, dimension(n,n) :: a, b, c1, c2
c1 = my_matmul(a, b) ! home-grown function
c2 = matmul(a, b) ! built-in function
if (any(abs(c1 - c2) > 1.e-4)) then
  print *, 'There are significant differences'
endif
• ANY and WHERE replace explicit DO loops
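To make the "removes a DO loop" point concrete, here is a minimal sketch comparing the ANY test with the explicit nested loop it replaces. The 3×3 arrays and their values are made up for illustration; my_matmul from the slide is not reproduced.

```fortran
program any_vs_loop
  implicit none
  integer, parameter :: n = 3
  real :: c1(n, n), c2(n, n)
  logical :: differs
  integer :: i, j

  c1 = 1.0
  c2 = 1.0
  c2(2, 3) = 1.5          ! one deliberately different element

  ! Explicit-loop version: search and leave both loops at the first hit.
  differs = .false.
  outer: do j = 1, n
    do i = 1, n
      if (abs(c1(i, j) - c2(i, j)) > 1.e-4) then
        differs = .true.
        exit outer
      end if
    end do
  end do outer

  ! Array-intrinsic version: the same test in one expression.
  if (differs .eqv. any(abs(c1 - c2) > 1.e-4)) then
    print *, 'Both versions agree; differs =', differs
  end if
end program any_vs_loop
```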
What you might want to use – DO CONCURRENT
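• Vectorization
• Definition : processes one operation on multiple pairs of operands at once
DO i=1,1024
  C(i) = A(i) * B(i)
END DO
becomes, with four elements per step,
DO i=1,1024,4
  C(i:i+3) = A(i:i+3) * B(i:i+3)
END DO
• Simple example of auto-parallelization
do concurrent (i=1:m)
  call dosomething()
end do
• DO CONCURRENT allows (requests) vectorization or parallelization by declaring that the iterations are independent. With ifort, enable the -parallel option to have the loop auto-parallelized.
• Restrictions: no data dependencies between iterations, no EXIT or RETURN statement, no CYCLE that targets an outer loop, and any procedure called in the body (such as dosomething here) must be pure.
• Can be used together with OpenMP.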
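A minimal runnable sketch of the element-wise product written with DO CONCURRENT follows. The array contents are arbitrary, and whether the loop is actually vectorized or parallelized depends on the compiler and its options; the construct only states that the iterations are independent.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  do i = 1, n
    a(i) = real(i)
    b(i) = 2.0
  end do

  ! Each iteration writes only c(i) and reads only a(i) and b(i),
  ! so the iterations are independent and may run in any order.
  do concurrent (i = 1:n)
    c(i) = a(i) * b(i)
  end do

  print *, c(1), c(n)
end program do_concurrent_demo
```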
For More..
• Read Fortran 2008 Standard
• http://www.j3-fortran.org/doc/year/10/10-007.pdf
• A more recent working draft for Fortran 2015 (still in progress)
• http://j3-fortran.org/doc/year/15/15-007.pdf
• Easy to read documents
• The new features of Fortran 2008 : ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1828.pdf
• Modern Programming Languages: Fortran90/95/2003/2008 :
https://www.tacc.utexas.edu/documents/13601/162125/fortran_class.pdf
Build System (MakeFile)
Build?
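• Build : the process of turning source code into an executable file.
• Compiler : the tool that compiles source files into object files. Linker : the tool that links object files and libraries into an executable.
• ifort, gcc, gfortran, and so on are combined tools that both compile and link.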
[Figure: Source Code1.f, Source Code2.f, Source Code3.f (readable) → Compile → Source Code1.o, Source Code2.o, Source Code3.o (unreadable) → Link, together with Libraries (FFTW..) → a.out]
Makefile?
• make does all of the compile and link jobs automatically. A Makefile is the build script that make reads.
• make (actually gmake) is only one of many such tools, which are collectively called build systems.
• Visual Studio has its own build system, so it does not use a Makefile.
1. Command-line
$ gcc -o hellomake hellomake.c hellofunc.c -I.
2. Simple Makefile (1)
hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.
• “hellomake:” : rule name (target)
• “hellomake.c hellofunc.c” : dependencies
• “gcc …” : actual command
• Running plain “make” executes the first rule defined in the Makefile
Command line for the Makefile:
$ make or
$ make hellomake
Makefile?
3. Simple Makefile (3)
Add constants
CC=gcc
CFLAGS=-I.
hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.
• “CC=gcc” : C compiler
• “CFLAGS” : list of flags to pass to the compilation command
• For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”
• The command line (“$(CC) …”) must be indented with a tab, not spaces!
$ make or
$ make hellomake
Makefile?
4. Simple Makefile (4)
A pattern rule compiles each .c file into its .o file. $@ and $< are special macros (automatic variables) in a Makefile.
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h
hellomake: hellomake.o hellofunc.o
	$(CC) -o hellomake hellomake.o hellofunc.o -I.
%.o: %.c $(DEPS)
	$(CC) -c $< $(CFLAGS)
• Rule %.o : rule for compilation. Rule hellomake : rule for linking.
• $@ : the name of the file to be made (e.g. hellomake for rule hellomake)
• $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
• $^ : the names of all the prerequisites, with spaces between them
• $* : the stem that % matched in a pattern rule (e.g. hellomake when hellomake.o is built from hellomake.c)
$ make or
$ make hellomake
Compiler & Linker Options
FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm
Compiler Options and Linker Options
• -O3 : optimization level (-O1 : code size optimization, -O2 : general optimization (default), -O3 : aggressive optimization)
• -r8 : the default real type is double precision (8 bytes = 64 bits)
• -I : specify an include directory, searched for include files (.h declarations; for Fortran also .mod module files)
• -L : specify a library directory. Library files : .so or .a
• -lfftw3 : link with the fftw3 library
• -lm : link with the math library (needed for several math functions)
Compiler & Linker Options
Recommended options
• -heap-arrays [size] : puts automatic arrays and temporary arrays larger than [size] KB on the heap instead of the stack. The effect is the same as allocating them with the ALLOCATE statement.
• -ax[code] : generate code optimized for a specific CPU instruction set. DGIST, Boolt : AVX, CSE Server(OMP) : SSE4.1, CSE Server(SMP) : SSE4.2
• -O2 : before enabling -O3, compare the results obtained with -O2 and -O3. Sometimes -O3 produces different results.
• -parallel : enable the auto-parallelizer. Turn it on if you use DO CONCURRENT.
• -free : free-form source (f90 style). ifort automatically treats a .f file as fixed-form (Fortran 77 style) source. If you want to compile a .f file as free-form Fortran 90 or later, enable this option.
• $ man ifort gives a lot of additional information.
Debug vs Release
• The -g (to use a debugger) and -check (check array bounds and so on) options help reduce errors. However, they add extra code, so the program runs slower, and optimization is turned off automatically.
• If you are confident that the code has no errors and you just want results, enable optimization and remove the -g and -check options.
MKL BLAS & CG Method
Intel MKL(Math Kernel Library) and BLAS
Intel MKL
• A library of optimized math routines for science, engineering, and financial applications.
• Includes the basic matrix and vector functions.
• No separate installation is needed; just link the library.
BLAS
• Basic Linear Algebra Subprograms
• A set of low-level routines for common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication
• The interface is the same across implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
• I will use MKL BLAS because it is easy to compile and well documented.
• It is already parallelized, so turning on one option gives parallelism without writing any OpenMP. (MPI parallelism is not implemented.)
I will show how to build the CG method using MKL BLAS, line by line.
Sparse Matrix Format
• Before starting with the BLAS library functions, we need to consider how to construct the 𝐴 matrix in 𝐴𝑥 = 𝑏.
1 7 0 0
0 2 8 0
5 0 3 9
0 6 0 4
row offsets : 1 3 5 8 10 (the last entry, 10, indicates the end)
column indices : 1 2 2 3 1 3 4 2 4
values : 1 7 2 8 5 3 9 6 4
9 entries (non zero entries)
• The arrays are filled row by row: each nonzero appends its value and its column index, and the row offset records the position at which each row starts.
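The row-by-row construction shown on these slides can be sketched as a short program. This is an illustrative sketch only: the names dense, values, col_idx, and row_off are my own, and the conversion loop is the textbook approach, not code from the presentation. Run on the 4×4 example, it should reproduce the three arrays above with one-based indices.

```fortran
program dense_to_csr
  implicit none
  integer, parameter :: n = 4
  double precision :: dense(n, n)
  double precision, allocatable :: values(:)
  integer, allocatable :: col_idx(:)
  integer :: row_off(n + 1)
  integer :: i, j, k, nnz

  ! Fortran fills arrays column by column, so transpose the row-wise listing.
  dense = transpose(reshape([ 1d0, 7d0, 0d0, 0d0, &
                              0d0, 2d0, 8d0, 0d0, &
                              5d0, 0d0, 3d0, 9d0, &
                              0d0, 6d0, 0d0, 4d0 ], [n, n]))

  nnz = count(dense /= 0d0)
  allocate(values(nnz), col_idx(nnz))

  k = 0
  do i = 1, n
    row_off(i) = k + 1              ! position where row i starts
    do j = 1, n
      if (dense(i, j) /= 0d0) then
        k = k + 1
        values(k)  = dense(i, j)
        col_idx(k) = j
      end if
    end do
  end do
  row_off(n + 1) = nnz + 1          ! one past the last entry: marks the end

  print '(a, 5i3)',   'row offsets   :', row_off
  print '(a, 9i3)',   'column indices:', col_idx
  print '(a, 9f4.0)', 'values        :', values
end program dense_to_csr
```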
Sparse matrix
• Storing the A matrix densely, zeros included, requires 16 * 8 bytes.
• The CSR sparse format requires 23 * 8 bytes (5 row offsets + 9 column indices + 9 values).
• Inefficient? Not for a large A matrix, such as one of size (𝑛𝑥 ⋅ 𝑛𝑦) × (𝑛𝑥 ⋅ 𝑛𝑦), where almost all entries are zero: there CSR is far more efficient.
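A rough size estimate shows why CSR pays off for large matrices. The numbers below are assumptions for illustration, not from the slides: a 100 × 100 grid, at most 5 nonzeros per row (as a 5-point stencil would give), 8-byte values, and 4-byte integer indices (the slide counts every entry as 8 bytes).

```fortran
program csr_storage
  use iso_fortran_env, only: int64
  implicit none
  integer(int64), parameter :: nx = 100, ny = 100   ! hypothetical grid size
  integer(int64), parameter :: nnz_per_row = 5      ! assumed 5-point stencil
  integer(int64) :: n, dense_bytes, csr_bytes

  n = nx * ny
  ! Dense storage: n*n double-precision entries.
  dense_bytes = n * n * 8
  ! CSR upper bound: value (8 bytes) + column index (4 bytes) per nonzero,
  ! plus n+1 row offsets (4 bytes each).
  csr_bytes = nnz_per_row * n * (8 + 4) + (n + 1) * 4

  print *, 'dense bytes:', dense_bytes
  print *, 'CSR bytes  :', csr_bytes
end program csr_storage
```

Under these assumptions the dense matrix needs about 800 MB, while the CSR arrays stay below 1 MB.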
Which BLAS Library Functions Are Required?
• mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used in the 𝐴𝑥 computation.
• call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
• transa : selects 𝐴𝑥 (transa=‘N’ or ‘n’) or 𝐴’𝑥 (transa=‘T’ or ‘t’ or ‘C’ or ‘c’).
• m : # of rows of A
• a : values array of A in CSR format
• ia : row offset array of A in CSR format
• ja : column indices array of A in CSR format
• x : x vector
• y : output (𝐴𝑥)
• dcopy : copies vector 𝑥 to vector 𝑦. 𝑦 = 𝑥
• call dcopy(n, x, incx, y, incy)
• n : # of elements in vectors 𝑥 and 𝑦.
• x : input, 𝑥 vector
• y : output, 𝑦 vector
• incx, incy : increments (strides) between elements of 𝑥 and 𝑦; 1 for contiguous vectors
Which BLAS Library Functions Are Required?
• ddot : computes a vector-vector dot product. 𝑥 ⋅ 𝑦
• not a subroutine; it is a function.
• ddot(n, x, incx, y, incy), or dot(x, y) with the Fortran 95 interface
• n : # of elements in vectors 𝑥 and 𝑦.
• x, y : 𝑥, 𝑦 vectors
• daxpy : computes a vector-scalar product and adds the result to a vector. DAXPY : Double-precision A·X Plus Y
• 𝑦 = 𝑎 ⋅ 𝑥 + 𝑦
• call daxpy(n, a, x, incx, y, incy)
• n : # of elements in vectors 𝑥 and 𝑦.
• a : scalar 𝑎
• x : input, 𝑥 vector
• y : input and output, 𝑦 vector
• dnrm2 : computes the Euclidean norm of a vector. ‖𝑥‖ = sqrt(𝑥 ⋅ 𝑥)
• not a subroutine; it is a function.
• dnrm2(n, x, incx), or nrm2(x) with the Fortran 95 interface
• n : # of elements in vector 𝑥.
• x : input, 𝑥 vector
• incx, incy : increments (strides) between vector elements; 1 for contiguous vectors
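How these routines fit together in a CG iteration can be sketched as follows. To keep the sketch self-contained it does not link against MKL: Fortran intrinsics and a hand-written CSR product stand in for the library calls, and the comments name the BLAS routine each line corresponds to. The 4×4 symmetric positive definite test matrix (tridiagonal with 2 on the diagonal and -1 next to it), the right-hand side, and all variable names are assumptions for illustration, since CG needs a symmetric positive definite matrix and the example matrix from the earlier slides is not symmetric.

```fortran
program cg_sketch
  implicit none
  integer, parameter :: n = 4, nnz = 10
  ! Hypothetical SPD test matrix (tridiagonal 2, -1) in one-based CSR form.
  double precision :: a(nnz) = [ 2d0,-1d0, -1d0,2d0,-1d0, -1d0,2d0,-1d0, -1d0,2d0 ]
  integer :: ja(nnz) = [ 1,2, 1,2,3, 2,3,4, 3,4 ]
  integer :: ia(n+1) = [ 1, 3, 6, 9, 11 ]
  double precision :: x(n), b(n), r(n), p(n), ap(n)
  double precision :: alpha, beta, rr_old, rr_new
  integer :: it

  b = 1d0
  x = 0d0
  r = b                        ! r = b - A*x with x = 0   (dcopy)
  p = r                        !                          (dcopy)
  rr_old = dot_product(r, r)   !                          (ddot)

  do it = 1, n
    call csr_matvec(ap, p)               ! ap = A*p       (mkl_dcsrgemv)
    alpha = rr_old / dot_product(p, ap)  !                (ddot)
    x = x + alpha * p                    !                (daxpy)
    r = r - alpha * ap                   !                (daxpy)
    rr_new = dot_product(r, r)
    if (sqrt(rr_new) < 1d-12) exit       ! residual norm  (dnrm2)
    beta = rr_new / rr_old
    p = r + beta * p
    rr_old = rr_new
  end do

  print *, 'x =', x

contains

  ! y = A*v for the CSR matrix (a, ia, ja) of the host program.
  subroutine csr_matvec(y, v)
    double precision, intent(out) :: y(n)
    double precision, intent(in)  :: v(n)
    integer :: i, k
    do i = 1, n
      y(i) = 0d0
      do k = ia(i), ia(i+1) - 1
        y(i) = y(i) + a(k) * v(ja(k))
      end do
    end do
  end subroutine csr_matvec

end program cg_sketch
```

For this test system the exact solution is (2, 3, 3, 2), so the printed x should match it up to rounding.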

22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 

Fortran & Link with Library & Brief Explanation of MKL BLAS

  • 1. Some things you need to know Jongsu Kim
  • 3. Fortran….
      • Still writing Fortran 77, 90, or 95?
      • Fortran 2003 and 2008 are already here, and Fortran 2015 is on the way.
      • Some older features will be deleted or marked obsolescent.
      • We are using Fortran the wrong way.
  • 4. What you shouldn’t use
      Labeled DO loops
          do 100 ii=istart,ilast,istep
            isum = isum + ii
          100 continue
      EQUIVALENCE: specifies the sharing of storage units by two or more objects in a scoping unit
          character (len=3) :: C(2)
          character (len=4) :: A,B
          equivalence (A,C(1)), (B,C(2))
          [Figure: seven character storage units, numbered 1 to 7. A occupies units 1-4, B units 4-7, C(1) units 1-3, C(2) units 4-6.]
      COMMON: blocks of physical storage accessed by any of the scoping units in a program
          COMMON /BLOCKA/ A,B,C(10,30)
          COMMON I, J, K
      ENTRY: subroutine-like entry points inside a subroutine
      FIXED FORM SOURCE: Fortran 77 style (80-column lines, with statements confined to columns 7-72)
      CHARACTER* form: replaced by CHARACTER(LEN=?)
      NON-BLOCK DO CONSTRUCT: the DO range doesn't end in a CONTINUE or END DO
  • 5. What you shouldn’t use
      Labeled DO loops: the label is unnecessary, and it is hard to remember what each number means. Moreover, we have the END DO and CYCLE statements.
      EQUIVALENCE: error-prone. It is hard to keep track of every storage location that a variable shares. Since COMMON and EQUIVALENCE are discouraged, BLOCK DATA, which exists to initialize COMMON blocks, should be avoided as well.
      COMMON: sharing many variables across the whole program is dangerous and error-prone.
      ENTRY: it complicates the program and is unnecessary now that we have modules and subroutines.
      NON-BLOCK DO CONSTRUCT: it is hard to see where the DO loop ends.
  • 6. What you might want to use – CYCLE, EXIT
      • Avoid the GOTO statement
      • Use the CYCLE or EXIT statement
      • CYCLE: skip the rest of the current iteration and continue with the next one
      • EXIT: leave the loop
          do i=1, 100
            x = real(i)
            y = sin(x)
            if (i == 20) exit
            z = cos(x)
          enddo
      EXIT: 19 iterations complete in full. In the 20th iteration, y = sin(x) is executed and then the loop exits.
          do i=1, 100
            x = real(i)
            y = sin(x)
            if (i == 20) cycle
            z = cos(x)
          enddo
      CYCLE: all 100 iterations run, but at i=20, z = cos(x) is not executed.
  • 7. What you might want to use – CYCLE, EXIT
      • Avoid the GOTO statement
      • Use the CYCLE or EXIT statement with nested loops
      • Constructs (DO, IF, CASE, etc.) may have names
          outer: do j=1, 100
            inner: do i=1, 100
              x = real(i)
              y = sin(x)
              if (i > 20) exit outer
              z = cos(x)
            enddo inner
          enddo outer
      EXIT outer: both loops are left at i=21.
          outer: do j=1, 100
            inner: do i=1, 100
              x = real(i)
              y = sin(x)
              if (i > 20) cycle outer
              z = cos(x)
            enddo inner
          enddo outer
      CYCLE outer: at i=21, z = cos(x) and the rest of the inner loop are skipped, and the next iteration of outer begins.
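The difference between EXIT outer and CYCLE outer is easiest to see by counting how often the statement after the test runs. The following is a minimal sketch, not taken from the slides: the counters n_exit and n_cycle and the construct names outer1/inner1/outer2/inner2 are made up for illustration, and the counter increment stands in for z = cos(x).

```fortran
program named_loops
  implicit none
  integer :: i, j, n_exit, n_cycle

  ! EXIT outer1: leaves both loops the first time i exceeds 20.
  n_exit = 0
  outer1: do j = 1, 100
    inner1: do i = 1, 100
      if (i > 20) exit outer1
      n_exit = n_exit + 1        ! stands in for z = cos(x)
    end do inner1
  end do outer1

  ! CYCLE outer2: abandons the inner loop and starts the next j.
  n_cycle = 0
  outer2: do j = 1, 100
    inner2: do i = 1, 100
      if (i > 20) cycle outer2
      n_cycle = n_cycle + 1      ! stands in for z = cos(x)
    end do inner2
  end do outer2

  print *, 'exit  outer: body ran', n_exit, 'times'
  print *, 'cycle outer: body ran', n_cycle, 'times'
end program named_loops
```

With EXIT the body runs 20 times in total (only for j=1). With CYCLE it runs 20 times for each of the 100 values of j, i.e. 2000 times.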
  • 8. What you might want to use – WHERE
          real, dimension(4) :: &
            x = [ -1, 0, 1, 2 ], &
            a = [ 5, 6, 7, 8 ]
          ...
          where (x < 0)
            a = -1.
          end where
      Result: a = {-1.0, 6.0, 7.0, 8.0}
          where (x /= 0)
            a = 1. / a
          elsewhere
            a = 0.
          end where
      Result (applied after the first WHERE): a = {-1.0, 0.0, 1.0/7.0, 1.0/8.0}
  • 9. What you might want to use – ANY
          integer, parameter :: n = 100
          real, dimension(n,n) :: a, b, c1, c2
          c1 = my_matmul(a, b)   ! home-grown function
          c2 = matmul(a, b)      ! built-in function
          if (any(abs(c1 - c2) > 1.e-4)) then
            print *, 'There are significant differences'
          endif
      • ANY and WHERE replace explicit DO loops.
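To make the "replace explicit DO loops" point concrete, here is a small sketch that writes the second WHERE example from slide 8 both ways and uses ANY to compare the results. The array b and the tolerance 1.e-6 are assumptions introduced for the comparison and are not in the slides.

```fortran
program where_any_demo
  implicit none
  real, dimension(4) :: x = [-1., 0., 1., 2.]
  real, dimension(4) :: a = [ 5., 6., 7., 8.]
  real, dimension(4) :: b
  integer :: i

  ! Explicit-loop version: one IF per element.
  do i = 1, size(x)
    if (x(i) /= 0.) then
      b(i) = 1. / a(i)
    else
      b(i) = 0.
    end if
  end do

  ! Array version: the mask x /= 0 selects the elements.
  where (x /= 0.)
    a = 1. / a
  elsewhere
    a = 0.
  end where

  ! ANY reduces the element-wise comparison to a single logical.
  if (any(abs(a - b) > 1.e-6)) then
    print *, 'loop and WHERE disagree'
  else
    print *, 'loop and WHERE agree:', a
  end if
end program where_any_demo
```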
  • 10. What you might want to use – DO CONCURRENT
      • Vectorization: a simple example of auto-parallelization
      • Definition: processes one operation on multiple pairs of operands at once
          do concurrent (i=1:m)
            call dosomething()
          end do
      Scalar loop:
          DO i=1,1024
            C(i) = A(i) * B(i)
          END DO
      Vectorized loop (four elements per step):
          DO i=1,1024,4
            C(i:i+3) = A(i:i+3) * B(i:i+3)
          END DO
      • DO CONCURRENT allows (requests) vectorization or parallelization. With ifort, enable the -parallel option to have the loop parallelized.
      • No data dependencies between iterations, no EXIT or RETURN statement, and no CYCLE that targets an enclosing loop. Any procedure called inside the loop must be PURE.
      • Use with OpenMP.
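A self-contained version of the C(i) = A(i) * B(i) loop written with DO CONCURRENT might look like the sketch below. The random input data and the final check against the whole-array expression are additions for illustration. Whether the loop is actually vectorized or parallelized depends on the compiler and its options.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1024
  real :: a(n), b(n), c(n)
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Each iteration writes only c(i) and reads only a(i) and b(i),
  ! so the iterations are independent and may run in any order.
  do concurrent (i = 1:n)
    c(i) = a(i) * b(i)
  end do

  ! Whole-array form of the same operation, used here as a check.
  if (maxval(abs(c - a * b)) <= epsilon(1.0)) then
    print *, 'do concurrent matches a * b'
  else
    print *, 'mismatch'
  end if
end program do_concurrent_demo
```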
  • 11. For More..
      • Read the Fortran 2008 standard
          http://www.j3-fortran.org/doc/year/10/10-007.pdf
      • A more recent document for Fortran 2015 (still in progress)
          http://j3-fortran.org/doc/year/15/15-007.pdf
      • Easy-to-read documents
          • The new features of Fortran 2008: ftp://ftp.nag.co.uk/sc22wg5/N1801-N1850/N1828.pdf
          • Modern Programming Languages: Fortran90/95/2003/2008: https://www.tacc.utexas.edu/documents/13601/162125/fortran_class.pdf
  • 13. Build?
      • The process that turns source code into an executable file is called a build.
      • Compiler: the tool that compiles. Linker: the tool that links.
      • ifort, gcc, gfortran, and so on are combined tools that both compile and link.
      [Figure: Source Code1.f, Source Code2.f, Source Code3.f (readable) → Compile → Source Code1.o, Source Code2.o, Source Code3.o (unreadable) → Link, together with Libraries(FFTW..) → a.out]
  • 14. Makefile?
      • make does all of the compile and link jobs automatically. A Makefile is a build script.
      • make (actually gmake) is only one of many such tools, which are called build systems.
      • Visual Studio has its own build system, so it doesn't use a Makefile.
      1. Command-line
          $ gcc -o hellomake hellomake.c hellofunc.c -I.
      2. Simple Makefile (1)
          hellomake: hellomake.c hellofunc.c
              gcc -o hellomake hellomake.c hellofunc.c -I.
      • “hellomake:” : rule name (the target)
      • “hellomake.c hellofunc.c” : dependencies
      • “gcc …” : the actual command
      • Running plain “make” executes the first rule defined in the Makefile.
      Command-line
          $ make or $ make hellomake
  • 15. Makefile?
      3. Simple Makefile (3): add constants
          CC=gcc
          CFLAGS=-I.
          hellomake: hellomake.o hellofunc.o
              $(CC) -o hellomake hellomake.o hellofunc.o -I.
      • “CC=gcc” : the C compiler
      • “CFLAGS” : list of flags to pass to the compilation command
      • For Fortran, use “FC” instead of “CC” and “FFLAGS” instead of “CFLAGS”.
      • The command line (“$(CC) …”) must be indented with a tab!
          $ make or $ make hellomake
  • 16. Makefile?
      4. Simple Makefile (4): a pattern rule that compiles every .c file into a .o file
          CC=gcc
          CFLAGS=-I.
          DEPS = hellomake.h
          hellomake: hellomake.o hellofunc.o
              $(CC) -o hellomake hellomake.o hellofunc.o -I.
          %.o: %.c $(DEPS)
              $(CC) -c $< $(CFLAGS)
      $@ and $< are special macros (automatic variables) in a Makefile.
      • Rule %.o : rule for compilation. Rule hellomake : rule for linking.
      • $@ : the name of the file to be made (e.g. hellomake in rule hellomake)
      • $< : the name of the first prerequisite (hellomake.o is the first prerequisite of rule hellomake)
      • $^ : the names of all the prerequisites, with spaces between them
      • $* : the stem shared by the target and its prerequisite in a pattern rule (hellomake when hellomake.o is built from hellomake.c)
          $ make or $ make hellomake
  • 17. Compiler & Linker Options
          FFLAGS=-O3 -r8 -openmp -I /home/astromece/usr/fftw/include
          LIBS=-L/home/astromeca/usr/lib -lfftw3 -lm
      Compiler options and linker options
      • -O3 : optimization level (O1: code-size optimization, O2: general optimization (default), O3: aggressive optimization)
      • -r8 : the real type is double precision (8 bytes = 64 bits per real)
      • -I : specifies an include directory. Include files: .h files (declarations)
      • -L : specifies a library directory. Library files: .so or .a
      • -lfftw3 : link with the fftw3 library
      • -lm : link with the math library (needed for several math intrinsic functions)
  • 18. Compiler & Linker Options
      Recommended options
      • -heap-arrays [numbers] : puts automatic arrays, and arrays larger than [numbers] KB that are created for temporary computations, on the heap instead of the stack. Same effect as using an ALLOCATE statement.
      • -ax[code] : specifies the CPU architecture. DGIST, Boolt: AVX; CSE Server (OMP): SSE4.1; CSE Server (SMP): SSE4.2
      • -O2 : before enabling -O3, compare the results obtained with -O2 and -O3. "Sometimes" -O3 produces different results.
      • -parallel : enables auto-parallelized code. Turn it on if you use DO CONCURRENT.
      • -free : free-form source (f90 style). ifort compiles a .f file as fixed-form Fortran 77 by default. To compile a file with the .f suffix as Fortran 90 or later, enable this option.
      • $ man ifort gives a lot of additional information.
      Debug vs Release
      • The -g option (for the debugger) and the -check option (checks array bounds and so on) help reduce errors. However, they add extra code, which slows the program down, and optimization is turned off automatically.
      • Once you are sure that the code has no errors and you just want results, enable optimization and remove the -g and -check options.
  • 19. MKL BLAS & CG Method
  • 20. Intel MKL (Math Kernel Library) and BLAS
      Intel MKL
      • A library of optimized math routines for science, engineering, and financial applications.
      • Includes the basic matrix and vector functions.
      • You don't need any installation; just link the library.
      BLAS
      • Basic Linear Algebra Subprograms
      • A set of low-level routines for common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.
      • One interface with many implementations: ATLAS, MKL, OpenBLAS, GotoBLAS, and so on.
      • I will use MKL BLAS because it is easy to compile and well documented.
      • It is already parallelized, so turning on one option gives parallel execution without writing any OpenMP yourself. (MPI parallelism is not implemented.)
      I will show how to build the CG method using MKL BLAS, line by line.
  • 21. Sparse Matrix Format
      • Before using the BLAS library functions, we need to consider how to construct the 𝐴 matrix in 𝐴𝑥 = 𝑏.
      A =
          1 7 0 0
          0 2 8 0
          5 0 3 9
          0 6 0 4
      The 9 nonzero entries are stored in three arrays (row offsets, column indices, values), built up entry by entry over the following slides.
      row offsets    : 1
      column indices : 1
      values         : 1
  • 22. Sparse Matrix Format
      row offsets    : 1
      column indices : 1 2
      values         : 1 7
  • 23. Sparse Matrix Format
      row offsets    : 1 3
      column indices : 1 2 2
      values         : 1 7 2
  • 24. Sparse Matrix Format
      row offsets    : 1 3
      column indices : 1 2 2 3
      values         : 1 7 2 8
  • 25. Sparse Matrix Format
      row offsets    : 1 3 5
      column indices : 1 2 2 3 1
      values         : 1 7 2 8 5
  • 26. Sparse Matrix Format
      row offsets    : 1 3 5
      column indices : 1 2 2 3 1 3
      values         : 1 7 2 8 5 3
  • 27. Sparse Matrix Format
      row offsets    : 1 3 5
      column indices : 1 2 2 3 1 3 4
      values         : 1 7 2 8 5 3 9
  • 28. Sparse Matrix Format
      row offsets    : 1 3 5 8
      column indices : 1 2 2 3 1 3 4 2
      values         : 1 7 2 8 5 3 9 6
  • 29. Sparse Matrix Format
      row offsets    : 1 3 5 8
      column indices : 1 2 2 3 1 3 4 2 4
      values         : 1 7 2 8 5 3 9 6 4
  • 30. Sparse Matrix Format
      row offsets    : 1 3 5 8 10   (the final 10 indicates the end)
      column indices : 1 2 2 3 1 3 4 2 4
      values         : 1 7 2 8 5 3 9 6 4
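The slide-by-slide construction above amounts to one pass over the rows of the matrix. The following is a sketch of that pass for the 4 × 4 example, using one-based indices as in the slides. The variable names (dense, row_offsets, columns, values) are chosen here for readability and are not from the slides.

```fortran
program build_csr
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  real(dp) :: dense(n, n)
  real(dp), allocatable :: values(:)
  integer, allocatable :: columns(:)
  integer :: row_offsets(n + 1)
  integer :: i, j, k, nnz

  ! reshape fills column by column, so transpose to get the rows as written.
  dense = transpose(reshape([1d0, 7d0, 0d0, 0d0, &
                             0d0, 2d0, 8d0, 0d0, &
                             5d0, 0d0, 3d0, 9d0, &
                             0d0, 6d0, 0d0, 4d0], [n, n]))

  nnz = count(dense /= 0d0)
  allocate(values(nnz), columns(nnz))

  k = 0
  do i = 1, n
    row_offsets(i) = k + 1          ! position of the first entry of row i
    do j = 1, n
      if (dense(i, j) /= 0d0) then
        k = k + 1
        values(k)  = dense(i, j)
        columns(k) = j
      end if
    end do
  end do
  row_offsets(n + 1) = nnz + 1      ! one past the last entry: marks the end

  print '(a, 5i3)',   'row offsets    :', row_offsets
  print '(a, 9i3)',   'column indices :', columns
  print '(a, 9f4.0)', 'values         :', values
end program build_csr
```

The three printed arrays should match the final state shown on slide 30.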
  • 31. Sparse matrix
      • Storing the A matrix with its zeros requires 16 * 8 bytes.
      • Storing it as a sparse (CSR) matrix requires 23 * 8 bytes (5 row offsets + 9 column indices + 9 values).
      • Inefficient? No. For a large A matrix, such as (𝑛𝑥 ⋅ 𝑛𝑦) × (𝑛𝑥 ⋅ 𝑛𝑦), CSR is SOOOO efficient.
      A =
          1 7 0 0
          0 2 8 0
          5 0 3 9
          0 6 0 4
      row offsets    : 1 3 5 8 10
      column indices : 1 2 2 3 1 3 4 2 4
      values         : 1 7 2 8 5 3 9 6 4
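The storage comparison can be written out for a general n × n matrix with nnz nonzero entries. Like the slide, this counts every stored entry (index or value) as one 8-byte word. The large case is an assumption added for illustration: nx = ny = 100 with at most five nonzeros per row, as a 2D five-point stencil would give. It is not a figure from the slides.

```latex
\begin{align*}
\text{dense: } n^2 \text{ entries},
  \qquad & \text{CSR: } (n+1) + 2\,\mathrm{nnz} \text{ entries} \\
\text{slide example } (n = 4,\ \mathrm{nnz} = 9): \quad
  & 16 \ \text{vs.}\ 5 + 2 \cdot 9 = 23 \\
n = n_x n_y = 10^4,\ \mathrm{nnz} \le 5n: \quad
  & 10^8 \ \text{vs.}\ \le (10^4 + 1) + 10^5 = 110\,001
\end{align*}
```

Under these assumptions CSR needs roughly a thousandth of the dense storage, and the gap widens as the grid grows because dense storage scales with n² while CSR scales with n.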
  • 32. Which BLAS Library Functions Are Required?
      • mkl_dcsrgemv : computes the matrix-vector product of a sparse general matrix stored in the CSR format (3-array variation) with one-based indexing, in double precision. Used for the 𝐴𝑥 computation.
          call mkl_dcsrgemv(transa, m, a, ia, ja, x, y)
          • transa : selects 𝐴𝑥 (transa='N' or 'n') or 𝐴'𝑥 (transa='T', 't', 'C', or 'c')
          • m : number of rows of A
          • a : values array of A in CSR format
          • ia : row offsets array of A in CSR format
          • ja : column indices array of A in CSR format
          • x : input, 𝑥 vector
          • y : output (𝐴𝑥)
      • dcopy : copies a vector, 𝑦 = 𝑥
          call dcopy(n, x, incx, y, incy)
          • n : number of elements in vectors 𝑥 and 𝑦
          • x : input, 𝑥 vector
          • y : output, 𝑦 vector
          • incx, incy : strides between consecutive elements of 𝑥 and 𝑦 (1 for contiguous vectors)
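To see what the 𝐴𝑥 computation does with the three CSR arrays, here is a plain-Fortran sketch of the product for the example matrix from the earlier slides. It is an illustration of the operation, not MKL's implementation. The vector x = (1, 1, 1, 1) is an arbitrary test input, and the array names a, ia, ja follow the argument names listed above.

```fortran
program csr_matvec
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 4
  integer  :: ia(m + 1) = [1, 3, 5, 8, 10]             ! row offsets
  integer  :: ja(9)     = [1, 2, 2, 3, 1, 3, 4, 2, 4]  ! column indices
  real(dp) :: a(9)      = [1, 7, 2, 8, 5, 3, 9, 6, 4]  ! values
  real(dp) :: x(m)      = [1, 1, 1, 1]
  real(dp) :: y(m)
  integer  :: i, k

  ! y = A*x, row by row: row i owns entries ia(i) .. ia(i+1)-1
  do i = 1, m
    y(i) = 0.0_dp
    do k = ia(i), ia(i + 1) - 1
      y(i) = y(i) + a(k) * x(ja(k))
    end do
  end do

  print '(a, 4f6.1)', 'y = A*x :', y
end program csr_matvec
```

With x set to all ones, y holds the row sums of A: 8, 10, 17, 10. When linking against MKL, the double loop would be replaced by the call shown on the slide, call mkl_dcsrgemv('N', m, a, ia, ja, x, y).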
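  • 33. Which BLAS Library Functions Are Required?
      • ddot : computes a vector-vector dot product, 𝑥 ⋅ 𝑦
          • It is a function, not a subroutine.
          res = ddot(n, x, incx, y, incy)   (Fortran 95 interface: dot(x, y))
          • n : number of elements in vectors 𝑥 and 𝑦
          • x, y : 𝑥, 𝑦 vectors
          • incx, incy : strides (1 for contiguous vectors)
      • daxpy : computes a vector-scalar product and adds the result to a vector, 𝑦 = 𝑎 ⋅ 𝑥 + 𝑦. The name comes from SAXPY (Single-precision A·X Plus Y); DAXPY is the double-precision version.
          call daxpy(n, a, x, incx, y, incy)
          • n : number of elements in vectors 𝑥 and 𝑦
          • a : scalar 𝑎
          • x : input, 𝑥 vector
          • y : input and output, 𝑦 vector
          • incx, incy : strides (1 for contiguous vectors)
      • dnrm2 : computes the Euclidean norm of a vector, ‖𝑥‖₂
          • It is a function, not a subroutine.
          res = dnrm2(n, x, incx)   (Fortran 95 interface: nrm2(x))
          • n : number of elements in vector 𝑥
          • x : input, 𝑥 vector
          • incx : stride (1 for a contiguous vector)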
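Put together, these five operations are enough for a conjugate gradient iteration. The sketch below is the standard textbook CG loop, not the author's code. To stay runnable without MKL, it uses Fortran intrinsics and array expressions where the BLAS calls would go, with comments naming the routine each line stands in for. The test system is an assumption: CG needs a symmetric positive definite matrix, so a small tridiagonal matrix (2 on the diagonal, -1 beside it) is used instead of the non-symmetric example from the CSR slides.

```fortran
program cg_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4
  ! Hypothetical SPD test matrix, tridiagonal (-1, 2, -1), one-based CSR.
  integer  :: ia(n + 1) = [1, 3, 6, 9, 11]
  integer  :: ja(10)    = [1, 2,  1, 2, 3,  2, 3, 4,  3, 4]
  real(dp) :: a(10)     = [2, -1,  -1, 2, -1,  -1, 2, -1,  -1, 2]
  real(dp) :: b(n)      = [1, 0, 0, 1]
  real(dp) :: x(n), r(n), p(n), ap(n)
  real(dp) :: alpha, beta, rr_old, rr_new
  integer  :: it

  x = 0.0_dp
  r = b                        ! dcopy: r = b - A*x with x = 0
  p = r                        ! dcopy
  rr_old = dot_product(r, r)   ! ddot

  do it = 1, n
    call matvec(p, ap)                       ! mkl_dcsrgemv
    alpha = rr_old / dot_product(p, ap)      ! ddot
    x = x + alpha * p                        ! daxpy
    r = r - alpha * ap                       ! daxpy
    rr_new = dot_product(r, r)               ! ddot
    if (sqrt(rr_new) < 1.0e-12_dp) exit      ! dnrm2 of the residual
    beta = rr_new / rr_old
    p = r + beta * p
    rr_old = rr_new
  end do

  call matvec(x, ap)
  print '(a, 4f8.4)',  'x          :', x
  print '(a, es10.2)', 'max |Ax-b| :', maxval(abs(ap - b))

contains

  subroutine matvec(v, av)     ! av = A*v for the CSR arrays above
    real(dp), intent(in)  :: v(n)
    real(dp), intent(out) :: av(n)
    integer :: i, k
    do i = 1, n
      av(i) = 0.0_dp
      do k = ia(i), ia(i + 1) - 1
        av(i) = av(i) + a(k) * v(ja(k))
      end do
    end do
  end subroutine matvec

end program cg_sketch
```

For this test system the exact solution is x = (1, 1, 1, 1), which the iteration reaches after two steps. The tolerance 1.0e-12 and the iteration cap of n are illustrative choices.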