This paper proposes an image generation attack that targets image scaling algorithms. The attack aims to 1) modify an image A to appear as image B when A is resized, and 2) introduce minimal distortions such that the attacked image still resembles A. The attack is model-agnostic, as it can target any model using a particular scaling framework and function. The paper develops an optimization approach to craft perturbed images that appear as a target when resized, and demonstrates successful attacks against commercial cloud vision APIs. Potential applications include data poisoning, evasion attacks, and fraud. Detection methods like color histograms may help identify such attacked images.
2. TL;DR
• This paper formulates an image-generation attack on the image scaling function as a convex optimization problem. The attack has two objectives:
1. Make image A turn into image B when A is resized.
2. Keep the distortion small enough that the attack image still looks almost like A.
• This makes the attack model-free: it can target any model that uses a given framework and scaling function, and a scaling function appears somewhere in virtually every image pipeline.
• The authors also suggest an efficient querying process to reveal a Cloud Vision API provider's scaling size.
3. Prerequisite
What is scaling?
• A scaling function resizes an input image to match a specific shape.
• A deep learning model is essentially matrix computation, so its input must have a fixed (static) shape.
4. Prerequisite
Inconsistency between DL model input shapes and camera resolutions
[Figure: basic camera resolution chart vs. deep learning model input shapes]
An image scaling function is therefore essential to every deep learning model.
5. Prerequisite
Interpolation and sampling
• Interpolation: a type of estimation; a method of constructing new data points within the range of a discrete set of known data points. [Inter + pole]
[Figures: linear interpolation (given set), bilinear interpolation, spline interpolation]
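To make the interpolation step concrete, here is a minimal sketch of bilinear interpolation over a unit square (the helper name and values are illustrative, not from the paper):

```python
def bilinear(p00, p01, p10, p11, dx, dy):
    """Bilinear interpolation inside a unit square.

    p00..p11 are the four corner pixel values and (dx, dy) in [0, 1]
    is the sample position relative to the top-left corner."""
    top = p00 * (1 - dx) + p01 * dx      # interpolate along x (top edge)
    bottom = p10 * (1 - dx) + p11 * dx   # interpolate along x (bottom edge)
    return top * (1 - dy) + bottom * dy  # then interpolate along y

# Sampling the exact center averages all four corners:
print(bilinear(0, 10, 20, 30, 0.5, 0.5))  # 15.0
```

Note that the horizontal pass happens before the vertical one, matching the interpolation order mentioned later in the deck.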
6. Prerequisite
Interpolation and sampling
• Sampling: the reduction of a continuous-time signal to a discrete-time signal.
• Bit depth: quantization of the signal amplitude.
• Sampling rate: quantization of the time axis.
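The two quantization axes can be illustrated in a few lines (toy values assumed, not from the paper): the sampling rate slices time, the bit depth slices amplitude.

```python
import numpy as np

# Sampling rate: how finely the time axis is sliced.
t = np.linspace(0, 1, 8, endpoint=False)   # 8 samples over one second
signal = np.sin(2 * np.pi * t)             # one cycle of a sine wave, sampled

# Bit depth: how finely the amplitude axis is sliced.
bit_depth = 3
levels = 2 ** bit_depth                    # 2^3 = 8 representable levels
quantized = np.round((signal + 1) / 2 * (levels - 1))  # map [-1, 1] -> {0..7}
print(quantized.astype(int))
```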
7. Prerequisite
Aliasing and the Nyquist Theorem
• The Nyquist theorem specifies that a sinusoidal function in time or distance can be reconstructed with no loss of information as long as it is sampled at a frequency of at least twice per cycle.
8. Prerequisite
Nyquist Theorem
• Suppose the pixel values are a discrete signal. When we scale down an image, we do not have sufficient information about the original signal, so we must apply a low-pass filter to prevent aliasing artifacts.
[Figure: the same scene captured without and with an optical low-pass filter]
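The effect can be sketched in a few lines of NumPy (a toy 1-D analogy, not the paper's setup): naively down-sampling a high-frequency wave aliases it into a constant, while low-pass filtering first removes the artifact.

```python
import numpy as np

x = np.arange(64)
high_freq = np.cos(2 * np.pi * x / 4)  # one full cycle every 4 pixels

# Naive 4x down-sampling (keep every 4th pixel): each kept sample hits
# the same phase, so the oscillation aliases into a constant value 1.0.
naive = high_freq[::4]

# Low-pass first (4-pixel box filter), then sample: frequencies above
# the new Nyquist limit are removed, leaving the correct DC level.
filtered = high_freq.reshape(-1, 4).mean(axis=1)

print(naive.max(), naive.min())  # 1.0 1.0 -> aliasing artifact
print(filtered.max())            # ~0     -> artifact removed
```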
9. Prerequisite conclusion
1. Pixels are a discrete signal.
2. Scaling an image down requires a filter with fixed coefficients.
3. Scaling an image down can be considered data under-sampling.
4. Due to physical limitations, scaling is used in almost every deep learning model.
10. Main subject
1. Background
• Many DL frameworks provide their own image-resize methods.
• The interpolation order is horizontal, then vertical (element-wise, then channel-wise).
11. Main subject
1. Background
• Even if you do not call a resize function yourself, somewhere in the framework one may be invoked implicitly.
12. Main subject
2. Objective
• The objective is to map perturbations onto the source image so that, after the scaling function ScaleFunc(x) is applied, the attack image turns into the target image.
[Figure: Source Image → Attack Image → ScaleFunc(x) → Target Image]
13. Main subject
3. Taxonomy
• Source image (S, m×n): the image that the attacker wants the attack image to look like
• Attack image (A, m×n): the crafted image eventually created and fed to the scaling function
• Output image (D, m′×n′): the output image of the scaling function
• Target image (T, m′×n′): the image that the attacker wants the output image to look like
• Scaling function (ScaleFunc): the function that resizes the image

S(m×n) + Δ₁ = A(m×n)
Δ₁ = A(m×n) − S(m×n)
Δ₂ = D(m′×n′) − T(m′×n′)
14. Main subject
4. Attack method
• Strong attack form: the attacker KNOWS the source image that is to be turned into the attack image.
• Weak attack form: the attacker DOES NOT know the source image; only an example output image is available.
[Figure: unknown source image + example output image = attack image]
15. Main subject
4.1 Strong attack form
• Strong attack form: the attacker KNOWS the source image that is to be turned into the attack image.
• Weak attack form: the attacker DOES NOT know the source image.

Objective function: min ‖A(m×n) − S(m×n)‖₂
Constraint: ‖T(m′×n′) − D(m′×n′)‖∞ ≤ ε · IN_max
16. Main subject
4.1 Coefficient analysis
• As noted before, a filter matrix is needed to resize the image without aliasing.
• Because the filter windows overlap (as in a CNN), the coefficients must be computed separately per output pixel when crafting the perturbation.
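The scaling function can be modeled as constant coefficient matrices applied on each side of the image; here is a minimal sketch with an assumed uniform box filter (the real kernels depend on the framework and interpolation method):

```python
import numpy as np

def box_downscale_matrix(n_in, n_out):
    """Coefficient matrix for 1-D box-filter down-scaling (an assumed
    stand-in for a framework's real kernel). Row i averages the input
    pixels that map onto output pixel i."""
    L = np.zeros((n_out, n_in))
    k = n_in // n_out
    for i in range(n_out):
        L[i, i * k:(i + 1) * k] = 1.0 / k
    return L

A = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
L = box_downscale_matrix(4, 2)      # vertical coefficients (2x4)
R = box_downscale_matrix(4, 2).T    # horizontal coefficients (4x2)

D = L @ A @ R                       # ScaleFunc(A) as a linear map
print(D)  # each output pixel is the mean of a 2x2 block of A
print(np.allclose(L @ (2 * A) @ R, 2 * D))  # True: scaling is linear in A
```

Because the map is linear, the effect of any perturbation Δ₁ on the output can be computed exactly, which is what makes the convex formulation below possible.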
18. Main subject
4.3 Strong attack form
• The constraint is an upper bound on the pixel values (a constant function), so the constraint is linear.
• Therefore the problem can be solved in convex form.

Objective function: min ‖A(m×n) − S(m×n)‖₂
Constraint: ‖T(m′×n′) − D(m′×n′)‖∞ ≤ ε · IN_max
(WLOG)
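The paper solves this as a quadratic program; for the special case of a uniform box filter the minimum-norm solution has a simple closed form (spread the required change in each output pixel evenly over its input block). A toy sketch of that special case, omitting the ε-tolerance and pixel clipping:

```python
import numpy as np

# Toy instance: 4x4 source S, 2x2 target T, 2x down-scale by box filter.
S = np.array([[10., 10., 200., 200.],
              [10., 10., 200., 200.],
              [90., 90.,  30.,  30.],
              [90., 90.,  30.,  30.]])
T = np.array([[250.,   0.],
              [  0., 250.]])

# Current output D = ScaleFunc(S): the mean of each 2x2 block.
block_mean = S.reshape(2, 2, 2, 2).mean(axis=(1, 3))

# Minimum-L2 perturbation for a uniform filter: spread the required
# per-output-pixel change evenly over the pixels of its input block.
delta = np.kron(T - block_mean, np.ones((2, 2)))
A = S + delta                                   # attack image

D = A.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(np.allclose(D, T))  # True: A scales down exactly to the target
```

For general (overlapping) kernels there is no such closed form, which is why the paper falls back to a convex solver per row/column.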
19. Main subject
4.4 Strong attack form: algorithm analysis
• The problem is decomposed into sub-matrix (per-row/column) problems.
20. Main subject
4.5 Cloud inference attack (black box)
• We have to know the exact input size of the cloud DL model.
• The naive search space for inferring the model's input size is O(n⁎) = (library × interpolation method × height × width).
→ Restrict the range to [201, 300] for both H and W
→ Infer k different classes in the same query (k = 4)
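The querying process can be sketched as a probing loop (all names and the API shape are assumptions; the real attack batches crafted images per candidate size):

```python
# Hypothetical probing loop: for each candidate (h, w) in the suspected
# range, submit k attack images crafted for that size, each hiding a
# different target class. If the cloud returns one of those k labels,
# the hidden image survived scaling, so (h, w) matches the input size.
CANDIDATES = [(h, w) for h in range(201, 301) for w in range(201, 301)]
K = 4  # number of target classes tested per candidate, as in the talk

def infer_input_size(query_api, make_attack, targets):
    """query_api and make_attack are assumed callables standing in for
    the cloud API and the attack-image crafter, respectively."""
    for h, w in CANDIDATES:
        batch = [make_attack(h, w, t) for t in targets[:K]]
        labels = [query_api(img) for img in batch]
        for t, label in zip(targets[:K], labels):
            if label == t:          # hidden target surfaced after scaling
                return (h, w)
    return None
```

Testing k classes per candidate disambiguates which crafted image triggered the response without extra rounds of queries.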
21. Main subject
5.1 Results
• Attack targets: Azure, Baidu, Aliyun, Tencent
• Test dataset: 935 crafted images
→ Classes excluding sheep or sheep-like animals
→ Set to 800×600 images
→ ε = 0.01
→ Target = sheep
• Baidu, Aliyun, and Tencent yield a 100% success ratio, whereas Azure is more complex.
• The CDF (cumulative distribution function) shows that tags and descriptions are successfully attacked by this algorithm.
23. Main subject
5.2 Possible attack scenarios
• Data poisoning of a training database.
• Detection evasion and cloaking against CNN-based deep learning models.
• Fraud by leveraging inconsistencies between displays (e.g., mobile vs. desktop).
24. Main subject
5.3 Detection of the attack
• Color-histogram-based detection
• Color-scattering-based detection
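Color-histogram-based detection can be sketched as follows: a benign image and its scaled output depict the same scene, so their histograms agree, while an attack image and its output do not. In this toy sketch the helper names are assumptions, and the checkerboard is merely a stand-in for an attacked input whose scaled output looks different:

```python
import numpy as np

def histogram_distance(image, scale_func, bins=32):
    """Compare the intensity histograms of an image and its scaled output.
    A large L1 distance flags a potential scaling attack."""
    h1, _ = np.histogram(image, bins=bins, range=(0, 256))
    h2, _ = np.histogram(scale_func(image), bins=bins, range=(0, 256))
    h1 = h1 / h1.sum()                    # normalize: the images differ in size
    h2 = h2 / h2.sum()
    return float(np.abs(h1 - h2).sum())

# 2x box-filter down-scaling as the assumed ScaleFunc.
down = lambda img: img.reshape(img.shape[0] // 2, 2,
                               img.shape[1] // 2, 2).mean(axis=(1, 3))

benign = np.full((16, 16), 128.0)                              # flat gray
attacked_like = np.indices((16, 16)).sum(axis=0) % 2 * 255.0   # checkerboard

print(histogram_distance(benign, down))         # 0.0: histograms match
print(histogram_distance(attacked_like, down))  # 2.0: completely disjoint
```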
25. Conclusion
6. Pros
• The attack is model-free: it can be used in any situation, not only against deep learning models.
• It is more lightweight than deep-learning-based adversarial attacks.
• The attack success ratio and confidence are high.
6. Cons
• If the model does not use a resize method (such as some YOLO-based object detectors), the attack cannot succeed.
• It can only be applied when the attack image is scaled down to a smaller size.
• The perturbations are easily recognizable by humans. The key requirement for this kind of attack is to stay invisible to the human eye, yet here one can easily notice that the image looks somewhat wrong.
Editor's Notes
So image scaling is basically an interpolation between adjacent pixel matrices. There is also a Gaussian process, but we will skip that.
So what is sampling? We know sampling in the statistical sense, and its semantic meaning in signal processing is almost the same. We cannot capture continuous values, so we split the timeline into segments (the sampling rate) and record the signal amplitude at each one. Let's think.
If we dig into this part too far, we will be buried in mathematical equations, so let me just briefly go over it.
If we decompose the signal into really small time steps, we can see this kind of sine graph. But if our sampling rate...