The document discusses tools and techniques for optimizing performance of OpenStack private clouds. It describes the need for better operations tools to manage instances and hosts. It also provides recommendations for disk input/output tuning in OpenStack components like Glance and Swift to improve performance. Specific tuning options discussed include adjusting the Glance chunk size and enabling caching in Glance for faster image retrieval.
1. Operating your OpenStack Private Cloud
Ryan Richard
OpenStack Engineer
ryan.richard@rackspace.com
@rackninja
October 12, 2012
Thursday, October 18, 12
RACKSPACE® HOSTING | WWW.RACKSPACE.COM
3. Monitoring and Reporting
Where we were - April 2012
Basic CDM
Now
6. Tools
There is no good way to get the following info:
I need a list of instances on a host and their IPs
I need to gracefully start/stop all instances on a host
Some tools need a hostname, some need an id (decimal or hex), some need a uuid
SELECT instances.id, instances.hostname, instances.project_id,
       fixed_ips.address AS fixed_address,
       floating_ips.address AS floating_address
FROM instances
LEFT JOIN fixed_ips ON instances.id = fixed_ips.instance_id
LEFT JOIN floating_ips ON floating_ips.fixed_ip_id = fixed_ips.id
WHERE instances.deleted = 0
  AND instances.host = "<hostname of physical machine>"
ORDER BY instances.id;
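The join on the slide above can be tried out without touching a real nova database. The following is a minimal sketch using an in-memory SQLite database; the trimmed schema, the sample rows, and the function names are all hypothetical, and it assumes `deleted` is a 0/1 flag. It is an illustration of the query shape, not the actual nova schema.

```python
import sqlite3

# Hypothetical, heavily trimmed schema: only the columns the slide's query uses.
SCHEMA = """
CREATE TABLE instances (id INTEGER PRIMARY KEY, hostname TEXT, project_id TEXT,
                        host TEXT, deleted INTEGER DEFAULT 0);
CREATE TABLE fixed_ips (id INTEGER PRIMARY KEY, address TEXT, instance_id INTEGER);
CREATE TABLE floating_ips (id INTEGER PRIMARY KEY, address TEXT, fixed_ip_id INTEGER);
"""

# Same joins as the slide, with the host passed as a bound parameter.
QUERY = """
SELECT instances.id, instances.hostname, instances.project_id,
       fixed_ips.address AS fixed_address,
       floating_ips.address AS floating_address
FROM instances
LEFT JOIN fixed_ips ON instances.id = fixed_ips.instance_id
LEFT JOIN floating_ips ON floating_ips.fixed_ip_id = fixed_ips.id
WHERE instances.deleted = 0 AND instances.host = ?
ORDER BY instances.id
"""


def instances_on_host(conn, host):
    """Return (id, hostname, project_id, fixed_address, floating_address) rows."""
    return conn.execute(QUERY, (host,)).fetchall()


def demo_connection():
    """Build an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO instances VALUES (?, ?, ?, ?, ?)", [
        (1, "web01", "proj-a", "compute01", 0),
        (2, "db01", "proj-a", "compute01", 0),
        (3, "old01", "proj-b", "compute01", 1),  # deleted, must be filtered out
        (4, "web02", "proj-b", "compute02", 0),  # lives on another host
    ])
    conn.executemany("INSERT INTO fixed_ips VALUES (?, ?, ?)", [
        (10, "10.0.0.2", 1), (11, "10.0.0.3", 2), (12, "10.0.0.4", 4),
    ])
    conn.executemany("INSERT INTO floating_ips VALUES (?, ?, ?)", [
        (100, "192.0.2.10", 10),  # only instance 1 has a floating IP
    ])
    conn.commit()
    return conn


if __name__ == "__main__":
    for row in instances_on_host(demo_connection(), "compute01"):
        print(row)
```

Because both joins are LEFT JOINs, an instance without a floating IP still shows up, with an empty floating address.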
8. Tools
WE NEED BETTER OPS TOOLS!
Pulsar
https://github.com/rsoprivatecloud/pulsar
“nova swiss army knife”
requires direct nova database access
10. Tools
Holland (open source database backup framework)
Written by Rackspace DBAs
http://wiki.hollandbackup.org/
13. Tools
dsh
dsh -Mcg compute uname -a
bashfoo
for i in `knife node list | grep cpu`; do knife node run_list add $i "role[single-compute]"; done
for k in `seq 1 20`; do for i in {compute,network}; do nova-manage service disable computevm0$k nova-$i; done; done
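Before running a nested loop like the last one against 20 hosts, it can help to see exactly which commands it expands to. This is a small dry-run sketch that only builds the command strings and never executes them; the function name and its parameters are hypothetical, and the command text simply mirrors the slide.

```python
def disable_commands(first=1, last=20, services=("compute", "network"),
                     host_prefix="computevm0"):
    """Expand the slide's nested loop into a list of command strings.

    Nothing is executed; the caller decides what to do with the strings.
    """
    commands = []
    for k in range(first, last + 1):          # for k in `seq 1 20`
        for service in services:              # for i in {compute,network}
            commands.append(
                f"nova-manage service disable {host_prefix}{k} nova-{service}"
            )
    return commands


if __name__ == "__main__":
    for command in disable_commands():
        print(command)
```

Like the shell loop, the sketch applies the prefix as-is, so host 10 comes out as `computevm010`. Printing the list first makes that kind of naming surprise visible before anything is disabled.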
15. Performance and Scale Considerations
Disk IO
For high performance use remote block storage
For “local” disk IO, raw image type is only slightly faster than qcow2
IO will degrade while Glance copies images between machines
scheduler=cfq, KVM cache=none
[Chart: Async Random IO. Bars for randR, randW, randR (direct) and randW (direct), horizontal axis 0 to 1600. Configurations: rs/speed/test12 (cfq, host deadline, cache=none); rs/speed/test13 (noop, cache=writeback); rs/speed/test13 (cfq, cache=writeback); rs/speed/test12 (noop, cache=none); rs/speed/test12 (cfq, cache=none); rs/speed/test13 (cfq, cache=none, no ht); rs/speed/test13 (deadline, cache=none); compute/host (deadline); compute/host (no ht); compute/host]
[Chart: Host vs. Instance. compute/host against rs/speed/test12 (cfq, cache=none) for randR, randW, seqR and seqW, each with a (direct) variant, vertical axis 0 to 14000]
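To compare hosts against the `scheduler=cfq` recommendation, the active I/O scheduler can be read from sysfs, where the kernel marks the selected entry with square brackets (for example `noop deadline [cfq]`). The sketch below parses that format; the helper names are hypothetical, and the device name passed to `read_scheduler` (such as `sda`) is an assumption about the host.

```python
from pathlib import Path


def active_scheduler(sysfs_text):
    """Return the bracketed entry from a line such as 'noop deadline [cfq]'.

    Returns None when no entry is bracketed.
    """
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device):
    """Read the active scheduler for a block device, e.g. 'sda' (Linux only)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # Parsing a sample line keeps the demo runnable on any machine.
    print(active_scheduler("noop deadline [cfq]"))
```

The KVM `cache=none` half of the recommendation lives in the guest's disk definition, so it is not covered by this check.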
20. Performance and Scale Considerations
Swift disk usage with different chunk sizes
5 zones - 4 x 1TB disks per zone
20TB raw - 6.67TB usable
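The raw and usable figures on this slide are consistent with Swift's default of three replicas, although the slide does not state the replica count. A short sketch of that arithmetic, with a hypothetical function name and the replica count as an explicit assumption:

```python
def swift_capacity_tb(zones, disks_per_zone, disk_tb, replicas=3):
    """Return (raw_tb, usable_tb) for a cluster of identical disks.

    replicas=3 is an assumption (Swift's default); the slide does not say.
    """
    raw = zones * disks_per_zone * disk_tb
    return raw, raw / replicas


raw, usable = swift_capacity_tb(zones=5, disks_per_zone=4, disk_tb=1)
print(f"{raw}TB raw - {usable:.2f}TB usable")  # 20TB raw - 6.67TB usable
```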
21. Performance and Scale Considerations
Swift disk usage with different chunk sizes
22. Performance and Scale Considerations
Glance chunk size
Too high and Swift can become unbalanced
What are the downsides to being too low?
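One way to reason about the trade-off is to count how many Swift objects a single image turns into at a given chunk size: large chunks mean a few big objects that can fill disks unevenly, small chunks mean many objects to store and fetch. The sketch below is only that arithmetic. The function name is hypothetical, the chunk sizes are arbitrary examples, and it ignores everything else Glance does (in Glance's Swift store the chunk size is set in MB via `swift_store_large_object_chunk_size`).

```python
import math


def segment_count(image_mb, chunk_mb):
    """Number of chunk-sized objects needed to hold an image of image_mb."""
    if chunk_mb <= 0:
        raise ValueError("chunk_mb must be positive")
    return math.ceil(image_mb / chunk_mb)


if __name__ == "__main__":
    image_mb = 16.4 * 1024  # a 16.4GB image, the larger size in the caching table
    for chunk_mb in (50, 200, 1024, 4096):  # arbitrary example chunk sizes
        print(f"chunk {chunk_mb}MB -> {segment_count(image_mb, chunk_mb)} objects")
```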
25. Performance and Scale Considerations
Glance Disk Tuning (swift)
read ahead on your block device(s) - no noticeable gain
deadline scheduler - no noticeable gain
Best thing for glance performance - Caching
Image Size    Not Cached     Cached
1.4GB         20secs         1sec
16.4GB        2min 21secs    1sec
*times from “creating image” to “qemu-img create”
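To put the table's timings on one scale, the durations can be converted to seconds and expressed as a speedup factor. This is a minimal sketch with hypothetical helper names; the only data in it are the four timings from the table.

```python
import re


def to_seconds(text):
    """Convert strings such as '20secs', '1sec' or '2min 21secs' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(min|sec)", text):
        total += int(value) * (60 if unit == "min" else 1)
    return total


# (not cached, cached) timings copied from the slide's table.
TIMINGS = {
    "1.4GB": ("20secs", "1sec"),
    "16.4GB": ("2min 21secs", "1sec"),
}


def speedups(timings):
    """Map image size to how many times faster the cached case was."""
    return {
        size: to_seconds(not_cached) / to_seconds(cached)
        for size, (not_cached, cached) in timings.items()
    }


if __name__ == "__main__":
    for size, factor in speedups(TIMINGS).items():
        print(f"{size}: {factor:.0f}x faster when cached")
```

Since the cached time is reported as 1sec for both images, the factors are only as precise as that one-second resolution.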
26. Performance and Scale Considerations
Scheduler
What we use by default:
scheduler tasks are not processed in parallel
Adding additional schedulers helps provide HA, but it doesn’t speed up the overall time to complete requests
27. Automated Config Management
Chef: http://github.com/rcbops/chef-cookbooks
time to stand up
controller - less than 20 minutes
compute node - less than 2 min
28. Day to Day tasks
Dealing with new issues
resize - all nova-compute processes need to be able to log into all other compute nodes via ssh keys
Hardware failures
We’re still managing infrastructure, failures happen
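The resize requirement means every compute node must be able to reach every other one, which is n x (n - 1) key-based logins to verify. The sketch below only generates the checks as (source node, command) pairs and does not run anything; the function name, the node names, and the `nova` login user are assumptions, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```python
from itertools import permutations


def ssh_check_commands(nodes, user="nova"):
    """Return (source_node, command) pairs covering every ordered node pair.

    Each command is meant to be run on source_node. BatchMode=yes makes ssh
    fail instead of prompting, so a missing key shows up as a non-zero exit.
    The 'nova' user is an assumption about how nova-compute runs.
    """
    checks = []
    for source, target in permutations(nodes, 2):
        command = f"ssh -o BatchMode=yes -o ConnectTimeout=5 {user}@{target} true"
        checks.append((source, command))
    return checks


if __name__ == "__main__":
    # Hypothetical node names.
    for source, command in ssh_check_commands(["compute01", "compute02", "compute03"]):
        print(f"on {source}: {command}")
```

The pairs could be fed to whatever remote-execution tool is already in use, such as the dsh setup mentioned earlier in the deck.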
29. Lessons Learned
We need better Operations tools!
Network Design can be confusing for people used to “the old way”
OpenStack is still relatively new, help your organization understand it.
It’s easy to forget we’re working with Linux machines
It’s not you, it’s a bug :)
30. But....
But this is a design summit also
Open to discussions/thoughts/questions