People love to talk about scale. Some vendors pitch that their systems easily support 100,000 simultaneous calls, or 500 calls per second, etc. In the real world, though, people's behaviors vary and the feature sets they use can cut these numbers down quickly. For example, ask that same vendor claiming 100,000 simultaneous calls whether it can be done with call recording, call statistics and other features turned on at the same time, and you'll usually get a very different, cautious, qualified response.
In this presentation, we'll show you how to set up your infrastructure to support 100,000 simultaneous calls.
6. @kazoocon
Couch
• Overprovisioning is good
• Plan for the number of accounts you'll have in 1-2 years
• Z, N, W parameters
• Set these when you first start
• Z = number of datacenters / zones
• N = number of copies of each document
• W = number of writes that must succeed before it's "good enough" to move on
• The db tool can be used later to redo these settings
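As an illustration of setting these at creation time: on a BigCouch-style clustered Couch, some of these parameters can be passed when a database is created. The host, database name and values below are placeholders, and whether w and z are accepted this way depends on the Couch flavor, so check your version's docs.

```shell
# Illustrative only: create a database with an explicit replica count on a
# BigCouch / clustered-CouchDB node. n = number of copies of each shard.
curl -X PUT 'http://127.0.0.1:5984/accounts?n=3'

# Write quorum (w) can be passed per request on writes in some flavors;
# verify against your Couch version before relying on it.
```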
Registration scenario (call flow): REGISTER -> 401 -> REGISTER (with credentials) -> AUTHN_REQ -> Couch lookup -> AUTHN_RESP -> usrloc save -> REG_SUCCESS, with ecallmgr handling the authentication leg.
Test setup: 2 x Kamailio, 2 x Kazoo registrar instances with 4 listeners each, 2 x Rabbit (clustered).
In this scenario, we are showing 5,800+ registrations per second. Most phones re-register every 360 seconds, so:
360 x 5,800 = 2,088,000 phones
(THEORETICAL!!!)
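The sizing math above can be checked quickly; it assumes every phone re-registers exactly once per 360-second expiry interval, with no retries or churn:

```shell
# Theoretical capacity: sustained registrations/second multiplied by the
# re-registration interval = number of phones that stay registered.
regs_per_sec=5800
reregister_interval=360   # seconds, a common registration expiry
echo $((regs_per_sec * reregister_interval))   # 2088000 phones (theoretical)
```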
The reality is that scaling an open-source cluster to 10,000 phones and beyond can be a challenge, because every customer's usage patterns are different. There is no cookie-cutter solution when one customer has heavy dialer traffic, another heavy API traffic, and yet another feature-rich business VoIP traffic.
Hardware variations: cloud-based, VMs, Docker, hardware size, network capacity
Transcoding
The Kazoo platform is built from several components, each with its own tuning and scaling particularities.
We will go through these components and see how to optimize them for scale.
All components run on top of an operating system.
And the OS also has some settings we should check/configure so that the components work better when scaling up your cluster.
File Descriptors
Limits -> sockets -> processes
Logs -> syslog, drive capacity, SSD -> network if you're shipping your logs to another server
Network capacity
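A minimal sketch of the OS-level checks above; the specific values are illustrative examples, not recommendations, and should be tuned per workload:

```shell
# Check the current per-process file-descriptor limits (soft and hard).
ulimit -Sn
ulimit -Hn

# Illustrative persistent settings (example values only):
#   /etc/security/limits.conf:
#     kazoo  soft  nofile  65536
#     kazoo  hard  nofile  65536
#   /etc/sysctl.conf:
#     fs.file-max = 500000
sysctl fs.file-max 2>/dev/null || true   # system-wide fd ceiling (Linux)
```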
Overprovisioning
Db tool to reshard
File descriptors
Views
Crawlers
The auth db puts pressure on the cluster
Balancing
Bare metal: scale out vs scale up
4 x 250 calls vs 1 x 1000 calls
Network capacity
CPU if transcoding
SQLite vs MySQL (Sofia tracks dialogs in the db)
mod_kazoo filters
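On the SQLite vs MySQL point: Sofia keeps registrations and dialogs in its profile database, and FreeSWITCH can point that database at ODBC instead of the default SQLite. A sketch of the profile parameter; the DSN, user and password are placeholders:

```xml
<!-- In the Sofia profile settings (e.g. sip_profiles/internal.xml) -->
<!-- value format is dsn:user:password, with placeholder values here -->
<param name="odbc-dsn" value="freeswitch:fsuser:fspass"/>
```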
Do not cross-connect among zones
TCP workers
- UDP listeners
- TCP listeners
Worker counts
Buffer parameters
AMQP workers / consumers
Memory (lots of it)
Shared memory is where we keep cached credentials, presence information, anti-flood state, dispatcher rules, ACLs
db_text vs MySQL
MySQL in memory is hard to maintain
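A sketch of the worker and buffer knobs above as kamailio.cfg core settings; all values here are illustrative, not recommendations:

```
# kamailio.cfg core settings (illustrative values only)
children=16                # UDP worker processes per UDP listener
tcp_children=16            # TCP worker processes
tcp_max_connections=65536  # raise alongside file-descriptor limits
listen=udp:10.0.0.10:5060
listen=tcp:10.0.0.10:5060
```

Shared memory (the credential cache, presence, anti-flood state, dispatcher rules and ACLs) is sized at startup, e.g. `kamailio -m 4096 -M 16` for 4 GB of shared memory and 16 MB of per-process package memory; the flags are standard Kamailio options, the sizes are examples.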
Example of a 3-zone configuration
1 zone has HAProxy
Async example
kazoo_async_query
KAZOO_AUTHORIZATION_OK
KAZOO_AUTHORIZATION_ERROR
NO BLOCKING HERE
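A sketch of how the async flow above could look in kamailio.cfg, reusing the function and route names shown on the slide; the exact module API and argument order are assumptions here, so check the kazoo module documentation before copying this:

```
# Illustrative only: fire the AMQP query without blocking the SIP worker,
# then resume in the named routes when ecallmgr replies.
route[AUTH] {
    # kazoo_async_query and the two callback route names come from the
    # slide; the argument order shown is assumed, not verified.
    kazoo_async_query("authn_req", $var(payload),
                      "KAZOO_AUTHORIZATION_OK", "KAZOO_AUTHORIZATION_ERROR");
    # no blocking here: the worker is free to process other messages
}

route[KAZOO_AUTHORIZATION_OK]    { /* save("location"); reply 200 */ }
route[KAZOO_AUTHORIZATION_ERROR] { /* reply 403 */ }
```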
Separate servers for api / call process / utils
Package whapps independently to easily start new instances
Enough registrars
Enough registrar listeners
Handle the logs
Limits on lager / syslog
Balance the components of the Kazoo platform
Cheap load balancer
Rabbit
WebSockets
File descriptors
Limits
Balance to other zones, backup
Terminate SSL
Clustering
Granular statistics cost roughly 10%
Cluster policies, not for private queues
Limits on Rabbit
Thresholds at which Rabbit stops accepting (memory / disk alarms)
Shared queues need an HA policy
The federation plugin doesn't have granular support for what you can send downstream
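For the HA-policy point above, a sketch using rabbitmqctl; the policy name and queue-name pattern are placeholders, chosen narrow so the policy does not match private per-process queues, as the slide advises:

```shell
# Illustrative only: apply an HA (mirroring) policy to shared queues whose
# names match a pattern, leaving private per-process queues unmirrored.
rabbitmqctl set_policy ha-shared '^shared\.' \
    '{"ha-mode":"all"}' --apply-to queues
```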