In this talk I will share the experience we gained running microservices in a production K8S cluster. I will also outline the main problems our team ran into while diagnosing them and, most importantly, what we did to avoid them in the future. I will answer these questions: Why did we migrate to the cloud? Why did .NET Core 2.2 cause so many problems? This talk will save hundreds of hours for your developers and for your DevOps team, whose life can resemble a nightmare.
.NET Fest 2019. Leonid Molotiievskyi. DotNet Core in production
1. Talk topic
KYIV 2019
.Net Core in production
By Leonid Molotiievskyi
.NET CONFERENCE #1 IN UKRAINE
2. About me
• Hands-on software architect and technology consultant
• Good at splitting a monolith into microservices
• Built a huge enterprise financial solution from scratch
• Technical guy who believes that the right decisions about people matter more than technological ones
• Speaker and mentor
3.
Spoilers about what we are going to talk about
Agenda
Context overview
Environment that we used to live with
Scaling
How did we scale our services?
4.
Hell for the DevOps team
Are we solving the right problem?
Useful advice
Things that can help you resolve the problem
Lessons learned
How can we benefit in the future?
Q&A
Questions and answers
17.
Queues are growing… (part 2)
• A queue has a set of consumers
• Service A consumes a message
• Service A starts processing the message
• The consumer's health check fails due to high load on service A, a network issue, an OOM kill, etc.
• A duplicate message appears in the queue
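The slides don't say how the duplicates were eventually handled. A common mitigation for at-least-once delivery is an idempotent consumer that remembers which message ids it has already processed and skips repeats. The minimal Python sketch below assumes every message carries a unique `message_id` (a hypothetical field) and uses an in-memory set where a real system would need a shared, durable store:

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    message_id: str  # hypothetical: assumes the producer stamps each message with a unique id
    body: str


@dataclass
class IdempotentConsumer:
    """Performs the side effect for each message id at most once, even if the broker redelivers it."""
    processed_ids: set = field(default_factory=set)  # a real system needs a shared, durable store
    side_effects: list = field(default_factory=list)

    def handle(self, message: Message) -> bool:
        # A redelivered duplicate is skipped; the caller can still acknowledge it.
        if message.message_id in self.processed_ids:
            return False
        self.side_effects.append(message.body)  # stand-in for the real work
        self.processed_ids.add(message.message_id)
        return True


# Simulated at-least-once delivery: "m1" comes back because the first consumer
# was declared unhealthy before it acknowledged the message.
deliveries = [
    Message("m1", "charge order 42"),
    Message("m2", "charge order 43"),
    Message("m1", "charge order 42"),
]
consumer = IdempotentConsumer()
results = [consumer.handle(m) for m in deliveries]
print(results)                # [True, True, False]
print(consumer.side_effects)  # ['charge order 42', 'charge order 43']
```

The redelivered copy usually lands on a different instance of the service, so the set of processed ids has to live in storage shared by all consumers (for example, a table with a unique key on the id), and the check and the insert should be one atomic operation. The in-memory `set` only illustrates the control flow.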
18.
OOM Killed issue
• .NET Core 2.2 doesn’t respect Docker limits:
https://github.com/aspnet/AspNetCore/issues/3409
https://github.com/dotnet/coreclr/issues/18971
• “Server GC was designed with the assumption that the process using Server GC is the dominant process on the machine. By default it uses as many heaps as there are # of processors on the machine.”
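To see the mismatch described above from inside a container, you can compare the processor count the host reports with the cgroup limits the container actually runs under. The following is a minimal diagnostic sketch, assuming cgroup v1 mounted at /sys/fs/cgroup (cgroup v2 uses different files such as memory.max and cpu.max). `container_limits` and `read_int` are hypothetical helpers, not part of any .NET or Docker API:

```python
import os
from pathlib import Path
from typing import Optional


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a cgroup file, or None if it is missing or unparsable."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def container_limits(cgroup_root: str = "/sys/fs/cgroup") -> dict:
    """Compare the host CPU count with the cgroup v1 CPU and memory limits of this container."""
    root = Path(cgroup_root)
    memory_limit = read_int(root / "memory" / "memory.limit_in_bytes")
    quota = read_int(root / "cpu" / "cpu.cfs_quota_us")
    period = read_int(root / "cpu" / "cpu.cfs_period_us")
    # In cgroup v1 a quota of -1 means "no CPU limit".
    cpu_limit = quota / period if quota is not None and quota > 0 and period else None
    return {
        "host_cpus": os.cpu_count(),  # the machine-wide count, which ignores the cgroup quota
        "cpu_limit": cpu_limit,
        "memory_limit_bytes": memory_limit,
    }
```

On a many-core node with a container limited to, say, 1.5 CPUs, `host_cpus` will be far larger than `cpu_limit`. According to the quote above, Server GC sizes its heap count from the machine-wide processor count by default.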
19.
Let’s fix the issue by upgrading to .NET Core 3.0?
https://github.com/mongodb/mongo-csharp-driver/pull/372/files
21.
Docker: no space left on device
level=info msg="[8] System error: write /sys/fs/cgroup/docker/01f5670fbee1f6687f58f3a943b1e1bdaec2630197fa4da1b19cc3db7e3d3883/cgroup.procs: no space left on device"
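This error message is misleading. The ENOSPC comes from the kernel's virtual cgroup filesystem rather than from a full disk, so `df` can look perfectly healthy. The slide doesn't say which cgroup limit was hit. Assuming a cgroup v1 host, one cheap first check is how many cgroups exist per controller, since cgroups that are never released have been reported as a source of this error. The sketch below is a minimal parser for /proc/cgroups; `cgroup_counts` is a hypothetical helper:

```python
from typing import Dict


def cgroup_counts(proc_cgroups_text: str) -> Dict[str, int]:
    """Parse the text of /proc/cgroups into {controller name: number of cgroups}."""
    counts: Dict[str, int] = {}
    for line in proc_cgroups_text.splitlines():
        # Skip blank lines and the "#subsys_name hierarchy num_cgroups enabled" header.
        if not line.strip() or line.startswith("#"):
            continue
        name, _hierarchy, num_cgroups, _enabled = line.split()
        counts[name] = int(num_cgroups)
    return counts
```

Feed it the contents of /proc/cgroups on a worker node and track the `memory` and `cpuset` counts across deployments. This shows whether cgroups accumulate instead of being released when containers are removed.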
25.
What can help you find them?
Monitoring configured to track:
• Memory consumption
• CPU consumption
• Number of threads per worker node
• Number of open socket descriptors per node/pod
• Connection refused errors
• Correlation IDs in logs
• Number of messages in queues
• Number of consumers per queue
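Two of the metrics in that list, thread count and open socket descriptors, can be read directly from /proc on a Linux node. Below is a minimal sketch for a single process, assuming Linux and permission to read the target's /proc entries. `process_stats` is a hypothetical helper; in practice a monitoring agent would collect these values and aggregate them per pod and per node:

```python
import os
from pathlib import Path


def process_stats(pid: int) -> dict:
    """Count threads and open socket descriptors of a Linux process via /proc."""
    proc = Path("/proc") / str(pid)
    threads = 0
    for line in (proc / "status").read_text().splitlines():
        if line.startswith("Threads:"):
            threads = int(line.split()[1])
            break
    sockets = 0
    for fd in (proc / "fd").iterdir():
        try:
            # Socket descriptors are symlinks whose target looks like "socket:[12345]".
            if os.readlink(fd).startswith("socket:"):
                sockets += 1
        except OSError:
            continue  # the descriptor was closed while we were iterating
    return {"threads": threads, "sockets": sockets}
```

A steadily growing `sockets` value for one pod, combined with "connection refused" errors elsewhere, usually points to leaked connections rather than to a network problem.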
27.
Set up the environment so that…
• Infrastructure services have an HA setup
• At least two instances of each service are deployed
• Monitoring and alerting are in place
• “Temporary data” disappears after redeployment
• Nothing is configured manually
29.
Lessons learned
• The “Do it as simply as possible” principle doesn’t work. “Do it the smart way” does
• Think about application scaling from the beginning
• Know the open issues in your target framework
• Don’t blame the DevOps team; help them find the root cause