
Truecaller towards a data-driven company

Marek Wiewiórka and Tomasz Żukowski had the pleasure of giving a presentation at Data Science Summit 2017, which took place in Warsaw on May 26th. They presented one of the most popular startups in Sweden: Truecaller.
Please take a look at the presentation.

Published in: Technology

  1. Truecaller towards a data-driven company. Marek Wiewiórka, Tomasz Żukowski
  2. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
     Agenda
       1. Truecaller - a global phonebook
       2. Evolution of the company’s data architecture
       3. Data as a company asset
  3. Truecaller
     ■ World's largest mobile phone community (> 250 million users)
  4. Truecaller In Numbers
     ■ +6 billion application events daily
     ■ +3 TB of compressed user-generated data daily
     ■ +65M active users and 250k application installations daily
     ■ +28M identified spam calls every day
     ■ ...
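
To put the daily figures on slide 4 into per-second terms, a quick back-of-envelope conversion can help. The sketch below assumes the load is spread evenly over 24 hours and that 1 TB means 10^12 bytes; both are simplifying assumptions, and the variable and function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Lower bounds quoted on the slide ("+6 billion", "+3 TB").
events_per_day = 6_000_000_000
compressed_bytes_per_day = 3 * 10**12  # assumption: decimal terabytes


def per_second(daily_total: float) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return daily_total / SECONDS_PER_DAY


events_per_second = per_second(events_per_day)
bytes_per_second = per_second(compressed_bytes_per_day)

print(f"{events_per_second:,.0f} events/s on average")
print(f"{bytes_per_second / 1e6:.1f} MB/s of compressed data on average")
```

Real traffic is rarely flat, so peak rates would be higher than these averages.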
  5. Not Truly Data-Driven Beginnings...
     ■ Data layer and analytics powered by MySQL databases
     ■ No separation of OLTP and OLAP domains
     ■ Daily ETL processes that used to take longer than one day ;)
     ■ Problems with storing and querying historical (cold) data
     ■ Basic reporting with no possibility of doing real data science
     ■ Almost no DWH design principles in place...
  6. Towards Truly Scalable Data Architecture (diagram labels: DWH, Data ingestion, Schema repo)
  7. Towards Truly Scalable Data Architecture
     ■ Both the data ingestion and the data storage/analytics layers are horizontally scalable
     ■ High availability for both master and worker nodes
     ■ Apache Avro, with its schema evolution features and a centralized schema repository, makes adding new event types seamless for ETL processes
     ■ Clear separation of staging data (raw, in Avro format) and reporting data (cleaned and enriched, in ORC format)
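
The schema evolution point on slide 7 can be illustrated with a toy example. The sketch below does not use an Avro library; it is a stdlib-only imitation of one Avro resolution rule for records: a field that the reader schema declares with a default can be missing from older data, and a field unknown to the reader is ignored. The `CallEvent` schema, its fields, and the `resolve` helper are hypothetical and are not taken from the presentation.

```python
from typing import Any

# Version 1 of a hypothetical "call" event schema (all names are made up).
SCHEMA_V1 = {
    "name": "CallEvent",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "duration_s", "type": "int"},
    ],
}

# Version 2 adds a field WITH a default, so records written with V1 stay readable.
SCHEMA_V2 = {
    "name": "CallEvent",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "duration_s", "type": "int"},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}

_MISSING = object()


def resolve(record: dict[str, Any], reader_schema: dict[str, Any]) -> dict[str, Any]:
    """Project an already-decoded record onto the reader schema.

    Fields absent from the record take the reader's default, fields the
    reader does not declare are dropped, and a missing field without a
    default is an error.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        value = record.get(name, _MISSING)
        if value is _MISSING:
            if "default" not in field:
                raise ValueError(f"no value and no default for field {name!r}")
            value = field["default"]
        resolved[name] = value
    return resolved


old_record = {"user_id": 42, "duration_s": 17}                 # written with V1
new_record = {"user_id": 7, "duration_s": 3, "country": "SE"}  # written with V2

print(resolve(old_record, SCHEMA_V2))  # old data read by a newer consumer
print(resolve(new_record, SCHEMA_V1))  # new data read by an older consumer
```

In a real pipeline the schemas would presumably be fetched from the central schema repository and the decoding done by an Avro library; the point here is only why a new field with a default does not break existing ETL jobs.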
  8. Towards Truly Self-Service Analytics
  9. Truly Analytical Tools
  10. Jupyter Notebooks
  11. What Can We Do With These Data?
     ■ Calculate spammer score
  12. What Can We Do With These Data?
     ■ Calculate spammer score
     ■ Visualize our business
  13. What Can We Do With These Data?
     ■ Calculate spammer score
     ■ Visualize our business
     ■ Monitor KPIs after upgrades
  14. What Can We Do With These Data?
     ■ Calculate spammer score
     ■ Visualize our business
     ■ Monitor KPIs after upgrades
     ■ Better target ads
  15. What Can We Do With These Data?
     ■ Calculate spammer score
     ■ Visualize our business
     ■ Monitor KPIs after upgrades
     ■ Better target ads
     ■ Detect fraudulent user behaviour
  16. LTV - how to calculate
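
Only the title of slide 16 survives in this transcript, so the formula the authors actually used for LTV (lifetime value) is unknown. For orientation, one common textbook simplification divides the average revenue per user per period by the per-period churn rate, since a user with a constant churn rate stays for 1/churn periods on average. The function name and the input values below are made up for illustration.

```python
def simple_ltv(arpu_per_month: float, monthly_churn: float) -> float:
    """Textbook LTV approximation: monthly ARPU times expected lifetime.

    With a constant monthly churn rate, the expected lifetime of a user
    is 1 / monthly_churn months.
    """
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in the interval (0, 1]")
    return arpu_per_month / monthly_churn


# Hypothetical inputs: 0.50 currency units per user per month, 5% monthly churn.
print(f"Approximate LTV: {simple_ltv(0.50, 0.05):.2f}")
```

This ignores discounting, acquisition cost, and churn that varies with user age, all of which a production model would likely need to handle.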
  17. Market Share Estimation
  18. Is Brexit Truly a Problem?
     ■ Calculated on anonymized data of 200k users in the UK
     ■ Analysis prepared just after the Brexit referendum
  19. Is Brexit Truly a Problem?
  21. What’s next
     ■ More digging into data (a lot of areas not even touched yet)
     ■ More advanced modelling
     ■ Streaming analytics
  22. ?
