Dafiti R&D, Semana Acadêmica do Centro de Tecnologia (SACT), UFSM 2019

  1. 1. UFSM SACT 01.04.2019
  2. 2. ❖ Head of R&D, Dafiti ❖ 17+ years IT stuff ❖ 1981 - 2011 in Germany ❖ 2011 Rocket Internet (locondo, lamoda, dafiti) ❖ Since 2011 in Brazil ➢ and in some way or another w/ Dafiti ❖ married, 2 sons ❖ Skype: georg.buske ❖ Drop me an email: georg.buske@dafiti.com.br whoami; Georg Buske
  3. 3. Lots of stuff, won’t stop at each slide for long Should be interactive - please ask anything during the presentation Disclaimer
  4. 4. Industry view Get to know Dafiti Lots of examples and showcases, about successes and failures Answer your questions Today’s Objective
  5. 5. About us / History
  6. 6. • Founded in 2011; • Offices in 4 countries; • 2,900 employees; • 5 warehouses in LATAM; • 50MM monthly users; • > R$ 1.4 billion gross revenue; • Part of Global Fashion Group since 2014; • Today ~120 people in IT (Brazil); • R&D area created in 01/2018; • DFTech: Dafiti’s tech brand Our history Dafiti & GFG
  7. 7. Dafiti & GFG - Global Fashion Group (GFG) - founded in 2014 - HQ London / Singapore / tech hub Vietnam - operates in 27 countries - joint initiatives
  8. 8. IT Organizational timeline ● 2010 - 2011 ○ project CTO / Rocket Internet ○ local dev team ○ Berlin dev team ● 2011 - 2012 (incl. first jira generation) ○ IT support ○ project teams ○ sprint team (each 1 Manager + coordinators) ○ backoffice team ○ infrastructure team ○ dedicated QA ○ outsourced developers ● 2012 - 2013 (incl. new jira generation) ○ as before with architecture team ● 2013 - 2014 ○ as before with module owners inside sprint and project teams (technical ownership) ○ NOC ● 2014 - 2015 (incl. new jira generation) ○ agile cells and committee of technical leaders (with POs and SMs) instead of project and sprint ○ lots of SAP consultants, backoffice team is now more part of global IT
  9. 9. IT Organizational timeline ● 2015 - 2016 ○ renamed architecture team to labs ○ dedicated UX / frontend team ○ removed NOC ○ removed outsourced developers ● 2016 - 2017 (incl. new jira generation) ○ squads and [explicit] cross-functional teams instead of agile cells ○ PO = PM (product manager) ○ removed scrum master ○ renamed labs to devtools ○ added maintenance team ● 2017 - 2019++ ○ PMs, POs, squads, pillars, SREs, prioritization committee and product funnel, R&D (from 2018) Learning 2: The approaches since 2014 are not that different, and any of them could have worked with the right focus and methodologies. Thus another approach might Learning 1: Whatever the next approach will be, we should fix the structure last (processes first).
  10. 10. “Culture eats strategy for breakfast” -- Peter Drucker
  11. 11. Organizational Design and communication structures. Conway’s Law: Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations. Brooks’s Law: Adding human resources to a late software project makes it later. Jeff’s 2 Pizza rule: If a team couldn’t be fed with two pizzas, it was too big.
  12. 12. Project timeline 2011 Relaunch (Magento -> Alice and Bob) 2012 - 2013 lots of minor projects 2014 SAP 2015 Marketplace 2016 TriKan Integration (also deployment change because of audit problem) ... Learning: There will never be the right time for a [technical and/or cultural] shift, and there will always be #blackfriday on the last Friday in November!
  13. 13. Legacy systems “Today’s implementation is tomorrow’s legacy” Dafiti’s system is an 8-year-old set of monolithic applications
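Brooks’s own explanation in The Mythical Man-Month is that intercommunication effort grows as n(n-1)/2 for n people, which is also why small "two-pizza" teams stay manageable. A minimal sketch of that arithmetic; the team sizes are purely illustrative:

```python
def communication_paths(team_size: int) -> int:
    """Distinct pairwise communication channels among n people: n * (n - 1) / 2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Illustrative team sizes only.
for n in (4, 8, 16):
    print(n, "people ->", communication_paths(n), "channels")
```

Doubling a team from 8 to 16 people takes it from 28 to 120 channels, more than four times as many.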
  14. 14. the
  15. 15. THE CONCEPT
  16. 16. #DfTechJourney - Concept ● Core Customer ● 21st century mindset (VUCA) ● Technology Heavy User ● Contributors Culture – Empowerment ● Space for experimentation ● Execution speed & Fail Fast (MVPs) ● BT - Business Technology ● Never leave Day 1 – Agile “Start-up” (by Jeff Bezos - Amazon) What are the main points of Exponential structure & BT?
  17. 17. OUR GOAL
  18. 18. Transform Dafiti's E-commerce into an Exponential Platform for our Customers, improving the user experience and applying the best technologies through a learning culture and continuous improvement.
  19. 19. OUR MANTRAS
  20. 20. THE JOURNEY
  21. 21. Wave 1 - Rice and Beans 6 months
  22. 22. Wave 2 - The place to be 1 Year
  23. 23. Wave 3 - F*cking Awesome 6 months
  24. 24. THEMES
  25. 25. Infrastructure Corporate IT & DC Information Security People & Culture Products SRE Governance Innovation & Intelligence Tech Stack D&A Platform Backoffice
  26. 26. It was a busy year People
  27. 27. HIERARCHY CHART CTO Cristiano Hyppolito; Head of Eng LATAM Rafael Morelo; Head of R&D LATAM Georg Buske; Manager of Eng LATAM Pablo Maronna; Head of Gov LATAM Leandro Lemes; Head of InfoSec LATAM Luis Gonçalvez; Head of BackOffice LATAM Adriana Ramos; Head of Infra LATAM Fabio Jacometto; country teams in Argentina, Chile and Colombia; Coord. Helpdesk CH & CO (TBD); Head of AGILE (TBD). Org structure: classical organigram, but in practice super flat. During 2019: 300 Astronauts in Brazil + Argentina + Chile + Colombia
  28. 28. #Dafiti Our purpose is to revolutionize the fashion ecosystem with intelligence. Our principles: - we put the customer at the center of everything - we never stop learning - we act with intelligence - we build the best teams - we trust and support each other - we work together for the common good Lots of achievements: Our purpose, our journey, our blackfriday! 4x the orders of a normal day, 324 orders / minute
  29. 29. ● Lots of new colleagues (third parties and full-time hires) ● company-wide agile rollout ● Ghostbusters (internal hackathon) ● intercontinental teams (AR + BR) ● lots of fun and beer (in fact, at least every Friday - cheers) ● consulting for agile, platform and more ● new platform to come ● new dashboards via live and many more... #DFTechJourney
  30. 30. R&D and Innovation Recap & Outlook Training for all
  31. 31. Safari as learning platform R&D and Innovation Recap & Outlook
  32. 32. Other departments want to be taught technical topics: Python, SQL, HTML, Big Data, Angular / React, Database Architecture / ETL, R (programming) DFTAcademy rollout R&D and Innovation Recap & Outlook
  33. 33. #DfTechJourney Trying new ways for talent acquisition in tech: hackerX and stackoverflow talent
  34. 34. #DfTechJourney
  35. 35. Workshops & Guilds
  36. 36. ● Machine Learning 101: regression (home prices prediction) ○ https://docs.google.com/presentation/d/1JAg382c9LMrdUm1lSvOfiTGTpWj9iEDKU1Saz9NAEPk ● Machine Learning 101: Image Understanding (Fashion-MNIST) ○ https://docs.google.com/presentation/d/122Pl6ej1x4JZVI1aN-Lawb6LlQ7gEOKT3C5x11L0EkA ● Machine Learning 101: Natural Language Processing (Rating and Reviews) ○ https://docs.google.com/presentation/d/1mC01GXDTByoRNtrPUdpxe1rWqlZ9u5EaxbJsM9Yl0rw ● Machine Learning 101: clustering (Dafiti brands) ○ https://drive.google.com/drive/folders/1XeHMBgh2Lx9LwJpX6Hunb2I0RgX5WgdQ?ogsrc=32 ● Machine Learning 101: Recommendation engine (Dafiti products) ○ https://drive.google.com/drive/folders/1hgf4NzOEE0ExRb0EFQ7XUpT8MqFivrfM?ogsrc=32 ● Python 101: ○ https://drive.google.com/drive/folders/1OHbNu8DBh3WecpY3jmJVQdd_tpyyaACs Internal workshops R&D and Innovation Recap & Outlook
  37. 37. Workshops delivered through DFT Academy and HR support 11/2018 and 12/2018 #DfTechJourney ML Workshops
  38. 38. Machine learning guild’s main objective in 2018: ● create internal workshops objective 2019: ● papers we love / journal club #DfTechJourney ML Guild
  39. 39. ● 25.07.2018 - definition and goals ● 29.08.2018 - DWH training Redshift https://docs.google.com/presentation/d/1muxuxnlBgG0GAF9RP9vFtYcNfEw5JWCydqaoEYT8VUY ● 19.09.2018 - Data catalog and internal system Hulk https://docs.google.com/presentation/d/1CscU8TcI-2YsJGJCJxiewEXS9o1qxCykOzJZ95SZd4w/edit?ts=5ba293fb&pli=1#slide=id.g3f4ca1ae3c_1_0 ● 10.10.2018 - internal system Nick ● 02.01.2019 - data security (TBD) Summaries: https://docs.google.com/document/d/1d9Edegl2iiLlH4Qa7PkROwb_5FyYd-e3GaYAVQpufgU/edit# R&D and Innovation Recap & Outlook Data Guild
  40. 40. Events
  41. 41. ● Agile trends (H1) ● Sponsoring papis.io (H1) ● Hosting pydata meetup ● Semacomp ● II Congresso Latino-Americano de IA ● Mediaeval ● Hosting deep learning meetup Events #DfTechJourney
  42. 42. papis.io sponsoring papis.io is a major conference about machine learning #DfTechJourney
  43. 43. Follow https://twitter.com/dafiti_tech or https://www.linkedin.com/company/dafiti/ and tweet "I <3 ML #papis #dafiti #ufsm" to enter the raffle for a papis LATAM 2019 ticket
  44. 44. Hosting pyData meetup #DfTechJourney
  45. 45. II Congresso IA LatAm R&D and Innovation Recap & Outlook
  46. 46. #DfTechJourney mediaEval, France
  47. 47. And more Agile trends 04/2018, Semacomp 10/2018, deep learning meetup in 12/2018 (TBD) o/ R&D and Innovation Recap & Outlook
  48. 48. Methodology
  49. 49. OKRs (a.k.a. objectives and key results)
  50. 50. R&D and Innovation Recap & Outlook Shared OKRs 1 12 5.001 ● company wide ● guarantees alignment and focus ● Strategic Objectives valid for 1 year ● KRs reviewed every 3 months ● regular team check-ins ● confidence index
  51. 51. Shared OKRs #DfTechJourney Still learning - MVP
  52. 52. #DfTechJourney ● Physical Kanban board with backlog ● Started with sprints and Jira board ○ continue to improve in 2019 with participation by agile masters ○ timebox: at least weeks ○ caution: extensive planning ● 2 - 2 - 2 ○ 2 days kick-off ○ 2 weeks demo ○ 2 months verifiable user-facing prototype Methodology CRISP-DM Various breakdowns on Kanban board
  53. 53. Product development workflow: Cone of Uncertainty (Predictability vs. Uncertainty), drawn as a sequence of initiatives. When? What? How? For what? Which? Why? Stakeholders: C-Level, Product Manager, Engineering Manager, Product Owner. Estimate multipliers: 4x, 2x, 1.25x, 0.8x, 0.5x, 0.25x. At this stage the team will be able to provide delivery predictability based on history.
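Read as three symmetric pairs (0.25x/4x, 0.5x/2x, 0.8x/1.25x, each pair multiplying to 1), the slide’s multipliers turn a single point estimate into a range that narrows as an initiative matures. A minimal sketch, assuming that pairing; the phase labels are hypothetical and not taken from the slide:

```python
# Multiplier pairs from the slide, read as (optimistic, pessimistic) factors.
# The pairing and the phase labels are illustrative assumptions.
CONE = [
    ("idea", 0.25, 4.0),
    ("defined scope", 0.5, 2.0),
    ("detailed design", 0.8, 1.25),
]


def estimate_ranges(point_estimate_weeks: float) -> list[tuple[str, float, float]]:
    """Turn one point estimate into (phase, low, high) bounds for each cone stage."""
    return [
        (phase, point_estimate_weeks * low, point_estimate_weeks * high)
        for phase, low, high in CONE
    ]


for phase, low, high in estimate_ranges(8):
    print(f"{phase}: {low:.1f} to {high:.1f} weeks")
```

An 8-week estimate made at the idea stage therefore spans 2 to 32 weeks, and only narrows to roughly 6.4 to 10 weeks once the design is detailed.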
  54. 54. Engineering
  55. 55. Product team split (pillars & squads): pillars Discovery, Payment & Order + MKTplace, and Post Sales, each with a platform team, a feature team and an SRE team (team counts per pillar: Platform 01/01, Feature 01/01, Infra 00/01; Platform 01/01, Feature 01/01, Infra 01/01; Platform 01/01, Feature 01/01, Infra 00/01). Each pillar has a PM, an engineering manager and various squads responsible for specific features, consisting of engineers and product owners, and supported (cross-functionally) by agile coaches, UX, data engineering, AI and infrastructure.
  56. 56. Architecture 2019 #DfTechJourney
  57. 57. A macro view of the technologies we will use...
  58. 58. Dafiti Maturity Model
  59. 59. ...AND DATA FUELED by our awesome D&A team :)
  60. 60. Data Lake DWH Reporting Sharing Data Load/ Export Orchestration Data Quality Scheduling Monitoring ETL Data Streaming Datamarts Feeds Named Queries Data Security D&A Tech Services
  61. 61. accengage adjust admotion appannie b2b b2w bingads bob campaign carmen criteo cubiscan dynad exacttarget exchange external fabric facebook financial fit freight google gotcha Dafiti Data Lake homer ino internal itunes king madruga marketing markovian netsuite osticket parallel price reception responsys sap seller solr supplier taboola tms wms yahoo zanox zendesk > 50 Different Sources > 160 Database schemas 8 TB distributed in 800k ORC / Parquet Files 7.5 TB in 6k Tables Huge Files When the files aren’t so big and we need to apply filters For more demanded data D&A Data Architecture
  62. 62. D&A Governance D&A Data Sharing Map Transactional Systems / External Sources BI Tools Operational Reports (based on 1 system) Data Feeds / Data Interfaces Operational Reports - “Heavy”, “Hard to run” reports (based on 2 or more systems) Tactical Reports / Dashboards External platforms Historical Data Data Mining / AI GFG BI Global Pricing Live Visenze Marketing Apps
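The storage figures on this slide support a quick back-of-the-envelope check. Reading "TB" as decimal terabytes (an assumption; the slide does not say TB or TiB), the lake averages about 10 MB per file and the DWH about 1.25 GB per table. Columnar formats such as ORC and Parquet are commonly sized in the tens to hundreds of megabytes per file, so an average this small may be worth watching (the classic "small files" problem).

```python
# Slide figures, with "TB" read as decimal terabytes (assumption).
TB = 10**12
lake_bytes, lake_files = 8 * TB, 800_000
dwh_bytes, dwh_tables = 7.5 * TB, 6_000

avg_file_mb = lake_bytes / lake_files / 10**6    # average data lake file size in MB
avg_table_gb = dwh_bytes / dwh_tables / 10**9    # average DWH table size in GB

print(f"average data lake file: {avg_file_mb:.1f} MB")
print(f"average DWH table: {avg_table_gb:.2f} GB")
```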
  63. 63. D&A Data Hub Pricing Commercial Planning Supply Chain Logistics Transportation Data Mining / AI Other Platforms GFG BI Global Pricing Google / Facebook Financial Processes Customer Service
  64. 64. the Now R&D and Innovation… Executive Summary
  65. 65. D&A / DWH R&D / D&A Ops / D&A R&D / Eng.
  66. 66. Team
  67. 67. R&D Team Will Marcio Ricardo (hiring) Georg Drop us an email: research-and-development@dafiti.com.br Rafael Albert
  68. 68. Partnerships & Startups
  69. 69. ● Visual conception ○ Visenze ○ Streamoid ○ Flashwall ○ Markable.ai ○ Syte ○ Flixstock R&D and Innovation Recap & Outlook Third party product integration (PoCs 2018) Understand needs, create assessment framework, search more possible third parties (benchmark or integration) use the startup ecosystem to create value for Dafiti! there are many startups pushing into chatbots and fashion (image similarity and catalog enrichment) but nobody is trying the hard stuff as a product (e.g. marketing budget allocation) ;-)
  70. 70. ● academic research group ● current status: paperwork ● works on AR and image understanding R&D and Innovation Recap & Outlook
  71. 71. R&D and Innovation Recap & Outlook ● Internships @ Dafiti ○ Still working on the contractual part but we are making this happen ○ feel free to send me an email if you have interest: georg.buske@dafiti.com.br Dafiti <3 UFSM
  72. 72. R&D and Innovation Recap & Outlook Innovation hubs ● Starting in 2019 create hubs in Brazil ● Work more closely with GFG (first calls with lamoda R&D, get back to innovation topics within GFG) ● Until 2020 assess possibilities in China and USA Use the startup ecosystem as multiplier - this is what an exponential platform means... If you participate in UFSM incubator and/or creating a startup disrupting fashion and/or commerce we want to hear from you :)
  73. 73. Vision & key takeaways
  74. 74. R&D Vision Purpose: Lead the revolution of fashion and shopping with AI and technological innovation. Mission: Give Dafiti the capacity to use state-of-the-art AI. Innovation and research needs alignment too! Innovation and Intelligence Committee
  75. 75. Innovating our fashion eCommerce and helping with the transformation into THE fashion platform in LATAM with the aid of innovative approaches such as machine learning, or artificial intelligence in general. E.g.: ● Building algorithms that help us with anticipatory shipping and purchasing forecasts and protect us against system failure. ● Using image recognition to give our users the highest possible convenience and coolest features. ● Using state-of-the-art game engines to build virtual reality into our customer experience. ● Optimizing product search and building data consistency monitoring. ● Helping build a large-scale architecture together with the entire IT team. ● Creating a machine learning framework / standard stack and rules (e.g. SageMaker, CI/CD, multi-cloud, experiments, tracking, etc.) The outcome will be nothing less than transforming the way e-commerce works and providing sustainable solutions. Mostly the team won’t work directly on user-facing products but will assess ways to create impact and work together with other areas to make them happen. R&D and Innovation Recap & Outlook
  76. 76. R&D and Innovation Recap & Outlook Key takeaways ● Events and techbranding are important to attract talent (team goal achieved - open positions filled, BIG THANKS TO OUR HRBPs <3) ○ OTOH we’ll reduce the number of indicators that make up the brand index [the old index is not in these slides, please refer to the R&D strategy docs for more information] ● Techbranding and internal workshops not only help foster DFTech as a brand and teach our internal workforce, but also create insights and identify problems and opportunities ● The plan is to start 2019 with 100 % alignment and a mixed model of internal workforce, consulting partnerships and third party providers ○ regular update and alignment meetings will be held in the form of an intelligence & innovation committee ○ using more rigid agile methods such as timeboxed sprints (incl. planning / review) to create more visibility and better alignment on results ● we will invest more into our ML standards and stack (as already started) 1 Innovation and research needs alignment, too! => Innovation and Intelligence Committee
  77. 77. R&D and Innovation Recap & Outlook Key takeaways ● To become a name in research we must invest more, and thus will start with 20 % time for this and will partner more with academic institutions ● We’ll reduce the number of area KPIs monitored to budget, people, PoCs realized, models launched in production, innovations launched (ideation will be part of this metric), third parties assessed, internal workshops given and techbrand initiatives (papers, articles, events, etc.) for now - KPI review is not in this presentation (please see strategy docs for the old area metrics list) ● Pricing optimization and marketing allocation projects haven’t brought the expected results yet ○ we may invest more in research ○ also there are many startups pushing into chatbots and fashion (image similarity and catalog enrichment) but nobody is trying the hard stuff as a product ;-) ● Investment in search, recommendations (looks, emails, onsite), catalog enrichment and image recognition might be the most important in 2019 2 Balance explore (PoCs) VS. exploit (production)!
  78. 78. R&D framework
  79. 79. R&D Framework How: Ideation (Dafiti) HOW ● Area or product wishlist ● ML guilds or 20 % research ● Design thinking workshops per area OUTPUT ● Wishlist backlog (now: Google Docs, future: open innovation portal) ● ideas, hypotheses. Prioritization (Committee) HOW ● Committee OUTPUT ● prioritized shortlist ● team composition (third party, R&D, interdisciplinary team, etc.). Detailing (R&D / areas) HOW ● Workshop per area together with R&D (regular schedule) OUTPUT ● ML Canvas (ML 101 workshop) ○ definition of success criteria and metrics ● Business canvas ● 6-pager. Identify collaboration / work type
  80. 80. R&D Framework How: PoC (R&D, area, third party) HOW ● Data curation ● Paper research ● Third party benchmark ● EDA OUTPUT ● Baseline model ● insights ● validated hypothesis ● GO/NO-GO. Implement (Agile squads, TBD; R&D, area, third party) HOW ● Development ● Test (A/B test) ● Refine until satisfied or aborted (validation with user) OUTPUT ● Success -> deployment, ops ○ API ○ end-to-end ● Failure -> fail wall. Finalization (Committee) HOW ● Retrospective (Committee) ● Operation (if success) OUTPUT ● Lessons learned / Insights. KR: 6, KR: 2
  81. 81. R&D Framework RULES ● 2 weeks ahead of committee meetings, the requirements of possible projects and their definitions (success metrics, ML canvas) need to be done / aligned with R&D ● no ideation during committee, only backlog discussion (exception: today) ● the area / product person responsible [optionally together with R&D] will present the detailed ideation item to the committee How
  82. 82. R&D Framework COLLABORATION / WORK TYPES ● PoC internal: The PoC execution is fully owned by R&D. ● Implementation: The implementation of the deployable live product is fully owned by R&D (either end-to-end or as an API). ● Coach: The implementation or PoC execution is owned by the area, and an R&D member supports the initiative as a coach. ● Workshop (not part of the framework): either as TechTalk-like workshops through DFTAcademy or deeper classroom trainings (certification), ML concepts will be taught to DFT employees (rather than solving a concrete business problem). How
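The Implement step validates changes with an A/B test. The slides do not say which statistic is used; a common choice for conversion-style metrics is the two-proportion z-test, sketched below with hypothetical counts:

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all or no conversions in both groups: nothing to test
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, normal approximation
    return z, p_value


# Hypothetical counts: 10% conversion in A vs. 15% in B, 1,000 users each.
z, p = two_proportion_z_test(conv_a=100, n_a=1_000, conv_b=150, n_b=1_000)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The normal approximation assumes reasonably large samples; for small counts an exact test is safer.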
  83. 83. Tools
  84. 84. Design Thinking workshops CONVERGENT DIVERGENT Understand Define Generate Ideas Decide Needs (People) Viability (Business) Possibility (Technology) Opportunities (Innovation) DESIGN THINKING ● with and by our awesome UX team
  85. 85. Machine Learning Canvas ● the canvas helps to understand the maturity of the project (in terms of data sources, value proposition, etc.) ● not everyone needs to understand every part, but having the canvas created and validated shows the project is ready to work on ● value proposition ○ if there is an overall business model canvas, the proportional value can be used for the ML task at hand ○ there must be a success metric ● For the ones eager to learn more: ○ New book draft (I will send it later on) ○ ML 101 workshop where we’ll discuss the ML canvas ○ more to come :-)
  86. 86. Machine Learning Canvas
  87. 87. Example: Customer service ● Develop intelligence / integration for services we already have: Chat BOT, Facebook Messenger / e-mail form, site / FAQ ● Develop intelligence / integration for channels we would like to have: BOT on Facebook, Instagram and Twitter timelines / Chat BOT shop / WhatsApp (online and offline), IVR (interactive voice response) ● In addition, a project developed at B2W that brought many gains in speed of service, quality and standardization of contacts was a Virtual Attendant which, besides passing on information, can execute actions (sending a duplicate boleto (bank payment slip), sending a duplicate nota fiscal (invoice), changing registration data, sending a password reset). Initial ideas BMC (optional) MLC Ideation / Detailing
  88. 88. Innovation and product launches
  89. 89. R&D and Innovation Recap & Outlook Image Similarity
  90. 90. R&D and Innovation Recap & Outlook
  91. 91. R&D and Innovation Recap & Outlook Return rate prediction
  92. 92. Approach: Given a purchased item, predict whether or not the item will be returned. DW -> Feature Extraction -> Model 1 / Model 2 -> Aggregator Model -> Return probability
  93. 93. Features: Around 100 features derived from the product, the customer and the transaction were used to train the models. Among them: - Delivery ZIP code (CEP) - Supplier - Brand - User account age - Time between order and delivery - Time since the customer’s last purchase - Net Total Value - Number of orders and returns observed for the customer up to that date - etc...
  94. 94. Aggregated model - Scores below 0.01: - 90% of non-returned items - Return rate of 0.02% - Scores above 0.5: - 53% of returned items - Return rate of 19%
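Customer-history features such as the time since the customer’s last purchase, or the number of orders and returns observed so far, must be computed only from what was known before the order being scored; otherwise the model sees the future during training. A minimal sketch of that point-in-time logic; the Order record and its field names are hypothetical, not Dafiti’s schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Order:  # hypothetical record layout
    customer_id: str
    order_date: date
    returned: bool  # outcome label; never read for the order being scored


def history_features(orders: list[Order], current: Order) -> dict:
    """Customer-history features built strictly from orders placed before `current`."""
    past = [
        o for o in orders
        if o.customer_id == current.customer_id and o.order_date < current.order_date
    ]
    last: Optional[date] = max((o.order_date for o in past), default=None)
    return {
        "n_prior_orders": len(past),
        "n_prior_returns": sum(o.returned for o in past),
        "days_since_last_purchase": (current.order_date - last).days if last is not None else None,
    }


history = [
    Order("c1", date(2018, 1, 10), True),
    Order("c1", date(2018, 3, 5), False),
    Order("c1", date(2018, 6, 1), False),  # the order being scored
]
features = history_features(history, history[2])
print(features)
```

One caveat the sketch glosses over: a past order’s return may not yet have been registered on the scoring date, so a production version would also need the date each return became known.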
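The two summaries on this slide answer different questions: what share of one class falls into a score band (coverage), and how often items in that band are actually returned (return rate). A minimal sketch that computes both from scores and outcomes; the thresholds match the slide, the toy data is made up:

```python
def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else float("nan")


def band_stats(scores, returned, low=0.01, high=0.5):
    """Coverage and observed return rate for the low- and high-score bands."""
    n_kept = sum(1 for r in returned if not r)
    n_returned = sum(1 for r in returned if r)
    low_band = [r for s, r in zip(scores, returned) if s < low]
    high_band = [r for s, r in zip(scores, returned) if s > high]
    return {
        # share of all non-returned items whose score is below `low`
        "kept_share_below_low": _ratio(sum(1 for r in low_band if not r), n_kept),
        # observed return rate among items scored below `low`
        "return_rate_below_low": _ratio(sum(low_band), len(low_band)),
        # share of all returned items whose score is above `high`
        "returned_share_above_high": _ratio(sum(high_band), n_returned),
        # observed return rate among items scored above `high`
        "return_rate_above_high": _ratio(sum(high_band), len(high_band)),
    }


scores = [0.001, 0.005, 0.2, 0.6, 0.7, 0.9]
returned = [False, False, False, True, False, True]
print(band_stats(scores, returned))
```

On the slide, items scored above 0.5 come back only 19% of the time, which suggests the aggregated score is better read as a ranking than as a calibrated probability.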
  95. 95. R&D and Innovation Recap & Outlook
  96. 96. R&D and Innovation Recap & Outlook ● a successful model for return rate prediction was created ● deployed via AWS SageMaker (part of ML standard) ● could be easily adapted for cancellation rate
  97. 97. Insights
  98. 98. ● During the return rate project we noticed that many of our business concerns involve Survival Analysis. ● Survival Analysis models situations in which there are discrete events that take some time to occur. ● Most of our problems fall into a less standard type of survival model called Cure Models. ● We are currently developing the capability of applying cure models to complex datasets for both insights and predictive modelling. ● This will allow us to attack return rates, cancellation rates, second purchase behavior, time-to-delivery, time-to-stock-replenishment and all sorts of time-to-X problems.
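In the standard mixture cure model, population survival is S(t) = π + (1 - π)·S_u(t), where π is the "cured" share that never experiences the event (here: items that are never returned) and S_u(t) is the survival curve of the susceptible share. Unlike an ordinary survival curve, S(t) levels off at π instead of falling to zero, which fits events that most items never have. A minimal sketch, assuming an exponential S_u(t) = exp(-λt); the parameters are hypothetical:

```python
import math


def cure_survival(t: float, cure_fraction: float, rate: float) -> float:
    """Mixture cure model survival: S(t) = pi + (1 - pi) * exp(-rate * t).

    cure_fraction (pi) is the share that never has the event, e.g. items never returned;
    the exponential latency for the susceptible share is an illustrative assumption.
    """
    if not 0.0 <= cure_fraction <= 1.0:
        raise ValueError("cure_fraction must be in [0, 1]")
    if t < 0 or rate <= 0:
        raise ValueError("t must be non-negative and rate positive")
    return cure_fraction + (1.0 - cure_fraction) * math.exp(-rate * t)


def prob_event_by(t: float, cure_fraction: float, rate: float) -> float:
    """Probability that the event (e.g. a return) has happened by time t."""
    return 1.0 - cure_survival(t, cure_fraction, rate)


# Hypothetical: 85% of items are never returned; the rest come back at ~0.1 per day.
for day in (7, 30, 365):
    print(day, round(prob_event_by(day, cure_fraction=0.85, rate=0.1), 4))
```

The plateau at 15% is what a standard survival model, whose curve eventually reaches zero, cannot express.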
  99. 99. A few return rate insights
  100. 100. A few return rate insights
  101. 101. A few return rate insights
  102. 102. A few return rate insights
  103. 103. A few return rate insights
  104. 104. PoCs (proof of concepts)
  105. 105. ● Search the look (H1) ● Search - S4 (H1) ● Categorization (catalog automatization) ● Causal impact and marketing budget allocation ● Size filters [external partner: bmind] ● Ratings and reviews ● Brand clustering ● Sales forecasting Blackfriday PoCs (proof of concepts) R&D and Innovation Recap & Outlook
  106. 106. R&D and Innovation Recap & Outlook Search S4
  107. 107. R&D and Innovation Recap & Outlook Search PoC (s4) as a fallback for datajet (because of previous outages) with advanced learning to rank and search optimization
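Learning-to-rank work is usually judged with a ranking metric. The slides do not name one, so as an assumption, here is NDCG@k, a common choice, computed from graded relevance labels for the results of a single query (the labels are hypothetical):

```python
import math


def dcg_at_k(relevances: list[int], k: int) -> float:
    """Discounted cumulative gain: (2^rel - 1) / log2(rank + 1), with 1-based rank."""
    return sum(
        (2**rel - 1) / math.log2(i + 2)  # i is 0-based, so rank + 1 == i + 2
        for i, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: list[int], k: int) -> float:
    """DCG of the served order divided by the DCG of the ideal (sorted) order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of the top results for one query, in the order the engine returned them.
served = [2, 3, 0, 1]
print(round(ndcg_at_k(served, k=4), 3))
```

A score of 1.0 means the engine already served results in the best possible order for that query.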
  108. 108. R&D and Innovation Recap & Outlook
  109. 109. R&D and Innovation Recap & Outlook ● Strategy 2019: work together on Search as a global product (datajet) ● learnings and advanced concepts from s4 will be applied to datajet
  110. 110. R&D and Innovation Recap & Outlook Sales forecasting Blackfriday
  111. 111. Blackfriday throughout the years Looking at sales from Thursday 00:00 to Sunday 23:59 in the years 2013 to 2018 there is a pattern that repeats every year:
  112. 112. Simulating revenue for 2018 based on 2017 Given the distribution of gross revenue per hour generated during Black Friday 2017, we could generate a revenue projection for 2018. The expected value for each hour was derived from the total revenue estimated by Live Sales, a system used at Dafiti that implements a moving-average type of calculation.
  113. 113. R&D and Innovation Recap & Outlook ● Success during Blackfriday ● Knowledge and models obtained are being applied to “General Sales Forecasting”: ○ awareness of cyclic sales behaviour in specific time windows ○ lag features ○ extraction and usage of Dafiti’s full sales history ○ how to deal with the data granularity ○ benchmarking GBM vs Neural Network While starting to work on pricing optimization we realized we first need sophisticated forecasting
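The projection described above amounts to reusing last year’s hourly shape: each hour’s share of the 2017 Black Friday revenue is multiplied by the 2018 total estimated by Live Sales. A minimal sketch with a made-up four-hour window (the real Thursday 00:00 to Sunday 23:59 window has 96 hours); the pace helper, which compares actual revenue so far against the projection, is a hypothetical addition:

```python
def hourly_shares(last_year_by_hour: list[float]) -> list[float]:
    """Each hour's share of last year's revenue in the Black Friday window."""
    total = sum(last_year_by_hour)
    return [value / total for value in last_year_by_hour]


def project(shares: list[float], estimated_total: float) -> list[float]:
    """Spread this year's estimated window total using last year's hourly shape."""
    return [share * estimated_total for share in shares]


def pace(actual_so_far: list[float], projected: list[float]) -> float:
    """Cumulative actual revenue divided by cumulative projection for the hours seen so far."""
    hours = len(actual_so_far)
    return sum(actual_so_far) / sum(projected[:hours])


revenue_2017 = [100.0, 300.0, 400.0, 200.0]  # toy numbers, not Dafiti data
projection_2018 = project(hourly_shares(revenue_2017), estimated_total=1_500.0)
print([round(v, 1) for v in projection_2018])
print(round(pace([160.0, 470.0], projection_2018), 3))
```

A pace above 1 means the day is running ahead of last year’s shape scaled to this year’s estimate.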
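Lag features expose earlier values of the series as model inputs. A minimal sketch over a daily series; the chosen lags are assumptions: 1 day, 1 week, and 364 days, which is 52 whole weeks and so lands on the same weekday a year earlier:

```python
def add_lag_features(daily_sales: list[float], lags: tuple[int, ...] = (1, 7, 364)) -> list[dict]:
    """One row per day with the sales from `lag` days earlier (None when not available)."""
    rows = []
    for t, value in enumerate(daily_sales):
        row = {"t": t, "sales": value}
        for lag in lags:
            row[f"lag_{lag}"] = daily_sales[t - lag] if t >= lag else None
        rows.append(row)
    return rows


rows = add_lag_features([float(v) for v in range(10)], lags=(1, 7))
print(rows[8])
```

Rows whose lags are still None (the start of the history) are typically dropped or imputed before training a GBM or neural network.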
  114. 114. R&D and Innovation Recap & Outlook Categorization (catalog automatization)
  115. 115. ● Its goal is to automate object identification from SKU images alone. ● ImageNet* has existed since 2010, and this task is considered largely mastered in computer science. ● Deep Learning models are the current state of the art for this task. ● We have enough data for big learning models: over 3 million images. ● We have the data (it needs some work) and we have the model! ● The data needs some adjustments, as catalog “mistakes are easy to find”. ● Also, the catalog trees in use have duplicates, and attributes are treated as categories; examples from name_tree3: ○ "Other", "Outras Roupas" (other clothes), "Outros" (others). ○ "Pijamas" (pajamas), "Pijamas e Camisetas" (pajamas and T-shirts). ○ "Polo Manga Curta" (short-sleeve polo), "Polo Manga Longa" (long-sleeve polo), "Polos". Catalog automatization
  116. 116. ● The trained model achieves these results for these catalog trees: name_tree1: 72,683 errors out of 681,244 SKUs (89.33 % accuracy); name_tree2: 136,793 errors out of 681,244 SKUs (79.92 % accuracy); name_tree3: 158,898 errors out of 681,244 SKUs (76.67 % accuracy). Catalog automatization
  117. 117. Catalog automatization
  118. 118. Catalog automatization
  119. 119. Catalog automatization
  120. 120. Catalog automatization
  121. 121. Catalog automatization
  122. 122. Catalog automatization What we suggest to achieve automation: 1. Data cleansing with the model’s insights and/or an enhanced categorization tree and attributes. 2. Train and validate the new model’s predictions. 3. Repeat 1 and 2 until satisfied. 4. Connect this API to the SKU registration steps. Next steps for catalog automatization and conclusion: ● high potential for catalog curation ● learnings from 2018 will be applied in the 2019 catalog cleanup
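Step 1, data cleansing with the model’s insights, can start from the cases where the model confidently disagrees with the current catalog label, since those are the likeliest catalog mistakes. A minimal sketch; the 0.9 threshold and the toy labels (borrowed from the name_tree3 examples) are assumptions:

```python
def review_candidates(catalog_labels, predicted_labels, confidences, threshold=0.9):
    """Indices where the model disagrees with the catalog with at least `threshold` confidence.

    These are candidates for manual review, not automatic relabels.
    """
    return [
        i
        for i, (given, predicted, confidence) in enumerate(
            zip(catalog_labels, predicted_labels, confidences)
        )
        if predicted != given and confidence >= threshold
    ]


catalog = ["Polos", "Pijamas", "Polo Manga Curta", "Outros"]
predicted = ["Polo Manga Curta", "Pijamas", "Polo Manga Curta", "Outras Roupas"]
confidence = [0.97, 0.99, 0.95, 0.55]
print(review_candidates(catalog, predicted, confidence))
```

After the flagged items are fixed, retraining (step 2) and re-running this check (step 3) closes the loop.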
  123. 123. R&D and Innovation Recap & Outlook Ratings & Reviews
  124. 124. ● The goal is to automate the approval of reviews. ● Started as preparation of slides for a congress -> became part of the hackathon -> was incorporated into the ML 101 workshops -> results aligned with business ● We have the data (it also needs some work) and we have the model! ● The data needs some adjustments: ○ Is there a defined policy for approval/rejection of reviews? ○ Is historical data accurate enough for what the company wants for the future? ○ Does the company want more insights from reviews?* Ratings & Reviews
  125. 125. Ratings & Reviews Historical data: reviews_approved.csv 519,463; reviews_rejected.csv 81,598. Test data (15%) results (classes: approved / rejected): total manually evaluated reviews 601,061; model’s errors 57,704; accuracy 90.39 %; F1-score 88 %
  126. 126. model’s confidence / text: 0.916 "The quality isn’t that good. For the price I expected more" 0.968 "Very good, beautiful." 0.663 "I can’t complete the purchase" 0.589 "The pants are small, I’m 1.63 m tall and they ended up halfway down my legs, hated it. please refund me the amount paid." 0.773 "It peeled on the first day of use. Disappointed..." 0.878 "I received the sneakers a week ago; the first time my son wore them and I went to clean them, they faded. No quality" 0.869 "Lola lp.k" 0.917 "Product" in store: REJECTED, model’s prediction: APPROVED Ratings & Reviews Historical data
  127. 127. model’s confidence / text: 0.731 "I’d like to know when size 34 will be available?" 0.973 "very satisfied with the product. Great finish and very good value for the money. Fits my shoe-size perfectly" 0.549 "i WOULD LIKE TO KNOW IF YOU HAVE THIS SHOE IN NAVY BLUE!! THANK YOU MARIA LUCIA" 0.700 "hiiiiiiiiii i wanted this pretty sandal pls come give it to me byeee" 0.527 "I want to know if I can exchange the size if it doesn’t fit .." 0.521 "it’s very pretty but I live in mozambique and would like you to open a store here in maputo, the capital of mozambique." 0.691 "i’m dying please tell me something please" in store: APPROVED, model’s prediction: REJECTED Ratings & Reviews Historical data
  128. 128. Ratings & Reviews Pending: reviews_pending.csv, 219,414 pending reviews; approved 194,173 (88.5 %), rejected 25,241 (11.5 %). Most confident cases: APPROVED (confidence 0.999): "Disappointment. Very thin and rough fabric, it feels like sandpaper." REJECTED (confidence 0.999): "I really liked the sandal, super comfortable, but I’m already sending it back because the inside got completely scuffed after two uses. I’ve already sent photos, I’ll be sending it back to dafiti tomorrow."
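Because approvals dominate the historical data, accuracy should be read against a trivial baseline that approves everything. A quick check with the slide’s counts, assuming the test split keeps roughly the same class ratio as the full set:

```python
approved, rejected = 519_463, 81_598
total = approved + rejected
majority_baseline = approved / total  # accuracy of always predicting "approved"
print(total, f"{majority_baseline:.2%}")
```

So the reported 90.39 % accuracy is about 4 points above always approving; the F1-score, and especially recall on the rejected class, says more about how useful the model is for moderation.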
  129. 129. Ratings & Reviews Pending
  130. 130. What we suggest to achieve automation: 1. Data cleansing with the model’s insights. 2. Train and validate the new model’s predictions. 3. Repeat 1 and 2 until satisfied. 4. Connect this API to the rating and review validation steps. New project: extract insights and information directly from user reviews; possibilities to explore: a. brand and product alerts on user problems (quality, fit, ...) b. detect reviews that are customer-support related c. sentiment analysis Ratings & Reviews Pending Owner? Decision? => committee
  131. 131. R&D and Innovation Recap & Outlook Causal Impact and budget allocation
  132. 132. R&D and Innovation Recap & Outlook Hold-Out Tests ● Processing of structured time series ● Production test on the “google non-brand SEM” channel ● Statistical confirmation of the channel’s representative value ● Algorithms built in Python
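A hold-out test compares what happened after switching the channel off with a counterfactual of what would have happened otherwise. The R causal impact package builds that counterfactual with a Bayesian structural time-series model; the sketch below is a deliberately simplified stand-in that regresses the affected series on a single control series (assumed unaffected by the hold-out) over the pre-period and projects it forward. Series and numbers are hypothetical:

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x (one control series)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def estimate_effect(control, treated, intervention_idx):
    """Fit on the pre-period, predict the post-period, return (total, pointwise) effects."""
    a, b = fit_line(control[:intervention_idx], treated[:intervention_idx])
    counterfactual = [a + b * c for c in control[intervention_idx:]]
    pointwise = [t - p for t, p in zip(treated[intervention_idx:], counterfactual)]
    return sum(pointwise), pointwise


# Toy daily series: the treated metric tracks the control until the hold-out starts at index 4.
control = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
treated = [25.0, 29.0, 27.0, 31.0, 26.0, 30.0]
total, pointwise = estimate_effect(control, treated, intervention_idx=4)
print(total, pointwise)
```

A negative total during a hold-out is the channel’s estimated contribution. Unlike CausalImpact, this sketch gives no credible interval, so on its own it cannot provide the statistical confirmation the slide mentions.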
  133. 133. R&D and Innovation Recap & Outlook Hackathon - Marketing Budget Allocation: ● Time series and non-linear optimization ● Minimization of “CIR” (1 / ROI) ● Algorithm makes resource allocation suggestions to optimize CIR
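With a fixed total budget, minimizing CIR (read here as spend / revenue) is the same as maximizing revenue. If each channel has diminishing returns, a simple greedy rule works: give each budget increment to the channel with the largest marginal revenue. A minimal sketch; the logarithmic response curves, channel names and parameters are hypothetical, and the slides do not describe the hackathon’s actual time-series and optimization model:

```python
import math


def revenue(spend: float, scale: float, saturation: float) -> float:
    """Hypothetical concave response curve: scale * ln(1 + spend / saturation)."""
    return scale * math.log1p(spend / saturation)


def allocate(total_budget: float, channels: dict[str, tuple[float, float]], step: float = 1_000.0):
    """Greedily assign budget in `step` increments to the channel with the best marginal revenue.

    For concave curves this is optimal among allocations in whole `step` increments;
    any remainder smaller than `step` is left unspent. Returns (allocation, CIR).
    """
    spend = {name: 0.0 for name in channels}

    def marginal(name: str) -> float:
        scale, saturation = channels[name]
        return revenue(spend[name] + step, scale, saturation) - revenue(spend[name], scale, saturation)

    for _ in range(int(total_budget // step)):
        best = max(channels, key=marginal)
        spend[best] += step

    total_revenue = sum(revenue(spend[n], *channels[n]) for n in channels)
    total_spend = sum(spend.values())
    cir = total_spend / total_revenue if total_revenue > 0 else float("inf")
    return spend, cir


channels = {"google_non_brand_sem": (50_000.0, 10_000.0), "display": (20_000.0, 5_000.0)}
allocation, cir = allocate(20_000.0, channels)
print(allocation, round(cir, 3))
```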
  134. 134. R&D and Innovation Recap & Outlook
  135. 135. R&D and Innovation Recap & Outlook Results: ● Open-sourced Python port of the R causal impact package ● A hackathon can create good insights and kick things off BUT might create a false sense of success ● Learned that GA data is not complete ● Optimization TBD
  136. 136. R&D and Innovation Recap & Outlook Brand clustering
  137. 137. Brand Clustering Analysis ● The goal is to bring marketing insights on how users interact with brands, and to reduce the brand dimension ● We used Google Analytics (GA) actions from 2 days on the Dafiti website: 640,235 cookie sessions interacting with 276,923 SKUs from 4,825 brands. ● Top 4 most-interacted brands: ○ [('Colcci', 54948), ('Vizzano', 42715), ('Santa Lolla', 41401), ('Moleca', 34054)] ● Top 4 GA scores: ○ [('Beautiful Lingerie', 13.7532), ('Philco', 12.5799), ('#Euqfiz', 11.2241), ('Kmc', 9.1530)]
  138. 138. Brand Clustering Analysis
  139. 139. Brand Clustering Analysis How are Dafiti brands related to others? ● 9 brands cluster -> ['Armadillo', 'DAFITI UNIQUE', 'Ki-fofo', 'Lua Luá', 'Mania De Moça', 'Meketrefe', 'Penguin', 'Red Life', 'Styll Baby'] ● 6 brands cluster -> ['Cavage', 'DAFITI JOY', 'La Beauté Cosmétiques', 'Miu Miu', 'Montain Boot', 'Refuse'] ● 10 brands cluster -> ['DAFITI ACCESSORIES', 'Enox', 'Khatto', 'Paul Ryan', 'Prorider', 'Secret', 'Sunnies', 'THOMASTON', 'Terra e Agua', 'Tilit'] ● 582 brands cluster -> ['...Lost', '100% Marca Própria', '3 Sprouts', ..., 'DAFITI I.D.', 'DC Original', 'DGK', 'DKNY', ...,'Sex and the City Cosmetics', ...,'Shoes Shoes', ...,'You Rock', 'Zebu', 'Zenit', 'Ziva'] ● 71 brands cluster -> ['Alta Villa Shoes', 'Asics', 'Ausländer', 'Beautiful Lingerie', 'Botswana', 'Bracciale Acessórios', 'Bull Motors', 'CZ Brand', 'Calcifran', 'Cisco', 'Columbia', 'Crocs', 'DAFITI EDGE', 'Dangelis Moda Íntima', ...,'Won Sports', 'Yardley', 'adidas', 'adidas Originals', 'adidas Performance', 'test', 'zeus'] ● 534 brands cluster -> ['24 Horas Calçados', ..., 'Bvlgari', ...,'Café Brasil', ...,'Cravo & Canela', ..., 'DAFITI', 'DAFITI SHOES', ...,'GUESS Kids', ...,'Harley-Davidson Footwear', ...,'Moleca', ...,'Santa Lolla', ...,'Tiffany & Co.', ...,'VIA UNO', ...,'Vizzano',...]
  140. 140. The resulting clustering does not help much with marketing insights directly. Some changes are needed to provide direct business value: 1. Consider the problem as a recommendation task. 2. Implement changes to the “Marreco” system to provide an analysis of brand interactions. Brand clustering: the 3 brands most similar to each Dafiti brand and their similarity score (cosine): ● DAFITI I.D. - [('D-Tox', 0.3293), ('Monte Carlo Polo Club', 0.1646), ('Drop Life', 0.0856)] ● DAFITI SHOES - [('Moleca', 0.2111), ('Ana Cristina', 0.1580), ('Vizzano', 0.1565)] ● DAFITI EDGE - [('FKN', 0.0718), ('Lemon Grove', 0.0665), ('Yachtsman', 0.0366)] ● DAFITI - [('Ride Skateboard', 0.0666), ('Santa Maria', 0.0591), ('Snoopy', 0.0524)] ● DAFITI ACCESSORIES - [('Vila Flor', 0.0455), ('Prorider', 0.0435), ('Flyca Girls', 0.0334)] ● DAFITI UNIQUE - [('Shoulder', 0.0246), ('Energia', 0.0231), ('DAFITI ONTREND', 0.0166)]
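The similarity scores above are cosine similarities. One simple way to obtain them from GA data, assumed here rather than taken from the slides, is to represent each brand as a vector of interaction counts per session and compare those vectors. Brand names below come from the slides; the interactions are made up:

```python
import math
from collections import Counter, defaultdict


def brand_vectors(interactions):
    """Map each brand to a sparse vector of interaction counts keyed by session id."""
    vectors = defaultdict(Counter)
    for session_id, brand in interactions:
        vectors[brand][session_id] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def most_similar(brand, vectors, k=3):
    """The k brands whose session vectors are closest to `brand` by cosine similarity."""
    target = vectors.get(brand, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != brand]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up (session, brand) interactions using brand names from the slides.
interactions = [
    ("s1", "Moleca"), ("s1", "Vizzano"),
    ("s2", "Moleca"), ("s2", "Vizzano"),
    ("s3", "Moleca"), ("s3", "Colcci"),
    ("s4", "Colcci"),
]
vectors = brand_vectors(interactions)
print(most_similar("Moleca", vectors))
```

Treating these scores as item-to-item recommendations is one way to follow the slide’s first suggestion of framing the problem as a recommendation task.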
  141. 141. R&D and Innovation Recap & Outlook Filter cleanup analysis (sizes)
  142. 142. R&D and Innovation Recap & Outlook Sizes children’s clothes
  143. 143. Internal workshops R&D and Innovation Recap & Outlook
  144. 144. R&D and Innovation Recap & Outlook
  145. 145. R&D and Innovation Recap & Outlook Conclusion: ● we leveraged third party knowledge (consulting) to do the analysis ● a few [marketplace] products are creating a very bad user experience ● there are some potential quick wins ● we need to agree on the best approach in terms of architecture (the first idea, a DB update, might not be ideal) - product development support? ● What can we already fix in the registration process?
  146. 146. Wishlist / Backlog
  147. 147. AI awareness Sales forecasting (train new model with learnings from blackfriday forecasting) Price optimization ng and Buying Marketing allocation Cancellation rate Email click prediction (recipient selection, Markovian) Customer segmentation / user profiles Online recommendations Search Reinforcement learning Survival analysis Email recommendations (jetlore competition) Image similarity Delivery prediction Delivery visualization Image segmentation Anticipatory shipping Intelligent Sizing NLP for chatbots and sentiment analysis Looks, Image understanding and shoppable videos VR personalized discounts R&D and Innovation Recap & Outlook
  148. 148. Thank You!