Science and data manipulation in Pharo

E
ESUGESUG
Science and data
manipulation in Pharo
Ferlicot-Delbecque Cyril | ESUG 2023
cyril@ferlicot.fr
Polymath, Pharo-AI and DataFrame
1
Summary
2
History
What is new?
Toward a new stage
History
3
Polymath
DataFrame
Pharo-AI
PolyMath
● Computation library for Pharo
● Similar to NumPy and SciPy in Python or SciRuby in Ruby
● Originally SciSmalltalk in Squeak
● Present in Pharo since a long time
4
DataFrame
5
Columns
Rows
Cells
DataFrame
● Table data structure
● Similar to DataFrame in Pandas, Julia, …
● Heavily used in data science
● Created in 2017 during GSoC
6
Pharo-AI
● Created in 2020
● Implements classical machine learning
algorithms (not deep learning)
○ K-Means, Linear Regression, N-Gram
Model, …
7
Harmony of communities
8
Polymath DataFrame
Pharo-AI
What is new?
9
Polymath: Modularization
10
● Rearchitecture
○ Extraction of data structures and random generators
○ Extraction of distributions in progress
● Cleaning of internal dependencies
Polymath
11
● Improvement of the CI robustness
● Align some conventions with Pharo-AI
● Divers cleanings and bug fixes
● Pharo 11 compatibility
Pharo-AI : Data manipulation
12
● Data partitioners : create tests sets
● Imputers : fill missing values
● Encoders : Standardize your datas
● Normalizer : Use common scales in your project
Pharo-AI
13
● Uniformization of projects
● Documentation
● Graph algorithms updates
● Divers speed up
● Cleaning and bug fixes in algos
DataFrame
14
● Speed up
● Better integration with other collections
● New visualizations based on DataFrames
● Integration with pharo-AI data preprocessing
DataFrame : GSoC 2023
15
● GSoC of Joshua Jose Dias Barreto
● Implementation of missing features
○ Better sorting
○ Data manipulation
○ Missing values management
○ …
DataFrame : GSoC 2023
16
Further improvements of DataFrame inspector
Toward a new stage
17
A push from the students
18
● DataFrame was started as a GSoC
● First AI algorithm were students projects
=> Data science interest more and more people
19
We are answering to this call
● Engineers are now pushing those projects
● Projects are maintained
● Speed is becoming correct
20
21
Are you using scientific
computing or data science?
Is the speed enough for you?
Are you encountering any
problem?
Are you missing features for
you projects?
Let us know ;)
1 de 21

Recomendados

Industrialiser sparkIndustrialiser spark
Industrialiser sparkLucien Fregosi
118 visualizações53 slides
Python in geospatial analysisPython in geospatial analysis
Python in geospatial analysisSakthivel R
536 visualizações19 slides
Glowing bear Glowing bear
Glowing bear thehyve
903 visualizações28 slides

Mais conteúdo relacionado

Similar a Science and data manipulation in Pharo

Dataframes Showdown (miniConf 2022)Dataframes Showdown (miniConf 2022)
Dataframes Showdown (miniConf 2022)8thLight
60 visualizações18 slides
Python mlPython ml
Python mlShubham Sharma
169 visualizações29 slides
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
7K visualizações126 slides
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadatamarkgrover
610 visualizações56 slides

Similar a Science and data manipulation in Pharo(20)

Dataframes Showdown (miniConf 2022)Dataframes Showdown (miniConf 2022)
Dataframes Showdown (miniConf 2022)
8thLight60 visualizações
Python mlPython ml
Python ml
Shubham Sharma169 visualizações
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
Databricks7K visualizações
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
markgrover610 visualizações
Data Science as ScaleData Science as Scale
Data Science as Scale
Conor B. Murphy204 visualizações
Future se oct15Future se oct15
Future se oct15
CS, NcState1.4K visualizações
MapReduce: Optimizations, Limitations, and Open IssuesMapReduce: Optimizations, Limitations, and Open Issues
MapReduce: Optimizations, Limitations, and Open Issues
Vasia Kalavri1.8K visualizações
2019 DSA 105 Introduction to Data Science Week 42019 DSA 105 Introduction to Data Science Week 4
2019 DSA 105 Introduction to Data Science Week 4
Ferdin Joe John Joseph PhD229 visualizações
Network Automation e-AcademyNetwork Automation e-Academy
Network Automation e-Academy
CSUC - Consorci de Serveis Universitaris de Catalunya91 visualizações
C cerin piv2017_cC cerin piv2017_c
C cerin piv2017_c
Bertrand Tavitian237 visualizações
The Power of Machine Learning and GraphsThe Power of Machine Learning and Graphs
The Power of Machine Learning and Graphs
Franz Inc. - AllegroGraph354 visualizações

Mais de ESUG(20)

Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in Pharo
ESUG85 visualizações
Garbage Collector TuningGarbage Collector Tuning
Garbage Collector Tuning
ESUG20 visualizações
thisContext in the DebuggerthisContext in the Debugger
thisContext in the Debugger
ESUG36 visualizações
Websockets for Fencing ScoreWebsockets for Fencing Score
Websockets for Fencing Score
ESUG18 visualizações
Advanced Object- Oriented Design MoocAdvanced Object- Oriented Design Mooc
Advanced Object- Oriented Design Mooc
ESUG85 visualizações
BioSmalltalkBioSmalltalk
BioSmalltalk
ESUG415 visualizações

Último(20)

Roadmap y Novedades de productoRoadmap y Novedades de producto
Roadmap y Novedades de producto
Neo4j5 visualizações
Best Mics For Your Live StreamingBest Mics For Your Live Streaming
Best Mics For Your Live Streaming
ontheflystream6 visualizações
Embedded RustEmbedded Rust
Embedded Rust
Jens Siebert512 visualizações
[PHPCon 2023] Blaski i ciebie BDD[PHPCon 2023] Blaski i ciebie BDD
[PHPCon 2023] Blaski i ciebie BDD
Mateusz Zalewski41 visualizações
Why Cloud Services.pdfWhy Cloud Services.pdf
Why Cloud Services.pdf
Pi DATACENTERS6 visualizações
ict act 1.pptxict act 1.pptx
ict act 1.pptx
sanjaniarun088 visualizações
Scala Left Fold Parallelisation- Three ApproachesScala Left Fold Parallelisation- Three Approaches
Scala Left Fold Parallelisation - Three Approaches
Philip Schwarz36 visualizações
BSides Lisbon 2023 - AI in Cybersecurity.pdfBSides Lisbon 2023 - AI in Cybersecurity.pdf
BSides Lisbon 2023 - AI in Cybersecurity.pdf
Tiago Henriques66 visualizações
Topic 1 What is Evolutionary Prototyping.pptxTopic 1 What is Evolutionary Prototyping.pptx
Topic 1 What is Evolutionary Prototyping.pptx
AHMADAIMAN777 visualizações
CASCADING STYLE SHEETS (CSS).pptxCASCADING STYLE SHEETS (CSS).pptx
CASCADING STYLE SHEETS (CSS).pptx
Asmr177 visualizações
Winter '24 Release Chat.pdfWinter '24 Release Chat.pdf
Winter '24 Release Chat.pdf
melbourneauuser7 visualizações
Cycleops - Automate deployments on top of bare metal.pptxCycleops - Automate deployments on top of bare metal.pptx
Cycleops - Automate deployments on top of bare metal.pptx
Thanassis Parathyras26 visualizações

Science and data manipulation in Pharo