There are great open-source technologies for NLP (NLTK), machine learning (gensim, scikit-learn), and distributed computing (Spark). So don't shy away from big ideas, and make use of the amazing technologies at your fingertips!
DevTalks Cluj - Open-Source Technologies for Analyzing Text
1. An open source tech stack to analyze all reviews on the Internet
Steffen Wenz, CTO TrustYou
steffen@trustyou.net
2. ✓ Very good hotel!*
✓ Near city centre
“Close to the city center”
✓ Clean rooms
« Chambre impeccable »
✓ Popular with solo travelers
“Remote doesnt work”
*) Ramada Cluj (Full summary)
8. Big Data & Open Source
2004
MapReduce, GFS
BigTable, Spanner, F1
…
Apache Beam …
9. Spark
● User writes driver program which transparently schedules execution in a cluster
● Faster and more expressive than MapReduce
● Spark SQL: Interactive query of large datasets
● Spark Streaming: Spark is “batch first”, but fast enough to implement stream processing with “mini batches”
● Spark MLlib: Machine learning
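The “mini batches” idea can be sketched without a cluster. The following is a plain-Python illustration, not Spark code and not the Spark Streaming API: the function names (`mini_batches`, `count_words`, `process_stream`) and the sample reviews are assumptions made up for this sketch. It shows an ordinary batch job (a word count) being applied to small slices of a stream, with results merged into running totals.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def mini_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Cut a (possibly unbounded) stream into small lists of records."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_words(batch: List[str]) -> Counter:
    """An ordinary batch job: word counts over a finite list of reviews."""
    counts: Counter = Counter()
    for review in batch:
        counts.update(review.lower().split())
    return counts


def process_stream(stream: Iterable[str], batch_size: int = 2) -> Counter:
    """Run the batch job once per mini batch and merge into running totals."""
    totals: Counter = Counter()
    for batch in mini_batches(stream, batch_size):
        totals.update(count_words(batch))
    return totals


# Hypothetical sample data standing in for an incoming stream of reviews.
reviews = [
    "Clean rooms",
    "Near city centre",
    "Very clean rooms",
    "Close to the city center",
    "Clean",
]
totals = process_stream(reviews, batch_size=2)
```

In a real deployment the batch size would typically be a time interval rather than a record count, and each mini batch would be distributed across the cluster; this sketch only shows why a fast batch engine can double as a stream processor.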
10. Luigi
● Build complex pipelines of batch jobs
○ Dependency resolution
○ Parallelism
○ Resume failed jobs
● Some support for Hadoop
● Pythonic replacement for Oozie
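To make “dependency resolution” and “resume failed jobs” concrete, here is a toy sketch in plain Python. It is not Luigi's API: the `Task` class, the `run` function, and the task names (`crawl`, `extract`, `train`) are hypothetical, and the `done` set merely stands in for the completion markers a real pipeline tool would track. Parallelism and cycle detection are left out.

```python
from typing import Callable, Dict, List, Set


class Task:
    """A named job with upstream dependencies (hypothetical, not luigi.Task)."""

    def __init__(self, name: str, requires: List[str], action: Callable[[], None]):
        self.name = name
        self.requires = requires
        self.action = action


def run(name: str, tasks: Dict[str, Task], done: Set[str], log: List[str]) -> None:
    """Run dependencies first (depth-first) and skip tasks already complete."""
    if name in done:
        return
    task = tasks[name]
    for dep in task.requires:
        run(dep, tasks, done, log)
    task.action()  # may raise; in that case the task is not marked complete
    done.add(name)
    log.append(name)


attempts = {"train": 0}


def flaky_train() -> None:
    """Fails on the first attempt to simulate a crashed job."""
    attempts["train"] += 1
    if attempts["train"] == 1:
        raise RuntimeError("simulated failure")


tasks = {
    "crawl": Task("crawl", [], lambda: None),
    "extract": Task("extract", ["crawl"], lambda: None),
    "train": Task("train", ["extract"], flaky_train),
}

done: Set[str] = set()

# First run: upstream tasks succeed, the final task fails.
first_run: List[str] = []
try:
    run("train", tasks, done, first_run)
except RuntimeError:
    pass

# Second run: completed tasks are skipped, only the failed one is retried.
second_run: List[str] = []
run("train", tasks, done, second_run)
```

Asking for the last task is enough to pull in everything upstream, and a rerun after a failure repeats only the work that did not finish.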
11. Try it out!
GitHub repo showcasing:
● Luigi
● Scrapy
● Word2Vec model training with gensim
@ https://github.com/trustyou/meetups