Min Xu
1 Bayard Rd, Apt 61, Pittsburgh, PA 15213
Mobile: 412-230-7574 E-mail: xumin9096@gmail.com
Objective
To obtain an engineering position in software development or data science
Education
Ph.D. Candidate, Electrical & Computer Engineering, Carnegie Mellon University (CMU), GPA: 3.89/4.0, AUG 2012 – PRESENT
Project: Design, Modeling, Implementation and Analysis of Emerging Reconfigurable RF/Memory Devices; Advisor: Prof. James A. Bain
B.S., Electrical & Computer Engineering, Huazhong University of Sci. and Tech. (HUST), GPA: 89/100, SEP 2008 – JUN 2012
Professional Experience
Data Scientist Intern, Entropy Technology (Startup), AUG 2016 – PRESENT
Integrate data from web sources and develop information retrieval algorithms for a production search engine
Skills
Programming languages: Java, Python, C, MATLAB, SystemVerilog, LabVIEW, Assembly, Scala, HTML
Frameworks: Flask, Vert.x, SciPy, scikit-learn, Lucene, Samza, TensorFlow
Platforms and Software: Unix/Linux, Hadoop MapReduce, Spark, AWS, Elasticsearch, MySQL, HBase, MongoDB, Docker
Relevant Courses
Machine Learning, Cloud Computing, Search Engines, Machine Learning for Text Mining, Machine Learning with Large Datasets, Big Data Analytics, Natural Language Processing, Computer Systems, Data Structures
Projects
Elasticsearch-Based Search Engine at Entropy Technology (Python, Java, Elasticsearch)
• Developed a crawler to collect and clean data from Chinese job-listing websites and index it into Elasticsearch
• Applied supervised and unsupervised re-scoring algorithms (query expansion, learning to rank, etc.) on top of existing Elasticsearch features to improve search relevance for both structured and unstructured data
Zhihu (Chinese Quora) Mining Web Service (Python, MongoDB, Flask, D3.js, HTML/CSS)
• Developed a full-stack web service for crawling, mining and visualizing data from 50K+ users and 500K+ questions and answers
• Backend: a multi-threaded crawler with dynamic proxies, supporting both MongoDB and on-disk file storage
• Mining: keyword analysis, topic clustering, topic/user recommendation, sentiment analysis, popularity analysis
• Frontend: a Flask-based web service that visualizes the mined data with D3.js
Lucene-Based Search Engine (Java, Lucene)
• Developed a large-scale text search engine over 500K documents from the ClueWeb09 dataset, indexed with the Lucene API, with a prefix query language parser; relevant documents are retrieved in a document-at-a-time manner
• Supported multiple retrieval models (unranked/ranked Boolean, VSM, BM25, Indri), ten commonly used query operators (#AND, #OR, #WSUM, #NEAR/n, etc.), query expansion, and learning to rank (pairwise RankSVM)
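For illustration, a minimal Python sketch of Okapi BM25 scoring, one of the retrieval models listed above. It is not the project's Java/Lucene code: the function name `bm25_scores`, the defaults k1=1.2 and b=0.75, and the non-negative idf variant log(1 + (N - df + 0.5)/(df + 0.5)) are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25 (sketch)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # term absent from this document contributes nothing
            # Assumed idf variant with +1 inside the log so it never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With this scoring, a document that repeats a query term ranks above a same-length document that mentions it once, while documents without the term score zero.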
Twitter Analytics Web Service (Java, HBase, MySQL, Vert.x, EMR, AWS)
• Performed ETL with EMR on 1 TB of raw Twitter data (transformations: data cleaning, sentiment scoring, term censorship, popularity analysis), loading it into schemas designed for a variety of analytics queries
• Developed a RESTful web service API with Vert.x, deployed on AWS, to answer different analytics queries
• Deployed MySQL and HBase backends on AWS: the MySQL instances were sharded, and HBase ran on an EMR Hadoop cluster; performed fine-grained performance tuning on both databases and the frontend
Recommendation System for Netflix Movies (Python, scikit-learn)
• Implemented movie rating prediction with memory-based and model-based collaborative filtering and probabilistic matrix factorization (PMF) on a subset of the Netflix Prize dataset
• Implemented collaborative ranking with pairwise learning to rank (RankSVM/LR-LETOR) on PMF features to recommend movies for a given user query
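As an illustration of the PMF approach mentioned above, a minimal pure-Python sketch that fits user and item latent factors by stochastic gradient descent; minimizing squared error with L2 penalties corresponds to the MAP estimate of PMF under Gaussian priors. The names `train_mf`, `predict` and `rmse`, the factor count, learning rate, regularization and epoch count are all assumptions, not the project's settings.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=300, seed=0):
    """Fit latent factors to (user, item, rating) triples by SGD (sketch)."""
    rng = random.Random(seed)
    # Small random initialization breaks the symmetry of the all-zero solution.
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on squared error plus L2 penalty for both factors.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def predict(U, V, u, i):
    """Predicted rating is the dot product of user and item factors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def rmse(U, V, ratings):
    """Root-mean-square error over observed ratings."""
    sq = sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings)
    return (sq / len(ratings)) ** 0.5
```

Calling `train_mf(ratings, n_users, n_items, epochs=0)` returns the untrained factors, which is a convenient baseline for checking that training lowers the error.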
Image Classification on CIFAR Dataset (Python, TensorFlow, MATLAB)
• Used HOG features and PCA for feature extraction and dimensionality reduction, implemented several classifiers from scratch (SVM with linear/RBF kernels, GNB with AdaBoost), and used cross-validation to improve classification accuracy
• Further improved accuracy with a CNN implemented in TensorFlow; test accuracy ranked in the top 3 among all teams
Restaurant Rating Prediction Based on Yelp Reviews (Python, scikit-learn)
• Performed data cleaning, term dictionary construction and feature engineering to build sparse-matrix samples from the raw Yelp JSON dataset
• Implemented supervised learning with multi-class logistic regression and SVM for both hard and soft score prediction
Link Analysis and Personalized Search on CiteEval Dataset (Python)
• Performed K-Means clustering with K-Means++ initialization to group documents by topic
• Performed link analysis with general PageRank, personalized PageRank (based on user-topic preferences) and query-sensitive PageRank (based on query-document relevance scores from an Indri-based search engine) for retrieval ranking
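The PageRank variants above differ mainly in the teleport distribution. A minimal power-iteration sketch, assuming a hypothetical `pagerank` function over an adjacency dict and a damping factor of 0.85 (neither taken from the project): a uniform teleport vector gives general PageRank, while a topic- or query-biased vector gives a personalized variant.

```python
def pagerank(out_links, n, alpha=0.85, teleport=None, iters=100):
    """Power iteration for (personalized) PageRank (sketch).

    out_links: dict mapping node -> list of nodes it links to (nodes are 0..n-1).
    teleport:  probability vector over nodes; uniform if omitted.
    Dangling nodes (no out-links) redistribute their mass via the teleport vector.
    """
    if teleport is None:
        teleport = [1.0 / n] * n
    r = [1.0 / n] * n
    for _ in range(iters):
        # Random-jump mass: (1 - alpha) spread according to the teleport vector.
        nxt = [(1 - alpha) * t for t in teleport]
        dangling = 0.0
        for src in range(n):
            targets = out_links.get(src, [])
            if targets:
                share = alpha * r[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:
                dangling += alpha * r[src]
        for j in range(n):
            nxt[j] += dangling * teleport[j]
        r = nxt
    return r
```

On a three-node cycle the uniform version assigns every node 1/3, whereas teleporting only to node 0 concentrates mass on node 0 and its nearest successors.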
Input Text Predictor Based on Wikipedia Dataset (Java, Hadoop MapReduce, HBase, AWS)
• Generated phrase lists from the Wikipedia plain-text dataset with an N-gram language model using MapReduce
• Stored the probabilities of the words following each phrase in an HBase backend
• Built a RESTful API for word prediction and text autocompletion given an input phrase
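A single-machine sketch of the counting logic behind the predictor described above: for every phrase of 1 to n-1 words, it estimates P(next word | phrase) as the count of the phrase followed by that word divided by the count of the phrase followed by any word. In the project this counting ran as a MapReduce job with results stored in HBase; the function name `build_next_word_model`, whitespace tokenization and lowercasing here are assumptions.

```python
from collections import Counter, defaultdict


def build_next_word_model(text, n=3):
    """Map each phrase of 1..n-1 words to (next_word, probability) pairs (sketch)."""
    words = text.lower().split()  # assumed tokenization: lowercase, split on whitespace
    follow = defaultdict(Counter)
    for length in range(1, n):
        # Only phrases that have a following word are counted.
        for start in range(len(words) - length):
            phrase = " ".join(words[start:start + length])
            follow[phrase][words[start + length]] += 1
    model = {}
    for phrase, counts in follow.items():
        total = sum(counts.values())
        # Most probable continuation first; ties broken alphabetically.
        model[phrase] = sorted(
            ((w, c / total) for w, c in counts.items()),
            key=lambda wc: (-wc[1], wc[0]),
        )
    return model
```

An autocomplete endpoint would then look up the user's last one or two words and return the top entries of the stored list.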
Uber-like Rider-Driver Matching Service (Java, Kafka, Samza, AWS)
• Developed a driver-matching service with Apache Samza that processes streams of GPS data (driver/rider locations, status updates, etc.) produced by Apache Kafka and emits a matching stream, using RocksDB as the local key-value store for streaming state
• Dynamically calculated and updated surge prices based on driver availability in each city block
Social Network Timeline with Heterogeneous Backends (Java, MySQL, HBase, DynamoDB, AWS)
• Deployed a RESTful master instance that coordinates three different databases, each serving different features
• Stored login authorization data in MySQL, the follower/followee social graph in HBase, and users' own posts and followee timelines in Amazon DynamoDB
Distributed Storage API Development (Java, AWS)
• Implemented a distributed datastore coordinator API supporting horizontal partitioning (sharding) and replication with strong consistency
• Implemented a distributed datastore API across EC2 instances under strong, causal and eventual consistency models
• Developed a load balancer that distributes requests evenly across instances based on CPU utilization, monitors instance health to terminate and launch EC2 instances for reliability, and horizontally scales EC2 instances with demand to achieve the best performance-cost trade-off (minimizing cost while maximizing requests per second)
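The load-balancing and scaling policy described above can be sketched as two small decisions: route each request to the least-loaded healthy instance, and scale out or in when average CPU crosses thresholds. This is an illustrative Python sketch, not the project's Java code; the `Instance` class, the function names, and the 80%/20% thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    cpu_utilization: float  # percent (0-100), e.g. from a monitoring service
    healthy: bool = True    # result of the latest health check


def pick_instance(instances):
    """Route the next request to the healthy instance with the lowest CPU."""
    candidates = [i for i in instances if i.healthy]
    if not candidates:
        raise RuntimeError("no healthy instances")
    return min(candidates, key=lambda i: i.cpu_utilization)


def scaling_decision(instances, high=80.0, low=20.0, min_size=1):
    """Return +1 to launch an instance, -1 to terminate one, 0 to hold.

    Thresholds are hypothetical; unhealthy instances are ignored here and
    would be replaced separately by the health monitor.
    """
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        return 1
    avg = sum(i.cpu_utilization for i in healthy) / len(healthy)
    if avg > high:
        return 1
    if avg < low and len(healthy) > min_size:
        return -1
    return 0
```

The gap between the two thresholds acts as hysteresis, which keeps the fleet from oscillating between launching and terminating instances under a steady load.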
Computer Systems Projects (C, Unix POSIX API)
• Implemented a concurrent, multi-threaded caching web proxy on the Unix POSIX API with robust error handling that forwards client requests to servers and relays their responses; an LRU cache stores recently visited pages and supports concurrent reads
• Implemented a general-purpose dynamic storage allocator (malloc, calloc, realloc and free) on top of the Unix sbrk system call, managing free blocks with segregated free lists (a combination of linked lists and binary search trees)
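The proxy's LRU caching policy can be illustrated with a short Python sketch built on `collections.OrderedDict`. The original is written in C with POSIX threads; the `LRUCache` class, its byte budget, and the single lock here are assumptions. Note that a single lock serializes lookups, unlike the concurrent-read design described above.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Byte-budgeted LRU cache mapping URLs to response bodies (sketch)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = OrderedDict()  # url -> bytes, least recently used first
        self.lock = threading.Lock()  # one lock for simplicity; reads are serialized

    def get(self, url):
        with self.lock:
            body = self.entries.get(url)
            if body is not None:
                self.entries.move_to_end(url)  # mark as most recently used
            return body

    def put(self, url, body):
        if len(body) > self.max_bytes:
            return  # object larger than the whole cache is never stored
        with self.lock:
            if url in self.entries:
                self.used -= len(self.entries.pop(url))
            self.entries[url] = body
            self.used += len(body)
            # Evict least recently used entries until back under budget.
            while self.used > self.max_bytes:
                _, evicted = self.entries.popitem(last=False)
                self.used -= len(evicted)
```

Reading an entry moves it to the most-recent end, so the entry evicted on overflow is always the one untouched for the longest time.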
Phase-Change (PC) RF Switch for Reconfigurable RF Systems (Ph.D. Project)
• Designed, fabricated and tested a PC switch with a 20 THz cutoff frequency, low insertion loss and high isolation for reconfigurable RF systems
• Developed a complete automated testing software system for high-throughput, large-scale device testing and analysis
• Applied unsupervised clustering and semi-supervised learning for fault analysis and defect detection across devices
• Integrated the in-house fabricated switch with a dual-band low-noise amplifier (0.13 μm CMOS process) that can be reliably reconfigured between 2.4 GHz and 5 GHz (results published at IEDM 2015)
Honors and Awards
Qualcomm Innovation Fellowship Finalist, CMU, 2016
Carnegie Institute of Technology Dean’s Fellowship, CMU, 2012
HUST Excellent Graduate Award, HUST, 2012
HUST Excellent Academic Performance Scholarship, HUST, 2010 & 2011
HUST Top College Student Leader Scholarship, HUST, 2009
HUST Most Impressive Freshman Scholarship, HUST, 2009
