Data Science / NLP Resume (2018, Abridged)
Rangarajan Chari
US Citizen
6709 Winnipeg Cove
Austin, TX 78759
512-461-1810 (C)
512-346-4616 (H)
rangarajan.chari@gmail.com
LinkedIn Profile: https://www.linkedin.com/in/mlengineer
SUMMARY
Data Scientist, Machine Learning and Natural Language Processing Specialist, and Software Engineer
with a unique combination of solid algorithm design skills and research acumen. Always eager to learn
and apply new technologies. Relevant experience includes:
● Research in AI at the PhD level
● Applying word and sentence embeddings, CNNs and RNNs for NLP tasks
● Implementing Convolutional Neural Nets for face recognition
● Using Machine Learning methods for classification
● Using “Big Data” technologies like Apache Spark (with Scala and PySpark), AWS, Hadoop and
Cascading
● Excellent programming in Python, C/C++ and Java
● Keeping pace with the latest developments and trends in ML and NLP
● Graph mining in social network analysis
● Conducting research in DoD SBIR projects
EDUCATION
● Georgia Institute of Technology, Atlanta, Georgia, M.S., Computer Science. Admitted into PhD
program.
Major focus in PhD program: AI (Computational Vision, Case-Based Reasoning)
Minor in Statistics (Stochastic Processes, Queuing Theory, Nonparametric Statistics)
Research in Ph.D. program on Computational Vision (pre-attentive vision, texture perception,
visual routines). Visiting Student, CMU Computer Vision Lab under Prof. Takeo Kanade.
● University of Denver, Denver, Colorado, M.S., Math and Computer Science.
Thesis on Computational Vision.
● Indian Institute of Technology, Bombay, India, M.Sc., Physics.
Recent Courses
● Certificate from deeplearning.ai (October 2017) for “Neural Networks and Deep Learning”,
“Structuring Machine Learning Projects”, “Convolutional Networks” and “Sequence Models” in
the Deep Learning Specialization on Coursera.
Built several CNNs for object detection from scratch as well as with TensorFlow and Keras. Built
character-level as well as word-level sequence-to-sequence models for NLP.
● NLP courses taken on Coursera: “Introduction to Natural Language Processing” by Prof.
Dragomir Radev, Univ. of Michigan, and “Introduction to Natural Language Processing” by Jurafsky
and Manning, Stanford U.
● Currently taking a fast.ai course on Deep Learning.
PATENTS AND PUBLICATIONS
● “Entity Resolution from Inferred Relationships and Behavior”, Jonathan Mugan, Rangarajan
Chari, et al., IEEE BigData 2014.
● Method and Computer System for Identifying Entities in Interaction Data, Laura Hitt,
Rangarajan Chari, et al., U.S. Patent Application #14/525040, 2014.
● Alternative Methodology and Tool for Analyzing Competitive Benchmarks (US Pat. # 6,381,558,
IBM, 2002)
● System and Method for Compressing and Decompressing Fonts Based Upon Font Stroke
Regularities (US Pat. # 5,524,182, Hewlett-Packard, 1996)
PROFESSIONAL EXPERIENCE
General Motors, IT Innovation Center Aug 2018 – Present
Sr. Machine Learning Scientist (Contract)
NLP
Experimenting with various approaches, techniques and tools to solve the business problem of
helping technicians, and in the longer term owners, diagnose and troubleshoot problems with
their GM cars/trucks far more easily than by consulting manuals.
Attacking the immediate problem of finding relevant material about the most probable cause from
manuals and bulletins.
Techniques and tools include word embeddings -- word2vec, GloVe, fastText -- and sentence
embeddings (e.g., the Google Universal Sentence Encoder and Facebook’s InferSent) as well as
machine learning and scientific computing libraries like sklearn and scipy. The Universal
Sentence Encoder and InferSent are less than two years old as of October 2018.
Investigating an algorithm for text segmentation into paragraphs or topical sections based on a
signal processing algorithm for salient peak detection in time series data.
The continuation of this project is contingent upon funding in the next quarter.
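The embedding-based retrieval idea above can be sketched as a cosine-similarity ranking over sentence vectors. This is a minimal illustration with toy 3-d vectors standing in for real encoder output (e.g., from the Universal Sentence Encoder), not the actual GM pipeline:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, doc_vecs):
    # Return document indices ranked by similarity to the query.
    scores = [cosine_sim(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy "embeddings"; a real system would encode manual/bulletin passages.
query = np.array([1.0, 0.2, 0.0])
docs = [np.array([0.9, 0.1, 0.0]),   # close to the query
        np.array([0.0, 1.0, 1.0])]   # unrelated
print(retrieve(query, docs))  # -> [0, 1]
```

The same ranking loop works unchanged whichever embedding model produces the vectors.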
Cognizant Technology Solutions Nov 2017 – Aug 2018
Sr. Data Scientist
Deep Learning
● Assignments with:
o A well-known server technology company intending to offer machine learning and AI
services to their high-end customers.
o A large clinical laboratory services company seeking to get improved reimbursement
outcomes from health insurers by analyzing reasons for denial.
o The largest pharmacy network and prescription benefit plan manager in the US, with the
goal of reducing human effort in interpreting changing rules.
● Constructed a static Knowledge Graph of the prescription benefit plan and drug coverage domain
and devised a process to enrich it with entities and relations extracted from dynamic human-generated
text and enable it to be queried via SPARQL. The goal was to reduce human effort in
interpreting complex free-text annotations tied to a particular plan and particular drugs.
● Built a CNN with TensorFlow for predicting failures of freezers and coolers in a large chain of
stores using time series of sensor readings.
● Benchmarked the performance of Dell servers with and without GPUs and with and without Intel-optimized
TensorFlow on well-known Convolutional Neural Network architectures such as
LeNet5, GoogLeNet, AlexNet, VGG17 and ResNet50.
Infosys Limited May 2017 – Sep 2017
Data Scientist
Natural Language Processing
Developed sentiment analysis and summarization algorithms for a world leader in the Oil & Gas
industry.
Developed an algorithm to draw word clouds from text based on relative prominence.
For another Fortune 10 company, developed an algorithm to extract information related to
customer care (the cause and resolution of customer complaints) from email threads.
Visual Semantics (now Third Insight) Nov 2016 – April 2017
Machine Learning Consultant
Deep Neural Networks
Implemented a compact, three-stage cascaded convolutional network for use in a system for face
detection. The network, built using Torch, reproduced results of a 2015 publication which had minimal
information about the architecture. Other frameworks like Theano/Lasagne were also evaluated. Also
looked at face detection libraries such as FaceNet (Google) and OpenFace (CMU), which is partly based
on OpenCV.
Jobs2Careers (now Talroo) Sep 2015 – Oct 2016
Data Scientist
Semantic Search
Enabling job seekers to find jobs relevant to the intent of their queries by understanding job
descriptions – classifying them, tagging them, and finding the dominant “topics” in them.
● Used word2vec, a neural embedding algorithm, on millions of job descriptions along with the
graph clustering algorithm mcl to assign “signatures” to the descriptions for job retrieval.
Developed heuristics for word-sense disambiguation and for automatically determining term
specificity.
● Tried a community detection approach to document clustering.
● Applied the probabilistic topic modeling techniques LDA (Latent Dirichlet Allocation) and HDP
(Hierarchical Dirichlet Processes) available in the gensim package to find the major themes in a
large job description corpus and use the model for Information Retrieval. Also experimented
with methods that combine LDA with word2vec (e.g., Topical Word Embeddings).
● Some of the above computations were done with Spark/Scala and PySpark in Databricks
notebooks and AWS S3. Gained experience with Spark DataFrames, RDDs and Spark SQL.
● Extensively used Python machine learning and NLP stacks (scikit-learn, nltk, scipy, numpy as
well as gensim, spaCy and chainer -- a Python neural network library with CUDA and GPU
computation support) plus open source Java libraries like OpenNLP, Stanford Core NLP and
GATE.
● Developed a gold standard of responses to a carefully engineered set of queries and a random
sample of job descriptions to evaluate search engine versions rapidly and without expensive and
time-consuming A/B testing.
● Tried developing folksonomy-style tagging methods for documents. In this context,
experimented with keyword extraction techniques (Kea, Maui-indexer, KP-Miner and TextRank).
● Correlated click-through data with presented jobs and combined this with clustering of word
neighborhood graphs to find jobs likely to be clicked on.
Skills Used: research aptitude, machine learning algorithms, document clustering, text
classification, graph clustering, neural networks, word2vec, lda2vec, spaCy, chainer, mcl,
statistical NLP, Python, nltk, scikit-learn, numpy, scipy, Spark, Scala, Spark MLlib, Databricks,
Spark DataFrames and Datasets, SQL, MySQL, AWS, parquet files, gensim package, WordNet,
Stanford Core NLP, OpenNLP, Solr, LDA, HDP, Information Retrieval (IR)
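The graph-clustering step named above (mcl) can be sketched with a toy Markov Clustering run: alternate expansion (matrix squaring) with inflation (elementwise power plus column renormalization) until the matrix settles into clusters. The graph, parameter values, and argmax-based cluster assignment are illustrative assumptions, not the production signature pipeline:

```python
import numpy as np

def mcl_labels(adj, inflation=2.0, iters=30):
    """Markov Cluster (MCL) sketch: alternate expansion (M @ M) and
    inflation (elementwise power, then column renormalization)."""
    M = adj + np.eye(len(adj))      # self-loops aid convergence
    M = M / M.sum(axis=0)           # make columns stochastic
    for _ in range(iters):
        M = M @ M                   # expansion
        M = M ** inflation          # inflation
        M = M / M.sum(axis=0)
    # Assign each node to the attractor holding most of its column mass.
    return M.argmax(axis=0)

# Two triangles joined by a single bridge edge (2-3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

labels = mcl_labels(A)
# Nodes 0-2 end up in one cluster, nodes 3-5 in another.
```

The inflation parameter controls granularity: higher values break the graph into smaller, tighter clusters.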
The Home Depot, Atlanta, GA Sep 2014 – June 2015
Data Scientist
Online Search
Searching for products that are relevant to a customer by looking at product descriptions in
natural language in addition to structured data about them.
● Applied recent research on neural-net generated distributed, dense vector representations of
words and phrases in experiments to understand the context and intent of a user query by
mining a hitherto unexploited corpus of descriptions of ~1M products sold online by The Home
Depot.
● Used word2vec to overcome vocabulary mismatch by suggesting related search terms with the
objective of improving online customer experience on homedepot.com and increasing conversion
rates by an order of magnitude.
● Devised and selected algorithms that scale to millions of product descriptions.
● Categorized and provided insight into the reasons for “No Results Found” pages by mining
query logs containing tens of millions of unique queries. Assessed the potential impacts of better
spellchecking, model number recognition, and automatic rephrasing of queries on the customer’s
experience and conversion rate.
● Discovered a way to use word2vec for correcting spelling errors in O(1) time.
Technologies Used: Python, Java, C/C++, bash, Linux, cygwin, awk, sed, Maven, Ant,
ontologies, OWL, RDF, Protégé, OpenRDF, WordNet, neural networks, word2vec, clustering, k-means,
kNN, R, Dragon Toolkit, aspell, Hunspell, Jazzy, LingPipe, ARK TurboParser
Dependency Parser, Stanford NLP, GATE, Named Entity Recognition (NER), Statistical
NLP, TF-IDF, Jaro-Winkler, Levenshtein distance, fuzzy search algorithms, recommendation
systems, collaborative filtering, POS tagging.
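The related-term and spelling-correction ideas above both reduce to a nearest-neighbour lookup in embedding space. A minimal sketch with hand-made toy vectors (the real approach would query a word2vec model trained on product text and query logs, where frequent misspellings acquire vectors near the correct spelling; the words and vectors here are hypothetical):

```python
import numpy as np

# Toy embedding table; real vectors would come from a trained model.
vocab = {
    "drill":  np.array([0.90, 0.10]),
    "dril":   np.array([0.88, 0.12]),   # hypothetical misspelling
    "hammer": np.array([0.10, 0.90]),
}

def nearest(word):
    # One pass over a fixed vocabulary -- constant cost per query,
    # with no per-candidate edit-distance computation.
    v = vocab[word]
    best, best_sim = None, -1.0
    for w, u in vocab.items():
        if w == word:
            continue
        sim = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(nearest("dril"))  # -> drill
```

In production the linear scan would be replaced by an approximate nearest-neighbour index over the full vocabulary.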
21st Century Technologies (21CT), Austin, TX Dec 2013 – July 2014
Senior R&D Software Engineer
Social Network Analytics
Analyzing local neighborhood structure of social network nodes in a graph-theoretic way to
discover and quantify similarities between them.
● Developed a highly scalable and fast technique for analyzing and characterizing roles of
individuals within large social networks, by importing ideas from the analysis of protein
interaction networks in bioinformatics. This innovative application of graphlets to social networks
with ~10^5 edges is able to precisely identify, in a matter of seconds, individuals who play roles
similar to a single exemplar. It made a US Navy project for identifying potential terrorist threats in a
large social network enormously successful and is now part of the core IP of 21CT.
● Employed R packages for principal components analysis, k-means clustering and decision trees to
analyze results of using graphlet methods on Facebook100, a complete set of Facebook friendship
data from 100 American universities in 2005.
● Implemented the graphlet application in C++ as well as Java for incorporation into the company
codebase as a Maven project.
● Participated in a project to study collective entity resolution by fusing network data coming from
sources in different modalities. The system is aimed at coalescing multiple monikers belonging to the
same individual.
● Gained experience working on DoD SBIR research projects with tight deadlines.
● Created a small OWL ontology with RDF n-triples using Protégé and Sesame. Experimented with
Rya, a distributed RDF repository on top of the Accumulo key-value store. Generated and ran
SPARQL queries against the repository.
● Converted a group detection algorithm to MapReduce, using the Cascading abstraction layer on
top of Hadoop.
● Worked with several Python scripts and libraries as well as R packages for classification,
clustering, principal components analysis and visualization.
Tools Used: Terrorism Intelligence Analytics, DoD contracts, Social Network Analysis, Java,
Maven, C++, NoSQL, Accumulo, R, principal components analysis (PCA), machine learning,
Python, iPython, Scipy, Numpy, Eclipse, Netbeans, Cytoscape, graphlets, graph mining, RDF,
SPARQL, OpenRDF, ontologies, OWL, Protégé, Sesame, Hadoop, Cascading, MapReduce, Big
Data, Cloud, Linux, cygwin, bash, sed, awk, svn, Agile, SCRUM, software integration.
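The graphlet idea above characterizes a node by the small induced subgraphs it participates in. A tiny sketch counting only the two 3-node graphlets (closed triangles vs. open wedges) per node; the real role-similarity work counts all graphlet orbits up to a larger size, and the example graph here is illustrative:

```python
from itertools import combinations

def graphlet_counts(edges):
    """Per node, count the 3-node graphlets centred on it:
    triangles (both neighbours adjacent) vs. wedges (not adjacent)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {u: {"triangle": 0, "wedge": 0} for u in adj}
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                counts[u]["triangle"] += 1
            else:
                counts[u]["wedge"] += 1   # u is the wedge centre
    return counts

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
c = graphlet_counts(edges)
print(c["c"])  # -> {'triangle': 1, 'wedge': 2}
```

Comparing these per-node count vectors (extended to richer graphlet orbits) is what lets a single exemplar retrieve structurally similar individuals.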
RenewData Corp. Oct. 2012 – July 2013
Senior Software Engineer
Information Retrieval from free text databases
Retrieving legal documents relevant to a litigation with high precision and recall, expanding
queries where needed and dealing with “vocabulary mismatch”.
● Researched and implemented some of the latest IR techniques for query suggestion, relevance
feedback and ranked retrieval to modernize and differentiate the company’s two main products
in the eDiscovery marketplace.
● Experimented with Latent Semantic Indexing (LSI) as implemented in the “semantic vectors”
package to create models which represent collections of documents in terms of underlying
concepts.
● Enhanced components which are written in Java, Ruby and C#, use MongoDB, MySQL and SQL
Server databases, and communicate via SOAP/REST web services. Technologies employed include
Apache Lucene and Solr (for free text search), JBoss, Spring and Maven.
Skills Used: C++, Boost, g++, cygwin, Visual C++, Java, JBoss, Maven, Spring, Svn, QuickBuild,
web services, SOAP, REST, XML, Big Data, SQL, NoSQL, MongoDB, Agile, SCRUM, Rally,
Applied Research in Information Retrieval (IR), TF-IDF, machine learning, algorithm design and
implementation, universal hash functions, Bloom filters, performance analysis and optimization,
Eclipse, Mockito, JUnit, document classification.
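The LSI experiment above rests on factoring a term-document matrix with SVD and comparing documents in the reduced concept space. A minimal sketch with toy counts (the vocabulary and matrix are invented for illustration, not taken from the product):

```python
import numpy as np

# Rows = terms, columns = documents (toy term counts).
terms = ["contract", "breach", "patent", "invention"]
X = np.array([[2, 1, 0],
              [1, 2, 0],
              [0, 0, 3],
              [0, 1, 2]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                     # keep top-k latent "concepts"
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T    # documents in concept space

def sim(i, j):
    # Cosine similarity between documents i and j in the reduced space.
    a, b = doc_vecs[i], doc_vecs[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 1 (contract/breach heavy) score as closer to each
# other than either is to document 2 (patent heavy).
```

Truncating to k concepts is what lets LSI surface documents that share vocabulary only indirectly, easing the “vocabulary mismatch” problem.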
Polycom, Inc. Mar. 2012 – Oct. 2012
Senior Staff Software Engineer
Videoconferencing Systems
Developed RESTful web services in the Java Restlet framework on the Android platform to expose
functionality of an embedded videoconferencing system with Java and C++ components communicating
via Google protobuf.
Skills Used: Java, REST, Restlet framework, web services, JSON, XML, C++, Google protobuf,
Android, Agile, SCRUM, Jira, svn
Consulting Software Engineer June 2006 – February 2012
Notable clients include:
● ShoreTel, Austin, TX, VoIP Phones
Refactored and completely re-implemented the Qt 4.6/C++-based Network Access Controller,
eliminating critical bugs and memory leaks in next-generation phones.
● PayPal, Austin, TX, Infrastructural Software
Modified enterprise-wide C++ software to use a standard version of the Xerces XML Parser. Led
a pilot project to prevent buffer overflow, code injection and other vulnerabilities in PayPal
software by introducing Fortify, a static analysis tool, into the development process.
● Advanced Micro Devices, Austin, TX, CPU Diagnostics
Developed a remote diagnostics tool using the XML-RPC protocol.
● IBM, Austin, TX, AIX Kernel Technical Support
Interfaced with IBM customers worldwide as well as AIX kernel developers to resolve code defects
in the loader and linker.
Tools and Skills: C, C++, Qt 4.6, Ubuntu Linux, Embedded Linux, CentOs Linux, Visual Studio
2008, Eclipse IDE, KDevelop, Subversion, CVS, Perforce, Rational ClearCase/ClearQuest, Jira,
XML, Xerces DOM Parser, Agile methodology, SCRUM, MVC architecture, design patterns,
multi-threaded systems, compiler front-end, XML-RPC protocol, http, ftp, AIX, bash, ksh, gcc,
gdb, Oracle VirtualBox, network programming, sockets, client interaction
Tanisys Technology, Austin, TX Nov 2004 – June 2006
Senior Engineer
Embedded compiler in C and message-oriented middleware in C++ for a memory tester
● Facilitated the use of the M1000 high-end memory tester by defining a custom language and
implementing an embedded compiler (target: PPC405) for it using flex, bison and GNU crosstool
on Linux.
● Developed and deployed multi-threaded middleware in Visual C++ for a distributed system with
socket communication between modules.
Skills: C, gcc, g++, flex, bison, compilers, GNU crosstool, embedded Linux, cygwin, Visual C++, sockets,
multi-threaded programming, UML, use-case scenarios, SQL, Database Template Library (DTL)
Pattern Discovery, Austin, TX Jan 2004 – Nov 2004
Owner
Startup; Federal contracts under SBIR/STTR programs; collaboration with National Labs
● Researched ways of using swarm intelligence techniques for target identification by a collection
of power-constrained unmanned aerial vehicles (UAVs) in response to a US Navy SBIR
solicitation.
● Gained extensive field and industry knowledge by collaborating with Sandia National Labs and
faculty at the University of New Mexico.
Skills: Research, Artificial Intelligence, autonomous, decentralized systems, collaboration, knowledge
transfer
Intel Corp., Austin, TX July 2003 – Jan 2004
Compiler Engineer Consultant
Dynamic, profile-directed compiler development
● Implemented parts of an experimental dynamically retranslating binary-to-binary compiler
conceptually similar to H-P Labs' Dynamo to enable x86 (IA-32) code to be run in the Itanium 2
(IA-64) environment.
Skills: dynamic, profile-directed compilers, x86 and Itanium 2 instruction sets, research, Visual C++
Metrowerks, then a Motorola Company, Austin, TX July 2000 – June 2003
Compiler Engineer
Compiler Development; Performance Analysis and Measurement; Software Integration
● Initiated and led a project to integrate the Metrowerks re-targetable compiler with HiWare,
Switzerland's Static Single-Assignment (SSA) based compiler to modernize it and demonstrate
how new, more powerful optimizations enabled by SSA Form can improve code quality without
degrading performance.
● Re-implemented Global Common Subexpression Elimination and other major dataflow
optimizations in the Metrowerks Intermediate Representation Optimizer to remove flaws and
enhance code quality.
● Measured compilation speed and code quality using Intel VTune and EEMBC, gcc and SPEC92
benchmarks.
Skills Used: C, compiler design, CodeWarrior IDE, CVS, SSA Form, performance analysis, Intel VTune,
collaboration
HIGHLIGHTS OF PREVIOUS EXPERIENCE
● Improved the performance of the IBM Java Just-in-Time (JIT) compiler in conjunction with IBM
Tokyo Research Labs, achieving benchmark scores exceeding those of Microsoft Internet Explorer
by more than 35%. Discovered patterns of suboptimal code in regions outside busy loops using
Intel VTune as a profiler, leading to further improvement in scores on the order of 12%.
● Modified the JVM and tested a new criterion for JIT compilation based on actuarial lifetime
prediction algorithms.
● Invented an alternative profiling methodology for analyzing competitive Java benchmarks.
● Studied human perception of fonts and patented a font compression algorithm based on
discovering patterns in font “strokes”.
● Doubled the throughput of GTSTRUDL, a widely used computer-aided structural engineering
tool, by drastically refactoring its C-based kernel, in which more than 90% of runtime was spent.
This performance optimization helped make the product viable in the face of competition from
similar products from other companies such as McDonnell Douglas.
● Showed in MS Thesis at Univ. of Denver that the Relative Neighborhood Graph of a dot pattern
is a strong predictor of how humans connect dots in that pattern, and of whether the random-dot
Moiré effect is perceived in it, leading to some hypotheses about early vision.
● Graduate Research Assistant, University of Denver Dept. of Geography and Georgia Tech Dept.
of Computer Science, AI Group.