Prafulla Kumar Dash has over 5 years of experience in Hadoop development. He has worked on projects involving loading data from various sources into HDFS, performing ETL using Hive and Pig, and generating reports. Currently he is working on a bank reconciliation project involving matching data between systems using Spark and Scala.
PRAFULLA KUMAR DASH
License: 100-015-953
Mob: +91-9818868839, +91-8892655753
Email: rajadash2006@gmail.com
Professional Experience:
Working with Synechron Technologies, Bangalore as a Sr. Associate Software since March 2016
Worked with Algofusion Technology Pvt. Ltd. as a Hadoop Developer from 15-Jun-2015 to 27-Feb-2016
Worked with Accretive Health Pvt. Ltd. from March 2010 to September 2014
Experience Summary
Total 5.8+ years of experience, including 2+ years in Hadoop, in the development and implementation of client/server
multi-user business applications using MSBI, Java, and the Hadoop framework
Hands-on experience with Hadoop technologies such as HDFS, Hive, HBase, Cassandra, Pig, Sqoop,
Impala, Flume, Spark with Scala, and Kafka
Experience importing and exporting data between the Hadoop file system and databases such as MySQL and
Oracle using Sqoop
Experience creating databases, tables, and views in HiveQL, Impala, and Pig Latin
Strong knowledge of Hadoop and Hive's analytical functions
Implemented the Hadoop stack and various big data analytics tools; migrated data from databases such as MySQL and
Oracle to Hadoop
Experience with storage and processing in Hue, covering the Hadoop ecosystem components
Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop ecosystem
components
Experience using Oozie to define and schedule jobs
Handled multiple file formats such as XML, HTML, SWIFT, text, TSV, and CSV
Good experience with all the major Hadoop flavors (Apache, Cloudera, Hortonworks)
Good knowledge of single-node and multi-node cluster setup
Good knowledge of OOP concepts in core Java
Good knowledge of Linux commands
Working knowledge of Scala with Spark 2.0
Strong experience and good knowledge of SQL (including triggers and stored procedures); experienced in writing small
to complex queries
Highly motivated and subject-oriented, with the ability to work independently and as part of a team, and with excellent
technical, analytical, and communication skills
Algofusion and Me:
Experience in working with cross-functional technical teams and meeting customer timelines.
Good experience in the ETL process and good knowledge of data warehouse concepts.
Experience with Admin Studio, Composition Studio, and Operations Workbench.
Experience in configuring reconciliations and defining the matching rules for the recons.
Proficient in Securities Reconciliation, Nostro Reconciliation, ATM Reconciliation, and Retail Banking Reconciliations.
Well versed in database design for the reconciliation product.
Project #1 – Now working on
Client: IDFC Bank
Duration: June 2015 – till date
Role: Hadoop Developer
Environment: Ambari server 2.1.0, HDP 2.3.0 with HDP-UTILS 1.1.0
Project Description:
The crux of the reconciliation application is to perform matching between various feeds and to ensure
that the data is consistent, without any discrepancies, in retail banking. The goal of this application is to
provide an end-to-end reconciliation solution to financial institutions, handling the entire recon process from
file retrieval and data transformation through reconciliation, exception enrichment, and resolution.
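The matching step described above can be sketched in plain Java. This is only an illustrative sketch of a two-way amount-tolerance match rule: the `Txn` record, its field names, and the tolerance value are hypothetical, not the product's actual schema (the real engine ran on Spark with Scala).

```java
import java.util.*;

// Illustrative two-way reconciliation match: records from two feeds are
// grouped by a reference id, then an amount-tolerance rule decides matched
// vs. exception. Field names and the tolerance are assumptions for the sketch.
public class ReconSketch {
    record Txn(String refId, double amount) {}

    // Group a feed's transactions by their reference id.
    static Map<String, List<Txn>> byRef(List<Txn> feed) {
        Map<String, List<Txn>> m = new HashMap<>();
        for (Txn t : feed) m.computeIfAbsent(t.refId(), k -> new ArrayList<>()).add(t);
        return m;
    }

    /** Returns refIds whose per-feed amount totals agree within the tolerance;
     *  anything else (missing or out of tolerance) would flow to exceptions. */
    static Set<String> match(List<Txn> feedA, List<Txn> feedB, double tolerance) {
        Map<String, List<Txn>> a = byRef(feedA), b = byRef(feedB);
        Set<String> matched = new HashSet<>();
        for (String ref : a.keySet()) {
            if (!b.containsKey(ref)) continue;  // missing in feed B -> exception
            double sumA = a.get(ref).stream().mapToDouble(Txn::amount).sum();
            double sumB = b.get(ref).stream().mapToDouble(Txn::amount).sum();
            if (Math.abs(sumA - sumB) <= tolerance) matched.add(ref);
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Txn> bankFeed = List.of(new Txn("T1", 100.00), new Txn("T2", 50.00));
        List<Txn> ledgerFeed = List.of(new Txn("T1", 100.00), new Txn("T2", 49.00));
        // T1 matches within tolerance; T2 differs by 1.00 and becomes an exception
        System.out.println(match(bankFeed, ledgerFeed, 0.01));
    }
}
```

A production match rule would typically also handle one-to-many and many-to-many pairings, which this sketch omits.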
Roles & Contribution:
Creating the Hadoop environment/cluster on RHEL for DEV/UAT/PRODUCTION
Deploying all Hadoop jobs to all environments
Moving all flat files from the source to the executable destination
Executing recon jobs and sending the executed reports to the BA and reporting teams
Working on Scala with Spark for our backend engine: configuring jobs and creating match rules based on
business rules
Debugging the Spark programs with the help of GigaSpaces and the Spark UI
Occasionally working with WebLogic for WAR deployments
Giving support to front-end users over curl/WebHDFS
Updating table details in Oracle for recon update and create
Configured recons for Retail Banking reconciliations and defined the matching rules
Prepared test data for dry runs
SIT and UAT support
Codefrux and Me:
Worked with Codefrux Technology Pvt. Ltd. for a short period of time; reason for leaving: project completed
We used Hadoop 1.x for our project enhancement with Hive and MapReduce
It is a product-based company for iOS development
Project #1
Client: Quikr-MEDIA
Duration: September 2014 – June 2015
Role: Hadoop Developer
Environment: Hadoop 1.x on Ubuntu 12.0
Roles & Contribution:
Creating the Hadoop environment/cluster on Ubuntu for DEV
Deploying all Hadoop jobs to all environments
Moving all flat files from the source to the executable destination
Loading all files into our Hive tables using SerDe, then fetching data as per requirements
Using Sqoop to import and export data between HDFS and a MySQL database
Prepared test data for dry runs
Project – 1 Log mining system
Customer Quikr-Media (POC- on HADOOP)
Period Sep 2014 to Jun 2015
Description This project aims to move all log data from individual servers into HDFS as the main log storage and
management system and then perform analysis on these HDFS data sets. Flume was used to move the
log data periodically into HDFS. Once the data sets were inside HDFS, Pig and Hive were used to perform
various analyses.
Role Developer
Environment Cloudera, Hadoop, MapReduce (Java), HDFS, Hive, Flume
Responsibilities Involved in analyzing the system and business requirements.
Involved in transferring files from OLTP server to Hadoop file system.
Involved in writing queries with HiveQL and Pig.
Involved in database connectivity using Sqoop.
Importing and exporting data between Oracle and Hive.
Importing and exporting data between Hive and HDFS.
Process and analyze the data from Hive tables using HiveQL.
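The kind of analysis the project ran in Pig and Hive over the Flume-delivered logs can be illustrated with a plain-Java stand-in: count requests per URL from raw log lines. The space-separated log layout (timestamp, client IP, URL) is an assumption for the sketch, not the project's actual format.

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java stand-in for a Pig/Hive-style aggregation over web logs in HDFS:
// group log lines by the requested URL and count hits. The log layout
// (timestamp, client IP, URL as the 3rd field) is hypothetical.
public class LogMiningSketch {
    /** Extracts the URL (3rd whitespace-separated field) from each line and counts occurrences. */
    static Map<String, Long> hitsPerUrl(List<String> logLines) {
        return logLines.stream()
                .map(line -> line.split("\\s+"))
                .filter(f -> f.length >= 3)               // skip malformed lines
                .collect(Collectors.groupingBy(f -> f[2], Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "2014-09-01 10.0.0.1 /search",
                "2014-09-01 10.0.0.2 /search",
                "2014-09-01 10.0.0.1 /listing");
        // /search is counted twice, /listing once
        System.out.println(hitsPerUrl(lines));
    }
}
```

In the actual pipeline this grouping would be a one-line `GROUP BY` in HiveQL or a `GROUP`/`COUNT` in Pig Latin, executed across the whole HDFS data set.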
About Accretive Health and Projects
Description: AH is a Revenue Cycle Management (RCM) tool used to save the information of US patients, where
the patient registers, takes services from the hospital, and follows up with the insurance companies. It has three
modules: Front, Middle, and Back. In the Front module, the patient registers for an appointment with the doctor. In the
Middle module, the patient takes services from the hospital. In the Back module, the patient pays the bill for the services
rendered by the hospital.
Project Details/Technical
Operating systems Ubuntu, CentOS, Windows XP, Windows 7, and Red Hat Linux
Framework Hadoop, HDFS, MapReduce, Hive, HBase, Sqoop, ZooKeeper, Cassandra, Flume, Impala
Languages SQL, HiveQL, Pig Latin, core Java
Database MySQL, SQL Server, HBase, Cassandra
Hadoop distributions Apache Hadoop, Cloudera
Project Description:
As per the American Health Association (AHA), there is fraud of 5 million dollars annually in health insurance in the US.
This project identifies possible fraudulent claims out of the total claims processed daily. We receive insurance claim data
(OLTP) in 11 different files from the auto-adjudication system in X12 format with a fixed layout. We load the data into HDFS
and have written multiple MapReduce jobs to convert the X12 format into CSV format and load the data into Hive after
creating tables. Hive join queries are used to fetch information from multiple tables. Query output is populated into temporary
tables to perform more complex joins. Custom Hive UDFs have been created for data formatting. MapReduce jobs are
created to collect output from Hive and generate multi-claim XML files for further processing.
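The per-record core of the fixed-layout-to-CSV conversion described above can be sketched as follows. The column offsets and field names here are illustrative assumptions, not the real X12 layout, and the real job ran this logic inside a MapReduce mapper over HDFS files.

```java
// Sketch of the per-line fixed-layout-to-CSV conversion performed by the
// MapReduce jobs. The field offsets (claimId, date, amount) are hypothetical,
// not the actual X12 record layout.
public class FixedWidthToCsv {
    // (start, end) character offsets of each field in the fixed-layout record -- assumed
    static final int[][] FIELDS = { {0, 10}, {10, 18}, {18, 28} }; // claimId, date, amount

    /** Slices one fixed-width record into trimmed fields and joins them with commas. */
    static String toCsv(String record) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < FIELDS.length; i++) {
            if (i > 0) sb.append(',');
            int end = Math.min(FIELDS[i][1], record.length()); // tolerate short records
            sb.append(record.substring(FIELDS[i][0], end).trim());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String rec = "CLM0000001201401150000125.50";
        System.out.println(toCsv(rec)); // CLM0000001,20140115,0000125.50
    }
}
```

In the actual pipeline, the mapper would emit each converted line so that the resulting CSV files could be loaded into the Hive tables created for the claim data.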
Project – 3 Claim Enhancement
Customer St. John Medical Center (SJMC)
Period Jan- 2014 to Sep-2014
Description The purpose of the project is to store terabytes of log information generated by HL7 and to extract
meaningful information from it. The solution is based on the open-source big data software Hadoop. The data
is stored in the Hadoop Distributed File System and processed using MapReduce jobs. This in turn
includes getting the raw text/weblog data from the websites, processing the text/weblogs to obtain service and
insurance information, extracting various reports from the product insurance information, and exporting the
information for further processing. Hadoop is able to process large data sets (i.e. terabytes and
petabytes of data) in order to meet the client's requirements amid increasing competition in hospital
admissions.
Every hospital has a HOST system in which patient information is registered. Accretive collects all
the information into files and puts them on a server; for example, for a belig file from hospital ABCD, the operator
puts the file into the server folder ABCD/belig. We have a different Sqoop script for each file to bring the
data into HDFS. Once the data is in our HDFS, we apply our business logic and requirements, using MapReduce,
Pig, and Hive, to get meaningful data out of it. We send it to Hive tables in structured format (and sometimes HBase),
and by using Sqoop we return the output as required to an RDBMS, where the reporting team uses the meaningful
data for their reporting. Data is transferred to a stage database (Hive and HBase). When the data is complete on
stage, another MapReduce job calls the tran procedures to transfer this data to the Accretive Tran
database (Hive and HBase). These procedures update/insert information into the person, registrations, claims,
payments, summary, etc. tables.
Role Developer
Environment Hadoop 1.0.3, JDK 1.6, Red Hat Linux 6.3, Hive, HBase, Pig, Sqoop, Flume
Responsibilities Moved all crawl data flat files generated from various hospitals to HDFS for further processing.
Wrote Apache Pig scripts to process the HDFS data.
Created Hive tables to store the processed results in a tabular format.
Developed Sqoop scripts to enable the interaction between RDBMS and HDFS.
Fully involved in the requirement analysis phase.
Troubleshot MapReduce jobs, Pig scripts, and Hive queries.
Involved in commissioning and decommissioning Data Nodes.
Installed and configured Pentaho for reports.
Implemented Pig scripts according to business rules.
Implemented Hive tables and HQL queries for the reports.
Interacted closely with business users, providing end-to-end support.
Prepared analysis documents for the existing code.
Created technical design documents based on business process requirements.
Involved in debugging the code.
Project – 1 Middle Enhancement
Customer AHSPL
Period Aug 2010 – Dec 2013
Description In my team we were responsible for front-end UI and stored procedure enhancements. Certain code
changes happened on a periodic basis as per AHA (DRG codes), so we needed to make all
the necessary code changes for incoming and inter patients so that they would take effect on their bills,
respecting the AHA and insurance process flow, and make the changes in the respective tables for
immediate effect in the application.
Role Developer
Environment TFS (Team Foundation Server) as code repository, and SQL;
SQL Server 2005/2008/2012
Responsibilities As a developer, understanding the business requirements given by the client.
Designing the low-level design document for the application.
Reviewing and modifying existing programs.
Coding as per the change requests and technical specifications.
Production implementation.
Tracking the metrics sheet, undergoing internal reviews and external reviews by peers.
Involved in creating tables and their relationships in the data warehouse.
Created Technical Report Requirement document for the Standard Reports.
Involved in creating Dynamic Stored Procedures and views.
Involved in analyzing and gathering the requirements for Row level Security and Role based security.
Involved in Creating and Testing the SSIS packages developed for security.
Involved in creating SSIS packages to migrate large amounts of data.
Created databases, tables, clustered/non-clustered indexes, unique/check constraints, views, stored
procedures, and triggers.
Involved in performance tuning of queries for query efficiency.
Methodology- AGILE
In Accretive Health we followed the Agile methodology for our development, strictly following the Scrum framework,
including daily scrums, backlog grooming, sprints, retrospectives, etc.
Devised and prepared concise and effective user stories.
Determined precise and accurate user story acceptance criteria used by developers.
Facilitated distinguishing user requests from user needs.
Academic Profile
BA (Bachelor of Arts with English) From NOU with 74.4%.
BCA (Bachelor of Computer Applications) From VMU with 73.4%.
Strengths:
Passion for learning new technologies quickly and adopting them in practical work.
Energetic, hardworking, and capable of performing responsibilities under extreme pressure and time constraints.
Flexible and confident, with good interpersonal skills; straightforward, a silent observer, a good follower, and result-oriented.
Professional Activities:
Successfully completed the MAD (Make A Difference) program at Accretive Health
Won the ACE (Accretive Health Champion Employee) award for the years 2012 and 2013 consecutively
PERSONAL INFORMATION:
Father’s Name : Upendra Kumar Dash
Nationality : Indian
Languages Known : English
Passport Status : Valid up to 2023