2. CONTENT
• Data Mining
• Methods of doing
• Difference with standard auditing
• Benefits and Risks
• Patterns in data
• Utilisation in different audits
• Forensic Audit
• What is a fraud
• Profile of a fraudster
• Tools available in excel
• Theorems
3. A PROBLEM…
• A large retail chain, doing substantially well overall, had
• Dismal diaper sales; excellent beer sales
SOLUTION
Place them together !
4. WHICH IS SUSPICIOUS ?
• User 1: Login → Click on Product #8473 → Click
on Product #157 → Click on Product #102 →
Complete Purchase
• User 2: Failed Login → Request Password →
Direct Link to Product #821 → Change Shipping
Address →Complete Purchase
7. IDENTIFYING SUSPICIOUS TRANSACTIONS
Computer
• Mouse dynamics
• Typing speed
Behavioral Analytics
• Previous navigation habits
• Movement across screen
• Entry & exit points on website
Smartphone
• Screen pressure
• Angle of usage of phone
• Heart rate
8. DATA MINING - VALUE ADDITION
What was my total revenue in the last five years?
TO
What were sales in UP last March? Drill down to Kanpur
TO
What’s likely to happen to Kanpur sales next month? Why?
9. DATA MINING - METHODS
•Association
•Sequence or path analysis
•Classification
•Clustering
•Prediction
10. DATA MINING - TECHNIQUES
•Artificial neural networks
•Decision trees
•The nearest neighbour method
11. DATA MINING V. REGULAR AUDIT
Labor Verification
Regular Audit
1. Contracted rate = Billing rate
2. The billing is relevant to the audit period
3. Statutory compliances
Data Mining
1. Contracted rate = Billing rate
2. Employee pay grade wise payment
3. Mapping resignation to last pay
4. Mapping computer / biometric logins after resignation / termination
5. Overtime analysis to determine
a.) Regular overtime
b.) Employees who worked 100 hrs
6. Those not availing leaves
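The "logins after resignation / termination" test above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name, the employee IDs and the shape of the input data are all assumptions made for the example.

```python
from datetime import date

def logins_after_exit(exit_dates, logins):
    """Return (employee_id, login_date) pairs dated after the employee's exit date.

    exit_dates: dict of employee_id -> resignation / termination date (hypothetical layout)
    logins:     iterable of (employee_id, login_date) from computer / biometric logs
    """
    flagged = []
    for emp_id, login_date in logins:
        exit_date = exit_dates.get(emp_id)
        # Employees with no recorded exit date are still in service and are skipped.
        if exit_date is not None and login_date > exit_date:
            flagged.append((emp_id, login_date))
    return flagged

# Illustrative data only.
exit_dates = {"E101": date(2015, 3, 31)}
logins = [
    ("E101", date(2015, 3, 30)),  # before exit: fine
    ("E101", date(2015, 4, 2)),   # after exit: suspicious
    ("E102", date(2015, 4, 2)),   # still employed
]
flagged = logins_after_exit(exit_dates, logins)
print(flagged)
```

Because every login record is checked, the test covers the whole population rather than a sample.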
13. DATA MINING – WHY INTEGRATE
•Transaction Volume
•Mitigate Inherent Risk
•Value addition to the client
•Cost Effective
14. DATA MINING – BENEFITS
•Remove Sampling risk – 100% coverage
•Decrease in Audit costs
•Provide Real time audit opinions
•Establish Completeness and accuracy
15. DATA MINING – SOFTWARE TYPES
•Generalized Software
•Specialized Software
16. DATA MINING – SOFTWARE TYPES
Characteristics | Generalised | Specialised
Batch processing | No | Yes
Support entire audit procedures | No | Yes
User friendly | Yes | No
Require technical skill | No | Yes
Automated | No | Yes
Capable of learning | No | Yes
Cost | Lower | Higher
17. DATA MINING – RISKS
•First year costs might be higher
•Requires a strong understanding of operations
•Availability of data in desired format
•Risk of Control totals
20. DATA MINING – INTERNAL AUDITS: PURCHASES
•Round number transactions
•Duplicate transactions
•Same, Same, Different test
•Above average payments
•Transactions exceeding PO quantity
•Sequential invoice numbers
•Too many invoices beginning with “9”
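Two of the purchase tests above, round number transactions and the Same, Same, Different test, can be sketched as follows. The field names, the round-number multiple of 1,000 and the choice of "same invoice number, same amount, different vendor" are assumptions for illustration; the slides do not fix which fields are compared.

```python
from collections import defaultdict

def round_number_payments(payments, multiple=1000):
    """Payments whose amount is an exact multiple of `multiple` (assumed threshold)."""
    return [p for p in payments if p["amount"] % multiple == 0]

def same_same_different(payments):
    """Same invoice number and same amount, but more than one vendor."""
    vendors_by_key = defaultdict(set)
    for p in payments:
        vendors_by_key[(p["invoice_no"], p["amount"])].add(p["vendor"])
    return {key: sorted(vendors)
            for key, vendors in vendors_by_key.items()
            if len(vendors) > 1}

# Illustrative data only.
payments = [
    {"vendor": "V1", "invoice_no": "INV-001", "amount": 50000},
    {"vendor": "V2", "invoice_no": "INV-001", "amount": 50000},
    {"vendor": "V3", "invoice_no": "INV-002", "amount": 12345},
]
round_hits = round_number_payments(payments)
ssd_hits = same_same_different(payments)
print(len(round_hits), ssd_hits)
```

A plain duplicate test is the same grouping with the vendor included in the key and a count greater than one.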
21. DATA MINING – INTERNAL AUDITS: CREDITORS
•Those with a high percentage of returns
•Those with rapidly increasing purchases
•Small-value but high-frequency transactions
•Segregation of duties (SOD) between vendor approver and purchaser
24. DATA MINING – INTERNAL AUDITS: VENDORS MASTER
• Analysis of vendor master by creation date
• Identifying vendors who are regularly paid promptly
• Cross-reference vendors to employees
• Same, Same, Different test
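Cross-referencing vendors to employees usually means looking for a shared attribute such as a bank account, address or phone number. The sketch below matches on one field after simple normalisation; the field name `bank_account`, the record layout and the normalisation rule are assumptions for the example.

```python
def normalise(text):
    """Lower-case and keep only letters and digits so formatting differences do not hide a match."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def vendor_employee_matches(vendors, employees, field="bank_account"):
    """Return (vendor_id, employee_id) pairs that share the same value in `field`."""
    employees_by_value = {}
    for emp in employees:
        employees_by_value.setdefault(normalise(emp[field]), []).append(emp["id"])
    matches = []
    for vendor in vendors:
        for emp_id in employees_by_value.get(normalise(vendor[field]), []):
            matches.append((vendor["id"], emp_id))
    return matches

# Illustrative data only.
vendors = [
    {"id": "V10", "bank_account": "0012-3456-789"},
    {"id": "V11", "bank_account": "9999-0000-111"},
]
employees = [
    {"id": "E7", "bank_account": "0012 3456 789"},
    {"id": "E8", "bank_account": "5555-1111-222"},
]
matches = vendor_employee_matches(vendors, employees)
print(matches)
```

The same function can be rerun with `field` set to an address or phone column if those exist in both masters.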
25. DATA MINING – INTERNAL AUDITS: EMPLOYEES AND PAYROLL
• Regularly working overtime
• Not taking leaves
• Accepting unjustified salary deductions without complaint
• Segregating employees paid salary in cash
• Biometric analysis – First to enter / last to leave
26. DATA MINING – INTERNAL AUDITS: TRAVEL EXPENSES
• Identify weekend or holiday travel
• Search for same or similar claims
• Identify costs outside of policy or costly late bookings
• Identify conveyance claims made for the same time period as car rental or other transportation
• Compare mileage claims to distances reported
• Identify instances where an employee has exchanged a first-class ticket for an economy ticket but has not returned the difference to the company
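The weekend or holiday travel test can be sketched with the standard `datetime` module. The claim layout and the holiday list are hypothetical; a real holiday calendar would come from the organisation's HR policy.

```python
from datetime import date

def off_day_claims(claims, holidays=frozenset()):
    """Claims whose travel date falls on a Saturday, Sunday or a listed holiday."""
    return [
        c for c in claims
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        if c["travel_date"].weekday() >= 5 or c["travel_date"] in holidays
    ]

# Illustrative data only.
claims = [
    {"claim_id": "T1", "travel_date": date(2015, 3, 14)},  # Saturday
    {"claim_id": "T2", "travel_date": date(2015, 3, 16)},  # Monday
    {"claim_id": "T3", "travel_date": date(2015, 1, 26)},  # Monday, listed as a holiday below
]
holidays = frozenset({date(2015, 1, 26)})
flagged_claims = [c["claim_id"] for c in off_day_claims(claims, holidays)]
print(flagged_claims)
```

A flagged claim is only a lead for follow-up, since weekend travel can be legitimate.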
27. DATA MINING – INTERNAL AUDITS: SALES & DEBTORS
• Comparing invoices to shipping records
• Conversely, comparing shipping records to invoices
• Preference in sale to a particular customer
• Same, Same, Different test on sale price
• Debtors
  • Lapping
  • Old outstanding invoices
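Comparing invoices to shipping and shipping to invoices is a two-way reconciliation. A minimal sketch, assuming both records carry a common order reference (the `SO-` numbers below are made up for the example):

```python
def reconcile(invoiced_orders, shipped_orders):
    """Two-way comparison of order references between invoicing and shipping records."""
    invoiced, shipped = set(invoiced_orders), set(shipped_orders)
    return {
        # Billed but no dispatch record: possible fictitious sale.
        "invoiced_not_shipped": sorted(invoiced - shipped),
        # Dispatched but never billed: possible revenue leakage.
        "shipped_not_invoiced": sorted(shipped - invoiced),
    }

# Illustrative data only.
result = reconcile(
    invoiced_orders=["SO-1", "SO-2", "SO-3"],
    shipped_orders=["SO-2", "SO-3", "SO-4"],
)
print(result)
```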
28. DATA MINING – INTERNAL AUDITS: INVENTORY
• Determining slow moving inventory
• Determining quick moving inventory
• Purchasing frequency of a particular product
• Mapping stock valuation to last sale price
29. DATA MINING – BANKS
• What transactions does a customer do before shifting? (to prevent attrition)
• What is the profile of an ATM customer, and what type of products is he likely to buy? (to cross-sell)
• Which patterns in credit transactions lead to fraud? (to detect and deter fraud)
• What are the traits of a high-risk borrower? (to prevent defaults and bad loans, and to improve screening)
30. DATA MINING – BANKS
• Duplicate customer ID
• DP Limit = Limit = Outstanding
• Comparing unsecured and secured within scheme
• Rate of interest being applied
• Last credit amount and date
• Same PAN – different customer ID
• Last stock statement summary
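The "Same PAN – different customer ID" test is a grouping exercise. The sketch below assumes the customer master can be read as (customer ID, PAN) pairs; the IDs and PAN values are made up.

```python
from collections import defaultdict

def shared_pan(customers):
    """Return PANs that are attached to more than one customer ID."""
    ids_by_pan = defaultdict(set)
    for cust_id, pan in customers:
        # Normalise case and stray spaces so the same PAN keyed differently still matches.
        ids_by_pan[pan.strip().upper()].add(cust_id)
    return {pan: sorted(ids) for pan, ids in ids_by_pan.items() if len(ids) > 1}

# Illustrative data only.
customers = [
    ("C001", "ABCDE1234F"),
    ("C002", "abcde1234f "),
    ("C003", "PQRSX9876Z"),
]
shared = shared_pan(customers)
print(shared)
```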
33. REPORT TO THE NATIONS
• A typical organization loses 5% of its REVENUE to fraud
• Asset misappropriation is the biggest factor
• Frauds generally go undiscovered for 18 months
• The more senior the perpetrator, the BIGGER the fraud
• 58% of organizations NEVER recovered anything
35. BANK FRAUDS – 9 MONTHS FY 2014-15
Name Number of Cases Amount
PNB 123 2036,00,00,000
CBI 174 1736,00,00,000
SBI 474 1327,00,00,000
Syndicate 114 749,00,00,000
OBC 86 719,00,00,000
BOB --- 597,00,00,000
IDBI --- 507,00,00,000
UCO --- 424,00,00,000
United Bank --- 376,00,00,000
TOTAL 7542,00,00,000
36. WHAT IS FRAUD?
• A false representation of a matter of fact,
• whether by words or by conduct,
• by false or misleading allegations, or
• by concealment of what should have been disclosed,
• that deceives and is intended to deceive another
• so that the individual will act upon it to her or his legal injury.
38. WHAT IS FORENSIC AUDIT
•The use of accounting skills;
•To investigate frauds / embezzlement and
•To analyze financial information
•For use in legal proceedings
39. FORENSIC VIS-À-VIS STATUTORY
Forensic | Statutory
Very focused and micro approach | Macro approach with wide coverage
Examines reliability of documentation | Relies on documentary evidence
Not compulsory | Regulatory compliance
Establishing existence of fraud | Ensuring true and fair view
Determining the quantum of loss | Verifying correct representations
Gathering evidence | Evaluating internal controls
41. NEED FOR LEARNING THE TRAITS
Why frauds go unnoticed during statutory audit: fraudsters are
• Extremely intelligent
• Conversant with internal systems
• Technology savvy
• Aware of stale audit procedures
42. FRAUDSTER'S PROFILE
• Flamboyant lifestyle
• Very aggressive in his approach / targets
• Overprotective of data / documents
• Being the first one in and last one out
• Unusually close association with vendors / customers
48. TOOLS AVAILABLE IN EXCEL
•Analyze round number transactions
•Duplicate detection
•Same, Same and different tests
•Above average payments to vendors
49. TOOLS AVAILABLE IN EXCEL
•Gap detection
•Automated sampling
•MATCH function
•Employee – Vendor match
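Gap detection looks for missing numbers in a sequence that should be continuous, such as cheque or invoice numbers. The slides describe it as an Excel test; the following is a minimal Python sketch of the same idea, with made-up document numbers.

```python
def find_gaps(numbers):
    """Return (first_missing, last_missing) ranges in a sequence of document numbers."""
    present = sorted(set(numbers))
    gaps = []
    for prev, nxt in zip(present, present[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps

# Illustrative data only.
gaps = find_gaps([1001, 1002, 1005, 1006, 1010])
print(gaps)
```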
50. SPECIAL MENTION – TIME & SPACE
•Identify transactions recorded in quick succession that would normally take substantial time to occur
•Identify storage recorded in excess of the space physically available
51. SPECIAL MENTION – RSF
•RSF (Relative Size Factor) is the ratio of the largest number to the second largest number in a set
RSF = Largest Number / 2nd Largest Number
•An RSF greater than 10 highlights a probability of fraud / error
52. SPECIAL MENTION – RSF
•Types of errors / frauds it can unearth
• Data Entry mistakes
• Fat Finger errors
• Wrong coding with masters
• Capital Asset written off in expense
• Excess payments in payroll
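The RSF formula above can be applied per subset, for example per vendor. The sketch below is illustrative: grouping by vendor, the sample amounts and the variable names are assumptions, while the threshold of 10 is the one quoted on the slide.

```python
from collections import defaultdict

def relative_size_factors(payments):
    """RSF per vendor: largest amount divided by second largest amount."""
    amounts_by_vendor = defaultdict(list)
    for vendor, amount in payments:
        amounts_by_vendor[vendor].append(amount)
    rsf = {}
    for vendor, amounts in amounts_by_vendor.items():
        if len(amounts) < 2:
            continue  # RSF is undefined with a single record
        largest, second = sorted(amounts, reverse=True)[:2]
        if second > 0:
            rsf[vendor] = largest / second
    return rsf

# Illustrative data only: V1 contains one payment with an extra zero.
payments = [
    ("V1", 1200), ("V1", 1500), ("V1", 150000),
    ("V2", 800), ("V2", 900),
]
rsf = relative_size_factors(payments)
flagged_vendors = [v for v, value in rsf.items() if value > 10]
print(rsf, flagged_vendors)
```

Here V1 has an RSF of 100 and is flagged, which is the kind of fat-finger or data-entry outlier listed above.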
53. SPECIAL MENTION – BENFORD’S LAW
•First noted by Simon Newcomb in 1881; further researched by Frank Benford in 1938
•The U.S. accepts Benford’s law as evidence
•Statistical tool that can also be applied in normal audits to automate sample selection
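Benford's law states that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% start with 9. A first-digit test compares observed frequencies with these expectations. The sketch below is a minimal version; the function names and sample amounts are assumptions, and a real test would use a large population and a formal goodness-of-fit measure.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)

def leading_digit(value):
    """First non-zero digit of a number, or None for zero."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def first_digit_profile(amounts):
    """Map each digit 1-9 to (observed share, expected share)."""
    counts = Counter(d for d in map(leading_digit, amounts) if d is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: (counts[d] / total, benford_expected(d)) for d in range(1, 10)}

# Illustrative data only: an unusual cluster of amounts starting with 9.
amounts = [9450.75, 9900, 9800, 1200, 310, 9700]
profile = first_digit_profile(amounts)
print(profile[9])
```

Digits whose observed share is far above the expected share, such as too many amounts beginning with 9, point to the records worth sampling.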
55. SPECIAL MENTION – M-SCORE
•Model propounded by Prof. Beneish
•Assesses the likelihood that financial statements have been manipulated, based on certain ratios
•Ratios such as
• Days Sales in Receivables Index and Sales Growth Index
• Gross Margin Index
• Asset Quality Index
• Depreciation Index
56. SPECIAL MENTION – M-SCORE
•A financial statements score greater than -2.22 is considered a sign of fudging
•Statistically proven to have 76% accuracy
•Model being adopted by the Income Tax Department for CASS
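The slides list some of the ratios but not the formula. As a sketch, the commonly published eight-variable form of Beneish's model is shown below; the coefficients and the three extra inputs (SGAI, TATA, LVGI) come from that published model rather than from these slides, and the -2.22 cut-off is the one quoted above.

```python
def beneish_m_score(dsri, gmi, aqi, sgi, depi, sgai, tata, lvgi):
    """Eight-variable Beneish M-score.

    dsri: Days Sales in Receivables Index    gmi:  Gross Margin Index
    aqi:  Asset Quality Index                sgi:  Sales Growth Index
    depi: Depreciation Index                 sgai: SG&A Expenses Index
    tata: Total Accruals to Total Assets     lvgi: Leverage Index
    """
    return (-4.84
            + 0.920 * dsri
            + 0.528 * gmi
            + 0.404 * aqi
            + 0.892 * sgi
            + 0.115 * depi
            - 0.172 * sgai
            + 4.679 * tata
            - 0.327 * lvgi)

def looks_manipulated(score, threshold=-2.22):
    """Scores above the threshold are treated as a red flag, not as proof."""
    return score > threshold

# A company whose ratios are unchanged year on year (all indices 1, no accruals).
neutral = beneish_m_score(1, 1, 1, 1, 1, 1, 0, 1)
# Hypothetical company with receivables and sales growing sharply.
aggressive = beneish_m_score(1.6, 1.2, 1.1, 1.5, 1.0, 1.0, 0.05, 1.0)
print(round(neutral, 2), looks_manipulated(neutral), looks_manipulated(aggressive))
```

The unchanged company scores -2.48 and falls below the threshold, while the hypothetical high-growth company lands above it.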
57. EXCEL LIMITATIONS
•Absence of Log
•Not admissible in court
•Involves slight complexity in applying
•Data size limitation / Instability
•Risk of Hidden data
Association
Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between items in the same transaction, which is why the association technique is also known as the relation technique. It is used in market basket analysis to identify sets of products that customers frequently purchase together. Retailers use the association technique to research customers' buying habits. Based on historical sales data, retailers might find that customers always buy crisps when they buy beer, and can therefore put beer and crisps next to each other to save customers time and increase sales.
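Market basket analysis is usually expressed through two measures: support (how often a set of items appears together) and confidence (how often the second item appears given the first). The notes do not name these measures, so the sketch below is an assumption about how the beer and crisps example could be quantified; the baskets are made up.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(basket) for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / base

# Illustrative data only.
baskets = [
    {"beer", "crisps"},
    {"beer", "crisps", "bread"},
    {"beer"},
    {"bread", "milk"},
]
pair_support = support(baskets, {"beer", "crisps"})
beer_to_crisps = confidence(baskets, {"beer"}, {"crisps"})
print(pair_support, beer_to_crisps)
```

In this toy data beer and crisps appear together in half of all baskets, and two out of three beer buyers also buy crisps.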
Sequential Patterns
Sequential pattern analysis is a data mining technique that seeks to discover or identify similar patterns, regular events or trends in transaction data over a business period.
In sales, businesses can use historical transaction data to identify sets of items that customers buy together at different times of the year. They can then use this information to recommend those items to customers with better deals, based on their past purchasing frequency.
Classification
Classification is a classic data mining technique based on machine learning. It is used to assign each item in a data set to one of a predefined set of classes or groups. Classification makes use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we develop software that can learn how to classify data items into groups. For example, we can apply classification to the problem "given all records of employees who left the company, predict who will probably leave the company in a future period." In this case, we divide the employee records into two groups named "leave" and "stay", and then ask the data mining software to classify the employees into these groups.
Clustering
Clustering is a data mining technique that automatically forms meaningful or useful clusters of objects with similar characteristics. Clustering defines the classes itself and puts objects into them, whereas in classification objects are assigned to predefined classes. To make the concept clearer, take book management in a library as an example. A library holds a wide range of books on various topics. The challenge is to arrange those books so that readers can pick up several books on a particular topic without hassle. By using clustering, we can keep books that share some similarity in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on that topic only have to go to that shelf instead of searching the entire library.
Prediction
Prediction, as its name implies, is a data mining technique that discovers the relationships among independent variables and between dependent and independent variables. For instance, prediction analysis can be used in sales to predict future profit: if we treat sales as the independent variable, profit is the dependent variable. Based on historical sales and profit data, we can then fit a regression curve and use it to predict profit.
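The sales and profit example corresponds to a simple regression. A minimal sketch using ordinary least squares for a straight line is shown below; the figures are invented, and a straight line is only one possible choice of "fitted regression curve".

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data only: historical sales (independent) and profit (dependent).
sales = [100, 200, 300, 400]
profit = [12, 22, 32, 42]
slope, intercept = fit_line(sales, profit)
predicted_profit = slope * 500 + intercept  # forecast for sales of 500
print(slope, intercept, predicted_profit)
```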
Artificial neural networks are non-linear, predictive models that learn through training. Although they are powerful predictive modelling techniques, some of the power comes at the expense of ease of use and deployment. One area where auditors can easily use them is when reviewing records to identify fraud and fraud-like actions. Because of their complexity, they are better employed in situations where they can be used and reused, such as reviewing credit card transactions every month to check for anomalies.
Decision trees are tree-shaped structures that represent decision sets. These decisions generate rules, which then are used to classify data. Decision trees are the favored technique for building understandable models. Auditors can use them to assess, for example, whether the organization is using an appropriate cost-effective marketing strategy that is based on the assigned value of the customer, such as profit.
The nearest-neighbor method classifies dataset records based on similar data in a historical dataset. Auditors can use this approach to define a document that is interesting to them and ask the system to search for similar items.
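A minimal sketch of the nearest-neighbour idea: each historical record is a tuple of numeric features with a known label, and a new record takes the label of the closest historical record. The features (amount, hour of day), the labels and the use of unscaled Euclidean distance are assumptions for the example; in practice features are usually scaled so that one of them does not dominate the distance.

```python
import math

def nearest_neighbour(history, query):
    """Return the label of the historical record closest to `query`.

    history: list of (features, label) pairs, features being numeric tuples
    query:   numeric tuple of the same length
    """
    _features, label = min(history, key=lambda record: math.dist(record[0], query))
    return label

# Illustrative data only: (amount, hour of day) with the outcome of an earlier review.
history = [
    ((100, 10), "normal"),
    ((120, 11), "normal"),
    ((9500, 2), "review"),
]
label_large = nearest_neighbour(history, (9000, 3))
label_small = nearest_neighbour(history, (110, 10))
print(label_large, label_small)
```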
Numeric Patterns – fictitious invoice numbers, fictitiously-generated transaction amounts…
Time Patterns – Transactions occurring too regularly, activity at unusual times or dates…
Name Patterns – Similar and altered names and addresses…
Geographic Patterns – Proximity relationships between apparently unrelated entities…
Relationship Patterns – Degrees of separation…
Textual Patterns – Detection of “tone” rather than words…