Question #1) What do you understand by Data Modelling?
Ans) Data Modelling is the diagrammatic representation of how entities are related to each
other. It is the first step towards database design: we first create the conceptual model,
then the logical model, and finally the physical model.
Data models are generally created in the data analysis and design phase of the software
development life cycle.
Question #2) Explain your understanding of different data models?
Ans) There are three types of data models: conceptual, logical and physical. The level of
complexity and detail increases from the conceptual to the logical to the physical data model.
The conceptual model shows a basic, high-level view of the design, while the physical data
model shows a very detailed view.
The conceptual model shows only entity names and entity relationships. Figure 1, shown
later in this article, depicts a conceptual model.
The logical model shows entity names, entity relationships, attributes, and the primary
and foreign keys of each entity. Figure 2, shown under Question #4 in this article, depicts
a logical model.
The physical data model shows primary keys, foreign keys, table names, column names and
column data types. This view elaborates how the model will actually be implemented in the
database.
Question #3) Throw some light on your experience in Data Modelling with respect to the
projects you have worked on to date.
Note: This was the very first question in one of my Data Modelling interviews. So, before
you step into the interview discussion, you should have a very clear picture of how data
modeling fits into the assignments you have worked upon.
Ans) I have worked on a project for a health insurance provider where we had interfaces
built in Informatica that transform and process the data fetched from the Facets database
and send useful information out to vendors.
Note: Facets is an end-to-end solution for managing all the information of the health care
industry. The Facets database in my project was built on SQL Server 2012.
We had different entities that were linked together. These entities were subscriber,
member, healthcare provider, claim, bill, enrollment, group, eligibility, plan/product,
commission, capitation, etc.
Below is the conceptual data model showing how the project looked at a high level:
Figure 1: Conceptual data model of the project
Each data entity has its own data attributes. For example, a data attribute of the
provider is the provider identification number; a few data attributes of the membership
are the subscriber ID and member ID; a data attribute of the claim is the claim ID; each
healthcare product or plan has a unique product ID; and so on.
Question #4) What are the different design schemas in Data Modelling? Explain
with an example.
Ans) The two most common schemas in dimensional data modeling are:
- Star Schema
- Snowflake Schema
Each of these schemas is explained below.
The star schema is the simplest: a fact table sits in the center and references multiple
dimension tables around it.
Every dimension table is connected to the fact table, and the primary key of each dimension
table acts as a foreign key in the fact table.
The ER diagram of this schema (see Figure 2) resembles the shape of a star, which is why it
is called a star schema.
Figure 2: Star schema
The star schema is simple and flexible, and its dimension tables are de-normalized.
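A minimal star schema can be sketched with Python's built-in sqlite3 module and an in-memory database. The table and column names (`fact_claim`, `dim_subscriber`, `dim_plan`) and the sample rows are hypothetical. They were chosen to echo the health-insurance example. Every dimension is a single join away from the fact table, and `product_category` is stored directly in the plan dimension (de-normalized).

```python
import sqlite3

# Hypothetical star schema: one fact table in the center, with
# de-normalized dimension tables joined to it by their primary keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_subscriber (
    subscriber_key  INTEGER PRIMARY KEY,
    subscriber_name TEXT
);
CREATE TABLE dim_plan (
    plan_key         INTEGER PRIMARY KEY,
    plan_name        TEXT,
    product_category TEXT  -- repeated for each plan: de-normalized
);
CREATE TABLE fact_claim (
    claim_id       INTEGER PRIMARY KEY,
    subscriber_key INTEGER REFERENCES dim_subscriber(subscriber_key),
    plan_key       INTEGER REFERENCES dim_plan(plan_key),
    claim_amount   REAL    -- the measure
);
INSERT INTO dim_subscriber VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO dim_plan VALUES (10, 'Gold PPO', 'Medical'), (20, 'Basic Dental', 'Dental');
INSERT INTO fact_claim VALUES (100, 1, 10, 250.0), (101, 1, 20, 80.0), (102, 2, 10, 400.0);
""")

# Each dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT p.product_category, SUM(f.claim_amount)
    FROM fact_claim AS f
    JOIN dim_plan AS p ON p.plan_key = f.plan_key
    GROUP BY p.product_category
    ORDER BY p.product_category
""").fetchall()
print(rows)  # [('Dental', 80.0), ('Medical', 650.0)]
```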
In a snowflake schema, the level of normalization increases. The fact table remains the
same as in a star schema, but the dimension tables are normalized. Because the dimension
tables form several layers, the diagram looks like a snowflake, hence the name snowflake
schema.
Figure 3: Snowflake schema
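For contrast, here is a hypothetical snowflake version of a small claim model, again using sqlite3 with illustrative names and data. The product category is normalized out of `dim_plan` into its own `dim_product_category` table. The fact table is unchanged, but the same per-category report now needs a second join.

```python
import sqlite3

# Hypothetical snowflake schema: dim_plan is normalized, so the category
# lives in a separate table one level further from the fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_plan (
    plan_key     INTEGER PRIMARY KEY,
    plan_name    TEXT,
    category_key INTEGER REFERENCES dim_product_category(category_key)
);
CREATE TABLE fact_claim (
    claim_id     INTEGER PRIMARY KEY,
    plan_key     INTEGER REFERENCES dim_plan(plan_key),
    claim_amount REAL
);
INSERT INTO dim_product_category VALUES (1, 'Medical'), (2, 'Dental');
INSERT INTO dim_plan VALUES (10, 'Gold PPO', 1), (11, 'Silver HMO', 1), (20, 'Basic Dental', 2);
INSERT INTO fact_claim VALUES (100, 10, 250.0), (101, 20, 80.0), (102, 11, 400.0);
""")

# Two joins are needed to reach the category: fact -> plan -> category.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.claim_amount)
    FROM fact_claim AS f
    JOIN dim_plan AS p ON p.plan_key = f.plan_key
    JOIN dim_product_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(rows)  # [('Dental', 80.0), ('Medical', 650.0)]
```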
Question #5) Which schema did you use in your project & why?
Question #6) Which schema is better – star or snowflake?
Ans (Combined for Q# 5&6): The choice of schema always depends on the project
requirements and scenarios.
Since a star schema is de-normalized, a query needs fewer joins, so queries are simpler and
usually run faster. A snowflake schema is normalized, so it requires more joins than a star
schema; queries become more complex and typically execute more slowly.
Another significant difference is that a snowflake schema contains less redundant data and
is therefore easier to maintain. A star schema, on the contrary, has more redundancy and is
harder to maintain.
Now, which one should you choose for your project? If the purpose of your project is mostly
dimension analysis, go for a snowflake schema. For example, if you need to find out "how
many subscribers tied to a particular plan are currently active?", go with a snowflake
model.
If the purpose of your project is mostly metrics analysis, go with a star schema. For
example, if you need to find out "what is the claim amount paid to a particular
subscriber?", go with a star schema.
In my project, we used a snowflake schema because we had to do analysis across several
dimensions and generate summary reports for the business. Another reason for using a
snowflake schema was that it consumes less storage.
Question#7) What do you understand by dimension and attribute?
Ans) Dimensions represent qualitative data. For example, plan, product and class are all
dimensions. A dimension table contains descriptive or textual attributes. For example,
product category and product name are attributes of the product dimension.
Question #8) What is a fact & a fact table?
Ans) Facts represent quantitative data. For example, net amount due is a fact. A fact
table contains numerical data and foreign keys from the related dimension tables. An
example of a fact table can be seen in Figure 2 above.
Question #9) What are the different types of dimensions you have come
across? Explain each of them in detail with an example.
Ans) Five types of dimensions are commonly discussed.
1) Conformed dimensions: A dimension that is used in several different areas is called a
conformed dimension. It may be used with different fact tables in a single database or
across multiple data marts/warehouses. For example, if the subscriber dimension is
connected to two fact tables, billing and claim, then the subscriber dimension is treated
as a conformed dimension.
2) Junk Dimension: A dimension table consisting of attributes that don't belong in the fact
table or in any of the existing dimension tables. Generally, these are properties like
flags or indicators: for example, a member eligibility flag set to 'Y' or 'N', any other
indicator set to true/false, specific comments, etc. If we keep all such indicator
attributes in the fact table, its size increases. So, we combine all such attributes into
a single dimension table, called a junk dimension, which has a unique junk ID for each
possible combination of the indicator values.
3) Role Playing Dimension: A dimension that is used for multiple purposes in the same
database. For example, a date dimension can be used for "Date of Claim", "Billing date" or
"Plan Term date"; such a dimension is called a role-playing dimension. The primary key of
the date dimension is associated with multiple foreign keys in the fact table.
4) Slowly Changing Dimension (SCD): These are among the most important dimensions. They are
dimensions whose attribute values change over time. Below are the various types of SCDs:
- Type-0: Dimensions whose attribute values remain fixed over time. For example, a
  subscriber's DOB is a Type-0 attribute because it always stays the same, irrespective
  of time.
- Type-1: Dimensions where the previous value of the attribute is replaced by the current
  value. No history is maintained in a Type-1 dimension. For example, a subscriber's
  address (where the business needs to keep only the subscriber's current address) can be
  a Type-1 dimension.
- Type-2: Dimensions where the full history is preserved. For example, a subscriber's
  address (where the business needs to keep a record of all the subscriber's previous
  addresses). In this case, a new row is inserted into the table for each of the
  subscriber's addresses.
  Some column(s) identify the current address, for example 'Start date' and 'End date'.
  The row whose 'End date' is blank holds the subscriber's current address, and all other
  rows hold previous addresses.
- Type-3: Dimensions where limited history is preserved using an additional column. For
  example, a subscriber's address (where the business needs to keep a record of the
  current address and just one previous address). In this case, we can split the
  'address' column into two columns: 'current address' and 'previous address'.
  So, instead of multiple rows, there is just one row showing both the current and the
  previous address of the subscriber.
- Type-4: The historical data is preserved in a separate table, and the main dimension
  table holds only the current data.
  For example, the main dimension table has only one row per subscriber, holding the
  current address, and all previous addresses are kept in the separate history table.
  This type of dimension is rarely used.
5) Degenerate Dimension: A degenerate dimension is a dimension attribute that is not a fact
but is present in the fact table as a key (often as part of its primary key), for example
a claim number. It does not have its own dimension table, so it can also be called a
single-attribute dimension.
Instead of keeping it separately in a dimension table and adding an extra join, we put
this attribute directly into the fact table as a key. Since it has no dimension table of
its own, it can never act as a foreign key in the fact table.
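The Type-2 behaviour described under item 4 can be sketched with sqlite3. The `dim_subscriber` table is hypothetical. It has a surrogate key, the subscriber's natural ID, and 'start date'/'end date' columns, where a NULL end date marks the current row. A change closes the current row and inserts a new one, so the full history is kept. A Type-1 change, by contrast, would simply UPDATE the address in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_subscriber (
        subscriber_key INTEGER PRIMARY KEY,  -- surrogate key: one per version of the row
        subscriber_id  TEXT NOT NULL,        -- natural (business) key
        address        TEXT,
        start_date     TEXT,
        end_date       TEXT                  -- NULL marks the current row
    )
""")


def scd2_change_address(conn, subscriber_id, new_address, change_date):
    """Type-2 change: close the current row, then insert a new current row."""
    conn.execute(
        "UPDATE dim_subscriber SET end_date = ? "
        "WHERE subscriber_id = ? AND end_date IS NULL",
        (change_date, subscriber_id),
    )
    conn.execute(
        "INSERT INTO dim_subscriber (subscriber_id, address, start_date, end_date) "
        "VALUES (?, ?, ?, NULL)",
        (subscriber_id, new_address, change_date),
    )
    conn.commit()


# Initial load: one current row for the hypothetical subscriber S1.
conn.execute(
    "INSERT INTO dim_subscriber (subscriber_id, address, start_date, end_date) "
    "VALUES ('S1', '12 Oak St', '2020-01-01', NULL)"
)
scd2_change_address(conn, "S1", "9 Elm Rd", "2023-06-01")

history = conn.execute(
    "SELECT address, start_date, end_date FROM dim_subscriber "
    "WHERE subscriber_id = 'S1' ORDER BY start_date"
).fetchall()
current_address = conn.execute(
    "SELECT address FROM dim_subscriber "
    "WHERE subscriber_id = 'S1' AND end_date IS NULL"
).fetchone()[0]
print(history)
print(current_address)  # 9 Elm Rd
```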
Question#10) Give your idea of a factless fact table. Why do we use it?
Ans) A factless fact table is a fact table that contains no measures; it has only
dimension keys.
Certain business situations call for a factless fact table. For example, suppose you
maintain an employee attendance record system; you can have a factless fact table with
three keys:
Employee_ID
Department_ID
Time_ID
The table above does not contain any measure. Still, if you want to answer the question
below, you can do it easily with this single factless fact table rather than with two
separate fact tables:
"How many employees of a particular department were present on a particular day?"
So, a factless fact table adds flexibility to the design.
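The attendance question can be answered from the factless table alone, as a minimal sqlite3 sketch shows. Here `time_id` is simplified to a plain date string instead of a key into a time dimension, and the rows are made up. Because each row records that an event happened, `COUNT(*)` plays the role of the measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_attendance (
    employee_id   INTEGER,
    department_id INTEGER,
    time_id       TEXT,   -- simplified: a calendar date instead of a time-dimension key
    PRIMARY KEY (employee_id, time_id)
);
INSERT INTO fact_attendance VALUES
    (1, 10, '2024-03-01'),
    (2, 10, '2024-03-01'),
    (3, 20, '2024-03-01'),
    (1, 10, '2024-03-02');
""")


def present_count(conn, department_id, day):
    """How many employees of a department were present on a given day."""
    # No measure column exists; counting the event rows is the measure.
    return conn.execute(
        "SELECT COUNT(*) FROM fact_attendance WHERE department_id = ? AND time_id = ?",
        (department_id, day),
    ).fetchone()[0]


print(present_count(conn, 10, "2024-03-01"))  # 2
```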
Question #11) Distinguish between OLTP and OLAP.
Ans) OLTP stands for Online Transaction Processing system and OLAP stands for Online
Analytical Processing system. OLTP maintains the transactional data of the business and is
generally highly normalized. OLAP, on the contrary, is for analysis and reporting and is
generally de-normalized.
This difference also guides the choice of design. An OLTP system is usually designed as a
normalized (3NF) entity-relationship model, whereas an OLAP system uses a dimensional
design such as a star or snowflake schema.
Question #12) What do you understand by data mart?
Ans) Data marts are mostly intended for a single line of business; they are designed for
individual departments. For example, I used to work for a health insurance provider that
had different departments such as Finance, Reporting, Sales and so forth.
We had a data warehouse holding the information for all these departments, and a few data
marts built on top of this data warehouse. These data marts were specific to each
department. In simple words, a data mart is a subset of a data warehouse.
Question #13) What are the different types of measures?
Ans) We have three types of measures:
- Non-additive measures
- Semi-additive measures
- Additive measures
Non-additive measures cannot be meaningfully summed across any dimension. For example, a
ratio or percentage column, or a flag or indicator column in the fact table holding values
like Y/N, is a non-additive measure.
Semi-additive measures can be summed across some dimensions but not others. Example: an
account balance can be summed across accounts for a single date, but not across dates.
Additive measures can be summed across all dimensions. Example: units purchased.
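The additive vs semi-additive distinction can be shown in plain Python over a tiny, made-up daily snapshot of two accounts. Deposits are additive, so they can be summed over everything, including time. Balances are semi-additive: they can be summed across accounts for one date, but across time the usual choice is the latest value per account. Summing balances over dates produces a number that is not a real balance.

```python
# Hypothetical daily snapshots: (date, account, balance, deposits).
snapshots = [
    ("2024-01-01", "A", 100, 100),
    ("2024-01-01", "B", 50, 50),
    ("2024-01-02", "A", 120, 20),
    ("2024-01-02", "B", 50, 0),
]

# Additive: deposits can be summed across every dimension, time included.
total_deposits = sum(d for _, _, _, d in snapshots)


# Semi-additive: balances may be summed across accounts for ONE date...
def total_balance_on(day):
    return sum(b for dt, _, b, _ in snapshots if dt == day)


# ...but across dates we take the latest balance per account instead.
def latest_balance(account):
    rows = sorted((dt, b) for dt, acc, b, _ in snapshots if acc == account)
    return rows[-1][1]


naive_sum_over_time = sum(b for _, _, b, _ in snapshots)  # 320: not a meaningful balance

print(total_deposits, total_balance_on("2024-01-02"), latest_balance("A"))  # 170 170 120
```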
Question # 14) What is a Surrogate key? How is it different from a primary
key?
Ans) A surrogate key is a unique, system-generated identifier (typically a sequence
number) that can act as the primary key. It is usually a single column. Unlike a natural
primary key, it is not picked up from the existing application data fields.
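Here is a minimal sqlite3 sketch of a surrogate key next to a natural key. In the hypothetical `dim_member` table, `member_key` is generated by the database (`INTEGER PRIMARY KEY AUTOINCREMENT`), while `member_id` comes from the source data and is only kept unique. The sample IDs are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_member (
        member_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key: system generated
        member_id   TEXT NOT NULL UNIQUE,               -- natural key from application data
        member_name TEXT
    )
""")

for member_id, name in [("M-7781", "Asha"), ("M-1204", "Ben")]:
    cur = conn.execute(
        "INSERT INTO dim_member (member_id, member_name) VALUES (?, ?)", (member_id, name)
    )
    # lastrowid is the surrogate value the database just assigned.
    print(member_id, "->", cur.lastrowid)

keys = conn.execute(
    "SELECT member_id, member_key FROM dim_member ORDER BY member_key"
).fetchall()
```

Fact tables would then reference the compact `member_key` rather than the business identifier.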
Question # 15) Is it true that all databases should be in 3NF?
Ans) It is not mandatory for a database to be in 3NF. However, if your purpose is easy
maintenance of data, less redundancy and efficient updates, then you should go with a
normalized (3NF) database.
Question # 16) Have you ever come across a scenario involving recursive
relationships? If yes, how did you handle it?
Ans) A recursive relationship occurs when an entity is related to itself. Yes, I have come
across such a scenario.
In the health care domain, a health care provider (say, a doctor) may be a patient of
another health care provider: if the doctor falls ill and needs surgery, he will have to
visit some other doctor for the surgical treatment. In this case, the entity health care
provider is related to itself. The treating health care provider's number will have to be
present as a foreign key in each member's (patient's) record.
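A recursive relationship is usually implemented as a self-referencing foreign key. The sqlite3 sketch below assumes a hypothetical `provider` table whose `treated_by_provider_id` column points back at another row of the same table. Reading the relationship requires a self-join, with the table appearing twice under different aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE provider (
    provider_id   INTEGER PRIMARY KEY,
    provider_name TEXT NOT NULL,
    -- Recursive relationship: references another row of the same table.
    treated_by_provider_id INTEGER REFERENCES provider(provider_id)
);
INSERT INTO provider VALUES (1, 'Dr. Rao', NULL);
INSERT INTO provider VALUES (2, 'Dr. Lee', 1);  -- Dr. Lee was treated by Dr. Rao
""")

# Self-join: 'patient' and 'doctor' are two aliases of the same table.
pairs = conn.execute("""
    SELECT patient.provider_name, doctor.provider_name
    FROM provider AS patient
    JOIN provider AS doctor ON doctor.provider_id = patient.treated_by_provider_id
""").fetchall()
print(pairs)  # [('Dr. Lee', 'Dr. Rao')]
```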
Question # 17) List a few common mistakes encountered during Data
Modelling.
Ans) Below are a few common mistakes encountered during Data Modelling:
- Building massive data models: Large data models are likely to have more design faults.
  As a rule of thumb, try to restrict your data model to no more than 200 tables.
- Lack of purpose: If you do not know what your business solution is intended for, you
  might come up with an incorrect data model. So, clarity on the business purpose is very
  important for coming up with the right data model.
- Inappropriate use of surrogate keys: Surrogate keys should not be used unnecessarily.
  Use a surrogate key only when the natural key cannot serve the purpose of a primary key.
- Unnecessary de-normalization: Don't de-normalize unless you have a solid and clear
  business reason to do so, because de-normalization creates redundant data, which is
  difficult to maintain.
Question #18) How many child tables can be created out of a single parent table?
Ans) The number of child tables that can be created out of a single parent table is equal
to the number of non-key fields/columns in the parent table.
Question #19) Employee health details are hidden from his employer by the
health care provider. Which level of data hiding is this? Conceptual, physical or
external?
Ans) This is the scenario of an external level of data hiding.
Question #20) What is the form of the fact table & dimension table?
Ans) Generally, the fact table is in normalized form and the dimension tables are in
de-normalized form.
Question # 21) What particulars would you need to come up with a conceptual
model in a health care domain project?
Ans) For a health care project, the details below would suffice to design a basic
conceptual model:
- Different categories of health care plans and products.
- Type of subscription (group or individual).
- Set of health care providers.
- Claim and billing process overview.
Question # 22) Tricky one: If a unique constraint is applied to a column, will it throw an
error if you try to insert two nulls into it?
Ans) In most databases (for example Oracle, PostgreSQL, MySQL and SQLite), no error is
thrown, because one null value is not considered equal to another null value, so more than
one null can be inserted into the column. SQL Server is an exception: its unique
constraint allows only one null in the column.
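The behaviour can be checked directly in SQLite, which is one of the databases that accept several NULLs in a UNIQUE column. The `member` table and its `ssn` column below are hypothetical. Two NULLs are accepted, but a repeated non-NULL value is rejected with `sqlite3.IntegrityError`. Running the same script against SQL Server would fail on the second NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (member_id INTEGER PRIMARY KEY, ssn TEXT UNIQUE)")

# Two NULLs: accepted, because NULL is not equal to NULL.
conn.execute("INSERT INTO member (ssn) VALUES (NULL)")
conn.execute("INSERT INTO member (ssn) VALUES (NULL)")

# Two equal non-NULL values: the second one violates the UNIQUE constraint.
conn.execute("INSERT INTO member (ssn) VALUES ('X-100')")
duplicate_rejected = False
try:
    conn.execute("INSERT INTO member (ssn) VALUES ('X-100')")
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_rows = conn.execute("SELECT COUNT(*) FROM member WHERE ssn IS NULL").fetchone()[0]
print(null_rows, duplicate_rejected)  # 2 True
```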
Question # 23) Can you quote an example of sub-type and super-type entities?
Ans) Yes. Let's say we have these entities: vehicle, car, bike, economy car, family car and
sports car. Here, vehicle is a super-type entity, and car and bike are its sub-type
entities. Furthermore, economy car, sports car and family car are sub-type entities of
their super-type entity, car.
A super-type entity is one at a higher level. Sub-type entities are grouped together on
the basis of certain characteristics. For example, all bikes are two-wheelers and all cars
are four-wheelers; since both are vehicles, their super-type entity is 'vehicle'.
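One common way to implement a super-type/sub-type hierarchy in tables is a table per sub-type that shares the super-type's key. The sqlite3 sketch below takes that approach; it is an assumption here, not the only option. Shared attributes live in `vehicle`, and `car` and `bike` reuse `vehicle_id` as both primary key and foreign key. To keep it short, the car sub-types (economy/family/sports) are simplified to a checked `car_class` column rather than separate tables. All names and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Super-type: attributes shared by every vehicle.
CREATE TABLE vehicle (
    vehicle_id INTEGER PRIMARY KEY,
    maker      TEXT,
    wheels     INTEGER
);
-- Sub-types: the primary key is also a foreign key to the super-type.
CREATE TABLE car (
    vehicle_id INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
    car_class  TEXT CHECK (car_class IN ('economy', 'family', 'sports'))
);
CREATE TABLE bike (
    vehicle_id INTEGER PRIMARY KEY REFERENCES vehicle(vehicle_id),
    engine_cc  INTEGER
);
INSERT INTO vehicle VALUES (1, 'Maruti', 4), (2, 'Honda', 2);
INSERT INTO car VALUES (1, 'economy');
INSERT INTO bike VALUES (2, 150);
""")

# A sub-type row is read together with its inherited super-type attributes.
cars = conn.execute("""
    SELECT v.vehicle_id, v.maker, v.wheels, c.car_class
    FROM vehicle AS v JOIN car AS c ON c.vehicle_id = v.vehicle_id
""").fetchall()
print(cars)  # [(1, 'Maruti', 4, 'economy')]
```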
Question # 24) What is the significance of metadata?
Ans) Metadata is data about data. It tells you what kind of data is actually stored in the
system, what its purpose is and for whom it is intended.
To Summarize:
- A practical understanding of Data Modelling concepts, and of how they fit into the
  assignments you have done, is much needed to crack a data modeling interview.
- The most commonly asked topics in a Data Modelling interview are the different types of
  data models, types of schemas, types of dimensions and normalization.
- Be well prepared for scenario-based questions as well.
- Whenever you answer a question, it is better to explain the idea through an example.
  This shows that you have actually worked in that area and understand the core of the
  concept very well.
Lookup vs Master-Detail Relationship
What is a "Lookup Relationship"?
Up to 25 allowed per object.
Parent is not a required field.
No impact on security and access.
No impact on deletion.
Can be multiple layers deep.
Lookup field is not required.
What is a "Master-Detail Relationship"?
A master-detail relationship is a parent-child relationship in which the master represents
the parent and the detail represents the child. If the parent is deleted, the child also
gets deleted. Roll-up summary fields can only be created on master records; they calculate
the COUNT, SUM, MIN or MAX of the child records.
Up to 2 allowed per object.
Parent field on child is required.
Access to the parent determines access to the children.
Deleting the parent automatically deletes the child.
A child of one master-detail relationship cannot be the parent of another.
Lookup field on page layout is required.
----------------------------------------------
Actual Interview Questions
Question 1a: In what sequence is an SQL query executed?
Answer: The parser turns the SQL query into a parse tree; each step of that tree is then
executed, with intermediate result sets held in temporary tables.
Question 1b: In what sequence are the SQL query keywords SELECT, FROM, WHERE, HAVING and
GROUP BY executed internally?
Answer: Although a query is written starting with SELECT, it is logically processed in
this order:
FROM [table name]
WHERE [restricts rows early, hence optimizes]
GROUP BY [to aggregate]
HAVING [to filter on aggregates, e.g. SUM(salary) > 10000]
SELECT [to project the output columns]
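In standard SQL the clauses are logically processed as FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, even though SELECT is written first. The sqlite3 sketch below uses a made-up `employee` table to show the practical effect. WHERE removes individual rows before grouping, so D's 3000 never reaches the sums. HAVING then filters whole groups by their aggregate, which drops HR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
INSERT INTO employee VALUES
    ('A', 'IT', 6000), ('B', 'IT', 5000), ('C', 'HR', 4000),
    ('D', 'HR', 3000), ('E', 'Ops', 12000);
""")

# Numbers in the comments give the logical processing order of each clause.
rows = conn.execute("""
    SELECT dept, SUM(salary) AS total   -- 5. SELECT projects the output columns
    FROM employee                       -- 1. FROM picks the source rows
    WHERE salary > 3500                 -- 2. WHERE filters individual rows
    GROUP BY dept                       -- 3. GROUP BY forms the groups
    HAVING SUM(salary) > 10000          -- 4. HAVING filters whole groups
    ORDER BY dept                       -- 6. ORDER BY sorts the final result
""").fetchall()
print(rows)  # [('IT', 11000), ('Ops', 12000)]
```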
Question 2: What are the types of clustering in data modelling?
Dominance Grouping
Abstraction Grouping
Continuous Grouping
Relationship Grouping
Question 3: For a group insurance policy, what type of clustering is used: horizontal or
vertical?
My answer: Since a group policy is generally the same for everyone in a department, it may
be horizontal grouping. But if the implementation depends on designations or levels in the
organisation, then it is vertical. My answer was judged wrong.
Question 4: What is a surrogate key, and how is it used?
My Answer: A key made up of a combination of the foreign keys (IDs) from the primary
tables, taken jointly. (Strictly, such a key is a composite key; a surrogate key is a
single system-generated value, as described in Question # 14 above.)
A surrogate key reduces the time required to fetch records from the database, because only
compact IDs are fetched and matched, which is much faster than joining on whole columns.
Question 5: We are developing a claim portal. Claims are fetched using a surrogate key
built from many tables (policy details, insurance details, other details, etc.).
Problem: If a submitted claim is not successful, the surrogate key is lost. What might the
problem be, and how can we solve it?
Answer: Each time a claim is submitted, maintain a flag showing whether the claim was
successful or not. The flag is stored in a separate table along with the claim details.
Question 6: Filled Claim of previous data.
Question 7: What are the benefits of a VLDB, and how does it help?
Answer: A Very Large Database (VLDB) optimizes data fetching through partitioning: a query
can be answered by scanning only the relevant subset of partitions instead of the whole
table.
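The idea of partition pruning can be illustrated in plain Python. This is a toy model of range partitioning by claim year, not how any particular DBMS implements it. Claims are stored in per-year buckets, and a date-range query visits only the buckets whose year overlaps the range. The claim data is invented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claims "table" range-partitioned by claim year.
partitions = defaultdict(list)


def insert_claim(claim_id, claim_date, amount):
    partitions[claim_date.year].append((claim_id, claim_date, amount))


def total_claims_between(start, end):
    """Sum claim amounts in [start, end], scanning only overlapping partitions."""
    scanned = []
    total = 0.0
    for year in range(start.year, end.year + 1):  # partition pruning: skip other years
        if year not in partitions:
            continue
        scanned.append(year)
        total += sum(a for _, d, a in partitions[year] if start <= d <= end)
    return total, scanned


insert_claim(1, date(2021, 5, 1), 100.0)
insert_claim(2, date(2022, 3, 9), 250.0)
insert_claim(3, date(2022, 11, 20), 50.0)
insert_claim(4, date(2023, 1, 15), 75.0)

print(total_claims_between(date(2022, 1, 1), date(2022, 12, 31)))  # (300.0, [2022])
```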
Question 8: A master table has two child tables (Child table 1 and Child table 2). Is this
a star schema or a snowflake schema?
Answer: A master with two child tables under it falls under the SNOWFLAKE SCHEMA. Fact
tables in an Oracle data warehouse are generally in 3NF.
3rd Normal Form says: all non-prime attributes depend on the prime attributes, and there is
no transitive dependency.
Here, the master-child relationship exists only in a snowflake schema, where each
dimension is further broken down into child dimensions.
Question 10: What is a normalized form?
Answer: 1st Normal Form: no multivalued attributes; every column holds atomic values.
2nd Normal Form: 1NF, and every non-prime attribute depends on the whole primary key (no
partial dependency).
3rd Normal Form: 2NF plus no transitive dependency.
Question 11: Did you work with health insurance companies?
Answer: The world's largest, Blue Cross Blue Shield (BCBS), and terms like term insurance,
Anthem...
Question 12: Do you know health care insurance?
Question 13: How well do you know life insurance?
Answer: I studied for LOMA; certified in statistical
I have decent exposure to health care actuarial science, claim processing, machines needing
more intelligence, and underwriting.
As avg dtd.
Also, I am certified on 5 ETL/data warehouse tools and 5 BI tools, certified on
Many of my presentations, like SQL,
1. My data modelling presentations have been viewed 17,000 times per year since 2012.
https://www.slideshare.net/SandeepSharma65/oracle-complete-interview-questions
2. https://www.slideshare.net/SandeepSharma65/oracle-security-ols-vs-vpd
3. Data modelling of the semantic layer in BI
https://www.slideshare.net/SandeepSharma65/data-modelling-qlikview-45572297
4. Cassandra data modelling
https://www.slideshare.net/SandeepSharma65/cassandra-data-modelling-best-practices-45571733
5. https://www.slideshare.net/SandeepSharma65/cassandra-data-modelling-best-practices
Attribute Generalization Resulting from Entity Generalization
Surrogate Key Performance
The two arguments most commonly advanced against surrogate keys are
programming complexity and performance. Frequently, we need to access
a reference table to find the corresponding natural identifier. This situation
occurs often enough that programmers are frequently opponents of surrogate
keys. However, performance is not usually a problem if the reference
tables are small and can reside in primary storage.
The more common performance-related issue with surrogate keys is the
need for additional access mechanisms such as indexes to support access
on both the surrogate and natural keys.
In databases handling high volumes of new data, problems may also
arise with contention for “next available numbers.” However, many DBMSs
provide mechanisms specifically to generate unique key values efficiently.
Self Referencing Relationship
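-- Nth highest salary: replace N with the rank you want (e.g. N = 2 for the second highest).
-- Employee is referenced twice: Emp1 in the outer query, Emp2 in the correlated subquery.
SELECT * /* This is the outer query part */
FROM Employee Emp1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary);
-- Find id values that occur more than once in table1.
SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;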
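Both queries above can be exercised with sqlite3 on small, made-up `Employee` and `table1` tables. N is passed as a bound parameter instead of being edited into the SQL text. A salary is the Nth highest when exactly N-1 distinct salaries are greater than it, so tied salaries return more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Name TEXT, Salary INTEGER);
INSERT INTO Employee VALUES ('A', 9000), ('B', 7000), ('C', 7000), ('D', 5000);
CREATE TABLE table1 (id INTEGER);
INSERT INTO table1 VALUES (1), (2), (2), (3), (3), (3);
""")


def nth_highest_salary(conn, n):
    """Rows whose salary has exactly n-1 distinct salaries above it."""
    return conn.execute("""
        SELECT Emp1.Name, Emp1.Salary
        FROM Employee Emp1
        WHERE (? - 1) = (
            SELECT COUNT(DISTINCT Emp2.Salary)
            FROM Employee Emp2
            WHERE Emp2.Salary > Emp1.Salary)
        ORDER BY Emp1.Name
    """, (n,)).fetchall()


# ids that occur more than once, with their counts.
duplicates = conn.execute(
    "SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id) > 1 ORDER BY id"
).fetchall()

print(nth_highest_salary(conn, 2))  # [('B', 7000), ('C', 7000)]
print(duplicates)                   # [(2, 2), (3, 3)]
```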

Lnt and bbby Retail Houseare industry Case assignment  sandeep sharmaLnt and bbby Retail Houseare industry Case assignment  sandeep sharma
Lnt and bbby Retail Houseare industry Case assignment sandeep sharma
 
Risk management Consulting For Municipality
Risk management Consulting For MunicipalityRisk management Consulting For Municipality
Risk management Consulting For Municipality
 
GDPR And Privacy By design Consultancy
GDPR And Privacy By design ConsultancyGDPR And Privacy By design Consultancy
GDPR And Privacy By design Consultancy
 
Real implementation Blockchain Best Use Cases Examples
Real implementation Blockchain Best Use Cases ExamplesReal implementation Blockchain Best Use Cases Examples
Real implementation Blockchain Best Use Cases Examples
 
Ffd 05 2012
Ffd 05 2012Ffd 05 2012
Ffd 05 2012
 
Biztalk architecture for Configured SMS service
Biztalk architecture for Configured SMS serviceBiztalk architecture for Configured SMS service
Biztalk architecture for Configured SMS service
 
Pmo best practices
Pmo best practicesPmo best practices
Pmo best practices
 
Agile project management
Agile project managementAgile project management
Agile project management
 
Enroll hostel Business Model
Enroll hostel Business ModelEnroll hostel Business Model
Enroll hostel Business Model
 
Cloud manager client provisioning guideline draft 1.0
Cloud manager client provisioning guideline draft 1.0Cloud manager client provisioning guideline draft 1.0
Cloud manager client provisioning guideline draft 1.0
 
Bpm digital transformation
Bpm digital transformationBpm digital transformation
Bpm digital transformation
 
Digital transformation explained
Digital transformation explainedDigital transformation explained
Digital transformation explained
 
Government Digital transformation trend draft 1.0
Government Digital transformation trend draft 1.0Government Digital transformation trend draft 1.0
Government Digital transformation trend draft 1.0
 
Enterprise architecture maturity rating draft 1.0
Enterprise architecture maturity rating draft 1.0Enterprise architecture maturity rating draft 1.0
Enterprise architecture maturity rating draft 1.0
 
Organisation Structure For digital Transformation Team
Organisation Structure For digital Transformation TeamOrganisation Structure For digital Transformation Team
Organisation Structure For digital Transformation Team
 
Case study haad operating model improvement model
Case study  haad operating model improvement modelCase study  haad operating model improvement model
Case study haad operating model improvement model
 

Último

Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 

Último (20)

Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 

Data modelling interview question

Question #1) What do you understand by Data Modelling?
Ans) Data modelling is the diagrammatic representation of how entities are related to each other. It is the first step towards database design: we first create the conceptual model, then the logical model, and finally the physical model. Data models are generally created in the data analysis and design phase of the software development life cycle.

Question #2) Explain your understanding of different data models.
Ans) There are three types of data models: conceptual, logical and physical. The level of complexity and detail increases from the conceptual to the logical to the physical data model. The conceptual model shows a very basic, high-level design, while the physical data model shows a very detailed view of the design.

The conceptual model portrays only entity names and entity relationships. Figure 1, shown later in this article, depicts a conceptual model.

The logical model shows entity names, entity relationships, attributes, and the primary and foreign keys of each entity. Figure 2, shown in Question #4 of this article, depicts a logical model.

The physical data model shows primary keys, foreign keys, table names, column names and column data types. This view elaborates how the model will actually be implemented in the database.

Question #3) Throw some light on your experience in Data Modelling with respect to the projects you have worked on to date.
Note: This was the very first question in one of my Data Modelling interviews. So, before you step into the interview discussion, you should have a very clear picture of how data modelling fits into the assignments you have worked on.
Ans) I worked on a project for a health insurance provider where we had interfaces built in Informatica that transform and process the data fetched from the Facets database and send useful information out to vendors.
Note: Facets is an end-to-end solution for managing all the information in the health care industry.

The Facets database in my project was created with SQL Server 2012. We had different entities that were linked together: subscriber, member, healthcare provider, claim, bill, enrollment, group, eligibility, plan/product, commission, capitation, etc. The conceptual data model in Figure 1 shows how the project looked at a high level.
Figure 1: Conceptual data model of the project
Each data entity has its own data attributes. Some examples:
- A data attribute of a provider is the provider identification number.
- A few data attributes of membership are the subscriber ID and the member ID.
- One of the data attributes of a claim is the claim ID.
- Each healthcare product or plan has a unique product ID.

Question #4) What are the different design schemas in Data Modelling? Explain with an example.
Ans) There are two different kinds of schemas in data modelling:
- Star schema
- Snowflake schema

I will explain each of these schemas one by one.

The simplest is the star schema, in which a fact table in the centre references multiple dimension tables around it. All the dimension tables are connected to the fact table, and the primary key of each dimension table acts as a foreign key in the fact table. The ER diagram of this schema (see Figure 2) resembles the shape of a star, which is why it is called a star schema.
Figure 2: Star schema

The star schema is simple and flexible, and it is in de-normalized form.

In a snowflake schema, the level of normalization increases. The fact table remains the same as in the star schema, but the dimension tables are normalized. Because of the several layers of dimension tables, the diagram looks like a snowflake, hence the name snowflake schema.
Figure 3: Snowflake schema
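The difference between the two schemas can be made concrete with a small example. The sketch below uses Python's standard-library `sqlite3` module. The tables (`fact_claim`, `dim_plan`, `dim_category`) and the sample rows are hypothetical. The question asked of both schemas is "total amount paid per plan category", chosen to echo the health-insurance example above.

Both schemas return the same answer. The snowflake version needs one extra join because the plan dimension has been normalized into separate plan and category tables.

```python
import sqlite3

# Hypothetical sample data: (plan_key, plan_name, category) and (claim_id, plan_key, paid_amount).
PLANS = [(1, "Gold PPO", "Group"), (2, "Silver HMO", "Individual"), (3, "Bronze HMO", "Individual")]
CLAIMS = [("C1", 1, 120.0), ("C2", 1, 80.0), ("C3", 2, 50.0), ("C4", 3, 30.0)]

FACT_DDL = """
    CREATE TABLE fact_claim (
        claim_id    TEXT PRIMARY KEY,
        plan_key    INTEGER REFERENCES dim_plan(plan_key),  -- dimension PK used as FK
        paid_amount REAL                                     -- the measure
    );
"""


def star_schema() -> sqlite3.Connection:
    """Star: the plan dimension is de-normalized (category name stored inline)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE dim_plan (plan_key INTEGER PRIMARY KEY, plan_name TEXT, category_name TEXT);"
        + FACT_DDL
    )
    conn.executemany("INSERT INTO dim_plan VALUES (?, ?, ?)", PLANS)
    conn.executemany("INSERT INTO fact_claim VALUES (?, ?, ?)", CLAIMS)
    return conn


def snowflake_schema() -> sqlite3.Connection:
    """Snowflake: the same dimension normalized into dim_plan -> dim_category."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT UNIQUE);"
        "CREATE TABLE dim_plan (plan_key INTEGER PRIMARY KEY, plan_name TEXT,"
        " category_key INTEGER REFERENCES dim_category(category_key));"
        + FACT_DDL
    )
    for name in sorted({category for _, _, category in PLANS}):
        conn.execute("INSERT INTO dim_category (category_name) VALUES (?)", (name,))
    keys = dict(conn.execute("SELECT category_name, category_key FROM dim_category"))
    conn.executemany(
        "INSERT INTO dim_plan VALUES (?, ?, ?)",
        [(plan_key, plan_name, keys[category]) for plan_key, plan_name, category in PLANS],
    )
    conn.executemany("INSERT INTO fact_claim VALUES (?, ?, ?)", CLAIMS)
    return conn


STAR_QUERY = """
    SELECT p.category_name, SUM(f.paid_amount)
    FROM fact_claim AS f
    JOIN dim_plan AS p ON p.plan_key = f.plan_key               -- one join
    GROUP BY p.category_name
    ORDER BY p.category_name
"""

SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.paid_amount)
    FROM fact_claim AS f
    JOIN dim_plan AS p     ON p.plan_key = f.plan_key           -- same join as the star
    JOIN dim_category AS c ON c.category_key = p.category_key   -- extra join from normalization
    GROUP BY c.category_name
    ORDER BY c.category_name
"""

star_rows = star_schema().execute(STAR_QUERY).fetchall()
snowflake_rows = snowflake_schema().execute(SNOWFLAKE_QUERY).fetchall()
print(star_rows)       # [('Group', 200.0), ('Individual', 80.0)]
print(snowflake_rows)  # same rows, one more join
```

In a real warehouse, the cost of the extra join depends on table sizes and indexing. Treat this as an illustration of the structural difference, not as a performance benchmark.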
Question #5) Which schema did you use in your project, and why?
Question #6) Which schema is better, star or snowflake?
Ans (combined for Q#5 & 6): The choice of schema always depends on the project requirements and scenarios. Since the star schema is de-normalized, a query needs fewer joins, so queries are simpler and run faster. The snowflake schema is normalized, so it requires more joins than a star schema; its queries are more complex and execute more slowly.

Another significant difference is redundancy. The snowflake schema contains much less redundant data and is therefore easier to maintain. The star schema has a high level of redundancy, which makes it harder to maintain.

Now, which one should you choose for your project?
- If the purpose of your project is mostly dimension analysis, go with a snowflake schema. For example, to find out "how many subscribers tied to a particular plan are currently active?", use the snowflake model.
- If the purpose of your project is mostly metrics analysis, go with a star schema. For example, to find out "what is the claim amount paid to a particular subscriber?", use a star schema.

In my project, we used the snowflake schema because we had to do analysis across several dimensions and generate summary reports for the business. Another reason for using the snowflake schema was its lower storage consumption.

Question #7) What do you understand by dimension and attribute?
Ans) Dimensions represent qualitative data. For example, plan, product and class are all dimensions. A dimension table contains descriptive or textual attributes. For example, product category and product name are attributes of the product dimension.

Question #8) What is a fact and a fact table?
Ans) Facts represent quantitative data. For example, net amount due is a fact.
A fact table contains numerical data and foreign keys from the related dimension tables. An example of a fact table can be seen in Figure 2 above.

Question #9) What are the different types of dimensions you have come across? Explain each of them in detail with an example.
Ans) There are typically five types of dimensions.

1) Conformed dimension: A dimension that is shared across different subject areas is called a conformed dimension. It may be used with different fact tables in a single database or across numerous data marts/warehouses. For example, if the subscriber dimension is connected to two fact tables, billing and claim, then the subscriber dimension is treated as a conformed dimension.

2) Junk dimension: A dimension table made up of attributes that have no place in the fact table or in any of the existing dimension tables. Generally, these are properties like flags or indicators. Examples include a member eligibility flag set to 'Y' or 'N', any other indicator set to true/false, and specific comments. If we keep all such indicator attributes in the fact table, its size increases. So we combine them into a single dimension table, called a junk dimension. It has one row, with a unique junk ID, for each possible combination of the indicator values.

3) Role-playing dimension: A dimension that is used for multiple purposes in the same database. For example, a date dimension can be used for "Date of Claim", "Billing date" or "Plan Term date", so such a dimension is called a role-playing dimension. The primary key of the date dimension is referenced by multiple foreign keys in the fact table.

4) Slowly Changing Dimension (SCD): These are the most important of all the dimensions. They are dimensions whose attribute values vary with time. Below are the various types of SCDs:
- Type-0: The attribute value remains fixed over time. For example, a subscriber's DOB is a Type-0 SCD because it always remains the same, irrespective of time.
- Type-1: The previous value of the attribute is replaced by the current value, and no history is maintained. For example, a subscriber's address can be a Type-1 dimension where the business needs to keep only the subscriber's current address.
- Type-2: Unlimited history is preserved.
  For example, a subscriber's address where the business needs to keep a record of all the subscriber's previous addresses. In this case, multiple rows are inserted in the table for the subscriber, one for each address. Some column(s), for example 'Start date' and 'End date', identify the current address. The row whose 'End date' is blank contains the subscriber's current address, and all other rows hold previous addresses.
- Type-3: Limited history is preserved, using an additional column. For example, a subscriber's address where the business needs to keep only the current address and one previous address. In this case, we can split the 'address' column into two columns, 'current address' and 'previous address'. So, instead of multiple rows, there is just one row showing both the current and the previous address of the subscriber.
- Type-4: The historical data is preserved in a separate table, and the main dimension table holds only the current data. For example, the main dimension table has only one row per subscriber, holding the current address. All previous addresses of the subscriber are kept in the separate history table. This type of dimension is rarely used.

5) Degenerate dimension: A dimension key that is not a fact but is stored in the fact table, often as part of its primary key, without a dimension table of its own. It can also be thought of as a single-attribute dimension. Instead of keeping it in a separate dimension table and adding an extra join, we put this attribute directly in the fact table as a key. Since it has no dimension table, it never acts as a foreign key in the fact table.

Question #10) Give your idea regarding a factless fact. Why do we use it?
Ans) A factless fact table is a fact table that contains no measures. It has only dimension keys.

At times, business situations arise where you need a factless fact table. For example, suppose you are maintaining an employee attendance record system. You can have a factless fact table with three keys:
- Employee_ID
- Department_ID
- Time_ID

This table does not contain any measure. Suppose you want to answer the question "How many employees of a particular department were present on a particular day?" You can do this easily with this single factless fact table rather than with two separate fact tables. So, a factless fact table offers flexibility in the design.

Question #11) Distinguish between OLTP and OLAP.
Ans) OLTP stands for Online Transaction Processing and OLAP stands for Online Analytical Processing. OLTP maintains the transactional data of the business and is generally highly normalized. OLAP, on the contrary, is for analysis and reporting and is in de-normalized form. This difference also guides the choice of schema design. An OLTP system is usually modelled in normalized (3NF) form, whereas an OLAP system uses a dimensional design such as a star or snowflake schema.

Question #12) What do you understand by a data mart?
Ans) Data marts are mostly intended for a single line of business and are designed for individual departments. For example, I used to work for a health insurance provider that had departments like Finance, Reporting and Sales. We had a data warehouse holding the information for all these departments, with a few data marts built on top of it, each specific to one department. In simple words, a data mart is a subset of a data warehouse.

Question #13) What are the different types of measures?
Ans) There are three types of measures:
- Non-additive measures: no aggregation function can be meaningfully applied to them. For example, a ratio or percentage column is non-additive, as is a flag/indicator column in the fact table holding values like Y/N.
- Semi-additive measures: they can be aggregated along some dimensions but not others. Example: account balance, which can be summed across accounts but not across time.
- Additive measures: they can be aggregated along all dimensions. Example: units purchased.

Question #14) What is a surrogate key? How is it different from a primary key?
Ans) A surrogate key is a unique identifier, typically a system-generated sequence number, that can act as a primary key. Unlike a natural primary key, it is not derived from the existing application data fields.
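Surrogate keys matter most for the Type-2 slowly changing dimensions described in Question #9. There, the business key (such as a subscriber ID) repeats once per historical version, so a system-generated key is needed to identify each row.

The sketch below is a minimal illustration, not a production loader. It uses Python's `sqlite3` module, a hypothetical `dim_subscriber` table and a hypothetical helper named `record_address_change`. The helper closes the current row by filling in its end date, then inserts a new version whose blank (NULL) end date marks it as current.

```python
import sqlite3

# Hypothetical Type-2 subscriber dimension.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_subscriber (
        subscriber_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
        subscriber_id  TEXT NOT NULL,                      -- natural/business key, repeats
        address        TEXT NOT NULL,
        start_date     TEXT NOT NULL,
        end_date       TEXT                                -- NULL (blank) marks the current row
    )
""")


def record_address_change(conn, subscriber_id, address, change_date):
    """Apply a Type-2 change and return the surrogate key of the current row."""
    current = conn.execute(
        "SELECT subscriber_key, address FROM dim_subscriber "
        "WHERE subscriber_id = ? AND end_date IS NULL",
        (subscriber_id,),
    ).fetchone()
    if current is not None and current[1] == address:
        return current[0]  # no change: keep the existing current version
    if current is not None:
        # Close the old version instead of overwriting it (overwriting would be Type-1).
        conn.execute(
            "UPDATE dim_subscriber SET end_date = ? WHERE subscriber_key = ?",
            (change_date, current[0]),
        )
    cur = conn.execute(
        "INSERT INTO dim_subscriber (subscriber_id, address, start_date) VALUES (?, ?, ?)",
        (subscriber_id, address, change_date),
    )
    return cur.lastrowid


k1 = record_address_change(conn, "S100", "12 Oak St", "2020-01-01")
k2 = record_address_change(conn, "S100", "9 Elm Ave", "2022-06-15")
k3 = record_address_change(conn, "S100", "9 Elm Ave", "2023-03-01")  # same address, no new row

history = conn.execute(
    "SELECT subscriber_key, address, start_date, end_date FROM dim_subscriber "
    "WHERE subscriber_id = ? ORDER BY subscriber_key",
    ("S100",),
).fetchall()
for row in history:
    print(row)
```

Fact rows loaded while a given address was current would store that version's `subscriber_key`. This is how a Type-2 dimension lets reports show the address that applied at the time of each claim.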
Question #15) Is it true that all databases should be in 3NF?
Ans) It is not mandatory for a database to be in 3NF. However, if your purpose is easy maintenance of data, less redundancy and efficient data modification, you should go with a normalized database.

Question #16) Have you ever come across a recursive relationship? If yes, how did you handle it?
Ans) A recursive relationship occurs when an entity is related to itself. Yes, I have come across such a scenario. In the health care domain, a health care provider (say, a doctor) may be a patient of another health care provider. If the doctor falls ill and needs surgery, he will have to visit another doctor for the surgical treatment. In this case, the entity "health care provider" is related to itself. A foreign key holding the health care provider's number has to be present in each member's (patient's) record.

Question #17) List a few common mistakes encountered during Data Modelling.
Ans) Below are a few common mistakes encountered during Data Modelling:
- Building massive data models: Large data models are likely to have more design faults. Try to restrict your data model to no more than 200 tables.
- Lack of purpose: If you do not know what your business solution is intended for, you might come up with an incorrect data model. So, clarity on the business purpose is very important for arriving at the right data model.
- Inappropriate use of surrogate keys: Don't use a surrogate key unnecessarily. Use a surrogate key only when the natural key cannot serve the purpose of a primary key.
- Unnecessary de-normalization: Don't de-normalize unless you have a solid and clear business reason. De-normalization creates redundant data, which is difficult to maintain.

Question #18) How many child tables can be created from a single parent table?
Ans) The number of child tables that can be created from a single parent table is equal to the number of non-key fields/columns in the parent table.

Question #19) Employee health details are hidden from the employer by the health care provider. Which level of data hiding is this: conceptual, physical or external?
Ans) This is a scenario of external-level data hiding.

Question #20) What is the form of the fact table and the dimension table?
Ans) Generally, the fact table is in normalized form and the dimension table is in de-normalized form.

Question #21) What particulars would you need to come up with a conceptual model in a health care domain project?
Ans) For a health care project, the details below would suffice to design a basic conceptual model:
- Different categories of health care plans and products.
- Type of subscription (group or individual).
- Set of health care providers.
- Overview of the claim and billing processes.

Question #22) Tricky one: If a unique constraint is applied to a column, will it throw an error if you try to insert two NULLs into it?
Ans) In most databases, no. A NULL is not considered equal to another NULL, so more than one NULL can be inserted into the column without any error. SQL Server is a notable exception: there, a unique constraint allows only one NULL.

Question #23) Can you quote an example of sub-type and super-type entities?
Ans) Yes. Let's say we have these entities: vehicle, car, bike, economy car, family car and sports car. Here, vehicle is a super-type entity, and car and bike are its sub-type entities. Furthermore, economy car, sports car and family car are sub-type entities of their super-type entity, car.

A super-type entity is the one at a higher level. Sub-type entities are grouped together on the basis of certain characteristics. For example, all bikes are two-wheelers and all cars are four-wheelers. Since both are vehicles, their super-type entity is "vehicle".

Question #24) What is the significance of metadata?
Ans) Metadata is data about data. It tells you what kind of data is actually stored in the system, what its purpose is and whom it is intended for.

To summarize:
- A practical understanding of Data Modelling concepts, and of how they fit into the assignments you have done, is essential to crack a data modelling interview.
- The most commonly asked topics in Data Modelling interviews are the different types of data models, types of schemas, types of dimensions and normalization.
- Be well prepared for scenario-based questions as well.
- Whenever you answer a question, it is better to explain the idea through an example. This shows that you have actually worked in that area and understand the core of the concept very well.
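The NULL behaviour from Question #22 can be checked quickly with Python's `sqlite3` module. Like most databases, SQLite accepts several NULLs in a UNIQUE column while still rejecting duplicate non-NULL values. The `member` table and its `alternate_id` column are hypothetical. On SQL Server, the second NULL insert would instead violate the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical member table with an optional alternate identifier that must be unique.
conn.execute("CREATE TABLE member (member_key INTEGER PRIMARY KEY, alternate_id TEXT UNIQUE)")

# Two NULLs: accepted, because one NULL is never considered equal to another.
conn.execute("INSERT INTO member (alternate_id) VALUES (NULL)")
conn.execute("INSERT INTO member (alternate_id) VALUES (NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM member WHERE alternate_id IS NULL"
).fetchone()[0]

# Two equal non-NULL values: the second insert violates the UNIQUE constraint.
conn.execute("INSERT INTO member (alternate_id) VALUES ('ALT-1')")
try:
    conn.execute("INSERT INTO member (alternate_id) VALUES ('ALT-1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(null_rows, duplicate_rejected)  # 2 True
```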
Lookup for ID vs. Reference
What is a “Lookup Relationship”?
- Up to 25 allowed per object.
- Parent is not a required field.
- No impact on security and access.
- No impact on deletion.
- Can be multiple layers deep.
- Lookup field is not required.
What is a “Master-Detail Relationship”?
A Master-Detail relationship is a parent-child relationship in which the master represents the parent and the detail represents the child.
- Roll-up summary fields can only be created on master records; they calculate the COUNT, SUM, MIN or MAX of the child records.
- Up to 2 allowed per object.
- Parent field on the child is required.
- Access to the parent determines access to the children.
- Deleting the parent automatically deletes the child.
- A child of one master-detail relationship cannot be the parent of another.
- Lookup field on the page layout is required.
----------------------------------------------
Actual Interview Questions
Question 1a: In what sequence is an SQL query executed?
Answer: The parser turns the SQL query into a parse tree; each step of that tree is then executed, with the intermediate data sets held in temporary tables.
Question 1b: In what sequence are the keywords of an SQL query (SELECT, FROM, WHERE, HAVING, GROUP BY) executed internally?
Answer: Logically, FROM [table name] comes first, then WHERE [restricts rows, which reduces the work of later steps], then GROUP BY [to aggregate], then HAVING [to filter on aggregates, e.g. SUM(salary) > 10000], and only then SELECT (followed by ORDER BY, if present).
Question 2: What are the types of clustering in data modelling?
- Dominance grouping
- Abstraction grouping
- Continuous grouping
- Relationship grouping
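The keyword order in Question 1b can be seen in a small sketch, again using Python's sqlite3 module; the employee table, its rows and the 10,000 threshold are hypothetical. Because WHERE removes rows before GROUP BY forms groups, adding it changes which departments survive the HAVING filter on SUM(salary).

```python
import sqlite3

# Hypothetical sample data: (dept, salary)
ROWS = [("A", 6000), ("A", 5000), ("B", 3000), ("B", 4000), ("B", 5000)]

def department_totals(min_salary=None):
    """Departments whose salary total exceeds 10000, optionally dropping low salaries first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", ROWS)
    where_clause = "WHERE salary > ?" if min_salary is not None else ""
    params = (min_salary,) if min_salary is not None else ()
    # Written order differs from logical order:
    # FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY
    sql = f"""
        SELECT dept, SUM(salary) AS total
        FROM employee
        {where_clause}
        GROUP BY dept
        HAVING SUM(salary) > 10000
        ORDER BY dept
    """
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result

print(department_totals())      # [('A', 11000), ('B', 12000)]
print(department_totals(3500))  # [('A', 11000)]
```

Without WHERE, department B totals 12000 and passes HAVING; once the 3000 row is filtered out before grouping, B totals 9000 and is dropped.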
Question 3: For a group insurance policy, what type of clustering is used, horizontal or vertical?
My answer: Since a group policy is generally the same for everyone in a department, it may be horizontal grouping. But if the implementation depends on designations or levels in the organisation, then it is vertical. My answer was wrong, based …
Question 4: What is a surrogate key and how is it used?
My answer: A key made up of a combination of foreign keys from the primary tables, i.e. many such IDs taken jointly. A surrogate key reduces the time required to fetch records from the database, because only the short ID is fetched and matched, which is much faster than joining on whole columns.
Note: strictly, a key formed by combining foreign keys is a composite key. A surrogate key is a system-generated identifier (typically a sequence or identity number) with no business meaning, used in place of the natural key.
Question 5: We are developing a claim portal. Claims are fetched using a surrogate key from many tables (policy details, insurance details, other details, etc.). Problem: if a submitted claim is not successful, the surrogate key is lost. What might the problem be, and how can we solve it?
Answer: Each time a claim is submitted, maintain a flag showing whether the claim was successful or not. The flag is stored separately, in a table alongside the claim details.
Question 6: Filled claim of previous data.
Question 7: What benefits does a VLDB bring, and how does it help?
Answer: In a Very Large Database (VLDB), data is kept in partitions, so a query can be answered by reading only the relevant subset of partitions instead of the whole table, which speeds up data fetching.
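Questions 4 and 5 can be tied together in a minimal sketch, assuming a single hypothetical claim table in SQLite (via Python's sqlite3): claim_id is a database-generated surrogate key, claim_number stands in for the natural key, and a status column plays the role of the success/failure flag. Because the row is written before processing starts, its surrogate key survives a failed submission and failed claims can be found again. The table, columns and status values are illustrative assumptions, not the portal's actual design.

```python
import sqlite3

def create_schema(conn):
    """Hypothetical claim table with a surrogate key and a status flag."""
    conn.execute(
        """
        CREATE TABLE claim (
            claim_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, generated by the DB
            claim_number TEXT NOT NULL UNIQUE,               -- natural (business) key
            status       TEXT NOT NULL DEFAULT 'PENDING'
                         CHECK (status IN ('PENDING', 'SUBMITTED', 'FAILED'))
        )
        """
    )

def submit_claim(conn, claim_number, succeeds):
    """Persist the claim first so its surrogate key exists even if submission fails."""
    cur = conn.execute("INSERT INTO claim (claim_number) VALUES (?)", (claim_number,))
    claim_id = cur.lastrowid
    # Real downstream processing would go here; `succeeds` simulates its outcome.
    status = "SUBMITTED" if succeeds else "FAILED"
    conn.execute("UPDATE claim SET status = ? WHERE claim_id = ?", (status, claim_id))
    conn.commit()
    return claim_id

conn = sqlite3.connect(":memory:")
create_schema(conn)
first = submit_claim(conn, "CLM-001", succeeds=True)
second = submit_claim(conn, "CLM-002", succeeds=False)
failed = conn.execute(
    "SELECT claim_id, claim_number FROM claim WHERE status = 'FAILED'"
).fetchall()
print(first, second)  # 1 2
print(failed)         # [(2, 'CLM-002')]
```

Keeping the flag on the claim row is one reading of the answer above; a separate status table keyed by claim_id would work the same way.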
Question 8: A master table has two child tables under it (child table 1 and child table 2). Is this a star schema or a snowflake schema?
Answer: A master with two child tables under it comes under the SNOWFLAKE SCHEMA. Fact tables in an Oracle data warehouse are typically in 3NF. Third normal form says: every non-prime attribute depends on the prime attributes, and there is no transitive dependency. Here the master-child relationship exists only in a snowflake schema, where each dimension is further broken into child dimensions.
Question 10: What are the normal forms?
Answer:
- 1st normal form: no multivalued attributes (every column holds atomic values).
- 2nd normal form: 1NF, plus every non-prime attribute depends on the whole of the prime (key) attributes, with no partial dependency.
- 3rd normal form: 2NF, plus no transitive dependency.
Question 11: Did you work with health insurance companies?
Answer: The world's largest, Blue Cross Blue Shield (BCBS); … terms like term insurance; … Anthem …
Question 12: Do you know healthcare insurance?
Question 13: How well do you know life insurance?
Answer: I studied for LOMA and am certified in statistics. I have decent exposure to healthcare actuarial science, claim processing, machines needing more intelligence, and underwriting. As avg dtd. I am also certified on 5 ETL/data warehouse tools and 5 BI tools, and certified on many of the topics of my presentations, such as SQL. My data modelling presentations have been viewed 17,000 times per year since 2012:
1. https://www.slideshare.net/SandeepSharma65/oracle-complete-interview-questions
2. https://www.slideshare.net/SandeepSharma65/oracle-security-ols-vs-vpd
3. Data modelling semantic layer on BI: https://www.slideshare.net/SandeepSharma65/data-modelling-qlikview-45572297
4. Cassandra data modelling: https://www.slideshare.net/SandeepSharma65/cassandra-data-modelling-best-practices-45571733
5. https://www.slideshare.net/SandeepSharma65/cassandra-data-modelling-best-practices
Attribute Generalization Resulting from
Entity Generalization
Surrogate Key Performance
The two arguments most commonly advanced against surrogate keys are programming complexity and performance. Frequently, we need to access a reference table to find the corresponding natural identifier. This situation
occurs often enough that programmers are frequently opponents of surrogate keys. However, performance is not usually a problem if the reference tables are small and can reside in primary storage. The more common performance-related issue with surrogate keys is the need for additional access mechanisms, such as indexes, to support access on both the surrogate and natural keys. In databases handling high volumes of new data, problems may also arise with contention for “next available numbers.” However, many DBMSs provide mechanisms specifically to generate unique key values efficiently.
Self-Referencing Relationship
-- Nth highest salary: replace N with the desired rank (N = 1 gives the highest).
SELECT *                            /* This is the outer query part */
FROM Employee Emp1
WHERE (N - 1) = (                   /* Subquery starts here */
    SELECT COUNT(DISTINCT Emp2.Salary)
    FROM Employee Emp2
    WHERE Emp2.Salary > Emp1.Salary
);
-- IDs that occur more than once in table1 (duplicate detection).
SELECT id, COUNT(id)
FROM table1
GROUP BY id
HAVING COUNT(id) > 1;
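The Nth-highest-salary and duplicate-ID queries above can be tried against a small in-memory database with Python's sqlite3 module. This is a sketch under stated assumptions: the sample rows are hypothetical, N is passed as a bound parameter, the duplicate check runs against the same Employee table rather than table1, and the outer query selects DISTINCT Emp1.Salary instead of * so that tied salaries return a single value.

```python
import sqlite3

# Hypothetical sample rows (id, Salary); id 3 appears twice on purpose.
EMPLOYEES = [(1, 5000), (2, 7000), (3, 7000), (3, 4000), (4, 9000)]

def build_db():
    """Create an in-memory Employee table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (id INTEGER, Salary INTEGER)")
    conn.executemany("INSERT INTO Employee (id, Salary) VALUES (?, ?)", EMPLOYEES)
    return conn

def nth_highest_salary(conn, n):
    """Correlated-subquery Nth-highest query; returns None if there is no Nth salary."""
    row = conn.execute(
        """
        SELECT DISTINCT Emp1.Salary
        FROM Employee Emp1
        WHERE (? - 1) = (
            SELECT COUNT(DISTINCT Emp2.Salary)   -- distinct salaries above this one
            FROM Employee Emp2
            WHERE Emp2.Salary > Emp1.Salary
        )
        """,
        (n,),
    ).fetchone()
    return row[0] if row else None

def duplicate_ids(conn):
    """IDs that occur more than once, with their counts."""
    return conn.execute(
        "SELECT id, COUNT(id) FROM Employee GROUP BY id HAVING COUNT(id) > 1"
    ).fetchall()

conn = build_db()
print(nth_highest_salary(conn, 1))  # 9000
print(nth_highest_salary(conn, 2))  # 7000
print(duplicate_ids(conn))          # [(3, 2)]
```

The subquery runs once per outer row, so this form is easy to read but can be slow on large tables; window functions such as DENSE_RANK are a common alternative where the DBMS supports them.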