1.
ETL is abbreviated as
Correct Answer
B. Extract, Transform, Load
Explanation
ETL stands for Extract, Transform, Load. This refers to the process of extracting data from various sources, transforming it into a suitable format, and then loading it into a target database or data warehouse. The extraction phase involves gathering data from different sources, such as databases, files, or APIs. The transformation phase involves cleaning, filtering, and reformatting the data to meet the requirements of the target system. Finally, the transformed data is loaded into a database or data warehouse for further analysis and reporting. Therefore, option B) Extract, Transform, Load is the correct expansion of ETL.
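The three phases can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the source rows, field names, and the in-memory SQLite table standing in for the warehouse are all hypothetical.

```python
import sqlite3

# Hypothetical rows, as if extracted from a source file or API.
extracted = [
    {"id": 1, "name": " Alice ", "amount": "100.5"},
    {"id": 2, "name": "Bob", "amount": "nan"},    # bad record
    {"id": 3, "name": "Carol", "amount": "42.0"},
]

def transform(rows):
    """Clean and reformat: trim names, parse amounts, drop bad records."""
    clean = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        if amount != amount:  # NaN check: quality filter
            continue
        clean.append((r["id"], r["name"].strip(), amount))
    return clean

# Load into the target table (in-memory SQLite as a stand-in warehouse).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transform(extracted))
loaded = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(loaded)  # 2 -- the bad record was filtered out in the transform phase
```

Note how the transform phase both reformats (string amounts become floats) and quality-checks (the NaN record is rejected) before anything reaches the target table.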
2.
Arrange the following sequence
1. Data 2. Knowledge 3. Information 4. Insight
Correct Answer
B. 1, 3, 2, 4
Explanation
The correct arrangement of the sequence is 1,3,2,4. This is because data is raw facts and figures, which are then processed and organized to become information. Knowledge is gained by understanding and interpreting the information. Finally, insight is the deeper understanding or realization that comes from applying knowledge to a specific context. Therefore, the correct order is 1 (Data), 3 (Information), 2 (Knowledge), and 4 (Insight).
3.
OLTP systems are
Correct Answer
A. Operating Systems
Explanation
OLTP systems refer to Online Transaction Processing systems, which are used for managing and processing real-time transactions in a business. These systems are responsible for handling day-to-day operations such as order processing, inventory management, and customer interactions. They are not decision support systems or analytical systems, as their primary function is to facilitate transactional activities rather than analyzing data or generating reports. Nor are OLTP systems security systems, although they may have security features to protect the integrity and confidentiality of data. OLTP systems are, by definition, online transaction processing systems, not operating systems.
4.
Operational Databases are
Correct Answer
D. None of the above
Explanation
Operational databases are databases that are designed to support day-to-day operations of an organization. They are typically normalized databases, meaning that the data is organized efficiently to minimize redundancy and improve data integrity. Performance of operational databases can vary depending on various factors such as hardware, software, and database design. While operational databases may contain historical data, it is not a defining characteristic. Therefore, the correct answer is "None of the above".
5.
Top Down approach is suggested by
Correct Answer
B. Bill Inmon
Explanation
Bill Inmon is the correct answer because he is known for advocating the top-down approach in data warehouse design. Inmon believes that a data warehouse should be built using a normalized relational model, with a focus on integrating and consolidating data from various sources. This approach emphasizes the importance of designing a comprehensive data model and a detailed data architecture before implementing the physical database. In contrast, Ralph Kimball is known for advocating the bottom-up approach, which focuses on building data marts first and then integrating them into a data warehouse.
6.
It is an automated process of building a data warehouse that involves taking data from disparate source systems, converting it into a consistent form that can be loaded into the warehouse, and performing quality checks while building the data warehouse. It typically accounts for 70-80% of the effort in a data warehousing initiative.
Correct Answer
A. Data Staging
Explanation
Data staging is the correct answer because it refers to the process of taking data from disparate source systems and converting them into a consistent form that can be loaded into the data warehouse. This process also involves performing quality checks to ensure the data is accurate and reliable. Data staging is a crucial step in building a data warehouse and typically requires a significant amount of effort, accounting for 70-80% of the overall data warehousing initiative.
7.
Analytical Systems are
Correct Answer
D. All of the above
Explanation
Analytical systems are tools or platforms that contain historic data, which can be used for decision making and analysis purposes. These systems provide insights and help in making informed decisions based on the analysis of the available data. Therefore, option d) "All of the above" is the correct answer as it encompasses all the mentioned characteristics of analytical systems.
8.
Which of the following does not describe a data warehouse?
Correct Answer
D. Updateable
Explanation
Data warehouse is a central repository of integrated and subject-oriented data that is used for reporting and data analysis. It is designed to support complex queries and provide historical data for analysis. However, data warehouses are not updateable in the sense that they are not meant to be constantly updated with real-time data. Instead, they are typically loaded with data periodically, such as daily or weekly, to provide a consistent and stable dataset for analysis. Therefore, option d) is the correct answer as it does not describe the characteristic of a data warehouse.
9.
The data warehouse bus concept appears in which approach?
Correct Answer
A. Ralph Kimball
Explanation
The correct answer is a) Ralph Kimball. Ralph Kimball is associated with the concept of the Data Warehouse Bus Architecture. This concept involves organizing the data warehouse into a central "bus" structure, where data flows from different source systems to various dimensional data marts. This approach emphasizes the use of dimensional modeling and is known for its flexibility and simplicity in designing data warehouses.
10.
Which of the following is/are functions of the warehouse manager of a data warehouse?
1. Transforming and managing data
2. Backing up and archiving the data warehouse
3. Directing and managing queries
Correct Answer
A. Only 1
Explanation
The warehouse manager of a data warehouse is responsible for transforming and managing data. This involves tasks such as data integration, data cleansing, data transformation, and data quality management. The manager ensures that the data in the warehouse is accurate, consistent, and up-to-date. They also oversee the storage and organization of data within the warehouse. Backing up and archiving data, as well as directing and managing queries, are not typically within the scope of the warehouse manager's responsibilities.
11.
In an organization, the relationship between projects and employees is
Correct Answer
A. Many-to-many
Explanation
In an organization, the relationship between projects and employees is often a many-to-many relationship. This means that multiple employees can be assigned to multiple projects, and each project can have multiple employees working on it. This type of relationship allows for flexibility and collaboration within the organization, as employees can contribute to different projects and projects can benefit from the expertise of multiple employees.
12.
In a star schema, the referencing tables surrounding the central fact table are
Correct Answer
A. Dimensional tables
Explanation
In a star schema, the central tables are the fact tables which contain the measurements or metrics of the data being analyzed. These fact tables are surrounded by dimensional tables, which provide context and additional information about the data in the fact tables. Dimensional tables are used for categorizing and describing the data in the fact tables, allowing for easy analysis and reporting. Relational tables and temporary tables are not specific to star schemas and may or may not be used in conjunction with dimensional and fact tables.
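A minimal star-schema sketch in SQL, run here through Python's sqlite3 module. The table and column names are illustrative, not from the quiz: one fact table of sales measures surrounded by product and date dimensions, queried with a typical star join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Dimension tables provide the descriptive context for analysis.
cur.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, "
            "name TEXT, category TEXT)")
cur.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, "
            "year INTEGER, month INTEGER)")
# The central fact table holds measures plus foreign keys to each dimension.
cur.execute("""CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    amount      REAL)""")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
cur.execute("INSERT INTO dim_date VALUES (20240101, 2024, 1)")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 20240101, 10.0), (1, 20240101, 5.0)])
# A typical star join: aggregate fact measures grouped by a dimension attribute.
total = cur.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category""").fetchone()
print(total)  # ('Hardware', 15.0)
```

The dimension tables answer "what kind of thing was sold, and when"; the fact table only records "how much".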
13.
Arrange the following steps in the correct sequence
1. Look for essential transactions.
2. Check if the fact is a dimension.
3. Check if dimension is the fact.
4. Determine key dimensions.
Correct Answer
C. 1, 4, 2, 3
Explanation
The correct sequence for the given steps is as follows:
1. Look for essential transactions - This step involves identifying the transactions that are crucial and necessary for the process.
2. Determine key dimensions - Once the essential transactions are identified, the key dimensions related to those transactions are determined.
3. Check if dimension is the fact - In this step, it is checked whether the identified dimensions can serve as the main fact or not.
4. Check if the fact is a dimension - Finally, it is checked whether the main fact identified in the previous step can also function as a dimension in other contexts.
14.
Which of the following statements is/are true about fact data and dimensional data
1. Fact data represents a physical transaction that occurred at a point in time and, as such, is unlikely to change on an ongoing basis during the life of the data warehouse.
2. In general, dimensional data in a star schema or snowflake schema is designed to minimize the cost of change and is typically very low-volume data (i.e. under 5 GB).
3. Fact data will have only one foreign key, whereas reference data will have one primary key.
Correct Answer
D. Both 1 and 2
Explanation
Both statements 1 and 2 are true. Statement 1 explains that fact data represents physical transactions that have occurred and is unlikely to change during the life of the data warehouse. This means that once the transaction is recorded, it remains unchanged. Statement 2 states that dimensional data in a star schema or snowflake schema is designed to minimize the cost of change and is typically low volume data. This means that the structure of the dimensional data is optimized for efficient querying and analysis, and it is usually not a large dataset. Therefore, the correct answer is d) Both 1 and 2.
15.
Which of the following statements is/are false about OLAP tools
1. Do not learn
2. Create new knowledge
3. More powerful than data mining
4. Cannot search for new solutions
Correct Answer
B. Only 3
Explanation
OLAP tools are not more powerful than data mining. Data mining involves the process of discovering patterns and relationships in large datasets, often using complex algorithms and statistical techniques. It is a powerful tool for extracting valuable insights and knowledge from data. On the other hand, OLAP tools are primarily used for analyzing and querying multidimensional data, allowing users to perform advanced calculations and aggregations. While OLAP tools can provide valuable insights, they do not have the same level of complexity and advanced functionality as data mining tools. Therefore, statement 3 is false.
16.
Arrange the following
1. Staging Data
2. Persistent staging
3. History staging
4. ODS
5. Data warehouse
6. Datamart
Correct Answer
C. 1, 3, 2, 4, 5, 6
Explanation
The correct answer is C) 1, 3, 2, 4, 5, 6. This order represents the typical flow of data in a data warehousing environment. Staging data is the first step, where raw data is loaded into a staging area. Then, history staging is performed to track changes and maintain a historical record of data, and persistent staging retains that cleansed data across load cycles. Next, the data is loaded into the Operational Data Store (ODS), which acts as a central repository for current and integrated data. After that, the data is transformed and loaded into the data warehouse, which is a centralized and structured database for reporting and analysis. Finally, data marts are created to provide a subset of data tailored to specific business needs.
17.
What is Business Intelligence?
Correct Answer(s)
A. Process
B. Tool
C. Technique
Explanation
Business Intelligence refers to the process of collecting, analyzing, and interpreting data to make informed business decisions. It involves gathering data from various sources, transforming it into meaningful insights, and using those insights to drive strategic actions. While it can involve the use of tools and software, Business Intelligence is primarily a process that encompasses the entire lifecycle of data analysis, from data collection to reporting. It also involves the application of various techniques, such as data mining, data visualization, and predictive analytics, to extract valuable insights from the data.
18.
Who invented the top-down approach?
Correct Answer
A. Inmon
Explanation
Inmon is credited with inventing the top-down approach. This approach involves designing a data warehouse by starting with the overall business requirements and then breaking them down into smaller, more specific components. Inmon's approach emphasizes the importance of creating a centralized and integrated data model that can be used for various reporting and analysis purposes. This approach has been widely adopted in the field of data warehousing and has proven to be effective in managing and organizing large volumes of data.
19.
Which of the following is a data modeling technique used for analysis?
Correct Answer
A. OLAP
Explanation
OLAP stands for Online Analytical Processing, which is a data modeling technique used for analyzing and reporting data. It allows users to perform complex multidimensional analysis, such as slice-and-dice, drill-down, and roll-up operations, on large volumes of data. OLAP is specifically designed for decision support systems and provides a fast and efficient way to access and analyze data from multiple perspectives. It is commonly used in business intelligence applications to support strategic decision-making processes.
20.
The process of examining the data available in an existing data source and collecting statistics and information about that data is called
Correct Answer
C. Data Profiling
Explanation
Data profiling refers to the process of examining the data available in an existing data source and collecting statistics and information about the data. It involves analyzing the structure, content, and quality of the data to understand its characteristics and identify any issues or anomalies. This process helps in gaining insights into the data, such as its completeness, accuracy, consistency, and uniqueness. Data profiling is an essential step in data management and data quality improvement, as it provides a foundation for data cleansing, data integration, and data analysis.
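A minimal profiling sketch over an in-memory list of records. The column names and sample rows are invented for illustration; the statistics shown (row count, completeness, distinct-value count, most frequent value) are among the typical outputs of a profiling pass.

```python
from collections import Counter

# Hypothetical extract from an existing data source.
rows = [
    {"customer_id": "C1", "email": "a@x.com", "country": "IN"},
    {"customer_id": "C2", "email": None,      "country": "IN"},
    {"customer_id": "C2", "email": "b@x.com", "country": "US"},
]

def profile(rows, column):
    """Collect basic statistics about one column of a data source."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "completeness": len(non_null) / len(values),  # share of non-null values
        "distinct": len(set(non_null)),               # uniqueness indicator
        "top": Counter(non_null).most_common(1),      # most frequent value
    }

print(profile(rows, "email"))
```

Running the same function over `customer_id` would reveal that "C2" appears twice, the kind of anomaly (duplicate keys, missing values) that profiling is meant to surface before cleansing begins.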
21.
OLTP data is the combination of master data and
Correct Answer
A. Dimensional data
Explanation
OLTP data refers to online transaction processing data, which is the data generated from day-to-day operational activities of a business. It includes both master data, which is the core data about entities such as customers, products, and employees, and dimensional data, which is the data organized into dimensions or categories for analysis purposes. Aggregated data refers to summarized data, while details data refers to the raw, granular data. Therefore, the correct answer is dimensional data, as it is a part of OLTP data along with master data.
22.
What does a factless fact table contain?
Correct Answer
B. Multipart key only
Explanation
A factless fact table contains only a multipart key, meaning it consists of multiple foreign keys from different dimension tables. It does not contain any measures or numeric data. This type of fact table is used to represent events or occurrences that have no numerical value associated with them, such as a sales order being placed or a customer joining a loyalty program. The purpose of a factless fact table is to capture relationships between dimensions and provide a way to analyze the occurrences or events based on their attributes.
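A sketch of such a table, using a hypothetical student-attendance example in SQLite: the table holds nothing but a multipart key of dimension foreign keys, and analysis works by counting rows rather than summing a measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# A factless fact table: only foreign keys to dimensions, no numeric measures.
cur.execute("""CREATE TABLE fact_attendance (
    student_key INTEGER,
    course_key  INTEGER,
    date_key    INTEGER,
    PRIMARY KEY (student_key, course_key, date_key))""")
cur.executemany("INSERT INTO fact_attendance VALUES (?, ?, ?)",
                [(1, 10, 20240101),
                 (1, 10, 20240102),
                 (2, 10, 20240101)])
# The event itself is the fact: analysis counts occurrences.
per_course = cur.execute(
    "SELECT course_key, COUNT(*) FROM fact_attendance "
    "GROUP BY course_key").fetchone()
print(per_course)  # (10, 3) -- three attendance events for course 10
```

Each row records that an event happened (a student attended a course on a date); there is no amount to aggregate, so `COUNT(*)` is the natural measure.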
23.
Full form of ODS
Correct Answer
A. Operational data storage
Explanation
The correct answer is "Operational data storage." Operational data storage refers to a system or database that is used to store and manage real-time operational data within an organization. This type of storage is designed to support the day-to-day operations and decision-making processes of a business. It allows for the storage and retrieval of data related to transactions, customer interactions, inventory levels, and other operational activities.
24.
The key that substitutes for a natural primary key in a data warehouse is the
Correct Answer
D. Surrogate key
Explanation
A surrogate key is a key that is generated by the system and used as a substitution for a natural primary key in a data warehouse. It is typically a simple integer value that has no meaning or significance outside of the data warehouse. Surrogate keys are used to uniquely identify records and provide a stable identifier for data integration and manipulation purposes. They are often preferred over natural keys because they are more efficient for indexing, joining, and sorting operations in a data warehouse environment.
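The assignment of surrogate keys can be sketched as a simple lookup that hands out sequential integers. The class name and the `CUST-…` natural-key format are illustrative assumptions; real warehouses typically rely on database sequences or identity columns for the same effect.

```python
class SurrogateKeyGenerator:
    """Map natural keys (e.g. source-system customer codes) to
    system-generated integer surrogate keys."""

    def __init__(self):
        self._next = 1
        self._map = {}

    def key_for(self, natural_key):
        # First sighting of a natural key gets the next free integer;
        # repeat sightings get the same surrogate back (stability).
        if natural_key not in self._map:
            self._map[natural_key] = self._next
            self._next += 1
        return self._map[natural_key]

gen = SurrogateKeyGenerator()
print(gen.key_for("CUST-00042"))  # 1
print(gen.key_for("CUST-00007"))  # 2
print(gen.key_for("CUST-00042"))  # 1  (same natural key, same surrogate)
```

The surrogate carries no business meaning, which is exactly the point: if the source system ever renames or reuses `CUST-00042`, the warehouse key stays stable.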
25.
What is dimensional modeling?
Correct Answer
B. Equivalent of logical data design in DWH
Explanation
Dimensional modeling is the equivalent of logical data design in a data warehouse (DWH). It involves designing the structure and relationships of the data in a way that is optimized for reporting and analysis. This includes identifying and defining the dimensions (categories or attributes) by which the data will be analyzed, as well as the measures (quantitative data) that will be used. Dimensional modeling helps to organize and structure the data in a way that is intuitive and easy to understand for end users, enabling efficient querying and analysis.
26.
Dimension used in multiple schemas is called
Correct Answer
A. Conformed dimension
Explanation
A dimension used in multiple schemas is called a conformed dimension. This means that the dimension is consistent and standardized across different schemas or data models. It allows for integration and comparison of data from different sources or systems. By using a conformed dimension, organizations can ensure that data is accurately and consistently represented, improving data quality and enabling meaningful analysis across different data sets.
27.
Arrange the following
1. Presentation Layer 2. ODS Layer 3. Replica Layer 4. DWH Layer
Correct Answer
B. 3,2,4,1
Explanation
The correct answer is 3, 2, 4, 1. In a typical data architecture, data flows from the Replica Layer, which holds a copy of the source-system data for replication and synchronization, into the ODS (Operational Data Store) Layer, which stores operational data for immediate access and processing, then into the DWH (Data Warehouse) Layer, which provides long-term storage and analysis of historical data, and finally to the Presentation Layer, which displays the data to end users.
28.
A good ETL tool must be able to communicate with many different
Correct Answer
B. Sources
Explanation
A good ETL tool must be able to communicate with many different sources, such as relational databases and flat files. This is important because data can be stored in various formats and locations, and the ETL tool needs to be able to extract data from these different sources in order to transform and load it into the desired destination. By communicating with a wide range of sources, the ETL tool provides the flexibility and versatility needed for effective data integration and management.
29.
Types of schemas in DWH
Correct Answer
C. Snowflake schema, Star schema
Explanation
The correct answer is Snowflake schema, Star schema.
A snowflake schema is a type of schema used in data warehousing where dimensions are normalized into multiple tables, creating a more complex structure. This allows for more efficient storage and reduces redundancy. On the other hand, a star schema is a simpler type of schema where each dimension is denormalized into a single table, resulting in a more straightforward structure. Both snowflake and star schemas are commonly used in data warehousing, but they differ in their levels of complexity and normalization.
30.
Which dimension is a role-playing dimension?
Correct Answer
C. Time dimension
Explanation
The role-playing dimension refers to a dimension that can be used to represent different roles or perspectives within a data model. The time dimension is often used in data models to analyze data over different time periods, such as days, months, or years. It allows for the analysis of data trends and patterns over time, making it a suitable candidate for the role-playing dimension.
31.
What type of dimension uses a “rolling window” operation?
Correct Answer
A. Degenerate dimension
Explanation
A degenerate dimension is a dimension that is derived from the fact table itself, meaning it does not have its own separate dimension table. It is typically used to store transactional or operational data that does not fit into any other dimension. Rolling window operation refers to performing calculations or aggregations on a sliding or moving time window. Since degenerate dimensions are directly related to the fact table and often involve time-based calculations, they are the most likely type of dimension to use rolling window operations.
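A rolling-window operation itself is easy to show in isolation. The sketch below computes a rolling sum over a series of daily values; the `daily_sales` data and the window length of 3 are invented for illustration.

```python
from collections import deque

def rolling_sum(values, window):
    """Sum over a sliding window of the most recent `window` values."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)        # deque(maxlen=...) drops the oldest value itself
        out.append(sum(buf))
    return out

daily_sales = [10, 20, 30, 40, 50]
print(rolling_sum(daily_sales, 3))  # [10, 30, 60, 90, 120]
```

The first two outputs cover partial windows; from the third element on, each result sums exactly the three most recent values, which is the "sliding" behaviour the explanation describes.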
32.
Which one of following is the standard methodology followed for building data warehouse?
Correct Answer
A. Extract, Transform, Load
Explanation
The correct answer is "Extract, Transform, Load". This is the standard methodology followed for building a data warehouse. In this process, data is first extracted from various sources, then transformed to fit the data warehouse schema and business rules, and finally loaded into the data warehouse for analysis and reporting purposes. This methodology ensures that the data is cleansed, integrated, and organized in a consistent and meaningful way, enabling effective data analysis and decision-making.
33.
Star schema stores data in form of
Correct Answer
B. 2NF
Explanation
The star schema stores data in the second normal form (2NF). In 2NF, the data is organized in a way that eliminates redundant data by separating them into separate tables. The star schema is a type of dimensional modeling used in data warehousing, where a central fact table is surrounded by multiple dimension tables. This design allows for efficient querying and analysis of data.
34.
Fact tables are normalized
Correct Answer
A. True
Explanation
Fact tables in a data warehouse are typically normalized: each row holds only foreign keys to the dimensions plus the numeric measures, with no redundant descriptive attributes. The descriptive, deliberately denormalized data lives in the dimension tables instead. Keeping fact tables normalized reduces data duplication and ensures that updates to the data remain consistent across the database, while still allowing efficient querying through joins to the dimensions. Therefore, the statement "Fact tables are normalized" is true.
35.
Static extract is used for ongoing warehouse maintenance
Correct Answer
B. False
Explanation
The statement is false because a static extract is not used for ongoing warehouse maintenance. A static extract is a one-time extraction of data from a source system to be used for analysis or reporting purposes. It is not meant for ongoing maintenance tasks in a warehouse.
36.
For the “point in time” concept in a DWH, history staging is a must
Correct Answer
A. True
Explanation
The "point in time" concept in data warehousing refers to capturing and storing data as it existed at a specific moment in time. This allows for historical analysis and reporting. In order to implement this concept effectively, history staging is necessary. History staging involves storing historical versions of data, enabling the tracking of changes over time. Without history staging, it would be difficult to accurately capture and analyze data at specific points in time. Therefore, the statement "For 'point in time' concept in DWH, history staging is a must" is true.
37.
A conditional report is based on data gathered, then analyzed during report generation, which reports out the results of analysis in report’s output
Correct Answer
A. True
Explanation
A conditional report is a type of report that is generated based on data that has been gathered and analyzed. The analysis of the data is done during the report generation process, and the results of the analysis are then reported in the output of the report. This means that the report is dependent on the data and the analysis that has been performed on it. Therefore, the statement "A conditional report is based on data gathered, then analyzed during report generation, which reports out the results of analysis in report's output" is true.
38.
Can we use a single dimension in multiple schemas?
Correct Answer
A. True
Explanation
Yes, we can use a single dimension in multiple schemas. In a data warehouse, schemas are used to organize and structure the data. A dimension represents a specific attribute or characteristic of the data. By using a single dimension in multiple schemas, we can ensure consistency and avoid redundancy in the data. This allows us to analyze and report on the data from different perspectives without duplicating the dimension data.
39.
Data warehouse must be de-normalized
Correct Answer
A. True
Explanation
A data warehouse is designed to support complex analytical queries and reporting, rather than transactional processing. Denormalization involves combining multiple tables and duplicating data to improve query performance and simplify data retrieval. By denormalizing the data warehouse, it reduces the number of joins required in queries, which can significantly improve query performance. Therefore, it is true that a data warehouse must be denormalized to optimize its performance for analytical purposes.
40.
ODS is used for day-to-day decisions
Correct Answer
A. True
Explanation
ODS stands for Operational Data Store, which is a database that is designed to support day-to-day decision-making processes. It stores real-time or near real-time data from various operational systems, making it readily available for analysis and reporting. This allows organizations to make informed decisions based on current and accurate data. Therefore, the statement "ODS is used for day to day decision" is true.
41.
Dimensions are normalized
Correct Answer
B. False
Explanation
The statement "Dimensions are normalized" is false. In a star schema, dimension tables are deliberately denormalized: descriptive attributes are kept together in flat tables so that queries need fewer joins and run faster. Dimensions are normalized only in a snowflake schema, which is the exception rather than the rule. Therefore, the statement is false.
42.
In a warehouse, data is stored with a bitmap index
Correct Answer
A. True
Explanation
In a warehouse, data is often stored with a bitmap index. The data is indexed using bitmaps, data structures that represent sets of bits: each bit in a bitmap corresponds to a specific row, and its value indicates whether or not that row contains the indexed value. This type of indexing allows for efficient querying and analysis, as bitwise operations can quickly determine the presence or absence of specific values or combinations of values. Bitmap indexes are especially effective on low-cardinality columns, which are common in warehouse dimension data, so the statement is true.
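A toy bitmap index can be built with plain Python integers as bit sets. The five-row colour column is invented; the point is that one bitmap per distinct value lets bitwise AND/OR answer multi-condition queries without scanning rows.

```python
# One column of a low-cardinality attribute, one value per row.
rows = ["red", "blue", "red", "green", "red"]

# Build one bitmap per distinct value; bit i is set if row i has that value.
bitmaps = {}
for i, value in enumerate(rows):
    bitmaps[value] = bitmaps.get(value, 0) | (1 << i)

# Which rows are 'red'? Read the set bits back out of the bitmap.
red = bitmaps["red"]
matches = [i for i in range(len(rows)) if red >> i & 1]
print(matches)  # [0, 2, 4]

# Rows that are red OR green: a single bitwise OR of two bitmaps.
red_or_green = bitmaps["red"] | bitmaps["green"]
print(bin(red_or_green))  # 0b11101 -> rows 0, 2, 3, 4
```

Real warehouse engines add compression and many other refinements, but the core idea is the same: set membership per row, answered with bit arithmetic.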
43.
All databases must be in third normal form
Correct Answer
A. True
Explanation
The third normal form (3NF) is a level of database normalization that ensures data is organized efficiently and eliminates redundancy. In 3NF, every non-key attribute depends on the primary key, and there are no transitive dependencies between non-key attributes. By adhering to 3NF, operational databases minimize data duplication and improve data integrity and consistency.
44.
Derived data are detailed, current data that are intended to be single, authoritative data for all decision support applications
Correct Answer
B. False
Explanation
Derived data refers to data that is derived or calculated from other data sources. It is not necessarily intended to be single or authoritative data for decision support applications, as it is dependent on the accuracy and reliability of the data sources it is derived from. Therefore, the given statement is false.
45.
Every key used to join fact table with dimensional table should be a surrogate key
Correct Answer
A. True
Explanation
In a data warehouse, a fact table contains the measurements, metrics, or facts of a business process, while a dimensional table provides the context or dimensions for these facts. To establish a relationship between the fact and dimensional tables, a key is used. A surrogate key is a system-generated unique identifier that replaces the natural key in a table. Using surrogate keys ensures that the join between the fact and dimensional tables is efficient and avoids any potential issues with natural keys, such as changes or duplicates. Therefore, every key used to join the fact table with the dimensional table should be a surrogate key.
46.
Periodic data are data that are physically altered once added to the store
Correct Answer
B. False
Explanation
Periodic data are not data that are physically altered once added to the store. On the contrary, once periodic data are added they are never changed; each change is captured by adding a new record rather than updating an existing one, which preserves a record of trends and patterns over time. Therefore, the correct answer is False.
47.
Star schema is generally suited to online transaction processing and therefore is generally used in operational systems, operational data stores and EDW
Correct Answer
B. False
Explanation
Star schema is not generally suited to online transaction processing; it is designed for data warehousing and analytical processing. This schema organizes data into a central fact table surrounded by dimension tables, which allows for efficient querying and analysis. It is therefore commonly used in data marts and data warehouse environments rather than in operational systems.
48.
Data in data warehouse are retrieved and loaded from operational systems
Correct Answer
A. True
Explanation
Data in a data warehouse is retrieved and loaded from operational systems. This means that the data warehouse collects and integrates data from various operational systems within an organization. The purpose of this is to provide a centralized and consistent source of data for reporting and analysis purposes. By retrieving and loading data from operational systems, the data warehouse ensures that the information stored is accurate, up-to-date, and suitable for decision-making processes.
49.
In a snowflake schema we don’t have any flakes
Correct Answer
B. False
50.
In star schema dimensional tables are usually not in BCNF form
Correct Answer
A. True
Explanation
In a star schema, the dimensional tables are designed to optimize query performance by denormalizing the data. This means that redundant data is intentionally introduced into the tables to avoid complex joins and improve query response time. BCNF (Boyce-Codd Normal Form) is a higher level of normalization that eliminates all functional dependencies within a table. However, in a star schema, dimensional tables often contain redundant data and are not in BCNF form to support efficient querying. Therefore, the statement that dimensional tables in star schema are usually not in BCNF form is true.