1.
A voluminous amount of structured, semi-structured, and unstructured data that has the potential to be mined for information.
Correct Answer
D. Big Data
Explanation
Big Data refers to a large volume of structured, semi-structured, and unstructured data that can be analyzed to extract valuable insights. The term describes datasets that are too large and complex for traditional data processing applications to handle. Big Data often includes information from varied sources such as social media, sensors, and online transactions. By analyzing it, organizations can make data-driven decisions and discover patterns and trends that lead to innovation and improved business strategies.
2.
A free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
Correct Answer
A. Hadoop
Explanation
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is designed to handle big data processing and storage across a cluster of computers, providing scalability and fault tolerance. Hadoop uses a distributed file system (HDFS) and a processing framework (MapReduce) to efficiently process and analyze large volumes of data. It is widely used in the industry for big data analytics and is known for its ability to handle massive amounts of data in parallel.
3.
The branch of data mining concerned with the prediction of future probabilities and trends.
Correct Answer
B. Predictive Analytics
Explanation
Predictive analytics is the correct answer because it specifically deals with the prediction of future probabilities and trends. It involves the use of statistical algorithms and machine learning techniques to analyze historical data and make predictions about future events or outcomes. This branch of data mining focuses on identifying patterns and trends in data to forecast future behavior, allowing businesses to make informed decisions and take proactive actions.
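As a rough illustration of predicting a trend from historical data, the sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The function names and the sales figures are hypothetical, and a linear trend is only one of many possible models; real predictive analytics work typically uses richer statistical or machine learning methods.

```python
from statistics import mean


def fit_linear_trend(values):
    """Fit y = slope * t + intercept by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t = range(n)
    t_mean = mean(t)
    y_mean = mean(values)
    covariance = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    variance = sum((ti - t_mean) ** 2 for ti in t)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line steps_ahead periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    return slope * (len(values) - 1 + steps_ahead) + intercept


# Hypothetical historical data: four periods of sales.
monthly_sales = [100, 110, 120, 130]
print(forecast(monthly_sales))
```

The historical series plays the role of the "historical data" in the explanation, and the extrapolated value is the prediction about a future outcome.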
4.
The science of examining raw data with the purpose of drawing conclusions about that information.
Correct Answer
A. Data Analytics
Explanation
Data analytics is the science of examining raw data to draw conclusions about that information. It uses a range of techniques and tools to analyze and interpret data and to uncover patterns, trends, and insights. By analyzing data, organizations can make informed decisions, identify areas for improvement, and gain a competitive advantage. Data analytics encompasses several types of analytics, such as descriptive, predictive, and prescriptive analytics, each serving a specific purpose in extracting meaningful information from data.
5.
An approach to querying data when it resides in a computer’s random access memory (RAM), as opposed to querying data that is stored on physical disks.
Correct Answer
C. In-memory Analytics
Explanation
In-memory analytics refers to the approach of querying data that is stored in a computer's random access memory (RAM) instead of physical disks. This approach allows for faster and more efficient data retrieval and analysis as accessing data from RAM is much quicker than accessing it from disks. By keeping the data in memory, in-memory analytics enables real-time analysis and faster decision-making, making it suitable for applications that require quick and interactive data processing.
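The idea of querying data held in RAM can be sketched with Python's built-in `sqlite3` module, which creates a database that lives only in memory when given the special name `":memory:"`. This is a small stand-in for illustration, not a dedicated in-memory analytics platform; the table, column names, and sample rows are assumptions.

```python
import sqlite3


def total_by_region(rows):
    """Load rows into a RAM-only SQLite database and aggregate them."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample = [("east", 10.0), ("west", 5.0), ("east", 2.5)]
print(total_by_region(sample))
```

Replacing `":memory:"` with a file path would run the same query against a disk-backed database, which is the contrast the definition draws.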
6.
What is the name of the programming framework originally developed by Google that supports the development of applications for processing large data sets in a distributed computing environment?
Correct Answer
D. MapReduce
Explanation
MapReduce is a programming framework originally developed by Google that supports the development of applications for processing large data sets in a distributed computing environment. It allows for parallel and distributed processing of data across a cluster of computers, making it efficient for handling big data. Hive is a data warehouse infrastructure, ZooKeeper is a coordination service, and Hadoop is an open-source framework that includes MapReduce as one of its components.
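The map, shuffle, and reduce steps behind this model can be sketched in a single Python process using the classic word-count example. This only illustrates the programming model; it is not Google's or Hadoop's implementation, nothing here is distributed, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big insights"]))
```

In a real cluster, each call to `map_phase` and `reduce_phase` could run on a different machine, which is what makes the model suitable for parallel processing.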
7.
A method of storing data within a system that facilitates the collocation of data in various schemata and structural forms.
Correct Answer
B. Data Lake
Explanation
A data lake is a method of storing data within a system that allows for the collocation of data in various schemata and structural forms. It is a centralized repository that stores raw, unprocessed data from different sources, such as databases, applications, and IoT devices. Data lakes enable organizations to store large volumes of data in its native format, without the need for upfront data modeling or transformation. This flexibility allows for more efficient data analysis and enables data scientists to explore and extract insights from diverse data sets.
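Storing data in its native format means structure is applied only when the data is read. The sketch below mimics that with a small in-memory list standing in for the lake; the payloads, format labels, and function names are all hypothetical, and a real data lake would sit on object or distributed storage.

```python
import csv
import io
import json

# Hypothetical raw objects kept as-is: (format label, raw payload).
RAW_OBJECTS = [
    ("json", '{"device": "sensor-1", "temp_c": 21.5}'),
    ("csv", "device,temp_c\nsensor-2,19.0\n"),
    ("text", "operator note: sensor-3 offline"),
]


def read_records(fmt, payload):
    """Apply a structure at read time, depending on the stored format."""
    if fmt == "json":
        return [json.loads(payload)]
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    # Unstructured content is passed through untouched.
    return [{"raw_text": payload}]


def load_all(objects):
    records = []
    for fmt, payload in objects:
        records.extend(read_records(fmt, payload))
    return records


for record in load_all(RAW_OBJECTS):
    print(record)
```

Note that the CSV reader returns every field as a string, so type conversion is also left to the consumer, another consequence of skipping upfront modeling.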
8.
Leading analyst firm Gartner defines Big Data from three aspects, all starting with the letter V. Which of these is not part of its definition of big data?
Correct Answer
A. Value
Explanation
While some consider value a key proposition of big data, Gartner does not count it among the three core concepts that define big data: volume, velocity, and variety.
9.
Where did Hadoop get its name from?
Correct Answer
B. Toy elephant
Explanation
Doug Cutting, one of the lead developers, named the project after his son's toy elephant, which was called Hadoop.
10.
Last summer, Splunk announced a new product to search, access and report on Hadoop data sets. What is this product called?
Correct Answer
C. Hunk
Explanation
Hunk is also known as Splunk Analytics for Hadoop.
11.
According to a study conducted by IBM, what is the largest single source where data is gathered?
Correct Answer
C. Business Transactions
Explanation
According to IBM, 90% of organizations gather data from business transactions, more than from any other source. Social media is the lowest on the list, at 39%.
12.
___________ Analysis is used to analyze a system in terms of its requirements to identify its impact on customers’ satisfaction.
Fill in the blank.
Correct Answer
A. Kano
Explanation
The Kano analysis is used to analyze a system in terms of its requirements and identify its impact on customers' satisfaction. This analysis helps to categorize customer preferences into different types: basic requirements, performance requirements, and excitement requirements. By understanding these different types of requirements, businesses can prioritize their efforts and focus on delivering features and functionalities that will truly satisfy their customers.
13.
What does SAAS stand for?
Correct Answer
D. Software as a Service
Explanation
SAAS stands for Software as a Service. This model allows users to access software applications over the internet on a subscription basis, rather than having to install and maintain the software on their own computers or servers. With SAAS, the software is hosted and managed by the provider, and users can access it through a web browser. This approach offers convenience, scalability, and cost-effectiveness, as users can easily access and use the software from any device with an internet connection.
14.
According to a Jaspersoft Survey, what is the most popular big data store?
Correct Answer
A. Relational Databases
Explanation
According to a Jaspersoft survey conducted in 2012, relational databases were the most popular big data store, with 56% of respondents using them, while MongoDB and Hadoop HDFS tied for second place with 18% each. Relational databases use a structured, predefined schema to organize data into tables, rows, and columns. They are widely used for transactional and analytical purposes because they offer high performance, reliability, consistency, and security.
15.
Which of the following is/are the correct types of data?
Correct Answer
D. Both a & b
Explanation
Semi-structured data and unstructured data are both correct types of data. Semi-structured data refers to data that does not have a fixed structure, but still has some organizational elements, such as tags or labels. Unstructured data, on the other hand, does not have any predefined structure or organization. Both types of data are important in different contexts and require different approaches for analysis and storage.
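The difference in how the two types are handled can be sketched as follows: semi-structured data carries tags that allow direct field access, while unstructured text has to be searched or otherwise interpreted. The sample payloads, the vocabulary, and the function names are hypothetical, and keyword matching is only a crude stand-in for real text analysis.

```python
import json
import re

# Semi-structured: no fixed schema, but keys label each value.
semi_structured = '{"user": "ana", "tags": ["refund", "urgent"]}'

# Unstructured: free text with no predefined organization.
unstructured = "Ana wrote in asking for an urgent refund on her last order."


def fields_from_semi_structured(payload):
    """Read values directly by their labels."""
    record = json.loads(payload)
    return record["user"], record["tags"]


def keywords_from_unstructured(text, vocabulary):
    """Find which vocabulary words occur anywhere in the free text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & set(vocabulary))


print(fields_from_semi_structured(semi_structured))
print(keywords_from_unstructured(unstructured, ["refund", "urgent", "cancel"]))
```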