IT 234 Unit 10 Assignment: In-Memory Database

23 September, 2024 | 6 Min Read


Unit 10 Assignment

Kaplan University

IT 234 Database Concepts

When creating a database, the first things to decide are what kind of information will go into it and how best to store that information. One of the more popular choices is the relational database. “A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many ways without having to reorganize the database tables” [Rou32]. The standard user and application program interface to a relational database is the Structured Query Language (SQL). SQL statements are used to run interactive queries against a relational database and to gather data for reports. Relational databases have a reputation for being easy to create and use because they can easily be extended or edited after they are created. Although relational databases are popular, they are not the only option. Alternatives include in-memory databases, Hadoop/NoSQL, virtualized or “federated” databases, columnar databases, and streaming databases.
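As a rough illustration of the SQL interface described above, the following Python sketch uses the standard-library sqlite3 module. The customers and orders tables, their columns, and the sample rows are hypothetical and were chosen only for this example. The sketch creates two tables, extends one of them after creation, and runs a query of the kind that might feed a report.

```python
import sqlite3


def build_customer_report():
    """Create two related tables, extend one, and run a reporting query."""
    # ":memory:" keeps the example self-contained; a file path would work too.
    conn = sqlite3.connect(":memory:")

    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "Boston"), (2, "Grace", "Denver")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5)],
    )

    # The table can be extended after creation without rebuilding it.
    conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")

    # Reassemble data from both tables: total spent per customer.
    rows = conn.execute(
        """
        SELECT c.name, SUM(o.total)
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in build_customer_report():
        print(name, total)
```

The join shows the point made in the quotation: the same tables can be recombined in different ways through queries alone, without reorganizing how the data is stored.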

The first alternative to a relational database is an in-memory database (IMDB). An IMDB stores its data in main memory to provide faster response times. The source data is loaded into system memory in a compressed, non-relational format. “An IMDB is categorized as an analytic database, which is a read-only system that stores historical data on metrics for business intelligence/business analytics (BI/BA) applications, typically as part of a data warehouse or data mart” [Rou33]. Users can run queries and reports on this data, which is regularly updated to incorporate recent transaction data from an organization’s operational system. Besides providing extremely fast query response times, an IMDB can reduce or eliminate the need for data indexing and for storing pre-aggregated data in OLAP cubes or aggregate tables. This reduces IT costs and allows faster implementation of BI/BA applications [Rou33].
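The idea of loading source data into main memory and then querying it there can be sketched with Python's standard-library sqlite3 module. This is only an approximation: SQLite is a relational engine and does not use the compressed, non-relational format described above, and the sales table and its rows are hypothetical. The sketch builds a small on-disk database, copies it into memory, and runs an analytic query against the in-memory copy.

```python
import os
import sqlite3
import tempfile


def load_into_memory(disk_path):
    """Copy an on-disk SQLite database into a new in-memory database."""
    disk = sqlite3.connect(disk_path)
    memory = sqlite3.connect(":memory:")
    disk.backup(memory)  # copies the source database into main memory
    disk.close()
    return memory


def demo():
    # Build a small on-disk "operational" database (hypothetical sales data).
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        disk = sqlite3.connect(path)
        disk.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        disk.executemany(
            "INSERT INTO sales VALUES (?, ?)",
            [("east", 100.0), ("west", 250.0), ("east", 50.0)],
        )
        disk.commit()
        disk.close()

        # Queries from here on touch only the in-memory copy.
        memory = load_into_memory(path)
        totals = memory.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        ).fetchall()
        memory.close()
        return totals
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Repeating the copy step on a schedule would correspond loosely to the regular updates from the operational system mentioned above.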

A second alternative to relational databases is Hadoop/NoSQL. Hadoop is used because it applies looser standards to data management (allowing temporary inconsistency between data copies or related data) to maximize throughput [Ker15]. NoSQL does not mean “no relational allowed” but rather “where relational simply cannot scale adequately”. Over time, this approach has standardized on storing massive amounts of Web/cloud data as files, handled by the Hadoop data-access software. Hadoop is an open source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment, and it is part of the Apache project sponsored by the Apache Software Foundation [Rou34]. Hadoop makes it possible to run applications on systems with thousands of hardware nodes and to handle thousands of terabytes of data. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating if a node fails. This approach lowers the risk of catastrophic system failure and unexpected data loss, even if a significant number of nodes become inoperative [Rou34]. Hadoop has quickly emerged as a foundation for big data processing tasks such as scientific analytics, business and sales planning, and processing enormous volumes of sensor data, including data from “Internet of Things” sensors.
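The claim that a distributed file system keeps working after node failures can be illustrated with a toy model in Python. This is not Hadoop code and does not use any Hadoop API. The round-robin placement, the node names, and the block names are assumptions made for the example, and real placement policies are more sophisticated. The sketch stores three copies of each block on different nodes and then checks which blocks are still readable after some nodes fail.

```python
def place_blocks(block_ids, node_names, replicas=3):
    """Assign each block to `replicas` distinct nodes, round-robin (toy policy)."""
    if replicas > len(node_names):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [
            node_names[(i + k) % len(node_names)] for k in range(replicas)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one copy on a live node."""
    return [
        block
        for block, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    ]


if __name__ == "__main__":
    nodes = ["n0", "n1", "n2", "n3"]           # hypothetical node names
    blocks = ["b0", "b1", "b2", "b3", "b4"]    # hypothetical block names
    placement = place_blocks(blocks, nodes)

    # With three copies per block, any two node failures leave every block readable.
    print(readable_blocks(placement, {"n0", "n1"}))
    # Losing three nodes can make some blocks unreadable.
    print(readable_blocks(placement, {"n0", "n1", "n2"}))
```

The trade-off is storage: keeping three copies of every block triples the raw disk space needed, which is the price paid for tolerating failures.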

The next alternative is the virtualized database. The fundamental idea behind a virtualized database is to provide a “veneer” that looks like a database and allows common SQL-like access to widely disparate data sources. Over time, this aim has come close to complete reality, as virtualized databases now offer administration, one-interface development, and dynamically evolving support for most of today’s new data types [Ker15]. A key feature of virtualized databases is support for data discovery and global metadata repositories. As a result, users can get a better understanding of the range of in-house data than a data warehouse gives them, along with support for data quality initiatives such as data governance [Ker15]. Virtualized databases also provide performance optimization, which has led to “touch first, move if necessary” data processing and has eliminated a key problem with Hadoop use [Ker15].

Another alternative to the traditional relational database is a columnar database. A columnar database is a database management system (DBMS) that stores data in columns rather than in rows, as a traditional relational database does. The goal of a columnar database is to write and read data to and from hard disk storage efficiently, reducing the time it takes to return a query result. A columnar database works differently from a relational database because “all the column 1 values are physically together, followed by all the column 2 values, etc” [Rou35]. The data is stored in record order, so the 50th entry for column 1 and the 50th entry for column 2 belong to the same input record. This allows individual data elements, such as customer names, to be accessed in columns as a group rather than individually row by row [Rou35]. A main benefit of a columnar database is that the data can be compressed. This compression permits columnar operations (min, max, sum, count) to be performed very rapidly. Another benefit is that, because a columnar database is self-indexing, it uses less disk space than a relational database management system (RDBMS) containing the same data.
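The difference between row and column layout can be sketched in a few lines of Python. The sample records are hypothetical, and run-length encoding is used here only as one simple example of column compression; the sources cited above do not say which compression scheme any particular product uses.

```python
# Hypothetical input records, stored row by row.
ROWS = [
    {"name": "Ada", "city": "Boston", "total": 20.0},
    {"name": "Grace", "city": "Boston", "total": 5.0},
    {"name": "Linus", "city": "Denver", "total": 12.5},
]


def to_columns(rows):
    """Regroup row records so that each column's values sit together."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeated values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


if __name__ == "__main__":
    columns = to_columns(ROWS)
    # Position i in every column belongs to the same input record.
    print(columns["name"][1], columns["city"][1])
    # An aggregate reads one column and never touches the others.
    print(sum(columns["total"]))
    # Values of one type stored together often repeat, so they compress well.
    print(run_length_encode(columns["city"]))
```

Because the sum reads only the total column, a query like this does not have to scan names and cities, which is where much of the speed of columnar operations comes from.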

The final alternative to relational databases that I will mention is the streaming database. A streaming database treats data as a single stream passing under the “head” of the database engine, which must make an immediate decision on whether to store it, process it, use it to generate an alert, and/or re-route it to some other appropriate data source [Ker15]. Because of this feature, streaming databases are often used as a “rapid-response” approach, which cannot bring the full context of a data warehouse to bear in analyzing the next bit of information but is far quicker to note obviously important and time-critical data.
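The per-item decision described above can be sketched as a single pass over incoming readings in Python. The sensor feed, the field names, and the alert threshold of 100.0 are all hypothetical. Each reading is looked at once, as it arrives, and is either stored, turned into an alert, or re-routed.

```python
def sensor_feed():
    """Hypothetical source that yields readings one at a time."""
    yield {"sensor": "a", "value": 20.0}
    yield {"sensor": "b", "value": 150.0}
    yield {"sensor": "c", "value": None}   # malformed reading
    yield {"sensor": "a", "value": 99.9}


def process_stream(readings, alert_threshold=100.0):
    """Decide immediately, per reading, whether to store, alert, or re-route."""
    stored, alerts, rerouted = [], [], []
    for reading in readings:
        value = reading["value"]
        if value is None:
            rerouted.append(reading)       # hand off to another destination
        elif value >= alert_threshold:
            alerts.append(reading)         # time-critical: raise an alert now
        else:
            stored.append(reading)         # routine data: keep for later
    return stored, alerts, rerouted


if __name__ == "__main__":
    stored, alerts, rerouted = process_stream(sensor_feed())
    print(len(stored), len(alerts), len(rerouted))
```

Each decision uses only the current reading and a fixed threshold, not the history of earlier readings, which mirrors the limited context that the streaming approach accepts in exchange for speed.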

Although relational databases are the most popular choice, it is important to consider which features fit your needs best, because in some cases a relational database might not be the best choice. Before starting, do some research to decide which type of database best fits your needs.

References

Kernochan, W. (2015, October 16). 5 Alternatives to the Traditional Relational Database. Retrieved from Enterprise Apps Today: http://www.enterpriseappstoday.com/datamanagement/slideshows/5-alternatives-to-the-traditional-relational-database.html

Rouse, M. (n.d.). Columnar Database. Retrieved from Tech Target: Search Data Management: http://searchdatamanagement.techtarget.com/definition/columnar-database

Rouse, M. (n.d.). Hadoop. Retrieved from Tech Target: Search Cloud Computing: http://searchcloudcomputing.techtarget.com/definition/Hadoop

Rouse, M. (n.d.). In-memory database. Retrieved from Tech Target: WhatIs.com: http://whatis.techtarget.com/definition/in-memory-database

Rouse, M. (n.d.). Relational Database. Retrieved from Tech Target: Search SQL Server: http://searchsqlserver.techtarget.com/definition/relational-database
