Data Persistence

1. Discuss the role of data in information systems indicating the need for data persistence

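Persistent data, in the field of data processing, denotes information that is infrequently accessed and not likely to be modified. Static data is information, for example a record, that does not change and may be intended to be permanent. It may have previously been categorized as persistent or dynamic. Information systems need persistence because data held only in memory is lost when the program ends or the machine loses power; writing it to non-volatile storage, such as files or a database, keeps it available across sessions.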
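
As a minimal sketch of what persistence means in practice, the following Python snippet writes a record to disk and reads it back after the in-memory copy is gone. The file name `customer.json` and the record contents are hypothetical, chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_record(path: Path, record: dict) -> None:
    """Write the record to non-volatile storage as JSON."""
    path.write_text(json.dumps(record), encoding="utf-8")


def load_record(path: Path) -> dict:
    """Read the record back, as a restarted program would."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical record; a temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "customer.json"
    record = {"id": 1, "name": "Alice"}
    save_record(store, record)

    # Simulate the process ending: the in-memory copy is gone.
    del record

    restored = load_record(store)
```

The value survives only because it was written to the file; anything kept solely in a variable would be lost.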

2. Explain the terms: Data, Database, Database Server, and Database Management System
Data: Information in raw or unorganized form.
Database: A collection of information that is organized so that it can be easily accessed, managed and updated.
Database Server: A server, meaning the software and often the machine it runs on, that hosts a DBMS and provides database services to other programs or computers, following the client-server model.
Database Management System: System software for creating and managing databases. The DBMS provides users and programmers with a systematic way to create, retrieve, update and manage data.
3. Compare Files and Databases, discussing pros and cons of them

Pros of the File System

Performance can be better than with a database. If you store large files in the database, a simple query to retrieve the list of files or file names will also load the file data when the query uses SELECT *, which can slow things down. In a file system, accessing a file is quite simple and lightweight.

Saving the files and downloading them in the file system is much simpler than it is in a database since a simple "Save As" function will help you out. Downloading can be done by addressing a URL with the location of the saved file.

Migrating the data is an easy process. You can just copy and paste the folder to your desired destination while ensuring that write permissions are provided to your destination.

It's cost effective in most cases to expand your web server rather than pay for certain databases.

It's easy to migrate it to cloud storage, such as Amazon S3 or a CDN, in the future.

Cons of the File System

No integrity guarantees. A file system offers no ACID (Atomicity, Consistency, Isolation, Durability) transactions, so nothing guarantees that the files stay consistent with the records that refer to them. Consider a scenario in which your files are deleted from their location manually or by an attacker. You might not know whether a file still exists or not. Painful, right?

Low security. Since your files are saved in a folder to which you must grant write permissions, the folder is prone to security issues and invites trouble, like hacking. It's best to avoid saving in the file system if you cannot afford to compromise on security.

Pros of Database

ACID guarantees, including rollback of a failed update, which is complicated to achieve when files are stored outside the database.

Files will be in sync with the database and cannot be orphaned, which gives you the upper hand in tracking transactions.

Backups automatically include file binaries.

It's more secure than saving in a file system.
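
The rollback behaviour mentioned in the first point can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the names, and the amounts are hypothetical; the example only assumes that a failed statement inside a transaction should undo the earlier statements of that transaction.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; any failure undoes both updates."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The debit would make the balance negative, so the CHECK constraint fails
# and the credit that was already applied is rolled back with it.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

With plain files, undoing the first write after the second one fails would have to be coded by hand.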

Cons of Database

You may have to convert the files to blob in order to store them in the database.

Database backups will be larger and take longer.

Memory use can be inefficient. RDBMSs are often RAM-driven, so all data has to pass through RAM first. When an RDBMS has to find and sort data, it tracks each data page, down to the smallest amount of data read or written, and it has to track whether the page is in memory or on disk, and whether the data is indexed or physically sorted.
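
Two of the points above, storing files as BLOBs and the earlier remark that Select * also loads the file data, can be illustrated with a short Python sketch using `sqlite3`. The `files` table and the file name `report.pdf` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: file metadata and file content stored side by side.
conn.execute(
    "CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
)

payload = bytes(1024 * 1024)  # 1 MiB of zero bytes standing in for a real file
conn.execute(
    "INSERT INTO files (name, content) VALUES (?, ?)", ("report.pdf", payload)
)
conn.commit()

# Listing file names: select only the light column, so the BLOB is not read.
names = [row[0] for row in conn.execute("SELECT name FROM files")]

# SELECT * also returns the full binary content for every row.
full_row = conn.execute("SELECT * FROM files").fetchone()
blob_size = len(full_row[2])
```

Naming the columns explicitly avoids most of the cost described for listing queries, although backups still carry the binary content.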

4. Discuss different arrangements of data, giving examples for each
Unstructured

Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.
Examples include e-mail messages, word processing documents, videos, photos, audio files, presentations, webpages and many other kinds of business documents.

Semi-structured

Semi-structured data is data that is neither raw data nor typed data in a conventional database system. It has structure, but it is not organized in a relational model, like a table, or in an object-based graph. A lot of data found on the Web can be described as semi-structured. Data integration especially makes use of semi-structured data.

Examples of semi-structured data: XML and JSON documents; CSV files are sometimes included as well. Data held in many NoSQL databases is also considered semi-structured.

Structured

Structured data refers to any data that resides in a fixed field within a record or file. This includes data contained in relational databases and spreadsheets.

Examples of structured data include numbers, dates, and groups of words and numbers called strings. Most experts agree that this kind of data accounts for about 20 percent of the data that is out there. Structured data is the data you're probably used to dealing with. It's usually stored in a database.
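
The three arrangements can be contrasted in a few lines of Python. This is only a sketch; the `Customer` type and all sample values are hypothetical.

```python
import json
from dataclasses import dataclass, fields


# Structured: every record has the same fixed, typed fields.
@dataclass
class Customer:
    id: int
    name: str
    signup_date: str


structured = [
    Customer(1, "Alice", "2019-04-01"),
    Customer(2, "Bob", "2019-04-12"),
]
fixed_fields = [f.name for f in fields(Customer)]

# Semi-structured: records describe themselves and may differ in shape.
semi_structured_text = """
[
  {"id": 1, "name": "Alice", "phones": ["555-0100", "555-0101"]},
  {"id": 2, "name": "Bob", "address": {"city": "Colombo"}}
]
"""
documents = json.loads(semi_structured_text)
varying_fields = [sorted(doc.keys()) for doc in documents]

# Unstructured: free text with no data model; only generic operations apply.
unstructured_text = "Hi team, the meeting moved to Friday. Please bring the Q1 numbers."
word_count = len(unstructured_text.split())
```

The structured records can be queried by field without inspection, the JSON documents have to be checked for which keys they carry, and the free text supports only generic operations such as counting words or searching.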

5. Explain different types of databases, providing examples for their use

Database types
•Hierarchical databases
•Network databases
•Relational databases
•Non-relational databases (NoSQL)
•Object-oriented databases
•Graph databases
•Document databases

Hierarchical Databases

In the hierarchical database management system (hierarchical DBMS) model, data is stored as nodes in parent-child relationships. Besides the actual data, records in a hierarchical database also contain information about their parent/child relationships.

In a hierarchical database model, data is organized into a tree-like structure. The data is stored as a collection of fields, where each field contains only one value. The records are connected to each other through links that form parent-child relationships. Each child record has only one parent, while a parent can have multiple children.
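
The single-parent rule can be sketched as a small tree in Python. The `Record` class and the organisation data are hypothetical and are not taken from any particular hierarchical DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """A node in a hierarchical model: one parent, any number of children."""
    name: str
    # The parent link is excluded from repr and comparison to avoid cycles.
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)
    children: List["Record"] = field(default_factory=list)

    def add_child(self, name: str) -> "Record":
        child = Record(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up the single-parent chain to build the access path."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Hypothetical organisation tree.
company = Record("Company")
sales = company.add_child("Sales")
engineering = company.add_child("Engineering")
alice = engineering.add_child("Alice")
```

Because every record has exactly one parent, there is exactly one path from the root to any record, here `alice.path()`.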

Network Databases

Network database management systems (network DBMSs) use a network structure to create relationships between entities. Network databases are mainly used on large digital computers. They resemble hierarchical databases, but whereas a node in a hierarchical database can have only one parent, a node in a network database can have relationships with multiple entities. A network database looks more like a cobweb or an interconnected network of records.

In network databases, children are called members and parents are called owners. The difference from the hierarchical model is that each child, or member, can have more than one parent.
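
A minimal Python sketch of the difference: here a member may be linked to several parents, which a tree cannot express. The course and student names are hypothetical.

```python
from collections import defaultdict

# Parent -> members and member -> parents; both directions are kept
# because a member may belong to several parents.
members_of = defaultdict(set)
parents_of = defaultdict(set)


def link(parent: str, member: str) -> None:
    """Record one parent/member relationship in both directions."""
    members_of[parent].add(member)
    parents_of[member].add(parent)


# Hypothetical data: one student is a member under two courses.
link("Databases", "Alice")
link("Networks", "Alice")
link("Databases", "Bob")
```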

Relational Databases

In relational database management systems (RDBMSs), the relationship between data is relational and data is stored in tabular form, in columns and rows. Each column of a table represents an attribute and each row represents a record. Each field in a table holds a data value.

Structured Query Language (SQL) is the language used to query an RDBMS, including inserting, updating, deleting, and searching records.

Relational databases work on the principle that each table has a key field that uniquely identifies each row, and that these key fields can be used to connect one table of data to another.
Relational databases are the most popular and widely used databases. Some of the popular RDBMSs are Oracle, SQL Server, MySQL, SQLite, and IBM DB2.
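
The idea of connecting tables through key fields can be sketched with SQLite, one of the systems listed above, through Python's `sqlite3` module. The `authors` and `books` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE books (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(author_id)
    );
    INSERT INTO authors VALUES (1, 'Linus'), (2, 'Ada');
    INSERT INTO books VALUES
        (10, 'Kernel Notes', 1),
        (11, 'Git Internals', 1),
        (12, 'Analytical Engines', 2);
""")

# Join the two tables through the key field they share.
rows = conn.execute("""
    SELECT b.title, a.name
    FROM books AS b
    JOIN authors AS a ON a.author_id = b.author_id
    WHERE a.name = ?
    ORDER BY b.title
""", ("Linus",)).fetchall()
```

Each row of `books` stores only the author's key; the author's name lives in one place and is reached through the join.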

Object-Oriented Model

An object-oriented database builds on the concepts of object-oriented programming, and it does more than store programming language objects. Object DBMSs extend the semantics of languages such as C++ and Java to provide full-featured database programming capability while retaining native language compatibility. In other words, they add database functionality to object programming languages. This approach unifies application and database development into a consistent data model and language environment. As a result, applications require less code, use more natural data modeling, and have code bases that are easier to maintain. Object developers can write complete database applications with a modest amount of additional effort.

Graph Databases

Graph databases are NoSQL databases that use a graph structure for semantic queries. The data is stored in the form of nodes, edges, and properties. In a graph database, a node represents an entity or instance such as a customer, a person, or a car. A node is equivalent to a record in a relational database system. An edge represents a relationship that connects nodes. Properties are additional information added to the nodes.

Neo4j, Azure Cosmos DB, SAP HANA, Sparksee, Oracle Spatial and Graph, OrientDB, ArangoDB, and MarkLogic are some of the popular graph databases. Graph structures are also supported by some RDBMSs, including Oracle and SQL Server 2017 and later versions.
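
Nodes, edges, and properties can be sketched with plain Python structures. This is an illustration of the data model only, not the API of any of the products named above; the node identifiers and relationship names are hypothetical.

```python
from collections import deque

# Nodes carry properties; edges are (source, relationship, target) triples.
nodes = {
    "alice": {"label": "Person", "age": 34},
    "bob": {"label": "Person", "age": 29},
    "car1": {"label": "Car", "model": "Civic"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "OWNS", "car1"),
]


def neighbours(node_id, relationship=None):
    """Follow outgoing edges, optionally filtered by relationship type."""
    return [dst for src, rel, dst in edges
            if src == node_id and (relationship is None or rel == relationship)]


def reachable(start):
    """Breadth-first traversal over outgoing edges."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in neighbours(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

A query such as "what can be reached from alice" becomes a traversal along edges rather than a series of table joins.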

6. Compare and contrast data warehouse with Big data

7. Explain how the application components communicate with files and databases

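Application components find the files and databases they depend on through entries in configuration or properties files. In Oracle's BRM (Billing and Revenue Management), for example, components communicate with each other by using such entries: the basic connection entries in the files identify the host name and port number of each component.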

These connection entries are set when you install BRM and when you install each client application. You can change them if you change your configuration. Depending on how you install BRM, you might have to change some entries to connect BRM components.
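
As a generic sketch of this pattern, not BRM's actual file format, the following Python snippet reads connection entries from a properties-style configuration. The section names, host, port, and path are placeholders.

```python
import configparser

# Hypothetical properties-style configuration; component names, hosts,
# and ports are placeholders, not values from any real installation.
CONFIG_TEXT = """
[database]
host = db.example.internal
port = 5432

[file_store]
path = /var/data/uploads
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

db_host = config["database"]["host"]
db_port = config.getint("database", "port")
upload_dir = config["file_store"]["path"]

# The component would open its connection using these values.
connection_target = f"{db_host}:{db_port}"
```

Keeping the entries outside the code means a changed host or port only requires editing the configuration file.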

8. Differentiate the SQL statements, Prepared statements, and Callable statements

JDBC provides three interfaces for sending SQL to the database:
Statement - Use this for general-purpose access to your database. It is useful when you run static SQL statements whose text is fixed at runtime. The Statement interface cannot accept parameters.
PreparedStatement - Use this when you plan to run the same SQL statement many times. The PreparedStatement interface accepts input parameters at runtime.
CallableStatement - Use this when you want to call database stored procedures. The CallableStatement interface can also accept runtime input parameters.
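
The interfaces above are Java types; as a stand-in, the following sketch uses Python's `sqlite3` module to show the same distinction between fixed SQL text and SQL with runtime parameters. It is an analogy rather than the Java API, the `library` table is hypothetical, and stored procedures are left out because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE library (title TEXT, author TEXT)")

# Static SQL, comparable to a plain Statement: the text is fixed.
conn.execute("INSERT INTO library VALUES ('Kernel Notes', 'Linus')")

# Parameterized SQL, comparable to a PreparedStatement: the same text is
# reused with different runtime values bound to the ? placeholders.
insert_sql = "INSERT INTO library (title, author) VALUES (?, ?)"
conn.executemany(insert_sql, [
    ("Git Internals", "Linus"),
    ("O'Reilly Tales", "Tim"),  # the quote needs no manual escaping
])

# Bound values are treated as data, never as SQL text.
hostile = "Linus' OR '1'='1"
safe_rows = conn.execute(
    "SELECT title FROM library WHERE author = ?", (hostile,)
).fetchall()

linus_titles = [row[0] for row in conn.execute(
    "SELECT title FROM library WHERE author = ? ORDER BY title", ("Linus",)
)]
```

Besides reuse, binding parameters instead of concatenating strings protects against SQL injection, which is a common reason to prefer the parameterized form.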

9. Argue the need for ORM, explaining the development with and without ORM
Object-relational mapping (ORM, O/RM, and O/R mapping tool) in computer science is a programming technique for converting data between incompatible type systems using object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language. There are both free and commercial packages available that perform object-relational mapping, although some programmers opt to construct their own ORM tools.
For example, here is a completely imaginary case with a pseudo language:

You have a book class, you want to retrieve all the books of which the author is "Linus". Manually, you would do something like that:

book_list = new List();
sql = "SELECT * FROM library WHERE author = 'Linus'";
data = query(sql); // I oversimplify ...
while (row = data.next())
{
    book = new Book();
    book.setAuthor(row.get('author'));
    book_list.add(book);
}

With an ORM library, it would look like this:

book_list = BookTable.query(author="Linus");

The mechanical part is taken care of automatically via the ORM library.
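
The same contrast can be made runnable in Python. This is a toy sketch, not a real ORM library: the `Table` class is a hypothetical stand-in that builds the query from keyword filters, and the `library` table and its rows are invented for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE library (title TEXT, author TEXT)")
conn.executemany("INSERT INTO library VALUES (?, ?)",
                 [("Kernel Notes", "Linus"), ("Engines", "Ada")])


# Without an ORM: write SQL and copy each row into an object by hand.
def books_by_author_manual(author):
    cursor = conn.execute(
        "SELECT title, author FROM library WHERE author = ?", (author,))
    return [Book(title=row[0], author=row[1]) for row in cursor]


# A toy mapper standing in for an ORM: it builds the query from keyword
# filters and maps rows to objects generically. Column names come from
# trusted code only; values are always bound as parameters.
class Table:
    def __init__(self, conn, name, model):
        self.conn, self.name, self.model = conn, name, model

    def query(self, **filters):
        where = " AND ".join(f"{column} = ?" for column in filters)
        sql = f"SELECT * FROM {self.name}" + (f" WHERE {where}" if where else "")
        cursor = self.conn.execute(sql, tuple(filters.values()))
        columns = [d[0] for d in cursor.description]
        return [self.model(**dict(zip(columns, row))) for row in cursor]


BookTable = Table(conn, "library", Book)
book_list = BookTable.query(author="Linus")
```

Both approaches return the same objects; the mapper simply moves the repetitive SQL and row-copying code into one reusable place.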
10. Discuss the POJO, Java Beans, and JPA, indicating their similarities and differences
JavaBeans: JavaBeans are reusable software components for Java that can be manipulated visually in a builder tool. Practically, they are classes written in the Java programming language conforming to a particular convention. They are used to encapsulate many objects into a single object (the bean), so that they can be passed around as a single bean object instead of as multiple individual objects. A JavaBean is a Java Object that is serializable, has a nullary constructor, and allows access to properties using getter and setter methods.

POJO (Plain Old Java Object): A Plain Old Java Object or POJO is a term initially introduced to designate a simple lightweight Java object that does not implement any javax.ejb interface, as opposed to the heavyweight components of EJB 2.x, especially Entity Beans. Today, the term is used for any simple object with no extra stuff.

Enterprise JavaBeans (EJB) is one of several Java APIs for the modular construction of enterprise software. An EJB is a managed, server-side software component that encapsulates the business logic of an application.

JPA (Java Persistence API) is a specification for object-relational mapping in Java. A JPA entity is typically a POJO marked with annotations such as @Entity, and tools such as Hibernate provide implementations of the specification.

11. Identify the ORM tools available for different development platforms (Java, PHP, and .Net)
Hibernate ORM (Hibernate in short) is an object-relational mapping tool for the Java programming language. It provides a framework for mapping an object-oriented domain model to a relational database. Hibernate handles object-relational impedance mismatch problems by replacing direct, persistent database accesses with high-level object handling functions.
iBATIS / MyBatis
iBATIS is a persistence framework which automates the mapping between SQL databases and objects in Java, .NET, and Ruby on Rails. In Java, the objects are POJOs (Plain Old Java Objects). The mappings are decoupled from the application logic by packaging the SQL statements in XML configuration files. The result is a significant reduction in the amount of code that a developer needs to access a relational database using lower level APIs like JDBC and ODBC.

Features of iBATIS:

Support for Unit of work / object level transactions
In memory object filtering
Providing an ODMG compliant API and/or OCL and/or OPath
Supports multiservers (clustering) and simultaneous access by other applications without loss of transaction integrity

12. Discuss the need for NoSQL indicating the benefits, also explain different types of NoSQL databases

Key benefits of NoSQL include:

Efficient, scale-out architecture instead of monolithic architecture
The ability to handle high volumes of structured, semi-structured, and unstructured data
Being better aligned with object-oriented programming
Working well with today's software development methodologies that involve agile sprints and frequent code pushes

Types of NoSQL databases
There are four basic types of NoSQL databases:
Key-Value Store - It holds a large hash table of keys and values. {Example - Riak, Amazon DynamoDB}

The schema-less format of a key-value database like Riak suits simple storage needs. The key can be synthetic or auto-generated, while the value can be a string, JSON, a BLOB (binary large object), etc.
Document-based Store - It stores documents made up of tagged elements. {Example - CouchDB}

The data is a collection of key-value pairs stored as a document. A document store is quite similar to a key-value store; the difference is that the stored values (referred to as “documents”) provide some structure and encoding of the managed data. XML, JSON (JavaScript Object Notation), and BSON (a binary encoding of JSON objects) are some common standard encodings.

Column-based Store - Each storage block contains data from only one column. {Example - HBase, Cassandra}

In a column-oriented NoSQL database, data is stored in cells grouped in columns rather than in rows. Columns are logically grouped into column families. A column family can contain a virtually unlimited number of columns, which can be created at runtime or in the schema definition. Reads and writes are done using columns rather than rows.
Graph-based - A network database that uses edges and nodes to represent and store data. {Example - Neo4j}

In a graph-based NoSQL database you will not find the rigid format of SQL or the table-and-column representation. A flexible graph representation is used instead, which is well suited to addressing scalability concerns. Graph structures are built from edges, nodes, and properties, which provides index-free adjacency. Data can be easily transformed from one model to the other using a graph-based NoSQL database.
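
The first three data models can be contrasted with plain Python structures. This is a sketch of the shapes of the data only, not of any product's API, and every key, field, and value shown is hypothetical.

```python
import json

# Key-value store: an opaque value looked up by key only.
kv_store = {}
kv_store["session:42"] = b"\x00\x01binary-or-any-value"
kv_store["user:1"] = json.dumps({"name": "Alice"})

# Document store: values are structured documents that can be queried
# by their fields, and documents need not share a schema.
documents = [
    {"_id": 1, "name": "Alice", "tags": ["admin"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Colombo"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match every criterion."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]


# Column-family layout: row key -> column family -> column -> value.
# Rows may define different columns, and columns can be added at runtime.
column_store = {
    "row1": {"profile": {"name": "Alice", "email": "alice@example.com"}},
    "row2": {"profile": {"name": "Bob"}},
}
column_store["row2"]["profile"]["phone"] = "555-0100"  # new column at runtime


def read_column(store, family, column):
    """Read one column across all rows, skipping rows that lack it."""
    return {key: row[family][column]
            for key, row in store.items()
            if column in row.get(family, {})}
```

The key-value store can only fetch by key, the document store can filter on fields inside the value, and the column-family layout reads one column across many rows.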
13. Discuss what Hadoop is, explaining the core concepts of it

Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems. It is at the center of a growing ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning applications. Hadoop can handle various forms of structured and unstructured data, giving users more flexibility for collecting, processing and analyzing data than relational databases and data warehouses provide.
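
The paragraph above does not name Hadoop's processing model; one of its core ideas is MapReduce, in which a job is split into a map step, a grouping (shuffle) step, and a reduce step. The following single-process Python sketch imitates those steps for a word count. It is not the Hadoop API, and the input lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input split into lines; on a cluster each split could be
# processed on a different node.
lines = ["big data big clusters", "data moves to clusters"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each map call looks at only one line and each reduce call at only one key, the work can be spread across many machines.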
14. Explain the concept of IR, identifying tools for IR

Information retrieval (IR) is the task of finding material, usually documents of an unstructured nature such as text, that satisfies an information need from within large collections. Unlike a database query, which returns exact matches against structured fields, an IR system ranks results by how relevant they appear to be to the query.

Commonly used tools for IR include:

Apache Lucene, a full-text indexing and search library
Apache Solr, a search server built on Lucene
Elasticsearch, a distributed search engine built on Lucene

Friday, April 12, 2019