Database have been around almost almost as long as computer have been around. There are many flavours of databases out in wild, and as a Data Engineer you will run in to some of them if not all of them throughout your career. In this post I will look at the 4 most common types of databases you could encounter.
OLTP
The grand daddy of the database world. Online Transaction Processing (OLTP) is your most common database out there and the one that I guess most people associate with as the “database” if you are thinking about databases.
This type of database is optimised for transactional processing, typically used for managing and processing large amounts of data in real-time. OLTP systems are designed to handle a high number of concurrent transactions, such as those that occur in e-commerce or financial systems (i.e. banking or credit card activity).
OLAP
The Online Analytical Processing (OLAP) databases are similar to OLTP databases in that they are built from the same core technology, the difference in being in the way the database is optimised.
OLAP databases use a technique called pre-aggregation, which stores the data in a highly summarised form, which makes it easy and fast to query large amounts of data. They are are optimised for complex analytical\reporting queries and are typically used for data warehousing and business intelligence applications.
It’s important to note that OLAP systems are traditionally designed to support read-only queries against append-only style databases, and do not support write-transactions, instead, data is loaded from other data sources like your OLTP kind of database, where the transactions happen.
NOSQL
The first time I was introduced to NoSQL was at a MongoDB conference back in 2012. The idea that data didn’t have to be relational, or follow any sort of schema blew my mind.
A NoSQL (Not Only SQL) database is a type of database that does not use a fixed schema and is designed to handle large amounts of unstructured data. NoSQL databases are often used for big data and real-time web applications. They are typically more flexible and scalable than traditional relational databases and can be document-based, key-value based, graph-based, or column-based. Examples of NoSQL databases include MongoDB, Cassandra, and Redis.
On the flip side without a fixed schema, NoSQL databases lack the constraints that ensure data consistency. This lack of structure can make it difficult to maintain reliability and consistency in large-scale data systems. As a result, it is possible for large amounts of incorrect or inconsistent data to exist in a NoSQL database.
NewSQL
The best of both worlds and pretty much re-architected from the ground up, Hello to the NewSQL database. This type of database combines the features of a traditional relational database (such as OLTP and OLAP) with the advantages of NoSQL databases, such as resistance to failures and the ability to scale easily.
These databases are designed to handle large amounts of data and high levels of concurrency while also supporting transactions and SQL-based querying. Examples of NewSQL databases include Google Spanner and CockroachDB.
So…
Ultimately, when working with data you will be working with databases somewhere along your data pipeline. Having knowledge about different types of databases and understanding when to use them and why can be a significant advantage when working on data-related projects.
Tim