Why We Should Have Different Databases

@artemgArtem Gogin

Data Engineer, Teacher and Technical Writer.

Today there are hundreds of SQL and NoSQL databases. Some of them are popular, some are ignored. Some are user-friendly and well documented and some are hard to use. Some are open-sourced and some are proprietary. And, perhaps, the most important – some are scalable, optimized, highly available and some are difficult to scale or maintain.

There comes the natural question: how to choose a database? To answer it, we should decide, what we want to achieve with a database. To create a view, we should answer questions like these:

Do we need analytical access to the database?
Do we need real-time writing or reading?
How many tables/records we want to store?
What availability do we need?
Do we need columns?
Will we access tables with filter by column or with filter by row?

When the decision is done, we need to keep in mind, what one or another database able to offer. Particular features of each database may vary, but in general, there are only a few types of databases. Within these types, we can achieve mostly the same goals. Let’s look at them closely.

1. SQL relational databases.

If you have ever worked with databases, most likely you have begun with this type of database. This type is the most popular and widespread. These databases allow storing data in relational tables with defined columns of a particular type. Relational tables are good normalization and joins.

Advantages

SQL support
ACID transactions (Atomicity, Consistency, Isolation, and Durability)
Indexing and partitioning support

Disadvantages

Poor support of unstructured data / complex types
Bad optimization for event processing
Difficult/expensive scaling

Examples: Oracle DB, MySQL, PostgreSQL.

2. Document-oriented databases.

If we don’t want to join several tables to retrieve desired data, we can look at the document-oriented databases. These databases allow storing records in JSON-like format. With this format, we can create complex value for any key and include all the data structure in one record at once.

Advantages

Schema free
No need to always write all the fields in every record
Good complex types support
OLTP suited

Disadvantages

Poor transactions support
Poor analytics support
Difficult/expensive scaling

Examples: MongoDB

3. In-memory databases.

Databases of this type can provide real-time response for selecting and inserting particular records. Most of them mainly store data into RAM but also offer persistent storage on HDD or SSD for some cases. Most of these databases operate with key/value records, so the values may recall document-oriented format. But some databases also operate with columns and allow secondary indexing in the same table. Using RAM allows to process data rapidly but makes it more unstable and expensive.

Advantages

Fast writing
Fast reading

Disadvantages

Difficult reliability
Expensive scaling

Examples: Redis, Tarantool, Apache Ignite

4. Wide-column databases.

These databases store data as key/value records on HDD or SSD. These solutions are designed to scale well enough to manage petabytes of data across thousands of commodity servers in a distributed system. They represent the SSTable architecture. This architecture was designed for two use cases: fast access by key and fast, highly available writing.

Advantages

Fast writing row by row
Fast reading by key
Good scalability
High availability

Disadvantages

Key/value format
No analytics support

Examples: Cassandra, HBase

5. Columnar databases.

Sometimes we need to access data fast not with particular keys, but with particular columns. In this case, we better get rid of inserting row by row and move to batch writing. Batch inserts allow columnar databases to prepare the data for rapid read by columns.

Advantages

Fast reading by column
Good analytics support
Good scalability

Disadvantages

Only good for batch inserts

Examples: Vertica, Clickhouse

6. Search engine

If we want to access the data with filter by any value and even with any word in column, we should remember search engines. These databases perform indexing of every word in columns and allow full-text search. They are perfect for storing and analyzing logs or large text values.

Advantages

Quick access by any word
Good scalability

Disadvantages

Only good for batch inserts
Poor analytics support

Examples: ElasticSearch, Apache Solr

7. Graph databases

For some use cases exist graph data structures. We can find their realization in graph databases. If your tasks require working with graphs, there are special databases designed to satisfy your needs.

Advantages

Graph data structure
Manageable relations between entities
Flexible structures

Disadvantages

Special query language
Difficult to scale

Examples: Neo4j

Conclusion

Almost every task can be done with almost any type of database. The question is how expensive and optimized it would be. Choosing the tool you are used to can reduce your time to market, but it also can cost you an enormous amount of money to maintain and expand your hardware, which may be used inefficiently. Always try to use a database in the way it was meant to use. Perhaps, a solution that suits your needs already exists.

Generative Data Intelligence

@artemgArtem Gogin

1. SQL relational databases.

2. Document-oriented databases.

3. In-memory databases.

4. Wide-column databases.

5. Columnar databases.

6. Search engine

7. Graph databases

Conclusion

Tags

Pearson plc email alerting service (18 April 2024)

Arizona’s Apache Junction Unified School District Supports Innovative Teaching and Learning with New EdTech Resources

Latest Intelligence

Oil retreats in the red while markets downplay Israel’s attack as just a warning shot

Silver Price Forecast: XAG/USD falls after early gains from $29, near-term outlook remains firm

18 April 2024: Google Alert – LRN

AgriZeroNZ investment in biolumic to shine light on low emissions pasture solution

Media round-up – Carbon News

Friday 5: Universal Design for Learning

Chat with us