Adam Sosiński | Databases | 12.01.2022
One of the key challenges for programmers, architects and managers involved in e-commerce projects is selecting an appropriate database for storing data about products or services. Just as products are physically kept in warehouses, in the virtual world information about them is stored in databases. When choosing a database management system (DBMS) for your online store, you need to pay attention to several aspects: flexibility, high availability, reliability, the ability to handle many queries at once, and keeping data up to date. One popular system that addresses these needs is MongoDB, whose capabilities I will discuss in this article.
In simple terms, e-commerce means commercial transactions conducted electronically over the Internet. This covers sale and purchase transactions, while payment and delivery can take place either online or offline. Online stores are the most popular form of this trade, and for some people they are synonymous with e-commerce itself. It is worth noting, however, that besides online shops, e-commerce also includes auction sites, online currency exchanges, electronic banking and betting platforms.
In the e-commerce industry, databases face particular demands.
A well-configured database system should:
This is especially important during peak sales periods, such as Black Friday or Cyber Monday, which bring a sharp increase in the number of queries. For this reason, e-commerce companies should focus on database scalability.
Let’s take a closer look at data storage for e-commerce services. There are several types of databases to choose from, the best known being relational (SQL) and non-relational (NoSQL) databases. Let’s look at the differences between them. Strictly speaking, SQL (Structured Query Language) is a language for querying relational databases, but this type of database is commonly called an “SQL database”, so I will use this term for the purposes of comparison. It also makes the name of the second type easier to remember: a NoSQL database, usually read as “non-SQL” or “not only SQL”.
There are five basic differences between them:
| SQL databases | NoSQL databases |
|---|---|
| clearly defined relationships between data | no enforced relationships; data is loosely coupled |
| data is stored in tables | data is stored as documents, graphs or key-value pairs |
| predefined schema | dynamic schema, unstructured data |
| preferred for multi-row transactions | preferred when fast data retrieval is important |
| vertically scalable | horizontally scalable |
As you can see, NoSQL databases are well suited to the requirements of the e-commerce market in terms of data availability and storage. Currently, the most popular database system of this type is MongoDB.
MongoDB is a document database designed for ease of development and scaling. Documents are stored in BSON (Binary JSON) format. Because the format is based on JSON, queries and results are easy to convert into a form that the frontend code of an e-commerce application can understand, and it is also easy for humans to read. MongoDB supports hierarchical data structures, automatic fragmentation (sharding) and built-in replication, which improve scalability and availability.
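To illustrate, here is a minimal sketch of a product document as an application might hold it before storing it in MongoDB, and of how directly it turns into JSON for the frontend. The product and its fields are hypothetical. The sketch uses plain JSON only; BSON-specific types such as ObjectId or dates would need extra conversion (for example via MongoDB Extended JSON), which is ignored here.

```python
import json

# A hypothetical product document. MongoDB would store it as BSON;
# here it is a plain Python dict with nested fields and arrays.
product = {
    "sku": "TSHIRT-001",
    "name": "Basic T-shirt",
    "price": 49.99,
    "tags": ["clothing", "summer"],
    "attributes": {"color": "white", "sizes": ["S", "M", "L"]},
}

def to_frontend_json(doc: dict) -> str:
    """Serialize a document to the JSON string a frontend would receive."""
    return json.dumps(doc, ensure_ascii=False)

payload = to_frontend_json(product)
print(payload)
# Any JSON client can parse it back without a separate mapping layer.
print(json.loads(payload) == product)
```

The nested `attributes` object survives the round trip unchanged, which is the property that makes document databases convenient for web frontends.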
Now that we have outlined the main challenges in e-commerce and why MongoDB is a good choice for data storage, let’s look at how MongoDB can support the e-commerce industry.
Thanks to dynamic schemas, documents in a collection do not need to have the same fields, and a given field can have different types in different documents. This makes it easier to map documents to application entities or objects. In practice, however, documents within a collection usually have a similar structure, and to enforce this, MongoDB lets you define validation rules per collection.
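The sketch below, assuming a hypothetical `products` collection, shows two documents with different fields and a validation rule written in MongoDB’s `$jsonSchema` syntax. With PyMongo, such a rule could be passed as `db.create_collection("products", validator=VALIDATOR)`. The `is_valid` function is a hypothetical and heavily simplified local check written for illustration; it is not MongoDB’s validator.

```python
# Two documents in the same hypothetical "products" collection. The schema is
# dynamic: they have different fields, and "weight" even has different types.
book = {"name": "Sample book", "price": 59.0, "isbn": "978-0000000000", "weight": 0.4}
laptop = {"name": "Laptop 14", "price": 3999, "cpu": "8-core", "weight": "1.3 kg"}

# Collection-level validation rule in MongoDB's $jsonSchema syntax.
VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "price"],
        "properties": {
            "name": {"bsonType": "string"},
            "price": {"bsonType": ["double", "int"]},
        },
    }
}

# Simplified local mapping of the BSON type names used above to Python types.
_PY_TYPES = {"string": str, "double": float, "int": int}

def is_valid(doc: dict, validator: dict = VALIDATOR) -> bool:
    """Hypothetical local check supporting only "required" and property "bsonType"."""
    schema = validator["$jsonSchema"]
    if any(field not in doc for field in schema["required"]):
        return False
    for field, rule in schema["properties"].items():
        if field not in doc:
            continue
        allowed = rule["bsonType"]
        names = [allowed] if isinstance(allowed, str) else allowed
        py_types = tuple(_PY_TYPES[name] for name in names)
        value = doc[field]
        # bool is a subclass of int in Python, but it is not a BSON int.
        if isinstance(value, bool) or not isinstance(value, py_types):
            return False
    return True

print(is_valid(book), is_valid(laptop))   # True True: different fields, both valid
print(is_valid({"name": "Gift card"}))    # False: required "price" is missing
```

The rule constrains only the fields that matter to the shop (`name`, `price`), while the rest of each document stays free-form.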
Thanks to the JSON format, data is easy to structure, either by embedding one document in another or by using references. The choice should be made separately for each collection. Embedding is generally recommended because it lets you retrieve related data with a single query, which improves performance. References are worth considering for more complex hierarchies, or when the benefits of embedding do not outweigh the costs of duplicating data (such as having to find and update every copy when the data changes).
MongoDB uses replica sets: groups of nodes that hold the same data. Replication increases availability and protects against database server failures, and a properly designed architecture can also speed up access to data.
We will discuss the key principles and mechanisms of replication using the diagram below.
A replica set consists of one Primary member and one or more Secondary members. It can also include a special member, the Arbiter, which holds no copy of the data but takes part in electing a new Primary when the current one becomes unavailable.
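For the roles described above, a replica set configuration might look like the document below, in the shape accepted by `rs.initiate()` in the MongoDB shell (with PyMongo, `client.admin.command("replSetInitiate", rs_config)`). The set name and host names are hypothetical, and the `describe` helper only summarizes the configuration locally.

```python
# Replica set configuration document; "rs0" and the hosts are hypothetical.
rs_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db-a.example:27017", "priority": 2},        # preferred Primary
        {"_id": 1, "host": "db-b.example:27017", "priority": 1},        # Secondary
        {"_id": 2, "host": "db-c.example:27017", "arbiterOnly": True},  # votes, holds no data
    ],
}

def describe(config: dict) -> dict:
    """Summarize roles: data-bearing members, arbiters, and total votes
    (each member has 1 vote unless "votes" says otherwise)."""
    members = config["members"]
    data_bearing = [m["host"] for m in members if not m.get("arbiterOnly", False)]
    arbiters = [m["host"] for m in members if m.get("arbiterOnly", False)]
    votes = sum(m.get("votes", 1) for m in members)
    return {"data_bearing": data_bearing, "arbiters": arbiters, "votes": votes}

print(describe(rs_config))
```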
Write operations are performed only on the Primary, and MongoDB’s built-in replication mechanism then copies the changes to the other members.
By default, read operations are also served by the Primary, but you can configure the read preference so that Secondaries handle queries instead. This comes at the cost of eventual consistency: a Secondary may briefly return data that does not yet reflect the latest writes on the Primary.
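The consequences of writing only to the Primary and optionally reading from Secondaries can be shown with a toy, in-memory model. `ReplicaSetSim` is a hypothetical class, and replication is triggered manually here rather than continuously as in MongoDB. In a real application the read preference would be set in the driver, for example with the `readPreference=secondaryPreferred` connection string option.

```python
from collections import deque

class Member:
    def __init__(self, name: str):
        self.name = name
        self.data = {}

class ReplicaSetSim:
    """Toy model: writes go to the Primary only; Secondaries catch up later."""

    def __init__(self, secondaries):
        self.primary = Member("primary")
        self.secondaries = [Member(n) for n in secondaries]
        self.pending = deque()  # changes not yet applied on the Secondaries

    def write(self, key, value):
        self.primary.data[key] = value      # only the Primary accepts writes
        self.pending.append((key, value))   # replicated asynchronously later

    def replicate(self):
        """Apply all pending changes on every Secondary."""
        while self.pending:
            key, value = self.pending.popleft()
            for s in self.secondaries:
                s.data[key] = value

    def read(self, key, read_preference: str = "primary"):
        member = self.primary if read_preference == "primary" else self.secondaries[0]
        return member.data.get(key)

rs = ReplicaSetSim(["secondary-1", "secondary-2"])
rs.write("stock:TSHIRT-001", 10)
rs.replicate()
rs.write("stock:TSHIRT-001", 9)                   # a sale decrements the stock
print(rs.read("stock:TSHIRT-001"))                # 9 from the Primary
print(rs.read("stock:TSHIRT-001", "secondary"))   # 10: not replicated yet
rs.replicate()
print(rs.read("stock:TSHIRT-001", "secondary"))   # 9 after catching up
```

For stock levels or prices shown at checkout, reading from the Primary avoids such stale values; product descriptions are usually a safer candidate for Secondary reads.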
Heartbeat mechanism: each member sends a heartbeat to the others every 2 seconds to check their availability. If the Primary becomes unavailable, a new one is elected.
During the election, a new Primary is chosen from the remaining members, with the member of highest priority preferred. According to the documentation, a replica set can have up to 50 members, of which at most 7 can vote; the new Primary is chosen from among the voting members. The remaining members, called Non-Voting members, must have both their votes and priority properties set to 0. An odd number of voting members is recommended so that a majority can always be formed; hence, the recommended minimum size of a replica set is three members.
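The election rules and the reason for an odd number of voters can be checked with a simplified model. The sketch assumes that an election succeeds only when the reachable members hold a majority of all votes and that the reachable voting member with the highest priority wins; real MongoDB elections also consider how up to date each member’s data is, which is ignored here. The functions and member names are hypothetical.

```python
MAX_MEMBERS = 50   # limits quoted from the MongoDB documentation above
MAX_VOTING = 7

def check_config(members) -> bool:
    """At most 50 members, at most 7 voting, and every non-voting member
    must have both votes and priority set to 0."""
    if len(members) > MAX_MEMBERS:
        return False
    if sum(1 for m in members if m["votes"] == 1) > MAX_VOTING:
        return False
    return all(m["priority"] == 0 for m in members if m["votes"] == 0)

def majority(voters: int) -> int:
    """Smallest number of votes that is more than half."""
    return voters // 2 + 1

def elect(members, reachable):
    """Simplified election: needs a majority of all votes among the reachable
    members; the reachable voting member with the highest priority wins."""
    total = sum(m["votes"] for m in members)
    live = [m for m in members if m["name"] in reachable]
    if sum(m["votes"] for m in live) < majority(total):
        return None  # no majority, so no Primary can be elected
    candidates = [m for m in live if m["votes"] == 1 and m["priority"] > 0]
    return max(candidates, key=lambda m: m["priority"])["name"] if candidates else None

members = [
    {"name": "A", "votes": 1, "priority": 2},    # current Primary (assumed failed)
    {"name": "B", "votes": 1, "priority": 1},
    {"name": "C", "votes": 1, "priority": 0.5},
    {"name": "D", "votes": 0, "priority": 0},    # Non-Voting member
]
print(check_config(members))                       # True
print(elect(members, reachable={"B", "C", "D"}))   # B: highest priority left
print(elect(members, reachable={"C", "D"}))        # None: 1 of 3 votes

for n in (3, 4, 5):
    print(f"{n} voters: majority {majority(n)}, tolerates {n - majority(n)} failure(s)")
```

With three voters, one failure can be tolerated; a fourth voter raises the majority to three without tolerating any additional failure, which is why an odd number is recommended.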
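Fragmentation (called sharding in the MongoDB documentation) is the process of dividing a data set into smaller pieces, which lets you scale the database horizontally, practically without limits. For fragmentation, MongoDB uses a sharded cluster that consists of shards (each holding a subset of the data), mongos query routers (which direct client queries to the appropriate shards) and config servers (which store the cluster’s metadata and configuration).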
The relationship between the components is presented in the following diagram:
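For fragmentation, it is important to choose the right key and strategy. When selecting the document field to use as the shard key, you need to consider its cardinality (how many distinct values it has), how frequently individual values occur, and whether its values change monotonically, for example steadily increasing like timestamps or order numbers.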
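Three properties of a candidate key are commonly weighed: cardinality (the number of distinct values), frequency (how often each value repeats) and whether values change monotonically. Below is a minimal sketch that computes them for sample documents; the `orders` data and field names are hypothetical.

```python
from collections import Counter

def analyze_shard_key(docs, field):
    """Report cardinality, the share of the most frequent value, and whether
    values never decrease in insertion order (monotonic)."""
    values = [d[field] for d in docs]
    counts = Counter(values)
    monotonic = all(a <= b for a, b in zip(values, values[1:]))
    return {
        "cardinality": len(counts),
        "max_frequency": max(counts.values()) / len(values),
        "monotonic": monotonic,
    }

# Hypothetical orders, listed in insertion order.
orders = [
    {"order_no": 1001, "customer": "c-17", "country": "PL"},
    {"order_no": 1002, "customer": "c-03", "country": "PL"},
    {"order_no": 1003, "customer": "c-17", "country": "DE"},
    {"order_no": 1004, "customer": "c-42", "country": "PL"},
]
for field in ("order_no", "customer", "country"):
    print(field, analyze_shard_key(orders, field))
```

Here `order_no` has high cardinality and low frequency but grows monotonically, so with ranged sharding every new order would land in the chunk with the highest range. `country` has few distinct values and one dominant value, which makes it a poor key on its own.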
There are two strategies to choose from: hashed sharding and ranged sharding.
With hashed sharding, MongoDB automatically computes a hash of the shard key value. This works well for keys whose values change monotonically, because hashing spreads the documents evenly across the shards. The disadvantage is that for range queries it is unlikely that all matching documents will be on one shard. Since the router cannot determine which shards hold them, it has to query all parts of the collection (chunks) on every shard.
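Here is a minimal sketch of how hashing spreads keys and why range queries turn into broadcasts. It assumes three shards and uses MD5 purely as an illustrative hash; it does not claim to reproduce MongoDB’s internal hashing or chunk boundaries. In the MongoDB shell, a collection would be sharded this way with, for example, `sh.shardCollection("shop.orders", { order_no: "hashed" })` (database, collection and field names hypothetical).

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size

def hashed_shard_for(key_value) -> int:
    """Hash the key into a 64-bit number, then map equal-sized ranges of the
    hash space to shards (MD5 is an illustrative choice only)."""
    digest = hashlib.md5(str(key_value).encode()).digest()
    hashed = int.from_bytes(digest[:8], "big")   # 0 .. 2**64 - 1
    return (hashed * NUM_SHARDS) >> 64           # index of the hash range

def shards_holding_range(low, high):
    """Shards that actually contain keys low..high. A router cannot know this
    from the hashes, so for range queries it contacts every shard."""
    return {hashed_shard_for(v) for v in range(low, high + 1)}

# Consecutive (monotonic) order numbers still spread across all shards.
counts = [0] * NUM_SHARDS
for order_no in range(1, 10_001):
    counts[hashed_shard_for(order_no)] += 1
print(counts)                              # roughly equal counts per shard
print(hashed_shard_for(5000))              # equality query: one computable shard
print(shards_holding_range(1000, 2000))    # range query: spread over all shards
```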
With ranged sharding, each shard holds the parts of the collection (chunks) whose key values fall within a given range. This strategy works well when the key has many distinct values, none of which occur very often. Its main advantage is that the router can direct a query that includes the shard key to a specific shard, which significantly speeds up queries. MongoDB’s built-in mechanism splits the collection into chunks, distributes them evenly across the shards and tries to keep their sizes similar. When deciding on fragmentation, remember that MongoDB offers no option to undo it and merge the data back; you can only shard the collection again using a different key.
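For comparison, a sketch of ranged routing. The chunk boundaries are hypothetical and fixed; in MongoDB they are stored on the config servers, a shard usually holds many chunks, and chunks are moved between shards automatically. Ranged sharding would be declared with, for example, `sh.shardCollection("shop.products", { sku_no: 1 })` (names hypothetical).

```python
import bisect

# Hypothetical chunk boundaries on a numeric shard key: shard 0 holds keys
# below 1000, shard 1 holds [1000, 5000), shard 2 holds 5000 and above.
BOUNDARIES = [1000, 5000]

def ranged_shard_for(key_value: int) -> int:
    """Find the shard whose range contains key_value."""
    return bisect.bisect_right(BOUNDARIES, key_value)

def targets_for_range(low: int, high: int) -> set:
    """Shards a router must contact for low <= key <= high. The boundaries
    alone are enough, so shards outside the range are skipped."""
    return set(range(ranged_shard_for(low), ranged_shard_for(high) + 1))

print(targets_for_range(1200, 1800))   # {1}: a targeted query
print(targets_for_range(900, 5200))    # {0, 1, 2}: the range spans every shard
```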
Since version 3.6, MongoDB allows you to listen for changes in a selected collection and, since version 4.0, in a whole database or the entire deployment, except for the admin, local and config databases. This is done by opening a change stream, a cursor that lets you iterate over the change events within the chosen scope. Because change streams are built on the aggregation framework, you can also filter for specific changes or transform the notifications you receive. The basic requirement is a replica set, because a notification is issued only after a change has been written to a majority of the data-bearing members.
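As an example of filtering notifications, the pipeline below selects only updates that change a product’s price and trims each event to the changed price. It uses the field names of MongoDB change events (`operationType`, `documentKey`, `updateDescription.updatedFields`); the `price` field and the sample events are hypothetical. Because a real change stream needs a running replica set, the `matches` function simulates only the `$match` stage locally.

```python
# Pipeline for a change stream: only updates that touch "price", trimmed to
# the changed price. The "price" field is a hypothetical product attribute.
PRICE_CHANGES = [
    {"$match": {
        "operationType": "update",
        "updateDescription.updatedFields.price": {"$exists": True},
    }},
    {"$project": {"documentKey": 1, "updateDescription.updatedFields.price": 1}},
]

def matches(event: dict) -> bool:
    """Local stand-in for the $match stage above (not a query engine)."""
    updated = event.get("updateDescription", {}).get("updatedFields", {})
    return event.get("operationType") == "update" and "price" in updated

# Hypothetical events shaped like MongoDB change events.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "Kettle", "price": 120}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"price": 99}, "removedFields": []}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"stock": 3}, "removedFields": []}},
]

for event in filter(matches, events):
    new_price = event["updateDescription"]["updatedFields"]["price"]
    print("price changed:", event["documentKey"], new_price)
```

Against a real replica set, the pipeline could be passed to PyMongo’s `collection.watch(PRICE_CHANGES)`, and iterating over the returned stream would yield matching events as they occur.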
Change streams use the oplog, a special capped collection that records operations which modify the data. Documents in this collection rotate: when a new entry would exceed the collection’s size limit, the oldest entries are deleted. You should therefore size this collection according to how frequently events occur, so that a change stream can read an event before it is removed.
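The effect of rotation on a consumer that falls behind can be shown with a toy capped log. `CappedLog` is a hypothetical class that limits the number of entries rather than their size in bytes, and it uses consecutive integers in place of oplog timestamps.

```python
from collections import deque

class CappedLog:
    """Toy capped collection: keeps at most max_entries, dropping the oldest."""

    def __init__(self, max_entries: int):
        self.entries = deque(maxlen=max_entries)  # deque discards the oldest when full
        self.next_ts = 0

    def append(self, op: dict) -> int:
        ts = self.next_ts
        self.entries.append((ts, op))
        self.next_ts += 1
        return ts

    def read_after(self, ts: int):
        """Resume after ts; fail if the next entry has already been rotated out."""
        oldest = self.entries[0][0] if self.entries else self.next_ts
        if ts + 1 < oldest:
            raise LookupError(f"history lost: oldest retained entry is {oldest}")
        return [op for t, op in self.entries if t > ts]

log = CappedLog(max_entries=3)
for i in range(5):
    log.append({"op": "update", "sku": i})
# Entries 0 and 1 were rotated out; 2, 3 and 4 remain.
print(log.read_after(2))   # entries 3 and 4
try:
    log.read_after(0)      # a consumer that stopped after entry 0 missed entry 1
except LookupError as exc:
    print(exc)
```

In a real deployment the oplog size is configured on the server (for example with the `replSetResizeOplog` admin command), and a change stream that tries to resume from a point no longer in the oplog fails with an error.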
According to forecasts, e-commerce in Poland will continue its dynamic growth over the next few years, and customers’ requirements for websites and applications are growing with it. The most important factors in improving the Customer Experience include availability, speed and reliability. A properly configured database system such as MongoDB is fault tolerant and scalable, and it lets you structure data hierarchically and store large amounts of it, so it can meet the needs of a wide range of e-commerce projects.