Introduction to Document Databases with MongoDB

When should you drop your relational database for a shiny NoSQL alternative?

Derick Rethans

By now most of you will probably have heard of the term NoSQL. It's a vague term that covers many different types of database engines. The main classes of NoSQL databases are key/value stores, column databases, graph databases and document databases. Examples of key/value stores are Memcached and Redis, where data can only be stored and retrieved through a specific key. Column databases, such as Cassandra and HBase (which runs on top of Hadoop), are great for processing large amounts of data. Graph databases such as Neo4j and OrientDB model the relations between entities. Apache CouchDB and MongoDB belong to the last category, document databases. We will be looking extensively at MongoDB in this article.

In a document database such as MongoDB the smallest unit is a document. In MongoDB, documents are stored in collections, which in turn make up a database. Documents are analogous to rows in a SQL table, but there is one big difference: not every document needs to have the same structure. Each document can have different fields, which is a very useful feature in many situations. Another feature of MongoDB is that fields in a document can contain arrays and/or sub-documents (sometimes called nested or embedded documents).

MongoDB's Strengths

Supporting a different set of fields for each document in a collection is one of MongoDB's strengths. It allows you to store similar data with different properties in the same collection. A good example of this is storing real (not MongoDB) documents in a way that is beneficial for a Content Management System (CMS). The CMS might want to store articles, which have certain properties (e.g. author, tags, and body), but also related books, which have additional properties such as their ISBN, but no body field. An article may need to store the periodical's ISSN in lieu of an ISBN. In a relational database there are various ways to solve this. Most frequently it is solved either by having a table per object "class" (article or book) or by coming up with a scheme that stores an object's properties in linked tables (for example through the EAV pattern). In MongoDB you would simply store the article and book with the fields they need, using the structure shown below.

{
    _id: ObjectId("51156a1e056d6f966f268f81"),
    type: "Article",
    author: "Derick Rethans",
    title: "Introduction to Document Databases with MongoDB",
    date: ISODate("2013-04-24T16:26:31.911Z"),
    body: "This arti..."
},
{
    _id: ObjectId("51156a1e056d6f966f268f82"),
    type: "Book",
    author: "Derick Rethans",
    title: "php|architect's Guide to Date and Time Programming with PHP",
    isbn: "978-0-9738621-5-7"
}

Even though the two documents represent different classes of objects, you can still construct a query that looks for all the items by an author, or for all the items with a specific title.
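To make this concrete, here is a minimal sketch in plain JavaScript that mimics a simple equality query over the two documents above. It is not MongoDB's implementation: the `findByFields` helper and the `items` collection name are assumptions made for illustration, and only top-level equality is handled. In the mongo shell, the comparable query would be along the lines of `db.items.find({ author: "Derick Rethans" })`.

```javascript
// The two example documents, held in a plain array for illustration.
// "items" is a hypothetical collection name; _id and date are omitted.
const items = [
  {
    type: "Article",
    author: "Derick Rethans",
    title: "Introduction to Document Databases with MongoDB",
    body: "This arti...",
  },
  {
    type: "Book",
    author: "Derick Rethans",
    title: "php|architect's Guide to Date and Time Programming with PHP",
    isbn: "978-0-9738621-5-7",
  },
];

// Return every document whose fields equal all key/value pairs in `query`.
// Simplified on purpose: top-level fields and strict equality only.
function findByFields(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

// Both documents match, even though they have different sets of fields.
const byAuthor = findByFields(items, { author: "Derick Rethans" });
console.log(byAuthor.map((doc) => doc.type));

// A field that only some documents have simply does not match the others.
const byIsbn = findByFields(items, { isbn: "978-0-9738621-5-7" });
console.log(byIsbn.map((doc) => doc.type));
```

The point of the sketch is that a query names fields, not a table layout, so documents of different "classes" can be matched by the fields they happen to share.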

Data Model

Each document in a collection in MongoDB can look totally different, and how you structure your documents is up to you. MongoDB doesn't enforce a schema, but your application still should. Although MongoDB is generally very fast, the way you structure and index your documents and collections has a big influence on the performance of your application. While designing your schema you should focus more on how the data is inserted, updated and queried and less on how the data is structured. If you sometimes need to denormalise your data, that is a perfectly normal thing to do, even though it might look dirty at first.
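One way an application might enforce its own schema is to check each document before storing it. The following is a rough sketch only: the `requiredFields` table and the `validateDocument` function are hypothetical names, and the required fields per type are assumptions loosely based on the article and book examples above.

```javascript
// Hypothetical per-type requirements that the application, not MongoDB, enforces.
const requiredFields = {
  Article: ["author", "title", "body"],
  Book: ["author", "title", "isbn"],
};

// Return a list of problems; an empty list means the document looks storable.
function validateDocument(doc) {
  const required = requiredFields[doc.type];
  if (!required) {
    return [`unknown type: ${doc.type}`];
  }
  return required
    .filter((field) => !(field in doc))
    .map((field) => `missing field: ${field}`);
}

// A book without an ISBN is rejected by the application-level check.
const problems = validateDocument({
  type: "Book",
  author: "Derick Rethans",
  title: "php|architect's Guide to Date and Time Programming with PHP",
});
console.log(problems);
```

Keeping the rules in one place like this means the flexibility of the collection does not turn into inconsistency in the data your queries rely on.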

Interactions Between Collections

MongoDB makes different choices regarding functionality and scaling than relational databases. MongoDB is very easy to scale through replication and sharding, but it misses out on features like joins and transactions because of this. Operations in MongoDB are only atomic per single document, and only operate on one collection. Not allowing operations between collections (joins) sounds like a real issue, but thanks to the support for arrays and sub-documents, in most cases it is not actually a problem.
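As an illustration of how arrays and sub-documents can stand in for a join, consider an article that embeds its tags and comments instead of referencing rows in other tables. This is a sketch in plain JavaScript under stated assumptions: the `comments` field, the commenter names, the `addComment` helper and the `articles` collection are all hypothetical. In the mongo shell, appending a comment would look roughly like `db.articles.update({ _id: articleId }, { $push: { comments: newComment } })`, which modifies a single document.

```javascript
// One document holds everything needed to render the article page.
const article = {
  title: "Introduction to Document Databases with MongoDB",
  author: "Derick Rethans",
  tags: ["mongodb", "nosql"], // array field
  comments: [
    // array of sub-documents (hypothetical data)
    { name: "alice", text: "Nice overview." },
  ],
};

// Adding a comment only touches this one document, so no cross-collection
// operation is needed. Returns a new object and leaves the input unchanged.
function addComment(doc, comment) {
  return { ...doc, comments: [...doc.comments, comment] };
}

const updated = addComment(article, { name: "bob", text: "Thanks!" });
console.log(updated.comments.map((comment) => comment.name));
```

Because the comments live inside the article, reading the article and its comments is a single lookup, and adding a comment stays within the per-document atomicity described above.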

This is just a preview - to read the rest of the article, download your free copy of Web & PHP Magazine.

 
