MongoDB Sharding

crackp0t

Jun 24, 2009
This tutorial will show you how to set up a basic MongoDB sharded cluster. It has three steps
and needs a minimum of 5 servers: 3 configuration servers and 2 MongoDB nodes.
You can run the shard servers (the mongos routers) on the MongoDB nodes, on your application servers,
or on a mixture of both.

This tutorial assumes that you can set up two functioning MongoDB servers, and that you have added a database
to one of them and left the other one empty.

For this tutorial we will use the following hostnames.

Configuration Servers:
configsvr-one.mongodb.example.com
configsvr-two.mongodb.example.com
configsvr-three.mongodb.example.com

MongoDB Nodes:
node0.mongodb.example.com
node1.mongodb.example.com

MongoDB Shards (mongos instances):
conn-svr0.mongodb.example.com
conn-svr1.mongodb.example.com

Step 1)
Configuration Servers

For a testing environment you can use one and be fine, but for production 3 are recommended for redundancy.
It's also recommended to use hostnames and NOT IP addresses, because with IP addresses you have
to restart everything if they ever change.

**** GOTCHA BEGIN ****

Make sure you have NTP configured on all of your configuration servers. The clocks have to match on all of them
or it will not work!

**** GOTCHA END ****
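
One rough way to sanity-check the clocks is to grab the epoch time from each config server and look at the spread. The sketch below only does the arithmetic. The helper name max_skew is made up for illustration, and collecting the timestamps over ssh is an assumption about how you reach your servers.

```shell
#!/bin/sh
# max_skew: print the difference in seconds between the largest and
# smallest epoch timestamp passed as arguments.
# (Hypothetical helper for illustration; not part of MongoDB or NTP.)
max_skew() {
    min=$1
    max=$1
    for t in "$@"; do
        if [ "$t" -lt "$min" ]; then min=$t; fi
        if [ "$t" -gt "$max" ]; then max=$t; fi
    done
    echo $((max - min))
}

# Possible usage, assuming passwordless ssh to each config server:
#   max_skew "$(ssh configsvr-one.mongodb.example.com date +%s)" \
#            "$(ssh configsvr-two.mongodb.example.com date +%s)" \
#            "$(ssh configsvr-three.mongodb.example.com date +%s)"
```

The ssh calls run one after another, so a result of a second or two may just be connection latency rather than real clock drift.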

For this example we'll use the config server hostnames listed above.


Getting a config server up and running is extremely simple. The only thing you need to do is add the command line
option --configsvr to whatever command you currently use to start mongod.

Ubuntu Instructions:
cp /etc/init/mongodb.conf /etc/init/mongodb-configsvr.conf
cp /etc/init.d/mongodb /etc/init.d/mongodb-configsvr
chmod +x /etc/init.d/mongodb-configsvr

vim /etc/init/mongodb-configsvr.conf
Change the line that looks like 1) to 2)
1) if [ "x$ENABLE_MONGODB" = "xyes" ]; then exec start-stop-daemon --start --quiet --chuid root --exec /usr/bin/mongod -- --config /etc/mongodb.conf; fi
2) if [ "x$ENABLE_MONGODB" = "xyes" ]; then exec start-stop-daemon --start --quiet --chuid root --exec /usr/bin/mongod -- --config /etc/mongodb-configsvr.conf --configsvr; fi

Debian Wheezy:
cp /etc/init/mongodb.conf /etc/init/mongodb-configsvr.conf
cp /etc/init.d/mongodb /etc/init.d/mongodb-configsvr

vim /etc/init.d/mongodb-configsvr
Change the CONF variable from 1) to 2)
1) CONF=/etc/mongodb.conf
2) CONF=/etc/mongodb-configsvr.conf

Change DAEMON_OPTS from 1 to 2
1) DAEMON_OPTS=${DAEMON_OPTS:-"--unixSocketPrefix=$RUNDIR --config $CONF run"}
2) DAEMON_OPTS=${DAEMON_OPTS:-"--unixSocketPrefix=$RUNDIR --config $CONF --configsvr run"}

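You will need to run the above commands, or the equivalent for your operating system, on every one of your configuration servers.
Both variants point mongod at /etc/mongodb-configsvr.conf, so make sure that file exists (copying /etc/mongodb.conf is the easiest way).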
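
If you would rather not make the Debian edits by hand on every config server, something like the following could apply both changes. This is only a sketch: the function name patch_configsvr_init is made up, and it assumes the CONF and DAEMON_OPTS lines in your init script look exactly like the ones quoted above.

```shell
#!/bin/sh
# patch_configsvr_init: rewrite the CONF and DAEMON_OPTS lines of a copied
# init script so mongod starts as a config server.
# (Hypothetical helper; assumes the lines match the ones in this post.)
patch_configsvr_init() {
    script=$1
    sed -e 's|^CONF=/etc/mongodb\.conf$|CONF=/etc/mongodb-configsvr.conf|' \
        -e 's|--config \$CONF run|--config $CONF --configsvr run|' \
        "$script" > "$script.tmp" || return 1
    # Write back through the original file so its permissions are kept.
    cat "$script.tmp" > "$script" && rm -f "$script.tmp"
}

# Possible usage, as root, after the cp commands above:
#   patch_configsvr_init /etc/init.d/mongodb-configsvr
```

It is worth opening the file afterwards to confirm both lines actually changed, since sed does nothing if the patterns do not match.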


If you're on Ubuntu and you followed my steps above, type the following command on all of your configuration servers.
service mongodb-configsvr start

Step 2)
Once again we'll be using a hostname in the example for mongos. The command mongos stands for "MongoDB Shard"
and it's a routing service for sharded MongoDB clusters. This is what you point your application at, because its main job
is to know where the data is located in the sharded cluster.

mongos instances are really lightweight and where you run them is up to you. They also don't require data directories, so
you're more than welcome to run an instance of mongos on all of your MongoDB nodes or on your application servers.
Personally I run them on my MongoDB nodes.


Start the instances of mongos with the following command. Do this on every server where you want a mongos running.

**** GOTCHA BEGIN ****
Make sure there are no spaces between the hostnames and the commas!
**** GOTCHA END ****

mongos --configdb configsvr-one.mongodb.example.com,configsvr-two.mongodb.example.com,configsvr-three.mongodb.example.com
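
To avoid the stray-space gotcha, the --configdb value can be assembled from a list of hostnames instead of typed by hand. A minimal sketch follows. The function name join_configdb is made up for illustration, and the script only prints the command so you can review it before running it.

```shell
#!/bin/sh
# join_configdb: join all arguments with commas and no spaces.
# (Hypothetical helper for illustration.)
join_configdb() {
    # The subshell keeps the IFS change from leaking into the caller.
    ( IFS=,; printf '%s\n' "$*" )
}

CONFIGDB=$(join_configdb \
    configsvr-one.mongodb.example.com \
    configsvr-two.mongodb.example.com \
    configsvr-three.mongodb.example.com)

# Print the command instead of executing it.
echo "mongos --configdb $CONFIGDB"
```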


Step 3)
Time to add some shards! First we need to drop to the mongo console and connect to a mongos instance.

mongo --host conn-svr0.mongodb.example.com
sh.addShard("node0.mongodb.example.com")
sh.addShard("node1.mongodb.example.com")
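
If you have more than a couple of nodes, the addShard lines can be generated rather than typed. This is a rough sketch: add_shard_commands is a made-up name, and it only prints the commands. Piping the output into the mongo shell is one possible way to apply them, assuming the shell on your version reads commands from stdin.

```shell
#!/bin/sh
# add_shard_commands: print one sh.addShard(...) line per hostname.
# (Hypothetical helper for illustration.)
add_shard_commands() {
    for node in "$@"; do
        printf 'sh.addShard("%s")\n' "$node"
    done
}

add_shard_commands node0.mongodb.example.com node1.mongodb.example.com

# Possible way to apply them through a mongos:
#   add_shard_commands node0.mongodb.example.com node1.mongodb.example.com \
#       | mongo --host conn-svr0.mongodb.example.com
```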

You should now be able to type "show dbs" and see the database you added. If you connect to the other mongos instance and type
"show dbs" you should see it there as well.

sh.enableSharding("<database>")
sh.shardCollection("<database>.<collection>", { shard keys })
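
To make the placeholders concrete, here is a small sketch that prints the two commands with example values filled in, followed by sh.status() to check the result. The database mydb, the collection users and the key user_id are invented for illustration, and a single ascending key is just the simplest case, not a recommendation.

```shell
#!/bin/sh
# shard_collection_commands: print the mongo shell commands that enable
# sharding on a database and shard one collection on a single field.
# (Hypothetical helper; arguments are database, collection, key field.)
shard_collection_commands() {
    db=$1
    coll=$2
    key=$3
    printf 'sh.enableSharding("%s")\n' "$db"
    printf 'sh.shardCollection("%s.%s", { "%s": 1 })\n' "$db" "$coll" "$key"
    printf 'sh.status()\n'
}

# Example values only: mydb, users and user_id are made up.
shard_collection_commands mydb users user_id
```

If the collection already holds data, MongoDB expects an index that starts with the shard key, so you may need to create one first.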

I'll leave how to select the shard keys for another time. That's a thousand word post in itself.

Now everything should be set up and ready to go. Feel free to post comments, suggestions, or questions.

I didn't proofread this and I've only done it a couple of times, so forgive me if there are errors.

Forgot a step. I added it in bold.
 
question for those in the thread/reading the thread: what use case is mongo better suited for than traditional relational databases?

I've looked at mongo a few times because of the big support base it has in the rails community but can't for the life of me think of a compelling use case for it.
 
question for those in the thread/reading the thread: what use case is mongo better suited for than traditional relational databases?

I've looked at mongo a few times because of the big support base it has in the rails community but can't for the life of me think of a compelling use case for it.

Buffer uses MongoDB.

It's more directly scalable than MySQL and it's better suited to document style storage I guess. For example with Buffer they have tons of tweets to store in their DB.
 
Buffer uses MongoDB.

It's more directly scalable than MySQL and it's better suited to document style storage I guess. For example with Buffer they have tons of tweets to store in their DB.

yeah but buffer is a trivially simple application to clone with normal mysql and it would perform just fine. User HasMany Tweets. Few columns for scheduling stuff, and a job queue (they use sidekiq or resque I believe so something redis backed) and you're done.

MySQL can be scaled to the moon and back, and that has been done over and over again, far more than mongo has ever been scaled.

I guess I just don't get the whole document based storage thing. I don't see the point of it.
 
question for those in the thread/reading the thread: what use case is mongo better suited for than traditional relational databases?

I've looked at mongo a few times because of the big support base it has in the rails community but can't for the life of me think of a compelling use case for it.

A large number of user accounts would be one. If you have say 500 million users, it's easier and more efficient to open up that one user's catalog, versus going through a MySQL table of 500 million rows.

I think that's right at least, but if not, I'm sure crackpot or someone will correct me. I've never had a use for it, because everything I need is relational. Correct me if I'm wrong, but you can't aggregate data across multiple tables / catalogs with Mongo, right?
 
@dchuk
Mongo excels at the simple storage and querying of complex data structures. It's schemaless, so there's nothing to define in advance, other than perhaps what indexes you want to use. You literally just say "store this document in this collection" and bam, it's stored, and can be queried easily, using any of a myriad of flexible query operators. Documents do not have to follow a consistent structure, so for example you can quite easily version your document format without having to resort to additional tables and databases. On top of that, it's super easy to scale out, and because of its "eventual consistency" write model, it's super fast to fire data at it if it's not important that the data be immediately available at the end of the query.