Why should I bother knowing about Docker?
Short answer: it is used in a lot of open-source codebases, for example cal.com and OpenMRS, many of which belong to GSoC organisations as well. So if you are someone who wishes to make serious contributions to open source, you will likely need to know it.
Long answer: it helps with containerization, which makes it quite easy for a developer to set up a project with a single command (or sometimes two). Whatever the project needs, let's say a certain version of Node, a local MongoDB database and so on, is initialized inside a container, and you can then work on your project. After you are done, you can simply destroy the container so that its contents get removed. That is not necessarily a bad thing: you no longer need that local MongoDB database, and keeping it around would only occupy space on your machine.
Let's learn more about containerization.
What is Containerization?
Let's take the previous example. You bring an open-source codebase onto your machine, but now you need the corresponding dependencies like Node, a local MongoDB database and some other things. Without containerization, what would you do?
You would run commands separately for each one of them. After your project is done, you would still be left with the dependencies, since you installed them directly on your machine.
To remove this hassle, containerization lets you run a single command that starts a local container, which behaves somewhat like a virtual machine and loads the necessary dependencies inside itself instead of onto your machine. When you are done with your project, you can simply destroy the container and all the dependencies are removed with it (except the codebase, obviously).
Containerization also enhances security: a hacker can try to attack the application with malicious input, but if all of it runs in a container, at most the container would go down, not the machine hosting it or the original codebase.
There is one more advantage of containerization, but before getting to it we need to know about images and how they are different from containers.
Difference between Image and Container
An image is created once by the makers of the codebase; it is a read-only template that defines how a container should be realized. A container is a running instance of that image: the image is the template, the container is the running software. From the same image you can run multiple containers, so the image is built once while containers can be created as many times as you need.
The same image can be used on any operating system, be it Linux, macOS or Windows, and that portability is another use case of containerization.
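For example, the same image can back several containers at once. Here is a minimal sketch using the public nginx image (any image would do; the names web1 and web2 are purely illustrative):
docker run -d --name web1 nginx
docker run -d --name web2 nginx
One image, two independent containers running from it.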
Having understood all this, we can finally move to Docker, which is one way to implement containerization; there are others as well, but Docker is the most popular.
What is Docker?
Docker, or Docker Engine, is an open-source containerization technology for building, testing and deploying containerized applications. There are three things associated with it (a quick sketch of the three working together follows the list):
Docker Engine
Docker CLI
Docker Hub - the GitHub of Docker images
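For instance, once Docker Engine is running, the Docker CLI talks to it, and Docker Hub supplies ready-made images (the node:14 image below is just an example):
docker --version      # confirms the Docker CLI is installed
docker info           # confirms the CLI can talk to the Docker Engine
docker pull node:14   # downloads the node:14 image from Docker Hub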
There is one more thing we need to know before we see things practically: the Dockerfile.
What is a Dockerfile?
The Dockerfile describes your image. It is typically prepared as follows (a generic sketch follows the list):
First, set the base image (node, ubuntu, ...)
Then add all the software to install
Then copy all the files that will be required in the container
Then give the commands for building the project
Then expose the right set of ports
And then start the process
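Here is a generic sketch of that structure for a hypothetical Node project (the curl install is only there to illustrate the "install software" step; the actual Dockerfile for this tutorial appears later):
# 1. set the base image
FROM node:14
# 2. install any extra software (hypothetical example)
RUN apt-get update && apt-get install -y curl
# 3. copy the files required in the container
WORKDIR /usr/src/app
COPY . .
# 4. build the project
RUN npm install && npm run build
# 5. expose the right port
EXPOSE 3000
# 6. start the process
CMD ["npm", "start"]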
Okay! So we are all set to understand it practically. We will first be looking into a very simple Node project. You can clone it from here: https://github.com/career-tokens/DockerTutorial-Basics
Let's understand Docker using a project
I hope you have cloned it. For now, we will be looking into dockerTutorial1.
Under it, let's have a look at these two files:
package.json:
{
  "name": "dockerTutorial1",
  "version": "1.0.0",
  "main": "index.js",
  "license": "MIT",
  "scripts": {
    "build": "npx tsc -b",
    "start": "node dist/index.js"
  },
  "dependencies": {
    "express": "^4.18.2",
    "tsc": "^2.0.4",
    "typescript": "^5.3.3"
  }
}
You can see the dependencies and the equivalent commands for npm run build and npm run start (I have explained the usage of these commands in my TypeScript blog).
src/index.ts:
import express from "express";
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening on port ${port}`)
})
A simple Express application: a GET request to url + "/" returns "Hello World!" as the response.
I believe the project is pretty simple. Now let's understand how we would run the project without dockerisation.
Let's try to run it in the default way
First we would install the dependencies:
npm install
Then we would build the project (which here specifically means converting the TS files to JS files):
npm run build
You might see an error here saying This is not the tsc command you are looking for. In case you do, install typescript separately:
npm install typescript
Then run the index.js file from the dist folder using the equivalent command:
npm run start
You will find your app running on port 3000.
As the maker of the codebase, you wouldn't want new developers to go through such hassle, so what you would do is use Docker in your project.
Let's use Docker to run this project
As the maker of the project, you would create a Dockerfile in the project. Let's see the Dockerfile we would use:
# use node version 14 as the base image
FROM node:14
# the working directory inside the container will be /usr/src/app
WORKDIR /usr/src/app
# copy all the folders and files from the project root
COPY . .
# install the dependencies
RUN npm install
# build the project
RUN npm run build
# run npm start when the container starts
CMD ["npm", "start"]
# expose port 3000
EXPOSE 3000
Now a new developer would first clone the project and then create the image from the Dockerfile using the following command (you can use PowerShell, cmd or any other terminal to run these commands, provided Docker Engine is running in the background):
docker build -t simpleproject .
This command builds an image named simpleproject using the Dockerfile present in the current directory (as indicated by the trailing .).
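You can optionally attach a tag to the image name as well; for example (the 1.0 tag is purely illustrative):
docker build -t simpleproject:1.0 .
Without an explicit tag, Docker uses latest by default.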
Then the developer would run a container based on the image:
docker run simpleproject
It will say your app is running on port 3000, but when you actually go to localhost:3000 you will find the page doesn't exist. Why is that?
It is because we are accessing port 3000 of our local machine and not port 3000 of the container. We need to connect the two so that a request made to port 3000 of the local machine is forwarded to port 3000 of the container.
Let's do that now:
docker run -p 3000:3000 simpleproject
In case you face a port already allocated error, you can use some other port of your local machine, let's say 4000:
docker run -p 4000:3000 simpleproject
What happens is that when you make a GET request to localhost:4000, it is forwarded to port 3000 of the container and you are able to see the response Hello World!
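You can verify this from another terminal, assuming curl is installed (a browser works just as well):
curl http://localhost:4000
# prints: Hello World!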
Let's discuss some other commands:
docker images
It shows the Docker images currently present on your machine.
The images are quite heavy in size. For our current application, the Node base image itself takes up a lot of space.
Next is:
docker ps
It shows the currently running containers. You can kill a container by using its container id:
docker kill <container id>
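A few related housekeeping commands you may find useful:
docker ps -a                 # lists all containers, including stopped ones
docker stop <container id>   # stops a container gracefully instead of killing it
docker rm <container id>     # removes a stopped container
docker rmi simpleproject     # removes the image once you no longer need it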
One question which might arise is why we don't keep node_modules in the codebase and copy it over to our image, rather than separately running npm install.
It's because:
npm install sometimes installs OS-specific versions of dependencies, and you wouldn't want to lose that flexibility.
There might be some dev dependencies installed in node_modules which are not required in production, so we can simply omit them while preparing the image using npm install --omit=dev (a .dockerignore sketch follows this list).
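To make sure a locally installed node_modules folder never gets copied into the image by COPY . ., you can also add a .dockerignore file next to the Dockerfile, for example:
# keep locally installed dependencies out of the image; they are installed inside it by RUN npm install
node_modules
# keep locally built output out as well; it is rebuilt inside the image by RUN npm run build
dist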
We will also need to implement layer optimisation.
So let's discuss layer optimisation in a little more detail.
Layer Optimisation
docker build builds the image layer by layer and is quite an expensive process, so it would be better if we could reuse as many layers as possible from the cache to save time and resources.
Let's build the image again and you will see this:
Each of the steps is cached. Why? Because nothing has changed since the last time we built it, so we could reuse everything from the cache.
Make a change in the index.ts file, let's say add a space or something, and run the build again:
We will find that this time we could reuse only the node base image and the working directory from the cache, because we made changes in the src folder. Well, we can do better than this. Move to the dockerTutorial2 project (and point your terminal there as well).
See the new Dockerfile, which is actually better than the previous one:
# use node version 14 as the base image
FROM node:14
# the working directory inside the container will be /usr/src/app
WORKDIR /usr/src/app
# copy only the dependency and build configuration first
COPY package*.json ./
COPY tsconfig.json ./
# install the dependencies
RUN npm install
# now copy the rest of the folders and files from the project root
COPY . .
# build the project
RUN npm run build
# run npm start when the container starts
CMD ["npm", "start"]
# expose port 3000
EXPOSE 3000
Build the image and in the console you will see this:
There is only some caching because this project's image is being built for the first time. Make some changes in the index.ts file and then build the image again:
Now you will see that we can reuse a whole lot from the cache. Most importantly, we could cache npm install, which takes a good amount of time. The reason it can be cached is that package.json and tsconfig.json normally remain unchanged and it is mostly the src folder (or some equivalent folder) that the developer changes; since those config files are copied before RUN npm install, that layer stays valid.
Wrapping up!
I hope I was able to explain the basics of Docker and some practical, basic applications of it. Take time to let this marinate, and best of luck with your open-source journey.
Thanks for reading!