A VM (virtual machine) runs on a hypervisor (VMware, KVM, VirtualBox) that virtualizes the full hardware stack, and each VM boots its own OS kernel. That gives strong isolation and lets the guest run a different OS from the host, for example Windows on a Linux host. The cost is overhead: every VM carries a full guest OS, which often means gigabytes of RAM and disk, and boot times range from several seconds to minutes. Containers share the host OS kernel and isolate only userspace, so they are much smaller (megabytes instead of gigabytes), typically start in well under a second, and allow dense packing (hundreds of containers per host).
Containers are not magic: they are simply Linux processes with namespace isolation and cgroup resource limits. When Docker creates a container it sets up:
- Namespaces — PID namespace (the container has its own PID 1), Network namespace (its own network interface), Mount namespace (its own filesystem view), UTS (its own hostname), IPC, and User namespaces
- cgroups to limit CPU, memory, and disk I/O (network traffic can be tagged per cgroup, but bandwidth shaping itself is done with tools such as `tc`)
- a union filesystem (overlay2) for layered images.
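
To make the list above concrete, here is a minimal sketch that reads the namespace and cgroup membership of a process straight from `/proc`. It assumes a Linux host with `/proc` mounted; the helper names `list_namespaces` and `read_cgroup_membership` are made up for this illustration and are not part of Docker.

```python
import os


def list_namespaces(pid="self"):
    """Map namespace kind (pid, net, mnt, uts, ipc, user, ...) to its identifier.

    Each entry in /proc/<pid>/ns is a symlink whose target names the
    namespace the process belongs to.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, /proc not mounted, or no such process
    for name in sorted(entries):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            result[name] = None  # not allowed to inspect this entry
    return result


def read_cgroup_membership(pid="self"):
    """Return the lines of /proc/<pid>/cgroup split into up to three fields:
    (hierarchy_id, controllers, path)."""
    try:
        with open(f"/proc/{pid}/cgroup") as fh:
            return [tuple(line.rstrip("\n").split(":", 2))
                    for line in fh if line.strip()]
    except OSError:
        return []


if __name__ == "__main__":
    for kind, ident in list_namespaces().items():
        print(f"{kind:20} {ident}")
    for entry in read_cgroup_membership():
        print(entry)
```

Running this once on the host and once inside a container should show different namespace identifiers for the kinds the container runtime isolates, which is a quick way to see that a container is an ordinary process with different namespace memberships.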
Under the hood, `docker run` (through containerd and runc) essentially ends up calling `clone(CLONE_NEWPID | CLONE_NEWNET | ...)`.
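
The flag arithmetic behind that call can be sketched as follows. This is not how Docker or runc is implemented: calling `clone` directly from Python is awkward, so the sketch swaps in the related `unshare(2)` syscall, which accepts the same `CLONE_NEW*` flags and moves the calling process into new namespaces. The flag values are the constants from `<linux/sched.h>`; the helper names `namespace_flags` and `unshare_in_child` are hypothetical, and the sketch assumes Linux.

```python
import ctypes
import os

# CLONE_NEW* namespace flags as defined in <linux/sched.h>.
CLONE_FLAGS = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def namespace_flags(*kinds):
    """OR together the CLONE_NEW* flags for the requested namespace kinds."""
    flags = 0
    for kind in kinds:
        flags |= CLONE_FLAGS[kind]
    return flags


def unshare_in_child(flags):
    """Fork, call unshare(2) in the child, and return the child's exit code:
    0 on success, the errno on failure, 255 if libc could not be used."""
    pid = os.fork()
    if pid == 0:
        # Child: only this throwaway process changes its namespaces.
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            rc = libc.unshare(flags)
            code = 0 if rc == 0 else ctypes.get_errno()
        except Exception:
            code = 255
        os._exit(code & 0xFF)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(hex(namespace_flags("pid", "net")))
```

Calling `unshare_in_child(namespace_flags("user", "uts"))` returns 0 on systems that allow unprivileged user namespaces and a non-zero errno such as `EPERM` otherwise. Running the `unshare` call in a forked child keeps the experiment from changing the namespaces of the calling process.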
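Security: because every container shares the host kernel, a container escape exploits a kernel bug or a misconfiguration reachable through syscalls to break out of namespace isolation. Privileged containers (`docker run --privileged`) are dangerous because they bypass many of these restrictions: they receive all capabilities and access to host devices, and the default seccomp and AppArmor confinement is switched off.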
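
One rough way to see how much privilege a process holds is to decode its effective capability mask from `/proc/self/status`. The sketch below is a heuristic only, assuming Linux: the capability bit numbers come from `<linux/capability.h>`, while the helper names and the choice of which capabilities to flag as risky are assumptions made for this illustration.

```python
# Capability bit numbers from <linux/capability.h>.
RISKY_CAPS = {
    "CAP_SYS_MODULE": 16,  # load kernel modules
    "CAP_SYS_PTRACE": 19,  # trace arbitrary processes
    "CAP_SYS_ADMIN": 21,   # mount, namespace setup, and much more
}


def parse_cap_eff(status_text):
    """Extract the effective capability mask (CapEff) as an int,
    or None if the field is missing."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    return None


def risky_capabilities(mask):
    """Return the names of the flagged capabilities that are set in mask."""
    return [name for name, bit in RISKY_CAPS.items() if (mask >> bit) & 1]


def read_own_status():
    """Read /proc/self/status; return an empty string where it is unavailable."""
    try:
        with open("/proc/self/status") as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    mask = parse_cap_eff(read_own_status())
    if mask is None:
        print("CapEff not available on this system")
    else:
        print(f"CapEff={mask:#x} risky={risky_capabilities(mask)}")
```

A process in a container started with Docker's default capability set should report none of the three, while the same check inside a privileged container (or as root on the host) should report all of them. An empty result does not prove the container is safe; it only shows that these particular capabilities are absent.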