Kernel Buffer
Kernel buffers are memory the kernel allocates and manages (kernel space), used to stage/hold data for things like file I/O, networking, pipes, etc.
Some categories:
- Filesystem Page Cache
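Each socket has a receive buffer in the kernel for incoming UDP and TCP data. If the application doesn't read fast enough, the buffer fills up. For UDP, newly arriving datagrams are dropped once the buffer is full (data already queued stays). For TCP, nothing is discarded: the kernel advertises a smaller receive window, so the sender slows down and eventually stops until you read.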
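To make the per-socket receive buffer concrete, here is a minimal sketch that asks the kernel for its default size via the standard `SO_RCVBUF` socket option. The helper name `receive_buffer_sizes` is made up for these notes, and the numbers it returns depend on the OS and its configuration.

```python
import socket

def receive_buffer_sizes():
    """Return the default receive-buffer size (bytes) for a UDP and a TCP socket."""
    sizes = {}
    for name, kind in (("udp", socket.SOCK_DGRAM), ("tcp", socket.SOCK_STREAM)):
        with socket.socket(socket.AF_INET, kind) as sock:
            # SO_RCVBUF reports how much incoming data the kernel will hold
            # for this socket before the application reads it.
            sizes[name] = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sizes

if __name__ == "__main__":
    print(receive_buffer_sizes())
```

The same option can be passed to `setsockopt()` to request a larger buffer, though the kernel may cap or adjust the value.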
UDP: packet arrives → kernel queues datagram → your recvfrom() reads it
Sender does sendto().
Receiver roughly does:
- sock = socket(AF_INET, SOCK_DGRAM, 0)
- bind(sock, ip:port) (or 0.0.0.0:port to accept on all interfaces)
- Kernel receives UDP datagrams destined for that port and puts them in a queue for that socket.
- Your thread calls recvfrom(sock, ...)
  - If there’s a datagram queued: you get it immediately.
  - If not: recvfrom() blocks (unless non-blocking).
UDP is message-oriented: one recv = one datagram. If the datagram is larger than the buffer you pass in, the excess bytes are discarded, not saved for the next call.
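The UDP flow above can be sketched in Python on the loopback interface. This is only an illustration: the function name `udp_roundtrip`, the 2048-byte read size, and the 2-second timeout are assumptions for this note, and real UDP can drop or reorder datagrams.

```python
import socket

def udp_roundtrip(messages):
    """Send each message as one datagram over loopback and read them back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
        receiver.settimeout(2.0)          # don't block forever if a datagram is lost
        address = receiver.getsockname()

        for message in messages:
            # The kernel queues each datagram on the receiver's socket.
            sender.sendto(message, address)

        received = []
        for _ in messages:
            # One recvfrom() call returns one whole datagram, never a partial one
            # (as long as the buffer passed in is large enough).
            data, _peer = receiver.recvfrom(2048)
            received.append(data)
        return received

if __name__ == "__main__":
    print(udp_roundtrip([b"one", b"two", b"three"]))
```

Note that all three datagrams are sent before the first `recvfrom()`, so they sit in the kernel's queue for that socket until the application reads them.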
TCP: connection first → kernel buffers a byte-stream → your recv() reads bytes
TCP is different: it’s a stream, not discrete messages.
Server side:
- listen_sock = socket(AF_INET, SOCK_STREAM, 0)
- bind(listen_sock, ip:port)
- listen(listen_sock)
- Client connects → kernel completes the handshake
- Your code calls accept(listen_sock) → returns conn_sock (a new socket per connection)
- Kernel buffers incoming bytes for conn_sock
- Your code calls recv(conn_sock, ...) to read bytes
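As a rough sketch of the server-side steps above, the following Python accepts a single loopback connection and reads until the client closes. The name `tcp_roundtrip`, the in-process client thread, and the 4096-byte read size are assumptions made to keep the example self-contained.

```python
import socket
import threading

def tcp_roundtrip(payload):
    """Accept one loopback connection and read bytes until the client closes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listen_sock:
        listen_sock.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
        listen_sock.listen()
        address = listen_sock.getsockname()

        def client():
            with socket.create_connection(address) as client_sock:
                client_sock.sendall(payload)
            # Leaving the with-block closes the socket; the server sees that as EOF.

        thread = threading.Thread(target=client)
        thread.start()

        # accept() returns a new socket dedicated to this one connection.
        conn_sock, _peer = listen_sock.accept()
        chunks = []
        with conn_sock:
            while True:
                chunk = conn_sock.recv(4096)   # may return anywhere from 1 to 4096 bytes
                if not chunk:                  # b"" means the peer closed the connection
                    break
                chunks.append(chunk)
        thread.join()
        return b"".join(chunks)

if __name__ == "__main__":
    print(tcp_roundtrip(b"hello over tcp"))
```

The loop is needed because a single `recv()` is not guaranteed to return everything the client sent.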
Important: TCP recv() returns “however many bytes are currently available” (could be half a message, 3 messages, etc.). So apps add framing:
- length-prefix (common in binary protocols)
- delimiter (\n)
- fixed-size messages
That’s why “ordering matters” in TCP: the kernel delivers bytes in the order they were sent, and framing relies on that to find message boundaries. “Corruption” isn’t the main reason.
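Length-prefix framing, the first option in the list above, can be sketched like this. The 4-byte big-endian header and the names `frame` and `FrameDecoder` are assumptions chosen for illustration, not a standard API.

```python
import struct

HEADER = struct.Struct("!I")  # assumed header: 4-byte big-endian message length

def frame(message):
    """Prefix a message with its length so the receiver can find its end."""
    return HEADER.pack(len(message)) + message

class FrameDecoder:
    """Reassembles whole messages from arbitrarily split recv() chunks."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add bytes from one recv() call; return every message now complete."""
        self._buffer.extend(chunk)
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # body not fully received yet; wait for more bytes
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return messages

if __name__ == "__main__":
    stream = frame(b"hi") + frame(b"world")
    decoder = FrameDecoder()
    for i in range(0, len(stream), 3):  # simulate recv() returning 3 bytes at a time
        print(decoder.feed(stream[i:i + 3]))
```

The decoder gives the same messages whether the stream arrives in one chunk or in many small ones, which is the point of framing.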
Is every receive a system call, since the data resides in a kernel buffer?
Usually yes: a recv() / recvfrom() / read() on a socket is a system call (or at least it logically is), because you’re asking the kernel to copy data out of the kernel’s socket buffer into your user-space memory.
“With normal sockets, receiving data involves syscalls like recv() because data is in kernel buffers and must be copied into user space. High-performance systems reduce syscall overhead by batching (recvmmsg), reading larger chunks, and using event loops; extreme low-latency setups may use kernel bypass.”
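To illustrate the "reading larger chunks" point, here is a small sketch that counts how many `recv()` calls it takes to drain the same data with different read sizes. It uses `socket.socketpair()` as a stand-in for a real connection; the function name `count_recv_calls` and the sizes used are assumptions, and it does not demonstrate `recvmmsg` or kernel bypass.

```python
import socket

def count_recv_calls(total_bytes, chunk_size):
    """Return (calls, bytes_received) for draining total_bytes with chunk_size reads.

    total_bytes should be small enough to fit in the socket buffer,
    otherwise sendall() would block before anything is read.
    """
    left, right = socket.socketpair()
    with left, right:
        left.sendall(b"x" * total_bytes)
        left.shutdown(socket.SHUT_WR)   # signal end-of-stream to the reader
        calls = 0
        received = 0
        while True:
            chunk = right.recv(chunk_size)
            calls += 1                  # each recv() is one trip into the kernel
            if not chunk:               # b"" means end-of-stream
                break
            received += len(chunk)
        return calls, received

if __name__ == "__main__":
    print("64-byte reads:  ", count_recv_calls(4096, 64))
    print("4096-byte reads:", count_recv_calls(4096, 4096))
```

Small reads need many more calls to move the same bytes, which is the overhead that batching and larger buffers try to cut.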