mirror of
https://libwebsockets.org/repo/libwebsockets
synced 2024-12-04 13:57:15 +00:00
286 lines
13 KiB
Markdown
286 lines
13 KiB
Markdown
# Considerations around Event Loops
|
|
|
|
Much of the software we use is written around an **event loop**. Some examples
|
|
|
|
- Chrome / Chromium, transmission, tmux, ntp SNTP... [libevent](https://libevent.org/)
|
|
- node.js / cjdns / Julia / cmake ... [libuv](https://archive.is/64pOt)
|
|
- Gstreamer, Gnome / GTK apps ... [glib](https://people.gnome.org/~desrt/glib-docs/glib-The-Main-Event-Loop.html)
|
|
- SystemD ... sdevent
|
|
- OpenWRT ... uloop
|
|
|
|
Many applications roll their own event loop using poll() or epoll() or similar,
|
|
using the same techniques. Another set of apps use message dispatchers that
|
|
take the same approach, but are for cases that don't need to support sockets.
|
|
Event libraries provide crossplatform abstractions for this functoinality, and
|
|
provide the best backend for their event waits on the platform automagically.
|
|
|
|
libwebsockets networking operations require an event loop, it provides a default
|
|
one for the platform (based on poll() for Unix) if needed, but also can natively
|
|
use any of the event loop libraries listed above, including "foreign" loops
|
|
already created and managed by the application.
|
|
|
|
## What is an 'event loop'?
|
|
|
|
Event loops have the following characteristics:
|
|
|
|
- they have a **single thread**, therefore they do not require locking
|
|
- they are **not threadsafe**
|
|
- they require **nonblocking IO**
|
|
- they **sleep** while there are no events (aka the "event wait")
|
|
- if one or more event seen, they call back into user code to handle each in
|
|
turn and then return to the wait (ie, "loop")
|
|
|
|
### They have a single thread
|
|
|
|
By doing everything in turn on a single thread, there can be no possibility of
|
|
conflicting access to resources from different threads... if the single thread
|
|
is in callback A, it cannot be in two places at the same time and also in
|
|
callback B accessing the same thing: it can never run any other code
|
|
concurrently, only sequentially, by design.
|
|
|
|
It means that all mutexes and other synchronization and locking can be
|
|
eliminated, along with the many kinds of bugs related to them.
|
|
|
|
### They are not threadsafe
|
|
|
|
Event loops mandate doing everything in a single thread. You cannot call their
|
|
apis from other threads, since there is no protection against reentrancy.
|
|
|
|
Lws apis cannot be called safely from any thread other than the event loop one,
|
|
with the sole exception of `lws_cancel_service()`.
|
|
|
|
### They have nonblocking IO
|
|
|
|
With blocking IO, you have to create threads in order to block them to learn
|
|
when your IO could proceed. In an event loop, all descriptors are set to use
|
|
nonblocking mode, we only attempt to read or write when we have been informed by
|
|
an event that there is something to read, or it is possible to write.
|
|
|
|
So sacrificial, blocking discrete IO threads are also eliminated, we just do
|
|
what we should do sequentially, when we get the event indicating that we should
|
|
do it.
|
|
|
|
### They sleep while there are no events
|
|
|
|
An OS "wait" of some kind is used to sleep the event loop thread until something
|
|
to do. There's an explicit wait on file descriptors that have pending read or
|
|
write, and also an implicit wait for the next scheduled event. Even if idle for
|
|
descriptor events, the event loop will wake and handle scheduled events at the
|
|
right time.
|
|
|
|
In an idle system, the event loop stays in the wait and takes 0% CPU.
|
|
|
|
### If one or more event, they handle them and then return to sleep
|
|
|
|
As you can expect from "event loop", it is an infinite loop alternating between
|
|
sleeping in the event wait and sequentially servicing pending events, by calling
|
|
callbacks for each event on each object.
|
|
|
|
The callbacks handle the event and then "return to the event loop". The state
|
|
of things in the loop itself is guaranteed to stay consistent while in a user
|
|
callback, until you return from the callback to the event loop, when socket
|
|
closes may be processed and lead to object destruction.
|
|
|
|
Event libraries like libevent are operating the same way, once you start the
|
|
event loop, it sits in an inifinite loop in the library, calling back on events
|
|
until you "stop" or "break" the loop by calling apis.
|
|
|
|
## Why are event libraries popular?
|
|
|
|
Developers prefer an external library solution for the event loop because:
|
|
|
|
- the quality is generally higher than self-rolled ones. Someone else is
|
|
maintaining it, a fulltime team in some cases.
|
|
- the event libraries are crossplatform, they will pick the most effective
|
|
event wait for the platform without the developer having to know the details.
|
|
For example most libs can conceal whether the platform is windows or unix,
|
|
and use native waits like epoll() or WSA accordingly.
|
|
- If your application uses a event library, it is possible to integrate very
|
|
cleanly with other libraries like lws that can use the same event library.
|
|
That is extremely messy or downright impossible to do with hand-rolled loops.
|
|
|
|
Compared to just throwing threads on it
|
|
|
|
- thread lifecycle has to be closely managed, threads must start and must be
|
|
brought to an end in a controlled way. Event loops may end and destroy
|
|
objects they control at any time a callback returns to the event loop.
|
|
|
|
- threads may do things sequentially or genuinely concurrently, this requires
|
|
locking and careful management so only deterministic and expected things
|
|
happen at the user data.
|
|
|
|
- threads do not scale well to, eg, serving tens of thousands of connections;
|
|
web servers use event loops.
|
|
|
|
## Multiple codebases cooperating on one event loop
|
|
|
|
The ideal situation is all your code operates via a single event loop thread.
|
|
For lws-only code, including lws_protocols callbacks, this is the normal state
|
|
of affairs.
|
|
|
|
When there is other code that also needs to handle events, say already existing
|
|
application code, or code handling a protocol not supported by lws, there are a
|
|
few options to allow them to work together, which is "best" depends on the
|
|
details of what you're trying to do and what the existing code looks like.
|
|
In descending order of desirability:
|
|
|
|
### 1) Use a common event library for both lws and application code
|
|
|
|
This is the best choice for Linux-class devices. If you write your application
|
|
to use, eg, a libevent loop, then you only need to configure lws to also use
|
|
your libevent loop for them to be able to interoperate perfectly. Lws will
|
|
operate as a guest on this "foreign loop", and can cleanly create and destroy
|
|
its context on the loop without disturbing the loop.
|
|
|
|
In addition, your application can merge and interoperate with any other
|
|
libevent-capable libraries the same way, and compared to hand-rolled loops, the
|
|
quality will be higher.
|
|
|
|
### 2) Use lws native wsi semantics in the other code too
|
|
|
|
Lws supports raw sockets and file fd abstractions inside the event loop. So if
|
|
your other code fits into that model, one way is to express your connections as
|
|
"RAW" wsis and handle them using lws_protocols callback semantics.
|
|
|
|
This ties the application code to lws, but it has the advantage that the
|
|
resulting code is aware of the underlying event loop implementation and will
|
|
work no matter what it is.
|
|
|
|
### 3) Make a custom lws event lib shim for your custom loop
|
|
|
|
Lws provides an ops struct abstraction in order to integrate with event
|
|
libraries, you can find it in ./includes/libwebsockets/lws-eventlib-exports.h.
|
|
|
|
Lws uses this interface to implement its own event library plugins, but you can
|
|
also use it to make your own customized event loop shim, in the case there is
|
|
too much written for your custom event loop to be practical to change it.
|
|
|
|
In other words this is a way to write a customized event lib "plugin" and tell
|
|
the lws_context to use it at creation time. See [minimal-http-server.c](https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/http-server/minimal-http-server-eventlib-custom/minimal-http-server.c)
|
|
|
|
### 4) Cooperate at thread level
|
|
|
|
This is less desirable because it gives up on unifying the code to run from a
|
|
single thread, it means the codebases cannot call each other's apis directly.
|
|
|
|
In this scheme the existing threads do their own thing, lock a shared
|
|
area of memory and list what they want done from the lws thread context, before
|
|
calling `lws_cancel_service()` to break the lws event wait. Lws will then
|
|
broadcast a `LWS_CALLBACK_EVENT_WAIT_CANCELLED` protocol callback, the handler
|
|
for which can lock the shared area and perform the requested operations from the
|
|
lws thread context.
|
|
|
|
### 5) Glue the loops together to wait sequentially (don't do this)
|
|
|
|
If you have two or more chunks of code with their own waits, it may be tempting
|
|
to have them wait sequentially in an outer event loop. (This is only possible
|
|
with the lws default loop and not the event library support, event libraries
|
|
have this loop inside their own `...run(loop)` apis.)
|
|
|
|
```
|
|
while (1) {
|
|
do_lws_wait(); /* interrupted at short intervals */
|
|
do_app_wait(); /* interrupted at short intervals */
|
|
}
|
|
```
|
|
|
|
This never works well, either:
|
|
|
|
- the whole thing spins at 100% CPU when idle, or
|
|
|
|
- the waits have timeouts where they sleep for short periods, but then the
|
|
latency to service on set of events is increased by the idle timeout period
|
|
of the wait for other set of events
|
|
|
|
## Common Misunderstandings
|
|
|
|
### "Real Men Use Threads"
|
|
|
|
Sometimes you need threads or child processes. But typically, whatever you're
|
|
trying to do does not literally require threads. Threads are an architectural
|
|
choice that can go either way depending on the goal and the constraints.
|
|
|
|
Any thread you add should have a clear reason to specifically be a thread and
|
|
not done on the event loop, without a new thread or the consequent locking (and
|
|
bugs).
|
|
|
|
### But blocking IO is faster and simpler
|
|
|
|
No, blocking IO has a lot of costs to conceal the event wait by blocking.
|
|
|
|
For any IO that may wait, you must spawn an IO thread for it, purely to handle
|
|
the situation you get blocked in read() or write() for an arbitrary amount of
|
|
time. It buys you a simple story in one place, that you will proceed on the
|
|
thread if read() or write() has completed, but costs threads and locking to get
|
|
to that.
|
|
|
|
Event loops dispense with the threads and locking, and still provide a simple
|
|
story, you will get called back when data arrives or you may send.
|
|
|
|
Event loops can scale much better, a busy server with 50,000 connections active
|
|
does not have to pay the overhead of 50,000 threads and their competing for
|
|
locking.
|
|
|
|
With blocked threads, the thread can do no useful work at all while it is stuck
|
|
waiting. With event loops the thread can service other events until something
|
|
happens on the fd.
|
|
|
|
### Threads are inexpensive
|
|
|
|
In the cases you really need threads, you must have them, or fork off another
|
|
process. But if you don't really need them, they bring with them a lot of
|
|
expense, some you may only notice when your code runs on constrained targets
|
|
|
|
- threads have an OS-side footprint both as objects and in the scheduler
|
|
|
|
- thread context switches are not slow on modern CPUs, but have side effects
|
|
like cache flushing
|
|
|
|
- threads are designed to be blocked for arbitrary amounts of time if you use
|
|
blocking IO apis like write() or read(). Then how much concurrency is really
|
|
happening? Since blocked threads just go away silently, it is hard to know
|
|
when in fact your thread is almost always blocked and not doing useful work.
|
|
|
|
- threads require their own stack, which is on embedded is typically suffering
|
|
from a dedicated worst-case allocation where the headroom is usually idle
|
|
|
|
- locking must be handled, and missed locking or lock order bugs found
|
|
|
|
### But... what about latency if only one thing happens at a time?
|
|
|
|
- Typically, at CPU speeds, nothing is happening at any given time on most
|
|
systems, the event loop is spending most of its time in the event wait
|
|
asleep at 0% cpu.
|
|
|
|
- The POSIX sockets layer is disjoint from the actual network device driver.
|
|
It means that once you hand off the packet to the networking stack, the POSIX
|
|
api just returns and leaves the rest of the scheduling, retries etc to the
|
|
networking stack and device, descriptor queuing is driven by interrupts in
|
|
the driver part completely unaffected by the event loop part.
|
|
|
|
- Passing data around via POSIX apis between the user code and the networking
|
|
stack tends to return almost immediately since its onward path is managed
|
|
later in another, usually interrupt, context.
|
|
|
|
- So long as enough packets-worth of data are in the network stack ready to be
|
|
handed to descriptors, actual throughput is completely insensitive to jitter
|
|
or latency at the application event loop
|
|
|
|
- The network device itself is inherently serializing packets, it can only send
|
|
one thing at a time. The networking stack locking also introduces hidden
|
|
serialization by blocking multiple threads.
|
|
|
|
- Many user systems are decoupled like the network stack and POSIX... the user
|
|
event loop and its latencies do not affect backend processes occurring in
|
|
interrupt or internal thread or other process contexts
|
|
|
|
## Conclusion
|
|
|
|
Event loops have been around for a very long time and are in wide use today due
|
|
to their advantages. Working with them successfully requires understand how to
|
|
use them and why they have the advantages and restrictions they do.
|
|
|
|
The best results come from all the participants joining the same loop directly.
|
|
Using a common event library in the participating codebases allows completely
|
|
different code can call each other's apis safely without locking.
|