|Posted by Pype @ 09.12.01
Still cooking ...
Related cakes: hot-swapping scheduler, process manager
Even if Clicker is currently unable to run on multiprocessor hardware, the scheduler has been designed to simplify extension to such systems. At an early design stage, the multitasking system was split into two logical components:
- the scheduler, which deals with task priorities and defines a global schedule for the whole system. It defines the actual order of execution for each processing unit and will - in future versions - also deal with CPU time reservation for some high-priority threads, to ensure enough resources remain to meet the deadlines of multimedia/realtime components.
- the dispatchers, which perform task switching for a given processing unit, following the schedule defined by the single scheduler. Dispatchers are a kind of Hardware Abstraction Layer for task switching and are, de facto, architecture-dependent.
Each dispatcher only deals with a single processor, whereas the scheduler must have global knowledge of the system. Information is exchanged between the scheduler and the dispatchers through wait queues in which kThread structures are stored.
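The two-component split can be sketched in C. kThread and kDispatcher are real Clicker structure names, but the field layout and the dispatch_pick() helper below are illustrative assumptions, not the actual kernel code:

```c
#include <stddef.h>

/* Illustrative sketch only: kThread/kDispatcher exist in Clicker,
 * but these field names are assumptions made for the example. */
typedef struct kThread {
    int quantum;              /* remaining clock ticks */
    int backme;               /* 1: return me to the scheduler on switch */
    struct kThread *next;     /* wait-queue link */
} kThread;

typedef struct kDispatcher {
    kThread *urgent;          /* served before 'normal' */
    kThread *normal;          /* regular FIFO queue */
    kThread *active;          /* thread currently owning this CPU */
} kDispatcher;

/* Dispatcher side of the interface: pop the next thread to run,
 * urgent queue first, FIFO order within each queue. */
static kThread *dispatch_pick(kDispatcher *d)
{
    kThread **q = d->urgent ? &d->urgent : &d->normal;
    kThread *t = *q;
    if (t)
        *q = t->next;
    return t;
}
```

The scheduler only ever touches the two queues; the dispatcher only ever pops from them, which is what keeps the two halves portable independently of each other.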
In Clicker's scheduling model, each dispatcher keeps local control of its threads (and, for instance, will have its own quantum interrupt through a local APIC), while the scheduler acts as a shared component - optionally needing inter-processor signals to co-ordinate the whole system.
model view of the Clicker scheduler managing 3 distinct CPUs
This component deals with the "lordly" part of task management. If the dispatcher/scheduler interface is correctly respected, a given scheduler can potentially be ported to several architectures at very little cost.
In addition to the dispatcher/scheduler interface, the scheduler can be controlled through generic thread-SPI calls like threadActivate(), threadSuspend() or threadResched(). Any thread insertion into the system goes through the scheduler, which then selects a dispatcher to run that thread and places the kThread structure at the right position in the dispatcher's queue.
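How the scheduler might pick a dispatcher for a newly activated thread can be sketched as follows. The text doesn't describe Clicker's actual selection policy, so the "least loaded" rule and all the names below are assumptions made for illustration:

```c
#include <stddef.h>

/* Hypothetical sketch: on a threadActivate() call, the scheduler must
 * choose which dispatcher (i.e. which CPU) will run the thread.
 * "Least loaded" is an assumed policy, not Clicker's documented one. */
typedef struct {
    int load;   /* e.g. number of threads queued on this dispatcher */
} kDispatcher;

static kDispatcher *select_dispatcher(kDispatcher *d, size_t n)
{
    kDispatcher *best = &d[0];
    for (size_t i = 1; i < n; i++)
        if (d[i].load < best->load)
            best = &d[i];
    return best;
}
```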
The Clicker microkernel is distributed with a minimal round-robin scheduler for monoprocessor systems. If another scheduling policy is needed, the scheduler can be dynamically replaced by another server that implements the "services.sys.scheduler:default" interface.
The default scheduler's behaviour on thread activation is to reset the thread's quantum to 0 (which means a thread cannot accumulate unused time from the past for a longer quantum in the future) and to put it at the end of the dispatching queue.
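That default rule is simple enough to sketch; the queue layout and the function name below are assumptions, only the "quantum to 0, append at tail" behaviour comes from the text:

```c
#include <stddef.h>

typedef struct kThread {
    int quantum;
    struct kThread *next;
} kThread;

/* Sketch of the default scheduler's activation rule: drop any unused
 * quantum and append the thread at the tail of the dispatching queue. */
static void default_activate(kThread **queue, kThread *t)
{
    t->quantum = 0;     /* no credit carried over from the past */
    t->next = NULL;
    while (*queue)      /* walk to the tail: plain FIFO round robin */
        queue = &(*queue)->next;
    *queue = t;
}
```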
In the generic version, the Clicker scheduler can deal with several dispatchers, each of them materialized by a kDispatcher structure. The scheduler can discover all these dispatchers by looking at "services-var.sys.scheduler.dispatchers.*". This KDS group has one entry per dispatcher, whose name is the logical CPU name (usually "CPUx", with x ranging from 0 to nbCpus-1).
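The KDS lookup calls themselves are not shown in the text, so only the naming convention can be illustrated; building one of those per-CPU entry names is a one-liner:

```c
#include <stdio.h>

/* Build the logical CPU name used as the KDS entry name for a
 * dispatcher ("CPU0" ... "CPU{nbCpus-1}"). The KDS API itself is not
 * described in the text and is therefore not sketched here. */
static void cpu_entry_name(char *buf, size_t len, int cpu)
{
    snprintf(buf, len, "CPU%d", cpu);
}
```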
How the dispatcher works
The dispatcher simply leaves control to a thread (the active thread for that dispatcher) until a signal tells it to switch to the next thread. This signal - a call to dispatchNext() - can come from the thread itself (when it waits for an event or a synchronization object), from the dispatcher's internal clock (when the active thread has exceeded its running quantum) or from some other external component: the dispatcher doesn't take the source of that signal into account.
However, a distinction is made between two cases:
- the active thread remains executable on the CPU and the backme flag is set to 1. The dispatcher will then deliver the thread to the scheduler so that it gets pushed back into a wait queue.
- the active thread is suspending itself (and therefore doesn't need the CPU anymore) and resets the backme flag to 0. The active thread is then responsible for having been stored in the wait queue of some synchronization object that will wake it up later, or for having a waking event programmed. If nothing of the sort has been done, the thread will simply be dropped by the system and become unrunnable - still consuming memory for the rest of the system's lifetime.
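The two cases above boil down to one test on the backme flag. dispatchNext() and backme are named in the text; the scheduler entry point and everything else in this sketch are assumptions:

```c
#include <stddef.h>

typedef struct kThread {
    int backme;               /* 1: still runnable, hand me back */
    struct kThread *next;
} kThread;

/* Stand-in for the scheduler side; the real entry point is not named
 * in the text. Records the last thread it received, for the example. */
static kThread *handed_back;
static void scheduler_pushback(kThread *t) { handed_back = t; }

/* Sketch of the decision made when a dispatchNext() signal arrives. */
static void dispatch_next(kThread *active)
{
    if (active->backme)
        scheduler_pushback(active);   /* still executable: re-queue it */
    /* else: the thread suspended itself; it must already sit in some
     * synchronization object's wait queue (or have a waking event
     * programmed), otherwise it is simply dropped. */
}
```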
schematic view of a dispatcher
Each dispatcher actually owns two wait queues into which a scheduler can push threads for execution: the normal queue and the urgent queue. The latter has priority over the former, which means a thread in the normal queue won't be processed while threads are pending in the urgent queue. Within a given queue, the execution order is the one defined by the scheduler (i.e. the dispatcher just executes threads in FIFO order, following the queue order).
Rather than recomputing the next thread at each switch, the dispatcher just processes the head-of-queue thread, and the scheduler defines the queue order so that it reflects the wanted policy. In the simplest system, the position computation is made when pushback() is called, but one could imagine systems where the computation is pinned to a given CPU and deferred, so that the scheduler computes a batch of threads at a time to lower switching overheads.
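A pushback() that computes the position at insertion time might look like this. pushback() is named in the text; the ordering key is an assumption (the text only says the scheduler defines the queue order):

```c
#include <stddef.h>

typedef struct kThread {
    int key;                  /* ordering key chosen by the scheduler -
                                 an assumed stand-in for its policy */
    struct kThread *next;
} kThread;

/* Sketch: insert the thread at the position dictated by its key, so
 * the dispatcher can later consume the queue in plain FIFO order. */
static void pushback(kThread **queue, kThread *t)
{
    while (*queue && (*queue)->key <= t->key)
        queue = &(*queue)->next;
    t->next = *queue;
    *queue = t;
}
```

All the policy work happens here, at insertion; the dispatcher's hot path stays a constant-time head-of-queue pop.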
The 'urgent' queue is reserved for threads that have received an event granting them privileged processing. We hope this scheme will help improve the system's interactivity by giving event responses priority over batch processing.
Unlike most new OS projects, the first versions of Clicker were already designed to support preemptive multitasking (rather than relying on user-process cooperation for switching). The whole kernel is preemptive, which means the system can decide to take the processor back from a thread even while it is in kernel mode. This leads to an architecture where several threads can be running in kernel mode, which lowers system call latency when the calls address different parts of the system (of course, if both try to allocate system memory, a synchronization mechanism will force one of the threads to wait until the other one is done with the memory component :-).
In such a system, it is mandatory to be able to prevent task switching while the processor is executing a critical section in kernel code (manipulating thread lists, etc.). When these sections are small, the problem is easily solved by disabling interrupts (on monoprocessor systems, at least). This technique is used in the Clicker kernel through the kMutex object and its access functions kSysLock() and kSysUnlock(). They are used when the critical section is no longer than a few lines and when busy waiting could be acceptable on multiprocessor systems; among other things, scheduler state coherency makes wide use of kMutex. The technique is also used as a basic mechanism to implement more complex synchronization objects like semaphores, message queues, etc.
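Typical use looks like the sketch below. kMutex, kSysLock() and kSysUnlock() are the real Clicker names; the bodies here are stand-ins (the real ones disable interrupts on a monoprocessor, and would busy-wait on an SMP system):

```c
/* kMutex, kSysLock() and kSysUnlock() are named in the text; these
 * bodies are assumptions - the real implementation uses cli/sti-style
 * interrupt masking on a monoprocessor IA-32 system. */
typedef struct { volatile int locked; } kMutex;

static void kSysLock(kMutex *m)   { m->locked = 1; /* real: cli */ }
static void kSysUnlock(kMutex *m) { m->locked = 0; /* real: sti */ }

static kMutex sched_lock;
static int ready_count;   /* example of shared scheduler state */

/* A few-line critical section around shared scheduler state: no task
 * switch may interleave with the update. */
static void add_ready_thread(void)
{
    kSysLock(&sched_lock);
    ready_count++;
    kSysUnlock(&sched_lock);
}
```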
The main reason for switching is the expiration of the active thread's quantum timer. Each thread gets its quantum field initialised by the dispatcher when it becomes active. This field is decremented on each clock tick until it reaches 0. To initialize the quantum, dispatchers add a constant value (set up by the scheduler through the kDispatchParams structure it gives to the dispatcher) to the previous value of the thread's quantum (also set up by the scheduler when the thread is pushed into the wait queue). This gives the scheduler the opportunity to increase or reduce a thread's quantum by virtually any amount.
If a thread releases the processor before its quantum is over, the remaining quantum is stored in the kThread structure, so that the scheduler can use it to reschedule the thread earlier (as a reward for its niceness), give it a longer quantum next time ... or whatever else it judges interesting to do with it. Clicker's default scheduler just ignores that value and forces the field to 0.
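The refill arithmetic described above fits in one line; the function name is an assumption:

```c
/* Sketch of the quantum refill rule: when a thread becomes active, the
 * dispatcher adds the scheduler-chosen constant (from kDispatchParams)
 * to whatever quantum the thread still carried. With the default
 * scheduler the leftover is always forced to 0, so this degenerates to
 * a plain constant quantum. */
static int refill_quantum(int leftover, int dispatch_constant)
{
    return dispatch_constant + leftover;
}
```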
Implementing task switches
On the IA-32 architecture, it's enough to perform a far jump to a TSS selector to force a task switch. However, the moment at which this instruction is executed is very important for system stability - especially when it is executed within an interrupt handler.
One must know that, on the PC architecture, an interrupt handler must end by sending an I/O command to the Programmable Interrupt Controller (8259A or friends) to signal that the current interrupt request has been processed, before the PIC will deliver further interrupt requests. If a task switch occurs while the PIC is waiting for that command, and if control is switched to a thread that previously made a voluntary switch (and thus has no interrupt-reenabling code to execute before resuming its normal execution), interrupts will remain disabled for a very long time (maybe forever :(
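That command - the 8259A non-specific "end of interrupt" - is a single port write: the value 0x20 sent to the master PIC's command port 0x20. outb() is stubbed below so the sketch is self-contained:

```c
/* Stub recording the last port write, standing in for the IA-32 'out'
 * instruction, so the EOI sequence can be shown self-contained. */
static unsigned char last_port, last_value;

static void outb(unsigned short port, unsigned char value)
{
    last_port  = (unsigned char)port;
    last_value = value;
}

/* Non-specific EOI to the master 8259A: until this write happens, the
 * PIC will not deliver further interrupt requests at that level. */
static void send_eoi(void)
{
    outb(0x20, 0x20);
}
```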
In order to avoid such cases, a switch request (a do_switch() request to the dispatcher) which happens within an interrupt handler gets deferred until the handler has sent an "EndOfInterrupt" command to the PIC. Such switching requires a bit more state than usual switches:
- the interrupting counter is incremented and decremented by processIrqList() to reflect the number of nested interrupts. If interrupting != 0, the dispatcher knows the current thread is servicing an interrupt handler.
- the toswitch variable holds the selector of a pending TSS switch. No switch occurs when this value is identical to the current Task Register. Note that, due to the deferred-switch technique, we must set this variable to the target TSS even if the switch isn't deferred, or we risk a spurious deferred switch back to the previous thread.
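The bookkeeping in the two items above can be sketched as follows. interrupting, toswitch and do_switch() are named in the text; jump_to_tss(), the selector type and the unwind function are assumptions:

```c
/* Sketch of the deferred-switch state machine. 'interrupting' and
 * 'toswitch' come from the text; everything else is assumed for the
 * example. jump_to_tss() stands in for the far jump to a TSS selector. */
static int interrupting;              /* nesting depth of IRQ handlers */
static unsigned short toswitch;       /* selector of the pending switch */
static unsigned short task_register;  /* stand-in for the CPU's TR */

static void jump_to_tss(unsigned short sel) { task_register = sel; }

static void do_switch(unsigned short target)
{
    toswitch = target;        /* always recorded, even for a direct
                                 switch, to avoid a spurious deferred
                                 switch back to the previous thread */
    if (interrupting != 0)
        return;               /* inside an IRQ handler: defer past EOI */
    if (target != task_register)
        jump_to_tss(target);  /* direct far jump to the TSS selector */
}

/* Called once the EOI has been sent and the handler nesting unwinds. */
static void after_eoi(void)
{
    if (--interrupting == 0 && toswitch != task_register)
        jump_to_tss(toswitch);
}
```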
The deferred-switch mechanism lives in the system core (control.asm), which has been modified to handle this case. It's rather hackish to perform switch control that way - and it doesn't scale well to multiprocessor architectures.
If some endOfInterrupt macro were written, we could extend processIrqList() so that it performs the toswitch switch itself instead of relying on the control.asm code.
It could also be handy to be able to register handlers that processIrqList() calls after the EOI command has been sent.
The idle task problem
Something has to be done when a dispatcher has no more threads to dispatch, i.e. when no thread is in the "READY" state. In such cases, we'd like the dispatcher to wait until a new thread gets ready and then gently switch to it ...
The problem we face is that the dispatcher has no private thread: it always executes in the context of its active thread... Executing a hlt instruction (which idles the processor until the next interrupt request) in the dispatcher code isn't enough, and it could even become dangerous when the dispatcher is called from within an interrupt handler.
Even with interrupts re-enabled before hlt is called, interrupt calls will nest on the last active thread's system stack until that stack overflows.
The solution chosen in the Clicker microkernel to address this problem is to give each dispatcher an idle thread, which simply executes hlt instructions in a loop, optionally updating some accounting variables (counting idle ticks).
Note that this idle thread is never given to the scheduler for processing: it's an internal task invoked by the dispatcher when all of its wait queues are empty. Moreover, its quantum is only 1, which means the queues will be polled every millisecond (on Clicker, the dispatcher clock tick rate is set to 1 kHz). This means we'll have to force a preemption through the scheduler or some event manager if we want a sub-millisecond response time for some event.
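The idle thread described above is tiny; the sketch below stubs out the hlt instruction (and splits one loop iteration into idle_step(), purely so the body can be exercised outside a kernel):

```c
/* Sketch of the per-dispatcher idle thread. cpu_halt() stands in for
 * the IA-32 'hlt' instruction; idle_step() is a factoring assumption. */
static unsigned long idle_ticks;   /* the accounting mentioned in the text */

static void cpu_halt(void) { /* real version: the 'hlt' instruction */ }

static void idle_step(void)
{
    cpu_halt();     /* sleep until the next interrupt (the 1 kHz tick) */
    idle_ticks++;   /* optional accounting: count idle ticks */
}

static void idle_thread(void)
{
    for (;;)        /* quantum of 1: preempted at every tick, so the
                       dispatcher re-checks its queues each millisecond */
        idle_step();
}
```

Because the idle thread has its own context and stack, hlt executes with interrupts enabled on a stack that handlers can safely nest on, which is exactly what the dispatcher itself could not guarantee.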