内核学习-->时间管理(中)

##6.时间管理

###6.4时钟事件抽象在硬件时钟源章节，讲解了中断定时器的基本原理和作用，提到了单触发模式触发中断的方式，利用这种定时器的特性，为了更好的支持高精度的软件定时器和通用的时间管理框架，linux提出了一个概念叫时钟事件，它的数据结构为clock_event_device。它可以在系统需要的时候产生定时器时间到期中断，这个时间间隔完全取决于编程者。

####6.4.1数据结构

/**
 * struct clock_event_device - clock event device descriptor
 * @event_handler:	Assigned by the framework to be called by the low
 *			level handler of the event source
 * @set_next_event:	set next event function using a clocksource delta
 * @set_next_ktime:	set next event function using a direct ktime value
 * @next_event:		local storage for the next event in oneshot mode
 * @max_delta_ns:	maximum delta value in ns
 * @min_delta_ns:	minimum delta value in ns
 * @mult:		nanosecond to cycles multiplier
 * @shift:		nanoseconds to cycles divisor (power of two)
 * @mode:		operating mode assigned by the management code
 * @features:		features
 * @retries:		number of forced programming retries
 * @set_mode:		set mode function
 * @broadcast:		function to broadcast events
 * @min_delta_ticks:	minimum delta value in ticks stored for reconfiguration
 * @max_delta_ticks:	maximum delta value in ticks stored for reconfiguration
 * @name:		ptr to clock event name
 * @rating:		variable to rate clock event devices
 * @irq:		IRQ number (only for non CPU local devices)
 * @bound_on:		Bound on CPU
 * @cpumask:		cpumask to indicate for which CPUs this device works
 * @list:		list head for the management code
 * @owner:		module reference
 */
struct clock_event_device {
	void			(*event_handler)(struct clock_event_device *);
	int			(*set_next_event)(unsigned long evt,
						  struct clock_event_device *);
	int			(*set_next_ktime)(ktime_t expires,
						  struct clock_event_device *);
	ktime_t			next_event;
	u64			max_delta_ns;
	u64			min_delta_ns;
	u32			mult;
	u32			shift;
	enum clock_event_mode	mode;
	unsigned int		features;
	unsigned long		retries;

	void			(*broadcast)(const struct cpumask *mask);
	void			(*set_mode)(enum clock_event_mode mode,
					    struct clock_event_device *);
	void			(*suspend)(struct clock_event_device *);
	void			(*resume)(struct clock_event_device *);
	unsigned long		min_delta_ticks;
	unsigned long		max_delta_ticks;

	const char		*name;
	int			rating;
	int			irq;
	int			bound_on;
	const struct cpumask	*cpumask;
	struct list_head	list;
	struct module		*owner;
} ____cacheline_aligned;

event_handler 一个回调函数指针，在时间中断到来时，利用该回调实现对时钟事件的处理。该字段的初始化在linux调用时间框架层，函数种类比较多，后续讲解。

set_next_event 设置下一次时间触发的时间，使用类似于clocksource的cycle计数值。该函数最终会调用底层设置硬件定时器单触发模式，并根据上层传下的cycle值设置硬件定时器的计数器寄存器，从而最终在计数完预期的cycle之后单次触发一次中断。

set_next_ktime 设置下一次触发的时间，使用ktime时间作为参数而不是cycle，我们不用此函数

max_delta_ns 可设置的最大时间差，单位是纳秒，在硬件定时器原理章节讲解过硬件计数器不能随意值，在32位系统中它最多计数2^32个cycle，接着会回绕，所以根据硬件计数器的bit数可以设置这个最大值。

min_delta_ns 可设置的最小时差，单位是纳秒，同理

mult shift 与clocksource中的类似，只不过是用于把纳秒转换为cycle。

mode 该时钟事件设备的工作模式，两种主要的工作模式分别如下：参考上面的硬件定时器原理：

CLOCK_EVT_MODE_PERIODIC	周期触发模式，设置后按给定的周期不停地触发事件；
CLOCK_EVT_MODE_ONESHOT	单次触发模式，只在设置好的触发时刻触发一次；

set_mode 函数指针，用于设置时钟事件设备的工作模式，当上层传下来不同的模式的时候，我们最终会去设置硬件定时器的时间模式寄存器。

rating 表示该设备的精确等级

list 系统中注册的时钟事件设备用该字段挂在全局链表变量clockevent_devices上。

####6.4.2内核代码

注册与配置接口

内核用函数clockevents_config_and_register来注册一个时钟事件设备

  /**
   * clockevents_config_and_register - Configure and register a clock event device
   * @dev:	device to register
   * @freq:	The clock frequency
   * @min_delta:	The minimum clock ticks to program in oneshot mode
   * @max_delta:	The maximum clock ticks to program in oneshot mode
   *
   * min/max_delta can be 0 for devices which do not support oneshot mode.
   */
  void clockevents_config_and_register(struct clock_event_device *dev,
  				     u32 freq, unsigned long min_delta,
  				     unsigned long max_delta)
  {
  	dev->min_delta_ticks = min_delta;
  	dev->max_delta_ticks = max_delta;
  	clockevents_config(dev, freq);
  	clockevents_register_device(dev);
  }
  EXPORT_SYMBOL_GPL(clockevents_config_and_register);

clockevents_config函数功能主要是计算cycle到ns的转化因子mult和shift，方法请参考clocksource章节，原理相同

clockevents_register_device函数负责注册一个具体的clock_event_device到内核中。

设置操作模式接口

该函数负责设置硬件定时器的两周触发模式，如果硬件定时器硬件本身支持周期性的触发中断，那么就无需每次都设置set_next_event，如果硬件定时器设置单触发模式，则在后续操作中需要调用set_next_event接口设置下一次到期的cycle，在linux中这种操作称为reprogram，以便cycle递减至零的时候触发一次中断。

  /**
   * clockevents_set_mode - set the operating mode of a clock event device
   * @dev:	device to modify
   * @mode:	new mode
   *
   * Must be called with interrupts disabled !
   */
  void clockevents_set_mode(struct clock_event_device *dev,
  				 enum clock_event_mode mode)
  {
  	if (dev->mode != mode) {
  		dev->set_mode(mode, dev);
  		dev->mode = mode;
    
  		/*
  		 * A nsec2cyc multiplicator of 0 is invalid and we'd crash
  		 * on it, so fix it up and emit a warning:
  		 */
  		if (mode == CLOCK_EVT_MODE_ONESHOT) {
  			if (unlikely(!dev->mult)) {
  				dev->mult = 1;
  				WARN_ON(1);
  			}
  		}
  	}
  }

设置下一次触发时间

如果硬件定时器设置单触发模式，则在后续操作中需要调用该函数接口设置下一次到期的cycle，以便cycle递减至零的时候触发一次中断。

  /**
   * clockevents_program_event - Reprogram the clock event device.
   * @dev:	device to program
   * @expires:	absolute expiry time (monotonic clock)
   * @force:	program minimum delay if expires can not be set
   *
   * Returns 0 on success, -ETIME when the event is in the past.
   */
  int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
  			      bool force)
  {
  	unsigned long long clc;
  	int64_t delta;
  	int rc;
    
  	if (unlikely(expires.tv64 < 0)) {
  		WARN_ON_ONCE(1);
  		return -ETIME;
  	}
    
  	dev->next_event = expires;
    
  	if (dev->mode == CLOCK_EVT_MODE_SHUTDOWN)
  		return 0;
    
  	/* Shortcut for clockevent devices that can deal with ktime. */
  	if (dev->features & CLOCK_EVT_FEAT_KTIME)
  		return dev->set_next_ktime(expires, dev);
    
  	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
  	if (delta <= 0)
  		return force ? clockevents_program_min_delta(dev) : -ETIME;
    
  	delta = min(delta, (int64_t) dev->max_delta_ns);
  	delta = max(delta, (int64_t) dev->min_delta_ns);
    
  	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
  	rc = dev->set_next_event((unsigned long) clc, dev); //
    
  	return (rc && force) ? clockevents_program_min_delta(dev) : rc;
  }

####6.4.3平台实现以三星的驱动为例子：

static void __init samsung_clockevent_init(void)
{
	unsigned long pclk;
	unsigned long clock_rate;
	unsigned int irq_number;

	pclk = clk_get_rate(pwm.timerclk);

	samsung_timer_set_prescale(pwm.event_id, pwm.tscaler_div);
	samsung_timer_set_divisor(pwm.event_id, pwm.tdiv);

	clock_rate = pclk / (pwm.tscaler_div * pwm.tdiv);
	pwm.clock_count_per_tick = clock_rate / HZ;

	time_event_device.cpumask = cpumask_of(0);
	clockevents_config_and_register(&time_event_device,
						clock_rate, 1, pwm.tcnt_max);

	irq_number = pwm.irq[pwm.event_id];
	setup_irq(irq_number, &samsung_clock_event_irq);

	if (pwm.variant.has_tint_cstat) {
		u32 mask = (1 << pwm.event_id);
		writel(mask | (mask << 5), pwm.base + REG_TINT_CSTAT);
	}
}

static struct clock_event_device time_event_device = {
	.name		= "samsung_event_timer",
	.features	= CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
	.rating		= 200,
	.set_next_event	= samsung_set_next_event,
	.set_mode	= samsung_set_mode,
	.resume		= samsung_clockevent_resume,
};

从驱动代码片段我们可以知道无论是clock_event_device还是clocksource都需要平台相关的代码对其进行初始化，然后注册到内核中去。

###6.5周期事件 linux内核的周期事件处理函数，无论是低分辨率定时器还是高分辨率定时器都需要支持周期事件的处理。该函数每1/HZ秒执行一次。

####6.5.1 do_timer do_timer函数需要处理全局范围内的任务，包括更新jiffies，更新墙上时间，计算系统负载等工作，它一般由一个选定的CPU负责做这件事情，和其他CPU无关。

/*
 * Must hold jiffies_lock
 */
void do_timer(unsigned long ticks)
{
	jiffies_64 += ticks;
	calc_global_load(ticks);
}

####6.5.2 update_process_times 该函数用来更新进程在用户态和内核态所消耗的CPU时间，触发低精度内核定时器，和做一些调度的辅助工作。

这里重点强调run_local_timers()函数，该函数通过raise_softirq(TIMER_SOFTIRQ);触发TIMER_SOFTIRQ软中断，而该软中断的处理函数为：run_timer_softirq，最终在调用软中断的时机run_timer_softirq函数将被执行，并找到一个到期的低精度定时器，执行到期操作。 hrtimers_run_queues()函数会执行__run_hrtimers，从而执行具体的hrtimer处理函数。

/*
 * Called from the timer interrupt handler to charge one tick to the current
 * process.  user_tick is 1 if the tick is user time, 0 for system.
 */
void update_process_times(int user_tick)
{
	struct task_struct *p = current;

	/* Note: this timer irq context must be accounted for as well. */
	account_process_tick(p, user_tick);
	run_local_timers(); //
	rcu_check_callbacks(user_tick);
#ifdef CONFIG_IRQ_WORK
	if (in_irq())
		irq_work_tick();
#endif
	scheduler_tick();
	run_posix_cpu_timers(p);
}

/*
 * Called by the local, per-CPU timer interrupt on SMP.
 */
void run_local_timers(void)
{
	hrtimer_run_queues();   //
	raise_softirq(TIMER_SOFTIRQ);   //
}

基于低分辨率定时器的低精度定时器timer_list的通用处理函数：__run_timers(base)，负责查找到期的定时器并执行定时器的function。

/*
 * This function runs timers and the timer-tq in bottom half context.
 */
static void run_timer_softirq(struct softirq_action *h)
{
	struct tvec_base *base = __this_cpu_read(tvec_bases);

	hrtimer_run_pending();  //

	if (time_after_eq(jiffies, base->timer_jiffies))
		__run_timers(base); //
}

hrtimer_run_pending()负责定时器切换。

如果配置了高分辨率定时器，并已经切换到高分辨率定时器的情况下，该函数什么也不做，直接返回。

如果配置了高分辨率定时器，并还没有切换到高分辨率定时器的情况下，该函数将执行定时器切换工作。

如果没有配置高分辨率定时器，该函数负责低分辨率定时器的HZ模式和NO HZ模式之间的切换。

所以定时器之间的切换，HZ模式和NO HZ模式之间的切换时机是在处理周期事件的中断处理函数的中断下半部TIMER_SOFTIRQ软中断中执行的。

/*
 * Called from timer softirq every jiffy, expire hrtimers:
 *
 * For HRT its the fall back code to run the softirq in the timer
 * softirq context in case the hrtimer initialization failed or has
 * not been done yet.
 */
void hrtimer_run_pending(void)
{
	if (hrtimer_hres_active())
		return;

	/*
	 * This _is_ ugly: We have to check in the softirq context,
	 * whether we can switch to highres and / or nohz mode. The
	 * clocksource switch happens in the timer interrupt with
	 * xtime_lock held. Notification from there only sets the
	 * check bit in the tick_oneshot code, otherwise we might
	 * deadlock vs. xtime_lock.
	 */
	if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
		hrtimer_switch_to_hres();
}

下面看hrtimer_run_queues函数，该函数在开启高精度定时器的时候直接返回，什么也不做。因为高精度的定时器模型中处理hrtimer节点的是hrtimer_interrupt。

/*
 * Called from hardirq context every jiffy
 */
void hrtimer_run_queues(void)
{
	struct timerqueue_node *node;
	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
	struct hrtimer_clock_base *base;
	int index, gettime = 1;

	if (hrtimer_hres_active())  //
		return;

	for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
		base = &cpu_base->clock_base[index];
		if (!timerqueue_getnext(&base->active))
			continue;

		if (gettime) {
			hrtimer_get_softirq_time(cpu_base);
			gettime = 0;
		}

		raw_spin_lock(&cpu_base->lock);

		while ((node = timerqueue_getnext(&base->active))) {
			struct hrtimer *timer;

			timer = container_of(node, struct hrtimer, node);
			if (base->softirq_time.tv64 <=
					hrtimer_get_expires_tv64(timer))
				break;

			__run_hrtimer(timer, &base->softirq_time);
		}
		raw_spin_unlock(&cpu_base->lock);
	}
}

jekyll 1

idea 2

Assembly 20

Linux 4

aws 1

Vim 1

Mac 1

Electricity 1

embedded 32

socket 1

kernel 10

python 5

QEMU 5

buildroot 1

Arm64 5

Debian 2

ARM 1

Debug 1

GDB 1

REHL 1

power 1

Ubuntu 1

SSH 1

id_rsa 1

Systemd 1

Gsnova 1

Heroku 1

Perl 1

Math 1

灰度发布 SpringCloud 2

Zuul SpringCloud 2

分布式事务 7

java 1

内核学习-->时间管理(中)