java打印对象内存地址 分布式事务 事务消息 分布式事务 几种解决方案 分布式事务-Seata 分布式事务-Seata 分布式事务-LCN-TCC 分布式事务-LCN 分布式事务-消息队列-定时任务-本地事件表 Zuul网关实战02 Zuul网关实战01 灰度发布落地实战2 灰度发布落地实战1 Gsnova on Heroku build Systemd Debian system initialization manage multi id_rsa ubuntu 64bits cannot run 32bits app REHL power auditing Debug Assembly for ARMv8 on QEMU ARM体系结构--寄存器 Run Debian iso on QEMU ARMv8 QEMU ARM64 guide cross compiler install buildroot install QEMU install python入门--数据结构 python入门--内置数据类型 python入门--类 异常 python入门--条件表达式 方法 python入门--数字 字符串 数组 RTC驱动分析 块设备驱动 TCP UDP socket 触摸屏驱动 USB驱动 LED按键中断 LCD 驱动 驱动信号 根文件系统 实验 内核实验 字符设备驱动程序 绪论 uboot 实验 LCD 实验 系统时钟和UART 中断控制器 Nand Flash控制器 MMU 实验 储存管理器实验 GPIO实验 点亮LED 编译加载驱动 制作烧写内核 dnw替代方法 MINI2440 TQ2440安装配套Linux 使用NFS 制作烧写跟文件系统 grub引导Windows 烧写裸版程序-linux Ubuntu 网络没有 eth0 Linux自动挂载 烧写裸板程序 电路基础 Mac词典 Vim插件 Assembly 综合研究 Assembly 指令总结 Assembly 直接定址表 Assembly 使用BIOS进行键盘输入和磁盘读写 Assembly 外中断 Assembly 端口 Assembly int指令 Assembly 内中断 Assembly 标志寄存器 Assembly 转移指令的原理 Assembly Call和ret指令 Assembly 数据处理两个基本问题 Assembly 灵活定位内存地址 Assembly 包含多个段的程序 Assembly [bx] loop Assembly 第一个程序 Assembly 寄存器 (内存访问) Assembly 寄存器 AWS VPN with EC2 hidden file in picture(linux) Assembly 基础 idea shortcuts 常用快捷键 idea plugin folder install ruby and jekyll

内核学习-->时间管理(下)

2015年06月03日

##6.时间管理 ###6.6内核定时器

Alt text

传统的内核定时器支持周期性产生中断,中断频率为HZ,一般这个值的取值范围100-1000HZ,发生一次时钟中断我们称之为一个tick,我们称这样的定时器为低分辨率定时器,因此每个tick的时间间隔为1/HZ秒,大约为1-10ms的精度。基于这种定时器的内核的时间精度一般为ms级别,精度不高,现在的移动设备中一般定时精度都要求为ns级别。系统在睡眠的时间里,也需要周期性的产生中断,这对嵌入式设备的功耗来说无疑会很大。

而高分辨率定时器,则有些按需触发中断的特性,只有在系统有事件到期需要处理的时候才会触发一次中断,然后去执行这个到期事件的处理函数,从上图可知这个高分辨率定时器并不是一个周期定时设备。

####6.6.1定时器内核配置 前文中我们阐述过,硬件定时器的两种工作模式:周期模式触发、单次触发。他们是整个linux内核定时器正常运作的基础。

linux提供了两种使能硬件定时器工作模式的参数,该参数定义在clock_event_device结构体中,变量名称为mode和features:

enum clock_event_mode	mode;
unsigned int		features;
/*
 * Clock event features
 */
#define CLOCK_EVT_FEAT_PERIODIC		0x000001
#define CLOCK_EVT_FEAT_ONESHOT		0x000002
#define CLOCK_EVT_FEAT_KTIME		0x000004

如果硬件定时器支持两种工作模式,那么必须在初始化clock_event_device的结构体的时候对features进行赋值,一般的SOC都同时支持两种工作模式,所以features设置为:

features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ ONESHOT

features是声明了硬件定时器支持工作模式的能力,但是具体的使能这种工作模式则是通过clock_event_device的设置工作模式接口set_next_event函数实现,请参考“时钟事件抽象”章节。该函数的参数为mode,也就是说程序首先得到features参数,判断硬件定时器支持的工作模式,然后初始化相应的mode参数,在把这个mode参数传递给set_next_event函数,最终根据设置的mode对硬件定时器进行具体的硬件编程。

由于高分辨率定时器的引入和内核需要提供对传统的低分辨率定时器的向后兼容,使内核中的通用时间子系统的配置变得相对复杂。下图为总结了linux的时间子系统的内核配置组合。

无广播: Alt text

####6.6.2软件定时器hrtimer 软件定时器hrtimer是依赖于clocksource的时间为基准进行定时的定时器,它的执行时机又依赖于clock_event_device时钟事件。clocksource就好比一个精准的时钟,而clock_event_device就好比一个精准的闹钟,而软件定时器hrtimer就是从clocksource中读取精确的时间,然后把这个时间加上一个时间间隔之后,把这个时间间隔作为到期时间设置到clock_event_device的闹钟中去。

#####6.6.2.1设计思想 每个hrtimer定时器按过期时间从小到大的排序用红黑树数据结构组织起来,在当前的定时器到期执行具体的clock_event_device->event_handler函数,event_handler函数会最终调用到function函数,从而执行定时器的处理函数。

高分辨率的定时器的时间基准可以包含很多种,可以是monotonic 时钟,xtime 时钟或者 是boot 时间等等前面在“内核时间表示”的章节研究过的各类内核时间。

#####6.6.2.2数据结构

  • 每个CPU 都会有多个clock 基准, 多个clock 基准其实是包含在每CPU 的变量

    hrtimer_cpu_base 中:

      /*
       * struct hrtimer_cpu_base - the per cpu clock bases
       * @lock:		lock protecting the base and associated clock bases
       *			and timers
       * @cpu:		cpu number
       * @active_bases:	Bitfield to mark bases with active timers
       * @clock_was_set:	Indicates that clock was set from irq context.
       * @expires_next:	absolute time of the next event which was scheduled
       *			via clock_set_next_event()
       * @hres_active:	State of high resolution mode
       * @hang_detected:	The last hrtimer interrupt detected a hang
       * @nr_events:		Total number of hrtimer interrupt events
       * @nr_retries:		Total number of hrtimer interrupt retries
       * @nr_hangs:		Total number of hrtimer interrupt hangs
       * @max_hang_time:	Maximum time spent in hrtimer_interrupt
       * @clock_base:		array of clock bases for this cpu
       */
      struct hrtimer_cpu_base {
      	raw_spinlock_t			lock;
      	unsigned int			cpu;
      	unsigned int			active_bases;
      	unsigned int			clock_was_set;
      #ifdef CONFIG_HIGH_RES_TIMERS
      	ktime_t				expires_next;
      	int				hres_active;
      	int				hang_detected;
      	unsigned long			nr_events;
      	unsigned long			nr_retries;
      	unsigned long			nr_hangs;
      	ktime_t				max_hang_time;
      #endif
      	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
      };
    
  • 每CPU 的clock 基准的数据结构为,clock 基准的个数为HRTIMER_MAX_CLOCK_BASES:

      /**
       * struct hrtimer_clock_base - the timer base for a specific clock
       * @cpu_base:		per cpu clock base
       * @index:		clock type index for per_cpu support when moving a
       *			timer to a base on another cpu.
       * @clockid:		clock id for per_cpu support
       * @active:		red black tree root node for the active timers
       * @resolution:		the resolution of the clock, in nanoseconds
       * @get_time:		function to retrieve the current time of the clock
       * @softirq_time:	the time when running the hrtimer queue in the softirq
       * @offset:		offset of this clock to the monotonic base
       */
      struct hrtimer_clock_base {
      	struct hrtimer_cpu_base	*cpu_base;
      	int			index;
      	clockid_t		clockid;
      	struct timerqueue_head	active;
      	ktime_t			resolution;
      	ktime_t			(*get_time)(void);
      	ktime_t			softirq_time;
      	ktime_t			offset;
      };
    
  • 每CPU CLOCK基准的初始化:

      /*
       * The timer bases:
       *
       * There are more clockids then hrtimer bases. Thus, we index
       * into the timer bases by the hrtimer_base_type enum. When trying
       * to reach a base using a clockid, hrtimer_clockid_to_base()
       * is used to convert from clockid to the proper hrtimer_base_type.
       */
      DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
      {
        
      	.lock = __RAW_SPIN_LOCK_UNLOCKED(hrtimer_bases.lock),
      	.clock_base =
      	{
      		{   //
      			.index = HRTIMER_BASE_MONOTONIC,
      			.clockid = CLOCK_MONOTONIC,
      			.get_time = &ktime_get,
      			.resolution = KTIME_LOW_RES,
      		},
      		{   //
      			.index = HRTIMER_BASE_REALTIME,
      			.clockid = CLOCK_REALTIME,
      			.get_time = &ktime_get_real,
      			.resolution = KTIME_LOW_RES,
      		},
      		{
      			.index = HRTIMER_BASE_BOOTTIME,
      			.clockid = CLOCK_BOOTTIME,
      			.get_time = &ktime_get_boottime,
      			.resolution = KTIME_LOW_RES,
      		},
      		{
      			.index = HRTIMER_BASE_TAI,
      			.clockid = CLOCK_TAI,
      			.get_time = &ktime_get_clocktai,
      			.resolution = KTIME_LOW_RES,
      		},
      	}
      };
    

    具体的ktime_get系列之后再研究,该函数具体工作就是通过调用clocksource 结构 体里面的read 函数读clocksource 提供的timer。

  • hrtimer自身的数据结构:

      /**
       * struct hrtimer - the basic hrtimer structure
       * @node:	timerqueue node, which also manages node.expires,
       *		the absolute expiry time in the hrtimers internal
       *		representation. The time is related to the clock on
       *		which the timer is based. Is setup by adding
       *		slack to the _softexpires value. For non range timers
       *		identical to _softexpires.
       * @_softexpires: the absolute earliest expiry time of the hrtimer.
       *		The time which was given as expiry time when the timer
       *		was armed.
       * @function:	timer expiry callback function
       * @base:	pointer to the timer base (per cpu and per clock)
       * @state:	state information (See bit values above)
       * @start_pid: timer statistics field to store the pid of the task which
       *		started the timer
       * @start_site:	timer statistics field to store the site where the timer
       *		was started
       * @start_comm: timer statistics field to store the name of the process which
       *		started the timer
       *
       * The hrtimer structure must be initialized by hrtimer_init()
       */
      struct hrtimer {
      	struct timerqueue_node		node;
      	ktime_t				_softexpires;
      	enum hrtimer_restart		(*function)(struct hrtimer *);
      	struct hrtimer_clock_base	*base;
      	unsigned long			state;	//hrtimer的状态
      #ifdef CONFIG_TIMER_STATS
      	int				start_pid;
      	void				*start_site;
      	char				start_comm[16];
      #endif
      };
    
  • hrtimer状态:

      /*
       * Values to track state of the timer
       *
       * Possible states:
       *
       * 0x00		inactive
       * 0x01		enqueued into rbtree
       * 0x02		callback function running
       * 0x04		timer is migrated to another cpu
       *
       * Special cases:
       * 0x03		callback function running and enqueued
       *		(was requeued on another CPU)
       * 0x05		timer was migrated on CPU hotunplug
       *
       * The "callback function running and enqueued" status is only possible on
       * SMP. It happens for example when a posix timer expired and the callback
       * queued a signal. Between dropping the lock which protects the posix timer
       * and reacquiring the base lock of the hrtimer, another CPU can deliver the
       * signal and rearm the timer. We have to preserve the callback running state,
       * as otherwise the timer could be removed before the softirq code finishes the
       * the handling of the timer.
       *
       * The HRTIMER_STATE_ENQUEUED bit is always or'ed to the current state
       * to preserve the HRTIMER_STATE_CALLBACK in the above scenario. This
       * also affects HRTIMER_STATE_MIGRATE where the preservation is not
       * necessary. HRTIMER_STATE_MIGRATE is cleared after the timer is
       * enqueued on the new cpu.
       *
       * All state transitions are protected by cpu_base->lock.
       */
      #define HRTIMER_STATE_INACTIVE	0x00
      #define HRTIMER_STATE_ENQUEUED	0x01
      #define HRTIMER_STATE_CALLBACK	0x02
      #define HRTIMER_STATE_MIGRATE	0x04
    
    • 0x00 inactive
    • 0x01 enqueued into rbtree
    • 0x02 callback function running
    • 0x04 timer is migrated to another cpu
  • hrtimer与clock base对应情况图:

    Alt text

#####6.6.2.3接口函数

  • hrtimer_init:

      static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
                 enum hrtimer_mode mode)
      {
      	struct hrtimer_cpu_base *cpu_base;
      	int base;
        
      	memset(timer, 0, sizeof(struct hrtimer));
        
      	cpu_base = raw_cpu_ptr(&hrtimer_bases);
        
      	if (clock_id == CLOCK_REALTIME && mode != HRTIMER_MODE_ABS)
      		clock_id = CLOCK_MONOTONIC;	//选择clock base
        
      	base = hrtimer_clockid_to_base(clock_id);  //选择clock base
      	timer->base = &cpu_base->clock_base[base];	//初始化timer->base为clock base
      	timerqueue_init(&timer->node);	//初始化timer->node,此为红黑树的节点元素
        
      #ifdef CONFIG_TIMER_STATS
      	timer->start_site = NULL;
      	timer->start_pid = -1;
      	memset(timer->start_comm, 0, TASK_COMM_LEN);
      #endif
      }
    

    总结: 该初始化函数其实就是通过hrtimer_bases 找到匹配的每CPU clock 基准,前面我们知道CPU clock 基准有多种,选择哪一种完全是按照clock_id 和mode 这两个参数决定的。

      /**
       * hrtimer_init - initialize a timer to the given clock
       * @timer:	the timer to be initialized
       * @clock_id:	the clock to be used
       * @mode:	timer mode abs/rel
       */
      void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
      		  enum hrtimer_mode mode)
      {
      	debug_init(timer, clock_id, mode);
      	__hrtimer_init(timer, clock_id, mode);
      }
      EXPORT_SYMBOL_GPL(hrtimer_init);
    
  • remove_hrtimer

      /*
       * remove hrtimer, called with base lock held
       */
      static inline int
      remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base)
      {
      	if (hrtimer_is_queued(timer)) {
      		unsigned long state;
      		int reprogram;
        
      		/*
      		 * Remove the timer and force reprogramming when high
      		 * resolution mode is active and the timer is on the current
      		 * CPU. If we remove a timer on another CPU, reprogramming is
      		 * skipped. The interrupt event on this CPU is fired and
      		 * reprogramming happens in the interrupt handler. This is a
      		 * rare case and less expensive than a smp call.
      		 */
      		debug_deactivate(timer);
      		timer_stats_hrtimer_clear_start_info(timer);
      		reprogram = base->cpu_base == this_cpu_ptr(&hrtimer_bases);
      		/*
      		 * We must preserve the CALLBACK state flag here,
      		 * otherwise we could move the timer base in
      		 * switch_hrtimer_base.
      		 */
      		state = timer->state & HRTIMER_STATE_CALLBACK;
      		__remove_hrtimer(timer, base, state, reprogram);    //
      		return 1;
      	}
      	return 0;
      }
    
      /*
       * __remove_hrtimer - internal function to remove a timer
       *
       * Caller must hold the base lock.
       *
       * High resolution timer mode reprograms the clock event device when the
       * timer is the one which expires next. The caller can disable this by setting
       * reprogram to zero. This is useful, when the context does a reprogramming
       * anyway (e.g. timer interrupt)
       */
      static void __remove_hrtimer(struct hrtimer *timer,
      			     struct hrtimer_clock_base *base,
      			     unsigned long newstate, int reprogram)
      {
      	struct timerqueue_node *next_timer;
      	if (!(timer->state & HRTIMER_STATE_ENQUEUED)) //timer 的初始化状态为0x0,
      所以处于初始化状态的timer 不能被remove。如果该timer 处于enqueue 状态,也就是说
      该timer 的红黑树节点已经加入到以每CPU clock base 中的头节点active 为跟节点的红
      黑树中,那么它是可以被remove 的。
      		goto out;
        
      	next_timer = timerqueue_getnext(&base->active);
      	timerqueue_del(&base->active, &timer->node);
      	if (&timer->node == next_timer) {
      #ifdef CONFIG_HIGH_RES_TIMERS
      		/* Reprogram the clock event device. if enabled */
      		if (reprogram && hrtimer_hres_active()) {
      			ktime_t expires;
        
      			expires = ktime_sub(hrtimer_get_expires(timer),
      					    base->offset);
      			if (base->cpu_base->expires_next.tv64 == expires.tv64)
      				hrtimer_force_reprogram(base->cpu_base, 1);
      		}
      #endif
      	}
      	if (!timerqueue_getnext(&base->active))
      		base->cpu_base->active_bases &= ~(1 << base->index);
      out:
      	timer->state = newstate;
      }
    
  • hrtimer_start:

      /**
       * hrtimer_start - (re)start an hrtimer on the current CPU
       * @timer:	the timer to be added
       * @tim:	expiry time
       * @mode:	expiry mode: absolute (HRTIMER_MODE_ABS) or
       *		relative (HRTIMER_MODE_REL)
       *
       * Returns:
       *  0 on success
       *  1 when the timer was active
       */
      int
      hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
      {
      	return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);    //
      }
      EXPORT_SYMBOL_GPL(hrtimer_start);
    
      int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
          unsigned long delta_ns, const enum hrtimer_mode mode,
          int wakeup)
      {
      	struct hrtimer_clock_base *base, *new_base;
      	unsigned long flags;
      	int ret, leftmost;
        
      	base = lock_hrtimer_base(timer, &flags);
        
      	/* Remove an active timer from the queue: */
      	ret = remove_hrtimer(timer, base);  //
        
      	if (mode & HRTIMER_MODE_REL) {
      		tim = ktime_add_safe(tim, base->get_time());
      		/*
      		 * CONFIG_TIME_LOW_RES is a temporary way for architectures
      		 * to signal that they simply return xtime in
      		 * do_gettimeoffset(). In this case we want to round up by
      		 * resolution when starting a relative timer, to avoid short
      		 * timeouts. This will go away with the GTOD framework.
      		 */
      #ifdef CONFIG_TIME_LOW_RES
      		tim = ktime_add_safe(tim, base->resolution);
      #endif
      	}
        
      	hrtimer_set_expires_range_ns(timer, tim, delta_ns);
        
      	/* Switch the timer base, if necessary: */
      	new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
        
      	timer_stats_hrtimer_set_start_info(timer);
        
      	leftmost = enqueue_hrtimer(timer, new_base);	//将新加的timer添加到红黑树的相应的节点位置,并找出最左节点即最快一个到期的timer。
        
      	if (!leftmost) {
      		unlock_hrtimer_base(timer, &flags);
      		return ret;
      	}
        
      	if (!hrtimer_is_hres_active(timer)) {
      		/*
      		 * Kick to reschedule the next tick to handle the new timer
      		 * on dynticks target.
      		 */
      		wake_up_nohz_cpu(new_base->cpu_base->cpu);
      	} else if (new_base->cpu_base == this_cpu_ptr(&hrtimer_bases) &&
      			hrtimer_reprogram(timer, new_base)) {
      		/*
      		 * Only allow reprogramming if the new base is on this CPU.
      		 * (it might still be on another CPU if the timer was pending)
      		 *
      		 * XXX send_remote_softirq() ?
      		 */
      		if (wakeup) {
      			/*
      			 * We need to drop cpu_base->lock to avoid a
      			 * lock ordering issue vs. rq->lock.
      			 */
      			raw_spin_unlock(&new_base->cpu_base->lock);
      			raise_softirq_irqoff(HRTIMER_SOFTIRQ);
      			local_irq_restore(flags);
      			return ret;
      		} else {
      			__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
      		}
      	}
        
      	unlock_hrtimer_base(timer, &flags);
        
      	return ret;
      }
      EXPORT_SYMBOL_GPL(__hrtimer_start_range_ns);
    
      /*
       * enqueue_hrtimer - internal function to (re)start a timer
       *
       * The timer is inserted in expiry order. Insertion into the
       * red black tree is O(log(n)). Must hold the base lock.
       *
       * Returns 1 when the new timer is the leftmost timer in the tree.
       */
      static int enqueue_hrtimer(struct hrtimer *timer,
      			   struct hrtimer_clock_base *base)
      {
      	debug_activate(timer);
        
      	timerqueue_add(&base->active, &timer->node);
      	base->cpu_base->active_bases |= 1 << base->index;
        
      	/*
      	 * HRTIMER_STATE_ENQUEUED is or'ed to the current state to preserve the
      	 * state of a possibly running callback.
      	 */
      	timer->state |= HRTIMER_STATE_ENQUEUED;
        
      	return (&timer->node == base->active.next);
      }
    

    Alt text

    由上图可知:enqueue_hrtimer 其实就是做了使hrtimer 节点入列并判断是否为最左下节点的 工作。

  • hrtimer_forward:

    该函数的作用为:上一次的过期时间+n*interval >now 推断出n >=(now - 旧的过期时间) / interval 返回值为n,归纳起来就是:调整当前的timer 的过期时间到NOW 之后。基本原 理请看下图,以模拟周期时钟为例即:tick_period 作为interval。

    图:hrtimer_forward的基本原理

    图:hrtimer_forward的基本原理

      /**
       * hrtimer_forward - forward the timer expiry
       * @timer:	hrtimer to forward
       * @now:	forward past this time
       * @interval:	the interval to forward
       *
       * Forward the timer expiry so it will expire in the future.
       * Returns the number of overruns.
       */
      u64 hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval)
      {
      	u64 orun = 1;
      	ktime_t delta;
        
      	delta = ktime_sub(now, hrtimer_get_expires(timer));
        
      	if (delta.tv64 < 0)
      		return 0;
        
      	if (interval.tv64 < timer->base->resolution.tv64)
      		interval.tv64 = timer->base->resolution.tv64;
        
      	if (unlikely(delta.tv64 >= interval.tv64)) {
      		s64 incr = ktime_to_ns(interval);
        
      		orun = ktime_divns(delta, incr);
      		hrtimer_add_expires_ns(timer, incr * orun);
      		if (hrtimer_get_expires_tv64(timer) > now.tv64)
      			return orun;
      		/*
      		 * This (and the ktime_add() below) is the
      		 * correction for exact:
      		 */
      		orun++;
      	}
      	hrtimer_add_expires(timer, interval);
        
      	return orun;
      }
      EXPORT_SYMBOL_GPL(hrtimer_forward);
    

#####6.6.2.4应用实例

----- 高分辨率模式下的周期时钟仿真0

前面我们知道:高分辨率模式下的时钟处理函数为:hrtimer_interrupt, 是需要硬件支持 one shot timer 运作的。如果系统中要求支持动态时钟的话,那么hrtimer_interrupt 就没办法 支持周期运行的定时器了,或者说有些困难,不过我们根据hrtimer 的原理,完全可以模拟 一个周期运行的timer 给系统用来更新jiffies 等需要周期timer 支持的操作。

tick_sched 数据结构是一个专门用来管理上述模拟周期timer 的相关信息的数据结构, 由tick_cpu_sched 专门为每CPU 提供了这个数据结构的实例:

DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched);

高分辨率定时器hrtimer 实例:struct hrtimer sched_timer,保存在tick_sched 数据结构中。

/**
 * struct tick_sched - sched tick emulation and no idle tick control/stats
 * @sched_timer:	hrtimer to schedule the periodic tick in high
 *			resolution mode
 * @last_tick:		Store the last tick expiry time when the tick
 *			timer is modified for nohz sleeps. This is necessary
 *			to resume the tick timer operation in the timeline
 *			when the CPU returns from nohz sleep.
 * @tick_stopped:	Indicator that the idle tick has been stopped
 * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
 * @idle_calls:		Total number of idle calls
 * @idle_sleeps:	Number of idle calls, where the sched tick was stopped
 * @idle_entrytime:	Time when the idle call was entered
 * @idle_waketime:	Time when the idle was interrupted
 * @idle_exittime:	Time when the idle state was left
 * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
 * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
 * @sleep_length:	Duration of the current idle sleep
 * @do_timer_lst:	CPU was the last one doing do_timer before going idle
 */
struct tick_sched {
	struct hrtimer			sched_timer;    //
	unsigned long			check_clocks;
	enum tick_nohz_mode		nohz_mode;
	ktime_t				last_tick;
	int				inidle;
	int				tick_stopped;
	unsigned long			idle_jiffies;
	unsigned long			idle_calls;
	unsigned long			idle_sleeps;
	int				idle_active;
	ktime_t				idle_entrytime;
	ktime_t				idle_waketime;
	ktime_t				idle_exittime;
	ktime_t				idle_sleeptime;
	ktime_t				iowait_sleeptime;
	ktime_t				sleep_length;
	unsigned long			last_jiffies;
	unsigned long			next_jiffies;
	ktime_t				idle_expires;
	int				do_timer_last;
};
/**
 * tick_setup_sched_timer - setup the tick emulation timer
 */
void tick_setup_sched_timer(void)
{
	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
	ktime_t now = ktime_get();

	/*
	 * Emulate tick processing via per-CPU hrtimers:
	 */
	hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);	//取绝对模式,单调时钟。
	ts->sched_timer.function = tick_sched_timer;	//设置timer的到期处理函数

	/* Get the next period (per cpu) */
	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());	//设置到期时间值为:tick_init_jiffy_update()

	/* Offset the tick to avert jiffies_lock contention. */
	if (sched_skew_tick) {
		u64 offset = ktime_to_ns(tick_period) >> 1;
		do_div(offset, num_possible_cpus());
		offset *= smp_processor_id();
		hrtimer_add_expires_ns(&ts->sched_timer, offset);
	}

	for (;;) {
		hrtimer_forward(&ts->sched_timer, now, tick_period);	//根据上面讲解的timer forward的作用其实就是,重新调整下timer的过期时间,之后再开始运行timer。
		hrtimer_start_expires(&ts->sched_timer,
				      HRTIMER_MODE_ABS_PINNED);
		/* Check, if the timer was already in the past */
		if (hrtimer_active(&ts->sched_timer))
			break;
		now = ktime_get();
	}

#ifdef CONFIG_NO_HZ_COMMON
	if (tick_nohz_enabled) {
		ts->nohz_mode = NOHZ_MODE_HIGHRES;
		tick_nohz_active = 1;
	}
#endif
}
#endif /* HIGH_RES_TIMERS */

####6.6.3低分辨率定时器 #####6.6.3.1设计思想

综上所述,低分辨率定时器顾名思义就是内核周期性的产生中断,中断的频率一般为HZ。 该定时器的硬件支撑就是要求硬件定时器,支持周期模式触发中断或者单触发模式触发中 断,二者任选其一都可以实现低分辨率的设计。而二者唯一的区别就是单触发模式需要不断 的设置下次中断触发的时间,这个时间间隔可以设置为1/HZ 秒。

低分辨率定时器有两种配置:

HZ 模式:

CONFIG_HZ_PERIODIC && !CONFIG_HIGH_RES_TIMER

NO HZ 模式:

CONFIG_NO_HZ_IDLE && !CONFIG_HIGH_RES_TIMER && TICK_ONESHOT

#####6.6.3.2程序实现 ######6.6.3.2.1 HZ模式 系统在初始化的时候会调用tick_setup_device 函数:

/*
 * Setup the tick device
 */
static void tick_setup_device(struct tick_device *td,
			      struct clock_event_device *newdev, int cpu,
			      const struct cpumask *cpumask)
{
	ktime_t next_event;
	void (*handler)(struct clock_event_device *) = NULL;

	/*
	 * First device setup ?
	 */
	if (!td->evtdev) {	//(a)
		/*
		 * If no cpu took the do_timer update, assign it to
		 * this cpu:
		 */
		if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
			if (!tick_nohz_full_cpu(cpu))
				tick_do_timer_cpu = cpu;
			else
				tick_do_timer_cpu = TICK_DO_TIMER_NONE;
			tick_next_period = ktime_get();	//(b)
			tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
		}

		/*
		 * Startup in periodic mode first.
		 */
		td->mode = TICKDEV_MODE_PERIODIC;	//(c)
	} else {
		handler = td->evtdev->event_handler;
		next_event = td->evtdev->next_event;
		td->evtdev->event_handler = clockevents_handle_noop;
	}

	td->evtdev = newdev;

	/*
	 * When the device is not per cpu, pin the interrupt to the
	 * current cpu:
	 */
	if (!cpumask_equal(newdev->cpumask, cpumask))
		irq_set_affinity(newdev->irq, cpumask);

	/*
	 * When global broadcasting is active, check if the current
	 * device is registered as a placeholder for broadcast mode.
	 * This allows us to handle this x86 misfeature in a generic
	 * way. This function also returns !=0 when we keep the
	 * current active broadcast state for this CPU.
	 */
	if (tick_device_uses_broadcast(newdev, cpu))
		return;

	if (td->mode == TICKDEV_MODE_PERIODIC)
		tick_setup_periodic(newdev, 0);	//(d)
	else
		tick_setup_oneshot(newdev, handler, next_event);
}

在系统初始化的时候由于td->evtdev 还是空值,程序会进入到代码(c),设置td->mode = TICKDEV_MODE_PERIODIC,这个周期模式宏定义的结果就是要求硬件定时器支持周期模式的中断触发方式。从而执行代码(d)tick_setup_periodic(newdev, 0)函数:

/*
 * Setup the device for a periodic tick
 */
void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
{
	tick_set_periodic_handler(dev, broadcast);  //

	/* Broadcast setup ? */
	if (!tick_device_is_functional(dev))
		return;

	if ((dev->features & CLOCK_EVT_FEAT_PERIODIC) &&    //
	    !tick_broadcast_oneshot_active()) {
		clockevents_set_mode(dev, CLOCK_EVT_MODE_PERIODIC);
	} else {
		unsigned long seq;
		ktime_t next;

		do {
			seq = read_seqbegin(&jiffies_lock);
			next = tick_next_period;
		} while (read_seqretry(&jiffies_lock, seq));

		clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);

		for (;;) {
			if (!clockevents_program_event(dev, next, false))
				return;
			next = ktime_add(next, tick_period);
		}
	}
}

函数tick_set_periodic_handler 会设置dev->event_handler = tick_handle_periodic,将 event_handler 进行初始化,所以在系统初始化前期,低精度的定时器的处理函数就为: tick_handle_periodic 那么在此函数中调用函数tick_periodic,该函数就是周期事件 函数的包装函数,请看下面的代码。

/*
 * Event handler for periodic ticks
 */
void tick_handle_periodic(struct clock_event_device *dev)
{
	int cpu = smp_processor_id();
	ktime_t next = dev->next_event;

	tick_periodic(cpu);

	if (dev->mode != CLOCK_EVT_MODE_ONESHOT)
		return;
	for (;;) {
		/*
		 * Setup the next period for devices, which do not have
		 * periodic mode:
		 */
		next = ktime_add(next, tick_period);

		if (!clockevents_program_event(dev, next, false))
			return;
		/*
		 * Have to be careful here. If we're in oneshot mode,
		 * before we call tick_periodic() in a loop, we need
		 * to be sure we're using a real hardware clocksource.
		 * Otherwise we could get trapped in an infinite
		 * loop, as the tick_periodic() increments jiffies,
		 * which then will increment time, possibly causing
		 * the loop to trigger again and again.
		 */
		if (timekeeping_valid_for_hres())
			tick_periodic(cpu);
	}
}
/*
 * Periodic tick
 */
static void tick_periodic(int cpu)
{
	if (tick_do_timer_cpu == cpu) {
		write_seqlock(&jiffies_lock);

		/* Keep track of the next tick event */
		tick_next_period = ktime_add(tick_next_period, tick_period);

		do_timer(1);    //
		write_sequnlock(&jiffies_lock);
		update_wall_time();
	}

	update_process_times(user_mode(get_irq_regs()));    //
	profile_tick(CPU_PROFILING);
}

至此,系统低分辨率定时器开始工作,设置的硬件定时器工作模式为周期触发模式,中断处 理函数为:tick_handle_periodic。如果设置的硬件定时器工作模式不支持周期触发模式, 那么对于单触发模式的定时器也可以模拟周期触发,就是每次都编程set_next_event,每次指定的cycle 的值都是固定的,就可以模拟周期性触发模式了。

######6.6.3.2.2 NO HZ模式

  • 从HZ 模式切换到NO HZ 模式的前提: (1)硬件定时器需要支持单次触发模式 (2)高精度定时器没有配置
  • 从HZ 模式切换到NO HZ 模式的时机: 我们在“周期事件”的章节中阐述了hrtimer_run_pending()函数的作用,并有这么结论: “所以定时器之间的切换,HZ 模式和NO HZ 模式之间的切换时机是在处理周期事件的中 断处理函数的中断下半部TIMER_SOFTIRQ 软中断中执行的。

NO HZ 模式是基于前面讲解过的软件定时器hrtimer 原理实现的。根据上面的分析NO HZ 模式下的周期中断处理函数为:tick_nohz_handler

在“周期事件”的章节中,我们提到过:hrtimer_run_pending,在该函数中实现低分辨 率HZ 模式向NO HZ 模式的切换:

/*
 * Called from timer softirq every jiffy, expire hrtimers:
 *
 * For HRT its the fall back code to run the softirq in the timer
 * softirq context in case the hrtimer initialization failed or has
 * not been done yet.
 */
void hrtimer_run_pending(void)
{
	if (hrtimer_hres_active())
		return;

	/*
	 * This _is_ ugly: We have to check in the softirq context,
	 * whether we can switch to highres and / or nohz mode. The
	 * clocksource switch happens in the timer interrupt with
	 * xtime_lock held. Notification from there only sets the
	 * check bit in the tick_oneshot code, otherwise we might
	 * deadlock vs. xtime_lock.
	 */
	if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
		hrtimer_switch_to_hres();
}
/**
 * tick_nohz_switch_to_nohz - switch to nohz mode
 */
static void tick_nohz_switch_to_nohz(void)
{
	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
	ktime_t next;

	if (!tick_nohz_enabled)
		return;

	local_irq_disable();
	if (tick_switch_to_oneshot(tick_nohz_handler)) {    //
		local_irq_enable();
		return;
	}
	tick_nohz_active = 1;
	ts->nohz_mode = NOHZ_MODE_LOWRES;   //

	/*
	 * Recycle the hrtimer in ts, so we can share the
	 * hrtimer_forward with the highres code.
	 */
	hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);  //
	/* Get the next period */
	next = tick_init_jiffy_update();

	for (;;) {
		hrtimer_set_expires(&ts->sched_timer, next);    //
		if (!tick_program_event(next, 0))   //
			break;
		next = ktime_add(next, tick_period);    //
	}
	local_irq_enable();
}

我们注意到在这个函数里面有初始化&ts->sched_timer,设置tick_nohz_handler 函数 为dev->event_handler 。每次中断的时间间隔为tick_period 。当中断发生时, tick_nohz_handler 函数被执行,该函数会处理周期事件,我们在“周期事件”讲解过 hrtimer_run_queues()函数会执行__run_hrtimer,从而执行具体的hrtimer 处理函数。 函数tick_nohz_reprogram,会执行hrtimer_forward 函数,设置下一次的到期时间。 从而维持一个模拟的周期时钟。

/*
 * The nohz low res interrupt handler
 */
static void tick_nohz_handler(struct clock_event_device *dev)
{
	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
	struct pt_regs *regs = get_irq_regs();
	ktime_t now = ktime_get();

	dev->next_event.tv64 = KTIME_MAX;

	tick_sched_do_timer(now);
	tick_sched_handle(ts, regs);

	/* No need to reprogram if we are running tickless  */
	if (unlikely(ts->tick_stopped))
		return;

	while (tick_nohz_reprogram(ts, now)) {
		now = ktime_get();
		tick_do_update_jiffies64(now);
	}
}

####6.6.4软件定时器timer_list

软件定时器timer_list 是依赖于周期事件的中断处理函数而存在的,因为它是一种依靠jiffies 时间来定时的定时器,我们在“周期事件”章节讲解过,它的执行时机是在中断下半部的软 中断处理函数中执行。启动一个内核定时器只是声明了要在未来的某个时刻执行一项任务, 当前进程仍然继续执行。不要用定时器完成硬实时任务!

#####6.6.4.1设计思想 内核用数据结构来管理注册到内核的所有软件定时器,如下图所示,该设计理念可以高速有 效的检查内核即将到期的定时器。

图6.5.2:内核定时器数据结构设计原理图

图6.5.2:内核定时器数据结构设计原理图

上图中每个双向链表连接的元素就是一个timer_list 结构体。当你注册一个自己的内核定 时器的时候,你其实就是申请了一个timer_list 的数据结构,然后对其进行初始化,然后 把它加入到上面的链表中去。我们现在来具体的研究下它的运行原理。

前提:这里我们首先假设jiffies 始终是一个线性增长的一个变量。 定时器其实就是你设置一个从当前时间jiffies 开始算起,以后,直到具体某个时间点B(单 位:jiffies)才到期,并且到期后就会执行timer 处理函数。这样我们在初始化上图的链表 的时候其实就是计算(B - jiffies)的时间间隔,当时间间隔在0-255 之间的时候我们就把 该定时器节点放置在相应的tv1 数组元素为头节点的list 中,这样tv1 中的链表的每个元 素其实就代表的是一个定时器,如果有相同的到期时间的话,那么就用双链表连接起来。也 就说在tv1 中的每个元素为头节点的链表中的每个元素的到期时间其实都是相等的,相当于 时间间隔为0 个时钟周期。

而对于tv2 来说,每个数组元素代表的双向链表的头节点中的每个元素的时间间隔为2^8 个时钟周期。

而对于tv3 来说,每个数组元素代表的双向链表的头节点中的每个元素的时间间隔为2^14 个时钟周期。

而对于tv4 来说,每个数组元素代表的双向链表的头节点中的每个元素的时间间隔为2^20 个时钟周期。

而对于tv5 来说,每个数组元素代表的双向链表的头节点中的每个元素的时间间隔为2^26 个时钟周期。

#####6.6.4.2数据结构 定时器由结构timer_list 表示,定义在<linux/timer.h>

struct timer_list {
	/*
	 * All fields that change during normal runtime grouped to the
	 * same cacheline
	 */
	struct list_head entry;
	unsigned long expires;
	struct tvec_base *base;

	void (*function)(unsigned long);
	unsigned long data;

	int slack;

#ifdef CONFIG_TIMER_STATS
	int start_pid;
	void *start_site;
	char start_comm[16];
#endif
#ifdef CONFIG_LOCKDEP
	struct lockdep_map lockdep_map;
#endif
};

内核其他相关的数据结构:

struct tvec_base {
	spinlock_t lock;
	struct timer_list *running_timer;
	unsigned long timer_jiffies;
	unsigned long next_timer;
	unsigned long active_timers;
	unsigned long all_timers;
	int cpu;
	struct tvec_root tv1;
	struct tvec tv2;
	struct tvec tv3;
	struct tvec tv4;
	struct tvec tv5;
} ____cacheline_aligned;

struct tvec_base boot_tvec_bases;
EXPORT_SYMBOL(boot_tvec_bases);
static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;

tvec_base 中有五个相关的数据结构变量tv1-tv5,大家注意到tv1 和其他的tv 其实并一样, 分别属于不同的结构体类型:

struct tvec_root {
	struct list_head vec[TVR_SIZE];
};

struct tvec {
	struct list_head vec[TVN_SIZE];
};

从上面的定义中我们知道每个tv 其实都是一个数组,这个数据的每个元素都是list_head 类型的。它们将被用来作为timer_list 定时器的双向链表的头节点。

这样,确定一个定时器是否到期,内核无需扫描一个巨大的定时器链表,而是只关心tv1 中的256 个定时器就可以了,当tv1[0]到期执行完毕后,该元素被删除,并且指针指向下 一个tv1[1]的时间点,直到256 个都执行完毕,指针被重置为0.

当指针被重置为0 的时候,内核会把tv2 中的tv2[0]为头节点的的双向链表中的元素拷贝 到tv1 中填充256 个元素。这样以此类推到tv3 填充tv2,tv4 填充tv3,tv5 填充tv4.

#####6.6.4.3接口函数 内核在<linux/timer.h>中提供了一系列管理定时器的接口。

创建定时器 struct timer_list my_timer;

初始化定时器 init_timer(&my_timer); /* 填充数据结构/ my_timer.expires = jiffies + delay; my_timer.data = 0; my_timer.function = my_function; /定时器到期时调用的函数*/

定时器的执行函数 超时处理函数的原型如下: void my_timer_function(unsigned long data); 可以利用data 参数用一个处理函数处理多个定时器。可以将data 设为0。

激活定时器 add_timer(&my_timer); 定时器一旦激活就开始运行。

更改已激活的定时器的超时时间 mod_timer(&my_timer, jiffies+ney_delay); 可以用于那些已经初始化但还没激活的定时器,如果调用时定时器未被激活则返回 0,否则返回1。 一旦mod_timer 返回,定时器将被激活。

删除定时器 del_timer(&my_timer); 被激活或未被激活的定时器都可以使用,如果调用时定时器未被激活则返回0,否 则返回1。不需要为已经超时的定时器调用,它们被自动删除

同步删除 del_time_sync(&my_timer); 在smp 系统中,确保返回时,所有的定时器处理函数都退出。不能在中断上下文 使用!

#####6.6.4.4程序实现 初始化tv1-tv5:

static inline int
__mod_timer(struct timer_list *timer, unsigned long expires,
						bool pending_only, int pinned)
{
	struct tvec_base *base, *new_base;
	unsigned long flags;
	int ret = 0 , cpu;

	timer_stats_timer_set_start_info(timer);
	BUG_ON(!timer->function);

	base = lock_timer_base(timer, &flags);

	ret = detach_if_pending(timer, base, false);
	if (!ret && pending_only)
		goto out_unlock;

	debug_activate(timer, expires);

	cpu = get_nohz_timer_target(pinned);
	new_base = per_cpu(tvec_bases, cpu);

	if (base != new_base) {
		/*
		 * We are trying to schedule the timer on the local CPU.
		 * However we can't change timer's base while it is running,
		 * otherwise del_timer_sync() can't detect that the timer's
		 * handler yet has not finished. This also guarantees that
		 * the timer is serialized wrt itself.
		 */
		if (likely(base->running_timer != timer)) {
			/* See the comment in lock_timer_base() */
			timer_set_base(timer, NULL);
			spin_unlock(&base->lock);
			base = new_base;
			spin_lock(&base->lock);
			timer_set_base(timer, base);
		}
	}

	timer->expires = expires;
	internal_add_timer(base, timer);    //

out_unlock:
	spin_unlock_irqrestore(&base->lock, flags);

	return ret;
}

执行时机:

/*
 * Called by the local, per-CPU timer interrupt on SMP.
 */
void run_local_timers(void)
{
	hrtimer_run_queues();
	raise_softirq(TIMER_SOFTIRQ);   //
}

执行函数:

/*
 * This function runs timers and the timer-tq in bottom half context.
 */
static void run_timer_softirq(struct softirq_action *h)
{
	struct tvec_base *base = __this_cpu_read(tvec_bases);

	hrtimer_run_pending();

	if (time_after_eq(jiffies, base->timer_jiffies))
		__run_timers(base); //
}
/**
 * __run_timers - run all expired timers (if any) on this CPU.
 * @base: the timer vector to be processed.
 *
 * This function cascades all vectors and executes all expired timer
 * vectors.
 */
static inline void __run_timers(struct tvec_base *base)
{
	struct timer_list *timer;

	spin_lock_irq(&base->lock);
	if (catchup_timer_jiffies(base)) {
		spin_unlock_irq(&base->lock);
		return;
	}
	while (time_after_eq(jiffies, base->timer_jiffies)) {
		struct list_head work_list;
		struct list_head *head = &work_list;
		int index = base->timer_jiffies & TVR_MASK;

		/*
		 * Cascade timers:
		 */
		if (!index &&   //注意
			(!cascade(base, &base->tv2, INDEX(0))) &&
				(!cascade(base, &base->tv3, INDEX(1))) &&
					!cascade(base, &base->tv4, INDEX(2)))
			cascade(base, &base->tv5, INDEX(3));
		++base->timer_jiffies;
		list_replace_init(base->tv1.vec + index, head);
		while (!list_empty(head)) {
			void (*fn)(unsigned long);
			unsigned long data;
			bool irqsafe;

			timer = list_first_entry(head, struct timer_list,entry);
			fn = timer->function;
			data = timer->data;
			irqsafe = tbase_get_irqsafe(timer->base);

			timer_stats_account_timer(timer);

			base->running_timer = timer;
			detach_expired_timer(timer, base);

			if (irqsafe) {
				spin_unlock(&base->lock);
				call_timer_fn(timer, fn, data);
				spin_lock(&base->lock);
			} else {
				spin_unlock_irq(&base->lock);
				call_timer_fn(timer, fn, data);
				spin_lock_irq(&base->lock);
			}
		}
	}
	base->running_timer = NULL;
	spin_unlock_irq(&base->lock);
}

####6.6.5高分辨率定时器 #####6.6.5.1设计思想 #####6.6.5.2程序实现 ######6.6.5.2.1 HZ模式 上文讲过hrtimer_run_pending 函数中,如果使能了高分辨率定时器配置,则系统通过 hrtimer_switch_to_hres 函数实现从低分辨率高高分辨率定时器之间的切换。 tick_init_highres 函数设置dev->event_handler = hrtimer_interrupt,高精度定 时器的中断处理函数将会为:hrtimer_interrupt。tick_setup_sched_timer()函数是 利用前面讲过的软件定时器hrtimer 原理实现的一个模拟周期触发的hrtimer,时钟周期 为tick_period。

当程序完全切到高精度模式的时候我们设置base->hres_active = 1 , retrigger_next_event 函数负责triger 中断,是通过调用hrtimer_force_reprogram函数完成的。

/*
 * Switch to high resolution mode
 */
static int hrtimer_switch_to_hres(void)
{
	int i, cpu = smp_processor_id();
	struct hrtimer_cpu_base *base = &per_cpu(hrtimer_bases, cpu);
	unsigned long flags;

	if (base->hres_active)
		return 1;

	local_irq_save(flags);

	if (tick_init_highres()) {
		local_irq_restore(flags);
		printk(KERN_WARNING "Could not switch to high resolution "
				    "mode on CPU %d\n", cpu);
		return 0;
	}
	base->hres_active = 1;  //
	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++)
		base->clock_base[i].resolution = KTIME_HIGH_RES;

	tick_setup_sched_timer();   //
	/* "Retrigger" the interrupt to get things going */
	retrigger_next_event(NULL);     //
	local_irq_restore(flags);
	return 1;
}

######6.6.5.2.2 NO HZ模式 高分辨率定时器的时候HZ, NO HZ 模式没有太大区别,唯一的区别就是在系统idle 的时候 NO HZ 模式可以关定时器。

/*
 * Generic idle loop implementation
 *
 * Called with polling cleared.
 */
static void cpu_idle_loop(void)
{
	while (1) {
		/*
		 * If the arch has a polling bit, we maintain an invariant:
		 *
		 * Our polling bit is clear if we're not scheduled (i.e. if
		 * rq->curr != rq->idle).  This means that, if rq->idle has
		 * the polling bit set, then setting need_resched is
		 * guaranteed to cause the cpu to reschedule.
		 */

		__current_set_polling();
		tick_nohz_idle_enter();     //

		while (!need_resched()) {
			check_pgt_cache();
			rmb();

			if (cpu_is_offline(smp_processor_id()))
				arch_cpu_idle_dead();

			local_irq_disable();
			arch_cpu_idle_enter();

			/*
			 * In poll mode we reenable interrupts and spin.
			 *
			 * Also if we detected in the wakeup from idle
			 * path that the tick broadcast device expired
			 * for us, we don't want to go deep idle as we
			 * know that the IPI is going to arrive right
			 * away
			 */
			if (cpu_idle_force_poll || tick_check_broadcast_expired())
				cpu_idle_poll();
			else
				cpuidle_idle_call();

			arch_cpu_idle_exit();
		}

		/*
		 * Since we fell out of the loop above, we know
		 * TIF_NEED_RESCHED must be set, propagate it into
		 * PREEMPT_NEED_RESCHED.
		 *
		 * This is required because for polling idle loops we will
		 * not have had an IPI to fold the state for us.
		 */
		preempt_set_need_resched();
		tick_nohz_idle_exit();      //
		__current_clr_polling();

		/*
		 * We promise to call sched_ttwu_pending and reschedule
		 * if need_resched is set while polling is set.  That
		 * means that clearing polling needs to be visible
		 * before doing these things.
		 */
		smp_mb__after_atomic();

		sched_ttwu_pending();
		schedule_preempt_disabled();
	}
}

###6.7延时执行 设备驱动程序经常需要将某些特定代码延迟一段时间后执行,通常是为了让硬件能完成某些 任务。长于定时器周期(也称为时钟嘀嗒)的延迟可以通过使用系统时钟完成,而非常短的延 时则通过软件循环的方式完成。

  1. 短延时 对于那些最多几十个毫秒的延迟,无法借助系统定时器。系统通过软件循环提供了下面 的延迟函数:

     #include <linux/delay.h> /* 实际在<asm/delay.h> */
     void ndelay(unsigned long nsecs); /*延迟纳秒*/
     void udelay(unsigned long usecs); /*延迟微秒*/
     void mdelay(unsigned long msecs); /*延迟毫秒*/
    

    这三个延迟函数均是忙等待函数,在延迟过程中无法运行其他任务。实际上,当前所有 平台都无法达到纳秒精度。

  2. 长延时
    • 在延迟到期前让出处理器

        while(time_before(jiffies, j1))
        schedule();
      

      在等待期间可以让出处理器,但系统无法进入空闲模式(因为这个进程始终在进行调度),不利于省电。

    • 超时函数

        #include <linux/sched.h>
        signed long schedule_timeout(signed long timeout);
      

      使用方式:

        set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(2*HZ); /* 睡2 秒*/
      

      进程经过2 秒后会被唤醒。如果不希望被用户空间打断,可以将进程状态设置为TASK_UNINTERRUPTIBLE。

  3. 等待队列
  4. 内核定时器