java打印对象内存地址 分布式事务 事务消息 分布式事务 几种解决方案 分布式事务-Seata 分布式事务-Seata 分布式事务-LCN-TCC 分布式事务-LCN 分布式事务-消息队列-定时任务-本地事件表 Zuul网关实战02 Zuul网关实战01 灰度发布落地实战2 灰度发布落地实战1 Gsnova on Heroku build Systemd Debian system initialization manage multi id_rsa ubuntu 64bits cannot run 32bits app REHL power auditing Debug Assembly for ARMv8 on QEMU ARM体系结构--寄存器 Run Debian iso on QEMU ARMv8 QEMU ARM64 guide cross compiler install buildroot install QEMU install python入门--数据结构 python入门--内置数据类型 python入门--类 异常 python入门--条件表达式 方法 python入门--数字 字符串 数组 RTC驱动分析 块设备驱动 TCP UDP socket 触摸屏驱动 USB驱动 LED按键中断 LCD 驱动 驱动信号 根文件系统 实验 内核实验 字符设备驱动程序 绪论 uboot 实验 LCD 实验 系统时钟和UART 中断控制器 Nand Flash控制器 MMU 实验 储存管理器实验 GPIO实验 点亮LED 编译加载驱动 制作烧写内核 dnw替代方法 MINI2440 TQ2440安装配套Linux 使用NFS 制作烧写跟文件系统 grub引导Windows 烧写裸版程序-linux Ubuntu 网络没有 eth0 Linux自动挂载 烧写裸板程序 电路基础 Mac词典 Vim插件 Assembly 综合研究 Assembly 指令总结 Assembly 直接定址表 Assembly 使用BIOS进行键盘输入和磁盘读写 Assembly 外中断 Assembly 端口 Assembly int指令 Assembly 内中断 Assembly 标志寄存器 Assembly 转移指令的原理 Assembly Call和ret指令 Assembly 数据处理两个基本问题 Assembly 灵活定位内存地址 Assembly 包含多个段的程序 Assembly [bx] loop Assembly 第一个程序 Assembly 寄存器 (内存访问) Assembly 寄存器 AWS VPN with EC2 hidden file in picture(linux) Assembly 基础 idea shortcuts 常用快捷键 idea plugin folder install ruby and jekyll

内核学习-->时间管理(上)

2015年06月02日

##6.时间管理 时钟是整个操作系统的脉搏,它为进程的时间片调度,定时事件提供了依据。另外,用户空间的很多操作都依赖于时钟,例如select poll make。操作系统管理的时间分为两种,一种称为当前时间,也即我们日常生活所用的时间。这个时间一般保存在CMOS中,主板中有特定的芯片为其提供计时依据。另外一种时间称为相对时间,例如系统运行时间。显然对计算机而言,相对时间比当前时间更为重要。

###6.1硬件时钟源 ####6.1.1 RTC时钟 现在的SOC中都内置了RTC功能,如果SOC不支持,则使用RTC芯片来支持操作系统断电时间管理。RTC在断电的情况下采用备用电池供电。

####6.1.2硬件定时器 中断定时器就好比Linux操作系统的心脏,Linux的进程调度、统计以及子系统的运转都是基于这个中断时钟的定期触发中断,我们称一次定期触发中断为一个tick,每次中断到来的时候内核就会更新jiffies,更新统计,进程切换等等比较重要的事情,那么每次触发中断的时间间隔是如何设定的呢?

一般来说大多数厂家的SOC都会内置可以周期模式的或者单触发模式触发时间到期中断的定时器,发送中断信号的间隔可以对其进行编程控制,可以通过编程定时器寄存器来设置周期性的触发中断,也可以设置单次触发中断,即每次触发中断,都需要编程寄存器设置中断触发的时间间隔,而这个时间间隔可以不是周期性的,可以每次都不同。

这种定时器的基本原理就是:定时器的内部都会有个计数器寄存器,如果定时器的时钟源为当前系统中的时钟频率为F,假设F未经分频直接作为定时器的时钟源,那么定时器的计数器的计数频率也会为F,即计数器在每个时钟周期到来的时候都会加1,但是由于计数器属于寄存器,最大为32bits,所以这个cycle最大会从0加到最大为2^32个cycle。

###6.2时钟源抽象 ####6.2.1数据结构 在上面的章节理解了时钟源的定义和用途,每提出一个事物,内核总会抽象出相应的数据结构与之对应,位于/include/linux目录里,内核对时钟源的抽象数据结构为:

struct clocksource {
	/*
	 * Hotpath data, fits in a single cache line when the
	 * clocksource itself is cacheline aligned.
	 */
	cycle_t (*read)(struct clocksource *cs);
	cycle_t cycle_last;
	cycle_t mask;
	u32 mult;
	u32 shift;
	u64 max_idle_ns;
	u32 maxadj;
#ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
	struct arch_clocksource_data archdata;
#endif

	const char *name;
	struct list_head list;
	int rating;
	int (*enable)(struct clocksource *cs);
	void (*disable)(struct clocksource *cs);
	unsigned long flags;
	void (*suspend)(struct clocksource *cs);
	void (*resume)(struct clocksource *cs);

	/* private: */
#ifdef CONFIG_CLOCKSOURCE_WATCHDOG
	/* Watchdog related data, used by the framework */
	struct list_head wd_list;
	cycle_t cs_last;
	cycle_t wd_last;
#endif
} ____cacheline_aligned;

rating:时钟源的精度

同一个设备下,可以有多个时钟源,每个时钟源的精度由驱动它的时钟频率决定,比如一个由10MHz时钟驱动的时钟源,他的精度就是100nS。clocksource结构中有一个rating字段,代表着该时钟源的精度范围,它的取值范围如下:

  • 1–99:不适合于用作实际的时钟源,只用于启动过程或用于测试;
  • 100–199:基本可用,可用作真实的时钟源,但不推荐;
  • 200–299:精度较好,可用作真实的时钟源;
  • 300–399:很好,精确的时钟源;
  • 400–499:理想的时钟源,如有可能就必须选择它作为时钟源;

read回调函数

时钟源本身不会产生中断,要获得时钟源的当前计数,只能通过主动调用它的read回调函数来获得当前的计数值,注意这里只能获得计数值,也就是所谓的cycle,要获得相应的时间,必须要借助clocksource的mult和shift字段进行转换计算。

multshift字段

因为从clocksource中读到的值是一个cycle计数值,要转换为时间,我们必须要知道驱动clocksource的时钟频率F,一个简单的计算就可以完成。

t = cycle / F ;单位为s

所有cycle个时钟周期T转化为ns为:

ns = cycle * NSEC_PER_SEC / F

实际计算时,由于内核不支持浮点运算,只支持整数的除法运算,会带来很大精度损失,所以对上面式子进行转换,如下:

ns = cycle * NSEC_PER_SEC * 2^shift / F * 2^shift

我们设:mult = NESC_PER_SEC * 2^shift / F = NESC_PER_SEC « shift / F

则ns = cycle * mult / 2 ^ shift = (cycle * mult) » shift

####6.2.2内核代码 初始化clocksource结构体:

int __init clocksource_mmio_init(void __iomem *base, const char *name,
	unsigned long hz, int rating, unsigned bits,
	cycle_t (*read)(struct clocksource *))
{
	struct clocksource_mmio *cs;

	if (bits > 32 || bits < 16)
		return -EINVAL;

	cs = kzalloc(sizeof(struct clocksource_mmio), GFP_KERNEL);
	if (!cs)
		return -ENOMEM;

	cs->reg = base;
	cs->clksrc.name = name;
	cs->clksrc.rating = rating;
	cs->clksrc.read = read;
	cs->clksrc.mask = CLOCKSOURCE_MASK(bits);
	cs->clksrc.flags = CLOCK_SOURCE_IS_CONTINUOUS;

	return clocksource_register_hz(&cs->clksrc, hz);
}

将clocksource注册到内核:

int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
{

	/* Initialize mult/shift and max_idle_ns */
	__clocksource_updatefreq_scale(cs, scale, freq);

	/* Add clocksource to the clocksource list */
	mutex_lock(&clocksource_mutex);
	clocksource_enqueue(cs);
	clocksource_enqueue_watchdog(cs);
	clocksource_select();
	mutex_unlock(&clocksource_mutex);
	return 0;
}
EXPORT_SYMBOL_GPL(__clocksource_register_scale);

自动计算上文中提到的mult和shift值:

void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
{
	u64 sec;
	/*
	 * Calc the maximum number of seconds which we can run before
	 * wrapping around. For clocksources which have a mask > 32bit
	 * we need to limit the max sleep time to have a good
	 * conversion precision. 10 minutes is still a reasonable
	 * amount. That results in a shift value of 24 for a
	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
	 * margin as we do in clocksource_max_deferment()
	 */
	sec = (cs->mask - (cs->mask >> 3));
	do_div(sec, freq);
	do_div(sec, scale);
	if (!sec)
		sec = 1;
	else if (sec > 600 && cs->mask > UINT_MAX)
		sec = 600;

	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
			       NSEC_PER_SEC / scale, sec * scale);  

	/*
	 * for clocksources that have large mults, to avoid overflow.
	 * Since mult may be adjusted by ntp, add an safety extra margin
	 *
	 */
	cs->maxadj = clocksource_max_adjustment(cs);
	while ((cs->mult + cs->maxadj < cs->mult)
		|| (cs->mult - cs->maxadj > cs->mult)) {
		cs->mult >>= 1;
		cs->shift--;
		cs->maxadj = clocksource_max_adjustment(cs);
	}

	cs->max_idle_ns = clocksource_max_deferment(cs);
}
EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);

将clocksource添加到clocksource_list中:

clocksource_enqueue函数负责按clocksource的rating的大小,把该clocksource按顺序挂在全局链表clocksource_list上,rating值越大,在链表上的位置越靠前。

static void clocksource_enqueue(struct clocksource *cs)
{
	struct list_head *entry = &clocksource_list;
	struct clocksource *tmp;

	list_for_each_entry(tmp, &clocksource_list, list)
		/* Keep track of the place, where to insert */
		if (tmp->rating >= cs->rating)
			entry = &tmp->list;
	list_add(&cs->list, entry);
}

####6.2.3平台实现 以三星平台为例子:

static void __init samsung_clocksource_init(void)
{
	unsigned long pclk;
	unsigned long clock_rate;
	int ret;

	pclk = clk_get_rate(pwm.timerclk);

	samsung_timer_set_prescale(pwm.source_id, pwm.tscaler_div);
	samsung_timer_set_divisor(pwm.source_id, pwm.tdiv);

	clock_rate = pclk / (pwm.tscaler_div * pwm.tdiv);

	samsung_time_setup(pwm.source_id, pwm.tcnt_max);
	samsung_time_start(pwm.source_id, true);

	if (pwm.source_id == 4)
		pwm.source_reg = pwm.base + 0x40;
	else
		pwm.source_reg = pwm.base + pwm.source_id * 0x0c + 0x14;

	sched_clock_register(samsung_read_sched_clock,
						pwm.variant.bits, clock_rate);

	samsung_clocksource.mask = CLOCKSOURCE_MASK(pwm.variant.bits);
	ret = clocksource_register_hz(&samsung_clocksource, clock_rate);    //
	if (ret)
		panic("samsung_clocksource_timer: can't register clocksource\n");
}
static struct clocksource samsung_clocksource = {
	.name		= "samsung_clocksource_timer",
	.rating		= 250,      //
	.read		= samsung_clocksource_read,     //
	.suspend	= samsung_clocksource_suspend,
	.resume		= samsung_clocksource_resume,
	.flags		= CLOCK_SOURCE_IS_CONTINUOUS,
};
static cycle_t samsung_clocksource_read(struct clocksource *c)
{
	return ~readl_relaxed(pwm.source_reg);
}

###6.3内核时间的表示 ####6.3.1内核时间单位 #####6.3.1.1 Jiffies 内核用jiffies变量记录系统启动以来经过的时钟滴答数,它的声明如下:

/*
 * The 64-bit value is not atomic - you MUST NOT read it
 * without sampling the sequence number in jiffies_lock.
 * get_jiffies_64() will do this for you as appropriate.
 */
extern u64 __jiffy_data jiffies_64;
extern unsigned long volatile __jiffy_data jiffies;

可见,32位的系统上,jiffies是一个32位的无符号数,系统每过1/Hz秒,jiffies的值就会加1,最终该变量可能会溢出,所以内核同时又定义了一个64位的变量jiffies_64,下面的代码段,就是声明两个变量放置在同一个数据段,linux链接脚本会确保jiffies变量和jiffies_64变量的内存地址是相同的。

/* some arch's have a small-data section that can be accessed register-relative
 * but that can only take up to, say, 4-byte variables. jiffies being part of
 * an 8-byte variable may not be correctly accessed unless we force the issue
 */
#define __jiffy_data  __attribute__((section(".data")))

通常,我们可以直接访问jiffies变量,但是要获得jiffies_64变量,linux针对不同的CPU位数定义了两个函数实现:get_jiffies_64,如下面两段代码:

64bit系统:

#if (BITS_PER_LONG < 64)
u64 get_jiffies_64(void);
#else
static inline u64 get_jiffies_64(void)
{
	return (u64)jiffies;
}
#endif

32bit系统:

#if (BITS_PER_LONG < 64)
u64 get_jiffies_64(void)
{
	unsigned long seq;
	u64 ret;

	do {
		seq = read_seqbegin(&jiffies_lock);
		ret = jiffies_64;
	} while (read_seqretry(&jiffies_lock, seq));
	return ret;
}
EXPORT_SYMBOL(get_jiffies_64);
#endif

jiffies的更新函数:

/*
 * Must hold jiffies_lock
 */
void do_timer(unsigned long ticks)
{
	jiffies_64 += ticks;
	calc_global_load(ticks);
}

jiffies是内核的低精度定时器的计时单位,低精度定时器在后续章节讲解。

####6.3.1.2 struct timeval timeval由下面代码可知,是由秒和微秒为单位的变量组成。

struct timeval {
	__kernel_time_t		tv_sec;		/* seconds */
	__kernel_suseconds_t	tv_usec;	/* microseconds */
};

####6.3.1.3 struct timespec timespec由下面的代码可知,是由秒和纳秒为单位的变量组成。

struct timespec {
	__kernel_time_t	tv_sec;			/* seconds */
	long		tv_nsec;		/* nanoseconds */
};

使用timespec的变量有以下数据结构:

struct timekeeper {
	…
	struct timespec64	wall_to_monotonic;
    struct timespec64	raw_time;
    …
}
/*
 * This wants to go into uapi/linux/time.h once we agreed about the
 * userspace interfaces.
 */
#if __BITS_PER_LONG == 64
# define timespec64 timespec
#else
struct timespec64 {
	time64_t	tv_sec;			/* seconds */
	long		tv_nsec;		/* nanoseconds */
};
#endif

墙上时间相关的大部分都会使用timespec时间,精确到纳秒级。

#####6.3.1.4 union ktime linux的通用时间架构ktime来表示时间,该变量的定义兼容32位和64位以big-little endian系统。64位的系统可以直接访问tv64字段,单位是纳秒。32位的系统则被拆分为两个字段:sec和nsec,并且照顾了大小端的不同。高精度定时器通常用ktime作为计时单位。

/*
 * ktime_t:
 *
 * A single 64-bit variable is used to store the hrtimers
 * internal representation of time values in scalar nanoseconds. The
 * design plays out best on 64-bit CPUs, where most conversions are
 * NOPs and most arithmetic ktime_t operations are plain arithmetic
 * operations.
 *
 */
union ktime {
	s64	tv64;
};

#####6.3.1.5 struct timekeeper

/**
 * struct timekeeper - Structure holding internal timekeeping values.
 * @tkr:		The readout base structure
 * @xtime_sec:		Current CLOCK_REALTIME time in seconds
 * @ktime_sec:		Current CLOCK_MONOTONIC time in seconds
 * @wall_to_monotonic:	CLOCK_REALTIME to CLOCK_MONOTONIC offset
 * @offs_real:		Offset clock monotonic -> clock realtime
 * @offs_boot:		Offset clock monotonic -> clock boottime
 * @offs_tai:		Offset clock monotonic -> clock tai
 * @tai_offset:		The current UTC to TAI offset in seconds
 * @base_raw:		Monotonic raw base time in ktime_t format
 * @raw_time:		Monotonic raw base time in timespec64 format
 * @cycle_interval:	Number of clock cycles in one NTP interval
 * @xtime_interval:	Number of clock shifted nano seconds in one NTP
 *			interval.
 * @xtime_remainder:	Shifted nano seconds left over when rounding
 *			@cycle_interval
 * @raw_interval:	Raw nano seconds accumulated per NTP interval.
 * @ntp_error:		Difference between accumulated time and NTP time in ntp
 *			shifted nano seconds.
 * @ntp_error_shift:	Shift conversion between clock shifted nano seconds and
 *			ntp shifted nano seconds.
 *
 * Note: For timespec(64) based interfaces wall_to_monotonic is what
 * we need to add to xtime (or xtime corrected for sub jiffie times)
 * to get to monotonic time.  Monotonic is pegged at zero at system
 * boot time, so wall_to_monotonic will be negative, however, we will
 * ALWAYS keep the tv_nsec part positive so we can use the usual
 * normalization.
 *
 * wall_to_monotonic is moved after resume from suspend for the
 * monotonic time not to jump. We need to add total_sleep_time to
 * wall_to_monotonic to get the real boot based time offset.
 *
 * wall_to_monotonic is no longer the boot time, getboottime must be
 * used instead.
 */
struct timekeeper {
	struct tk_read_base	tkr;
	u64			xtime_sec;
	unsigned long		ktime_sec;
	struct timespec64	wall_to_monotonic;
	ktime_t			offs_real;
	ktime_t			offs_boot;
	ktime_t			offs_tai;
	s32			tai_offset;
	ktime_t			base_raw;
	struct timespec64	raw_time;

	/* The following members are for timekeeping internal use */
	cycle_t			cycle_interval;
	u64			xtime_interval;
	s64			xtime_remainder;
	u32			raw_interval;
	/* The ntp_tick_length() value currently being used.
	 * This cached copy ensures we consistently apply the tick
	 * length for an entire tick, as ntp_tick_length may change
	 * mid-tick, and we don't want to apply that new value to
	 * the tick in progress.
	 */
	u64			ntp_tick;
	/* Difference between accumulated time and NTP time in ntp
	 * shifted nano seconds. */
	s64			ntp_error;
	u32			ntp_error_shift;
	u32			ntp_err_mult;
};

####6.3.2内核时间分类

include/uapi/linux/timer.h 中有内核时间的分类定义:

/*
 * The IDs of the various system clocks (for POSIX.1b interval timers):
 */
#define CLOCK_REALTIME			0
#define CLOCK_MONOTONIC			1
#define CLOCK_PROCESS_CPUTIME_ID	2
#define CLOCK_THREAD_CPUTIME_ID		3
#define CLOCK_MONOTONIC_RAW		4
#define CLOCK_REALTIME_COARSE		5
#define CLOCK_MONOTONIC_COARSE		6
#define CLOCK_BOOTTIME			7
#define CLOCK_REALTIME_ALARM		8
#define CLOCK_BOOTTIME_ALARM		9
#define CLOCK_SGI_CYCLE			10	/* Hardware specific */
#define CLOCK_TAI			11

#define MAX_CLOCKS			16
#define CLOCKS_MASK			(CLOCK_REALTIME | CLOCK_MONOTONIC)
#define CLOCKS_MONO			CLOCK_MONOTONIC

我们只讨论CLOCK_REALTIME,CLOCK_MONOTONIC,CLOCK_MONOTONIC_RAW和CLOCK_BOOTTIME宏定义对应的内核时间。

#####6.3.2.1 RTC时间(rtc_time) linux为RTC时间提供了rtc_time的数据结构,与其他类型的内核时间相比RTC时间的读取时需要linux RTC驱动的支持。RTC驱动从硬件寄存器中读取相应的年月日天时分秒等时间存储到数据结构之中,当我们需要从RTC读取时间数据的时候,可以通过rtc_read_time函数得到当前的rtc_time的值。驱动通过rtc_tm_to_time接口将rtc_time时间转化为秒。

/*
 * The struct used to pass data via the following ioctl. Similar to the
 * struct tm in <time.h>, but it needs to be here so that the kernel
 * source is self contained, allowing cross-compiles, etc. etc.
 */

struct rtc_time {
	int tm_sec;
	int tm_min;
	int tm_hour;
	int tm_mday;
	int tm_mon;
	int tm_year;
	int tm_wday;
	int tm_yday;
	int tm_isdst;
};

RTC的驱动程序代码放置在linux的RTC子系统中,位置为:drivers/rtc目录

/**
 * Deprecated. Use rtc_tm_to_time64().
 */
static inline int rtc_tm_to_time(struct rtc_time *tm, unsigned long *time)
{
	*time = rtc_tm_to_time64(tm);

	return 0;
}

/*
 * rtc_tm_to_time64 - Converts rtc_time to time64_t.
 * Convert Gregorian date to seconds since 01-01-1970 00:00:00.
 */
time64_t rtc_tm_to_time64(struct rtc_time *tm)
{
	return mktime64(tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
			tm->tm_hour, tm->tm_min, tm->tm_sec);
}
EXPORT_SYMBOL(rtc_tm_to_time64);

tm->tm_year为什么要加上1900?

/**
 * Deprecated. Use mktime64().
 */
static inline unsigned long mktime(const unsigned int year,
			const unsigned int mon, const unsigned int day,
			const unsigned int hour, const unsigned int min,
			const unsigned int sec)
{
	return mktime64(year, mon, day, hour, min, sec);
}

/*
 * mktime64 - Converts date to seconds.
 * Converts Gregorian date to seconds since 1970-01-01 00:00:00.
 * Assumes input in normal date format, i.e. 1980-12-31 23:59:59
 * => year=1980, mon=12, day=31, hour=23, min=59, sec=59.
 *
 * [For the Julian calendar (which was used in Russia before 1917,
 * Britain & colonies before 1752, anywhere else before 1582,
 * and is still in use by some communities) leave out the
 * -year/100+year/400 terms, and add 10.]
 *
 * This algorithm was first published by Gauss (I think).
 */
time64_t mktime64(const unsigned int year0, const unsigned int mon0,
		const unsigned int day, const unsigned int hour,
		const unsigned int min, const unsigned int sec)
{
	unsigned int mon = mon0, year = year0;

	/* 1..12 -> 11,12,1..10 */
	if (0 >= (int) (mon -= 2)) {
		mon += 12;	/* Puts Feb last since it has leap day */
		year -= 1;
	}

	return ((((time64_t)
		  (year/4 - year/100 + year/400 + 367*mon/12 + day) +
		  year*365 - 719499
	    )*24 + hour /* now have hours */
	  )*60 + min /* now have minutes */
	)*60 + sec; /* finally seconds */
}
EXPORT_SYMBOL(mktime64);

请注意,mktime函数传给它的参数是正常的年月日时间,但是返回的值应该是减去01-01-1970 00:00:00之后秒数。

#####6.3.2.2墙上时间(xtime_sec/nsec) 墙上时间就是你的手表上显示的现实时间

墙上时间,在系统启动过程中根据实时钟(RTC)芯片保存数据进行初始化,在系统运行期间由系统时钟维护并在合适的时刻和RTC芯片进行同步。

墙上时间储存于系统核心变量xtime中,但是最新的内核似乎有了变化,原来xtime中的sec被直接嵌入在了struct timekeeper的结构体中:

u64			xtime_sec;

该xtime_sec,xtime_nsec变量记录了现实世界中的年月日格式的时间,以便内核对某些对象和事件做时间标记,如记录文件的创建时间、修改时间、上次访问时间,或者供用户进程通过系统调用来使用。

墙上时间的初始化:

在start_kernel函数执行过程中,有函数timekeeping_init,先去读RTC时间,然后初始化xtime。

/*
 * timekeeping_init - Initializes the clocksource and common timekeeping values
 */
void __init timekeeping_init(void)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	struct clocksource *clock;
	unsigned long flags;
	struct timespec64 now, boot, tmp;
	struct timespec ts;

	read_persistent_clock(&ts);	   //(a)
	now = timespec_to_timespec64(ts);
	if (!timespec64_valid_strict(&now)) {
		pr_warn("WARNING: Persistent clock returned invalid value!\n"
			"         Check your CMOS/BIOS settings.\n");
		now.tv_sec = 0;
		now.tv_nsec = 0;
	} else if (now.tv_sec || now.tv_nsec)
		persistent_clock_exist = true;

	read_boot_clock(&ts);
	boot = timespec_to_timespec64(ts);
	if (!timespec64_valid_strict(&boot)) {
		pr_warn("WARNING: Boot clock returned invalid value!\n"
			"         Check your CMOS/BIOS settings.\n");
		boot.tv_sec = 0;
		boot.tv_nsec = 0;
	}

	raw_spin_lock_irqsave(&timekeeper_lock, flags);
	write_seqcount_begin(&tk_core.seq);
	ntp_init();

	clock = clocksource_default_clock();
	if (clock->enable)
		clock->enable(clock);
	tk_setup_internals(tk, clock);

	tk_set_xtime(tk, &now);	//(b)
	tk->raw_time.tv_sec = 0;
	tk->raw_time.tv_nsec = 0;
	tk->base_raw.tv64 = 0;
	if (boot.tv_sec == 0 && boot.tv_nsec == 0)
		boot = tk_xtime(tk);

	set_normalized_timespec64(&tmp, -boot.tv_sec, -boot.tv_nsec);
	tk_set_wall_to_mono(tk, tmp);

	timekeeping_update(tk, TK_MIRROR);

	write_seqcount_end(&tk_core.seq);
	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
}

代码(a)从RTC中读取当前时间,代码(b)根据读取的时间初始化xtime_sec,xtime_nsecs墙上时间。但是并不是每种CPU架构都能够支持read_persistent_clock(&ts)函数。 比如ARM架构就是通过:rtc_hctosys函数来初始化xtime_sec/nsec的。

初始化xtime的函数:

static void tk_set_xtime(struct timekeeper *tk, const struct timespec64 *ts)
{
	tk->xtime_sec = ts->tv_sec;
	tk->tkr.xtime_nsec = (u64)ts->tv_nsec << tk->tkr.shift;
}

ARM架构初始化xtime的代码:

/* IMPORTANT: the RTC only stores whole seconds. It is arbitrary
 * whether it stores the most close value or the value with partial
 * seconds truncated. However, it is important that we use it to store
 * the truncated value. This is because otherwise it is necessary,
 * in an rtc sync function, to read both xtime.tv_sec and
 * xtime.tv_nsec. On some processors (i.e. ARM), an atomic read
 * of >32bits is not possible. So storing the most close value would
 * slow down the sync API. So here we have the truncated value and
 * the best guess is to add 0.5s.
 */

static int __init rtc_hctosys(void)
{
	int err = -ENODEV;
	struct rtc_time tm;
	struct timespec tv = {
		.tv_nsec = NSEC_PER_SEC >> 1,
	};
	struct rtc_device *rtc = rtc_class_open(CONFIG_RTC_HCTOSYS_DEVICE);

	if (rtc == NULL) {
		pr_err("%s: unable to open rtc device (%s)\n",
			__FILE__, CONFIG_RTC_HCTOSYS_DEVICE);
		goto err_open;
	}

	err = rtc_read_time(rtc, &tm);	//(a)
	if (err) {
		dev_err(rtc->dev.parent,
			"hctosys: unable to read the hardware clock\n");
		goto err_read;

	}

	err = rtc_valid_tm(&tm);
	if (err) {
		dev_err(rtc->dev.parent,
			"hctosys: invalid date/time\n");
		goto err_invalid;
	}

	rtc_tm_to_time(&tm, &tv.tv_sec); 	//(b)

	err = do_settimeofday(&tv); 	//(c)

	dev_info(rtc->dev.parent,
		"setting system clock to "
		"%d-%02d-%02d %02d:%02d:%02d UTC (%u)\n",
		tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday,
		tm.tm_hour, tm.tm_min, tm.tm_sec,
		(unsigned int) tv.tv_sec);

err_invalid:
err_read:
	rtc_class_close(rtc);

err_open:
	rtc_hctosys_ret = err;

	return err;
}

代码(a)(b)读取RTC的当前时间值,代码(c)通过调用do_settimeofday来调用tk_set_xtime对xtime进行初始化

#####6.3.2.3 Monotonic time

除了xtime表示墙上的真实时间外,linux维护了另外一个时间:monotonic time,可以把它理解为系统启动以来所经过的时间,该时间只能单调递增,可以理解为xtime虽然正常情况下也是递增的,但是毕竟用户可以主动向前或向后调整墙上时间,从而修改xtime值。 但是monotonic时间不可以往后退,系统启动后只能不断递增,不像xtime,当系统休眠时,monotonic时间不会递增。

内核定义了一个变量wall_to_monotonic,记录了墙上时间xtime和monotonic时间之间的偏移量,当需要获monotonic时间时使用算式:

monotonic时间 = xtime + wall_to_monotonic(1)

在前面讲到的timekeeping_init 函数中,有代码:

/*
 * timekeeping_init - Initializes the clocksource and common timekeeping values
 */
void __init timekeeping_init(void)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	struct clocksource *clock;
	unsigned long flags;
	struct timespec64 now, boot, tmp;
	struct timespec ts;

	read_persistent_clock(&ts);
	now = timespec_to_timespec64(ts);
	if (!timespec64_valid_strict(&now)) {
		pr_warn("WARNING: Persistent clock returned invalid value!\n"
			"         Check your CMOS/BIOS settings.\n");
		now.tv_sec = 0;
		now.tv_nsec = 0;
	} else if (now.tv_sec || now.tv_nsec)
		persistent_clock_exist = true;

	read_boot_clock(&ts);	//(a)
	boot = timespec_to_timespec64(ts);
	if (!timespec64_valid_strict(&boot)) {
		pr_warn("WARNING: Boot clock returned invalid value!\n"
			"         Check your CMOS/BIOS settings.\n");
		boot.tv_sec = 0;
		boot.tv_nsec = 0;
	}

	raw_spin_lock_irqsave(&timekeeper_lock, flags);
	write_seqcount_begin(&tk_core.seq);
	ntp_init();

	clock = clocksource_default_clock();
	if (clock->enable)
		clock->enable(clock);
	tk_setup_internals(tk, clock);

	tk_set_xtime(tk, &now);
	tk->raw_time.tv_sec = 0;
	tk->raw_time.tv_nsec = 0;	//(d)
	tk->base_raw.tv64 = 0;
	if (boot.tv_sec == 0 && boot.tv_nsec == 0)	//(b)
		boot = tk_xtime(tk);

	set_normalized_timespec64(&tmp, -boot.tv_sec, -boot.tv_nsec);
	tk_set_wall_to_mono(tk, tmp);	//(c)

	timekeeping_update(tk, TK_MIRROR);

	write_seqcount_end(&tk_core.seq);
	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
}

代码(a):

void read_boot_clock(struct timespec *ts)
{
	__read_boot_clock(ts);
}

linux现有的平台中很少有自己支持__read_boot_clock的,所以都使用的了linux默认的函数:static clock_access_fn __read_boot_clock = dummy_clock_access;而这个函数将boot时间初始化为0.

static void dummy_clock_access(struct timespec *ts)
{
	ts->tv_sec = 0;
	ts->tv_nsec = 0;
}

代码(b)是将boot时间初始化为xtime,代码(c)是初始化wall_to_monotonic为 –xtime,而我们根据算式(1)得知,系统的初始化monotonic时间为0。

#####6.3.2.4 Raw Monotonic time

Raw Monotonic time用来表示真正的硬件时间,它和monotonic时间区别就在于不受ntp时间调整的影响,开机后就单调的递增。它在内核中的表现形式为数据结构raw_time。它被内嵌在struct timekeeper的结构体中:

struct timespec64	raw_time;

上节的代码(d)就是初始化raw_time为0.

#####6.3.2.5 Boot time 与monotonic原理相同,唯一不同的是计算它需要累加上系统休眠的时间,它代表着系统上电后的总时间。所以这里面我们不得不引入一个数据结构total_sleep_time,内核用total_sleep_time记录休眠的时间,每次休眠醒来后重新累加该时间,并调整wall_to_monotonic的值,使其在系统休眠醒来后monotonic时间不会发生跳变。因为monotonic的值被调整。所以如果想获取boot time,需要加入该变量的值:

/* @offs_boot:		Offset clock monotonic -> clock boottime */
ktime_t			offs_boot;

####6.3.3 读取内核时间 #####6.3.3.1 getnstimeofday 读取xtime墙上时间,返回timespec结构的时间值,精确到ns级别。

/**
 * __getnstimeofday64 - Returns the time of day in a timespec64.
 * @ts:		pointer to the timespec to be set
 *
 * Updates the time of day in the timespec.
 * Returns 0 on success, or -ve when suspended (timespec will be undefined).
 */
int __getnstimeofday64(struct timespec64 *ts)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	unsigned long seq;
	s64 nsecs = 0;

	do {
		seq = read_seqcount_begin(&tk_core.seq);

		ts->tv_sec = tk->xtime_sec;	//(a)
		nsecs = timekeeping_get_ns(&tk->tkr);	//(b)

	} while (read_seqcount_retry(&tk_core.seq, seq));

	ts->tv_nsec = 0;
	timespec64_add_ns(ts, nsecs);	//(c)

	/*
	 * Do not bail out early, in case there were callers still using
	 * the value, even in the face of the WARN_ON.
	 */
	if (unlikely(timekeeping_suspended))
		return -EAGAIN;
	return 0;
}
EXPORT_SYMBOL(__getnstimeofday64);

代码(a)xtime_sec的值直接赋值给了ts的sec,所以当前的秒的分量值是直接赋值的。

代码(b)展开如下代码,这个函数是通过clocksource时钟源读取cycle然后转化成当前的纳秒的分量值的。代码(c)和(b)中读取的ns值累加到timespec的nsec分量中。

请注意下面的代码(d)(e)片段,(d)是直接调用我们讲解过的clocksource的read函数,读取cycle的值。代码(e)使用我们预先计算出来的mult,shift因子将cycle转换成ns。

static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
{
	cycle_t cycle_now, delta;
	s64 nsec;

	/* read clocksource: */
	cycle_now = tkr->read(tkr->clock);	//(d)

	/* calculate the delta since the last update_wall_time: */
	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);

	nsec = delta * tkr->mult + tkr->xtime_nsec;
	nsec >>= tkr->shift;	//(e)

	/* If arch requires, add in get_arch_timeoffset() */
	return nsec + arch_gettimeoffset();
}

#####6.3.3.2 ktime_get 读取monotonic时间,返回ktime结构的时间值,精确到ns级别。

函数跟上面的__getnstimeofday比较类似,但是区别是这里将xtime时间加上了一个tk->wall_to_monotonic的偏移量将其转化为monotonic时间。

ktime_t ktime_get(void)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	unsigned int seq;
	ktime_t base;
	s64 nsecs;

	WARN_ON(timekeeping_suspended);

	do {
		seq = read_seqcount_begin(&tk_core.seq);
		base = tk->tkr.base_mono;
		nsecs = timekeeping_get_ns(&tk->tkr);

	} while (read_seqcount_retry(&tk_core.seq, seq));

	return ktime_add_ns(base, nsecs);
}
EXPORT_SYMBOL_GPL(ktime_get);

#####6.3.3.3 ktime_get_ts 读取monotonic时间,返回timespec结构的时间值,精确到ns级别。

#####6.3.3.4 getnstime_raw_and_real 读取raw monotonic时间和getnstimeofday时间。

#####6.3.3.5 do_gettimeofday 从函数实现可以看出这个函数是调用了getnstimeofday(&now);,返回值为timeval结构的时间显示。

/**
 * do_gettimeofday - Returns the time of day in a timeval
 * @tv:		pointer to the timeval to be set
 *
 * NOTE: Users should be converted to using getnstimeofday()
 */
void do_gettimeofday(struct timeval *tv)
{
	struct timespec64 now;

	getnstimeofday64(&now);
	tv->tv_sec = now.tv_sec;
	tv->tv_usec = now.tv_nsec/1000;
}
EXPORT_SYMBOL(do_gettimeofday);

#####6.3.3.6 ktime_get_real 返回ktime结构类型的xtime墙上时间。

/**
 * ktime_get_real - get the real (wall-) time in ktime_t format
 */
static inline ktime_t ktime_get_real(void)
{
	return ktime_get_with_offset(TK_OFFS_REAL);
}

#####6.3.3.7 getrawmonotonic 返回timespec结构的raw monotonic时间。

static inline void getrawmonotonic(struct timespec *ts)
{
	getrawmonotonic64(ts);
}

/**
 * getrawmonotonic64 - Returns the raw monotonic time in a timespec
 * @ts:		pointer to the timespec64 to be set
 *
 * Returns the raw monotonic time (completely un-modified by ntp)
 */
void getrawmonotonic64(struct timespec64 *ts)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	struct timespec64 ts64;
	unsigned long seq;
	s64 nsecs;

	do {
		seq = read_seqcount_begin(&tk_core.seq);
		nsecs = timekeeping_get_ns_raw(tk);
		ts64 = tk->raw_time;

	} while (read_seqcount_retry(&tk_core.seq, seq));

	timespec64_add_ns(&ts64, nsecs);
	*ts = ts64;
}
EXPORT_SYMBOL(getrawmonotonic64);

#####6.3.3.8 getboottime 如果要得到monotonic时间:

monotonic时间 = xtime + wall_to_monotonic

如果要得到boot时间:

Boot时间 = xtime + wall_to_monotonic + total_sleep_time

该函数只能得到boot与xtime之间相对的偏移量。

/**
 * getboottime - Return the real time of system boot.
 * @ts:		pointer to the timespec to be set
 *
 * Returns the wall-time of boot in a timespec.
 *
 * This is based on the wall_to_monotonic offset and the total suspend
 * time. Calls to settimeofday will affect the value returned (which
 * basically means that however wrong your real time clock is at boot time,
 * you get the right time here).
 */
void getboottime(struct timespec *ts)
{
	struct timekeeper *tk = &tk_core.timekeeper;
	ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);

	*ts = ktime_to_timespec(t);
}
EXPORT_SYMBOL_GPL(getboottime);

#####6.3.3.9 get_monotonic_boottime 该函数能够得到真正的boot时间。

/*
 * Timespec interfaces utilizing the ktime based ones
 */
static inline void get_monotonic_boottime(struct timespec *ts)
{
	*ts = ktime_to_timespec(ktime_get_boottime());
}

#####6.3.3.10 ktime_get_boottime 该函数返回ktime形式的真正的boot时间。

/**
 * ktime_get_boottime - Returns monotonic time since boot in ktime_t format
 *
 * This is similar to CLOCK_MONTONIC/ktime_get, but also includes the
 * time spent in suspend.
 */
static inline ktime_t ktime_get_boottime(void)
{
	return ktime_get_with_offset(TK_OFFS_BOOT);
}