本教程介绍如何使用线程分析器在多线程程序中检测潜在的死锁和实际的死锁。
本教程涵盖以下主题:
术语死锁描述的是两个或多个线程由于相互等待而永远被阻塞的情况。导致死锁的原因有很多,例如程序逻辑错误以及不恰当使用同步(如锁和屏障)。本教程重点介绍由于不恰当使用互斥锁而导致的死锁。此类死锁通常发生在多线程应用程序中。
以下三个条件成立时,包含两个或多个线程的进程可能会进入死锁状态:
已持有锁的线程请求新锁
同时发出对新锁的请求
两个或多个线程形成一个循环链,其中每个线程都在等待链中下一个线程持有的锁
以下是一个死锁情况的简单示例:
线程 1 持有锁 A 并请求锁 B
线程 2 持有锁 B 并请求锁 A
死锁可分为两种类型:潜在死锁或实际死锁,二者的区别如下:
潜在死锁不一定在给定的运行过程中发生,但可能发生在程序的任何执行过程中,具体取决于线程的调度情况以及线程进行锁请求的时间。
实际死锁是在程序执行过程中发生的死锁。实际死锁会导致所涉及的线程挂起,但不一定会导致整个进程挂起。
本教程中使用的 din_philo.c 源文件位于目录 /opt/solstudio12.2/prod/examples/tha/din_philo(Oracle Solaris 系统)或 /opt/oracle/solstudio12.2/prod/examples/tha/din_philo(Linux 系统)下。该示例目录包含 Makefile 和 DEMO 说明文件,但是本教程并不遵循这些说明,也不使用 Makefile。本教程将逐步指导您执行命令。
为了学习本教程,可以将 din_philo.c 文件从示例目录复制到其他目录,也可以创建自己的文件并从下面列出的代码内容中复制代码。
模拟哲学家就餐问题的 din_philo.c 样例程序是一个使用 POSIX 线程的 C 程序。该程序可以同时展示潜在死锁和实际死锁。
din_philo.c 的源代码如下所示:
1 /* 2 * Copyright (c) 2006, 2010, Oracle and/or its affiliates. All Rights Reserved. 3 * @(#)din_philo.c 1.4 (Oracle) 10/03/26 4 */ 5 6 #include <pthread.h> 7 #include <stdio.h> 8 #include <unistd.h> 9 #include <stdlib.h> 10 #include <errno.h> 11 #include <assert.h> 12 13 #define PHILOS 5 14 #define DELAY 5000 15 #define FOOD 100 16 17 void *philosopher (void *id); 18 void grab_chopstick (int, 19 int, 20 char *); 21 void down_chopsticks (int, 22 int); 23 int food_on_table (); 24 25 pthread_mutex_t chopstick[PHILOS]; 26 pthread_t philo[PHILOS]; 27 pthread_mutex_t food_lock; 28 int sleep_seconds = 0; 29 30 31 int 32 main (int argn, 33 char **argv) 34 { 35 int i; 36 37 if (argn == 2) 38 sleep_seconds = atoi (argv[1]); 39 40 pthread_mutex_init (&food_lock, NULL); 41 for (i = 0; i < PHILOS; i++) 42 pthread_mutex_init (&chopstick[i], NULL); 43 for (i = 0; i < PHILOS; i++) 44 pthread_create (&philo[i], NULL, philosopher, (void *)i); 45 for (i = 0; i < PHILOS; i++) 46 pthread_join (philo[i], NULL); 47 return 0; 48 } 49 50 void * 51 philosopher (void *num) 52 { 53 int id; 54 int i, left_chopstick, right_chopstick, f; 55 56 id = (int)num; 57 printf ("Philosopher %d is done thinking and now ready to eat.\n", id); 58 right_chopstick = id; 59 left_chopstick = id + 1; 60 61 /* Wrap around the chopsticks. */ 62 if (left_chopstick == PHILOS) 63 left_chopstick = 0; 64 65 while (f = food_on_table ()) { 66 67 /* Thanks to philosophers #1 who would like to take a nap 68 * before picking up the chopsticks, the other philosophers 69 * may be able to eat their dishes and not deadlock. 70 */ 71 if (id == 1) 72 sleep (sleep_seconds); 73 74 grab_chopstick (id, right_chopstick, "right "); 75 grab_chopstick (id, left_chopstick, "left"); 76 77 printf ("Philosopher %d: eating.\n", id); 78 usleep (DELAY * (FOOD - f + 1)); 79 down_chopsticks (left_chopstick, right_chopstick); 80 } 81 82 printf ("Philosopher %d is done eating.\n", id); 83 return (NULL); 84 } 85 86 int 87 food_on_table () 88 { 89 static int food = FOOD; 90 int myfood; 91 92 pthread_mutex_lock (&food_lock); 93 if (food > 0) { 94 food--; 95 } 96 myfood = food; 97 pthread_mutex_unlock (&food_lock); 98 return myfood; 99 } 100 101 void 102 grab_chopstick (int phil, 103 int c, 104 char *hand) 105 { 106 pthread_mutex_lock (&chopstick[c]); 107 printf ("Philosopher %d: got %s chopstick %d\n", phil, hand, c); 108 } 109 110 void 111 down_chopsticks (int c1, 112 int c2) 113 { 114 pthread_mutex_unlock (&chopstick[c1]); 115 pthread_mutex_unlock (&chopstick[c2]); 116 }
哲学家就餐方案是一个结构如下的经典方案。编号从 0 到 4 的五位哲学家坐在圆桌旁思考。随着时间的推移,每个人开始饥饿并决定进餐。餐桌上有一盘面条,但是每位哲学家只有一根筷子可用。为了吃到面条,他们必须共用筷子。每位哲学家左边(他们面向餐桌而坐)的筷子的编号与该哲学家的编号相同。
每位哲学家首先拿到与自己编号相同的筷子。他拿到分配给自己的筷子后,他将去拿分配给邻座的筷子。拿到两根筷子后,他就可以进餐了。吃完后,他将筷子放回桌上的原来位置,一边一根。该过程一直重复,直到把面条吃完。
每位哲学家都拿到自己的筷子并等待使用其邻座的筷子时,就会发生实际的死锁。
编号为 0 的哲学家拿到编号为 0 的筷子,并等待编号为 1 的筷子。
编号为 1 的哲学家拿到编号为 1 的筷子,并等待编号为 2 的筷子。
编号为 2 的哲学家拿到编号为 2 的筷子,并等待编号为 3 的筷子。
编号为 3 的哲学家拿到编号为 3 的筷子,并等待编号为 4 的筷子。
编号为 4 的哲学家拿到编号为 4 的筷子,并等待编号为 0 的筷子。
在这种情况下,没人可以吃到东西,于是哲学家们陷入了死锁之中。多次运行该程序,您将看到该程序有时会挂起,而有时又可以运行至完成。该程序挂起的情况如以下运行过程样例所示:
prompt% cc din_philo.c prompt% a.out Philosopher 0 is done thinking and now ready to eat. Philosopher 2 is done thinking and now ready to eat. Philosopher 2: got right chopstick 2 Philosopher 2: got left chopstick 3 Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 4 is done thinking and now ready to eat. Philosopher 4: got right chopstick 4 Philosopher 2: eating. Philosopher 3 is done thinking and now ready to eat. Philosopher 1 is done thinking and now ready to eat. Philosopher 0: got right chopstick 0 Philosopher 3: got right chopstick 3 Philosopher 2: got right chopstick 2 Philosopher 1: got right chopstick 1 (hang) Execution terminated by pressing CTRL-C |
一种避免死锁的方法是让 1 号哲学家在拿起他的筷子之前等待一段时间。就代码而言,可让他在拿起自己的筷子之前休眠指定的一段时间 (sleep_seconds)。如果他休眠的时间足够长,该程序就可能会完成而不发生任何实际死锁。可以将他休眠的秒数指定为可执行程序的参数。如果不指定参数,该哲学家就不会休眠。
以下伪代码显示了每位哲学家的逻辑:
while (there is still food on the table) { if (sleep argument is specified and I am philosopher #1) { sleep specified amount of time } grab right fork grab left fork eat some food put down left fork put down right fork }
下面列出了该程序的一次运行,在这次运行中 1 号哲学家在拿起他的筷子之前等待 30 秒。该程序运行至完成,所有五位哲学家都吃到了面条。
% a.out 30 Philosopher 0 is done thinking and now ready to eat. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 4 is done thinking and now ready to eat. Philosopher 4: got right chopstick 4 Philosopher 3 is done thinking and now ready to eat. Philosopher 3: got right chopstick 3 Philosopher 0: eating. Philosopher 2 is done thinking and now ready to eat. Philosopher 2: got right chopstick 2 Philosopher 1 is done thinking and now ready to eat. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. ... Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0 is done eating. Philosopher 4: got left chopstick 0 Philosopher 4: eating. Philosopher 4 is done eating. Philosopher 3: got left chopstick 4 Philosopher 3: eating. Philosopher 3 is done eating. Philosopher 2: got left chopstick 3 Philosopher 2: eating. Philosopher 2 is done eating. Philosopher 1: got right chopstick 1 Philosopher 1: got left chopstick 2 Philosopher 1: eating. Philosopher 1 is done eating. % Execution terminated normally |
尝试多次运行该程序,并指定不同的休眠参数。如果 1 号哲学家在拿起他的筷子之前仅等待很短的一段时间,那么发生什么情况呢?如果他等待的时间更长一些,又会怎么样?尝试为可执行程序 a.out 指定不同的休眠参数。在使用或不使用休眠参数的情况下多次重新运行该程序。该程序有时会挂起,而有时又可以运行至完成。程序是否挂起取决于线程的调度情况以及线程进行锁请求的时间。
可以使用线程分析器检查程序中的潜在死锁和实际死锁。线程分析器沿用与 Oracle Solaris Studio 性能分析器相同的“收集-分析”模型。
使用线程分析器的过程涉及三个步骤:
编译源代码。
创建死锁检测实验。
检查实验结果。
编译代码并务必指定 -g。不要指定高优化级别,因为在高优化级别时报告的信息(如行号和调用栈)可能是错误的。应使用 -g -xopenmp=noopt 编译 OpenMP 程序,并仅使用 -g -mt 编译 POSIX 线程程序。
有关这些选项的更多信息,请参见 cc(1)、CC(1) 或 f95(1) 手册页。
对于本教程,请使用以下命令编译代码:
% cc -g -o din_philo din_philo.c |
使用线程分析器的带有 -r deadlock 选项的 collect 命令。此选项将在程序的执行期间创建死锁检测实验。
对于本教程,请使用以下命令创建名为 din_philo.1.er 的死锁检测实验:
% collect -r deadlock -o din_philo.1.er din_philo |
可以通过创建多个死锁检测实验来提高检测到死锁的可能性。对于各个实验,应使用不同的线程数和不同的输入数据。例如,在 din_philo.c 代码中,可以更改以下行中的值:
13 #define PHILOS 5 14 #define DELAY 5000 15 #define FOOD 100
然后可以像以前一样进行编译,并收集其他实验。
有关更多信息,请参见 collect(1) 和 collector(1) 手册页。
可以使用线程分析器、性能分析器或 er_print 实用程序检查死锁检测实验。线程分析器和性能分析器都提供 GUI 界面;线程分析器显示的是一组简化的缺省标签,但在其他方面与性能分析器完全相同。
要启动线程分析器并打开 din_philo.1.er 实验,请键入以下命令:
% tha din_philo.1.er |
线程分析器包含菜单栏、工具栏以及包含多个标签的拆分窗格(不同标签对应不同的显示)。
打开收集的死锁检测实验时,缺省情况下会在左侧窗格中显示以下标签:
"Deadlocks"(死锁)标签
此标签显示线程分析器在程序中检测到的潜在死锁和实际死锁列表。缺省情况下会选中此标签。其中会显示每个死锁涉及到的线程。这些线程形成一个循环链,其中每个线程都持有一个锁并请求该链中的下一个线程持有的另一个锁。
"Dual Source"(双源)标签
在 "Deadlocks"(死锁)标签上的循环链中选择一个线程,然后单击 "Dual Source"(双源)标签。"Dual Source"(双源)标签会显示该线程持有锁的源代码位置,以及同一线程请求锁的源代码位置。该线程持有锁和请求锁的源代码行会突出显示。
"Experiments"(实验)标签
此标签显示实验中的装入对象并列出所有错误和警告消息。
线程分析器显示屏的右侧窗格中会显示以下标签:
"Summary"(摘要)标签,显示从 "Deadlocks"(死锁)标签中选择的死锁的摘要信息。
"Deadlocks Details"(死锁详细信息)标签,显示从 "Deadlocks"(死锁)标签中选择的线程上下文的详细信息。
er_print 实用程序提供命令行界面。可以在交互式会话中使用 er_print 实用程序并在该会话期间指定子命令。也可以使用命令行选项以非交互方式指定子命令。
使用 er_print 实用程序检查死锁时,以下子命令非常有用:
-deadlocks
该选项会报告实验中检测到的所有潜在死锁和实际死锁。在 (er_print) 提示符下指定 deadlocks,或者在 er_print 命令行上指定 -deadlocks。
-ddetail deadlock_id
该选项会返回具有指定 deadlock_id 的死锁的详细信息。在 (er_print) 提示符下指定 -ddetail,或者在 er_print 命令行上指定 ddetail。如果指定的 deadlock_id 为 all,将显示所有死锁的详细信息。否则,请指定单个死锁编号,例如为第一个死锁指定 1。
-header
该选项会显示有关实验的描述性信息并报告所有错误或警告。在 (er_print) 提示符下指定 header,或者在命令行上指定 -header。
有关更多信息,请参阅 collect(1)、tha(1)、analyzer(1) 和 er_print(1) 手册页。
本节说明如何使用线程分析器检查哲学家就餐程序中的死锁。
下面列出了导致实际死锁的哲学家就餐程序的一次运行。
% cc -g -o din_philo din_philo.c % collect -r deadlock -o din_philo.1.er din_philo Creating experiment database din_philo.1.er ... Philosopher 1 is done thinking and now ready to eat. Philosopher 2 is done thinking and now ready to eat. Philosopher 3 is done thinking and now ready to eat. Philosopher 0 is done thinking and now ready to eat. Philosopher 1: got right chopstick 1 Philosopher 3: got right chopstick 3 Philosopher 0: got right chopstick 0 Philosopher 1: got left chopstick 2 Philosopher 3: got left chopstick 4 Philosopher 4 is done thinking and now ready to eat. Philosopher 1: eating. Philosopher 3: eating. Philosopher 3: got right chopstick 3 Philosopher 4: got right chopstick 4 Philosopher 2: got right chopstick 2 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 1: got right chopstick 1 Philosopher 4: got left chopstick 0 Philosopher 4: eating. Philosopher 0: got right chopstick 0 Philosopher 3: got left chopstick 4 Philosopher 3: eating. Philosopher 4: got right chopstick 4 Philosopher 2: got left chopstick 3 Philosopher 2: eating. Philosopher 3: got right chopstick 3 Philosopher 1: got left chopstick 2 Philosopher 1: eating. Philosopher 2: got right chopstick 2 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 1: got right chopstick 1 Philosopher 4: got left chopstick 0 Philosopher 4: eating. Philosopher 0: got right chopstick 0 Philosopher 3: got left chopstick 4 Philosopher 3: eating. ... Philosopher 4: got right chopstick 4 Philosopher 2: got left chopstick 3 Philosopher 2: eating. Philosopher 2: got right chopstick 2 Philosopher 3: got right chopstick 3 (hang) Execution terminated by pressing CTRL-C |
键入以下命令以使用 er_print 实用程序检查实验:
% er_print din_philo.1.er (er_print) deadlocks |
Deadlock #1, Potential deadlock Thread #2 Lock being held: 0x215a0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215b8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #3 Lock being held: 0x215b8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215d0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #4 Lock being held: 0x215d0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215e8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #5 Lock being held: 0x215e8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x21600, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #6 Lock being held: 0x21600, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215a0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Deadlock #2, Actual deadlock Thread #2 Lock being held: 0x215a0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215b8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #3 Lock being held: 0x215b8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215d0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #4 Lock being held: 0x215d0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215e8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #5 Lock being held: 0x215e8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x21600, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #6 Lock being held: 0x21600, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215a0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Deadlocks List Summary: Experiment: din_philo.1.er Total Deadlocks: 2 (er_print)
以下屏幕抓图显示了线程分析器中显示的死锁信息。
线程分析器报告了 din_philo.c 的两个死锁,一个是潜在死锁,另一个是实际死锁。通过进一步检查,可以发现两个死锁是相同的。
该死锁涉及的循环链如下:
线程 2:在地址 |
线程 3:在地址 |
线程 4:在地址 |
线程 5:在地址 |
线程 6:在地址 |
选择循环链中的第一个线程(线程 2),然后单击 "Dual Source"(双源)标签,可以在源代码中看到,线程 2 在地址 0x215a0
持有锁,在源代码中也可以看到,该线程在地址 0x215b8
请求锁。以下屏幕抓图显示了对应于线程 2 的 "Dual Source"(双源)标签。每个源代码行的左侧会显示缺省的度量:"Exclusive Deadlocks"(互斥死锁)度量。此度量显示了该行上报告的(死锁涉及的)锁持有或锁请求操作的总次数。
如果提供足够大的休眠参数,哲学家就餐程序就可避免实际死锁并正常终止。但是,正常终止并不意味着程序可以完全避免死锁。它仅仅表示在给定的一次运行中,持有和请求的锁未形成死锁链。如果在其他运行中出现了时间变化,则仍可能会发生实际死锁。下面列出了由于引入 40 秒休眠时间而正常终止的哲学家就餐程序的一次运行。但是,er_print 实用程序和线程分析器会报告潜在死锁。
% cc -g -o din_philo_pt din_philo.c % collect -r deadlock -o din_philo_pt.1.er din_philo_pt 40 Creating experiment database tha.2.er ... Philosopher 0 is done thinking and now ready to eat. Philosopher 2 is done thinking and now ready to eat. Philosopher 1 is done thinking and now ready to eat. Philosopher 3 is done thinking and now ready to eat. Philosopher 2: got right chopstick 2 Philosopher 3: got right chopstick 3 Philosopher 0: got right chopstick 0 Philosopher 4 is done thinking and now ready to eat. Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 3: got left chopstick 4 Philosopher 3: eating. Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 0: got right chopstick 0 Philosopher 2: got left chopstick 3 Philosopher 2: eating. ... Philosopher 4: got right chopstick 4 Philosopher 3: got right chopstick 3 Philosopher 2: got right chopstick 2 Philosopher 4: got left chopstick 0 Philosopher 4: eating. Philosopher 4 is done eating. Philosopher 3: got left chopstick 4 Philosopher 3: eating. Philosopher 0: got right chopstick 0 Philosopher 0: got left chopstick 1 Philosopher 0: eating. Philosopher 3 is done eating. Philosopher 2: got left chopstick 3 Philosopher 2: eating. Philosopher 0 is done eating. Philosopher 2 is done eating. Philosopher 1: got right chopstick 1 Philosopher 1: got left chopstick 2 Philosopher 1: eating. Philosopher 1 is done eating. % Execution terminated normally |
键入下列以粗体显示的命令以使用 er_print 实用程序检查实验:
% er_print din_philo_pt.1.er (er_print) deadlocks Deadlock #1, Potential deadlock Thread #2 Lock being held: 0x215a0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215b8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #3 Lock being held: 0x215b8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215d0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #4 Lock being held: 0x215d0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215e8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #5 Lock being held: 0x215e8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x21600, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #6 Lock being held: 0x21600, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215a0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Deadlocks List Summary: Experiment: din_philo_pt.1.er Total Deadlocks: 1 (er_print) |
Deadlock #1, Potential deadlock Thread #2 Lock being held: 0x215a0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215b8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #3 Lock being held: 0x215b8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215d0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #4 Lock being held: 0x215d0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215e8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #5 Lock being held: 0x215e8, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x21600, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Thread #6 Lock being held: 0x21600, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Lock being requested: 0x215a0, at: grab_chopstick + 0x0000002C, line 106 in "din_philo.c" Deadlocks List Summary: Experiment: din_philo_pt.1.er Total Deadlocks: 1 (er_print)
以下屏幕抓图显示了线程分析器界面中的潜在死锁信息。
一种消除潜在死锁和实际死锁的方法是使用一种令牌机制,该机制要求哲学家必须收到令牌后才能尝试进餐。可用令牌的数量必须少于餐桌前哲学家的人数。当哲学家收到令牌后,他可以根据餐桌规则尝试进餐。每位哲学家在进餐后归还令牌,然后重复该过程。以下伪代码显示了使用令牌机制时每位哲学家的逻辑:
while (there is still food on the table) { get token grab right fork grab left fork eat some food put down left fork put down right fork return token }
以下几节将详细介绍令牌机制的两种不同实现。
下面列出了使用令牌机制的哲学家就餐程序的修正版本。该解决方案结合了四个令牌,比就餐人数少一个,因此最多有四位哲学家可同时尝试进餐。该版本的程序称为 din_philo_fix1.c:
从 /opt/solstudio12.2/prod/examples/tha/din_philo 或 /opt/oracle/solstudio12.2/prod/examples/tha/din_philo 目录复制 din_philo_fix1.c。
1 /* 2 * Copyright (c) 2006, 2010, Oracle and/or its affiliates. All Rights Reserved. 3 * @(#)din_philo_fix1.c 1.3 (Oracle) 10/03/26 4 */ 5 6 #include <pthread.h> 7 #include <stdio.h> 8 #include <unistd.h> 9 #include <stdlib.h> 10 #include <errno.h> 11 #include <assert.h> 12 13 #define PHILOS 5 14 #define DELAY 5000 15 #define FOOD 100 16 17 void *philosopher (void *id); 18 void grab_chopstick (int, 19 int, 20 char *); 21 void down_chopsticks (int, 22 int); 23 int food_on_table (); 24 void get_token (); 25 void return_token (); 26 27 pthread_mutex_t chopstick[PHILOS]; 28 pthread_t philo[PHILOS]; 29 pthread_mutex_t food_lock; 30 pthread_mutex_t num_can_eat_lock; 31 int sleep_seconds = 0; 32 uint32_t num_can_eat = PHILOS - 1; 33 34 35 int 36 main (int argn, 37 char **argv) 38 { 39 int i; 40 41 pthread_mutex_init (&food_lock, NULL); 42 pthread_mutex_init (&num_can_eat_lock, NULL); 43 for (i = 0; i < PHILOS; i++) 44 pthread_mutex_init (&chopstick[i], NULL); 45 for (i = 0; i < PHILOS; i++) 46 pthread_create (&philo[i], NULL, philosopher, (void *)i); 47 for (i = 0; i < PHILOS; i++) 48 pthread_join (philo[i], NULL); 49 return 0; 50 } 51 52 void * 53 philosopher (void *num) 54 { 55 int id; 56 int i, left_chopstick, right_chopstick, f; 57 58 id = (int)num; 59 printf ("Philosopher %d is done thinking and now ready to eat.\n", id); 60 right_chopstick = id; 61 left_chopstick = id + 1; 62 63 /* Wrap around the chopsticks. */ 64 if (left_chopstick == PHILOS) 65 left_chopstick = 0; 66 67 while (f = food_on_table ()) { 68 get_token (); 69 70 grab_chopstick (id, right_chopstick, "right "); 71 grab_chopstick (id, left_chopstick, "left"); 72 73 printf ("Philosopher %d: eating.\n", id); 74 usleep (DELAY * (FOOD - f + 1)); 75 down_chopsticks (left_chopstick, right_chopstick); 76 77 return_token (); 78 } 79 80 printf ("Philosopher %d is done eating.\n", id); 81 return (NULL); 82 } 83 84 int 85 food_on_table () 86 { 87 static int food = FOOD; 88 int myfood; 89 90 pthread_mutex_lock (&food_lock); 91 if (food > 0) { 92 food--; 93 } 94 myfood = food; 95 pthread_mutex_unlock (&food_lock); 96 return myfood; 97 } 98 99 void 100 grab_chopstick (int phil, 101 int c, 102 char *hand) 103 { 104 pthread_mutex_lock (&chopstick[c]); 105 printf ("Philosopher %d: got %s chopstick %d\n", phil, hand, c); 106 } 107 108 109 110 void 111 down_chopsticks (int c1, 112 int c2) 113 { 114 pthread_mutex_unlock (&chopstick[c1]); 115 pthread_mutex_unlock (&chopstick[c2]); 116 } 117 118 119 void 120 get_token () 121 { 122 int successful = 0; 123 124 while (!successful) { 125 pthread_mutex_lock (&num_can_eat_lock); 126 if (num_can_eat > 0) { 127 num_can_eat--; 128 successful = 1; 129 } 130 else { 131 successful = 0; 132 } 133 pthread_mutex_unlock (&num_can_eat_lock); 134 } 135 } 136 137 void 138 return_token () 139 { 140 pthread_mutex_lock (&num_can_eat_lock); 141 num_can_eat++; 142 pthread_mutex_unlock (&num_can_eat_lock); 143 }
尝试编译这一修正版本的哲学家就餐程序,并多次运行该程序。令牌机制限制了尝试使用筷子的就餐人数,从而可避免实际死锁和潜在死锁。
要进行编译,请使用以下命令:
cc -g -o din_philo_fix1 din_philo_fix1.c |
收集实验:
collect -r deadlock din_philo_fix1 -o din_philo_fix1.1.er |
即使是使用令牌机制,线程分析器也会报告该实现存在潜在死锁(虽然并不存在死锁)。这是一个误报。请参考以下包含潜在死锁详细信息的屏幕抓图。
选择循环链中的第一个线程(线程 2),然后单击 "Dual Source"(双源)标签,可以看到线程 2 在地址 0x216a8
持有锁的对应源代码位置,以及该线程在 0x216c0
请求锁的对应源代码位置。下图显示了对应于线程 2 的 "Dual Source"(双源)标签。
din_philo_fix1.c 中的 get_token() 函数使用 while 循环来同步线程。线程在成功获得令牌之前不会离开 while
循环(当 num_can_eat
大于 0 时可成功获得令牌)。while
循环将同时就餐的人数限制为 4。但是,线程分析器无法识别由 while
循环实现的同步。它认为所有五位哲学家同时尝试抓取筷子并进餐,因此报告了一个潜在死锁。下一节将详细说明如何使用线程分析器可识别的同步来限制同时就餐的人数。
下面列出了令牌机制的另一种实现。该实现仍然使用四个令牌,所以最多有四个就餐者可同时尝试进餐。但是,该实现使用 sem_wait() 和 sem_post() 信号例程限制就餐的哲学家人数。该版本的源文件称为 din_philo_fix2.c。
从 /opt/sunstudio12.2/prod/examples/tha/din_philo 或 /opt/oracle/solstudio12.2/prod/examples/tha/din_philo 目录复制 din_philo_fix2.c。
下面列出了详细的 din_philo_fix2.c:
1 /* 2 * Copyright (c) 2006, 2010, Oracle and/or its affiliates. All Rights Reserved. 3 * @(#)din_philo_fix2.c 1.3 (Oracle) 10/03/26 4 */ 5 6 #include <pthread.h> 7 #include <stdio.h> 8 #include <unistd.h> 9 #include <stdlib.h> 10 #include <errno.h> 11 #include <assert.h> 12 #include <semaphore.h> 13 14 #define PHILOS 5 15 #define DELAY 5000 16 #define FOOD 100 17 18 void *philosopher (void *id); 19 void grab_chopstick (int, 20 int, 21 char *); 22 void down_chopsticks (int, 23 int); 24 int food_on_table (); 25 void get_token (); 26 void return_token (); 27 28 pthread_mutex_t chopstick[PHILOS]; 29 pthread_t philo[PHILOS]; 30 pthread_mutex_t food_lock; 31 int sleep_seconds = 0; 32 sem_t num_can_eat_sem; 33 34 35 int 36 main (int argn, 37 char **argv) 38 { 39 int i; 40 41 pthread_mutex_init (&food_lock, NULL); 42 sem_init(&num_can_eat_sem, 0, PHILOS - 1); 43 for (i = 0; i < PHILOS; i++) 44 pthread_mutex_init (&chopstick[i], NULL); 45 for (i = 0; i < PHILOS; i++) 46 pthread_create (&philo[i], NULL, philosopher, (void *)i); 47 for (i = 0; i < PHILOS; i++) 48 pthread_join (philo[i], NULL); 49 return 0; 50 } 51 52 void * 53 philosopher (void *num) 54 { 55 int id; 56 int i, left_chopstick, right_chopstick, f; 57 58 id = (int)num; 59 printf ("Philosopher %d is done thinking and now ready to eat.\n", id); 60 right_chopstick = id; 61 left_chopstick = id + 1; 62 63 /* Wrap around the chopsticks. */ 64 if (left_chopstick == PHILOS) 65 left_chopstick = 0; 66 67 while (f = food_on_table ()) { 68 get_token (); 69 70 grab_chopstick (id, right_chopstick, "right "); 71 grab_chopstick (id, left_chopstick, "left"); 72 73 printf ("Philosopher %d: eating.\n", id); 74 usleep (DELAY * (FOOD - f + 1)); 75 down_chopsticks (left_chopstick, right_chopstick); 76 77 return_token (); 78 } 79 80 printf ("Philosopher %d is done eating.\n", id); 81 return (NULL); 82 } 83 84 int 85 food_on_table () 86 { 87 static int food = FOOD; 88 int myfood; 89 90 pthread_mutex_lock (&food_lock); 91 if (food > 0) { 92 food--; 93 } 94 myfood = food; 95 pthread_mutex_unlock (&food_lock); 96 return myfood; 97 } 98 99 void 100 grab_chopstick (int phil, 101 int c, 102 char *hand) 103 { 104 pthread_mutex_lock (&chopstick[c]); 105 printf ("Philosopher %d: got %s chopstick %d\n", phil, hand, c); 106 } 107 108 void 109 down_chopsticks (int c1, 110 int c2) 111 { 112 pthread_mutex_unlock (&chopstick[c1]); 113 pthread_mutex_unlock (&chopstick[c2]); 114 } 115 116 117 void 118 get_token () 119 { 120 sem_wait(&num_can_eat_sem); 121 } 122 123 void 124 return_token () 125 { 126 sem_post(&num_can_eat_sem); 127 }
这一新实现使用信号 num_can_eat_sem
来限制可同时就餐的哲学家人数。信号 num_can_eat_sem
初始化为 4,比哲学家人数少一个。尝试进餐前,哲学家调用 get_token(),后者又调用 sem_wait(&num_can_eat_sem)
。调用 sem_wait() 会导致执行调用的哲学家进入等待状态,直到信号值变为正值,然后将信号值减 1。哲学家进餐完毕后,他将调用 return_token(),后者又调用 sem_post(&num_can_eat_sem)
。调用 sem_post() 可将信号值加 1。线程分析器可识别对 sem_wait() 和 sem_post() 的调用,并判定并非所有的哲学家都同时尝试进餐。
必须使用 -lrt 编译 din_philo_fix2.c 以链接相应的信号例程。
要编译 din_philo_fix2.c,请使用以下命令:
cc -g -lrt -o din_philo_fix2 din_philo_fix2.c |
如果多次运行程序 din_philo_fix2 的这一新实现,将会发现它每次都正常终止,不会挂起。
在这一新的二进制代码上创建实验:
collect -r deadlock -o din_philo_fix2.1.er din_philo_fix2 |
您将发现线程分析器不报告 din_philo_fix2.1.er 实验中的任何实际死锁或潜在死锁,如下图所示:
有关线程分析器可识别的线程和内存分配 API 的列表,请参见附录 A。