Performance Analyzer Tutorial
Sun™ Studio 9
Sun Microsystems, Inc.
www.sun.com
817-7624-10
May 2004
Submit comments about this document at: http://www.sun.com/hwdocs/feedback
Copyright © 2004 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
U.S. Government Rights - Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and
applicable provisions of the FAR and its supplements. Use is subject to license terms.
This distribution may include materials developed by third parties.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, Java, and JavaHelp are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and
other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the
U.S. and other countries. Products bearing SPARC trademarks are based upon architecture developed by Sun Microsystems, Inc.
This product is covered and controlled by U.S. Export Control laws and may be subject to the export or import laws in other countries. Nuclear,
missile, chemical biological weapons or nuclear maritime end uses or end users, whether direct or indirect, are strictly prohibited. Export or
reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not limited to, the denied
persons and specially designated nationals lists is strictly prohibited.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents
Contents
Figures
Tables
Before You Begin
How This Book Is Organized
Typographic Conventions
Shell Prompts
Accessing Sun Studio Software and Man Pages
Accessing Compilers and Tools Documentation
Accessing Related Solaris Documentation
Resources for Developers
Contacting Sun Technical Support
Sending Your Comments
1. Learning to Use the Performance Tools
Setting Up the Examples for Execution
System Requirements
Choosing Alternative Compiler Options
Basic Features of the Performance Analyzer
2. Basic Performance Analysis
Collecting Data for synprog
Simple Metric Analysis
Extension Exercise for Simple Metric Analysis
Metric Attribution and the gprof Fallacy
The Effects of Recursion
Loading Dynamically Linked Shared Objects
Descendant Processes
Extension Exercise for Descendant Processes
3. Analyzing the Performance of a Mixed Java/C++ Application
jsynprog Program Structure and Control Flow
Collecting Data for jsynprog
Analyzing jsynprog Program Data
4. OpenMP Parallelization Strategies
Collecting Data for omptest
Comparing Parallel Sections and Parallel Do Strategies
Comparing Critical Section and Reduction Strategies
5. Locking Strategies in Multithreaded Programs
Collecting Data for mttest
How Locking Strategies Affect Wait Time
How Data Management Affects Cache Performance
Extension Exercises for mttest
6. Cache Behavior and Optimization
Collecting Data for cachetest
Execution Speed
Performance Analyzer • March 2004
Program Structure and Cache Behavior
Program Optimization and Performance
A. Profiling Programs With prof, gprof, and tcov
Using prof to Generate a Program Profile
Using gprof to Generate a Call Graph Profile
Using tcov for Statement-Level Analysis
Creating tcov Profiled Shared Libraries
Locking Files
Errors Reported by tcov Runtime Functions
Using tcov Enhanced for Statement-Level Analysis
Creating Profiled Shared Libraries for tcov Enhanced
Locking Files
tcov Directories and Environment Variables
Index
Figures
FIGURE 2-1 Source Tab Showing Annotated Source Code for Function cputime
FIGURE 2-2 icputime
FIGURE 2-3 Disassembly Tab Showing Instructions for the Line in Which x Is Incremented in Function cputime
FIGURE 2-4 x icputime
FIGURE 2-5 Callers-Callees Tab With gpf_work as the Selected Function
FIGURE 2-6 Source Tab Showing Annotated Source Code for Functions gpf_a and gpf_b
FIGURE 2-7 Source Tab Showing Annotated Source Code for Function gpf_work
FIGURE 2-8 Callers-Callees Tab With real_recurse as the Selected Function
FIGURE 2-9 bounce_a
FIGURE 2-10 bounce_b
FIGURE 2-11 Functions Tab Showing Functions so_burncpu and sx_burncpu
FIGURE 2-12 Timeline Tab Showing the Seven Experiments Recorded for the Parent Process and its Descendant Processes
FIGURE 2-13 Timeline Tab at High Zoom Showing Event Markers and Gaps Between Them
FIGURE 2-14 Experiments Tab Showing Seven Experiments, Three of Which Are Marked as “Bad”
FIGURE 2-15 Timeline Tab at High Zoom, Showing Short Sample for Experiment test.2.er/_f2_f1.er
FIGURE 2-16 Event Tab Showing Very Short Duration Sample
FIGURE 3-1 The Functions Tab, Showing Several jsynprog Experiment Methods
FIGURE 3-2 The Summary Panel
FIGURE 3-3 The Metrics Tab of the Set Data Presentation Dialog
FIGURE 3-4 The Callers-Callees Tab, With jsynprog.main Selected
FIGURE 3-5 Source Tab, Listing jsynprog.java
FIGURE 3-6 The Disassembly Tab, Showing Annotated Bytecode
FIGURE 3-7 The Timeline Tab in the Java Representation
FIGURE 3-8 Setting Java Mode Using the Formats Tab of the Set Data Presentation Dialog Box
FIGURE 3-9 The Timeline Tab in the Expert-Java Representation
FIGURE 3-10 The Functions Tab, Showing Interpreted and Dynamically-compiled Versions of Routine.sys_op
FIGURE 4-1 Summary Tabs for Function psec_ From the Four-CPU Run (Left) and the Two-CPU Run (Right)
FIGURE 4-2 pdo_ (Right)
FIGURE 4-3 Functions Tab Showing Entries for critsum_ and redsum_
FIGURE 5-1 Functions Tab for the Four-CPU Experiment Showing Data for lock_local and lock_global
FIGURE 5-2 Source Tab for the Four-CPU Experiment for Function lock_global
FIGURE 5-3 lock_local
FIGURE 5-4 Functions Tab for the One-CPU Experiment Showing Data for lock_local and lock_global
FIGURE 5-5 Functions Tab for the One-CPU Experiment Showing Data for Functions computeA and computeB
FIGURE 5-6 Functions Tab for the Four-CPU Experiment Showing Data for Functions computeA and computeB
FIGURE 5-7 Source Tab for the Four-CPU Experiment Showing Annotated Source Code for computeA and computeB
FIGURE 5-8 Source Tab for the Four-CPU Experiment Showing Annotated Source Code for cache_trash
FIGURE 6-1 Functions Tab Showing User CPU, FP Adds and FP Muls for the Six Variants of dgemv
FIGURE 6-2 Functions Tab Showing User CPU Time, CPU Cycles, Instructions Executed and D- and E-Cache Stall Cycles for the Six Variants of dgemv
FIGURE 6-3 Source Tab Showing Annotated Source Code for dgemv_g1 and dgemv_g2
FIGURE 6-4 Source Tab for dgemv_hi1 Showing Compiler Commentary That Includes Loop Interchange Messages
FIGURE 6-5 Source Tab for dgemv_hi2 Showing Compiler Commentary
FIGURE 6-6 Disassembly Tab for dgemv_hi1 Showing Non-Sequential Line Numbering and Instruction Repetition Due to Loop Unrolling