Adam Leventhal's blog

more on gcore

October 13, 2004

Trawling through b.s.c I noticed Fintan Ryan talking about gcore(1), and I realized that I hadn’t sufficiently promoted this cool utility. As part of my work adding variable core file content, I rewrote gcore from scratch (it used to be a real pile) to add a few new features and to make it use libproc (i.e. make it slightly less of a pile).

You use gcore to take a core dump of a live running process without actually causing the process to crash. It’s not completely noninvasive because gcore stops the process you’re taking the core of to ensure a consistent snapshot, but unless the process is huge or it’s really cranky about timing, the perturbation isn’t noticeable. There are a lot of places where taking a snapshot with gcore is plenty useful. Let’s say a process is behaving strangely, but you can’t attach a debugger because you don’t want to take down the service, or you want to have a core file to send to someone who can debug it when you yourself can’t. In those cases gcore is perfect. I use it to take cores of mozilla when it’s chugging away on the processor, but not making any visible progress.

I mentioned that big processes can take a while to gcore — not surprising because we have to dump that whole image out to disk. One of the cool uses of variable core file content is the ability to take faster core dumps by only dumping the sections you care about. Let’s say there’s some big ISM segment or a big shared memory segment: exclude it and gcore will go faster:

hedge /home/ahl -> gcore -c default-ism 256755
gcore: core.256755 dumped
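
If you find yourself excluding more than one kind of segment, the content spec is easy to build up in the shell. What follows is only a minimal sketch: content_spec is a made-up helper name, not part of gcore, and it assumes the token names (ism, dism, shm and so on) are the ones documented in coreadm(1M).

```shell
#!/bin/sh
# Hypothetical helper (not part of gcore): build a content spec that starts
# from "default" and subtracts each token passed as an argument.
# Token names such as ism, dism and shm are assumed from coreadm(1M).
content_spec() {
    spec=default
    for token in "$@"; do
        spec="$spec-$token"
    done
    printf '%s\n' "$spec"
}

# Example, using the pid from above:
#   gcore -c "$(content_spec ism shm)" 256755
```

With no arguments the helper just yields default, so the same call site works whether or not there is anything to exclude.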

Pretty handy, but the coolest use I’ve been making of gcore lately is by mixing it with DTrace and the new(ish) system() action. This script snapshots my process once every ten seconds and names the files according to the time they were produced:

# cat gcore.d
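#pragma D option destructive
#pragma D option quiet

/* Every ten seconds, arm the snapshot. */
tick-10s
{
        doit = 1;
}

/* On the target's next system call, stop it, dump core, and resume it. */
syscall:::
/doit && pid == $1/
{
        stop();
        system("gcore -o core.%%t %d", pid);
        system("prun %d", pid);
        doit = 0;
}
# dtrace -s gcore.d 256755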
gcore: core.1097724567.256755 dumped
gcore: core.1097724577.256755 dumped
gcore: core.1097724600.256755 dumped
^C

WARNING! When you specify destructive in DTrace, it means destructive. The system() and stop() actions can be absolutely brutal (I’ve rendered at least one machine unusable by my indelicate use of that Ramirez-Ortiz-ian one-two combo). That said, if you screw something up, you can break into the debugger and set dtrace_destructive_disallow to 1.

OK, so be careful, but that script can give you some pretty neat results. Maybe you have some application that seems to be taking a turn for the worse around 2 a.m. Put together a DTrace script that detects the problem and use gcore to take a snapshot so you can figure out what was going on when you get to the office in the morning. Take a couple of snapshots to see how things are changing. You do like debugging from core dumps, right?
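
If all you want is the "couple of snapshots" part, with no trigger, a plain shell loop will do. This is a rough sketch rather than a polished tool: snapshot_series and the GCORE override are names invented for this example, and it leans on gcore expanding %t in the -o argument and on gcore stopping and resuming the process by itself.

```shell
#!/bin/sh
# Hypothetical helper: take COUNT core snapshots of PID, INTERVAL seconds apart.
# GCORE may be set to another command (say, a stub for a dry run);
# it defaults to gcore.
snapshot_series() {
    count=$1
    interval=$2
    pid=$3
    if [ -z "$count" ] || [ -z "$interval" ] || [ -z "$pid" ]; then
        echo "usage: snapshot_series count interval pid" >&2
        return 1
    fi
    while [ "$count" -gt 0 ]; do
        # gcore expands %t itself, producing core.<time>.<pid>
        ${GCORE:-gcore} -o core.%t "$pid" || return 1
        count=$((count - 1))
        # Sleep only between snapshots, not after the last one
        if [ "$count" -gt 0 ]; then
            sleep "$interval"
        fi
    done
}
```

Something like snapshot_series 3 60 256755 would leave three cores a minute apart, ready to be compared in the morning.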

6 Responses

  1. Dtrace not needed really 🙂
    #!/usr/bin/bash
    N=$1
    PID=$2
    if [ -z "$N" ] || [ -z "$PID" ]; then
        echo "usage: $0 (interval) (pid)"
        exit 1
    fi
    while [ "$N" -gt 0 ]; do
        # Seconds since the epoch, scraped from the time(2) call that date makes;
        # taken inside the loop so each core gets its own name
        EPOCH=`truss -t time date 2>&1 | awk '/^time/ {print $3; exit}'`
        /usr/bin/pstop "$PID"
        /usr/bin/gcore -o core.$EPOCH "$PID"
        /usr/bin/prun "$PID"
        let N=$N-1
    done

  2. Rodrick,
    True, we don’t need DTrace; this was an example of how you can mix DTrace and gcore. The idea is that you can grab a core when you see any series of events. One other thing to note is that the %t format character can be used with the -o option to gcore(1), so while your use of truss(1) is exciting, there’s a simpler way.

  3. Yes, good point Adam. Now that I think about it, this will be very good for situations where a probe in DTrace triggers an event that calls gcore, so you can troubleshoot the issue further with mdb 🙂
    Very cool indeed.
    One thing:
    #pragma D option destructive scares me lol 🙂

  4. Rodrick,
    I basically think you can’t be too afraid of destructive actions. Consider the following D script:

    syscall:::entry
    {
    stop();
    }

    Ouch… and it’s real tough to recover from a system where every process is stopped…
    Destructive actions are incredibly useful, but wield them with care.
