Effective traces in Go

Effective traces in Go

The runtime/trace package provides a powerful tool for understanding and debugging Go programs. The functionality inside allows you to create a trace of the execution of each goroutine over a certain period. By using go tool trace (or the excellent open source gotraceui tool) you can visualize and explore the data in these traces.

The magic of tracing is that it can easily reveal things in a program that are difficult to see in other ways. For example, a parallelism bottleneck where multiple goroutines block the same channel can be quite difficult to see in a CPU profile because there is no execution to sample. But in the trace, the lack of execution will be visible with amazing clarity, and a stack trace of blocked goroutines will quickly point to the culprit.

Go developers can even instrument their own applications with tasks, regions, and logs, which they can use to correlate their high-level problems with low-level execution details.


Unfortunately, a huge amount of information in the traces may not be available. Historically, this has been hindered by four major problems.

  • Routing had high overhead.

  • Traces do not scale well and can become too large to analyze.

  • It was often unclear when to start tracing to capture specific misbehavior.

  • Only the bravest of gophers could programmatically analyze traces, given the lack of a publicly available package for analyzing and interpreting traces.

If you’ve been using tracers in the last few years, chances are you’ve been frustrated by one or more of these problems. But we’re happy to share that we’ve made great progress in all four areas over the past two versions of Go.

Low overhead routing

Prior to Go 1.21, the runtime overhead of tracing was 10-20% of CPU for many applications, which limited tracing to situational use rather than constant as with CPU profiling. It turned out that the majority of tracing costs are accounted for by tracebacks. Many events generated by the runtime contain stack traces, which are invaluable for determining what goroutines were doing at key points in their execution.

Thanks to the work of Felix Geisendörfer and Nick Ripley on optimizing the performance of tracebacks, the CPU overhead of tracing has been reduced significantly, down to 1-2% for many applications. You can read more about the work done in Felix’s excellent blog post on the subject.

Scalable tracings

The format of the track and its events was designed with efficient emission in mind, but required a toolkit to parse and save the state of the entire track. A track several hundred megabytes in size could require several gigabytes of RAM to analyze!

This problem, unfortunately, is fundamental to the way traces are generated. To reduce runtime overhead, all events are written to the equivalent of thread-local buffers. But this means that events appear out of their true order, and the burden falls on the tracing tool to figure out what actually happened.

The key to allowing tracing to scale while maintaining low overhead was the periodic division of the generated trace. Each split point will behave roughly the same as simultaneously disabling and re-enabling tracing in one go. All trace data is currently a complete and self-contained trace, and new trace data smoothly transitions to where it left off.

Fixing all of this required rethinking and rewriting many of the fundamentals of the trace implementation. We are happy to announce that this work has been completed in Go 1.22 and is now publicly available. There are many nice improvements, including some improvements go tool trace Team. If you are interested, all the details are described in the design document.

(Note: go tool trace still loads the full trace into memory, but this restriction can now be removed for traces generated by Go 1.22+ programs.)

Flight recording

Let’s say you are working on a web service and RPC takes a long time. You couldn’t start tracing when you already knew the RPC was taking a long time because the cause of the slow request had already happened and wasn’t caught.

A technology called flight recording, which you may already be familiar with from other programs, can help with this. The essence of flight recording is to constantly monitor and always save the latest trace data, just in case. Then, as soon as something interesting happens, the program can just record everything it has!

Before the appearance of the possibility of dividing tracks, it was practically impossible. But since continuous tracing has become viable due to the low overhead and the fact that the runtime can now split traces at any time, it turns out that flight recording is easier to implement.

As a result, we are pleased to announce a flight recorder experiment available at golang.org/x/exp/trace.

Please try it! Below is an example that sets up flight recording to capture a long HTTP request to get you started.

// Set up the flight recorder.
fr := trace.NewFlightRecorder()

// Set up and run an HTTP server.
var once sync.Once
http.HandleFunc("/my-endpoint", func(w http.ResponseWriter, r *http.Request) {
	start := time.Now()

	// Do the work...
	doWork(w, r)

	// We saw a long request. Take a snapshot!
	if time.Since(start) > 300*time.Millisecond {
		// Do it only once for simplicity, but you can take more than one.
		once.Do(func() {
			// Grab the snapshot.
			var b bytes.Buffer
			_, err = fr.WriteTo(&b)
			if err != nil {
			// Write it to a file.
			if err := os.WriteFile("trace.out", b.Bytes(), 0o755); err != nil {
log.Fatal(http.ListenAndServe(":8080", nil))

If you have any feedback, positive or negative, please share it in the proposal issue!

Trace reader API

Along with the redesign of the trace implementation, there was an attempt to tidy up other internal components such as go tool trace. This gave rise to an attempt to create an API for reading traces that would be good enough to share and could make traces more accessible.

As with flight recorder, we are happy to announce that we also have an experimental API for reading traces that we would like to share. It is available in the same package as flight recorder, golang.org/x/exp/trace.

We think it’s good enough to start building something new on top of it, so please give it a try! Below is an example that measures the proportion of goroutine blocking events that are blocked waiting on the network.

// Start reading from STDIN.
r, err := trace.NewReader(os.Stdin)
if err != nil {

var blocked int
var blockedOnNetwork int
for {
	// Read the event.
	ev, err := r.ReadEvent()
	if err == io.EOF {
	} else if err != nil {

	// Process it.
	if ev.Kind() == trace.EventStateTransition {
		st := ev.StateTransition()
		if st.Resource.Kind == trace.ResourceGoroutine {
			id := st.Resource.Goroutine()
			from, to := st.GoroutineTransition()

			// Look for goroutines blocking, and count them.
			if from.Executing() && to == trace.GoWaiting {
				if strings.Contains(st.Reason, "network") {
// Print what we found.
p := 100 * float64(blockedOnNetwork) / float64(blocked)
fmt.Printf("%2.3f%% instances of goroutines blocking were to block on the network\n", p)

And just like with the flight recorder, there is a proposal issue for your feedback!

We’d like to give a special shout out to Dominik Honnef as someone who tested it early, provided excellent feedback, and contributed to maintaining older versions of the API.

Thank you!

This work was completed, in large part, thanks to the help of those on the Diagnostics Working Group, launched more than a year ago as a collaboration of stakeholders from across the Go community and open to the public.

We’d like to take a moment to thank those community members who have attended meetings regularly over the past year: Felix Geisendörfer, Nick Ripley, Rhys Hiltner, Dominik Honnef, Bryan Boreham, thepudds.

The discussions, feedback and work you’ve all done has helped us get to where we are today. Thank you!

Related posts