Increased resolution for wallclock


When debugging a piece of code which rarely checks time, uinfo wallclock can become quite imprecise.  For example:

```
(udb) uinfo wallclock
2021-04-27T20:57:07.547331Z +/- 301288940 microseconds
```

...+/- 5 minutes!  A section of code that doesn’t make any system calls at all is much rarer than one that doesn’t check the time, so perhaps syscall handling could also fetch a timestamp (is rdtsc cheap enough, and usable for interpolation?)
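
Just to illustrate the interpolation part: if a (cycle-counter, wall-time) pair were captured at each syscall, estimating the time at any point in between is simple linear interpolation between the nearest samples. A minimal sketch with made-up numbers - not a description of how UDB does anything internally:

```python
def interpolate_wallclock(tsc_now, before, after):
    """Estimate wall time at cycle count `tsc_now`, given two
    (cycle_count, wall_time_seconds) samples taken at the nearest
    syscalls before and after the point of interest."""
    tsc0, t0 = before
    tsc1, t1 = after
    if tsc1 == tsc0:
        return t0
    frac = (tsc_now - tsc0) / (tsc1 - tsc0)
    return t0 + frac * (t1 - t0)

# Samples 0.5s apart in wall time; we are 40% of the way through in
# cycle-counter terms, so the estimate is 0.2s past the first sample.
print(interpolate_wallclock(1_400_000, (1_000_000, 100.0), (2_000_000, 100.5)))
```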

A use case for this is where two processes are communicating over a socket (hence making syscalls) and you want to use wallclock to jump between corresponding sends/receives in those processes, even though neither program reads the clock itself.

The primary use-case for uinfo wallclock is, given a fairly precise timestamp obtained typically from a log file, to jump to that exact moment in time.

I’m not really convinced about the utility of UDB’s interpolation between known timestamps. In your example, would it not be quicker and easier to match up the sends and receives via a data watchpoint on a sequence counter or on the contents of the data being exchanged?


> would it not be quicker and easier to match up the sends and receives via a data watchpoint on a sequence counter or on the contents of the data being exchanged?


Only if there is such a sequence counter you can watch. I was debugging something a while ago where there was no reasonable way to correlate sends/receives. Happily in that case the wallclock was precise enough.


I guess the point here is that wallclock is useful in a lot of places other than logjump. I’ve used it for performance investigations, for knowing whether I’m before or after the thing that paused for a long time, or for correlating against some other system output with timestamps. If your program happens not to have done a gettimeofday or equivalent for a long time, then effectively wallclock is not available, and that means our tools are just a little bit less awesome than they might be.


Updated idea status: New → Discussion ongoing

There are indeed many protocols which lack sequence counters.  Even when they are there, working out “what is the sequence counter” and “how do I translate it into a breakpoint condition” takes mental effort.  If you can go to about the right time in a recording and then advance or go back to a read/write, you will often be lucky enough to find the right one - and funnily enough, manually checking it’s the right one is often a lower mental burden than working out how to tell the debugger to check for you.  If you’re not so lucky, then you can fall back on doing it anyway.

It’s not just protocols with established connections, either.  Adjacent to that, what if I want to go from a TCP connect in one process to the corresponding accept in another?  Sure, I can set a watchpoint looking for an accept that yields the right port, but finding that out is again quite burdensome.

In all such cases, if the timer granularity is better, my chances of being sufficiently lucky to get away with the easy/lazy thing are vastly improved.


As mentioned on Slack, our use case is exactly as described above - syncing recordings where you can potentially have a server with multiple clients making socket calls to it. I would like to put a breakpoint at method A in the server, where that method is the “top level” after getting called from all the socket handling gubbins, and then be at a point in a client where we are just about to make a remote call to method A. So we need to fast-forward the server recording to (roughly) at least where the client is in time. Of course, method A may get called by multiple clients. I’m guessing that around 10ms accuracy would be good enough, but that’s just a gut feel.

And by the way, I want to do this automatically, not manually.

I might have a look to see if I can set a breakpoint where the actual socket reads/writes are done and see if the packets can be matched up - though they’re unlikely to be exactly the same packets? Or a combination of rough wall clock time plus packet matching.
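
To make the “rough wall clock plus packet matching” idea concrete, a toy sketch: pair a client-side send with a server-side receive when the payloads match and the timestamps agree within some tolerance. The event lists and field layout here are invented purely for illustration:

```python
import hashlib

def payload_key(data: bytes) -> str:
    # Hash the payload so comparisons are cheap even for large packets.
    return hashlib.sha256(data).hexdigest()

def match_events(sends, recvs, tolerance_s=0.01):
    """sends/recvs: lists of (wall_time_seconds, payload_bytes).
    Returns (send_time, recv_time, payload_hash) for each match."""
    matches = []
    used = set()
    for t_send, payload in sends:
        key = payload_key(payload)
        for i, (t_recv, data) in enumerate(recvs):
            if i in used:
                continue
            if payload_key(data) == key and abs(t_recv - t_send) <= tolerance_s:
                matches.append((t_send, t_recv, key))
                used.add(i)
                break
    return matches
```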

Another vague idea I had was to pre-process a recording to get the timings of all the socket calls. This could be bbcount. And then use that - just the intervals as a sequence of integers - to create a “signature”. Then you could slide the client signature forwards and backwards against the server signature to get the best fit. I imagine such signatures would be fairly unique. (It would need to be a fuzzy match, obviously.)
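
A toy version of that signature matching might look like the following: turn each recording’s socket-call timestamps into inter-event intervals, then slide one interval sequence along the other looking for the offset with the smallest total difference. Everything here is made up for illustration, and a real version would want a smarter (fuzzier) scoring function:

```python
def intervals(timestamps):
    # Inter-event gaps form the "signature".
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def best_alignment(client_times, server_times):
    """Slide the client signature over the server signature and return
    (offset, score) for the best fit; lower score means a better match."""
    c = intervals(client_times)
    s = intervals(server_times)
    best = (None, float("inf"))
    for off in range(len(s) - len(c) + 1):
        score = sum(abs(ci - si) for ci, si in zip(c, s[off:off + len(c)]))
        if score < best[1]:
            best = (off, score)
    return best
```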


@david_griffiths Your “signature” idea is similar to one that’s been floating around for a while (pre-dating this community otherwise I’d share a link) around byte-counting.

The concept, as I’ve always understood it, is that we would match up corresponding file descriptors in two processes, and set a point in each recording where the exchange of data is in sync.  From that point we can inspect reads and writes and by counting bytes exactly correlate the flow of bytes between those recordings.  There’s a little more work to set the initial “in-sync” point when the fds aren’t opened within the recording, but if you get that alignment then no fuzziness would be required.
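
To make the byte-counting concrete, here is a sketch of the core lookup: once a pair of fds is aligned, accumulating the byte counts of the reads/writes on each side maps any offset in the byte stream to the exact call (and the byte within it) that carried it, in either recording. The (recording_position, nbytes) event representation is invented for illustration:

```python
from bisect import bisect_right
from itertools import accumulate

def event_for_offset(events, offset):
    """events: ordered list of (recording_position, nbytes) for the
    reads or writes on one fd. Returns the event carrying the byte at
    stream `offset`, plus that byte's index within the event."""
    ends = list(accumulate(n for _, n in events))   # cumulative byte counts
    i = bisect_right(ends, offset)                  # first event ending past offset
    if i >= len(events):
        raise ValueError("offset beyond recorded data")
    start = ends[i - 1] if i else 0
    return events[i], offset - start

# Looking up the same offset in the sender's writes and the receiver's
# reads identifies corresponding points in the two recordings.
```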

This approach might actually be best for you if you want to fully automate your navigation in twin recordings, but it feels like improving the recorded wallclock resolution could make manual navigation much easier relatively cheaply.


Suggesting bbcounts was silly of me, btw, because they will vary across processes and so have no relationship to time!


I did some investigation on this a while ago; it’d be good to look into again for a hackathon project.

I found that if you have a complete recording of both sender and receiver (complete = starting from before the first byte is sent/received), that gives you the ability to take some received byte in the receiver’s memory and trace it back to the matching byte in the sender’s memory. I think that’s pretty neat, and that level of abstraction is useful in a lot of cases, without any higher-level protocol knowledge.

For finding corresponding fds in processes, network sockets are relatively straightforward. UNIX sockets also aren’t hard. The awkward ones are sockets created with socketpair (and inherited with fork), and sockets whose fds are passed via messages on other sockets (SCM_RIGHTS).
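
For the straightforward network-socket case, the matching is essentially by connection endpoints: the client’s (local, remote) address pair is the server’s pair reversed. A minimal sketch, with a made-up socket-info representation:

```python
def match_tcp_fds(client_socks, server_socks):
    """Each argument maps fd -> ((local_ip, local_port), (remote_ip, remote_port)).
    Returns (client_fd, server_fd) pairs describing the same connection."""
    by_endpoints = {(remote, local): fd
                    for fd, (local, remote) in server_socks.items()}
    return [(cfd, by_endpoints[(local, remote)])
            for cfd, (local, remote) in client_socks.items()
            if (local, remote) in by_endpoints]
```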


I have spun up a separate thread for the ongoing conversation about correlating between recordings of processes that use a socket to communicate with one another:



When it says `+/- 301288940 microseconds`, how accurate is that? Is it guaranteed not to be outside those limits?