linux filesystem performance help
Aug. 18th, 2013 03:00 pm
NEW READERS: IT’S NOT ABOUT THE FILESYSTEM ANYMORE BUT IT’S STILL BROKEN: SEE UPDATES AT BOTTOM OF POST. Addressing filesystem performance only partly fixed it. Thanks!
Since always, I’ve had latency issues on my digital audio workstation, which runs Ubuntu Linux (currently 12.04 LTS) on a Gigabyte motherboard with 4G of RAM and a suitably symmetric four-core processor. CPU load runs 20%-ish most of the time (and all the time for these purposes), and I never have to swap.
In this configuration, I should be able to get down to around 7ms of buffer time and not get XRUNs (audible dropouts caused by buffer under- or overrun) in my audio chain. 14ms if I want to be safe.
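For reference, the buffer math behind those targets is simple. A quick sketch — the 48 kHz sample rate, 2 periods per buffer, and the specific frame counts are illustrative assumptions, not numbers from my setup:

```shell
# Buffer latency (ms) = frames_per_period * periods / sample_rate * 1000
# All values below are assumed for illustration.
rate=48000
periods=2

frames=128
latency_ms=$(( frames * periods * 1000 / rate ))
echo "${latency_ms} ms"   # prints "5 ms" -- near the 7ms target

frames=512
latency_ms=$(( frames * periods * 1000 / rate ))
echo "${latency_ms} ms"   # prints "21 ms" -- comfortably "safe" territory
```

Smaller periods mean lower latency but tighter real-time deadlines, which is exactly why stalls in the hundreds of milliseconds are fatal here.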
In reality, I can’t make it reliably at 74ms, and that has hitches I just have to live with. To get no XRUNs or close to it I have to go up to like 260ms, which is insane. I even tried getting a dedicated root-device USB card – I’ve long assumed it was some sort of USB issue. But no.
With some new tools (latencytop in particular) I have found it. It’s the file system. Specifically, it’s ext3’s journal — its internal transaction logging. To wit:
EXT3: committing transaction 302.9ms log_wait_commit 120.3ms
If I turn off access-time (atime) updating, which I tried last night, I get rid of 90% of the XRUNs, because the file system does about 90% less transaction logging to update all those inodes with new times.
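For anyone wanting to replicate this: access-time updating is controlled by mount options. A sketch — the device and mount point here are placeholder assumptions:

```
# /etc/fstab -- illustrative entry; device and mount point are assumptions
/dev/sda2  /home/audio  ext3  defaults,noatime  0  2
```

`relatime` (Ubuntu's default since 8.04) is the middle ground; `noatime` goes further and skips access-time writes entirely. A live remount (`mount -o remount,noatime /home/audio`) applies it without a reboot.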
But any attempt to write – well, you can guess. Even the pure realtime kernel doesn’t help; I compiled and installed a custom build of one today, but apparently these journal commits still can’t be preempted: I get exactly the same behaviour. I may be able to live with that to some degree, because it’s a start-and-stop-of-writes thing, and as long as it doesn’t trigger during writes, I can get by.
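For anyone following along at home, here’s a quick sketch of verifying the RT build actually took — the grep pattern is an assumption about how PREEMPT_RT kernels report themselves in their version banner:

```shell
# PREEMPT_RT kernels normally advertise "PREEMPT RT" in `uname -v`
# (pattern is an assumption; non-RT kernels may show plain "PREEMPT")
if uname -v | grep -q 'PREEMPT RT'; then
  msg="RT kernel active"
else
  msg="not an RT kernel"
fi
echo "$msg"
```

The point being: an RT kernel only helps with scheduling preemption; it can’t preempt its way past a journal commit the writer has to wait on.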
But it’s bullshit, and it pisses me off.
I’m currently in the process of upgrading ext3 to ext4. I’d like to think that will solve it, given ext4’s dramatically better performance, but I have no such assurances at this point. I genuinely thought the realtime kernel might do it.
DO YOU HAVE ANYTHING YOU CAN TELL ME, DEAR INTERNETS? Particularly about filesystem tuning. Because this shouldn’t be happening; it just shouldn’t. Honestly, three tenths of a second to commit a transaction? I’ve been places where that kind of number was reasonable; it was called 1983, and I don’t live there anymore.
Anybody?
THINGS IT IS NOT:
- Shared interrupt
- This particular hard drive (the previous drive did it too; this one is faster)
- ondemand CPU frequency scaling (I’m running the performance governor)
- this particular USB port or a USB hub or extension cord or any of the sort
- bluetooth or other random services (including search)
- Corrupt HD
- Old technology (it’s SATA; the drive is like six months old)
- lack of RT kernel. I built this RT kernel today.
- Going to be solved by installing a different operating system. Please don’t.
ETA: I got the ext3 filesystem upgraded to ext4, which made all those numbers above dramatically smaller, but brought no further XRUN improvement. So I then disabled journaling entirely, a configuration which outperforms raw ext2 in benchmarks I’ve seen, and the machine is screamingly fast despite the RT kernel…
…and it hasn’t made one goddamn whit of difference in the remaining XRUNs. WTF, computer? WTF.
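For completeness, the journal removal described above is done offline with tune2fs. A sketch — the device name and mount point are placeholders, and note this deliberately trades away crash safety:

```
# Run against the *unmounted* filesystem; /dev/sdb1 is a placeholder
umount /dev/sdb1
tune2fs -O ^has_journal /dev/sdb1
e2fsck -f /dev/sdb1
mount /dev/sdb1 /mnt/audio
```

The `^` prefix clears the feature flag, and a forced fsck afterwards is the usual precaution before remounting.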
ETA2 (23:51 18 August): Okay, while screwing with the filesystem did solve many XRUN problems, there are still other XRUNs which are apparently unrelated, most notably, the master-record-enable XRUN. Even moving the project to a tmpfs RAM disk and running from there produced identical results, so I’m concluding this is an entirely separate problem.
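The RAM-disk test above is the standard tmpfs trick; a sketch, with the size and paths as assumptions:

```
# tmpfs lives entirely in RAM, taking disk and filesystem out of the equation
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=1G tmpfs /mnt/ramdisk
cp -a ~/ardour/myproject /mnt/ramdisk/
```

If a problem survives a move to tmpfs — as this one did — storage genuinely is ruled out.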
I’ve already done pretty much everything on the LinuxMusicians configuration consultation page, and my setup actually passes their evaluation script. I should be golden, but I’m not. Help?
ETA3 (0:26 19 August): Every two minutes, right now, with the system mostly idle, I’m getting a burst of XRUNs. On an idle machine. But it is exactly every two minutes. And while Ardour stays at the top of top even when idle (at 10% of CPU and 13.5% of RAM), Xorg pops up just underneath it, and its CPU use spikes.
What does Xorg do every two minutes? Anybody? Seriously I have no idea.
ETA4 (13:19 19 August): ARDOUR 3 TRIGGERS SESSION SAVE EVERY TWO MINUTES BY DEFAULT. Disabling that STOPS the two-minute failures entirely. We’re back to file system adventures. Holy hell. THIS HAPPENS EVEN ON RAMDISK so it’s not filesystem or media specific. What the hell is going on here?
Mirrored from Crime and the Blog of Evil. Come check out our music at:
I just noticed something
Date: 2013-08-19 01:10 am (UTC)
You specifically mention .
Is this an external drive?
Re: I just noticed something
Date: 2013-08-19 02:18 am (UTC)
Also, hi!
Re: I just noticed something
Date: 2013-08-19 02:55 am (UTC)
And Hi!
Re: I just noticed something
Date: 2013-08-19 03:48 am (UTC)
So, yeah.
Re: I just noticed something
Date: 2013-08-19 02:51 am (UTC)
...it's made no goddamn difference whatsoever.
god DAMMIT.
Re: I just noticed something
Date: 2013-08-19 02:52 am (UTC)

Re: Xorg (and that's why new thread)
Date: 2013-08-19 06:20 pm (UTC)

Re: Xorg (and that's why new thread)
Date: 2013-08-19 07:40 pm (UTC)
Section "Device"
    Identifier "Configured Video Device"
    BusID      "PCI:00:02:0"
    Driver     "intel"
EndSection

Section "Device"
    Identifier "Configured Video Device[2]"
    BusID      "PCI:03:00:0"
    Driver     "ati"
EndSection

Section "Monitor"
    Identifier "Configured Monitor"
    Option     "DPMS"
EndSection

Section "Monitor"
    Identifier "Configured Monitor[2]"
    Option     "PreferredMode" "1280x1024_60.00"
    Option     "DPMS"
EndSection

Section "Screen"
    Identifier "Default Screen"
    Monitor    "Configured Monitor"
    Device     "Configured Video Device"
EndSection

Section "Screen"
    Identifier "Second Screen"
    Monitor    "Configured Monitor[2]"
    Device     "Configured Video Device[2]"
    Option     "AddARGBGLXVisuals" "True"
    SubSection "Display"
        Depth  24
        Modes  "1280x1024"
    EndSubSection
EndSection

Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Default Screen"
    Screen 1 "Second Screen" RightOf "Default Screen"
    Option "Xinerama" "on"
    Option "RANDR" "on"
    Option "BlankTime" "20"
    Option "StandbyTime" "20"
    Option "SuspendTime" "20"
    Option "OffTime" "40"
EndSection

Section "Extensions"
    Option "Composite" "Enable"
EndSection
Re: Xorg (and that's why new thread)
Date: 2013-08-19 09:15 pm (UTC)

Re: Xorg (and that's why new thread)
Date: 2013-08-20 03:43 am (UTC)

Re: Xorg (and that's why new thread)
Date: 2013-08-20 05:17 am (UTC)
In the source code, most of save happens in
libs/ardour/session_state.cc
Save works fine when plugins aren't activated, but very badly when they are.
Save state calls a lot of things, including get_state, which gathers latency data from plugins via get_state, which calls add_state, which eventually calls latency_compute_run — and that's the same (!) in both lv2 and ladspa plugins. It calculates the latency by actually running the plugin. Not a copy: the actual plugin that's in use.
Most notably, in add_state, found here:
libs/ardour/lv2_plugin.cc
libs/ardour/ladspa_plugin.cc
latency_compute_run activates the plugin even if it's already activated (!) then deactivates it on exit (which I guess is stacked somehow because they don't deactivate in Ardour itself) and runs a second thread on the plugin (presumably because how else I guess?).
hypothesis: this is making the CPU stall, from branch misprediction or hyperthreading contention. The penalty for this in Intel land is large. The two versions of the active plugin may be continually invalidating each other (!) for the duration of the latency test. It may even be thrashing the on-chip cache. That would explain why it stops being an issue when the plugin is not active.
Thoughts?