Page 1 of 1

USB issues - capturing useful information

Posted: Fri Nov 09, 2012 7:13 pm
by davidmam
I, like others, am experiencing issues with USB on the Pi, leading to some frustration. Rather than posting yet another "It's not working" message, is there somewhere some guidelines to how to capture information that would actually be useful to those with the abilities to delve into the USB code?

A short guide of what to run, how we can capture useful info pre and post incident and ensure that we can contribute to a fix as speedily as possible without overwhelming with good intentions that are useless at best and hindering in the worst case.

Anyone?

Re: USB issues - capturing useful information

Posted: Fri Nov 09, 2012 8:22 pm
by M33P
In general, Linux by default offers multiple methods of extracting information useful for debugging.

The mainstay of this is the kernel message buffer, readable by the dmesg command. This dumps the kernel messages (created by drivers, subsystems and the kernel itself) since the system was last restarted. In recent versions of the rpi kernel, both timestamp and debugging information is available [e.g in the case of a kernel bug() or oops()].

As these events are timestamped since boot time in seconds, it is important that these are correlated with actions e.g. insertion or removal of a USB device, or when something goes faulty.

Please post the output of the dmesg command in \[code\] blocks to aid readability.

In the case where a severe problem disables or breaks the entire USB and ethernet subsystem, and your Pi is used without a monitor, it is possible to interrogate it using its own onboard serial port (ttyAMA0). This is a 3.3v-level UART connected to the GPIO pins that by default prints all kernel messages and spawns a login prompt. This does require external hardware such as a Bus Pirate or other logic-level serial converter.

Please try to eliminate the commonplace or obvious (TP1, TP2, voltage yada yada). Repeat your tests on a fresh install of a raspi distribution and pin down exactly what steps are required to reproduce the issue you encounter.

Re: USB issues - capturing useful information

Posted: Fri Nov 09, 2012 8:31 pm
by davidmam
So I have a headless Pi, it will lose network USB randomly about 12 -15 hours after I last logged in. What I am really looking for is something that I can do to capture that info when a crash occurs. Without USB and not being physically near the machine I can't use a keyboard or the network. HDMI doesn't work post-hoc (only on boot up IME). And it could be many many hours passed since the critical events occurred. Would it be worth capturing dmesg from a cron job? What system configuration information would be useful and how should it be reported?

ie:
'Capture the results of the following commands: uname <options>, lsusb -v ' etc.

I'm wondering whether I can actually use the Pi to monitor it's own TP1/TP2 voltage? any ideas? I obviously cannot be standing there for 12-15 hours monitoring TP1/TP2 and waiting for it to fall over. But computers are remarkably good at boring repetitive tasks.

Re: USB issues - capturing useful information

Posted: Sat Nov 10, 2012 10:31 pm
by gsh
Latest kernel updates should fix that

Re: USB issues - capturing useful information

Posted: Sat Nov 10, 2012 10:34 pm
by davidmam
when were those latest updates? Unless they were within the last few days, they don't. Edit: OK, a newer version is available so I have updated. Time will tell.

..d

Re: USB issues - capturing useful information

Posted: Sun Nov 11, 2012 2:08 pm
by davidmam
The latest fixes do not seem to have any effect on my issue.

I am running a Pi connected to a wireless network and to an Arexx TL-500 datalogger via a powered hub.

uname -a:
Linux raspberrypi 3.2.27+ #260 PREEMPT Thu Nov 8 00:34:12 GMT 2012 armv6l

Running Raspbian with all updates/upgrades. Include python usb.core (libusb 1.0)

lsmod:

Code: Select all

Module                  Size  Used by
aes_generic            31457  1 
w1_therm                2705  0 
w1_gpio                 1283  0 
wire                   23530  2 w1_gpio,w1_therm
cn                      4649  1 wire
snd_bcm2835            12808  0 
snd_pcm                74834  1 snd_bcm2835
snd_page_alloc          4951  1 snd_pcm
snd_seq                52536  0 
snd_seq_device          6300  1 snd_seq
snd_timer              19698  2 snd_seq,snd_pcm
snd                    52489  5 snd_timer,snd_seq_device,snd_seq,snd_pcm,snd_bcm2835
arc4                    1187  2 
rt2800usb              13009  0 
rt2800lib              47242  1 rt2800usb
rt2x00usb              11075  1 rt2800usb
rt2x00lib              41057  3 rt2x00usb,rt2800lib,rt2800usb
mac80211              236178  3 rt2x00lib,rt2x00usb,rt2800lib
cfg80211              171957  2 mac80211,rt2x00lib
crc_ccitt               1465  1 rt2800lib
dmesg output shows nothing exceptional - no error messages.

I have a mulitthreaded python logging script. I can run it quite happily once. It is killed by removing a lockfile and the threads are then harvested. Each logging thread feeds a queue. A separate thread aggregates log events from the queue and updates COSM.
The USB reading code is the part below.

Code: Select all

'''
Created on Oct 23, 2012

@author: dmamartin
'''


from __future__ import with_statement
import datetime
from time import strftime
from time import time
import time
import sys
import array
import os
import fcntl, sys
import optparse
import subprocess
from threading import Thread 
from struct import unpack

try:
    import rrdtool
except:
    raise Exception("You need rrdtool to run this script!")
 
try:
    import usb.core
except:
    raise Exception("You need pyusb-1 (usb.core) to run this script!")

try:
    import usb.util
except:
    raise Exception("You need pyusb-1 (usb.util) to run this script!")


class TL500:
    '''
    Adapted from original as described below.
    # This Version of the TL500 logger only reads from the usb Device and writes a raw File raw.txt
    # This part of the programm has to be run with enhanced priviledges. That's why the 
    # functionality is reduced to a minimum.
    # The resulting raw data then can be processed with the conversion 
    # script tl500-read-raw2rrd-v0.3-2.py 
    #
    # Another benefit of splitting the capture and the conversion is to be able to 
    # adapt the conversion afterwards. This is usefull, since I'm sure that I still 
    # don't know everything of the bytes returned by the logger.
    #
    # To automatically start and restart the Logger you can add a line into your crontab
    # /etc/crontab:
    # */10 *  * * *   root /usr/local/bin/tl500-raw-logger_v0.3-3.py

    # Original Python Source from
    # http://www.j-raedler.de/2010/08/reading-arexx-tl-500-with-python-on-linux-part-ii/
    ''' 
    debug=False
    errors=0
    
    def __init__(self,  rawAction=None, debug=False):
        self.dev = None
        self.rawAction = rawAction
        self.debug=debug
        self._buffer = array.array('B', [0]*64)
        if self.debug:
            print "TL500 Init Done"
        self.errors = 0

    def connect(self):
        try:
            #usb.core.reset()
            self.dev = usb.core.find( idVendor=0x0451, idProduct=0x3211 )
        except Exception, e:
            print datetime.datetime.now(),"TL500: Error: " 
            print e
            self.errors += 1
            raise Exception("Cannot Connect to Arexx TL-500!")

        if not self.dev:
            raise Exception("Arexx TL-500 not found!")

        # to avoid a SIGSEG with newer Linux-Kernels
        try:
            self.dev.set_configuration() 
        except Exception, e:
            print datetime.datetime.now(),"TL500: Error: Cannot COnfigure USB Device"
            print e
            self.errors += 1

        if self.debug:
            print "TL500: Connect Done"
            #print str(self.dev.bConfigurationValue) + '\n'

    def loopCollect(self, interval=10):
        if not self.dev:
            try:
                self.connect()
            except Exception, e:
                print "TL500: Cannot connect to logger - exiting: %s"%e
                exit(1)
        # Switch device in the correct Mode
        if self.debug:
            print "TL500: pre first write "
        self._buffer[0] = 4
        try:
            resp=self.dev.write(0x1, self._buffer, 0, 1000)
        except Exception, e:
            print datetime.datetime.now(),"TL500: Error: Cannot set Mode for logging Device"
            print e
            self.errors += 1

        if self.debug:
            print "TL500: First write done: %s"%resp

        # Ask for Data and read it
        self._buffer[0] = 3
        self.running=1
        while self.running==1:
            try:
                resp=self.dev.write(0x1, self._buffer, 0, 1000)
                #if self.debug==True:
                #    print "wrote 0x1 got %s"%( resp,)
            except Exception, e:
                print datetime.datetime.now(),"TL500: Error: Cannot ask for new Data"
                print e
                self.errors += 1
                
            try:
                rawData = self.dev.read(0x81, 64, 0, 1000)
                if rawData[1] != 0:
                    if self.debug==True:
                        print "TL500: calling callback for tl500 data: %s"%self.rawAction 
                    if self.rawAction:
                        self.rawAction(rawData)
            except Exception, e:
                print datetime.datetime.now(),"TL500: Error: Cannot read new Data"
                print e
                self.errors += 1
            
            time.sleep(interval)
            try:
                resp=self.dev.write(0x1, None, 0,1000)
                rawData=self.dev.read(0x81,0,0,1000)
            except Exception, e:
                print datetime.datetime.now(),"TL500: Error: Null read/write"
                print e
            if self.debug: # Progress Dots
                sys.stdout.write(".")



class ThreadedLogger(Thread):

    def __init__(self, pidfile, cosmqueue=None, debug=False, textlog=None, interval=5):
        Thread.__init__(self)
        self.queue=cosmqueue
        self.pidfile=pidfile
        self.debug=debug
        self.rawFile=None
        self.interval=interval
        if textlog!=None:
            self.rawFile=open(textlog,"w")
        if self.debug==True:
            print "TL500 Logger thread"
        self.running=0
        
    def run(self):
        '''overide threading superclass'''
        fp = open(self.pidfile, 'w')
        self.errors=0
        if self.debug==True:
            print "Started TL500 Logging"
        try:
            fcntl.lockf(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except  IOError:
            if self.debug:
                print "TL500: another instance is running\n"
            sys.exit(0)
        self.errors=0
        fp.write( str(os.getpid())+"\n")
        fp.flush()
        self.tl500=TL500(rawAction=self.processRaw, debug=self.debug)
        self.running=1
        while self.running==1:
            try:
                self.tl500.loopCollect(self.interval)
            except Exception, e:
                etype,emsg,tb = sys.exc_info()
                print datetime.datetime.now(),"Error (",self.tl500.errors,")(Line ",tb.tb_lineno,") : " , e
                #self.tl500.errors += 1
                if self.debug:
                    print datetime.datetime.now() ,"TL500: Outer Loop",self.tl500.errors
        if self.debug==True:
            print "AREXX: thread stopped"
    def stop(self):
        if self.debug==True:
            print "AREXX: thread stopping"
        self.running=0
        self.tl500.running=0
        
    def logRaw(self, rawData):
        if self.rawFile==None:
            print "No log file open"
            return
        
        self.rawFile.write("%15f" % time.time())
        for d in rawData:
            self.rawFile.write(" %02X" % d)
        self.rawFile.write("\n")
        self.rawFile.flush()
   
    def processRaw(self,binData):
        p=1
        if self.rawFile !=None:
            self.logRaw(binData)
        rawData = unpack("64B",binData)
        if self.debug==True:
            print "TL500: processing raw data length %s"%len(rawData)
            
        #while p<len(rawData)-10:
        #    slice=rawData[p:p+10]
        #if self.debug==True:
        #        print "%s %s"%(len(slice), ",".join(["%02X"%d for d in slice]))
        try:
            parmlist=self.processDataSlice(rawData)
            if self.debug==True:
                print "TL500: Parsed raw data as %s"%parmlist
            if parmlist !=None: 
                for parms in parmlist:              
                    self.queue.put({"at": datetime.datetime.now(), "id":parms['SensorName'], "value":parms['value']})
                    if self.debug:
                        print "TL500: added datapoint to the COSM queue: %s"%{"at":datetime.datetime.now(), "id":parms['SensorName'], "value":parms['value']}
        except Exception, e:
            print "TL500: Error processing raw data: %s"%e
    #        p=p+10

    def time_as_str(self):
        return datetime.datetime.fromtimestamp(self.time).strftime("%Y-%m-%d %H:%M:%S")
   
        # start logging
    def processDataSlice(self,rawSlice):
        .... no system calls here - method redacted.

The second time I run it I get and object not found error as the file is not found by Python (cant initialise my tl500 class)

The third time it gets to give the output

Code: Select all

TL500 Init Done
TL500: Connect Done
TL500: pre first write 
TL500: First write done: 64
And the Pi hangs. I cannot access it with keyboard, shell or any other means.

Running strace on the script I have been using gives:

Code: Select all

open("raw.tl500.txt", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
write(1, "TL500 Logger thread\n", 20TL500 Logger thread
)   = 20
mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40e95000
mprotect(0x40e95000, 4096, PROT_NONE)   = 0
clone(child_stack=0x41693fa8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x416944d8, tls=0x41694930, child_tidptr=0x416944d8) = 2569
futex(0xc72220, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc0cf48, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc0cf48, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x41695000
mprotect(0x41695000, 4096, PROT_NONE)   = 0
clone(child_stack=0x41e93fa8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x41e944d8, tls=0x41e94930, child_tidptr=0x41e944d8) = 2570
futex(0xc0cf48, FUTEX_WAKE_PRIVATE, 1)  = 1
futex(0xc0cf48, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc0cf48, FUTEX_WAKE_PRIVATE, 1)  = 1
futex(0xc6fe00, FUTEX_WAKE_PRIVATE, 1)  = 1
futex(0xc0cf48, FUTEX_WAKE_PRIVATE, 1)  = 0
futex(0xc0cf48, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
mmap2(NULL, 8388608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x41f2b000
mprotect(0x41f2b000, 4096, PROT_NONE)   = 0
clone(child_stack=0x42729fa8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x4272a4d8, tls=0x4272a930, child_tidptr=0x4272a4d8) = 2571
futex(0xc73ee8, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc0cf48, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
stat64("cosm.pid", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
write(1, "Logging\n", 8Logging
)                = 8
write(1, "[]\n", 3[]
)                     = 3
select(0, NULL, NULL, NULL, {60, 0}Started TL500 Logging
TL500 Init Done
TL500: Connect Done
TL500: pre first write 
TL500: First write done: 64


running latrace gives:

Code: Select all

 2257       memset [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       realloc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       free [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memset [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       strlen [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       strncpy [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memcmp [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memcpy [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       strchr [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memset [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       realloc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       free [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memset [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       strlen [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       strncpy [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memcmp [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memcpy [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       strchr [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memset [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       realloc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       free [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memset [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       strlen [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       strncpy [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memcmp [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memcpy [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       strchr [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memset [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       realloc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       free [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memset [/usr/lib/arm-linux-gnueabihf/libcofi_rpi.so]  
 2257       strlen [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       strncpy [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       __ctype_b_loc [/lib/arm-linux-gnueabihf/libc.so.6]  
 2257       memcmp [/lib/arm-linux-gnueabihf/libc.so.6]
which goes on interminably. Should it be doing that or something more interesting?

Trying this without the wireless (on wired network only) gives exactly the same (except that some modules relating to wireless aren't loaded) so I'd rule out the wireless network. It looks like I can inadvertently hang the USB subsystem and hence the Pi at will.

Apologies for the rather long post, but this is starting to get rather wearing.

..d

Re: USB issues - capturing useful information

Posted: Wed Feb 20, 2013 7:34 am
by Laurentlap
Hello David,

Sometimes, specific USB config can help stabilize the pi : http://www.raspberrypi.org/phpBB3/viewtopic.php?t=16280

I understand that you are running your Arexx logger on the pi. We've tried to run the code provided by Arexx but unsuccessfully. how did you do it ?

Laurent

Re: USB issues - capturing useful information

Posted: Wed Jun 26, 2013 5:25 pm
by wiero
I'm also unsuccessful with arexx drivers. Please tell how you did to get it to work.

Re: USB issues - capturing useful information

Posted: Tue Aug 20, 2013 5:04 pm
by Vulcan_TBB
I'm also struggling with the arexx datalogger. Do you have updated Information?

Re: USB issues - capturing useful information

Posted: Fri Aug 23, 2013 3:40 pm
by davidmam
I have been using it successfully - I have some code at https://github.com/davidmam/pyCOSM in the examples directory.

I haven't looked at this for a while - I'll update the Pi firmware/distro etc and see how it runs over the weekend.