Check diff for PFE drops – Multiple Devices


We can often identify a path that looks problematic, but narrowing the problem down to a specific device and FPC takes a great deal of time. Sometimes, even when the device is known, the sheer number of FPCs makes it hard to pinpoint the one at fault. This Python program logs in to a list of devices and checks the difference in PFE statistics drop counters between two samples, on all devices at the same time thanks to multithreading.

The GitHub link below gives access to the program, followed by an explanation of its functions. Please leave comments and suggest any improvements.

github link for


Input: Device pointer (dev) and list of online FPCs (fpclist)
Output: One list per counter, with one entry per FPC - Fabric Drops, Bad Route Discard, Data Error, Timeout Discard, Truncated Key Discard, Bits-to-test Discard, Stack Underflow, Stack Overflow, Next-Hop Discard, Invalid IIF Error, Info Cell Discard, Input Checksum, Output MTU Errors
# Imports used by the functions in this post
import argparse
import datetime
import re
import threading
import time
from xml.dom.minidom import parseString

from lxml import etree
from jnpr.junos import Device


def fediscard(dev, fpclist):
    fpcs_fdiscard = []
    fpcs_brdiscard = []
    fpcs_derror = []
    fpcs_tdiscard = []
    fpcs_tkdiscard = []
    fpcs_bttdiscard = []
    fpcs_sudiscard = []
    fpcs_sodiscard = []
    fpcs_nhdiscard = []
    fpcs_iidiscard = []
    fpcs_icdiscard = []
    fpcs_ichecksum = []
    fpcs_omtu = []
    for num in fpclist:
        # Fetch the PFE statistics of this FPC and re-parse the reply with minidom
        fpc_lxml_elements = dev.rpc.get_pfe_statistics(fpc=num)
        string = etree.tostring(fpc_lxml_elements)
        dom = parseString(string)
        # Read the text of each discard counter and store it as an integer
        fpcs_fdiscard.append(int(dom.getElementsByTagName("fabric-discard")[0].firstChild.data))
        fpcs_brdiscard.append(int(dom.getElementsByTagName("bad-route-discard")[0].firstChild.data))
        fpcs_derror.append(int(dom.getElementsByTagName("data-error-discard")[0].firstChild.data))
        fpcs_tdiscard.append(int(dom.getElementsByTagName("timeout-discard")[0].firstChild.data))
        fpcs_tkdiscard.append(int(dom.getElementsByTagName("truncated-key-discard")[0].firstChild.data))
        fpcs_bttdiscard.append(int(dom.getElementsByTagName("bits-to-test-discard")[0].firstChild.data))
        fpcs_sudiscard.append(int(dom.getElementsByTagName("stack-underflow-discard")[0].firstChild.data))
        fpcs_sodiscard.append(int(dom.getElementsByTagName("stack-overflow-discard")[0].firstChild.data))
        fpcs_nhdiscard.append(int(dom.getElementsByTagName("nexthop-discard")[0].firstChild.data))
        fpcs_iidiscard.append(int(dom.getElementsByTagName("invalid-iif-discard")[0].firstChild.data))
        fpcs_icdiscard.append(int(dom.getElementsByTagName("info-cell-discard")[0].firstChild.data))
        fpcs_ichecksum.append(int(dom.getElementsByTagName("input-checksum")[0].firstChild.data))
        fpcs_omtu.append(int(dom.getElementsByTagName("output-mtu")[0].firstChild.data))
    return fpcs_fdiscard, fpcs_brdiscard, fpcs_derror, fpcs_tdiscard, fpcs_tkdiscard, fpcs_bttdiscard, fpcs_sudiscard, fpcs_sodiscard, fpcs_nhdiscard, fpcs_iidiscard, fpcs_icdiscard, fpcs_ichecksum, fpcs_omtu
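
The counter extraction described above can be tried without a router. The sketch below is only an illustration: SAMPLE_REPLY is a hypothetical, heavily trimmed reply that reuses the tag names from this post, and read_counter is a helper name invented here. It shows how minidom returns the text of a counter element, and one possible way to tolerate a counter that is missing from the reply.

```python
from xml.dom.minidom import parseString

# Hypothetical, trimmed reply; a real device returns many more elements.
SAMPLE_REPLY = """<pfe-statistics>
  <pfe-hardware-discard-statistics>
    <fabric-discard>120</fabric-discard>
    <bad-route-discard>3</bad-route-discard>
    <timeout-discard>0</timeout-discard>
  </pfe-hardware-discard-statistics>
</pfe-statistics>"""


def read_counter(dom, tag):
    """Return the integer text of the first <tag> element, or 0 if it is absent."""
    nodes = dom.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return 0
    return int(nodes[0].firstChild.data.strip())


dom = parseString(SAMPLE_REPLY)
tags = ("fabric-discard", "bad-route-discard", "timeout-discard", "nexthop-discard")
counters = dict((tag, read_counter(dom, tag)) for tag in tags)
for tag in tags:
    print("%s = %s" % (tag, counters[tag]))
```

Returning 0 for a missing element is a design choice made for this sketch; raising an error instead may be preferable when a missing counter should not go unnoticed.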


Input: Device pointer (dev)
Output: List of the slot numbers of all online FPCs
def onlinefpcs(dev):
    fpc_lxml_elements = dev.rpc.get_fpc_information()
    string_fpc = etree.tostring(fpc_lxml_elements)
    dom_fpc = parseString(string_fpc)
    cms = dom_fpc.getElementsByTagName("fpc")
    print("Number of FPCs Sloat %s" % len(cms))
    fpcsloat = []
    for cm in cms:
        # Text of the <state> and <slot> children of this <fpc> element
        state = str(cm.getElementsByTagName("state")[0].firstChild.data)
        fpcsn = str(cm.getElementsByTagName("slot")[0].firstChild.data)
        fpc = re.match(r"Online", state)
        if fpc:
            # Keep only the slots whose state starts with "Online"
            fpcsloat.append(fpcsn)
    return fpcsloat
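
As an offline illustration of the filtering step, the sketch below applies the same idea to a hypothetical, trimmed reply. SAMPLE_FPC_REPLY and online_slots are names made up for this example, and only the fpc, slot and state tags used in this post are assumed.

```python
from xml.dom.minidom import parseString

# Hypothetical, trimmed reply; only the tags used in this post are assumed.
SAMPLE_FPC_REPLY = """<fpc-information>
  <fpc><slot>0</slot><state>Online</state></fpc>
  <fpc><slot>1</slot><state>Empty</state></fpc>
  <fpc><slot>2</slot><state>Online</state></fpc>
</fpc-information>"""


def online_slots(xml_text):
    """Return the slot numbers (as strings) of every FPC whose state is Online."""
    dom = parseString(xml_text)
    slots = []
    for fpc in dom.getElementsByTagName("fpc"):
        state = fpc.getElementsByTagName("state")[0].firstChild.data.strip()
        slot = fpc.getElementsByTagName("slot")[0].firstChild.data.strip()
        if state == "Online":
            slots.append(slot)
    return slots


print(online_slots(SAMPLE_FPC_REPLY))
```

This sketch compares the state exactly, whereas re.match(r"Online", state) accepts any state that merely starts with "Online". Which of the two is appropriate depends on the state strings the devices actually report.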


Input: Device name or address (dev), number of iterations (iteration) and wait time in seconds (wait)
Output: Prints the FE discard diff for each FPC of the device
def checkfedrops_devices(dev, iteration, wait):
    print("Function been called %sdev" % (dev))
    try:
        dev = Device(host=dev.strip(), user="labroot", passwd="lab123")
        dev.open()
    except Exception as e:
        print("Unable to connect to host: %s" % e)
        return
    ts = time.time()
    st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
    print(st)
    fpclist = onlinefpcs(dev)
    for count in range(iteration):
        print("Collecting first set of data %s" % (dev))
        fpcs_fdiscard, fpcs_brdiscard, fpcs_derror, fpcs_tdiscard, fpcs_tkdiscard, fpcs_bttdiscard, fpcs_sudiscard, fpcs_sodiscard, fpcs_nhdiscard, fpcs_iidiscard, fpcs_icdiscard, fpcs_ichecksum, fpcs_omtu = fediscard(dev, fpclist)
        print("Sleeping for %s second on device %s" % (wait, dev))
        time.sleep(wait)
        print("Collecting second set of data %s" % (dev))
        fpcs_fdiscard2, fpcs_brdiscard2, fpcs_derror2, fpcs_tdiscard2, fpcs_tkdiscard2, fpcs_bttdiscard2, fpcs_sudiscard2, fpcs_sodiscard2, fpcs_nhdiscard2, fpcs_iidiscard2, fpcs_icdiscard2, fpcs_ichecksum2, fpcs_omtu2 = fediscard(dev, fpclist)
        # i indexes the per-FPC lists, fpc is the slot number used in the messages
        for i, fpc in enumerate(fpclist):
            diff_fdiscard = fpcs_fdiscard2[i] - fpcs_fdiscard[i]
            diff_brdiscard = fpcs_brdiscard2[i] - fpcs_brdiscard[i]
            diff_derror = fpcs_derror2[i] - fpcs_derror[i]
            diff_tdiscard = fpcs_tdiscard2[i] - fpcs_tdiscard[i]
            diff_tkdiscard = fpcs_tkdiscard2[i] - fpcs_tkdiscard[i]
            diff_bttdiscard = fpcs_bttdiscard2[i] - fpcs_bttdiscard[i]
            diff_sudiscard = fpcs_sudiscard2[i] - fpcs_sudiscard[i]
            diff_sodiscard = fpcs_sodiscard2[i] - fpcs_sodiscard[i]
            diff_nhdiscard = fpcs_nhdiscard2[i] - fpcs_nhdiscard[i]
            diff_iidiscard = fpcs_iidiscard2[i] - fpcs_iidiscard[i]
            diff_icdiscard = fpcs_icdiscard2[i] - fpcs_icdiscard[i]
            diff_ichecksum = fpcs_ichecksum2[i] - fpcs_ichecksum[i]
            diff_omtu = fpcs_omtu2[i] - fpcs_omtu[i]
            # Report only the counters that increased between the two samples
            if diff_fdiscard > 0:
                print("dev %s For FPC %s, Fabric Drop count increases %s" % (dev, fpc, diff_fdiscard))
            if diff_brdiscard > 0:
                print("dev %s For FPC %s, Bad Route Discard count increases %s" % (dev, fpc, diff_brdiscard))
            if diff_derror > 0:
                print("dev %s For FPC %s, Data Error count increases %s" % (dev, fpc, diff_derror))
            if diff_tdiscard > 0:
                print("dev %s For FPC %s, Timeout Discard increases %s" % (dev, fpc, diff_tdiscard))
            if diff_tkdiscard > 0:
                print("dev %s For FPC %s, Truncated Key Discard increases %s" % (dev, fpc, diff_tkdiscard))
            if diff_bttdiscard > 0:
                print("dev %s For FPC %s, Bit to test Discard increases %s" % (dev, fpc, diff_bttdiscard))
            if diff_sudiscard > 0:
                print("dev %s For FPC %s, Stack Underflow Discard increases %s" % (dev, fpc, diff_sudiscard))
            if diff_sodiscard > 0:
                print("dev %s For FPC %s, Stack Overflow Discard increases %s" % (dev, fpc, diff_sodiscard))
            if diff_nhdiscard > 0:
                print("dev %s For FPC %s, Nexthop Discard increases %s" % (dev, fpc, diff_nhdiscard))
            if diff_iidiscard > 0:
                print("dev %s For FPC %s, Invalid iif Discard increases %s" % (dev, fpc, diff_iidiscard))
            if diff_icdiscard > 0:
                print("dev %s For FPC %s, Info Cell Discard increases %s" % (dev, fpc, diff_icdiscard))
            if diff_ichecksum > 0:
                print("dev %s For FPC %s, Input Checksum Drop increases %s" % (dev, fpc, diff_ichecksum))
            if diff_omtu > 0:
                print("dev %s For FPC %s, Output MTU Drop increases %s" % (dev, fpc, diff_omtu))
        ts = time.time()
        st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
        print(st)
    dev.close()
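
The comparison of two samples can also be expressed more compactly. The following is a sketch of an alternative layout, not the structure used by the script: it assumes each sample is stored as a dictionary keyed by FPC slot, and counter_increases plus the sample values are made up for illustration.

```python
def counter_increases(before, after):
    """Return {(fpc, counter): delta} for every counter that grew between samples."""
    increases = {}
    for fpc, counters in after.items():
        for name, value in counters.items():
            # A counter missing from the first sample is treated as 0
            delta = value - before.get(fpc, {}).get(name, 0)
            if delta > 0:
                increases[(fpc, name)] = delta
    return increases


# Made-up sample values for two FPCs
first = {
    "0": {"fabric-discard": 100, "bad-route-discard": 5},
    "8": {"fabric-discard": 0, "bad-route-discard": 7},
}
second = {
    "0": {"fabric-discard": 160, "bad-route-discard": 5},
    "8": {"fabric-discard": 0, "bad-route-discard": 9},
}

for (fpc, name), delta in sorted(counter_increases(first, second).items()):
    print("For FPC %s, %s increases %s" % (fpc, name, delta))
```

With this layout, adding a counter only requires one more key in the sample dictionaries instead of a new list, diff variable and if block.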

Class to create New thread:

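Input: Called from the main thread, which creates one instance per device.
Output: Calls checkfedrops_devices for the device assigned to the thread.
class newthread(threading.Thread):
    def __init__(self, threadID, dev, nus, wait):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.dev = dev
        self.nus = nus    # number of iterations
        self.wait = wait  # seconds to wait between the two samples

    def run(self):
        # Executed in the new thread once start() is called
        checkfedrops_devices(self.dev, self.nus, self.wait)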
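
The thread-per-device pattern can be exercised without any router. In the sketch below, fake_check is a hypothetical stand-in for the per-device check and DeviceThread is a name chosen for this example. It only demonstrates that each thread runs its own call and that join() waits for all of them.

```python
import threading

results = []
results_lock = threading.Lock()


def fake_check(dev, iterations, wait):
    """Hypothetical stand-in for the per-device check; it only records the call."""
    with results_lock:
        results.append((dev, iterations, wait))


class DeviceThread(threading.Thread):
    def __init__(self, thread_id, dev, iterations, wait):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.dev = dev
        self.iterations = iterations
        self.wait = wait

    def run(self):
        # Runs in the new thread after start() is called
        fake_check(self.dev, self.iterations, self.wait)


hosts = ["a.b.c.d", "p.q.r.s"]
threads = [DeviceThread(n, host, 1, 0) for n, host in enumerate(hosts, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every device has been processed
print(sorted(results))
```

The lock around the shared list is not strictly required for a single append in CPython, but it keeps the sketch safe if the recorded data becomes more complex.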

Main Thread:

Input: Name of a file listing the devices, one per line. The wait time and the number of iterations are hard-coded and can be changed as needed.
Output: Creates one thread per listed device to check for PFE drops.
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Enter File Name for devices")
    parser.add_argument('-f', action='store', dest='devfile', help='Enter File Name for Device')
    result = parser.parse_args()
    ts = time.time()
    st = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S')
    print(st)
    nthread = 1
    nus = 1    # number of iterations per device
    wait = 4   # seconds between the two samples
    threadfun = []
    nes = []
    # Read the device list, skipping empty lines
    with open(result.devfile, 'r') as dfile:
        for devices in dfile:
            dev = devices.strip()
            if dev:
                nes.append(dev)
    for ne in nes:
        thread = ne + str(nthread)
        print(thread)
        try:
            thread = newthread(nthread, ne, nus, wait)
            thread.start()
            threadfun.append(thread)
            nthread += 1
        except Exception:
            print("Error: unable to start thread")
    # Wait for all device threads to finish
    for thread in threadfun:
        thread.join()
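
Since the wait time and the number of iterations may need to change from run to run, one option is to expose them on the command line. This is only a sketch: the -n and -w options and the build_parser and read_devices helpers are hypothetical additions that are not part of the script, and the defaults simply mirror the hard-coded values.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Check PFE drop counters on a list of devices")
    parser.add_argument("-f", dest="devfile", required=True,
                        help="file with one device per line")
    # Hypothetical options; defaults mirror the hard-coded values
    parser.add_argument("-n", dest="iterations", type=int, default=1,
                        help="number of iterations per device")
    parser.add_argument("-w", dest="wait", type=int, default=4,
                        help="seconds to wait between the two samples")
    return parser


def read_devices(lines):
    """Return the device names from an iterable of lines, skipping blank ones."""
    return [line.strip() for line in lines if line.strip()]


# Parse an explicit argument list so the sketch runs without a real command line
args = build_parser().parse_args(["-f", "devices", "-n", "3"])
print("%s %s %s" % (args.devfile, args.iterations, args.wait))
print(read_devices(["a.b.c.d\n", "\n", " p.q.r.s \n"]))
```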

Running Script:

% python -f devices
2016-09-12 13:12:28
Function been called a.b.c.ddev
Function been called p.q.r.sdev

2016-09-12 13:12:30
2016-09-12 13:12:30
Number of FPCs Sloat 10
Collecting first set of data Device(p.q.r.s)
Number of FPCs Sloat 20
Collecting first set of data Device(a.b.c.d)
Sleeping for 4 second on device Device(p.q.r.s)
Sleeping for 4 second on device Device(a.b.c.d)
Collecting second set of data Device(p.q.r.s)
dev Device(p.q.r.s) For FPC 0, Fabric Drop count increases 95230285
dev Device(p.q.r.s) For FPC 8, Bad Route Discard count increases 2
dev Device(p.q.r.s) For FPC 9, Fabric Drop count increases 101731073
2016-09-12 13:12:37
Collecting second set of data Device(a.b.c.d)
2016-09-12 13:12:39

