After tracking comes the difficult part. I have not yet found a good way to merge the individual tracking results into a sensible output. I am curious to see what progress you make here.
Just a note: you could continue training the YOLO model that herpaderpapotato provided, fine-tuning it on your own dataset to improve it further. (You should probably use the same labels he did. Setting a standard for the labels would certainly help, because if everyone does their own thing, we won't make any real progress.)