I needed to generate random verbiage that would still produce somewhat realistic indexing performance for a JMeter test plan I built for a project with Jazkarta. Random strings wouldn't do: I wanted real words, just selected at random. I wrote z3c.gibberish to do this.
There are a couple of improvements I'd like to make to z3c.gibberish, given interest and time:
Weighted Word Frequency
I'd like z3c.gibberish to use weighted random selection so that more common words are chosen more often.
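Python's standard library already supports this kind of selection, since random.choices accepts a weights argument. Here is a minimal sketch, assuming a hypothetical in-memory mapping of words to frequency counts. WORD_WEIGHTS and weighted_words are illustrative names, not part of z3c.gibberish.

```python
import random

# Hypothetical word frequencies for illustration; z3c.gibberish does not ship these.
WORD_WEIGHTS = {
    "the": 50,
    "index": 5,
    "catalog": 3,
    "zope": 1,
}


def weighted_words(weights, count, rng=random):
    """Pick `count` words at random, favouring words with higher weights.

    `weighted_words` is a hypothetical helper, not a z3c.gibberish API.
    """
    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=count)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print(" ".join(weighted_words(WORD_WEIGHTS, 10, rng)))
```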
On further thought, it would probably be best to support this with a dictionary that repeats each word in proportion to its frequency. z3c.gibberish could be extended to take a normal dictionary plus a file of word weights and output a weighted dictionary for use in subsequent runs.
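A rough sketch of that conversion step, assuming a hypothetical weights file with one "word count" pair per line and a plain dictionary file with whitespace-separated words. parse_weights, expand_dictionary and write_weighted_dictionary are illustrative names, not existing z3c.gibberish functions, and words missing from the weights file get a default weight of 1.

```python
import random
from pathlib import Path


def parse_weights(lines):
    """Parse hypothetical 'word count' lines into a {word: count} dict."""
    weights = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word, count = line.split()
        weights[word] = int(count)
    return weights


def expand_dictionary(words, weights, default=1):
    """Repeat each word as many times as its weight; unlisted words get `default`."""
    expanded = []
    for word in words:
        expanded.extend([word] * weights.get(word, default))
    return expanded


def write_weighted_dictionary(dict_path, weights_path, out_path):
    """Combine a plain dictionary file and a weights file into a weighted dictionary file."""
    words = Path(dict_path).read_text(encoding="utf-8").split()
    with open(weights_path, encoding="utf-8") as f:
        weights = parse_weights(f)
    expanded = expand_dictionary(words, weights)
    Path(out_path).write_text("\n".join(expanded) + "\n", encoding="utf-8")


if __name__ == "__main__":
    words = ["the", "index", "zope"]
    weights = parse_weights(["the 3", "index 2"])
    expanded = expand_dictionary(words, weights)
    print(expanded)  # ['the', 'the', 'the', 'index', 'index', 'zope']
    # A plain uniform pick from the expanded list is now frequency-weighted.
    print(random.choice(expanded))
```

Because each word appears in the output in proportion to its weight, the existing uniform random selection would produce weighted results without any other changes.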
Optimized Row Handling
I could also make it more efficient by pre-assembling a list of dictionary indexes for each row and then building the row by looking up the word at each index. This would avoid iterating through the dictionary again for every row and cell.
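Here is a sketch of that approach, assuming the dictionary has already been read once into an in-memory list so each index lookup is constant time. make_rows and its parameters are hypothetical and don't reflect z3c.gibberish's actual code; the CSV output in the usage block is just one possible destination.

```python
import csv
import random
import sys


def make_rows(words, num_rows, cells_per_row, words_per_cell, rng=random):
    """Build rows of gibberish from pre-assembled index lists (hypothetical helper).

    For each row, all the random dictionary indexes it needs are picked up
    front, then turned into words with direct list lookups, so the
    dictionary is never scanned per row or per cell.
    """
    size = len(words)
    per_row = cells_per_row * words_per_cell
    rows = []
    for _ in range(num_rows):
        # One batch of random indexes covering every word in this row.
        indexes = [rng.randrange(size) for _ in range(per_row)]
        row_words = [words[i] for i in indexes]  # O(1) lookup per word
        # Slice the flat word list back into cells.
        row = [
            " ".join(row_words[c * words_per_cell:(c + 1) * words_per_cell])
            for c in range(cells_per_row)
        ]
        rows.append(row)
    return rows


if __name__ == "__main__":
    dictionary = ["alpha", "beta", "gamma", "delta"]
    writer = csv.writer(sys.stdout)
    writer.writerows(
        make_rows(dictionary, num_rows=3, cells_per_row=2,
                  words_per_cell=4, rng=random.Random(0))
    )
```

Picking all of a row's indexes in one batch keeps the per-word cost to a single list lookup, regardless of how large the dictionary is.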