Small Quadrata puzzles - in the order of 3x3 and 4x4 - can be created by hand but as the puzzles get
bigger, it starts to take a disproportionate amount of time to make them. In order to make them there are a number of
things that have to happen and you can tighten these things up should they start to become an issue.
First of all, you need to decide what you want to do. I wanted to make puzzles for the website and
for books so I needed a way of solving them and creating them. Instead of building an enormous program that does
everything, I decided to build a set of programs where each does a job and can be optimised without interfering with
the functioning of everything else. The problem with a large program that does everything is that you might modify part
of it and break the functionality of another part. Another is that if you want to adapt the purpose for some reason,
you might break the original function. By dividing up the jobs between specialised programs, each can be tackled in a
sensible way and as long as you don't break your rules for communication, they will continue to communicate with each
other.
So, I needed to decide how the programs would talk to each other and talk back. There are a number of
ways, including but not limited to: Pipes; Command Line; Environment Variables; and, Files.
- Pipes are handy in that any program can write to them and any program can read from them - they are,
for the uninitiated, a FIFO stack - and they are fast because they exist only in memory - they appear to be part of
the file system because they are virtual. The problem with them is that you can only read any particular piece of data
once, unless it is written more than once.
- Command Line variables are quick and easy to read but it can be limited in size.
- Environment Variables can pass that information quite well but again, can limit size.
- Files can store anything can can be as big as you like. They are stored in memory and at some stage copied to
disc but if you are writing a file and them reading from it a matter of milliseconds later, the data that is being
'read' is coming straight from the cache in memory so in effect, as far as the functionality required here, as quick
as the others.
So, for the range of sizes of the data and the quantity of data, the best way to talk program-to-program
is small files so next we need to decide what data needs to be sent and how flexible that needs to be.
Looking at the puzzles, these things need to be defined: The size of the puzzle; the characters that go
in it and their order; which if any characters are given as clues; which inequalities are given as clues.
Note that the inequalities are given as horizontal and vertical inequalities and they only appear between
cells. Also, the characters in the the puzzles in the newspapers are numbers only so the order is already defined - but
why limit it to numbers? So...
Step 1. - Define a file standard that has cell contents/placeholders; the horizontal inequalities;
the vertical inequalities and; with flexibility in mind; the placeholder defined and the set of characters in order. It
doesn't really matter what form this takes - a simple flat text file is good enough and serves the purpose well. The only
provision is that it is standard within the bounds of the set of programs that use it.
Step 2. - write the solver - no progress is going to be made with anything else without one.
Step 3. - Write the program that generates the puzzles and saves them in a standard format.
Step 4. - Write the web content generator program.
Step 1 is a no-brainer so we'll just assume it is effectively done. Note that in order to test the
program that you are going to write next, the solver, you need to create on of these by hand so a flat text file is a very
good place to start.
Step 2 basically takes the content of the data file and solves the puzzle that it represents. So, the steps within
that task can be broken down into:
- Create an array of cell contents as currently known, an array each for the horizontal and vertical inequalities;
- loop around doing what you do when you do the puzzle yourself but translated into computer speak:
- clearing each row and column of any value that has been identified as a 'sole occupant';
- reducing available values in a cell affected by an inequality;
- clearing out pairs or triplets;
- evaluating the completeness of the puzzle and go back to 1 until no further progress can be made.
and at the end of that, we should know if the puzzle as supplied in the file can be solved and if it can, what the solution
looks like.
- print out the answer in a way that can be useful to a human and to a program that is interested in whether or not it can
be solved
Typical program output looks like this;
> time ./quadsolve
1
Quadrata Solver (c)2020 Paul Grosse
Puzzle size = 3 x 3
Sequence: 1 2 3
[3] [2] [1]
[1] [3] [2]
[2] [1] [3]
which took just 3ms to solve the puzzle. Note that the program outputs a single digit on the first line of output. This is set
to a 1 for success and a 0 for failure. You can use an exit code to signal this as well but here, this is the visual output.
Of course, the amount of time that a puzzle takes to solve will increase with the size. The graph on the right
shows the times it takes for various puzzle sizes to be solved by the solver, using the 'time' program that is a standard part
of Linux/BSD distributions. Another thing to note is that this is just a number of puzzles of each size to show how it increases
with size and for a given size, the results can vary quite a bit.
Step 3 is writing a program that generates the puzzles.
- Define the character set, the size of the puzzle and set out the board;
- Fill the cells on the board with all of the characters;
- Reduce the cells down to one character per cell such that no row or column has more than one of a particular character in it;
We now have a Latin square. If the character set is words, there will be vowels there and a possibility that offensive words could
be formed so we need to scan through the rows and columns and look for them.
- Scan for profanity and if any is found, reject that square and start again;
- Insert all of the inequalities;
- Take out Randomly located inequalities and characters;
- Test the puzzle with the solver
- If it passes, go back to step 6;
- If it fails, increase the failure count;
- If the failure count is less than some arbitrary value, restore the puzzle's previous state and go back to step 6;
- Restore the puzzle's previous state and leave the loop;
- Scan all cells, one at a time and take out inequalities and characters;
- Test the puzzle with the solver
- If it passes, go back to step 6;
- If it fails, increase the failure count;
- If the failure count is less than some arbitrary value, restore the puzzle's previous state and go back to step 9;
- Restore the puzzle's previous state and leave the loop;
We now have a puzzle that works and is free of superfluous clues.
- Save the puzzle's data in a log and finish.
The idea of using a log file is that you can use that data to produce a number of different types of output whether that is for
PNG image files or PostScript files and therefore PDF files.
Note that the generator program can also be written so that arguments can be passed to it either from the command
line of the terminal it is running on or from another program.
This is the output when it created the above problem...
Quadrata Generator (c)2020 Paul Grosse
Puzzle size = 3 x 3
[ ] [2] [ ]
[ ] [ ] [ ]
[ ] [ ] [3]
Sequence...
1 < 2 < 3
and the time program revealed that it took only 64ms to produce it.
A program with flexibility designed into it both in terms of output puzzle size and what characters can be
used will take different times to produce the same sized puzzle because different checks are being made - the additional check
for profanity in the case of words-based puzzles - the letters and numbers puzzles take the same times to produce as there is no
difference in how they are made.
Again, the time taken to produce files of a given size increases with size but becomes increasingly variable as
well as disproportionately long as you can see from the graph on the right which shows a scatter of five runs for each size of
puzzle from 5x5 to 19x19 using words (so the profanity checks are in there).
Although there is a rough curve that you can use to guess the time take to make a puzzle of a given size, the
problem is np complete so there is no way of calculating it for certain. However, for the sake of an intellectual exercise, you
can do a rough fit of the output of an equation to the curve of average times as in the graph on the right. Interestingly it
seems to follow a power of 7 and as it is so random (average out more than five points and eliminate rogue values and you might
get a better fit) I have only used odd powers and a constant to get a fit. So, the equation for the amount of time it takes to
produce a given sized puzzle, and taking into account that the real time can vary wildly from this, is as follows:
T ≈ 0.00052 x n7 + 0.01 x n5 + n3 + 60 x n - 250
where T is the time in ms and n is the puzzle side length.
There are many ways in which a given run can vary such
as:
- whether it has to reject a layout because some cells end up with no character in it;
- there happens to be a working Latin square that has profanity in it, forcing another cycle in the generating loop;
- the random elimination part of the program happens to be more or less efficient in eliminating cell contents that can be
eliminated; and so on.
There are ways that you can make the program more efficient by tweaking the program but you might find that some things you do
will make it more intensive where it doesn't have to be - increasing the run time for smaller puzzles but decreasing the run
time for larger ones.
With 1, you might try to eliminate some instances of this by detecting the problem area (two cells with only
two values having one of those values used by a third cell in the elimination process - make a routine that focusses on those
cells and eliminates those values from other cells on that row/column (depending on the orientation of pair).
Further on 1, the part of the program that looks for a cell that makes the whole square a failure doesn't have
to scan the whole square - once it detects one, it can exit the loop and a new one searchd for;
With 2, A similar thing can be done with searching for profanity - building up rows and columns then scanning
them with hundreds of search strings is achieved quickly in a string processing optimised language such as Perl but it still
takes time and if you have to do it a lot, it will all add up so as soon as profanity is detected, it can drop out of that loop.
Further on 2, instead of making the program throw that Latin square away - an option that isn't a problem with
small squares - realise that the Latin square is the solution to the problem that the first stage of the program is looking for
and the actual characters in there are the ones that are causing the problem. The solution to this is simply do a Caesar cipher
substitution and then re-evaluate it for profanity. With this, it is a good idea to have a counter set up for this because you
might find that that Latin square will have a tendency to fail on this so it might be worth putting a halt to the routine if it
fails too many times for a given square.
With 3, this is largely down to chance but you can set a value based upon the acceptable level of elimination -
set it too high and it will waste time trying out things that don't work, set it too low and the next step will mop that up.
At each stage, you can get an idea of what is going on with the program by getting it to print out bits of
information so you can adjust the program parameters making it more efficient for whatever sized job you want to concentrate on.
Making a program that only works on numbers will end up faster but there is no flexibility in that. So, what can be achieved?
The following table is for word-based Quadrata problems...
Comparison of Quadrata Generator Runs Before and After Optimisation. |
Size | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
Pre-opti-misation |
Run Time / s |
1.38 | 1.94 | 3.08 | 6.00 | 12.4 | 21.8 | 42.0 | 86.3 | 119 | 192 | 332 | 522 | |
Optimised |
Run Time / s | 0.268 | 0.557 | 1.18 | 2.11 | 4.11 | 8.15 | 14.3 | 23.2 | 39.5 | 60.6 | 89.0 | 167 | 229 | 326 | 556 |
Grid-fill loops | 0 | 1 | 1 | 1 | 1 | 4 | 8 | 14 | 11 | 11 | 8 | 34 | 124 | 170 | 245 |
Instances of profanity* | 0 | 0 | 1 | 0 | 1 | 5 | 2 | 1 | 7 | 8 | 5 | 18 | 19 | 44 | 44 |
Stage One Removals | 88 | 132 | 178 | 233 | 288 | 357 | 424 | 511 | 584 | 675 | 762 | 873 | 975 | 1077 | 1198 |
Stage Two Removals | 4 | 5 | 7 | 11 | 19 | 24 | 34 | 37 | 53 | 66 | 84 | 93 | 114 | 140 | 157 |
Run-Time Factor of Pre- and Post-Optimised Runs |
Factor x:1 | 5.1 | 3.5 | 2.6 | 2.8 | 3.0 | 2.7 | 2.9 | 3.7 | 3.0 | 3.2 | 3.7 | 3.1 | |
Run-Time of Solver |
Run Time / ms | 8 | 11 | 21 | 33 | 46 | 62 | 84 | 152 | 243 | 310 | 380 | 498 | 650 | 841 | 928
|
|
* This can vary widely according to the proportion of vowels - some runs generating over 50 in one cycle.
|
So, you can see that by: taking out a lot of unnecessary loops by exiting early when there is a fail state; and, by re-lettering instances of profanity thus largely eliminating the need to reseed the grid, a lot of time has been saved. Overall, the speed-up has saved roughly two-thirds of the run time.
Step 4 - writing programs to take log content and turn it into various things - is fairly straight forward.
In order to run this program productively, another program is written that runs the generator program, passing to it the information that is needed in order to get it to produce what is needed.
So, for web content, puzzle type and size is passed to it according to which day of the week it is and the results turned into PNG files and PS files which are then converted to PDF files using PS2PDF.
In the case of the books, one program sends it which types and sizes are needed so that it fills up a special log file. This can be left running on its own without getting in the way of other programs on the system by giving it a low priority. Once that has finished, another program reads that log file in and turns it into the PS for a book.
In order to do this, all I can say is that there is little that is as satisfying than writing a computer program that writes other computer programs.