The Dune RADIUS server V0.03 Design.

Scope of the system.
Configuration of server.
Technology Used in Implementation
Code Design
Interaction Between the Server and the Environment

Scope of the system.
The first part of the system is a DLL that provides RADIUS protocol functionality. This DLL will be used by the programs in the system and will be licensed under the LGPL so that it can be used in other programs. The main part of the system will be a RADIUS server, described later in this document. The package will also include RADIUS test programs. I currently plan to write two. The first will be a simple program that sends a single RADIUS request based on command-line parameters, as the "radtest" programs that come with several pieces of RADIUS software do. It will serve as an example of how to use the RADIUS DLL and will also be useful for diagnosing why a RADIUS server doesn't work. The second test program, to be called Sandstorm, will be designed to test the performance of RADIUS servers.
Currently I have no definite plans for other components of the system. I have thought about monitoring tools etc, but I think that I have enough work for the moment. If you want to write such things then let me know and I'll give you a list of ideas.

Configuration of server.
The configuration of the server will be based around a hierarchical tree of files. Each directory represents a separate part of the configuration. The code for parsing the files will be based around pure virtual classes so that in future it will be easy to write some new classes to take configuration data from an LDAP directory or any other source (it makes sense to have the RADIUS server configuration in the same place as the user data). NB All files will work with either UNIX or DOS format (^M characters will be silently ignored).
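A minimal sketch of how the pure virtual classes mentioned above might look. The names ConfigSource, MemoryConfigSource, lookup, and set are hypothetical and are not taken from the design; the in-memory class only stands in for a real file-tree or LDAP backend.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical interface: anything that can supply "name=value" settings
// for one part (section) of the configuration tree.
class ConfigSource {
public:
    virtual ~ConfigSource() {}
    // Returns true and fills `value` if `name` is set in `section`.
    virtual bool lookup(const std::string& section,
                        const std::string& name,
                        std::string& value) const = 0;
};

// Stand-in backend that keeps settings in memory. A file-tree or LDAP
// backend would derive from ConfigSource in the same way.
class MemoryConfigSource : public ConfigSource {
public:
    void set(const std::string& section, const std::string& name,
             const std::string& value) {
        assert(!name.empty());  // a setting must have a name
        data_[section][name] = value;
    }
    bool lookup(const std::string& section, const std::string& name,
                std::string& value) const override {
        auto s = data_.find(section);
        if (s == data_.end()) return false;
        auto n = s->second.find(name);
        if (n == s->second.end()) return false;
        value = n->second;
        return true;
    }
private:
    std::map<std::string, std::map<std::string, std::string>> data_;
};
```

Code that reads the configuration would hold only a ConfigSource reference, so swapping the file parser for an LDAP reader would not touch the callers.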
The various config files will be in a "name=value" format that will be familiar to anyone who's ever configured Samba. I might even put support for configuration via LDAP in the first release as it maps so well to this type of configuration.
The server will reload its configuration if it notices the presence of a flag file (to allow configuration from NFS or SMB clients) or if it receives a signal. There will be an option to have the server automatically reload its configuration periodically, to make it easier to have the config files generated and re-generated automatically.
The aim of the server is that it should be easy to set up and document. To that end any file in the configuration tree ending in .html, .txt, or .doc will be ignored; such files will be considered to be documentation for the configuration. For every directory in which there are a number of files which are likely to have the same or similar contents it will be possible to have a file named "default" to specify default values. This should allow rapid installation, easy and reliable reconfiguration (you won't forget to change something), and should result in a system that is easy to understand.
When the server looks up a file or directory to load its configuration it will first look for the exact case specified. If that doesn't match then it will do a case-insensitive lookup. This is designed to work well with DOS, OS/2, and Windows systems which often mess up case. Also, wherever possible the data inside config files will be case insensitive, but the server will operate in a case-preserving manner (so even if the server believes that case doesn't matter it will use the case you specify).
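The two-step lookup can be sketched as follows. This is only an illustration: resolve_name is a hypothetical helper, and the list of names is assumed to have been read from the directory already (for example with readdir()). What happens when several names differ only in case is not specified above; the sketch simply takes the first one.

```cpp
#include <cassert>
#include <string>
#include <strings.h>   // strcasecmp (POSIX)
#include <vector>

// Returns the entry from `names` that matches `wanted`. An exact-case
// match wins; otherwise the first case-insensitive match is used.
// The returned string keeps the case found in `names` (case-preserving).
// Returns an empty string if nothing matches.
std::string resolve_name(const std::vector<std::string>& names,
                         const std::string& wanted) {
    assert(!wanted.empty());
    for (const std::string& n : names)
        if (n == wanted) return n;
    for (const std::string& n : names)
        if (strcasecmp(n.c_str(), wanted.c_str()) == 0) return n;
    return std::string();
}
```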
Any line that starts with a "#" character will be considered a comment. A line-feed character will be taken as an end of line with no exceptions. A carriage-return character (^M) will be ignored at the end of a line (to support DOS format text files). If such a character occurs anywhere else then it will have no special treatment. A blank line will always be ignored. Most config files will consist of lines of the format "name=value". Everything after the first "=" will be considered part of the value, which may contain another equals sign or any character other than a new line character.
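A minimal sketch of these line rules, assuming the caller has already split the file on line-feed characters. The function name parse_config_line is hypothetical, and two choices here are assumptions rather than part of the design: a line without "=" is reported as "no setting", and whitespace around the name and value is left untouched.

```cpp
#include <cassert>
#include <string>

// Parses one line using the rules described above.
// Returns true and fills name/value for a "name=value" line.
// Returns false for blank lines, comments, and lines without '='.
bool parse_config_line(std::string line, std::string& name, std::string& value) {
    assert(line.find('\n') == std::string::npos);  // caller splits on line feeds
    if (!line.empty() && line.back() == '\r')      // DOS line ending
        line.pop_back();
    if (line.empty() || line[0] == '#')            // blank line or comment
        return false;
    std::string::size_type eq = line.find('=');    // first '=' ends the name
    if (eq == std::string::npos)
        return false;
    name = line.substr(0, eq);
    value = line.substr(eq + 1);                   // may contain further '='
    return true;
}
```

For example, the line "secret=a=b" followed by a carriage return yields the name "secret" and the value "a=b".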
I have not yet decided exactly on how "strings" should be represented. Initially I will hard code the logging to match the output of some common RADIUS server. I plan to enable the use of config files to specify the log file format. I would like some suggestions on this. Should I use something similar to a printf() format string? NB If I do that then I'll write my own parser for the strings as I will not pass data from config files straight to printf() for obvious reasons. One problem with printf() is that we can't specify the order of parameters. If we don't use printf() formatting then do we use the C style methods for specifying non-ASCII characters?
The top level directories will be:

Clients	Config for RADIUS clients
Addresses	The IP addresses and UDP ports we listen on. Must also specify which addresses will do accounting, which will do authentication, and which (if any) will do both.
Accounting	How accounting entries are processed. Controls accounting proxies as well as regular accounting. An accounting proxy will store the records to disk and then forward them to the master accounting server when it's online. This also controls managing a db2.6 database of when each user logged on last and how much time they have used. May use this database for setting session limits.
Control	Server control files. These will determine how often (if ever) the server checks for changed config files and anything else of a global scope that doesn't fit into other categories.
Flag	For flag files and other transient files
DataSources	When looking up a user the server will search through sub-directories of this directory in alphabetical order, with each sub-directory specifying a source of user data. I recommend that the directory name start with a number to make the order unambiguous for all (the server will use strcasecmp() and not be case sensitive, so you must start the names with digits to avoid any possible ambiguity). Think of this as being similar to the contents of /etc/rc?.d/ directories. Each directory will contain at least one file specifying a source of information to be used in validating user accounts. For each lookup the server will go through the directories in order, and in each directory it will use a round-robin selection method to determine which file is used. The idea is that you may have multiple files in a directory to support load-balancing or failover. Inside each directory a file named control will specify how the selection of the different data sources will be performed. This will make it possible to specify that one source is used primarily and the next is used only if the first fails and until it recovers. Or it could be configured to use a service until it fails and then use the next listed service exclusively until an error occurs. Or full round-robin for best performance. It will also specify what to do if all the sources listed in the directory fail: should the server go to the next directory or just reject the RADIUS request?
Logging	Configure the logging of RADIUS requests.
bin	This directory name is reserved in case the administrator wants to use scripts to configure the system (i.e. scripts to generate the config files). This is just so that the one tree can have all the RADIUS server configuration-related files. In future versions I may suggest putting certain files in this directory, and I may add support for the server to run scripts from this directory. But I will not make any changes that force anyone to change their "bin" directory. This directory does not have to exist.
lib	Same as the above but for libraries. NB If you store the server configuration under /etc (as you would in a Debian Linux system) then "bin" and "lib" should be sym-links to somewhere more appropriate for binaries.
Others	For future use in configuring functionality outside the core RADIUS functions. This could be used for configuring TACACS if support for that protocol is added.
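The DataSources ordering and the full round-robin mode described in the list above could be sketched like this. It is an illustration under assumptions: the class RoundRobin, the function sort_source_dirs, and the idea of marking a source "up" or "down" are hypothetical, and the primary/failover modes that the control file would select are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <strings.h>   // strcasecmp (POSIX)
#include <vector>

// Orders DataSources sub-directory names case-insensitively, using
// strcasecmp() as the text describes.
void sort_source_dirs(std::vector<std::string>& dirs) {
    std::sort(dirs.begin(), dirs.end(),
              [](const std::string& a, const std::string& b) {
                  return strcasecmp(a.c_str(), b.c_str()) < 0;
              });
}

// Hypothetical round-robin picker over the source files in one directory.
// Sources marked as down are skipped. pick() returns -1 if every source
// is down, leaving the caller to try the next directory or reject the
// RADIUS request, whichever the control file asks for.
class RoundRobin {
public:
    explicit RoundRobin(std::size_t count) : up_(count, true), next_(0) {
        assert(count > 0);
    }
    void set_up(std::size_t i, bool up) { up_[i] = up; }
    int pick() {
        for (std::size_t tried = 0; tried < up_.size(); ++tried) {
            std::size_t i = next_;
            next_ = (next_ + 1) % up_.size();
            if (up_[i]) return static_cast<int>(i);
        }
        return -1;
    }
private:
    std::vector<bool> up_;
    std::size_t next_;
};
```

With directory names such as "20-ldap", "10-Local", and "30-sql" (made up for this example), the leading digits decide the order regardless of the case of the rest of the name.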

Technology Used in Implementation
My aim is for this program to be easily ported to any modern UNIX system and to provide the maximum possible performance on minimal hardware. I believe that threads are needed to get good performance with a clean design, and POSIX threads are supported in all recent UNIX systems so I plan to use them extensively. I am currently looking at the ACE communications library. I am not interested in this for its platform independence (although it could come in handy), but for the features it provides. My current plans are for LDAP to be the primary source of account data with the OpenLDAP libraries used to access LDAP. I believe that all good programs should be written in Java or C++. As Java is not appropriate for this type of application at the moment, this means C++. I believe that all good C++ programs should use the STL. Ideally the code should compile on as many C++ compilers as possible, but I only plan to test with the EGCS compiler as used in the Debian distribution of Linux (I already have a volunteer to test on Solaris and SCO and expect to find volunteers to support other platforms).

Code Design
The OpenLDAP libraries influence my choice of code design for two reasons. Firstly at the moment I consider LDAP to be the primary source of account data. Secondly I expect other sources of such data to have similar APIs and cause similar design issues.
The search APIs of the OpenLDAP library can be invoked in two ways, synchronous and asynchronous. The synchronous API allows setting a timeout and seems to do everything you would want. The asynchronous API involves a search API call that returns a message ID. You can then poll for the result of that search or block on an API call waiting for the result. If there are many LDAP operations in progress then it is possible to issue a call for the result of any completed operation (again this call may be blocking or non-blocking).
In my tests of a commercial LDAP directory with a commercial RADIUS server accessing it I found that under a load of 140 RADIUS requests per second it took just over 1.7s for each request to be satisfied. Let us assume for the sake of discussion that 0.2s was taken by the RADIUS server and 1.5s was taken by the LDAP directory. This means that if there are 140 requests per second hitting the server and each one takes 1.5s in the directory then there are 140*1.5 = 210 LDAP transactions in progress on average. I think that it is not feasible to have a single thread polling 210 transactions separately, so we have a choice of either having separate threads blocking on each transaction or having a single thread which polls for LDAP responses and another thread which cancels LDAP queries if they take longer than the RADIUS timeout (if using synchronous calls then we can just set the timeout to be the RADIUS timeout).
The time taken to create threads is not a deciding factor. Linux 2.2.10 running on a Pentium233 can create and destroy a thread in about 300us according to my forktest program. Solaris running on a comparably quick SPARC machine can do this even faster (under 100us). Even if we end up having a 300us delay added to each transaction then I believe that's an acceptable overhead if it fixes other problems.
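A measurement of this kind can be sketched with POSIX threads as below. This is not the forktest program mentioned above, only an assumed equivalent: mean_thread_cost_us is a hypothetical name, and the result includes the cost of joining each thread as well as creating it.

```cpp
#include <cassert>
#include <chrono>
#include <pthread.h>

// Thread body that returns immediately, so only creation and
// destruction overhead is measured.
static void* do_nothing(void*) { return nullptr; }

// Returns the mean time in microseconds to create and join one thread,
// averaged over `iterations` sequential create/join pairs.
// Returns -1.0 if a thread could not be created.
double mean_thread_cost_us(int iterations) {
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        pthread_t t;
        if (pthread_create(&t, nullptr, do_nothing, nullptr) != 0)
            return -1.0;
        pthread_join(t, nullptr);
    }
    std::chrono::duration<double, std::micro> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}
```

Calling mean_thread_cost_us(10000) and printing the result gives a figure that can be compared with the numbers quoted above. On many systems the program needs to be built with the compiler's thread option (for example -pthread).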
One issue that may be a deciding factor is the maximum number of threads that we can have running. On a default Linux 2.2.10 kernel it is 1023 (we can create 1022 threads which gives 1023 when we count the parent). If we have 200+ threads used for LDAP authentication before we start using threads for other purposes then I am concerned that 1023 is not enough.
The other option is to have one thread for listening to RADIUS requests and issuing asynchronous LDAP search requests. Then we have another thread for reading the LDAP responses (hopefully the LDAP API will allow us to do a blocking request for LDAP responses with a large timeout even when no transactions are in progress, so the LDAP reader doesn't have to bother about whether there are any requests in progress). Then we would need to have a thread which checks for RADIUS timeouts and cancels transactions that take too long. The reader thread would be responsible for sending out RADIUS responses.

Interaction Between the Server and the Environment
When writing the log files the server will call stat() on the log file name every minute to see if it has been renamed (by comparing the inode number). If so then it will close the file handle that it has for the log (which may now refer to a deleted file or a file with another name) and open/create the new log file. This means that to run the end of day processing on the log file you just have to do "mv log old.log ; sleep 60 ; process old.log".
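The rename check can be sketched with fstat() and stat(). The function names are hypothetical, and two details are assumptions beyond the text: the device number is compared along with the inode number, and a missing log file name is treated the same as a renamed one.

```cpp
#include <cassert>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns true if `path` no longer names the file that `fd` has open,
// i.e. the log was renamed or removed since it was opened.
bool log_was_rotated(int fd, const std::string& path) {
    assert(fd >= 0);
    struct stat open_st, name_st;
    if (fstat(fd, &open_st) != 0) return true;
    if (stat(path.c_str(), &name_st) != 0) return true;  // name is gone
    return open_st.st_ino != name_st.st_ino ||
           open_st.st_dev != name_st.st_dev;
}

// Closes and reopens the log if it was rotated. Returns the descriptor
// to keep writing to, or -1 if the new log file could not be opened.
int reopen_if_rotated(int fd, const std::string& path) {
    if (!log_was_rotated(fd, path)) return fd;
    close(fd);
    return open(path.c_str(), O_WRONLY | O_APPEND | O_CREAT, 0644);
}
```

The server would call reopen_if_rotated() from its once-a-minute check and keep using the returned descriptor for subsequent log writes.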
I recently attended a presentation on performance analysis from an SGI employee at the Conference of Australian Linux Users. I would like to add some hooks for performance analysis when someone writes some decent performance monitoring software for Linux. I plan to ask the SGI guy about what considerations I should make in the architecture of the program for this.