In Python for now, eventually in VBA, I’m trying to come up with a way to test whether or not a particular quantity is appropriately specified with valid units and tolerances. It turns out to be slightly simpler to be (semi-)lenient than extremely strict, so here’s what I have so far:

```python
import re

# decimal numbers
number = r'[\d\.]+'
# unicode superscript digits, minus sign, and parentheses
exponent = r'[\u2070\u2074-\u2079\u207b\u207d\u207e\xb2\xb3\xb9]*'
# SI prefixes
prefix = r'(?:[YZEPTGMkhdcm\xb5npfazy]|da)?'
# indivisible SI prefixes
ind_prefix = r'(?:[YZEPTGMkh]|da)?'
# JEDEC/IEC binary prefixes
bin_prefix = r'(?:[YZEPTGMK]i)?'
# units that can have a prefix (though some are less standard than others)
prefixed_unit = r'[ABCFHJKLNRSTVWbglmstu\u2126]|Bq|Ci|Da|Gy|Hz|Np|Pa|Sv|Wb|bar|cd|eV|kat|lm|lx|rad|rem|sr|ua|mol'
# units that cannot have a prefix (normally)
solo_unit = r"[°'\"dh\xc5]|°C|ha|mmHg|min"
# units that primarily take binary prefixes
bin_unit = r'B|bit'
# additional non-standard units that cannot have a prefix (because I said so, not because there's any "ban" on it or anything)
solo_unit += r'|°F|ft|in|kt|lb|nmi'
# define a generic unit; each unit list is wrapped in (?:...) so that a prefix
# applies to every alternative in the list rather than only the first one
unit = ('(?:(?:' + prefix + '(?:' + prefixed_unit + '))'
        + '|(?:' + solo_unit + ')'
        + '|(?:' + bin_prefix + '(?:' + bin_unit + '))'
        + '|(?:' + ind_prefix + '(?:' + bin_unit + ')))' + exponent)
# separator: whitespace plus thin space
sep = r'[\s\u2009]?'
# multiply operator: whitespace, thin space, and cdots
multiply = r'[×\u2219\u22c5\xb7\s\u2009]'
# define compound units involving only multiplication
compound_unit = '(?:' + unit + multiply + '{1,3})*' + unit
# define compound units involving division as well
mega_unit = (compound_unit + sep
             + r'(?:/(?:' + sep + r'\(' + sep + compound_unit + sep + r'\))'
             + r'|(?:/' + sep + unit + '))*')
# put it all together: value with a tolerance (symmetric or asymmetric), or an inequality
everything = (number + sep + '(' + mega_unit + ')' + sep
              + '(?:±' + sep + number + sep + r'\1'
              + r'|\+' + sep + number + sep + r'\1' + sep + '/' + sep + '-' + sep + number + sep + r'\1)'
              + r'|[<>\u2264\u2265]' + sep + number + sep + compound_unit)
# compile!
pattern = re.compile(everything)
```

This handles quantities specified with symmetric or asymmetric tolerances, as well as inequalities with units. It doesn’t enforce that asymmetric tolerances actually differ (the two numbers could, theoretically, be the same), and the separator handling is still fairly sloppy for now.
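To make those three cases concrete, here is a cut-down sketch. The `unit` list is a tiny stand-in chosen for illustration (an assumption, not the full set of units above), and `toleranced`, `inequality`, `quantity` and `samples` are names made up for the example. It shows the backreference forcing each tolerance to repeat the unit of the value, and that identical "asymmetric" tolerances still pass.

```python
import re

# Deliberately tiny stand-ins for the real building blocks (illustration only).
number = r'[\d.]+'
sep = r'[\s\u2009]?'
unit = r'(?:[kmM]?(?:V|A|m|s|Hz))'

# value + unit, then either "± tol unit" or "+tol unit/-tol unit";
# \1 requires the tolerance to carry exactly the same unit text as the value.
toleranced = (number + sep + '(' + unit + ')' + sep
              + '(?:±' + sep + number + sep + r'\1'
              + r'|\+' + sep + number + sep + r'\1' + sep
              + '/' + sep + '-' + sep + number + sep + r'\1)')
# inequality sign, then value + unit
inequality = r'[<>\u2264\u2265]' + sep + number + sep + unit

quantity = re.compile('(?:' + toleranced + ')|(?:' + inequality + ')')

samples = [
    '5 V ± 1 V',               # symmetric tolerance
    '10 mm +0.1 mm/-0.2 mm',   # asymmetric tolerance
    '10 mm +0.1 mm/-0.1 mm',   # "asymmetric" but equal: still accepted
    '≤ 3 A',                   # inequality with a unit
    '5 mV ± 1 V',              # tolerance unit differs from value unit
    '5 ± 1 V',                 # value has no unit
]
results = {s: bool(quantity.fullmatch(s)) for s in samples}
```

With this reduced pattern the first four samples match and the last two do not. `fullmatch` is used here so that a partial hit inside a longer string doesn't count as valid.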

The goal is actually to highlight *non*-conforming quantities so that they can be corrected. I suspect that will be more than a minor challenge.
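One possible angle on that, sketched under heavy assumptions: use a loose pattern to find anything that looks like "a number followed by some letters", then report the candidates that the strict pattern rejects. Everything named here (`strict`, `candidate`, `nonconforming`, and the tiny unit list) is hypothetical and only stands in for the real pattern.

```python
import re

# Hypothetical, deliberately small strict pattern: number, optional space, a few units.
strict = re.compile(r'[\d.]+[\s\u2009]?(?:[kmM]?(?:V|A|m|s|Hz))')
# Loose candidate finder: a number, optional whitespace, then a run of letters/symbols.
candidate = re.compile(r'\d+(?:\.\d+)?\s*[A-Za-z\xb5\u2126°]+')

def nonconforming(text):
    """Return (start, end, snippet) for each candidate the strict pattern rejects."""
    return [(m.start(), m.end(), m.group())
            for m in candidate.finditer(text)
            if not strict.fullmatch(m.group())]

text = 'Supply is 5 V at 20 mA, cable 3 meters, clock 10 khz'
hits = nonconforming(text)
# hits contains the spans for "3 meters" and "10 khz"
```

The returned offsets could then drive whatever highlighting is available. The weak point is the candidate finder: one this loose will also flag things like "3 apples" and will miss quantities that don't start with a plain number, which is probably where most of the challenge lies.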