Django Tips

I love Django. The web framework, not the jazz guitarist, though he’s not bad, either. Their documentation is a bit lacking in places, mostly in the details, which may be semi-niche edge cases, but I think it’s worth documenting what I’ve found, even if just for myself (as these posts usually are).

Widget Templates

I’ve wanted to tweak a few, and have, previously to what I discovered yesterday, gone deep into the weeds to modify the templates (in django/templates/django/forms/widgets). There are much easier ways! First of all, there’s an invaluable app, django-widget-tweaks, that, even though the code is a tad stale, does enough to be worth its use. Second, if you want to be fancier about things, or if you use other apps that override widgets (like django-markdownx), newer versions of Django need additional settings that are far from clear in the documentation.

First thing, edit your settings.py file. Add FORM_RENDERER='django.forms.renderers.TemplatesSetting' as its own line somewhere in the file. If you do this, and only this, you will find yourself presented with an error page where Django is expecting you to provide templates for everything. Maybe this is what you want, but probably not. To fix this behavior, in INSTALLED_APPS, add ‘django.forms’, and make sure this is the last line of the installed apps, or at the very least, after anything that might feasibly override a widget. That’ll take care of details that you don’t feel like worrying about as you develop your app. You might reach a point where you’ve covered every widget with your own, but it’s good to have a fallback just in case one was missed!

ModelForms

So you’re useing ModelForms to help build forms that you don’t have the time or inclination to write get and put methods all the time. Great! That’s what they’re there for! But there are some things that aren’t obvious, by any stretch!

How about passing data to one of these forms via the URL?

Let’s consider the polls model from the tutorial. Say that, instead of adding answers when editing the question (which, yes, in that case, makes more sense, but go with me for a minute) we want to add answers to a question on a page with the url ‘polls/<int:question_id>/add_answer/’. We know we’re adding an answer, and because of the URL we know the question ID, but the normal ModelForm for the given Answer model will ask you to provide the question_id, because of the ForeignKey field.

Here’s what you need to do:

class AnswerAdd(generic.CreateView):
    model=Answer
    def get_form(self):
        self.question=get_object_or_404(Question,id=self.kwargs['question_id'])
        partial_ans=Answer(question=self.question)
        kwargs=self.get_form_kwargs()
        kwargs['instance']=partial_ans
        form=AnswerForm(**kwargs)
        return form

First, you can’t get away without creating an AnswerForm class, but since this only requires a couple of extra lines from what you’d need in your view, and you’d have to duplicate that in your create and update views anyway, it’s probably worth it.

Second, and annoyingly, you can’t instantiate the form by doing AnswerForm(self.get_form_kwargs(), instance=partial_ans), but this is for good reason: because there’d then be two instance keyword arguments in the constructor—not allowed. Thus the separate lines for kwargs. This also allows you to pass in pass in extra data to the form.

Why can’t we do this in get_context_data? Because that method doesn’t handle the object instance.

There is a downside here, too: you can’t use the mere existence of the object in your template to figure out if someone is adding or editing an object. There’s a simple fix, though, just update your template to use this logic instead:

{% if not object.pk %}Create{% else %}Update{% endif %}

Updating Widgets at Runtime

This may be controversial, but I don’t know why. I think it has it place, and it’s situational. I have a number field. It has a meaning, and as the software developer I know what it means, but since that meaning can vary between instances in a rather complicated way that doesn’t lend itself to creating more objects and tables, and since I want to convey that meaning to my users, I want to replace the <input> widget with a <select> widget, a drop-down that’s far more intuitive for my users.

Here’s how you do that, though:

class MyModelForm(forms.ModelForm):
    class Meta:
        model=MyModel
        fields=[
            'field1',
            'field2',
            # you get the idea
        ]
    def __init__(self,*args,**kwargs):
        obj=kwargs.pop('some_object')
        super().__init__(*args,**kwargs)
        self.fields['field1'].widget=forms.Select(
            choices=obj.get_choices()
        )

Removing Widgets Dynamically

This is answered a few places at SO, but it’s worth rehashing here for my own benefit. Sometimes, you want/need to have a field that is only displayed during the creation of an object, or only when an object is updated, but you’re lazy and don’t want to make a CreateForm and an UpdateForm. I’m with you, all the way! And, as usual, there’s a way to do this:

class MyModelForm(forms.ModelForm):
    def __init__(self,*args,**kwargs):
        super().__init__(*args,**kwargs)
        if not self.instance:
            self.fields.pop('special_field1')
        else:
            self.fields.pop('special_field2')

With the logic above, special_field1 is removed when creating a new MyModel, and special_field2 is removed when updating that MyModel. The downside is that this won’t work in conjunction with passing data via the URL, because you have an instance, even if it’s mostly empty. You have to be a teensy bit cleverer (though I gave this answer out earlier in this post, I only figured it out here, so I’m repeating it for everyone’s benefit):

class MyModelForm(forms.ModelForm):
    def __init__(self,*args,**kwargs):
        super().__init__(*args,**kwargs)
        if not self.instance.pk:
            self.fields.pop('special_field1')
        else:
            self.fields.pop('special_field2')

Note that this time I’m checking for a value of instance.pk that isn’t a Boolean false. While you create an instance, no primary key is (typically) created until the object is saved in the database. Even if you have some situation where the PK is filled out in the instance you pass to the form, it’s likely there will be something that you can use to test if the form should be an edit or an update.

Deploying Python Scripts

I write a lot of little Python scripts for work, and a few for personal use. Most of them, even the work-related ones, never leave my own machine, because I write them to make my own life easier, and the syntax is specific, confusing, or I am too lazy to put in any error checking and don’t want to have to deal with the consequences of someone else messing something up by using one of my scripts. But for those that are extremely useful, I need to get them in a format usable by the masses.

We’ll use an example of munger.py—your day-to-day involves munging a file for a particular reason—and you want to simplify the process for you and your colleagues. And the function you want to run, because again you were lazy and not terribly forward-thinking, is called mung (not main).

Python is available on target…

If Python (of the appropriate version) is available on the target machine, you could simply pass along the script as a .py file and be done with it. But you’re nicer than that.

If you’re like me, you have a folder chock-full of random Python scripts. The first thing you’ll have to do is make a folder for the one you want to deploy, and give it an appropriate name. Let’s just do this here:

mung/
|-- mung/
| |-- __init__.py
| `-- munger.py
`-- setup.py

The original script was/is munger.py; the CLI command that will, in the end, be run, will be mung (i.e. mung filename.ext). __init__.py and setup.py are empty files for now. Let’s put some code in there.

Most of the time, I don’t bother with __init__.py, but it can be important when using setuptools, or in creating any package. Most of what you put in here is just imports, though, and here will be no different:

from .mung import *

And now, setup.py:

import setuptools

setuptools.setup(
    name='mung',
    version='1.0',
    packages=setuptools.find_packages(),
    author='<your name here>',
    author_email='<your email here>',
    description='a description',
    entry_points={
        'console_scripts': [
            'mung = mung.mung:mung'
        ],
        'gui_scripts': [
            'mung_gui = mung.mung:mung_gui'
        ],
    },
)

Refer to the documentation here if you plan on deploying your script to PyPI: there are a ton more options and keywords to pass to setup, and most of them will help your project’s use if you want to make it public (e.g. a full test suite, documentation, bug tracking, etc.). If you have a larger package, specific dependencies, a more complicated directory structure, they’re all covered there. But for the basics, what’s above should be more than sufficient.

Keep in mind: you must include name, version, packages, and entry_points (technically, entry_points is not required, but it’ll make documentation and explaining/teaching the use of your script that much simpler later—it’s worth it).

Now you’re ready. With those files saved, in the top mung directory (containing setup.py), and after ensuring you have the latest version of setuptools and wheel, run the following:

python setup.py sdist bdist_wheel

You’ll find the built version of your script in dist/mung-1.0-py3-none-any.whl. The version, and ‘none’ and ‘any’ parts of the filename may be different depending on your build options. But you can pass that one file on to your colleagues, and have them run the following to install it:

pip install mung-1.0-py3-none-any.whl

Rename as you feel appropriate before passing it on!

Your built script will be a bit larger, but not by a lot; I used a 1064 byte test file on my own machine as an example and the built wheel was just 2052 bytes!

Python not available on target…

If Python is not available on the target machine, things are a bit easier and a bit harder at the same time. I recommend is getting your hands on the latest versions of PyInstaller (which does not actually create or install things) and NSIS (which does). I’ll only discuss utilizing these tools. If you want to cross-compile, this will not do the job. For that I’d recommend trying to get Python installed on the target computer—it’s probably easier in the long run.

Each of the following options have their pros and cons. If speed is an issue, and rewriting the program in C or installing Python on the target are not options, go with option 1. If portability is the main factor, go with option 2. If you’re tight on file size, try 2a.

Option 1: Installer

Using PyInstaller, building the script into an executable is fairly straightforward, although you don’t have the same flexibility as the previous method. You can’t define entry points, so you need to be able to directly run the script—no function calling after-the-fact. Add something like this to the bottom of your file if necessary:

if __name__=='__main__':
    mung()

Run the following command:

pyinstaller munger.py

You’ll wind up with munger.exe and a bazillion other files in a dist directory. You may have to mess with the spec file that’s created to make it work just right; see the documentation for details there.

With my 1064 byte test input, I wound up with 52 files in the output totaling 12,513,606 bytes.

How do we deploy 52 files? NSIS. Create mung.nsi with the following contents:

!include "MUI.nsh"
#define compression method
SetCompressor /SOLID LZMA

!define MajorVersion 1
!define MinorVersion 0
!define PatchVersion 0

#define version information
VIAddVersionKey "ProductName" "Munger"
VIAddVersionKey "CompanyName" "<Your Company Here>"
VIAddVersionKey "LegalCopyright" "©20XX All rights reserved."
VIAddVersionKey "FileVersion" "${MajorVersion}.${MinorVersion}.${PatchVersion}.0"
VIAddVersionKey "ProductVersion" "${MajorVersion}.${MinorVersion}.${PatchVersion}.0"
VIAddVersionKey "FileDescription" "<Executable Description Here>"
VIProductVersion "${MajorVersion}.${MinorVersion}.${PatchVersion}.0"
VIFileVersion "${MajorVersion}.${MinorVersion}.${PatchVersion}.0"

RequestExecutionLevel admin ;Require admin rights on NT6+ (When UAC is turned on)

!include LogicLib.nsh

Function .onInit
UserInfo::GetAccountType
pop $0
${If} $0 != "admin" ;Require admin rights on NT4+
    MessageBox mb_iconstop "Administrator rights required!"
    SetErrorLevel 740 ;ERROR_ELEVATION_REQUIRED
    Quit
${EndIf}
FunctionEnd

#define output file
!ifndef LabelVersion
  Outfile "munger-${MajorVersion}.${MinorVersion}.${PatchVersion}.exe"
!endif
!ifdef LabelVersion
  Outfile "munger-${MajorVersion}.${MinorVersion}.${PatchVersion}-${LabelVersion}.exe"
!endif

#define installation directory
# should be C:\Program Files\Munger
InstallDir "$PROGRAMFILES64\Munger" #Insert appropriate directory here

#define installer name
Caption "Munger"
Name "Munger"

#define variables
!define MUI_ICON "myicon.ico"
ShowInstDetails show

!insertmacro MUI_PAGE_WELCOME
!insertmacro MUI_PAGE_DIRECTORY
!insertmacro MUI_PAGE_INSTFILES
!insertmacro MUI_PAGE_FINISH

#Include logic functions
!include LogicLib.nsh

#default section
Section
  #define options
  SetShellVarContext current
  SetOverwrite ifnewer
  SetOutPath $INSTDIR
  #install file
  File /r dist\*
  CreateShortcut "$DESKTOP\munger.lnk" "$INSTDIR\munger.exe"
  WriteUninstaller "$INSTDIR\uninstaller.exe"
SectionEnd

Section "Uninstall"
  Delete "$INSTDIR\uninstaller.exe"
  !include /CHARSET=UTF8 "remfiles.nsh"
  RMDir "$INSTDIR"
  Delete "$DESKTOP\munger.lnk"
SectionEnd

!insertmacro MUI_LANGUAGE "English"

Add nsis_uninstall_helper.py with the following contents:

import os
import sys

#generate removal file for NSIS
print(sys.argv[1])
with open('remfiles.nsh','w') as fp:
    for root,dirs,files in os.walk(sys.argv[1],False):
        for f in files:
            t=fp.write('  Delete "$INSTDIR\\{}"\n'.format(os.path.join(root,f)[len(sys.argv[1])+1:]))
        for d in dirs:
            t=fp.write('  RMDir "$INSTDIR\\{}"\n'.format(os.path.join(root,d)[len(sys.argv[1])+1:]))

Now you should have something like the following:

somedir
|-- build/
| |-- munger/
| |-- Analysis-00.toc
| |-- COLLECT-00.toc
| |-- EXE-00.toc
| |-- PKG-00.toc
| |-- PYZ-00.pyz
| |-- PYZ-00.toc
| |-- base_library.zip
| |-- munger.exe
| |-- munger.exe.manifest
| |-- warn-munger.txt
| `-- xref-password.html
|-- dist/
| |-- munger/
| | |-- ... <--a bunch of stuff
| | `-- munger.exe
|-- munger.py
|-- munger.spec
|-- nsis_uninstall_helper.py
`-- mung.nsi

In the dist directory, run the following:

python nsis_uninstall_helper.py
makensis mung.nsi

If all goes well, you’ll wind up with munger-1.0.0.exe in the directory. My personal sample came in at a svelte 4,443,448 bytes. This you can distribute like any other installer, and, provided the target user has the appropriate rights, they shouldn’t have much trouble.

Option 2: Single File

Just like in the previous section, when using PyInstaller, building the script into an executable is fairly straightforward, although you don’t have the same flexibility as the previous method. You can’t define entry points, so you need to be able to directly run the script—no function calling after-the-fact. Add something like this to the bottom of your file if necessary:

if __name__=='__main__':
    mung()

Run the following command:

pyinstaller -F munger.py

You’ll wind up with munger.exe in a dist directory. You may have to mess with the spec file that’s created to make it work just right; see the documentation for details there. But that’s all you need! Pass along the file as-is, and you shouldn’t have too many issues.

With my 1064 byte test input, I wound up with just the one file totaling 6,094,767 bytes. Not too shabby!

Option 2a: Single File (With UPX)

You can, perhaps, compress your file even more using UPX. I won’t go into installing or configuring it, see the documentation for the details, but here are my results:

Run the following command:

pyinstaller -F --upx-dir=path\to\upx munger.py

You’ll wind up with munger.exe in a dist directory. You may have to mess with the spec file that’s created to make it work just right; see the documentation for details there. But that’s all you need! Pass along the file as-is, and you shouldn’t have too many issues.

Again, with my 1064 byte test input, I wound up with just the one file totaling 5,057,013 bytes!

Money

Debt. Consider all your credit cards, student loans, mortgage, a car payment or two, and the financing on your smartphone. They all have different interest rates, calculated in various ways, with payments sometimes being applied non-equally across balances (like promotional, seasonal, balance transfer, and cash advance rates on credit cards all might be different, and a payment might or might not be applied to the balance with the highest interest rate first). How might one figure out how best to pay off these debts?

With a conservative Christian circle of friends and family, Dave Ramsay and Larry Burkett are quoted at me a lot, but, while their debt snowball calculator can be useful, I doubted it was necessarily the most efficient method. I haven’t yet created a function that matches theirs, but the main problem I see is that it doesn’t re-order the creditors or balances in any way.

Another piece of advice I’ve often heard quoted is the idea that one should pay off the highest interest rate first, then work your way down. The idea that this is the most financially sound (that it avoids the most interest) is not always valid, while it seems logically sound. In certain scenarios this does work, but it can be (quite) a bit more complex than that when dealing with several balances.

I’m going to use some completely realistic numbers for an example scenario. Realistic because they’re not too far off from a reality I faced in the not too distant past. Not actually real because the numbers have been slightly modified to protect the guilty parties.

Reference NumberTypeInitial Principal BalanceInterest RateMinimum Payment PercentageMinimum Payment Amount
1Credit Card881.4821.99%3.00%35
2Credit Card1259.3721.99%3.00%20
3Credit Card1063.5421.99%120
4Credit Card5458.2818.20%2.25%25
5Credit Card9395.247.90%3.00%20
6Credit Card10537.4516.99%2.42%20
7Car Loan9082.43.75%183.04
8Miscellaneous Loan9380.774.25%277.94
9Cell Phone770055
10Mortgage 196263.743.75%463.12
11Mortgage 2161391.633.75%972.54

The minimum payment for this set of loans is $2803.66 (35.00+37.78+120.00+122.81+281.86+254.57+183.04+277.94+55.00+463.12+972.54).

Paying the minimum, it would take 27 years 9 months to pay off all these loans with $151456.03 interest.

Additional PaymentPayoff TimeInterest Paid
027 years 9 months151456.03
10023 years 6 months128311.29
25017 years 10 months96941.52

What happens if we rearrange this so we do work on the highest interest rate first? How does that change the outcome?

Additional PaymentPayoff TimeInterest Paid
027 years 9 months150395.76
10025 years 11 months133170.63
25023 years 1 month119501.26

Already it’s clear that paying the highest interest rate alone, even when throwing a constant amount above the minimum at one’s debts, isn’t always the best method.

The method I prefer is quite similar to the snowball method, but rather than keeping all payments equal to the current minimum until one is paid off (and subsequently applying that balance to the next, and the next), mine is adjusted so that the minimum balance is paid on all until one is paid off, keeping the overall paid balance the same.

Anyway, with my method, and the original sort order:

Additional PaymentPayoff TimeInterest Paid
011 years 5 months77763.61
10010 years 11 months74617.76
25010 years 3 months67780.65

But which order to pay these in? Can a different order get me a better (lower) number on interest paid and/or time to pay?

While there may be a better way, I only know brute force, which means I have 11!=39 916 800 possibilities to try out, but even so, by my figuring and without paying a penny extra per month, simply by rearranging the order in which the cash is applied, I can get interest paid down to $77033.62 (after trying only about 23% of the possibilities, though that took about 22 hours). Yes, that includes the two mortgages! A savings of a mere 1% might not seem much over the course of 23 years, but savings can be quite a bit more, percentage-wise, for other amounts. Dropping the mortgages for the sake of argument (and ease and speed of calculation), let’s revisit that last table:

Additional PaymentPayoff TimeInterest PaidReordered Payoff TimeReordered InterestNew Order
03 years 6 months9627.473 years 6 months9085.053, 1, 9, 2, 6, 8, 4, 5, 7
1003 years 3 months9423.473 years 2 months7849.853, 2, 9, 1, 4, 6, 8, 5, 7
2502 years 10 months7183.472 years 10 months6633.551, 2, 3, 4, 6, 5, 9, 8, 7
10001 year 11 months6635.471 year 10 months3834.801, 3, 2, 4, 9, 6, 5, 8, 7

By shuffling the order in which one pays off debts, one can save more than 16% at the $100 additional amount. Note also that the time doesn’t change much at all, despite the occasionally massive changes in interest owed. Work the system to your advantage. (Note also that the initial interest always ends in 47¢. This could be an amazing coincidence, but is probably a bug in my code. I believe it is insignificant in the grand scheme of things, however! I’ll revisit the calculations once I produce code that’s ready for the light of day.)

As far as that code goes, here are my goals:

  1. Make the DebtCollection class iterable
  2. Drastically increase the speed of the shuffling/interest calculation code (10–20 minutes for 362 880 permutations; est. 100 hours for 39 916 800)
  3. Calculate maximum interest for shuffled objects
  4. Allow for a single Debt object with multiple balances at separate interest rates
  5. Correct for payments that do not occur monthly (e.g. biweekly)
  6. Use Decimal class to head off rounding issues
  7. Automagically calculate amortized payment if initial balance is provided
  8. Output payment schedule in nicer format (CSV, HTML, PDF, something) including dates

And my evaluation of those goals:

  1. Easy.
  2. I haven’t noticed any sort of pattern in the results yet, so this one’s more than a bit tricky.
  3. Just one line of code to add to #2.
  4. Another tricky one. I’d have to deal with the ability to specify the order in which balances are paid.
  5. Harder than I initially thought. Could multiply by an appropriate scalar, but that’d turn out incorrect answers.
  6. I can implement as I clean up the code.
  7. If initial balance is provided, easy as pie.
  8. Almost entirely cosmetic, unless I decide to add due dates for each of these, in which case I’d need an extra method for DebtCollection: next_due or something.

Secrets From The Future

I know I’ve covered this in some ancient blog post before, but I lost it, and the code that went with it.

Message Digests

Support

First, some support functions:

import collections

def rotl(x,amount,bits=32):
    x&=2**bits-1
    return ((x<<amount)|(x>>(bits-amount)))&(2**bits-1)

def rotr(x,amount,bits=32):
    x&=2**bits-1
    return ((x>>amount)|(x<<(bits-amount)))&(2**bits-1)

def pad(message,length,mod,tagbytes=0x80,padbyte=0x00):
    m=message[:]
    m.extend(tagbytes if isinstance(tagbytes,collections.Iterable) else [tagbytes])
    m.extend([padbyte]*((length-len(m))%mod))
    return m

def hexdigest(digest,length=32,endianness='little'):
    raw=digest.to_bytes(16,endianness)
    return ('{:0'+str(length)+'x}').format(int.from_bytes(raw,'big'))

The functions rotl  and rotr  are rotate left and rotate right, respectively, taking in x  (the value to rotate), amount  (the number of bits to rotate), and bits  (the total number of bits in the result).

The function pad  is important. Many, if not most, encryption and hashing functions require a multiple of a known number of bits or bytes in the input to operate properly. So this pads the input message such that its length is congruent to length  mod mod .

Starting with the math, if we want, say, \(x \equiv 448 \pmod {512}\), \(x\) can be 448, or it can be 960, or 1472, 1984, etc.

So if we have a message, say the alphabet (abcdefghijklmnopqrstuvwxyz, 26 letters) and we need the length to be congruent to \(0 \pmod {16}\), we need to pad it with 6 extra characters. In our case, each character in the message is represented to the computer by a single byte, so we’ll need to add six bytes. Exactly how the padding is applied can vary from algorithm to algorithm.

The parameters tagbytes  and padbyte  are typical, though my names are possibly unique. For this function, tagbytes  (one or more bytes) will be “tagged” to the end of the message first, and any padbyte  will be used to pad out the message following the tagbytes. A typical scenario: the algorithm requires I add a ‘1’ (bit) to the end of the message, and pad the remainder to a length in bits congruent to \(448 \pmod {512}\). So I pass tagbytes=0x80  and padbyte=0x00  (the default values here) to perform that operation. It is rather unlikely I’ll have a message whose length in bits is not a multiple of 8, so I don’t consider that case. Here, anyway.

For all digests, there’s this general procedure

  1. Pad message
  2. Process message
  3. Return digest

MD2

RFC 1319 (and the extremely important errata) covers the MD2 message digest function. But how would one go about implementing this in Python?

Following our general procedure:

  1. Pad message
    We pad this message so its length in bytes is congruent to \(0 \pmod {16}\), by adding i bytes of value i to the message. But padding of some kind must be added, so if the message length is already congruent to \(0 \pmod {16}\), add 16 bytes of value 16 to the message.
    Next we add a 16-byte checksum to the end of the message. I won’t go into details here as it’s in my code, but this is where the errata from the RFC is important!
  2. Process message
    Prime the buffer, then run it through 18 rounds of 48 exclusive ors with the S-table (which, in this case, is a hexidecimal representation of π).
  3. Return digest
import support
import math

#md2
s_table=[0x29, 0x2E, 0x43, 0xC9, 0xA2, 0xD8, 0x7C, 0x01, 0x3D, 0x36, 0x54, 0xA1, 0xEC, 0xF0, 0x06, 0x13,
         0x62, 0xA7, 0x05, 0xF3, 0xC0, 0xC7, 0x73, 0x8C, 0x98, 0x93, 0x2B, 0xD9, 0xBC, 0x4C, 0x82, 0xCA,
         0x1E, 0x9B, 0x57, 0x3C, 0xFD, 0xD4, 0xE0, 0x16, 0x67, 0x42, 0x6F, 0x18, 0x8A, 0x17, 0xE5, 0x12,
         0xBE, 0x4E, 0xC4, 0xD6, 0xDA, 0x9E, 0xDE, 0x49, 0xA0, 0xFB, 0xF5, 0x8E, 0xBB, 0x2F, 0xEE, 0x7A,
         0xA9, 0x68, 0x79, 0x91, 0x15, 0xB2, 0x07, 0x3F, 0x94, 0xC2, 0x10, 0x89, 0x0B, 0x22, 0x5F, 0x21,
         0x80, 0x7F, 0x5D, 0x9A, 0x5A, 0x90, 0x32, 0x27, 0x35, 0x3E, 0xCC, 0xE7, 0xBF, 0xF7, 0x97, 0x03,
         0xFF, 0x19, 0x30, 0xB3, 0x48, 0xA5, 0xB5, 0xD1, 0xD7, 0x5E, 0x92, 0x2A, 0xAC, 0x56, 0xAA, 0xC6,
         0x4F, 0xB8, 0x38, 0xD2, 0x96, 0xA4, 0x7D, 0xB6, 0x76, 0xFC, 0x6B, 0xE2, 0x9C, 0x74, 0x04, 0xF1,
         0x45, 0x9D, 0x70, 0x59, 0x64, 0x71, 0x87, 0x20, 0x86, 0x5B, 0xCF, 0x65, 0xE6, 0x2D, 0xA8, 0x02,
         0x1B, 0x60, 0x25, 0xAD, 0xAE, 0xB0, 0xB9, 0xF6, 0x1C, 0x46, 0x61, 0x69, 0x34, 0x40, 0x7E, 0x0F,
         0x55, 0x47, 0xA3, 0x23, 0xDD, 0x51, 0xAF, 0x3A, 0xC3, 0x5C, 0xF9, 0xCE, 0xBA, 0xC5, 0xEA, 0x26,
         0x2C, 0x53, 0x0D, 0x6E, 0x85, 0x28, 0x84, 0x09, 0xD3, 0xDF, 0xCD, 0xF4, 0x41, 0x81, 0x4D, 0x52,
         0x6A, 0xDC, 0x37, 0xC8, 0x6C, 0xC1, 0xAB, 0xFA, 0x24, 0xE1, 0x7B, 0x08, 0x0C, 0xBD, 0xB1, 0x4A,
         0x78, 0x88, 0x95, 0x8B, 0xE3, 0x63, 0xE8, 0x6D, 0xE9, 0xCB, 0xD5, 0xFE, 0x3B, 0x00, 0x1D, 0x39,
         0xF2, 0xEF, 0xB7, 0x0E, 0x66, 0x58, 0xD0, 0xE4, 0xA6, 0x77, 0x72, 0xF8, 0xEB, 0x75, 0x4B, 0x0A,
         0x31, 0x44, 0x50, 0xB4, 0x8F, 0xED, 0x1F, 0x1A, 0xDB, 0x99, 0x8D, 0x33, 0x9F, 0x11, 0x83, 0x14]

def pad(message,encoding='utf-8'):
    #add padding of i bytes of i such that message length is congruent to 0 mod 16
    m=bytearray(message,encoding)
    l=len(m)
    tagbyte=(16-len(m))%16
    tagbyte=(tagbyte if tagbyte>0 else 16)
    m=support.pad(m,0,16,tagbyte,tagbyte)
    #append checksum
    checksum=bytearray(16)
    L=0
    for i in range(len(m)//16):
        for j in range(16):
            c=m[i*16+j]
            checksum[j]=checksum[j]^s_table[c^L]
            L=checksum[j]
    m.extend(checksum)
    return m

def md2(message,encoding='utf-8'):
    m=pad(message,encoding)
    d=process(m)
    return d

def process(message):
    X=bytearray(48) #buffer
    for i in range(len(message)//16):
        for j in range(16):
            X[16+j]=message[i*16+j]
            X[32+j]=X[16+j]^X[j]
        t=0
        for j in range(18):
            for k in range(48):
                t=X[k]=(X[k]^s_table[t])
            t=((t+j)%256)
    return sum(x<<(8*i) for i,x in enumerate(X[:16]))

if __name__=='__main__':
    print('testing MD2:')
    test_vectors=(('','8350e5a3e24c153df2275c9f80692773'),
                  ('a','32ec01ec4a6dac72c0ab96fb34c0b5d1'),
                  ('abc','da853b0d3f88d99b30283a69e6ded6bb'),
                  ('message digest','ab4f496bfb2a530b219ff33031fe06b0'),
                  ('abcdefghijklmnopqrstuvwxyz','4e8ddff3650292ab5a4108c3aa47940b'),
                  ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789','da33def2a42df13975352846c30338cd'),
                  ('12345678901234567890123456789012345678901234567890123456789012345678901234567890','d5976f79d83d3a0dc9806c3c66f3efd8'))
    for t,r in test_vectors:
        d=support.hexdigest(md2(t))
        print('digest \'{}\': {}, {}'.format(t,d,'passed' if d==r else 'failed'))

Now, the part that nobody ever shows, which frustrates me when debugging, is the step-by-step, what’s going on inside. That’s the main reason I’m adding to the noise. I’ll run three messages: the empty string, the letter “a”, and, for this one, the alphabet “abcdefghijklmnopqrstuvwxyz”. That should cover all relevant cases

Null String Step-By-Step

Our message begins as m=b”. Such an innocent looking piece of code.

First, the padding. i bytes of value i such that message length is congruent to \(0 \pmod {16}\). It already is, but we have to add something to the end, so 16 bytes of value 16 it is. We wind up with

m=b'\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10'

at line 28. Next part of the padding is the checksum. The outer “for” loop is once for every 16 bytes of the message, and since this message is only 16 bytes long, it’s only run once, and i=0. The inner loop is run for each byte. Starting with an empty checksum bytearray (all bytes in the checksum initialized to 0) and L=0, our first pass gives

c=m[i*16+j]=m[0+0]=m[0]=b'\x10'=16
checksum[j]=checksum[j]^s_table[c^L]=0^s_table[16^0]=0^s_table[16]=0x62=98
L=checksum[j]=0x62=98

For this special case, c will always be 16. But L changes for every iteration (and I’ll only show checksum, because L becomes equal to that value each time around), so

checksum[0]^s_table[16^0]=0^s_table[16]=0x62=98
checksum[1]^s_table[16^98]=0^s_table[114]=0x38=56
checksum[2]^s_table[16^56]=0^s_table[40]=0x67=103
checksum[3]^s_table[16^103]=0^s_table[119]=0xB6=182
checksum[4]^s_table[16^182]=0^s_table[166]=0xAF=175
checksum[5]^s_table[16^175]=0^s_table[191]=0x52=82
checksum[6]^s_table[16^82]=0^s_table[66]=0x79=121
checksum[7]^s_table[16^121]=0^s_table[105]=0x5E=94
checksum[8]^s_table[16^94]=0^s_table[78]=0x5F=95
checksum[9]^s_table[16^95]=0^s_table[79]=0x21=33
checksum[10]^s_table[16^33]=0^s_table[49]=0x4E=78
checksum[11]^s_table[16^78]=0^s_table[94]=0x97=151
checksum[12]^s_table[16^151]=0^s_table[135]=0x20=32
checksum[13]^s_table[16^32]=0^s_table[48]=0xBE=190
checksum[14]^s_table[16^190]=0^s_table[174]=0xEA=234
checksum[15]^s_table[16^234]=0^s_table[250]=0x8D=141

Tack those on to the end and we get

m=b'\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10b8g\xb6\xafRy^_!N\x97 \xbe\xea\x8d'

At the beginning, the buffer is (in this case) simply composed of the following:

X=b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10\x10'

Since our message is now 32 bytes long, we need to do two rounds of 18 rounds of 48. Originally I was going to put it all directly in the post, but that’s way too much. Instead, here’s a text file:

MD2 Null String debug text

Short Input Step-By-Step

In this context, a “short” input is one that is longer than 0 bytes but shorter than the padding requirement (for MD2, between 0 and 15 bytes). The input used here is ‘a’. I’ll skip the details and instead provide the debug text file.

MD2 short input debug text

Long Input Step-By-Step

In this context, a “long” input is one that is longer than the padding requirement (for MD2, greater than 16 bytes). The input used here is ‘abcdefghijklmnopqrstuvwxyz’. This, in particular, requires two rounds of the checksum, which is why the errata is so important. I’ll skip the remainder of the details and instead provide the debug text file.

MD2 long input debug text

MD5

This one’s actually still vaguely relevant today, despite the fact that collision attacks are easy to come by.

RFC 1321 covers the MD5 message digest function.

Following our general procedure:

  1. Pad message
    We pad this message so its length in bits is congruent to \(448 \pmod {512}\), by adding first a ‘1’ bit, then ‘0’ bits until the length is appropriate. Since a ‘1’ is always added, there is always padding of some sort.
    Next we add the length of the original message in bits as a 64-bit number in little-endian byte order.
  2. Process message
  3. Return digest
import support
import math

#md5
rotate_amounts=[ 7,12,17,22, 7,12,17,22, 7,12,17,22, 7,12,17,22,
                 5, 9,14,20, 5, 9,14,20, 5, 9,14,20, 5, 9,14,20,
                 4,11,16,23, 4,11,16,23, 4,11,16,23, 4,11,16,23,
                 6,10,15,21, 6,10,15,21, 6,10,15,21, 6,10,15,21]
constants=[int(abs(math.sin(i+1))*2**32) for i in range(64)]
init_values=[0x67452301,0xefcdab89,0x98badcfe,0x10325476]
functions=16*[lambda b,c,d:(b&c)|(~b&d)]+\
          16*[lambda b,c,d:(d&b)|(~d&c)]+\
          16*[lambda b,c,d:b^c^d]+\
          16*[lambda b,c,d:c^(b|~d)]
index_functions=16*[lambda i:i]+\
                16*[lambda i:(5*i+1)%16]+\
                16*[lambda i:(3*i+5)%16]+\
                16*[lambda i:(7*i)%16]

def pad(message,encoding='utf-8'):
    m=bytearray(message,encoding)
    l=(8*len(m))%(2**64)
    m=support.pad(m,56,64,0x80,0)
    m+=l.to_bytes(8,'little')
    return m

def md5(message,encoding='utf-8'):
    m=pad(message,encoding)
    d=process(m)
    return d

def process(message):
    hash_pieces=init_values[:]
    for chunk_offset in range(0,len(message),64):
        a,b,c,d=hash_pieces
        chunk=message[chunk_offset:chunk_offset+64]
        for i in range(64):
            f=functions[i](b,c,d)
            g=index_functions[i](i)
            to_rotate=a+f+constants[i]+int.from_bytes(chunk[4*g:4*g+4],'little')
            new_b=(b+support.rotl(to_rotate,rotate_amounts[i]))&0xffffffff
            a,b,c,d=d,new_b,b,c
        for i,val in enumerate([a,b,c,d]):
            hash_pieces[i]+=val
            hash_pieces[i]&=0xffffffff
    return sum(x<<(32*i) for i,x in enumerate(hash_pieces))

if __name__=='__main__':
    print('testing MD5:')
    test_vectors=(('','d41d8cd98f00b204e9800998ecf8427e'),
                  ('a','0cc175b9c0f1b6a831c399e269772661'),
                  ('abc','900150983cd24fb0d6963f7d28e17f72'),
                  ('message digest','f96b697d7cb7938d525a2f31aaf161d0'),
                  ('abcdefghijklmnopqrstuvwxyz','c3fcd3d76192e4007dfb496cca67e13b'),
                  ('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789','d174ab98d277d9f5a5611c2c9f419d9f'),
                  ('12345678901234567890123456789012345678901234567890123456789012345678901234567890','57edf4a22be3c955ac49da2e2107b67a'))
    for t,r in test_vectors:
        d=support.hexdigest(md5(t))
        print('digest \'{}\': {}, {}'.format(t,d,'passed' if d==r else 'failed'))