Tuesday, July 02, 2024

Morphology and Software Development

A new, popular library or framework isn't necessarily better than all the old ones. That's a corollary to this: success doesn't necessarily indicate the best quality. Typically, the new platform technology simply won: on a lumpy playing field against other technologies, perhaps through association with a performant platform, or via momentum from a community, a sponsor, or a monopoly. When Google Cloud Platform moved from webapp2 to Flask, it wasn't justified on scientific or engineering or psychological grounds. There was no attempt to make a case: Flask simply became more popular, and Google didn't properly fund webapp2. At the same time, for similar reasons, we had to move from Python 2 to 3, and from datastore db to ndb.

Let me demonstrate two ways in which webapp2 was better. These are morphological observations, or more precisely psychological observations about morphology. That is, I'm talking about the shapes we perceive when we look at our code, and the consequences for our minds. It's really the same approach we take to, for example, understanding how we feel when we walk down a beautiful street, or enter a pleasant town square. Those have a positive effect on our mental life for sometimes very simple reasons: a safer pedestrian environment, more interesting things to see at a human scale, trees, shade, water, a sense that people care, and a sense of peace. This kind of analysis is pretty common in the UX and design world, but for some reason, it's barely discussed in the world of software development environments. The "user" experience gets a lot of attention. The "developer as user" experience: not so much.

The first thing lost with webapp2 was the route table. Route tables exist in plenty of other programming environments, but they're now forgotten in Google's serverless Python environment. It used to look like this:

import webapp2

# handler blocks

# handler block for People
class People(webapp2.RequestHandler):
    # the captured (.*) group from the route arrives as an argument
    def get(self, string):
        pass  # code

    def post(self, string):
        pass  # code

# helper functions that belong to the People cluster
def people_get_function():
    pass  # code

def people_set_function():
    pass  # code

def other_people_functions_etc():
    ...

# ... lots of other handler blocks ...

# route table
app = webapp2.WSGIApplication([('/people/(.*)', People),
                               ('/neighborhoods/(.*)', Neighborhoods),
                               ('/cities/(.*)', Cities),
                               # ... more routes
                               ])

The shape of this code lends itself to clusters of functionality. But equally important:

Do you need to look up how you handle people? Look at the route table.

It's like the table of contents in a book. It lets you find the chapter about people, neighborhoods, cities, etc. ... no matter how idiosyncratically you mapped the world onto your program.
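
To make the "table of contents" idea concrete, here is a minimal, standard-library sketch of how a route table of this shape can work. It is not webapp2's actual implementation; the names `ROUTES`, `dispatch`, and `table_of_contents` are hypothetical, and the handlers are toy classes.

```python
import re

# Toy handlers (hypothetical); real webapp2 handlers subclass RequestHandler.
class People:
    def get(self, name):
        return "person: " + name

class Cities:
    def get(self, name):
        return "city: " + name

# The route table: one small list mapping URL patterns to handler classes.
ROUTES = [
    ('/people/(.*)', People),
    ('/cities/(.*)', Cities),
]

def dispatch(path, routes=ROUTES):
    """Call get() on the first handler whose pattern matches the whole path."""
    for pattern, handler_class in routes:
        match = re.fullmatch(pattern, path)
        if match:
            # Captured groups are passed to the handler as arguments.
            return handler_class().get(*match.groups())
    return None  # no route matched

def table_of_contents(routes=ROUTES):
    """Summarize the application as (pattern, handler name) pairs."""
    return [(pattern, handler_class.__name__)
            for pattern, handler_class in routes]
```

Reading `ROUTES`, or calling `table_of_contents()`, gives the whole outline of the application in a few lines, which is the orienting quality described above.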

Instead, the approximate equivalent in Flask looks like this:

from flask import Flask

app = Flask(__name__)

@app.route('/people')
def people_response():
    pass  # code

@app.route('/neighborhood')
def neighborhood_response():
    pass  # code

@app.route('/city')
def city_response():
    pass  # code

At first blush, this puts everything, even the external API presentation or URL endpoints, in the same place as the functionality. But it's atomizing. In a complex application, you'll now be looking through all of your code for the route, which is a conceptual pointer to an idea, whose name you might not even remember, and it could be in a large file or many small ones. That was something you didn't need to do before. You simply needed to look through your route table, which is small, to re-orient yourself in your application, like an index or overview. The orienting qualities of such a global shape are obvious, but obviously not well enough appreciated. In Flask now, you're forced to read everything to find what you want. Did you separate everything into different files? Then you'd have to read all the file names, which are ordered alphabetically, which is no order at all when you can't remember what word you're looking for. It's an unnecessary tax on memory. When you have many such applications to build and maintain, you'd have no easy way to "get back up to speed" about your own code, which you might not have looked at for months. Because you can no longer lean on a simple, helpful shape.

Here's a second advantage webapp2 has over Flask. Flask has no built-in class model for the primary network commitment of every server-side application: to build a response to a request.

In webapp2, there was a "self". In this self are all the values that came from the internet. This is also where, in the course of the application, you'll put all of the resources that you'll pass back in your response. This was a simple idea: one part of your server app could focus on handling cookies and authentication, another could focus on stored values to make some sub-domain of the application work, a third would provide the text or image content, etc.
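
As a rough illustration of that shape, here is a standard-library sketch, not webapp2 itself. The `Handler` and `Response` classes and the method names are hypothetical stand-ins; the only thing borrowed from webapp2 is the idea that `self.request` and `self.response` are always there, so each part of the handler can contribute to the response without allocating it.

```python
class Response:
    """Toy response object that accumulates headers and body parts."""
    def __init__(self):
        self.headers = {}
        self.body = []

class Handler:
    """Toy base class (hypothetical): every handler gets a request
    and a ready-made response on self."""
    def __init__(self, request):
        self.request = request      # values that came in from the network
        self.response = Response()  # what will eventually be sent back

class People(Handler):
    def get(self, name):
        # Each step focuses on one concern and writes to the same response.
        self.check_session()
        self.write_content(name)
        return self.response

    def check_session(self):
        cookies = self.request.get("cookies", {})
        self.response.headers["X-Session"] = cookies.get("session", "anonymous")

    def write_content(self, name):
        self.response.body.append("Hello, " + name)
```

For example, `People({"cookies": {"session": "abc"}}).get("ada")` returns a response whose header and body were filled in by two separate methods, neither of which had to ask where the response object lives.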

In Flask, to get this same functionality, you need to try to find the appropriate place to allocate a response object: but where? So you need an init that gets called everywhere, and flags to see what you've done already ... it's a mess.

This is also morphological ... Flask scatters and brings forward a task that was a background assumption in webapp2.

There are many good, humane ideas that disappear in computing. Typically, they are rediscovered in some form in the future, since these good ideas, at least the morphological-psychological ones (morpho-psychological?), are based on what is comfortable for human perception and cognition.

But it would be nice if software developers across the board would advocate a bit harder for their "UX" comfort. This is one topic of a seminar I host at the Building Beauty school of architecture, which we call Beautiful Software. Computer people can study both nature and beauty, to improve understanding of their innate, natural sensitivities. It would change everything if they did: not only their code.

Monday, July 01, 2024

Google Datastore URL-safe keys: incompatibilities from db to ndb inaccessible to LLMs

This is a story about accidental, unnecessary platform migration incompatibilities, which turned into problems undiscovered by LLMs with access to the relevant corpus.

On Google App Engine, in Python 2.7 using webapp2 and db, I had to do this sort of thing, to get a list of the URL-safe keys for a set of entities:

return_string = ""
cities = db.GqlQuery("SELECT * FROM City ORDER BY name ASC")
for c in cities:
  return_string = return_string + str(c.key()) + ","
return return_string

Now, moving to Python 3, with Flask, and ndb:

return_string = ""
with client.context():
    cities = City.query().order(City.name).fetch()
    for c in cities:
        # to_legacy_urlsafe() returns bytes, so decode before concatenating
        return_string = (return_string +
            c.key.to_legacy_urlsafe(location_prefix="s~").decode('utf-8') + ",")
return return_string

... and I'm sure this won't be the end of the saga.

Datastore, which is now the Datastore mode of the Firestore database at Google, is accessed now through a library called ndb. But when you look at the database viewer on Google Console, the UUID that represents the data entity -- which is called the URL-safe key or reference -- is not the same as the value you extract normally using the programmatic interface of ndb. The numbers are just different. For the same entity.
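
A toy illustration of how that can happen. This is NOT the real Datastore key format; the sketch only assumes that a URL-safe key is a URL-safe base64 encoding of some serialized reference that includes the application ID, and that the "legacy" form prefixes that ID with a location such as `s~` (the prefix used in the code above). The function `toy_urlsafe` and the app ID `my-app` are hypothetical.

```python
import base64

def toy_urlsafe(app_id, kind, entity_id):
    """Encode a made-up key reference; a stand-in for the real serialization."""
    raw = "{}|{}|{}".format(app_id, kind, entity_id).encode("utf-8")
    # URL-safe base64 with the padding stripped
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

# Same kind and ID, but the app ID is written with and without a prefix.
plain = toy_urlsafe("my-app", "City", 42)
legacy = toy_urlsafe("s~my-app", "City", 42)

print(plain)
print(legacy)
print(plain == legacy)
```

Under these assumptions the two strings differ from the first characters onward, even though they point at the same entity, so a comparison by eye or by string equality fails.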

I was quite surprised that neither Google's AI (Gemini) nor OpenAI's ChatGPT 4o could make heads or tails of this problem. The code is Google's, and Google keeps it on GitHub, along with a long conversation thread about the problem by someone poking the datastore team until the "legacy" method was implemented. But Microsoft bought GitHub in order to feed all of this sort of information to OpenAI. So, why could it not solve the above problem?

Partly, it's because there was no real documentation of the problem: just a polite complaint that turned into a conversation which would be incomprehensible to an LLM without the context of actually building an application around this feature/bug, and having it fail. So, LLMs are still bound to have trouble from a lack of interaction with their fellow machines, and the lack of human narration for that interaction when it IS allowed.

In the meantime, their vast capacity for reading technical material cannot solve difficult problems like this for us, where the expectation (that a UUID would look the same no matter which library is looking at it) is just an obvious assumption, but not really described by anybody, hence inaccessible to an LLM.