Tuesday, July 02, 2024

Morphology and Software Development

A new, popular library or framework isn't necessarily better than all the old ones. That's a corollary to this: success doesn't necessarily indicate the best quality. Typically, the new platform technology simply won: on a lumpy playing field against other technologies, perhaps through association with a performant platform, or via momentum from a community, a sponsor, or a monopoly. When Google Cloud Platform moved from webapp2 to Flask, it wasn't justified on scientific or engineering or psychological grounds. There was no attempt to make a case: Flask simply became more popular, and Google didn't properly fund webapp2. At the same time, for similar reasons, we had to move from Python 2 to 3, and from datastore db to ndb.

Let me demonstrate two ways in which webapp2 was better. These are morphological observations, or more precisely psychological observations about morphology. That is, I'm talking about the shapes we perceive when we look at our code, and the consequences for our minds. It's really the same approach we take to, for example, understanding how we feel when we walk down a beautiful street, or enter a pleasant town square. Those have a positive effect on our mental life for sometimes very simple reasons: a safer pedestrian environment, more interesting things to see at a human scale, trees, shade, water, a sense that people care, and a sense of peace. This kind of analysis is pretty common in the UX and design world, but for some reason, it's barely discussed in the world of software development environments. The "user" experience gets a lot of attention. The "developer as user" experience: not so much.

The first thing lost with webapp2 was the route table. Route tables exist in plenty of other programming environments, but they are now forgotten in Google's serverless Python environment. It used to look like this:

# handler blocks

# handler block for People
class People(webapp2.RequestHandler):
    # the group captured by '/people/(.*)' arrives as an extra argument
    def get(self, string):
        pass  # code
    def post(self, string):
        pass  # code

def people_get_function():
    pass  # code

def people_set_function():
    pass  # code

def other_people_functions_etc():
    ...

# lots of other handler blocks
...

# route table
app = webapp2.WSGIApplication([('/people/(.*)', People),
                               ('/neighborhoods/(.*)', Neighborhoods),
                               ('/cities/(.*)', Cities),
                               # ...
                               ])

The shape of this code lends itself to clusters of functionality. But equally important:

Do you need to look up how you handle people? Look at the route table.

It's like the table of contents in a book. It lets you find the chapter about people, neighborhoods, cities, etc. ... no matter how idiosyncratically you mapped the world onto your program.

Instead, the approximate equivalent in Flask looks like this:

@app.route('/people')
def people_response():
    pass  # code

@app.route('/neighborhood')
def neighborhood_response():
    pass  # code

@app.route('/city')
def city_response():
    pass  # code

At first blush, this puts everything, even the external API presentation or URL endpoints, in the same place as the functionality. But it's atomizing. In a complex application, you'll now be looking through all of your code for the route, which is a conceptual pointer to an idea, whose name you might not even remember, and it could be in a large file or many small ones. That was something you didn't need to do before. You simply needed to look through your route table, which is small, to re-orient yourself in your application, like an index or overview. The orienting qualities of such a global shape are obvious, but obviously not well enough appreciated. In Flask now, you're forced to read everything to find what you want. Did you separate everything into different files? Then you'd have to read all the file names, which are ordered alphabetically, which is no order at all when you can't remember what word you're looking for. It's an unnecessary tax on memory. When you have many such applications to build and maintain, you'd have no easy way to "get back up to speed" about your own code, which you might not have looked at for months, because you can no longer lean on a simple, helpful shape.

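
Nothing in Flask's decorator idiom encourages it, but the table shape can be partly recovered by hand, because Flask also exposes add_url_rule, the non-decorator form of @app.route. What follows is only a minimal sketch: the names ROUTES and register_routes, and the placeholder view functions, are hypothetical and not part of Flask or of any application discussed here.

```python
# Hypothetical view functions. In a real application these would sit next
# to their helper functions, like the webapp2 handler blocks.
def people_response():
    return "people"


def neighborhood_response():
    return "neighborhoods"


def city_response():
    return "cities"


# The route table: one small, readable index of the whole application.
ROUTES = [
    ("/people", people_response),
    ("/neighborhood", neighborhood_response),
    ("/city", city_response),
]


def register_routes(app, routes):
    """Attach every (URL rule, view function) pair to a Flask-style app."""
    for rule, view in routes:
        # add_url_rule does what the @app.route decorator does
        app.add_url_rule(rule, endpoint=view.__name__, view_func=view)


# Usage, assuming Flask is installed:
#   from flask import Flask
#   app = Flask(__name__)
#   register_routes(app, ROUTES)
```

This keeps the index in one place, but it remains a convention you have to impose on yourself. The framework does not lead you there.
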
Here's a second advantage webapp2 has over Flask. Flask has no built-in class model for the primary network commitment of every server-side application: to build a response to a request.

In webapp2, there was a "self". In this self are all the values that came from the internet. This is also where, in the course of the application, you'll put all of the resources that you'll pass back in your response. This was a simple idea: one part of your server app could focus on handling cookies and authentication, another could focus on stored values to make some sub-domain of the application work, a third would provide the text or image content, etc.
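
To make that shape concrete, here is a toy sketch in plain Python. It is not webapp2's actual code: the Response and Handler classes, the dictionary-style request, and the method names are all hypothetical stand-ins, meant only to show how one "self" carries both what came in and what goes out.

```python
class Response:
    """Accumulates everything that will be sent back to the client."""

    def __init__(self):
        self.status = 200
        self.headers = {}
        self.body = []

    def write(self, text):
        self.body.append(text)


class Handler:
    """Toy stand-in for a request handler: one request in, one response out."""

    def __init__(self, request):
        self.request = request      # the values that came from the internet
        self.response = Response()  # the resources to pass back

    def authenticate(self):
        # one part of the app focuses on cookies and authentication
        user = self.request.get("cookies", {}).get("user", "guest")
        self.response.headers["Set-Cookie"] = "user=" + user
        return user

    def render(self, user):
        # another part provides the content
        self.response.write("Hello, " + user)

    def get(self):
        self.render(self.authenticate())
        return self.response
```

Each method adds its piece to self.response without needing to know who allocated it, or whether anyone else has touched it yet.
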

In Flask, to get this same functionality, you need to try to find the appropriate place to allocate a response object: but where? So you need an init that gets called everywhere, and flags to see what you've done already ... it's a mess.

This is also morphological ... Flask scatters and brings forward a task that was a background assumption in webapp2.

There are many good, humane ideas that disappear in computing. Typically, they are rediscovered in some form in the future, since these good ideas, at least the morphological-psychological ones (morpho-psychological?), are based on what is comfortable for human perception and cognition.

But it would be nice if software developers across the board would advocate a bit harder for their "UX" comfort. This is one topic of a seminar I host at the Building Beauty school of architecture, which we call Beautiful Software. Computer people can study both nature and beauty, to improve understanding of their innate, natural sensitivities. It would change everything if they did: not only their code.

Monday, July 01, 2024

Google Datastore URL-safe keys: incompatibilities from db to ndb inaccessible to LLMs

This is a story about accidental, unnecessary platform migration incompatibilities, which turned into problems undiscovered by LLMs with access to the relevant corpus.

On Google App Engine, in Python 2.7 using webapp2 and db, I had to do this sort of thing, to get a list of the URL-safe keys for a set of entities:

return_string = ""
cities = db.GqlQuery("SELECT * FROM City ORDER BY name ASC")
for c in cities:
  return_string = return_string + str(c.key()) + ","
return return_string

Now, moving to Python 3, with Flask and ndb:

return_string = ""
with client.context():
    cities = City.query().order(City.name).fetch()
    for c in cities:
        # to_legacy_urlsafe() returns bytes, so decode before concatenating
        return_string = (return_string
                         + c.key.to_legacy_urlsafe(location_prefix="s~").decode('utf-8')
                         + ",")
return return_string

... and I'm sure this won't be the end of the saga.

Datastore, which is now the Datastore mode of the Firestore database at Google, is accessed now through a library called ndb. But when you look at the database viewer on Google Console, the UUID that represents the data entity -- which is called the URL-safe key or reference -- is not the same as the value you extract normally using the programmatic interface of ndb. The numbers are just different. For the same entity.
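
One way to keep the workaround in a single place is a small helper. This is a sketch, not a recommendation from Google: the function name legacy_urlsafe_keys is hypothetical, and it assumes only what the loop above already relies on, namely that each entity has a key with ndb's to_legacy_urlsafe(location_prefix=...) method returning bytes. It preserves the trailing comma of the original loop.

```python
def legacy_urlsafe_keys(entities, location_prefix="s~"):
    """Return the entities' legacy URL-safe keys, each followed by a comma.

    Assumes every entity has a .key whose to_legacy_urlsafe() returns bytes.
    """
    return "".join(
        e.key.to_legacy_urlsafe(location_prefix=location_prefix).decode("utf-8") + ","
        for e in entities
    )


# Usage inside an ndb context, mirroring the snippet above:
#   with client.context():
#       return legacy_urlsafe_keys(City.query().order(City.name).fetch())
```
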

I was quite surprised that neither Google's AI (Gemini) nor OpenAI's ChatGPT 4o could make heads or tails of this problem. The code is Google's, and Google keeps it on GitHub, along with a long conversation thread about the problem by someone poking the datastore team until the "legacy" method was implemented. But Microsoft bought GitHub in order to feed all of this sort of information to OpenAI. So, why could it not solve the above problem?

Partly, it's because there was no real documentation of the problem: just a polite complaint that turned into a conversation which would be incomprehensible to an LLM without the context of actually building an application around this feature/bug, and having it fail. So, LLMs are still bound to have trouble from a lack of interaction with their fellow machines, and the lack of human narration for that interaction when it IS allowed.

In the meantime, their vast capacity for reading technical material cannot solve difficult problems like this for us, where the expectation (that a UUID would look the same no matter which library is looking at it) is just an obvious assumption, but not really described by anybody, hence inaccessible to an LLM.

Saturday, April 06, 2024

Python 2.7 -> 3.8 on App Engine. Ordered checklist with an unfolding stub.


The context here is a kind of "resource pump" webapp, a common enough migration of early static web sites, during the mid-web-2.0 era (say 2008), to Python 2 with WSGI on Google App Engine. The simple Python program does very little, allowing the app.yaml file to define the work, serving folders full of static files.

So, how do we move these sites to App Engine with Python 3.8 and Flask 2? 

Here's an ordered checklist, that is, a set of instructions with ordered dependencies. 

That's also typical of unfolding sequences in software, but in the latter, there tends to be more creativity and judgment involved ... and the steps are not instructions ... instead they're helpful issues to consider at that moment. 

Most of the steps below simply need to happen, in that order. So they are instructions. 

There's only one somewhat creative step here (5) ... yet it highlights the point where more steps might be written and inserted, to serve a wider range of migrations.

  1. I assume this is a directory with git source control.


    If so:

    git commit -a -m 'starting migration from python 2.7' 

     If not: 

    git init   
    git add app.yaml  
    git commit -a -m 'starting source control'

  2. mkdir templates

  3. mv index.html templates

    and git add templates/index.html

  4. add requirements.txt 

    and git add requirements.txt

  5. add main.py (if there's a route table in the WSGI version, move the routes to Flask. This is the one creative task. It's kind of a stub: this is where all the creative tasks in a sequence would go, to serve a greater range of programs.)

    and git add main.py

  6. change the app.yaml head

  7. change the app.yaml tail 

    and git commit -m 'first python 3.8 changes'

  8. (if it makes sense, set up the local test environment)

  9. (if it makes sense, run gunicorn & test) Note that gunicorn does not use app.yaml, so your mileage may vary in using this local test environment. If you don't need to debug the server-side Python, see deployment test steps 11-13.

  10. (if you created a virtual environment, add the <project env> directory (see below) to .gcloudignore)

  11. gcloud app deploy --project <migrating site> --no-promote

  12. Go to (cloud console->app engine -> versions), find the new version, launch and test

  13. Is the test good? Select the new version and click “migrate traffic”.
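
Steps 2 through 4 of the checklist are mechanical enough to script. Here is a minimal sketch, assuming the site directory holds a top-level index.html. The function name prepare_site is hypothetical, and the git add commands are still left to do by hand.

```python
from pathlib import Path
import shutil


def prepare_site(site_dir, requirements_text):
    """Steps 2-4: make templates/, move index.html there, write requirements.txt."""
    site = Path(site_dir)
    templates = site / "templates"
    templates.mkdir(exist_ok=True)                                # step 2

    index = site / "index.html"
    if index.exists():
        shutil.move(str(index), str(templates / "index.html"))   # step 3

    (site / "requirements.txt").write_text(requirements_text)    # step 4
    return templates / "index.html"


# Usage, from the directory above the site:
#   prepare_site("<migrating site>", "Flask==2.2.2\n")
```
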

If you want to set up a local test environment (again, useful if there's more server code to test):

virtualenv -p python3.8.2 <project env>

source ./<project env>/bin/activate

pip install -r requirements.txt

(or

pip install gunicorn

pip install flask

pip install google-cloud-datastore

pip list

)

gunicorn -b :8080 main:app

(test in browser at localhost:8080)

^c

deactivate


old app.yaml head:

runtime: python27

api_version: 1

threadsafe: false


new app.yaml head:

runtime: python38

app_engine_apis: true


old app.yaml tail:

- url: /.*

  script: <migrating site>.app

  secure: always

  redirect_http_response_code: 301


new app.yaml tail:

- url: /.*

  script: auto

  secure: always

  redirect_http_response_code: 301
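
If there are many such sites to migrate, the head and tail changes above (steps 6 and 7) can be applied with a few lines of text processing. This is a sketch under stated assumptions: the function name migrate_app_yaml is hypothetical, it treats app.yaml as plain text rather than parsing YAML, and it handles only the settings shown in this post.

```python
import re


def migrate_app_yaml(text):
    """Rewrite a python27 app.yaml: new head (step 6) and new tail (step 7)."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("runtime:"):
            out.append("runtime: python38")
            out.append("app_engine_apis: true")
        elif stripped.startswith(("api_version:", "threadsafe:")):
            continue  # python27-only settings, dropped from the new head
        elif re.match(r"\s*script:\s*\S+\.app\s*$", line):
            # keep the original indentation, replace the module reference
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + "script: auto")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Usage:
#   from pathlib import Path
#   path = Path("app.yaml")
#   path.write_text(migrate_app_yaml(path.read_text()))
```

Review the result before committing. Anything in the file that this post doesn't mention is passed through untouched.
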


new requirements.txt:

Flask==2.2.2

google-cloud-datastore==2.7.0

appengine-python-standard>=1.0.0

google-auth==2.17.1

google-auth-oauthlib==1.0.0

google-auth-httplib2==0.1.0

werkzeug==2.2.2

And here's the one creative step in this checklist's sequence: migrating the route table. It's only a stub for further creative-and-judged unfolding steps, if one is migrating server-side application logic:

new main.py:

from flask import Flask, render_template, request

app = Flask(__name__)


@app.route('/')
@app.route('/endpoint_one')
@app.route('/endpoint_two')
def root():
    # NB: index.html must be in /templates
    return render_template('index.html')


if __name__ == '__main__':
    app.run()


old <migrating site>.py:

# universal index.html delivery
# in python27 as a service
# on Google App Engine

import cgi
import os
import webapp2
from google.appengine.ext.webapp import template
from google.appengine.api import users


class MainPage(webapp2.RequestHandler):
    def get(self):
        template_values = {}
        path = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(template.render(path, template_values))


app = webapp2.WSGIApplication(
    [('/', MainPage),
     ('/endpoint_one', MainPage),
     ('/endpoint_two', MainPage)],
    debug=True)


Friday, March 15, 2024

Broken or erratic or unreliable visual editor (emacs, vim, etc.) over ssh in Mac terminal?

This is a rather specific problem. 

But I couldn't find mention of it anywhere.

The macOS terminal implementation has difficulty when the window, or more precisely the amount of data stored in the terminal process, gets very large ...

... say you've been using it for days, and lots of output has scrolled up, but you haven't opened a new terminal window ... maybe because you want to look at what you've done already. You could export it, but then you'd have to think about where to put that exported data. 

Mac terminal tends to be a bit greedy for RAM, and if you have lots of these terminal windows, managing lots of projects, you may see some performance degradation.

... but, also, you may see some actual problems.

For example, if one of these terminals is connected to a remote host using ssh, and you start to use a visual editor (say, emacs, vi, or vim) on the remote machine, the editor might start to make errors, and become essentially unusable. The remote editor has expectations for terminal text control signals, and Mac terminal, unable to work fast enough or buffer signals reliably because of the large volume of text in the current process, simply fails to stay in sync.

So, you're not crazy. Not everything in computing is deterministic, especially when networking is involved. (And yes, Apple could and should fix this problem.)

For now, simply log out of the remote host, export the terminal text, close the terminal, and open a new one.