A Boggling Python Import

Despite Python's apparent simplicity, I've always struggled with its import system. I'm not quite sure of the reason(s), but I simply find it more confusing than that of, say, C -- especially when __init__.py becomes involved. It may be because Python is the first programming language I learned well, and I didn't learn it by reading the documentation in a linear fashion. Instead, I jumped around a lot -- I learned the language rather organically, and I think Python lends itself to that, again, at least on the surface. But as I continue to use the language for more ambitious projects, I constantly find myself running into walls.

Recently, I wrote a tree-walk interpreter in Python for Lox, a programming language designed and explained thoroughly by Robert Nystrom in his excellent Crafting Interpreters. This is the largest Python code base I've written myself: It's roughly 2500 lines, including comments and documentation.

Whereas I chose to use Python, Nystrom presents his implementation in Java. As a result, the initial architecture of my implementation resembled a traditional Java application more than a Python program. It remains very much object-oriented.

I don't know Java, but it seems to be standard practice there to import or include the file that contains main() from other files in the application. In Python, on the other hand, importing the module that is also being run as the script poses risks.

When a user executes a Python file from the command line, Python runs it as a module named __main__ under the hood, hence the common idiom:

      
        if __name__ == "__main__":
            main()

The variable __name__ stores the name of the current module, as the identifier implies. The snippet above therefore means that Python executes main() if and only if the name of this module is __main__, that is, if the file was run directly rather than imported. I don't find this confusing, and it's explained in detail across the internet, be it on Stack Overflow, individual blogs, or the official Python documentation.

For clarity, the official documentation reads:

__main__ is the name of the environment where top-level code is run. "Top-level code" is the first user-specified Python module that starts running. It's "top-level" because it imports all other modules that the program needs. Sometimes "top-level code" is called an entry point to the application.
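To make that concrete, here is a minimal sketch that runs the same one-line file both ways. The file name demo.py and the helper run_both_ways() are made up for this illustration, and the sketch assumes the interpreter's default behavior of putting the working directory on the import path.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical one-line module, used only for this demonstration.
DEMO_SOURCE = "print(__name__)\n"


def run_both_ways():
    """Return what demo.py prints when run as a script and when imported."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "demo.py").write_text(DEMO_SOURCE)

        # Run directly: the file becomes the __main__ module.
        as_script = subprocess.run(
            [sys.executable, "demo.py"],
            cwd=tmp, capture_output=True, text=True, check=True,
        ).stdout.strip()

        # Imported from other top-level code: the module keeps its own name.
        as_import = subprocess.run(
            [sys.executable, "-c", "import demo"],
            cwd=tmp, capture_output=True, text=True, check=True,
        ).stdout.strip()

    return as_script, as_import


result = run_both_ways()
print(result)  # ('__main__', 'demo')
```

The same source file reports two different names depending on how it was started, which is all the if __name__ == "__main__" guard relies on.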

Now, that being said, I found the code below confusing. It contains that same snippet, but a second module also imports the module that contains main(). It resembles an issue I faced while writing Plox, my Lox interpreter written in Python, and it led to issues with state early in the process.

Given the following two (contrived) modules -- x.py and y.py, respectively -- try to predict the output, that is, the final state of z.

      x.py
      
        import y

        z = False

        def f():
            global z
            z = True

        def main():
            y.g()
            print(z)

        if __name__ == "__main__":
            main()

      y.py
      
        import x

        def g():
            x.f()

I initially expected the output to be True. However, running python x.py from the command line prints False.

A kind soul on Python's IRC channel explained why I was wrong. There are effectively two instances of module x, and thus two instances of any global variables in the module as well. Python creates the first instance when python x.py runs, and it names that instance __main__. Python caches loaded modules by name in sys.modules, and the running script is cached under __main__ rather than x. So when module y executes import x, Python finds no existing entry named x, executes x.py a second time, and names this second instance x.

g() in module y calls x.f(), the copy of f() that belongs to the second instance. A global statement refers to the globals of the module in which the function was defined, so that call modifies x.z, not the z that main() prints, which is __main__.z. This is the distinction. Both instances of z remain accessible: __main__.z (or simply z from within the script) and x.z.
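The two instances can be observed directly. The sketch below writes the two modules above into a temporary directory, with one change that is mine and not part of the original code: main() looks both module objects up in sys.modules and prints each z instead of printing only z. The helper run_x() is likewise made up for this illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# x.py from above, except that main() inspects sys.modules (my modification).
X_SOURCE = '''\
import sys
import y

z = False

def f():
    global z
    z = True

def main():
    y.g()
    main_mod = sys.modules["__main__"]
    x_mod = sys.modules["x"]
    print(main_mod is x_mod)  # are they the same module object?
    print(main_mod.z)         # the z that main() sees
    print(x_mod.z)            # the z that y.g() changed

if __name__ == "__main__":
    main()
'''

# y.py, unchanged.
Y_SOURCE = '''\
import x

def g():
    x.f()
'''


def run_x():
    """Run `python x.py` in a scratch directory and return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "x.py").write_text(X_SOURCE)
        (Path(tmp) / "y.py").write_text(Y_SOURCE)
        proc = subprocess.run(
            [sys.executable, "x.py"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
    return proc.stdout.split()


output = run_x()
print(output)  # ['False', 'False', 'True']
```

The first False says __main__ and x are distinct module objects; the second and third lines show that only x.z was flipped.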

All of this is to say: don't import the module that contains main(), or more precisely the module you run as the script, and (probably) avoid global variables, as they say.
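One possible restructuring, sketched under my own assumptions rather than taken from Plox: keep main() in an entry script that nothing else imports, and move the shared state into its own module. The file names main.py and state.py and the helper run_fixed() are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: the shared state lives in state.py, main() lives in
# main.py, and no module imports main.py.
FILES = {
    "state.py": '''\
z = False

def f():
    global z
    z = True
''',
    "y.py": '''\
import state

def g():
    state.f()
''',
    "main.py": '''\
import state
import y

def main():
    y.g()
    print(state.z)

if __name__ == "__main__":
    main()
''',
}


def run_fixed():
    """Run `python main.py` in a scratch directory and return what it prints."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in FILES.items():
            (Path(tmp) / name).write_text(source)
        proc = subprocess.run(
            [sys.executable, "main.py"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
    return proc.stdout.strip()


print(run_fixed())  # True
```

Here state is imported under the same name everywhere, so every module shares a single instance of z. The global variable is still there; only the duplicate module is gone.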