Despite Python's apparent simplicity, I've always struggled with its
import system. I'm not quite sure of the reason(s), but I simply find it
more confusing than that of, say, C -- especially when
__init__.py becomes involved. It may be because Python is
the first programming language I learned well, and I didn't learn it by
reading the documentation in a linear fashion. Instead, I jumped around a
lot -- I learned the language rather organically, and I think Python
lends itself to that, again, at least on the surface. But as I continue
to use the language for more ambitious projects, I constantly find myself
running into walls.
Recently, I wrote a tree-walk interpreter in Python for Lox, a programming language designed and explained thoroughly by Robert Nystrom in his excellent Crafting Interpreters. This is the largest Python code base I've written myself: It's roughly 2500 lines, including comments and documentation.
Whereas I chose to use Python, Nystrom presents his implementation in Java. As a result, the initial architecture of my implementation resembled a traditional Java application more so than a Python program. It still remains very much object oriented.
I don't know Java, but it seems standard there to import or
include the file that defines main() from other files in the
application. In Python, on the other hand, doing so poses risks.
When a user executes a Python module from the command line, Python
runs it under the module name __main__ rather than under its file
name, hence the common idiom:
if __name__ == "__main__":
    main()
The variable __name__ stores the name of the current module,
as the identifier implies. The snippet above therefore means that Python
executes main() if and only if the name of this module is __main__. I
don't find this confusing, and it's explained in detail across the
internet, be it on Stack Overflow, on individual blogs, or in the
official Python documentation.
For clarity, the official documentation reads:
__main__ is the name of the environment where top-level code is run. "Top-level code" is the first user-specified Python module that starts running. It's "top-level" because it imports all other modules that the program needs. Sometimes "top-level code" is called an entry point to the application.
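To make the difference concrete, here is a minimal sketch that runs the same file twice: once as a script and once through an import. The file name demo.py and the helper run() are hypothetical, invented only for this illustration, and the file is written to a temporary directory.

```python
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical one-line module, used only for this illustration.
SOURCE = 'print(f"__name__ is {__name__!r}")\n'


def run(args, cwd):
    """Run the current Python interpreter with `args` and return its stdout."""
    result = subprocess.run(
        [sys.executable, *args],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "demo.py").write_text(SOURCE)
    as_script = run(["demo.py"], tmp)            # executed directly
    as_import = run(["-c", "import demo"], tmp)  # imported by other code

print(as_script)  # __name__ is '__main__'
print(as_import)  # __name__ is 'demo'
```

The same source file reports a different __name__ depending on how Python reached it, which is all the idiom relies on.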
Now, that being said, I found the code below confusing. It contains that same snippet, but the module that contains main() is also imported by a separate module. It resembles a problem I faced while writing Plox, my Lox interpreter written in Python, which led to issues with state early in the process.
Given the following two (contrived) modules -- x.py and
y.py, respectively -- try to predict the output, that is, the final
state of z.
x.py
import y

z = False

def f():
    global z
    z = True

def main():
    y.g()
    print(z)

if __name__ == "__main__":
    main()
y.py
import x

def g():
    x.f()
I initially expected the output to be True. However, running
python x.py from the command line printed
False.
A kind soul on Python's IRC channel explained why I was wrong. There are
effectively two instances of module x, and thus two
instances of any global variables in the module as well. Python
creates the first instance of this module upon the execution of
python x.py; it names it __main__. Python
creates the second upon executing import x in module
y. It names this one x.
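The underlying idea, that one source file can back two independent module objects, can be sketched in a single process with importlib. This is an analogy rather than a replay of what happens with python x.py: the file name counter.py, the module names first_copy and second_copy, and the helper load_as() are all hypothetical.

```python
import importlib.util
import pathlib
import tempfile


def load_as(name, path):
    """Load the file at `path` as a fresh module object named `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp, "counter.py")  # hypothetical file name
    path.write_text("z = False\n\ndef f():\n    global z\n    z = True\n")
    first = load_as("first_copy", path)
    second = load_as("second_copy", path)

second.f()  # modifies only the globals of `second`

print(first is second)    # False
print(first.z, second.z)  # False True
```

Each load executes the file again and gives the resulting module its own global namespace, so the `global z` inside f() refers to whichever copy that particular f() belongs to.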
g() of module y calls f() in module x. Because
it calls x.f(), the function modifies x.z, not
the z that main() prints, which is __main__.z. This is the
distinction. It's possible to access each instance of z
separately as well: __main__.z -- or simply z from within
the script itself -- and x.z are both accessible.
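To check this, the sketch below writes the two modules to a temporary directory and runs x.py as a script. The only change to x.py is an assumption of mine for illustration: main() prints both variables, looked up through sys.modules, instead of printing z alone.

```python
import pathlib
import subprocess
import sys
import tempfile

X_SOURCE = '''\
import sys
import y

z = False

def f():
    global z
    z = True

def main():
    y.g()
    # Diagnostics added for this sketch; not part of the original x.py.
    print("__main__.z =", sys.modules["__main__"].z)
    print("x.z =", sys.modules["x"].z)

if __name__ == "__main__":
    main()
'''

Y_SOURCE = '''\
import x

def g():
    x.f()
'''

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "x.py").write_text(X_SOURCE)
    pathlib.Path(tmp, "y.py").write_text(Y_SOURCE)
    output = subprocess.run(
        [sys.executable, "x.py"],
        cwd=tmp, capture_output=True, text=True, check=True,
    ).stdout

print(output)
# __main__.z = False
# x.z = True
```

Both entries exist in sys.modules at the same time, and only the one named x was touched by f().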
All of this is to say, don't import the module that contains main() -- and (probably) avoid global variables, like they say.
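One possible way to follow that advice, sketched under assumptions rather than taken from Plox: move the shared state and f() into a module that is only ever imported, so that x.py holds nothing but main() and is never imported by anyone. The module name state.py is hypothetical. The sketch still uses a global variable to stay close to the original example.

```python
import pathlib
import subprocess
import sys
import tempfile

# Hypothetical layout: state.py owns z, x.py is only the entry point.
FILES = {
    "state.py": '''\
z = False

def f():
    global z
    z = True
''',
    "y.py": '''\
import state

def g():
    state.f()
''',
    "x.py": '''\
import state
import y

def main():
    y.g()
    print(state.z)  # look z up on the module, not via a copied name

if __name__ == "__main__":
    main()
''',
}

with tempfile.TemporaryDirectory() as tmp:
    for name, source in FILES.items():
        pathlib.Path(tmp, name).write_text(source)
    output = subprocess.run(
        [sys.executable, "x.py"],
        cwd=tmp, capture_output=True, text=True, check=True,
    ).stdout

print(output.strip())  # True
```

Because state is loaded exactly once, x.py and y.py share a single z. Note that main() reads state.z rather than using from state import z, since the latter would copy the value at import time and miss the later update.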