I am reading Dive into Python and have finished seven chapters so far. This blog post collects the points from the book that
- I liked.
- I may forget.
- carry a lot of meaning.
- tell the story of why a feature was introduced in a specific Python version.
- I felt like writing down in this blog post.
Some practice code samples can be found here.
Objects in Python
- Everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Even modules are objects. Almost everything has attributes and methods. All functions have a built-in attribute __doc__, which returns the doc string defined in the function’s source code.
- Different programming languages define “object” in different ways. In some, it means that all objects must have attributes and methods; in others, it means that all objects are subclassable. In Python, the definition is looser; some objects have neither attributes nor methods, and not all objects are subclassable. But everything is an object in the sense that it can be assigned to a variable or passed as an argument to a function.
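A minimal sketch of "functions are objects", assuming Python 3. The names greet and apply are made up for illustration; they are not from the book.

```python
def greet(name):
    """Return a friendly greeting for name."""
    return "Hello, " + name


def apply(func, value):
    """Call func on value; func is passed in like any other argument."""
    return func(value)


doc = greet.__doc__            # the doc string is an attribute of the function
alias = greet                  # a function can be assigned to another variable
result = apply(alias, "world") # ...and passed as an argument

print(doc)
print(result)
```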
Dictionaries in Python
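- Dictionaries are unordered in the Python versions the book covers. Since Python 3.7, a dictionary preserves the order in which its keys were inserted.
- Within a single dictionary, the values don’t all need to be the same type.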
- del lets you delete individual items from a dictionary by key.
- clear deletes all items from a dictionary.
- The set of empty curly braces signifies a dictionary without any items.
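The dictionary points above can be tried out with a small sketch like this one, assuming Python 3. The keys and values are invented sample data.

```python
# Values (and keys) in one dictionary may have different types.
d = {"server": "db.example.org", "port": 5432, 42: ["mixed", "types"]}

del d[42]               # delete a single item by key
after_del = dict(d)     # keep a snapshot so we can inspect it later

d.clear()               # delete every item
empty = {}              # empty curly braces: a dictionary with no items

print(after_del)
print(d == empty)
```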
List in Python
- A list is an ordered set of elements enclosed in square brackets.
- Negative list indices: li[-n] == li[len(li) - n].
- Reading the list from left to right, the first slice index specifies the first element you want, and the second slice index specifies the first element you don’t want. The return value is everything in between.
- li[:n] will always return the first n elements, and li[n:] will return the rest, regardless of the length of the list.
- li[:] is shorthand for making a complete copy of a list.
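Here is a short sketch of the indexing and slicing rules above, assuming Python 3. The list contents are sample data.

```python
li = ["a", "b", "mpilgrim", "z", "example"]
n = 2

negative_ok = li[-n] == li[len(li) - n]   # both expressions pick "z"

middle = li[1:3]      # starts at index 1, stops before index 3
head = li[:n]         # first n elements
tail = li[n:]         # everything after the first n elements
too_far = li[:10]     # a slice index past the end is not an error

copy = li[:]          # a new list with the same elements

print(middle, head, tail)
```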
Index in Python
- index finds the first occurrence of a value in the list and returns its index.
- If the value is not found in the list, Python raises an exception. This is notably different from most languages, which will return some invalid index. While this may seem annoying, it is a good thing, because it means your program will crash at the source of the problem, rather than later on when you try to use the invalid index.
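A small sketch of this behaviour, assuming Python 3, where the exception raised is ValueError. The list contents are sample data.

```python
li = ["a", "b", "new", "b"]

first_b = li.index("b")     # 1: only the first occurrence is reported

raised = False
try:
    li.index("missing")     # not in the list, so index raises ValueError
except ValueError:
    raised = True

print(first_b, raised)
```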
+, *, += with lists in Python
>>> li = ['a', 'b', 'mpilgrim']
>>> li = li + ['example', 'new']
>>> li
['a', 'b', 'mpilgrim', 'example', 'new']
>>> li += ['two']
>>> li
['a', 'b', 'mpilgrim', 'example', 'new', 'two']
>>> li = [1, 2] * 3
>>> li
[1, 2, 1, 2, 1, 2]
+ and extend() in Python
- The + operator returns a new (concatenated) list as a value, whereas extend only alters an existing list. This means that extend is faster, especially for large lists.
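The difference is easy to see by checking object identity. This sketch assumes Python 3 and does not measure speed; it only shows that + builds a new list while extend changes the existing one.

```python
li = [1, 2, 3]
original_id = id(li)

combined = li + [4, 5]   # a brand-new list; li itself is untouched here
li.extend([4, 5])        # modifies li in place and returns None

same_object = id(li) == original_id

print(combined, li, same_object)
```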
Tuple in Python
- A tuple is an immutable list. A tuple cannot be changed in any way once it is created.
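A quick sketch of what "immutable" means in practice, assuming Python 3. The tuple contents are sample data.

```python
t = ("a", "b", "mpilgrim")

assignment_failed = False
try:
    t[0] = "z"                      # tuples do not support item assignment
except TypeError:
    assignment_failed = True

has_append = hasattr(t, "append")   # tuples have no append method

print(assignment_failed, has_append)
```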
Classes in Python
- Classes can (and should) have doc strings too, just like modules and functions.
- __init__ is called immediately after an instance of the class is created. It would be tempting but incorrect to call this the constructor of the class. It’s tempting, because it looks like a constructor (by convention, __init__ is the first method defined for the class), acts like one (it’s the first piece of code executed in a newly created instance of the class), and even sounds like one (“init” certainly suggests a constructor-ish nature). Incorrect, because the object has already been constructed by the time __init__ is called, and you already have a valid reference to the new instance of the class. But __init__ is the closest thing you’re going to get to a constructor in Python, and it fills much the same role.
- The first argument of every class method, including __init__, is always a reference to the current instance of the class. By convention, this argument is always named self. In the __init__ method, self refers to the newly created object; in other class methods, it refers to the instance whose method was called. Although you need to specify self explicitly when defining the method, you do not specify it when calling the method; Python will add it for you automatically.
- __init__ methods can take any number of arguments, and just like functions, the arguments can be defined with default values, making them optional to the caller. In the book’s example, filename has a default value of None, which is the Python null value.
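A minimal sketch of these points, assuming Python 3. Track is a hypothetical class invented for this post; it is not the class used in the book.

```python
class Track:
    """Hold the filename of a music track (hypothetical example class)."""

    def __init__(self, filename=None):
        # The instance already exists here; __init__ only initializes it.
        self.filename = filename

    def describe(self):
        # self is listed in the definition but not passed by the caller.
        return "Track(%r)" % (self.filename,)


empty = Track()             # filename falls back to its default, None
named = Track("song.mp3")

print(empty.describe())
print(named.describe())
```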
When to use self and __init__
- When defining your class methods, you must explicitly list self as the first argument for each method, including __init__. When you call a method of an ancestor class from within your class, you must include the self argument. But when you call your class method from outside, you do not specify anything for the self argument; you skip it entirely, and Python automatically adds the instance reference for you. I am aware that this is confusing at first; it’s not really inconsistent, but it may appear inconsistent because it relies on a distinction (between bound and unbound methods) that you don’t know about yet.
- If you forget everything else, remember this one thing, because I promise it will trip you up: __init__ methods are optional, but when you define one, you must remember to explicitly call the ancestor’s __init__ method (if it defines one). This is more generally true: whenever a descendant wants to extend the behavior of the ancestor, the descendant method must explicitly call the ancestor method at the proper time, with the proper arguments.
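The sketch below shows what happens when the ancestor’s __init__ is and is not called, assuming Python 3. Base, Child and Forgetful are hypothetical names. Modern code often writes super().__init__(name) instead; the explicit form is used here because it matches the book’s description.

```python
class Base:
    def __init__(self, name):
        self.name = name


class Child(Base):
    def __init__(self, name, size):
        # Calling through the class: self must be passed explicitly.
        Base.__init__(self, name)
        self.size = size


class Forgetful(Base):
    def __init__(self, name):
        # Base.__init__ is never called, so self.name is never set.
        pass


child = Child("a.mp3", 10)       # called from outside: no self argument
forgetful = Forgetful("b.mp3")

print(child.name, child.size)
print(hasattr(forgetful, "name"))
```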
Garbage Collection in Python
- If creating new instances is easy, destroying them is even easier. In general, there is no need to explicitly free instances, because they are freed automatically when the variables assigned to them go out of scope.
- Memory leaks are rare in Python.
- The technical term for this form of garbage collection is “reference counting”. Python keeps a count of the references to every instance created.
- In previous versions of Python, there were situations where reference counting failed, and Python couldn’t clean up after you. If you created two instances that referenced each other (for instance, a doubly-linked list, where each node has a pointer to the previous and next node in the list), neither instance would ever be destroyed automatically because Python (correctly) believed that there is always a reference to each instance. Python 2.0 has an additional form of garbage collection called “mark-and-sweep” which is smart enough to notice this virtual gridlock and clean up circular references correctly.
I will write a separate blog post on this.
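Until then, here is a rough sketch of the circular reference case. It assumes CPython 3 and uses the standard gc and weakref modules; Node is a hypothetical class. Other Python implementations may free the objects at different times.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


was_enabled = gc.isenabled()
gc.disable()                  # keep the cycle collector from running mid-demo

a, b = Node(), Node()
a.other, b.other = b, a       # the two instances now reference each other
probe = weakref.ref(a)        # lets us check whether the first node still exists

del a, b                      # no variable refers to the nodes any more
alive_after_del = probe() is not None       # reference counts never reached zero

gc.collect()                  # the cycle collector finds and frees the pair
alive_after_collect = probe() is not None

if was_enabled:
    gc.enable()

print(alive_after_del, alive_after_collect)
```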
Function Overloading in Python
- Some languages support function overloading by argument list, and some by argument type. Python supports neither of these; it has no form of function overloading whatsoever.
- Methods are defined solely by their name, and there can be only one method per class with a given name. So if a descendant class has an __init__ method, it always overrides the ancestor __init__ method, even if the descendant defines it with a different argument list. And the same rule applies to any other method.
- Guido, the original author of Python, explains method overriding this way: “Derived classes may override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class, may in fact end up calling a method of a derived class that overrides it.”
- All methods in Python are effectively virtual.
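Both ideas can be sketched in a few lines, assuming Python 3. The names area, Base and Derived are made up for illustration.

```python
# No overloading: the second definition simply replaces the first.
def area(width):
    return width * width


def area(width, height):
    return width * height


one_arg_failed = False
try:
    area(3)                     # the one-argument version no longer exists
except TypeError:
    one_arg_failed = True


class Base:
    def greet(self):
        return "Hello from " + self.name()

    def name(self):
        return "Base"


class Derived(Base):
    def name(self, suffix=""):  # different argument list, still an override
        return "Derived" + suffix


message = Derived().greet()     # Base.greet ends up calling Derived.name

print(one_arg_failed, message)
```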
Special Class Methods in Python
- A special class method is one that you can call yourself, but that Python will also call for you when you use the right syntax.
- __repr__ is a special method that is called when you call repr(instance). The repr function is a built-in function that returns a string representation of an object. It works on any object, not just class instances. You’re already intimately familiar with repr and you don’t even know it. In the interactive window, when you type just a variable name and press the ENTER key, Python uses repr to display the variable’s value. Go create a dictionary d with some data and then call print(repr(d)) to see for yourself.
- __cmp__ is called when you compare class instances. In general, you can compare any two Python objects, not just class instances, by using ==. There are rules that define when built-in datatypes are considered equal; for instance, dictionaries are equal when they have all the same keys and values, and strings are equal when they are the same length and contain the same sequence of characters. For class instances, you can define the __cmp__ method and code the comparison logic yourself, and then you can use == to compare instances of your class and Python will call your __cmp__ special method for you. (This describes Python 2. Python 3 no longer uses __cmp__; you define __eq__ and the other rich comparison methods instead.)
- __len__ is called when you call len(instance). The len function is a built-in function that returns the length of an object. It works on any object that could reasonably be thought of as having a length. The len of a string is its number of characters; the len of a dictionary is its number of keys; the len of a list or tuple is its number of elements. For class instances, define the __len__ method and code the length calculation yourself, and then call len(instance) and Python will call your __len__ special method for you.
- __delitem__ is called when you call del instance[key], which you may remember as the way to delete individual items from a dictionary. When you use del on a class instance, Python calls the __delitem__ special method for you.
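A compact sketch of these special methods, assuming Python 3, so it uses __eq__ where the book uses __cmp__. Bag is a hypothetical dictionary-backed class invented for this example.

```python
class Bag:
    """A tiny dictionary-backed container (hypothetical example class)."""

    def __init__(self, **items):
        self.data = dict(items)

    def __repr__(self):            # called by repr(instance)
        return "Bag(%r)" % (self.data,)

    def __eq__(self, other):       # called by instance == other
        return isinstance(other, Bag) and self.data == other.data

    def __len__(self):             # called by len(instance)
        return len(self.data)

    def __delitem__(self, key):    # called by del instance[key]
        del self.data[key]


bag = Bag(a=1, b=2)
text = repr(bag)
same = bag == Bag(a=1, b=2)
del bag["a"]
size = len(bag)

print(text, same, size)
```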
Class attributes in Python
- Class attributes can be used as class-level constants, but they are not really constants. You can also change them.
- There are no constants in Python. Everything can be changed if you try hard enough. This fits with one of the core principles of Python: bad behavior should be discouraged but not banned. If you really want to change the value of None, you can do it, but don’t come running to me when your code is impossible to debug.
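A small sketch of a class attribute being changed, assuming Python 3. Counter is a hypothetical class. The sketch leaves None alone, because Python 3 rejects assignment to None as a syntax error.

```python
class Counter:
    count = 0                       # class attribute, shared by all instances

    def __init__(self):
        self.__class__.count += 1   # update the attribute on the class itself


first = Counter()
second = Counter()
total = Counter.count               # 2: one increment per instance created

Counter.count = 100                 # nothing stops you from reassigning it
seen_by_instance = first.count      # instances see the new class-level value

print(total, seen_by_instance)
```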
Private, Public in Python
- Unlike in most languages, whether a Python function, method, or attribute is private or public is determined entirely by its name.
- If the name of a Python function, class method, or attribute starts with (but doesn’t end with) two underscores, it’s private; everything else is public. Python has no concept of protected class methods (accessible only in their own class and descendant classes). Class methods are either private (accessible only in their own class) or public (accessible from anywhere).
- __setitem__ is a special method; normally, you would call it indirectly by using the dictionary syntax on a class instance, but it is public, and you could call it directly if you had a really good reason. However, __parse is private, because it has two underscores at the beginning of its name.
- Strictly speaking, private methods are accessible outside their class, just not easily accessible. Nothing in Python is truly private; internally, the names of private methods and attributes are mangled and unmangled on the fly to make them seem inaccessible by their given names. You can access the __parse method of the ABC class by the name _ABC__parse.
- Acknowledge that this is interesting, but promise to never, ever do it in real code. Private methods are private for a reason, but like many other things in Python, their privateness is ultimately a matter of convention, not force.
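For the curious, name mangling can be observed with a sketch like this, assuming Python 3. The ABC class and its __parse method follow the names used above; the method bodies are invented.

```python
class ABC:
    def __parse(self):              # stored on the class as _ABC__parse
        return "parsed"

    def run(self):
        return self.__parse()       # inside the class the plain name works


obj = ABC()

direct_call_failed = False
try:
    obj.__parse()                   # outside the class the name is not found
except AttributeError:
    direct_call_failed = True

mangled = obj._ABC__parse()         # works, but please don't do this in real code

print(direct_call_failed, mangled, obj.run())
```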
Exception Handling in Python
- Python uses try…except to handle exceptions and raise to generate them. Java and C++ use try…catch to handle exceptions, and throw to generate them.
- A try…finally block is for cleanup: code in the finally block will always be executed, even if something in the try block raises an exception. Think of it as code that gets executed on the way out, regardless of what happened before.
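The order of execution can be traced with a sketch like this one, assuming Python 3. The function checked_divide and the events list are made up for illustration.

```python
events = []


def checked_divide(a, b):
    try:
        events.append("try")
        if b == 0:
            raise ValueError("b must not be zero")   # raise generates an exception
        return a / b
    finally:
        events.append("finally")    # runs on the way out, error or not


try:
    checked_divide(1, 0)
except ValueError as exc:           # except handles the exception
    events.append("except: " + str(exc))

result = checked_divide(10, 4)

print(events)
print(result)
```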
Modules in Python
- Modules, like everything else in Python, are objects. Once imported, you can always get a reference to a module through the global dictionary sys.modules.
- Every Python class has a built-in class attribute __module__, which is the name of the module in which the class is defined.
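A short sketch of both points, assuming Python 3. The standard json module is used only as a convenient example of a class defined in a module.

```python
import json
import os
import sys

# sys.modules maps module names to the already-imported module objects.
same_module = sys.modules["os"] is os

# __module__ names the module in which the class is defined.
module_name = json.JSONDecoder.__module__
owner = sys.modules[module_name]    # look that module up by its name

print(same_module, module_name)
```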
Getattr and hasattr in Python
- getattr gets a reference to an object’s attribute by name. hasattr is a complementary function that checks whether an object has a particular attribute; in the book’s example, whether a module has a particular class (although it works for any object and any attribute, just like getattr).
I will write a separate blog post on this.
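In the meantime, here is a minimal sketch of the pair, assuming Python 3. It uses the standard math module; the attribute names are chosen only for the example.

```python
import math

func_name = "sqrt"

root = None
if hasattr(math, func_name):            # does the module have this attribute?
    func = getattr(math, func_name)     # same object as math.sqrt
    root = func(16)

# A third argument to getattr is returned when the attribute is missing.
missing = getattr(math, "no_such_function", None)

print(root, missing)
```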
Listdir function in Python
- The listdir function takes a pathname and returns a list of the contents of the directory.
- listdir returns both files and folders, with no indication of which is which. You can use list filtering and the isfile function of the os.path module to separate the files from the folders. isfile takes a pathname and returns True if the path represents a file, and False otherwise (older Python versions returned 1 and 0). The book’s example uses os.path.join to ensure a full pathname, but isfile also works with a partial path, relative to the current working directory. You can use os.getcwd() to get the current working directory.
- os.path also has an isdir function which returns True if the path represents a directory, and False otherwise. You can use this to get a list of the subdirectories within a directory.
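A self-contained sketch of listing and filtering, assuming Python 3. It creates a throwaway directory with tempfile so it can run anywhere; the file and folder names are invented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny directory tree: one subdirectory and one file.
    os.mkdir(os.path.join(root, "albums"))
    with open(os.path.join(root, "song.mp3"), "w") as handle:
        handle.write("")

    names = sorted(os.listdir(root))    # files and folders, mixed together

    # os.path.join builds the full path that isfile and isdir need.
    files = [n for n in names if os.path.isfile(os.path.join(root, n))]
    dirs = [n for n in names if os.path.isdir(os.path.join(root, n))]

print(names, files, dirs)
```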
glob module in Python
- The glob module, on the other hand, takes a wildcard and returns the full path of all files and directories matching the wildcard. In the book’s example, the wildcard is a directory path plus “*.mp3”, which will match all .mp3 files. Note that each element of the returned list already includes the full path of the file.
- Suppose you have a music directory, with several subdirectories within it, with .mp3 files within each subdirectory. You can get a list of all of those with a single call to glob, by using two wildcards at once. One wildcard is the “*.mp3” (to match .mp3 files), and one wildcard is within the directory path itself, to match any subdirectory within c:\music. That’s a crazy amount of power packed into one deceptively simple-looking function!
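The two-wildcard trick can be sketched as follows, assuming Python 3. A temporary directory stands in for c:\music, and the subdirectory and file names are invented.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as music:
    # Two subdirectories with .mp3 files, plus one file that should not match.
    for sub, name in [("rock", "a.mp3"), ("jazz", "b.mp3"), ("jazz", "notes.txt")]:
        os.makedirs(os.path.join(music, sub), exist_ok=True)
        with open(os.path.join(music, sub, name), "w") as handle:
            handle.write("")

    # One wildcard for the subdirectory, one for the file name.
    pattern = os.path.join(music, "*", "*.mp3")
    matches = sorted(glob.glob(pattern))

    # Each match already includes the directory part of the pattern.
    all_under_music = all(m.startswith(music) for m in matches)
    relative = [os.path.relpath(m, music) for m in matches]

print(relative)
```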
I will write separate blog posts for the points I like in other chapters.