
Points from Dive Into Python: Part 2

I finished reading Dive Into Python some days back. This blog post continues my previous blog post. I had already used a lot of the constructs described from chapter 7 onwards, be it regular expressions in Python or modules like urllib, so there is not much that I have picked from those chapters.

The reason for writing this blog post is the same as the one I mentioned in my previous blog post.

 

Namespaces in Python

  • At any particular point in a Python program, there are several namespaces available. Each function has its own namespace, called the local namespace, which keeps track of the function’s variables, including function arguments and locally defined variables. Each module has its own namespace, called the global namespace, which keeps track of the module’s variables, including functions, classes, any other imported modules, and module-level variables and constants. And there is the built-in namespace, accessible from any module, which holds built-in functions and exceptions.
  • When a line of code asks for the value of a variable x, Python will search for that variable in all the available namespaces, in order:
    1. local namespace – specific to the current function or class method. If the function defines a local variable x, or has an argument x, Python will use this and stop searching.
    2. global namespace – specific to the current module. If the module has defined a variable, function, or class called x, Python will use that and stop searching.
    3. built-in namespace – global to all modules. As a last resort, Python will assume that x is the name of a built-in function or variable.
  • If Python doesn’t find x in any of these namespaces, it gives up and raises a NameError with a message such as name 'x' is not defined.
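A small sketch of the lookup order described above, with made-up names for illustration. It assumes Python 3, where the built-in namespace is exposed as the builtins module (the book targets Python 2, where the same module is called __builtin__).

```python
import builtins

x = "global x"  # stored in this module's global namespace


def uses_argument(x="argument x"):
    # The argument x lives in the local namespace, so it wins over the global x.
    return x


def uses_global():
    # No local x exists here, so the lookup falls through to the global namespace.
    return x


def uses_builtin():
    # len is neither local nor global, so it is found in the built-in namespace.
    return len is builtins.len


def uses_missing():
    # A name found in no namespace at all raises NameError.
    try:
        return undefined_name  # deliberately never defined
    except NameError as exc:
        return str(exc)


print(uses_argument())  # argument x
print(uses_global())    # global x
print(uses_builtin())   # True
print(uses_missing())   # name 'undefined_name' is not defined
```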

Difference between from module import and import module

  • Remember the difference between from module import and import module? With import module, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access any of its functions or attributes: module.function. But with from module import, you’re actually importing specific functions and attributes from another module into your own namespace, which is why you access them directly without referencing the original module they came from.
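To make the difference concrete, here is a minimal sketch using the standard math module; any module would do, and math is just a convenient choice.

```python
# "import module": only the name "math" is bound in this namespace,
# so its functions must be reached through the module name.
import math

# "from module import name": sqrt itself is bound in this namespace,
# so it can be called without the "math." prefix.
from math import sqrt

print(math.sqrt(16))         # 4.0
print(sqrt(16))              # 4.0
# Both names refer to the very same function object.
print(sqrt is math.sqrt)     # True
# floor was not imported by name, so it is only reachable as math.floor.
print("floor" in globals())  # False
print(math.floor(2.7))       # 2
```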

 

Packages in Python

  • A package is a directory with the special __init__.py file in it. The __init__.py file defines the attributes and methods of the package. It doesn’t need to define anything; it can just be an empty file, but it has to exist. If __init__.py doesn’t exist, the directory is just a directory, not a package, and it can’t be imported or contain modules or nested packages. (That is the Python 2 behavior the book describes; since Python 3.3, a directory without __init__.py can still be imported as a namespace package.)
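A rough sketch of the package layout described above, built in a temporary directory so it can run anywhere. The names mypkg, helpers, VERSION and greet are hypothetical, chosen only for illustration.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a throwaway package on disk (hypothetical names):
#   <tmp>/mypkg/__init__.py
#   <tmp>/mypkg/helpers.py
root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "mypkg"
pkg_dir.mkdir()

# __init__.py may be empty; here it defines one package-level attribute.
(pkg_dir / "__init__.py").write_text('VERSION = "1.0"\n')
(pkg_dir / "helpers.py").write_text("def greet(name):\n    return 'Hello, ' + name\n")

# Put the parent directory on the import path and clear stale finder caches.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

mypkg = importlib.import_module("mypkg")            # runs mypkg/__init__.py
helpers = importlib.import_module("mypkg.helpers")  # a module inside the package

print(mypkg.VERSION)            # 1.0
print(helpers.greet("Python"))  # Hello, Python
```

The temporary directory is left on disk for simplicity; real code would lay the package out in the project tree instead of generating it at run time.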

 

Before Unicode

  • Before Unicode, there were separate character encoding systems for each language, each using the same numbers (0-255) to represent that language’s characters. Some languages (like Russian) have multiple conflicting standards about how to represent the same characters; other languages (like Japanese) have so many characters that they require multiple-byte character sets. Exchanging documents between systems was difficult because there was no way for a computer to tell for certain which character encoding scheme the document author had used; the computer only saw numbers, and the numbers could mean different things. Then think about trying to store these documents in the same place (like in the same database table); you would need to store the character encoding alongside each piece of text, and make sure to pass it around whenever you passed the text around. Then think about multilingual documents, with characters from multiple languages in the same document. (They typically used escape codes to switch modes; poof, you’re in Russian koi8-r mode, so character 241 means this; poof, now you’re in Mac Greek mode, so character 241 means something else. And so on.) These are the problems which Unicode was designed to solve.
  • To solve these problems, Unicode represents each character as a 2-byte number, from 0 to 65535. (This is a simplification: Unicode has since grown past 65535, and code points now run up to 0x10FFFF.) Each number represents a unique character used in at least one of the world’s languages. (Characters that are used in multiple languages have the same numeric code.) There is exactly 1 number per character, and exactly 1 character per number, so Unicode data is never ambiguous. Of course, there is still the matter of all these legacy encoding systems. 7-bit ASCII, for instance, stores English characters as numbers ranging from 0 to 127. (65 is capital “A”, 97 is lowercase “a”, and so forth.) English has a very simple alphabet, so it can be completely expressed in 7-bit ASCII. Western European languages like French, Spanish, and German all use an encoding system called ISO-8859-1 (also called “latin-1”), which uses the 7-bit ASCII characters for the numbers 0 through 127, but then extends into the 128-255 range for characters like n-with-a-tilde-over-it (241) and u-with-two-dots-over-it (252). Unicode uses the same characters as 7-bit ASCII for 0 through 127 and the same characters as ISO-8859-1 for 128 through 255, and then extends from there into characters for other languages with the remaining numbers, 256 through 65535.
  • To create a unicode string instead of a regular ASCII string, add the letter “u” before the string. A unicode string doesn’t have to contain any non-ASCII characters, and that’s fine: Unicode is a superset of ASCII (a very large superset at that), so any regular ASCII string can also be stored as unicode. When printing a string, Python will attempt to convert it to your default encoding, which is usually ASCII. Since a unicode string made up only of ASCII characters converts cleanly, printing it has the same result as printing a normal ASCII string; the conversion is seamless, and if you didn’t know the string was unicode, you’d never notice the difference. (This is Python 2 behavior; in Python 3 every str is a Unicode string, and the u prefix is accepted but redundant.)
  • The real advantage of Unicode, of course, is its ability to store non-ASCII characters, like the Spanish “ñ” (n with a tilde over it). The Unicode code point for the tilde-n is 0xf1 in hexadecimal (241 in decimal), which you can type inside a string literal as the escape sequence \xf1, for example u'\xf1'.
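A short sketch that checks the numbers quoted above. It assumes Python 3, where every str is already Unicode, so it uses ord() and str.encode() instead of the Python 2 unicode type; the \N{...} escapes just keep the source file pure ASCII.

```python
import sys

# In Python 3 the u prefix is accepted but makes no difference.
s = u"hello"
print(type(s).__name__, s == "hello")  # str True

# ASCII and Unicode agree on 0-127.
print(ord("A"), ord("a"))  # 65 97

# \xf1 is the escape for code point 0xf1 (241), the Spanish n with a tilde.
tilde_n = "\xf1"
print(tilde_n == "\N{LATIN SMALL LETTER N WITH TILDE}", ord(tilde_n))  # True 241

# Latin-1 (ISO-8859-1) and Unicode agree on 128-255, so these characters
# encode to a single Latin-1 byte with the same value.
print(ord("\N{LATIN SMALL LETTER U WITH DIAERESIS}"))  # 252
print(tilde_n.encode("latin-1"))  # b'\xf1'

# Modern Unicode is not capped at 65535.
print(hex(sys.maxunicode))  # 0x10ffff
```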

 

 

Command line arguments in Python

  • The first thing to know about sys.argv is that its first element, sys.argv[0], is the name of the script you’re calling.
  • Command-line arguments are separated by spaces, and each shows up as a separate element in the sys.argv list.
  • Command-line flags, like --help, also show up as their own elements in the sys.argv list.
  • To make things even more interesting, some command-line flags themselves take arguments. For instance, a flag (-m) might take an argument (kant.xml). Both the flag itself and the flag’s argument are simply sequential elements in the sys.argv list. No attempt is made to associate one with the other; all you get is a list.
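Because sys.argv is just a list, pairing flags with their values is left to you. A minimal sketch of one way to do it with the standard getopt module; the command line below is hypothetical (argecho.py and notes.txt are made-up names), and a literal list stands in for sys.argv so the example runs the same everywhere.

```python
import getopt


def parse_args(argv):
    """Split an argv-style list into (script name, options, leftover arguments)."""
    # argv[0] is the script name; everything after it is a plain string.
    script, args = argv[0], argv[1:]
    # "hm:" declares -h (no value) and -m (takes a value); "help" declares --help.
    opts, rest = getopt.getopt(args, "hm:", ["help"])
    return script, opts, rest


# Shaped like sys.argv would be for:
#   python argecho.py --help -m kant.xml notes.txt
sample_argv = ["argecho.py", "--help", "-m", "kant.xml", "notes.txt"]
print(sample_argv[1:])  # ['--help', '-m', 'kant.xml', 'notes.txt']
print(parse_args(sample_argv))
# ('argecho.py', [('--help', ''), ('-m', 'kant.xml')], ['notes.txt'])
```

In a real script you would pass sys.argv itself; getopt raises getopt.GetoptError for an unknown flag or a missing flag value.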

 

My next reads for Python are:

  1. A Python Book: Beginning Python, Advanced Python, and Python Exercises by Dave Kuhlman.
  2. Think Python by Allen B. Downey.
  3. Problem Solving with Algorithms and Data Structures by Brad Miller and David Ranum.
  4. Maybe more.

 
