
Casual Python, Part 3


by Peter Kelly (critter)

Stuff

One of the many books that I read while trying to learn python was 'Learning Python,' by Mark Lutz. In the book, he wrote "Programs do things with stuff." That just about sums it up and brings it down to our level. So, what "stuff" and what "things"? The complete answer turns out to be "Just about anything with anything," but for now we shall have to be content with just doing normal things to the easy to understand stuff.

Python calls the 'stuff' that he talks about 'objects,' and python has a lot of objects. In fact, in python, everything is an object. To do the things it does with these objects, python generally uses functions or methods. There are many different types of objects, and python has a few 'core' types about which we really need to know if we are going any further. Each of the types has its own set of methods, and these we also need to be familiar with.

The core types can be listed as:

  • Numbers
  • Strings
  • Lists
  • Tuples
  • Dictionaries
  • Sets
  • Files

There are possibly more, but there is a bit of an overlap between objects, such as numbers and booleans, so the distinction is blurred. Our next project uses strings, lists and files, so I will describe the basics of those. Numbers are something we have already covered.


Strings

A string is an ordered sequence of characters. The characters may be alphanumeric, punctuation, whitespace, or anything defined by the Unicode standard. Don't worry too much about the last bit. If you can type it, then it is covered by Unicode (utf-8 is one common way of storing it), which is how the computer knows how to display it. All programming languages use strings. Some, such as C, also have a char type which holds a single character. Python does not. In python, that is a string of length one. Strings in python can be any length that will fit in available memory (but if you need a really huge string, consider using a file).

A string in python is an object of type str and some other objects may be converted to strings by using the str() function.

>>> str(11.75)
'11.75'
      # The float type has been converted to a string type.
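As a small sketch of the conversion described above (the variable names are only for illustration), str() can be applied to several kinds of object, and a single character is simply a string of length one:

```python
# Converting other objects to strings with str()
as_text = str(11.75)       # float -> '11.75'
count_text = str(42)       # int   -> '42'
flag_text = str(True)      # bool  -> 'True'

# There is no separate char type: one character is a str of length 1
letter = 'p'
print(type(as_text).__name__, type(letter).__name__, len(letter))
```

Going the other way, float('11.75') and int('42') turn suitable strings back into numbers.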

Strings are 'immutable,' which means resistant to mutation, i.e., they cannot be changed. That may seem to be a big disadvantage, but in fact, as you will see, it is just the opposite, and there are other ways to get new strings.

A string is usually referenced by assigning it to a variable, which really means giving it a name.

s = 'The python language'

Any time python is passed the variable s, it is immediately replaced by the text it represents. Since strings are immutable, you can be certain of what that will be. No other part of the program can have changed it. Strings can be chopped up by a process known as slicing, but the original is unchanged. Any slices that need to be kept will have to be given a new name (assigned to a variable). Slicing is done using square brackets containing a start position, a stop position and an optional step or 'stride', separated by colons. The character at the stop position is not included. If start or stop is omitted, then the beginning or end is assumed. The characters are indexed starting with 0, so that we have:

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
  T   h   e       p   y   t   h   o   n       l   a   n   g   u   a   g   e
-19 -18 -17 -16 -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1

s[0:6] ==> 'The py'
s[4:10] ==> 'python'
s[:10] ==> 'The python'
s[:10:2] ==> 'Tepto'     # from the beginning, up to but not including pos'n 10 with step 2
s[:] ==> 'The python language'

The last returns the whole string. With mutable sequences such as lists, the same slice is a quick way to make a copy.

Negative values count backwards from the end, starting at -1 (there can be no minus zero):

s[-3:] ==> 'age'
s[-12:-4] ==> 'hon lang'
s[:-9] ==> 'The python'

s still references 'The python language.' That cannot be changed. This takes practice, but is invaluable, so persevere.
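To practise, here is a short sketch that repeats some of the slices above and adds two more (the slice for 'language' and the reversed string are extra illustrations, not taken from the examples above). It also shows what happens if we try to change a character in place:

```python
s = 'The python language'

first_word = s[:3]         # 'The'
middle = s[4:10]           # 'python'
every_other = s[:10:2]     # 'Tepto'
tail = s[-8:]              # 'language'
backwards = s[::-1]        # a negative step walks from the end to the start

try:
    s[0] = 't'             # strings are immutable, so this is refused
except TypeError as err:
    refusal = str(err)

print(first_word, middle, every_other, tail)
print(backwards)
print(s)                   # the original is unchanged
```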

t = s

Both s and t reference the same string, 'The python language'

The way 'things' are done to strings is by methods, and there is a method for just about anything you would want to do to a string. Methods are called using 'dot notation':

s.upper() ==> 'THE PYTHON LANGUAGE'
s.find('python') ==> 4, the start index of the substring 'python'
s.replace('python', 'English') ==> 'The English language'
s.split() returns a list of strings ==> ['The', 'python', 'language']
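The same four methods can be tried in a script. This is only a sketch; the join call and the search for 'perl' are additions for illustration. Note that every method returns a new object and leaves s alone:

```python
s = 'The python language'

shouted = s.upper()
position = s.find('python')       # index where the substring starts
missing = s.find('perl')          # find returns -1 when nothing is found
swapped = s.replace('python', 'English')
words = s.split()                 # split on whitespace into a list
rejoined = '-'.join(words)        # join is the opposite of split

print(shouted, position, missing)
print(swapped, words, rejoined)
print(s)                          # still 'The python language'
```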

There are a lot of these methods, which are documented in the official documentation https://docs.python.org/3/, but you can get a good start by reading C H Swaroop's book that I mentioned earlier: https://python.swaroopch.com/.

When I covered numbers, I showed how to use python's built-in interpreter. However, there is a better version, which is available for installation from the repositories, named ipython. This is a straight replacement for the standard interpreter, but has many more features. The first difference that you will notice is that the prompt has changed from >>> to 'In [1]:' and is in color, green by default. When you enter an expression and press return, a new line is output beginning with 'Out[1]:' in red by default, which is followed by the output from the expression. Each time you enter a new expression, the number in the brackets is incremented. This is your command history. There are a whole lot more benefits to using this interpreter, some of which we will cover shortly.

Doing things with stuff is more generally referred to as 'data processing' and, when the stuff is text, 'text processing.' But, as the venerable Mr. Lutz pointed out, we are still only doing things with stuff. Although the string cannot be changed, the object that the variable s references can be, and it can be changed to any object type.

s = s[4:10] ==> 'python'

Now s references the new string 'python,' which was obtained by slicing. The original string is untouched so that any other variable names that referenced it, such as t, will not be inconvenienced. If no other references to it have been made, then it will be automatically removed from memory by a process known as 'garbage collection'.

s = 3.14159
s now references a floating point number (a decimal).
t ==> 'The python language'     # t is unchanged
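The difference between changing a string and changing what a name refers to can be checked with the is operator. A minimal sketch (the same_before and same_after names are only there to record the results):

```python
s = 'The python language'
t = s                       # two names, one string object
same_before = t is s        # True

s = s[4:10]                 # s now names a new string, 'python'
same_after = t is s         # False: t still names the original

s = 3.14159                 # s can even be pointed at a different type
print(t, type(s).__name__)
```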

Strings support concatenation and repetition by adding and multiplying.

t + ' is powerful' ==> 'The python language is powerful'
t[4:10] * 3 ==> 'pythonpythonpython'

This can be combined:

(t[4:10] + ' ') * 3 ==> 'python python python ' # spaces added.

The characters in strings have to be quoted using either single or double quotes. Which you use doesn't matter, but if a string contains a quote mark, then the other type should be used.

'He said "Hello" to the stranger'

"Python's syntax"

There is a third type of quoting in python, called triple quotes. These can be single or double, and allow strings to span multiple lines. They are commonly used in code documentation.

""" This function calculates the square of a number.
sq(x)
x is the number to be squared
returns the square of x """
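When a triple-quoted string is the first thing inside a function, python keeps it as the function's documentation (its 'docstring'). The sketch below wraps the text above in a function named sq. The body of the function is an assumption, since only the documentation is given here:

```python
def sq(x):
    """This function calculates the square of a number.

    sq(x)
    x is the number to be squared
    returns the square of x
    """
    return x * x            # assumed body: multiply the number by itself


print(sq(7))
print(sq.__doc__)           # the docstring is stored on the function object
```

This is the text that help(sq) displays in the interpreter.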

Using the interpreter (here I am using the ipython interpreter), type the following:

In [2]: dir(str)
Out[2]:
['__add__',
'__class__',
'__contains__',
...
... many more methods deleted
...
'rstrip',
'split',
'splitlines',
'startswith',
'strip',
'swapcase',
'title',
'translate',
'upper',
'zfill']

This gives a full list of the methods available for use with strings. You can ignore the ones that start with a double underscore for now. If you type, for example:

In [3]: help(str.s

Press the tab key and you will be prompted with the names of the methods that start with s:

In [3]: help(str.s
str.split str.strip
str.splitlines str.swapcase
str.startswith

Use the arrow keys to select one, close the parentheses and press return:

In [3]: help(str.strip)

Help on method_descriptor:

strip(self, chars=None, /)

Return a copy of the string with leading and trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

This is some basic help on the method. Press 'q' to return. Try it out.

In [4]: s = ' the end '

In [5]: s.strip()
Out[5]: 'the end'
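The help text also mentions the optional chars argument. A brief sketch of that, together with the related lstrip and rstrip methods that appear in the dir(str) listing (the sample strings are made up for illustration):

```python
s = '  the end  '

both = s.strip()                       # whitespace removed from both ends
left = s.lstrip()                      # only the leading whitespace
right = s.rstrip()                     # only the trailing whitespace

dots = '...the end...'.strip('.')      # remove a chosen character instead
mixed = 'xxyhelloyx'.strip('xy')       # any mix of the given characters

print(repr(both), repr(left), repr(right))
print(dots, mixed)
```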

This tab completion is a great help when typing expressions. The color coding extends to different object types, which is also helpful. Your history, but not the values in variables, is saved across sessions. Press the up arrow to go back.


Lists

In python, a list is an ordered collection of 'things', which are listed between square brackets and separated by commas. The 'things' can be any type of object, even other lists. Lists, unlike strings, are mutable: they can be changed in place.

list1 = ['The', 'python', 'programming', 'language'] # a list of strings
list2 = [42, 3.14159]                    # a list of two types of number
list3 = ['Linux', 17, 42]                # a mixed list
list4 = []                               # an empty list
list5 = list1                            # both list1 and list5 reference the same list
list1 == list5 ==> True                  # has the same value
list1 is list5 ==> True                  # references the same object

A single '=' is used for assignment, '==' is used to test for equality.

Lists are sliceable like strings.

list6 = list1[:]                         # list6 is a copy of list1
list6 == list1 ==> True                  # has the same value
list6 is list1 ==> False                 # reference different objects
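Why the difference between a second name and a copy matters shows up as soon as the list is changed. A minimal sketch (the names alias and duplicate are chosen just for this example):

```python
list1 = ['The', 'python', 'programming', 'language']
alias = list1               # a second name for the same list
duplicate = list1[:]        # a new list holding the same items

list1.append('rocks')       # change the list in place

print(alias)                # the change is visible through the other name
print(duplicate)            # the copy is unaffected
```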

As with strings, concatenation and repetition are supported. The result of these operations is a single new list, which must be assigned to a variable to be kept. The original list(s) are not changed. The multiplier in repetition must be an integer.

list1 = [7, 3]

In [6]: list1 + list1
Out[6]: [7, 3, 7, 3]

In [7]: list1 * 3
Out[7]: [7, 3, 7, 3, 7, 3]

In [8]: list1
Out[8]: [7, 3]

In [9]: list1 * list1
----------------------------------------------------------------
TypeError                          Traceback (most recent call last)
<ipython-input-9-86e343e4b412> in <module>
----> 1 list1 * list1

TypeError: can't multiply sequence by non-int of type 'list'

Lists are also indexable. Going back to the original list1 and list3 defined above:

list1[1] ==> 'python'
list3[2] ==> 42

Elements can be 'popped', that is, removed and returned, either from the end or from a given index.

list1.pop(2) ==> 'programming'
list1.pop() ==> 'language'
list1 is now ['The', 'python']

They can also be cleared.

list3.clear() ==> []

Elements can be appended.

list2.append(1.6e3) ==> [42, 3.14159, 1600.0]
                              # 1.6e3 is 1.6 * 10**3

Or inserted at a position before an index.

list2.insert(2, 4096) ==> [42, 3.14159, 4096, 1600.0]

And they can be sorted.

list2.sort() ==> [3.14159, 42, 1600.0, 4096]
list1.sort() ==> ['The', 'language', 'programming', 'python']

As lists are mutable, this sort is carried over to all variables that reference the list.

list5 ==> ['The', 'language', 'programming', 'python']

Immutable objects are not such a bad thing.
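One detail worth knowing when trying these in a script: append, insert, clear and sort change the list in place and hand back None, while pop hands back the item it removed. The sketch below assumes nothing beyond the lists used above, plus a few illustrative variable names:

```python
list2 = [42, 3.14159]
returned = list2.append(1.6e3)     # list2 changes, but append returns None
list2.insert(2, 4096)              # placed before index 2
last = list2.pop()                 # removes and returns the final item
list2.sort()

words = ['The', 'python', 'programming', 'language']
other_name = words                 # same list, second name
words.sort()                       # other_name sees the new order too

print(returned, last, list2)
print(other_name)
```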

The sort can be controlled by use of a key.

list6 ==> ['The', 'python', 'programming', 'language']
list6.sort(key=str.lower) ==> ['language', 'programming', 'python', 'The']

Here the string's lower method, which returns a lowercase version of the string, is used as the sort key.

This uses the list object's own sort method, but there is also a built-in function named sorted (functions like this are known, unsurprisingly, as 'builtins'). This can be used when the object type has no sort method of its own. sorted does not change the original object; it returns a new sorted list, which must be assigned to a variable name to be kept. The items to be sorted must be comparable with one another (all numbers, or all strings, for example), or you will get an error.

list5 = [99, 17, 42]
list5 = sorted(list5) ==> [17, 42, 99]

However,

list6 = [17, 'Linux', 9]
list6 = sorted(list6)
Traceback (most recent call last):
  File "list_sort.py", line 13, in <module>
    list6 = sorted(list6)
TypeError: '<' not supported between instances of 'str' and 'int'

You will get a lot of errors like this when you first start writing your own code. As we get more complex, I'll show how to find and deal with errors -- something even very experienced programmers have to do, so don't be discouraged.
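As a taste of dealing with such an error, here is a hedged sketch. It catches the TypeError rather than letting the script stop, and then sorts the mixed list by giving sorted a key that turns every item into a string first. Using key=str is just one possible choice for illustration, and it sorts the numbers as text, not by value:

```python
list6 = [17, 'Linux', 9]

try:
    sorted(list6)
except TypeError as err:
    message = str(err)               # keep the error text instead of crashing

by_text = sorted(list6, key=str)     # compare '17', 'Linux' and '9' as strings
numbers = sorted([3, 1.5, 2])        # ints and floats compare without trouble

print(message)
print(by_text, numbers)
print(list6)                         # sorted left the original list alone
```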


Files

Files are the permanent storage of the 'stuff' we are doing 'things' to. Usually, the storage is a hard drive partition on the local or a remote computer. Remotely stored files are typically on a networked computer, web based or on cloud storage systems but to python there is little difference.

To use a file you have to know its location, and have access rights appropriate for reading, writing, creating or deleting the file or directory. The file has to be opened in the correct mode for the action you wish to perform, and there are several modes available:

'r' is read mode, the default if not stated.

'w' is write.

'a' is append.

'b' is for binary or byte files.

't' is for text files. This is the default and is assumed if omitted, so it is rarely seen. 'wt' means write in text mode.

Adding a '+' e.g. 'r+' provides read and write mode, and if the mode is 'w' or 'a', then the file is created if it does not exist.

Caution! Write will overwrite whatever is already there. If that is not what you want, then use append. Without the 'b', files are opened in text mode. This is what we mostly want, but if you are working with binary data such as graphics or audio, the file must be opened in bytes mode. The two are not interchangeable, and using the wrong mode will result in garbage at best, and data loss at worst.

For now, we'll stay with text mode. When you have finished with the file, it must be closed. Usually, python will close any open files for you when the program terminates. But if the program crashes while the file is still open, there is a very real chance of data loss. To make our lives easier, python has something called a context manager that ensures that files are closed, even in the event of a crash.
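A small sketch of what the context manager does for us. Here a deliberately raised ValueError stands in for something going wrong part way through, and the file lives in a temporary directory so nothing of yours is touched (the file name demo.txt is made up):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

try:
    with open(path, mode='w') as f:
        f.write('some text\n')
        raise ValueError('simulated problem')   # something goes wrong here
except ValueError:
    pass

closed_after = f.closed     # True: the with block closed the file for us
print(closed_after)
```

The text written before the problem occurred is safely in the file.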

If we do something like:

with open('myfile.txt', mode='a+') as f:
    f.write('Adding to file...\n')

Write does not automatically add an end of line (\n), so we have to do it ourselves. If we run this code 5 times and then run this code:

with open('myfile.txt', mode='r') as f:
    print(f.read())

We will see:

Adding to file...
Adding to file...
Adding to file...
Adding to file...
Adding to file...

Alternatively, we could use the file object's readline method to read a single line.

with open('myfile.txt', mode='r') as f:
    print(f.readline(), end='')

Adding to file...

The end='' in the print statement is necessary, as the print function does add a newline character by default (to the one we added when writing the file), as we can see if we iterate over the file:

with open('myfile.txt', mode='r') as f:
    for line in f:
        print(line)

Adding to file...

Adding to file...

Adding to file...

Adding to file...

Adding to file...

This may seem strange, but when we write to a file, we want to control exactly what gets sent to the file. When we print out something, we usually want a newline, but can override this when necessary with the end= clause. When we used print(f.read()) earlier, the print function output what was in the file, and then added a final newline.
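Another way to avoid the doubled newlines is to strip the newline from each line as it is read. The sketch below also shows 'w' and 'a' side by side. It works in a temporary directory, and the file contents are invented for the example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')

with open(path, mode='w') as f:         # 'w' starts a fresh file
    f.write('line one\n')
with open(path, mode='a') as f:         # 'a' adds to the end
    f.write('line two\n')

with open(path, mode='r') as f:
    raw_lines = [line for line in f]                    # newlines kept
with open(path, mode='r') as f:
    clean_lines = [line.rstrip('\n') for line in f]     # newlines removed

for line in clean_lines:
    print(line)             # print supplies the one newline we want
```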

But what if....?

If a script could only execute one line at a time in the order they appeared, we would be severely restricted. Sometimes, decisions have to be made and, depending on the result of that decision, a different action taken. Occasionally, the same 'things' need to be done to a whole bunch of 'stuff'. This is called looping and branching, or program control.

Decision making can be controlled with the if statement and the optional elif and else statements. This takes the form:

if condition is true:
    do things to stuff

elif a different condition is true:
    do other things to stuff

else:
    do this

There is also a while statement that looks like this:

while condition is true:
    do things to stuff

This will continue doing things to stuff until the condition is no longer true.

There is also a break statement that jumps straight out of the loop, and a continue statement that cancels just the current iteration of the loop and continues with the next.

To do things to groups of stuff, we have the for statement that was used to iterate over the file in the example above.

for all of stuff in this collection of stuff:
    do these things
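Put into real python, the outlines above might look like the following sketch. The list of numbers and the labels are invented purely to exercise each statement:

```python
stuff = [3, 0, -2, 8, 100, 5]
labels = []

for item in stuff:
    if item == 100:
        break                   # leave the loop altogether
    if item == 0:
        continue                # skip this item, carry on with the next
    if item < 0:
        labels.append('negative')
    elif item > 5:
        labels.append('big')
    else:
        labels.append('small')

countdown = 3
ticks = []
while countdown > 0:            # repeats until the condition is false
    ticks.append(countdown)
    countdown -= 1

print(labels)
print(ticks)
```

The final 5 in the list is never reached, because break ends the loop when 100 turns up.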

Keep an eye out for these in the code of the next few applications, which will start to use these new structures.

With either the standard interpreter or ipython, there is a continuation prompt for multiline expressions such as loops, which is three dots (followed by a colon in ipython):

In [6]: for i in s.strip():
   ...:     if i == ' ':
   ...:         print('Found a space')
   ...:

Found a space

That's enough theory for now.


Docz revisited

So far, we haven't used a lot of python code, because we don't really know much. This is about to change, as in the next application, we put to use some of the theory just covered. This application doesn't just use a command line tool to perform a function. It does some real file handling and text processing in a mouse and keyboard sensitive GUI. And it is actually quite useful.

We want to make this application do more than just display the names of some files. To be useful, it should launch the associated application so that we can work on it. In this case, the application to be launched is LibreOffice Writer, and the command to do this is

libreoffice --writer filename.odt

where filename.odt is supplied by the docz application.

To make things easier, we will change to the directory where the files are stored. To do this, we need to import another standard module named os, which gives us some useful operating system commands. The methods we are looking for are os.path.expanduser() and os.chdir(). The first expands the string '~' to the path of the user's home directory, the second changes directory. We shall also have to tell the application what to do when the mouse is clicked on a filename (and what to do if it is clicked in the empty space after the list of files).
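The two os calls can be tried on their own before they go into the application. In this sketch a temporary directory stands in for the documents folder, and os.getcwd() and os.path.join() are extra helpers used only to show what happened:

```python
import os
import tempfile

home = os.path.expanduser('~')             # e.g. /home/yourname
docs = os.path.join(home, 'Documents')     # join adds the separator for us

start_dir = os.getcwd()                    # remember where we started
scratch = tempfile.mkdtemp()               # stand-in for the documents folder
os.chdir(scratch)
now_in = os.getcwd()
os.chdir(start_dir)                        # and go back again

print(home)
print(docs)
print(now_in)
```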

For the application to know where in the text we clicked, we need a cursor, which Qt can provide. However, we need to add another Qt import statement, as it is not provided by either of the two imports we already have. This is fairly advanced stuff, but we can use it even if we don't really understand it.

The modified code looks like this.

#!/usr/bin/env python3

import sys
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.QtWidgets import *
import subprocess
import os
import docz_ui


class Docz(QWidget, docz_ui.Ui_Form):
    def __init__(self):
        super(self.__class__, self).__init__()
        self.setupUi(self)
        self.quitButton.clicked.connect(self.exitApplication)
        home = os.path.expanduser('~')
        os.chdir(home + '/Documents/wills_plays/')
        cmd = "ls -1 *.odt"    # that's -one not -ell
        result = subprocess.check_output(cmd, shell=True).decode('utf-8')
        self.resultslist.setText(result)
        self.resultslist.mousePressEvent = self.mousePressed

    def mousePressed(self, Event):
        textCursor = self.resultslist.cursorForPosition(Event.pos())
        textCursor.select(QTextCursor.BlockUnderCursor)
        self.resultslist.setTextCursor(textCursor)
        f_name = textCursor.selectedText()
        if f_name == '':    # mouse was clicked in empty space
            pass            # so ignore it
        else:
            f_name = f_name[1:]    # drop the first character
            command = ('libreoffice --writer ' + f_name)
            subprocess.Popen(command, shell=True)
            self.exitApplication()

    def keyPressEvent(self, e):
        if e.key() == Qt.Key_Escape:
            self.exitApplication()

    def exitApplication(self):
        self.close()
        sys.exit()


if __name__ == '__main__':
    app = QApplication(sys.argv)
    form = Docz()
    form.show()
    app.exec_()

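If you run the code as supplied, make sure that you have some writer (.odt) files in the directory named in the os.chdir() line (here Documents/wills_plays in your home directory) for the application to find, or change that line to suit.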

So, what have we done? We added a line to the initialization code that connects a mouse press to a new mousePressed method. The first four lines in this method set up a QTextCursor, which is how Qt determines its current position in the text. Also, in this new method, we use an if - else routine to check if the mouse was clicked on some text or in empty space. The first character of the returned filename is an invisible character, known as a paragraph separator, which we don't want. So, we assign the variable f_name to a slice of the string that does not contain it. This is how we chop strings up and re-assign them, as strings cannot be changed; they are immutable.

If the text was blank: pass. pass is python's do-nothing command which does - nothing!

Otherwise (else): build a command using the text referred to by f_name.

Then execute the command and shut down the application.
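The string handling in this method can be tried outside the GUI. The sketch below is not the application's code but a stand-alone alternative for experimenting. It uses the glob module in place of the ls command, fakes the text that the cursor would return (including the leading paragraph separator, written '\u2029'), and builds the command as a list, which is one way to cope with file names that contain spaces. The file names are invented, and nothing is actually launched:

```python
import glob
import os
import tempfile

folder = tempfile.mkdtemp()
for name in ['hamlet.odt', 'king lear.odt', 'notes.txt']:
    open(os.path.join(folder, name), 'w').close()     # create empty files

pattern = os.path.join(folder, '*.odt')
found = sorted(os.path.basename(p) for p in glob.glob(pattern))
result = '\n'.join(found)       # one name per line, like the output of ls -1

selected = '\u2029' + found[1]  # simulated selection: separator comes first
f_name = selected[1:]           # drop the first character

command = ['libreoffice', '--writer', f_name]
# subprocess.Popen(command) would start it without needing shell=True
print(result)
print(command)
```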

Finally, open designer and change the textedit font size to 12 or 14. This will make it easier to select the correct file. Save it and run update_res.sh.

One thing to note here is that although we have a keyPressEvent method in our code, we are using the Qt textedit's mousePressEvent method, so we should name our mouse click detection routine differently to avoid possible conflicts, hence mousePressed.


