Software Design By Example A Toolbased Introduction With Python Greg Wilson
Software Design By Example A Toolbased Introduction With Python Greg Wilson
Software Design By Example A Toolbased Introduction With Python Greg Wilson
Software Design By Example A Toolbased Introduction With Python Greg Wilson
Software Design By Example A Toolbased Introduction With Python Greg Wilson
1.
Software Design ByExample A Toolbased
Introduction With Python Greg Wilson download
https://ebookbell.com/product/software-design-by-example-a-
toolbased-introduction-with-python-greg-wilson-55974684
Explore and download more ebooks at ebookbell.com
2.
Here are somerecommended products that we believe you will be
interested in. You can click the link to download.
Software Design By Example A Toolbased Introduction With Python Greg
Wilson
https://ebookbell.com/product/software-design-by-example-a-toolbased-
introduction-with-python-greg-wilson-55959768
Upnp Design By Example A Software Developers Guide To Universal Plug
And Play Michael Jeronimo
https://ebookbell.com/product/upnp-design-by-example-a-software-
developers-guide-to-universal-plug-and-play-michael-jeronimo-2184566
Software Design By Example Greg Wilson
https://ebookbell.com/product/software-design-by-example-greg-
wilson-47254836
Practical Design Patterns For Java Developers Hone Your Software
Design Skills By Implementing Popular Design Patterns In Java 1st
Edition Miroslav Wengner
https://ebookbell.com/product/practical-design-patterns-for-java-
developers-hone-your-software-design-skills-by-implementing-popular-
design-patterns-in-java-1st-edition-miroslav-wengner-49111936
3.
Handson Design PatternsWith C And Net Core Write Clean And
Maintainable Code By Using Reusable Solutions To Common Software
Design Problems 1st Edition Gaurav Aroraa
https://ebookbell.com/product/handson-design-patterns-with-c-and-net-
core-write-clean-and-maintainable-code-by-using-reusable-solutions-to-
common-software-design-problems-1st-edition-gaurav-aroraa-53022704
Addison Wesley Domaindriven Design Tackling Complexity In The Heart Of
Software By Eric By Eric Evans
https://ebookbell.com/product/addison-wesley-domaindriven-design-
tackling-complexity-in-the-heart-of-software-by-eric-by-eric-
evans-36065612
Handson Domaindriven Design With Net Core Tackling Complexity In The
Heart Of Software By Putting Ddd Principles Into Practice Alexey
Zimarev
https://ebookbell.com/product/handson-domaindriven-design-with-net-
core-tackling-complexity-in-the-heart-of-software-by-putting-ddd-
principles-into-practice-alexey-zimarev-11074038
Software Design
https://ebookbell.com/product/software-design-58948182
Software Design Patterns The Ultimate Guide 1st Edition Sufyan Bin
Uzay
https://ebookbell.com/product/software-design-patterns-the-ultimate-
guide-1st-edition-sufyan-bin-uzay-47260458
6.
Software Design by
Example
Thebest way to learn design in any field is to study examples, and some of the best examples
of software design come from the tools programmers use in their own work. Software De-
sign by Example: A Tool-Based Introduction with Python therefore builds small versions
of the things programmers use in order to demystify them and give some insights into how ex-
perienced programmers think. From a file backup system and a testing framework to a regular
expression matcher, a browser layout engine, and a very small compiler, we explore common
design patterns, show how making code easier to test also makes it easier to re-use, and help
readers understand how debuggers, profilers, package managers, and version control systems
work so that they can use them more effectively.
This material can be used for self-paced study, in an undergraduate course on software design,
or as the core of an intensive weeklong workshop for working programmers. Each chapter has
a set of exercises ranging in size and difficulty from half a dozen lines to a full day’s work. Read-
ers should be familiar with the basics of modern Python, but the more advanced features of the
language are explained and illustrated as they are introduced.
All the written material in this project can be freely re-used under the terms of the Creative
Commons - Attribution license, while all of the software is made available under the terms of
the Hippocratic License. All proceeds from the sale of this book will go to support the Red Door
Family Shelter in Toronto.
Features:
•
Teaches software design by showing programmers how to build the tools they use every day
•
Each chapter includes exercises to help readers check and deepen their understanding
•
All the example code can be downloaded, re-used, and modified under an open license
Dr. Greg Wilson is a programmer, author, and educator based in Toronto. He co-founded and
was the first Executive Director of Software Carpentry, which has taught basic software skills to
tens of thousands of researchers worldwide, and he has authored or edited over a dozen books
(including two for children). Greg is a member of the Python Software Foundation and a recipi-
ent of ACM SIGSOFT’s Influential Educator of the Year Award.
1
Introduction
• The complexityof a system increases more rapidly than its size.
• The best way to learn design is to study examples, and the best programs to use as
examples are the ones programmers use every day.
• These lessons assume readers can write small programs and want to write larger
ones, or are looking for material to use in software design classes that they teach.
• All of the content is free to read and re-use under open licenses, and all royalties
from sales of this book will go to charity.
Terms defined: cognitive load
The best way to learn design in any field is to study examples [Schon1984; Petre2016],
and the most approachable examples are ones that readers are already familiar with. These
lessons therefore build small versions of tools that programmers use every day1
to show
how experienced software designers think. Along the way, they introduce some fundamen-
tal ideas in computer science that many self-taught programmers haven’t encountered. We
hope these lessons will help you design better software yourself, and that if you know how
programming tools work, you’ll be more likely to use them and better able to use them well.
1.1 Audience
This learner persona [Wilson2019] describes who this book is for:
Maya has a master’s degree in genomics. She knows enough Python to analyze
data from her experiments, but struggles to write code other people can use. These
lessons will show her how to design, build, and test large programs in less time and
with less pain.
Like Maya, you should be able to:
• Write Python programs using lists, loops, conditionals, dictionaries, and functions.
• Puzzle your way through Python programs that use classes and exceptions.
• Run basic Unix shell commands like ls and mkdir.
• Read and write a little bit of HTML.
• Use Git2
to save and share files. (It’s OK not to know the more obscure commands3
.)
1https://en.wikipedia.org/wiki/Programming_tool
2https://git-scm.com/
3https://git-man-page-generator.lokaltog.net/
1
19.
2 1 Introduction
Objectsand
Classes
Running
Tests
An
Interpreter
Functions and
Closures
Object
Persistence
A Template
Expander
Protocols
Binary
Data
A Build
Manager
Undo and
Redo
A File
Viewer
A Debugger
A Virtual
Machine
A Database
Performance
Profiling
Page
Layout
An HTML
Validator
A Code
Linter
Parsing
Text
Matching
Patterns
A File
Archiver
Finding
Duplicate Files
Transferring
Files
Serving
Web Pages
A Package
Manager
source
sink
Legend
2
3
4
5
6
7
8
9
10
11
12
12
14
15
16
17
18
19
20
21
22
23 24
25
26
Figure 1.1: Lesson topics and dependencies.
These chapters (Figure 1.1) are also designed to help another persona:
Yim teaches two college courses on software development. They are frustrated that
so many books talk about details but not about design and use examples that their
students can’t relate to. This book will give them material they can use in class and
starting points for course projects.
1.2 The Big Ideas
Our approach to design is based on three big ideas. First, as the number of components
in a system grows, the complexity of the system increases rapidly (Figure 1.2). However,
the number of things we can hold in working memory at any time is fixed and fairly small
[Hermans2021]. If we want to build large programs that we can understand, we therefore
need to construct them out of pieces that interact in a small number of ways. Figuring out
what those pieces and interactions should be is the core of what we call “design”.
Second, “making sense” depends on who we are. When we use a low-level language,
we incur the cognitive load of assembling micro-steps into something more meaningful.
When we use a high-level language, on the other hand, we incur a similar load translating
functions of functions into actual operations on actual data.
More experienced programmers are more capable at both ends of the curve, but that’s
not the only thing that changes. If a novice’s comprehension curve looks like the lower one
in Figure 1.3, then an expert’s looks like the upper one. Experts don’t just understand more
at all levels of abstraction; their preferred level has also shifted so they find
√
x2 + y2 easier
to read than the Medieval equivalent “the side of the square whose area is the sum of the
areas of the two squares whose sides are given by the first part and the second part”. This
20.
1.3 Formatting 3
AB
C
A B
C
D E
F
A B
C
D E
F
3 components have
3 interactions
6 components have
30 interactions
2 components with 3 sub-components
have 7 interactions
Figure 1.2: How complexity grows with size.
Abstraction
Comprehension
expert
novice
Figure 1.3: Novice and expert comprehension curves.
curve means that for any given task, the code that is quickest for a novice to comprehend
will almost certainly be different from the code that an expert can understand most quickly.
Our third big idea is that programs are just another kind of data. Source code is just text,
which we can process like other text files. Likewise, a program in memory is just a data
structure that we can inspect and modify like any other. Treating code like data enables us
to solve hard problems in elegant ways, but at the cost of increasing the level of abstraction
in our programs. Once again, finding the balance is what we mean by “design”.
1.3 Formatting
We display Python source code like this:
for ch in example:
print(ch)
and Unix shell commands like this:
for filename in *.dat
do
cut -d , -f 10 $filename
done
21.
4 1 Introduction
Datafiles and program output are shown like this:
- name: read
params:
- sample_data.csv
alpha
beta
gamma
delta
We use ... to show where lines have been omitted, and occasionally break lines in
unnatural ways to make them fit on the page. Where we do this, we end all but the last line
with a single backslash . Finally, we show glossary entries in bold text and write functions
as function_name rather than function_name(). The latter is more common, but the empty
parentheses makes it hard to tell whether we’re talking about the function itself or a call to
the function with no parameters.
1.4 Usage
The source for this book is available in our Git repository4
and all of it can be read on our
website5
. All of the written material in this book is licensed under the Creative Commons -
Attribution - NonCommercial 4.0 International license6
(CC-BY-NC-4.0), while the software
is covered by the Hippocratic License7
. The first license allows you to use and remix this
material for noncommercial purposes, as-is or in adapted form, provided you cite its orig-
inal source; if you want to sell copies or make money from this material in any other way,
you must contact us8
and obtain permission first. The second license allows you to use
and remix the software on this site provided you do not violate international agreements
governing human rights; please see Appendix D for details.
If you would like to improve what we have, add new material, or ask questions, please
file an issue in our GitHub repository9
or send an email10
. All contributors are required to
abide by our Code of Conduct (Appendix E).
1.5 What People Are Saying
Here’s what people said about the JavaScript version of this book [Wilson2022a]:
• Jessica Kerr11
: “Software Design by Example is the book I’ll recommend to every new
dev... It is nice to you. It wants you to succeed... It’s a bridge from ‘learn to program’ to
working programmer.”
4https://github.com/gvwilson/sdxpy/
5https://third-bit.com/sdxpy/
6https://creativecommons.org/licenses/by-nc/4.0/
7https://firstdonoharm.dev/
8mailto:gvwilson@third-bit.com
9https://github.com/gvwilson/sdxpy/
10mailto:gvwilson@third-bit.com
11https://jessitron.com/2023/02/20/book-review-software-design-by-example/
22.
1.6 Acknowledgments 5
•Jenn Schiffer12
: “I am v much enjoying gvwilson’s book Software Design by Example. It
makes me miss teaching, it would be such a fun text to use!”
• Emily Gorcenski13
: “There’s a lot of books on programming but fewer books that couple
software development with effective and practical use of tools, presenting a language not
as a main course but as a part of an engineering ecosystem. Greg Wilson’s book hits all
the right notes in bringing together theory, pragmatism, and best practices.”
• Danielle Navarro14
: “The book is really bloody lovely.”
1.6 Acknowledgments
Like [Wilson2022a], this book was inspired by [Kamin1990; Kernighan1979; Kernighan1981;
Kernighan1983; Kernighan1988; Oram2007; Wirth1976] and by:
• The Architecture of Open Source Applications15
series [Brown2011; Brown2012; Arm-
strong2013; Brown2016];
• Mary Rose Cook’s16
Gitlet17
;
• Matt Brubeck’s18
browser engine tutorial19
;
• Web Browser Engineering20
by Pavel Panchekha21
and Chris Harrelson22
;
• Connor Stack’s23
database tutorial24
;
• Maël Nison’s25
package manager tutorial26
;
• Paige Ruten’s27
kilo text editor28
and Wasim Lorgat’s29
editor tutorial30
;
• Bob Nystrom’s31
Crafting Interpreters32
[Nystrom2021]; and
• the posts and zines33
created by Julia Evans34
.
12https://mastodon.social/@jenn@pixel.kitchen/109985276835264400
13https://emilygorcenski.com/post/book-report-software-design-by-example-by-greg-wilson/
14https://blog.djnavarro.net/posts/2023-05-31_software-design-by-example/
15https://aosabook.org/
16https://maryrosecook.com/
17http://gitlet.maryrosecook.com/
18https://limpet.net/mbrubeck/
19https://limpet.net/mbrubeck/2014/08/08/toy-layout-engine-1.html
20https://browser.engineering/
21https://pavpanchekha.com/
22https://twitter.com/chrishtr
23https://connorstack.com/
24https://cstack.github.io/db_tutorial/
25https://arcanis.github.io/
26https://classic.yarnpkg.com/blog/2017/07/11/lets-dev-a-package-manager/
27https://viewsourcecode.org/
28https://viewsourcecode.org/snaptoken/kilo/index.html
29https://wasimlorgat.com/
30https://github.com/seem/editor
31http://journal.stuffwithstuff.com/
32https://craftinginterpreters.com/
33https://wizardzines.com/
34https://jvns.ca/
23.
6 1 Introduction
Iam grateful to Miras Adilov, Alvee Akand, Rohan Alexander, Alexey Alexapolsky, Lina
Andrén, Alberto Bacchelli, Yanina Bellini Saibene, Matthew Bluteau, Adrienne Canino, Marc
Chéhab, Stephen Childs, Hector Correa, Socorro Dominguez, Christian Drumm, Christian
Epple, Julia Evans, Davide Fucci, Thomas Fritz, Francisco Gabriel, Florian Gaudin-Delrieu,
Craig Gross, Jonathan Guyer, McKenzie Hagen, Han Qi, Fraser Hay, Alexandru Hurjui,
Bahman Karimi, Carolyn Kim, Kitsios Konstantinos, Jenna Landy, Peter Lin, Zihan Liu,
Becca Love, Dan McCloy, Ramiro Mejia, Michael Miller, Firas Moosvi, Joe Nash, Sheena
Ng, Reiko Okamoto, Juanan Pereira, Mahmoodur Rahman, Arpan Sarkar, Silvan Schlegel,
Rosan Shanmuganathan, Dave W. Smith, Stephen M. Sturdevant, Diyar Taskiran, Ece Tur-
nator, and Yao Yundong for feedback on early drafts of this material.
I am also grateful to Shashi Kumar for help with LaTeX, to Odin Beuchat35
for help with
JavaScript, and to the creators of Black36
, flake837
, Glosario38
, GNU Make39
, isort40
, ark41
,
LaTeX42
, pip43
, Python44
, Remark45
, WAVE46
, and many other open source tools: if we all
give a little, we all get a lot.
All royalties from this book will go to the Red Door Family Shelter47
in Toronto.
1.7 Exercises
Setting Up
1. Use pip48
to install Black49
, flake850
, and isort51
on your computer.
2. Run them on a few programs you have already written. (The file setup.cfg in the root
directory of this book’s GitHub repository52
has the settings we use for these tools.) What
problems do they report? Which of these reports do you disagree with?
Avoiding Potholes
Go to the GitHub repository53
for this book and look at the open issues. Which of them can
you understand? What makes the others hard to understand? What could you add, leave
out, or write differently when you report a problem that you have found?
35https://www.drafolin.ch/
36https://black.readthedocs.io/
37https://flake8.pycqa.org/
38https://glosario.carpentries.org/
39https://www.gnu.org/software/make/
40https://pycqa.github.io/isort/
41https://www.dmulholl.com/docs/ark/main/
42https://www.latex-project.org/
43https://pip.pypa.io/
44https://www.python.org/
45https://remarkjs.com/
46https://wave.webaim.org/
47https://www.reddoorshelter.ca/
48https://pip.pypa.io/
49https://black.readthedocs.io/
50https://flake8.pycqa.org/
51https://pycqa.github.io/isort/
52https://github.com/gvwilson/sdxpy/
53https://github.com/gvwilson/sdxpy/
24.
2
Objects and Classes
•Objects are useful without classes, but classes make them easier to understand.
• A well-designed class defines a contract that code using its instances can rely on.
• Objects that respect the same contract are polymorphic, i.e., they can be used
interchangeably even if they do different specific things.
• Objects and classes can be thought of as dictionaries with stereotyped behavior.
• Most languages allow functions and methods to take a variable number of
arguments.
• Inheritance can be implemented in several ways that differ in the order in which
objects and classes are searched for methods.
Terms defined: alias, argument, cache, class method, constructor, derived class,
design by contract, monkey patching, multiple inheritance, object-oriented
programming, parameter, polymorphism, recursion, spread, static method,
upcall, varargs
We are going to create a lot of objects and classes in these lessons, and they will be a
lot easier to use if we understand how they are implemented. Historically, object-oriented
programming (OOP) was invented to solve two problems:
1. What is a natural way to represent real-world “things” in code?
2. How can we organize code to make it easier to understand, test, and extend?
2.1 Objects
As a motivating problem, let’s define some of the things a generic shape in a drawing
package must be able to do:
class Shape:
def __init__(self, name):
self.name = name
def perimeter(self):
raise NotImplementedError(perimeter)
def area(self):
raise NotImplementedError(area)
A specification like this is sometimes called a contract because an object must satisfy it
in order to be considered a shape, i.e., must provide methods with these names that do
7
25.
8 2 Objectsand Classes
what those names suggest. For example, we can derive classes from Shape to represent
squares and circles.
class Square(Shape):
def __init__(self, name, side):
super().__init__(name)
self.side = side
def perimeter(self):
return 4 * self.side
def area(self):
return self.side ** 2
class Circle(Shape):
def __init__(self, name, radius):
super().__init__(name)
self.radius = radius
def perimeter(self):
return 2 * math.pi * self.radius
def area(self):
return math.pi * self.radius ** 2
Since squares and circles have the same methods, we can use them interchangeably.
This is called polymorphism, and it reduces cognitive load by allowing the people using
related things to ignore their differences:
examples = [Square(sq, 3), Circle(ci, 2)]
for thing in examples:
n = thing.name
p = thing.perimeter()
a = thing.area()
print(f{n} has perimeter {p:.2f} and area {a:.2f})
sq has perimeter 12.00 and area 9.00
ci has perimeter 12.57 and area 12.57
But how does polymorphism work? The first thing we need to understand is that a func-
tion is an object. While the bytes in a string represent characters and the bytes in an image
represent pixels, the bytes in a function are instructions (Figure 2.1). When Python exe-
cutes the code below, it creates an object in memory that contains the instructions to print
a string and assigns that object to the variable example:
def example():
print(in example)
We can create an alias for the function by assigning it to another variable and then call the
function by referencing that second variable. Doing this doesn’t alter or erase the connection
between the function and the original name:
alias = example
alias()
in example
We can also store function objects in data structures like lists and dictionaries. Let’s
write some functions that do the same things as the methods in our original Python and
store them in a dictionary to represent a square (Figure 2.2):
26.
2.1 Objects 9
astext
it's all just bytes as image old_x = x
x = 2 * x + 1
as instructions
bytes
74 69 73 27 61 20 6C 6C 6A 20 73 75 20 74 79 62 65 74 0A 73
Figure 2.1: Bytes can be interpreted as text, images, instructions, and more.
name
side
perim
area
square
sq
2
square_perim()
square_area()
name
radius
perim
area
circle
ci
3
circle_perim()
circle_area()
Figure 2.2: Using dictionaries to emulate objects.
def square_perimeter(thing):
return 4 * thing[side]
def square_area(thing):
return thing[side] ** 2
def square_new(name, side):
return {
name: name,
side: side,
perimeter: square_perimeter ,
area: square_area
}
If we want to use one of the “methods” in this dictionary, we call it like this:
def call(thing, method_name):
return thing[method_name](thing)
examples = [square_new(sq, 3), circle_new(ci, 2)]
for ex in examples:
n = ex[name]
p = call(ex, perimeter)
a = call(ex, area)
print(f{n} {p:.2f} {a:.2f})
The function call looks up the function stored in the dictionary, then calls that function with
the dictionary as its first object; in other words, instead of using obj.meth(arg) we use
obj[meth](obj, arg). Behind the scenes, this is (almost) how objects actually work.
27.
10 2 Objectsand Classes
We can think of an object as a special kind of dictionary. A method is just a function that
takes an object of the right kind as its first parameter (typically called self in Python).
2.2 Classes
One problem with implementing objects as dictionaries is that it allows every single object
to behave slightly differently. In practice, we want objects to store different values (e.g.,
different squares to have different sizes) but the same behaviors (e.g., all squares should
have the same methods). We can implement this by storing the methods in a dictionary
called Square that corresponds to a class and having each individual square contain a ref-
erence to that higher-level dictionary (Figure 2.3). In the code below, that special reference
uses the key _class:
def square_perimeter(thing):
return 4 * thing[side]
def square_area(thing):
return thing[side] ** 2
Square = {
perimeter: square_perimeter ,
area: square_area ,
_classname: Square
}
def square_new(name, side):
return {
name: name,
side: side,
_class: Square
}
Calling a method now involves one more lookup because we have to go from the object
to the class to the method, but once again we call the “method” with the object as the first
argument:
name
side
perim
area
square
sq
2
square_perim()
square_area()
name
radius
perim
area
circle
ci
3
circle_perim()
circle_area()
_class
_class
_classname square
_classname circle
Square
Circle
Figure 2.3: Using dictionaries to emulate classes.
28.
2.3 Arguments 11
defcall(thing, method_name):
return thing[_class][method_name](thing)
examples = [square_new(sq, 3), circle_new(ci, 2)]
for ex in examples:
n = ex[name]
p = call(ex, perimeter)
a = call(ex, area)
c = ex[_class][_classname]
print(f{n} is a {c}: {p:.2f} {a:.2f})
As a bonus, we can now reliably identify objects’ classes and ask whether two objects are
of the same class or not by checking what their _class keys refer to.
Arguments vs. Parameters
Many programmers use the words argument and parameter interchangeably, but
to make our meaning clear, we call the values passed into a function its arguments
and the names the function uses to refer to them as its parameters. Put another way,
parameters are part of the definition, and arguments are given when the function is
called.
2.3 Arguments
The methods we have defined so far operate on the values stored in the object’s dictionary,
but none of them take any extra arguments as input. Implementing this is a little bit tricky
because different methods might need different numbers of arguments. We could define
functions call_0, call_1, call_2 and so on to handle each case, but like most modern
languages, Python gives us a better way. If we define a parameter in a function with a
leading *, it captures any “extra” values passed to the function that don’t line up with named
parameters. Similarly, if we define a parameter with two leading stars **, it captures any
extra named parameters:
def show_args(title, *args, **kwargs):
print(f{title} args '{args}' and kwargs '{kwargs}')
show_args(nothing)
show_args(one unnamed argument, 1)
show_args(one named argument, second=2)
show_args(one of each, 3, fourth=4)
nothing args '()' and kwargs '{}'
one unnamed argument args '(1,)' and kwargs '{}'
one named argument args '()' and kwargs '{'second': '2'}'
one of each args '(3,)' and kwargs '{'fourth': '4'}'
This mechanism is sometimes referred to as varargs (short for “variable arguments”).
A complementary mechanism called spreading allows us to take a list or dictionary full of
arguments and spread them out in a call to match a function’s parameters:
29.
12 2 Objectsand Classes
def show_spread(left, middle, right):
print(fleft {left} middle {middle} right {right})
all_in_list = [1, 2, 3]
show_spread(*all_in_list)
all_in_dict = {right: 30, left: 10, middle: 20}
show_spread(**all_in_dict)
left 1 middle 2 right 3
left 10 middle 20 right 30
With these tools in hand, let’s add a method to our Square class to tell us whether a
square is larger than a user-specified size:
def square_larger(thing, size):
return call(thing, area) size
Square = {
perimeter: square_perimeter ,
area: square_area ,
larger: square_larger ,
_classname: Square
}
The function that implements this check for circles looks exactly the same:
def circle_larger(thing, size):
return call(thing, area) size
We then modify call to capture extra arguments in *args and spread them into the function
being called:
def call(thing, method_name , *args):
return thing[_class][method_name](thing, *args)
Our tests show that this works:
examples = [square_new(sq, 3), circle_new(ci, 2)]
for ex in examples:
result = call(ex, larger, 10)
print(fis {ex['name']} larger? {result})
is sq larger? False
is ci larger? True
However, we now have two functions that do exactly the same thing—the only difference
between them is their names. Anything in a program that is duplicated in several places
will eventually be wrong in at least one, so we need to find some way to share this code.
2.4 Inheritance
The tool we want is inheritance. To see how this works in Python, let’s add a method called
density to our original Shape class that uses other methods defined by the class
30.
2.4 Inheritance 13
classShape:
def __init__(self, name):
self.name = name
def perimeter(self):
raise NotImplementedError(perimeter)
def area(self):
raise NotImplementedError(area)
def density(self, weight):
return weight / self.area()
examples = [Square(sq, 3), Circle(ci, 2)]
for ex in examples:
n = ex.name
d = ex.density(5)
print(f{n}: {d:.2f})
sq: 0.56
ci: 0.40
To enable our dictionary-based “classes” to do the same thing, we create a dictionary
to represent a generic shape and give it a “method” to calculate density:
def shape_density(thing, weight):
return weight / call(thing, area)
Shape = {
density: shape_density ,
_classname: Shape,
_parent: None
}
We then add another specially-named field to the dictionaries for “classes” like Square
to keep track of their parents:
Square = {
perimeter: square_perimeter ,
area: square_area ,
_classname: Square,
_parent: Shape
}
and modify the call function to search for the requested method (Figure 2.4):
def call(thing, method_name , *args):
method = find(thing[_class], method_name)
return method(thing, *args)
def find(cls, method_name):
while cls is not None:
if method_name in cls:
return cls[method_name]
cls = cls[_parent]
raise NotImplementedError(method_name)
31.
14 2 Objectsand Classes
name
side
perim
area
square
sq
2
square_perim()
square_area()
name
radius
perim
area
circle
ci
3
circle_perim()
circle_area()
_class
_class
_classname square
_classname circle
Square
Circle
_parent
_parent
density shape_density()
_classname shape
Shape
_parent
Figure 2.4: Using dictionary search to implement inheritance.
A simple test shows that this is working as intended:
examples = [square_new(sq, 3), circle_new(ci, 2)]
for ex in examples:
n = ex[name]
d = call(ex, density, 5)
print(f{n}: {d:.2f})
sq: 0.56
ci: 0.40
We do have one task left, though: we need to make sure that when a square or circle
is made, it is made correctly. In short, we need to implement constructors. We do this
by giving the dictionaries that implement classes a special key _new whose value is the
function that builds something of that type:
def shape_new(name):
return {
name: name,
_class: Shape
}
Shape = {
density: shape_density ,
_classname: Shape,
_parent: None,
_new: shape_new
}
In order to make an object, we call the function associated with its _new key:
def make(cls, *args):
return cls[_new](*args)
That function is responsible for upcalling the constructor of its parent. For example, the
constructor for a square calls the constructor for a generic shape and adds square-specific
values using | to combine two dictionaries:
32.
2.5 Summary 15
defsquare_new(name, side):
return make(Shape, name) | {
side: side,
_class: Square
}
Square = {
perimeter: square_perimeter ,
area: square_area ,
_classname: Square,
_parent: Shape,
_new: square_new
}
Of course, we’re not done until we test it:
examples = [make(Square, sq, 3), make(Circle, ci, 2)]
for ex in examples:
n = ex[name]
d = call(ex, density, 5)
print(f{n}: {d:.2f})
sq: 0.56
ci: 0.40
2.5 Summary
We have only scratched the surface of Python’s object system. Multiple inheritance, class
methods, static methods, and monkey patching are powerful tools, but they can all be
understood in terms of dictionaries that contain references to properties, functions, and
other dictionaries (Figure 2.5).
class
object
methods
attributes
defines
has
instance of
contract
respects
dictionary
dictionary
stored
in
refers
to
constructor
includes
stored
in
Figure 2.5: Concept map for implementing objects and classes.
33.
16 2 Objectsand Classes
2.6 Exercises
Handling Named Arguments
The final version of call declares a parameter called *args to capture all the positional
arguments of the method being called and then spreads them in the actual call. Modify it to
capture and spread named arguments as well.
Multiple Inheritance
Implement multiple inheritance using dictionaries. Does your implementation look methods
up in the same order as Python would?
Class Methods and Static Methods
1. Explain the differences between class methods and static methods.
2. Implement both using dictionaries.
Reporting Type
Python type method reports the most specific type of an object, while isinstance deter-
mines whether an object inherits from a type either directly or indirectly. Add your own
versions of both to dictionary-based objects and classes.
Using Recursion
A recursive function is one that calls itself, either directly or indirectly. Modify the find
function that finds a method to call so that it uses recursion instead of a loop. Which version
is easier to understand? Which version is more efficient?
Method Caching
Our implementation searches for the implementation of a method every time that method
is called. An alternative is to add a cache to each object to save the methods that have
been looked up before. For example, each object could have a special key called _cache
whose value is a dictionary. The keys in that dictionary are the names of methods that have
been called in the past, and the values are the functions that were found to implement
those methods. Add this feature to our dictionary-based objects. How much more complex
does it make the code? How much extra storage space does it need compared to repeated
lookup?
34.
3
Finding Duplicate Files
•A hash function creates a fixed-size value from an arbitrary sequence of bytes.
• Use big-oh notation to estimate the running time of algorithms.
• The output of a hash function is deterministic but not easy to predict.
• A good hash function’s output is evenly distributed.
• A large cryptographic hash can be used to uniquely identify a file’s contents.
Terms defined: big-oh notation, binary mode, bucket, collision (in hashing),
cryptographic hash function, hash code, hash function, hexadecimal, SHA-256
(hash function), streaming API, time complexity
Suppose we want to find duplicated files, such as extra copies of photos or data sets.
People often rename files, so we must compare their contents, but this will be slow if we
have a lot of files.
We can estimate how slow “slow” will be with a simple calculation. N objects can be
paired in N(N − 1) ways. If we remove duplicate pairings (i.e., if we count A-B and B-A
as one pair) then there are N(N − 1)/2 = (N2
− N)/2 distinct pairs. As N gets large,
this value is approximately proportional to N2
. A computer scientist would say that the
time complexity of our algorithm is O(N2
), which is pronounced “big-oh of N squared”.
In simpler terms, when the number of files doubles, the running time roughly quadruples,
which means the time per file increases as the number of files increases.
Slowdown like this is often unavoidable, but in our case there’s a better way. If we
generate a shorter identifier for each file that depends only on the bytes it contains, we can
group together the files that have the same identifier and only compare the files within a
group. This approach is faster because we only do the expensive byte-by-byte comparison
on files that might be equal. And as we’ll see, if we are very clever about how we generate
identifiers then we can avoid byte-by-byte comparisons entirely.
3.1 Getting Started
We’ll start by implementing the inefficient N2
approach so that we can compare our later
designs to it. The short program below takes a list of filenames from the command line,
finds duplicates, and prints the matches:
def find_duplicates(filenames):
matches = []
for left in filenames:
for right in filenames:
if same_bytes(left, right):
matches.append((left, right))
return matches
17
35.
18 3 FindingDuplicate Files
if __name__ == __main__:
duplicates = find_duplicates(sys.argv[1:])
for (left, right) in duplicates:
print(left, right)
This program uses a function called same_bytes that reads two files and compares them
byte by byte:
def same_bytes(left_name, right_name):
left_bytes = open(left_name, rb).read()
right_bytes = open(right_name, rb).read()
return left_bytes == right_bytes
Notice that the files are opened in binary mode using rb instead of the usual r. As
we’ll see in Chapter 17, this tells Python to read the bytes exactly as they are rather than
trying to convert them to characters.
To test this program and the others we’re about to write, we create a tests directory
with six files:
Filename a1.txt a2.txt a3.txt b1.txt b2.txt c1.txt
Content aaa aaa aaa bb bb c
We expect the three a files and the two b files to be reported as duplicates. There’s no
particular reason for these tests—we just have to start somewhere. Our first test looks like
this:
python brute_force_1.py tests/*.txt
tests/a1.txt tests/a1.txt
tests/a1.txt tests/a2.txt
tests/a1.txt tests/a3.txt
tests/a2.txt tests/a1.txt
tests/a2.txt tests/a2.txt
tests/a2.txt tests/a3.txt
tests/a3.txt tests/a1.txt
tests/a3.txt tests/a2.txt
tests/a3.txt tests/a3.txt
tests/b1.txt tests/b1.txt
tests/b1.txt tests/b2.txt
tests/b2.txt tests/b1.txt
tests/b2.txt tests/b2.txt
tests/c1.txt tests/c1.txt
Our program’s output is correct but not useful: every file is reported as being identical
to itself, and every match of different files is reported twice. Let’s fix the nested loop in
find_duplicates so that we only check potentially differing pairs once (Figure 3.1):
def find_duplicates(filenames):
matches = []
for i_left in range(len(filenames)):
left = filenames[i_left]
for i_right in range(i_left):
right = filenames[i_right]
if same_bytes(left, right):
matches.append((left, right))
return matches
36.
3.2 Hashing Files19
1
2
3
0
0 1 2 3
i_left
i_right
Figure 3.1: Scoping the inner loop to produce unique combinations.
3.2 Hashing Files
Instead of comparing every file against every other, let’s process each file once to produce
a short identifier that depends only on the file’s contents and then only compare files that
have the same identifier, i.e., that might be equal. If files are evenly divided into g groups
then each group will contain roughly N/g files, so the total work will be roughly O(g(N/g)2
)
(i.e., g groups times (N/g)2
comparisons within each group). Simplifying, this is N2
/g, so
as the number of groups grows, and the overall running time should decrease (Figure 3.2).
a1.txt
a2.txt
a3.txt
b1.txt
b2.txt
c1.txt
aaa
aaa
aaa
bb
bb
c
6
6
6
10
10
5
filename contents hash code
Figure 3.2: Grouping by hash code reduces comparisons from 15 to 4.
We can construct IDs for files using a hash function to produce a hash code. Since
bytes are just numbers, we can create a very simple hash function by adding up the bytes
in a file and taking the remainder modulo some number:
def naive_hash(data):
return sum(data) % 13
Here’s a quick test that calculates the hash code for successively longer substrings of
the word hashing:
example = bytes(hashing, utf-8)
for i in range(1, len(example) + 1):
substring = example[:i]
hash = naive_hash(substring)
print(f{hash:2} {substring})
0 b'h'
6 b'ha'
4 b'has'
4 b'hash'
5 b'hashi'
37.
20 3 FindingDuplicate Files
0 2 4 6 8 10 12
0
1000
2000
3000
hash
count
Figure 3.3: Distribution of hash codes per line in Dracula.
0 2 4 6 8 10 12
0
200
400
600
800
1000
hash
count
Figure 3.4: Distribution of hash codes per unique line in Dracula.
11 b'hashin'
10 b'hashing'
The output seems random, but is it? As a more stringent test, let’s try hashing every
line of text in the Project Gutenberg1
version of the novel Dracula and plot the distribution
(Figure 3.3).
Most of the buckets are approximately the same height, but why is there a peak at
zero? Our big-oh estimate of how efficient our algorithm would be depended on files being
distributed evenly between groups; if that’s not the case, our code won’t be as fast as we
hoped.
After a bit of digging, it turns out that the text file we’re processing uses a blank line to
separate paragraphs. These hash to zero, so the peak reflects an unequal distribution in
our data. If we plot the distribution of hash codes of unique lines, the result is more even
(Figure 3.4).
1https://www.gutenberg.org/
38.
3.3 Better Hashing21
Hashing is a tremendously powerful tool: for example, Python’s dictionaries hash their
keys to make lookup fast. Now that we can hash files, we can build a dictionary with hash
codes as keys and sets of filenames as values. The code that does this is shown below;
each time it calculate a hash code, it checks to see if that value has been seen before. If
not, it creates a new entry in the groups dictionary with the hash code as its key and an
empty set as a value. It can then be sure that there’s a set to add the filename to:
def find_groups(filenames):
groups = {}
for fn in filenames:
data = open(fn, rb).read()
hash_code = naive_hash(data)
if hash_code not in groups:
groups[hash_code] = set()
groups[hash_code].add(fn)
return groups
We can now re-use most of the code we wrote earlier to find duplicates within each
group:
groups = find_groups(sys.argv[1:])
for filenames in groups.values():
duplicates = find_duplicates(list(filenames))
for (left, right) in duplicates:
print(left, right)
tests/a2.txt tests/a1.txt
tests/a3.txt tests/a1.txt
tests/a3.txt tests/a2.txt
tests/b1.txt tests/b2.txt
3.3 Better Hashing
Let’s go back to the formula O(N2
/g) that tells us how much work we have to do if we have
divided N files between g groups. If we have exactly as many groups as files—i.e., if g is
equal to N—then the work to process N files would be O(N2
/N) = O(N), which means
that the work will be proportional to the number of files. We have to read each file at least
once anyway, so we can’t possibly do better than this, but how can we ensure that each
unique file winds up in its own group?
The answer is to use a cryptographic hash function. The output of such a function
is completely deterministic: given the same bytes in the same order, it will always produce
the same output. However, the output is distributed like a uniform random variable: each
possible output is equally likely, which ensures that files will be evenly distributed between
groups.
Cryptographic hash functions are hard to write, and it’s very hard to prove that a partic-
ular algorithm has the properties we require. We will therefore use a function from Python’s
hashing module2
that implements the SHA-256 hashing algorithm. Given some bytes as
input, this function produces a 256-bit hash, which is normally written as a 64-character
hexadecimal string. This uses the letters A-F (or a-f) to represent the digits from 10 to 15,
so that (for example) 3D5 is (3×162
) + (13×161
) + (5×160
), or 981 in decimal:
2https://docs.python.org/3/library/hashlib.html
39.
22 3 FindingDuplicate Files
example = bytes(hash, utf-8)
for i in range(1, len(example) + 1):
substring = example[:i]
hash = sha256(substring).hexdigest()
print(f{substring}n{hash})
b'h'
aaa9402664f1a41f40ebbc52c9993eb66aeb366602958fdfaa283b71e64db123
b'ha'
8693873cd8f8a2d9c7c596477180f851e525f4eaf55a4f637b445cb442a5e340
b'has'
9150c74c5f92d51a92857f4b9678105ba5a676d308339a353b20bd38cd669ce7
b'hash'
d04b98f48e8f8bcc15c6ae5ac050801cd6dcfd428fb5f9e65c4e16e7807340fa
The Birthday Problem
The odds that two people share a birthday are 1/365 (ignoring February 29). The
odds that they don’t are therefore 364/365. When we add a third person, the odds
that they don’t share a birthday with either of the preceding two people are 363/365,
so the overall odds that nobody shares a birthday are (364/365)×(363/365). If we
keep going, there’s a 50% chance of two people sharing a birthday in a group of just
23 people, and a 99.9% chance with 70 people.
The same math can tell us how many files we need to hash before there’s a 50%
chance of a collision with a 256-bit hash. According to Wikipedia3
, the answer is
approximately 4×1038
files. We’re willing to take that risk.
Using this library function makes our duplicate file finder much shorter:
import sys
from hashlib import sha256
def find_groups(filenames):
groups = {}
for fn in filenames:
data = open(fn, rb).read()
hash_code = sha256(data).hexdigest()
if hash_code not in groups:
groups[hash_code] = set()
groups[hash_code].add(fn)
return groups
if __name__ == __main__:
groups = find_groups(sys.argv[1:])
for filenames in groups.values():
print(, .join(sorted(filenames)))
python dup.py tests/*.txt
tests/a1.txt, tests/a2.txt, tests/a3.txt
tests/b1.txt, tests/b2.txt
tests/c1.txt
3https://en.wikipedia.org/wiki/Birthday_problem
40.
3.4 Summary 23
Moreimportantly, our new approach scales to very large sets of files: as explained
above, we only have to look at each file once, so the running time is as good as it possibly
can be.
3.4 Summary
Figure 3.5 summarizes the key ideas in this chapter, the most important of which is that
some algorithms are intrinsically better than others.
hash function
bytes
file
hash code
from
input is
output
is
deterministic
uniformly
distributed
is
unique for
each sequence of
duplicates
may be efficiency
improves
of detecting
big-oh
measured by
such
as
linear O(N)
quadratic O(N2)
Figure 3.5: Concept map for duplicate file detection with hashing.
3.5 Exercises
Odds of Collision
If hashes were only 2 bits long, then the chances of collision with each successive file
assuming no previous collision are:
Number of Files Odds of Collision
1 0%
2 25%
3 50%
4 75%
5 100%
A colleague of yours says this means that if we hash four files, there’s only a 75% chance
of any collision occurring. What are the actual odds?
41.
24 3 FindingDuplicate Files
Streaming I/O
A streaming API delivers data one piece at a time rather than all at once. Read the doc-
umentation for the update method of hashing objects in Python’s hashing module4
and
rewrite the duplicate finder from this chapter to use it.
Big Oh
Chapter 1 said that as the number of components in a system grows, the complexity of the
system increases rapidly. How fast is “rapidly” in big-oh terms?
The hash Function
1. Read the documentation for Python’s built-in hash function.
2. Why do hash(123) and hash(123) work when hash([123]) raises an exception?
How Good Is SHA-256?
1. Write a function that calculates the SHA-256 hash code of each unique line of a text file.
2. Convert the hex digests of those hash codes to integers.
3. Plot a histogram of those integer values with 20 bins.
4. How evenly distributed are the hash codes? How does the distribution change as you
process larger files?
4https://docs.python.org/3/library/hashlib.html
42.
4
Matching Patterns
• Useglobs and regular expressions to match patterns in text.
• Use inheritance to make matchers composable and extensible.
• Simplify code by having objects delegate work to other objects.
• Use the Null Object pattern to eliminate special cases in code.
• Use standard refactorings to move code from one working state to another.
• Build and check the parts of your code you are least sure of first to find out if your
design will work.
Terms defined: Chain of Responsibility pattern, child class, Extract Parent
Class refactoring, globbing, greedy matching, helper method, inheritance, lazy
matching, literal (in parsing), Null Object pattern, refactor, regular expression,
signature, technical debt, test-driven development
We used *.txt to tell the duplicate file finder of Chapter 3 which files to compare. Older
programmers (like this author) refer to this kind of pattern-matching as globbing because
early versions of Unix had a tool called glob1
to do it. Globbing was so useful that it was
quickly added to the shell, and the Python standard library includes a module called glob2
to
match filenames in the same way. For example, 2023-*.{pdf,txt} matches 2023-01.txt
and 2023-final.pdf but not draft-2023.docx (Figure 4.1).
Globbing patterns are simpler than the regular expressions used to scrape data from
text files, but the principles are the same. This chapter therefore implements a simple ver-
sion of globbing to show how pattern-matching works in general. This matcher will only
handle the cases in Table 4.1, but as the exercises will show, our design makes it easy to
add new kinds of patterns.
2023- * . {pdf,txt}
2023- 01 . txt
2023- * . {pdf,txt}
2023- final . pdf
2023- * . {pdf,txt}
draft 2023 . docx
Figure 4.1: Examples of glob matching.
1https://en.wikipedia.org/wiki/Glob_(programming)
2https://docs.python.org/3/library/glob.html
25
43.
26 4 MatchingPatterns
Pattern Text Match? Pattern Text Match?
abc “abc” true a*c “abc” true
ab “abc” false {a,b} “a” true
abc “ab” false {a,b} “c” false
* ”“ true {a,b} “ab” false
* “abc” true *{x,y} “abcx” true
Table 4.1: Pattern-matching cases.
4.1 Simple Patterns
Matching is conceptually simple. If the first element of the pattern matches the target string
at the current location, we check if the rest of the pattern matches what’s left of the string.
If the element doesn’t match the front of the string, or if the rest of the pattern can’t match
the rest of the string, matching fails. (This behavior makes globbing different from regular
expressions, which can match parts of strings.)
This design makes use of the Chain of Responsibility design pattern. Each matcher
matches if it can then asks the next matcher in the chain to try to match the remaining text
(Figure 4.2). Crucially, objects don’t know how long the chain after them is: they just know
whom to ask next.
In some cases we only need to know what kind of matching we’re doing: for example, the
* pattern matches any characters. In other cases, though, we need some extra information,
such as the literal text abc or the two alternatives pdf and txt. We therefore decide to
create matching objects that can hold this extra information rather than just writing functions.
Our first matcher checks whether a piece of text like abc matches a string. We call
this class Lit because a fixed string of characters is sometimes called a literal, and it has
a constructor and a match method:
class Lit:
def __init__(self, chars, rest=None):
self.chars = chars
self.rest = rest
def match(self, text, start=0):
end = start + len(self.chars)
if text[start:end] != self.chars:
return False
if self.rest:
return self.rest.match(text, end)
return end == len(text)
chars is the characters to be matched, while rest is responsible for matching the rest of
the text. If rest is None, this matcher is the last one in the chain, so it must match to the
end of the target string.
2023- * . {pdf,txt}
2023- 01 . txt
Literal Any Literal Either
Figure 4.2: Matching with Chain of Responsibility.
44.
4.1 Simple Patterns27
The match method takes the text to be matched as an input along with an optional start
parameter that indicates where matching is to start. This parameter has a default value of
0 (meaning “start at the beginning”), but if this Lit follows other matchers, they need to tell
it where to start looking. To see if this works, let’s write and run a few tests:
def test_literal_match_entire_string():
# /abc/ matches abc
assert Lit(abc).match(abc)
def test_literal_substring_alone_no_match():
# /ab/ doesn't match abc
assert not Lit(ab).match(abc)
def test_literal_superstring_no_match():
# /abc/ doesn't match ab
assert not Lit(abc).match(ab)
Notice that we give tests long, meaningful names to make failure reports from the test
runner easier to read.
We could go ahead and build some more matchers right away, but as [Petre2016] ex-
plains, good programmers build and check the parts of their code that they are least sure
of as early as possible to find out if their entire design is going to work or not. We there-
fore write a test to make sure that chaining works when one literal matcher is followed by
another:
def test_literal_followed_by_literal_match():
# /a/+/b/ matches ab
assert Lit(a, Lit(b)).match(ab)
def test_literal_followed_by_literal_no_match():
# /a/+/b/ doesn't match ac
assert not Lit(a, Lit(b)).match(ac)
Chaining two literal matchers together is unnecessary: we could (and probably should)
write Lit(ab) instead of Lit(a, Lit(b)). However, the fact that these two tests
pass reassures us that our design is working.
Test-Driven Development
Some programmers write the tests for a piece of code before writing the code itself.
This practice is called test-driven development, and its advocates claim that it yields
better code in less time because (a) writing tests helps people think about what the
code should do before they’re committed to a particular implementation and (b) if
people write tests first, they’ll actually write tests. However, research shows that the
order in which the tests are written doesn’t actually make a difference [Fucci2016];
what actually matters is alternating short bursts of testing and coding.
These tests for Lit pass, so we’re ready to move on to wildcards. A * character in our
pattern matches zero or more characters, so if there are no more matchers in the chain,
then this * matches to the end of the target string and match returns True right away. If there
are other matchers, on the other hand, we try matching no characters, one character, two
characters, and so on and see if those other matchers can get us to the end of the string if
we do so. If none of these possibilities succeeds, the overall match fails (Figure 4.3).
45.
28 4 MatchingPatterns
a * .txt
0
no
.txt
a b c
a * .txt
1
no
.txt
a b c
a * .txt
yes
.txt
a b c
2
Figure 4.3: How wildcard matching works.
class Any:
def __init__(self, rest=None):
self.rest = rest
def match(self, text, start=0):
if self.rest is None:
return True
for i in range(start, len(text)):
if self.rest.match(text, i):
return True
return False
Once again we write a few tests before moving on:
def test_any_matches_empty():
# /*/ matches
assert Any().match()
def test_any_matches_entire_string():
# /*/ matches abc
assert Any().match(abc)
def test_any_matches_as_prefix():
# /*def/ matches abcdef
assert Any(Lit(def)).match(abcdef)
def test_any_matches_as_suffix():
# /abc*/ matches abcdef
assert Lit(abc, Any()).match(abcdef)
def test_any_matches_interior():
# /a*c/ matches abc
assert Lit(a, Any(Lit(c))).match(abc)
Either/or matching works much the same way. If the first alternative matches, we try the
rest of the chain. If not, we try the second alternative, and if that doesn’t work either, we
fail:
class Either:
def __init__(self, left, right, rest=None):
self.left = left
self.right = right
self.rest = rest
def match(self, text, start=0):
return self.left.match(text, start) or
self.right.match(text, start)
46.
4.2 Rethinking 29
Ourfirst few tests pass:
def test_either_two_literals_first():
# /{a,b}/ matches a
assert Either(Lit(a), Lit(b)).match(a)
def test_either_two_literals_not_both():
# /{a,b}/ doesn't match ab
assert not Either(Lit(a), Lit(b)).match(ab)
but further testing uncovers a bug:
def test_either_followed_by_literal_match():
# /{a,b}c/ matches ac
assert Either(Lit(a), Lit(b), Lit(c)).match(ac)
def test_either_followed_by_literal_no_match():
# /{a,b}c/ doesn't match ax
assert not Either(Lit(a), Lit(b), Lit(c)).match(ax)
======================= test session starts ========================
test_glob_problem.py F. [100%]
===================== short test summary info ======================
FAILED test_glob_problem.py::test_either_followed_by_literal_match
=================== 1 failed, 1 passed in 0.00s ====================
The problem is that Either.match isn’t using rest properly—in fact, it’s not using rest
at all because it doesn’t know what to pass it as a starting point. Instead of having match
methods return True or False, we need them to return an indication of where the next
match should start so that Either can pass that information along to rest. Before making
this change, we will clear up a bit of technical debt in our code.
4.2 Rethinking
We now have three matchers with the same interfaces. Before we do any further work, we
will refactor using a pattern called Extract Parent Class to make the relationship between
the matchers clear (Figure 4.4). At the same time, each matcher is checking to see if its
rest is None. We can simplify this by creating a class to represent “nothing here”, which is
known as the Null Object pattern.
We Didn’t Invent This
We didn’t invent any of the patterns or refactorings used in this chapter. Instead,
we learned them from books like [Gamma1994; Fowler2018; Kerievsky2004]. And
as [Tichy2010] showed, learning these patterns makes people better programmers.
47.
30 4 MatchingPatterns
Lit
__init__()
match()
Any
__init__()
match()
Either
__init__()
match()
Lit
__init__()
_match()
Any
__init__()
_match()
Either
__init__()
_match()
Null
__init__()
_match()
Match
__init__()
match()
calls
Figure 4.4: Using the Extract Parent Class refactoring.
Our new parent class Match looks like this:
class Match:
def __init__(self, rest):
self.rest = rest if rest is not None else Null()
def match(self, text):
result = self._match(text, 0)
return result == len(text)
Match.rest requires every child class to have a helper method called _match that returns
the location from which searching is to continue. Match.match checks whether the entire
match reaches the end of the target string and returns True or False as appropriate.
Our new Null Object class looks like this:
class Null(Match):
def __init__(self):
self.rest = None
def _match(self, text, start):
return start
Null objects must be at the end of the matching chain, i.e., their rest must be None, so
we remove the rest parameter from the class’s constructor and pass None up to the parent
constructor every time. Since Null objects don’t match anything, Null._match immediately
returns whatever starting point it was given. Every other matcher can now pass responsi-
bility down the chain without having to test whether it’s the last matcher in line or not.
With these changes in place, our literal matcher becomes:
class Lit(Match):
def __init__(self, chars, rest=None):
super().__init__(rest)
self.chars = chars
def _match(self, text, start):
end = start + len(self.chars)
if text[start:end] != self.chars:
return None
return self.rest._match(text, end)
Lit’s constructor calls the constructor of its parent class to initialize the things that all
classes share, then adds the data specific to this class. It returns None for “no match” or
whatever self.rest returns If this object’s rest is an instance of Null, this result will be
the index after the overall match.
48.
4.3 Summary 31
Asbefore, the matcher for * checks what happens if it matches an ever-larger part of
the target string:
class Any(Match):
def __init__(self, rest=None):
super().__init__(rest)
def _match(self, text, start):
for i in range(start, len(text) + 1):
end = self.rest._match(text, i)
if end == len(text):
return end
return None
(The exercises will ask why loop has to run to len(text) + 1.) Finally, the either/or matcher
that prompted this refactoring becomes:
class Either(Match):
def __init__(self, left, right, rest=None):
super().__init__(rest)
self.left = left
self.right = right
def _match(self, text, start):
for pat in [self.left, self.right]:
end = pat._match(text, start)
if end is not None:
end = self.rest._match(text, end)
if end == len(text):
return end
return None
Looping over the left and right alternative saves us from repeating code or introducing a
helper method. It also simplifies the handling of more than two options, which we explore
in the exercises.
Crucially, none of the existing tests change because none of the matching classes’
constructors changed and the signature of the match method (which they now inherit
from the generic Match class) stayed the same as well. We should add some tests for
Null, but we have now met our original goal, and as the exercises will show we can easily
add matchers for other kinds of patterns.
4.3 Summary
Figure 4.5 summarizes the key ideas in this chapter; we will see the Null Object and Chain
of Responsibility design patterns again.
49.
32 4 MatchingPatterns
glob pattern text
matchers
class base class
Chain of
Responsiblity
is a
matches
written as
made up of
each implemented as
of
cooperate
using
Null Object
ends
with
Figure 4.5: Regular expression matching concept map.
4.4 Exercises
Looping
Rewrite the matchers so that a top-level object manages a list of matchers, none of which
know about any of the others. Is this design simpler or more complicated than the Chain of
Responsibility design?
Length Plus One
Why does the upper bound of the loop in the final version of Any run to len(text) + 1?
Find One or More
Extend the regular expression matcher to support +, meaning “match one or more charac-
ters”.
Match Sets of Characters
1. Add a new matching class that matches any character from a set, so that
Charset('aeiou') matches any lower-case vowel.
2. Create a matcher that matches a range of characters. For example, Range(a, z)
matches any single lower-case Latin alphabetic character. (This is just a convenience
matcher: ranges can always be spelled out in full.)
3. Write some tests for your matchers.
Exclusion
1. Create a matcher that doesn’t match a specified pattern. For example, Not(Lit(abc))
only succeeds if the text isn’t “abc”.
2. Write some tests for it.
50.
4.4 Exercises 33
MakeRepetition More Efficient
Rewrite Any so that it does not repeatedly re-match text.
Multiple Alternatives
1. Modify Either so that it can match any number of sub-patterns, not just two.
2. Write some tests for it.
3. What does your implementation do when no sub-patterns are specified?
Returning Matches
Modify the matcher so that it returns the substrings that matched each part of the expres-
sion. For example, when *.txt matches name.txt, the library should return some indication
that * matched the string name.
Alternative Matching
The tool we have built implements lazy matching, i.e., the * character matches the shortest
string it can that results in the overall pattern matching. Modify the code to do greedy
matching instead, and combine it with the solution to the previous exercise for testing.
5
Parsing Text
• Parsingtransforms text that’s easy for people to read into objects that are easy for
computers to work with.
• A grammar defines the textual patterns that a parser recognizes.
• Most parsers tokenize input text and then analyze the tokens.
• Most parsers need to implement some form of precedence to prioritize different
patterns.
• Operations like addition and function call work just like user-defined functions.
• Programs can overload built-in operators by defining specially-named methods that
are recognized by the compiler or interpreter.
Terms defined: abstract syntax tree, concrete class, CSV, grammar, JSON,
operator overloading, parser, token, tokenizer, YAML
We constructed objects to match patterns in Chapter 4, but an expression like
2023-*{pdf,txt} is a lot easier to read and write than code like Lit(2023-,
Any(Either(pdf, txt))). If we want to use the former, we need a parser to con-
vert those human-readable strings into machine-comprehensible objects.
Table 5.1 shows the grammar our parser will handle. When we are done, our parser
should be able to recognize that 2023-*.{pdf,txt} means the literal 2023-, any charac-
ters, a literal ., and then either a literal pdf or a literal txt.
Please Don’t Write Parsers
Languages that are comfortable for people to read and write are usually difficult for
computers to understand and vice versa, so we need parsers to translate the former
into the latter. However, the world doesn’t need more file formats: please use CSV,
JSON, YAML, or something else that already has an acronym rather than inventing
something of your own.
5.1 Tokenizing
Most parsers are written in two parts (Figure 5.1). The first stage groups characters into
atoms of text called “tokens“, which are meaningful pieces of text like the digits making up
a number or the letters making up a variable name. Our grammar’s tokens are the special
characters ,, {, }, and *. Any sequence of one or more other characters is a single multi-
letter token. This classification determines the design of our tokenizer:
35
53.
36 5 ParsingText
Meaning Character
Any literal character c c
Zero or more characters *
Alternatives {x,y}
Table 5.1: Glob grammar.
1. If a character is not special, then append it to the current literal (if there is one) or start
a new literal (if there isn’t).
2. If a character is special, then close the existing literal (if there is one) and create a token
for the special character. Note that the , character closes a literal but doesn’t produce
a token.
The result of tokenization is a flat list of tokens. The second stage of parsing assembles
tokens to create an abstract syntax tree (AST) that represents the structure of what was
parsed. We will re-use the classes defined in Chapter 4 for this purpose.
Before we start writing our tokenizer, we have to decide whether to implement it as a
set of functions or as one or more classes. Based on previous experience, we choose the
latter: this tokenizer is simple enough that we’ll only need a handful of functions, but one
capable of handling a language like Python would be much larger, and classes are a handy
way to group related functions together.
The main method of our tokenizer looks like this:
def tok(self, text):
self._setup()
for ch in text:
if ch == *:
self._add(Any)
elif ch == {:
self._add(EitherStart)
elif ch == ,:
self._add(None)
elif ch == }:
self._add(EitherEnd)
elif ch in CHARS:
self.current += ch
else:
raise NotImplementedError(fwhat is '{ch}'?)
self._add(None)
return self.result
This method calls self._setup() at the start so that the tokenizer can be re-used. It
doesn’t call self._add() for regular characters; instead, it creates a Lit entry when it
2023-*{pdf,txt}
[Lit, 2023-]
[Any]
[Either, pdf, txt]
Lit
2023- Any
Either
pdf txt Null
tokenizer parser
Figure 5.1: Stages in parsing pipeline.
54.
5.1 Tokenizing 37
A
*
{
B
,
C
}
A*{B,C}
currentresult
A [ ]
ch
[[Lit, A], [Any]]
[[Lit, A], [Any], [EitherStart]]
B [[Lit, A], [Any], [EitherStart]]
[[Lit, A], [Any], [EitherStart], [Lit, B]]
C [[Lit, A], [Any], [EitherStart], [Lit, B]]
[[Lit, A], [Any], [EitherStart],
[Lit, B], [Lit, C], [EitherEnd]]
Figure 5.2: Steps in tokenizing a string.
encounters a special character (i.e., when the current literal ends) and after all the input
has been parsed (to capture the last literal).
The method self._add adds the current thing to the list of tokens. As a special case,
self._add(None) means “add the literal but nothing else” (Figure 5.2):
def _add(self, thing):
if len(self.current) 0:
self.result.append([Lit, self.current])
self.current =
if thing is not None:
self.result.append([thing])
Finally, we work backward to initialize the tokenizer when we construct it and to define
the set of characters that make up literals:
CHARS = set(string.ascii_letters + string.digits)
class Tokenizer:
def __init__(self):
self._setup()
def _setup(self):
self.result = []
self.current =
A Simple Constant
The code fragment above defines CHARS to be a set containing ASCII letters and digits.
We use a set for speed: if we used a list, Python would have to search through it each
time we wanted to check a character, but finding something in a set is much faster.
Chapter 17 will explain why the word “ASCII” appears in string library’s definition of
characters, but using it and string.digits greatly reduces the chances of us typing
abcdeghi...yz rather than abcdefghi...yz. (The fact that it took you a moment
to spot the missing letter ‘f’ proves this point.)
We can now write a few tests to check that the tokenizer is producing a list of lists in
which each sub-list represents a single token:
venomous snake there,but the only viper, except Russell’s viper, a
much larger snake.
Only twice could I observe the toxic effects of the Echis carinata at
present (1882) in the collection; both cases being in hot weather. It has
so far conformed to circumstances in England, as to consent to dine on
small white mice, failing scorpions. In the first case it struck the mouse
savagely as soon as it was dropped into the cage, and the mouse died
in less than two minutes. Echis approached it stealthily and timidly, but
having at last got courage to seize it, ate it very quickly; and as the
snake moved and dragged it, the mouse appeared to be quite stiff in
that short time. On the second occasion, it bit a mouse on the leg, and
it was five minutes dying. At first only the leg was paralyzed; then a
spasm followed, and the mouse fell over and lay extended flat and still
as if dead; but presently a spasmodic convulsion followed. It again
appeared to be dead, and the little viper approached; but on a very
slight spasm receded swiftly, not once taking its eyes off the mouse,
which was dying slowly. The viper was at least five minutes swallowing
this, and as if it did not much care about it. One must argue, therefore,
that the charge of venom had been scantily expended, as the difference
between this and the previous victim was remarkable. Echis poison has
been seen to take instantaneous effect. The small Vipera atropos from
the South African mountains is also astoundingly virulent. One in the
collection in 1881 struck a mouse as soon as it arrived, and death
occurred in fifty seconds by the watch. A large store of poison must
have accumulated during its journey and since its previous meal.
One more African snake must be mentioned before I conclude the
painful duty of describing the inevitable—though happily short—
sufferings inflicted by venomous serpents.
Three young Najas, the well-known Ring Halsschlange of South
Africa, were brought in the spring of, I think, 1877. They were very
black and very shy, and for a long while one could see nothing more of
them than three little heads in a row peeping out from under their
blanket, and watching with their large round black eyes, but vanishing
like a shot at your approach. ‘They cut away the moment you go near
them,’ said the keeper. When they did give us an opportunity of looking
at them, we found that one was quite black, and another was speckled
with white; they erected their heads and distended their necks
57.
defiantly. Their eyeshad a white rim round them, and were bright and
undeniably beautiful, even though belonging to a venomous snake.
Whether because they were young and inexperienced, or naturally
stupid, I could not decide but of all the snakes none ever went so
awkwardly to work in feeding, or put their victims to such unnecessary
torture, as did these ridiculous little Najas. The feeding observations
were made in August, when they had grown considerably, and had
become accustomed to their home. They seemed to bite the prey
anywhere without much effect, sometimes retaining it in their mouth,
and at other times beginning at once to eat it. One frog was ten
minutes from the time it was struck until it was swallowed, and for no
reason beyond the feeder’s awkwardness. The little snake began at a
hind leg, and not being able to get the frog into its mouth, put it down
and began again at the side, but with no better result, the legs being in
the way. Then he gave it up and let the frog go, and presently his
comrade struck the half-dead thing and took five minutes to eat it. One
might decide from this that frogs were not their natural food; but with
very young sparrows the same mismanagement was observable. The
bird was awkwardly bitten on the tip of the wing, and the snake held it
helplessly for a quarter of an hour while the bird was struggling
violently. Not getting good hold, the snake put it down and began
again, so that the poor little sparrow was twenty minutes in being
swallowed, gasping to the last, and evidently only very feebly poisoned.
One of the Najas bit his companion, and held on for about ten minutes,
and for no reason whatever that one could discern. In no other
venomous snakes have I seen such prolonged suffering caused by such
stupidity or bungling as in those young African ‘Ring Hals.’ Their fangs
are, however, exceedingly short, as I found on examining a dead one,
and this may account for the slow effect of them.
Three other heads were often seen in a row peeping out, but
belonging to harmless ‘glass snakes,’ and there was intelligence in their
looks; for they recognised the keeper, and advanced to the glass
whenever he passed, asking for their dinner as plainly as little snakes
could ask. A Heterodon exhibited equal intelligence when it was dinner-
time, and sprang at the glass when he saw the keeper coming. Some of
the pythons display intelligence too, on feeding days, but of quite an
epicure form. One day in May 1876, on remarking that the pythons
58.
were disinclined toeat, Holland said ‘they were waiting for young
ducks,’ only elderly birds being in their cage at the time. Even in
summer they don’t eat the old ducks so eagerly, because the large,
hard quills annoy them. A bunch of these quills passes undigested. Hair
or feathers in a desiccated mass pass through the snakes, and
occasionally, when they are not in health, digestible but undigested
substances too, also the beaks of the ducks.
Vegetable substances have been found in snakes, from which it has
been argued that they sometimes eat vegetables. But it rather argues
that they don’t digest vegetables, which have probably been swallowed
in the stomach of a rabbit or some other herbivorous animal that they
have caught.
An indifference to food was noticeable in the snakes in ungenial
weather. One cold, raw, foggy day in October 1873, a python caught a
duck and partially coiled it, but so feebly that the bird, after passively
submitting for a time, at last disengaged her feet and walked away to
shake herself, and then turn and stare as if to discover what possibly
had kept her there.
A similar disinclination to exert themselves was seen that same
chilly day in the largest cage, where were three large pythons. One of
them having killed a duck, could not get a satisfactory hold of its head,
and let go repeatedly. Another held a duck, but not to crush it or hurt
it; for it, like the one above named, only gazed deliberately around, and
as if asking the meaning of its detention. A third duck was put into the
den for the third python, who, however, only lazily stared at it and
made no attempt to seize it; while the bird gazed in astonishment at
the one in the embrace of the other snake, as if to inquire, ‘What are
you doing there?’ Presently this duck also got away, and was again
caught and only partially coiled. The python seemed too large and fat
to constrict so small a thing as a duck. It was like tying up a pill-box
with a rope. Some of the spectators expressed satisfaction that the
duck was not more tightly coiled, and hoped it would succeed in getting
away (the duck was not worth two shillings, the python could not be
bought for twenty pounds), and were far more horrified when a
vigorous constrictor caught and killed its prey in one flash, as when an
extended watch-spring flies back to its original position. But a half-
constricted creature does suffer, and happily this does not often occur,
59.
the chilly weatherthat day diminishing ophidian energy considerably. A
gentleman, disappointed because they did not eat, and wishing to
assign some reason for such unaccountable abstinence, remarked to his
friend, ‘I have an idea they sting themselves.’
Watching these gigantic ophidians on one of those half-wintry days,
it happened that two of them were lazily gliding, partially hidden by
their blanket, and with neither heads nor tails visible, so that the two
bodies seemed as only one snake. Two youths stood watching and
vainly endeavouring to calculate the numbers of feet or of yards which
were entwined and entwisted in those moving coils. Portions and loops
of two other pythons in the same cage were visible beyond the rug, but
only one head of all the four snakes was to be seen; and to distinguish
to which of the gliding, shining curves that head belonged, was
impossible. ‘It seems to me that snake’s such a length that he doesn’t
know the other part belongs to him,’ remarked one boy to his friend.
‘I don’t think he knows where it is,’ returned the other boy
sympathetically.
Not a little are the keepers sometimes tried in replying to the
inquiries of visitors desirous of improving their minds. Let me repeat
one or two conversations overheard on those Fridays.
‘Is that duck put in there for the snake to eat?’ asked a respectably
dressed man of the keeper on one of those autumnal days, when a
duck sat pluming itself as if settling itself for the evening.
‘Yes, sir,’ replied the keeper.
‘Will he swallow it whole?’
‘Yes, sir.’
‘Choke him! I should think?’
‘No, sir; no—it won’t choke him.’
The man studied the duck, and studied the size of the python’s head
and throat for some time. The duck apparently going to rest, but not
quite reconciled at so many persons intruding upon her, the man looked
disappointed, and again began:
‘Now is that duck charmed, sitting there?’
‘I should think, sir, she was not at all charmed with the prospect,’
sedately replied Holland.
60.
‘Does that duckknow it’s going to be eaten?’ then inquired the man
after fresh scrutiny.
‘No, sir,’ returned the keeper with the utmost gravity.
‘That snake don’t seem to be hungry,’ then said the disappointed
observer.
‘No, sir. He’ll eat well enough next Friday. He’s going to change his
skin.’
‘Oh!’ said the man to a boy by his side, satisfied, though still rather
puzzled, ‘that snake’s going to change his skin next Friday.’
Though there are always on an average fifty snakes in the Reptile
House, and on an average each casts its coat three times a year, the
visitors are for the most part much mystified about this phenomenon. A
snake that had just completed a new toilet had a portion of the old
slough still adhering to its tail, when a boy drew attention to it, saying,
‘Papa, that snake is all ragged and torn on its tail.’
‘Yes, my dear, it is casting its tail.’ Papa must have been reading
Aristotle, who wrote: ‘Tails, also, of serpents and lizards when cut off
are reproduced.’ With regard to the reproduction of their eyes, Aristotle
spake more cautiously. ‘It is reported that the eyes of serpents, if dug
out, will be reproduced.’ But, on the contrary, the eyes of snakes are
easily injured, and not easily healed; snakes are therefore frequently
seen partially blind. As need scarcely be said, only lizards ‘reproduce’ a
tail that has been accidentally abridged; and the repair is after all only a
boneless one. The truncated member gradually heals, and by and by a
short point is again formed, but can always be recognised as a
repaired, and not the original, tail; and as far as I have been able to
observe, viz. for three or four months, no bone was reproduced.
Probably also a snake’s tail might heal in the same way, and to a casual
observer appear quite perfect; but the anatomical structure in either
case would not, I imagine, be restored.
That boy was not far wrong when he said he thought the python did
not know which was its own tail. At all events, it is not endowed with
much external sensation, as one might judge by the way in which the
rats and guinea-pigs take liberties with it. This must be owing to the
thickness of the cuticle, because, as we have seen in the constricting
snakes, there is keen muscular sensibility in the tail. I may cite an
61.
instance of eachcase. One day a young rabbit caught hold of a small
python with its teeth and held firmly on. The reptile was moving across
the cage, and did not appear to feel any hindrance. Indeed, being
much the stronger of the two, the persistent bunny was compelled to
hop along at the same pace, still holding on by its teeth. But presently,
from the position of the snake, the rabbit was obliged to let go, when it
next caught hold of the tip of the python’s tail, and again holding tight,
hopped after the retreating reptile as if enjoying the joke. In this case I
do not think the snake was conscious of the insult, as perhaps the
rabbit had hold of the skin only.
On the other occasion a guinea-pig was biting a coiled and passive
constrictor, Python sebœ. The snake wished to be quiet, but piggy got
among its coils and worried it, hopping over it and biting its tail. The
python on this, moving only the end of its tail, pushed away the
guinea-pig, which soon returned to the attack. The snake again gave
the little animal a caudal hint that his fidgeting was annoying; but as
the guinea-pig did not take the hint, and still nibbled and teased the
snake, the latter with two coils of the tail put an end to the annoyance,
not once turning its head, but just tucking up its persecutor in the end
of its tail as you might tuck up a parcel under your arm. The python
was not hungry, and took no more notice of the offender, though thus
effectually punishing the offence with the last two feet of its practical
tail. Could we suppose such a quality as muscular intelligence, we
might think the tails of those constricting snakes were surely endowed
with it. As in other instances already described in chaps. xi. and xii., the
eyes took no part in directing the movements of the snake; the whole
nine or ten feet of the animal remaining passively coiled, while only the
extremity of the tail exerted itself. When reptiles are in a partially torpid
condition, their sensations are slow; when hibernating, they are
reduced to a minimum. At such times, the creatures being half dead,
they may be maimed or injured without any apparent effect. Rats have
been known to attack and nibble snakes under these circumstances,
and even to eat bits out of them, the snakes being at the time
unconscious of injury, though possibly dying from the after effects.
A good deal of very interesting matter might be added on the
economics of the reptilian ménage, the mode of ventilating and
warming it, the cost of its larder, and the best means of preserving the
62.
health of theinmates. There are, besides, some incidental experiences
not devoid of sensationalism in connection with snake guardianship, but
my own herpetological experience does not extend beyond the keeping
of pet lizards, including blindworms. I may add a word, however, in
reply to some often-heard lamentations of disappointed spectators who
object to the coverlets, after sometimes waiting in vain to see the
snakes emerge from beneath them.
‘Those horrid blankets! Why not give the snakes moss or hay in
their cages? or turf and sand and dead leaves? Much more natural for
them than those woollen rugs.’
I, too, may have echoed such plaints until a better comprehension
of ophidian nature showed the wisdom of what is certainly a somewhat
disappointing arrangement. And those who have honoured these pages
with a patient perusal, and discovered the nervous timidity and
sensitiveness of these reptiles, their proneness to reject or to disgorge
their food, to injure themselves or each other when molested, not to
mention the danger of meddling with the venomous kinds and the easy
escape of the swifter snakes, will admit the importance of providing
them with such retreat and shelter as can be most speedily arranged,
and which will secure the least annoyance to the terrified serpents
while the keepers are doing their best to preserve order and
cleanliness.
The allusion to lizards tempts me to add a word or two on the
exceptional species which has lately become an inmate of our
Zoological Gardens. There are certain features in it so much in common
with viperine snakes, that I may be pardoned for dragging a lizard into
these pages. I allude to the Heloderm (Heloderma horridum) from
Mexico, presented to the Zoological Society in July 1882 by Sir John
Lubbock. Its advent was an event in reptilian annals; and being
surrounded by a halo of curiosity, it claims a passing notice. We have
been at some pains to exonerate saurians from the evil character which
our ancestors were apt to give them; but suddenly—and to the surprise
of even some herpetological authorities—there comes a lizard that with
one grip of his jaws caused a frog to fall dead in a moment, and a
guinea-pig in three minutes, the symptoms appearing to be the same
as those produced by deadly snakes. The Heloderm is ‘said’ to be
furnished with poison glands in both jaws! But until a dead specimen
63.
has been furtherexamined and described, the signification of ‘poison
gland’ must be restricted. Its teeth—many and strong—are grooved
with a deep furrow; its salivary glands are largely developed; and under
excitement a thick, acrid secretion flows abundantly from its jaws. Yet
so far as present observations enable us to form an opinion, the reptile
does not use these formidable teeth to secure its prey, or even in
feeding. It did not devour the victims of its bite, nor has it since killed
any creature for the express purpose of eating it. Up to the date at
which I write (Oct. 1882), eggs have formed its chief diet, varied by an
occasional dead mouse. Now it certainly does not require deeply-
grooved teeth and venomous saliva to bite raw eggs and dead mice.
Nor does the noxious secretion flow continuously from its gums in
repose, but abundantly so when irritated.
Though a stranger in England, this lizard was known more than two
hundred years ago. Hernandez, in his Nova Animalium Mexicanum,
published at Rome in 1651, described its bite as ‘hurtful, but not
deadly;’ and that it was ‘more dreadful in appearance than reality.’ Its
Mexican name, Acaltetepon, is (or was then) applied to all large and
suspicious-looking lizards. Scorpione is its modern name. As Heloderma
horridum was awarded plenty of space in the journals at the time of its
arrival, full accounts of it will be found elsewhere; it is introduced here
merely as one of the venomous reptiles that form the chief subjects of
this chapter, and to trace its analogy with them. In its slow, stealthy
movements there is the same striking contrast between the Heloderm
and most other lizards, that there is between the deadly vipers and the
active colubrine snakes; and the inquiry suggests itself, Can the venom
elaborated in their system so act upon themselves as to produce this
habitual lethargy? Drowsiness and coma are almost invariable effects of
snake venom in the blood, and why is it that the deadly serpents are so
constitutionally different from others? The Heloderm has a round,
heavy tail, of no service to it in swimming, and short, weak fingers, ill
suited to climbing; and it passes its lethargic existence on the sandy
plains of Mexico, manifesting in its actions, or rather in its inactivity and
stealthiness, a conscious timidity and cowardice. Motionless for hours,
with an impulse to retreat if molested, but attempting to bite if
angered, its noxious saliva would seem to be rather protective than
aggressive. It may have formidable enemies at home; and by all we see
64.
of it here,it does not use its teeth as a means of obtaining food. In this
respect, therefore, it is an exception to deadly serpents, and cannot
take its venom into its stomach as they do. And, again, the remarkable
development of its tongue suggests a peculiarity of food. In lapping the
egg, the action of it is apparently perfected by practice; the tongue is
twisted, extended, twined under, then over, now used as a shovel, a
scoop, or a broom, as occasion requires. It is the very reverse of what I
noticed in some other lizards feebly lapping up an egg (see p. 71), for
in a most expeditious manner does Heloderm cause its raw eggs to
disappear.
A word à propos of its name horridum, supposed by many to refer
to its objectionable qualities. Unfortunately the word ‘horrid’ has almost
entirely lost its original signification and become mere slang in English.
But when Wiegmann assigned it the name of Heloderma horridum in
1829, ‘horrid’ was understood according to its original meaning, from
horridus, rough, rugged, etc.; and as this reptile has a remarkable skin,
dotted over with little prominences, like knobs or warts (hence its
generic name, Heloderma, warty skin), there can be but little doubt as
to the intention of horridum. In a communication to Knowledge (Sept.
29), I ventured to call this the ‘Warty-skinned Lizard,’ in consequence of
the confused accounts of it which have appeared in print. There are
several other warty-skinned or ‘tuberculous lizards.’ The specific
horridus, as applied to the South American Crotalus, also signified its
terrible or dreadful character, and not the ‘horrid’ which spectators
apply indiscriminately to snakes and their blankets.
With the rapid advance of ophiology comes the splendid new home
for snakes which will shortly grace our Zoological Gardens; and in
taking leave of my readers, I cannot offer them a kinder wish than that
their visits there to observe the snakes will be productive to them of as
much pleasure as has been mine to describe them.
65.
INDEX.
LIST OF ABBREVIATIONS.
Am.,American; Ass., Association; Br., British; co., cobra; Con.,
Convention; cons., constrictor; cro., crotalus; cy., cyclopædia; C.M.Z.S.,
Corresponding Member of the Zoological Society; F.Z., Fellow of the
Zoological Society; hist., history; nat., natural; N.Y., New York; py., python;
Soc., Society; s., snake; ss., snakes; ser., serpent; v., viper; vs., vipers;
U.S., United States; z., zoological; Z.G., Zoological Gardens; Trans.,
Transactions; Proc., Proceedings.
A
‘Aberfoyle’ (the barque), sea-ser. seen from, 249.
Abnormal development, of teeth, 67;
of sea-sers., 265, et seq.;
of hair, 302;
two heads, 189.
Abnormal, health in captivity, 84, 440, 450, 457, 565;
gestation, 437, 439, et seq., 459, 466, 505.
Academy of Sciences, Paris, 444.
Acaltetepon, the, 590.
‘Account, of the Beasts in Virginia’, 280;
of the rattlesnake, 275.
Acrobats, 181, 196, 198, 214, et seq., 219, 239, 472, et seq.
Adaptation, of organization, 70, 135;
66.
of habits totemperature, 159, et seq. (see hibernation);
of coils, 200, 204;
of ribs in progression, 207;
of form, 215.
Adaptive development, of head bones, 31, 34;
of spine bones, 70;
of windpipe, 132;
of salivary glands, 342, 350, 537, 557;
of teeth, 348, 350, 364, 408.
Adipose tissue, 394.
Admiralty, the, report of a sea-ser. to, 255, 259.
Advance of Ophiology, 3, 21, 75, 81, 191, 273, 353, 363, 372, 443, 485, 515,
et seq.
Ælian, an error traced to, 191.
Æsculapius, his remedies, 522.
Africa, range of sea-s., 236.
Air-bladder, the, 262.
Air-breathers, 44, 146, 222, 265.
Air-tube, 132, 135, 137, et seq. (see glottis, respiration).
‘Albert Nyanza’, the, 358.
Alceste, Voyage of the, 111, et seq.
Alcohol a popular remedy, 548, et seq.
Aldrovandus, 102;
his work, 272.
Alexander the Great, 407 (see Bucephalus).
Alligators, 43, 261.
Amazon, the Jararaca, 421;
Wallace’s ‘Travels in’, ib.;
anaconda from, 455.
‘American Agriculturist’, the, 485.
American Ass. for the Advancement of Science, 485, et seq., 572;
secretary to, 490.
American colonies, the, 284.
‘American Naturalist’, the, 93, 151, 308, 310, 450, 485, et seq., 490;
67.
editor of, 485(see Putnam).
American sea-ser., 252.
Ammonia an approved remedy, 533, 547.
Amphibia, the, 46.
Amsterdam, py. at, 437.
Anatomy, of head, 30;
of jaws, 31, 34;
of the spine, 209;
of the rattlesnake, 275.
‘Anatomy of the Vertebrates’, 67, 119, 131, 143, 147, 180, 184, 196, et seq.,
212, 336 (see Owen).
‘Anecdotes of Serpents’, 216, 523.
‘Animal Biography’, 134.
Animal kingdom, 51.
‘Animal Physiology’, 147, 210 (see Carpenter).
‘Animal Physiology’ (Roget’s), 120, 195 (see Roget).
‘Animal World’, the, 169.
‘Animalium Mexicanum’, 190, 590.
‘Annales des Sciences Naturelle’, 78, 91, 442, 444.
Antennæ of insects, 118, 120, 126, et seq.
Antidotes, 275;
Fayrer’s definition of, 533;
various reputed ones, ib., 534, 539, 547, et seq.
Apodal, ss. are, 206.
Appendages, mythical, 101;
caudal, 172, et seq.;
epidermal, 315;
‘horns’, 320;
crest, 325;
tentacular, 326;
as ‘claws’, ib. (see rattle, epidermis).
Aquatic ss., 53, 145, 150, 174, 221, 225, et seq., 233, et seq., 401, 423, 453,
495, 496.
Arabs chew herbs, 525.
68.
Archeopteryx, 44.
Aristotle, hisname for reptiles, 45;
bite of s., 96;
marine ss., 244;
vs., 432, et seq.
Arius, the, 489.
Armadillo, 413.
Association, Am., 485, et seq.
Association, Br., 42.
Atlantic, the, 249;
sea-ss. in, 238;
sea-ser., 252.
Atrophied limbs, 326.
Atter, ætter, 479.
‘Aunt Judy’s Magazine’, 21, 72, 303, 312, 333, 470.
Aural: powers of anguis fragilis, 476;
apparatus of ground ss., 527.
Australia, s.-hunting in, 167;
sea-s., 236, 246.
‘Australia, Snakes of’ (see Krefft).
Axolotl, 44.
B
Bacon, Lord, quoted, 49, 57;
a poor naturalist, 99.
Bagrus, the, 489.
Baird of U.S. on the bull-s., 502.
Baker, Owen, a sea-ser. seen by, 257, et seq.
Baker, Sir Samuel: v. fangs, 357.
Balance of Nature, 17, 57.
Balfour’s ‘British India’, 74, 76, 86.
Balfour’s ‘Cyclopædia of India’, 86.
Bancroft, H. H., ser.-worship, 514.
69.
Banks, Sir Joseph,action of ribs, 207;
letter to Waterton, 418.
Bard of Avon, the, 97.
Bartlett, Mr. A. D., on the sea-ser., 255, 261, et seq.;
presenting a v., 322;
buying an anaconda, 455.
Barton, Benjamin Smith, on the Cro., 299.
Bartram (Mr., of U.S.): a ‘roaring’ s., 155;
‘cluster of teeth’, 371.
Bates (H. W.): a cannibal s., 39;
Jararaca, 421.
Batrachia, the, 51.
Battues of ss., 286, et seq., 289.
Beal (J. W., of U.S.): sound of the rattle, 309.
Beaufort, Duke of (A.D. 1709), 173.
Beaumont and Fletcher, quotation from, 487.
Beauvoir, Palisot de, his testimony, 488.
Bell, Prof. Thomas, 4;
food of ring-s., 74;
his tame s., 76;
his ‘British Reptiles’, ib.;
a trustworthy herpetologist, 78;
ss. drink, ib., 91;
editor ‘Zoological Journal’, 140;
on anguis fragilis, 453;
gestation of anguis fragilis, 466;
sloughing of anguis fragilis, 481;
the maternal refuge, 493.
Bellowing s., 155, et seq.
Ben Jonson, 99.
Bengal, Bay of, range of sea-ss., 236.
Berkeley; rattlesnake remedies, 538.
Beverley, Colonel: a stinging tail, 174;
his ‘History of Virginia’, 281;
a severed head biting, 390.
70.
Bezoar, 523.
Bibron, 80(see Duméril).
Bingley, a boa feeding, 134.
‘Blackie’, 458, et seq.
Bladder, the swim-, 145.
Blake (Colonel, of Virginia), a rattlesnake, 284.
Blanket, swallowed, 35;
disappointing, 588.
Blowing: the, vi. 152;
as ‘puffing’, 148;
like a bull, 155.
Bluets, 65.
Bond, Rev. H., his testimony, 491.
Bones: of the head, 30;
intermaxillary, 31;
of spine, 178, 198, 213;
in Deirodon, 67;
of tail, 183;
of ‘claw’, 220 (see jaw, vertebræ, tails, etc.).
Bonnat, 228, 383.
Borneo, Crotalidæ in, 386.
Boston, U.S., sea-ser. off, 251.
Buffon, 197;
his era, 383.
Bourbon, Isle of, py. incubating in, 444.
Bournemouth, lizards at, 164;
a capture, 458.
Bourrelets, of the rattle, 304.
Bowerbank, a two-headed s., 190.
Braden, J. G., Esq.: specimens of rattle, 306.
Brazil, tree s. in, 219;
‘Discourse’ of, 271;
names of Crotalidæ, 277;
specimens from, 339, 360;
71.
Jararaca in, 396;
‘Travels’in, 397;
experiments in, 406;
vernaculars of, 419, 423, et seq.
Breathing: irregular, 142, et seq.;
sometimes partial, 144;
suspension of, 145, 161, 168, 253;
as ‘puffing’, 149, 155 (see hibernation, respiration).
Bridgewater Treatise, 120.
British Ass., 42.
British Guiana, Hist. of, 420.
British India (see Günther, Fayrer, Balfour, Nicholson, etc.).
British Museum, 19, 131, 291, 377, 405, 418;
Dr. Günther’s catalogue of ss. at, 354.
‘British Reptiles’ (Bell’s), 76, 140, 453 (see Bell).
Brittain, Mr., evidence of, 504.
Broderip, Dr. J., a naturalist, 113;
his works, ib.;
on the larynx, 135;
quoted by Gosse, 112;
observations by, 134, et seq.
Browne, Sir Thomas: ‘Vulgar Errours’, 171;
a two-headed s., 191;
the maternal refuge, 487.
Bruton, Dr. Lauder, 552.
Buckland, Frank: his visit to Chelsea, 13;
a ‘flannel sausage’, 36;
the Coronella, 83, 435;
a waggish s., 104;
a mistake of, 115;
sensitiveness of tail, 183;
a sea-s., 238;
a ribbon fish, 250;
sea-ser., 255, et seq., 264;
two-headed s., 189;
reserve fangs, 346;
s. eggs, 437;
anaconda, 455;
72.
invited, 464;
obituary noticeof ‘Cleo’, 515.
Bull -frog, 156.
Bullen, George, Esq., of Br. Museum, 19.
Bulletin, U.S., Zoological ‘Reports’, 291, 309, 388.
Burman coast, sea-ss. range, 238.
Burrowing ss., 46, 53, 188, 459, 468.
C
Caledonia, New, sea-ss. at, 238.
Cannibalism, 37, 182, 199, 401, 562;
sometimes unintentional, 38, 572;
common among the Elapidæ, 567, 571.
Cañons, ss. in, 162.
Cantor, Dr. Theodore, sea-ss. drinking, 82;
quotes Schlegel, 90;
tongue of sea-ss., 125;
sea-ss. asleep, 169;
on pelagic sers., 235;
their poison, 243;
their food, 245;
number of species, 246.
Cape, the, sea ser. at, 252, 259.
Capybara, the, 229.
Carbolic acid, kills ss., 544;
a remedy for the venom, ib.
Carinate scales, 319 (see ‘keel’).
‘Carolina, History of’ (see Lawson).
‘Carolina, Natural History of’ (see Catesby).
Carpenter, Dr., F.R.S., etc., links and transitions in nature, 44;
his opinion of Bacon, 99;
lungs of ss., 143;
hibernation, 165;
length of spine, 210;
insalivation of food, 352;
73.
vegetable protectives, 539.
Carver,Jonathan, a ‘blowing’ s., 153;
large swarms of ss., 225;
the maternal refuge, 487.
Catesby, Mark: an egg robber, 63;
the ‘blowing v.’, 152;
‘Horn ss.’, 174, 189;
‘water vs.’, 226;
rattle-ss., 284;
‘hog-nosed’ ss., 410;
Indian remedies, 538.
Catlin, George, rattlesnake battues, 287;
an alarm, 391;
conjugal ss., 502;
Indian superstitions, 509.
Cats, their whiskers as feelers, 124;
tenacious of life, 559;
resist venom, ib.
Caudal, eloquence, 155, 179, 311, 587;
appendages, 170, et seq. 174 (see tail, rattle, etc.).
Caverns, the retreat of ss., 287, 443.
Caves, sacred, 509;
abode of ss., 124, 162, 166, 288.
Centipedes, legs of, 212.
‘Ceylon, History of’ (see Tennant).
Chalk epoch, 262.
‘Challenger’, voyage of the, 262.
Chambers: ‘Anecdotes of Serpents’, 216, 523;
editor of the ‘Journal’, 23;
observations at the Z.G., 217;
the ‘Miscellany’, 523.
Chancery, ss. in, 13, 515.
Chandernagor, travels in, 444.
Charas, Moyse: his work, 273;
he ‘grovels’ for fangs, 359, 372;
experiments on vs., 371;
74.
a ‘nursery’ offangs, ib.;
knew of the mobility of fangs, 372.
Charming, Sir H. Sloane on, 281;
its origin, 515, et seq., 578, 585 (see fascination).
Charms, 281;
in s. relics, etc., 509, et seq.
Chase, a, with a s., 214.
Chased by a s., 185.
Chateaubriand’s descriptions of ss., 153, 175, 197, 307;
the maternal asylum, 487.
Chelonia, 51.
Chelsea, tame ss. at, 13, 27, 515, 525.
Chicago, observations at, 496.
Chimaphila, 65.
Chinese mythologies, 509.
Chittagong, rare beast from, 261.
Chordæ vocales, 147.
Circulation of blood, 56;
checked by cold, 161;
renewed by warmth, ib.;
moisture essential to it, 162 (see hibernation, respiration, etc.).
‘City of Baltimore’, the, sea sers. seen from, 249.
Clarke’s translation of Der Hoeven’s ‘Handbook of Zoology’, 118.
Classification, 50;
at present defective, 51;
five principal groups of ss., 53;
by dentition unsatisfactory, 354, et seq.;
difficulties occurring in, 413, 421;
Krefft on, 423 (see nomenclature).
Clayton, Mr. J., the ‘tayle’ of rattlesnakes, 280.
Climbing, 180, et seq., 196, 214, et seq., 230;
of sea-ss., 238;
of Anguis fragilis, 475, 482.
Cockburn versus Mann, 13.
75.
Coiling, the, 48;
inconstriction, 29, 203;
of tail, 182, 587;
in convolutions, 185;
for a spring, 198;
to substitute hands, 199, et seq., 206;
swiftness of, 200;
flexibility of, 218;
of the sea-ser., 257;
of Heterodon, 409, 570;
in repose, 447, 587;
of ‘Lizzie’, 472, 478;
of Natrix, 569;
of the Lacertines, 571 (see constriction);
before striking, 573.
Cold; ss. affected by, 143, 159, 165, 584, et seq.
Cold-blooded, 56;
why so, 142, 146, 159 (see hibernation, etc.).
Colours of ss., 10;
under excitement, 153, 572;
of tree ss., 219, 386;
of rattlesnakes, 270, 285;
of African vs., 321, 338, et seq.;
of ‘Bushmaster’, 417, et seq., 426;
variable in young ss., 424;
in other ss., ib.;
after moulting, 508 (see sloughing).
Combats between ss., 37, 199, 563.
Congress (U.S.), Government Commissions, 199, 376.
Constriction, 29, 199, 203, 214, 245;
of young boas, 216, 439, 446;
sometimes feeble, 583.
Constrictors, 14, 35, 38, 135, 141, 182, 198, et seq., 213, et seq., 258, 336,
438, 446, 454, 583.
Convention, a, of ss., 104;
U.S. on ss., 485, 505, 552, 572.
Cooke, M. C., ‘Our Reptiles’, 491;
editor ‘Science Gossip’, ib.;
a herpetologist, ib.
76.
Cooper, W. R.,‘Serpent Myths’, 514.
Cope, Professor, of U.S., 386.
Cotton, Dr., of Tennessee, his rattlesnake, 298.
Coues, Dr. Elliott, of U.S., a combat, 199;
sound of the rattle, 309;
development of rattle, 312;
action of fangs, 362;
cro. fang, 376, et seq.;
species of cro., 386;
virulence of bite, 390;
pigs not exempt, 394.
Council of Z. Soc., 78, 322.
Cows sucked by ss., 84, 478.
Coypu, 229.
Cragin, Mr. F. W., on Heterodon, 450.
Cranberry swamps, the Massasauga, 393.
Crepitaculum caude, the, 294.
Crispe, Dr. Edwards, F.Z.S., on the œsophagus, 490;
the maternal refuge, ib.
Crocodilia, the, 51, 261.
Crotalina, 292.
Crotalum, 292.
Crotchets mobiles, 353, 362.
Cruden, s. poison, 102.
Cruelty, 17, 169, 206, 321, 469.
‘Curiosities of Natural History’ (see Buckland).
Cuvier; his classification, 46;
his ‘boa’, 47;
he distinguished fangs, ib.;
his era, 90, 383;
quoted by Darwin, 176;
his ‘vs.’, 353;
incubation, 434, 456.
‘Cyclopædia of Anatomy’, 118
77.
‘Cyclopædia of India’,86.
‘Cyclopædia, the Penny’, 113, 120.
D
Dahomey,
ser. deities at, 514;
natives fearless of vs., 523.
Dalton, the ‘Bushmaster’, 420.
Danish vernaculars, 479.
Darwin; complex organisms, 44;
cro. mutus, 176;
a living fossil, 244;
the Platypus, 263;
survival of the fittest, 267;
on the rattle, 311;
atrophied limbs, 326, 452.
Daudin, 353, 383, 488.
Davidson, Mr. (R. N.), a sea-ser., 251.
Davis’ Lectures at the Z. G., 24, 51, 214, 413 (see Flower, Huxley, Mivart).
Dean, Dr., a sea-ser. seen by, 249.
‘Deccan Days’, 517 (see Frere, Miss).
Deer kill rattlesnakes, 394.
Deglutition; manner of, 30, 111, et seq., 132;
facilitated by saliva, 35, 109, 113;
in drinking, 92, et seq.;
Schlegel on, 362;
watched at the Z. G., 581, et seq.
De Kay; ‘Zoology of N.Y.’, 85.
Demerara, ‘Bushmaster’ there, 423.
Dentition, 34;
of Deirodon, 67, et seq.;
of sea-ss., 241;
of sea-ser., 266;
various forms of, 342, et seq.;
distinguishing names in, 347;
illus. of, 349, 355, 356, 360;
78.
Welcome to ourwebsite – the perfect destination for book lovers and
knowledge seekers. We believe that every book holds a new world,
offering opportunities for learning, discovery, and personal growth.
That’s why we are dedicated to bringing you a diverse collection of
books, ranging from classic literature and specialized publications to
self-development guides and children's books.
More than just a book-buying platform, we strive to be a bridge
connecting you with timeless cultural and intellectual values. With an
elegant, user-friendly interface and a smart search system, you can
quickly find the books that best suit your interests. Additionally,
our special promotions and home delivery services help you save time
and fully enjoy the joy of reading.
Join us on a journey of knowledge exploration, passion nurturing, and
personal growth every day!
ebookbell.com