Introduction
Programmers dream of
Abstraction, recursion, and
Typing really fast.
Cats stands for CS61A Autocorrected Typing Software.
In this project, you will write a program that measures typing speed. Additionally, you will implement typing autocorrect, which is a feature that attempts to correct the spelling of a word after a user types it. This project is inspired by typeracer.
Important submission notes: This project has three phases. You have two weeks for all of them. Note that mid-term exam is also coming soon! We recommend starting and finishing Phase 1 as soon as possible to give yourself adequate time to complete Phases 2 and 3, which can be more time consuming.
Project Structure
Below is a list of all the files you will see in the proj02-Code.zip
.
This project includes several files, but your changes will be made only to cats.py
and utils.py
.
cats
|-gui_files # A directory of various things used by the web gui.
|-tests # Your local `ok` tests for each problem.
|-data
| |-sample_paragraphs.txt # A file containing text samples to be typed.
| |-common_words.txt # A file containing common English words in order of frequency.
| `-words.txt # A file containing many more English words in order of frequency.
|-cats.py # The typing test logic.
|-gui.py # A web server for the web-based graphical user interface (GUI).
|-ucb.py # Utility functions for CS 61A projects.
`-utils.py # Utility functions for interacting with files and strings.
作业提示(怕你们看不懂英文):
- 先读题,再写代码。
- 部分题目提供了初始代码,如果你不想用可以直接删除,你也可以自己定义新的函数。
- 不要修改作业没有提到的函数或文件,否则你的分数可能受到影响(别问为什么OJ上没分数了)。
- 你应该在完成作业的过程中不断测试你的代码的正确性。但是不要每写一行就测试,给自己足够的思考问题的时间。
When students in the past have tried to implement the functions without thoroughly reading the problem description, they’ve often run into issues. 😱 Read each description thoroughly before starting to code.
For the functions that we ask you to complete, there may be some initial code that we provide. If you would rather not use that code, feel free to delete it and start from scratch. You may also add new function definitions as you see fit.
However, do not modify any other functions or edit any files not listed above. Doing so may result in your code failing our autograder tests. Also, please do not change any function signatures (names, argument order, or number of arguments).
Throughout this project, you should be testing the correctness of your code. It is good practice to test often, so that it is easy to isolate any problems. However, you should not be testing too often, to allow yourself time to think through problems.
Phase 1: Typing
We recommend you finish phase 1 in 3 days, which needs 50~100 lines code.
Problem 1 (100pts): choose
Throughout the project, we will be making changes to functions in cats.py
.
Implement choose
, which selects which paragraph the user will type.
It takes a list of paragraphs
(strings), a select
function that returns True
for paragraphs that can be selected, and a non-negative index k
.
The choose
function returns the k
-th paragraph for which select
returns True
.
If no such paragraph exists (because k
is too large), then choose
returns the empty string.
Hint: Index starts from
0
.
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 01 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 01
Problem 2 (150pts): about
Implement about
, which takes a list of topic
words.
It returns a function that can be passed to choose
as the select
argument.
The returned function takes a paragraph and returns a boolean indicating whether that paragraph contains any of the words in topic
.
Once we've implemented about
, we'll be able to pass the returned function to choose
as the select
argument, which will be useful as we continue to implement our typing test.
To make this comparison accurately, you will need to ignore case (that is, assume that uppercase and lowercase letters don't change what word it is) and punctuation. Additionally, only check for exact matches of the words in topic in the paragraph, not substrings. For example, "dogs" is not a match for the word "dog".
Hint: You may use the string utility functions in
utils.py
. You can reference the docstrings of the utility functions to see how they are being used.
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 02 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 02
Problem 3 (150pts): accuracy
Implement accuracy
, which takes a typed
paragraph and a source
paragraph.
It returns the percentage of words in typed
that exactly match the corresponding words in source
.
Case and punctuation must match as well.
"Corresponding" here means that two words must occur at the same indices in typed
and source
- the first words of both must match, the second words of both must match, and so on.
A word in this context is any sequence of characters separated from other words by whitespace, so treat "dog;" as all one word.
If typed
is longer than source
, then the extra words in typed
that have no corresponding word in source
are all incorrect.
If both typed
and source
are empty, then the accuracy is 100.0.
If typed
is empty but source
is not empty, then the accuracy is zero.
If typed
is not empty but source
is empty, then the accuracy is zero.
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 03 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 03
Problem 4 (100pts): wpm
Implement wpm
, which computes the words per minute, a measure of typing speed, given a string typed
and the amount of elapsed
time in seconds.
Despite its name, words per minute is not based on the number of words typed, but instead the number of 5 characters, so that a typing test is not biased by the length of words.
The formula for words per minute is the ratio of the number of characters (including spaces) typed divided by 5 (a typical word length) to the elapsed time in minutes.
For example, the string "I am glad!"
contains three words and ten characters (not including the quotation marks).
The words per minute calculation uses 2 as the number of words typed (because 10 / 5 = 2).
If someone typed this string in 30 seconds (half a minute), their speed would be 4 words per minute.
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 04 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 04
Time to test your typing speed!
You can use the command line to test your typing speed on paragraphs about a particular topic.
For example, the command below will load paragraphs about cats or kittens.
See the run_typing_test
function for the implementation if you're curious (but it is defined for you).
$ python cats.py -t cats kittens
You can try out the web-based graphical user interface (GUI) using the following command.
(Use ctrl+C
or command+C
on your terminal to quit the GUI after closing the browser tab.)
$ python cats_gui.py
Congratulations! You have finished Phase 1 of this project!
Phase 2: Autocorrect
In the web-based GUI, there is an autocorrect button, but right now it doesn't do anything. Let's implement automatic correction of typos. Whenever the user presses the space bar, if the last word they typed doesn't match a word in the dictionary but is close to one, then that similar word will be substituted for what they typed.
We recommend you finish phase 2 in 5 days, as you will learn how to prune recursion(剪枝).
Problem 5 (200pts): autocorrect
Implement autocorrect
, which takes a typed_word
, a list of all word_list
, a diff_function
, and a limit
.
If the typed_word
is contained inside the word_list
list, autocorrect
returns that word.
Otherwise, autocorrect
returns the word from word_list
that has the lowest difference from the provided typed_word
based on the diff_function
.
However, if the lowest difference between typed_word
and any of the word_list
is greater than limit
, then typed_word
is returned instead.
Important: If
typed_word
is not contained insideword_list
, and multiple strings have the same lowest difference fromtyped_word
according todiff_function
,autocorrect
should return the string that appears first inword_list
.
A diff function takes in three arguments.
The first is the typed_word
, the second is the source word (in this case, a word from word_list
), and the third argument is the limit
.
The output of the diff function, which is a number, represents the amount of difference between the two strings.
Note: Some diff functions may be asymmetric (meaning flipping the first two parameters may yield a different output), so make sure to pass in the arguments to
diff_function
in the correct order. We will see an example of an asymmetric diff function in problem 7.
Here is an example of a diff function that computes the minimum of 1 + limit
and the difference in length between the two input strings:
>>> def length_diff(w1, w2, limit):
... return min(limit + 1, abs(len(w2) - len(w1)))
>>> length_diff('mellow', 'cello', 10)
1
>>> length_diff('hippo', 'hippopotamus', 5)
6
Assume that typed_word
and all elements of word_list
are lowercase and have no punctuation.
Hint: Try using
max
ormin
with the optionalkey
argument.
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 05 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 05
Problem 6 (200pts): sphinx_fixes
Implement sphinx_fixes
, which is a diff function that takes two strings.
It returns the minimum number of characters that must be changed in the start
word in order to transform it into the goal
word.
If the strings are not of equal length, the difference in lengths is added to the total.
Here are some examples:
>>> big_limit = 10
>>> sphinx_fixes("nice", "rice", big_limit) # Substitute: n -> r
1
>>> sphinx_fixes("range", "rungs", big_limit) # Substitute: a -> u, e -> s
2
>>> sphinx_fixes("pill", "pillage", big_limit) # Don't substitute anything, length difference of 3.
3
>>> sphinx_fixes("roses", "arose", big_limit) # Substitute: r -> a, o -> r, s -> o, e -> s, s -> e
5
>>> sphinx_fixes("rose", "hello", big_limit) # Substitute: r->h, o->e, s->l, e->l, length difference of 1.
5
Important: If the number of characters that must change is greater than
limit
, thensphinx_fixes
should return any number larger thanlimit
and should minimize the amount of computation needed to do so.These two calls to
sphinx_fixes
should take about the same amount of time to evaluate:>>> limit = 4 >>> sphinx_fixes("roses", "arose", limit) > limit True >>> sphinx_fixes("rosesabcdefghijklm", "arosenopqrstuvwxyz", limit) > limit True
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 06 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 06
Try turning on autocorrect in the GUI. Does it help you type faster? Are the corrections accurate?
Problem 7 (300pts): minimum_mewtations
Implement minimum_mewtations
, which is a diff function that returns the minimum number of edit operations needed to transform the typed
word into the source
word.
There are three kinds of edit operations:
- Add a letter to
typed
.- Adding
"t"
to"kiten"
gives us"kitten"
.
- Adding
- Remove a letter from
typed
.- Removing
"t"
from"catts"
gives us"cats"
.
- Removing
- Substitute a letter in
typed
for another.- Substituting
"a"
with"o"
in"baok"
gives us"book"
.
- Substituting
Each edit operation contributes 1 to the difference between two words.
>>> big_limit = 10
>>> minimum_mewtations("cats", "scat", big_limit) # cats -> scats -> scat
2
>>> minimum_mewtations("purng", "purring", big_limit) # purng -> purrng -> purring
2
>>> minimum_mewtations("ckiteus", "kittens", big_limit) # ckiteus -> kiteus -> kitteus -> kittens
3
We have provided a template of an implementation in cats.py
.
Hint: This is a recursive function with three recursive calls. One of these recursive calls will be similar to the recursive call in
sphinx_fixes
if you implementedsphinx_fixes
with recursion.
You may modify the template however you want or delete it entirely.
Important: If the number of edits required is greater than
limit
, thenminimum_mewtations
should return any number larger thanlimit
and should minimize the amount of computation needed to do so.These two calls to
minimum_mewtations
should take about the same amount of time to evaluate:>>> limit = 2 >>> minimum_mewtations("ckiteus", "kittens", limit) > limit True >>> minimum_mewtations("ckiteusabcdefghijklm", "kittensnopqrstuvwxyz", limit) > limit True
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 07 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 07
Try typing again. Are the corrections more accurate?
$ python cats_gui.py
(Optional) Extension: final_diff
(0pt)
You may optionally design your own diff function called final_diff
.
Here are some ideas for making even more accurate corrections:
- Take into account which additions and deletions are more likely than others. For example, it's much more likely that you'll accidentally leave out a letter if it appears twice in a row.
- Treat two adjacent letters that have swapped positions as one change, not two.
- Try to incorporate common misspellings.
You can also set the limit you'd like your diff function to use by changing the value of the variable FINAL_DIFF_LIMIT
in cats.py
.
You can check your final_diff
's success rate by running:
$ python score.py
If you don't know where to start, try copy-pasting your code for sphinx_fixes
and minimum_mewtations
into final_diff
and scoring them.
Looking at the typos they accidentally fixed might give you some ideas!
Phase 3: Multiplayer
When students in the past have tried to implement the functions without thoroughly reading the problem description, they’ve often run into issues. 😱 Read each description thoroughly before starting to code.
Typing is more fun with friends!
You'll now implement multiplayer functionality, so that when you run cats_gui.py
on your computer, it connects to the course server at https://sicp.pascal-lab.net/cats/ and looks for someone else to race against.
To race against a friend, 5 different programs will be running:
- Your GUI, which is a program that handles all the text coloring and display in your web browser.
- Your
cats_gui.py
, which is a web server that communicates with your GUI using the code you wrote incats.py
. - Your opponent's
cats_gui.py
. - Your opponent's GUI.
- The CS 61A multiplayer server, which matches players together and passes messages around. It is not running on your machine.
When you type, your GUI uploads what you have typed to your cats_gui.py
server, which computes how much progress you have made and returns a progress update.
It also uploads a progress update to the multiplayer server, so that your opponent's GUI can display it.
Meanwhile, your GUI display is always trying to keep current by asking for progress updates from cats_gui.py
, which in turn requests that info from the multiplayer server.
Each player has an id
number that is used by the server to track typing progress.
We recommend you to finish phase 3 in 3 days. ~50 lines code is needed.
Problem 8 (200pts): report_progress
Implement report_progress
, which is called every time the user finishes typing a word.
It takes a list of the words typed
, a list of the words in the prompt
, the user's user_id
, and a upload
function that is used to upload a progress report to the multiplayer server.
There will never be more words in typed
than in prompt
.
Your progress is a ratio of the words in the prompt
that you have typed correctly, up to the first incorrect word, divided by the number of prompt
words.
For example, this example has a progress of 0.25
:
report_progress(["Hello", "ths", "is"], ["Hello", "this", "is", "wrong"], ...)
Your report_progress
function should do two things: upload a message to the multiplayer server and return the progress of the player with user_id
.
You can upload a message to the multiplayer server by calling the upload
function on a two-element dictionary containing the keys 'id'
and 'progress'
.
You should then return the player's progress, which is the ratio of words you computed.
Hint: See the dictionary below for an example of a potential input into the upload function. This dictionary represents a player with
user_id
1 andprogress
0.6.{'id': 1, 'progress': 0.6}
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 08 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 08
Problem 9 (200pts): time_per_word
Implement time_per_word
, which takes in a list words
and times_per_player
, a list of lists for each player with timestamps indicating when each player finished typing every individual word in words
.
It returns a game
with the given information.
A game
is a data abstraction that has a list of words
and times
. The times
are stored as a list of lists of how long it took each player to type each word.
Specifically, times[i][j]
indicates how long it took player i
to type words[j]
.
For example, say words = ['Hello', 'world']
and times = [[5, 1], [4, 2]]
, then [5, 1]
corresponds to the list of times for player 0, and [4, 2]
corresponds to the list of times for player 1.
Thus, player 0 took 5 units of time to write the word 'Hello'
.
Important: Be sure to use the
game
constructor when returning agame
. The tests will check that you are using thegame
dictionary rather than assuming a particular data format.Read the definitions for the
game
constructor incats.py
to learn more about how the dictionary is implemented.
Timestamps are cumulative and always increasing, while the values in times
are differences between consecutive timestamps for each player.
Here's an example: If times_per_player = [[1, 3, 5], [2, 5, 6]]
, the corresponding times
attribute of the game
would be [[2, 2], [3, 1]]
.
This is because the differences in timestamps are (3-1)
, (5-3)
for the first player and (5-2)
, (6-5)
for the second player.
The first value of each list within times_per_player
represents the initial starting time for each player.
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 09 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 09
Problem 10 (200pts): fastest_words
Implement fastest_words
, which returns which words each player typed fastest.
This function is called once both players have finished typing.
It takes in a game
.
Specifically, the fastest_words
function returns a list of lists of words, one list for each player, and within each list the words they typed the fastest (against all the other players).
In the case of a tie, consider the earliest player in the list (the smallest player index) to be the one who typed it the fastest.
For example consider the following game with the words 'Just'
, 'have'
, and 'fun'
. Player 0 typed 'fun'
the fastest (3 seconds), Player 1 typed 'Just'
the fastest (4 seconds), and they tied on the word 'have'
(both took 1 second) so we consider to Player 0 to be the fastest, because they are the earliest player in the list.
>>> player_0 = [5, 1, 3]
>>> player_1 = [4, 1, 6]
>>> fastest_words(game(['Just', 'have', 'fun'], [player_0, player_1]))
[['have', 'fun'], ['Just']]
The game
argument is a game
dictionary, like the one returned in Problem 9.
You can access words in the game
with the selector get_word
, which takes in a game
and the word_index
(an integer).
With get_word
you can access the time it took any player to type any word using time.
Important: Be sure to use the
game
constructor when returning agame
. The tests will check that you are using thegame
dictionary rather than assuming a particular data format.Make sure your implementation does not mutate the given player input lists. For the example above, calling
fastest_words
on[player_0, player_1]
should not mutateplayer_0
orplayer_1
.
Before writing any code, unlock the tests to verify your understanding of the question:
$ python ok -q 10 -u
Once you are done unlocking, begin implementing your solution. You can check your correctness with:
$ python ok -q 10
Now you can play against other students in the course.
Set enable_multiplayer
to True
near the bottom of cats.py
and type swiftly!
Project Submission
Finally, submit your code to complete the project. You may submit more than once, and your final score of the project will be the highest score of all your submissions.
$ python ok --submit
Congratulations, you have reached the end of your second SICP project! You achieve the main body of a interesting game with only ~200 lines python by yourself. You should really be proud of it!
And remember, you can always come back to practice your typing techniques no matter happy or sad :D