# <ins>Week 1, Tutorial 2: Strings, Lists & Arrays</ins>
*ASTR 211: Observational Astronomy, Spring 2021* \
*Mason V. Tea*

Strings and lists are worth their own separate discussion, since these data types have some special attributes that regular numbers (ints and floats) just don't have. The main thing that sets them apart is the fact that they have *structure*: each one is an ordered sequence of items that we can access individually. Lists are also _mutable_, meaning their contents can be changed in place, while strings are _immutable_ and have to be rebuilt if you want to change them. The key to working with either, which can be tricky for newcomers to Python, is something called _indexing_.

# Indexing

You might be wondering how you can access the values in your lists once you've made them, or perhaps how to pick apart your strings. Both lists and strings can be thought of as a container for letters/numbers/symbols in a specific order. The place in the order that each of these symbols has is called its _index_, and we can access these symbols in both lists and strings using this index.

One odd thing about Python is that it starts counting at 0 rather than 1. So, if we wanted the third letter of the word "TELESCOPE," its index would be 2 rather than 3. It's weird, but you get used to it. Here is each letter of TELESCOPE with its index:

| T | E | L | E | S | C | O | P | E |
| - | - | - | - | - | - | - | - | - |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |

With this in mind, we can talk about accessing certain parts of both lists and strings using indices. We'll continue with the "TELESCOPE" example, and I'll demonstrate with both a string and a list.

First, let's save a string ("TELESCOPE") and a list (["T","E","L","E","S","C","O","P","E"]) for convenience:

In [18]:
tel_str = "TELESCOPE"
tel_lst = ["T","E","L","E","S","C","O","P","E"]

Say we wanted to get the third letter of telescope, which should be "L". As stated above, the _third_ letter has index _2_, so we can write

In [19]:
print(tel_str[2])
print(tel_lst[2])

L
L


Suppose we're super cool. If we want to print the slang word for telescope, we can take a _slice_ of the string, using the indices to return just part of it. If I want the word "SCOPE" then, I'd want to slice it between indices 4 and 8, the syntax for which goes like this:

In [21]:
print(tel_str[4:8])
print(tel_lst[4:8])

SCOP
['S', 'C', 'O', 'P']


(Note the fact that in the first example, when we took just one index, the value from the list was returned directly (i.e. as a string). In the second example, the values we sliced were returned as a list.)

Oh, it seems to have cut off the end of the word. That's because a slice runs up to, but doesn't include, the second index. Luckily there's some syntax for that. If we wanted to take the letters from index 4 all the way to the end, then, we just need to leave that last index empty:

In [22]:
print(tel_str[4:])
print(tel_lst[4:])

SCOPE
['S', 'C', 'O', 'P', 'E']


So that took the letters from index 4 onward. If we wanted a British telephone, we could do the same thing, but flip the position of the colon, taking the letters _up until_ index 4 instead:

In [23]:
print(tel_str[:4])
print(tel_lst[:4])

TELE
['T', 'E', 'L', 'E']
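
The slicing rules so far can be summed up in a short sketch. A slice `[start:stop]` includes `start` but stops just before `stop`, so it contains `stop - start` items. The `[4:9]` slice is an extra case that isn't shown above; it's just another way to reach the end of this particular nine-letter word.

```python
tel_str = "TELESCOPE"

# A slice includes the start index but excludes the stop index.
scop = tel_str[4:8]    # indices 4, 5, 6, 7
scope = tel_str[4:9]   # indices 4 through 8

print(scop)       # SCOP
print(scope)      # SCOPE
print(len(scop))  # 4, i.e. 8 - 4

# Splitting at the same index on both sides loses nothing.
rebuilt = tel_str[:4] + tel_str[4:]
print(rebuilt)    # TELESCOPE
```

The same slices work on `tel_lst`, returning lists instead of strings.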


Another thing to notice is that if I have a string with numbers in it, and I access that index, Python will still return it as a string:

In [12]:
print(type("12345"[3]))

<class 'str'>

A list, in contrast, returns each value with whatever type it was stored as:

In [13]:
print(type([1.0, 'hi', 90][0]))

<class 'float'>

In [14]:
print(type([1.0, 'hi', 90][1]))

<class 'str'>

In [15]:
print(type([1.0, 'hi', 90][2]))

<class 'int'>

### Reverse indexing

To make things even more confusing, you can _reverse index_ your lists/strings with negative numbers. Continuing with our TELESCOPE examples, if we wanted the British phone again, we could write

In [26]:
print(tel_str[:-5])
print(tel_lst[:-5])

TELE
['T', 'E', 'L', 'E']


This works from the back, shaving the last five letters off the string/list. Note that negative indices don't start at zero: the last letter has an index of -1, the second-to-last -2, and so on (since -0 is just 0, which already belongs to the first letter).
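
As a quick sketch of how negative indices line up with the regular ones: index `-n` refers to the same item as index `len(...) - n`. The variable names here are just for illustration.

```python
tel_str = "TELESCOPE"

last = tel_str[-1]                     # the last letter
also_last = tel_str[len(tel_str) - 1]  # same letter, counted from the front
scope = tel_str[-5:]                   # the last five letters

print(last)       # E
print(also_last)  # E
print(scope)      # SCOPE
```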

# Manipulating strings and lists

Now that you know how to grab values from strings/lists, let's talk about how to work with them. There are a few things you might want to do. A non-exhaustive list may include

- Adding/removing values
- Changing values
- Checking the length
- Finding specific values and their indices
- Splitting up

### Adding new values

If you only want to tack a value onto the beginning or end of a string or list, you can simply add them together as described in Tutorial 1.

In [31]:
print('string1' + 'string2') # string1string2
print([1,2,3]+[6])           # [1,2,3,6]

string1string2
[1, 2, 3, 6]


For lists, you can also use the `append()` function for single values. It uses dot notation, like functions from libraries, and changes the list in place:

In [40]:
new_lst = [1,2,3]
new_lst.append(4)
print(new_lst) # [1,2,3,4]

[1, 2, 3, 4]


For strings, you also have a couple of other options which are often useful: the `format()` function and _f-strings_. Both of them allow you to write out a string with some gaps that you can fill in later. Say I have some variables for my name, age, and class year:

In [4]:
my_age = 21
my_name = "Mason"
my_year = "2021"

If I want to write "My name is Mason, I'm 21 and I graduate in 2021" without explicitly writing out that string, I can insert them into a formatted string in two ways:

In [3]:
format_string = "My name is {0}, I'm {1} and I graduate in {2}".format(my_name,my_age,my_year)
f_string = f"My name is {my_name}, I'm {my_age} and I graduate in {my_year}"

print(format_string)
print(f_string)

My name is Mason, I'm 21 and I graduate in 2021
My name is Mason, I'm 21 and I graduate in 2021


Both work the same way. Choose your favorite.
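
Both styles also accept a _format specifier_ after a colon inside the braces, which is handy for controlling how many decimal places a number is printed with. Here's a minimal sketch; the `exposure` variable and its value are made up for illustration.

```python
exposure = 12.3456  # hypothetical exposure time in seconds

# ":.2f" means "show this float with two digits after the decimal point".
format_string = "Exposure time: {0:.2f} s".format(exposure)
f_string = f"Exposure time: {exposure:.2f} s"

print(format_string)  # Exposure time: 12.35 s
print(f_string)       # Exposure time: 12.35 s
```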

### Removing values

If you want to remove a specific element from a list, you can use the `remove()` function. Note that it takes the _value_ you want to remove, not its index:

In [67]:
a_lst = [1,2,3,4]
a_lst.remove(2)
print(a_lst)

[1, 3, 4]
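
Two related details, sketched with a made-up list: if a value appears more than once, `remove()` only takes out the first occurrence, and if you'd rather remove by index, lists also have a `pop()` function that returns the value it removed.

```python
b_lst = [5, 2, 7, 2]

b_lst.remove(2)        # removes only the first 2
print(b_lst)           # [5, 7, 2]

popped = b_lst.pop(0)  # removes the item at index 0 and returns it
print(popped)          # 5
print(b_lst)           # [7, 2]
```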


For strings, we can use the `replace()` function, and just insert nothing. Nothing is actually something you can type, known as the _empty string_, `''`. Using the `replace()` function with the empty string will let you remove characters. It takes two values as arguments: the string you want to replace, and the string you want to replace it with.

In [71]:
a_str = 'string'
b_str = a_str.replace('st','')
print(b_str)

ring
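
A couple of things to keep in mind, shown here as a small sketch: `replace()` swaps out _every_ occurrence unless you pass an optional third argument limiting the count, and it returns a new string rather than changing the original.

```python
river = 'Mississippi'

no_s = river.replace('s', '')           # remove every 's'
one_s_gone = river.replace('s', '', 1)  # remove only the first 's'

print(no_s)        # Miiippi
print(one_s_gone)  # Misissippi
print(river)       # Mississippi (the original is untouched)
```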


### Changing values

If you want to change the value of an element in a list, you just need to assign a new value to that index, the same way you'd assign a variable. For example, if I have a list

In [51]:
a_lst = [1,2,3,4,5]

and I want the second value to be 6, I can reassign the value at that index like so:

In [53]:
a_lst[1] = 6
print(a_lst) # [1,6,3,4,5]

[1, 6, 3, 4, 5]


For strings you have to get creative. One option is converting the string to a list with the `list()` function, using the method above, and then turning the list back into a string using the `join()` function on an empty string. It's complicated, and I doubt you'll ever have to do it, but there it is.

In [78]:
a_str = 'string'
a_str_lst = list(a_str)
a_str_lst[4] = 'aaaa'
a_str2 = ''.join(a_str_lst)
print(a_str2)

striaaaag
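
The reason for the detour is that strings can't be changed in place, so assigning to an index raises an error. As an alternative sketch (not the only way to do it), you can build the new string out of slices on either side of the character you want to swap:

```python
a_str = 'string'

# Direct assignment fails, because strings are immutable.
try:
    a_str[4] = 'a'
except TypeError as err:
    print('TypeError:', err)

# Everything before index 4, the new text, then everything after index 4.
a_str3 = a_str[:4] + 'aaaa' + a_str[5:]
print(a_str3)  # striaaaag
```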


### Checking the length

To see how many values or letters your list/string has, use the `len()` function. That's it.

In [55]:
print(len([1,2,3,4,5,6,7])) # 7
print(len('Mississippi'))   # 11

7
11


### Finding specific values and their indices

If you want to see if a value is in a string, you can use the `find()` function. This function returns the index of the **first** occurrence of the value you're looking for. If it's not there, it returns a value of -1.

In [57]:
print('striiing'.find('i')) # 3
print('striiing'.find('f')) # -1

3
-1
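
`find()` isn't limited to single characters, and if you only care _whether_ something is there (not where), the `in` keyword works on strings too. A short sketch using the same example word:

```python
word = 'striiing'

has_i = 'i' in word           # is the letter there at all?
has_ing = 'ing' in word       # works for longer pieces too
ing_start = word.find('ing')  # index where 'ing' begins

print(has_i)      # True
print(has_ing)    # True
print(ing_start)  # 5
```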


As for lists, there are a couple of easy ways to go about this. You can use the `in` operator, which returns a boolean:

In [60]:
print(4 in [1,2,3,4]) # True
print(6 in [1,2,3,4]) # False

True
False


Alternatively, you could use the `index()` function, which works similarly to the way `find()` does for strings, except it raises an error rather than returning -1. Not as helpful on its own, but checking with `in` first and then calling `index()` is a safe way to get the index.

In [64]:
print([1,2,3,4].index(3)) # 2
print([1,2,3,4].index(5)) # ValueError

2


ValueError: 5 is not in list
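
Here's one way the two could be combined, as a minimal sketch. The function name `find_in_list` is made up for this example; it just mimics the behavior of `find()` by returning -1 when the value is missing.

```python
def find_in_list(values, target):
    """Return the index of target in values, or -1 if it isn't there."""
    if target in values:
        return values.index(target)
    return -1

found = find_in_list([1, 2, 3, 4], 3)
missing = find_in_list([1, 2, 3, 4], 5)

print(found)    # 2
print(missing)  # -1
```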

### Splitting up

A method that I've found useful over the years has been the `split()` function for strings. This function takes the value you want to split on (the _separator_) as a parameter, and returns a list of the pieces of your string that sat between the separators. For example, if I had a string that read like a list,

In [80]:
a_str = "1,2,3,4,5"

and I wanted to split the string up by the commas, I could write

In [81]:
a_str_split = a_str.split(',')
print(a_str_split)

['1', '2', '3', '4', '5']
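
Notice that the pieces are still strings. If you want actual numbers, each piece has to be converted. The sketch below does that with a _list comprehension_, a compact way of looping that you may not have met yet; treat it as a preview rather than something you need right now.

```python
a_str = "1,2,3,4,5"

pieces = a_str.split(',')                  # list of strings
numbers = [int(piece) for piece in pieces] # convert each piece to an int

print(pieces)        # ['1', '2', '3', '4', '5']
print(numbers)       # [1, 2, 3, 4, 5]
print(sum(numbers))  # 15
```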


As for lists, there's no `split()` function, but a combination of the methods above (finding an index, then slicing) does the same job.
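
One possible combination, sketched with a made-up list: look up the position of the value you want to split at with `index()`, then slice on either side of it.

```python
a_lst = [1, 2, 3, 4, 5, 6]

split_at = a_lst.index(4)       # position of the value 4
first_part = a_lst[:split_at]   # everything before it
second_part = a_lst[split_at:]  # the value itself and everything after

print(first_part)   # [1, 2, 3]
print(second_part)  # [4, 5, 6]
```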

# `numpy` arrays

Next, I want to talk about another library, this time one that's not built in. This library, `numpy`, is basically the `math` library but better. It has most of the same functions and more, but the best part about it is a new data type called an _array_. Arrays behave a lot like lists, but math can be performed over the whole array at once.

First, I have to import `numpy`, which comes with your installation of Anaconda. It's often useful to _alias_ these libraries when you import them, i.e. give them a "nickname" so you don't have to type it out each time you reference it. The common alias for `numpy` is `np`, so I'll import it like so:

In [2]:
import numpy as np

Now, I can reference the library as `np`. So, let's make our first `numpy` arrays. To do so, you simply call the `array()` function with the list you want to convert to an array as an argument:

In [86]:
array1 = np.array([1,2,3,4,5])
array2 = np.array([4,5,6,7,8])
print(type(array1))

<class 'numpy.ndarray'>


Now, unlike with plain old lists, we can do math with the values inside these arrays. Adding them together no longer tacks one onto the other, but rather adds each of the corresponding values together (index 0 to index 0, etc).

In [87]:
print(array1 + array2)

[ 5  7  9 11 13]
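
To make the contrast concrete, here's a side-by-side sketch of the same operations on lists and on arrays. The variable names are just for illustration.

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]
arr1 = np.array(list1)
arr2 = np.array(list2)

joined = list1 + list2  # lists: one is tacked onto the other
summed = arr1 + arr2    # arrays: added value by value
repeated = list1 * 2    # lists: the list is repeated
doubled = arr1 * 2      # arrays: every value is multiplied

print(joined)    # [1, 2, 3, 4, 5, 6]
print(summed)    # [5 7 9]
print(repeated)  # [1, 2, 3, 1, 2, 3]
print(doubled)   # [2 4 6]
```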


You can also subtract, multiply, divide, and do any other kind of arithmetic:

In [89]:
print(array1 * array2)
print(array2 / array1)
print(np.cos(array1) + 6*np.log10(array2))

[ 4 10 18 28 40]
[4.   2.5  2.   1.75 1.6 ]
[4.15266225 3.77767319 3.67891501 4.41694462 5.70220211]


Unlike lists, however, `numpy` arrays are _typed_, meaning that you can't store multiple data types in the same array. If you try to store both strings and numbers, for example, the array will default to all strings:

In [90]:
np.array([1,3.6,'cat'])

array(['1', '3.6', 'cat'], dtype='<U32')
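
The same thing happens more quietly with numbers: mix ints and floats and the whole array becomes floats. You can check an array's type with its `dtype` attribute, and convert with `astype()`. A small sketch (the exact integer type printed depends on your system, so it isn't shown here):

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float makes the whole array float

print(ints.dtype)   # an integer type
print(mixed.dtype)  # float64

# Strings that look like numbers can be converted back.
as_numbers = np.array(['1', '3.6']).astype(float)
print(as_numbers)   # [1.  3.6]
```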

`numpy` also gives you the ability to generate arrays with specific properties. For example, if I wanted an array with 30 zeros and nothing else, I could use the `zeros()` command:

In [6]:
zero_array = np.zeros(30)
print(zero_array)
print(len(zero_array))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0.]
30


Let's say I want to convert the frequencies between 10 and 100 Hz to wavelengths. For that I'd need a long list of frequency values. I can generate such an array with the `arange()` and `linspace()` commands. The `arange()` command, as used here, takes the value to start at and the value to stop at, and the array it returns takes steps of 1 from the start up to (but not including) the stop.

In [11]:
print(np.arange(10,100))

[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57
 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99]


Notice that it cuts off the last value, and be careful to address this if you use this function in your code! If you want to take specific sized steps through these values, i.e. add $n$ each time, you can specify the step size as well.

In [12]:
print(np.arange(10,100,10))

[10 20 30 40 50 60 70 80 90]
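
If you do want the last value included, one simple approach (a sketch, not the only option) is to push the stop value just past the one you want:

```python
import numpy as np

# Stop at 101 so that 100 itself is included.
freqs = np.arange(10, 101, 10)

print(freqs)       # 10, 20, ..., 100
print(len(freqs))  # 10
print(freqs[-1])   # 100
```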


Rather than taking the step size as an input, the `linspace()` function takes the _number_ of values you want (i.e. the length of the array) and evenly spaces them.

In [None]:
print(np.linspace(10,100,50))

Notice also that `linspace()` _does_ give you the last element, so the array ends exactly on the stop value.
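
To finish the frequency-to-wavelength example from above, here's a minimal sketch using the relation $\lambda = c/\nu$. The variable names and the rounded value of the speed of light are my own choices for illustration.

```python
import numpy as np

c = 2.998e8  # speed of light in m/s (rounded)

# 10 evenly spaced frequencies from 10 Hz to 100 Hz, endpoints included.
freqs = np.linspace(10, 100, 10)

# Array math: every frequency is converted at once.
wavelengths = c / freqs

print(freqs[0], freqs[-1])  # 10.0 100.0
print(wavelengths[0])       # wavelength in meters at 10 Hz
print(wavelengths[-1])      # wavelength in meters at 100 Hz
```

Because the division acts on the whole array, no loop is needed.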