In Go, a string is an immutable sequence of bytes: once it is created, its contents cannot be changed. For example:
package main

func main() {
	s := "Hello"
	s[0] = 'h' // compile error: strings are immutable
}
The compiler will complain:
cannot assign to s[0]
To modify the content of a string, you can convert it to a byte slice. Note that you are then operating on a copy, not on the original string:
package main

import "fmt"

func main() {
	s := "Hello"
	b := []byte(s) // b holds a copy of the bytes of s
	b[0] = 'h'
	fmt.Printf("%s\n", b)
}
The result is like this:
hello
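To make the copy explicit, here is a minimal sketch that also prints the original string after the modification. The helper name lowerFirst is hypothetical and only handles an ASCII first byte.

```go
package main

import "fmt"

// lowerFirst is a hypothetical helper: it returns a copy of s whose first
// byte is lowercased if it is an ASCII uppercase letter. The input string
// itself is never modified.
func lowerFirst(s string) string {
	if s == "" {
		return s
	}
	b := []byte(s) // new slice with its own backing array
	if b[0] >= 'A' && b[0] <= 'Z' {
		b[0] += 'a' - 'A'
	}
	return string(b) // converting back copies into a new string
}

func main() {
	s := "Hello"
	t := lowerFirst(s)
	fmt.Println(s) // Hello (the original is unchanged)
	fmt.Println(t) // hello
}
```

Both conversions copy the data, which is why s still prints as Hello.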
Bytes, Strings, and Runes
String as Byte slice
"hey" // String value
[]byte{104, 101, 121} // Representing string "hey" in byte slice
[]byte("hey") // Converting string "hey" into byte slice
string([]byte{104, 101, 121}) // Converting byte slice into string value
String as rune literal
Instead of numbers (a byte slice), we can also write the characters of a string as rune literals. A rune literal is just another way to write a number: 'h' and 104 are the same value.
In Go, Unicode code points are called runes. In short, a rune is a Unicode code point represented by an integer value.
A rune literal is an untyped constant. It can be assigned to any integer type large enough to hold its value, e.g. byte (uint8), rune (int32), or another integer type; when no type is given, it defaults to rune.
UTF-8 encodes each Unicode code point in 1 to 4 bytes. The rune type can hold any Unicode code point because it is 4 bytes wide. For example:
char := '🍺'
String values are read-only byte slices, i.e.
string value ----> read-only []byte
More precisely, a string is a data structure that points to a read-only backing array.
Converting a string to a byte slice creates a new []byte and copies the bytes of the string into the new slice's backing array. The string and the slice do not share a backing array.
In short, a string is an immutable sequence of bytes and we cannot change any of its elements. However, we can convert a string to a byte slice and then change that new slice.
UTF-8 is a variable-length encoding (for efficiency), so the runes in a UTF-8 encoded string can occupy different numbers of bytes, and each rune may start at a different byte index.
A for range loop steps through the runes of a string rather than its bytes. The index it yields on each iteration is the byte offset at which the current rune starts.
Many languages, especially scripting languages, let us index UTF-8 strings by character. Go does not do so by default for efficiency reasons: indexing a string with s[i] returns the byte at offset i, not the i-th rune.
Go never hides the cost of doing something.
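The following sketch illustrates the points above: a rune literal fits several integer types, indexing a string yields a single byte, and a for range loop yields the byte offset of each rune. The helpers byteAt and runeStarts are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// byteAt returns the byte stored at offset i. Indexing a string never
// decodes UTF-8, so this can land in the middle of a multi-byte rune.
func byteAt(s string, i int) byte {
	return s[i]
}

// runeStarts collects the byte offsets at which each rune of s begins,
// as reported by a for range loop.
func runeStarts(s string) []int {
	var starts []int
	for i := range s {
		starts = append(starts, i)
	}
	return starts
}

func main() {
	// A rune literal is an untyped constant, so it can be stored in any
	// integer type that is large enough to hold its value.
	var b byte = 'h'
	var r rune = '🍺'
	var n int64 = 'e'
	fmt.Println(b, r, n) // 104 127866 101

	s := "a🍺b"
	// 1 byte for 'a', 4 bytes for '🍺', 1 byte for 'b'.
	fmt.Println(len(s)) // 6
	// Offset 1 is the first byte (0xF0) of the 4-byte encoding of '🍺'.
	fmt.Println(byteAt(s, 1)) // 240
	fmt.Println(runeStarts(s)) // [0 1 5]
}
```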
[]rune(s) creates a new slice and copies each rune of the string s into the new slice's backing array. This is an inefficient way of indexing strings.
A string value usually holds UTF-8, which is compact because each rune takes only 1 to 4 bytes (variable byte length).
Each element of a []rune (rune slice), on the other hand, has the same size, i.e. 4 bytes, because the rune type is an alias for int32. For mostly-ASCII text this can use up to four times as much memory.
Go source files are encoded in UTF-8, so string literals in our files are automatically UTF-8 encoded (unless they contain byte-level escapes such as \xff).
When we're working with bytes, continue working with bytes. Do not convert a string to a []byte (byte slice) or vice versa unless necessary, because each conversion generally copies the data. Prefer working with []byte whenever possible. Bytes are efficient and used throughout the Go standard library.
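As a rough illustration of the cost difference, the sketch below fetches the n-th rune of a string in two ways: by converting the whole string to []rune, and by scanning the string in place. Both function names are hypothetical and chosen only for this example.

```go
package main

import "fmt"

// nthRuneCopy converts the whole string to []rune (one allocation, a full
// copy, 4 bytes per element) and then indexes the slice.
func nthRuneCopy(s string, n int) (rune, bool) {
	rs := []rune(s)
	if n < 0 || n >= len(rs) {
		return 0, false
	}
	return rs[n], true
}

// nthRuneScan walks the string with for range and stops at the n-th rune,
// so no copy of the string is made.
func nthRuneScan(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	for _, r := range s {
		if n == 0 {
			return r, true
		}
		n--
	}
	return 0, false
}

func main() {
	s := "日志log"
	a, _ := nthRuneCopy(s, 1)
	b, _ := nthRuneScan(s, 1)
	fmt.Printf("%c %c\n", a, b) // 志 志
}
```

Neither version gives constant-time access by rune position; the scan simply avoids the extra allocation.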
String is UTF-8
Because Go strings normally hold UTF-8 encoded text, remember that the len function returns the number of bytes in a string, not the number of characters:
package main

import "fmt"

func main() {
	s := "लॉगlog"
	fmt.Println(len(s))
}
The result is:
12
Each of the three Devanagari code points in "लॉग" occupies 3 bytes in UTF-8 and each ASCII letter occupies 1 byte, so s in the above example contains 6 code points and 12 bytes.
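To count code points instead of bytes, the standard library's utf8.RuneCountInString can be used. This is a minimal sketch; the helper name sizes and the sample strings are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes is a hypothetical helper that reports the length of s in bytes
// and in runes (Unicode code points).
func sizes(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"log", "日志log", "🍺"} {
		b, r := sizes(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
	// Prints:
	// "log": 3 bytes, 3 runes
	// "日志log": 9 bytes, 5 runes
	// "🍺": 4 bytes, 1 runes
}
```

Note that a rune count is still not a count of user-perceived characters, since a single displayed character may be built from several code points.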
If you want to access every character, a for ... range loop can help:
package main

import "fmt"

func main() {
	s := "लॉगlog"
	for index, runeValue := range s {
		fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
	}
}
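The result is:
U+0932 'ल' starts at byte position 0
U+0949 'ॉ' starts at byte position 3
U+0917 'ग' starts at byte position 6
U+006C 'l' starts at byte position 9
U+006F 'o' starts at byte position 10
U+0067 'g' starts at byte position 11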
Reference:
Strings, bytes, runes and characters in Go;
The Go Programming Language.
https://github.com/NanXiao/golang-101-hacks
- Representing letters with numbers - Overview of ASCII and Unicode
- Characters in a computer - Advanced technical videos about the underlyings of ASCII and Unicode
- The 3rd video is especially important because it talks about UTF-8 encoding and decoding.
- Hexadecimal Number System - Hexadecimal numbers are important when working with bytes
- Go Blog: Strings