Go - String

In Go, a string is an immutable sequence of bytes: once a string is created, its value cannot be changed. For example:

package main

func main() {
  s := "Hello"
  s[0] = 'h' // does not compile: the bytes of a string cannot be assigned
}

The compiler rejects this program with an error such as:

cannot assign to s[0]

To modify the content of a string, you can convert it to a byte slice ([]byte). Note, however, that the conversion makes a copy, so you are not operating on the original string:

package main

import "fmt"

func main() {
  s := "Hello"
  b := []byte(s) // b holds a new copy of the bytes of s
  b[0] = 'h'     // changes the copy, not s
  fmt.Printf("%s\n", b)
}

The output is:

hello
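
As a minimal sketch of the copy semantics (the helper name modifyCopy is hypothetical, chosen only for illustration), the program below changes a byte slice, converts it back with string(b), and then prints the original string to show that it is unchanged:

```go
package main

import "fmt"

// modifyCopy is a hypothetical helper: it converts s to a byte slice,
// replaces the copy's first byte with b0, and converts the result back
// to a new string. It assumes s is non-empty. The argument s itself is
// never modified, because strings are immutable.
func modifyCopy(s string, b0 byte) string {
	b := []byte(s)   // copies the bytes of s into a new backing array
	b[0] = b0        // modifies the copy only
	return string(b) // copies the bytes again, into a new string
}

func main() {
	s := "Hello"
	t := modifyCopy(s, 'h')
	fmt.Println(s) // Hello: the original is unchanged
	fmt.Println(t) // hello
}
```

Both conversions allocate and copy, which is why s and t never share storage.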

Bytes, Strings and Runes

String as Byte slice

"hey"                   // String value
[]byte{104, 101, 121}   // Representing string "hey" in byte slice

[]byte("hey")                   // Converting string "hey" into byte slice
string([]byte{104, 101, 121})   // Converting byte slice into string value

String as rune literal

  • Instead of numbers (byte slice), we can also represent string characters as rune literals.

  • Numbers and rune literals are the same thing: a rune literal such as 'h' is just another way to write an integer value ('h' == 104).

  • In Go, Unicode code points are called runes.

  • A rune literal is an untyped constant with an integer value; its default type is rune.

  • Because it is untyped, a rune literal can be used with any integer type that can hold its value, e.g. byte (uint8), rune (int32) or int.

  • In short, a rune is a Unicode code point represented by an integer value.

  • UTF-8 encodes each Unicode code point in 1 to 4 bytes.

  • We can represent any Unicode code point using the rune type, because it can store 4 bytes of data. For example:

    char := '🍺'

  • String values are read-only byte slices, i.e.

    string value ----> read-only []byte

  • String to Byte Slice conversion creates a new []byte slice and copies the bytes of the string to a new slice’s backing array. They don’t share the same backing array.

  • In short, a string is an immutable byte sequence and we cannot change any of its elements. However, we can convert a string to a byte slice and then change that new slice.

  • A string value is a small data structure (a pointer and a length) that points to a read-only backing array of bytes.

  • UTF-8 is a variable-length encoding (for efficiency), so runes do not start at evenly spaced byte indexes.

  • A for range loop iterates over the runes of a string rather than its bytes. The index it yields is the byte position at which each rune starts.

  • Runes in a UTF-8 encoded string can have a different number of bytes because UTF-8 is a variable byte-length encoding.

  • Many scripting languages let you index a UTF-8 string by character position. Go does not do so by default, for efficiency reasons: indexing a string returns a byte, and finding the n-th rune would require scanning the string from the start.

  • Go never hides the cost of doing something.

  • []rune(s) decodes the string, allocates a new slice, and copies each rune into the new slice's backing array. This makes indexing by rune possible, but it is an inefficient way of indexing strings.

  • A string value is usually UTF-8 encoded, which is compact: each rune uses only 1 to 4 bytes (variable length).

  • Every element of a []rune (rune slice) has the same size, 4 bytes, because the rune type is an alias for int32. This is why a []rune typically uses more memory than the equivalent UTF-8 string (four times as much for ASCII text).

  • Go source code is always UTF-8 encoded, so string literals in it are automatically UTF-8 encoded (unless they use byte escapes such as \xff).

  • When you are working with bytes, keep working with bytes: do not convert a string to a []byte (byte slice) or vice versa unless necessary, because a conversion generally copies the data. Prefer working with []byte whenever possible; byte slices are efficient and used almost everywhere in the Go standard library.

String is UTF-8

Since Go source code, and therefore every string literal, is UTF-8 encoded, remember that the len function returns the number of bytes in a string, not the number of characters:

package main

import "fmt"

func main() {
  s := "लॉगlog"
  fmt.Println(len(s))
}

The result is:

  12

Because each of the three Devanagari code points in "लॉग" occupies 3 bytes in UTF-8, s in the above example contains 6 runes (code points) but 12 bytes.

If you want to access every character (rune), a for ... range loop can help:

package main
import "fmt"

func main() {
  s := "लॉगlog"
  for index, runeValue := range s {
    fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
  }
}

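The result is:

U+0932 'ल' starts at byte position 0
U+0949 'ॉ' starts at byte position 3
U+0917 'ग' starts at byte position 6
U+006C 'l' starts at byte position 9
U+006F 'o' starts at byte position 10
U+0067 'g' starts at byte position 11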

References:
Strings, bytes, runes and characters in Go. The Go Programming Language.
golang-101-hacks: https://github.com/NanXiao/golang-101-hacks

