Useful Octave commands relating to strings

To read a string in from a file, see Part 2 below.

Part 1

I found this information by typing
help -i string
in Octave.

 - Built-in Function:  char (CELL_ARRAY)
 - Built-in Function:  char (S1, S2, ...)
     Create a string array from a numeric matrix, cell array, or list of
     strings.

     If the argument is a numeric matrix, each element of the matrix is
     converted to the corresponding ASCII character.  For example,

          char ([97, 98, 99])
               => "abc"

 - Function File:  int2str (N)
 - Function File:  num2str (X, PRECISION)
 - Function File:  num2str (X, FORMAT)
     Convert a number to a string.  These functions are not very
     flexible, but are provided for compatibility with MATLAB.  For
     better control over the results, use `sprintf' (*note Formatted
     Output::).
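
As a rough illustration of the extra control that `sprintf' gives, the sketch below formats values with an explicit width and precision. The variable names are only for illustration.

```octave
x = pi;

% Default conversion: the format is chosen for us.
a = num2str (x);

% Explicit format: field width 8, three digits after the decimal point.
b = sprintf ("%8.3f", x);      % "   3.142"

% Numbers can be embedded directly in surrounding text.
c = sprintf ("%d items", 5);   % "5 items"
```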

 - Function File:  strcat (S1, S2, ...)
     Return a string containing all the arguments concatenated.  For
     example,

          s = [ "ab"; "cde" ];
          strcat (s, s, s)
          => "ab ab ab "
                  "cdecdecde"

Searching and Replacing
=======================

 - Function File:  findstr (S, T, OVERLAP)
     Return the vector of all positions in the longer of the two strings
     S and T where an occurrence of the shorter of the two starts.  If
     the optional argument OVERLAP is nonzero, the returned vector can
     include overlapping positions (this is the default).  For example,

          findstr ("ababab", "a")
          => [ 1, 3, 5 ]
          findstr ("abababa", "aba", 0)
          => [ 1, 5 ]

 - Function File:  index (S, T)
     Return the position of the first occurrence of the string T in the
     string S, or 0 if no occurrence is found.  For example,

          index ("Teststring", "t")
          => 4

     *Caution:*  This function does not work for arrays of strings.

 - Function File:  strcmp (S1, S2)
     Compares two strings, returning 1 if they are the same, and 0
     otherwise.

     *Caution:*  For compatibility with MATLAB, Octave's strcmp
     function returns 1 if the strings are equal, and 0 otherwise.
     This is just the opposite of the corresponding C library function.
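
A small sketch of what the caution means in practice: the result of strcmp can be used directly as a truth value. The strings here are made up for the example.

```octave
a = "octave";
b = "octave";
c = "Octave";

same   = strcmp (a, b);   % 1: the strings are identical
differ = strcmp (a, c);   % 0: the comparison is case-sensitive

% Unlike C's strcmp, a true result means "equal".
if (strcmp (a, b))
  disp ("a and b are equal");
end
```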

 - Function File:  strrep (S, X, Y)
     Replaces all occurrences of the substring X of the string S with
     the string Y.  For example,

          strrep ("This is a test string", "is", "&%$")
          => "Th&%$ &%$ a test string"

 - Function File:  substr (S, BEG, LEN)
     Return the substring of S which starts at character number BEG and
     is LEN characters long.

     If BEG is negative, extraction starts that far from the end of
     the string.  If LEN is omitted, the substring extends to the end
     of S.

     For example,

          substr ("This is a test string", 6, 9)
          => "is a test"

     This function is patterned after AWK.  You can get the same
     result by `S (BEG : (BEG + LEN - 1))'.
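
The indexing form mentioned in the note can be sketched as follows, using the same string and positions as the substr example. The names beg and len are just local variables for the illustration.

```octave
s   = "This is a test string";
beg = 6;
len = 9;

% Plain range indexing: characters beg through beg + len - 1.
t = s(beg : (beg + len - 1));   % "is a test"
```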

String Conversions
==================

 - Function File:  bin2dec (S)
     Return the decimal number corresponding to the binary number stored
     in the string S.  For example,

          bin2dec ("1110")
          => 14

     If S is a string matrix, returns a column vector of converted
     numbers, one per row of S.  Invalid rows evaluate to NaN.

 - Function File:  dec2bin (N, LEN)
     Return a binary number corresponding to the nonnegative decimal number
     N, as a string of ones and zeros.  For example,

          dec2bin (14)
          => "1110"

     If N is a vector, returns a string matrix, one row per value,
     padded with leading zeros to the width of the largest value.

 - Function File:  dec2base (N, B, LEN)
     Return a string of symbols in base B corresponding to the
     nonnegative integer N.

          dec2base (123, 3)
          => "11120"

     If N is a vector, return a string matrix with one row per value,
     padded with leading zeros to the width of the largest value.

     If B is a string then the characters of B are used as the symbols
     for the digits of N.  Space (' ') may not be used as a symbol.

          dec2base (123, "aei")
          => "eeeia"

     The optional third argument, LEN, specifies the minimum number of
     digits in the result.

 - Function File:  base2dec (S, B)
     Convert S from a string of digits of base B into an integer.

          base2dec ("11120", 3)
          => 123

     If S is a matrix, returns a column vector with one value per row
     of S.  If a row contains invalid symbols then the corresponding
     value will be NaN.  Rows are right-justified before converting so
     that trailing spaces are ignored.

     If B is a string, the characters of B are used as the symbols for
     the digits of S. Space (' ') may not be used as a symbol.

          base2dec ("yyyzx", "xyz")
          => 123
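
As a quick check on how these conversion functions fit together, the sketch below converts numbers to strings and back again. It only uses the values from the examples above, plus one assumed call showing the LEN argument of dec2bin.

```octave
n = 123;

s3   = dec2base (n, 3);        % "11120"
back = base2dec (s3, 3);       % 123

sym  = dec2base (n, "aei");    % "eeeia", using a, e, i as digits 0, 1, 2

b = dec2bin (14);              % "1110"
d = bin2dec (b);               % 14

% LEN pads the result with leading zeros to at least that many digits.
padded = dec2bin (5, 8);       % "00000101"
```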

 - Function File:  str2num (S)
     Convert the string S to a number.

 - Mapping Function:  toascii (S)
     Return ASCII representation of S in a matrix.  For example,

          toascii ("ASCII")
               => [ 65, 83, 67, 73, 73 ]
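
The same character codes can also be obtained with the built-in `double', and `char' converts them back to a string. A minimal sketch of that round trip, offered as an alternative rather than as what the help text above describes:

```octave
codes = double ("ASCII");   % [ 65, 83, 67, 73, 73 ]
back  = char (codes);       % "ASCII"
```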


 - Mapping Function:  tolower (S)
     Return a copy of the string S, with each upper-case character
     replaced by the corresponding lower-case one; nonalphabetic
     characters are left unchanged.  For example,

          tolower ("MiXeD cAsE 123")
               => "mixed case 123"

 - Mapping Function:  isdigit (S)
     Return 1 for characters that are decimal digits.
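
Because isdigit works element by element, its result can be used as a mask to pick the digits out of a string. A small sketch with a made-up input string:

```octave
s = "abc123def45";

mask = isdigit (s);    % 1 where s holds a decimal digit, 0 elsewhere
digits_only = s(mask); % "12345"
n_digits = sum (mask); % 5
```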



Part 2: LOADING STRINGS

If the string is plain ASCII with one character per line, then we can just use:

load filename.dat
and Octave will read the input into a numeric array, which can then be processed however we like.

If the string is ASCII but all concatenated, then we can just prepend an Octave-friendly header. E.g.:

# name: name_we_want_to_use_in_octave_workspace
# type: string
# elements: 1
# length: 71
10101010010001010101000101001010101000101001010101010100001001010110110

The hashed comments tell Octave what it should expect to read. If the length given is less than the full length of the string, that's fine: it just reads as many characters as we tell it to. Note that the spacing in the comment structure seems to be important. E.g.

# type: string
works, but
#type:  string
seems not to.
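
The header can also be written from within Octave itself. The sketch below is one way to try this out, assuming the header layout shown above is accepted by the installed version of Octave. The variable name bits_from_file and the temporary file are hypothetical, chosen only for the example.

```octave
bits  = "1010011";
fname = [tempname() ".dat"];      % throwaway file name for the demonstration

% Write the header lines exactly as spaced above, then the string itself.
fid = fopen (fname, "w");
fprintf (fid, "# name: bits_from_file\n");
fprintf (fid, "# type: string\n");
fprintf (fid, "# elements: 1\n");
fprintf (fid, "# length: %d\n", length (bits));
fprintf (fid, "%s\n", bits);
fclose (fid);

% load creates the variable named in the header.
load (fname);
unlink (fname);                   % remove the temporary file
```

After this runs, bits_from_file should hold the same characters as bits.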

The string is read in as a character array. We can access substrings by doing

name_we_want_to_use_in_octave_workspace(n)
or
name_we_want_to_use_in_octave_workspace(p:q)

Also, we can convert the elements to numbers (rather than characters) by using, say

name_we_want_to_use_in_octave_workspace - '0'
This subtracts the ASCII value of '0' from each element of the character array, turning the digit characters into proper integers.
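
A short sketch of the same trick on a small made-up string, showing that the result is an ordinary numeric vector that can be used in arithmetic:

```octave
s = "10110";

v = s - "0";          % [ 1, 0, 1, 1, 0 ]
n_ones = sum (v);     % 3

% The subtraction works because "0" has character code 48.
code_of_zero = double ("0");   % 48
```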


David MacKay and Simon Osindero