All Java classes for these notes are found in the package java.io.
A stream is a series of bytes either:
· Written by the program to memory, an output device, or a file on the disk or
· Read by the program from one of these sources
The File class and the file stream classes that work with it (called the File classes in these notes) provide methods that facilitate manipulating data in a file on the disk.
The disk is a place where data can be written at some point in time by program1 and read by program2 at an arbitrary time in the future.
Objects instantiated from these File classes allow the programmer to locate and access files on the disk. When a program creates a File object, it must provide a String that names the disk file this File object will manipulate. File objects provide the following general operations:
exists(): returns whether or not there is a file on disk having the name specified when the File object was created.
canRead(), canWrite(): disk files are protected by the operating system such that only certain users can read or write them. These methods tell a program whether it is able to read or write the underlying disk file.
There are many operations a program can perform on a disk file using a File object. See the File class API.
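As a minimal sketch of the query operations above, the following uses only java.io.File methods (exists, canRead, canWrite, length). The class name FileQueryDemo and the placeholder file name are assumptions made for illustration.

```java
import java.io.File;

class FileQueryDemo {
    // Summarizes what this program may do with the named disk file.
    static String describe(String name) {
        File f = new File(name);   // creating the File object does not open or create anything
        if (!f.exists()) {
            return "missing";
        }
        String access = (f.canRead() ? "r" : "-") + (f.canWrite() ? "w" : "-");
        return access + ", " + f.length() + " bytes";
    }

    public static void main(String[] args) {
        // "iodemotest.bin" is only a placeholder name; pass your own file as an argument.
        String name = args.length > 0 ? args[0] : "iodemotest.bin";
        System.out.println(name + ": " + describe(name));
    }
}
```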
Open: reserve the underlying disk file for either:
· Reading – in general, multiple different File objects can open the same disk file for reading.
· Writing – in general, only one File object can open a disk file for writing. A File object attempting to open a disk file for writing that has already been opened by another File object will get an exception.
So, when a File object ‘opens’ a disk file, this object now has the ability to read or write the contents of this disk file depending on the kind of open the File object did.
In Java, creating an object of one of the file stream classes (for example FileInputStream, FileOutputStream, or RandomAccessFile) generally performs the open operation.
Close: release the reservation created when the File object executed its open method.
Read: read some number of bytes from the underlying disk file into memory.
Write: write some number of bytes from memory to the underlying disk file.
File objects generally keep a pointer (sometimes called a cursor) that holds the position of the next byte in the file to read or write. Some File classes allow you to move this pointer without reading or writing any data.
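The open, write, read, and close operations described above might look like the following sketch. It assumes a scratch file created with File.createTempFile; the class name OpenCloseDemo is made up for illustration. Constructing a stream opens the file, and the try-with-resources block closes it.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class OpenCloseDemo {
    // Writes three bytes to a scratch file, reads them back, and returns their sum.
    static int writeThenRead() {
        try {
            File f = File.createTempFile("openclose", ".bin");
            f.deleteOnExit();
            // Constructing the stream opens the file for writing;
            // leaving the try block closes it.
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(new byte[] {10, 20, 30});
            }
            int sum = 0;
            try (FileInputStream in = new FileInputStream(f)) {
                int b;
                while ((b = in.read()) != -1) {   // read() returns -1 at end of file
                    sum += b;                     // each read advances the file pointer
                }
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead());
    }
}
```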
Most programming languages that allow file reading and writing generally provide two ways to traverse the data in a file:
· Sequential – each data item must be read (or written) immediately after the last data item read (or written). Opening a file for sequential access requires opening it for read or for write, but not for both.
· Random – at any point in time, the programmer may read or write the next data item or use the seek operation to position the file pointer to an arbitrary data item in the file.
Java supports interpreting the data items in a file as follows:
· As numbers – byte, short, int, long, float, double values – all signed values
· As booleans – each boolean takes a whole byte – either a 0 (false) or a 1 (true)
· As chars – chars in Java are two-byte (16-bit) Unicode characters, but when they are written to disk the amount of space they occupy in the file depends on the encoding. See the ‘Encoding’ section below.
Sequential access to the data in a file.
Write - DataOutputStream wraps FileOutputStream
Read – DataInputStream wraps FileInputStream
writeBoolean, // write 1 byte with value either 0 false or 1 true
writeChar, // write 2 bytes in UTF-16 format
writeByte, // write only 1 byte – the low byte of the value passed in
writeShort, writeInt, writeLong, writeFloat, writeDouble, // write the appropriate number of bytes in the appropriate format
writeUTF(String) // write a 2-byte length, then the string in modified UTF-8 encoding
and corresponding methods with the same names for reading
see demos/BinaryDataWriteDemo.java
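A small round trip through the write and read methods listed above can be sketched as follows. This is not the course demo; the class name BinaryRoundTrip and the scratch file are assumptions. The values must be read back in the same order and with the same types in which they were written.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BinaryRoundTrip {
    // Writes a few values, then returns "<file length>:<values read back>".
    static String roundTrip() {
        try {
            File f = File.createTempFile("binary", ".bin");
            f.deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
                out.writeBoolean(true);   // 1 byte
                out.writeChar('A');       // 2 bytes
                out.writeByte(0x1FF);     // 1 byte: only the low byte, 0xFF, is written
                out.writeInt(258);        // 4 bytes
                out.writeDouble(1.5);     // 8 bytes
                out.writeUTF("hi");       // 2-byte length + 2 bytes of text
            }
            StringBuilder sb = new StringBuilder();
            sb.append(f.length()).append(':');
            try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
                sb.append(in.readBoolean()).append(',')
                  .append(in.readChar()).append(',')
                  .append(in.readByte()).append(',')    // 0xFF read as a signed byte is -1
                  .append(in.readInt()).append(',')
                  .append(in.readDouble()).append(',')
                  .append(in.readUTF());
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());   // prints 20:true,A,-1,258,1.5,hi
    }
}
```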
RandomAccessFile raf = new RandomAccessFile("path", "access");
Acts like both a stream and a file, i.e., doesn’t need an additional stream class.
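“access” notes:
- Can be "r", "rw", "rws", or "rwd". There is no write-only "w" mode; an invalid mode string causes an IllegalArgumentException.
- If "r", the file must pre-exist.
- If "rw" and the file doesn’t exist, it will be created.
- "rwd" works like "rw" but writes every update of the file's content to storage immediately.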
seek() allows you to move the file pointer to an absolute byte offset, measured from the beginning of the file.
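The following sketch shows random access with seek. It assumes a scratch file holding five ints; the class name RandomAccessDemo is hypothetical. Note that seek takes an absolute offset from the start of the file, so the third int lives at offset 2 * 4 = 8.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class RandomAccessDemo {
    // Writes five ints, overwrites the third one in place, and reads it back.
    static int overwriteThird() {
        try {
            File f = File.createTempFile("random", ".bin");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                for (int i = 0; i < 5; i++) {
                    raf.writeInt(i * 10);        // ints land at offsets 0, 4, 8, 12, 16
                }
                raf.seek(2 * Integer.BYTES);     // absolute offset 8: the third int
                raf.writeInt(99);                // replaces the value 20
                raf.seek(2 * Integer.BYTES);     // the write advanced the pointer, so go back
                return raf.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteThird());    // prints 99
    }
}
```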
PrintWriter - characters are converted into bytes according to the platform's default character encoding
BufferedReader wraps FileReader
- Allows:
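o Read an entire line. The readLine() method returns null at end of file; otherwise it consumes the line and returns it without the line terminator.
o Limited moving of the file pointer (mark, skip, reset). The mark(readAheadLimit) method remembers the current position, and reset() returns to it, provided no more than readAheadLimit characters have been read in between. The reset() method throws an exception if the stream has never been marked or the mark is no longer valid.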
- From the API:
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders.
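Here is a minimal sketch of the character stream classes above: PrintWriter writes two lines, and a BufferedReader wrapped around a FileReader reads them back with readLine, using mark and reset to reread the first line. The class name LineReadDemo and the scratch file are assumptions; both sides use the platform's default encoding, which is safe here only because the text is plain ASCII.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

class LineReadDemo {
    // Returns the lines seen by the reader, separated by '|'.
    static String demo() {
        try {
            File f = File.createTempFile("lines", ".txt");
            f.deleteOnExit();
            try (PrintWriter out = new PrintWriter(f)) {
                out.println("first");
                out.println("second");
            }
            StringBuilder sb = new StringBuilder();
            try (BufferedReader in = new BufferedReader(new FileReader(f))) {
                in.mark(100);                  // remember this spot for the next 100 chars
                sb.append(in.readLine());      // consumes "first"
                in.reset();                    // jump back to the marked spot
                String line;
                while ((line = in.readLine()) != null) {   // null means end of file
                    sb.append('|').append(line);
                }
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints first|first|second
    }
}
```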
Most methods of the classes in package java.io have ‘throws’ clauses in their signatures, typically for the checked exception IOException.
When you call these methods, you must either provide a handler for these exceptions or put a ‘throws’ clause in your method signature.
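The two choices can be sketched side by side. The class and method names below are made up for illustration: the first method provides a handler, the second passes the exception on with a ‘throws’ clause.

```java
import java.io.FileInputStream;
import java.io.IOException;

class IoExceptionDemo {
    // Choice 1: provide a handler. Returns the file size in bytes, or -1 on any I/O problem.
    static long sizeOrMinusOne(String name) {
        try (FileInputStream in = new FileInputStream(name)) {
            long n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {   // FileNotFoundException is a subclass of IOException
            return -1;
        }
    }

    // Choice 2: declare the exception and let the caller deal with it.
    static int firstByte(String name) throws IOException {
        try (FileInputStream in = new FileInputStream(name)) {
            return in.read();
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeOrMinusOne("no-such-file.bin"));   // prints -1 if the file is absent
    }
}
```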
hexdump is a tool that allows you to inspect individual bytes in a file.
It is a command line program available on Linux, Mac, and Windows.
hexdump -C <filename>
displays the contents of a file in the way we have been doing in class.
Start | cmd
PATH="X:CS258 (6448-Spring2019)\CourseFiles";%PATH%
P:
cd cs258\ws
The above cd command assumes your workspace is on your P drive in the directory cs258\ws
Mac users can start a terminal window and type in the command. Same for Linux users.
If you are on Windows, you can download a stand-alone version of this tool from https://www.di-mgt.com.au/hexdump-for-windows.html (scroll to the bottom of the page).
Extract the zip file to C:
Copy C:\hexdump-2.0.0\hexdump.exe to C:
(You will likely need administrator permission to do this. If you are on a system without administrator permission, unzip/copy hexdump.exe to any directory where you do have permission, e.g., P:\)
Windows 10 users can also install the Windows bash shell by following the instructions here: https://www.laptopmag.com/articles/use-bash-shell-windows-10
1. Start a terminal window (cmd window if you are running the stand-alone Windows version, terminal on a Mac)
2. Change directory to the directory where your file is. This step can be tricky on the Windows bash shell and on Macs.
3. Run the hexdump utility as described above.
Here is a run on my Windows system using the stand-alone hexdump.exe:
1. Open a command line: start | cmd -- open a terminal window on Mac or Linux
2. cd <eclipseProjectDirectory>
3. Run hexdump on the file.
For Windows with hexdump copied to C:\
C:\hexdump -C iodemotest.bin
For Mac or Linux:
hexdump -C iodemotest.bin
Here is a picture of what I did after start | cmd:
Yellow text is what I typed on the command line. (I was a little sloppy with my highlight …)
Red underline is the path to the Demos project directory in my Eclipse workspace. Replace this path with yours.
The file iodemotest.bin was written by running demos/BinaryDataWriteDemo.java.
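If hexdump is not available, a rough Java stand-in can be sketched as below. This is only an approximation of the hexdump -C layout (offset, 16 bytes in hex, printable ASCII between bars), not the real tool, and the class name HexDumpSketch is made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class HexDumpSketch {
    // Formats the bytes 16 per row: offset, hex values, then printable ASCII.
    static String dump(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < data.length; row += 16) {
            sb.append(String.format("%08x ", row));
            StringBuilder text = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i % 8 == 0) {
                    sb.append(' ');                    // extra gap before each group of 8
                }
                if (i < data.length) {
                    int b = data[i] & 0xFF;            // treat the byte as unsigned
                    sb.append(String.format("%02x ", b));
                    text.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
                } else {
                    sb.append("   ");                  // pad a short final row
                }
            }
            sb.append(" |").append(text).append("|\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HexDumpSketch <filename>");
            return;
        }
        System.out.print(dump(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```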
Excellent and humorous introduction to the topic. Please read.
ASCII. We know what this one is: 7 bits per character, values 00 – 7F (0 – 127 decimal), with each character generally stored in 1 byte.
Java uses two-byte Unicode characters, which cover all of the characters in Unicode Plane 0 (the Basic Multilingual Plane); see below.
https://en.wikipedia.org/wiki/Code_point
In character encoding terminology, a code point or code position is any of the numerical values that make up the code space. Many code points represent single characters but they can also have other meanings, such as for formatting.
For example, the character encoding scheme ASCII comprises 128 code points in the range 0 to 7F (hex), Extended ASCII comprises 256 code points in the range 0 to FF (hex), and Unicode comprises 1,114,112 code points in the range 0 to 10FFFF (hex). The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
https://en.wikipedia.org/wiki/Unicode#Code_point_planes_and_blocks
https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
UTF-8 -- https://en.wikipedia.org/wiki/UTF-8
Used by over 90% of the existing websites.
Note: in a 4-byte character, there are 21 x’s in the table above.
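To make the bit layout concrete, here is a hand-written UTF-8 encoder sketch, checked against Java's own encoder. The class name Utf8Sketch is hypothetical, and the sketch does no validation (for example, it does not reject surrogate values). In the 4-byte case the payload bits add up to 3 + 6 + 6 + 6 = 21.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Sketch {
    // Encodes one code point by hand, following the UTF-8 bit patterns.
    static byte[] encode(int cp) {
        if (cp < 0x80) {
            return new byte[] {(byte) cp};                          // 0xxxxxxx
        } else if (cp < 0x800) {
            return new byte[] {(byte) (0xC0 | (cp >> 6)),           // 110xxxxx
                               (byte) (0x80 | (cp & 0x3F))};        // 10xxxxxx
        } else if (cp < 0x10000) {
            return new byte[] {(byte) (0xE0 | (cp >> 12)),          // 1110xxxx
                               (byte) (0x80 | ((cp >> 6) & 0x3F)),  // 10xxxxxx
                               (byte) (0x80 | (cp & 0x3F))};        // 10xxxxxx
        } else {
            return new byte[] {(byte) (0xF0 | (cp >> 18)),          // 11110xxx
                               (byte) (0x80 | ((cp >> 12) & 0x3F)), // 10xxxxxx
                               (byte) (0x80 | ((cp >> 6) & 0x3F)),  // 10xxxxxx
                               (byte) (0x80 | (cp & 0x3F))};        // 10xxxxxx
        }
    }

    // Compares the hand-written result with the library's UTF-8 encoder.
    static boolean matchesLibrary(int cp) {
        byte[] expected = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(expected, encode(cp));
    }

    public static void main(String[] args) {
        System.out.println(matchesLibrary(0x20AC));   // euro sign, 3 bytes
        System.out.println(matchesLibrary(0x1F600));  // a supplementary-plane code point, 4 bytes
    }
}
```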
Window | Preferences | General | Workspace
https://en.wikipedia.org/wiki/Windows-1252#Character_set
Still unknown why we don’t get 128-159 shown in the above link. They are not ISO 8859-1, but I thought they should display if using the Eclipse default Cp1252.
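The sketch below does not settle that question, but it shows how the same byte decodes under the two charsets. It assumes the JDK in use supports the charset name "windows-1252" (not guaranteed on every Java platform); the class name Cp1252Sketch is made up.

```java
import java.nio.charset.Charset;

class Cp1252Sketch {
    // Decodes a single byte with the named charset and returns the resulting char value.
    static int decode(int byteValue, String charsetName) {
        byte[] one = {(byte) byteValue};
        return new String(one, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        // Byte 0x80 is a control character in ISO-8859-1 ...
        System.out.printf("ISO-8859-1:   U+%04X%n", decode(0x80, "ISO-8859-1"));
        // ... but the euro sign in windows-1252, if this JDK supports that charset.
        if (Charset.isSupported("windows-1252")) {
            System.out.printf("windows-1252: U+%04X%n", decode(0x80, "windows-1252"));
        }
    }
}
```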
Note: Java chars are two bytes long. 2-byte characters can represent only enough code points for the Basic Multilingual Plane. Java has been enhanced to cover the supplementary planes, but I do not know the specifics of these enhancements.
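One detail that can be checked directly: a String stores a supplementary-plane code point as two chars (a surrogate pair), so length() and codePointCount() disagree for such text. The sketch below uses only java.lang methods; the class name SurrogateSketch is hypothetical.

```java
class SurrogateSketch {
    // Reports how many chars and how many code points one code point occupies in a String.
    static String describe(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.length() + " char(s), " + s.codePointCount(0, s.length()) + " code point(s)";
    }

    public static void main(String[] args) {
        System.out.println(describe(0x20AC));    // Basic Multilingual Plane: 1 char(s), 1 code point(s)
        System.out.println(describe(0x1F600));   // supplementary plane: 2 char(s), 1 code point(s)
    }
}
```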
See demos/CharsetEncodingTest.java
Here is an example of encoding large Unicode values in UTF-8:
dao.writeUTF("" +   // first 2 bytes show len of string: 00 08 -- 8-byte-long string
    '\u20ac' +      // 3 bytes: b1: {1110}[0010], b2: {10}[00 00][10], b3: {10}[10][1100]
                    //          E2 82 AC
    '\uFFFD' +      // 3 bytes: b1: {1110}[1111], b2: {10}[11 11][11], b3: {10}[11][1101]
                    //          EF BF BD
    '\u0250');      // 2 bytes: b1: {110}[0 10 01], b2: {10}[01 0000]
                    //          C9 90
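The byte values in the comments above can be checked with a short sketch that sends writeUTF into an in-memory buffer instead of a file. The class name WriteUtfBytes and the use of ByteArrayOutputStream are assumptions made for illustration; the variable name dao follows the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteUtfBytes {
    // Returns the bytes produced by writeUTF(s) as space-separated hex pairs.
    static String hexOfWriteUTF(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream dao = new DataOutputStream(buf)) {
                dao.writeUTF(s);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : buf.toByteArray()) {
                sb.append(String.format("%02X ", b & 0xFF));   // mask to print as unsigned
            }
            return sb.toString().trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints 00 08 E2 82 AC EF BF BD C9 90
        System.out.println(hexOfWriteUTF("" + '\u20ac' + '\uFFFD' + '\u0250'));
    }
}
```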