milearning: July 2013

Monday 29 July 2013

Rename volume group on which your root(/) partition resides.

Several of our servers were identical in file systems, disk partitions, applications and so on, and differed only in the name of the Volume Group (VG). I had cloned the systems and wanted to rename the VG, but because the root Logical Volume (LV) lives on that VG, it could not be unmounted while the system was online. I had to bring the server down to perform the steps below.

The steps below were performed on CentOS 6.3 (32-bit), kernel version 2.6.32-279.el6.

Because this is the root volume, the file system has to be unmounted first. Insert the CentOS CD/DVD and boot the server into the rescue shell.

boot: linux rescue

When it prompts with questions, choose the answers that suit your setup.

Question 5 : Rescue window

This screen offers to mount the Linux installation read-write, read-only or not at all. Either way, we will be provided with a shell. As this system's root partition is on a logical volume in the volume group we wish to rename, we must not mount the system. [ SKIP ]

Make sure all your logical volumes are OFFLINE

# lvm lvscan

INACTIVE '/dev/vg_pcmk1/home' [1.00 GiB] inherit

INACTIVE '/dev/vg_pcmk1/rootvg' [13.18 GiB] inherit

INACTIVE '/dev/vg_pcmk1/swap' [1.46 GiB] inherit

#lvm vgscan

found volume group vg_pcmk1

(output omitted)
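The lvscan check above can also be done mechanically. This is only a minimal sketch: the function name all_lvs_inactive is my own, it assumes awk is available in your rescue environment, and it compares the state case-insensitively because the exact capitalisation of the state column can differ between LVM versions.

```shell
# Reads `lvm lvscan` output on stdin.
# Succeeds only if no logical volume is reported as active.
all_lvs_inactive() {
    awk 'toupper($1) == "ACTIVE" { active++; print "still active: " $2 }
         END { exit (active > 0) }'
}

# Example using the output shown above
# (in the rescue shell: lvm lvscan | all_lvs_inactive)
printf '%s\n' \
    "INACTIVE '/dev/vg_pcmk1/home' [1.00 GiB] inherit" \
    "INACTIVE '/dev/vg_pcmk1/rootvg' [13.18 GiB] inherit" \
    "INACTIVE '/dev/vg_pcmk1/swap' [1.46 GiB] inherit" |
    all_lvs_inactive && echo "safe to rename"
```

If any volume is reported as still active, deactivating the group with lvm vgchange -a n vg_pcmk1 before renaming is the usual approach.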

- Rename the volume group

#lvm vgrename vg_pcmk1 vg_pcmk2

- Exit the shell and reboot the server. Leave the CentOS DVD in the drive.

#exit

- Boot the server into the rescue environment again and, at Question 5, choose Continue this time.

Eventually, a shell will appear. The system files are in /mnt/sysimage. chroot into the system:

# chroot /mnt/sysimage

sh-4.1#vim /etc/fstab

- Change the volume group name in the entries for the logical volumes that reside on it.

# grep vg_pcmk2 /etc/fstab

/dev/mapper/vg_pcmk2-rootvg / ext4 defaults 1 1

/dev/mapper/vg_pcmk2-home /home ext4 defaults 1 2

/dev/mapper/vg_pcmk2-swap swap swap defaults 0 0

- Update the volume group name in the kernel line of the grub.conf file.

#vim /boot/grub/grub.conf

# grep vg_pcmk2 /boot/grub/grub.conf

kernel /vmlinuz-2.6.32-279.el6.i686 ro root=/dev/mapper/vg_pcmk2-rootvg rd_LVM_LV=vg_pcmk2/rootvg rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rd_LVM_LV=vg_pcmk2/swap rhgb quiet
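The manual edits to /etc/fstab and grub.conf above boil down to a search and replace, which could be scripted roughly as below. Treat this as a sketch under assumptions: rename_vg_refs is a name I made up, it assumes the VG names contain no hyphens (device-mapper doubles hyphens in /dev/mapper names) and no regex special characters, and it only knows the three reference styles seen in this post.

```shell
# Rewrite references to an old volume group name in a config file.
# A backup copy is kept next to the file with a .bak suffix.
rename_vg_refs() {
    old=$1 new=$2 file=$3
    sed -i.bak \
        -e "s#/dev/mapper/${old}-#/dev/mapper/${new}-#g" \
        -e "s#/dev/${old}/#/dev/${new}/#g" \
        -e "s#rd_LVM_LV=${old}/#rd_LVM_LV=${new}/#g" \
        "$file"
}

# Demonstration on a scratch copy rather than the real file
demo=$(mktemp)
echo "/dev/mapper/vg_pcmk1-rootvg / ext4 defaults 1 1" > "$demo"
rename_vg_refs vg_pcmk1 vg_pcmk2 "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

Inside the chroot it would be called once per file, for example rename_vg_refs vg_pcmk1 vg_pcmk2 /etc/fstab, and the result should still be reviewed by eye before rebooting.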

- Create a new initrd image as below.

# tail -1 /boot/grub/grub.conf

initrd /initrd-2.6.32-279.el6.i686.img

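With the files updated as above, create the new initrd image. This takes some time and produces no output. Note that uname -r reports the kernel of the running rescue environment; if the rescue media is a different release, pass the installed kernel version (2.6.32-279.el6.i686 here) explicitly. If an image with that name already exists, mkinitrd may refuse to overwrite it unless -f is given.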

#mkinitrd /boot/initrd-$(uname -r).img $(uname -r)

- Exit from shell:

# exit

- Exit from Linux rescue:

# exit

- Remove the CD and boot the server.

Once the server is up and you have logged in, check the VG name; it should now show the new name.

# df -h | grep vg_pcmk2

/dev/mapper/vg_pcmk2-rootvg

/dev/mapper/vg_pcmk2-home

NOTE: If the entries for the root volume are not updated correctly, the kernel panics at boot, fails to sync and kills the init process.
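Given the note above, it may be worth checking for leftover references to the old name before leaving the chroot. A small sketch, assuming plain grep is available; find_stale_vg_refs is a hypothetical helper name, not an LVM tool.

```shell
# Print every line that still mentions the old volume group name.
# Returns 0 when nothing is found, 1 when stale references remain,
# and 2 when grep itself failed (for example an unreadable file).
find_stale_vg_refs() {
    old=$1; shift
    status=0
    # /dev/null is added so grep always prefixes matches with the file name
    grep -n -F -- "$old" "$@" /dev/null || status=$?
    case $status in
        0) return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}

# Demonstration on a scratch file
demo=$(mktemp)
echo "/dev/mapper/vg_pcmk2-rootvg / ext4 defaults 1 1" > "$demo"
find_stale_vg_refs vg_pcmk1 "$demo" && echo "no stale references"
rm -f "$demo"
```

In the chroot the call would look like find_stale_vg_refs vg_pcmk1 /etc/fstab /boot/grub/grub.conf.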

Wednesday 24 July 2013

Build Solaris 11 repository without network connection

A few days ago I installed Oracle Solaris 11 and wanted to upgrade it to 11.1.

I came across IPS in Solaris 11 and had to configure it on the system, which is not networked.

Here I want to share what IPS is and how to configure it on Solaris.

• The Image Packaging System (IPS) is a new network-centric software packaging and delivery system in Oracle Solaris 11. IPS allows efficient, observable and controllable transitions between known configurations of software content, providing administrators with safe system upgrade environments and better control over planned system downtime schedules.

• The ZFS file system is integral to IPS, giving administrators the ability to perform updates on a file system clone on live production systems.

If you prefer, download the ISO image, burn it to a DVD, insert it into the drive and run the command below as root.

#pkg set-publisher -G '*' -g file:///cdrom/sol11repo_full/repo solaris

If you would like something more permanent for installing packages, you can configure a local repository as below.

- After installing Solaris 11, download (on another system perhaps) the two files that make up the Solaris 11 repository from Oracle Solaris 11 Downloads

- Copy the files onto the system

- Concatenate the two files.

# cat sol-11_1-repo-full.iso-a sol-11_1-repo-full.iso-b > sol-11_1-repo-full.iso
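Since a truncated copy of either part would give a broken ISO, a quick size check after concatenating can save time later. The sketch below is only an illustration; join_parts is a hypothetical name, and a size comparison only catches lost bytes. If a checksum is published alongside the download, comparing against it is the stronger check.

```shell
# Concatenate the parts into one file and confirm that the result is
# exactly as large as the parts added together.
join_parts() {
    out=$1; shift
    cat "$@" > "$out" || return 1
    total=0
    for part in "$@"; do
        total=$((total + $(wc -c < "$part")))
    done
    [ "$total" -eq "$(wc -c < "$out")" ]
}

# Demonstration with two small scratch files
tmp=$(mktemp -d)
printf 'first half ' > "$tmp/part-a"
printf 'second half' > "$tmp/part-b"
join_parts "$tmp/whole" "$tmp/part-a" "$tmp/part-b" && echo "sizes match"
rm -rf "$tmp"
```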

- Mount the ISO file at a mount point

#mount -F hsfs sol-11_1-repo-full.iso /mnt

- Copy the repository from the mounted ISO image to a permanent on-disk location, then set the publisher to point to that copy.

#zfs create -o atime=off -o compression=on rpool/export/repoSolaris11

#rsync -aP /mnt/repo /export/repoSolaris11

#pkgrepo -s /export/repoSolaris11/repo refresh

#pkg set-publisher -G '*' -g /export/repoSolaris11/repo solaris

- Check which repository your publisher points to.

# pkg publisher

PUBLISHER TYPE STATUS URI

solaris origin online file:///export/repoSolaris11/repo/

- Now you can upgrade your system without any internet connection.

IPS commands can be found here - click here

Monday 22 July 2013

Kernel Crash Report/Crash Dump Analysis

In my previous post we configured how to capture a kernel dump; for reference, click on the link kernel crash dump

In this article we cover the basic usage of the crash utility to open the dumped memory core, process the information it contains and interpret the output.

Find the location of the dumped kernel core and start analyzing it.

#crash /usr/lib/debug/lib/modules/2.6.32-279.el6.i686/vmlinux /var/crash/127.0.0.1-2013-07-18-09\:40\:28/vmcore

KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.el6.i686/vmlinux

DUMPFILE: /var/crash/127.0.0.1-2013-07-18-09:40:28/vmcore [PARTIAL DUMP]

CPUS: 1

DATE: Thu Jul 18 09:40:21 2013

UPTIME: 00:37:21

LOAD AVERAGE: 934.79, 206.96, 67.74

TASKS: 5494

NODENAME: <hostname>

RELEASE: 2.6.32-279.el6.i686

VERSION: #1 SMP Fri Jun 22 10:59:55 UTC 2012

MACHINE: i686 (2933 Mhz)

MEMORY: 895.6 MB

PANIC: "Oops: 0002 [#1] SMP " (check log for details)

PID: 6847

COMMAND: "bash"

TASK: ea142aa0 [THREAD_INFO: db8ae000]

CPU: 0

STATE: TASK_RUNNING (PANIC)

The fields are explained below :-

KERNEL: specifies the kernel running at the time of the crash.

DUMPFILE: is the name of the dumped memory core.

CPUS: is the number of CPUs on your machine.

DATE: specifies the time of the crash.

TASKS: indicates the number of tasks in the memory at the time of the crash. Task is a set of program instructions loaded into memory.

NODENAME: is the name of the crashed host.

RELEASE: and VERSION: specify the kernel release and version.

MACHINE: specifies the architecture of the CPU.

MEMORY: is the size of the physical memory on the crashed machine.

PANIC: specifies what kind of crash occurred on the machine.

In this case the panic was triggered deliberately, using the magic keys (SysRq) to force a crash.

SysRq (System Request) refers to Magic Keys, which allow you to send instructions directly to the kernel. They can be invoked using a keyboard sequence or by echoing letter commands to /proc/sysrq-trigger, provided the functionality is enabled. We have discussed this in the Kdump part.

I attacked the system with a Denial of Service (DoS), making it consume all its resources by forking (a fork bomb).

If you look at the load average of the crashed kernel, it is very high (934.79, 206.96, 67.74), and the process responsible was PID 6847.

The error code in PANIC: "Oops: 0002 [#1] SMP " is read bit by bit as below.

Bit   Value 0                  Value 1
---   ----------------------   -----------------
0     No page found            Invalid access
1     Read or Execute          Write
2     Kernel mode              User mode
3     Not instruction fetch    Instruction fetch

0002 is binary 0010, so only bit 1 is set. From our PANIC analysis it is clear that "we have a page not found during a write operation in Kernel mode; the fault was not an Instruction Fetch."
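The bit-by-bit reading of the error code can be reproduced with a few lines of shell. This is just a sketch of the table above; decode_oops is a hypothetical helper name, and it treats the code as hexadecimal, as it appears in the Oops line.

```shell
# Decode the hexadecimal error code from an "Oops: NNNN" line,
# using the bit meanings from the table above.
decode_oops() {
    code=$((0x$1))
    if [ $((code & 1)) -eq 0 ]; then echo "bit 0: no page found"; else echo "bit 0: invalid access"; fi
    if [ $((code & 2)) -eq 0 ]; then echo "bit 1: read or execute"; else echo "bit 1: write"; fi
    if [ $((code & 4)) -eq 0 ]; then echo "bit 2: kernel mode"; else echo "bit 2: user mode"; fi
    if [ $((code & 8)) -eq 0 ]; then echo "bit 3: not an instruction fetch"; else echo "bit 3: instruction fetch"; fi
}

decode_oops 0002
```

For 0002 it reports no page found, write, kernel mode and not an instruction fetch, which is the same reading as the manual one.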

In the previous post we wrote to "/proc/sysrq-trigger" from the command line to dump the kernel, but if your system is unresponsive you will be unable to do that. For such cases we enable the SysRq feature so that the magic keys can be used to collect the dump of the crashed kernel.

#echo "1" > /proc/sys/kernel/sysrq

or add an entry to /etc/sysctl.conf

#vim /etc/sysctl.conf

kernel.sysrq = 1
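If you would rather script the sysctl.conf change than edit it by hand, something along these lines could work. It is a sketch only: set_sysctl_key is my own helper name, and the demonstration runs against a scratch file instead of the real /etc/sysctl.conf.

```shell
# Set "key = value" in a sysctl-style file, replacing an existing
# entry for the same key or appending a new one.
set_sysctl_key() {
    file=$1 key=$2 value=$3
    # Escape the dots in the key so they match literally in the patterns
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pat}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i.bak "s|^[[:space:]]*${pat}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        echo "${key} = ${value}" >> "$file"
    fi
}

# Demonstration on a scratch file
demo=$(mktemp)
echo "kernel.sysrq = 0" > "$demo"
set_sysctl_key "$demo" kernel.sysrq 1
cat "$demo"
rm -f "$demo" "$demo.bak"
```

On a real system the file would be /etc/sysctl.conf, followed by sysctl -p to load the new value.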

Once configured, you will be able to use the magic keys [ Alt + PrintScreen/SysRq + <option> ]

The options are as below.

'b' - Will immediately reboot the system without syncing or unmounting your disks.

'c' - Will perform a system crash by a NULL pointer dereference. A crash dump will be taken if configured.

'd' - Shows all locks that are held.

'e' - Send a SIGTERM to all processes, except for init.

'f' - Will call oom_kill to kill a memory hog process.

'g' - Used by kgdb (kernel debugger)

'h' - Will display help

'i' - Send a SIGKILL to all processes, except for init.

'j' - Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl.

'k' - Secure Access Key (SAK) Kills all programs on the current virtual console.

'l' - Shows a stack backtrace for all active CPUs.

'm' - Will dump current memory info to your console.

'n' - Used to make RT tasks nice-able

'o' - Will shut your system off (if configured and supported).

'p' - Will dump the current registers and flags to your console.

'q' - Will dump per CPU lists of all armed hrtimers (but NOT regular timer_list timers) and detailed information about all clockevent devices.

'r' - Turns off keyboard raw mode and sets it to XLATE.

's' - Will attempt to sync all mounted filesystems.

't' - Will dump a list of current tasks and their information to your console.

'u' - Will attempt to remount all mounted filesystems read-only.

'v' - Forcefully restores framebuffer console

'v' - Causes ETM buffer dump [ARM-specific]

'w' - Dumps tasks that are in uninterruptible (blocked) state.

'x' - Used by xmon interface on ppc/powerpc platforms. Show global PMU Registers on sparc64.

'y' - Show global CPU Registers [SPARC-64 specific]

'z' - Dump the ftrace buffer

[ Alt + SysRq + c ] - Crashes the system; with kdump configured, the dump is collected and the system reboots.

We have analyzed the dump to a certain extent; however, there are a few more commands that can help us understand more.

Let us look at a few more basic commands that can be helpful.

crash> help

* files mach repeat timer

alias foreach mod runq tree

ascii fuser mount search union

bt gdb net set vm

btop help p sig vtop

dev ipcs ps struct waitq

dis irq pte swap whatis

eval kmem ptob sym wr

exit list ptov sys q

extend log rd task

bt - backtrace - Display a kernel stack backtrace :=

The sequence of numbered lines starting with the hash sign (#) is the call trace. It is a list of kernel functions executed just prior to the crash. This gives us a good indication of what happened before the system went down.

crash> bt

PID: 6847 TASK: ea142aa0 CPU: 0 COMMAND: "bash"

#0 [db8af618] crash_kexec at c049b75c

#1 [db8af66c] oops_end at c083fe92

(output omitted ...)

foreach - display command data for multiple tasks in the system :=

This command allows for an examination of various kernel data associated with any, or all, tasks in the system, without having to set the context to each targeted task.

crash>foreach bt

(output omitted..)

log - dump system message buffer :=

The log command dumps the kernel log buffer contents in chronological order. This is similar to what you would see when you type dmesg on a running machine. This is useful when you want to look at the panic or oops message. An oops is triggered by some exception. It is a dump of the CPU registers' state and the kernel stack at that instant. From the panic message, we can find hints as to how the panic was triggered (e.g. the function, process, pid, command or address that triggered the panic), the register information, the kernel module list, whether the kernel is tainted with proprietary kernel modules loaded, and so on.

crash>log

(output omitted..)

SysRq : Trigger a crash

BUG: unable to handle kernel NULL pointer dereference at (null)

IP: [<c06a0d8f>] sysrq_handle_crash+0xf/0x20

*pdpt = 00000000116c3001 *pde = 0000000000000000

Oops: 0002 [#1] SMP

Pid: 6847, comm: bash Not tainted 2.6.32-279.el6.i686 #1 innotek GmbH

EIP: 0060:[<c06a0d8f>] EFLAGS: 00010096 CPU: 0

EIP is at sysrq_handle_crash+0xf/0x20

(output omitted..)

We don't observe any taint flags on the kernel (the log says "Not tainted"). Each flag has its own meaning.

P - Proprietary module has been loaded.

F - Module has been forcibly loaded.

S - SMP with a CPU not designed for SMP.

R - User forced a module unload.

M - System experienced a machine check exception.

B - System has hit bad_page.

U - Userspace-defined naughtiness.

A - ACPI table overridden.

W - Taint on warning.

ps - Display process status :=

This command displays process status for selected, or all, processes in the system. If no arguments are entered, the process data is displayed for all processes. The active task is marked with ">".

crash>ps

6846 5711 0 d169d550 UN 0.1 5992 1112 bash

> 6847 1 0 ea142aa0 RU 0.1 5992 1320 bash

6848 1 0 d169d000 UN 0.1 5992 1368 bash

As the listing shows, there are a number of bash instances. The total number of bash instances is as below.

crash> ps | fgrep bash | wc -l

5363

crash>
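Counting one command name at a time gets tedious when you don't yet know which process flooded the system. As a rough sketch, assuming the ps listing has been saved to a text file, the helper below (count_by_command is a made-up name) tallies every command name in one pass by looking at the last column.

```shell
# Tally the last column (the command name) of a crash "ps" listing
# read from stdin, most frequent first. The header line is skipped.
count_by_command() {
    awk 'NF && $NF != "COMM" { n[$NF]++ }
         END { for (c in n) print n[c], c }' | sort -rn
}

# Demonstration with the three lines shown above
printf '%s\n' \
    "  6846   5711   0  d169d550  UN   0.1    5992   1112  bash" \
    "> 6847      1   0  ea142aa0  RU   0.1    5992   1320  bash" \
    "  6848      1   0  d169d000  UN   0.1    5992   1368  bash" |
    count_by_command
```

With a saved listing it would be used as count_by_command < ps.txt, where ps.txt is a hypothetical file name.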

vm - virtual memory :=

This command displays basic virtual memory information of a context, consisting of a pointer to its mm_struct and page directory, its RSS and total virtual memory size; and a list of pointers to each vm_area_struct, its starting and ending address, vm_flags value, and file path name.

crash> vm

PID: 6847 TASK: ea142aa0 CPU: 0 COMMAND: "bash"

MM PGD RSS TOTAL_VM

e247f740 d16c2000 1320k 5992k

VMA START END FLAGS FILE

d16b66f8 70d000 70e000 4040075

d16b6694 760000 77e000 8000875 /lib/ld-2.12.so

(output omitted..)

files - open files :=

This command displays information about open files of a context. It prints the context's current root directory and current working directory, and then for each open file descriptor it prints a pointer to its file struct, a pointer to its dentry struct, a pointer to the inode, the file type, and the pathname.

crash> files 6427

PID: 6427 TASK: d4150aa0 CPU: 0 COMMAND: "bash"

ROOT: / CWD: /var/crash

FD FILE DENTRY INODE TYPE PATH

0 f6698240 f3609df8 f3610e48 FIFO

1 d40791c0 f377c8d0 f377d1a8 FIFO

(output omitted..)

runq - run queue :=

This command displays the tasks on the run queues of each cpu.

crash> runq

CPU 0 RUNQUEUE: c6408680

CURRENT: PID: 6847 TASK: ea142aa0 COMMAND: "bash"

RT PRIO_ARRAY: c6408778

[no tasks queued]

CFS RB_ROOT: c64086dc

[120] PID: 7990 TASK: c9bb1000 COMMAND: "bash"

[120] PID: 8616 TASK: c13f8aa0 COMMAND: "bash"

[120] PID: 7714 TASK: f2485550 COMMAND: "bash"

(output omitted..)

timer - timer queue data :=

Displays the timer queue entries in chronological order, listing the target function names, the current value of jiffies, and the expiration time of each entry.

crash> timer

TVEC_BASES[0]: c0be66a0

JIFFIES

1941918

EXPIRES TIMER_LIST FUNCTION

1941920 d4aa1b74 c0466210 <process_timeout>

1941920 d847fb74 c0466210 <process_timeout>

(output omitted..)
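The EXPIRES and JIFFIES columns are both tick counts, so the time left before a timer fires is their difference divided by the kernel tick rate. A small sketch of that arithmetic follows; jiffies_to_ms is a hypothetical helper, and the default HZ of 1000 is an assumption that should be checked against the kernel configuration of the crashed system.

```shell
# Milliseconds until a timer fires, given its EXPIRES value, the
# current JIFFIES value and the tick rate HZ (assumed 1000 if omitted).
jiffies_to_ms() {
    expires=$1 now=$2 hz=${3:-1000}
    echo $(( (expires - now) * 1000 / hz ))
}

# First entry from the output above: EXPIRES 1941920, JIFFIES 1941918
jiffies_to_ms 1941920 1941918
```

Under that assumption the first two process_timeout entries were due 2 ms after the crash.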

net - network command :=

Display various network related data

crash> net

NET_DEVICE NAME IP ADDRESS(ES)

f70e6820 lo 127.0.0.1

f4f32020 eth0 <ipaddress-2>

c171e020 eth1 <ipaddress-1>

crash>

We have covered some of the basics of kernel dump analysis, which help in understanding what happened inside the kernel to crash the system. As a best practice, we need to analyze the dump and take the necessary actions to avoid recurrences.

"There are a lot of administrators who don't mind rebooting a server, as long as the server is back online".

Monday 8 July 2013

rsync - file transfer program for *NIX systems

A long time ago, when I wrote a script to take a backup copy of a file, I used the "cp" command. A few of my colleagues recommended that I look at "rsync". Studying it, I found that rsync is a powerful utility and network protocol for *NIX based systems that synchronizes files and directories from one location to another while minimizing data transfer by using data differencing when appropriate.

Instead of copying the entire file, rsync transfers to the destination only the differences in files that have actually changed. This makes updates faster, especially over slower links like modems.

Differences are compressed on the fly, further saving you file transfer time and reducing the load on the network.

rsync is used as a backup and mirroring tool.

I have customized my tools to back up my scripts from staging to production servers automatically through cron.

Some of the features of rsync are as below

- Support for copying links, devices, owners, groups and permissions.

- Exclude and exclude-from options similar to GNU tar.

- Does not require root privileges.

- Pipe-lining of file transfers to minimize latency costs.

Discussed below are a few commands that are used very frequently.

Before going through them, I suggest reading the man page for the options.

http://linux.die.net/man/1/rsync

I have two servers, which I will name as below.

1. Source

2. Destination

Trust between the source and the destination servers is established through host-key authentication.

If you need to deploy host-key authentication, here is the link

http://sunlnx.blogspot.in/2012/09/ssh-passwordless-linuxwindows.html

Objective: Reference guide for rsync.

Since I want a constant backup of my scripts, I have defined a path on the destination server to store the files.

On the destination host, configure rsync

#yum -y install rsync xinetd

#vi /etc/xinetd.d/rsync

disable = no

#/etc/init.d/xinetd start

#lsof -i :873

#mkdir -p /backup/{scripts,configs,logs}

#vi /etc/rsyncd.conf

#cat /etc/rsyncd.conf

[scripts]

path=/backup

hosts allow = <IPADDR of your source server>

hosts deny = *

list = true

uid = root

gid = root

read only = false

Source - rsync one liners

# rsync file1 /tmp - copies file1 to /tmp.

# rsync -t file1 /tmp - copies and preserves modification time.

Connecting to rsync daemon

You can connect to an rsync daemon by using a double colon (::) instead of a single colon to separate the host name from the path.

# rsync 192.168.56.99::

scripts

# rsync -tv *.sh 192.168.56.99::scripts -> transfers all files matching the pattern to the scripts module on the destination.

# rsync -avz ~/Documents/scripts/ 192.168.56.99::scripts -> copies all the files inside the directory.

# rsync -avz ~/Documents/scripts 192.168.56.99::scripts -> copies the complete directory to the destination.

# rsync -avz --exclude-from=/etc/rsync_exclude.lst ~/Documents/scripts 192.168.56.99::scripts -> excludes the files matching the patterns listed in /etc/rsync_exclude.lst

# rsync -azvh --inplace ~/Documents/scripts 192.168.56.99::scripts -> updates files in place at the receiver end and prints sizes in human-readable format.

# rsync -avzi root@192.168.56.99:/backup/scripts . -> itemizes (-i) the differences in the files or directories between source and destination as they are transferred.

> specifies that a file is being transferred to the local host.

f represents that it is a file.

s represents size changes are there.

t represents time-stamp changes are there.

o owner changed

g group changed.
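To make the legend above concrete, here is a small decoder for one itemized string. It is a sketch, not part of rsync: describe_itemize is a hypothetical name, it only looks at the flags listed above (s, t, o, g) and it assumes the usual column order in which the update type comes first, the file type second, and s, t, o, g sit in the 4th, 5th, 7th and 8th positions.

```shell
# Describe one rsync -i itemized string such as ">f.st......".
describe_itemize() {
    item=$1
    case $item in
        '<'*) action="sent" ;;
        '>'*) action="received" ;;
        c*)   action="created or changed locally" ;;
        h*)   action="hard link" ;;
        .*)   action="not transferred" ;;
        *)    action="message" ;;
    esac
    changes=""
    case $item in ???s*)     changes="$changes size" ;; esac
    case $item in ????t*)    changes="$changes time" ;; esac
    case $item in ??????o*)  changes="$changes owner" ;; esac
    case $item in ???????g*) changes="$changes group" ;; esac
    echo "$action:${changes:- no listed attribute changed}"
}

describe_itemize '>f.st......'
```

For the string in the example it reports a received file whose size and time-stamp differ.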

NOTE: You must be very careful while deleting files using rsync command.

I suggest doing a dry run before executing rsync to delete files at the receiver end.

# rsync -av --delete --dry-run -i ~/Documents/scripts 192.168.56.99::scripts

building file list ... done

*deleting scripts/9.sh

*deleting scripts/8.sh

sent 5192 bytes received 22 bytes 10428.00 bytes/sec

total size is 398198 speedup is 76.37 (DRY RUN)
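When the dry run lists more than a handful of files, a count helps to spot an unexpectedly large deletion before it happens. A minimal sketch, assuming the dry-run output is piped in; count_deletions is my own helper name.

```shell
# Count the files a dry run would delete, reading rsync output on stdin.
# Matches both "*deleting path" (with -i) and "deleting path".
count_deletions() {
    grep -c -E '^\*?deleting ' || true
}

# Demonstration with the dry-run output shown above
printf '%s\n' \
    "building file list ... done" \
    "*deleting scripts/9.sh" \
    "*deleting scripts/8.sh" |
    count_deletions
```

In practice the dry-run command shown above would be piped straight into count_deletions.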

Once you have confirmed that the files listed above are the ones that no longer exist on the source, remove --dry-run and execute the command; it deletes the files on the destination server that are not present on the source.

Now, your source and the destination servers are in sync.
